Websites and mobile applications are the digital storefronts of eCommerce companies. The move to digital has lowered the entry barrier to establishing a retail business and made time and distance irrelevant in the pursuit of customer acquisition and sales. But the loss of in-person interaction has also introduced new and unique challenges:

  • Low conversion rates
  • Subpar customer experience arising from an unoptimized customer journey
  • Misaligned digital marketing spend leading to poor ROI
  • Missed cross-sell and up-sell opportunities

Clickstream Analytics has proved to be a vital tool in addressing these challenges faced by the eCommerce industry.

What is Clickstream Analytics?

Before we take a closer look at how Clickstream Analytics addresses each of the challenge areas outlined above, let us try to understand – at a high level – what this technique/technology is really all about. 

To put it very simply, a “clickstream” is a sequence (“stream”) of events that represent user actions (“clicks”) on a website or a mobile application. In practice, however, the ambit of clickstream extends beyond clicks to include product impressions, purchases and any other events that might be of relevance to the business.

Traditionally, websites collect such events using JavaScript-based trackers that send POST requests to remote collector servers. The collectors may then enrich the incoming data and store it in formats appropriate for consumption by analytics systems.

Following is an indicative representation of a “page” view, which is one of the basic events captured as part of a clickstream:

{
  "anonymousId": "507f191e810c19729de860ea",
  "channel": "browser",
  "context": {
    "ip": "",
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36"
  },
  "name": "Home",
  "properties": {
    "title": "Welcome | Initech",
    "url": ""
  },
  "receivedAt": "2019-02-23T22:28:55.387Z",
  "sentAt": "2019-02-23T22:28:55.111Z",
  "timestamp": "2019-02-23T22:28:55.111Z",
  "type": "page",
  "userId": "97980cfea0067"
}
Even the simplistic representation above can yield a treasure trove of information if stored and analyzed properly. Following are a few examples:

  • anonymousId and userId: until a user signs up or logs in to the web/mobile application, the underlying system can generate a unique identifier and associate all actions by the user with that anonymousId. As soon as the user signs up or logs in, the userId becomes available, and the anonymousId-to-userId mapping can be used to attribute the entire browsing history, including the part before signup/login, to the user. This helps in better modeling of user behaviour
  • ip: IP addresses can provide information about the geography from which the user is accessing the site/application. This, in turn, can be used to enrich the user profile and/or to display location-specific promotions/campaigns. Furthermore, if the user's interactions span sessions in which the user does not log in or sign up but continues to access from the same IP address, the IP address can serve as yet another correlating factor in preparing a comprehensive browsing history. Finally, analysing access patterns by geography can also indicate where to deploy services and/or CDNs so as to improve user experience while optimizing infrastructure expenditure
  • channel and userAgent: help in identifying the device from which the access is occurring. Such identification can help in deciding how to distribute marketing spend across channels
  • sentAt and receivedAt: the gap between the two can help in identifying network latencies that might be degrading user experience. This can help in performing preventive maintenance
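
The first and last points above can be illustrated with a minimal Python sketch. It assumes events shaped like the sample “page” event, held in memory as dictionaries; the function names (parse_ts, transit_delay_ms, stitch_identities) are hypothetical and not part of any particular tracker or SDK. Note also that sentAt comes from the client clock and receivedAt from the collector, so the computed gap includes any clock skew between the two and is only a rough indicator of network latency.

```python
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp such as 2019-02-23T22:28:55.387Z."""
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def transit_delay_ms(event: dict) -> float:
    """Milliseconds between the client sending and the collector receiving."""
    delta = parse_ts(event["receivedAt"]) - parse_ts(event["sentAt"])
    return delta / timedelta(milliseconds=1)


def stitch_identities(events: list) -> dict:
    """Group events per user, folding pre-login events into the known userId."""
    # First pass: learn anonymousId -> userId from events that carry both.
    known = {e["anonymousId"]: e["userId"] for e in events if e.get("userId")}
    # Second pass: attribute every event, in time order, to the resolved identity.
    history = {}
    for e in sorted(events, key=lambda e: parse_ts(e["timestamp"])):
        identity = known.get(e["anonymousId"], e["anonymousId"])
        history.setdefault(identity, []).append(e)
    return history


# Illustrative data only: one page view before login, one after.
events = [
    {"anonymousId": "507f191e810c19729de860ea", "type": "page", "name": "Home",
     "timestamp": "2019-02-23T22:28:55.111Z",
     "sentAt": "2019-02-23T22:28:55.111Z",
     "receivedAt": "2019-02-23T22:28:55.387Z"},
    {"anonymousId": "507f191e810c19729de860ea", "userId": "97980cfea0067",
     "type": "page", "name": "Account",
     "timestamp": "2019-02-23T22:30:10.000Z",
     "sentAt": "2019-02-23T22:30:10.000Z",
     "receivedAt": "2019-02-23T22:30:10.120Z"},
]

history = stitch_identities(events)
```

Here both events end up under the userId even though the first one was recorded before login, and transit_delay_ms(events[0]) gives the sentAt-to-receivedAt gap for the sample event.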

There is a plethora of analytics platforms that also provide JavaScript and/or mobile SDKs for capturing such clickstream data and sending it to the platform. However, this is not a one-size-fits-all domain, and the choice of platforms depends on the business problems the enterprise is trying to solve or the insights it is trying to gain.

Integrating multiple platforms brings its own set of technology implementation and project management challenges. Hence there is a growing trend among customers to gravitate towards Customer Data Infrastructure tools and platforms, which can provide them with a single gateway to all the various clickstream analytics platforms.

With that background about clickstream analytics now established – let us proceed to review how clickstream analytics can address the challenges mentioned at the beginning.


Low Conversion Rates

Various surveys and research studies have pegged conversion (a user session actually culminating in a purchase) at anywhere between 1.3% and 3%. This is despite the fact that any eCommerce site worth its salt today provides some manner of product recommendation to its customers. Typically there are specialized Recommender Engines and/or Market Basket Analysis systems that are responsible for such recommendations. However, both of these tools suffer from some inherent weaknesses.


Recommender Systems use either Collaborative Filtering or Content-based Filtering. The former essentially leverages the past purchase behavior of users to provide recommendations. It therefore considers a purchase only once it is completed and does not take into consideration the user's train of thought leading up to it. Moreover, an individual user's purchase propensity might not always be in sync with what other users in the same cluster/segment/category are purchasing. Content-based Filtering is further removed from user intent and attempts to make a recommendation based on the similarity of the product being viewed to other products. Market Basket Analysis depends on “also bought” purchase records. Again, it is post-purchase and does not take into account the product views leading up to the purchase.

Clickstream data can be effective in capturing the browsing pattern that can then be leveraged to predict page displays that will lead to purchase. 

  • By considering all URLs that share the same session id, together with the corresponding page access timestamps, the complete sequence of page views by a user can be reconstructed
  • This sequence will also include the “Add to Cart”, “Remove from Cart” and “Buy/Checkout” page accesses
  • (Optional) The above data can be further enriched using the Product ID and associated data from a Product Information Management (PIM) system, should the eCommerce site have one
  • The clickstream data, containing dimensions like base URL, type of page (search result / product / cart / purchase), page sequence, product id (if applicable) and PIM non-image attributes (a single field containing JSON), can be used as labeled inputs to a Deep Learning system
  • Recurrent Neural Networks (RNNs), including the Long Short-Term Memory (LSTM) variant, can be used to train a model that predicts the next page in a sequence that would culminate in a purchase
  • At runtime, every page request is run through the model, which predicts the next page the session should move to in order to culminate in a purchase
  • The prediction can be either a search results page or a product page
  • If the prediction is a search results page, then the top 5 products from that page can be shown as recommendations. If the prediction is a product page, then that product itself can be recommended
  • Through tuning, a threshold can be identified beyond which the page view sequence is considered to match one of the “happy paths”. At that point, instead of recommending the next product(s) in the sequence, the site can directly show the last product in that path, which is also the one that was finally bought
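
The first steps above, reconstructing sessions and turning them into labeled examples, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (session_id, timestamp, url), the "/checkout" marker for a purchase and both function names are hypothetical, and timestamps are assumed to be ISO-8601 strings in a single format and timezone so that they sort correctly as text. The sketch stops at producing (pages seen so far, next page) pairs; training an RNN/LSTM on them is not shown.

```python
from collections import defaultdict


def build_sessions(events):
    """Return {session_id: [url, ...]} with each session ordered by timestamp."""
    by_session = defaultdict(list)
    for e in events:
        by_session[e["session_id"]].append(e)
    return {
        sid: [e["url"] for e in sorted(evts, key=lambda e: e["timestamp"])]
        for sid, evts in by_session.items()
    }


def training_pairs(sessions, purchase_url="/checkout"):
    """Build (prefix, next_page) examples from sessions that end in a purchase."""
    pairs = []
    for pages in sessions.values():
        if purchase_url not in pages:
            continue  # only sessions that culminate in a purchase are used
        # Keep the path up to and including the first purchase page.
        path = pages[: pages.index(purchase_url) + 1]
        for i in range(1, len(path)):
            pairs.append((tuple(path[:i]), path[i]))
    return pairs


# Illustrative data only: s1 ends in a purchase, s2 does not.
events = [
    {"session_id": "s1", "timestamp": "2019-02-23T22:00:03Z", "url": "/cart"},
    {"session_id": "s1", "timestamp": "2019-02-23T22:00:01Z", "url": "/search?q=shoes"},
    {"session_id": "s1", "timestamp": "2019-02-23T22:00:02Z", "url": "/product/42"},
    {"session_id": "s1", "timestamp": "2019-02-23T22:00:04Z", "url": "/checkout"},
    {"session_id": "s2", "timestamp": "2019-02-23T22:05:00Z", "url": "/search?q=hats"},
]

sessions = build_sessions(events)
pairs = training_pairs(sessions)
```

Each pair says "given these pages so far, the next page on a purchasing path was X", which is the labeled form a sequence model would consume.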

Customer Experience and Customer Journey


Clickstream data can be combined with other site/application-specific data to enhance customer satisfaction. 

  • Link customer feedback surveys to browsing sequences and determine recurrent patterns for unsatisfied customers. Review navigation for such instances
  • Perform multivariate key driver analysis for CSAT. For the drivers where CSAT is low, use the clickstream to identify the predominant paths to the affected artifacts. Review such paths and optimize
  • Perform text analytics, especially Named Entity Recognition, on free-flowing customer input to identify artifacts similar to the ones derived in the above step. Leverage the clickstream data in a similar fashion as above
  • Even standalone records of clickstream data can help in performing product-wise or property-wise performance analysis, viz. identifying which products/pages/sites are attracting more user visits, where users are spending more time and so on

Digital Marketing Spend


Distribution of marketing budget should be driven by inferences drawn regarding end-customers’ propensity to respond to digital campaigns and banners. Clickstream data can help achieve that in the following ways:

  • Compare the overall traffic's browse-to-buy behaviour against that of banner traffic to determine effectiveness. This should be done stage-wise (e.g. Product View, Cart Addition, Checkout, Purchase), considering both successful transitions to the next stage in the conversion funnel and exits
  • Perform source profiling. Referrer information can be embedded in the clickstream data and used to determine what percentage of the traffic culminating on a key target is being directed from banners/campaigns
  • Path analysis can be performed to determine whether, and which, banners/campaigns contributed to faster conversions
  • Time-Series Analysis can be performed, leveraging the clickstream timestamps, to determine if campaigns/banners are more effective at certain points of time during the year/month/day. A different analysis using the same timestamps can yield insights regarding the time spent by users visiting the banners/campaigns vis-a-vis overall trends
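
The stage-wise comparison in the first point can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes each session has already been summarized into the set of funnel stages it reached plus a traffic source, and the field names (source, stages), the stage labels and the function name are all hypothetical. It also assumes that a session reaching a later stage has passed through the earlier ones.

```python
STAGES = ["product_view", "cart_addition", "checkout", "purchase"]


def stage_conversion(sessions):
    """Share of sessions at each funnel stage that proceed to the next stage."""
    counts = [sum(1 for s in sessions if stage in s["stages"]) for stage in STAGES]
    rates = {}
    for i in range(len(STAGES) - 1):
        reached, advanced = counts[i], counts[i + 1]
        # The exit rate for the stage is simply 1 minus this value.
        rates[f"{STAGES[i]}->{STAGES[i + 1]}"] = advanced / reached if reached else 0.0
    return rates


# Illustrative data only: two banner sessions and two organic sessions.
sessions = [
    {"source": "banner", "stages": {"product_view", "cart_addition", "checkout", "purchase"}},
    {"source": "banner", "stages": {"product_view", "cart_addition"}},
    {"source": "organic", "stages": {"product_view"}},
    {"source": "organic", "stages": {"product_view", "cart_addition", "checkout"}},
]

overall = stage_conversion(sessions)
banner = stage_conversion([s for s in sessions if s["source"] == "banner"])
```

Comparing overall and banner key by key shows at which stage banner traffic converts better or worse than the site as a whole.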

Up-sell and Cross-Sell

Clustering algorithms can be applied to the clickstream data, using different dimensions as the features, to gain different insights that facilitate cross-sell and up-sell.

  • Prepare clusters of clickstream sequences that involve browsing of products in one category and/or price-band but end in a purchase in a different category and/or price-band. At runtime, if a sequence is categorized as a member of a particular cluster, then recommend the product(s) in the different category and/or price-band
  • Correlate demographic data with clickstream-based behavioral profiles for identified users. If a user is found to belong to a particular demographic cluster but is not a member of one or more of the behavioral clusters, then suitable cues can be provided during the next site/application visit to guide the user to products and/or pages that are captured by the clicks in the behavioral cluster
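
The first idea can be illustrated with a deliberately simplified Python sketch. Instead of a real clustering algorithm, it groups past sessions by their most-browsed category and counts which different category was eventually purchased; this frequency grouping is a stand-in chosen for brevity, not the clustering approach itself. The field names (viewed_categories, purchased_category) and function names are hypothetical.

```python
from collections import Counter, defaultdict


def dominant_category(viewed):
    """Most frequently browsed category in a session."""
    return Counter(viewed).most_common(1)[0][0]


def cross_category_groups(sessions):
    """Map browsed category -> Counter of *different* categories purchased."""
    groups = defaultdict(Counter)
    for s in sessions:
        bought = s["purchased_category"]
        if bought is None or not s["viewed_categories"]:
            continue  # no purchase or no browsing to learn from
        browsed = dominant_category(s["viewed_categories"])
        if bought != browsed:
            groups[browsed][bought] += 1
    return groups


def recommend_category(groups, viewed_so_far):
    """Suggest the category most often bought after similar browsing, if any."""
    if not viewed_so_far:
        return None
    candidates = groups.get(dominant_category(viewed_so_far))
    return candidates.most_common(1)[0][0] if candidates else None


# Illustrative data only.
past_sessions = [
    {"viewed_categories": ["laptops", "laptops", "tablets"], "purchased_category": "tablets"},
    {"viewed_categories": ["laptops", "laptops"], "purchased_category": "tablets"},
    {"viewed_categories": ["laptops"], "purchased_category": "laptops"},
    {"viewed_categories": ["laptops", "monitors", "laptops"], "purchased_category": "monitors"},
    {"viewed_categories": ["phones"], "purchased_category": None},
]

groups = cross_category_groups(past_sessions)
suggestion = recommend_category(groups, ["laptops", "laptops"])
```

A production system would replace the grouping with clusters learned over richer features such as price-band and page sequence, but the runtime step stays the same: assign the live session to a group and recommend from the category that group tends to buy.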

Conclusion: The Role of CDI in Delivering the Power of Clickstream

As seen in the previous sections, Clickstream Analytics is a must-have tool for eCommerce companies in their quest to deliver greater customer satisfaction and enhanced shareholder value through increased sales. While tools abound that empower enterprises to mine their clickstream data to its fullest potential, the challenge of integrating such tools often holds companies back from wholeheartedly embracing the power of clickstreams. This is where Customer Data Infrastructure (CDI) tools and platforms can be a key differentiator, helping eCommerce companies realize their Clickstream Analytics goals without burdening themselves with myriad integrations at their end.