Today, every company has to become data-driven. Enterprises need to know their customers inside out to provide the personalised experience that every customer expects.

For any enterprise, the key to knowing its customers is collecting every customer interaction, whether on the website, in the mobile app, or through indirect channels such as customer support calls, and then analysing and acting on that data with best-of-breed tools. A typical enterprise ends up using 5-10 such tools: analytics products like Google Analytics, Amplitude and Mixpanel; marketing and advertising platforms like Google AdWords and Facebook; attribution tools like AppsFlyer and Adjust; and CRMs like Salesforce and SugarCRM. The list goes on.

Unfortunately, managing all these integrations becomes a herculean task. Not only does the team have to integrate all these disparate JavaScript snippets and SDKs into the website and mobile app, but each destination also has its own event semantics and API that engineers need to understand and implement. Bugs are hard to fix, particularly on mobile, where getting an update to all customers can take anywhere from a few weeks to months. For the same reason, adding a new integration is cumbersome. As a result, most large organisations struggle just to collect and route customer data, let alone make sense of it.

Welcome to the world of customer data infrastructure. Companies like Segment and mParticle have solved this problem by building a single platform to collect and route all customer data. With these products, customers integrate just one SDK to collect and send customer data, and the platform in turn forwards that data to tens or even hundreds of destinations.

These are great tools for the problems of a medium-sized business, but they lack the extensibility an enterprise needs. Furthermore, in a post-GDPR world, enterprises increasingly want tighter oversight and control over customer data touching their systems. This means more on-premise or VPC deployments, with enterprise IT having complete insight into the mechanics of any system.

At Rudder, we are building the enterprise customer data platform. Because it is built with enterprises in mind, privacy and security are built in. Given the complex nature of enterprise workflows, the ability to customise and extend the platform to fit them becomes critical. Finally, we believe in open source and will continue to support the ecosystem by building something that engineers can love and use.

Rudder is built on some key principles:

  • Privacy/Security – Existing solutions require sending all event data to the vendor's cloud, which then forwards it to destinations. While this reduces the operational burden, it becomes a problem when the event data is sensitive. In this era of privacy regulations like GDPR/CCPA and privacy-conscious consumers, sensitive data no longer means just healthcare or finance data. Owning your customer data is not just good practice, it is an important step in mitigating enterprise risk.

While a subset of the event data ends up with third-party systems anyway, that subset often does not contain sensitive personal information. For example, a cloud phone company's event stream may include metadata about each phone call (including the to/from numbers), while the data sent to Google Analytics may carry only the anonymised user id. Sending the entire event stream through a cloud service exposes the sensitive data to a third party.
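As a rough illustration of the phone-call example above, the sketch below keeps only an allow-listed subset of fields for a destination and replaces the internal user id with a salted hash. It is a minimal sketch, not how any particular product works: the field names, the `DESTINATION_FIELDS` table, the salt and the helper functions are all hypothetical, and a salted hash is only one possible way to derive an anonymised id.

```python
import hashlib

# Hypothetical per-destination allow-lists: only these fields leave the network.
DESTINATION_FIELDS = {
    "google_analytics": {"event", "anonymous_id", "duration_sec"},
}


def anonymise(user_id: str, salt: str) -> str:
    """Derive a stable, shortened id from the internal user id (assumed scheme)."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:16]


def project_for_destination(event: dict, destination: str, salt: str) -> dict:
    """Return a copy of the event that contains only the allow-listed fields."""
    with_anon_id = dict(event, anonymous_id=anonymise(event["user_id"], salt))
    allowed = DESTINATION_FIELDS[destination]
    return {key: value for key, value in with_anon_id.items() if key in allowed}


# Full internal event, including sensitive call metadata.
call_event = {
    "event": "call_completed",
    "user_id": "user-1234",
    "from_number": "+14155550100",
    "to_number": "+14155550199",
    "duration_sec": 312,
}

# What would actually be forwarded: no phone numbers, no internal user id.
outbound = project_for_destination(call_event, "google_analytics", salt="example-salt")
```

The full `call_event` never has to leave the enterprise's own infrastructure; only `outbound` is handed to the third party.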

Security is another important aspect of owning customer data. Hardly a day goes by without news of a data breach. Enterprises need to take ownership of their critical data.

  • Extensibility and customisability – Sophisticated enterprises often want to do more “interesting” things with customer data than just forward it to a bunch of destinations. Examples of interesting processing we have seen are:
    1. Store raw events or event stats in an internal database or data warehouse.
    2. Enhance the event data with other metadata from internal transactional systems before forwarding it to destinations.
    3. Filter events by user to manage end-destination cost. For example, you may want to send all events of a paid user to Google Analytics, but only a fraction of the events of your free users.
    4. Apply a sensitive-data leak policy to the event stream before it is sent to destinations.

A third-party, cloud-hosted system makes such processing hard, particularly when it requires joining the event data with internal database systems behind a firewall. Running inside its own cloud infrastructure gives an enterprise the control it needs for this kind of processing. Open source provides the customisability.
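The kinds of processing listed above can be sketched as a small pipeline that runs next to internal systems. This is an illustrative sketch only, not Rudder's actual transformation API: the `ACCOUNTS` lookup stands in for an internal transactional database, the event fields (`message_id`, `user_id`, `note`) are hypothetical, and the 10% sampling rate and the email-only redaction rule are assumptions chosen for the example.

```python
import hashlib
import re

# Hypothetical stand-in for a lookup against an internal transactional database.
ACCOUNTS = {"u1": {"plan": "paid"}, "u2": {"plan": "free"}}

FREE_SAMPLE_PERCENT = 10  # assumed: forward roughly 10% of free users' events
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # deliberately simple pattern


def enrich(event: dict) -> dict:
    """Attach the user's plan from the internal account data."""
    account = ACCOUNTS.get(event["user_id"], {})
    return {**event, "plan": account.get("plan", "unknown")}


def keep(event: dict) -> bool:
    """Keep every paid event; keep a deterministic sample of all other events."""
    if event["plan"] == "paid":
        return True
    digest = hashlib.sha256(event["message_id"].encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < FREE_SAMPLE_PERCENT


def redact(event: dict) -> dict:
    """Mask anything that looks like an email address in string fields."""
    return {
        key: EMAIL_RE.sub("[redacted]", value) if isinstance(value, str) else value
        for key, value in event.items()
    }


def process(events: list) -> list:
    """Enrich, filter, then redact events before they go to a destination."""
    forwarded = []
    for event in events:
        enriched = enrich(event)
        if keep(enriched):
            forwarded.append(redact(enriched))
    return forwarded
```

Hashing the message id rather than calling a random number generator makes the sampling decision repeatable, so a retried event is treated the same way twice. The enrichment step is the part that is hard to do in a vendor's cloud, because it needs access to data behind the firewall.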

  • Pricing is broken – These commercial systems are often so “exorbitantly” priced that they are out of reach for many consumer apps and websites (like games), which often have a lot of users. Also, event-based or user-based pricing fundamentally goes against the philosophy of collecting as much customer data as possible. We have seen customers resort to odd tricks (like batching or sampling events) before sending data to these vendors, just to reduce their event footprint. At Rudder, we believe the true value of customer data can be unlocked only when every interaction is captured.