AppsFlyer is a commercial SaaS attribution platform. Its clients, some of the largest mobile app companies in the world, send a large number of events every day: the installs, uninstalls, sessions, in-app events, clicks, and impressions generated by their user bases.
In this article, I will discuss a system AppsFlyer built to quickly find the approximate sizes of sets of unique users (each represented by a non-PII user ID), segmented by any combination of criteria over the various dimensions of these events. This system (later referred to as “Audiences”) is used by AppsFlyer’s user segmentation product to give its users interactive feedback while they define criteria in the UI. Every action in the UI queries this system for the approximate number of unique users who meet the criteria, allowing users to fine-tune their criteria until they reach a number that they are happy with.
As a brief example, advertisers of an e-commerce application might want to know how many of their unique users installed the app in the last month and purchased products A and B, but did not purchase product C; or how many unique users in the US added more than X products to their shopping cart in the past week but never checked out.
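To make the first query concrete, here is a minimal sketch of the set algebra behind it. It uses exact Python sets purely for illustration; the system described in this article answers such questions approximately rather than by materializing full sets of IDs. The user IDs, the set names, and the `audience_size` helper are all hypothetical and do not come from AppsFlyer's code.

```python
# Hypothetical per-criterion sets of non-PII user IDs (illustrative data only).
installed_last_month = {"u1", "u2", "u3", "u4", "u5"}
purchased_a = {"u1", "u2", "u3", "u6"}
purchased_b = {"u2", "u3", "u4", "u6"}
purchased_c = {"u3", "u7"}


def audience_size(include, exclude=()):
    """Count users present in every `include` set and in none of the `exclude` sets."""
    if not include:
        return 0
    # Users who satisfy all of the positive criteria.
    matched = set.intersection(*include)
    # Drop users who satisfy any of the negative criteria.
    for excluded in exclude:
        matched = matched - excluded
    return len(matched)


# "Installed in the last month, purchased A and B, but did NOT purchase C."
size = audience_size(
    include=[installed_last_month, purchased_a, purchased_b],
    exclude=[purchased_c],
)
print(size)
```

Each criterion in the UI maps to one set, "and" maps to intersection, and "did not" maps to set difference. Exact sets like these grow with the number of users, which is what makes an approximate representation attractive at scale.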
One of the challenges faced was that the events that reach AppsFlyer are schemaless: AppsFlyer clients are free to send any number of dimensions (e.g., “product_name” or “level_completed_num”) as part of the payload of their events. This leads to a very high number of different dimensions that the multi-tenant system needs to make sense of.
This article will discuss how this system was designed and engineered to provide this approximation, with the following considerations in mind:
The core technologies used to build this system are Theta Sketches and HBase, both of which will be discussed with an overview of how they fit into the system’s architecture, and why they fit the specific problem at hand.