Building an Open Source Mixpanel Alternative. Part 2: Conversion Funnels

This is the second part of a tutorial series on building an analytical web application with Cube.js (https://github.com/statsbotco/cube.js). You can find the first part here: https://statsbot.co/blog/building-an-open-source-mixpanel-alternative-1/. It expects the reader to be familiar with JavaScript, Node.js, React, and to have basic knowledge of SQL. The final source code is available on GitHub (https://github.com/statsbotco/cube.js/tree/master/examples/event-analytics) and the live demo is at https://d1ygcqhosay4lt.cloudfront.net/. The example app is serverless and runs on AWS Lambda. It displays data about its own usage.

In this part, we are going to add Funnel Analysis to our application. Funnel Analysis, alongside Retention Analysis, is vital for analyzing behavior across the customer journey. A funnel is a series of events that a user goes through within the app, such as completing an onboarding flow. A user is considered converted through a step in the funnel if she performs the events in the specified order. Counting how many unique users performed each event gives you the conversion rate between steps, which helps you localize a problem to a specific stage.
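
To make the arithmetic concrete, here is a minimal sketch with hypothetical numbers: dividing the unique users at each step by those at the previous step gives the step-to-step conversion rate.

// A minimal sketch of funnel conversion math (hypothetical numbers)
const steps = [
  { name: "View Any Page", uniqueUsers: 1000 },
  { name: "View Funnels Page", uniqueUsers: 400 },
  { name: "Funnel Selected", uniqueUsers: 100 }
];

steps.forEach((step, i) => {
  // The first step converts against itself, i.e. 100%
  const prev = i === 0 ? step.uniqueUsers : steps[i - 1].uniqueUsers;
  const rate = ((step.uniqueUsers / prev) * 100).toFixed(1);
  console.log(`${step.name}: ${step.uniqueUsers} users (${rate}% of previous step)`);
});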

Since our application tracks its own usage, we'll build funnels to show how well users navigate through the Funnels feature itself. Quite meta, right?

Here’s how it looks. You can check the live demo here.

Building SQL for Funnels


Just a quick recap of part 1: we are collecting data with the Snowplow tracker, storing it in S3, and querying it with Athena and Cube.js. Athena is built on Presto and supports standard SQL, so to build a funnel we need to write SQL. Real-world funnel SQL can be quite complex and slow from a performance perspective. Since we are using Cube.js to organize the data schema and generate SQL, we can solve both of these problems.

Cube.js allows building packages, which are collections of reusable data schemas. Some of them are specific to datasets, such as the Stripe Package. Others provide helpful macros for common data transformations. One of them is the one we're going to use here: the Funnels package.

If you are new to the Cube.js Data Schema, I strongly recommend checking out one of the data schema tutorials first and then coming back to learn about the Funnels package.

The best way to organize funnels is to create a separate cube for each funnel. We'll use eventFunnel from the Funnels package. All we need to do is to pass an object with the required properties to the eventFunnel function. Check the Funnels package documentation for detailed information about its configuration.

Here is how this config could look. In production applications, you're most likely going to generate the Cube.js schema dynamically. You can read more about how to do that here.

import Funnels from "Funnels";
import { eventsSQl, PAGE_VIEW_EVENT, CUSTOM_EVENT } from "./Events.js";

cube("FunnelsUsageFunnel", {
  extends: Funnels.eventFunnel({
    userId: {
      sql: `user_id`
    },
    time: {
      sql: `time`
    },
    steps: [
      {
        name: `viewAnyPage`,
        eventsView: {
          sql: `select * from (${eventsSQl}) WHERE event = '${PAGE_VIEW_EVENT}'`
        }
      },
      {
        name: `viewFunnelsPage`,
        eventsView: {
          sql: `select * from (${eventsSQl}) WHERE event = '${PAGE_VIEW_EVENT}' AND page_title = 'Funnels'`
        },
        timeToConvert: "30 day"
      },
      {
        name: `funnelSelected`,
        eventsView: {
          sql: `select * from (${eventsSQl}) WHERE event = '${CUSTOM_EVENT}' AND se_category = 'Funnels' AND se_action = 'Funnel Selected'`
        },
        timeToConvert: "30 day"
      }
    ]
  })
});

The above 3-step funnel describes the user flow from viewing any page, such as the home page, to going to the Funnels page, and eventually selecting a funnel from the dropdown. We're setting timeToConvert to 30 days for the 2nd and 3rd steps, which means a user has a 30-day window to complete the target action and make it into the funnel.

In our example app, we generate these configs dynamically. You can check the code on GitHub here.
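
The idea, roughly, is to loop over funnel definitions and declare a cube for each. Below is a simplified sketch; the funnels array and its fields are hypothetical, so check the linked source for the real implementation.

import Funnels from "Funnels";
import { eventsSQl } from "./Events.js";

// Hypothetical list of funnel definitions; the real app builds it
// from the funnels configured in the UI
const funnels = [
  { id: "FunnelsUsageFunnel", steps: [/* step definitions as above */] }
];

// Declare one cube per funnel definition
funnels.forEach(funnel => {
  cube(funnel.id, {
    extends: Funnels.eventFunnel({
      userId: { sql: `user_id` },
      time: { sql: `time` },
      steps: funnel.steps
    })
  });
});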

Materialize Funnels SQL with Pre-Aggregations


As I mentioned before, Cube.js has a built-in way to accelerate query performance: it can materialize query results in a table, keep them up to date, and query them instead of the raw data. Pre-aggregations can be quite complex, including multi-stage setups and dependency management. But for our case, the simplest originalSql pre-aggregation should be enough. It materializes the base SQL for the cube.

Learn more about pre-aggregations here.

import Funnels from 'Funnels';
import { eventsSQl, PAGE_VIEW_EVENT, CUSTOM_EVENT } from './Events.js';

cube('FunnelsUsageFunnel', {
  extends: Funnels.eventFunnel({ ... }),

  preAggregations: {
    main: {
      type: `originalSql`
    }
  }
});

Visualize


There are a lot of ways to visualize a funnel. Cube.js is visualization-agnostic, so pick one that works for you and fits well into your app's design. In our example app, we use a bar chart from the Recharts library.

The Funnels package generates a cube with conversions and conversionsPercent measures, and step and time dimensions. To build a bar chart funnel, we need to query the conversions measure grouped by the step dimension. The time dimension should be used in a filter to let users select a specific date range for the funnel.

Here is the code (we are using React and the Cube.js React Client):

import React from "react";
import cubejs from "@cubejs-client/core";
import { QueryRenderer } from "@cubejs-client/react";
import { BarChart, Bar, XAxis, YAxis, CartesianGrid, Tooltip } from "recharts";

const cubejsApi = cubejs(
  "YOUR-API-KEY",
  { apiUrl: "http://localhost:4000/cubejs-api/v1" }
);

const Funnel = ({ dateRange, funnelId }) => (
  <QueryRenderer
    cubejsApi={cubejsApi}
    query={{
      measures: [`${funnelId}.conversions`],
      dimensions: [`${funnelId}.step`],
      filters: [
        {
          dimension: `${funnelId}.time`,
          operator: `inDateRange`,
          values: dateRange
        }
      ]
    }}
    render={({ resultSet, error }) => {
      if (resultSet) {
        return (
          <BarChart
            width={600}
            height={300}
            margin={{ top: 20 }}
            data={resultSet.chartPivot()}
          >
            <CartesianGrid strokeDasharray="3 3" />
            <XAxis dataKey="x" minTickGap={20} />
            <YAxis />
            <Tooltip />
            <Bar dataKey={`${funnelId}.conversions`} fill="#7A77FF" />
          </BarChart>
        );
      }

      return "Loading...";
    }}
  />
);

export default Funnel;

If you run this code in CodeSandbox, you should see something like this.

The above example is connected to the Cube.js backend from our event analytics app.
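
For reference, resultSet.chartPivot() returns an array of plain objects, one per funnel step, which is what Recharts consumes. For this query it looks roughly like the following (the numbers are hypothetical):

// Approximate shape of resultSet.chartPivot() for this query
// (hypothetical numbers): `x` holds the step name, and the measure
// key holds the conversions count for that step
[
  { x: "View Any Page", "FunnelsUsageFunnel.conversions": 1000 },
  { x: "View Funnels Page", "FunnelsUsageFunnel.conversions": 400 },
  { x: "Funnel Selected", "FunnelsUsageFunnel.conversions": 100 }
]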

In the next part, we’ll walk through how to build a dashboard and a dynamic query builder, like the ones in Mixpanel or Amplitude. Part 4 will cover Retention Analysis. In the final part, we will discuss how to deploy the whole application in serverless mode to AWS Lambda.

You can check out the full source code of the application here.

And the live demo is available here.

Top 7 Most Popular Node.js Frameworks You Should Know

Node.js is an open-source, cross-platform runtime environment that allows developers to run JavaScript outside of a browser.

One of the main advantages of Node is that it enables developers to use JavaScript on both the front-end and the back-end of an application. This not only makes the source code of any app cleaner and more consistent, but it significantly speeds up app development too, as developers only need to use one language.

Node is fast, scalable, and easy to get started with. Its default package manager is npm, which means it also sports the largest ecosystem of open-source libraries. Node is used by companies such as NASA, Uber, Netflix, and Walmart.

But Node doesn't come alone. It comes with a plethora of frameworks. A Node framework can be pictured as the external scaffolding that you can build your app in. These frameworks are built on top of Node and extend the technology's functionality, mostly by making apps easier to prototype and develop, while also making them faster and more scalable.

Below are 7 of the most popular Node frameworks at this point in time (ranked from high to low by GitHub stars).

Express

With over 43,000 GitHub stars, Express is the most popular Node framework. It brands itself as a fast, unopinionated, and minimalist framework. Express acts as middleware: it helps set up and configure routes to send and receive requests between the front-end and the database of an app.

Express provides lightweight, powerful tools for HTTP servers. It's a great framework for single-page apps, websites, hybrids, or public HTTP APIs. It supports over fourteen different template engines, so developers aren't forced into any specific one.
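
To give a flavor of the API, here is a minimal sketch of an Express app with one middleware and one route (assuming express is installed locally):

const express = require('express');
const app = express();

// Middleware: log every incoming request, then pass control on
app.use((req, res, next) => {
  console.log(`${req.method} ${req.url}`);
  next();
});

// Route: respond to GET requests on the root path
app.get('/', (req, res) => {
  res.send('Hello from Express!');
});

app.listen(3000, () => console.log('Listening on port 3000'));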

Meteor

Meteor is a full-stack JavaScript platform. It allows developers to build real-time web apps, i.e. apps where code changes are pushed to all browsers and devices in real-time. Additionally, servers send data over the wire, instead of HTML. The client renders the data.

The project has over 41,000 GitHub stars and is built to power large projects. Meteor is used by companies such as Mazda, Honeywell, Qualcomm, and IKEA. It has excellent documentation and a strong community behind it.

Koa

Koa is built by the same team that built Express. It uses modern JavaScript features, such as async functions, that allow developers to work without callbacks. Developers also have more control over error handling. Koa has no middleware within its core, which means developers have more control over configuration, but also that traditional Node middleware (based on the req, res, next signature) won't work with Koa.

Koa already has over 26,000 GitHub stars. The Express developers built Koa because they wanted a lighter framework that was more expressive and more robust than Express. You can find out more about the differences between Koa and Express here.
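
For comparison, here is a minimal sketch of the same idea in Koa; note the async middleware and the absence of callbacks (assuming koa is installed locally):

const Koa = require('koa');
const app = new Koa();

// Async middleware: measure response time, no callbacks involved
app.use(async (ctx, next) => {
  const start = Date.now();
  await next();
  console.log(`${ctx.method} ${ctx.url} - ${Date.now() - start}ms`);
});

// Final middleware: send the response body
app.use(async ctx => {
  ctx.body = 'Hello from Koa!';
});

app.listen(3000);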

Sails

Sails is a real-time, MVC framework for Node that's built on Express. It supports auto-generated REST APIs and comes with an easy WebSocket integration.

The project has over 20,000 stars on GitHub and is compatible with almost all databases (MySQL, MongoDB, PostgreSQL, Redis). It's also compatible with most front-end technologies (Angular, iOS, Android, React, and even Windows Phone).

Nest

Nest has over 15,000 GitHub stars. It uses progressive JavaScript and is built with TypeScript, which means it comes with strong typing. It combines elements of object-oriented programming, functional programming, and functional reactive programming.

Nest is packaged in such a way that it serves as a complete development kit for writing enterprise-level apps. The framework uses Express, but is compatible with a wide range of other libraries.

LoopBack

LoopBack is a framework that allows developers to quickly create REST APIs. It has an easy-to-use CLI wizard and allows developers to create models either based on an existing schema or dynamically. It also has a built-in API explorer.

LoopBack has over 12,000 GitHub stars and is used by companies such as GoDaddy, Symantec, and Bank of America. It's compatible with many REST services and a wide variety of databases (MongoDB, Oracle, MySQL, PostgreSQL).

Hapi

Similar to Express, hapi serves data by intermediating between the server side and the client side. As such, it can serve as a substitute for Express. Hapi allows developers to focus on writing reusable app logic in a modular and prescriptive fashion.

The project has over 11,000 GitHub stars. It has built-in support for input validation, caching, authentication, and more. Hapi was originally developed to handle all of Walmart's mobile traffic during Black Friday.
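
Here is a minimal sketch of a hapi server (assuming hapi v17+, published as @hapi/hapi):

const Hapi = require('@hapi/hapi');

const init = async () => {
  const server = Hapi.server({ port: 3000, host: 'localhost' });

  // Routes are declared as configuration objects
  server.route({
    method: 'GET',
    path: '/',
    handler: () => 'Hello from hapi!'
  });

  await server.start();
  console.log('Server running at ' + server.info.uri);
};

init();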

20. Node.js Lessons. Data Streams in Node.js, fs.ReadStream

Hey all! Our topic for today is data streams in Node.js. We will try to study this subject in detail because, on the one hand, streams have no counterpart in everyday browser-side JavaScript development, and on the other hand, knowing and understanding stream principles is necessary for seamless server-side development, since a stream is a versatile, universally used way of working with data sources.

We can define two general stream types. The first one is

stream.Readable

It is a built-in class providing streams for reading. This type itself is generally never used directly, while its descendants are quite popular: in particular, we use fs.ReadStream to read from a file. To read a visitor's request and handle it, there is a special object familiar to us under the name req, the first argument of a request handler.
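
As a quick sketch of that idea, a request body can be consumed with the same readable/read() pattern we'll use for files below (assuming a plain http server):

var http = require('http');

// req is a readable stream, so the request body can be read
// with the same pattern as any other stream
http.createServer(function(req, res) {
  var body = '';
  req.setEncoding('utf-8');

  req.on('readable', function() {
    var chunk;
    while ((chunk = req.read()) !== null) {
      body += chunk;
    }
  });

  req.on('end', function() {
    res.end('Received ' + body.length + ' characters');
  });
}).listen(3000);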

stream.Writable

It is the universal interface for writing. stream.Writable itself is rarely used, but its descendants, fs.WriteStream and res, are quite common.

There are some other stream types, but the most popular are these two and their variations.
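
Before diving into reading, here is a quick sketch of the writing side with fs.WriteStream; the file name here is arbitrary.

var fs = require('fs');

// fs.WriteStream inherits from stream.Writable
var stream = new fs.WriteStream('output.txt');

stream.write('first line\n');
stream.write('second line\n');

// end() writes an optional last chunk and closes the stream
stream.end('last line\n');

stream.on('finish', function() {
  console.log('All data has been flushed to output.txt');
});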

The best way to understand streams is to see how they work in practice. So, right now we'll start by using fs.ReadStream to read a file. Let us create a file fs.js:

var fs = require('fs');
 
// fs.ReadStream inherits from stream.Readable
var stream = new fs.ReadStream(__filename);
 
stream.on('readable', function() {
    var data = stream.read();
    console.log(data);
});
 
stream.on('end', function() {
    console.log("THE END");
});

So, we require the fs module and create a stream:

var fs = require('fs');
 
var stream = new fs.ReadStream(__filename);

A stream is a JavaScript object that receives information about a resource (in our case, the path to the file, __filename) and knows how to work with that resource. fs.ReadStream implements the standard reading interface described in the stream.Readable class. Let us have a detailed look.

When a stream object new stream.Readable is created, it gets connected to the data source, which in our case is the file, and tries to start reading from it. Once it has read something, it emits the readable event. This event means that the data has been read and is contained within an internal stream buffer, from which it can be retrieved with a read() call. Then we can do something with the data and wait for the next readable. This cycle repeats.

When the data source is exhausted (certain sources never get empty, for example a random data generator, but a file has a limited size), the stream emits the end event at the very end, meaning there will be no more data. Moreover, we can call the destroy() method at any point while working with the stream. This call means we no longer need the stream and it can be closed, along with the underlying data source, and everything cleaned up.

So, let us return to the original code. Here we create a ReadStream, and it immediately wants to open the file:

var stream = new fs.ReadStream(__filename);

but in our case the opening doesn't necessarily happen on this same line, because any input/output operation is performed through libUV. And libUV is structured so that all asynchronous I/O handlers fire on the next event loop iteration, once the current JavaScript code has finished its work. This means we can safely attach all handlers, knowing that they will be installed before the first data fragment gets read. Launch fs.js.

Look at what has appeared in the console. The first event was readable, and it output the data. Right now it is an ordinary buffer, but we can convert it to a string by specifying the encoding directly when opening the stream.

var stream = new fs.ReadStream(__filename, {encoding: 'utf-8'});

Thus, the conversion will be automatic. When the file ends, the end event outputs THE END in the console. Here the file ended almost immediately because it is small. Let us modify our example a little bit by making a file big.html out of the current file in the current directory. Download this HTML file from our repository together with the other lesson materials.

Launch it. The file big.html is big, so the readable event fired several times, and each time we received another data fragment as a buffer. So, let us calculate its length:

var fs = require('fs');
 
// fs.ReadStream inherits from stream.Readable
var stream = new fs.ReadStream("big.html");
 
stream.on('readable', function() {
    var data = stream.read();
    if (data){
        console.log(data.length);
    }
    else {
        console.log('data is null')
    }
});
 
stream.on('end', function() {
    console.log("THE END");
});

Launch it. These numbers are the lengths of the file fragments that were read. When a stream opens a file, it reads only a part of it, not the whole file, and stores that part in an internal buffer. The maximum size is 64 KB. Until we call stream.read(), it won't read further. Once we've retrieved the data, the internal buffer is cleared and the stream is ready to read the next fragment, and so on. The last fragment's length is 60,959 bytes. This example vividly demonstrates a key advantage of streams: they save memory. However big the file, we only handle a small part of it at any moment. The second, less obvious advantage is the versatility of the interface. Here we use the ReadStream stream over a file, but we can replace it at any time with any stream for our resource:

var stream = new OurStream("our resource");

The rest of the code won't need any changes, because streams are, first of all, an interface. So if our stream implements all the needed events and methods, in particular if it inherits from stream.Readable, everything should work. Of course, that is only true as long as we do not use any special abilities that only file streams have.
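
By the way, the 64 KB fragment size mentioned above is just the default size of the internal buffer (the highWaterMark option) and can be changed when opening the stream; a quick sketch:

var fs = require('fs');

// Same reading loop as before, but with a smaller internal buffer
var stream = new fs.ReadStream('big.html', { highWaterMark: 16 * 1024 });

stream.on('readable', function() {
  var data = stream.read();
  if (data) {
    console.log(data.length); // now at most 16384 bytes per fragment
  }
});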

To be more specific, the fs.ReadStream stream has extra events compared to stream.Readable: open and close. The first one fires on the file opening, while the last one fires on its closure. Focus your attention on the fact that if a file is read to its end, the end event occurs, followed by close. And if a file is not entirely read, for instance because of an error or a destroy() call, there will be no end event, because the file hasn't been read to the end. But the close event is always guaranteed upon a file closure.
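
A quick sketch to see these events in action; note that destroy() after the first fragment means end never fires, while close still does (assuming big.html from above):

var fs = require('fs');

var stream = new fs.ReadStream('big.html');

// open fires with the file descriptor once the file is opened
stream.on('open', function(fd) {
  console.log('file opened, descriptor: ' + fd);
});

stream.on('readable', function() {
  stream.read();
  stream.destroy(); // stop after the first fragment: end will never fire
});

stream.on('end', function() {
  console.log('THE END'); // not reached because of destroy()
});

stream.on('close', function() {
  console.log('file closed'); // always fires on closure
});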

Finally, our last but not least detail here is error handling. Let us see what happens if the file does not exist.

var stream = new fs.ReadStream("noFile.html");

Let's launch it. Oops, it crashed! Pay attention to the fact that streams inherit from EventEmitter: if an error occurs and does not have any handler, the whole Node.js process fails. That's why, if we do not want our Node.js process to fail because of such an exception, we should install a handler:

var fs = require('fs');
 
// fs.ReadStream inherits from stream.Readable
var stream = new fs.ReadStream("noFile.html");
 
stream.on('readable', function() {
    var data = stream.read();
    if (data){
        console.log(data.length);
    }
    else {
        console.log('data is null')
    }
});
 
stream.on('error', function(err) {
    if (err.code == 'ENOENT') {
        console.log("File not Found");
    } else {
        console.error(err);
    }
});

So, we use streams to work with data sources in Node.js. Here we've analyzed the basic scheme by which they work, and a particular example, fs.ReadStream, that can read from a file.

This lesson's code can be found in our repository.