Redash Dashboard Tutorial with Cube.js

Redash Dashboard Tutorial with Cube.js

How to build an analytics dashboard with Redash and Cube.js

We recently covered how to build an analytics dashboard using react, vue, or angular. But if you don't want to code and host your own dashboard, you have some open-source alternatives. Today I'll show you how to build an analytics dashboard using Cube.js Backend and Redash. You can find the resulting dashboard live here.

Why Cube.js?

You can always connect Redash directly to your database. You can then write queries in pure SQL and build the same dashboard. It's easy to manage if you have one dashboard and just a couple of queries. But usually, a small dashboard is just a first step in building a complex analytics solution. As your business grows, your volume of data also grows. And soon, you may find a hundred dashboards with thousands of queries used by multiple departments. Now imagine you changed a column in the database. All of the queries should be checked and updated.

Another problem may be the data volume. Say you have a large dashboard powered by some complex queries that take some time to execute. Once you change a filter value on that dashboard, all queries will get executed on the database. In a simple case, it may take a long time to update the dashboard, but if you're using a per-query billed database like Google Bigquery or AWS Athena, this will also hit your monthly bill.

Cube.js solves both of these problems. The queries used in Redash may be updated independently, so updating a column name or adding a join condition is a simple update in the data schema file. Also, Cube.js Backend manages pre-aggregations, which helps with loading lots of data fast and efficiently. Cube.js pre-aggregates data for huge queries in separate tables and updates them in the background. This is efficient in terms of performance and cost. Here you can find a guide on Cube.js Data Schema to understand core concepts better.

Setting up Cube.js Backend

To launch a Cube.js Backend, you'll need to provide a database where your analytics data is stored and a hosting option. In this demo, we'll use Postgresql database with our demo dataset. If you don't have access to your analytics database right now, you can use this dataset:

curl > ecom-dump.sql
createdb ecom
psql --dbname ecom -f ecom-dump.sql

Now when you have your database running, we have to choose a deployment option. For this demo, we'll host our backend on Heroku. There's a guide for other deployment options in the documentation.

Let's install cubejs-cli and create a new Cube.js app:

npm install -g cubejs-cli
cubejs new redash-demo -d postgres
cd redash-demo

Now we have to set a data model. For this demo, we'll define only one cube and skip advanced options, which are covered in the documentation.

cube(`Users`, {
  sql: `SELECT * FROM users`,

measures: {
count: {
sql: id,
type: count

dimensions: {
city: {
sql: city,
type: string

signedUp: {
  sql: `created_at`,
  type: `time`

companyName: {
  sql: `company_name`,
  type: `string`


Now when we have our data model set we can deploy this app to Heroku:

git init
git add -A
git commit -am "Initial commit"
heroku create cubejs-redash-demo
git push heroku master
Creating a Redash Dashboard

To use Cube.js Backend with Redash, you need to create a JSON Data Source.

Once you've done that, you can create a query. There are a couple of options like URL and Authorization Headers that will be shared across all of the queries, so you can create one query and then fork it to create others. Here's a simple query that will count all Orders in our demo dataset:

url: "YOUR-CUBEJS-URL/cubejs-api/v1/load"
Authorization: YOUR-API-TOKEN
path: data
query: '{"measures": ["Orders.count"]}'

Here you need to provide your API URL, which depends on the deployment option you chose. Also, you need to create an API Token to authenticate requests. Now click "Save" and "Execute." You should get a response that looks like this:

Now you can add a visualization. For this query, we'll use a simple "counter" option:

Let's create a more sophisticated query. We'll create a Stacked Bar Chart for the orders. First, let's define the query:

url: ""
Authorization: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.K9PiJkjegbhnw4Ca5pPlkTmZihoOm42w8bja9Qs2qJg
path: data
query: '{

Save and execute the query and you'll get a table, which is the default visualization. Let's create a Bar Chart:

Now we have enough queries to create a dashboard. Click "Create" -> "Dashboard," choose a name, and go ahead adding your query widgets.

And the dashboard is done!

Setting Filters

There's also a Redash option called Parameters. We can set some query values using it. Let's add a daterange filter to our queries:

url: ""
Authorization: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.K9PiJkjegbhnw4Ca5pPlkTmZihoOm42w8bja9Qs2qJg
path: data
query: '{"measures":["Orders.count"],
"timeDimensions": [{
"dimension" : "Orders.createdAt",
"dateRange": ["{{ daterange.start }}","{{ daterange.end }}"]

You can see we've added params using the {{ }} expression. Now let's change the parameter type to daterange:

Click "Save" and "Execute." Now this query has a daterange filter. You can add other parameters to the query in the same manner. Also, you can set multiple queries to use one dashboard-level filter value using the "edit Parameters" menu. Here's an example:

And it's done! Now we have a live dashboard in Redash, which you can publish, query, or even run on a display with auto-update.

Python For Data Analysis | Build a Data Analysis Library from Scratch | Learn Python in 2019

Python For Data Analysis - Build a Data Analysis Library from Scratch - Learn Python in 2019


Immerse yourself in a long, comprehensive project that teaches advanced Python concepts to build an entire library

You’ll learn

  • How to build a Python library similar pandas
  • How to complete a large, comprehensive project
  • Test-driven development with pytest
  • Environment creation
  • Advanced Python topics such as special methods and property decorators
  • A fully-functioning library that you can use to data analysis

Data Science vs Data Analytics vs Big Data

Data Science vs Data Analytics vs Big Data

When we talk about data processing, Data Science vs Big Data vs Data Analytics are the terms that one might think of and there has always been a confusion between them. In this article on Data science vs Big Data vs Data Analytics, I will understand the similarities and differences between them

When we talk about data processing, Data Science vs Big Data vs Data Analytics are the terms that one might think of and there has always been a confusion between them. In this article on Data science vs Big Data vs Data Analytics, I will understand the similarities and differences between them

We live in a data-driven world. In fact, the amount of digital data that exists is growing at a rapid rate, doubling every two years, and changing the way we live. Now that Hadoop and other frameworks have resolved the problem of storage, the main focus on data has shifted to processing this huge amount of data. When we talk about data processing, Data Science vs Big Data vs Data Analytics are the terms that one might think of and there has always been a confusion between them.

In this article on Data Science vs Data Analytics vs Big Data, I will be covering the following topics in order to make you understand the similarities and differences between them.
Introduction to Data Science, Big Data & Data AnalyticsWhat does Data Scientist, Big Data Professional & Data Analyst do?Skill-set required to become Data Scientist, Big Data Professional & Data AnalystWhat is a Salary Prospect?Real time Use-case## Introduction to Data Science, Big Data, & Data Analytics

Let’s begin by understanding the terms Data Science vs Big Data vs Data Analytics.

What Is Data Science?

Data Science is a blend of various tools, algorithms, and machine learning principles with the goal to discover hidden patterns from the raw data.


It also involves solving a problem in various ways to arrive at the solution and on the other hand, it involves to design and construct new processes for data modeling and production using various prototypes, algorithms, predictive models, and custom analysis.

What is Big Data?

Big Data refers to the large amounts of data which is pouring in from various data sources and has different formats. It is something that can be used to analyze the insights which can lead to better decisions and strategic business moves.


What is Data Analytics?

Data Analytics is the science of examining raw data with the purpose of drawing conclusions about that information. It is all about discovering useful information from the data to support decision-making. This process involves inspecting, cleansing, transforming & modeling data.


What Does Data Scientist, Big Data Professional & Data Analyst Do?

What does a Data Scientist do?

Data Scientists perform an exploratory analysis to discover insights from the data. They also use various advanced machine learning algorithms to identify the occurrence of a particular event in the future. This involves identifying hidden patterns, unknown correlations, market trends and other useful business information.

Roles of Data Scientist

What do Big Data Professionals do?

The responsibilities of big data professional lies around dealing with huge amount of heterogeneous data, which is gathered from various sources coming in at a high velocity.

Roles of Big Data Professiona

Big data professionals describe the structure and behavior of a big data solution and how it can be delivered using big data technologies such as Hadoop, Spark, Kafka etc. based on requirements.

What does a Data Analyst do?

Data analysts translate numbers into plain English. Every business collects data, like sales figures, market research, logistics, or transportation costs. A data analyst’s job is to take that data and use it to help companies to make better business decisions.

Roles of Data Analyst

Skill-Set Required To Become Data Scientist, Big Data Professional, & Data Analyst

What Is The Salary Prospect?

The below figure shows the average salary structure of **Data Scientist, Big Data Specialist, **and Data Analyst.

A Scenario Illustrating The Use Of Data Science vs Big Data vs Data Analytics.

Now, let’s try to understand how can we garner benefits by combining all three of them together.

Let’s take an example of Netflix and see how they join forces in achieving the goal.

First, let’s understand the role of* Big Data Professional* in Netflix example.

Netflix generates a huge amount of unstructured data in forms of text, audio, video files and many more. If we try to process this dark (unstructured) data using the traditional approach, it becomes a complicated task.

Approach in Netflix

Traditional Data Processing

Hence a Big Data Professional designs and creates an environment using Big Data tools to ease the processing of Netflix Data.

Big Data approach to process Netflix data

Now, let’s see how Data Scientist Optimizes the Netflix Streaming experience.

Role of Data Scientist in Optimizing the Netflix streaming experience

1. Understanding the impact of QoE on user behavior

User behavior refers to the way how a user interacts with the Netflix service, and data scientists use the data to both understand and predict behavior. For example, how would a change to the Netflix product affect the number of hours that members watch? To improve the streaming experience, Data Scientists look at QoE metrics that are likely to have an impact on user behavior. One metric of interest is the rebuffer rate, which is a measure of how often playback is temporarily interrupted. Another metric is bitrate, that refers to the quality of the picture that is served/seen — a very low bitrate corresponds to a fuzzy picture.

2. Improving the streaming experience

How do Data Scientists use data to provide the best user experience once a member hits “play” on Netflix?

One approach is to look at the algorithms that run in real-time or near real-time once playback has started, which determine what bitrate should be served, what server to download that content from, etc.

For example, a member with a high-bandwidth connection on a home network could have very different expectations and experience compared to a member with low bandwidth on a mobile device on a cellular network.

By determining all these factors one can improve the streaming experience.

3. Optimize content caching

A set of big data problems also exists on the content delivery side.

The key idea here is to locate the content closer (in terms of network hops) to Netflix members to provide a great experience. By viewing the behavior of the members being served and the experience, one can optimize the decisions around content caching.

4. Improving content quality

Another approach to improving user experience involves looking at the quality of content, i.e. the video, audio, subtitles, closed captions, etc. that are part of the movie or show. Netflix receives content from the studios in the form of digital assets that are then encoded and quality checked before they go live on the content servers.

In addition to the internal quality checks, Data scientists also receive feedback from our members when they discover issues while viewing.

By combining member feedback with intrinsic factors related to viewing behavior, they build the models to predict whether a particular piece of content has a quality issue. Machine learning models along with natural language processing (NLP) and text mining techniques can be used to build powerful models to both improve the quality of content that goes live and also use the information provided by the Netflix users to close the loop on quality and replace content that does not meet the expectations of the users.

So this is how Data Scientist optimizes the Netflix streaming experience.

Now let’s understand how Data Analytics is used to drive the Netflix success.

Role of Data Analyst in Netflix

The above figure shows the different types of users who watch the video/play on Netflix. Each of them has their own choices and preferences.

So what does a Data Analyst do?

Data Analyst creates a user stream based on the preferences of users. For example, if user 1 and user 2 have the same preference or a choice of video, then data analyst creates a user stream for those choices. And also –
Orders the Netflix collection for each member profile in a personalized way.We know that the same genre row for each member has an entirely different selection of videos.Picks out the top personalized recommendations from the entire catalog, focusing on the titles that are top on ranking.By capturing all events and user activities on Netflix, data analyst pops out the trending video.Sorts the recently watched titles and estimates whether the member will continue to watch or rewatch or stop watching etc.
I hope you have *understood *the *differences *& *similarities *between Data Science vs Big Data vs Data Analytics.

Why is Python used so widely in big data analysis despite of it being slow?

I have noticed that Python is used a lot in big data.

People call C functions from Python, then process it further in Python, then call some other libraries, possibly again in Python that also look at gigantic data arrays.

Isn't this an extremely inefficient way of doing things? Python is much slower than C++. How can it make sense to use Python in situations when large data is processed, performance-wise?

One company asked me the question "How to bind a C-function to Python that computes a 1GB floating-point array, and then to compute a total of all numbers in Python?" They ask this question from the position when they assume that the use of Python is totally normal, and one should do such things as computing a 1GB fp array in C, then copying it into a gigantic Python list, then computing a total of numbers in Python. But this question in itself assumes that things are done extremely inefficiently, isn't it? They are just indoctrinated and think that things that they do are normal when they are far from normal.

So why is Python used so widely, as opposed to using C++, for example? Is this because many people feel that Python is much easier and C++ is too hard?