The vast majority of data visualization consists of a static set of data that is pulled upon user request. So, the data only gets updated when the user wants to refresh it. It is a request-response pattern: User requests info, server responds with data, client side visualizations are populated.
Real-time data visualization applies when your data updates rapidly and your application needs to keep a ‘pulse’ on it, monitoring it passively. This means we have charts that update automatically for as long as you keep your browser open. Some examples of where you might want real-time data viz include, but aren’t limited to:
This tutorial will aid those wanting to understand the components of a very basic real time data visualization implementation.
JS Libraries:
We are going to use these tools to build a websocket server that publishes some mock data every second. We will then build some static interactive charts with d3, crossfilter, and dc.js.
Finally, we will have a d3 chart connect to our websocket server, and updates to the chart will happen in real time.
I’m using python 2.7 for this walkthrough.
Download Python here. Install it then open a cmd or terminal and type “python” to ensure proper setup — if you get a command prompt everything should be good.
If you have issues or errors installing python there are plenty of resources on stackoverflow to help better than we have time for here.
Next:
First we’ll build our data source. For this tutorial we’re building a simple websocket server that periodically sends out new data.
In your ‘rt-data-viz’ folder, create a new file & save it with the name “websocket_server.py”
We need to install the tornado package to run our websockets and ioloop:
In your terminal/console run pip install tornado
In websocket_server.py, we can start coding now.
First, import all required packages:
import time
import random
import json
import datetime
from tornado import websocket, web, ioloop
from datetime import timedelta
from random import randint
Using tornado’s websocket, we need to build a handler class. I’m adding some empty functions that we’ll fill out next:
class WebSocketHandler(websocket.WebSocketHandler):

    # on open of this socket
    def open(self):
        print('Connection established.')

    # close connection
    def on_close(self):
        print('Connection closed.')

    # Our function to send new (random) data for charts
    def send_data(self):
        print("Sending Data")
Now to fill out our websocket handler. The most important function for us is send_data(). It is where we build a JSON object of random data and send it via self.write_message().
After we send the message we use ioLoop to create a timeout that will send data periodically. Finally, we create the websocket web app instance, set it to listen on port 8001, and start our ioloop instance. The completed websocket_server.py:
import time
import random
import json
import datetime
from tornado import websocket, web, ioloop
from datetime import timedelta
from random import randint

paymentTypes = ["cash", "tab", "visa", "mastercard", "bitcoin"]
namesArray = ['Ben', 'Jarrod', 'Vijay', 'Aziz']


class WebSocketHandler(websocket.WebSocketHandler):

    # our page is served from port 3000, a different origin than this
    # server on port 8001; Tornado 4+ rejects cross-origin websockets
    # by default, so allow them for this local demo
    def check_origin(self, origin):
        return True

    # on open of this socket
    def open(self):
        print('Connection established.')
        # ioloop to wait for 3 seconds before starting to send data
        ioloop.IOLoop.instance().add_timeout(
            datetime.timedelta(seconds=3), self.send_data)

    # close connection
    def on_close(self):
        print('Connection closed.')

    # Our function to send new (random) data for charts
    def send_data(self):
        print("Sending Data")
        # create a bunch of random data for various dimensions we want
        qty = random.randrange(1, 4)
        total = random.randrange(30, 1000)
        tip = random.randrange(10, 100)
        # random.choice can pick every entry, including the last one
        payType = random.choice(paymentTypes)
        name = random.choice(namesArray)
        spent = random.randrange(1, 150)
        year = random.randrange(2012, 2016)
        # create a new data point
        point_data = {
            'quantity': qty,
            'total': total,
            'tip': tip,
            'payType': payType,
            'Name': name,
            'Spent': spent,
            'Year': year,
            'x': time.time()
        }
        print(point_data)
        # write the json object to the socket
        self.write_message(json.dumps(point_data))
        # schedule the next send one second from now
        ioloop.IOLoop.instance().add_timeout(
            datetime.timedelta(seconds=1), self.send_data)


if __name__ == "__main__":
    # create new web app w/ websocket endpoint available at /websocket
    print("Starting websocket server program. Awaiting client requests to open websocket ...")
    application = web.Application([(r'/websocket', WebSocketHandler)])
    application.listen(8001)
    ioloop.IOLoop.instance().start()
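The trick in send_data() is that the callback re-arms its own timer each time it runs. Here is a minimal sketch of that pattern on its own, assuming Python 3 and the standard library's asyncio in place of tornado's ioloop. The names start_ticker() and run_demo() are made up for illustration, and unlike the server, this version stops itself after a fixed number of ticks.

```python
import asyncio


def start_ticker(loop, interval, limit, on_tick):
    """Call on_tick(n) every `interval` seconds, `limit` times, then stop the loop."""
    def tick(n):
        on_tick(n)
        if n + 1 < limit:
            # Re-arm the timer from inside the callback, as send_data() does.
            loop.call_later(interval, tick, n + 1)
        else:
            # Assumption for the demo: stop after `limit` ticks.
            loop.stop()

    # The first call is delayed as well, like the initial wait in open().
    loop.call_later(interval, tick, 0)


def run_demo(interval=0.01, limit=3):
    """Run the ticker on a fresh event loop and return the tick numbers seen."""
    seen = []
    loop = asyncio.new_event_loop()
    try:
        start_ticker(loop, interval, limit, seen.append)
        loop.run_forever()
    finally:
        loop.close()
    return seen


print(run_demo())  # [0, 1, 2]
```

Nothing here runs on a fixed schedule by itself: if the callback ever returns without re-arming, the ticking simply ends.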
We can start this server by going to our command prompt in the rt-data-viz folder and typing python websocket_server.py. You’ll notice that, apart from the startup message, nothing happens yet.
In order for the socket to become active, we need to have our client-side code open the connection on that port. We’ll be able to see this after we create & run our client-side code next.
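While the server waits for a client, the payload itself can be checked without opening a socket. This is a rough sketch, assuming Python 3 and only the standard library, that builds one mock data point with the same keys as send_data() and round-trips it through JSON. The make_point() helper and the seeded generator are hypothetical additions for illustration; they are not in the tutorial's server.

```python
import json
import random
import time

PAYMENT_TYPES = ["cash", "tab", "visa", "mastercard", "bitcoin"]
NAMES = ["Ben", "Jarrod", "Vijay", "Aziz"]


def make_point(rng=random):
    """Build one mock sale using the same keys the websocket server sends.

    `rng` is an assumption for testability: pass random.Random(seed)
    to get repeatable values.
    """
    return {
        "quantity": rng.randrange(1, 4),
        "total": rng.randrange(30, 1000),
        "tip": rng.randrange(10, 100),
        "payType": rng.choice(PAYMENT_TYPES),
        "Name": rng.choice(NAMES),
        "Spent": rng.randrange(1, 150),
        "Year": rng.randrange(2012, 2016),
        "x": time.time(),
    }


# Serialize the way the server does, then decode the way a client would.
message = json.dumps(make_point(random.Random(42)))
decoded = json.loads(message)
print(sorted(decoded))
```

The browser will receive exactly this kind of string in event.data and turn it back into an object with JSON.parse().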
We’re going to use d3.js and crossfilter.js to create two charts that share the same data. crossfilter helps us explore multivariate data sets with functions that create dimensions from the data and group the values along them. The dc.js library combines d3.js and crossfilter.js so that we can use the charts themselves to filter the data on user interaction.
Visit their websites for further detail: d3.js, crossfilter.js, dc.js
Using one set of incoming data, we are going to create two charts:
If a user clicks part of one chart, that selection filters the shared data, and the other chart updates to reflect the change.
Here I clicked the year 2014 on the chart, and you can see the data reflects only 2014, where Aziz and Jarrod have no data so far.
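To make the idea concrete, here is a small Python sketch of what that click does to the numbers behind the second chart. It assumes four sample records like the mock data used later in this tutorial, and it only imitates the behaviour in plain Python; it is not how crossfilter is implemented.

```python
# Four sample records (same shape as the tutorial's mock data).
DATA = [
    {"Name": "Ben", "Spent": 330, "Year": 2014},
    {"Name": "Aziz", "Spent": 1350, "Year": 2012},
    {"Name": "Vijay", "Spent": 440, "Year": 2014},
    {"Name": "Jarrod", "Spent": 555, "Year": 2015},
]


def spend_per_name(records, year=None):
    """Sum Spent per Name, counting only rows that pass the year filter."""
    # Every name keeps an entry, so filtered-out people show up as 0.
    totals = {r["Name"]: 0 for r in records}
    for r in records:
        if year is None or r["Year"] == year:
            totals[r["Name"]] += r["Spent"]
    return totals


print(spend_per_name(DATA))
# {'Ben': 330, 'Aziz': 1350, 'Vijay': 440, 'Jarrod': 555}
print(spend_per_name(DATA, year=2014))
# {'Ben': 330, 'Aziz': 0, 'Vijay': 440, 'Jarrod': 0}
```

With the 2014 filter applied, Aziz and Jarrod drop to zero, which matches the empty bars described above.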
To create this, add an index.html file in your rt-data-viz folder. Here’s a basic template with the required libraries:
<html>
<head>
<meta charset="utf-8">
<title>Dimensional Charting</title>
<link rel="stylesheet" type="text/css" href="http://dc-js.github.io/dc.js/css/dc.css"/>
<script type="text/javascript" src="http://dc-js.github.io/dc.js/js/d3.js"></script>
<script type="text/javascript" src="http://dc-js.github.io/dc.js/js/crossfilter.js"></script>
<script type="text/javascript" src="http://dc-js.github.io/dc.js/js/dc.js"></script>
</head>
<body>
<!--Add charts here-->
</body>
</html>
Inside our body, we need to create two divs to hold each of our respective charts and data:
<div id="chart-ring-year"></div>
<div id="chart-row-spenders"></div>
Then we are going to start with our d3/crossfilter work. Create a static array of json objects with mock data like so:
var data1 = [
{Name: 'Ben', Spent: 330, Year: 2014, 'total':1},
{Name: 'Aziz', Spent: 1350, Year: 2012, 'total':2},
{Name: 'Vijay', Spent: 440, Year: 2014, 'total':2},
{Name: 'Jarrod', Spent: 555, Year: 2015, 'total':1},
];
Then we create a variable to hold our eventually ‘crossfiltered’ data: var xfilter = crossfilter(data1);
With that data we can now create dimensions using crossfilter’s dimension() function. We have 3 dimensions we’re going to use for this example: Name, Year, and Spent:
var yearDim = xfilter.dimension(function(d) {return +d.Year;});
var spendDim = xfilter.dimension(function(d) {return Math.floor(d.Spent/10);});
var nameDim = xfilter.dimension(function(d) {return d.Name;});
With those dimensions set, we can now group them. To simplify it a bit, the groups are basically the end result that populates each chart. In the images above, the pie chart shows each year and the size of each piece of the pie represents the amount spent that year. So, we would call our group spendPerYear and use that in our chart. That looks like:
var spendPerYear = yearDim.group().reduceSum(function(d) {return +d.Spent;});
And our second chart shows the amount spent per person (or “Name” in our json object):
var spendPerName = nameDim.group().reduceSum(function(d) {return +d.Spent;});
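If the dimension/group vocabulary feels abstract, it may help to see the same calculation written out by hand. The sketch below is plain Python, not crossfilter: the key function plays the role of a dimension, and summing a value per key plays the role of group().reduceSum(). The group_sum() helper is a made-up name for illustration.

```python
from collections import defaultdict

# The tutorial's four mock records (the 'total' field is left out here).
DATA = [
    {"Name": "Ben", "Spent": 330, "Year": 2014},
    {"Name": "Aziz", "Spent": 1350, "Year": 2012},
    {"Name": "Vijay", "Spent": 440, "Year": 2014},
    {"Name": "Jarrod", "Spent": 555, "Year": 2015},
]


def group_sum(records, key, value):
    """Group records by key(record) and sum value(record) within each group."""
    totals = defaultdict(int)
    for record in records:
        totals[key(record)] += value(record)
    return dict(totals)


# Rough counterparts of spendPerYear and spendPerName.
spend_per_year = group_sum(DATA, lambda d: d["Year"], lambda d: d["Spent"])
spend_per_name = group_sum(DATA, lambda d: d["Name"], lambda d: d["Spent"])

print(spend_per_year)  # {2014: 770, 2012: 1350, 2015: 555}
print(spend_per_name)  # {'Ben': 330, 'Aziz': 1350, 'Vijay': 440, 'Jarrod': 555}
```

Each key/total pair is what ends up as one slice of the pie or one bar in the row chart.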
Our code looks like this so far:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Dimensional Charting</title>
<link rel="stylesheet" type="text/css" href="http://dc-js.github.io/dc.js/css/dc.css"/>
<script type="text/javascript" src="http://dc-js.github.io/dc.js/js/d3.js"></script>
<script type="text/javascript" src="http://dc-js.github.io/dc.js/js/crossfilter.js"></script>
<script type="text/javascript" src="http://dc-js.github.io/dc.js/js/dc.js"></script>
</head>
<body>
<div id="chart-ring-year"></div>
<div id="chart-row-spenders"></div>
<script type="text/javascript">
var yearRingChart = dc.pieChart("#chart-ring-year"),
spenderRowChart = dc.rowChart("#chart-row-spenders");
var data1 = [
{Name: 'Ben', Spent: 330, Year: 2014, 'total':1},
{Name: 'Aziz', Spent: 1350, Year: 2012, 'total':2},
{Name: 'Vijay', Spent: 440, Year: 2014, 'total':2},
{Name: 'Jarrod', Spent: 555, Year: 2015, 'total':1},
];
// set crossfilter with first dataset
var xfilter = crossfilter(data1),
yearDim = xfilter.dimension(function(d) {return +d.Year;}),
spendDim = xfilter.dimension(function(d) {return Math.floor(d.Spent/10);}),
nameDim = xfilter.dimension(function(d) {return d.Name;}),
spendPerYear = yearDim.group().reduceSum(function(d) {return +d.Spent;}),
spendPerName = nameDim.group().reduceSum(function(d) {return +d.Spent;});
</script>
</body>
</html>
We now have our data, dimensions, and groups set up. Next, we will render the charts.
Let’s create a rendering function. We’ll call it render_plots(). It will render the charts with dc.js’s renderAll() function:
function render_plots(){
  // chart plots go here soon
  // render all the charts
  dc.renderAll();
}
Inside render_plots() we’re going to create a pie chart, which renders into the chart-ring-year div we created previously (through the yearRingChart variable). In the pie chart we use the year dimension and the spendPerYear grouping. We also set the width and height, plus innerRadius, which is specific to pie charts:
yearRingChart.width(200).height(200).dimension(yearDim).group(spendPerYear).innerRadius(50);
For the row chart (a horizontal bar chart) we set it up similarly:
spenderRowChart.width(250).height(200).dimension(nameDim).group(spendPerName);
We are going to be adding a lot more data to this bar chart, so we’ll need to resize it dynamically to ‘fit’ larger data sets. We do that by setting elasticX to true:
spenderRowChart.width(250).height(200).dimension(nameDim).group(spendPerName).elasticX(true);
function render_plots(){
yearRingChart
.width(200).height(200)
.dimension(yearDim)
.group(spendPerYear)
.innerRadius(50);
spenderRowChart
.width(250).height(200)
.dimension(nameDim)
.group(spendPerName)
.elasticX(true);
dc.renderAll();
}
And we can now see our functioning charts code. Here’s our final d3.js + crossfilter.js + dc.js page, which calls the render_plots() function:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Dimensional Charting</title>
<link rel="stylesheet" type="text/css" href="http://dc-js.github.io/dc.js/css/dc.css"/>
<script type="text/javascript" src="http://dc-js.github.io/dc.js/js/d3.js"></script>
<script type="text/javascript" src="http://dc-js.github.io/dc.js/js/crossfilter.js"></script>
<script type="text/javascript" src="http://dc-js.github.io/dc.js/js/dc.js"></script>
</head>
<body>
<div id="chart-ring-year"></div>
<div id="chart-row-spenders"></div>
<script type="text/javascript">
var yearRingChart = dc.pieChart("#chart-ring-year"),
spenderRowChart = dc.rowChart("#chart-row-spenders");
var data1 = [
{Name: 'Ben', Spent: 330, Year: 2014, 'total':1},
{Name: 'Aziz', Spent: 1350, Year: 2012, 'total':2},
{Name: 'Vijay', Spent: 440, Year: 2014, 'total':2},
{Name: 'Jarrod', Spent: 555, Year: 2015, 'total':1},
];
// set crossfilter with first dataset
var xfilter = crossfilter(data1),
yearDim = xfilter.dimension(function(d) {return +d.Year;}),
spendDim = xfilter.dimension(function(d) {return Math.floor(d.Spent/10);}),
nameDim = xfilter.dimension(function(d) {return d.Name;}),
spendPerYear = yearDim.group().reduceSum(function(d) {return +d.Spent;}),
spendPerName = nameDim.group().reduceSum(function(d) {return +d.Spent;});
function render_plots(){
yearRingChart
.width(200).height(200)
.dimension(yearDim)
.group(spendPerYear)
.innerRadius(50);
spenderRowChart
.width(250).height(200)
.dimension(nameDim)
.group(spendPerName)
.elasticX(true);
dc.renderAll();
}
render_plots();
</script>
</body>
</html>
This is cool, you’ll want to see it in action. You’ll need to run a local webserver and point it to your file: Open up your folder rt-data-viz in cmd prompt/terminal and type:
python -m SimpleHTTPServer 3000
The SimpleHTTPServer module was merged into http.server in Python 3 (the 2to3 tool adapts the import automatically when converting your sources). So for Python 3 the command is: python -m http.server 3000 (or py -m http.server 3000 with the Windows launcher).
Keep your terminal open and open a tab on your web browser of choice. Go to the address http://localhost:3000/ to see it live.
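If you would rather start the static file server from a script than from the command line, something like the following should work. It is a sketch that assumes Python 3.7 or newer (for the directory argument and ThreadingHTTPServer); make_static_server(), fetch_index(), and demo() are hypothetical helpers, and the demo serves a throwaway folder on a free port instead of rt-data-viz on port 3000.

```python
import functools
import http.client
import http.server
import pathlib
import tempfile
import threading


def make_static_server(directory, port=3000):
    """Create (but do not start) an HTTP server for the files in `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def fetch_index(directory):
    """Serve `directory` just long enough to fetch its index.html."""
    server = make_static_server(directory, port=0)  # port 0: the OS picks one
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection(
        "127.0.0.1", server.server_address[1], timeout=5
    )
    try:
        conn.request("GET", "/index.html")
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


def demo():
    """Write a placeholder page into a temp folder and fetch it over HTTP."""
    with tempfile.TemporaryDirectory() as folder:
        page = pathlib.Path(folder, "index.html")
        page.write_text("<h1>charts go here</h1>", encoding="utf-8")
        return fetch_index(folder)


print(demo())
```

For the tutorial itself, calling make_static_server("rt-data-viz").serve_forever() from the parent folder would be the rough equivalent of the command above.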
So, now we have a websocket server that posts new data every second or so, and we have some static charts that expose d3 and crossfilter functionality. Now we need to make them talk.
We need to modify our chart code to connect to the websocket, handle data from the websocket, and correctly update the charts in real time.
In the javascript of our index file, we need to create a new websocket connection that connects to our running websocket_server.py on port 8001. Do this by creating a new websocket like so:
var connection = new WebSocket('ws://localhost:8001/websocket');
Then we need a function that will update our charts any time the websocket publishes an update. We do that with WebSocket’s onmessage handler:
connection.onmessage = function(event) {
  // get data & parse
  var newData = JSON.parse(event.data);
  // put data into an array of json objects
  var updateObject = [{"Name": newData.Name, "Year": newData.Year, "Spent": newData.Spent, "payType": newData.payType}];
  // add this new array into our data
  xfilter.add(updateObject);
  // redraw our charts with new data
  dc.redrawAll();
};
Here’s the function in code:
connection.onmessage = function(event) {
var newData = JSON.parse(event.data);
var updateObject =[{
"Name": newData.Name,
"Year": newData.Year,
"Spent": newData.Spent,
"payType": newData.payType
}]
//resetData(ndx, [yearDim, spendDim, nameDim]);
xfilter.add(updateObject);
dc.redrawAll();
}
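The reason this stays cheap is that each message only adds to totals that already exist; nothing is recomputed from scratch. Here is a rough Python counterpart of that handler, assuming Python 3 and the standard library. RunningTotals and on_message() are illustrative names, and the class is a stand-in for what xfilter.add() does to the two groups, not a port of crossfilter.

```python
import json
from collections import defaultdict


class RunningTotals:
    """Keep per-year and per-name sums current as records arrive."""

    def __init__(self, records=()):
        self.per_year = defaultdict(int)
        self.per_name = defaultdict(int)
        self.add(records)

    def add(self, records):
        # Only the groups touched by the new records change.
        for r in records:
            self.per_year[r["Year"]] += r["Spent"]
            self.per_name[r["Name"]] += r["Spent"]


def on_message(totals, raw):
    """Parse one websocket message and fold it into the totals."""
    new_data = json.loads(raw)
    # Keep only the fields the charts use, like updateObject does.
    update = [{k: new_data[k] for k in ("Name", "Year", "Spent", "payType")}]
    totals.add(update)
    return update


totals = RunningTotals([{"Name": "Ben", "Spent": 330, "Year": 2014}])
on_message(
    totals,
    '{"Name": "Ben", "Spent": 20, "Year": 2014, "payType": "cash", "tip": 12}',
)
print(dict(totals.per_name))  # {'Ben': 350}
```

In the browser, dc.redrawAll() is the extra step that pushes the updated totals onto the screen.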
And here’s the entire index.html which will work with your websocket server to retrieve data as it is published:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Dimensional Charting</title>
<link rel="stylesheet" type="text/css" href="http://dc-js.github.io/dc.js/css/dc.css"/>
<script type="text/javascript" src="http://dc-js.github.io/dc.js/js/d3.js"></script>
<script type="text/javascript" src="http://dc-js.github.io/dc.js/js/crossfilter.js"></script>
<script type="text/javascript" src="http://dc-js.github.io/dc.js/js/dc.js"></script>
</head>
<body>
<div id="chart-ring-year"></div>
<div id="chart-row-spenders"></div>
<script type="text/javascript">
var yearRingChart = dc.pieChart("#chart-ring-year"),
spenderRowChart = dc.rowChart("#chart-row-spenders");
var connection = new WebSocket('ws://localhost:8001/websocket');
var data1 = [
{Name: 'Ben', Spent: 330, Year: 2014, 'total':1},
{Name: 'Aziz', Spent: 1350, Year: 2012, 'total':2},
{Name: 'Vijay', Spent: 440, Year: 2014, 'total':2},
{Name: 'Jarrod', Spent: 555, Year: 2015, 'total':1},
];
// set crossfilter with first dataset
var xfilter = crossfilter(data1),
yearDim = xfilter.dimension(function(d) {return +d.Year;}),
spendDim = xfilter.dimension(function(d) {return Math.floor(d.Spent/10);}),
nameDim = xfilter.dimension(function(d) {return d.Name;}),
spendPerYear = yearDim.group().reduceSum(function(d) {return +d.Spent;}),
spendPerName = nameDim.group().reduceSum(function(d) {return +d.Spent;});
function render_plots(){
yearRingChart
.width(200).height(200)
.dimension(yearDim)
.group(spendPerYear)
.innerRadius(50);
spenderRowChart
.width(250).height(200)
.dimension(nameDim)
.group(spendPerName)
.elasticX(true);
dc.renderAll();
}
render_plots();
// data reset function (adapted)
function resetData(ndx, dimensions) {
var yearChartFilters = yearRingChart.filters();
var spenderChartFilters = spenderRowChart.filters();
yearRingChart.filter(null);
spenderRowChart.filter(null);
xfilter.remove();
yearRingChart.filter([yearChartFilters]);
spenderRowChart.filter([spenderChartFilters]);
}
connection.onmessage = function(event) {
var newData = JSON.parse(event.data);
var updateObject =[{
"Name": newData.Name,
"Year": newData.Year,
"Spent": newData.Spent,
"payType": newData.payType
}]
//resetData(ndx, [yearDim, spendDim, nameDim]);
xfilter.add(updateObject);
dc.redrawAll();
}
</script>
</body>
</html>
So how do you finally make this all come together? We have to spin up two servers: one to run websocket_server.py and one to serve the client-side code.
If you haven’t figured it out already, you start your server first, then your client. Open a command prompt in the rt-data-viz folder and start up the websocket server:
python websocket_server.py
Then open another prompt and run the client:
python -m SimpleHTTPServer 3000
On Python 3, SimpleHTTPServer has been merged into http.server, so the command is: py -m http.server 3000
Open your localhost:3000 in your browser and after a few seconds you should see the charts updating! Not only are they updating with new data from the websocket server, but you can click on the charts and they will crossfilter each other.
Our end product will look as depicted in this youtube video:
We’ve covered websockets, d3, crossfilter, and dcjs. Hopefully you have taken a lot of value out of my efforts here! Let me know what you think. Tweet me @benjaminmbrown for fastest responses.
All the code is available on my github tutorial repo:
https://github.com/benjaminmbrown/real-time-data-viz-d3-crossfilter-websocket-tutorial