Nowadays, many domains have access to continuous flows of data. This firehose of information can often be difficult to manage for individuals or teams without some degree of automation. I’d like to start with a few definitions prior to demonstrating some of the exciting new developments in the field of stream learning.

Data stream: a near-continuous source of data, e.g. sensor data or telemetry data.

Stream learning: the sub-field of machine learning focused on developing techniques that specialise in learning from data streams.

Now that we’re on the same page, I’d like to introduce the toolkit! I’ll be making use of the new(ish) Python library scikit-multiflow.


Continuous data streams are fast becoming the norm in many machine learning applications. Purpose-built libraries like scikit-multiflow provide a familiar user interface for streaming data. Image from Unsplash (Joshua Sortino)

For the purposes of this short piece, I will use skmultiflow’s built-in data generators to create data streams.

Here I use the AnomalySineGenerator class to generate a dataset containing anomalous points.

from skmultiflow.data import AnomalySineGenerator
stream = AnomalySineGenerator(n_samples=1000, n_anomalies=250)

This generator randomly samples the sine and cosine functions to generate data that should lie within a given range, ± some noise.
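To make the idea concrete, here is a minimal pure-Python sketch of this kind of generator. It is not skmultiflow's implementation: the function name `make_sine_stream`, the step size, the noise level and the box that anomalies are drawn from are all assumptions chosen for illustration.

```python
import math
import random


def make_sine_stream(n_samples=1000, n_anomalies=250, noise=0.1, seed=42):
    """Return (points, labels): noisy (sin, cos) pairs, some replaced by outliers.

    Hypothetical helper for illustration only; label 1 marks an anomaly.
    """
    rng = random.Random(seed)
    # Choose which sample indices will be turned into anomalies.
    anomaly_idx = set(rng.sample(range(n_samples), n_anomalies))
    points, labels = [], []
    for i in range(n_samples):
        t = i / 4.0  # assumed step along the wave
        if i in anomaly_idx:
            # Anomalous point: drawn uniformly from a wider box, ignoring the wave.
            point = (rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0))
            labels.append(1)
        else:
            # Normal point: sine and cosine of t, each with a little Gaussian noise.
            point = (math.sin(t) + rng.gauss(0.0, noise),
                     math.cos(t) + rng.gauss(0.0, noise))
            labels.append(0)
        points.append(point)
    return points, labels


points, labels = make_sine_stream()
```

Note that an anomaly drawn this way can still land close to the wave by chance, so even a perfect detector would not be expected to flag every labelled point.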

To give an idea of what the dataset looks like, I’ve plotted it below with the anomalies highlighted in orange.


A plot of the generated dataset. Anomalous points are highlighted in orange. Image by Author.

Now we can use skmultiflow’s streaming half-space trees anomaly detector and assess how well it detects the anomalies in our dataset.

from skmultiflow.anomaly_detection import HalfSpaceTrees
half_space_trees = HalfSpaceTrees()
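One simple way to assess a detector like this, once its predictions have been collected, is to compare them against the known labels using precision and recall. The sketch below is library-independent and assumes the convention that 1 marks an anomaly and 0 a normal point; the helper name `precision_recall` and the toy labels are made up for illustration.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the anomaly class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal points flagged by mistake
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels, not real detector output.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
```

Precision tells us how many of the flagged points really were anomalies, while recall tells us how many of the true anomalies were caught.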

The HS-Tree anomaly detection technique used in skmultiflow was introduced in [1]. Although coupling HS-Trees with streaming data is non-trivial, conceptually the algorithm is simple.
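To give a flavour of the idea, here is a heavily simplified pure-Python sketch. It is neither skmultiflow's implementation nor a faithful reproduction of [1]. It assumes the data lives in the unit square, that each tree repeatedly halves a randomly chosen dimension, and that a point is scored by how many points from the previous (reference) window fell in the same region. All names and parameter values below are hypothetical.

```python
import random


class Node:
    def __init__(self, depth):
        self.depth = depth
        self.ref_mass = 0     # points from the previous (reference) window
        self.latest_mass = 0  # points from the window currently being filled
        self.dim = None       # split dimension (None for a leaf)
        self.split = None     # split value
        self.left = None
        self.right = None


def build_tree(mins, maxs, depth, max_depth, rng):
    """Halve a randomly chosen dimension of the box until max_depth is reached."""
    node = Node(depth)
    if depth == max_depth:
        return node
    node.dim = rng.randrange(len(mins))
    node.split = (mins[node.dim] + maxs[node.dim]) / 2.0
    left_maxs = list(maxs)
    left_maxs[node.dim] = node.split
    right_mins = list(mins)
    right_mins[node.dim] = node.split
    node.left = build_tree(mins, left_maxs, depth + 1, max_depth, rng)
    node.right = build_tree(right_mins, maxs, depth + 1, max_depth, rng)
    return node


def update(node, x):
    """Add one point to the latest-window mass of every node on its path."""
    while True:
        node.latest_mass += 1
        if node.dim is None:
            return
        node = node.left if x[node.dim] < node.split else node.right


def end_window(node):
    """Promote latest masses to reference masses and reset the latest counters."""
    if node is None:
        return
    node.ref_mass, node.latest_mass = node.latest_mass, 0
    end_window(node.left)
    end_window(node.right)


def score(node, x, size_limit=2):
    """Higher score means a denser region, i.e. a more 'normal' point."""
    while node.dim is not None and node.ref_mass > size_limit:
        node = node.left if x[node.dim] < node.split else node.right
    return node.ref_mass * (2 ** node.depth)


rng = random.Random(0)
trees = [build_tree([0.0, 0.0], [1.0, 1.0], 0, 5, rng) for _ in range(10)]

# First window: a tight cluster of "normal" points in the lower-left of the square.
reference = [(rng.uniform(0.2, 0.3), rng.uniform(0.2, 0.3)) for _ in range(200)]
for tree in trees:
    for x in reference:
        update(tree, x)
    end_window(tree)  # the first window becomes the reference profile

normal_score = sum(score(tree, reference[0]) for tree in trees)
anomaly_score = sum(score(tree, (0.9, 0.9)) for tree in trees)
```

In this toy setup the far-away point lands in regions that saw no reference data, so its score is lower than that of a point from the cluster. The tree structure never depends on the data, and only the mass counters are updated as points arrive, which is what makes this style of detector a natural fit for streams.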


Detecting anomalies in data streams using half space trees