Anomalies occur rarely in systems. Validation layers guard correctness by catching them and removing them from the process. A cash withdrawal request from a location that is unusual for the card owner, or a sensor reading that exceeds the norm, can be verified against profiles or historical data. But what happens when an event does not differ from the norm at first glance?

Multidimensional nature of events

Anomalies are not always easy to detect. Often the values of the chosen characteristics deviate only subtly from the expected distribution, or the deviation becomes noticeable only after taking a series of events and their timing into account. In such cases, the standard approach is to analyze the characteristics together, for example in terms of their mutual correlation.

Mirek Mamczur elaborated on this very well in his post.
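To make the idea of mutual correlation concrete, here is a minimal sketch on made-up data. It is not taken from the referenced post. It assumes two hypothetical, strongly correlated characteristics and uses the Mahalanobis distance as one possible way to account for correlation. The variable names, the sample values, and the thresholds mentioned in the comments are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "normal" events: two strongly correlated characteristics.
f1 = rng.normal(0.0, 1.0, size=1000)
f2 = 0.9 * f1 + rng.normal(0.0, 0.3, size=1000)
normal_events = np.column_stack([f1, f2])

mean = normal_events.mean(axis=0)
std = normal_events.std(axis=0)
cov_inv = np.linalg.inv(np.cov(normal_events, rowvar=False))


def per_feature_z(event):
    """Absolute z-score of each characteristic, checked in isolation."""
    return np.abs((event - mean) / std)


def mahalanobis(event):
    """Distance from the mean that takes the correlation into account."""
    diff = event - mean
    return float(np.sqrt(diff @ cov_inv @ diff))


# Both events have unremarkable values per characteristic,
# but the second one breaks the usual relationship between them.
typical = np.array([1.5, 1.4])
suspicious = np.array([1.5, -1.5])

for name, event in [("typical", typical), ("suspicious", suspicious)]:
    print(name, per_feature_z(event).round(2), round(mahalanobis(event), 2))
```

Checked one characteristic at a time, both events stay well within a common three-sigma rule of thumb. Only the distance that uses the covariance separates the second event from the first.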

For our purposes, we will generate an artificial data set in which one of the two classes is treated as anomalous. Each event will have 15 characteristics (features), and each class will be clustered fairly tightly, with a standard deviation of 1.75.

	import matplotlib.pyplot as plt
	import pandas as pd
	import numpy as np
	from sklearn import datasets, decomposition, preprocessing

	# Two clusters of 15-dimensional points; one class plays the role of anomalies.
	x, y = datasets.make_blobs(
		n_samples=500,
		n_features=15,
		centers=2,
		center_box=(-4.0, 4.0),
		cluster_std=1.75,
		random_state=42)

	# Scale every feature to [0, 1], then project onto 3 principal components.
	x = preprocessing.MinMaxScaler().fit_transform(x)
	pca = decomposition.PCA(n_components=3)
	pca_result = pca.fit_transform(x)
	# Share of the total variance captured by each component.
	print(pca.explained_variance_ratio_)

	pca_df = pd.DataFrame(data=pca_result, columns=['pc_1', 'pc_2', 'pc_3'])
	pca_df = pd.concat([pca_df, pd.DataFrame({'label': y})], axis=1)

	# Create the 3D axes through the figure; calling Axes3D(fig) directly
	# leaves the plot empty on recent Matplotlib versions.
	fig = plt.figure(figsize=(8, 8))
	ax = fig.add_subplot(projection='3d')
	ax.scatter(pca_df['pc_1'], pca_df['pc_2'], pca_df['pc_3'], c=pca_df['label'], s=25)
	ax.set_xlabel("pc_1")
	ax.set_ylabel("pc_2")
	ax.set_zlabel("pc_3")
	plt.show()

To make the task harder for the model, we can force the clusters to be generated closer together by narrowing center_box.
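As a rough illustration of that effect, the sketch below compares the distance between the two cluster centroids for the original center_box and for a narrower one. The narrower range (-1.0, 1.0) and the helper name centroid_distance are assumptions chosen for the example, not values from the original experiment.

```python
import numpy as np
from sklearn import datasets


def centroid_distance(center_box):
    """Euclidean distance between the two cluster centroids for a given center_box."""
    x, y = datasets.make_blobs(
        n_samples=500,
        n_features=15,
        centers=2,
        center_box=center_box,
        cluster_std=1.75,
        random_state=42)
    return float(np.linalg.norm(x[y == 0].mean(axis=0) - x[y == 1].mean(axis=0)))


wide = centroid_distance((-4.0, 4.0))    # setting used in the article
narrow = centroid_distance((-1.0, 1.0))  # hypothetical narrower setting

print(f"wide box:   {wide:.2f}")
print(f"narrow box: {narrow:.2f}")
```

With the spread of each cluster left unchanged, a smaller distance between the centroids means more overlap between the classes, so the anomalous class becomes harder to separate.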



Anomaly detection with Autoencoders