Clear and Creepy Danger of Machine Learning: Hacking Passwords

… All the more reason we need to move to more advanced authentication systems that don't rely on just passwords, such as FIDO2.

Preface
Not too long ago, it was considered state-of-the-art research to make a computer distinguish cats from dogs. Now image classification is the ‘Hello World’ of Machine Learning (ML), something one can implement in just a few lines of TensorFlow code. In fact, in just a few short years the field of ML has advanced so much that today it is possible to build a potentially life-saving, or lethal, application with equal ease. It has thus become necessary to discuss both the use and abuse of the technology, in the hope that we can find ways to mitigate or safeguard against the abuse. In this article, I will present one potential abuse of the technology: hacking passwords using ML.
To be more specific (see Fig. 1), can we figure out what someone is typing, just by listening to the keystrokes? As one can imagine, it has some serious security implications, e.g., hacking passwords.
So I worked on a project called kido (= keystroke decode) to explore whether this is possible (https://github.com/tikeswar/kido).

Outline
This can be treated as a supervised ML problem, and we will go over all the steps.
I used Python, Keras, and TensorFlow for this project.

Data Gathering
The first step: how do we collect the data to train a model?
There are many ways one could go about it, but simply to test whether the idea works, I used my MacBook Pro keyboard to type and QuickTime Player to record the typing audio through the built-in mic (Fig. 2).
Fig. 2: Using MacBook Pro to make training data
This approach has a couple of advantages: (1) the data has less variability, and (2) it lets us focus on proving (or disproving) the idea without much distraction.

Data Preparation
The next step is to prep the data so that we can feed it to a Neural Network (NN) for training.
Fig. 3: Converting mp4 to wav, and then splitting
QuickTime saves the recorded audio as mp4. We first convert the mp4 to wav, since there are good Python libraries for working with wav files. Each spike in the top-right subplot of Fig. 3 corresponds to a keystroke. We then split the audio into individual chunks using silence detection, so that each chunk contains exactly one letter. We could feed these chunks to a NN directly, but there is a better approach.
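The silence-detection step can be sketched in plain numpy. This is a minimal illustration, not the project's actual code (which would typically use an audio library); the frame length, RMS threshold, and minimum-gap values here are illustrative assumptions:

```python
import numpy as np

def split_on_silence(samples, rate, frame_ms=10, threshold=0.02, min_gap_ms=100):
    """Split a 1-D audio signal into chunks separated by silence.

    A frame is 'silent' when its RMS falls below `threshold`; runs of
    loud frames separated by at least `min_gap_ms` of silence become
    separate chunks (ideally, one chunk per keystroke).
    """
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    rms = np.array([
        np.sqrt(np.mean(samples[i * frame_len:(i + 1) * frame_len] ** 2))
        for i in range(n_frames)
    ])
    loud = rms > threshold
    min_gap = max(1, min_gap_ms // frame_ms)
    chunks, start, gap = [], None, 0
    for i, is_loud in enumerate(loud):
        if is_loud:
            if start is None:
                start = i          # a new keystroke begins
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:     # enough silence: close the chunk
                chunks.append(samples[start * frame_len:(i - gap + 1) * frame_len])
                start, gap = None, 0
    if start is not None:          # trailing keystroke with no closing silence
        chunks.append(samples[start * frame_len:])
    return chunks
```

In practice the threshold has to be tuned to the recording level so that key-down and key-up clicks of one keystroke stay in the same chunk.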
Fig. 4: Converting individual chunk to spectrogram
We convert the individual chunks into spectrograms (Fig. 4). And now we have images that are much more informative and easier to work with using a Convolutional Neural Network (CNN).
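A spectrogram is just a short-time Fourier transform. The numpy sketch below shows the idea; the actual pipeline would render the result as an image file for the CNN, and the window sizes here are illustrative assumptions:

```python
import numpy as np

def to_spectrogram(samples, nperseg=128, noverlap=64):
    """Slide a Hann-windowed frame over the chunk and take the power
    spectrum of each frame. The resulting 2-D array (freq bins x time
    frames) is what gets saved as an image and fed to the CNN."""
    step = nperseg - noverlap
    window = np.hanning(nperseg)
    frames = [samples[i:i + nperseg] * window
              for i in range(0, len(samples) - nperseg + 1, step)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log10(power + 1e-10).T  # log scale; shape (freqs, times)
```

Each keystroke produces a characteristic distribution of energy across frequencies over a few tens of milliseconds, which is exactly the kind of 2-D pattern CNNs are good at classifying.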
To train the network I collected about 16,000 samples as described above, making sure each letter had at least 600 samples (Fig. 5).
Fig. 5: Data samples
Then the data was shuffled and split into training and validation sets. Each letter had about 500 training samples + 100 validation samples (Fig. 6).
Fig. 6. Training-Validation split
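A per-letter (stratified) split like the one in Fig. 6 can be sketched as follows; this is a minimal illustration with assumed item format (path, letter), not the repo's actual code:

```python
import random
from collections import defaultdict

def stratified_split(items, val_per_class=100, seed=0):
    """Shuffle (sample, label) pairs and hold out `val_per_class`
    validation samples per letter; the rest go to training."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item in items:
        by_label[item[1]].append(item)   # group by the letter label
    train, val = [], []
    for group in by_label.values():
        rng.shuffle(group)
        val.extend(group[:val_per_class])
        train.extend(group[val_per_class:])
    rng.shuffle(train)                   # don't train on sorted letters
    return train, val
```

Splitting per letter rather than globally keeps every class represented in validation, which matters when some letters have fewer samples.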
So, in a nutshell, this is the ML problem we have … see Fig. 7.
Training and Validation
Fig. 7: The ML problem in a nutshell
I used a fairly small and simple network architecture (based on Laurence Moroney’s rock-paper-scissors example). See Fig. 8: the input image is scaled to 150 x 150 pixels with 3 color channels; it then goes through a series of convolution + pooling layers, gets flattened (with dropout to prevent over-fitting), and is fed to a fully-connected layer followed by the output layer. The output layer has 26 classes, one for each letter.
Fig. 8: Network architecture
In TensorFlow the model looks like:
model = tf.keras.models.Sequential([
    # 1st convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # 2nd convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # 3rd convolution
    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # 4th convolution
    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    # FC layer
    tf.keras.layers.Dense(512, activation='relu'),
    # Output layer
    tf.keras.layers.Dense(26, activation='softmax')
])
and the model summary:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_4 (Conv2D)            (None, 148, 148, 64)      1792
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 74, 74, 64)        0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 72, 72, 64)        36928
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 36, 36, 64)        0
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 34, 34, 128)       73856
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 17, 17, 128)       0
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 15, 15, 128)       147584
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 7, 7, 128)         0
_________________________________________________________________
flatten_1 (Flatten)          (None, 6272)              0
_________________________________________________________________
dropout_1 (Dropout)          (None, 6272)              0
_________________________________________________________________
dense_2 (Dense)              (None, 512)               3211776
_________________________________________________________________
dense_3 (Dense)              (None, 26)                13338
=================================================================
Total params: 3,485,274
Trainable params: 3,485,274
Non-trainable params: 0
The training result is shown in Fig. 9. In about 13 epochs it converges to 80% validation accuracy (with 90% training accuracy). I was pleasantly surprised to get this level of accuracy, given the complexity of the problem and the simple network architecture used.
Fig. 9: Training and validation accuracy
The result so far looks very promising … but, please note that this is character-level accuracy, not word-level accuracy.
What does that mean? To guess a password, we have to predict every single character correctly, not just most of them! See Fig. 10.
Fig. 10: Guessing a password
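The gap between character-level and word-level accuracy follows directly from this requirement. Assuming (as a rough simplification) that per-character errors are independent, a whole n-letter password is recovered with probability about p**n for character accuracy p:

```python
# With character-level accuracy p and roughly independent errors,
# an n-letter password is fully recovered with probability ~ p**n.
p = 0.80  # validation character-level accuracy from Fig. 9
for n in (4, 6, 8, 10):
    print(f"{n}-letter word: {p ** n:.1%}")
```

Even at 80% per character, a 6-letter password comes out right only about a quarter of the time, so word-level accuracy is inevitably much lower than character-level accuracy.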
So, to test the model, I digitized another 200 different passwords from the rockyou.txt list and then tried to predict them using the model we just trained (Fig. 11).
Fig. 11: Testing the model
Fig. 12 shows the test accuracy. The bar charts show character-level accuracy (the left chart shows counts of right and wrong predictions; the right chart shows the same as percentages). The test accuracy is about 49% at the character level and 1.5% at the word level (the network got 3 of the 200 test words completely right).
Fig. 12: Test accuracy
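The two accuracy measures can be computed as follows; this is a minimal sketch (assuming truth and prediction strings of matching length), not the repo's actual evaluation code:

```python
def char_accuracy(truths, preds):
    """Fraction of characters predicted correctly, position by position."""
    pairs = [(t, p) for truth, pred in zip(truths, preds)
             for t, p in zip(truth, pred)]
    return sum(t == p for t, p in pairs) / len(pairs)

def word_accuracy(truths, preds):
    """Fraction of words with every single character predicted correctly."""
    return sum(t == p for t, p in zip(truths, preds)) / len(truths)
```

A prediction with one wrong character counts fully toward character accuracy minus one, but contributes nothing at the word level, which is why the two numbers diverge so sharply.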
Given the complexity of the task, 1.5% word-level accuracy is not bad! But can we improve the accuracy?

Error Analysis
Let’s analyze the individual errors, and see if there are ways we can improve the prediction accuracy.
Fig. 13: Sample test results
Fig. 13 shows some sample test results. The first column contains the actual test words; the middle column contains the corresponding predictions, with each character color-coded as right (green) or wrong (red); the third column shows only the correctly predicted characters, with the wrong ones replaced by underscores for easier visualization.
For the word ‘aaron’ our model gets barely one character right, for ‘canada’ it gets most of the characters right, and for ‘lokita’ it gets all characters right. As mentioned in Fig. 12, word-level accuracy was only 1.5%.
Squinting at the test examples (Fig. 14), especially ‘canada’, we realize it gets most of the characters right and is very close to the actual word. So, what if we pass the CNN result through a spellchecker?!
Fig. 14: Squinting at the test examples
This is precisely what I did (Fig. 15), and sure enough it boosted the word-level accuracy from 1.5% to 8%! So with a fairly simple model architecture plus a spellchecker we can predict 8 out of 100 passwords correctly … nontrivial!
Fig. 15: Use a spellchecker?
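One simple stand-in for a spellchecker, given that the test passwords come from a known list, is to snap each CNN prediction to the nearest candidate word by edit distance. This is an illustrative sketch of that idea, not necessarily the exact spellchecker used in the project:

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic rolling-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def correct(predicted, vocabulary):
    """Snap a CNN-predicted string to the closest vocabulary word."""
    return min(vocabulary, key=lambda w: edit_distance(predicted, w))
```

For example, a prediction like 'cabada' (one wrong character) lands one edit away from 'canada', so the correction recovers the full password even though the raw CNN output was wrong at the word level.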
I think that a sequence model (an RNN or a Transformer, say) in place of the simple spellchecker could push the word-level accuracy even higher … a topic for further study.
But let’s look even more closely at the test results (Fig. 16). We notice that ‘a’ gets predicted as ‘s’, ’n’ as ‘b’, etc.
Fig. 16: Squinting more at the test examples
So what if we map the errors onto the keyboard (Fig. 17)? Once mapped, does the error look correlated with proximity? It seems so!
Fig. 17: Mapping the errors on the keyboard
Next, can we quantify this error correlation with proximity?
Fig. 18 shows the MacBook Pro keyboard with the mic and the key locations plotted to scale. Fig. 19 shows the error maps on the digitized keyboard for some sample letters.
Fig. 18: MacBook Pro keyboard with the mic and key locations (plotted to scale)
Fig. 19: Error maps for sample letters
In Fig. 19, the top-left plot shows that ‘a’ gets wrongly predicted as ‘z’, ‘x’, ‘y’, ‘k’, ‘s’, ‘w’, or ‘q’. The other sub-plots are interpreted similarly.
Fig. 19 gives a clearer indication that the error may be correlated with proximity. However, can we get an even more quantitative measure?
Let:
d_ref be the distance of the reference letter from the mic,
d_predicted be the distance of the predicted letter from the mic, and
d be the absolute value of the difference between d_ref and d_predicted (see Fig. 20).
Fig. 20: Some definitions
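These definitions can be written out directly. The key and mic coordinates below are hypothetical placeholders; the actual analysis used the measured MacBook Pro layout of Fig. 18:

```python
import math

# Hypothetical (x, y) key and mic positions in cm, for illustration only;
# the real analysis used measured MacBook Pro key locations (Fig. 18).
KEYS = {"a": (1.0, 2.0), "s": (2.9, 2.0), "k": (10.5, 2.0)}
MIC = (0.0, 5.0)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def delta_d(reference, predicted):
    """d = |d_predicted - d_ref|: how much the predicted key's distance
    to the mic differs from the actual key's distance to the mic."""
    d_ref = dist(KEYS[reference], MIC)
    d_pred = dist(KEYS[predicted], MIC)
    return abs(d_pred - d_ref)
```

Note that d measures a difference of mic distances, not the physical distance between the two keys, which is what makes the mic-independence check in Fig. 22 possible.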
Fig. 21 shows a histogram of the number of errors binned w.r.t. d. We see a very clear trend: most errors come from keys in close proximity! This also means we could probably improve the model accuracy with more data, a bigger network, or a network architecture that captures this better.
Fig. 21: Binned errors w.r.t. d
But what about the mic location? Is the error correlated with how far a key is from the mic? To investigate this, the % error plot from Fig. 12 was rearranged so that the letters on the x-axis are in increasing order of distance from the mic (see Fig. 22). No strong correlation with d_ref is seen, indicating the errors are independent of the mic location.
Fig. 22: Errors w.r.t d_ref
Fig. 22 highlights a very important point: one can place a mic anywhere to listen to the keystrokes and be able to hack! Creepy!

Model Enhancements
For this study I made some simplifications, just to see whether the idea of hacking passwords simply by listening to the keystrokes works at all. Following are some thoughts on improving the model to handle more complex, real-life scenarios.
Keeping in mind the simplifications made for this study:
GitHub Link: https://github.com/tikeswar/kido