<strong>Originally published by </strong><a href="https://towardsdatascience.com/@ljinstat" target="_blank">Ling Jin</a><strong> </strong><em>at </em><a href="https://towardsdatascience.com/using-deep-learning-for-finger-vein-based-biometric-authentication-3f6601635821" target="_blank">towardsdatascience.com</a>
This finger-vein recognition project was done on the AlgoDeep AI platform. For more details, you can read this article about the whole process of performing this project, from research to deployment. The COO of AlgoDeep, Rudy Delouya, co-wrote these two articles and collaborated on the finger-vein recognition project.
Did you marvel at the authentication system when Ethan Hunt used biometrics of the UK’s Prime Minister to unlock the red box in Mission Impossible 5?
Many kinds of biometric authentication systems, such as retina scanners, iris scanners, face recognition, fingerprints, voice recognition, and even gait recognition, have appeared in action and science fiction films. Biometric authentication has also been a research hotspot in recent years. Some of these solutions have evolved rapidly and are already used in real-world security settings.
A biometric authentication system is a real-time system that verifies a person's identity by measuring particular physical or behavioral characteristics of that person. Biometric devices such as iris scanners collect a person's biometric data and transform it into digital form. Using pattern-matching algorithms, the system then identifies or verifies the person by comparing this data with the biometric data registered in its database. Such systems are designed around two main modes: identification and verification. In identification mode, the input data is compared with all registered patterns in the database, so the system can determine whether this person is in the database and, if so, who they are. In verification mode, the input data is compared only with the registered pattern of one claimed identity. The goal is to decide whether the two belong to the same person, which prevents someone from using another person's identity.
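The difference between the two modes can be sketched in a few lines of Python. This is a minimal illustration, assuming each enrolled person is represented by one feature vector (a template) and that matching uses Euclidean distance against a fixed threshold; the names `identify` and `verify` and the toy templates are hypothetical.

```python
import numpy as np

def verify(probe, template, threshold):
    """1:1 verification: accept the claimed identity if the probe is close enough to its template."""
    return float(np.linalg.norm(probe - template)) <= threshold

def identify(probe, templates, threshold):
    """1:N identification: return the closest enrolled identity, or None if nobody is close enough."""
    best_id, best_dist = None, float("inf")
    for identity, template in templates.items():
        dist = float(np.linalg.norm(probe - template))
        if dist < best_dist:
            best_id, best_dist = identity, dist
    return best_id if best_dist <= threshold else None

# Toy usage with hypothetical 3-D feature vectors
templates = {"alice": np.array([0.0, 0.0, 1.0]), "bob": np.array([1.0, 1.0, 0.0])}
probe = np.array([0.1, 0.0, 0.9])
print(identify(probe, templates, threshold=0.5))      # alice
print(verify(probe, templates["bob"], threshold=0.5))  # False
```

Identification compares the probe against every template, so its cost grows with the size of the database, while verification needs a single comparison.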
However, just as we have seen in films and science fiction, some biometric authentication systems can be fooled by fake sources. In Dan Brown's novel “Angels & Demons”, the Hassassin cuts out Leonardo's eye to steal the antimatter secured behind a door equipped with a retina scanner. Although retina characteristics are unique to each person, attackers can still find ways to crack the authentication system. The security level also varies across biometric features. The finger-vein based biometric authentication system we discuss today is harder to fool than many other methods because it only recognizes the unique vein patterns beneath the skin of a living person.
Finger-vein capture device. From the paper 
How to capture finger veins 
Special capture devices are used to collect finger-vein data. Such a device mainly consists of a near-infrared light source, a lens, a light filter, and image capture equipment. Since finger veins lie beneath the skin, they cannot be seen under visible light. The device therefore uses near-infrared light, which can pass through human body tissue. Pigments such as hemoglobin and melanin, however, block near-infrared light; because the blood in the veins is rich in hemoglobin, the veins appear as dark patterns in the captured image.
From LeNet, one of the first famous neural networks, which recognizes images of the 10 handwritten digits, to much more complex networks that classify the 1,000 classes of ImageNet, deep neural networks (DNNs), and especially convolutional neural networks (CNNs), are well known for their power in computer vision. CNNs often perform as well as or better than conventional computer vision methods because they learn to extract features from images automatically.
Finger-vein recognition can be considered an image classification problem, so it is natural to try CNNs on it. But how should we design experiments that fit the needs of a biometric authentication system? Many previous finger-vein recognition methods extract features and then compute the distance between two feature vectors. A threshold is fixed according to the distributions of these feature distances. If the distance is higher than the threshold, the two features are considered to come from different people; if it is lower, they are considered to come from the same person.
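A minimal sketch of this classic pipeline, assuming features have already been extracted as fixed-length vectors: the hypothetical helper `split_distances` collects the intra-class and inter-class distance distributions from which the threshold is chosen, and `same_person` applies the decision rule described above.

```python
import numpy as np

def split_distances(features, labels):
    """Return (intra, inter): Euclidean distances of same-class and different-class pairs."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(labels), k=1)  # every unordered pair exactly once
    dist = np.linalg.norm(features[i] - features[j], axis=1)
    same = labels[i] == labels[j]
    return dist[same], dist[~same]

def same_person(feat_a, feat_b, threshold):
    """Decision rule: a distance below the threshold means the same person."""
    diff = np.asarray(feat_a, dtype=float) - np.asarray(feat_b, dtype=float)
    return float(np.linalg.norm(diff)) < threshold

# Toy example: two classes, two hypothetical feature vectors each
intra, inter = split_distances([[0, 0], [0, 1], [5, 5], [5, 6]], [0, 0, 1, 1])
print(intra)                          # [1. 1.]
print(round(float(inter.min()), 2))   # 6.4
```

Here any threshold between 1 and about 6.4 separates the two classes perfectly; on real data the two distributions overlap, and the threshold trades false acceptances against false rejections.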
CNN-based finger-vein recognition system. Image from 
Several research institutions provide public finger-vein datasets. We used the finger-vein dataset from the SDUMLA-HMT database, and we would like to thank the MLA Lab of Shandong University for it. The dataset contains finger-vein images of 106 people. Three fingers (index, middle, and ring) of both hands were captured, with 6 images per finger, so the dataset consists of 106 × 6 × 6 = 3,816 images in total. The images are BMP files of 320×240 pixels.
Image from 
As we can observe from the image above, a captured image contains not only the finger but also the background, namely the capture device. Extracting the region of interest (ROI) keeps the finger and removes the background. To do this, the upper and lower boundaries of the finger have to be found.
Image from 
We built our ROI extraction with the following four steps: cropping, masking the finger limits, keeping the finger area, and linear stretching.
From left to right: cropping, masking limits, keeping areas and linear stretching
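The four steps in the caption can be sketched with NumPy. This is a simplified illustration, not necessarily the exact pipeline used here: it assumes a grayscale image with the finger lying roughly horizontally, fixed crop margins, and finger limits found per column as the strongest vertical intensity edge in the upper and lower halves of the image. The function name `extract_roi` and its default margins are hypothetical.

```python
import numpy as np

def extract_roi(img, crop=(10, 10, 10, 10)):
    """Hypothetical ROI pipeline: crop -> find finger limits -> mask -> linear stretch.
    crop is (top, bottom, left, right) in pixels (assumed values)."""
    img = np.asarray(img, dtype=float)
    top, bottom, left, right = crop
    h, w = img.shape
    # 1. Cropping: trim the borders, which mostly show the capture device
    img = img[top:h - bottom, left:w - right]
    h, w = img.shape
    # 2. Masking limits: per column, the strongest vertical edge in the upper half
    #    is the upper finger boundary, and in the lower half the lower boundary
    grad = np.abs(np.diff(img, axis=0))          # grad[r] = |img[r+1] - img[r]|
    mid = h // 2
    upper = np.argmax(grad[:mid], axis=0) + 1    # first row inside the finger
    lower = np.argmax(grad[mid:], axis=0) + mid  # last row inside the finger
    # 3. Keeping areas: keep only the rows between the two limits in each column
    rows = np.arange(h)[:, None]
    mask = (rows >= upper[None, :]) & (rows <= lower[None, :])
    roi = np.where(mask, img, 0.0)
    # 4. Linear stretching: map the finger's intensity range to [0, 255]
    vals = img[mask]
    lo, hi = vals.min(), vals.max()
    if hi > lo:
        roi = np.where(mask, (img - lo) / (hi - lo) * 255.0, 0.0)
    return roi, mask
```

A per-column edge search is simple but sensitive to noise; smoothing the image before computing the gradient is a common refinement.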
In general, CNN models that give satisfying results on large datasets have an enormous number of parameters. Training such a complicated CNN model from scratch takes a lot of time and computing resources, and we usually do not have enough data to do so. Overfitting is another concern: if the model is too complex and the dataset is rather small, the model is very likely to overfit.
Transfer learning addresses this problem. It adapts an existing model, pre-trained on a large amount of data, to other domains or tasks. For example, a model pre-trained on a large dataset of cats and dogs can be adapted to classify elephants and monkeys, or to classify cartoon cats and dogs.
As we can imagine, applying a pre-trained model directly to another domain or task may not work well, because the model never saw data from that domain during training. We usually have two choices. First, the pre-trained CNN model can be treated as a fixed feature extractor, and a linear classifier can be trained on the extracted features. Second, the model can be fine-tuned, usually only in its higher layers. Features in the early layers are more generic, while features in the later layers contain information more specific to the original dataset. Freezing the early layers keeps general features that are useful for many tasks, and fine-tuning the later layers lets the model learn features specific to our dataset.
Image from 
In our case, we tried fine-tuning different later layers of the pre-trained VGG-16 model. We obtained the best results when training from the last convolutional layer onward.
The experiments are mostly inspired by the paper “Convolutional Neural Network-Based Finger-Vein Recognition Using NIR Image Sensors”, which proposes two cases of finger-vein recognition. The first case uses finger-vein images as inputs, and the classifier distinguishes between different fingers. The second case uses difference images to classify authentic matching (matching between input and enrolled finger-vein images of the same class) versus imposter matching (matching between input and enrolled finger-vein images of different classes). Both cases fine-tune a pre-trained CNN model on finger-vein data. As introduced earlier, a biometric authentication system has two possible modes, identification and verification. The first case corresponds to the identification mode, and the second to the verification mode.
The process of performing experiments is as follows:
The first step is to split the dataset. The SDUMLA-HMT dataset contains 636 classes (3 fingers * 2 hands * 106 people). Each finger is a class, and it consists of 6 images. For the two cases we implemented, we used different splitting strategies.
For the first case, we initially split each class's images into three sets: the first three images for the training set, the fourth for the validation set, and the remaining two for the test set. However, this split resulted in strong overfitting, with poor performance on the test set. By checking the images of wrongly classified classes, we observed that in some classes the last two images differ from the other four. In other words, the distribution of the training and validation sets did not match the distribution of the test set! We had checked some classes before splitting the dataset, but we happened to check only classes with similar images. A lesson to learn: always make sure that the validation and test sets follow the same distribution before training.
For the second case, the dataset was split randomly into two halves, each containing 318 classes. The first half was used for training, and the second half was split again into two equal parts, one for the validation set and the other for the test set.
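Both splitting strategies can be sketched in plain Python. The sketch assumes the data is held in a dictionary mapping a class id, here a hypothetical `(person, hand, finger)` tuple, to its images in capture order; `split_case1` reproduces the fixed-index split of the first case, and `split_case2` assumes the second case splits at the class level with a fixed random seed.

```python
import random

def split_case1(dataset):
    """Case 1 (identification): every class appears in all three splits.
    dataset maps class id -> list of 6 images in capture order."""
    train, val, test = [], [], []
    for cls, imgs in dataset.items():
        train += [(img, cls) for img in imgs[:3]]   # images 1-3
        val += [(img, cls) for img in imgs[3:4]]    # image 4
        test += [(img, cls) for img in imgs[4:]]    # images 5-6
    return train, val, test

def split_case2(class_ids, seed=0):
    """Case 2 (verification): classes are disjoint across splits.
    Half of the classes for training, the other half split into validation and test."""
    ids = list(class_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    train, rest = ids[:half], ids[half:]
    quarter = len(rest) // 2
    return train, rest[:quarter], rest[quarter:]

classes = [(person, hand, finger)
           for person in range(106)
           for hand in ("left", "right")
           for finger in ("index", "middle", "ring")]
train_ids, val_ids, test_ids = split_case2(classes)
print(len(classes), len(train_ids), len(val_ids), len(test_ids))  # 636 318 159 159
```

Because `split_case1` always puts the last two captures of every class in the test set, any systematic change between early and late captures ends up only in the test set, which is exactly the kind of mismatch described above.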
Next, since three images per class are not enough to train a good model, we increased the number of images in the training set with data augmentation. Data augmentation is a form of regularization: when a dataset is too small for a model to generalize, it generates more training data and reduces the risk of overfitting. Images can be augmented by rotation, translation, flipping, brightness changes, and so on. In our method, 12 images were generated for each image in the training set by slight translation, rotation, and brightness modification.
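A minimal augmentation sketch with NumPy and SciPy, producing 12 variants per image as described. The ranges for the shift (pixels), rotation angle (degrees), and brightness gain are assumptions, not the values used in our experiments, and `augment` is a hypothetical helper name.

```python
import numpy as np
from scipy import ndimage

def augment(img, n=12, max_shift=4.0, max_angle=3.0, max_gain=0.15, seed=None):
    """Return n slightly translated, rotated and brightness-modified copies of img."""
    rng = np.random.default_rng(seed)
    img = np.asarray(img, dtype=float)
    out = []
    for _ in range(n):
        dy, dx = rng.uniform(-max_shift, max_shift, size=2)  # slight translation
        angle = rng.uniform(-max_angle, max_angle)           # slight rotation
        gain = 1.0 + rng.uniform(-max_gain, max_gain)        # brightness change
        aug = ndimage.shift(img, (dy, dx), mode="nearest")
        aug = ndimage.rotate(aug, angle, reshape=False, mode="nearest")
        out.append(np.clip(aug * gain, 0, 255))              # stay in the 8-bit range
    return out
```

Flipping is deliberately left out here, in line with the transformations listed for our method; the transformations are kept small so that the vein pattern still looks like a plausible capture of the same finger.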
For the second case, difference images were generated for authentic and imposter matching. From each class of the training set, one image was selected for enrollment. For each remaining image, an authentic difference image was computed by subtracting the enrollment image of the same class from it, and an imposter difference image was computed by subtracting the enrollment image of a randomly selected different class. As a result, there are equal numbers of authentic and imposter difference images. The model in the second case is a binary classifier that distinguishes authentic from imposter matching.
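The construction of difference images can be sketched with NumPy. It assumes the first image of each class is the enrollment image (the text only says one image is selected) and keeps signed differences in `float32`, so subtracting `uint8` images cannot wrap around; whether to use signed or absolute differences is a design choice not specified here. Labels are 1 for authentic and 0 for imposter, and `make_difference_pairs` is a hypothetical name.

```python
import random
import numpy as np

def make_difference_pairs(dataset, seed=0):
    """dataset maps class id -> list of 2-D images; returns (difference images, labels)."""
    rng = random.Random(seed)
    # Assumption: the first image of every class is its enrollment image
    enrolled = {cls: np.asarray(imgs[0], dtype=np.float32) for cls, imgs in dataset.items()}
    classes = list(dataset)
    samples, labels = [], []
    for cls, imgs in dataset.items():
        others = [c for c in classes if c != cls]
        for img in imgs[1:]:
            img = np.asarray(img, dtype=np.float32)
            samples.append(img - enrolled[cls])    # authentic: same class
            labels.append(1)
            other = rng.choice(others)             # randomly selected different class
            samples.append(img - enrolled[other])  # imposter: different class
            labels.append(0)
    return np.stack(samples), np.array(labels)
```

Each remaining image yields exactly one authentic and one imposter sample, which is why the two classes of the binary problem end up balanced.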
VGG-16. Image from 
VGG-16 is a very deep convolutional neural network with 138 million parameters, and training this architecture requires large-scale datasets. Our training data is much smaller than ImageNet, so we fine-tuned a pre-trained VGG-16 model on our datasets. We tried training from different layers: training more of the early layers increases the risk of overfitting, while training only the output layers gives unsatisfying performance. In the end, we selected the model fine-tuned from the last convolutional layer.
Many experiments were conducted to find better hyper-parameters and training strategies. Although the models can still be improved, the results we have obtained so far are promising.
We conducted experiments with two kinds of models. The first is a multi-class classification model for identification. The second is a binary classification model for verification, whose inputs are the difference images. Both models are fine-tuned from the last convolutional layer of a pre-trained VGG-16 model. The results below come from models trained without ROI preprocessing. We also experimented with ROI-preprocessed data, but the results did not meet our expectations: since the dataset is of low quality and the quality of some ROI-preprocessed images is not guaranteed, the model could not classify the classes well.
Results of classifiers
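Unlike classification models, which are evaluated by accuracy, biometric authentication systems have their own metrics for measuring authentication quality. The three most frequently used are the False Rejection Rate (FRR), the fraction of genuine attempts that are wrongly rejected; the False Acceptance Rate (FAR), the fraction of imposter attempts that are wrongly accepted; and the Equal Error Rate (EER), the error rate at the threshold where FAR and FRR are equal.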
For the identification model, FAR, FRR, and EER are calculated from Euclidean distances between images. First, we calculated the intra-class and inter-class distances to define the range of thresholds, which spans from 83 to 2024. Then, we drew a FAR-FRR plot to determine the EER. The final EER is 4.1%.
Intra-class and Inter-class distance of the identification model
FAR-FRR plot of the identification model
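Given arrays of intra-class (genuine) and inter-class (imposter) distances, the FAR-FRR curves and the EER can be computed by sweeping the threshold over the observed distance range. This NumPy sketch accepts a pair when its distance is at most the threshold and takes the EER at the threshold where the two curves are closest; `far_frr_eer` and the number of thresholds are hypothetical choices.

```python
import numpy as np

def far_frr_eer(intra, inter, num_thresholds=1000):
    """Distance-based FAR/FRR curves and EER; a pair is accepted when distance <= threshold."""
    intra = np.asarray(intra, dtype=float)  # genuine pairs (same finger)
    inter = np.asarray(inter, dtype=float)  # imposter pairs (different fingers)
    lo = min(intra.min(), inter.min())
    hi = max(intra.max(), inter.max())
    thresholds = np.linspace(lo, hi, num_thresholds)
    far = np.array([(inter <= t).mean() for t in thresholds])  # imposters wrongly accepted
    frr = np.array([(intra > t).mean() for t in thresholds])   # genuine pairs wrongly rejected
    i = int(np.argmin(np.abs(far - frr)))  # where the two curves are closest
    eer = (far[i] + frr[i]) / 2
    return thresholds, far, frr, eer

_, _, _, eer = far_frr_eer([1, 2, 3, 4], [3.5, 5, 6, 7])
print(eer)  # 0.25
```

Plotting `far` and `frr` against `thresholds` gives a FAR-FRR plot of the kind shown above, with the EER at the crossing point.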
For the verification model, using the Euclidean distance between features directly would lead to results that are hard to interpret. Here, intra-class and inter-class distances would be distances between difference images of the same class and distances between difference images of different classes, and it is hard to say what “the distance between two difference images” means. A difference image describes how two images differ, and that difference can be small or large depending on the original images, so the Euclidean distance between features of difference images does not seem to be an appropriate measure. Moreover, FAR and FRR are easy to evaluate for a binary classification problem: FAR is the number of false positives (FP) divided by the number of imposter samples, and FRR is the number of false negatives (FN) divided by the number of authentic samples. We used the predicted probability as the score, computed FAR and FRR from it, and obtained an EER of 1.9%.
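For the verification model, the same rates can be computed from the classifier's predicted probability that a difference image is an authentic match. This NumPy sketch accepts a sample when that probability is at least the threshold, normalizes FP by the number of imposter samples and FN by the number of authentic samples, and sweeps a threshold grid on [0, 1]; the helper names and the grid size are hypothetical.

```python
import numpy as np

def rates_from_scores(scores, labels, threshold):
    """scores: predicted probability of an authentic match; labels: 1 authentic, 0 imposter."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    accepted = scores >= threshold
    far = np.mean(accepted[labels == 0])   # FP / number of imposter samples
    frr = np.mean(~accepted[labels == 1])  # FN / number of authentic samples
    return far, frr

def eer_from_scores(scores, labels, num_thresholds=1001):
    """Return (EER, threshold) at the point where FAR and FRR are closest."""
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    rates = np.array([rates_from_scores(scores, labels, t) for t in thresholds])
    i = int(np.argmin(np.abs(rates[:, 0] - rates[:, 1])))
    return rates[i].mean(), thresholds[i]

far, frr = rates_from_scores([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0], threshold=0.5)
print(far, frr)  # 0.5 0.5
```

Working with scores instead of feature distances avoids having to interpret distances between difference images: the classifier output is already a measure of how likely a pair is an authentic match.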
Deep learning methods are data-driven, so data quality is a key element in achieving successful results. The dataset we used is not of high quality, yet the results are still promising: both the identification model and the verification model outperform the results reported in the paper we referred to. Compared with conventional methods, deep learning methods are fast to implement and more straightforward to build, without diving into complicated feature handling and engineering. However, compared with some results in other papers (not all of which use deep learning), there is still room for improvement in data preprocessing, model architectures, and the choice of hyper-parameters.