Royce  Reinger

Royce Reinger


Siamese & Triplet Networks with online Pair/triplet Mining in Pytorch

Siamese and triplet learning with online pair/triplet mining

PyTorch implementation of siamese and triplet networks for learning embeddings.

Siamese and triplet networks are useful to learn mappings from image to a compact Euclidean space where distances correspond to a measure of similarity [2]. Embeddings trained in such way can be used as features vectors for classification or few-shot learning tasks.


Requires pytorch 0.4 with torchvision 0.2.1

For pytorch 0.3 compatibility checkout tag torch-0.3.1

Code structure

    • SiameseMNIST class - wrapper for a MNIST-like dataset, returning random positive and negative pairs
    • TripletMNIST class - wrapper for a MNIST-like dataset, returning random triplets (anchor, positive and negative)
    • BalancedBatchSampler class - BatchSampler for data loader, randomly chooses n_classes and n_samples from each class based on labels
    • EmbeddingNet - base network for encoding images into embedding vector
    • ClassificationNet - wrapper for an embedding network, adds a fully connected layer and log softmax for classification
    • SiameseNet - wrapper for an embedding network, processes pairs of inputs
    • TripletNet - wrapper for an embedding network, processes triplets of inputs
    • ContrastiveLoss - contrastive loss for pairs of embeddings and pair target (same/different)
    • TripletLoss - triplet loss for triplets of embeddings
    • OnlineContrastiveLoss - contrastive loss for a mini-batch of embeddings. Uses a PairSelector object to find positive and negative pairs within a mini-batch using ground truth class labels and computes contrastive loss for these pairs
    • OnlineTripletLoss - triplet loss for a mini-batch of embeddings. Uses a TripletSelector object to find triplets within a mini-batch using ground truth class labels and computes triplet loss
    • fit - unified function for training a network with different number of inputs and different types of loss functions
    • Sample metrics that can be used with fit function from
    • PairSelector - abstract class defining objects generating pairs based on embeddings and ground truth class labels. Can be used with OnlineContrastiveLoss.
      • AllPositivePairSelector, HardNegativePairSelector - PairSelector implementations
    • TripletSelector - abstract class defining objects generating triplets based on embeddings and ground truth class labels. Can be used with OnlineTripletLoss.
      • AllTripletSelector, HardestNegativeTripletSelector, RandomNegativeTripletSelector, SemihardNegativeTripletSelector - TripletSelector implementations

Examples - MNIST

We'll train embeddings on MNIST dataset. Experiments were run in jupyter notebook.

We'll go through learning supervised feature embeddings using different loss functions on MNIST dataset. This is just for visualization purposes, thus we'll be using 2-dimensional embeddings which isn't the best choice in practice.

For every experiment the same embedding network is used (32 conv 5x5 -> PReLU -> MaxPool 2x2 -> 64 conv 5x5 -> PReLU -> MaxPool 2x2 -> Dense 256 -> PReLU -> Dense 256 -> PReLU -> Dense 2) and we don't perform any hyperparameter search.

Baseline - classification with softmax

We add a fully-connected layer with the number of classes and train the network for classification with softmax and cross-entropy. The network trains to ~99% accuracy. We extract 2 dimensional embeddings from penultimate layer:

Train set:

Test set:

While the embeddings look separable (which is what we trained them for), they don't have good metric properties. They might not be the best choice as a descriptor for new classes.

Siamese network

Now we'll train a siamese network that takes a pair of images and trains the embeddings so that the distance between them is minimized if they're from the same class and is greater than some margin value if they represent different classes. We'll minimize a contrastive loss function [1]:

SiameseMNIST class samples random positive and negative pairs that are then fed to Siamese Network.

After 20 epochs of training here are the embeddings we get for training set:

Test set:

The learned embeddings are clustered much better within class.

Triplet network

We'll train a triplet network, that takes an anchor, a positive (of same class as an anchor) and negative (of different class than an anchor) examples. The objective is to learn embeddings such that the anchor is closer to the positive example than it is to the negative example by some margin value.

alt text Source: Schroff, Florian, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. CVPR 2015.

Triplet loss:

TripletMNIST class samples a positive and negative example for every possible anchor.

After 20 epochs of training here are the embeddings we get for training set:

Test set:

The learned embeddings are not as close to each other within class as in case of siamese network, but that's not what we optimized them for. We wanted the embeddings to be closer to other embeddings from the same class than from the other classes and we can see that's where the training is going to.

Online pair/triplet selection - negative mining

There are couple of problems with siamese and triplet networks:

  1. The number of possible pairs/triplets grows quadratically/cubically with the number of examples. It's infeasible to process them all and the training converges slowly.
  2. We generate pairs/triplets randomly. As the training continues, more and more pairs/triplets are easy to deal with (their loss value is very small or even 0), preventing the network from training. We need to provide the network with hard examples.
  3. Each image that is fed to the network is used only for computation of contrastive/triplet loss for only one pair/triplet. The computation is somewhat wasted; once the embedding is computed, it could be reused for many pairs/triplets.

To deal with these issues efficiently, we'll feed a network with standard mini-batches as we did for classification. The loss function will be responsible for selection of hard pairs and triplets within mini-batch. If we feed the network with 16 images per 10 classes, we can process up to 159*160/2 = 12720 pairs and 10*16*15/2*(9*16) = 172800 triplets, compared to 80 pairs and 53 triplets in previous implementation.

Usually it's not the best idea to process all possible pairs or triplets within a mini-batch. We can find some strategies on how to select triplets in [2] and [3].

Online pair selection

We'll feed a network with mini-batches, as we did for classification network. This time we'll use a special BatchSampler that will sample n_classes and n_samples within each class, resulting in mini batches of size n_classes*n_samples.

For each mini batch positive and negative pairs will be selected using provided labels.

MNIST is a rather easy dataset and the embeddings from the randomly selected pairs were quite good already, we don't see much improvement here.

Train embeddings:

Test embeddings:

Online triplet selection

We'll feed a network with mini-batches just like with online pair selection. There are couple of strategies we can use for triplet selection given labels and predicted embeddings:

  • All possible triplets (might be too many)
  • Hardest negative for each positive pair (will result in the same negative for each anchor)
  • Random hard negative for each positive pair (consider only triplets with positive triplet loss value)
  • Semi-hard negative for each positive pair (similar to [2])

The strategy for triplet selection must be chosen carefully. A bad strategy might lead to inefficient training or, even worse, to model collapsing (all embeddings ending up having the same values).

Here's what we got with random hard negatives for each positive pair.

Training set:

Test set:


Similar experiments were conducted for FashionMNIST dataset where advantages of online negative mining are slightly more visible. The exact same network architecture with only 2-dimensional embeddings was used, which is probably not complex enough for learning good embeddings. More complex datasets with higher number classses should benefit even more from online mining.

Baseline - classification

Siamese vs online contrastive loss with negative mining

Siamese network with randomly selected pairs

Online contrastive loss with negative mining

Triplet vs online triplet loss with negative mining

Triplet network with random triplets

Online triplet loss with negative mining


  •  Optimize triplet selection
  •  Evaluate with a metric that is comparable between approaches
  •  Evaluate in one-shot setting when classes from test set are not in train set
  •  Show online triplet selection example on more difficult datasets


[1] Raia Hadsell, Sumit Chopra, Yann LeCun, Dimensionality reduction by learning an invariant mapping, CVPR 2006

[2] Schroff, Florian, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. CVPR 2015

[3] Alexander Hermans, Lucas Beyer, Bastian Leibe, In Defense of the Triplet Loss for Person Re-Identification, 2017

[4] Brandon Amos, Bartosz Ludwiczuk, Mahadev Satyanarayanan, OpenFace: A general-purpose face recognition library with mobile applications, 2016

[5] Yi Sun, Xiaogang Wang, Xiaoou Tang, Deep Learning Face Representation by Joint Identification-Verification, NIPS 2014

Download Details:

Author: Adambielski
Source Code: 
License: BSD-3-Clause license

#machinelearning #deeplearning #pytorch #embeddings 

Siamese & Triplet Networks with online Pair/triplet Mining in Pytorch
Royce  Reinger

Royce Reinger


Face Recognition & Facial Attribute Analysis Library for Python


Deepface is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for python. It is a hybrid face recognition framework wrapping state-of-the-art models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID, ArcFace, Dlib and SFace.

Experiments show that human beings have 97.53% accuracy on facial recognition tasks whereas those models already reached and passed that accuracy level.


The easiest way to install deepface is to download it from PyPI. It's going to install the library itself and its prerequisites as well.

$ pip install deepface

DeepFace is also available at Conda. You can alternatively install the package via conda.

$ conda install -c conda-forge deepface

Then you will be able to import the library and use its functionalities.

from deepface import DeepFace

Facial Recognition - Demo

A modern face recognition pipeline consists of 5 common stages: detect, align, normalize, represent and verify. While Deepface handles all these common stages in the background, you don’t need to acquire in-depth knowledge about all the processes behind it. You can just call its verification, find or analysis function with a single line of code.

Face Verification - Demo

This function verifies face pairs as same person or different persons. It expects exact image paths as inputs. Passing numpy or base64 encoded images is also welcome. Then, it is going to return a dictionary and you should check just its verified key.

result = DeepFace.verify(img1_path = "img1.jpg", img2_path = "img2.jpg")

Face recognition - Demo

Face recognition requires applying face verification many times. Herein, deepface has an out-of-the-box find function to handle this action. It's going to look for the identity of input image in the database path and it will return pandas data frame as output.

df = DeepFace.find(img_path = "img1.jpg", db_path = "C:/workspace/my_db")


Face recognition models basically represent facial images as multi-dimensional vectors. Sometimes, you need those embedding vectors directly. DeepFace comes with a dedicated representation function.

embedding = DeepFace.represent(img_path = "img.jpg")

This function returns an array as output. The size of the output array would be different based on the model name. For instance, VGG-Face is the default model for deepface and it represents facial images as 2622 dimensional vectors.

assert isinstance(embedding, list)
assert model_name = "VGG-Face" and len(embedding) == 2622

Here, embedding is also plotted with 2622 slots horizontally. Each slot is corresponding to a dimension value in the embedding vector and dimension value is explained in the colorbar on the right. Similar to 2D barcodes, vertical dimension stores no information in the illustration.

Face recognition models - Demo

Deepface is a hybrid face recognition package. It currently wraps many state-of-the-art face recognition models: VGG-Face , Google FaceNet, OpenFace, Facebook DeepFace, DeepID, ArcFace, Dlib and SFace. The default configuration uses VGG-Face model.

models = [

#face verification
result = DeepFace.verify(img1_path = "img1.jpg", 
      img2_path = "img2.jpg", 
      model_name = models[1]

#face recognition
df = DeepFace.find(img_path = "img1.jpg",
      db_path = "C:/workspace/my_db", 
      model_name = models[1]

embedding = DeepFace.represent(img_path = "img.jpg", 
      model_name = models[1]

FaceNet, VGG-Face, ArcFace and Dlib are overperforming ones based on experiments. You can find out the scores of those models below on both Labeled Faces in the Wild and YouTube Faces in the Wild data sets declared by its creators.

ModelLFW ScoreYTF Score
Dlib99.38 %-


Face recognition models are regular convolutional neural networks and they are responsible to represent faces as vectors. We expect that a face pair of same person should be more similar than a face pair of different persons.

Similarity could be calculated by different metrics such as Cosine Similarity, Euclidean Distance and L2 form. The default configuration uses cosine similarity.

metrics = ["cosine", "euclidean", "euclidean_l2"]

#face verification
result = DeepFace.verify(img1_path = "img1.jpg", 
          img2_path = "img2.jpg", 
          distance_metric = metrics[1]

#face recognition
df = DeepFace.find(img_path = "img1.jpg", 
          db_path = "C:/workspace/my_db", 
          distance_metric = metrics[1]

Euclidean L2 form seems to be more stable than cosine and regular Euclidean distance based on experiments.

Facial Attribute Analysis - Demo

Deepface also comes with a strong facial attribute analysis module including age, gender, facial expression (including angry, fear, neutral, sad, disgust, happy and surprise) and race (including asian, white, middle eastern, indian, latino and black) predictions.

obj = DeepFace.analyze(img_path = "img4.jpg", 
        actions = ['age', 'gender', 'race', 'emotion']

Age model got ± 4.65 MAE; gender model got 97.44% accuracy, 96.29% precision and 95.05% recall as mentioned in its tutorial.

Face Detectors - Demo

Face detection and alignment are important early stages of a modern face recognition pipeline. Experiments show that just alignment increases the face recognition accuracy almost 1%. OpenCV, SSD, Dlib, MTCNN, RetinaFace and MediaPipe detectors are wrapped in deepface.

All deepface functions accept an optional detector backend input argument. You can switch among those detectors with this argument. OpenCV is the default detector.

backends = [

#face verification
obj = DeepFace.verify(img1_path = "img1.jpg", 
        img2_path = "img2.jpg", 
        detector_backend = backends[4]

#face recognition
df = DeepFace.find(img_path = "img.jpg", 
        db_path = "my_db", 
        detector_backend = backends[4]

embedding = DeepFace.represent(img_path = "img.jpg", 
        detector_backend = backends[4]

#facial analysis
demography = DeepFace.analyze(img_path = "img4.jpg", 
        detector_backend = backends[4]

#face detection and alignment
face = DeepFace.detectFace(img_path = "img.jpg", 
        target_size = (224, 224), 
        detector_backend = backends[4]

Face recognition models are actually CNN models and they expect standard sized inputs. So, resizing is required before representation. To avoid deformation, deepface adds black padding pixels according to the target size argument after detection and alignment.

RetinaFace and MTCNN seem to overperform in detection and alignment stages but they are much slower. If the speed of your pipeline is more important, then you should use opencv or ssd. On the other hand, if you consider the accuracy, then you should use retinaface or mtcnn.

The performance of RetinaFace is very satisfactory even in the crowd as seen in the following illustration. Besides, it comes with an incredible facial landmark detection performance. Highlighted red points show some facial landmarks such as eyes, nose and mouth. That's why, alignment score of RetinaFace is high as well.

You can find out more about RetinaFace on this repo.

Real Time Analysis - Demo

You can run deepface for real time videos as well. Stream function will access your webcam and apply both face recognition and facial attribute analysis. The function starts to analyze a frame if it can focus a face sequentially 5 frames. Then, it shows results 5 seconds. = "C:/User/Sefik/Desktop/database")

Even though face recognition is based on one-shot learning, you can use multiple face pictures of a person as well. You should rearrange your directory structure as illustrated below.

├── database
│   ├── Alice
│   │   ├── Alice1.jpg
│   │   ├── Alice2.jpg
│   ├── Bob
│   │   ├── Bob.jpg

API - Demo

Deepface serves an API as well. You can clone /api/ and pass it to python command as an argument. This will get a rest service up. In this way, you can call deepface from an external system such as mobile app or web.


Face recognition, facial attribute analysis and vector representation functions are covered in the API. You are expected to call these functions as http post methods. Service endpoints will be for face recognition, for facial attribute analysis, and for vector representation. You should pass input images as base64 encoded string in this case. Here, you can find a postman project.

Command Line Interface

DeepFace comes with a command line interface as well. You are able to access its functions in command line as shown below. The command deepface expects the function name as 1st argument and function arguments thereafter.

#face verification
$ deepface verify -img1_path tests/dataset/img1.jpg -img2_path tests/dataset/img2.jpg

#facial analysis
$ deepface analyze -img_path tests/dataset/img1.jpg

Tech Stack - Vlog, Tutorial

Face recognition models represent facial images as vector embeddings. The idea behind facial recognition is that vectors should be more similar for same person than different persons. The question is that where and how to store facial embeddings in a large scale system. Tech stack is vast to store vector embeddings. To determine the right tool, you should consider your task such as face verification or face recognition, priority such as speed or confidence, and also data size.


Pull requests are welcome! You should run the unit tests locally by running test/ Once a PR sent, GitHub test workflow will be run automatically and unit test results will be available in GitHub actions before approval.


There are many ways to support a project - starring⭐️ the GitHub repo is just one 🙏

You can also support this work on Patreon



Please cite deepface in your publications if it helps your research. Here are its BibTex entries:

If you use deepface for facial recogntion purposes, please cite the this publication.

  title        = {LightFace: A Hybrid Deep Face Recognition Framework},
  author       = {Serengil, Sefik Ilkin and Ozpinar, Alper},
  booktitle    = {2020 Innovations in Intelligent Systems and Applications Conference (ASYU)},
  pages        = {23-27},
  year         = {2020},
  doi          = {10.1109/ASYU50717.2020.9259802},
  url          = {},
  organization = {IEEE}

If you use deepface for facial attribute analysis purposes such as age, gender, emotion or ethnicity prediction, please cite the this publication.

  title        = {HyperExtended LightFace: A Facial Attribute Analysis Framework},
  author       = {Serengil, Sefik Ilkin and Ozpinar, Alper},
  booktitle    = {2021 International Conference on Engineering and Emerging Technologies (ICEET)},
  pages        = {1-4},
  year         = {2021},
  doi          = {10.1109/ICEET53442.2021.9659697},
  url          = {},
  organization = {IEEE}

Also, if you use deepface in your GitHub projects, please add deepface in the requirements.txt.

Download Details:

Author: Serengil
Source Code: 
License: MIT license

#machinelearning #python #deeplearning 

Face Recognition & Facial Attribute Analysis Library for Python