Our paper, _FR: Too Bias, or Not Too Bias?_, was published as part of the Workshop on Fair, Data-Efficient, and Trusted Computer Vision, held in conjunction with the 2020 Conference on Computer Vision and Pattern Recognition (CVPR).
The purpose of this brief tutorial is to provide the reader with the following:
Several components of (3) are on the _To-Do_ list for blogging. In other words, over the next several weeks, expect tutorials covering these topics in greater detail (i.e., often in more detail than the paper itself), mostly as independent blog posts. As they are completed, links for each will be listed here.
In the paper, we propose a solution to a specific problem of bias in FR (i.e., face verification, the one-to-one setting). Motivated by a lack of labeled data, we built and released the Balanced Faces in the Wild (BFW) dataset, a benchmark for face verification balanced across identity, ethnicity, and gender. We explored BFW to reveal evidence of bias and to gain insights by applying concepts from fundamental signal detection theory. We then benchmarked the data, establishing that there was indeed a skew in performance for minority demographics (e.g., Asian Females and Indian Males) and a clear percent difference for the majority (e.g., White Males and White Females). To mitigate the imbalanced performance, we propose a sliding (i.e., subgroup-specific) threshold that proves capable of removing the skew entirely (i.e., from a difference of several percent from the intended FPR down to 0%). We also include an experiment showing the same type of bias in humans: we evaluated human raters per demographic and found favored performance when perceiving differences within one's own subgroup. A key result of the paper is shown in the following figure.
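To make the subgroup-specific (sliding) threshold concrete, here is a minimal sketch. It assumes similarity scores for impostor (non-matching) pairs are available per subgroup; the function name and inputs are hypothetical, not the paper's released code:

```python
import numpy as np

def subgroup_thresholds(impostor_scores, target_fpr=1e-4):
    """For each subgroup, pick the score threshold whose false-positive
    rate on impostor pairs matches the target (intended) FPR.

    impostor_scores: dict mapping subgroup name (e.g., "WM", "AF") to an
    array of similarity scores for impostor (non-matching) pairs.
    """
    thresholds = {}
    for subgroup, scores in impostor_scores.items():
        # A pair is a false positive when its score exceeds the threshold,
        # so the threshold is the (1 - target_fpr) quantile of the
        # subgroup's impostor scores.
        thresholds[subgroup] = float(np.quantile(scores, 1.0 - target_fpr))
    return thresholds
```

In short, instead of one global operating point, each subgroup gets the threshold that realizes the intended FPR on its own impostor distribution.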
Percent difference from the intended FPR. Top: a global threshold (t_g) yields FPRs that span as large as 2x (i.e., 200% of) what was intended (i.e., WM at 1e-4). Furthermore, female (F) subgroups always rate worse than their male (M) counterparts, with the exception of IM in the 1e-4 column, which drops to the lowest negative % difference. Bottom: subgroup-specific thresholds reduce this difference to nearly zero; where small differences remain, the % difference is now balanced across subgroups and, thus, fair.
The top of the figure shows the percent difference between the intended FPR and the actual FPR on a per-subgroup basis. To avoid diving too deep into any one aspect of the paper (i.e., subject matter for future tutorials), let's move onward. In any case, interested readers are encouraged to take a closer look at the paper, check out the demo code for this figure, and/or raise questions or thoughts in the comment section below!
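For concreteness, the percent-difference metric behind the figure can be sketched as follows. This is a minimal illustration of the idea (the function name and inputs are hypothetical, not the paper's demo code): a single global threshold is applied to every subgroup, and each subgroup's actual FPR is compared against the intended one.

```python
import numpy as np

def percent_diff_from_intended(impostor_scores, global_threshold, intended_fpr):
    """Percent difference between each subgroup's actual FPR (under one
    global threshold) and the intended FPR.

    impostor_scores: dict mapping subgroup name to an array of similarity
    scores for impostor (non-matching) pairs.
    """
    diffs = {}
    for subgroup, scores in impostor_scores.items():
        # Actual FPR: fraction of impostor pairs scoring above the threshold.
        actual_fpr = float(np.mean(scores > global_threshold))
        diffs[subgroup] = 100.0 * (actual_fpr - intended_fpr) / intended_fpr
    return diffs
```

A subgroup whose impostor scores run high relative to the pooled population ends up with a positive % difference (more false positives than intended), while others land below the intended rate, which is exactly the skew the subgroup-specific thresholds remove.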
We hope BFW establishes a stronghold for assessing the current state of FR performance on different subgroups, split (and balanced) for assessment. In other words, we need a proxy to measure bias, and we must be especially strict when constructing it, given the potential for bias from a variety of sources.
#fairness #data #benchmark #data-analysis