Posted 2025-01-11 in CV

Face Recognition

How to solve the face recognition problem with one-shot learning, using a Siamese network and the triplet loss function.
Also covers face verification as a binary classification problem, introducing a logistic unit and the chi-square similarity.

Face Verification

1:1 problem
Input: image, name/ID of a certain person
Output: whether the input image shows that specific person

Face Recognition

1:K problem
Input: image
Output: the ID of the matching person if the input image shows one of the K persons, otherwise "not recognized"

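Note: a face verification model can be used for the face recognition problem only if it has very high accuracy.

If verification has a 1% chance of making a mistake on a single comparison, then comparing against K persons gives roughly a K% chance of at least one mistake in the recognition task (for small K; more precisely $1 - 0.99^K$ if the comparisons are independent).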
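A quick way to check the "K%" rule of thumb is to compute the compounded error directly. The sketch below assumes the K comparisons fail independently with the same probability, which is a simplification not stated in these notes; the function name is made up for illustration.

```python
def recognition_error_rate(p_single: float, k: int) -> float:
    """Probability that at least one of k independent 1:1 checks is wrong."""
    return 1.0 - (1.0 - p_single) ** k


# Compare a few database sizes for a 1% single-comparison error.
for k in (1, 10, 100):
    print(k, round(recognition_error_rate(0.01, k), 3))
```

Under this assumption a 1% verification error grows to about 9.6% for K = 10 and about 63% for K = 100, so the linear estimate only holds for small K.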

One-Shot Learning

Learn from a single example to recognize the person again. A conventional CNN classifier is a poor fit here: we don't have enough images of each person to train it, and we don't want to retrain the network every time a new person is added to the system.

Learn a similarity function ‘d’

d(img1, img2) = difference between img1 and img2

d(img1, img2) ≤ τ: img1 and img2 are the same person
d(img1, img2) > τ: img1 and img2 are different persons
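The thresholding rule can be sketched as follows. Here `d` is passed in as a function, since how it is computed is a separate question. The names `verify` and `recognize`, the toy numeric "images" and the toy distance are all placeholders for illustration.

```python
from typing import Any, Callable, Dict, Optional

Distance = Callable[[Any, Any], float]


def verify(d: Distance, img1: Any, img2: Any, tau: float) -> bool:
    """1:1 check: same person if the difference is at most tau."""
    return d(img1, img2) <= tau


def recognize(d: Distance, query: Any, database: Dict[str, Any], tau: float) -> Optional[str]:
    """1:K check: name of the closest person, or None if nobody is within tau."""
    best_name, best_dist = None, float("inf")
    for name, reference in database.items():
        dist = d(query, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= tau else None


# Toy stand-ins: "images" are numbers and d is their absolute difference.
toy_d = lambda a, b: abs(a - b)
toy_db = {"alice": 1.0, "bob": 5.0}
print(recognize(toy_d, 1.2, toy_db, tau=0.5))  # alice
print(recognize(toy_d, 3.0, toy_db, tau=0.5))  # None
```

Adding a new person only means adding one entry to the database; nothing is retrained.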

Siamese Network

Instead of reading the ‘softmax’ output layer used for classification, we take the output of a ‘fully-connected’ layer deeper in the network. The output vector of this FC layer is called the ‘encoding’ of the input image, written $f(img1)$ for image 1. Both images pass through the same network with the same parameters. The difference between img1 and img2 is measured by the ‘Euclidean distance’ between their encodings, which is
$
d(img1, img2) = \left\| f(img1) - f(img2) \right\|_2 = \sqrt{\sum_{k=1}^n \left(f(img1)_k - f(img2)_k\right)^2}
$
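As a small illustration of the distance formula, the sketch below computes it for two made-up encoding vectors. The function name and the example values are assumptions; a real encoding would come from the FC layer of the trained network.

```python
import math
from typing import Sequence


def encoding_distance(f1: Sequence[float], f2: Sequence[float]) -> float:
    """Euclidean distance between two encodings of equal length."""
    if len(f1) != len(f2):
        raise ValueError("encodings must have the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


f_img1 = [0.0, 3.0, 1.0]  # made-up encoding of image 1
f_img2 = [4.0, 0.0, 1.0]  # made-up encoding of image 2
print(encoding_distance(f_img1, f_img2))  # 5.0
```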

Goal of Learning Siamese Network

The parameters of the network determine the encoding $f(img_i)$.
Learn the parameters so that:
- If $img_i$ and $img_j$ are the ‘same’ person, $d(img_i, img_j)$ is small;
- If $img_i$ and $img_j$ are ‘different’ persons, $d(img_i, img_j)$ is large.

Triplet Loss Function

Triplet: always look at 3 images at a time: Anchor (A), Positive (P, the same person as the anchor) and Negative (N, a different person from the anchor)

Target: d(A, N) should exceed d(A, P) by at least a margin α, which is:

$d(A, P) + \alpha - d(A, N) \le 0$

Loss Function: $L(A, P, N) = \max(d(A, P) + \alpha - d(A, N), 0)$

Cost Function: $J = \sum_{i=1}^m L(A^{(i)}, P^{(i)}, N^{(i)})$, where m is the number of training triplets
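The loss and cost can be sketched on pre-computed encodings as below. This only evaluates the formulas; it does not train a network. The margin value 0.2 and the example vectors are illustrative assumptions, and d is taken to be the Euclidean distance between encodings.

```python
import math
from typing import Iterable, Sequence, Tuple

Encoding = Sequence[float]


def distance(u: Encoding, v: Encoding) -> float:
    """Euclidean distance between two encodings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(f_a: Encoding, f_p: Encoding, f_n: Encoding, alpha: float = 0.2) -> float:
    """max(d(A, P) + alpha - d(A, N), 0) for one triplet."""
    return max(distance(f_a, f_p) + alpha - distance(f_a, f_n), 0.0)


def triplet_cost(triplets: Iterable[Tuple[Encoding, Encoding, Encoding]], alpha: float = 0.2) -> float:
    """Sum of the per-triplet losses."""
    return sum(triplet_loss(a, p, n, alpha) for a, p, n in triplets)


anchor, positive = [0.0, 0.0], [0.0, 1.0]          # d(A, P) = 1
easy_negative, hard_negative = [3.0, 4.0], [0.0, 1.1]  # d(A, N) = 5 and 1.1
print(triplet_loss(anchor, positive, easy_negative))  # 0.0, margin already satisfied
print(triplet_loss(anchor, positive, hard_negative))  # about 0.1, margin violated
```

The easy negative contributes nothing to the cost, which is why triplets with a negative that is close to the anchor are the informative ones during training.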

Paper

FaceNet: A Unified Embedding for Face Recognition and Clustering.

Face Verification and Binary Classification

Previously: use the ‘Triplet’ loss function to train the parameters of the ConvNet

Binary Classification:

Feed two images ($ x^i, x^j $), one from the new input and one from the database, separately into the Siamese Network
Feed their encodings ($ f(x^i), f(x^j) $) into a logistic regression unit
Make a prediction (1: same person, 0: different persons)

Procedure of Logistic Unit

$
\hat{y} = \text{sigmoid}\left(\sum_{k=1}^n w_k \left|f(x^i)_k - f(x^j)_k\right| + b\right), \quad n = \text{number of features in the encoding}
$

The absolute difference $\left|f(x^i)_k - f(x^j)_k\right|$ can also be replaced by
$
\frac{\left(f(x^i)_k - f(x^j)_k\right)^2}{f(x^i)_k + f(x^j)_k}
$, which is called the ‘chi-square similarity’
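The logistic unit can be sketched with both feature choices, the absolute difference and the chi-square term. The weights and bias below are made-up numbers standing in for learned parameters, and the small `eps` in the denominator is an added safeguard against division by zero that is not part of the formula. The chi-square variant is assumed to receive non-negative encodings.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def abs_diff_features(f_i: Sequence[float], f_j: Sequence[float]) -> List[float]:
    """Element-wise |f(x^i)_k - f(x^j)_k|."""
    return [abs(a - b) for a, b in zip(f_i, f_j)]


def chi_square_features(f_i: Sequence[float], f_j: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Element-wise (a - b)^2 / (a + b); eps is an assumption to avoid 0/0."""
    return [(a - b) ** 2 / (a + b + eps) for a, b in zip(f_i, f_j)]


def predict_same_person(features: Sequence[float], w: Sequence[float], b: float) -> float:
    """y_hat = sigmoid(sum_k w_k * features_k + b)."""
    return sigmoid(sum(w_k * x_k for w_k, x_k in zip(w, features)) + b)


w, b = [-1.0, -1.0], 2.0  # hypothetical learned parameters
same = predict_same_person(abs_diff_features([1.0, 2.0], [1.0, 2.0]), w, b)
diff = predict_same_person(abs_diff_features([1.0, 2.0], [4.0, 6.0]), w, b)
print(same > 0.5, diff > 0.5)  # True False
```

With negative weights, larger differences between the encodings push the prediction towards 0 (different persons).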

Pre-Compute the encodings for Database Image

Pre-compute the encodings for all database images. When a new image comes in, we only need to compute its encoding and compare it with the pre-computed encodings to make a prediction.

This pre-computation works not only for the binary-classification approach to face verification, but also for the triplet loss approach.
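A minimal sketch of the pre-computation idea: the encoder runs once per database image up front, and only once more for each new query. `toy_encode` is a hypothetical stand-in for the trained network, and the function names, threshold and example data are assumptions for illustration. The comparison here uses the Euclidean distance; a logistic unit could be swapped in at the same place.

```python
import math
from typing import Callable, Dict, List, Sequence

Encoder = Callable[[Sequence[float]], List[float]]


def build_encoding_db(images: Dict[str, Sequence[float]], encode: Encoder) -> Dict[str, List[float]]:
    """Run the encoder once per database image and keep the results."""
    return {name: encode(img) for name, img in images.items()}


def verify_identity(query_img: Sequence[float], claimed_name: str,
                    encoding_db: Dict[str, List[float]], encode: Encoder, tau: float) -> bool:
    """Encode only the new image, then compare with the stored encoding."""
    query_enc = encode(query_img)
    stored_enc = encoding_db[claimed_name]  # raises KeyError for unknown names
    return math.dist(query_enc, stored_enc) <= tau


def toy_encode(img: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the network: just copies the pixel values."""
    return [float(v) for v in img]


db = build_encoding_db({"alice": (0.0, 0.0), "bob": (3.0, 4.0)}, toy_encode)
print(verify_identity((0.1, 0.0), "alice", db, toy_encode, tau=0.5))  # True
print(verify_identity((0.1, 0.0), "bob", db, toy_encode, tau=0.5))    # False
```

Storing encodings instead of raw images also means the database images themselves do not need to be kept around at query time.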

Paper

DeepFace: Closing the Gap to Human-Level Performance in Face Verification.

#Computer Vision
