【Face Detection】A Face Bounding Box Stabilization Algorithm Based on MTCNN

Introduction

Face boxes produced by face detectors based on deep convolutional neural networks tend to jitter noticeably from frame to frame, which is a real problem for some applications. This article shares some methods I have used to stabilize face boxes.

Main Content

MTCNN is currently widely used for face detection on mobile devices, and I have used this network as well. The stabilization method does not depend on MTCNN's own detector, though: if faces are detected with another method such as SSD or YOLO, simply pass the resulting face boxes through MTCNN's ONet, which outputs five facial landmarks.
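When the box comes from another detector, the only thing the stabilizer needs from ONet is the nose point in image coordinates. The sketch below assumes the layout used by common MTCNN implementations: ONet returns 10 values `[x0..x4, y0..y4]`, each a fraction of the input box's width or height, in the order left eye, right eye, nose, left mouth corner, right mouth corner. Running ONet itself (cropping the box and resizing it to 48×48) is not shown; the helper `nose_in_image` is hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels

# Assumed landmark order: left eye, right eye, nose, left mouth, right mouth.
NOSE_INDEX = 2


def nose_in_image(box: Box, onet_points: Sequence[float]) -> Tuple[float, float]:
    """Map ONet's normalized nose landmark back to image coordinates.

    Assumes onet_points is laid out as [x0..x4, y0..y4], with each value
    expressed as a fraction of the box width (x) or height (y).
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    nx = x1 + w * onet_points[NOSE_INDEX]       # x of the nose
    ny = y1 + h * onet_points[5 + NOSE_INDEX]   # y of the nose
    return nx, ny
```

For example, a YOLO/SSD box `(100, 50, 200, 170)` with a normalized nose at `(0.5, 0.6)` maps to roughly `(150, 122)` in the image.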

The idea is simple. ONet outputs five facial landmarks; take the nose coordinates and average them over the last N frames. From the averaged nose position, construct a fixed-size face box. Compute the IoU between this box and the face box from the previous frame. If the IoU exceeds a threshold, reuse the previous frame's box coordinates for the current frame. Finally, average the face box coordinates over the last N frames. The resulting box has minimal jitter and smoothly follows the movement of the face.
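The steps above can be sketched as a small per-face state object. This is an illustrative sketch under several assumptions not stated in the original: the fixed-size box is a square centered on the averaged nose, the "previous frame's box" used for the IoU check is the gated box before the final averaging, and the class name and default values (`box_size`, `n_frames`, `iou_threshold`) are hypothetical.

```python
from collections import deque
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class FaceBoxStabilizer:
    """Hypothetical helper: nose averaging + IoU gating + box averaging."""

    def __init__(self, box_size: float = 120.0, n_frames: int = 5,
                 iou_threshold: float = 0.8):
        self.half = box_size / 2.0
        self.noses = deque(maxlen=n_frames)  # last N nose positions
        self.boxes = deque(maxlen=n_frames)  # last N gated boxes
        self.iou_threshold = iou_threshold
        self.prev_box = None                 # gated box from the previous frame

    def update(self, nose: Tuple[float, float]) -> Box:
        # 1. Average the nose position over the last N frames.
        self.noses.append(nose)
        cx = sum(p[0] for p in self.noses) / len(self.noses)
        cy = sum(p[1] for p in self.noses) / len(self.noses)
        # 2. Build a fixed-size (assumed square) box around the averaged nose.
        box = (cx - self.half, cy - self.half, cx + self.half, cy + self.half)
        # 3. If it overlaps the previous box enough, keep the previous box.
        if self.prev_box is not None and iou(box, self.prev_box) > self.iou_threshold:
            box = self.prev_box
        self.prev_box = box
        # 4. Average the gated box coordinates over the last N frames.
        self.boxes.append(box)
        n = len(self.boxes)
        return tuple(sum(b[i] for b in self.boxes) / n for i in range(4))
```

Call `update` once per frame with the nose point from ONet and draw the returned box. With `box_size=100` and `iou_threshold=0.8`, a nose at `(200, 200)` followed by a one-pixel jitter to `(201, 199)` yields the same box `(150, 150, 250, 250)` both times, because the IoU check absorbs the jitter.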

To explain: the IoU check effectively suppresses small jitter, but when the face moves it makes the box advance in large steps, because the box stays put until the IoU falls below the threshold and then jumps to the new position. The N-frame average of the box coordinates is therefore applied afterward so that the box moves smoothly. Averaging the nose coordinates over N frames beforehand reduces how often these IoU-induced large steps occur.
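The size of those steps can be worked out in a simplified case (an assumption for illustration, not from the original): the previous and current boxes are both s × s squares and the face moves purely horizontally, so the current box is offset by d pixels with 0 ≤ d ≤ s. The overlap is (s − d)·s, which gives:

```latex
\mathrm{IoU}(d) = \frac{(s-d)\,s}{2s^2 - (s-d)\,s} = \frac{s-d}{s+d},
\qquad
\mathrm{IoU}(d) > t \iff d < d^{*} = s\,\frac{1-t}{1+t}.
```

So the gated box stays frozen until the face has drifted about d* pixels, then jumps by roughly d* in one frame. With hypothetical values s = 120 px and t = 0.8, d* = 120 · 0.2 / 1.8 ≈ 13.3 px. A moving average over N frames turns each such isolated jump into N increments of about d*/N, which is why the final averaging makes the motion look smooth.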

This algorithm, implemented with Caffe + GPU + Python, is only moderately fast: it takes about 100 ms per frame, even on my 1080 Ti.

The C++ + ncnn + CPU implementation is much faster, at around 10 ms per frame on an Intel® Xeon® CPU E5-2673 v3.

The C++ + ncnn and Python + Caffe code is available in the GitHub repository HansRen1024/Face-Tracking-Using-CNN-and-Optical-Flow, a C++ implementation of the paper "A Real-Time and Long-Term Face Tracking Method using Convolutional Neural Network and Optical Flow".

The Python + PyTorch code can be found at: https://github.com/HansRen1024/C-OF