Detection facility#

Overview#

The object detection facility is responsible for quick and coarse detection tasks, such as finding a face in an image.

Detection structure#

The detection structure represents an image-space bounding rectangle of the detected object as well as the detection score.

The detection score is a measure of confidence in the particular object classification result and may be used to pick the most "confident" face among many, as in the sketch below.
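
For instance, picking the best face might look like the following minimal sketch. The detection and score member names are assumptions and may differ between SDK versions:

    // Sketch only: select the detection with the highest score.
    // Assumes a non-empty container and the field names detection/score.
    #include <algorithm>
    #include <vector>

    fsdk::Face pickMostConfident(const std::vector<fsdk::Face>& faces) {
        return *std::max_element(faces.begin(), faces.end(),
            [](const fsdk::Face& a, const fsdk::Face& b) {
                return a.detection.score < b.detection.score;
            });
    }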

The detection score measures classification confidence, not source image quality. While the score is related to quality (low-quality data generally results in a lower score), it is not a valid metric for estimating the visual quality of an image.

Special estimators exist to fulfill this task (see section "Image Quality Estimation" in chapter "Parameter estimation facility" for details).

Face Detection#

Object detection is performed by the IDetector object. The function of interest is detect(). It requires an image to detect on and an area of interest (to virtually crop the image and look for faces only in the given location).
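
A minimal usage sketch is shown below. It assumes a batched detect() overload that takes spans of images and areas of interest, plus a DT_BBOX detection-type flag; the exact signatures and names may differ between SDK versions:

    // Sketch only: signatures are assumptions, check your SDK headers.
    fsdk::Image image;
    // ... load an R8G8B8 image into `image` ...

    const fsdk::Rect areaOfInterest = image.getRect(); // search the whole image

    auto result = detector->detect(
        fsdk::Span<const fsdk::Image>(&image, 1),
        fsdk::Span<const fsdk::Rect>(&areaOfInterest, 1),
        10,              // assumed parameter: maximum detections per image
        fsdk::DT_BBOX);  // assumed flag: request bounding boxes only
    if (result.isError()) {
        // handle the error code here
    }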

Also, the face detector implements detectAsync(), which allows you to asynchronously detect faces and their parameters in multiple images.

Note: Method detectAsync() is experimental, and its interface may be changed in the future.

Note: Method detectAsync() is not marked as noexcept and may throw an exception.

Image coordinate system#

The origin of the coordinate system for each processed image is located in the upper left corner.

Source image coordinate system

Face detection#

When a face is detected, a rectangular area with the face is defined. The area is represented using coordinates in the image coordinate system.

Redetect method#

The face detector implements the redetect() method, which is intended to optimize face detection on video frame sequences. Instead of doing full-blown detection on each frame, one may detect() new faces at a lower frequency (say, every 5th frame) and just confirm them in between with redetect(), as in the sketch below. This dramatically improves performance at the cost of detection recall. Note that redetect() updates face landmarks as well.
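
A hypothetical frame-loop sketch follows; the detectFaces()/redetectFaces() helpers stand in for the actual detect()/redetect() calls and are not SDK functions:

    // Full detection every 5th frame; cheap redetect() confirmation in between.
    std::vector<fsdk::Face> faces;
    for (size_t frameIdx = 0; frameIdx < frames.size(); ++frameIdx) {
        if (frameIdx % 5 == 0) {
            // full-blown detection: picks up faces that entered the scene
            faces = detectFaces(frames[frameIdx]);
        } else {
            // cheap confirmation of already-known faces
            faces = redetectFaces(frames[frameIdx], faces);
        }
    }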

Also, the face detector implements redetectAsync(), which allows you to asynchronously redetect faces in multiple images based on the detection results from previous frames.

Note: Method redetectAsync() is experimental, and its interface may be changed in the future.

Note: Method redetectAsync() is not marked as noexcept and may throw an exception.

The detector works faster with larger values of minFaceSize.

Orientation Estimation#

Name: OrientationEstimator

Algorithm description:

This estimator detects the orientation of the input image. The following outputs are supported:

  • The target image is normally oriented;
  • The target image is turned to the left by 90 deg;
  • The target image is flipped upside-down;
  • The target image is turned to the right by 90 deg.

Implementation description:

The estimator (see IOrientationEstimator in IOrientationEstimator.h):

  • Implements the estimate() function that accepts a source image in R8G8B8 format and returns the estimation result;

  • Implements the estimate() function that accepts an fsdk::Span of source images in R8G8B8 format and an fsdk::Span of fsdk::OrientationType enums to return the estimation results.

The OrientationType enumeration contains all possible results of the Orientation estimation:

    enum OrientationType : uint32_t {
        OT_NORMAL = 0,      //!< Normal orientation of image
        OT_LEFT = 1,        //!< Image is turned left by 90 deg
        OT_UPSIDE_DOWN = 2, //!< Image is flipped upsidedown
        OT_RIGHT = 3        //!< Image is turned right by 90 deg
    };
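
A usage sketch follows. It assumes estimate() returns a result object with isOk()/getValue() accessors, which is an assumption about the exact API:

    // Sketch: react to the estimated orientation before further processing.
    auto res = orientationEstimator->estimate(image);
    if (res.isOk()) {
        switch (res.getValue()) {
        case fsdk::OrientationType::OT_LEFT:
            // image is turned left: rotate it 90 deg clockwise
            break;
        case fsdk::OrientationType::OT_RIGHT:
            // image is turned right: rotate it 90 deg counter-clockwise
            break;
        case fsdk::OrientationType::OT_UPSIDE_DOWN:
            // rotate the image 180 deg
            break;
        case fsdk::OrientationType::OT_NORMAL:
        default:
            break; // nothing to do
        }
    }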

API structure name:

IOrientationEstimator

Plan files:

  • orientation_cpu.plan
  • orientation_cpu-avx2.plan
  • orientation_gpu.plan

Detector variants#

Supported detector variants:

  • FaceDetV1
  • FaceDetV2
  • FaceDetV3

There are two basic detector families. The first family includes two detector variants: FaceDetV1 and FaceDetV2. The second family currently includes only one detector variant, FaceDetV3. FaceDetV3 is the latest and most precise detector. A sensor type can be passed for this detector type. In terms of performance, FaceDetV3 is similar to FaceDetV1.

User code may specify the required detector type via a parameter when creating the IDetector object, as shown below.
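
For example (the ObjectDetectorClassType parameter type and FACE_DET_V3 value are assumptions; check your SDK headers for the exact names):

    // Sketch: create a FaceDetV3 detector; names are assumptions.
    auto res = faceEngine->createDetector(fsdk::ObjectDetectorClassType::FACE_DET_V3);
    if (res.isOk()) {
        auto detector = res.getValue();
        // ... use detector->detect(...) as usual ...
    }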

FaceDetV1 and FaceDetV2 performance depends on the number of faces in the image and on image complexity. FaceDetV3 performance depends only on the target image resolution.

FaceDetV3 works faster with batched redetect.

FaceDetV3 supports asynchronous methods for detection and redetection. FaceDetV1 and FaceDetV2 will return a "not implemented" error.

FaceDetV1 and FaceDetV2 Configuration#

The FaceDetV1 detector is more precise, while FaceDetV2 works two times faster (see chapter "Appendix A. Specifications").

FaceDetV1 and FaceDetV2 performance depends on the number of faces in the image. FaceDetV3 does not depend on it, so it may be slower than FaceDetV1 on images with a single face and much faster on images with many faces.

FaceDetV3 Configuration#

FaceDetV3 detects faces of sizes from minFaceSize to minFaceSize * 32. You can change the minimum size of the faces to search for in the photo via the faceengine.conf configuration.

For example:

    config->setValue("FaceDetV3::Settings", "minFaceSize", 20);

The detector's logic is straightforward: the smaller the faces to be found, the more time detection takes.

We recommend using the following values for minFaceSize: 20, 40, and 90. A size of 90 px is recommended for recognition. If you want to find faces of a custom size, set minFaceSize to about 95% of that value. For example, to find faces of 50 px in size, the config value should be 50 * 0.95 ~ 47 px.

FaceDetV3 provides accurate 5 landmarks only for faces with a size greater than 40x40 pixels; for smaller faces, it provides less accurate landmarks.

If there are few faces in the target images and the face sizes after resizing will be less than 40x40, it is recommended to request 68 landmarks.

If there are many faces in the target image (more than 7), it is faster to increase minFaceSize so that faces are large enough for accurate landmark estimation.

All recent changes in the face detection logic are described in the chapter "Migration guide".

Face Alignment#

Five landmarks#

Face alignment is the process of special key points (called "landmarks") detection on a face. FaceEngine does landmark detection at the same time as the face detection since some of the landmarks are by-products of that detection.

At the very minimum, just 5 landmarks are required: two for the eyes, one for the nose tip, and two for the mouth corners. Using these coordinates, one may warp the source photo image (see Chapter "Image warping") for use with all other FaceEngine algorithms.

All detectors provide 5 landmarks for each detection without additional computations.

Typical use cases for 5 landmarks:

  • Image warping for use with other algorithms:
      • Quality and attribute estimators;
      • Descriptor extraction.

Sixty-eight landmarks#

More advanced 68-point face alignment is also implemented. Use it when you need precise information about a face and its parts. The detected points are shown in the image below.

The 68 landmarks require additional computation time, so do not request them if you do not need precise information about a face. If you use 68 landmarks, the 5 landmarks will be reassigned to a more precise subset of the 68 landmarks.

68-point face alignment

The typical error for landmark estimation on a warped image (see Chapter "Image warping") is in the table below.

"Average point estimation error per landmark"

Point Error (pixels) Point Error (pixels) Point Error (pixels) Point Error (pixels)
1 ±3,88 18 ±3,77 35 ±1,62 52 ±1,65
2 ±3,53 19 ±2,83 36 ±1,90 53 ±2,01
3 ±3,88 20 ±2,70 37 ±1,78 54 ±2,00
4 ±4,30 21 ±3,06 38 ±1,69 55 ±1,93
5 ±4,67 22 ±3,92 39 ±1,63 56 ±2,18
6 ±4,87 23 ±3,46 40 ±1,52 57 ±2,17
7 ±4,67 24 ±2,59 41 ±1,54 58 ±1,99
8 ±4,01 25 ±2,53 42 ±1,60 59 ±2,32
9 ±3,46 26 ±2,95 43 ±1,55 60 ±2,33
10 ±3,87 27 ±3,84 44 ±1,60 61 ±2,06
11 ±4,56 28 ±1,88 45 ±1,74 62 ±1,97
12 ±4,94 29 ±1,75 46 ±1,72 63 ±1,56
13 ±4,55 30 ±1,92 47 ±1,68 64 ±1,86
14 ±4,45 31 ±2,20 48 ±1,65 65 ±1,94
15 ±4,13 32 ±1,97 49 ±1,99 66 ±2,00
16 ±3,68 33 ±1,70 50 ±1,99 67 ±1,70
17 ±4,09 34 ±1,73 51 ±1,95 68 ±2,12

Simple 5-point landmarks roughly correspond to the following (a mapping sketch is given after the list):

  • Average of positions 37, 40 for a left eye;
  • Average of positions 43, 46 for a right eye;
  • Number 31 for a nose tip;
  • Numbers 49 and 55 for mouth corners.
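
As a rough mapping sketch, assuming a Landmarks68-like container of fsdk::Point2f with x/y members and 0-based indexing (so table point N corresponds to index N - 1; the lm68 variable is hypothetical):

    // Derive approximate 5-point landmarks from the 68-point set.
    auto avg = [](const fsdk::Point2f& a, const fsdk::Point2f& b) {
        return fsdk::Point2f{(a.x + b.x) * 0.5f, (a.y + b.y) * 0.5f};
    };
    fsdk::Point2f leftEye    = avg(lm68[36], lm68[39]); // points 37 and 40
    fsdk::Point2f rightEye   = avg(lm68[42], lm68[45]); // points 43 and 46
    fsdk::Point2f noseTip    = lm68[30];                // point 31
    fsdk::Point2f mouthLeft  = lm68[48];                // point 49
    fsdk::Point2f mouthRight = lm68[54];                // point 55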

The face detector outputs the landmarks for both cases via the Landmarks5 and Landmarks68 structures. Note that, performance-wise, the 5-point alignment result comes free with face detection, whereas the 68-point result does not. So you should generally request the lowest number of points your task requires.
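
For illustration, the detection-type flags might be combined as follows, continuing the earlier detect() sketch (the DT_BBOX, DT_LANDMARKS5, and DT_LANDMARKS68 flag names are assumptions):

    // 5-point alignment comes almost free with detection:
    auto withFive = detector->detect(images, rects, 10,
        fsdk::DetectionType(fsdk::DT_BBOX | fsdk::DT_LANDMARKS5));

    // 68 points cost extra computation; request them only when required:
    auto withAll = detector->detect(images, rects, 10,
        fsdk::DetectionType(fsdk::DT_BBOX | fsdk::DT_LANDMARKS68));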

Typical use cases for 68 landmarks:

  • Segmentation;
  • Head pose estimation.

Human Detection#

This functionality enables you to detect human bodies in the image.

During the detection process, we receive special points (called "landmarks", specifically HumanLandmarks17) for the body parts visible in the image. These landmarks represent the keypoints of a human body (see the "Human Keypoints" section).

Human body detection is performed by the IHumanDetector object. The function of interest is detect(). It requires an image to detect on.

Image coordinate system#

The origin of the coordinate system for each processed image is located in the upper left corner.

Source image coordinate system

Human body detection#

When a human body is detected, a rectangular area with the body is defined. The area is represented using coordinates in the image coordinate system.

Constraints#

Human body detection has the following constraints:

  • The human body detector works correctly only with adult humans in an image;
  • The detector may detect a body of size from 100 px to 640 px (in an image with a long side of 640 px). You may change the input image size in the config (see ./doc/ConfigurationGuide.pdf). The image will be resized to the specified size by the larger side while maintaining the aspect ratio.

Camera position requirements#

In general, you should locate the camera for human detection according to the image below.

Camera position for human detection

Follow these recommendations to correctly detect a human body and its keypoints:

  • The person's body should face the camera;

  • Keep the angle of view as close to horizontal as possible;

  • About 60% of the person's body should be in the frame (the upper body);

  • There must not be any objects that overlap the person's body in the frame;

  • The camera should be located at about 165 cm above the floor, which corresponds to the average height of a human.

The examples of wrong camera positions are shown in the image below.

Wrong camera positions

Human body redetection#

Like any other detector in the Face Engine SDK, the human detector also implements a redetection model. The user can perform full detection only on the first frame and then redetect the same human in the next "n" frames, thereby boosting the performance of the whole image processing loop.

Use the redetectOne() method if only a single human redetection is required; for more complex use cases, use redetect(), which can redetect humans in multiple images. A tracking sketch is shown below.
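
A hypothetical single-person tracking sketch follows; the redetectOne() signature shown here, the result accessors, and the initialDetection/nextFrames variables are all assumptions:

    // Track one person: full detect() once, then redetect on subsequent frames.
    fsdk::Human human = initialDetection; // from a full detect() on frame 0
    for (const fsdk::Image& frame : nextFrames) {
        auto r = humanDetector->redetectOne(frame, human);
        if (r.isError())
            break; // track lost: fall back to a full detect()
        human = r.getValue();
    }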

The detector also provides the ability to detect human body keypoints in an image.


Human Keypoints#

The image below shows the keypoints detected for a human body.

17 points of the human body

| Point | Body Part | Point | Body Part |
|-------|-----------|-------|-----------|
| 0 | Nose | 9 | Left Wrist |
| 1 | Left Eye | 10 | Right Wrist |
| 2 | Right Eye | 11 | Left Hip |
| 3 | Left Ear | 12 | Right Hip |
| 4 | Right Ear | 13 | Left Knee |
| 5 | Left Shoulder | 14 | Right Knee |
| 6 | Right Shoulder | 15 | Left Ankle |
| 7 | Left Elbow | 16 | Right Ankle |
| 8 | Right Elbow | | |

Cases that increase the probability of error:

  • Non-standard poses (head below the shoulders, vertical splits, lying with the head toward the camera, etc.);
  • The camera positioned above at a large angle;
  • The estimator sometimes predicts invisible points with a high score, especially for elbow, wrist, and ear points.

Detection#

To detect human keypoints, call detect() using the fsdk::HumanDetectionType::DCT_BOX | fsdk::HumanDetectionType::DCT_POINTS argument.

The default is fsdk::HumanDetectionType::DCT_BOX.
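
A call sketch follows. Only the HumanDetectionType flags are taken from this section; the batched signature and the image/rect/maxDetections variables are assumptions:

    // Request bounding boxes and keypoints in one call.
    auto res = humanDetector->detect(
        fsdk::Span<const fsdk::Image>(&image, 1),
        fsdk::Span<const fsdk::Rect>(&rect, 1),
        maxDetections, // assumed: maximum detections per image
        fsdk::HumanDetectionType(
            fsdk::HumanDetectionType::DCT_BOX | fsdk::HumanDetectionType::DCT_POINTS));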

Main Results of Each Detection#

The main result of each detection is an array. Each array element consists of a point (fsdk::Point2f) and a score. If the score value is less than the threshold, the "x" and "y" coordinates of the point will be equal to 0.
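
Consumers should therefore filter by score, as in this sketch (the point/score member names, the threshold variable, and the drawPoint() helper are hypothetical):

    // Skip suppressed keypoints: their coordinates are zeroed out.
    for (const auto& landmark : humanLandmarks17) {
        if (landmark.score < threshold)
            continue; // below threshold: x and y are 0, the point is unreliable
        drawPoint(landmark.point.x, landmark.point.y); // hypothetical helper
    }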

See ConfigurationGuide.pdf ("HumanDetector settings" section) for more information about thresholds and configuration parameters.