Detection facility#

Overview#

The object detection facility is responsible for quick, coarse detection tasks, such as finding a face in an image.

Detection structure#

The detection structure represents an image-space bounding rectangle of the detected object, as well as the detection score.

The detection score is a measure of confidence in a particular object classification result and may be used to pick the most "confident" face among many.

Note: The detection score is a measure of classification confidence, not of source image quality. While the score is related to quality (low-quality data generally results in a lower score), it is not a valid metric for estimating the visual quality of an image. Special estimators exist for this task (see section "Image quality estimation" in chapter "Parameter estimation facility" for details).

Face Detection#

Object detection is performed by the IDetector object. The function of interest is detect(). It requires an image to detect on and an area of interest (to virtually crop the image and look for faces only in the given region).

Image coordinate system#

The origin of the coordinate system for each processed image is located in the upper left corner.

Source image coordinate system

Face detection#

When a face is detected, a rectangular area with the face is defined. The area is represented using coordinates in the image coordinate system.

When a part of a face is outside of the frame, the detection area will also extend beyond the frame borders. Hence, the coordinates of the detection area may have the following values:

  • When the face is beyond the left or the upper border of the frame, the detection coordinates will have negative values;

In the image below, the upper left detection point is outside of the frame. Hence the X and Y coordinates of the upper left detection point have negative values.

Upper left detection point is outside of the frame
  • When the face is beyond the right or the lower border of the frame, the detection coordinates will be positive but will exceed the image size.

In the image below, the X coordinate of the lower right detection point exceeds the image width by n, where n is the length of the zone extending beyond the frame.

Lower right detection point is outside of the frame

Note: You must take this behavior into account in order to process the received coordinates correctly.

A code example for detection cropping is given below.

const fsdk::Rect brect = detection.rect & image.getRect();

Here, detection is the face detection result and image is the source image.

Redetect method#

The face detector implements the redetect() method, which is intended to optimize face detection on video frame sequences. Instead of doing full-blown detection on each frame, one may detect() new faces at a lower frequency (say, every 5th frame) and just confirm them in between with redetect(). This dramatically improves performance at the cost of detection recall. Note that redetect() updates face landmarks as well.

The detector works faster with larger values of minFaceSize.

Detector variants#

Supported detector variants:

  • FaceDetV1
  • FaceDetV2
  • FaceDetV3

There are two basic detector families. The first includes two detector variants: FaceDetV1 and FaceDetV2. The second family currently includes only one detector variant, FaceDetV3. FaceDetV3 is the latest and most precise detector; in terms of performance, it is similar to FaceDetV1. User code may specify the desired detector type via a parameter when creating the IDetector object.

FaceDetV3 supports orientation mode and can estimate the orientation of the whole image (Normal, Left90deg, Right90deg, or UpsideDown). The config option useOrientation should be set to 1 (see the Configuration guide). You can estimate the image orientation by calling the estimateOrientation method of the detector. Alternatively, orientation is estimated automatically during a regular detection call if useOrientation is turned on. The detector estimates the orientation first, then flips the image if necessary and detects on the correctly oriented image. Note: the correctly oriented image is stored in the Face.img field (use this field in further processing). Detection and landmark coordinates are given in the coordinate system of the correctly oriented image.

Note: FaceDetV1 and FaceDetV2 performance depends on the number of faces in the image and on image complexity. FaceDetV3 performance depends only on the target image resolution.

Note: FaceDetV3 works faster with batched redetect.

FaceDetV1 and FaceDetV2 Configuration#

The FaceDetV1 detector is more precise, while FaceDetV2 works two times faster (see "Appendix A. Specifications").

The performance of the FaceDetV1 and FaceDetV2 detectors depends on the number of faces in the image. FaceDetV3 does not depend on it, so it may be slower than FaceDetV1 on images with a single face and much faster on images with many faces.

FaceDetV3 Configuration#

FaceDetV3 detects faces with sizes from minFaceSize to maxFaceSize (note: maxFaceSize <= minFaceSize * 32). You can change the minimum and maximum sizes of the faces to be searched for in the photo in the faceengine.conf configuration.

For example:

config->setValue("FaceDetV3::Settings", "minFaceSize", 20);

The logic of the detector is straightforward: the smaller the face size to be found, the more time detection takes.

We recommend using the following values for minFaceSize: 20, 40, and 90. A size of 90 px is recommended for recognition. If you want to find faces of a custom size, set minFaceSize to about 95% of that value. For example, to find faces of 50 px, set minFaceSize to 50 * 0.95 ≈ 47 px.

FaceDetV3 supports image orientation determination. Three main angles of image rotation are supported: 90, 180, and 270 degrees. For a rotated source image, the detection rectangles and landmarks are returned in the source coordinate system. For example, if the image is rotated by 90 degrees, the detection rectangles and landmarks will be rotated by 90 degrees too. The total time for such detection is about 2 times longer compared with detection without orientation determination. Image orientation mode is switched on in faceengine.conf by setting useOrientationMode.

Note: FaceDetV3 provides accurate 5-point landmarks only for faces larger than 40x40 pixels; for smaller faces, it provides less accurate landmarks. If there are few faces in the target images and the target face sizes after resizing are smaller than 40x40, it is recommended to request 68 landmarks instead. If there are many faces in the target image (more than 7), it is faster to increase minFaceSize so that the detected faces are big enough for accurate landmark estimation.

All recent changes in the face detection logic are described in the Handbook, Chapter 10, "Migration guide".

Face Alignment#

Five landmarks#

Face alignment is the process of detecting special key points (called "landmarks") on a face. FaceEngine performs landmark detection at the same time as face detection, since some of the landmarks are by-products of that detection.

At the very minimum, just 5 landmarks are required: two for the eyes, one for the nose tip, and two for the mouth corners. Using these coordinates, one may warp the source photo image (see chapter "Image warping") for use with all other FaceEngine algorithms.

All detectors can provide 5 landmarks for each detection without additional computations.

Typical use cases for 5 landmarks:

  • Image warping for use with other algorithms:

    • Quality and attribute estimators;
    • Descriptor extraction.

Sixty-eight landmarks#

More advanced 68-point face alignment is also implemented. Use it when you need precise information about a face and its parts. The detected points look like in the image below.

The 68 landmarks require additional computation time, so do not use them if you do not need precise information about a face. If you use 68 landmarks, the 5 landmarks will be reassigned to the more precise subset of the 68 landmarks.

68-point face alignment

The typical error for landmark estimation on a warped image (see Chapter "Image warping") is in the table below.

"Average point estimation error per landmark"

Point Error (pixels) Point Error (pixels) Point Error (pixels) Point Error (pixels)
1 ±3,88 18 ±3,77 35 ±1,62 52 ±1,65
2 ±3,53 19 ±2,83 36 ±1,90 53 ±2,01
3 ±3,88 20 ±2,70 37 ±1,78 54 ±2,00
4 ±4,30 21 ±3,06 38 ±1,69 55 ±1,93
5 ±4,67 22 ±3,92 39 ±1,63 56 ±2,18
6 ±4,87 23 ±3,46 40 ±1,52 57 ±2,17
7 ±4,67 24 ±2,59 41 ±1,54 58 ±1,99
8 ±4,01 25 ±2,53 42 ±1,60 59 ±2,32
9 ±3,46 26 ±2,95 43 ±1,55 60 ±2,33
10 ±3,87 27 ±3,84 44 ±1,60 61 ±2,06
11 ±4,56 28 ±1,88 45 ±1,74 62 ±1,97
12 ±4,94 29 ±1,75 46 ±1,72 63 ±1,56
13 ±4,55 30 ±1,92 47 ±1,68 64 ±1,86
14 ±4,45 31 ±2,20 48 ±1,65 65 ±1,94
15 ±4,13 32 ±1,97 49 ±1,99 66 ±2,00
16 ±3,68 33 ±1,70 50 ±1,99 67 ±1,70
17 ±4,09 34 ±1,73 51 ±1,95 68 ±2,12

Simple 5-point landmarks roughly correspond to:

  • The average of points 37 and 40 for the left eye;
  • The average of points 43 and 46 for the right eye;
  • Point 31 for the nose tip;
  • Points 49 and 55 for the mouth corners.

The landmarks for both cases are output by the face detector via the Landmarks5 and Landmarks68 structures. Note that, performance-wise, the 5-point alignment result comes free with face detection, whereas the 68-point result does not. So you should generally request the lowest number of points needed for your task.

Typical use cases for 68 landmarks:

  • Segmentation;
  • Head pose estimation.

Human Detection#

This functionality enables you to detect human bodies in the image.

During the detection process, special points (called "landmarks", specifically HumanLandmarks17) are obtained for the body parts visible in the image. These landmarks represent the keypoints of a human body (see the Human keypoints section).

Human body detection is performed by the IHumanDetector object. The function of interest is detect(). It requires an image to detect on.

Image coordinate system#

The origin of the coordinate system for each processed image is located in the upper left corner.

Source image coordinate system

Human body detection#

When a human body is detected, a rectangular area with the body is defined. The area is represented using coordinates in the image coordinate system.

When a part of a human body is outside of the frame, the detection area will also extend beyond the frame borders. Hence, the coordinates of the detection area may have the following values:

  • When the human body is beyond the left or the upper border of the frame, the detection coordinates will have negative values;
  • When the human body is beyond the right or the lower border of the frame, the detection coordinates will be positive but will exceed the image size.

In the image below, the upper left and lower right detection points are outside of the frame.

The X and Y coordinates of the upper left detection point have negative values. The Y coordinate of the lower right detection point exceeds the image height by n, where n is the length of the zone extending beyond the frame.

Detection points are outside of the frame

Note: You must take this behavior into account in order to process the received coordinates correctly.

A code example for detection cropping is given below.

const fsdk::Rect brect = detection.rect & image.getRect();

Here, detection is the human body detection result and image is the source image.

Constraints#

Human body detection has the following constraints:

  • The human body detector works correctly only with adult humans in an image;
  • The detector may detect bodies of sizes from 100 px to 640 px (in an image with a long side of 640 px). You may change the input image size in the config (see ./doc/ConfigurationGuide.pdf). The image will be resized to the specified size along the larger side while maintaining the aspect ratio.

Camera position requirements#

In general, you should locate the camera for human detection according to the image below.

Camera position for human detection

Follow these recommendations to correctly detect the human body and keypoints:

  • The person's body should face the camera;

  • Keep the angle of view as close to horizontal as possible;

  • About 60% of the person's body (the upper body) should be in the frame;

  • There must not be any objects overlapping the person's body in the frame;

  • The camera should be located at about 165 cm from the floor, which corresponds to the average height of a human.

The examples of wrong camera positions are shown in the image below.

Wrong camera positions

Human body redetection#

Like any other detector in the Face Engine SDK, the human detector also implements a redetection model. The user can perform full detection on the first frame only and then redetect the same human in the next n frames, thereby boosting the performance of the whole image-processing loop.

Use the redetectOne() method if only a single human detection is required; for more complex use cases, use redetect(), which can redetect humans in multiple images.

Note: The detector can also detect human body keypoints in an image.


Human Keypoints#

The image below shows the keypoints detected for a human body.

17 points of the human body

| Point | Body Part | Point | Body Part |
|-------|-----------|-------|-----------|
| 0 | Nose | 9 | Left Wrist |
| 1 | Left Eye | 10 | Right Wrist |
| 2 | Right Eye | 11 | Left Hip |
| 3 | Left Ear | 12 | Right Hip |
| 4 | Right Ear | 13 | Left Knee |
| 5 | Left Shoulder | 14 | Right Knee |
| 6 | Right Shoulder | 15 | Left Ankle |
| 7 | Left Elbow | 16 | Right Ankle |
| 8 | Right Elbow | | |

Detection#

To detect human keypoints, call detect() with the fsdk::HumanDetectionType::DCT_BOX | fsdk::HumanDetectionType::DCT_POINTS argument.

Note: The default is fsdk::HumanDetectionType::DCT_BOX.

Main Results of Each Detection#

The main result of each detection is an array. Each array element consists of a point (fsdk::Point2f) and a score. If the score value is less than the threshold, the "x" and "y" coordinates of the point will be equal to 0.

Note: see ConfigurationGuide.pdf ("HumanDetector settings" section) for more information about thresholds and configuration parameters.
