Detection facility#

Overview#

The object detection facility is responsible for quick, coarse detection tasks, such as finding a face in an image.

Detection structure#

The detection structure represents an image-space bounding rectangle of the detected object as well as the detection score.

The detection score is a measure of confidence in the particular object classification result and may be used to pick the most "confident" face out of many.

The detection score measures classification confidence, not source image quality. While the score is related to quality (low-quality data generally yields a lower score), it is not a valid metric for estimating the visual quality of an image.

Special estimators exist for that task (see section "Image Quality Estimation" in chapter "Parameter estimation facility" for details).
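
For illustration, here is a minimal sketch of picking the most confident face out of several detections. The getScore() accessor is an assumption; check your SDK headers for the exact Detection interface.

    // Sketch: pick the detection with the highest score out of many.
    // Assumption: fsdk::Detection exposes its score via getScore();
    // the accessor name may differ between SDK versions.
    const fsdk::Detection* pickBestFace(fsdk::Span<const fsdk::Detection> detections) {
        const fsdk::Detection* best = nullptr;
        for (const fsdk::Detection& det : detections) {
            if (!best || det.getScore() > best->getScore())
                best = &det;
        }
        return best;
    }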

Face Detection#

Object detection is performed by the IDetector object. The function of interest is detect(). It requires an image to detect on and an area of interest (to virtually crop the image and look for faces only in the given location).
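
A minimal sketch of a batched detect() call is shown below, assuming a detector instance created as described in "Detector variants" later in this chapter. The span packing, the per-image detection limit, and the IFaceDetectionBatch result type are assumptions; exact signatures vary between SDK versions.

    // Sketch: detect faces in a single image, searching the whole frame.
    fsdk::Rect roi = image.getRect(); // area of interest: the whole image
    auto result = detector->detect(
        fsdk::Span<const fsdk::Image>(&image, 1),
        fsdk::Span<const fsdk::Rect>(&roi, 1),
        10, // maximum number of detections per image (assumed parameter)
        fsdk::DetectionType(fsdk::DT_BBOX | fsdk::DT_LANDMARKS5));
    if (result.isOk()) {
        auto batch = result.getValue();
        // batch->getDetections(0) holds the detections for the first image
    }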

The face detector also implements detectAsync(), which allows you to detect faces and their parameters on multiple images asynchronously.

Note: Method detectAsync() is experimental, and its interface may change in the future.

Note: Method detectAsync() is not marked as noexcept and may throw an exception.

Image coordinate system#

The origin of the coordinate system for each processed image is located in the upper left corner.

Source image coordinate system

Face detection#

When a face is detected, a rectangular area with the face is defined. The area is represented using coordinates in the image coordinate system.

Redetect method#

The face detector implements the redetect() method, which is intended to optimize face detection on video frame sequences. Instead of doing full-blown detection on each frame, one may detect() new faces at a lower frequency (say, every 5th frame) and just confirm them in between with redetect(). This dramatically improves performance at the cost of detection recall. Note that redetect() updates face landmarks as well.
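
The pattern might look like the sketch below. The detectOne()/redetectOne() convenience methods and the fsdk::Face::detection member are assumptions based on this description; consult your SDK headers for the exact signatures.

    // Sketch: full detection on every 5th frame, cheap redetect in between.
    const int kDetectEvery = 5;
    bool tracking = false;
    fsdk::Face face;
    for (int i = 0; i < frameCount; ++i) {
        const fsdk::Image& frame = frames[i];
        if (!tracking || i % kDetectEvery == 0) {
            // Full detection: find new faces.
            auto res = detector->detectOne(frame, frame.getRect(),
                fsdk::DetectionType(fsdk::DT_BBOX | fsdk::DT_LANDMARKS5));
            tracking = res.isOk();
            if (tracking) face = res.getValue();
        } else {
            // Confirm the previously found face; landmarks are updated too.
            auto res = detector->redetectOne(frame, face.detection,
                fsdk::DetectionType(fsdk::DT_BBOX | fsdk::DT_LANDMARKS5));
            tracking = res.isOk();
            if (tracking) face = res.getValue();
        }
    }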

The face detector also implements redetectAsync(), which allows you to redetect faces on multiple images asynchronously, based on the detection results for previous frames.

Note: Method redetectAsync() is experimental, and its interface may change in the future.

Note: Method redetectAsync() is not marked as noexcept and may throw an exception.

The detector works faster with larger values of minFaceSize.

Orientation Estimation#

Name: OrientationEstimator

Algorithm description:

This estimator detects the orientation of the input image. The following outputs are supported:

  • The target image is normally oriented;
  • The target image is turned left by 90 degrees;
  • The target image is flipped upside-down;
  • The target image is turned right by 90 degrees.

Implementation description:

The estimator (see IOrientationEstimator in IOrientationEstimator.h):

  • Implements the estimate() function that accepts a source image in R8G8B8 format and returns the estimation result;

  • Implements the estimate() function that accepts an fsdk::Span of source images in R8G8B8 format and an fsdk::Span of fsdk::OrientationType enums to return the estimation results.

The OrientationType enumeration contains all possible results of the Orientation estimation:

    enum OrientationType : uint32_t {
        OT_NORMAL = 0,      //!< Normal orientation of image
        OT_LEFT = 1,        //!< Image is turned left by 90 deg
        OT_UPSIDE_DOWN = 2, //!< Image is flipped upside-down
        OT_RIGHT = 3        //!< Image is turned right by 90 deg
    };
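
A minimal usage sketch, assuming an IOrientationEstimator instance has already been created through the engine factory:

    // Sketch: estimate the orientation of a single R8G8B8 image.
    fsdk::ResultValue<fsdk::FSDKError, fsdk::OrientationType> res =
        estimator->estimate(image);
    if (res.isOk()) {
        switch (res.getValue()) {
        case fsdk::OT_NORMAL:      /* no rotation needed */     break;
        case fsdk::OT_LEFT:        /* rotate right by 90 deg */ break;
        case fsdk::OT_UPSIDE_DOWN: /* rotate by 180 deg */      break;
        case fsdk::OT_RIGHT:       /* rotate left by 90 deg */  break;
        }
    }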

API structure name:

IOrientationEstimator

Plan files:

  • orientation_v2_cpu.plan
  • orientation_v2_cpu-avx2.plan
  • orientation_v2_gpu.plan

Detector variants#

Supported detector variants:

  • FaceDetV1
  • FaceDetV2
  • FaceDetV3

There are two basic detector families. The first family includes two detector variants: FaceDetV1 and FaceDetV2. The second family currently includes only one variant, FaceDetV3. FaceDetV3 is the latest and most precise detector; a sensor type can be specified for this detector. In terms of performance, FaceDetV3 is similar to the FaceDetV1 detector.

User code may specify the necessary detector type via a parameter when creating the IDetector object.
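
For example (a sketch; the factory method's return type and the enum constant spelling may differ between SDK versions):

    // Sketch: create a FaceDetV3 detector through the engine factory.
    // Assumption: createDetector() takes an object detector class constant.
    auto resDetector = engine->createDetector(fsdk::FACE_DET_V3);
    if (resDetector.isOk()) {
        fsdk::IDetectorPtr detector = resDetector.getValue();
        // use detector->detect(...) as shown above
    }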

FaceDetV1 and FaceDetV2 performance depends on the number of faces in the image and on image complexity. FaceDetV3 performance depends only on the target image resolution.

FaceDetV3 works faster with batched redetect.

FaceDetV3 supports asynchronous methods for detection and redetection. FaceDetV1 and FaceDetV2 will return a "not implemented" error.

FaceDetV1 and FaceDetV2 Configuration#

The FaceDetV1 detector is more precise, while FaceDetV2 works two times faster (see chapter "Appendix A. Specifications").

FaceDetV1 and FaceDetV2 performance depends on the number of faces in the image. FaceDetV3 does not depend on it, so it may be slower than FaceDetV1 on images with one face and much faster on images with many faces.

FaceDetV3 Configuration#

FaceDetV3 detects faces with sizes ranging from minFaceSize to minFaceSize * 32. You can change the minimum size of the faces to be searched for in the faceengine.conf configuration.

For example:

config->setValue("FaceDetV3::Settings", "minFaceSize", 20);

The logic of the detector is straightforward: the smaller the faces to be found, the more time detection takes.

We recommend using one of the following values for minFaceSize: 20, 40, or 90. The size of 90 px is recommended for recognition. If you want to find faces of a custom size, set minFaceSize to about 95% of that value. For example, to find faces of 50 px, set minFaceSize to 50 * 0.95 ≈ 47 px.

FaceDetV3 provides accurate 5 landmarks only for faces larger than 40x40 pixels; for smaller faces it provides less accurate landmarks.

If there are few faces in the target images and face sizes after resizing will be less than 40x40 pixels, it is recommended to request 68 landmarks.

If there are many faces in the target image (more than 7), it is faster to increase minFaceSize so that faces remain large enough for accurate landmark estimation.

All recent changes in the face detection logic are described in chapter "Migration guide".

Face Alignment#

Five landmarks#

Face alignment is the process of detecting special key points (called "landmarks") on a face. FaceEngine performs landmark detection at the same time as face detection, since some of the landmarks are by-products of that detection.

At the very minimum, just 5 landmarks are required: two for the eyes, one for the nose tip, and two for the mouth corners. Using these coordinates, one may warp the source photo image (see chapter "Image warping") for use with all other FaceEngine algorithms.

All detectors provide 5 landmarks for each detection without additional computations.

Typical use cases for 5 landmarks:

  • Image warping for use with other algorithms:
      • quality and attribute estimators;
      • descriptor extraction.

Sixty-eight landmarks#

More advanced 68-point face alignment is also implemented. Use it when you need precise information about a face and its parts. The detected points are shown in the image below.

The 68 landmarks require additional computation time, so do not request them if you do not need precise information about a face. When you request 68 landmarks, the 5 landmarks are reassigned to a more precise subset of the 68 landmarks.

68-point face alignment

The typical error for landmark estimation on a warped image (see Chapter "Image warping") is in the table below.

"Average point estimation error per landmark"

Point  Error (px)   Point  Error (px)   Point  Error (px)   Point  Error (px)
1      ±3.88        18     ±3.77        35     ±1.62        52     ±1.65
2      ±3.53        19     ±2.83        36     ±1.90        53     ±2.01
3      ±3.88        20     ±2.70        37     ±1.78        54     ±2.00
4      ±4.30        21     ±3.06        38     ±1.69        55     ±1.93
5      ±4.67        22     ±3.92        39     ±1.63        56     ±2.18
6      ±4.87        23     ±3.46        40     ±1.52        57     ±2.17
7      ±4.67        24     ±2.59        41     ±1.54        58     ±1.99
8      ±4.01        25     ±2.53        42     ±1.60        59     ±2.32
9      ±3.46        26     ±2.95        43     ±1.55        60     ±2.33
10     ±3.87        27     ±3.84        44     ±1.60        61     ±2.06
11     ±4.56        28     ±1.88        45     ±1.74        62     ±1.97
12     ±4.94        29     ±1.75        46     ±1.72        63     ±1.56
13     ±4.55        30     ±1.92        47     ±1.68        64     ±1.86
14     ±4.45        31     ±2.20        48     ±1.65        65     ±1.94
15     ±4.13        32     ±1.97        49     ±1.99        66     ±2.00
16     ±3.68        33     ±1.70        50     ±1.99        67     ±1.70
17     ±4.09        34     ±1.73        51     ±1.95        68     ±2.12

Simple 5-point landmarks roughly correspond to:

  • The average of positions 37 and 40 for the left eye;
  • The average of positions 43 and 46 for the right eye;
  • Position 31 for the nose tip;
  • Positions 49 and 55 for the mouth corners.

The landmarks for both cases are output by the face detector via the Landmarks5 and Landmarks68 structures. Note that, performance-wise, the 5-point alignment result comes free with face detection, whereas the 68-point result does not. So you should generally request the lowest number of points your task requires.

Typical use cases for 68 landmarks:

  • Segmentation;
  • Head pose estimation.

Face Landmarks Detector#

Every kind of detector provides an interface to find face landmarks. If you have a face detection without landmarks, an additional interface is provided to request them. Landmark detection is performed by the IFaceLandmarksDetector object. The functions of interest are detectLandmarks5() and detectLandmarks68(). They require images and detections as input.
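
A sketch based on the description above; the span packing of the inputs and the result handling are assumptions:

    // Sketch: request 5 landmarks for an existing detection.
    auto res = landmarksDetector->detectLandmarks5(
        fsdk::Span<const fsdk::Image>(&image, 1),
        fsdk::Span<const fsdk::Detection>(&detection, 1));
    if (res.isOk()) {
        // the result holds Landmarks5 for each input detection
    }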

Human Detection#

This functionality enables you to detect human bodies in the image.

During the detection process, special points (called "landmarks", specifically "HumanLandmarks17") are found for the body parts visible in the image. These landmarks represent the keypoints of a human body (see the "Human Keypoints" section).

Human body detection is performed by the IHumanDetector object. The function of interest is detect(). It requires an image to detect on.

Image coordinate system#

The origin of the coordinate system for each processed image is located in the upper left corner.

Source image coordinate system

Human body detection#

When a human body is detected, a rectangular area with the body is defined. The area is represented using coordinates in the image coordinate system.

Constraints#

Human body detection has the following constraints:

  • The human body detector works correctly only with adult humans in an image;
  • The detector can detect bodies with sizes from 100 px to 640 px (in an image with a long side of 640 px). You may change the input image size in the config (see ./doc/ConfigurationGuide.pdf). The image will be resized along the larger side to the specified size while maintaining the aspect ratio.

Camera position requirements#

In general, you should locate the camera for human detection according to the image below.

Camera position for human detection

Follow these recommendations to correctly detect human body and keypoints:

  • The person's body should face the camera;

  • Keep the angle of view as close to horizontal as possible;

  • About 60% of the person's body should be in the frame (upper body);

  • There must not be any objects overlapping the person's body in the frame;

  • The camera should be located at about 165 cm above the floor, which corresponds to the average height of a human.

The examples of wrong camera positions are shown in the image below.

Wrong camera positions

Human body redetection#

Like any other detector in the Face Engine SDK, the human detector implements a redetection model. The user can perform full detection only on the first frame and then redetect the same human in the next "n" frames, thereby boosting the performance of the whole image processing loop.

Use the redetectOne() method if only a single human detection is required; for more complex use cases, use redetect(), which can redetect humans on multiple images.
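
A sketch of such a redetection loop is shown below; the redetectOne() parameters and result type are assumptions based on this description:

    // Sketch: full detection on the first frame, cheap redetection afterwards.
    // Assumption: redetectOne(image, detection) returns an updated detection;
    // check your SDK headers for the exact signature.
    fsdk::Detection human; // filled by an initial detect() call on frame 0
    for (int i = 1; i < frameCount; ++i) {
        auto res = humanDetector->redetectOne(frames[i], human);
        if (res.isOk()) {
            human = res.getValue(); // keep tracking with the updated box
        } else {
            // fall back to a full detect() to reacquire the person
        }
    }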

The detector can also find human body keypoints in an image.

Human Keypoints#

The image below shows the keypoints detected for a human body.

17 points of a human body

Point  Body Part        Point  Body Part
0      Nose             9      Left Wrist
1      Left Eye         10     Right Wrist
2      Right Eye        11     Left Hip
3      Left Ear         12     Right Hip
4      Right Ear        13     Left Knee
5      Left Shoulder    14     Right Knee
6      Right Shoulder   15     Left Ankle
7      Left Elbow       16     Right Ankle
8      Right Elbow

Cases that increase the probability of error:

  • Non-standard poses (head below the shoulders, vertical splits, lying with the head toward the camera, etc.);
  • The camera positioned above at a large angle;
  • Sometimes the estimator predicts invisible points with a high score, especially for elbows, wrists, and ears.

The human detector provides an interface to find human landmarks. If you have a human structure without landmarks, an additional interface is provided to request them. Landmark detection is performed by the IHumanLandmarksDetector object. The function of interest is detectLandmarks(). It requires images and detections as input.

Detection#

To detect human keypoints, call detect() with the fsdk::HumanDetectionType::DCT_BOX | fsdk::HumanDetectionType::DCT_POINTS argument.

The default is fsdk::HumanDetectionType::DCT_BOX.
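
A sketch of such a call; the span packing and the per-image detection limit are assumptions, while the flag combination comes from the text above:

    // Sketch: request both bounding boxes and 17 body keypoints.
    fsdk::Rect rect = image.getRect();
    auto res = humanDetector->detect(
        fsdk::Span<const fsdk::Image>(&image, 1),
        fsdk::Span<const fsdk::Rect>(&rect, 1),
        10, // maximum number of detections per image (assumed parameter)
        fsdk::HumanDetectionType(
            fsdk::HumanDetectionType::DCT_BOX | fsdk::HumanDetectionType::DCT_POINTS));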

Main Results of Each Detection#

The main result of each detection is an array. Each array element consists of a point (fsdk::Point2f) and a score. If the score value is less than the threshold, the "x" and "y" coordinates of the point will be equal to 0.

See ConfigurationGuide.pdf ("HumanDetector settings" section) for more information about thresholds and configuration parameters.

HumanFace Detection. Face to body association#

This functionality enables you to detect the bodies and faces of people and perform an association between them, determining whether the detected face and body belong to the same person.

This detector contains the implementation of both the Human and the Face (FaceDetV3) detectors. This means that all the requirements, constraints, and recommendations for quality improvement imposed on these detectors are also relevant for the HumanFace detector.

Detector operation algorithm:

HumanFace detection

HumanFace redetection#

To perform redetection, you need to redetect the body and the face separately.

Performance#

If the association functionality is not needed, the user can skip the computation of associations by selecting the appropriate HumanFaceDetectionType for the detect() method. In that case, we estimate a performance gain of about 5% on CPU and about 20% on GPU devices. The more faces and bodies present in the image, the greater the gain from skipping the association.

Main results#

There are two output structures:

  • HumanFaceBatch
  • HumanFaceAssociation

The HumanFaceBatch contains three arrays: face detections, human detections, and associations:

    struct IHumanFaceBatch : public IRefCounted {
        virtual Span<const Detection> getHumanDetections(size_t index = 0) const noexcept = 0;
        virtual Span<const Detection> getFaceDetections(size_t index = 0) const noexcept = 0;
        virtual Span<const HumanFaceAssociation> getAssociations(size_t index = 0) const noexcept = 0;
    };

The HumanFaceAssociation structure contains results of the association:

    struct HumanFaceAssociation {
        uint32_t humanId;
        uint32_t faceId;
        float score;
    };

There are two groups of fields:

  1. The first group contains the body and face detection indexes:
        uint32_t humanId;
        uint32_t faceId;
  2. The second group contains the association score:
        float score;

The score is defined in the [0, 1] range.
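
Putting it together, reading the results might look like this (a sketch using the accessors from the IHumanFaceBatch interface shown above):

    // Sketch: pair up face and body detections via the association indexes.
    fsdk::Span<const fsdk::Detection> faces = batch->getFaceDetections();
    fsdk::Span<const fsdk::Detection> humans = batch->getHumanDetections();
    for (const fsdk::HumanFaceAssociation& assoc : batch->getAssociations()) {
        const fsdk::Detection& face = faces[assoc.faceId];
        const fsdk::Detection& body = humans[assoc.humanId];
        // assoc.score in [0, 1] is the confidence that the face and the
        // body belong to the same person.
    }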

Associations and detections whose scores are lower than the threshold will be rejected and not returned in the results.

See ConfigurationGuide.pdf ("HumanFace settings" section) for more information about thresholds and configuration parameters.

minFaceSize#

This detector can detect faces with sizes of 20 px and more (the minFaceSize parameter) and humans with sizes of 100 px and more. If such small faces and humans are not required, the user can increase the minFaceSize parameter in the config.

Before processing, the images are downscaled by a factor of minFaceSize/20. For example, if minFaceSize=50, the image will be additionally downscaled by 50/20 = 2.5 times.

The detector works faster with larger values of minFaceSize.

Head Detection#

This functionality enables you to detect the heads of people.

This detector's implementation is similar to the Face (FaceDetV3) detector. This means that all the requirements, constraints, and recommendations for quality improvement imposed on that detector are also relevant for the Head detector.

Object detection is performed by the IHeadDetector. The function of interest is detect(). It requires an image to detect on and an area of interest (to virtually crop the image and look for heads only in the given location).

Image coordinate system#

The origin of the coordinate system for each processed image is located in the upper left corner.

Source image coordinate system

Main results#

Output structures:

  • DetectionBatch

The DetectionBatch contains an array of head detections:

    struct IDetectionBatch : public IRefCounted {
        virtual size_t getSize() const noexcept = 0;
        virtual Span<const Detection> getDetections(size_t index = 0) const noexcept = 0;
    };
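
Reading the results might look like this (a sketch using the accessors shown above; getSize() is assumed to return the number of processed images):

    // Sketch: iterate over head detections for each processed image.
    for (size_t i = 0; i < batch->getSize(); ++i) {
        for (const fsdk::Detection& det : batch->getDetections(i)) {
            // det holds a head bounding box in image coordinates
        }
    }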

minHeadSize#

This detector can detect heads with sizes of 20 px and more (the minHeadSize parameter). If such small heads are not required, the user can increase the minHeadSize parameter in the config.

Before processing, the images are downscaled by a factor of minHeadSize/20. For example, if minHeadSize=50, the image will be additionally downscaled by 50/20 = 2.5 times.

The detector works faster with larger values of minHeadSize.