
Parameter Estimation Facility#

Overview#

The estimation facility is the only multi-purpose facility in FaceEngine. It is designed as a collection of tools that estimate various properties of images or of the objects depicted in them. These properties may be used to increase the precision of algorithms implemented by other FaceEngine facilities or to accomplish custom user tasks.

Use cases#

ISO estimation#

LUNA SDK provides algorithms for checking images against the requirements of the ISO/IEC 19794-5:2011 standard and compatible standards.

The requirements can be found on the official website: https://www.iso.org/obp/ui/#iso:std:iso-iec:19794:-5:en.

The following algorithms are provided:

  • Head rotation angles (pitch, yaw, and roll angles). According to section "7.2.2 Pose" in the standard, the angles should be within +/- 5 degrees from frontal in pitch and yaw, and less than +/- 8 degrees from frontal in roll. See additional information about the algorithm in section "Head Pose".

  • Gaze. See section "7.2.3 Expression" point "e" of the standard. See additional information about the algorithm in section "Gaze Estimation".

  • Mouth state (opened, closed, occluded) and additional properties for smile (regular smile, smile with teeth exposed). See section "7.2.3 Expression" points "a", "b", and "c" of the standard. See additional information about the algorithm in section "Mouth Estimation".

  • Quality of the image:

    • Contrast and saturation (insufficient or too large exposure). See sections "7.2.7 Subject and scene lighting" and "7.3.2 Contrast and saturation" of the standard.
    • Blurring. See section "7.3.3 Focus and depth of field" of the standard.
    • Specularity. See section "7.2.8 Hot spots and specular reflections" and "7.2.12 Lighting artefacts" of the standard.
    • Uniformity of illumination. See sections "7.2.7 Subject and scene lighting" and "7.2.12 Lighting artefacts" of the standard.

    See additional information about the algorithm in section "Image Quality Estimation".

  • Glasses state (no glasses, glasses, sunglasses). See section "7.2.9 Eye glasses" of the standard. See additional information about the algorithm in section "Glasses Estimation".

  • Eyes state (for each eye: opened, closed, occluded). See sections "7.2.3 Expression" point "a", "7.2.11 Visibility of pupils and irises" and "7.2.13 Eye patches" of the standard. See additional information about the algorithm in section "Eyes Estimation".

  • Natural light estimation. See section "7.3.4 Unnatural colour" of the standard. See additional information about the algorithm in section "Natural Light Estimation".

  • Eyebrows state: neutral, raised, squinting, frowning. See section "7.2.3 Expression" points "d", "f", and "g" of the standard. See additional information about the algorithm in section "Eyebrows Estimation".

  • Position of a person's shoulders in the original image: the shoulders are parallel to the camera or not. See section "7.2.5 Shoulders" of the standard. See additional information about the algorithm in section "Portrait Style Estimation".

  • Headwear. Checks whether a person is wearing headwear. Several types of headwear can be estimated. See section "B.2.7 Head coverings" of the standard. See additional information about the algorithm in section "Headwear Estimation".

  • Red eyes estimation. Checks whether there is a red-eye effect. See section "7.3.4 Unnatural colour" of the standard. See additional information about the algorithm in section "Red Eyes Estimation".

  • Radial distortion estimation. See section "7.3.6 Radial distortion of the camera lens" of the standard. See additional information about the algorithm in section "Fish Eye Estimation".

  • Image type estimation: color, grayscale, infrared. See section "7.4.4 Use of near infra-red cameras" of the standard. See additional information about the algorithm in section "Grayscale, color or infrared Estimation".

  • Background estimation: background uniformity and if a background is too light or too dark. See section "B.2.9 Backgrounds" of the standard. See additional information about the algorithm in section "Background Estimation".
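
As a minimal sketch of the first check in the list, the pose limits quoted from section "7.2.2 Pose" can be tested as shown below. HeadPoseAngles and isPoseIsoCompliant are illustrative names, not SDK types, and treating the pitch/yaw bound as inclusive is an assumption.

```cpp
#include <cassert>
#include <cmath>

// Illustrative stand-in for head rotation angles in degrees (not an SDK type).
struct HeadPoseAngles {
    float pitch;
    float yaw;
    float roll;
};

// Limits quoted above from section "7.2.2 Pose" of the standard.
constexpr float kMaxPitchYawDeg = 5.0f;
constexpr float kMaxRollDeg = 8.0f;

// Returns true when the pose is close enough to frontal.
// Assumption: the pitch/yaw bound is inclusive, the roll bound is strict ("less than").
bool isPoseIsoCompliant(const HeadPoseAngles& pose) {
    assert(std::isfinite(pose.pitch) && std::isfinite(pose.yaw) && std::isfinite(pose.roll));
    return std::fabs(pose.pitch) <= kMaxPitchYawDeg &&
           std::fabs(pose.yaw) <= kMaxPitchYawDeg &&
           std::fabs(pose.roll) < kMaxRollDeg;
}
```

In a real application the angles would come from the head pose estimation described in section "Head Pose".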

Best shot selection functionality#

BestShotQuality Estimation#

Name: BestShotQualityEstimator

Algorithm description:

The BestShotQuality estimator is designed to evaluate image quality to choose the best image before descriptor extraction. The BestShotQuality estimator consists of two components - AGS (garbage score) and Head Pose.

AGS aims to determine the source image score for further descriptor extraction and matching.

The estimation output is a float score normalized to the range [0..1]. The closer the score is to 1, the better the matching result for the image.

When you have several images of a person, it is better to save the image with the highest AGS score.

The recommended threshold for the AGS score is 0.2, but it can be changed depending on the use case. Consult VisionLabs about the recommended threshold value for this parameter.
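
The selection rule described above can be sketched as follows. This is an illustration only: pickBestShot is a hypothetical helper that works on AGS scores already obtained from the estimator, and treating a score equal to the threshold as acceptable is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Recommended AGS threshold from the text above; tune it for your scenario.
constexpr float kRecommendedAgsThreshold = 0.2f;

// Returns the index of the frame with the highest AGS score that is not below
// the threshold, or std::nullopt when every frame is rejected.
std::optional<std::size_t> pickBestShot(const std::vector<float>& agsScores,
                                        float threshold = kRecommendedAgsThreshold) {
    assert(threshold >= 0.0f && threshold <= 1.0f);
    std::optional<std::size_t> best;
    for (std::size_t i = 0; i < agsScores.size(); ++i) {
        if (agsScores[i] < threshold)
            continue;  // below the threshold: not suitable for descriptor extraction
        if (!best || agsScores[i] > agsScores[*best])
            best = i;
    }
    return best;
}
```

When several frames share the highest score, the earliest one is kept.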

Head Pose determines a person's head rotation angles in 3D space, namely pitch, yaw and roll.

Head pose

Since 3D head translation is hard to determine reliably without camera-specific calibration, only the 3D rotation component is estimated.

Head pose estimation characteristics:

  • Units (degrees);
  • Notation (Euler angles);
  • Precision (see table below).

Implementation description:

The estimator (see IBestShotQualityEstimator in IEstimator.h):

  • Implements the estimate() function that needs fsdk::Image in R8G8B8 format, fsdk::Detection structure of corresponding source image (see section "Detection structure" in chapter "Face detection facility"), fsdk::IBestShotQualityEstimator::EstimationRequest structure and fsdk::IBestShotQualityEstimator::EstimationResult to store estimation result;

  • Implements the estimate() function that needs the span of fsdk::Image in R8G8B8 format, the span of fsdk::Detection structures of corresponding source images (see section "Detection structure" in chapter "Face detection facility"), fsdk::IBestShotQualityEstimator::EstimationRequest structure and span of fsdk::IBestShotQualityEstimator::EstimationResult to store estimation results.

  • Implements the estimateAsync() function that needs fsdk::Image in R8G8B8 format, fsdk::Detection structure of corresponding source image (see section "Detection structure" in chapter "Face detection facility"), fsdk::IBestShotQualityEstimator::EstimationRequest structure.

Note: Method estimateAsync() is experimental, and its interface may be changed in the future.

Note: Method estimateAsync() is not marked as noexcept and may throw an exception.

Before using this estimator, you can decide which of the listed attributes to estimate. For this purpose, the estimate() method takes one of the following estimation requests:

  • fsdk::IBestShotQualityEstimator::EstimationRequest::estimateAGS to make only AGS estimation;
  • fsdk::IBestShotQualityEstimator::EstimationRequest::estimateHeadPose to make only Head Pose estimation;
  • fsdk::IBestShotQualityEstimator::EstimationRequest::estimateAll to make both AGS and Head Pose estimations.

The EstimationResult structure contains results of the estimation:

    struct EstimationResult {
        Optional<HeadPoseEstimation> headPose;  //!< HeadPose estimation if was requested, empty otherwise
        Optional<float> ags;                    //!< AGS estimation if was requested, empty otherwise
    };
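
Because a field that was not requested stays empty, it has to be checked before it is read. The sketch below illustrates this with stand-in types: std::optional replaces the SDK Optional type, BestShotResult and isAcceptableShot are hypothetical names, and maxAngleDeg is an application-level limit rather than an SDK default.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Illustrative stand-ins mirroring the structure above (not SDK types).
struct HeadPoseAngles {
    float pitch;
    float yaw;
    float roll;
};

struct BestShotResult {
    std::optional<HeadPoseAngles> headPose;  // empty unless head pose was requested
    std::optional<float> ags;                // empty unless AGS was requested
};

// Accepts a frame only when both estimations are present and pass the limits.
bool isAcceptableShot(const BestShotResult& r, float agsThreshold, float maxAngleDeg) {
    assert(maxAngleDeg >= 0.0f);
    if (!r.ags || !r.headPose)
        return false;  // a field that was not requested must not be read
    const HeadPoseAngles& p = *r.headPose;
    return *r.ags >= agsThreshold &&
           std::fabs(p.pitch) <= maxAngleDeg &&
           std::fabs(p.yaw) <= maxAngleDeg &&
           std::fabs(p.roll) <= maxAngleDeg;
}
```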

Head Pose accuracy:

Prediction precision decreases as a rotation angle increases. We present typical average errors for different angle ranges in the table below.

"Head pose prediction precision"

Average prediction error (per axis)   Range -45°...+45°   Range < -45° or > +45°
Yaw                                   ±2.7°               ±4.6°
Pitch                                 ±3.0°               ±4.8°
Roll                                  ±3.0°               ±4.6°

Zero position corresponds to a face placed orthogonally to camera direction, with the axis of symmetry parallel to the vertical camera axis.
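
The precision table can be turned into a small lookup, for example to widen acceptance margins for strongly rotated heads. This is a sketch only: Axis and typicalErrorDeg are illustrative names, and treating exactly ±45° as part of the central range is an assumption.

```cpp
#include <cassert>
#include <cmath>

enum class Axis { Yaw, Pitch, Roll };

// Typical average error in degrees, taken from the precision table above.
float typicalErrorDeg(Axis axis, float angleDeg) {
    assert(std::isfinite(angleDeg));
    const bool inCentralRange = std::fabs(angleDeg) <= 45.0f;
    switch (axis) {
        case Axis::Yaw:   return inCentralRange ? 2.7f : 4.6f;
        case Axis::Pitch: return inCentralRange ? 3.0f : 4.8f;
        case Axis::Roll:  return inCentralRange ? 3.0f : 4.6f;
    }
    return 0.0f;  // not reached for valid enum values
}
```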

API structure name:

IBestShotQualityEstimator

Plan files:

For more information, see "Approximate Garbage Score Estimation (AGS)" and "Head Pose Estimation".

Image Quality Estimation#

Name: QualityEstimator

Algorithm description:

The estimator is trained to work with warped images (see chapter "Image warping" for details).

This estimator is designed to determine the image quality. You can estimate the image according to the following criteria:

  • The image is blurred;
  • The image is underexposed (i.e., too dark);
  • The image is overexposed (i.e., too light);
  • The face in the image is illuminated unevenly (there is a great difference between light and dark regions);
  • Image contains flares on face (too specular).

Examples are presented in the images below. Good quality images are shown on the right.

Blurred image (left), not blurred image (right)
Dark image (left), good quality image (right)
Light image (left), good quality image (right)
Image with uneven illumination (left), image with even illumination (right)
Image with specularity - image contains flares on face (left), good quality image (right)

Implementation description:

The general rule of thumb for quality estimation:

  1. Detect a face, see if detection confidence is high enough. If not, reject the detection.
  2. Produce a warped face image (see chapter "Descriptor processing facility") using a face detection and its landmarks.
  3. Estimate visual quality using the estimator and reject low-quality images.

While the scheme above might seem a bit complicated, it is the most efficient performance-wise, since possible rejections on each step reduce workload for the next step.
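
The early-rejection idea can be sketched as a chain of stages in which each stage runs only if the previous one succeeded. The stage callbacks below are placeholders for real detector, warper and estimator calls; PipelineStages, Verdict and runPipeline are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Placeholders for the three steps; each callback wraps the real SDK call.
struct PipelineStages {
    std::function<std::optional<float>()> detect;  // detection confidence, empty if no face
    std::function<bool()> warp;                    // true if a warped image was produced
    std::function<bool()> qualityIsGood;           // true if the quality estimation passed
};

enum class Verdict { NoFace, LowConfidence, WarpFailed, LowQuality, Accepted };

// Runs the steps in order and stops at the first rejection, so the more
// expensive later steps are skipped for frames that fail early.
Verdict runPipeline(const PipelineStages& s, float minConfidence) {
    assert(s.detect && s.warp && s.qualityIsGood);
    const std::optional<float> confidence = s.detect();
    if (!confidence)
        return Verdict::NoFace;
    if (*confidence < minConfidence)
        return Verdict::LowConfidence;
    if (!s.warp())
        return Verdict::WarpFailed;
    return s.qualityIsGood() ? Verdict::Accepted : Verdict::LowQuality;
}
```

The value of minConfidence depends on the detector in use and is left to the caller.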

At the moment, the estimator exposes two interface functions to predict image quality:

  • virtual Result estimate(const Image& warp, Quality& quality);
  • virtual Result estimate(const Image& warp, SubjectiveQuality& quality);

Each of these functions uses its own CNN internally and returns slightly different quality criteria.

The first CNN is trained specifically on pre-warped human face images and produces lower score factors if any of the following conditions is satisfied:

  • Image is blurred;
  • Image is underexposed (i.e., too dark);
  • Image is overexposed (i.e., too light);
  • Image color variation is low (i.e., image is monochrome or close to monochrome).

Each of these score factors is defined in the [0..1] range, where a higher value corresponds to better image quality and vice versa.

The second interface function produces a lower factor if:

  • The image is blurred;
  • The image is underexposed (i.e., too dark);
  • The image is overexposed (i.e., too light);
  • The face in the image is illuminated unevenly (there is a great difference between light and dark regions);
  • Image contains flares on face (too specular).

The estimator determines the quality of the image based on each of the aforementioned parameters. For each parameter, the estimator function returns two values: the quality factor and the resulting verdict.

Like the first estimator function, the second one returns quality factors in the range [0..1], where 0 corresponds to low image quality and 1 to high image quality. For example, the estimator returns a low quality factor for the Blur parameter if the image is too blurry.

The resulting verdict is a quality output based on the estimated parameter. For example, if the image is too blurry, the estimator returns “isBlurred = true”.

The threshold (see below) can be specified for each of the estimated parameters. The resulting verdict and the quality factor are linked through this threshold. If the received quality factor is lower than the threshold, the image quality is low and the estimator returns “true”. Conversely, if the image blur quality factor is higher than the threshold, the resulting verdict is “false”.

If the estimated value for any of the parameters is lower than the corresponding threshold, the image is considered of bad quality. If resulting verdicts for all the parameters are set to "False" the quality of the image is considered good.

The quality factor is a value in the range [0..1] where 0 corresponds to low quality and 1 to high quality.

Illumination uniformity corresponds to the face illumination in the image. The lower the difference between light and dark zones of the face, the higher the estimated value. When the illumination is evenly distributed throughout the face, the value is close to "1".

Specularity is the ability of the face to reflect light. The higher the estimated value, the lower the specularity and the better the image quality. If the estimated value is low, there are bright glares on the face.

The Quality structure contains results of the estimation made by first CNN. Each estimation is given in normalized [0, 1] range:

    struct Quality {
        float light;    //!< image overlighting degree. 1 - ok, 0 - overlighted.
        float dark;     //!< image darkness degree. 1 - ok, 0 - too dark.
        float gray;     //!< image grayness degree 1 - ok, 0 - too gray.
        float blur;     //!< image blur degree. 1 - ok, 0 - too blurred.
        inline float getQuality() const noexcept;   //!< complex estimation of quality. 0 - low quality, 1 - high quality.
    };

The SubjectiveQuality structure contains results of the estimation made by second CNN. Each estimation is given in normalized [0, 1] range:

    struct SubjectiveQuality {
        float blur;         //!< image blur degree. 1 - ok, 0 - too blurred.
        float light;        //!< image brightness degree. 1 - ok, 0 - too bright;
        float darkness;     //!< image darkness degree. 1 - ok, 0 - too dark;
        float illumination; //!< image illumination uniformity degree. 1 - ok, 0 - is too illuminated;
        float specularity;  //!< image specularity degree. 1 - ok, 0 - is not specular;
        bool isBlurred;     //!< image is blurred flag;
        bool isHighlighted; //!< image is overlighted flag;
        bool isDark;        //!< image is too dark flag;
        bool isIlluminated; //!< image is too illuminated flag;
        bool isNotSpecular; //!< image is not specular flag;
        inline bool isGood() const noexcept;    //!< if all boolean flags are false returns true - high quality, else false - low quality.
    };

Recommended thresholds: 

The table below contains thresholds from the QualityEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, these thresholds are set to their optimal values.

"Image quality estimator recommended thresholds"

Threshold Recommended value
blurThreshold 0.61
darknessThreshold 0.50
lightThreshold 0.57
illuminationThreshold 0.1
specularityThreshold 0.1

The most important parameters for face recognition are "blurThreshold", "darknessThreshold" and "lightThreshold", so you should select them carefully.

You can select images of better visual quality by setting higher values of the "illuminationThreshold" and "specularityThreshold". Face recognition is not greatly affected by uneven illumination or glares.
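
The link between quality factors, thresholds and verdicts can be sketched as follows. The three structures are stand-ins written for this illustration (field names follow SubjectiveQuality, default values follow the table above), and makeVerdicts is a hypothetical helper, not an SDK function.

```cpp
#include <cassert>

// Illustrative stand-in for the quality factors, each in the [0..1] range.
struct QualityFactors {
    float blur;
    float light;
    float darkness;
    float illumination;
    float specularity;
};

// Recommended values from the table above.
struct QualityThresholds {
    float blur = 0.61f;
    float darkness = 0.50f;
    float light = 0.57f;
    float illumination = 0.1f;
    float specularity = 0.1f;
};

struct QualityVerdicts {
    bool isBlurred;
    bool isHighlighted;
    bool isDark;
    bool isIlluminated;
    bool isNotSpecular;

    // Good quality means that no flag is raised.
    bool isGood() const {
        return !(isBlurred || isHighlighted || isDark || isIlluminated || isNotSpecular);
    }
};

inline bool inUnitRange(float v) { return v >= 0.0f && v <= 1.0f; }

// A flag is raised when the corresponding factor is lower than its threshold.
QualityVerdicts makeVerdicts(const QualityFactors& f,
                             const QualityThresholds& t = QualityThresholds{}) {
    assert(inUnitRange(f.blur) && inUnitRange(f.light) && inUnitRange(f.darkness));
    assert(inUnitRange(f.illumination) && inUnitRange(f.specularity));
    return QualityVerdicts{
        f.blur < t.blur,
        f.light < t.light,
        f.darkness < t.darkness,
        f.illumination < t.illumination,
        f.specularity < t.specularity};
}
```

Raising the illumination or specularity values in QualityThresholds makes the check stricter about visual quality, as described above.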

Configurations:

See the "Quality estimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

IQualityEstimator

Plan files:

  • model_subjective_quality__cpu.plan
  • model_subjective_quality__cpu-avx2.plan
  • model_subjective_quality__gpu.plan

Note: usePlanV1 toggles the Quality estimation, usePlanV2 toggles the SubjectiveQuality estimation. These parameters can enable or disable the corresponding functionality via the faceengine.conf configuration file.

<section name="QualityEstimator::Settings">
...
    <param name="usePlanV1" type="Value::Int1" x="1" />
    <param name="usePlanV2" type="Value::Int1" x="1" />
</section>

Note that you cannot disable both parameters at the same time. If you do, you will receive the fsdk::FSDKError::InvalidConfig error code and the following logs:

[27.06.2024 12:38:59] [Error] QualityEstimator::Settings Failed to create QualityEstimator! The both parameters: "usePlanV1" and "usePlanV2" in section "QualityEstimator::Settings" are disabled at the same time.

Attributes estimation functionality#

Face Attribute Estimation#

Name: AttributeEstimator

Algorithm description:

The estimator is trained to work with warped images (see chapter "Image warping" for details).

The Attribute estimator determines face attributes. Currently, the following attributes are available:

  • Age: determines a person's age;
  • Gender: determines a person's gender.

The Attribute estimator returns the Ethnicity estimation structure. Each estimation is given in the normalized [0, 1] range.

The Ethnicity estimation structure is shown below:

 struct EthnicityEstimation {
  float africanAmerican;
  float indian;
  float asian;
  float caucasian;

  enum Ethnicities {
   AfricanAmerican = 0,
   Indian,
   Asian,
   Caucasian,
   Count
  };

  /**
   * @brief Returns ethnicity with greatest score.
   * @see EthnicityEstimation::Ethnicities for more info.
   * */
  inline Ethnicities getPredominantEthnicity() const;

  /**
   * @brief Returns score of required ethnicity.
   * @param [in] ethnicity ethnicity.
   * @see EthnicityEstimation::Ethnicities for more info.
   * */
  inline float getEthnicityScore(Ethnicities ethnicity) const;
 };

Implementation description:

Before using the attribute estimator, you can choose which of the attributes listed above to estimate by filling the IAttributeEstimator::EstimationRequest structure, which is then passed to the main estimate() method. The estimator overwrites the IAttributeEstimator::AttributeEstimationResult output structure, which consists of optional fields holding the results for the requested attributes.

Recommended thresholds: 

The table below contains thresholds from the AttributeEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, these thresholds are set to their optimal values.

"Attribute estimator recommended thresholds"

Threshold Recommended value
genderThreshold 0.5
adultThreshold 0.2

Accuracy:

Age:

  • For cooperative (see "Appendix B. Glossary") conditions: the average error depends on the person's age; see the table below for additional details. Estimation accuracy is 2.3.

Gender:

  • Estimation accuracy in cooperative mode is 99.81% with the threshold 0.5;
  • Estimation accuracy in non-cooperative mode is 92.5%.

"Average age estimation error per age group for cooperative conditions"

Age (years) Average error (years)
0-3 ±3.3
4-7 ±2.97
8-12 ±3.06
13-17 ±4.05
17-20 ±3.89
20-25 ±1.89
25-30 ±1.88
30-35 ±2.42
35-40 ±2.65
40-45 ±2.78
45-50 ±2.88
50-55 ±2.85
55-60 ±2.86
60-65 ±3.24
65-70 ±3.85
70-75 ±4.38
75-80 ±6.79

In earlier releases of LUNA SDK, the Attribute estimator worked poorly in non-cooperative mode (only 56% gender estimation accuracy) and did not estimate children's age. After these problems were solved, the average estimation error per age group became slightly higher due to the extended network functionality.

Configurations:

See the "AttributeEstimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

IAttributeEstimator

Plan files:

  • attributes_estimation_v6_cpu.plan
  • attributes_estimation_v6_cpu-avx2.plan
  • attributes_estimation_v6_gpu.plan

Credibility Check Estimation#

Name: CredibilityCheckEstimator

Algorithm description:

This estimator estimates the reliability of a person.

Implementation description:

The estimator (see ICredibilityCheckEstimator in ICredibilityCheckEstimator.h):

  • Implements the estimate() function that accepts a warped image in R8G8B8 format and an fsdk::CredibilityCheckEstimation structure.

  • Implements the estimate() function that accepts a span of warped images in R8G8B8 format and a span of fsdk::CredibilityCheckEstimation structures.

The CredibilityCheckEstimation structure contains results of the estimation:

    struct CredibilityCheckEstimation {
        float value;                          //!< estimation in [0,1] range.
                                              //!< The closer the score to 1,
                                              //!< the more likely that person is reliable.

        CredibilityStatus credibilityStatus;  //!< estimation result
                                              //!< (@see CredibilityStatus enum).
    };

Enumeration of possible credibility statuses:

    enum class CredibilityStatus : uint8_t {
        Reliable = 1,                          //!< person is reliable
        NonReliable = 2                        //!< person is not reliable
    };

Recommended thresholds: 

The table below contains the threshold from the CredibilityEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, this threshold is set to its optimal value.

"Credibility check estimator recommended threshold"

Threshold Recommended value
reliableThreshold 0.5
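
How the score and the threshold could map to a status is sketched below. The enum mirrors the one shown above. The comparison direction follows the statement that a score closer to 1 means a more reliable person, but treating a score equal to the threshold as reliable is an assumption, and toStatus is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the CredibilityStatus enum shown above.
enum class CredibilityStatus : std::uint8_t {
    Reliable = 1,
    NonReliable = 2
};

// Recommended threshold from the table above.
constexpr float kReliableThreshold = 0.5f;

// Assumption: a score at or above the threshold is treated as reliable.
CredibilityStatus toStatus(float value, float threshold = kReliableThreshold) {
    assert(value >= 0.0f && value <= 1.0f);
    return value >= threshold ? CredibilityStatus::Reliable
                              : CredibilityStatus::NonReliable;
}
```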

Filtration parameters:

The estimator is trained to work with face images that meet the following requirements:

"Requirements for fsdk::HeadPoseEstimation"

Attribute Acceptable angle range(degrees)
pitch [-20...20]
yaw [-20...20]
roll [-20...20]

"Requirements for fsdk::SubjectiveQuality"

Attribute Minimum value
blur 0.61
light 0.57

"Requirements for fsdk::AttributeEstimationResult"

Attribute Minimum value
age 18

"Requirements for fsdk::OverlapEstimation"

Attribute State
overlapped false

"Requirements for fsdk::Detection"

Attribute Minimum value
detection size 100

Detection size is detection width.

const fsdk::Detection detection = ... // somehow get fsdk::Detection object
const int detectionSize = detection.getRect().width;
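
The filtration requirements above can be combined into a single pre-check that runs before the credibility estimator is called. CredibilityInputs is an illustrative bundle of values taken from the other estimators, not an SDK type, and the inclusive comparisons are an assumption.

```cpp
#include <cassert>
#include <cmath>

// Illustrative bundle of prior estimation results (not an SDK type).
struct CredibilityInputs {
    float pitch;         // head pose angles, degrees
    float yaw;
    float roll;
    float blur;          // SubjectiveQuality blur factor
    float light;         // SubjectiveQuality light factor
    float age;           // estimated age, years
    bool overlapped;     // overlap estimation verdict
    int detectionWidth;  // detection width, pixels
};

// Returns true when the image meets every requirement listed in the tables above.
bool passesCredibilityFilters(const CredibilityInputs& in) {
    assert(in.detectionWidth >= 0);
    const bool poseOk = std::fabs(in.pitch) <= 20.0f &&
                        std::fabs(in.yaw) <= 20.0f &&
                        std::fabs(in.roll) <= 20.0f;
    const bool qualityOk = in.blur >= 0.61f && in.light >= 0.57f;
    const bool ageOk = in.age >= 18.0f;
    const bool sizeOk = in.detectionWidth >= 100;
    return poseOk && qualityOk && ageOk && !in.overlapped && sizeOk;
}
```

The same pattern applies to the other estimators in this chapter, with the limits taken from their own requirement tables.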

Configurations:

See the "Credibility Estimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

ICredibilityCheckEstimator

Plan files:

  • credibility_check_cpu.plan
  • credibility_check_cpu-avx2.plan
  • credibility_check_gpu.plan

Facial Hair Estimation#

Name: FacialHairEstimator

Algorithm description:

This estimator detects the facial hair type on the face in the source image. It can return the following results:

  • There is no hair on the face (see FacialHair::NoHair field in the FacialHair enum);
  • There is stubble on the face (see FacialHair::Stubble field in the FacialHair enum);
  • There is a mustache on the face (see FacialHair::Mustache field in the FacialHair enum);
  • There is a beard on the face (see FacialHair::Beard field in the FacialHair enum).

Implementation description:

The estimator (see IFacialHairEstimator in IFacialHairEstimator.h):

  • Implements the estimate() function that accepts source warped image in R8G8B8 format and FacialHairEstimation structure to return results of estimation;

  • Implements the estimate() function that accepts fsdk::Span of the source warped images in R8G8B8 format and fsdk::Span of the FacialHairEstimation structures to return results of estimation.

The FacialHair enumeration contains all possible results of the FacialHair estimation:

    enum class FacialHair {
        NoHair = 0,                 //!< no hair on the face
        Stubble,                    //!< stubble on the face
        Mustache,                   //!< mustache on the face
        Beard                       //!< beard on the face
    };

The FacialHairEstimation structure contains results of the estimation:

    struct FacialHairEstimation {
        FacialHair result;          //!< estimation result (@see FacialHair enum)
        // scores
        float noHairScore;          //!< no hair on the face score
        float stubbleScore;         //!< stubble on the face score
        float mustacheScore;        //!< mustache on the face score
        float beardScore;           //!< beard on the face score
    };

There are two groups of the fields:

1․ The first group contains only the result enum:

        FacialHair result;          //!< estimation result (@see FacialHair enum)

The result enum field of FacialHairEstimation contains the target result of the estimation.

2․ The second group contains scores:

        float noHairScore;          //!< no hair on the face score
        float stubbleScore;         //!< stubble on the face score
        float mustacheScore;        //!< mustache on the face score
        float beardScore;           //!< beard on the face score

The scores group contains the estimation scores for each possible result of the estimation.

All scores are defined in [0,1] range. Sum of scores always equals 1.

Filtration parameters:

The estimator is trained to work with face images that meet the following requirements:

"Requirements for fsdk::HeadPoseEstimation"

Attribute Acceptable angle range(degrees)
pitch [-40...40]
yaw [-40...40]
roll [-40...40]

"Requirements for fsdk::MedicalMaskEstimation"

Attribute State
result fsdk::MedicalMask::NoMask

"Requirements for fsdk::Detection"

Attribute Minimum value
detection size 40

Detection size is detection width.

const fsdk::Detection detection = ... // somehow get fsdk::Detection object
const int detectionSize = detection.getRect().width;

API structure name:

IFacialHairEstimator

Plan files:

  • face_hair_v2_cpu.plan
  • face_hair_v2_cpu-avx2.plan
  • face_hair_v2_gpu.plan

Natural Light Estimation#

Name: NaturalLightEstimator

Algorithm description:

This estimator detects whether the light on the source face image is natural. It can return the following results:

  • Light is not natural on the face image (see LightStatus::NonNatural field in the LightStatus enum);
  • Light is natural on the face image (see LightStatus::Natural field in the LightStatus enum).

Implementation description:

The estimator (see INaturalLightEstimator in INaturalLightEstimator.h):

  • Implements the estimate() function that accepts source warped image in R8G8B8 format and NaturalLightEstimation structure to return results of estimation;

  • Implements the estimate() function that accepts fsdk::Span of the source warped images in R8G8B8 format and fsdk::Span of the NaturalLightEstimation structures to return results of estimation.

The LightStatus enumeration contains all possible results of the NaturalLight estimation:

    enum class LightStatus : uint8_t {
        NonNatural = 0,                   //!< light is not natural
        Natural = 1                       //!< light is natural
    };

The NaturalLightEstimation structure contains results of the estimation:

    struct NaturalLightEstimation {
        LightStatus status;               //!< estimation result (@see LightStatus enum).
        float score;                      //!< Numerical value in range [0, 1].
    };

There are two groups of the fields:

1․ The first group contains only the result enum:

        LightStatus status;               //!< estimation result (@see LightStatus enum).

The status enum field of NaturalLightEstimation contains the target result of the estimation.

2․ The second group contains scores:

        float score;                      //!< Numerical value in range [0, 1].

The scores group contains a single estimation score, defined in the [0,1] range.

Recommended thresholds:

The table below contains the threshold from the NaturalLightEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, this threshold is set to its optimal value.

"Natural light estimator recommended threshold"

Threshold Recommended value
naturalLightThreshold 0.5

Filtration parameters:

The estimator is trained to work with face images that meet the following requirements:

"Requirements for fsdk::MedicalMaskEstimation"

Attribute State
result fsdk::MedicalMask::NoMask

"Requirements for fsdk::SubjectiveQuality"

Attribute Minimum value
blur 0.5

Also fsdk::GlassesEstimation must not be equal to fsdk::GlassesEstimation::SunGlasses.

Configurations:

See the "Natural Light Estimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

INaturalLightEstimator

Plan files:

  • natural_light_cpu.plan
  • natural_light_cpu-avx2.plan
  • natural_light_gpu.plan

Fish Eye Estimation#

Name: FishEyeEstimator

Algorithm description:

This estimator detects a fish eye effect on the source face image. It can return the following fish eye effect statuses:

  • There is no fish eye effect on the face image (see FishEye::NoFishEyeEffect field in the FishEye enum);
  • There is fish eye effect on the face image (see FishEye::FishEyeEffect field in the FishEye enum).

Implementation description:

The estimator (see IFishEyeEstimator in IFishEyeEstimator.h):

  • Implements the estimate() function that accepts source image in R8G8B8 format, face detection and FishEyeEstimation structure to return results of estimation;

  • Implements the estimate() function that accepts fsdk::Span of the source images in R8G8B8 format, fsdk::Span of the face detections and fsdk::Span of the FishEyeEstimation structures to return results of estimation.

The FishEye enumeration contains all possible results of the FishEye estimation:

    enum class FishEye {
        NoFishEyeEffect = 0,  //!< no fish eye effect
        FishEyeEffect = 1     //!< with fish eye effect
    };

The FishEyeEstimation structure contains results of the estimation:

    struct FishEyeEstimation {
        FishEye result;       //!< estimation result (@see FishEye enum)
        float score;          //!< fish eye effect score
    };

There are two groups of the fields:

1․ The first group contains only the result enum:

        FishEye result;       //!< estimation result (@see FishEye enum)

The result enum field of FishEyeEstimation contains the target result of the estimation.

2․ The second group contains scores:

        float score;          //!< fish eye effect score

The scores group contains the estimation score.

Recommended thresholds: 

The table below contains the threshold from the FishEyeEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, this threshold is set to its optimal value.

"Fish Eye estimator recommended threshold"

Threshold Recommended value
fishEyeThreshold 0.5

Recommended scenarios of algorithm usage:

Data domain: Cooperative mode only. This means:

  • High image quality;
  • Frontal face looking directly at the camera.

Filtration parameters:

The estimator is trained to work with face images that meet the following requirements:

"Requirements for fsdk::HeadPoseEstimation"

Attribute Acceptable angle range(degrees)
pitch [-8...8]
yaw [-8...8]
roll [-8...8]

"Requirements for fsdk::Detection"

Attribute Minimum value
detection size 80

Detection size is detection width.

const fsdk::Detection detection = ... // somehow get fsdk::Detection object
const int detectionSize = detection.getRect().width;

Configurations:

See the "Fish Eye Estimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

IFishEyeEstimator

Plan files:

  • fisheye_v2_cpu.plan
  • fisheye_v2_cpu-avx2.plan
  • fisheye_v2_gpu.plan

Eyebrows Estimation#

Name: EyeBrowEstimator

Algorithm description:

This estimator is trained to estimate eyebrow expressions. The EyeBrowEstimator returns four scores, one for each possible eyebrow expression: neutral, raised, squinting, and frowning. Possible scores are in the range [0, 1].

The closer a score is to 1, the more likely the corresponding expression is the real expression in the image; the closer it is to 0, the less likely.

Along with the output scores, the estimator also returns an enum value (EyeBrowState). The index of the maximum score determines the EyeBrow state.

Implementation description:

The estimator (see IEyeBrowEstimator in IEyeBrowEstimator.h):

  • Implements the estimate() function that accepts a warped source image. The warped image is received from the warper (see IWarper::warp()). The output estimation is an fsdk::EyeBrowEstimation structure.

  • Implements the estimate() function that accepts a span of warped source images and a span of fsdk::EyeBrowEstimation structures. The output estimation is a span of fsdk::EyeBrowEstimation structures.

The EyeBrowEstimation structure contains results of the estimation:

struct EyeBrowEstimation {
        /**
         * @brief EyeBrow estimator output enum.
         * This enum contains all possible estimation results.
        **/
        enum class EyeBrowState {
            Neutral = 0,
            Raised,
            Squinting,
            Frowning
        };


        float neutralScore;        //!< 0(not neutral)..1(neutral).
        float raisedScore;         //!< 0(not raised)..1(raised).
        float squintingScore;      //!< 0(not squinting)..1(squinting).
        float frowningScore;       //!< 0(not frowning)..1(frowning).
        EyeBrowState eyeBrowState; //!< EyeBrow state
    };
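The rule "the index of the maximum score determines the state" can be illustrated as follows. This is a minimal sketch, not the SDK implementation: the enum is a local mirror of the one shown above, and stateFromScores is a hypothetical helper introduced only for this example.

```cpp
#include <algorithm>
#include <array>
#include <iterator>

// Local mirror of the enum shown above, declared here only so that the
// sketch compiles without the SDK headers.
enum class EyeBrowState { Neutral = 0, Raised, Squinting, Frowning };

// Picks the state whose score is the largest. The score order matches the
// enum order: neutral, raised, squinting, frowning.
inline EyeBrowState stateFromScores(float neutral, float raised,
                                    float squinting, float frowning) {
    const std::array<float, 4> scores{neutral, raised, squinting, frowning};
    const auto maxIt = std::max_element(scores.begin(), scores.end());
    return static_cast<EyeBrowState>(std::distance(scores.begin(), maxIt));
}
```

For example, scores of 0.1, 0.7, 0.1, and 0.1 select EyeBrowState::Raised.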

Filtration parameters:

"Requirements for fsdk::EyeBrowEstimation"

Attribute Acceptable values
headPose.pitch [-20...20]
headPose.yaw [-20...20]
headPose.roll [-20...20]

"Requirements for fsdk::Detection"

Attribute Minimum value
detection size 80

Detection size is detection width.

const fsdk::Detection detection = ... // somehow get fsdk::Detection object
const int detectionSize = detection.getRect().width;

API structure name:

IEyeBrowEstimator

Plan files:

  • eyebrow_estimation_v2_cpu.plan
  • eyebrow_estimation_v2_cpu-avx2.plan
  • eyebrow_estimation_v2_gpu.plan

Portrait Style Estimation#

Name: PortraitStyleEstimator

Algorithm description:

This estimator is designed to estimate the position of a person's shoulders in the original image. It can return the following results:

  • The shoulders are not parallel to the camera (see the PortraitStyleStatus::NonPortrait field in the PortraitStyleStatus enum);
  • Shoulders are parallel to the camera (see the PortraitStyleStatus::Portrait field in the PortraitStyleStatus enum);
  • Shoulders are hidden (see the PortraitStyleStatus::HiddenShoulders field in the PortraitStyleStatus enum);

Implementation description:

The Estimator (see IPortraitStyleEstimator in IPortraitStyleEstimator.h):

  • Implements estimate() function that accepts R8G8B8 source image, detection and PortraitStyleEstimation structure to return estimation results;

  • Implements an estimate() function that accepts fsdk::Span of R8G8B8 source images, fsdk::Span of detections, and fsdk::Span of PortraitStyleEstimation structures to return estimation results.

The PortraitStyleStatus enumeration contains all possible results of the PortraitStyle estimation:

    enum class PortraitStyleStatus : uint8_t {
        NonPortrait = 0,       //!< NonPortrait
        Portrait = 1,          //!< Portrait
        HiddenShoulders = 2    //!< HiddenShoulders
    };

The PortraitStyleEstimation structure contains results of the estimation:

    struct PortraitStyleEstimation {
        PortraitStyleStatus status; //!< estimation result (@see PortraitStyleStatus enum).
        float nonPortraitScore;             //!< numerical value in range [0, 1]
        float portraitScore;                //!< numerical value in range [0, 1]
        float hiddenShouldersScore;         //!< numerical value in range [0, 1]
    };

There are two groups of the fields:

1․ The first group contains the enum:

        PortraitStyleStatus status; //!< estimation result (@see PortraitStyleStatus enum).

The result enum field of type PortraitStyleStatus contains the target result of the estimation.

2․ The second group contains scores:

        float nonPortraitScore;             //!< numerical value in range [0, 1]
        float portraitScore;                //!< numerical value in range [0, 1]
        float hiddenShouldersScore;         //!< numerical value in range [0, 1]

The scores are defined in [0,1] range.

Recommended thresholds: 

The table below contains the thresholds specified in the PortraitStyleEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, these thresholds are set to the optimal values.

"Portrait Style estimator recommended threshold"

Threshold Recommended value
notPortraitStyleThreshold 0.2
portraitStyleThreshold 0.35
hiddenShouldersThreshold 0.2

Filtration parameters:

The estimator is trained to work with face images that meet the following requirements:

The preferred detector type is FaceDetV3.

"Requirements for Detector"

Attribute Min face size
result 40

"Requirements for fsdk::HeadPoseEstimation"

Attribute Maximum value
yaw 20.0
pitch 20.0
roll 20.0

Configurations:

See the "Portrait Style Estimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

IPortraitStyleEstimator

Plan files:

  • portrait_style_v3_cpu.plan
  • portrait_style_v3_cpu-avx2.plan
  • portrait_style_v3_gpu.plan

DynamicRange Estimation#

Name: DynamicRangeEstimator

Algorithm description:

This estimator is designed to estimate the dynamic range of an original image that contains a person's face.

Implementation description:

The Estimator (see IDynamicRangeEstimator in IDynamicRangeEstimator.h):

  • Implements estimate() function that accepts R8G8B8 source image, detection and DynamicRangeEstimation structure to return estimation results;

  • Implements an estimate() function that accepts fsdk::Span of R8G8B8 source images, fsdk::Span of detections, and fsdk::Span of DynamicRangeEstimation structures to return estimation results.

The DynamicRangeEstimation structure contains results of the estimation:

    struct DynamicRangeEstimation {
        float dynamicRangeScore;             //!< numerical value in range [0, 1]
    };

The DynamicRangeEstimation structure contains the target score:

        float dynamicRangeScore;             //!< numerical value in range [0, 1]

The score is defined in the [0,1] range.

Recommended thresholds: 

The table below contains the recommended threshold, which is applied by the user.

"Dynamic Range estimator recommended threshold"

Threshold Recommended value
threshold 0.5

API structure name:

IDynamicRangeEstimator

Plan files:

DynamicRangeEstimator does not use any additional models (plans, files, etc.). It is an ISO-based algorithm that is currently implemented only on CPU devices.

Headwear Estimation#

Name: HeadWearEstimator

Algorithm description:

This estimator aims to detect the headwear status and headwear type on the face in the source image. It can return the following headwear status results:

  • There is headwear (see the HeadWearState::Yes field in the HeadWearState enum);
  • There is no headwear (see the HeadWearState::No field in the HeadWearState enum).

It can also return the following headwear type results:

  • There is no headwear on the head (see the HeadWearType::NoHeadWear field in the HeadWearType enum);
  • There is a baseball cap on the head (see the HeadWearType::BaseballCap field in the HeadWearType enum);
  • There is a beanie on the head (see the HeadWearType::Beanie field in the HeadWearType enum);
  • There is a peaked cap on the head (see the HeadWearType::PeakedCap field in the HeadWearType enum);
  • There is a shawl on the head (see the HeadWearType::Shawl field in the HeadWearType enum);
  • There is a hat with ear flaps on the head (see the HeadWearType::HatWithEarFlaps field in the HeadWearType enum);
  • There is a helmet on the head (see the HeadWearType::Helmet field in the HeadWearType enum);
  • There is a hood on the head (see the HeadWearType::Hood field in the HeadWearType enum);
  • There is a hat on the head (see the HeadWearType::Hat field in the HeadWearType enum);
  • There is something else on the head (see the HeadWearType::Other field in the HeadWearType enum).

Implementation description:

The estimator (see IHeadWearEstimator in IHeadWearEstimator.h):

  • Implements the estimate() function that accepts warped image in R8G8B8 format and HeadWearEstimation structure to return results of estimation;

  • Implements the estimate() function that accepts fsdk::Span of the source warped images in R8G8B8 format and fsdk::Span of the HeadWearEstimation structures to return results of estimation.

The HeadWearState enumeration contains all possible results of the Headwear state estimation:

    enum class HeadWearState {
        Yes = 0,           //< there is headwear
        No,                //< there is no headwear
        Count
    };

The HeadWearType enumeration contains all possible results of the Headwear type estimation:

    enum class HeadWearType : uint8_t {
        NoHeadWear = 0,     //< there is no headwear on the head
        BaseballCap,        //< there is baseball cap on the head
        Beanie,             //< there is beanie on the head
        PeakedCap,          //< there is peaked cap on the head
        Shawl,              //< there is shawl on the head
        HatWithEarFlaps,    //< there is hat with ear flaps on the head
        Helmet,             //< there is helmet on the head
        Hood,               //< there is hood on the head
        Hat,                //< there is hat on the head
        Other,              //< something other is on the head
        Count
    };

The HeadWearStateEstimation structure contains results of the Headwear state estimation:

    struct HeadWearStateEstimation {
        HeadWearState result; //!< estimation result (@see HeadWearState enum)
        float scores[static_cast<int>(HeadWearState::Count)]; //!< estimation scores

        /**
         * @brief Returns score of required headwear state.
         * @param [in] state headwear state.
         * @see HeadWearState for more info.
         * */
        inline float getScore(HeadWearState state) const;
    };

There are two groups of the fields:

1․ The first group contains only the result enum:

        HeadWearState result; //!< estimation result (@see HeadWearState enum)

2․ The second group contains scores:

        float scores[static_cast<int>(HeadWearState::Count)]; //!< estimation scores

The HeadWearTypeEstimation structure contains results of the Headwear type estimation:

    struct HeadWearTypeEstimation {
        HeadWearType result; //!< estimation result (@see HeadWearType enum)
        float scores[static_cast<int>(HeadWearType::Count)]; //!< estimation scores

        /**
         * @brief Returns score of required headwear type.
         * @param [in] type headwear type.
         * @see HeadWearType for more info.
         * */
        inline float getScore(HeadWearType type) const;
    };

There are two groups of the fields:

1․ The first group contains only the result enum:

        HeadWearType result; //!< estimation result (@see HeadWearType enum)

2․ The second group contains scores:

        float scores[static_cast<int>(HeadWearType::Count)]; //!< estimation scores

The HeadWearEstimation structure contains results of both Headwear state and type estimations:

    struct HeadWearEstimation {
        HeadWearStateEstimation state;  //!< headwear state estimation 
                                        //!< (@see HeadWearStateEstimation)
        HeadWearTypeEstimation type;    //!< headwear type estimation 
                                        //!< (@see HeadWearTypeEstimation)
    };

The scores group contains the estimation scores for each possible result of the estimation. All scores are defined in [0,1] range. Sum of scores always equals 1.
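The way getScore() relates to the scores array can be sketched as follows. The body of getScore() is not shown in the declarations above, so the implementation here (using the enum value as the array index) is an assumption made for illustration. The types are local mirrors of the SDK declarations so that the example compiles on its own.

```cpp
// Local mirrors of the SDK types shown above, declared here only so that
// the sketch compiles without the SDK headers.
enum class HeadWearState { Yes = 0, No, Count };

struct HeadWearStateEstimation {
    HeadWearState result;                                 // estimation result
    float scores[static_cast<int>(HeadWearState::Count)]; // one score per state

    // Assumed implementation: the enum value is used as the array index.
    inline float getScore(HeadWearState state) const {
        return scores[static_cast<int>(state)];
    }
};
```

With scores of 0.8 for Yes and 0.2 for No, getScore(HeadWearState::Yes) returns 0.8. The same indexing idea would apply to HeadWearTypeEstimation.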

Filtration parameters:

"Requirements for fsdk::Detection"

Attribute Minimum value
detection size 80

Note. Detection size is detection width.

const fsdk::Detection detection = ... // somehow get fsdk::Detection object
const int detectionSize = detection.getRect().width;

API structure name:

IHeadWearEstimator

Plan files:

  • head_wear_v2_cpu.plan
  • head_wear_v2_cpu-avx2.plan
  • head_wear_v2_gpu.plan

Background Estimation#

Name: BackgroundEstimator

Algorithm description:

This estimator is designed to estimate the background in the original image. It can return the following results:

  • The background is non-solid (see the BackgroundStatus::NonSolid field in the BackgroundStatus enum);
  • The background is solid (see the BackgroundStatus::Solid field in the BackgroundStatus enum);

Implementation description:

The estimator (see IBackgroundEstimator in IBackgroundEstimator.h):

  • Implements an estimate() function that accepts R8G8B8 source image, detection and BackgroundEstimation structure to return estimation results;

  • Implements an estimate() function that accepts fsdk::Span of R8G8B8 source images, fsdk::Span of detections, and fsdk::Span of BackgroundEstimation structures to return estimation results.

The BackgroundStatus enumeration contains all possible results of the Background estimation:

    enum class BackgroundStatus : uint8_t {
        NonSolid = 0,     //!< NonSolid
        Solid = 1         //!< Solid
    };

The BackgroundEstimation structure contains results of the estimation:

    struct BackgroundEstimation {
        BackgroundStatus status;    //!< estimation result (@see BackgroundStatus enum).
        float backgroundScore;      //!< numerical value in range [0, 1], where 1 - is uniform background, 0 - is non uniform.
        float backgroundColorScore; //!< numerical value in range [0, 1], where 1 - is light background, 0 - is too dark.
    };

There are two groups of the fields:

1․ The first group contains the enum:

        BackgroundStatus status;    //!< estimation result (@see BackgroundStatus enum).

The result enum field of type BackgroundStatus contains the target result of the estimation.

2․ The second group contains scores:

        float backgroundScore;      //!< numerical value in range [0, 1], where 1 - is solid background, 0 - is non solid.
        float backgroundColorScore; //!< numerical value in range [0, 1], where 1 - is light background, 0 - is too dark.

The scores are defined in the [0,1] range. If both scores are above their respective thresholds, the background is solid; otherwise, the background is not solid.

Recommended thresholds:

The table below contains thresholds specified in BackgroundEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, these threshold values are set to optimal.

"Background estimator recommended thresholds"

Threshold Recommended value
backgroundThreshold 0.5
backgroundColorThreshold 0.3
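The relation between the two scores, the thresholds, and the resulting status can be sketched as follows. This is an illustration only: BackgroundThresholds and classifyBackground are hypothetical names, the enum is a local mirror of the one shown above, and treating "above the threshold" as a strict comparison is an assumption.

```cpp
#include <cstdint>

// Local mirror of the enum shown above.
enum class BackgroundStatus : std::uint8_t { NonSolid = 0, Solid = 1 };

// Hypothetical holder for the recommended values from the table above.
struct BackgroundThresholds {
    float backgroundThreshold = 0.5f;
    float backgroundColorThreshold = 0.3f;
};

// The background is reported as solid only when both scores exceed their
// thresholds. Assumption: the comparison is strict (>).
inline BackgroundStatus classifyBackground(
    float backgroundScore, float backgroundColorScore,
    const BackgroundThresholds& thresholds = BackgroundThresholds{}) {
    const bool solid =
        backgroundScore > thresholds.backgroundThreshold &&
        backgroundColorScore > thresholds.backgroundColorThreshold;
    return solid ? BackgroundStatus::Solid : BackgroundStatus::NonSolid;
}
```

A uniform but too dark background (for example, scores of 0.9 and 0.1) is therefore classified as NonSolid.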

Filtration parameters:

The estimator is trained to work with face images that meet the following requirements: The face in a frame should be large in relation to frame sizes. The face should occupy about half of the frame area.

max(frameWidth, frameHeight) / max(faceWidth, faceHeight) <= 2.0
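The size condition above can be checked before the estimator is called. The helper below is a hypothetical sketch (faceIsLargeEnough is not an SDK function); it assumes all sizes are given in pixels.

```cpp
#include <algorithm>

// Returns true when
//   max(frameWidth, frameHeight) / max(faceWidth, faceHeight) <= 2.0.
// Non-positive face sizes are rejected to avoid division by zero.
inline bool faceIsLargeEnough(int frameWidth, int frameHeight,
                              int faceWidth, int faceHeight) {
    const int frameSide = std::max(frameWidth, frameHeight);
    const int faceSide = std::max(faceWidth, faceHeight);
    if (faceSide <= 0) {
        return false;
    }
    return static_cast<double>(frameSide) / faceSide <= 2.0;
}
```

For a 640x480 frame and a 300x400 face, the ratio is 640 / 400 = 1.6, so the condition holds.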

The type of preferable detector is FaceDetV3.

"Requirements for Detector"

Attribute Min face size
result 40

"Requirements for fsdk::HeadPoseEstimation"

Attribute Maximum value
yaw 20.0
pitch 20.0
roll 20.0

Configurations:

See the "Background Estimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

IBackgroundEstimator

Plan files:

  • background_v2_cpu.plan
  • background_v2_cpu-avx2.plan
  • background_v2_gpu.plan

Grayscale, color or infrared Estimation#

Name: BlackWhiteEstimator

Algorithm description:

BlackWhite estimator has two interfaces.

The "By full frame" interface detects if an input image is grayscale or color. It is indifferent to image content and dimensions; you can pass both face crops (including warped images) and full frames.

The "By warped frame" interface can be used only with warped images (see chapter "Image warping" for details). Checks if an image is color, grayscale or infrared.

Implementation description:

The "By full frame" interface of estimator (see ImageColorEstimation in IBlackWhiteEstimator.h):

  • Implements estimate() function that accepts source image and outputs a boolean, indicating if the image is grayscale (true) or not (false).

The "By warped frame" interface of estimator (see IBlackWhiteEstimator in IBlackWhiteEstimator.h):

  • Implements the estimate() function that accepts warped source image.

  • Outputs ImageColorEstimation structures.

    struct ImageColorEstimation {

        float colorScore;       //!< 0(grayscale)..1(color);
        float infraredScore;    //!< 0(infrared)..1(not infrared);

        /**
         * @brief Enumeration of possible image color types.
         * */
        enum class ImageColorType : uint8_t {
            Color = 0,     //!< image is color.
            Grayscale,     //!< Image is grayscale.
            Infrared,      //!< Image is infrared.
        };

        ImageColorType colorType;
    };

ImageColorEstimation::ImageColorType presents the image color type as an enum with the possible values Color, Grayscale, and Infrared.

  • For a color image, colorScore will be close to 1.0 and infraredScore will be close to 0.0;
  • For an infrared image, colorScore will be close to 0.0 and infraredScore will be close to 1.0;
  • For a grayscale image, both scores will be close to 0.0.

The two interfaces use different principles of color type estimation.

The BlackWhite estimator is trained to work with real warped photos of faces. Correctness is not guaranteed when the person in the image is not real, for example, when the image is a photo of a photo.

Recommended thresholds: 

The table below contains the thresholds specified in the BlackWhiteEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, these thresholds are set to the optimal values.

"Black and white estimator recommended thresholds"

Threshold Recommended value
colorThreshold 0.5
irThreshold 0.5

Configurations:

See the "BlackWhite Estimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

IBlackWhiteEstimator

Plan files:

  • black_white_and_ir_v1_cpu.plan
  • black_white_and_ir_v1_cpu-avx2.plan
  • black_white_and_ir_v1_gpu.plan

Face features extraction functionality#

Eyes Estimation#

Name: EyeEstimator

Algorithm description:

The estimator is trained to work with warped images (see chapter "Image warping" for details).

A sensor type can be defined for this type of estimator.

This estimator aims to determine:

  • Eye state: Open, Closed, Occluded;
  • Precise eye iris location as an array of landmarks;
  • Precise eyelid location as an array of landmarks.

You can pass only a warped image with a detected face to the estimator interface. Better image quality leads to better results.

Eye state classifier supports three categories: "Open", "Closed", "Occluded". Poor quality images or ones that depict obscured eyes (think eyewear, hair, gestures) fall into the "Occluded" category. It is always a good idea to check eye state before using the segmentation result.

The precise location allows iris and eyelid segmentation. The estimator is capable of outputting iris and eyelid shapes as arrays of points that together form an ellipse. You should use the segmentation results only if the state of that eye is "Open".

Implementation description:

The estimator:

  • Implements the estimate() function that accepts warped source image and warped landmarks, either of type Landmarks5 or Landmarks68. The warped image and landmarks are received from the warper (see IWarper::warp());

  • Classifies eyes state and detects its iris and eyelid landmarks;

  • Outputs EyesEstimation structures.

Orientation terms 'left' and 'right' refer to the way you see the image as it is shown on the screen. It means that left eye is not necessarily left from the person's point of view, but is on the left side of the screen. Consequently, right eye is the one on the right side of the screen. More formally, the label 'left' refers to subject's left eye (and similarly for the right eye), such that xright < xleft.

EyesEstimation::EyeAttributes presents the eye state as the enum State with the possible values Open, Closed, and Occluded.

Iris landmarks are presented with a template structure Landmarks that is specialized for 32 points.

Eyelid landmarks are presented with a template structure Landmarks that is specialized for 6 points.

The EyesEstimation structure contains results of the estimation:

    struct EyesEstimation {
        /**
         * @brief Eyes attribute structure.
         * */
        struct EyeAttributes {
            /**
             * @brief Enumeration of possible eye states.
             * */
            enum class State : uint8_t {
                Closed,     //!< Eye is closed.
                Open,       //!< Eye is open.
                Occluded    //!< Eye is blocked by something not transparent, or landmark passed to estimator doesn't point to an eye.
            };

            static constexpr uint64_t irisLandmarksCount = 32; //!< Iris landmarks amount.
            static constexpr uint64_t eyelidLandmarksCount = 6; //!< Eyelid landmarks amount.

            /// @brief alias for @see Landmarks template structure with irisLandmarksCount as param.
            using IrisLandmarks = Landmarks<irisLandmarksCount>;

            /// @brief alias for @see Landmarks template structure with eyelidLandmarksCount as param
            using EyelidLandmarks = Landmarks<eyelidLandmarksCount>;

            State state; //!< State of an eye.

            IrisLandmarks iris; //!< Iris landmarks.
            EyelidLandmarks eyelid; //!< Eyelid landmarks
        };

        EyeAttributes leftEye;  //!< Left eye attributes
        EyeAttributes rightEye; //!< Right eye attributes
    };
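The recommendation to check the eye state before using segmentation results can be sketched as follows. EyeState is a simplified local mirror of EyesEstimation::EyeAttributes::State, and canUseSegmentation and bothEyesUsable are hypothetical helpers introduced for this example only.

```cpp
#include <cstdint>

// Simplified local mirror of EyesEstimation::EyeAttributes::State.
enum class EyeState : std::uint8_t { Closed, Open, Occluded };

// Iris and eyelid landmarks are meaningful only for an open eye.
inline bool canUseSegmentation(EyeState state) {
    return state == EyeState::Open;
}

// Convenience check for both eyes of one estimation.
inline bool bothEyesUsable(EyeState leftEye, EyeState rightEye) {
    return canUseSegmentation(leftEye) && canUseSegmentation(rightEye);
}
```

In real code the states would be read from leftEye.state and rightEye.state of the EyesEstimation structure, and the landmarks of an eye would be ignored whenever the check fails for it.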

API structure name:

IEyeEstimator

Plan files:

  • eyes_estimation_flwr8_cpu.plan
  • eyes_estimation_ir_cpu.plan
  • eyes_estimation_flwr8_cpu-avx2.plan
  • eyes_estimation_ir_cpu-avx2.plan
  • eyes_estimation_ir_gpu.plan
  • eyes_estimation_flwr8_gpu.plan
  • eye_status_estimation_flwr_cpu.plan
  • eye_status_estimation_flwr_cpu-avx2.plan
  • eye_status_estimation_flwr_gpu.plan

Red Eyes Estimation#

Name: RedEyeEstimator

Algorithm description:

The estimator is trained to work with warped images (see chapter "Image warping" for details) and warped landmarks.

Red Eye estimator evaluates whether a person's eyes are red in a photo or not.

You can pass only warped images with detected faces to the estimator interface. Better image quality leads to better results.

Implementation description:

The estimator (see IRedEyeEstimator in IEstimator.h):

  • Implements the estimate() function that accepts a warped source image in R8G8B8 format and warped Landmarks5. The warped image and landmarks are received from the warper (see IWarper::warp());

  • Implements the estimate() function that accepts fsdk::Span of the source warped images in R8G8B8 format and fsdk::Span of warped Landmarks.

  • Outputs RedEyeEstimation structure.

The RedEyeEstimation structure consists of attributes for each eye. Eye attributes consist of a score and a status. The score is a normalized float value in the range [0..1], where 1 is a red eye and 0 is not.

The RedEyeEstimation structure contains results of the estimation:

    struct RedEyeEstimation {
        /**
         * @brief Eyes attribute structure.
         * */
        struct RedEyeAttributes {
            RedEyeStatus status;    //!< Status of an eye.
            float score;            //!< Score, numerical value in range [0,1].
        };

        RedEyeAttributes leftEye;  //!< Left eye attributes
        RedEyeAttributes rightEye; //!< Right eye attributes
    };

There are two groups of the fields in RedEyeAttributes:

1․ The first field is a status:

        RedEyeStatus status;    //!< Status of an eye.

2․ The second field is a score, which is defined in the [0,1] range:

        float score;       //!< Score, numerical value in range [0, 1].

Enumeration of possible red eye statuses.

    enum class RedEyeStatus : uint8_t {
        NonRed,     //!< Eye is not red.
        Red,        //!< Eye is red.
    };

Recommended thresholds: 

The table below contains the threshold specified in the RedEyeEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, this threshold is set to the optimal value.

"Red eye estimator recommended threshold"

Threshold Recommended value
redEyeThreshold 0.5

Filtration parameters:

The estimator is trained to work with face images that meet the following requirements:

"Requirements for fsdk::NaturalLight"

Attribute Minimum value
score 0.5

"Requirements for fsdk::SubjectiveQuality"

Attribute Minimum value
blur 0.61
light 0.57
darkness 0.5
illumination 0.1
specularity 0.1

Also fsdk::GlassesEstimation must not be equal to fsdk::GlassesEstimation::SunGlasses.

Configurations:

See the "RedEyeEstimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

IRedEyeEstimator

Plan files:

  • red_eye_v1_cpu.plan
  • red_eye_v1_cpu-avx2.plan
  • red_eye_v1_gpu.plan

Gaze Estimation#

Name: GazeEstimator

Algorithm description:

This estimator is designed to determine the gaze direction relative to the head pose estimation. Since 3D head translation is hard to determine reliably without camera-specific calibration, only the 3D rotation component is estimated.

A sensor type can be defined for this type of estimator.

Estimation characteristics:

  • Units (degrees);
  • Notation (Euler angles);
  • Accuracy (see table below).

The roll angle is not estimated. Prediction accuracy decreases as the rotation angle increases. Typical average errors for different angle ranges are presented in the table below.

Implementation description:

The GazeEstimation structure contains results of the estimation. Each angle is measured in degrees and in [-180, 180] range:

    struct GazeEstimation {
        float yaw;      //!< Eye yaw angle.
        float pitch;    //!< Eye pitch angle.
    };

Metrics:

The table below contains gaze prediction accuracy values.

"Gaze prediction accuracy"

Range -25°...+25° -45°...-25° or +25°...+45°
Average prediction error (per axis) Yaw ±2.7° ±4.6°
Average prediction error (per axis) Pitch ±3.0° ±4.8°

Zero position corresponds to a gaze direction orthogonal to the face plane, with the axis of symmetry parallel to the vertical camera axis.

API structure name:

IGazeEstimator

Plan files:

  • gaze_v2_cpu.plan
  • gaze_v2_cpu-avx2.plan
  • gaze_v2_gpu.plan
  • gaze_ir_v2_cpu.plan
  • gaze_ir_v2_cpu-avx2.plan
  • gaze_ir_v2_gpu.plan

Head Pose Estimation#

This estimator is designed to determine a camera-space head pose. Since the 3D head translation is hard to reliably determine without a camera-specific calibration, only the 3D rotation component is estimated.

There are two head pose estimation methods available:

  • Estimate by 68 face-aligned landmarks. You can get it from the Detector facility, see Chapter "Face detection facility" for details.
  • Estimate by the original input image in the RGB format.

Estimation by image is more precise. If you have already extracted 68 landmarks for other facilities, you can save time by using the fast estimator based on 68 landmarks.

By default, all methods are enabled in the "HeadPoseEstimator" section of the faceengine.conf configuration file. You can disable methods to decrease RAM usage and initialization time.

Estimation characteristics:

  • Units (degrees)
  • Notation (Euler angles)
  • Precision (see the table below)

Note: Prediction precision decreases as the rotation angle increases. We present typical average errors for different angle ranges in the table below.

"Head pose prediction precision"

Range -45°...+45° < -45° or > +45°
Average prediction error (per axis) Yaw ±2.7° ±4.6°
Average prediction error (per axis) Pitch ±3.0° ±4.8°
Average prediction error (per axis) Roll ±3.0° ±4.6°

Zero position corresponds to a face placed orthogonally to the camera direction, with the axis of symmetry parallel to the vertical camera axis. See the figure below for reference.

Head pose illustration

Note: To work, this estimator requires precise 68-point face alignment results, so familiarize yourself with the "Face alignment" section in the "Face detection facility" chapter as well.

Approximate Garbage Score Estimation (AGS)#

This estimator aims to determine the source image score for further descriptor extraction and matching. The higher the score, the better the matching results obtained for the image.

When you have several images of a person, it is better to save the image with the highest AGS score.

Contact VisionLabs for the recommended threshold value for this parameter.

The estimator (see IAGSEstimator in IEstimator.h):

  • Implements the estimate() function that accepts the source image in the R8G8B8 format and the fsdk::Detection structure of corresponding source image. For details, see section "Detection structure" in chapter "Face detection facility".
  • Estimates garbage score of the input image.
  • Outputs a garbage score value.
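Selecting the best image of a person by AGS score, as suggested above, can be sketched as follows. The function bestFrameIndex is a hypothetical helper, and the scores are assumed to have been collected beforehand, one per candidate image, by calling the estimator.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Returns the index of the image with the highest AGS score,
// or -1 when no scores are given.
inline std::ptrdiff_t bestFrameIndex(const std::vector<float>& agsScores) {
    if (agsScores.empty()) {
        return -1;
    }
    const auto maxIt = std::max_element(agsScores.begin(), agsScores.end());
    return std::distance(agsScores.begin(), maxIt);
}
```

For scores of 0.31, 0.87, and 0.54, the second image (index 1) would be kept for descriptor extraction.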

Glasses Estimation#

Name: GlassesEstimator

Algorithm description:

Glasses estimator is designed to determine whether a person is currently wearing any glasses or not. There are 3 types of states the estimator is currently able to estimate:

  • NoGlasses - Determines whether a person is wearing any glasses at all.
  • EyeGlasses - Determines whether a person is wearing eyeglasses.
  • SunGlasses - Determines whether a person is wearing sunglasses.

Note: The source input image must be warped for the estimator to work properly (see chapter "Image warping" for details). Estimation quality depends on threshold values located in the faceengine.conf configuration file.

Implementation description:

Enumeration of possible glasses estimation statuses:

    enum class GlassesEstimation: uint8_t{
        NoGlasses,      //!< Person is not wearing glasses
        EyeGlasses,     //!< Person is wearing eyeglasses
        SunGlasses,     //!< Person is wearing sunglasses
        EstimationError //!< failed to estimate
    };

Recommended thresholds:

The table below contains thresholds specified in GlassesEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, these threshold values are set to optimal.

"Glasses estimator recommended thresholds"

Threshold Recommended value
noGlassesThreshold 1
eyeGlassesThreshold 1
sunGlassesThreshold 1

Configurations:

See the "GlassesEstimator settings" section in the "ConfigurationGuide.pdf" document.

Metrics:

The table below contains true positive rates corresponding to the selected false positive rates.

"Glasses estimator TPR/FPR rates"

State TPR FPR
NoGlasses 0.997 0.00234
EyeGlasses 0.9768 0.000783
SunGlasses 0.9712 0.000383

API structure name:

IGlassesEstimator

Plan files:

  • glasses_estimation_v2_cpu.plan
  • glasses_estimation_v2_cpu-avx2.plan
  • glasses_estimation_v2_gpu.plan

Overlap Estimation#

Name: OverlapEstimator

Algorithm description:

This estimator tells whether the face is overlapped by any object. It returns a structure with two fields: the overlapping value in the range [0..1], where 0 is not overlapped and 1.0 is overlapped, and a Boolean answer. The Boolean answer depends on the threshold listed below: if the value is greater than the threshold, the answer is true; otherwise, it is false.

Implementation description:

The estimator (see IOverlapEstimator in IOverlapEstimator.h):

  • Implements the estimate() function that accepts source image in R8G8B8 format and fsdk::Detection structure of corresponding source image (see section "Detection structure");

  • Estimates whether the face is overlapped by any object on input image;

  • Outputs structure with value of overlapping and Boolean answer.

The OverlapEstimation structure contains results of the estimation:

    struct OverlapEstimation {
        float overlapValue; //!< Numerical value of face overlapping in range [0, 1].
        bool overlapped;    //!< Overlapped face (true) or not (false).
    };

Recommended thresholds: 

The table below contains the threshold specified in the OverlapEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, this threshold value is set to optimal.

"Overlap estimator recommended threshold"

Threshold Recommended value
overlapThreshold 0.01
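
The relationship between overlapValue, overlapThreshold, and the Boolean answer can be sketched as follows. This is a minimal illustration rather than SDK code: OverlapResult and makeOverlapResult are hypothetical names that mirror the OverlapEstimation structure, and the strict "greater than" comparison follows the algorithm description above.

```cpp
// Hypothetical mirror of fsdk::OverlapEstimation, for illustration only.
struct OverlapResult {
    float overlapValue; // overlap score in [0, 1]
    bool overlapped;    // true if the score exceeds the threshold
};

// Derives the Boolean answer from a raw score and a threshold.
// Assumption: the comparison is strict, as stated in the algorithm description.
inline OverlapResult makeOverlapResult(float overlapValue, float overlapThreshold = 0.01f) {
    return OverlapResult{overlapValue, overlapValue > overlapThreshold};
}
```

With the recommended threshold of 0.01, even a small overlap value marks the face as overlapped, so raising the threshold makes the check more tolerant.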

Configurations:

See the "OverlapEstimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

IOverlapEstimator

Plan files:

  • overlap_estimation_v1_cpu.plan
  • overlap_estimation_v1_cpu-avx2.plan
  • overlap_estimation_v1_gpu.plan

Emotion estimation functionality#

Emotions Estimation#

Name: EmotionsEstimator

Algorithm description:

The estimator is trained to work with warped images (see chapter "Image warping" for details).

This estimator aims to determine whether a face depicted on an image expresses the following emotions:

  • Anger
  • Disgust
  • Fear
  • Happiness
  • Surprise
  • Sadness
  • Neutrality

You can pass only warped images with detected faces to the estimator interface. Better image quality leads to better results.

Implementation description:

The estimator (see IEmotionsEstimator in IEmotionsEstimator.h):

  • Implements the estimate() function that accepts warped source image. Warped image is received from the warper (see IWarper::warp());

  • Estimates emotions expressed by the person on a given image;

  • Outputs EmotionsEstimation structure with aforementioned data.

EmotionsEstimation presents emotions as normalized float values in the range of [0..1] where 0 is lack of a specific emotion and 1 is the maximum intensity of an emotion.

The EmotionsEstimation structure contains results of the estimation:

    struct EmotionsEstimation {
        float anger;    //!< 0(not angry)..1(angry);
        float disgust;  //!< 0(not disgusted)..1(disgusted);
        float fear;     //!< 0(no fear)..1(fear);
        float happiness;//!< 0(not happy)..1(happy);
        float sadness;  //!< 0(not sad)..1(sad);
        float surprise; //!< 0(not surprised)..1(surprised);
        float neutral;  //!< 0(not neutral)..1(neutral).

        enum Emotions {
            Anger = 0,
            Disgust,
            Fear,
            Happiness,
            Sadness,
            Surprise,
            Neutral,
            Count
        };

        /**
         * @brief Returns emotion with greatest score
         * */
        inline Emotions getPredominantEmotion() const;

        /**
         * @brief Returns score of required emotion
         * @param [in] emotion emotion
         * @see Emotions for details.
         * */
        inline float getEmotionScore(Emotions emotion) const;
    };
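
As an illustration of what "emotion with greatest score" means, the sketch below picks the index of the maximum score. It is not the SDK implementation: Emotion, EmotionScores, and predominantEmotion are hypothetical stand-ins, the scores are assumed to be stored in the order of the Emotions enum, and the tie-breaking rule (first maximum wins) is an assumption.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the Emotions enum of fsdk::EmotionsEstimation.
enum class Emotion { Anger = 0, Disgust, Fear, Happiness, Sadness, Surprise, Neutral, Count };

// Seven scores stored in the order of the enum above (an assumption for this sketch).
using EmotionScores = std::array<float, static_cast<std::size_t>(Emotion::Count)>;

// Returns the emotion with the greatest score; the first one wins on ties.
inline Emotion predominantEmotion(const EmotionScores& scores) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < scores.size(); ++i) {
        if (scores[i] > scores[best]) {
            best = i;
        }
    }
    return static_cast<Emotion>(best);
}
```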

API structure name:

IEmotionsEstimator

Plan files:

  • emotion_recognition_v2_cpu.plan
  • emotion_recognition_v2_cpu-avx2.plan
  • emotion_recognition_v2_gpu.plan

Mouth Estimation Functionality#

Name: MouthEstimator

Algorithm description:

This estimator is designed to predict a person's mouth state.

Implementation description:

Mouth Estimation

It returns the following bool flags:

    bool isOpened;   //!< Mouth is opened flag
    bool isSmiling;  //!< Person is smiling flag
    bool isOccluded; //!< Mouth is occluded flag

Each of these flags indicates a specific mouth state that was predicted.

A combined mouth state is assumed if multiple flags are set to true. For example, a person can be smiling with the mouth wide open.

The mouth estimator also provides score probabilities for the mouth states in case you need more detailed information:

    float opened;    //!< mouth opened score
    float smile;     //!< person is smiling score
    float occluded;  //!< mouth is occluded score

Mouth Estimation Extended

This estimation is an extended version of the regular Mouth Estimation (see above). In addition, it returns the following fields:

    SmileTypeScores smileTypeScores; //!< Smile types scores
    SmileType smileType; //!< Contains smile type if person "isSmiling"

If the isSmiling flag is true, you can get more detailed information about the smile from the smileType variable. smileType can hold the following states:

    enum class SmileType {
        None,  //!< No smile
        SmileLips, //!< regular smile, without teeth exposed
        SmileOpen //!< smile with teeth exposed
    };

If isSmiling is false, smileType is set to None. Otherwise, it is set to SmileLips (the person is smiling with a closed mouth) or SmileOpen (the person is smiling with an open mouth and teeth exposed).

Extended mouth estimation provides score probabilities for the smile type in case you need more detailed information:

    struct SmileTypeScores {
        float smileLips; //!< person is smiling with lips score
        float smileOpen; //!< person is smiling with open mouth score
    };

The smileType variable is set according to the scores held by smileTypeScores: it takes the type with the higher score of smileLips and smileOpen, or None if the person is not smiling at all.

    if (estimation.isSmiling)
        estimation.smileType = estimation.smileTypeScores.smileLips > estimation.smileTypeScores.smileOpen ? 
            fsdk::SmileType::SmileLips : fsdk::SmileType::SmileOpen;
    else
        estimation.smileType = fsdk::SmileType::None;

When you use Mouth Estimation Extended, the underlying computations are exactly the same as with the regular Mouth Estimation. The regular Mouth Estimation is retained for backward compatibility.

These estimators are trained to work with warped images (see Chapter "Image warping" for details).

Recommended thresholds:

The table below contains thresholds specified in the MouthEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, these threshold values are set to optimal.

"Mouth estimator recommended thresholds"

Threshold Recommended value
occlusionThreshold 0.5
smileThreshold 0.5
openThreshold 0.5

Filtration parameters:

The estimator is trained to work with face images that meet the following requirements:

  • Requirements for Detector:

Attribute Minimum value
detection size 80

Detection size is detection width.

    const fsdk::Detection detection = ... // somehow get fsdk::Detection object
    const int detectionSize = detection.getRect().width;

  • Requirements for fsdk::MouthEstimator:

Attribute Acceptable values
headPose.pitch [-20...20]
headPose.yaw [-25...25]
headPose.roll [-10...10]
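
The filtration parameters above can be checked before calling the estimator. The following is a minimal sketch under stated assumptions: HeadPoseAngles and isSuitableForMouthEstimation are hypothetical names, the angles are in degrees, and the range boundaries are treated as inclusive. In a real application, the values would come from fsdk::Detection and a head pose estimation.

```cpp
#include <cmath>

// Hypothetical plain-data holder for head pose angles, in degrees.
struct HeadPoseAngles {
    float pitch;
    float yaw;
    float roll;
};

// Returns true if the face meets the filtration parameters listed above:
// detection width of at least 80 px, |pitch| <= 20, |yaw| <= 25, |roll| <= 10.
inline bool isSuitableForMouthEstimation(int detectionWidth, const HeadPoseAngles& pose) {
    constexpr int minDetectionSize = 80;
    constexpr float maxPitch = 20.0f;
    constexpr float maxYaw = 25.0f;
    constexpr float maxRoll = 10.0f;

    return detectionWidth >= minDetectionSize &&
           std::abs(pose.pitch) <= maxPitch &&
           std::abs(pose.yaw) <= maxYaw &&
           std::abs(pose.roll) <= maxRoll;
}
```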

Configurations:

See the "Mouth Estimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

IMouthEstimator

Plan files:

  • mouth_estimation_v4_arm.plan
  • mouth_estimation_v4_cpu.plan
  • mouth_estimation_v4_cpu-avx2.plan
  • mouth_estimation_v4_gpu.plan

Face Occlusion Estimation Functionality#

Name: FaceOcclusionEstimator

Algorithm description:

This estimator is designed to predict occlusions in different parts of the face, such as the forehead, eyes, nose, mouth, and lower face. It also provides an overall occlusion score.

Implementation description:

Face Occlusion Estimation

The estimator returns the following occlusion states:

/**
 * @brief FaceOcclusionType enum.
 * This enum contains all possible facial occlusion types.
 * */
enum class FaceOcclusionType : uint8_t {
    Forehead = 0, //!< Forehead
    LeftEye,      //!< Left eye
    RightEye,     //!< Right eye
    Nose,         //!< Nose
    Mouth,        //!< Mouth
    LowerFace,    //!< Lower part of the face (chin, mouth, etc.)
    Count         //!< Total number of occlusion types
};

/**
 * @brief FaceOcclusionState enum.
 * This enum contains all possible facial occlusion states.
 * */
enum class FaceOcclusionState : uint8_t {
    NotOccluded = 0, //!< Face is not occluded
    Occluded,        //!< Face is occluded
    Count            //!< Total number of states
};


FaceOcclusionState states[static_cast<uint8_t>(FaceOcclusionType::Count)]; //!< Occlusion states for each face region
float typeScores[static_cast<uint8_t>(FaceOcclusionType::Count)]; //!< Probability scores for occlusion types
FaceOcclusionState overallOcclusionState; //!< Overall occlusion state
float overallOcclusionScore;              //!< Overall occlusion score
float hairOcclusionScore;                 //!< Hair occlusion score

To get the occlusion score for a specific facial zone, you can use the following method:

float getScore(FaceOcclusionType type) const {
    return typeScores[static_cast<uint8_t>(type)];
}

To get the occlusion state for a specific facial zone, use the following:

FaceOcclusionState getState(FaceOcclusionType type) const {
    return states[static_cast<uint8_t>(type)];
}
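
A typical use of getState() is to collect every facial zone that is reported as occluded. The sketch below assumes a reduced, hypothetical copy of the estimation structure (OcclusionResult) that keeps only the states array and the getter shown above; occludedZones is an illustrative helper, not an SDK function.

```cpp
#include <cstdint>
#include <vector>

enum class FaceOcclusionType : std::uint8_t { Forehead = 0, LeftEye, RightEye, Nose, Mouth, LowerFace, Count };
enum class FaceOcclusionState : std::uint8_t { NotOccluded = 0, Occluded, Count };

// Hypothetical reduced copy of the estimation structure, for illustration only.
struct OcclusionResult {
    FaceOcclusionState states[static_cast<std::uint8_t>(FaceOcclusionType::Count)];

    FaceOcclusionState getState(FaceOcclusionType type) const {
        return states[static_cast<std::uint8_t>(type)];
    }
};

// Collects every facial zone reported as occluded, in enum order.
inline std::vector<FaceOcclusionType> occludedZones(const OcclusionResult& result) {
    std::vector<FaceOcclusionType> zones;
    for (std::uint8_t i = 0; i < static_cast<std::uint8_t>(FaceOcclusionType::Count); ++i) {
        const auto type = static_cast<FaceOcclusionType>(i);
        if (result.getState(type) == FaceOcclusionState::Occluded) {
            zones.push_back(type);
        }
    }
    return zones;
}
```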

This estimator is trained to work with warped images and Landmarks5 (see Chapter "Image warping" for details).

Recommended thresholds:

The table below contains thresholds specified in the FaceOcclusion::Settings section of the FaceEngine configuration file (faceengine.conf). These values are optimal by default.

Threshold Recommended value
normalHairCoeff 0.15
overallOcclusionThreshold 0.07
foreheadThreshold 0.2
eyeThreshold 0.15
noseThreshold 0.2
mouthThreshold 0.15
lowerFaceThreshold 0.2

Configurations:

See the "Face Occlusion Estimator settings" section in the "ConfigurationGuide.pdf" document.

Filtration parameters:

Name Threshold
Face Size >80px
Yaw, Pitch, Roll ±20
Blur (Subjective Quality) >0.61

API structure name:

IFaceOcclusionEstimator

Plan files:

  • face_occlusion_v1_arm.plan
  • face_occlusion_v1_cpu.plan
  • face_occlusion_v1_cpu-avx2.plan
  • face_occlusion_v1_gpu.plan

DeepFake estimation functionality#

Name: DeepFakeEstimator

Algorithm description:

This estimator is designed to predict whether the face detected in the input image is synthetic or not.

Important notes:

The current implementation is experimental and does not support backward compatibility. The API can be modified in upcoming versions.

Tests were carried out with images generated by technologies from the list below:

  • Deepfacelive
  • FaceSwap
  • Face2Face
  • NeuralTextures
  • FSGAN
  • StyleGAN (v1, v2)
  • Roop (InsightFaceSwap)
  • Deepfacelab
  • SimSwap (also Dot)
  • FaceFusion
  • MidJourney (v5, v6)
  • StableDiffusion
  • Faceswapper
  • PicsiAI

Implementation description:

DeepFakeEstimator returns the following structure:

struct DeepFakeEstimation {
    enum class State {
        Real = 0,    //!< The person in image is real
        Fake         //!< The person in image is fake (media is synthetic)
    };

    float score;     //!< Estimation score
    State state;     //!< Liveness status
};

The estimation score is normalized to the range [0..1], where 1.0 means full confidence that the media is real (not synthetic), and 0.0 means full confidence that the media is synthetic (fake).

Requirements for a detected face in the source image:

  • Minimum face height is 150 pixels.
  • Yaw angles should not exceed 30 degrees.
  • Pitch angles should not exceed 20 degrees.

Recommended thresholds:

The table below contains thresholds specified in DeepFakeEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, these threshold values are set to optimal.

"DeepFakeEstimator recommended settings"

Parameter Description Type Default value
realThreshold Threshold in [0..1] range. Value::Float1 0.5
defaultEstimatorType Configuration of plan files usage. Value::Int1 2

Possible values for defaultEstimatorType:

Currently, the available values for selecting the estimation scenario are 1 and 2.

Scenario M1 uses only the first .plan file.

Scenario M2 uses both .plan files. First, the estimator pre-estimates whether the detected face in the input image is synthetic using only the second .plan file. If the result is fake, the first .plan file is not run, and the estimator returns an estimation score of 0 and the Fake status. If the second .plan file result is real, the estimator returns the estimation score and status produced by the first .plan file, just as in the M1 scenario.

Other configurations of .plan file usage are not provided.
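
The control flow of the two scenarios can be sketched as a simple cascade. This is an illustration under stated assumptions, not the SDK implementation: Model is a hypothetical callable that stands in for one .plan file and returns a "real" score in [0, 1], and deriving each verdict by comparing a score with realThreshold (using ">=") is an assumption.

```cpp
#include <functional>

enum class DeepFakeState { Real = 0, Fake };

// Hypothetical mirror of the DeepFakeEstimation structure.
struct DeepFakeResult {
    float score;
    DeepFakeState state;
};

// Hypothetical callable standing in for one .plan model.
using Model = std::function<float()>;

// Scenario M1: only the first model runs.
inline DeepFakeResult estimateM1(const Model& firstModel, float realThreshold = 0.5f) {
    const float score = firstModel();
    return {score, score >= realThreshold ? DeepFakeState::Real : DeepFakeState::Fake};
}

// Scenario M2: the second model pre-screens the input; the first model
// runs only if the pre-screen result is "real".
inline DeepFakeResult estimateM2(const Model& firstModel, const Model& secondModel,
                                 float realThreshold = 0.5f) {
    if (secondModel() < realThreshold) {
        return {0.0f, DeepFakeState::Fake}; // early exit, the first model is skipped
    }
    return estimateM1(firstModel, realThreshold);
}
```

The early exit is what distinguishes M2 from M1: inputs rejected by the pre-screen never reach the first model.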

Configurations:

See the "DeepFake Estimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

IDeepFakeEstimator

API namespace:

fsdk::experimental::IDeepFakeEstimator

Plan files:

  • deepfake_estimation_v5_model_1_cpu.plan
  • deepfake_estimation_v5_model_1_cpu-avx2.plan
  • deepfake_estimation_v5_model_1_gpu.plan

  • deepfake_estimation_v4_model_2_cpu.plan
  • deepfake_estimation_v4_model_2_cpu-avx2.plan
  • deepfake_estimation_v4_model_2_gpu.plan

Liveness check functionality#

LivenessFlyingFaces Estimation#

Name: LivenessFlyingFacesEstimator

Algorithm description:

This estimator tells whether the person's face is real or fake (photo, printed image).

Implementation description:

The estimator (see ILivenessFlyingFacesEstimator in ILivenessFlyingFacesEstimator.h):

  • Implements the estimate() function that needs fsdk::Image with valid image in R8G8B8 format and fsdk::Detection of corresponding source image (see section "Detection structure" in chapter "Face detection facility").

  • Implements the estimate() function that needs the span of fsdk::Image with valid source images in R8G8B8 formats and span of fsdk::Detection of corresponding source images (see section "Detection structure" in chapter "Face detection facility").

These methods estimate whether the persons are real or not. The estimation output contains float scores normalized to the range [0..1], where 1 means a real person and 0 means a fake.

The estimator is trained to work in combination with fsdk::ILivenessRGBMEstimator.

The LivenessFlyingFacesEstimation structure contains results of the estimation:

    struct LivenessFlyingFacesEstimation {
        float score;    //!< Numerical value in range [0, 1].
        bool isReal;    //!< Is real face (true) or not (false).
    };

Recommended thresholds: 

The table below contains thresholds specified in the LivenessFlyingFacesEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, these threshold values are set to optimal.

"LivenessFlyingFaces estimator recommended thresholds"

Threshold Recommended value
realThreshold 0.5
aggregationCoeff 0.7

Filtration parameters:

The estimator is trained to work with face images that meet the following requirements:

"Requirements for fsdk::BestShotQualityEstimator::EstimationResult"

Attribute Acceptable values
headPose.pitch [-30...30]
headPose.yaw [-30...30]
headPose.roll [-40...40]
ags [0.5...1.0]

"Requirements for fsdk::Detection"

Attribute Minimum value
detection size 80

Detection size is detection width.

const fsdk::Detection detection = ... // somehow get fsdk::Detection object
const int detectionSize = detection.getRect().width;

Configurations:

See the "LivenessFlyingFaces Estimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

ILivenessFlyingFacesEstimator

Plan files:

  • flying_faces_liveness_v4_cpu.plan
  • flying_faces_liveness_v4_cpu-avx2.plan
  • flying_faces_liveness_v4_gpu.plan

LivenessRGBM Estimation#

Name: LivenessRGBMEstimator

Algorithm description:

This estimator tells whether the person's face is real or fake (photo, printed image).

Implementation description:

The estimator (see ILivenessRGBMEstimator in ILivenessRGBMEstimator.h):

  • Implements the estimate() function that needs fsdk::Face with a valid image in R8G8B8 format, the detection structure of the corresponding source image (see section "Detection structure" in chapter "Face detection facility"), and fsdk::Image with the accumulated background. This method estimates whether the person is real or not. The output estimation structure contains a float score and a Boolean result. The float score is normalized to the range [0..1], where 1 means a real person and 0 means a fake. The Boolean result is true for a real person and false otherwise.

  • Implements the update() function that needs the fsdk::Image with the current frame, the number of that frame, and the previously accumulated background. The accumulated background will be overwritten by this call.

The LivenessRGBMEstimation structure contains results of the estimation:

    struct LivenessRGBMEstimation {
        float score = 0.0f; //!< Estimation score
        bool isReal = false;//!< Whether the person is real or not
    };

Recommended thresholds: 

The table below contains thresholds specified in the LivenessRGBMEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, these threshold values are set to optimal.

"LivenessRGBM estimator recommended thresholds"

Threshold Recommended value
backgroundCount 100
threshold 0.8
coeff1 0.222
coeff2 0.222

Configurations:

See the "LivenessRGBM Estimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

ILivenessRGBMEstimator

Plan files:

  • rgbm_liveness_cpu.plan
  • rgbm_liveness_cpu-avx2.plan
  • rgbm_liveness_gpu.plan

Depth Liveness Estimation (LivenessDepthEstimator)#

Name: LivenessDepthEstimator

Algorithm description:

This estimator tells whether the person's face is real or fake (photo, printed image).

Implementation description:

The estimator (see ILivenessDepthEstimator in ILivenessDepthEstimator.h):

  • Implements the estimate() function that accepts a source warped image (see chapter "Image warping" for details) in R16 format and the fsdk::DepthEstimation structure. This method estimates whether or not the depth map corresponds to a real person. The estimation output contains a float score normalized to the range [0..1], where 1 means a real person and 0 means a fake.

The DepthEstimation structure contains results of the estimation:

    struct DepthEstimation {
        float score; //!< confidence score in [0,1] range. The closer the score to 1, the more likely that person is alive.
        bool isReal; //!< boolean flag that indicates whether a person is real.
    };

Recommended thresholds: 

The table below contains thresholds specified in the DepthEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, these threshold values are set to optimal.

"Depth estimator recommended thresholds"

Threshold Recommended value
maxDepthThreshold 3000
minDepthThreshold 100
zeroDepthThreshold 0.66
confidenceThreshold 0.89

Filtration parameters:

The estimator is trained to work with face images that meet the following requirements:

"Requirements for fsdk::HeadPoseEstimation"

Attribute Acceptable angle range(degrees)
pitch [-15...15]
yaw [-15...15]
roll [-10...10]

"Requirements for fsdk::Quality"

Attribute Minimum value
blur 0.94
light 0.90
dark 0.93

"Requirements for fsdk::EyesEstimation"

Attribute State
leftEye Open
rightEye Open

Also, the minimum distance between the face bounding box and the frame borders should be greater than 20 pixels.
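
The filtration parameters above can be combined into a single pre-check. The sketch below is an illustration under stated assumptions: DepthLivenessInputs and meetsDepthLivenessRequirements are hypothetical names, and the plain fields stand in for values that would come from fsdk::HeadPoseEstimation, fsdk::Quality, fsdk::EyesEstimation, and the detection rectangle. Range boundaries are treated as inclusive, except for the border distance, which must be strictly greater than 20 pixels.

```cpp
#include <cmath>

// Hypothetical plain-data summary of the inputs to the checks listed above.
struct DepthLivenessInputs {
    float pitch;           // degrees
    float yaw;             // degrees
    float roll;            // degrees
    float blur;            // quality score
    float light;           // quality score
    float dark;            // quality score
    bool leftEyeOpen;
    bool rightEyeOpen;
    int minBorderDistance; // smallest distance from the bounding box to a frame border, px
};

// Returns true only if every requirement from the tables above is met.
inline bool meetsDepthLivenessRequirements(const DepthLivenessInputs& in) {
    const bool poseOk = std::abs(in.pitch) <= 15.0f &&
                        std::abs(in.yaw) <= 15.0f &&
                        std::abs(in.roll) <= 10.0f;
    const bool qualityOk = in.blur >= 0.94f && in.light >= 0.90f && in.dark >= 0.93f;
    const bool eyesOk = in.leftEyeOpen && in.rightEyeOpen;
    const bool borderOk = in.minBorderDistance > 20;
    return poseOk && qualityOk && eyesOk && borderOk;
}
```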

Configurations:

See the "Depth Estimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

ILivenessDepthEstimator

Plan files:

  • depth_estimation_v2_1_cpu.plan
  • depth_estimation_v2_1_cpu-avx2.plan
  • depth_estimation_v2_1_gpu.plan

Depth and RGB OneShotLiveness estimation#

Name: LivenessDepthRGBEstimator

Algorithm description:

This estimator shows whether the person's face is real or fake (photo, printed image). You can use this estimator in payment terminals (POS) and self-service cash registers (KCO) with two cameras - Depth and RGB.

The estimation is performed on the device with an Orbbec camera. The camera can be either built into a POS or KCO device or connected to it. This allows the estimation to be performed at a higher speed and makes it more secure, as data is not sent to the backend. Using the algorithm with Orbbec cameras lets you work with depth data. It increases system reliability and accuracy, as 3D data lets you assess facial shapes and detect fake masks more accurately.

The estimator is trained to work with warped images. For details, see chapter "Image warping".

Supported devices

The estimator works only on the following devices:

  • VLS LUNA CAMERA 3D
  • VLS LUNA CAMERA 3D Embedded

Different models of Orbbec cameras have different spacing between sensors. If you need to use another Orbbec Depth+RGB camera, you can change the calibration coefficients to match the device. Please, contact VisionLabs for details.

Image requirements

This estimator works based on two images:

  • RGB image from the RGB camera
  • Depth image (or depth map) from the depth camera

Input images must meet the following requirements:

Parameter Requirements
Resolution 640 × 480 pixels
Compression No
Image cropping No
Image rotation No
Effects overlay No
Number of faces in the frame 1
Face detection bounding box size 200 pixels
Frame edges offset 10 pixels
Head pose -20 to +20 degrees for head pitch, yaw, and roll.
Image quality The face in the frame should not be overexposed, underexposed, or blurred. For details, see section "Image Quality Estimation".

Implementation description:

The estimator implements the following:

  • The estimate() function that needs the depth frame as the first fsdk::Image object, the RGB frame as the second fsdk::Image object, and fsdk::Detection and fsdk::Landmarks5 objects (see section "Detection structure" in chapter "Face detection facility"). The estimation output is the fsdk::DepthRGBEstimation structure.
  • The estimate() function that needs the first span of depth frames as the fsdk::Image objects, the second span of RGB frames as the fsdk::Image objects, a span of fsdk::Detection, and a span of fsdk::Landmarks5 (see section "Detection structure" in chapter "Face detection facility").
    The estimation output is a span of fsdk::DepthRGBEstimation structures. The second output value is the fsdk::DepthRGBEstimation structure.

DepthRGBEstimation

The DepthRGBEstimation structure contains results of the estimation:

struct DepthRGBEstimation {
    //!< confidence score in [0,1] range.
    //!< The closer the score to 1, the more likely that person is alive.
    float score;
    //!< boolean flag that indicates whether a person is real.
    bool isReal;
};

The estimation score is normalized to the range [0..1], where 1 means a real person and 0 means a fake.

The value of isReal depends on score and confidenceThreshold. The value of the confidenceThreshold can be changed in configuration file faceengine.conf (see ConfigurationGuide LivenessDepthRGBEstimator).

API structure name:

ILivenessDepthRGBEstimator

See ILivenessDepthRGBEstimator in ILivenessDepthRGBEstimator.h.

Plan files:

  • depth_rgb_v2_model_1_cpu.plan
  • depth_rgb_v2_model_1_gpu.plan
  • depth_rgb_v2_model_2_cpu.plan
  • depth_rgb_v2_model_2_gpu.plan
  • depth_rgb_v2_model_1_cpu-avx2.plan
  • depth_rgb_v2_model_2_cpu-avx2.plan

Depth liveness estimation (DepthLivenessEstimator)#

Name: DepthLivenessEstimator

Algorithm description:

Given a face depth warp, the estimator tells whether the face is real or fake (photo, printed image).

The estimator aims to unify different use cases of depth liveness estimation, while increasing the estimation accuracy compared to existing depth estimators.

The estimator can be used in payment terminals (POS) and self-service cash registers (KCO) with two cameras - Depth and RGB.

The estimator is trained to work with warped depth images of faces. For details, see chapter "Image warping".

The estimator can be used together with LivenessDepthRGBEstimator or as standalone. When DepthLivenessEstimator is used in conjunction with LivenessDepthRGBEstimator, the latter takes care of necessary preprocessing of RGB and depth frames, producing depth warps of faces required by DepthLivenessEstimator. When DepthLivenessEstimator is used as standalone, it is your responsibility to prepare a warped depth image of a face for estimation, including handling such issues as:

  1. detecting faces on RGB frames, quality checking of RGB frames and detections
  2. [possibly required] mapping between a) RGB frames used for face detection and b) depth frames
  3. obtaining depth warps of faces from depth frames

Supported devices

On its own, the estimator requires just a properly prepared depth warp of a face, and doesn't constrain the list of possible devices. However, if LivenessDepthRGBEstimator is involved, it has its own constraints.

Image requirements

The estimator works based on depth warps of faces. The warps must be 250x250 pixels, in the fsdk::Format::R16 format. If you prepare depth warps yourself, there are some basic quality requirements for RGB frames:

Parameter Requirements
Resolution 640 × 480 pixels
Compression No
Image cropping No
Image rotation No
Effects overlay No
Number of faces in the frame 1
Face detection bounding box size 200 pixels
Frame edges offset 10 pixels
Head pose -15 to +15 degrees for head pitch, yaw, and roll.
Image quality The face in the frame should not be overexposed, underexposed, or blurred. For details, see section "Image Quality Estimation".

Implementation description:

The estimator (see IDepthLivenessEstimator.h) implements the following:

  • The estimate() function that needs the depth warp as the first fsdk::Image object. The estimation output is the returned fsdk::DepthLivenessEstimation structure.
  • The estimate() function that needs a span of depth warps (fsdk::Image objects) as the first parameter, and a span of fsdk::DepthLivenessEstimation as the second parameter. The estimation output is saved in the second parameter.

DepthLivenessEstimation

The DepthLivenessEstimation structure contains results of the estimation:

struct DepthLivenessEstimation {
    //!< confidence score in [0,1] range.
    //!< The closer the score to 1, the more likely that person is alive.
    float score;
    //!< boolean flag that indicates whether a person is real.
    bool isReal;
};

The estimation score is normalized to the range [0..1], where 1 means a real person and 0 means a fake.

The value of isReal depends on score and confidenceThreshold. The value of the confidenceThreshold can be changed in configuration file faceengine.conf (see ConfigurationGuide DepthLivenessEstimator).

API structure name:

IDepthLivenessEstimator

See IDepthLivenessEstimator in IDepthLivenessEstimator.h.

Examples:

  • C++ example: example_depth_liveness
  • Python example: example_depth_liveness.py

Plan files:

  • depth_liveness_v2_arm.plan
  • depth_liveness_v2_cpu.plan
  • depth_liveness_v2_cpu-avx2.plan
  • depth_liveness_v2_gpu.plan

LivenessOneShotRGB Estimation#

Name: LivenessOneShotRGBEstimator

Algorithm description:

This estimator shows whether the person's face is real or fake by the following types of attacks:

  • Printed Photo Attack. One or several photos of another person are used.
  • Video Replay Attack. A video of another person is used.
  • Printed Mask Attack. An imposter cuts out a face from a photo and covers his face with it.
  • 3D Mask Attack. An imposter puts on a 3D mask depicting the face of another person.

The requirements for the processed image and the face in the image are listed below.

Parameters Requirements
Minimum resolution for mobile devices 720x960 pixels
Maximum resolution for mobile devices 1080x1920 pixels
Minimum resolution for webcams 1280x720 pixels
Maximum resolution for webcams 1920x1080 pixels
Compression No
Image warping No
Image cropping No
Effects overlay No
Mask No
Number of faces in the frame 1
Face detection bounding box width More than 200 pixels
Frame edges offset More than 10 pixels
Head pose -20 to +20 degrees for head pitch, yaw, and roll
Image quality The face in the frame should not be overexposed, underexposed, or blurred.

See image quality thresholds in the "Image Quality Estimation" section.

Implementation description:

The estimator (see ILivenessOneShotRGBEstimator in ILivenessOneShotRGBEstimator.h):

  • Implements the estimate() function that needs fsdk::Image, fsdk::Detection and fsdk::Landmarks5 objects (see section "Detection structure" in chapter "Face detection facility"). Output estimation is a structure fsdk::LivenessOneShotRGBEstimation.

  • Implements the estimate() function that needs the span of fsdk::Image, span of fsdk::Detection and span of fsdk::Landmarks5 (see section "Detection structure" in chapter "Face detection facility").
    The first output is a span of fsdk::LivenessOneShotRGBEstimation structures. The second output value (an fsdk::LivenessOneShotRGBEstimation structure) is the result of aggregation over that span of estimations. Note that the second output value (aggregation) is optional: it is a default argument whose default value is nullptr.

The LivenessOneShotRGBEstimation structure contains results of the estimation:

struct LivenessOneShotRGBEstimation {
    enum class State {
        Alive = 0,   //!< The person on image is real
        Fake,        //!< The person on image is fake (photo, printed image)
        Unknown      //!< The liveness status of person on image is Unknown
    };

    float score;        //!< Estimation score
    State state;        //!< Liveness status
    float qualityScore; //!< Liveness quality score
};

The estimation score is normalized to the range [0..1], where 1 means a real person and 0 means a fake.

The liveness quality score is an image quality estimation for liveness recognition.

This parameter is used to decide whether a frame is suitable as a best shot when checking for liveness.

The reference score is 0.5.

The value of State depends on score and qualityThreshold. The value of qualityThreshold can be passed as an argument of the estimate() method (see ILivenessOneShotRGBEstimator) or set in the configuration file faceengine.conf (see ConfigurationGuide LivenessOneShotRGBEstimator).
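
One plausible way the three states could be derived from the scores is sketched below. The exact rule is not documented here, so this is an assumption for illustration only: resolveState is a hypothetical helper, a quality score below qualityThreshold is assumed to yield Unknown, and the remaining cases are assumed to be split by comparing score with realThreshold.

```cpp
// Mirrors the State enum of LivenessOneShotRGBEstimation.
enum class LivenessState { Alive = 0, Fake, Unknown };

// Assumed decision rule (not confirmed by the SDK documentation):
// low-quality input yields Unknown; otherwise the score is compared with realThreshold.
inline LivenessState resolveState(float score, float qualityScore,
                                  float realThreshold = 0.5f,
                                  float qualityThreshold = 0.5f) {
    if (qualityScore < qualityThreshold) {
        return LivenessState::Unknown;
    }
    return score >= realThreshold ? LivenessState::Alive : LivenessState::Fake;
}
```

Treating low-quality frames as Unknown rather than Fake lets the caller request another frame instead of rejecting the person outright.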

Recommended thresholds: 

The table below contains thresholds specified in the LivenessOneShotRGBEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, these threshold values are set to optimal.

"LivenessOneShotRGB estimator recommended thresholds"

Threshold Recommended value
realThreshold 0.5
qualityThreshold 0.5
calibrationCoeff 0.89

Configurations:

See the "LivenessOneShotRGBEstimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

ILivenessOneShotRGBEstimator

Plan files:

  • oneshot_rgb_liveness_v8_model_1_cpu.plan
  • oneshot_rgb_liveness_v8_model_2_cpu.plan
  • oneshot_rgb_liveness_v8_model_3_cpu.plan
  • oneshot_rgb_liveness_v8_model_4_cpu.plan
  • oneshot_rgb_liveness_v8_model_1_cpu-avx2.plan
  • oneshot_rgb_liveness_v8_model_2_cpu-avx2.plan
  • oneshot_rgb_liveness_v8_model_3_cpu-avx2.plan
  • oneshot_rgb_liveness_v8_model_4_cpu-avx2.plan
  • oneshot_rgb_liveness_v8_model_1_gpu.plan
  • oneshot_rgb_liveness_v8_model_2_gpu.plan
  • oneshot_rgb_liveness_v8_model_3_gpu.plan
  • oneshot_rgb_liveness_v8_model_4_gpu.plan

Usage example#

The face in the image and the image itself should meet the estimator requirements.

You can find additional information in the example (examples/example_estimation/main.cpp) or in the code below.

// Minimum detection size in pixels.
constexpr int minDetSize = 200;

// Step back from the borders.
constexpr int borderDistance = 10;

if (std::min(detectionRect.width, detectionRect.height) < minDetSize) {
    std::cerr << "Bounding Box width and/or height is less than `minDetSize` - " << minDetSize << std::endl;
    return false;
}

if ((detectionRect.x + detectionRect.width) > (image.getWidth() - borderDistance) || detectionRect.x < borderDistance) {
    std::cerr << "Bounding Box width is out of border distance - " << borderDistance << std::endl;
    return false;
}

if ((detectionRect.y + detectionRect.height) > (image.getHeight() - borderDistance) || detectionRect.y < borderDistance) {
    std::cerr << "Bounding Box height is out of border distance - " << borderDistance << std::endl;
    return false;
}

// Yaw, pitch and roll.
constexpr int principalAxes = 20;

if (std::abs(headPose.pitch) > principalAxes ||
    std::abs(headPose.yaw) > principalAxes ||
    std::abs(headPose.roll) > principalAxes ) {

    std::cerr << "Can't estimate LivenessOneShotRGBEstimation. " <<
        "Yaw, pitch or roll absolute value is larger than expected value: " << principalAxes << "." <<
        "\nPitch angle estimation: " << headPose.pitch <<
        "\nYaw angle estimation: " << headPose.yaw <<
        "\nRoll angle estimation: " << headPose.roll << std::endl;
    return false;
}

We recommend using Detector type 3 (fsdk::ObjectDetectorClassType::FACE_DET_V3).

Personal Protection Equipment Estimation#

Name: PPEEstimator

Algorithm description:

The Personal Protection Equipment (PPE) estimator predicts whether a person is wearing one or multiple types of protection equipment, such as:

  • Helmet
  • Hood
  • Vest
  • Gloves
  • Safety harness

For each attribute, the estimator returns three prediction scores: the probability that the person is wearing the equipment, the probability that the person is not wearing it, and an "unknown" score. The "unknown" score is the highest of the three when the estimator cannot tell whether the person in the image is wearing that equipment.

To correctly detect personal protection equipment, the following requirements must be met:

  • Scene requirements:
    • Moving objects must be visually separated from each other in the image.
    • A background must be mostly static and must not change rapidly.
    • The maximum image shift due to camera shake is 1% of the frame size.
    • Overlapping of moving objects by static objects, such as columns, industrial items, and so on, must be minimal.
    • The analyzed scene must not have reflective surfaces. If there are any, they must be covered.
    • Large obstacles should be avoided in the camera's field of view. Pillars, tower cranes, stacked materials, and so on cause tracks to break and also overlap people. If an obstacle is unavoidable, we recommend that you keep it out of the center of the frame.
    • Strong camera lights are allowed in a frame. We do not recommend that you point the camera at spotlights and active welding zones, especially in the foreground, because it reduces the visibility of people and the visibility of PPE on them.
    • The camera lens should be kept clean and free of dust. We do not recommend that you place cameras above a material unloading area or near ventilation shafts, because dust on the lens reduces the visibility of people and the visibility of PPE on them.
    • The camera must not be tilted too steeply. From a top-down perspective, PPE (vest and gloves) can be less visible.
  • Image requirements:
    • A person and PPE must be clearly visible to the human eye.
    • Overlapping of a person or PPE with an obstacle or another person and cropping by frame boundaries should not exceed 25%.
    • The linear dimensions of PPE should not exceed 65% of the corresponding frame size.
    • The image must not be noisy or distorted by compression algorithm artifacts. The image must be a color one.
    • The duration of visibility of a PPE must be at least 10-13 frames.
    • The height of the image of a person in pixels must be not less than 100. The minimum pixel density per meter (height of the object in pixels to the height of the object in meters) is 60ppm.
    • The minimum height and color of equipment on body parts must be as follows:

      | Equipment | Minimum height, in pixels | Color |
      |---|---|---|
      | Vest | 50 | Light green (green), yellow, orange |
      | Helmet | 20 | White, yellow, orange, red |
      | Hood | 20 | N/A |
      | Gloves | 20 | White, gray, black |
      | Safety harness | 50 | N/A |

  • Video stream requirements:

    | Parameter | Requirement |
    |---|---|
    | Minimum resolution | 640x360 pixels |
    | Maximum resolution | 1920x1080 pixels |
    | Minimum frame rate | 13 frames per second |

  • Lighting requirements:

    | Parameter | Requirement |
    |---|---|
    | Scene lighting | 200 lux or more |
    | Sudden changes in lighting | None |
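
The numeric requirements above can be verified before frames are passed to the estimator. The following is a minimal sketch only: `StreamInfo`, `meetsStreamRequirements`, and `meetsPersonHeightRequirement` are hypothetical helper names that are not part of the SDK, and the limits are copied from the requirements listed above.

```cpp
// Hypothetical pre-check helpers; none of these names belong to the SDK.
struct StreamInfo {
    int width;              // frame width in pixels
    int height;             // frame height in pixels
    double framesPerSecond; // frame rate of the video stream
};

// Video stream requirements: from 640x360 up to 1920x1080, at least 13 fps.
constexpr bool meetsStreamRequirements(const StreamInfo& s) {
    return s.width >= 640 && s.height >= 360 &&
           s.width <= 1920 && s.height <= 1080 &&
           s.framesPerSecond >= 13.0;
}

// Image requirement: the image of a person must be at least 100 pixels high.
constexpr bool meetsPersonHeightRequirement(int personHeightPx) {
    return personHeightPx >= 100;
}
```

Requirements that cannot be expressed as simple numeric limits, such as scene layout or lens cleanliness, still have to be checked when the camera is installed.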

Implementation description:

The Personal Protection Equipment Estimation structure for each attribute looks as follows:

    struct OnePPEEstimation {
        float positive = 0.0f;
        float negative = 0.0f;
        float unknown  = 0.0f;

        enum class PPEState : uint8_t {
            Positive, //!< person is wearing specific personal equipment;
            Negative, //!< person isn't wearing specific personal equipment;
            Unknown,  //!< it's hard to tell whether person wears specific personal equipment.
            Count     //!< state count
        };

        /**
         * @brief returns predominant personal equipment state
         * */
        inline PPEState getPredominantState();
    };

All three prediction scores sum up to 1.
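
As an illustration of how the three scores relate to a single state, the sketch below picks the state with the highest score. This is an assumption about what `getPredominantState()` does, not the SDK implementation; `PPEScores` and `predominantState` are hypothetical stand-ins, and the tie-breaking order is chosen arbitrarily.

```cpp
#include <cstdint>

// Standalone copy of the score fields of OnePPEEstimation, for illustration only.
struct PPEScores {
    float positive;
    float negative;
    float unknown;
};

enum class PPEState : std::uint8_t { Positive, Negative, Unknown };

// Assumption: the predominant state is the one with the highest score.
// Ties are resolved in the order Positive, Negative, Unknown.
constexpr PPEState predominantState(const PPEScores& s) {
    return (s.positive >= s.negative && s.positive >= s.unknown) ? PPEState::Positive
         : (s.negative >= s.unknown)                             ? PPEState::Negative
                                                                 : PPEState::Unknown;
}
```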

The estimator takes as input an image and the bounding box of the person whose attributes are to be predicted. For more information about the human detector, see the "Human Detection" section.

API structure name:

IPPEEstimator

Plan files:

  • ppe_estimation_v3_cpu.plan
  • ppe_estimation_v3_cpu-avx2.plan
  • ppe_estimation_v3_gpu.plan

Medical Mask Estimation Functionality#

Name: MedicalMaskEstimator

This estimator detects a medical mask on the face in the source image. With the MedicalMaskEstimation interface, it can return the following results:

  • A medical mask is on the face (see MedicalMask::Mask field in the MedicalMask enum);
  • There is no medical mask on the face (see MedicalMask::NoMask field in the MedicalMask enum);
  • The face is occluded with something (see MedicalMask::OccludedFace field in the MedicalMask enum);

With the MedicalMaskEstimationExtended interface, it can return the following results:

  • A medical mask is on the face (see MedicalMaskExtended::Mask field in the MedicalMaskExtended enum);
  • There is no medical mask on the face (see MedicalMaskExtended::NoMask field in the MedicalMaskExtended enum);
  • A medical mask is not in the right place (see MedicalMaskExtended::MaskNotInPlace field in the MedicalMaskExtended enum);
  • The face is occluded with something (see MedicalMaskExtended::OccludedFace field in the MedicalMaskExtended enum);

The estimator (see IMedicalMaskEstimator in IEstimator.h):

  • Implements the estimate() function that accepts source warped image in R8G8B8 format and medical mask estimation structure to return results of estimation;
  • Implements the estimate() function that accepts source image in R8G8B8 format, face detection to estimate and medical mask estimation structure to return results of estimation;
  • Implements the estimate() function that accepts fsdk::Span of the source warped images in R8G8B8 format and fsdk::Span of the medical mask estimation structures to return results of estimation;
  • Implements the estimate() function that accepts fsdk::Span of the source images in R8G8B8 format, fsdk::Span of face detections and fsdk::Span of the medical mask estimation structures to return results of the estimation.

Every method can be used with MedicalMaskEstimation and MedicalMaskEstimationExtended.

The estimator was implemented for two use-cases:

  1. When the user already has warped images. For example, when the medical mask estimation is performed right before (or after) the face recognition.
  2. When the user has face detections only.

Note: Calling the estimate() method with warped image and the estimate() method with image and detection for the same image and the same face could lead to different results.

MedicalMaskEstimator thresholds#

The estimator returns several scores, one for each possible result. The final result is based on those scores and the thresholds. If a score is above its corresponding threshold, that result is taken as final. If none of the scores exceeds its threshold, the result with the maximum score is taken. If several scores exceed their thresholds, the results take precedence in the following order.

For MedicalMaskEstimation:

Mask, NoMask, OccludedFace

For MedicalMaskEstimationExtended:

Mask, NoMask, MaskNotInPlace, OccludedFace

The default values for all thresholds are taken from the configuration file. See Configuration guide for details.
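
The selection rule described above can be sketched as follows for the MedicalMaskEstimation case. This is an illustration of the documented rule rather than the SDK implementation: `MaskResult`, `MaskScores`, `MaskThresholds`, and `selectResult` are hypothetical names, and any threshold values passed in are the caller's own.

```cpp
// Hypothetical stand-ins for the SDK types, used only for this sketch.
enum class MaskResult { Mask, NoMask, OccludedFace };

struct MaskScores {
    float mask;
    float noMask;
    float occludedFace;
};

struct MaskThresholds {
    float mask;
    float noMask;
    float occludedFace;
};

constexpr MaskResult selectResult(const MaskScores& s, const MaskThresholds& t) {
    // A score above its threshold wins; precedence is Mask, NoMask, OccludedFace.
    if (s.mask > t.mask) return MaskResult::Mask;
    if (s.noMask > t.noMask) return MaskResult::NoMask;
    if (s.occludedFace > t.occludedFace) return MaskResult::OccludedFace;

    // No score exceeded its threshold: fall back to the highest score.
    if (s.mask >= s.noMask && s.mask >= s.occludedFace) return MaskResult::Mask;
    return s.noMask >= s.occludedFace ? MaskResult::NoMask : MaskResult::OccludedFace;
}
```

Note that with unequal thresholds a result can be selected even when another score is higher, because exceeding a threshold takes priority over the maximum-score fallback.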

MedicalMask enumeration#

The MedicalMask enumeration contains all possible results of the medical mask estimation, and the DetailedMaskType enumeration refines them:

    enum class MedicalMask {
        Mask = 0,                 //!< medical mask is on the face
        NoMask,                   //!< no medical mask on the face
        OccludedFace              //!< face is occluded by something
    };

    enum class DetailedMaskType {
        CorrectMask = 0,               //!< correct mask on the face (mouth and nose are covered correctly)
        MouthCoveredWithMask,          //!< mask covers only a mouth
        ClearFace,                     //!< clear face - no mask on the face
        ClearFaceWithMaskUnderChin,    //!< clear face with a mask around the chin, mask does not cover anything in the face region (from mouth to eyes)
        PartlyCoveredFace,             //!< face is covered with something other than a medical mask or a full mask
        FullMask,                      //!< face is covered with a full mask (such as balaclava, ski mask, etc.)
        Count
    };

The MedicalMask values correspond to the DetailedMaskType values as follows:

  • Mask corresponds to CorrectMask or MouthCoveredWithMask;
  • NoMask corresponds to ClearFace or ClearFaceWithMaskUnderChin;
  • OccludedFace corresponds to PartlyCoveredFace or FullMask.

Note - NoMask means absence of medical mask or any occlusion in the face region (from mouth to eyes).

Note - DetailedMaskType is not supported for NPU-based platforms.

MedicalMaskEstimation structure#

The MedicalMaskEstimation structure contains results of the estimation:

    struct MedicalMaskEstimation {
        MedicalMask result;           //!< estimation result (@see MedicalMask enum)
        DetailedMaskType maskType;    //!< detailed type  (@see DetailedMaskType enum)

        // scores
        float maskScore;         //!< medical mask is on the face score
        float noMaskScore;       //!< no medical mask on the face score
        float occludedFaceScore; //!< face is occluded by something score

        float scores[static_cast<int>(DetailedMaskType::Count)]{};    //!< detailed estimation scores

        inline float getScore(DetailedMaskType type) const;
    };

There are two groups of the fields:

1․ The first group contains the result:

    MedicalMask result;

The result field of MedicalMaskEstimation contains the target result of the estimation. The maskType field holds the more detailed type:

    DetailedMaskType maskType;           //!< detailed type

2․ The second group contains scores:

    float maskScore;          //!< medical mask is on the face score
    float noMaskScore;        //!< no medical mask on the face score
    float occludedFaceScore;  //!< face is occluded by something score

The score group contains the estimation scores for each possible result of the estimation. All scores are defined in the [0,1] range. They can be useful for users who want to change the default thresholds for this estimator. If the default thresholds are used, the score group can be ignored in the user code. More detailed scores for each detailed type of face covering are stored in:

    float scores[static_cast<int>(DetailedMaskType::Count)]{};    //!< detailed estimation scores

  • maskScore is the sum of scores for CorrectMask and MouthCoveredWithMask;
  • noMaskScore is the sum of scores for ClearFace and ClearFaceWithMaskUnderChin;
  • occludedFaceScore is the sum of scores for PartlyCoveredFace and FullMask.

Note - DetailedMaskType, scores, and getScore are not supported for NPU-based platforms. This means that these fields and methods cannot be used in user code on such platforms.
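
On platforms where the detailed scores are available, the relationship between the detailed scores and the three aggregated scores can be sketched as below. `DetailedScores`, `scoreOf`, and the three summing functions are hypothetical helpers written for this illustration; only the `DetailedMaskType` values are taken from the enumeration above.

```cpp
enum class DetailedMaskType {
    CorrectMask = 0,
    MouthCoveredWithMask,
    ClearFace,
    ClearFaceWithMaskUnderChin,
    PartlyCoveredFace,
    FullMask,
    Count
};

// Hypothetical container that mirrors the scores array of MedicalMaskEstimation.
struct DetailedScores {
    float scores[static_cast<int>(DetailedMaskType::Count)];
};

constexpr float scoreOf(const DetailedScores& d, DetailedMaskType type) {
    return d.scores[static_cast<int>(type)];
}

// Each aggregated score is the sum of two detailed scores.
constexpr float maskScore(const DetailedScores& d) {
    return scoreOf(d, DetailedMaskType::CorrectMask) +
           scoreOf(d, DetailedMaskType::MouthCoveredWithMask);
}

constexpr float noMaskScore(const DetailedScores& d) {
    return scoreOf(d, DetailedMaskType::ClearFace) +
           scoreOf(d, DetailedMaskType::ClearFaceWithMaskUnderChin);
}

constexpr float occludedFaceScore(const DetailedScores& d) {
    return scoreOf(d, DetailedMaskType::PartlyCoveredFace) +
           scoreOf(d, DetailedMaskType::FullMask);
}
```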

MedicalMaskExtended enumeration#

The MedicalMaskExtended enumeration contains all possible results of the extended medical mask estimation:

    enum class MedicalMaskExtended {
        Mask = 0,                 //!< medical mask is on the face
        NoMask,                   //!< no medical mask on the face
        MaskNotInPlace,           //!< mask is not on the right place
        OccludedFace              //!< face is occluded by something
    };

MedicalMaskEstimationExtended structure#

The MedicalMaskEstimationExtended structure contains results of the estimation:

    struct MedicalMaskEstimationExtended {
        MedicalMaskExtended result;     //!< estimation result (@see MedicalMaskExtended enum)
        // scores
        float maskScore;         //!< medical mask is on the face score
        float noMaskScore;       //!< no medical mask on the face score
        float maskNotInPlace;    //!< mask is not on the right place
        float occludedFaceScore; //!< face is occluded by something score
    };

There are two groups of the fields:

1․ The first group contains only the result enum:

        MedicalMaskExtended result;

The result field of MedicalMaskEstimationExtended contains the target result of the estimation.

2․ The second group contains scores:

        float maskScore;         //!< medical mask is on the face score
        float noMaskScore;       //!< no medical mask on the face score
        float maskNotInPlace;    //!< mask is not on the right place
        float occludedFaceScore; //!< face is occluded by something score

The score group contains the estimation scores for each possible result of the estimation. All scores are defined in [0,1] range.

Filtration parameters#

The estimator is trained to work with face images that meet the following requirements:

"Requirements for fsdk::MedicalMaskEstimator::EstimationResult"

| Attribute | Acceptable values |
|---|---|
| headPose.pitch | [-40...40] |
| headPose.yaw | [-40...40] |
| headPose.roll | [-40...40] |
| ags | [0.5...1.0] |
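
A face can be filtered against these requirements before the medical mask estimation is called. The sketch below assumes that the head pose angles are given in degrees and that the AGS value has already been estimated; `FaceQuality`, `absValue`, and `meetsMedicalMaskRequirements` are hypothetical names, not SDK API.

```cpp
// Hypothetical bundle of already estimated values; not an SDK type.
struct FaceQuality {
    float pitch; // head pose pitch, assumed to be in degrees
    float yaw;   // head pose yaw, assumed to be in degrees
    float roll;  // head pose roll, assumed to be in degrees
    float ags;   // approximate garbage score, expected in [0, 1]
};

constexpr float absValue(float v) {
    return v < 0.0f ? -v : v;
}

// Returns true when all values are inside the acceptable ranges listed above.
constexpr bool meetsMedicalMaskRequirements(const FaceQuality& q) {
    return absValue(q.pitch) <= 40.0f &&
           absValue(q.yaw) <= 40.0f &&
           absValue(q.roll) <= 40.0f &&
           q.ags >= 0.5f && q.ags <= 1.0f;
}
```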

Configurations:

See the "Medical mask estimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

IMedicalMaskEstimator

Plan files:

  • mask_clf_v3_cpu.plan
  • mask_clf_v3_cpu-avx2.plan
  • mask_clf_v3_gpu.plan

Human Attribute Estimation#

Name: HumanAttributeEstimator

Algorithm description:

This estimator detects the following human attributes in the warped human image:

  • Age;
  • Gender;
  • Sleeve size;
  • The presence of a headwear;
  • The color of a headwear;
  • The presence of a backpack;
  • Estimation of the lower body clothing type;
  • The color of a lower body clothing;
  • Outwear color;
  • The color of the shoes.

Age estimation contains a single number - the number of years.

Gender estimation contains one of the following results (see HumanAttributeResult::Gender enum):

  • Person's gender is female;
  • Person's gender is male;
  • Person's gender is unknown.

Sleeve size estimation contains one of the following results (see HumanAttributeResult::SleeveSize enum):

  • Person's sleeves are short;
  • Person's sleeves are long;
  • Person's sleeves size is unknown.

Hat estimation contains one of the following results (see HumanAttributeResult::Hat enum):

  • There is no headwear;
  • There is a headwear;
  • Headwear state is unknown.

Backpack estimation contains one of the following results (see HumanAttributeResult::Backpack enum):

  • There is no backpack;
  • There is a backpack;
  • Backpack state is unknown.

LowerBodyClothing estimation contains one of the following results (see HumanAttributeResult::LowerBodyClothing enum):

  • There are pants;
  • There are shorts;
  • There is a skirt;
  • Lower body clothing state is unknown.

Outwear color estimation contains the following results (see HumanAttributeResult::Color enum):

  • Person's outwear color is black;
  • Person's outwear color is blue;
  • Person's outwear color is green;
  • Person's outwear color is grey;
  • Person's outwear color is orange;
  • Person's outwear color is purple;
  • Person's outwear color is red;
  • Person's outwear color is white;
  • Person's outwear color is yellow;
  • Person's outwear color is pink;
  • Person's outwear color is brown;
  • Person's outwear color is beige;
  • Person's outwear color is khaki;
  • Person's outwear color is multicolored.

Apparent color estimation contains the following results (see HumanAttributeResult::ApparentColor enum):

  • Apparent color is black;
  • Apparent color is white;
  • Apparent color is some other color from full palette;
  • Apparent color is unknown.

Outwear color vs Apparent color:

There are currently two color palettes: Outwear color and Apparent color. The Outwear color palette is the full palette supported by the human attribute estimator. The Apparent color palette is a simplified version of it: the color of some attributes can be classified only into a small pool of colors, currently Black and White. The Apparent color palette was introduced to keep this simple for the user, and it may be extended with more colors in the future.

Implementation description:

The Gender enumeration contains all possible results of the Gender estimation:

    enum class Gender {
        Female,     //!< person's gender is female
        Male,       //!< person's gender is male
        Unknown     //!< person's gender is unknown
    };

The SleeveSize enumeration contains all possible results of the SleeveSize estimation:

    enum class SleeveSize {
        Short,   //!< sleeves are short
        Long,    //!< sleeves are long
        Unknown  //!< sleeves state is unknown
    };

The Hat enumeration contains all possible results of the Hat estimation:

    enum class Hat {
        No,      //!< there is no headwear
        Yes,     //!< there is a headwear
        Unknown  //!< headwear state is unknown
    };

The Backpack enumeration contains all possible results of the Backpack estimation:

    enum class Backpack {
        No,      //!< there is no backpack
        Yes,     //!< there is a backpack
        Unknown  //!< backpack state is unknown
    };

The LowerBodyClothing enumeration contains all possible results of the LowerBodyClothing estimation:

    enum class LowerBodyClothing {
        Pants,    //!< there are pants
        Shorts,   //!< there are shorts
        Skirt,    //!< there is a skirt
        Unknown   //!< lower body clothing state is unknown
    };

The Color enumeration contains all possible results of the OutwearColor estimation:

    enum class Color {
        Black,
        Blue,
        Green,
        Grey,
        Orange,
        Purple,
        Red,
        White,
        Yellow,
        Pink,
        Brown,
        Beige,
        Khaki,
        Multicolored,
        Count
    };

The ApparentColor enumeration contains all possible results of the ApparentColor estimation:

    enum class ApparentColor {
        Black,
        White,
        Other,
        Unknown,
        Count
    };

Human Attribute estimation request:

HumanAttributeRequest lists all possible estimation attributes that HumanAttributeEstimator is currently able to estimate.

    enum class HumanAttributeRequest {
        EstimateAge               = 1 << 0, //!< estimate age
        EstimateGender            = 1 << 1, //!< estimate gender
        EstimateSleeveSize        = 1 << 2, //!< estimate sleeves size
        EstimateBackpack          = 1 << 3, //!< estimate backpack state
        EstimateOutwearColor      = 1 << 4, //!< estimate outwear color
        EstimateHeadwear          = 1 << 5, //!< estimate headwear state
        EstimateLowerBodyClothing = 1 << 7, //!< estimate lower body clothing state
        EstimateShoeColor         = 1 << 8, //!< estimate shoe color
        EstimateAll               = 0xffff  //!< estimate all attributes
    };
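
Because the request values are bit flags, several attributes can be requested at once by combining them. The sketch below copies the enumeration and adds two helpers, `operator|` and `isRequested`, which are assumptions made for this illustration; the SDK may already provide its own way to combine flags.

```cpp
// Copy of the enumeration above, so that the sketch is self-contained.
enum class HumanAttributeRequest {
    EstimateAge               = 1 << 0,
    EstimateGender            = 1 << 1,
    EstimateSleeveSize        = 1 << 2,
    EstimateBackpack          = 1 << 3,
    EstimateOutwearColor      = 1 << 4,
    EstimateHeadwear          = 1 << 5,
    EstimateLowerBodyClothing = 1 << 7,
    EstimateShoeColor         = 1 << 8,
    EstimateAll               = 0xffff
};

// Hypothetical helper: combines two request flags into one mask.
constexpr HumanAttributeRequest operator|(HumanAttributeRequest a, HumanAttributeRequest b) {
    return static_cast<HumanAttributeRequest>(static_cast<int>(a) | static_cast<int>(b));
}

// Hypothetical helper: checks whether a flag is set in a request mask.
constexpr bool isRequested(HumanAttributeRequest mask, HumanAttributeRequest flag) {
    return (static_cast<int>(mask) & static_cast<int>(flag)) != 0;
}
```

For example, `HumanAttributeRequest::EstimateAge | HumanAttributeRequest::EstimateGender` requests only age and gender, which avoids computing attributes that are not needed.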

The GenderEstimation structure contains results of the gender estimation:

    struct GenderEstimation {
        Gender result;   //!< estimation result (@see Gender enum).
        float female;    //!< female gender probability score
        float male;      //!< male gender probability score
        float unknown;   //!< unknown gender probability score
    };

1․ The first group contains only the result enum:

        Gender result;   //!< estimation result (@see Gender enum).

The result field of GenderEstimation contains the target result of the estimation.

2․ The second group contains scores:

        float female;    //!< female gender probability score
        float male;      //!< male gender probability score
        float unknown;   //!< unknown gender probability score

The scores group contains the estimation scores for each possible result.

The SleeveSizeEstimation structure contains results of the sleeves size estimation:

    struct SleeveSizeEstimation {
        SleeveSize result; //!< estimation result (@see SleeveSize enum).
        float shortSize;   //!< short sleeves size probability score
        float longSize;    //!< long sleeves size probability score
        float unknown;     //!< unknown sleeves size probability score
    };

1․ The first group contains only the result enum:

        SleeveSize result; //!< estimation result (@see SleeveSize enum).

The result field of SleeveSizeEstimation contains the target result of the estimation.

2․ The second group contains scores:

        float shortSize;   //!< short sleeves size probability score
        float longSize;    //!< long sleeves size probability score
        float unknown;     //!< unknown sleeves size probability score

The scores group contains the estimation scores for each possible result.

The HatEstimation structure contains results of the hat estimation:

    struct HatEstimation {
        Hat result;     //!< estimation result (@see Hat enum).
        float noHat;    //!< no hat probability score
        float hat;      //!< hat probability score
        float unknown;  //!< unknown hat state probability score

        ApparentColorEstimation hatColor; //!< hat color estimation
    };

1․ The first group contains only the result enum:

        Hat result;     //!< estimation result (@see Hat enum).

The result field of HatEstimation contains the target result of the estimation.

2․ The second group contains scores:

        float noHat;    //!< no hat probability score
        float hat;      //!< hat probability score
        float unknown;  //!< unknown hat state probability score

The scores group contains the estimation scores for each possible result.

3․ The third group contains color estimation:

        ApparentColorEstimation hatColor; //!< hat color estimation.

The BackpackEstimation structure contains results of the backpack estimation:

    struct BackpackEstimation {
        Backpack result;  //!< estimation result (@see Backpack enum).
        float noBackpack; //!< no backpack probability score
        float backpack;   //!< backpack probability score
        float unknown;    //!< unknown backpack state probability score
    };

1․ The first group contains only the result enum:

        Backpack result;  //!< estimation result (@see Backpack enum).

The result field of BackpackEstimation contains the target result of the estimation.

2․ The second group contains scores:

        float noBackpack; //!< no backpack probability score
        float backpack;   //!< backpack probability score
        float unknown;    //!< unknown backpack state probability score

The scores group contains the estimation scores for each possible result.

The LowerBodyClothingEstimation structure contains results of the lower body clothing estimation:

    struct LowerBodyClothingEstimation {
        LowerBodyClothing result;  //!< estimation result.
        float pants;               //!< pants probability score
        float shorts;              //!< shorts probability score
        float skirt;               //!< skirt probability score
        float unknown;             //!< unknown state probability score

        OutwearColorEstimation lowerBodyClothingColor; //!< lower body clothing color estimation.
    };

1․ The first group contains only the result enum:

        LowerBodyClothing result;  //!< estimation result.

The result field of LowerBodyClothingEstimation contains the target result of the estimation.

2․ The second group contains scores:

        float pants;               //!< pants probability score
        float shorts;              //!< shorts probability score
        float skirt;               //!< skirt probability score
        float unknown;             //!< unknown state probability score

The scores group contains the estimation scores for each possible result.

3․ The third group contains color estimation:

        OutwearColorEstimation lowerBodyClothingColor; //!< lower body clothing color estimation.

The OutwearColorEstimation structure contains results of outwear color estimation:

    struct OutwearColorEstimation {
        bool isBlack;                                 //!< outwear is black
        bool isBlue;                                  //!< outwear is blue
        bool isGreen;                                 //!< outwear is green
        bool isGrey;                                  //!< outwear is grey
        bool isOrange;                                //!< outwear is orange
        bool isPurple;                                //!< outwear is purple
        bool isRed;                                   //!< outwear is red
        bool isWhite;                                 //!< outwear is white
        bool isYellow;                                //!< outwear is yellow
        bool isPink;                                  //!< outwear is pink
        bool isBrown;                                 //!< outwear is brown
        bool isBeige;                                 //!< outwear is beige
        bool isKhaki;                                 //!< outwear is khaki
        bool isMulticolored;                          //!< outwear is multicolored
        float scores[static_cast<int>(Color::Count)]; //!< estimation scores

        /**
         * @brief Returns score of required outwear color.
         * @param [in] color outwear color.
         * @see Color for more info.
         * */
        inline float getScore(Color color) const;
    };

1․ The first group contains plain answer:

        bool isBlack;                                 //!< outwear is black
        bool isBlue;                                  //!< outwear is blue
        bool isGreen;                                 //!< outwear is green
        bool isGrey;                                  //!< outwear is grey
        bool isOrange;                                //!< outwear is orange
        bool isPurple;                                //!< outwear is purple
        bool isRed;                                   //!< outwear is red
        bool isWhite;                                 //!< outwear is white
        bool isYellow;                                //!< outwear is yellow
        bool isPink;                                  //!< outwear is pink
        bool isBrown;                                 //!< outwear is brown
        bool isBeige;                                 //!< outwear is beige
        bool isKhaki;                                 //!< outwear is khaki
        bool isMulticolored;                          //!< outwear is multicolored

2․ The second group contains scores:

        float scores[static_cast<int>(Color::Count)]; //!< estimation scores

Note: Not all color flags and corresponding float scores in OutwearColorEstimation hold valid values. Some colors were added to the interface to support future palette expansion and will hold valid values as the algorithm evolves from release to release. Currently, Pink, Beige, Khaki, and Multicolored are zeroed internally.
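
The scores array is indexed by the `Color` enumeration, so the scores can be read per color or scanned for the highest one. The following sketch shows one possible way to do this; `ColorScores`, `scoreOf`, and `bestColor` are hypothetical names, and the indexing by enum value is an assumption based on the array size `Color::Count`.

```cpp
enum class Color {
    Black, Blue, Green, Grey, Orange, Purple, Red, White,
    Yellow, Pink, Brown, Beige, Khaki, Multicolored, Count
};

// Hypothetical container that mirrors the scores array of OutwearColorEstimation.
struct ColorScores {
    float scores[static_cast<int>(Color::Count)];
};

// Assumption: the score of a color is stored at the index of its enum value.
constexpr float scoreOf(const ColorScores& e, Color color) {
    return e.scores[static_cast<int>(color)];
}

// Returns the color with the highest score; the first one wins on ties.
constexpr Color bestColor(const ColorScores& e) {
    int best = 0;
    for (int i = 1; i < static_cast<int>(Color::Count); ++i) {
        if (e.scores[i] > e.scores[best]) {
            best = i;
        }
    }
    return static_cast<Color>(best);
}
```

Colors whose scores are zeroed internally can only be returned by such a scan when every score is zero.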

The ApparentColorEstimation structure contains results of apparent color estimation:

    struct ApparentColorEstimation {
        bool isBlack;                                           //!< attribute is black
        bool isWhite;                                           //!< attribute is white
        bool isOther;                                           //!< attribute is some other
        bool isUnknown;                                         //!< attribute is unknown
        float scores[static_cast<int>(ApparentColor::Count)];   //!< estimation scores

        /**
         * @brief Returns score of required color.
         * @param [in] color color.
         * @see ApparentColor for more info.
         * */
        inline float getScore(ApparentColor color) const;
    };

1․ The first group contains plain answer:

        bool isBlack;                                           //!< attribute is black
        bool isWhite;                                           //!< attribute is white
        bool isOther;                                           //!< attribute is some other
        bool isUnknown;                                         //!< attribute is unknown

2․ The second group contains scores:

        float scores[static_cast<int>(ApparentColor::Count)];   //!< estimation scores

The HumanAttributeResult structure contains optional results of all estimations depending on HumanAttributeRequest.

        /**
         * @brief Age estimation by human body.
         * @note This estimation may be very different from estimation by face.
         * */
        Optional<float> age;
        /**
         * @brief Gender estimation by human body.
         * @note This estimation may be very different from estimation by face.
         * */
        Optional<GenderEstimation> gender;
        Optional<SleeveSizeEstimation> sleeve;                      //!< sleeve estimation.
        Optional<HatEstimation> headwear;                           //!< headwear estimation.
        Optional<BackpackEstimation> backpack;                      //!< backpack estimation.
        Optional<OutwearColorEstimation> outwearColor;              //!< outwear color estimation.
        Optional<LowerBodyClothingEstimation> lowerBodyClothing;    //!< lower body clothing estimation.
        Optional<ApparentColorEstimation> shoeColor;                //!< shoe color estimation.

HumanAttribute Aggregation:

The human attribute estimator provides a method to aggregate the output results of a batch estimate call. All valid fields are taken into account and the result is their mean. Invalid fields are skipped and do not influence the aggregation result.

        /**
         * @brief Aggregate human body attributes.
         * @details All invalid fields will be skipped and do not influence the aggregation result
         * @param [in] estimations span of estimation results.
         * @param [in] request estimation request.
         * @param [out] result aggregated result.
         * @return Result with error code.
         * @see Span, HumanAttributeResult, IHumanAttributeEstimator::EstimationRequest, Result and FSDKError for details.
         * @note all spans should be based on user owned continuous collections.
         * @note all spans should be equal size.
         * */
        virtual Result<FSDKError> aggregate(
            Span<const HumanAttributeResult> estimations,
            HumanAttributeRequest request,
            HumanAttributeResult& result) const noexcept  = 0;
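
The aggregation rule (the mean over valid values, with invalid values skipped) can be illustrated for a single field such as age. This is a simplified sketch and not the SDK implementation: `MaybeAge` is a hypothetical stand-in for `Optional<float>`, and `aggregateAge` is a hypothetical helper.

```cpp
#include <cstddef>

// Hypothetical stand-in for Optional<float>: a value plus a validity flag.
struct MaybeAge {
    bool valid;
    float value;
};

// Averages all valid ages; the result is invalid if no input is valid.
template <std::size_t N>
constexpr MaybeAge aggregateAge(const MaybeAge (&estimations)[N]) {
    float sum = 0.0f;
    std::size_t count = 0;
    for (std::size_t i = 0; i < N; ++i) {
        if (estimations[i].valid) {
            sum += estimations[i].value;
            ++count;
        }
    }
    return count == 0 ? MaybeAge{false, 0.0f}
                      : MaybeAge{true, sum / static_cast<float>(count)};
}
```

With the inputs 20 (valid), an invalid entry, and 40 (valid), the aggregated age is 30, because the invalid entry does not count toward the mean.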

Attributes dependencies:

Some attribute results depend on the results of other attributes. The color flag and score of attribute depend on the predicted type of attribute. For example, it doesn't make sense to set color values to the attribute which was classified as Unknown. All these rules are also applied to the aggregation results.

  • In HatEstimation struct, hatColor field depends on result field. If result field has value No or Unknown, the hatColor will store value isUnknown = true and all scores will be zeroed.
  • In LowerBodyClothingEstimation struct, lowerBodyClothingColor field depends on result field. If result field has value Unknown, all flags in lowerBodyClothingColor will be set to false and all scores will be zeroed.
  • In HumanAttributeResult struct, shoeColor field depends on result field of LowerBodyClothingEstimation. If result field of LowerBodyClothingEstimation has value Unknown, the shoeColor will store value isUnknown = true and all scores will be zeroed.
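
The first rule can be sketched as a small function that resets the hat color when the hat result is No or Unknown. The names `ApparentColorFlags` and `applyHatDependency` are hypothetical; the sketch only mirrors the documented behavior and is not the SDK code.

```cpp
enum class Hat { No, Yes, Unknown };

// Hypothetical copy of the ApparentColorEstimation fields, for illustration only.
struct ApparentColorFlags {
    bool isBlack;
    bool isWhite;
    bool isOther;
    bool isUnknown;
    float scores[4]; // one score per ApparentColor value
};

// Keeps the predicted color only when a hat is present; otherwise the color
// becomes "unknown" and all scores are zeroed, as described above.
constexpr ApparentColorFlags applyHatDependency(Hat result, const ApparentColorFlags& predicted) {
    if (result == Hat::Yes) {
        return predicted;
    }
    return ApparentColorFlags{false, false, false, true, {0.0f, 0.0f, 0.0f, 0.0f}};
}
```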

Recommended thresholds: 

Human Attribute estimator sets outwear color bool values and age by comparing an output score with a corresponding threshold value listed in faceengine.conf file in HumanAttributeEstimator::Settings section. By default, these threshold values are set to optimal.

"Human Attribute Estimator recommended thresholds"

Thresholds Recommended values
blackUpperThreshold 0.740
blueUpperThreshold 0.655
brownUpperThreshold 0.985
greenUpperThreshold 0.700
greyUpperThreshold 0.710
orangeUpperThreshold 0.420
purpleUpperThreshold 0.650
redUpperThreshold 0.600
whiteUpperThreshold 0.820
yellowUpperThreshold 0.670
blackLowerThreshold 0.700
blueLowerThreshold 0.840
brownLowerThreshold 0.850
greenLowerThreshold 0.700
greyLowerThreshold 0.690
orangeLowerThreshold 0.760
purpleLowerThreshold 0.890
redLowerThreshold 0.600
whiteLowerThreshold 0.540
yellowLowerThreshold 0.930
adultThreshold 0.940
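
As a rough illustration of how such thresholds turn scores into boolean values, consider the sketch below. The constants are copied from the table; the helper name and the use of a non-strict comparison (>=) are assumptions, since the exact comparison performed by the SDK is not stated here.

```cpp
// Recommended values copied from the table above (subset).
constexpr float kBlackUpperThreshold = 0.740f;
constexpr float kAdultThreshold = 0.940f;

// Hypothetical helper. Assumption: a flag is set when the score reaches
// the threshold (>=); the SDK may use a strict comparison instead.
inline bool reachesThreshold(float score, float threshold) {
    return score >= threshold;
}
```

With these values, a black upper-clothing score of 0.80 would set the flag, while an adult score of 0.90 would not.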

Configurations:

See the "Human Attribute Estimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

IHumanAttributeEstimator

Plan files:

  • human_attributes_v2_cpu.plan
  • human_attributes_v2_cpu-avx2.plan
  • human_attributes_v2_gpu.plan

Crowd Estimation#

Name: CrowdEstimator

Algorithm description:

This estimator counts humans (heads) in the input image. It returns the head count and, optionally, the center coordinates of the heads.

There are two possible CrowdEstimator work modes:

  • Single network mode - only the Crowd estimation network is used. It works well with small heads in the image, but can miss big heads (those closer to the camera).
  • Two networks mode - two networks are used: Crowd estimation with HumanDetector or Crowd estimation with HeadDetector. This mode produces more accurate results, but takes more time to execute. You can select the detector variant, "HumanDetector" or "HeadDetector", with the detectorType parameter in the config.

Implementation description:

The estimator (see ICrowdEstimator in ICrowdEstimator.h):

  • Implements the estimate() function that accepts source image in R8G8B8 format, the region of interest (ROI), fsdk::ICrowdEstimator::EstimationRequest structure and returns the estimation result;

  • Implements the estimate() function that accepts fsdk::Span of the source images in R8G8B8 format, fsdk::Span of ROIs, fsdk::ICrowdEstimator::EstimationRequest structure and fsdk::Span of the fsdk::CrowdEstimation structures to return results of estimation.

You are free to choose the estimation type. For this purpose, the estimate() method takes one of the following estimation requests:

  • fsdk::ICrowdEstimator::EstimationRequest::estimateHeadCount to return people (heads) count only;
  • fsdk::ICrowdEstimator::EstimationRequest::estimateHeadCountAndCoords to return people (heads) count as well as head center coordinates.

The CrowdEstimation structure contains all possible results of the Crowd estimation:

    struct CrowdEstimation {
        size_t count; //!< The number of people (heads) in the image.
        IPointBatchPtr points; //!< Coordinates of people heads. Empty if not requested.
    };

minHeadSize

This estimator can detect heads that are 3 px or larger. If such small heads are not required (or cannot occur in your use case), you can change the minHeadSize parameter in the config.

Before processing, the image is scaled down by a factor of minHeadSize/3. For example, if minHeadSize=12, the image is additionally scaled down by a factor of 12/3=4.

The estimator works faster with larger values of minHeadSize.
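
The arithmetic above can be sketched as follows. The helper names and the Size struct are hypothetical, the rejection of values below 3 and the truncating rounding of the resulting dimensions are assumptions, and the sketch treats the resize as a reduction of the image, which is consistent with the estimator working faster for larger minHeadSize values.

```cpp
#include <stdexcept>

// Hypothetical helper type (assumption).
struct Size {
    int width;
    int height;
};

// Resize factor described in the text: minHeadSize / 3.
inline float crowdResizeFactor(int minHeadSize) {
    if (minHeadSize < 3) {
        // Assumption: values below the 3 px detection limit are rejected.
        throw std::invalid_argument("minHeadSize must be >= 3");
    }
    return static_cast<float>(minHeadSize) / 3.0f;
}

// Size of the image after the reduction (truncating rounding is assumed).
inline Size scaledDownSize(Size input, int minHeadSize) {
    const float factor = crowdResizeFactor(minHeadSize);
    return { static_cast<int>(input.width / factor),
             static_cast<int>(input.height / factor) };
}
```

For minHeadSize=12 the factor is 4, so a 1920 x 1080 frame would be processed at 480 x 270, which is why larger values make the estimator faster.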

CrowdEstimatorType

The CrowdEstimatorType enumeration contains all possible working modes of the Crowd estimator:

    enum CrowdEstimatorType {
        CET_DEFAULT = 0,         //!< Default type which is specified in config file. @see ISettingsProvider
        CET_SINGLE_NET = 1,      //!< Single network mode - only Crowd estimation will be used
        CET_TWO_NETS = 2,        //!< Double network mode - Crowd + HeadDetector
        CET_COUNT
    };

Where:

  • CET_DEFAULT - the default mode, which is recommended. The resulting working mode is determined by the value in the faceengine.conf configuration file.
  • CET_SINGLE_NET - single network working mode. Only Crowd estimation is used.
  • CET_TWO_NETS - two networks mode: Crowd estimation and HumanDetector or Crowd estimation and HeadDetector.
  • CET_COUNT - a stub used only to check input correctness; do not use it as a mode.
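
A minimal sketch of how these values might be handled in user code is shown below. The enumeration mirrors the one above; the two helper functions are hypothetical and do not belong to the SDK, and how the configured mode is actually read from faceengine.conf is outside the scope of the sketch.

```cpp
// Mirrors the enumeration shown above.
enum CrowdEstimatorType {
    CET_DEFAULT = 0,
    CET_SINGLE_NET = 1,
    CET_TWO_NETS = 2,
    CET_COUNT
};

// Hypothetical helper: CET_COUNT and any value outside the enumeration
// are rejected, which is the purpose of the CET_COUNT stub.
inline bool isValidCrowdEstimatorType(int raw) {
    return raw >= CET_DEFAULT && raw < CET_COUNT;
}

// Hypothetical helper: CET_DEFAULT defers to the mode taken from the
// configuration file; any explicit mode is used as requested.
inline CrowdEstimatorType resolveMode(CrowdEstimatorType requested,
                                      CrowdEstimatorType configured) {
    return requested == CET_DEFAULT ? configured : requested;
}
```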

API structure name:

ICrowdEstimator

Plan files:

  • crowd_v2_cpu.plan
  • crowd_v2_cpu-avx2.plan
  • crowd_v2_gpu.plan

Fights Estimation#

Name: FightsEstimator

Algorithm description:

This estimator detects fights in a video by processing several image sequences (batches) from the target video one by one. Each batch should contain IFightsEstimator::getBatchSize() frames.

Every IFightsEstimator::estimate call returns a context structure as a result. Pass this context structure to the next estimation call for the same video. To process several videos in parallel, keep a separate context structure for each video. For the first estimation call, the context structure should be empty (nullptr). After IFightsEstimator::getMinBatchCount() batches have been estimated, the context structure contains IFightsEstimatorContext::State::Ready, and you can get the estimation result by calling the IFightsEstimatorContext::getResult() method. To process more frames, make further IFightsEstimator::estimate calls, passing the same context structure.
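
The call pattern described above can be sketched with mock types. Everything in this block is a hypothetical stand-in (Frame, ContextSketch, EstimatorSketch, runUntilReady) that only imitates the context handling; it is not the SDK API, and the batch size of 4 and minimum batch count of 3 are arbitrary values chosen for the illustration.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Frame {};                        // stand-in for an image
enum class State { NoReady, Ready };    // state names as used in this section

struct ContextSketch {
    std::size_t batchesSeen = 0;
    State state = State::NoReady;
};
using ContextPtr = std::shared_ptr<ContextSketch>;

struct EstimatorSketch {
    std::size_t batchSize = 4;      // stand-in for getBatchSize()
    std::size_t minBatchCount = 3;  // stand-in for getMinBatchCount()

    // Creates the context on the first call (empty pointer) and updates it
    // on every following call. Returns false for a batch of the wrong size.
    bool estimate(const std::vector<Frame>& batch, ContextPtr& context) const {
        if (batch.size() != batchSize) return false;
        if (!context) context = std::make_shared<ContextSketch>();
        ++context->batchesSeen;
        if (context->batchesSeen >= minBatchCount) context->state = State::Ready;
        return true;
    }
};

// Feeds the batches of ONE video until the context reports Ready.
// Returns the number of batches consumed, or 0 if Ready was never reached.
inline std::size_t runUntilReady(const EstimatorSketch& estimator,
                                 const std::vector<std::vector<Frame>>& batches,
                                 ContextPtr& context) {
    std::size_t used = 0;
    for (const auto& batch : batches) {
        if (!estimator.estimate(batch, context)) return 0;
        ++used;
        if (context->state == State::Ready) return used;
    }
    return 0;
}
```

For several videos processed in parallel, each video would get its own ContextPtr, so that batches from different videos never update the same context.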

Input requirements:

  • Frames should be in the fsdk::Format::R8G8B8 format.
  • Video should be about 30 FPS.
    If the video has a higher frame rate (for example, 60 FPS), we recommend that you do not pass every frame to the estimator (for example, pass only every second frame of a 60 FPS video).
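
The frame-skipping recommendation above can be sketched as a stride computation. The helper names are hypothetical, and rounding the ratio to the nearest integer is an assumption; the text only gives the 60 FPS example.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Stride 1 keeps every frame, stride 2 keeps every second frame, and so on.
// Assumption: the ratio of source FPS to the ~30 FPS target is rounded
// to the nearest integer and never goes below 1.
inline int frameStride(double sourceFps, double targetFps = 30.0) {
    if (sourceFps <= 0.0 || targetFps <= 0.0) {
        throw std::invalid_argument("FPS values must be positive");
    }
    const long stride = std::lround(sourceFps / targetFps);
    return stride < 1 ? 1 : static_cast<int>(stride);
}

// True if the frame with this zero-based index should be passed on.
inline bool shouldPassFrame(std::size_t frameIndex, int stride) {
    return frameIndex % static_cast<std::size_t>(stride) == 0;
}
```

For a 60 FPS video the stride is 2, so frames 0, 2, 4, and so on would be collected into batches.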

Content requirements:

  • Human bounding box heights in the video should be >= 30% of the frame height.
    For example, for a video with a frame height of 640 px, the minimum human bounding box height should be (640 * 0.3) = 192 px.
    For details, see the Human Detection section in the Face detection facility chapter.
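
A pre-check for this requirement could look like the sketch below. The Box struct and both helpers are hypothetical; integer arithmetic with rounding up is used so that the limit is never underestimated.

```cpp
// Hypothetical stand-in for a detection rectangle (assumption).
struct Box {
    int x;
    int y;
    int width;
    int height;
};

// 30% of the frame height, rounded up, computed without floating point.
inline int minHumanBoxHeight(int frameHeight) {
    return (frameHeight * 3 + 9) / 10;
}

// True if the detected human is tall enough for the fights estimator.
inline bool isTallEnough(const Box& box, int frameHeight) {
    return box.height >= minHumanBoxHeight(frameHeight);
}
```

For a frame dimension of 640 px this gives 192 px, the same value as in the example above.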

Camera requirements:

  • A camera should be static.
  • The camera should be an RGB camera. The estimator performs worse on IR cameras.
  • The perspective should be from top to bottom, as on CCTV cameras. The recommended range is 30 to 60 degrees. The images below show examples of suitable angles.
[Images: FightsEstimator camera angles]

Implementation description:

The estimator (see IFightsEstimator in IFightsEstimator.h):

  • Implements the estimate() function that takes a fsdk::Span (batch) of fsdk::Image objects and a fsdk::IFightsEstimatorContextPtr context object. The result is an error code and the updated fsdk::IFightsEstimatorContextPtr context object.

The context structure (see IFightsEstimatorContext in IFightsEstimator.h):

  • Implements the getState() function that takes no arguments. The result is the current estimation state.
    The IFightsEstimatorContext::State::Ready value means that the estimation is completed and the result can be taken from the structure. The IFightsEstimatorContext::State::NoReady value means that the estimation requires more frames to proceed.

  • Implements the getResult() function that takes no arguments. The result is the current estimation result (FightsEstimation structure).

The FightsEstimation structure contains results of the estimation:

    struct FightsEstimation {
        enum class State {
            NoFight, //!< There is no fight on the input frames
            Fight    //!< Fight detected on the input frames
        };
        State state; //!< Estimation status
        float score; //!< Estimation score normalized to [0..1] range
    };

The estimation score is normalized to the [0..1] range, where 1 means a fight and 0 means no fight.

The value of state depends on the threshold. You can change the threshold value in the faceengine.conf configuration file. For details, see the FightsEstimator settings section in the Configuration Guide.