Parameter Estimation Facility#
Overview#
The estimation facility is the only multi-purpose facility in FaceEngine. It is designed as a collection of tools that help to estimate various properties of images or depicted objects. These properties may be used to increase the precision of algorithms implemented by other FaceEngine facilities or to accomplish custom user tasks.
Use cases#
ISO estimation#
LUNA SDK provides algorithms for image check according to the requirements of the ISO/IEC 19794-5:2011 standard and compatible standards.
The requirements can be found on the official website: https://www.iso.org/obp/ui/#iso:std:iso-iec:19794:-5:en.
The following algorithms are provided:
- Head rotation angles (pitch, yaw, and roll angles). According to section "7.2.2 Pose" of the standard, the angles should be within +/- 5 degrees from frontal in pitch and yaw, and within +/- 8 degrees from frontal in roll. See additional information about the algorithm in section "Head Pose".
- Gaze. See section "7.2.3 Expression" point "e" of the standard. See additional information about the algorithm in section "Gaze Estimation".
- Mouth state (opened, closed, occluded) and additional properties for smile (regular smile, smile with teeth exposed). See section "7.2.3 Expression" points "a", "b", and "c" of the standard. See additional information about the algorithm in section "Mouth Estimation".
- Quality of the image:
  - Contrast and saturation (insufficient or excessive exposure). See sections "7.2.7 Subject and scene lighting" and "7.3.2 Contrast and saturation" of the standard.
  - Blurring. See section "7.3.3 Focus and depth of field" of the standard.
  - Specularity. See sections "7.2.8 Hot spots and specular reflections" and "7.2.12 Lighting artefacts" of the standard.
  - Uniformity of illumination. See sections "7.2.7 Subject and scene lighting" and "7.2.12 Lighting artefacts" of the standard.
  See additional information about the algorithm in section "Image quality estimation".
- Glasses state (no glasses, glasses, sunglasses). See section "7.2.9 Eye glasses" of the standard. See additional information about the algorithm in section "Glasses Estimation".
- Eyes state (for each eye: opened, closed, occluded). See sections "7.2.3 Expression" point "a", "7.2.11 Visibility of pupils and irises", and "7.2.13 Eye patches" of the standard. See additional information about the algorithm in section "Eyes Estimation".
- Natural light estimation. See section "7.3.4 Unnatural colour" of the standard. See additional information about the algorithm in section "Natural Light Estimation".
- Eyebrows state (neutral, raised, squinting, frowning). See section "7.2.3 Expression" points "d", "f", and "g" of the standard. See additional information about the algorithm in section "Eyebrows estimation".
- Position of a person's shoulders in the original image: whether or not the shoulders are parallel to the camera. See section "7.2.5 Shoulders" of the standard. See additional information about the algorithm in section "Portrait Style Estimation".
- Headwear. Checks whether there is headwear on a person. Several types of headwear can be estimated. See section "B.2.7 Head coverings" of the standard. See additional information about the algorithm in section "Headwear Estimation".
- Red eyes estimation. Checks for the red eyes effect. See section "7.3.4 Unnatural colour" of the standard. See additional information about the algorithm in section "Red Eyes Estimation".
- Radial distortion estimation. See section "7.3.6 Radial distortion of the camera lens" of the standard. See additional information about the algorithm in section "Fish Eye Estimation".
- Image type estimation (color, grayscale, infrared). See section "7.4.4 Use of near infra-red cameras" of the standard. See additional information about the algorithm in section "Grayscale, color or infrared Estimation".
- Background estimation: background uniformity and whether the background is too light or too dark. See section "B.2.9 Backgrounds" of the standard. See additional information about the algorithm in section "Background Estimation".
Best shot selection functionality#
BestShotQuality Estimation#
The BestShotQuality estimator evaluates image quality in order to choose the best image before descriptor extraction.
The estimator (see IBestShotQualityEstimator in IBestShotQualityEstimator.h):
- Implements the estimate() function that needs an fsdk::Image in R8G8B8 format, an fsdk::Detection structure for the corresponding source image (see section "Detection structure" in chapter "Face detection facility"), an fsdk::IBestShotQualityEstimator::EstimationRequest structure, and an fsdk::IBestShotQualityEstimator::EstimationResult structure to store the estimation result;
- Implements the estimate() function that needs a span of fsdk::Image in R8G8B8 format, a span of fsdk::Detection structures for the corresponding source images (see section "Detection structure" in chapter "Face detection facility"), an fsdk::IBestShotQualityEstimator::EstimationRequest structure, and a span of fsdk::IBestShotQualityEstimator::EstimationResult structures to store the estimation results.
Before using this estimator, the user can decide which of the listed attributes to estimate. For this purpose, the estimate() method takes one of the following estimation requests:
- fsdk::IBestShotQualityEstimator::EstimationRequest::estimateAGS to make only the AGS estimation;
- fsdk::IBestShotQualityEstimator::EstimationRequest::estimateHeadPose to make only the Head Pose estimation;
- fsdk::IBestShotQualityEstimator::EstimationRequest::estimateAll to make both the AGS and Head Pose estimations.
The description of attributes returned by the estimate() method is given below.
AGS#
AGS (garbage score) aims to determine the source image score for further descriptor extraction and matching.
The estimation output is a float score normalized to the [0..1] range. The closer the score is to 1, the better the matching result received for the image.
When you have several images of a person, it is better to save the image with the highest AGS score.
The recommended threshold for the AGS score is 0.2, but it can be changed depending on the purpose of use. Consult VisionLabs about the recommended threshold value for this parameter.
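The best-shot selection described above can be sketched as follows. This is an illustrative sketch only: AgsResult and selectBestShot are hypothetical helpers, not part of the SDK; the real AGS values come from fsdk::IBestShotQualityEstimator::EstimationResult.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-image AGS output of the estimator.
struct AgsResult {
    float ags; // garbage score in [0..1]; higher means better for matching
};

// Return the index of the frame with the highest AGS score, or -1 if no
// frame reaches the threshold (0.2 is the recommended default).
int selectBestShot(const std::vector<AgsResult>& frames, float threshold = 0.2f) {
    int best = -1;
    float bestScore = threshold;
    for (std::size_t i = 0; i < frames.size(); ++i) {
        if (frames[i].ags >= bestScore) {
            bestScore = frames[i].ags;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

A caller would run the estimator over a sequence of frames, collect the AGS scores, and keep only the frame this function selects.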
Head Pose#
Head Pose determines a person's head rotation angles in 3D space, namely pitch, yaw, and roll.

Since 3D head translation is hard to determine reliably without camera-specific calibration, only the 3D rotation component is estimated.
Head pose estimation characteristics:
- Units (degrees);
- Notation (Euler angles);
- Precision (see table below).
Prediction precision decreases as a rotation angle increases. We present typical average errors for different angle ranges in the table below.
"Head pose prediction precision"
Average prediction error (per axis) | -45°...+45° | < -45° or > +45° |
---|---|---|
Yaw | ±2.7° | ±4.6° |
Pitch | ±3.0° | ±4.8° |
Roll | ±3.0° | ±4.6° |
Zero position corresponds to a face placed orthogonally to camera direction, with the axis of symmetry parallel to the vertical camera axis.
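As an illustration, the head pose output can be checked against the ISO pose requirement quoted earlier (within +/- 5 degrees from frontal in pitch and yaw, within +/- 8 degrees in roll). This is a minimal sketch; the HeadPose struct is a stand-in for fsdk::HeadPoseEstimation, not the SDK type itself.

```cpp
#include <cmath>

// Stand-in for the pitch/yaw/roll triple produced by the head pose
// estimator (the real SDK type is fsdk::HeadPoseEstimation).
struct HeadPose {
    float pitch, yaw, roll; // Euler angles in degrees; zero = frontal
};

// Check the ISO/IEC 19794-5 "7.2.2 Pose" requirement:
// pitch and yaw within +/-5 degrees, roll within +/-8 degrees.
bool isFrontalPerIso(const HeadPose& pose) {
    return std::fabs(pose.pitch) <= 5.f &&
           std::fabs(pose.yaw)   <= 5.f &&
           std::fabs(pose.roll)  <= 8.f;
}
```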
Image quality estimation#
The estimator is trained to work with warped images (see Chapter "Image warping" for details).
The general rule of thumb for quality estimation:
- Detect a face, see if detection confidence is high enough. If not, reject the detection;
- Produce a warped face image (see chapter "Descriptor processing facility") using a face detection and its landmarks;
- Estimate visual quality using the estimator, finally reject low-quality images.
While the scheme above might seem somewhat complicated, it is the most efficient performance-wise, since possible rejections at each step reduce the workload for the next step.
At the moment the estimator exposes two interface functions to predict image quality:
- virtual Result estimate(const Image& warp, Quality& quality);
- virtual Result estimate(const Image& warp, SubjectiveQuality& quality);
Each of these functions uses its own CNN internally and returns slightly different quality criteria.
The first CNN is trained specifically on pre-warped human face images and produces lower score factors if one of the following conditions is satisfied:
- The image is blurred;
- The image is underexposed (i.e., too dark);
- The image is overexposed (i.e., too light);
- The image color variation is low (i.e., the image is monochrome or close to monochrome).
Each of these score factors is defined in the [0..1] range, where a higher value corresponds to better image quality and vice versa.
Recommended thresholds for image quality for the first interface function are given below:
- "saturationThreshold": 0.0;
- "blurThreshold": 0.93;
- "lightThreshold": 0.9;
- "darkThreshold": 0.9.
The second interface function produces lower factors if:
- The image is blurred;
- The image is underexposed (i.e., too dark);
- The image is overexposed (i.e., too light);
- The face in the image is illuminated unevenly (there is a great difference between light and dark regions);
- The image contains flares on the face (too specular).
The estimator determines the quality of the image based on each of the aforementioned parameters. For each parameter, the estimator function returns two values: a quality factor and a resulting verdict.
As with the first estimator function, the second one returns quality factors in the range [0..1], where 0 corresponds to low image quality and 1 to high image quality. E.g., the estimator returns a low quality factor for the Blur parameter if the image is too blurry.
The resulting verdict is a boolean output based on the estimated parameter. E.g., if the image is too blurry, the estimator returns "isBlurred = true".
A threshold can be specified for each of the estimated parameters. The resulting verdict and the quality factor are linked through this threshold: if the received quality factor is lower than the threshold, the image quality is low for that parameter and the estimator returns "true"; if the quality factor is higher than the threshold, the resulting verdict is "false".
If the estimated value for any of the parameters is lower than the corresponding threshold, the image is considered of bad quality. If the resulting verdicts for all the parameters are set to "false", the quality of the image is considered good.
Examples are presented in the images below. Good quality images are shown on the right.





The quality factor is a value in the range [0..1] where 0 corresponds to low quality and 1 to high quality.
Illumination uniformity corresponds to the face illumination in the image. The lower the difference between light and dark zones of the face, the higher the estimated value. When the illumination is evenly distributed throughout the face, the value is close to "1".
Specularity describes how strongly the face reflects light. The higher the estimated value, the lower the specularity and the better the image quality. If the estimated value is low, there are bright glares on the face.
"Image quality parameters and their thresholds"
Threshold | Estimated property | Recommended range | Default value |
---|---|---|---|
blurThreshold | Blur | [0.57..0.65] | 0.61 |
darknessThreshold | Darkness | [0.45..0.52] | 0.50 |
lightThreshold | Light | [0.44..0.61] | 0.57 |
illuminationThreshold | Illumination uniformity | [0..0.3] | 0.1 |
specularityThreshold | Specularity | [0..0.3] | 0.1 |
The most important parameters for face recognition are "blurThreshold", "darknessThreshold" and "lightThreshold", so you should select them carefully.
You can select images of better visual quality by setting higher values of the "illuminationThreshold" and "specularityThreshold". Face recognition is not greatly affected by uneven illumination or glares.
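The factor-versus-threshold logic above can be sketched as follows. The field names and the checkQuality/isGoodQuality helpers are illustrative assumptions, not SDK API; the default thresholds follow the table above.

```cpp
// Illustrative stand-in for the per-parameter quality factors returned
// by the SubjectiveQuality interface; each factor is in [0..1].
struct QualityFactors {
    float blur, darkness, light, illumination, specularity;
};

// Per-parameter verdicts: "true" means the parameter failed its check.
struct QualityVerdicts {
    bool isBlurred, isDark, isLight, isIlluminated, isSpecular;
};

// A factor below its threshold yields a "true" (bad) verdict; the
// threshold values are the defaults from the table above.
QualityVerdicts checkQuality(const QualityFactors& f) {
    QualityVerdicts v;
    v.isBlurred     = f.blur         < 0.61f;
    v.isDark        = f.darkness     < 0.50f;
    v.isLight       = f.light        < 0.57f;
    v.isIlluminated = f.illumination < 0.1f;
    v.isSpecular    = f.specularity  < 0.1f;
    return v;
}

// The image counts as good only if every verdict is "false".
bool isGoodQuality(const QualityFactors& f) {
    QualityVerdicts v = checkQuality(f);
    return !v.isBlurred && !v.isDark && !v.isLight &&
           !v.isIlluminated && !v.isSpecular;
}
```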
Attributes estimation functionality#
Face Attribute Estimation#
The estimator is trained to work with warped images (see Chapter "Image warping" for details).
The Attribute estimator determines face attributes. Currently, the following attributes are available:
- Age: determines person's age;
- Gender: determines person's gender;
Before using the attribute estimator, the user can decide which of the specific attributes listed above to estimate through the IAttributeEstimator::EstimationRequest structure, which is later passed to the main estimate() method. The estimator fills the IAttributeEstimator::AttributeEstimationResult output structure, which consists of optional fields describing the results of the requested attributes.
- Age is reported in years:
  - For cooperative conditions (see "Appendix B. Glossary"): the average error depends on the person's age; see the table below for additional details. The overall average error is 2.3 years.
- For gender estimation, 1 means male, 0 means female:
  - Estimation precision in cooperative mode is 99.81% with the threshold 0.5;
  - Estimation precision in non-cooperative mode is 92.5%.
"Average age estimation error per age group for cooperative conditions"
Age (years) | Average error (years) |
---|---|
0-3 | ±3.3 |
4-7 | ±2.97 |
8-12 | ±3.06 |
13-17 | ±4.05 |
17-20 | ±3.89 |
20-25 | ±1.89 |
25-30 | ±1.88 |
30-35 | ±2.42 |
35-40 | ±2.65 |
40-45 | ±2.78 |
45-50 | ±2.88 |
50-55 | ±2.85 |
55-60 | ±2.86 |
60-65 | ±3.24 |
65-70 | ±3.85 |
70-75 | ±4.38 |
75-80 | ±6.79 |
Note: In earlier releases of LUNA SDK, the Attribute estimator worked poorly in non-cooperative mode (only 56% gender estimation precision) and did not estimate children's ages. With these problems solved, the average estimation error per age group became slightly higher due to the extended network functionality.
Child Estimation#
This estimator tells whether a person is a child or not. A child is a person younger than 18 years old. The estimator returns a structure with two fields: the first is a score in the range from 0.0 (adult) to 1.0 (child), the second is a boolean answer. The boolean answer depends on the threshold in the configuration (faceengine.conf): if the score is higher than the threshold, the answer is true (the person is a child); otherwise it is false (the person is an adult).
The estimator (see IChildEstimator in IChildEstimator.h):
- Implements the estimate() function that accepts a warped source image (see chapter "Image warping"). The warped image is received from the warper (see IWarper::warp());
- Estimates whether the person in the input warped image is a child;
- Outputs a ChildEstimation structure, which consists of a score and a boolean answer.
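The score-to-verdict rule described above can be sketched as follows. The struct layout and makeChildEstimation helper are illustrative; the real structure is defined in IChildEstimator.h and the threshold comes from faceengine.conf.

```cpp
// Minimal stand-in for the ChildEstimation output described above.
struct ChildEstimation {
    float score;  // 0.0 (adult) .. 1.0 (child)
    bool isChild; // score compared against the configured threshold
};

// Derive the boolean answer the way the text describes: a score above
// the configured threshold means "child", otherwise "adult".
ChildEstimation makeChildEstimation(float score, float threshold) {
    return ChildEstimation{score, score > threshold};
}
```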
Credibility Check Estimation#
This estimator estimates reliability of a person.
The estimator (see ICredibilityCheckEstimator in ICredibilityCheckEstimator.h):
- Implements the estimate() function that accepts a warped image in R8G8B8 format and an fsdk::CredibilityCheckEstimation structure;
- Implements the estimate() function that accepts a span of warped images in R8G8B8 format and a span of fsdk::CredibilityCheckEstimation structures.
Note. The estimator is trained to work with face images that meet the following requirements:
"Requirements for fsdk::HeadPoseEstimation"
Attribute | Acceptable angle range(degrees) |
---|---|
pitch | [-20...20] |
yaw | [-20...20] |
roll | [-20...20] |
"Requirements for fsdk::SubjectiveQuality"
Attribute | Minimum value |
---|---|
blur | 0.61 |
light | 0.57 |
"Requirements for fsdk::AttributeEstimationResult"
Attribute | Minimum value |
---|---|
age | 18 |
"Requirements for fsdk::OverlapEstimation"
Attribute | State |
---|---|
overlapped | false |
"Requirements for fsdk::Detection"
Attribute | Minimum value |
---|---|
detection size | 100 |
Note. Detection size is detection width.
const fsdk::Detection detection = ... // somehow get fsdk::Detection object
const int detectionSize = detection.getRect().width;
Facial Hair Estimation#
This estimator aims to detect the facial hair type on the face in the source image. It can return the following results:
- There is no hair on the face (see the FacialHair::NoHair field in the FacialHair enum);
- There is stubble on the face (see the FacialHair::Stubble field in the FacialHair enum);
- There is a mustache on the face (see the FacialHair::Mustache field in the FacialHair enum);
- There is a beard on the face (see the FacialHair::Beard field in the FacialHair enum).
The estimator (see IFacialHairEstimator in IFacialHairEstimator.h):
- Implements the estimate() function that accepts source warped image in R8G8B8 format and FacialHairEstimation structure to return results of estimation;
- Implements the estimate() function that accepts fsdk::Span of the source warped images in R8G8B8 format and fsdk::Span of the FacialHairEstimation structures to return results of estimation.
FacialHair enumeration#
The FacialHair enumeration contains all possible results of the FacialHair estimation:
enum class FacialHair {
NoHair = 0, //!< no hair on the face
Stubble, //!< stubble on the face
Mustache, //!< mustache on the face
Beard //!< beard on the face
};
FacialHairEstimation structure#
The FacialHairEstimation structure contains results of the estimation:
struct FacialHairEstimation {
FacialHair result; //!< estimation result (@see FacialHair enum)
// scores
float noHairScore; //!< no hair on the face score
float stubbleScore; //!< stubble on the face score
float mustacheScore; //!< mustache on the face score
float beardScore; //!< beard on the face score
};
There are two groups of the fields:
- The first group contains only the result enum:
FacialHair result; //!< estimation result (@see FacialHair enum)
The result enum field contains the target result of the estimation.
- The second group contains scores:
float noHairScore; //!< no hair on the face score
float stubbleScore; //!< stubble on the face score
float mustacheScore; //!< mustache on the face score
float beardScore; //!< beard on the face score
The scores group contains the estimation scores for each possible result of the estimation. All scores are defined in [0,1] range. Sum of scores always equals 1.
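Because the four scores sum to 1, the result enum is naturally the class with the highest score. The sketch below illustrates that relationship; the dominantFacialHair helper is hypothetical, while the enum mirrors the one above.

```cpp
// Mirrors the FacialHair enum shown above.
enum class FacialHair { NoHair = 0, Stubble, Mustache, Beard };

// Illustrative reconstruction of how the result relates to the scores:
// the winning class is the one with the highest score.
FacialHair dominantFacialHair(float noHair, float stubble,
                              float mustache, float beard) {
    float best = noHair;
    FacialHair result = FacialHair::NoHair;
    if (stubble > best)  { best = stubble;  result = FacialHair::Stubble; }
    if (mustache > best) { best = mustache; result = FacialHair::Mustache; }
    if (beard > best)    { best = beard;    result = FacialHair::Beard; }
    return result;
}
```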
Note. The estimator is trained to work with face images that meet the following requirements:
"Requirements for fsdk::HeadPoseEstimation"
Attribute | Acceptable angle range(degrees) |
---|---|
pitch | [-40...40] |
yaw | [-40...40] |
roll | [-40...40] |
"Requirements for fsdk::MedicalMaskEstimation"
Attribute | State |
---|---|
result | fsdk::MedicalMask::NoMask |
"Requirements for fsdk::Detection"
Attribute | Minimum value |
---|---|
detection size | 40 |
Note. Detection size is detection width.
const fsdk::Detection detection = ... // somehow get fsdk::Detection object
const int detectionSize = detection.getRect().width;
Natural Light Estimation#
This estimator aims to detect whether the light on the source face image is natural. It can return the following results:
- The light on the face image is not natural (see the LightStatus::NonNatural field in the LightStatus enum);
- The light on the face image is natural (see the LightStatus::Natural field in the LightStatus enum).
The estimator (see INaturalLightEstimator in INaturalLightEstimator.h):
- Implements the estimate() function that accepts source warped image in R8G8B8 format and NaturalLightEstimation structure to return results of estimation;
- Implements the estimate() function that accepts fsdk::Span of the source warped images in R8G8B8 format and fsdk::Span of the NaturalLightEstimation structures to return results of estimation.
LightStatus enumeration#
The LightStatus enumeration contains all possible results of the NaturalLight estimation:
enum class LightStatus : uint8_t {
NonNatural = 0, //!< light is not natural
Natural = 1 //!< light is natural
};
NaturalLightEstimation structure#
The NaturalLightEstimation structure contains results of the estimation:
struct NaturalLightEstimation {
LightStatus status; //!< estimation result (@see LightStatus enum).
float score; //!< Numerical value in range [0, 1].
};
There are two groups of the fields:
- The first group contains only the result enum:
LightStatus status; //!< estimation result (@see LightStatus enum).
The result enum field contains the target result of the estimation.
- The second group contains scores:
float score; //!< Numerical value in range [0, 1].
The score is defined in the [0, 1] range.
Note. The estimator is trained to work with face images that meet the following requirements:
"Requirements for fsdk::MedicalMaskEstimation
"
Attribute | State |
---|---|
result | fsdk::MedicalMask::NoMask |
"Requirements for fsdk::SubjectiveQuality
"
Attribute | Minimum value |
---|---|
blur | 0.5 |
Also, fsdk::GlassesEstimation must not be equal to fsdk::GlassesEstimation::SunGlasses.
Fish Eye Estimation#
This estimator aims to detect a fish eye effect on the source face image. It can return the following results:
- There is no fish eye effect on the face image (see the FishEye::NoFishEyeEffect field in the FishEye enum);
- There is a fish eye effect on the face image (see the FishEye::FishEyeEffect field in the FishEye enum).
The estimator (see IFishEyeEstimator in IFishEyeEstimator.h):
- Implements the estimate() function that accepts source image in R8G8B8 format, face detection and FishEyeEstimation structure to return results of estimation;
- Implements the estimate() function that accepts fsdk::Span of the source images in R8G8B8 format, fsdk::Span of the face detections and fsdk::Span of the FishEyeEstimation structures to return results of estimation.
FishEye enumeration#
The FishEye enumeration contains all possible results of the FishEye estimation:
enum class FishEye {
NoFishEyeEffect = 0, //!< no fish eye effect
FishEyeEffect = 1 //!< with fish eye effect
};
FishEyeEstimation structure#
The FishEyeEstimation structure contains results of the estimation:
struct FishEyeEstimation {
FishEye result; //!< estimation result (@see FishEye enum)
float score; //!< fish eye effect score
};
There are two groups of the fields:
- The first group contains only the result enum:
FishEye result; //!< estimation result (@see FishEye enum)
The result enum field contains the target result of the estimation.
- The second group contains scores:
float score; //!< fish eye effect score
The scores group contains the estimation score.
Note. The estimator is trained to work with face images that meet the following requirements:
"Requirements for fsdk::HeadPoseEstimation"
Attribute | Acceptable angle range(degrees) |
---|---|
pitch | [-20...20] |
yaw | [-25...25] |
roll | [-10...10] |
"Requirements for fsdk::Detection"
Attribute | Minimum value |
---|---|
detection size | 80 |
Note. Detection size is detection width.
const fsdk::Detection detection = ... // somehow get fsdk::Detection object
const int detectionSize = detection.getRect().width;
Also, the estimator is designed to be used on face images from a cooperative domain, which means:
- High image quality;
- Frontal face looking directly at the camera.
Eyebrows estimation#
The EyeBrowEstimator is trained to estimate eyebrow expressions. It returns four scores, one for each possible eyebrow expression: neutral, raised, squinting, and frowning. The scores are in the range [0, 1]: the closer a score is to 1, the more likely it is that the corresponding expression is present in the image. Along with the output score values, the estimator also returns an enum value (EyeBrowState): the index of the maximum score determines the eyebrow state.
The estimator:
- Implements the estimate() function that accepts a warped source image (see chapter "Image warping"). The warped image is received from the warper (see IWarper::warp()). The output estimation is an fsdk::EyeBrowEstimation structure;
- Implements the estimate() function that needs a span of warped source images and a span of fsdk::EyeBrowEstimation structures. The output estimation is a span of fsdk::EyeBrowEstimation structures.
EyeBrowEstimation structure#
The EyeBrowEstimation structure contains the results of the estimation:
struct EyeBrowEstimation {
/**
* @brief EyeBrow estimator output enum.
* This enum contains all possible estimation results.
**/
enum class EyeBrowState {
Neutral = 0,
Raised,
Squinting,
Frowning
};
float neutralScore; //!< 0(not neutral)..1(neutral).
float raisedScore; //!< 0(not raised)..1(raised).
float squintingScore; //!< 0(not squinting)..1(squinting).
float frowningScore; //!< 0(not frowning)..1(frowning).
EyeBrowState eyeBrowState; //!< EyeBrow state
};
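The "index of the maximum score determines the EyeBrow state" rule stated above can be sketched as follows. The eyeBrowStateFromScores helper is hypothetical; the enum mirrors the structure above, with the scores passed in enum order.

```cpp
#include <array>
#include <cstddef>

// Mirrors the EyeBrowState enum from the structure above.
enum class EyeBrowState { Neutral = 0, Raised, Squinting, Frowning };

// Pick the state whose score is the maximum; scores are ordered as
// {neutralScore, raisedScore, squintingScore, frowningScore}.
EyeBrowState eyeBrowStateFromScores(const std::array<float, 4>& scores) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < scores.size(); ++i)
        if (scores[i] > scores[best]) best = i;
    return static_cast<EyeBrowState>(best);
}
```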
"Requirements for fsdk::EyeBrowEstimation"
Attribute | Acceptable values |
---|---|
headPose.pitch | [-20...20] |
headPose.yaw | [-20...20] |
headPose.roll | [-20...20] |
"Requirements for fsdk::Detection"
Attribute | Minimum value |
---|---|
detection size | 80 |
Note. Detection size is detection width.
const fsdk::Detection detection = ... // somehow get fsdk::Detection object
const int detectionSize = detection.getRect().width;
Portrait Style Estimation#
This estimator is designed to estimate the position of a person's shoulders in the original image. It can return the following results:
- The shoulders are not parallel to the camera (see the PortraitStyleStatus::NonPortrait field in the PortraitStyleStatus enum);
- Shoulders are parallel to the camera (see the PortraitStyleStatus::Portrait field in the PortraitStyleStatus enum);
The estimator (see IPortraitStyleEstimator in IPortraitStyleEstimator.h):
- Implements an estimate() function that accepts an R8G8B8 source image, a detection, and a PortraitStyleEstimation structure to return estimation results;
- Implements an estimate() function that accepts fsdk::Span of R8G8B8 source images, fsdk::Span of detections, and fsdk::Span of PortraitStyleEstimation structures to return estimation results.
PortraitStyleStatus enumeration#
The PortraitStyleStatus enumeration contains all possible results of the PortraitStyle estimation:
enum class PortraitStyleStatus : uint8_t {
NonPortrait = 0, //!< NonPortrait
Portrait = 1 //!< Portrait
};
PortraitStyleEstimation structure#
The PortraitStyleEstimation structure contains results of the estimation:
struct PortraitStyleEstimation {
PortraitStyleStatus status; //!< estimation result (@see PortraitStyleStatus enum).
float score; //!< numerical value in range [0, 1].
};
There are two groups of the fields:
- The first group contains the enum:
PortraitStyleStatus status; //!< estimation result (@see PortraitStyleStatus enum).
The result enum field contains the target result of the estimation.
- The second group contains score:
float score; //!< numerical value in range [0, 1].
The score is defined in [0,1] range.
Note. The estimator is trained to work with face images that meet the following requirements:
Type of preferable detector is FaceDetV3.
"Requirements for Detector"
Attribute | Min face size |
---|---|
result | 40 |
"Requirements for fsdk::HeadPoseEstimation"
Attribute | Maximum value |
---|---|
yaw | 20.0 |
pitch | 20.0 |
roll | 20.0 |
Headwear Estimation#
This estimator aims to detect a headwear status and headwear type on the face in the source image.
It can return the following headwear status results:
- There is headwear (see the HeadWearState::Yes field in the HeadWearState enum);
- There is no headwear (see the HeadWearState::No field in the HeadWearState enum).
And the following headwear type results:
- There is no headwear on the head (see the HeadWearType::NoHeadWear field in the HeadWearType enum);
- There is a baseball cap on the head (see the HeadWearType::BaseballCap field in the HeadWearType enum);
- There is a beanie on the head (see the HeadWearType::Beanie field in the HeadWearType enum);
- There is a peaked cap on the head (see the HeadWearType::PeakedCap field in the HeadWearType enum);
- There is a shawl on the head (see the HeadWearType::Shawl field in the HeadWearType enum);
- There is a hat with ear flaps on the head (see the HeadWearType::HatWithEarFlaps field in the HeadWearType enum);
- There is a helmet on the head (see the HeadWearType::Helmet field in the HeadWearType enum);
- There is a hood on the head (see the HeadWearType::Hood field in the HeadWearType enum);
- There is a hat on the head (see the HeadWearType::Hat field in the HeadWearType enum);
- There is something else on the head (see the HeadWearType::Other field in the HeadWearType enum).
The estimator (see IHeadWearEstimator in IHeadWearEstimator.h):
- Implements the estimate() function that accepts warped image in R8G8B8 format and HeadWearEstimation structure to return results of estimation;
- Implements the estimate() function that accepts fsdk::Span of the source warped images in R8G8B8 format and fsdk::Span of the HeadWearEstimation structures to return results of estimation.
HeadWearState enumeration#
The HeadWearState enumeration contains all possible results of the Headwear state estimation:
enum class HeadWearState {
Yes = 0, //< there is headwear
No, //< there is no headwear
Count
};
HeadWearType enumeration#
The HeadWearType enumeration contains all possible results of the Headwear type estimation:
enum class HeadWearType : uint8_t {
NoHeadWear = 0, //< there is no headwear on the head
BaseballCap, //< there is baseball cap on the head
Beanie, //< there is beanie on the head
PeakedCap, //< there is peaked cap on the head
Shawl, //< there is shawl on the head
HatWithEarFlaps, //< there is hat with ear flaps on the head
Helmet, //< there is helmet on the head
Hood, //< there is hood on the head
Hat, //< there is hat on the head
Other, //< something other is on the head
Count
};
HeadWearStateEstimation structure#
The HeadWearStateEstimation structure contains results of the Headwear state estimation:
struct HeadWearStateEstimation {
HeadWearState result; //!< estimation result (@see HeadWearState enum)
float scores[static_cast<int>(HeadWearState::Count)]; //!< estimation scores
/**
* @brief Returns score of required headwear state.
* @param [in] state headwear state.
* @see HeadWearState for more info.
* */
inline float getScore(HeadWearState state) const;
};
There are two groups of the fields:
- The first group contains only the result enum:
HeadWearState result; //!< estimation result (@see HeadWearState enum)
- The second group contains scores:
float scores[static_cast<int>(HeadWearState::Count)]; //!< estimation scores
The scores group contains the estimation scores for each possible result of the estimation. All scores are defined in [0,1] range. Sum of scores always equals 1.
HeadWearTypeEstimation structure#
The HeadWearTypeEstimation structure contains results of the Headwear type estimation:
struct HeadWearTypeEstimation {
HeadWearType result; //!< estimation result (@see HeadWearType enum)
float scores[static_cast<int>(HeadWearType::Count)]; //!< estimation scores
/**
* @brief Returns score of required headwear type.
* @param [in] type headwear type.
* @see HeadWearType for more info.
* */
inline float getScore(HeadWearType type) const;
};
There are two groups of the fields:
- The first group contains only the result enum:
HeadWearType result; //!< estimation result (@see HeadWearType enum)
- The second group contains scores:
float scores[static_cast<int>(HeadWearType::Count)]; //!< estimation scores
The scores group contains the estimation scores for each possible result of the estimation. All scores are defined in [0,1] range. Sum of scores always equals 1.
HeadWearEstimation structure#
The HeadWearEstimation structure contains results of both Headwear state and type estimations:
struct HeadWearEstimation {
HeadWearStateEstimation state; //!< headwear state estimation
//!< (@see HeadWearStateEstimation)
HeadWearTypeEstimation type; //!< headwear type estimation
//!< (@see HeadWearTypeEstimation)
};
Background Estimation#
This estimator is designed to estimate the background in the original image. It can return the following results:
- The background is non-solid (see the BackgroundStatus::NonSolid field in the BackgroundStatus enum);
- The background is solid (see the BackgroundStatus::Solid field in the BackgroundStatus enum);
The estimator (see IBackgroundEstimator in IBackgroundEstimator.h):
- Implements an estimate() function that accepts an R8G8B8 source image, a detection, and a BackgroundEstimation structure to return estimation results;
- Implements an estimate() function that accepts fsdk::Span of R8G8B8 source images, fsdk::Span of detections, and fsdk::Span of BackgroundEstimation structures to return estimation results.
BackgroundStatus enumeration#
The BackgroundStatus enumeration contains all possible results of the Background estimation:
enum class BackgroundStatus : uint8_t {
NonSolid = 0, //!< NonSolid
Solid = 1 //!< Solid
};
BackgroundEstimation structure#
The BackgroundEstimation structure contains results of the estimation:
struct BackgroundEstimation {
BackgroundStatus status; //!< estimation result (@see BackgroundStatus enum).
float backgroundScore; //!< numerical value in range [0, 1], where 1 is a solid background and 0 is non-solid.
float backgroundColorScore; //!< numerical value in range [0, 1], where 1 is a light background and 0 is too dark.
};
There are two groups of the fields:
- The first group contains the enum:
BackgroundStatus status; //!< estimation result (@see BackgroundStatus enum).
The result enum field BackgroundStatus contains the target result of the estimation.
- The second group contains scores:
float backgroundScore; //!< numerical value in range [0, 1], where 1 is a solid background and 0 is non-solid.
float backgroundColorScore; //!< numerical value in range [0, 1], where 1 is a light background and 0 is too dark.
The scores are defined in the [0, 1] range. If both scores are above their thresholds, the background is considered solid; otherwise it is non-solid.
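The decision rule can be sketched as follows; the threshold values here are placeholders for illustration (the real ones are configured inside the SDK):

```cpp
#include <cstdint>

enum class BackgroundStatus : uint8_t { NonSolid = 0, Solid = 1 };

// Placeholder thresholds for illustration only; the real values are
// configured inside the SDK.
constexpr float kBackgroundScoreThreshold = 0.5f;
constexpr float kBackgroundColorScoreThreshold = 0.5f;

// The background is considered solid only when both scores exceed
// their thresholds; otherwise it is non-solid.
BackgroundStatus deriveStatus(float backgroundScore, float backgroundColorScore) {
    const bool solid = backgroundScore > kBackgroundScoreThreshold &&
                       backgroundColorScore > kBackgroundColorScoreThreshold;
    return solid ? BackgroundStatus::Solid : BackgroundStatus::NonSolid;
}
```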
Note. The estimator is trained to work with face images that meet the following requirements:
The preferred detector type is FaceDetV3.
"Requirements for Detector"
Attribute | Min face size |
---|---|
result | 40 |
"Requirements for fsdk::HeadPoseEstimation"
Attribute | Maximum value |
---|---|
yaw | 20.0 |
pitch | 20.0 |
roll | 20.0 |
Grayscale, color or infrared Estimation#
The BlackWhite estimator has two interfaces.
By full frame#
This interface detects if an input image is grayscale or color. It is indifferent to image content and dimensions; you can pass both face crops (including warped images) and full frames.
It implements an estimate() function that accepts a source image and outputs a boolean indicating whether the image is grayscale (true) or not (false).
By warped frame#
The second interface can be used only with warped images (see Chapter "Image warping" for details). It checks whether an image is color, grayscale, or infrared.
- Implements the estimate() function that accepts a warped source image (see Chapter "Image warping" for details).
- Outputs an ImageColorEstimation structure.
struct ImageColorEstimation {
float colorScore; //!< 0(grayscale)..1(color);
float infraredScore; //!< 0(infrared)..1(not infrared);
/**
* @brief Enumeration of possible image color types.
* */
enum class ImageColorType : uint8_t {
Color = 0, //!< image is color.
Grayscale, //!< Image is grayscale.
Infrared, //!< Image is infrared.
};
ImageColorType colorType;
};
ImageColorEstimation::ImageColorType presents the image color type as an enum with possible values: Color, Grayscale, Infrared.
- For a color image, `colorScore` will be close to 1.0 and `infraredScore` close to 0.0;
- for an infrared image, `colorScore` will be close to 0.0 and `infraredScore` close to 1.0;
- for a grayscale image, both scores will be near 0.0.
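A minimal sketch of how the two scores can map to ImageColorType, following the rules above; the 0.5 cut-off is an assumption for illustration, not an SDK constant:

```cpp
#include <cstdint>

enum class ImageColorType : uint8_t { Color = 0, Grayscale, Infrared };

// Classify from the two scores: a color image drives colorScore toward 1,
// an infrared image drives infraredScore toward 1, and a grayscale image
// leaves both near 0. The 0.5 cut-off is an assumed value.
ImageColorType classifyColorType(float colorScore, float infraredScore) {
    if (colorScore > 0.5f)
        return ImageColorType::Color;
    if (infraredScore > 0.5f)
        return ImageColorType::Infrared;
    return ImageColorType::Grayscale;
}
```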
Note. The two interfaces use different principles of color type estimation.
Note. The BlackWhite estimator is trained to work with warped photos of real faces. We do not guarantee correctness when the faces in the photo are fake (not real, e.g. a photo of a photo).
Face features extraction functionality#
Eyes Estimation#
The estimator is trained to work with warped images (see Chapter "Image warping" for details).
A sensor type can be defined for this estimator.
This estimator aims to determine:
- Eye state: Open, Closed, Occluded;
- Precise eye iris location as an array of landmarks;
- Precise eyelid location as an array of landmarks.
You can only pass a warped image with a detected face to the estimator interface. Better image quality leads to better results.
Eye state classifier supports three categories: "Open", "Closed", "Occluded". Poor quality images or ones that depict obscured eyes (e.g. eyewear, hair, gestures) fall into the "Occluded" category. It is always a good idea to check eye state before using the segmentation result.
The precise locations allow iris and eyelid segmentation. The estimator can output iris and eyelid shapes as arrays of points that together form an ellipse. You should only use segmentation results if the state of that eye is "Open".
The estimator:
- Implements the estimate() function that accepts warped source image (see Chapter "Image warping") and warped landmarks, either of type Landmarks5 or Landmarks68. The warped image and landmarks are received from the warper (see IWarper::warp());
- Classifies eyes state and detects its iris and eyelid landmarks;
- Outputs EyesEstimation structures.
The orientation terms 'left' and 'right' refer to the way you see the image as it is shown on the screen: the left eye is not necessarily the person's left eye, but the one on the left side of the screen; consequently, the right eye is the one on the right side of the screen. More formally, the label 'left' refers to the subject's left eye (and similarly for the right eye), such that xright < xleft.
EyesEstimation::EyeAttributes presents eye state as enum EyeState with possible values: Open, Closed, Occluded.
Iris landmarks are presented with a template structure Landmarks that is specialized for 32 points.
Eyelid landmarks are presented with a template structure Landmarks that is specialized for 6 points.
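The recommended pattern of checking the eye state before using the segmentation result can be sketched with minimal local mirrors of the documented types (the field layout here is illustrative, not the SDK's exact definition):

```cpp
#include <array>
#include <cstdint>

// Minimal local mirrors of the documented types; the layout is illustrative.
enum class EyeState : uint8_t { Open = 0, Closed, Occluded };

struct EyeAttributes {
    EyeState state;
    std::array<float, 32 * 2> iris;   // 32 iris landmarks as (x, y) pairs
    std::array<float, 6 * 2> eyelid;  // 6 eyelid landmarks as (x, y) pairs
};

// Iris and eyelid segmentation results are only meaningful for an open eye.
bool canUseSegmentation(const EyeAttributes& eye) {
    return eye.state == EyeState::Open;
}
```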
Red Eyes Estimation#
The estimator is trained to work with warped images (see Chapter "Image warping" for details) and warped landmarks.
Red Eye estimator evaluates whether a person's eyes are red in a photo or not.
You can pass only warped images with detected faces to the estimator interface. Better image quality leads to better results.
The estimator (see IRedEyeEstimator in IEstimator.h):
- Implements the estimate() function that accepts a warped source image (see Chapter "Image warping") in R8G8B8 format and warped Landmarks5. The warped image and landmarks are received from the warper (see IWarper::warp());
- Implements the estimate() function that accepts an fsdk::Span of the source warped images in R8G8B8 format and an fsdk::Span of warped Landmarks5.
- Outputs RedEyeEstimation structure.
The RedEyeEstimation structure contains attributes for each eye. Each eye's attributes consist of a status and a score. The score is a normalized float value in the range [0..1], where 1 means a red eye and 0 means not red.
RedEyeEstimation structure#
The RedEyeEstimation structure contains results of the estimation:
struct RedEyeEstimation {
/**
* @brief Eyes attribute structure.
* */
struct RedEyeAttributes {
RedEyeStatus status; //!< Status of an eye.
float score; //!< Score, numerical value in range [0,1].
};
RedEyeAttributes leftEye; //!< Left eye attributes
RedEyeAttributes rightEye; //!< Right eye attributes
};
There are two fields in RedEyeAttributes:
- The first field is a status:
RedEyeStatus status; //!< Status of an eye.
- The second field is a score, which is defined in the [0, 1] range:
float score; //!< Score, numerical value in range [0, 1].
Enumeration of possible red eye statuses.
enum class RedEyeStatus : uint8_t {
NonRed, //!< Eye is not red.
Red, //!< Eye is red.
};
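As a sketch of how the per-eye score relates to the status, the snippet below applies a placeholder 0.5 threshold (an assumption for illustration; the SDK fills the status field itself):

```cpp
#include <cstdint>

enum class RedEyeStatus : uint8_t {
    NonRed, //!< Eye is not red.
    Red     //!< Eye is red.
};

// Placeholder rule: a score close to 1 means a red eye. The SDK sets the
// status field itself; the 0.5 threshold here is an assumption.
RedEyeStatus deriveRedEyeStatus(float score, float threshold = 0.5f) {
    return score > threshold ? RedEyeStatus::Red : RedEyeStatus::NonRed;
}
```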
Note. The estimator is trained to work with face images that meet the following requirements:
"Requirements for fsdk::NaturalLight
"
Attribute | Minimum value |
---|---|
score | 0.5 |
"Requirements for fsdk::SubjectiveQuality
"
Attribute | Minimum value |
---|---|
blur | 0.61 |
light | 0.57 |
darkness | 0.5 |
illumination | 0.1 |
specularity | 0.1 |
Also, fsdk::GlassesEstimation must not be equal to fsdk::GlassesEstimation::SunGlasses.
Gaze Estimation#
This estimator is designed to determine gaze direction relative to the head pose. Since 3D head translation is hard to determine reliably without camera-specific calibration, only the 3D rotation component is estimated.
A sensor type can be defined for this estimator.
Estimation characteristics:
- Units (degrees);
- Notation (Euler angles);
- Precision (see table below).
The roll angle is not estimated, and prediction precision decreases as the rotation angle increases. Typical average errors for different angle ranges are presented in the table below.
"Gaze prediction precision"
Average prediction error (per axis) | -25°...+25° | -25°...-45° or +25°...+45° |
---|---|---|
Yaw | ±2.7° | ±4.6° |
Pitch | ±3.0° | ±4.8° |
Zero position corresponds to a gaze direction orthogonal to the face plane, with the axis of symmetry parallel to the vertical camera axis.
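Given the precision table above, a caller may want to know whether a gaze estimate falls in the higher-precision ±25° band. A minimal sketch; the GazeEstimation layout here is a local stand-in, not the SDK's definition:

```cpp
#include <cmath>

// Local stand-in for the gaze output: yaw and pitch Euler angles in degrees
// (roll is not estimated).
struct GazeEstimation {
    float yaw;
    float pitch;
};

// True when both angles lie in the +/-25 degree range, where the typical
// average error per axis is smallest according to the table above.
bool withinHighPrecisionRange(const GazeEstimation& gaze) {
    return std::fabs(gaze.yaw) <= 25.0f && std::fabs(gaze.pitch) <= 25.0f;
}
```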
Glasses Estimation#
The Glasses estimator is designed to determine whether a person is currently wearing glasses. There are three states the estimator is currently able to estimate:
- NoGlasses state determines whether a person is wearing any glasses at all;
- EyeGlasses state determines whether a person is wearing eyeglasses;
- SunGlasses state determines whether a person is wearing sunglasses.
Note. The source input image must be warped for the estimator to work properly (see Chapter "Image warping"). The quality of estimation depends on the threshold values located in the FaceEngine configuration file (faceengine.conf), in the GlassesEstimator::Settings section. By default, these threshold values are set to optimal.
The table below contains true positive rates corresponding to the selected false positive rates.
"Glasses estimator TPR/FPR rates"
State | TPR | FPR |
---|---|---|
NoGlasses | 0.997 | 0.00234 |
EyeGlasses | 0.9768 | 0.000783 |
SunGlasses | 0.9712 | 0.000383 |
Overlap Estimation#
This estimator tells whether the face is overlapped by any object. It returns a structure with two fields: the overlap value in the range [0..1], where 0 means not overlapped and 1.0 means overlapped, and a Boolean answer. The Boolean answer depends on the threshold listed below: if the value is greater than the threshold, the answer is true; otherwise it is false.
The estimator (see IOverlapEstimator in IOverlapEstimator.h):
- Implements the estimate() function that accepts source image in R8G8B8 format and fsdk::Detection structure of corresponding source image (see section "Detection structure");
- Estimates whether the face is overlapped by any object on input image;
- Outputs structure with value of overlapping and Boolean answer.
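The threshold rule can be sketched as follows; the structure layout and function name are illustrative stand-ins for the SDK's output:

```cpp
// Illustrative stand-in for the estimator's output structure.
struct OverlapEstimation {
    float overlapValue; //!< overlap value in [0..1], 1 means overlapped
    bool overlapped;    //!< Boolean answer derived from the threshold
};

// The Boolean answer is true when the value exceeds the threshold.
OverlapEstimation makeOverlapEstimation(float value, float threshold) {
    return OverlapEstimation{value, value > threshold};
}
```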
Emotion estimation functionality#
Emotions Estimation#
The estimator is trained to work with warped images (see Chapter "Image warping" for details).
This estimator aims to determine whether a face depicted on an image expresses the following emotions:
- Anger
- Disgust
- Fear
- Happiness
- Surprise
- Sadness
- Neutrality
You can pass only warped images with detected faces to the estimator interface. Better image quality leads to better results.
The estimator (see IEmotionsEstimator in IEmotionsEstimator.h):
- Implements the estimate() function that accepts warped source image (see Chapter "Image warping"). Warped image is received from the warper (see IWarper::warp());
- Estimates emotions expressed by the person on a given image;
- Outputs EmotionsEstimation structure with aforementioned data.
EmotionsEstimation presents emotions as normalized float values in the range of [0..1] where 0 is lack of a specific emotion and 1 is the maximum intensity of an emotion.
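A common follow-up is picking the predominant emotion from the normalized scores. A minimal sketch with a local stand-in for the output structure (the enum ordering is an assumption):

```cpp
#include <algorithm>
#include <cstdint>

// Local stand-in listing the documented emotions; ordering is assumed.
enum class Emotion : uint8_t {
    Anger = 0, Disgust, Fear, Happiness, Surprise, Sadness, Neutral, Count
};

struct EmotionsEstimation {
    float scores[static_cast<int>(Emotion::Count)]; // each in [0..1]
};

// Return the emotion with the highest intensity.
Emotion predominantEmotion(const EmotionsEstimation& e) {
    const float* first = e.scores;
    const float* last = e.scores + static_cast<int>(Emotion::Count);
    return static_cast<Emotion>(std::max_element(first, last) - first);
}
```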
Mouth Estimation Functionality#
Mouth Estimation#
This estimator is designed to predict person's mouth state. It returns the following bool flags:
bool isOpened; //!< Mouth is opened flag
bool isSmiling; //!< Person is smiling flag
bool isOccluded; //!< Mouth is occluded flag
Each of these flags indicate specific mouth state that was predicted.
A combined mouth state is assumed if multiple flags are set to true. For example, there are many cases where a person is smiling with their mouth wide open.
The Mouth estimator also provides score probabilities for the mouth states in case the user needs more detailed information:
float opened; //!< mouth opened score
float smile; //!< person is smiling score
float occluded; //!< mouth is occluded score
MouthEstimator thresholds#
The estimator returns several scores, one for each possible result. The final result is calculated based on these scores and thresholds: if a score is above the corresponding threshold, that result is chosen as final. The default values for all thresholds are taken from the configuration file. See "Mouth Estimator settings" in the Configuration guide for details.
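The score-versus-threshold rule can be sketched as below; the 0.5 defaults are placeholders standing in for the values from the configuration file:

```cpp
// Flags and scores as documented; this derivation is a sketch of the
// threshold rule, with placeholder 0.5 defaults standing in for the
// values from the configuration file.
struct MouthEstimation {
    bool isOpened;
    bool isSmiling;
    bool isOccluded;
    float opened;
    float smile;
    float occluded;
};

MouthEstimation deriveMouthFlags(float opened, float smile, float occluded,
                                 float threshold = 0.5f) {
    return MouthEstimation{opened > threshold, smile > threshold,
                           occluded > threshold, opened, smile, occluded};
}
```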
Mouth Estimation Extended#
This estimation is an extended version of the regular Mouth Estimation. In addition, it returns the following fields:
SmileTypeScores smileTypeScores; //!< Smile types scores
SmileType smileType; //!< Contains smile type if person "isSmiling"
If the isSmiling flag is true, you can get more detailed information about the smile using the smileType variable. smileType can hold the following states:
enum class SmileType {
None, //!< No smile
SmileLips, //!< regular smile, without teeth exposed
SmileOpen //!< smile with teeth exposed
};
If isSmiling is false, smileType is assigned None. Otherwise, the field is assigned SmileLips (the person is smiling with a closed mouth) or SmileOpen (the person is smiling with an open mouth, teeth exposed).
Extended mouth estimation provides score probabilities for the smile type in case the user needs more detailed information:
struct SmileTypeScores {
float smileLips; //!< person is smiling with lips score
float smileOpen; //!< person is smiling with open mouth score
};
The smileType variable is set based on the corresponding scores held by the smileTypeScores variable: it takes the type with the maximum of the smileLips and smileOpen scores, or None if the person is not smiling at all.
if (estimation.isSmiling)
estimation.smileType = estimation.smileTypeScores.smileLips > estimation.smileTypeScores.smileOpen ?
fsdk::SmileType::SmileLips : fsdk::SmileType::SmileOpen;
else
estimation.smileType = fsdk::SmileType::None;
Note. When you use Mouth Estimation Extended, the underlying computations are exactly the same as if you use regular Mouth Estimation. The regular Mouth Estimation was retained for backward compatibility.
These estimators are trained to work with warped images (see Chapter "Image warping" for details).
Liveness check functionality#
HeadAndShouldersLiveness Estimation#
This estimator tells whether the person's face is real or fake (photo, printed image) and confirms the presence of a person's body in the frame. The face should be in the center of the frame, and the distance between the face and the frame borders should be three times greater than the space the face takes up in the frame. Both the person's face and chest have to be in the frame. The camera should be placed at waist level and directed from bottom to top. The estimator checks for the borders of a mobile device to detect fraud, so there should not be any rectangular areas within the frame (windows, pictures, etc.).
The estimator (see IHeadAndShouldersLiveness in IHeadAndShouldersLiveness.h):
- Implements the estimateHeadLiveness() function that accepts source image in R8G8B8 format and fsdk::Detection structure of corresponding source image (see section "Detection structure" in chapter "Detection facility").
- Estimates whether it is a real person or not. Outputs a float score normalized in the range [0..1], where 1 is a real person and 0 is a fake.
- Implements the estimateShouldersLiveness() function that accepts a source image in R8G8B8 format and an fsdk::Detection structure of the corresponding source image (see section "Detection structure" in chapter "Face detection facility"). Estimates whether it is a real person or not. Outputs a float score normalized in the range [0..1], where 1 is a real person and 0 is a fake.
LivenessFlyingFaces Estimation#
This estimator tells whether the person's face is real or fake (photo, printed image).
The estimator (see ILivenessFlyingFacesEstimator in ILivenessFlyingFacesEstimator.h):
- Implements the estimate() function that needs an fsdk::Image with a valid image in R8G8B8 format and an fsdk::Detection of the corresponding source image (see section "Detection structure" in chapter "Face detection facility").
- Implements the estimate() function that needs a span of fsdk::Image with valid source images in R8G8B8 format and a span of fsdk::Detection of the corresponding source images (see section "Detection structure" in chapter "Face detection facility").
These methods estimate whether the persons are real or not. The corresponding estimations are output as float scores normalized in the range [0..1], where 1 is a real person and 0 is a fake.
Note. The estimator is trained to work in combination with fsdk::ILivenessRGBMEstimator.
Note. The estimator is trained to work with face images that meet the following requirements:
"Requirements for fsdk::BestShotQualityEstimator::EstimationResult
"
Attribute | Acceptable values |
---|---|
headPose.pitch | [-30...30] |
headPose.yaw | [-30...30] |
headPose.roll | [-40...40] |
ags | [0.5...1.0] |
"Requirements for fsdk::Detection
"
Attribute | Minimum value |
---|---|
detection size | 80 |
Note. Detection size is detection width.
const fsdk::Detection detection = ... // somehow get fsdk::Detection object
const int detectionSize = detection.getRect().width;
LivenessRGBM Estimation#
This estimator tells whether the person's face is real or fake (photo, printed image).
The estimator (see ILivenessRGBMEstimator in ILivenessRGBMEstimator.h):
- Implements the estimate() function that needs an fsdk::Face with a valid image in R8G8B8 format, a detection structure of the corresponding source image (see section "Detection structure" in chapter "Face detection facility"), and an fsdk::Image with the accumulated background. This method estimates whether it is a real person or not. The output estimation structure contains a float score and a boolean result. The float score is normalized in the range [0..1], where 1 is a real person and 0 is a fake. The boolean result is true for a real person and false otherwise.
- Implements the update() function that needs the fsdk::Image with the current frame, the number of that image, and the previously accumulated background. The accumulated background will be overwritten by this call.
Depth Liveness Estimation#
This estimator tells whether the person's face is real or fake (photo, printed image).
The estimator (see ILivenessDepthEstimator in ILivenessDepthEstimator.h):
- Implements the estimate() function that accepts a source warped image in R16 format and an fsdk::DepthEstimation structure. This method estimates whether the depth map corresponds to a real person. The corresponding estimation is output as a float score normalized in the range [0..1], where 1 is a real person and 0 is a fake.
The estimator is trained to work with face images that meet the following requirements:
"Requirements for fsdk::HeadPoseEstimation
"
Attribute | Acceptable angle range(degrees) |
---|---|
pitch | [-15...15] |
yaw | [-15...15] |
roll | [-10...10] |
"Requirements for fsdk::Quality
"
Attribute | Minimum value |
---|---|
blur | 0.94 |
light | 0.90 |
dark | 0.93 |
"Requirements for fsdk::EyesEstimation
"
Attribute | State |
---|---|
leftEye | Open |
rightEye | Open |
Also, the minimum distance between the face bounding box and the frame borders should be greater than 20 pixels.
LivenessOneShotRGBEstimator#
This estimator shows whether the person's face is real or fake (photo, printed image).
LivenessOneShotRGBEstimator requirements#
The requirements for the processed image and the face in the image are listed below.
This estimator supports images taken on mobile devices or webcams (PC or laptop). Image resolution minimum requirements:
- Mobile devices - 720 × 960 px
- Webcam (PC or laptop) - 1280 x 720 px
There should be only one face in the image. An error occurs when there are two or more faces in the image.
The minimum face detection size must be 200 pixels.
Yaw, pitch, and roll angles should be no more than 25 degrees in either direction.
The minimum indent between the face and the image borders should be 10 pixels.
LivenessOneShotRGBEstimation structure#
The estimator (see ILivenessOneShotRGBEstimator in ILivenessOneShotRGBEstimator.h):
- Implements the estimate() function that needs an fsdk::Image and an fsdk::Face with a valid image in R8G8B8 format and a detection structure of the corresponding source image (see section "Detection structure" in chapter "Face detection facility"). This method estimates whether it is a real person or not. The output estimation is an fsdk::LivenessOneShotRGBEstimation structure.
- Implements the estimate() function that needs a span of fsdk::Image and a span of fsdk::Face with valid images in R8G8B8 format and detection structures of the corresponding source images (see section "Detection structure" in chapter "Face detection facility"). This method estimates whether real persons are shown or not. The output estimation is a span of fsdk::LivenessOneShotRGBEstimation structures. The second output value (an fsdk::LivenessOneShotRGBEstimation structure) is the result of aggregation based on the span of estimations above. Note that the second output value (the aggregation) is optional, i.e. a default argument, which is nullptr.
The LivenessOneShotRGBEstimation structure contains results of the estimation:
struct LivenessOneShotRGBEstimation {
enum class State {
Alive = 0, //!< The person on image is real
Fake, //!< The person on image is fake (photo, printed image)
Unknown //!< The liveness status of person on image is Unknown
};
float score; //!< Estimation score
State state; //!< Liveness status
float qualityScore; //!< Liveness quality score
};
The estimation score is normalized in the range [0..1], where 1 is a real person and 0 is a fake.
The liveness quality score is an image quality estimation for the purpose of liveness recognition.
This parameter is used for filtering whether a best shot can be taken when checking for liveness.
The reference score is 0.5.
The value of State depends on score and qualityThreshold. The qualityThreshold value can be given as an argument of the estimate() method (see ILivenessOneShotRGBEstimator) or set in the configuration file faceengine.conf (see the Configuration Guide, LivenessOneShotRGBEstimator section).
Usage example#
The face in the image and the image itself should meet the estimator requirements.
You can find additional information in the example (examples/example_estimation/main.cpp) or in the code below.
// Minimum detection size in pixels.
constexpr int minDetSize = 200;
// Step back from the borders.
constexpr int borderDistance = 10;
if (std::min(detectionRect.width, detectionRect.height) < minDetSize) {
std::cerr << "Bounding Box width and/or height is less than `minDetSize` - " << minDetSize << std::endl;
return false;
}
if ((detectionRect.x + detectionRect.width) > (image.getWidth() - borderDistance) || detectionRect.x < borderDistance) {
std::cerr << "Bounding Box width is out of border distance - " << borderDistance << std::endl;
return false;
}
if ((detectionRect.y + detectionRect.height) > (image.getHeight() - borderDistance) || detectionRect.y < borderDistance) {
std::cerr << "Bounding Box height is out of border distance - " << borderDistance << std::endl;
return false;
}
// Yaw, pitch and roll.
constexpr int principalAxes = 25;
if (std::abs(headPose.pitch) > principalAxes ||
std::abs(headPose.yaw) > principalAxes ||
std::abs(headPose.roll) > principalAxes ) {
std::cerr << "Can't estimate LivenessOneShotRGBEstimation. " <<
"Yaw, pith or roll absolute value is larger than expected value: " << principalAxes << "." <<
"\nPitch angle estimation: " << headPose.pitch <<
"\nYaw angle estimation: " << headPose.yaw <<
"\nRoll angle estimation: " << headPose.roll << std::endl;
return false;
}
We recommend using Detector type 3 (fsdk::ObjectDetectorClassType::FACE_DET_V3).
Personal Protection Equipment Estimation#
The Personal Protection Equipment (a.k.a. PPE) estimator predicts whether a person is wearing one or multiple types of protection equipment, such as:
- Helmet;
- Hood;
- Vest;
- Gloves.
For each of these attributes, the estimator returns three prediction scores: the probability that the person is wearing the attribute, the probability that they are not, and an "unknown" score, which will be the highest of the three if the estimator was not able to tell whether the person in the image wears the attribute.
The output structure for each attribute looks as follows:
struct OnePPEEstimation {
float positive = 0.0f;
float negative = 0.0f;
float unknown = 0.0f;
enum class PPEState : uint8_t {
Positive, //!< person is wearing specific personal equipment;
Negative, //!< person isn't wearing specific personal equipment;
Unknown, //!< it's hard to tell wether person wears specific personal equipment.
Count //!< state count
};
/**
* @brief returns predominant personal equipment state
* */
inline PPEState getPredominantState();
};
All three prediction scores sum up to 1.
The estimator takes as input an image and a human bounding box of the person whose attributes shall be predicted. For more information about the human detector, see the "Human Detection" section.
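As a sketch, the snippet below reproduces the documented OnePPEEstimation structure locally and fills in a plausible body for getPredominantState(): the state whose score is highest wins (the real implementation lives in the SDK).

```cpp
#include <algorithm>
#include <cstdint>

// Local reproduction of the documented structure; the body of
// getPredominantState() is a sketch of the described behavior.
struct OnePPEEstimation {
    float positive = 0.0f;
    float negative = 0.0f;
    float unknown = 0.0f;

    enum class PPEState : uint8_t {
        Positive, //!< person is wearing specific personal equipment
        Negative, //!< person isn't wearing specific personal equipment
        Unknown,  //!< hard to tell whether the equipment is worn
        Count
    };

    // The predominant state is the one with the highest score.
    PPEState getPredominantState() const {
        const float scores[] = {positive, negative, unknown};
        return static_cast<PPEState>(
            std::max_element(scores, scores + 3) - scores);
    }
};
```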
Medical Mask Estimation Functionality#
Medical Mask Estimation#
This estimator aims to detect a medical face mask on the face in the source image. It can return the following results:
- A medical mask is on the face (see MedicalMask::Mask field in the MedicalMask enum);
- There is no medical mask on the face (see MedicalMask::NoMask field in the MedicalMask enum);
- The face is occluded with something (see MedicalMask::OccludedFace field in the MedicalMask enum);
Medical Mask Extended Estimation#
This estimator aims to detect a medical face mask on the face in the source image. It can return the following results:
- A medical mask is on the face (see the MedicalMaskExtended::Mask field in the MedicalMaskExtended enum);
- There is no medical mask on the face (see the MedicalMaskExtended::NoMask field in the MedicalMaskExtended enum);
- A medical mask is not in the right place (see the MedicalMaskExtended::MaskNotInPlace field in the MedicalMaskExtended enum);
- The face is occluded with something (see the MedicalMaskExtended::OccludedFace field in the MedicalMaskExtended enum);
The estimator (see IMedicalMaskEstimator in IMedicalMaskEstimator.h):
- Implements the estimate() function that accepts source warped image in R8G8B8 format and MedicalMaskEstimation structure to return results of estimation;
- Implements the estimate() function that accepts source image in R8G8B8 format, face detection to estimate and MedicalMaskEstimation structure to return results of estimation;
- Implements the estimate() function that accepts fsdk::Span of the source warped images in R8G8B8 format and fsdk::Span of the MedicalMaskEstimation structures to return results of estimation;
- Implements the estimate() function that accepts fsdk::Span of the source images in R8G8B8 format, fsdk::Span of face detections and fsdk::Span of the MedicalMaskEstimation structures to return results of the estimation.
- Implements the estimate() function that accepts source warped image in R8G8B8 format and MedicalMaskEstimationExtended structure to return results of estimation;
- Implements the estimate() function that accepts source image in R8G8B8 format, face detection to estimate and MedicalMaskEstimationExtended structure to return results of estimation;
- Implements the estimate() function that accepts fsdk::Span of the source warped images in R8G8B8 format and fsdk::Span of the MedicalMaskEstimationExtended structures to return results of estimation;
- Implements the estimate() function that accepts fsdk::Span of the source images in R8G8B8 format, fsdk::Span of face detections and fsdk::Span of the MedicalMaskEstimationExtended structures to return results of the estimation.
The estimator was implemented for two use-cases:
- When the user already has warped images. For example, when the medical mask estimation is performed right before (or after) the face recognition;
- When the user has face detections only.
Calling the estimate() method with warped image and the estimate() method with image and detection for the same image and the same face could lead to different results.
MedicalMaskEstimator thresholds#
The estimator returns several scores, one for each possible result. The final result is calculated based on these scores and thresholds: if a score is above the corresponding threshold, that result is chosen as final. The default values for all thresholds are taken from the configuration file. See the Configuration guide for details.
MedicalMask enumeration#
The MedicalMask enumeration contains all possible results of the MedicalMask estimation:
enum class MedicalMask {
Mask = 0, //!< medical mask is on the face
NoMask, //!< no medical mask on the face
OccludedFace //!< face is occluded by something
};
MedicalMaskExtended enumeration#
The MedicalMaskExtended enumeration contains all possible results of the MedicalMaskExtended estimation:
enum class MedicalMaskExtended {
Mask = 0, //!< medical mask is on the face
NoMask, //!< no medical mask on the face
MaskNotInPlace, //!< mask is not on the right place
OccludedFace //!< face is occluded by something
};
MedicalMaskEstimation structure#
The MedicalMaskEstimation structure contains results of the estimation:
struct MedicalMaskEstimation {
MedicalMask result; //!< estimation result (@see MedicalMask enum)
// scores
float maskScore; //!< medical mask is on the face score
float noMaskScore; //!< no medical mask on the face score
float occludedFaceScore; //!< face is occluded by something score
};
There are two groups of the fields:
- The first group contains only the result enum:
MedicalMask result;
The result enum field MedicalMask contains the target result of the estimation.
- The second group contains scores:
float maskScore; //!< medical mask is on the face score
float noMaskScore; //!< no medical mask on the face score
float occludedFaceScore; //!< face is occluded by something score
The scores group contains the estimation scores for each possible result of the estimation. All scores are defined in [0,1] range. They can be useful for users who want to change the default thresholds for this estimator. If the default thresholds are used, the group with scores could be just ignored in the user code.
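The per-score threshold rule can be sketched as follows; the single 0.5 default is a placeholder for the configured per-result thresholds, and the fallback to the highest score is an assumption for illustration:

```cpp
enum class MedicalMask {
    Mask = 0,     //!< medical mask is on the face
    NoMask,       //!< no medical mask on the face
    OccludedFace  //!< face is occluded by something
};

// Sketch of the threshold rule: a result whose score clears its threshold
// is chosen; otherwise fall back to the highest score. The 0.5 default and
// the fallback are illustrative assumptions, not the SDK's exact logic.
MedicalMask deriveResult(float maskScore, float noMaskScore,
                         float occludedFaceScore, float threshold = 0.5f) {
    if (maskScore > threshold) return MedicalMask::Mask;
    if (noMaskScore > threshold) return MedicalMask::NoMask;
    if (occludedFaceScore > threshold) return MedicalMask::OccludedFace;
    if (maskScore >= noMaskScore && maskScore >= occludedFaceScore)
        return MedicalMask::Mask;
    return noMaskScore >= occludedFaceScore ? MedicalMask::NoMask
                                            : MedicalMask::OccludedFace;
}
```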
MedicalMaskEstimationExtended structure#
The MedicalMaskEstimationExtended structure contains results of the estimation:
struct MedicalMaskEstimationExtended {
MedicalMaskExtended result; //!< estimation result (@see MedicalMaskExtended enum)
// scores
float maskScore; //!< medical mask is on the face score
float noMaskScore; //!< no medical mask on the face score
float maskNotInPlace; //!< mask is not on the right place
float occludedFaceScore; //!< face is occluded by something score
};
There are two groups of the fields:
- The first group contains only the result enum:
MedicalMaskExtended result;
Result enum field MedicalMaskEstimationExtended contain the target results of the estimation.
- The second group contains scores:
float maskScore; //!< medical mask is on the face score
float noMaskScore; //!< no medical mask on the face score
float maskNotInPlace; //!< mask is not on the right place
float occludedFaceScore; //!< face is occluded by something score
The scores group contains the estimation scores for each possible result of the estimation. All scores are defined in [0,1] range. They can be useful for users who want to change the default thresholds for this estimator. If the default thresholds are used, the group with scores could be just ignored in the user code.
Note. The estimator is trained to work with face images that meet the following requirements:
"Requirements for fsdk::BestShotQualityEstimator::EstimationResult
"
Attribute | Acceptable values |
---|---|
headPose.pitch | [-40...40] |
headPose.yaw | [-40...40] |
headPose.roll | [-40...40] |
ags | [0.5...1.0] |