Parameter Estimation Facility#

Overview#

The estimation facility is the only multi-purpose facility in FaceEngine. It is designed as a collection of tools that help to estimate various properties of images or of the objects they depict. These properties may be used to increase the precision of algorithms implemented by other FaceEngine facilities or to accomplish custom user tasks.

Best shot selection functionality#

Eyes Estimation#

Name: EyeEstimator

Algorithm description:

The estimator is trained to work with warped images (see chapter "Image warping" for details).

This estimator aims to determine:

  • Eye state: Open, Closed, Occluded;
  • Precise eye iris location as an array of landmarks;
  • Precise eyelid location as an array of landmarks.

You can pass only a warped image with a detected face to the estimator interface. Better image quality leads to better results.

Eye state classifier supports three categories: "Open", "Closed", "Occluded". Poor quality images or ones that depict obscured eyes (think eyewear, hair, gestures) fall into the "Occluded" category. It is always a good idea to check eye state before using the segmentation result.

The precise location allows iris and eyelid segmentation. The estimator can output iris and eyelid shapes as arrays of points that together form an ellipse. You should only use segmentation results if the state of that eye is "Open".

Implementation description:

The estimator:

  • Implements the estimate() function that accepts a warped source image and warped landmarks of type Landmarks5 or Landmarks68. The warped image and landmarks are received from the warper (see IWarper::warp());

  • Classifies the state of each eye and detects its iris and eyelid landmarks;

  • Outputs EyesEstimation structures.

Orientation terms 'left' and 'right' refer to the way you see the image as it is shown on the screen. It means that left eye is not necessarily left from the person's point of view, but is on the left side of the screen. Consequently, right eye is the one on the right side of the screen. More formally, the label 'left' refers to subject's left eye (and similarly for the right eye), such that xright < xleft.

EyesEstimation::EyeAttributes presents eye state as enum EyeState with possible values: Open, Closed, Occluded.

Iris landmarks are presented with a template structure Landmarks that is specialized for 32 points.

Eyelid landmarks are presented with a template structure Landmarks that is specialized for 6 points.
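
The advice to check the eye state before using segmentation results can be sketched as follows. This is a minimal illustration, not the SDK code: Point, Landmarks, EyeAttributes and irisCenter() are hypothetical stand-ins written for this example, and the real FaceEngine types may be laid out differently.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-ins for the SDK types described above; the real
// definitions live in the FaceEngine headers and may differ.
struct Point {
    float x;
    float y;
};

template <std::size_t N>
struct Landmarks {
    std::array<Point, N> points;
};

enum class EyeState { Open, Closed, Occluded };

struct EyeAttributes {
    EyeState state;
    Landmarks<32> iris;   // iris contour, 32 points
    Landmarks<6> eyelid;  // eyelid contour, 6 points
};

// Writes the mean of the iris landmarks to `center` and returns true,
// but only when the eye is Open. For Closed or Occluded eyes the
// function returns false and leaves `center` untouched.
bool irisCenter(const EyeAttributes& eye, Point& center) {
    if (eye.state != EyeState::Open) {
        return false;
    }
    float sumX = 0.0f;
    float sumY = 0.0f;
    for (const Point& p : eye.iris.points) {
        sumX += p.x;
        sumY += p.y;
    }
    const float n = static_cast<float>(eye.iris.points.size());
    center = Point{sumX / n, sumY / n};
    return true;
}
```

A caller would run the estimator first and then call irisCenter() for each eye, skipping any eye for which it returns false.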

API structure name:

IEyeEstimator

Plan files:

  • eyes_estimation_flwr8_cpu.plan
  • eyes_estimation_ir_cpu.plan
  • eyes_estimation_flwr8_cpu-avx2.plan
  • eyes_estimation_ir_cpu-avx2.plan
  • eyes_estimation_ir_gpu.plan
  • eyes_estimation_flwr8_gpu.plan
  • eye_status_estimation_flwr_cpu.plan
  • eye_status_estimation_flwr_cpu-avx2.plan
  • eye_status_estimation_flwr_gpu.plan

BestShotQuality Estimation#

Name: BestShotQualityEstimator

Algorithm description:

The BestShotQuality estimator is designed to evaluate image quality to choose the best image before descriptor extraction. The BestShotQuality estimator consists of two components - AGS (garbage score) and Head Pose.

AGS aims to determine the source image score for further descriptor extraction and matching.

Estimation output is a float score normalized in the range [0..1]. The closer the score is to 1, the better the matching result for the image.

When you have several images of a person, it is better to save the image with the highest AGS score.

The recommended threshold for the AGS score is 0.2, but it can be changed depending on the purpose of use. Consult VisionLabs about the recommended threshold value for this parameter.
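
Choosing the image with the highest AGS score can be sketched as below. The sketch assumes that the AGS scores of the candidate frames have already been collected into a vector and that frames scoring below the threshold are rejected; selectBestShot() is a hypothetical helper, not an SDK function.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the frame with the highest AGS score that also
// reaches `threshold`, or -1 when no frame qualifies. On ties the
// earliest frame wins. The default of 0.2 is the recommended value
// quoted above; rejecting frames below it is an assumption of this sketch.
int selectBestShot(const std::vector<float>& agsScores, float threshold = 0.2f) {
    int best = -1;
    for (std::size_t i = 0; i < agsScores.size(); ++i) {
        if (agsScores[i] < threshold) {
            continue;  // below the threshold: not a best shot candidate
        }
        if (best < 0 || agsScores[i] > agsScores[static_cast<std::size_t>(best)]) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```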

Head Pose determines person head rotation angles in 3D space, namely pitch, yaw and roll.

Head pose

Since 3D head translation is hard to determine reliably without camera-specific calibration, only 3D rotation component is estimated.

Head pose estimation characteristics:

  • Units (degrees);
  • Notation (Euler angles);
  • Precision (see table below).

Implementation description:

The estimator (see IBestShotQualityEstimator in IEstimator.h):

  • Implements the estimate() function that needs fsdk::Image in R8G8B8 format, fsdk::Detection structure of corresponding source image (see section "Detection structure" in chapter "Face detection facility"), fsdk::IBestShotQualityEstimator::EstimationRequest structure and fsdk::IBestShotQualityEstimator::EstimationResult to store estimation result;

  • Implements the estimate() function that needs the span of fsdk::Image in R8G8B8 format, the span of fsdk::Detection structures of corresponding source images (see section "Detection structure" in chapter "Face detection facility"), fsdk::IBestShotQualityEstimator::EstimationRequest structure and span of fsdk::IBestShotQualityEstimator::EstimationResult to store estimation results.

  • Implements the estimateAsync() function that needs fsdk::Image in R8G8B8 format, fsdk::Detection structure of corresponding source image (see section "Detection structure" in chapter "Face detection facility"), and fsdk::IBestShotQualityEstimator::EstimationRequest structure.

Note: Method estimateAsync() is experimental, and its interface may be changed in the future.

Note: Method estimateAsync() is not marked as noexcept and may throw an exception.

Before using this estimator, the user can decide which of the listed attributes to estimate. For this purpose, the estimate() method takes one of the estimation requests:

  • fsdk::IBestShotQualityEstimator::EstimationRequest::estimateAGS to make only AGS estimation;
  • fsdk::IBestShotQualityEstimator::EstimationRequest::estimateHeadPose to make only Head Pose estimation;
  • fsdk::IBestShotQualityEstimator::EstimationRequest::estimateAll to make both AGS and Head Pose estimations.

Head Pose accuracy:

Prediction precision decreases as a rotation angle increases. We present typical average errors for different angle ranges in the table below.

"Head pose prediction precision"

Range                                        -45°...+45°    < -45° or > +45°
Average prediction error (per axis), Yaw     ±2.7°          ±4.6°
Average prediction error (per axis), Pitch   ±3.0°          ±4.8°
Average prediction error (per axis), Roll    ±3.0°          ±4.6°

Zero position corresponds to a face placed orthogonally to camera direction, with the axis of symmetry parallel to the vertical camera axis.
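
If an application needs to reason about how much to trust an estimated angle, the precision table above can be encoded as a small lookup. The Axis enum and typicalError() below are hypothetical helpers for illustration; the numbers are the typical average errors from the table and are not guarantees for any single image.

```cpp
#include <cmath>

enum class Axis { Yaw, Pitch, Roll };

// Returns the typical average prediction error (in degrees) for the
// given axis, depending on whether the estimated angle lies within
// -45...+45 degrees or outside of that range.
float typicalError(Axis axis, float angleDegrees) {
    const bool inner = std::fabs(angleDegrees) <= 45.0f;
    switch (axis) {
        case Axis::Yaw:
            return inner ? 2.7f : 4.6f;
        case Axis::Pitch:
            return inner ? 3.0f : 4.8f;
        case Axis::Roll:
            return inner ? 3.0f : 4.6f;
    }
    return 0.0f;  // not reached for valid Axis values
}
```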

API structure name:

IBestShotQualityEstimator

Plan files:

  • ags_angle_estimation_flwr_cpu.plan
  • ags_angle_estimation_flwr_cpu-avx2.plan
  • ags_angle_estimation_flwr_gpu.plan

LivenessOneShotRGB Estimation#

Name: LivenessOneShotRGBEstimator

Algorithm description:

This estimator shows whether the person's face is real or fake (photo, printed image).

The requirements for the processed image and the face in the image are listed below.

This estimator supports images taken on mobile devices or webcams (PC or laptop). Image resolution minimum requirements:

  • Mobile devices - 720 × 960 px
  • Webcam (PC or laptop) - 1280 × 720 px

There should be only one face in the image. An error occurs when there are two or more faces in the image.

The minimum face detection size must be 200 pixels.

Yaw, pitch, and roll angles should be no more than 25 degrees in either direction.

The minimum indent between the face and the image borders should be 10 pixels.

Implementation description:

The estimator (see ILivenessOneShotRGBEstimator in ILivenessOneShotRGBEstimator.h):

  • Implements the estimate() function that needs fsdk::Image and fsdk::Face with valid image in R8G8B8 format and detection structure of corresponding source image (see section "Detection structure" in chapter "Face detection facility"). Output estimation is a structure fsdk::LivenessOneShotRGBEstimation.

  • Implements the estimate() function that needs the span of fsdk::Image and span of fsdk::Face with valid image in R8G8B8 format and detection structure of corresponding source image (see section "Detection structure" in chapter "Face detection facility"). The first output estimation is a span of fsdk::LivenessOneShotRGBEstimation structures. The second output value (an fsdk::LivenessOneShotRGBEstimation structure) is the result of aggregation over that span of estimations. Note that the second output value (aggregation) is optional: it is a default argument whose default value is nullptr.

The LivenessOneShotRGBEstimation structure contains results of the estimation:

struct LivenessOneShotRGBEstimation {
    enum class State {
        Alive = 0,   //!< The person on image is real
        Fake,        //!< The person on image is fake (photo, printed image)
        Unknown      //!< The liveness status of person on image is Unknown
    };

    float score;        //!< Estimation score
    State state;        //!< Liveness status
    float qualityScore; //!< Liveness quality score
};

Estimation score is normalized in the range [0..1], where 1 corresponds to a real person and 0 to a fake.

Liveness quality score is an image quality estimation for the liveness recognition.

This parameter is used to filter out images that are unsuitable for selecting a best shot during the liveness check.

The reference score is 0.5.

The value of State depends on score and qualityThreshold. The qualityThreshold value can be passed as an argument of the estimate() method (see ILivenessOneShotRGBEstimator) or set in the faceengine.conf configuration file (see ConfigurationGuide LivenessOneShotRGBEstimator).

Recommended thresholds: 

The table below contains thresholds from the FaceEngine configuration file (faceengine.conf) in the LivenessOneShotRGBEstimator::Settings section. By default, these threshold values are set to optimal.

"LivenessOneShotRGB estimator recommended thresholds"

Threshold Recommended value
realThreshold 0.5
qualityThreshold 0.5
calibrationCoeff 0.94
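
The text does not spell out the exact rule that maps the scores to State, so the sketch below is only one plausible reading: the result is Unknown when the quality score is below qualityThreshold, and otherwise the score is compared with realThreshold. LivenessState, LivenessThresholds and classify() are hypothetical names for this illustration, and calibrationCoeff is deliberately not modelled.

```cpp
// Local mirror of the State enum shown above.
enum class LivenessState { Alive = 0, Fake, Unknown };

// Recommended values from the table above.
struct LivenessThresholds {
    float realThreshold = 0.5f;
    float qualityThreshold = 0.5f;
};

// ASSUMPTION: this decision rule is an illustration only and is not
// taken from the SDK. Low-quality images yield Unknown; otherwise the
// liveness score decides between Alive and Fake.
LivenessState classify(float score, float qualityScore,
                       const LivenessThresholds& t = LivenessThresholds()) {
    if (qualityScore < t.qualityThreshold) {
        return LivenessState::Unknown;
    }
    return score >= t.realThreshold ? LivenessState::Alive : LivenessState::Fake;
}
```

In real code the State field filled in by the estimator should be used directly; a helper like this is only useful for experimenting with alternative thresholds on stored scores.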

Configurations:

See the "LivenessOneShotRGBEstimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

ILivenessOneShotRGBEstimator

Plan files:

  • oneshot_rgb_liveness_model_1_cpu.plan
  • oneshot_rgb_liveness_model_2_cpu.plan
  • oneshot_rgb_liveness_model_3_cpu.plan
  • oneshot_rgb_liveness_model_4_cpu.plan
  • oneshot_rgb_liveness_model_5_cpu.plan
  • oneshot_rgb_liveness_model_6_cpu.plan
  • oneshot_rgb_liveness_model_7_cpu.plan
  • oneshot_rgb_liveness_model_1_cpu-avx2.plan
  • oneshot_rgb_liveness_model_2_cpu-avx2.plan
  • oneshot_rgb_liveness_model_3_cpu-avx2.plan
  • oneshot_rgb_liveness_model_4_cpu-avx2.plan
  • oneshot_rgb_liveness_model_5_cpu-avx2.plan
  • oneshot_rgb_liveness_model_6_cpu-avx2.plan
  • oneshot_rgb_liveness_model_7_cpu-avx2.plan
  • oneshot_rgb_liveness_model_1_gpu.plan
  • oneshot_rgb_liveness_model_2_gpu.plan
  • oneshot_rgb_liveness_model_3_gpu.plan
  • oneshot_rgb_liveness_model_4_gpu.plan
  • oneshot_rgb_liveness_model_5_gpu.plan
  • oneshot_rgb_liveness_model_6_gpu.plan
  • oneshot_rgb_liveness_model_7_gpu.plan

Usage example#

The face in the image and the image itself should meet the estimator requirements.

You can find additional information in example (examples/example_estimation/main.cpp) or in the code below.

// Minimum detection size in pixels.
constexpr int minDetSize = 200;

// Step back from the borders.
constexpr int borderDistance = 10;

if (std::min(detectionRect.width, detectionRect.height) < minDetSize) {
    std::cerr << "Bounding Box width and/or height is less than `minDetSize` - " << minDetSize << std::endl;
    return false;
}

if ((detectionRect.x + detectionRect.width) > (image.getWidth() - borderDistance) || detectionRect.x < borderDistance) {
    std::cerr << "Bounding Box width is out of border distance - " << borderDistance << std::endl;
    return false;
}

if ((detectionRect.y + detectionRect.height) > (image.getHeight() - borderDistance) || detectionRect.y < borderDistance) {
    std::cerr << "Bounding Box height is out of border distance - " << borderDistance << std::endl;
    return false;
}

// Yaw, pitch and roll.
constexpr int principalAxes = 25;

if (std::abs(headPose.pitch) > principalAxes ||
    std::abs(headPose.yaw) > principalAxes ||
    std::abs(headPose.roll) > principalAxes ) {

    std::cerr << "Can't estimate LivenessOneShotRGBEstimation. " <<
        "Yaw, pitch or roll absolute value is larger than expected value: " << principalAxes << "." <<
        "\nPitch angle estimation: " << headPose.pitch <<
        "\nYaw angle estimation: " << headPose.yaw <<
        "\nRoll angle estimation: " << headPose.roll << std::endl;
    return false;
}
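
The checks above are a fragment of a larger function. A self-contained variant could look like the sketch below; Rect, HeadPose, PreCheck and checkLivenessInput() are hypothetical stand-ins introduced for this example, whereas real code would use the detection rectangle, image size and head pose estimation provided by the SDK.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical stand-ins for the SDK rectangle and head pose types.
struct Rect {
    int x;
    int y;
    int width;
    int height;
};

struct HeadPose {
    float pitch;
    float yaw;
    float roll;
};

enum class PreCheck { Ok, FaceTooSmall, TooCloseToBorder, AngleTooLarge };

// Applies the LivenessOneShotRGB input requirements listed earlier:
// detection size of at least 200 px, at least 10 px between the face
// and the image borders, and angles within 25 degrees.
PreCheck checkLivenessInput(const Rect& det, int imageWidth, int imageHeight,
                            const HeadPose& pose) {
    constexpr int minDetSize = 200;
    constexpr int borderDistance = 10;
    constexpr float maxAngle = 25.0f;

    if (std::min(det.width, det.height) < minDetSize) {
        return PreCheck::FaceTooSmall;
    }
    const bool outX = det.x < borderDistance ||
                      det.x + det.width > imageWidth - borderDistance;
    const bool outY = det.y < borderDistance ||
                      det.y + det.height > imageHeight - borderDistance;
    if (outX || outY) {
        return PreCheck::TooCloseToBorder;
    }
    if (std::fabs(pose.pitch) > maxAngle || std::fabs(pose.yaw) > maxAngle ||
        std::fabs(pose.roll) > maxAngle) {
        return PreCheck::AngleTooLarge;
    }
    return PreCheck::Ok;
}
```

Returning an enum instead of printing to std::cerr lets the caller decide how to report each rejection reason.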

We recommend using Detector type 3 (fsdk::ObjectDetectorClassType::FACE_DET_V3).

Image Quality Estimation#

Name: QualityEstimator

Algorithm description:

The estimator is trained to work with warped images (see chapter "Image warping" for details).

This estimator is designed to determine the image quality. You can estimate the image according to the following criteria:

  • The image is blurred;
  • The image is underexposed (i.e., too dark);
  • The image is overexposed (i.e., too light);
  • The face in the image is illuminated unevenly (there is a great difference between light and dark regions);
  • Image contains flares on face (too specular).

Examples are presented in the images below. Good quality images are shown on the right.

Blurred image (left), not blurred image (right)
Dark image (left), good quality image (right)
Light image (left), good quality image (right)
Image with uneven illumination (left), image with even illumination (right)
Image with specularity - image contains flares on face (left), good quality image (right)

Implementation description:

The general rule of thumb for quality estimation:

  1. Detect a face, see if detection confidence is high enough. If not, reject the detection;
  2. Produce a warped face image (see chapter "Descriptor processing facility") using a face detection and its landmarks;
  3. Estimate visual quality using the estimator, finally reject low-quality images.

While the scheme above might seem a bit complicated, it is the most efficient performance-wise, since possible rejections on each step reduce workload for the next step.

At the moment, the estimator exposes two interface functions to predict image quality:

  • virtual Result estimate(const Image& warp, Quality& quality);
  • virtual Result estimate(const Image& warp, SubjectiveQuality& quality);

Each of these functions uses its own CNN internally and returns slightly different quality criteria.

The first CNN is trained specifically on pre-warped human face images and will produce lower score factors if one of the following conditions is satisfied:

  • Image is blurred;
  • Image is underexposed (i.e., too dark);
  • Image is overexposed (i.e., too light);
  • Image color variation is low (i.e., image is monochrome or close to monochrome).

Each of these score factors is defined in the [0..1] range, where a higher value corresponds to better image quality and vice versa.

The second interface function will produce a lower factor if:

  • The image is blurred;
  • The image is underexposed (i.e., too dark);
  • The image is overexposed (i.e., too light);
  • The face in the image is illuminated unevenly (there is a great difference between light and dark regions);
  • Image contains flares on face (too specular).

The estimator determines the quality of the image based on each of the aforementioned parameters. For each parameter, the estimator function returns two values: the quality factor and the resulting verdict.

As with the first estimator function, the second one also returns the quality factors in the range [0..1], where 0 corresponds to low image quality and 1 to high image quality. E.g., the estimator returns a low quality factor for the Blur parameter if the image is too blurry.

The resulting verdict is a quality output based on the estimated parameter. E.g., if the image is too blurry, the estimator returns “isBlurred = true”.

The threshold (see below) can be specified for each of the estimated parameters. The resulting verdict and the quality factor are linked through this threshold. If the received quality factor is lower than the threshold, the image quality is low and the estimator returns “true”. E.g., if the image blur quality factor is higher than the threshold, the resulting verdict is “false”.

If the estimated value for any of the parameters is lower than the corresponding threshold, the image is considered of bad quality. If the resulting verdicts for all the parameters are "false", the quality of the image is considered good.

The quality factor is a value in the range [0..1] where 0 corresponds to low quality and 1 to high quality.

Illumination uniformity corresponds to the face illumination in the image. The lower the difference between light and dark zones of the face, the higher the estimated value. When the illumination is evenly distributed throughout the face, the value is close to "1".

Specularity is the ability of the face to reflect light. The higher the estimated value, the lower the specularity and the better the image quality. If the estimated value is low, there are bright glares on the face.

Recommended thresholds: 

The table below contains thresholds from the FaceEngine configuration file (faceengine.conf) in the QualityEstimator::Settings section. By default, these threshold values are set to optimal.

"Image quality estimator recommended thresholds"

Threshold Recommended value
blurThreshold 0.61
darknessThreshold 0.50
lightThreshold 0.57
illuminationThreshold 0.1
specularityThreshold 0.1
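
The link between quality factors, thresholds and verdicts can be sketched as follows. QualityFactors, QualityThresholds, QualityVerdict and judge() are hypothetical names for this illustration (only isBlurred appears in the text above), and the default threshold values are the recommended ones from the table.

```cpp
// Hypothetical container for the five quality factors, each in [0..1].
struct QualityFactors {
    float blur;
    float darkness;
    float light;
    float illumination;
    float specularity;
};

// Recommended thresholds from the table above.
struct QualityThresholds {
    float blurThreshold = 0.61f;
    float darknessThreshold = 0.50f;
    float lightThreshold = 0.57f;
    float illuminationThreshold = 0.1f;
    float specularityThreshold = 0.1f;
};

// A verdict is true when the matching factor is below its threshold,
// i.e. when the image is bad with respect to that parameter.
struct QualityVerdict {
    bool isBlurred;
    bool isDark;
    bool isLight;
    bool isUnevenlyLit;
    bool hasFlares;

    bool isGood() const {
        return !(isBlurred || isDark || isLight || isUnevenlyLit || hasFlares);
    }
};

QualityVerdict judge(const QualityFactors& f,
                     const QualityThresholds& t = QualityThresholds()) {
    QualityVerdict v;
    v.isBlurred = f.blur < t.blurThreshold;
    v.isDark = f.darkness < t.darknessThreshold;
    v.isLight = f.light < t.lightThreshold;
    v.isUnevenlyLit = f.illumination < t.illuminationThreshold;
    v.hasFlares = f.specularity < t.specularityThreshold;
    return v;
}
```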

The most important parameters for face recognition are "blurThreshold", "darknessThreshold" and "lightThreshold", so you should select them carefully.

You can select images of better visual quality by setting higher values of the "illuminationThreshold" and "specularityThreshold". Face recognition is not greatly affected by uneven illumination or glares.

Configurations:

See the "Quality estimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

IQualityEstimator

Plan files:

  • model_subjective_quality_v2_cpu.plan
  • model_subjective_quality_v2_cpu-avx2.plan
  • model_subjective_quality_v2_gpu.plan

Medical Mask Estimation Functionality#

Name: MedicalMaskEstimator

This estimator aims to detect a medical mask on the face in the source image. For the interface with MedicalMaskEstimation, it can return the following results:

  • A medical mask is on the face (see MedicalMask::Mask field in the MedicalMask enum);
  • There is no medical mask on the face (see MedicalMask::NoMask field in the MedicalMask enum);
  • The face is occluded with something (see MedicalMask::OccludedFace field in the MedicalMask enum).

For the interface with MedicalMaskEstimationExtended, it can return the following results:

  • A medical mask is on the face (see MedicalMaskExtended::Mask field in the MedicalMaskExtended enum);
  • There is no medical mask on the face (see MedicalMaskExtended::NoMask field in the MedicalMaskExtended enum);
  • A medical mask is not in the right place (see MedicalMaskExtended::MaskNotInPlace field in the MedicalMaskExtended enum);
  • The face is occluded with something (see MedicalMaskExtended::OccludedFace field in the MedicalMaskExtended enum).

The estimator (see IMedicalMaskEstimator in IEstimator.h):

  • Implements the estimate() function that accepts source warped image in R8G8B8 format and medical mask estimation structure to return results of estimation;
  • Implements the estimate() function that accepts source image in R8G8B8 format, face detection to estimate and medical mask estimation structure to return results of estimation;
  • Implements the estimate() function that accepts fsdk::Span of the source warped images in R8G8B8 format and fsdk::Span of the medical mask estimation structures to return results of estimation;
  • Implements the estimate() function that accepts fsdk::Span of the source images in R8G8B8 format, fsdk::Span of face detections and fsdk::Span of the medical mask estimation structures to return results of the estimation.

Every method can be used with MedicalMaskEstimation and MedicalMaskEstimationExtended.

The estimator was implemented for two use-cases:

  1. When the user already has warped images. For example, when the medical mask estimation is performed right before (or after) the face recognition;
  2. When the user has face detections only.

Note: Calling the estimate() method with warped image and the estimate() method with image and detection for the same image and the same face could lead to different results.

MedicalMaskEstimator thresholds#

The estimator returns several scores, one for each possible result. The final result is based on those scores and thresholds. If a score is above the corresponding threshold, that result is taken as final. If none of the scores exceeds its threshold, the result with the maximum score is taken. If several scores exceed their thresholds, the results take precedence in the following order for the case with MedicalMaskEstimation:

Mask, NoMask, OccludedFace

and for the case with MedicalMaskEstimationExtended:

Mask, NoMask, MaskNotInPlace, OccludedFace

The default values for all thresholds are taken from the configuration file. See Configuration guide for details.
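
The decision rule for the MedicalMaskEstimation case can be sketched as below. MaskScores, MaskThresholds and decide() are hypothetical names for this illustration; the actual default thresholds come from the configuration file and are not reproduced here.

```cpp
enum class MedicalMask { Mask = 0, NoMask, OccludedFace };

struct MaskScores {
    float mask;
    float noMask;
    float occludedFace;
};

// Thresholds are supplied by the caller; no default values are assumed.
struct MaskThresholds {
    float mask;
    float noMask;
    float occludedFace;
};

MedicalMask decide(const MaskScores& s, const MaskThresholds& t) {
    // Results whose score exceeds its threshold win, in the precedence
    // order Mask, NoMask, OccludedFace.
    if (s.mask > t.mask) {
        return MedicalMask::Mask;
    }
    if (s.noMask > t.noMask) {
        return MedicalMask::NoMask;
    }
    if (s.occludedFace > t.occludedFace) {
        return MedicalMask::OccludedFace;
    }
    // No score exceeded its threshold: fall back to the maximum score.
    if (s.mask >= s.noMask && s.mask >= s.occludedFace) {
        return MedicalMask::Mask;
    }
    return s.noMask >= s.occludedFace ? MedicalMask::NoMask
                                      : MedicalMask::OccludedFace;
}
```

The extended case would follow the same pattern with MaskNotInPlace inserted between NoMask and OccludedFace. How ties are broken in the fallback is a choice of this sketch, since the text does not specify it.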

MedicalMask enumeration#

The MedicalMask enumeration contains all possible results of the MedicalMask estimation:

    enum class MedicalMask {
        Mask = 0,                 //!< medical mask is on the face
        NoMask,                   //!< no medical mask on the face
        OccludedFace              //!< face is occluded by something
    };

    enum class DetailedMaskType {
        CorrectMask = 0,               //!< correct mask on the face (mouth and nose are covered correctly)
        MouthCoveredWithMask,          //!< mask covers only a mouth
        ClearFace,                     //!< clear face - no mask on the face
        ClearFaceWithMaskUnderChin,    //!< clear face with a mask around the chin, mask does not cover anything in the face region (from mouth to eyes)
        PartlyCoveredFace,             //!< face is covered with something that is not a medical mask or a full mask
        FullMask,                      //!< face is covered with a full mask (such as balaclava, ski mask, etc.)
        Count
    };

  • Mask corresponds to CorrectMask or MouthCoveredWithMask;
  • NoMask corresponds to ClearFace or ClearFaceWithMaskUnderChin;
  • OccludedFace corresponds to PartlyCoveredFace or FullMask.

Note: NoMask means absence of a medical mask or any occlusion in the face region (from mouth to eyes).

Note: DetailedMaskType is not supported for NPU-based platforms.

MedicalMaskEstimation structure#

The MedicalMaskEstimation structure contains results of the estimation:

    struct MedicalMaskEstimation {
        MedicalMask result;           //!< estimation result (@see MedicalMask enum)
        DetailedMaskType maskType;    //!< detailed type  (@see DetailedMaskType enum)

        // scores
        float maskScore;         //!< medical mask is on the face score
        float noMaskScore;       //!< no medical mask on the face score
        float occludedFaceScore; //!< face is occluded by something score

        float scores[static_cast<int>(DetailedMaskType::Count)]{};    //!< detailed estimation scores

        inline float getScore(DetailedMaskType type) const;
    };

There are two groups of the fields:

  1. The first group contains the result:
    MedicalMask result;

The result field of MedicalMaskEstimation contains the target result of the estimation. The more detailed type is also available in MedicalMaskEstimation:

    DetailedMaskType maskType;           //!< detailed type

  2. The second group contains scores:
    float maskScore;          //!< medical mask is on the face score
    float noMaskScore;        //!< no medical mask on the face score
    float occludedFaceScore;  //!< face is occluded by something score

The score group contains the estimation scores for each possible result of the estimation. All scores are defined in the [0,1] range. They can be useful for users who want to change the default thresholds for this estimator. If the default thresholds are used, the score group can simply be ignored in the user code. More detailed scores for each detailed type of face covering are stored in:

    float scores[static_cast<int>(DetailedMaskType::Count)]{};    //!< detailed estimation scores

  • maskScore is the sum of scores for CorrectMask and MouthCoveredWithMask;
  • noMaskScore is the sum of scores for ClearFace and ClearFaceWithMaskUnderChin;
  • occludedFaceScore is the sum of scores for PartlyCoveredFace and FullMask.
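
The relation between the detailed scores and the three aggregated scores can be written out as a short sketch. AggregatedScores, index() and aggregate() are hypothetical helpers for this illustration; the estimator fills the real fields itself, so application code normally does not need to recompute them.

```cpp
#include <array>
#include <cstddef>

// Local copy of the DetailedMaskType enum shown earlier.
enum class DetailedMaskType {
    CorrectMask = 0,
    MouthCoveredWithMask,
    ClearFace,
    ClearFaceWithMaskUnderChin,
    PartlyCoveredFace,
    FullMask,
    Count
};

constexpr std::size_t kTypeCount = static_cast<std::size_t>(DetailedMaskType::Count);

struct AggregatedScores {
    float maskScore;
    float noMaskScore;
    float occludedFaceScore;
};

inline std::size_t index(DetailedMaskType type) {
    return static_cast<std::size_t>(type);
}

// Sums the detailed scores pairwise into the three aggregated scores.
AggregatedScores aggregate(const std::array<float, kTypeCount>& scores) {
    AggregatedScores out;
    out.maskScore = scores[index(DetailedMaskType::CorrectMask)] +
                    scores[index(DetailedMaskType::MouthCoveredWithMask)];
    out.noMaskScore = scores[index(DetailedMaskType::ClearFace)] +
                      scores[index(DetailedMaskType::ClearFaceWithMaskUnderChin)];
    out.occludedFaceScore = scores[index(DetailedMaskType::PartlyCoveredFace)] +
                            scores[index(DetailedMaskType::FullMask)];
    return out;
}
```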

Note: DetailedMaskType, scores, and getScore are not supported for NPU-based platforms. This means that a user cannot use these fields and methods in code.

MedicalMaskExtended enumeration#

The MedicalMaskExtended enumeration contains all possible results of the MedicalMaskExtended estimation:

    enum class MedicalMaskExtended {
        Mask = 0,                 //!< medical mask is on the face
        NoMask,                   //!< no medical mask on the face
        MaskNotInPlace,           //!< mask is not on the right place
        OccludedFace              //!< face is occluded by something
    };

MedicalMaskEstimationExtended structure#

The MedicalMaskEstimationExtended structure contains results of the estimation:

    struct MedicalMaskEstimationExtended {
        MedicalMaskExtended result;     //!< estimation result (@see MedicalMaskExtended enum)
        // scores
        float maskScore;         //!< medical mask is on the face score
        float noMaskScore;       //!< no medical mask on the face score
        float maskNotInPlace;    //!< mask is not on the right place
        float occludedFaceScore; //!< face is occluded by something score
    };

There are two groups of the fields:

  1. The first group contains only the result enum:
        MedicalMaskExtended result;

The result field of MedicalMaskEstimationExtended contains the target result of the estimation.

  2. The second group contains scores:
        float maskScore;         //!< medical mask is on the face score
        float noMaskScore;       //!< no medical mask on the face score
        float maskNotInPlace;    //!< mask is not on the right place
        float occludedFaceScore; //!< face is occluded by something score

The score group contains the estimation scores for each possible result of the estimation. All scores are defined in [0,1] range.

Filtration parameters#

The estimator is trained to work with face images that meet the following requirements:

"Requirements for fsdk::MedicalMaskEstimator::EstimationResult"

Attribute Acceptable values
headPose.pitch [-40...40]
headPose.yaw [-40...40]
headPose.roll [-40...40]
ags [0.5...1.0]
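
A pre-filter based on the table above could look like the sketch below. It assumes that the head pose angles (in degrees) and the AGS score were obtained beforehand, for example from the BestShotQuality estimator; HeadPose and meetsMaskRequirements() are hypothetical names for this illustration.

```cpp
#include <cmath>

// Hypothetical stand-in for the head pose estimation result.
struct HeadPose {
    float pitch;
    float yaw;
    float roll;
};

// Returns true when the face meets the requirements listed above:
// every angle within [-40...40] degrees and AGS within [0.5...1.0].
bool meetsMaskRequirements(const HeadPose& pose, float ags) {
    const float maxAngle = 40.0f;
    const bool anglesOk = std::fabs(pose.pitch) <= maxAngle &&
                          std::fabs(pose.yaw) <= maxAngle &&
                          std::fabs(pose.roll) <= maxAngle;
    const bool agsOk = ags >= 0.5f && ags <= 1.0f;
    return anglesOk && agsOk;
}
```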

Configurations:

See the "Medical mask estimator settings" section in the "ConfigurationGuide.pdf" document.

API structure name:

IMedicalMaskEstimator

Plan files:

  • mask_clf_v3_cpu.plan
  • mask_clf_v3_cpu-avx2.plan
  • mask_clf_v3_gpu.plan

Glasses Estimation#

Name: GlassesEstimator

Algorithm description:

Glasses estimator is designed to determine whether a person is currently wearing any glasses or not. The estimator can currently return one of 3 states:

  • NoGlasses: the person is not wearing any glasses;
  • EyeGlasses: the person is wearing eyeglasses;
  • SunGlasses: the person is wearing sunglasses.

Note: The source input image must be warped for the estimator to work properly (see chapter "Image warping" for details). Quality of estimation depends on threshold values located in the faceengine configuration file (see below).

Recommended thresholds: 

The table below contains thresholds from the FaceEngine configuration file (faceengine.conf) in the GlassesEstimator::Settings section. By default, these threshold values are set to optimal.

"Glasses estimator recommended thresholds"

Threshold Recommended value
noGlassesThreshold 0.986
eyeGlassesThreshold 0.57
sunGlassesThreshold 0.506

Configurations:

See the "GlassesEstimator settings" section in the "ConfigurationGuide.pdf" document.

Metrics:

The table below contains true positive rates corresponding to selected false positive rates.

"Glasses estimator TPR/FPR rates"

State TPR FPR
NoGlasses 0.997 0.00234
EyeGlasses 0.9768 0.000783
SunGlasses 0.9712 0.000383

API structure name:

IGlassesEstimator

Plan files:

  • glasses_estimation_flwr_cpu.plan
  • glasses_estimation_flwr_cpu-avx2.plan
  • glasses_estimation_flwr_gpu.plan