Parameter Estimation Facility#
Overview#
The estimation facility is the only multi-purpose facility in FaceEngine. It is designed as a collection of tools that help to estimate various properties of images or depicted objects. These properties may be used to increase the precision of algorithms implemented by other FaceEngine facilities or to accomplish custom user tasks.
Best shot selection functionality#
BestShotQuality Estimation#
Name: BestShotQualityEstimator
Algorithm description:
The BestShotQuality estimator is designed to evaluate image quality to choose the best image before descriptor extraction. The BestShotQuality estimator consists of two components - AGS (garbage score) and Head Pose.
AGS aims to determine the source image score for further descriptor extraction and matching.
The estimation output is a float score normalized to the range [0..1]. The closer the score is to 1, the better the matching result for the image.
When you have several images of a person, it is better to save the image with the highest AGS score.
The recommended threshold for the AGS score is 0.2, but it can be changed depending on the purpose of use. Consult VisionLabs about the recommended threshold value for this parameter.
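The selection rule above can be sketched as follows. This is a minimal sketch, assuming the AGS scores have already been obtained from the estimator. The names `selectBestShot` and `kAgsThreshold` are hypothetical and not part of the FaceEngine API, and treating a score equal to the threshold as acceptable is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Recommended AGS threshold quoted in the text above.
constexpr float kAgsThreshold = 0.2f;

// Hypothetical helper, not part of FaceEngine: given AGS scores already
// estimated for several images of one person, returns the index of the
// image with the highest score, or -1 if no score reaches the threshold.
int selectBestShot(const std::vector<float>& agsScores,
                   float threshold = kAgsThreshold) {
    int best = -1;
    for (std::size_t i = 0; i < agsScores.size(); ++i) {
        if (agsScores[i] < threshold)
            continue; // image is rejected: score is below the threshold
        if (best < 0 || agsScores[i] > agsScores[static_cast<std::size_t>(best)])
            best = static_cast<int>(i);
    }
    return best;
}
```

Returning an index rather than a score lets the caller keep the matching image and detection for descriptor extraction.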
Head Pose determines the rotation angles of a person's head in 3D space, namely pitch, yaw and roll.
Since 3D head translation is hard to determine reliably without camera-specific calibration, only the 3D rotation component is estimated.
Head pose estimation characteristics:
- Units (degrees);
- Notation (Euler angles);
- Precision (see table below).
Implementation description:
The estimator (see IBestShotQualityEstimator in IEstimator.h):
- Implements the estimate() function that needs fsdk::Image in R8G8B8 format, the fsdk::Detection structure of the corresponding source image (see section "Detection structure" in chapter "Face detection facility"), the fsdk::IBestShotQualityEstimator::EstimationRequest structure, and fsdk::IBestShotQualityEstimator::EstimationResult to store the estimation result;
- Implements the estimate() function that needs the span of fsdk::Image in R8G8B8 format, the span of fsdk::Detection structures of the corresponding source images (see section "Detection structure" in chapter "Face detection facility"), the fsdk::IBestShotQualityEstimator::EstimationRequest structure, and the span of fsdk::IBestShotQualityEstimator::EstimationResult to store the estimation results;
- Implements the estimateAsync() function that needs fsdk::Image in R8G8B8 format, the fsdk::Detection structure of the corresponding source image (see section "Detection structure" in chapter "Face detection facility"), and the fsdk::IBestShotQualityEstimator::EstimationRequest structure.
Note: Method estimateAsync() is experimental, and its interface may be changed in the future.
Note: Method estimateAsync() is not marked as noexcept and may throw an exception.
Before using this estimator, you can decide which of the listed attributes to estimate. For this purpose, the estimate() method takes one of the estimation requests:
- fsdk::IBestShotQualityEstimator::EstimationRequest::estimateAGS to make only AGS estimation;
- fsdk::IBestShotQualityEstimator::EstimationRequest::estimateHeadPose to make only Head Pose estimation;
- fsdk::IBestShotQualityEstimator::EstimationRequest::estimateAll to make both AGS and Head Pose estimations.
The EstimationResult structure contains results of the estimation:
struct EstimationResult {
Optional<HeadPoseEstimation> headPose; //!< HeadPose estimation if was requested, empty otherwise
Optional<float> ags; //!< AGS estimation if was requested, empty otherwise
};
Head Pose accuracy:
Prediction precision decreases as a rotation angle increases. We present typical average errors for different angle ranges in the table below.
"Head pose prediction precision"
Range | | -45°...+45° | < -45° or > +45° |
---|---|---|---|
Average prediction error (per axis) | Yaw | ±2.7° | ±4.6° |
Average prediction error (per axis) | Pitch | ±3.0° | ±4.8° |
Average prediction error (per axis) | Roll | ±3.0° | ±4.6° |
Zero position corresponds to a face placed orthogonally to camera direction, with the axis of symmetry parallel to the vertical camera axis.
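The precision table above can be turned into a simple lookup, for example to decide how much to trust an estimated angle. This is a minimal sketch. `Axis` and `typicalErrorDegrees` are hypothetical names that are not part of FaceEngine, and treating exactly ±45° as part of the inner range is an assumption.

```cpp
#include <cmath>

// Hypothetical helper type, not part of FaceEngine.
enum class Axis { Yaw, Pitch, Roll };

// Returns the typical average prediction error (in degrees) for the given
// axis and estimated angle, using the values from the table above.
float typicalErrorDegrees(Axis axis, float angleDegrees) {
    // Assumption: the boundary values -45 and +45 belong to the inner range.
    const bool inner = std::fabs(angleDegrees) <= 45.0f;
    switch (axis) {
        case Axis::Yaw:   return inner ? 2.7f : 4.6f;
        case Axis::Pitch: return inner ? 3.0f : 4.8f;
        case Axis::Roll:  return inner ? 3.0f : 4.6f;
    }
    return 0.0f; // not reached for valid enumerators
}
```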
API structure name:
IBestShotQualityEstimator
Plan files:
For more information, see Approximate Garbage Score Estimation (AGS) and Head Pose Estimation.
Image Quality Estimation#
Name: QualityEstimator
Algorithm description:
The estimator is trained to work with warped images (see chapter "Image warping" for details).
This estimator is designed to determine the image quality. You can estimate the image according to the following criteria:
- The image is blurred;
- The image is underexposed (i.e., too dark);
- The image is overexposed (i.e., too light);
- The face in the image is illuminated unevenly (there is a great difference between light and dark regions);
- Image contains flares on face (too specular).
Examples are presented in the images below. Good quality images are shown on the right.
Implementation description:
The general rule of thumb for quality estimation:
- Detect a face, see if detection confidence is high enough. If not, reject the detection.
- Produce a warped face image (see chapter "Descriptor processing facility") using a face detection and its landmarks.
- Estimate visual quality using the estimator, finally reject low-quality images.
While the scheme above might seem a bit complicated, it is the most efficient performance-wise, since possible rejections on each step reduce workload for the next step.
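The early-rejection scheme above can be sketched as follows. This is a minimal sketch with stand-in types. `processCandidate`, `PipelineVerdict`, and the `estimateQuality` callback (standing for warping followed by quality estimation) are hypothetical and do not reproduce the FaceEngine API.

```cpp
#include <functional>

// Hypothetical outcome of the staged pipeline; not a FaceEngine type.
enum class PipelineVerdict { RejectedByDetection, RejectedByQuality, Accepted };

// Runs the stages in order and stops at the first rejection, so the more
// expensive later stage is skipped for weak detections.
// `estimateQuality` stands for "warp the face, then estimate its quality"
// and is only invoked when the detection passes the confidence check.
PipelineVerdict processCandidate(float detectionConfidence,
                                 float minConfidence,
                                 float minQuality,
                                 const std::function<float()>& estimateQuality) {
    if (detectionConfidence < minConfidence)
        return PipelineVerdict::RejectedByDetection; // no warp, no quality pass
    const float quality = estimateQuality();
    if (quality < minQuality)
        return PipelineVerdict::RejectedByQuality;
    return PipelineVerdict::Accepted;
}
```

Passing the later stage as a callback makes the saving explicit: the callback is never called for a rejected detection.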
At the moment, the estimator exposes two interface functions to predict image quality:
- virtual Result estimate(const Image& warp, Quality& quality);
- virtual Result estimate(const Image& warp, SubjectiveQuality& quality);
Each of these functions uses its own CNN internally and returns slightly different quality criteria.
The first CNN is trained specifically on pre-warped human face images and will produce lower score factors if one of the following conditions is satisfied:
- Image is blurred;
- Image is underexposed (i.e., too dark);
- Image is overexposed (i.e., too light);
- Image color variation is low (i.e., image is monochrome or close to monochrome).
Each of these score factors is defined in the [0..1] range, where a higher value corresponds to better image quality and vice versa.
The second interface function will produce a lower factor if:
- The image is blurred;
- The image is underexposed (i.e., too dark);
- The image is overexposed (i.e., too light);
- The face in the image is illuminated unevenly (there is a great difference between light and dark regions);
- Image contains flares on face (too specular).
The estimator determines the quality of the image based on each of the aforementioned parameters. For each parameter, the estimator function returns two values: the quality factor and the resulting verdict.
As with the first estimator function, the second one also returns the quality factors in the range [0..1], where 0 corresponds to low image quality and 1 to high image quality. For example, the estimator returns a low quality factor for the Blur parameter if the image is too blurry.
The resulting verdict is a quality output based on the estimated parameter. For example, if the image is too blurry, the estimator returns “isBlurred = true”.
A threshold (see below) can be specified for each of the estimated parameters. The resulting verdict and the quality factor are linked through this threshold. If the received quality factor is lower than the threshold, the image quality is low and the estimator returns “true”. Conversely, if the image blur quality factor is higher than the threshold, the resulting verdict is “false”.
If the estimated value for any of the parameters is lower than the corresponding threshold, the image is considered of bad quality. If resulting verdicts for all the parameters are set to "False" the quality of the image is considered good.
The quality factor is a value in the range [0..1] where 0 corresponds to low quality and 1 to high quality.
Illumination uniformity corresponds to the face illumination in the image. The lower the difference between light and dark zones of the face, the higher the estimated value. When the illumination is evenly distributed throughout the face, the value is close to "1".
Specularity is the ability of the face to reflect light. The higher the estimated value, the lower the specularity and the better the image quality. If the estimated value is low, there are bright glares on the face.
The Quality structure contains results of the estimation made by first CNN. Each estimation is given in normalized [0, 1] range:
struct Quality {
float light; //!< image overlighting degree. 1 - ok, 0 - overlighted.
float dark; //!< image darkness degree. 1 - ok, 0 - too dark.
float gray; //!< image grayness degree 1 - ok, 0 - too gray.
float blur; //!< image blur degree. 1 - ok, 0 - too blured.
inline float getQuality() const noexcept; //!< complex estimation of quality. 0 - low quality, 1 - high quality.
};
The SubjectiveQuality structure contains results of the estimation made by second CNN. Each estimation is given in normalized [0, 1] range:
struct SubjectiveQuality {
float blur; //!< image blur degree. 1 - ok, 0 - too blured.
float light; //!< image brightness degree. 1 - ok, 0 - too bright;
float darkness; //!< image darkness degree. 1 - ok, 0 - too dark;
float illumination; //!< image illumination uniformity degree. 1 - ok, 0 - is too illuminated;
float specularity; //!< image specularity degree. 1 - ok, 0 - is not specular;
bool isBlurred; //!< image is blurred flag;
bool isHighlighted; //!< image is overlighted flag;
bool isDark; //!< image is too dark flag;
bool isIlluminated; //!< image is too illuminated flag;
bool isNotSpecular; //!< image is not specular flag;
inline bool isGood() const noexcept; //!< if all boolean flags are false returns true - high quality, else false - low quality.
};
Recommended thresholds:
The table below contains thresholds from the FaceEngine configuration file (faceengine.conf) in the QualityEstimator::Settings section. By default, these threshold values are set to optimal.
"Image quality estimator recommended thresholds"
Threshold | Recommended value |
---|---|
blurThreshold | 0.61 |
darknessThreshold | 0.50 |
lightThreshold | 0.57 |
illuminationThreshold | 0.1 |
specularityThreshold | 0.1 |
The most important parameters for face recognition are "blurThreshold", "darknessThreshold" and "lightThreshold", so you should select them carefully.
You can select images of better visual quality by setting higher values of the "illuminationThreshold" and "specularityThreshold". Face recognition is not greatly affected by uneven illumination or glares.
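The link between quality factors, thresholds, and verdicts described in this section can be sketched as follows. This is a minimal sketch with stand-in structures. The field names follow the SubjectiveQuality structure and the thresholds table, but `QualityFactors`, `QualityThresholds`, `QualityVerdicts`, and `makeVerdicts` are hypothetical and are not the FaceEngine types.

```cpp
// Stand-in for the quality factors returned by the second CNN.
struct QualityFactors {
    float blur;
    float light;
    float darkness;
    float illumination;
    float specularity;
};

// Stand-in thresholds, initialized with the recommended values from the table.
struct QualityThresholds {
    float blurThreshold = 0.61f;
    float darknessThreshold = 0.50f;
    float lightThreshold = 0.57f;
    float illuminationThreshold = 0.1f;
    float specularityThreshold = 0.1f;
};

// Stand-in for the boolean verdict flags.
struct QualityVerdicts {
    bool isBlurred;
    bool isHighlighted;
    bool isDark;
    bool isIlluminated;
    bool isNotSpecular;
    // The image is considered good only when no flag is raised.
    bool isGood() const {
        return !(isBlurred || isHighlighted || isDark || isIlluminated || isNotSpecular);
    }
};

// A flag is raised when the quality factor is lower than its threshold.
QualityVerdicts makeVerdicts(const QualityFactors& f,
                             const QualityThresholds& t = QualityThresholds()) {
    QualityVerdicts v;
    v.isBlurred = f.blur < t.blurThreshold;
    v.isHighlighted = f.light < t.lightThreshold;
    v.isDark = f.darkness < t.darknessThreshold;
    v.isIlluminated = f.illumination < t.illuminationThreshold;
    v.isNotSpecular = f.specularity < t.specularityThreshold;
    return v;
}
```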
Configurations:
See the "Quality estimator settings" section in the "ConfigurationGuide.pdf" document.
API structure name:
IQualityEstimator
Plan files:
- model_subjective_quality_ _cpu.plan
- model_subjective_quality_ _cpu-avx2.plan
- model_subjective_quality_ _gpu.plan
Note: usePlanV1 toggles the Quality estimation, usePlanV2 toggles the SubjectiveQuality estimation. These parameters can enable or disable the corresponding functionality via the faceengine.conf configuration file.
<section name="QualityEstimator::Settings">
...
<param name="usePlanV1" type="Value::Int1" x="1" />
<param name="usePlanV2" type="Value::Int1" x="1" />
</section>
Note that you cannot disable both parameters at the same time. If you do, you will receive the fsdk::FSDKError::InvalidConfig error code and the following log message:
[27.06.2024 12:38:59] [Error] QualityEstimator::Settings Failed to create QualityEstimator! The both parameters: "usePlanV1" and "usePlanV2" in section "QualityEstimator::Settings" are disabled at the same time.
Face features extraction functionality#
Eyes Estimation#
Name: EyeEstimator
Algorithm description:
The estimator is trained to work with warped images (see chapter "Image warping" for details).
This estimator aims to determine:
- Eye state: Open, Closed, Occluded;
- Precise eye iris location as an array of landmarks;
- Precise eyelid location as an array of landmarks.
You can only pass a warped image with a detected face to the estimator interface. Better image quality leads to better results.
Eye state classifier supports three categories: "Open", "Closed", "Occluded". Poor quality images or ones that depict obscured eyes (think eyewear, hair, gestures) fall into the "Occluded" category. It is always a good idea to check eye state before using the segmentation result.
The precise location allows iris and eyelid segmentation. The estimator is capable of outputting iris and eyelid shapes as an array of points together forming an ellipse. You should only use segmentation results if the state of that eye is "Open".
Implementation description:
The estimator:
- Implements the estimate() function that accepts a warped source image and warped landmarks, either of type Landmarks5 or Landmarks68. The warped image and landmarks are received from the warper (see IWarper::warp());
- Classifies the state of each eye and detects its iris and eyelid landmarks;
- Outputs EyesEstimation structures.
Orientation terms 'left' and 'right' refer to the way you see the image as it is shown on the screen. It means that left eye is not necessarily left from the person's point of view, but is on the left side of the screen. Consequently, right eye is the one on the right side of the screen. More formally, the label 'left' refers to subject's left eye (and similarly for the right eye), such that x_right < x_left.
EyesEstimation::EyeAttributes presents the eye state as the enum State (see the structure below) with possible values: Open, Closed, Occluded.
Iris landmarks are presented with a template structure Landmarks that is specialized for 32 points.
Eyelid landmarks are presented with a template structure Landmarks that is specialized for 6 points.
The EyesEstimation structure contains results of the estimation:
struct EyesEstimation {
/**
* @brief Eyes attribute structure.
* */
struct EyeAttributes {
/**
* @brief Enumeration of possible eye states.
* */
enum class State : uint8_t {
Closed, //!< Eye is closed.
Open, //!< Eye is open.
Occluded //!< Eye is blocked by something not transparent, or landmark passed to estimator doesn't point to an eye.
};
static constexpr uint64_t irisLandmarksCount = 32; //!< Iris landmarks amount.
static constexpr uint64_t eyelidLandmarksCount = 6; //!< Eyelid landmarks amount.
/// @brief alias for @see Landmarks template structure with irisLandmarksCount as param.
using IrisLandmarks = Landmarks<irisLandmarksCount>;
/// @brief alias for @see Landmarks template structure with eyelidLandmarksCount as param
using EyelidLandmarks = Landmarks<eyelidLandmarksCount>;
State state; //!< State of an eye.
IrisLandmarks iris; //!< Iris landmarks.
EyelidLandmarks eyelid; //!< Eyelid landmarks
};
EyeAttributes leftEye; //!< Left eye attributes
EyeAttributes rightEye; //!< Right eye attributes
};
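The rule that segmentation results should only be used for open eyes can be sketched as follows. This is a minimal sketch with simplified stand-ins for the structures above, with landmarks omitted. `EyeStateSketch`, `EyeAttributesSketch`, `EyesEstimationSketch`, `canUseSegmentation`, and `bothEyesOpen` are hypothetical names and are not part of FaceEngine.

```cpp
#include <cstdint>

// Simplified stand-ins for EyesEstimation and its nested types;
// iris and eyelid landmarks are omitted for brevity.
enum class EyeStateSketch : std::uint8_t { Closed, Open, Occluded };

struct EyeAttributesSketch {
    EyeStateSketch state;
};

struct EyesEstimationSketch {
    EyeAttributesSketch leftEye;
    EyeAttributesSketch rightEye;
};

// Iris and eyelid landmarks are meaningful only for an open eye.
bool canUseSegmentation(const EyeAttributesSketch& eye) {
    return eye.state == EyeStateSketch::Open;
}

// Hypothetical convenience check: both eyes are open.
bool bothEyesOpen(const EyesEstimationSketch& e) {
    return canUseSegmentation(e.leftEye) && canUseSegmentation(e.rightEye);
}
```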
API structure name:
IEyeEstimator
Plan files:
- eyes_estimation_flwr8_cpu.plan
- eyes_estimation_ir_cpu.plan
- eyes_estimation_flwr8_cpu-avx2.plan
- eyes_estimation_ir_cpu-avx2.plan
- eyes_estimation_ir_gpu.plan
- eyes_estimation_flwr8_gpu.plan
- eye_status_estimation_cpu.plan
- eye_status_estimation_cpu-avx2.plan
- eye_status_estimation_gpu.plan
Head Pose Estimation#
This estimator is designed to determine a camera-space head pose. Since the 3D head translation is hard to reliably determine without a camera-specific calibration, only the 3D rotation component is estimated.
There are two head pose estimation methods available:
- Estimate by 68 face-aligned landmarks. You can get it from the Detector facility, see Chapter "Face detection facility" for details.
- Estimate by the original input image in the RGB format.
An estimation by the image is more precise. If you have already extracted 68 landmarks for other facilities, you can save time and use the fast estimator from 68 landmarks.
By default, all methods are available to use in the faceengine.conf configuration file in section "HeadPoseEstimator". You can disable these methods to decrease RAM usage and initialization time.
Estimation characteristics:
- Units (degrees)
- Notation (Euler angles)
- Precision (see the table below)
Note: Prediction precision decreases as a rotation angle increases. We present typical average errors for different angle ranges in the table below.
"Head pose prediction precision"
Range | | -45°...+45° | < -45° or > +45° |
---|---|---|---|
Average prediction error (per axis) | Yaw | ±2.7° | ±4.6° |
Average prediction error (per axis) | Pitch | ±3.0° | ±4.8° |
Average prediction error (per axis) | Roll | ±3.0° | ±4.6° |
Zero position corresponds to a face placed orthogonally to the camera direction, with the axis of symmetry parallel to the vertical camera axis. See the head pose figure for a reference.
Note: In order to work, this estimator requires precise 68-point face alignment results, so familiarize yourself with section "Face alignment" in the "Face detection facility" chapter as well.
Approximate Garbage Score Estimation (AGS)#
This estimator aims to determine the source image score for further descriptor extraction and matching. The higher the score, the better matching result is received for the image.
When you have several images of a person, it is better to save the image with the highest AGS score.
Contact VisionLabs for the recommended threshold value for this parameter.
The estimator (see IAGSEstimator in IEstimator.h):
- Implements the estimate() function that accepts the source image in the R8G8B8 format and the fsdk::Detection structure of the corresponding source image. For details, see section "Detection structure" in chapter "Face detection facility".
- Estimates garbage score of the input image.
- Outputs a garbage score value.
Glasses Estimation#
Name: GlassesEstimator
Algorithm description:
Glasses estimator is designed to determine whether a person is currently wearing any glasses or not. There are 3 types of states the estimator is currently able to estimate:
- NoGlasses - Determines whether a person is wearing any glasses at all.
- EyeGlasses - Determines whether a person is wearing eyeglasses.
- SunGlasses - Determines whether a person is wearing sunglasses.
Note: The source input image must be warped for the estimator to work properly (see chapter "Image warping" for details). Estimation quality depends on threshold values located in the faceengine.conf configuration file.
Implementation description:
Enumeration of possible glasses estimation statuses:
enum class GlassesEstimation: uint8_t{
NoGlasses, //!< Person is not wearing glasses
EyeGlasses, //!< Person is wearing eyeglasses
SunGlasses, //!< Person is wearing sunglasses
EstimationError //!< failed to estimate
};
Recommended thresholds:
The table below contains thresholds specified in the GlassesEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, these threshold values are set to optimal.
"Glasses estimator recommended thresholds"
Threshold | Recommended value |
---|---|
noGlassesThreshold | 1 |
eyeGlassesThreshold | 1 |
sunGlassesThreshold | 1 |
Configurations:
See the "GlassesEstimator settings" section in the "ConfigurationGuide.pdf" document.
Metrics:
The table below contains true positive rates corresponding to the selected false positive rates.
"Glasses estimator TPR/FPR rates"
State | TPR | FPR |
---|---|---|
NoGlasses | 0.997 | 0.00234 |
EyeGlasses | 0.9768 | 0.000783 |
SunGlasses | 0.9712 | 0.000383 |
API structure name:
IGlassesEstimator
Plan files:
- glasses_estimation_v2_cpu.plan
- glasses_estimation_v2_cpu-avx2.plan
- glasses_estimation_v2_gpu.plan
Liveness check functionality#
LivenessOneShotRGB Estimation#
Name: LivenessOneShotRGBEstimator
Algorithm description:
This estimator shows whether the person's face is real or fake. It protects against the following types of attacks:
- Printed Photo Attack. One or several photos of another person are used.
- Video Replay Attack. A video of another person is used.
- Printed Mask Attack. An imposter cuts out a face from a photo and covers his face with it.
- 3D Mask Attack. An imposter puts on a 3D mask depicting the face of another person.
The requirements for the processed image and the face in the image are listed below.
Parameters | Requirements |
---|---|
Minimum resolution for mobile devices | 720x960 pixels |
Maximum resolution for mobile devices | 1080x1920 pixels |
Minimum resolution for webcams | 1280x720 pixels |
Maximum resolution for webcams | 1920x1080 pixels |
Compression | No |
Image warping | No |
Image cropping | No |
Effects overlay | No |
Mask | No |
Number of faces in the frame | 1 |
Face detection bounding box width | More than 200 pixels |
Frame edges offset | More than 10 pixels |
Head pose | -20 to +20 degrees for head pitch, yaw, and roll |
Image quality | The face in the frame should not be overexposed, underexposed, or blurred. |
See image quality thresholds in the "Image Quality Estimation" section.
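Some of the geometric requirements from the table above can be checked before calling the estimator. This is a minimal sketch. `FrameInfo` and `meetsLivenessRequirements` are hypothetical and not part of FaceEngine, the frame edges offset is assumed to be the distance between the face bounding box and each frame edge, and resolution, compression, and image quality are not checked here.

```cpp
#include <cmath>

// Stand-in description of a frame and its detected face; not a FaceEngine type.
struct FrameInfo {
    int width;       // frame width in pixels
    int height;      // frame height in pixels
    int faceX;       // left coordinate of the face bounding box
    int faceY;       // top coordinate of the face bounding box
    int faceWidth;   // bounding box width in pixels
    int faceHeight;  // bounding box height in pixels
    float pitch;     // head pose angles in degrees
    float yaw;
    float roll;
    int faceCount;   // number of faces detected in the frame
};

bool meetsLivenessRequirements(const FrameInfo& f) {
    const int minBoxWidth = 200;   // "More than 200 pixels"
    const int minEdgeOffset = 10;  // "More than 10 pixels"
    const float maxAngle = 20.0f;  // "-20 to +20 degrees"

    const bool singleFace = f.faceCount == 1;
    const bool boxOk = f.faceWidth > minBoxWidth;
    // Assumption: the offset is measured from the bounding box to every edge.
    const bool edgesOk = f.faceX > minEdgeOffset &&
                         f.faceY > minEdgeOffset &&
                         f.width - (f.faceX + f.faceWidth) > minEdgeOffset &&
                         f.height - (f.faceY + f.faceHeight) > minEdgeOffset;
    const bool poseOk = std::fabs(f.pitch) <= maxAngle &&
                        std::fabs(f.yaw) <= maxAngle &&
                        std::fabs(f.roll) <= maxAngle;
    return singleFace && boxOk && edgesOk && poseOk;
}
```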
Implementation description:
The estimator (see ILivenessOneShotRGBEstimator in ILivenessOneShotRGBEstimator.h):
- Implements the estimate() function that needs fsdk::Image, fsdk::Detection and fsdk::Landmarks5 objects (see section "Detection structure" in chapter "Face detection facility"). The output estimation is the fsdk::LivenessOneShotRGBEstimation structure.
- Implements the estimate() function that needs the span of fsdk::Image, the span of fsdk::Detection and the span of fsdk::Landmarks5 (see section "Detection structure" in chapter "Face detection facility"). The first output estimation is a span of fsdk::LivenessOneShotRGBEstimation structures. The second output value (an fsdk::LivenessOneShotRGBEstimation structure) is the result of aggregation based on the span of estimations mentioned above. Note that the second output value (aggregation) is optional, i.e. it is a default argument, which is nullptr.
The LivenessOneShotRGBEstimation structure contains results of the estimation:
struct LivenessOneShotRGBEstimation {
enum class State {
Alive = 0, //!< The person on image is real
Fake, //!< The person on image is fake (photo, printed image)
Unknown //!< The liveness status of person on image is Unknown
};
float score; //!< Estimation score
State state; //!< Liveness status
float qualityScore; //!< Liveness quality score
};
Estimation score is normalized in range [0..1], where 1 - is real person, 0 - is fake.
Liveness quality score is an image quality estimation for the liveness recognition.
This parameter is used for filtering if it is possible to select a best shot when checking for liveness.
The reference score is 0.5.
The value of State depends on score and qualityThreshold.
The value of qualityThreshold can be given as an argument of the estimate method (see ILivenessOneShotRGBEstimator) and in the configuration file faceengine.conf (see ConfigurationGuide LivenessOneShotRGBEstimator).
Recommended thresholds:
The table below contains thresholds from the FaceEngine configuration file (faceengine.conf) in the LivenessOneShotRGBEstimator::Settings section. By default, these threshold values are set to optimal.
"LivenessOneShotRGB estimator recommended thresholds"
Threshold | Recommended value |
---|---|
realThreshold | 0.5 |
qualityThreshold | 0.5 |
calibrationCoeff | 0.89 |
calibrationCoeff | 0.991 |
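One plausible way the state could follow from the scores and thresholds is sketched below. The exact rule is not stated in this section, so the logic is an assumption: a quality score below qualityThreshold gives Unknown, otherwise a score at or above realThreshold gives Alive, else Fake. `LivenessStateSketch`, `LivenessThresholds`, and `classifyLiveness` are hypothetical names and are not part of FaceEngine.

```cpp
// Stand-in for LivenessOneShotRGBEstimation::State.
enum class LivenessStateSketch { Alive = 0, Fake, Unknown };

// Stand-in for the two thresholds from the table above.
struct LivenessThresholds {
    float realThreshold;
    float qualityThreshold;
};

// Assumed decision rule (not confirmed by the text above):
// low quality gives no verdict, otherwise the score is compared
// with realThreshold.
LivenessStateSketch classifyLiveness(float score, float qualityScore,
                                     const LivenessThresholds& t) {
    if (qualityScore < t.qualityThreshold)
        return LivenessStateSketch::Unknown;
    return score >= t.realThreshold ? LivenessStateSketch::Alive
                                    : LivenessStateSketch::Fake;
}
```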
Configurations:
See the "LivenessOneShotRGBEstimator settings" section in the "ConfigurationGuide.pdf" document.
API structure name:
ILivenessOneShotRGBEstimator
Plan files:
- oneshot_rgb_liveness_v8_model_3_cpu.plan
- oneshot_rgb_liveness_v8_model_4_cpu.plan
- oneshot_rgb_liveness_v8_model_3_arm.plan
- oneshot_rgb_liveness_v8_model_4_arm.plan
Depth and RGB OneShotLiveness estimation#
Name: LivenessDepthRGBEstimator
Algorithm description:
This estimator shows whether the person's face is real or fake (photo, printed image). You can use this estimator in payment terminals (POS) and self-service cash registers (KCO) with two cameras - Depth and RGB.
The estimation is performed on the device with an Orbbec camera. The camera can be either built into a POS or KCO device or connected to it. This allows the estimation to be performed at a higher speed and makes it more secure, as data is not sent to the backend. Using the algorithm with Orbbec cameras lets you work with depth data. It increases system reliability and accuracy, as 3D data lets you assess facial shapes and detect fake masks more accurately.
The estimator is trained to work with warped images. For details, see chapter "Image warping".
Supported devices
The estimator works only on the following devices:
- VLS LUNA CAMERA 3D
- VLS LUNA CAMERA 3D Embedded
Different models of Orbbec cameras have different spacing between sensors. If you need to use another Orbbec Depth+RGB camera, you can change the calibration coefficients to match the device. Please contact VisionLabs for details.
Image requirements
This estimator works based on two images:
- RGB image from the RGB camera
- Depth image (or depth map) from the depth camera
Input images must meet the following requirements:
Parameter | Requirements |
---|---|
Resolution | 640 × 480 pixels |
Compression | No |
Image cropping | No |
Image rotation | No |
Effects overlay | No |
Number of faces in the frame | 1 |
Face detection bounding box size | 200 pixels |
Frame edges offset | 10 pixels |
Head pose | -20 to +20 degrees for head pitch, yaw, and roll. |
Image quality | The face in the frame should not be overexposed, underexposed, or blurred. For details, see section "Image Quality Estimation". |
Implementation description:
The estimator implements the following:
- The estimate() function that needs the depth frame as the first fsdk::Image object, the RGB frame as the second fsdk::Image object, and fsdk::Detection and fsdk::Landmarks5 objects (see section "Detection structure" in chapter "Face detection facility"). The estimation output is the fsdk::DepthRGBEstimation structure.
- The estimate() function that needs the first span of depth frames as fsdk::Image objects, the second span of RGB frames as fsdk::Image objects, a span of fsdk::Detection, and a span of fsdk::Landmarks5 (see section "Detection structure" in chapter "Face detection facility"). The estimation output is a span of fsdk::DepthRGBEstimation structures. The second output value is the fsdk::DepthRGBEstimation structure.
DepthRGBEstimation
The DepthRGBEstimation structure contains results of the estimation:
struct DepthRGBEstimation {
//!< confidence score in [0,1] range.
//!< The closer the score to 1, the more likely that person is alive.
float score;
//!< boolean flag that indicates whether a person is real.
bool isReal;
};
The estimation score is normalized in range [0..1], where 1 - is real person, 0 - is a fake.
The value of isReal depends on score and confidenceThreshold.
The value of confidenceThreshold can be changed in the configuration file faceengine.conf (see ConfigurationGuide LivenessDepthRGBEstimator).
API structure name:
ILivenessDepthRGBEstimator
See ILivenessDepthRGBEstimator in ILivenessDepthRGBEstimator.h.
Plan files:
- depth_rgb_v2_model_1_cpu.plan
- depth_rgb_v2_model_1_gpu.plan
- depth_rgb_v2_model_2_cpu.plan
- depth_rgb_v2_model_2_gpu.plan
- depth_rgb_v2_model_1_cpu-avx2.plan
- depth_rgb_v2_model_2_cpu-avx2.plan
Depth liveness estimation (DepthLivenessEstimator)#
Name: DepthLivenessEstimator
Algorithm description:
Given a face depth warp, the estimator tells whether the face is real or fake (photo, printed image).
The estimator aims to unify different use cases of depth liveness estimation, while increasing the estimation accuracy compared to existing depth estimators.
The estimator can be used in payment terminals (POS) and self-service cash registers (KCO) with two cameras - Depth and RGB.
The estimator is trained to work with warped depth images of faces. For details, see chapter "Image warping".
The estimator can be used together with LivenessDepthRGBEstimator or as standalone. When DepthLivenessEstimator is used in conjunction with LivenessDepthRGBEstimator, the latter takes care of necessary preprocessing of RGB and depth frames, producing depth warps of faces required by DepthLivenessEstimator. When DepthLivenessEstimator is used as standalone, it is your responsibility to prepare a warped depth image of a face for estimation, including handling such issues as:
- detecting faces on RGB frames, quality checking of RGB frames and detections
- [possibly required] mapping between a) RGB frames used for face detection and b) depth frames
- obtaining depth warps of faces from depth frames
Supported devices
On its own, the estimator requires just a properly prepared depth warp of a face, and doesn't constrain the list of possible devices. However, if LivenessDepthRGBEstimator is involved, it has its own constraints.
Image requirements
The estimator works based on depth warps of faces. The warps must be 250x250 pixels, in the fsdk::Format::R16 format.
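When depth warps are prepared manually, the size and format requirement above can be verified before estimation. This is a minimal sketch with stand-in types. `PixelFormatSketch`, `ImageInfo`, and `isValidDepthWarp` are hypothetical and do not reproduce fsdk::Image or fsdk::Format.

```cpp
// Stand-in for the pixel formats relevant to this check.
enum class PixelFormatSketch { R8G8B8, R16, Other };

// Stand-in for the image properties relevant to this check.
struct ImageInfo {
    int width;
    int height;
    PixelFormatSketch format;
};

// A depth warp must be 250x250 pixels in the R16 format.
bool isValidDepthWarp(const ImageInfo& img) {
    return img.width == 250 && img.height == 250 &&
           img.format == PixelFormatSketch::R16;
}
```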
If you prepare depth warps yourself, there are some basic quality requirements for RGB frames:
Parameter | Requirements |
---|---|
Resolution | 640 × 480 pixels |
Compression | No |
Image cropping | No |
Image rotation | No |
Effects overlay | No |
Number of faces in the frame | 1 |
Face detection bounding box size | 200 pixels |
Frame edges offset | 10 pixels |
Head pose | -15 to +15 degrees for head pitch, yaw, and roll. |
Image quality | The face in the frame should not be overexposed, underexposed, or blurred. For details, see section "Image Quality Estimation". |
Implementation description:
The estimator (see IDepthLivenessEstimator.h) implements the following:
- The estimate() function that needs the depth warp as the first fsdk::Image object. The estimation output is the returned fsdk::DepthLivenessEstimation structure.
- The estimate() function that needs a span of depth warps (fsdk::Image objects) as the first parameter, and a span of fsdk::DepthLivenessEstimation as the second parameter. The estimation output is saved in the second parameter.
DepthLivenessEstimation
The DepthLivenessEstimation structure contains results of the estimation:
struct DepthLivenessEstimation {
//!< confidence score in [0,1] range.
//!< The closer the score to 1, the more likely that person is alive.
float score;
//!< boolean flag that indicates whether a person is real.
bool isReal;
};
The estimation score is normalized in the range [0..1], where 1 - is real person, 0 - is a fake.
The value of isReal depends on score and confidenceThreshold.
The value of confidenceThreshold can be changed in the configuration file faceengine.conf (see ConfigurationGuide DepthLivenessEstimator).
API structure name:
IDepthLivenessEstimator
See IDepthLivenessEstimator in IDepthLivenessEstimator.h.
Examples:
- C++ example: example_depth_liveness
- Python example: example_depth_liveness.py
Plan files:
- depth_liveness_v2_arm.plan
- depth_liveness_v2_cpu.plan
- depth_liveness_v2_cpu-avx2.plan
- depth_liveness_v2_gpu.plan
Medical Mask Estimation Functionality#
Name: MedicalMaskEstimator
This estimator aims to detect a medical mask on the face in the source image. For the interface with MedicalMaskEstimation, it can return the following results:
- A medical mask is on the face (see MedicalMask::Mask field in the MedicalMask enum);
- There is no medical mask on the face (see MedicalMask::NoMask field in the MedicalMask enum);
- The face is occluded with something (see MedicalMask::OccludedFace field in the MedicalMask enum);
For the interface with MedicalMaskEstimationExtended, it can return the following results:
- A medical mask is on the face (see MedicalMaskExtended::Mask field in the MedicalMaskExtended enum);
- There is no medical mask on the face (see MedicalMaskExtended::NoMask field in the MedicalMaskExtended enum);
- A medical mask is not on the right place (see MedicalMaskExtended::MaskNotInPlace field in the MedicalMaskExtended enum);
- The face is occluded with something (see MedicalMaskExtended::OccludedFace field in the MedicalMaskExtended enum);
The estimator (see IMedicalMaskEstimator in IEstimator.h):
- Implements the estimate() function that accepts source warped image in R8G8B8 format and medical mask estimation structure to return results of estimation;
- Implements the estimate() function that accepts source image in R8G8B8 format, face detection to estimate and medical mask estimation structure to return results of estimation;
- Implements the estimate() function that accepts fsdk::Span of the source warped images in R8G8B8 format and fsdk::Span of the medical mask estimation structures to return results of estimation;
- Implements the estimate() function that accepts fsdk::Span of the source images in R8G8B8 format, fsdk::Span of face detections and fsdk::Span of the medical mask estimation structures to return results of the estimation.
Every method can be used with MedicalMaskEstimation and MedicalMaskEstimationExtended.
The estimator was implemented for two use-cases:
- When the user already has warped images. For example, when the medical mask estimation is performed right before (or after) the face recognition.
- When the user has face detections only.
Note: Calling the estimate() method with a warped image and the estimate() method with an image and a detection for the same image and the same face could lead to different results.
MedicalMaskEstimator thresholds#
The estimator returns several scores, one for each possible result. The final result is based on these scores and the thresholds. If a score is above its corresponding threshold, that result is taken as final. If none of the scores exceeds its threshold, the result with the maximum score is taken. If several scores exceed their thresholds, the results take precedence in the following order.
For MedicalMaskEstimation:
Mask, NoMask, OccludedFace
For MedicalMaskEstimationExtended:
Mask, NoMask, MaskNotInPlace, OccludedFace
The default values for all thresholds are taken from the configuration file. See Configuration guide for details.
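The selection rule described above can be sketched as follows for the MedicalMaskEstimation case. This is an illustration under stated assumptions, not the SDK implementation: the Scores and Thresholds structures and the resolveResult function are hypothetical names, and "above the threshold" is read as a strict comparison.

```cpp
// Local mirror of the MedicalMask enum so the sketch compiles without the SDK.
enum class MedicalMask { Mask = 0, NoMask, OccludedFace };

// Hypothetical containers for the per-result scores and their thresholds.
struct Scores { float mask; float noMask; float occludedFace; };
struct Thresholds { float mask; float noMask; float occludedFace; };

// Hypothetical helper that applies the rule from the text.
constexpr MedicalMask resolveResult(const Scores& s, const Thresholds& t) {
    // Scores above their thresholds win in the order Mask, NoMask, OccludedFace.
    if (s.mask > t.mask) return MedicalMask::Mask;
    if (s.noMask > t.noMask) return MedicalMask::NoMask;
    if (s.occludedFace > t.occludedFace) return MedicalMask::OccludedFace;
    // No score exceeded its threshold: fall back to the maximum score.
    if (s.mask >= s.noMask && s.mask >= s.occludedFace) return MedicalMask::Mask;
    if (s.noMask >= s.occludedFace) return MedicalMask::NoMask;
    return MedicalMask::OccludedFace;
}
```

The extended case would follow the same pattern with MaskNotInPlace checked between NoMask and OccludedFace.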
MedicalMask enumeration#
The MedicalMask enumeration contains all possible results of the MedicalMask estimation:
enum class MedicalMask {
Mask = 0, //!< medical mask is on the face
NoMask, //!< no medical mask on the face
OccludedFace //!< face is occluded by something
};
enum class DetailedMaskType {
CorrectMask = 0, //!< correct mask on the face (mouth and nose are covered correctly)
MouthCoveredWithMask, //!< mask covers only a mouth
ClearFace, //!< clear face - no mask on the face
ClearFaceWithMaskUnderChin, //!< clear face with a mask around the chin, mask does not cover anything in the face region (from mouth to eyes)
PartlyCoveredFace, //!< face is covered with something that is neither a medical mask nor a full mask
FullMask, //!< face is covered with a full mask (such as balaclava, ski mask, etc.)
Count
};
- Mask corresponds to CorrectMask or MouthCoveredWithMask;
- NoMask corresponds to ClearFace or ClearFaceWithMaskUnderChin;
- OccludedFace corresponds to PartlyCoveredFace or FullMask.
Note - NoMask means absence of medical mask or any occlusion in the face region (from mouth to eyes).
Note - DetailedMaskType is not supported for NPU-based platforms.
MedicalMaskEstimation structure#
The MedicalMaskEstimation structure contains results of the estimation:
struct MedicalMaskEstimation {
MedicalMask result; //!< estimation result (@see MedicalMask enum)
DetailedMaskType maskType; //!< detailed type (@see DetailedMaskType enum)
// scores
float maskScore; //!< medical mask is on the face score
float noMaskScore; //!< no medical mask on the face score
float occludedFaceScore; //!< face is occluded by something score
float scores[static_cast<int>(DetailedMaskType::Count)]{}; //!< detailed estimation scores
inline float getScore(DetailedMaskType type) const;
};
There are two groups of the fields:
1․ The first group contains the result:
MedicalMask result;
The result enum field of MedicalMaskEstimation contains the target result of the estimation. A more detailed type is also available in the maskType field:
DetailedMaskType maskType; //!< detailed type
2․ The second group contains scores:
float maskScore; //!< medical mask is on the face score
float noMaskScore; //!< no medical mask on the face score
float occludedFaceScore; //!< face is occluded by something score
The score group contains the estimation scores for each possible result of the estimation. All scores are defined in the [0,1] range. They can be useful for users who want to change the default thresholds for this estimator. If the default thresholds are used, the score group can simply be ignored in the user code. More detailed scores, one for each DetailedMaskType value, are stored in:
float scores[static_cast<int>(DetailedMaskType::Count)]{}; //!< detailed estimation scores
- maskScore is the sum of scores for the CorrectMask and MouthCoveredWithMask fields;
- noMaskScore is the sum of scores for the ClearFace and ClearFaceWithMaskUnderChin fields;
- occludedFaceScore is the sum of scores for the PartlyCoveredFace and FullMask fields.
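The relation between the detailed scores and the three aggregated scores can be sketched as below. The DetailedScores and AggregatedScores structures and the aggregate function are hypothetical stand-ins used only for illustration; in the SDK the values live in MedicalMaskEstimation itself.

```cpp
// Local mirror of the DetailedMaskType enum so the sketch compiles without the SDK.
enum class DetailedMaskType {
    CorrectMask = 0, MouthCoveredWithMask, ClearFace,
    ClearFaceWithMaskUnderChin, PartlyCoveredFace, FullMask, Count
};

// Hypothetical holder for the detailed scores (mirrors the scores[] array).
struct DetailedScores {
    float scores[static_cast<int>(DetailedMaskType::Count)];
    constexpr float get(DetailedMaskType type) const {
        return scores[static_cast<int>(type)];
    }
};

// Hypothetical holder for the three aggregated scores.
struct AggregatedScores { float maskScore; float noMaskScore; float occludedFaceScore; };

// Sums the detailed scores pairwise, as described in the text above.
constexpr AggregatedScores aggregate(const DetailedScores& d) {
    return AggregatedScores{
        d.get(DetailedMaskType::CorrectMask) + d.get(DetailedMaskType::MouthCoveredWithMask),
        d.get(DetailedMaskType::ClearFace) + d.get(DetailedMaskType::ClearFaceWithMaskUnderChin),
        d.get(DetailedMaskType::PartlyCoveredFace) + d.get(DetailedMaskType::FullMask)};
}
```

A user who reads the detailed scores can use such a check to confirm that they are consistent with maskScore, noMaskScore and occludedFaceScore, up to floating-point rounding.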
Note - DetailedMaskType, scores and getScore are not supported for NPU-based platforms. This means that these fields and methods cannot be used in user code on such platforms.
MedicalMaskExtended enumeration#
The MedicalMaskExtended enumeration contains all possible results of the extended MedicalMask estimation:
enum class MedicalMaskExtended {
Mask = 0, //!< medical mask is on the face
NoMask, //!< no medical mask on the face
MaskNotInPlace, //!< mask is not on the right place
OccludedFace //!< face is occluded by something
};
MedicalMaskEstimationExtended structure#
The MedicalMaskEstimationExtended structure contains results of the estimation:
struct MedicalMaskEstimationExtended {
MedicalMaskExtended result; //!< estimation result (@see MedicalMaskExtended enum)
// scores
float maskScore; //!< medical mask is on the face score
float noMaskScore; //!< no medical mask on the face score
float maskNotInPlace; //!< mask is not on the right place
float occludedFaceScore; //!< face is occluded by something score
};
There are two groups of the fields:
1․ The first group contains only the result enum:
MedicalMaskExtended result;
The result enum field of MedicalMaskEstimationExtended contains the target result of the estimation.
2․ The second group contains scores:
float maskScore; //!< medical mask is on the face score
float noMaskScore; //!< no medical mask on the face score
float maskNotInPlace; //!< mask is not on the right place
float occludedFaceScore; //!< face is occluded by something score
The score group contains the estimation scores for each possible result of the estimation. All scores are defined in [0,1] range.
Filtration parameters#
The estimator is trained to work with face images that meet the following requirements:
"Requirements for fsdk::MedicalMaskEstimator::EstimationResult"
Attribute | Acceptable values |
---|---|
headPose.pitch | [-40...40] |
headPose.yaw | [-40...40] |
headPose.roll | [-40...40] |
ags | [0.5...1.0] |
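A pre-filter based on the table above could look like the sketch below. The HeadPose structure and both helper functions are hypothetical and only mirror the attributes listed in the table; head pose angles are assumed to be in degrees, as for the head pose estimation described earlier in this chapter.

```cpp
// Hypothetical local structure holding head rotation angles (assumed degrees).
struct HeadPose { float pitch; float yaw; float roll; };

// Returns true when value lies within the closed interval [lo, hi].
constexpr bool inRange(float value, float lo, float hi) {
    return value >= lo && value <= hi;
}

// Checks the acceptable ranges from the table: angles in [-40, 40], AGS in [0.5, 1.0].
constexpr bool meetsMedicalMaskRequirements(const HeadPose& pose, float ags) {
    return inRange(pose.pitch, -40.0f, 40.0f) &&
           inRange(pose.yaw, -40.0f, 40.0f) &&
           inRange(pose.roll, -40.0f, 40.0f) &&
           inRange(ags, 0.5f, 1.0f);
}
```

Images that fail such a check are outside the conditions the estimator was trained for, so their results should be treated with caution.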
Configurations:
See the "Medical mask estimator settings" section in the "ConfigurationGuide.pdf" document.
API structure name:
IMedicalMaskEstimator
Plan files:
- mask_clf_v3_cpu.plan
- mask_clf_v3_cpu-avx2.plan
- mask_clf_v3_gpu.plan
Mouth Estimation Functionality#
Name: MouthEstimator
Algorithm description:
This estimator is designed to predict a person's mouth state.
Implementation description:
Mouth Estimation
It returns the following bool flags:
bool isOpened; //!< Mouth is opened flag
bool isSmiling; //!< Person is smiling flag
bool isOccluded; //!< Mouth is occluded flag
Each of these flags indicates a specific mouth state that was predicted.
A combined mouth state is assumed if multiple flags are set to true. For example, there are many cases where a person is smiling and their mouth is wide open.
The mouth estimator also provides score probabilities for the mouth states in case the user needs more detailed information:
float opened; //!< mouth opened score
float smile; //!< person is smiling score
float occluded; //!< mouth is occluded score
Mouth Estimation Extended
This estimation is an extended version of the regular Mouth Estimation (see above). In addition, it returns the following fields:
SmileTypeScores smileTypeScores; //!< Smile types scores
SmileType smileType; //!< Contains smile type if person "isSmiling"
If the isSmiling flag is true, you can get more detailed information about the smile using the smileType variable.
smileType can hold the following states:
enum class SmileType {
None, //!< No smile
SmileLips, //!< regular smile, without teeth exposed
SmileOpen //!< smile with teeth exposed
};
If isSmiling is false, smileType is set to None. Otherwise, the field is set to SmileLips (person is smiling with closed mouth) or SmileOpen (person is smiling with open mouth, with teeth exposed).
Extended mouth estimation provides score probabilities for the smile type in case the user needs more detailed information:
struct SmileTypeScores {
float smileLips; //!< person is smiling with lips score
float smileOpen; //!< person is smiling with open mouth score
};
The smileType variable is set based on the scores held by the smileTypeScores variable: it takes the state with the maximum score of smileLips and smileOpen, or None if the person is not smiling at all.
if (estimation.isSmiling)
estimation.smileType = estimation.smileTypeScores.smileLips > estimation.smileTypeScores.smileOpen ?
fsdk::SmileType::SmileLips : fsdk::SmileType::SmileOpen;
else
estimation.smileType = fsdk::SmileType::None;
When you use Mouth Estimation Extended, the underlying computations are exactly the same as when you use the regular Mouth Estimation. The regular Mouth Estimation was retained for backward compatibility.
These estimators are trained to work with warped images (see Chapter "Image warping" for details).
Recommended thresholds:
The table below contains thresholds specified in the MouthEstimator::Settings section of the FaceEngine configuration file (faceengine.conf). By default, these threshold values are set to optimal values.
"Mouth estimator recommended thresholds"
Threshold | Recommended value |
---|---|
occlusionThreshold | 0.5 |
smileThreshold | 0.5 |
openThreshold | 0.5 |
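To illustrate how the thresholds from the table relate to the score fields and the bool flags, here is a small sketch. It assumes that a flag is set when the corresponding score is strictly greater than its threshold; that rule, the MouthThresholds and MouthFlags structures, and the applyThresholds function are illustrative assumptions rather than the SDK implementation.

```cpp
// Hypothetical holder for the thresholds, initialized with the recommended values.
struct MouthThresholds {
    float occlusionThreshold = 0.5f;
    float smileThreshold = 0.5f;
    float openThreshold = 0.5f;
};

// Hypothetical mirror of the three bool flags returned by the estimator.
struct MouthFlags { bool isOpened; bool isSmiling; bool isOccluded; };

// Assumption: a flag is true when its score is strictly above the threshold.
constexpr MouthFlags applyThresholds(float opened, float smile, float occluded,
                                     const MouthThresholds& t) {
    return MouthFlags{opened > t.openThreshold,
                      smile > t.smileThreshold,
                      occluded > t.occlusionThreshold};
}
```

Under this reading, scores of 0.9 (opened), 0.7 (smile) and 0.1 (occluded) give a combined state: the mouth is open and the person is smiling, with no occlusion.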
Filtration parameters:
The estimator is trained to work with face images that meet the following requirements:
- Requirements for Detector:
Attribute | Minimum value |
---|---|
detection size | 80 |
Detection size is the detection width:
const fsdk::Detection detection = ... // somehow get fsdk::Detection object
const int detectionSize = detection.getRect().width;
- Requirements for fsdk::MouthEstimator:
Attribute | Acceptable values |
---|---|
headPose.pitch | [-20...20] |
headPose.yaw | [-25...25] |
headPose.roll | [-10...10] |
Configurations:
See the "Mouth Estimator settings" section in the "ConfigurationGuide.pdf" document.
API structure name:
IMouthEstimator
Plan files:
- mouth_estimation_v4_arm.plan
- mouth_estimation_v4_cpu.plan
- mouth_estimation_v4_cpu-avx2.plan
- mouth_estimation_v4_gpu.plan
Face Occlusion Estimation Functionality#
Name: FaceOcclusionEstimator
Algorithm description:
This estimator is designed to predict occlusions in different parts of the face, such as the forehead, eyes, nose, mouth, and lower face. It also provides an overall occlusion score.
Implementation description:
Face Occlusion Estimation
The estimator returns the following occlusion states:
/**
* @brief FaceOcclusionType enum.
* This enum contains all possible facial occlusion types.
* */
enum class FaceOcclusionType : uint8_t {
Forehead = 0, //!< Forehead
LeftEye, //!< Left eye
RightEye, //!< Right eye
Nose, //!< Nose
Mouth, //!< Mouth
LowerFace, //!< Lower part of the face (chin, mouth, etc.)
Count //!< Total number of occlusion types
};
/**
* @brief FaceOcclusionState enum.
* This enum contains all possible facial occlusion states.
* */
enum class FaceOcclusionState : uint8_t {
NotOccluded = 0, //!< Face is not occluded
Occluded, //!< Face is occluded
Count //!< Total number of states
};
FaceOcclusionState states[static_cast<uint8_t>(FaceOcclusionType::Count)]; //!< Occlusion states for each face region
float typeScores[static_cast<uint8_t>(FaceOcclusionType::Count)]; //!< Probability scores for occlusion types
FaceOcclusionState overallOcclusionState; //!< Overall occlusion state
float overallOcclusionScore; //!< Overall occlusion score
float hairOcclusionScore; //!< Hair occlusion score
To get the occlusion score for a specific facial zone, you can use the following method:
float getScore(FaceOcclusionType type) const {
return typeScores[static_cast<uint8_t>(type)];
}
To get the occlusion state for a specific facial zone, use the following:
FaceOcclusionState getState(FaceOcclusionType type) const {
return states[static_cast<uint8_t>(type)];
}
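The accessors above can be used to summarize a result, for example to count the occluded regions or to find the region with the highest occlusion score. The sketch below works on a hypothetical local mirror of the result fields (FaceOcclusionResult) with made-up sample values, so it compiles without the SDK; countOccluded and mostOccludedRegion are illustrative helpers, not SDK functions.

```cpp
#include <cstdint>

// Local mirrors of the enums shown above.
enum class FaceOcclusionType : std::uint8_t {
    Forehead = 0, LeftEye, RightEye, Nose, Mouth, LowerFace, Count
};
enum class FaceOcclusionState : std::uint8_t { NotOccluded = 0, Occluded, Count };

constexpr std::uint8_t kRegionCount = static_cast<std::uint8_t>(FaceOcclusionType::Count);

// Hypothetical mirror of the per-region fields and accessors of the estimation result.
struct FaceOcclusionResult {
    FaceOcclusionState states[kRegionCount];
    float typeScores[kRegionCount];
    constexpr float getScore(FaceOcclusionType type) const {
        return typeScores[static_cast<std::uint8_t>(type)];
    }
    constexpr FaceOcclusionState getState(FaceOcclusionType type) const {
        return states[static_cast<std::uint8_t>(type)];
    }
};

// Counts the regions whose state is Occluded.
constexpr int countOccluded(const FaceOcclusionResult& r) {
    int count = 0;
    for (std::uint8_t i = 0; i < kRegionCount; ++i) {
        if (r.getState(static_cast<FaceOcclusionType>(i)) == FaceOcclusionState::Occluded) {
            ++count;
        }
    }
    return count;
}

// Returns the region with the highest occlusion score (first one wins on ties).
constexpr FaceOcclusionType mostOccludedRegion(const FaceOcclusionResult& r) {
    std::uint8_t best = 0;
    for (std::uint8_t i = 1; i < kRegionCount; ++i) {
        if (r.typeScores[i] > r.typeScores[best]) best = i;
    }
    return static_cast<FaceOcclusionType>(best);
}

// Made-up sample data: left eye and mouth are marked as occluded.
constexpr FaceOcclusionResult kSample = {
    {FaceOcclusionState::NotOccluded, FaceOcclusionState::Occluded,
     FaceOcclusionState::NotOccluded, FaceOcclusionState::NotOccluded,
     FaceOcclusionState::Occluded, FaceOcclusionState::NotOccluded},
    {0.05f, 0.9f, 0.1f, 0.2f, 0.7f, 0.3f}};
```

For the sample data, countOccluded(kSample) evaluates to 2 and mostOccludedRegion(kSample) to FaceOcclusionType::LeftEye.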
This estimator is trained to work with warped images and Landmarks5 (see Chapter "Image warping" for details).
Recommended thresholds:
The table below contains thresholds specified in the FaceOcclusion::Settings section of the FaceEngine configuration file (faceengine.conf). These values are optimal by default.
Threshold | Recommended value |
---|---|
normalHairCoeff | 0.15 |
overallOcclusionThreshold | 0.07 |
foreheadThreshold | 0.2 |
eyeThreshold | 0.15 |
noseThreshold | 0.2 |
mouthThreshold | 0.15 |
lowerFaceThreshold | 0.2 |
Configurations:
See the "Face Occlusion Estimator settings" section in the "ConfigurationGuide.pdf" document.
Filtration parameters:
Name | Threshold |
---|---|
Face Size | >80px |
Yaw, Pitch, Roll | ±20 |
Blur (Subjective Quality) | >0.61 |
API structure name:
IFaceOcclusionEstimator
Plan files:
- face_occlusion_v1_arm.plan
- face_occlusion_v1_cpu.plan
- face_occlusion_v1_cpu-avx2.plan
- face_occlusion_v1_gpu.plan