Parameter Estimation Facility#
Overview#
The estimation facility is the only multi-purpose facility in FaceEngine. It is designed as a collection of tools that help estimate various image or depicted-object properties. These properties may be used to increase the precision of algorithms implemented by other FaceEngine facilities or to accomplish custom user tasks.
Best shot selection functionality#
Eyes Estimation#
Name: EyeEstimator
Algorithm description:
The estimator is trained to work with warped images (see chapter "Image warping" for details).
This estimator aims to determine:
- Eye state: Open, Closed, Occluded;
- Precise eye iris location as an array of landmarks;
- Precise eyelid location as an array of landmarks.
You can only pass a warped image with a detected face to the estimator interface. Better image quality leads to better results.
The eye state classifier supports three categories: "Open", "Closed", "Occluded". Poor-quality images or ones that depict obscured eyes (think eyewear, hair, gestures) fall into the "Occluded" category. It is always a good idea to check the eye state before using the segmentation result.
The precise location allows iris and eyelid segmentation. The estimator is capable of outputting iris and eyelid shapes as arrays of points that together form an ellipse. You should only use segmentation results if the state of that eye is "Open".
Implementation description:
The estimator:
- Implements the estimate() function that accepts a warped source image and warped landmarks, either of type Landmarks5 or Landmarks68. The warped image and landmarks are received from the warper (see IWarper::warp());
- Classifies the eye state and detects the iris and eyelid landmarks;
- Outputs EyesEstimation structures.
Orientation terms 'left' and 'right' refer to the way you see the image as it is shown on the screen. It means that the left eye is not necessarily left from the person's point of view, but is on the left side of the screen. Consequently, the right eye is the one on the right side of the screen. More formally, the label 'left' refers to the eye that appears on the left side of the image (and similarly for the right eye), such that xleft < xright.
EyesEstimation::EyeAttributes presents the eye state as the enum class State with possible values: Open, Closed, Occluded.
Iris landmarks are presented with a template structure Landmarks that is specialized for 32 points.
Eyelid landmarks are presented with a template structure Landmarks that is specialized for 6 points.
The EyesEstimation structure contains results of the estimation:
struct EyesEstimation {
/**
* @brief Eyes attribute structure.
* */
struct EyeAttributes {
/**
* @brief Enumeration of possible eye states.
* */
enum class State : uint8_t {
Closed, //!< Eye is closed.
Open, //!< Eye is open.
Occluded //!< Eye is blocked by something not transparent, or landmark passed to estimator doesn't point to an eye.
};
static constexpr uint64_t irisLandmarksCount = 32; //!< Iris landmarks amount.
static constexpr uint64_t eyelidLandmarksCount = 6; //!< Eyelid landmarks amount.
/// @brief alias for @see Landmarks template structure with irisLandmarksCount as param.
using IrisLandmarks = Landmarks<irisLandmarksCount>;
/// @brief alias for @see Landmarks template structure with eyelidLandmarksCount as param
using EyelidLandmarks = Landmarks<eyelidLandmarksCount>;
State state; //!< State of an eye.
IrisLandmarks iris; //!< Iris landmarks.
EyelidLandmarks eyelid; //!< Eyelid landmarks
};
EyeAttributes leftEye; //!< Left eye attributes
EyeAttributes rightEye; //!< Right eye attributes
};
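A minimal usage sketch is shown below. The estimator handle (eyeEstimator), the warped image (warp), and the warped landmarks (warpedLandmarks68) are illustrative names and are assumed to be obtained beforehand (see IWarper::warp()); consult the SDK headers for the exact estimate() signature.
fsdk::EyesEstimation eyes;
fsdk::Result<fsdk::FSDKError> status = eyeEstimator->estimate(warp, warpedLandmarks68, eyes);
if (status.isOk()) {
    // Use segmentation results only when the eye state is Open (see the recommendation above).
    const auto& left = eyes.leftEye;
    if (left.state == fsdk::EyesEstimation::EyeAttributes::State::Open) {
        const auto& iris = left.iris;     // 32 iris landmarks
        const auto& eyelid = left.eyelid; // 6 eyelid landmarks
        // ... use the iris/eyelid points ...
    }
}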
API structure name:
IEyeEstimator
Plan files:
- eyes_estimation_flwr8_cpu.plan
- eyes_estimation_ir_cpu.plan
- eyes_estimation_flwr8_cpu-avx2.plan
- eyes_estimation_ir_cpu-avx2.plan
- eyes_estimation_ir_gpu.plan
- eyes_estimation_flwr8_gpu.plan
- eye_status_estimation_cpu.plan
- eye_status_estimation_cpu-avx2.plan
- eye_status_estimation_gpu.plan
BestShotQuality Estimation#
Name: BestShotQualityEstimator
Algorithm description:
The BestShotQuality estimator is designed to evaluate image quality in order to choose the best image before descriptor extraction. The BestShotQuality estimator consists of two components: AGS (garbage score) and Head Pose.
AGS aims to determine the source image score for further descriptor extraction and matching.
The estimation output is a float score normalized to the range [0..1]. The closer the score is to 1, the better the matching result received for the image.
When you have several images of a person, it is better to save the image with the highest AGS score.
The recommended threshold for the AGS score is 0.2, but it can be changed depending on the use case. Consult VisionLabs about the recommended threshold value for this parameter.
Head Pose determines person head rotation angles in 3D space, namely pitch, yaw and roll.
Since 3D head translation is hard to determine reliably without camera-specific calibration, only 3D rotation component is estimated.
Head pose estimation characteristics:
- Units (degrees);
- Notation (Euler angles);
- Precision (see table below).
Implementation description:
The estimator (see IBestShotQualityEstimator in IEstimator.h):
- Implements the estimate() function that needs an fsdk::Image in R8G8B8 format, the fsdk::Detection structure of the corresponding source image (see section "Detection structure" in chapter "Face detection facility"), an fsdk::IBestShotQualityEstimator::EstimationRequest structure, and an fsdk::IBestShotQualityEstimator::EstimationResult to store the estimation result;
- Implements the estimate() function that needs a span of fsdk::Image in R8G8B8 format, a span of fsdk::Detection structures of the corresponding source images (see section "Detection structure" in chapter "Face detection facility"), an fsdk::IBestShotQualityEstimator::EstimationRequest structure, and a span of fsdk::IBestShotQualityEstimator::EstimationResult to store the estimation results;
- Implements the estimateAsync() function that needs an fsdk::Image in R8G8B8 format, the fsdk::Detection structure of the corresponding source image (see section "Detection structure" in chapter "Face detection facility"), and an fsdk::IBestShotQualityEstimator::EstimationRequest structure.
Note: The estimateAsync() method is experimental, and its interface may change in the future.
Note: The estimateAsync() method is not marked as noexcept and may throw an exception.
Before using this estimator, the user can decide which of the listed attributes to estimate. For this purpose, the estimate() method takes one of the following estimation requests:
- fsdk::IBestShotQualityEstimator::EstimationRequest::estimateAGS to make only the AGS estimation;
- fsdk::IBestShotQualityEstimator::EstimationRequest::estimateHeadPose to make only the Head Pose estimation;
- fsdk::IBestShotQualityEstimator::EstimationRequest::estimateAll to make both AGS and Head Pose estimations.
The EstimationResult structure contains results of the estimation:
struct EstimationResult {
Optional<HeadPoseEstimation> headPose; //!< HeadPose estimation if it was requested, empty otherwise
Optional<float> ags; //!< AGS estimation if it was requested, empty otherwise
};
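A minimal usage sketch follows. The estimator handle (bestShotQualityEstimator), image, and detection are illustrative names assumed to be prepared beforehand; the Optional accessors are shown as valid()/value(), so check your SDK version for the exact interface.
fsdk::IBestShotQualityEstimator::EstimationResult result;
auto status = bestShotQualityEstimator->estimate(
    image,      // fsdk::Image in R8G8B8 format
    detection,  // fsdk::Detection for that image
    fsdk::IBestShotQualityEstimator::EstimationRequest::estimateAll,
    result);
if (status.isOk()) {
    if (result.ags.valid() && result.ags.value() > 0.2f) {
        // Image is good enough for descriptor extraction (0.2 is the recommended AGS threshold).
    }
    if (result.headPose.valid()) {
        float yaw = result.headPose.value().yaw; // Euler angle in degrees
    }
}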
Head Pose accuracy:
Prediction precision decreases as a rotation angle increases. We present typical average errors for different angle ranges in the table below.
"Head pose prediction precision"
Axis | Average prediction error, -45°...+45° | Average prediction error, < -45° or > +45° |
---|---|---|
Yaw | ±2.7° | ±4.6° |
Pitch | ±3.0° | ±4.8° |
Roll | ±3.0° | ±4.6° |
Zero position corresponds to a face placed orthogonally to camera direction, with the axis of symmetry parallel to the vertical camera axis.
API structure name:
IBestShotQualityEstimator
Plan files:
For more information, see "Approximate Garbage Score Estimation (AGS)" and "Head Pose Estimation".
Head Pose Estimation#
This estimator is designed to determine a camera-space head pose. Since the 3D head translation is hard to reliably determine without a camera-specific calibration, only the 3D rotation component is estimated.
There are two head pose estimation methods available:
- Estimate by 68 face-aligned landmarks. You can get it from the Detector facility, see Chapter "Face detection facility" for details.
- Estimate by the original input image in the RGB format.
An estimation by the image is more precise. If you have already extracted the 68 landmarks for other facilities, you can save time by using the faster estimation from the 68 landmarks.
By default, all methods are enabled in the "HeadPoseEstimator" section of the faceengine.conf configuration file. You can disable unused methods to decrease RAM usage and initialization time.
Estimation characteristics:
- Units (degrees)
- Notation (Euler angles)
- Precision (see the table below)
Note: Prediction precision decreases as the rotation angle increases. We present typical average errors for different angle ranges in the table below.
"Head pose prediction precision" \label{5.6}
Range | -45°...+45° | < -45° or > +45° | |
---|---|---|---|
Average prediction error (per axis) | Yaw | ±2.7° | ±4.6° |
Average prediction error (per axis) | Pitch | ±3.0° | ±4.8° |
Average prediction error (per axis) | Roll | ±3.0° | ±4.6° |
Zero position corresponds to a face placed orthogonally to the camera direction, with the axis of symmetry parallel to the vertical camera axis.
Note: In order to work, this estimator requires precise 68-point face alignment results, so familiarize yourself with the section "Face alignment" in the "Face detection facility" chapter as well.
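Both estimation methods can be sketched as follows. The estimator handle (headPoseEstimator) and the overload signatures shown here are assumptions based on the two methods described above; check IHeadPoseEstimator in the SDK headers for the exact signatures.
fsdk::HeadPoseEstimation pose;
// Variant 1 (assumed overload): estimate from the original RGB image - more precise.
auto byImage = headPoseEstimator->estimate(image, detection, pose);
// Variant 2 (assumed overload): estimate from precomputed 68 landmarks - faster if you already have them.
auto byLandmarks = headPoseEstimator->estimate(landmarks68, pose);
if (byLandmarks.isOk()) {
    // pitch, yaw, and roll are Euler angles in degrees; zero corresponds to a frontal face.
    bool withinPreciseRange =
        pose.yaw > -45.f && pose.yaw < 45.f &&
        pose.pitch > -45.f && pose.pitch < 45.f &&
        pose.roll > -45.f && pose.roll < 45.f;
}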
Approximate Garbage Score Estimation (AGS)#
This estimator aims to determine the source image score for further descriptor extraction and matching. The higher the score, the better the matching result received for the image.
When you have several images of a person, it is better to save the image with the highest AGS score.
Contact VisionLabs for the recommended threshold value for this parameter.
The estimator (see IAGSEstimator in IEstimator.h):
- Implements the estimate() function that accepts the source image in R8G8B8 format and the fsdk::Detection structure of the corresponding source image (for details, see section "Detection structure" in chapter "Face detection facility");
- Estimates the garbage score of the input image;
- Outputs the garbage score value.
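To follow the recommendation above (keep the image with the highest AGS score), a selection loop may look like the sketch below. The handle agsEstimator and the containers images/detections are illustrative names, and estimate() is assumed to return the score as a ResultValue; check IAGSEstimator in the SDK headers.
float bestScore = -1.f;
size_t bestIndex = 0;
for (size_t i = 0; i < images.size(); ++i) {
    auto scored = agsEstimator->estimate(images[i], detections[i]);
    if (scored.isOk() && scored.getValue() > bestScore) {
        bestScore = scored.getValue();
        bestIndex = i;
    }
}
// Keep images[bestIndex]; compare bestScore against the recommended threshold (e.g., 0.2).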
Mouth Estimation Functionality#
Name: MouthEstimator
Algorithm description:
This estimator is designed to predict a person's mouth state.
Implementation description:
Mouth Estimation
It returns the following bool flags:
bool isOpened; //!< Mouth is opened flag
bool isSmiling; //!< Person is smiling flag
bool isOccluded; //!< Mouth is occluded flag
Each of these flags indicates a specific mouth state that was predicted.
A combined mouth state is assumed if multiple flags are set to true. For example, there are many cases where a person is smiling and their mouth is wide open.
In case the user needs more detailed information, the mouth estimator provides probability scores for the mouth states:
float opened; //!< mouth opened score
float smile; //!< person is smiling score
float occluded; //!< mouth is occluded score
Mouth Estimation Extended
This estimation is an extended version of the regular Mouth Estimation (see above). In addition, it returns the following fields:
SmileTypeScores smileTypeScores; //!< Smile type scores
SmileType smileType; //!< Contains smile type if person "isSmiling"
If the isSmiling flag is true, you can get more detailed information about the smile using the smileType variable, which can hold the following states:
enum class SmileType {
None, //!< No smile
SmileLips, //!< Regular smile, without teeth exposed
SmileOpen //!< Smile with teeth exposed
};
If isSmiling is false, smileType is set to None. Otherwise, the field is set to SmileLips (the person is smiling with a closed mouth) or SmileOpen (the person is smiling with an open mouth, teeth exposed).
In case the user needs more detailed information, the extended mouth estimation provides probability scores for the smile types:
struct SmileTypeScores {
float smileLips; //!< person is smiling with lips score
float smileOpen; //!< person is smiling with open mouth score
};
The smileType variable is set based on the corresponding scores held by smileTypeScores: it is set to the type with the maximum score from smileLips and smileOpen, or to None if the person is not smiling at all:
if (estimation.isSmiling)
estimation.smileType = estimation.smileTypeScores.smileLips > estimation.smileTypeScores.smileOpen ?
fsdk::SmileType::SmileLips : fsdk::SmileType::SmileOpen;
else
estimation.smileType = fsdk::SmileType::None;
When you use Mouth Estimation Extended, the underlying computations are exactly the same as with the regular Mouth Estimation. The regular Mouth Estimation is retained for backward compatibility.
These estimators are trained to work with warped images (see Chapter "Image warping" for details).
Recommended thresholds:
The table below contains thresholds specified in the MouthEstimator::Settings
section of the FaceEngine configuration file (faceengine.conf). By default, these threshold values are set to optimal.
"Mouth estimator recommended thresholds"
Threshold | Recommended value |
---|---|
occlusionThreshold | 0.5 |
smileThreshold | 0.5 |
openThreshold | 0.5 |
Filtration parameters:
The estimator is trained to work with face images that meet the following requirements:
- Requirements for Detector:
Attribute | Minimum value |
---|---|
detection size | 80 |
Detection size is the detection width:
const fsdk::Detection detection = ... // somehow get fsdk::Detection object
const int detectionSize = detection.getRect().width;
- Requirements for fsdk::MouthEstimator:
Attribute | Acceptable values |
---|---|
headPose.pitch | [-20...20] |
headPose.yaw | [-25...25] |
headPose.roll | [-10...10] |
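The two requirements above can be combined into a single pre-filtering check before running the estimator. This is a sketch only; the HeadPoseEstimation value is assumed to come from the Head Pose estimation described earlier, and the function name is illustrative.
bool passesMouthFilters(const fsdk::Detection& detection, const fsdk::HeadPoseEstimation& pose) {
    const int detectionSize = detection.getRect().width; // detection size is the detection width
    return detectionSize >= 80 &&
           pose.pitch >= -20.f && pose.pitch <= 20.f &&
           pose.yaw   >= -25.f && pose.yaw   <= 25.f &&
           pose.roll  >= -10.f && pose.roll  <= 10.f;
}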
Configurations:
See the "Mouth Estimator settings" section in the "ConfigurationGuide.pdf" document.
API structure name:
IMouthEstimator
Plan files:
- mouth_estimation_v4_arm.plan
- mouth_estimation_v4_cpu.plan
- mouth_estimation_v4_cpu-avx2.plan
- mouth_estimation_v4_gpu.plan
Face Occlusion Estimation Functionality#
Name: FaceOcclusionEstimator
Algorithm description:
This estimator is designed to predict occlusions in different parts of the face, such as the forehead, eyes, nose, mouth, and lower face. It also provides an overall occlusion score.
Implementation description:
Face Occlusion Estimation
The estimator returns the following occlusion states:
/**
* @brief FaceOcclusionType enum.
* This enum contains all possible facial occlusion types.
* */
enum class FaceOcclusionType : uint8_t {
Forehead = 0, //!< Forehead
LeftEye, //!< Left eye
RightEye, //!< Right eye
Nose, //!< Nose
Mouth, //!< Mouth
LowerFace, //!< Lower part of the face (chin, mouth, etc.)
Count //!< Total number of occlusion types
};
/**
* @brief FaceOcclusionState enum.
* This enum contains all possible facial occlusion states.
* */
enum class FaceOcclusionState : uint8_t {
NotOccluded = 0, //!< Face is not occluded
Occluded, //!< Face is occluded
Count //!< Total number of states
};
FaceOcclusionState states[static_cast<uint8_t>(FaceOcclusionType::Count)]; //!< Occlusion states for each face region
float typeScores[static_cast<uint8_t>(FaceOcclusionType::Count)]; //!< Probability scores for occlusion types
FaceOcclusionState overallOcclusionState; //!< Overall occlusion state
float overallOcclusionScore; //!< Overall occlusion score
float hairOcclusionScore; //!< Hair occlusion score
To get the occlusion score for a specific facial zone, you can use the following method:
float getScore(FaceOcclusionType type) const {
return typeScores[static_cast<uint8_t>(type)];
}
To get the occlusion state for a specific facial zone, use the following:
FaceOcclusionState getState(FaceOcclusionType type) const {
return states[static_cast<uint8_t>(type)];
}
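A sketch of inspecting the result zone by zone follows. The result object name (occlusion) is illustrative; it is assumed to be a filled estimation structure exposing the fields and methods listed above.
if (occlusion.overallOcclusionState == fsdk::FaceOcclusionState::Occluded) {
    // Check which zones are occluded and by how much.
    for (uint8_t i = 0; i < static_cast<uint8_t>(fsdk::FaceOcclusionType::Count); ++i) {
        const auto zone = static_cast<fsdk::FaceOcclusionType>(i);
        if (occlusion.getState(zone) == fsdk::FaceOcclusionState::Occluded) {
            const float score = occlusion.getScore(zone); // per-zone probability score
            // ... handle the occluded zone ...
        }
    }
}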
This estimator is trained to work with warped images and Landmarks5 (see Chapter "Image warping" for details).
Recommended thresholds:
The table below contains thresholds specified in the FaceOcclusion::Settings section of the FaceEngine configuration file (faceengine.conf). These values are optimal by default.
Threshold | Recommended value |
---|---|
normalHairCoeff | 0.15 |
overallOcclusionThreshold | 0.07 |
foreheadThreshold | 0.2 |
eyeThreshold | 0.15 |
noseThreshold | 0.2 |
mouthThreshold | 0.15 |
lowerFaceThreshold | 0.2 |
Configurations:
See the "Face Occlusion Estimator settings" section in the "ConfigurationGuide.pdf" document.
Filtration parameters:
Name | Threshold |
---|---|
Face Size | >80px |
Yaw, Pitch, Roll | ±20° |
Blur (Subjective Quality) | >0.61 |
API structure name:
IFaceOcclusionEstimator
Plan files:
- face_occlusion_v1_arm.plan
- face_occlusion_v1_cpu.plan
- face_occlusion_v1_cpu-avx2.plan
- face_occlusion_v1_gpu.plan
LivenessOneShotRGB Estimation#
Name: LivenessOneShotRGBEstimator
Algorithm description:
This estimator shows whether a person's face is real or fake, covering the following types of attacks:
- Printed Photo Attack. One or several photos of another person are used.
- Video Replay Attack. A video of another person is used.
- Printed Mask Attack. An imposter cuts out a face from a photo and covers their face with it.
- 3D Mask Attack. An imposter puts on a 3D mask depicting the face of another person.
The requirements for the processed image and the face in the image are listed below.
Parameters | Requirements |
---|---|
Minimum resolution for mobile devices | 720x960 pixels |
Maximum resolution for mobile devices | 1080x1920 pixels |
Minimum resolution for webcams | 1280x720 pixels |
Maximum resolution for webcams | 1920x1080 pixels |
Compression | No |
Image warping | No |
Image cropping | No |
Effects overlay | No |
Mask | No |
Number of faces in the frame | 1 |
Face detection bounding box width | More than 200 pixels |
Frame edges offset | More than 10 pixels |
Head pose | -20 to +20 degrees for head pitch, yaw, and roll |
Image quality | The face in the frame should not be overexposed, underexposed, or blurred. |
See image quality thresholds in the "Image Quality Estimation" section.
Implementation description:
The estimator (see ILivenessOneShotRGBEstimator in ILivenessOneShotRGBEstimator.h):
- Implements the estimate() function that needs fsdk::Image, fsdk::Detection, and fsdk::Landmarks5 objects (see section "Detection structure" in chapter "Face detection facility"). The output estimation is an fsdk::LivenessOneShotRGBEstimation structure;
- Implements the estimate() function that needs a span of fsdk::Image, a span of fsdk::Detection, and a span of fsdk::Landmarks5 (see section "Detection structure" in chapter "Face detection facility"). The first output is a span of fsdk::LivenessOneShotRGBEstimation structures. The second output value (an fsdk::LivenessOneShotRGBEstimation structure) is the result of aggregation based on the span of estimations above. Note that the second output value (the aggregation) is optional: it is a default argument, which is nullptr.
The LivenessOneShotRGBEstimation structure contains results of the estimation:
struct LivenessOneShotRGBEstimation {
enum class State {
Alive = 0, //!< The person on image is real
Fake, //!< The person on image is fake (photo, printed image)
Unknown //!< The liveness status of person on image is Unknown
};
float score; //!< Estimation score
State state; //!< Liveness status
float qualityScore; //!< Liveness quality score
};
The estimation score is normalized in the range [0..1], where 1 means a real person and 0 means a fake.
The liveness quality score is an image quality estimation for the liveness recognition. This parameter is used for filtering, i.e., to decide whether a best shot can be taken when checking for liveness. The reference score is 0.5.
The value of State depends on score and qualityThreshold. The qualityThreshold value can be given as an argument of the estimate method (see ILivenessOneShotRGBEstimator) and in the faceengine.conf configuration file (see "LivenessOneShotRGBEstimator" in the ConfigurationGuide).
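A minimal usage sketch of the single-image overload is shown below, assuming a prepared image, detection, and landmarks that satisfy the input requirements listed above; the estimator handle name (livenessEstimator) is illustrative, and qualityThreshold is left at its default here.
fsdk::LivenessOneShotRGBEstimation estimation;
auto status = livenessEstimator->estimate(image, detection, landmarks5, estimation);
if (status.isOk()) {
    const bool alive = (estimation.state == fsdk::LivenessOneShotRGBEstimation::State::Alive);
    // score is in [0..1]: closer to 1 means a real person;
    // qualityScore indicates whether the image is suitable as a best shot.
}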
Recommended thresholds:
The table below contains thresholds from the FaceEngine configuration file (faceengine.conf) in the LivenessOneShotRGBEstimator::Settings section. By default, these threshold values are set to optimal.
"LivenessOneShotRGB estimator recommended thresholds"
Threshold | Recommended value |
---|---|
realThreshold | 0.5 |
qualityThreshold | 0.5 |
calibrationCoeff | 0.89 |
calibrationCoeff | 0.991 |
Configurations:
See the "LivenessOneShotRGBEstimator settings" section in the "ConfigurationGuide.pdf" document.
API structure name:
ILivenessOneShotRGBEstimator
Plan files:
- oneshot_rgb_liveness_v8_model_3_cpu.plan
- oneshot_rgb_liveness_v8_model_4_cpu.plan
- oneshot_rgb_liveness_v8_model_3_arm.plan
- oneshot_rgb_liveness_v8_model_4_arm.plan