Appendix A. Specifications#
Classification performance#
Classification performance was measured on two datasets:
- Cooperative dataset (containing 20K images from various sources obtained at several banks);
- Non-cooperative dataset (containing 20K images).
The two tables below contain true positive rates corresponding to select false positive rates.
"Classification performance @ low FPR on cooperative dataset"
FPR | TPR CNN 46 | TPR CNN 52 | TPR CNN 54 | TPR CNN 56 | TPR CNN 57 | TPR CNN 58 | TPR CNN 46m | TPR CNN 52m | TPR CNN 54m | TPR CNN 56m | TPR CNN 59 | TPR CNN 60 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
10^-7^ | 0.9732 | 0.9835 | 0.9765 | 0.9907 | 0.9906 | 0.9910 | 0.9193 | 0.9594 | 0.9699 | 0.9652 | 0.9911 | 0.9917 |
10^-6^ | 0.9852 | 0.9897 | 0.9849 | 0.9914 | 0.9915 | 0.9916 | 0.9598 | 0.9803 | 0.9829 | 0.9814 | 0.9915 | 0.9917 |
10^-5^ | 0.9892 | 0.9908 | 0.9892 | 0.9916 | 0.9917 | 0.9918 | 0.9790 | 0.9880 | 0.9887 | 0.9886 | 0.9919 | 0.9919 |
10^-4^ | 0.9907 | 0.9915 | 0.9909 | 0.9917 | 0.9918 | 0.9919 | 0.9882 | 0.9908 | 0.9910 | 0.9910 | 0.9921 | 0.9921 |
"Classification performance @ low FPR on non cooperative dataset"
FPR | TPR CNN 46 | TPR CNN 52 | TPR CNN 54 | TPR CNN 56 | TPR CNN 57 | TPR CNN 58 | TPR CNN 46m | TPR CNN 52m | TPR CNN 54m | TPR CNN 56m | TPR CNN 59 | TPR CNN 60 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
10^-7^ | 0.8616 | 0.9487 | 0.9638 | 0.9698 | 0.9723 | 0.9767 | 0.7404 | 0.8664 | 0.8813 | 0.8844 | 0.9832 | 0.9893 |
10^-6^ | 0.9073 | 0.9679 | 0.9773 | 0.9809 | 0.9817 | 0.9839 | 0.8208 | 0.9168 | 0.9233 | 0.9229 | 0.9880 | 0.9914 |
10^-5^ | 0.9446 | 0.9799 | 0.9852 | 0.9871 | 0.9873 | 0.9880 | 0.8919 | 0.9516 | 0.9538 | 0.9561 | 0.9908 | 0.9914 |
10^-4^ | 0.9704 | 0.9877 | 0.9896 | 0.9902 | 0.9905 | 0.9909 | 0.9435 | 0.9736 | 0.9752 | 0.9757 | 0.9924 | 0.9925 |
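As an illustration of how a TPR value at a fixed FPR can be derived from raw comparison scores, the following Python sketch picks the strictest threshold that keeps the false positive rate at or below a target and reports the true positive rate at that threshold. The function name `tpr_at_fpr` and the score lists are hypothetical and are not part of FaceEngine; the procedure actually used to produce the tables above is not described in this document.

```python
import math


def tpr_at_fpr(genuine_scores, impostor_scores, target_fpr):
    """Return (threshold, tpr) such that the measured FPR does not exceed target_fpr.

    A pair is accepted when its score is strictly greater than the threshold.
    Hypothetical helper for illustration; not a FaceEngine API.
    """
    impostor = sorted(impostor_scores, reverse=True)
    # Number of impostor pairs that may be accepted without exceeding the target.
    allowed = math.floor(target_fpr * len(impostor))
    # The (allowed + 1)-th highest impostor score becomes the threshold,
    # so at most `allowed` impostor scores lie strictly above it.
    threshold = impostor[allowed] if allowed < len(impostor) else float("-inf")
    accepted = sum(1 for score in genuine_scores if score > threshold)
    return threshold, accepted / len(genuine_scores)


if __name__ == "__main__":
    genuine = [0.95, 0.8, 0.5, 0.25]   # same-person comparison scores (made up)
    impostor = [0.1, 0.2, 0.3, 0.9]    # different-person comparison scores (made up)
    print(tpr_at_fpr(genuine, impostor, 0.25))
```

With fewer than 1/FPR impostor comparisons, `allowed` is zero and the threshold is simply the highest impostor score, so very low FPR levels such as 10^-7^ require correspondingly large numbers of impostor pairs to be estimated meaningfully.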
Runtime performance#
Server environment#
Benchmarking was performed on the following hardware configuration:
- CPU Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz;
- RAM 128 GB DDR4 (Clock Speed: 2133 MHz);
- GPU NVIDIA Tesla T4.
Face detection performance depends on input image parameters such as resolution and bit depth as well as the size of the detected face.
Input data characteristics:
- Image resolution: 1200x1600px;
- Image format: 24 BPP RGB;
- Typical face size: ~260x260px.
Performance measurements are presented for both CPU and GPU execution modes in the tables below. Measured values are averages of at least 100 experiments.
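A minimal sketch of how an average latency over at least 100 runs could be collected is shown here. The helper `average_latency_ms`, the warm-up phase, and the placeholder workload are assumptions made for illustration; the benchmarking harness actually used for the tables is not described in this document.

```python
import statistics
import time


def average_latency_ms(fn, runs=100, warmup=10):
    """Call fn() `warmup` times untimed, then `runs` times timed; return mean latency in ms.

    Hypothetical helper for illustration; the warm-up phase is an assumption.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


if __name__ == "__main__":
    # Placeholder workload standing in for a single estimator or detector call.
    print(f"{average_latency_ms(lambda: sum(range(10_000))):.3f} ms")
```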
CPU mode with AVX2#
"CPU mode performance"
Measurement | Average (ms) | CPU threads | BatchSize |
---|---|---|---|
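Gaze estimator | 1.3 | 8 | - |
Gaze estimator | 1.5 | 4 | - |
Gaze estimator | 1.6 | 1 | - |
Emotion estimator | 8.9 | 8 | - |
Emotion estimator | 11.3 | 4 | - |
Emotion estimator | 22.1 | 1 | - |
Attributes estimator | 24.1 | 8 | - |
Attributes estimator | 46.8 | 4 | - |
Attributes estimator | 63.5 | 1 | - |
Quality estimator | 0.34 | 8 | - |
Quality estimator | 0.4 | 4 | - |
Quality estimator | 1.0 | 1 | - |
Overlap estimator | 1.1 | 8 | - |
Overlap estimator | 1.8 | 4 | - |
Overlap estimator | 5.4 | 1 | - |
Glasses estimator | 0.9 | 8 | - |
Glasses estimator | 1.0 | 4 | - |
Glasses estimator | 2.0 | 1 | - |
Eye estimator (useStatusPlan = false) | 0.5 | 8 | - |
Eye estimator (useStatusPlan = false) | 0.5 | 4 | - |
Eye estimator (useStatusPlan = false) | 0.6 | 1 | - |
Eye estimator (useStatusPlan = true) | 1.2 | 8 | - |
Eye estimator (useStatusPlan = true) | 1.0 | 4 | - |
Eye estimator (useStatusPlan = true) | 1.2 | 1 | - |
Eye estimator Batch | 0.3 | 8 | 8 |
Eye estimator Batch | 0.3 | 8 | 4 |
Eye estimator Batch | 0.6 | 8 | 1 |
Smile estimator | 0.75 | 8 | - |
Smile estimator | 1.1 | 4 | - |
Smile estimator | 2.5 | 1 | - |
HeadPose by Image | 0.3 | 8 | - |
HeadPose by Image | 0.3 | 8 | - |
HeadPose by Image | 0.38 | 8 | - |
HeadPose by Image Batch | 0.2 | 8 | 8 |
HeadPose by Image Batch | 0.3 | 8 | 4 |
HeadPose by Image Batch | 0.3 | 8 | 1 |
AGS estimator | 0.4 | 8 | - |
AGS estimator | 0.48 | 4 | - |
AGS estimator | 0.49 | 1 | - |
Child estimator | 7.8 | 8 | - |
Child estimator | 11.1 | 4 | - |
Child estimator | 17.0 | 1 | - |
Child estimator Batch | 4.1 | 8 | 8 |
Child estimator Batch | 6.1 | 8 | 4 |
Child estimator Batch | 7.6 | 8 | 1 |
LivenessIR estimator | 2.3 | 8 | - |
LivenessIR estimator | 3.2 | 4 | - |
LivenessIR estimator | 7.0 | 1 | - |
BestShotQuality estimator | 0.7 | 8 | - |
BestShotQuality estimator | 0.7 | 4 | - |
BestShotQuality estimator | 0.8 | 1 | - |
BestShotQuality estimator Batch | 0.6 | 8 | 8 |
BestShotQuality estimator Batch | 0.6 | 8 | 4 |
BestShotQuality estimator Batch | 0.7 | 8 | 1 |
MedicalMask estimator | 2.8 | 8 | - |
MedicalMask estimator | 2.8 | 4 | - |
MedicalMask estimator | 3.2 | 1 | - |
MedicalMask estimator Batch | 1.0 | 8 | 8 |
MedicalMask estimator Batch | 1.2 | 8 | 4 |
MedicalMask estimator Batch | 2.8 | 8 | 1 |
CredibilityCheck estimator | 50.7 | 8 | - |
CredibilityCheck estimator | 91.7 | 4 | - |
CredibilityCheck estimator | 221.7 | 1 | - |
CredibilityCheck estimator Batch | 51.8 | 8 | 8 |
CredibilityCheck estimator Batch | 43.8 | 8 | 4 |
CredibilityCheck estimator Batch | 42.1 | 8 | 1 |
FacialHair estimator | 5.3 | 8 | - |
FacialHair estimator | 13.8 | 1 | - |
FacialHair estimator Batch | 4.1 | 8 | 8 |
FacialHair estimator Batch | 5.5 | 8 | 1 |
HeadWear estimator | 3.86 | 1 | 1 |
HeadWear estimator | 2.57 | 8 | 1 |
HeadWear estimator Batch | 1.07 | 8 | 8 |
LivenessFlyingFaces estimator | 2.5 | 8 | - |
LivenessFlyingFaces estimator | 6.8 | 1 | - |
HeadPose by 68 landmarks | 1.1 | - | - |
Warper | 3.1 | - | - |
Face detection (FaceDetV1; easy image / complex image / 6 faces) | 7.2 / 13.6 / 21.9 | 8 | - |
Face detection (FaceDetV1; easy image / complex image / 6 faces) | 8.5 / 18.5 / 32.4 | 4 | - |
Face detection (FaceDetV1; easy image / complex image / 6 faces) | 20.0 / 35.0 / 72.4 | 1 | - |
Face detection (FaceDetV2; easy image / complex image / 6 faces) | 7.2 / 5.9 / 13.4 | 8 | - |
Face detection (FaceDetV2; easy image / complex image / 6 faces) | 8.2 / 6.6 / 15.7 | 4 | - |
Face detection (FaceDetV2; easy image / complex image / 6 faces) | 13.2 / 13.5 / 27.8 | 1 | - |
Face detection (FaceDetV3; minFaceSize = 90/40/20) | 16.3 / 28.7 / 22.5 | 8 | - |
Face detection (FaceDetV3; minFaceSize = 90/40/20) | 20.1 / 28.5 / 34.3 | 4 | - |
Face detection (FaceDetV3; minFaceSize = 90/40/20) | 18.2 / 20.1 / 46.8 | 1 | - |
Human detection (bounding box only; imageSize = 320 / 640) | 19.0 / 21.0 | 8 | - |
Human detection (bounding box only; imageSize = 320 / 640) | 18.6 / 33.0 | 4 | - |
Human detection (bounding box only; imageSize = 320 / 640) | 15.3 / 51.0 | 1 | - |
Human detection (bounding box + keypoints; imageSize = 640) | 30.77 | 8 | - |
Human detection (bounding box + keypoints; imageSize = 640) | 36.47 | 4 | - |
Human detection (bounding box + keypoints; imageSize = 640) | 69.42 | 1 | - |
Descriptor extractor (CNN 46; backend version) | 38.3 | 8 | - |
Descriptor extractor (CNN 46; backend version) | 69.0 | 4 | - |
Descriptor extractor (CNN 46; backend version) | 233.0 | 1 | - |
Descriptor extractor (CNN 46; mobilenet version) | 5.4 | 8 | - |
Descriptor extractor (CNN 46; mobilenet version) | 7.9 | 4 | - |
Descriptor extractor (CNN 46; mobilenet version) | 15.6 | 1 | - |
Descriptor extractor (CNN 52; backend version) | 55.2 | 8 | - |
Descriptor extractor (CNN 52; backend version) | 100.8 | 4 | - |
Descriptor extractor (CNN 52; backend version) | 184.8 | 1 | - |
Descriptor extractor (CNN 52; mobilenet version) | 56.1 | 8 | - |
Descriptor extractor (CNN 52; mobilenet version) | 63.2 | 4 | - |
Descriptor extractor (CNN 52; mobilenet version) | 72.3 | 1 | - |
Descriptor extractor (CNN 54, 56, 57; backend version) | 73.2 | 8 | - |
Descriptor extractor (CNN 54, 56, 57; backend version) | 103.0 | 4 | - |
Descriptor extractor (CNN 54, 56, 57; backend version) | 203.3 | 1 | - |
Descriptor extractor (CNN 58; backend version) | 50.0 | 8 | - |
Descriptor extractor (CNN 58; backend version) | 85.0 | 4 | - |
Descriptor extractor (CNN 58; backend version) | 215.0 | 1 | - |
Descriptor extractor (CNN 54, 56; mobilenet version) | 6.2 | 8 | - |
Descriptor extractor (CNN 54, 56; mobilenet version) | 7.0 | 4 | - |
Descriptor extractor (CNN 54, 56; mobilenet version) | 14.9 | 1 | - |
Descriptor extractor (CNN 59; backend version) | 217 | 1 | 1 |
Descriptor extractor (CNN 60; backend version) | 258.0 | 1 | 1 |
Descriptor extractor (CNN 60; backend version) | 51.1 | 8 | 1 |
Descriptor extractor (CNN 60; backend version) | 42.4 | 8 | 8 |
Descriptor matching (CNN 46, 52) | 46.7 M matches/sec | 1 | 1 |
Descriptor matching (CNN 46, 52) | 57.9 M matches/sec | 1 | 1000 |
Descriptor matching (CNN 54, 56, 57, 58, 59, 60) | 34.6 M matches/sec | 1 | 1000 |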
Note 1: in experiments listed in the table above, face detection and descriptor extraction algorithms used all available CPU cores, whereas matching performance is specified per-core.
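Because matching throughput is quoted per core, a rough estimate of how long one probe descriptor takes to compare against a gallery can be obtained by simple division. The sketch below is only an approximation: the function name `search_time_seconds` is hypothetical, and linear scaling across cores is an assumption that this document does not confirm.

```python
def search_time_seconds(gallery_size, matches_per_sec_per_core, cores=1):
    """Estimate the time to match one probe against `gallery_size` descriptors.

    Assumes throughput scales linearly with the number of cores (an assumption,
    not a measured property). Hypothetical helper, not a FaceEngine API.
    """
    return gallery_size / (matches_per_sec_per_core * cores)


if __name__ == "__main__":
    # 10 million descriptors at 34.6 M matches/sec on a single core.
    print(f"{search_time_seconds(10_000_000, 34.6e6):.3f} s")
```

For example, a gallery of 10 million descriptors at 34.6 M matches/sec corresponds to roughly 0.29 seconds per probe on a single core.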
GPU mode#
"GPU mode performance"
Measurement | Average (ms) | Batch Size |
---|---|---|
Emotion estimator | 1.9 | - |
Attributes estimator | 6.3 | - |
Quality estimator | 0.9 | - |
Overlap estimator | 1.2 | - |
Glasses estimator | 0.9 | - |
AGS estimator | 0.5 | - |
LivenessFlyingFaces estimator | 3.9 | - |
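Eye estimator (useStatusPlan = false) | 0.6 | 1 |
Eye estimator (useStatusPlan = false) | 0.3 | 4 |
Eye estimator (useStatusPlan = false) | 0.3 | 8 |
Eye estimator (useStatusPlan = false) | 0.2 | 16 |
Eye estimator (useStatusPlan = true) | 1.2 | 1 |
Eye estimator (useStatusPlan = true) | 0.6 | 4 |
Eye estimator (useStatusPlan = true) | 0.5 | 8 |
Eye estimator (useStatusPlan = true) | 0.5 | 16 |
Child estimator | 2.6 | 1 |
Child estimator | 1.6 | 4 |
Child estimator | 1.5 | 8 |
Child estimator | 1.4 | 16 |
LivenessIR estimator | 1.3 | 1 |
LivenessIR estimator | 1.0 | 4 |
LivenessIR estimator | 0.9 | 8 |
LivenessIR estimator | 0.9 | 16 |
Smile estimator | 0.8 | - |
HeadPose by image | 0.7 | 1 |
HeadPose by image | 0.2 | 4 |
HeadPose by image | 0.2 | 8 |
HeadPose by image | 0.1 | 16 |
MedicalMask | 3.6 | 1 |
MedicalMask | 1.9 | 4 |
MedicalMask | 1.6 | 8 |
MedicalMask | 1.2 | 16 |
CredibilityCheck | 7.6 | 1 |
CredibilityCheck | 5.2 | 4 |
CredibilityCheck | 4.9 | 8 |
CredibilityCheck | 4.7 | 16 |
FacialHair | 1.8 | 1 |
FacialHair | 0.9 | 8 |
FacialHair | 0.8 | 16 |
HeadWear | 3.98 | 1 |
HeadWear | 0.44 | 16 |
HeadWear | 0.35 | 32 |
Face detection (FaceDetV1; easy image / complex image / 6 faces) | 4.5 / 6.0 / 10.0 | 1 |
Face detection (FaceDetV2; easy image / complex image / 6 faces) | 4.2 / 3.8 / 9.4 | 1 |
Face detection (FaceDetV3; minFaceSize = 90/40/20) | 4.7 / 12.2 / 39.8 | 1 |
Human detection (bounding box only; imageSize = 640) | 7.7 | 1 |
Human detection (bounding box + keypoints; imageSize = 640) | 13.9 | 1 |
Descriptor extractor (CNN 46 backend / CNN 46 mobilenet) | 8.9 / 1.8 | 1 |
Descriptor extractor (CNN 46 backend / CNN 46 mobilenet) | 3.6 / 1.4 | 4 |
Descriptor extractor (CNN 46 backend / CNN 46 mobilenet) | 2.9 / 1.3 | 8 |
Descriptor extractor (CNN 46 backend / CNN 46 mobilenet) | 2.7 / 1.3 | 16 |
Descriptor extractor (CNN 52 backend / CNN 52 mobilenet) | 18.2 / 2.8 | 1 |
Descriptor extractor (CNN 52 backend / CNN 52 mobilenet) | 9.7 / 2.0 | 4 |
Descriptor extractor (CNN 52 backend / CNN 52 mobilenet) | 8.3 / 1.8 | 8 |
Descriptor extractor (CNN 52 backend / CNN 52 mobilenet) | 7.5 / 1.8 | 16 |
Descriptor extractor (CNN 54 backend / CNN 54 mobilenet) | 13.7 / 1.9 | 1 |
Descriptor extractor (CNN 54 backend / CNN 54 mobilenet) | 8.3 / 1.8 | 4 |
Descriptor extractor (CNN 54 backend / CNN 54 mobilenet) | 7.8 / 1.4 | 8 |
Descriptor extractor (CNN 54 backend / CNN 54 mobilenet) | 7.6 / 1.3 | 16 |
Descriptor extractor (CNN 56 backend / CNN 56 mobilenet) | 13.9 | 1 |
Descriptor extractor (CNN 56 backend / CNN 56 mobilenet) | 8.6 | 4 |
Descriptor extractor (CNN 56 backend / CNN 56 mobilenet) | 7.9 | 8 |
Descriptor extractor (CNN 56 backend / CNN 56 mobilenet) | 7.4 | 16 |
Descriptor extractor (CNN 57 backend) | 13.9 | 1 |
Descriptor extractor (CNN 57 backend) | 8.6 | 4 |
Descriptor extractor (CNN 57 backend) | 7.9 | 8 |
Descriptor extractor (CNN 57 backend) | 7.7 | 16 |
Descriptor extractor (CNN 58 backend) | 16.0 | 1 |
Descriptor extractor (CNN 58 backend) | 9.7 | 4 |
Descriptor extractor (CNN 58 backend) | 8.6 | 8 |
Descriptor extractor (CNN 58 backend) | 8.2 | 16 |
Descriptor extractor (CNN 59 backend) | 15.7 | 1 |
Descriptor extractor (CNN 59 backend) | 9.9 | 4 |
Descriptor extractor (CNN 59 backend) | 9.1 | 8 |
Descriptor extractor (CNN 59 backend) | 8.8 | 16 |
Descriptor extractor (CNN 60 backend) | 16.0 | 1 |
Descriptor extractor (CNN 60 backend) | 10.1 | 4 |
Descriptor extractor (CNN 60 backend) | 9.3 | 8 |
Descriptor extractor (CNN 60 backend) | 8.9 | 16 |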
Note. Descriptor matching is only implemented on CPU.
Note. The number of CPU threads is set to 4, so FaceEngine can feed the GPU with commands and data in parallel where possible, further minimizing overhead.
Embedded environment#
Face detection performance depends on input image parameters such as resolution and bit depth as well as the size of the detected face.
Input data characteristics:
- Image resolution: 640x480px;
- Image format: 24 BPP RGB;
- Typical face size: ~260x260px.
Jetson#
Performance measurements below are for Jetson. Measured values are averages of at least 100 experiments. Jetson does not use mobilenet by default.
"Jetson GPU Performance. Detection and estimation"
Type | Average (ms) | Batch Size |
---|---|---|
Detector (FaceDetV1; easy / complex / 6 faces) | 11.9 / 22.3 / 10.5 | 1 |
Detector (FaceDetV2; easy / complex / 6 faces) | 11.36 / 76.8 / 85.3 | 1 |
Detector (FaceDetV3; minFaceSize = 90/40/20) | 30.0 / 50.0 / 231.5 | - |
Human Detection (320px / 640px) | 40.1 / 100.3 | - |
Head Pose By Landmarks | 3.6 | 1 |
Eyes Gaze | 14.7 | 1 |
Emotions | 13.7 | 1 |
Attributes | 32.9 | 1 |
Quality | 1.0 | 1 |
Head Pose By Image | 1.5 | 1 |
Head Pose Batch | 1.6 | 1 |
Head Pose Batch | 0.8 | 4 |
Head Pose Batch | 0.7 | 8 |
Warper | 4.1 | 1 |
Eyes | 1.8 | 1 |
Eyes Batch | 1.8 | 1 |
Eyes Batch | 1.2 | 4 |
Eyes Batch | 1.0 | 8 |
Infra-Red | 3.5 | 1 |
Infra-Red Batch | 3.5 | 1 |
Infra-Red Batch | 2.9 | 4 |
Infra-Red Batch | 2.8 | 8 |
AGS | 1.8 | 1 |
Overlap | 3.6 | 1 |
Glasses | 3.7 | 1 |
Smile | 2.2 | 1 |
Eyes | 3.3 | 1 |
Child | 17.4 | 1 |
Child Batch | 17.3 | 1 |
Child Batch | 12.9 | 4 |
Child Batch | 11.9 | 8 |
Best Shot Quality | 3.5 | 1 |
Best Shot Quality Batch | 3.5 | 1 |
Best Shot Quality Batch | 1.9 | 4 |
Best Shot Quality Batch | 2.2 | 8 |
MedicalMask estimator | 16.8 | - |
MedicalMask estimator Batch | 6.7 | 1 |
MedicalMask estimator Batch | 6.3 | 4 |
MedicalMask estimator Batch | 6.2 | 8 |
FacialHair | 11.06 | 1 |
FacialHair | 9.01 | 16 |
FacialHair | 8.72 | 32 |
"Jetson GPU Performance. Extractor batch"
Type | Model | NumThreads | Average (ms) | Batch Size |
---|---|---|---|---|
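Extractor Batch | 46 | 8 | 71.1 | 1 |
Extractor Batch | 46 | 8 | 67.9 | 4 |
Extractor Batch | 46 | 8 | 55.8 | 8 |
Extractor Batch | 52 | 8 | 108.0 | 1 |
Extractor Batch | 52 | 8 | 75.6 | 4 |
Extractor Batch | 52 | 8 | 76.9 | 8 |
Extractor Batch | 58 | 8 | 98.9 | 1 |
Extractor Batch | 58 | 8 | 81.3 | 4 |
Extractor Batch | 58 | 8 | 73.2 | 8 |
Extractor Batch | 59 | 8 | 99.5 | 1 |
Extractor Batch | 59 | 8 | 81.6 | 4 |
Extractor Batch | 59 | 8 | 73.1 | 8 |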
Descriptor size#
The table below shows the size of serialized descriptors, which can be used to estimate memory requirements.
"Descriptor size"
Descriptor version | Data size (bytes) | Metadata size (bytes) | Total size |
---|---|---|---|
CNN 46 | 256 | 8 | 264 |
CNN 52 | 256 | 8 | 264 |
CNN 54 | 512 | 8 | 520 |
CNN 56 | 512 | 8 | 520 |
CNN 57 | 512 | 8 | 520 |
CNN 58 | 512 | 8 | 520 |
Metadata includes signature and version information that may be omitted during serialization if the NoSignature flag is specified.
When estimating individual descriptor size in memory or serialization storage requirements with default options, consider using values from the "Total size" column.
When estimating memory requirements for descriptor batches, use values from the "Data size" column instead, since a descriptor batch does not duplicate metadata per descriptor and thus is more memory-efficient.
Note: these numbers are for approximate computation only, since they do not include overhead like memory alignment for accelerated SIMD processing and the like.
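The two estimation rules above can be sketched in a few lines of Python. The sizes are copied from the "Descriptor size" table; the function names are hypothetical, and the results are lower bounds only, since alignment and other overhead are not included.

```python
# (data size, metadata size) in bytes, taken from the "Descriptor size" table.
DESCRIPTOR_SIZES = {
    "CNN 46": (256, 8),
    "CNN 52": (256, 8),
    "CNN 54": (512, 8),
    "CNN 56": (512, 8),
    "CNN 57": (512, 8),
    "CNN 58": (512, 8),
}


def individual_storage_bytes(version, count):
    """Estimate for `count` separately stored descriptors ("Total size" column)."""
    data, metadata = DESCRIPTOR_SIZES[version]
    return count * (data + metadata)


def batch_storage_bytes(version, count):
    """Estimate for `count` descriptors held in a batch ("Data size" column)."""
    data, _ = DESCRIPTOR_SIZES[version]
    return count * data


if __name__ == "__main__":
    n = 1_000_000
    print(individual_storage_bytes("CNN 58", n))  # 520000000
    print(batch_storage_bytes("CNN 58", n))       # 512000000
```

For one million CNN 58 descriptors this gives about 520 MB when stored individually and about 512 MB as a batch, before any alignment overhead.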
Feature matrix#
The table below shows FaceEngine features supported by different editions.
"Feature matrix"
Facility | Module | Complete | Frontend |
---|---|---|---|
Core | - | Yes | Yes |
Face detection & alignment | Face detector | Yes | Yes |
Face detection & alignment | 5-point face alignment | Yes | Yes |
Face detection & alignment | 68-point face alignment | Yes | Yes |
Parameter estimation | Attribute estimation | Yes | Yes |
Parameter estimation | Quality estimation | Yes | Yes |
Parameter estimation | Color estimation | Yes | Yes |
Parameter estimation | Eye estimation | Yes | Yes |
Parameter estimation | Head pose estimation | Yes | Yes |
Parameter estimation | Gaze estimation | Yes | Yes |
Parameter estimation | Smile estimation | Yes | Yes |
Parameter estimation | Emotions estimation | Yes | Yes |
Parameter estimation | AGS estimation | Yes | Yes |
Parameter estimation | Glasses estimation | Yes | Yes |
Parameter estimation | Overlap estimation | Yes | Yes |
Face descriptors | Descriptor extraction | Yes | No |
Face descriptors | Descriptor matching | Yes | No |
Face descriptors | Descriptor batching | Yes | No |
Face descriptors | Descriptor search acceleration | Yes | No |
See file "doc/FeatureMap.htm" for more details.