Appendix A. Specifications#

Classification performance#

Classification performance was measured on two datasets:

  • Cooperative dataset (containing 20K images from various sources obtained at several banks);
  • Non-cooperative dataset (containing 20K images).

The two tables below list true positive rates (TPR) at selected false positive rates (FPR).

"Classification performance @ low FPR on cooperative dataset"

| FPR | TPR CNN 46 | TPR CNN 52 | TPR CNN 54 | TPR CNN 56 | TPR CNN 57 | TPR CNN 58 | TPR CNN 46m | TPR CNN 52m | TPR CNN 54m | TPR CNN 56m |
|---|---|---|---|---|---|---|---|---|---|---|
| 10^-7^ | 0.9732 | 0.9835 | 0.9765 | 0.9907 | 0.9906 | 0.9910 | 0.9193 | 0.9594 | 0.9699 | 0.9652 |
| 10^-6^ | 0.9852 | 0.9897 | 0.9849 | 0.9914 | 0.9915 | 0.9916 | 0.9598 | 0.9803 | 0.9829 | 0.9814 |
| 10^-5^ | 0.9892 | 0.9908 | 0.9892 | 0.9916 | 0.9917 | 0.9918 | 0.9790 | 0.9880 | 0.9887 | 0.9886 |
| 10^-4^ | 0.9907 | 0.9915 | 0.9909 | 0.9917 | 0.9918 | 0.9919 | 0.9882 | 0.9908 | 0.9910 | 0.9910 |

"Classification performance @ low FPR on non-cooperative dataset"

| FPR | TPR CNN 46 | TPR CNN 52 | TPR CNN 54 | TPR CNN 56 | TPR CNN 57 | TPR CNN 58 | TPR CNN 46m | TPR CNN 52m | TPR CNN 54m | TPR CNN 56m |
|---|---|---|---|---|---|---|---|---|---|---|
| 10^-7^ | 0.8616 | 0.9487 | 0.9638 | 0.9698 | 0.9723 | 0.9767 | 0.7404 | 0.8664 | 0.8813 | 0.8844 |
| 10^-6^ | 0.9073 | 0.9679 | 0.9773 | 0.9809 | 0.9817 | 0.9839 | 0.8208 | 0.9168 | 0.9233 | 0.9229 |
| 10^-5^ | 0.9446 | 0.9799 | 0.9852 | 0.9871 | 0.9873 | 0.9880 | 0.8919 | 0.9516 | 0.9538 | 0.9561 |
| 10^-4^ | 0.9704 | 0.9877 | 0.9896 | 0.9902 | 0.9905 | 0.9909 | 0.9435 | 0.9736 | 0.9752 | 0.9757 |
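
For readers who want to reproduce this kind of figure on their own data, the sketch below shows one common way to derive TPR at a fixed FPR from two lists of match scores. It is an illustration only, not the procedure used to produce the tables above: the function name `tpr_at_fpr` and the score lists are hypothetical, and the thresholding rule (accept scores strictly above the threshold) is an assumption. Note that resolving an FPR of 10^-7^ needs on the order of ten million impostor comparisons or more.

```python
import math

def tpr_at_fpr(genuine_scores, impostor_scores, target_fprs):
    """Return {fpr: tpr} for each target FPR.

    For each target, the threshold is chosen so that the share of
    impostor scores strictly above it does not exceed the target FPR.
    """
    impostor = sorted(impostor_scores, reverse=True)  # highest scores first
    genuine = list(genuine_scores)
    result = {}
    for fpr in target_fprs:
        # How many impostor comparisons may be accepted at this FPR.
        allowed = math.floor(fpr * len(impostor) + 1e-9)
        # The (allowed + 1)-th highest impostor score becomes the threshold.
        threshold = impostor[allowed] if allowed < len(impostor) else float("-inf")
        accepted = sum(1 for score in genuine if score > threshold)
        result[fpr] = accepted / len(genuine)
    return result

# Toy data (hypothetical): 4 genuine and 4 impostor comparisons.
rates = tpr_at_fpr([0.9, 0.8, 0.7, 0.2], [0.1, 0.3, 0.5, 0.75], [0.25, 0.0])
print(rates)
```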

Runtime performance#

Server environment#

Benchmarking was performed on the following hardware configuration:

  • CPU Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz;
  • RAM 128 GB DDR4 (Clock Speed: 2133 MHz);
  • GPU NVIDIA Tesla T4.

Face detection performance depends on input image parameters such as resolution and bit depth as well as the size of the detected face.

Input data characteristics:

  • Image resolution: 1200x1600px;
  • Image format: 24 BPP RGB;
  • Typical face size: ~260x260px.

Performance measurements are presented for both CPU and GPU execution modes in the tables below. Measured values are averages of at least 100 experiments.
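
A measurement of this kind (an average over at least 100 runs) can be sketched as follows. This is a generic timing harness, not the benchmark code behind the tables: `average_latency_ms` and the `operation` callable are hypothetical names, and the warm-up phase is an assumption that the text above does not mention.

```python
import statistics
import time

def average_latency_ms(operation, runs=100, warmup=10):
    """Call `operation` repeatedly and return its mean latency in milliseconds."""
    # Warm-up calls are not timed (assumption: lets caches and lazy
    # initialization settle before measuring).
    for _ in range(warmup):
        operation()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)

# Stand-in workload; in practice this would wrap a single estimator or detector call.
print(f"{average_latency_ms(lambda: sum(range(10_000))):.3f} ms")
```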

CPU mode with AVX2#

"CPU mode performance"

| Measurement | Average (ms) | CPU threads | Batch size |
|---|---|---|---|
| Gaze estimator | 1.3 | 8 | - |
| | 1.5 | 4 | - |
| | 1.6 | 1 | - |
| Emotion estimator | 8.9 | 8 | - |
| | 11.3 | 4 | - |
| | 22.1 | 1 | - |
| Attributes estimator | 24.1 | 8 | - |
| | 46.8 | 4 | - |
| | 63.5 | 1 | - |
| Quality estimator | 0.34 | 8 | - |
| | 0.4 | 4 | - |
| | 1.0 | 1 | - |
| Overlap estimator | 1.1 | 8 | - |
| | 1.8 | 4 | - |
| | 5.4 | 1 | - |
| Glasses estimator | 0.9 | 8 | - |
| | 1.0 | 4 | - |
| | 2.0 | 1 | - |
| Eye estimator (useStatusPlan = false) | 0.5 | 8 | - |
| | 0.5 | 4 | - |
| | 0.6 | 1 | - |
| Eye estimator (useStatusPlan = true) | 1.2 | 8 | - |
| | 1.0 | 4 | - |
| | 1.2 | 1 | - |
| Eye estimator Batch | 0.3 | 8 | 8 |
| | 0.3 | 8 | 4 |
| | 0.6 | 8 | 1 |
| Smile estimator | 0.75 | 8 | - |
| | 1.1 | 4 | - |
| | 2.5 | 1 | - |
| HeadPose by Image | 0.3 | 8 | - |
| | 0.3 | 8 | - |
| | 0.38 | 8 | - |
| HeadPose by Image Batch | 0.2 | 8 | 8 |
| | 0.3 | 8 | 4 |
| | 0.3 | 8 | 1 |
| AGS estimator | 0.4 | 8 | - |
| | 0.48 | 4 | - |
| | 0.49 | 1 | - |
| Child estimator | 7.8 | 8 | - |
| | 11.1 | 4 | - |
| | 17.0 | 1 | - |
| Child estimator Batch | 4.1 | 8 | 8 |
| | 6.1 | 8 | 4 |
| | 7.6 | 8 | 1 |
| LivenessIR estimator | 2.3 | 8 | - |
| | 3.2 | 4 | - |
| | 7.0 | 1 | - |
| BestShotQuality estimator | 0.7 | 8 | - |
| | 0.7 | 4 | - |
| | 0.8 | 1 | - |
| BestShotQuality estimator Batch | 0.6 | 8 | 8 |
| | 0.6 | 8 | 4 |
| | 0.7 | 8 | 1 |
| MedicalMask estimator | 10.3 | 8 | - |
| | 13.6 | 4 | - |
| | 21.3 | 1 | - |
| MedicalMask estimator Batch | 4.6 | 8 | 8 |
| | 6.2 | 8 | 4 |
| | 9.7 | 8 | 1 |
| CredibilityCheck estimator | 50.7 | 8 | - |
| | 91.7 | 4 | - |
| | 221.7 | 1 | - |
| CredibilityCheck estimator Batch | 51.8 | 8 | 8 |
| | 43.8 | 8 | 4 |
| | 42.1 | 8 | 1 |
| FacialHair estimator | 2.8 | 8 | - |
| | 4.3 | 4 | - |
| | 5.5 | 1 | - |
| FacialHair estimator Batch | 1.6 | 8 | 8 |
| | 1.7 | 8 | 4 |
| | 3.0 | 8 | 1 |
| HeadPose by 68 landmarks | 1.1 | - | - |
| Warper | 3.1 | - | - |
| Face detection (FaceDetV1) (Easy image/complex image/6 faces) | 7.2 / 13.6 / 21.9 | 8 | - |
| | 8.5 / 18.5 / 32.4 | 4 | - |
| | 20.0 / 35.0 / 72.4 | 1 | - |
| Face detection (FaceDetV2) (Easy image/complex image/6 faces) | 7.2 / 5.9 / 13.4 | 8 | - |
| | 8.2 / 6.6 / 15.7 | 4 | - |
| | 13.2 / 13.5 / 27.8 | 1 | - |
| Face detection (FaceDetV3) (minFaceSize = 90/40/20) | 16.3 / 28.7 / 22.5 | 8 | - |
| | 20.1 / 28.5 / 34.3 | 4 | - |
| | 18.2 / 20.1 / 46.8 | 1 | - |
| Human detection (bounding box only) (imageSize = 320 / 640) | 19.0 / 21.0 | 8 | - |
| | 18.6 / 33.0 | 4 | - |
| | 15.3 / 51.0 | 1 | - |
| Human detection (bounding box + keypoints) (imageSize = 640) | 30.77 | 8 | - |
| | 36.47 | 4 | - |
| | 69.42 | 1 | - |
| Descriptor extractor (CNN 46), backend version | 38.3 | 8 | - |
| | 69.0 | 4 | - |
| | 233.0 | 1 | - |
| Descriptor extractor (CNN 46), mobilenet version | 5.4 | 8 | - |
| | 7.9 | 4 | - |
| | 15.6 | 1 | - |
| Descriptor extractor (CNN 52), backend version | 55.2 | 8 | - |
| | 100.8 | 4 | - |
| | 184.8 | 1 | - |
| Descriptor extractor (CNN 52), mobilenet version | 56.1 | 8 | - |
| | 63.2 | 4 | - |
| | 72.3 | 1 | - |
| Descriptor extractor (CNN 54, 56, 57), backend version | 73.2 | 8 | - |
| | 103.0 | 4 | - |
| | 203.3 | 1 | - |
| Descriptor extractor (CNN 58), backend version | 50.0 | 8 | - |
| | 85.0 | 4 | - |
| | 215.0 | 1 | - |
| Descriptor extractor (CNN 54, 56), mobilenet version | 6.2 | 8 | - |
| | 7.0 | 4 | - |
| | 14.9 | 1 | - |
| Descriptor matching (CNN 46, 52) | 46.7 M matches/sec | 1 | 1 |
| | 57.9 M matches/sec | 1 | 1000 |
| Descriptor matching (CNN 54, 56, 57, 58) | 34.6 M matches/sec | 1 | 1 |
| | 41.4 M matches/sec | 1 | 1000 |

Note: in the experiments listed in the table above, the face detection and descriptor extraction algorithms used all available CPU cores, whereas matching performance is specified per core.
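
Because matching throughput is given per core, it can be turned into a rough estimate of how long an exhaustive search over a descriptor database would take. The sketch below is a back-of-the-envelope calculation only: `search_time_ms` is a hypothetical helper, and the assumption that throughput scales linearly with the number of cores is not confirmed by the measurements above.

```python
def search_time_ms(num_descriptors, million_matches_per_sec, cores=1):
    """Estimate the time to match one probe against `num_descriptors` candidates."""
    # Assumption: throughput scales linearly with the number of cores.
    matches_per_sec = million_matches_per_sec * 1_000_000 * cores
    return num_descriptors / matches_per_sec * 1000.0

# 10 million CNN 46/52 descriptors at 46.7 M matches/sec on a single core.
print(f"{search_time_ms(10_000_000, 46.7):.1f} ms")
```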

GPU mode#

"GPU mode performance"

| Measurement | Average (ms) | Batch Size |
|---|---|---|
| Emotion estimator | 1.9 | - |
| Attributes estimator | 6.3 | - |
| Quality estimator | 0.9 | - |
| Overlap estimator | 1.2 | - |
| Glasses estimator | 0.9 | - |
| AGS estimator | 0.5 | - |
| Eye estimator (useStatusPlan = false) | 0.6 | 1 |
| | 0.3 | 4 |
| | 0.3 | 8 |
| | 0.2 | 16 |
| Eye estimator (useStatusPlan = true) | 1.2 | 1 |
| | 0.6 | 4 |
| | 0.5 | 8 |
| | 0.5 | 16 |
| Child estimator | 2.6 | 1 |
| | 1.6 | 4 |
| | 1.5 | 8 |
| | 1.4 | 16 |
| LivenessIR estimator | 1.3 | 1 |
| | 1.0 | 4 |
| | 0.9 | 8 |
| | 0.9 | 16 |
| Smile estimator | 0.8 | - |
| HeadPose by image | 0.7 | 1 |
| | 0.2 | 4 |
| | 0.2 | 8 |
| | 0.1 | 16 |
| MedicalMask | 0.9 | 1 |
| | 0.3 | 4 |
| | 0.2 | 8 |
| | 0.2 | 16 |
| CredibilityCheck | 7.6 | 1 |
| | 5.2 | 4 |
| | 4.9 | 8 |
| | 4.7 | 16 |
| FacialHair | 1.3 | 1 |
| | 0.6 | 4 |
| | 0.5 | 8 |
| | 0.4 | 16 |
| Face detection (FaceDetV1) (Easy image/complex image/6 faces) | 4.5 / 6.0 / 10.0 | 1 |
| Face detection (FaceDetV2) (Easy image/complex image/6 faces) | 4.2 / 3.8 / 9.4 | 1 |
| Face detection (FaceDetV3) (minFaceSize = 90/40/20) | 4.7 / 12.2 / 39.8 | 1 |
| Human detection (bounding box only) (imageSize = 640) | 7.7 | 1 |
| Human detection (bounding box + keypoints) (imageSize = 640) | 13.9 | 1 |
| Descriptor extractor (CNN 46 backend / CNN 46 mobilenet) | 8.9/1.8 | 1 |
| | 3.6/1.4 | 4 |
| | 2.9/1.3 | 8 |
| | 2.7/1.3 | 16 |
| Descriptor extractor (CNN 52 backend / CNN 52 mobilenet) | 18.2/2.8 | 1 |
| | 9.7/2.0 | 4 |
| | 8.3/1.8 | 8 |
| | 7.5/1.8 | 16 |
| Descriptor extractor (CNN 54 backend / CNN 54 mobilenet) | 13.7/1.9 | 1 |
| | 8.3/1.8 | 4 |
| | 7.8/1.4 | 8 |
| | 7.6/1.3 | 16 |
| Descriptor extractor (CNN 56 backend / CNN 56 mobilenet) | 13.9 | 1 |
| | 8.6 | 4 |
| | 7.9 | 8 |
| | 7.4 | 16 |
| Descriptor extractor (CNN 57 backend) | 13.9 | 1 |
| | 8.6 | 4 |
| | 7.9 | 8 |
| | 7.7 | 16 |
| Descriptor extractor (CNN 58 backend) | 16.0 | 1 |
| | 9.7 | 4 |
| | 8.6 | 8 |
| | 8.2 | 16 |

Note. Descriptor matching is only implemented on CPU.

Note. The number of CPU threads is set to 4, so FaceEngine can feed the GPU with commands and data in parallel where possible, further minimizing overhead.
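
The latencies above can be converted into throughput figures when sizing a deployment. The sketch below assumes that "Average (ms)" in the batched rows is the time per image rather than per batch; the values shrink as the batch grows, which suggests this reading, but the text does not state it. Both helper names are hypothetical.

```python
def images_per_second(avg_ms_per_image):
    """Throughput implied by a per-image latency in milliseconds."""
    return 1000.0 / avg_ms_per_image

def batching_speedup(avg_ms_batch_1, avg_ms_batched):
    """How many times faster batched processing is than batch size 1."""
    return avg_ms_batch_1 / avg_ms_batched

# CNN 58 backend descriptor extraction on GPU: 16.0 ms at batch size 1, 8.2 ms at batch size 16.
print(f"{images_per_second(16.0):.1f} img/s at batch size 1")
print(f"{images_per_second(8.2):.1f} img/s at batch size 16")
print(f"speedup from batching: {batching_speedup(16.0, 8.2):.2f}x")
```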

Embedded environment#

Face detection performance depends on input image parameters such as resolution and bit depth as well as the size of the detected face.

Input data characteristics:

  • Image resolution: 640x480px;
  • Image format: 24 BPP RGB;
  • Typical face size: ~260x260px.

Jetson#

Jetson does not use mobilenet by default.

The tables below present performance measurements for Jetson. Measured values are averages of at least 100 experiments.

"Jetson GPU Performance. Detection and estimation"

| Type | Average (ms) | Batch Size |
|---|---|---|
| Detector (FaceDetV1) (Easy/complex/6 faces) | 11.9 / 22.3 / 10.5 | 1 |
| Detector (FaceDetV2) (Easy/complex/6 faces) | 11.36 / 76.8 / 85.3 | 1 |
| Detector (FaceDetV3) (minFaceSize=90/40/20) | 30.0 / 50.0 / 231.5 | - |
| Human Detection (320px/640px) | 40.1 / 100.3 | - |
| Head Pose By Landmarks | 3.6 | 1 |
| Eyes Gaze | 14.7 | 1 |
| Emotions | 13.7 | 1 |
| Attributes | 32.9 | 1 |
| Quality | 1.0 | 1 |
| Head Pose By Image | 1.5 | 1 |
| Head Pose Batch | 1.6 | 1 |
| | 0.8 | 4 |
| | 0.7 | 8 |
| Warper | 4.1 | 1 |
| Eyes | 1.8 | 1 |
| Eyes Batch | 1.8 | 1 |
| | 1.2 | 4 |
| | 1.0 | 8 |
| Infra-Red | 3.5 | 1 |
| Infra-Red Batch | 3.5 | 1 |
| | 2.9 | 4 |
| | 2.8 | 8 |
| AGS | 1.8 | 1 |
| Overlap | 3.6 | 1 |
| Glasses | 3.7 | 1 |
| Smile | 2.2 | 1 |
| Eyes | 3.3 | 1 |
| Child | 17.4 | 1 |
| Child Batch | 17.3 | 1 |
| | 12.9 | 4 |
| | 11.9 | 8 |
| Best Shot Quality | 3.5 | 1 |
| Best Shot Quality Batch | 3.5 | 1 |
| | 1.9 | 4 |
| | 2.2 | 8 |
| MedicalMask estimator | 7.5 | - |
| MedicalMask estimator Batch | 8.0 | 1 |
| | 4.0 | 4 |
| | 2.8 | 8 |

"Jetson GPU Performance. Extractor batch"

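| Type | Model | NumThreads | Average (ms) | Batch Size |
|---|---|---|---|---|
| Extractor Batch | 46 | 8 | 71.1 | 1 |
| | 46 | 8 | 67.9 | 4 |
| | 46 | 8 | 55.8 | 8 |
| | 52 | 8 | 108.0 | 1 |
| | 52 | 8 | 75.6 | 4 |
| | 52 | 8 | 76.9 | 8 |
| Extractor Batch | 54 | 8 | 76.3 | 1 |
| | 54 | 8 | 63.0 | 4 |
| | 54 | 8 | 62.2 | 8 |
| | 56 | 8 | 75.6 | 1 |
| | 56 | 8 | 68.6 | 4 |
| | 56 | 8 | 64.5 | 8 |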

"Jetson ARM Performance. Detection and estimation"

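| Type | Threads | Average (ms) | Batch size |
|---|---|---|---|
| Detector (FaceDetV1) (Easy/complex/6 faces) | 1 | 73.5 / 275.6 / 130.2 | - |
| | 4 | 43.9 / 134.9 / 73.0 | - |
| | 8 | 39.6 / 112.1 / 62.6 | - |
| Detector (FaceDetV2) (Easy/complex/6 faces) | 1 | 38.3 / 141.2 / 32.6 | - |
| | 4 | 24.8 / 78.8 / 22.6 | - |
| | 8 | 23.7 / 65.8 / 21.6 | - |
| Detector (FaceDetV3) (minFaceSize=90/40/20) | 1 | 138.0 / 101.4 / 472.8 | - |
| | 4 | 77.7 / 168.0 / 503.7 | - |
| | 8 | 115.1 / 117.8 / 346.8 | - |
| Human detection (320px/640px) | 1 | 80.0 / 500.7 | - |
| | 4 | 72.2 / 440.0 | - |
| | 8 | 65.9 / 333.7 | - |
| Head Pose By Landmarks | 1 | 3.5 | - |
| | 4 | 3.6 | - |
| | 8 | 3.6 | - |
| Eyes Gaze | 1 | 25.7 | - |
| | 4 | 72.1 | - |
| | 8 | 34.8 | - |
| Emotions | 1 | 294.3 | - |
| | 4 | 154.1 | - |
| | 8 | 127.5 | - |
| Attributes | 1 | 1621.9 | - |
| | 4 | 703.3 | - |
| | 8 | 513.5 | - |
| Quality | 1 | 8.6 | - |
| | 4 | 7.0 | - |
| | 8 | 7.8 | - |
| Head Pose By Image | 1 | 3.1 | - |
| | 4 | 2.4 | - |
| | 8 | 2.3 | - |
| Head Pose Batch | 8 | 2.3 | 1 |
| | 8 | 1.5 | 4 |
| | 8 | 1.4 | 8 |
| Warper | 1 | 4.3 | - |
| | 4 | 4.3 | - |
| | 8 | 4.2 | - |
| Eyes | 1 | 6.4 | - |
| | 4 | 4.3 | - |
| | 8 | 4.2 | - |
| Eyes Batch | 8 | 4.2 | 1 |
| | 8 | 3.5 | 4 |
| | 8 | 3.4 | 8 |
| Infra-Red | 1 | 38.0 | - |
| | 4 | 19.7 | - |
| | 8 | 17.3 | - |
| Infra-Red Batch | 8 | 17.3 | 1 |
| | 8 | 17.6 | 4 |
| | 8 | 16.7 | 8 |
| AGS | 1 | 3.4 | - |
| | 4 | 2.5 | - |
| | 8 | 2.4 | - |
| Overlap | 1 | 55.2 | - |
| | 4 | 29.0 | - |
| | 8 | 24.3 | - |
| Glasses | 1 | 37.7 | - |
| | 4 | 20.5 | - |
| | 8 | 17.7 | - |
| Smile | 1 | 30.0 | - |
| | 4 | 16.3 | - |
| | 8 | 13.9 | - |
| Eyes | 1 | 12.9 | - |
| | 4 | 8.6 | - |
| | 8 | 8.3 | - |
| Child | 1 | 287.8 | - |
| | 4 | 174.0 | - |
| | 8 | 185.8 | - |
| Child Batch | 8 | 138.3 | 1 |
| | 8 | 126.3 | 4 |
| | 8 | 113.5 | 8 |
| Best Shot Quality | 1 | 6.6 | - |
| | 4 | 5.0 | - |
| | 8 | 4.8 | - |
| Best Shot Quality Batch | 8 | 4.8 | 1 |
| | 8 | 3.3 | 4 |
| | 8 | 3.0 | 8 |
| MedicalMask estimator | 1 | 16.0 | - |
| | 4 | 6.0 | - |
| | 8 | 24.5 | - |
| MedicalMask estimator Batch | 8 | 24.5 | 1 |
| | 8 | 8.9 | 4 |
| | 8 | 6.2 | 8 |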

"Jetson ARM Performance. Extractor and matcher"

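| Measurement | Model | Threads | Average | Units |
|---|---|---|---|---|
| Extractor | 46 | 1 | 2558.8 | ms |
| | 46 | 4 | 1203.4 | ms |
| | 46 | 8 | 875.4 | ms |
| | 52 | 1 | 4380.3 | ms |
| | 52 | 4 | 1923.9 | ms |
| | 52 | 8 | 1464.1 | ms |
| Matcher | 46, 52 | - | 2.0 | M matches/sec |
| Extractor | 54 | 1 | 2056.2 | ms |
| | 54 | 4 | 1474.3 | ms |
| | 54 | 8 | 1521.9 | ms |
| | 56 | 1 | 2080.6 | ms |
| | 56 | 4 | 1502.4 | ms |
| | 56 | 8 | 1644.7 | ms |
| Matcher | 54, 56 | - | 1.0 | M matches/sec |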
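
The per-thread figures in the Jetson ARM tables above do not always improve as threads are added. One way to quantify this is to compute speedup and parallel efficiency relative to the single-thread result. The sketch below uses the standard definitions; the helper name `thread_scaling` is hypothetical.

```python
def thread_scaling(latency_1_thread_ms, latency_n_threads_ms, n_threads):
    """Return (speedup, parallel efficiency) relative to a single thread."""
    speedup = latency_1_thread_ms / latency_n_threads_ms
    efficiency = speedup / n_threads  # 1.0 would mean perfect linear scaling
    return speedup, efficiency

# CNN 46 extractor on Jetson ARM: 2558.8 ms with 1 thread, 875.4 ms with 8 threads.
speedup, efficiency = thread_scaling(2558.8, 875.4, 8)
print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
```

A speedup below 1.0 between two thread counts (for example, CNN 54 going from 4 to 8 threads in the table above) means the extra threads made the operation slower.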

Descriptor size#

The table below shows the size of serialized descriptors, which can be used to estimate memory requirements.

"Descriptor size"

| Descriptor version | Data size (bytes) | Metadata size (bytes) | Total size (bytes) |
|---|---|---|---|
| CNN 46 | 256 | 8 | 264 |
| CNN 52 | 256 | 8 | 264 |
| CNN 54 | 512 | 8 | 520 |
| CNN 56 | 512 | 8 | 520 |
| CNN 57 | 512 | 8 | 520 |
| CNN 58 | 512 | 8 | 520 |

Metadata includes signature and version information that may be omitted during serialization if the NoSignature flag is specified.

When estimating individual descriptor size in memory or serialization storage requirements with default options, consider using values from the "Total size" column.

When estimating memory requirements for descriptor batches, use values from the "Data size" column instead, since a descriptor batch does not duplicate metadata per descriptor and thus is more memory-efficient.

Note: these numbers are suitable for approximate calculations only, since they do not include overhead such as memory alignment for accelerated SIMD processing.
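
The two estimation rules above can be expressed as a short calculation. This is a sketch for capacity planning only: the function names are hypothetical, the sizes are copied from the "Descriptor size" table, and, as noted above, alignment and other overhead are not included.

```python
# Descriptor version -> (data bytes, metadata bytes), from the "Descriptor size" table.
DESCRIPTOR_SIZES = {
    46: (256, 8), 52: (256, 8),
    54: (512, 8), 56: (512, 8), 57: (512, 8), 58: (512, 8),
}

def individual_storage_bytes(version, count):
    """Descriptors stored one by one with default options: data plus metadata each."""
    data, metadata = DESCRIPTOR_SIZES[version]
    return count * (data + metadata)

def batch_storage_bytes(version, count):
    """Descriptors held in a batch: metadata is not duplicated per descriptor."""
    data, _ = DESCRIPTOR_SIZES[version]
    return count * data

count = 1_000_000
print(individual_storage_bytes(58, count) / 1e6, "MB as individual descriptors")
print(batch_storage_bytes(58, count) / 1e6, "MB as a batch")
```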

Feature matrix#

The table below shows FaceEngine features supported by different editions.

"Feature matrix"

| Facility | Module | Complete | Frontend |
|---|---|---|---|
| Core | | Yes | Yes |
| Face detection & alignment | Face detector | Yes | Yes |
| | 5-point face alignment | Yes | Yes |
| | 68-point face alignment | Yes | Yes |
| Parameter estimation | Attribute estimation | Yes | Yes |
| | Quality estimation | Yes | Yes |
| | Color estimation | Yes | Yes |
| | Eye estimation | Yes | Yes |
| | Head pose estimation | Yes | Yes |
| | Gaze estimation | Yes | Yes |
| | Smile estimation | Yes | Yes |
| | Emotions estimation | Yes | Yes |
| | AGS estimation | Yes | Yes |
| | Glasses estimation | Yes | Yes |
| | Overlap estimation | Yes | Yes |
| Face descriptors | Descriptor extraction | Yes | No |
| | Descriptor matching | Yes | No |
| | Descriptor batching | Yes | No |
| | Descriptor search acceleration | Yes | No |

See file "doc/FeatureMap.htm" for more details.