Appendix A. Specifications

Classification performance

Classification performance was measured on two datasets:

Cooperative dataset (20K images from various sources, obtained at several banks);
Non-cooperative dataset (20K images).

The two tables below list true positive rates (TPR) at selected false positive rates (FPR).

"Classification performance @ low FPR on cooperative dataset"

FPR	TPR CNN 46	TPR CNN 52	TPR CNN 54	TPR CNN 56	TPR CNN 57	TPR CNN 58	TPR CNN 46m	TPR CNN 52m	TPR CNN 54m	TPR CNN 56m
10^-7	0.9732	0.9835	0.9765	0.9907	0.9906	0.9910	0.9193	0.9594	0.9699	0.9652
10^-6	0.9852	0.9897	0.9849	0.9914	0.9915	0.9916	0.9598	0.9803	0.9829	0.9814
10^-5	0.9892	0.9908	0.9892	0.9916	0.9917	0.9918	0.9790	0.9880	0.9887	0.9886
10^-4	0.9907	0.9915	0.9909	0.9917	0.9918	0.9919	0.9882	0.9908	0.9910	0.9910

"Classification performance @ low FPR on non-cooperative dataset"

FPR	TPR CNN 46	TPR CNN 52	TPR CNN 54	TPR CNN 56	TPR CNN 57	TPR CNN 58	TPR CNN 46m	TPR CNN 52m	TPR CNN 54m	TPR CNN 56m
10^-7	0.8616	0.9487	0.9638	0.9698	0.9723	0.9767	0.7404	0.8664	0.8813	0.8844
10^-6	0.9073	0.9679	0.9773	0.9809	0.9817	0.9839	0.8208	0.9168	0.9233	0.9229
10^-5	0.9446	0.9799	0.9852	0.9871	0.9873	0.9880	0.8919	0.9516	0.9538	0.9561
10^-4	0.9704	0.9877	0.9896	0.9902	0.9905	0.9909	0.9435	0.9736	0.9752	0.9757
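Each table entry is a TPR measured at a fixed FPR operating point. The Python sketch below shows one common way to compute such a point from similarity scores; it is an illustration only, not necessarily the evaluation protocol used for the measurements above. The function name `tpr_at_fpr` and its inputs (lists of genuine-pair and impostor-pair similarity scores, where a higher score means more similar) are assumptions. The threshold is chosen so that at most `target_fpr` of impostor pairs score above it, and the TPR is the fraction of genuine pairs above that same threshold.

```python
import math


def tpr_at_fpr(genuine, impostor, target_fpr):
    """Return (threshold, tpr) for a fixed false positive rate.

    Hypothetical helper: `genuine` holds similarity scores of same-person
    pairs, `impostor` holds scores of different-person pairs. A pair is
    accepted when its score is strictly greater than the threshold.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    ranked = sorted(impostor, reverse=True)
    # Maximum number of impostor pairs that may be accepted at this FPR.
    allowed = math.floor(target_fpr * len(ranked))
    # Scores at index >= allowed are <= ranked[allowed], so at most
    # `allowed` impostor scores lie strictly above this threshold.
    threshold = ranked[allowed] if allowed < len(ranked) else -math.inf
    accepted = sum(1 for score in genuine if score > threshold)
    return threshold, accepted / len(genuine)


genuine = [0.9, 0.8, 0.75, 0.4]
impostor = [0.1, 0.2, 0.3, 0.85, 0.5, 0.05, 0.15, 0.25, 0.35, 0.45]
print(tpr_at_fpr(genuine, impostor, 0.1))  # (0.5, 0.75)
```

With this definition, fewer than 10^7 impostor comparisons make `allowed` equal to 0 at an FPR of 10^-7, so the threshold collapses to the highest impostor score; operating points that low require very large impostor sets.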

Runtime performance

Server environment

Benchmarking was performed on the following hardware configuration:

CPU: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz;
RAM: 128 GB DDR4 (clock speed: 2133 MHz);
GPU: NVIDIA Tesla T4.

Face detection performance depends on input image parameters, such as resolution and bit depth, as well as on the size of the detected face.

Input data characteristics:

Image resolution: 1200x1600px;
Image format: 24 BPP RGB;
Typical face size: ~260x260px.

Performance measurements are presented for both CPU and GPU execution modes in the tables below. Measured values are averages of at least 100 experiments.
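Reported values are means over repeated runs. A minimal Python sketch of that kind of measurement follows; `average_latency_ms` and its untimed warm-up calls are assumptions for illustration, not a description of the harness that produced these tables. `fn` stands for whatever operation is being timed, for example a single detector or estimator call.

```python
import statistics
import time


def average_latency_ms(fn, runs=100, warmup=5):
    """Mean wall-clock latency of fn() in milliseconds over `runs` timed calls.

    Hypothetical helper. The `warmup` calls are not timed, so one-time costs
    (lazy initialization, cache warm-up) do not skew the average.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.fmean(samples)


# Example: time a trivial workload 100 times.
print(f"{average_latency_ms(lambda: sum(range(10_000))):.3f} ms")
```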

CPU mode with AVX2

"CPU mode performance"

Measurement	Average (ms)	CPU threads	BatchSize
Gaze estimator	1.3	8	-
	1.5	4	-
	1.6	1	-
Emotion estimator	8.9	8	-
	11.3	4	-
	22.1	1	-
Attributes estimator	24.1	8	-
	46.8	4	-
	63.5	1	-
Quality estimator	0.34	8	-
	0.4	4	-
	1.0	1	-
Overlap estimator	1.1	8	-
	1.8	4	-
	5.4	1	-
Glasses estimator	0.9	8	-
	1.0	4	-
	2.0	1	-
Eye estimator (useStatusPlan = false)	0.5	8	-
	0.5	4	-
	0.6	1	-
Eye estimator (useStatusPlan = true)	1.2	8	-
	1.0	4	-
	1.2	1	-
Eye estimator Batch	0.3	8	8
	0.3	8	4
	0.6	8	1
Smile estimator	0.75	8	-
	1.1	4	-
	2.5	1	-
HeadPose by Image	0.3	8	-
	0.3	4	-
	0.38	1	-
HeadPose by Image Batch	0.2	8	8
	0.3	8	4
	0.3	8	1
AGS estimator	0.4	8	-
	0.48	4	-
	0.49	1	-
Child estimator	7.8	8	-
	11.1	4	-
	17.0	1	-
Child estimator Batch	4.1	8	8
	6.1	8	4
	7.6	8	1
LivenessIR estimator	2.3	8	-
	3.2	4	-
	7.0	1	-
BestShotQuality estimator	0.7	8	-
	0.7	4	-
	0.8	1	-
BestShotQuality estimator Batch	0.6	8	8
	0.6	8	4
	0.7	8	1
MedicalMask estimator	10.3	8	-
	13.6	4	-
	21.3	1	-
MedicalMask estimator Batch	4.6	8	8
	6.2	8	4
	9.7	8	1
CredibilityCheck estimator	50.7	8	-
	91.7	4	-
	221.7	1	-
CredibilityCheck estimator Batch	51.8	8	8
	43.8	8	4
	42.1	8	1
FacialHair estimator	2.8	8	-
	4.3	4	-
	5.5	1	-
FacialHair estimator Batch	1.6	8	8
	1.7	8	4
	3.0	8	1
HeadPose by 68 landmarks	1.1	-	-
Warper	3.1	-	-
Face detection (FaceDetV1; easy image / complex image / 6 faces)	7.2 / 13.6 / 21.9	8	-
	8.5 / 18.5 / 32.4	4	-
	20.0 / 35.0 / 72.4	1	-
Face detection (FaceDetV2; easy image / complex image / 6 faces)	7.2 / 5.9 / 13.4	8	-
	8.2 / 6.6 / 15.7	4	-
	13.2 / 13.5 / 27.8	1	-
Face detection (FaceDetV3; minFaceSize = 90 / 40 / 20)	16.3 / 28.7 / 22.5	8	-
	20.1 / 28.5 / 34.3	4	-
	18.2 / 20.1 / 46.8	1	-
Human detection (bounding box only; imageSize = 320 / 640)	19.0 / 21.0	8	-
	18.6 / 33.0	4	-
	15.3 / 51.0	1	-
Human detection (bounding box + keypoints; imageSize = 640)	30.77	8	-
	36.47	4	-
	69.42	1	-
Descriptor extractor (CNN 46, backend version)	38.3	8	-
	69.0	4	-
	233.0	1	-
Descriptor extractor (CNN 46, Mobilenet version)	5.4	8	-
	7.9	4	-
	15.6	1	-
Descriptor extractor (CNN 52, backend version)	55.2	8	-
	100.8	4	-
	184.8	1	-
Descriptor extractor (CNN 52, Mobilenet version)	56.1	8	-
	63.2	4	-
	72.3	1	-
Descriptor extractor (CNN 54, 56, 57, backend version)	73.2	8	-
	103.0	4	-
	203.3	1	-
Descriptor extractor (CNN 58, backend version)	50.0	8	-
	85.0	4	-
	215.0	1	-
Descriptor extractor (CNN 54, 56, Mobilenet version)	6.2	8	-
	7.0	4	-
	14.9	1	-
Descriptor matching (CNN 46, 52)	46.7 M matches/sec	1	1
	57.9 M matches/sec	1	1000
Descriptor matching (CNN 54, 56, 57, 58)	34.6 M matches/sec	1	1
	41.4 M matches/sec	1	1000

Note: in the experiments listed in the table above, face detection and descriptor extraction algorithms used all available CPU cores, whereas matching performance is specified per core.
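Because matching throughput is specified per core, it can be turned into a rough estimate of brute-force 1:N search time. The sketch below assumes ideal linear scaling across cores, which real systems may not reach, and the function name is hypothetical. For example, at the batched CNN 54/56/57/58 rate of 41.4 M matches/sec, one core needs roughly 24 ms to compare a probe against 1,000,000 descriptors.

```python
def search_time_ms(gallery_size, matches_per_sec_per_core, cores=1):
    """Rough brute-force 1:N search time in milliseconds.

    Hypothetical helper; assumes the work splits evenly and scales linearly
    across `cores`, and ignores any other per-query overhead.
    """
    if gallery_size < 0 or matches_per_sec_per_core <= 0 or cores < 1:
        raise ValueError("invalid arguments")
    return gallery_size / (matches_per_sec_per_core * cores) * 1000.0


# CNN 54, 56, 57, 58 with batch size 1000: 41.4 M matches/sec per core.
print(round(search_time_ms(1_000_000, 41.4e6), 1))           # 24.2
print(round(search_time_ms(1_000_000, 41.4e6, cores=8), 1))  # 3.0
```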

GPU mode

"GPU mode performance"

Measurement	Average (ms)	Batch Size
Emotion estimator	1.9	-
Attributes estimator	6.3	-
Quality estimator	0.9	-
Overlap estimator	1.2	-
Glasses estimator	0.9	-
AGS estimator	0.5	-
Eye estimator (useStatusPlan = false)	0.6	1
	0.3	4
	0.3	8
	0.2	16
Eye estimator (useStatusPlan = true)	1.2	1
	0.6	4
	0.5	8
	0.5	16
Child estimator	2.6	1
	1.6	4
	1.5	8
	1.4	16
LivenessIR estimator	1.3	1
	1.0	4
	0.9	8
	0.9	16
Smile estimator	0.8	-
HeadPose by image	0.7	1
	0.2	4
	0.2	8
	0.1	16
MedicalMask	0.9	1
	0.3	4
	0.2	8
	0.2	16
CredibilityCheck	7.6	1
	5.2	4
	4.9	8
	4.7	16
FacialHair	1.3	1
	0.6	4
	0.5	8
	0.4	16
Face detection (FaceDetV1; easy image / complex image / 6 faces)	4.5 / 6.0 / 10.0	1
Face detection (FaceDetV2; easy image / complex image / 6 faces)	4.2 / 3.8 / 9.4	1
Face detection (FaceDetV3; minFaceSize = 90 / 40 / 20)	4.7 / 12.2 / 39.8	1
Human detection (bounding box only; imageSize = 640)	7.7	1
Human detection (bounding box + keypoints; imageSize = 640)	13.9	1
Descriptor extractor (CNN 46 backend / CNN 46 mobilenet)	8.9 / 1.8	1
	3.6 / 1.4	4
	2.9 / 1.3	8
	2.7 / 1.3	16
Descriptor extractor (CNN 52 backend / CNN 52 mobilenet)	18.2 / 2.8	1
	9.7 / 2.0	4
	8.3 / 1.8	8
	7.5 / 1.8	16
Descriptor extractor (CNN 54 backend / CNN 54 mobilenet)	13.7 / 1.9	1
	8.3 / 1.8	4
	7.8 / 1.4	8
	7.6 / 1.3	16
Descriptor extractor (CNN 56 backend / CNN 56 mobilenet)	13.9	1
	8.6	4
	7.9	8
	7.4	16
Descriptor extractor (CNN 57 backend)	13.9	1
	8.6	4
	7.9	8
	7.7	16
Descriptor extractor (CNN 58 backend)	16.0	1
	9.7	4
	8.6	8
	8.2	16

Note. Descriptor matching is only implemented on CPU.

Note. In GPU mode, the number of CPU threads is set to 4 so that FaceEngine can feed the GPU with commands and data in parallel where possible, further minimizing overhead.

Embedded environment

Face detection performance depends on input image parameters, such as resolution and bit depth, as well as on the size of the detected face.

Input data characteristics:

Image resolution: 640x480px;
Image format: 24 BPP RGB;
Typical face size: ~260x260px.

Jetson

Performance measurements for Jetson are presented below. Measured values are averages of at least 100 experiments. Mobilenet is not used on Jetson by default.

"Jetson GPU Performance. Detection and estimation"

Type	Average (ms)	Batch Size
Detector (FaceDetV1; easy / complex / 6 faces)	11.9 / 22.3 / 10.5	1
Detector (FaceDetV2; easy / complex / 6 faces)	11.36 / 76.8 / 85.3	1
Detector (FaceDetV3; minFaceSize = 90 / 40 / 20)	30.0 / 50.0 / 231.5	-
Human detection (320 px / 640 px)	40.1 / 100.3	-
Head Pose By Landmarks	3.6	1
Eyes Gaze	14.7	1
Emotions	13.7	1
Attributes	32.9	1
Quality	1.0	1
Head Pose By Image	1.5	1
Head Pose Batch	1.6	1
	0.8	4
	0.7	8
Warper	4.1	1
Eyes	1.8	1
Eyes Batch	1.8	1
	1.2	4
	1.0	8
Infra-Red	3.5	1
Infra-Red Batch	3.5	1
	2.9	4
	2.8	8
AGS	1.8	1
Overlap	3.6	1
Glasses	3.7	1
Smile	2.2	1
Eyes	3.3	1
Child	17.4	1
Child Batch	17.3	1
	12.9	4
	11.9	8
Best Shot Quality	3.5	1
Best Shot Quality Batch	3.5	1
	1.9	4
	2.2	8
MedicalMask estimator	7.5	-
MedicalMask estimator Batch	8.0	1
	4.0	4
	2.8	8

"Jetson GPU Performance. Extractor batch"

Type	Model	NumThreads	Average (ms)	Batch Size
Extractor Batch	46	8	71.1	1
	46	8	67.9	4
	46	8	55.8	8
	52	8	108.0	1
	52	8	75.6	4
	52	8	76.9	8

Extractor Batch	54	8	76.3	1
	54	8	63.0	4
	54	8	62.2	8
	56	8	75.6	1
	56	8	68.6	4
	56	8	64.5	8

"Jetson ARM Performance. Detection and estimation"

Type	Threads	Average (ms)	BatchSize
Detector (FaceDetV1; easy / complex / 6 faces)	1	73.5 / 275.6 / 130.2	-
	4	43.9 / 134.9 / 73.0	-
	8	39.6 / 112.1 / 62.6	-
Detector (FaceDetV2; easy / complex / 6 faces)	1	38.3 / 141.2 / 32.6	-
	4	24.8 / 78.8 / 22.6	-
	8	23.7 / 65.8 / 21.6	-
Detector (FaceDetV3; minFaceSize = 90 / 40 / 20)	1	138.0 / 101.4 / 472.8	-
	4	77.7 / 168.0 / 503.7	-
	8	115.1 / 117.8 / 346.8	-
Human detection (320 px / 640 px)	1	80.0 / 500.7	-
	4	72.2 / 440.0	-
	8	65.9 / 333.7	-
Head Pose By Landmarks	1	3.5	-
	4	3.6	-
	8	3.6	-
Eyes Gaze	1	25.7	-
	4	72.1	-
	8	34.8	-
Emotions	1	294.3	-
	4	154.1	-
	8	127.5	-
Attributes	1	1621.9	-
	4	703.3	-
	8	513.5	-
Quality	1	8.6	-
	4	7.0	-
	8	7.8	-
Head Pose By Image	1	3.1	-
	4	2.4	-
	8	2.3	-
Head Pose Batch	8	2.3	1
	8	1.5	4
	8	1.4	8
Warper	1	4.3	-
	4	4.3	-
	8	4.2	-
Eyes	1	6.4	-
	4	4.3	-
	8	4.2	-
Eyes Batch	8	4.2	1
	8	3.5	4
	8	3.4	8
Infra-Red	1	38.0	-
	4	19.7	-
	8	17.3	-
Infra-Red Batch	8	17.3	1
	8	17.6	4
	8	16.7	8
AGS	1	3.4	-
	4	2.5	-
	8	2.4	-
Overlap	1	55.2	-
	4	29.0	-
	8	24.3	-
Glasses	1	37.7	-
	4	20.5	-
	8	17.7	-
Smile	1	30.0	-
	4	16.3	-
	8	13.9	-
Eyes	1	12.9	-
	4	8.6	-
	8	8.3	-
Child	1	287.8	-
	4	174.0	-
	8	185.8	-
Child Batch	8	138.3	1
	8	126.3	4
	8	113.5	8
Best Shot Quality	1	6.6	-
	4	5.0	-
	8	4.8	-
Best Shot Quality Batch	8	4.8	1
	8	3.3	4
	8	3.0	8
MedicalMask estimator	1	16.0	-
	4	6.0	-
	8	24.5	-
MedicalMask estimator Batch	8	24.5	1
	8	8.9	4
	8	6.2	8

"Jetson ARM Performance. Extractor and matcher"

Measurement	Model	Threads	Average	Units
Extractor	46	1	2558.8	ms
	46	4	1203.4	ms
	46	8	875.4	ms
	52	1	4380.3	ms
	52	4	1923.9	ms
	52	8	1464.1	ms
Matcher	46, 52	-	2.0 M	matches/sec

Extractor	54	1	2056.2	ms
	54	4	1474.3	ms
	54	8	1521.9	ms
	56	1	2080.6	ms
	56	4	1502.4	ms
	56	8	1644.7	ms
Matcher	54, 56	-	1.0 M	matches/sec

Descriptor size

The table below shows the size of serialized descriptors, which can be used to estimate memory requirements.

"Descriptor size"

Descriptor version	Data size (bytes)	Metadata size (bytes)	Total size (bytes)
CNN 46	256	8	264
CNN 52	256	8	264
CNN 54	512	8	520
CNN 56	512	8	520
CNN 57	512	8	520
CNN 58	512	8	520

Metadata includes signature and version information that may be omitted during serialization if the NoSignature flag is specified.

When estimating the in-memory size of an individual descriptor, or the storage required for descriptors serialized with default options, use the values from the "Total size" column.

When estimating memory requirements for descriptor batches, use values from the "Data size" column instead, since a descriptor batch does not duplicate metadata per descriptor and thus is more memory-efficient.

Note: these numbers are for approximate calculations only, since they do not include overhead such as memory alignment for accelerated SIMD processing.
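The sizing rules above reduce to simple arithmetic. In the sketch below, the byte counts come from the "Descriptor size" table, while the function names are hypothetical and, as the note says, alignment and other overhead are ignored. Setting `no_signature=True` models serialization with the NoSignature flag, where the 8-byte metadata is omitted.

```python
# Byte counts from the "Descriptor size" table.
DATA_BYTES = {46: 256, 52: 256, 54: 512, 56: 512, 57: 512, 58: 512}
METADATA_BYTES = 8


def individual_descriptors_bytes(cnn_version, count, no_signature=False):
    """Approximate size of `count` separately stored or serialized descriptors."""
    per_descriptor = DATA_BYTES[cnn_version]
    if not no_signature:
        per_descriptor += METADATA_BYTES  # signature and version information
    return per_descriptor * count


def descriptor_batch_bytes(cnn_version, count):
    """Approximate size of a batch: metadata is not duplicated per descriptor."""
    return DATA_BYTES[cnn_version] * count


print(individual_descriptors_bytes(58, 1_000_000))  # 520000000
print(descriptor_batch_bytes(58, 1_000_000))        # 512000000
```

For one million CNN 58 descriptors, keeping them in a batch saves about 8 MB compared with storing them individually.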

Feature matrix

The table below shows FaceEngine features supported by different editions.

"Feature matrix"

Facility	Module	Complete	Frontend
	Core	Yes	Yes
Face detection & alignment	Face detector	Yes	Yes
	5-point face alignment	Yes	Yes
	68-point face alignment	Yes	Yes
Parameter estimation	Attribute estimation	Yes	Yes
	Quality estimation	Yes	Yes
	Color estimation	Yes	Yes
	Eye estimation	Yes	Yes
	Head pose estimation	Yes	Yes
	Gaze estimation	Yes	Yes
	Smile estimation	Yes	Yes
	Emotions estimation	Yes	Yes
	AGS estimation	Yes	Yes
	Glasses estimation	Yes	Yes
	Overlap estimation	Yes	Yes
Face descriptors	Descriptor extraction	Yes	No
	Descriptor matching	Yes	No
	Descriptor batching	Yes	No
	Descriptor search acceleration	Yes	No

See the file "doc/FeatureMap.htm" for more details.