Appendix A. Specifications#

Classification performance#

Classification performance was measured on two datasets:

Cooperative dataset (containing 20K images from various sources obtained at several banks);
Non-cooperative dataset (containing 20K images).

The two tables below contain true positive rates corresponding to select false positive rates.

"Classification performance @ low FPR on cooperative dataset"

FPR	TPR CNN 54	TPR CNN 56	TPR CNN 57	TPR CNN 54m	TPR CNN 56m	TPR CNN 58	TPR CNN 59
10^-7^	0.9765	0.9907	0.9906	0.9699	0.9652	0.9910	0.9911
10^-6^	0.9849	0.9914	0.9915	0.9829	0.9814	0.9916	0.9915
10^-5^	0.9892	0.9916	0.9917	0.9887	0.9886	0.9918	0.9919
10^-4^	0.9909	0.9917	0.9918	0.9910	0.9910	0.9919	0.9921

"Classification performance @ low FPR on non-cooperative dataset"

FPR	TPR CNN 54	TPR CNN 56	TPR CNN 57	TPR CNN 54m	TPR CNN 56m	TPR CNN 58	TPR CNN 59
10^-7^	0.9638	0.9698	0.9723	0.8813	0.8844	0.9767	0.9832
10^-6^	0.9773	0.9809	0.9817	0.9233	0.9229	0.9839	0.9880
10^-5^	0.9852	0.9871	0.9873	0.9538	0.9561	0.9880	0.9908
10^-4^	0.9896	0.9902	0.9905	0.9752	0.9757	0.9909	0.9924
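
The true positive rates above are read off at fixed false positive rates. As a rough illustration only, the sketch below shows one common way such a value can be computed from match scores. The function name `tpr_at_fpr`, the toy score lists, and the accept rule (score strictly above the threshold) are assumptions made for this example, not the procedure used to produce the tables.

```python
def tpr_at_fpr(genuine_scores, impostor_scores, target_fpr):
    """Return (threshold, tpr) for a threshold whose FPR does not exceed target_fpr.

    A comparison is accepted when its score is strictly greater than the threshold.
    """
    impostors = sorted(impostor_scores, reverse=True)
    # Maximum number of impostor comparisons that may be accepted.
    allowed = int(target_fpr * len(impostors))
    if allowed >= len(impostors):
        threshold = float("-inf")  # every impostor may pass
    else:
        # Only scores strictly above this value pass, so at most `allowed` impostors do.
        threshold = impostors[allowed]
    accepted = sum(1 for s in genuine_scores if s > threshold)
    return threshold, accepted / len(genuine_scores)


# Toy data (hypothetical scores, not taken from the datasets above).
threshold, tpr = tpr_at_fpr([9, 7.5, 6, 2], [1, 2, 3, 4, 5, 6, 7, 8], target_fpr=0.25)
print(threshold, tpr)  # 6 0.5
```

With fewer than 1/target_fpr impostor comparisons, `allowed` becomes 0 and the threshold is simply the highest impostor score, so very low false positive rates can only be estimated from a correspondingly large number of impostor comparisons.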

Runtime performance#

Server environment#

Face detection performance depends on input image parameters such as resolution and bit depth as well as the size of the detected face.

Input data characteristics:

Image resolution: 1200x1600px;
Image format: 24 BPP RGB;
Typical face size: ~260x260px.

Performance measurements for CPU, GPU and NPU execution modes are presented in the tables below. Measured values are averages of at least 100 experiments.

All batch measurements are performed with minFaceSize = 50.

The results for minimum batch size and optimal batch size are shown in the tables below. All the intermediate and non-optimal values are omitted.

Face detection is performed using the FaceDetV3 NN.

CPU performance#

Benchmarking for CPU was performed on a server with the following hardware configuration:

CPU:

Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz;
CPU(s): 40
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
CPU with AVX2 instruction set was used

OS: CentOS Linux release 8.3.2011

RAM: 128 GB DDR4 (Clock Speed: 2133 MHz)

In the experiments listed in the tables below, the face detection and descriptor extraction algorithms used all available CPU cores, whereas matching performance is specified per core.

Descriptor matching is only implemented on CPU.

"CPU mode performance for detection and estimations"

Measurement	CPU threads	BatchSize	Average (ms)
Detector (minFaceSize=20)	1	-	355.1
Detector (minFaceSize=20)	8	-	154.3
Detector (minFaceSize=50)	1	-	57
Detector (minFaceSize=50)	8	-	24.5
Detector (minFaceSize=90)	1	-	22.1
Detector (minFaceSize=90)	8	-	12
RedetectBatch	8	1	5.7
RedetectBatch	8	4	6.1
RedetectBatch	8	8	3
HumanLandmarksDetector (resize to 640)	1	-	72.6
HumanLandmarksDetector (resize to 640)	8	-	36.2
HumanDetector (resize to 640)	1	-	39.8
HumanDetector (resize to 640)	8	-	18.8
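HumanDetector Batch (resize to 640)	8	1	20.8
HumanDetector Batch (resize to 640)	8	4	20.5
HumanDetector Batch (resize to 640)	8	8	20.4
HumanDetector redetect Batch	8	1	1.3
HumanDetector redetect Batch	8	4	1.3
HumanDetector redetect Batch	8	8	1.1
HumanLandmarksDetector (resize to 320)	1	-	44.5
HumanLandmarksDetector (resize to 320)	8	-	25.1
HumanDetector (resize to 320)	1	-	6.5
HumanDetector (resize to 320)	8	-	6.5
HumanDetector Batch (resize to 320)	8	1	6.2
HumanDetector Batch (resize to 320)	8	4	6.0
HumanDetector Batch (resize to 320)	8	8	5.9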
HeadPoseByLandmarks	1	-	1.57
HeadPoseByLandmarks	8	-	1.59
EyesGaze	1	-	2.5
EyesGaze	8	-	1.7
Emotions	1	-	13.8
Emotions	8	-	5.9
Attributes	1	-	64
Attributes	8	-	27.4
Quality	1	-	1.6
Quality	8	-	0.7
HeadPoseByImage	1	-	0.48
HeadPoseByImage	8	-	0.31
HeadPoseBatch	8	1	0.3
HeadPoseBatch	8	8	0.09
Warper	1	-	1.7
Warper	8	-	2.1
Eyes	1	-	0.8
Eyes	8	-	0.71
EyesBatch	8	1	0.6
EyesBatch	8	8	0.3
Infra-Red	1	-	2
Infra-Red	8	-	1
Infra-RedBatch	8	1	1.1
Infra-RedBatch	8	8	0.8
AGS	1	-	0.25
AGS	8	-	0.2
AGSBatch	8	1	0.21
AGSBatch	8	8	0.06
Overlap	1	-	4.5
Overlap	8	-	1.2
Glasses	1	-	1.7
Glasses	8	-	1
Eyes	1	-	1.7
Eyes	8	-	1.2
Child	1	-	20.2
Child	8	-	11.7
ChildBatch	8	1	11.8
ChildBatch	8	8	6.6
BestShotQuality	1	-	0.35
BestShotQuality	8	-	0.22
BestShotQualityBatch	8	1	0.24
BestShotQualityBatch	8	8	0.07
Mouth	1	-	1.7
Mouth	8	-	0.9
LivenessFlyingFaces	1	-	9.8
LivenessFlyingFaces	8	-	5.0
LivenessRGBMEstimator	1	-	20
LivenessRGBMEstimator	8	-	13.2
MedicalMask	1	-	1.1
MedicalMask	8	-	0.4
MedicalMaskBatch	8	1	0.52
MedicalMaskBatch	8	8	0.38
LivenessOneShotRGBEstimator	1	-	187.3
LivenessOneShotRGBEstimator	8	-	87.9
LivenessOneShotRGBEstimatorBatch	8	1	90.3
LivenessOneShotRGBEstimatorBatch	8	8	85.6
Orientation	1	-	19.6
Orientation	8	-	11.3
OrientationBatch	8	1	10.9
OrientationBatch	8	8	8.1
CredibilityCheck	1	-	111.0
CredibilityCheck	8	-	39.7
CredibilityCheckBatch	8	1	39.6
CredibilityCheckBatch	8	8	34.1
FacialHair	1	-	2.9
FacialHair	8	-	2.1
FacialHairBatch	8	1	2.1
FacialHairBatch	8	8	1.0
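
When comparing the 1-thread and 8-thread rows, it can help to convert the averages into a speedup, a parallel efficiency, and a call rate. The helper names below are hypothetical and the snippet is only a sketch of that arithmetic, using the Detector (minFaceSize=50) rows from the table above.

```python
def speedup(single_thread_ms, multi_thread_ms):
    """How many times faster the multi-threaded run is."""
    return single_thread_ms / multi_thread_ms


def parallel_efficiency(single_thread_ms, multi_thread_ms, threads):
    """Speedup divided by thread count; 1.0 would be perfect scaling."""
    return speedup(single_thread_ms, multi_thread_ms) / threads


def calls_per_second(average_ms):
    """Convert an average latency in milliseconds into calls per second."""
    return 1000.0 / average_ms


# Detector (minFaceSize=50): 57 ms with 1 thread, 24.5 ms with 8 threads.
one_thread_ms, eight_threads_ms = 57.0, 24.5
print(round(speedup(one_thread_ms, eight_threads_ms), 2))                 # 2.33
print(round(parallel_efficiency(one_thread_ms, eight_threads_ms, 8), 2))  # 0.29
print(round(calls_per_second(eight_threads_ms), 1))                       # 40.8
```

The call rate assumes back-to-back sequential calls with no other load on the machine, which is a simplification.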

"Extractor performance"

Type	Model	CPU threads	Average (ms)
Extractor	57	1	217
Extractor	57	8	79.9
Extractor	58	1	214.7
Extractor	58	8	80.3
Extractor	59	1	213.6
Extractor	59	8	68.3
Extractor	102	1	3.0
Extractor	102	8	3.0
Extractor	103	1	140.0
Extractor	103	8	50.0
Extractor	104	1	14.4
Extractor	104	8	6.5

The following table lists the average number of matches per second for descriptors extracted with the following CNN model versions:

face descriptors: 57, 58, 59
human body descriptors: 102, 103, 104

"Matcher performance"

Type	Model	CPU threads	Batch Size	Average (matches/sec)
Matcher	57, 58, 59	1	1000	42.2 M
Matcher	57, 58, 59	1	100 000	26.60 M
Matcher	102, 103, 104	1	1000	10.17 M
Matcher	102, 103, 104	1	100 000	5.48 M
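
The matcher rates above can be turned into a rough estimate of how long one probe descriptor takes to compare against a gallery on a single core. The function name `gallery_search_ms` is hypothetical, and the sketch assumes the measured rate stays constant for the chosen gallery size, which the table suggests is optimistic for larger batches.

```python
def gallery_search_ms(gallery_size, matches_per_second):
    """Estimated time in milliseconds to match one probe against a gallery."""
    return gallery_size / matches_per_second * 1000.0


# Rates from the "Matcher performance" table (batch size 100 000, 1 thread).
FACE_RATE = 26.60e6   # models 57, 58, 59
HUMAN_RATE = 5.48e6   # models 102, 103, 104

print(round(gallery_search_ms(1_000_000, FACE_RATE), 1))   # 37.6
print(round(gallery_search_ms(1_000_000, HUMAN_RATE), 1))  # 182.5
```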

GPU performance#

Benchmarking for GPU was performed on the following hardware configuration:

GPU: NVIDIA Tesla T4.

OS: CentOS Linux release 8.3.2011

"GPU mode performance for detection and estimations"

Measurement	Batch Size	Average (ms)
Detector (minFaceSize=90)	-	13.6
Detector (minFaceSize=50)	-	8.7
Detector (minFaceSize=20)	-	62.3
DetectorBatch	1	10.1
DetectorBatch	8	7.5
RedetectBatch	1	3.9
RedetectBatch	32	0.2
HumanLandmarksDetector (resize to 640)	-	35.7
HumanDetector (resize to 640)	-	20.2
HumanLandmarksDetector (resize to 320)	-	24.7
HumanDetector (resize to 320)	-	9.3
Human redetection	-	1.3
HeadPoseByLandmarks	-	1.5
EyesGaze	-	1.7
Emotions	-	1.9
Attributes	-	3.4
Quality	-	0.9
HeadPoseByImage	-	1.8
HeadPoseBatch	1	1.84
HeadPoseBatch	32	1.05
Warper	-	1.9
Eyes	-	0.8
EyesBatch	1	0.78
EyesBatch	16	0.2
Infra-Red	-	1.2
Infra-RedBatch	1	1.14
Infra-RedBatch	32	0.53
AGS	-	1.76
AGSBatch	1	1.76
AGSBatch	16	1.05
Overlap	-	1.08
Glasses	-	1.05
Eyes	-	1.39
Child	-	2.31
ChildBatch	1	2.36
ChildBatch	16	1.37
BestShotQuality	-	1.86
BestShotQualityBatch	1	2.13
BestShotQualityBatch	16	1.1
Mouth	-	1.05
LivenessFlyingFaces	-	4.47
LivenessRGBMEstimator	-	8.96
MedicalMask	-	0.84
MedicalMaskBatch	1	0.8
MedicalMaskBatch	16	0.2
LivenessOneShotRGBEstimator	-	43.2
LivenessOneShotRGBEstimatorBatch	1	42.1
Orientation	-	5.56
OrientationBatch	1	5.6
OrientationBatch	16	3.71
CredibilityCheck	-	6.1
CredibilityCheckBatch	1	6.1
CredibilityCheckBatch	16	4.2
FacialHair	-	1.7
FacialHairBatch	1	1.7
FacialHairBatch	16	0.34

"Extractor performance"

Type	Model	Batch Size	Average (ms)
Extractor	57	-	10.99
ExtractorBatch	57	1	11.06
ExtractorBatch	57	16	8.34
Extractor	58	-	11.1
ExtractorBatch	58	1	11.1
ExtractorBatch	58	16	8.35
Extractor	59	-	11.1
ExtractorBatch	59	1	11.1
ExtractorBatch	59	16	11.4
Extractor	102	-	3.5
ExtractorBatch	102	1	3.5
ExtractorBatch	102	16	1.1
Extractor	103	-	7.0
ExtractorBatch	103	1	7.0
ExtractorBatch	103	16	4.6
Extractor	104	-	3.2
ExtractorBatch	104	1	3.0
ExtractorBatch	104	16	1.2
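
For a first latency estimate of a single-face pipeline on this GPU, the per-stage averages can simply be added. The choice of stages (detection, warping, extraction) and the assumption that they run strictly one after another, with no overlap and no extra transfer cost, are illustrative only.

```python
# Averages in milliseconds taken from the GPU tables above.
GPU_STAGE_MS = {
    "Detector (minFaceSize=50)": 8.7,
    "Warper": 1.9,
    "Extractor (model 59)": 11.1,
}


def pipeline_latency_ms(stage_ms):
    """Sum per-stage averages, assuming the stages run sequentially."""
    return sum(stage_ms.values())


total_ms = pipeline_latency_ms(GPU_STAGE_MS)
print(round(total_ms, 1))  # 21.7
```

Real pipelines usually add estimators and image decoding on top, so this figure is better read as a lower bound than as a prediction.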

Embedded environment#

Face detection performance depends on input image parameters such as resolution and bit depth as well as the size of the detected face.

Input data characteristics:

Image resolution: 640x480px;
Image format: 24 BPP RGB;
Typical face size: ~260x260px.

All batch measurements are performed with minFaceSize = 50.

The results for minimum batch size and optimal batch size are shown in the tables below. All the intermediate and non-optimal values are omitted.

Face detection is performed using the FaceDetV3 NN.

Jetson#

Performance measurements for Jetson devices are presented below. Measured values are averages of at least 100 experiments. Mobilenet is not used by default.

Jetson TX#

"Jetson TX GPU Performance. Detection and estimation"

Type	Batch Size	Average (ms)
Detector (minFaceSize=90)	-	45.197
Detector (minFaceSize=50)	-	105.26
Detector (minFaceSize=20)	-	613.528
RedetectBatch	1	18.3
RedetectBatch	32	6.08
HumanLandmarksDetector (resize to 640)	-	222.4
HumanDetector (resize to 640)	-	66.9
HumanLandmarksDetector (resize to 320)	-	170.7
HumanDetector (resize to 320)	-	23.5
HeadPoseByLandmarks	-	3.32
EyesGaze	-	6
Emotions	-	18.14
Attributes	-	41.7
Quality	-	3.93
HeadPoseByImage	-	5.35
HeadPoseBatch	1	5.58
HeadPoseBatch	32	2.88
Warper	-	4.41
Eyes	-	3.83
EyesBatch	1	3.96
EyesBatch	32	1.77
Infra-Red	-	4.91
Infra-RedBatch	1	4.75
AGS	-	4.49
AGSBatch	1	4.6
AGSBatch	16	2.79
Overlap	-	4.39
Glasses	-	6
Eyes	-	9
EyesBatch	1	9.31
EyesBatch	16	3.56
Child	-	21.50
ChildBatch	1	21.51
ChildBatch	16	17.78
BestShotQuality	-	5.76
BestShotQualityBatch	1	5.78
BestShotQualityBatch	16	2.86
Mouth	-	4.98
LivenessFlyingFaces	-	22.94
LivenessRGBMEstimator	-	76.65
MedicalMask	-	2.25
MedicalMaskBatch	1	2
MedicalMaskBatch	32	0.96
LivenessOneShotRGBEstimator	-	250.0
LivenessOneShotRGBEstimatorBatch	1	248.3
Orientation	-	34.9
OrientationBatch	1	34.9
CredibilityCheck	-	61.0
CredibilityCheckBatch	1	61.2
CredibilityCheckBatch	8	67.4
CredibilityCheckBatch	16	62.8
CredibilityCheckBatch	32	61.9
FacialHair	-	6.7
FacialHairBatch	1	6.6
FacialHairBatch	16	9.8

"Jetson TX GPU Performance. Extractor"

Type	Model	Batch Size	Average (ms)
Extractor	57	-	133
Extractor Batch	57	1	132.5
Extractor Batch	57	8	82.41
Extractor	58	-	132.4
Extractor Batch	58	1	131.6
Extractor Batch	58	8	82.07
Extractor	59	-	119.5
Extractor Batch	59	1	119.6
Extractor Batch	59	8	94.6
Extractor	102	-	11.8
Extractor Batch	102	1	11.8
Extractor Batch	102	8	2.9
Extractor	103	-	54.0
Extractor Batch	103	1	54.0
Extractor Batch	103	8	44.1
Extractor	104	-	16.4
Extractor Batch	104	1	16.4
Extractor Batch	104	8	7.9

Jetson Xavier#

"Jetson Xavier GPU Performance. Detection and estimation"

Type	Batch Size	Average (ms)
Detector (minFaceSize=90)	-	16.72
Detector (minFaceSize=50)	-	27.4
Detector (minFaceSize=20)	-	145.02
DetectorBatch	1	29
DetectorBatch	8	29.97
RedetectBatch	1	8.29
RedetectBatch	32	0.76
HumanLandmarksDetector (resize to 640)	-	36.81
HumanDetector (resize to 640)	-	14.51
HumanLandmarksDetector (resize to 320)	-	39.26
HumanDetector (resize to 320)	-	8.26
HeadPoseByLandmarks	-	2.63
EyesGaze	-	5.99
Emotions	-	5.18
Attributes	-	10.85
Quality	-	5.52
HeadPoseByImage	-	6.69
HeadPoseBatch	1	4.36
HeadPoseBatch	32	0.9
Warper	-	3.56
Eyes	-	1.28
EyesBatch	1	1.6
EyesBatch	32	0.6
Infra-Red	-	3.37
Infra-RedBatch	1	3
Infra-RedBatch	32	1.54
AGS	-	4.15
AGSBatch	1	3.47
AGSBatch	32	0.92
Overlap	-	3.29
Glasses	-	2.26
Eyes	-	2
EyesBatch	1	1.83
EyesBatch	32	0.98
Child	-	5.77
ChildBatch	1	5.49
ChildBatch	8	3.91
BestShotQuality	-	7.26
BestShotQualityBatch	1	7.46
BestShotQualityBatch	32	0.88
Mouth	-	2.26
LivenessFlyingFaces	-	8.12
LivenessRGBMEstimator	-	17.78
MedicalMask	-	0.87
MedicalMaskBatch	1	1.02
MedicalMaskBatch	32	0.39
LivenessOneShotRGBEstimator	-	77.04
LivenessOneShotRGBEstimatorBatch	1	77
LivenessOneShotRGBEstimatorBatch	8	76.47
Orientation	-	8.71
OrientationBatch	1	8.69
OrientationBatch	32	7.2
CredibilityCheck	-	18.80
CredibilityCheckBatch	1	18.85
CredibilityCheckBatch	8	21.85
CredibilityCheckBatch	16	21.55
CredibilityCheckBatch	32	20.15
FacialHair	-	3.2
FacialHairBatch	1	3.2
FacialHairBatch	16	0.9

"Jetson Xavier GPU Performance. Extractor"

Type	Model	Batch Size	Average (ms)
Extractor	57	-	36.99
Extractor Batch	57	1	36.46
Extractor Batch	57	4	32.58
Extractor	58	-	39.67
Extractor Batch	58	1	38.36
Extractor Batch	58	8	32.26
Extractor	59	-	36.31
Extractor Batch	59	1	35.48
Extractor Batch	59	8	33.86
Extractor	102	-	6.7
Extractor Batch	102	1	6.7
Extractor Batch	102	8	1.2
Extractor	103	-	17.4
Extractor Batch	103	1	17.4
Extractor Batch	103	8	13.7
Extractor	104	-	9.6
Extractor Batch	104	1	9.6
Extractor Batch	104	8	3.4

Jetson Xavier NX#

"Jetson Xavier NX GPU Performance. Detection and estimation"

Type	Batch Size	Average (ms)
Detector (minFaceSize=90)	-	16.1
Detector (minFaceSize=50)	-	39.9
Detector (minFaceSize=20)	-	224.6
DetectorBatch	1	42.54
DetectorBatch	8	37.75
RedetectBatch	1	7.15
RedetectBatch	32	1.32
HumanLandmarksDetector (resize to 640)	-	59.2
HumanDetector (resize to 640)	-	26.3
HumanLandmarksDetector (resize to 320)	-	38.5
HumanDetector (resize to 320)	-	10.3
HeadPoseByLandmarks	-	3.55
EyesGaze	-	3.3
Emotions	-	7.83
Attributes	-	21.35
Quality	-	2.2
HeadPoseByImage	-	3.1
HeadPoseBatch	1	3.03
HeadPoseBatch	32	1.25
Warper	-	5.11
Eyes	-	1.53
EyesBatch	1	1.52
EyesBatch	32	0.5
Infra-Red	-	2.86
Infra-RedBatch	1	2.85
Infra-RedBatch	32	1.8
AGS	-	2.7
AGSBatch	1	2.77
AGSBatch	32	1.22
Overlap	-	2.7
Glasses	-	2.75
Eyes	-	2.5
EyesBatch	1	2.63
EyesBatch	32	1.04
Child	-	9.49
ChildBatch	1	9.49
ChildBatch	8	7.19
BestShotQuality	-	3.78
BestShotQualityBatch	1	3.7
BestShotQualityBatch	32	1.27
Mouth	-	1.94
LivenessFlyingFaces	-	10.9
LivenessRGBMEstimator	-	29.9
MedicalMask	-	1.45
MedicalMaskBatch	1	1.46
MedicalMaskBatch	32	0.53
LivenessOneShotRGBEstimator	-	126.6
LivenessOneShotRGBEstimatorBatch	1	123.4
Orientation	-	14.89
OrientationBatch	1	14.96
OrientationBatch	32	12.97
CredibilityCheck	-	40.70
CredibilityCheckBatch	1	40.75
CredibilityCheckBatch	8	50.64
CredibilityCheckBatch	16	49.76
CredibilityCheckBatch	32	46.55
FacialHair	-	3.3
FacialHairBatch	1	3.3
FacialHairBatch	16	1.9
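
On embedded devices a common question is whether a stage fits into the time available per camera frame. The sketch below checks the Detector (minFaceSize=50) averages from the three Jetson tables against a frame budget. The 25 fps camera rate and the helper names are assumptions chosen for illustration.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def fits_budget(stage_ms, fps):
    """True if a stage's average latency fits within one frame interval."""
    return stage_ms <= frame_budget_ms(fps)


# Detector (minFaceSize=50) averages in milliseconds from the tables above.
DETECTOR_MS = {
    "Jetson TX": 105.26,
    "Jetson Xavier": 27.4,
    "Jetson Xavier NX": 39.9,
}

for device, average_ms in DETECTOR_MS.items():
    print(device, fits_budget(average_ms, fps=25))
```

Because the tabulated values are averages, a stage that only just fits the budget can still miss individual frames.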

"Jetson Xavier NX GPU Performance. Extractor"

Type	Model	Batch Size	Average (ms)
Extractor	57	-	78.6
Extractor Batch	57	1	78.2
Extractor Batch	57	16	68.2
Extractor	58	-	78.3
Extractor Batch	58	1	78.1
Extractor Batch	58	16	67.7
Extractor	59	-	77.7
Extractor Batch	59	1	77.7
Extractor Batch	59	16	78.2
Extractor	102	-	6.6
Extractor Batch	102	1	6.6
Extractor Batch	102	16	2.0
Extractor	103	-	34.2
Extractor Batch	103	1	34.2
Extractor Batch	103	16	29.4
Extractor	104	-	9.3
Extractor Batch	104	1	9.3
Extractor Batch	104	16	6.8

Descriptor size#

The "Descriptor size" table below shows the size of serialized face descriptors, which can be used to estimate memory requirements.

"Descriptor size"

Face descriptor version	Data size (bytes)	Metadata size (bytes)	Total size (bytes)
CNN 54	512	8	520
CNN 56	512	8	520
CNN 57	512	8	520
CNN 58	512	8	520
CNN 59	512	8	520

The "Human descriptor size" table below shows the size of serialized human descriptors, which can be used to estimate memory requirements. Human descriptors are used only for reidentification tasks.

"Human descriptor size (used only for reidentification tasks)"

Human descriptor version	Data size (bytes)	Metadata size (bytes)	Total size (bytes)
CNN 102	2048	8	2056
CNN 103	2048	8	2056
CNN 104	2048	8	2056

Metadata includes signature and version information that may be omitted during serialization if the NoSignature flag is specified.

When estimating individual descriptor size in memory or serialization storage requirements with default options, consider using values from the "Total size" column.

When estimating memory requirements for descriptor batches, use values from the "Data size" column instead, since a descriptor batch does not duplicate metadata per descriptor and thus is more memory-efficient.
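
The two estimation rules above can be sketched as follows. The constants come from the descriptor size tables; the function names and the `with_signature` parameter are hypothetical and only mirror the effect of the NoSignature flag on the size.

```python
FACE_DATA_BYTES = 512     # CNN 54, 56, 57, 58, 59
HUMAN_DATA_BYTES = 2048   # CNN 102, 103, 104
METADATA_BYTES = 8        # signature and version information


def serialized_bytes(count, data_bytes, with_signature=True):
    """Size of `count` individually serialized descriptors."""
    per_descriptor = data_bytes + (METADATA_BYTES if with_signature else 0)
    return count * per_descriptor


def batch_bytes(count, data_bytes):
    """Approximate size of a descriptor batch, which stores no per-descriptor metadata."""
    return count * data_bytes


n = 1_000_000
print(serialized_bytes(n, FACE_DATA_BYTES))   # 520000000
print(batch_bytes(n, FACE_DATA_BYTES))        # 512000000
print(serialized_bytes(n, HUMAN_DATA_BYTES))  # 2056000000
```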

These numbers are approximations only, since they do not include overhead such as memory alignment for accelerated SIMD processing.