
Appendix A. Specifications

Classification performance

Classification performance was measured on two datasets:

  • Cooperative dataset (20K images from various sources, obtained at several banks);
  • Non-cooperative dataset (20K images).

The two tables below list true positive rates (TPR) at selected false positive rates (FPR).

"Classification performance @ low FPR on cooperative dataset"

FPR TPR CNN 54 TPR CNN 56 TPR CNN 57 TPR CNN 54m TPR CNN 56m TPR CNN 58 TPR CNN 59
10^-7^ 0.9765 0.9907 0.9906 0.9699 0.9652 0.9910 0.9911
10^-6^ 0.9849 0.9914 0.9915 0.9829 0.9814 0.9916 0.9915
10^-5^ 0.9892 0.9916 0.9917 0.9887 0.9886 0.9918 0.9919
10^-4^ 0.9909 0.9917 0.9918 0.9910 0.9910 0.9919 0.9921

"Classification performance @ low FPR on non-cooperative dataset"

FPR TPR CNN 54 TPR CNN 56 TPR CNN 57 TPR CNN 54m TPR CNN 56m TPR CNN 58 TPR CNN 59
10^-7^ 0.9638 0.9698 0.9723 0.8813 0.8844 0.9767 0.9832
10^-6^ 0.9773 0.9809 0.9817 0.9233 0.9229 0.9839 0.9880
10^-5^ 0.9852 0.9871 0.9873 0.9538 0.9561 0.9880 0.9908
10^-4^ 0.9896 0.9902 0.9905 0.9752 0.9757 0.9909 0.9924
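Each TPR value above is read off a verification score distribution: pick the decision threshold at which the fraction of accepted impostor pairs does not exceed the target FPR, then report the fraction of genuine pairs accepted at that threshold. The sketch below illustrates this on toy scores. The function name, the score lists, and the conventions that higher scores mean "more similar" and that a pair is accepted when its score exceeds the threshold are assumptions for illustration; this is not necessarily the procedure used to produce the tables.

```python
import math


def tpr_at_fpr(genuine, impostor, target_fpr):
    """Return (threshold, tpr) for a target false positive rate.

    Assumptions (illustrative only): higher scores mean "more similar",
    a pair is accepted when score > threshold, and both lists are non-empty.
    """
    # Sort impostor scores from most to least similar.
    imp = sorted(impostor, reverse=True)
    # Largest number of impostor pairs we may accept at this FPR (rounded down).
    allowed = math.floor(target_fpr * len(imp))
    # Accepting only scores strictly above imp[allowed] admits at most
    # `allowed` impostors (fewer if there are ties at the threshold).
    threshold = imp[allowed] if allowed < len(imp) else -math.inf
    accepted = sum(1 for s in genuine if s > threshold)
    return threshold, accepted / len(genuine)


# Toy scores, not taken from the datasets described above.
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.1, 0.2, 0.3, 0.5, 0.65, 0.05, 0.15, 0.25, 0.35, 0.45]
print(tpr_at_fpr(genuine, impostor, 0.1))  # (0.5, 0.8)
print(tpr_at_fpr(genuine, impostor, 0.0))  # (0.65, 0.6)
```

Because floor(FPR × N) must be at least 1 before any impostor can be accepted, an FPR of 10^-7 can only be resolved with at least 10^7 impostor comparisons.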

Runtime performance

Server environment

Face detection performance depends on input image parameters, such as resolution and bit depth, as well as on the size of the detected face.

Input data characteristics:

  • Image resolution: 1200x1600px;
  • Image format: 24 BPP RGB;
  • Typical face size: ~260x260px.

Performance measurements for the CPU, GPU, and NPU execution modes are presented in the tables below. Measured values are averages of at least 100 runs.

All batch measurements are performed with minFaceSize = 50.

Only the results for the minimum and the optimal batch sizes are shown in the tables below; intermediate, non-optimal values are omitted.

Face detection is performed using the FaceDetV3 NN.
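To reproduce figures of the "average of at least 100 runs" kind, a simple timing harness is enough. The sketch below is an assumption about methodology, not the harness used for these tables: the untimed warm-up calls in particular are a common practice that this document does not mention, and `fn` stands for whatever hypothetical call is being benchmarked.

```python
import statistics
import time


def average_latency_ms(fn, runs=100, warmup=5):
    """Return the mean wall-clock latency of fn() in milliseconds.

    fn is called `warmup` times without timing (an assumed warm-up step),
    then `runs` times with timing.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples_ms)


# Placeholder workload; substitute the call being benchmarked.
print(f"{average_latency_ms(lambda: sum(range(10_000))):.3f} ms")
```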

CPU performance

CPU benchmarks were run on a server with the following hardware configuration:

CPU:

  • Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
  • CPU(s): 40
  • Thread(s) per core: 2
  • Core(s) per socket: 10
  • Socket(s): 2
  • NUMA node(s): 2
  • AVX2 instruction set used

OS: CentOS Linux release 8.3.2011

RAM: 128 GB DDR4 (Clock Speed: 2133 MHz)

In the experiments listed in the tables below, the face detection and descriptor extraction algorithms used all available CPU cores, whereas matching performance is given per core.

Descriptor matching is implemented only on the CPU.

"CPU mode performance for detection and estimations"

Measurement CPU threads BatchSize Average (ms)
Detector (minFaceSize=20) 1 - 355.1
Detector (minFaceSize=20) 8 - 154.3
Detector (minFaceSize=50) 1 - 57
Detector (minFaceSize=50) 8 - 24.5
Detector (minFaceSize=90) 1 - 22.1
Detector (minFaceSize=90) 8 - 12
RedetectBatch 8 1 5.7
RedetectBatch 8 4 6.1
RedetectBatch 8 8 3
HumanLandmarksDetector (resize to 640) 1 - 72.6
HumanLandmarksDetector (resize to 640) 8 - 36.2
HumanDetector (resize to 640) 1 - 39.8
HumanDetector (resize to 640) 8 - 18.8
HumanDetector Batch (resize to 640) 8 1 20.8
HumanDetector Batch (resize to 640) 8 4 20.5
HumanDetector Batch (resize to 640) 8 8 20.4
HumanDetector redetect Batch 8 1 1.3
HumanDetector redetect Batch 8 4 1.3
HumanDetector redetect Batch 8 8 1.1
HumanLandmarksDetector (resize to 320) 1 - 44.5
HumanLandmarksDetector (resize to 320) 8 - 25.1
HumanDetector (resize to 320) 1 - 6.5
HumanDetector (resize to 320) 8 - 6.5
HumanDetector Batch (resize to 320) 8 1 6.2
HumanDetector Batch (resize to 320) 8 4 6.0
HumanDetector Batch (resize to 320) 8 8 5.9
HeadPoseByLandmarks 1 - 1.57
HeadPoseByLandmarks 8 - 1.59
EyesGaze 1 - 2.5
EyesGaze 8 - 1.7
Emotions 1 - 13.8
Emotions 8 - 5.9
Attributes 1 - 64
Attributes 8 - 27.4
Quality 1 - 1.6
Quality 8 - 0.7
HeadPoseByImage 1 - 0.48
HeadPoseByImage 8 - 0.31
HeadPoseBatch 8 1 0.3
HeadPoseBatch 8 8 0.09
Warper 1 - 1.7
Warper 8 - 2.1
Eyes 1 - 0.8
Eyes 8 - 0.71
EyesBatch 8 1 0.6
EyesBatch 8 8 0.3
Infra-Red 1 - 2
Infra-Red 8 - 1
Infra-RedBatch 8 1 1.1
Infra-RedBatch 8 8 0.8
AGS 1 - 0.25
AGS 8 - 0.2
AGSBatch 8 1 0.21
AGSBatch 8 8 0.06
Overlap 1 - 4.5
Overlap 8 - 1.2
Glasses 1 - 1.7
Glasses 8 - 1
Eyes 1 - 1.7
Eyes 8 - 1.2
Child 1 - 20.2
Child 8 - 11.7
ChildBatch 8 1 11.8
ChildBatch 8 8 6.6
BestShotQuality 1 - 0.35
BestShotQuality 8 - 0.22
BestShotQualityBatch 8 1 0.24
BestShotQualityBatch 8 8 0.07
Mouth 1 - 1.7
Mouth 8 - 0.9
LivenessFlyingFaces 1 - 9.8
LivenessFlyingFaces 8 - 5.0
LivenessRGBMEstimator 1 - 20
LivenessRGBMEstimator 8 - 13.2
MedicalMask 1 - 1.1
MedicalMask 8 - 0.4
MedicalMaskBatch 8 1 0.52
MedicalMaskBatch 8 8 0.38
LivenessOneShotRGBEstimator 1 - 187.3
LivenessOneShotRGBEstimator 8 - 87.9
LivenessOneShotRGBEstimatorBatch 8 1 90.3
LivenessOneShotRGBEstimatorBatch 8 8 85.6
Orientation 1 - 19.6
Orientation 8 - 11.3
OrientationBatch 8 1 10.9
OrientationBatch 8 8 8.1
CredibilityCheck 1 - 111.0
CredibilityCheck 8 - 39.7
CredibilityCheckBatch 8 1 39.6
CredibilityCheckBatch 8 8 34.1
FacialHair 1 - 2.9
FacialHair 8 - 2.1
FacialHairBatch 8 1 2.1
FacialHairBatch 8 8 1.0

"CPU extractor performance"

Type Model CPU threads Average (ms)
Extractor 57 1 217
Extractor 57 8 79.9
Extractor 58 1 214.7
Extractor 58 8 80.3
Extractor 59 1 213.6
Extractor 59 8 68.3
Extractor 102 1 3.0
Extractor 102 8 3.0
Extractor 103 1 140.0
Extractor 103 8 50.0
Extractor 104 1 14.4
Extractor 104 8 6.5

The following table lists the average number of matches per second for descriptors extracted with the following CNN model versions:

  • face descriptors: 57, 58, 59
  • human body descriptors: 102, 103, 104

"Matcher performance"

Type Model CPU threads Batch Size Average (matches/sec)
Matcher 57, 58, 59 1 1000 42.2 M
Matcher 57, 58, 59 1 100 000 26.60 M
Matcher 102, 103, 104 1 1000 10.17 M
Matcher 102, 103, 104 1 100 000 5.48 M
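The matcher throughput translates directly into a rough 1:N search time: comparing one probe against a gallery of N descriptors takes about N divided by the matches-per-second figure. The sketch below uses the batch-size-100 000 rows as the more conservative values. The helper name is hypothetical, and scaling across several cores is assumed to be linear, which the per-core figures in the table do not guarantee.

```python
def search_time_ms(gallery_size, matches_per_sec, cores=1):
    """Estimate the time to match one probe descriptor against a gallery.

    Assumes throughput scales linearly with the number of cores, which the
    per-core figures in the table do not guarantee.
    """
    if gallery_size < 0 or matches_per_sec <= 0 or cores < 1:
        raise ValueError("invalid arguments")
    return gallery_size / (matches_per_sec * cores) * 1000.0


# Figures from the "Matcher performance" table, batch size 100 000.
FACE_MATCHES_PER_SEC = 26.60e6   # models 57, 58, 59
HUMAN_MATCHES_PER_SEC = 5.48e6   # models 102, 103, 104

print(f"{search_time_ms(1_000_000, FACE_MATCHES_PER_SEC):.1f} ms")     # 37.6 ms
print(f"{search_time_ms(1_000_000, HUMAN_MATCHES_PER_SEC):.1f} ms")    # 182.5 ms
print(f"{search_time_ms(1_000_000, FACE_MATCHES_PER_SEC, 8):.1f} ms")  # 4.7 ms
```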

GPU performance

GPU benchmarks were run on the following hardware configuration:

GPU: NVIDIA Tesla T4.

OS: CentOS Linux release 8.3.2011

"GPU mode performance for detection and estimations"

Measurement Batch Size Average (ms)
Detector (minFaceSize=90) - 13.6
Detector (minFaceSize=50) - 8.7
Detector (minFaceSize=20) - 62.3
DetectorBatch 1 10.1
DetectorBatch 8 7.5
RedetectBatch 1 3.9
RedetectBatch 32 0.2
HumanLandmarksDetector (resize to 640) - 35.7
HumanDetector (resize to 640) - 20.2
HumanLandmarksDetector (resize to 320) - 24.7
HumanDetector (resize to 320) - 9.3
Human redetection - 1.3
HeadPoseByLandmarks - 1.5
EyesGaze - 1.7
Emotions - 1.9
Attributes - 3.4
Quality - 0.9
HeadPoseByImage - 1.8
HeadPoseBatch 1 1.84
HeadPoseBatch 32 1.05
Warper - 1.9
Eyes - 0.8
EyesBatch 1 0.78
EyesBatch 16 0.2
Infra-Red - 1.2
Infra-RedBatch 1 1.14
Infra-RedBatch 32 0.53
AGS - 1.76
AGSBatch 1 1.76
AGSBatch 16 1.05
Overlap - 1.08
Glasses - 1.05
Eyes - 1.39
Child - 2.31
ChildBatch 1 2.36
ChildBatch 16 1.37
BestShotQuality - 1.86
BestShotQualityBatch 1 2.13
BestShotQualityBatch 16 1.1
Mouth - 1.05
LivenessFlyingFaces - 4.47
LivenessRGBMEstimator - 8.96
MedicalMask - 0.84
MedicalMaskBatch 1 0.8
MedicalMaskBatch 16 0.2
LivenessOneShotRGBEstimator - 43.2
LivenessOneShotRGBEstimatorBatch 1 42.1
Orientation - 5.56
OrientationBatch 1 5.6
OrientationBatch 16 3.71
CredibilityCheck - 6.1
CredibilityCheckBatch 1 6.1
CredibilityCheckBatch 16 4.2
FacialHair - 1.7
FacialHairBatch 1 1.7
FacialHairBatch 16 0.34

"GPU extractor performance"

Type Model Batch Size Average (ms)
Extractor 57 - 10.99
ExtractorBatch 57 1 11.06
ExtractorBatch 57 16 8.34
Extractor 58 - 11.1
ExtractorBatch 58 1 11.1
ExtractorBatch 58 16 8.35
Extractor 59 - 11.1
ExtractorBatch 59 1 11.1
ExtractorBatch 59 16 11.4
Extractor 102 - 3.5
ExtractorBatch 102 1 3.5
ExtractorBatch 102 16 1.1
Extractor 103 - 7.0
ExtractorBatch 103 1 7.0
ExtractorBatch 103 16 4.6
Extractor 104 - 3.2
ExtractorBatch 104 1 3.0
ExtractorBatch 104 16 1.2

Embedded environment

Face detection performance depends on input image parameters, such as resolution and bit depth, as well as on the size of the detected face.

Input data characteristics:

  • Image resolution: 640x480px;
  • Image format: 24 BPP RGB;
  • Typical face size: ~260x260px.

All batch measurements are performed with minFaceSize = 50.

Only the results for the minimum and the optimal batch sizes are shown in the tables below; intermediate, non-optimal values are omitted.

Face detection is performed using the FaceDetV3 NN.

Jetson

Performance measurements are presented for the Jetson TX, Jetson Xavier, and Jetson Xavier NX platforms. Measured values are averages of at least 100 runs. Mobilenet is not used by default on Jetson.

Jetson TX

"Jetson TX GPU Performance. Detection and estimation"

Type Batch Size Average (ms)
Detector (minFaceSize=90) - 45.197
Detector (minFaceSize=50) - 105.26
Detector (minFaceSize=20) - 613.528
RedetectBatch 1 18.3
RedetectBatch 32 6.08
HumanLandmarksDetector (resize to 640) - 222.4
HumanDetector (resize to 640) - 66.9
HumanLandmarksDetector (resize to 320) - 170.7
HumanDetector (resize to 320) - 23.5
HeadPoseByLandmarks - 3.32
EyesGaze - 6
Emotions - 18.14
Attributes - 41.7
Quality - 3.93
HeadPoseByImage - 5.35
HeadPoseBatch 1 5.58
HeadPoseBatch 32 2.88
Warper - 4.41
Eyes - 3.83
EyesBatch 1 3.96
EyesBatch 32 1.77
Infra-Red - 4.91
Infra-RedBatch 1 4.75
AGS - 4.49
AGSBatch 1 4.6
AGSBatch 16 2.79
Overlap - 4.39
Glasses - 6
Eyes - 9
EyesBatch 1 9.31
EyesBatch 16 3.56
Child - 21.50
ChildBatch 1 21.51
ChildBatch 16 17.78
BestShotQuality - 5.76
BestShotQualityBatch 1 5.78
BestShotQualityBatch 16 2.86
Mouth - 4.98
LivenessFlyingFaces - 22.94
LivenessRGBMEstimator - 76.65
MedicalMask - 2.25
MedicalMaskBatch 1 2
MedicalMaskBatch 32 0.96
LivenessOneShotRGBEstimator - 250.0
LivenessOneShotRGBEstimatorBatch 1 248.3
Orientation - 34.9
OrientationBatch 1 34.9
CredibilityCheck - 61.0
CredibilityCheckBatch 1 61.2
CredibilityCheckBatch 8 67.4
CredibilityCheckBatch 16 62.8
CredibilityCheckBatch 32 61.9
FacialHair - 6.7
FacialHairBatch 1 6.6
FacialHairBatch 16 9.8

"Jetson TX GPU Performance. Extractor"

Type Model Batch Size Average (ms)
Extractor 57 - 133
Extractor Batch 57 1 132.5
Extractor Batch 57 8 82.41
Extractor 58 - 132.4
Extractor Batch 58 1 131.6
Extractor Batch 58 8 82.07
Extractor 59 - 119.5
Extractor Batch 59 1 119.6
Extractor Batch 59 8 94.6
Extractor 102 - 11.8
Extractor Batch 102 1 11.8
Extractor Batch 102 8 2.9
Extractor 103 - 54.0
Extractor Batch 103 1 54.0
Extractor Batch 103 8 44.1
Extractor 104 - 16.4
Extractor Batch 104 1 16.4
Extractor Batch 104 8 7.9

Jetson Xavier

"Jetson Xavier GPU Performance. Detection and estimation"

Type Batch Size Average (ms)
Detector (minFaceSize=90) - 16.72
Detector (minFaceSize=50) - 27.4
Detector (minFaceSize=20) - 145.02
DetectorBatch 1 29
DetectorBatch 8 29.97
RedetectBatch 1 8.29
RedetectBatch 32 0.76
HumanLandmarksDetector (resize to 640) - 36.81
HumanDetector (resize to 640) - 14.51
HumanLandmarksDetector (resize to 320) - 39.26
HumanDetector (resize to 320) - 8.26
HeadPoseByLandmarks - 2.63
EyesGaze - 5.99
Emotions - 5.18
Attributes - 10.85
Quality - 5.52
HeadPoseByImage - 6.69
HeadPoseBatch 1 4.36
HeadPoseBatch 32 0.9
Warper - 3.56
Eyes - 1.28
EyesBatch 1 1.6
EyesBatch 32 0.6
Infra-Red - 3.37
Infra-RedBatch 1 3
Infra-RedBatch 32 1.54
AGS - 4.15
AGSBatch 1 3.47
AGSBatch 32 0.92
Overlap - 3.29
Glasses - 2.26
Eyes - 2
EyesBatch 1 1.83
EyesBatch 32 0.98
Child - 5.77
ChildBatch 1 5.49
ChildBatch 8 3.91
BestShotQuality - 7.26
BestShotQualityBatch 1 7.46
BestShotQualityBatch 32 0.88
Mouth - 2.26
LivenessFlyingFaces - 8.12
LivenessRGBMEstimator - 17.78
MedicalMask - 0.87
MedicalMaskBatch 1 1.02
MedicalMaskBatch 32 0.39
LivenessOneShotRGBEstimator - 77.04
LivenessOneShotRGBEstimatorBatch 1 77
LivenessOneShotRGBEstimatorBatch 8 76.47
Orientation - 8.71
OrientationBatch 1 8.69
OrientationBatch 32 7.2
CredibilityCheck - 18.80
CredibilityCheckBatch 1 18.85
CredibilityCheckBatch 8 21.85
CredibilityCheckBatch 16 21.55
CredibilityCheckBatch 32 20.15
FacialHair - 3.2
FacialHairBatch 1 3.2
FacialHairBatch 16 0.9

"Jetson Xavier GPU Performance. Extractor"

Type Model Batch Size Average (ms)
Extractor 57 - 36.99
Extractor Batch 57 1 36.46
Extractor Batch 57 4 32.58
Extractor 58 - 39.67
Extractor Batch 58 1 38.36
Extractor Batch 58 8 32.26
Extractor 59 - 36.31
Extractor Batch 59 1 35.48
Extractor Batch 59 8 33.86
Extractor 102 - 6.7
Extractor Batch 102 1 6.7
Extractor Batch 102 8 1.2
Extractor 103 - 17.4
Extractor Batch 103 1 17.4
Extractor Batch 103 8 13.7
Extractor 104 - 9.6
Extractor Batch 104 1 9.6
Extractor Batch 104 8 3.4

Jetson Xavier NX

"Jetson Xavier NX GPU Performance. Detection and estimation"

Type Batch Size Average (ms)
Detector (minFaceSize=90) - 16.1
Detector (minFaceSize=50) - 39.9
Detector (minFaceSize=20) - 224.6
DetectorBatch 1 42.54
DetectorBatch 8 37.75
RedetectBatch 1 7.15
RedetectBatch 32 1.32
HumanLandmarksDetector (resize to 640) - 59.2
HumanDetector (resize to 640) - 26.3
HumanLandmarksDetector (resize to 320) - 38.5
HumanDetector (resize to 320) - 10.3
HeadPoseByLandmarks - 3.55
EyesGaze - 3.3
Emotions - 7.83
Attributes - 21.35
Quality - 2.2
HeadPoseByImage - 3.1
HeadPoseBatch 1 3.03
HeadPoseBatch 32 1.25
Warper - 5.11
Eyes - 1.53
EyesBatch 1 1.52
EyesBatch 32 0.5
Infra-Red - 2.86
Infra-RedBatch 1 2.85
Infra-RedBatch 32 1.8
AGS - 2.7
AGSBatch 1 2.77
AGSBatch 32 1.22
Overlap - 2.7
Glasses - 2.75
Eyes - 2.5
EyesBatch 1 2.63
EyesBatch 32 1.04
Child - 9.49
ChildBatch 1 9.49
ChildBatch 8 7.19
BestShotQuality - 3.78
BestShotQualityBatch 1 3.7
BestShotQualityBatch 32 1.27
Mouth - 1.94
LivenessFlyingFaces - 10.9
LivenessRGBMEstimator - 29.9
MedicalMask - 1.45
MedicalMaskBatch 1 1.46
MedicalMaskBatch 32 0.53
LivenessOneShotRGBEstimator - 126.6
LivenessOneShotRGBEstimatorBatch 1 123.4
Orientation - 14.89
OrientationBatch 1 14.96
OrientationBatch 32 12.97
CredibilityCheck - 40.70
CredibilityCheckBatch 1 40.75
CredibilityCheckBatch 8 50.64
CredibilityCheckBatch 16 49.76
CredibilityCheckBatch 32 46.55
FacialHair - 3.3
FacialHairBatch 1 3.3
FacialHairBatch 16 1.9

"Jetson Xavier NX GPU Performance. Extractor"

Type Model Batch Size Average (ms)
Extractor 57 - 78.6
Extractor Batch 57 1 78.2
Extractor Batch 57 16 68.2
Extractor 58 - 78.3
Extractor Batch 58 1 78.1
Extractor Batch 58 16 67.7
Extractor 59 - 77.7
Extractor Batch 59 1 77.7
Extractor Batch 59 16 78.2
Extractor 102 - 6.6
Extractor Batch 102 1 6.6
Extractor Batch 102 16 2.0
Extractor 103 - 34.2
Extractor Batch 103 1 34.2
Extractor Batch 103 16 29.4
Extractor 104 - 9.3
Extractor Batch 104 1 9.3
Extractor Batch 104 16 6.8

Descriptor size

Table A.3.1 shows the size of serialized face descriptors, for use in estimating memory requirements.

"Table A.3.1. Face descriptor size"

Face descriptor version Data size (bytes) Metadata size (bytes) Total size (bytes)
CNN 54 512 8 520
CNN 56 512 8 520
CNN 57 512 8 520
CNN 58 512 8 520
CNN 59 512 8 520

Table A.3.2 shows the size of serialized human descriptors, for use in estimating memory requirements. Human descriptors are used only for reidentification tasks.

"Table A.3.2. Human descriptor size (used only for reidentification tasks)"

Human descriptor version Data size (bytes) Metadata size (bytes) Total size (bytes)
CNN 102 2048 8 2056
CNN 103 2048 8 2056
CNN 104 2048 8 2056

Metadata includes signature and version information that may be omitted during serialization if the NoSignature flag is specified.

When estimating the in-memory size of an individual descriptor, or the storage required for serialization with default options, use the values from the "Total size" column.

When estimating memory requirements for descriptor batches, use the values from the "Data size" column instead: a descriptor batch does not duplicate metadata for each descriptor and is therefore more memory-efficient.

These numbers are approximate, since they do not include overhead such as memory alignment for accelerated SIMD processing.
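The rules above reduce to simple arithmetic: multiply the descriptor count by the total size for individually stored descriptors, or by the data size for descriptor batches. The sketch below encodes the byte counts from the two descriptor size tables; the dictionary and function names are hypothetical, and, as noted above, alignment overhead is ignored.

```python
# (data bytes, metadata bytes) per descriptor, from the descriptor size tables.
DESCRIPTOR_SIZES = {
    "face": (512, 8),     # CNN 54, 56, 57, 58, 59
    "human": (2048, 8),   # CNN 102, 103, 104
}


def estimate_bytes(count, kind="face", batched=False):
    """Approximate storage for `count` descriptors, ignoring alignment overhead.

    Individual descriptors use the "Total size" (data + metadata);
    descriptor batches use only the "Data size" per descriptor.
    """
    data, meta = DESCRIPTOR_SIZES[kind]
    per_descriptor = data if batched else data + meta
    return count * per_descriptor


MIB = 1024 * 1024
print(f"{estimate_bytes(1_000_000) / MIB:.1f} MiB")                # 495.9 MiB
print(f"{estimate_bytes(1_000_000, batched=True) / MIB:.1f} MiB")  # 488.3 MiB
```

For one million face descriptors, storing them as a batch saves the 8-byte metadata per descriptor, roughly 7.6 MiB in total.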
