
Appendix A. Specifications#

Classification performance#

Classification performance was measured on two datasets:

  • Cooperative dataset (containing 20K images from various sources obtained at several banks);
  • Non cooperative dataset (containing 20K).

The two tables below contain true positive rates corresponding to select false positive rates.

"Classification performance @ low FPR on cooperative dataset"

FPR TPR CNN 54 TPR CNN 56 TPR CNN 57 TPR CNN 58 TPR CNN 59 TPR CNN 54m TPR CNN 56m TPR CNN 59m
10^-7 0.9765 0.9907 0.9906 0.9910 0.9911 0.9699 0.9652 0.9876
10^-6 0.9849 0.9914 0.9915 0.9916 0.9915 0.9829 0.9814 0.9904
10^-5 0.9892 0.9916 0.9917 0.9918 0.9919 0.9887 0.9886 0.9915
10^-4 0.9909 0.9917 0.9918 0.9919 0.9921 0.9910 0.9910 0.9919

"Classification performance @ low FPR on non cooperative dataset"

FPR TPR CNN 54 TPR CNN 56 TPR CNN 57 TPR CNN 58 TPR CNN 59 TPR CNN 54m TPR CNN 56m TPR CNN 59m
10^-7 0.9638 0.9698 0.9723 0.9767 0.9832 0.8813 0.8844 0.9377
10^-6 0.9773 0.9809 0.9817 0.9839 0.9880 0.9233 0.9229 0.9629
10^-5 0.9852 0.9871 0.9873 0.9880 0.9908 0.9538 0.9561 0.9794
10^-4 0.9896 0.9902 0.9905 0.9909 0.9924 0.9752 0.9757 0.9880
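
For readers who want to produce figures of this kind on their own data, the following is a minimal sketch of reading a true positive rate at a fixed false positive rate from two lists of similarity scores. The function name `tpr_at_fpr` and the toy score lists are illustrative assumptions, not part of the product, and the evaluation protocol actually used for the tables above is not described in this appendix.

```python
def tpr_at_fpr(genuine_scores, impostor_scores, target_fpr):
    """Return (threshold, tpr) such that the empirical FPR <= target_fpr.

    A comparison is accepted when its score is strictly greater than the
    threshold. Both score lists are hypothetical inputs.
    """
    impostors = sorted(impostor_scores, reverse=True)
    # Largest number of impostor comparisons that may be accepted.
    allowed = int(target_fpr * len(impostors))
    if allowed >= len(impostors):
        threshold = float("-inf")
    else:
        # Ties with this score are rejected, so at most `allowed` pass.
        threshold = impostors[allowed]
    accepted = sum(1 for score in genuine_scores if score > threshold)
    return threshold, accepted / len(genuine_scores)


# Toy data for illustration only; not taken from the datasets above.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.5, 0.6, 0.05, 0.15, 0.25]
threshold, tpr = tpr_at_fpr(genuine, impostor, target_fpr=0.25)
print(f"threshold={threshold}, TPR={tpr}")
```

Note that measuring a false positive rate as low as 10^-7 requires at least ten million impostor comparisons, which is why such rates are normally evaluated over pairs of images rather than over single images.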

Runtime performance for CentOS Linux environment#

Face detection performance depends on input image parameters such as resolution and bit depth as well as the size of the detected face.

Input data characteristics:

  • Image resolution: 1920x1080 px;
  • Image format: 24 BPP RGB.

Performance measurements are presented for CPU, GPU and NPU execution modes in the tables below. Measured values are averages of at least 100 experiments.
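
Averaging over repeated runs, as described above, can be sketched as follows. The helper `average_latency_ms`, the warm-up phase, and the placeholder workload are assumptions made for illustration; the benchmarking tool actually used for these tables is not specified in this document.

```python
import statistics
import time


def average_latency_ms(fn, runs=100, warmup=5):
    """Call fn `warmup` times untimed, then return the mean time of `runs` calls in ms."""
    for _ in range(warmup):
        fn()  # warm-up calls are an assumption; they are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


def placeholder_workload():
    # Stand-in for a detector or estimator call.
    return sum(i * i for i in range(10_000))


print(f"average: {average_latency_ms(placeholder_workload):.3f} ms")
```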

Estimated memory consumption values are also presented for CPU and GPU. These values depend heavily on the input data and the conditions of the experiment.

The results for minimum batch size and optimal batch size are shown in the tables below. All the intermediate and non-optimal values are omitted.
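
One plausible reading of "optimal batch size" is the batch size with the lowest average time among those measured. The sketch below applies that assumption to the 8-thread CPU rows for Detector (minFaceSize=90) given later in this appendix; the helper name `optimal_batch_size` is hypothetical, and the criterion actually used to pick the reported batch sizes is not stated here.

```python
def optimal_batch_size(average_ms_by_batch):
    """Return the batch size whose measured average time is lowest."""
    return min(average_ms_by_batch, key=average_ms_by_batch.get)


# Batch size -> Average (ms), CPU, 8 threads, Detector (minFaceSize=90).
detector_90 = {1: 12.3, 4: 8.5, 8: 9.2}
print(optimal_batch_size(detector_90))
```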

Face detections are performed using FaceDetV3 NN.

All types of face detection and redetection were performed with capture of bounding boxes and 5 facial landmarks.

CPU performance#

Benchmarking for CPU was performed on the server with the following hardware configuration:

CPU:

  • Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
  • CPU(s): 40
  • Thread(s) per core: 2
  • Core(s) per socket: 10
  • Socket(s): 2
  • NUMA node(s): 2
  • CPU with AVX2 instruction set was used

OS: CentOS Linux release 8.3.2011

RAM: 128 GB DDR4 (Clock Speed: 2133 MHz)

In the experiments listed in the tables below, face detection and descriptor extraction algorithms used all available CPU cores, whereas matching performance is specified per core.

Descriptor matching is only implemented on CPU.

CPU. Detector performance#

The table below shows the performance of Detector on the CPU.

Measurement CPU threads BatchSize Average (ms) RAM Memory (MB)
Detector (minFaceSize=20) 1 1 358.3 1757.0
Detector (minFaceSize=20) 8 1 169.6 2071.0
Detector (minFaceSize=20) 8 4 166.4 3711.0
Detector (minFaceSize=20) 8 8 169.2 6102.0
Detector (minFaceSize=50) 1 1 55.8 1256.0
Detector (minFaceSize=50) 8 1 27.1 1435.0
Detector (minFaceSize=50) 8 4 25.1 1719.0
Detector (minFaceSize=50) 8 8 26.5 2151.0
Detector (minFaceSize=90) 1 1 18.9 1229.0
Detector (minFaceSize=90) 8 1 12.3 1397.0
Detector (minFaceSize=90) 8 4 8.5 1505.0
Detector (minFaceSize=90) 8 8 9.2 1695.0
Redetect 1 1 4.05 1245.0
Redetect 8 1 2.99 1280.0
Redetect 8 4 1.5 1597.0
Redetect 8 8 1.34 1968.0
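
As a worked reading of the table above, the sketch below derives the multi-thread speedup and an approximate rate per second from the Detector rows. It assumes that Average (ms) is the time per processed image, which the table does not state explicitly; the function names are illustrative.

```python
def speedup(baseline_ms, improved_ms):
    """Ratio of the baseline time to the improved time."""
    return baseline_ms / improved_ms


def per_second(average_ms):
    """Operations per second implied by an average time in milliseconds."""
    return 1000.0 / average_ms


# Detector (minFaceSize=20), batch size 1: 1 thread vs 8 threads.
print(f"speedup with 8 threads: {speedup(358.3, 169.6):.2f}x")
# Detector (minFaceSize=50), 8 threads, batch size 1.
print(f"approximate rate: {per_second(27.1):.1f} per second")
```

With these rows, eight threads give roughly a 2.1x speedup over one thread, so scaling is well below linear for this configuration.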

CPU. HumanDetector performance#

The table below shows the performance of HumanDetector on the CPU.

Measurement CPU threads BatchSize Average (ms) RAM Memory (MB)
HumanDetector (imageSize=320) 1 1 11.97 1191.0
HumanDetector (imageSize=320) 8 1 6.86 1395.0
HumanDetector (imageSize=320) 8 4 4.1 1490.0
HumanDetector (imageSize=320) 8 8 4.1 1642.0
HumanDetector (imageSize=640) 1 1 40.93 1228.0
HumanDetector (imageSize=640) 8 1 17.41 1478.0
HumanDetector (imageSize=640) 8 4 14.18 1691.0
HumanDetector (imageSize=640) 8 8 15.44 1886.0
HumanLandmarksDetector (imageSize=320) 1 1 44.96 1236.0
HumanLandmarksDetector (imageSize=320) 8 1 20.37 1443.0
HumanLandmarksDetector (imageSize=320) 8 4 12.82 1646.0
HumanLandmarksDetector (imageSize=320) 8 8 12.31 1852.0
HumanLandmarksDetector (imageSize=640) 1 1 73.01 1247.0
HumanLandmarksDetector (imageSize=640) 8 1 31.16 1473.0
HumanLandmarksDetector (imageSize=640) 8 4 22.93 1774.0
HumanLandmarksDetector (imageSize=640) 8 8 24.01 2010.0
HumanRedetect 1 1 2.61 1239.0
HumanRedetect 8 1 2.76 1545.0
HumanRedetect 8 4 1.24 1770.0
HumanRedetect 8 8 1.26 1987.0

CPU. Estimations performance with batch interface#

The table below shows the performance of Estimations on the CPU for estimators that have a batch interface. All these measurements are performed with minFaceSize=50.

Measurement CPU threads BatchSize Average (ms) RAM Memory (MB)
Eyes (INFRA_RED, useStatusPlan=0) 1 1 0.6 1184.0
Eyes (INFRA_RED, useStatusPlan=0) 8 1 0.4 1204.0
Eyes (INFRA_RED, useStatusPlan=0) 8 8 0.3 1202.0
Eyes (RGB, useStatusPlan=0) 1 1 1.2 1237.0
Eyes (RGB, useStatusPlan=0) 8 1 0.8 1259.0
Eyes (RGB, useStatusPlan=0) 8 8 0.5 1258.0
Eyes (INFRA_RED, useStatusPlan=1) 1 1 0.6 1187.0
Eyes (INFRA_RED, useStatusPlan=1) 8 1 0.4 1207.0
Eyes (INFRA_RED, useStatusPlan=1) 8 8 0.3 1205.0
Eyes (RGB, useStatusPlan=1) 1 1 1.1 1241.0
Eyes (RGB, useStatusPlan=1) 8 1 0.8 1257.0
Eyes (RGB, useStatusPlan=1) 8 8 0.5 1255.0
Infra-Red 1 1 2 1191.0
Infra-Red 8 1 1.0 1209.0
Infra-Red 8 8 0.7 1218.0
AGS 1 1 0.3 1242.0
AGS 8 1 0.2 1259.0
AGS 8 8 0.07 1303.0
HeadPoseByImage 1 1 0.3 1188.0
HeadPoseByImage 8 1 0.3 1220.0
HeadPoseByImage 8 8 0.09 1252.0
Warper 1 1 2.1 1180.0
Warper 8 1 2.2 1219.0
Warper 8 8 0.9 1230.0
Child 1 1 18.7 1263.0
Child 8 1 6.3 1281.0
Child 8 8 5.2 1297.0
BlackWhite 1 1 1.3 1249.0
BlackWhite 8 1 0.7 1265.0
BlackWhite 8 8 1.2 1263.0
BestShotQuality 1 1 0.3 1238.0
BestShotQuality 8 1 0.2 1259.0
BestShotQuality 8 8 0.08 1299.0
MedicalMask 1 1 5.6 1258.0
MedicalMask 8 1 3.2 1287.0
MedicalMask 8 8 2.8 1318.0
LivenessOneShotRGBEstimator 1 1 214.6 1359.0
LivenessOneShotRGBEstimator 8 1 58.7 1428.0
LivenessOneShotRGBEstimator 8 8 78.8 2536.0
Orientation 1 1 20.8 1166.0
Orientation 8 1 10.1 1250.0
Orientation 8 8 8.9 1406.0
CredibilityCheck 1 1 120.3 1332.0
CredibilityCheck 8 1 35.1 1351.0
CredibilityCheck 8 8 34.1 1558.0
FacialHair 1 1 2.7 1249.0
FacialHair 8 1 1.9 1264.0
FacialHair 8 8 0.99 1266.0
PortraitStyle 1 1 1.0 1243.0
PortraitStyle 8 1 1.2 1260.0
PortraitStyle 8 8 1.7 1309.0
Background 1 1 1.1 1239.0
Background 8 1 1.2 1258.0
Background 8 8 1.7 1305.0
NaturalLight 1 1 2.37 1250.0
NaturalLight 8 1 1.49 1267.0
NaturalLight 8 8 1.97 1276.0
FishEye 1 1 2.77 1257.0
FishEye 8 1 2.08 1276.0
FishEye 8 8 5.86 1271.0
RedEye 1 1 5.7 1241.0
RedEye 8 1 1.9 1260.0
RedEye 8 8 1.6 1264.0
HeadWear 1 1 2.22 1249.0
HeadWear 8 1 1.51 1262.0
HeadWear 8 8 1.96 1276.0
EyeBrowEstimator 1 1 13.82 1257.0
EyeBrowEstimator 8 1 4.77 1273.0
EyeBrowEstimator 8 8 3.05 1273.0
HumanAttributeEstimator 1 1 14.91 1236.0
HumanAttributeEstimator 8 1 7.23 1247.0
HumanAttributeEstimator 8 8 4.46 1290.0
Mouth 1 1 6.64 1252.0
Mouth 8 1 2.64 1271.0
Mouth 8 8 2.12 1290.0

CPU. Estimations performance without batch interface#

The table below shows the performance of Estimations on the CPU for estimators that do not have a batch interface. All these measurements are performed with minFaceSize=50.

Measurement CPU threads Average (ms) RAM Memory (MB)
EyesGaze 1 2.2 1250.0
EyesGaze 8 1.4 1270.0
Emotions 1 13.6 1262.0
Emotions 8 4.9 1275.0
Attributes 1 63.3 1265.0
Attributes 8 19.8 1291.0
Quality 1 1.2 1178.0
Quality 8 0.6 1220.0
Overlap 1 4.5 1248.0
Overlap 8 1.3 1267.0
Glasses 1 1.8 1240.0
Glasses 8 0.8 1264.0
PPE 1 8.9 1273.0
PPE 8 4.9 1283.0
LivenessFlyingFaces 1 9.2 1278.0
LivenessFlyingFaces 8 5.0 1530.0
LivenessRGBMEstimator 1 30.6 1245.0
LivenessRGBMEstimator 8 9.7 1265.0
LivenessFPR 1 44.2 1263.0
LivenessFPR 8 19.9 1293.0

CPU. Extractor performance#

The table below shows the performance of Extractor on the CPU.

Model CPU threads Average (ms) RAM Memory (MB)
57 1 221.2 1475.0
57 8 58.3 1551.0
58 1 219.3 1470.0
58 8 58.0 1543.0
59 1 219.7 1473.0
59 8 58.2 1550.0
102 1 1.8 1149.0
102 8 2.1 1190.0
103 1 142.2 1459.0
103 8 50.6 1494.0
104 1 12.6 1186.0
104 8 6.2 1243.0

CPU. Matcher performance#

The table below shows the performance of Matcher on the CPU. The table lists the average number of matches per second for descriptors obtained with the following CNN model versions:

  • face descriptors: 57, 58, 59
  • human body descriptors: 102, 103, 104

Model CPU threads Batch Size Average (matches/sec) RAM Memory (MB)
57 1 1000 42.2 M 20.0
58 1 1000 42.2 M 18.0
59 1 1000 42.2 M 15.0
102 1 1000 10.17 M 16.0
103 1 1000 10.17 M 16.0
104 1 1000 10.17 M 20.0

Note: The values above are the maximum performance of the matcher on this particular hardware. In general, performance does not depend on the batch size, but it may be limited by memory performance at large batch sizes.
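
To put the matcher rate in context, the time for one exhaustive search over a gallery can be estimated by dividing the gallery size by the rate. The sketch below uses the 42.2 M matches/sec figure reported for models 57 to 59; the function name, the gallery size of one million descriptors, and the linear scaling over cores are assumptions, and real throughput may be limited by memory performance.

```python
def search_time_ms(gallery_size, matches_per_sec, cores=1):
    """Estimated time in ms to match one descriptor against a whole gallery.

    Assumes throughput scales linearly with the number of cores, which is
    an idealization and not a measured property.
    """
    return gallery_size / (matches_per_sec * cores) * 1000.0


# One probe against a hypothetical gallery of 1,000,000 face descriptors.
print(f"{search_time_ms(1_000_000, 42.2e6):.1f} ms on one core")
```

Under these assumptions a single core needs about 24 ms per probe for a gallery of one million descriptors.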

GPU performance#

Benchmarking for GPU was performed on the following hardware configuration:

GPU: NVIDIA Tesla T4.

OS: CentOS Linux release 8.3.2011

GPU. Detector performance#

The table below shows the performance of Detector on the GPU.

Measurement Batch Size Average (ms) GPU Memory (MB) RAM Memory (MB)
Detector (minFaceSize=20) 1 31.8 1293.0 1683.0
Detector (minFaceSize=20) 4 35.0 2941.0 1733.0
Detector (minFaceSize=20) 8 38.9 5235.0 1785.0
Detector (minFaceSize=50) 1 7.9 821.0 1675.0
Detector (minFaceSize=50) 4 6.9 1103.0 1706.0
Detector (minFaceSize=50) 8 6.6 1571.0 1730.0
Detector (minFaceSize=90) 1 5.2 811.0 1675.0
Detector (minFaceSize=90) 4 3.8 873.0 1698.0
Detector (minFaceSize=90) 8 3.4 1011.0 1720.0
Redetect 1 3.45 821.0 1682.0
Redetect 4 1.91 1103.0 1672.0
Redetect 8 1.64 1571.0 1693.0
Redetect 16 1.51 2369.0 1728.0

GPU. HumanDetector performance#

The table below shows the performance of HumanDetector on the GPU.

Measurement Batch Size Average (ms) GPU Memory (MB) RAM Memory (MB)
HumanDetector (imageSize=320) 1 3.96 1251.0 1683.0
HumanDetector (imageSize=320) 4 2.5 815.0 1696.0
HumanDetector (imageSize=320) 8 2.2 895.0 1741.0
HumanDetector (imageSize=640) 1 5.22 789.0 1686.0
HumanDetector (imageSize=640) 4 4.61 1013.0 1704.0
HumanDetector (imageSize=640) 8 4.43 1251.0 1729.0
HumanLandmarksDetector (imageSize=320) 1 15.34 833.0 1787.0
HumanLandmarksDetector (imageSize=320) 4 7.02 991.0 1792.0
HumanLandmarksDetector (imageSize=320) 8 5.65 1215.0 1777.0
HumanLandmarksDetector (imageSize=640) 1 16.52 821.0 1802.0
HumanLandmarksDetector (imageSize=640) 4 9.13 1013.0 1778.0
HumanLandmarksDetector (imageSize=640) 8 7.79 1283.0 1801.0
HumanRedetect 1 2.74 789.0 1696.0
HumanRedetect 4 1.67 1013.0 1695.0
HumanRedetect 8 1.47 1251.0 1689.0
HumanRedetect 16 1.4 1867.0 1709.0

GPU. Estimations performance with batch interface#

The table below shows the performance of Estimations on the GPU for estimators that have a batch interface. All these measurements are performed with minFaceSize=50.

Measurement Batch Size Average (ms) GPU Memory (MB) RAM Memory (MB)
HeadPoseByImage 1 2.32 733.0 1679.0
HeadPoseByImage 32 1.43 923.0 1864.0
Warper 1 0.11 739.0 1672.0
Warper 32 0.03 931.0 1672.0
Eyes (INFRA_RED, useStatusPlan=0) 1 0.65 811.0 1668.0
Eyes (INFRA_RED, useStatusPlan=0) 16 0.23 811.0 1667.0
Eyes (INFRA_RED, useStatusPlan=0) 32 0.2 811.0 1674.0
Eyes (RGB, useStatusPlan=0) 1 1.19 821.0 1681.0
Eyes (RGB, useStatusPlan=0) 16 0.44 821.0 1669.0
Eyes (RGB, useStatusPlan=0) 32 0.43 853.0 1683.0
Eyes (INFRA_RED, useStatusPlan=1) 1 0.64 811.0 1666.0
Eyes (INFRA_RED, useStatusPlan=1) 16 0.23 811.0 1678.0
Eyes (INFRA_RED, useStatusPlan=1) 32 0.2 811.0 1672.0
Eyes (RGB, useStatusPlan=1) 1 0.66 821.0 1671.0
Eyes (RGB, useStatusPlan=1) 16 0.24 821.0 1673.0
Eyes (RGB, useStatusPlan=1) 32 0.23 853.0 1680.0
Infra-Red 1 1.11 811.0 1666.0
Infra-Red 32 0.54 811.0 1679.0
AGS 1 2.2 821.0 1676.0
AGS 16 1.46 917.0 1764.0
Child 1 2.66 853.0 1694.0
Child 16 1.11 963.0 1697.0
BlackWhite 1 1.05 821.0 1676.0
BlackWhite 16 0.4 853.0 1677.0
BestShotQuality 1 2.31 821.0 1677.0
BestShotQuality 16 1.45 917.0 1765.0
MedicalMask 1 5.01 821.0 1702.0
MedicalMask 16 1.69 917.0 1791.0
LivenessOneShotRGBEstimator 1 20.41 1138.0 1800.0
LivenessOneShotRGBEstimator 16 17.48 3628.0 1801.0
Orientation 1 3.56 751.0 1668.0
Orientation 16 2.92 1169.0 1670.0
CredibilityCheck 1 5.54 947.0 1774.0
CredibilityCheck 16 3.72 1339.0 1771.0
FacialHair 1 1.59 853.0 1680.0
FacialHair 16 0.33 853.0 1686.0
PortraitStyle 1 2.5 821.0 1672.0
PortraitStyle 16 1.5 917.0 1770.0
Background 1 2.6 821.0 1679.0
Background 16 1.5 917.0 1770.0
NaturalLight 1 3.61 853.0 1692.0
NaturalLight 16 0.27 853.0 1695.0
FishEye 1 2.91 821.0 1684.0
FishEye 16 1.51 821.0 1692.0
RedEye 1 1.1 821.0 1675.0
RedEye 16 0.15 821.0 1675.0
HeadWear 1 3.65 853.0 1684.0
HeadWear 16 0.26 853.0 1682.0
EyeBrowEstimator 1 1.8 821.0 1696.0
EyeBrowEstimator 16 0.88 821.0 1704.0
EyeBrowEstimator 32 0.84 821.0 1696.0
HumanAttributeEstimator 1 5.05 853.0 1727.0
HumanAttributeEstimator 16 0.64 853.0 1722.0
Mouth 1 4.03 853.0 1690.0
Mouth 16 0.42 949.0 1691.0
Mouth 32 0.37 1043.0 1690.0

GPU. Estimations performance without batch interface#

The table below shows the performance of Estimations on the GPU for estimators that do not have a batch interface. All these measurements are performed with minFaceSize=50.

Measurement Average (ms) GPU Memory (MB) RAM Memory (MB)
EyesGaze 1.65 821.0 1675.0
Emotions 1.99 821.0 1689.0
Attributes 4.95 815.0 1717.0
Quality 0.98 731.0 1665.0
Overlap 1.23 821.0 1688.0
PPE 2.62 809.0 1709.0
Glasses 1.01 821.0 1679.0
LivenessFlyingFaces 5.78 853.0 1693.0
LivenessRGBMEstimator 6.96 821.0 1674.0
LivenessFPR 12.56 885.0 1697.0

GPU. Extractor performance#

The table below shows the performance of Extractor on the GPU.

Model Batch Size Average (ms) GPU Memory (MB) RAM Memory (MB)
57 1 10.2 929.0 1841.0
57 16 6.5 1341.0 1834.0
58 1 10.2 989.0 1835.0
58 16 6.4 1781.0 1825.0
59 1 10.2 929.0 1833.0
59 16 6.4 1341.0 1837.0
102 1 3.7 733.0 1659.0
102 16 0.3 763.0 1658.0
103 1 7.2 913.0 1817.0
103 16 3.7 1279.0 1821.0
104 1 4.5 759.0 1686.0
104 16 0.6 865.0 1690.0

NPU performance#

Benchmarking for NPU was performed on the server with the following hardware configuration:

NPU: Huawei Atlas 300I (inference card).

OS: Ubuntu 18.04

CPU: Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz x 48

RAM: 64GB

NPU. Detector performance#

The table below shows the performance of Detector on the NPU.

Measurement BatchSize Average (ms)
Detector (minFaceSize=20) 1 25.7
Detector (minFaceSize=20) 4 18.7
Detector (minFaceSize=20) 8 17.3
Detector (minFaceSize=50) 1 25.7
Detector (minFaceSize=50) 4 18.0
Detector (minFaceSize=50) 8 17.3
Detector (minFaceSize=90) 1 25.5
Detector (minFaceSize=90) 4 18.0
Detector (minFaceSize=90) 8 17.1
Redetect 1 12.7
Redetect 4 6.0
Redetect 8 5.1

NPU. Estimations performance with batch interface#

The table below shows the performance of Estimations on the NPU for estimators that have a batch interface. All these measurements are performed with minFaceSize=50.

Measurement BatchSize Average (ms)
HeadPoseByImage 1 8.0
HeadPoseByImage 16 4.2
HeadPoseByImage 32 3.9
AGS 1 6.6
AGS 16 3.7
AGS 32 3.7
BestShotQuality 1 15.6
BestShotQuality 16 7.8
BestShotQuality 32 7.6
MedicalMask 1 6.1
MedicalMask 16 3.8
MedicalMask 32 3.7

NPU. Estimations performance without batch interface#

The table below shows the performance of Estimations on the NPU for estimators that do not have a batch interface. All these measurements are performed with minFaceSize=50.

Measurement Average (ms)
Warper 2.1

NPU. Extractor performance#

The table below shows the performance of Extractor on the NPU.

Type Model Batch Size Average (ms)
Extractor 57 1 10.9
Extractor 57 16 7.4

Runtime performance for macOS environment#

Face detection performance depends on input image parameters such as resolution and bit depth as well as the size of the detected face.

Input data characteristics:

  • Image resolution: 1920x1080 px;
  • Image format: 24 BPP RGB.

Performance measurements are presented for the CPU execution mode in the tables below. Measured values are averages of at least 1000 experiments.

The results for minimum batch size and optimal batch size are shown in the tables below. All the intermediate and non-optimal values are omitted.

Face detections are performed using FaceDetV3 NN.

Intel-based processor performance (x86_64)#

Benchmarking for CPU was performed on the device with the following configuration:

Hardware Overview:

  • Model Name: Mac mini
  • Processor Name: 6-Core Intel Core i5
  • Processor Speed: 3 GHz
  • Number of Processors: 1
  • Total Number of Cores: 6
  • Memory: 16 GB
  • CPU with AVX2 instruction set was used

OS: macOS 11.2.1

In the experiments listed in the tables below, face detection and descriptor extraction algorithms used all available CPU cores, whereas matching performance is specified per core.

Intel-based processor. Detector performance#

The table below shows the performance of Detector on the Intel-based processor.

Measurement CPU threads BatchSize Average (ms)
Detector (minFaceSize=20) 1 1 261.3
Detector (minFaceSize=20) 8 1 170.0
Detector (minFaceSize=20) 8 4 174.7
Detector (minFaceSize=20) 8 8 178.4
Detector (minFaceSize=50) 1 1 38.9
Detector (minFaceSize=50) 8 1 28.3
Detector (minFaceSize=50) 8 4 29.8
Detector (minFaceSize=50) 8 8 30.3
Detector (minFaceSize=90) 1 1 14.0
Detector (minFaceSize=90) 8 1 9.8
Detector (minFaceSize=90) 8 4 9.9
Detector (minFaceSize=90) 8 8 10.7
Redetect 1 1 4.3
Redetect 8 1 1.98
Redetect 8 4 1.27
Redetect 8 8 1.45

Intel-based processor. HumanDetector performance#

The table below shows the performance of HumanDetector on the Intel-based processor.

Measurement CPU threads BatchSize Average (ms)
HumanDetector (imageSize=320) 1 1 10.28
HumanDetector (imageSize=320) 8 1 4.55
HumanDetector (imageSize=320) 8 4 4.94
HumanDetector (imageSize=320) 8 8 4.84
HumanDetector (imageSize=640) 1 1 27.99
HumanDetector (imageSize=640) 8 1 15.12
HumanDetector (imageSize=640) 8 4 17.04
HumanDetector (imageSize=640) 8 8 17.98
HumanLandmarksDetector (imageSize=320) 1 1 33.8
HumanLandmarksDetector (imageSize=320) 8 1 16.24
HumanLandmarksDetector (imageSize=320) 8 4 13.99
HumanLandmarksDetector (imageSize=320) 8 8 14.94
HumanLandmarksDetector (imageSize=640) 1 1 57.66
HumanLandmarksDetector (imageSize=640) 8 1 26.69
HumanLandmarksDetector (imageSize=640) 8 4 26.37
HumanLandmarksDetector (imageSize=640) 8 8 28.39
HumanRedetect 1 1 3.69
HumanRedetect 8 1 1.96
HumanRedetect 8 4 1.21
HumanRedetect 8 8 1.33

Intel-based processor. Estimations performance with batch interface#

The table below shows the performance of Estimations on the Intel-based processor for estimators that have a batch interface. All these measurements are performed with minFaceSize=50.

Measurement CPU threads BatchSize Average (ms)
HeadPoseByImage 1 1 0.21
HeadPoseByImage 8 1 0.17
HeadPoseByImage 8 8 0.07
Eyes (INFRA_RED, useStatusPlan=0) 1 1 0.45
Eyes (INFRA_RED, useStatusPlan=0) 8 1 0.29
Eyes (INFRA_RED, useStatusPlan=0) 8 8 0.23
Eyes (RGB, useStatusPlan=0) 1 1 0.85
Eyes (RGB, useStatusPlan=0) 8 1 0.58
Eyes (RGB, useStatusPlan=0) 8 8 0.48
Eyes (INFRA_RED, useStatusPlan=1) 1 1 0.85
Eyes (INFRA_RED, useStatusPlan=1) 8 1 0.57
Eyes (INFRA_RED, useStatusPlan=1) 8 8 0.47
Eyes (RGB, useStatusPlan=1) 1 1 0.85
Eyes (RGB, useStatusPlan=1) 8 1 0.58
Eyes (RGB, useStatusPlan=1) 8 8 0.47
Infra-Red 1 1 1.51
Infra-Red 8 1 0.71
Infra-Red 8 8 0.81
AGS 1 1 0.18
AGS 8 1 0.13
AGS 8 8 0.05
Child 1 1 12.22
Child 8 1 4.67
Child 8 8 6.37
BlackWhite 1 1 1.0
BlackWhite 8 1 0.4
BlackWhite 8 8 1.0
BestShotQuality 1 1 0.18
BestShotQuality 8 1 0.14
BestShotQuality 8 8 0.06
MedicalMask 1 1 4.15
MedicalMask 8 1 2.2
MedicalMask 8 8 2.2
LivenessOneShotRGBEstimator 1 1 153.1
LivenessOneShotRGBEstimator 8 1 70.5
LivenessOneShotRGBEstimator 8 8 95.9
Orientation 1 1 15.5
Orientation 8 1 8.1
Orientation 8 8 10.0
CredibilityCheck 1 1 96.4
CredibilityCheck 8 1 36.0
CredibilityCheck 8 8 37.9
PortraitStyle 1 1 1.3
PortraitStyle 8 1 1.3
PortraitStyle 8 8 1.7
Background 1 1 0.7
Background 8 1 0.7
Background 8 8 1.4
NaturalLight 1 1 2.9
NaturalLight 8 1 1.7
NaturalLight 8 8 1.8
FishEye 1 1 2.09
FishEye 8 1 1.58
FishEye 8 8 5.17
RedEye 1 1 7.5
RedEye 8 1 1.4
RedEye 8 8 1.5
HeadWear 1 1 1.76
HeadWear 8 1 1.07
HeadWear 8 8 1.62
EyeBrowEstimator 1 1 15.57
EyeBrowEstimator 8 1 4.02
EyeBrowEstimator 8 8 4.95
HumanAttributeEstimator 1 1 13.11
HumanAttributeEstimator 8 1 5.24
HumanAttributeEstimator 8 8 4.5
Mouth 1 1 5.24
Mouth 8 1 2.16
Mouth 8 8 2.36

Intel-based processor. Estimations performance without batch interface#

The table below shows the performance of Estimations on the Intel-based processor for estimators that do not have a batch interface. All these measurements are performed with minFaceSize=50.

Measurement CPU threads Average (ms)
EyesGaze 1 1.7
EyesGaze 8 0.9
Emotions 1 10.5
Emotions 8 4.1
Attributes 1 49.4
Attributes 8 16.4
Quality 1 1.4
Quality 8 0.6
Warper 1 1.3
Warper 8 1.0
Overlap 1 3.4
Overlap 8 1.1
PPE 1 6.3
PPE 8 3.8
Glasses 1 1.3
Glasses 8 0.5
LivenessFlyingFaces 1 7.7
LivenessFlyingFaces 8 4.2
LivenessRGBMEstimator 1 20.4
LivenessRGBMEstimator 8 8.9
LivenessFPR 1 31.2
LivenessFPR 8 18.0

Intel-based processor. Extractor performance#

The table below shows the performance of Extractor on the Intel-based processor.

Type Model CPU threads Average (ms)
Extractor 57 1 169.3
Extractor 57 8 60.7
Extractor 58 1 168.9
Extractor 58 8 63.8
Extractor 59 1 168.8
Extractor 59 8 68.6
Extractor 102 1 2.4
Extractor 102 8 1.56
Extractor 103 1 113.2
Extractor 103 8 44.9
Extractor 104 1 10.8
Extractor 104 8 9.21

Intel-based processor. Matcher performance#

The table below shows the performance of Matcher on the Intel-based processor. The table lists the average number of matches per second for descriptors obtained with CNN 57, 58, 59, 102, 103 and 104.

Type Model CPU threads Batch Size Average (matches/sec)
Matcher 57, 58 1 10000 37.34 M
Matcher 59 1 10000 40.56 M
Matcher 102 1 10000 14.46 M
Matcher 103 1 10000 9.26 M
Matcher 104 1 10000 14.66 M

Note: The values above are the maximum performance of the matcher on this particular hardware. In general, performance does not depend on the batch size, but it may be limited by memory performance at large batch sizes.

ARM-based processor performance (aarch64)#

Benchmarking for CPU was performed on the device with the following configuration:

Hardware Overview:

  • Model Name: Mac mini
  • Chip: Apple M1
  • Total Number of Cores: 8 (4 performance and 4 efficiency)
  • Memory: 16 GB
  • The AVX2 instruction set is not available

OS: macOS 11.2

In the experiments listed in the tables below, face detection and descriptor extraction algorithms used all available CPU cores, whereas matching performance is specified per core.

ARM-based processor. Detector performance#

The table below shows the performance of Detector on the ARM-based processor.

Measurement CPU threads BatchSize Average (ms)
Detector (minFaceSize=20) 1 1 661.4
Detector (minFaceSize=20) 8 1 262.6
Detector (minFaceSize=20) 8 4 262.2
Detector (minFaceSize=20) 8 8 266.3
Detector (minFaceSize=50) 1 1 108.0
Detector (minFaceSize=50) 8 1 49.8
Detector (minFaceSize=50) 8 4 45.6
Detector (minFaceSize=50) 8 8 44.7
Detector (minFaceSize=90) 1 1 34.6
Detector (minFaceSize=90) 8 1 20.7
Detector (minFaceSize=90) 8 4 17.2
Detector (minFaceSize=90) 8 8 16.6
Redetect 1 1 4.9
Redetect 8 1 3.8
Redetect 8 4 2.7
Redetect 8 8 2.5

ARM-based processor. HumanDetector performance#

The table below shows the performance of HumanDetector on the ARM-based processor.

Measurement CPU threads BatchSize Average (ms)
HumanDetector (imageSize=320) 1 1 21.74
HumanDetector (imageSize=320) 8 1 11.47
HumanDetector (imageSize=320) 8 4 8.0
HumanDetector (imageSize=320) 8 8 1.15
HumanDetector (imageSize=640) 1 1 79.3
HumanDetector (imageSize=640) 8 1 30.38
HumanDetector (imageSize=640) 8 4 25.2
HumanDetector (imageSize=640) 8 8 24.77
HumanLandmarksDetector (imageSize=320) 1 1 68.95
HumanLandmarksDetector (imageSize=320) 8 1 40.91
HumanLandmarksDetector (imageSize=320) 8 4 31.9
HumanLandmarksDetector (imageSize=320) 8 8 27.87
HumanLandmarksDetector (imageSize=640) 1 1 126.23
HumanLandmarksDetector (imageSize=640) 8 1 59.94
HumanLandmarksDetector (imageSize=640) 8 4 48.85
HumanLandmarksDetector (imageSize=640) 8 8 44.9
HumanRedetect 1 1 3.51
HumanRedetect 8 1 2.31
HumanRedetect 8 4 1.67
HumanRedetect 8 8 1.51

ARM-based processor. Estimations performance with batch interface#

The table below shows the performance of Estimations on the ARM-based processor for estimators that have a batch interface. All these measurements are performed with minFaceSize=50.

Measurement CPU threads BatchSize Average (ms)
HeadPoseByImage 1 1 0.41
HeadPoseByImage 8 1 0.35
HeadPoseByImage 8 8 0.19
Eyes (INFRA_RED, useStatusPlan=0) 1 1 0.92
Eyes (INFRA_RED, useStatusPlan=0) 8 1 0.59
Eyes (INFRA_RED, useStatusPlan=0) 8 8 0.4
Eyes (RGB, useStatusPlan=0) 1 1 1.07
Eyes (RGB, useStatusPlan=0) 8 1 0.63
Eyes (RGB, useStatusPlan=0) 8 8 0.48
Eyes (INFRA_RED, useStatusPlan=1) 1 1 1.07
Eyes (INFRA_RED, useStatusPlan=1) 8 1 0.63
Eyes (INFRA_RED, useStatusPlan=1) 8 8 0.48
Eyes (RGB, useStatusPlan=1) 1 1 2.06
Eyes (RGB, useStatusPlan=1) 8 1 1.24
Eyes (RGB, useStatusPlan=1) 8 8 0.95
Infra-Red 1 1 4.72
Infra-Red 8 1 2.4
Infra-Red 8 8 1.82
AGS 1 1 0.37
AGS 8 1 0.29
AGS 8 8 0.17
Child 1 1 39.43
Child 8 1 22.54
Child 8 8 18.1
BlackWhite 1 1 2.9
BlackWhite 8 1 1.3
BlackWhite 8 8 1.3
BestShotQuality 1 1 0.39
BestShotQuality 8 1 0.31
BestShotQuality 8 8 0.18
MedicalMask 1 1 12.5
MedicalMask 8 1 6.28
MedicalMask 8 8 4.53
LivenessOneShotRGBEstimator 1 1 799.5
LivenessOneShotRGBEstimator 8 1 203.9
LivenessOneShotRGBEstimator 8 8 201.4
Orientation 1 1 41.45
Orientation 8 1 18.68
Orientation 8 8 15.76
CredibilityCheck 1 1 296.2
CredibilityCheck 8 1 103.2
CredibilityCheck 8 8 105.9
PortraitStyle 1 1 2.3
PortraitStyle 8 1 2.2
PortraitStyle 8 8 2.0
Background 1 1 2.4
Background 8 1 2.3
Background 8 8 2.1
NaturalLight 1 1 5.2
NaturalLight 8 1 2.9
NaturalLight 8 8 2.6
FishEye 1 1 9.3
FishEye 8 1 4.7
FishEye 8 8 5.3
RedEye 1 1 10.5
RedEye 8 1 3.3
RedEye 8 8 3.0
HeadWear 1 1 5.03
HeadWear 8 1 2.66
HeadWear 8 8 2.52
EyeBrowEstimator 1 1 35.58
EyeBrowEstimator 8 1 13.44
EyeBrowEstimator 8 8 10.22
HumanAttributeEstimator 1 1 37.32
HumanAttributeEstimator 8 1 23.16
HumanAttributeEstimator 8 8 18.95
Mouth 1 1 16.26
Mouth 8 1 7.86
Mouth 8 8 5.36

ARM-based processor. Estimations performance without batch interface#

The table below shows the performance of Estimations on the ARM-based processor for estimators that do not have a batch interface. All these measurements are performed with minFaceSize=50.

Measurement CPU threads Average (ms)
EyesGaze 1 5.3
EyesGaze 8 3.0
Emotions 1 36.9
Emotions 8 15.8
Attributes 1 154.9
Attributes 8 58.0
Quality 1 2.1
Quality 8 1.2
Warper 1 0.9
Warper 8 1.9
Overlap 1 9.0
Overlap 8 3.6
PPE 1 22.3
PPE 8 15.9
Glasses 1 4.3
Glasses 8 2.2
LivenessFlyingFaces 1 15.7
LivenessFlyingFaces 8 6.4
LivenessRGBMEstimator 1 51.5
LivenessRGBMEstimator 8 30.6
LivenessFPR 1 103.7
LivenessFPR 8 56.3

ARM-based processor. Extractor performance#

The table below shows the performance of Extractor on the ARM-based processor.

Type Model CPU threads Average (ms)
Extractor 57 1 547.2
Extractor 57 8 189.3
Extractor 58 1 547.2
Extractor 58 8 189.2
Extractor 59 1 535.6
Extractor 59 8 188.6

ARM-based processor. Matcher performance#

The table below shows the performance of Matcher on the ARM-based processor. The table lists the average number of matches per second for descriptors obtained with CNN 57, 58 and 59.

Type Model CPU threads Batch Size Average (matches/sec)
Matcher 57, 58 1 10000 2.08 M
Matcher 59 1 10000 2.08 M

Note: The values above are the maximum performance of the matcher on this particular hardware. In general, performance does not depend on the batch size, but it may be limited by memory performance at large batch sizes.

Runtime performance for embedded environment#

Face detection performance depends on input image parameters such as resolution and bit depth as well as the size of the detected face.

Input data characteristics:

  • Image resolution: 640x480 px;
  • Image format: 24 BPP RGB.

The results for minimum batch size and optimal batch size are shown in the tables below. All the intermediate and non-optimal values are omitted.

Face detections are performed using FaceDetV3 NN.

Jetson#

 

Performance measurements are presented for Jetson. Measured values are averages of at least 100 experiments. Mobilenet is not used by default.

Jetson TX#

Jetson TX GPU. Detector performance#

The table below shows the performance of Detector on the Jetson TX GPU.

Type Batch Size Average (ms)
Detector (minFaceSize=20) 1 499.59
Detector (minFaceSize=20) 4 470.32
Detector (minFaceSize=50) 1 88.97
Detector (minFaceSize=50) 4 80.13
Detector (minFaceSize=50) 8 79.67
Detector (minFaceSize=90) 1 35.66
Detector (minFaceSize=90) 4 30.14
Detector (minFaceSize=90) 8 29.48
Redetect 1 9.5
Redetect 4 5.2
Redetect 8 4.5

Jetson TX GPU. HumanDetector performance#

The table below shows the performance of HumanDetector on the Jetson TX GPU.

Type Batch Size Average (ms)
HumanDetector (imageSize=320) 1 15.6
HumanDetector (imageSize=320) 4 13.75
HumanDetector (imageSize=320) 8 13.75
HumanDetector (imageSize=640) 1 44.55
HumanDetector (imageSize=640) 4 41.57
HumanDetector (imageSize=640) 8 3.51
HumanLandmarksDetector (imageSize=320) 1 68.84
HumanLandmarksDetector (imageSize=320) 4 35.04
HumanLandmarksDetector (imageSize=320) 8 32.84
HumanLandmarksDetector (imageSize=640) 1 98.44
HumanLandmarksDetector (imageSize=640) 4 62.84
HumanLandmarksDetector (imageSize=640) 8 58.65
HumanRedetect 1 6.15
HumanRedetect 4 4.09
HumanRedetect 8 3.5
HumanRedetect 16 3.41

Jetson TX GPU. Estimations performance with batch interface#

The table below shows the performance of Estimations on the Jetson TX GPU for estimators that have a batch interface. All these measurements are performed with minFaceSize=50.

Type Batch Size Average (ms)
HeadPoseByImage 1 8.85
HeadPoseByImage 32 2.82
Eyes (INFRA_RED, useStatusPlan=0) 1 1.53
Eyes (INFRA_RED, useStatusPlan=0) 16 1.02
Eyes (INFRA_RED, useStatusPlan=0) 32 0.93
Eyes (RGB, useStatusPlan=0) 1 2.83
Eyes (RGB, useStatusPlan=0) 16 1.68
Eyes (RGB, useStatusPlan=0) 32 1.65
Eyes (INFRA_RED, useStatusPlan=1) 1 1.49
Eyes (INFRA_RED, useStatusPlan=1) 16 1.17
Eyes (INFRA_RED, useStatusPlan=1) 32 1.1
Eyes (RGB, useStatusPlan=1) 1 2.82
Eyes (RGB, useStatusPlan=1) 16 1.68
Eyes (RGB, useStatusPlan=1) 32 1.6
Infra-Red 1 3.29
AGS 1 5.02
AGS 16 2.57
Child 1 15.23
Child 16 8.95
BlackWhite 1 3.0
BlackWhite 16 1.1
BestShotQuality 1 5.41
BestShotQuality 16 2.59
MedicalMask 1 13.4
MedicalMask 32 4.98
LivenessOneShotRGBEstimator 1 188.8
Orientation 1 26.3
CredibilityCheck 1 44.5
CredibilityCheck 8 35.7
CredibilityCheck 16 34.4
CredibilityCheck 32 34.1
FacialHair 1 3.6
FacialHair 16 2.7
PortraitStyle 1 7.1
PortraitStyle 16 4.0
Background 1 7.2
Background 16 3.9
NaturalLight 1 13.8
NaturalLight 16 1.5
FishEye 1 8.24
FishEye 16 5.41
RedEye 1 2.1
RedEye 16 0.8
HeadWear 1 14.1
HeadWear 16 1.58
EyeBrowEstimator 1 11.81
EyeBrowEstimator 8 10.32
EyeBrowEstimator 16 9.81
EyeBrowEstimator 32 9.57
HumanAttributeEstimator 1 18.81
HumanAttributeEstimator 16 5.48
Mouth 1 14.76
Mouth 16 4.14
Mouth 32 4.0

Jetson TX GPU. Estimations performance without batch interface#

The table below shows the performance of Estimations on the Jetson TX GPU for estimators that do not have a batch interface. All these measurements are performed with minFaceSize=50.

Type Average (ms)
EyesGaze 4.29
Emotions 11.96
Attributes 27.24
Quality 2.17
Warper 8.08
Overlap 3.98
Glasses 3.63
PPE 9.96
LivenessFlyingFaces 19.68
LivenessRGBMEstimator 64.42
LivenessFPR 62.67

Jetson TX GPU. Extractor performance#

The table below shows the performance of Extractor on the Jetson TX GPU.

Type Model Batch Size Average (ms)
Extractor 57 1 76.07
Extractor 57 8 62.03
Extractor 58 1 76.15
Extractor 58 8 61.63
Extractor 59 1 76.15
Extractor 59 8 61.64
Extractor 102 1 17.31
Extractor 102 8 2.61
Extractor 103 1 45.64
Extractor 103 8 32.34
Extractor 104 1 15.23
Extractor 104 8 5.41
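
The averages above can be turned into rough throughput figures. The sketch below is not part of the SDK; it assumes that "Average (ms)" is the time per processed image (which the source does not state explicitly), and the function names are hypothetical helpers introduced only for illustration.

```python
def throughput_per_second(average_ms: float) -> float:
    """Convert an average time in milliseconds into items per second.

    Assumption: average_ms is the time spent per image, not per batch.
    """
    if average_ms <= 0:
        raise ValueError("average_ms must be positive")
    return 1000.0 / average_ms


def batch_speedup(single_ms: float, batched_ms: float) -> float:
    """Ratio of the batch-size-1 average to the batched average."""
    if single_ms <= 0 or batched_ms <= 0:
        raise ValueError("timings must be positive")
    return single_ms / batched_ms


if __name__ == "__main__":
    # Values taken from the table above: Extractor, model 102, Jetson TX GPU.
    single_ms = 17.31   # batch size 1
    batched_ms = 2.61   # batch size 8
    print(f"batch 1: {throughput_per_second(single_ms):.1f} images/s")
    print(f"batch 8: {throughput_per_second(batched_ms):.1f} images/s")
    print(f"speedup: {batch_speedup(single_ms, batched_ms):.2f}x")
```

If the averages are instead measured per batch, divide by the batch size before converting.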

Jetson Xavier#

Jetson Xavier GPU. Detector performance#

The table below shows the performance of Detector on the Jetson Xavier GPU.

Type Batch Size Average (ms)
Detector (minFaceSize=20) 1 89.56
Detector (minFaceSize=20) 4 102.86
Detector (minFaceSize=20) 8 153.48
Detector (minFaceSize=50) 1 19.27
Detector (minFaceSize=50) 4 16.73
Detector (minFaceSize=50) 8 16.24
Detector (minFaceSize=90) 1 10.38
Detector (minFaceSize=90) 4 7.41
Detector (minFaceSize=90) 8 6.87
Redetect 1 6.4
Redetect 4 2.9
Redetect 8 2.3

Jetson Xavier GPU. HumanDetector performance#

The table below shows the performance of HumanDetector on the Jetson Xavier GPU.

Type Batch Size Average (ms)
HumanDetector (imageSize=320) 1 9.89
HumanDetector (imageSize=320) 4 7.47
HumanDetector (imageSize=320) 8 6.7
HumanDetector (imageSize=640) 1 2.76
HumanDetector (imageSize=640) 4 18.94
HumanDetector (imageSize=640) 8 18.14
HumanLandmarksDetector (imageSize=320) 1 39.69
HumanLandmarksDetector (imageSize=320) 4 22.61
HumanLandmarksDetector (imageSize=320) 8 19.02
HumanLandmarksDetector (imageSize=640) 1 51.2
HumanLandmarksDetector (imageSize=640) 4 34.61
HumanLandmarksDetector (imageSize=640) 8 30.71
HumanRedetect 1 3.83
HumanRedetect 4 1.98
HumanRedetect 8 1.71
HumanRedetect 16 1.48

Jetson Xavier GPU. Estimations performance with batch interface#

The table below shows the performance of Estimations on the Jetson Xavier GPU for estimators that have a batch interface. All these measurements are performed with minFaceSize=50.

Type Batch Size Average (ms)
HeadPoseByImage 1 4.38
HeadPoseByImage 32 0.89
Eyes (INFRA_RED, useStatusPlan=0) 1 1.12
Eyes (INFRA_RED, useStatusPlan=0) 16 0.53
Eyes (INFRA_RED, useStatusPlan=0) 32 0.48
Eyes (RGB, useStatusPlan=0) 1 2.17
Eyes (RGB, useStatusPlan=0) 16 1.0
Eyes (RGB, useStatusPlan=0) 32 0.99
Eyes (INFRA_RED, useStatusPlan=1) 1 1.12
Eyes (INFRA_RED, useStatusPlan=1) 16 0.51
Eyes (INFRA_RED, useStatusPlan=1) 32 0.5
Eyes (RGB, useStatusPlan=1) 1 2.16
Eyes (RGB, useStatusPlan=1) 16 1.1
Eyes (RGB, useStatusPlan=1) 32 0.99
Infra-Red 1 2.3
Infra-Red 32 1.25
AGS 1 2.83
AGS 32 0.86
Child 1 8.37
Child 8 5.88
BlackWhite 1 2.2
BlackWhite 16 0.6
BestShotQuality 1 3.04
BestShotQuality 32 0.88
MedicalMask 1 6.59
MedicalMask 32 3.45
LivenessOneShotRGBEstimator 1 97.95
LivenessOneShotRGBEstimator 8 81.8
Orientation 1 11.6
Orientation 32 9.75
CredibilityCheck 1 35.2
CredibilityCheck 8 25.09
CredibilityCheck 16 24.64
CredibilityCheck 32 24.22
FacialHair 1 3.35
FacialHair 16 1.84
PortraitStyle 1 3.6
PortraitStyle 16 1.8
Background 1 3.8
Background 16 1.8
NaturalLight 1 3.6
NaturalLight 16 1.5
FishEye 1 4.75
FishEye 16 2.36
RedEye 1 2.0
RedEye 16 0.5
HeadWear 1 4.34
HeadWear 16 1.49
EyeBrowEstimator 1 7.21
EyeBrowEstimator 8 5.32
EyeBrowEstimator 16 5.16
EyeBrowEstimator 32 5.02
HumanAttributeEstimator 1 9.99
HumanAttributeEstimator 16 4.17
Mouth 1 5.84
Mouth 16 2.63
Mouth 32 2.33

Jetson Xavier GPU. Estimations performance without batch interface#

The table below shows the performance of Estimations on the Jetson Xavier GPU for estimators that do not have a batch interface. All these measurements are performed with minFaceSize=50.

Type Average (ms)
EyesGaze 2.99
Emotions 7.48
Attributes 20.3
Quality 1.64
Warper 6.63
Overlap 3.03
PPE 6.43
Glasses 2.14
LivenessFlyingFaces 6.98
LivenessRGBMEstimator 27.14
LivenessFPR 39.41

Jetson Xavier GPU. Extractor performance#

The table below shows the performance of Extractor on the Jetson Xavier GPU.

Type Model Batch Size Average (ms)
Extractor 57 1 66.4
Extractor 57 8 44.1
Extractor 58 1 66.2
Extractor 58 8 44.1
Extractor 59 1 66.3
Extractor 59 8 44.1
Extractor 102 1 8.3
Extractor 102 8 0.98
Extractor 103 1 18.3
Extractor 103 8 19.4
Extractor 104 1 6.6
Extractor 104 8 2.4

Jetson Xavier NX#

Jetson Xavier NX GPU. Detector performance#

The table below shows the performance of Detector on the Jetson Xavier NX GPU.

Type Batch Size Average (ms)
Detector (minFaceSize=20) 1 172.28
Detector (minFaceSize=20) 4 171.78
Detector (minFaceSize=20) 8 238.0
Detector (minFaceSize=50) 1 32.12
Detector (minFaceSize=50) 4 32.21
Detector (minFaceSize=50) 8 29.32
Detector (minFaceSize=90) 1 15.57
Detector (minFaceSize=90) 4 12.19
Detector (minFaceSize=90) 8 11.57
Redetect 1 6.9
Redetect 4 2.8
Redetect 8 2.3

Jetson Xavier NX GPU. HumanDetector performance#

The table below shows the performance of HumanDetector on the Jetson Xavier NX GPU.

Type Batch Size Average (ms)
HumanDetector (imageSize=320) 1 8.72
HumanDetector (imageSize=320) 4 7.08
HumanDetector (imageSize=320) 8 6.45
HumanDetector (imageSize=640) 1 20.23
HumanDetector (imageSize=640) 4 18.76
HumanDetector (imageSize=640) 8 18.02
HumanLandmarksDetector (imageSize=320) 1 42.62
HumanLandmarksDetector (imageSize=320) 4 20.75
HumanLandmarksDetector (imageSize=320) 8 18.17
HumanLandmarksDetector (imageSize=640) 1 61.21
HumanLandmarksDetector (imageSize=640) 4 32.95
HumanLandmarksDetector (imageSize=640) 8 30.08
HumanRedetect 1 4.1
HumanRedetect 4 2.07
HumanRedetect 8 1.74
HumanRedetect 16 1.57

Jetson Xavier NX GPU. Estimations performance with batch interface#

The table below shows the performance of Estimations on the Jetson Xavier NX GPU for estimators that have a batch interface. All these measurements are performed with minFaceSize=50.

Type Batch Size Average (ms)
HeadPoseByImage 1 5.6
HeadPoseByImage 32 1.3
Eyes (INFRA_RED, useStatusPlan=0) 1 1.36
Eyes (INFRA_RED, useStatusPlan=0) 16 0.65
Eyes (INFRA_RED, useStatusPlan=0) 32 0.6
Eyes (RGB, useStatusPlan=0) 1 2.21
Eyes (RGB, useStatusPlan=0) 16 1.09
Eyes (RGB, useStatusPlan=0) 32 1.01
Eyes (INFRA_RED, useStatusPlan=1) 1 1.37
Eyes (INFRA_RED, useStatusPlan=1) 16 0.71
Eyes (INFRA_RED, useStatusPlan=1) 32 0.65
Eyes (RGB, useStatusPlan=1) 1 2.48
Eyes (RGB, useStatusPlan=1) 16 1.31
Eyes (RGB, useStatusPlan=1) 32 1.21
Infra-Red 1 2.32
Infra-Red 32 1.49
AGS 1 3.41
AGS 32 1.25
Child 1 7.85
Child 8 5.49
BlackWhite 1 2.4
BlackWhite 16 0.7
BestShotQuality 1 3.59
BestShotQuality 32 1.27
MedicalMask 1 7.01
MedicalMask 32 3.41
LivenessOneShotRGBEstimator 1 112.7
LivenessOneShotRGBEstimator 16 81.81
Orientation 1 11.57
Orientation 32 10.17
CredibilityCheck 1 31.05
CredibilityCheck 8 22.59
CredibilityCheck 16 21.91
CredibilityCheck 32 21.5
FacialHair 1 2.97
FacialHair 16 1.63
PortraitStyle 1 4.2
PortraitStyle 16 2.0
Background 1 4.0
Background 16 2.1
NaturalLight 1 4.48
NaturalLight 16 1.26
FishEye 1 5.01
FishEye 16 2.42
RedEye 1 2.1
RedEye 16 0.5
HeadWear 1 4.96
HeadWear 16 1.27
EyeBrowEstimator 1 6.27
EyeBrowEstimator 8 5.14
EyeBrowEstimator 16 4.89
EyeBrowEstimator 32 4.79
HumanAttributeEstimator 1 10.1
HumanAttributeEstimator 16 3.78
Mouth 1 6.91
Mouth 16 2.36
Mouth 32 2.14

Jetson Xavier NX GPU. Estimations performance without batch interface#

The table below shows the performance of Estimations on the Jetson Xavier NX GPU for estimators that do not have a batch interface. All these measurements are performed with minFaceSize=50.

Type Average (ms)
EyesGaze 3.7
Emotions 6.8
Attributes 17.36
Quality 1.59
Warper 9.82
Overlap 3.56
PPE 6.31
Glasses 2.05
LivenessFlyingFaces 8.46
LivenessRGBMEstimator 28.1
LivenessFPR 40.5

Jetson Xavier NX GPU. Extractor performance#

The table below shows the performance of Extractor on the Jetson Xavier NX GPU.

Type Model Batch Size Average (ms)
Extractor 57 1 58.2
Extractor 57 16 38.1
Extractor 58 1 58.0
Extractor 58 16 38.1
Extractor 59 1 58.0
Extractor 59 16 38.0
Extractor 102 1 10.7
Extractor 102 16 1.0
Extractor 103 1 28.4
Extractor 103 16 41.3
Extractor 104 1 9.8
Extractor 104 16 3.6

Jetson Nano#

Jetson Nano GPU. Detector performance#

The table below shows the performance of Detector on the Jetson Nano GPU.

Type Batch Size Average (ms)
Detector (minFaceSize=20) 1 1784.37
Detector (minFaceSize=50) 1 322.4
Detector (minFaceSize=50) 4 308.74
Detector (minFaceSize=50) 8 313.25
Detector (minFaceSize=90) 1 127.57
Detector (minFaceSize=90) 4 116.31
Detector (minFaceSize=90) 8 117.14
Redetect 1 11.08
Redetect 8 6.78
Redetect 16 6.8

Jetson Nano GPU. HumanDetector performance#

The table below shows the performance of HumanDetector on the Jetson Nano GPU.

Type Batch Size Average (ms)
HumanDetector (imageSize=320) 1 56.61
HumanDetector (imageSize=320) 4 53.09
HumanDetector (imageSize=320) 8 50.64
HumanDetector (imageSize=640) 1 163.04
HumanDetector (imageSize=640) 4 154.65
HumanDetector (imageSize=640) 8 161.43
HumanLandmarksDetector (imageSize=320) 1 168.85
HumanLandmarksDetector (imageSize=320) 4 140.82
HumanLandmarksDetector (imageSize=320) 8 135.28
HumanLandmarksDetector (imageSize=640) 1 300.52
HumanLandmarksDetector (imageSize=640) 4 252.41
HumanLandmarksDetector (imageSize=640) 8 243.25
HumanRedetect 1 10.69
HumanRedetect 4 8.14
HumanRedetect 8 7.25
HumanRedetect 16 7.61

Jetson Nano GPU. Estimations performance with batch interface#

The table below shows the performance of Estimations on the Jetson Nano GPU for estimators that have a batch interface. All these measurements are performed with minFaceSize=50.

Type Batch Size Average (ms)
HeadPoseByImage 1 6.59
HeadPoseByImage 4 5.2
HeadPoseByImage 16 3.49
Warper 1 1.8
Warper 4 1.62
Warper 16 1.68
Eyes (INFRA_RED, useStatusPlan=0) 1 3.5
Eyes (INFRA_RED, useStatusPlan=0) 4 2.93
Eyes (INFRA_RED, useStatusPlan=0) 16 2.71
Eyes (RGB, useStatusPlan=0) 1 6.9
Eyes (RGB, useStatusPlan=0) 4 5.82
Eyes (RGB, useStatusPlan=0) 16 5.23
Eyes (INFRA_RED, useStatusPlan=1) 1 3.51
Eyes (INFRA_RED, useStatusPlan=1) 4 2.88
Eyes (INFRA_RED, useStatusPlan=1) 16 2.63
Eyes (RGB, useStatusPlan=1) 1 7.44
Eyes (RGB, useStatusPlan=1) 4 5.77
Eyes (RGB, useStatusPlan=1) 16 5.33
Infra-Red 1 9.76
Infra-Red 4 8.62
Infra-Red 16 8.32
AGS 1 6.26
AGS 4 4.19
AGS 16 3.5
Child 1 63.82
Child 4 48.97
Child 16 42.4
BlackWhite 1 6.17
BlackWhite 4 3.41
BlackWhite 16 2.76
BestShotQuality 1 6.68
BestShotQuality 4 4.43
BestShotQuality 16 3.53
MedicalMask 1 26.13
MedicalMask 4 19.52
MedicalMask 16 17.62
LivenessOneShotRGBEstimator 1 1167.49
LivenessOneShotRGBEstimator 4 1193.24
Orientation 1 120.76
Orientation 4 114.46
Orientation 16 113.92
CredibilityCheck 1 282.77
CredibilityCheck 4 250.91
CredibilityCheck 16 240.66
FacialHair 1 15.46
FacialHair 4 15.03
FacialHair 16 14.49
PortraitStyle 1 13.7
PortraitStyle 4 10.45
PortraitStyle 16 10.13
Background 1 12.8
Background 4 10.82
Background 16 9.98
NaturalLight 1 27.08
NaturalLight 4 10.89
NaturalLight 16 7.43
FishEye 1 56.38
FishEye 4 14.49
FishEye 16 3.58
RedEye 1 7.9
RedEye 4 5.2
RedEye 16 4.63
HeadWear 1 29.07
HeadWear 4 12.27
HeadWear 16 8.52
EyeBrowEstimator 1 57.24
EyeBrowEstimator 4 54.78
EyeBrowEstimator 16 55.01
HumanAttributeEstimator 1 56.53
HumanAttributeEstimator 4 38.71
HumanAttributeEstimator 16 34.36
Mouth 1 41.05
Mouth 4 35.52
Mouth 16 24.92

Jetson Nano GPU. Estimations performance without batch interface#

The table below shows the performance of Estimations on the Jetson Nano GPU for estimators that do not have a batch interface. All these measurements are performed with minFaceSize=50.

Type Average (ms)
EyesGaze 12.37
Emotions 49.04
Attributes 140.48
Quality 4.35
Overlap 10.28
PPE 37.88
Glasses 10.83
LivenessFlyingFaces 42.99
LivenessRGBMEstimator 172.06
LivenessFPR 206.17

Jetson Nano GPU. Extractor performance#

The table below shows the performance of Extractor on the Jetson Nano GPU.

Type Model Batch Size Average (ms)
Extractor 58 1 442.35
Extractor 58 4 403.95
Extractor 59 1 428.35
Extractor 59 4 411.47
Extractor 102 1 26.17
Extractor 102 4 9.9
Extractor 103 1 254.11
Extractor 103 4 215.07
Extractor 104 1 39.97
Extractor 104 4 31.63

Descriptor size#

The table below shows the size of serialized face descriptors, which can be used to estimate memory requirements.

"Descriptor size"

Face descriptor version Data size (bytes) Metadata size (bytes) Total size (bytes)
CNN 54 512 8 520
CNN 56 512 8 520
CNN 57 512 8 520
CNN 58 512 8 520
CNN 59 512 8 520

The table below shows the size of serialized human descriptors, which can be used to estimate memory requirements. Human descriptors are used only for reidentification tasks.

"Human descriptor size (used only for reidentification tasks)"

Human descriptor version Data size (bytes) Metadata size (bytes) Total size (bytes)
CNN 102 2048 8 2056
CNN 103 2048 8 2056
CNN 104 2048 8 2056

Metadata includes signature and version information that may be omitted during serialization if the NoSignature flag is specified.

When estimating individual descriptor size in memory or serialization storage requirements with default options, consider using values from the "Total size" column.

When estimating memory requirements for descriptor batches, use values from the "Data size" column instead, since a descriptor batch does not duplicate metadata per descriptor and thus is more memory-efficient.

These numbers are approximations only: they do not include overhead such as memory alignment for accelerated SIMD processing.
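
The sizing rules above can be sketched as a small calculation. This is an illustrative estimate, not SDK code: the function and constant names are hypothetical, the `no_signature` parameter only models the effect of the NoSignature flag described above, and the results exclude alignment and other overhead.

```python
# Sizes in bytes, taken from the "Descriptor size" tables above.
FACE_DESCRIPTOR_DATA_BYTES = 512
HUMAN_DESCRIPTOR_DATA_BYTES = 2048
METADATA_BYTES = 8


def individual_storage_bytes(count: int, data_bytes: int,
                             no_signature: bool = False) -> int:
    """Estimate storage for `count` separately serialized descriptors.

    With default options each descriptor carries its own metadata
    ("Total size" column). no_signature=True models serialization with
    the metadata omitted.
    """
    per_descriptor = data_bytes if no_signature else data_bytes + METADATA_BYTES
    return count * per_descriptor


def batch_memory_bytes(count: int, data_bytes: int) -> int:
    """Estimate memory for a descriptor batch ("Data size" column),
    since a batch does not duplicate metadata per descriptor."""
    return count * data_bytes


if __name__ == "__main__":
    n = 1_000_000
    print("individual face descriptors:",
          individual_storage_bytes(n, FACE_DESCRIPTOR_DATA_BYTES), "bytes")
    print("face descriptor batch:",
          batch_memory_bytes(n, FACE_DESCRIPTOR_DATA_BYTES), "bytes")
    print("human descriptor batch:",
          batch_memory_bytes(n, HUMAN_DESCRIPTOR_DATA_BYTES), "bytes")
```

For one million face descriptors the two estimates differ by 8 MB (8 bytes of metadata per descriptor), which is the saving the batch layout provides under these assumptions.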