Appendix A. Specifications#

Classification performance#

Classification performance was measured on two datasets:

  • Cooperative dataset (containing 20K images from various sources obtained at several banks);
  • Non-cooperative dataset (containing 20K images).

The two tables below contain true positive rates corresponding to select false positive rates.

"Classification performance @ low FPR on cooperative dataset"

FPR TPR CNN 58 TPR CNN 59 TPR CNN 59m TPR CNN 60 TPR CNN 60m TPR CNN 62 TPR CNN 65
10^-7 0.9910 0.9911 0.9876 0.9917 0.9660 0.9909 0.9909
10^-6 0.9916 0.9915 0.9904 0.9917 0.9824 0.9950 0.9950
10^-5 0.9918 0.9919 0.9915 0.9919 0.9889 0.9976 0.9976
10^-4 0.9919 0.9921 0.9919 0.9921 0.9909 0.9988 0.9988

"Classification performance @ low FPR on non-cooperative dataset"

FPR TPR CNN 58 TPR CNN 59 TPR CNN 59m TPR CNN 60 TPR CNN 60m TPR CNN 62 TPR CNN 65
10^-7 0.9767 0.9832 0.9377 0.9893 0.8797 0.9916 0.9909
10^-6 0.9839 0.9880 0.9629 0.9914 0.9246 0.9917 0.9950
10^-5 0.9880 0.9908 0.9794 0.9914 0.9595 0.9918 0.9976
10^-4 0.9909 0.9924 0.9880 0.9925 0.9821 0.9920 0.9988
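
A true positive rate at a fixed false positive rate, as reported in the tables above, can be derived from comparison scores. The following is a minimal sketch, not the evaluation code used to produce these numbers. The function name, the score arrays, and the convention that a pair is accepted when its score is strictly greater than the threshold are all assumptions made for illustration.

```python
import numpy as np


def tpr_at_fpr(genuine_scores, impostor_scores, target_fpr):
    """Return (TPR, threshold) at the lowest threshold whose FPR <= target_fpr.

    Assumption: a pair is accepted when score > threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    # Sort impostor scores from highest to lowest.
    impostor = np.sort(np.asarray(impostor_scores, dtype=float))[::-1]

    # Number of impostor pairs that may be (wrongly) accepted.
    allowed = int(np.floor(target_fpr * impostor.size))

    if allowed >= impostor.size:
        threshold = -np.inf  # every pair is accepted
    else:
        # At most `allowed` impostor scores lie strictly above this value.
        threshold = impostor[allowed]

    tpr = float(np.mean(genuine > threshold))
    return tpr, float(threshold)


# Toy data, invented for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.35, 0.45, 0.5, 0.6, 0.65]
print(tpr_at_fpr(genuine, impostor, target_fpr=0.25))
```

Note that a target FPR of 10^-7 is only measurable when at least 10^7 impostor comparisons are available; with fewer, `allowed` becomes 0 and the threshold falls back to the highest impostor score.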

Runtime performance for CentOS Linux environment#

Face detection performance depends on input image parameters such as resolution and bit depth as well as the size of the detected face.

Input data characteristics:

  • Image resolution: 1920x1080px;
  • Image format: 24 BPP RGB;
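
For orientation, the raw size of one input frame follows directly from the resolution and pixel format listed above. This is a small sketch that assumes tightly packed pixels with no row padding; the helper name is hypothetical and the SDK's internal image layout is not described in this appendix.

```python
def raw_image_bytes(width, height, bits_per_pixel=24):
    """Size of an uncompressed image, assuming no row padding."""
    if width <= 0 or height <= 0 or bits_per_pixel <= 0:
        raise ValueError("dimensions and bit depth must be positive")
    return width * height * bits_per_pixel // 8


frame = raw_image_bytes(1920, 1080)
print(frame)      # 6220800 bytes for one 1920x1080 24 BPP frame
print(frame * 8)  # 49766400 bytes for a batch of 8 such frames
```

This covers only the raw pixel buffers and is much smaller than the RAM figures in the tables, which also include the loaded networks and intermediate data.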

Performance measurements are presented for CPU, GPU and NPU execution modes in the tables below. Measured values are averages of at least 100 experiments.
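
Averaging over repeated runs can be sketched as follows. This is an illustrative harness, not the benchmarking tool used for this appendix: `run_once` stands in for whichever operation is being timed, and the warm-up phase is an assumption that the text above does not mention.

```python
import time
from statistics import mean


def average_latency_ms(run_once, runs=100, warmup=10):
    """Call run_once repeatedly and return the mean wall-clock time in ms."""
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # Warm-up calls are not timed (assumption: lets caches and lazy
    # initialization settle before measuring).
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


# Placeholder workload; replace with the call under test.
print(average_latency_ms(lambda: sum(range(10_000))))
```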

Estimated memory consumption is also presented for CPU and GPU. These values depend heavily on the input data and the conditions of the experiment.

The results for minimum batch size and optimal batch size are shown in the tables below. All the intermediate and non-optimal values are omitted.

Face detections are performed using FaceDetV3 NN.

All face detection and redetect measurements include capturing bounding boxes and 5 facial landmarks.

CPU performance#

Benchmarking for CPU was performed on the server with the following hardware configuration:

CPU:

  • Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz;
  • CPU(s): 40
  • Thread(s) per core: 2
  • Core(s) per socket: 10
  • Socket(s): 2
  • NUMA node(s): 2
  • CPU with AVX2 instruction set was used

OS: CentOS Linux release 8.3.2011

RAM: 128 GB DDR4 (Clock Speed: 2133 MHz)

In the experiments listed in the tables below, face detection and descriptor extraction algorithms used all available CPU cores, whereas matching performance is specified per core.

Descriptor matching is only implemented on CPU.

CPU. Detector performance#

The table below shows the performance of FaceDetV3 Detector on the CPU.

Measurement CPU threads BatchSize Average (ms) RAM Memory (Mb)
Detector (minFaceSize=20) 1 1 373.92 1889.0
Detector (minFaceSize=20) 8 1 152.73 2076.0
Detector (minFaceSize=20) 8 4 147.26 4411.0
Detector (minFaceSize=20) 8 8 148.32 7329.0
Detector (minFaceSize=50) 1 1 63.23 1261.0
Detector (minFaceSize=50) 8 1 27.52 1482.0
Detector (minFaceSize=50) 8 4 23.43 1810.0
Detector (minFaceSize=50) 8 8 24.61 2358.0
Detector (minFaceSize=90) 1 1 23.11 1184.0
Detector (minFaceSize=90) 8 1 11.62 1364.0
Detector (minFaceSize=90) 8 4 8.03 1470.0
Detector (minFaceSize=90) 8 8 8.23 1748.0
Redetect 1 1 0.63 1252.0
Redetect 8 1 0.83 1284.0
Redetect 8 4 0.32 1673.0
Redetect 8 8 0.25 2153.0
FaceLandmarks5Detector 1 1 0.22 1225.0
FaceLandmarks5Detector 8 1 0.37 1225.0
FaceLandmarks5Detector 8 8 0.09 1226.0
FaceLandmarks68Detector 1 1 3.2 1226.0
FaceLandmarks68Detector 8 1 2.0 1230.0
FaceLandmarks68Detector 8 8 1.0 1237.0

CPU. HumanDetector performance#

The table below shows the performance of HumanDetector on the CPU.

Measurement CPU threads BatchSize Average (ms) RAM Memory (Mb)
HumanDetector (resize to 320) 1 1 10.05 1740.0
HumanDetector (resize to 320) 8 1 6.18 1813.0
HumanDetector (resize to 320) 8 8 3.53 1978.0
HumanDetector (resize to 640) 1 1 35.03 1776.0
HumanDetector (resize to 640) 8 1 14.71 1865.0
HumanDetector (resize to 640) 8 8 11.55 2234.0
HumanRedetect 1 1 2.61 1239.0
HumanRedetect 8 1 2.76 1545.0
HumanRedetect 8 4 1.24 1770.0
HumanRedetect 8 8 1.26 1987.0

CPU. HumanFaceDetector performance#

The table below shows the performance of HumanFaceDetector on the CPU.

Measurement CPU threads BatchSize Average (ms) RAM Memory (Mb)
HumanFaceDetector (minFaceSize=20) 1 1 425.37 2558
HumanFaceDetector (minFaceSize=20) 8 1 183.5 2600
HumanFaceDetector (minFaceSize=20) 8 8 182.35 9340
HumanFaceDetector (minFaceSize=50) 1 1 66.97 1783
HumanFaceDetector (minFaceSize=50) 8 1 28.9 1812
HumanFaceDetector (minFaceSize=50) 8 8 29.17 2900
HumanFaceDetector (minFaceSize=90) 1 1 22.6 1734
HumanFaceDetector (minFaceSize=90) 8 1 10.71 1758
HumanFaceDetector (minFaceSize=90) 8 8 9.17 2072

CPU. HeadDetector performance#

Type CPU threads Batch Size Average (ms) RAM Memory (Mb)
HeadDetector (minHeadSize=20) 1 1 322.93 2156.0
HeadDetector (minHeadSize=20) 8 1 118.41 2223.0
HeadDetector (minHeadSize=20) 8 8 109.41 5578.0
HeadDetector (minHeadSize=50) 1 1 57.97 1781.0
HeadDetector (minHeadSize=50) 8 1 23.99 1823.0
HeadDetector (minHeadSize=50) 8 8 19.94 2485.0
HeadDetector (minHeadSize=90) 1 1 23.37 1708.0
HeadDetector (minHeadSize=90) 8 1 10.9 1779.0
HeadDetector (minHeadSize=90) 8 8 7.32 2036.0

CPU. Estimations performance with batch interface#

The table below shows the performance of Estimations on the CPU for estimators that have a batch interface. All these measurements are performed with minFaceSize=50.

Measurement CPU threads BatchSize Average (ms) RAM Memory (Mb)
Eyes (INFRA_RED, useStatusPlan=0) 1 1 1.61 1853.0
Eyes (INFRA_RED, useStatusPlan=0) 8 1 0.77 1967.0
Eyes (INFRA_RED, useStatusPlan=0) 8 8 0.35 1978.0
Eyes (RGB, useStatusPlan=0) 1 1 1.53 1858.0
Eyes (RGB, useStatusPlan=0) 8 1 0.68 1974.0
Eyes (RGB, useStatusPlan=0) 8 8 0.34 1974.0
Eyes (INFRA_RED, useStatusPlan=1) 1 1 0.8 1768.0
Eyes (INFRA_RED, useStatusPlan=1) 8 1 0.35 1830.0
Eyes (INFRA_RED, useStatusPlan=1) 8 8 0.19 1832.0
Eyes (RGB, useStatusPlan=1) 1 1 0.79 1766.0
Eyes (RGB, useStatusPlan=1) 8 1 0.35 1829.0
Eyes (RGB, useStatusPlan=1) 8 8 0.19 1827.0
Infra-Red 1 1 2 1191.0
Infra-Red 8 1 1.0 1209.0
Infra-Red 8 8 0.7 1218.0
AGS 1 1 0.24 1735.0
AGS 8 1 0.15 1763.0
AGS 8 8 0.08 1804.0
HeadPoseByImage 1 1 0.24 1648.0
HeadPoseByImage 8 1 0.15 1672.0
HeadPoseByImage 8 8 0.06 1712.0
Warper 1 1 2.1 1180.0
Warper 8 1 2.2 1219.0
Warper 8 8 0.9 1230.0
BlackWhite 1 1 1.3 1249.0
BlackWhite 8 1 0.7 1265.0
BlackWhite 8 8 1.2 1263.0
BestShotQuality 1 1 0.5 1833.0
BestShotQuality 8 1 0.22 1857.0
BestShotQuality 8 8 0.1 1896.0
MedicalMask 1 1 5.6 1258.0
MedicalMask 8 1 3.2 1287.0
MedicalMask 8 8 2.8 1318.0
LivenessOneShotRGBEstimator 1 1 199.57 2119.0
LivenessOneShotRGBEstimator 8 1 51.62 2204.0
LivenessOneShotRGBEstimator 8 8 47.39 2570.0
Orientation 1 1 5.06 1609.0
Orientation 8 1 3.33 1682.0
Orientation 8 8 1.86 1875.0
CredibilityCheck 1 1 120.3 1332.0
CredibilityCheck 8 1 35.1 1351.0
CredibilityCheck 8 8 34.1 1558.0
FacialHair 1 1 12.86 1751.0
FacialHair 8 1 4.84 1770.0
FacialHair 8 8 4.24 1794.0
PortraitStyle 1 1 1.54 1738.0
PortraitStyle 8 1 2.2 1846.0
PortraitStyle 8 8 0.95 1915.0
Background 1 1 1.1 1239.0
Background 8 1 1.2 1258.0
Background 8 8 1.7 1305.0
NaturalLight 1 1 2.37 1250.0
NaturalLight 8 1 1.49 1267.0
NaturalLight 8 8 1.97 1276.0
FishEye 1 1 12.8 1747.0
FishEye 8 1 4.8 1747.0
FishEye 8 8 0.6 1771.0
RedEye 1 1 5.7 1241.0
RedEye 8 1 1.9 1260.0
RedEye 8 8 1.6 1264.0
HeadWear 1 1 4.09 1742.0
HeadWear 8 1 2.63 1769.0
HeadWear 8 8 1.2 1773.0
EyeBrowEstimator 1 1 13.06 1751.0
EyeBrowEstimator 8 1 4.82 1769.0
EyeBrowEstimator 8 8 4.27 1781.0
HumanAttributeEstimator 1 1 11.93 1624.0
HumanAttributeEstimator 8 1 5.83 1651.0
HumanAttributeEstimator 8 8 3.78 1699.0
Mouth 1 1 6.64 1252.0
Mouth 8 1 2.64 1271.0
Mouth 8 8 2.12 1290.0
CrowdEstimator (Single, minHeadSize=6) 1 1 3157.74 2613.0
CrowdEstimator (Single, minHeadSize=6) 8 1 900.79 2631.0
CrowdEstimator (Single, minHeadSize=6) 8 8 615.48 8676.0
CrowdEstimator (Single, minHeadSize=12) 1 1 801.6 1969.0
CrowdEstimator (Single, minHeadSize=12) 8 1 231.88 1990.0
CrowdEstimator (Single, minHeadSize=12) 8 8 147.72 3535.0
CrowdEstimator (TwoNets, minHeadSize=6) 1 1 3085.82 2641.0
CrowdEstimator (TwoNets, minHeadSize=6) 8 1 906.33 2714.0
CrowdEstimator (TwoNets, minHeadSize=6) 8 8 613.95 9073.0
CrowdEstimator (TwoNets, minHeadSize=12) 1 1 819.59 2005.0
CrowdEstimator (TwoNets, minHeadSize=12) 8 1 239.66 2072.0
CrowdEstimator (TwoNets, minHeadSize=12) 8 8 162.99 3955.0
DynamicRange 1 1 1.49 1721.0
DynamicRange 8 1 1.61 1749.0
DynamicRange 8 8 0.81 1793.0
LivenessDepthRGB 1 1 8.06 1757.0
LivenessDepthRGB 8 1 4.13 1796.0
LivenessDepthRGB 8 8 2.96 1839.0
Glasses 1 1 0.86 1743.0
Glasses 8 1 1.01 1768.0
Glasses 8 8 0.42 1768.0
DeepFake 1 1 232.47 2110.0
DeepFake 8 1 78.41 2179.0
DeepFake 8 8 76.75 2505.0
NIRLivenessEstimator 1 1 15.49 1625.0
NIRLivenessEstimator 8 1 10.05 1639.0
NIRLivenessEstimator 8 8 9.47 1747.0
LivenessRGBMEstimator 1 1 29.1 1968.0
LivenessRGBMEstimator 8 1 10.71 2037.0
LivenessRGBMEstimator 8 8 8.74 2356.0
DepthLivenessEstimator 1 1 2.15 1856.0
DepthLivenessEstimator 8 1 1.35 1876.0
DepthLivenessEstimator 8 8 0.84 1894.0
Attributes 1 1 68.89 1994.0
Attributes 8 1 24.7 2023.0
Attributes 8 8 19.32 2274.0
FaceOcclusionBatch 1 1 7.35 1303.0
FaceOcclusionBatch 1 8 3.61 1469.0
FaceOcclusionBatch 8 8 3.03 1455.0

CPU. Estimations performance without batch interface#

The table below shows the performance of Estimations on the CPU for estimators that do not have a batch interface. All these measurements are performed with minFaceSize=50.

Measurement CPU threads Average (ms) RAM Memory (Mb)
EyesGaze 1 2.2 1250
EyesGaze 8 1.4 1270
Emotions 1 13.6 1262
Emotions 8 4.9 1275
Quality 1 1.2 1178
Quality 8 0.6 1220
Overlap 1 4.5 1248
Overlap 8 1.3 1267
PPE 1 11.74 1711.0
PPE 8 5.6 1733.0
LivenessFlyingFaces 1 15.07 1804
LivenessFlyingFaces 8 7.21 1913
LivenessFPR 1 44.2 1263
LivenessFPR 8 19.9 1293
Fights 1 250.26 1876
Fights 8 63.9 1895

CPU. Extractor performance#

The table below shows the performance of Extractor on the CPU.

Model CPU threads Batch Size Average (ms) RAM Memory (Mb)
58 1 1 219.3 1470
58 8 8 58.0 1543
59 1 1 219.7 1473
59 8 8 58.2 1550
60 1 1 258.0 1473
60 8 8 51.1 1550
62 1 1 254.36 2007
62 8 1 67.54 2008
62 8 8 71.48 2025
65 1 1 364.93 1992
65 8 1 120.88 1993
65 8 8 93.0 2616
105 1 1 1.66 1604
105 8 8 0.71 1657
106 1 1 140.76 1892
106 8 8 39.01 1954
107 1 1 12.0 1637
107 8 8 3.7 1723
108 1 1 1.69 1606
108 8 8 0.72 1671
109 1 1 133.7 1822
109 8 8 37.33 1889
110 1 1 15.53 1640
110 8 8 5.39 1733
112 1 1 112.33 1823.0
112 8 1 39.73 1839.0
112 8 8 32.95 1884.0
113 1 1 15.17 1640.0
113 8 1 6.57 1656.0
113 8 8 4.7 1727.0
115 1 1 117.12 1920.0
115 8 1 41.21 1947.0
115 8 8 33.19 1967.0
116 1 1 16.79 1739.0
116 8 1 7.23 1759.0
116 8 8 5.07 1811.0

CPU. Matcher performance#

The table below shows the performance of Matcher on the CPU. The table includes the average number of matches per second for descriptors obtained using the following CNN model versions:

  • face descriptors: 59, 60, 62
  • human body descriptors: 105, 106, 107, 108, 109, 110, 112, 113, 115, 116

Model CPU threads Batch Size Average (matches/sec) RAM Memory (Mb)
58 1 1000 28 M 15.0
59 1 1000 28 M 15.0
60 1 1000 28 M 15.0
62 1 1000 28 M 15.0
65 1 1000 28 M 15.0
105 1 1000 27.78 M 113
106 1 1000 28.67 M 112
107 1 1000 27.34 M 113
108 1 1000 31.89 M 117
109 1 1000 29.23 M 114
110 1 1000 27.41 M 112
112 1 1000 30 M 109.0
113 1 1000 28.32 M 112.0
115 1 1000 31.6 M 112.0
116 1 1000 28.7 M 112.0

Note: The above value is the maximum performance of the matcher on a particular piece of hardware. Performance in general does not depend on the size of the batch, but may be limited by memory performance at large values of the batch size.
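
The per-core rate can be turned into a rough estimate of how long one probe takes to match against a gallery. This is a back-of-the-envelope sketch: the function name is hypothetical, and linear scaling across cores is an assumption that may not hold once memory bandwidth becomes the limit, as the note above warns.

```python
def estimated_matching_time_ms(gallery_size, matches_per_sec, cores=1):
    """Estimate the time to match one descriptor against gallery_size others.

    Assumption: throughput scales linearly with the number of cores.
    """
    if gallery_size < 0 or matches_per_sec <= 0 or cores < 1:
        raise ValueError("invalid arguments")
    return gallery_size / (matches_per_sec * cores) * 1000.0


# 28 M matches/sec is the per-core value listed in the table above.
print(round(estimated_matching_time_ms(1_000_000, 28e6), 1))  # 35.7
```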

GPU performance#

Benchmarking for GPU was performed on the following hardware configuration:

GPU: NVIDIA Tesla T4.

OS: CentOS Linux release 8.3.2011

GPU. Detector performance#

The table below shows the performance of FaceDetV3 Detector on the GPU.

Measurement Batch Size Average (ms) GPU Memory (Mb) RAM Memory (Mb)
Detector (minFaceSize=20) 1 29.02 1485.0 1663.0
Detector (minFaceSize=20) 4 34.37 3611.0 1691.0
Detector (minFaceSize=20) 8 38.09 6539.0 1741.0
Detector (minFaceSize=50) 1 7.46 847.0 1653.0
Detector (minFaceSize=50) 4 6.56 1207.0 1682.0
Detector (minFaceSize=50) 8 6.24 1779.0 1702.0
Detector (minFaceSize=90) 1 4.95 835.0 1655.0
Detector (minFaceSize=90) 4 3.44 907.0 1669.0
Detector (minFaceSize=90) 8 3.17 1381.0 1694.0
Redetect 1 2.52 847.0 1657.0
Redetect 4 1.64 1207.0 1660.0
Redetect 8 1.47 1779.0 1663.0
Redetect 16 1.38 2781.0 1667.0
FaceLandmarks5Detector 1 2.33 821.0 1651.0
FaceLandmarks5Detector 8 0.32 821.0 1651.0
FaceLandmarks5Detector 16 0.17 821.0 1657.0
FaceLandmarks68Detector 1 2.6 821.0 1669.0
FaceLandmarks68Detector 8 1.5 821.0 1668.3
FaceLandmarks68Detector 16 1.4 949.0 1663.0

GPU. HumanDetector performance#

The table below shows the performance of HumanDetector on the GPU.

Measurement Batch Size Average (ms) GPU Memory (Mb) RAM Memory (Mb)
HumanDetector (resize to 320) 1 4.17 779.0 1778.0
HumanDetector (resize to 320) 4 2.46 819.0 1792.0
HumanDetector (resize to 320) 8 2.17 909.0 1815.0
HumanDetector (resize to 640) 1 5.42 827.0 1784.0
HumanDetector (resize to 640) 4 4.14 1013.0 1796.0
HumanDetector (resize to 640) 8 3.92 1371.0 1824.0
HumanRedetect 1 2.74 789.0 1696.0
HumanRedetect 4 1.67 1013.0 1695.0
HumanRedetect 8 1.47 1251.0 1689.0
HumanRedetect 16 1.4 1867.0 1709.0

GPU. HeadDetector performance#

The table below shows the performance of HeadDetector on the GPU.

Type Batch Size Average (ms) GPU Memory (Mb) RAM Memory (Mb)
HeadDetector (minHeadSize=20) 1 24.38 1561.0 1730.0
HeadDetector (minHeadSize=20) 4 31.35 4103.0 1745.0
HeadDetector (minHeadSize=20) 8 35.85 7491.0 1799.0
HeadDetector (minHeadSize=50) 1 6.63 837.0 1716.0
HeadDetector (minHeadSize=50) 4 5.74 1367.0 1749.0
HeadDetector (minHeadSize=50) 8 5.45 1931.0 1767.0
HeadDetector (minHeadSize=90) 1 4.41 749.0 1720.0
HeadDetector (minHeadSize=90) 4 3.04 905.0 1734.0
HeadDetector (minHeadSize=90) 8 2.8 1103.0 1759.0

GPU. HumanFaceDetector performance#

The table below shows the performance of HumanFaceDetector on the GPU.

Measurement Batch Size Average (ms) GPU Memory (Mb) RAM Memory (Mb)
HumanFaceDetector (minFaceSize=20) 1 34.1 1675.0 1703.0
HumanFaceDetector (minFaceSize=20) 4 42.6 4415.0 1774.0
HumanFaceDetector (minFaceSize=20) 8 50.32 8041.0 1889.0
HumanFaceDetector (minFaceSize=50) 1 7.99 903.0 1674.0
HumanFaceDetector (minFaceSize=50) 4 7.15 1487.0 1706.0
HumanFaceDetector (minFaceSize=50) 8 6.83 2067.0 1764.0
HumanFaceDetector (minFaceSize=90) 1 5.3 903.0 1672.0
HumanFaceDetector (minFaceSize=90) 4 3.52 929.0 1685.0
HumanFaceDetector (minFaceSize=90) 8 3.24 1125.0 1719.0

GPU. Estimations performance with batch interface#

The table below shows the performance of Estimations on the GPU for estimators that have a batch interface. All these measurements are performed with minFaceSize=50.

Measurement Batch Size Average (ms) GPU Memory (Mb) RAM Memory (Mb)
HeadPoseByImage 1 2.26 785.0 1692.0
HeadPoseByImage 16 1.45 881.0 1775.0
HeadPoseByImage 32 1.42 975.0 1873.0
Warper 1 0.11 739.0 1672.0
Warper 32 0.03 931.0 1672.0
Eyes (INFRA_RED, useStatusPlan=0) 1 1.03 855.0 1805.0
Eyes (INFRA_RED, useStatusPlan=0) 16 0.19 855.0 1806.0
Eyes (INFRA_RED, useStatusPlan=0) 32 0.15 887.0 1812.0
Eyes (RGB, useStatusPlan=0) 1 1.04 855.0 1810.0
Eyes (RGB, useStatusPlan=0) 16 0.19 855.0 1808.0
Eyes (RGB, useStatusPlan=0) 32 0.14 887.0 1812.0
Eyes (INFRA_RED, useStatusPlan=1) 1 0.59 743.0 1803.0
Eyes (INFRA_RED, useStatusPlan=1) 16 0.14 743.0 1824.0
Eyes (INFRA_RED, useStatusPlan=1) 32 0.12 775.0 1827.0
Eyes (RGB, useStatusPlan=1) 1 0.6 743.0 1804.0
Eyes (RGB, useStatusPlan=1) 16 0.13 743.0 1825.0
Eyes (RGB, useStatusPlan=1) 32 0.11 775.0 1830.0
Infra-Red 1 1.11 811.0 1666.0
Infra-Red 32 0.54 811.0 1679.0
AGS 1 2.28 899.0 1689.0
AGS 16 1.42 899.0 1777.0
AGS 32 1.39 1089.0 1874.0
BlackWhite 1 1.05 821.0 1676.0
BlackWhite 16 0.4 853.0 1677.0
BestShotQuality 1 3.11 855.0 1821.0
BestShotQuality 16 1.44 855.0 1914.0
BestShotQuality 32 1.41 1045.0 2008.0
MedicalMask 1 5.01 821.0 1702.0
MedicalMask 16 1.69 917.0 1791.0
LivenessOneShotRGBEstimator 1 13.44 1046.0 2091.0
LivenessOneShotRGBEstimator 8 10.61 1614.0 2092.0
LivenessOneShotRGBEstimator 16 10.3 2062.0 2091.0
Orientation 1 3.12 799.0 1670.0
Orientation 16 1.73 963.0 1664.0
Orientation 32 1.69 1141.0 1669.0
CredibilityCheck 1 5.54 947.0 1774.0
CredibilityCheck 16 3.72 1339.0 1771.0
FacialHair 1 1.86 853.0 1687.0
FacialHair 16 0.32 853.0 1683.0
FacialHair 32 0.28 853.0 1685.0
PortraitStyle 1 2.84 895.0 1671.0
PortraitStyle 16 1.51 915.0 1770.0
PortraitStyle 32 1.48 1085.0 1861.0
Background 1 2.6 821.0 1679.0
Background 16 1.5 917.0 1770.0
NaturalLight 1 3.61 853.0 1692.0
NaturalLight 16 0.27 853.0 1695.0
FishEye 1 2.37 895.0 1692.0
FishEye 16 0.14 895.0 1694.0
RedEye 1 1.1 821.0 1675.0
RedEye 16 0.15 821.0 1675.0
HeadWear 1 4.14 853.0 1696.0
HeadWear 16 0.36 853.0 1699.0
HeadWear 32 0.27 853.0 1697.0
EyeBrowEstimator 1 2.56 895.0 1694.0
EyeBrowEstimator 16 0.8 895.0 1693.0
EyeBrowEstimator 32 0.76 803.0 1079.0
HumanAttributeEstimator 1 5.53 853.0 1691.0
HumanAttributeEstimator 16 0.57 853.0 1722.0
Mouth 1 4.03 853.0 1690.0
Mouth 16 0.42 949.0 1691.0
Mouth 32 0.37 1043.0 1690.0
Glasses 1 1.41 901.0 1695.0
Glasses 16 0.2 901.0 1689.0
Glasses 32 0.16 901.0 1686.0
CrowdEstimator (Single, minHeadSize=6) 1 64.57 1569.0 1843.0
CrowdEstimator (Single, minHeadSize=6) 4 65.7 3185.0 1873.0
CrowdEstimator (Single, minHeadSize=6) 8 66.96 3334.0 1904.0
CrowdEstimator (Single, minHeadSize=12) 1 22.15 985.0 1834.0
CrowdEstimator (Single, minHeadSize=12) 4 21.38 1433.0 1857.0
CrowdEstimator (Single, minHeadSize=12) 8 21.67 1496.0 1883.0
CrowdEstimator (TwoNets, minHeadSize=6) 1 69.7 1745.0 1854.0
CrowdEstimator (TwoNets, minHeadSize=6) 4 71.11 3570.0 1903.0
CrowdEstimator (TwoNets, minHeadSize=6) 8 72.04 4164.0 1925.0
CrowdEstimator (TwoNets, minHeadSize=12) 1 26.89 1083.0 1846.0
CrowdEstimator (TwoNets, minHeadSize=12) 4 23.8 1770.0 1871.0
CrowdEstimator (TwoNets, minHeadSize=12) 8 25.44 2208.0 1904.0
DeepFake 1 15.64 1015.0 2087.0
DeepFake 16 13.08 1725.0 2177.0
DeepFake 32 13.09 2685.0 2270.0
LivenessDepthRGB 1 4.79 931.0 1717.0
LivenessDepthRGB 16 3.91 975.0 1809.0
LivenessDepthRGB 32 3.9 1127.0 1914.0
NIRLivenessEstimator 1 8.36 817.0 1677.0
NIRLivenessEstimator 16 7.74 915.0 1775.0
NIRLivenessEstimator 32 7.65 1043.0 1878.0
LivenessRGBMEstimator 1 6.56 871.0 1938.0
LivenessRGBMEstimator 16 4.18 1625.0 2085.0
LivenessRGBMEstimator 32 4.9 2225.0 2238.0
DepthLivenessEstimator 1 2.08 737.0 1927.0
DepthLivenessEstimator 16 0.44 771.0 1932.0
DepthLivenessEstimator 32 0.38 805.0 1936.0
Attributes 1 3.75 871.0 1984.0
Attributes 16 1.97 1373.0 1980.0
Attributes 32 1.9 1895.0 1991.0
FaceOcclusionBatch 1 1.64 620.0 1281.0
FaceOcclusionBatch 16 0.76 844.0 1330.0
FaceOcclusionBatch 32 0.73 1036.0 1324.0

GPU. Estimations performance without batch interface#

The table below shows the performance of Estimations on the GPU for estimators that do not have a batch interface. All these measurements are performed with minFaceSize=50.

Measurement Average (ms) GPU Memory (Mb) RAM Memory (Mb)
EyesGaze 1.65 821 1675
Emotions 1.99 821 1689
Quality 0.98 731 1665
Overlap 1.23 821 1688
PPE 3.32 803.0 1718.0
LivenessFlyingFaces 6.39 927 1694
LivenessFPR 12.56 885 1697
Fights 14.56 1093 1874

GPU. Extractor performance#

The table below shows the performance of Extractor on the GPU.

Model Batch Size Average (ms) GPU Memory (Mb) RAM Memory (Mb)
58 1 10.2 989.0 1835
58 16 6.4 1781.0 1825
59 1 10.2 929.0 1833
59 16 6.4 1341.0 1837
60 1 16.0 931.0 1840
60 16 8.9 1343.0 1845
62 1 11.23 1043.0 2009.0
62 8 7.81 1227.0 2006.0
62 16 7.75 1437.0 2016.0
65 1 6.48 949.0 1995
65 8 3.47 1911.0 1996
65 16 3.34 2439.0 1996
105 1 3.48 785 1664
105 16 0.3 815 1673
106 1 6.28 973 1893
106 16 9.38 1371 1894
107 1 3.41 807 1698
107 16 0.59 911 1696
108 1 3.47 785 1654
108 16 0.3 815 1672
109 1 6.22 933 1833
109 16 7.83 1261 1833
110 1 3.38 809 1693
110 16 0.76 939 1693
112 1 6.52 901.0 1836.0
112 8 3.71 1029.0 1834.0
112 16 3.57 1209.0 1835.0
113 1 3.13 809.0 1696.0
113 8 0.82 873.0 1697.0
113 16 0.68 937.0 1703.0
115 1 6.56 877.0 1925.0
115 8 5.51 1001.0 1931.0
115 16 5.43 1141.0 1932.0
116 1 2.92 753.0 1783.0
116 8 0.85 819.0 1804.0
116 16 0.73 885.0 1804.0

NPU performance#

Benchmarking for NPU was performed on the server with the following hardware configuration:

NPU: Huawei Atlas 300I (inference card).

OS: Ubuntu 18.04

CPU: Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz x 48

RAM: 64GB

NPU. Detector performance#

The table below shows the performance of Detector on the NPU.

Measurement BatchSize Average (ms)
Detector (minFaceSize=20) 1 24.4
Detector (minFaceSize=20) 4 18.01
Detector (minFaceSize=20) 8 17.73
Detector (minFaceSize=50) 1 24.53
Detector (minFaceSize=50) 4 18.0
Detector (minFaceSize=50) 8 17.74
Detector (minFaceSize=90) 1 24.44
Detector (minFaceSize=90) 4 17.91
Detector (minFaceSize=90) 8 17.44
Redetect 1 7.56
Redetect 8 4.31
Redetect 16 4.08

NPU. Estimations performance with batch interface#

The table below shows the performance of Estimations on the NPU for estimators that have a batch interface. All these measurements are performed with minFaceSize=50.

Measurement BatchSize Average (ms)
HeadPoseByImage 1 8.0
HeadPoseByImage 16 4.2
HeadPoseByImage 32 3.9
AGS 1 6.6
AGS 16 3.7
AGS 32 3.7
BestShotQuality 1 15.6
BestShotQuality 16 7.8
BestShotQuality 32 7.6
MedicalMask 1 6.1
MedicalMask 16 3.8
MedicalMask 32 3.7

NPU. Estimations performance without batch interface#

The table below shows the performance of Estimations on the NPU for estimators that do not have a batch interface. All these measurements are performed with minFaceSize=50.

Measurement Average (ms)
Warper 2.1

NPU. Extractor performance#

The table below shows the performance of Extractor on the NPU.

Type Model Batch Size Average (ms)
Extractor 57 1 10.9
Extractor 57 16 7.4

Runtime performance for embedded environment#

Face detection performance depends on input image parameters such as resolution and bit depth as well as the size of the detected face.

Input data characteristics:

  • Image resolution: 640x480px;
  • Image format: 24 BPP RGB;

The results for minimum batch size and optimal batch size are shown in the tables below. All the intermediate and non-optimal values are omitted.

Face detections are performed using FaceDetV3 NN.

Descriptor size#

The table below shows the size of serialized face descriptors, which can be used to estimate memory requirements.

"Descriptor size"

Face descriptor version Data size (bytes) Metadata size (bytes) Total size
CNN 54 512 8 520
CNN 56 512 8 520
CNN 57 512 8 520
CNN 58 512 8 520
CNN 59 512 8 520
CNN 60 512 8 520
CNN 62 512 8 520

The table below shows the size of serialized human descriptors, which can be used to estimate memory requirements. Human descriptors are used only for reidentification tasks.

"Human descriptor size (used only for reidentification tasks)"

Human descriptor version Data size (bytes) Metadata size (bytes) Total size
CNN 102 (deprecated) 2048 8 2056
CNN 103 (deprecated) 2048 8 2056
CNN 104 (deprecated) 2048 8 2056
CNN 105 512 8 520
CNN 106 512 8 520
CNN 107 512 8 520
CNN 108 512 8 520
CNN 109 512 8 520
CNN 110 512 8 520
CNN 112 512 8 520
CNN 113 512 8 520

Metadata includes signature and version information that may be omitted during serialization if the NoSignature flag is specified.

When estimating individual descriptor size in memory or serialization storage requirements with default options, consider using values from the "Total size" column.

When estimating memory requirements for descriptor batches, use values from the "Data size" column instead, since a descriptor batch does not duplicate metadata per descriptor and thus is more memory-efficient.

These numbers are for approximate computation only, since they do not include overhead like memory alignment for accelerated SIMD processing and the like.
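
The two estimation rules above can be sketched as simple arithmetic. The function names are hypothetical, and the default sizes are the 512-byte data and 8-byte metadata values from the tables; as noted, alignment and other overhead are not included.

```python
def individual_storage_bytes(count, data_bytes=512, metadata_bytes=8):
    """Descriptors stored one by one: each carries its own metadata."""
    return count * (data_bytes + metadata_bytes)


def batch_memory_bytes(count, data_bytes=512):
    """Descriptors held in a batch: metadata is not repeated per descriptor."""
    return count * data_bytes


n = 1_000_000
print(individual_storage_bytes(n))  # 520000000
print(batch_memory_bytes(n))        # 512000000
```

For the deprecated 2048-byte human descriptors, pass `data_bytes=2048` instead.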