
Hardware requirements#

Server / PC installations#

See "Appendix A. Specifications" for information about hardware used for performance measurements.

General considerations#

Be warned that not all algorithms in the SDK have GPU implementations. If the desired algorithm does not have a GPU implementation, a fallback to the CPU implementation is made. In this case, take into account the memory transfers between host and device and the latency they cause. Please see the algorithm implementation matrix for details.

Some neural networks are unavailable for particular architectures.

Neural networks and available architectures

| Neural network | CPU | CPU AVX2 | GPU | ARM |
|----------------|-----|----------|-----|-----|
| FaceDet_v1_first.plan | yes | yes | | yes |
| FaceDet_v1_second.plan | yes | yes | yes | yes |
| FaceDet_v1_third.plan | yes | yes | yes | yes |
| FaceDet_v2_first.plan | yes | yes | | yes |
| FaceDet_v2_second.plan | yes | yes | yes | yes |
| FaceDet_v2_third.plan | yes | yes | yes | yes |
| FaceDet_v3__.plan | yes | yes | yes | yes |
| FaceDet_v3_redetect__.plan | yes | yes | yes | yes |
| model_subjective_quality__.plan | yes | yes | yes | yes |
| ags_angle_estimation_flwr_.plan | yes | yes | yes | yes |
| angle_estimation_flwr_.plan | yes | yes | yes | yes |
| ags_estimation_flwr_.plan | yes | yes | yes | yes |
| attributes_estimation_.plan | yes | yes | yes | yes |
| childnet_estimation_flwr_.plan | yes | yes | yes | yes |
| emotion_recognition__.plan | yes | yes | yes | yes |
| glasses_estimation_flwr_.plan | yes | yes | yes | yes |
| eyes_estimation_flwr8_.plan | yes | yes | yes | yes |
| eye_status_estimation_flwr_.plan | yes | yes | yes | yes |
| eyes_estimation_ir_.plan | yes | yes | yes | yes |
| gaze__.plan | yes | yes | yes | yes |
| gaze_ir__.plan | yes | yes | yes | yes |
| overlap_estimation_flwr_.plan | yes | yes | yes | yes |
| mouth_estimation_.plan | yes | yes | yes | yes |
| mask_clf__.plan | yes | yes | yes | yes |
| ppe_estimation__.plan | yes | yes | yes | yes |
| orientation_.plan | yes | yes | yes | yes |
| LNet_precise__.plan | yes | yes | yes | yes |
| LNet_ir_precise__.plan | yes | yes | yes | yes |
| slnet__.plan | yes | yes | yes | yes |
| liveness_model__.plan | yes | yes | yes | yes |
| depth_estimation_.plan | yes | yes | yes | yes |
| faceflow_model_1_.plan | yes | yes | yes | yes |
| faceflow_model_2_.plan | yes | yes | yes | yes |
| ir_liveness_universal_.plan | yes | yes | yes | yes |
| ir_liveness_ambarella_.plan | yes | yes | yes | yes |
| hs_shoulders_liveness_estimation_flwr_.plan | yes | yes | yes | yes |
| hs_head_liveness_estimation_flwr_.plan | yes | yes | yes | yes |
| flying_faces_liveness_v2_.plan | yes | yes | yes | yes |
| rgbm_liveness_.plan | yes | yes | yes | yes |
| rgbm_liveness_pp_hand_frg_.plan | yes | yes | yes | yes |
| human_keypoints__.plan | yes | yes | yes | yes |
| human__.plan | yes | yes | yes | yes |
| human_redetect_.plan | yes | yes | yes | yes |
| reid_.plan | yes | yes | yes | |
| credibility_Check_.plan | yes | yes | yes | yes |
| cnn54b_.plan | yes | yes | yes | yes |
| cnn54m_.plan | yes | yes | yes | yes |
| cnn56b_.plan | yes | yes | yes | yes |
| cnn56m_.plan | yes | yes | yes | yes |
| cnn57b_.plan | yes | yes | yes | yes |
| cnn58b_.plan | yes | yes | yes | yes |
| cnn59b_.plan | yes | yes | yes | yes |

CPU requirements#

For neural networks with "*_cpu.plan" in their names, the CPU must support at least the SSE4.2 instruction set.

For neural networks with "*_cpu-avx2.plan" in their names, AVX2 instruction set support is required for the best performance.

Only 64-bit CPUs are supported.

If in doubt, check your CPU specifications on the manufacturer's website.
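
For a quick programmatic check, the sketch below reports whether the host CPU exposes these instruction sets. It assumes a GCC or Clang build on x86_64 and uses compiler builtins; it is not part of the SDK API.

```cpp
// Minimal sketch: report SSE4.2 / AVX2 support of the host CPU.
// Assumes GCC or Clang on x86_64; __builtin_cpu_supports is a compiler builtin,
// not an SDK call.
#include <cstdio>

int main() {
    const bool sse42 = __builtin_cpu_supports("sse4.2"); // required for "*_cpu.plan"
    const bool avx2  = __builtin_cpu_supports("avx2");   // best performance for "*_cpu-avx2.plan"

    std::printf("SSE4.2: %s\n", sse42 ? "yes" : "no");
    std::printf("AVX2:   %s\n", avx2  ? "yes" : "no");
    return sse42 ? 0 : 1;
}
```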

GPU requirements#

For GPU acceleration, an NVIDIA GPU is required. The following architectures are supported:

  • Pascal or newer.

A minimum of 6 GB of dedicated video RAM is required; 8 GB or more of VRAM is recommended.

Number of threads created when using GPU#

The total number of threads can be calculated with the following expression:

totalNumberOfThreads = numThreads + 2 * numGpuDevices + 1 (+ 1 optional),

where

  • numThreads is the value of the setting <param name="numThreads" type="Value::Int1" x="12" />. Its description can be found in "Configuration Guide - Runtime settings";
  • numGpuDevices is the number of GPU devices;
  • one additional thread is created for the CUDA runtime;
  • one optional thread may be created depending on internal LUNA SDK API settings.

Example: if numThreads == 4 and there are 2 GPU devices in the system, the total number of threads is 9: 4 worker threads (numThreads), 2 threads for each of the 2 GPUs, and 1 thread for the CUDA runtime.
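
The same arithmetic can be written as a small helper for sanity-checking the thread count observed at runtime. The function below is illustrative only and is not part of the LUNA SDK API.

```cpp
// Minimal sketch of the thread-count formula above; names are hypothetical.
#include <cstdio>

int expectedThreadCount(int numThreads, int numGpuDevices, bool optionalThread = false) {
    // worker threads + 2 threads per GPU device + 1 CUDA runtime thread
    // (+ 1 optional thread depending on internal SDK settings)
    return numThreads + 2 * numGpuDevices + 1 + (optionalThread ? 1 : 0);
}

int main() {
    std::printf("%d\n", expectedThreadCount(4, 2)); // prints 9, matching the example above
    return 0;
}
```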

To decrease the number of threads, the environment variable CUDA_VISIBLE_DEVICES=-1 can be set.

RAM requirements#

System memory consumption differs depending on the usage scenario and is proportional to the number of worker threads. This applies to both CPU (system RAM) and GPU (VRAM) execution modes.

For example, in CPU execution mode 1 GB of RAM is enough for a typical pipeline consisting of a face detector and a face descriptor extractor running on a single core (one worker thread) and processing 1080p input images with 10-12 faces on average. If this setup is scaled up to 8 worker threads, overall memory consumption grows to 8 GB.

It is recommended to assume at least 1 GB of free RAM per worker thread.

Storage requirements#

FaceEngine requires 1 GB of free space to install. This includes the model data for both CPU and GPU execution modes, which should be redistributed with your application. If only one execution mode is planned, the space requirement can be reduced by half.

Requirements for GPU acceleration#

Recommended versions of CUDA

The most current CUDA release notes can be found online at http://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html.

The CUDA version on Linux can be found using the command below:

$nvidia-smi

The CUDA version on Windows can be found in Control Panel\Programs\Programs and Features, as shown in the figure below.

CUDA version on Windows

We recommend using the suggested CUDA version for your operating system. If your version is older than required, we cannot guarantee that it will work successfully. More details about CUDA compatibility can be found online at https://docs.nvidia.com/deploy/cuda-compatibility/index.html.
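
If the CUDA toolkit is installed, the driver and runtime versions can also be queried programmatically. The sketch below uses the standard CUDA runtime API and is independent of the SDK.

```cpp
// Minimal sketch: print CUDA driver and runtime versions.
// Assumes the CUDA toolkit is installed; build with nvcc or link against cudart.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);    // encoded as 1000 * major + 10 * minor
    cudaRuntimeGetVersion(&runtimeVersion);
    std::printf("CUDA driver:  %d.%d\n", driverVersion / 1000, (driverVersion % 1000) / 10);
    std::printf("CUDA runtime: %d.%d\n", runtimeVersion / 1000, (runtimeVersion % 1000) / 10);
    return 0;
}
```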

Embedded installations#

CPU requirements#

Supported CPU architectures:

  • ARMv7-A;
  • ARMv8-A (ARM64).