Best practices#
This section provides recommendations and performance tips to help you achieve optimal performance when running the LUNA SDK algorithms on your target device.
Thread pools#
We recommend that you use thread pools for user-created threads when running LUNA SDK algorithms in a multithreaded environment. For each thread, LUNA SDK caches a number of thread-local objects under the hood so that its algorithms run faster the next time the same thread is used, at the cost of a higher memory footprint. For this reason, reuse threads from a pool: this avoids caching new internal objects for every fresh thread and reduces the penalty of creating and destroying user threads.
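As an illustration, the sketch below runs work on a small set of long-lived worker threads instead of spawning a new thread per request. The pool itself is generic; any LUNA SDK call submitted to it benefits from the per-thread caches described above.

```cpp
// Minimal thread pool sketch: reuse a fixed set of worker threads for
// SDK calls instead of creating a new thread per request.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(size_t threads) {
        for (size_t i = 0; i < threads; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (stop_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            // The SDK's thread-local caches persist across tasks running
            // on the same worker thread, so repeated calls get faster.
            task();
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stop_ = false;
};
```

Because the same threads handle every request, the SDK's per-thread caches are created once per worker and then reused for the lifetime of the pool.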
Estimator creation and inference#
To optimize RAM usage and improve performance, create face engine objects once and reuse them whenever a new estimate is needed.
Recreating estimators repeatedly results in reopening their corresponding .plan files each time, which can be resource-intensive. These .plan files are cached individually upon loading and remain in memory until they are either flushed from the cache or the FaceEngine root object's destructor is called. By reusing existing objects, you avoid unnecessary overhead and ensure efficient resource management.
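A minimal sketch of the create-once, reuse-many pattern is shown below. The factory and estimator names are assumptions for illustration, as is the header name; consult your SDK version's headers for the exact signatures and return types.

```cpp
// Sketch of the create-once, reuse-many pattern. Names below are
// illustrative assumptions; check your SDK headers for exact APIs.
#include <fsdk/FaceEngine.h>  // header name assumed

int main() {
    // Create the root object once: this loads and caches the .plan files.
    auto faceEngine = fsdk::createFaceEngine("./data", "./data/faceengine.conf");

    // Create each estimator once and keep it alive for the process lifetime.
    auto livenessEstimator = faceEngine->createLivenessOneShotRGBEstimator();

    // Reuse the same estimator for every new estimate. Do NOT recreate it
    // per request, or the corresponding .plan files are reopened each time.
    // for (const fsdk::Image& image : images)
    //     livenessEstimator->estimate(...);

    return 0;
}   // cached .plan files are released when the FaceEngine root is destroyed
```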
Using CPU and GPU models for network inference#
To ensure optimal performance and accuracy when using LUNA SDK, it is essential to follow our recommendations for CPUs and GPUs based on the type of workload and network configurations.
CPU recommendations#
- Quantized networks and DL Boost support

  If you plan to use quantized versions of neural networks, ensure that your CPU supports Intel DL Boost (the AVX512-VNNI instruction set). Without this feature, you may experience a significant drop in inference accuracy. In such cases, we recommend that you use the FP32 versions of the networks instead.

- Processor requirements

  Regardless of whether you use quantized or non-quantized networks, we recommend processors from the Intel Pentium Gold series or higher. These CPUs include advanced instruction sets such as AVX512 FMA, which significantly speed up network inference. When selecting a processor, prioritize models with a higher number of accelerators, as they directly affect computational efficiency.
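On Linux, you can verify DL Boost support by looking for the avx512_vnni flag in /proc/cpuinfo, for example with the small check below (Linux-specific sketch):

```cpp
// Check /proc/cpuinfo for the avx512_vnni flag, which indicates
// Intel DL Boost support on Linux.
#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream cpuinfo("/proc/cpuinfo");
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.rfind("flags", 0) == 0 &&
            line.find("avx512_vnni") != std::string::npos) {
            std::cout << "DL Boost (AVX512-VNNI) supported: "
                         "quantized networks are safe to use\n";
            return 0;
        }
    }
    std::cout << "No AVX512-VNNI: prefer the FP32 networks\n";
    return 0;
}
```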
GPU recommendations#
For GPU-based inference, only server-grade (compute-class) GPUs are supported. Gaming GPUs are not recommended or supported for running LUNA SDK due to potential compatibility issues and performance limitations. Below is the list of supported GPUs:
| Microarchitecture | Compute capability | GPU |
|---|---|---|
| Turing | 7.5 | Nvidia Tesla T4 |
| Ampere | 8.0 | Nvidia A30 |
| Ampere | 8.0 | Nvidia A100 |
| Ampere | 8.6 | Nvidia A40 |
| Ampere | 8.6 | Nvidia A10 |
| Ampere | 8.6 | Nvidia A16 |
| Ampere | 8.6 | Nvidia A2 |
| Ada Lovelace | 8.9 | Nvidia L4 |
| Ada Lovelace | 8.9 | Nvidia L40 |
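If you need to check a device's compute capability programmatically, the CUDA runtime reports it via `cudaGetDeviceProperties`. The sketch below compares each device against 7.5, the lowest compute capability in the table above:

```cpp
// Query compute capability with the CUDA runtime API and compare it
// against 7.5 (Turing), the minimum listed in the table above.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA devices found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        const bool supported = prop.major > 7 ||
                               (prop.major == 7 && prop.minor >= 5);
        std::printf("GPU %d: %s, compute capability %d.%d -> %s\n",
                    dev, prop.name, prop.major, prop.minor,
                    supported ? "meets the 7.5 minimum" : "below 7.5");
    }
    return 0;
}
```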
Forking process#
UNIX-like operating systems implement a mechanism to duplicate a process: it creates a new child process and copies the parent's memory space into the child's. This is typically done programmatically by calling the `fork()` system function in the parent process. Care should be taken when forking a process that runs the SDK.

Important: Always fork before the first instance of `IFaceEngine` is created!

This is because the SDK internally maintains a pool of worker threads, which is created lazily when the very first `IFaceEngine` object is born and destroyed right after the last `IFaceEngine` object is released. When GPU or NPU devices are used, their runtime is initialized and shut down in the same manner.

The hazard comes from the fact that while `fork()` copies process memory, it creates only one thread in the child: the main thread. For details, see https://man7.org/linux/man-pages/man2/fork.2.html.

As a result, if at least one `IFaceEngine` object is alive at the time the process is forked, the child process inherits the object and, with it, the implicit thread pool (and the device runtime, where applicable). However, no worker threads are actually running in the child, neither in the inherited pool nor in the runtime, so calling certain SDK functions will deadlock.
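A minimal sketch of the safe ordering is shown below, using the same assumed `fsdk::createFaceEngine` factory as earlier; the key point is simply that `fork()` happens before any `IFaceEngine` exists:

```cpp
// Fork first, then create IFaceEngine in each process that needs it.
// The SDK's lazily created worker-thread pool (and GPU/NPU runtime)
// must not exist yet at fork() time, or the child inherits a thread
// pool with no running workers and deadlocks on some SDK calls.
#include <sys/wait.h>
#include <unistd.h>
#include <fsdk/FaceEngine.h>  // header and factory names assumed

int main() {
    // Correct: no IFaceEngine (and hence no worker pool) exists yet.
    pid_t pid = fork();
    if (pid < 0)
        return 1;  // fork failed

    if (pid == 0) {
        // Child: creates its own engine and worker pool from scratch.
        auto engine = fsdk::createFaceEngine("./data", "./data/faceengine.conf");
        // ... run the child workload ...
        return 0;
    }

    // Parent: creates its own engine after forking.
    auto engine = fsdk::createFaceEngine("./data", "./data/faceengine.conf");
    // ... run the parent workload ...
    waitpid(pid, nullptr, 0);
    return 0;
}
```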
Liveness estimator combination#
Depending on your device and its camera, you can enhance the accuracy of the model by simultaneously using a combination of two universal liveness estimators. For example, you might use:
- LivenessDepthRGBEstimator and NIRLivenessEstimator
- LivenessDepthEstimator and LivenessOneShotRGBEstimator
To implement this, aggregate the scores from the liveness estimators and adjust the thresholds in the faceengine.conf configuration file.
Changing the threshold#
All models are calibrated so that the base threshold is 0.5 for any model of any modality.
If you need stronger protection against spoofing (presentation attacks), raise the threshold; if the convenience of genuine users matters more, lower it. We recommend configuring client-specific values for deviating from the base threshold.
Aggregating the scores#
Any two liveness modalities can be aggregated with each other. To do this, multiply the scores of the corresponding networks. The threshold is multiplied in the same way: since each base threshold is 0.5, the aggregated threshold becomes 0.5 × 0.5 = 0.25.
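A minimal sketch of this aggregation rule (the scores and variable names are illustrative):

```cpp
// Aggregate two liveness modalities by multiplying their scores and
// comparing against the product of the base thresholds (0.5 * 0.5).
#include <iostream>

int main() {
    // Hypothetical scores returned by two liveness estimators,
    // e.g. LivenessDepthRGBEstimator and NIRLivenessEstimator.
    const float depthRgbScore = 0.8f;
    const float nirScore = 0.6f;

    const float baseThreshold = 0.5f;
    const float aggregatedThreshold = baseThreshold * baseThreshold;  // 0.25

    const float aggregatedScore = depthRgbScore * nirScore;           // 0.48
    const bool isLive = aggregatedScore > aggregatedThreshold;

    std::cout << "aggregated score " << aggregatedScore
              << (isLive ? " -> live" : " -> spoof") << '\n';
    return 0;
}
```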
Recommended thresholds#
The recommended threshold is an optimal balance between the true positive rate (TPR) and the false positive rate (FPR).
Possible LivenessOneShotRGBEstimator model combinations#
You can use the LivenessOneShotRGBEstimator models in the following combinations:
For version `v11`:

- Use these models in the backend as an analogue of the server LivenessOneShotRGBEstimator:
  - oneshot_rgb_liveness_v11_model_1_cpu-avx2.plan
  - oneshot_rgb_liveness_v11_model_2_cpu-avx2.plan
  - oneshot_rgb_liveness_v11_model_3_cpu-avx2.plan
  - oneshot_rgb_liveness_v11_model_7_cpu-avx2.plan
- Use this model on smartphones as an analogue of LivenessOneShotRGBEstimator:
  - oneshot_rgb_liveness_v11_model_6_cpu-avx2.plan
- Use these models on devices with Orbbec cameras, such as payment terminals (POS) and self-service cash registers (KCO):
  - oneshot_rgb_liveness_v11_model_4_cpu-avx2.plan
  - oneshot_rgb_liveness_v11_model_5_cpu-avx2.plan
- Use these models for extended backend processing with higher accuracy requirements:
  - oneshot_rgb_liveness_v11_model_1_cpu-avx2.plan
  - oneshot_rgb_liveness_v11_model_2_cpu-avx2.plan
  - oneshot_rgb_liveness_v11_model_3_cpu-avx2.plan
  - oneshot_rgb_liveness_v11_model_7_cpu-avx2.plan
  - oneshot_rgb_liveness_v11_model_8_cpu-avx2.plan
  - oneshot_rgb_liveness_v11_model_9_cpu-avx2.plan
For version `v10`:

- Use these models in the backend as an analogue of the server LivenessOneShotRGBEstimator:
  - oneshot_rgb_liveness_v10_model_1_cpu-avx2.plan
  - oneshot_rgb_liveness_v10_model_2_cpu-avx2.plan
  - oneshot_rgb_liveness_v10_model_3_cpu-avx2.plan
  - oneshot_rgb_liveness_v10_model_7_cpu-avx2.plan
- Use this model on smartphones as an analogue of LivenessOneShotRGBEstimator:
  - oneshot_rgb_liveness_v10_model_6_cpu-avx2.plan
- Use these models on devices with Orbbec cameras, such as payment terminals (POS) and self-service cash registers (KCO):
  - oneshot_rgb_liveness_v10_model_4_cpu-avx2.plan
  - oneshot_rgb_liveness_v10_model_5_cpu-avx2.plan
- Use these models for extended backend processing with higher accuracy requirements:
  - oneshot_rgb_liveness_v10_model_1_cpu-avx2.plan
  - oneshot_rgb_liveness_v10_model_2_cpu-avx2.plan
  - oneshot_rgb_liveness_v10_model_3_cpu-avx2.plan
  - oneshot_rgb_liveness_v10_model_7_cpu-avx2.plan
  - oneshot_rgb_liveness_v10_model_8_cpu-avx2.plan
  - oneshot_rgb_liveness_v10_model_9_cpu-avx2.plan