Best practices#

Overview#

The following chapter provides a set of recommendations that users should follow to get optimal performance when running Luna SDK algorithms on their target device. More recommendations and performance tips will be added to this list over time.

Creation and deletion order#

All Luna SDK objects should be destroyed in the reverse of their creation order. This implies the following:

first, the FaceEngine object is created (using the createFaceEngine method)
then all child objects, such as detectors and estimators, can be created
when the work is finished, all of these child objects are deleted first
only after that can the main FaceEngine object be deleted

It is not recommended to use FaceEngine objects as globals (or static objects), because their deletion order may then be undefined. When such usage is necessary, guarantee the correct deletion order by explicitly deleting all objects in the correct order before the program ends. For instance:

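#include <fsdk/FaceEngine.h>

// Global objects: release them explicitly before main() returns.
fsdk::IFaceEnginePtr faceEngine = fsdk::createFaceEngine("./data");
fsdk::IDetectorPtr detector = faceEngine->createDetector();
fsdk::IBestShotQualityEstimatorPtr bestShotQualityEstimator = faceEngine->createBestShotQualityEstimator();

int main() {
    // application code here

    // Release the child objects first...
    detector.reset();
    bestShotQualityEstimator.reset();
    // ...and the root FaceEngine object last.
    faceEngine.reset();
    return 0;
}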
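
When the objects do not have to be globals, declaring them as locals gives the required order for free, because C++ destroys automatic objects in the reverse order of their construction. The sketch below illustrates this with a hypothetical `Tracked` type that stands in for SDK objects and only records when it is destroyed; it does not call the SDK.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SDK object: it appends its name to a log
// when it is destroyed, so the destruction order can be inspected.
class Tracked {
public:
    Tracked(std::string name, std::vector<std::string>* log)
        : name_(std::move(name)), log_(log) {
        assert(log_ != nullptr);
    }
    ~Tracked() { log_->push_back(name_); }

    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;

private:
    std::string name_;
    std::vector<std::string>* log_;
};

// Creates a root object and two children as locals and returns the order
// in which they were destroyed.
std::vector<std::string> destructionOrder() {
    std::vector<std::string> log;
    {
        Tracked engine("engine", &log);        // created first
        Tracked detector("detector", &log);    // children created afterwards
        Tracked estimator("estimator", &log);
    }  // leaving the scope destroys estimator, then detector, then engine
    return log;
}
```

Calling `destructionOrder()` yields the names in the order estimator, detector, engine: the children go first and the root object last.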

Multithread scenario#

Creating and destroying Luna SDK algorithms from different threads is prohibited due to internal implementation restrictions. All objects of the FaceEngine class and all algorithm objects (for example, detectors, estimators, and extractors) must be created and destroyed by the same thread. A typical scenario is as follows:

Thread 1 (possibly the main thread) creates the FaceEngine object and all required algorithms (for example, IDetector).
Threads 2..N (possibly several) use those objects for any purpose.
Thread 1 destroys all algorithms and then the FaceEngine object after all work is complete.
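
The scenario above can be sketched with standard C++ threads. `Algorithm` is a hypothetical stand-in for an SDK object that remembers which thread created it and checks that the same thread destroys it; `runScenario` is likewise an illustrative name. Whether a particular SDK algorithm may be called from several threads at once is not covered by this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for an SDK algorithm object.
class Algorithm {
public:
    Algorithm() : owner_(std::this_thread::get_id()) {}

    // Models the rule from the text: destruction must happen on the
    // thread that created the object.
    ~Algorithm() { assert(std::this_thread::get_id() == owner_); }

    void use() { ++uses_; }
    int uses() const { return uses_.load(); }

private:
    std::thread::id owner_;
    std::atomic<int> uses_{0};
};

// The calling thread plays the role of "Thread 1". Returns how many
// times the workers used the object.
int runScenario(int workerCount) {
    auto algorithm = std::make_unique<Algorithm>();  // Thread 1 creates

    std::vector<std::thread> workers;
    for (int i = 0; i < workerCount; ++i) {
        // Threads 2..N only use the object; they never create or destroy it.
        workers.emplace_back([&algorithm] { algorithm->use(); });
    }
    for (auto& worker : workers) {
        worker.join();  // wait until all work is complete
    }

    const int uses = algorithm->uses();
    algorithm.reset();  // Thread 1 destroys
    return uses;
}
```

Joining every worker before the `reset()` call is what guarantees that no thread still uses the object when its creator destroys it.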

Thread pools#

When running Luna SDK algorithms in a multithreaded environment, it is highly recommended to use thread pools for user-created threads. For each thread, Luna SDK caches a number of thread-local objects under the hood so that its algorithms run faster the next time the same thread is used, at the cost of a higher memory footprint. For this reason, reuse threads from a pool to avoid caching new internal objects and to reduce the overhead of creating and destroying user threads.
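
A minimal fixed-size pool, written with the standard library only, is sketched below. `ThreadPool` and `distinctThreadsUsed` are illustrative names, not part of the SDK; a production pool would also need error handling and a way to return results. The point it demonstrates is that many tasks end up on the same few long-lived threads, so any per-thread cache is built only once per worker.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <set>
#include <thread>
#include <utility>
#include <vector>

// Illustrative fixed-size thread pool; not an SDK class.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t threadCount) {
        assert(threadCount > 0);
        for (std::size_t i = 0; i < threadCount; ++i) {
            workers_.emplace_back([this] { workerLoop(); });
        }
    }

    // Runs the tasks that are still queued, then joins all workers.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wakeUp_.notify_all();
        for (auto& worker : workers_) {
            worker.join();
        }
    }

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wakeUp_.notify_one();
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wakeUp_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) {
                    return;  // stopping was requested and the queue is drained
                }
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable wakeUp_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Runs taskCount tasks on a pool of poolSize threads and returns how many
// distinct threads actually executed them.
std::size_t distinctThreadsUsed(std::size_t poolSize, std::size_t taskCount) {
    std::mutex idsMutex;
    std::set<std::thread::id> ids;
    {
        ThreadPool pool(poolSize);
        for (std::size_t i = 0; i < taskCount; ++i) {
            pool.submit([&idsMutex, &ids] {
                std::lock_guard<std::mutex> lock(idsMutex);
                ids.insert(std::this_thread::get_id());
            });
        }
    }  // the pool destructor waits for all tasks to finish
    return ids.size();
}
```

However many tasks are submitted, the number of distinct threads never exceeds the pool size, in contrast to spawning a new thread for each task.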

Estimators. Creation and Inference#

Create FaceEngine objects such as estimators once and reuse them whenever you need a new estimate; this reduces RAM usage and improves performance. Recreating an estimator reopens the corresponding plan file every time. These plan files are cached separately for every load and are removed only when they are flushed from the cache or when the destructor of the root FaceEngine object is called.
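
The difference between the two usage patterns can be sketched as follows. `Estimator` here is a hypothetical stand-in whose constructor increments a counter in place of a real plan file load, and the integer "images" are placeholders; none of this is SDK API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an SDK estimator. Each construction models
// one plan file load.
struct Estimator {
    static int loads;

    Estimator() { ++loads; }

    float estimate(int image) const {
        assert(image >= 0);  // placeholder precondition for the toy input
        return image * 0.5f;
    }
};
int Estimator::loads = 0;

// Anti-pattern: a new estimator, and therefore a new load, per image.
float estimateAllRecreating(const std::vector<int>& images) {
    float total = 0.0f;
    for (int image : images) {
        Estimator estimator;
        total += estimator.estimate(image);
    }
    return total;
}

// Recommended: create the estimator once and reuse it for every image.
float estimateAllReusing(const std::vector<int>& images) {
    Estimator estimator;
    float total = 0.0f;
    for (int image : images) {
        total += estimator.estimate(image);
    }
    return total;
}
```

For four images, the first function performs four modeled loads and the second performs one, while both return the same result.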

Forking process#

UNIX-like operating systems implement a mechanism to duplicate a process. It creates a new child process and copies the parent's memory space into the child's. This is typically done programmatically by calling the fork() system function in the parent process. Care should be taken when forking a process running the SDK. Always fork before the first instance of IFaceEngine is created!

The reason is that the SDK internally maintains a pool of worker threads, which is created lazily when the very first IFaceEngine object is created and destroyed right after the last IFaceEngine object is released. When GPU or NPU devices are used, their runtime is initialized and shut down in the same manner. The hazard is that while fork() copies process memory, the child process starts with only one thread, a copy of the thread that called fork() (refer to the man page for details: https://man7.org/linux/man-pages/man2/fork.2.html).

As a result, if at least one IFaceEngine object is alive when the process is forked, the child process inherits the object and, with it, the implicit thread pool (and device runtime, when appropriate). However, no worker threads are actually running in the child (neither in the inherited pool nor in the runtime, when appropriate), so calling certain SDK functions will cause a deadlock.
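
A minimal POSIX sketch of the safe ordering follows. `Engine` is a hypothetical stand-in for the SDK root object; it records the process that created it and asserts that it is used only there, which models the rule above rather than any real SDK check. `forkThenCreate` is an illustrative name as well.

```cpp
#include <cassert>
#include <memory>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for the SDK root object.
class Engine {
public:
    Engine() : creatorPid_(getpid()) {}

    int process() const {
        // Models the rule from the text: an engine inherited across
        // fork() must not be used in the child.
        assert(getpid() == creatorPid_);
        return 0;
    }

private:
    pid_t creatorPid_;
};

// Forks first and only then creates an engine, inside the child.
// Returns the child's exit code, or -1 on failure.
int forkThenCreate() {
    // No Engine exists yet, so the child cannot inherit a thread pool
    // whose worker threads are missing.
    const pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        int rc = 1;
        {
            auto engine = std::make_unique<Engine>();  // created after fork()
            rc = engine->process();
        }  // engine released before the child exits
        _exit(rc);
    }

    // Parent: wait for the child. The parent may create its own Engine from
    // here on, as long as it does not fork again while that engine is alive.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Each process ends up with an engine that it created itself, so the worker threads behind that engine exist in the same process that uses it.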