
Recommendations#

This section provides recommendations for working with the LUNA PLATFORM efficiently.

Resource optimization#

This section provides tips for optimizing system resources when working with the LUNA PLATFORM.

  1. Disable unnecessary policies that are enabled in handlers by default. For example, saving of samples and events is enabled in handlers by default.

    If saving is not required, disable it. Otherwise, you have to monitor the stored data and periodically delete outdated records so that server storage does not fill up.

    If estimations that are not required are enabled, their execution increases the processing time of every request. A sketch of a handler creation request with saving disabled is given after this list.

  2. Disable the use of unnecessary services.

  3. Increase the number of workers rather than the number of service instances (except for the Remote SDK service running on GPU, and Python Matcher).

  4. When running on CPU, adjust the number of Remote SDK instances/workers together with the "num_threads" parameter in the Remote SDK service settings (see the sizing sketch after this list).

    Typically, the product of the number of Remote SDK instances/workers and "num_threads" should equal the number of physical cores. For example, with 8 physical cores it is recommended to run 2 Remote SDK instances with "num_threads" = 4 for each instance (2x4=8). Similarly, with 24 cores, run 4 Remote SDK instances with "num_threads" = 6 for each instance (4x6=24).

    Keep in mind that too many instances/workers can negatively affect performance, so you should not run 8 instances/workers with "num_threads" = 1 on 8 available cores. As a rule of thumb, divide the number of cores by 6 or 8 to determine the number of instances/workers.

    The "num_threads" parameter does not affect the operation on the GPU.

  5. When performing matching, explicitly specify the required "target" fields (see the matching sketch after this list).

    For example, if nothing is required from the matching result except "face_id", there is no point in spending resources on returning the remaining fields (the default behavior when "target" fields are not set).

    Within handlers, the "match_policy" policy also accepts "targets", which should be configured in the same way.

  6. With small lists, you can run more instances of Python Matcher by reducing the "thread_count" parameter.

  7. With large lists (more than 1-2 million entries), do not run more than one Python Matcher service per NUMA node.

  8. Specify the limits on candidates and references correctly and do not set them higher than necessary (see the "PLATFORM_LIMITS" setting in the Python Matcher service settings).

    For example, if you only need the single candidate with the highest similarity in the response, do not request the top-3 candidates.

  9. Set the logging level to "WARNING".

    Note that a less verbose logging level may reduce resource consumption; however, if problems occur, this level of logs may not be enough to diagnose them.

  10. Use the task schedule to control the launch of the Garbage collection task.

  11. Use "Accept-Encoding" headers with the value "gzip, deflate" to optimize network traffic during API requests when receiving data in JSON format.

  12. Decrease the value of the "optimal_batch_size" parameter in the Remote SDK service settings when working on the CPU (the optimal value differs for each estimator).

    When running on CPU, the "optimal_batch_size" parameter should be less than the "num_threads" value. For example, if "num_threads" = 4, it is recommended to set "optimal_batch_size" <= 4.

    As in the case of "num_threads", the optimal value of "optimal_batch_size" depends on the specific system and the characteristics of the task, so it should be adjusted experimentally.

  13. Do not forget to specify "image_type" equal to "1" or "2" when sending face or body samples. By default, the value is "0" (raw images).

    If you leave the value "0", face detection is performed on the image again, which increases request processing time and may lead to unexpected problems. For example, one face may be found on the frontend and two faces on the backend, because different detector versions or different face detection settings are used.

  14. Run Remote SDK only with the estimators you need.

    If some estimators are not required for the implementation of business logic, then they can be disabled and removed from the container to reduce memory consumption.

  15. Read the section "Resource consumption by services" in order to select optimal servers for services.
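
The following sketch illustrates tip 1: a handler creation request with sample and event saving disabled. The address, version prefix, header, and policy field names here are assumptions for illustration only; check the API reference of your LUNA PLATFORM version for the exact schema.

```python
import requests

# Hypothetical handler creation request with saving disabled.
# The address, endpoint and policy field names are assumptions.
payload = {
    "description": "handler without sample/event storage",
    "policies": {
        "storage_policy": {
            "face_sample_policy": {"store_sample": 0},  # do not save face samples
            "body_sample_policy": {"store_sample": 0},  # do not save body samples
            "event_policy": {"store_event": 0},         # do not save events
        }
    },
}
response = requests.post(
    "http://127.0.0.1:5000/6/handlers",              # assumed API service address
    headers={"Luna-Account-Id": "<your_account_id>"},
    json=payload,
)
response.raise_for_status()
print(response.json())
```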
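
The instance/worker arithmetic from tip 4 can be expressed as a small sketch. It only restates the rule of thumb above (instances x "num_threads" = physical cores); actual values should still be tuned experimentally.

```python
def remote_sdk_cpu_split(physical_cores: int, threads_per_instance: int) -> tuple[int, int]:
    """Split physical cores between Remote SDK instances and "num_threads"
    so that instances * num_threads equals the number of physical cores."""
    instances = max(1, physical_cores // threads_per_instance)
    num_threads = physical_cores // instances
    return instances, num_threads

print(remote_sdk_cpu_split(8, 4))   # (2, 4): 2 instances x 4 threads = 8 cores
print(remote_sdk_cpu_split(24, 6))  # (4, 6): 4 instances x 6 threads = 24 cores
```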
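
As an illustration of tip 5 (and the candidate limit from tip 8), a matching request that asks only for "face_id" and "similarity" might look like the sketch below. The address, endpoint, and body structure are assumptions for illustration; consult the API reference for the exact matching request schema.

```python
import requests

# Hypothetical matching request returning only the fields we need.
# The address, endpoint and field names are assumptions.
body = {
    "candidates": [
        {
            "id": "main_list",
            "type": "faces",
            "filters": {"lists": ["<list_id>"]},
            "limit": 1,                            # only the best candidate (tip 8)
            "targets": ["face_id", "similarity"],  # skip all other result fields (tip 5)
        }
    ],
    "references": [{"type": "face", "id": "<face_id>"}],
}
response = requests.post("http://127.0.0.1:5000/6/matcher/faces", json=body)
response.raise_for_status()
print(response.json())
```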
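
Tip 11 amounts to sending a standard HTTP header. With the requests library it looks as follows (the URL is a placeholder; requests decompresses gzip responses transparently):

```python
import requests

# Ask the server to compress the JSON response to reduce network traffic.
response = requests.get(
    "http://127.0.0.1:5000/6/events",  # placeholder endpoint
    headers={"Accept-Encoding": "gzip, deflate"},
)
print(response.headers.get("Content-Encoding"))  # e.g. "gzip" if compression was applied
print(response.json())
```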

Advanced PostgreSQL configuration#

PostgreSQL can be configured to interact effectively with LUNA PLATFORM 5. To do this, you need to set certain values for the PostgreSQL settings in the postgresql.conf file.

This section does not provide a complete list of all settings with detailed descriptions. See the official PostgreSQL website for the complete list of settings and their descriptions.

Useful tips for calculating PostgreSQL configuration are described here: https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server.

It is possible to calculate a PostgreSQL configuration for maximum performance on a given hardware configuration (see https://pgtune.leopard.in.ua/).

Note: The following settings should be changed with caution, as manually changing PostgreSQL settings requires experience.

The recommended values of the settings and their description are given below.

max_connections = 200 — determines the maximum number of concurrent connections to the database server. The default value is 100.

The default value may be enough for test demonstrations of the LUNA PLATFORM, but for production use it may not be enough and will need to be calculated.

In the Configurator service, you can set the number of DB connections using the connection_pool_size setting located in the LUNA_<SERVICE_NAME>_DB sections, where <SERVICE_NAME> is the name of the service that has the database. The actual number of connections may be greater than the value of this setting by 1.

If there are too many connections but few of them are active, you can use third-party load balancing services such as haproxy or pgbouncer. When using balancing services, take into account the nuances described here: https://magicstack.github.io/asyncpg/current/faq.html#why-am-i-getting-prepared-statement-errors.
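
For instance, the asyncpg FAQ linked above recommends disabling the client-side prepared statement cache when connecting through pgbouncer in transaction or statement pooling mode. A minimal sketch with asyncpg for your own tooling (the DSN is a placeholder):

```python
import asyncio
import asyncpg

async def main():
    # When pgbouncer runs in transaction/statement pooling mode, disable
    # asyncpg's prepared statement cache to avoid prepared statement errors
    # (see the asyncpg FAQ linked above).
    conn = await asyncpg.connect(
        "postgresql://user:password@127.0.0.1:6432/luna_db",  # placeholder DSN via pgbouncer
        statement_cache_size=0,
    )
    print(await conn.fetchval("SELECT version()"))
    await conn.close()

asyncio.run(main())
```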

maintenance_work_mem = 2GB — specifies the maximum amount of memory to be used by maintenance operations.

shared_buffers = 0.25…0.5 * RAM (MB) — determines how much memory PostgreSQL will allocate for caching data. The appropriate value depends on how often matching by database is performed, which indexes are used, and so on.

effective_io_concurrency = 100 — sets the number of concurrent disk I/O operations that PostgreSQL expects can be executed simultaneously. Raising this value will increase the number of I/O operations that any individual PostgreSQL session attempts to initiate in parallel.

max_worker_processes = CPU_COUNT — sets the maximum number of worker processes that the system can support.

max_parallel_maintenance_workers = 4 — sets the maximum number of parallel worker processes performing the index creation command (CREATE INDEX).

max_parallel_workers_per_gather = 4 — sets the maximum number of workers that a query or subquery can be parallelized to.

max_parallel_workers = CPU_COUNT — sets the maximum number of workers that the system can support for parallel operations.

The following setting values relate to matching by database on large tables.

enable_bitmapscan = off — enables or disables the query planner's use of bitmap-scan plan types. Disabling it may be necessary when PostgreSQL erroneously decides that a bitmap scan is better than an index scan. It is recommended to change this setting only if necessary, when a query is expected to use an index but for unknown reasons does not use it.

seq_page_cost = 1 — sets the planner's estimate of the cost of a disk page fetch that is part of a series of sequential fetches.

random_page_cost = 1.5 — sets the planner's estimate of the cost of a non-sequentially-fetched disk page.

parallel_tuple_cost = 0.1 — sets the approximate cost of transferring one tuple (row) from a parallel worker to another worker.

parallel_setup_cost = 5000.0 — sets the approximate cost of running parallel workers.

max_parallel_workers_per_gather = CPU_COUNT/2 — sets the maximum number of workers that a query or subquery can be parallelized to. For matching by database on large tables, this value replaces the value of 4 given above.

min_parallel_table_scan_size = 1MB — sets the minimum amount of table data that should be scanned in order for a parallel scan to be considered.

min_parallel_index_scan_size = 8kB — sets the minimum amount of index data that must be scanned for a parallel scan to be considered.
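
For convenience, the recommendations above can be gathered into a small helper that prints a postgresql.conf fragment for a given machine. This is only a sketch of the values listed in this section (it uses the midpoint of the suggested shared_buffers range); it is not a substitute for tools such as pgtune or for testing on your own workload.

```python
CPU_COUNT = 16      # physical cores of the database server (example value)
RAM_MB = 32768      # RAM of the database server in MB (example value)

settings = {
    "max_connections": 200,
    "maintenance_work_mem": "2GB",
    # shared_buffers: 0.25..0.5 * RAM; the midpoint is used here
    "shared_buffers": f"{int(0.375 * RAM_MB)}MB",
    "effective_io_concurrency": 100,
    "max_worker_processes": CPU_COUNT,
    "max_parallel_maintenance_workers": 4,
    "max_parallel_workers": CPU_COUNT,
    # settings for matching by database on large tables:
    "enable_bitmapscan": "off",
    "seq_page_cost": 1,
    "random_page_cost": 1.5,
    "parallel_tuple_cost": 0.1,
    "parallel_setup_cost": 5000.0,
    "max_parallel_workers_per_gather": CPU_COUNT // 2,
    "min_parallel_table_scan_size": "1MB",
    "min_parallel_index_scan_size": "8kB",
}

# Print a postgresql.conf fragment.
for name, value in settings.items():
    print(f"{name} = {value}")
```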