Index services
Index Manager service
The service manages the process of creating indexes for lists containing face descriptors and performs the following tasks:
- Generates tasks for building the index.
- Sends tasks to the internal queue.
- Retrieves tasks from the internal queue and coordinates the process of sending tasks for indexing to the Indexer service.
It is recommended to run at least two Index Manager instances for redundancy. Since task management is carried out through Redis, if one manager goes down, the other one can continue the work from the step at which it was interrupted.
Background routines
The Index Manager service performs two types of background routines in parallel:
- Planning routine
- Lookup routine
In the planning routine, the Index Manager checks which sets of lists need to be indexed, then creates tasks, each with a unique "task_id", and places them in the internal queue. The planning routine is executed either periodically, with the period in seconds specified in the "planning_period" setting, or according to the cron-like schedule specified in the "planning_schedule" setting.
In the lookup routine, the Index Manager checks the status of all running Indexer instances. If any Indexer instance has finished building the index, the Index Manager updates the task information and submits the data for monitoring. If any Indexer instance is ready to accept a task, the Index Manager service retrieves the next task from the internal queue, sends it to the free Indexer instance, and updates the task information. The lookup routine is performed periodically, with the period in seconds specified in the "lookup_period" setting.
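The two routines can be illustrated with a minimal in-memory sketch. It is a simplified model, not the Index Manager implementation: the Redis-backed internal queue is replaced by a `collections.deque`, Indexer status polling is reduced to a hypothetical `finished` flag, timing ("planning_period", "planning_schedule", "lookup_period") is left out, and all class and field names are illustrative.

```python
import uuid
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    task_id: str
    content: str                   # processed "list_id"
    status: str = "pending"        # pending -> indexing -> success/error
    indexer: Optional[str] = None  # address of the Indexer handling the task


@dataclass
class Indexer:
    address: str
    current: Optional[Task] = None
    finished: bool = False         # hypothetical flag: set when the build completes


class ManagerSketch:
    def __init__(self, indexers: List[Indexer]) -> None:
        self.queue: deque = deque()          # stands in for the internal queue
        self.tasks: Dict[str, Task] = {}
        self.indexers = indexers

    def planning_routine(self, list_ids: List[str]) -> None:
        # One task with a unique task_id per list, placed in the queue.
        for list_id in list_ids:
            task = Task(task_id=str(uuid.uuid4()), content=list_id)
            self.tasks[task.task_id] = task
            self.queue.append(task.task_id)

    def lookup_routine(self) -> None:
        for indexer in self.indexers:
            # Record the result of a finished build and free the Indexer.
            if indexer.current is not None and indexer.finished:
                indexer.current.status = "success"
                indexer.current, indexer.finished = None, False
            # Send the next queued task to a free Indexer.
            if indexer.current is None and self.queue:
                task = self.tasks[self.queue.popleft()]
                task.status, task.indexer = "indexing", indexer.address
                indexer.current = task
```

With two Indexers and three planned lists, the first lookup pass starts two builds and leaves one task pending; once an Indexer reports completion, the next pass marks its task "success" and hands that Indexer the remaining task.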
Setting index rebuilding and refreshing
The Index Manager service sends tasks to create, rebuild, and update the index according to the procedures described above.
By default, a complete index rebuild is performed. If necessary, you can update the indexes instead of rebuilding them. To save time and resources, the Index Manager service can take into account only newly added or deleted descriptors; in this case, the Indexer service does not rebuild indexes from scratch. To prefer updating over rebuilding, set the "rebuild_rules" > "default" setting to false, which means "do not rebuild, but update". By default, a rebuild is performed (value true).
An update involves deleting old and/or adding new descriptors from/to the created index. Due to the indexing algorithm, removing a descriptor from the created index results in degradation of the index. The more descriptors that have been removed, the worse the quality of the index, resulting in lower search accuracy for that index. It is highly recommended that you rebuild your indexes if you remove a large number of individuals from the original dataset.
To specify how many descriptors can be removed from a built index before the index is rebuilt from scratch, set the "rebuild_rules" > "max_removal_for_rebuild" parameter to an appropriate value. For example, if this setting is set to 10, up to 10 descriptors can be removed before the index is rebuilt from scratch to keep it effective. The default is "0", which means "never rebuild from scratch". As a guideline, an index should be rebuilt before removals exceed 10% of its total number of descriptors.
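The interaction between the two "rebuild_rules" settings can be sketched as a small decision function. The function names, the `removed_since_build` counter, and the choice that a rebuild happens only once more than `max_removal_for_rebuild` descriptors have been removed are assumptions made for illustration.

```python
from typing import Mapping


def should_rebuild(rebuild_rules: Mapping, removed_since_build: int) -> bool:
    """Return True for a full rebuild, False for an incremental update (sketch)."""
    if rebuild_rules.get("default", True):
        return True                      # default: always rebuild from scratch
    limit = rebuild_rules.get("max_removal_for_rebuild", 0)
    if limit == 0:
        return False                     # 0 means "never rebuild from scratch"
    return removed_since_build > limit   # assumed: rebuild once the limit is exceeded


def suggested_removal_limit(total_descriptors: int, percent: int = 10) -> int:
    # Guideline from the text: rebuild before removals exceed ~10% of descriptors.
    return total_descriptors * percent // 100


settings = {"rebuild_rules": {"default": False, "max_removal_for_rebuild": 10}}
print(should_rebuild(settings["rebuild_rules"], 11))  # -> True
```

For a list of 1,000,000 descriptors, the 10% guideline suggests a limit of 100,000 removals.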
Index Manager storage
All information about the created tasks is stored in the Redis database. The Redis Redlock mechanism is also used to coordinate work with multiple instances.
Work with multiple instances
The multiple instance mode is supported by automatic selection of the master instance based on Redis Redlock.
See https://redis.io/docs/reference/patterns/distributed-locks/ for details on distributed locks done with Redis.
Only the master instance performs the planning and lookup background routines. The remaining instances can only accept requests for one-time index creation and respond to GET requests.
If necessary, the following environment variables can be specified when starting the respective containers:
- LIM_MANAGER_MASTER_LOCK — Redis lock name for the master instance of the Index Manager service. This lock ensures that only one instance of the Index Manager can perform planning and lookup routines. The default value is lim_manager_master.
- LIM_MATCHER_CONSUMER — Redis consumer group for matching requests. The default value is lim_matcher.
- LIM_MATCHER_LOCK_PREFIX — Prefix for the lock name of the Indexed Matcher service in Redis. This helps avoid potential naming conflicts with other Redis users. The default value is lim_matcher.
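Master selection can be sketched as a time-limited lock: the instance that sets the lock key first becomes master and must keep extending it, otherwise the lock expires and another instance takes over. The sketch uses a hypothetical in-memory `FakeRedis` that mimics the semantics of the Redis `SET key value NX PX ttl` command on a single node. Full Redlock acquires the lock on several independent Redis nodes; the TTL value, method names, and extension logic here are assumptions.

```python
import os
import time
from typing import Dict, Optional, Tuple


class FakeRedis:
    """In-memory, single-node stand-in for the Redis calls used below."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}   # key -> (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None or item[1] <= time.monotonic():
            self._data.pop(key, None)                   # expired or absent
            return None
        return item[0]

    def set_nx_px(self, key: str, value: str, ttl_ms: int) -> bool:
        # Same semantics as "SET key value NX PX ttl_ms": only set if absent.
        if self.get(key) is not None:
            return False
        self._data[key] = (value, time.monotonic() + ttl_ms / 1000)
        return True

    def extend_if_owner(self, key: str, value: str, ttl_ms: int) -> bool:
        # In real Redis this check-and-extend must be atomic (e.g. a Lua script).
        if self.get(key) != value:
            return False
        self._data[key] = (value, time.monotonic() + ttl_ms / 1000)
        return True


def try_become_master(redis: FakeRedis, instance_id: str, ttl_ms: int = 10_000) -> bool:
    """Called periodically; only the current lock holder runs the routines."""
    lock_name = os.environ.get("LIM_MANAGER_MASTER_LOCK", "lim_manager_master")
    return (redis.extend_if_owner(lock_name, instance_id, ttl_ms)
            or redis.set_nx_px(lock_name, instance_id, ttl_ms))
```

While "manager-1" keeps renewing the lock, "manager-2" only answers requests; if "manager-1" stops renewing, the key expires and "manager-2" acquires it on its next attempt.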
Requests to Index Manager service
Interaction with the Index Manager service is performed using HTTP requests. The main requests are listed below:
- "get queue" — Get the list of tasks in the queue and their number.
- "get tasks" — Get information on tasks:
  - "task_id" — Task ID.
  - "status" — "pending", "indexing", "success", or "error".
  - "create_time" — Index build task creation time in RFC 3339 format.
  - "start_time" — Index build start time in RFC 3339 format.
  - "end_time" — Index build end time in RFC 3339 format.
  - "indexer" — Address of the server running the Indexer instance that processes the task.
  - "error" — Error received during the index build.
  - "content" — Processed "list_id".
  If necessary, you can filter the received tasks.
- "create task" — Create a task to build the index once.
- "remove tasks" — Delete tasks. If necessary, you can filter the tasks to be deleted.
- "get indexes" — Get the number of indexes, as well as the following information for each index:
  - "index_id" — Equal to "task_id".
  - "index_type" — Index type (lists only).
  - "label" — Processed "list_id".
- "remove index" — Remove the index from the repository by ID.
- "remove indexes" — Remove indexes according to the policy specified in the request parameter, which takes the following values:
  - all — Remove all indexes.
  - outdated — Remove only outdated indexes, i.e. those for which newer indexes exist (default).
- "get most relevant indexes" — Get information on the most relevant index, i.e. the most recently built index for the list.
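The "outdated" removal policy and the "get most relevant indexes" request both rely on choosing the latest index for each list. A sketch of that selection follows; the `IndexInfo` record and its `built_at` field are hypothetical (the build time could, for example, come from the task's "end_time"), and tie-breaking between equal build times is not specified by the service.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class IndexInfo:
    index_id: str        # equal to the "task_id" that built the index
    label: str           # processed "list_id"
    built_at: datetime   # hypothetical field, e.g. parsed from the task's "end_time"


def most_relevant(indexes: Sequence[IndexInfo]) -> Dict[str, IndexInfo]:
    """Latest built index for each list (label)."""
    latest: Dict[str, IndexInfo] = {}
    for idx in indexes:
        current = latest.get(idx.label)
        if current is None or idx.built_at > current.built_at:
            latest[idx.label] = idx
    return latest


def select_for_removal(indexes: Sequence[IndexInfo], policy: str = "outdated") -> List[IndexInfo]:
    if policy == "all":
        return list(indexes)
    if policy == "outdated":
        # Outdated: a newer index exists for the same label.
        keep = {idx.index_id for idx in most_relevant(indexes).values()}
        return [idx for idx in indexes if idx.index_id not in keep]
    raise ValueError(f"unknown policy: {policy!r}")
```

With two indexes for one list and one for another, the "outdated" policy selects only the older index of the first list, while "all" selects every index.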
See the OpenAPI specification for more information about requests made to the Index Manager service and other requests.
Indexer service
The Indexer service processes tasks received from the Index Manager service and builds the indexes.
Requests to the Indexer service are not intended for the user. All requests related to the LUNA Index Module must be made to the Index Manager service (see "Requests to Index Manager service").
The Indexer service should be deployed on a separate server, because building an index consumes a lot of resources for a long time. One Indexer instance can build only one index at a time, so it is recommended to run multiple Indexer instances. The Indexer must also be configured with sufficiently large storage.
Indexed Matcher service
The Indexed Matcher service loads the most relevant indexes from the index storage (file system) and processes matching requests.
On startup, the Indexed Matcher service loads all indexes of the latest version from the index storage into memory and sets up Redis streams to accept match messages for all matching labels loaded into the index storage.
The Indexed Matcher service always checks that the list exists on startup, when loading a new index into memory, and when refreshing an index in memory. An index whose list no longer exists is removed from the service's memory.
To speed up access to the index, you can configure index caching in a dedicated directory in the Indexed Matcher service container (caching is disabled by default). Caching is enabled by the "LIM_MATCHER_CACHE" setting.
Requests to the Indexed Matcher service are not intended for the user. All requests related to the LUNA Index Module must be made to the Index Manager service (see "Requests to Index Manager service").
The Indexed Matcher does not communicate with other LIM services. It only monitors the storage and loads indexes into memory when they appear. Since matching requests are processed through Redis streams, any number of matcher instances can be run without updating the system configuration. The number of Indexed Matcher instances should be determined by performance requirements.
Synchronization of matching labels in memory
The Indexed Matcher service synchronizes the matching labels of the indexes in its memory with Redis keys. For each label in memory, the service sets a key in the following format:
matching_label__<label>__<matcher_id>
For example, matching_label__17cdbe41-c7f1-440b-b9ad-aad93c7176ee__127.0.0.1:5200.
The <matcher_id> field in the label key is the host and port of the Indexed Matcher instance. The host is read from the VL_LIM_MATCHER_HOST environment variable or, if the variable is not set, it is guessed using the operating system sockets API. By reading these keys from Redis, the matching plugin learns which Indexed Matcher instances have loaded specific index labels into memory.
Label keys are set with a TTL and expire unless they are updated again. The presence of such a key in Redis means that one of the running Indexed Matcher instances can process matching requests for the label.
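A sketch of building and parsing label keys in the format above, and of grouping them by label as a matching plugin might. The helper names are hypothetical; the port is passed in explicitly, the socket-based host guess is only one possible approach, and labels are assumed not to contain a double underscore. Against a real Redis server, the live keys could be listed with `SCAN` and a `MATCH matching_label__*` pattern.

```python
import os
import socket
from typing import Dict, Iterable, List, Tuple

PREFIX = "matching_label"
SEP = "__"


def matcher_id(port: int) -> str:
    # Host from VL_LIM_MATCHER_HOST, otherwise a guess via the sockets API.
    host = os.environ.get("VL_LIM_MATCHER_HOST")
    if not host:
        host = socket.gethostbyname(socket.gethostname())
    return f"{host}:{port}"


def label_key(label: str, matcher: str) -> str:
    return SEP.join((PREFIX, label, matcher))


def parse_label_key(key: str) -> Tuple[str, str]:
    parts = key.split(SEP, 2)
    if len(parts) != 3 or parts[0] != PREFIX:
        raise ValueError(f"not a label key: {key!r}")
    return parts[1], parts[2]


def instances_by_label(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Group live label keys as {label: [matcher_id, ...]}."""
    result: Dict[str, List[str]] = {}
    for key in keys:
        label, matcher = parse_label_key(key)
        result.setdefault(label, []).append(matcher)
    return result
```

Because expired keys disappear from Redis, grouping only the keys that are still present yields the instances currently able to match each label.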
Index reloading
In-memory indexes in the Indexed Matcher service are synchronized with the storage by a periodic background process called index reloading.
If an index is removed from the storage, it is also removed from the Indexed Matcher service's memory.
If a new index with a new matching label appears in the storage, the Indexed Matcher service attempts to load the new index into memory.
If a new index appears in the storage with a newer version of the matching label than the index loaded into memory, the Indexed Matcher service tries to load the new index into memory in place of the old one.
To ensure that a given index can be reloaded by only one Indexed Matcher instance at a time, the Redis Redlock mechanism is used. Once the lock is acquired, the older version of the index is removed from the Indexed Matcher service's memory and the newer one is loaded.
If there is a problem loading the index, for example, lack of memory, an appropriate message is sent to the logs and monitoring.
While an index is being reloaded, the Indexed Matcher service does not accept matching requests for the corresponding label. However, only one Indexed Matcher instance can reload the index for a particular label at a time. Therefore, it is recommended to run multiple Indexed Matcher instances so that all labels can be matched at any time.
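One reloading pass can be modeled as a comparison between the labels in memory and in the storage. In this hypothetical sketch each side is a `{label: version}` mapping with comparable versions, and `try_lock` and `load` are callables standing in for the Redlock acquisition and the actual index loading; lock release and error reporting are omitted.

```python
from typing import Callable, Dict, List, Tuple


def plan_reload(in_memory: Dict[str, int], in_storage: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Return (labels to unload, labels to load or replace)."""
    to_unload = sorted(set(in_memory) - set(in_storage))         # index removed from storage
    to_load = sorted(
        label for label, version in in_storage.items()
        if label not in in_memory or version > in_memory[label]  # new label or newer version
    )
    return to_unload, to_load


def reload_once(in_memory: Dict[str, int], in_storage: Dict[str, int],
                try_lock: Callable[[str], bool], load: Callable[[str], None]) -> Dict[str, int]:
    to_unload, to_load = plan_reload(in_memory, in_storage)
    for label in to_unload:
        del in_memory[label]
    for label in to_load:
        if not try_lock(label):        # another instance is reloading this label
            continue
        in_memory.pop(label, None)     # drop the old version ...
        load(label)                    # ... then load the new one (may fail, e.g. out of memory)
        in_memory[label] = in_storage[label]
    return in_memory
```

A label whose lock is held elsewhere is simply skipped in this pass; the instance that holds the lock performs the reload instead.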
See the sequence diagram for index reloading in the "Index reloading diagram" section.
Refreshing index in memory
By default, the Indexed Matcher service monitors lists with faces for changes. If new changes are made to the list, the Indexed Matcher service updates the corresponding indexes in its memory by gradually adding a small number of descriptors.
The use of this functionality is controlled by the "enabled" setting.
This section applies to an index that is already loaded into the memory of the Indexed Matcher service. The index in memory and the index in the storage may differ.
When the index is updated in memory, the Indexed Matcher service pauses matching on that index but continues to accept new matching requests for it. Because descriptors are added to the index in memory in small portions (no more than 10 descriptors at a time), matching proceeds with minimal interruption. However, if elements are inserted into the list too often (dozens or hundreds of additions), performance degrades significantly, up to an almost complete stop of the matching process.
During the index update, the Indexed Matcher service outputs the following information to the logs:
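The small-portion update can be sketched as follows. `batches` splits the new descriptors into groups of at most 10; the `refresh` loop, which serves requests accepted during one portion before adding the next, is an assumption about how the interleaving could work rather than a description of the service's scheduler. All names are hypothetical.

```python
from typing import Callable, Iterator, List, Sequence

BATCH_SIZE = 10   # "no more than 10 descriptors at a time"


def batches(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def refresh(index: List[str], new_ids: Sequence[str], pending: List[str],
            match: Callable[[List[str], str], None]) -> None:
    for batch in batches(new_ids):
        index.extend(batch)            # matching on this index is paused here
        while pending:                 # requests accepted meanwhile are served
            match(index, pending.pop(0))
```

Adding 25 descriptors takes three portions (10, 10, and 5), so a request that arrived during the update waits for at most one portion; frequent insertions multiply these pauses.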
Refresh index for: 2d5832ad-8c8f-415f-a0b4-d12d69fabd60
Sync: 5->6, 0->0
Refresh index for: 2d5832ad-8c8f-415f-a0b4-d12d69fabd60 has finished successfully
where:
- 2d5832ad-8c8f-415f-a0b4-d12d69fabd60 — List ID.
- 5->6 — Information about downloading packets with descriptors (1 packet equals 10 descriptors) from the Faces database. Here 6 is the total number of packets that need to be downloaded from the database, and 5 is the number of packets downloaded so far. Thus, the message 5->6 means that synchronization will continue and another packet will be downloaded.
- 0->0 — Information about deleting packets with descriptors (1 packet equals 10 descriptors) from the index in memory. It works in the same way as downloading packets from the Faces database.
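A sketch for reading the "Sync" progress line from the log, assuming the exact format shown in the example above. The `finished` property encodes the interpretation that synchronization is complete once both counters reach their totals; that interpretation and all names here are assumptions.

```python
import math
import re
from typing import NamedTuple

DESCRIPTORS_PER_PACKET = 10
SYNC_RE = re.compile(r"Sync: (\d+)->(\d+), (\d+)->(\d+)")


class SyncProgress(NamedTuple):
    downloaded: int      # packets downloaded from the Faces database so far
    to_download: int     # total packets to download
    deleted: int         # packets deleted from the index in memory so far
    to_delete: int       # total packets to delete

    @property
    def finished(self) -> bool:
        return self.downloaded >= self.to_download and self.deleted >= self.to_delete


def parse_sync(line: str) -> SyncProgress:
    match = SYNC_RE.search(line)
    if match is None:
        raise ValueError(f"not a Sync line: {line!r}")
    return SyncProgress(*map(int, match.groups()))


def packets_for(descriptor_count: int) -> int:
    # 1 packet equals 10 descriptors; a partial packet still counts as one.
    return math.ceil(descriptor_count / DESCRIPTORS_PER_PACKET)
```

For example, 55 newly added descriptors correspond to 6 packets, which matches the total in the "Sync: 5->6" line.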
The speed of updating the index in memory depends on the size of the current index.
If this functionality is used, frequent index rebuilds are neither necessary nor recommended, so it is advisable to increase the planning routine period ("planning_period" setting). However, adding new faces to the index in memory is slower than rebuilding the index, so it makes no sense to use this function if a very large number of faces has been added to the list. In this case, it is more efficient to rebuild the index.
Unlinking faces from the list does not remove those faces from the index in memory. In this case, the descriptors are marked as unsearchable, so the index retains the storage space allocated to them.
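The effect of unlinking can be illustrated with a toy index that only flags descriptors as unsearchable. The brute-force dot-product search is a stand-in chosen for brevity, not the indexing algorithm used by LUNA, and the class is hypothetical.

```python
from typing import List, Sequence, Set


class RefreshableIndex:
    """Toy index: unlinked faces are flagged as unsearchable, not removed."""

    def __init__(self) -> None:
        self._ids: List[str] = []
        self._vectors: List[Sequence[float]] = []   # slots stay allocated
        self._unsearchable: Set[str] = set()

    def add(self, face_id: str, vector: Sequence[float]) -> None:
        self._ids.append(face_id)
        self._vectors.append(vector)

    def unlink(self, face_id: str) -> None:
        self._unsearchable.add(face_id)             # storage is not reclaimed

    def search(self, query: Sequence[float], top_k: int = 1) -> List[str]:
        scored = [
            (sum(a * b for a, b in zip(query, vec)), fid)
            for fid, vec in zip(self._ids, self._vectors)
            if fid not in self._unsearchable
        ]
        scored.sort(reverse=True)
        return [fid for _, fid in scored[:top_k]]

    @property
    def allocated(self) -> int:
        return len(self._vectors)
```

After `unlink`, the face no longer appears in results, but `allocated` is unchanged; space is only reclaimed when the index is rebuilt.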
See the sequence diagram for index refreshing in the "Index refreshing diagram" section.
Index caching
You can enable index caching to speed up loading data into the memory of the Indexed Matcher service. With caching enabled, if the Indexed Matcher service restarts unexpectedly, the index is loaded from the cache directory rather than from the storage.
Caching is enabled by specifying an intermediate directory for storing and loading indexes in the "lim_matcher_cache" setting of the Configurator service. By default, the directory is not specified, i.e. caching is disabled.
The intermediate directory must be located on a local file system (network file systems such as GlusterFS or NFS may cause errors). Every time the Indexed Matcher service reloads its indexes, it tries to clean up the cache directory by removing old generations of list indexes. The cache system has a locking mechanism: if multiple Indexed Matcher instances run on the same host and share the same cache directory, locking prevents the same indexes from being downloaded multiple times. As a result, each index is fetched from the index storage exactly once per host.
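The per-host locking and clean-up described above can be sketched with an advisory file lock. This is a POSIX-only sketch (`fcntl.flock`); the `<list_id>.<generation>.idx` file naming, the `.lock` file, and the `download` callable are hypothetical, and the real cache layout may differ.

```python
import fcntl
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator


@contextmanager
def cache_lock(cache_dir: Path) -> Iterator[None]:
    # Advisory lock shared by all matcher processes using this directory (POSIX only).
    with open(cache_dir / ".lock", "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def get_cached_index(cache_dir: Path, list_id: str, generation: int,
                     download: Callable[[str, int], bytes]) -> Path:
    """Return the cached index file, fetching it from storage only if it is missing."""
    target = cache_dir / f"{list_id}.{generation}.idx"
    with cache_lock(cache_dir):
        if not target.exists():
            tmp = cache_dir / f"{list_id}.{generation}.tmp"
            tmp.write_bytes(download(list_id, generation))
            os.replace(tmp, target)          # atomic within one local file system
        # Clean up older generations of this list's index.
        for old in cache_dir.glob(f"{list_id}.*.idx"):
            if int(old.name.split(".")[-2]) < generation:
                old.unlink()
    return target
```

The second instance to request the same index waits on the lock, then finds the file already present and skips the download. Writing to a temporary file and renaming it keeps other processes from reading a half-written index; the atomic rename is one reason the directory must be on a local file system.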