Index services

Index Manager service

The Index Manager service manages the creation of indexes for lists that contain face descriptors. It performs the following tasks:

generates tasks for building the index;
sends tasks to the internal queue;
retrieves tasks from the internal queue and coordinates the process of delivering tasks for indexing to the Indexer service.

The service operates in two modes: one-time and automatic.

One-time mode of operation enables you to build the index once by sending a "create task" HTTP request to the Index Manager service. Specify the required "list_id" in the request body.

Automatic mode of operation enables you to:

automatically create tasks for indexing a set of lists specified in the "INDEXING_LISTS" setting in the Configurator service;
automatically create tasks to index all lists existing in the LP whose number of faces exceeds the value of the "MIN_INDEXING_LIST_SIZE" setting in the Configurator service (50000 faces by default). In this case, the "INDEXING_LISTS" setting must be set to "dynamic".

When working in automatic mode, the Index Manager service tracks changes in the number of faces in the lists by interacting with the Faces service. If the number of faces in a list has changed, a new task for that list is sent to the internal queue.

One task processes only one list.
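The automatic-mode rules above can be expressed as a small planning function. This is a minimal sketch, not the Index Manager's implementation: `plan_tasks`, its arguments, and the idea of comparing against the face count recorded at the last indexing are assumptions made for illustration. Only the setting names "INDEXING_LISTS" and "MIN_INDEXING_LIST_SIZE" come from the text.

```python
from typing import Dict, List, Union


def plan_tasks(
    indexing_lists: Union[str, List[str]],
    min_indexing_list_size: int,
    list_sizes: Dict[str, int],
    last_indexed_sizes: Dict[str, int],
) -> List[str]:
    """Return the list_ids that need a new indexing task (one task per list).

    Hypothetical sketch: list_sizes maps list_id -> current face count (as
    reported by the Faces service); last_indexed_sizes maps list_id -> face
    count at the time of the last task for that list.
    """
    if indexing_lists == "dynamic":
        # Dynamic mode: every list whose face count exceeds the threshold.
        candidates = [lid for lid, n in list_sizes.items() if n > min_indexing_list_size]
    else:
        # Explicit mode: only the configured lists that actually exist.
        candidates = [lid for lid in indexing_lists if lid in list_sizes]
    # Schedule a task only when the face count changed (or the list was never indexed).
    return [lid for lid in candidates if last_indexed_sizes.get(lid) != list_sizes[lid]]


sizes = {"list-a": 60000, "list-b": 10000, "list-c": 70000}
last = {"list-c": 70000}
print(plan_tasks("dynamic", 50000, sizes, last))             # ['list-a']
print(plan_tasks(["list-b", "list-c"], 50000, sizes, last))  # ['list-b']
```

Here "list-c" is skipped in both modes because its face count has not changed since its last task.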

Background procedures

The Index Manager service performs two types of background routines in parallel:

planning routine;
lookup routine.

In the planning routine, the Index Manager determines which lists need to be indexed, creates a task with a unique "task_id" for each of them, and places the tasks in the internal queue. The planning routine runs with the period (in seconds) specified in the "PLANNING_PERIOD" setting of the Configurator service.

In the lookup routine, the Index Manager checks the status of all running Indexer instances. If an Indexer instance has finished building an index, the Index Manager updates the task information and sends the data to monitoring. If an Indexer instance is ready to accept a task, the Index Manager retrieves the next task from the internal queue, sends it to the free Indexer instance, and updates the task information. The lookup routine runs with the period (in seconds) specified in the "LOOKUP_PERIOD" setting of the Configurator service.
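One pass of the lookup routine can be sketched as follows. The `IndexerState` class, the "idle" state, and the in-place status updates are hypothetical simplifications: in the real service the Index Manager sends an HTTP request to the Indexer, which sets its own status, and the monitoring step is omitted here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass
class IndexerState:
    """Hypothetical Index Manager view of one Indexer instance."""
    address: str
    status: str = "idle"           # "idle" is an assumed state for a free Indexer
    task_id: Optional[str] = None  # task currently assigned, if any


def lookup(queue: Deque[str], indexers: List[IndexerState], tasks: Dict[str, str]) -> None:
    """One lookup pass: record finished tasks, then hand queued tasks to free Indexers."""
    for ix in indexers:
        # The Indexer finished (successfully or not): update the task information.
        if ix.task_id is not None and ix.status in ("success", "error"):
            tasks[ix.task_id] = ix.status
            ix.task_id, ix.status = None, "idle"
        # The Indexer is free: give it the next task from the internal queue.
        if ix.task_id is None and queue:
            ix.task_id = queue.popleft()
            ix.status = "indexing"
            tasks[ix.task_id] = "indexing"
```

For example, with two idle Indexers and the queue `t1, t2, t3`, the first pass starts `t1` and `t2`; once the first Indexer reports "success", the next pass records it and hands `t3` to that Indexer.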

Index Manager storage

All information about the created tasks is stored in the Redis database. The Redis Redlock mechanism is also used to coordinate work when multiple instances are running.

Work with multiple instances

The multiple-instance mode is supported by automatically electing a master instance based on Redis Redlock.

See https://redis.io/docs/reference/patterns/distributed-locks/ for details on distributed locks done with Redis.

Only the master instance performs the planning and lookup background routines. The remaining instances only accept requests for one-time index creation and respond to GET requests.
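Master election with Redlock can be illustrated with the algorithm described on the Redis page linked above: a candidate writes a random token under a lock key on several independent Redis nodes with `SET key token NX PX ttl`, and becomes master only if it succeeds on a majority of nodes within the lock validity time. The sketch below is an illustration under assumptions, not the Index Manager's code: `FakeRedisNode` is an in-memory stand-in for a Redis node, and the key name and TTL are hypothetical. A real master would also have to renew the lock before it expires.

```python
import uuid
from typing import Callable, Dict, List, Optional, Tuple


class FakeRedisNode:
    """In-memory stand-in for one Redis node (emulates SET key value NX PX ttl)."""

    def __init__(self, clock: Callable[[], float]):
        self.clock = clock
        self.data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expiry time)

    def set_nx_px(self, key: str, value: str, ttl_ms: int) -> bool:
        now = self.clock()
        current = self.data.get(key)
        if current is not None and current[1] > now:
            return False  # key exists and has not expired yet
        self.data[key] = (value, now + ttl_ms / 1000)
        return True

    def release(self, key: str, value: str) -> None:
        # Delete the key only if we still own it (Redis does this with a Lua script).
        if self.data.get(key, (None, 0.0))[0] == value:
            del self.data[key]


def try_become_master(nodes: List[FakeRedisNode], key: str, ttl_ms: int,
                      clock: Callable[[], float]) -> Optional[str]:
    """Return the lock token if a majority of nodes granted the lock, else None."""
    token = str(uuid.uuid4())
    start = clock()
    granted = sum(node.set_nx_px(key, token, ttl_ms) for node in nodes)
    elapsed_ms = (clock() - start) * 1000
    if granted >= len(nodes) // 2 + 1 and elapsed_ms < ttl_ms:
        return token
    for node in nodes:  # failed: release any partial locks this attempt acquired
        node.release(key, token)
    return None
```

A second candidate fails while the first lock is valid and can take over once it expires, which is what allows another instance to become master if the current one stops renewing the lock.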

Requests to service

Interaction with the Index Manager service is performed using HTTP requests. The main requests are listed below:

"get queue" - get the list of tasks in the queue and their number
"get tasks" - get information on tasks:
- task_id
- status - "pending", "indexing", "success", "error"
- create_time - task creation time in RFC 3339 format
- start_time - index build start time in RFC 3339 format
- end_time - index build end time in RFC 3339 format
- indexer - address of the server where the Indexer instance that processes the specified task is running
- error - error received during index build
- content - processed "list_id"
If necessary, you can filter the received tasks.
"create task" - create task to build the index once
"remove tasks" - delete tasks. If necessary, you can filter the tasks to be deleted.
"get indexes" - get the number of indexes, as well as the following information for each index:
- index_id (equal to task_id)
- index_type (list only)
- label (processed "list_id")
"remove indexes" - delete an index from the storage by its ID
"get most relevant indexes" - get information on the most relevant indexes, i.e. the last index built for each list

See the OpenAPI specification for more information about requests made to the Index Manager service and other requests.
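A client that processes the "get tasks" response might model each task and compute build durations from the RFC 3339 timestamps. This is a hypothetical client-side sketch: the `Task` class mirrors the fields listed above, but the response envelope, filter parameters, and exact timestamp precision are not described here, so check the OpenAPI specification for the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class Task:
    """Fields returned by "get tasks" (names from the text; types assumed)."""
    task_id: str
    status: str                       # "pending", "indexing", "success" or "error"
    create_time: str                  # RFC 3339
    start_time: Optional[str] = None  # RFC 3339, assumed absent until indexing starts
    end_time: Optional[str] = None    # RFC 3339, assumed absent until indexing ends
    indexer: Optional[str] = None
    error: Optional[str] = None
    content: Optional[str] = None     # processed list_id


def parse_rfc3339(ts: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only since Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def build_seconds(task: Task) -> Optional[float]:
    """Index build duration in seconds, or None if the build has not finished."""
    if task.start_time is None or task.end_time is None:
        return None
    return (parse_rfc3339(task.end_time) - parse_rfc3339(task.start_time)).total_seconds()


def tasks_with_status(tasks: Iterable[Task], status: str) -> List[Task]:
    """Client-side filter; the service itself also supports filtering tasks."""
    return [t for t in tasks if t.status == status]
```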

Indexer service

The Indexer service processes tasks received from the Index Manager service and performs the indexing.

The service operates as follows:

To start indexing, the Index Manager service sends a request to the Indexer service with the necessary parameters - "list_id" and "task_id". The Indexer service converts these parameters into "label" and "index_id" respectively.
When the indexing request is received, the Indexer service starts a separate indexing process. At this point, the Indexer sets its status to "indexing".
When the indexing process is started, the Indexer service fetches the descriptors from the Faces service. Fetching is performed in batches of 1000 items.
After all descriptors have been fetched and loaded into memory, the Indexer starts building the index using LUNA SDK. A directed descriptor dependency graph is created, in which similar descriptors are placed on connected vertices. The more descriptors are processed, the longer it takes to build the index, and the build time grows non-linearly with the number of descriptors. The Indexed Matcher service later uses this graph to match descriptors.
When indexing has finished, the index is saved using the configured storage backend (file system). In the storage, the index is a directory containing several files (see "Index structure").
After the index is saved successfully, the indexing process stops and the Indexer sets its status to "success". If the indexing process ends with an error, the Indexer sets its status to "error".
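The indexing steps above can be sketched as a function that fetches descriptors in batches of 1000, builds the index, saves it, and reports "success" or "error". The paging callback `fetch_page(offset, limit)` and the `build` and `save` callables are hypothetical placeholders for the Faces service request, the LUNA SDK index builder, and the storage backend; their real interfaces are not described here.

```python
from typing import Any, Callable, Dict, List, Sequence

BATCH_SIZE = 1000  # batch size given in the text


def fetch_all_descriptors(fetch_page: Callable[[int, int], Sequence[bytes]]) -> List[bytes]:
    """Fetch descriptors batch by batch until a short or empty batch arrives."""
    descriptors: List[bytes] = []
    offset = 0
    while True:
        batch = fetch_page(offset, BATCH_SIZE)
        descriptors.extend(batch)
        if len(batch) < BATCH_SIZE:
            return descriptors
        offset += len(batch)


def run_indexing(list_id: str, task_id: str,
                 fetch_page: Callable[[int, int], Sequence[bytes]],
                 build: Callable[[List[bytes]], Any],
                 save: Callable[[str, Any], None]) -> Dict[str, str]:
    # list_id becomes the index "label" and task_id becomes its "index_id".
    state = {"label": list_id, "index_id": task_id, "status": "indexing"}
    try:
        descriptors = fetch_all_descriptors(fetch_page)  # all loaded into memory
        index = build(descriptors)
        save(task_id, index)
        state["status"] = "success"
    except Exception as exc:  # any failure marks the task as failed
        state["status"] = "error"
        state["error"] = str(exc)
    return state
```

Because every descriptor is held in memory before the build starts, memory use grows with the list size.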

You can view the status of an indexing task using the "get tasks" request to the Index Manager service.

Index structure

The index consists of the following files:

The meta.json file contains meta information about the index, including which objects are indexed.
The index.dat file contains binary index data.
The ids.dat file contains an ordered list of object IDs in the index.
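The directory layout can be illustrated with a short sketch that writes and reads the three files. Only the file names come from the text; the contents used here (JSON keys in meta.json, one ID per line in ids.dat, an opaque blob in index.dat) are hypothetical, since the actual on-disk formats are not documented in this section.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def write_index_dir(root: Path, index_id: str, label: str,
                    ids: List[str], index_data: bytes) -> Path:
    """Write one index as a directory named after index_id (hypothetical formats)."""
    d = root / index_id
    d.mkdir(parents=True)
    meta = {"index_id": index_id, "index_type": "list", "label": label}
    (d / "meta.json").write_text(json.dumps(meta))
    (d / "index.dat").write_bytes(index_data)   # opaque binary index data
    (d / "ids.dat").write_text("\n".join(ids))  # object IDs in index order
    return d


def read_index_dir(d: Path):
    meta = json.loads((d / "meta.json").read_text())
    text = (d / "ids.dat").read_text()
    ids = text.split("\n") if text else []
    return meta, ids, (d / "index.dat").read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    d = write_index_dir(Path(tmp), "task-1", "list-uuid", ["face-1", "face-2"], b"\x00\x01")
    meta, ids, blob = read_index_dir(d)
    print(meta["label"], ids)  # list-uuid ['face-1', 'face-2']
```

Keeping the IDs in a separate ordered file means a position found in the binary index can be mapped back to an object ID by its offset in ids.dat.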

Indexed Matcher service

The Indexed Matcher service loads the most relevant indexes from the index storage (file system) and processes matching requests.

Here, the most relevant index is the index built most recently for the list.

Because the index storage can contain multiple versions of indexes with a specific matching label, the Indexed Matcher service always tries to match against the newest (i.e., most relevant) version.

The matching label is generated by the Indexer service during index building ("label" parameter). The label contains the UUID of the list.
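Selecting the most relevant index per matching label amounts to keeping the newest index for each label. In this sketch, the `StoredIndex` class and its `built_at` field are hypothetical; how the services actually order index versions is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)
class StoredIndex:
    index_id: str        # equal to the task_id of the build
    label: str           # UUID of the indexed list
    built_at: datetime   # hypothetical: when the build finished


def most_relevant(indexes: Iterable[StoredIndex]) -> Dict[str, StoredIndex]:
    """Map each matching label to its most recently built index."""
    latest: Dict[str, StoredIndex] = {}
    for ix in indexes:
        current = latest.get(ix.label)
        if current is None or ix.built_at > current.built_at:
            latest[ix.label] = ix
    return latest
```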

Startup

At startup, the Indexed Matcher service loads the latest version of every index from the index storage into memory and configures Redis streams to receive matching requests for all matching labels of the loaded indexes.

Index reload

In-memory indexes in the Indexed Matcher service are synchronized with the store by a periodic background process called index reloading.

If the index is removed from storage, the index is also removed from the Indexed Matcher service's memory.

If an index with a new matching label appears in the storage, the Indexed Matcher service attempts to load it into memory.

If a newer version of an index appears in the storage for a matching label that is already loaded into memory, the Indexed Matcher service tries to load the new index into memory in place of the old one.

Figure: Replacing the index built for an outdated version of the list (Index 1) with the new one (Index 3)
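One reload pass can be described as a comparison between the indexes held in memory and the most relevant indexes in the storage. The sketch below only computes the actions; `plan_reload` and its label-to-index_id mappings are hypothetical, and the actual loading, locking, and error reporting are left out.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str, str]  # (action, label, index_id)


def plan_reload(in_memory: Dict[str, str], in_storage: Dict[str, str]) -> List[Action]:
    """Compare label -> index_id maps and return the actions one reload pass would take.

    in_memory: indexes currently loaded by this Indexed Matcher instance.
    in_storage: the most relevant (latest) index per label in the index storage.
    """
    actions: List[Action] = []
    for label, index_id in in_memory.items():
        if label not in in_storage:
            actions.append(("unload", label, index_id))            # no index left for the label
        elif in_storage[label] != index_id:
            actions.append(("replace", label, in_storage[label]))  # a newer version is stored
    for label, index_id in in_storage.items():
        if label not in in_memory:
            actions.append(("load", label, index_id))              # new matching label
    return actions
```

For the situation in the figure, an instance holding `{"list-a": "idx-1"}` while the storage holds `{"list-a": "idx-3"}` would get a single `("replace", "list-a", "idx-3")` action.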

The Redis Redlock mechanism ensures that a given index can be reloaded by only one Indexed Matcher instance at a time. Once the lock is acquired, the older version of the index is removed from that instance's memory and the newer one is loaded.

If there is a problem loading the index, such as a lack of memory, a corresponding message is written to the logs and sent to monitoring.

While an index is being reloaded, the Indexed Matcher instance that reloads it does not accept matching requests for the corresponding label. Since only one Indexed Matcher instance can reload the index for a particular label at a time, it is recommended to run multiple Indexed Matcher instances so that all labels can be matched at any time.

Matching requests

Matching requests come from the API service to the Python Matcher Proxy service, which uses the matching plugin to forward them to the Indexed Matcher service. The Indexed Matcher service accepts matching requests via Redis streams, performs the matching, and publishes the result to a Redis channel, from which it is passed to the Python Matcher Proxy service and then to the API service.

For each matching label, there is a Redis stream named after the label. All running Indexed Matcher instances that have loaded the corresponding index form a consumer group for this stream.
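The effect of a consumer group can be simulated without Redis: each request in a label's stream is delivered to exactly one group member, and only instances that have the label's index loaded, and are not currently reloading it, take part. The round-robin order and the `distribute` function are simplifying assumptions; in Redis, a message goes to whichever consumer reads it first with `XREADGROUP`.

```python
from itertools import cycle
from typing import Dict, List, Set


def distribute(requests: List[str], label: str,
               loaded: Dict[str, Set[str]], reloading: Set[str]) -> Dict[str, List[str]]:
    """Assign each matching request for `label` to exactly one eligible instance.

    loaded: instance name -> labels whose indexes it holds in memory.
    reloading: instances currently reloading the index for this label.
    """
    group = sorted(name for name, labels in loaded.items()
                   if label in labels and name not in reloading)
    if not group:
        raise RuntimeError(f"no Indexed Matcher instance can serve label {label!r}")
    assignment: Dict[str, List[str]] = {name: [] for name in group}
    for request, name in zip(requests, cycle(group)):
        assignment[name].append(request)
    return assignment
```

With a single instance, the label has no eligible consumer during a reload, which is why running several Indexed Matcher instances is recommended.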

Matching process

The Indexed Matcher service does not compare the incoming descriptor with every descriptor in the list; instead, it moves along the vertices of the graph.

The graph is built by the Indexer service.

Starting from an initial vertex of the graph, the service compares the incoming descriptor with all vertices connected to the current vertex. When the most similar descriptor is found, its vertex becomes the current one, and the comparison is repeated with the vertices connected to it. After several iterations, the descriptor with the highest similarity is determined. This search requires significantly fewer comparison operations, which increases search performance by a factor of a hundred.
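The traversal described above is a greedy search over a proximity graph. The sketch below shows the generic technique on toy 2-D vectors with cosine similarity; it is not the LUNA SDK's graph or search code, and the vectors, graph, and entry vertex are made up for illustration. Production graph indexes such as HNSW add refinements (for example, multiple layers and a candidate list) that are omitted here.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def greedy_search(query: Sequence[float], vectors: Dict[int, Sequence[float]],
                  graph: Dict[int, List[int]], entry: int) -> int:
    """Walk the graph toward the vertex most similar to `query`."""
    current = entry
    best = cosine(query, vectors[current])
    while True:
        # Compare the query only with the neighbors of the current vertex.
        scored = [(cosine(query, vectors[n]), n) for n in graph[current]]
        if not scored:
            return current
        score, node = max(scored)
        if score <= best:
            return current  # no neighbor is more similar: stop at this vertex
        current, best = node, score


vectors = {0: (1.0, 0.0), 1: (0.9, 0.44), 2: (0.6, 0.8), 3: (0.0, 1.0)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
query = (0.1, 1.0)
print(greedy_search(query, vectors, graph, entry=0))  # 3
```

On this tiny chain the walk visits every vertex, so nothing is saved; the benefit appears in large graphs, where each step inspects only a few neighbors instead of the whole list. A greedy walk can also stop at a local maximum, which is one reason real implementations use the refinements mentioned above.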