General concepts#
LUNA Index Module (LIM):
- sends requests for indexing lists with descriptors;
- performs index building;
- loads index into memory and performs matching.
LIM contains the following services:
- Index Manager - manages index building tasks and coordinates the Indexer service;
- Indexer - builds indexes based on the list of descriptors;
- Indexed Matcher - performs approximate nearest neighbor matching of descriptors using the built indexes.
See the Wiki for more information about the nearest neighbor search.
The module requires the Python Matcher Proxy service with a built-in matching plugin. The matching plugin determines which service receives requests from the LUNA API service: the Python Matcher service or the Indexed Matcher service. The Index Manager and Indexed Matcher services require a Redis database.
All LIM services are scalable, which means you can use multiple instances.
Index#
An index is a user-provided set of descriptors deployed together for approximate matching. It is built as a dependency graph whose vertices are descriptors. The search for descriptors in this dependency graph is performed by moving along its vertices (see "Matching process" section below).
Building the index is a slow process that consumes a lot of resources for a long time, so you need to correctly set the period for automatic index rebuilding when changes appear in the list (see below).
The size of the descriptor list used during index construction and search controls the speed/accuracy trade-off. Higher values lead to a more accurate but slower search. To configure these sizes, use the "ef_construction" setting of the "LIM_MANAGER_INDEXING" section and the "ef_search" setting of the "LIM_MATCHING" section.
Index structure#
The index consists of the following files:
- The meta.json file, which contains meta information about the index, including which objects are indexed;
- The index.dat file, which contains binary index data;
- The ids.dat file, which contains an ordered list of object IDs in the index.
Each index has a unique name, which is used as the key/folder name.
The default index storage directory is specified for each LIM service in the "index_storage_local" setting of the "OTHER" section of the Configurator service. Note that the directory must be the same for all three services.
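The layout described above can be checked with a few lines of Python. This is a minimal sketch, not part of LIM: the helper name `missing_index_files` and the index name `example-index` are hypothetical, and a temporary directory stands in for the directory configured in "index_storage_local". Only the three file names come from this document.

```python
import tempfile
from pathlib import Path
from typing import List

# File names taken from the index structure described above.
REQUIRED_FILES = ("meta.json", "index.dat", "ids.dat")


def missing_index_files(index_dir: Path) -> List[str]:
    """Return the required index files that are absent from index_dir."""
    return [name for name in REQUIRED_FILES if not (index_dir / name).is_file()]


# Demo: a temporary directory stands in for the shared index storage.
with tempfile.TemporaryDirectory() as storage:
    index_dir = Path(storage) / "example-index"  # hypothetical index name
    index_dir.mkdir()
    (index_dir / "meta.json").write_text("{}")  # real meta contents are not modeled
    (index_dir / "index.dat").write_bytes(b"")
    print(missing_index_files(index_dir))  # ids.dat has not been written yet
```

A check like this can help confirm that all three services point at the same directory and see the same complete index.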
Index building task creation process#
The indexing of a set of descriptors is carried out by placing indexing tasks in a queue. Such tasks are created in the Index Manager service. There are two types of index building tasks: one-time and background.
The one-time type lets you build the index once by sending the "create task" HTTP request to the Index Manager service. In the request body, you should specify the required "list_id".
The background type creates index building tasks in the background, in one of two ways:
- the set of lists is explicitly specified in the "indexing_list" setting of the "LIM_MANAGER_INDEXING" section of the Configurator service;
- all existing lists in LUNA PLATFORM whose number of faces exceeds the number specified in the "min_indexing_list_size" setting of the "LIM_MANAGER_INDEXING" section of the Configurator service are indexed dynamically. In this case, the "indexing_list" setting should be set to "dynamic". The default value of "min_indexing_list_size" is 50000 faces.
When using the background type, the Index Manager service tracks changes in the number of faces in the lists by interacting with the Faces service. If the number of faces has changed, a new task is sent to the internal queue.
One task processes only one list.
Index creation process#
The index is created as follows:
- To start indexing, the Index Manager service sends a request to the Indexer service with the necessary parameters, "list_id" and "task_id". The Indexer service converts these parameters into "label" and "index_id" respectively.
- When the indexing request is received, the Indexer service starts a separate indexing process. At this point, the Indexer sets its status to "indexing".
- When the indexing process is started, the Indexer service fetches the descriptors from the Faces service. Fetching is performed in batches of 1000 items.
- After all descriptors have been fetched and loaded into memory, the Indexer begins building the index. A directed descriptor dependency graph is created (see "Index").
- When indexing has finished, the index is saved using the configured backend (filesystem). In the storage, the index is a directory containing several files (see "Index structure").
- After the index is saved successfully, the indexing process stops. At this point, the Indexer sets its status to "success". If the indexing process ends with an error, the Indexer sets its status to "error".
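The steps above can be sketched as a small status-tracking routine. This is a simplified model under assumptions, not the Indexer code: `fetch_page`, `build` and `save` are hypothetical callables standing in for the Faces service client, the graph builder and the storage backend. The batch size of 1000 and the status values "indexing", "success" and "error" come from this document.

```python
from typing import Callable, Iterator, List

BATCH_SIZE = 1000  # batch size stated in the process description


def fetch_in_batches(
    total: int, fetch_page: Callable[[int, int], List]
) -> Iterator[List]:
    """Yield descriptors batch by batch; fetch_page(offset, limit) is hypothetical."""
    for offset in range(0, total, BATCH_SIZE):
        yield fetch_page(offset, BATCH_SIZE)


def run_indexing(total: int, fetch_page: Callable, build: Callable, save: Callable) -> str:
    """Run one indexing task and return the final status."""
    status = "indexing"  # set as soon as the indexing process starts
    try:
        descriptors: List = []
        for batch in fetch_in_batches(total, fetch_page):
            descriptors.extend(batch)  # all descriptors are loaded into memory
        index = build(descriptors)     # graph construction (not modeled here)
        save(index)                    # write the index directory to storage
        status = "success"
    except Exception:
        status = "error"
    return status


# Demo with fake data in place of real descriptors.
data = list(range(2500))
page = lambda offset, limit: data[offset:offset + limit]
print([len(b) for b in fetch_in_batches(len(data), page)])
print(run_indexing(len(data), page, build=len, save=lambda index: None))
```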
Information about stored indexes can be obtained using the "get indexes" or "get most relevant indexes" requests to the Index Manager service.
You can view the status of the Indexer service using the "get tasks" request to the Index Manager service.
Some time after the indexes are stored, all running instances of the Indexed Matcher automatically (re)load those indexes into memory. After the indexes are loaded into memory, you can send requests to match against the indexed descriptor sets with the specified matching label.
Matching#
Indexed Matcher loads the most relevant indexes from the storage and processes matching requests. Because the index storage can contain multiple versions of indexes with a specific matching label, the Indexed Matcher service always tries to match against the newest (i.e., most relevant) version.
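Choosing the version to match against can be sketched as picking the newest index for a label. This is a minimal sketch under assumptions: the record layout is hypothetical, and in particular the `created_at` field is invented for illustration, since this document does not say how index versions are ordered. The `label` and `index_id` names follow the parameters mentioned in the index creation process.

```python
from datetime import datetime
from typing import Dict, List, Optional


def most_relevant_index(indexes: List[Dict], label: str) -> Optional[Dict]:
    """Return the newest index record with the given label, or None if there is none."""
    candidates = [index for index in indexes if index["label"] == label]
    # "created_at" is an assumed field used only to order versions.
    return max(candidates, key=lambda index: index["created_at"], default=None)


# Demo with hypothetical index records.
stored = [
    {"index_id": "task-1", "label": "list-a", "created_at": datetime(2024, 1, 1)},
    {"index_id": "task-2", "label": "list-a", "created_at": datetime(2024, 2, 1)},
    {"index_id": "task-3", "label": "list-b", "created_at": datetime(2024, 3, 1)},
]
print(most_relevant_index(stored, "list-a")["index_id"])
```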
The index becomes outdated as soon as descriptors are created or deleted in LUNA PLATFORM 5. Indexed Matcher will search on the outdated version of the index until the index is rebuilt.
In-memory indexes in the Indexed Matcher service are synchronized with the storage by a periodic background process called index reloading (see "Index reload" section for details).
Matching requests#
Matching requests come from the API service to the Python Matcher Proxy service, which uses the matching plugin to forward them to the Indexed Matcher service. The Indexed Matcher service accepts matching requests via Redis streams, performs the matching, and sends the matching result to the Redis channel, from where the result is passed to the Python Matcher Proxy service and then to the API service.
Each matching label has its own stream, named after the label. The running instances of Indexed Matcher that have the corresponding index loaded form the consumer group for this stream.
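The effect of a consumer group is that each request in a label's stream is handed to only one of the matcher instances. The toy model below illustrates that idea in plain Python. It does not use Redis or its client API, it ignores acknowledgements and redelivery, and the class name, request shape and consumer names are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class LabelStream:
    """Toy in-memory stand-in for one per-label stream with one consumer group."""

    def __init__(self, label: str) -> None:
        self.label = label  # the stream is named after the matching label
        self._undelivered: Deque[Dict] = deque()

    def add(self, request: Dict) -> None:
        """Append a matching request to the stream."""
        self._undelivered.append(request)

    def read(self, consumer: str) -> Optional[Tuple[str, Dict]]:
        """Hand the oldest undelivered request to the calling consumer."""
        if not self._undelivered:
            return None
        return consumer, self._undelivered.popleft()


# Demo: two matcher instances share the requests for one label.
stream = LabelStream("list-a")
for request_id in (1, 2, 3):
    stream.add({"request_id": request_id})
for consumer in ("matcher-1", "matcher-2", "matcher-1", "matcher-2"):
    print(stream.read(consumer))  # the fourth read finds the stream empty
```

Adding more instances with the same index loaded spreads the requests for that label across them, which is how the services scale in this model.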
Matching process#
The Indexed Matcher service moves along the vertices of the dependency graph (index).
Starting from the first vertex of the graph, the service matches the incoming descriptor with all the vertices connected to the current vertex. When the most similar vertex is found, the next matching is made with the vertices connected to it. After several iterations, the most similar vertex (i.e., the descriptor with the highest similarity score) is found. Because only a small part of the graph is visited, such a search needs far fewer comparisons than matching against every descriptor, which increases search performance by about a hundred times.
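The walk described above can be illustrated with a simple greedy search over a small graph. This is a sketch of the general idea only, and the actual index algorithm and similarity measure are not specified in this document. Cosine similarity, the two-dimensional vectors, the chain-shaped graph and the function names are all assumptions chosen to keep the example small.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def greedy_search(
    graph: Dict[int, List[int]],
    vectors: Dict[int, Sequence[float]],
    query: Sequence[float],
    entry: int,
) -> Tuple[int, float]:
    """Move to the most similar neighbor until no neighbor improves the score."""
    current = entry
    best = cosine(query, vectors[current])
    while True:
        candidate = current
        for neighbor in graph[current]:
            score = cosine(query, vectors[neighbor])
            if score > best:
                best, candidate = score, neighbor
        if candidate == current:
            return current, best  # no connected vertex is more similar
        current = candidate


def unit(degrees: float) -> Tuple[float, float]:
    """Unit vector at the given angle, used as a toy descriptor."""
    return math.cos(math.radians(degrees)), math.sin(math.radians(degrees))


# Toy data: eight descriptors at 0, 10, ..., 70 degrees, linked in a chain.
vectors = {i: unit(10 * i) for i in range(8)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
query = unit(52)
print(greedy_search(graph, vectors, query, entry=0))
```

A greedy walk like this can stop at a vertex that is only locally the most similar, which is one reason the matching is approximate rather than exact.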