Indexing in a nutshell
What is a descriptor?
A descriptor is a set of characteristics that a neural network extracts from an image containing a human face or body.
What is matching?
Matching compares a given descriptor with a batch of descriptors and produces a similarity score for each comparison. Its purpose is to find the most similar descriptor within a user-provided set.
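As a minimal sketch of matching, the snippet below compares one query descriptor with every candidate and picks the highest score. It assumes descriptors are plain numeric vectors and uses cosine similarity as the score; the source does not state which similarity measure is actually used, and the names `cosine_similarity` and `match` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match(query, candidates):
    """Compare `query` with every candidate (brute force).

    Returns the index of the most similar candidate and all scores.
    """
    scores = [cosine_similarity(query, c) for c in candidates]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Toy three-dimensional descriptors with illustrative values only.
candidates = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
query = [0.5, 0.9, 0.0]

best_index, scores = match(query, candidates)
print(best_index, [round(s, 3) for s in scores])
```

Every candidate is scored on every request, so the cost grows linearly with the size of the set.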
What is an index?
An index is a user-provided set of descriptors deployed together for approximate matching.
Why approximate?
When a massive set of descriptors is matched with classical brute-force methods, low latency cannot be maintained at a high request rate. Approximation techniques are therefore necessary: they trade some accuracy for a significant gain in speed. These techniques speed up matching by optimizing the data and building an efficient index.
Search basics
The index structure is optimized for efficient nearest neighbor search. It is built as a dependency graph whose vertices are descriptors, and a search traverses that graph. Starting from the first vertex, the incoming descriptor is compared with all vertices connected to the current one. The search then moves to the most similar of them and repeats the comparison with that vertex's neighbors. After several iterations, it arrives at the most similar vertex found, that is, the descriptor with the highest similarity score.
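The traversal described above can be sketched as a greedy walk over a graph. This is an illustration under stated assumptions, not the product's implementation: the graph is a hand-made chain of five unit vectors, the dot product stands in for the similarity score, and `greedy_search`, `vectors`, and `neighbors` are hypothetical names.

```python
import math


def dot(a, b):
    """Dot product, used here as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


def greedy_search(query, vectors, neighbors, entry, similarity=dot):
    """Walk the graph, always moving to the most similar connected vertex.

    Stops when no neighbor of the current vertex scores higher than it.
    """
    current = entry
    current_score = similarity(query, vectors[current])
    while True:
        best, best_score = current, current_score
        for n in neighbors[current]:
            s = similarity(query, vectors[n])
            if s > best_score:
                best, best_score = n, s
        if best == current:
            return current, current_score
        current, current_score = best, best_score


def unit(angle_deg):
    """Unit vector at the given angle, in degrees."""
    a = math.radians(angle_deg)
    return [math.cos(a), math.sin(a)]


# Five toy descriptors on the unit circle, linked as a chain 0-1-2-3-4.
vectors = [unit(0), unit(30), unit(60), unit(90), unit(120)]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

query = unit(85)
found, score = greedy_search(query, vectors, neighbors, entry=0)
print(found, round(score, 3))
```

A purely greedy walk like this one can stop at a vertex that is only locally best, which is one reason graph-based search is approximate.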
Configurable parameters
The size of the dynamic list of nearest neighbors controls the speed/accuracy trade-off during index construction and search. Higher values give more accurate but slower results. Use the LIM_MANAGER_INDEXING.EF_CONSTRUCTION setting to configure this size for index construction and the LIM_MATCHING.EF_SEARCH setting to configure it for search.
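To illustrate how the size of the dynamic list affects a search, the sketch below runs a best-first graph search that keeps at most `ef` of the best vertices seen so far. It is written in the style of the `ef` parameter found in HNSW-type indexes; the source does not confirm which algorithm the product uses, and the function name, the toy graph, and the one-dimensional descriptors are all hypothetical.

```python
import heapq


def search_with_dynamic_list(query, vectors, neighbors, entry, ef, similarity):
    """Best-first graph search keeping at most `ef` best vertices found so far.

    Returns the kept (score, vertex) pairs, best first, and the number of
    vertices that were scored.
    """
    entry_score = similarity(query, vectors[entry])
    visited = {entry}
    # Vertices still to expand; scores are negated because heapq is a min-heap.
    candidates = [(-entry_score, entry)]
    # Min-heap of the best vertices so far; results[0] is the worst one kept.
    results = [(entry_score, entry)]
    while candidates:
        neg_score, vertex = heapq.heappop(candidates)
        if len(results) >= ef and -neg_score < results[0][0]:
            break  # the best unexpanded candidate cannot improve the list
        for n in neighbors[vertex]:
            if n in visited:
                continue
            visited.add(n)
            s = similarity(query, vectors[n])
            if len(results) < ef or s > results[0][0]:
                heapq.heappush(candidates, (-s, n))
                heapq.heappush(results, (s, n))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the worst kept vertex
    return sorted(results, reverse=True), len(visited)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# One-dimensional toy descriptors, so each score equals the stored value.
# Vertex 1 is a dead end that looks good; vertex 3 is the true best match
# and is reachable only through the weaker vertex 2.
vectors = [[0.1], [0.5], [0.4], [0.9]]
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
query = [1.0]

narrow, scored_narrow = search_with_dynamic_list(query, vectors, neighbors, 0, 1, dot)
wide, scored_wide = search_with_dynamic_list(query, vectors, neighbors, 0, 2, dot)
print(narrow[0][1], scored_narrow)
print(wide[0][1], scored_wide)
```

In this toy graph, a list of size 1 settles on the dead-end vertex, while a list of size 2 also keeps the weaker vertex, expands it, and reaches the true best match at the cost of scoring one more vertex.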