Indexing in a nutshell
What is a descriptor?
A descriptor is a unique vector that stores a number of packed properties extracted from a detection sample.
What is matching?
Matching is the comparison of a given descriptor with a batch of descriptors, which produces a similarity score for each descriptor in the batch. Its purpose is to find, within a user-provided set of descriptors, the descriptors most similar to the given one.
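The following minimal Python sketch illustrates brute-force matching. It scores a query descriptor against every descriptor in a batch and keeps the most similar ones. It assumes cosine similarity as the score, although the product may use a different similarity function. The names brute_force_match and cosine_similarity are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (assumed score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_match(query: Sequence[float],
                      batch: Dict[str, Sequence[float]],
                      top_k: int = 3) -> List[Tuple[str, float]]:
    """Score the query against every descriptor; return the top_k (id, score) pairs."""
    # Exhaustive: one similarity computation per stored descriptor.
    scores = [(desc_id, cosine_similarity(query, vec)) for desc_id, vec in batch.items()]
    return heapq.nlargest(top_k, scores, key=lambda item: item[1])
```

For the batch {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.8, 0.6]} and the query [1.0, 0.1], brute_force_match(query, batch, top_k=2) returns "a" followed by "c". Every descriptor in the batch is scored, so the cost grows with the size of the batch.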
What is an index?
An index is a user-provided set of descriptors that is deployed together as a single collection for approximate matching.
Why approximate?
Classical brute-force matching compares every request with every descriptor in the set, so its cost grows linearly with the size of the set. For a really large set of descriptors, this makes it impossible to combine low latency with a high number of requests per second. Approximation techniques solve this problem by trading some accuracy for a large gain in speed: they preprocess the data into an efficient index so that each request needs far fewer comparisons.
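A back-of-the-envelope estimate shows why brute force does not scale. Assume a dot-product-style score. Each request then needs one dot product of length d per stored descriptor, so matching N descriptors at R requests per second costs about N × d × R multiply-add operations per second. The sketch below computes this estimate. The function name and the example figures (10 million descriptors, 512 values per descriptor, 100 requests per second) are hypothetical.

```python
def brute_force_ops_per_second(num_descriptors: int, dimension: int,
                               requests_per_second: int) -> int:
    """Multiply-adds per second needed to compare every request with every descriptor."""
    # One dot product of length `dimension` per stored descriptor, for each request.
    ops_per_request = num_descriptors * dimension
    return ops_per_request * requests_per_second
```

For those figures, the function returns 512,000,000,000 multiply-adds per second, and the cost doubles every time the set of descriptors doubles.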
Search basics
The index structure is optimized for efficient nearest neighbor search. It is constructed as a dependency graph whose vertices are descriptors, and the search moves along the vertices of this graph. After the search enters the first vertex, the incoming descriptor is compared with all the vertices connected to the current vertex. The most similar of them becomes the new current vertex, and the next comparison is performed with the vertices connected to it. After several iterations, none of the connected vertices is more similar than the current one. That vertex is returned as the most similar vertex found by the search, i.e. the descriptor with the highest similarity score.
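The search described above is a greedy walk over a graph. The Python sketch below builds a small graph by linking each descriptor to its k most similar descriptors and then runs that walk. It illustrates the idea under stated assumptions and is not the product's index. The plain k-nearest-neighbor graph, the negative Euclidean distance used as the similarity score, and the names build_knn_graph and greedy_search are all hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def similarity(a: Vector, b: Vector) -> float:
    """Higher is more similar; this sketch assumes negative Euclidean distance."""
    return -math.dist(a, b)


def build_knn_graph(vectors: Dict[int, Vector], k: int) -> Dict[int, List[int]]:
    """Link every vertex to its k most similar vertices, then make the edges undirected."""
    neighbors: Dict[int, set] = {v: set() for v in vectors}
    for v, vec in vectors.items():
        others = [u for u in vectors if u != v]
        nearest = heapq.nlargest(k, others, key=lambda u: similarity(vec, vectors[u]))
        for u in nearest:
            neighbors[v].add(u)
            neighbors[u].add(v)
    return {v: sorted(adj) for v, adj in neighbors.items()}


def greedy_search(graph: Dict[int, List[int]], vectors: Dict[int, Vector],
                  query: Vector, entry: int) -> Tuple[int, float]:
    """Move to the most similar neighbor until no neighbor beats the current vertex."""
    current = entry
    current_score = similarity(query, vectors[current])
    while True:
        best, best_score = current, current_score
        # Compare the query with every vertex connected to the current one.
        for neighbor in graph[current]:
            score = similarity(query, vectors[neighbor])
            if score > best_score:
                best, best_score = neighbor, score
        if best == current:  # no connected vertex is more similar: stop here
            return current, current_score
        current, current_score = best, best_score
```

Place five descriptors at positions 0 to 4 on a line and call build_knn_graph(vectors, k=2). Then greedy_search(graph, vectors, (3.9, 0.0), entry=0) moves from vertex 0 to vertex 2 and then to vertex 4. The walk stops at vertex 4 because none of its neighbors is more similar.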
Configurable parameters
The size of the dynamic list of nearest neighbors controls the speed/accuracy trade-off during index construction and search. In both cases, higher values lead to more accurate results at the cost of speed. Use the LIM_MANAGER_INDEXING.EF_CONSTRUCTION setting to set this size for index construction and the LIM_MATCHING.EF_SEARCH setting to set it for search.
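The parameter names suggest an HNSW-style index, in which a dynamic list of size ef holds the best vertices seen so far during the walk. This is an assumption, because the index algorithm is not named here. The Python sketch below shows such a search. The function beam_search, its arguments, and the negative Euclidean similarity are hypothetical. With ef = 1, the search behaves like a greedy walk and can stop at a vertex whose neighbors are all less similar, even though a better vertex exists elsewhere in the graph. A larger ef keeps more candidates alive and can reach that better vertex, at the cost of more comparisons. In HNSW, the same routine also runs during construction, with efConstruction, to choose the neighbors of each newly inserted vertex.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def similarity(a: Vector, b: Vector) -> float:
    """Higher is more similar; this sketch assumes negative Euclidean distance."""
    return -math.dist(a, b)


def beam_search(graph: Dict[int, List[int]], vectors: Dict[int, Vector],
                query: Vector, entry: int, ef: int) -> List[Tuple[float, int]]:
    """Graph search that keeps a dynamic list of the ef best vertices seen so far."""
    entry_score = similarity(query, vectors[entry])
    visited = {entry}
    candidates = [(-entry_score, entry)]  # negated scores: heappop yields the most similar unexplored vertex
    results = [(entry_score, entry)]      # plain scores: results[0] is the least similar vertex kept
    while candidates:
        neg_score, vertex = heapq.heappop(candidates)
        if -neg_score < results[0][0]:
            break  # best unexplored vertex is worse than everything kept: stop
        for neighbor in graph[vertex]:
            if neighbor in visited:
                continue
            visited.add(neighbor)
            score = similarity(query, vectors[neighbor])
            if len(results) < ef or score > results[0][0]:
                heapq.heappush(candidates, (-score, neighbor))
                heapq.heappush(results, (score, neighbor))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the least similar to keep the list at size ef
    return sorted(results, reverse=True)  # most similar first
```

Consider vectors = {0: (0.0, 0.0), 1: (4.0, 0.0), 2: (0.0, 5.0), 3: (10.0, 1.0)}, graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}, and the query (10.0, 0.0) entered at vertex 0. With ef=1, the search returns vertex 1 as the best match. With ef=3, it reaches vertex 3, the true nearest neighbor.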