Introduction¶
Luna-Vinder is a system for fast search of similar objects by their descriptors in large volumes of data. The main idea of the system is to separate data storage and search (matching). Instead of searching directly in the main database, the system creates special optimized copies of data called projections and works with them.
Core Concept¶
When millions or billions of objects with descriptors accumulate in a database, there is a need for flexible and fast search across different subsets of this data. Although modern databases such as PostgreSQL with vector extensions can handle descriptor-based searches, the main problem lies elsewhere: the database is a single storage for all Luna system events, and using it directly for search tasks creates several limitations:
First, it’s difficult to efficiently work with different data subsets - if you need to search only through events of a certain type or source, the database will still consider the entire table.
Second, massive search queries create load on the database, which can negatively affect its core functions - receiving and storing new events and objects.
Third, it’s impossible to control which attributes to store and index for different search scenarios.
Fourth, descriptors and all other related data are stored in separate tables, which makes it inefficient to perform searches that require joining or filtering across these tables.
Luna-Vinder solves this problem by creating separate specialized data copies - projections. The key advantage of this approach is that data can be grouped by certain characteristics into separate projections, and you can choose which attributes to store in them. Instead of one huge table with all possible data, you create several targeted projections:
one only for events from specific sources with a minimal set of attributes,
another for events from the recent period with extended metadata,
a third for statistical analysis with aggregated data, etc.
Each projection contains only relevant information, making it compact. Accordingly, filtering such focused data sets is much easier, and search works faster because the system operates on a smaller volume of data and doesn’t waste resources processing unnecessary information.
Important
The key difference from the traditional approach is that the source of truth about data remains the main database. Projections are just working copies optimized for specific search tasks.
The system’s flexibility allows creating multiple projections with different groupings, filters, and targets - each projection solves its specific task as efficiently as possible.
How the System Works¶
Luna-Vinder consists of three interconnected components, each performing its role in the search process.
Projector - Projection Management¶
Projector is responsible for managing projections. When you create a projection, you pass the Projector a configuration:
filters - which objects to include
targets - which object attributes to save
projection type
origin - where to get data from
description
At this stage, Projector creates an empty projection - essentially, it’s a definition of what data should be in this projection, but the data itself is not yet loaded. Projector registers the projection in the system and makes it available for use by other components.
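As a rough illustration of the configuration listed above, the sketch below models a projection definition as a plain Python object. The class name, the `projection_type` spelling, all example values, and the equality-only filter semantics are assumptions made for this example; the actual Projector schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectionConfig:
    """Hypothetical projection definition; not the real Projector schema."""
    filters: dict          # which objects to include
    targets: list          # which object attributes to save
    projection_type: str   # projection type
    origin: str            # where to get data from
    description: str = ""


# All values below are made up for illustration.
entrance_events = ProjectionConfig(
    filters={"source": "entrance-camera"},
    targets=["gender", "age", "creation_time"],
    projection_type="events",
    origin="luna-events",
    description="Events from one source with a minimal attribute set",
)


def matches(config: ProjectionConfig, event: dict) -> bool:
    """Return True if the event satisfies every equality filter (assumed semantics)."""
    return all(event.get(key) == value for key, value in config.filters.items())
```

Like the empty projection described above, the object holds only a definition: no event data is attached to it.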
Configurator and Indexes¶
The next step is configuring indexes for the Matcher. An index is a way of organizing data from a projection that defines two things:
Which fields can be used to filter data during search
Which data will be structured for fast access
You go to the Configurator and create an index configuration. In it, you specify the projection to use and a list of fields to build the index on. For example, you specify fields: gender, age. This means that during search, you’ll be able to filter data by these two fields - search for people of a specific gender and certain age.
Warning
If you don’t include a field in the index, you won’t be able to filter by it.
Note
You can create multiple indexes for the same projection to support different search scenarios. However, keep in mind that descriptors and related data will be duplicated in each index, which increases storage usage.
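The relationship between index fields and allowed filters can be sketched as follows. `IndexConfig`, `unsupported_filters`, and the projection name are hypothetical names used only for this example; they are not part of the Configurator API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexConfig:
    """Hypothetical index configuration: a projection plus an ordered field list."""
    projection: str   # name of the projection the index is built on
    fields: tuple     # fields the index is built on, in order


def unsupported_filters(index: IndexConfig, request_filters: dict) -> list:
    """Return the names of request filters that the index cannot serve."""
    return sorted(name for name in request_filters if name not in index.fields)


# Example index on the two fields mentioned above.
index = IndexConfig(projection="entrance_events", fields=("gender", "age"))
```

An empty result means every filter in the request is covered by the index; any name in the result would need a different index.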
Field Cardinality¶
The order of fields in the index is critically important, and here you need to consider cardinality - the number of unique values a field can take.
Note
Rule: fields should be specified in ascending order of cardinality, from fields with the smallest number of unique values to fields with the largest.
For example:
Field gender has only 3 possible values (male / female / undefined) - low cardinality
Field age can have dozens of values - medium cardinality
Field creation_time can have millions of unique values - high cardinality
Correct order will be: gender → age → creation_time
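One way to apply the rule is to estimate cardinality from a sample of records and sort the fields by it. This is a minimal sketch: the function name and the sample data are illustrative, and a small sample only approximates the cardinality of the full data set.

```python
def order_by_cardinality(records, fields):
    """Sort fields from fewest to most distinct values observed in the sample."""
    distinct = {name: len({record[name] for record in records}) for name in fields}
    return sorted(fields, key=lambda name: distinct[name])


# Made-up sample: 2 genders, 3 ages, 4 creation times.
sample = [
    {"gender": "male", "age": 31, "creation_time": 1700000001},
    {"gender": "female", "age": 31, "creation_time": 1700000002},
    {"gender": "male", "age": 45, "creation_time": 1700000003},
    {"gender": "female", "age": 52, "creation_time": 1700000004},
]

index_fields = order_by_cardinality(sample, ["creation_time", "age", "gender"])
```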
Why Is This Important?¶
Matcher builds a tree structure where data is sequentially divided by the values of specified fields. If the first field has low cardinality, the tree will be efficient:
at the first level, data will be divided into 3 groups (by gender)
at the second level, each group will be divided into up to ~100 subgroups (by age)
and so on
However, if you put a field with high cardinality first, for example creation_time with millions of unique values, the tree will be excessively branched and inefficient.
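The effect of field order can be seen on a toy model. The sketch below nests records into dictionaries level by level and counts the resulting branches for two field orders. It is an illustration of the idea only and assumes nothing about the actual Matcher data structure.

```python
def build_tree(records, fields):
    """Nest records into dicts keyed by each field value, in the given field order."""
    if not fields:
        return list(records)
    first, rest = fields[0], fields[1:]
    groups = {}
    for record in records:
        groups.setdefault(record[first], []).append(record)
    return {value: build_tree(group, rest) for value, group in groups.items()}


def count_branches(tree):
    """Count the keys of all dict nodes, a rough measure of how branched the tree is."""
    if isinstance(tree, list):
        return 0
    return len(tree) + sum(count_branches(child) for child in tree.values())


# 12 made-up records: 2 genders, 3 ages, 12 distinct creation times.
records = [
    {"gender": ("male", "female")[i % 2], "age": 20 + i % 3, "creation_time": i}
    for i in range(12)
]

low_first = count_branches(build_tree(records, ["gender", "age", "creation_time"]))
high_first = count_branches(build_tree(records, ["creation_time", "age", "gender"]))
```

With this sample, putting the low-cardinality field first yields fewer branches than putting creation_time first, because the high-cardinality split happens only once, at the bottom level.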
Matcher - Search Engine¶
After you’ve configured the index, Matcher receives this configuration and starts the population process:
Contacts Projector and gets access to the projection
Goes to the data source (specified in the projection’s origin)
Extracts all objects matching the projection’s filters, along with specified targets and descriptors
Loads them into its RAM
Builds a tree structure of the index by specified fields in the specified order
This process can take time, especially if the projection contains millions of records.
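The population steps can be approximated in a few lines. In this sketch the data source is a plain list, filters are equality checks, and the tree is replaced by a flat dictionary keyed by the tuple of index-field values. All three are simplifying assumptions, and the function and field names are hypothetical.

```python
def populate(source_events, filters, targets, index_fields):
    """Load matching events into memory, grouped by their index-field values."""
    buckets = {}
    for event in source_events:
        if any(event.get(name) != value for name, value in filters.items()):
            continue  # the event is not part of the projection
        kept = {name: event[name] for name in targets}  # keep only the targets
        kept["descriptor"] = event["descriptor"]        # descriptors are always kept
        key = tuple(event[name] for name in index_fields)
        buckets.setdefault(key, []).append(kept)
    return buckets


# Made-up source data with two-dimensional descriptors.
events = [
    {"source": "cam-1", "gender": "male", "age": 30, "descriptor": [0.1, 0.9]},
    {"source": "cam-2", "gender": "female", "age": 41, "descriptor": [0.8, 0.2]},
    {"source": "cam-1", "gender": "female", "age": 30, "descriptor": [0.7, 0.3]},
]

index_data = populate(events, {"source": "cam-1"}, ["gender", "age"], ["gender", "age"])
```

The loop touches every event in the source once, which is why population time grows with the size of the projection.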
Query Processing¶
After population is complete, Matcher is ready to work. When you send a search request:
Matcher checks which filters are specified in the request
If all filters match the index fields, Matcher can apply them
It uses the built tree to quickly find the needed descriptor lists
Compares the incoming descriptor only with them
Important
If the request contains a filter for a field that was not included in the index, Matcher won’t be able to process it.
Thus, the index not only speeds up search but also defines functionality - what queries can be executed at all on this projection.
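The query steps above can be sketched as a single function. Cosine similarity is used here as an assumed similarity measure, the index is again modelled as a flat dictionary keyed by index-field values, and every name in the block is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search(buckets, index_fields, request_filters, descriptor, limit=1):
    """Reject filters outside the index, then compare only within matching buckets."""
    unknown = set(request_filters) - set(index_fields)
    if unknown:
        raise ValueError(f"filters not covered by the index: {sorted(unknown)}")
    candidates = []
    for key, items in buckets.items():
        values = dict(zip(index_fields, key))
        if all(values[name] == wanted for name, wanted in request_filters.items()):
            candidates.extend(items)
    scored = [(cosine_similarity(descriptor, item["descriptor"]), item["id"])
              for item in candidates]
    return sorted(scored, reverse=True)[:limit]


# Made-up index contents keyed by (gender, age).
buckets = {
    ("male", 30): [{"id": "a", "descriptor": [1.0, 0.0]}],
    ("female", 30): [{"id": "b", "descriptor": [0.0, 1.0]},
                     {"id": "c", "descriptor": [0.6, 0.8]}],
}

best = search(buckets, ("gender", "age"), {"gender": "female"}, [0.0, 1.0])
```

Only the descriptors in the buckets that pass the filters are compared with the incoming descriptor; a filter on a field outside the index raises an error instead of being silently ignored.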
Data Synchronization¶
Matcher does not stop after the initial load. It keeps synchronizing: it periodically checks whether events have been added to or deleted from the projection and applies those changes to its in-memory copy. As a result, data in Matcher reflects the current state of the data source, apart from the delay between checks.
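One synchronization round can be pictured as applying a set of additions and deletions to the loaded data. The sketch assumes that events carry a unique `id` and that changes arrive as two separate collections. Both are assumptions for illustration, not a description of the actual synchronization protocol.

```python
def apply_changes(loaded, added, deleted_ids):
    """Apply one synchronization round to the in-memory copy, keyed by event id."""
    for event_id in deleted_ids:
        loaded.pop(event_id, None)  # ignore ids that were never loaded
    for event in added:
        loaded[event["id"]] = event
    return loaded


# Made-up in-memory state and one round of changes.
loaded = {"e1": {"id": "e1", "age": 30}, "e2": {"id": "e2", "age": 41}}
apply_changes(loaded, added=[{"id": "e3", "age": 25}], deleted_ids=["e1", "missing"])
```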
Matching Plugin - Request Dispatcher¶
The third component - matching plugin - acts as a smart request dispatcher. It knows about all running Matchers:
which indexes they use
which projections underlie these indexes
which fields can be used to filter data
When a search request arrives, the plugin:
Analyzes request parameters
Determines which Matchers can process it
Verifies that all filters in the request match the Matchers’ index fields
If the request can be executed on one of the Matchers, sends the request there and returns the result
At the same time, the plugin constantly monitors the state of Matchers in the background, updating information about which indexes they have loaded and ready to work.
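The dispatch decision can be sketched as a lookup over the known Matchers. The registry layout, the `ready` flag, and the Matcher names are hypothetical; the real plugin may track more state and apply a different selection policy.

```python
def pick_matcher(matchers, request_filters):
    """Return the name of the first ready Matcher whose index covers every filter."""
    for name, info in matchers.items():
        if info["ready"] and set(request_filters) <= set(info["index_fields"]):
            return name
    return None  # no Matcher can serve this request


# Made-up registry, as the plugin might keep it after polling Matcher state.
matchers = {
    "matcher-a": {"index_fields": ("gender",), "ready": True},
    "matcher-b": {"index_fields": ("gender", "age"), "ready": False},
    "matcher-c": {"index_fields": ("gender", "age"), "ready": True},
}

target = pick_matcher(matchers, {"gender": "female", "age": 30})
```

Here matcher-b has a suitable index but is skipped because its index is not loaded yet, which is the kind of state the background monitoring keeps current.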
Why Is This Fast?¶
Luna-Vinder’s speed is achieved through thoughtful architecture at all levels:
- 1. Data in RAM
Data is loaded into RAM rather than read from disk with each request - memory access is orders of magnitude faster than disk access.
- 2. Optimized Projections
Projections contain only the data that is actually needed for search - filters cut off everything unnecessary at the projection creation stage, and targets include only the necessary attributes.
- 3. Efficient Indexing
The tree structure of the index, built with proper field cardinality in mind, allows efficient data organization and quick location of needed subsets for search.
- 4. Horizontal Scaling
The ability to distribute data across multiple servers through sharding allows parallel processing of large volumes of information and lets the system scale horizontally as load grows.
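As a rough picture of horizontal scaling, the sketch below assigns each value of a shard field to one Matcher and merges per-Matcher results into a single ranked list. The round-robin assignment, the `(similarity, id)` result shape, and all names are assumptions made for this example.

```python
def assign_shards(values, matcher_names):
    """Map each value of the shard field to one Matcher, round-robin over sorted values."""
    return {value: matcher_names[i % len(matcher_names)]
            for i, value in enumerate(sorted(values))}


def merge_results(per_matcher_results, limit):
    """Combine (similarity, id) lists from several Matchers into one top list."""
    merged = [hit for hits in per_matcher_results for hit in hits]
    return sorted(merged, reverse=True)[:limit]


# Made-up shard field values (event sources) and Matcher names.
shards = assign_shards({"cam-1", "cam-2", "cam-3"}, ["matcher-1", "matcher-2"])
top = merge_results([[(0.9, "a"), (0.4, "b")], [(0.7, "c")]], limit=2)
```

Each Matcher searches only its own portion of the data, so the per-Matcher searches can run in parallel before the merge step.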
Tip
Thus, Luna-Vinder turns the search for similar objects from an expensive operation into a fast one without putting load on the main database. Proper configuration of projections and indexes lets you adapt the system to different use cases.
Key Concepts¶
- Descriptor
A numeric vector representation of an object (face, body) used for comparison and searching for similar objects. A descriptor contains encoded object characteristics as an array of numbers. Two similar objects have close descriptors, which allows finding matches by calculating the distance between vectors.
- Event
A record in Luna-Events containing an object’s descriptor and associated metadata: event source, creation time, additional object attributes, and other information.
- Projection
A specialized copy of data from Luna-Events created for a specific search task. A projection contains only a filtered subset of events with a selected set of attributes. Projections are created and managed by the Projector service.
- Targets
A list of event attributes that are saved in the projection along with descriptors. Targets define which metadata will be available for filtering and analysis during search.
- Index
A tree data structure in Matcher built on specified projection fields. The index defines which fields can be used to filter data during search and organizes data for fast access.
- Cardinality
The number of unique values a field can take. Low cardinality means a small number of unique values (e.g., 2-10), high cardinality means a large number (thousands or millions). Proper consideration of cardinality is critical when building efficient indexes.
- Projector
A service that creates and maintains projections. It extracts data from Luna-Events according to specified filters, forms projections, and synchronizes them when new events appear.
- Matcher
A service that loads projections into RAM, builds indexes, and performs searches for similar descriptors. Matchers can work in a cluster to process large volumes of data through sharding.
- Sharding
Distribution of data between multiple Matchers. Each Matcher processes only its portion of data, which allows horizontal scaling of the system. Sharding occurs by a specified field with a limited set of values.
- Matching Plugin
A dispatcher component that receives search requests, determines suitable Matchers, distributes requests among them, and combines results into a single response.
- Matching (Search)
The process of finding similar objects by descriptor. Includes filtering data by specified criteria, comparing descriptors, and returning the most similar matches with an indication of the degree of similarity.
Basic Usage¶
Follow these steps to start using Luna-Vinder for fast descriptor-based search:
Create a Projection
Use the Projector service to create a projection. Define the filters and targets to specify which data should be included and which attributes will be available for search. See projections for details.
Create Indexes for the Projection in Configurator
In Luna Configurator, create one or more indexes for your newly created projection. Specify the composite fields in ascending order of cardinality to optimize search performance. Each index enables specific filtering scenarios for matching requests. See matching indexes for details.
Enable Vinder-Plugin in Python Matcher Proxy Plug-in System
Activate the Vinder-Plugin in the Python Matcher Proxy plug-in system. This allows the proxy to route matching requests to Luna-Vinder automatically when appropriate. See matching plugin for details.
Send a Matching Request via Python Matcher Proxy
Use the Python Matcher Proxy API to send a matching request. The proxy will analyze the request, select the suitable Matcher and index, perform the matching operation, and return the result in the unified response format. See matching plugin for details.
Tip
For best results, design your projections and indexes to match your most common query patterns. This ensures maximum compatibility and performance for matching requests.
For more details and advanced configuration, refer to the corresponding documentation sections.