General concepts#
The LUNA Vinder Module (LVM):
- creates and maintains specialized data copies (projections);
- configures indexes to optimize data in projections;
- loads projection data into RAM and performs matching.
LVM contains the following services:
- Projector — creates and manages data projections;
- Matcher — loads projections into RAM, builds indexes, and performs matching.
The following LUNA PLATFORM services are required for the module to function:
- LUNA Configurator: for managing system configuration.
- LUNA Events: as a data source for creating and synchronizing projections.
- LUNA Python Matcher Proxy: for integrating the search plugin.
Projections#
Projections are specialized copies of data optimized for matching descriptors. Instead of performing matching directly in the main database, LVM creates projections: data subsets that meet specific matching criteria. This speeds up matching and reduces the load on the main data warehouse.
A projection is a filtered and optimized representation of data. When creating a projection, you define several key parameters: which events to include in the projection using filters, which attributes of these events to store using target fields, where to obtain the source data by specifying its source, and how to organize the projection by selecting its type. This approach allows you to create multiple projections, each tailored to a specific matching scenario and containing only relevant information. For example, if you need to compare only events from a certain source over the past month, the projection will contain only those events, not all the millions or billions of records in the main warehouse.
It's important to note that each projection represents a separate copy of the data in Matcher's memory. When creating multiple projections, data is duplicated for each projection according to its filters and attribute set. This ensures maximum performance for matching, but requires appropriate memory resource planning.
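The memory planning mentioned above can be approximated with simple arithmetic. The following is a minimal sketch, assuming a hypothetical per-record footprint (descriptor bytes plus stored attribute bytes). The projection names and byte sizes are illustrative assumptions; actual figures depend on the descriptor version and on Matcher's internal layout, which this section does not specify.

```python
def estimate_projection_memory(record_count: int,
                               descriptor_bytes: int,
                               attribute_bytes: int) -> int:
    """Return an approximate RAM footprint in bytes for one projection."""
    return record_count * (descriptor_bytes + attribute_bytes)


# Hypothetical projections: (records, descriptor bytes, attribute bytes per record).
projections = {
    "last_month_camera_1": (2_000_000, 512, 64),
    "all_events_no_attrs": (10_000_000, 512, 0),
}

# Each projection is a separate in-memory copy, so the footprints add up.
total = sum(estimate_projection_memory(*params) for params in projections.values())
print(f"Estimated total: {total / 2**30:.2f} GiB")
```

Because every projection holds its own copy, events that fall into several projections are counted once per projection in this estimate.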
Projection components#
- Data Source (Origin)
The origin determines where Projector will retrieve data for the projection. The choice of origin determines not only the data source but also the attributes available for filtering and selection, as different sources have different data structures.
Currently, the only supported data source is the database of the LUNA PLATFORM Events service, which contains object descriptors and associated metadata (creation time, etc.).
- Filters
Filters determine which objects from the specified source should be included in the projection. This is a key optimization mechanism—properly configured filters allow the projection to include only the necessary data, reducing the amount of information to process.
If no filters are specified, all events from the source will be included in the projection. The specific fields and operators available for filtering depend on the data source. For a full list of available filters, see the API documentation.
- Target Fields
The target fields determine which event attributes will be stored in the projection along with the descriptors.
Specify every attribute that you plan to use in index composite fields or return in matching results.
- Projection Type
The projection type determines how data will be stored and processed in the system. Currently, the only supported type is 'view': a configuration for data retrieval, based on which Matcher creates an optimized in-memory copy. When LUNA Vinder Module components access a 'view' projection, data is retrieved from the source according to the specified filters and target fields.
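The four components above can be pictured as one small definition plus a selection step. The sketch below is illustrative only: `ProjectionSpec`, `build_view`, the field names, and the equality-only filter form are assumptions made for this example and do not reproduce the Projector API (see the API documentation for the real request schema).

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ProjectionSpec:
    """Hypothetical projection definition; field names are illustrative only."""
    origin: str                                             # data source, e.g. the Events database
    projection_type: str = "view"                           # the only type described in this section
    filters: dict[str, Any] = field(default_factory=dict)   # simple equality filters (assumption)
    targets: list[str] = field(default_factory=list)        # attributes stored with descriptors


def build_view(spec: ProjectionSpec, events: list[dict]) -> list[dict]:
    """Keep events that pass every filter, storing only the descriptor and target fields."""
    rows = []
    for event in events:
        if all(event.get(name) == value for name, value in spec.filters.items()):
            row = {"descriptor": event["descriptor"]}
            row.update({name: event.get(name) for name in spec.targets})
            rows.append(row)
    return rows


events = [
    {"descriptor": [0.1, 0.9], "source": "cam-1", "gender": "male", "age": 34, "city": "A"},
    {"descriptor": [0.8, 0.2], "source": "cam-2", "gender": "female", "age": 29, "city": "B"},
]
spec = ProjectionSpec(origin="events", filters={"source": "cam-1"}, targets=["gender", "age"])
view = build_view(spec, events)  # one row; "city" and "source" are not stored
```

With an empty `filters` mapping every event passes, which mirrors the rule that a projection without filters includes all events from the source.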
Index#
After creating a projection, it's necessary to determine how Matcher will organize this data for matching. This is where the concept of an index comes in. An index specifies a list of fields (composite fields) that determine two important things: which attributes from the projection can be used as filters in search queries and how the data will be hierarchically organized in Matcher's RAM.
When Matcher loads an index, it builds a tree structure in which the data is successively partitioned by the values of the specified composite fields. This tree allows Matcher to quickly navigate to the appropriate subset of descriptors without scanning the entire dataset. For example, if your index has composite fields for gender and age, Matcher can quickly find all descriptors for men in their 30s without checking the descriptors for women or other age groups.
Proper index configuration is a key performance factor for LUNA Vinder Module.
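The tree structure described above can be sketched with nested dictionaries. This is a simplified illustration, not Matcher's actual in-memory layout: `build_tree` and `lookup` are hypothetical helpers, descriptors are represented by short strings, and lookups use exact values only (range filters such as "in their 30s" are out of scope here).

```python
def build_tree(rows, composite_fields):
    """Nest rows into dicts keyed by each composite field value, in field order."""
    if not composite_fields:
        return [row["descriptor"] for row in rows]  # leaf: list of descriptors
    first, rest = composite_fields[0], composite_fields[1:]
    groups = {}
    for row in rows:
        groups.setdefault(row[first], []).append(row)
    return {value: build_tree(group, rest) for value, group in groups.items()}


def lookup(tree, path):
    """Follow one value per composite field; return the leaf descriptors or []."""
    node = tree
    for value in path:
        if not isinstance(node, dict) or value not in node:
            return []
        node = node[value]
    return node


rows = [
    {"descriptor": "d1", "gender": "male", "age": 34},
    {"descriptor": "d2", "gender": "female", "age": 34},
    {"descriptor": "d3", "gender": "male", "age": 34},
    {"descriptor": "d4", "gender": "male", "age": 51},
]
tree = build_tree(rows, ["gender", "age"])
men_34 = lookup(tree, ["male", 34])  # visits only the "male" branch
```

The lookup touches one branch per level, so descriptors under other gender or age values are never visited.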
Index components#
- Projection link
Each index is built on a specific projection, referenced by its projection_id. An index can only access data and attributes defined in that projection.
Note: A projection is a prerequisite for creating an index. The projection determines what data is available, while the index determines how that data is organized for searching.
- Composite Fields
Composite fields are an ordered list of attribute names that define the structure of the data tree and which filters can be applied during matching. The order of the fields is critical and directly impacts matching performance.
Fields should be ordered by increasing cardinality — from fields with the fewest unique values to fields with the most unique values. Cardinality is the number of distinct values a field can have.
For example, the gender field typically has only 2-3 unique values (male, female, unspecified), which is low cardinality. The age field can have dozens of unique values, which is medium cardinality. The end_time field, which contains a timestamp, can have millions of unique values, which is high cardinality. The correct order is: gender → age → end_time.
Important! Adding high-cardinality fields to composite fields is not recommended, because a separate tree branch is created for each unique value. With millions of records, this can produce an inefficient structure.
An empty list of composite fields is also allowed; it creates an index with no filtering capabilities. Such an index allows matching against all data in the projection without applying any filters.
Note: The create_time field is always filterable and should not be specified in composite fields. An optimized mechanism has been implemented for it that avoids issues typical of fields with high cardinality.
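The ordering rule above can be checked against a data sample before an index is configured. The sketch below is an assumption-based helper, not part of LVM: `order_by_cardinality` counts distinct values per field in a sample and sorts the fields from fewest to most.

```python
def order_by_cardinality(rows, fields):
    """Return fields sorted from fewest to most distinct values in the sample."""
    cardinality = {name: len({row[name] for row in rows}) for name in fields}
    return sorted(fields, key=lambda name: cardinality[name])


# Hypothetical sample of projection rows.
sample = [
    {"gender": "male", "age": 34, "end_time": "2024-01-01T10:00:00"},
    {"gender": "female", "age": 34, "end_time": "2024-01-01T10:00:05"},
    {"gender": "male", "age": 51, "end_time": "2024-01-01T10:00:09"},
    {"gender": "female", "age": 29, "end_time": "2024-01-01T10:00:12"},
]
suggested = order_by_cardinality(sample, ["end_time", "gender", "age"])
```

A small sample can understate the cardinality of fields such as timestamps, so the result is a starting point rather than a final configuration.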
Creating an index#
Indexes are configured through the LUNA Configurator service by adding entries to the LUNA_VINDER_INDEXES setting. Each index configuration contains:
- projection_id: the ID of the projection on which this index is built
- index_composite_fields: an ordered list of field names for the index
Example configuration:
LUNA_VINDER_INDEXES = [
    {
        "projection_id": "49ddbfb3-4b19-4c00-a165-df2a9fc7f321",
        "index_composite_fields": ["gender", "age"]
    },
    {
        "projection_id": "a7b2c891-5d3e-4f12-8c9a-1e4f5b6d7c8e",
        "index_composite_fields": ["handler_id", "age", "emotion"]
    },
    {
        "projection_id": "3f8d9c2b-1a4e-4d5f-9b8c-7e6d5c4b3a2f",
        "index_composite_fields": []
    }
]
In this example, three indexes are defined: one allows filtering by gender and age, another by handler, age, and emotion, and the third has no filter fields.
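Before saving such a setting, it can be useful to check it locally. The sketch below is a hypothetical pre-check, not the validation that LUNA Configurator itself performs: it only applies rules stated in this section (a UUID-shaped projection_id, a list of field names, no create_time entry) plus an assumed rule that field names should not repeat.

```python
import uuid


def validate_index_config(entries):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for i, entry in enumerate(entries):
        try:
            uuid.UUID(entry["projection_id"])
        except (KeyError, ValueError, TypeError, AttributeError):
            problems.append(f"entry {i}: missing or malformed projection_id")
        fields = entry.get("index_composite_fields")
        if not isinstance(fields, list) or not all(isinstance(f, str) for f in fields):
            problems.append(f"entry {i}: index_composite_fields must be a list of strings")
            continue
        if len(set(fields)) != len(fields):
            problems.append(f"entry {i}: duplicate composite field")  # assumed rule
        if "create_time" in fields:
            problems.append(f"entry {i}: create_time should not be a composite field")
    return problems


good = [{"projection_id": "49ddbfb3-4b19-4c00-a165-df2a9fc7f321",
         "index_composite_fields": ["gender", "age"]}]
bad = [{"projection_id": "not-a-uuid",
        "index_composite_fields": ["age", "age", "create_time"]}]
good_problems = validate_index_config(good)
bad_problems = validate_index_config(bad)
```

An empty `index_composite_fields` list passes these checks, matching the third entry of the example configuration.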
After index configurations are added or changed in Configurator, Matcher automatically detects the changes and begins loading data. Matcher connects to Projector to access the specified projection, retrieves all data according to the projection's filters and target fields, loads the descriptors into memory sorted by create_time, and builds a tree structure according to the composite field specification.
The Matcher service becomes available for matching only after all indexes have been loaded. This process can take a significant amount of time, especially for projections containing millions of records. Once loading is complete, Matcher marks the indexes as ready and begins accepting matching requests.
Synchronizing indexes#
Matcher continuously monitors projection changes. When new events are added to a projection, or existing ones are updated or deleted, Matcher detects these changes and updates the in-memory indexes. This keeps matching results consistent with the current state of the data.
Synchronization occurs automatically in the background, without the need for manual intervention. The frequency of synchronization checks can be configured in Matcher settings to balance data freshness requirements with system load.
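Conceptually, each synchronization pass applies a batch of changes to the in-memory copy. The sketch below illustrates that idea only; the change format (operation, event ID, row) and the `apply_changes` helper are assumptions, since this section does not describe how Matcher actually receives or encodes projection changes.

```python
def apply_changes(index, changes):
    """Apply (operation, event_id, row) tuples to an in-memory mapping of event_id to row."""
    for operation, event_id, row in changes:
        if operation in ("insert", "update"):
            index[event_id] = row
        elif operation == "delete":
            index.pop(event_id, None)  # deleting an unknown event is a no-op
        else:
            raise ValueError(f"unknown operation: {operation}")
    return index


# Hypothetical in-memory state and one batch of detected changes.
index = {"e1": {"age": 34}, "e2": {"age": 51}}
changes = [
    ("update", "e1", {"age": 35}),
    ("delete", "e2", None),
    ("insert", "e3", {"age": 29}),
]
apply_changes(index, changes)
```

Between two passes the in-memory data reflects the state as of the last applied batch, which is why the check frequency is a trade-off between freshness and load.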
Matching#
The projection creation and index configuration steps must be completed before submitting a matching request.
Matching includes the following steps:
1. Determining a suitable index
When the Python Matcher Proxy receives a matching request from the API service, the matching plugin analyzes the specified filters and determines which indexes can handle the request. The selection is based on whether the request's filter fields match the index's composite fields and whether the request is compatible with the projection filters.
2. Data filtering
Matcher uses the tree structure of the index to quickly find the appropriate subset of descriptors. For example, if the gender and age fields are present in the index, the system locates all descriptors for men of a certain age without searching through the rest of the data.
3. Matching
Matcher compares the incoming descriptor only with the found subset and returns the most similar objects with a similarity score.
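The three steps above can be sketched end to end. This is a simplified illustration under stated assumptions: `select_index`, `cosine`, and `match` are hypothetical helpers, a linear filter stands in for tree navigation, cosine similarity stands in for the actual descriptor comparison (which this section does not specify), and the projection filter compatibility check is omitted.

```python
import math


def select_index(indexes, request_filters):
    """Step 1: pick the first index whose composite fields cover every requested filter."""
    for index in indexes:
        if set(request_filters) <= set(index["index_composite_fields"]):
            return index
    return None


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (stand-in metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match(rows, request_filters, probe, limit=3):
    """Steps 2 and 3: narrow to the filtered subset, then rank it by similarity."""
    candidates = [r for r in rows
                  if all(r.get(name) == value for name, value in request_filters.items())]
    scored = [(cosine(probe, r["descriptor"]), r["event_id"]) for r in candidates]
    scored.sort(reverse=True)  # most similar first
    return scored[:limit]


indexes = [
    {"projection_id": "49ddbfb3-4b19-4c00-a165-df2a9fc7f321",
     "index_composite_fields": ["gender", "age"]},
    {"projection_id": "3f8d9c2b-1a4e-4d5f-9b8c-7e6d5c4b3a2f",
     "index_composite_fields": []},
]
rows = [
    {"event_id": "e1", "gender": "male", "descriptor": [1.0, 0.0]},
    {"event_id": "e2", "gender": "male", "descriptor": [0.0, 1.0]},
    {"event_id": "e3", "gender": "female", "descriptor": [1.0, 0.0]},
]
chosen = select_index(indexes, {"gender": "male"})
results = match(rows, {"gender": "male"}, probe=[1.0, 0.0])
```

Only the rows that pass the filters are scored, so event `e3` is never compared with the probe even though its descriptor is identical to it.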
Matching results#
Upon successful completion of the request, the API service receives a list of events ranked by similarity. Each result contains:
- event information;
- similarity;
- event attributes specified in the projection's targets field.
Performance factors#
Matching performance is achieved through:
- storing data in Matcher's RAM;
- optimal data organization based on the order of index fields;
- tree-structure navigation instead of sequential search;
- the ability to configure the number of threads (see THREAD_COUNT) for parallel query processing.
This allows the system to process queries with minimal latency when working with large volumes of data.
A detailed description of LUNA Vinder Module processes is provided in the "Sequence diagrams" section.