Benchmark

250M projection with selected event fields

Database and projection parameters:

  • Database size: 250_000_000 events

  • Event payload size: ~4 KB per event (full event structure in database)

  • Data distribution:

    • Events are uniformly distributed across 60 days (~4,166,667 events per day)

    • Age values are uniformly distributed from 0 to 100

    • Gender values are uniformly distributed over the two values 0 and 1

Projection configuration:

A single projection was created with the following targets, extracting only the required fields from the full event payload:

Projection(event_id, source, external_id, age, gender)

This projection loads only the specified fields from each event in the database, resulting in a much smaller memory footprint compared to the full event structure.

Index configurations:

Four indexes were created on this projection to test different filtering scenarios:

  1. Index 1: Empty composite fields: Index()

  2. Index 2: Age-based filtering: Index(age)

  3. Index 3: Gender-based filtering: Index(gender)

  4. Index 4: Combined age and gender filtering: Index(age, gender)

Note

Index 4 orders its composite fields as age (101 unique values, 0 to 100) followed by gender (2 unique values), so the higher-cardinality field comes first in the composite fields.

Hardware details:

  • Processor: Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz, 96 CPUs

  • Memory: RAM 376G

Vinder-matcher service configuration:

  • LUNA_VINDER_MATCHING.THREAD_COUNT: 16

Benchmark methodology:

Each index was tested with various filter combinations. All requests used raw descriptors as reference. A total of 100 requests were executed sequentially (concurrency = 1) for each test case. The descriptor version was 65.
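The methodology above can be sketched as a small sequential timing loop. This is only an illustration of how the table columns (RPS, min, avg, and max time) relate to each other at concurrency = 1; the tool actually used for the benchmark is not described here, and `send_request` is a hypothetical callable standing in for one matching request.

```python
import statistics
import time
from typing import Callable, Dict


def run_sequential(send_request: Callable[[], None], total: int = 100) -> Dict[str, float]:
    """Issue `total` requests one after another and summarize their latency."""
    latencies_ms = []
    started = time.perf_counter()
    for _ in range(total):
        t0 = time.perf_counter()
        # Hypothetical: performs one match request and blocks until the reply arrives.
        send_request()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed_s = time.perf_counter() - started
    return {
        "total": total,
        "rps": total / elapsed_s,  # with one request in flight, close to 1000 / avg_ms
        "min_ms": min(latencies_ms),
        "avg_ms": statistics.fmean(latencies_ms),
        "max_ms": max(latencies_ms),
    }


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without any service.
    print(run_sequential(lambda: time.sleep(0.005), total=20))
```

Because only one request is in flight at a time, RPS is approximately the reciprocal of the average latency; for example, an average of 5321 ms corresponds to about 0.19 requests per second.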

Index 1: No composite fields

Index with empty composite fields - search performance

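| Request filters            | Request reference | Total requests | Concurrency | RPS  | Min time (ms) | Avg time (ms) | Max time (ms) |
|----------------------------|-------------------|----------------|-------------|------|---------------|---------------|---------------|
| No filters                 | Raw descriptor    | 100            | 1           | 0.19 | 5007          | 5321          | 5645          |
| create_time__gte=(now-14d) | Raw descriptor    | 100            | 1           | 0.88 | 1001          | 1135          | 1314          |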

The first row represents searching across all 250M events in the projection. The second row filters to approximately 58.3M events (14 days out of 60 days = 23.3% of data).

Index 2: Age filtering

Index with age composite field - search performance

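| Request filters                         | Request reference | Total requests | Concurrency | RPS  | Min time (ms) | Avg time (ms) | Max time (ms) |
|-----------------------------------------|-------------------|----------------|-------------|------|---------------|---------------|---------------|
| No filters                              | Raw descriptor    | 100            | 1           | 0.20 | 4898          | 5016          | 5072          |
| create_time__gte=(now-14d)              | Raw descriptor    | 100            | 1           | 0.74 | 1335          | 1346          | 1378          |
| age__gte=50                             | Raw descriptor    | 100            | 1           | 0.40 | 2291          | 2464          | 2505          |
| age__gte=75                             | Raw descriptor    | 100            | 1           | 0.80 | 1138          | 1244          | 1278          |
| age__gte=50, create_time__gte=(now-14d) | Raw descriptor    | 100            | 1           | 1.46 | 678           | 686           | 711           |
| age__gte=75, create_time__gte=(now-14d) | Raw descriptor    | 100            | 1           | 2.91 | 338           | 343           | 352           |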

The age__gte=50 filter reduces the search space to approximately 50% of events (ages 50-100). The age__gte=75 filter is more selective, reducing to approximately 25% of events (ages 75-100).
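The approximate fractions above can be checked with a few lines of arithmetic. The sketch below assumes that ages are integers and that both endpoints of the 0 to 100 range are included (101 distinct values); neither detail is stated explicitly for the generated data.

```python
# Assumption: integer ages 0..100 inclusive, each equally likely.
AGES = range(0, 101)


def age_gte_fraction(threshold: int) -> float:
    """Fraction of uniformly distributed ages that satisfy age >= threshold."""
    matching = sum(1 for age in AGES if age >= threshold)
    return matching / len(AGES)


for threshold in (50, 75):
    print(f"age__gte={threshold}: {age_gte_fraction(threshold):.1%}")
```

Under these assumptions the exact fractions are 51/101 and 26/101, slightly above the rounded 50% and 25% figures.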

Index 3: Gender filtering

Index with gender composite field - search performance

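| Request filters                      | Request reference | Total requests | Concurrency | RPS  | Min time (ms) | Avg time (ms) | Max time (ms) |
|--------------------------------------|-------------------|----------------|-------------|------|---------------|---------------|---------------|
| No filters                           | Raw descriptor    | 100            | 1           | 0.19 | 4814          | 5389          | 6047          |
| create_time__gte=(now-14d)           | Raw descriptor    | 100            | 1           | 0.82 | 1160          | 1222          | 1291          |
| gender=1                             | Raw descriptor    | 100            | 1           | 0.36 | 2432          | 2768          | 3400          |
| gender=1, create_time__gte=(now-14d) | Raw descriptor    | 100            | 1           | 1.59 | 585           | 627           | 676           |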

The gender=1 filter reduces the search space to approximately 50% of events (half of the uniformly distributed gender values).

Index 4: Combined age and gender filtering

Index with age and gender composite fields - search performance

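| Request filters                                   | Request reference | Total requests | Concurrency | RPS    | Min time (ms) | Avg time (ms) | Max time (ms) |
|---------------------------------------------------|-------------------|----------------|-------------|--------|---------------|---------------|---------------|
| No filters                                        | Raw descriptor    | 100            | 1           | 0.19   | 4831          | 5326          | 5948          |
| create_time__gte=(now-14d)                        | Raw descriptor    | 100            | 1           | 0.76   | 1274          | 1317          | 1404          |
| age__gte=50, gender=1                             | Raw descriptor    | 100            | 1           | 1.63   | 610           | 613           | 671           |
| age__lt=1, gender=1                               | Raw descriptor    | 100            | 1           | 42.00  | 21            | 23            | 25            |
| age__gte=50, gender=1, create_time__gte=(now-14d) | Raw descriptor    | 100            | 1           | 3.16   | 312           | 316           | 320           |
| age__lt=1, gender=1, create_time__gte=(now-14d)   | Raw descriptor    | 100            | 1           | 130.00 | 6             | 7             | 8             |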

The combined filters demonstrate multiplicative selectivity: age__gte=50, gender=1 reduces the search space to ~25% of events, while age__lt=1, gender=1 is highly selective at ~0.5% of events (age 0 only, one gender).
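Multiplicative selectivity can be illustrated by multiplying the per-filter fractions and scaling by the database size. The sketch assumes that age, gender, and creation time are independent and uniformly distributed, with integer ages 0 to 100 inclusive; the variable names are illustrative only.

```python
import math

TOTAL_EVENTS = 250_000_000

# Per-filter fractions under the uniform-distribution assumption.
AGE_GTE_50 = 51 / 101    # ages 50..100 out of 0..100
AGE_LT_1 = 1 / 101       # age 0 only
GENDER_1 = 1 / 2         # one of two gender values
LAST_14_DAYS = 14 / 60   # 14 of the 60 days covered by the data

CASES = {
    "age__gte=50, gender=1": math.prod([AGE_GTE_50, GENDER_1]),
    "age__lt=1, gender=1": math.prod([AGE_LT_1, GENDER_1]),
    "age__gte=50, gender=1, create_time__gte=(now-14d)": math.prod(
        [AGE_GTE_50, GENDER_1, LAST_14_DAYS]
    ),
    "age__lt=1, gender=1, create_time__gte=(now-14d)": math.prod(
        [AGE_LT_1, GENDER_1, LAST_14_DAYS]
    ),
}

for filters, fraction in CASES.items():
    # Independent filters combine by multiplying their fractions.
    print(f"{filters}: {fraction:.2%} of events, ~{fraction * TOTAL_EVENTS:,.0f} events")
```

If the fields were correlated in real data, the product would only be a rough estimate of the filtered volume.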

Performance Analysis

The benchmark results demonstrate several key performance characteristics:

Impact of temporal filtering: Filtering by create_time alone provides a significant performance improvement, reducing average response time from about 5.0-5.4 seconds to about 1.1-1.3 seconds across all index configurations. This demonstrates the effectiveness of temporal filtering optimization, which benefits from the sorted create_time structure in the index.

Selectivity effect: Highly selective filters (e.g., age__lt=1) dramatically improve performance, achieving response times as low as 6-8 ms when combined with time filtering. The age__lt=1, gender=1 combination filters to approximately 1.25M events (0.5% of 250M), demonstrating how filter selectivity directly correlates with query performance.

Composite field benefit: The combined age and gender index (Index 4) shows the best performance for queries using both filters, with average response times of 23 ms for highly selective queries (age__lt=1, gender=1). This demonstrates the value of properly configured composite fields that match your query patterns.

Time filtering efficiency: Combining create_time filtering with attribute filters consistently improves performance across all indexes, reducing average response times by roughly 50-77% compared to attribute-only filtering. For example, age__gte=75 alone averages 1244 ms, but combined with create_time__gte=(now-14d) it drops to 343 ms, a 72% reduction. This highlights the importance of maintaining proper create_time ordering in the data.
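The reduction for each pair of rows can be recomputed directly from the average times in the tables. The snippet below is only a convenience for that arithmetic; the numbers are copied from the Index 2, Index 3, and Index 4 tables.

```python
# (filters, avg ms without time filter, avg ms with create_time__gte=(now-14d))
PAIRS = [
    ("Index 2: age__gte=50", 2464, 686),
    ("Index 2: age__gte=75", 1244, 343),
    ("Index 3: gender=1", 2768, 627),
    ("Index 4: age__gte=50, gender=1", 613, 316),
    ("Index 4: age__lt=1, gender=1", 23, 7),
]


def reduction_pct(before_ms: float, after_ms: float) -> float:
    """Percentage by which the average response time decreased."""
    return (before_ms - after_ms) / before_ms * 100.0


for label, before, after in PAIRS:
    print(f"{label}: {before} ms -> {after} ms ({reduction_pct(before, after):.0f}% lower)")
```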

Index configuration impact: For the query age__gte=50, gender=1, Index 4 averages 613 ms because both filter fields are in the composite fields, allowing efficient tree traversal. For comparison, the single-field filters age__gte=50 on Index 2 and gender=1 on Index 3 average 2464 ms and 2768 ms respectively; the tables above do not include the combined query on Index 2 or Index 3. This supports the principle that filters should match index composite fields for optimal performance.

Data volume scaling: The roughly linear relationship between filtered data volume and query time is evident: filtering to 50% of data (gender filter, 2768 ms on average) takes approximately half the time of unfiltered queries (5389 ms), while filtering to 0.5% of data (age__lt=1, gender=1, 23 ms versus 5326 ms unfiltered) achieves over 200x speedup.
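One way to examine the scaling claim is to compare each measured average with a purely linear prediction (unfiltered average on the same index multiplied by the fraction of events that match). The fractions below assume integer ages 0 to 100 inclusive and independent, uniformly distributed fields; the timings are taken from the tables.

```python
# (filters, matching fraction, unfiltered avg ms on that index, measured avg ms)
ROWS = [
    ("Index 2: age__gte=50", 51 / 101, 5016, 2464),
    ("Index 2: age__gte=75", 26 / 101, 5016, 1244),
    ("Index 3: gender=1", 1 / 2, 5389, 2768),
    ("Index 4: age__gte=50, gender=1", 51 / 202, 5326, 613),
    ("Index 4: age__lt=1, gender=1", 1 / 202, 5326, 23),
]


def linear_prediction_ms(baseline_ms: float, fraction: float) -> float:
    """Expected time if query time were exactly proportional to data volume."""
    return baseline_ms * fraction


def speedup(baseline_ms: float, measured_ms: float) -> float:
    """How many times faster the filtered query is than the unfiltered one."""
    return baseline_ms / measured_ms


for label, fraction, baseline, measured in ROWS:
    predicted = linear_prediction_ms(baseline, fraction)
    print(
        f"{label}: predicted {predicted:.0f} ms, measured {measured} ms, "
        f"speedup {speedup(baseline, measured):.1f}x"
    )
```

Rows where the measured time is far from the prediction indicate that factors other than data volume, such as fixed per-request overhead or the index structure, also contribute.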

Projection efficiency: By projecting only the required fields (event_id, source, external_id, age, gender) rather than loading the full ~4 KB event payload, the index maintains a smaller memory footprint while still providing all necessary data for filtering and matching operations.