Index storage
Built indexes are stored either locally or in S3 storage according to the specified configuration settings.
Each index consists of several files:
meta.json - meta information about the index, including the type of objects it indexes.
index.dat - the binary data of the index.
ids.dat - an ordered list of the identifiers of the objects in the index.
Each index has a unique name, which is used as the key or folder name under which its files are stored.
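The layout above can be illustrated with a short Python sketch. It assumes that an index's files sit directly under a folder (or S3 key prefix) named after the index, for example products/meta.json. The helper names index_keys, is_complete_index and read_meta are hypothetical and not part of the LIM services. The schema of meta.json is not documented here, so the sketch only parses it as generic JSON.

```python
import json
from pathlib import Path

# File names come from the documentation; the "<index_name>/<file>" layout is an assumption.
INDEX_FILES = ("meta.json", "index.dat", "ids.dat")


def index_keys(index_name: str) -> list[str]:
    """Hypothetical S3 object keys of one index, using its name as the key prefix."""
    return [f"{index_name}/{fname}" for fname in INDEX_FILES]


def is_complete_index(root: Path, index_name: str) -> bool:
    """Check that a local index folder contains all three expected files."""
    folder = root / index_name
    return all((folder / fname).is_file() for fname in INDEX_FILES)


def read_meta(root: Path, index_name: str) -> dict:
    """Load meta.json as plain JSON; its real schema is not described here."""
    with open(root / index_name / "meta.json", encoding="utf-8") as fh:
        return json.load(fh)
```

For example, index_keys("products") yields the three keys products/meta.json, products/index.dat and products/ids.dat.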
Warning
Indexes can take up a lot of disk space, so plan for both the required disk capacity and the network throughput.
To use S3 as the index storage, specify the following settings in the configuration:
The INDEX_STORAGE_S3 family of settings - S3 connection settings.
OTHER.INDEX_STORAGE_TYPE=S3 - selects S3 as the index storage used by the LIM services.
It is also strongly recommended to configure a cache storage for downloaded indexes:
OTHER.LIM_MATCHER_CACHE - the local path where downloaded indexes are cached (for example, ./cache_indexes).
This improves the performance of the LIM_MATCHER service and reduces the load on the network.
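The caching idea can be sketched in Python as a cache-aside lookup: an index is downloaded from S3 only if its files are not already present under the configured cache path. This is an illustration under assumptions, not the LIM_MATCHER implementation. get_cached_index and the download callable are hypothetical, and the same <index_name>/<file> layout is assumed.

```python
from pathlib import Path
from typing import Callable

# Documented index files; the per-index folder layout is an assumption.
INDEX_FILES = ("meta.json", "index.dat", "ids.dat")


def get_cached_index(
    index_name: str,
    cache_dir: str,
    download: Callable[[str, Path], None],
) -> Path:
    """Return the local folder of an index, downloading it only on a cache miss.

    download(index_name, target_dir) is a hypothetical callable that copies
    the index files from S3 into target_dir.
    """
    target = Path(cache_dir) / index_name
    if all((target / fname).is_file() for fname in INDEX_FILES):
        return target  # cache hit: no network traffic
    target.mkdir(parents=True, exist_ok=True)
    download(index_name, target)  # cache miss: fetch the index once
    return target
```

Calling get_cached_index twice for the same index triggers only one download. The sketch never invalidates cached files; how updated indexes are detected is not described in this section.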
Working with remote storage involves network risks. To keep operation with S3 storage stable, a retry policy with delays between attempts (5 per request) and a timeout of 60 seconds is used.
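A retry policy of this kind can be sketched in Python as follows. The exact semantics are assumptions: here 5 is treated as the maximum number of attempts per request, 60 seconds as the overall deadline for one request, and exponential backoff starting at 0.5 seconds is a hypothetical choice. call_with_retries is an illustrative name, not a LIM API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 5,        # assumption: "5 per request" = max attempts
    timeout: float = 60.0,    # assumption: overall deadline per request
    base_delay: float = 0.5,  # hypothetical initial backoff
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Retry operation with exponential backoff until it succeeds, the
    attempts are used up, or the next wait would pass the deadline."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    deadline = clock() + timeout
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:  # assumption: network errors surface as OSError
            delay = base_delay * 2 ** (attempt - 1)
            if attempt == attempts or clock() + delay > deadline:
                raise  # give up and re-raise the last error
            sleep(delay)
    raise RuntimeError("unreachable")
```

The sleep and clock parameters are injectable so the policy can be exercised without real waiting. ConnectionError and TimeoutError are subclasses of OSError, so both are retried.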
In addition, the specified credentials (Creds) must have read permissions on the specified S3 bucket, including the right to read all objects in it.
The bucket must store only indexes: if other types of files are stored in it, they are included in data reading, which greatly reduces query performance.
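A simple check that a bucket holds only index files can be sketched in Python. It assumes the same hypothetical <index_name>/<file> key layout and takes an already listed set of object keys; foreign_keys is an illustrative helper, not part of the LIM services.

```python
from typing import Iterable

# Documented index file names; the key layout below is an assumption.
INDEX_FILES = {"meta.json", "index.dat", "ids.dat"}


def foreign_keys(keys: Iterable[str]) -> list[str]:
    """Return bucket keys that do not look like index files.

    Assumes the hypothetical layout "<index_name>/<file>", where <file> is
    one of the documented index files.
    """
    unexpected = []
    for key in keys:
        name, sep, fname = key.rpartition("/")
        if not (sep and name and fname in INDEX_FILES):
            unexpected.append(key)
    return unexpected
```

The key list could come from any S3 listing call, for example a boto3 list_objects_v2 paginator; that step is omitted so the sketch runs without credentials.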