Monitoring
Data for monitoring
We support two database options for collecting monitoring data: ClickHouse and InfluxDB. The structure and method of data storage depend on the database chosen.
Types of processed events
Our monitoring system processes the following event types:
request (any HTTP request)
error (failed HTTP request)
index processing (index building workflow)
indexed matching (indexed matching request)
Comparison of data formats for ClickHouse and InfluxDB:
InfluxDB: Each event is represented as a “point” in a time series. A point consists of:
series name
event start time
tags: indexed data in storage, a dictionary whose keys are string tag names and whose values are strings, integers, or floats
fields: non-indexed data in storage, a dictionary whose keys are string field names and whose values are strings, integers, or floats
ClickHouse: In ClickHouse, the data structure resembles that of a traditional SQL table. Each event is represented as a record, where:
The `time` field contains the record’s creation timestamp;
The `data` field contains a JSON object with all the information that would otherwise be distributed across tags and fields in InfluxDB.
Important: ClickHouse does not differentiate between “tags” and “fields”; all data is consolidated into a single JSON object within the `data` field.
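To make the difference concrete, the following Python sketch renders one request event in both shapes. The `Point` class and the `to_clickhouse_record` helper are hypothetical names used only for illustration and are not part of any service API; the series name `requests` and the ISO timestamp format are assumptions as well.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Point:
    """Hypothetical in-memory model of an InfluxDB-style point."""
    series: str
    time: datetime
    tags: dict = field(default_factory=dict)    # indexed data in storage
    fields: dict = field(default_factory=dict)  # non-indexed data in storage


def to_clickhouse_record(point: Point) -> dict:
    """Flatten a point into a record with `time` and a JSON `data` object."""
    # Tags and fields are merged: the record does not distinguish between them.
    data = {**point.tags, **point.fields}
    return {"time": point.time.isoformat(), "data": json.dumps(data)}


point = Point(
    series="requests",  # assumed series name
    time=datetime(2018, 9, 12, 11, 22, 25, tzinfo=timezone.utc),
    tags={"service": "lim-manager", "route": "GET:/tasks", "status_code": 201},
    fields={
        "request_id": "1536751345,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f",
        "execution_time": 1.234,
    },
)
record = to_clickhouse_record(point)
```

The same keys appear in both representations; only the grouping into indexed tags and non-indexed fields is lost in the flattened record.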
Requests series.
Triggered on every HTTP request. Each point contains data about the corresponding request (execution time, etc.).
InfluxDB:
Requests series tags
| tag name | description |
|---|---|
| service | “lim-indexer”, “lim-manager”, or “lim-matcher” |
| route | concatenation of the request method and the request resource (GET:/version) |
| status_code | HTTP status code of the response |

Requests series fields
| field name | description |
|---|---|
| request_id | request ID |
| execution_time | request execution time |
ClickHouse JSON `data` field Example:
{ "service": "lim-manager", "route": "GET:/tasks", "status_code": 201, "request_id": "1536751345,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f", "execution_time": 1.234 }
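As a rough illustration of how the `route` value is formed and how a ClickHouse `data` object relates to the InfluxDB tags and fields listed above, consider the following sketch. The helper names `make_route` and `split_request_data` are hypothetical, and uppercasing the method is an assumption.

```python
import json

# Tag and field names of the requests series, as listed above.
REQUEST_TAGS = ("service", "route", "status_code")
REQUEST_FIELDS = ("request_id", "execution_time")


def make_route(method: str, resource: str) -> str:
    """Concatenate the request method and resource, e.g. GET:/version."""
    return f"{method.upper()}:{resource}"


def split_request_data(raw: str) -> tuple[dict, dict]:
    """Split a JSON `data` object into InfluxDB-style tags and fields."""
    data = json.loads(raw)
    tags = {key: data[key] for key in REQUEST_TAGS if key in data}
    fields = {key: data[key] for key in REQUEST_FIELDS if key in data}
    return tags, fields


raw = (
    '{"service": "lim-manager", "route": "GET:/tasks", "status_code": 201, '
    '"request_id": "1536751345,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f", '
    '"execution_time": 1.234}'
)
tags, fields = split_request_data(raw)
```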
Errors series.
Triggered on every failed request. Each point contains the error_code of the Luna Platform error.
InfluxDB:
Errors series tags
| tag name | description |
|---|---|
| service | “lim-indexer”, “lim-manager”, or “lim-matcher” |
| route | concatenation of the request method and the request resource (GET:/version) |
| status_code | HTTP status code of the response |
| error_code | Luna Platform error code |

Errors series fields
| field name | description |
|---|---|
| request_id | request ID |
ClickHouse JSON `data` field Example:
{ "service": "lim-manager", "route": "POST:/tasks", "status_code": 400, "error_code": 13037, "request_id": "1536751345,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f" }
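A typical use of error events is counting how often each error code occurs per service. The sketch below assumes the `data` values have already been fetched from the database as JSON strings; `count_error_codes` is a hypothetical helper, not part of the product.

```python
import json
from collections import Counter


def count_error_codes(rows: list[str]) -> Counter:
    """Count error events per (service, error_code) pair."""
    counter: Counter = Counter()
    for raw in rows:
        data = json.loads(raw)
        counter[(data["service"], data["error_code"])] += 1
    return counter


rows = [
    '{"service": "lim-manager", "route": "POST:/tasks", "status_code": 400, '
    '"error_code": 13037, "request_id": "1536751345,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f"}',
    '{"service": "lim-manager", "route": "POST:/tasks", "status_code": 400, '
    '"error_code": 13037, "request_id": "1536751346,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f"}',
]
counts = count_error_codes(rows)
```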
Index processing series.
Triggered on an error in the index processing pipeline.
InfluxDB:
Index processing series tags
| tag name | description |
|---|---|
| service | “lim-manager” or “lim-matcher” |
| socket_address | service address in the format <host>:<port> (for matcher only) |
| stage | “build_index”, “load_index”, or “drop_index” |
| label | index label (some unique index content id) |
| error_code | Luna Platform error code (‘0’ - success) |

Index processing series fields
| field name | description |
|---|---|
| index_id | index unique ID |
| pending | time spent in the pending queue, in seconds |
| duration | index processing (i.e. building / loading / dropping) time, in seconds |
| generation | index generation (unix timestamp) |
ClickHouse JSON `data` field Example:
{ "service": "lim-manager", "stage": "build_index", "label": "4d1ae1f4-fbbd-49c8-be47-f6aec34449f3", "index_id": "e16e57de-3e15-4052-be9f-8e33f7629893", "error_code": 0, "generation": 1536751345, "pending": 1, "duration": 23456 }
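Since `pending` and `duration` are both expressed in seconds, they can be added to estimate how long an index spent in a stage overall. The sketch below shows this on the example record; `summarize_index_event` is a hypothetical helper, and treating the sum as a meaningful "total time" is an assumption.

```python
import json


def summarize_index_event(raw: str) -> dict:
    """Extract the stage, outcome, and total time from an index processing event."""
    data = json.loads(raw)
    return {
        "stage": data["stage"],
        "succeeded": data["error_code"] == 0,  # 0 means success
        "total_seconds": data["pending"] + data["duration"],
    }


raw = (
    '{"service": "lim-manager", "stage": "build_index", '
    '"label": "4d1ae1f4-fbbd-49c8-be47-f6aec34449f3", '
    '"index_id": "e16e57de-3e15-4052-be9f-8e33f7629893", '
    '"error_code": 0, "generation": 1536751345, "pending": 1, "duration": 23456}'
)
summary = summarize_index_event(raw)
```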
Indexed matching series.
Triggered whenever an indexed matching request is performed.
InfluxDB:
Indexed matching series tags
| tag name | description |
|---|---|
| service | always “lim-matcher” |
| socket_address | service address in the format <host>:<port> |
| label | index label (some unique index content id) |
| error_code | Luna Platform error code (‘0’ - success) |

Indexed matching series fields
| field name | description |
|---|---|
| request_id | request ID |
| index_id | index unique ID |
| execution_time | matching request execution time, in seconds |
ClickHouse JSON `data` field Example:
{ "service": "lim-matcher", "socket_address": "luna:5200", "label": "4d1ae1f4-fbbd-49c8-be47-f6aec34449f3", "index_id": "e16e57de-3e15-4052-be9f-8e33f7629893", "error_code": 0, "request_id": "1536751345,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f", "execution_time": 0.123 }
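Matching events lend themselves to simple aggregation, for example the mean execution time per index label. The following sketch assumes the `data` values are available as JSON strings; `mean_execution_time_by_label` is a hypothetical helper, and skipping failed requests is a design choice made here for illustration.

```python
import json
from collections import defaultdict
from statistics import mean


def mean_execution_time_by_label(rows: list[str]) -> dict[str, float]:
    """Average the execution time of successful matching requests per index label."""
    times: dict[str, list[float]] = defaultdict(list)
    for raw in rows:
        data = json.loads(raw)
        if data["error_code"] == 0:  # skip failed matching requests
            times[data["label"]].append(data["execution_time"])
    return {label: mean(values) for label, values in times.items()}


label = "4d1ae1f4-fbbd-49c8-be47-f6aec34449f3"
rows = [
    json.dumps({"service": "lim-matcher", "label": label, "error_code": 0, "execution_time": 0.25}),
    json.dumps({"service": "lim-matcher", "label": label, "error_code": 0, "execution_time": 0.75}),
    json.dumps({"service": "lim-matcher", "label": label, "error_code": 13037, "execution_time": 9.0}),
]
averages = mean_execution_time_by_label(rows)
```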
TTL
You can configure the retention policy for base tables (ClickHouse) and buckets (InfluxDB) when running the monitoring migration via the -x monitoring-ttl parameter. See the Monitoring integration section in the integration manual. Note that monitoring-ttl is specified in days and does not affect aggregated data in any way; all aggregated data must be cleaned up manually. The default value is 30 days.
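The effect of a TTL expressed in days can be illustrated with a small calculation. This sketch only shows the arithmetic; the actual expiry is enforced by the database itself, and the function names here are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL_DAYS = 30  # default value of monitoring-ttl


def retention_cutoff(now: datetime, ttl_days: int = DEFAULT_TTL_DAYS) -> datetime:
    """Return the oldest timestamp still retained for the given TTL."""
    return now - timedelta(days=ttl_days)


def is_expired(record_time: datetime, now: datetime, ttl_days: int = DEFAULT_TTL_DAYS) -> bool:
    """Tell whether a record is older than the retention cutoff."""
    return record_time < retention_cutoff(now, ttl_days)


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
cutoff = retention_cutoff(now)
```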
Database
You can refer to the InfluxDB and ClickHouse documentation to compare the databases and choose the one that better suits your needs. Note that ClickHouse might be the better choice for aggregation. You can set up your database credentials in the “monitoring” section of the configuration file.