Monitoring#

This section describes the monitoring capabilities of FaceStream and LUNA Streams.

Section Contents:

1․ General information — Description of available monitoring systems and their features, InfluxDB and ClickHouse configuration

2․ FaceStream monitoring — Enabling monitoring and description of sent data

3․ LUNA Streams monitoring — Sending data and exporting metrics in Prometheus format

General information#

Monitoring is a feature that allows you to collect, store, and analyze system operation data, which aids in diagnostics, optimization, and stability.

Monitoring in FaceStream is disabled by default. When enabled, it sends data to InfluxDB or ClickHouse.

LUNA Streams has several monitoring methods:

Sending data to InfluxDB (enabled by default)
Sending data to ClickHouse
Exporting metrics in Prometheus format via the /metrics resource (disabled by default)

When choosing between the databases, we recommend reviewing the InfluxDB and ClickHouse documentation.

Currently, LUNA Streams uses InfluxDB by default. In future versions, ClickHouse will become the default monitoring database because it offers better performance and analytical capabilities than InfluxDB, especially under high load.

The following describes the key considerations when working with InfluxDB and ClickHouse.

InfluxDB#

To work with InfluxDB, you need to register with a username and password and specify the bucket name, organization name and token. All this data is set when starting the InfluxDB container using environment variables.

To use FaceStream or LUNA Streams monitoring, set the "bucket", "organization", and "token" fields in the FaceStream settings or LUNA Streams settings to exactly the same values that were specified when launching the InfluxDB container. For example, if the following settings were used when starting the InfluxDB container...:

-e DOCKER_INFLUXDB_INIT_BUCKET=luna_monitoring \
-e DOCKER_INFLUXDB_INIT_USERNAME=luna \
-e DOCKER_INFLUXDB_INIT_PASSWORD=password \
-e DOCKER_INFLUXDB_INIT_ORG=luna \
-e DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=kofqt4Pfqjn6o \

... then the following parameters should be specified in the FaceStream or LUNA Streams settings:

"influxdb": {
    "organization": "luna",
    "token": "kofqt4Pfqjn6o",
    "bucket": "luna_monitoring",

Login and password are used to access the InfluxDB user interface.
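The correspondence between the container variables and the settings fields can be sketched in Python. This is only an illustration: the helper name `influxdb_settings_from_env` is hypothetical, and only the variable and field names are taken from the text above.

```python
# Hypothetical helper (not part of FaceStream or LUNA Streams): derive the
# "influxdb" settings fields from the variables passed to the InfluxDB container.
ENV_TO_SETTING = {
    "DOCKER_INFLUXDB_INIT_ORG": "organization",
    "DOCKER_INFLUXDB_INIT_ADMIN_TOKEN": "token",
    "DOCKER_INFLUXDB_INIT_BUCKET": "bucket",
}


def influxdb_settings_from_env(container_env):
    """Return the three fields that must match the container configuration."""
    missing = [name for name in ENV_TO_SETTING if name not in container_env]
    if missing:
        raise KeyError("missing container variables: " + ", ".join(missing))
    return {setting: container_env[name] for name, setting in ENV_TO_SETTING.items()}


container_env = {
    "DOCKER_INFLUXDB_INIT_BUCKET": "luna_monitoring",
    "DOCKER_INFLUXDB_INIT_USERNAME": "luna",      # used only for the InfluxDB UI
    "DOCKER_INFLUXDB_INIT_PASSWORD": "password",  # used only for the InfluxDB UI
    "DOCKER_INFLUXDB_INIT_ORG": "luna",
    "DOCKER_INFLUXDB_INIT_ADMIN_TOKEN": "kofqt4Pfqjn6o",
}
settings = influxdb_settings_from_env(container_env)
print(settings)
```

The username and password are deliberately left out of the result, because they are not part of the monitoring settings.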

By default, the "bucket", "organization", and "token" fields have different values in the FaceStream settings and the LUNA Streams settings. To use monitoring for both services, set the same values in both. If necessary, you can save FaceStream and LUNA Streams data to different buckets (see below).

In order to separate FaceStream and LUNA Streams monitoring data, you can create separate buckets after launching the InfluxDB container. This can be done using one of the following methods:

Using the InfluxDB user interface (Explore tab > Create bucket) after launching the InfluxDB container.
Using the command influx bucket create -n <bucket_name> -o <organization_name> in the influx CLI after launching the InfluxDB container.

The organization name must be the same as when creating the InfluxDB container.

The data sent to InfluxDB differs for LUNA Streams and FaceStream. See the relevant sections below for more details on the data sent.

View InfluxDB data#

You can use the InfluxDB GUI to view monitoring data.

Open the InfluxDB GUI at <server_ip>:<influx_port>. The default port is 8086, and the default login and password are luna/password.
Select the "Explore" tab.
Select a way to display information in the drop-down list (graph, histogram, table, etc.).
Select a bucket at the bottom of the page.
Filter the necessary data.
Click "Submit".

ClickHouse#

As an alternative to InfluxDB for monitoring FaceStream and LUNA Streams, you can use the ClickHouse database.

ClickHouse is a column-oriented DBMS, storing data in columns rather than rows. This allows for quick reading and aggregation of data by only the required columns, bypassing processing of the entire row. See the ClickHouse documentation.

To use FaceStream or LUNA Streams monitoring, you must configure the "db_user" and "db_password" fields in the FaceStream settings or LUNA Streams settings to match the values you specified when starting the ClickHouse container. For example, if the following settings were used when launching the ClickHouse container...:

-e CLICKHOUSE_USER=luna \
-e CLICKHOUSE_PASSWORD=password \

... then the following parameters must be specified in the FaceStream or LUNA Streams settings:

{
"storage_type": "clickhouse",
"db_user": "luna",
"db_password": "password",
}

By default, the ClickHouse address in the FaceStream settings and LUNA Streams settings is 127.0.0.1, which assumes that ClickHouse is deployed on the same server as the service. To use ClickHouse on a remote server, change the host, port, and http_port parameters.

Data partitioning in ClickHouse#

To optimize performance, monitoring tables in ClickHouse are automatically partitioned by day. This ensures:

faster query execution (by reducing the volume of scanned data);
efficient data management (the ability to work with individual time intervals);
optimized data cleanup operations.
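The effect of daily partitioning can be illustrated with a small Python model. This is a conceptual sketch, not how ClickHouse is implemented: the functions `partition_by_day` and `query_day` and the sample routes are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_by_day(records):
    """Group (timestamp, data) records under a YYYY-MM-DD partition key."""
    partitions = defaultdict(list)
    for timestamp, data in records:
        partitions[timestamp.strftime("%Y-%m-%d")].append((timestamp, data))
    return partitions


def query_day(partitions, day):
    """Read one day's records; the other partitions are never scanned."""
    return partitions.get(day, [])


records = [
    (datetime(2024, 1, 14, 23, 59, tzinfo=timezone.utc), {"route": "GET:/streams"}),
    (datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc), {"route": "POST:/streams"}),
    (datetime(2024, 1, 15, 18, 5, tzinfo=timezone.utc), {"route": "POST:/streams"}),
]
partitions = partition_by_day(records)
scanned = query_day(partitions, "2024-01-15")
print(len(scanned), "of", len(records), "records scanned")

# Cleanup of an old day is a single operation on its partition.
partitions.pop("2024-01-14", None)
remaining_days = sorted(partitions)
```

A query limited to one day touches only that day's partition, and removing old data does not require rewriting the remaining records.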

View ClickHouse data#

The Grafana platform is used to visualize monitoring data from ClickHouse. It requires configured dashboards.

As a ready-to-use solution, you can use "LUNA Dashboards", which includes Grafana and a set of the necessary dashboards.

For more information, please refer to the Monitoring section of the LUNA PLATFORM 5 Administrator Guide.

FaceStream monitoring#

Enable monitoring#

To enable FaceStream monitoring, follow these steps:

Go to the Configurator user interface: http://<configurator_server_ip>:5070/.
Enter "FACE_STREAM_CONFIG" in the "Setting name" field and click "Apply Filters".
In the "monitoring" section, enable the "send_data" setting and select the data storage type for monitoring in the "storage_type" setting.
Depending on the selected data storage type (InfluxDB or ClickHouse), specify the corresponding values in the "monitoring" section:
- For InfluxDB — in the influxdb subsection: bucket, organization, token — the values must match the "DOCKER_INFLUXDB_INIT_BUCKET", "DOCKER_INFLUXDB_INIT_ORG", "DOCKER_INFLUXDB_INIT_ADMIN_TOKEN" parameters specified when starting the InfluxDB container
- For ClickHouse — in the clickhouse subsection: db_user, db_password — the values must match the "CLICKHOUSE_USER" and "CLICKHOUSE_PASSWORD" parameters specified when starting the ClickHouse container
Restart the FaceStream container: docker restart facestream.
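Before restarting the container, the "monitoring" section can be checked for missing values. The following Python sketch is only an illustration and makes several assumptions: the function `check_monitoring_section` is hypothetical, the "influxdb" value of "storage_type" and the nesting of the subsections are inferred from the steps above, and "send_data" is treated as any truthy value.

```python
# Fields that must match the container start parameters, per storage type.
# The "influxdb" storage_type value is an assumption; "clickhouse" is documented.
REQUIRED_FIELDS = {
    "influxdb": ("bucket", "organization", "token"),
    "clickhouse": ("db_user", "db_password"),
}


def check_monitoring_section(monitoring):
    """Return a list of problems found in a "monitoring" section (empty if none)."""
    if not monitoring.get("send_data"):
        return ["send_data is disabled, so no monitoring data will be sent"]
    storage_type = monitoring.get("storage_type")
    if storage_type not in REQUIRED_FIELDS:
        return [f"unsupported storage_type: {storage_type!r}"]
    subsection = monitoring.get(storage_type, {})
    return [
        f"{storage_type}.{field} is not set"
        for field in REQUIRED_FIELDS[storage_type]
        if not subsection.get(field)
    ]


monitoring = {
    "send_data": 1,
    "storage_type": "clickhouse",
    "clickhouse": {"db_user": "luna", "db_password": ""},
}
problems = check_monitoring_section(monitoring)
print(problems)
```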

Data being sent to InfluxDB#

The following data is sent to InfluxDB:

"measurement" element. It is equal to the value of "fs-requests".
Tag set:
- "fs_ip" — IP address where FaceStream is deployed.
- "source" — The "name" field set when creating a stream in LUNA Streams (optional).
- "stream_id"
Field set:
- "track_id"
- "event_id"
- "request_id" — External ID for communication with monitoring of LUNA PLATFORM services.
- "track_start_time"
- "track_best_shot_time" — Time when the frame with the best shot being sent appeared in the system.
- "track_best_shot_min_size_time" (optional) — Time when the detection size reached the value specified in the "best_shot_min_size" parameter.
- "track_best_shot_proper_size_time" (optional) — Time when the detection size reached the value specified in the "best_shot_proper_size" parameter.
- "liveness_start_time" (optional) — Liveness start time.
- "liveness_end_time" (optional) — Liveness end time.
- "bestshot_count" — Number of best shots sent in one request to LP along with the current best shot. For example, if 2 sends of 10 best shots each were made, the value of this parameter is 10, and the value of the "track_send_count" parameter is 2.
- "time_from_first_frame_to_send" — Time that passed from the appearance of the first frame in FS to sending to LP.
- "track_send_count" — Sequence number of sending data from the track
Time values are sent in UTC with microsecond precision.
"timestamp" element. The time at which the best shot or best shots were sent, in microseconds.

The frequency of sending data to InfluxDB is controlled by the "flushing_period" parameter of the FaceStream settings.
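To show how the measurement, tag set, field set, and timestamp described above fit together in a single InfluxDB point, the following Python sketch renders them in InfluxDB line protocol. Whether FaceStream writes line protocol directly or uses a client library is not stated in this document, so treat this as an illustration only. The helper names and the field types (integers for counters, a string for the ID) are assumptions.

```python
def escape_key(value):
    """Escape characters that are special in tag keys, tag values, and field keys."""
    text = str(value)
    for char in (",", "=", " "):
        text = text.replace(char, "\\" + char)
    return text


def format_field(value):
    """Render a field value: integers get an "i" suffix, strings are quoted."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def to_line_protocol(measurement, tags, fields, timestamp_us):
    """Build one line: measurement,tag_set field_set timestamp.

    The measurement name is not escaped here, which is sufficient for
    "fs-requests" because it contains no commas or spaces.
    """
    tag_part = "".join(f",{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{escape_key(k)}={format_field(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_us}"


line = to_line_protocol(
    "fs-requests",
    {"fs_ip": "127.0.0.1", "source": "main_stream"},
    {"event_id": "f9687459-986b-406d-9c1f-0d6289be5256", "bestshot_count": 10, "track_send_count": 2},
    1705314645123456,
)
print(line)
```

Because the timestamp is in microseconds, a writer that sends such a line has to request microsecond precision from InfluxDB, since the default precision is nanoseconds.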

One send from a track counts as one measurement, even when the send contains several best shots. InfluxDB stores this measurement using the data of the last best shot in the group, so values that are unique to each best shot (track_best_shot_time, liveness_start_time, liveness_end_time) are lost for every best shot except the last one.

Optional fields that have no value are not sent to InfluxDB.
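The way a group of best shots collapses into one measurement can be sketched in Python as follows. The function `measurement_fields` and the sample timestamps are hypothetical. The sketch only mirrors the behavior described in this section.

```python
# Values that are unique to each best shot, as listed in this section.
PER_SHOT_FIELDS = ("track_best_shot_time", "liveness_start_time", "liveness_end_time")


def measurement_fields(best_shots, track_send_count):
    """Collapse one send (a group of best shots) into a single set of fields."""
    if not best_shots:
        raise ValueError("a send must contain at least one best shot")
    last = best_shots[-1]
    # Only the last best shot contributes its per-shot values; absent
    # optional values are simply not included.
    fields = {name: last[name] for name in PER_SHOT_FIELDS if name in last}
    fields["bestshot_count"] = len(best_shots)
    fields["track_send_count"] = track_send_count
    return fields


best_shots = [
    {"track_best_shot_time": 1705314645100000},
    {"track_best_shot_time": 1705314645200000},
    {"track_best_shot_time": 1705314645300000},
]
fields = measurement_fields(best_shots, track_send_count=1)
print(fields)
```

In this example the times of the first two best shots do not appear in the result, and the liveness fields are omitted because liveness values are absent.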

During normal monitoring operation, no additional information is output to the FaceStream logs. If an error is detected during monitoring, the corresponding message will appear in the FaceStream logs.

Data being sent to ClickHouse#

Data is sent to ClickHouse in JSON format and written to the fs_requests table of the database specified in the db_name parameter.

Unlike InfluxDB, where data is divided into tags and fields, ClickHouse uses a single table structure with JSON support. Each event is represented by a record, where:

time is the timestamp of record creation;
data contains the event information in JSON format, which InfluxDB would split into tags and fields. The fields inside data are identical to the tags and fields sent to InfluxDB (see Data being sent to InfluxDB above).

Example of the contents of the data field:

{
    "fs_ip": "127.0.0.1",
    "source": "main_stream",
    "stream_id": "b5d6fd45-fcca-453d-ac05-3e594054b813",
    "track_id": "8d57654a-3905-4326-8db6-dcf8000000a3",
    "event_id": "f9687459-986b-406d-9c1f-0d6289be5256",
    "request_id": "1536751345,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f",
    "track_start_time": 1705314645123456,
    "track_best_shot_time": 1705314645123456,
    "bestshot_count": 10,
    "time_from_first_frame_to_send": 150,
    "track_send_count": 2
}
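Time values such as track_start_time are large integers. Assuming they are microseconds since the Unix epoch in UTC (an assumption based on the microsecond precision mentioned for InfluxDB), they can be converted to a readable date in Python. The function name `from_microseconds` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_microseconds(value):
    """Convert integer microseconds since the Unix epoch to an aware UTC datetime.

    Integer arithmetic via timedelta avoids the rounding errors that a
    float division by 1e6 could introduce.
    """
    return EPOCH + timedelta(microseconds=value)


track_start = from_microseconds(1705314645123456)
print(track_start.isoformat())
```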

Optional fields that can be included in the data field:

track_best_shot_min_size_time and track_best_shot_proper_size_time — sent if the primary_track_policy parameter is enabled in the stream settings.
liveness_start_time and liveness_end_time — sent if liveness is enabled in the stream settings.

When sending multiple best shots from one track, the unique data for each shot (track_best_shot_time, liveness_start_time, liveness_end_time) is saved only for the last shot in the group.

LUNA Streams monitoring#

Data being sent to InfluxDB#

There are two types of events that are monitored:

Request (all requests)
Error (failed requests only)

Every event is a point in the time series. For the API service, the point is represented using the following data:

Series name (requests or errors)
Timestamp of the request start
Tags
Fields

For other services, the set of event types may differ. For example, the Handlers service also collects data on SDK usage, estimations, and licensing.

A tag is indexed data in storage. It is represented as a dictionary, where:

Keys — String tag names.
Values — String, integer or float.

A field is non-indexed data in storage. It is represented as a dictionary, where:

Keys — String field names.
Values — String, integer or float.

Requests series. Triggered on every request. Each point contains data about the corresponding request (execution time, etc.).

Tags

Tag name	Description
service	Always "luna-streams"
route	Concatenation of a request method and a request resource (POST:/streams)
status_code	HTTP status code of response

Fields

Field name	Description
request_id	Request ID
execution_time	Request execution time

Errors series. Triggered on every failed request. Each point contains the error_code of the LUNA PLATFORM error.

Tags

Tag name	Description
service	Always "luna-streams"
route	Concatenation of a request method and a request resource (POST:/streams)
status_code	HTTP status code of response
error_code	LUNA PLATFORM error code

Fields

Field name	Description
request_id	Request ID
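The structure of the points in the requests and errors series can be sketched in Python. The function names are hypothetical, the timestamp is omitted for brevity, and the unit of execution_time is not specified in this document.

```python
SERVICE = "luna-streams"


def make_route(method, resource):
    """Concatenate the request method and resource, for example POST:/streams."""
    return f"{method.upper()}:{resource}"


def build_points(method, resource, status_code, request_id, execution_time, error_code=None):
    """Return the points for one request: always "requests", plus "errors" on failure."""
    tags = {"service": SERVICE, "route": make_route(method, resource), "status_code": status_code}
    points = [{
        "series": "requests",
        "tags": dict(tags),
        "fields": {"request_id": request_id, "execution_time": execution_time},
    }]
    if error_code is not None:
        points.append({
            "series": "errors",
            "tags": {**tags, "error_code": error_code},
            "fields": {"request_id": request_id},
        })
    return points


points = build_points(
    "post", "/streams", 400,
    "1536751345,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f",
    0.012, error_code=12022,
)
for point in points:
    print(point["series"], point["tags"])
```

A failed request produces two points, because the requests series covers all requests and the errors series covers failed requests only.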

Licensing series. Triggered at service start and every 60 seconds thereafter. Each point contains license verification data.

Tags

Tag name	Description
service	Always "luna-streams"
license_status	License status ("ok", "warning", "error", "exception")

Fields

Field name	Description
license_streams_limit_rate	Percentage of used streams
warnings	License warning messages
errors	License error messages

Data being sent to ClickHouse#

Monitoring is available for two event types: request (all requests) and error (failed requests only).

Unlike InfluxDB, where data is divided into tags and fields, ClickHouse uses a single table structure with JSON support. Each event is represented by a record, where:

time is the timestamp of record creation;
data contains the event information in JSON format, which InfluxDB would split into tags and fields.

Monitoring data is saved for each processed request. Each record contains information about the corresponding request (execution time, etc.). Example of the data field contents:

{
    "service": "luna-streams",
    "route": "POST:/streams",
    "status_code": 200,
    "request_id": "1536751345,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f",
    "execution_time": 123.45
}
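Because each record carries its values in a JSON document, analysis on the client side amounts to decoding the data field. The following Python sketch averages execution_time per route. The function name and the sample rows are hypothetical, and no particular unit of execution_time is assumed.

```python
import json
from collections import defaultdict


def average_execution_time(rows):
    """Return {route: mean execution_time} for JSON-encoded request records."""
    totals = defaultdict(lambda: [0.0, 0])
    for raw in rows:
        data = json.loads(raw)
        if "execution_time" not in data:
            continue  # not a request record
        entry = totals[data["route"]]
        entry[0] += data["execution_time"]
        entry[1] += 1
    return {route: total / count for route, (total, count) in totals.items()}


rows = [
    '{"service": "luna-streams", "route": "POST:/streams", "status_code": 200, "execution_time": 100.0}',
    '{"service": "luna-streams", "route": "POST:/streams", "status_code": 200, "execution_time": 150.0}',
    '{"service": "luna-streams", "route": "GET:/streams", "status_code": 200, "execution_time": 50.0}',
]
averages = average_execution_time(rows)
print(averages)
```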

Error data storage is triggered when a request fails. Each record contains the error_code of the LUNA Streams error.

{
    "service": "luna-streams",
    "route": "POST:/streams",
    "status_code": 400,
    "error_code": 12022,
    "request_id": "1536751345,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f"
}

Licensing data is saved at service startup and every 60 seconds thereafter. Each record contains license verification data.

{
    "service": "luna-streams",
    "license_status": "error",
    "license_streams_limit_rate": 120,
    "errors": "License limit exceeded: 120.0 % of the available license limit is used. Please contact VisionLabs for license upgrade or delete redundant streams."
}

Export metrics in Prometheus format#

The LUNA Streams service can collect and store metrics in Prometheus format as time series data that describes the behavior of the service. The metrics can be scraped by the Prometheus monitoring system to track performance. See the official Prometheus documentation for more information.

Metrics collection is disabled by default. It is enabled in the "LUNA_SERVICE_METRICS" section.

Note that all metric data is reset when the service is shut down.

Types of metrics#

Two types of metrics are available:

Counters, which increase with each event.
Cumulative histograms, which are used to measure the distribution of duration or size of events.

A cumulative histogram is a mapping that counts the cumulative number of observations in all of the bins up to the specified bin. See description in Wikipedia.

The following counter metrics are available:

request_count_total — Total number of requests
request_errors_total — Total number of errors

Each of them has at least two labels for sorting:

status_code (or error_code for error metrics)
path — Path consisting of a request method and an endpoint route.

Labels are key-value pairs, each consisting of a name and a value, that are assigned to metrics.

If necessary, you can add custom label types by specifying the pair tag_name=tag_value in the "extra_labels" parameter.

Note that the pair tag_name=tag_value will be added to each metric of the LUNA PLATFORM service.

A special manager distributes all requests passing through the service among the counters using these labels. This ensures that two successful requests sent to different endpoints, or to the same endpoint but with different status codes, are counted in different time series.

Unsuccessful requests are counted in both the request_count_total and request_errors_total metrics.
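The distribution of requests among labeled counters can be modeled in Python as follows. This is a simplified sketch, not the service's implementation: the functions `observe` and `render` are hypothetical, and the metric and label names are taken from the /metrics output shown in the examples of this section.

```python
from collections import Counter

request_count_total = Counter()
request_errors_total = Counter()


def observe(method, resource, status_code, error_code=None):
    """Count one request; a failed request is also counted as an error."""
    path = f"{method}:{resource}"
    request_count_total[(path, str(status_code))] += 1
    if error_code is not None:
        request_errors_total[(str(error_code), path)] += 1


def render(name, label_names, counter):
    """Render the counter as text lines, one line per label combination."""
    lines = []
    for label_values, value in sorted(counter.items()):
        labels = ",".join(f'{n}="{v}"' for n, v in zip(label_names, label_values))
        lines.append(f"{name}{{{labels}}} {float(value)}")
    return lines


observe("GET", "/healthcheck", 200)
observe("GET", "/docs/spec", 200)
observe("GET", "/docs/spec", 200)
observe("GET", "/docs/spec", 301)
request_lines = render("request_count_total", ("path", "status_code"), request_count_total)

observe("POST", "/streams", 401, error_code=12010)
error_lines = render("request_errors_total", ("error_code", "path"), request_errors_total)
print("\n".join(request_lines + error_lines))
```

Each distinct combination of label values gets its own counter, which is why the two successful requests to /docs/spec share one line while the redirected request gets a separate one.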

The requests metric is a cumulative histogram that tracks the duration of requests to the service. The following buckets (upper bounds of the intervals into which measurements fall) are defined for the histogram:

0.0001
0.00025
0.0005
0.001
0.0025
0.005
0.01
0.025
0.05
0.075
0.1
0.25
0.5
0.75
1.0
2.5
5.0
7.5
10.0
Inf

In this way the range of request times can be broken down into several intervals, ranging from very fast requests (0.0001 seconds) to very long requests (Inf - infinity). Histograms also have labels to categorize the data, such as status_code for the status of a request or route to indicate the route of a request.
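The cumulative counting can be reproduced with a few lines of Python. This is a minimal sketch that uses the bucket bounds listed above; the function name and the two sample durations are hypothetical.

```python
import math

# Upper bounds of the buckets, in seconds, as listed in this section.
BUCKETS = [
    0.0001, 0.00025, 0.0005, 0.001, 0.0025, 0.005, 0.01, 0.025, 0.05, 0.075,
    0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, math.inf,
]


def cumulative_histogram(durations, bounds=BUCKETS):
    """For each upper bound, count the observations less than or equal to it."""
    return [(bound, sum(1 for d in durations if d <= bound)) for bound in bounds]


durations = [0.0008, 0.0024]  # two hypothetical request durations in seconds
histogram = dict(cumulative_histogram(durations))
for bound, count in histogram.items():
    print(f"le={bound}: {count}")
```

Each observation is counted in its own bucket and in every larger bucket, so the counts never decrease and the Inf bucket always equals the total number of observations.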

Examples

If you send one request to the /healthcheck resource, followed by three requests to the /docs/spec resource, one of which is redirected (response status code 301), a subsequent request to the /metrics resource returns the following response body:

# HELP request_count_total Counter of requests
# TYPE request_count_total counter
request_count_total{path="GET:/docs/spec",status_code="200"} 2.0
request_count_total{path="GET:/docs/spec",status_code="301"} 1.0
request_count_total{path="GET:/healthcheck",status_code="200"} 1.0

If you send one invalid POST request to the /streams resource, a subsequent request to the /metrics resource returns the following response body:

# HELP request_count_total Counter of requests
# TYPE request_count_total counter
request_count_total{path="POST:/streams",status_code="401"} 1.0
# HELP request_errors_total Counter of request errors
# TYPE request_errors_total counter
request_errors_total{error_code="12010",path="POST:/streams"} 1.0
# HELP requests Histogram of request time metrics
# TYPE requests histogram
requests_sum{route="GET:/docs/spec",status_code="200"} 0.003174567842297907
requests_bucket{le="0.0001",route="GET:/docs/spec",status_code="200"} 0.0
requests_bucket{le="0.00025",route="GET:/docs/spec",status_code="200"} 0.0
requests_bucket{le="0.0005",route="GET:/docs/spec",status_code="200"} 0.0
requests_bucket{le="0.001",route="GET:/docs/spec",status_code="200"} 1.0
...
requests_count{route="GET:/docs/spec",status_code="200"} 2.0
requests_sum{route="GET:/docs/spec",status_code="301"} 0.002381476051209132
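A response body in this format can be read programmatically. The following Python sketch is a deliberately simplified parser for samples like the ones above. It is not a complete implementation of the Prometheus text format (for example, it does not handle escaped quotes inside label values), and the function name `parse_metrics` is hypothetical.

```python
import re

SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$"
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return (name, labels, value) tuples, skipping comments and unparsable lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP and TYPE lines are comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


text = """\
# HELP request_count_total Counter of requests
# TYPE request_count_total counter
request_count_total{path="GET:/docs/spec",status_code="200"} 2.0
request_count_total{path="GET:/docs/spec",status_code="301"} 1.0
request_count_total{path="GET:/healthcheck",status_code="200"} 1.0
"""
samples = parse_metrics(text)
total_requests = sum(value for name, _, value in samples if name == "request_count_total")
print(total_requests)
```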

Configuring metrics collection for Prometheus#

Prometheus must be configured to collect LUNA PLATFORM metrics.

Example Prometheus configuration for collecting LP service metrics:

  - job_name: "luna-streams"
    static_configs:
      - targets: ["127.0.0.1:5160"]
  ...

  - job_name: "luna-configurator"
    static_configs:
      - targets: ["127.0.0.1:5070"]

See the official documentation for an example of running Prometheus.