Monitoring#
Monitoring in FaceStream is implemented as sending data to InfluxDB and is disabled by default.
LUNA Streams has several monitoring methods:
- Sending data to InfluxDB (enabled by default)
- Exporting metrics in Prometheus format via the /metrics resource (disabled by default)
InfluxDB#
To work with InfluxDB, you need a username and password, as well as a bucket name, an organization name and a token. All of these values are set through environment variables when the InfluxDB container is started.
To use FaceStream or LUNA Streams monitoring, the "bucket", "organization" and "token" fields in the FaceStream or LUNA Streams settings must contain exactly the values that were specified when launching the InfluxDB container. So, for example, if the following settings were used when starting the InfluxDB container...:
-e DOCKER_INFLUXDB_INIT_BUCKET=luna_monitoring \
-e DOCKER_INFLUXDB_INIT_USERNAME=luna \
-e DOCKER_INFLUXDB_INIT_PASSWORD=password \
-e DOCKER_INFLUXDB_INIT_ORG=luna \
-e DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=kofqt4Pfqjn6o \
... then the following parameters should be specified in the FaceStream or LUNA Streams settings:
"influxdb": {
"organization": "luna",
"token": "kofqt4Pfqjn6o",
"bucket": "luna_monitoring",
The username and password are used to access the InfluxDB user interface.
By default, the "bucket", "organization" and "token" fields have different values in the FaceStream settings and in the LUNA Streams settings. To use monitoring for both services, set these fields to the same values in both. If necessary, you can save FaceStream and LUNA Streams data to different buckets (see below).
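The correspondence between the container variables and the settings fields can be sketched as follows. This is only an illustration under stated assumptions: the helper name `influx_settings_from_env` is hypothetical and is not part of FaceStream, LUNA Streams or InfluxDB, and it produces only the three fields discussed above.

```python
import json

# Settings field -> InfluxDB container variable that must carry the same value
# (variable names are taken from the example in this section).
FIELD_TO_ENV = {
    "organization": "DOCKER_INFLUXDB_INIT_ORG",
    "token": "DOCKER_INFLUXDB_INIT_ADMIN_TOKEN",
    "bucket": "DOCKER_INFLUXDB_INIT_BUCKET",
}


def influx_settings_from_env(env):
    """Hypothetical helper: build the matching settings fields from container variables."""
    missing = [name for name in FIELD_TO_ENV.values() if name not in env]
    if missing:
        raise KeyError(f"missing container variables: {', '.join(missing)}")
    return {field: env[name] for field, name in FIELD_TO_ENV.items()}


# Values used when starting the InfluxDB container in the example above.
container_env = {
    "DOCKER_INFLUXDB_INIT_BUCKET": "luna_monitoring",
    "DOCKER_INFLUXDB_INIT_USERNAME": "luna",
    "DOCKER_INFLUXDB_INIT_PASSWORD": "password",
    "DOCKER_INFLUXDB_INIT_ORG": "luna",
    "DOCKER_INFLUXDB_INIT_ADMIN_TOKEN": "kofqt4Pfqjn6o",
}

settings = influx_settings_from_env(container_env)
print(json.dumps(settings, indent=4))
```

The username and password are deliberately left out of the result, since they are used only to log in to the InfluxDB user interface.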
In order to separate FaceStream and LUNA Streams monitoring data, you can create separate buckets after launching the InfluxDB container. This can be done using one of the following methods:
- Using the InfluxDB user interface (Explore tab > Create bucket) after launching the InfluxDB container.
- Using the command influx bucket create -n <bucket_name> -o <organization_name> in the influx CLI after launching the InfluxDB container.
The organization name must be the same as when creating the InfluxDB container.
The data sent to InfluxDB differs for LUNA Streams and FaceStream. See the relevant sections below for more details on the data sent.
FaceStream monitoring#
Enable monitoring#
To enable FaceStream monitoring, follow these steps:
- Go to the Configurator user interface: http://<configurator_server_ip>:5070/.
- Enter "FACE_STREAM_CONFIG" in the "Setting name" field and click "Apply Filters".
- Enable the "send_data" setting in the "monitoring" section.
- In the "monitoring" section, set the "bucket", "organization" and "token" fields to the values of the "DOCKER_INFLUXDB_INIT_BUCKET", "DOCKER_INFLUXDB_INIT_ORG" and "DOCKER_INFLUXDB_INIT_ADMIN_TOKEN" parameters that were set when launching the InfluxDB container.
- Restart the FaceStream container: docker restart facestream.
Data being sent#
The following data is sent to InfluxDB:
- "measurement" element. Always equal to "fs-requests".
- Tag set:
    - "fs_ip": IP address where FaceStream is deployed.
    - "source": The "name" field set when creating a stream in LUNA Streams (optional).
    - "stream_id"
- Field set:
    - "track_id"
    - "event_id"
    - "request_id": External ID for communication with monitoring of LUNA PLATFORM services.
    - "track_start_time"
    - "track_best_shot_time": Time when the frame with the best shot being sent appeared in the system.
    - "track_best_shot_min_size_time" (optional): Time when the detection size reached the value specified in the "best_shot_min_size" parameter.
    - "track_best_shot_proper_size_time" (optional): Time when the detection size reached the value specified in the "best_shot_proper_size" parameter.
    - "liveness_start_time" (optional): Liveness start time.
    - "liveness_end_time" (optional): Liveness end time.
    - "bestshot_count": Number of best shots sent in one request to LP along with the current best shot. For example, if 2 sends of 10 best shots each were made, the value of this parameter is 10, and the value of the "track_send_count" parameter is 2.
    - "time_from_first_frame_to_send": Time that passed from the appearance of the first frame in FS to sending to LP.
    - "track_send_count": Sequence number of the send from the track.
  Values containing time are sent as UTC with microsecond precision.
- "timestamp" element. The time, in microseconds, at which the best shot or best shots were sent.
The frequency of sending data to InfluxDB is controlled by the "flashing_period" parameter of the FaceStream settings.
A single send from a track may contain several best shots, but it counts as one measurement. To save this measurement, InfluxDB uses the data of the last best shot in the group. As a result, data that is unique to each best shot (track_best_shot_time, liveness_start_time, liveness_end_time) is lost for all best shots except the last one.
Optional fields that are absent are not sent to InfluxDB.
During normal monitoring operation, no additional information is output to the FaceStream logs. If an error is detected during monitoring, the corresponding message will appear in the FaceStream logs.
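To make the structure of a measurement more concrete, the sketch below formats one point in InfluxDB line protocol (measurement, tag set, field set, timestamp). It is not FaceStream code: the function names, all sample values and the field types (integers for the counters) are assumptions made for illustration. Optional fields set to None are skipped, mirroring the rule that absent optional fields are not sent.

```python
def escape_tag(text):
    # Tag keys, tag values and field keys escape commas, equals signs and spaces.
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value):
    # Line protocol typing: booleans, integers (suffix "i"), floats, quoted strings.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp):
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(
        f",{escape_tag(key)}={escape_tag(str(value))}"
        for key, value in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_tag(key)}={format_field(value)}"
        for key, value in fields.items()
        if value is not None  # absent optional fields are not sent
    )
    return f"{name}{tag_part} {field_part} {timestamp}"


# Hypothetical values; the timestamp is expressed in microseconds.
line = to_line_protocol(
    "fs-requests",
    tags={"fs_ip": "127.0.0.1", "stream_id": "s1"},
    fields={
        "track_id": 7,
        "bestshot_count": 10,
        "track_send_count": 2,
        "liveness_start_time": None,
    },
    timestamp=1700000000000000,
)
print(line)
```

Because the timestamp here is in microseconds, a real write would also have to declare microsecond precision to InfluxDB; that detail is outside the scope of this sketch.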
LUNA Streams monitoring#
Data being sent to InfluxDB#
There are two types of events that are monitored:
- Request (all requests)
- Error (failed requests only)
Every event is a point in the time series. For the API service, the point is represented using the following data:
- Series name (requests or errors)
- Timestamp of the request start
- Tags
- Fields
For other services, the set of event types may differ. For example, the Handlers service also collects data on SDK usage, estimations, and licensing.
A tag is indexed data in the storage. Tags are represented as a dictionary, where:
- Keys: string tag names.
- Values: string, integer or float.
A field is non-indexed data in the storage. Fields are represented as a dictionary, where:
- Keys: string field names.
- Values: string, integer or float.
Requests series. Triggered on every request. Each point contains data about the corresponding request (execution time, etc.).
- Tags
Tag name | Description |
---|---|
service | Always "luna-streams" |
route | Concatenation of a request method and a request resource (POST:/streams) |
status_code | HTTP status code of response |
- Fields
Field name | Description |
---|---|
request_id | Request ID |
execution_time | Request execution time |
Errors series. Triggered on a failed request. Each point contains the error_code of the LUNA error.
- Tags
Tag name | Description |
---|---|
service | Always "luna-streams" |
route | Concatenation of a request method and a request resource (POST:/streams) |
status_code | HTTP status code of response |
error_code | LUNA PLATFORM error code |
- Fields
Field name | Description |
---|---|
request_id | Request ID |
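The relationship between the two series can be sketched in a few lines: every request yields a point in the requests series, and a failed request additionally yields a point in the errors series. The function name, the dictionary layout and the sample values are hypothetical, and the unit of execution_time is not specified in this document, so the numbers below are placeholders.

```python
def points_for_request(started_at, route, status_code, request_id,
                       execution_time, error_code=None):
    """Hypothetical helper: build the monitoring points produced by one request."""
    base_tags = {"service": "luna-streams", "route": route, "status_code": status_code}
    points = [{
        "series": "requests",
        "time": started_at,  # timestamp of the request start
        "tags": dict(base_tags),
        "fields": {"request_id": request_id, "execution_time": execution_time},
    }]
    if error_code is not None:
        # Failed requests also produce a point in the errors series.
        points.append({
            "series": "errors",
            "time": started_at,
            "tags": {**base_tags, "error_code": error_code},
            "fields": {"request_id": request_id},
        })
    return points


ok = points_for_request(1700000000.0, "GET:/streams", 200, "req-1", 0.012)
failed = points_for_request(1700000001.0, "POST:/streams", 400, "req-2", 0.004,
                            error_code=12010)
print([point["series"] for point in ok])
print([point["series"] for point in failed])
```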
Licensing series. Triggered at service start and every 60 seconds. Each point contains license verification data.
- Tags
Tag name | Description |
---|---|
service | Always "luna-streams" |
license_status | License status ("ok", "warning", "error", "exception") |
- Fields
Field name | Description |
---|---|
license_streams_limit_rate | Percentage of used streams |
warnings | License warning messages |
errors | License error messages |
View InfluxDB data#
You can use the InfluxDB GUI to view monitoring data.
- Go to the InfluxDB GUI at <server_ip>:<influx_port>. The default port is 8086. The default login data is luna/password.
- Select the "Explore" tab.
- Select a way to display information in the drop-down list (graph, histogram, table, etc.).
- Select a bucket at the bottom of the page.
- Filter the necessary data.
- Click "Submit".
Export metrics in Prometheus format#
LUNA Streams service can collect and save metrics in Prometheus format in the form of time series data that can be used to track the behavior of the service. Metrics can be integrated into the Prometheus monitoring system to track performance. See Prometheus official documentation for more information.
By default, the collection of metrics is disabled. The collection of metrics is enabled in the "LUNA_SERVICE_METRICS" section.
Note that all metric data is reset when the service is shut down.
Type of metrics#
Two types of metrics are available:
- Counters, which increase with each event.
- Cumulative histograms, which are used to measure the distribution of duration or size of events.
A cumulative histogram is a mapping that counts the cumulative number of observations in all of the bins up to the specified bin. See description in Wikipedia.
The following counter metrics are available:
- request_count_total: total number of requests
- request_errors_total: total number of errors
Each of them has at least two labels for sorting:
- status_code (or error_code for error metrics)
- path: path consisting of a request method and an endpoint route
Labels are key-value pairs, consisting of a name and a value, that are assigned to metrics.
If necessary, you can add custom labels by specifying the pair tag_name=tag_value in the "extra_labels" parameter.
Note that the pair tag_name=tag_value will be added to each metric of the LUNA PLATFORM service.
A special manager distributes all requests passing through the service among the counters using these tags. This ensures that two successful requests sent to different endpoints or to the same endpoint, but with different status codes, will be delivered to different metrics.
Unsuccessful requests are counted in both the request_count_total and request_errors_total metrics.
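The way requests are spread across labelled counters can be sketched with plain Python counters. This is not the service's implementation: the `record` and `render` helpers and the sample requests are hypothetical, and the output only imitates the Prometheus text format.

```python
from collections import Counter

# One counter value per unique combination of label values.
request_count_total = Counter()   # keyed by (path, status_code)
request_errors_total = Counter()  # keyed by (error_code, path)


def record(method, resource, status_code, error_code=None):
    path = f"{method}:{resource}"
    request_count_total[(path, str(status_code))] += 1
    if error_code is not None:
        # A failed request is counted in both metrics.
        request_errors_total[(str(error_code), path)] += 1


def render(name, label_names, counter):
    lines = []
    for label_values, value in sorted(counter.items()):
        labels = ",".join(f'{n}="{v}"' for n, v in zip(label_names, label_values))
        lines.append(f"{name}{{{labels}}} {float(value)}")
    return lines


# Hypothetical traffic: same endpoint with different status codes,
# plus one failed request carrying an error code.
record("GET", "/healthcheck", 200)
record("GET", "/docs/spec", 200)
record("GET", "/docs/spec", 200)
record("GET", "/docs/spec", 301)
record("POST", "/streams", 401, error_code=12010)

count_lines = render("request_count_total", ("path", "status_code"), request_count_total)
error_lines = render("request_errors_total", ("error_code", "path"), request_errors_total)
print("\n".join(count_lines + error_lines))
```

Keying each counter by the full tuple of label values is what keeps two requests with different status codes in separate time series.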
The requests metric, of cumulative histogram type, tracks the duration of requests to the service. The following intervals (buckets) are defined for the histogram, into which the measurements fall:
- 0.0001
- 0.00025
- 0.0005
- 0.001
- 0.0025
- 0.005
- 0.01
- 0.025
- 0.05
- 0.075
- 0.1
- 0.25
- 0.5
- 0.75
- 1.0
- 2.5
- 5.0
- 7.5
- 10.0
- Inf
In this way, the range of request times is broken down into several intervals, from very fast requests (0.0001 seconds) to very long requests (Inf, that is, infinity). Histograms also have labels to categorize the data, such as status_code for the status of a request or route for the route of a request.
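The cumulative behaviour of these buckets can be illustrated with a short sketch. The bucket boundaries are the ones listed above; the function name and the two sample durations are hypothetical. Each observation is counted in every bucket whose upper bound is greater than or equal to the observed duration, so the counts never decrease from one bucket to the next.

```python
import math

# Upper bounds of the buckets, in seconds.
BUCKETS = [0.0001, 0.00025, 0.0005, 0.001, 0.0025, 0.005, 0.01, 0.025, 0.05,
           0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, math.inf]


def cumulative_histogram(durations):
    """Return (bucket counts, sum of observations, number of observations)."""
    counts = {bound: 0 for bound in BUCKETS}
    for duration in durations:
        for bound in BUCKETS:
            if duration <= bound:
                counts[bound] += 1  # cumulative: also counted in all larger buckets
    return counts, sum(durations), len(durations)


# Two hypothetical request durations, in seconds.
counts, total, observations = cumulative_histogram([0.0008, 0.0024])
for bound in (0.0005, 0.001, 0.0025, math.inf):
    print(f"le={bound}: {counts[bound]}")
```

The last bucket (Inf) always equals the total number of observations, which is why it matches the count value of the histogram.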
Examples
If you send one request to the /healthcheck resource, followed by three requests to the /docs/spec resource, one of which is redirected (response status code 301), then a request to the /metrics resource returns the following result in the response body:
# HELP request_count_total Counter of requests
# TYPE request_count_total counter
request_count_total{path="GET:/docs/spec",status_code="200"} 2.0
request_count_total{path="GET:/docs/spec",status_code="301"} 1.0
request_count_total{path="GET:/healthcheck",status_code="200"} 1.0
If you send one invalid POST request to the /streams resource, then a request to the /metrics resource returns the following result in the response body:
# HELP request_count_total Counter of requests
# TYPE request_count_total counter
request_count_total{path="POST:/streams",status_code="401"} 1.0
# HELP request_errors_total Counter of request errors
# TYPE request_errors_total counter
request_errors_total{error_code="12010",path="POST:/streams"} 1.0
# HELP requests Histogram of request time metrics
# TYPE requests histogram
requests_sum{route="GET:/docs/spec",status_code="200"} 0.003174567842297907
requests_bucket{le="0.0001",route="GET:/docs/spec",status_code="200"} 0.0
requests_bucket{le="0.00025",route="GET:/docs/spec",status_code="200"} 0.0
requests_bucket{le="0.0005",route="GET:/docs/spec",status_code="200"} 0.0
requests_bucket{le="0.001",route="GET:/docs/spec",status_code="200"} 1.0
...
requests_count{route="GET:/docs/spec",status_code="200"} 2.0
requests_sum{route="GET:/docs/spec",status_code="301"} 0.002381476051209132
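When inspecting such output by hand or in a quick script, the samples can be read with a small parser. The sketch below is a simplified illustration, not a complete parser of the Prometheus text format: it ignores comment lines, does not handle escaped quotes inside label values or trailing timestamps, and the function name is hypothetical.

```python
import re

SAMPLE_RE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (metric name, labels dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything that is not a sample line
        name, label_text, value = match.groups()
        labels = dict(LABEL_RE.findall(label_text or ""))
        samples.append((name, labels, float(value)))
    return samples


text = '''# HELP request_count_total Counter of requests
# TYPE request_count_total counter
request_count_total{path="POST:/streams",status_code="401"} 1.0
request_errors_total{error_code="12010",path="POST:/streams"} 1.0
requests_bucket{le="0.001",route="GET:/docs/spec",status_code="200"} 1.0
'''

samples = parse_metrics(text)
for name, labels, value in samples:
    print(name, labels, value)
```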
Configuring metrics collection for Prometheus#
Prometheus must be configured to collect LUNA PLATFORM metrics.
Example Prometheus configuration for collecting LP service metrics:
- job_name: "luna-streams"
  static_configs:
    - targets: ["127.0.0.1:5160"]
...
- job_name: "luna-configurator"
  static_configs:
    - targets: ["127.0.0.1:5070"]
See the official documentation for an example of running Prometheus.