Grafana monitoring service

The LUNA CARS package includes the Grafana service. It is optional and is used to visualize, monitor and analyze data from LUNA CARS subsystems.

The Grafana service works in conjunction with the following additional components:

cAdvisor — analyzes and provides data on resource usage and performance of running containers (see https://github.com/google/cadvisor for more details);
prometheus — collects metrics and stores them in its time-series database (see https://prometheus.io/ for more details);
nodeexporter — measures various machine resources such as memory, disk and CPU load (see https://github.com/prometheus/node_exporter for more details).

Installation

Before starting the installation, make sure that Docker and Docker Compose (docker-compose) are installed. See section «LUNA CARS installation and configuration».
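You can run the prerequisite check from the shell. The sketch below reports any command that is missing from PATH. The helper name require_cmds is hypothetical and is not part of the installer.

```shell
#!/usr/bin/env bash
# Sketch: report which of the given commands are missing from PATH.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage with the tool names used in this guide (newer Docker releases provide
# Compose as the "docker compose" plugin rather than a docker-compose binary):
# require_cmds docker docker-compose && echo "prerequisites found"
```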

To start the Grafana service, follow these steps:

1. Go to the working directory:

cd cars-installer_v.2.15.0

2. Enable monitoring and access to Grafana by setting the MONITOR_ENABLE variable to true in the .env-vanilla configuration file:

MONITOR_ENABLE=true

3. Restart the services to apply the changes:

./docker_stop_all.sh

./docker_start_all.sh vanilla
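You can also script steps 2 and 3. The sketch below assumes that .env-vanilla contains plain, unquoted KEY=value lines. The helper name set_env_var is hypothetical and is not part of the installer.

```shell
#!/usr/bin/env bash
# Sketch: set KEY=VALUE in an env file, replacing an existing line or appending a new one.
# Assumes simple unquoted values without "|" or "&" (they are special in the sed replacement).
set_env_var() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    # -i.bak works with both GNU and BSD sed and keeps a backup copy of the file
    sed -i.bak "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Usage, mirroring steps 1-3:
# cd cars-installer_v.2.15.0
# set_env_var .env-vanilla MONITOR_ENABLE true
# ./docker_stop_all.sh && ./docker_start_all.sh vanilla
```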

Additional information about configuring the Grafana service is described in the README.md file.

After this, the Grafana service will be available at the following address:

http://${IP}:${GRAFANA_PORT}

Note: The IP address and port are configured in the .env-vanilla configuration file, where:

IP — the server's IP address;
GRAFANA_PORT — the port for accessing Grafana (default: 3000).
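After the restart, you can check from the command line that Grafana responds. The sketch below builds the address from .env-vanilla and queries Grafana's /api/health endpoint. It assumes unquoted KEY=value lines. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build the Grafana address from .env-vanilla and query Grafana's /api/health endpoint.
# Assumes unquoted KEY=value lines; GRAFANA_PORT falls back to the documented default 3000.
read_env_var() {
  # Print the value of KEY (last occurrence wins); prints nothing if KEY is absent
  grep "^$2=" "$1" | tail -n 1 | cut -d= -f2-
}

grafana_url() {
  local ip port
  ip=$(read_env_var "$1" IP)
  port=$(read_env_var "$1" GRAFANA_PORT)
  printf 'http://%s:%s\n' "$ip" "${port:-3000}"
}

check_grafana() {
  # /api/health returns a small JSON document when Grafana is up; curl -f fails on HTTP errors
  curl -fsS "$(grafana_url "$1")/api/health"
}

# Usage: check_grafana .env-vanilla
```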

Authorization Data

To log in to the Grafana web interface (Figure 5), use the following default credentials:

Username — admin (unchangeable);
Password — PASSWORD (set in the GRAFANA_PASSWORD variable in the .env-vanilla configuration file).

Dashboards

Pre-configured dashboards will be available in the Dashboards section after monitoring is started.

To open a dashboard, click its name. The selected dashboard opens on a new page.

Figure 6 shows the general view of the «Dashboards» section.

For more detailed information on Grafana administration, please refer to the official documentation at: https://grafana.com/docs/grafana/latest/
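The same admin credentials also work with Grafana's HTTP API through basic authentication, which is useful for scripted checks. The sketch below assumes that Grafana is reachable at the address configured above. GRAFANA_URL and the helper names are hypothetical. The title extraction assumes that dashboard titles contain no escaped quotes.

```shell
#!/usr/bin/env bash
# Sketch: list dashboards through Grafana's HTTP API instead of the web interface.
# GRAFANA_URL is a hypothetical variable; set it to http://${IP}:${GRAFANA_PORT}.
# The password is the value of GRAFANA_PASSWORD from .env-vanilla.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"

list_dashboards() {
  # GET /api/search?type=dash-db returns a JSON array of dashboards (title, uid, url, ...)
  curl -fsS -u "admin:$1" "${GRAFANA_URL}/api/search?type=dash-db"
}

dashboard_titles() {
  # Extract "title" values from the JSON on stdin without requiring jq
  grep -o '"title":"[^"]*"' | cut -d'"' -f4
}

# Usage: list_dashboards "$GRAFANA_PASSWORD" | dashboard_titles
```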

Dashboard «Statistics»

This dashboard is designed for monitoring the memory usage of crops, full frames, and the database, as well as tracking the number of events and incidents over the selected time period and overall (Figure 7).

Figure 7. General view of the Statistics Dashboard

Description of the dashboard panels

Storage Block: This block provides information on memory usage by different system components.

Database size: Displays the amount of memory used by the database. This is an important indicator for assessing storage resource usage and load
Full frame size: Displays the amount of memory used by full frames
Crop size: Displays the amount of memory used by crops

Events Block: This block contains information about the number of events and incidents.

Checkpoint events count: Shows the number of extended events that occurred within a certain period
Events count: Displays the number of events that occurred within the specified period
Incidents count: Shows the number of extended incidents that occurred during a defined time period

The number of events of different types graph: This graph is designed to display events and incidents by type over a specific time period. It can be useful for analyzing trends and detecting anomalies.

Celery graph: Displays the number of tasks in the queue over a specific time period.

Dashboard «Docker and System Monitoring»

This dashboard monitors the status of the host and its Docker containers and tracks system resources such as memory, CPU, disk and network. At the top of the dashboard, you can select the time interval for analysis. You can also use the «Node» selector to choose the host monitored by node_exporter (Figure 8).

Figure 8. General view of the Docker and System Monitoring Dashboard

Dashboard Panels Description

Host Block: This block provides information about the system resources of the host.

Uptime: Displays the uptime of the host since the last reboot. This metric is useful for assessing system stability.
Disk space: Displays the disk space usage on the host.
Containers: Shows the number of running containers on the host.
Memory: Displays general information about the host's memory usage.
Load: Displays the system load.
Swap: Displays swap usage. Swap space is used when the system's physical memory (RAM) runs low.
Network traffic: Displays the total volume of data sent and received over the network.
CPU Usage: Shows the CPU usage by the host.
Disk I/O: Displays information about disk read and write operations, including the speed of input/output operations.

Containers Block: This block provides information about the containers.

Sent Network Traffic per Container: Displays the volume of data sent over the network for each container during the selected time period
Received Network Traffic per Container: Displays the volume of data received over the network for each container
CPU Usage per Container: Displays CPU usage for each container
Memory Usage per Container: Displays memory usage for each container
Memory Swap per Container: Displays swap usage for each container
Usage memory: Displays the memory usage for the selected container at a specific point in time
Remaining memory: Displays the remaining memory for the selected container
Limit memory: Shows the maximum memory limit for the selected container
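To inspect the data behind these panels outside Grafana, you can send PromQL queries directly to the Prometheus HTTP API. The metric names below are the standard ones exported by node_exporter and cAdvisor. The queries in the bundled dashboard may be written differently. The Prometheus address is an assumption (9090 is the Prometheus default), so adjust it to the port published by your installation.

```shell
#!/usr/bin/env bash
# Sketch: run PromQL queries similar to the ones behind this dashboard against the
# Prometheus HTTP API. PROMETHEUS_URL is an assumption; 9090 is Prometheus's default port.
PROMETHEUS_URL="${PROMETHEUS_URL:-http://localhost:9090}"

prom_query() {
  # GET /api/v1/query evaluates an instant query; -G sends the urlencoded data as URL parameters
  curl -fsS -G "${PROMETHEUS_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Host uptime in seconds (node_exporter)
UPTIME_Q='node_time_seconds - node_boot_time_seconds'
# Share of host memory in use, from 0 to 1 (node_exporter)
MEM_USED_Q='1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes'
# CPU usage per container in cores, averaged over 5 minutes (cAdvisor)
CONTAINER_CPU_Q='sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m]))'
# Memory usage per container in bytes (cAdvisor)
CONTAINER_MEM_Q='sum by (name) (container_memory_usage_bytes{name!=""})'

# Usage: prom_query "$CONTAINER_CPU_Q"
```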

Dashboard «Prometheus Blackbox Exporter»

This dashboard displays the results of the checks performed by Prometheus Blackbox Exporter, which probes the availability of services and network resources (Figure 9).

Figure 9. General view of the Prometheus Blackbox Exporter Dashboard

Dashboard Panels Description

Global Probe Duration graph: Displays the total duration of the checks performed by Blackbox Exporter.

All Status Block: This block displays a summary of the status of various checks and their metrics.

Status: The overall status of the check
HTTP Status Code: The HTTP status code received during the check
HTTP Version: The version of the HTTP protocol used for the request
SSL: The status of the SSL connection (e.g., active or inactive)
HTTP Duration: The time taken to perform the HTTP request
SSL Expiry: The expiration date of the SSL certificate
Probe Duration: The time taken to perform the entire probe
Average Probe Duration: The average time for all probes during the selected period
Average DNS Lookup: The average time for DNS lookups
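You can run a single probe by hand and read the raw values that these panels are built on. The sketch below assumes that Blackbox Exporter listens on its default port 9115 and that its configuration defines the http_2xx module, which is present in the exporter's example blackbox.yml. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run one Blackbox Exporter probe by hand and read single metric values.
# Assumptions: default port 9115 and an http_2xx module in the exporter's config.
BLACKBOX_URL="${BLACKBOX_URL:-http://localhost:9115}"

probe() {
  # $1 = target to check, $2 = module name (defaults to http_2xx)
  curl -fsS -G "${BLACKBOX_URL}/probe" \
    --data-urlencode "target=$1" --data-urlencode "module=${2:-http_2xx}"
}

metric_value() {
  # Read Prometheus text format on stdin; print the value of an unlabelled metric
  awk -v m="$1" '$1 == m { print $2 }'
}

# Usage:
# probe https://example.com | metric_value probe_success            # 1 = success, 0 = failure
# probe https://example.com | metric_value probe_http_status_code
```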

Dashboard «Celery Monitoring»

This dashboard tracks the number of tasks queued for execution and the status of the Celery workers, referred to below as handlers (Figure 10).

Figure 10. General view of the Celery Monitoring Dashboard

Dashboard Panels Description

Celery Worker Status graph: Displays the status of Celery handlers.

1 — handler is processing tasks.
0 — handler is not processing tasks.

Number of Tasks Currently Executing at Worker graph: Displays the number of tasks currently being executed by a specific handler.

Average Task Runtime at Worker graph: Displays the average task runtime for each handler and task.

Task Prefetch Time at Worker graph: Displays, for each handler, how long prefetched tasks wait before they start executing.

Number of Tasks Prefetched at Worker graph: Displays the number of tasks prefetched by each handler.

Task Success Ratio graph: Displays the average task success ratio over the selected time period.

Task Failure Ratio graph: Displays the average task failure ratio over the selected period.
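This document does not state the exact formula used by the dashboard. As an illustration, assume that the success ratio is succeeded / (succeeded + failed) over the selected period, and that the failure ratio is its complement. In the sketch below, the metric and label names in the PromQL string are hypothetical. They are modelled on the events counter exposed by Celery Flower and may differ in your installation.

```shell
#!/usr/bin/env bash
# Sketch: success ratio = succeeded / (succeeded + failed) over a period.
# The metric and label names below are hypothetical (modelled on Celery Flower's
# events counter); check them against the metrics in your Prometheus.
SUCCESS_RATIO_Q='sum(increase(flower_events_total{type="task-succeeded"}[1h]))
  / (sum(increase(flower_events_total{type="task-succeeded"}[1h]))
     + sum(increase(flower_events_total{type="task-failed"}[1h])))'

success_ratio() {
  # $1 = succeeded tasks, $2 = failed tasks; prints the ratio to two decimals,
  # or n/a if no tasks finished in the period
  awk -v s="$1" -v f="$2" 'BEGIN { t = s + f; if (t == 0) print "n/a"; else printf "%.2f\n", s / t }'
}

# Usage: success_ratio 90 10    # 90 of 100 finished tasks succeeded
```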

Dashboard «Health Check»

This dashboard is designed to display the statuses of LUNA CARS_API, LUNA CARS_Stream, LUNA CARS_Analytics, Redis, Postgres, and Celery (Figure 11).

Figure 11. General view of the Health Check Dashboard

Dashboard Panels Description

Service Status graph: Displays the status of each service over the selected period as bars
Service Status: Detailed information about the current status of each system component (e.g., whether it’s up or down)
Container Uptime: Displays the uptime of the container, showing how long it has been running since the last restart
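On the host itself, docker ps gives a rough command-line counterpart to these panels. The sketch below lists containers that are not running or whose health check reports "unhealthy". It reads only the container name and status string, so it does not reproduce the dashboard's logic. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print containers that are not "Up" or whose health check reports "unhealthy".
unhealthy_containers() {
  # {{.Names}} and {{.Status}} are standard docker ps format fields
  docker ps -a --format '{{.Names}} {{.Status}}' |
    awk '$2 != "Up" || /\(unhealthy\)/ { print $1 }'
}

# Usage: unhealthy_containers   # prints nothing when all containers are up and not unhealthy
```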

Adding a New Dashboard

In Grafana, you can not only create new dashboards but also create folders to organize dashboards and import existing ones.

To create a new dashboard, click the «New Dashboard» button (Figure 12). You will be redirected to the dashboard creation page.

Figure 12. Buttons for adding new dashboards

The dashboard creation page (Figure 13) consists of three main sections:

Start your new dashboard by adding a visualization: This section is for starting the creation of the dashboard by adding a visualization. You can choose the type of visualization, such as graphs, tables, metrics, lists, markdown, and other widgets. To do this, click «Add visualization» and set up the visualization based on the type of data you want to display.
Import panel: This section lets you add library panels, which are reusable panels that can be shared across multiple dashboards. To do this, select «Import panel» and choose a panel from the list of available panels. This helps you quickly reuse ready-made visualizations.
Import a dashboard: This section allows you to import entire dashboards. You can import dashboards saved in JSON format. To do this, select «Import a dashboard», upload the dashboard settings file, or import dashboards from the Grafana.com repository. If necessary, select the data source and configure the dashboard settings.

Detailed information on dashboard settings is described in the Grafana documentation: https://grafana.com/docs/grafana/latest/

Importing an Existing Dashboard

You can import a dashboard in either of two ways. You can use the dropdown menu in the «Dashboards» section. Alternatively, you can use the «Import a dashboard» button on the dashboard creation page that opens after you click «New Dashboard».

1. Click «Import» in the dropdown menu, or click «Import a dashboard» on the dashboard creation page. The «Import» page opens (Figure 14).

2. Copy the contents of your settings file and paste them into the «Import via panel json» field on the «Import» page (Figure 15).

Figure 15. Importing new.json settings file

3. Click the «Load» button at the bottom of the window. If necessary, change the dashboard name in the «Name» field, then click the «Import» button (Figure 16).

4. After the import, the page with the added dashboard opens (Figure 17).
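You can also import a JSON file without the web interface, through Grafana's HTTP API (POST /api/dashboards/db). The sketch below makes three assumptions. First, the file holds a plain dashboard model whose "id" is null or absent. Second, ${DS_...} data source placeholders, which the «Import» page resolves, are not resolved here. Third, "overwrite": false makes Grafana reject a dashboard that already exists. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: import a dashboard JSON file via POST /api/dashboards/db.
# Assumes a plain dashboard model ("id": null or absent); ${DS_...} placeholders are not resolved.
import_dashboard() {
  # $1 = Grafana base URL, $2 = admin password, $3 = path to the dashboard JSON file
  local body
  body=$(printf '{"dashboard": %s, "overwrite": false}' "$(cat "$3")")
  curl -fsS -u "admin:$2" -H 'Content-Type: application/json' \
    -X POST "$1/api/dashboards/db" --data-binary "$body"
}

# Usage: import_dashboard "http://${IP}:${GRAFANA_PORT}" "$GRAFANA_PASSWORD" new.json
```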

Figure 17. General view of the dashboard

For detailed information on Grafana administration, refer to the documentation at: https://grafana.com/docs/grafana/latest/

Configuring and Editing the Dashboard

You can edit the entire dashboard as well as individual elements (panels) within it.

Editing the Entire Dashboard

To edit the entire dashboard, click the «Edit» button in the top right corner of the screen. This will open the dashboard editing mode, where you can modify its settings and components.

To edit existing elements, click the three dots in the upper right corner of the element (Figure 18).

Editing Existing Elements

1. To edit an existing element (panel) on the dashboard, click the three dots in the upper-right corner of the panel and select «Edit» from the dropdown menu (Figure 19).

Figure 19. Editing elements on the dashboard

2. Go to the «Query» tab and modify the existing query or add a new one. Queries are written in PromQL (Figure 20).

Figure 20. Query tab when editing an element

3. After making changes, save the dashboard.

Adding New Elements (Panels)

1. To add a new element to the dashboard, click the «Edit» button in the top-right corner of the dashboard to enter editing mode. Then click the «Add» button and select «Visualization» (Figure 21).

2. In the new window, choose the panel (visualization) type and configure it.

3. Enter the query in the «Query» tab. PromQL queries let you extract data from Prometheus and visualize it (Figure 22).

Figure 22. Query tab when adding a new element

The query is written using PromQL, the query language for Prometheus. For detailed information, refer to the documentation at: https://prometheus.io/docs/prometheus/latest/querying/basics/

4. After configuring the panel and entering the query, click «Back to dashboard». The new panel appears on your dashboard. Then click «Save the dashboard» to save the changes.

For more detailed information about Grafana administration, refer to the documentation at: https://grafana.com/docs/grafana/latest/

Additional Settings

All additional monitoring settings (such as Prometheus parameters, Telegram notifications, etc.) are configured in the same .env-vanilla configuration file.

Prometheus Data Collector Role

Prometheus stores the historical metric data. How long the data is retained is configured with the following variables:

PROMETHEUS_RETENTION_TIME — the retention time for data
PROMETHEUS_RETENTION_SIZE — the maximum size of the storage

Old data is deleted as soon as either limit is reached, whichever comes first. For example:

PROMETHEUS_RETENTION_TIME=30d
PROMETHEUS_RETENTION_SIZE=10GB
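Prometheus itself takes these limits as its --storage.tsdb.retention.time and --storage.tsdb.retention.size flags. The sketch below assumes that the installer passes the two variables through to those flags. It runs a rough format check on the values before a restart. It then asks the running Prometheus, through /api/v1/status/flags, which values it actually uses. The Prometheus address and the helper names are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: rough format check of retention values, then confirm what Prometheus uses.
# Assumptions: the variables map to --storage.tsdb.retention.time/.size, and Prometheus
# is reachable at PROMETHEUS_URL (9090 is the Prometheus default port).
valid_retention_time() {
  # Durations: number + unit (ms, s, m, h, d, w, y); units may be combined, e.g. 30d or 1w2d
  printf '%s\n' "$1" | grep -Eqx '([0-9]+(ms|s|m|h|d|w|y))+'
}

valid_retention_size() {
  # Sizes: number + unit (B, KB, MB, GB, TB, PB, EB), e.g. 10GB
  printf '%s\n' "$1" | grep -Eqx '[0-9]+(B|KB|MB|GB|TB|PB|EB)'
}

check_effective_retention() {
  # /api/v1/status/flags returns the flag values the running Prometheus was started with
  curl -fsS "${PROMETHEUS_URL:-http://localhost:9090}/api/v1/status/flags" |
    grep -o '"storage\.tsdb\.retention[^,}]*'
}

# Usage: valid_retention_time 30d && valid_retention_size 10GB && check_effective_retention
```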

Sending Data to Telegram

To send notifications to Telegram, the following three variables need to be configured:

GRAFANA_TG_BOT_TOKEN — the Telegram bot token
GRAFANA_TG_CHAT_ID — the chat ID for sending messages
GRAFANA_SERVER_NAME — the name of the server where Grafana is running

GRAFANA_TG_BOT_TOKEN=null
GRAFANA_TG_CHAT_ID=null
GRAFANA_SERVER_NAME=REPLACE_SERVER_NAME

The values for GRAFANA_TG_BOT_TOKEN and GRAFANA_TG_CHAT_ID cannot be empty; otherwise, the system will fail to start.
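Because empty values prevent the system from starting, a quick pre-flight check before restarting can save a failed start. The sketch below assumes unquoted KEY=value lines in the env file. The helper name check_tg_vars is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: fail if GRAFANA_TG_BOT_TOKEN or GRAFANA_TG_CHAT_ID is empty in the env file.
# Assumes unquoted KEY=value lines; a missing key is treated the same as an empty one.
check_tg_vars() {
  local file="$1" var value rc=0
  for var in GRAFANA_TG_BOT_TOKEN GRAFANA_TG_CHAT_ID; do
    value=$(grep "^${var}=" "$file" | tail -n 1 | cut -d= -f2-)
    if [ -z "$value" ]; then
      echo "$var is empty in $file" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage: check_tg_vars .env-vanilla && ./docker_stop_all.sh && ./docker_start_all.sh vanilla
```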

Getting the Telegram Bot Token

1. In the Telegram search, find the bot @BotFather and click the START button.
2. To create a new bot, send the command /newbot.
3. Specify the name of the bot (this will be the name displayed in your chat list).
4. Specify the bot's username. It must end with bot, for example: lunacars_health_bot.
5. To get the token for the created bot, send the command /token. The token will be displayed in the response (Figure 23).

Figure 23. Example response from the bot

Getting the Chat ID

To get the chat ID where the bot will send messages, follow these steps:

1. Open the following URL in your browser:

https://api.telegram.org/bot<YOUR_BOT_TOKEN>/getUpdates

Replace <YOUR_BOT_TOKEN> with the token you received in the previous step. Keep the bot prefix, with no space between bot and the token.

2. If the result is empty, send any message to your bot and refresh the page. This is required so that Telegram registers your chat.

3. The response is a JSON object with information about your chat. Use the value of the chat.id field as the GRAFANA_TG_CHAT_ID variable (Figure 24).
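You can also do the chat ID lookup from the shell with the Telegram Bot API. The same API can send a test message, which confirms the token and chat ID before you restart the services. The parsing below uses grep and sed so that jq is not required. If jq is available, jq '.result[].message.chat.id' is a sturdier choice. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: look up chat IDs via getUpdates and send a test message via sendMessage.
get_chat_ids() {
  # $1 = bot token; prints each distinct chat id found in the getUpdates response
  curl -fsS "https://api.telegram.org/bot$1/getUpdates" |
    grep -o '"chat":{"id":-\{0,1\}[0-9]*' | sed 's/.*://' | sort -u
}

send_test_message() {
  # $1 = bot token, $2 = chat id; sendMessage delivers a text message to that chat
  curl -fsS "https://api.telegram.org/bot$1/sendMessage" \
    --data-urlencode "chat_id=$2" --data-urlencode "text=Test message from LUNA CARS monitoring"
}

# Usage: get_chat_ids "$GRAFANA_TG_BOT_TOKEN"
#        send_test_message "$GRAFANA_TG_BOT_TOKEN" "$GRAFANA_TG_CHAT_ID"
```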

Grafana will automatically generate notifications for the configured alerts (Figure 25). The messages will be sent to the specified Telegram chat through the bot, which is configured using the obtained token and chat ID.

The user does not interact with the bot manually — notifications are sent automatically when alerts are triggered in Grafana.

Figure 25. Example message from Grafana in the bot

Grafana sends the following to Telegram:

Service status (state):

If a service is down, a message will be sent with the service name and error.
When the service is back up, a message will be sent with the service name, error, and a «FIXED» label.

The date and time in parentheses indicate when the alert was triggered, that is, when the service went down or came back up.