Lambda monitoring

Data for monitoring

Two types of events are monitored: requests and errors. The first series covers all requests; the second covers failed requests only.

Comparison of data formats for ClickHouse and InfluxDB:

InfluxDB: Each event is represented as a “point” in a time series. The structure of a point includes:

  • series name

  • event start time

  • tags: data indexed in storage; a dictionary whose keys are string tag names and whose values are strings, integers, or floats

  • fields: data not indexed in storage; a dictionary whose keys are string field names and whose values are strings, integers, or floats

ClickHouse: In ClickHouse, the data structure resembles that of a traditional SQL table. Each event is represented as a record, where:

  • The `time` field contains the record’s creation timestamp;

  • The `data` field contains a JSON object with all the information that would otherwise be distributed across tags and fields in InfluxDB.

Important: In ClickHouse there is no differentiation between “tags” and “fields”: all data is consolidated into a single JSON object within the `data` field.
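To make the distinction concrete, here is a minimal sketch in plain Python (the dictionaries below are illustrative, not an actual client API) showing how the same request event maps to each storage:

```python
import json
from datetime import datetime, timezone

# The same hypothetical request event in both representations.

# InfluxDB point: indexed tags are kept separate from non-indexed fields.
influx_point = {
    "series": "requests",
    "time": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "tags": {"service": "lambda-<lambda-id>", "route": "POST:/main", "status_code": 200},
    "fields": {"request_id": "1536751345,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f", "execution_time": 123.45},
}

# ClickHouse record: tags and fields are merged into one JSON object
# stored in the `data` column; only `time` stays as a separate column.
clickhouse_record = {
    "time": influx_point["time"],
    "data": json.dumps({**influx_point["tags"], **influx_point["fields"]}),
}
```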

Monitoring series

The structure and the meaning of each monitoring series remain consistent. However, for ClickHouse, the data from tags and fields is merged into a single JSON object under the `data` field. Below are examples for each series:

  • Requests series.

    Triggered on every request. Each point contains data about the corresponding request (execution time, etc.).

    InfluxDB:

    • tags

      • service: always “lambda-<lambda-id>”

      • route: concatenation of the request method and the request resource (e.g. POST:/main)

      • status_code: HTTP status code of the response

    • fields

      • request_id: request id

      • execution_time: request execution time

    Example of the ClickHouse JSON `data` field:

    {
        "service": "lambda-<lambda-id>",
        "route": "POST:/main",
        "status_code": 200,
        "request_id": "1536751345,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f",
        "execution_time": 123.45
    }
    
  • Errors series.

    Triggered on every failed request. Each point contains the error_code of the Luna error.

    InfluxDB:

    • tags

      • service: always “lambda-<lambda-id>”

      • route: concatenation of the request method and the request resource (e.g. POST:/main)

      • status_code: HTTP status code of the response

      • error_code: Luna error code

    • fields

      • request_id: request id

    Example of the ClickHouse JSON `data` field:

    {
        "service": "lambda-<lambda-id>",
        "route": "POST:/main",
        "status_code": 400,
        "error_code": 13037,
        "request_id": "1536751345,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f"
    }
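Because ClickHouse keeps everything in one JSON object, consuming the stored monitoring data from code amounts to decoding the `data` column. A minimal sketch, assuming error-series rows have already been fetched as (time, data) pairs (the rows and the second error code below are made up for illustration):

```python
import json
from collections import Counter

# Hypothetical rows fetched from the ClickHouse errors series,
# each a (time, data) pair with `data` holding the JSON object.
rows = [
    ("2024-01-01T00:00:00", '{"service": "lambda-abc", "route": "POST:/main", "status_code": 400, "error_code": 13037, "request_id": "r1"}'),
    ("2024-01-01T00:00:05", '{"service": "lambda-abc", "route": "POST:/main", "status_code": 500, "error_code": 11001, "request_id": "r2"}'),
    ("2024-01-01T00:00:09", '{"service": "lambda-abc", "route": "POST:/main", "status_code": 400, "error_code": 13037, "request_id": "r3"}'),
]

# Decode the JSON payload of each record and count Luna error codes.
errors = [json.loads(data) for _, data in rows]
error_counts = Counter(event["error_code"] for event in errors)
print(error_counts.most_common())  # most frequent error codes first
```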
    

Custom Monitoring

For information about basic monitoring usage, see monitoring.

It is possible to create custom monitoring points: for example, to send data about how long it took to download or process images.

You can specify your own series, tags, and fields, but there will always be a mandatory “service” tag with the value “lambda-<lambda-id>”.

To add a custom monitoring point, follow these steps:

  1. Add file monitoring_points.py to lambda archive with the following content:

    monitoring_points.py
    from luna_lambda_tools.public.monitoring import CustomMonitoringPoint
    
    
    class TestMonitoringPoint(CustomMonitoringPoint):
        """Test monitoring point"""
    
        series = "test_monitoring"
    

    There are several rules for this file:

    • Any number of points may be defined.

    • Each point class may have any unique name.

    • Each point class must inherit from CustomMonitoringPoint.

    • Each point class must define a series attribute that specifies the monitoring series.
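    For illustration, a monitoring_points.py following all of the rules above might define two points. The CustomMonitoringPoint stub below only stands in for the real base class from luna_lambda_tools.public.monitoring so that the sketch is self-contained; in a real lambda archive you would import the real class, and the series names here are made up:

```python
# Stub standing in for luna_lambda_tools.public.monitoring.CustomMonitoringPoint,
# added only to keep this sketch self-contained; import the real class in a lambda.
class CustomMonitoringPoint:
    series = None

    def __init__(self, pointTags=None, pointFields=None):
        self.pointTags = pointTags or {}
        self.pointFields = pointFields or {}


class ImageDownloadPoint(CustomMonitoringPoint):
    """Hypothetical point for image download timings"""

    series = "image_download"


class ImageProcessingPoint(CustomMonitoringPoint):
    """Hypothetical point for image processing timings"""

    series = "image_processing"
```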

  2. Import the monitoring points from monitoring_points.py into lambda_main.py and specify tags and fields:

    There are several general rules for this file:

    • Enable monitoring. See the INFLUX_MONITORING setting in the basic principles of configuration.

    • If monitoring is unavailable, the points will not be sent, and no errors will be raised.

    • Specify tags by passing a dictionary of tags to the pointTags named argument.

    • Specify fields by passing a dictionary of fields to the pointFields named argument.

    Warning

    There are differences between the standalone, handlers, and tasks lambda monitoring mechanisms. See the descriptions below.

    For a standalone or handlers lambda, send points using the request.sendToMonitoring function:

    lambda_main.py
    import asyncio
    from time import time
    
    from luna_lambda_tools import StandaloneLambdaRequest
    from monitoring_points import TestMonitoringPoint
    
    
    async def main(request: StandaloneLambdaRequest) -> dict:
        # start execution time
        stt = time()
    
        # do some logic
        await asyncio.sleep(1)
    
        # send monitoring point with execution time
        request.sendToMonitoring(
            (TestMonitoringPoint(pointTags={"lambda_type": "standalone"}, pointFields={"execution_time": time() - stt}),)
        )
        return {"result": "lambda result"}
    
    request example
    from luna3.luna_lambda.luna_lambda import LambdaApi
    
    SERVER_ORIGIN = "http://lambda_address:lambda_port"  # Replace with your values before starting
    SERVER_API_VERSION = 1
    lambdaApi = LambdaApi(origin=SERVER_ORIGIN, api=SERVER_API_VERSION)
    lambdaId, accountId = "your_lambda_id", "your_account_id"  # Replace with your values before starting
    
    
    def makeRequest():
        reply = lambdaApi.proxyLambdaPost(lambdaId=lambdaId, path="main", accountId=accountId)
        return reply
    
    
    if __name__ == "__main__":
        response = makeRequest()
        print(response.json)
    

    For a tasks lambda, send points using the self.sendToMonitoring function:

    To run lambda tasks example, refer to the task processing description here.

    lambda_main.py
    import asyncio
    from time import time
    
    from luna_lambda_tools.public.tasks import BaseLambdaTask
    from monitoring_points import TestMonitoringPoint
    
    
    class LambdaTask(BaseLambdaTask):
        """Lambda task"""
    
        async def splitTasksContent(self, content: dict) -> list[dict]:
            """Split task content to sub task contents"""
            stt = time()
            # do some logic
            await asyncio.sleep(0.5)
            self.sendToMonitoring(
                (
                    TestMonitoringPoint(
                        pointTags={"lambda_type": "tasks", "target": "split_content"},
                        pointFields={"execution_time": time() - stt},
                    ),
                )
            )
            return [content]
    
        async def executeSubtask(self, subtaskContent: dict) -> dict | list:
            """Execute current sub task processing"""
            stt = time()
            # do some logic
            await asyncio.sleep(0.5)
            self.sendToMonitoring(
                (
                    TestMonitoringPoint(
                        pointTags={"lambda_type": "tasks", "target": "execute_subtask"},
                        pointFields={"execution_time": time() - stt},
                    ),
                )
            )
            return {"result": "Some lambda-tasks result"}
    
    request example
    from luna3.tasks.tasks import TasksApi
    
    TASKS_ORIGIN = "http://tasks_address:tasks_port"  # Replace with your values before starting
    TASKS_API_VERSION = 2
    tasksApi = TasksApi(origin=TASKS_ORIGIN, api=TASKS_API_VERSION)
    lambdaId, accountId = "your_lambda_id", "your_account_id"  # Replace with your values before starting
    
    
    def makeRequest():
        reply = tasksApi.taskLambda(content={"lambda_id": lambdaId}, accountId=accountId)
        return reply
    
    
    if __name__ == "__main__":
        response = makeRequest()
        print(response.json)
    

Database

You can refer to the documentation for the InfluxDB database and the ClickHouse database to compare them and choose the one that better fits your needs. Note that ClickHouse might be the better choice for aggregation. You can set up your database credentials in the “monitoring” section of the configuration file.