Lambda monitoring

Data for monitoring

Two types of events are monitored: requests and errors. The first series covers all requests; the second covers failed requests only.

Comparison of data formats for ClickHouse and InfluxDB:

InfluxDB: Each event is represented as a “point” in a time series. The structure of a point includes:

  • series name

  • event start time

  • tags: data indexed in storage; a dictionary whose keys are string tag names and whose values are strings, integers, or floats

  • fields: data not indexed in storage; a dictionary whose keys are string field names and whose values are strings, integers, or floats

ClickHouse: In ClickHouse, the data structure resembles that of a traditional SQL table. Each event is represented as a record, where:

  • The `time` field contains the record’s creation timestamp;

  • The `data` field contains a JSON object with all the information that would otherwise be distributed across tags and fields in InfluxDB.

Important: In ClickHouse there is no differentiation between “tags” and “fields”: all data is consolidated into a single JSON object within the `data` field.
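To make the distinction concrete, here is a minimal sketch in plain Python (the dictionaries below are illustrative, not an actual client API) showing how the same request event maps to each storage:

```python
import json
from datetime import datetime, timezone

# The same hypothetical request event in both representations.

# InfluxDB point: indexed tags are kept separate from non-indexed fields.
influx_point = {
    "series": "requests",
    "time": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "tags": {"service": "lambda-<lambda-id>", "route": "POST:/main", "status_code": 200},
    "fields": {"request_id": "1536751345,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f", "execution_time": 123.45},
}

# ClickHouse record: tags and fields are merged into one JSON object
# stored in the `data` column; only `time` stays as a separate column.
clickhouse_record = {
    "time": influx_point["time"],
    "data": json.dumps({**influx_point["tags"], **influx_point["fields"]}),
}
```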

Monitoring series

The structure and the meaning of each monitoring series remain consistent. However, for ClickHouse, the data from tags and fields is merged into a single JSON object under the `data` field. Below are examples for each series:

  • Requests series.

    Triggered on every request. Each point contains data about the corresponding request (execution time, etc.).

    InfluxDB:

    • tags

      • service: always “lambda-<lambda-id>”

      • route: concatenation of the request method and the request resource (e.g. POST:/main)

      • status_code: HTTP status code of the response

    • fields

      • request_id: request id

      • execution_time: request execution time

    Example of the ClickHouse JSON `data` field:

    {
        "service": "lambda-<lambda-id>",
        "route": "POST:/main",
        "status_code": 200,
        "request_id": "1536751345,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f",
        "execution_time": 123.45
    }
    
  • Errors series.

    Triggered on every failed request. Each point contains the error_code of the Luna error.

    InfluxDB:

    • tags

      • service: always “lambda-<lambda-id>”

      • route: concatenation of the request method and the request resource (e.g. POST:/main)

      • status_code: HTTP status code of the response

      • error_code: Luna error code

    • fields

      • request_id: request id

    Example of the ClickHouse JSON `data` field:

    {
        "service": "lambda-<lambda-id>",
        "route": "POST:/main",
        "status_code": 400,
        "error_code": 13037,
        "request_id": "1536751345,6a5c2191-3e9b-f5a4-fc45-3abf43625c5f"
    }
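Because ClickHouse keeps everything in one JSON object, consuming the stored monitoring data from code amounts to decoding the `data` column. A minimal sketch, assuming error-series rows have already been fetched as (time, data) pairs (the rows and the second error code below are made up for illustration):

```python
import json
from collections import Counter

# Hypothetical rows fetched from the ClickHouse errors series,
# each a (time, data) pair with `data` holding the JSON object.
rows = [
    ("2024-01-01T00:00:00", '{"service": "lambda-abc", "route": "POST:/main", "status_code": 400, "error_code": 13037, "request_id": "r1"}'),
    ("2024-01-01T00:00:05", '{"service": "lambda-abc", "route": "POST:/main", "status_code": 500, "error_code": 11001, "request_id": "r2"}'),
    ("2024-01-01T00:00:09", '{"service": "lambda-abc", "route": "POST:/main", "status_code": 400, "error_code": 13037, "request_id": "r3"}'),
]

# Decode the JSON payload of each record and count Luna error codes.
errors = [json.loads(data) for _, data in rows]
error_counts = Counter(event["error_code"] for event in errors)
print(error_counts.most_common())  # most frequent error codes first
```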
    

Custom Monitoring

For information about basic monitoring usage, see monitoring.

It is possible to create custom monitoring points: for example, to send data about how long it took to download or process images.

You can specify your own series, tags, and fields, but there will always be a mandatory “service” tag with the value “lambda-<lambda-id>”.

To add a custom monitoring point, follow these steps:

  1. Add file monitoring_points.py to lambda archive with the following content:

    monitoring_points.py
    from luna_lambda_tools.public.monitoring import CustomMonitoringPoint
    
    
    class TestMonitoringPoint(CustomMonitoringPoint):
        """Test monitoring point"""
    
        series = "test_monitoring"
    

    There are several rules for this file:

    • Any number of points may be defined.

    • Each point class may have any unique name.

    • Each point class must inherit from CustomMonitoringPoint.

    • Each point class must define a series attribute that specifies the monitoring series.
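    For illustration, a monitoring_points.py following all of the rules above might define two points. The CustomMonitoringPoint stub below only stands in for the real base class from luna_lambda_tools.public.monitoring so that the sketch is self-contained; in a real lambda archive you would import the real class, and the series names here are made up:

```python
# Stub standing in for luna_lambda_tools.public.monitoring.CustomMonitoringPoint,
# added only to keep this sketch self-contained; import the real class in a lambda.
class CustomMonitoringPoint:
    series = None

    def __init__(self, pointTags=None, pointFields=None):
        self.pointTags = pointTags or {}
        self.pointFields = pointFields or {}


class ImageDownloadPoint(CustomMonitoringPoint):
    """Hypothetical point for image download timings"""

    series = "image_download"


class ImageProcessingPoint(CustomMonitoringPoint):
    """Hypothetical point for image processing timings"""

    series = "image_processing"
```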

  2. Import the monitoring points from monitoring_points.py into lambda_main.py and specify tags and fields:

    There are several general rules for this file:

    • Enable monitoring. See the INFLUX_MONITORING setting in the basic principles of configuration.

    • If monitoring is unavailable, the points will not be sent, and no errors will be raised.

    • Specify tags by passing a dictionary of tags to the pointTags named argument.

    • Specify fields by passing a dictionary of fields to the pointFields named argument.

    Warning

    There are differences between the standalone, handlers, and tasks lambda monitoring mechanisms. See the descriptions below.

    For a standalone or handlers lambda, send points using the request.sendToMonitoring function:

    lambda_main.py
    import asyncio
    from time import time
    
    from luna_lambda_tools import StandaloneLambdaRequest
    from monitoring_points import TestMonitoringPoint
    
    
    async def main(request: StandaloneLambdaRequest) -> dict:
        # start execution time
        stt = time()
    
        # do some logic
        await asyncio.sleep(1)
    
        # send monitoring point with execution time
        request.sendToMonitoring(
            (TestMonitoringPoint(pointTags={"lambda_type": "standalone"}, pointFields={"execution_time": time() - stt}),)
        )
        return {"result": "lambda result"}
    
    request example
    from luna3.luna_lambda.luna_lambda import LambdaApi
    
    SERVER_ORIGIN = "http://lambda_address:lambda_port"  # Replace with your values before starting
    SERVER_API_VERSION = 1
    lambdaApi = LambdaApi(origin=SERVER_ORIGIN, api=SERVER_API_VERSION)
    lambdaId, accountId = "your_lambda_id", "your_account_id"  # Replace with your values before starting
    
    
    def makeRequest():
        reply = lambdaApi.proxyLambdaPost(lambdaId=lambdaId, path="main", accountId=accountId)
        return reply
    
    
    if __name__ == "__main__":
        response = makeRequest()
        print(response.json)
    

    For a tasks lambda, send points using the self.sendToMonitoring function:

    To run lambda tasks example, refer to the task processing description here.

    lambda_main.py
    import asyncio
    from time import time
    
    from luna_lambda_tools.public.tasks import BaseLambdaTask
    from monitoring_points import TestMonitoringPoint
    
    
    class LambdaTask(BaseLambdaTask):
        """Lambda task"""
    
        async def splitTasksContent(self, content: dict) -> list[dict]:
            """Split task content to sub task contents"""
            stt = time()
            # do some logic
            await asyncio.sleep(0.5)
            self.sendToMonitoring(
                (
                    TestMonitoringPoint(
                        pointTags={"lambda_type": "tasks", "target": "split_content"},
                        pointFields={"execution_time": time() - stt},
                    ),
                )
            )
            return [content]
    
        async def executeSubtask(self, subtaskContent: dict) -> dict | list:
            """Execute current sub task processing"""
            stt = time()
            # do some logic
            await asyncio.sleep(0.5)
            self.sendToMonitoring(
                (
                    TestMonitoringPoint(
                        pointTags={"lambda_type": "tasks", "target": "execute_subtask"},
                        pointFields={"execution_time": time() - stt},
                    ),
                )
            )
            return {"result": "Some lambda-tasks result"}
    
    request example
    from luna3.tasks.tasks import TasksApi
    
    TASKS_ORIGIN = "http://tasks_address:tasks_port"  # Replace with your values before starting
    TASKS_API_VERSION = 2
    tasksApi = TasksApi(origin=TASKS_ORIGIN, api=TASKS_API_VERSION)
    lambdaId, accountId = "your_lambda_id", "your_account_id"  # Replace with your values before starting
    
    
    def makeRequest():
        reply = tasksApi.taskLambda(content={"lambda_id": lambdaId}, accountId=accountId)
        return reply
    
    
    if __name__ == "__main__":
        response = makeRequest()
        print(response.json)
    

Database

You can refer to the documentation for the InfluxDB database and the ClickHouse database to compare them and choose the one that better fits your needs. Note that ClickHouse might be the better choice for aggregation. You can set up your database credentials in the “monitoring” section of the configuration file.