Metrics

A tool for tracking service metrics, available at the /metrics path.

Introduction

The manager collects and stores metrics as time-series data, which can be used to track service behavior.

The /metrics route and the statistics collectors are disabled by default. To enable both, set the LUNA_SERVICE_METRICS.ENABLED parameter to 1 via the configurator or the configuration file. Note that all metrics data is reset when the service shuts down.

Metric types

The following counter metrics are provided by default:
  • request_count_total

  • request_errors_total

Each of them has at least two labels for grouping and filtering:
  • status_code (or error_code for error metrics)

  • path (which consists of the request method and endpoint route)

Note

To add extra labels, specify them in LUNA_SERVICE_METRICS.EXTRA_LABELS: as label_name=label_value pairs in the configuration file, or as {"label_name": "label_value"} in the configurator. Each specified pair is added to every metric of the running app.

The manager distributes all requests passing through the service among counters using these labels. As a result, two successful requests sent to different endpoints, or to the same endpoint but with different status codes, are counted in different time series.

Note

Failed requests are counted in both the request_count_total and request_errors_total metrics.
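The label-based distribution described above can be sketched in a few lines. This is a minimal illustration only, not the service's implementation: the standard-library Counter stands in for the Prometheus counters, and the record_request helper is a hypothetical name introduced for the example.

```python
from collections import Counter

# Hypothetical stand-ins for the two default counters; the real service
# keeps them in its Prometheus registry.
request_count_total = Counter()
request_errors_total = Counter()


def record_request(method, route, status_code, error_code=None):
    """Count one finished request under its label combination."""
    path = f"{method}:{route}"  # the path label joins method and route
    request_count_total[(path, str(status_code))] += 1
    if error_code is not None:
        # A failed request is also counted in the error metric.
        request_errors_total[(path, str(error_code))] += 1


record_request("GET", "/healthcheck", 200)
record_request("GET", "/docs/spec", 200)
record_request("GET", "/docs/spec", 200)
record_request("GET", "/docs/spec", 301)
record_request("POST", "/handlers", 401, error_code=12010)

for (path, status_code), value in sorted(request_count_total.items()):
    print(f'request_count_total{{path="{path}",status_code="{status_code}"}} {float(value)}')
```

Each distinct (path, status_code) pair gets its own counter value, and the failed POST request appears in both dictionaries.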

The following histogram metric is provided by default:
  • requests: tracks request durations

Request histogram buckets:
  • 0.0001

  • 0.00025

  • 0.0005

  • 0.001

  • 0.0025

  • 0.005

  • 0.01

  • 0.025

  • 0.05

  • 0.075

  • 0.1

  • 0.25

  • 0.5

  • 0.75

  • 1.0

  • 2.5

  • 5.0

  • 7.5

  • 10.0

  • Inf

The request histogram has two labels for grouping and filtering:
  • status_code

  • route
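Prometheus histogram buckets are cumulative: each bucket counts every observation less than or equal to its upper bound. The sketch below shows how a set of durations maps onto the buckets listed above. The cumulative_buckets helper and the sample durations are hypothetical and only illustrate the counting rule.

```python
from bisect import bisect_left

# Upper bounds from the list above; Inf is the final catch-all bucket.
BUCKETS = [0.0001, 0.00025, 0.0005, 0.001, 0.0025, 0.005, 0.01, 0.025,
           0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0,
           float("inf")]


def cumulative_buckets(durations):
    """Return [(upper_bound, count)] where count = observations <= bound."""
    per_bucket = [0] * len(BUCKETS)
    for duration in durations:
        # Index of the first bucket whose upper bound is >= the observation.
        per_bucket[bisect_left(BUCKETS, duration)] += 1
    running, result = 0, []
    for bound, count in zip(BUCKETS, per_bucket):
        running += count
        result.append((bound, running))
    return result


# Two hypothetical request durations.
for bound, count in cumulative_buckets([0.0009, 0.0023]):
    print(f'requests_bucket{{le="{bound}"}} {float(count)}')
```

Because the counts are cumulative, the last bucket always equals the total number of observations.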

Metrics format

Prometheus is currently the only supported format for collecting and storing metrics (the LUNA_SERVICE_METRICS.METRICS_FORMAT property). Refer to the Prometheus documentation for the wide range of query functions it offers.

For example:

rate(request_count_total{path="GET:/healthcheck"}[5m])

returns the average number of requests per second to "/healthcheck" over the last 5 minutes.
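The arithmetic behind such a query can be approximated as the counter increase divided by the elapsed time. The sketch below is a simplification, not the Prometheus algorithm: it treats a drop in the counter as a reset (which happens here when the service restarts) but does not extrapolate to the window boundaries as Prometheus does. The average_rate helper and the sample values are hypothetical.

```python
def average_rate(samples):
    """Per-second increase of a counter across (timestamp, value) samples."""
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset, so the whole current
        # value counts as new increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical scrapes of request_count_total{path="GET:/healthcheck"},
# taken once per minute over five minutes: (seconds, counter value).
samples = [(0, 100.0), (60, 160.0), (120, 220.0),
           (180, 280.0), (240, 340.0), (300, 400.0)]
print(average_rate(samples))  # 1.0
```

Here the counter grows by 300 over 300 seconds, which gives one request per second.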

Examples

If you send one request to /healthcheck, followed by three requests to /docs/spec, one of which is redirected, the response.text from the /metrics route will look as follows:

# HELP request_count_total Counter of requests
# TYPE request_count_total counter
request_count_total{path="GET:/docs/spec",status_code="200"} 2.0
request_count_total{path="GET:/docs/spec",status_code="301"} 1.0
request_count_total{path="GET:/healthcheck",status_code="200"} 1.0
# HELP requests Histogram of request time metrics
# TYPE requests histogram
requests_sum{route="GET:/docs/spec",status_code="200"} 0.003174567842297907
requests_bucket{le="0.0001",route="GET:/docs/spec",status_code="200"} 0.0
requests_bucket{le="0.00025",route="GET:/docs/spec",status_code="200"} 0.0
requests_bucket{le="0.0005",route="GET:/docs/spec",status_code="200"} 0.0
requests_bucket{le="0.001",route="GET:/docs/spec",status_code="200"} 1.0
requests_bucket{le="0.0025",route="GET:/docs/spec",status_code="200"} 2.0
requests_bucket{le="0.005",route="GET:/docs/spec",status_code="200"} 2.0
requests_bucket{le="0.01",route="GET:/docs/spec",status_code="200"} 2.0
requests_bucket{le="0.025",route="GET:/docs/spec",status_code="200"} 2.0
requests_bucket{le="0.05",route="GET:/docs/spec",status_code="200"} 2.0
requests_bucket{le="0.075",route="GET:/docs/spec",status_code="200"} 2.0
requests_bucket{le="0.1",route="GET:/docs/spec",status_code="200"} 2.0
requests_bucket{le="0.25",route="GET:/docs/spec",status_code="200"} 2.0
requests_bucket{le="0.5",route="GET:/docs/spec",status_code="200"} 2.0
requests_bucket{le="0.75",route="GET:/docs/spec",status_code="200"} 2.0
requests_bucket{le="1.0",route="GET:/docs/spec",status_code="200"} 2.0
requests_bucket{le="2.5",route="GET:/docs/spec",status_code="200"} 2.0
requests_bucket{le="5.0",route="GET:/docs/spec",status_code="200"} 2.0
requests_bucket{le="7.5",route="GET:/docs/spec",status_code="200"} 2.0
requests_bucket{le="10.0",route="GET:/docs/spec",status_code="200"} 2.0
requests_bucket{le="+Inf",route="GET:/docs/spec",status_code="200"} 2.0
requests_count{route="GET:/docs/spec",status_code="200"} 2.0
requests_sum{route="GET:/docs/spec",status_code="301"} 0.002381476051209132
requests_bucket{le="0.0001",route="GET:/docs/spec",status_code="301"} 0.0
requests_bucket{le="0.00025",route="GET:/docs/spec",status_code="301"} 0.0
requests_bucket{le="0.0005",route="GET:/docs/spec",status_code="301"} 0.0
requests_bucket{le="0.001",route="GET:/docs/spec",status_code="301"} 0.0
requests_bucket{le="0.0025",route="GET:/docs/spec",status_code="301"} 0.0
requests_bucket{le="0.005",route="GET:/docs/spec",status_code="301"} 0.0
requests_bucket{le="0.01",route="GET:/docs/spec",status_code="301"} 1.0
requests_bucket{le="0.025",route="GET:/docs/spec",status_code="301"} 1.0
requests_bucket{le="0.05",route="GET:/docs/spec",status_code="301"} 1.0
requests_bucket{le="0.075",route="GET:/docs/spec",status_code="301"} 1.0
requests_bucket{le="0.1",route="GET:/docs/spec",status_code="301"} 1.0
requests_bucket{le="0.25",route="GET:/docs/spec",status_code="301"} 1.0
requests_bucket{le="0.5",route="GET:/docs/spec",status_code="301"} 1.0
requests_bucket{le="0.75",route="GET:/docs/spec",status_code="301"} 1.0
requests_bucket{le="1.0",route="GET:/docs/spec",status_code="301"} 1.0
requests_bucket{le="2.5",route="GET:/docs/spec",status_code="301"} 1.0
requests_bucket{le="5.0",route="GET:/docs/spec",status_code="301"} 1.0
requests_bucket{le="7.5",route="GET:/docs/spec",status_code="301"} 1.0
requests_bucket{le="10.0",route="GET:/docs/spec",status_code="301"} 1.0
requests_bucket{le="+Inf",route="GET:/docs/spec",status_code="301"} 1.0
requests_count{route="GET:/docs/spec",status_code="301"} 1.0
requests_sum{route="GET:/healthcheck",status_code="200"} 0.005269583023618907
requests_bucket{le="0.0001",route="GET:/healthcheck",status_code="200"} 0.0
requests_bucket{le="0.00025",route="GET:/healthcheck",status_code="200"} 0.0
requests_bucket{le="0.0005",route="GET:/healthcheck",status_code="200"} 0.0
requests_bucket{le="0.001",route="GET:/healthcheck",status_code="200"} 0.0
requests_bucket{le="0.0025",route="GET:/healthcheck",status_code="200"} 0.0
requests_bucket{le="0.005",route="GET:/healthcheck",status_code="200"} 0.0
requests_bucket{le="0.01",route="GET:/healthcheck",status_code="200"} 1.0
requests_bucket{le="0.025",route="GET:/healthcheck",status_code="200"} 1.0
requests_bucket{le="0.05",route="GET:/healthcheck",status_code="200"} 1.0
requests_bucket{le="0.075",route="GET:/healthcheck",status_code="200"} 1.0
requests_bucket{le="0.1",route="GET:/healthcheck",status_code="200"} 1.0
requests_bucket{le="0.25",route="GET:/healthcheck",status_code="200"} 1.0
requests_bucket{le="0.5",route="GET:/healthcheck",status_code="200"} 1.0
requests_bucket{le="0.75",route="GET:/healthcheck",status_code="200"} 1.0
requests_bucket{le="1.0",route="GET:/healthcheck",status_code="200"} 1.0
requests_bucket{le="2.5",route="GET:/healthcheck",status_code="200"} 1.0
requests_bucket{le="5.0",route="GET:/healthcheck",status_code="200"} 1.0
requests_bucket{le="7.5",route="GET:/healthcheck",status_code="200"} 1.0
requests_bucket{le="10.0",route="GET:/healthcheck",status_code="200"} 1.0
requests_bucket{le="+Inf",route="GET:/healthcheck",status_code="200"} 1.0
requests_count{route="GET:/healthcheck",status_code="200"} 1.0
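Output like the above can be post-processed directly, for example to derive the mean request duration per label combination as requests_sum divided by requests_count. The following is a minimal sketch using only the standard library. The parse_samples and mean_durations helpers are hypothetical, and the regular expressions assume simple label values without escaped quotes, as in the output shown here.

```python
import re

# Matches lines such as: name{label="value",...} 1.0
SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each labelled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            yield name, dict(LABEL_RE.findall(raw_labels)), float(value)


def mean_durations(text):
    """Map (route, status_code) to requests_sum / requests_count."""
    sums, counts = {}, {}
    for name, labels, value in parse_samples(text):
        key = (labels.get("route"), labels.get("status_code"))
        if name == "requests_sum":
            sums[key] = value
        elif name == "requests_count":
            counts[key] = value
    return {key: sums[key] / counts[key] for key in sums if counts.get(key)}


# Two lines copied from the output above.
sample_text = '''\
requests_sum{route="GET:/docs/spec",status_code="200"} 0.003174567842297907
requests_count{route="GET:/docs/spec",status_code="200"} 2.0
'''
print(mean_durations(sample_text))
```

In practice, sample_text would be the response.text returned by the /metrics route.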

If you send one invalid POST request to the /handlers route, the output is as follows:

# HELP request_count_total Counter of requests
# TYPE request_count_total counter
request_count_total{path="POST:/handlers",status_code="401"} 1.0
# HELP request_errors_total Counter of request errors
# TYPE request_errors_total counter
request_errors_total{error_code="12010",path="POST:/handlers"} 1.0
# HELP requests Histogram of request time metrics
# TYPE requests histogram
requests_sum{route="POST:/handlers",status_code="401"} 0.004234567845298518
requests_bucket{le="0.0001",route="POST:/handlers",status_code="401"} 0.0
requests_bucket{le="0.00025",route="POST:/handlers",status_code="401"} 0.0
requests_bucket{le="0.0005",route="POST:/handlers",status_code="401"} 0.0
requests_bucket{le="0.001",route="POST:/handlers",status_code="401"} 0.0
requests_bucket{le="0.0025",route="POST:/handlers",status_code="401"} 1.0
requests_bucket{le="0.005",route="POST:/handlers",status_code="401"} 1.0
requests_bucket{le="0.01",route="POST:/handlers",status_code="401"} 1.0
requests_bucket{le="0.025",route="POST:/handlers",status_code="401"} 1.0
requests_bucket{le="0.05",route="POST:/handlers",status_code="401"} 1.0
requests_bucket{le="0.075",route="POST:/handlers",status_code="401"} 1.0
requests_bucket{le="0.1",route="POST:/handlers",status_code="401"} 1.0
requests_bucket{le="0.25",route="POST:/handlers",status_code="401"} 1.0
requests_bucket{le="0.5",route="POST:/handlers",status_code="401"} 1.0
requests_bucket{le="0.75",route="POST:/handlers",status_code="401"} 1.0
requests_bucket{le="1.0",route="POST:/handlers",status_code="401"} 1.0
requests_bucket{le="2.5",route="POST:/handlers",status_code="401"} 1.0
requests_bucket{le="5.0",route="POST:/handlers",status_code="401"} 1.0
requests_bucket{le="7.5",route="POST:/handlers",status_code="401"} 1.0
requests_bucket{le="10.0",route="POST:/handlers",status_code="401"} 1.0
requests_bucket{le="+Inf",route="POST:/handlers",status_code="401"} 1.0
requests_count{route="POST:/handlers",status_code="401"} 1.0