Conductor Workflow and Task Metric Tracking and Reporting

Introduction

Metrics and reports on workflow execution can aid in developing initial duration estimates for modeled processes and forecasting execution times for real systems in production. Additionally, these metrics can be used to monitor product improvements over time.

The Conductor Manager pod tracks execution times for both workflows and tasks and reports them as a new event type, which is stored in the PostgreSQL database. The metrics are exposed through the REST API as a new metric resource, so upper-layer services can use them for execution estimates and schedule planning. This feedback loop enables more accurate execution planning for the Upgrade Operator and for end users planning specific system maintenance schedules.

Metrics Schematic

The following metrics are collected:

Average execution time of each workflow.
Success rate of each workflow.
Average execution time of each task.
Success rate of each task.
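
As a rough illustration of how the four metrics above can be derived, the following sketch aggregates a list of execution records into an average duration and a success rate per name. The record fields (name, duration_seconds, succeeded) are assumptions made for this example and are not the Conductor data model; the same function applies to workflows and to tasks.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ExecutionRecord:
    # Hypothetical record shape, not the Conductor data model.
    name: str                # workflow or task name
    duration_seconds: float  # wall-clock execution time
    succeeded: bool          # True if the execution finished successfully


def summarize(records):
    """Return {name: (average_duration_seconds, success_rate)}."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.name].append(record)

    summary = {}
    for name, items in grouped.items():
        average = sum(r.duration_seconds for r in items) / len(items)
        success_rate = sum(1 for r in items if r.succeeded) / len(items)
        summary[name] = (average, success_rate)
    return summary


if __name__ == "__main__":
    sample = [
        ExecutionRecord("install", 16.0, True),
        ExecutionRecord("install", 20.0, False),
        ExecutionRecord("audit_certificates", 96.0, True),
    ]
    for name, (average, rate) in summarize(sample).items():
        print(f"{name}: average {average:.2f} s, success rate {rate:.0%}")
```

In this sketch failed executions count towards the average duration. Whether the real service includes or excludes them is not stated here, so treat that as a design choice of the example.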

Configuration

The metrics acquisition and cleanup services are configured through a Kubernetes ConfigMap. The following settings control how the metrics are handled:

metrics_collection: Whether metrics should be collected. Defaults to True.
metrics_cleanup: Whether metrics should be periodically cleaned. Defaults to True.
metrics_cleanup_periodicity: How often the cleanup runs, in crontab syntax. Defaults to once a day at 01:00 (0 1 * * *).
metrics_cleanup_age: Minimum age a metric must reach before the cleanup service removes it. Supports time intervals in days ("1d"), months ("1m") or years ("1y"). Defaults to 1 year.
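
To make the metrics_cleanup_age format concrete, the sketch below converts a value such as "1d", "1m" or "1y" into a cutoff timestamp. It is not the cleanup service's implementation: the function names are hypothetical, and treating a month as 30 days and a year as 365 days is an assumption of this example.

```python
import re
from datetime import datetime, timedelta

# Approximate day counts per unit (assumption of this sketch);
# calendar-exact months and years are deliberately ignored.
_DAYS_PER_UNIT = {"d": 1, "m": 30, "y": 365}


def parse_cleanup_age(value):
    """Convert a string such as "1d", "6m" or "1y" into a timedelta."""
    match = re.fullmatch(r"([0-9]+)([dmy])", value.strip())
    if match is None:
        raise ValueError(f"unsupported metrics_cleanup_age: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(days=amount * _DAYS_PER_UNIT[unit])


def cleanup_cutoff(value, now):
    """Return the datetime before which metrics are old enough to clean."""
    return now - parse_cleanup_age(value)


if __name__ == "__main__":
    print(cleanup_cutoff("1y", datetime.now()))
```

A metric whose timestamp is earlier than the returned cutoff would be eligible for removal under these assumptions.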

ConfigMap
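
The actual ConfigMap is not reproduced here. As a hedged sketch only, the following script builds a ConfigMap manifest carrying the four settings with their documented defaults and prints it as JSON. The ConfigMap name, the namespace, the flat key layout under data, and the "1y" spelling of the one-year default are all assumptions of this example.

```python
import json

# Documented defaults. ConfigMap values must be strings.
# The "1y" spelling of the one-year default is an assumption.
METRICS_SETTINGS = {
    "metrics_collection": "True",
    "metrics_cleanup": "True",
    "metrics_cleanup_periodicity": "0 1 * * *",
    "metrics_cleanup_age": "1y",
}


def build_configmap(name, namespace, settings):
    """Build a Kubernetes ConfigMap manifest as a plain dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": dict(settings),
    }


if __name__ == "__main__":
    # "conductor-metrics-config" and "conductor" are placeholder names.
    manifest = build_configmap(
        "conductor-metrics-config", "conductor", METRICS_SETTINGS
    )
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests, the printed output could be saved to a file and applied with kubectl apply -f once the placeholder names are replaced with the real ones.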

CLI

$ cfy metric list -h
Usage: cfy metric list [OPTIONS]
 
  Display metrics for a deployment
 
Options:
  -d, --deployment-id TEXT        The unique identifier for the deployment
  -b, --blueprint-id TEXT         The unique identifier for the blueprint
  -e, --execution-id TEXT         The unique identifier for the execution
  -w, --workflow-id TEXT          The workflow to execute [default: None]
  -mn, --metric-name TEXT         The name of the resource
  --json-output                   Output events in a consumable JSON format
  -q, --quiet                     Show only critical logs
  -v, --verbose                   Show verbose output. You can supply this up
                                  to three times (i.e. -vvv)
  --format [plain|json]
  --json
  --manager TEXT                  Connect to a specific manager by IP or host
  -o, --pagination-offset INTEGER
                                  The number of resources to skip;
                                  --pagination-offset=1 skips the first
                                  resource [default: 0]
  -s, --pagination-size INTEGER   The max number of results to retrieve per
                                  page [default: 1000]
  -h, --help                      Show this message and exit.


$ cfy metric list
  
Listing all metrics...
 
Metrics:
+----+--------------------------+---------------------------+-----------+--------------+--------------------------------------+-------------------------------+--------------------------------------+
| id |        timestamp         |            name           |   value   | blueprint_id |            deployment_id             |          workflow_id          |             execution_id             |
+----+--------------------------+---------------------------+-----------+--------------+--------------------------------------+-------------------------------+--------------------------------------+
| 6  | 2024-08-14 19:32:49.354  | workflow_seconds_duration |  0.29907  |  blueprint   | 094aa6fb-c68a-4339-a224-31ce85636618 | create_deployment_environment | 26f1145f-3ff0-4910-9313-7634eba8d42f |
| 7  | 2024-08-14 19:33:07.186  | workflow_seconds_duration | 16.865752 |  blueprint   | 094aa6fb-c68a-4339-a224-31ce85636618 |            install            | b50c4a58-6a5d-469b-9617-45caf194389d |
+----+--------------------------+---------------------------+-----------+--------------+--------------------------------------+-------------------------------+--------------------------------------+
 
Showing 2 of 2 metric(s)
Debug messages are only shown when you use very verbose mode (-vv)
 
$ cfy metric list -d f1eb025e-5590-4b4e-91ee-1def5bdb72c9
 
Listing metrics for deployment f1eb025e-5590-4b4e-91ee-1def5bdb72c9...
 
Metrics:
+----+--------------------------+---------------------------+-----------+--------------+--------------------------------------+--------------------+--------------------------------------+
| id |        timestamp         |            name           |   value   | blueprint_id |            deployment_id             |    workflow_id     |             execution_id             |
+----+--------------------------+---------------------------+-----------+--------------+--------------------------------------+--------------------+--------------------------------------+
| 8  | 2024-08-15 18:48:56.162  | workflow_seconds_duration | 95.981168 |  blueprint   | f1eb025e-5590-4b4e-91ee-1def5bdb72c9 | audit_certificates | 97ca778a-6753-4137-857d-d3a1ce48ff8c |
+----+--------------------------+---------------------------+-----------+--------------+--------------------------------------+--------------------+--------------------------------------+
 
Showing 1 of 1 metric(s)
Debug messages are only shown when you use very verbose mode (-vv)
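
For quick scripting against the table output shown above, the sketch below parses such a table into a list of dictionaries keyed by column name. The help text lists --json and --format options, which are likely a better fit for automation, but their output schema is not shown here, so this example works on the plain table instead. The function name and the trimmed sample (fewer columns, for brevity) are specific to this example.

```python
def parse_table(text):
    """Parse an ASCII table like the ones above into a list of row dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # skip "+---+" borders and surrounding messages
        # Splitting on "|" yields empty strings at both ends; drop them.
        rows.append([cell.strip() for cell in line.split("|")[1:-1]])
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


# Sample rows taken from the listing above, with some columns omitted.
SAMPLE = """\
Metrics:
+----+---------------------------+-----------+-------------------------------+
| id |            name           |   value   |          workflow_id          |
+----+---------------------------+-----------+-------------------------------+
| 6  | workflow_seconds_duration |  0.29907  | create_deployment_environment |
| 7  | workflow_seconds_duration | 16.865752 |            install            |
+----+---------------------------+-----------+-------------------------------+

Showing 2 of 2 metric(s)
"""

if __name__ == "__main__":
    for row in parse_table(SAMPLE):
        if row["name"] == "workflow_seconds_duration":
            print(row["workflow_id"], float(row["value"]))
```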