---
title: Monitoring Service Posting Slack Alerts
---
The [Slack alerts example](https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py)
demonstrates how to use the `Monitor` class from `clearml.automation.monitor` to implement a service that monitors the completion and
failure of tasks, and posts alert messages on a Slack channel.
## Creating a Slack Bot
Before configuring and running the Slack alert service, create a new Slack Bot (**ClearML Bot**).
:::important
The Slack API token and channel you create are required to configure the Slack alert service.
:::
1. Log in to your Slack account.
1. Go to [https://api.slack.com/apps/new](https://api.slack.com/apps/new).
1. In **App Name**, enter an app name; for example, "ClearML Bot".
1. In **Development Slack Workspace**, select a workspace.
1. Click **Create App**.
1. In **Basic Information**, under **Display Information**, complete the following:
    - In **Short description**, enter "Allegro Train Bot".
    - In **Background color**, enter "#202432".
1. Click **Save Changes**.
1. In **OAuth & Permissions**, under **Scopes**, click **Add an OAuth Scope**, and then select the following permissions
   on the list:
    * **channels:join**
    * **channels:read**
    * **chat:write**
1. In **OAuth Tokens & Redirect URLs**:
    1. Click **Install App to Workspace**
    1. In the confirmation dialog, click **Allow**.
    1. Click **Copy** to copy the **Bot User OAuth Access Token**.
## Running the Script
:::info Self-deployed ClearML server
A template `Slack Alerts` task is available in the `DevOps Services` project. You can clone it, adapt its [configuration](#configuration)
to your needs, and enqueue it for execution directly from the ClearML UI.
:::
Run the monitoring service in one of these ways:
* Run locally
* Run in ClearML Agent [services mode](../../clearml_agent.md#services-mode)
To run the monitoring service:
```bash
python slack_alerts.py --channel <Slack-channel-name> --slack_api <Slack-API-token> --local True [...]
```
* `channel` - The Slack channel where alerts will be posted.
* `slack_api` - The Slack API token created for the bot.
* `local` - If `True`, run the monitoring service locally. If `False`, enqueue the task in the queue specified by
  `service_queue` (by default, the `services` queue) for remote execution.
The script supports the following additional command line options:
* `service_queue` - The queue to use when running remotely as a service. The default value is `services` (make sure
  your workspace has such a queue, and that a ClearML Agent is assigned to it).
* `message_prefix` - A message prefix for Slack alerts. For example, to alert all channel members use: "Hey <!here>".
* `min_num_iterations` - Minimum iteration threshold below which experiments are ignored. Use this option to eliminate
  debug sessions that fail quickly. The default value is 0.
* `include_manual_experiments` - Whether to include experiments that are running locally:
    * `True` (default) - Monitor all experiments (both those run locally and those executed remotely by ClearML Agent).
    * `False` - Monitor only remote experiments.
* `include_completed_experiments` - If `False` (default), send alerts only for failed tasks. If `True`, send alerts
  for both completed and failed tasks.
* `refresh_rate` - How often, in seconds, to poll the experiments. The default value is 10.0.
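
As a rough illustration, the options above could be declared with `argparse` along the following lines. This is only a sketch that mirrors the option names and defaults documented on this page. The `str2bool` helper, the `build_parser` function, and the `False` default for `--local` are assumptions made for the example, and the actual `slack_alerts.py` may declare its arguments differently.

```python
import argparse


def str2bool(value):
    """Hypothetical helper: parse 'True'/'False' style strings into booleans."""
    if isinstance(value, bool):
        return value
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_parser():
    """Hypothetical parser using the option names and defaults listed above."""
    parser = argparse.ArgumentParser(description="Slack alerts monitor (sketch)")
    parser.add_argument("--channel", type=str, help="Slack channel for alerts")
    parser.add_argument("--slack_api", type=str, help="Slack API token")
    # The default for --local is not documented here; False is an assumption
    parser.add_argument("--local", type=str2bool, default=False)
    parser.add_argument("--service_queue", type=str, default="services")
    parser.add_argument("--message_prefix", type=str, default=None)
    parser.add_argument("--min_num_iterations", type=int, default=0)
    parser.add_argument("--include_manual_experiments", type=str2bool, default=True)
    parser.add_argument("--include_completed_experiments", type=str2bool, default=False)
    parser.add_argument("--refresh_rate", type=float, default=10.0)
    return parser


# Parse a sample command line (the token is a placeholder, not a real value)
args = build_parser().parse_args(
    ["--channel", "alerts", "--slack_api", "xoxb-placeholder", "--local", "True"]
)
print(args.local, args.service_queue, args.refresh_rate)
```

Options that are not passed on the command line fall back to the defaults listed above.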
## Configuration
ClearML automatically logs command line options defined with argparse. They appear in the experiment's **CONFIGURATION**
page under **HYPER PARAMETERS > Args**.
![Monitoring configuration](../../img/examples_slack_config.png)
The task can be reused to launch another monitor instance: clone the task, edit its parameters, and enqueue the task for
execution (you'll typically want to use a ClearML Agent running in [services mode](../../clearml_agent.md#services-mode)
for such service tasks).
## Console
All console output appears in the experiment's **RESULTS > CONSOLE** page.
## Additional Information about slack_alerts.py
In `slack_alerts.py`, the class `SlackMonitor` inherits from the `Monitor` class in `clearml.automation.monitor`.
`SlackMonitor` overrides the following `Monitor` class methods:
* `get_query_parameters` - Get the query parameters for Task monitoring.
* `process_task` - Get the information for a Task, post a Slack message, and output to console.
    * Allows skipping failed Tasks if a Task ran for only a few iterations. Calls [`Task.get_last_iteration`](../../references/sdk/task.md#get_last_iteration)
      to get the number of iterations.
    * Builds the Slack message, which includes the most recent output to the console (retrieved by calling [`Task.get_reported_console_output`](../../references/sdk/task.md#get_reported_console_output)),
      and the URL of the Task's output log in the ClearML Web UI (retrieved by calling [`Task.get_output_log_web_page`](../../references/sdk/task.md#get_output_log_web_page)).
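
The filtering and message-building steps described for `process_task` can be sketched in plain Python as shown below. This is an illustrative sketch, not the code in `slack_alerts.py`: `should_alert` and `build_message` are hypothetical helpers, the status strings `"failed"` and `"completed"` and the message layout are assumptions, and the iteration count, console lines, and log URL are passed in as plain values rather than fetched through the `Task` methods named above.

```python
def should_alert(status, last_iteration, min_num_iterations=0,
                 include_completed_experiments=False):
    """Hypothetical filter: decide whether a finished task warrants an alert."""
    if status not in ("failed", "completed"):
        return False
    if status == "completed" and not include_completed_experiments:
        return False
    # Ignore tasks that stopped below the iteration threshold (quick debug runs)
    return last_iteration >= min_num_iterations


def build_message(task_name, status, console_lines, log_url,
                  message_prefix=None, max_lines=3):
    """Hypothetical builder: combine prefix, status, console tail, and log URL."""
    prefix = (message_prefix + " ") if message_prefix else ""
    # Keep only the most recent console lines (max_lines is assumed positive)
    tail = "\n".join(console_lines[-max_lines:])
    return '{}Experiment "{}" {}\nConsole output:\n{}\nLog: {}'.format(
        prefix, task_name, status, tail, log_url
    )


if should_alert("failed", last_iteration=120, min_num_iterations=10):
    print(build_message(
        "train_model", "failed",
        ["epoch 3 started", "loss is NaN", "Traceback ..."],
        "https://example.invalid/log",  # placeholder URL
        message_prefix="Hey <!here>",
    ))
```

Keeping the decision and the formatting separate from the Slack call makes both easy to check without a Slack workspace.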
The example provides the option to run locally or execute remotely by calling the [`Task.execute_remotely`](../../references/sdk/task.md#execute_remotely)
method.
To interface with Slack, the example uses `slack.WebClient` and `slack.errors.SlackApiError`.
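
A minimal sketch of how a message might be posted with such a client is shown below. It assumes a client object that exposes `chat_postMessage(channel=..., text=...)`, as `slack.WebClient` does. The `post_alert` function is a hypothetical helper, and the import fallback exists only so that the sketch runs where the Slack client package is not installed.

```python
try:
    # Import path named in the example (requires the Slack client package)
    from slack.errors import SlackApiError
except ImportError:
    # Fallback so this sketch still runs without the package installed
    SlackApiError = Exception


def post_alert(client, channel, text):
    """Hypothetical helper: post text to a channel.

    Returns True on success and False if the Slack API reports an error.
    """
    try:
        client.chat_postMessage(channel=channel, text=text)
        return True
    except SlackApiError as ex:
        print("Failed to post Slack message: {}".format(ex))
        return False
```

In practice, `client` would typically be created as `slack.WebClient(token=<Slack-API-token>)` and `channel` would be the channel passed on the command line.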