Edit services examples (#140)

pollfly 2021-12-23 14:02:45 +02:00 committed by GitHub
parent 2f37fd5030
commit 27b694eba5
5 changed files with 91 additions and 122 deletions


@ -2,67 +2,64 @@
title: Cleanup Service title: Cleanup Service
--- ---
The cleanup service deletes: The [cleanup service](https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py)
* Archived Tasks and their associated model checkpoints (snapshots) demonstrates how to use the `clearml.backend_api.session.client.APIClient` class to implement a service that deletes old
* Other artifacts archived tasks and their associated files: model checkpoints, other artifacts, and debug samples.
* Debug samples
The cleanup service can be configured with parameters specifying which Archived Tasks to delete and when to delete them. Modify the cleanup service's parameters to specify which archived experiments to delete and when to delete them.
Its Task name is `Cleanup Service` and it is associated with the project `DevOps`.
`Cleanup Service` can be configured in the **ClearML Web UI**, and then the Task can be enqueued for execution in the ### Running the Cleanup Service
[ClearML services mode](../../clearml_agent.md#services-mode).
It is pre-loaded in **ClearML Server** and its status is *Draft* (editable). Or, run the script [cleanup_service.py](https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py),
with options to run locally or as a service.
## Prerequisites :::info Self deployed ClearML server
A template `Cleanup Service` task is available in the `DevOps Services` project. You can clone it, adapt its [configuration](#configuration)
to your needs, and enqueue it for execution directly from the ClearML UI.
:::
* **ClearML Agent** is [installed and configured](../../clearml_agent.md#installation). Configure the task execution by modifying the `args` dictionary:
* **ClearML Agent** is launched in [services mode](../../clearml_agent.md#services-mode). * `delete_threshold_days` - Tasks older than this number of days will be deleted. The default value is 30 days.
* `cleanup_period_in_days` - Repeat the cleanup service at this interval, in days. The default value is 1.0 (run once a day).
* `force_delete` - If `False` (default), delete only Draft tasks. If `True`, allows deletion of tasks in any status.
* `run_as_service` - If `True` (default), the task will be enqueued for remote execution (default queue: "services"). Otherwise, the script will execute locally.
## Running the Cleanup Service :::note Remote Execution
If `run_as_service` is set to `True`, make sure a `clearml-agent` is assigned to the `services` queue.
:::
### Running Using the ClearML Web UI Now that the script is configured, execute it:
```bash
python cleanup_service.py
```
#### Step 1. Configuring the Cleanup Service A new task called `Cleanup Service` is created in the `DevOps` project on your ClearML server. The script output should
look similar to:
```console
ClearML Task: created new task id=8126c0af800f4903be07421aa344d7b3
ClearML results page: https://app.community.clear.ml/projects/608e9039/experiments/81261aa34d7b3/output/log
Cleanup service started
Starting cleanup
Deleting 100 tasks
```
1. In the **ClearML Web UI** **Projects** page, click the **DevOps** project **>** click the **Cleanup Service** Task. This is followed by details from the cleanup.
1. In the info panel, click the **CONFIGURATION** tab.
1. In the **GENERAL** section, hover over the parameter area **>** **EDIT**.
1. Configure the service parameters:
* **cleanup_period_in_days** - Repeat the cleanup service at this interval, in days. The default value is **1.0** (run once a day).
* **delete_threshold_days** - Tasks older than this number of days will be deleted. The default value is **30** days.
* **force_delete**
* **True** - Delete all Tasks older than **delete_threshold_days**.
* **False** - Delete only status **created** (*Draft*) Tasks. The default value is **False**.
* **run_as_service**
* **True** - Run the cleanup as a service (it repeats regularly).
* **False** - Run the Task once locally. The default value **False**.
#### Step 2. Enqueuing the cleanup service
* Right click the **Cleanup Service** Task **>** **Enqueue** **>** In the queue list, select **services** **>** **ENQUEUE**.
### Running Using the Script
The [cleanup_service.py](https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py) allows
to enqueue the cleanup service to run in **ClearML Agent** services mode, because the `run_as_service` parameter is set to `True`.
python cleanup_service.py
## The Cleanup Service Code ## The Cleanup Service Code
[cleanup_service.py](https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py) creates [cleanup_service.py](https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py) creates
a **ClearML** API client session to delete the Tasks. It creates an `APIClient` object that establishes a session with the an `APIClient` object that establishes a session with the ClearML Server, and accomplishes the cleanup by calling:
**ClearML** backend (**ClearML Server**), and accomplishes the cleanup by calling: * [`Tasks.get_all`](../../references/api/tasks.md#post-tasksget_all) to get a list of Tasks to delete, providing the following parameters:
* `Tasks.get_all` to get a list of Tasks to delete, providing the following parameters:
* `system_tags` - Get only Tasks tagged as `archived`. * `system_tags` - Get only Tasks tagged as `archived`.
* `only_fields` - Get only the Task `id`. Only the Task `id` is needed to delete Tasks and its output.
* `order_by` - Order the list of Tasks returned by last activity timestamp, in descending order (most recent first).
* `page_size` - Set the number of Tasks returned in each page (the last page may contain fewer results).
* `page` - Set the number of the page in the resulting list of Tasks to return.
* `status_changed` - Get Tasks whose last status change is older than the delete threshold (in seconds). * `status_changed` - Get Tasks whose last status change is older than the delete threshold (in seconds).
* [`Task.delete`](../../references/sdk/task.md#delete) - Delete a Task.
* `Tasks.delete` - Delete a Task, optionally forcing the deletion of a Task, even if its status is not *Draft*. ## Configuration
The experiment's hyperparameters are explicitly logged to ClearML using the [`Task.connect`](../../references/sdk/task.md#connect)
method. View them in the WebApp, in the experiment's **CONFIGURATION** page under **HYPER PARAMETERS > General**.
The task can be reused. Clone the task, edit its parameters, and enqueue the task to run in ClearML Agent [services mode](../../clearml_agent.md#services-mode).
![Cleanup service configuration](../../img/example_cleanup_configuration.png)
## Console
All console output appears in the experiment's **RESULTS > CONSOLE**.
![Cleanup service console](../../img/examples_cleanup_console.png)
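
The `Tasks.get_all` parameters listed above can be sketched as a small helper that builds the query for one page of results. This is a minimal illustration, not the code in `cleanup_service.py`: the helper name `build_archived_task_query`, the `-last_update` ordering field, and the `"<timestamp"` filter syntax are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def build_archived_task_query(delete_threshold_days, page=0, page_size=100, now=None):
    """Build keyword arguments for a paged query of old archived tasks.

    Hypothetical helper: the parameter names follow the list above, but the
    exact value formats are assumptions, not the documented API.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=delete_threshold_days)
    return {
        "system_tags": ["archived"],   # only archived tasks
        "only_fields": ["id"],         # the id is all that deletion needs
        "order_by": ["-last_update"],  # assumed field name; "-" = descending
        "page_size": page_size,
        "page": page,
        # assumed filter syntax: "<timestamp" = status changed before timestamp
        "status_changed": ["<{}".format(cutoff.strftime("%Y-%m-%d %H:%M:%S"))],
    }


query = build_archived_task_query(
    delete_threshold_days=30,
    now=datetime(2021, 12, 23, 12, 0, 0, tzinfo=timezone.utc),
)
print(query["status_changed"])

# With a configured ClearML server, the query could then be passed on, e.g.:
#   from clearml.backend_api.session.client import APIClient
#   tasks = APIClient().tasks.get_all(**query)
```

Keeping the query construction separate from the API call makes the threshold arithmetic easy to check without a server connection.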


@ -2,19 +2,9 @@
title: Monitoring Service Posting Slack Alerts title: Monitoring Service Posting Slack Alerts
--- ---
The Slack alerts example runs as a **ClearML** service, which monitors the completion and failure of Tasks, and posts alert The [Slack alerts example](https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py)
messages on a specified Slack channel. In the example, we configure the Slack details when creating a Slack bot, and set demonstrates how to use the `Monitor` class from `clearml.automation.monitor` to implement a service that monitors the completion and
parameters for monitoring. The Task name is `Slack Alerts`, and it is associated with the project `Monitoring`. failure of tasks, and posts alert messages on a Slack channel.
`Slack Alerts` executes in [ClearML services mode](../../clearml_agent.md#services-mode) and is configurable. It is pre-loaded
in **ClearML Server** and its status is *Draft* (editable). Set the parameter values in the **ClearML Web UI**, and then
enqueue the Task to the `services` queue. Or, run the script [slack_alerts.py](https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py),
with options to run locally, or enqueue the Task to the `services` queue.
## Prerequisites
* **ClearML Agent** is [installed and configured](../../clearml_agent.md#installation).
* **ClearML Agent** is launched in [services mode](../../clearml_agent.md#services-mode).
## Creating a Slack Bot ## Creating a Slack Bot
@ -43,72 +33,54 @@ The Slack API token and channel you create are required to configure the Slack a
1. In the confirmation dialog, click **Allow**. 1. In the confirmation dialog, click **Allow**.
1. Click **Copy** to copy the **Bot User OAuth Access Token**. 1. Click **Copy** to copy the **Bot User OAuth Access Token**.
## Running the Service ## Running the Script
There are two options to run the Slack alerts service:
* [Using the ClearML Web UI](#running-using-the-clearml-web-ui)
* [Using the script](#running-using-the-script)
### Running Using the ClearML Web UI :::info Self deployed ClearML server
A template `Slack Alerts` task is available in the `DevOps Services` project. You can clone it, adapt its [configuration](#configuration)
#### Step 1. Configuring the Service to your needs, and enqueue it for execution directly from the ClearML UI.
:::
1. In the **ClearML Web UI** **Projects** page, click the **Monitoring** project **>** click the **Slack Alerts** Task.
1. In the info panel, click the **CONFIGURATION** tab.
1. In the **GENERAL** section, hover over the parameter area **>** **EDIT**.
1. Configure the service parameters:
* **channel** - The name of the Slack channel (MANDATORY).
* **include_completed_experiments** - (bool) Whether to include completed experiments:
* **True** - Include
* **False** - Do not include (default)
* **include_manual_experiments** - Whether to include experiments that are running locally:
* **True** - Monitor local experiments, and remote experiments executed by **ClearML Agent** (default).
* **False** - Remote experiments, only.
* **local** - Run the monitor locally, instead of as a service. The default is **False**.
* **message_prefix** - A message prefix. For example, to alert all channel members use: "Hey <!here>,"
* **min_num_iterations** - The minimum number of iterations of failed/completed experiment to alert. The default is **0**, indicating all alerts.
* **project** - The name (or partial name) of the project to monitor, use empty for all projects.
* **refresh_rate** - How often to run the monitoring service (seconds). The default value is **10.0**.
* **service_queue** - The queue that clearml-agent is listening to for Tasks to execute as a service. The default is
**services**.
* **slack_api** - The Slack API key. The default value can be set in the environment variable, `SLACK_API_TOKEN` (MANDATORY).
#### Step 2. Enqueuing the Service
* Right click the **Monitoring** Task **>** **Enqueue** **>** Select **services** **>** **ENQUEUE**.
### Running Using the Script
The [slack_alerts.py](https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py)
allows to configure the monitoring service, and then either:
Run the monitoring service in one of these ways:
* Run locally * Run locally
* Run in **ClearML Agent** services mode * Run in ClearML Agent [services mode](../../clearml_agent.md#services-mode)
**To run the monitoring service locally:** To run the monitoring service:
python slack_alerts.py --channel <Slack-channel-name> --slack-api <Slack-API-token> --local True [...] ```bash
python slack_alerts.py --channel <Slack-channel-name> --slack-api <Slack-API-token> --local True [...]
where, ```
* `channel` - The Slack channel where alerts will be posted. * `channel` - The Slack channel where alerts will be posted.
* `slack_api` - Slack API key. * `slack_api` - Slack API key.
* `local` - Run the monitoring service only locally. If `True`, then run locally. If `False`, then run locally and * `local` - If `True`, run the monitoring service locally. If `False`, enqueue the task in the queue passed as
enqueue the Task to run in **ClearML Agent** services mode. `service_queue` (by default, the `services` queue) for remote execution.
`slack_alerts.py` supports the following additional command line options: The script supports the following additional command line options:
* `service_queue` - The queue to use when running remotely as a service. The default value is `services` (make sure
your workspace has such a queue and that a ClearML Agent is assigned to it).
* `message_prefix` - A message prefix for Slack alerts. For example, to alert all channel members use: "Hey <!here>".
* `min_num_iterations` - Minimal iteration threshold below which experiments are ignored. Use this option to eliminate
debug sessions that fail quickly. The default value is 0.
* `include_manual_experiments` - Whether to include experiments that are running locally:
* `True` (default) - Monitor all experiments (both local and remote, executed by ClearML Agent).
* `False` - Monitor only remote experiments.
* `include_completed_experiments` - If `False` (default), send alerts only for failed tasks. If `True`, send an alert
for completed and failed tasks.
* `refresh_rate` - How often to monitor the experiments in seconds. The default value is 10.0.
* ``message_prefix`` - The default value is an empty string. ## Configuration
* ``min_num_iterations`` - Minimum number of iterations of failed / completed experiment to alert. Use this option to eliminate debug sessions that fail quickly. The default value is <code>0</code> (alerts for experiments).
* ``include_manual_experiments`` - Include experiments running manually (i.e. not by clearml-agent). The default value is
``False``.
* ``include_completed_experiments`` - If `False`, then include send alerts for
failed Tasks, only. If ``True``, then send alert for completed and failed Tasks. The default value is ``False``.
* ``refresh_rate`` - How often to check the experiments, in seconds. The default value is ``10`` (seconds).
* ``service_queue`` - The queue to use when running as a service. The default value is ``services``.
* ``local`` - If ``True``, run locally only instead of as a service. If ``False``, then automatically enqueue the Task
to run in **ClearML Agent** services mode. The default value is ``False``.
ClearML automatically logs command line options defined with argparse. They appear in the experiment's **CONFIGURATION**
page under **HYPER PARAMETERS > Args**.
![Monitoring configuration](../../img/examples_slack_config.png)
The task can be reused to launch another monitor instance: clone the task, edit its parameters, and enqueue the task for
execution (you'll typically want to use a ClearML Agent running in [services mode](../../clearml_agent.md#services-mode)
for such service tasks).
## Console
All console output appears in the experiment's **RESULTS > CONSOLE** page.
## Additional Information about slack_alerts.py ## Additional Information about slack_alerts.py
In `slack_alerts.py`, the class `SlackMonitor` inherits from the `Monitor` class in `clearml.automation.monitor`. In `slack_alerts.py`, the class `SlackMonitor` inherits from the `Monitor` class in `clearml.automation.monitor`.
@ -116,12 +88,12 @@ In `slack_alerts.py`, the class `SlackMonitor` inherits from the `Monitor` class
* `get_query_parameters` - Get the query parameters for Task monitoring. * `get_query_parameters` - Get the query parameters for Task monitoring.
* `process_task` - Get the information for a Task, post a Slack message, and output to console. * `process_task` - Get the information for a Task, post a Slack message, and output to console.
* Allows skipping failed Tasks, if a Task ran for few iterations. Calls [Task.get_last_iteration](../../references/sdk/task.md#get_last_iteration) * Allows skipping failed Tasks, if a Task ran for few iterations. Calls [`Task.get_last_iteration`](../../references/sdk/task.md#get_last_iteration)
to get the number of iterations. to get the number of iterations.
* Builds the Slack message which includes the most recent output to the console (retrieved by calling [Task.get_reported_console_output](../../references/sdk/task.md#get_reported_console_output)), * Builds the Slack message which includes the most recent output to the console (retrieved by calling [`Task.get_reported_console_output`](../../references/sdk/task.md#get_reported_console_output)),
and the URL of the Task's output log in the **ClearML Web UI** (retrieved by calling [Task.get_output_log_web_page](../../references/sdk/task.md#get_output_log_web_page). and the URL of the Task's output log in the ClearML Web UI (retrieved by calling [`Task.get_output_log_web_page`](../../references/sdk/task.md#get_output_log_web_page)).
The example provides the option to run locally or execute remotely by calling the [Task.execute_remotely](../../references/sdk/task.md#execute_remotely) The example provides the option to run locally or execute remotely by calling the [`Task.execute_remotely`](../../references/sdk/task.md#execute_remotely)
method. method.
To interface with Slack, the example uses `slack.WebClient` and `slack.errors.SlackApiError`. To interface with Slack, the example uses `slack.WebClient` and `slack.errors.SlackApiError`.
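
The filtering behavior described for `include_completed_experiments`, `min_num_iterations`, and `message_prefix` can be sketched without ClearML or Slack installed. The function names and the message layout below are hypothetical and only illustrate the documented options; the actual `SlackMonitor` implementation may differ.

```python
def statuses_to_monitor(include_completed_experiments=False):
    """Return the task statuses that should trigger an alert."""
    if include_completed_experiments:
        return ["failed", "completed"]
    return ["failed"]


def should_alert(last_iteration, min_num_iterations=0):
    """Skip tasks that ran for fewer iterations than the threshold.

    A threshold of 0 disables the check, so every task produces an alert.
    """
    if min_num_iterations and last_iteration < min_num_iterations:
        return False
    return True


def build_message(task_name, status, log_url, console_tail, message_prefix=""):
    """Assemble the alert text (hypothetical layout, for illustration only)."""
    lines = [
        "{} Experiment '{}' {}".format(message_prefix, task_name, status).strip(),
        log_url,
        console_tail,
    ]
    return "\n".join(lines)


if should_alert(last_iteration=120, min_num_iterations=10):
    print(build_message(
        task_name="train_model",                # placeholder task name
        status="failed",
        log_url="https://example.com/output/log",  # placeholder URL
        console_tail="RuntimeError: out of memory",
        message_prefix="Hey <!here>,",
    ))
```

In the real service, the values passed here would come from the calls named above, such as `Task.get_last_iteration` and `Task.get_output_log_web_page`.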

Binary file not shown. Size: 33 KiB

Binary file not shown. Size: 151 KiB

Binary file not shown. Size: 48 KiB