mirror of
https://github.com/clearml/clearml-docs
synced 2025-03-03 10:42:51 +00:00
Edit services examples (#140)
This commit is contained in:
parent
2f37fd5030
commit
27b694eba5
@ -2,67 +2,64 @@
title: Cleanup Service
---

The cleanup service deletes:
* Archived Tasks and their associated model checkpoints (snapshots)
* Other artifacts
* Debug samples
The [cleanup service](https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py)
demonstrates how to use the `clearml.backend_api.session.client.APIClient` class to implement a service that deletes old
archived tasks and their associated files: model checkpoints, other artifacts, and debug samples.

The cleanup service can be configured with parameters specifying which Archived Tasks to delete and when to delete them.
Its Task name is `Cleanup Service` and it is associated with the project `DevOps`.
Modify the cleanup service’s parameters to specify which archived experiments to delete and when to delete them.

`Cleanup Service` can be configured in the **ClearML Web UI**, and then the Task can be enqueued for execution in the
[ClearML services mode](../../clearml_agent.md#services-mode).
It is pre-loaded in **ClearML Server** and its status is *Draft* (editable). Or, run the script [cleanup_service.py](https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py),
with options to run locally or as a service.
### Running the Cleanup Service

## Prerequisites
:::info Self deployed ClearML server
A template `Cleanup Service` task is available in the `DevOps Services` project. You can clone it, adapt its [configuration](#configuration)
to your needs, and enqueue it for execution directly from the ClearML UI.
:::

* **ClearML Agent** is [installed and configured](../../clearml_agent.md#installation).
* **ClearML Agent** is launched in [services mode](../../clearml_agent.md#services-mode).
Configure the task execution by modifying the `args` dictionary:
* `delete_threshold_days` - Tasks older than this number of days will be deleted. The default value is 30 days.
* `cleanup_period_in_days` - Repeat the cleanup service at this interval, in days. The default value is 1.0 (run once a day).
* `force_delete` - If `False` (default), delete only Draft tasks. If `True`, allows deletion of tasks in any status.
* `run_as_service` - If `True` (default), the task will be enqueued for remote execution (default queue: "services"). Otherwise, the script will execute locally.
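
The `args` dictionary described above could look roughly like the following. This is a minimal sketch, not the script's verbatim code: the keys and defaults are taken from the list above, while the two helper functions (`seconds_between_runs`, `is_older_than_threshold`) are hypothetical and exist only to illustrate how the values are interpreted.

```python
# Illustrative sketch of the configuration described above
# (an assumption-based example, not copied from cleanup_service.py).
args = {
    "delete_threshold_days": 30.0,   # delete archived tasks older than this many days
    "cleanup_period_in_days": 1.0,   # repeat the cleanup once a day
    "force_delete": False,           # False: delete only Draft tasks
    "run_as_service": True,          # True: enqueue for remote execution
}


def seconds_between_runs(period_in_days: float) -> float:
    """Convert the cleanup period from days to seconds (hypothetical helper)."""
    return period_in_days * 24 * 60 * 60


def is_older_than_threshold(age_in_days: float, threshold_days: float) -> bool:
    """Return True if a task's age exceeds the delete threshold (hypothetical helper)."""
    return age_in_days > threshold_days


if __name__ == "__main__":
    print(seconds_between_runs(args["cleanup_period_in_days"]))
    print(is_older_than_threshold(45, args["delete_threshold_days"]))
```

In the actual script, a dictionary like this would typically be registered with the task through `Task.connect`, so that values edited in the UI override the defaults.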

## Running the Cleanup Service

### Running Using the ClearML Web UI

#### Step 1. Configuring the Cleanup Service

1. In the **ClearML Web UI** **Projects** page, click the **DevOps** project **>** click the **Cleanup Service** Task.
1. In the info panel, click the **CONFIGURATION** tab.
1. In the **GENERAL** section, hover over the parameter area **>** **EDIT**.
1. Configure the service parameters:
    * **cleanup_period_in_days** - Repeat the cleanup service at this interval, in days. The default value is **1.0** (run once a day).
    * **delete_threshold_days** - Tasks older than this number of days will be deleted. The default value is **30** days.
    * **force_delete**
        * **True** - Delete all Tasks older than **delete_threshold_days**.
        * **False** - Delete only status **created** (*Draft*) Tasks. The default value is **False**.
    * **run_as_service**
        * **True** - Run the cleanup as a service (it repeats regularly).
        * **False** - Run the Task once locally. The default value is **False**.

#### Step 2. Enqueuing the cleanup service

* Right click the **Cleanup Service** Task **>** **Enqueue** **>** In the queue list, select **services** **>** **ENQUEUE**.

### Running Using the Script

The [cleanup_service.py](https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py) script
enqueues the cleanup service to run in **ClearML Agent** services mode, because the `run_as_service` parameter is set to `True`.
:::note Remote Execution
If `run_as_service` is set to `True`, make sure a `clearml-agent` is assigned to the `services` queue.
:::

Now that the script is configured, execute it:
```bash
python cleanup_service.py
```

A new task called `Cleanup Service` is created in the `DevOps` project on your ClearML server. The script output should
look similar to:
```console
ClearML Task: created new task id=8126c0af800f4903be07421aa344d7b3
ClearML results page: https://app.community.clear.ml/projects/608e9039/experiments/81261aa34d7b3/output/log
Cleanup service started
Starting cleanup
Deleting 100 tasks
```

This is followed by details from the cleanup.

## The Cleanup Service Code

[cleanup_service.py](https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py) creates
a **ClearML** API client session to delete the Tasks. It creates an `APIClient` object that establishes a session with the
**ClearML** backend (**ClearML Server**), and accomplishes the cleanup by calling:

* `Tasks.get_all` to get a list of Tasks to delete, providing the following parameters:

an `APIClient` object that establishes a session with the ClearML Server, and accomplishes the cleanup by calling:
* [`Tasks.get_all`](../../references/api/tasks.md#post-tasksget_all) to get a list of Tasks to delete, providing the following parameters:
  * `system_tags` - Get only Tasks tagged as `archived`.
  * `only_fields` - Get only the Task `id`. Only the Task `id` is needed to delete a Task and its output.
  * `order_by` - Order the list of Tasks returned by last activity timestamp, in descending order (most recent first).
  * `page_size` - Set the number of Tasks returned in each page (the last page may contain fewer results).
  * `page` - Set the number of the page in the resulting list of Tasks to return.
  * `status_changed` - Get Tasks whose last status change is older than the delete threshold (in seconds).
* [`Task.delete`](../../references/sdk/task.md#delete) - Delete a Task.

* `Tasks.delete` - Delete a Task, optionally forcing the deletion of a Task, even if its status is not *Draft*.
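
The query parameters listed above can be assembled as in the following sketch. It only builds the keyword arguments and does not contact a server. The function name `build_archived_task_query` is hypothetical, and two details are assumptions rather than facts taken from this page: the `-last_update` field name used for descending order, and the `"<timestamp"` string format of the `status_changed` filter.

```python
from datetime import datetime, timedelta


def build_archived_task_query(delete_threshold_days, page, page_size=100, now=None):
    """Build keyword arguments for a Tasks.get_all call (hypothetical helper).

    The "-last_update" field name and the "<timestamp" filter format are
    assumptions made for illustration.
    """
    now = now or datetime.now()
    # Tasks whose status last changed before this moment are cleanup candidates.
    cutoff = now - timedelta(days=delete_threshold_days)
    return {
        "system_tags": ["archived"],          # only archived Tasks
        "only_fields": ["id"],                # the id is all that deletion needs
        "order_by": ["-last_update"],         # most recent activity first
        "page_size": page_size,               # Tasks per page
        "page": page,                         # zero-based page number
        "status_changed": ["<{}".format(cutoff)],  # older than the threshold
    }


if __name__ == "__main__":
    query = build_archived_task_query(30, page=0, now=datetime(2025, 1, 31))
    for key, value in query.items():
        print(key, value)
```

With a configured ClearML environment, a dictionary like this could be passed to the client, for example `APIClient().tasks.get_all(**query)`, incrementing `page` until an empty page is returned.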
## Configuration
The experiment’s hyperparameters are explicitly logged to ClearML using the [`Task.connect`](../../references/sdk/task.md#connect)
method. View them in the WebApp, in the experiment’s **CONFIGURATION** page under **HYPER PARAMETERS > General**.

The task can be reused. Clone the task, edit its parameters, and enqueue the task to run in ClearML Agent [services mode](../../clearml_agent.md#services-mode).

## Console
All console output appears in the experiment’s **RESULTS > CONSOLE**.

@ -2,19 +2,9 @@
title: Monitoring Service Posting Slack Alerts
---

The Slack alerts example runs as a **ClearML** service, which monitors the completion and failure of Tasks, and posts alert
messages on a specified Slack channel. In the example, we configure the Slack details when creating a Slack bot, and set
parameters for monitoring. The Task name is `Slack Alerts`, and it is associated with the project `Monitoring`.

`Slack Alerts` executes in [ClearML services mode](../../clearml_agent.md#services-mode) and is configurable. It is pre-loaded
in **ClearML Server** and its status is *Draft* (editable). Set the parameter values in the **ClearML Web UI**, and then
enqueue the Task to the `services` queue. Or, run the script [slack_alerts.py](https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py),
with options to run locally, or enqueue the Task to the `services` queue.

## Prerequisites

* **ClearML Agent** is [installed and configured](../../clearml_agent.md#installation).
* **ClearML Agent** is launched in [services mode](../../clearml_agent.md#services-mode).
The [Slack alerts example](https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py)
demonstrates how to use the `clearml.automation.monitor.Monitor` class to implement a service that monitors the completion and
failure of tasks, and posts alert messages on a Slack channel.

## Creating a Slack Bot

@ -43,71 +33,53 @@ The Slack API token and channel you create are required to configure the Slack a
1. In the confirmation dialog, click **Allow**.
1. Click **Copy** to copy the **Bot User OAuth Access Token**.

## Running the Service
There are two options to run the Slack alerts service:
* [Using the ClearML Web UI](#running-using-the-clearml-web-ui)
* [Using the script](#running-using-the-script)
## Running the Script

### Running Using the ClearML Web UI

#### Step 1. Configuring the Service

1. In the **ClearML Web UI** **Projects** page, click the **Monitoring** project **>** click the **Slack Alerts** Task.
1. In the info panel, click the **CONFIGURATION** tab.
1. In the **GENERAL** section, hover over the parameter area **>** **EDIT**.
1. Configure the service parameters:

    * **channel** - The name of the Slack channel (MANDATORY).
    * **include_completed_experiments** - (bool) Whether to include completed experiments:
        * **True** - Include
        * **False** - Do not include (default)
    * **include_manual_experiments** - Whether to include experiments that are running locally:
        * **True** - Monitor local experiments, and remote experiments executed by **ClearML Agent** (default).
        * **False** - Remote experiments, only.
    * **local** - Run the monitor locally, instead of as a service. The default is **False**.
    * **message_prefix** - A message prefix. For example, to alert all channel members use: "Hey <!here>,"
    * **min_num_iterations** - The minimum number of iterations of failed/completed experiment to alert. The default is **0**, indicating all alerts.
    * **project** - The name (or partial name) of the project to monitor, use empty for all projects.
    * **refresh_rate** - How often to run the monitoring service (seconds). The default value is **10.0**.
    * **service_queue** - The queue that clearml-agent is listening to for Tasks to execute as a service. The default is
      **services**.
    * **slack_api** - The Slack API key. The default value can be set in the environment variable, `SLACK_API_TOKEN` (MANDATORY).

#### Step 2. Enqueuing the Service

* Right click the **Monitoring** Task **>** **Enqueue** **>** Select **services** **>** **ENQUEUE**.

### Running Using the Script

The [slack_alerts.py](https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py) script
lets you configure the monitoring service, and then either:
:::info Self deployed ClearML server
A template `Slack Alerts` task is available in the `DevOps Services` project. You can clone it, adapt its [configuration](#configuration)
to your needs, and enqueue it for execution directly from the ClearML UI.
:::

Run the monitoring service in one of these ways:
* Run locally
* Run in **ClearML Agent** services mode
* Run in ClearML Agent [services mode](../../clearml_agent.md#services-mode)

**To run the monitoring service locally:**
To run the monitoring service:

```bash
python slack_alerts.py --channel <Slack-channel-name> --slack-api <Slack-API-token> --local True [...]

where,

```
* `channel` - The Slack channel where alerts will be posted.
* `slack_api` - Slack API key.
* `local` - Run the monitoring service only locally. If `True`, then run locally. If `False`, then run locally and
enqueue the Task to run in **ClearML Agent** services mode.
* `local` - If `True`, run monitoring services locally. If `False`, enqueue the task in the queue passed as the
`service_queue` (by default `services` queue) for remote execution.

`slack_alerts.py` supports the following additional command line options:
The script supports the following additional command line options:
* `service_queue` - The queue to use when running remotely as a service. The default value is `services` (make sure
your workspace has such a queue and that a ClearML Agent is assigned to it).
* `message_prefix` - A message prefix for Slack alerts. For example, to alert all channel members use: "Hey <!here>".
* `min_num_iterations` - Minimal iteration threshold below which experiments are ignored. Use this option to eliminate
debug sessions that fail quickly. The default value is 0.
* `include_manual_experiments` - Whether to include experiments that are running locally:
  * `True` (default) - Monitor all experiments (both local and remote, executed by ClearML Agent).
  * `False` - Monitor only remote experiments.
* `include_completed_experiments` - If `False` (default), send alerts only for failed tasks. If `True`, send an alert
for completed and failed tasks.
* `refresh_rate` - How often to monitor the experiments in seconds. The default value is 10.0.

* ``message_prefix`` - The default value is an empty string.
* ``min_num_iterations`` - Minimum number of iterations of failed / completed experiment to alert. Use this option to eliminate debug sessions that fail quickly. The default value is ``0`` (alerts for all experiments).
* ``include_manual_experiments`` - Include experiments running manually (i.e. not by clearml-agent). The default value is
``False``.
* ``include_completed_experiments`` - If ``False``, then send alerts for
failed Tasks only. If ``True``, then send alerts for completed and failed Tasks. The default value is ``False``.
* ``refresh_rate`` - How often to check the experiments, in seconds. The default value is ``10`` (seconds).
* ``service_queue`` - The queue to use when running as a service. The default value is ``services``.
* ``local`` - If ``True``, run locally only instead of as a service. If ``False``, then automatically enqueue the Task
to run in **ClearML Agent** services mode. The default value is ``False``.
## Configuration

ClearML automatically logs command line options defined with argparse. They appear in the experiment’s **CONFIGURATION**
page under **HYPER PARAMETERS > Args**.

The task can be reused to launch another monitor instance: clone the task, edit its parameters, and enqueue the task for
execution (you’ll typically want to use a ClearML Agent running in [services mode](../../clearml_agent.md#services-mode)
for such service tasks).

## Console
All console output appears in the experiment’s **RESULTS > CONSOLE** page.

## Additional Information about slack_alerts.py

@ -116,12 +88,12 @@ In `slack_alerts.py`, the class `SlackMonitor` inherits from the `Monitor` class

* `get_query_parameters` - Get the query parameters for Task monitoring.
* `process_task` - Get the information for a Task, post a Slack message, and output to console.
  * Allows skipping failed Tasks, if a Task ran for few iterations. Calls [Task.get_last_iteration](../../references/sdk/task.md#get_last_iteration)
  * Allows skipping failed Tasks, if a Task ran for few iterations. Calls [`Task.get_last_iteration`](../../references/sdk/task.md#get_last_iteration)
    to get the number of iterations.
  * Builds the Slack message which includes the most recent output to the console (retrieved by calling [Task.get_reported_console_output](../../references/sdk/task.md#get_reported_console_output)),
    and the URL of the Task's output log in the **ClearML Web UI** (retrieved by calling [Task.get_output_log_web_page](../../references/sdk/task.md#get_output_log_web_page).
  * Builds the Slack message which includes the most recent output to the console (retrieved by calling [`Task.get_reported_console_output`](../../references/sdk/task.md#get_reported_console_output)),
    and the URL of the Task's output log in the ClearML Web UI (retrieved by calling [`Task.get_output_log_web_page`](../../references/sdk/task.md#get_output_log_web_page)).
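
The filtering that `process_task` performs before posting can be sketched as a small pure function. This is an illustration under stated assumptions, not the code in `slack_alerts.py`: the function name `should_alert` and the status strings `"failed"` and `"completed"` are hypothetical, and the rule simply combines the `min_num_iterations` and `include_completed_experiments` options described in this page.

```python
def should_alert(status, last_iteration, min_num_iterations=0,
                 include_completed_experiments=False):
    """Decide whether a finished task warrants a Slack alert (hypothetical helper).

    status is assumed to be "failed" or "completed"; any other value is ignored.
    """
    if status not in ("failed", "completed"):
        return False
    # By default only failed tasks are reported.
    if status == "completed" and not include_completed_experiments:
        return False
    # Skip tasks that ended before reaching the iteration threshold,
    # for example debug sessions that fail quickly.
    if last_iteration < min_num_iterations:
        return False
    return True


if __name__ == "__main__":
    print(should_alert("failed", last_iteration=3, min_num_iterations=10))
    print(should_alert("failed", last_iteration=50, min_num_iterations=10))
    print(should_alert("completed", last_iteration=50))
```

In the real service, the iteration count would come from `Task.get_last_iteration`, and a positive decision would be followed by building and posting the Slack message.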

The example provides the option to run locally or execute remotely by calling the [Task.execute_remotely](../../references/sdk/task.md#execute_remotely)
The example provides the option to run locally or execute remotely by calling the [`Task.execute_remotely`](../../references/sdk/task.md#execute_remotely)
method.

To interface with Slack, the example uses `slack.WebClient` and `slack.errors.SlackApiError`.
BIN
docs/img/example_cleanup_configuration.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 33 KiB |
BIN
docs/img/examples_cleanup_console.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 151 KiB |
BIN
docs/img/examples_slack_config.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 48 KiB |