Edit autoscaler example (#139)

pollfly 2021-12-23 14:01:02 +02:00 committed by GitHub
parent fe44deede0
commit 2f37fd5030
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 72 additions and 151 deletions


@ -2,130 +2,40 @@
title: ClearML AWS Autoscaler Service title: ClearML AWS Autoscaler Service
--- ---
The **ClearML** AWS autoscaler optimizes AWS EC2 instance scaling according to the instance types used, and the The ClearML [AWS autoscaler example](https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py)
budget configured. demonstrates how to use the [`clearml.automation.auto_scaler`](https://github.com/allegroai/clearml/blob/master/clearml/automation/auto_scaler.py)
module to implement a service that optimizes AWS EC2 instance scaling according to a defined instance budget.
In the budget, set the maximum number of each instance type to spin for experiments awaiting execution in a specific queue. It periodically polls your AWS cluster and automatically stops idle instances based on a defined maximum idle time or spins
Configure multiple instance types per queue, and multiple queues. The **ClearML** AWS up new instances when there aren't enough to execute pending tasks.
autoscaler will spin down idle instances based on the maximum idle time and the polling interval configurations.
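The budget-driven spin-up described above can be illustrated with a minimal sketch. This is not the `clearml.automation.auto_scaler` implementation: the function name `plan_spin_up`, the queue name `aws4gpu_queue`, and the simplifying assumptions (one new instance per pending task, running instances counted per resource name) are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical budget layout: queue name -> [(resource name, max instances)].
Budget = Dict[str, List[Tuple[str, int]]]


def plan_spin_up(
    budget: Budget,
    pending: Dict[str, int],
    running: Dict[str, int],
) -> Dict[str, int]:
    """Return how many instances of each resource to start in one polling pass.

    Assumes each pending task needs one new instance and that `running`
    counts live instances per resource name (both are simplifications).
    """
    running = dict(running)  # copy so the caller's counts are not mutated
    to_start: Dict[str, int] = {}
    for queue, resources in budget.items():
        needed = pending.get(queue, 0)
        for resource, max_instances in resources:
            if needed <= 0:
                break
            # Never exceed the budgeted maximum for this resource.
            free = max(0, max_instances - running.get(resource, 0))
            start = min(needed, free)
            if start:
                to_start[resource] = to_start.get(resource, 0) + start
                running[resource] = running.get(resource, 0) + start
                needed -= start
    return to_start


if __name__ == "__main__":
    budget = {"aws4gpu_queue": [("aws4gpu", 2)]}
    print(plan_spin_up(budget, {"aws4gpu_queue": 3}, {"aws4gpu": 1}))
```

With a budget of two `aws4gpu` instances and one already running, only one more instance is started even though three tasks are pending.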
## Running the ClearML AWS Autoscaler ## Running the ClearML AWS Autoscaler
The **ClearML** AWS autoscaler can execute in [ClearML services mode](../../clearml_agent.md#services-mode),
and is configurable.
Run **ClearML** AWS autoscaler in one of these ways: Run the ClearML AWS autoscaler in one of these ways:
* Run the [aws_autoscaler.py](https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py)
* In the ClearML Web UI. script locally
* The autoscaler is pre-loaded in the **ClearML Server** and its status is *Draft* (editable). * Launch through your [`services` queue](../../clearml_agent.md#services-mode)
* Set the instance types and configure the budget in the **ClearML Web UI**, and then enqueue the Task to the `services` queues.
* By running the [aws_autoscaler.py](https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py)
script.
* Run script locally or as a service.
* When executed, a Task is created, named `AWS Auto-Scaler` that associated with the `DevOps` project.
### Running Using the ClearML Web UI :::note Default AMI
The autoscaler service uses the `NVIDIA Deep Learning AMI v20.11.0-46a68101-e56b-41cd-8e32-631ac6e5d02b` AMI by default.
:::
Edit the parameters for the instance types, edit budget configuration by editing the Task, and then enqueue the Task to ### Running the Script
run in **ClearML Agent** services mode.
1. Open the **ClearML Web UI** **>** **Projects** page **>** **DevOps** project **>** **AWS Auto-Scaler** Task. :::info Self deployed ClearML server
1. Set the AWS and Git credentials, parameters for idle AWS EC2 instances, and a worker prefix. A template `AWS Auto-Scaler` task is available in the `DevOps Services` project.
* In the **CONFIGURATIONS** tab **>** **HYPER PARAMETERS** **>** **Args** **>** hover **>** **EDIT**. You can clone it, adapt its [configuration](#configuration) to your needs, and enqueue it for execution directly from the ClearML UI.
* **cloud_credentials_key** - AWS access key. :::
* **cloud_credentials_region** - AWS region.
* **cloud_credentials_secret** - AWS access secret.
* **cloud_provider** - AWS.
* **default_docker_image** - The default Docker image to use for the AWS EC2 instance.
* **git_pass** - Git password.
* **git_user** - Git username.
* **max_idle_time_min** - The maximum time an AWS EC2 instance can be idle before the **ClearML** AWS autoscaler spins it down.
* **polling_interval_time_min** - How often the **ClearML** AWS autoscaler checks for idle instances.
* **workers_prefix**
1. Configure the budget.
* In **CONFIGURATION OBJECTS** **>** **General** **>** hover **>** **EDIT**. Edit the `resource_configurations` dictionary:
resource_configurations {
<resource-name> {
instance_type = "<instance_type>"
is_spot = <boolean>
availability_zone = "<AWS-region>"
ami_id = "<AMI-ID>"
ebs_device_name = "<EBS-device-name>"
ebs_volume_size = <EBS-size-in-GB>
ebs_volume_type = "<EBS-vol-type>"
}
}
queues {
<queue-name> = [["<resource-name>", <max-instances-of-resource-name>]]
}
extra_clearml_conf = "<ClearML-config-file>"
extra_vm_bash_script = "<bash-script>"
* `<resource-name>` - The name assigned to each resource (AWS EC2 instance type). Used in the budget.
* `queues` - The **ClearML** AWS autoscaler will optimize scaling for experiments awaiting execution in these queues.
* `<queue-name>` - A specific queue.
* `<max-instances-of-resource-name>` - The maximum number of instances of the specified `resource-name` to spin up.
* `is_spot` - If `true`, then use a spot instance. If `false`, then use a reserved instance.
* `extra_clearml_conf` - A **ClearML** configuration file to use for executing experiments in **ClearML Agent**.
* `extra_vm_bash_script` - A bash script to execute when creating an instance, before **ClearML Agent** executes.
<br/>
<details className="cml-expansion-panel screenshot"> Launch the autoscaler locally by executing the following command:
<summary className="cml-expansion-panel-summary">View a screenshot</summary>
<div className="cml-expansion-panel-content">
![image](../../img/webapp_aws_autoscaler_05.png) ```bash
python aws_autoscaler.py --run
```
</div> When the script runs, a configuration wizard prompts for instance details and budget configuration.
</details>
1. Enter the AWS credentials and AWS region name.
1. Set the Task to run in **ClearML Agent** services mode.
1. In **HYPER PARAMETERS** **>** **Args** **>** hover **>** **EDIT**.
1. Change the **remote** parameter to **true**.
<details className="cml-expansion-panel screenshot">
<summary className="cml-expansion-panel-summary">View a screenshot</summary>
<div className="cml-expansion-panel-content">
![image](../../img/webapp_aws_autoscaler_02.png)
</div>
</details>
1. Click **SAVE**.
1. In the experiments table, right click the **AWS Auto-Scaler** Task **>** **Enqueue** **>** **services** queue **>** **ENQUEUE**.
### Running Using the Script
The [aws_autoscaler.py](https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py)
script includes a wizard which prompts for instance details and budget configuration.
The script can run in two ways:
* Configure and enqueue.
* Enqueue with an existing configuration.
#### To Configure and Enqueue:
Use the `run` command line option:
python aws_autoscaler.py --run
When the script runs, a configuration wizard prompts for all required information.
<br/>
<details className="cml-expansion-panel configuration">
<summary className="cml-expansion-panel-summary">View the configuration wizard steps</summary>
<div className="cml-expansion-panel-content">
1. The setup wizard begins. Enter the AWS credentials and AWS region name.
```console ```console
AWS Autoscaler setup wizard AWS Autoscaler setup wizard
@ -139,7 +49,7 @@ Use the `run` command line option:
Enter AWS region name [us-east-1b]: Enter AWS region name [us-east-1b]:
``` ```
1. Enter Git credentials. These are required by **ClearML Agent** to set up a Task execution environment in an AWS EC2 instance. 1. Enter Git credentials. These are required by ClearML Agent to set up a Task execution environment in an AWS EC2 instance.
```console ```console
GIT credentials: GIT credentials:
@ -160,30 +70,24 @@ Use the `run` command line option:
``` ```
1. For each AWS EC2 instance type that will be used in the budget, do the following: 1. For each AWS EC2 instance type that will be used in the budget, do the following:
* Choose the instance type
* Choose whether to use spot instances ```console
* Select an AMI Configure the machine types for the auto-scaler:
* Define the Amazon EBS volume ------------------------------------------------
Select Amazon instance type ['g4dn.4xlarge']:
Select as many instance types as needed. Use spot instances? [y/N]: y
Select availability zone ['us-east-1b']:
```console Select the Amazon Machine Image id ['ami-07c95cafbb788face']:
Configure the machine types for the auto-scaler: Enter the Amazon EBS device ['/dev/xvda']:
------------------------------------------------ Enter the Amazon EBS volume size (in GiB) [100]:
Select Amazon instance type ['g4dn.4xlarge']: Enter the Amazon EBS volume type ['gp2']:
Use spot instances? [y/N]: y ```
Select availability zone ['us-east-1b']:
Select the Amazon Machine Image id ['ami-07c95cafbb788face']:
Enter the Amazon EBS device ['/dev/xvda']:
Enter the Amazon EBS volume size (in GiB) [100]:
Enter the Amazon EBS volume type ['gp2']:
```
Name the instance type that was configured. Later in the configuration, use this name to create the budget. Name the instance type that was configured. Later in the configuration, use this name to create the budget.
```console ```console
Select a name for this instance type (used in the budget section) For example 'aws4gpu': Select a name for this instance type (used in the budget section) For example 'aws4gpu':
``` ```
The wizard prompts whether to select another instance type. The wizard prompts whether to select another instance type.
@ -191,15 +95,14 @@ Use the `run` command line option:
Define another instance type? [y/N]: Define another instance type? [y/N]:
``` ```
1. Before **ClearML Agent** executes, enter any bash script to run on newly created instances. 1. Enter any bash script to run on newly created instances before launching the ClearML Agent.
```console ```console
Enter any pre-execution bash script to be executed on the newly created instances []: Enter any pre-execution bash script to be executed on the newly created instances []:
``` ```
1. Configure the AWS autoscaler budget. For each queue that will be used in the budget, select the queue and the maximum 1. Configure the AWS autoscaler budget. For each queue that will be used in the budget, enter the maximum number of
number of each instance type, which the **ClearML** AWS autoscaler can spin up to execute experiments awaiting execution instances of a selected type that can be spun up simultaneously.
in that queue.
```console ```console
Define the machines budget: Define the machines budget:
@ -216,7 +119,7 @@ Use the `run` command line option:
Do you wish to add another instance type to queue? [y/N]: Do you wish to add another instance type to queue? [y/N]:
``` ```
1. The **ClearML** AWS autoscalar polls instances, and if instances have been idle for the maximum idle time that was specified, 1. The ClearML AWS autoscaler polls instances, and if instances have been idle for the maximum idle time that was specified,
the autoscaler spins them down. the autoscaler spins them down.
```console ```console
@ -224,10 +127,10 @@ Use the `run` command line option:
Enter instances polling interval for the auto-scaler (in minutes) [5]: Enter instances polling interval for the auto-scaler (in minutes) [5]:
``` ```
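How the maximum idle time and the polling interval interact can be sketched as follows. This is an illustrative model only, not the autoscaler's actual code: the class `IdleTracker`, its `poll` method, the instance IDs, and the 15-minute idle limit are hypothetical, and time is passed in explicitly so the example needs no AWS access.

```python
from typing import Dict, Iterable, List


class IdleTracker:
    """Track how long each instance has been idle across polling passes."""

    def __init__(self, max_idle_time_min: float) -> None:
        self.max_idle_time_min = max_idle_time_min
        # Instance id -> time (in minutes) at which it was first seen idle.
        self._idle_since: Dict[str, float] = {}

    def poll(self, now_min: float, instances: Iterable[str], busy: Iterable[str]) -> List[str]:
        """Return the instances that should be spun down at this poll."""
        current = set(instances)
        busy_set = set(busy)
        # Forget instances that disappeared or picked up work again.
        for instance_id in list(self._idle_since):
            if instance_id not in current or instance_id in busy_set:
                del self._idle_since[instance_id]
        expired = []
        for instance_id in sorted(current - busy_set):
            first_idle = self._idle_since.setdefault(instance_id, now_min)
            if now_min - first_idle >= self.max_idle_time_min:
                expired.append(instance_id)
        return expired


if __name__ == "__main__":
    tracker = IdleTracker(max_idle_time_min=15)
    # Poll every 5 minutes, matching the wizard's default polling interval.
    for now in (0, 5, 10, 15):
        print(now, tracker.poll(now, ["i-a", "i-b"], busy=["i-b"]))
```

Because idleness is only observed at polling time, an instance may in practice stay up for as long as the maximum idle time plus one polling interval.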
The configuration is complete, and a new task called `AWS Auto-Scaler` is created in the `DevOps` project. The service begins,
and the script prints a hyperlink to the Task's log.
The configuration is complete. **ClearML** initializes the Task `AWS Auto-Scaler`, the service begins, and the script
prints a hyperlink to the Task's log.
```console ```console
CLEARML Task: created new task id=d0ee5309a9a3471d8802f2561da60dfa CLEARML Task: created new task id=d0ee5309a9a3471d8802f2561da60dfa
CLEARML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring CLEARML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
@ -236,15 +139,33 @@ Running AWS auto-scaler as a service
Execution log https://app.community.clear.ml/projects/142a598b5d234bebb37a57d692f5689f/experiments/d0ee5309a9a3471d8802f2561da60dfa/output/log Execution log https://app.community.clear.ml/projects/142a598b5d234bebb37a57d692f5689f/experiments/d0ee5309a9a3471d8802f2561da60dfa/output/log
``` ```
### Remote Execution
Using the `--remote` command line option will enqueue the autoscaler to your [`services` queue](../../clearml_agent.md#services-mode)
once the configuration wizard is complete:
</div></details> ```bash
python aws_autoscaler.py --remote
<br/> ```
Make sure a `clearml-agent` is assigned to that queue.
#### To Enqueue with an Existing Configuration: ## WebApp
### Configuration
Use the `remote` command line option: The values configured through the wizard are stored in the task's hyperparameters and configuration objects by using the
[`Task.connect`](../../references/sdk/task.md#connect) and [`Task.set_configuration_object`](../../references/sdk/task.md#set_configuration_object)
methods, respectively. They can be viewed in the WebApp, in the task's **CONFIGURATION** page under **HYPER PARAMETERS** and **CONFIGURATION OBJECTS > General**.
python aws_autoscaler.py --remote ClearML automatically logs command line arguments defined with argparse. View them in the experiment's **CONFIGURATION**
page under **HYPER PARAMETERS > General**.
When the script runs, it allows you to create a new configuration. ![Autoscaler configuration](../../img/examples_aws_autoscaler_config.png)
The task can be reused to launch another autoscaler instance: clone the task, then edit its parameters for the instance
types and budget configuration, and enqueue the task for execution (you'll typically want to use a ClearML Agent running
in [services mode](../../clearml_agent.md#services-mode) for such service tasks).
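When editing a cloned task's configuration by hand, a quick consistency check can catch a budget entry that names an undefined instance type. The sketch below assumes the `resource_configurations` / `queues` layout shown in the removed lines of this diff, expressed as a Python dictionary; the current script's layout may differ. The helper `check_budget`, the queue name `aws4gpu_queue`, and the budget of 2 are hypothetical, while the instance values are the wizard defaults shown above.

```python
from typing import Any, Dict, List

# Example configuration using the wizard defaults shown above.
config: Dict[str, Any] = {
    "resource_configurations": {
        "aws4gpu": {
            "instance_type": "g4dn.4xlarge",
            "is_spot": True,
            "availability_zone": "us-east-1b",
            "ami_id": "ami-07c95cafbb788face",
            "ebs_device_name": "/dev/xvda",
            "ebs_volume_size": 100,
            "ebs_volume_type": "gp2",
        }
    },
    # Hypothetical queue name and budget: at most 2 "aws4gpu" instances.
    "queues": {"aws4gpu_queue": [["aws4gpu", 2]]},
}


def check_budget(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in the budget."""
    problems: List[str] = []
    resources = cfg.get("resource_configurations", {})
    for queue, entries in cfg.get("queues", {}).items():
        for name, max_instances in entries:
            if name not in resources:
                problems.append(f"queue {queue!r} references undefined resource {name!r}")
            if not isinstance(max_instances, int) or max_instances < 1:
                problems.append(f"queue {queue!r} has invalid maximum for {name!r}: {max_instances!r}")
    return problems


if __name__ == "__main__":
    print(check_budget(config) or "budget looks consistent")
```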
### Console
All other console output appears in the experiment's **RESULTS > CONSOLE**.
![Autoscaler console](../../img/examples_aws_autoscaler_console.png)
