---
title: GCP Autoscaler
---
:::info Pro Plan Offering
The ClearML GCP Autoscaler App is available under the ClearML Pro plan
:::
The GCP Autoscaler Application optimizes GCP VM instance usage according to a user-defined instance budget: Define your
budget by specifying the type and amount of available compute resources.
Each resource type is associated with a ClearML [queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) whose
status determines the need for instances of that resource type (i.e. spin up new instances if there are pending jobs on
the queue).
When running, the autoscaler periodically polls your GCP cluster. The autoscaler automatically deletes idle VM instances
based on a specified maximum idle time, or spins up new VM instances when there aren't enough to execute pending tasks
in a queue (until reaching the defined maximum number of instances). You can add an init script, which will be executed
when each VM instance is spun up.
For more information about how autoscalers work, see [Autoscalers Overview](../../cloud_autoscaling/autoscaling_overview.md#autoscaler-applications).
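
The scaling behavior described above can be sketched roughly as follows. This is a simplified illustration, not ClearML's actual implementation: the `Resource` class, the `plan_scaling` function, and all field names are hypothetical. It assumes one new instance per uncovered pending task, and that instances are spun down only while the queue is empty.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical stand-in for one resource type in the budget."""
    queue: str
    max_instances: int
    # Maps instance ID -> minutes the instance has been idle (0 means busy)
    idle_minutes: dict = field(default_factory=dict)


def plan_scaling(resource, pending_tasks, max_idle_minutes):
    """Return (number of instances to spin up, IDs of instances to spin down)."""
    idle_ids = [i for i, m in resource.idle_minutes.items() if m > 0]
    # Idle instances can pick up pending tasks before new ones are needed
    uncovered = max(0, pending_tasks - len(idle_ids))
    # Never exceed the maximum number of instances for this resource type
    headroom = max(0, resource.max_instances - len(resource.idle_minutes))
    to_spin_up = min(uncovered, headroom)
    # Assumption: spin down only when no tasks are waiting on the queue
    to_spin_down = []
    if pending_tasks == 0:
        to_spin_down = sorted(
            i for i, m in resource.idle_minutes.items() if m >= max_idle_minutes
        )
    return to_spin_up, to_spin_down
```

For example, with a maximum of three instances, one busy instance, one idle instance, and three pending tasks, this sketch requests a single new instance, because the idle instance covers one task and the budget leaves room for only one more.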
## Autoscaler Instance Configuration
When configuring a new GCP Autoscaler instance, you can fill in the required parameters or reuse the configuration of
a previously launched instance.
Launch an app instance with the configuration of a previously launched instance using one of the following options:
* Cloning a previously launched app instance will open the instance launch form with the original instance's
configuration prefilled.
* Importing an app configuration file. You can export the configuration of a previously launched instance as a JSON file
when viewing its configuration.
The prefilled instance launch form can be edited before starting the new app instance.
To configure a new app instance, click `Launch New` <img src="/docs/latest/icons/ico-add.svg" alt="Add new" className="icon size-md space-sm" />
to open the app's instance launch form.
### Configuration Options
* **Import Configuration** - Import an app instance configuration file. This will fill the instance launch form with the
  values from the file, which can be modified before launching the app instance
* **GCP Configuration**
    * GCP Project ID - Project used for spinning up VM instances
    * GCP Zone - The GCP zone where the VM instances will be spun up. See [Regions and zones](https://cloud.google.com/compute/docs/regions-zones)
    * Use full subnet path - Select to specify a full subnet path (e.g. to reference a subnet from a different project)
    * GCP Subnet Full Path - Available if `Use full subnet path` was selected. The GCP subnetwork where the instances
      will be spun up. This allows setting a custom subnet resource path, and allows setting subnets shared from other
      projects as well. See [GCP Documentation](https://cloud.google.com/dataflow/docs/guides/specifying-networks).
    * GCP Subnet Name - Available if `Use full subnet path` was not selected. The GCP subnetwork where the instances
      will be spun up. The resulting GCP setting will be `projects/{project-id}/regions/{region}/subnetworks/{subnetwork}`
    * GCP Credentials - Credentials with which the autoscaler can access your GCP account for spinning VM instances
      up/down. See [Generating GCP Credentials](#generating-gcp-credentials).
* **Git Configuration** - Git credentials with which the ClearML Agents running on your VM instances will access your
  repositories to retrieve the code for their jobs
    * Git User
    * Git Password / Personal Access Token
* **Use docker mode** - If selected, tasks enqueued to the autoscaler will be executed by ClearML Agents running in
  [Docker mode](../../clearml_agent/clearml_agent_execution_env.md#docker-mode)
* **Base Docker Image** (optional) - Available when `Use docker mode` is selected. Default Docker image in which the ClearML Agent will run. Provide an image stored in a
  Docker artifactory so VM instances can automatically fetch it
* **Compute Resources**
    * Resource Name - Assign a name to the resource type. This name will appear in the Autoscaler dashboard
    * GCP Machine Type - See list of [machine types](https://cloud.google.com/compute/docs/machine-types)
    * Run in CPU mode - Select to have the autoscaler utilize only CPU VM instances
    * GPU Type - See list of [supported GPUs by instance](https://cloud.google.com/compute/docs/gpus)
    * Use Preemptible Instance - Choose whether VM instances of this type will be [preemptible](https://cloud.google.com/compute/docs/instances/preemptible)
        * Preemptible Provision Model - Select the provision model. For more information, see [Spot VMs](https://cloud.google.com/compute/docs/instances/spot)
          and [Preemptible VMs](https://cloud.google.com/compute/docs/instances/preemptible)
        * Regular Instance Rollback - When selected, if a preemptible instance is unavailable for the time specified in the
          `Regular Instance Rollback Timeout` field, a regular instance will be spun up instead
        * Preemptible Instance Blackout Period - Time (in minutes) to attempt using regular instances instead in case
          acquiring a preemptible instance fails
    * Max Number of Instances - Maximum number of concurrent running VM instances of this type allowed
    * Monitored Queue - Queue associated with this VM instance type. The tasks enqueued to this queue will be executed on VM instances of this type
    * Machine Image (optional) - The GCP machine image to launch

      :::note
      The machine image used for the autoscaler must include a Docker runtime and virtualenv
      :::

    * Disc Size (in GB) (optional)
    * Use the default GCP Service Account - If selected, the default service account will be used. To use a
      different service account, clear this option and fill in the `Service Account Email` field
    * Service Account Scopes - Comma-separated scope aliases. For more information, see [GCP documentation](https://cloud.google.com/sdk/gcloud/reference/compute/instances/create#--scopes).
      Fully-qualified scope URIs are supported. If left empty, the autoscaler will use the default scopes.
    * Metadata - GCP Metadata to be applied on this resource's instances. Input comma-separated key=value pairs (e.g. `"Owner=Admin,Foo=Bar"`).
    * \+ Add Item - Define another resource type
* **Global Metadata** (optional) - GCP Metadata to be applied on all instances. Input comma-separated key=value pairs (e.g. `"Owner=Admin,Foo=Bar"`).
* **Autoscaler Instance Name** (optional) - Name for the Autoscaler instance. This will appear in the instance list
* **Max Idle Time** (optional) - Maximum time in minutes that a VM instance can be idle before the autoscaler spins it down
* **Workers Prefix** (optional) - A prefix added to workers' names, associating them with this autoscaler
* **Polling Interval** (optional) - Time period in minutes at which the designated queue is polled for new tasks
* **Apply Task Owner Vault Configuration** - Select to apply values from the task owner's [configuration vault](../webapp_profile.md#configuration-vault) when executing the task (available under ClearML Enterprise Plan)
* **Warn if more than one instance is executing the same task** - Select to print a warning to the console when multiple
  instances are running the same task. In most cases, this indicates an issue.
* **Exclude .bashrc script** - Select in order to skip `.bashrc` script execution
* **Ignore vault parsing errors** - If not selected, the autoscaler will abort if encountering errors when loading vaults
  on startup. This only applies to vaults loaded by the autoscaler itself, not to vaults loaded on cloud instances or by
  tasks run by the autoscaler. For more information, see [Configuration Vault note](#configuration_vault) (available under ClearML Enterprise Plan).
* **Init Script** (optional) - A bash script to execute after launching the VM instance
* **Additional ClearML Configuration** (optional) - A ClearML configuration file to use by the ClearML Agent when executing your experiments
* **Run with Service Account** - Select to allow running the application under a [Service Account](../webapp_profile.md#service-accounts) identity instead of under your own identity (available under ClearML Enterprise Plan)
* **Export Configuration** - Export the app instance configuration as a JSON file, which you can later import to create
  a new instance with the same configuration
![GCP autoscaler instance launch form](../../img/apps_gcp_autoscaler_wizard.png)
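
Two of the text formats used in the form can be illustrated with a short sketch: the subnet path assembled from a subnet name, and the comma-separated `key=value` metadata input. The helper names `subnet_path` and `parse_metadata` are hypothetical and are not part of ClearML. Deriving the region by dropping the zone's final suffix follows the GCP zone naming convention and is an assumption about how the path is built.

```python
def subnet_path(project_id, zone, subnet_name):
    """Build a full subnet path from a project, a zone, and a subnet name."""
    # Assumption: a GCP zone name is its region plus a suffix,
    # e.g. "us-central1-a" is in region "us-central1"
    region = zone.rsplit("-", 1)[0]
    return f"projects/{project_id}/regions/{region}/subnetworks/{subnet_name}"


def parse_metadata(text):
    """Parse comma-separated key=value pairs into a dict."""
    pairs = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs
```

Checking a metadata string this way before launching can catch a missing `=` or an empty key early, instead of discovering it after instances have been spun up.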
<a id="configuration_vault"/>

:::important Enterprise Feature
You can utilize the [configuration vault](../../webapp/webapp_profile.md#configuration-vault) to configure GCP
credentials for the Autoscaler in the following format:
```
auto_scaler.v1 {
  gcp {
    gcp_credentials: """
    {
      "type": "service_account",
      ...
    }
    """
  }
}
```
:::
## Dashboard
Once an autoscaler is launched, the autoscaler's dashboard provides information about available VM instances and their
status.
![GCP autoscaler dashboard](../../img/apps_gcp_autoscaler.png)
The autoscaler dashboard shows:
* Number of Idle Instances
* Queues and the resource type associated with them
* Number of current running instances
* Console: the application log containing everything printed to stdout and stderr appears in the console log. The log
  shows polling results of the autoscaler's associated queues, including the number of tasks enqueued, and updates on VM
  instances being spun up/down

  :::tip Console Debugging
  To make the autoscaler console log show additional debug information, change an active app instance's log level to DEBUG:
  1. Go to the app instance task's page > **CONFIGURATION** tab > **USER PROPERTIES** section
  1. Hover over the section > Click `Edit` > Click `+ADD PARAMETER`
  1. Input `log_level` as the key and `DEBUG` as the value of the new parameter.

  ![Autoscaler debugging](../../img/webapp_autoscaler_debug_log.png)

  The console's log level will update in the autoscaler's next iteration.
  :::

* Instance log files - Click to access the app instance's logs. This takes you to the app instance task's ARTIFACTS tab,
  which lists the app instance's logs. In a log's `File Path` field, click <img src="/docs/latest/icons/ico-download-json.svg" alt="Download" className="icon size-sm space-sm" />
  to download the complete log.

:::tip EMBEDDING CLEARML VISUALIZATION
You can embed plots from the app instance dashboard into [ClearML Reports](../webapp_reports.md). These visualizations
are updated live as the app instance(s) updates. The Enterprise Plan and Hosted Service support embedding resources in
external tools (e.g. Notion). Hover over the plot and click <img src="/docs/latest/icons/ico-plotly-embed-code.svg" alt="Embed code" className="icon size-md space-sm" />
to copy the embed code, and navigate to a report to paste the embed code.
:::
## Generating GCP Credentials
The autoscaler app accesses your GCP account with the credentials you provide.
You will need to create a service account with the required access privileges. Then generate credential keys for that
account to configure the autoscaler app:
1. In your GCP account, in the project of your choice, go to **APIs & Services** > **Credentials**
1. Click **+ CREATE CREDENTIALS** and choose the **Service account** option
![GCP create credentials](../../img/apps_gcp_autoscaler_credentials_1.png)
1. In the **Create service account** window that is opened, fill out the service account details
![GCP service account details](../../img/apps_gcp_autoscaler_credentials_2.png)
1. Assign the `Service Account User` and `Compute Admin` roles to your service account
![GCP service account roles](../../img/apps_gcp_autoscaler_credentials_3.png)
1. Complete creating the account
![GCP service account creation completion](../../img/apps_gcp_autoscaler_credentials_4.png)
1. In the **APIs & Services** > **Credentials** page, under **Service Accounts**, click on the service account you just
created and go to its **KEYS** tab
![GCP credential keys](../../img/apps_gcp_autoscaler_credentials_5.png)
1. Click **ADD KEY** and create a key in JSON format. Copy the contents of the JSON file.
![GCP credential key creation](../../img/apps_gcp_autoscaler_credentials_6.png)
1. Go to the GCP Autoscaler instance launch form **>** open the **GCP Configuration** panel **>** click *Edit* in the
**GCP Credentials** field.
![GCP credentials field](../../img/apps_gcp_autoscaler_credentials_6a.png)
Paste the contents of the JSON file from the previous step into the **GCP Credentials** popup.
![GCP credential instance launch form input](../../img/apps_gcp_autoscaler_credentials_7.png)
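
Before pasting the key, it can help to check that the copied text is a complete service account key. The following is a minimal sketch, not a ClearML utility: `check_service_account_key` and `REQUIRED_FIELDS` are hypothetical names, and the field list covers only a few of the fields found in a standard GCP service account key file.

```python
import json

# A subset of the fields present in a standard GCP service account key file
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(text):
    """Return a list of problems found in the pasted key; empty if none."""
    try:
        key = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(key, dict):
        return ["expected a JSON object"]
    # Report every required field that is absent or empty
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    # A key created for a service account declares that type explicitly
    if key.get("type") not in (None, "service_account"):
        problems.append(f"unexpected type: {key['type']!r}")
    return problems
```

An empty result means only that the text has the expected shape. It does not confirm that the key is active or that the service account has the `Service Account User` and `Compute Admin` roles.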