Update GUI applications (#851)

Authored by pollfly on 2024-06-09 13:26:44 +03:00, committed by GitHub
parent 039dbb6b1b
commit d4c5f6fa24
3 changed files with 59 additions and 22 deletions


@ -20,8 +20,7 @@ each instance is spun up.
For more information about how autoscalers work, see [Autoscalers Overview](../../cloud_autoscaling/autoscaling_overview.md#autoscaler-applications).
## Autoscaler Instance Configuration
* **Import Configuration** - Import an app instance configuration file. This will fill the configuration wizard with the
values from the file, which can be modified before launching the app instance
* **AWS Credentials** - Credentials with which the autoscaler can access your AWS account. See [Generating AWS IAM Credentials](#generating-aws-iam-credentials)
* Use IAM role - Select if you are running your autoscalers on your own EC2 instances which are attached to an [IAM
role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html). In such a case, no AWS IAM credentials are required
@ -37,15 +36,19 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
down
* **Workers Prefix** (optional) - A prefix added to workers' names, associating them with this autoscaler
* **Polling Interval** (optional) - Time period in minutes at which the designated queue is polled for new tasks
* **Use docker mode** - If selected, tasks enqueued to the autoscaler will be executed by ClearML Agents running in
[Docker mode](../../clearml_agent.md#docker-mode)
* **Base Docker Image** (optional) - Available when `Use docker mode` is selected: Default Docker image in which the
ClearML Agent will run. Provide an image stored in a Docker artifactory so instances can automatically fetch it
* **Compute Resources**
* Resource Name - Assign a name to the resource type. This name will appear in the Autoscaler dashboard
* EC2 Instance Type - See [Instance Types](https://aws.amazon.com/ec2/instance-types) for full list of types
* Run in CPU mode - Check box to run with CPU only
* Use Spot Instance - Select to use a spot instance. Otherwise, a reserved instance is used.
* Regular Instance Rollback - When selected, if a spot instance is unavailable for the time specified in the `Regular Instance Rollback Timeout`, a reserved instance will be spun up instead
* Regular Instance Rollback Timeout - Controls how long the autoscaler will wait for a spot instance to become available. It will first attempt to start a spot instance, then periodically retry. Once the specified time is exceeded, the autoscaler will try to start a reserved instance instead. The timeout applies to a single provisioning attempt in which starting a spot instance fails and an alternative instance needs to be started (a minimal sketch of this flow appears below)
* Spot Instance Blackout Period - A blackout period applied after a spot instance fails to start. Unlike the rollback timeout, this affects future attempts: for the duration of the blackout, requests to start additional spot instances are converted to requests for regular instances. This eases the spot request load on the cloud provider and avoids flooding the cloud account with spot requests, which might cause the provider to refuse creating spots for a longer period
* Place tags on resources - Choose which cloud resources, in addition to the instances themselves, the tags will be placed on
* Availability Zone - The [EC2 availability zone](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.AvailabilityZones)
to launch this resource in
* AMI ID - The AWS AMI to launch
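The interplay between `Use Spot Instance`, `Regular Instance Rollback`, `Regular Instance Rollback Timeout`, and `Spot Instance Blackout Period` boils down to a small decision flow. The following Python snippet is only an illustrative sketch of that flow; every name in it is hypothetical, and it is not the autoscaler's actual implementation:

```python
import time

ROLLBACK_TIMEOUT_MIN = 10   # "Regular Instance Rollback Timeout"
BLACKOUT_PERIOD_MIN = 30    # "Spot Instance Blackout Period"

_blackout_until = 0.0       # while in the future, new spot requests are skipped


def launch_instance(start_spot, start_regular, rollback_enabled=True):
    """Start one instance, preferring a spot instance unless a blackout is active."""
    global _blackout_until
    now = time.time()

    # During a blackout period, spot requests are converted to regular instances
    if now < _blackout_until:
        return start_regular()

    # Retry the spot request until this attempt's rollback timeout expires
    deadline = now + ROLLBACK_TIMEOUT_MIN * 60
    while time.time() < deadline:
        instance = start_spot()
        if instance is not None:
            return instance
        time.sleep(60)  # wait before retrying the spot request

    # Spot capacity is unavailable: apply a blackout to future spot requests,
    # and optionally roll back to a regular instance for this attempt
    _blackout_until = time.time() + BLACKOUT_PERIOD_MIN * 60
    return start_regular() if rollback_enabled else None
```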
@ -86,8 +89,7 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
creating the instances' launch spec. See [boto3 EC2.client.run_instances Request Syntax](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2/client/run_instances.html)
and [AWS API Reference: RunInstances](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_RunInstances.html) (available under ClearML Enterprise Plan)
* **Run with Service Account** - Select to allow running the application under a [Service Account](../webapp_profile.md#service-accounts) identity instead of under your own identity (available under ClearML Enterprise Plan)
* **Export Configuration** - Export the app instance configuration as a JSON file, which you can later import to create
a new instance with the same configuration
![Autoscaler wizard](../../img/app_aws_autoscaler_wizard.png)
@ -104,6 +106,10 @@ You can utilize the [configuration vault](../../webapp/webapp_profile.md#configu
the one set in the `Init script` field of the autoscaler wizard
* `extra_clearml_conf` - ClearML configuration to use by the ClearML Agent when executing your experiments. This
configuration will be appended to that set in the `Additional ClearML Configuration` field of the autoscaler wizard
* `files` - Files to create at designated paths with predefined content on the launched cloud instances.
For more information, see [Files Section](../../configs/clearml_conf.md#files-section)
* `environment` - Dictionary of environment variables and values to set in the OS environment of the launched cloud
instances. For more information, see [Environment Section](../../configs/clearml_conf.md#environment-section)
For example, the following configuration would be applied to all autoscaler instances:
@ -118,6 +124,24 @@ auto_scaler.v1.aws {
   extra_clearml_conf: """
   agent.docker_force_pull: true
   """
   files {
     boto3_file {
       contents: |
         boto3 {
           pool_connections: 512
           max_multipart_concurrency: 16
         }
       path: "/boto3_config.yaml"
       target_format: yaml
       mode: "0o644"
     }
   }
   environment {
     DB_PASSWORD: "secretpassword"
     LOG_LEVEL: "info"
   }
}
```
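To illustrate the effect of the `files` and `environment` sections above on a launched instance: the listed variables should appear in the OS environment of processes running on the instance, and the file should exist at its configured path. A minimal, hypothetical Python check (the exact contents of the created file depend on the `target_format` conversion):

```python
import os

# Variables from the vault's `environment` section are set in the instance's
# OS environment, so any process running on the instance can read them.
print("LOG_LEVEL =", os.environ.get("LOG_LEVEL"))
assert "DB_PASSWORD" in os.environ  # present, but avoid printing secrets

# The `files` section creates a file at the configured path
# ("/boto3_config.yaml" in the example above).
with open("/boto3_config.yaml") as f:
    print(f.read())
```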


@ -21,19 +21,25 @@ when each VM instance is spun up.
For more information about how autoscalers work, see [Autoscalers Overview](../../cloud_autoscaling/autoscaling_overview.md#autoscaler-applications).
## Autoscaler Instance Configuration
* **Import Configuration** - Import an app instance configuration file. This will fill the configuration wizard with the
values from the file, which can be modified before launching the app instance
* **GCP Configuration**
* GCP Project ID - Project used for spinning up VM instances
* GCP Zone - The GCP zone where the VM instances will be spun up. See [Regions and zones](https://cloud.google.com/compute/docs/regions-zones)
* Use full subnet path - Select to specify a full subnet path (e.g. to reference a subnet from a different project)
* GCP Subnet Full Path - Available if `Use full subnet path` was selected. The GCP subnetwork where the instances
will be spun up. This allows setting a custom subnet resource path, and allows setting subnets shared from other
projects as well. See [GCP Documentation](https://cloud.google.com/dataflow/docs/guides/specifying-networks).
* GCP Subnet Name - Available if `Use full subnet path` was not selected. The GCP subnetwork where the instances
will be spun up. GCP setting will be `projects/{project-id}/regions/{region}/subnetworks/{subnetwork}`
* GCP Credentials - Credentials with which the autoscaler can access your GCP account for spinning VM instances
up/down. See [Generating GCP Credentials](#generating-gcp-credentials).
* **Git Configuration** - Git credentials with which the ClearML Agents running on your VM instances will access your
repositories to retrieve the code for their jobs
* Git User
* Git Password / Personal Access Token
* **Use docker mode** - If selected, tasks enqueued to the autoscaler will be executed by ClearML Agents running in
[Docker mode](../../clearml_agent.md#docker-mode)
* **Base Docker Image** (optional) - Available when `Use docker mode` is selected. Default Docker image in which the ClearML Agent will run. Provide an image stored in a
Docker artifactory so VM instances can automatically fetch it
* **Compute Resources**
* Resource Name - Assign a name to the resource type. This name will appear in the Autoscaler dashboard
@ -41,6 +47,12 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
* Run in CPU mode - Select to have the autoscaler utilize only CPU VM instances
* GPU Type - See list of [supported GPUs by instance](https://cloud.google.com/compute/docs/gpus)
* Use Preemptible Instance - Choose whether VM instances of this type will be [preemptible](https://cloud.google.com/compute/docs/instances/preemptible)
* Preemptible Provision Model - Select the provision model. For more information, see [Spot VMs](https://cloud.google.com/compute/docs/instances/spot)
and [Preemptible VMs](https://cloud.google.com/compute/docs/instances/preemptible)
* Regular Instance Rollback - When selected, if a preemptible instance is unavailable for the time specified in the
`Regular Instance Rollback Timeout` field, a regular instance will be spun up instead
* Preemptible Instance Blackout Period - Time (in minutes) during which regular instances will be used instead,
after an attempt to acquire a preemptible instance fails
* Max Number of Instances - Maximum number of concurrent running VM instances of this type allowed
* Monitored Queue - Queue associated with this VM instance type. The tasks enqueued to this queue will be executed on VM instances of this type
* Machine Image (optional) - The GCP machine image to launch
@ -48,6 +60,11 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
The machine image used for the autoscaler must include docker runtime and virtualenv
:::
* Disc Size (in GB) (optional)
* Use the default GCP Service Account - If selected, the default service account will be used. To use a
different service account, clear this option and fill in the `Service Account Email` field
* Service Account Scopes - Comma-separated scope aliases. For more information, see [GCP documentation](https://cloud.google.com/sdk/gcloud/reference/compute/instances/create#--scopes).
Fully-qualified scope URIs are supported. If left empty, the autoscaler will use the default scopes.
* Metadata - GCP Metadata to be applied on this resource's instances. Input comma-separated key=value pairs (e.g. `"Owner=Admin,Foo=Bar"`). A short sketch of parsing this format appears below the screenshot.
* \+ Add Item - Define another resource type
* **Global Metadata** (optional) - GCP Metadata to be applied on all instances. Input comma-separated key=value pairs (e.g. `"Owner=Admin,Foo=Bar"`).
* **Autoscaler Instance Name** (optional) - Name for the Autoscaler instance. This will appear in the instance list
@ -64,9 +81,7 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
* **Init Script** (optional) - A bash script to execute after launching the VM instance
* **Additional ClearML Configuration** (optional) - A ClearML configuration file to use by the ClearML Agent when executing your experiments
* **Run with Service Account** - Select to allow running the application under a [Service Account](../webapp_profile.md#service-accounts) identity instead of under your own identity (available under ClearML Enterprise Plan)
* **Export Configuration** - Export the app instance configuration as a JSON file, which you can later import to create
a new instance with the same configuration.
![GCP autoscaler wizard](../../img/apps_gcp_autoscaler_wizard.png)
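As a quick illustration of the comma-separated `key=value` metadata format used by the `Metadata` and `Global Metadata` fields (e.g. `"Owner=Admin,Foo=Bar"`), the following Python sketch shows how such a string maps to individual metadata entries. It is an assumption for clarity only, not the application's parsing code:

```python
def parse_metadata(spec: str) -> dict:
    """Split a comma-separated "key=value" string into a metadata dict."""
    entries = {}
    for item in spec.split(","):
        if not item.strip():
            continue  # ignore empty segments
        key, _, value = item.partition("=")
        entries[key.strip()] = value.strip()
    return entries


print(parse_metadata("Owner=Admin,Foo=Bar"))
# -> {'Owner': 'Admin', 'Foo': 'Bar'}
```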
<a id="configuration_vault"/>


@ -6,7 +6,7 @@ title: Hyperparameter Optimization
The ClearML HPO App is available under the ClearML Pro plan
:::
The Hyperparameter Optimization Application finds the set of parameter values that optimize one or more specified metrics for your
model.
It takes in a ClearML experiment and its parameters to optimize. The parameter search space can be specified
@ -19,16 +19,15 @@ Control the optimization process with the advanced configuration options, which
limits.
## HPO Instance Configuration
* **Import Configuration** - Import an app instance configuration file. This will fill the configuration wizard with the
values from the file, which can be modified before launching the app instance
* **Initial Task to Optimize** - ID of a ClearML task to optimize. This task will be cloned, and each clone will
sample a different set of hyperparameter values (a rough SDK equivalent of this setup is sketched at the end of this section)
* **Optimization Method** - The optimization strategy to employ (e.g. random, grid, hyperband)
* **Objectives** - Set the optimization targets: minimize or maximize the values of one or more specified metrics
* Optimization Objective Metric's Title - Title of metric to optimize
* Optimization Objective Metric's Series - Metric series (variant) to optimize
* Optimization Objective Trend - Choose the optimization target, whether to maximize or minimize the value of the
metric specified above
* \+ Add item - Add an objective
* **Execution Queue** - The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which
optimization tasks will be enqueued (make sure an agent is assigned to that queue)
* **Parameters to Optimize** - Parameters comprising the optimization space
@ -70,8 +69,7 @@ limits.
stopped. Iterations are based on the experiments' own reporting (for example, if experiments report every epoch,
then iterations=epochs)
* Limit Total Optimization Instance Time (Minutes) - Time limit for the whole optimization process (in minutes)
* **Export Configuration** - Export the app instance configuration as a JSON file, which you can later import to create
a new instance with the same configuration.
![HPO app wizard](../../img/apps_hpo_wizard.png)
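For readers who prefer code, the wizard's fields map roughly onto the ClearML SDK's `HyperParameterOptimizer` class (in `clearml.automation`). The sketch below is an assumed, simplified equivalent of one possible wizard configuration, not the application's implementation; the task ID, queue name, parameter names, and metric names are hypothetical placeholders, and argument names should be verified against the SDK reference:

```python
from clearml import Task
from clearml.automation import (
    DiscreteParameterRange,
    HyperParameterOptimizer,
    RandomSearch,
    UniformParameterRange,
)

# Controller task that drives the optimization (the app creates one for you)
Task.init(project_name="HPO", task_name="hpo controller", task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id="<initial-task-id>",        # "Initial Task to Optimize"
    hyper_parameters=[                       # "Parameters to Optimize"
        UniformParameterRange("General/learning_rate", min_value=1e-4, max_value=1e-1),
        DiscreteParameterRange("General/batch_size", values=[32, 64, 128]),
    ],
    objective_metric_title="validation",     # objective metric's title
    objective_metric_series="accuracy",      # objective metric's series (variant)
    objective_metric_sign="max",             # objective trend: maximize or minimize
    optimizer_class=RandomSearch,            # "Optimization Method"
    execution_queue="default",               # "Execution Queue"
    max_number_of_concurrent_tasks=2,
    total_max_jobs=20,
    max_iteration_per_job=1000,              # per-experiment iteration limit
)

optimizer.start()
optimizer.wait()   # block until the optimization finishes or hits its limits
optimizer.stop()
```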