Update GUI applications (#851)

pollfly 2024-06-09 13:26:44 +03:00 committed by GitHub
parent 039dbb6b1b
commit d4c5f6fa24
3 changed files with 59 additions and 22 deletions

@ -20,8 +20,7 @@ each instance is spun up.
For more information about how autoscalers work, see [Autoscalers Overview](../../cloud_autoscaling/autoscaling_overview.md#autoscaler-applications). For more information about how autoscalers work, see [Autoscalers Overview](../../cloud_autoscaling/autoscaling_overview.md#autoscaler-applications).
## Autoscaler Instance Configuration ## Autoscaler Instance Configuration
* **Import Configuration** - Import an app instance configuration file. This will fill the configuration wizard with the
values from the file, which can be modified before launching the app instance
* **AWS Credentials** - Credentials with which the autoscaler can access your AWS account. See [Generating AWS IAM Credentials](#generating-aws-iam-credentials) * **AWS Credentials** - Credentials with which the autoscaler can access your AWS account. See [Generating AWS IAM Credentials](#generating-aws-iam-credentials)
* Use IAM role - Select if you are running your autoscalers on your own EC2 instances which are attached to an [IAM * Use IAM role - Select if you are running your autoscalers on your own EC2 instances which are attached to an [IAM
role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html). In such a case, no AWS IAM credentials are required role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html). In such a case, no AWS IAM credentials are required
@ -37,15 +36,19 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
down down
* **Workers Prefix** (optional) - A Prefix added to workers' names, associating them with this autoscaler * **Workers Prefix** (optional) - A Prefix added to workers' names, associating them with this autoscaler
* **Polling Interval** (optional) - Time period in minutes at which the designated queue is polled for new tasks * **Polling Interval** (optional) - Time period in minutes at which the designated queue is polled for new tasks
* **Base Docker Image** (optional) - Default Docker image in which the ClearML Agent will run. Provide a Docker stored * **Use docker mode** - If selected, tasks enqueued to the autoscaler will be executed by ClearML Agents running in
in a Docker artifactory so instances can automatically fetch it [Docker mode](../../clearml_agent.md#docker-mode)
* **Base Docker Image** (optional) - Available when `Use docker mode` is selected: Default Docker image in which the
ClearML Agent will run. Provide an image stored in a Docker artifactory so instances can automatically fetch it
* **Compute Resources** * **Compute Resources**
* Resource Name - Assign a name to the resource type. This name will appear in the Autoscaler dashboard * Resource Name - Assign a name to the resource type. This name will appear in the Autoscaler dashboard
* EC2 Instance Type - See [Instance Types](https://aws.amazon.com/ec2/instance-types) for full list of types * EC2 Instance Type - See [Instance Types](https://aws.amazon.com/ec2/instance-types) for full list of types
* Run in CPU mode - Check box to run with CPU only * Run in CPU mode - Check box to run with CPU only
* Use Spot Instance - Check box to use a spot instance. Else, a reserved instance is used * Use Spot Instance - Select to use a spot instance. Otherwise, a reserved instance is used.
* Regular Instance Rollback Timeout - Controls when the autoscaler will revert to starting a regular instance after failing to start a spot instance. It will first attempt to start a spot, and then wait and retry again and again. Once the time it waited exceeded the Regular Instance Rollback Timeout, the autoscaler will try to start a regular instance instead. This is for a specific attempt, where starting a spot fails and an alternative instance needs to be started. * Regular Instance Rollback - When selected, if a spot instance is unavailable for the time specified in the `Regular Instance Rollback Timeout`, a reserved instance will be spun up instead
* Regular Instance Rollback Timeout - Controls how long the autoscaler will wait for a spot instance to become available. It first attempts to start a spot instance, then retries periodically. Once the specified time is exceeded, the autoscaler tries to start a reserved instance instead. The timeout applies to a single attempt, in which starting a spot instance fails and an alternative instance needs to be started.
* Spot Instance Blackout Period - Specifies a blackout period after failing to start a spot instance. This is related to future attempts: after failing to start a spot instance, all requests to start additional spot instances will be converted to attempts to start regular instances, as a way of "easing" the spot requests load on the cloud provider and not creating a "DOS" situation in the cloud account which might cause the provider to refuse creating spots for a longer period. * Spot Instance Blackout Period - Specifies a blackout period after failing to start a spot instance. This is related to future attempts: after failing to start a spot instance, all requests to start additional spot instances will be converted to attempts to start regular instances, as a way of "easing" the spot requests load on the cloud provider and not creating a "DOS" situation in the cloud account which might cause the provider to refuse creating spots for a longer period.
* Place tags on resources - In addition to tagging the instance, choose which other cloud resources the tags will be placed on
* Availability Zone - The [EC2 availability zone](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.AvailabilityZones) * Availability Zone - The [EC2 availability zone](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.AvailabilityZones)
to launch this resource in to launch this resource in
* AMI ID - The AWS AMI to launch * AMI ID - The AWS AMI to launch
@ -86,8 +89,7 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
creating the instances launch spec. See [boto3 EC2.client.run_instances Request Syntax](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2/client/run_instances.html) creating the instances launch spec. See [boto3 EC2.client.run_instances Request Syntax](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2/client/run_instances.html)
and [AWS API Reference: RunInstances](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_RunInstances.html) (available under ClearML Enterprise Plan) and [AWS API Reference: RunInstances](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_RunInstances.html) (available under ClearML Enterprise Plan)
* **Run with Service Account** - Select to allow running the application under a [Service Account](../webapp_profile.md#service-accounts) identity instead of under your own identity (available under ClearML Enterprise Plan) * **Run with Service Account** - Select to allow running the application under a [Service Account](../webapp_profile.md#service-accounts) identity instead of under your own identity (available under ClearML Enterprise Plan)
* **Export Configuration** - Export the app instance configuration as a JSON file, which you can later import to create
a new instance with the same configuration
![Autoscaler wizard](../../img/app_aws_autoscaler_wizard.png) ![Autoscaler wizard](../../img/app_aws_autoscaler_wizard.png)
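
The interaction between `Regular Instance Rollback Timeout` and `Spot Instance Blackout Period` described above can be sketched roughly as follows. This is an illustrative model only, not the autoscaler's actual implementation: `SpotPolicy`, `choose_instance`, `try_spot`, and the one-minute retry interval are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class SpotPolicy:
    """Hypothetical model of the two spot-related wizard settings (in minutes)."""
    rollback_timeout_min: float             # Regular Instance Rollback Timeout
    blackout_period_min: float              # Spot Instance Blackout Period
    blackout_until: Optional[float] = None  # time at which the blackout ends


def choose_instance(
    policy: SpotPolicy,
    now: float,
    try_spot: Callable[[float], bool],
    retry_every_min: float = 1.0,           # assumed retry interval
) -> Tuple[str, float]:
    """Return ("spot" or "regular", time the decision was made) for one request.

    `try_spot(t)` stands in for the cloud call and returns True on success.
    """
    # During a blackout, spot requests are converted to regular requests.
    if policy.blackout_until is not None and now < policy.blackout_until:
        return "regular", now

    t = now
    while True:
        if try_spot(t):
            return "spot", t
        if t - now >= policy.rollback_timeout_min:
            break                           # waited long enough for this attempt
        t += retry_every_min

    # The attempt failed: fall back, and start a blackout for future requests.
    policy.blackout_until = t + policy.blackout_period_min
    return "regular", t
```

With a 3-minute timeout and a 10-minute blackout, a request at minute 0 that never obtains a spot instance falls back to a regular instance at minute 3. In this model, later requests are then served by regular instances until minute 13.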
@ -104,6 +106,10 @@ You can utilize the [configuration vault](../../webapp/webapp_profile.md#configu
the one set in the `Init script` field of the autoscaler wizard the one set in the `Init script` field of the autoscaler wizard
* `extra_clearml_conf` - ClearML configuration to use by the ClearML Agent when executing your experiments. This * `extra_clearml_conf` - ClearML configuration to use by the ClearML Agent when executing your experiments. This
configuration will be appended to that set in the `Additional ClearML Configuration` field of the autoscaler wizard configuration will be appended to that set in the `Additional ClearML Configuration` field of the autoscaler wizard
* `files` - Files to create at designated paths with predefined content on the launched cloud instances.
For more information, see [Files Section](../../configs/clearml_conf.md#files-section)
* `environment` - Dictionary of environment variables and values to set in the OS environment of the launched cloud
instances. For more information, see [Environment Section](../../configs/clearml_conf.md#environment-section)
For example, the following configuration would be applied to all autoscaler instances: For example, the following configuration would be applied to all autoscaler instances:
@ -118,6 +124,24 @@ auto_scaler.v1.aws {
extra_clearml_conf: """ extra_clearml_conf: """
agent.docker_force_pull: true agent.docker_force_pull: true
""" """
files {
boto3_file {
contents: |
boto3 {
pool_connections: 512
max_multipart_concurrency: 16
}
path: "/boto3_config.yaml"
target_format: yaml
mode: "0o644"
}
}
}
environment {
DB_PASSWORD: "secretpassword"
LOG_LEVEL: "info"
}
} }
``` ```
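
To make the `files` and `environment` entries more concrete, here is a minimal Python sketch of what applying them on an instance could look like. It is an assumption-laden illustration, not ClearML's implementation: `apply_files` and `apply_environment` are hypothetical helpers, the entries are passed in as plain dictionaries, and `target_format` conversion is deliberately left out.

```python
import os
from pathlib import Path


def apply_files(files, root):
    """Write each entry's contents under `root` and set its permission bits.

    Hypothetical helper: vault paths are absolute, so they are re-rooted
    under `root` to keep the sketch safe to run.
    """
    written = []
    for _name, spec in files.items():
        target = Path(root) / spec["path"].lstrip("/")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(spec["contents"])
        # Mode strings such as "0o644" are octal literals.
        target.chmod(int(spec.get("mode", "0o644"), 8))
        written.append(target)
    return written


def apply_environment(environment, env=None):
    """Return a copy of `env` (default: os.environ) with the entries added."""
    merged = dict(os.environ if env is None else env)
    merged.update({key: str(value) for key, value in environment.items()})
    return merged
```

For example, `apply_environment({"LOG_LEVEL": "info"}, env={})` returns a dictionary containing only `LOG_LEVEL`, leaving the real process environment untouched.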

@ -21,19 +21,25 @@ when each VM instance is spun up.
For more information about how autoscalers work, see [Autoscalers Overview](../../cloud_autoscaling/autoscaling_overview.md#autoscaler-applications). For more information about how autoscalers work, see [Autoscalers Overview](../../cloud_autoscaling/autoscaling_overview.md#autoscaler-applications).
## Autoscaler Instance Configuration ## Autoscaler Instance Configuration
* **Import Configuration** - Import an app instance configuration file. This will fill the configuration wizard with the
values from the file, which can be modified before launching the app instance
* **GCP Configuration** * **GCP Configuration**
* GCP Project ID - Project used for spinning up VM instances * GCP Project ID - Project used for spinning up VM instances
* GCP Zone - The GCP zone where the VM instances will be spun up. See [Regions and zones](https://cloud.google.com/compute/docs/regions-zones) * GCP Zone - The GCP zone where the VM instances will be spun up. See [Regions and zones](https://cloud.google.com/compute/docs/regions-zones)
* GCP Subnetwork - The GCP subnetwork where the instances will be spun up. GCP setting will be `projects/{project-id}/regions/{region}/subnetworks/{subnetwork}` * Use full subnet path - Select to specify a full subnet path (e.g. to reference a subnet from a different project)
* GCP Subnet Full Path - Available if `Use full subnet path` was selected. The GCP subnetwork where the instances
will be spun up. This lets you set a custom subnet resource path, including subnets shared from other
projects. See [GCP Documentation](https://cloud.google.com/dataflow/docs/guides/specifying-networks).
* GCP Subnet Name - Available if `Use full subnet path` was not selected. The GCP subnetwork where the instances
will be spun up. GCP setting will be `projects/{project-id}/regions/{region}/subnetworks/{subnetwork}`
* GCP Credentials - Credentials with which the autoscaler can access your GCP account for spinning VM instances * GCP Credentials - Credentials with which the autoscaler can access your GCP account for spinning VM instances
up/down. See [Generating GCP Credentials](#generating-gcp-credentials). up/down. See [Generating GCP Credentials](#generating-gcp-credentials).
* **Git Configuration** - Git credentials with which the ClearML Agents running on your VM instances will access your * **Git Configuration** - Git credentials with which the ClearML Agents running on your VM instances will access your
repositories to retrieve the code for their jobs repositories to retrieve the code for their jobs
* Git User * Git User
* Git Password / Personal Access Token * Git Password / Personal Access Token
* **Base Docker Image** (optional) - Default Docker image in which the ClearML Agent will run. Provide a Docker stored in a * **Use docker mode** - If selected, tasks enqueued to the autoscaler will be executed by ClearML Agents running in
[Docker mode](../../clearml_agent.md#docker-mode)
* **Base Docker Image** (optional) - Available when `Use docker mode` is selected. Default Docker image in which the ClearML Agent will run. Provide an image stored in a
Docker artifactory so VM instances can automatically fetch it Docker artifactory so VM instances can automatically fetch it
* **Compute Resources** * **Compute Resources**
* Resource Name - Assign a name to the resource type. This name will appear in the Autoscaler dashboard * Resource Name - Assign a name to the resource type. This name will appear in the Autoscaler dashboard
@ -41,6 +47,12 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
* Run in CPU mode - Select to have the autoscaler utilize only CPU VM instances * Run in CPU mode - Select to have the autoscaler utilize only CPU VM instances
* GPU Type - See list of [supported GPUs by instance](https://cloud.google.com/compute/docs/gpus) * GPU Type - See list of [supported GPUs by instance](https://cloud.google.com/compute/docs/gpus)
* Use Preemptible Instance - Choose whether VM instances of this type will be [preemptible](https://cloud.google.com/compute/docs/instances/preemptible) * Use Preemptible Instance - Choose whether VM instances of this type will be [preemptible](https://cloud.google.com/compute/docs/instances/preemptible)
* Preemptible Provision Model - Select the provision model. For more information, see [Spot VMs](https://cloud.google.com/compute/docs/instances/spot)
and [Preemptible VMs](https://cloud.google.com/compute/docs/instances/preemptible)
* Regular Instance Rollback - When selected, if a preemptible instance is unavailable for the time specified in the
`Regular Instance Rollback Timeout` field, a regular instance will be spun up instead
* Preemptible Instance Blackout Period - Time (in minutes) during which regular instances are used instead, after
acquiring a preemptible instance fails
* Max Number of Instances - Maximum number of concurrent running VM instances of this type allowed * Max Number of Instances - Maximum number of concurrent running VM instances of this type allowed
* Monitored Queue - Queue associated with this VM instance type. The tasks enqueued to this queue will be executed on VM instances of this type * Monitored Queue - Queue associated with this VM instance type. The tasks enqueued to this queue will be executed on VM instances of this type
* Machine Image (optional) - The GCP machine image to launch * Machine Image (optional) - The GCP machine image to launch
@ -48,6 +60,11 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
The machine image used for the autoscaler must include docker runtime and virtualenv The machine image used for the autoscaler must include docker runtime and virtualenv
::: :::
* Disc Size (in GB) (optional) * Disc Size (in GB) (optional)
* Use the default GCP Service Account - If selected, the default service account will be used. To use a
different service account, clear this option and fill in the `Service Account Email` field
* Service Account Scopes - Comma-separated scope aliases. For more information, see [GCP documentation](https://cloud.google.com/sdk/gcloud/reference/compute/instances/create#--scopes).
Fully-qualified scope URIs are supported. If left empty, the autoscaler will use the default scopes.
* Metadata - GCP Metadata to be applied on this resource's instances. Input comma separated key=value pairs (e.g. `"Owner=Admin,Foo=Bar"`).
* \+ Add Item - Define another resource type * \+ Add Item - Define another resource type
* **Global Metadata** (optional) - GCP Metadata to be applied on all instances. Input comma separated key=value pairs (e.g. `"Owner=Admin,Foo=Bar"`). * **Global Metadata** (optional) - GCP Metadata to be applied on all instances. Input comma separated key=value pairs (e.g. `"Owner=Admin,Foo=Bar"`).
* **Autoscaler Instance Name** (optional) - Name for the Autoscaler instance. This will appear in the instance list * **Autoscaler Instance Name** (optional) - Name for the Autoscaler instance. This will appear in the instance list
@ -64,9 +81,7 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
* **Init Script** (optional) - A bash script to execute after launching the VM instance * **Init Script** (optional) - A bash script to execute after launching the VM instance
* **Additional ClearML Configuration** (optional) - A ClearML configuration file to use by the ClearML Agent when executing your experiments * **Additional ClearML Configuration** (optional) - A ClearML configuration file to use by the ClearML Agent when executing your experiments
* **Run with Service Account** - Select to allow running the application under a [Service Account](../webapp_profile.md#service-accounts) identity instead of under your own identity (available under ClearML Enterprise Plan) * **Run with Service Account** - Select to allow running the application under a [Service Account](../webapp_profile.md#service-accounts) identity instead of under your own identity (available under ClearML Enterprise Plan)
* **Export Configuration** - Export the app instance configuration as a JSON file, which you can later import to create
a new instance with the same configuration.
![GCP autoscaler wizard](../../img/apps_gcp_autoscaler_wizard.png) ![GCP autoscaler wizard](../../img/apps_gcp_autoscaler_wizard.png)
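
The subnet path template and the `key=value` metadata format used in the wizard can be illustrated with a short Python sketch. The helper names (`subnet_path`, `region_of`, `parse_metadata`) are hypothetical. Deriving the region from the zone name is an assumption about how the path might be assembled, not a documented behavior of the autoscaler.

```python
def subnet_path(project_id, region, subnetwork):
    """Build the subnetwork resource path from its three components."""
    return f"projects/{project_id}/regions/{region}/subnetworks/{subnetwork}"


def region_of(zone):
    """Derive the region from a zone name by dropping the trailing zone letter."""
    return zone.rsplit("-", 1)[0]


def parse_metadata(text):
    """Parse comma-separated key=value pairs into a dictionary."""
    text = text.strip().strip('"')
    pairs = {}
    for item in filter(None, (part.strip() for part in text.split(","))):
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs
```

A call such as `parse_metadata("Owner=Admin,Foo=Bar")` yields one dictionary entry per pair, and an item without `=` raises `ValueError` rather than being silently dropped.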
<a id="configuration_vault"/> <a id="configuration_vault"/>

@ -6,7 +6,7 @@ title: Hyperparameter Optimization
The ClearML HPO App is available under the ClearML Pro plan The ClearML HPO App is available under the ClearML Pro plan
::: :::
The Hyperparameter Optimization Application finds the set of parameter values that optimize a specific metric for your The Hyperparameter Optimization Application finds the set of parameter values that optimize one or more specified metrics for your
model. model.
It takes in a ClearML experiment and its parameters to optimize. The parameter search space can be specified It takes in a ClearML experiment and its parameters to optimize. The parameter search space can be specified
@ -19,16 +19,15 @@ Control the optimization process with the advanced configuration options, which
limits. limits.
## HPO Instance Configuration ## HPO Instance Configuration
* **Import Configuration** - Import an app instance configuration file. This will fill the configuration wizard with the
values from the file, which can be modified before launching the app instance
* **Initial Task to Optimize** - ID of a ClearML task to optimize. This task will be cloned, and each clone will * **Initial Task to Optimize** - ID of a ClearML task to optimize. This task will be cloned, and each clone will
sample a different set of hyperparameter values sample a different set of hyperparameter values
* **Optimization Configuration** * **Optimization Method** - The optimization strategy to employ (e.g. random, grid, hyperband)
* Optimization Method - The optimization strategy to employ (e.g. random, grid, hyperband) * **Objectives** - Set the optimization targets of minimizing or maximizing the values of one or more specified metrics
* Optimization Objective Metric's Title - Title of metric to optimize * Optimization Objective Metric's Title - Title of metric to optimize
* Optimization Objective Metric's Series - Metric series (variant) to optimize * Optimization Objective Metric's Series - Metric series (variant) to optimize
* Optimization Objective Trend - Choose the optimization target, whether to maximize or minimize the value of the * Optimization Objective Trend - Choose the optimization target, whether to maximize or minimize the value of the
metric specified above metric specified above
* \+ Add item - Add an objective
* **Execution Queue** - The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which * **Execution Queue** - The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which
optimization tasks will be enqueued (make sure an agent is assigned to that queue) optimization tasks will be enqueued (make sure an agent is assigned to that queue)
* **Parameters to Optimize** - Parameters comprising the optimization space * **Parameters to Optimize** - Parameters comprising the optimization space
@ -70,8 +69,7 @@ limits.
stopped. Iterations are based on the experiments' own reporting (for example, if experiments report every epoch, stopped. Iterations are based on the experiments' own reporting (for example, if experiments report every epoch,
then iterations=epochs) then iterations=epochs)
* Limit Total Optimization Instance Time (Minutes) - Time limit for the whole optimization process (in minutes) * Limit Total Optimization Instance Time (Minutes) - Time limit for the whole optimization process (in minutes)
* **Export Configuration** - Export the app instance configuration as a JSON file, which you can later import to create
a new instance with the same configuration.
![HPO app wizard](../../img/apps_hpo_wizard.png) ![HPO app wizard](../../img/apps_hpo_wizard.png)
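
With more than one objective, each defined by a metric title, series, and trend, two experiments can no longer be ranked by a single number. One common way to compare them is Pareto dominance, sketched below. This is a generic illustration under stated assumptions: the `Objective` class and the `signed` and `dominates` helpers are hypothetical, and the source does not say which comparison the HPO app itself uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    """One entry of the Objectives list (field names are illustrative)."""
    title: str   # metric title
    series: str  # metric series (variant)
    trend: str   # "min" or "max"


def signed(objectives, values):
    """Flip minimized metrics so that larger is always better."""
    return [v if o.trend == "max" else -v for o, v in zip(objectives, values)]


def dominates(objectives, a, b):
    """True if result `a` is at least as good as `b` everywhere and better somewhere."""
    sa, sb = signed(objectives, a), signed(objectives, b)
    at_least_as_good = all(x >= y for x, y in zip(sa, sb))
    strictly_better = any(x > y for x, y in zip(sa, sb))
    return at_least_as_good and strictly_better
```

When maximizing accuracy and minimizing loss, a result with higher accuracy and lower loss dominates the other. A result that is better on one metric and worse on the other does not dominate, and is not dominated.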