Add ClearML GUI apps (#301)

pollfly 2022-08-03 12:15:15 +03:00 committed by GitHub
parent c162d280cc
commit 495f30649a
18 changed files with 357 additions and 2 deletions

12 binary image files added (not shown), including docs/img/apps_hpo.png.

View File

@ -0,0 +1,144 @@
---
title: AWS Autoscaler
---
:::info Pro Plan Offering
The ClearML AWS Autoscaler App is available under the ClearML Pro plan
:::
The AWS Autoscaler Application optimizes AWS EC2 instance usage according to a user-defined resource budget: define your
budget by specifying the type and number of available compute resources.

Each resource type is associated with a ClearML [queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) whose status determines the need for instances of that resource
type (i.e. new instances are spun up if there are pending jobs on the queue).

When running, the autoscaler periodically polls your AWS cluster. The autoscaler automatically terminates idle instances
based on a specified maximum idle time, or spins up new instances when there aren't enough to execute pending tasks in a
queue (until reaching the defined maximum number of instances). You can add an init script, which will be executed when
each instance is spun up.
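
The decision described above can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions, not the app's actual implementation: the `Instance` class, the `plan_scaling` function, and all parameter names are hypothetical, and the sketch assumes each instance runs one task at a time.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    idle_minutes: float  # 0 while the instance is executing a task (assumption)


def plan_scaling(pending_tasks, instances, max_instances, max_idle_minutes):
    """Return (number_to_spin_up, ids_to_terminate) for one polling cycle."""
    idle = [i for i in instances if i.idle_minutes > 0]
    # Pending tasks that no idle instance can pick up need new instances,
    # capped by the configured maximum for this resource type.
    uncovered = max(0, pending_tasks - len(idle))
    headroom = max(0, max_instances - len(instances))
    to_spin_up = min(uncovered, headroom)
    # Reclaim idle instances only when nothing is waiting in the queue.
    to_terminate = []
    if pending_tasks == 0:
        to_terminate = [
            i.instance_id for i in idle if i.idle_minutes >= max_idle_minutes
        ]
    return to_spin_up, to_terminate


fleet = [Instance("i-a", 0), Instance("i-b", 20)]
print(plan_scaling(3, fleet, max_instances=4, max_idle_minutes=15))  # (2, [])
print(plan_scaling(0, fleet, max_instances=4, max_idle_minutes=15))  # (0, ['i-b'])
```

In the first call, three tasks are pending and one instance is idle, so two more instances are requested. In the second, the queue is empty and the instance that has been idle longer than the limit is marked for termination.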
## Autoscaler Instance Configuration
* **AWS Credentials** - Credentials with which the autoscaler can access your AWS account. See [Generating AWS IAM Credentials](#generating-aws-iam-credentials)
  * Use IAM role - Select if you are running your autoscalers on your own EC2 instances which are attached to an [IAM
    role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html). In such a case, no AWS IAM credentials are required.
  * AWS Region - [AWS Region](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.Regions)
    where the EC2 instances will be spun up
  * AWS Access Key ID and AWS Secret Access Key - The credentials with which the autoscaler will access your AWS
    account for spinning EC2 instances up/down
* **Git Configuration** - Git credentials with which the ClearML Agents running on your EC2 instances will access your
  repositories to retrieve the code for their jobs
  * Git User
  * Git Password / Personal Access Token
* **Max Idle Time** (Optional) - Maximum time in minutes that an EC2 instance can be idle before the autoscaler spins it
  down
* **Workers Prefix** (Optional) - A prefix added to worker names, associating them with this autoscaler
* **Polling Interval** (Optional) - Time period in minutes at which the designated queue is polled for new tasks
* **Base Docker Image** (Optional) - Default Docker image in which the ClearML Agent will run. Provide a Docker image stored
  in a Docker artifactory so instances can automatically fetch it
* **Compute Resources**
  * Resource Name - Assign a name to the resource type. This name will appear in the Autoscaler dashboard
  * EC2 Instance Type - See [Instance Types](https://aws.amazon.com/ec2/instance-types) for the full list of types
  * Use Spot Instance - Check box to use a spot instance. Otherwise, a reserved instance is used
  * Availability Zone - The [EC2 availability zone](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.AvailabilityZones)
    to launch this resource in
  * AMI ID - The AWS AMI to launch
  * Max Number of Instances - Maximum number of concurrently running instances of this type allowed
  * Monitored Queue - Queue associated with this instance type. The tasks enqueued to this queue will be executed on
    instances of this type
  * EC2 Tags (Optional) - AWS instance tags to attach to launched EC2 instances. Insert key=value pairs, separated by
    commas
  * EBS Device (Optional) - Disk mount point
  * EBS Volume Size (Optional) - Disk size (GB)
  * EBS Volume Type (Optional) - See [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html)
    for the full list of types
  * Instance Key Pair (Optional) - AWS key pair that is provided to the spun-up EC2 instances for connecting to them via
    SSH. Provide the key pair's name, as it was created in AWS. See [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html)
    for more details.
  * Security Group ID (Optional) - Comma-separated list of AWS VPC Security Group IDs to attach to the launched
    instance. Read more [here](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html)
  * \+ Add Item - Define another resource type
* **IAM Instance Profile** (Optional) - Set an IAM instance profile for all instances spun up by the Autoscaler
  * Arn - Amazon Resource Name specifying the instance profile
  * Name - Name identifying the instance profile
* **Autoscaler Instance Name** (Optional) - Name for the Autoscaler instance. This will appear in the instance list.
* **Init script** (Optional) - A bash script to execute after launching the EC2 instance
* **Additional ClearML Configuration** (Optional) - A ClearML configuration file for the ClearML Agent to use when
  executing your experiments
![Autoscaler wizard](../../img/app_aws_autoscaler_wizard.png)
:::note Enterprise Feature
You can utilize the [configuration vault](../../webapp/webapp_profile.md#configuration-vault) to globally add your AWS
credentials in the following format:
```
auto_scaler.v1 {
    aws {
        cloud_credentials_key: XXX
        cloud_credentials_secret: XXX
    }
}
```
:::
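
As an illustration of the `key=value` format expected by the EC2 Tags field described above, the following sketch parses such a string into the `Key`/`Value` list shape that the EC2 tagging APIs use. The function name `parse_ec2_tags` and the sample tags are hypothetical, and the app's own parsing and validation may differ.

```python
def parse_ec2_tags(tag_string):
    """Turn 'key=value,key2=value2' into a list of {'Key': ..., 'Value': ...} dicts."""
    tags = []
    for pair in tag_string.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing commas
        key, separator, value = pair.partition("=")
        if not separator or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        tags.append({"Key": key.strip(), "Value": value.strip()})
    return tags


print(parse_ec2_tags("team=ml, env=dev"))
# [{'Key': 'team', 'Value': 'ml'}, {'Key': 'env', 'Value': 'dev'}]
```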
## Dashboard
Once an autoscaler is launched, the autoscaler's dashboard provides information about available EC2 instances and their
status.
![Autoscaler dashboard](../../img/app_aws_autoscaler.png)
The autoscaler dashboard shows:
* Number of idle instances
* Queues and the resource type associated with them
* Number of currently running instances
* Console: the application log containing everything printed to stdout and stderr appears in the console log. The log
shows polling results of the autoscaler's associated queues, including the number of tasks enqueued, and updates on EC2
instances being spun up/down.
## Generating AWS IAM Credentials
The autoscaler app accesses your AWS account with the credentials you provide.
You will need to create an AWS policy which grants the autoscaler app the required access privileges, attach the policy
to an IAM user, and create credential keys for that user to configure in the autoscaler app:
1. In your AWS account, go to **Services Menu > IAM > Policies**

   ![AWS Policies](../../img/apps_aws_permissions_1.png)

1. Under policies, click **Create Policy**

   ![AWS create policy](../../img/apps_aws_permissions_2.png)

1. In the **Create Policy** modal, click on the JSON option

   ![AWS create policy JSON](../../img/apps_aws_permissions_3.png)

1. Insert the following policy into the text box:

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "VisualEditor0",
               "Effect": "Allow",
               "Action": [
                   "ec2:DescribeInstances",
                   "ec2:TerminateInstances",
                   "ec2:RequestSpotInstances",
                   "ec2:DeleteTags",
                   "ec2:CreateTags",
                   "ec2:RunInstances",
                   "ec2:DescribeSpotInstanceRequests",
                   "ec2:GetConsoleOutput"
               ],
               "Resource": "*"
           }
       ]
   }
   ```

1. Complete creating the policy
1. Attach the created policy to an IAM user/group whose credentials will be used in the autoscaler app (you can create a
   new IAM user/group for this purpose)
1. Obtain a set of AWS IAM credentials for the user/group to which you attached the policy in the previous step
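
If you adapt the policy (for example, to merge it into an existing one), a quick local check can confirm that none of the actions listed above were dropped. The following is a minimal sketch using only the Python standard library; the helper name `missing_actions` is hypothetical, and the check is deliberately simple: it ignores wildcards such as `ec2:*`, `Deny` statements, and `Resource` restrictions.

```python
import json

# The actions listed in the policy above.
REQUIRED_ACTIONS = {
    "ec2:DescribeInstances",
    "ec2:TerminateInstances",
    "ec2:RequestSpotInstances",
    "ec2:DeleteTags",
    "ec2:CreateTags",
    "ec2:RunInstances",
    "ec2:DescribeSpotInstanceRequests",
    "ec2:GetConsoleOutput",
}


def missing_actions(policy_json):
    """Return the required actions that no 'Allow' statement lists explicitly."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    allowed = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be given as a string
            actions = [actions]
        allowed.update(actions)
    return sorted(REQUIRED_ACTIONS - allowed)
```

An empty list means every action from the policy above is still present. This does not replace testing the credentials against AWS itself.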

View File

@ -0,0 +1,84 @@
---
title: GCP Autoscaler
---
:::info Pro Plan Offering
The ClearML GCP Autoscaler App is available under the ClearML Pro plan
:::
The GCP Autoscaler Application optimizes GCP VM instance usage according to a user-defined instance budget: define your
budget by specifying the type and number of available compute resources.
Each resource type is associated with a ClearML [queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) whose
status determines the need for instances of that resource type (i.e. spin up new instances if there are pending jobs on
the queue).
When running, the autoscaler periodically polls your GCP cluster. The autoscaler automatically deletes idle VM instances
based on a specified maximum idle time, or spins up new VM instances when there aren't enough to execute pending tasks
in a queue (until reaching the defined maximum number of instances). You can add an init script, which will be executed
when each VM instance is spun up.
## Autoscaler Instance Configuration
* **GCP Configuration**
  * GCP Project ID - Project used for spinning up VM instances
  * GCP Zone - The GCP zone where the VM instances will be spun up. See [Regions and zones](https://cloud.google.com/compute/docs/regions-zones)
  * GCP Credentials - Project credentials. See [here](https://cloud.google.com/docs/authentication/production) for
    more details.
* **Git Configuration** - Git credentials with which the ClearML Agents running on your VM instances will access your
  repositories to retrieve the code for their jobs
  * Git User
  * Git Password / Personal Access Token
* **Base Docker Image** (Optional) - Default Docker image in which the ClearML Agent will run. Provide a Docker image stored in a
  Docker artifactory so VM instances can automatically fetch it
* **Compute Resources**
  * Resource Name - Assign a name to the resource type. This name will appear in the Autoscaler dashboard.
  * GCP Machine Type - See list of [machine types](https://cloud.google.com/compute/docs/machine-types)
  * Run in CPU mode - Select to have the autoscaler utilize only CPU VM instances
  * GPU Type - See list of [supported GPUs by instance](https://cloud.google.com/compute/docs/gpus)
  * Use Preemptible Instance - Choose whether VM instances of this type will be [preemptible](https://cloud.google.com/compute/docs/instances/preemptible).
  * Max Number of Instances - Maximum number of concurrently running VM instances of this type allowed
  * Monitored Queue - Queue associated with this VM instance type. The tasks enqueued to this queue will be executed on VM instances of this type
  * Machine Image (Optional) - The GCP machine image to launch
  * Disk Size (in GB) (Optional)
  * \+ Add Item - Define another resource type
* **Autoscaler Instance Name** (Optional) - Name for the Autoscaler instance. This will appear in the instance list.
* **Max Idle Time** (Optional) - Maximum time in minutes that a VM instance can be idle before the autoscaler spins it down
* **Workers Prefix** (Optional) - A prefix added to worker names, associating them with this autoscaler
* **Polling Interval** (Optional) - Time period in minutes at which the designated queue is polled for new tasks
* **Init Script** (Optional) - A bash script to execute after launching the VM instance
* **Additional ClearML Configuration** (Optional) - A ClearML configuration file for the ClearML Agent to use when executing your experiments
![GCP autoscaler wizard](../../img/apps_gcp_autoscaler_wizard.png)
:::note Enterprise Feature
You can utilize the [configuration vault](../../webapp/webapp_profile.md#configuration-vault) to globally add your GCP
credentials in the following format:
```
auto_scaler.v1 {
gcp {
gcp_credentials: """
{
"type": "service_account",
...
}
"""
}
}
```
:::
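
Before pasting a service account key into the `gcp_credentials` value shown above, it can help to confirm that the JSON parses and looks like a service account key. The following is a minimal sketch using only the Python standard library. The function name `check_service_account_key` is hypothetical, and the field list is a small subset of what a GCP service account key file contains, chosen for illustration.

```python
import json

# A few fields found in GCP service account key files; not an exhaustive list.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key_json):
    """Return a list of problems found in a service account key; empty if none."""
    key = json.loads(key_json)  # raises ValueError on malformed JSON
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if name not in key]
    if key.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems
```

An empty result only means the key has the expected shape. It does not verify that the key is valid or that the account has the permissions the autoscaler needs.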
## Dashboard
Once an autoscaler is launched, the autoscaler's dashboard provides information about available VM instances and their
status.
![GCP autoscaler dashboard](../../img/apps_gcp_autoscaler.png)
The autoscaler dashboard shows:
* Number of idle instances
* Queues and the resource type associated with them
* Number of currently running instances
* Console: the application log containing everything printed to stdout and stderr appears in the console log. The log
shows polling results of the autoscaler's associated queues, including the number of tasks enqueued, and updates on VM
instances being spun up/down.

View File

@ -0,0 +1,71 @@
---
title: Hyperparameter Optimization
---
:::info Pro Plan Offering
The ClearML HPO App is available under the ClearML Pro plan
:::
The Hyperparameter Optimization Application finds the set of parameter values that optimize a specific metric for your
model.
It takes in an existing ClearML experiment and its parameters to optimize. The parameter search space can be specified
by specific (discrete) values and/or value ranges (uniform parameters).
The optimization app launches multiple copies of the original experiment, each time sampling different parameter sets,
applying a user-selected optimization strategy (random search, Bayesian, etc.).
Control the optimization process with the advanced configuration options, which include time, iteration, and experiment
limits.
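
To make the search space concrete, the sketch below expands a uniform parameter (minimum, maximum, step size) into candidate values and draws random parameter sets, as a random search strategy would. It is an illustration only, not the app's implementation: the function names are hypothetical, `Args/batch_size` is a made-up parameter name, and including the maximum value in the range is an assumption.

```python
import math
import random


def uniform_values(minimum, maximum, step):
    """List minimum, minimum + step, ... without exceeding maximum."""
    # The small epsilon guards against floating-point error in the division.
    count = math.floor((maximum - minimum) / step + 1e-9) + 1
    return [round(minimum + i * step, 10) for i in range(count)]


def sample_parameter_sets(space, n_samples, seed=0):
    """Draw n_samples random parameter sets from {name: candidate values}."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_samples)
    ]


space = {
    "Args/lr": uniform_values(0.001, 0.01, 0.003),  # uniform parameter
    "Args/batch_size": [32, 64, 128],               # discrete parameter
}
for parameter_set in sample_parameter_sets(space, n_samples=3):
    print(parameter_set)
```

Each printed dictionary corresponds to one clone of the original experiment, launched with those parameter values overriding the originals.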
## HPO Instance Configuration
* **Initial Task to Optimize** - ID of an existing ClearML task to optimize. This task will be cloned, and each clone will
  sample a different set of hyperparameter values.
* **Optimization Configuration**
  * Optimization Method - The optimization strategy to employ (e.g. random, grid, hyperband)
  * Optimization Objective Metrics Title - Title of metric to optimize
  * Optimization Objective Metrics Series - Metric series (variant) to optimize
  * Optimization Objective Trend - Choose the optimization target, whether to maximize or minimize the value of the
    metric specified above
* **Execution Queue** - The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which
  optimization tasks will be enqueued (make sure an agent is assigned to that queue)
* **Parameters to Optimize** - Parameters comprising the optimization space
  * Type
    * Uniform Parameters - A value range to sample
      * Minimum Value
      * Maximum Value
      * Step Size - Step size between samples
    * Discrete Parameters - A set of values to sample
      * Values - Comma-separated list of values to sample
  * Name - The original task's configuration parameter name (including section name e.g. `Args/lr`)
* **Optimization Job Title** (Optional) - Name for the HPO instance. This will appear in the instance list.
* **Optimization Experiments Destination Project** (Optional) - The project where optimization tasks will be saved.
  Leave empty to use the same project as the initial task.
* **Maximum Concurrent Tasks** - The maximum number of simultaneously running optimization experiments
* **Advanced Configuration** (Optional)
  * Limit Total HPO Experiments - Maximum total number of optimization experiments
  * Number of Top Experiments to Save - Number of best performing experiments to save (the rest are archived).
  * Limit Single Experiment Running Time (Minutes) - Time limit per optimization experiment. Experiments will be
    stopped after the specified time has elapsed.
  * Minimal Number of Iterations Per Single Experiment - Some search methods, such as Optuna, prune underperforming
    experiments. This is the minimum number of iterations per experiment before it can be stopped. Iterations are
    based on the experiments' own reporting (for example, if experiments report every epoch, then iterations=epochs)
  * Maximum Number of Iterations Per Single Experiment - Maximum iterations per experiment after which it will be
    stopped. Iterations are based on the experiments' own reporting (for example, if experiments report every epoch,
    then iterations=epochs)
  * Limit Total Optimization Instance Time (Minutes) - Time limit for the whole optimization process (in minutes)
![HPO app wizard](../../img/apps_hpo_wizard.png)
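
The interaction between the objective trend and the number of top experiments to save can be sketched as a simple ranking. This is an illustration under assumptions, not the app's code: the function name `split_top_experiments`, the experiment IDs, and the `"max"`/`"min"` trend values are hypothetical.

```python
def split_top_experiments(results, top_k, trend="max"):
    """results maps experiment ID -> objective value; returns (kept, archived) ID lists."""
    # Best first: highest value when maximizing, lowest when minimizing.
    ranked = sorted(results, key=results.get, reverse=(trend == "max"))
    return ranked[:top_k], ranked[top_k:]


results = {"exp_a": 0.80, "exp_b": 0.90, "exp_c": 0.70}
print(split_top_experiments(results, top_k=2, trend="max"))
# (['exp_b', 'exp_a'], ['exp_c'])
```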
## Dashboard
Once an HPO instance is launched, the dashboard displays a summary of the optimization process.
![HPO dashboard](../../img/apps_hpo.png)
The HPO dashboard shows:
* Optimization Metric - Last reported and maximum / minimum values of objective metric over time
* Optimization Objective - Objective metric values per experiment
* Parallel coordinates - A visualization of parameter value impact on optimization objective
* Summary - Experiment summary table: experiment execution information, objective metric and parameter values.
* Budget - Available iterations and tasks budget (percentage, out of the values defined in the HPO instance's advanced configuration)
* Resources - Number of workers servicing the HPO execution queue, and the number of currently running optimization tasks

View File

@ -0,0 +1,46 @@
---
title: Overview
---
:::info Pro Plan Offering
ClearML Applications are available under the ClearML Pro plan
:::
Use ClearML's GUI Applications to manage ML workloads and automatically run your recurring workflows without any coding.
![Apps page](../../img/apps_overview_page.png)
Configure and launch app instances, then track their execution from the app dashboard.
ClearML provides the following applications:
* [**AWS Autoscaler**](apps_aws_autoscaler.md) - Optimize AWS EC2 instance usage according to a defined instance budget
* [**GCP Autoscaler**](apps_gcp_autoscaler.md) - Optimize GCP instance usage according to a defined instance budget
* [**Hyperparameter Optimization**](apps_hpo.md) - Find the parameter values that yield the best performing models
* **Nvidia Clara** - Train models using Nvidia's Clara framework
* **Project Dashboard** - High-level project monitoring with Slack alerts
## App Pages Layout
Each application's page is split into two sections:
* App Instance List - Launch new app instances and view previously launched instances. Click on an instance to view its
dashboard. Hover over it to access the [app instance actions](#app-instance-actions).
* App Instance Dashboard - The main section of the app page: displays the selected app instance's status and results.
![App format](../../img/apps_format_overview.png)
## Launching an App Instance
1. Choose the desired app
1. Click the `Launch New` button <img src="/docs/latest/icons/ico-add.svg" alt="Add new" className="icon size-md space-sm" /> to open the app's configuration wizard
1. Fill in the configuration details
1. Click **Launch**
## App Instance Actions
Access app instance actions by right-clicking an instance, or through the menu button <img src="/docs/latest/icons/ico-dots-v-menu.svg" alt="Dot menu" className="icon size-md space-sm" /> (available on hover).
![App context menu](../../img/app_context_menu.png)
* **Rename** - Rename the instance
* **Configuration** - View an instance's configuration
* **Stop** - Shut down the instance
* **Clone** - Launch a new instance with the same configuration prefilled
* **Delete** - Delete the instance

View File

@ -28,7 +28,7 @@ The ClearML Web UI is composed of the following pages:
* [Datasets](datasets/webapp_dataset_page.md) <img src="/docs/latest/icons/ico-side-bar-datasets.svg" alt="Datasets" className="icon size-md space-sm" /> - View and manage your datasets.
* [Pipelines](pipelines/webapp_pipeline_page.md) <img src="/docs/latest/icons/ico-pipelines.svg" className="icon size-md space-sm" /> - View and manage your pipelines.
* [Workers and Queues](webapp_workers_queues.md) <img src="/docs/latest/icons/ico-workers.svg" alt="Workers and Queues" className="icon size-md space-sm" /> - The resource monitoring and queues management page.
* [Applications](applications/apps_overview.md) <img src="/docs/latest/icons/ico-applications.svg" alt="ClearML Apps" className="icon size-md space-sm" /> - ClearML's GUI applications for no-code workflow execution.
* [Settings](webapp_profile.md) (available through the profile menu <img src="/docs/latest/icons/ico-me.svg" alt="Profile button" className="icon size-lg space-sm" />) -
Manage your ClearML user account:
* Set WebApp preferences

View File

@ -54,7 +54,17 @@ module.exports = {
'webapp/pipelines/webapp_pipeline_page', 'webapp/pipelines/webapp_pipeline_table', 'webapp/pipelines/webapp_pipeline_viewing'
]
},
'webapp/webapp_workers_queues', 'webapp/webapp_profile']
'webapp/webapp_workers_queues',
{
'ClearML Applications': [
'webapp/applications/apps_overview',
'webapp/applications/apps_aws_autoscaler',
'webapp/applications/apps_gcp_autoscaler',
'webapp/applications/apps_hpo'
]
},
'webapp/webapp_profile']
},
{'Configurations': ['configs/configuring_clearml', 'configs/clearml_conf', 'configs/env_vars']},
//'References': ['references/clearml_ref','references/clearml_agent_ref'],