Add ClearML GUI apps (#301)
BIN
docs/img/app_aws_autoscaler.png
Normal file
After Width: | Height: | Size: 143 KiB |
BIN
docs/img/app_aws_autoscaler_wizard.png
Normal file
After Width: | Height: | Size: 51 KiB |
BIN
docs/img/app_context_menu.png
Normal file
After Width: | Height: | Size: 27 KiB |
BIN
docs/img/apps_aws_permissions_1.png
Normal file
After Width: | Height: | Size: 280 KiB |
BIN
docs/img/apps_aws_permissions_2.png
Normal file
After Width: | Height: | Size: 129 KiB |
BIN
docs/img/apps_aws_permissions_3.png
Normal file
After Width: | Height: | Size: 66 KiB |
BIN
docs/img/apps_format_overview.png
Normal file
After Width: | Height: | Size: 262 KiB |
BIN
docs/img/apps_gcp_autoscaler.png
Normal file
After Width: | Height: | Size: 140 KiB |
BIN
docs/img/apps_gcp_autoscaler_wizard.png
Normal file
After Width: | Height: | Size: 56 KiB |
BIN
docs/img/apps_hpo.png
Normal file
After Width: | Height: | Size: 238 KiB |
BIN
docs/img/apps_hpo_wizard.png
Normal file
After Width: | Height: | Size: 53 KiB |
BIN
docs/img/apps_overview_page.png
Normal file
After Width: | Height: | Size: 157 KiB |
144
docs/webapp/applications/apps_aws_autoscaler.md
Normal file
@ -0,0 +1,144 @@
|
||||
---
|
||||
title: AWS Autoscaler
|
||||
---
|
||||
|
||||
:::info Pro Plan Offering
|
||||
The ClearML AWS Autoscaler App is available under the ClearML Pro plan
|
||||
:::
|
||||
|
||||
The AWS Autoscaler Application optimizes AWS EC2 instance usage according to a user-defined resource budget: define your
|
||||
budget by specifying the type and amount of available compute resources.
|
||||
|
||||
Each resource type is associated with a ClearML [queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) whose status determines the need for instances of that resource
|
||||
type (i.e. spin up new instances if there are pending jobs on the queue).
|
||||
|
||||
When running, the autoscaler periodically polls your AWS cluster. The autoscaler automatically terminates idle instances
|
||||
based on a specified maximum idle time, or spins up new instances when there aren't enough to execute pending tasks in a
|
||||
queue (until reaching the defined maximum number of instances). You can add an init script, which will be executed when
|
||||
each instance is spun up.
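
One polling cycle of the behavior described above might be sketched as follows. This is an illustrative model, not the app's implementation: `Instance`, `plan_scaling`, and the rule that idle instances absorb pending tasks before new ones are launched are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """A running cloud instance (hypothetical model for this sketch)."""
    instance_id: str
    idle_minutes: float  # 0 means the instance is currently executing a task


def plan_scaling(pending_tasks, instances, max_instances, max_idle_minutes):
    """Return (number_to_spin_up, ids_to_terminate) for one polling cycle."""
    # Assumption: idle instances pick up pending tasks before anything new is launched.
    idle = [i for i in instances if i.idle_minutes > 0]
    uncovered = max(0, pending_tasks - len(idle))

    # Never exceed the configured maximum number of instances.
    headroom = max(0, max_instances - len(instances))
    to_spin_up = min(uncovered, headroom)

    # Terminate instances idle beyond the limit, but only when no work is waiting.
    to_terminate = []
    if pending_tasks == 0:
        to_terminate = [
            i.instance_id for i in instances if i.idle_minutes >= max_idle_minutes
        ]
    return to_spin_up, to_terminate


fleet = [Instance("i-a", 0), Instance("i-b", 20)]
print(plan_scaling(pending_tasks=3, instances=fleet, max_instances=4, max_idle_minutes=15))
# prints (2, [])
```

With three pending tasks, one idle instance, and room for two more instances, the sketch asks for two new instances and terminates nothing. With an empty queue, the same fleet would yield `(0, ['i-b'])`.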
|
||||
|
||||
## Autoscaler Instance Configuration
|
||||
* **AWS Credentials** - Credentials with which the autoscaler can access your AWS account. See [Generating AWS IAM Credentials](#generating-aws-iam-credentials)
|
||||
* Use IAM role - Select if you are running your autoscalers on your own EC2 instances which are attached to an [IAM
|
||||
role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html). In such a case, no AWS IAM credentials are required.
|
||||
* AWS Region - [AWS Region](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.Regions)
|
||||
where the EC2 instances will be spun up
|
||||
* AWS Access Key ID and AWS Secret Access Key - The credentials with which the autoscaler will access your AWS
|
||||
account for spinning EC2 instances up/down
|
||||
* **Git Configuration** - Git credentials with which the ClearML Agents running on your EC2 instances will access your
|
||||
repositories to retrieve the code for their jobs
|
||||
* Git User
|
||||
* Git Password / Personal Access Token
|
||||
* **Max Idle Time** (Optional) - Maximum time in minutes that an EC2 instance can be idle before the autoscaler spins it
|
||||
down
|
||||
* **Workers Prefix** (Optional) - A prefix added to workers’ names, associating them with this autoscaler
|
||||
* **Polling Interval** (Optional) - Time period in minutes at which the designated queue is polled for new tasks
|
||||
* **Base Docker Image** (Optional) - Default Docker image in which the ClearML Agent will run. Provide a Docker image stored
|
||||
in a Docker artifactory so instances can automatically fetch it
|
||||
* **Compute Resources**
|
||||
* Resource Name - Assign a name to the resource type. This name will appear in the Autoscaler dashboard
|
||||
* EC2 Instance Type - See [Instance Types](https://aws.amazon.com/ec2/instance-types) for full list of types
|
||||
* Use Spot Instance - Select to use a spot instance. Otherwise, a reserved instance is used
|
||||
* Availability Zone - The [EC2 availability zone](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.AvailabilityZones)
|
||||
to launch this resource in
|
||||
* AMI ID - The AWS AMI to launch
|
||||
* Max Number of Instances - Maximum number of concurrent running instances of this type allowed
|
||||
* Monitored Queue - Queue associated with this instance type. The tasks enqueued to this queue will be executed on
|
||||
instances of this type
|
||||
* EC2 Tags (Optional) - AWS instance tags to attach to launched EC2 instances. Insert key=value pairs, separated by
|
||||
commas
|
||||
* EBS Device (Optional) - Disk mount point
|
||||
* EBS Volume Size (Optional) - Disk size (GB)
|
||||
* EBS Volume Type (Optional) - See [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html)
|
||||
for full list of types
|
||||
* Instance Key Pair (Optional) - AWS key pair that is provided to the spun-up EC2 instances for connecting to them via
|
||||
SSH. Provide the Key Pair's name, as was created in AWS. See [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html)
|
||||
for more details.
|
||||
* Security Group ID (Optional) - Comma-separated list of AWS VPC Security Group IDs to attach to the launched
|
||||
instance. Read more [here](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html)
|
||||
* \+ Add Item - Define another resource type
|
||||
* **IAM Instance Profile** (Optional) - Set an IAM instance profile for all instances spun up by the Autoscaler
|
||||
* Arn - Amazon Resource Name specifying the instance profile
|
||||
* Name - Name identifying the instance profile
|
||||
* **Autoscaler Instance Name** (Optional) - Name for the Autoscaler instance. This will appear in the instance list.
|
||||
* **Init Script** (Optional) - A bash script to execute after launching the EC2 instance
|
||||
* **Additional ClearML Configuration** (Optional) - A ClearML configuration file for the ClearML Agent to use when
|
||||
executing your experiments
|
||||
|
||||

|
||||
|
||||
:::note Enterprise Feature
|
||||
You can utilize the [configuration vault](../../webapp/webapp_profile.md#configuration-vault) to globally add your AWS
|
||||
credentials in the following format:
|
||||
|
||||
```
|
||||
auto_scaler.v1 {
|
||||
aws {
|
||||
cloud_credentials_key: XXX
|
||||
cloud_credentials_secret: XXX
|
||||
}
}
|
||||
```
|
||||
:::
|
||||
|
||||
## Dashboard
|
||||
Once an autoscaler is launched, the autoscaler's dashboard provides information about available EC2 instances and their
|
||||
status.
|
||||
|
||||

|
||||
|
||||
The autoscaler dashboard shows:
|
||||
* Number of idle instances
|
||||
* Queues and the resource type associated with them
|
||||
* Number of currently running instances
|
||||
* Console - The application log, containing everything printed to stdout and stderr. The log shows polling results of the
autoscaler’s associated queues, including the number of tasks enqueued, and updates on EC2 instances being spun up/down.
|
||||
|
||||
## Generating AWS IAM Credentials
|
||||
|
||||
The autoscaler app accesses your AWS account with the credentials you provide.
|
||||
|
||||
You will need to create an AWS policy which grants the autoscaler app the required access privileges, attach the policy
|
||||
to an IAM user, and create credential keys for that user to configure in the autoscaler app:
|
||||
|
||||
1. In your AWS account, go to Services **Menu > IAM > Policies**
|
||||
|
||||

|
||||
|
||||
1. Under policies, click **Create Policy**
|
||||
|
||||

|
||||
|
||||
1. In the **Create Policy** modal, click on the JSON option
|
||||
|
||||

|
||||
|
||||
1. Insert the following policy into the text box:
|
||||
|
||||
```
|
||||
{
|
||||
"Version": "2012-10-17",
|
||||
"Statement": [
|
||||
{
|
||||
"Sid": "VisualEditor0",
|
||||
"Effect": "Allow",
|
||||
"Action": [
|
||||
"ec2:DescribeInstances",
|
||||
"ec2:TerminateInstances",
|
||||
"ec2:RequestSpotInstances",
|
||||
"ec2:DeleteTags",
|
||||
"ec2:CreateTags",
|
||||
"ec2:RunInstances",
|
||||
"ec2:DescribeSpotInstanceRequests",
|
||||
"ec2:GetConsoleOutput"
|
||||
],
|
||||
"Resource": "*"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
1. Complete creating the policy
|
||||
1. Attach the created policy to an IAM user/group whose credentials will be used in the autoscaler app (you can create a
|
||||
new IAM user/group for this purpose)
|
||||
1. Obtain a set of AWS IAM credentials (access key ID and secret access key) for the IAM user to which you attached the policy in the previous step, or for a user in the group if you attached it to a group
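
If you prefer to generate the policy document rather than paste it, the following sketch builds the same JSON from a list of actions and checks an existing policy for missing ones. The helper names are hypothetical, and the check is deliberately simple: it compares action names literally and does not expand wildcards such as `ec2:*`.

```python
import json

# The EC2 actions listed in the policy above.
REQUIRED_ACTIONS = [
    "ec2:DescribeInstances",
    "ec2:TerminateInstances",
    "ec2:RequestSpotInstances",
    "ec2:DeleteTags",
    "ec2:CreateTags",
    "ec2:RunInstances",
    "ec2:DescribeSpotInstanceRequests",
    "ec2:GetConsoleOutput",
]


def build_autoscaler_policy(actions=REQUIRED_ACTIONS):
    """Return the policy document as a Python dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
            }
        ],
    }


def missing_actions(policy, required=REQUIRED_ACTIONS):
    """List required actions that no Allow statement names explicitly."""
    granted = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        # "Action" may be a single string or a list of strings.
        granted.update([action] if isinstance(action, str) else action)
    return [name for name in required if name not in granted]


policy_json = json.dumps(build_autoscaler_policy(), indent=2)
print(missing_actions(json.loads(policy_json)))
# prints []
```

The resulting `policy_json` string is the text to paste into the JSON tab in the **Create Policy** step.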
|
84
docs/webapp/applications/apps_gcp_autoscaler.md
Normal file
@ -0,0 +1,84 @@
|
||||
---
|
||||
title: GCP Autoscaler
|
||||
---
|
||||
|
||||
:::info Pro Plan Offering
|
||||
The ClearML GCP Autoscaler App is available under the ClearML Pro plan
|
||||
:::
|
||||
|
||||
The GCP Autoscaler Application optimizes GCP VM instance usage according to a user-defined instance budget: define your
|
||||
budget by specifying the type and amount of available compute resources.
|
||||
|
||||
Each resource type is associated with a ClearML [queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) whose
|
||||
status determines the need for instances of that resource type (i.e. spin up new instances if there are pending jobs on
|
||||
the queue).
|
||||
|
||||
When running, the autoscaler periodically polls your GCP cluster. The autoscaler automatically deletes idle VM instances
|
||||
based on a specified maximum idle time, or spins up new VM instances when there aren't enough to execute pending tasks
|
||||
in a queue (until reaching the defined maximum number of instances). You can add an init script, which will be executed
|
||||
when each VM instance is spun up.
|
||||
|
||||
## Autoscaler Instance Configuration
|
||||
* **GCP Configuration**
|
||||
* GCP Project ID - Project used for spinning up VM instances
|
||||
* GCP Zone - The GCP zone where the VM instances will be spun up. See [Regions and zones](https://cloud.google.com/compute/docs/regions-zones)
|
||||
* GCP Credentials - Project credentials, see [here](https://cloud.google.com/docs/authentication/production) for
|
||||
more details.
|
||||
* **Git Configuration** - Git credentials with which the ClearML Agents running on your VM instances will access your
|
||||
repositories to retrieve the code for their jobs
|
||||
* Git User
|
||||
* Git Password / Personal Access Token
|
||||
* **Base Docker Image** (Optional) - Default Docker image in which the ClearML Agent will run. Provide a Docker image stored in a
|
||||
Docker artifactory so VM instances can automatically fetch it
|
||||
* **Compute Resources**
|
||||
* Resource Name - Assign a name to the resource type. This name will appear in the Autoscaler dashboard.
|
||||
* GCP Machine Type - See list of [machine types](https://cloud.google.com/compute/docs/machine-types)
|
||||
* Run in CPU mode - Select to have the autoscaler utilize only CPU VM instances
|
||||
* GPU Type - See list of [supported GPUs by instance](https://cloud.google.com/compute/docs/gpus)
|
||||
* Use Preemptible Instance - Choose whether VM instances of this type will be [preemptible](https://cloud.google.com/compute/docs/instances/preemptible).
|
||||
* Max Number of Instances - Maximum number of concurrent running VM instances of this type allowed
|
||||
* Monitored Queue - Queue associated with this VM instance type. The tasks enqueued to this queue will be executed on VM instances of this type
|
||||
* Machine Image (Optional) - The GCP machine image to launch
|
||||
* Disc Size (in GB) (Optional)
|
||||
* \+ Add Item - Define another resource type
|
||||
* **Autoscaler Instance Name** (Optional) - Name for the Autoscaler instance. This will appear in the instance list.
|
||||
* **Max Idle Time** (Optional) - Maximum time in minutes that a VM instance can be idle before the autoscaler spins it down
|
||||
* **Workers Prefix** (Optional) - A prefix added to workers’ names, associating them with this autoscaler
|
||||
* **Polling Interval** (Optional) - Time period in minutes at which the designated queue is polled for new tasks
|
||||
* **Init Script** (Optional) - A bash script to execute after launching the VM instance
|
||||
* **Additional ClearML Configuration** (Optional) - A ClearML configuration file for the ClearML Agent to use when executing your experiments
|
||||
|
||||

|
||||
|
||||
:::note Enterprise Feature
|
||||
You can utilize the [configuration vault](../../webapp/webapp_profile.md#configuration-vault) to globally add your GCP
|
||||
credentials in the following format:
|
||||
|
||||
```
|
||||
auto_scaler.v1 {
|
||||
gcp {
|
||||
gcp_credentials: """
|
||||
{
|
||||
"type": "service_account",
|
||||
...
|
||||
}
|
||||
"""
|
||||
}
|
||||
}
|
||||
```
|
||||
:::
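
Before pasting a service account key into the wizard or the vault, it can help to confirm that the JSON is well formed. The sketch below is one way to do that. The field list is a small subset of what a Google service account key file contains, and `check_service_account_key` is a hypothetical helper, not part of ClearML.

```python
import json

# A subset of the fields found in a Google service account key file.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json):
    """Return a list of problems found in the key; an empty list means it looks fine."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value must be an object"]

    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if name not in data]
    if data.get("type") != "service_account":
        problems.append('"type" should be "service_account"')
    return problems


# Placeholder values only; a real key file contains actual credentials.
example = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "<redacted>",
    "client_email": "autoscaler@my-project.iam.gserviceaccount.com",
})
print(check_service_account_key(example))
# prints []
```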
|
||||
|
||||
## Dashboard
|
||||
|
||||
Once an autoscaler is launched, the autoscaler's dashboard provides information about available VM instances and their
|
||||
status.
|
||||
|
||||

|
||||
|
||||
The autoscaler dashboard shows:
|
||||
* Number of idle instances
|
||||
* Queues and the resource type associated with them
|
||||
* Number of currently running instances
|
||||
* Console - The application log, containing everything printed to stdout and stderr. The log shows polling results of the
autoscaler’s associated queues, including the number of tasks enqueued, and updates on VM instances being spun up/down.
|
71
docs/webapp/applications/apps_hpo.md
Normal file
@ -0,0 +1,71 @@
|
||||
---
|
||||
title: Hyperparameter Optimization
|
||||
---
|
||||
|
||||
:::info Pro Plan Offering
|
||||
The ClearML HPO App is available under the ClearML Pro plan
|
||||
:::
|
||||
|
||||
The Hyperparameter Optimization Application finds the set of parameter values that optimize a specific metric for your
|
||||
model.
|
||||
|
||||
It takes in an existing ClearML experiment and its parameters to optimize. The parameter search space can be specified
|
||||
by specific (discrete) values and/or value ranges (uniform parameters).
|
||||
|
||||
The optimization app launches multiple copies of the original experiment, each time sampling different parameter sets,
|
||||
applying a user-selected optimization strategy (random search, Bayesian, etc.).
|
||||
|
||||
Control the optimization process with the advanced configuration options, which include time, iteration, and experiment
|
||||
limits.
|
||||
|
||||
## HPO Instance Configuration
|
||||
* **Initial Task to Optimize** - ID of an existing ClearML task to optimize. This task will be cloned, and each clone will
|
||||
sample a different set of hyperparameter values.
|
||||
* **Optimization Configuration**
|
||||
* Optimization Method - The optimization strategy to employ (e.g. random, grid, hyperband)
|
||||
* Optimization Objective Metric’s Title - Title of metric to optimize
|
||||
* Optimization Objective Metric’s Series - Metric series (variant) to optimize
|
||||
* Optimization Objective Trend - Choose the optimization target, whether to maximize or minimize the value of the
|
||||
metric specified above
|
||||
* **Execution Queue** - The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which
|
||||
optimization tasks will be enqueued (make sure an agent is assigned to that queue)
|
||||
* **Parameters to Optimize** - Parameters comprising the optimization space
|
||||
* Type
|
||||
* Uniform Parameters - A value range to sample
|
||||
* Minimum Value
|
||||
* Maximum Value
|
||||
* Step Size - Step size between samples
|
||||
* Discrete Parameters - A set of values to sample
|
||||
* Values - Comma-separated list of values to sample
|
||||
* Name - The original task’s configuration parameter name (including section name e.g. `Args/lr`)
|
||||
* **Optimization Job Title** (Optional) - Name for the HPO instance. This will appear in the instance list.
|
||||
* **Optimization Experiments Destination Project** (Optional) - The project where optimization tasks will be saved.
|
||||
Leave empty to use the same project as the initial task.
|
||||
* **Maximum Concurrent Tasks** - The maximum number of simultaneously running optimization experiments
|
||||
* **Advanced Configuration** (Optional)
|
||||
* Limit Total HPO Experiments - Maximum total number of optimization experiments
|
||||
* Number of Top Experiments to Save - Number of best performing experiments to save (the rest are archived).
|
||||
* Limit Single Experiment Running Time (Minutes) - Time limit per optimization experiment. Experiments will be
|
||||
stopped after the specified time has elapsed.
|
||||
* Minimal Number of Iterations Per Single Experiment - Some search methods, such as Optuna, prune underperforming
|
||||
experiments. This is the minimum number of iterations per experiment before it can be stopped. Iterations are
|
||||
based on the experiments' own reporting (for example, if experiments report every epoch, then iterations=epochs)
|
||||
* Maximum Number of Iterations Per Single Experiment - Maximum iterations per experiment after which it will be
|
||||
stopped. Iterations are based on the experiments' own reporting (for example, if experiments report every epoch,
|
||||
then iterations=epochs)
|
||||
* Limit Total Optimization Instance Time (Minutes) - Time limit for the whole optimization process (in minutes)
|
||||
|
||||

|
||||
|
||||
## Dashboard
|
||||
Once an HPO instance is launched, the dashboard displays a summary of the optimization process.
|
||||
|
||||

|
||||
|
||||
The HPO dashboard shows:
|
||||
* Optimization Metric - Last reported and maximum / minimum values of objective metric over time
|
||||
* Optimization Objective - Objective metric values per experiment
|
||||
* Parallel coordinates - A visualization of parameter value impact on optimization objective
|
||||
* Summary - Experiment summary table: experiment execution information, objective metric and parameter values.
|
||||
* Budget - Available iterations and tasks budget (percentage, out of the values defined in the HPO instance's advanced configuration)
|
||||
* Resources - Number of workers servicing the HPO execution queue, and the number of currently running optimization tasks
|
46
docs/webapp/applications/apps_overview.md
Normal file
@ -0,0 +1,46 @@
|
||||
---
|
||||
title: Overview
|
||||
---
|
||||
|
||||
:::info Pro Plan Offering
|
||||
ClearML Applications are available under the ClearML Pro plan
|
||||
:::
|
||||
|
||||
Use ClearML’s GUI Applications to manage ML workloads and automatically run your recurring workflows without any coding.
|
||||
|
||||

|
||||
|
||||
Configure and launch app instances, then track their execution from the app dashboard.
|
||||
|
||||
ClearML provides the following applications:
|
||||
* [**AWS Autoscaler**](apps_aws_autoscaler.md) - Optimize AWS EC2 instance usage according to a defined instance budget
|
||||
* [**GCP Autoscaler**](apps_gcp_autoscaler.md) - Optimize GCP instance usage according to a defined instance budget
|
||||
* [**Hyperparameter Optimization**](apps_hpo.md) - Find the parameter values that yield the best performing models
|
||||
* **Nvidia Clara** - Train models using Nvidia’s Clara framework
|
||||
* **Project Dashboard** - High-level project monitoring with Slack alerts
|
||||
|
||||
## App Pages Layout
|
||||
Each application’s page is split into two sections:
|
||||
* App Instance List - Launch new app instances and view previously launched instances. Click on an instance to view its
|
||||
dashboard. Hover over it to access the [app instance actions](#app-instance-actions).
|
||||
* App Instance Dashboard - The main section of the app page: displays the selected app instance’s status and results.
|
||||
|
||||

|
||||
|
||||
## Launching an App Instance
|
||||
|
||||
1. Choose the desired app
|
||||
1. Click the `Launch New` button <img src="/docs/latest/icons/ico-add.svg" alt="Add new" className="icon size-md space-sm" /> to open the app’s configuration wizard
|
||||
1. Fill in the configuration details
|
||||
1. **Launch**
|
||||
|
||||
## App Instance Actions
|
||||
Access app instance actions by right-clicking an instance, or through the menu button <img src="/docs/latest/icons/ico-dots-v-menu.svg" alt="Dot menu" className="icon size-md space-sm" /> (available on hover).
|
||||
|
||||

|
||||
|
||||
* **Rename** - Rename the instance
|
||||
* **Configuration** - View an instance’s configuration
|
||||
* **Stop** - Shut down the instance
|
||||
* **Clone** - Launch a new instance with the same configuration prefilled
|
||||
* **Delete** - Delete the instance
|
@ -28,7 +28,7 @@ The ClearML Web UI is composed of the following pages:
|
||||
* [Datasets](datasets/webapp_dataset_page.md) <img src="/docs/latest/icons/ico-side-bar-datasets.svg" alt="Datasets" className="icon size-md space-sm" /> - View and manage your datasets.
|
||||
* [Pipelines](pipelines/webapp_pipeline_page.md) <img src="/docs/latest/icons/ico-pipelines.svg" className="icon size-md space-sm" /> - View and manage your pipelines.
|
||||
* [Workers and Queues](webapp_workers_queues.md) <img src="/docs/latest/icons/ico-workers.svg" alt="Workers and Queues" className="icon size-md space-sm" /> - The resource monitoring and queues management page.
|
||||
|
||||
* [Applications](applications/apps_overview.md) <img src="/docs/latest/icons/ico-applications.svg" alt="ClearML Apps" className="icon size-md space-sm" /> - ClearML's GUI applications for no-code workflow execution.
|
||||
* [Settings](webapp_profile.md) (available through the profile menu <img src="/docs/latest/icons/ico-me.svg" alt="Profile button" className="icon size-lg space-sm" />) -
|
||||
Manage your ClearML user account:
|
||||
* Set WebApp preferences
|
||||
|
12
sidebars.js
@ -54,7 +54,17 @@ module.exports = {
|
||||
'webapp/pipelines/webapp_pipeline_page', 'webapp/pipelines/webapp_pipeline_table', 'webapp/pipelines/webapp_pipeline_viewing'
|
||||
]
|
||||
},
|
||||
'webapp/webapp_workers_queues', 'webapp/webapp_profile']
|
||||
'webapp/webapp_workers_queues',
|
||||
{
|
||||
'ClearML Applications': [
|
||||
'webapp/applications/apps_overview',
|
||||
'webapp/applications/apps_aws_autoscaler',
|
||||
'webapp/applications/apps_gcp_autoscaler',
|
||||
'webapp/applications/apps_hpo'
|
||||
]
|
||||
|
||||
},
|
||||
'webapp/webapp_profile']
|
||||
},
|
||||
{'Configurations': ['configs/configuring_clearml', 'configs/clearml_conf', 'configs/env_vars']},
|
||||
//'References': ['references/clearml_ref','references/clearml_agent_ref'],
|
||||
|