Add ClearML GUI apps (#301)

2025-06-26 18:17:44 +00:00 · 2022-08-03 12:15:15 +03:00 · 2022-08-03 12:15:15 +03:00 · 495f30649a
commit 495f30649a
parent c162d280cc
18 changed files with 357 additions and 2 deletions
--- a/docs/img/app_aws_autoscaler.png
+++ b/docs/img/app_aws_autoscaler.png
--- a/docs/img/app_aws_autoscaler_wizard.png
+++ b/docs/img/app_aws_autoscaler_wizard.png
--- a/docs/img/app_context_menu.png
+++ b/docs/img/app_context_menu.png
--- a/docs/img/apps_aws_permissions_1.png
+++ b/docs/img/apps_aws_permissions_1.png
--- a/docs/img/apps_aws_permissions_2.png
+++ b/docs/img/apps_aws_permissions_2.png
--- a/docs/img/apps_aws_permissions_3.png
+++ b/docs/img/apps_aws_permissions_3.png
--- a/docs/img/apps_format_overview.png
+++ b/docs/img/apps_format_overview.png
--- a/docs/img/apps_gcp_autoscaler.png
+++ b/docs/img/apps_gcp_autoscaler.png
--- a/docs/img/apps_gcp_autoscaler_wizard.png
+++ b/docs/img/apps_gcp_autoscaler_wizard.png
--- a/docs/img/apps_hpo.png
+++ b/docs/img/apps_hpo.png
--- a/docs/img/apps_hpo_wizard.png
+++ b/docs/img/apps_hpo_wizard.png
--- a/docs/img/apps_overview_page.png
+++ b/docs/img/apps_overview_page.png
--- a/docs/webapp/applications/apps_aws_autoscaler.md
+++ b/docs/webapp/applications/apps_aws_autoscaler.md
@@ -0,0 +1,144 @@
+---
+title: AWS Autoscaler
+---
+
+:::info Pro Plan Offering
+The ClearML AWS Autoscaler App is available under the ClearML Pro plan
+:::
+
+The AWS Autoscaler Application optimizes AWS EC2 instance usage according to a user-defined resource budget: define your
+budget by specifying the type and amount of available compute resources.
+
+Each resource type is associated with a ClearML [queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) whose status determines the need for instances of that resource 
+type (i.e. spin up new instances if there are pending jobs on the queue).
+
+When running, the autoscaler periodically polls your AWS cluster. The autoscaler automatically terminates idle instances 
+based on a specified maximum idle time, or spins up new instances when there aren't enough to execute pending tasks in a 
+queue (until reaching the defined maximum number of instances). You can add an init script, which will be executed when 
+each instance is spun up. 
+
+## Autoscaler Instance Configuration
+* **AWS Credentials** - Credentials with which the autoscaler can access your AWS account. See [Generating AWS IAM Credentials](#generating-aws-iam-credentials)
+    * Use IAM role - Select if you are running your autoscalers on your own EC2 instances which are attached to an [IAM 
+      role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html). In such a case, no AWS IAM credentials are required.
+    * AWS Region - [AWS Region](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.Regions) 
+      where the EC2 instances will be spun up
+    * AWS Access Key ID and AWS Secret Access Key - The credentials with which the autoscaler will access your AWS 
+      account for spinning EC2 instances up/down
+* **Git Configuration** - Git credentials with which the ClearML Agents running on your EC2 instances will access your 
+  repositories to retrieve the code for their jobs
+    * Git User 
+    * Git Password / Personal Access Token
+* **Max Idle Time** (Optional) - Maximum time in minutes that an EC2 instance can be idle before the autoscaler spins it 
+  down 
+* **Workers Prefix** (Optional) - A prefix added to workers’ names, associating them with this autoscaler
+* **Polling Interval** (Optional) - Time period in minutes at which the designated queue is polled for new tasks
+* **Base Docker Image** (Optional) - Default Docker image in which the ClearML Agent will run. Provide a Docker image 
+  stored in a Docker artifactory so instances can automatically fetch it
+* **Compute Resources**
+    * Resource Name - Assign a name to the resource type. This name will appear in the Autoscaler dashboard
+    * EC2 Instance Type - See [Instance Types](https://aws.amazon.com/ec2/instance-types) for full list of types
+    * Use Spot Instance - Select the checkbox to use a spot instance. Otherwise, an on-demand instance is used
+    * Availability Zone - The [EC2 availability zone](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.AvailabilityZones) 
+      to launch this resource in
+    * AMI ID - The AWS AMI to launch
+    * Max Number of Instances - Maximum number of concurrent running instances of this type allowed
+    * Monitored Queue - Queue associated with this instance type. The tasks enqueued to this queue will be executed on 
+      instances of this type
+    * EC2 Tags (Optional) - AWS instance tags to attach to launched EC2 instances. Insert key=value pairs, separated by 
+      commas 
+    * EBS Device (Optional) - Disk mount point
+    * EBS Volume Size (Optional) - Disk size (GB)
+    * EBS Volume Type (Optional) - See [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html) 
+      for full list of types
+    * Instance Key Pair (Optional) - AWS key pair provided to the spun-up EC2 instances for connecting to them via 
+      SSH. Provide the key pair's name, as created in AWS. See [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) 
+      for more details. 
+    * Security Group ID (Optional) - Comma-separated list of AWS VPC Security Group IDs to attach to the launched 
+      instance. Read more [here](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html) 
+    * \+ Add Item - Define another resource type
+* **IAM Instance Profile** (Optional) - Set an IAM instance profile for all instances spun up by the Autoscaler 
+    * Arn - Amazon Resource Name specifying the instance profile
+    * Name - Name identifying the instance profile
+* **Autoscaler Instance Name** (Optional) - Name for the Autoscaler instance. This will appear in the instance list. 
+* **Init Script** (Optional) - A bash script to execute after launching the EC2 instance 
+* **Additional ClearML Configuration** (Optional) - A ClearML configuration file for the ClearML Agent to use when 
+  executing your experiments
+
+![Autoscaler wizard](../../img/app_aws_autoscaler_wizard.png)
+
+:::note Enterprise Feature
+You can utilize the [configuration vault](../../webapp/webapp_profile.md#configuration-vault) to globally add your AWS 
+credentials in the following format: 
+
+```
+auto_scaler.v1 {
+    aws {
+        cloud_credentials_key: XXX
+        cloud_credentials_secret: XXX
+    }
+}
+```
+:::
+
+## Dashboard
+Once an autoscaler is launched, the autoscaler's dashboard provides information about available EC2 instances and their 
+status.
+
+![Autoscaler dashboard](../../img/app_aws_autoscaler.png)
+
+The autoscaler dashboard shows:
+* Number of idle instances
+* Queues and the resource type associated with them
+* Number of currently running instances 
+* Console - The application log, containing everything printed to stdout and stderr. The log 
+  shows polling results of the autoscaler’s associated queues, including the number of tasks enqueued, and updates on EC2 
+  instances being spun up/down.  
+
+## Generating AWS IAM Credentials
+
+The autoscaler app accesses your AWS account with the credentials you provide. 
+
+You will need to create an AWS policy which grants the autoscaler app the required access privileges, attach the policy 
+to an IAM user, and create credential keys for that user to configure in the autoscaler app: 
+
+1. In your AWS account, go to **Services Menu > IAM > Policies** 
+    
+   ![AWS Policies](../../img/apps_aws_permissions_1.png)
+
+1. Under policies, click **Create Policy** 
+
+   ![AWS create policy](../../img/apps_aws_permissions_2.png)
+
+1. In the **Create Policy** modal, click the **JSON** option
+
+   ![AWS create policy JSON](../../img/apps_aws_permissions_3.png)
+
+1. Insert the following policy into the text box: 
+
+    ```
+    {                  
+        "Version": "2012-10-17",
+        "Statement": [
+            {
+                "Sid": "VisualEditor0",
+                "Effect": "Allow",
+                "Action": [
+                    "ec2:DescribeInstances",
+                    "ec2:TerminateInstances",
+                    "ec2:RequestSpotInstances",
+                    "ec2:DeleteTags",
+                    "ec2:CreateTags",
+                    "ec2:RunInstances",
+                    "ec2:DescribeSpotInstanceRequests",
+                    "ec2:GetConsoleOutput"
+                ],
+                "Resource": "*"
+            }
+        ]
+    }
+    ```
+   
+1. Complete creating the policy
+1. Attach the created policy to the IAM user or group whose credentials will be used in the autoscaler app (you can create a 
+   new IAM user or group for this purpose)
+1. Obtain a set of AWS IAM credentials for the user to which you attached the policy in the previous step (or for a user in that group)
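
The polling behavior described in the AWS Autoscaler page above (spin up instances while tasks are pending, up to the configured maximum, and spin down instances that exceed the maximum idle time) can be sketched in plain Python. This is only an illustrative sketch, not ClearML's implementation: the `Resource` class, the `plan_scaling` function, and all field names are hypothetical, and it assumes one new instance is requested per pending task.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Resource:
    """Hypothetical snapshot of one resource type at polling time."""
    name: str
    max_instances: int   # corresponds to "Max Number of Instances"
    pending_tasks: int   # tasks waiting in the monitored queue
    # instance id -> minutes the instance has been idle (0 means busy)
    idle_minutes: Dict[str, float] = field(default_factory=dict)


def plan_scaling(resource: Resource, max_idle_time: float) -> Tuple[int, List[str]]:
    """Return (number of instances to spin up, instance ids to spin down)."""
    running = len(resource.idle_minutes)
    if resource.pending_tasks > 0:
        # Assumption: one new instance per pending task, capped by the budget.
        free_slots = max(resource.max_instances - running, 0)
        return min(resource.pending_tasks, free_slots), []
    # Queue is empty: spin down instances idle longer than the limit.
    expired = sorted(
        instance_id
        for instance_id, idle in resource.idle_minutes.items()
        if idle > max_idle_time
    )
    return 0, expired


if __name__ == "__main__":
    busy_queue = Resource("gpu", max_instances=3, pending_tasks=5,
                          idle_minutes={"i-1": 0})
    print(plan_scaling(busy_queue, max_idle_time=15))
```

Keeping the decision in a pure function separates it from the cloud calls, so the budget logic can be reasoned about without touching AWS.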
--- a/docs/webapp/applications/apps_gcp_autoscaler.md
+++ b/docs/webapp/applications/apps_gcp_autoscaler.md
@@ -0,0 +1,84 @@
+---
+title: GCP Autoscaler
+---
+
+:::info Pro Plan Offering
+The ClearML GCP Autoscaler App is available under the ClearML Pro plan
+:::
+
+The GCP Autoscaler Application optimizes GCP VM instance usage according to a user-defined instance budget: define your 
+budget by specifying the type and amount of available compute resources.
+
+Each resource type is associated with a ClearML [queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) whose 
+status determines the need for instances of that resource type (i.e. spin up new instances if there are pending jobs on 
+the queue).
+
+When running, the autoscaler periodically polls your GCP cluster. The autoscaler automatically deletes idle VM instances 
+based on a specified maximum idle time, or spins up new VM instances when there aren't enough to execute pending tasks 
+in a queue (until reaching the defined maximum number of instances). You can add an init script, which will be executed 
+when each VM instance is spun up. 
+
+## Autoscaler Instance Configuration
+* **GCP Configuration**
+    * GCP Project ID - Project used for spinning up VM instances
+    * GCP Zone - The GCP zone where the VM instances will be spun up. See [Regions and zones](https://cloud.google.com/compute/docs/regions-zones)
+    * GCP Credentials - Project credentials, see [here](https://cloud.google.com/docs/authentication/production) for 
+      more details.
+* **Git Configuration** - Git credentials with which the ClearML Agents running on your VM instances will access your 
+  repositories to retrieve the code for their jobs
+    * Git User 
+    * Git Password / Personal Access Token
+* **Base Docker Image** (Optional) - Default Docker image in which the ClearML Agent will run. Provide a Docker image stored in a 
+  Docker artifactory so VM instances can automatically fetch it
+* **Compute Resources**
+    * Resource Name - Assign a name to the resource type. This name will appear in the Autoscaler dashboard.
+    * GCP Machine Type - See list of [machine types](https://cloud.google.com/compute/docs/machine-types)
+    * Run in CPU mode - Select to have the autoscaler utilize only CPU VM instances
+    * GPU Type - See list of [supported GPUs by instance](https://cloud.google.com/compute/docs/gpus)
+    * Use Preemptible Instance - Choose whether VM instances of this type will be [preemptible](https://cloud.google.com/compute/docs/instances/preemptible).
+    * Max Number of Instances - Maximum number of concurrent running VM instances of this type allowed
+    * Monitored Queue - Queue associated with this VM instance type. The tasks enqueued to this queue will be executed on VM instances of this type
+    * Machine Image (Optional)  - The GCP machine image to launch 
+    * Disc Size (in GB) (Optional) 
+    * \+ Add Item - Define another resource type
+* **Autoscaler Instance Name** (Optional) - Name for the Autoscaler instance. This will appear in the instance list. 
+* **Max Idle Time** (Optional) - Maximum time in minutes that a VM instance can be idle before the autoscaler spins it down
+* **Workers Prefix** (Optional) - A prefix added to workers’ names, associating them with this autoscaler
+* **Polling Interval** (Optional) - Time period in minutes at which the designated queue is polled for new tasks
+* **Init Script** (Optional) - A bash script to execute after launching the VM instance
+* **Additional ClearML Configuration** (Optional) - A ClearML configuration file for the ClearML Agent to use when executing your experiments
+
+![GCP autoscaler wizard](../../img/apps_gcp_autoscaler_wizard.png)
+
+:::note Enterprise Feature
+You can utilize the [configuration vault](../../webapp/webapp_profile.md#configuration-vault) to globally add your GCP 
+credentials in the following format: 
+
+```
+auto_scaler.v1 {
+    gcp {
+        gcp_credentials: """
+        {
+          "type": "service_account",
+          ...
+        }
+        """
+    }
+}
+```
+:::
+
+## Dashboard
+
+Once an autoscaler is launched, the autoscaler's dashboard provides information about available VM instances and their 
+status.
+
+![GCP autoscaler dashboard](../../img/apps_gcp_autoscaler.png)
+
+The autoscaler dashboard shows:
+* Number of idle instances
+* Queues and the resource type associated with them
+* Number of currently running instances
+* Console - The application log, containing everything printed to stdout and stderr. The log 
+  shows polling results of the autoscaler’s associated queues, including the number of tasks enqueued, and updates on VM 
+  instances being spun up/down.   
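
Before pasting GCP credentials into the wizard or the configuration vault shown in the GCP Autoscaler page above, a quick local sanity check of the JSON can catch copy-paste mistakes. The sketch below is an assumption-laden illustration and not part of ClearML: `check_service_account_json` and `REQUIRED_FIELDS` are hypothetical names, and the field list is a minimal subset of what a GCP service account key file normally contains.

```python
import json
from typing import List

# Assumed minimal subset of fields found in a service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_json(text: str) -> List[str]:
    """Return a list of problems; an empty list means the text looks plausible."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value must be an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if "type" in data and data["type"] != "service_account":
        problems.append('"type" must be "service_account"')
    return problems


if __name__ == "__main__":
    for problem in check_service_account_json('{"type": "service_account"}'):
        print(problem)
```

The check only inspects structure; it does not contact GCP or verify that the key is active.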
--- a/docs/webapp/applications/apps_hpo.md
+++ b/docs/webapp/applications/apps_hpo.md
@@ -0,0 +1,71 @@
+---
+title: Hyperparameter Optimization
+---
+
+:::info Pro Plan Offering
+The ClearML HPO App is available under the ClearML Pro plan
+:::
+
+The Hyperparameter Optimization Application finds the set of parameter values that optimize a specific metric for your 
+model.
+
+It takes an existing ClearML experiment and the parameters to optimize. The parameter search space can be defined
+by specific (discrete) values and/or value ranges (uniform parameters). 
+
+The optimization app launches multiple copies of the original experiment, each time sampling different parameter sets, 
+applying a user-selected optimization strategy (random search, Bayesian, etc.). 
+
+Control the optimization process with the advanced configuration options, which include time, iteration, and experiment 
+limits.
+
+## HPO Instance Configuration
+* **Initial Task to Optimize** - ID of an existing ClearML task to optimize. This task will be cloned, and each clone will 
+  sample a different set of hyperparameter values.
+* **Optimization Configuration**
+    * Optimization Method - The optimization strategy to employ (e.g. random, grid, hyperband)
+    * Optimization Objective Metric’s Title - Title of metric to optimize
+    * Optimization Objective Metric’s Series - Metric series (variant) to optimize
+    * Optimization Objective Trend - Choose the optimization target, whether to maximize or minimize the value of the 
+      metric specified above
+* **Execution Queue** - The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which 
+  optimization tasks will be enqueued (make sure an agent is assigned to that queue)
+* **Parameters to Optimize** - Parameters comprising the optimization space
+    * Type 
+        * Uniform Parameters - A value range to sample
+            * Minimum Value
+            * Maximum Value
+            * Step Size - Step size between samples
+        * Discrete Parameters - A set of values to sample
+            * Values - Comma-separated list of values to sample
+    * Name - The original task’s configuration parameter name (including section name e.g. `Args/lr`)
+* **Optimization Job Title** (Optional) - Name for the HPO instance. This will appear in the instance list. 
+* **Optimization Experiments Destination Project** (Optional) - The project where optimization tasks will be saved. 
+  Leave empty to use the same project as the Initial task. 
+* **Maximum Concurrent Tasks** - The maximum number of simultaneously running optimization experiments
+* **Advanced Configuration** (Optional)
+    * Limit Total HPO Experiments - Maximum total number of optimization experiments
+    * Number of Top Experiments to Save - Number of best performing experiments to save (the rest are archived).
+    * Limit Single Experiment Running Time (Minutes) - Time limit per optimization experiment. Experiments will be 
+      stopped after the specified time has elapsed.
+    * Minimal Number of Iterations Per Single Experiment - Some search methods, such as Optuna, prune underperforming 
+      experiments. This is the minimum number of iterations per experiment before it can be stopped. Iterations are 
+      based on the experiments' own reporting (for example, if experiments report every epoch, then iterations=epochs)
+    * Maximum Number of Iterations Per Single Experiment - Maximum iterations per experiment after which it will be 
+      stopped. Iterations are based on the experiments' own reporting (for example, if experiments report every epoch, 
+      then iterations=epochs)
+    * Limit Total Optimization Instance Time (Minutes) - Time limit for the whole optimization process
+
+![HPO app wizard](../../img/apps_hpo_wizard.png)
+ 
+## Dashboard
+Once an HPO instance is launched, the dashboard displays a summary of the optimization process.
+
+![HPO dashboard](../../img/apps_hpo.png)
+
+The HPO dashboard shows:
+* Optimization Metric - Last reported and maximum / minimum values of objective metric over time
+* Optimization Objective - Objective metric values per experiment
+* Parallel coordinates - A visualization of parameter value impact on optimization objective
+* Summary - Experiment summary table: experiment execution information, objective metric and parameter values.
+* Budget - Available iterations and tasks budget (percentage, out of the values defined in the HPO instance's advanced configuration)
+* Resources - Number of workers servicing the HPO execution queue, and the number of currently running optimization tasks
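
To make the search-space terms in the HPO page above concrete, the following sketch enumerates a uniform parameter from its minimum, maximum, and step size, and then draws random parameter sets the way a random-search strategy would. It is a hedged, standalone illustration rather than the app's actual code: `uniform_values`, `sample_parameter_sets`, and the `Args/batch_size` parameter are hypothetical, and it assumes the maximum value is included when it falls on a step.

```python
import random
from typing import Dict, List, Union

Number = Union[int, float]


def uniform_values(min_value: float, max_value: float, step_size: float) -> List[float]:
    """Enumerate min, min + step, ... up to max (inclusive when it lands on a step)."""
    if step_size <= 0 or max_value < min_value:
        raise ValueError("need step_size > 0 and max_value >= min_value")
    # Small epsilon guards against floating-point error in the division.
    count = int((max_value - min_value) / step_size + 1e-9) + 1
    return [round(min_value + i * step_size, 10) for i in range(count)]


def sample_parameter_sets(space: Dict[str, List[Number]], n: int,
                          seed: int = 0) -> List[Dict[str, Number]]:
    """Draw n random parameter sets, one value per parameter (random search)."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n)]


if __name__ == "__main__":
    space = {
        "Args/lr": uniform_values(0.0, 1.0, 0.25),  # uniform parameter
        "Args/batch_size": [16, 32, 64],            # discrete parameter
    }
    for params in sample_parameter_sets(space, n=3):
        print(params)
```

Each sampled dictionary corresponds to the hyperparameter overrides that one cloned experiment would receive.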
--- a/docs/webapp/applications/apps_overview.md
+++ b/docs/webapp/applications/apps_overview.md
@@ -0,0 +1,46 @@
+---
+title: Overview
+---
+
+:::info Pro Plan Offering
+ClearML Applications are available under the ClearML Pro plan
+:::
+
+Use ClearML’s GUI Applications to manage ML workloads and automatically run your recurring workflows without any coding. 
+
+![Apps page](../../img/apps_overview_page.png)
+
+Configure and launch app instances, then track their execution from the app dashboard.
+
+ClearML provides the following applications:
+* [**AWS Autoscaler**](apps_aws_autoscaler.md) - Optimize AWS EC2 instance usage according to a defined instance budget
+* [**GCP Autoscaler**](apps_gcp_autoscaler.md) - Optimize GCP instance usage according to a defined instance budget
+* [**Hyperparameter Optimization**](apps_hpo.md) - Find the parameter values that yield the best performing models
+* **Nvidia Clara** - Train models using Nvidia’s Clara framework
+* **Project Dashboard** - High-level project monitoring with Slack alerts
+
+## App Pages Layout
+Each application’s page is split into two sections:
+* App Instance List - Launch new app instances and view previously launched instances. Click on an instance to view its 
+  dashboard. Hover over it to access the [app instance actions](#app-instance-actions).
+* App Instance Dashboard - The main section of the app page: displays the selected app instance’s status and results.
+
+![App format](../../img/apps_format_overview.png)
+
+## Launching an App Instance
+
+1. Choose the desired app
+1. Click the `Launch New` button <img src="/docs/latest/icons/ico-add.svg" alt="Add new" className="icon size-md space-sm" />  to open the app’s configuration wizard
+1. Fill in the configuration details
+1. **Launch**
+
+## App Instance Actions
+Access app instance actions by right-clicking an instance, or through the menu button <img src="/docs/latest/icons/ico-dots-v-menu.svg" alt="Dot menu" className="icon size-md space-sm" /> (available on hover).
+
+![App context menu](../../img/app_context_menu.png)
+
+* **Rename** - Rename the instance 
+* **Configuration** - View an instance’s configuration 
+* **Stop** - Shut down the instance
+* **Clone** - Launch a new instance with the same configuration prefilled
+* **Delete** - Delete the instance
--- a/docs/webapp/webapp_overview.md
+++ b/docs/webapp/webapp_overview.md
@@ -28,7 +28,7 @@ The ClearML Web UI is composed of the following pages:
 * [Datasets](datasets/webapp_dataset_page.md) <img src="/docs/latest/icons/ico-side-bar-datasets.svg" alt="Datasets" className="icon size-md space-sm" /> - View and manage your datasets. 
 * [Pipelines](pipelines/webapp_pipeline_page.md) <img src="/docs/latest/icons/ico-pipelines.svg" className="icon size-md space-sm" /> - View and manage your pipelines.
 * [Workers and Queues](webapp_workers_queues.md) <img src="/docs/latest/icons/ico-workers.svg" alt="Workers and Queues" className="icon size-md space-sm" /> - The resource monitoring and queues management page.
-  
+* [Applications](applications/apps_overview.md) <img src="/docs/latest/icons/ico-applications.svg" alt="ClearML Apps" className="icon size-md space-sm" /> - ClearML's GUI applications for no-code workflow execution.
 * [Settings](webapp_profile.md) (available through the profile menu <img src="/docs/latest/icons/ico-me.svg" alt="Profile button" className="icon size-lg space-sm" />) - 
  Manage your ClearML user account:
  * Set WebApp preferences
--- a/sidebars.js
+++ b/sidebars.js
@@ -54,7 +54,17 @@ module.exports = {
                    'webapp/pipelines/webapp_pipeline_page', 'webapp/pipelines/webapp_pipeline_table', 'webapp/pipelines/webapp_pipeline_viewing'
                ]
            },
-            'webapp/webapp_workers_queues', 'webapp/webapp_profile']
+            'webapp/webapp_workers_queues',
+            {
+                'ClearML Applications': [
+                    'webapp/applications/apps_overview',
+                    'webapp/applications/apps_aws_autoscaler',
+                    'webapp/applications/apps_gcp_autoscaler',
+                    'webapp/applications/apps_hpo'
+                ]
+
+            },
+            'webapp/webapp_profile']
        },
        {'Configurations': ['configs/configuring_clearml', 'configs/clearml_conf', 'configs/env_vars']},
        //'References': ['references/clearml_ref','references/clearml_agent_ref'],