diff --git a/docs/img/apps_gpu_compute_dashboard.png b/docs/img/apps_gpu_compute_dashboard.png
new file mode 100644
index 00000000..94f92b10
Binary files /dev/null and b/docs/img/apps_gpu_compute_dashboard.png differ
diff --git a/docs/img/apps_gpu_compute_wizard.png b/docs/img/apps_gpu_compute_wizard.png
new file mode 100644
index 00000000..fdc0de31
Binary files /dev/null and b/docs/img/apps_gpu_compute_wizard.png differ
diff --git a/docs/webapp/applications/apps_gpu_compute.md b/docs/webapp/applications/apps_gpu_compute.md
new file mode 100644
index 00000000..1bd6a28b
--- /dev/null
+++ b/docs/webapp/applications/apps_gpu_compute.md
@@ -0,0 +1,62 @@
+---
+title: GPU Compute
+---
+
+:::info Pro Plan Offering
+The ClearML GPU Compute App is available under the ClearML Pro plan.
+:::
+
+Run your workloads on 100% green cloud machines at optimized costs – no setup required! The ClearML GPU Compute
+Application automatically spins cloud machines up or down based on demand. The app optimizes machine usage according to
+a user-defined resource budget: define your budget by specifying the GPU type and number of GPUs you want to use.
+
+Each application instance monitors a ClearML queue: new cloud machines are spun up if there are pending jobs on the
+queue. The app instance automatically terminates idle machines based on a specified maximum idle time.
+
+## GPU Compute Instance Configuration
+* **Import Configuration** - Import an app instance configuration file. This will fill the configuration wizard with the
+  values from the file, which can be modified before launching the app instance
+* **Machine Specification**
+    * GPU Type - NVIDIA GPU on the machine
+    * Number of GPUs - Number of GPUs in the cloud machine
+    * The rest of the machine’s available resources are dependent on the number and type of GPUs specified above:
+        * vCPUs - Number of vCPUs in the cloud machine
+        * Memory - RAM available to the cloud machine
+        * Hourly Price - Machine's hourly rate
+        * Disk - Amount of disk space available to the cloud machine
+    * Monitored Queue - Queue associated with the application instance. The tasks enqueued to this queue will be executed on
+      machines of this specification
+    * Cloud Machine Limit - Maximum number of concurrent machines to launch
+* **Idle Time Limit** (Optional) - Maximum time in minutes that a cloud machine can be idle before it is spun down
+* **Default Docker Image** (Optional) - Default Docker image in which the ClearML Agent will run. Provide a Docker image stored
+  in a Docker artifactory so instances can automatically fetch it
+* **Git Configuration** - Git credentials with which the ClearML Agents running on your cloud instances will access your repositories to retrieve the code for their jobs
+    * Git User
+    * Git Password / Personal Access Token
+* **Cloud Storage Access** (Optional) - Access credentials to a cloud storage service. Provides ClearML Tasks running on cloud
+  machines access to your storage
+
+![GPU Compute wizard](../../img/apps_gpu_compute_wizard.png)
+
+## Dashboard
+
+Once a GPU Compute instance is launched, the dashboard displays a summary of your cloud usage and costs.
+
+![GPU Compute dashboard](../../img/apps_gpu_compute_dashboard.png)
+
+The GPU Compute dashboard shows:
+* Service status indicator
+    * Working server - Cloud service is available
+    * Not working server - Cloud service is currently unavailable
+* Cloud instance details
+    * GPU type
+    * Number of GPUs
+    * Number of vCPUs
+    * RAM
+    * Storage
+* Cost details
+    * Instance rate
+    * Total cost for current billing cycle
+* Number of currently running cloud instances
+* Instance History - Number of running cloud instances over time
+* Console - The log shows updates of cloud instances being spun up/down.
diff --git a/docs/webapp/applications/apps_overview.md b/docs/webapp/applications/apps_overview.md
index f9d0a57d..e4749a3d 100644
--- a/docs/webapp/applications/apps_overview.md
+++ b/docs/webapp/applications/apps_overview.md
@@ -13,6 +13,8 @@ Use ClearML’s GUI Applications to manage ML workloads and automatically run yo
 Configure and launch app instances, then track their execution from the app dashboard.
 
 ClearML provides the following applications:
+* [**GPU Compute**](apps_gpu_compute.md) - Launch cloud machines on demand and optimize their usage according to a
+  defined budget, with no previous setup necessary
 * [**AWS Autoscaler**](apps_aws_autoscaler.md) - Optimize AWS EC2 instance usage according to a defined instance budget
 * [**GCP Autoscaler**](apps_gcp_autoscaler.md) - Optimize GCP instance usage according to a defined instance budget
 * [**Hyperparameter Optimization**](apps_hpo.md) - Find the parameter values that yield the best performing models
diff --git a/sidebars.js b/sidebars.js
index 5b592033..d16b24e0 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -70,6 +70,7 @@ module.exports = {
         {
             'ClearML Applications': [
                 'webapp/applications/apps_overview',
+                'webapp/applications/apps_gpu_compute',
                 'webapp/applications/apps_aws_autoscaler',
                 'webapp/applications/apps_gcp_autoscaler',
                 'webapp/applications/apps_hpo',
diff --git a/static/icons/ico-server-alert.svg b/static/icons/ico-server-alert.svg
new file mode 100644
index 00000000..e6350e66
--- /dev/null
+++ b/static/icons/ico-server-alert.svg
@@ -0,0 +1,9 @@
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/static/icons/ico-server-ok.svg b/static/icons/ico-server-ok.svg
new file mode 100644
index 00000000..fcc2b9ec
--- /dev/null
+++ b/static/icons/ico-server-ok.svg
@@ -0,0 +1,8 @@
+
+
+
+
+
+
+
+
\ No newline at end of file