diff --git a/docs/webapp/applications/apps_aws_autoscaler.md b/docs/webapp/applications/apps_aws_autoscaler.md
index cb59c0eb..f86cff7c 100644
--- a/docs/webapp/applications/apps_aws_autoscaler.md
+++ b/docs/webapp/applications/apps_aws_autoscaler.md
@@ -20,8 +20,7 @@ each instance is spun up.
 For more information about how autoscalers work, see [Autoscalers Overview](../../cloud_autoscaling/autoscaling_overview.md#autoscaler-applications).
 
 ## Autoscaler Instance Configuration
-* **Import Configuration** - Import an app instance configuration file. This will fill the configuration wizard with the
-  values from the file, which can be modified before launching the app instance
+
 * **AWS Credentials** - Credentials with which the autoscaler can access your AWS account. See [Generating AWS IAM Credentials](#generating-aws-iam-credentials)
   * Use IAM role - Select if you are running your autoscalers on your own EC2 instances which are attached to an [IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html).
     In such a case, no AWS IAM credentials are required
@@ -37,15 +36,19 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
   down
 * **Workers Prefix** (optional) - A Prefix added to workers' names, associating them with this autoscaler
 * **Polling Interval** (optional) - Time period in minutes at which the designated queue is polled for new tasks
-* **Base Docker Image** (optional) - Default Docker image in which the ClearML Agent will run. Provide a Docker stored
-  in a Docker artifactory so instances can automatically fetch it
+* **Use docker mode** - If selected, tasks enqueued to the autoscaler will be executed by ClearML Agents running in
+  [Docker mode](../../clearml_agent.md#docker-mode)
+  * **Base Docker Image** (optional) - Available when `Use docker mode` is selected. Default Docker image in which the
+    ClearML Agent will run. Provide an image stored in a Docker artifactory so instances can automatically fetch it
 * **Compute Resources**
   * Resource Name - Assign a name to the resource type. This name will appear in the Autoscaler dashboard
   * EC2 Instance Type - See [Instance Types](https://aws.amazon.com/ec2/instance-types) for full list of types
   * Run in CPU mode - Check box to run with CPU only
-  * Use Spot Instance - Check box to use a spot instance. Else, a reserved instance is used
-  * Regular Instance Rollback Timeout - Controls when the autoscaler will revert to starting a regular instance after failing to start a spot instance. It will first attempt to start a spot, and then wait and retry again and again. Once the time it waited exceeded the Regular Instance Rollback Timeout, the autoscaler will try to start a regular instance instead. This is for a specific attempt, where starting a spot fails and an alternative instance needs to be started.
+  * Use Spot Instance - Select to use a spot instance. Otherwise, a reserved instance is used.
+  * Regular Instance Rollback - When selected, if a spot instance is unavailable for the time specified in the `Regular Instance Rollback Timeout` field, a reserved instance will be spun up instead
+  * Regular Instance Rollback Timeout - Controls how long the autoscaler will wait for a spot instance to become available. It will first attempt to start a spot instance, then periodically retry. Once the specified time is exceeded, the autoscaler will try to start a reserved instance instead. The timeout applies to a single attempt in which starting a spot instance fails and an alternative instance needs to be started.
   * Spot Instance Blackout Period - Specifies a blackout period after failing to start a spot instance. This is related to future attempts: after failing to start a spot instance, all requests to start additional spot instances will be converted to attempts to start regular instances, as a way of "easing" the spot requests load on the cloud provider and not creating a "DOS" situation in the cloud account which might cause the provider to refuse creating spots for a longer period.
+  * Place tags on resources - In addition to the instance itself, choose which cloud resources the tags will be placed on
   * Availability Zone - The [EC2 availability zone](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.AvailabilityZones) to launch this resource in
   * AMI ID - The AWS AMI to launch
@@ -86,8 +89,7 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
   creating the instances launch spec. See [boto3 EC2.client.run_instances Request Syntax](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2/client/run_instances.html)
   and [AWS API Reference: RunInstances](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_RunInstances.html) (available under ClearML Enterprise Plan)
 * **Run with Service Account** - Select to allow running the application under a [Service Account](../webapp_profile.md#service-accounts) identity instead of under your own identity (available under ClearML Enterprise Plan)
-* **Export Configuration** - Export the app instance configuration as a JSON file, which you can later import to create
-  a new instance with the same configuration
+
 
 ![Autoscaler wizard](../../img/app_aws_autoscaler_wizard.png)
 
@@ -104,6 +106,10 @@ You can utilize the [configuration vault](../../webapp/webapp_profile.md#configu
   the one set in the `Init script` field of the autoscaler wizard
 * `extra_clearml_conf` - ClearML configuration to use by the ClearML Agent when executing your experiments. This
   configuration will be appended to that set in the `Additional ClearML Configuration` field of the autoscaler wizard
+* `files` - Files to create at designated paths with predefined content on the launched cloud instances.
+  For more information, see [Files Section](../../configs/clearml_conf.md#files-section)
+* `environment` - Dictionary of environment variables and values to set in the OS environment of the launched cloud
+  instances. For more information, see [Environment Section](../../configs/clearml_conf.md#environment-section)
 
 For example, the following configuration would be applied to all autoscaler instances:
 
@@ -118,6 +124,23 @@ auto_scaler.v1.aws {
   extra_clearml_conf: """
     agent.docker_force_pull: true
   """
+  files {
+    boto3_file {
+      contents: |
+        boto3 {
+          pool_connections: 512
+          max_multipart_concurrency: 16
+        }
+      path: "/boto3_config.yaml"
+      target_format: yaml
+      mode: "0o644"
+    }
+  }
+  environment {
+    DB_PASSWORD: "secretpassword"
+    LOG_LEVEL: "info"
+  }
+}
 ```
diff --git a/docs/webapp/applications/apps_gcp_autoscaler.md b/docs/webapp/applications/apps_gcp_autoscaler.md
index 1e6fc191..5e81193e 100644
--- a/docs/webapp/applications/apps_gcp_autoscaler.md
+++ b/docs/webapp/applications/apps_gcp_autoscaler.md
@@ -21,19 +21,25 @@ when each VM instance is spun up.
 For more information about how autoscalers work, see [Autoscalers Overview](../../cloud_autoscaling/autoscaling_overview.md#autoscaler-applications).
 
 ## Autoscaler Instance Configuration
-* **Import Configuration** - Import an app instance configuration file. This will fill the configuration wizard with the
-  values from the file, which can be modified before launching the app instance
+
 * **GCP Configuration**
   * GCP Project ID - Project used for spinning up VM instances
   * GCP Zone - The GCP zone where the VM instances will be spun up. See [Regions and zones](https://cloud.google.com/compute/docs/regions-zones)
-  * GCP Subnetwork - The GCP subnetwork where the instances will be spun up. GCP setting will be `projects/{project-id}/regions/{region}/subnetworks/{subnetwork}`
+  * Use full subnet path - Select to specify a full subnet path (e.g. to reference a subnet from a different project)
+  * GCP Subnet Full Path - Available if `Use full subnet path` was selected. The GCP subnetwork where the instances
+    will be spun up. This allows setting a custom subnet resource path, including subnets shared from other
+    projects. See [GCP Documentation](https://cloud.google.com/dataflow/docs/guides/specifying-networks).
+  * GCP Subnet Name - Available if `Use full subnet path` was not selected. The GCP subnetwork where the instances
+    will be spun up. GCP setting will be `projects/{project-id}/regions/{region}/subnetworks/{subnetwork}`
   * GCP Credentials - Credentials with which the autoscaler can access your GCP account for spinning VM instances up/down.
     See [Generating GCP Credentials](#generating-gcp-credentials).
 * **Git Configuration** - Git credentials with which the ClearML Agents running on your VM instances will access your
   repositories to retrieve the code for their jobs
   * Git User
   * Git Password / Personal Access Token
-* **Base Docker Image** (optional) - Default Docker image in which the ClearML Agent will run. Provide a Docker stored in a
+* **Use docker mode** - If selected, tasks enqueued to the autoscaler will be executed by ClearML Agents running in
+  [Docker mode](../../clearml_agent.md#docker-mode)
+* **Base Docker Image** (optional) - Available when `Use docker mode` is selected. Default Docker image in which the ClearML Agent will run. Provide an image stored in a
   Docker artifactory so VM instances can automatically fetch it
 * **Compute Resources**
   * Resource Name - Assign a name to the resource type. This name will appear in the Autoscaler dashboard
@@ -41,6 +47,12 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
   * Run in CPU mode - Select to have the autoscaler utilize only CPU VM instances
   * GPU Type - See list of [supported GPUs by instance](https://cloud.google.com/compute/docs/gpus)
   * Use Preemptible Instance - Choose whether VM instances of this type will be [preemptible](https://cloud.google.com/compute/docs/instances/preemptible)
+  * Preemptible Provision Model - Select the provision model. For more information, see [Spot VMs](https://cloud.google.com/compute/docs/instances/spot)
+    and [Preemptible VMs](https://cloud.google.com/compute/docs/instances/preemptible)
+  * Regular Instance Rollback - When selected, if a preemptible instance is unavailable for the time specified in the
+    `Regular Instance Rollback Timeout` field, a regular instance will be spun up instead
+  * Preemptible Instance Blackout Period - Time (in minutes) during which regular instances will be started instead of
+    preemptible ones after acquiring a preemptible instance fails
   * Max Number of Instances - Maximum number of concurrent running VM instances of this type allowed
   * Monitored Queue - Queue associated with this VM instance type. The tasks enqueued to this queue will be executed on VM instances of this type
   * Machine Image (optional) - The GCP machine image to launch
@@ -48,6 +60,11 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
     The machine image used for the autoscaler must include docker runtime and virtualenv
     :::
   * Disc Size (in GB) (optional)
+  * Use the default GCP Service Account - If selected, the default service account will be used. To use a
+    different service account, clear this option and fill in the `Service Account Email` field
+  * Service Account Scopes - Comma-separated scope aliases. For more information, see [GCP documentation](https://cloud.google.com/sdk/gcloud/reference/compute/instances/create#--scopes).
+    Fully-qualified scope URIs are supported. If left empty, the autoscaler will use the default scopes.
+  * Metadata - GCP Metadata to be applied to this resource's instances. Input comma-separated key=value pairs (e.g. `"Owner=Admin,Foo=Bar"`).
   * \+ Add Item - Define another resource type
 * **Global Metadata** (optional) - GCP Metadata to be applied on all instances. Input comma separated key=value pairs (e.g. `"Owner=Admin,Foo=Bar"`).
 * **Autoscaler Instance Name** (optional) - Name for the Autoscaler instance. This will appear in the instance list
@@ -64,9 +81,7 @@ For more information about how autoscalers work, see [Autoscalers Overview](../.
 * **Init Script** (optional) - A bash script to execute after launching the VM instance
 * **Additional ClearML Configuration** (optional) - A ClearML configuration file to use by the ClearML Agent when executing your experiments
 * **Run with Service Account** - Select to allow running the application under a [Service Account](../webapp_profile.md#service-accounts) identity instead of under your own identity (available under ClearML Enterprise Plan)
-* **Export Configuration** - Export the app instance configuration as a JSON file, which you can later import to create
-  a new instance with the same configuration.
-
+
 
 ![GCP autoscaler wizard](../../img/apps_gcp_autoscaler_wizard.png)
 
diff --git a/docs/webapp/applications/apps_hpo.md b/docs/webapp/applications/apps_hpo.md
index 02c5deb7..50aaaab7 100644
--- a/docs/webapp/applications/apps_hpo.md
+++ b/docs/webapp/applications/apps_hpo.md
@@ -6,7 +6,7 @@ title: Hyperparameter Optimization
 The ClearML HPO App is available under the ClearML Pro plan
 :::
 
-The Hyperparameter Optimization Application finds the set of parameter values that optimize a specific metric for your
+The Hyperparameter Optimization Application finds the set of parameter values that optimize one or more specific metrics for your
 model. It takes in a ClearML experiment and its parameters to optimize. The parameter search space can be specified
@@ -19,16 +19,15 @@ Control the optimization process with the advanced configuration options, which
 limits.
 
 ## HPO Instance Configuration
-* **Import Configuration** - Import an app instance configuration file. This will fill the configuration wizard with the
-  values from the file, which can be modified before launching the app instance
 * **Initial Task to Optimize** - ID of a ClearML task to optimize. This task will be cloned, and each clone will
   sample a different set of hyperparameters values
-* **Optimization Configuration**
-  * Optimization Method - The optimization strategy to employ (e.g. random, grid, hyperband)
+* **Optimization Method** - The optimization strategy to employ (e.g. random, grid, hyperband)
+* **Objectives** - Set the optimization targets: whether to minimize or maximize the values of one or more specified metrics
   * Optimization Objective Metric's Title - Title of metric to optimize
   * Optimization Objective Metric's Series - Metric series (variant) to optimize
   * Optimization Objective Trend - Choose the optimization target, whether to maximize or minimize the value of the
     metric specified above
+  * \+ Add item - Add an objective
 * **Execution Queue** - The [ClearML Queue](../../fundamentals/agents_and_queues.md#what-is-a-queue) to which
   optimization tasks will be enqueued (make sure an agent is assigned to that queue)
 * **Parameters to Optimize** - Parameters comprising the optimization space
@@ -70,8 +69,7 @@ limits.
     stopped. Iterations are based on the experiments' own reporting (for example, if experiments report every epoch,
     then iterations=epochs)
   * Limit Total Optimization Instance Time (Minutes) - Time limit for the whole optimization process (in minutes)
-* **Export Configuration** - Export the app instance configuration as a JSON file, which you can later import to create
-  a new instance with the same configuration.
+
 
 ![HPO app wizard](../../img/apps_hpo_wizard.png)
 