mirror of
https://github.com/clearml/clearml-docs
synced 2025-06-26 18:17:44 +00:00
Fix top navigation bar highlighting (#1044)
docs/getting_started/clearml_agent_base_docker.md (new file, 20 lines)
@@ -0,0 +1,20 @@
---
title: Building Task Execution Environments in a Container
---

### Base Container

Build a container according to the execution environment of a specific task:

```bash
clearml-agent build --id <task-id> --docker --target <new-docker-name>
```

You can add the container as the base container image of a task, using one of the following methods:

- Using the **ClearML Web UI** - See [Default Container](../webapp/webapp_exp_tuning.md#default-container).
- In the ClearML configuration file - Use the [`agent.default_docker`](../configs/clearml_conf.md#agentdefault_docker)
  configuration options.

Check out [this tutorial](../guides/clearml_agent/exp_environment_containers.md) for building a Docker container
replicating the execution environment of an existing task.
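For the configuration file route, a minimal sketch of what the relevant `clearml.conf` section might look like (the image name is a placeholder, and the exact nesting should be checked against the configuration reference):

```
agent {
    default_docker {
        # image built with `clearml-agent build` above
        image: "<new-docker-name>"
    }
}
```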
docs/getting_started/clearml_agent_docker_exec.md (new file, 30 lines)
@@ -0,0 +1,30 @@
---
title: Building Executable Task Containers
---

## Exporting a Task into a Standalone Docker Container

### Task Container

Build a Docker container that, when launched, executes a specific task or a clone (copy) of that task.

- Build a Docker container that at launch will execute a specific Task:

  ```bash
  clearml-agent build --id <task-id> --docker --target <new-docker-name> --entry-point reuse_task
  ```

- Build a Docker container that at launch will clone the Task specified by Task ID, and will execute the newly cloned Task:

  ```bash
  clearml-agent build --id <task-id> --docker --target <new-docker-name> --entry-point clone_task
  ```

- Run the built container by executing:

  ```bash
  docker run <new-docker-name>
  ```

Check out [this tutorial](../guides/clearml_agent/executable_exp_containers.md) for building executable task
containers.
docs/getting_started/clearml_agent_scheduling.md (new file, 121 lines)
@@ -0,0 +1,121 @@
---
title: Managing Agent Work Schedules
---

:::important Enterprise Feature
This feature is available under the ClearML Enterprise plan.
:::

The Agent scheduler enables scheduling working hours for each agent. During working hours, a worker actively polls
queues for Tasks, then fetches and executes them. Outside working hours, the worker is idle.

Schedule workers by:

* Setting configuration file options
* Running `clearml-agent` from the command line (overrides configuration file options)

Override worker schedules by:

* Setting runtime properties to force a worker on or off
* Tagging a queue on or off

## Running clearml-agent with a Schedule (Command Line)

Set a schedule for a worker from the command line when running `clearml-agent`. Two properties enable setting working hours:

:::warning
Use only one of these properties.
:::

* `uptime` - Time span during which a worker will actively poll queues for Tasks and execute them. Outside this
  time span, the worker will be idle.
* `downtime` - Time span during which a worker will be idle. Outside this time span, the worker will actively poll for and
  execute Tasks.

Define `uptime` or `downtime` as `"<hours> <days>"`, where:

* `<hours>` - A span of hours (`00-23`) or a single hour. A single hour defines a span from that hour to midnight.
* `<days>` - A span of days (`SUN-SAT`) or a single day.

Use `-` for a span, and `,` to separate individual values. To span from before midnight to after midnight, use two spans.

For example:

* `"20-23 SUN"` - 8 PM to 11 PM on Sundays.
* `"20-23 SUN,TUE"` - 8 PM to 11 PM on Sundays and Tuesdays.
* `"20-23 SUN-TUE"` - 8 PM to 11 PM on Sundays, Mondays, and Tuesdays.
* `"20 SUN"` - 8 PM to midnight on Sundays.
* `"20-00,00-08 SUN"` - 8 PM to midnight and midnight to 8 AM on Sundays.
* `"20-00 SUN", "00-08 MON"` - 8 PM on Sundays to 8 AM on Mondays (spans from before midnight to after midnight).
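As an illustration of this format only (not ClearML's actual parser; day spans that wrap past Saturday are ignored, and the end hour of a span is treated as exclusive), a short Python sketch that checks whether a given day and hour fall inside an `uptime` specification:

```python
DAYS = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]

def _hours(part):
    """Expand the <hours> field, e.g. '20-23', '20-00,00-08', or '20'."""
    result = set()
    for item in part.split(","):
        if "-" in item:
            lo, hi = (int(x) for x in item.split("-"))
            # the end hour is exclusive; '-00' means up to midnight
            result.update(range(lo, 24 if hi == 0 else hi))
        else:
            # a single hour spans from that hour to midnight
            result.update(range(int(item), 24))
    return result

def _days(part):
    """Expand the <days> field, e.g. 'SUN-TUE' or 'SUN,TUE'."""
    result = set()
    for item in part.split(","):
        if "-" in item:
            lo, hi = item.split("-")
            result.update(DAYS[DAYS.index(lo):DAYS.index(hi) + 1])
        else:
            result.add(item)
    return result

def in_uptime(spec, day, hour):
    """True if (day, hour) falls inside a spec such as '20-23 SUN,TUE'."""
    hours_part, days_part = spec.split()
    return day in _days(days_part) and hour in _hours(hours_part)
```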

## Setting Worker Schedules in the Configuration File

Set a schedule for a worker using configuration file options. The options are:

:::warning
Use only one of these properties.
:::

* `agent.uptime`
* `agent.downtime`

Use the same time span format for days and hours as is used on the command line.

For example, to set a worker's schedule from 5 PM to 8 PM on Sunday through Tuesday, and 1 PM to 10 PM on Wednesday:

```
agent.uptime: ["17-20 SUN-TUE", "13-22 WED"]
```

## Overriding Worker Schedules Using Runtime Properties

Runtime properties override the command line uptime / downtime properties. The runtime properties are:

:::warning
Use only one of these properties.
:::

* `force:on` - Pull and execute Tasks until the property expires.
* `force:off` - Prevent pulling and execution of Tasks until the property expires.

Currently, these runtime properties can only be set using a ClearML REST API call to the `workers.set_runtime_properties`
endpoint, as follows:

* The body of the request must contain the `worker-id` and the runtime property to add.
* An expiry date is optional. Use the format `"expiry":<time>`. For example, `"expiry":86400` will set an expiry of 24 hours.
* To delete the property, set the expiry date to zero: `"expiry":0`.

For example, to force a worker on for 24 hours:

```bash
curl --user <key>:<secret> --header "Content-Type: application/json" --data '{"worker":"<worker_id>","runtime_properties":[{"key": "force", "value": "on", "expiry": 86400}]}' http://<api-server-hostname-or-ip>:8008/workers.set_runtime_properties
```
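The same request body can also be assembled programmatically; a hedged Python sketch (the `force_worker_payload` helper is illustrative, not part of any ClearML client):

```python
import json

def force_worker_payload(worker_id, state="on", expiry=86400):
    """Build the body for a workers.set_runtime_properties call.

    state is "on" or "off"; expiry is in seconds (0 deletes the property).
    """
    return {
        "worker": worker_id,
        "runtime_properties": [{"key": "force", "value": state, "expiry": expiry}],
    }

body = json.dumps(force_worker_payload("<worker_id>", state="on", expiry=86400))
# POSTing it requires a live API server and credentials, e.g.:
#   import requests
#   requests.post(
#       "http://<api-server-hostname-or-ip>:8008/workers.set_runtime_properties",
#       auth=("<key>", "<secret>"),
#       headers={"Content-Type": "application/json"},
#       data=body,
#   )
```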

## Overriding Worker Schedules Using Queue Tags

Queue tags override both command line and runtime properties. The queue tags are the following:

:::warning
Use only one of these properties.
:::

* `force_workers:on` - Any worker listening to the queue will keep pulling Tasks from the queue.
* `force_workers:off` - Prevent all workers listening to the queue from pulling Tasks from the queue.

Currently, you can set queue tags using a ClearML REST API call to the `queues.update` endpoint, or the
APIClient. The body of the call must contain the `queue-id` and the tags to add.

For example, force workers on for a queue using the APIClient:

```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
client.queues.update(queue="<queue_id>", tags=["force_workers:on"])
```

Or, force workers on for a queue using the REST API:

```bash
curl --user <key>:<secret> --header "Content-Type: application/json" --data '{"queue":"<queue_id>","tags":["force_workers:on"]}' http://<api-server-hostname-or-ip>:8008/queues.update
```
@@ -1,80 +0,0 @@
---
title: Best Practices
---

This section talks about what made us design ClearML the way we did and how it reflects on AI workflows. While ClearML
was designed to fit into any workflow, the practices described below bring many advantages, from organizing one's
workflow to preparing it to scale in the long term.

:::important
The following is only an opinion. ClearML is designed to accommodate any workflow, whether or not it conforms to our way!
:::

## Develop Locally

**Work on a machine that is easily manageable!**

During the early stages of model development, while code is still being modified heavily, this is the usual setup we'd expect to see data scientists use:

- **Local development machine**, usually a laptop (and usually using only CPU) with a fraction of the dataset for faster
  iterations. Use a local machine for writing, training, and debugging pipeline code.
- **Workstation with a GPU**, usually with a limited amount of memory for small batch sizes. Use this workstation to train
  the model and ensure that you choose a model that makes sense, and that the training procedure works. It can also be used to provide initial models for testing.

These setups can be folded into each other, and that's great! If you have a GPU machine for each researcher, that's awesome!
The goal of this phase is to get the code, dataset, and environment set up, so you can start digging to find the best model!

- [ClearML SDK](../../clearml_sdk/clearml_sdk.md) should be integrated into your code (check out [ClearML Setup](../../clearml_sdk/clearml_sdk_setup.md)).
  This helps you visualize the results and track progress.
- [ClearML Agent](../../clearml_agent.md) helps you move your work to other machines without the hassle of rebuilding the environment every time,
  while also providing a simple queue interface that lets you drop your tasks to be executed one by one
  (great for ensuring that the GPUs are churning during the weekend).
- [ClearML Session](../../apps/clearml_session.md) helps you develop on remote machines, the same way you'd develop on your local laptop!

## Train Remotely

In this phase, you scale your training efforts and try to come up with the best code / parameter / data combination that
yields the best performing model for your task!

- The real training (usually) should **not** be executed on your development machine.
- Training sessions should be launched and monitored from a web UI.
- You should be able to continue coding while tasks are being executed, without interrupting them.
- Stop optimizing your code because your machine struggles; run it on a beefier machine (cloud / on-prem) instead.

Visualization and comparison dashboards help you keep your sanity! At this stage you usually have a Docker container with all the binaries
that you need.

- [ClearML SDK](../../clearml_sdk/clearml_sdk.md) ensures that all the metrics, parameters, and models are automatically logged and can later be
  accessed, [compared](../../webapp/webapp_exp_comparing.md), and [tracked](../../webapp/webapp_exp_track_visual.md).
- [ClearML Agent](../../clearml_agent.md) does the heavy lifting. It reproduces the execution environment, clones your code,
  applies code patches, manages parameters (including overriding them on the fly), executes the code, and queues multiple tasks.
  It can even [build](../../clearml_agent/clearml_agent_docker_exec#exporting-a-task-into-a-standalone-docker-container) the Docker container for you!
- [ClearML Pipelines](../../pipelines/pipelines.md) ensure that steps run in the same order,
  programmatically chaining tasks together, while giving an overview of the execution pipeline's status.

**Your entire environment should magically be able to run on any machine, without you working hard.**

## Track EVERYTHING

Track everything--from obscure parameters to weird metrics. It's impossible to know what will end up
improving your results later on!

- Make sure tasks are reproducible! ClearML logs code, parameters, and environment in a single, easily searchable place.
- Development is not linear. Configuration / parameters should not be stored in your git repository, as
  they are temporary and constantly changing. They still need to be logged because, who knows, one day...
- Uncommitted changes to your code should be stored for later forensics, in case that magic number actually saved the day. Not every line change should be committed.
- Mark potentially good tasks, and make them the new baseline for comparison.

## Visibility Matters

While you can track tasks with one tool and pipeline them with another, having
everything under the same roof has its benefits!

Being able to track task progress, compare tasks, and, based on that, send tasks for execution on remote
machines (that also build the environment themselves) has tremendous benefits in terms of visibility and ease of integration.

Having visibility into your pipeline, while using tasks already defined in the platform,
gives users a clearer picture of the pipeline's status
and makes it easier to start using pipelines earlier in the process by simplifying the chaining of tasks.

Managing datasets with the same tools and APIs that manage the tasks also lowers the barrier of entry into
task and data provenance.
docs/getting_started/hpo.md (new file, 34 lines)
@@ -0,0 +1,34 @@
---
title: Hyperparameter Optimization
---

## What is Hyperparameter Optimization?

Hyperparameters are variables that directly control the behavior of training algorithms and have a significant effect on
the performance of the resulting machine learning models. Hyperparameter optimization (HPO) is crucial for improving
model performance and generalization.

Finding the hyperparameter values that yield the best performing models can be complicated. Manually adjusting
hyperparameters over the course of many training trials can be slow and tedious. Luckily, ClearML offers automated
solutions to boost hyperparameter optimization efficiency.

## Workflow



The preceding diagram demonstrates the typical flow of hyperparameter optimization, where the parameters of a base task are optimized:

1. Configure an optimization task with a base task whose parameters will be optimized, optimization targets, and a set of parameter values to test.
1. Clone the base task. Each clone's parameters are overridden with values from the optimization task.
1. Enqueue each clone for execution by a ClearML Agent.
1. The optimization task records and monitors the cloned tasks' configuration and execution details, and returns a
   summary of the optimization results.
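The four steps above can be sketched as a plain-Python loop; this is only an illustration of the clone-override-execute-summarize flow, not the ClearML SDK API (`optimize`, `base_task`, and `objective` are hypothetical names):

```python
import copy

def optimize(base_task, search_space, objective):
    """Clone the base task once per candidate value, 'execute' each clone,
    and return a summary sorted by the objective (higher is better)."""
    results = []
    for name, values in search_space.items():
        for value in values:
            clone = copy.deepcopy(base_task)        # step 2: clone the base task
            clone["parameters"][name] = value       # ...and override the parameter
            score = objective(clone["parameters"])  # step 3: execute the clone
            results.append({name: value, "score": score})  # step 4: record results
    return sorted(results, key=lambda r: r["score"], reverse=True)

base = {"parameters": {"lr": 0.1}}
summary = optimize(base, {"lr": [0.01, 0.1, 1.0]},
                   objective=lambda params: -abs(params["lr"] - 0.1))
```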

## ClearML Solutions

ClearML offers three solutions for hyperparameter optimization:

* [GUI application](../webapp/applications/apps_hpo.md): The Hyperparameter Optimization app lets you run and manage optimization tasks
  directly from the web interface--no code necessary (available under the ClearML Pro plan).
* [Command-Line Interface (CLI)](../apps/clearml_param_search.md): The `clearml-param-search` CLI tool enables you to configure and launch the optimization process from your terminal.
* [Python Interface](../clearml_sdk/hpo_sdk.md): The `HyperParameterOptimizer` class within the ClearML SDK allows you to
  configure and launch optimization tasks, and seamlessly integrate them into your existing model training tasks.
@@ -112,7 +112,7 @@ alert you whenever your model improves in accuracy)
 - Automatically scale cloud instances according to your resource needs with ClearML's
   [AWS Autoscaler](../webapp/applications/apps_aws_autoscaler.md) and [GCP Autoscaler](../webapp/applications/apps_gcp_autoscaler.md)
   GUI applications
-- Run [hyperparameter optimization](../hpo.md)
+- Run [hyperparameter optimization](hpo.md)
 - Build [pipelines](../pipelines/pipelines.md) from code
 - Much more!
@@ -1,40 +0,0 @@
---
title: Best Practices
---

In short - **automate everything** :)
From training models to data processing to deploying to production.

## Development - Preparing for Automation

Basically, track everything. There is nothing that is not worth having visibility into.
If you are afraid of clutter, use the archive option, and set up your own [cleanup service](../../guides/services/cleanup_service.md).

- Track the code base. There is no reason not to add metrics to any process in your workflow, even if it is not directly ML. Visibility is key to iterative improvement of your code / workflow.
- Create per-project [leaderboards](../../guides/ui/building_leader_board.md) based on custom columns
  (hyperparameters and performance accuracy), and bookmark them (the full URL will always reproduce the same view and table).
- Share tasks with your colleagues and team leaders.
  Invite more people to see how your project is progressing, and suggest they add metric reporting for their own processes.
  These metrics can later become part of your own in-house monitoring solution; don't let good data go to waste :)

## Clone Tasks

Define a ClearML Task with one of the following options:

- Run the actual code with the `Task.init()` call. This will create and auto-populate the Task in ClearML (including Git repo / Python packages / command line, etc.).
- Register a local / remote code repository with `clearml-task`. See [details](../../apps/clearml_task.md).

Once you have a Task in ClearML, you can clone it and edit its definitions in the UI, then launch it on one of your nodes with [ClearML Agent](../../clearml_agent.md).
## Advanced Automation

- Create daily / weekly cron jobs for retraining your best performing models.
- Create data monitoring & scheduling, and launch inference jobs to test performance on any newly arriving dataset.
- Once two or more tasks run one after the other, group them together into a [pipeline](../../pipelines/pipelines.md).

## Manage Your Data

Use [ClearML Data](../../clearml_data/clearml_data.md) to version your data, then link it to running tasks for easy reproduction.
Make datasets machine agnostic (i.e. store the original dataset in a shared storage location, e.g. shared folder / S3 / GS / Azure).
ClearML Data supports efficient dataset storage and caching, with differential and compressed storage.

## Scale Your Work

Use [ClearML Agent](../../clearml_agent.md) to scale work. Install the agent on machines (remote or local) and manage
your training workload with it.

Improve team collaboration with transparent resource monitoring; always know what is running where.