Fix top navigation bar highlighting (#1044)

This commit is contained in:
pollfly
2025-02-25 18:54:01 +02:00
committed by GitHub
parent 7bda34ca90
commit 02d24aff06
28 changed files with 64 additions and 51 deletions


@@ -0,0 +1,20 @@
---
title: Building Task Execution Environments in a Container
---
### Base Container
Build a container according to the execution environment of a specific task.
```bash
clearml-agent build --id <task-id> --docker --target <new-docker-name>
```
You can add the container as a task's base container image, using one of the following methods:
- Using the **ClearML Web UI** - See [Default Container](../webapp/webapp_exp_tuning.md#default-container).
- In the ClearML configuration file - Use the [`agent.default_docker`](../configs/clearml_conf.md#agentdefault_docker)
configuration option.
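For example, a `clearml.conf` fragment setting the default container might look like this (a sketch; the image name is a placeholder for the name passed to `--target` above):
```
agent {
    default_docker {
        # Placeholder: the container built with `clearml-agent build --target <new-docker-name>`
        image: "<new-docker-name>"
    }
}
```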
Check out [this tutorial](../guides/clearml_agent/exp_environment_containers.md) for building a Docker container
replicating the execution environment of an existing task.


@@ -0,0 +1,30 @@
---
title: Building Executable Task Containers
---
## Exporting a Task into a Standalone Docker Container
### Task Container
Build a Docker container that, when launched, executes a specific task or a clone (copy) of that task.
- Build a Docker container that at launch will execute a specific Task:
```bash
clearml-agent build --id <task-id> --docker --target <new-docker-name> --entry-point reuse_task
```
- Build a Docker container that at launch will clone a Task specified by Task ID, and will execute the newly cloned Task:
```bash
clearml-agent build --id <task-id> --docker --target <new-docker-name> --entry-point clone_task
```
- Run the built Docker container by executing:
```bash
docker run <new-docker-name>
```
Check out [this tutorial](../guides/clearml_agent/executable_exp_containers.md) for building executable task
containers.


@@ -0,0 +1,121 @@
---
title: Managing Agent Work Schedules
---
:::important Enterprise Feature
This feature is available under the ClearML Enterprise plan.
:::
The Agent scheduler enables scheduling working hours for each Agent. During working hours, a worker will actively poll
queues for Tasks, then fetch and execute them. Outside working hours, a worker will be idle.
Schedule workers by:
* Setting configuration file options
* Running `clearml-agent` from the command line (overrides configuration file options)
Override worker schedules by:
* Setting runtime properties to force a worker on or off
* Tagging a queue on or off
## Running clearml-agent with a Schedule (Command Line)
Set a schedule for a worker from the command line when running `clearml-agent`. Two properties enable setting working hours:
:::warning
Use only one of these properties
:::
* `uptime` - Time span during which a worker will actively poll a queue(s) for Tasks, and execute them. Outside this
time span, the worker will be idle.
* `downtime` - Time span during which a worker will be idle. Outside this time span, the worker will actively poll and
execute Tasks.
Define `uptime` or `downtime` as `"<hours> <days>"`, where:
* `<hours>` - A span of hours (`00-23`) or a single hour. A single hour defines a span from that hour to midnight.
* `<days>` - A span of days (`SUN-SAT`) or a single day.
Use `-` for a span, and `,` to separate individual values. To define a span that crosses midnight, use two spans.
For example:
* `"20-23 SUN"` - 8 PM to 11 PM on Sundays.
* `"20-23 SUN,TUE"` - 8 PM to 11 PM on Sundays and Tuesdays.
* `"20-23 SUN-TUE"` - 8 PM to 11 PM on Sundays, Mondays, and Tuesdays.
* `"20 SUN"` - 8 PM to midnight on Sundays.
* `"20-00,00-08 SUN"` - 8 PM to midnight and midnight to 8 AM on Sundays.
* `"20-00 SUN", "00-08 MON"` - 8 PM on Sundays to 8 AM on Mondays (spans from before midnight to after midnight).
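As an illustration of the span semantics above, the format can be expanded into concrete (day, hour) pairs. This helper is not part of ClearML; it is only a sketch of how the spans are interpreted (the end hour is treated as the exclusive boundary, so `20-23` ends at 11 PM):

```python
# Illustrative helper (not part of ClearML): expand an uptime/downtime
# span such as "20-23 SUN-TUE" into concrete (day, hour) pairs.
DAYS = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]

def expand_span(spec):
    hours_part, days_part = spec.split()
    hours = set()
    for chunk in hours_part.split(","):
        if "-" in chunk:
            start, end = (int(h) for h in chunk.split("-"))
            end = end if end > start else 24  # a span like "20-00" runs to midnight
            hours.update(range(start, end))
        else:
            # a single hour spans from that hour to midnight
            hours.update(range(int(chunk), 24))
    days = set()
    for chunk in days_part.split(","):
        if "-" in chunk:
            start, end = (DAYS.index(d) for d in chunk.split("-"))
            days.update(DAYS[start:end + 1])
        else:
            days.add(chunk)
    return {(d, h) for d in days for h in hours}
```

For instance, `expand_span("20-23 SUN-TUE")` contains `("MON", 21)` but not `("WED", 21)`.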
## Setting Worker Schedules in the Configuration File
Set a schedule for a worker using configuration file options. The options are:
:::warning
Use only one of these properties
:::
* ``agent.uptime``
* ``agent.downtime``
Use the same time span format for days and hours as is used in the command line.
For example, set a worker's schedule from 5 PM to 8 PM on Sunday through Tuesday, and 1 PM to 10 PM on Wednesday.
```
agent.uptime: ["17-20 SUN-TUE", "13-22 WED"]
```
## Overriding Worker Schedules Using Runtime Properties
Runtime properties override the command line uptime / downtime properties. The runtime properties are:
:::warning
Use only one of these properties
:::
* `force:on` - Pull and execute Tasks until the property expires.
* `force:off` - Prevent pulling and execution of Tasks until the property expires.
Currently, these runtime properties can only be set using a ClearML REST API call to the `workers.set_runtime_properties`
endpoint, as follows:
* The body of the request must contain the `worker-id`, and the runtime property to add.
* An expiry date is optional. Use the format `"expiry":<time>`. For example, `"expiry":86400` will set an expiry of 24 hours.
* To delete the property, set the expiry date to zero, `"expiry":0`.
For example, to force a worker on for 24 hours:
```
curl --user <key>:<secret> --header "Content-Type: application/json" --data '{"worker":"<worker_id>","runtime_properties":[{"key": "force", "value": "on", "expiry": 86400}]}' http://<api-server-hostname-or-ip>:8008/workers.set_runtime_properties
```
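The same request body can be assembled in Python using only the standard library (a sketch; the worker ID, key, and secret are placeholders, as in the `curl` example above):

```python
import json

def runtime_property_payload(worker_id, key, value, expiry=None):
    """Build the JSON body for a workers.set_runtime_properties call.
    Pass expiry=0 to delete the property; omit expiry for no expiration."""
    prop = {"key": key, "value": value}
    if expiry is not None:
        prop["expiry"] = expiry
    return json.dumps({"worker": worker_id, "runtime_properties": [prop]})

# Force a worker on for 24 hours (86400 seconds):
body = runtime_property_payload("<worker_id>", "force", "on", expiry=86400)
# Send it with any HTTP client, e.g.:
#   requests.post("http://<api-server-hostname-or-ip>:8008/workers.set_runtime_properties",
#                 auth=("<key>", "<secret>"), data=body,
#                 headers={"Content-Type": "application/json"})
```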
## Overriding Worker Schedules Using Queue Tags
Queue tags override command line and runtime properties. The queue tags are the following:
:::warning
Use only one of these properties
:::
* ``force_workers:on`` - Any worker listening to the queue will keep pulling Tasks from the queue.
* ``force_workers:off`` - Prevent all workers listening to the queue from pulling Tasks from the queue.
Currently, you can set queue tags using a ClearML REST API call to the ``queues.update`` endpoint, or using the
APIClient. The body of the call must contain the ``queue-id`` and the tags to add.
For example, force workers on for a queue using the APIClient:
```python
from clearml.backend_api.session.client import APIClient
client = APIClient()
client.queues.update(queue="<queue_id>", tags=["force_workers:on"])
```
Or, force workers on for a queue using the REST API:
```bash
curl --user <key>:<secret> --header "Content-Type: application/json" --data '{"queue":"<queue_id>","tags":["force_workers:on"]}' http://<api-server-hostname-or-ip>:8008/queues.update
```


@@ -1,80 +0,0 @@
---
title: Best Practices
---
This section discusses the reasoning behind ClearML's design and how it reflects on AI workflows.
While ClearML was designed to fit into any workflow, the practices described below bring many advantages, from organizing your workflow
to preparing it to scale in the long term.
:::important
The following is only an opinion. ClearML is designed to accommodate any workflow whether it conforms to our way or not!
:::
## Develop Locally
**Work on a machine that is easily manageable!**
During early stages of model development, while code is still being modified heavily, this is the usual setup we'd expect to see used by data scientists:
- **Local development machine**, usually a laptop (and usually using only CPU) with a fraction of the dataset for faster
iterations. Use a local machine for writing, training, and debugging pipeline code.
- **Workstation with a GPU**, usually with a limited amount of memory for small batch-sizes. Use this workstation to train
the model and ensure that you choose a model that makes sense, and the training procedure works. Can be used to provide initial models for testing.
These setups can be folded into each other and that's great! If you have a GPU machine for each researcher, that's awesome!
The goal of this phase is to get a code, dataset, and environment set up, so you can start digging to find the best model!
- [ClearML SDK](../../clearml_sdk/clearml_sdk.md) should be integrated into your code (check out [ClearML Setup](../../clearml_sdk/clearml_sdk_setup.md)).
This helps you visualize results and track progress.
- [ClearML Agent](../../clearml_agent.md) helps move your work to other machines without the hassle of rebuilding the environment every time,
while also providing a simple queue interface that lets you drop your tasks to be executed one by one
(great for ensuring that the GPUs are churning during the weekend).
- [ClearML Session](../../apps/clearml_session.md) helps with developing on remote machines, in the same way that you'd develop on your local laptop!
## Train Remotely
In this phase, you scale your training efforts, and try to come up with the best code / parameter / data combination that
yields the best performing model for your task!
- The real training (usually) should **not** be executed on your development machine.
- Training sessions should be launched and monitored from a web UI.
- You should continue coding while tasks are being executed without interrupting them.
- Stop optimizing your code just because your machine struggles; run it on a beefier machine instead (cloud / on-prem).
Visualization and comparison dashboards help you keep your sanity! At this stage you usually have a Docker container with all the binaries
that you need.
- [ClearML SDK](../../clearml_sdk/clearml_sdk.md) ensures that all the metrics, parameters and Models are automatically logged and can later be
accessed, [compared](../../webapp/webapp_exp_comparing.md) and [tracked](../../webapp/webapp_exp_track_visual.md).
- [ClearML Agent](../../clearml_agent.md) does the heavy lifting. It reproduces the execution environment, clones your code,
applies code patches, manages parameters (including overriding them on the fly), executes the code, and queues multiple tasks.
It can even [build](../../clearml_agent/clearml_agent_docker_exec.md#exporting-a-task-into-a-standalone-docker-container) the docker container for you!
- [ClearML Pipelines](../../pipelines/pipelines.md) ensure that steps run in the same order,
programmatically chaining tasks together, while giving an overview of the execution pipeline's status.
**Your entire environment should magically be able to run on any machine, without you working hard.**
## Track EVERYTHING
Track everything--from obscure parameters to weird metrics--it's impossible to know what will end up
improving your results later on!
- Make sure tasks are reproducible! ClearML logs code, parameters, and environment in a single, easily searchable place.
- Development is not linear. Configuration / parameters should not be stored in your git repository, as
they are temporary and constantly changing. They still need to be logged, because who knows, one day...
- Uncommitted changes to your code should be stored for later forensics, in case that magic number actually saved the day. Not every line change should be committed.
- Mark potentially good tasks, make them the new baseline for comparison.
## Visibility Matters
While you can track tasks with one tool, and pipeline them with another, having
everything under the same roof has its benefits!
Being able to track task progress and compare tasks, and, based on that, send tasks to execution on remote
machines (that also build the environment themselves) has tremendous benefits in terms of visibility and ease of integration.
Being able to have visibility in your pipeline, while using tasks already defined in the platform,
enables users to have a clearer picture of the pipeline's status
and makes it easier to start using pipelines earlier in the process by simplifying chaining tasks.
Managing datasets with the same tools and APIs that manage the tasks also lowers the barrier of entry into
task and data provenance.


@@ -0,0 +1,34 @@
---
title: Hyperparameter Optimization
---
## What is Hyperparameter Optimization?
Hyperparameters are variables that directly control the behaviors of training algorithms, and have a significant effect on
the performance of the resulting machine learning models. Hyperparameter optimization (HPO) is crucial for improving
model performance and generalization.
Finding the hyperparameter values that yield the best performing models can be complicated. Manually adjusting
hyperparameters over the course of many training trials can be slow and tedious. Luckily, ClearML offers automated
solutions to boost hyperparameter optimization efficiency.
## Workflow
![Hyperparameter optimization diagram](../img/hpo_diagram.png)
The preceding diagram demonstrates the typical flow of hyperparameter optimization where the parameters of a base task are optimized:
1. Configure an Optimization Task with a base task whose parameters will be optimized, optimization targets, and a set of parameter values to
test
1. Clone the base task. Each clone's parameters are overridden with values from the optimization task
1. Enqueue each clone for execution by a ClearML Agent
1. The Optimization Task records and monitors the cloned tasks' configuration and execution details, and returns a
summary of the optimization results.
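Conceptually, the clone-and-override step above enumerates parameter combinations; a minimal stand-in (plain Python with a hypothetical search space, no ClearML calls) might look like:

```python
from itertools import product

# Hypothetical search space for the base task's hyperparameters
search_space = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [32, 64],
}

# Each combination corresponds to one clone of the base task, whose
# parameters are overridden before the clone is enqueued for an agent.
keys = list(search_space)
overrides = [dict(zip(keys, values)) for values in product(*search_space.values())]
print(len(overrides))  # 6 clones: 3 learning rates x 2 batch sizes
```

ClearML's optimizers automate this enumeration (and smarter strategies) for you.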
## ClearML Solutions
ClearML offers three solutions for hyperparameter optimization:
* [GUI application](../webapp/applications/apps_hpo.md): The Hyperparameter Optimization app allows you to run and manage the optimization tasks
directly from the web interface--no code necessary (available under the ClearML Pro plan).
* [Command-Line Interface (CLI)](../apps/clearml_param_search.md): The `clearml-param-search` CLI tool enables you to configure and launch the optimization process from your terminal.
* [Python Interface](../clearml_sdk/hpo_sdk.md): The `HyperParameterOptimizer` class within the ClearML SDK allows you to
configure and launch optimization tasks, and seamlessly integrate them in your existing model training tasks.


@@ -112,7 +112,7 @@ alert you whenever your model improves in accuracy)
- Automatically scale cloud instances according to your resource needs with ClearML's
[AWS Autoscaler](../webapp/applications/apps_aws_autoscaler.md) and [GCP Autoscaler](../webapp/applications/apps_gcp_autoscaler.md)
GUI applications
- Run [hyperparameter optimization](../hpo.md)
- Run [hyperparameter optimization](hpo.md)
- Build [pipelines](../pipelines/pipelines.md) from code
- Much more!


@@ -1,40 +0,0 @@
---
title: Best Practices
---
In short - **automate everything** :)
From training models to data processing to deploying to production.
## Development - Preparing for Automation
Basically, track everything. There is nothing that is not worth having visibility into.
If you are afraid of clutter, use the archive option, and set up your own [cleanup service](../../guides/services/cleanup_service.md).
- Track the code base. There is no reason not to add metrics to any process in your workflow, even if it is not directly ML. Visibility is key to iterative improvement of your code / workflow.
- Create per-project [leaderboards](../../guides/ui/building_leader_board.md) based on custom columns
(hyperparameters and performance accuracy), and bookmark them (full URL will always reproduce the same view and table).
- Share tasks with your colleagues and team leaders.
Invite more people to see how your project is progressing, and suggest they add metric reporting of their own.
These metrics can later become part of your own in-house monitoring solution--don't let good data go to waste :)
## Clone Tasks
Define a ClearML Task with one of the following options:
- Run the actual code with the `Task.init()` call. This will create and auto-populate the Task in ClearML (including Git repo / Python packages / command line etc.).
- Register local / remote code repository with `clearml-task`. See [details](../../apps/clearml_task.md).
Once you have a Task in ClearML, you can clone and edit its definitions in the UI, then launch it on one of your nodes with [ClearML Agent](../../clearml_agent.md).
## Advanced Automation
- Create daily / weekly cron jobs for retraining your best performing models.
- Set up data monitoring and scheduling, and launch inference jobs to test performance on any newly arriving dataset.
- Once two or more tasks run one after another, group them together into a [pipeline](../../pipelines/pipelines.md).
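For example, a weekly retraining job could be driven by a plain cron entry (a sketch; the `clearml-task` arguments, project name, and script path are placeholders):
```
# m h dom mon dow  command
# Every Monday at 03:00, enqueue a retraining task via clearml-task
0 3 * * 1  clearml-task --project "<project>" --name weekly-retrain --script train.py --queue default
```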
## Manage Your Data
Use [ClearML Data](../../clearml_data/clearml_data.md) to version your data, then link it to running tasks for easy reproduction.
Make datasets machine agnostic (i.e. store the original dataset in a shared storage location, e.g. shared folder / S3 / GS / Azure).
ClearML Data supports efficient dataset storage and caching, with differential (delta-based) and compressed storage.
## Scale Your Work
Use [ClearML Agent](../../clearml_agent.md) to scale your work. Install the agent on machines (remote or local) and manage your
training workload with it.
Improve team collaboration through transparent resource monitoring--always know what is running where.