---
title: Best Practices
---

This section explains the reasoning behind ClearML's design and how that design maps onto AI workflows.

While ClearML was designed to fit into any workflow, the practices described below bring many advantages, from organizing your workflow
to preparing it to scale in the long term.

:::important
The following is only an opinion. ClearML is designed to accommodate any workflow, whether it conforms to our way or not!
:::
## Develop Locally

**Work on a machine that is easily manageable!**

During the early stages of model development, while code is still being modified heavily, this is the usual setup we'd expect data scientists to use:

- **Local development machine**, usually a laptop (and usually using only CPU) with a fraction of the dataset for faster
  iterations. Use a local machine for writing, training, and debugging pipeline code.
- **Workstation with a GPU**, usually with a limited amount of memory for small batch sizes. Use this workstation to train
  the model and to verify that both the model you chose and the training procedure make sense. It can also be used to provide initial models for testing.

These setups might be folded into each other, and that's great! If every researcher has their own GPU machine, that's awesome!

The goal of this phase is to get your code, dataset, and environment set up, so you can start digging to find the best model!

- [ClearML SDK](../../clearml_sdk/clearml_sdk.md) should be integrated into your code (check out [Getting Started](ds_first_steps.md)).
  This helps you visualize results and track progress.
- [ClearML Agent](../../clearml_agent.md) helps you move your work to other machines without the hassle of rebuilding the environment every time,
  while also providing a simple queue interface that lets you line up experiments to be executed one by one
  (great for ensuring that the GPUs are churning through the weekend).
- [ClearML Session](../../apps/clearml_session.md) helps you develop on remote machines, just as you would on your local laptop!
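
Integrating the SDK is usually a two-line change at the top of your script. A minimal sketch (the project and task names below are placeholders, and this assumes a configured ClearML server or the hosted service):

```python
from clearml import Task

# One call at the start of the script enables automatic logging of metrics,
# parameters, git info, uncommitted changes, and the Python environment.
# "examples" and "my experiment" are placeholder names.
task = Task.init(project_name="examples", task_name="my experiment")
```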

## Train Remotely

In this phase, you scale your training efforts and try to find the code / parameter / data combination that
yields the best-performing model for your task!

- The real training (usually) should **not** be executed on your development machine.
- Training sessions should be launched and monitored from a web UI.
- You should be able to continue coding while experiments are being executed, without interrupting them.
- Stop optimizing your code because your machine struggles; run it on a beefier machine (cloud / on-prem).

Visualization and comparison dashboards help you keep your sanity! At this stage you usually have a docker container with all the binaries
that you need.

- [ClearML SDK](../../clearml_sdk/clearml_sdk.md) ensures that all the metrics, parameters, and models are automatically logged and can later be
  accessed, [compared](../../webapp/webapp_exp_comparing.md), and [tracked](../../webapp/webapp_exp_track_visual.md).
- [ClearML Agent](../../clearml_agent.md) does the heavy lifting. It reproduces the execution environment, clones your code,
  applies code patches, manages parameters (including overriding them on the fly), executes the code, and queues multiple tasks.
  It can even [build](../../clearml_agent/clearml_agent_docker.md#exporting-a-task-into-a-standalone-docker-container) the docker container for you!
- [ClearML Pipelines](../../pipelines/pipelines.md) ensure that steps run in the same order,
  programmatically chaining tasks together, while giving an overview of the execution pipeline's status.
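
A common agent pattern is to start a run locally and hand it off to a queue. A minimal sketch (the `default` queue name is an assumption; use a queue you have actually created):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote training")

# Stop the local process here and enqueue this task for an agent to pick up;
# the agent reproduces the environment, clones the code, and runs it remotely.
task.execute_remotely(queue_name="default")

# Anything below this line executes on the remote machine.
```

On the training machine, a worker is launched with `clearml-agent daemon --queue default` (add `--docker` to execute inside a container).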

**Your entire environment should magically be able to run on any machine, without you working hard.**

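
The pipeline chaining described above can be sketched with `PipelineController`, which reuses tasks already logged in the platform (all project, task, and queue names below are placeholders):

```python
from clearml import PipelineController

# Chain two existing tasks into a pipeline; each step clones its base task.
pipe = PipelineController(name="my pipeline", project="examples", version="1.0.0")
pipe.add_step(name="prepare", base_task_project="examples", base_task_name="prepare data")
pipe.add_step(name="train", parents=["prepare"],
              base_task_project="examples", base_task_name="remote training")

# Launch the pipeline logic on a queue; steps are dispatched as they become ready.
pipe.start(queue="default")
```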
## Track EVERYTHING

Track everything! From obscure parameters to weird metrics, it's impossible to know what will end up
improving your results later on!

- Make sure experiments are reproducible! ClearML logs code, parameters, and environment in a single, easily searchable place.
- Development is not linear. Configuration / parameters should not be stored in your git repository, as
  they are temporary and constantly changing. They still need to be logged, because who knows, one day...
- Uncommitted changes to your code should be stored for later forensics, in case that magic number actually saved the day. Not every line change should be committed.
- Mark potentially good experiments and make them the new baseline for comparison.
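
Keeping parameters out of git while still logging them can be as simple as connecting a plain dictionary; a minimal sketch (names and values are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="track everything")

# task.connect() logs the defaults below; when the task is cloned and edited
# in the UI, the edited values override these defaults at runtime.
params = {"learning_rate": 0.001, "batch_size": 32, "magic_number": 42}
params = task.connect(params)

# Uncommitted code changes are captured automatically at Task.init() time,
# so that "magic number" survives even if it was never committed.
```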
## Visibility Matters

While you can track experiments with one tool and pipeline them with another, having
everything under the same roof has its benefits!

Being able to track experiment progress, compare experiments, and, based on that, send experiments for execution on remote
machines (which also build the environment themselves) has tremendous benefits in terms of visibility and ease of integration.

Having visibility into your pipeline, while reusing experiments already defined in the platform,
gives users a clearer picture of the pipeline's status
and makes it easier to start using pipelines earlier in the process by simplifying the chaining of tasks.
Managing datasets with the same tools and APIs that manage the experiments also lowers the barrier of entry into
experiment and data provenance.
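
Those same APIs cover data versioning; a minimal sketch of creating and consuming a versioned dataset (project, dataset names, and paths are placeholders):

```python
from clearml import Dataset

# Create a new dataset version, add local files, and upload it.
dataset = Dataset.create(dataset_project="examples", dataset_name="my dataset")
dataset.add_files(path="data/")
dataset.upload()
dataset.finalize()

# Elsewhere (e.g., inside a training task), fetch a cached local copy.
local_path = Dataset.get(
    dataset_project="examples", dataset_name="my dataset"
).get_local_copy()
```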