diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index cb05db93..8f8f2bf7 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -28,4 +28,4 @@ jobs: # Runs a single command using the runners shell - name: Run a one-line script run: | - grep -r -Eoh "(https?://github.com/[a-zA-Z0-9./?=_%:-]*)" $GITHUB_WORKSPACE | sort -u | grep -v "://github.com/allegroai/clearml-docs" | xargs -n 1 sh -c 'curl --output /dev/null --silent --head --fail $0 || curl --output /dev/null --silent --head --fail --write-out "%{url_effective}: %{http_code}\n" $0' + grep -r -Eoh "(https?://github.com/[a-zA-Z0-9./?=_%:-]*)" $GITHUB_WORKSPACE | sort -u | grep -v "://github.com/clearml/clearml-docs" | xargs -n 1 sh -c 'curl --output /dev/null --silent --head --fail $0 || curl --output /dev/null --silent --head --fail --write-out "%{url_effective}: %{http_code}\n" $0' diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index 878dfa8a..843730c9 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -25,7 +25,7 @@ jobs: env: INCOMING_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }} with: - text: Link Checker failure in github.com/allegroai/clearl-docs + text: Link Checker failure in github.com/clearml/clearml-docs blocks: | [ {"type": "section", "text": {"type": "mrkdwn", "text": "Testing!"}} diff --git a/docs/apps/clearml_param_search.md b/docs/apps/clearml_param_search.md index b15aa9de..094fb443 100644 --- a/docs/apps/clearml_param_search.md +++ b/docs/apps/clearml_param_search.md @@ -34,7 +34,7 @@ of the optimization results in table and graph forms. |`--objective-metric-sign`| Optimization target, whether to maximize or minimize the value of the objective metric specified. Possible values: "min", "max", "min_global", "max_global". For more information, see [Optimization Objective](#optimization-objective). |Yes| |`--objective-metric-title`| Objective metric title to maximize/minimize (e.g.
'validation').|Yes| |`--optimization-time-limit`|The maximum time (minutes) for the optimization to run. The default is `None`, indicating no time limit.|No| -|`--optimizer-class`|The optimizer to use. Possible values are: OptimizerOptuna (default), OptimizerBOHB, GridSearch, RandomSearch. For more information, see [Supported Optimizers](../fundamentals/hpo.md#supported-optimizers). |Yes| +|`--optimizer-class`|The optimizer to use. Possible values are: OptimizerOptuna (default), OptimizerBOHB, GridSearch, RandomSearch. For more information, see [Supported Optimizers](../clearml_sdk/hpo_sdk.md#supported-optimizers). |Yes| |`--params-search`|Parameters space for optimization. See more information in [Specifying the Parameter Space](#specifying-the-parameter-space). |Yes| |`--params-override`|Additional parameters of the base task to override for this parameter search. Use the following JSON format for each parameter: `{"name": "param_name", "value": }`. Windows users, see [JSON format note](#json_note).|No| |`--pool-period-min`|The time between two consecutive polls (minutes).|No| diff --git a/docs/apps/clearml_session.md b/docs/apps/clearml_session.md index 91f7d276..fd5de9c3 100644 --- a/docs/apps/clearml_session.md +++ b/docs/apps/clearml_session.md @@ -23,7 +23,7 @@ VS Code remote sessions use ports 8878 and 8898 respectively. ## Prerequisites -* `clearml` installed and configured. See [Getting Started](../getting_started/ds/ds_first_steps.md) for details. +* `clearml` installed and configured. See [ClearML Setup](../clearml_sdk/clearml_sdk_setup.md) for details. * At least one `clearml-agent` running on a remote host. See [installation](../clearml_agent/clearml_agent_setup.md#installation) for details. * An SSH client installed on your machine. To verify, open your terminal and execute `ssh`. If you did not receive an error, you are good to go. @@ -56,7 +56,7 @@ error, you are good to go. 1.
The session Task is enqueued in the selected queue, and a ClearML Agent pulls and executes it. The agent downloads the appropriate IDE(s) and launches it. -1. Once the agent finishes the initial setup of the interactive Task, the local `cleaml-session` connects to the host +1. Once the agent finishes the initial setup of the interactive Task, the local `clearml-session` connects to the host machine via SSH, and tunnels both SSH and IDE over the SSH connection. If a container is specified, the IDE environment runs inside of it. @@ -142,7 +142,7 @@ sessions: maxServices: 20 ``` -For more information, see [Kubernetes](../clearml_agent/clearml_agent_deployment.md#kubernetes). +For more information, see [Kubernetes](../clearml_agent/clearml_agent_deployment_k8s.md). ### Installing Requirements diff --git a/docs/apps/clearml_task.md b/docs/apps/clearml_task.md index f40bf3cd..00fb0379 100644 --- a/docs/apps/clearml_task.md +++ b/docs/apps/clearml_task.md @@ -11,7 +11,7 @@ line arguments, Python module dependencies, and a requirements.txt file! ## What Is ClearML Task For? * Launching off-the-shelf code on a remote machine with dedicated resources (e.g. 
GPU) -* Running [hyperparameter optimization](../fundamentals/hpo.md) on a codebase that is still not in ClearML +* Running [hyperparameter optimization](../getting_started/hpo.md) on a codebase that is still not in ClearML * Creating a pipeline from an assortment of scripts, that you need to turn into ClearML tasks * Running some code on a remote machine, either using an on-prem cluster or on the cloud diff --git a/docs/clearml_data/best_practices.md b/docs/best_practices/data_best_practices.md similarity index 89% rename from docs/clearml_data/best_practices.md rename to docs/best_practices/data_best_practices.md index aaf32892..04d6950c 100644 --- a/docs/clearml_data/best_practices.md +++ b/docs/best_practices/data_best_practices.md @@ -9,7 +9,8 @@ See [Hyper-Datasets](../hyperdatasets/overview.md) for ClearML's advanced querya The following are some recommendations for using ClearML Data. -![Dataset UI gif](../img/gif/dataset.gif) +![Dataset UI gif](../img/gif/dataset.gif#light-mode-only) +![Dataset UI gif](../img/gif/dataset_dark.gif#dark-mode-only) ## Versioning Datasets @@ -25,7 +26,7 @@ version contents ready to be updated. Organize the datasets according to use-cases and use tags. This makes managing multiple datasets and accessing the most updated datasets for different use-cases easier. -Like any ClearML tasks, datasets can be organized into [projects (and subprojects)](../fundamentals/projects.md#creating-subprojects). +Like any ClearML tasks, datasets can be organized into [projects (and subprojects)](../fundamentals/projects.md#creating-projects-and-subprojects). Additionally, when creating a dataset, tags can be applied to the dataset, which will make searching for the dataset easier. Organizing your datasets into projects by use-case makes it easier to access the most recent dataset version for that use-case. 
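Organizing datasets into use-case projects with tags, as recommended above, makes "give me the newest dataset for this use-case" a simple tag filter plus a version comparison. A plain-Python sketch of that lookup (the records and tag names below are hypothetical — ClearML's SDK resolves this server-side, this only illustrates the idea):

```python
# Toy illustration: pick the newest dataset version carrying a use-case tag.
# Records and tags are made up for the example; they are not ClearML's API.
datasets = [
    {"name": "images", "version": (1, 0, 0), "tags": ["detection"]},
    {"name": "images", "version": (1, 2, 0), "tags": ["detection", "validated"]},
    {"name": "images", "version": (1, 1, 0), "tags": ["segmentation"]},
]

def latest_for_use_case(records, tag):
    # Filter by use-case tag, then take the highest version tuple.
    tagged = [r for r in records if tag in r["tags"]]
    return max(tagged, key=lambda r: r["version"]) if tagged else None

newest = latest_for_use_case(datasets, "detection")
print(newest["version"])  # (1, 2, 0)
```

Without consistent tagging, the same lookup would require inspecting every dataset's contents — which is exactly the friction the recommendation avoids.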
@@ -55,5 +56,5 @@ serves as a dataset's single point of truth, you can schedule a script which use will update the dataset based on the modifications made to the folder. This way, there is no need to manually modify a dataset. This functionality will also track the modifications made to a folder. -See the sync function with the [CLI](clearml_data_cli.md#sync) or [SDK](clearml_data_sdk.md#syncing-local-storage) +See the sync function with the [CLI](../clearml_data/clearml_data_cli.md#sync) or [SDK](../clearml_data/clearml_data_sdk.md#syncing-local-storage) interface. diff --git a/docs/getting_started/ds/best_practices.md b/docs/best_practices/data_scientist_best_practices.md similarity index 79% rename from docs/getting_started/ds/best_practices.md rename to docs/best_practices/data_scientist_best_practices.md index 1952e3c4..f092791c 100644 --- a/docs/getting_started/ds/best_practices.md +++ b/docs/best_practices/data_scientist_best_practices.md @@ -24,12 +24,12 @@ During early stages of model development, while code is still being modified hea These setups can be folded into each other and that's great! If you have a GPU machine for each researcher, that's awesome! The goal of this phase is to get a code, dataset, and environment set up, so you can start digging to find the best model! -- [ClearML SDK](../../clearml_sdk/clearml_sdk.md) should be integrated into your code (check out [Getting Started](ds_first_steps.md)). +- [ClearML SDK](../clearml_sdk/clearml_sdk.md) should be integrated into your code (check out [ClearML Setup](../clearml_sdk/clearml_sdk_setup.md)). This helps visualizing the results and tracking progress. 
-- [ClearML Agent](../../clearml_agent.md) helps moving your work to other machines without the hassle of rebuilding the environment every time, +- [ClearML Agent](../clearml_agent.md) helps moving your work to other machines without the hassle of rebuilding the environment every time, while also creating an easy queue interface that easily lets you drop your tasks to be executed one by one (great for ensuring that the GPUs are churning during the weekend). -- [ClearML Session](../../apps/clearml_session.md) helps with developing on remote machines, in the same way that you'd develop on your local laptop! +- [ClearML Session](../apps/clearml_session.md) helps with developing on remote machines, in the same way that you'd develop on your local laptop! ## Train Remotely @@ -43,12 +43,12 @@ yields the best performing model for your task! Visualization and comparison dashboards keep your sanity at bay! At this stage you usually have a docker container with all the binaries that you need. -- [ClearML SDK](../../clearml_sdk/clearml_sdk.md) ensures that all the metrics, parameters and Models are automatically logged and can later be - accessed, [compared](../../webapp/webapp_exp_comparing.md) and [tracked](../../webapp/webapp_exp_track_visual.md). -- [ClearML Agent](../../clearml_agent.md) does the heavy lifting. It reproduces the execution environment, clones your code, +- [ClearML SDK](../clearml_sdk/clearml_sdk.md) ensures that all the metrics, parameters and Models are automatically logged and can later be + accessed, [compared](../webapp/webapp_exp_comparing.md) and [tracked](../webapp/webapp_exp_track_visual.md). +- [ClearML Agent](../clearml_agent.md) does the heavy lifting. It reproduces the execution environment, clones your code, applies code patches, manages parameters (including overriding them on the fly), executes the code, and queues multiple tasks. 
- It can even [build](../../clearml_agent/clearml_agent_docker.md#exporting-a-task-into-a-standalone-docker-container) the docker container for you! -- [ClearML Pipelines](../../pipelines/pipelines.md) ensure that steps run in the same order, + It can even [build](../../clearml_agent/clearml_agent_docker_exec#exporting-a-task-into-a-standalone-docker-container) the docker container for you! +- [ClearML Pipelines](../pipelines/pipelines.md) ensure that steps run in the same order, programmatically chaining tasks together, while giving an overview of the execution pipeline's status. **Your entire environment should magically be able to run on any machine, without you working hard.** diff --git a/docs/getting_started/mlops/mlops_best_practices.md b/docs/best_practices/mlops_best_practices.md similarity index 74% rename from docs/getting_started/mlops/mlops_best_practices.md rename to docs/best_practices/mlops_best_practices.md index 0ec64a66..71c9d7d7 100644 --- a/docs/getting_started/mlops/mlops_best_practices.md +++ b/docs/best_practices/mlops_best_practices.md @@ -7,10 +7,10 @@ From training models to data processing to deploying to production. ## Development - Preparing for Automation Basically, track everything. There is nothing that is not worth having visibility to. -If you are afraid of clutter, use the archive option, and set up your own [cleanup service](../../guides/services/cleanup_service.md). +If you are afraid of clutter, use the archive option, and set up your own [cleanup service](../guides/services/cleanup_service.md). - Track the code base. There is no reason not to add metrics to any process in your workflow, even if it is not directly ML. Visibility is key to iterative improvement of your code / workflow. 
-- Create per-project [leaderboards](../../guides/ui/building_leader_board.md) based on custom columns +- Create per-project [leaderboards](../guides/ui/building_leader_board.md) based on custom columns (hyperparameters and performance accuracy), and bookmark them (full URL will always reproduce the same view and table). - Share tasks with your colleagues and team-leaders. Invite more people to see how your project is progressing, and suggest they add metric reporting for their own. @@ -18,23 +18,23 @@ If you are afraid of clutter, use the archive option, and set up your own [clean ## Clone Tasks Define a ClearML Task with one of the following options: -- Run the actual code with the `Task.init()` call. This will create and auto-populate the Task in CleaML (including Git Repo / Python Packages / Command line etc.). -- Register local / remote code repository with `clearml-task`. See [details](../../apps/clearml_task.md). +- Run the actual code with the `Task.init()` call. This will create and auto-populate the Task in ClearML (including Git Repo / Python Packages / Command line etc.). +- Register local / remote code repository with `clearml-task`. See [details](../apps/clearml_task.md). -Once you have a Task in ClearML, you can clone and edit its definitions in the UI, then launch it on one of your nodes with [ClearML Agent](../../clearml_agent.md). +Once you have a Task in ClearML, you can clone and edit its definitions in the UI, then launch it on one of your nodes with [ClearML Agent](../clearml_agent.md). ## Advanced Automation - Create daily / weekly cron jobs for retraining best performing models on. - Create data monitoring & scheduling and launch inference jobs to test performance on any new coming dataset. -- Once there are two or more tasks that run after another, group them together into a [pipeline](../../pipelines/pipelines.md). +- Once there are two or more tasks that run after another, group them together into a [pipeline](../pipelines/pipelines.md). 
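Grouping tasks that run one after another into a pipeline is, at its core, a dependency-ordering guarantee. A minimal stdlib sketch of that guarantee (step names are hypothetical; in practice ClearML's pipeline controller handles this, plus cloning and enqueuing each step):

```python
from collections import deque

def run_in_order(steps, deps):
    # Kahn's algorithm: a step runs only after all of its parent steps have run.
    indegree = {s: 0 for s in steps}
    children = {s: [] for s in steps}
    for step, parents in deps.items():
        for parent in parents:
            indegree[step] += 1
            children[parent].append(step)
    ready = deque(s for s in steps if indegree[s] == 0)
    order = []
    while ready:
        step = ready.popleft()
        order.append(step)  # a real pipeline would enqueue the task here
        for child in children[step]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order

# Hypothetical three-step pipeline: ingest -> train -> evaluate.
order = run_in_order(
    ["ingest", "train", "evaluate"],
    {"train": ["ingest"], "evaluate": ["train"]},
)
print(order)  # ['ingest', 'train', 'evaluate']
```

The same chaining generalizes to branching DAGs: any step whose parents have all finished becomes eligible, which is what keeps a retraining pipeline's steps from racing each other.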
## Manage Your Data -Use [ClearML Data](../../clearml_data/clearml_data.md) to version your data, then link it to running tasks for easy reproduction. +Use [ClearML Data](../clearml_data/clearml_data.md) to version your data, then link it to running tasks for easy reproduction. Make datasets machine agnostic (i.e. store original dataset in a shared storage location, e.g. shared-folder / S3 / Gs / Azure). ClearML Data supports efficient Dataset storage and caching, differentiable and compressed. ## Scale Your Work -Use [ClearML Agent](../../clearml_agent.md) to scale work. Install the agent machines (remote or local) and manage +Use [ClearML Agent](../clearml_agent.md) to scale work. Install the agent machines (remote or local) and manage training workload with it. Improve team collaboration by transparent resource monitoring, always know what is running where. diff --git a/docs/build_interactive_models.md b/docs/build_interactive_models.md new file mode 100644 index 00000000..eda59ab0 --- /dev/null +++ b/docs/build_interactive_models.md @@ -0,0 +1,18 @@ +--- +title: Build Interactive Model Demos +--- + +:::info Enterprise Feature +The Gradio Launcher and Streamlit Apps are available under the ClearML Enterprise plan. +::: + +ClearML supports quickly creating web-based interfaces for AI models, making it easier to +test, demo, and iterate on new capabilities. With ClearML's built-in orchestration, you can effortlessly launch, manage, +and optimize AI-powered applications to accelerate their way to production. 
+ +ClearML provides the following applications for building an interactive model interface: +* [Gradio Launcher](webapp/applications/apps_gradio.md) +* [Streamlit Launcher](webapp/applications/apps_streamlit.md) + +![Gradio Dashboard](img/apps_gradio.png#light-mode-only) +![Gradio Dashboard](img/apps_gradio_dark.png#dark-mode-only) \ No newline at end of file diff --git a/docs/clearml_agent.md b/docs/clearml_agent.md index d6be238c..0750c4b8 100644 --- a/docs/clearml_agent.md +++ b/docs/clearml_agent.md @@ -60,9 +60,9 @@ original values: * Code-level configuration instrumented with [`Task.connect()`](references/sdk/task.md#connect) will be overridden by modified hyperparameters ClearML Agent can be deployed in various setups to suit different workflows and infrastructure needs: -* [Bare Metal](clearml_agent/clearml_agent_deployment.md#spinning-up-an-agent) -* [Kubernetes](clearml_agent/clearml_agent_deployment.md#kubernetes) -* [Slurm](clearml_agent/clearml_agent_deployment.md#slurm) +* [Bare Metal](clearml_agent/clearml_agent_deployment_bare_metal.md#spinning-up-an-agent) +* [Kubernetes](clearml_agent/clearml_agent_deployment_k8s.md) +* [Slurm](clearml_agent/clearml_agent_deployment_slurm.md) * [Google Colab](guides/ide/google_colab.md) ## References diff --git a/docs/clearml_agent/clearml_agent_deployment.md b/docs/clearml_agent/clearml_agent_deployment.md deleted file mode 100644 index c23ca2e1..00000000 --- a/docs/clearml_agent/clearml_agent_deployment.md +++ /dev/null @@ -1,292 +0,0 @@ ---- -title: Deployment ---- - -## Spinning Up an Agent -You can spin up an agent on any machine: on-prem and/or cloud instance. When spinning up an agent, you assign it to -service a queue(s). Utilize the machine by enqueuing tasks to the queue that the agent is servicing, and the agent will -pull and execute the tasks. - -:::tip cross-platform execution -ClearML Agent is platform-agnostic. 
When using the ClearML Agent to execute tasks cross-platform, set platform -specific environment variables before launching the agent. - -For example, to run an agent on an ARM device, set the core type environment variable before spinning up the agent: - -```bash -export OPENBLAS_CORETYPE=ARMV8 -clearml-agent daemon --queue -``` -::: - -### Executing an Agent -To execute an agent, listening to a queue, run: - -```bash -clearml-agent daemon --queue -``` - -### Executing in Background -To execute an agent in the background, run: -```bash -clearml-agent daemon --queue --detached -``` -### Stopping Agents -To stop an agent running in the background, run: -```bash -clearml-agent daemon --stop -``` - -### Allocating Resources -To specify GPUs associated with the agent, add the `--gpus` flag. - -:::info Docker Mode -Make sure to include the `--docker` flag, as GPU management through the agent is only supported in [Docker Mode](clearml_agent_execution_env.md#docker-mode). -::: - -To execute multiple agents on the same machine (usually assigning GPU for the different agents), run: -```bash -clearml-agent daemon --gpus 0 --queue default --docker -clearml-agent daemon --gpus 1 --queue default --docker -``` -To allocate more than one GPU, provide a list of allocated GPUs -```bash -clearml-agent daemon --gpus 0,1 --queue dual_gpu --docker -``` - -### Queue Prioritization -A single agent can listen to multiple queues. The priority is set by their order. - -```bash -clearml-agent daemon --queue high_q low_q -``` -This ensures the agent first tries to pull a Task from the `high_q` queue, and only if it is empty, the agent will try to pull -from the `low_q` queue. - -To make sure an agent pulls from all queues equally, add the `--order-fairness` flag. -```bash -clearml-agent daemon --queue group_a group_b --order-fairness -``` -It will make sure the agent will pull from the `group_a` queue, then from `group_b`, then back to `group_a`, etc. 
This ensures -that `group_a` or `group_b` will not be able to starve one another of resources. - -### SSH Access -By default, ClearML Agent maps the host's `~/.ssh` into the container's `/root/.ssh` directory (configurable, -see [clearml.conf](../configs/clearml_conf.md#docker_internal_mounts)). - -If you want to use existing auth sockets with ssh-agent, you can verify your host ssh-agent is working correctly with: - -```commandline -echo $SSH_AUTH_SOCK -``` - -You should see a path to a temporary file, something like this: - -```console -/tmp/ssh-/agent. -``` - -Then run your `clearml-agent` in Docker mode, which will automatically detect the `SSH_AUTH_SOCK` environment variable, -and mount the socket into any container it spins. - -You can also explicitly set the `SSH_AUTH_SOCK` environment variable when executing an agent. The command below will -execute an agent in Docker mode and assign it to service a queue. The agent will have access to -the SSH socket provided in the environment variable. - -``` -SSH_AUTH_SOCK= clearml-agent daemon --gpus --queue --docker -``` - -## Kubernetes - -Agents can be deployed bare-metal or as Docker containers in a Kubernetes cluster. ClearML Agent adds missing scheduling capabilities to Kubernetes, enabling more flexible automation from code while leveraging all of ClearML Agent's features. - -ClearML Agent is deployed onto a Kubernetes cluster using **Kubernetes-Glue**, which maps ClearML jobs directly to Kubernetes jobs. This allows seamless task execution and resource allocation across your cluster. - -### Deployment Options -You can deploy ClearML Agent onto Kubernetes using one of the following methods: - -1. **ClearML Agent Helm Chart (Recommended)**: - Use the [ClearML Agent Helm Chart](https://github.com/clearml/clearml-helm-charts/tree/main/charts/clearml-agent) to spin up an agent pod acting as a controller. This is the recommended and scalable approach. - -2. 
**K8s Glue Script**: - Run a [K8s Glue script](https://github.com/clearml/clearml-agent/blob/master/examples/k8s_glue_example.py) on a Kubernetes CPU node. This approach is less scalable and typically suited for simpler use cases. - -### How It Works -The ClearML Kubernetes-Glue performs the following: -- Pulls jobs from the ClearML execution queue. -- Prepares a Kubernetes job based on a provided YAML template. -- Inside each job pod, the `clearml-agent`: - - Installs the required environment for the task. - - Executes and monitors the task process. - -:::important Enterprise Features -ClearML Enterprise adds advanced Kubernetes features: -- **Multi-Queue Support**: Service multiple ClearML queues within the same Kubernetes cluster. -- **Pod-Specific Templates**: Define resource configurations per queue using pod templates. - -For example, you can configure resources for different queues as shown below: - -```yaml -agentk8sglue: - queues: - example_queue_1: - templateOverrides: - nodeSelector: - nvidia.com/gpu.product: A100-SXM4-40GB-MIG-1g.5gb - resources: - limits: - nvidia.com/gpu: 1 - example_queue_2: - templateOverrides: - nodeSelector: - nvidia.com/gpu.product: A100-SXM4-40GB - resources: - limits: - nvidia.com/gpu: 2 -``` -::: - -## Slurm - -:::important Enterprise Feature -Slurm Glue is available under the ClearML Enterprise plan. -::: - -Agents can be deployed bare-metal or inside [`Singularity`](https://docs.sylabs.io/guides/3.5/user-guide/introduction.html) -containers in Linux clusters managed with Slurm. - -ClearML Agent Slurm Glue maps jobs to Slurm batch scripts: associate a ClearML queue to a batch script template, then -when a Task is pushed into the queue, it will be converted and executed as an `sbatch` job according to the sbatch -template specification attached to the queue. - -1. Install the Slurm Glue on a machine where you can run `sbatch` / `squeue` etc. 
- - ``` - pip3 install -U --extra-index-url https://*****@*****.allegro.ai/repository/clearml_agent_slurm/simple clearml-agent-slurm - ``` - -1. Create a batch template. Make sure to set the `SBATCH` variables to the resources you want to attach to the queue. - The script below sets up an agent to run bare-metal, creating a virtual environment per job. For example: - - ``` - #!/bin/bash - # available template variables (default value separator ":") - # ${CLEARML_QUEUE_NAME} - # ${CLEARML_QUEUE_ID} - # ${CLEARML_WORKER_ID}. - # complex template variables (default value separator ":") - # ${CLEARML_TASK.id} - # ${CLEARML_TASK.name} - # ${CLEARML_TASK.project.id} - # ${CLEARML_TASK.hyperparams.properties.user_key.value} - - - # example - #SBATCH --job-name=clearml_task_${CLEARML_TASK.id} # Job name DO NOT CHANGE - #SBATCH --ntasks=1 # Run on a single CPU - # #SBATCH --mem=1mb # Job memory request - # #SBATCH --time=00:05:00 # Time limit hrs:min:sec - #SBATCH --output=task-${CLEARML_TASK.id}-%j.log - #SBATCH --partition debug - #SBATCH --cpus-per-task=1 - #SBATCH --priority=5 - #SBATCH --nodes=${CLEARML_TASK.hyperparams.properties.num_nodes.value:1} - - - ${CLEARML_PRE_SETUP} - - echo whoami $(whoami) - - ${CLEARML_AGENT_EXECUTE} - - ${CLEARML_POST_SETUP} - ``` - - Notice: If you are using Slurm with Singularity container support replace `${CLEARML_AGENT_EXECUTE}` in the batch - template with `singularity exec ${CLEARML_AGENT_EXECUTE}`. For additional required settings, see [Slurm with Singularity](#slurm-with-singularity). - - :::tip - You can override the default values of a Slurm job template via the ClearML Web UI. 
The following command in the - template sets the `nodes` value to be the ClearML Task’s `num_nodes` user property: - ``` - #SBATCH --nodes=${CLEARML_TASK.hyperparams.properties.num_nodes.value:1} - ``` - This user property can be modified in the UI, in the task's **CONFIGURATION > User Properties** section, and when the - task is executed the new modified value will be used. - ::: - -3. Launch the ClearML Agent Slurm Glue and assign the Slurm configuration to a ClearML queue. For example, the following - associates the `default` queue to the `slurm.example.template` script, so any jobs pushed to this queue will use the - resources set by that script. - ``` - clearml-agent-slurm --template-files slurm.example.template --queue default - ``` - - You can also pass multiple templates and queues. For example: - ``` - clearml-agent-slurm --template-files slurm.template1 slurm.template2 --queue queue1 queue2 - ``` - -### Slurm with Singularity -If you are running Slurm with Singularity containers support, set the following: - -1. Make sure your `sbatch` template contains: - ``` - singularity exec ${CLEARML_AGENT_EXECUTE} - ``` - Additional singularity arguments can be added, for example: - ``` - singularity exec --uts ${CLEARML_AGENT_EXECUTE}` - ``` -1. Set the default Singularity container to use in your [clearml.conf](../configs/clearml_conf.md) file: - ``` - agent.default_docker.image="shub://repo/hello-world" - ``` - Or - ``` - agent.default_docker.image="docker://ubuntu" - ``` - -1. Add `--singularity-mode` to the command line, for example: - ``` - clearml-agent-slurm --singularity-mode --template-files slurm.example_singularity.template --queue default - ``` - -## Google Colab - -ClearML Agent can run on a [Google Colab](https://colab.research.google.com/) instance. This helps users to leverage -compute resources provided by Google Colab and send tasks for execution on it. 
- -Check out [this tutorial](../guides/ide/google_colab.md) on how to run a ClearML Agent on Google Colab! - -## Explicit Task Execution - -ClearML Agent can also execute specific tasks directly, without listening to a queue. - -### Execute a Task without Queue - -Execute a Task with a `clearml-agent` worker without a queue. -```bash -clearml-agent execute --id -``` -### Clone a Task and Execute the Cloned Task - -Clone the specified Task and execute the cloned Task with a `clearml-agent` worker without a queue. -```bash -clearml-agent execute --id --clone -``` - -### Execute Task inside a Docker - -Execute a Task with a `clearml-agent` worker using a Docker container without a queue. -```bash -clearml-agent execute --id --docker -``` - -## Debugging - -Run a `clearml-agent` daemon in foreground mode, sending all output to the console. -```bash -clearml-agent daemon --queue default --foreground -``` diff --git a/docs/clearml_agent/clearml_agent_deployment_bare_metal.md b/docs/clearml_agent/clearml_agent_deployment_bare_metal.md new file mode 100644 index 00000000..cfe55bab --- /dev/null +++ b/docs/clearml_agent/clearml_agent_deployment_bare_metal.md @@ -0,0 +1,136 @@ +--- +title: Manual Deployment +--- + +## Spinning Up an Agent +You can spin up an agent on any machine: on-prem and/or cloud instance. When spinning up an agent, you assign it to +service a queue(s). Utilize the machine by enqueuing tasks to the queue that the agent is servicing, and the agent will +pull and execute the tasks. + +:::tip cross-platform execution +ClearML Agent is platform-agnostic. When using the ClearML Agent to execute tasks cross-platform, set platform +specific environment variables before launching the agent. 
+ +For example, to run an agent on an ARM device, set the core type environment variable before spinning up the agent: + +```bash +export OPENBLAS_CORETYPE=ARMV8 +clearml-agent daemon --queue +``` +::: + +### Executing an Agent +To execute an agent listening to a queue, run: + +```bash +clearml-agent daemon --queue +``` + +### Executing in Background +To execute an agent in the background, run: +```bash +clearml-agent daemon --queue --detached +``` +### Stopping Agents +To stop an agent running in the background, run: +```bash +clearml-agent daemon --stop +``` + +### Allocating Resources +To specify GPUs associated with the agent, add the `--gpus` flag. + +:::info Docker Mode +Make sure to include the `--docker` flag, as GPU management through the agent is only supported in [Docker Mode](clearml_agent_execution_env.md#docker-mode). +::: + +To execute multiple agents on the same machine (usually assigning a GPU to each agent), run: +```bash +clearml-agent daemon --gpus 0 --queue default --docker +clearml-agent daemon --gpus 1 --queue default --docker +``` +To allocate more than one GPU, provide a list of allocated GPUs: +```bash +clearml-agent daemon --gpus 0,1 --queue dual_gpu --docker +``` + +### Queue Prioritization +A single agent can listen to multiple queues. The priority is set by their order. + +```bash +clearml-agent daemon --queue high_q low_q +``` +This ensures the agent first tries to pull a Task from the `high_q` queue, and only if it is empty will the agent try to pull +from the `low_q` queue. + +To make sure an agent pulls from all queues equally, add the `--order-fairness` flag. +```bash +clearml-agent daemon --queue group_a group_b --order-fairness +``` +This makes sure the agent pulls from the `group_a` queue, then from `group_b`, then back to `group_a`, and so on. This ensures +that `group_a` or `group_b` will not be able to starve one another of resources.
+ +### SSH Access +By default, ClearML Agent maps the host's `~/.ssh` into the container's `/root/.ssh` directory (configurable, +see [clearml.conf](../configs/clearml_conf.md#docker_internal_mounts)). + +If you want to use existing auth sockets with ssh-agent, you can verify your host ssh-agent is working correctly with: + +```commandline +echo $SSH_AUTH_SOCK +``` + +You should see a path to a temporary file, something like this: + +```console +/tmp/ssh-/agent. +``` + +Then run your `clearml-agent` in Docker mode, which will automatically detect the `SSH_AUTH_SOCK` environment variable, +and mount the socket into any container it spins. + +You can also explicitly set the `SSH_AUTH_SOCK` environment variable when executing an agent. The command below will +execute an agent in Docker mode and assign it to service a queue. The agent will have access to +the SSH socket provided in the environment variable. + +``` +SSH_AUTH_SOCK= clearml-agent daemon --gpus --queue --docker +``` + +## Google Colab + +ClearML Agent can run on a [Google Colab](https://colab.research.google.com/) instance. This helps users to leverage +compute resources provided by Google Colab and send tasks for execution on it. + +Check out [this tutorial](../guides/ide/google_colab.md) on how to run a ClearML Agent on Google Colab! + +## Explicit Task Execution + +ClearML Agent can also execute specific tasks directly, without listening to a queue. + +### Execute a Task without Queue + +Execute a Task with a `clearml-agent` worker without a queue. +```bash +clearml-agent execute --id +``` +### Clone a Task and Execute the Cloned Task + +Clone the specified Task and execute the cloned Task with a `clearml-agent` worker without a queue. +```bash +clearml-agent execute --id --clone +``` + +### Execute Task inside a Docker + +Execute a Task with a `clearml-agent` worker using a Docker container without a queue. 
+```bash
+clearml-agent execute --id <task_id> --docker
+```
+
+## Debugging
+
+Run a `clearml-agent` daemon in foreground mode, sending all output to the console.
+```bash
+clearml-agent daemon --queue default --foreground
+```
diff --git a/docs/clearml_agent/clearml_agent_deployment_k8s.md b/docs/clearml_agent/clearml_agent_deployment_k8s.md
new file mode 100644
index 00000000..4fb9f6bf
--- /dev/null
+++ b/docs/clearml_agent/clearml_agent_deployment_k8s.md
@@ -0,0 +1,51 @@
+---
+title: Kubernetes
+---
+
+Agents can be deployed bare-metal or as Docker containers in a Kubernetes cluster. ClearML Agent adds missing scheduling capabilities to Kubernetes, enabling more flexible automation from code while leveraging all of ClearML Agent's features.
+
+ClearML Agent is deployed onto a Kubernetes cluster using **Kubernetes-Glue**, which maps ClearML jobs directly to Kubernetes jobs. This allows seamless task execution and resource allocation across your cluster.
+
+## Deployment Options
+You can deploy ClearML Agent onto Kubernetes using one of the following methods:
+
+1. **ClearML Agent Helm Chart (Recommended)**:
+   Use the [ClearML Agent Helm Chart](https://github.com/clearml/clearml-helm-charts/tree/main/charts/clearml-agent) to spin up an agent pod acting as a controller. This is the recommended and scalable approach.
+
+2. **K8s Glue Script**:
+   Run a [K8s Glue script](https://github.com/clearml/clearml-agent/blob/master/examples/k8s_glue_example.py) on a Kubernetes CPU node. This approach is less scalable and typically suited for simpler use cases.
+
+## How It Works
+The ClearML Kubernetes-Glue performs the following:
+- Pulls jobs from the ClearML execution queue.
+- Prepares a Kubernetes job based on a provided YAML template.
+- Inside each job pod, the `clearml-agent`:
+  - Installs the required environment for the task.
+  - Executes and monitors the task process.
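For the script-based deployment (option 2 above), an invocation might look like the following sketch. The queue name and template file below are placeholders, and the exact flag names should be checked against the example script's `--help` output:

```shell
pip install clearml-agent
python k8s_glue_example.py --queue k8s_scheduler --template-yaml pod_template.yaml
```

The glue then polls the given ClearML queue and spawns one Kubernetes job per pulled task, based on the supplied pod template.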
+
+:::important Enterprise Features
+ClearML Enterprise adds advanced Kubernetes features:
+- **Multi-Queue Support**: Service multiple ClearML queues within the same Kubernetes cluster.
+- **Pod-Specific Templates**: Define resource configurations per queue using pod templates.
+
+For example, you can configure resources for different queues as shown below:
+
+```yaml
+agentk8sglue:
+  queues:
+    example_queue_1:
+      templateOverrides:
+        nodeSelector:
+          nvidia.com/gpu.product: A100-SXM4-40GB-MIG-1g.5gb
+        resources:
+          limits:
+            nvidia.com/gpu: 1
+    example_queue_2:
+      templateOverrides:
+        nodeSelector:
+          nvidia.com/gpu.product: A100-SXM4-40GB
+        resources:
+          limits:
+            nvidia.com/gpu: 2
+```
+:::
diff --git a/docs/clearml_agent/clearml_agent_deployment_slurm.md b/docs/clearml_agent/clearml_agent_deployment_slurm.md
new file mode 100644
index 00000000..b236ae97
--- /dev/null
+++ b/docs/clearml_agent/clearml_agent_deployment_slurm.md
@@ -0,0 +1,107 @@
+---
+title: Slurm
+---
+
+:::important Enterprise Feature
+Slurm Glue is available under the ClearML Enterprise plan.
+:::
+
+Agents can be deployed bare-metal or inside [`Singularity`](https://docs.sylabs.io/guides/3.5/user-guide/introduction.html)
+containers in Linux clusters managed with Slurm.
+
+ClearML Agent Slurm Glue maps jobs to Slurm batch scripts: associate a ClearML queue with a batch script template; when
+a Task is pushed into the queue, it is converted and executed as an `sbatch` job according to the `sbatch` template
+specification attached to the queue.
+
+1. Install the Slurm Glue on a machine where you can run `sbatch` / `squeue` etc.
+
+   ```
+   pip3 install -U --extra-index-url https://*****@*****.allegro.ai/repository/clearml_agent_slurm/simple clearml-agent-slurm
+   ```
+
+1. Create a batch template. Make sure to set the `SBATCH` variables to the resources you want to attach to the queue.
+   The script below sets up an agent to run bare-metal, creating a virtual environment per job.
For example:
+
+   ```
+   #!/bin/bash
+   # available template variables (default value separator ":")
+   # ${CLEARML_QUEUE_NAME}
+   # ${CLEARML_QUEUE_ID}
+   # ${CLEARML_WORKER_ID}.
+   # complex template variables (default value separator ":")
+   # ${CLEARML_TASK.id}
+   # ${CLEARML_TASK.name}
+   # ${CLEARML_TASK.project.id}
+   # ${CLEARML_TASK.hyperparams.properties.user_key.value}
+
+
+   # example
+   #SBATCH --job-name=clearml_task_${CLEARML_TASK.id}       # Job name DO NOT CHANGE
+   #SBATCH --ntasks=1                    # Run on a single CPU
+   # #SBATCH --mem=1mb                   # Job memory request
+   # #SBATCH --time=00:05:00             # Time limit hrs:min:sec
+   #SBATCH --output=task-${CLEARML_TASK.id}-%j.log
+   #SBATCH --partition debug
+   #SBATCH --cpus-per-task=1
+   #SBATCH --priority=5
+   #SBATCH --nodes=${CLEARML_TASK.hyperparams.properties.num_nodes.value:1}
+
+
+   ${CLEARML_PRE_SETUP}
+
+   echo whoami $(whoami)
+
+   ${CLEARML_AGENT_EXECUTE}
+
+   ${CLEARML_POST_SETUP}
+   ```
+
+   Notice: If you are using Slurm with Singularity container support, replace `${CLEARML_AGENT_EXECUTE}` in the batch
+   template with `singularity exec ${CLEARML_AGENT_EXECUTE}`. For additional required settings, see [Slurm with Singularity](#slurm-with-singularity).
+
+   :::tip
+   You can override the default values of a Slurm job template via the ClearML Web UI. The following command in the
+   template sets the `nodes` value to be the ClearML Task’s `num_nodes` user property:
+   ```
+   #SBATCH --nodes=${CLEARML_TASK.hyperparams.properties.num_nodes.value:1}
+   ```
+   This user property can be modified in the UI, in the task's **CONFIGURATION > User Properties** section, and when the
+   task is executed, the modified value will be used.
+   :::
+
+3. Launch the ClearML Agent Slurm Glue and assign the Slurm configuration to a ClearML queue. For example, the following
+   associates the `default` queue with the `slurm.example.template` script, so any jobs pushed to this queue will use the
+   resources set by that script.
+   ```
+   clearml-agent-slurm --template-files slurm.example.template --queue default
+   ```
+
+   You can also pass multiple templates and queues. For example:
+   ```
+   clearml-agent-slurm --template-files slurm.template1 slurm.template2 --queue queue1 queue2
+   ```
+
+## Slurm with Singularity
+If you are running Slurm with Singularity container support, set the following:
+
+1. Make sure your `sbatch` template contains:
+   ```
+   singularity exec ${CLEARML_AGENT_EXECUTE}
+   ```
+   Additional singularity arguments can be added, for example:
+   ```
+   singularity exec --uts ${CLEARML_AGENT_EXECUTE}
+   ```
+1. Set the default Singularity container to use in your [clearml.conf](../configs/clearml_conf.md) file:
+   ```
+   agent.default_docker.image="shub://repo/hello-world"
+   ```
+   or:
+   ```
+   agent.default_docker.image="docker://ubuntu"
+   ```
+
+1. Add `--singularity-mode` to the command line, for example:
+   ```
+   clearml-agent-slurm --singularity-mode --template-files slurm.example_singularity.template --queue default
+   ```
diff --git a/docs/clearml_agent/clearml_agent_dynamic_gpus.md b/docs/clearml_agent/clearml_agent_dynamic_gpus.md
index 2b72d0fc..8481cf92 100644
--- a/docs/clearml_agent/clearml_agent_dynamic_gpus.md
+++ b/docs/clearml_agent/clearml_agent_dynamic_gpus.md
@@ -21,7 +21,7 @@ clearml-agent daemon --dynamic-gpus --gpus 0-2 --queue dual_gpus=2 single_gpu=1
 Make sure to include the `--docker` flag, as dynamic GPU allocation is only supported in [Docker Mode](clearml_agent_execution_env.md#docker-mode).
::: -## Example +#### Example Let's say a server has three queues: * `dual_gpu` diff --git a/docs/clearml_agent/clearml_agent_setup.md b/docs/clearml_agent/clearml_agent_setup.md index a64eb93a..2e744de0 100644 --- a/docs/clearml_agent/clearml_agent_setup.md +++ b/docs/clearml_agent/clearml_agent_setup.md @@ -9,7 +9,7 @@ If ClearML was previously configured, follow [this](#adding-clearml-agent-to-a-c ClearML Agent specific configurations ::: -To install ClearML Agent, execute +To install [ClearML Agent](../clearml_agent.md), execute ```bash pip install clearml-agent ``` @@ -27,7 +27,7 @@ it can't do that when running from a virtual environment. clearml-agent init ``` - The setup wizard prompts for ClearML credentials (see [here](../webapp/settings/webapp_settings_profile.md#clearml-credentials) about obtaining credentials). + The setup wizard prompts for ClearML credentials (see [here](../webapp/settings/webapp_settings_profile.md#clearml-api-credentials) about obtaining credentials). ``` Please create new clearml credentials through the settings page in your `clearml-server` web app, or create a free account at https://app.clear.ml/settings/webapp-configuration diff --git a/docs/clearml_data/clearml_data.md b/docs/clearml_data/clearml_data.md index de127ce7..ca3bc274 100644 --- a/docs/clearml_data/clearml_data.md +++ b/docs/clearml_data/clearml_data.md @@ -37,7 +37,7 @@ lineage and content information. See [dataset UI](../webapp/datasets/webapp_data ## Setup -`clearml-data` comes built-in with the `clearml` Python package! Check out the [Getting Started](../getting_started/ds/ds_first_steps.md) +`clearml-data` comes built-in with the `clearml` Python package! Check out the [ClearML Setup](../clearml_sdk/clearml_sdk_setup) guide for more info! ## Using ClearML Data @@ -46,7 +46,7 @@ ClearML Data supports two interfaces: - `clearml-data` - A CLI utility for creating, uploading, and managing datasets. 
See [CLI](clearml_data_cli.md) for a reference of `clearml-data` commands.
- `clearml.Dataset` - A Python interface for creating, retrieving, managing, and using datasets. See [SDK](clearml_data_sdk.md) for an overview of the basic methods of the `Dataset` module.

-For an overview of recommendations for ClearML Data workflows and practices, see [Best Practices](best_practices.md).
+For an overview of recommendations for ClearML Data workflows and practices, see [Best Practices](../best_practices/data_best_practices.md).

## Dataset Version States
The following table displays the possible states for a dataset version.
diff --git a/docs/clearml_sdk/clearml_sdk.md b/docs/clearml_sdk/clearml_sdk.md
index d5802c82..bf0e93c6 100644
--- a/docs/clearml_sdk/clearml_sdk.md
+++ b/docs/clearml_sdk/clearml_sdk.md
@@ -7,7 +7,7 @@ tasks for you, and an extensive set of powerful features and functionality you c
and other workflows.

:::tip Installation
-For installation instructions, see [Getting Started](../getting_started/ds/ds_first_steps.md#install-clearml).
+For installation instructions, see [ClearML Setup](../clearml_sdk/clearml_sdk_setup#install-clearml).
:::

The ClearML Python Package collects the scripts' entire execution information, including:
diff --git a/docs/getting_started/ds/ds_first_steps.md b/docs/clearml_sdk/clearml_sdk_setup.md
similarity index 56%
rename from docs/getting_started/ds/ds_first_steps.md
rename to docs/clearml_sdk/clearml_sdk_setup.md
index ccbbad4c..6f8a82c7 100644
--- a/docs/getting_started/ds/ds_first_steps.md
+++ b/docs/clearml_sdk/clearml_sdk_setup.md
@@ -1,7 +1,9 @@
---
-title: First Steps
+title: ClearML Python Package
---

+This is a step-by-step guide for installing the `clearml` Python package and connecting it to the ClearML Server. Once done,
+you can integrate `clearml` into your code.
## Install ClearML @@ -44,8 +46,8 @@ pip install clearml CLEARML_CONFIG_FILE = MyOtherClearML.conf ``` - For more information about running experiments inside Docker containers, see [ClearML Agent Deployment](../../clearml_agent/clearml_agent_deployment.md) - and [ClearML Agent Reference](../../clearml_agent/clearml_agent_ref.md). + For more information about running tasks inside Docker containers, see [ClearML Agent Deployment](../clearml_agent/clearml_agent_deployment_bare_metal.md) + and [ClearML Agent Reference](../clearml_agent/clearml_agent_ref.md). @@ -83,7 +85,7 @@ pip install clearml CLEARML setup completed successfully. ``` -Now you can integrate ClearML into your code! Continue [here](#auto-log-experiment). +Now you can integrate ClearML into your code! Continue [here](../getting_started/auto_log_exp.md). ### Jupyter Notebook To use ClearML with Jupyter Notebook, you need to configure ClearML Server access credentials for your notebook. @@ -94,49 +96,3 @@ To use ClearML with Jupyter Notebook, you need to configure ClearML Server acces 1. Add these commands to your notebook Now you can use ClearML in your notebook! - -## Auto-log Experiment - -In ClearML, experiments are organized as [Tasks](../../fundamentals/task.md). - -ClearML automatically logs your task and code, including outputs and parameters from popular ML frameworks, -once you integrate the ClearML [SDK](../../clearml_sdk/clearml_sdk.md) with your code. To control what ClearML automatically logs, see this [FAQ](../../faq.md#controlling_logging). - -At the beginning of your code, import the `clearml` package: - -```python -from clearml import Task -``` - -:::tip Full Automatic Logging -To ensure full automatic logging, it is recommended to import the `clearml` package at the top of your entry script. -::: - -Then initialize the Task object in your `main()` function, or the beginning of the script. 
- -```python -task = Task.init(project_name='great project', task_name='best task') -``` - -If the project does not already exist, a new one is created automatically. - -The console should display the following output: - -``` -ClearML Task: created new task id=1ca59ef1f86d44bd81cb517d529d9e5a -2021-07-25 13:59:09 -ClearML results page: https://app.clear.ml/projects/4043a1657f374e9298649c6ba72ad233/experiments/1ca59ef1f86d44bd81cb517d529d9e5a/output/log -2021-07-25 13:59:16 -``` - -**That's it!** You are done integrating ClearML with your code :) - -Now, [command-line arguments](../../fundamentals/hyperparameters.md#tracking-hyperparameters), [console output](../../fundamentals/logger.md#types-of-logged-results) as well as Tensorboard and Matplotlib will automatically be logged in the UI under the created Task. - -Sit back, relax, and watch your models converge :) or continue to see what else can be done with ClearML [here](ds_second_steps.md). - -## YouTube Playlist - -Or watch the **Getting Started** playlist on ClearML's YouTube Channel! - -[![Watch the video](https://img.youtube.com/vi/bjWwZAzDxTY/hqdefault.jpg)](https://www.youtube.com/watch?v=bjWwZAzDxTY&list=PLMdIlCuMqSTnoC45ME5_JnsJX0zWqDdlO&index=2) diff --git a/docs/fundamentals/hpo.md b/docs/clearml_sdk/hpo_sdk.md similarity index 80% rename from docs/fundamentals/hpo.md rename to docs/clearml_sdk/hpo_sdk.md index cd559d92..ad39a9f6 100644 --- a/docs/fundamentals/hpo.md +++ b/docs/clearml_sdk/hpo_sdk.md @@ -2,16 +2,8 @@ title: Hyperparameter Optimization --- -## What is Hyperparameter Optimization? -Hyperparameters are variables that directly control the behaviors of training algorithms, and have a significant effect on -the performance of the resulting machine learning models. Finding the hyperparameter values that yield the best -performing models can be complicated. Manually adjusting hyperparameters over the course of many training trials can be -slow and tedious. 
Luckily, you can automate and boost hyperparameter optimization (HPO) with ClearML's
-[**`HyperParameterOptimizer`**](../references/sdk/hpo_optimization_hyperparameteroptimizer.md) class.
-
-## ClearML's Hyperparameter Optimization
-
-ClearML provides the `HyperParameterOptimizer` class, which takes care of the entire optimization process for users
+You can automate and boost hyperparameter optimization (HPO) with ClearML's
+[**`HyperParameterOptimizer`**](../references/sdk/hpo_optimization_hyperparameteroptimizer.md) class, which takes care of the entire optimization process
+with a simple interface.

ClearML's approach to hyperparameter optimization is scalable, easy to set up and to manage, and it makes it easy to
@@ -57,11 +49,11 @@ optimization.
  documentation.
* **BOHB** - [`automation.hpbandster.OptimizerBOHB`](../references/sdk/hpo_hpbandster_bandster_optimizerbohb.md). BOHB performs robust and efficient hyperparameter optimization at scale by combining the speed of Hyperband searches with the guidance and guarantees of convergence of Bayesian Optimization.
-  For more information about HpBandSter BOHB, see the [HpBandSter](https://automl.github.io/HpBandSter/build/html/index.html)
+  For more information about HpBandSter BOHB, see the [HpBandSter](https://automl.github.io/HpBandSter/build/html/index.html)
  documentation and a [code example](../guides/frameworks/pytorch/notebooks/image/hyperparameter_search.md).
* **Random** uniform sampling of hyperparameters - [`automation.RandomSearch`](../references/sdk/hpo_optimization_randomsearch.md).
* **Full grid** sampling strategy of every hyperparameter combination - [`automation.GridSearch`](../references/sdk/hpo_optimization_gridsearch.md).
-* **Custom** - [`automation.optimization.SearchStrategy`](https://github.com/clearml/clearml/blob/master/clearml/automation/optimization.py#L268) - Use a custom class and inherit from the ClearML automation base strategy class.
+* **Custom** - [`automation.optimization.SearchStrategy`](https://github.com/clearml/clearml/blob/master/clearml/automation/optimization.py#L268) - Use a custom class and inherit from the ClearML automation base strategy class.

## Defining a Hyperparameter Optimization Search Example

@@ -137,9 +129,9 @@ optimization.

## Optimizer Execution Options

-The `HyperParameterOptimizer` provides options to launch the optimization tasks locally or through a ClearML [queue](agents_and_queues.md#what-is-a-queue).
+The `HyperParameterOptimizer` provides options to launch the optimization tasks locally or through a ClearML [queue](../fundamentals/agents_and_queues.md#what-is-a-queue).
Start a `HyperParameterOptimizer` instance using either [`HyperParameterOptimizer.start()`](../references/sdk/hpo_optimization_hyperparameteroptimizer.md#start)
-or [`HyperParameterOptimizer.start_locally()`](../references/sdk/hpo_optimization_hyperparameteroptimizer.md#start_locally).
+or [`HyperParameterOptimizer.start_locally()`](../references/sdk/hpo_optimization_hyperparameteroptimizer.md#start_locally).
Both methods run the optimizer controller locally. `start()` launches the base task clones through a queue specified when
instantiating the controller, while `start_locally()` runs the tasks locally.

@@ -156,17 +148,3 @@ Check out the [Hyperparameter Optimization tutorial](../guides/optimization/hype

## SDK Reference
For detailed information, see the complete [HyperParameterOptimizer SDK reference page](../references/sdk/hpo_optimization_hyperparameteroptimizer.md).
-
-## CLI
-
-ClearML also provides `clearml-param-search`, a CLI utility for managing the hyperparameter optimization process. See
-[ClearML Param Search](../apps/clearml_param_search.md) for more information.
-
-## UI Application
-
-:::info Pro Plan Offering
-The ClearML HPO App is available under the ClearML Pro plan.
-::: - -ClearML provides the [Hyperparameter Optimization GUI application](../webapp/applications/apps_hpo.md) for launching and -managing the hyperparameter optimization process. diff --git a/docs/clearml_serving/clearml_serving_setup.md b/docs/clearml_serving/clearml_serving_setup.md index 1d7da9d4..05219921 100644 --- a/docs/clearml_serving/clearml_serving_setup.md +++ b/docs/clearml_serving/clearml_serving_setup.md @@ -13,7 +13,7 @@ The following page goes over how to set up and upgrade `clearml-serving`. ## Initial Setup 1. Set up your [ClearML Server](../deploying_clearml/clearml_server.md) or use the [free hosted service](https://app.clear.ml) -1. Connect `clearml` SDK to the server, see instructions [here](../getting_started/ds/ds_first_steps.md#install-clearml) +1. Connect `clearml` SDK to the server, see instructions [here](../clearml_sdk/clearml_sdk_setup#install-clearml) 1. Install clearml-serving CLI: diff --git a/docs/cloud_autoscaling/autoscaling_overview.md b/docs/cloud_autoscaling/autoscaling_overview.md index 381b63be..1a7f0908 100644 --- a/docs/cloud_autoscaling/autoscaling_overview.md +++ b/docs/cloud_autoscaling/autoscaling_overview.md @@ -71,7 +71,7 @@ execute the tasks in the GPU queue. #### Docker Every task a cloud instance pulls will be run inside a docker container. When setting up an autoscaler app instance, you can specify a default container to run the tasks inside. If the task has its own container configured, it will -override the autoscaler’s default docker image (see [Base Docker Image](../clearml_agent/clearml_agent_docker.md#base-docker-container)). +override the autoscaler’s default docker image (see [Base Container](../getting_started/clearml_agent_base_docker.md#base-container)). 
#### Git Configuration
If your code is saved in a private repository, you can add your Git credentials so the ClearML Agents running on your
diff --git a/docs/custom_apps.md b/docs/custom_apps.md
new file mode 100644
index 00000000..518ef916
--- /dev/null
+++ b/docs/custom_apps.md
@@ -0,0 +1,24 @@
+---
+title: Custom Applications
+---
+
+:::info Enterprise Feature
+Custom applications are available under the ClearML Enterprise plan.
+:::
+
+ClearML supports creating your own GUI applications for deploying GenAI apps into your Enterprise environment.
+Instantly spin up apps with customized dashboards for internal customers, enabling seamless model testing, interactive
+demos, automated workflows, and more.
+
+## Why Use Custom Applications?
+
+Custom Applications provide:
+
+* Instant Deployment: Launch interactive applications directly within your Enterprise environment
+* Tailored UI: Customize forms and dashboards for monitoring processes
+* Automated Execution: Run AI workflows with structured inputs and repeatable processes
+* Accessibility: Enable non-technical users to interact with models through GUI interfaces
+* Seamless Integration: Connect with ClearML's ecosystem for task tracking and visualization
+
+See [Custom Application Setup](deploying_clearml/enterprise_deploy/app_custom.md) for instructions on creating and
+deploying custom ClearML applications.
\ No newline at end of file
diff --git a/docs/deploying_clearml/clearml_server.md b/docs/deploying_clearml/clearml_server.md
index 4eb44d77..dc137d76 100644
--- a/docs/deploying_clearml/clearml_server.md
+++ b/docs/deploying_clearml/clearml_server.md
@@ -4,14 +4,14 @@ title: ClearML Server
## What is ClearML Server?
The ClearML Server is the backend service infrastructure for ClearML. It allows multiple users to collaborate and
-manage their experiments by working seamlessly with the ClearML Python package and [ClearML Agent](../clearml_agent.md).
+manage their tasks by working seamlessly with the ClearML Python package and [ClearML Agent](../clearml_agent.md). ClearML Server is composed of the following: -* Web server including the [ClearML Web UI](../webapp/webapp_overview.md), which is the user interface for tracking, comparing, and managing experiments. +* Web server including the [ClearML Web UI](../webapp/webapp_overview.md), which is the user interface for tracking, comparing, and managing tasks. * API server which is a RESTful API for: - * Documenting and logging experiments, including information, statistics, and results. - * Querying experiments history, logs, and results. + * Documenting and logging tasks, including information, statistics, and results. + * Querying task history, logs, and results. * File server which stores media and models making them easily accessible using the ClearML Web UI. @@ -23,9 +23,9 @@ The ClearML Web UI is the ClearML user interface and is part of ClearML Server. Use the ClearML Web UI to: -* Track experiments -* Compare experiments -* Manage experiments +* Track tasks +* Compare tasks +* Manage tasks For detailed information about the ClearML Web UI, see [User Interface](../webapp/webapp_overview.md). @@ -49,7 +49,7 @@ authentication, subdomains, and load balancers, and use any of its many configur 1. Optionally, [configure ClearML Server](clearml_server_config.md) for additional features, including subdomains and load balancers, web login authentication, and the non-responsive task watchdog. -1. [Connect the ClearML SDK to the ClearML Server](../getting_started/ds/ds_first_steps.md) +1. 
[Connect the ClearML SDK to the ClearML Server](../clearml_sdk/clearml_sdk_setup) ## Updating diff --git a/docs/deploying_clearml/clearml_server_aws_ec2_ami.md b/docs/deploying_clearml/clearml_server_aws_ec2_ami.md index d44096e9..e3c3a8e0 100644 --- a/docs/deploying_clearml/clearml_server_aws_ec2_ami.md +++ b/docs/deploying_clearml/clearml_server_aws_ec2_ami.md @@ -150,4 +150,4 @@ The following section contains a list of AMI Image IDs per-region for the latest ## Next Step To keep track of your experiments and/or data, the `clearml` package needs to communicate with your server. -For instruction to connect the ClearML SDK to the server, see [Getting Started: First Steps](../getting_started/ds/ds_first_steps.md). +For instruction to connect the ClearML SDK to the server, see [ClearML Setup](../clearml_sdk/clearml_sdk_setup). diff --git a/docs/deploying_clearml/clearml_server_config.md b/docs/deploying_clearml/clearml_server_config.md index 0c69cb66..aa4026e1 100644 --- a/docs/deploying_clearml/clearml_server_config.md +++ b/docs/deploying_clearml/clearml_server_config.md @@ -12,7 +12,7 @@ This page describes the ClearML Server [deployment](#clearml-server-deployment-c * [Opening Elasticsearch, MongoDB, and Redis for External Access](#opening-elasticsearch-mongodb-and-redis-for-external-access) * [Web login authentication](#web-login-authentication) - Create and manage users and passwords * [Using hashed passwords](#using-hashed-passwords) - Option to use hashed passwords instead of plain-text passwords -* [Non-responsive Task watchdog](#non-responsive-task-watchdog) - For inactive experiments +* [Non-responsive Task watchdog](#non-responsive-task-watchdog) - For inactive tasks * [Custom UI context menu actions](#custom-ui-context-menu-actions) For all configuration options, see the [ClearML Configuration Reference](../configs/clearml_conf.md) page. @@ -361,7 +361,7 @@ You can also use hashed passwords instead of plain-text passwords. 
To do that: ### Non-responsive Task Watchdog -The non-responsive experiment watchdog monitors experiments that were not updated for a specified time interval, and then +The non-responsive task watchdog monitors tasks that were not updated for a specified time interval, and then the watchdog marks them as `aborted`. The non-responsive experiment watchdog is always active. Modify the following settings for the watchdog: @@ -391,7 +391,7 @@ Modify the following settings for the watchdog: ``` :::tip - If the `apiserver.conf` file does not exist, create your own in ClearML Server's `/opt/clearml/config` directory (or + If the `services.conf` file does not exist, create your own in ClearML Server's `/opt/clearml/config` directory (or an alternate folder you configured), and input the modified configuration ::: @@ -464,8 +464,8 @@ an alternate folder you configured), and input the modified configuration ::: The action will appear in the context menu for the object type in which it was specified: -* Task, model, dataview - Right-click an object in the [experiments](../webapp/webapp_exp_table.md), [models](../webapp/webapp_model_table.md), - and [dataviews](../hyperdatasets/webapp/webapp_dataviews.md) tables respectively. Alternatively, click the object to +* Task, model, dataview - Right-click an object in the [task](../webapp/webapp_exp_table.md), [model](../webapp/webapp_model_table.md), + and [dataview](../hyperdatasets/webapp/webapp_dataviews.md) tables respectively. Alternatively, click the object to open its info tab, then click the menu button to access the context menu. * Project - In the project page > click the menu button diff --git a/docs/deploying_clearml/clearml_server_gcp.md b/docs/deploying_clearml/clearml_server_gcp.md index 7b37f5e6..db5996a8 100644 --- a/docs/deploying_clearml/clearml_server_gcp.md +++ b/docs/deploying_clearml/clearml_server_gcp.md @@ -7,7 +7,7 @@ provides custom images for each released version of ClearML Server. 
For a list o [ClearML Server GCP Custom Image](#clearml-server-gcp-custom-image). To keep track of your experiments and/or data, the `clearml` package needs to communicate with the server you have deployed. -For instruction to connect the ClearML SDK to the server, see [Getting Started: First Steps](../getting_started/ds/ds_first_steps.md). +For instruction to connect the ClearML SDK to the server, see [ClearML Setup](../clearml_sdk/clearml_sdk_setup). :::info In order for `clearml` to work with a ClearML Server on GCP, set `CLEARML_API_DEFAULT_REQ_METHOD=PUT` or @@ -155,4 +155,4 @@ The following section contains a list of Custom Image URLs (exported in differen ## Next Step To keep track of your experiments and/or data, the `clearml` package needs to communicate with your server. -For instruction to connect the ClearML SDK to the server, see [Getting Started: First Steps](../getting_started/ds/ds_first_steps.md). +For instruction to connect the ClearML SDK to the server, see [ClearML Setup](../clearml_sdk/clearml_sdk_setup). diff --git a/docs/deploying_clearml/clearml_server_kubernetes_helm.md b/docs/deploying_clearml/clearml_server_kubernetes_helm.md index 5325b336..3023e738 100644 --- a/docs/deploying_clearml/clearml_server_kubernetes_helm.md +++ b/docs/deploying_clearml/clearml_server_kubernetes_helm.md @@ -32,4 +32,4 @@ instructions in the [Security](clearml_server_security.md) page. ## Next Step To keep track of your experiments and/or data, the `clearml` package needs to communicate with your server. -For instruction to connect the ClearML SDK to the server, see [Getting Started: First Steps](../getting_started/ds/ds_first_steps.md). \ No newline at end of file +For instruction to connect the ClearML SDK to the server, see [ClearML Setup](../clearml_sdk/clearml_sdk_setup). 
\ No newline at end of file diff --git a/docs/deploying_clearml/clearml_server_linux_mac.md b/docs/deploying_clearml/clearml_server_linux_mac.md index a22eacc4..9509c748 100644 --- a/docs/deploying_clearml/clearml_server_linux_mac.md +++ b/docs/deploying_clearml/clearml_server_linux_mac.md @@ -227,4 +227,4 @@ If needed, restore data and configuration by doing the following: ## Next Step To keep track of your experiments and/or data, the `clearml` package needs to communicate with your server. -For instruction to connect the ClearML SDK to the server, see [Getting Started: First Steps](../getting_started/ds/ds_first_steps.md). \ No newline at end of file +For instruction to connect the ClearML SDK to the server, see [ClearML Setup](../clearml_sdk/clearml_sdk_setup). \ No newline at end of file diff --git a/docs/deploying_clearml/clearml_server_win.md b/docs/deploying_clearml/clearml_server_win.md index ef4e849f..f3a54e20 100644 --- a/docs/deploying_clearml/clearml_server_win.md +++ b/docs/deploying_clearml/clearml_server_win.md @@ -89,4 +89,4 @@ After deploying ClearML Server, the services expose the following node ports: ## Next Step To keep track of your experiments and/or data, the `clearml` package needs to communicate with your server. -For instruction to connect the ClearML SDK to the server, see [Getting Started: First Steps](../getting_started/ds/ds_first_steps.md). \ No newline at end of file +For instruction to connect the ClearML SDK to the server, see [ClearML Setup](../clearml_sdk/clearml_sdk_setup). \ No newline at end of file diff --git a/docs/deploying_clearml/enterprise_deploy/app_custom.md b/docs/deploying_clearml/enterprise_deploy/app_custom.md new file mode 100644 index 00000000..84a37223 --- /dev/null +++ b/docs/deploying_clearml/enterprise_deploy/app_custom.md @@ -0,0 +1,479 @@ +--- +title: Custom Applications +--- + +The following is a guide for creating and installing custom ClearML applications on ClearML on-premises Enterprise servers. 
+
+ClearML applications are Python programs that run as ClearML tasks, and whose UI (input form and output dashboard) is
+defined in an attached configuration file.
+
+This guide will follow the `simple-app` application as an example. The application can be found on [GitHub](https://github.com/clearml/clearml-apps/tree/main/demo_apps/simple-app).
+
+An application will generally consist of the following:
+* Configuration file: File that describes the content of the application, such as:
+  * The task to run and from where to run it
+  * The structure of the input form for launching an application instance
+  * The information to display in the application instances dashboard.
+* Assets: Optional images and artifacts for the application, such as icons and HTML placeholders.
+* Task: Python code that is run when the application is launched. Should be in a Git repository.
+
+## Configuration File
+The configuration file describes the application. The file is a HOCON file, typically named `<app-name>.app.conf`. It
+contains the following sections:
+* General: The root section, describing the application's general information such as name, ID, version, icon, and queue
+* Task: Information about the task to execute, such as repository info and hyperparameters
+* Wizard: Fields for the application instance launch form, and where to store the input provided by the user
+* Dashboard: Information section displayed for the running application instances
+
+### General
+The `General` section is the root-level section of the configuration file, and contains the configuration options:
+* `id` - A unique ID for the application
+* `name` - The name to display in the web application
+* `version` - The version of the application implementation. It is recommended to use three numbers and to bump the version when updating the application, so that older running instances can still be displayed
+* `provider` - The person/team/group who is the owner of the application.
This will appear in the UI
+* `description` - A short description of the application, displayed in the ClearML Web UI
+* `icon` (*Optional*) - A small image displayed in the ClearML web UI as the application's icon. Can be a public web URL or an image in the application's assets directory (described below)
+* `no_info_html` (*Optional*) - HTML content to display as a placeholder for the dashboard when no instance is available. Can be a public web URL or a file in the application's assets directory (described below)
+* `default_queue` - The queue to which application instances are sent when launching a new instance. This queue should have an appropriate agent servicing it. See details in the Custom Apps Agent section below.
+* `badges` (*Optional*) - List of strings to display as badges/labels in the UI
+* `resumable` - Boolean indicating whether a running application instance can be restarted if required. Default is false.
+* `category` (*Optional*) - Used to separate apps into different tabs in the ClearML web UI
+* `featured` (*Optional*) - Value affecting the order of applications. Lower values are displayed first. Defaults to 500
+
+#### Example
+The root section in the simple application example:
+```
+id: "simple-app"
+version: "1.0.0"
+name: "Simple example application"
+provider: "ClearML"
+description: "A simple example of an application"
+icon: "${ASSET:app-simple-app@2x.png}"
+badges: []
+details_page: "task"
+no_info_html: "${ASSET:index.html}"
+default_queue: "custom_apps_queue"
+```
+
+### Task
+The `task` section describes the task to run, containing the following fields:
+* `script` - Contains information about what task code to run:
+  * `repository` - The git repository. Note that credentials must be described in the Custom Apps Agent's configuration. See details below.
+  * `branch` - The branch to use
+  * `entry_point` - The Python file to run
+  * `working_dir` - The directory to run it from
+* `hyperparams` (*Optional*) - A list of the task's hyperparameters used by the application, with their default values. There is no need to specify all the parameters here, but doing so summarizes the parameters that will be targeted by the wizard entries described below, and lets you specify default values for optional parameters appearing in the wizard.
+
+#### Example
+The `task` section in the simple application example:
+```
+task {
+  script {
+    repository: "https://bitbucket.org/seematics/clearml_apps.git"
+    entry_point: "main.py"
+    working_dir: "demo_apps/simple-app"
+    branch: "master"
+  }
+  hyperparams {
+    General {
+      a_number: 30.0
+      a_string: "testing 1, 2, 3"
+      a_boolean: False
+      a_project_id: ""
+    },
+  }
+}
+```
+
+### Wizard
+The `wizard` section defines the entries to display in the application instance's UI launch form. Each entry may contain the following fields:
+* `name` - Field name
+* `title` - Title to display in the wizard above the field
+* `info` - Optional information hint for the user
+* `type` - Can be one of the following:
+  * Basic types:
+    * `string`
+    * `integer`
+    * `float`
+    * `dropdown`
+    * `checkbox`
+    * `multiline_text`
+  * Complex types:
+    * `group` - Fields grouped together in a joint section. Fields of the group are defined within a list called
+      `item_template`
+    * `list` - A field or group of fields that can be inserted more than once. A target should be specified for the entire
+      list. Fields of the list are defined within a list called `item_template`
+* `required` - Boolean indicating whether the user must fill in the field. Default is `false`
+* `default` - Default value for the field
+* `placeholder` - Text to show in the field before typing
+* `collapsible` - Boolean indicating whether the group can be collapsed. Default is `false`
+* `collapsibleTitleTemplate` - Optional title for collapsible fields. You can use `${field name}` to reference a field.
+  Useful for lists.
+* `conditional` - Allows setting a condition for displaying a field. Specify a list of entries, each containing
+  the name of a field that appears earlier and its expected value. The field will be displayed only if the referenced
+  previous fields were filled with the matching value. See example below.
+* `default_conditional_on` - Allows setting a field whose default value depends on the value of a previous field in the wizard.
+  Specify the `name` of the previous field and a `value` dictionary, in which each key is a potential value of the previous field and each value is the default value to use for this field.
+* `choices` - For dropdowns - can be either an array of hard-coded options, for example: `["Option 1","Option 2"]`, or a ClearML object, such as a task, project, or queue, to choose from. The following should be specified:
+  * `source` - The source object. One of the following:
+    * `project`
+    * `task`
+    * `model`
+    * `queue`
+    * `dataset_version`
+  * `display_field` - The field of the source object to display in the list. Usually "name"
+  * `value_field` - The field of the source object to use for configuring the app instance. Usually "id"
+  * `filter` - Allows limiting the choices list by setting a filter on one or more of the object's fields. See the Project Selection example below
+* `target` - Where in the application instance's task the values will be set. Contains the following:
+  * `field` - Either `configuration` or `hyperparams`
+  * `section` - For hyperparams - the section within the field
+  * `name` - Key in which to store the value
+  * `format` - The format of the value to store. `str` by default. Use `json` for lists.
+* `item_template` - List of items for `group` or `list` fields.
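
For illustration, the visibility rule behind `conditional` and the value routing behind `target` can be sketched in a few lines of Python. This is a hypothetical helper, not part of the ClearML runtime; the field and section names follow the examples in this guide:

```python
# Hypothetical sketch of how a launch form could evaluate "conditional" blocks
# and route entered values into the instance task via "target".
def is_visible(entry: dict, form_values: dict) -> bool:
    conditional = entry.get("conditional")
    if not conditional:
        return True  # no condition: the field is always shown
    # Every referenced earlier field must hold its expected value.
    return all(
        form_values.get(cond["name"]) == cond["value"]
        for cond in conditional.get("entries", [])
    )

def apply_target(entry: dict, value, task: dict) -> None:
    # Write a form value into the task structure described by "target":
    # either a configuration item, or a hyperparameter inside a section.
    target = entry["target"]
    if target["field"] == "configuration":
        task.setdefault("configuration", {})[target["name"]] = value
    else:  # hyperparams
        section = task.setdefault("hyperparams", {}).setdefault(target["section"], {})
        section[target["name"]] = value

# Mirrors the "Conditional String Field" example below.
string_field = {
    "name": "string_field",
    "conditional": {"entries": [{"name": "boolean_field", "value": True}]},
    "target": {"field": "hyperparams", "section": "General", "name": "a_string"},
}

task = {}
if is_visible(string_field, {"boolean_field": True}):
    apply_target(string_field, "testing 1, 2, 3", task)
```

Under this sketch, the string field only becomes visible (and its value only lands in `hyperparams.General.a_string`) once the boolean field is checked.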
+#### Example
+The example is based on the `simple-app` application `wizard` section:
+
+* Wizard Section:
+
+  ```
+  wizard {
+    entries: [
+      …
+    ]
+  }
+  ```
+
+* Boolean Field: A simple boolean field stored in the `General` hyperparameters section:
+
+  ```
+  {
+    name: boolean_field
+    title: A boolean choice
+    default: false
+    type: checkbox
+    required: false
+    target {
+      field: hyperparams
+      section: General
+      name: a_bool
+    }
+  }
+  ```
+
+  This will look like this:
+
+  ![Bool choice](../../img/app_bool_choice.png#light-mode-only)
+  ![Bool choice](../../img/app_bool_choice_dark.png#dark-mode-only)
+
+* Conditional String Field: A string field presented only if the boolean field was checked:
+
+  ```
+  {
+    name: string_field
+    title: A String
+    info: "Select a string to be passed to the application"
+    type: string
+    placeholder: "a string..."
+    conditional: {
+      entries: [
+        {
+          name: boolean_field
+          value: True
+        }
+      ]
+    }
+    target {
+      field: hyperparams
+      section: General
+      name: a_string
+    }
+  }
+  ```
+
+  This will look like this:
+
+  ![Conditional string field](../../img/app_cond_str.png#light-mode-only)
+  ![Conditional string field](../../img/app_cond_str_dark.png#dark-mode-only)
+
+* Project Selection: Choices field for project selection, containing all projects whose names do not begin with `example`:
+
+  ```
+  {
+    name: a_project_field
+    title: Choose a Project
+    info: "The app will count the tasks in this project"
+    type: dropdown
+    required: true
+    autocomplete: true
+    choices {
+      source: project
+      value_field: id
+      display_field: name
+      filter {
+        fields {
+          name: "^(?i)(?!example).*$"
+        }
+      }
+    }
+    target {
+      field: hyperparams
+      section: General
+      name: a_project_id
+    }
+  }
+  ```
+
+  This will look like this:
+
+  ![Project selection](../../img/app_proj_selection.png#light-mode-only)
+  ![Project selection](../../img/app_proj_selection_dark.png#dark-mode-only)
+
+* Group: Group with a single field option:
+
+  ```
+  {
+    type: group
+    name: 
more_options_group
+    title: More options
+    collapsible: true
+    item_template: [
+      {
+        name: a_text_field
+        title: Some Text
+        info: "Contains some text"
+        type: multiline_text
+        required: false
+        target: {
+          field: configuration
+          name: text_blob
+        }
+      }
+    ]
+  }
+  ```
+
+  This will look like this:
+
+  ![Group with single field](../../img/app_group.png#light-mode-only)
+  ![Group with single field](../../img/app_group_dark.png#dark-mode-only)
+
+
+### Dashboard
+The `dashboard` section of the configuration file describes the fields that will appear in the instance's dashboard display.
+The dashboard elements are organized into lines.
+
+The section contains the following information:
+* `lines` - The array of line elements, each containing:
+  * `style` - CSS definitions for the line, e.g., setting the line height
+  * `contents` - An array of dashboard elements to display in a given line. Each element may have several fields:
+    * `title` - Text to display at the top of the field
+    * `type` - One of the following:
+      * `scalar-histogram`
+      * `plot`
+      * `debug-images`
+      * `log`
+      * `scalar`
+      * `hyperparameter`
+      * `configuration`
+      * `html`
+    * `text` - For HTML. You can refer to task elements such as hyperparameters by using `${hyperparams.<section>.<name>.value}`
+    * `metric` - For plot, scalar-histogram, debug-images, scalar - Name of the metric
+    * `variant` - For plot, scalar-histogram, debug-images, scalar - List of variants to display
+    * `key` - For histograms, one of the following: `iter`, `timestamp`, or `iso_time`
+    * `hide_legend` - Whether to hide the legend
+
+#### Example
+The example is based on the `simple-app` application `dashboard` section:
+* Dashboard Section
+
+  ```
+  dashboard {
+    lines: [
+      …
+    ]
+  }
+  ```
+
+* HTML Elements: Header with two HTML elements based on the user's input:
+
+  ```
+  {
+    style {
+      height: "initial"
+    }
+    contents: [
+      {
+        title: "HTML box with the string selected by the user"
+        type: html
+        text: "The string is ${hyperparams.General.a_string.value}"
+      },
+      {
+        title: "HTML box with the count of tasks"
+        type: html
+        text: "Project ${hyperparams.General.project_name.value} contains ${hyperparams.General.tasks_count.value} tasks"
+      }
+    ]
+  }
+  ```
+
+  This will look like this:
+
+  ![HTML elements](../../img/app_html_elements.png#light-mode-only)
+  ![HTML elements](../../img/app_html_elements_dark.png#dark-mode-only)
+
+* Plot
+
+  ```
+  {
+    contents: [
+      {
+        title: "A random plot"
+        type: plot
+        metric: "Plots"
+        variant: "plot"
+      }
+    ]
+  }
+  ```
+
+  This will look like this:
+
+  ![Plot](../../img/app_plot.png#light-mode-only)
+  ![Plot](../../img/app_plot_dark.png#dark-mode-only)
+
+* Log
+
+  ```
+  {
+    contents: [
+      {
+        title: "Logs"
+        type: log
+      }
+    ]
+  }
+  ```
+
+  This will look like this:
+
+  ![Log](../../img/app_log.png#light-mode-only)
+  ![Log](../../img/app_log_dark.png#dark-mode-only)
+
+### Assets
+Assets are optional elements used by the application configuration to allow customization of the application display in
+the ClearML web UI. They typically include icons, empty-state HTML, and any other required objects. Assets are stored in
+a directory called `assets`.
+
+To access assets from the application configuration file, use `${ASSET:<file-name>}`. For example:
+```
+icon: "${ASSET:app-simple-app@2x.png}"
+```
+
+### Python Code
+The code of the task that handles the application logic must be stored in a Git repository.
+It is referenced by the `script` entry in the configuration file. For example:
+
+```
+script {
+  repository: "https://bitbucket.org/seematics/clearml_apps.git"
+  entry_point: "main.py"
+  working_dir: "demo_apps/simple-app"
+  branch: "master"
+}
+```
+
+The task is run by a [Custom Applications Agent](#custom-apps-agent) in a Docker container. Any packages used should be
+listed in a `requirements.txt` file in the working directory.
+
+The task can read input from the configuration and from the `hyperparams` section, as defined in the configuration file of
+the application, and it is the task's responsibility to update any element displayed in the dashboard.
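
To illustrate the `${...}` references used by dashboard `text` fields, here is a sketch of how such a template could be resolved against a task's hyperparameters. This is a simplified illustration only; the actual substitution is performed by the ClearML web UI:

```python
import re

def resolve_text(template: str, task: dict) -> str:
    """Resolve ${a.b.c}-style references in a dashboard text field
    by walking the task structure one dotted segment at a time."""
    def lookup(match: re.Match) -> str:
        value = task
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)
    return re.sub(r"\$\{([^}]+)\}", lookup, template)

# Matches the "HTML box with the string selected by the user" example above.
task = {"hyperparams": {"General": {"a_string": {"value": "testing 1, 2, 3"}}}}
text = resolve_text("The string is ${hyperparams.General.a_string.value}", task)
```

Each `${hyperparams.<section>.<name>.value}` reference is replaced by the corresponding value stored in the instance's task.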
+
+## Deploying Custom Applications
+### Custom Apps Agent
+Custom applications require a separate agent from the ClearML built-in applications, since their code is downloaded from
+a different Git repository.
+
+To define a custom-apps agent, add the following to the `docker-compose.yml` or to the `docker-compose.override.yml`:
+* In the `apiserver` service section, add the following lines in the environment to create a user for handling the custom apps:
+  ```
+  - CLEARML__secure__credentials__custom_apps_agent__user_key="${CUSTOM_APPS_AGENT_USER_KEY}"
+  - CLEARML__secure__credentials__custom_apps_agent__user_secret="${CUSTOM_APPS_AGENT_USER_SECRET}"
+  - CLEARML__secure__credentials__custom_apps_agent__role="admin"
+  ```
+
+* Add the custom-apps-agent service:
+
+  ```
+  custom-apps-agent:
+    container_name: custom-apps-agent
+    image: ${APPS_DAEMON_DOCKER_IMAGE}
+    restart: unless-stopped
+    privileged: true
+    environment:
+      - CLEARML_API_HOST=https://app.${SERVER_URL}/api
+      - CLEARML_FILES_HOST=https://files.${SERVER_URL}
+      - CLEARML_WEB_HOST=https://app.${SERVER_URL}
+      - CLEARML_API_ACCESS_KEY=${CUSTOM_APPS_AGENT_USER_KEY}
+      - CLEARML_API_SECRET_KEY=${CUSTOM_APPS_AGENT_USER_SECRET}
+      - CLEARML_AGENT_GIT_USER=${CUSTOM_APPS_AGENT_GIT_USER}
+      - CLEARML_AGENT_GIT_PASS=${CUSTOM_APPS_AGENT_GIT_PASSWORD}
+      - CLEARML_AGENT_DEFAULT_BASE_DOCKER=${APPS_WORKER_DOCKER_IMAGE}
+      - CLEARML_WORKER_ID=custom-apps-agent
+      - CLEARML_NO_DEFAULT_SERVER=true
+      - CLEARML_AGENT_DOCKER_HOST_MOUNT=/opt/allegro/data/agent/custom-app-agent:/root/.clearml
+      - CLEARML_AGENT_DAEMON_OPTIONS=--foreground --create-queue --use-owner-token --child-report-tags application --services-mode=${APPS_AGENT_INSTANCES:?err}
+      - CLEARML_AGENT_QUEUES=custom_apps_queue
+      - CLEARML_AGENT_NO_UPDATE=1
+      - CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/root/venv/bin/python3
+      # Disable Vault so that the apps will be downloaded with the git credentials provided above, and will not take any user's git credentials from the Vault.
+      - CLEARML_AGENT_EXTRA_DOCKER_ARGS=-e CLEARML_AGENT_DISABLE_VAULT_SUPPORT=1
+      - CLEARML_AGENT_SERVICES_DOCKER_RESTART=on-failure;application.resumable=True
+      - CLEARML_AGENT_DISABLE_SSH_MOUNT=1
+      - CLEARML_AGENT__AGENT__DOCKER_CONTAINER_NAME_FORMAT="custom-app-{task_id}-{rand_string:.8}"
+      - CLEARML_AGENT_EXTRA_DOCKER_LABELS="allegro-type=application subtype=custom"
+    labels:
+      ai.allegro.devops.allegro-software-type: "custom-apps-agent"
+    networks:
+      - backend
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+      - /opt/allegro/data/agent/custom-app-agent:/root/.clearml
+      - /opt/allegro/data/agent/custom-app-agent-v2/tmp:/tmp
+    depends_on:
+      - apiserver
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+        max-file: "3"
+  ```
+
+* Make sure to define the following variables in the `constants.env` or `runtime_created.env` configuration files:
+  * `CUSTOM_APPS_AGENT_USER_KEY` - A unique key for the user - any random string can be used
+  * `CUSTOM_APPS_AGENT_USER_SECRET` - A unique secret for the user - a random UUID
+  * `CUSTOM_APPS_AGENT_GIT_USER` - The user for the Git repository
+  * `CUSTOM_APPS_AGENT_GIT_PASSWORD` - The password/app-password/token for the Git repository
+
+### Deploying Apps
+#### Packaging an App
+Create a zip file with the configuration and, if applicable, the assets:
+
+```
+zip -r simple-app.zip simple-app.app.conf assets/
+```
+
+#### Installing an App
+Run the `upload_apps.py` script to upload the applications. You will need to provide credentials for an admin user in the system:
+
+```
+upload_apps.py --host <host> --user <user_key> --password <user_secret> --files simple-app.zip
+```
+
+* `<host>` can be something like `https://api.my-server.allegro.ai`, or `http://localhost:8008` if running on the server.
+* `--user` and `--password` are key/secret credentials of any ClearML admin user. These can be generated in the ClearML web UI.
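
The packaging step above can also be scripted. Below is a minimal sketch using Python's standard `zipfile` module; it is an illustrative helper, not an official ClearML utility, and the file names follow the `simple-app` example:

```python
import os
import zipfile

def package_app(conf_path: str, assets_dir: str, out_zip: str) -> None:
    """Zip an app's .app.conf together with its assets/ directory,
    mirroring: zip -r simple-app.zip simple-app.app.conf assets/"""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        # The configuration file sits at the root of the archive.
        zf.write(conf_path, arcname=os.path.basename(conf_path))
        if os.path.isdir(assets_dir):
            for root, _dirs, files in os.walk(assets_dir):
                for name in files:
                    path = os.path.join(root, name)
                    # Keep paths relative so the archive unpacks as assets/...
                    arcname = os.path.relpath(path, os.path.dirname(assets_dir))
                    zf.write(path, arcname=arcname)
```

Calling `package_app("simple-app.app.conf", "assets", "simple-app.zip")` then produces an archive in the layout that `upload_apps.py` expects.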
+
+#### Removing an App
+Applications can be uninstalled by running the `manage_apps.py` script as follows:
+
+```
+manage_apps.py delete --host <host> --user <user_key> --password <user_secret> -app <app-id>
+```
+
diff --git a/docs/deploying_clearml/enterprise_deploy/app_install_ex_server.md b/docs/deploying_clearml/enterprise_deploy/app_install_ex_server.md
new file mode 100644
index 00000000..9b1990a9
--- /dev/null
+++ b/docs/deploying_clearml/enterprise_deploy/app_install_ex_server.md
@@ -0,0 +1,87 @@
+---
+title: Installing an External Applications Server
+---
+
+ClearML supports applications: extensions that provide additional capabilities, such as cloud auto-scaling,
+hyperparameter optimization, etc. For more information, see [ClearML Applications](../../webapp/applications/apps_overview.md).
+
+Applications run inside Docker containers, which can reside either on the ClearML Server side or on an external server.
+The `clearml-apps-agent` polls an internal applications queue, and spawns additional Docker containers for application
+instances that are launched using the ClearML web UI.
+
+This document provides a short guide on how to configure an external applications server.
+
+## Requirements
+
+* A server, as described in [Server Requirements](#server-requirements)
+* `docker-compose.yml` file provided by ClearML
+* `constants.env` - Environment file with the required credentials
+* Credentials to access ClearML's enterprise Dockerhub registry
+
+### Server Requirements
+
+* Operating system: Linux-based
+* CPU: Since applications do not produce a high CPU load, 2-4 virtual CPUs are recommended, assuming around 10 concurrent
+  applications are required
+* Memory: Around 1 GiB of RAM is required for each concurrent application instance
+* Storage: About 100 GB of storage is recommended for the system volume, with an additional 100 GB of storage for
+  application caching. In AWS, an `m6a.xlarge` instance can be used for running up to 10 applications in parallel.
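
As a rough capacity estimate based on the guidance above (1 GiB of RAM per concurrent application instance; the 2 GiB OS overhead used here is an assumption for illustration, not a ClearML figure):

```python
def required_ram_gib(concurrent_apps: int, os_overhead_gib: float = 2.0) -> float:
    """Rule of thumb from the requirements above:
    ~1 GiB per concurrent application instance, plus OS headroom."""
    return concurrent_apps * 1.0 + os_overhead_gib

# 10 concurrent applications -> roughly 12 GiB of RAM under these assumptions.
estimate = required_ram_gib(10)
```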
+
+## Installation
+
+:::note
+Installing an external server requires removing the applications' agent from the ClearML Enterprise Server. This
+is done by ClearML in hosted environments, or by removing the `apps-agent` service from the `docker-compose` override
+file in VPC and on-premises installations. For K8s environments, please consult the ClearML team.
+:::
+
+1. Install Docker. See the [Docker documentation](https://docs.docker.com/engine/install/ubuntu/).
+1. Copy the `docker-compose.yml` and `constants.env` files to `/opt/allegro`. The
+   `constants.env` file should contain the following definitions:
+
+   * `APISERVER_URL_FOR_EXTERNAL_WORKERS` - URL of the ClearML API server
+   * `WEBSERVER_URL_FOR_EXTERNAL_WORKERS` - URL of the ClearML WebApp
+   * `FILESERVER_URL_FOR_EXTERNAL_WORKERS` - URL of the ClearML files server
+   * `APPS_AGENT_USER_KEY` - Provided by ClearML
+   * `APPS_AGENT_USER_SECRET` - Provided by ClearML
+   * `APPS_AGENT_GIT_USER` - Provided by ClearML (required up to ClearML Server 1.8)
+   * `APPS_AGENT_GIT_PASSWORD` - Provided by ClearML (required up to ClearML Server 1.8)
+   * `APPS_WORKER_DOCKER_IMAGE` - Provided by ClearML (required up to ClearML Server 1.8)
+   * `APPS_DAEMON_DOCKER_IMAGE` - Provided by ClearML
+
+1. Log in to the Docker registry:
+
+   ```
+   sudo docker login --username allegroaienterprise
+   ```
+
+1. Pull the containers:
+
+   ```
+   docker compose --env-file constants.env pull
+   ```
+
+1. Start the service:
+
+   ```
+   docker compose --env-file constants.env up -d
+   ```
+
+
+## Clearing Stopped Containers
+Containers of running applications that are stopped are not automatically deleted. Therefore, it is recommended to
+periodically delete stopped containers.
This can be done by adding the following to the cron file:
+
+```
+0 0 * * * root docker container prune --force --filter "until=96h" --filter "label=allegro-type=application"
+```
+
+## Monitoring
+We recommend monitoring the following:
+* Available memory
+* CPU usage
+* Remaining storage
+
+For more information, contact ClearML's support team.
+
+
diff --git a/docs/deploying_clearml/enterprise_deploy/app_install_ubuntu_on_prem.md b/docs/deploying_clearml/enterprise_deploy/app_install_ubuntu_on_prem.md
new file mode 100644
index 00000000..3fbcb504
--- /dev/null
+++ b/docs/deploying_clearml/enterprise_deploy/app_install_ubuntu_on_prem.md
@@ -0,0 +1,72 @@
+---
+title: Application Installation on On-Prem and VPC Servers
+---
+
+ClearML Applications are like plugins that allow you to manage ML workloads and automatically run recurring workflows
+without any coding. Applications are installed on top of the ClearML Server.
+
+## Requirements
+To run applications, you will need the following:
+* RAM: Make sure you have at least 400 MB of RAM per application instance.
+* Applications Service: Make sure that the applications agent service is up and running on your server:
+  * If you are using a docker-compose solution, make sure that the `clearml-apps-agent` service is running.
+  * If you are using a Kubernetes cluster, check for the `clearml-clearml-enterprise-apps` component.
+* Installation Files: Each application has its own installation zip file. Make sure you have the relevant files for the
+  applications you wish to install.
+* Installation Script: See below
+
+## Air-Gapped Environments
+For air-gapped installations, you need to copy the docker images to the local registry and then update the application
+configuration files to use this repository. This can be achieved by using the `convert_image_registry.py` script with
+the `--repo` flag.
For example:
+
+```
+python convert_image_registry.py \
+    --apps-dir /path/to/apps/ \
+    --repo local_registry/clearml-apps
+```
+
+The script will change the application zip files to point to the new registry, and will output the list of containers
+that need to be copied to the local registry. For example:
+
+```
+make sure allegroai/clearml-apps:hpo-1.10.0-1062 was added to local_registry/clearml-apps
+```
+
+## Installing on ClearML Server
+The `upload_apps.py` script handles uploading the app packages to the ClearML Server. It requires Python 3.
+
+To see the options, run:
+
+```commandline
+python3 upload_apps.py --help
+```
+
+### Credentials
+The script requires a user and password (`USER_KEY`/`USER_SECRET` in the example below). These can be taken from
+the credentials of an admin user, which can be generated in the ClearML web application.
+
+### Host
+For the host, supply the `apiserver` address. If running locally on the server, you can use `localhost:8008`.
+
+### Uploading a Single Application
+
+```commandline
+python3 upload_apps.py \
+    --host <host> \
+    --user <user_key> \
+    --password <user_secret> \
+    --files "YOUR_APP.zip"
+```
+
+### Uploading Multiple Applications
+If you wish to install more than one app, you can use the `--dir` argument instead of `--files`:
+
+```commandline
+python3 upload_apps.py \
+    --host <host> \
+    --user <user_key> \
+    --password <user_secret> \
+    --dir "DIRECTORY_CONTAINING_APPS_ZIP_FILES"
+```
+
diff --git a/docs/deploying_clearml/enterprise_deploy/appgw.md b/docs/deploying_clearml/enterprise_deploy/appgw.md
new file mode 100644
index 00000000..fe302472
--- /dev/null
+++ b/docs/deploying_clearml/enterprise_deploy/appgw.md
@@ -0,0 +1,44 @@
+---
+title: AI Application Gateway
+---
+
+:::important Enterprise Feature
+This feature is available under the ClearML Enterprise plan.
+:::
+
+Services running through a cluster orchestrator such as Kubernetes or a cloud hyperscaler require meticulous configuration
+to make them externally available, as these environments do not expose their networks to external users.
+
+The ClearML AI Application Gateway facilitates setting up secure, authenticated access to jobs running on your compute
+nodes from external networks.
+
+Using the AI Application Gateway, services are allocated externally accessible, SSL-secured network routes which provide
+access in adherence to ClearML RBAC privileges. The AI Application Gateway supports HTTP/S as well as raw TCP routing.
+
+The following ClearML UI applications make use of the AI Application Gateway to provide authenticated HTTPS access to
+their instances:
+
+* GPUaaS
+  * [JupyterLab](../../webapp/applications/apps_jupyter_lab.md)
+  * [VS Code](../../webapp/applications/apps_vscode.md)
+  * [SSH Session](../../webapp/applications/apps_ssh_session.md)
+* UI Dev
+  * [Gradio launcher](../../webapp/applications/apps_gradio.md)
+  * [Streamlit launcher](../../webapp/applications/apps_streamlit.md)
+* Deploy
+  * [vLLM Deployment](../../webapp/applications/apps_model_deployment.md)
+  * [Embedding Model Deployment](../../webapp/applications/apps_embed_model_deployment.md)
+  * [Llama.cpp Model Deployment](../../webapp/applications/apps_llama_deployment.md)
+
+The AI Application Gateway is provided through an additional component to the ClearML Server deployment: the ClearML Task Traffic Router.
+If your ClearML deployment does not have the Task Traffic Router properly installed, these application instances may not be accessible.
+
+#### Installation
+
+The Task Traffic Router supports two deployment options:
+
+* [Docker Compose](appgw_install_compose.md)
+* [Kubernetes](appgw_install_k8s.md)
+
+The deployment configuration specifies the external and internal address and port mappings for routing requests.
+
diff --git a/docs/deploying_clearml/enterprise_deploy/appgw_install_compose.md b/docs/deploying_clearml/enterprise_deploy/appgw_install_compose.md
new file mode 100644
index 00000000..70335045
--- /dev/null
+++ b/docs/deploying_clearml/enterprise_deploy/appgw_install_compose.md
@@ -0,0 +1,126 @@
+# Docker-Compose Deployment
+
+## Requirements
+
+* Linux OS (x86) machine
+* Root access
+* Credentials for the ClearML/allegroai docker repository
+* A valid ClearML Server installation
+
+## Host Configuration
+
+### Docker Installation
+
+Installing Docker and Docker Compose varies depending on the specific operating system you're using. Here is an example for Amazon Linux:
+
+```
+sudo dnf -y install docker
+DOCKER_CONFIG="/usr/local/lib/docker"
+sudo mkdir -p $DOCKER_CONFIG/cli-plugins
+sudo curl -SL https://github.com/docker/compose/releases/download/v2.17.3/docker-compose-linux-x86_64 -o $DOCKER_CONFIG/cli-plugins/docker-compose
+sudo chmod +x $DOCKER_CONFIG/cli-plugins/docker-compose
+sudo systemctl enable docker
+sudo systemctl start docker
+
+sudo docker login
+```
+
+Use the ClearML/allegroai Docker Hub credentials when prompted by `docker login`.
+
+### Docker-Compose File
+
+This is an example of the docker-compose file you will need:
+
+```
+version: '3.5'
+services:
+  task_traffic_webserver:
+    image: allegroai/task-traffic-router-webserver:${TASK-TRAFFIC-ROUTER-WEBSERVER-TAG}
+    ports:
+      - "80:8080"
+    restart: unless-stopped
+    container_name: task_traffic_webserver
+    volumes:
+      - ./task_traffic_router/config/nginx:/etc/nginx/conf.d:ro
+      - ./task_traffic_router/config/lua:/usr/local/openresty/nginx/lua:ro
+  task_traffic_router:
+    image: allegroai/task-traffic-router:${TASK-TRAFFIC-ROUTER-TAG}
+    restart: unless-stopped
+    container_name: task_traffic_router
+    volumes:
+      - /var/run/docker.sock:/var/run/docker.sock
+      - ./task_traffic_router/config/nginx:/etc/nginx/conf.d:rw
+      - ./task_traffic_router/config/lua:/usr/local/openresty/nginx/lua:rw
+    environment:
+      - LOGGER_LEVEL=INFO
+      - CLEARML_API_HOST=${CLEARML_API_HOST:?err}
+      - CLEARML_API_ACCESS_KEY=${CLEARML_API_ACCESS_KEY:?err}
+      - CLEARML_API_SECRET_KEY=${CLEARML_API_SECRET_KEY:?err}
+      - ROUTER_URL=${ROUTER_URL:?err}
+      - ROUTER_NAME=${ROUTER_NAME:?err}
+      - AUTH_ENABLED=${AUTH_ENABLED:?err}
+      - SSL_VERIFY=${SSL_VERIFY:?err}
+      - AUTH_COOKIE_NAME=${AUTH_COOKIE_NAME:?err}
+      - AUTH_BASE64_JWKS_KEY=${AUTH_BASE64_JWKS_KEY:?err}
+      - LISTEN_QUEUE_NAME=${LISTEN_QUEUE_NAME}
+      - EXTRA_BASH_COMMAND=${EXTRA_BASH_COMMAND}
+      - TCP_ROUTER_ADDRESS=${TCP_ROUTER_ADDRESS}
+      - TCP_PORT_START=${TCP_PORT_START}
+      - TCP_PORT_END=${TCP_PORT_END}
+```
+
+Create a *runtime.env* file containing the following entries:
+
+```
+TASK-TRAFFIC-ROUTER-WEBSERVER-TAG=
+TASK-TRAFFIC-ROUTER-TAG=
+CLEARML_API_HOST=https://api.
+CLEARML_API_ACCESS_KEY=
+CLEARML_API_SECRET_KEY=
+ROUTER_URL=
+ROUTER_NAME=main-router
+AUTH_ENABLED=true
+SSL_VERIFY=true
+AUTH_COOKIE_NAME=
+AUTH_BASE64_JWKS_KEY=
+LISTEN_QUEUE_NAME=
+EXTRA_BASH_COMMAND=
+TCP_ROUTER_ADDRESS=
+TCP_PORT_START=
+TCP_PORT_END=
+```
+
+Edit it according to the following guidelines:
+
+* `CLEARML_API_HOST`: URL usually starting with `https://api.`
+* `CLEARML_API_ACCESS_KEY`: ClearML server API key
+* `CLEARML_API_SECRET_KEY`: ClearML server secret key
+* `ROUTER_URL`: URL for this router that was previously configured in the load balancer, starting with `https://`
+* `ROUTER_NAME`: unique name for this router
+* `AUTH_ENABLED`: enable or disable authentication on HTTP calls the router makes to the ClearML server
+* `SSL_VERIFY`: enable or disable SSL certificate validation when the router is communicating with the ClearML server
+* `AUTH_COOKIE_NAME`: the cookie name used by the ClearML server to store the ClearML authentication cookie. This can usually be found in the `value_prefix` key starting with `allegro_token` in the `envoy.yaml` file in the ClearML server installation (`/opt/allegro/config/envoy/envoy.yaml`) (see below)
+* `AUTH_SECURE_ENABLED`: enable the Set-Cookie `secure` parameter
+* `AUTH_BASE64_JWKS_KEY`: value from the `k` key in the `jwks.json` file in the ClearML server installation
+* `LISTEN_QUEUE_NAME`: (optional) name of the queue to check for tasks (if none, every task is checked)
+* `EXTRA_BASH_COMMAND`: command to be launched before starting the router
+* `TCP_ROUTER_ADDRESS`: router external address; can be an IP, the host machine, or a load balancer hostname, depending on the network configuration
+* `TCP_PORT_START`: start port for the TCP Session feature
+* `TCP_PORT_END`: end port for the TCP Session feature
+
+Run the following command to start the router:
+
+```
+sudo docker compose --env-file runtime.env up -d
+```
+
+:::note How to find my JWKS key
+
+The *JSON Web Key Set* (*JWKS*) is a set of keys 
containing the public keys used to verify any JSON Web Token (JWT).
+
+In a docker-compose server installation, this can be found in the `CLEARML__secure__auth__token_secret` env var in the apiserver server component.
+
+:::
+
+
diff --git a/docs/deploying_clearml/enterprise_deploy/appgw_install_k8s.md b/docs/deploying_clearml/enterprise_deploy/appgw_install_k8s.md
new file mode 100644
index 00000000..9eef033b
--- /dev/null
+++ b/docs/deploying_clearml/enterprise_deploy/appgw_install_k8s.md
@@ -0,0 +1,96 @@
+# Kubernetes Deployment
+
+This guide details the installation of the ClearML AI Application Gateway, specifically the ClearML Task Router component.
+
+## Requirements
+
+* Kubernetes cluster: `>= 1.21.0-0 < 1.32.0-0`
+* Helm installed and configured
+* Helm token to access the allegroai helm-chart repo
+* Credentials for the allegroai docker repo
+* A valid ClearML Server installation
+
+## Optional for HTTPS
+
+* A valid DNS entry for the new TTR instance
+* A valid SSL certificate
+
+## Helm
+
+### Login
+
+```
+helm repo add allegroai-enterprise \
+https://raw.githubusercontent.com/allegroai/clearml-enterprise-helm-charts/gh-pages \
+--username <user> \
+--password <token>
+```
+
+### Prepare Values
+
+Before installing the TTR, create a Helm override file named `task-traffic-router.values-override.yaml`:
+
+```
+imageCredentials:
+  password: ""
+clearml:
+  apiServerKey: ""
+  apiServerSecret: ""
+  apiServerUrlReference: "https://api."
+  jwksKey: ""
+  authCookieName: ""
+ingress:
+  enabled: true
+  hostName: "task-router.dev"
+tcpSession:
+  routerAddress: ""
+  portRange:
+    start: <start-port>
+    end: <end-port>
+```
+
+Edit it according to the following guidelines:
+
+* `clearml.apiServerUrlReference`: URL usually starting with `https://api.`
+* `clearml.apiServerKey`: ClearML server API key
+* `clearml.apiServerSecret`: ClearML server secret key
+* `ingress.hostName`: URL of the router we configured previously for the load balancer, starting with `https://`
+* `clearml.sslVerify`: enable or disable SSL certificate validation on apiserver calls
+* `clearml.authCookieName`: value from the `value_prefix` key starting with `allegro_token` in the `envoy.yaml` file in the ClearML server installation.
+* `clearml.jwksKey`: value from the `k` key in the `jwks.json` file in the ClearML server installation (see below)
+* `tcpSession.routerAddress`: router external address; can be an IP, the host machine, or a load balancer hostname, depending on the network configuration
+* `tcpSession.portRange.start`: start port for the TCP Session feature
+* `tcpSession.portRange.end`: end port for the TCP Session feature
+
+:::note How to find my JWKS key
+
+The *JSON Web Key Set* (*JWKS*) is a set of keys containing the public keys used to verify any JSON Web Token (JWT).
+ +``` +kubectl -n clearml get secret clearml-conf \ +-o jsonpath='{.data.secure_auth_token_secret}' \ +| base64 -d && echo +``` + +::: + + +The whole list of supported configuration is available with the command: + +``` +helm show readme allegroai-enterprise/clearml-enterprise-task-traffic-router +``` + +### Install + +To install the TTR component via Helm use the following command: + +``` +helm upgrade --install \ + \ +-n \ +allegroai-enterprise/clearml-enterprise-task-traffic-router \ +--version \ +-f task-traffic-router.values-override.yaml +``` + diff --git a/docs/deploying_clearml/enterprise_deploy/change_artifact_links.md b/docs/deploying_clearml/enterprise_deploy/change_artifact_links.md new file mode 100644 index 00000000..34e67d5c --- /dev/null +++ b/docs/deploying_clearml/enterprise_deploy/change_artifact_links.md @@ -0,0 +1,78 @@ +--- +title: Changing ClearML Artifacts Links +--- + +This guide describes how to update artifact references in the ClearML Enterprise server. + +By default, artifacts are stored on the file server; however, an external storage such as AWS S3, Minio, Google Cloud +Storage, etc. may be used to store artifacts. References to these artifacts may exist in ClearML databases: MongoDB and ElasticSearch. +This procedure should be used if external storage is being migrated to a different location or URL. + +:::important +This procedure does not deal with the actual migration of the data--only with changing the references in ClearML that +point to the data. +::: + +## Preparation + +### Version Confirmation + +To change the links, use the `fix_fileserver_urls.py` script, located inside the `allegro-apiserver` +Docker container. This script will be executed from within the `apiserver` container. Make sure the `apiserver` version +is 3.20 or higher. 
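Note that versions compare component-wise, not as strings (`3.9` is older than `3.20`, even though it sorts after it lexicographically). A small sketch, not part of the upgrade tooling, illustrating the comparison used when checking the minimum version:

```python
from typing import Tuple

def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted version string into a tuple of ints for numeric comparison."""
    return tuple(int(part) for part in version.split("."))

def meets_minimum(current: str, minimum: str = "3.20") -> bool:
    """True if `current` is at least `minimum`, comparing component by component."""
    # Tuple comparison avoids the lexicographic trap where "3.9" > "3.20" as strings.
    return parse_version(current) >= parse_version(minimum)

print(meets_minimum("3.22"))  # True
print(meets_minimum("3.9"))   # False (3.9 < 3.20 numerically)
```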
+
+### Backup
+
+It is highly recommended to back up the ClearML MongoDB and ElasticSearch databases before running the script, as the
+script changes the values in the databases and this cannot be undone.
+
+## Fixing MongoDB links
+
+1. Access the `apiserver` Docker container:
+   * In `docker-compose`:
+
+     ```commandline
+     sudo docker exec -it allegro-apiserver /bin/bash
+     ```
+
+   * In Kubernetes:
+
+     ```commandline
+     kubectl exec -it -n clearml -- bash
+     ```
+
+1. Navigate to the script location in the `upgrade` folder:
+
+   ```commandline
+   cd /opt/seematics/apiserver/server/upgrade
+   ```
+
+1. Run the following command:
+
+   :::important
+   Before running the script, verify that this is indeed the correct version (`apiserver` v3.20 or higher,
+   or that the script provided by ClearML was copied into the container).
+   :::
+
+   ```commandline
+   python3 fix_fileserver_urls.py \
+   --mongo-host mongodb://mongo:27017 \
+   --elastic-host elasticsearch:9200 \
+   --host-source "" \
+   --host-target "" --datasets
+   ```
+
+:::note Notes
+* If the MongoDB or ElasticSearch services are accessed from the `apiserver` container using custom addresses, then the
+`--mongo-host` and `--elastic-host` arguments should be updated accordingly.
+* If ElasticSearch is set up to require authentication, then the following arguments should be used to pass the user
+and password: `--elastic-user --elastic-password `
+:::
+
+The script fixes the links in MongoDB, and outputs `cURL` commands for updating the links in ElasticSearch.
+
+## Fixing the ElasticSearch Links
+
+Copy the `cURL` commands printed by the script run in the previous stage, and run them one after the other. Make sure to
+verify that a "success" result was returned from each command. Depending on the amount of data in ElasticSearch,
+running these commands may take some time.
\ No newline at end of file
diff --git a/docs/deploying_clearml/enterprise_deploy/custom_billing.md b/docs/deploying_clearml/enterprise_deploy/custom_billing.md
new file mode 100644
index 00000000..3c3ee30d
--- /dev/null
+++ b/docs/deploying_clearml/enterprise_deploy/custom_billing.md
@@ -0,0 +1,118 @@
+---
+title: Custom Billing Events
+---
+
+ClearML supports sending custom events to selected Kafka topics. Event sending is triggered by API calls and
+is available only for companies that have the `custom_events` settings set.
+
+## Enabling Custom Events in ClearML Server
+
+:::important Prerequisite
+**Precondition**: The customer Kafka for custom events is installed and reachable from the `apiserver`.
+:::
+
+Set the following environment variables in the ClearML Enterprise helm chart under `apiserver.extraEnv`:
+
+* Enable custom events:
+
+  ```
+  - name: CLEARML__services__custom_events__enabled
+    value: "true"
+  ```
+* Mount custom message template files into the `/mnt/custom_events/templates` folder in the `apiserver` container and point
+  the `apiserver` to it:
+
+  ```
+  - name: CLEARML__services__custom_events__template_folder
+    value: "/mnt/custom_events/templates"
+  ```
+* Configure the Kafka host for sending events:
+
+  ```
+  - name: CLEARML__hosts__kafka__custom_events__host
+    value: "[]"
+  ```
+  Configure the Kafka security parameters. Below is an example for SASL plaintext security:
+
+  ```
+  - name: CLEARML__SECURE__KAFKA__CUSTOM_EVENTS__security_protocol
+    value: "SASL_PLAINTEXT"
+  - name: CLEARML__SECURE__KAFKA__CUSTOM_EVENTS__sasl_mechanism
+    value: "SCRAM-SHA-512"
+  - name: CLEARML__SECURE__KAFKA__CUSTOM_EVENTS__sasl_plain_username
+    value: ""
+  - name: CLEARML__SECURE__KAFKA__CUSTOM_EVENTS__sasl_plain_password
+    value: ""
+  ```
+* Define Kafka topics for lifecycle and inventory messages:
+
+  ```
+  - name: CLEARML__services__custom_events__channels__main__topics__service_instance_lifecycle
+    value: "lifecycle"
+  - name: CLEARML__services__custom_events__channels__main__topics__service_instance_inventory
+    value: "inventory"
+  ```
+* For the desired companies, set up the custom events properties required by the event message templates:
+
+  ```
+  curl $APISERVER_URL/system.update_company_custom_events_config -H "Content-Type: application/json" -u $APISERVER_KEY:$APISERVER_SECRET -d '{
+  "company": "",
+  "fields": {
+    "service_instance_id": "",
+    "service_instance_name": "",
+    "service_instance_customer_tenant_name": "",
+    "service_instance_customer_space_name": "",
+    "service_instance_customer_space_id": "",
+    "parameters_connection_points": ["", ""]
+  }}'
+  ```
+
+## Sending Custom Events to the API Server
+
+:::important Prerequisite
+**Precondition:** A dedicated custom-events Redis instance is installed and reachable from all the custom events deployments.
+:::
+
+Environment lifecycle events are sent directly by the `apiserver`. Other event types are emitted by the following helm charts:
+
+* `clearml-pods-monitor-exporter` - Monitors running pods and sends container lifecycle events (should run one per cluster with a unique identifier; a UUID is required for the installation):
+
+  ```
+  # -- Universal Unique string to identify Pods Monitor instances across worker clusters. It cannot be empty.
+  # Uniqueness is required across different cluster installations to preserve the reported data status.
+  podsMonitorUUID: ""
+  # Interval
+  checkIntervalSeconds: 60
+  ```
+* `clearml-pods-inventory` - Periodically sends inventory events about running pods.
+
+  ```
+  # Cron schedule - https://crontab.guru/
+  cronJob:
+    schedule: "@daily"
+  ```
+* `clearml-company-inventory` - Monitors ClearML companies and sends environment inventory events.
+
+  ```
+  # Cron schedule - https://crontab.guru/
+  cronJob:
+    schedule: "@daily"
+  ```
+
+For every script chart, add the configuration below to enable Redis access and connection to the `apiserver`:
+
+```
+clearml:
+  apiServerUrlReference: ""
+  apiServerKey: ""
+  apiServerSecret: ""
+redisConnection:
+  host: ""
+  port:
+  password: ""
+```
+
+See all other available options to customize the `custom-events` charts by running:
+```
+helm show readme allegroai-enterprise/
+```
\ No newline at end of file
diff --git a/docs/deploying_clearml/enterprise_deploy/delete_tenant.md b/docs/deploying_clearml/enterprise_deploy/delete_tenant.md
new file mode 100644
index 00000000..9542de79
--- /dev/null
+++ b/docs/deploying_clearml/enterprise_deploy/delete_tenant.md
@@ -0,0 +1,115 @@
+---
+title: Deleting Tenants from ClearML
+---
+
+The following is a step-by-step guide for deleting tenants (i.e. companies, workspaces) from ClearML.
+
+:::caution
+Deleting a tenant is a destructive operation that cannot be undone.
+* Make sure you have the data prior to deleting the tenant.
+* Backing up the system before deleting is recommended.
+:::
+
+The tenant deletion is done from MongoDB, ElasticSearch, and the Fileserver.
+
+The first two are done from within the `apiserver` container, and the last from within the `fileserver` container.
+
+Any external artifacts (e.g. AWS S3, GCS, MinIO) can be removed manually.
+
+## Deleting Tenants from MongoDB and ElasticSearch
+
+1.
Enter the `apiserver` in one of the following ways:
+   * In `docker-compose`:
+
+     ```
+     sudo docker exec -it allegro-apiserver /bin/bash
+     ```
+   * In Kubernetes:
+
+     ```
+     kubectl -n exec -it -c clearml-apiserver -- /bin/bash
+     ```
+
+1. Set the ID and the name of the company (tenant) you wish to delete:
+
+   ```
+   tenant_to_delete=
+   company_name_to_delete=""
+   ```
+
+1. Delete the company's data from MongoDB:
+
+   ```
+   PYTHONPATH=../trains-server-repo python3 \
+   -m jobs.management.delete_company_data_from_mongo \
+   --id $tenant_to_delete \
+   --name "$company_name_to_delete" \
+   --delete-user
+   ```
+
+   :::note
+   This also deletes the admin users. Remove `--delete-user` to avoid this.
+   :::
+
+1. Delete the company's data from ElasticSearch:
+
+   ```
+   PYTHONPATH=../trains-server-repo python3 \
+   -m jobs.management.cleanup_deleted_companies \
+   --ids $tenant_to_delete --delete-company
+   ```
+
+1. Exit the pod/container
+
+## Deleting Tenants from the Fileserver
+
+To remove a tenant's data from the fileserver, you can choose one of the following methods, depending on your deployment setup:
+
+* Option 1: Delete the tenant's data from within the fileserver container or pod.
+* Option 2: Delete the tenant's data externally from the host system.
+
+### Option 1 - From Within the Fileserver
+
+
+1. Enter the `fileserver` in one of the following ways:
+   * In `docker-compose`:
+
+     ```
+     sudo docker exec -it allegro-fileserver /bin/bash
+     ```
+   * In Kubernetes:
+
+     ```
+     kubectl -n exec -it -c clearml-fileserver -- /bin/bash
+     ```
+
+1. Run the following:
+
+   ```
+   rm -rf /mnt/fileserver/
+   ```
+
+1. Exit the pod/container
+
+### Option 2 - External Deletion
+
+#### Docker Compose
+
+Run the following:
+
+```
+rm -rf /opt/allegro/data/fileserver/
+```
+
+#### Kubernetes
+
+Run the following:
+
+```
+kubectl -n exec -it -c clearml-apiserver -- /bin/bash -c "PYTHONPATH=../trains-server-repo python3 -m jobs.management.delete_company_data_from_mongo --id --delete-user"
+
+kubectl -n exec -it -c clearml-apiserver -- /bin/bash -c "PYTHONPATH=../trains-server-repo python3 -m jobs.management.cleanup_deleted_companies --ids --delete-company"
+
+kubectl -n exec -it -c clearml-fileserver -- /bin/bash -c "rm -rf /mnt/fileserver/"
+```
+
diff --git a/docs/deploying_clearml/enterprise_deploy/import_projects.md b/docs/deploying_clearml/enterprise_deploy/import_projects.md
new file mode 100644
index 00000000..3fd8a624
--- /dev/null
+++ b/docs/deploying_clearml/enterprise_deploy/import_projects.md
@@ -0,0 +1,240 @@
+---
+title: Exporting and Importing ClearML Projects
+---
+
+When migrating from a ClearML Open Server to a ClearML Enterprise Server, you may need to transfer projects. This is done
+using the `data_tool.py` script. This utility is available in the `apiserver` Docker image, and can be used for
+exporting and importing ClearML project data for both open source and Enterprise versions.
+
+This guide covers the following:
+* Exporting data from Open Source and Enterprise servers
+* Importing data into an Enterprise server
+* Handling the artifacts stored on the file server.
+
+:::note
+Export instructions differ for ClearML open and Enterprise servers. Make sure you follow the guidelines that match your
+server type.
+:::
+
+## Exporting Data
+
+The export process is done by running the ***data_tool*** script, which generates a zip file containing project and task
+data. This file should then be copied to the server on which the import will run.
+
+Note that artifacts stored in the ClearML ***file server*** should be copied manually if required (see [Handling Artifacts](#handling-artifacts)).
+
+### Exporting Data from ClearML Open Servers
+
+#### Preparation
+
+* Make sure the `apiserver` is at least Open Source server version 1.12.0.
+* Note that any `pending` or `running` tasks will not be exported. If you wish to export them, make sure to stop/dequeue
+them before exporting.
+
+#### Running the Data Tool
+
+Execute the data tool within the `apiserver` container.
+
+Open a bash session inside the `apiserver` container of the server:
+* In `docker-compose`:
+
+  ```commandline
+  sudo docker exec -it clearml-apiserver /bin/bash
+  ```
+
+* In Kubernetes:
+
+  ```commandline
+  kubectl exec -it -n -- bash
+  ```
+
+#### Export Commands
+**To export specific projects:**
+
+```commandline
+python3 -m apiserver.data_tool export --projects \
+--statuses created stopped published failed completed --output .zip
+```
+
+As a result, you should get a `.zip` file that contains all the data from the specified projects and
+their children.
+
+**To export all the projects:**
+
+```commandline
+python3 -m apiserver.data_tool export \
+  --all \
+  --statuses created stopped published failed completed \
+  --output .zip
+```
+
+#### Optional Parameters
+
+* `--experiments ` - If not specified, all experiments from the specified projects are exported
+* `--statuses ` - Export tasks of specific statuses. If the parameter
+  is omitted, only `published` tasks are exported
+* `--no-events` - Do not export task events, i.e. logs and metrics (scalar, plots, debug samples).
+
+Make sure to copy the generated zip file containing the exported data.
+
+### Exporting Data from ClearML Enterprise Servers
+
+#### Preparation
+
+* Make sure the `apiserver` is at least Enterprise Server version 3.18.0.
+* Note that any `pending` or `running` tasks will not be exported. If you wish to export them, make sure to stop/dequeue
+them before exporting.
+
+#### Running the Data Tool
+
+Execute the data tool from within the `apiserver` Docker container.
+
+Open a bash session inside the `apiserver` container of the server:
+* In `docker-compose`:
+
+  ```commandline
+  sudo docker exec -it allegro-apiserver /bin/bash
+  ```
+
+* In Kubernetes:
+
+  ```commandline
+  kubectl exec -it -n -- bash
+  ```
+
+#### Export Commands
+
+**To export specific projects:**
+
+```commandline
+PYTHONPATH=/opt/seematics/apiserver/trains-server-repo python3 data_tool.py \
+  export \
+  --projects \
+  --statuses created stopped published failed completed \
+  --output .zip
+```
+
+As a result, you should get a `.zip` file that contains all the data from the specified projects and
+their children.
+
+**To export all the projects:**
+
+```commandline
+PYTHONPATH=/opt/seematics/apiserver/trains-server-repo python3 data_tool.py \
+  export \
+  --all \
+  --statuses created stopped published failed completed \
+  --output .zip
+```
+
+#### Optional Parameters
+
+* `--experiments ` - If not specified, all experiments from the specified projects are exported
+* `--statuses ` - Export tasks of specific statuses. If the parameter is
+  omitted, only `published` tasks are exported.
+* `--no-events` - Do not export task events, i.e. logs and metrics (scalar, plots, debug samples).
+
+Make sure to copy the generated zip file containing the exported data.
+
+## Importing Data
+
+This section explains how to import the exported data into a ClearML Enterprise server.
+
+### Preparation
+
+* It is highly recommended to back up the ClearML databases before importing data, as import injects data into the
+databases and cannot be undone.
+* Make sure you are working with `apiserver` version 3.22.3 or higher.
+* Make the zip file accessible from within the `apiserver` container by copying the exported data to the
+`apiserver` container or to a folder on the host to which the `apiserver` is mounted.
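Before copying the archive into the container, it can help to verify it is a readable zip. A generic sketch using only the standard library; `export.zip` stands in for whatever name you passed to `--output`:

```python
import os
import zipfile

def archive_ok(path: str) -> bool:
    """Return True if `path` is a valid zip whose members all pass a CRC check."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        # testzip() returns the name of the first corrupt member, or None if all are fine.
        return zf.testzip() is None

if os.path.exists("export.zip"):  # "export.zip" is a placeholder name
    print("archive ok:", archive_ok("export.zip"))
```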
+
+### Usage
+
+The data tool should be executed from within the `apiserver` Docker container.
+
+1. Open a bash session inside the `apiserver` container of the server:
+   * In `docker-compose`:
+
+     ```commandline
+     sudo docker exec -it allegro-apiserver /bin/bash
+     ```
+
+   * In Kubernetes:
+
+     ```commandline
+     kubectl exec -it -n -- bash
+     ```
+
+1. Run the data tool script in *import* mode:
+
+   ```commandline
+   PYTHONPATH=/opt/seematics/apiserver/trains-server-repo python3 data_tool.py \
+     import \
+     \
+     --company \
+     --user
+   ```
+
+   * `company_id` - The default company ID used in the target deployment. Inside the `apiserver` container you can
+     usually get it from the environment variable `CLEARML__APISERVER__DEFAULT_COMPANY`.
+     If you do not specify the `--company` parameter, then all the data will be imported as `Examples` (read-only)
+   * `user_id` - The ID of the user in the target deployment who will become the owner of the imported data
+
+## Handling Artifacts
+
+***Artifacts*** refers to any content that the ClearML server holds references to. This can include:
+* Dataset or Hyper-Dataset frame URLs
+* ClearML artifact URLs
+* Model snapshots
+* Debug samples
+
+Artifacts may be stored in any external storage (e.g., AWS S3, MinIO, Google Cloud Storage) or in the ClearML file server.
+* If the artifacts are **not** stored in the ClearML file server, they do not need to be moved during the export/import process,
+as the URLs registered in ClearML entities pointing to these artifacts will not change.
+* If the artifacts are stored in the ClearML file server, then the file server content must also be moved, and the URLs
+  in the ClearML databases must point to the new location. See instructions [below](#exporting-file-server-data-for-clearml-open-server).
+
+### Exporting File Server Data for ClearML Open Server
+
+Data in the file server is organized by project. For each project, all data referenced by entities in that project is
+stored in a folder bearing the name of the project. This folder can be located in:
+
+```
+/opt/clearml/data/fileserver/
+```
+
+The entire content of the project folders should be copied to the target server (see [Importing Fileserver Data](#importing-file-server-data)).
+
+### Exporting File Server Data for ClearML Enterprise Server
+
+Data in the file server is organized by tenant and project. For each project, all data referenced by entities in that
+project is stored in a folder bearing the name of the project. This folder can be located in:
+
+```
+/opt/allegro/data/fileserver//
+```
+
+The entire content of the project folders should be copied to the target server (see [Importing Fileserver Data](#importing-file-server-data)).
+
+## Importing File Server Data
+
+### Copying the Data
+
+Place the content of the exported project folder(s) into the target file server's storage in the following folder:
+
+```
+/opt/allegro/data/fileserver//
+```
+
+### Fixing Registered URLs
+
+Since URLs pointing to the file server contain the file server's address, these need to be changed to the address of the
+new file server.
+
+Note that this is not required if the new file server is replacing the old file server and can be accessed using the
+exact same address.
+
+Once the projects' data has been copied to the target server, and the projects themselves have been imported, see
+[Changing ClearML Artifacts Links](change_artifact_links.md) for information on how to fix the URLs.
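The URL fix boils down to swapping the old file server address for the new one in every stored artifact URL, leaving external-storage URLs untouched. As a rough illustration only (the real work is done by the `fix_fileserver_urls.py` script described in the linked guide), with hypothetical addresses:

```python
def rewrite_url(url: str, old_host: str, new_host: str) -> str:
    """Replace the file server address prefix of a stored artifact URL."""
    if url.startswith(old_host):
        return new_host + url[len(old_host):]
    return url  # URLs on external storage (S3, GCS, ...) are left untouched

# Hypothetical addresses for illustration:
old = "https://files.old-server.example.com"
new = "https://files.new-server.example.com"
print(rewrite_url(old + "/tenant/project/model.bin", old, new))
# -> https://files.new-server.example.com/tenant/project/model.bin
print(rewrite_url("s3://bucket/model.bin", old, new))
# -> s3://bucket/model.bin (unchanged)
```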
+
+
diff --git a/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md b/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md
new file mode 100644
index 00000000..45b4d4d2
--- /dev/null
+++ b/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md
@@ -0,0 +1,543 @@
+---
+title: Multi-Tenant Service on Kubernetes
+---
+
+This guide provides step-by-step instructions for installing a ClearML multi-tenant service on a Kubernetes cluster.
+
+It covers the installation and configuration steps necessary to set up ClearML in a cloud environment, including
+enabling specific features and setting up necessary components.
+
+## Prerequisites
+
+* A Kubernetes cluster
+* Credentials for the ClearML Enterprise Helm chart repository
+* Credentials for the ClearML Enterprise DockerHub repository
+* Credentials for the ClearML billing DockerHub repository
+* URL for downloading the ClearML Enterprise applications configuration
+* ClearML Billing server Helm chart
+
+## Setting up ClearML Helm Repository
+
+You need to add the ClearML Enterprise Helm repository to your local Helm setup. This repository contains the Helm
+charts required for deploying the ClearML Server and its components.
+
+Add the ClearML Enterprise repository using the following command. Replace `` with the private tokens sent to
+you by ClearML:
+
+```
+helm repo add allegroai-enterprise --username --password
+```
+
+## Enabling Dynamic MIG GPUs
+
+Allocating GPU fractions dynamically makes use of the NVIDIA GPU operator.
+
+1. Add the NVIDIA Helm repository:
+
+   ```
+   helm repo add nvidia
+   helm repo update
+   ```
+
+2. Install the NVIDIA GPU operator with the following configuration:
+
+   ```
+   helm install -n gpu-operator \
+     gpu-operator \
+     nvidia/gpu-operator \
+     --create-namespace \
+     --set migManager.enabled=false \
+     --set mig.strategy=mixed
+   ```
+
+## Install CDMO Chart
+
+The ClearML Dynamic MIG Operator (CDMO) enables running AI workloads on k8s with optimized hardware utilization and
+workload performance by facilitating MIG GPU partitioning.
+
+1. Prepare the `overrides.yaml` file with the following content. Replace ``
+   with the private token provided by ClearML:
+
+   ```
+   imageCredentials:
+     password: ""
+   ```
+
+2. Install the CDMO chart:
+
+   ```
+   helm install -n cdmo-operator \
+     cdmo \
+     allegroai-enterprise/clearml-dynamic-mig-operator \
+     --create-namespace \
+     -f overrides.yaml
+   ```
+
+### Enable MIG support
+
+1. Enable dynamic MIG support on your cluster by running the following command on **all nodes used for training** (run
+   for **each GPU** ID in your cluster):
+
+   ```
+   nvidia-smi -i -mig 1
+   ```
+
+   This command can be issued from inside the `nvidia-device-plugin-daemonset` pod on the related node.
+
+   If the result of the previous command indicates that a node reboot is necessary, perform the reboot.
+
+2. After enabling MIG support, label the MIG GPU nodes accordingly. This labeling helps in identifying nodes configured
+   with MIG support for resource management and scheduling:
+
+   ```
+   kubectl label nodes "cdmo.clear.ml/gpu-partitioning=mig"
+   ```
+
+## Install ClearML Chart
+
+Install the ClearML chart with the required configuration:
+
+1. Prepare the `overrides.yaml` file and input the following content. Make sure to replace `` and ``
+   with a valid domain that will have records pointing to the ingress controller accordingly.
+   The credentials specified in `` and `` can be used to log in as the
+   supervisor user in the web UI.
+   Note that the `` value must be explicitly quoted.
To do so, put `\"` around the quoted value.
+   For example `"\"email@example.com\""`.
+
+   ```
+   imageCredentials:
+     password: ""
+   clearml:
+     cookieDomain: ""
+   apiserver:
+     image:
+       tag: "3.21.6-1443"
+     ingress:
+       enabled: true
+       hostName: "api."
+     service:
+       type: ClusterIP
+     extraEnvs:
+       - name: CLEARML__billing__enabled
+         value: "true"
+       - name: CLEARML__HOSTS__KAFKA__BILLING__HOST
+         value: "[clearml-billing-kafka.clearml-billing:9092]"
+       - name: CLEARML__HOSTS__REDIS__BILLING__HOST
+         value: clearml-billing-redis-master.clearml-billing
+       - name: CLEARML__HOSTS__REDIS__BILLING__DB
+         value: "2"
+       - name: CLEARML__SECURE__KAFKA__BILLING__security_protocol
+         value: SASL_PLAINTEXT
+       - name: CLEARML__SECURE__KAFKA__BILLING__sasl_mechanism
+         value: SCRAM-SHA-512
+       - name: CLEARML__SECURE__KAFKA__BILLING__sasl_plain_username
+         value: billing
+       - name: CLEARML__SECURE__KAFKA__BILLING__sasl_plain_password
+         value: "jdhfKmsd1"
+       - name: CLEARML__secure__login__sso__oauth_client__auth0__client_id
+         value: ""
+       - name: CLEARML__secure__login__sso__oauth_client__auth0__client_secret
+         value: ""
+       - name: CLEARML__services__login__sso__oauth_client__auth0__base_url
+         value: ""
+       - name: CLEARML__services__login__sso__oauth_client__auth0__authorize_url
+         value: ""
+       - name: CLEARML__services__login__sso__oauth_client__auth0__access_token_url
+         value: ""
+       - name: CLEARML__services__login__sso__oauth_client__auth0__audience
+         value: ""
+       - name: CLEARML__services__organization__features__user_management_advanced
+         value: "true"
+       - name: CLEARML__services__auth__ui_features_per_role__user__show_datasets
+         value: "false"
+       - name: CLEARML__services__auth__ui_features_per_role__user__show_orchestration
+         value: "false"
+       - name: CLEARML__services__applications__max_running_apps_per_company
+         value: "3"
+       - name: CLEARML__services__auth__default_groups__users__features
+         value: "[\"applications\"]"
+       - name: CLEARML__services__auth__default_groups__admins__features
+         value: "[\"config_vault\", \"experiments\", \"queues\", \"show_projects\", \"resource_dashboard\", \"user_management\", \"user_management_advanced\", \"app_management\", \"sso_management\", \"service_users\", \"resource_policy\"]"
+       - name: CLEARML__services__workers__resource_usages__supervisor_company
+         value: "d1bd92a3b039400cbafc60a7a5b1e52b" # Default company
+       - name: CLEARML__secure__credentials__supervisor__role
+         value: "system"
+       - name: CLEARML__secure__credentials__supervisor__allow_login
+         value: "true"
+       - name: CLEARML__secure__credentials__supervisor__user_key
+         value: ""
+       - name: CLEARML__secure__credentials__supervisor__user_secret
+         value: ""
+       - name: CLEARML__secure__credentials__supervisor__sec_groups
+         value: "[\"users\", \"admins\", \"queue_admins\"]"
+       - name: CLEARML__secure__credentials__supervisor__email
+         value: "\"\""
+       - name: CLEARML__apiserver__company__unique_names
+         value: "true"
+   fileserver:
+     ingress:
+       enabled: true
+       hostName: "file."
+     service:
+       type: ClusterIP
+   webserver:
+     image:
+       tag: "3.21.3-1657"
+     ingress:
+       enabled: true
+       hostName: "app."
+     service:
+       type: ClusterIP
+   clearmlApplications:
+     enabled: true
+   ```
+
+2. Install ClearML:
+
+   ```
+   helm install -n clearml \
+     clearml \
+     allegroai-enterprise/clearml-enterprise \
+     --create-namespace \
+     -f overrides.yaml
+   ```
+
+## Shared Redis Installation
+
+Set up a shared Redis instance that multiple components of your ClearML deployment can use:
+
+1. If not there already, add the Bitnami repository:
+
+   ```
+   helm repo add bitnami
+   ```
+
+2. Prepare the `overrides.yaml` with the following content:
+
+   ```
+   auth:
+     password: "sdkWoq23"
+   ```
+
+3. Install Redis:
+
+   ```
+   helm install -n redis-shared \
+     redis \
+     bitnami/redis \
+     --create-namespace \
+     --version=17.8.3 \
+     -f overrides.yaml
+   ```
+
+## Install Billing Chart
+
+The billing chart is not available as part of the ClearML private Helm repo.
`clearml-billing-1.1.0.tgz` is directly
+provided by the ClearML team.
+
+1. Prepare `values.override.yaml` - Create the file with the following content, replacing ``
+   with the appropriate value:
+
+   ```
+   imageCredentials:
+     username: dockerhubcustpocbillingaccess
+     password: ""
+   ```
+
+1. Install the billing chart:
+
+   ```
+   helm install -n clearml-billing \
+     clearml-billing \
+     clearml-billing-1.1.0.tgz \
+     --create-namespace \
+     -f values.override.yaml
+   ```
+
+## Namespace Isolation using Network Policies
+
+For enhanced security, isolate namespaces using the following NetworkPolicies:
+
+```
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: default-deny-ingress
+  namespace: clearml
+spec:
+  podSelector: {}
+  policyTypes:
+    - Ingress
+  ingress:
+    - from:
+        - podSelector: {}
+---
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: allow-clearml-ingress
+  namespace: clearml
+spec:
+  podSelector:
+    matchLabels:
+      app.kubernetes.io/name: clearml-clearml-enterprise
+  policyTypes:
+    - Ingress
+  ingress:
+    - from:
+        - ipBlock:
+            cidr: 0.0.0.0/0
+---
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: allow-clearml-ingress
+  namespace: clearml-billing
+spec:
+  podSelector: {}
+  policyTypes:
+    - Ingress
+  ingress:
+    - from:
+        - podSelector: {}
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: clearml
+```
+
+## Applications Installation
+
+To install ClearML GUI applications, follow these steps:
+
+1. Get the apps to install and the installation script by downloading and extracting the archive provided by ClearML:
+
+   ```
+   wget -O apps.zip ""
+   unzip apps.zip
+   ```
+
+2. Install the apps:
+
+   ```
+   python upload_apps.py \
+     --host $APISERVER_ADDRESS \
+     --user $APISERVER_USER --password $APISERVER_PASSWORD \
+     --dir apps -ml
+   ```
+
+## Tenant Configuration
+
+Create tenants and corresponding admin users, and set up an SSO domain whitelist for secure access. To configure tenants,
+follow these steps (all requests must be authenticated by root or admin). Note that placeholders like ``
+must be substituted with valid domain names or values from responses.
+
+1. Define the following variables:
+
+   ```
+   APISERVER_URL="https://api."
+   APISERVER_KEY="GGS9F4M6XB2DXJ5AFT9F"
+   APISERVER_SECRET="2oGujVFhPfaozhpuz2GzQfA5OyxmMsR3WVJpsCR5hrgHFs20PO"
+   ```
+
+2. Create a *Tenant* (company):
+
+   ```
+   curl $APISERVER_URL/system.create_company \
+     -H "Content-Type: application/json" \
+     -u $APISERVER_KEY:$APISERVER_SECRET \
+     -d '{"name":""}'
+   ```
+
+   This returns the new Company ID (``). If needed, you can list all companies with the following command:
+
+   ```
+   curl -u $APISERVER_KEY:$APISERVER_SECRET $APISERVER_URL/system.get_companies
+   ```
+
+3. Create an *Admin User*:
+
+   ```
+   curl $APISERVER_URL/auth.create_user \
+     -H "Content-Type: application/json" \
+     -u $APISERVER_KEY:$APISERVER_SECRET \
+     -d '{"name":"","company":"","email":"","role":"admin"}'
+   ```
+
+   This returns the new User ID (``).
+
+4. Generate *Credentials* for the new Admin User:
+
+   ```
+   curl $APISERVER_URL/auth.create_credentials \
+     -H "Content-Type: application/json" \
+     -H "X-Clearml-Impersonate-As: " \
+     -u $APISERVER_KEY:$APISERVER_SECRET
+   ```
+
+   This returns a set of key and secret credentials associated with the new Admin User.
+
+5. Create an SSO Domain *Whitelist*. The `` is the email domain set up for users to access through SSO.
+
+   ```
+   curl $APISERVER_URL/login.set_domains \
+     -H "Content-Type: application/json" \
+     -H "X-Clearml-Act-As: " \
+     -u $APISERVER_KEY:$APISERVER_SECRET \
+     -d '{"domains":[""]}'
+   ```
+
+### Install ClearML Agent Chart
+
+To install the ClearML Agent Chart, follow these steps:
+
+1. Prepare the `overrides.yaml` file with the following content.
Make sure to replace placeholders like + ``, ``, and `` with the appropriate values: + + ``` + imageCredentials: + password: "" + clearml: + agentk8sglueKey: "-" # TODO --> Generate credentials from API in the new tenant + agentk8sglueSecret: "-" # TODO --> Generate credentials from API in the new tenant + agentk8sglue: + extraEnvs: + - name: CLEARML_K8S_SUPPORT_SUSPENSION + value: "1" + - name: CLEARML_K8S_PORTS_MODE_ON_REQUEST_ONLY + value: "1" + - name: CLEARML_AGENT_REDIS_HOST + value: "redis-master.redis-shared" + - name: CLEARML_AGENT_REDIS_PORT + value: "6379" + - name: CLEARML_AGENT_REDIS_DB + value: "0" + - name: CLEARML_AGENT_REDIS_PASSWORD + value: "sdkWoq23" + image: + tag: 1.24-1.8.1rc99-159 + monitoredResources: + maxResources: 3 + minResourcesFieldName: "metadata|labels|required-resources" + maxResourcesFieldName: "metadata|labels|required-resources" + apiServerUrlReference: "https://api." + fileServerUrlReference: "https://file." + webServerUrlReference: "https://app." + defaultContainerImage: "python:3.9" + debugMode: true + createQueues: true + queues: + default: + templateOverrides: + labels: + required-resources: "0.5" + billing-monitored: "true" + queueSettings: + maxPods: 10 + gpu-fraction-1_00: + templateOverrides: + labels: + required-resources: "1" + billing-monitored: "true" + resources: + limits: + nvidia.com/mig-7g.40gb: 1 + clear.ml/fraction-1: "1" + queueSettings: + maxPods: 10 + gpu-fraction-0_50: + templateOverrides: + labels: + required-resources: "0.5" + billing-monitored: "true" + resources: + limits: + nvidia.com/mig-3g.20gb: 1 + clear.ml/fraction-1: "0.5" + queueSettings: + maxPods: 10 + gpu-fraction-0_25: + templateOverrides: + labels: + required-resources: "0.25" + billing-monitored: "true" + resources: + limits: + nvidia.com/mig-2g.10gb: 1 + clear.ml/fraction-1: "0.25" + queueSettings: + maxPods: 10 + sessions: + portModeEnabled: false # set to true when using TCP ports mode + agentID: "" + externalIP: 0.0.0.0 # IP of one of 
the workers + startingPort: 31010 # be careful to not overlap other tenants (startingPort + maxServices) + maxServices: 10 + ``` + +2. Install the ClearML Agent Chart in the specified tenant namespace: + + ``` + helm install -n \\ + clearml-agent \\ + allegroai-enterprise/clearml-enterprise-agent \\ + --create-namespace \\ + -f overrides.yaml + ``` + +3. Create a queue via the API: + + ``` + curl $APISERVER_URL/queues.create \\ + -H "Content-Type: application/json" \\ + -H "X-Clearml-Impersonate-As: 75557e2ab172405bbe153705e91d1782" \\ + -u $APISERVER_KEY:$APISERVER_SECRET \\ + -d '{"name":"default"}' + ``` + +### Tenant Namespace isolation with NetworkPolicies + +To ensure network isolation for each tenant, you need to create a `NetworkPolicy` in the tenant namespace. This way +the entire namespace/tenant will not accept any connection from other namespaces. + +Create a `NetworkPolicy` in the tenant namespace with the following configuration: + + ``` + apiVersion: networking.k8s.io/v1 + kind: NetworkPolicy + metadata: + name: default-deny-ingress + spec: + podSelector: {} + policyTypes: + - Ingress + ingress: + - from: + - podSelector: {} + ``` + +### Install Task Traffic Router Chart + +Install the [Task Traffic Router](appgw.md) in your Kubernetes cluster, allowing it to manage and route tasks: + +1. Prepare the `overrides.yaml` file with the following content: + + ``` + imageCredentials: + password: "" + clearml: + apiServerUrlReference: "" + apiserverKey: "" + apiserverSecret: "" + jwksKey: "ymLh1ok5k5xNUQfS944Xdx9xjf0wueokqKM2dMZfHuH9ayItG2" + ingress: + enabled: true + hostName: "" + ``` + +2. 
Install Task Traffic Router in the specified tenant namespace: + + ``` + helm install -n \\ + clearml-ttr \\ + allegroai-enterprise/clearml-task-traffic-router \\ + --create-namespace \\ + -f overrides.yaml + ``` + diff --git a/docs/deploying_clearml/enterprise_deploy/on_prem_ubuntu.md b/docs/deploying_clearml/enterprise_deploy/on_prem_ubuntu.md new file mode 100644 index 00000000..1d823655 --- /dev/null +++ b/docs/deploying_clearml/enterprise_deploy/on_prem_ubuntu.md @@ -0,0 +1,350 @@ +--- +title: On-Premises on Ubuntu +--- + +This guide provides step-by-step instructions for installing the ClearML Enterprise Server on a single Ubuntu Linux server. + +## Prerequisites +The following are required for the ClearML on-premises server: + +- At least 8 CPUs +- At least 32 GB RAM +- OS - Ubuntu 20 or higher +- 4 Disks + - Root + - For storing the system and Docker containers + - Recommended at least 30 GB + - Mounted to `/` + - Docker + - For storing Docker data + - Recommended at least 80 GB + - Mounted to `/var/lib/docker` with permissions 710 + - Data + - For storing the Elasticsearch and MongoDB databases + - Size depends on the usage.
Recommended not to start with less than 100 GB + - Mounted to `/opt/allegro/data` + - File Server + - For storing `fileserver` files (models and debug samples) + - Size depends on usage + - Mounted to `/opt/allegro/data/fileserver` +- User for running ClearML services with administrator privileges +- Ports 8080, 8081, and 8008 available for the ClearML Server services + +In addition, make sure you have the following (provided by ClearML): + +- Docker Hub credentials to access the ClearML images +- `docker-compose.yml` - The main compose file containing the services definitions +- `docker-compose.override.yml` - The override file containing additions that are server-specific, such as SSO integration +- `constants.env` - An env file containing values for items in the `docker-compose` files that are unique to +a specific environment, such as keys and secrets for system users, credentials, and image versions. This file +should be reviewed and modified prior to the server installation + + +## Installing ClearML Server +### Preliminary Steps + +1. Install Docker CE following the official instructions: + + ``` + https://docs.docker.com/install/linux/docker-ce/ubuntu/ + ``` +1. Verify the Docker CE installation: + + ``` + docker run hello-world + ``` + + Expected output: + + ``` + Hello from Docker! + This message shows that your installation appears to be working correctly. + To generate this message, Docker took the following steps: + + 1. The Docker client contacted the Docker daemon. + 2. The Docker daemon pulled the "hello-world" image from the Docker Hub. (amd64) + 3. The Docker daemon created a new container from that image which runs the executable that produces the output you are currently reading. + 4. The Docker daemon streamed that output to the Docker client, which sent it to your terminal. + ``` +1.
Install `docker-compose`: + + ``` + sudo curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose + sudo chmod +x /usr/local/bin/docker-compose + ``` + + :::note + You might need to downgrade `urllib3` by running `sudo pip3 install urllib3==1.26.2` + ::: + +1. Increase `vm.max_map_count` for Elasticsearch in Docker: + + ``` + echo "vm.max_map_count=262144" > /tmp/99-allegro.conf + echo "vm.overcommit_memory=1" >> /tmp/99-allegro.conf + echo "fs.inotify.max_user_instances=256" >> /tmp/99-allegro.conf + sudo mv /tmp/99-allegro.conf /etc/sysctl.d/99-allegro.conf + sudo sysctl -w vm.max_map_count=262144 + sudo service docker restart + ``` + +1. Disable THP. Create the `/etc/systemd/system/disable-thp.service` service file with the following content: + + :::important + The `ExecStart` string (under `[Service]`) should be a single line. + ::: + + ``` + [Unit] + Description=Disable Transparent Huge Pages (THP) + + [Service] + Type=simple + ExecStart=/bin/sh -c "echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled && echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag" + + [Install] + WantedBy=multi-user.target + ``` + +1. Enable the service so it runs on boot: + + ``` + sudo systemctl daemon-reload + sudo systemctl enable disable-thp + ``` + +1. Restart the machine + +### Installing the Server +1. Remove any previous installation of ClearML Server: + + ``` + sudo rm -R /opt/clearml/ + sudo rm -R /opt/allegro/ + ``` + +1.
Create local directories for the databases and storage: + + ``` + sudo mkdir -pv /opt/allegro/data/elastic7plus + sudo chown 1000:1000 /opt/allegro/data/elastic7plus + sudo mkdir -pv /opt/allegro/data/mongo_4/configdb + sudo mkdir -pv /opt/allegro/data/mongo_4/db + sudo mkdir -pv /opt/allegro/data/redis + sudo mkdir -pv /opt/allegro/logs/apiserver + sudo mkdir -pv /opt/allegro/documentation + sudo mkdir -pv /opt/allegro/data/fileserver + sudo mkdir -pv /opt/allegro/logs/fileserver + sudo mkdir -pv /opt/allegro/logs/fileserver-proxy + sudo mkdir -pv /opt/allegro/data/fluentd/buffer + sudo mkdir -pv /opt/allegro/config/webserver_external_files + sudo mkdir -pv /opt/allegro/config/onprem_poc + ``` + +1. Copy the following ClearML configuration files to `/opt/allegro`: + * `constants.env` + * `docker-compose.override.yml` + * `docker-compose.yml` + +1. Create an initial ClearML configuration file `/opt/allegro/config/onprem_poc/apiserver.conf` with a fixed user: + + ``` + auth { + fixed_users { + enabled: true, + users: [ + {username: "support", password: "", admin: true, name: "allegro.ai Support User"}, + ] + } + } + ``` + +1. Log into the Docker Hub repository using the username and password provided by ClearML: + + ``` + sudo docker login -u=$DOCKERHUB_USER -p=$DOCKERHUB_PASSWORD + ``` + +1. Start the services: change to the directory containing the docker-compose files and run the following command: + + ``` + sudo docker-compose --env-file constants.env up -d + ``` + +1. Verify web access by browsing to your URL (IP address) and port 8080. + + ``` + http://:8080 + ``` + +## Security +To ensure the server's security, it's crucial to open only the necessary ports. + +### Working with HTTP +Directly accessing the server using `HTTP` is not recommended.
However, if you choose to do so, only the following ports +should be open to any location where a ClearML client (`clearml-agent`, SDK, or web browser) may operate: +* Port 8080 for accessing the WebApp +* Port 8008 for accessing the API server +* Port 8081 for accessing the file server + +### Working with TLS / HTTPS +TLS termination through an external mechanism, such as a load balancer, is supported and recommended. For such a setup, +the following subdomains should be forwarded to the corresponding ports on the server: +* `https://api.` should be forwarded to port 8008 +* `https://app.` should be forwarded to port 8080 +* `https://files.` should be forwarded to port 8081 + +**Critical: Ensure no other ports are open to maintain the highest level of security.** + +Additionally, ensure that the following URLs are correctly configured in the server's environment file: + +``` +WEBSERVER_URL_FOR_EXTERNAL_WORKERS=https://app. +APISERVER_URL_FOR_EXTERNAL_WORKERS=https://api. +FILESERVER_URL_FOR_EXTERNAL_WORKERS=https://files. +``` + +:::note +If you prefer to use URLs that do not begin with `app`, `api`, or `files`, you must also add the following configuration +for the web server in your `docker-compose.override.yml` file: + +``` +webserver: + environment: + - WEBSERVER__displayedServerUrls={"apiServer":"$APISERVER_URL_FOR_EXTERNAL_WORKERS","filesServer":"$FILESERVER_URL_FOR_EXTERNAL_WORKERS"} +``` +::: + + +## Backups +The main components that contain data are: +* MongoDB +* ElasticSearch +* File server + +It is recommended to back them up periodically. + +### Fileserver +It is recommended to back up the entire file server volume. +* Perform at least a daily backup. +* A backup retention of at least 2 days is recommended. + +### ElasticSearch +Please refer to the [ElasticSearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html) for creating snapshots.
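As a minimal sketch of what an Elasticsearch backup flow can look like (the endpoint, repository name, and backup path below are assumptions, and the location must be listed under `path.repo` in the Elasticsearch configuration):

```shell
# Illustrative only: register a filesystem snapshot repository and take a
# snapshot using the standard Elasticsearch snapshot API.
ES_URL="http://localhost:9200"
REPO="clearml_backup"
snapshot_url="$ES_URL/_snapshot/$REPO/snapshot_1?wait_for_completion=true"

# Register the repository (run where Elasticsearch is reachable)
curl -sf -X PUT "$ES_URL/_snapshot/$REPO" \
  -H "Content-Type: application/json" \
  -d '{"type": "fs", "settings": {"location": "/mnt/backups/elastic"}}' || echo "Elasticsearch not reachable"

# Take the snapshot and wait for it to complete
curl -sf -X PUT "$snapshot_url" || echo "Elasticsearch not reachable"
```

Snapshots taken this way are incremental, so scheduling this periodically (e.g. via cron) keeps storage overhead low.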
+ + +### MongoDB +Please refer to the [MongoDB documentation](https://www.mongodb.com/docs/manual/core/backups/) for backing up and restoring. + +## Monitoring + +The following monitoring is recommended: + +### Basic Hardware Monitoring + +#### CPU + +CPU usage varies depending on system usage. We recommend monitoring CPU usage and alerting when usage is higher +than normal. Recommended starting alerts are a 5-minute CPU load +level of 5 and 10, adjusted according to performance. + +#### RAM + +Available memory usage also varies depending on system usage. Due to spikes in usage when performing certain tasks, 6-8 GB +of available RAM is recommended as the standard baseline. Some use cases may require more. Thus, we recommend having 8 GB +of available memory on top of the regular system usage. Alerts should be triggered if available memory drops below normal. + +#### Disk Usage + +There are several disks used by the system. We recommend monitoring all of them. Standard alert levels are 20%, 10%, and +5% of free disk space. + +### Service Availability + +The following services should be monitored periodically for availability and response time: + +* `apiserver` - [http://localhost:10000/api/debug.ping](http://localhost:10000/api/debug.ping) should return HTTP 200 +* `webserver` - [http://localhost:10000](http://localhost:10000/) should return HTTP 200 +* `fileserver` - [http://localhost:10000/files/](http://localhost:10000/files/) should return HTTP 405 ("method not allowed") + + +### API Server Docker Memory Usage + +A usage spike can happen during normal operation, but very high spikes (above 6 GB) are not expected. We recommend using +`docker stats` to get this information.
+ +For example, the following command retrieves the API server's information from the Docker server: + +``` +sudo curl -s --unix-socket /var/run/docker.sock http://localhost/containers/allegro-apiserver/stats?stream=false +``` + +We recommend monitoring the API server memory in addition to the system's available RAM. Alerts should be triggered +when memory usage of the API server exceeds normal behavior. A starting value can be 6 GB. + +### Backup Failures + +It is also highly recommended to monitor the backups and to alert if a backup has failed. + +## Troubleshooting + +In normal operation mode, all services should be up, and a call to `sudo docker ps` should yield the list of services. + +If a service fails, it is usually due to one of the following: + +* Lack of required resources such as storage or memory +* Incorrect configuration +* Software anomaly + +When a service fails, it should automatically restart. However, if the cause of the failure is persistent, the service +will fail again. If a service fails, do the following: + +### Check the Log + +Run: + +``` +sudo docker logs -n 1000 +``` + +See if there is an error message in the log that can explain the failure. + +### Check the Server's Environment + +The system should be constantly monitored; however, it is important to check the following: + +* **Storage space**: run `sudo du -hs /` +* **RAM**: + * Run `vmstat -s` to check available RAM + * Run `top` to check the processes. + + :::note + Some operations, such as complex queries, may cause a spike in memory usage. Therefore, it is recommended to have at least 8 GB of free RAM available. + ::: + +* **Network**: Make sure that there is external access to the services +* **CPU**: The best indicator of the need for additional compute resources is high CPU usage of the `apiserver` and `apiserver-es` services.
+ * Examine the usage of each service using `sudo docker stats` + * If additional CPUs are added to the server, increase the number of workers on the `apiserver` + service by changing the value of `APISERVER_WORKERS_NUMBER` in the `constants.env` file (up to one additional worker per additional core). + +### API Server + +If the `allegro-apiserver` container fails, or if the web application gets unexpected errors +and the browser's developer tools (F12) network tab shows error codes returned by the server, check the +`apiserver` log, which is written to `/opt/allegro/logs/apiserver/apiserver.log`. +Additionally, you can check the server availability using: + +``` +curl http://localhost:8008/api/debug.ping +``` + +This should return HTTP 200. + +### Web Server + +Check the webserver availability by running the following: + +``` +curl http://:8080/configuration.json +``` + + + diff --git a/docs/deploying_clearml/enterprise_deploy/sso_active_directory.md b/docs/deploying_clearml/enterprise_deploy/sso_active_directory.md new file mode 100644 index 00000000..e3b4c9c2 --- /dev/null +++ b/docs/deploying_clearml/enterprise_deploy/sso_active_directory.md @@ -0,0 +1,46 @@ +--- +title: Group Integration in Active Directory SAML +--- + +Follow this guide to integrate groups from Active Directory with ClearML. + +## Actions in Active Directory +Make sure that the group claims are passed to the ClearML app. + +## Actions in ClearML +### Creating the Groups +* Group integration is disabled by default +* Groups are not auto-created and need to be created manually in ClearML using the [Users & Groups](../../webapp/settings/webapp_settings_users.md#user-groups) + page in the ClearML web UI, or using the ClearML API. +* If a group does not exist in ClearML, the user will be created, but will not be assigned to any group.
+* The group claim used by ClearML is `groups` by default +* The group name is taken from the first CN of the full DN path. For example, for the following DN: `CN=test, OU=unit, DU=mycomp`, + the group name in ClearML will be `test` +* Group name matching in ClearML is case-sensitive by default + +### Configuring ClearML Group Integration + +To enable ClearML group integration, set the following environment variable: +``` +CLEARML__services__login__sso__saml_client__microsoft_ad__groups__enabled=true +``` + +To configure groups that should automatically become admins in ClearML, set the following environment variable: +``` +CLEARML__services__login__sso__saml_client__microsoft_ad__groups__admins=[, , ...] +``` + +To change the default group claim, set the following environment variable: +``` +CLEARML__services__login__sso__saml_client__microsoft_ad__groups__claim=... +``` + +To make group matching case-insensitive, set the following environment variable: +``` +CLEARML__services__login__sso__saml_client__microsoft_ad__groups__case_sensitive=false +``` + +To prohibit users who do not belong to any of the AD groups created in ClearML from signing up, set the following environment variable: +``` +CLEARML__services__login__sso__saml_client__microsoft_ad__groups__prohibit_user_signup_if_not_in_group=true +``` diff --git a/docs/deploying_clearml/enterprise_deploy/sso_keycloak.md b/docs/deploying_clearml/enterprise_deploy/sso_keycloak.md new file mode 100644 index 00000000..4462e1af --- /dev/null +++ b/docs/deploying_clearml/enterprise_deploy/sso_keycloak.md @@ -0,0 +1,225 @@ +--- +title: KeyCloak IdP Configuration +--- + +This guide describes the step-by-step configuration of the ClearML Enterprise Server with the KeyCloak IdP.
+ +In the following sections, the term "publicly accessible" does not have to mean open to the entire world or publicly +accessible from the Internet; it simply means accessible to your users from their workstations (typically when using a +browser). + +In the following sections, you will be instructed to set different environment variables for the ClearML Server. If +using a `docker-compose` deployment, these should be defined in your `docker-compose.override.yaml` file, under the +`apiserver` service's environment variables, as follows: + +``` +services: + ... + apiserver: + ... + environment: + = + ... +``` + +If using a Kubernetes deployment, these should be set in the ClearML Enterprise server chart values override file, under +the `.Values.apiserver.extraEnvs` array section, as follows: + +``` +... +apiserver: + extraEnvs: + - name: + value: "" + - ... + +``` + +All examples below are provided in the Kubernetes format. + +## Prerequisites + +* An existing ClearML Enterprise server / control-plane installation (using `docker-compose` or Kubernetes), which is +set up with a publicly accessible endpoint for the ClearML WebApp +* A KeyCloak IdP installed with a publicly accessible endpoint, with admin access to the KeyCloak administration UI. + +## Configuration + +### Basic Setup + +#### KeyCloak Configuration + +In the KeyCloak administration UI: + +1. Register a new ClearML app with the callback URL: `/callback_keycloak` +2. Make sure that the claims representing `user_id`, `email` and `user name` are returned +3. Make a note of the `client_id`, `client_secret`, `Auth url` and `Access token url` for configuration in the ClearML Server. + +#### ClearML Server Configuration + +In the ClearML Server deployment, set the environment variables specified below.
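Note that the `CLEARML__...` variable names used in the following subsections follow a fixed pattern: the configuration path segments are joined with double underscores. A small sketch of this naming convention (the mapping rule is inferred from the examples in this guide; verify against your server version):

```shell
# Sketch: derive a ClearML override variable name from a configuration path
# by replacing "." separators with "__" and prefixing with "CLEARML__".
config_path="services.login.sso.oauth_client.keycloak.base_url"
env_var="CLEARML__$(echo "$config_path" | sed 's/\./__/g')"
echo "$env_var"   # CLEARML__services__login__sso__oauth_client__keycloak__base_url
```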
+ +##### KeyCloak Base URL + +Use the start of the token or authorization endpoint, usually the part just before `protocol/openid-connect/...` + +``` +- name: CLEARML__services__login__sso__oauth_client__keycloak__base_url + value: "" +``` + +##### KeyCloak Authorization Endpoint + +``` +- name: CLEARML__services__login__sso__oauth_client__keycloak__authorize_url + value: "" +``` + +##### KeyCloak Access Token Endpoint + +``` +- name: CLEARML__services__login__sso__oauth_client__keycloak__access_token_url + value: "" +``` + +##### KeyCloak Client ID + +The client ID is obtained when creating the KeyCloak ClearML App. + +``` +- name: CLEARML__secure__login__sso__oauth_client__keycloak__client_id + value: "" +``` + +##### KeyCloak Client Secret + +The client secret is obtained when creating the KeyCloak ClearML App. + +``` +- name: CLEARML__secure__login__sso__oauth_client__keycloak__client_secret + value: "" +``` + +##### Automatic User Creation Support + +Usually, when using IdPs in ClearML, the ClearML Server will map users signing in to the server into tenants (companies) +using predefined whitelists and specific invitations (users explicitly added by admins). + +To support automatic user creation in a trusted environment, where all users signing in using this IdP are automatically +added to the same tenant (company), the following environment variable should be set: + +``` +- name: CLEARML__secure__login__sso__oauth_client__keycloak__default_company + value: "" +``` + +### User Groups Integration + +This option allows automatically synchronizing group membership from KeyCloak into existing ClearML User Groups when +logging in users (this is done on every user login, not just on user sign-in). + +Make sure a ClearML User Group exists for each potential KeyCloak group that should be synchronized to +prevent an uncontrolled proliferation of user groups. The ClearML server will not automatically create user groups in +this mode. 
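To illustrate why the groups must pre-exist (group names below are hypothetical): on each login, only the intersection of the KeyCloak `groups` claim and the groups already defined in ClearML is synchronized, which can be sketched with standard shell tools:

```shell
# Hypothetical group lists: the KeyCloak "groups" claim vs. groups defined in ClearML
printf 'data-science\ndevops\nresearch\n' | sort > /tmp/keycloak_groups.txt
printf 'research\ndata-science\nadmins\n' | sort > /tmp/clearml_groups.txt

# Only groups present on both sides are synchronized; "devops" is ignored
# because it was never created in ClearML.
synced=$(comm -12 /tmp/keycloak_groups.txt /tmp/clearml_groups.txt)
echo "$synced"
```

Here `synced` contains `data-science` and `research`; a KeyCloak group with no matching ClearML User Group is simply skipped.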
+ +#### Keycloak Configuration + +* When configuring the Open ID client for ClearML: + * Navigate to the `Client Scopes` tab + * Click on the first row `-dedicated` + * Click `Add Mapper → By configuration` and then select the `Group membership` option + * In the opened dialog, enter the name `groups` and the token claim name `groups` + * Uncheck the `Full group path` option and save the mapper +* To validate: + * Return to the `Client Details → Client` scope tab + * Go to the `Evaluate` sub-tab and select a user that has any group memberships + * On the right side, navigate to `Generated ID` token and then to `Generated User Info` + * Verify that in both cases you can see the `groups` claim in the displayed user data + +#### ClearML Server Configuration + +Set the following environment variables for the `apiserver` service: + +``` +- name: CLEARML__services__login__sso__oauth_client__keycloak__groups__enabled + value: "true" +- name: CLEARML__services__login__sso__oauth_client__keycloak__groups__claim + value: "groups" +- name: CLEARML__services__login__sso__oauth_client__keycloak__claims__name + value: "preferred_username" +``` + +##### Setting Administrators by Group Association + +If you would like members of a particular KeyCloak group to be set as administrators in ClearML, set the +following environment variable. Note that in this case, the KeyCloak group(s) do not have to be present in the ClearML Server. + +``` +- name: CLEARML__services__login__sso__oauth_client__keycloak__groups__admins + value: "[\"\"]" +``` + +##### Restrict User Signup + +To prevent sign-in for users who do not have any matching group under the above-mentioned configuration, set the +following environment variable.
+ +``` +- name: CLEARML__services__login__sso__oauth_client__keycloak__groups__prohibit_user_signup_if_not_in_group + value: "true" +``` + +### Administrator User Role Association + +To integrate an admin user role from KeyCloak into the ClearML service, do the following. + +#### KeyCloak Configuration + +1. For each administrator user, assign the admin role to that user in KeyCloak +2. In the `Client Scopes` tab, make sure that the `roles` claim is returned in the access token or userinfo token +(this depends on the configuration in step 1) + +#### ClearML Server Configuration + +By default, the ClearML Server will use the admin claim to identify administrator users. To use a different role name +for designating the admin role, set the following environment variable: + +``` +- name: CLEARML__services__login__sso__oauth_client__keycloak__admin_role + value: "" +``` + +#### Disabling Admin Role Association + +To disable the automatic administrator claim and manage administrators solely from inside the ClearML WebApp, make sure +that user roles are not returned by KeyCloak in the auth token or the `userinfo` endpoint, and/or set the following +ClearML Server environment variable: + +``` +- name: CLEARML__services__login__sso__oauth_client__keycloak__admin_role + value: "" +``` + +### Additional ClearML Server Configurations + +#### KeyCloak Session Logout + +To automatically log out the user from the KeyCloak provider when the user logs out of the ClearML service, set the +following environment variable. This will make sure that the KeyCloak session is not maintained in the browser, so that +when the user tries to log into the ClearML service, the KeyCloak login page will be used again and not skipped. + +``` +- name: CLEARML__services__login__sso__oauth_client__keycloak__idp_logout + value: "true" +``` + +#### User Info Source + +By default, the user info is taken from the KeyCloak access token.
If you prefer to use the user info available through +the OAuth protocol's `userinfo` endpoint, set the following environment variable: + +``` +- name: CLEARML__services__login__sso__oauth_client__keycloak__get_info_from_access_token + value: "false" +``` + diff --git a/docs/deploying_clearml/enterprise_deploy/sso_multi_tenant_login.md b/docs/deploying_clearml/enterprise_deploy/sso_multi_tenant_login.md new file mode 100644 index 00000000..b1508c4f --- /dev/null +++ b/docs/deploying_clearml/enterprise_deploy/sso_multi_tenant_login.md @@ -0,0 +1,98 @@ +--- +title: Multi-Tenant Login Mode +--- + +In a multi-tenant setup, each external tenant can be represented by an SSO client defined in the customer's identity provider +(Keycloak). Each ClearML tenant can be associated with a particular external tenant. Currently, only one +ClearML tenant can be associated with a particular external tenant. + +## Setup IdP/SSO Client in Identity Provider + +1. Add the following URL to "Valid redirect URIs": `/callback_` +2. Add the following URLs to "Valid post logout redirect URIs": + + ``` + /login + /login/ + ``` +3.
Make sure the external tenant ID and groups are returned as claims for each user + +## Configure ClearML to use Multi-Tenant Mode + +Set the following environment variables in the ClearML Enterprise helm chart under the `apiserver` section: +* To turn on the multi-tenant login mode: + + ``` + - name: CLEARML__services__login__sso__tenant_login + value: "true" + ``` +* To hide any global IdP/SSO configuration that's not associated with a specific ClearML tenant: + + ``` + - name: CLEARML__services__login__sso__allow_settings_providers + value: "false" + ``` + +Enable `onlyPasswordLogin` by setting the following environment variable in the helm chart under the `webserver` section: + +``` +- name: WEBSERVER__onlyPasswordLogin + value: "true" +``` + +## Setup IdP for a ClearML Tenant + +To set an IdP client for a ClearML tenant, you'll need to set the ClearML tenant settings and define an identity provider: + +1. Call the following API to set the ClearML tenant settings: + + ``` + curl $APISERVER_URL/system.update_company_sso_config -H "Content-Type: application/json" -u $APISERVER_KEY:$APISERVER_SECRET -d'{ + "company": "", + "sso": { + "tenant": "", + "group_mapping": { + "IDP group name1": "Clearml group name1", + "IDP group name2": "Clearml group name2" + }, + "admin_groups": ["IDP admin group name1", "IDP admin group name2"] + }}' + ``` +2.
Call the following API to define the ClearML tenant identity provider: + + ``` + curl $APISERVER_URL/sso.save_provider_configuration -H "Content-Type: application/json" -u $APISERVER_KEY:$APISERVER_SECRET -d'{ + "provider": "keycloak", + "company": "", + "configuration": { + "id": "", + "display_name": "", + "client_id": "", + "client_secret": "", + "authorization_endpoint": "", + "token_endpoint": "", + "revocation_endpoint": "", + "end_session_endpoint": "", + "logout_from_provider": true, + "claim_tenant": "tenant_key", + "claim_name": "name", + "group_enabled": true, + "claim_groups": "ad_groups_trusted", + "group_prohibit_user_login_if_not_in_group": true + }}' + ``` + The above configuration assumes the following: + * On logout from ClearML, the user is also logged out from the Identity Provider + * The external tenant ID for the user is returned under the `tenant_key` claim + * The user display name is returned under the `name` claim + * The user groups list is returned under the `ad_groups_trusted` claim + * Group integration is turned on, and a user will be allowed to log in if any of the groups they belong to in the + IdP exists under the corresponding ClearML tenant (this is after group name translation is done according to the ClearML tenant settings) + +## Webapp Login + +When running in multi-tenant login mode, a user belonging to an external tenant should use the following link to log in: + +``` +/login/ +``` \ No newline at end of file diff --git a/docs/deploying_clearml/enterprise_deploy/sso_saml_k8s.md b/docs/deploying_clearml/enterprise_deploy/sso_saml_k8s.md new file mode 100644 index 00000000..186bf345 --- /dev/null +++ b/docs/deploying_clearml/enterprise_deploy/sso_saml_k8s.md @@ -0,0 +1,60 @@ +--- +title: Microsoft AD SAML +--- + +This document describes the configuration required for a ClearML Kubernetes server to authenticate +users with Microsoft AD using SAML.
+ +Configuration requires two steps: +* Configuration of the application in the Active Directory +* Configuration on the ClearML server side + +## Active Directory Configuration +1. Register the ClearML app with the callback URL: `/callback_microsoft_ad` +1. Make sure that SSO binding is set to HTTP-Redirect +1. Make sure that the following user claims are returned to the ClearML app: + + ``` + emailaddress - user.mail + displayname - user.displayname + Unique user identifier - user.principalname + ``` + +1. Generate the IdP metadata file, and save the file and entity ID + +## ClearML Server Side Configuration +The following should be configured in the override file: + +``` +apiserver: + additionalConfigs: + metadata.xml: | + + + + test + + + extraEnvs: + - name: "ALLEGRO__secure__login__sso__saml_client__microsoft_ad__entity_id" + value: "" + - name: "ALLEGRO__secure__login__sso__saml_client__microsoft_ad__idp_metadata_file" + value: "/opt/clearml/config/default/metadata.xml" + - name: "ALLEGRO__secure__login__sso__saml_client__microsoft_ad__default_company" + value: "" + - name: "CLEARML__services__login__sso__saml_client__microsoft_ad__claims__object_id" + value: "http://schemas.microsoft.com/identity/claims/objectidentifier" + - name: "CLEARML__services__login__sso__saml_client__microsoft_ad__claims__name" + value: "http://schemas.microsoft.com/identity/claims/displayname" + - name: "CLEARML__services__login__sso__saml_client__microsoft_ad__claims__email" + value: "emailAddress" + - name: "CLEARML__services__login__sso__saml_client__microsoft_ad__claims__given_name" + value: "givenName" + - name: "CLEARML__services__login__sso__saml_client__microsoft_ad__claims__surname" + value: "surname" +``` + diff --git
a/docs/deploying_clearml/enterprise_deploy/vpc_aws.md b/docs/deploying_clearml/enterprise_deploy/vpc_aws.md new file mode 100644 index 00000000..4c1690cb --- /dev/null +++ b/docs/deploying_clearml/enterprise_deploy/vpc_aws.md @@ -0,0 +1,292 @@ +--- +title: AWS VPC +--- + +This guide provides step-by-step instructions for installing the ClearML Enterprise Server on AWS using a Virtual Private Cloud (VPC). + +It covers the following: +* Setting up security groups and an IAM role +* Creating an EC2 instance with the required disks +* Installing dependencies and mounting disks +* Deploying the ClearML Server using docker-compose +* Setting up a load balancer and DNS +* Setting up server backups + +## Prerequisites + +An AWS account with at least 2 availability zones is required. It is recommended to install on a region with at least +3 availability zones. Having fewer than 3 availability zones would prevent the use of high-availability setups, if +needed in the future. + +## Instance Setup + +:::note +It is recommended to use a VPC with IPv6 enabled for future usage expansion. +::: + +### Create Security Groups for the Server and Load Balancer + +1. Create a security group for the load balancer. + + It is recommended to configure the security group to allow access, at first, only for a trusted IP address or a set + of trusted IP addresses that will be used for the initial setup of the server. + + * Ingress TCP ports: 80, 443 from trusted IP addresses. + * Egress: All addresses and ports. + +1. Create a security group for the main server (`clearml-main`): + + * Ingress: + * TCP port 10000, from the load balancer's security group + * TCP port 22 from trusted IP addresses. + * Egress: All addresses and ports + +:::important +A company's security policy may require filtering egress traffic. Note, however, that during the initial setup +some external repositories will be used to install software.
+::: + +### Create an IAM Role for the Server + +To perform backups to S3, the instance will need a role that allows EC2 access (RW) to a backup bucket. +An example policy document with the above parameters is provided at `self_installed_policy.json`. + +### Create Instance + +Instance requirements: + +1. The instance must be created in a VPC with at least two public subnets to allow for AWS load balancer setup. +2. `x86_64`-based instance +3. [Amazon Linux 2 OS](https://aws.amazon.com/amazon-linux-2/?amazon-linux-whats-new.sort-by=item.additionalFields.postDateTime&amazon-linux-whats-new.sort-order=desc) +4. Disks: + 1. Root disk: 50GB `gp3` disk, or one with higher volume/performance. + 2. Data disk: + 1. Used for the databases (Elasticsearch and MongoDB) in which metadata and events are saved + 2. Device: `/dev/sdf` + 3. Recommended initial size: 100GB + 4. Type: `gp3`, or a type with higher random-access performance. + 3. Fileserver disk: + 1. Used for storing files such as debug images and models + 2. Device: `/dev/sdg` + 3. Recommended initial size: Should be estimated by the users of the system. + 4. Type: Depends on usage, but `gp3` or `st1` are usually the best options: + 1. For a large amount of data used by a small number of users/experiments, use `st1` (minimum `st1` disk size: 500GB). + 2. For all other scenarios, use SSD disks (e.g. `gp3`). + 3. The disk type can be changed after creation. + 4. A very large number of users and/or experiments may require higher than the default `gp3` disk performance. + 4. Docker data disk: + 1. Used for Docker data. + 2. Device: `/dev/sdh` + 3. Recommended initial size: 30GB + 4. Type: `gp3` +5. Use the `clearml-main` security group and the IAM role created in the previous step. + +## Configuration and Software Deployment + +### Install Dependencies + +1. Copy the following files to the `/home/ec2-user` directory on the server: + 1. `envoy.yaml` + 2. `self_installed_VPC_EC2_amazon_linux_2_install.sh` +2.
Run `self_installed_VPC_EC2_amazon_linux_2_install.sh` from the `/home/ec2-user` directory. +3. Verify the disks were mounted successfully (using `df -h`) to: + 1. `/opt/allegro/data` + 2. `/opt/allegro/data/fileserver` + 3. `/var/lib/docker` +4. Reboot the server. + +### Version Deployment + +1. Copy the following files to the `/home/ec2-user` directory on the server: + * `constants.env` + * `docker-compose.yml` + * `docker-compose.override.yml` +2. Log in to Docker Hub: + + ``` + source constants.env + sudo docker login -u=$DOCKERHUB_USER -p=$DOCKERHUB_PASSWORD + ``` +3. Start the containers: + + ``` + sudo docker-compose --env-file constants.env up -d + ``` + +## Load Balancer + +1. Create a TLS certificate: + 1. Choose a domain name to be used with the server. The main URL that will be used by the system's users will be app.\ + 2. Create a certificate, with the following DNS names: + 1. \ + 2. \*.\ + +2. Create the `envoy` target group for the server: + 1. Port: 10000 + 2. Protocol: HTTP + 3. Target type: instance + 4. Attach the server instance as the single target. + 5. Health check: + 1. Match HTTP response code 200 + 2. Path: `/api/debug.ping` + 3. Timeout: 10 seconds + 4. Healthy threshold: 1 + 5. Unhealthy threshold: 2 + +3. Create an Application Load Balancer, with the following parameters: + 1. Security group: As defined [above](#create-security-groups-for-the-server-and-load-balancer) for the load balancer + 2. Subnets: Two subnets on the VPC. It is recommended that at least one of the two is the subnet the instance is on. + 3. Idle timeout: 300 seconds + 4. Enable deletion protection: True + 5. IP address type: If possible, dualstack. Otherwise, IPv4. + 6. Listeners: + 1. HTTP: + 1. Port: 80 + 2. Protocol: HTTP + 3. Redirect (HTTP 301) to the same address, with HTTPS + 2. HTTPS: + 1. Port: 443 + 2. Protocol: HTTPS + 3. Certificate: As defined above. + 4. SSL policy: + 1. Based on your company's security policy + + 2.
Currently recommended: `ELBSecurityPolicy-TLS13-1-2-Res-2021-06` + + :::note + After setting up the listener, we recommend changing the automatically created rules: set a default HTTP 404 + response, and forward to the target group only if the HTTP host header matches `` or `*.` + ::: + +4. Define DNS rules: + 1. Use your DNS provider of choice to forward traffic to the load balancer. + 2. If using Route 53, the use of A record aliases is recommended. + 3. The following domains should point to the load balancer: + 1. `` + 2. `*.` + +You can now change the load balancer's security group to allow Internet access. + +## Backups + +### File Server + +Identify the file server's EBS volume ID on the AWS console. + +In the AWS Backup service: + +1. Create a backup vault. +2. Create a backup plan for the EBS volume into the vault. + 1. It is recommended to perform at least a daily backup. + 2. It is recommended to retain backups for at least 2 days. + +### Elastic + +#### Create the Backup Repo + +1. Copy the `create_elastic_backup_repo.sh` file to the `/home/ec2-user` directory on the server +2. Run: + + ``` + create_elastic_backup_repo.sh + ``` + +#### Backing Up + +Backup is done by running the `elastic_backup.py` Python script periodically. + +1. Copy `elastic_backup.py` to the `/home/ec2-user` directory on the server +2. Install the required packages: + + ``` + pip3 install "elasticsearch>=6.0.0,<=7.17.7" + pip3 install boto3 + ``` + +3. For daily backups, run: + + ``` + /home/ec2-user/elastic_backup.py --host-address localhost --snapshot-name-prefix clearml --backup-repo daily --delete-backups-older-than-days 7 + ``` + +4. For hourly backups, run: + + ``` + /home/ec2-user/elastic_backup.py --host-address localhost --snapshot-name-prefix clearml --backup-repo hourly --delete-backups-older-than-days 1 + ``` + +5. It is recommended to add these commands to the crontab. + +### MongoDB + +Backup is done by running the `mongo_backup.sh` script periodically. + +1.
Copy `mongo_backup.sh` to the `/home/ec2-user` directory on the server +2. Run: + + ``` + mongo_backup.sh / (ex: mongo_backup.sh mybucket/path/in/bucket) + ``` + +3. It is recommended to add this to the crontab. + +:::note +The MongoDB script does not delete old backups. It is recommended to create an S3 lifecycle rule that deletes +backups older than the company's required retention period. +::: + +## Monitoring +### Hardware Monitoring + +#### CPU + +CPU usage varies depending on system usage. We recommend monitoring CPU usage and alerting when usage is higher +than normal. Recommended starting alerts are 5-minute CPU load +levels of 5 and 10, adjusted according to observed performance. + +#### RAM + +Available memory usage also varies depending on system usage. Due to spikes in usage when performing certain tasks, 6-8 GB +of available RAM is recommended as the standard baseline. Some use cases may require more. Thus, we recommend having 8 GB +of available memory on top of the regular system usage. Alerts should trigger when available memory drops below normal. + +#### Disk Usage + +There are several disks used by the system. We recommend monitoring all of them. Standard alert levels are 20%, 10%, and +5% of free disk space. + +### Service Availability + +The following services should be monitored periodically for availability and response time: + +* `apiserver` - [http://localhost:10000/api/debug.ping](http://localhost:10000/api/debug.ping) should return HTTP 200 +* `webserver` - [http://localhost:10000](http://localhost:10000/) should return HTTP 200 +* `fileserver` - [http://localhost:10000/files/](http://localhost:10000/files/) should return HTTP 405 ("method not allowed") + +### API Server Docker Memory Usage + +Usage spikes can happen during normal operation, but very high spikes (above 6 GB) are not expected. We recommend using +`docker stats` to get this information.
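For periodic checks, the `docker stats` recommendation above can be scripted. The following is a minimal sketch (assuming the API server container is named `allegro-apiserver`, as used elsewhere in this guide) that prints the container's current memory usage:

```shell
# One-shot (non-streaming) stats, formatted as "<name> <mem usage>",
# then filtered down to the API server container's memory column.
sudo docker stats --no-stream --format "{{.Name}} {{.MemUsage}}" \
  | awk '$1 == "allegro-apiserver" {print $2}'
```

Such a one-liner can be wrapped in a cron job that compares the reported value against a threshold and raises an alert.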
+ +For example, the following command retrieves the API server's information from the Docker server: + +``` +sudo curl -s --unix-socket /var/run/docker.sock http://localhost/containers/allegro-apiserver/stats?stream=false +``` + +We recommend monitoring the API server memory in addition to the system's available RAM. Alerts should be triggered +when the API server's memory usage exceeds normal behavior. A starting value can be 6 GB. + +### Backup Failures + +All scripts provided use exit code 0 when successfully completing the backups. Other exit codes indicate problems. The +log usually indicates the reason for the failure. + +## Maintenance + +### Removing App Containers + +To remove old application containers, add the following to the crontab: + +``` +0 0 * * * root docker container prune --force --filter "until=96h" +``` diff --git a/docs/deploying_models.md b/docs/deploying_models.md new file mode 100644 index 00000000..5afb240b --- /dev/null +++ b/docs/deploying_models.md @@ -0,0 +1,34 @@ +--- +title: Model Deployment +--- + +Model deployment makes trained models accessible for real-world applications. ClearML provides a comprehensive suite of +tools for seamless model deployment, supporting +features including: +* Version control +* Automatic updates +* Performance monitoring + +ClearML's offerings optimize the deployment process +while ensuring scalability and security. The solutions include: +* **Model Deployment UI Applications** (available under the Enterprise Plan) - The UI applications simplify deploying models + as network services through secure endpoints, providing an interface for managing deployments--no code required.
+ See more information about the following applications: + * [vLLM Deployment](webapp/applications/apps_model_deployment.md) + * [Embedding Model Deployment](webapp/applications/apps_embed_model_deployment.md) + * [Llama.cpp Model Deployment](webapp/applications/apps_llama_deployment.md) +* **Command-line Interface** - `clearml-serving` is a CLI for model deployment and orchestration. + It supports integration with Kubernetes clusters or custom container-based + solutions, offering flexibility for diverse infrastructure setups. + For more information, see [ClearML Serving](clearml_serving/clearml_serving.md). + +## Model Endpoint Monitoring +All deployed models are displayed in a unified **Model Endpoints** list in the UI. This +allows users to monitor endpoint activity and manage deployments from a single location. + +For more information, see [Model Endpoints](webapp/webapp_model_endpoints.md). + +![Model Endpoints](img/webapp_model_endpoints_monitor.png#light-mode-only) +![Model Endpoints](img/webapp_model_endpoints_monitor_dark.png#dark-mode-only) + + diff --git a/docs/fundamentals/agents_and_queues.md b/docs/fundamentals/agents_and_queues.md index 50509f53..aa7a1337 100644 --- a/docs/fundamentals/agents_and_queues.md +++ b/docs/fundamentals/agents_and_queues.md @@ -17,7 +17,7 @@ from installing required packages to setting environment variables, all leading to executing the code (supporting both virtual environment or flexible docker container configurations). The agent also supports overriding parameter values on-the-fly without code modification, thus enabling no-code experimentation (this is also the foundation on which -ClearML [Hyperparameter Optimization](hpo.md) is implemented). +ClearML [Hyperparameter Optimization](../getting_started/hpo.md) is implemented). An agent can be associated with specific GPUs, enabling workload distribution. 
For example, on a machine with 8 GPUs you can allocate several GPUs to an agent and use the rest for a different workload, even through another agent (see [Dynamic GPU Allocation](../clearml_agent/clearml_agent_dynamic_gpus.md)). diff --git a/docs/fundamentals/hyperparameters.md b/docs/fundamentals/hyperparameters.md index f91da4a4..52e02390 100644 --- a/docs/fundamentals/hyperparameters.md +++ b/docs/fundamentals/hyperparameters.md @@ -6,7 +6,7 @@ Hyperparameters are a script's configuration options. Since hyperparameters can model performance, it is crucial to efficiently track and manage them. ClearML supports tracking and managing hyperparameters in each task and provides a dedicated [hyperparameter -optimization module](hpo.md). With ClearML's logging and tracking capabilities, tasks can be reproduced, and their +optimization module](../getting_started/hpo.md). With ClearML's logging and tracking capabilities, tasks can be reproduced, and their hyperparameters and results can be saved and compared, which is key to understanding model behavior. ClearML lets you easily try out different hyperparameter values without changing your original code. ClearML's [execution diff --git a/docs/fundamentals/task.md b/docs/fundamentals/task.md index afdb8633..19ed8fc5 100644 --- a/docs/fundamentals/task.md +++ b/docs/fundamentals/task.md @@ -124,7 +124,7 @@ Available task types are: * *inference* - Model inference job (e.g. offline / batch model execution) * *controller* - A task that lays out the logic for other tasks' interactions, manual or automatic (e.g. a pipeline controller) -* *optimizer* - A specific type of controller for optimization tasks (e.g. [hyperparameter optimization](hpo.md)) +* *optimizer* - A specific type of controller for optimization tasks (e.g. [hyperparameter optimization](../getting_started/hpo.md)) * *service* - Long lasting or recurring service (e.g. server cleanup, auto ingress, sync services etc.) 
* *monitor* - A specific type of service for monitoring +* *application* - A task implementing custom applicative logic, like [autoscaler](../guides/services/aws_autoscaler.md) diff --git a/docs/getting_started/architecture.md b/docs/getting_started/architecture.md index 1343e688..cc59cdd3 100644 --- a/docs/getting_started/architecture.md +++ b/docs/getting_started/architecture.md @@ -2,9 +2,9 @@ title: ClearML Modules --- -- [**ClearML Python Package**](../getting_started/ds/ds_first_steps.md#install-clearml) (`clearml`) for integrating ClearML into your existing code-base. -- [**ClearML Server**](../deploying_clearml/clearml_server.md) (`clearml-server`) for storing experiment, model, and workflow data, and supporting the Web UI experiment manager. It is also the control plane for the MLOps. -- [**ClearML Agent**](../clearml_agent.md) (`clearml-agent`), the MLOps orchestration agent. Enabling experiment and workflow reproducibility, and scalability. +- [**ClearML Python Package**](../clearml_sdk/clearml_sdk_setup.md) (`clearml`) for integrating ClearML into your existing code-base. +- [**ClearML Server**](../deploying_clearml/clearml_server.md) (`clearml-server`) for storing task, model, and workflow data, and supporting the Web UI experiment manager. It is also the control plane for MLOps. +- [**ClearML Agent**](../clearml_agent.md) (`clearml-agent`), the MLOps orchestration agent, enabling task and workflow reproducibility and scalability. - [**ClearML Data**](../clearml_data/clearml_data.md) (`clearml-data`) data management and versioning on top of file-systems/object-storage. - [**ClearML Serving**](../clearml_serving/clearml_serving.md) (`clearml-serving`) for model deployment and orchestration. - [**ClearML Session**](../apps/clearml_session.md) (`clearml-session`) for launching remote instances of Jupyter Notebooks and VSCode.
diff --git a/docs/getting_started/auto_log_exp.md b/docs/getting_started/auto_log_exp.md new file mode 100644 index 00000000..2f3b44d8 --- /dev/null +++ b/docs/getting_started/auto_log_exp.md @@ -0,0 +1,59 @@ +--- +title: Auto-logging Experiments +--- + +In ClearML, experiments are organized as [Tasks](../fundamentals/task.md). + +When you integrate the ClearML SDK with your code, the ClearML task manager automatically captures: +* Source code and uncommitted changes +* Installed packages +* General information such as machine details, runtime, creation date etc. +* Model files, parameters, scalars, and plots from popular ML frameworks such as TensorFlow and PyTorch (see list of [supported frameworks](../clearml_sdk/task_sdk.md#automatic-logging)) +* Console output + +:::tip Automatic logging control +To control what ClearML automatically logs, see this [FAQ](../faq.md#controlling_logging). +::: + +## To Auto-log Your Experiments + +1. Install `clearml` and connect it to the ClearML Server (see [instructions](../clearml_sdk/clearml_sdk.md)) +1. At the beginning of your code, import the `clearml` package: + + ```python + from clearml import Task + ``` + + :::tip Full Automatic Logging + To ensure full automatic logging, it is recommended to import the `clearml` package at the top of your entry script. + ::: + +1. Initialize the Task object in your `main()` function, or the beginning of the script. + + ```python + task = Task.init(project_name='great project', task_name='best task') + ``` + + If the project does not already exist, a new one is created automatically. + + The console should display the following output: + + ``` + ClearML Task: created new task id=1ca59ef1f86d44bd81cb517d529d9e5a + 2021-07-25 13:59:09 + ClearML results page: https://app.clear.ml/projects/4043a1657f374e9298649c6ba72ad233/experiments/1ca59ef1f86d44bd81cb517d529d9e5a/output/log + 2025-01-25 13:59:16 + ``` + +1. 
Click the results page link to go to the [task's detail page in the ClearML WebApp](../webapp/webapp_exp_track_visual.md), + where you can monitor the task's status, view all its logged data, visualize its results, and more! + + ![Info panel](../img/webapp_tracking_40.png#light-mode-only) + ![Info panel](../img/webapp_tracking_40_dark.png#dark-mode-only) + +**That's it!** You are done integrating ClearML with your code :) + +Now, [command-line arguments](../fundamentals/hyperparameters.md#tracking-hyperparameters), [console output](../fundamentals/logger.md#types-of-logged-results), TensorBoard and Matplotlib, and much more will automatically be +logged in the UI under the created Task. + +Sit back, relax, and watch your models converge :) \ No newline at end of file diff --git a/docs/getting_started/building_pipelines.md b/docs/getting_started/building_pipelines.md new file mode 100644 index 00000000..a6a7466d --- /dev/null +++ b/docs/getting_started/building_pipelines.md @@ -0,0 +1,25 @@ +--- +title: Building Pipelines +--- + + +Pipelines are a way to streamline and connect multiple processes, plugging the output of one process into the input of another. + +ClearML Pipelines are implemented by a Controller Task that holds the logic of the pipeline steps' interactions. The +execution logic controls which step to launch based on parent steps completing their execution. Depending on the +specifications laid out in the controller task, a step's parameters can be overridden, enabling users to leverage other +steps' execution products such as artifacts and parameters. + +When run, the controller sequentially launches the pipeline steps. Pipelines can be executed locally or +on any machine using the [clearml-agent](../clearml_agent.md). + +ClearML pipelines are created from code using one of the following: +* [PipelineController class](../pipelines/pipelines_sdk_tasks.md) - A Pythonic interface for defining and configuring the + pipeline controller and its steps.
The controller and steps can be functions in your Python code or existing ClearML tasks. +* [PipelineDecorator class](../pipelines/pipelines_sdk_function_decorators.md) - A set of Python decorators which transform + your functions into the pipeline controller and steps + +For more information, see [ClearML Pipelines](../pipelines/pipelines.md). + +![Pipeline DAG](../img/webapp_pipeline_DAG.png#light-mode-only) +![Pipeline DAG](../img/webapp_pipeline_DAG_dark.png#dark-mode-only) \ No newline at end of file diff --git a/docs/getting_started/clearml_agent_base_docker.md b/docs/getting_started/clearml_agent_base_docker.md new file mode 100644 index 00000000..b4959647 --- /dev/null +++ b/docs/getting_started/clearml_agent_base_docker.md @@ -0,0 +1,20 @@ +--- +title: Building Task Execution Environments in a Container +--- + +### Base Container + +Build a container according to the execution environment of a specific task. + +```bash +clearml-agent build --id --docker --target +``` + +You can add the container as the base container image to a task, using one of the following methods: + +- Using the **ClearML Web UI** - See [Default Container](../webapp/webapp_exp_tuning.md#default-container). +- In the ClearML configuration file - Use the ClearML configuration file [`agent.default_docker`](../configs/clearml_conf.md#agentdefault_docker) + options. + +Check out [this tutorial](../guides/clearml_agent/exp_environment_containers.md) for building a Docker container +replicating the execution environment of an existing task. 
\ No newline at end of file diff --git a/docs/clearml_agent/clearml_agent_docker.md b/docs/getting_started/clearml_agent_docker_exec.md similarity index 51% rename from docs/clearml_agent/clearml_agent_docker.md rename to docs/getting_started/clearml_agent_docker_exec.md index 99bdb1e2..c1c3eb30 100644 --- a/docs/clearml_agent/clearml_agent_docker.md +++ b/docs/getting_started/clearml_agent_docker_exec.md @@ -1,5 +1,5 @@ --- -title: Building Docker Containers +title: Building Executable Task Containers --- ## Exporting a Task into a Standalone Docker Container @@ -28,20 +28,3 @@ Build a Docker container that when launched executes a specific task, or a clone Check out [this tutorial](../guides/clearml_agent/executable_exp_containers.md) for building executable task containers. - -### Base Docker Container - -Build a Docker container according to the execution environment of a specific task. - -```bash -clearml-agent build --id --docker --target -``` - -You can add the Docker container as the base Docker image to a task, using one of the following methods: - -- Using the **ClearML Web UI** - See [Default Container](../webapp/webapp_exp_tuning.md#default-container). -- In the ClearML configuration file - Use the ClearML configuration file [`agent.default_docker`](../configs/clearml_conf.md#agentdefault_docker) - options. - -Check out [this tutorial](../guides/clearml_agent/exp_environment_containers.md) for building a Docker container -replicating the execution environment of an existing task. 
\ No newline at end of file diff --git a/docs/clearml_agent/clearml_agent_scheduling.md b/docs/getting_started/clearml_agent_scheduling.md similarity index 99% rename from docs/clearml_agent/clearml_agent_scheduling.md rename to docs/getting_started/clearml_agent_scheduling.md index 469dfe54..80d22df7 100644 --- a/docs/clearml_agent/clearml_agent_scheduling.md +++ b/docs/getting_started/clearml_agent_scheduling.md @@ -1,6 +1,7 @@ --- -title: Scheduling Working Hours +title: Managing Agent Work Schedules --- + :::important Enterprise Feature This feature is available under the ClearML Enterprise plan. ::: diff --git a/docs/getting_started/data_management.md b/docs/getting_started/data_management.md new file mode 100644 index 00000000..3064a51f --- /dev/null +++ b/docs/getting_started/data_management.md @@ -0,0 +1,131 @@ +--- +title: Managing Your Data +--- + +Data is probably one of the biggest factors that determine the success of a project. Associating a model's data with +the model's configuration, code, and results (such as accuracy) is key to deriving meaningful insights into model behavior. + +[ClearML Data](../clearml_data/clearml_data.md) lets you: +* Version your data +* Fetch your data from every machine with minimal code changes +* Use the data with any other task +* Associate data with task results + +ClearML offers the following data management solutions: + +* `clearml.Dataset` - A Python interface for creating, retrieving, managing, and using datasets. See [SDK](../clearml_data/clearml_data_sdk.md) + for an overview of the basic methods of the Dataset module. +* `clearml-data` - A CLI utility for creating, uploading, and managing datasets. See [CLI](../clearml_data/clearml_data_cli.md) + for a reference of `clearml-data` commands. +* Hyper-Datasets - ClearML's advanced queryable dataset management solution.
For more information, see [Hyper-Datasets](../hyperdatasets/overview.md) + +The following guide will use both the `clearml-data` CLI and the `Dataset` class to do the following: +1. Create a ClearML dataset +2. Access the dataset from a ClearML Task in order to preprocess the data +3. Create a new version of the dataset with the modified data +4. Use the new version of the dataset to train a model + +## Creating a Dataset + +Let's assume you have some code that extracts data from a production database into a local folder. +Your goal is to create an immutable copy of the data to be used by further steps. + +1. Create the dataset using the `clearml-data create` command, passing the dataset's project and name. You can add a + `latest` tag, making it easier to find the dataset later. + + ```bash + clearml-data create --project chatbot_data --name dataset_v1 --tags latest + ``` + +1. Add data to the dataset using `clearml-data sync`, passing the path of the folder to be added to the dataset. + This command also uploads the data and finalizes the dataset automatically. + + ```bash + clearml-data sync --folder ./work_dataset + ``` + + +## Preprocessing Data +The second step is to preprocess the data: first access the data, then modify it, +and lastly create a new version of the data. + +1. Create a task for your data preprocessing (optional): + + ```python + from clearml import Task, Dataset + + # create a task for the data processing + task = Task.init(project_name='data', task_name='create', task_type='data_processing') + ``` + +1. Access a dataset using [`Dataset.get()`](../references/sdk/dataset.md#datasetget): + + ```python + # get the v1 dataset + dataset = Dataset.get(dataset_project='data', dataset_name='dataset_v1') + ``` +1. Get a local mutable copy of the dataset using [`Dataset.get_mutable_local_copy`](../references/sdk/dataset.md#get_mutable_local_copy). + This downloads the dataset to a specified `target_folder` (non-cached).
If the folder already has contents, specify + whether to overwrite its contents with the dataset contents using the `overwrite` parameter. + + ```python + # get a local mutable copy of the dataset + dataset_folder = dataset.get_mutable_local_copy( + target_folder='work_dataset', + overwrite=True + ) + ``` + +1. Preprocess the data, including modifying some files in the `./work_dataset` folder. + +1. Create a new version of the dataset: + + ```python + # create a new version of the dataset with the modified files + new_dataset = Dataset.create( + dataset_project='data', + dataset_name='dataset_v2', + parent_datasets=[dataset], + # this will make sure we have the creation code and the actual dataset artifacts on the same Task + use_current_task=True, + ) + ``` + +1. Add the modified data to the dataset: + + ```python + new_dataset.sync_folder(local_path=dataset_folder) + new_dataset.upload() + new_dataset.finalize() + ``` + +1. Remove the `latest` tag from the previous dataset and add the tag to the new dataset: + ```python + # now let's remove the previous dataset tag + dataset.tags = [] + new_dataset.tags = ['latest'] + ``` + +The new dataset inherits the contents of the datasets specified in `Dataset.create`'s `parent_datasets` argument. +This not only helps trace back dataset changes with full genealogy, but also makes the storage more efficient, +since only the changed and/or added files from the parent versions are stored. +When you access the dataset, the files from all parent versions are merged in a fully transparent process, +as if the files were always part of the requested Dataset. + +## Training +You can now train your model with the **latest** dataset you have in the system, by getting the instance of the Dataset +based on the `latest` tag (if you have two Datasets with the same tag, you will get the newest). +Once you have the dataset, you can request a local copy of the data.
All local copy requests are cached, +which means that if you access the same dataset multiple times you will not have any unnecessary downloads. + +```python +# create a task for the model training +task = Task.init(project_name='data', task_name='ingest', task_type='training') + +# get the latest dataset with the tag `latest` +dataset = Dataset.get(dataset_tags='latest') + +# get a cached copy of the Dataset files +dataset_folder = dataset.get_local_copy() + +# train model here +``` \ No newline at end of file diff --git a/docs/getting_started/ds/ds_second_steps.md b/docs/getting_started/ds/ds_second_steps.md deleted file mode 100644 index 3aaf3f87..00000000 --- a/docs/getting_started/ds/ds_second_steps.md +++ /dev/null @@ -1,193 +0,0 @@ ---- -title: Next Steps ---- - -So, you've already [installed ClearML's Python package](ds_first_steps.md) and run your first experiment! - -Now, you'll learn how to track Hyperparameters, Artifacts, and Metrics! - -## Accessing Experiments - -Every previously executed experiment is stored as a Task. -A Task's project and name can be changed after the experiment has been executed. -A Task is also automatically assigned an auto-generated unique identifier (UUID string) that cannot be changed and always locates the same Task in the system. - -Retrieve a Task object programmatically by querying the system based on either the Task ID, -or project and name combination. You can also query tasks based on their properties, like tags (see [Querying Tasks](../../clearml_sdk/task_sdk.md#querying--searching-tasks)). - -```python -prev_task = Task.get_task(task_id='123456deadbeef') -``` - -Once you have a Task object you can query the state of the Task, get its model(s), scalars, parameters, etc. - -## Log Hyperparameters - -For full reproducibility, it's paramount to save hyperparameters for each experiment. 
Since hyperparameters can have substantial impact -on model performance, saving and comparing these between experiments is sometimes the key to understanding model behavior. - -ClearML supports logging `argparse` module arguments out of the box, so once ClearML is integrated into the code, it automatically logs all parameters provided to the argument parser. - -You can also log parameter dictionaries (very useful when parsing an external configuration file and storing as a dict object), -whole configuration files, or even custom objects or [Hydra](https://hydra.cc/docs/intro/) configurations! - -```python -params_dictionary = {'epochs': 3, 'lr': 0.4} -task.connect(params_dictionary) -``` - -See [Configuration](../../clearml_sdk/task_sdk.md#configuration) for all hyperparameter logging options. - -## Log Artifacts - -ClearML lets you easily store the output products of an experiment - Model snapshot / weights file, a preprocessing of your data, feature representation of data and more! - -Essentially, artifacts are files (or Python objects) uploaded from a script and are stored alongside the Task. -These artifacts can be easily accessed by the web UI or programmatically. - -Artifacts can be stored anywhere, either on the ClearML server, or any object storage solution or shared folder. -See all [storage capabilities](../../integrations/storage.md). - - -### Adding Artifacts - -Upload a local file containing the preprocessed results of the data: -```python -task.upload_artifact(name='data', artifact_object='/path/to/preprocess_data.csv') -``` - -You can also upload an entire folder with all its content by passing the folder (the folder will be zipped and uploaded as a single zip file). -```python -task.upload_artifact(name='folder', artifact_object='/path/to/folder/') -``` - -Lastly, you can upload an instance of an object; Numpy/Pandas/PIL Images are supported with `npz`/`csv.gz`/`jpg` formats accordingly. 
-If the object type is unknown, ClearML pickles it and uploads the pickle file. - -```python -import numpy as np - -numpy_object = np.eye(100, 100) -task.upload_artifact(name='features', artifact_object=numpy_object) -``` - -For more artifact logging options, see [Artifacts](../../clearml_sdk/task_sdk.md#artifacts). - -### Using Artifacts - -Logged artifacts can be used by other Tasks, whether it's a pre-trained Model or processed data. -To use an artifact, first you have to get an instance of the Task that originally created it, -then you either download it and get its path, or get the artifact object directly. - -For example, use previously generated preprocessed data: - -```python -preprocess_task = Task.get_task(task_id='preprocessing_task_id') -local_csv = preprocess_task.artifacts['data'].get_local_copy() -``` - -`task.artifacts` is a dictionary where the keys are the artifact names, and the values are the artifact objects. -Calling `get_local_copy()` returns a local cached copy of the artifact. Therefore, next time you execute the code, you don't -need to download the artifact again. -Calling `get()` gets a deserialized pickled object. - -Check out the [artifacts retrieval](https://github.com/clearml/clearml/blob/master/examples/reporting/artifacts_retrieval.py) example code. - -### Models - -Models are a special kind of artifact. -Models created by popular frameworks (such as PyTorch, TensorFlow, Scikit-learn) are automatically logged by ClearML. -All snapshots are automatically logged. To make sure the model snapshot is also automatically uploaded (instead of only its local path being saved), -pass a storage location for the model files to be uploaded to. - -For example, upload all snapshots to an S3 bucket: -```python -task = Task.init( - project_name='examples', - task_name='storing model', - output_uri='s3://my_models/' -) -``` - -Now, whenever the framework (TensorFlow/Keras/PyTorch etc.) stores a snapshot, the model file is automatically uploaded to the bucket, under a folder specific to the experiment. - -Models loaded by a framework are also logged by the system; these models appear in an experiment's **Artifacts** tab, -under the "Input Models" section. - -Check out model snapshots examples for [TensorFlow](https://github.com/clearml/clearml/blob/master/examples/frameworks/tensorflow/tensorflow_mnist.py), -[PyTorch](https://github.com/clearml/clearml/blob/master/examples/frameworks/pytorch/pytorch_mnist.py), -[Keras](https://github.com/clearml/clearml/blob/master/examples/frameworks/keras/keras_tensorboard.py), -[scikit-learn](https://github.com/clearml/clearml/blob/master/examples/frameworks/scikit-learn/sklearn_joblib_example.py). - -#### Loading Models -Loading a previously trained model is quite similar to loading artifacts. - -```python -prev_task = Task.get_task(task_id='the_training_task') -last_snapshot = prev_task.models['output'][-1] -local_weights_path = last_snapshot.get_local_copy() -``` - -As before, first get the instance of the task that trained the original weights file, then query the task for its output models (a list of snapshots), and get the latest snapshot. -:::note -With TensorFlow, the snapshots are stored in a folder, meaning the `local_weights_path` will point to a folder containing your requested snapshot. -::: -As with artifacts, all models are cached, meaning the next time you run this code, no model needs to be downloaded. -Once one of the frameworks loads the weights file, the running task is automatically updated with an "Input Model" pointing directly to the original training Task's Model. -This feature lets you easily get a full genealogy of every model trained and used by your system! - -## Log Metrics - -Full metrics logging is the key to finding the best performing model! -By default, ClearML automatically captures and logs everything reported to TensorBoard and Matplotlib.
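For example, a minimal sketch of this automatic capture (assuming a configured ClearML setup and PyTorch's TensorBoard writer installed; the project, task, and scalar names are illustrative):

```python
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name='examples', task_name='tensorboard auto-logging')

# anything written to TensorBoard after Task.init() is captured by ClearML
writer = SummaryWriter(log_dir='runs')
for step in range(100):
    writer.add_scalar('train/loss', 1.0 / (step + 1), step)
writer.close()
```

No extra reporting code is needed; the scalars appear in the task's **Scalars** tab.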
- -Since not all metrics are tracked that way, you can also manually report metrics using a [`Logger`](../../fundamentals/logger.md) object. - -You can log everything, from time series data and confusion matrices to HTML, Audio, and Video, to custom plotly graphs! Everything goes! - -![Experiment plots](../../img/report_plotly.png#light-mode-only) -![Experiment plots](../../img/report_plotly_dark.png#dark-mode-only) - -Once everything is neatly logged and displayed, use the [comparison tool](../../webapp/webapp_exp_comparing.md) to find the best configuration! - - -## Track Experiments - -The task table is a powerful tool for creating dashboards and views of your own projects, your team's projects, or the entire development effort. - -![Task table](../../img/webapp_experiment_table.png#light-mode-only) -![Task table](../../img/webapp_experiment_table_dark.png#dark-mode-only) - - -### Creating Leaderboards -Customize the [task table](../../webapp/webapp_exp_table.md) to fit your own needs, adding desired views of parameters, metrics, and tags. -You can filter and sort based on parameters and metrics, so creating custom views is simple and flexible. - -Create a dashboard for a project, presenting the latest Models and their accuracy scores, for immediate insights. - -It can also be used as a live leaderboard, showing the best performing experiments' status, updated in real time. -This is helpful to monitor your projects' progress, and to share it across the organization. - -Any page is sharable by copying the URL from the address bar, allowing you to bookmark leaderboards or to send an exact view of a specific experiment or a comparison page. - -You can also tag Tasks for visibility and filtering, allowing you to add more information on the execution of the experiment. -Later you can search based on task name in the search bar, and filter experiments based on their tags, parameters, status, and more. - -## What's Next? - -This covers the basics of ClearML!
Running through this guide you've learned how to log Parameters, Artifacts and Metrics! - -If you want to learn more, look at how we see the data science process in our [best practices](best_practices.md) page, -or check these pages out: - -- Scale your work and deploy [ClearML Agents](../../clearml_agent.md) -- Develop on remote machines with [ClearML Session](../../apps/clearml_session.md) -- Structure your work and put it into [Pipelines](../../pipelines/pipelines.md) -- Improve your experiments with [Hyperparameter Optimization](../../fundamentals/hpo.md) -- Check out ClearML's integrations with your favorite ML frameworks like [TensorFlow](../../integrations/tensorflow.md), - [PyTorch](../../integrations/pytorch.md), [Keras](../../integrations/keras.md), - and more - -## YouTube Playlist - -All these tips and tricks are also covered in ClearML's **Getting Started** series on YouTube. Go check it out :) - -[![Watch the video](https://img.youtube.com/vi/kyOfwVg05EM/hqdefault.jpg)](https://www.youtube.com/watch?v=kyOfwVg05EM&list=PLMdIlCuMqSTnoC45ME5_JnsJX0zWqDdlO&index=3) \ No newline at end of file diff --git a/docs/getting_started/hpo.md b/docs/getting_started/hpo.md new file mode 100644 index 00000000..aa388581 --- /dev/null +++ b/docs/getting_started/hpo.md @@ -0,0 +1,34 @@ +--- +title: Hyperparameter Optimization +--- + +## What is Hyperparameter Optimization? +Hyperparameters are variables that directly control the behaviors of training algorithms, and have a significant effect on +the performance of the resulting machine learning models. Hyperparameter optimization (HPO) is crucial for improving +model performance and generalization. + +Finding the hyperparameter values that yield the best performing models can be complicated. Manually adjusting +hyperparameters over the course of many training trials can be slow and tedious. Luckily, ClearML offers automated +solutions to boost hyperparameter optimization efficiency.
+ +## Workflow + +![Hyperparameter optimization diagram](../img/hpo_diagram.png) + +The preceding diagram demonstrates the typical flow of hyperparameter optimization where the parameters of a base task are optimized: + +1. Configure an Optimization Task with a base task whose parameters will be optimized, optimization targets, and a set of parameter values to + test. +1. Clone the base task. Each clone's parameters are overridden with a set of values from the parameter space. +1. Enqueue each clone for execution by a ClearML Agent. +1. The Optimization Task records and monitors the cloned tasks' configuration and execution details, and returns a + summary of the optimization results. + +## ClearML Solutions + +ClearML offers three solutions for hyperparameter optimization: +* [GUI application](../webapp/applications/apps_hpo.md): The Hyperparameter Optimization app allows you to run and manage the optimization tasks + directly from the web interface--no code necessary (available under the ClearML Pro plan). +* [Command-Line Interface (CLI)](../apps/clearml_param_search.md): The `clearml-param-search` CLI tool enables you to configure and launch the optimization process from your terminal. +* [Python Interface](../clearml_sdk/hpo_sdk.md): The `HyperParameterOptimizer` class within the ClearML SDK allows you to + configure and launch optimization tasks, and seamlessly integrate them in your existing model training tasks.
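The Python interface can be sketched as follows (a minimal sketch, not a full recipe: it assumes an existing base task ID, a running agent on the `default` queue, and Optuna installed; the parameter names, metric, and job counts are illustrative):

```python
from clearml import Task
from clearml.automation import (
    HyperParameterOptimizer, UniformParameterRange, DiscreteParameterRange
)
from clearml.automation.optuna import OptimizerOptuna

# the controller itself is a task, so the optimization is tracked in the UI
task = Task.init(project_name='examples', task_name='HPO controller',
                 task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id='<base_task_id>',  # placeholder: task whose parameters are optimized
    hyper_parameters=[
        UniformParameterRange('General/lr', min_value=0.001, max_value=0.1, step_size=0.001),
        DiscreteParameterRange('General/batch_size', values=[32, 64, 128]),
    ],
    objective_metric_title='validation',
    objective_metric_series='accuracy',
    objective_metric_sign='max',
    optimizer_class=OptimizerOptuna,
    execution_queue='default',   # queue the clones are enqueued to
    total_max_jobs=10,
)
optimizer.start()
optimizer.wait()   # block until the optimization completes
optimizer.stop()
```

The `General/` prefix assumes the base task's parameters were logged in the default `argparse` section.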
diff --git a/docs/getting_started/logging_using_artifacts.md b/docs/getting_started/logging_using_artifacts.md new file mode 100644 index 00000000..27cafc98 --- /dev/null +++ b/docs/getting_started/logging_using_artifacts.md @@ -0,0 +1,122 @@ +--- +title: Logging and Using Task Artifacts +--- + +:::note +This tutorial assumes that you've already set up [ClearML](../clearml_sdk/clearml_sdk_setup.md). +::: + + +ClearML lets you easily store a task's output products--or **Artifacts**: +* [Model](#models) snapshot / weights file +* Preprocessing of your data +* Feature representation of data +* And more! + +**Artifacts** are files or Python objects that are uploaded and stored alongside the Task. +These artifacts can be easily accessed by the web UI or programmatically. + +Artifacts can be stored anywhere, either on the ClearML Server or on any object storage solution or shared folder. +See all [storage capabilities](../integrations/storage.md). + + +## Adding Artifacts + +Let's create a [Task](../fundamentals/task.md) and add some artifacts to it. + +1. Create a task using [`Task.init()`](../references/sdk/task.md#taskinit): + + ```python + from clearml import Task + + task = Task.init(project_name='great project', task_name='task with artifacts') + ``` + +1. Upload a local **file** using [`Task.upload_artifact()`](../references/sdk/task.md#upload_artifact) and specifying the artifact's + name and its path: + + ```python + task.upload_artifact(name='data', artifact_object='/path/to/preprocess_data.csv') + ``` + +1. Upload an **entire folder** with all its content by passing the folder path (the folder will be zipped and uploaded as a single zip file). + + ```python + task.upload_artifact(name='folder', artifact_object='/path/to/folder/') + ``` + +1. Upload an instance of an object. Numpy/Pandas/PIL Images are supported with `npz`/`csv.gz`/`jpg` formats respectively. + If the object type is unknown, ClearML pickles it and uploads the pickle file.
+ + ```python + import numpy as np + + numpy_object = np.eye(100, 100) + task.upload_artifact(name='features', artifact_object=numpy_object) + ``` + +For more artifact logging options, see [Artifacts](../clearml_sdk/task_sdk.md#artifacts). + +## Using Artifacts + +Logged artifacts can be used by other Tasks, whether it's a pre-trained Model or processed data. +To use an artifact, first you have to get an instance of the Task that originally created it, +then you either download it and get its path, or get the artifact object directly. + +For example, use previously generated preprocessed data: + +```python +preprocess_task = Task.get_task(task_id='preprocessing_task_id') +local_csv = preprocess_task.artifacts['data'].get_local_copy() +``` + +`task.artifacts` is a dictionary where the keys are the artifact names, and the values are the artifact objects. +Calling `get_local_copy()` returns a local cached copy of the artifact. Therefore, next time you execute the code, you don't +need to download the artifact again. +Calling `get()` gets a deserialized pickled object. + +Check out the [artifacts retrieval](https://github.com/clearml/clearml/blob/master/examples/reporting/artifacts_retrieval.py) example code. + +## Models + +Models are a special kind of artifact. +Models created by popular frameworks (such as PyTorch, TensorFlow, Scikit-learn) are automatically logged by ClearML. +All snapshots are automatically logged. To make sure the model snapshot is also automatically uploaded (instead of only its local path being saved), +pass a storage location for the model files to be uploaded to. + +For example, upload all snapshots to an S3 bucket: +```python +task = Task.init( + project_name='examples', + task_name='storing model', + output_uri='s3://my_models/' +) +``` + +Now, whenever the framework (TensorFlow/Keras/PyTorch etc.) stores a snapshot, the model file is automatically uploaded to the bucket, under a folder specific to the task.
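With PyTorch, for instance, a standard checkpoint call is all that's needed (a sketch; it assumes ClearML is set up, `torch` is installed, and the `s3://my_models/` bucket from above is a placeholder for accessible storage):

```python
import torch
from clearml import Task

task = Task.init(
    project_name='examples',
    task_name='storing model',
    output_uri='s3://my_models/'  # snapshots are uploaded here
)

model = torch.nn.Linear(10, 2)
# the torch.save() call is intercepted: the snapshot is registered as an
# output model of the task, and the weights file is uploaded to the bucket
torch.save(model.state_dict(), 'model.pt')
```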
+ +Models loaded by a framework are also logged by the system; these models appear in a task's **Artifacts** tab, +under the "Input Models" section. + +Check out model snapshots examples for [TensorFlow](https://github.com/clearml/clearml/blob/master/examples/frameworks/tensorflow/tensorflow_mnist.py), +[PyTorch](https://github.com/clearml/clearml/blob/master/examples/frameworks/pytorch/pytorch_mnist.py), +[Keras](https://github.com/clearml/clearml/blob/master/examples/frameworks/keras/keras_tensorboard.py), +[scikit-learn](https://github.com/clearml/clearml/blob/master/examples/frameworks/scikit-learn/sklearn_joblib_example.py). + +### Loading Models +Loading a previously trained model is quite similar to loading artifacts. + +```python +prev_task = Task.get_task(task_id='the_training_task') +last_snapshot = prev_task.models['output'][-1] +local_weights_path = last_snapshot.get_local_copy() +``` + +As before, first get the instance of the task that trained the original weights file, then query the task for its output models (a list of snapshots), and get the latest snapshot. + +:::note +With TensorFlow, the snapshots are stored in a folder, meaning the `local_weights_path` will point to a folder containing your requested snapshot. +::: + +As with artifacts, all models are cached, meaning the next time you run this code, no model needs to be downloaded. +Once one of the frameworks loads the weights file, the running task is automatically updated with an "Input Model" pointing directly to the original training Task's Model. +This feature lets you easily get a full genealogy of every model trained and used by your system! + diff --git a/docs/getting_started/main.md b/docs/getting_started/main.md index 26fffcd0..bc1b3b17 100644 --- a/docs/getting_started/main.md +++ b/docs/getting_started/main.md @@ -1,8 +1,4 @@ ---- -id: main -title: What is ClearML? -slug: / ---- +# What is ClearML?
ClearML is an open-source, end-to-end AI Platform designed to streamline AI adoption and the entire development lifecycle. It supports every phase of AI development, from research to production, allowing users to @@ -109,14 +105,14 @@ Want a more in depth introduction to ClearML? Choose where you want to get start - [Track and upload](../fundamentals/task.md) metrics and models with only 2 lines of code - [Reproduce](../webapp/webapp_exp_reproducing.md) tasks with 3 mouse clicks -- [Create bots](../guides/services/slack_alerts.md) that send you Slack messages based on experiment behavior (for example, +- [Create bots](../guides/services/slack_alerts.md) that send you Slack messages based on task behavior (for example, alert you whenever your model improves in accuracy) - Manage your [data](../clearml_data/clearml_data.md) - store, track, and version control -- Remotely execute experiments on any compute resource you have available with [ClearML Agent](../clearml_agent.md) +- Remotely execute tasks on any compute resource you have available with [ClearML Agent](../clearml_agent.md) - Automatically scale cloud instances according to your resource needs with ClearML's [AWS Autoscaler](../webapp/applications/apps_aws_autoscaler.md) and [GCP Autoscaler](../webapp/applications/apps_gcp_autoscaler.md) GUI applications -- Run [hyperparameter optimization](../fundamentals/hpo.md) +- Run [hyperparameter optimization](hpo.md) - Build [pipelines](../pipelines/pipelines.md) from code - Much more! 
diff --git a/docs/getting_started/mlops/mlops_first_steps.md b/docs/getting_started/mlops/mlops_first_steps.md deleted file mode 100644 index 34635cd3..00000000 --- a/docs/getting_started/mlops/mlops_first_steps.md +++ /dev/null @@ -1,225 +0,0 @@ ---- -title: First Steps ---- - -:::note -This tutorial assumes that you've already [signed up](https://app.clear.ml) to ClearML -::: - -ClearML provides tools for **automation**, **orchestration**, and **tracking**, all key in performing effective MLOps and LLMOps. - -Effective MLOps and LLMOps rely on the ability to scale work beyond one's own computer. Moving from your own machine can be time-consuming. -Even assuming that you have all the drivers and applications installed, you still need to manage multiple Python environments -for different packages / package versions, or worse - manage different Dockers for different package versions. - -Not to mention, when working on remote machines, executing experiments, tracking what's running where, and making sure machines -are fully utilized at all times become daunting tasks. - -This can create overhead that derails you from your core work! - -ClearML Agent was designed to deal with such issues and more! It is a tool responsible for executing tasks on remote machines: on-premises or in the cloud! ClearML Agent provides the means to reproduce and track tasks in your -machine of choice through the ClearML WebApp with no need for additional code. - -The agent will set up the environment for a specific Task's execution (inside a Docker, or bare-metal), install the -required Python packages, and execute and monitor the process. - - -## Set up an Agent - -1. Install the agent: - - ```bash - pip install clearml-agent - ``` - -1. 
Connect the agent to the server by [creating credentials](https://app.clear.ml/settings/workspace-configuration), then run this: - - ```bash - clearml-agent init - ``` - - :::note - If you've already created credentials, you can copy-paste the default agent section from [here](https://github.com/clearml/clearml-agent/blob/master/docs/clearml.conf#L15) (this is optional. If the section is not provided the default values will be used) - ::: - -1. Start the agent's daemon and assign it to a [queue](../../fundamentals/agents_and_queues.md#what-is-a-queue): - - ```bash - clearml-agent daemon --queue default - ``` - - A queue is an ordered list of Tasks that are scheduled for execution. The agent will pull Tasks from its assigned - queue (`default` in this case), and execute them one after the other. Multiple agents can listen to the same queue - (or even multiple queues), but only a single agent will pull a Task to be executed. - -:::tip Agent Deployment Modes -ClearML Agents can be deployed in: -* [Virtual environment mode](../../clearml_agent/clearml_agent_execution_env.md): Agent creates a new venv to execute a task. -* [Docker mode](../../clearml_agent/clearml_agent_execution_env.md#docker-mode): Agent executes a task inside a -Docker container. - -For more information, see [Running Modes](../../fundamentals/agents_and_queues.md#running-modes). -::: - -## Clone a Task -Tasks can be reproduced (cloned) for validation or as a baseline for further experimentation. -Cloning a task duplicates the task's configuration, but not its outputs. - -**To clone a task in the ClearML WebApp:** -1. Click on any project card to open its [task table](../../webapp/webapp_exp_table.md). -1. Right-click one of the tasks on the table. -1. Click **Clone** in the context menu, which will open a **CLONE TASK** window. -1. Click **CLONE** in the window. - -The newly cloned task will appear and its info panel will slide open. The cloned task is in draft mode, so -it can be modified. 
You can edit the Git / code references, control the Python packages to be installed, specify the -Docker container image to be used, or change the hyperparameters and configuration files. See [Modifying Tasks](../../webapp/webapp_exp_tuning.md#modifying-tasks) for more information about editing tasks in the UI. - -## Enqueue a Task -Once you have set up a task, it is now time to execute it. - -**To execute a task through the ClearML WebApp:** -1. Right-click your draft task (the context menu is also available through the Menu - button on the top right of the task's info panel) -1. Click **ENQUEUE,** which will open the **ENQUEUE TASK** window -1. In the window, select `default` in the queue menu -1. Click **ENQUEUE** - -This action pushes the task into the `default` queue. The task's status becomes *Pending* until an agent -assigned to the queue fetches it, at which time the task's status becomes *Running*. The agent executes the -task, and the task can be [tracked and its results visualized](../../webapp/webapp_exp_track_visual.md). - - -## Programmatic Interface - -The cloning, modifying, and enqueuing actions described above can also be performed programmatically. - -### First Steps -#### Access Previously Executed Tasks -All Tasks in the system can be accessed through their unique Task ID, or based on their properties using the [`Task.get_task`](../../references/sdk/task.md#taskget_task) -method. For example: -```python -from clearml import Task - -executed_task = Task.get_task(task_id='aabbcc') -``` - -Once a specific Task object has been obtained, it can be cloned, modified, and more. See [Advanced Usage](#advanced-usage). - -#### Clone a Task - -To duplicate a task, use the [`Task.clone`](../../references/sdk/task.md#taskclone) method, and input either a -Task object or the Task's ID as the `source_task` argument. 
-```python -cloned_task = Task.clone(source_task=executed_task) -``` - -#### Enqueue a Task -To enqueue the task, use the [`Task.enqueue`](../../references/sdk/task.md#taskenqueue) method, and input the Task object -with the `task` argument, and the queue to push the task into with `queue_name`. - -```python -Task.enqueue(task=cloned_task, queue_name='default') -``` - -### Advanced Usage -Before execution, use a variety of programmatic methods to manipulate a task object. - -#### Modify Hyperparameters -[Hyperparameters](../../fundamentals/hyperparameters.md) are an integral part of Machine Learning code as they let you -control the code without directly modifying it. Hyperparameters can be added from anywhere in your code, and ClearML supports multiple ways to obtain them! - -Users can programmatically change cloned tasks' parameters. - -For example: -```python -from clearml import Task - -# `source_task` accepts either a Task object or a task ID string -cloned_task = Task.clone(source_task='aabbcc') -cloned_task.set_parameter(name='internal/magic', value=42) -``` - -#### Report Artifacts -Artifacts are files created by your task. Users can upload [multiple types of data](../../clearml_sdk/task_sdk.md#logging-artifacts), -objects and files to a task anywhere from code. - -```python -import numpy as np -from clearml import Task - -Task.current_task().upload_artifact(name='a_file', artifact_object='local_file.bin') -Task.current_task().upload_artifact(name='numpy', artifact_object=np.ones((4, 4))) -``` - -Artifacts serve as a great way to pass and reuse data between tasks. Artifacts can be [retrieved](../../clearml_sdk/task_sdk.md#using-artifacts) -by accessing the Task that created them. These artifacts can be modified and uploaded to other tasks.
- -```python -from clearml import Task - -executed_task = Task.get_task(task_id='aabbcc') -# artifact as a file -local_file = executed_task.artifacts['file'].get_local_copy() -# artifact as object -a_numpy = executed_task.artifacts['numpy'].get() -``` - -By facilitating the communication of complex objects between tasks, artifacts serve as the foundation of ClearML's [Data Management](../../clearml_data/clearml_data.md) -and [pipeline](../../pipelines/pipelines.md) solutions. - -#### Log Models -Logging models into the model repository is the easiest way to integrate the development process directly with production. -Any model stored by a supported framework (Keras / TensorFlow / PyTorch / Joblib etc.) will be automatically logged into ClearML. - -ClearML also supports methods to explicitly log models. Models can be automatically stored on a preferred storage medium -(S3 bucket, Google storage, etc.). - -#### Log Metrics -Log as many metrics as you want from your processes using the [Logger](../../fundamentals/logger.md) module. This -improves the visibility of your processes' progress. - -```python -from clearml import Logger - -Logger.current_logger().report_scalar( - title='metric', - series='variant', - value=13.37, - iteration=counter  # current step number -) -``` - -You can also retrieve reported scalars for programmatic analysis: -```python -from clearml import Task - -executed_task = Task.get_task(task_id='aabbcc') -# get a summary of the min/max/last value of all reported scalars -min_max_values = executed_task.get_last_scalar_metrics() -# get detailed graphs of all scalars -full_scalars = executed_task.get_reported_scalars() -``` - -#### Query Tasks -You can also search and query Tasks in the system. Use the [`Task.get_tasks`](../../references/sdk/task.md#taskget_tasks) -class method to retrieve Task objects and filter based on the specific values of the Task - status, parameters, metrics and more!
- -```python -from clearml import Task - -tasks = Task.get_tasks( - project_name='examples', - task_name='partial_name_match', - task_filter={'status': 'in_progress'} -) -``` - -#### Manage Your Data -Data is probably one of the biggest factors that determines the success of a project. Associating a model's data with -the model's configuration, code, and results (such as accuracy) is key to deducing meaningful insights into model behavior. - -[ClearML Data](../../clearml_data/clearml_data.md) lets you version your data, so it's never lost, fetch it from every -machine with minimal code changes, and associate data to task results. - -Logging data can be done via command line, or programmatically. If any preprocessing code is involved, ClearML logs it -as well! Once data is logged, it can be used by other tasks. diff --git a/docs/getting_started/mlops/mlops_second_steps.md b/docs/getting_started/mlops/mlops_second_steps.md deleted file mode 100644 index aa56772b..00000000 --- a/docs/getting_started/mlops/mlops_second_steps.md +++ /dev/null @@ -1,121 +0,0 @@ ---- -title: Next Steps ---- - -Once Tasks are defined and in the ClearML system, they can be chained together to create Pipelines. -Pipelines provide users with a greater level of abstraction and automation, with Tasks running one after the other. - -Tasks can interface with other Tasks in the pipeline and leverage other Tasks' work products. - -The sections below describe the following scenarios: -* [Dataset creation](#dataset-creation) -* Data [processing](#preprocessing-data) and [consumption](#training) -* [Pipeline building](#building-the-pipeline) - - -## Building Tasks -### Dataset Creation - -Let's assume you have some code that extracts data from a production database into a local folder. 
-Your goal is to create an immutable copy of the data to be used by further steps: - -```bash -clearml-data create --project data --name dataset -clearml-data sync --folder ./from_production -``` - -You can add a tag `latest` to the Dataset, marking it as the latest version. - -### Preprocessing Data -The second step is to preprocess the data. First access the data, then modify it, -and lastly create a new version of the data. - -```python -from clearml import Task, Dataset - -# create a task for the data processing part -task = Task.init(project_name='data', task_name='create', task_type='data_processing') - -# get the v1 dataset -dataset = Dataset.get(dataset_project='data', dataset_name='dataset_v1') - -# get a local mutable copy of the dataset -dataset_folder = dataset.get_mutable_local_copy( - target_folder='work_dataset', - overwrite=True -) -# change some files in the `./work_dataset` folder - -# create a new version of the dataset with the pickle file -new_dataset = Dataset.create( - dataset_project='data', - dataset_name='dataset_v2', - parent_datasets=[dataset], - # this will make sure we have the creation code and the actual dataset artifacts on the same Task - use_current_task=True, -) -new_dataset.sync_folder(local_path=dataset_folder) -new_dataset.upload() -new_dataset.finalize() -# now let's remove the previous dataset tag -dataset.tags = [] -new_dataset.tags = ['latest'] -``` - -The new dataset inherits the contents of the datasets specified in `Dataset.create`'s `parent_datasets` argument. -This not only helps trace back dataset changes with full genealogy, but also makes the storage more efficient, -since it only stores the changed and/or added files from the parent versions. -When you access the Dataset, it automatically merges the files from all parent versions -in a fully automatic and transparent process, as if the files were always part of the requested Dataset. 
- -### Training -You can now train your model with the **latest** Dataset you have in the system, by getting the instance of the Dataset -based on the `latest` tag -(if by any chance you have two Datasets with the same tag you will get the newest). -Once you have the dataset you can request a local copy of the data. All local copy requests are cached, -which means that if you access the same dataset multiple times you will not have any unnecessary downloads. - -```python -# create a task for the model training -task = Task.init(project_name='data', task_name='ingest', task_type='training') - -# get the latest dataset with the tag `latest` -dataset = Dataset.get(dataset_tags='latest') - -# get a cached copy of the Dataset files -dataset_folder = dataset.get_local_copy() - -# train our model here -``` - -## Building the Pipeline - -Now that you have the data creation step and the data training step, create a pipeline that, when executed, runs the first step and then the second. -It is important to remember that pipelines are Tasks by themselves and can also be automated by other pipelines (i.e. pipelines of pipelines). - -```python -from clearml import PipelineController - -pipe = PipelineController( - project='data', - name='pipeline demo', - version="1.0" -) - -pipe.add_step( - name='step 1 data', - base_task_project='data', - base_task_name='create' -) -pipe.add_step( - name='step 2 train', - parents=['step 1 data', ], - base_task_project='data', - base_task_name='ingest' -) - -# launch the pipeline controller -pipe.start() -``` - -You can also pass the parameters from one step to the other (for example `Task.id`). -In addition to pipelines made up of Task steps, ClearML also supports pipelines consisting of function steps. For more -information, see the [full pipeline documentation](../../pipelines/pipelines.md).
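A function-step pipeline can be sketched as follows (assumes a configured ClearML setup; the step functions and names are illustrative):

```python
from clearml import PipelineController

def make_dataset(factor=1):
    # step 1: produce some value (serialized as the step's output)
    return factor * 2

def train_model(value):
    # step 2: consume the previous step's return value
    print('training with', value)

pipe = PipelineController(project='data', name='function pipeline demo', version='1.0')
pipe.add_function_step(
    name='make_dataset',
    function=make_dataset,
    function_kwargs=dict(factor=3),
    function_return=['value'],
)
pipe.add_function_step(
    name='train_model',
    function=train_model,
    # referencing the previous step's return value also makes it a dependency
    function_kwargs=dict(value='${make_dataset.value}'),
)
pipe.start()
```

Each function becomes a standalone Task, and step ordering is inferred from the `${step.value}` references.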
diff --git a/docs/getting_started/project_progress.md b/docs/getting_started/project_progress.md new file mode 100644 index 00000000..01f12893 --- /dev/null +++ b/docs/getting_started/project_progress.md @@ -0,0 +1,43 @@ +--- +title: Monitoring Project Progress +--- + +ClearML provides a comprehensive set of monitoring tools to help effectively track and manage machine learning projects. +These tools offer both high-level overviews and detailed insights into task execution, resource +utilization, and project performance. + +## Offerings + +### Project Dashboard + +:::info Pro Plan Offering +The Project Dashboard app is available under the ClearML Pro plan. +::: + +The [**Project Dashboard**](../webapp/applications/apps_dashboard.md) UI application provides a centralized +view of project progress, task statuses, resource usage, and key performance metrics. It offers: +* Comprehensive insights: + * Track task statuses and trends over time. + * Monitor GPU utilization and worker activity. + * Analyze performance metrics. +* Proactive alerts - By integrating with Slack, the Dashboard can notify teams of task failures + and completions. + +For more information, see [Project Dashboard](../webapp/applications/apps_dashboard.md). + +![Project Dashboard](../img/apps_dashboard.png#light-mode-only) +![Project Dashboard](../img/apps_dashboard_dark.png#dark-mode-only) + +### Project Overview + +A project's **OVERVIEW** tab in the UI presents a general picture of a project: +* Metric Snapshot – A graphical representation of selected metric values across project tasks, offering a quick assessment of progress. +* Task Status Tracking – When a single metric variant is selected for the snapshot, task status is color-coded (e.g., +Completed, Aborted, Published, Failed) for better visibility. + +Use the Metric Snapshot to track project progress and identify trends in task performance. + +For more information, see [Project Overview](../webapp/webapp_project_overview.md). 
+ +![Project Overview](../img/webapp_project_overview.png#light-mode-only) +![Project Overview](../img/webapp_project_overview_dark.png#dark-mode-only) diff --git a/docs/getting_started/remote_execution.md b/docs/getting_started/remote_execution.md new file mode 100644 index 00000000..3f7fab5f --- /dev/null +++ b/docs/getting_started/remote_execution.md @@ -0,0 +1,84 @@ +--- +title: Remote Execution +--- + +:::note +This guide assumes that you've already set up [ClearML](../clearml_sdk/clearml_sdk_setup.md) and [ClearML Agent](../clearml_agent/clearml_agent_setup.md). +::: + +ClearML Agent enables seamless remote execution by offloading computations from a local development environment to a more +powerful remote machine. This is useful for: + +* Running an initial process (a task or function) locally before scaling up. +* Offloading resource-intensive tasks to dedicated compute nodes. +* Managing execution through ClearML's queue system. + +This guide focuses on transitioning a locally executed process to a remote machine for scalable execution. To learn how +to reproduce a previously executed process on a remote machine, see [Reproducing Tasks](reproduce_tasks.md). + +## Running a Task Remotely + +A compelling workflow is: + +1. Run code on a development machine for a few iterations, or just set up the environment. +1. Move the execution to a beefier remote machine for the actual training. + +Use [`Task.execute_remotely()`](../references/sdk/task.md#execute_remotely) to implement this workflow. This method stops the current manual execution, and then +re-runs it on a remote machine. + +1. Deploy a `clearml-agent` on your beefier remote machine and assign it to the `default` queue: + + ```commandline + clearml-agent daemon --queue default + ``` + +1.
Run your code locally to send it to the remote machine for execution: + + ```python + from clearml import Task + + task = Task.init(project_name="myProject", task_name="myTask") + + # training code + + task.execute_remotely( + queue_name='default', + clone=False, + exit_process=True + ) + ``` + +Once `execute_remotely()` is called, it stops the local process and enqueues the current task into the `default` +queue. From there, an agent assigned to the queue can pull and launch it. + +## Running a Function Remotely + +You can execute a specific function remotely using [`Task.create_function_task()`](../references/sdk/task.md#create_function_task). +This method creates a ClearML Task from a Python function and runs it on a remote machine. + +For example: + +```python +from clearml import Task + +task = Task.init(project_name="myProject", task_name="Remote function") + +def run_me_remotely(some_argument): + print(some_argument) + +a_func_task = task.create_function_task( + func=run_me_remotely, + func_name='func_id_run_me_remotely', + task_name='a func task', + # everything below will be passed directly to our function as arguments + some_argument=123 +) +``` + +:::important Function Task Creation +Function tasks must be created from within a regular task, created by calling `Task.init`. +::: + +Arguments passed to the function will be automatically logged in the task's **CONFIGURATION** tab under the **HYPERPARAMETERS > Function section**. +Like any other arguments, they can be changed from the UI or programmatically. + diff --git a/docs/getting_started/reproduce_tasks.md b/docs/getting_started/reproduce_tasks.md new file mode 100644 index 00000000..57bb1a98 --- /dev/null +++ b/docs/getting_started/reproduce_tasks.md @@ -0,0 +1,82 @@ +--- +title: Reproducing Tasks +--- + +:::note +This tutorial assumes that you've already set up [ClearML](../clearml_sdk/clearml_sdk_setup.md) and [ClearML Agent](../clearml_agent/clearml_agent_setup.md).
+::: + +Tasks can be reproduced--or **Cloned**--for validation or as a baseline for further experimentation. When you initialize a task in your +code, ClearML logs everything needed to reproduce your task and its environment: +* Uncommitted changes +* Used packages and their versions +* Parameters +* and more + +Cloning a task duplicates the task's configuration, but not its outputs. + +ClearML offers two ways to clone your task: +* [Via the WebApp](#via-the-webapp)--no further code required +* [Via programmatic interface](#via-programmatic-interface) using the `clearml` Python package + +Once you have cloned your task, you can modify its setup, and then execute it remotely on a machine of your choice using a ClearML Agent. + +## Via the WebApp + +**To clone a task in the ClearML WebApp:** +1. Click on any project card to open its [task table](../webapp/webapp_exp_table.md). +1. Right-click the task you want to reproduce. +1. Click **Clone** in the context menu, which will open a **CLONE TASK** window. +1. Click **CLONE** in the window. + +The newly cloned task's details page will open up. The cloned task is in *draft* mode, which means +it can be modified. You can edit any of the Task's setup details, including: +* Git and/or code references +* Python packages to be installed +* Container image to be used + +You can adjust the values of the task's hyperparameters and configuration files. See [Modifying Tasks](../webapp/webapp_exp_tuning.md#modifying-tasks) for more +information about editing tasks in the UI. + +### Enqueue a Task +Once you have set up a task, it is now time to execute it. + +**To execute a task through the ClearML WebApp:** +1. In the task's details page, click "Menu" Menu +1. Click **ENQUEUE** to open the **ENQUEUE TASK** window +1. In the window, select `default` in the `Queue` menu +1. Click **ENQUEUE** + +This action pushes the task into the `default` queue. 
The task's status becomes *Pending* until an agent +assigned to the queue fetches it, at which time the task's status becomes *Running*. The agent executes the +task, and the task can be [tracked and its results visualized](../webapp/webapp_exp_track_visual.md). + + +## Via Programmatic Interface + +The cloning, modifying, and enqueuing actions described above can also be performed programmatically using `clearml`. + + +### Clone the Task + +To duplicate the task, use [`Task.clone()`](../references/sdk/task.md#taskclone), and input either a +Task object or the Task's ID as the `source_task` argument. + +```python +cloned_task = Task.clone(source_task='qw03485je3hap903ere54') +``` + +The cloned task is in *draft* mode, which means it can be modified. For modification options, such as setting new parameter +values, see [Task SDK](../clearml_sdk/task_sdk.md). + +### Enqueue the Task +To enqueue the task, use [`Task.enqueue()`](../references/sdk/task.md#taskenqueue), and input the Task object +with the `task` argument, and the queue to push the task into with `queue_name`. + +```python +Task.enqueue(task=cloned_task, queue_name='default') +``` + +This action pushes the task into the `default` queue. The task's status becomes *Pending* until an agent +assigned to the queue fetches it, at which time the task's status becomes *Running*. The agent executes the +task, and the task can be [tracked and its results visualized](../webapp/webapp_exp_track_visual.md). \ No newline at end of file diff --git a/docs/getting_started/task_trigger_schedule.md b/docs/getting_started/task_trigger_schedule.md new file mode 100644 index 00000000..f1822e22 --- /dev/null +++ b/docs/getting_started/task_trigger_schedule.md @@ -0,0 +1,41 @@ +--- +title: Scheduling and Triggering Task Execution +--- + + In ClearML, tasks can be scheduled and triggered automatically, enabling seamless workflow automation. 
This section + provides an overview of the mechanisms available for managing task scheduling and event-based + triggering. + +## Task Scheduling +Task scheduling allows users to define one-shot or periodic executions at specified times and intervals. This +is useful for: + +* Running routine operations such as periodic model training, evaluation jobs, backups, and reports. +* Automating data ingestion and preprocessing workflows. +* Ensuring regular execution of monitoring and reporting tasks. + +ClearML offers the following scheduling solutions: +* [**UI Application**](../webapp/applications/apps_task_scheduler.md) (available under the Enterprise Plan) - The **Task Scheduler** app + provides a simple no-code interface for managing task schedules. + +* [**Python Interface**](../references/sdk/scheduler.md) - Use the `TaskScheduler` class to programmatically manage + task schedules. + +## Task Execution Triggering + +ClearML's trigger manager enables you to automate task execution based on event occurrence in the ClearML system, such as: +* Changes in task status (e.g. running, completed) +* Publication, archiving, or tagging of tasks, models, or datasets +* Task metrics crossing predefined thresholds + +This is useful for: +* Triggering a training task when a dataset has been tagged as `latest` or with any other tag +* Running an inference task when a model has been published +* Retraining a model when accuracy falls below a certain threshold +* And more + +ClearML offers the following trigger management solutions: +* [**UI Application**](../webapp/applications/apps_trigger_manager.md) (available under the Enterprise Plan) - The **Trigger Manager** app + provides a simple no-code interface for managing task triggers. +* [**Python Interface**](../references/sdk/trigger.md) - Use the `TriggerScheduler` class to programmatically manage + task triggers.
diff --git a/docs/getting_started/track_tasks.md b/docs/getting_started/track_tasks.md new file mode 100644 index 00000000..0b8223f6 --- /dev/null +++ b/docs/getting_started/track_tasks.md @@ -0,0 +1,46 @@ +--- +title: Tracking Tasks +--- + +Every ClearML [task](../fundamentals/task.md) you create can be found in the **All Tasks** table and in its project's +task table. + +The task table is a powerful tool for creating dashboards and views of your own projects, your team's projects, or the +entire development effort. + +![Task table](../img/webapp_experiment_table.png#light-mode-only) +![Task table](../img/webapp_experiment_table_dark.png#dark-mode-only) + +Customize the [task table](../webapp/webapp_exp_table.md) to fit your own needs by adding views of parameters, metrics, and tags. +Filter and sort based on various criteria, such as parameters and metrics, making it simple to create custom +views. This allows you to: + +* Create a dashboard for a project, presenting the latest model accuracy scores, for immediate insights. +* Create a live leaderboard displaying the best-performing tasks, updated in real time. +* Monitor a project's progress and share it across the organization. + +## Creating Leaderboards + +To create a leaderboard: + +1. Select a project in the ClearML WebApp and go to its task table +1. Customize the column selection. Click "Settings" Setting Gear + to view and select columns to display. +1. Filter tasks by name using the search bar to find tasks containing any search term +1. Filter by other categories by clicking "Filter" Filter + on the relevant column. There are a few types of filters: + * Value set - Choose which values to include from a list of all values in the column + * Numerical ranges - Insert minimum and/or maximum value + * Date ranges - Insert starting and/or ending date and time + * Tags - Choose which tags to filter by from a list of all tags used in the column.
+ * Filter by multiple tag values using the **ANY** or **ALL** options, which correspond to the logical "OR" and "AND" respectively. These + options appear on the top of the tag list. + * Filter by the absence of a tag (logical "NOT") by clicking its checkbox twice. An `X` will appear in the tag's checkbox. +1. Enable auto-refresh for real-time monitoring + +For more detailed instructions, see the [Tracking Leaderboards Tutorial](../guides/ui/building_leader_board.md). + +## Sharing Leaderboards + +Bookmark the URL of your customized leaderboard to save and share your view. The URL contains all parameters and values +for your specific leaderboard view. \ No newline at end of file diff --git a/docs/guides/clearml-task/clearml_task_tutorial.md b/docs/guides/clearml-task/clearml_task_tutorial.md index 085f352c..99b86e0f 100644 --- a/docs/guides/clearml-task/clearml_task_tutorial.md +++ b/docs/guides/clearml-task/clearml_task_tutorial.md @@ -7,7 +7,7 @@ on a remote or local machine, from a remote repository and your local machine. ### Prerequisites -- [`clearml`](../../getting_started/ds/ds_first_steps.md) Python package installed and configured +- [`clearml`](../../clearml_sdk/clearml_sdk_setup) Python package installed and configured - [`clearml-agent`](../../clearml_agent/clearml_agent_setup.md#installation) running on at least one machine (to execute the task), configured to listen to `default` queue ### Executing Code from a Remote Repository diff --git a/docs/guides/clearml_agent/executable_exp_containers.md b/docs/guides/clearml_agent/executable_exp_containers.md index 35cd57da..884bc53a 100644 --- a/docs/guides/clearml_agent/executable_exp_containers.md +++ b/docs/guides/clearml_agent/executable_exp_containers.md @@ -9,7 +9,7 @@ script.
## Prerequisites * [`clearml-agent`](../../clearml_agent/clearml_agent_setup.md#installation) installed and configured -* [`clearml`](../../getting_started/ds/ds_first_steps.md#install-clearml) installed and configured +* [`clearml`](../../clearml_sdk/clearml_sdk_setup#install-clearml) installed and configured * [clearml](https://github.com/clearml/clearml) repo cloned (`git clone https://github.com/clearml/clearml.git`) ## Creating the ClearML Task diff --git a/docs/guides/clearml_agent/exp_environment_containers.md b/docs/guides/clearml_agent/exp_environment_containers.md index 0398e017..388d932e 100644 --- a/docs/guides/clearml_agent/exp_environment_containers.md +++ b/docs/guides/clearml_agent/exp_environment_containers.md @@ -11,7 +11,7 @@ be used when running optimization tasks. ## Prerequisites * [`clearml-agent`](../../clearml_agent/clearml_agent_setup.md#installation) installed and configured -* [`clearml`](../../getting_started/ds/ds_first_steps.md#install-clearml) installed and configured +* [`clearml`](../../clearml_sdk/clearml_sdk_setup#install-clearml) installed and configured * [clearml](https://github.com/clearml/clearml) repo cloned (`git clone https://github.com/clearml/clearml.git`) ## Creating the ClearML Task diff --git a/docs/guides/frameworks/tensorflow/integration_keras_tuner.md b/docs/guides/frameworks/tensorflow/integration_keras_tuner.md index 4635afd9..5db4d120 100644 --- a/docs/guides/frameworks/tensorflow/integration_keras_tuner.md +++ b/docs/guides/frameworks/tensorflow/integration_keras_tuner.md @@ -3,10 +3,10 @@ title: Keras Tuner --- :::tip -If you are not already using ClearML, see [Getting Started](../../../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../../../clearml_sdk/clearml_sdk_setup). ::: + Integrate ClearML into code that uses [Keras Tuner](https://www.tensorflow.org/tutorials/keras/keras_tuner).
By specifying `ClearMLTunerLogger` (see [kerastuner.py](https://github.com/clearml/clearml/blob/master/clearml/external/kerastuner.py)) as the Keras Tuner logger, ClearML automatically logs scalars and hyperparameter optimization. diff --git a/docs/guides/main.md b/docs/guides/main.md index 89143186..202eaa40 100644 --- a/docs/guides/main.md +++ b/docs/guides/main.md @@ -1,6 +1,6 @@ --- id: guidemain -title: Examples +title: ClearML Tutorials slug: /guides --- diff --git a/docs/hyperdatasets/task.md b/docs/hyperdatasets/task.md index ea6a9063..5543acaf 100644 --- a/docs/hyperdatasets/task.md +++ b/docs/hyperdatasets/task.md @@ -1,6 +1,10 @@ --- -title: Tasks +title: Dataviews --- + +:::important ENTERPRISE FEATURE +Dataviews are available under the ClearML Enterprise plan. +::: Hyper-Datasets extend the ClearML [**Task**](../fundamentals/task.md) with [Dataviews](dataviews.md). diff --git a/docs/hyperdatasets/webapp/webapp_annotator.md b/docs/hyperdatasets/webapp/webapp_annotator.md index fb48de89..3a52547f 100644 --- a/docs/hyperdatasets/webapp/webapp_annotator.md +++ b/docs/hyperdatasets/webapp/webapp_annotator.md @@ -2,6 +2,10 @@ title: Annotation Tasks --- +:::important ENTERPRISE FEATURE +Annotation tasks are available under the ClearML Enterprise plan. +::: + Use the Annotations page to access and manage annotation Tasks. Use annotation tasks to efficiently organize the annotation of frames in Dataset versions and manage the work of annotators diff --git a/docs/hyperdatasets/webapp/webapp_datasets.md b/docs/hyperdatasets/webapp/webapp_datasets.md index cddbe574..5cc3d06f 100644 --- a/docs/hyperdatasets/webapp/webapp_datasets.md +++ b/docs/hyperdatasets/webapp/webapp_datasets.md @@ -2,6 +2,10 @@ title: Hyper-Datasets Page --- +:::important ENTERPRISE FEATURE +Hyper-Datasets are available under the ClearML Enterprise plan. +::: + Use the Hyper-Datasets Page to navigate between and manage hyper-datasets.
You can view the Hyper-Datasets page in Project view Project view diff --git a/docs/hyperdatasets/webapp/webapp_datasets_frames.md b/docs/hyperdatasets/webapp/webapp_datasets_frames.md index ca92d2c8..ee4037b2 100644 --- a/docs/hyperdatasets/webapp/webapp_datasets_frames.md +++ b/docs/hyperdatasets/webapp/webapp_datasets_frames.md @@ -2,6 +2,10 @@ title: Working with Frames --- +:::important ENTERPRISE FEATURE +Hyper-Datasets are available under the ClearML Enterprise plan. +::: + View and edit SingleFrames in the Dataset page. After selecting a Hyper-Dataset version, the **Version Browser** shows a sample of frames and enables viewing SingleFrames and FramesGroups, and editing SingleFrames, in the [frame viewer](#frame-viewer). Before opening the frame viewer, you can filter the frames by applying [simple](webapp_datasets_versioning.md#simple-frame-filtering) or [advanced](webapp_datasets_versioning.md#advanced-frame-filtering) diff --git a/docs/hyperdatasets/webapp/webapp_datasets_versioning.md b/docs/hyperdatasets/webapp/webapp_datasets_versioning.md index dfa64503..f40d44a3 100644 --- a/docs/hyperdatasets/webapp/webapp_datasets_versioning.md +++ b/docs/hyperdatasets/webapp/webapp_datasets_versioning.md @@ -2,6 +2,10 @@ title: Dataset Versions --- +:::important ENTERPRISE FEATURE +Hyper-Datasets are available under the ClearML Enterprise plan. +::: + Use the Dataset versioning WebApp (UI) features for viewing, creating, modifying, and deleting [Dataset versions](../dataset.md#dataset-versioning). diff --git a/docs/hyperdatasets/webapp/webapp_dataviews.md b/docs/hyperdatasets/webapp/webapp_dataviews.md index 73e1d821..9722528b 100644 --- a/docs/hyperdatasets/webapp/webapp_dataviews.md +++ b/docs/hyperdatasets/webapp/webapp_dataviews.md @@ -2,6 +2,10 @@ title: The Dataview Table --- +:::important ENTERPRISE FEATURE +Dataviews are available under the ClearML Enterprise plan. 
+::: + The **Dataview table** is a [customizable](#customizing-the-dataview-table) list of Dataviews associated with a project. Use it to view and create Dataviews, and access their info panels. diff --git a/docs/hyperdatasets/webapp/webapp_exp_comparing.md b/docs/hyperdatasets/webapp/webapp_exp_comparing.md index 8a5b2707..333ba0cb 100644 --- a/docs/hyperdatasets/webapp/webapp_exp_comparing.md +++ b/docs/hyperdatasets/webapp/webapp_exp_comparing.md @@ -2,6 +2,10 @@ title: Comparing Dataviews --- +:::important ENTERPRISE FEATURE +Dataviews are available under the ClearML Enterprise plan. +::: + In addition to [ClearML's comparison features](../../webapp/webapp_exp_comparing.md), the ClearML Enterprise WebApp supports comparing input data selection criteria of task [Dataviews](../dataviews.md), enabling to easily locate, visualize, and analyze differences. diff --git a/docs/hyperdatasets/webapp/webapp_exp_modifying.md b/docs/hyperdatasets/webapp/webapp_exp_modifying.md index 1c616ae2..bbb57e62 100644 --- a/docs/hyperdatasets/webapp/webapp_exp_modifying.md +++ b/docs/hyperdatasets/webapp/webapp_exp_modifying.md @@ -2,6 +2,10 @@ title: Modifying Dataviews --- +:::important ENTERPRISE FEATURE +Dataviews are available under the ClearML Enterprise plan. +::: + A task that has been executed can be [cloned](../../webapp/webapp_exp_reproducing.md), then the cloned task's execution details can be modified, and the modified task can be executed. diff --git a/docs/hyperdatasets/webapp/webapp_exp_track_visual.md b/docs/hyperdatasets/webapp/webapp_exp_track_visual.md index 978b613b..569d1fff 100644 --- a/docs/hyperdatasets/webapp/webapp_exp_track_visual.md +++ b/docs/hyperdatasets/webapp/webapp_exp_track_visual.md @@ -2,6 +2,10 @@ title: Task Dataviews --- +:::important ENTERPRISE FEATURE +Dataviews are available under the ClearML Enterprise plan. 
+::: + While a task is running, and any time after it finishes, results are tracked and can be visualized in the ClearML Enterprise WebApp (UI). diff --git a/docs/img/app_bool_choice.png b/docs/img/app_bool_choice.png new file mode 100644 index 00000000..d0df5dd8 Binary files /dev/null and b/docs/img/app_bool_choice.png differ diff --git a/docs/img/app_bool_choice_dark.png b/docs/img/app_bool_choice_dark.png new file mode 100644 index 00000000..5e28c914 Binary files /dev/null and b/docs/img/app_bool_choice_dark.png differ diff --git a/docs/img/app_cond_str.png b/docs/img/app_cond_str.png new file mode 100644 index 00000000..7ac43ae4 Binary files /dev/null and b/docs/img/app_cond_str.png differ diff --git a/docs/img/app_cond_str_dark.png b/docs/img/app_cond_str_dark.png new file mode 100644 index 00000000..8b26acbe Binary files /dev/null and b/docs/img/app_cond_str_dark.png differ diff --git a/docs/img/app_group.png b/docs/img/app_group.png new file mode 100644 index 00000000..9d377d5a Binary files /dev/null and b/docs/img/app_group.png differ diff --git a/docs/img/app_group_dark.png b/docs/img/app_group_dark.png new file mode 100644 index 00000000..116fec04 Binary files /dev/null and b/docs/img/app_group_dark.png differ diff --git a/docs/img/app_html_elements.png b/docs/img/app_html_elements.png new file mode 100644 index 00000000..67769ac1 Binary files /dev/null and b/docs/img/app_html_elements.png differ diff --git a/docs/img/app_html_elements_dark.png b/docs/img/app_html_elements_dark.png new file mode 100644 index 00000000..f9eb9eca Binary files /dev/null and b/docs/img/app_html_elements_dark.png differ diff --git a/docs/img/app_log.png b/docs/img/app_log.png new file mode 100644 index 00000000..272def23 Binary files /dev/null and b/docs/img/app_log.png differ diff --git a/docs/img/app_log_dark.png b/docs/img/app_log_dark.png new file mode 100644 index 00000000..16c90163 Binary files /dev/null and b/docs/img/app_log_dark.png differ diff --git 
a/docs/img/app_plot.png b/docs/img/app_plot.png new file mode 100644 index 00000000..26907fce Binary files /dev/null and b/docs/img/app_plot.png differ diff --git a/docs/img/app_plot_dark.png b/docs/img/app_plot_dark.png new file mode 100644 index 00000000..840e772a Binary files /dev/null and b/docs/img/app_plot_dark.png differ diff --git a/docs/img/app_proj_selection.png b/docs/img/app_proj_selection.png new file mode 100644 index 00000000..3b125b91 Binary files /dev/null and b/docs/img/app_proj_selection.png differ diff --git a/docs/img/app_proj_selection_dark.png b/docs/img/app_proj_selection_dark.png new file mode 100644 index 00000000..8a3dc9e3 Binary files /dev/null and b/docs/img/app_proj_selection_dark.png differ diff --git a/docs/img/gif/ai_dev_center.gif b/docs/img/gif/ai_dev_center.gif new file mode 100644 index 00000000..7a76737a Binary files /dev/null and b/docs/img/gif/ai_dev_center.gif differ diff --git a/docs/img/gif/ai_dev_center_dark.gif b/docs/img/gif/ai_dev_center_dark.gif new file mode 100644 index 00000000..ab5a4efb Binary files /dev/null and b/docs/img/gif/ai_dev_center_dark.gif differ diff --git a/docs/img/gif/dataset.gif b/docs/img/gif/dataset.gif index a83978a4..3063a288 100644 Binary files a/docs/img/gif/dataset.gif and b/docs/img/gif/dataset.gif differ diff --git a/docs/img/gif/dataset_dark.gif b/docs/img/gif/dataset_dark.gif new file mode 100644 index 00000000..85486974 Binary files /dev/null and b/docs/img/gif/dataset_dark.gif differ diff --git a/docs/img/gif/genai_engine.gif b/docs/img/gif/genai_engine.gif new file mode 100644 index 00000000..ecca8a5e Binary files /dev/null and b/docs/img/gif/genai_engine.gif differ diff --git a/docs/img/gif/genai_engine_dark.gif b/docs/img/gif/genai_engine_dark.gif new file mode 100644 index 00000000..6af30d0f Binary files /dev/null and b/docs/img/gif/genai_engine_dark.gif differ diff --git a/docs/img/gif/infra_control_plane.gif b/docs/img/gif/infra_control_plane.gif new file mode 100644 index 
00000000..66e70c8d Binary files /dev/null and b/docs/img/gif/infra_control_plane.gif differ diff --git a/docs/img/gif/infra_control_plane_dark.gif b/docs/img/gif/infra_control_plane_dark.gif new file mode 100644 index 00000000..3d25ef82 Binary files /dev/null and b/docs/img/gif/infra_control_plane_dark.gif differ diff --git a/docs/img/gif/integrations_yolov5.gif b/docs/img/gif/integrations_yolov5.gif index f332940c..0a0795bd 100644 Binary files a/docs/img/gif/integrations_yolov5.gif and b/docs/img/gif/integrations_yolov5.gif differ diff --git a/docs/img/gif/integrations_yolov5_dark.gif b/docs/img/gif/integrations_yolov5_dark.gif new file mode 100644 index 00000000..6dcfb4f2 Binary files /dev/null and b/docs/img/gif/integrations_yolov5_dark.gif differ diff --git a/docs/integrations/accelerate.md b/docs/integrations/accelerate.md index 6be0f9ab..8d5d685e 100644 --- a/docs/integrations/accelerate.md +++ b/docs/integrations/accelerate.md @@ -9,7 +9,7 @@ such as required packages and uncommitted changes, and supports reporting scalar ## Setup -To use Accelerate's ClearML tracker, make sure that `clearml` is [installed and set up](../getting_started/ds/ds_first_steps.md#install-clearml) +To use Accelerate's ClearML tracker, make sure that `clearml` is [installed and set up](../clearml_sdk/clearml_sdk_setup#install-clearml) in your environment, and use the `log_with` parameter when instantiating an `Accelerator`: ```python diff --git a/docs/integrations/autokeras.md b/docs/integrations/autokeras.md index a92eb852..dcf38cff 100644 --- a/docs/integrations/autokeras.md +++ b/docs/integrations/autokeras.md @@ -3,7 +3,7 @@ title: AutoKeras --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup +If you are not already using ClearML, see [ClearML Setup](../clearml_sdk/clearml_sdk_setup) for setup instructions.
::: @@ -95,7 +95,8 @@ and shuts down instances as needed, according to a resource budget that you set. ### Cloning, Editing, and Enqueuing -![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif#light-mode-only) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5_dark.gif#dark-mode-only) Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task with the new configuration on a remote machine: diff --git a/docs/integrations/catboost.md b/docs/integrations/catboost.md index 50c41700..f3e60261 100644 --- a/docs/integrations/catboost.md +++ b/docs/integrations/catboost.md @@ -3,7 +3,7 @@ title: CatBoost --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup +If you are not already using ClearML, see [ClearML Setup](../clearml_sdk/clearml_sdk_setup) for setup instructions. ::: @@ -93,7 +93,8 @@ and shuts down instances as needed, according to a resource budget that you set. ### Cloning, Editing, and Enqueuing -![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif#light-mode-only) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5_dark.gif#dark-mode-only) Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task with the new configuration on a remote machine: @@ -117,5 +118,5 @@ task.execute_remotely(queue_name='default', exit_process=True) ## Hyperparameter Optimization Use ClearML's [`HyperParameterOptimizer`](../references/sdk/hpo_optimization_hyperparameteroptimizer.md) class to find -the hyperparameter values that yield the best performing models. See [Hyperparameter Optimization](../fundamentals/hpo.md) +the hyperparameter values that yield the best performing models. 
See [Hyperparameter Optimization](../getting_started/hpo.md) for more information. diff --git a/docs/integrations/click.md b/docs/integrations/click.md index cf9298bd..c1169615 100644 --- a/docs/integrations/click.md +++ b/docs/integrations/click.md @@ -3,7 +3,7 @@ title: Click --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup +If you are not already using ClearML, see [ClearML Setup](../clearml_sdk/clearml_sdk_setup) for setup instructions. ::: diff --git a/docs/integrations/fastai.md b/docs/integrations/fastai.md index e8fd03e5..f532be3a 100644 --- a/docs/integrations/fastai.md +++ b/docs/integrations/fastai.md @@ -3,8 +3,7 @@ title: Fast.ai --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: ClearML integrates seamlessly with [fast.ai](https://www.fast.ai/), automatically logging its models and scalars. @@ -93,7 +92,8 @@ and shuts down instances as needed, according to a resource budget that you set. ### Cloning, Editing, and Enqueuing -![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif#light-mode-only) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5_dark.gif#dark-mode-only) Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task with the new configuration on a remote machine: diff --git a/docs/integrations/hydra.md b/docs/integrations/hydra.md index d8a05c04..faaa41b0 100644 --- a/docs/integrations/hydra.md +++ b/docs/integrations/hydra.md @@ -3,8 +3,7 @@ title: Hydra --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. 
+If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: diff --git a/docs/integrations/ignite.md b/docs/integrations/ignite.md index 9b2de832..683292ab 100644 --- a/docs/integrations/ignite.md +++ b/docs/integrations/ignite.md @@ -3,8 +3,7 @@ title: PyTorch Ignite --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: [PyTorch Ignite](https://pytorch.org/ignite/index.html) is a library for training and evaluating neural networks in diff --git a/docs/integrations/integrations.md b/docs/integrations/integrations.md new file mode 100644 index 00000000..589d30fa --- /dev/null +++ b/docs/integrations/integrations.md @@ -0,0 +1,40 @@ +# ClearML Integrations + +ClearML seamlessly integrates with a wide range of popular machine learning frameworks, tools, and platforms to enhance your ML development workflow. Our integrations enable automatic experiment tracking, model management, and pipeline orchestration across your preferred tools. 
+ +## Deep Learning Frameworks +* [PyTorch](pytorch.md) +* [TensorFlow](tensorflow.md) +* [Keras](keras.md) +* [YOLO v5](yolov5.md) +* [YOLO v8](yolov8.md) +* [MMEngine](mmengine.md) +* [MMCV](mmcv.md) +* [MONAI](monai.md) +* [Nvidia TAO](tao.md) +* [MegEngine](megengine.md) +* [FastAI](fastai.md) + +## ML Frameworks +* [scikit-learn](scikit_learn.md) +* [XGBoost](xgboost.md) +* [LightGBM](lightgbm.md) +* [CatBoost](catboost.md) +* [Seaborn](seaborn.md) + +## Configuration and Optimization +* [AutoKeras](autokeras.md) +* [Keras Tuner](keras_tuner.md) +* [Optuna](optuna.md) +* [Hydra](hydra.md) +* [Click](click.md) +* [Python Fire](python_fire.md) +* [jsonargparse](jsonargparse.md) + +## MLOps and Visualization +* [TensorBoard](tensorboard.md) +* [TensorBoardX](tensorboardx.md) +* [Matplotlib](matplotlib.md) +* [LangChain](langchain.md) +* [PyTorch Ignite](ignite.md) +* [PyTorch Lightning](pytorch_lightning.md) diff --git a/docs/integrations/jsonargparse.md b/docs/integrations/jsonargparse.md index 8f348e45..42cc2fa2 100644 --- a/docs/integrations/jsonargparse.md +++ b/docs/integrations/jsonargparse.md @@ -3,11 +3,11 @@ title: jsonargparse --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: + +[jsonargparse](https://github.com/omni-us/jsonargparse) is a Python package for creating command-line interfaces. ClearML integrates seamlessly with `jsonargparse` and automatically logs its command-line parameters and connected configuration files. diff --git a/docs/integrations/keras.md b/docs/integrations/keras.md index 52f6f487..d9ac7a0d 100644 --- a/docs/integrations/keras.md +++ b/docs/integrations/keras.md @@ -3,10 +3,10 @@ title: Keras --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions.
+If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: + ClearML integrates with [Keras](https://keras.io/) out-of-the-box, automatically logging its models, scalars, TensorFlow definitions, and TensorBoard outputs. @@ -105,7 +105,8 @@ and shuts down instances as needed, according to a resource budget that you set. ### Cloning, Editing, and Enqueuing -![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif#light-mode-only) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5_dark.gif#dark-mode-only) Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task with the new configuration on a remote machine: @@ -129,5 +130,5 @@ task.execute_remotely(queue_name='default', exit_process=True) ## Hyperparameter Optimization Use ClearML's [`HyperParameterOptimizer`](../references/sdk/hpo_optimization_hyperparameteroptimizer.md) class to find -the hyperparameter values that yield the best performing models. See [Hyperparameter Optimization](../fundamentals/hpo.md) +the hyperparameter values that yield the best performing models. See [Hyperparameter Optimization](../getting_started/hpo.md) for more information. diff --git a/docs/integrations/keras_tuner.md b/docs/integrations/keras_tuner.md index d75cffc1..705526b8 100644 --- a/docs/integrations/keras_tuner.md +++ b/docs/integrations/keras_tuner.md @@ -3,10 +3,10 @@ title: Keras Tuner --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: + [Keras Tuner](https://www.tensorflow.org/tutorials/keras/keras_tuner) is a library that helps you pick the optimal set of hyperparameters for training your models. 
ClearML integrates seamlessly with `kerastuner` and automatically logs task scalars, the output model, and hyperparameter optimization summary. diff --git a/docs/integrations/langchain.md b/docs/integrations/langchain.md index f4fef37d..c85f7551 100644 --- a/docs/integrations/langchain.md +++ b/docs/integrations/langchain.md @@ -3,10 +3,10 @@ title: LangChain --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: + [LangChain](https://github.com/langchain-ai/langchain) is a popular framework for developing applications powered by language models. You can integrate ClearML into your LangChain code using the built-in `ClearMLCallbackHandler`. This class is used to create a ClearML Task to log LangChain assets and metrics. diff --git a/docs/integrations/lightgbm.md b/docs/integrations/lightgbm.md index cce9887e..7f6d2628 100644 --- a/docs/integrations/lightgbm.md +++ b/docs/integrations/lightgbm.md @@ -3,10 +3,10 @@ title: LightGBM --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: + ClearML integrates seamlessly with [LightGBM](https://github.com/microsoft/LightGBM), automatically logging its models, metric plots, and parameters. @@ -94,7 +94,8 @@ and shuts down instances as needed, according to a resource budget that you set. 
### Cloning, Editing, and Enqueuing -![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif#light-mode-only) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5_dark.gif#dark-mode-only) Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task with the new configuration on a remote machine: @@ -118,5 +119,5 @@ task.execute_remotely(queue_name='default', exit_process=True) ## Hyperparameter Optimization Use ClearML's [`HyperParameterOptimizer`](../references/sdk/hpo_optimization_hyperparameteroptimizer.md) class to find -the hyperparameter values that yield the best performing models. See [Hyperparameter Optimization](../fundamentals/hpo.md) +the hyperparameter values that yield the best performing models. See [Hyperparameter Optimization](../getting_started/hpo.md) for more information. diff --git a/docs/integrations/matplotlib.md b/docs/integrations/matplotlib.md index 06714ff8..dde8e0cd 100644 --- a/docs/integrations/matplotlib.md +++ b/docs/integrations/matplotlib.md @@ -3,10 +3,10 @@ title: Matplotlib --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: + [Matplotlib](https://matplotlib.org/) is a Python library for data visualization. ClearML automatically captures plots and images created with `matplotlib`. diff --git a/docs/integrations/megengine.md b/docs/integrations/megengine.md index 77cad702..3ad13771 100644 --- a/docs/integrations/megengine.md +++ b/docs/integrations/megengine.md @@ -3,10 +3,10 @@ title: MegEngine --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. 
+If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: + ClearML integrates seamlessly with [MegEngine](https://github.com/MegEngine/MegEngine), automatically logging its models. All you have to do is simply add two lines of code to your MegEngine script: @@ -90,7 +90,8 @@ and shuts down instances as needed, according to a resource budget that you set. ### Cloning, Editing, and Enqueuing -![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif#light-mode-only) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5_dark.gif#dark-mode-only) Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task with the new configuration on a remote machine: @@ -114,5 +115,5 @@ task.execute_remotely(queue_name='default', exit_process=True) ## Hyperparameter Optimization Use ClearML's [`HyperParameterOptimizer`](../references/sdk/hpo_optimization_hyperparameteroptimizer.md) class to find -the hyperparameter values that yield the best performing models. See [Hyperparameter Optimization](../fundamentals/hpo.md) +the hyperparameter values that yield the best performing models. See [Hyperparameter Optimization](../getting_started/hpo.md) for more information. diff --git a/docs/integrations/mmcv.md b/docs/integrations/mmcv.md index 8c77ca70..b9833820 100644 --- a/docs/integrations/mmcv.md +++ b/docs/integrations/mmcv.md @@ -7,10 +7,10 @@ title: MMCV v1.x ::: :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: + [MMCV](https://github.com/open-mmlab/mmcv/tree/1.x) is a computer vision framework developed by OpenMMLab. 
You can integrate ClearML into your code using the `mmcv` package's [`ClearMLLoggerHook`](https://mmcv.readthedocs.io/en/master/_modules/mmcv/runner/hooks/logger/clearml.html) class. This class is used to create a ClearML Task and to automatically log metrics. diff --git a/docs/integrations/mmengine.md b/docs/integrations/mmengine.md index 09d64256..733625f6 100644 --- a/docs/integrations/mmengine.md +++ b/docs/integrations/mmengine.md @@ -3,10 +3,10 @@ title: MMEngine --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: + [MMEngine](https://github.com/open-mmlab/mmengine) is a library for training deep learning models based on PyTorch. MMEngine supports ClearML through a builtin logger: It automatically logs task environment information, such as required packages and uncommitted changes, and supports reporting scalars, parameters, and debug samples. diff --git a/docs/integrations/monai.md b/docs/integrations/monai.md index 3dc98233..8b82e036 100644 --- a/docs/integrations/monai.md +++ b/docs/integrations/monai.md @@ -3,10 +3,10 @@ title: MONAI --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: + [MONAI](https://github.com/Project-MONAI/MONAI) is a PyTorch-based, open-source framework for deep learning in healthcare imaging. You can integrate ClearML into your code using MONAI's built-in handlers: [`ClearMLImageHandler`, `ClearMLStatsHandler`](#clearmlimagehandler-and-clearmlstatshandler), and [`ModelCheckpoint`](#modelcheckpoint). 
diff --git a/docs/integrations/optuna.md b/docs/integrations/optuna.md index f660f78b..2e4c821b 100644 --- a/docs/integrations/optuna.md +++ b/docs/integrations/optuna.md @@ -2,7 +2,7 @@ title: Optuna --- -[Optuna](https://optuna.readthedocs.io/en/latest) is a [hyperparameter optimization](../fundamentals/hpo.md) framework, +[Optuna](https://optuna.readthedocs.io/en/latest) is a [hyperparameter optimization](../getting_started/hpo.md) framework, which makes use of different samplers such as grid search, random, bayesian, and evolutionary algorithms. You can integrate Optuna into ClearML's automated hyperparameter optimization. diff --git a/docs/integrations/pytorch.md b/docs/integrations/pytorch.md index 59191fc9..9373b4e7 100644 --- a/docs/integrations/pytorch.md +++ b/docs/integrations/pytorch.md @@ -3,10 +3,10 @@ title: PyTorch --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: + ClearML integrates seamlessly with [PyTorch](https://pytorch.org/), automatically logging its models. All you have to do is simply add two lines of code to your PyTorch script: @@ -114,7 +114,8 @@ and shuts down instances as needed, according to a resource budget that you set. 
### Cloning, Editing, and Enqueuing -![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif#light-mode-only) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5_dark.gif#dark-mode-only) Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task with the new configuration on a remote machine: diff --git a/docs/integrations/pytorch_lightning.md b/docs/integrations/pytorch_lightning.md index d01f5cb2..41e95bba 100644 --- a/docs/integrations/pytorch_lightning.md +++ b/docs/integrations/pytorch_lightning.md @@ -3,10 +3,10 @@ title: PyTorch Lightning --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: + [PyTorch Lightning](https://github.com/Lightning-AI/lightning) is a framework that simplifies the process of training and deploying PyTorch models. ClearML seamlessly integrates with PyTorch Lightning, automatically logging PyTorch models, parameters supplied by [LightningCLI](https://lightning.ai/docs/pytorch/stable/cli/lightning_cli.html), and more. @@ -120,7 +120,8 @@ and shuts down instances as needed, according to a resource budget that you set. 
### Cloning, Editing, and Enqueuing -![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif#light-mode-only) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5_dark.gif#dark-mode-only) Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task with the new configuration on a remote machine: @@ -144,6 +145,6 @@ task.execute_remotely(queue_name='default', exit_process=True) ## Hyperparameter Optimization Use ClearML's [`HyperParameterOptimizer`](../references/sdk/hpo_optimization_hyperparameteroptimizer.md) class to find -the hyperparameter values that yield the best performing models. See [Hyperparameter Optimization](../fundamentals/hpo.md) +the hyperparameter values that yield the best performing models. See [Hyperparameter Optimization](../getting_started/hpo.md) for more information. diff --git a/docs/integrations/scikit_learn.md b/docs/integrations/scikit_learn.md index 5a6afbab..c0fb490a 100644 --- a/docs/integrations/scikit_learn.md +++ b/docs/integrations/scikit_learn.md @@ -3,10 +3,10 @@ title: scikit-learn --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: + ClearML integrates seamlessly with [scikit-learn](https://scikit-learn.org/stable/), automatically logging models created with `joblib`. @@ -96,7 +96,8 @@ and shuts down instances as needed, according to a resource budget that you set. 
### Cloning, Editing, and Enqueuing -![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif#light-mode-only) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5_dark.gif#dark-mode-only) Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task with the new configuration on a remote machine: diff --git a/docs/integrations/seaborn.md b/docs/integrations/seaborn.md index ca2e1a2c..54b65583 100644 --- a/docs/integrations/seaborn.md +++ b/docs/integrations/seaborn.md @@ -3,10 +3,10 @@ title: Seaborn --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: + [seaborn](https://seaborn.pydata.org/) is a Python library for data visualization. ClearML automatically captures plots created using `seaborn`. All you have to do is add two lines of code to your script: diff --git a/docs/integrations/tao.md b/docs/integrations/tao.md index ec80c93f..6a2376b2 100644 --- a/docs/integrations/tao.md +++ b/docs/integrations/tao.md @@ -113,7 +113,8 @@ and shuts down instances as needed, according to a resource budget that you set. 
### Cloning, Editing, and Enqueuing -![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif#light-mode-only) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5_dark.gif#dark-mode-only) Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task with the new configuration on a remote machine: diff --git a/docs/integrations/tensorboard.md b/docs/integrations/tensorboard.md index a0921c3b..317c983f 100644 --- a/docs/integrations/tensorboard.md +++ b/docs/integrations/tensorboard.md @@ -3,9 +3,10 @@ title: TensorBoard --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md). +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: + [TensorBoard](https://www.tensorflow.org/tensorboard) is TensorFlow's data visualization toolkit. ClearML automatically captures all data logged to TensorBoard. All you have to do is add two lines of code to your script: diff --git a/docs/integrations/tensorboardx.md b/docs/integrations/tensorboardx.md index c8bf97bf..673b2c7b 100644 --- a/docs/integrations/tensorboardx.md +++ b/docs/integrations/tensorboardx.md @@ -3,7 +3,7 @@ title: TensorboardX --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md). +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). 
::: [TensorboardX](https://tensorboardx.readthedocs.io/en/latest/tutorial.html#what-is-tensorboard-x) is a data diff --git a/docs/integrations/tensorflow.md b/docs/integrations/tensorflow.md index 3bdaee58..49040835 100644 --- a/docs/integrations/tensorflow.md +++ b/docs/integrations/tensorflow.md @@ -3,10 +3,10 @@ title: TensorFlow --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: + ClearML integrates with [TensorFlow](https://www.tensorflow.org/) out-of-the-box, automatically logging its models, definitions, scalars, as well as TensorBoard outputs. @@ -107,7 +107,8 @@ and shuts down instances as needed, according to a resource budget that you set. ### Cloning, Editing, and Enqueuing -![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif#light-mode-only) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5_dark.gif#dark-mode-only) Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task with the new configuration on a remote machine: @@ -131,5 +132,5 @@ task.execute_remotely(queue_name='default', exit_process=True) ## Hyperparameter Optimization Use ClearML's [`HyperParameterOptimizer`](../references/sdk/hpo_optimization_hyperparameteroptimizer.md) class to find -the hyperparameter values that yield the best performing models. See [Hyperparameter Optimization](../fundamentals/hpo.md) +the hyperparameter values that yield the best performing models. See [Hyperparameter Optimization](../getting_started/hpo.md) for more information. 
diff --git a/docs/integrations/transformers.md b/docs/integrations/transformers.md index 754fd07f..74b8c69b 100644 --- a/docs/integrations/transformers.md +++ b/docs/integrations/transformers.md @@ -78,7 +78,8 @@ and shuts down instances as needed, according to a resource budget that you set. ### Cloning, Editing, and Enqueuing -![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif#light-mode-only) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5_dark.gif#dark-mode-only) Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task with the new configuration on a remote machine: @@ -90,5 +91,5 @@ The ClearML Agent executing the task will use the new values to [override any ha ## Hyperparameter Optimization Use ClearML's [`HyperParameterOptimizer`](../references/sdk/hpo_optimization_hyperparameteroptimizer.md) class to find -the hyperparameter values that yield the best performing models. See [Hyperparameter Optimization](../fundamentals/hpo.md) +the hyperparameter values that yield the best performing models. See [Hyperparameter Optimization](../getting_started/hpo.md) for more information. diff --git a/docs/integrations/xgboost.md b/docs/integrations/xgboost.md index 7f230f81..876f5fb2 100644 --- a/docs/integrations/xgboost.md +++ b/docs/integrations/xgboost.md @@ -3,8 +3,7 @@ title: XGBoost --- :::tip -If you are not already using ClearML, see [Getting Started](../getting_started/ds/ds_first_steps.md) for setup -instructions. +If you are not already using ClearML, see [ClearML Setup instructions](../clearml_sdk/clearml_sdk_setup). ::: ClearML integrates seamlessly with [XGBoost](https://xgboost.readthedocs.io/en/stable/), automatically logging its models, @@ -121,7 +120,8 @@ and shuts down instances as needed, according to a resource budget that you set. 
### Cloning, Editing, and Enqueuing -![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif#light-mode-only) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5_dark.gif#dark-mode-only) Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task with the new configuration on a remote machine: @@ -145,5 +145,5 @@ task.execute_remotely(queue_name='default', exit_process=True) ## Hyperparameter Optimization Use ClearML's [`HyperParameterOptimizer`](../references/sdk/hpo_optimization_hyperparameteroptimizer.md) class to find -the hyperparameter values that yield the best performing models. See [Hyperparameter Optimization](../fundamentals/hpo.md) +the hyperparameter values that yield the best performing models. See [Hyperparameter Optimization](../getting_started/hpo.md) for more information. diff --git a/docs/integrations/yolov5.md b/docs/integrations/yolov5.md index 6690cf75..9818d8c9 100644 --- a/docs/integrations/yolov5.md +++ b/docs/integrations/yolov5.md @@ -7,7 +7,7 @@ built in logger: * Track every YOLOv5 training run in ClearML * Version and easily access your custom training data with [ClearML Data](../clearml_data/clearml_data.md) * Remotely train and monitor your YOLOv5 training runs using [ClearML Agent](../clearml_agent.md) -* Get the very best mAP using ClearML [Hyperparameter Optimization](../fundamentals/hpo.md) +* Get the very best mAP using ClearML [Hyperparameter Optimization](../getting_started/hpo.md) * Turn your newly trained YOLOv5 model into an API with just a few commands using [ClearML Serving](../clearml_serving/clearml_serving.md) ## Setup @@ -169,7 +169,8 @@ and shuts down instances as needed, according to a resource budget that you set. 
### Cloning, Editing, and Enqueuing -![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif#light-mode-only) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5_dark.gif#dark-mode-only) Use ClearML's web interface to edit task details, like configuration parameters or input models, then execute the task with the new configuration on a remote machine: diff --git a/docs/integrations/yolov8.md b/docs/integrations/yolov8.md index 90d38321..f3080412 100644 --- a/docs/integrations/yolov8.md +++ b/docs/integrations/yolov8.md @@ -166,4 +166,5 @@ with the new configuration on a remote machine: The ClearML Agent executing the task will use the new values to [override any hard coded values](../clearml_agent.md). -![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5.gif#light-mode-only) +![Cloning, editing, enqueuing gif](../img/gif/integrations_yolov5_dark.gif#dark-mode-only) diff --git a/docs/overview.md b/docs/overview.md new file mode 100644 index 00000000..12cb5402 --- /dev/null +++ b/docs/overview.md @@ -0,0 +1,82 @@ +--- +id: overview +title: What is ClearML? +slug: / +--- + +# ClearML Documentation + +## Overview +Welcome to the documentation for ClearML, the end-to-end platform for streamlining AI development and deployment. ClearML consists of three essential layers: +1. [**Infrastructure Control Plane**](#infrastructure-control-plane) (Cloud/On-Prem Agnostic) +2. [**AI Development Center**](#ai-development-center) +3. [**GenAI App Engine**](#genai-app-engine) + +Each layer provides distinct functionality to ensure an efficient and scalable AI workflow from development to deployment.
+ +![Webapp gif](img/gif/webapp_screenshots.gif#light-mode-only) +![Webapp gif](img/gif/webapp_screenshots_dark.gif#dark-mode-only) + +--- + +## Infrastructure Control Plane +The Infrastructure Control Plane serves as the foundation of the ClearML platform, offering compute resource provisioning and management, enabling administrators to make compute available through GPUaaS capabilities with no-hassle configuration. +Utilizing the Infrastructure Control Plane, DevOps and IT teams can manage and optimize GPU resources to ensure high performance and cost efficiency. + +#### Features +- **Resource Management:** Automates the allocation and management of GPU resources. +- **Workload Autoscaling:** Seamlessly scale GPU resources based on workload demands. +- **Monitoring and Logging:** Provides comprehensive monitoring and logging for GPU utilization and performance. +- **Cost Optimization:** Consolidate cloud and on-prem compute into a seamless GPUaaS offering. +- **Deployment Flexibility:** Easily run your workloads on both cloud and on-premises compute. + +![Infrastructure control plane](img/gif/infra_control_plane.gif#light-mode-only) +![Infrastructure control plane](img/gif/infra_control_plane_dark.gif#dark-mode-only) + +--- + +## AI Development Center +The AI Development Center offers a robust environment for developing, training, and testing AI models. It is designed to be cloud and on-premises agnostic, providing flexibility in deployment. + +#### Features +- **Integrated Development Environment:** A comprehensive IDE for training, testing, and debugging AI models. +- **Model Training:** Scalable and distributed model training and hyperparameter optimization. +- **Data Management:** Tools for data preprocessing, management, and versioning. +- **Experiment Tracking:** Track metrics, artifacts, and logs; manage versions; and compare results.
+- **Workflow Automation:** Build pipelines to formalize your workflows. + +![AI Dev center](img/gif/ai_dev_center.gif#light-mode-only) +![AI Dev center](img/gif/ai_dev_center_dark.gif#dark-mode-only) + +--- + +## GenAI App Engine +The GenAI App Engine is designed to deploy large language models (LLMs) into GPU clusters and manage various AI workloads, including Retrieval-Augmented Generation (RAG) tasks. This layer also handles networking, authentication, and role-based access control (RBAC) for deployed services. + +#### Features +- **LLM Deployment:** Seamlessly deploy LLMs into GPU clusters. +- **RAG Workloads:** Efficiently manage and execute RAG workloads. +- **Networking and Authentication:** Deploy GenAI through secure, authenticated network endpoints. +- **RBAC:** Implement RBAC to control access to deployed services. + +![GenAI engine](img/gif/genai_engine.gif#light-mode-only) +![GenAI engine](img/gif/genai_engine_dark.gif#dark-mode-only) + +--- + +## Getting Started +To begin using ClearML, follow these steps: +1. **Set Up Infrastructure Control Plane:** Allocate and manage your GPU resources. +2. **Develop AI Models:** Use the AI Development Center to develop and train your models. +3. **Deploy AI Models:** Deploy your models using the GenAI App Engine. + +For detailed instructions on each step, refer to the respective sections in this documentation. + +--- + +## Support +For feature requests or bug reports, see ClearML on [GitHub](https://github.com/clearml/clearml/issues). + +If you have any questions, join the discussion on the **ClearML** [Slack channel](https://joinslack.clear.ml), or tag your questions on [Stack Overflow](https://stackoverflow.com/questions/tagged/clearml) with the **clearml** tag. + +Lastly, you can always find us at [support@clearml.ai](mailto:support@clearml.ai?subject=ClearML).
\ No newline at end of file diff --git a/docs/pipelines/pipelines.md b/docs/pipelines/pipelines.md index 1785c34f..2c0e742d 100644 --- a/docs/pipelines/pipelines.md +++ b/docs/pipelines/pipelines.md @@ -12,7 +12,8 @@ products such as artifacts and parameters. When run, the controller will sequentially launch the pipeline steps. The pipeline logic and steps can be executed locally, or on any machine using the [clearml-agent](../clearml_agent.md). -![Pipeline UI](../img/pipelines_DAG.png) +![Pipeline UI](../img/webapp_pipeline_DAG.png#light-mode-only) +![Pipeline UI](../img/webapp_pipeline_DAG_dark.png#dark-mode-only) The [Pipeline Run](../webapp/pipelines/webapp_pipeline_viewing.md) page in the web UI displays the pipeline's structure in terms of executed steps and their status, as well as the run's configuration parameters and output. See [pipeline UI](../webapp/pipelines/webapp_pipeline_page.md) diff --git a/docs/remote_session.md b/docs/remote_session.md index b6c2fc85..8d104534 100644 --- a/docs/remote_session.md +++ b/docs/remote_session.md @@ -16,7 +16,7 @@ meets resource needs: * [Clearml Session CLI](apps/clearml_session.md) - Launch an interactive JupyterLab, VS Code, and SSH session on a remote machine: * Automatically store and sync your [interactive session workspace](apps/clearml_session.md#storing-and-synchronizing-workspace) * Replicate a previously executed task's execution environment and [interactively execute and debug](apps/clearml_session.md#starting-a-debugging-session) it on a remote session - * Develop directly inside your Kubernetes pods ([see ClearML Agent](clearml_agent/clearml_agent_deployment.md#kubernetes)) + * Develop directly inside your Kubernetes pods ([see ClearML Agent](clearml_agent/clearml_agent_deployment_k8s.md)) * And more! 
* GUI Applications (available under ClearML Enterprise Plan) - These apps provide access to remote machines over a secure and encrypted SSH connection, allowing you to work in a remote environment using your preferred development diff --git a/docs/webapp/applications/apps_aws_autoscaler.md b/docs/webapp/applications/apps_aws_autoscaler.md index cfab329f..3068db42 100644 --- a/docs/webapp/applications/apps_aws_autoscaler.md +++ b/docs/webapp/applications/apps_aws_autoscaler.md @@ -319,17 +319,10 @@ to an IAM user, and create credentials keys for that user to configure in the au "ssm:GetParameters", "ssm:GetParameter" ], - "Resource": "arn:aws:ssm:*::parameter/aws/service/marketplace/*" - }, - { - "Sid": "AllowUsingDeeplearningAMIAliases", - "Effect": "Allow", - "Action": [ - "ssm:GetParametersByPath", - "ssm:GetParameters", - "ssm:GetParameter" - ], - "Resource": "arn:aws:ssm:*::parameter/aws/service/deeplearning/*" + "Resource": [ + "arn:aws:ssm:*::parameter/aws/service/marketplace/*", + "arn:aws:ssm:*::parameter/aws/service/deeplearning/*" + ] } ] } diff --git a/docs/webapp/applications/apps_dashboard.md b/docs/webapp/applications/apps_dashboard.md index e122d018..306b80ff 100644 --- a/docs/webapp/applications/apps_dashboard.md +++ b/docs/webapp/applications/apps_dashboard.md @@ -28,13 +28,13 @@ of the chosen metric over time. * Monitored Metric - Series - Metric series (variant) to track * Monitored Metric - Trend - Choose whether to track the monitored metric's highest or lowest values * **Slack Notification** (optional) - Set up Slack integration for notifications of task failure. Select the -`Alert on completed experiments` under `Additional options` to set up alerts for task completions. +`Alert on completed tasks` under `Additional options` to set up alerts for task completions. 
* API Token - Slack workspace access token * Channel Name - Slack channel to which task failure alerts will be posted * Alert Iteration Threshold - Minimum number of task iterations to trigger Slack alerts (tasks that fail prior to the threshold will be ignored) * **Additional options** - * Track manual (non agent-run) experiments as well - Select to include in the dashboard tasks that were not executed by an agent - * Alert on completed experiments - Select to include completed tasks in alerts: in the dashboard's Task Alerts section and in Slack Alerts. + * Track manual (non agent-run) tasks as well - Select to include in the dashboard tasks that were not executed by an agent + * Alert on completed tasks - Select to include completed tasks in alerts: in the dashboard's Task Alerts section and in Slack Alerts. * **Export Configuration** - Export the app instance configuration as a JSON file, which you can later import to create a new instance with the same configuration. @@ -50,7 +50,7 @@ of the chosen metric over time. Once a project dashboard instance is launched, its dashboard displays the following information about a project: * Task Status Summary - Percentages of Tasks by status * Task Type Summary - Percentages of local tasks vs. agent tasks -* Experiments Summary - Number of tasks by status over time +* Task Summary - Number of tasks by status over time * Monitoring - GPU utilization and GPU memory usage * Metric Monitoring - An aggregated view of the values of a metric over time * Project's Active Workers - Number of workers currently executing tasks in the monitored project diff --git a/docs/webapp/applications/apps_hpo.md b/docs/webapp/applications/apps_hpo.md index 0238b3a6..e7b65f20 100644 --- a/docs/webapp/applications/apps_hpo.md +++ b/docs/webapp/applications/apps_hpo.md @@ -56,18 +56,18 @@ limits. **CONFIGURATION > HYPERPARAMETERS > Hydra**). ::: * **Optimization Job Title** (optional) - Name for the HPO instance. 
This will appear in the instance list -* **Optimization Experiments Destination Project** (optional) - The project where optimization tasks will be saved. +* **Optimization Tasks Destination Project** (optional) - The project where optimization tasks will be saved. Leave empty to use the same project as the Initial task. * **Maximum Concurrent Tasks** - The maximum number of simultaneously running optimization tasks * **Advanced Configuration** (optional) - * Limit Total HPO Experiments - Maximum total number of optimization tasks - * Number of Top Experiments to Save - Number of best performing tasks to save (the rest are archived) - * Limit Single Experiment Running Time (Minutes) - Time limit per optimization task. Tasks will be + * Limit Total HPO Tasks - Maximum total number of optimization tasks + * Number of Top Tasks to Save - Number of best performing tasks to save (the rest are archived) + * Limit Single Task Running Time (Minutes) - Time limit per optimization task. Tasks will be stopped after the specified time has elapsed - * Minimal Number of Iterations Per Single Experiment - Some search methods, such as Optuna, prune underperforming + * Minimal Number of Iterations Per Single Task - Some search methods, such as Optuna, prune underperforming tasks. This is the minimum number of iterations per task before it can be stopped. Iterations are based on the tasks' own reporting (for example, if tasks report every epoch, then iterations=epochs) - * Maximum Number of Iterations Per Single Experiment - Maximum iterations per task after which it will be + * Maximum Number of Iterations Per Single Task - Maximum iterations per task after which it will be stopped. 
Iterations are based on the tasks' own reporting (for example, if tasks report every epoch, then iterations=epochs) * Limit Total Optimization Instance Time (Minutes) - Time limit for the whole optimization process (in minutes) diff --git a/docs/webapp/applications/apps_llama_deployment.md b/docs/webapp/applications/apps_llama_deployment.md index 1f965d1e..596586b3 100644 --- a/docs/webapp/applications/apps_llama_deployment.md +++ b/docs/webapp/applications/apps_llama_deployment.md @@ -81,6 +81,6 @@ values from the file, which can be modified before launching the app instance
![llama deployment app form](../../img/apps_llama_form.png#light-mode-only) -![llama deployment app form](../../img/apps_llama_form.png#dark-mode-only) +![llama deployment app form](../../img/apps_llama_form_dark.png#dark-mode-only)
\ No newline at end of file diff --git a/docs/webapp/webapp_exp_track_visual.md b/docs/webapp/webapp_exp_track_visual.md index 496daa47..8d8ef485 100644 --- a/docs/webapp/webapp_exp_track_visual.md +++ b/docs/webapp/webapp_exp_track_visual.md @@ -93,7 +93,7 @@ using to set up an environment (`pip` or `conda`) are available. Select which re ### Container The Container section lists the following information: -* Image - a pre-configured container that ClearML Agent will use to remotely execute this task (see [Building Docker containers](../clearml_agent/clearml_agent_docker.md)) +* Image - a pre-configured container that ClearML Agent will use to remotely execute this task (see [Building Docker containers](../getting_started/clearml_agent_docker_exec.md)) * Arguments - add container arguments * Setup shell script - a bash script to be executed inside the container before setting up the task's environment diff --git a/docs/webapp/webapp_exp_tuning.md b/docs/webapp/webapp_exp_tuning.md index 6c6ddd96..b63dc423 100644 --- a/docs/webapp/webapp_exp_tuning.md +++ b/docs/webapp/webapp_exp_tuning.md @@ -72,7 +72,7 @@ and/or Reset functions. #### Default Container -Select a pre-configured container that the [ClearML Agent](../clearml_agent.md) will use to remotely execute this task (see [Building Docker containers](../clearml_agent/clearml_agent_docker.md)). +Select a pre-configured container that the [ClearML Agent](../clearml_agent.md) will use to remotely execute this task (see [Building Docker containers](../getting_started/clearml_agent_docker_exec.md)). **To add, change, or delete a default container:** diff --git a/docs/webapp/webapp_model_comparing.md b/docs/webapp/webapp_model_comparing.md index 07be1798..ee98d6d9 100644 --- a/docs/webapp/webapp_model_comparing.md +++ b/docs/webapp/webapp_model_comparing.md @@ -46,8 +46,7 @@ models update. 
The Enterprise Plan and Hosted Service support embedding resource The comparison tabs provide the following views: * [Side-by-side textual comparison](#side-by-side-textual-comparison) * [Tabular scalar comparison](#tabular-scalar-comparison) -* [Merged plot comparison](#plot-comparison) -* [Side-by-side graphic comparison](#graphic-comparison) +* [Plot comparison](#plot-comparison) ### Side-by-side Textual Comparison diff --git a/docusaurus.config.js b/docusaurus.config.js index d78ff414..ef4f8ead 100644 --- a/docusaurus.config.js +++ b/docusaurus.config.js @@ -68,7 +68,7 @@ module.exports = { }, announcementBar: { id: 'supportus', - content: 'If you ❤️ ️ClearML, ⭐️ us on GitHub!', + content: 'If you ❤️ ️ClearML, ⭐️ us on GitHub!', isCloseable: true, }, navbar: { @@ -82,54 +82,72 @@ module.exports = { }, items: [ { - to: '/docs', - label: 'Docs', + to: '/docs/', + label: 'Overview', position: 'left', + activeBaseRegex: '^/docs/latest/docs/(fundamentals/agents_and_queues|hyper_datasets|clearml_agent(/(clearml_agent_dynamic_gpus|clearml_agent_fractional_gpus)?|)?|cloud_autoscaling/autoscaling_overview|remote_session|model_registry|deploying_clearml/enterprise_deploy/appgw|build_interactive_models|deploying_models|custom_apps)?$', }, { - to:'/docs/hyperdatasets/overview', - label: 'Hyper-Datasets', + to: '/docs/clearml_sdk/clearml_sdk_setup', + label: 'Setup', position: 'left', + activeBaseRegex: '^/docs/latest/docs/(deploying_clearml(?!/enterprise_deploy/appgw(/.*)?$)(/.*)?$|clearml_sdk/clearml_sdk_setup|user_management(/.*)?|clearml_agent/(clearml_agent_setup|clearml_agent_deployment_bare_metal|clearml_agent_deployment_k8s|clearml_agent_deployment_slurm|clearml_agent_execution_env|clearml_agent_env_caching|clearml_agent_services_mode)|integrations/storage)/?$', }, - // {to: 'tutorials', label: 'Tutorials', position: 'left'}, - // Please keep GitHub link to the right for consistency. 
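The `activeBaseRegex` values added in this diff control when Docusaurus highlights a navbar item: the current pathname is tested against the pattern. As a quick sanity check before committing a pattern, the matching can be reproduced in plain JavaScript (a sketch: `isActive` is a stand-in helper, not Docusaurus's actual implementation; the two patterns below are the shorter regexes from this diff, not the long Overview/Setup ones):

```javascript
// Sanity-check navbar activeBaseRegex patterns outside of Docusaurus.
// Patterns copied from this diff: "Tutorials" matches any path under
// /docs/latest/docs/guides, while "FAQ" is anchored to the exact page.
const tutorials = new RegExp('^/docs/latest/docs/guides');
const faq = new RegExp('^/docs/latest/docs/faq$');

// Stand-in for the check Docusaurus performs against location.pathname.
const isActive = (pathname, re) => re.test(pathname);

console.log(isActive('/docs/latest/docs/guides/pipeline/pipeline_controller', tutorials)); // true
console.log(isActive('/docs/latest/docs/faq', faq)); // true
console.log(isActive('/docs/latest/docs/faq/other', faq)); // false: the pattern ends with $
```

Note the trailing `$` on the FAQ pattern: without it, any subpath of `/docs/latest/docs/faq` would also keep the item highlighted.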
- {to: '/docs/guides', label: 'Examples', position: 'left'}, - //{to: '/docs/references', label: 'API', position: 'left'}, { - label: 'References', + to: '/docs/getting_started/auto_log_exp', + label: 'Using ClearML', + position: 'left', + activeBaseRegex: '^/docs/latest/docs/(getting_started(?!/video_tutorials(/.*)?)|clearml_serving|apps/clearml_session)(/.*)?$', + }, + { + label: 'Developer Center', position: 'left', // or 'right' + to: '/docs/fundamentals/projects', + activeBaseRegex: '^/docs/latest/docs/(fundamentals(?!/agents_and_queues)(/.*)?|configs/configuring_clearml|getting_started/video_tutorials(/.*)?|clearml_sdk(?!/clearml_sdk_setup)(/.*)?|pipelines(/.*)?|hyperdatasets(/.*)?|clearml_data(/.*)?|hyperdatasets(/webapp)(/.*)?|references(/.*)?|webapp(/.*)?|clearml_agent/(clearml_agent_ref|clearml_agent_env_var)(/.*)?|configs/(clearml_conf|env_vars)(/.*)?|apps/(clearml_task|clearml_param_search)(/.*)?|best_practices(/.*)?|guides(/.*)?|integrations(/.*)?|faq|release_notes(/.*)?)$', + activeClassName: 'navbar__link--active', items: [ { - label: 'SDK', + label: 'ClearML Basics', + to: '/docs/fundamentals/projects', + activeBaseRegex: '^/docs/latest/docs/(fundamentals|getting_started/video_tutorials|clearml_sdk(/(?!clearml_sdk_setup).*|(?=/))?|pipelines|clearml_data|hyperdatasets/(?!webapp/).*)(/.*)?$', + }, + { + label: 'References', to: '/docs/references/sdk/task', + activeBaseRegex: '^/docs/latest/docs/(references/|webapp/.*|hyperdatasets/webapp/.*|clearml_agent/(clearml_agent_ref|clearml_agent_env_var)|configs/(clearml_conf|env_vars)|apps/(clearml_task|clearml_param_search))(/.*)?$', }, { - label: 'ClearML Agent', - to: '/docs/clearml_agent/clearml_agent_ref', + label: 'Best Practices', + to: 'docs/best_practices/data_scientist_best_practices', + activeBaseRegex: '^/docs/latest/docs/best_practices/' }, { - label: 'Server API', - to: '/docs/references/api', + label: 'Tutorials', + to: '/docs/guides', + activeBaseRegex: '^/docs/latest/docs/guides', }, { - 
label: 'Hyper-Datasets', - to: '/docs/references/hyperdataset', + label: 'Code Integrations', + to: '/docs/integrations', + activeBaseRegex: '^/docs/latest/docs/integrations(?!/storage)', + }, + { + label: 'FAQ', + to: '/docs/faq', + activeBaseRegex: '^/docs/latest/docs/faq$', }, - { label: 'Release Notes', to: '/docs/release_notes/clearml_server/open_source/ver_2_0', + activeBaseRegex: '^/docs/latest/docs/release_notes/', }, - { - label: 'Community Resources', - to: '/docs/community', - } + ], }, { - label: 'FAQ', + label: 'Community Resources', position: 'left', // or 'right' - to: '/docs/faq' + to: '/docs/latest/docs/community', }, { href: 'https://joinslack.clear.ml', @@ -150,7 +168,7 @@ module.exports = { 'aria-label': 'Twitter', }, { - href: 'https://github.com/allegroai/clearml', + href: 'https://github.com/clearml/clearml', position: 'right', className: 'header-ico header-ico--github', 'aria-label': 'GitHub repository', @@ -197,7 +215,7 @@ module.exports = { }, { label: 'GitHub', - href: 'https://github.com/allegroai/clearml', + href: 'https://github.com/clearml/clearml', }, ], }, @@ -215,13 +233,13 @@ module.exports = { // Please change this to your repo. breadcrumbs: false, editUrl: - 'https://github.com/allegroai/clearml-docs/edit/main/', + 'https://github.com/clearml/clearml-docs/edit/main/', }, // API: { // sidebarPath: require.resolve('./sidebars.js'), // // Please change this to your repo. // editUrl: - // 'https://github.com/allegroai/clearml-docs/edit/main/', + // 'https://github.com/clearml/clearml-docs/edit/main/', // }, blog: { blogTitle: 'ClearML Tutorials', @@ -231,7 +249,7 @@ module.exports = { showReadingTime: true, // Please change this to your repo. 
editUrl: - 'https://github.com/allegroai/clearml-docs/edit/main/tutorials/', + 'https://github.com/clearml/clearml-docs/edit/main/tutorials/', }, theme: { customCss: require.resolve('./src/css/custom.css'), diff --git a/package-lock.json b/package-lock.json index d146110c..5a47cfd3 100644 --- a/package-lock.json +++ b/package-lock.json @@ -15,7 +15,7 @@ "@docusaurus/plugin-google-analytics": "^3.6.1", "@docusaurus/plugin-google-gtag": "^3.6.1", "@docusaurus/preset-classic": "^3.6.1", - "@easyops-cn/docusaurus-search-local": "^0.48.0", + "@easyops-cn/docusaurus-search-local": "^0.48.5", "@mdx-js/react": "^3.0.0", "clsx": "^1.1.1", "joi": "^17.4.0", diff --git a/package.json b/package.json index a9144041..27874081 100644 --- a/package.json +++ b/package.json @@ -23,7 +23,7 @@ "@docusaurus/plugin-google-analytics": "^3.6.1", "@docusaurus/plugin-google-gtag": "^3.6.1", "@docusaurus/preset-classic": "^3.6.1", - "@easyops-cn/docusaurus-search-local": "^0.48.0", + "@easyops-cn/docusaurus-search-local": "^0.48.5", "@mdx-js/react": "^3.0.0", "clsx": "^1.1.1", "medium-zoom": "^1.0.6", diff --git a/sidebars.js b/sidebars.js index 29abd359..b6414ef3 100644 --- a/sidebars.js +++ b/sidebars.js @@ -9,293 +9,120 @@ module.exports = { mainSidebar: [ - {'Getting Started': ['getting_started/main', { - 'Where do I start?': [{'Data Scientists': ['getting_started/ds/ds_first_steps', 'getting_started/ds/ds_second_steps', 'getting_started/ds/best_practices']}, - {'MLOps and LLMOps': ['getting_started/mlops/mlops_first_steps','getting_started/mlops/mlops_second_steps','getting_started/mlops/mlops_best_practices']}] - }, 'getting_started/architecture', {'Video Tutorials': - [ - 'getting_started/video_tutorials/quick_introduction', - 'getting_started/video_tutorials/core_component_overview', - 'getting_started/video_tutorials/experiment_manager_hands-on', - 'getting_started/video_tutorials/experiment_management_best_practices', - 
'getting_started/video_tutorials/agent_remote_execution_and_automation', - 'getting_started/video_tutorials/hyperparameter_optimization', - 'getting_started/video_tutorials/pipelines_from_code', - 'getting_started/video_tutorials/pipelines_from_tasks', - 'getting_started/video_tutorials/clearml-data', - 'getting_started/video_tutorials/the_clearml_autoscaler', - 'getting_started/video_tutorials/hyperdatasets_data_versioning', + { + type: 'doc', + id: 'overview', + label: 'ClearML at a Glance', + }, + { + type: 'category', + collapsible: true, + label: 'Infrastructure Control Plane (GPUaaS)', + items: [ + 'fundamentals/agents_and_queues', + 'clearml_agent', + 'clearml_agent/clearml_agent_dynamic_gpus', + 'clearml_agent/clearml_agent_fractional_gpus', + 'cloud_autoscaling/autoscaling_overview', + 'remote_session' + ] + }, + { + type: 'category', + collapsible: true, + label: 'AI Development Center', + items: [ + 'clearml_sdk/clearml_sdk', + 'pipelines/pipelines', + 'clearml_data/clearml_data', + 'hyper_datasets', + 'model_registry', + ] + }, + { + type: 'category', + collapsible: true, + label: 'GenAI App Engine', + items: [ + 'deploying_clearml/enterprise_deploy/appgw', + 'build_interactive_models', + 'deploying_models', + 'custom_apps' + ] + }, + ], + usecaseSidebar: [ + /*'getting_started/main',*/ + 'getting_started/auto_log_exp', + 'getting_started/track_tasks', + 'getting_started/reproduce_tasks', + 'getting_started/logging_using_artifacts', + 'getting_started/data_management', + 'getting_started/remote_execution', + 'getting_started/building_pipelines', + 'getting_started/hpo', + 'getting_started/clearml_agent_docker_exec', + 'getting_started/clearml_agent_base_docker', + 'getting_started/clearml_agent_scheduling', + {"Deploying Model Endpoints": [ { - 'Hands-on MLOps Tutorials':[ - 'getting_started/video_tutorials/hands-on_mlops_tutorials/how_clearml_is_used_by_a_data_scientist', - 
'getting_started/video_tutorials/hands-on_mlops_tutorials/how_clearml_is_used_by_an_mlops_engineer', - 'getting_started/video_tutorials/hands-on_mlops_tutorials/ml_ci_cd_using_github_actions_and_clearml' + type: 'category', + collapsible: true, + collapsed: true, + label: 'ClearML Serving', + link: {type: 'doc', id: 'clearml_serving/clearml_serving'}, + items: ['clearml_serving/clearml_serving_setup', 'clearml_serving/clearml_serving_cli', 'clearml_serving/clearml_serving_tutorial'] + }, + { + type: 'category', + collapsible: true, + collapsed: true, + label: 'Model Launchers', + items: [ + 'webapp/applications/apps_embed_model_deployment', + 'webapp/applications/apps_model_deployment', + 'webapp/applications/apps_llama_deployment' ] - } - ]}]}, - {'ClearML Fundamentals': [ - 'fundamentals/projects', 'fundamentals/task', 'fundamentals/hyperparameters', - 'fundamentals/artifacts', 'fundamentals/models', 'fundamentals/logger', 'fundamentals/agents_and_queues', - 'fundamentals/hpo' - ] - }, - { - type: 'category', - collapsible: true, - collapsed: true, - label: 'ClearML SDK', - link: {type: 'doc', id: 'clearml_sdk/clearml_sdk'}, - items: ['clearml_sdk/task_sdk', 'clearml_sdk/model_sdk', 'clearml_sdk/apiclient_sdk'] - }, - { - type: 'category', - collapsible: true, - collapsed: true, - label: 'ClearML Agent', - link: {type: 'doc', id: 'clearml_agent'}, - items: ['clearml_agent/clearml_agent_setup', 'clearml_agent/clearml_agent_deployment', - 'clearml_agent/clearml_agent_execution_env', 'clearml_agent/clearml_agent_env_caching', - 'clearml_agent/clearml_agent_dynamic_gpus', 'clearml_agent/clearml_agent_fractional_gpus', - 'clearml_agent/clearml_agent_services_mode', 'clearml_agent/clearml_agent_docker', - 'clearml_agent/clearml_agent_scheduling'] - }, - { - type: 'category', - collapsible: true, - collapsed: true, - label: 'Cloud Autoscaling', - link: {type: 'doc', id: 'cloud_autoscaling/autoscaling_overview'}, - items: [ - {'Autoscaler Apps': [ - 
'webapp/applications/apps_aws_autoscaler', - 'webapp/applications/apps_gcp_autoscaler', - ] - } - ] - }, - { - type: 'category', - collapsible: true, - collapsed: true, - label: 'ClearML Pipelines', - link: {type: 'doc', id: 'pipelines/pipelines'}, - items: [{"Building Pipelines": - ['pipelines/pipelines_sdk_tasks', 'pipelines/pipelines_sdk_function_decorators'] - } - ] - }, - { - type: 'category', - collapsible: true, - collapsed: true, - label: 'ClearML Data', - link: {type: 'doc', id: 'clearml_data/clearml_data'}, - items: ['clearml_data/clearml_data_cli', 'clearml_data/clearml_data_sdk', 'clearml_data/best_practices', - { - type: 'category', - collapsible: true, - collapsed: true, - label: 'Workflows', - link: {type: 'doc', id: 'clearml_data/data_management_examples/workflows'}, - items: [ - 'clearml_data/data_management_examples/data_man_simple', - 'clearml_data/data_management_examples/data_man_folder_sync', - 'clearml_data/data_management_examples/data_man_cifar_classification', - 'clearml_data/data_management_examples/data_man_python' - ] - }, - ] - }, - 'hyper_datasets', - 'model_registry', - { - type: 'category', - collapsible: true, - collapsed: true, - label: 'Remote IDE', - link: {type: 'doc', id: 'remote_session'}, - items: [ - 'apps/clearml_session', - {type: 'ref', id: 'webapp/applications/apps_ssh_session'}, - {type: 'ref', id: 'webapp/applications/apps_jupyter_lab'}, - {type: 'ref', id: 'webapp/applications/apps_vscode'} - ] - }, - { - type: 'category', - collapsible: true, - collapsed: true, - label: 'ClearML Serving', - link: {type: 'doc', id: 'clearml_serving/clearml_serving'}, - items: ['clearml_serving/clearml_serving_setup', 'clearml_serving/clearml_serving_cli', 'clearml_serving/clearml_serving_tutorial'] - }, - {'CLI Tools': [ - 'apps/clearml_task', - {type: 'ref', id: 'clearml_agent/clearml_agent_ref'}, - {type: 'ref', id: 'clearml_data/clearml_data_cli'}, - 'apps/clearml_param_search', - {type: 'ref', id: 'apps/clearml_session'}, - 
{type: 'ref', id: 'clearml_serving/clearml_serving_cli'}, - ] - }, - {'Integrations': [ - 'integrations/autokeras', - 'integrations/catboost', - 'integrations/click', - 'integrations/fastai', - {"Hugging Face": ['integrations/transformers', 'integrations/accelerate']}, - 'integrations/hydra', 'integrations/jsonargparse', - 'integrations/keras', 'integrations/keras_tuner', - 'integrations/langchain', - 'integrations/lightgbm', 'integrations/matplotlib', - 'integrations/megengine', 'integrations/monai', 'integrations/tao', - {"OpenMMLab":['integrations/mmcv', 'integrations/mmengine']}, - 'integrations/optuna', - 'integrations/python_fire', 'integrations/pytorch', - 'integrations/ignite', - 'integrations/pytorch_lightning', - 'integrations/scikit_learn', 'integrations/seaborn', - 'integrations/splunk', - 'integrations/tensorboard', 'integrations/tensorboardx', 'integrations/tensorflow', - 'integrations/xgboost', 'integrations/yolov5', 'integrations/yolov8' - ] - }, - 'integrations/storage', - { - type: 'category', - collapsible: true, - collapsed: true, - label: 'WebApp', - link: {type: 'doc', id: 'webapp/webapp_overview'}, - items: [ - 'webapp/webapp_home', - { - 'Projects': [ - 'webapp/webapp_projects_page', - 'webapp/webapp_project_overview', - { - 'Tasks': ['webapp/webapp_exp_table', 'webapp/webapp_exp_track_visual', 'webapp/webapp_exp_reproducing', 'webapp/webapp_exp_tuning', - 'webapp/webapp_exp_comparing'] - }, - { - 'Models': ['webapp/webapp_model_table', 'webapp/webapp_model_viewing', 'webapp/webapp_model_comparing'] - }, - 'webapp/webapp_exp_sharing' - ] - }, - { - 'Datasets':[ - 'webapp/datasets/webapp_dataset_page', 'webapp/datasets/webapp_dataset_viewing' - ] - }, - { - 'Pipelines':[ - 'webapp/pipelines/webapp_pipeline_page', 'webapp/pipelines/webapp_pipeline_table', 'webapp/pipelines/webapp_pipeline_viewing' - ] - }, - 'webapp/webapp_model_endpoints', - 'webapp/webapp_reports', - { - type: 'category', - collapsible: true, - collapsed: true, - label: 
'Orchestration', - link: {type: 'doc', id: 'webapp/webapp_workers_queues'}, - items: ['webapp/webapp_orchestration_dash', 'webapp/resource_policies'] - }, - { - type: 'category', - collapsible: true, - collapsed: true, - label: 'ClearML Applications', - link: {type: 'doc', id: 'webapp/applications/apps_overview'}, - items: [ - { - "General": [ - 'webapp/applications/apps_hpo', - 'webapp/applications/apps_dashboard', - 'webapp/applications/apps_task_scheduler', - 'webapp/applications/apps_trigger_manager', - ] - }, - { - "AI Dev": [ - 'webapp/applications/apps_ssh_session', - 'webapp/applications/apps_jupyter_lab', - 'webapp/applications/apps_vscode', - ] - }, - { - "UI Dev": [ - 'webapp/applications/apps_gradio', - 'webapp/applications/apps_streamlit' - ] - }, - { - "Deploy": [ - 'webapp/applications/apps_embed_model_deployment', - 'webapp/applications/apps_model_deployment', - 'webapp/applications/apps_llama_deployment' - ] - }, - ] - }, - { - type: 'category', - collapsible: true, - collapsed: true, - label: 'Settings', - link: {type: 'doc', id: 'webapp/settings/webapp_settings_overview'}, - items: ['webapp/settings/webapp_settings_profile', - 'webapp/settings/webapp_settings_admin_vaults', 'webapp/settings/webapp_settings_users', - 'webapp/settings/webapp_settings_access_rules', 'webapp/settings/webapp_settings_id_providers', - 'webapp/settings/webapp_settings_resource_configs', 'webapp/settings/webapp_settings_usage_billing', - 'webapp/settings/webapp_settings_storage_credentials' - ] - }, - ] - }, - { - type: 'category', - collapsible: true, - collapsed: true, - label: 'Configuring ClearML', - link: {type: 'doc', id: 'configs/configuring_clearml'}, - items: ['configs/clearml_conf', 'configs/env_vars'] - }, - {'User Management': [ - 'user_management/user_groups', - 'user_management/access_rules', - 'user_management/admin_vaults', - 'user_management/identity_providers' - ] - }, - { - type: 'category', - collapsible: true, - collapsed: true, - label: 'ClearML 
Server', - link: {type: 'doc', id: 'deploying_clearml/clearml_server'}, - items: [ - {'Deploying ClearML Server': - ['deploying_clearml/clearml_server_aws_ec2_ami', 'deploying_clearml/clearml_server_gcp', - 'deploying_clearml/clearml_server_linux_mac', 'deploying_clearml/clearml_server_win', - 'deploying_clearml/clearml_server_kubernetes_helm'] - }, - {'Upgrading ClearML Server': - ['deploying_clearml/upgrade_server_aws_ec2_ami','deploying_clearml/upgrade_server_gcp', - 'deploying_clearml/upgrade_server_linux_mac', 'deploying_clearml/upgrade_server_win', - 'deploying_clearml/upgrade_server_kubernetes_helm', - 'deploying_clearml/clearml_server_es7_migration', 'deploying_clearml/clearml_server_mongo44_migration'] - }, - 'deploying_clearml/clearml_server_config', 'deploying_clearml/clearml_server_security' - ] - }, - - //'Comments': ['Notes'], - - - + } + ]}, + {"Launching a Remote IDE": [ + 'apps/clearml_session', + {type: 'ref', id: 'webapp/applications/apps_ssh_session'}, + {type: 'ref', id: 'webapp/applications/apps_jupyter_lab'}, + {type: 'ref', id: 'webapp/applications/apps_vscode'} + ]}, + {"Building Interactive Model Demos": [ + {type: 'ref', id: 'webapp/applications/apps_gradio'}, + {type: 'ref', id: 'webapp/applications/apps_streamlit'}, + ]}, + 'getting_started/task_trigger_schedule', + 'getting_started/project_progress', + ], + integrationsSidebar: [ + { + type: 'doc', + label: 'Overview', + id: 'integrations/integrations', + }, + 'integrations/autokeras', + 'integrations/catboost', + 'integrations/click', + 'integrations/fastai', + {"Hugging Face": ['integrations/transformers', 'integrations/accelerate']}, + 'integrations/hydra', 'integrations/jsonargparse', + 'integrations/keras', 'integrations/keras_tuner', + 'integrations/langchain', + 'integrations/lightgbm', 'integrations/matplotlib', + 'integrations/megengine', 'integrations/monai', 'integrations/tao', + {"OpenMMLab":['integrations/mmcv', 'integrations/mmengine']}, + 'integrations/optuna', + 
'integrations/python_fire', 'integrations/pytorch', + 'integrations/ignite', + 'integrations/pytorch_lightning', + 'integrations/scikit_learn', 'integrations/seaborn', + 'integrations/splunk', + 'integrations/tensorboard', 'integrations/tensorboardx', 'integrations/tensorflow', + 'integrations/xgboost', 'integrations/yolov5', 'integrations/yolov8' ], guidesSidebar: [ 'guides/guidemain', @@ -304,6 +131,7 @@ module.exports = { {'ClearML Task': ['guides/clearml-task/clearml_task_tutorial']}, {'ClearML Agent': ['guides/clearml_agent/executable_exp_containers', 'guides/clearml_agent/exp_environment_containers', 'guides/clearml_agent/reproduce_exp']}, {'Datasets': ['clearml_data/data_management_examples/data_man_cifar_classification', 'clearml_data/data_management_examples/data_man_python']}, + {id: 'hyperdatasets/code_examples', type: 'doc', label: 'Hyper-Datasets'}, {'Distributed': ['guides/distributed/distributed_pytorch_example', 'guides/distributed/subprocess_example']}, {'Docker': ['guides/docker/extra_docker_shell_script']}, {'Frameworks': [ @@ -342,7 +170,6 @@ module.exports = { {'Offline Mode':['guides/set_offline']}, {'Optimization': ['guides/optimization/hyper-parameter-optimization/examples_hyperparam_opt']}, {'Pipelines': ['guides/pipeline/pipeline_controller', 'guides/pipeline/pipeline_decorator', 'guides/pipeline/pipeline_functions']}, - {'Reporting': ['guides/reporting/explicit_reporting','guides/reporting/3d_plots_reporting', 'guides/reporting/artifacts', 'guides/reporting/using_artifacts', 'guides/reporting/clearml_logging_example', 'guides/reporting/html_reporting', 'guides/reporting/hyper_parameters', 'guides/reporting/image_reporting', 'guides/reporting/manual_matplotlib_reporting', 'guides/reporting/media_reporting', 'guides/reporting/model_config', 'guides/reporting/pandas_reporting', 'guides/reporting/plotly_reporting', @@ -352,6 +179,112 @@ module.exports = { {'Web UI': ['guides/ui/building_leader_board','guides/ui/tuning_exp']} ], + 
knowledgeSidebar: [ + {'Fundamentals': [ + 'fundamentals/projects', + 'fundamentals/task', + 'fundamentals/hyperparameters', + 'fundamentals/artifacts', + 'fundamentals/models', + 'fundamentals/logger', + ]}, + { + type: 'category', + collapsible: true, + collapsed: true, + label: 'ClearML SDK', + link: {type: 'doc', id: 'clearml_sdk/clearml_sdk'}, + items: [ + 'clearml_sdk/task_sdk', + 'clearml_sdk/model_sdk', + 'hyperdatasets/task', + 'clearml_sdk/hpo_sdk', + 'clearml_sdk/apiclient_sdk' + ] + }, + { + type: 'category', + collapsible: true, + collapsed: true, + label: 'ClearML Pipelines', + link: {type: 'doc', id: 'pipelines/pipelines'}, + items: [{ + "Building Pipelines": [ + 'pipelines/pipelines_sdk_tasks', + 'pipelines/pipelines_sdk_function_decorators' + ] + }] + }, + { + type: 'category', + collapsible: true, + collapsed: true, + label: 'ClearML Data', + link: {type: 'doc', id: 'clearml_data/clearml_data'}, + items: [ + 'clearml_data/clearml_data_cli', + 'clearml_data/clearml_data_sdk', + { + type: 'category', + collapsible: true, + collapsed: true, + label: 'Workflows', + link: {type: 'doc', id: 'clearml_data/data_management_examples/workflows'}, + items: [ + 'clearml_data/data_management_examples/data_man_simple', + 'clearml_data/data_management_examples/data_man_folder_sync', + 'clearml_data/data_management_examples/data_man_cifar_classification', + 'clearml_data/data_management_examples/data_man_python' + ] + }, + ] + }, + { + type: 'category', + collapsible: true, + collapsed: true, + label: 'Hyper-Datasets', + link: {type: 'doc', id: 'hyperdatasets/overview'}, + items: [ + 'hyperdatasets/dataset', + { + type: 'category', + collapsible: true, + collapsed: true, + label: 'Frames', + link: {type: 'doc', id: 'hyperdatasets/frames'}, + items: [ + 'hyperdatasets/single_frames', + 'hyperdatasets/frame_groups', + 'hyperdatasets/sources', + 'hyperdatasets/annotations', + 'hyperdatasets/masks', + 'hyperdatasets/previews', + 'hyperdatasets/custom_metadata' + ] + 
}, + 'hyperdatasets/dataviews', + ] + }, + {'Video Tutorials': [ + 'getting_started/video_tutorials/quick_introduction', + 'getting_started/video_tutorials/core_component_overview', + 'getting_started/video_tutorials/experiment_manager_hands-on', + 'getting_started/video_tutorials/experiment_management_best_practices', + 'getting_started/video_tutorials/agent_remote_execution_and_automation', + 'getting_started/video_tutorials/hyperparameter_optimization', + 'getting_started/video_tutorials/pipelines_from_code', + 'getting_started/video_tutorials/pipelines_from_tasks', + 'getting_started/video_tutorials/clearml-data', + 'getting_started/video_tutorials/the_clearml_autoscaler', + 'getting_started/video_tutorials/hyperdatasets_data_versioning', + {'Hands-on MLOps Tutorials': [ + 'getting_started/video_tutorials/hands-on_mlops_tutorials/how_clearml_is_used_by_a_data_scientist', + 'getting_started/video_tutorials/hands-on_mlops_tutorials/how_clearml_is_used_by_an_mlops_engineer', + 'getting_started/video_tutorials/hands-on_mlops_tutorials/ml_ci_cd_using_github_actions_and_clearml' + ]} + ]}, + ], rnSidebar: [ {'Server': [ { @@ -383,7 +316,7 @@ module.exports = { 'release_notes/clearml_server/enterprise/ver_3_24', { 'Older Versions': [ - 'release_notes/clearml_server/enterprise/ver_3_23','release_notes/clearml_server/enterprise/ver_3_22', + 'release_notes/clearml_server/enterprise/ver_3_23', 'release_notes/clearml_server/enterprise/ver_3_22', 'release_notes/clearml_server/enterprise/ver_3_21', 'release_notes/clearml_server/enterprise/ver_3_20' ] } @@ -456,7 +389,8 @@ module.exports = { ] } ], - sdkSidebar: [ + referenceSidebar: [ + {'SDK': [ 'references/sdk/task', 'references/sdk/logger', {'Model': ['references/sdk/model_model', @@ -481,59 +415,298 @@ module.exports = { 'references/sdk/hpo_parameters_uniformintegerparameterrange', 'references/sdk/hpo_parameters_uniformparameterrange', 'references/sdk/hpo_parameters_parameterset', - ]}, - ], - clearmlAgentSidebar: [ - 
'clearml_agent/clearml_agent_ref', 'clearml_agent/clearml_agent_env_var'
-  ],
-  hyperdatasetsSidebar: [
-    'hyperdatasets/overview',
-    {'Frames': [
-      'hyperdatasets/frames',
-      'hyperdatasets/single_frames',
-      'hyperdatasets/frame_groups',
-      'hyperdatasets/sources',
-      'hyperdatasets/annotations',
-      'hyperdatasets/masks',
-      'hyperdatasets/previews',
-      'hyperdatasets/custom_metadata' ]},
-    'hyperdatasets/dataset',
-    'hyperdatasets/dataviews',
-    'hyperdatasets/task',
-    {'WebApp': [
-      {'Projects': [
-        'hyperdatasets/webapp/webapp_dataviews', 'hyperdatasets/webapp/webapp_exp_track_visual',
-        'hyperdatasets/webapp/webapp_exp_modifying', 'hyperdatasets/webapp/webapp_exp_comparing',
-      ]
-      },
-      {'Datasets': [
-        'hyperdatasets/webapp/webapp_datasets',
-        'hyperdatasets/webapp/webapp_datasets_versioning',
-        'hyperdatasets/webapp/webapp_datasets_frames'
-      ]
-      },
-      'hyperdatasets/webapp/webapp_annotator'
+      {'Enterprise Hyper-Datasets': [
+        {'Hyper-Dataset': [
+          'references/hyperdataset/hyperdataset',
+          'references/hyperdataset/hyperdatasetversion'
+        ]},
+        {'DataFrame': [
+          'references/hyperdataset/singleframe',
+          'references/hyperdataset/framegroup',
+          'references/hyperdataset/annotation',
+        ]},
+        'references/hyperdataset/dataview',
+      ]},
+    ]},
+    {'CLI Tools': [
+      'apps/clearml_task',
+      {type: 'ref', id: 'clearml_data/clearml_data_cli'},
+      'apps/clearml_param_search',
+      {type: 'ref', id: 'apps/clearml_session'},
+      {type: 'ref', id: 'clearml_serving/clearml_serving_cli'},
+    ]
     },
+    {'ClearML Agent': [
+      'clearml_agent/clearml_agent_ref', 'clearml_agent/clearml_agent_env_var'
+    ]},
+    {
+      type: 'category',
+      collapsible: true,
+      collapsed: true,
+      label: 'Client Configuration',
+      link: {type: 'doc', id: 'configs/configuring_clearml'},
+      items: [
+        'configs/clearml_conf',
+        'configs/env_vars'
+      ]
+    },
+    {'Server API': [
+      'references/api/index',
+      'references/api/definitions',
+      'references/api/login',
+      'references/api/debug',
+      'references/api/projects',
+      'references/api/queues',
+      'references/api/workers',
+      'references/api/events',
+      'references/api/models',
+      'references/api/tasks',
+    ]},
+    {
+      type: 'category',
+      collapsible: true,
+      collapsed: true,
+      label: 'WebApp',
+      link: {type: 'doc', id: 'webapp/webapp_overview'},
+      items: [
+        'webapp/webapp_home',
+        {'Projects': [
+          'webapp/webapp_projects_page',
+          'webapp/webapp_project_overview',
+          {'Tasks': [
+            'webapp/webapp_exp_table',
+            'webapp/webapp_exp_track_visual',
+            'webapp/webapp_exp_reproducing',
+            'webapp/webapp_exp_tuning',
+            'webapp/webapp_exp_comparing'
+          ]},
+          {'Models': [
+            'webapp/webapp_model_table',
+            'webapp/webapp_model_viewing',
+            'webapp/webapp_model_comparing'
+          ]},
+          {'Dataviews': [
+            'hyperdatasets/webapp/webapp_dataviews',
+            'hyperdatasets/webapp/webapp_exp_track_visual',
+            'hyperdatasets/webapp/webapp_exp_modifying',
+            'hyperdatasets/webapp/webapp_exp_comparing'
+          ]},
+          'webapp/webapp_exp_sharing'
+        ]},
+        {'Datasets': [
+          'webapp/datasets/webapp_dataset_page',
+          'webapp/datasets/webapp_dataset_viewing'
+        ]},
+        {'Hyper-Datasets': [
+          'hyperdatasets/webapp/webapp_datasets',
+          'hyperdatasets/webapp/webapp_datasets_versioning',
+          'hyperdatasets/webapp/webapp_datasets_frames',
+          'hyperdatasets/webapp/webapp_annotator'
+        ]},
+        {'Pipelines': [
+          'webapp/pipelines/webapp_pipeline_page',
+          'webapp/pipelines/webapp_pipeline_table',
+          'webapp/pipelines/webapp_pipeline_viewing'
+        ]},
+        'webapp/webapp_model_endpoints',
+        'webapp/webapp_reports',
+        {
+          type: 'category',
+          collapsible: true,
+          collapsed: true,
+          label: 'Orchestration',
+          link: {type: 'doc', id: 'webapp/webapp_workers_queues'},
+          items: [
+            'webapp/webapp_orchestration_dash',
+            {
+              type: 'category',
+              collapsible: true,
+              collapsed: true,
+              label: 'Autoscalers',
+              items: [
+                'webapp/applications/apps_aws_autoscaler',
+                'webapp/applications/apps_gcp_autoscaler',
+              ]
+            },
+            'webapp/resource_policies'
+          ]
+        },
+        {
+          type: 'category',
+          collapsible: true,
+          collapsed: true,
+          label: 'ClearML Applications',
+          link: {type: 'doc', id: 'webapp/applications/apps_overview'},
+          items: [
+            {"General": [
+              'webapp/applications/apps_hpo',
+              'webapp/applications/apps_dashboard',
+              'webapp/applications/apps_task_scheduler',
+              'webapp/applications/apps_trigger_manager',
+            ]},
+            {"AI Dev": [
+              'webapp/applications/apps_ssh_session',
+              'webapp/applications/apps_jupyter_lab',
+              'webapp/applications/apps_vscode',
+            ]},
+            {"UI Dev": [
+              'webapp/applications/apps_gradio',
+              'webapp/applications/apps_streamlit'
+            ]},
+            {"Deploy": [
+              'webapp/applications/apps_embed_model_deployment',
+              'webapp/applications/apps_model_deployment',
+              'webapp/applications/apps_llama_deployment'
+            ]},
+          ]
+        },
+        {
+          type: 'category',
+          collapsible: true,
+          collapsed: true,
+          label: 'Settings',
+          link: {type: 'doc', id: 'webapp/settings/webapp_settings_overview'},
+          items: [
+            'webapp/settings/webapp_settings_profile',
+            'webapp/settings/webapp_settings_admin_vaults',
+            'webapp/settings/webapp_settings_users',
+            'webapp/settings/webapp_settings_access_rules',
+            'webapp/settings/webapp_settings_id_providers',
+            'webapp/settings/webapp_settings_resource_configs',
+            'webapp/settings/webapp_settings_usage_billing',
+            'webapp/settings/webapp_settings_storage_credentials'
+          ]
+        },
       ]
     },
-    'hyperdatasets/code_examples'
   ],
-  sdkHyperDataset: [
-    {'Hyper-Dataset': ['references/hyperdataset/hyperdataset', 'references/hyperdataset/hyperdatasetversion']},
-    {'DataFrame': ['references/hyperdataset/singleframe',
-      'references/hyperdataset/framegroup', 'references/hyperdataset/annotation',]},
-    'references/hyperdataset/dataview',
+  installationSidebar: [
+    'clearml_sdk/clearml_sdk_setup',
+    {
+      type: 'category',
+      collapsible: true,
+      collapsed: true,
+      label: 'ClearML Agent',
+      items: [
+        'clearml_agent/clearml_agent_setup',
+        {
+          'Deployment': [
+            'clearml_agent/clearml_agent_deployment_bare_metal',
+            'clearml_agent/clearml_agent_deployment_k8s',
+            'clearml_agent/clearml_agent_deployment_slurm',
+          ]
+        },
+        'clearml_agent/clearml_agent_execution_env',
+        'clearml_agent/clearml_agent_env_caching',
+        'clearml_agent/clearml_agent_services_mode',
+      ]
+    },
+    {
+      type: 'doc',
+      label: 'Configuring Client Storage Access',
+      id: 'integrations/storage',
+    },
+    {
+      type: 'category',
+      collapsible: true,
+      collapsed: true,
+      label: 'Open Source Server',
+      link: {type: 'doc', id: 'deploying_clearml/clearml_server'},
+      items: [
+        {'Deployment Options': [
+          'deploying_clearml/clearml_server_aws_ec2_ami',
+          'deploying_clearml/clearml_server_gcp',
+          'deploying_clearml/clearml_server_linux_mac',
+          'deploying_clearml/clearml_server_win',
+          'deploying_clearml/clearml_server_kubernetes_helm'
+        ]},
+        'deploying_clearml/clearml_server_config',
+        'deploying_clearml/clearml_server_security',
+        {'Server Upgrade Procedures': [
+          'deploying_clearml/upgrade_server_aws_ec2_ami',
+          'deploying_clearml/upgrade_server_gcp',
+          'deploying_clearml/upgrade_server_linux_mac',
+          'deploying_clearml/upgrade_server_win',
+          'deploying_clearml/upgrade_server_kubernetes_helm',
+          'deploying_clearml/clearml_server_es7_migration',
+          'deploying_clearml/clearml_server_mongo44_migration'
+        ]},
+      ]
+    },
+/*    {'Getting Started': [
+      'getting_started/architecture',
+    ]},*/
+    {
+      'Enterprise Server Deployment': [
+        'deploying_clearml/enterprise_deploy/multi_tenant_k8s',
+        'deploying_clearml/enterprise_deploy/vpc_aws',
+        'deploying_clearml/enterprise_deploy/on_prem_ubuntu',
+      ]
+    },
+    {
+      type: 'category',
+      collapsible: true,
+      collapsed: true,
+      label: 'ClearML Application Gateway',
+      items: [
+        'deploying_clearml/enterprise_deploy/appgw_install_compose',
+        'deploying_clearml/enterprise_deploy/appgw_install_k8s',
+      ]
+    },
+    'deploying_clearml/enterprise_deploy/custom_billing',
+    'deploying_clearml/enterprise_deploy/delete_tenant',
+    'deploying_clearml/enterprise_deploy/import_projects',
+    'deploying_clearml/enterprise_deploy/change_artifact_links',
+    {
+      'Enterprise Applications': [
+        'deploying_clearml/enterprise_deploy/app_install_ubuntu_on_prem',
+        'deploying_clearml/enterprise_deploy/app_install_ex_server',
+        'deploying_clearml/enterprise_deploy/app_custom',
+      ]
+    },
+    {
+      'User Management': [
+        'user_management/user_groups',
+        'user_management/access_rules',
+        'user_management/admin_vaults',
+        {
+          type: 'category',
+          collapsible: true,
+          collapsed: true,
+          label: 'Identity Provider Integration',
+          link: {type: 'doc', id: 'user_management/identity_providers'},
+          items: [
+            'deploying_clearml/enterprise_deploy/sso_multi_tenant_login',
+            'deploying_clearml/enterprise_deploy/sso_saml_k8s',
+            'deploying_clearml/enterprise_deploy/sso_keycloak',
+            'deploying_clearml/enterprise_deploy/sso_active_directory'
+          ]
+        },
+      ]
+    },
   ],
-  apiSidebar: [
-    'references/api/index',
-    'references/api/definitions',
-    'references/api/login',
-    'references/api/debug',
-    'references/api/projects',
-    'references/api/queues',
-    'references/api/workers',
-    'references/api/events',
-    'references/api/models',
-    'references/api/tasks',
+  bestPracticesSidebar: [
+    {
+      type: 'category',
+      collapsible: true,
+      label: 'Best Practices',
+      items: [
+        {
+          type: 'doc',
+          label: 'Data Scientists',
+          id: 'best_practices/data_scientist_best_practices'
+        },
+        {
+          type: 'doc',
+          label: 'MLOps and LLMOps',
+          id: 'best_practices/mlops_best_practices'
+        },
+        {
+          type: 'doc',
+          label: 'Data Management',
+          id: 'best_practices/data_best_practices'
+        },
+      ],
+    },
   ]
 };
diff --git a/src/css/custom.css b/src/css/custom.css
index e428974d..160fad9e 100644
--- a/src/css/custom.css
+++ b/src/css/custom.css
@@ -29,7 +29,7 @@ html {
   --ifm-color-primary-light: #17c5a2;
   --ifm-color-primary-lighter: #2edfbb;
-  --ifm-color-primary-lightest: #51f1d1;
+  --ifm-color-primary-lightest: #AEFDED;
 
   --ifm-toc-background-color: #141722;
   --ifm-code-font-size: 95%;
@@ -46,16 +46,24 @@ html {
   --ifm-code-padding-vertical: 0.2rem;
 }
 
-html[data-theme="dark"] {
-  --ifm-background-color: #1a1e2c;
-  --ifm-footer-background-color: #1a1e2c;
-  --ifm-footer-link-color: #a4a5aa;
-  --ifm-footer-link-hover-color: #14aa8c;
-  --ifm-dropdown-background-color: #2c3246;
-  --ifm-table-stripe-background: #141722;
-  --ifm-link-color: var(--ifm-color-primary-light);
+[data-theme=dark]:root {
+  --ifm-background-color: #040506; /* body bg */
+  --ifm-header-background-color: #101418; /* section 1 */
+  --ifm-footer-background-color: #101418; /* section 1 */
+  --ifm-footer-link-color: #D8FFF0; /* specific footer link color */
+  --ifm-footer-link-hover-color: #ffffff; /* specific footer link hover color */
+  --ifm-dropdown-background-color: #242D37; /* section 2 */
+  --ifm-table-stripe-background: #101418; /* section 1 */
+  --ifm-link-color: #6AD6C0; /* specific link color */
+  --ifm-link-hover-color: #AEFDED; /* specific link hover color */
+  --ifm-font-color-base: #E5E5E5; /* body text */
+  --ifm-hr-background-color: #242D37; /* section 1 */
+  --ifm-toc-link-color: #E5E5E5; /* body text */
+  --ifm-toc-background-color: #242D37; /* section 2 */
+  --ifm-code-background: #242D37; /* section 2 */
 }
+
 @media (min-width: 1400px) {
   /* Expand sidebar width above 1400px */
   html[data-theme="light"],
@@ -70,7 +78,7 @@ a {
 }
 
 html[data-theme="dark"] a:hover {
-  color: var(--ifm-color-primary-lightest);
+  color: var(--ifm-color-primary-lightest);
 }
 
 .align-center {
@@ -151,12 +159,16 @@ html[data-theme="dark"] div[role="banner"] {
   background-color: #09173C;
 }
 html[data-theme="dark"] .navbar--dark {
-  background-color: #151722;
+  background-color: var(--ifm-header-background-color);
 }
 .navbar--dark.navbar .navbar__toggle {
   color: white; /* opener icon color */
 }
+html[data-theme="dark"] .navbar__link:hover,
+html[data-theme="dark"] .navbar__link--active {
+  color: var(--ifm-link-color);
+}
 
 /* ===HEADER=== */
@@ -374,7 +386,7 @@ html[data-theme="light"] [class^="sidebarLogo"] > img {
 
 html[data-theme="dark"] .menu__link--active {
-  color: var(--ifm-color-primary-lighter);
+  color: var(--ifm-link-color);
 }
 html[data-theme="light"] .menu__link:not(.menu__link--active) {
   color: #606a78;
 }
@@ -460,11 +472,13 @@ html[data-theme="light"] .table-of-contents {
   box-shadow: 0 0 0 2px rgba(0,0,0,0.1) inset;
 }
 html[data-theme="dark"] .table-of-contents {
-  background-color: var(--ifm-toc-background-color);
   box-shadow: 0 0 0 2px rgba(0,0,0,0.4) inset;
 }
 html[data-theme="dark"] a.table-of-contents__link--active {
-  color: var(--ifm-color-primary-light);
+  color: var(--ifm-link-color);
+}
+html[data-theme="dark"] .table-of-contents a:hover {
+  color: var(--ifm-color-primary-lightest);
 }
 .table-of-contents__left-border {
   border:none;
 }
@@ -481,9 +495,6 @@ a.table-of-contents__link--active:before {
   border-left: 6px solid var(--ifm-color-primary);
   transform: translateY(5px);
 }
-html[data-theme="light"] .table-of-contents__link:not(.table-of-contents__link--active) {
-  color: rgba(0,0,0,0.9);
-}
 
 /* toc: show "..." inside code tag */
 .table-of-contents code {
@@ -564,7 +575,7 @@ html[data-theme="light"] .footer__link-item[href*="stackoverflow"] {
 
 html[data-theme="dark"] .footer__link-item:hover {
-  color: var(--ifm-color-primary-lighter);
+  color: var(--ifm-footer-link-hover-color);
 }
 
@@ -719,15 +730,37 @@ html[data-theme="light"] .icon {
 
 /* md heading style */
+/*
+
+*/
+html[data-theme="light"] h2 {
+  color: #0b2471;
+}
+html[data-theme="light"] h2 a.hash-link {
+  color: #0b2471;
+}
+
+html[data-theme="dark"] h2 {
+  color: #A8C5E6;
+}
+html[data-theme="dark"] h2 a.hash-link {
+  color: #A8C5E6;
+}
+
 /*
 
 */
 .markdown h3 {
   font-size: 1.6rem;
 }
 html[data-theme="light"] h3 {
-  color: var(--ifm-color-primary-darker);
+  color: #a335d5;
 }
+html[data-theme="light"] h3 a.hash-link {
+  color: #a335d5;
+}
+
 html[data-theme="dark"] h3 {
-  color: var(--ifm-color-primary-lightest);
+  color: #DAA5BF;
+}
+html[data-theme="dark"] h3 a.hash-link {
+  color: #DAA5BF;
 }
 /*
 
 */
@@ -736,20 +769,19 @@ html[data-theme="dark"] h3 {
   margin-bottom: 8px;
   margin-top: 42px;
 }
+
 html[data-theme="light"] h4 {
-  color: #62b00d;
+  color: #242D37;
 }
+html[data-theme="light"] h4 a.hash-link {
+  color: #242D37;
+}
+
 html[data-theme="dark"] h4 {
-  color: #83de1f;
+  color: #c7cdd2;
 }
-
-
-/*
-*/
-.markdown hr {
-  border-bottom: none;
-}
-html[data-theme="dark"] .markdown hr {
-  border-color: rgba(255,255,255,0.1);
+html[data-theme="dark"] h4 a.hash-link {
+  color: #c7cdd2;
 }