mirror of https://github.com/clearml/clearml-docs
synced 2025-03-19 19:49:03 +00:00

Merge branch 'main' of https://github.com/allegroai/clearml-docs

This commit is contained in:
commit a3a1c84925
@@ -4,14 +4,14 @@ title: ClearML Server

 ## What is ClearML Server?

 The ClearML Server is the backend service infrastructure for ClearML. It allows multiple users to collaborate and
-manage their experiments by working seamlessly with the ClearML Python package and [ClearML Agent](../clearml_agent.md).
+manage their tasks by working seamlessly with the ClearML Python package and [ClearML Agent](../clearml_agent.md).

 ClearML Server is composed of the following:
-* Web server including the [ClearML Web UI](../webapp/webapp_overview.md), which is the user interface for tracking, comparing, and managing experiments.
+* Web server including the [ClearML Web UI](../webapp/webapp_overview.md), which is the user interface for tracking, comparing, and managing tasks.
 * API server which is a RESTful API for:

-  * Documenting and logging experiments, including information, statistics, and results.
-  * Querying experiments history, logs, and results.
+  * Documenting and logging tasks, including information, statistics, and results.
+  * Querying task history, logs, and results.

 * File server which stores media and models making them easily accessible using the ClearML Web UI.
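Since the API server exposes these operations as REST endpoints, the task-history query mentioned in the hunk above can be sketched against a self-hosted deployment. This is a hedged sketch: the endpoint name `tasks.get_all` matches the public API, but the URL, port (8008 is the default apiserver port), and the omitted authentication are assumptions about your deployment.

```python
import json
import urllib.request

def query_recent_tasks(api_server="http://localhost:8008"):
    """Sketch: query task history via the API server's RESTful interface.

    The server URL/port are assumptions for a default self-hosted deployment,
    and authentication headers are omitted for brevity.
    """
    payload = json.dumps({"page_size": 10, "order_by": ["-last_update"]}).encode()
    request = urllib.request.Request(
        f"{api_server}/tasks.get_all",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:  # needs a running server
        return json.load(response)

# Defined but not invoked here, since it requires a reachable ClearML Server.
```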
@@ -23,9 +23,9 @@ The ClearML Web UI is the ClearML user interface and is part of ClearML Server.

 Use the ClearML Web UI to:

-* Track experiments
-* Compare experiments
-* Manage experiments
+* Track tasks
+* Compare tasks
+* Manage tasks

 For detailed information about the ClearML Web UI, see [User Interface](../webapp/webapp_overview.md).
@@ -12,7 +12,7 @@ This page describes the ClearML Server [deployment](#clearml-server-deployment-c
 * [Opening Elasticsearch, MongoDB, and Redis for External Access](#opening-elasticsearch-mongodb-and-redis-for-external-access)
 * [Web login authentication](#web-login-authentication) - Create and manage users and passwords
 * [Using hashed passwords](#using-hashed-passwords) - Option to use hashed passwords instead of plain-text passwords
-* [Non-responsive Task watchdog](#non-responsive-task-watchdog) - For inactive experiments
+* [Non-responsive Task watchdog](#non-responsive-task-watchdog) - For inactive tasks
 * [Custom UI context menu actions](#custom-ui-context-menu-actions)

 For all configuration options, see the [ClearML Configuration Reference](../configs/clearml_conf.md) page.
@@ -361,7 +361,7 @@ You can also use hashed passwords instead of plain-text passwords. To do that:

 ### Non-responsive Task Watchdog

-The non-responsive experiment watchdog monitors experiments that were not updated for a specified time interval, and then
+The non-responsive task watchdog monitors tasks that were not updated for a specified time interval, and then
 the watchdog marks them as `aborted`. The non-responsive experiment watchdog is always active.

 Modify the following settings for the watchdog:
@@ -464,8 +464,8 @@ an alternate folder you configured), and input the modified configuration
 :::

 The action will appear in the context menu for the object type in which it was specified:
-* Task, model, dataview - Right-click an object in the [experiments](../webapp/webapp_exp_table.md), [models](../webapp/webapp_model_table.md),
-  and [dataviews](../hyperdatasets/webapp/webapp_dataviews.md) tables respectively. Alternatively, click the object to
+* Task, model, dataview - Right-click an object in the [task](../webapp/webapp_exp_table.md), [model](../webapp/webapp_model_table.md),
+  and [dataview](../hyperdatasets/webapp/webapp_dataviews.md) tables respectively. Alternatively, click the object to
   open its info tab, then click the menu button <img src="/docs/latest/icons/ico-bars-menu.svg" className="icon size-md space-sm" />
   to access the context menu.
 * Project - In the project page > click the menu button <img src="/docs/latest/icons/ico-bars-menu.svg" className="icon size-md space-sm" />
@@ -3,8 +3,8 @@ title: ClearML Modules
 ---

 - [**ClearML Python Package**](../getting_started/ds/ds_first_steps.md#install-clearml) (`clearml`) for integrating ClearML into your existing code-base.
-- [**ClearML Server**](../deploying_clearml/clearml_server.md) (`clearml-server`) for storing experiment, model, and workflow data, and supporting the Web UI experiment manager. It is also the control plane for the MLOps.
-- [**ClearML Agent**](../clearml_agent.md) (`clearml-agent`), the MLOps orchestration agent. Enabling experiment and workflow reproducibility, and scalability.
+- [**ClearML Server**](../deploying_clearml/clearml_server.md) (`clearml-server`) for storing task, model, and workflow data, and supporting the Web UI experiment manager. It is also the control plane for the MLOps.
+- [**ClearML Agent**](../clearml_agent.md) (`clearml-agent`), the MLOps orchestration agent. Enabling task and workflow reproducibility, and scalability.
 - [**ClearML Data**](../clearml_data/clearml_data.md) (`clearml-data`) data management and versioning on top of file-systems/object-storage.
 - [**ClearML Serving**](../clearml_serving/clearml_serving.md) (`clearml-serving`) for model deployment and orchestration.
 - [**ClearML Session**](../apps/clearml_session.md) (`clearml-session`) for launching remote instances of Jupyter Notebooks and VSCode.
@@ -44,7 +44,7 @@ pip install clearml
 CLEARML_CONFIG_FILE = MyOtherClearML.conf
 ```
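The `CLEARML_CONFIG_FILE` variable shown above can also be set from Python before the SDK is imported; a minimal sketch using the same placeholder file name:

```python
import os

# Point clearml at an alternate configuration file; the SDK reads
# CLEARML_CONFIG_FILE when it initializes, so set it before `import clearml`.
os.environ["CLEARML_CONFIG_FILE"] = "MyOtherClearML.conf"

print(os.environ["CLEARML_CONFIG_FILE"])  # → MyOtherClearML.conf
```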

-For more information about running experiments inside Docker containers, see [ClearML Agent Deployment](../../clearml_agent/clearml_agent_deployment.md)
+For more information about running tasks inside Docker containers, see [ClearML Agent Deployment](../../clearml_agent/clearml_agent_deployment.md)
 and [ClearML Agent Reference](../../clearml_agent/clearml_agent_ref.md).

 </Collapsible>
@@ -2,14 +2,14 @@
 title: Next Steps
 ---

-So, you've already [installed ClearML's Python package](ds_first_steps.md) and run your first experiment!
+So, you've already [installed ClearML's Python package](ds_first_steps.md) and run your first task!

 Now, you'll learn how to track Hyperparameters, Artifacts, and Metrics!

-## Accessing Experiments
+## Accessing Tasks

 Every previously executed experiment is stored as a Task.
-A Task's project and name can be changed after the experiment has been executed.
+A Task's project and name can be changed after it has been executed.
 A Task is also automatically assigned an auto-generated unique identifier (UUID string) that cannot be changed and always locates the same Task in the system.

 Retrieve a Task object programmatically by querying the system based on either the Task ID,
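The two lookup styles described above can be sketched as follows. Project and task names are placeholders, and the call is wrapped in a function because `Task.get_task()` needs a configured client and a reachable server:

```python
def fetch_task_example():
    """Sketch: retrieve a previously executed Task (names are placeholders)."""
    from clearml import Task  # pip install clearml

    # By the immutable, auto-generated unique ID:
    task = Task.get_task(task_id="your-task-id")

    # Or by project and task name:
    task = Task.get_task(project_name="examples", task_name="my training task")
    return task
```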
@@ -23,8 +23,8 @@ Once you have a Task object you can query the state of the Task, get its model(s

 ## Log Hyperparameters

-For full reproducibility, it's paramount to save hyperparameters for each experiment. Since hyperparameters can have substantial impact
-on model performance, saving and comparing these between experiments is sometimes the key to understanding model behavior.
+For full reproducibility, it's paramount to save each task's hyperparameters. Since hyperparameters can have substantial impact
+on model performance, saving and comparing them between tasks is sometimes the key to understanding model behavior.

 ClearML supports logging `argparse` module arguments out of the box, so once ClearML is integrated into the code, it automatically logs all parameters provided to the argument parser.
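A minimal sketch of that automatic capture (project/task names are placeholders): calling `Task.init()` before the parser runs is all the integration needed, so it is shown commented out to keep the snippet runnable without a server:

```python
import argparse

# from clearml import Task
# task = Task.init(project_name="examples", task_name="argparse capture")

parser = argparse.ArgumentParser(description="training configuration")
parser.add_argument("--lr", type=float, default=0.01, help="learning rate")
parser.add_argument("--epochs", type=int, default=10, help="training epochs")
args = parser.parse_args([])  # empty argv so the sketch runs non-interactively

# With Task.init() active, --lr and --epochs would be logged automatically.
print(args.lr, args.epochs)  # → 0.01 10
```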
@@ -40,7 +40,7 @@ See [Configuration](../../clearml_sdk/task_sdk.md#configuration) for all hyperpa

 ## Log Artifacts

-ClearML lets you easily store the output products of an experiment - Model snapshot / weights file, a preprocessing of your data, feature representation of data and more!
+ClearML lets you easily store the output products of a task: Model snapshot / weights file, a preprocessing of your data, feature representation of data and more!

 Essentially, artifacts are files (or Python objects) uploaded from a script and are stored alongside the Task.
 These artifacts can be easily accessed by the web UI or programmatically.
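Storing and retrieving an artifact can be sketched like this (names are placeholders; wrapped in a function because `Task.init()` needs a reachable server):

```python
def artifact_roundtrip_example():
    """Sketch: store a Python object as an artifact, then fetch it back."""
    from clearml import Task

    task = Task.init(project_name="examples", task_name="artifact demo")
    # Upload a Python object; file and folder paths can be passed the same way.
    task.upload_artifact(name="eval stats", artifact_object={"accuracy": 0.91})

    # Programmatic access, e.g. from a different script:
    fetched = Task.get_task(task_id=task.id).artifacts["eval stats"].get()
    return fetched
```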
@@ -107,9 +107,9 @@ task = Task.init(
 )
 ```

-Now, whenever the framework (TensorFlow/Keras/PyTorch etc.) stores a snapshot, the model file is automatically uploaded to the bucket to a specific folder for the experiment.
+Now, whenever the framework (TensorFlow/Keras/PyTorch etc.) stores a snapshot, the model file is automatically uploaded to the bucket to a specific folder for the task.
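That behavior is driven by the `output_uri` argument of `Task.init()` (shown truncated in the diff above); a hedged sketch with a placeholder bucket path:

```python
def init_with_remote_snapshots():
    """Sketch: send framework model snapshots to object storage automatically."""
    from clearml import Task

    return Task.init(
        project_name="examples",
        task_name="training with remote snapshots",
        # Checkpoints saved by TensorFlow/Keras/PyTorch etc. during the run are
        # uploaded under this destination (gs:// and azure:// URIs also work).
        output_uri="s3://my-bucket/models",
    )
```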

-Loading models by a framework is also logged by the system; these models appear in an experiment's **Artifacts** tab,
+Loading models by a framework is also logged by the system; these models appear in a task's **Artifacts** tab,
 under the "Input Models" section.

 Check out model snapshots examples for [TensorFlow](https://github.com/clearml/clearml/blob/master/examples/frameworks/tensorflow/tensorflow_mnist.py),
@@ -149,7 +149,7 @@ You can log everything, from time series data and confusion matrices to HTML, Au
 Once everything is neatly logged and displayed, use the [comparison tool](../../webapp/webapp_exp_comparing.md) to find the best configuration!


-## Track Experiments
+## Track Tasks

 The task table is a powerful tool for creating dashboards and views of your own projects, your team's projects, or the entire development.

@@ -163,13 +163,13 @@ You can filter and sort based on parameters and metrics, so creating custom view

 Create a dashboard for a project, presenting the latest Models and their accuracy scores, for immediate insights.

-It can also be used as a live leaderboard, showing the best performing experiments' status, updated in real time.
+It can also be used as a live leaderboard, showing the best performing tasks' status, updated in real time.
 This is helpful to monitor your projects' progress, and to share it across the organization.

-Any page is sharable by copying the URL from the address bar, allowing you to bookmark leaderboards or to send an exact view of a specific experiment or a comparison page.
+Any page is sharable by copying the URL from the address bar, allowing you to bookmark leaderboards or to send an exact view of a specific task or a comparison page.

-You can also tag Tasks for visibility and filtering allowing you to add more information on the execution of the experiment.
-Later you can search based on task name in the search bar, and filter experiments based on their tags, parameters, status, and more.
+You can also tag Tasks for visibility and filtering allowing you to add more information on the execution of the task.
+Later you can search based on task name in the search bar, and filter tasks based on their tags, parameters, status, and more.
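Tagging and tag-based filtering can also be done from code; a hedged sketch (tag values and names are illustrative, and the calls need a reachable server):

```python
def tagging_example():
    """Sketch: tag a task, then filter tasks by tag programmatically."""
    from clearml import Task

    task = Task.init(project_name="examples", task_name="baseline run")
    task.add_tags(["baseline", "resnet50"])

    # Later: list tasks carrying a given tag, mirroring the UI's tag filter.
    return Task.get_tasks(project_name="examples", tags=["baseline"])
```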

 ## What's Next?

@@ -181,7 +181,7 @@ or check these pages out:
 - Scale your work and deploy [ClearML Agents](../../clearml_agent.md)
 - Develop on remote machines with [ClearML Session](../../apps/clearml_session.md)
 - Structure your work and put it into [Pipelines](../../pipelines/pipelines.md)
-- Improve your experiments with [Hyperparameter Optimization](../../fundamentals/hpo.md)
+- Improve your tasks with [Hyperparameter Optimization](../../fundamentals/hpo.md)
 - Check out ClearML's integrations with your favorite ML frameworks like [TensorFlow](../../integrations/tensorflow.md),
   [PyTorch](../../integrations/pytorch.md), [Keras](../../integrations/keras.md),
   and more
@@ -109,10 +109,10 @@ Want a more in depth introduction to ClearML? Choose where you want to get start

 - [Track and upload](../fundamentals/task.md) metrics and models with only 2 lines of code
 - [Reproduce](../webapp/webapp_exp_reproducing.md) tasks with 3 mouse clicks
-- [Create bots](../guides/services/slack_alerts.md) that send you Slack messages based on experiment behavior (for example,
+- [Create bots](../guides/services/slack_alerts.md) that send you Slack messages based on task behavior (for example,
   alert you whenever your model improves in accuracy)
 - Manage your [data](../clearml_data/clearml_data.md) - store, track, and version control
-- Remotely execute experiments on any compute resource you have available with [ClearML Agent](../clearml_agent.md)
+- Remotely execute tasks on any compute resource you have available with [ClearML Agent](../clearml_agent.md)
 - Automatically scale cloud instances according to your resource needs with ClearML's
   [AWS Autoscaler](../webapp/applications/apps_aws_autoscaler.md) and [GCP Autoscaler](../webapp/applications/apps_gcp_autoscaler.md)
   GUI applications
@@ -28,13 +28,13 @@ of the chosen metric over time.
   * Monitored Metric - Series - Metric series (variant) to track
   * Monitored Metric - Trend - Choose whether to track the monitored metric's highest or lowest values
 * **Slack Notification** (optional) - Set up Slack integration for notifications of task failure. Select the
-  `Alert on completed experiments` under `Additional options` to set up alerts for task completions.
+  `Alert on completed tasks` under `Additional options` to set up alerts for task completions.
   * API Token - Slack workspace access token
   * Channel Name - Slack channel to which task failure alerts will be posted
   * Alert Iteration Threshold - Minimum number of task iterations to trigger Slack alerts (tasks that fail prior to the threshold will be ignored)
 * **Additional options**
-  * Track manual (non agent-run) experiments as well - Select to include in the dashboard tasks that were not executed by an agent
-  * Alert on completed experiments - Select to include completed tasks in alerts: in the dashboard's Task Alerts section and in Slack Alerts.
+  * Track manual (non agent-run) tasks as well - Select to include in the dashboard tasks that were not executed by an agent
+  * Alert on completed tasks - Select to include completed tasks in alerts: in the dashboard's Task Alerts section and in Slack Alerts.
 * **Export Configuration** - Export the app instance configuration as a JSON file, which you can later import to create
   a new instance with the same configuration.
@@ -50,7 +50,7 @@ of the chosen metric over time.
 Once a project dashboard instance is launched, its dashboard displays the following information about a project:
 * Task Status Summary - Percentages of Tasks by status
 * Task Type Summary - Percentages of local tasks vs. agent tasks
-* Experiments Summary - Number of tasks by status over time
+* Task Summary - Number of tasks by status over time
 * Monitoring - GPU utilization and GPU memory usage
 * Metric Monitoring - An aggregated view of the values of a metric over time
 * Project's Active Workers - Number of workers currently executing tasks in the monitored project
@@ -56,18 +56,18 @@ limits.
 **CONFIGURATION > HYPERPARAMETERS > Hydra**).
 :::
 * **Optimization Job Title** (optional) - Name for the HPO instance. This will appear in the instance list
-* **Optimization Experiments Destination Project** (optional) - The project where optimization tasks will be saved.
+* **Optimization Tasks Destination Project** (optional) - The project where optimization tasks will be saved.
   Leave empty to use the same project as the Initial task.
 * **Maximum Concurrent Tasks** - The maximum number of simultaneously running optimization tasks
 * **Advanced Configuration** (optional)
-  * Limit Total HPO Experiments - Maximum total number of optimization tasks
-  * Number of Top Experiments to Save - Number of best performing tasks to save (the rest are archived)
-  * Limit Single Experiment Running Time (Minutes) - Time limit per optimization task. Tasks will be
+  * Limit Total HPO Tasks - Maximum total number of optimization tasks
+  * Number of Top Tasks to Save - Number of best performing tasks to save (the rest are archived)
+  * Limit Single Task Running Time (Minutes) - Time limit per optimization task. Tasks will be
     stopped after the specified time elapsed
-  * Minimal Number of Iterations Per Single Experiment - Some search methods, such as Optuna, prune underperforming
+  * Minimal Number of Iterations Per Single Task - Some search methods, such as Optuna, prune underperforming
     tasks. This is the minimum number of iterations per task before it can be stopped. Iterations are
     based on the tasks' own reporting (for example, if tasks report every epoch, then iterations=epochs)
-  * Maximum Number of Iterations Per Single Experiment - Maximum iterations per task after which it will be
+  * Maximum Number of Iterations Per Single Task - Maximum iterations per task after which it will be
     stopped. Iterations are based on the tasks' own reporting (for example, if tasks report every epoch,
     then iterations=epochs)
 * Limit Total Optimization Instance Time (Minutes) - Time limit for the whole optimization process (in minutes)
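The GUI fields above map closely onto arguments of the SDK's `HyperParameterOptimizer`; a hedged sketch of that correspondence (the task ID, parameter range, and objective metric names are placeholders):

```python
def hpo_sketch(base_task_id):
    """Sketch: programmatic counterpart of the HPO app settings above."""
    from clearml.automation import HyperParameterOptimizer, UniformParameterRange

    return HyperParameterOptimizer(
        base_task_id=base_task_id,            # the Initial task to clone
        hyper_parameters=[
            UniformParameterRange("General/lr", min_value=1e-4, max_value=1e-1),
        ],
        objective_metric_title="validation",  # objective metric (placeholder)
        objective_metric_series="accuracy",   # metric series (placeholder)
        objective_metric_sign="max",          # optimize for highest values
        max_number_of_concurrent_tasks=4,     # Maximum Concurrent Tasks
        total_max_jobs=50,                    # Limit Total HPO Tasks
        save_top_k_tasks_only=5,              # Number of Top Tasks to Save
    )
```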