This commit is contained in:
revital 2025-02-23 07:48:17 +02:00
commit a3a1c84925
8 changed files with 40 additions and 40 deletions

View File

@@ -4,14 +4,14 @@ title: ClearML Server
## What is ClearML Server?
The ClearML Server is the backend service infrastructure for ClearML. It allows multiple users to collaborate and
manage their experiments by working seamlessly with the ClearML Python package and [ClearML Agent](../clearml_agent.md).
manage their tasks by working seamlessly with the ClearML Python package and [ClearML Agent](../clearml_agent.md).
ClearML Server is composed of the following:
* Web server including the [ClearML Web UI](../webapp/webapp_overview.md), which is the user interface for tracking, comparing, and managing experiments.
* Web server including the [ClearML Web UI](../webapp/webapp_overview.md), which is the user interface for tracking, comparing, and managing tasks.
* API server which is a RESTful API for:
* Documenting and logging experiments, including information, statistics, and results.
* Querying experiments history, logs, and results.
* Documenting and logging tasks, including information, statistics, and results.
* Querying task history, logs, and results.
* File server which stores media and models, making them easily accessible using the ClearML Web UI.
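For instance, the ClearML Python package talks to the API server behind the scenes, so task history can be queried without issuing REST calls directly. A minimal sketch (the project name is a placeholder):

```python
from clearml import Task

# Query the API server (through the SDK) for completed tasks in a project.
# "Image Classification" is an illustrative project name.
tasks = Task.get_tasks(
    project_name="Image Classification",
    task_filter={"status": ["completed"]},
)
for t in tasks:
    print(t.id, t.name, t.get_status())
```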
@@ -23,9 +23,9 @@ The ClearML Web UI is the ClearML user interface and is part of ClearML Server.
Use the ClearML Web UI to:
* Track experiments
* Compare experiments
* Manage experiments
* Track tasks
* Compare tasks
* Manage tasks
For detailed information about the ClearML Web UI, see [User Interface](../webapp/webapp_overview.md).

View File

@@ -12,7 +12,7 @@ This page describes the ClearML Server [deployment](#clearml-server-deployment-c
* [Opening Elasticsearch, MongoDB, and Redis for External Access](#opening-elasticsearch-mongodb-and-redis-for-external-access)
* [Web login authentication](#web-login-authentication) - Create and manage users and passwords
* [Using hashed passwords](#using-hashed-passwords) - Option to use hashed passwords instead of plain-text passwords
* [Non-responsive Task watchdog](#non-responsive-task-watchdog) - For inactive experiments
* [Non-responsive Task watchdog](#non-responsive-task-watchdog) - For inactive tasks
* [Custom UI context menu actions](#custom-ui-context-menu-actions)
For all configuration options, see the [ClearML Configuration Reference](../configs/clearml_conf.md) page.
@@ -361,7 +361,7 @@ You can also use hashed passwords instead of plain-text passwords. To do that:
### Non-responsive Task Watchdog
The non-responsive experiment watchdog monitors experiments that were not updated for a specified time interval, and then
The non-responsive task watchdog monitors tasks that were not updated for a specified time interval, and then
the watchdog marks them as `aborted`. The non-responsive task watchdog is always active.
Modify the following settings for the watchdog:
@@ -464,8 +464,8 @@ an alternate folder you configured), and input the modified configuration
:::
The action will appear in the context menu for the object type in which it was specified:
* Task, model, dataview - Right-click an object in the [experiments](../webapp/webapp_exp_table.md), [models](../webapp/webapp_model_table.md),
and [dataviews](../hyperdatasets/webapp/webapp_dataviews.md) tables respectively. Alternatively, click the object to
* Task, model, dataview - Right-click an object in the [task](../webapp/webapp_exp_table.md), [model](../webapp/webapp_model_table.md),
and [dataview](../hyperdatasets/webapp/webapp_dataviews.md) tables respectively. Alternatively, click the object to
open its info tab, then click the menu button <img src="/docs/latest/icons/ico-bars-menu.svg" className="icon size-md space-sm" />
to access the context menu.
* Project - In the project page > click the menu button <img src="/docs/latest/icons/ico-bars-menu.svg" className="icon size-md space-sm" />

View File

@@ -3,8 +3,8 @@ title: ClearML Modules
---
- [**ClearML Python Package**](../getting_started/ds/ds_first_steps.md#install-clearml) (`clearml`) for integrating ClearML into your existing code-base.
- [**ClearML Server**](../deploying_clearml/clearml_server.md) (`clearml-server`) for storing experiment, model, and workflow data, and supporting the Web UI experiment manager. It is also the control plane for the MLOps.
- [**ClearML Agent**](../clearml_agent.md) (`clearml-agent`), the MLOps orchestration agent. Enabling experiment and workflow reproducibility, and scalability.
- [**ClearML Server**](../deploying_clearml/clearml_server.md) (`clearml-server`) for storing task, model, and workflow data, and supporting the Web UI experiment manager. It is also the control plane for MLOps.
- [**ClearML Agent**](../clearml_agent.md) (`clearml-agent`), the MLOps orchestration agent, enabling task and workflow reproducibility and scalability.
- [**ClearML Data**](../clearml_data/clearml_data.md) (`clearml-data`) for data management and versioning on top of file systems/object storage.
- [**ClearML Serving**](../clearml_serving/clearml_serving.md) (`clearml-serving`) for model deployment and orchestration.
- [**ClearML Session**](../apps/clearml_session.md) (`clearml-session`) for launching remote instances of Jupyter Notebooks and VSCode.
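As an illustration, integrating the `clearml` package into an existing code base typically starts with two lines (the project and task names below are placeholders):

```python
from clearml import Task

# Creates a task on the ClearML Server and starts automatic logging.
task = Task.init(project_name="examples", task_name="my first task")
```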

View File

@@ -44,7 +44,7 @@ pip install clearml
CLEARML_CONFIG_FILE = MyOtherClearML.conf
```
For more information about running experiments inside Docker containers, see [ClearML Agent Deployment](../../clearml_agent/clearml_agent_deployment.md)
For more information about running tasks inside Docker containers, see [ClearML Agent Deployment](../../clearml_agent/clearml_agent_deployment.md)
and [ClearML Agent Reference](../../clearml_agent/clearml_agent_ref.md).
</Collapsible>
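If you prefer to select the alternate configuration file from within the script itself, one possible approach is to set the environment variable before the `clearml` SDK loads its configuration. A sketch, assuming `MyOtherClearML.conf` sits next to the script:

```python
import os

# Must be set before ClearML reads its configuration (i.e., before Task.init).
os.environ["CLEARML_CONFIG_FILE"] = "MyOtherClearML.conf"

from clearml import Task

task = Task.init(project_name="examples", task_name="alternate config demo")
```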

View File

@@ -2,14 +2,14 @@
title: Next Steps
---
So, you've already [installed ClearML's Python package](ds_first_steps.md) and run your first experiment!
So, you've already [installed ClearML's Python package](ds_first_steps.md) and run your first task!
Now, you'll learn how to track Hyperparameters, Artifacts, and Metrics!
## Accessing Experiments
## Accessing Tasks
Every previously executed experiment is stored as a Task.
A Task's project and name can be changed after the experiment has been executed.
A Task's project and name can be changed after it has been executed.
A Task is also automatically assigned an auto-generated unique identifier (UUID string) that cannot be changed and always locates the same Task in the system.
Retrieve a Task object programmatically by querying the system based on either the Task ID,
@@ -23,8 +23,8 @@ Once you have a Task object you can query the state of the Task, get its model(s
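For example, a sketch of retrieving a task and inspecting it (the ID, project, and task names are placeholders):

```python
from clearml import Task

# Retrieve by unique Task ID (placeholder value).
task = Task.get_task(task_id="aabbcc112233")

# Or retrieve by project and task name.
task = Task.get_task(project_name="examples", task_name="my first task")

# With the Task object in hand, query its state and models.
print(task.get_status())
print(task.models)
```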
## Log Hyperparameters
For full reproducibility, it's paramount to save hyperparameters for each experiment. Since hyperparameters can have substantial impact
on model performance, saving and comparing these between experiments is sometimes the key to understanding model behavior.
For full reproducibility, it's paramount to save each task's hyperparameters. Since hyperparameters can have substantial impact
on model performance, saving and comparing them between tasks is sometimes the key to understanding model behavior.
ClearML supports logging `argparse` module arguments out of the box, so once ClearML is integrated into the code, it automatically logs all parameters provided to the argument parser.
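A minimal sketch of this automatic logging (the argument names are illustrative):

```python
import argparse

from clearml import Task

# After Task.init, arguments parsed by argparse are logged automatically.
task = Task.init(project_name="examples", task_name="hyperparameter logging")

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
parser.add_argument("--batch_size", type=int, default=32)
args = parser.parse_args()
```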
@@ -40,7 +40,7 @@ See [Configuration](../../clearml_sdk/task_sdk.md#configuration) for all hyperpa
## Log Artifacts
ClearML lets you easily store the output products of an experiment - Model snapshot / weights file, a preprocessing of your data, feature representation of data and more!
ClearML lets you easily store a task's output products: model snapshots / weights files, preprocessed data, feature representations, and more!
Essentially, artifacts are files (or Python objects) uploaded from a script and are stored alongside the Task.
These artifacts can be easily accessed by the web UI or programmatically.
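For example, a sketch of uploading and later retrieving an artifact (the names are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact demo")

# Upload a Python object; it is stored alongside the Task.
task.upload_artifact(name="eval_results", artifact_object={"accuracy": 0.93})

# Later, from any machine, access the artifact programmatically.
stored_task = Task.get_task(project_name="examples", task_name="artifact demo")
results = stored_task.artifacts["eval_results"].get()
```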
@@ -107,9 +107,9 @@ task = Task.init(
)
```
Now, whenever the framework (TensorFlow/Keras/PyTorch etc.) stores a snapshot, the model file is automatically uploaded to the bucket to a specific folder for the experiment.
Now, whenever the framework (TensorFlow/Keras/PyTorch etc.) stores a snapshot, the model file is automatically uploaded to a task-specific folder in the bucket.
Loading models by a framework is also logged by the system; these models appear in an experiment's **Artifacts** tab,
Loading models by a framework is also logged by the system; these models appear in a task's **Artifacts** tab,
under the "Input Models" section.
Check out model snapshots examples for [TensorFlow](https://github.com/clearml/clearml/blob/master/examples/frameworks/tensorflow/tensorflow_mnist.py),
@@ -149,7 +149,7 @@ You can log everything, from time series data and confusion matrices to HTML, Au
Once everything is neatly logged and displayed, use the [comparison tool](../../webapp/webapp_exp_comparing.md) to find the best configuration!
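For instance, explicit reporting goes through the task's logger; a brief sketch (the metric names are illustrative):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="manual reporting")
logger = task.get_logger()

# Report a scalar series per iteration; it appears in the task's Scalars tab.
for iteration in range(10):
    logger.report_scalar(title="loss", series="train", value=1.0 / (iteration + 1), iteration=iteration)
```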
## Track Experiments
## Track Tasks
The task table is a powerful tool for creating dashboards and views of your own projects, your team's projects, or your entire development effort.
@@ -163,13 +163,13 @@ You can filter and sort based on parameters and metrics, so creating custom view
Create a dashboard for a project, presenting the latest Models and their accuracy scores, for immediate insights.
It can also be used as a live leaderboard, showing the best performing experiments' status, updated in real time.
It can also be used as a live leaderboard, showing the best performing tasks' status, updated in real time.
This is helpful to monitor your projects' progress, and to share it across the organization.
Any page is sharable by copying the URL from the address bar, allowing you to bookmark leaderboards or to send an exact view of a specific experiment or a comparison page.
Any page is sharable by copying the URL from the address bar, allowing you to bookmark leaderboards or to send an exact view of a specific task or a comparison page.
You can also tag Tasks for visibility and filtering allowing you to add more information on the execution of the experiment.
Later you can search based on task name in the search bar, and filter experiments based on their tags, parameters, status, and more.
You can also tag Tasks for visibility and filtering, allowing you to add more information about a task's execution.
Later, you can search by task name in the search bar, and filter tasks by their tags, parameters, status, and more.
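A sketch of tagging and then filtering by tag programmatically (the tag values are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="tagging demo")

# Tags can later be used for filtering in the UI or in queries.
task.add_tags(["baseline", "resnet50"])

# Retrieve all tasks in the project carrying a given tag.
baseline_tasks = Task.get_tasks(project_name="examples", tags=["baseline"])
```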
## What's Next?
@@ -181,7 +181,7 @@ or check these pages out:
- Scale your work and deploy [ClearML Agents](../../clearml_agent.md)
- Develop on remote machines with [ClearML Session](../../apps/clearml_session.md)
- Structure your work and put it into [Pipelines](../../pipelines/pipelines.md)
- Improve your experiments with [Hyperparameter Optimization](../../fundamentals/hpo.md)
- Improve your tasks with [Hyperparameter Optimization](../../fundamentals/hpo.md)
- Check out ClearML's integrations with your favorite ML frameworks like [TensorFlow](../../integrations/tensorflow.md),
[PyTorch](../../integrations/pytorch.md), [Keras](../../integrations/keras.md),
and more

View File

@@ -109,10 +109,10 @@ Want a more in depth introduction to ClearML? Choose where you want to get start
- [Track and upload](../fundamentals/task.md) metrics and models with only 2 lines of code
- [Reproduce](../webapp/webapp_exp_reproducing.md) tasks with 3 mouse clicks
- [Create bots](../guides/services/slack_alerts.md) that send you Slack messages based on experiment behavior (for example,
- [Create bots](../guides/services/slack_alerts.md) that send you Slack messages based on task behavior (for example,
alert you whenever your model improves in accuracy)
- Manage your [data](../clearml_data/clearml_data.md) - store, track, and version control
- Remotely execute experiments on any compute resource you have available with [ClearML Agent](../clearml_agent.md)
- Remotely execute tasks on any compute resource you have available with [ClearML Agent](../clearml_agent.md)
- Automatically scale cloud instances according to your resource needs with ClearML's
[AWS Autoscaler](../webapp/applications/apps_aws_autoscaler.md) and [GCP Autoscaler](../webapp/applications/apps_gcp_autoscaler.md)
GUI applications

View File

@@ -28,13 +28,13 @@ of the chosen metric over time.
* Monitored Metric - Series - Metric series (variant) to track
* Monitored Metric - Trend - Choose whether to track the monitored metric's highest or lowest values
* **Slack Notification** (optional) - Set up Slack integration for notifications of task failure. Select the
`Alert on completed experiments` under `Additional options` to set up alerts for task completions.
`Alert on completed tasks` option under `Additional options` to set up alerts for task completions.
* API Token - Slack workspace access token
* Channel Name - Slack channel to which task failure alerts will be posted
* Alert Iteration Threshold - Minimum number of task iterations to trigger Slack alerts (tasks that fail prior to the threshold will be ignored)
* **Additional options**
* Track manual (non agent-run) experiments as well - Select to include in the dashboard tasks that were not executed by an agent
* Alert on completed experiments - Select to include completed tasks in alerts: in the dashboard's Task Alerts section and in Slack Alerts.
* Track manual (non agent-run) tasks as well - Select to have the dashboard also include tasks that were not executed by an agent
* Alert on completed tasks - Select to include completed tasks in alerts: in the dashboard's Task Alerts section and in Slack Alerts.
* **Export Configuration** - Export the app instance configuration as a JSON file, which you can later import to create
a new instance with the same configuration.
@@ -50,7 +50,7 @@ of the chosen metric over time.
Once a project dashboard instance is launched, its dashboard displays the following information about a project:
* Task Status Summary - Percentages of Tasks by status
* Task Type Summary - Percentages of local tasks vs. agent tasks
* Experiments Summary - Number of tasks by status over time
* Task Summary - Number of tasks by status over time
* Monitoring - GPU utilization and GPU memory usage
* Metric Monitoring - An aggregated view of the values of a metric over time
* Project's Active Workers - Number of workers currently executing tasks in the monitored project

View File

@@ -56,18 +56,18 @@ limits.
**CONFIGURATION > HYPERPARAMETERS > Hydra**).
:::
* **Optimization Job Title** (optional) - Name for the HPO instance. This will appear in the instance list
* **Optimization Experiments Destination Project** (optional) - The project where optimization tasks will be saved.
* **Optimization Tasks Destination Project** (optional) - The project where optimization tasks will be saved.
Leave empty to use the same project as the Initial task.
* **Maximum Concurrent Tasks** - The maximum number of simultaneously running optimization tasks
* **Advanced Configuration** (optional)
* Limit Total HPO Experiments - Maximum total number of optimization tasks
* Number of Top Experiments to Save - Number of best performing tasks to save (the rest are archived)
* Limit Single Experiment Running Time (Minutes) - Time limit per optimization task. Tasks will be
* Limit Total HPO Tasks - Maximum total number of optimization tasks
* Number of Top Tasks to Save - Number of best performing tasks to save (the rest are archived)
* Limit Single Task Running Time (Minutes) - Time limit per optimization task. Tasks will be
stopped after the specified time has elapsed
* Minimal Number of Iterations Per Single Experiment - Some search methods, such as Optuna, prune underperforming
* Minimal Number of Iterations Per Single Task - Some search methods, such as Optuna, prune underperforming
tasks. This is the minimum number of iterations per task before it can be stopped. Iterations are
based on the tasks' own reporting (for example, if tasks report every epoch, then iterations=epochs)
* Maximum Number of Iterations Per Single Experiment - Maximum iterations per task after which it will be
* Maximum Number of Iterations Per Single Task - Maximum iterations per task after which it will be
stopped. Iterations are based on the tasks' own reporting (for example, if tasks report every epoch,
then iterations=epochs)
* Limit Total Optimization Instance Time (Minutes) - Time limit for the whole optimization process (in minutes)
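These GUI fields roughly correspond to parameters of the `clearml` SDK's `HyperParameterOptimizer`; a sketch under that assumption (the queue name, task ID, metric names, and parameter range are placeholders):

```python
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformIntegerParameterRange
from clearml.automation.optuna import OptimizerOptuna

# Controller task that drives the optimization (names are placeholders).
task = Task.init(
    project_name="examples",
    task_name="HPO controller",
    task_type=Task.TaskTypes.optimizer,
)

optimizer = HyperParameterOptimizer(
    base_task_id="aabbcc112233",       # the Initial Task to optimize (placeholder ID)
    hyper_parameters=[
        UniformIntegerParameterRange("General/batch_size", min_value=16, max_value=128, step_size=16),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=OptimizerOptuna,
    execution_queue="default",
    max_number_of_concurrent_tasks=2,  # Maximum Concurrent Tasks
    total_max_jobs=20,                 # Limit Total HPO Tasks
    save_top_k_tasks_only=5,           # Number of Top Tasks to Save
    time_limit_per_job=30.0,           # Limit Single Task Running Time (Minutes)
    min_iteration_per_job=10,          # Minimal Number of Iterations Per Single Task
    max_iteration_per_job=100,         # Maximum Number of Iterations Per Single Task
)

optimizer.set_time_limit(in_minutes=240.0)  # Limit Total Optimization Instance Time (Minutes)
optimizer.start()
optimizer.wait()
optimizer.stop()
```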