mirror of https://github.com/clearml/clearml-docs
synced 2025-04-15 05:04:31 +00:00

Small edits (#162)

parent 8f4851c5c1
commit e72ca23b54
@@ -13,8 +13,8 @@ in the UI and send it for long-term training on a remote machine.
 
 ## What Does ClearML Session Do?
 `clearml-session` is a feature that allows to launch a session of JupyterLab and VS Code, and to execute code on a remote
-machine that better meets resource needs. With this feature, local links are provided, which can be used to access
-JupyterLab and VS Code on a remote machine over a secure and encrypted SSH connection. By default, the JupyterLab and
+machine that better meets resource needs. This feature provides local links to access JupyterLab and VS Code on a
+remote machine over a secure and encrypted SSH connection. By default, the JupyterLab and
 VS Code remote sessions use ports 8878 and 8898 respectively.
 
 <details className="cml-expansion-panel screenshot">
@@ -40,7 +40,7 @@ VS Code remote sessions use ports 8878 and 8898 respectively.
 ## How it Works
 
 ClearML allows to leverage a resource (e.g. GPU or CPU machine) by utilizing the [ClearML Agent](../clearml_agent).
-A ClearML Agent will run on a target machine, and ClearML Session will instruct it to execute the Jupyter / VS Code
+A ClearML Agent runs on a target machine, and ClearML Session instructs it to execute the Jupyter / VS Code
 server to develop remotely.
 After entering a `clearml-session` command with all specifications:
 
@@ -51,8 +51,8 @@ After entering a `clearml-session` command with all specifications:
 launches it.
 
 1. Once the agent finishes the initial setup of the interactive Task, the local `clearml-session` connects to the host
-machine via SSH, and tunnels both SSH and JupyterLab over the SSH connection. If a specific Docker was specified, the
-JupyterLab environment will run inside the Docker.
+machine via SSH, and tunnels both SSH and JupyterLab over the SSH connection. If a Docker is specified, the
+JupyterLab environment runs inside the Docker.
 
 1. The CLI outputs access links to the remote JupyterLab and VS Code sessions:
 
@@ -73,14 +73,15 @@ To run a session inside a Docker container, use the `--docker` flag and enter th
 session.
 
 ### Installing Requirements
-`clearml-session` can install required Python packages when setting up the remote environment. A `requirement.txt` file
-can be attached to the command using `--requirements </file/location.txt>`.
-Alternatively, packages can be manually specified, using `--packages "<package_name>"`
+`clearml-session` can install required Python packages when setting up the remote environment.
+Specify requirements in one of the following ways:
+* Attach a `requirement.txt` file to the command using `--requirements </file/location.txt>`.
+* Manually specify packages using `--packages "<package_name>"`
 (for example `--packages "keras" "clearml"`), and they'll be automatically installed.
 
 ### Accessing a Git Repository
-To access a git repository remotely, add a `--git-credentials` flag and set it to `true`, so the local .git-credentials
-file will be sent to the interactive session. This is helpful if working on private git repositories, and it allows for seamless
+To access a git repository remotely, add a `--git-credentials` flag and set it to `true`, so the local `.git-credentials`
+file is sent to the interactive session. This is helpful if working on private git repositories, and it allows for seamless
 cloning and tracking of git references, including untracked changes.
 
 ### Re-launching and Shutting Down Sessions
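For reference, the options described in this section can be combined into a single invocation. The following is only a sketch: the Docker image is the one used elsewhere in these docs, and the package names and queue are illustrative:

```bash
clearml-session \
  --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 \
  --packages "clearml" "keras" \
  --git-credentials true \
  --queue default
```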
@@ -101,11 +102,11 @@ Active sessions:
 Connect to session [0-1] or 'N' to skip
 ```
 
-To shut down a remote session, which will free the `clearml-agent` and close the CLI, enter "Shutdown". If a session
+To shut down a remote session, which frees the `clearml-agent` and closes the CLI, enter "Shutdown". If a session
 is shutdown, there is no option to reconnect to it.
 
 ### Connecting to an Existing Session
-If a `clearml-session` is running remotely, it's possible to continue working on the session from any machine.
+If a `clearml-session` is running remotely, you can continue working on the session from any machine.
 When `clearml-session` is launched, it initializes a task with a unique ID in the ClearML Server.
 
 To connect to an existing session:
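In practice, reconnecting amounts to running the CLI again from any machine with ClearML configured and choosing the session at the prompt shown above; a sketch (the session list and indices will differ):

```bash
# Re-run the command; it detects the already-running interactive session
clearml-session
# Active sessions:
# ...
# Connect to session [0-1] or 'N' to skip
```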
@@ -116,7 +117,7 @@ To connect to an existing session:
 
 
 ### Starting a Debugging Session
-Previously executed experiments in the ClearML system can be debugged on a remote interactive session.
+You can debug previously executed experiments registered in the ClearML system on a remote interactive session.
 Input into `clearml-session` the ID of a Task to debug, then `clearml-session` clones the experiment's git repository and
 replicates the environment on a remote machine. Then the code can be interactively executed and debugged on JupyterLab / VS Code.
 
@@ -6,7 +6,7 @@ ClearML Task is ClearML's Zero Code Integration Module. Using only the command l
 you can easily track your work and integrate ClearML with your existing code.
 
 `clearml-task` automatically integrates ClearML into any script or **any** python repository. `clearml-task` has the option
-to send the task to a queue, where a **ClearML Agent** listening to the queue will fetch the task and execute it on a
+to send the task to a queue, where a ClearML Agent assigned to the queue fetches the task and executes it on a
 remote or local machine. It's even possible to provide command line arguments and provide Python module dependencies and requirements.txt file!
 
 ## How Does ClearML Task Work?
@@ -14,8 +14,8 @@ remote or local machine. It's even possible to provide command line arguments an
 1. Execute `clearml-task`, pointing it to your script or repository, and optionally an execution queue.
 1. `clearml-task` does its magic! It creates a new experiment on the [ClearML Server](../deploying_clearml/clearml_server.md),
 and, if a queue was specified, it sends the experiment to the queue to be fetched and executed by a **ClearML Agent**.
-1. The command line will provide you with a link to your task's page in the ClearML web UI,
-where you will be able to view the task's details.
+1. The command line provides you with a link to your task's page in the ClearML web UI,
+where you can view the task's details.
 
 ## Features and Options
 ### Docker
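To make the flow above concrete, here is a sketch of a `clearml-task` invocation; the project, task name, repository, script path, and queue are placeholders rather than values taken from these docs:

```bash
clearml-task \
  --project examples \
  --name "remote execution demo" \
  --repo https://github.com/allegroai/clearml.git \
  --script examples/reporting/scalar_reporting.py \
  --queue default
```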
@@ -41,7 +41,7 @@ and [configuration options](configs/clearml_conf.md#agent-section).
 ## Installation
 
 :::note
-If **ClearML** was previously configured, follow [this](#adding-clearml-agent-to-a-configuration-file) to add
+If ClearML was previously configured, follow [this](#adding-clearml-agent-to-a-configuration-file) to add
 ClearML Agent specific configurations
 :::
 
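A minimal sketch of the installation flow this section describes, assuming `pip` is available:

```bash
# Install the agent as a system Python package
pip install clearml-agent

# Create or extend clearml.conf interactively (credentials, server URLs)
clearml-agent init
```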
@@ -78,7 +78,7 @@ Install ClearML Agent as a system Python package and not in a Python virtual env
 
 Detected credentials key="********************" secret="*******"
 
-1. **Enter** to accept default server URL, which is detected from the credentials or Enter a ClearML web server URL.
+1. **Enter** to accept default server URL, which is detected from the credentials or enter a ClearML web server URL.
 
 A secure protocol, https, must be used. **Do not use http.**
 
@@ -531,7 +531,7 @@ clearml-agent daemon --dynamic-gpus --queue dual_gpus=2 single_gpu=1
 
 ### Example
 
-Let's say there are three queues on a server, named:
+Let's say a server has three queues:
 * `dual_gpu`
 * `quad_gpu`
 * `opportunistic`
@@ -553,7 +553,7 @@ Another option for allocating GPUs:
 clearml-agent daemon --dynamic-gpus --gpus 0-7 --queue dual=2 opportunistic=1-4
 ```
 
-Notice that a minimum and maximum value of GPUs was specified for the `opportunistic` queue. This means the agent
+Notice that a minimum and maximum value of GPUs is specified for the `opportunistic` queue. This means the agent
 will pull a Task from the `opportunistic` queue and allocate up to 4 GPUs based on availability (i.e. GPUs not currently
 being used by other agents).
 
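Following the same syntax, a single agent could feed the three example queues listed earlier; the GPU counts per queue below are illustrative:

```bash
clearml-agent daemon --dynamic-gpus --gpus 0-7 \
  --queue quad_gpu=4 dual_gpu=2 opportunistic=1-4
```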
@@ -100,7 +100,7 @@ Total 5 files, 248771 bytes
 
 ## Creating a Child Dataset
 
-In Clear Data, it's possible to create datasets that inherit the content of other datasets, there are called child datasets.
+Using ClearML Data, you can create child datasets that inherit the content of other datasets.
 
 1. Create a new dataset, specifying the previously created one as its parent:
 
@@ -111,8 +111,8 @@ In Clear Data, it's possible to create datasets that inherit the content of othe
 You'll need to input the Dataset ID you received when created the dataset above
 :::
 
-1. Now, we want to add a new file.
-* Create a new file: `echo "data data data" > new_data.txt` (this will create the file `new_data.txt`),
+1. Add a new file.
+* Create a new file: `echo "data data data" > new_data.txt`
 * Now add the file to the dataset:
 
 ```bash
@@ -126,7 +126,7 @@ You'll need to input the Dataset ID you received when created the dataset above
 1 file added
 ```
 
-1. Let's also remove a file. We'll need to specify the file's full path (within the dataset, not locally) to remove it.
+1. Remove a file. We'll need to specify the file's full path (within the dataset, not locally) to remove it.
 
 ```bash
 clearml-data remove --files data_samples/dancing.jpg
@@ -145,7 +145,7 @@ You'll need to input the Dataset ID you received when created the dataset above
 clearml-data close
 ```
 
-1. Let's take a look again at the files in the dataset:
+1. Look again at the files in the dataset:
 
 ```
 clearml-data list --id 8b68686a4af040d081027ba3cf6bbca6
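Putting the steps of this section together, the child-dataset flow looks roughly like this; the project and dataset names are placeholders, and `<parent_dataset_id>` stands for the ID printed when the parent dataset was created:

```bash
# Create a child dataset that inherits the parent's content
clearml-data create --project "data demo" --name "dataset v2" --parents <parent_dataset_id>

# Modify the new version, then finalize it
echo "data data data" > new_data.txt
clearml-data add --files new_data.txt
clearml-data remove --files data_samples/dancing.jpg
clearml-data close
```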
@@ -1,18 +1,18 @@
 ---
 title: Configuration File
 ---
-This reference page provides detailed information about the configurable options for **ClearML** and **ClearML Agent**.
-**ClearML** and **ClearML Agent** use the same configuration file `clearml.conf`.
+This reference page provides detailed information about the configurable options for ClearML and ClearML Agent.
+ClearML and ClearML Agent use the same configuration file `clearml.conf`.
 
 This reference page is organized by configuration file section:
 
-* [agent](#agent-section) - Contains **ClearML Agent** configuration options. If **ClearML Agent** was not installed, the configuration
+* [agent](#agent-section) - Contains ClearML Agent configuration options. If ClearML Agent was not installed, the configuration
 file will not have an `agent` section.
-* [api](#api-section) - Contains **ClearML** and **ClearML Agent** configuration options for **ClearML Server**.
-* [sdk](#sdk-section) - Contains **ClearML** and **ClearML Agent** configuration options for **ClearML Python Package** and **ClearML Server**.
+* [api](#api-section) - Contains ClearML and ClearML Agent configuration options for ClearML Server.
+* [sdk](#sdk-section) - Contains ClearML and ClearML Agent configuration options for ClearML Python Package and ClearML Server.
 
 An example configuration file is located [here](https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf),
-in the **ClearML Agent** GitHub repository.
+in the ClearML Agent GitHub repository.
 
 :::info
 The values in the ClearML configuration file can be overridden by environment variables, the [configuration vault](../webapp/webapp_profile.md#configuration-vault),
@@ -23,7 +23,7 @@ and command-line arguments.
 
 To add, change, or delete options, edit your configuration file.
 
-**To edit your **ClearML** configuration file:**
+**To edit your ClearML configuration file:**
 
 1. Open the configuration file for editing, depending upon your operating system:
 
@@ -60,7 +60,7 @@ for information about using environment variables with Windows in the configurat
 
 **`agent`** (*dict*)
 
-* Dictionary of top-level **ClearML Agent** options to configure **ClearML Agent** for Git credentials, package managers, cache management, workers, and Docker for workers.
+* Dictionary of top-level ClearML Agent options to configure ClearML Agent for Git credentials, package managers, cache management, workers, and Docker for workers.
 ---
 
 **`agent.cuda_version`** (*float*)
@@ -538,25 +538,25 @@ Torch Nightly builds are ephemeral and are deleted from time to time.
 
 **`api`** (*dict*)
 
-Dictionary of configuration options for the **ClearML Server** API, web, and file servers and credentials.
+Dictionary of configuration options for the ClearML Server API, web, and file servers and credentials.
 
 ---
 
 **`api.api_server`** (*string*)
 
-* The URL of your **ClearML** API server. For example, `https://api.MyDomain.com`.
+* The URL of your ClearML API server. For example, `https://api.MyDomain.com`.
 
 ---
 
 **`api.web_server`** (*string*)
 
-* The URL of your **ClearML** web server. For example, `https://app.MyDomain.com`.
+* The URL of your ClearML web server. For example, `https://app.MyDomain.com`.
 
 ---
 
 **`api.files_server`** (*string*)
 
-* The URL of your **ClearML** file server. For example, `https://files.MyDomain.com`.
+* The URL of your ClearML file server. For example, `https://files.MyDomain.com`.
 
 :::warning
 You must use a secure protocol. For ``api.web_server``, ``api.files_server``, and ``api.files_server``. You must use a secure protocol, "https". Do not use "http".
@@ -576,13 +576,13 @@ You must use a secure protocol. For ``api.web_server``, ``api.files_server``, an
 
 **`api.credentials.access_key`** (*string*)
 
-* Your **ClearML** access key.
+* Your ClearML access key.
 
 ---
 
 **`api.credentials.secret_key`** (*string*)
 
-* Your **ClearML** credentials.
+* Your ClearML credentials.
 
 ---
 
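As noted earlier in this reference, configuration values can be overridden at runtime through environment variables. The sketch below assumes the standard ClearML variable names; the URLs and keys are placeholders:

```bash
export CLEARML_API_HOST=https://api.MyDomain.com
export CLEARML_WEB_HOST=https://app.MyDomain.com
export CLEARML_FILES_HOST=https://files.MyDomain.com
export CLEARML_API_ACCESS_KEY=<access_key>
export CLEARML_API_SECRET_KEY=<secret_key>
```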
@@ -607,7 +607,7 @@ Set to False only if required.
 
 **`sdk`** (*dict*)
 
-* Dictionary that contains configuration options for the **ClearML Python Package** and related options, including storage,
+* Dictionary that contains configuration options for the ClearML Python Package and related options, including storage,
 metrics, network, AWS S3 buckets and credentials, Google Cloud Storage, Azure Storage, log, and development.
 
 <br/>
@@ -852,7 +852,7 @@ and limitations on bucket naming.
 
 **`sdk.development.worker.report_period_sec`** (*integer*)
 
-* For development mode workers, the interval in seconds for a development mode **ClearML** worker to report.
+* For development mode workers, the interval in seconds for a development mode ClearML worker to report.
 
 <br/>
 
@@ -123,7 +123,7 @@ instructions in the [Security](clearml_server_security.md) page.
 
 sudo curl https://raw.githubusercontent.com/allegroai/clearml-server/master/docker/docker-compose.yml -o /opt/clearml/docker-compose.yml
 
-1. For Linux only, configure the **ClearML Agent Services**. If `CLEARML_HOST_IP` is not provided, then **ClearML Agent Services** will use the external public address of the **ClearML Server**. If `CLEARML_AGENT_GIT_USER` / `CLEARML_AGENT_GIT_PASS` are not provided, then **ClearML Agent Services** will not be able to access any private repositories for running service tasks.
+1. For Linux only, configure the **ClearML Agent Services**. If `CLEARML_HOST_IP` is not provided, then **ClearML Agent Services** uses the external public address of the **ClearML Server**. If `CLEARML_AGENT_GIT_USER` / `CLEARML_AGENT_GIT_PASS` are not provided, then **ClearML Agent Services** can't access any private repositories for running service tasks.
 
 export CLEARML_HOST_IP=server_host_ip_here
 export CLEARML_AGENT_GIT_USER=git_username_here
@@ -15,8 +15,8 @@ This documentation page applies to deploying your own open source ClearML Server
 
 For Linux only, if upgrading from <strong>Trains Server</strong> v0.14 or older, configure the <strong>ClearML Agent Services</strong>.
 
-* If ``CLEARML_HOST_IP`` is not provided, then **ClearML Agent Services** will use the external public address of the **ClearML Server**.
-* If ``CLEARML_AGENT_GIT_USER`` / ``CLEARML_AGENT_GIT_PASS`` are not provided, then **ClearML Agent Services** will not be able to access any private repositories for running service tasks.
+* If ``CLEARML_HOST_IP`` is not provided, then **ClearML Agent Services** uses the external public address of the **ClearML Server**.
+* If ``CLEARML_AGENT_GIT_USER`` / ``CLEARML_AGENT_GIT_PASS`` are not provided, then **ClearML Agent Services** can't access any private repositories for running service tasks.
 
 
 export CLEARML_HOST_IP=server_host_ip_here
docs/faq.md
@@ -228,7 +228,7 @@ To replace the URL of each model, execute the following commands:
 sudo docker exec -it clearml-mongo /bin/bash
 ```
 
-1. Inside the docker shell, create the following script. Make sure to replace `<old-bucket-name>` and `<new-bucket-name>`,
-as well as the URL protocol if you aren't using `s3`.
+1. Create the following script inside the Docker shell:
 ```bash
 cat <<EOT >> script.js
@@ -237,7 +237,7 @@ To replace the URL of each model, execute the following commands:
 db.model.save(e);});
 EOT
 ```
 
+Make sure to replace `<old-bucket-name>` and `<new-bucket-name>`.
 1. Run the script against the backend DB:
 
 ```bash
@@ -258,7 +258,7 @@ To fix this, the registered URL of each model needs to be replaced with its curr
 sudo docker exec -it clearml-mongo /bin/bash
 ```
 
-1. Inside the Docker shell, create the following script. Make sure to replace `<old-bucket-name>` and `<new-bucket-name>`, as well as the URL protocol prefixes if you aren't using S3.
+1. Create the following script inside the Docker shell.
 ```bash
 cat <<EOT >> script.js
 db.model.find({uri:{$regex:/^s3/}}).forEach(function(e,i) {
@@ -266,7 +266,7 @@ To fix this, the registered URL of each model needs to be replaced with its curr
 db.model.save(e);});
 EOT
 ```
 
+Make sure to replace `<old-bucket-name>` and `<new-bucket-name>`, as well as the URL protocol prefixes if you aren't using S3.
 1. Run the script against the backend DB:
 
 ```bash
@@ -930,7 +930,7 @@ If a port conflict occurs, change the MongoDB and / or Elastic ports in the `doc
 To change the MongoDB and / or Elastic ports for your ClearML Server, do the following:
 
 1. Edit the `docker-compose.yml` file.
-1. In the `services/trainsserver/environment` section, add the following environment variable(s):
+1. Add the following environment variable(s) in the `services/trainsserver/environment` section:
 
 * For MongoDB:
 
@@ -994,7 +994,7 @@ Do the following:
 
 1. If a ClearML configuration file (`clearml.conf`) exists, delete it.
 1. Open a terminal session.
-1. In the terminal session, set the system environment variable to `127.0.0.1`, for example:
+1. Set the system environment variable to `127.0.0.1` in the terminal session. For example:
 
 * Linux:
 
@@ -26,7 +26,7 @@ can allocate several GPUs to an agent and use the rest for a different workload,
 
 ## What is a Queue?
 
-A ClearML queue is an ordered list of Tasks scheduled for execution. A queue can be serviced by one or multiple agents.
+A ClearML queue is an ordered list of Tasks scheduled for execution. One or multiple agents can service a queue.
 Agents servicing a queue pull the queued tasks in order and execute them.
 
 A ClearML Agent can service multiple queues in either of the following modes:
@@ -51,8 +51,8 @@ The diagram above demonstrates a typical flow where an agent executes a task:
 1. Set up the python environment and required packages.
 1. The task's script/code is executed.
 
-While the agent is running, it continuously reports system metrics to the ClearML Server (these can be monitored in the
-[**Workers and Queues**](../webapp/webapp_workers_queues.md) page).
+While the agent is running, it continuously reports system metrics to the ClearML Server. You can monitor these metrics
+in the [**Workers and Queues**](../webapp/webapp_workers_queues.md) page.
 
 ## Resource Management
 Installing an Agent on machines allows it to monitor all the machine's status (GPU / CPU / Memory / Network / Disk IO).
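For reference, attaching an agent to a queue so it starts pulling and executing tasks is a one-line command; the queue name below is illustrative, and `--docker` runs each task inside a container:

```bash
clearml-agent daemon --queue default --docker
```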
@@ -6,7 +6,7 @@ title: Hyperparameter Optimization
 Hyperparameters are variables that directly control the behaviors of training algorithms, and have a significant effect on
 the performance of the resulting machine learning models. Finding the hyperparameter values that yield the best
 performing models can be complicated. Manually adjusting hyperparameters over the course of many training trials can be
-slow and tedious. Luckily, hyperparameter optimization can be automated and boosted using ClearML's
+slow and tedious. Luckily, you can automate and boost hyperparameter optimization with ClearML's
 [**`HyperParameterOptimizer`**](../references/sdk/hpo_optimization_hyperparameteroptimizer.md) class.
 
 ## ClearML's HyperParameter Optimization
@@ -77,11 +77,12 @@ optimization.
 ```python
 from clearml import Task
 
-task = Task.init(project_name='Hyper-Parameter Optimization',
-                 task_name='Automatic Hyper-Parameter Optimization',
-                 task_type=Task.TaskTypes.optimizer,
-                 reuse_last_task_id=False)
+task = Task.init(
+    project_name='Hyper-Parameter Optimization',
+    task_name='Automatic Hyper-Parameter Optimization',
+    task_type=Task.TaskTypes.optimizer,
+    reuse_last_task_id=False
+)
 ```
 
 1. Define the optimization configuration and resources budget:
@@ -8,7 +8,7 @@ member of the [Task](task.md) object.
 ClearML integrates with the leading visualization libraries, and automatically captures reports to them.
 
 ## Types of Logged Results
-In ClearML, there are four types of reports:
+ClearML supports four types of reports:
 - Text - Mostly captured automatically from stdout and stderr but can be logged manually.
 - Scalars - Time series data. X-axis is always a sequential number, usually iterations but can be epochs or others.
 - Plots - General graphs and diagrams, such as histograms, confusion matrices line plots, and custom plotly charts.
@@ -14,7 +14,7 @@ information as well as execution outputs.
 All the information captured by a task is by default uploaded to the [ClearML Server](../deploying_clearml/clearml_server.md)
 and it can be visualized in the [ClearML WebApp](../webapp/webapp_overview.md) (UI). ClearML can also be configured to upload
 model checkpoints, artifacts, and charts to cloud storage (see [Storage](../integrations/storage.md)). Additionally,
-there is an option to work with tasks in Offline Mode, in which all information is saved in a local folder (see
+you can work with tasks in Offline Mode, in which all information is saved in a local folder (see
 [Storing Task Data Offline](../guides/set_offline.md)).
 
 In the UI and code, tasks are grouped into [projects](projects.md), which are logical entities similar to folders. Users can decide
@@ -16,10 +16,10 @@ The below is only our opinion. ClearML was designed to fit into any workflow whe
 
 During early stages of model development, while code is still being modified heavily, this is the usual setup we'd expect to see used by data scientists:
 
-- A local development machine, usually a laptop (and usually using only CPU) with a fraction of the dataset for faster iterations - this is used for writing the training pipeline code, ensuring it knows to parse the data
-and there are no glaring bugs.
-- A workstation with a GPU, usually with a limited amount of memory for small batch-sizes. This is used to train the model and ensure the model we chose makes sense and that the training
-procedure works. Can be used to provide initial models for testing.
+- A local development machine, usually a laptop (and usually using only CPU) with a fraction of the dataset for faster
+iterations - Use a local machine for writing, training, and debugging pipeline code.
+- A workstation with a GPU, usually with a limited amount of memory for small batch-sizes - Use this workstation to train
+the model and ensure that you choose a model that makes sense, and the training procedure works. Can be used to provide initial models for testing.
 
 The abovementioned setups might be folded into each other and that's great! If you have a GPU machine for each researcher, that's awesome!
 The goal of this phase is to get a code, dataset and environment setup, so we can start digging to find the best model!
@@ -10,7 +10,7 @@ Now, we'll learn how to track Hyperparameters, Artifacts and Metrics!
 
 Every previously executed experiment is stored as a Task.
 A Task has a project and a name, both can be changed after the experiment has been executed.
-A Task is also automatically assigned an auto-generated unique identifier (UUID string) that cannot be changed and will always locate the same Task in the system.
+A Task is also automatically assigned an auto-generated unique identifier (UUID string) that cannot be changed and always locates the same Task in the system.
 
 It's possible to retrieve a Task object programmatically by querying the system based on either the Task ID,
 or project & name combination. It's also possible to query tasks based on their properties, like Tags.
@@ -26,7 +26,7 @@ Once we have a Task object we can query the state of the Task, get its Model, sc
 For full reproducibility, it's paramount to save Hyperparameters for each experiment. Since Hyperparameters can have substantial impact
 on Model performance, saving and comparing these between experiments is sometimes the key to understand model behavior.
 
-ClearML supports logging `argparse` module arguments out of the box, so once integrating it into the code, it will automatically log all parameters provided to the argument parser.
+ClearML supports logging `argparse` module arguments out of the box, so once ClearML is integrated into the code, it automatically logs all parameters provided to the argument parser.
 
 It's also possible to log parameter dictionaries (very useful when parsing an external config file and storing as a dict object),
 whole configuration files or even custom objects or [Hydra](https://hydra.cc/docs/intro/) configurations!
@@ -46,7 +46,7 @@ Essentially, artifacts are files (or python objects) uploaded from a script and
 These Artifacts can be easily accessed by the web UI or programmatically.
 
 Artifacts can be stored anywhere, either on the ClearML server, or any object storage solution or shared folder.
-See all [storage capabilities](../../integrations/storage).
+See all [storage capabilities](../../integrations/storage.md).
 
 
 ### Adding Artifacts
@@ -84,9 +84,9 @@ local_csv = preprocess_task.artifacts['data'].get_local_copy()
 ```
 
 The `task.artifacts` is a dictionary where the keys are the Artifact names, and the returned object is the Artifact object.
-Calling `get_local_copy()` will return a local cached copy of the artifact,
-this means that the next time we execute the code we will not need to download the artifact again.
-Calling `get()` will get a deserialized pickled object.
+Calling `get_local_copy()` returns a local cached copy of the artifact. Therefore, next time we execute the code, we don't
+need to download the artifact again.
+Calling `get()` gets a deserialized pickled object.
 
 Check out the [artifacts retrieval](https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts_retrieval.py) example code.
 
@@ -94,15 +94,15 @@ Check out the [artifacts retrieval](https://github.com/allegroai/clearml/blob/ma
 
 Models are a special kind artifact.
 Models created by popular frameworks (such as Pytorch, Tensorflow, Scikit-learn) are automatically logged by ClearML.
-All snapshots are automatically logged, in order to make sure we also automatically upload the model snapshot (instead of saving its local path)
+All snapshots are automatically logged. In order to make sure we also automatically upload the model snapshot (instead of saving its local path),
 we need to pass a storage location for the model files to be uploaded to.
 
-For example uploading all snapshots to our S3 bucket:
+For example, upload all snapshots to an S3 bucket:
 ```python
 task = Task.init(project_name='examples', task_name='storing model', output_uri='s3://my_models/')
 ```
 
-From now on, whenever the framework (TF/Keras/PyTorch etc.) will be storing a snapshot, the model file will automatically get uploaded to our bucket under a specific folder for the experiment.
+Now, whenever the framework (TF/Keras/PyTorch etc.) stores a snapshot, the model file is automatically uploaded to the bucket to a specific folder for the experiment.
 
 Loading models by a framework is also logged by the system, these models appear under the “Input Models” section, under the Artifacts tab.
 
@@ -124,7 +124,7 @@ Like before we have to get the instance of the Task training the original weight
 :::note
 Using Tensorflow, the snapshots are stored in a folder, meaning the `local_weights_path` will point to a folder containing our requested snapshot.
 :::
-As with Artifacts all models are cached, meaning the next time we will run this code, no model will need to be downloaded.
+As with Artifacts, all models are cached, meaning the next time we run this code, no model needs to be downloaded.
 Once one of the frameworks will load the weights file, the running Task will be automatically updated with “Input Model” pointing directly to the original training Task’s Model.
 This feature allows you to easily get a full genealogy of every trained and used model by your system!
@@ -150,7 +150,7 @@ The experiment table is a powerful tool for creating dashboards and views of you
 
 
 ### Creating Leaderboards
-The [experiments table](../../webapp/webapp_exp_table.md) can be customized to your own needs, adding desired views of parameters, metrics and tags.
+Customize the [experiments table](../../webapp/webapp_exp_table.md) to fit your own needs, adding desired views of parameters, metrics and tags.
 It's possible to filter and sort based on parameters and metrics, so creating custom views is simple and flexible.
 
 Create a dashboard for a project, presenting the latest Models and their accuracy scores, for immediate insights.
@@ -115,7 +115,7 @@ Task.enqueue(task=cloned_task, queue_name='default')
 ```
 
 ### Advanced Usage
-Before execution, there are a variety of programmatic methods which can be used to manipulate a task object.
+Before execution, use a variety of programmatic methods to manipulate a task object.
 
 #### Modify Hyperparameters
 [Hyperparameters](../../fundamentals/hyperparameters.md) are an integral part of Machine Learning code as they let you
@@ -7,7 +7,10 @@ Pipelines provide users with a greater level of abstraction and automation, with
 
 Tasks can interface with other Tasks in the pipeline and leverage other Tasks' work products.
 
-We'll go through a scenario where users create a Dataset, process the data then consume it with another task, all running as a pipeline.
+The sections below describe the following scenarios:
+* Dataset creation
+* Data processing and consumption
+* Pipeline building
 
 
 ## Building Tasks
@@ -56,11 +59,11 @@ dataset.tags = []
 new_dataset.tags = ['latest']
 ```
 
-We passed the `parents` argument when we created v2 of the Dataset, this inherits all the parent's version content.
-This will not only help us in tracing back dataset changes with full genealogy, but will also make our storage more efficient,
-as it will only store the files that were changed / added from the parent versions.
-When we will later need access to the Dataset it will automatically merge the files from all parent versions
-in a fully automatic and transparent process, as if they were always part of the requested Dataset.
+We passed the `parents` argument when we created v2 of the Dataset, which inherits all the parent's version content.
+This not only helps trace back dataset changes with full genealogy, but also makes our storage more efficient,
+since it only store the changed and / or added files from the parent versions.
+When we access the Dataset, it automatically merges the files from all parent versions
+in a fully automatic and transparent process, as if the files were always part of the requested Dataset.
 
 ### Training
 We can now train our model with the **latest** Dataset we have in the system.
@@ -17,8 +17,7 @@ dataset), and reports (uploads) the following to the main Task:
 Each Task in a subprocess references the main Task by calling [Task.current_task](../../references/sdk/task#taskcurrent_task), which always returns
 the main Task.
 
-When the script runs, it creates an experiment named `test torch distributed`, which is associated with the `examples` project
-in the **ClearML Web UI**.
+When the script runs, it creates an experiment named `test torch distributed`, which is associated with the `examples` project.
 
 ## Artifacts
 
@@ -32,7 +32,7 @@ clearml-session --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 --packages
 * Specify the resource queue `--queue default`.
 
 :::note
-There is an option to enter a project name using `--project <name>`. If no project is input, the default project
+Enter a project name using `--project <name>`. If no project is input, the default project
 name is "DevOps"
 :::
 
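For example, the full launch command with an explicit project could look like this; the project name is arbitrary:

```bash
clearml-session --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 \
  --packages "clearml" \
  --queue default \
  --project "Interactive Sessions"
```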
@@ -6,7 +6,7 @@ The ClearML [AWS autoscaler example](https://github.com/allegroai/clearml/blob/m
 demonstrates how to use the [`clearml.automation.auto_scaler`](https://github.com/allegroai/clearml/blob/master/clearml/automation/auto_scaler.py)
 module to implement a service that optimizes AWS EC2 instance scaling according to a defined instance budget.
 
-It periodically polls your AWS cluster and automatically stops idle instances based on a defined maximum idle time or spins
+The autoscaler periodically polls your AWS cluster and automatically stops idle instances based on a defined maximum idle time or spins
 up new instances when there aren't enough to execute pending tasks.
 
 ## Running the ClearML AWS Autoscaler
@@ -38,7 +38,7 @@ frame.add_annotation(box2d_xywh=(10, 10, 30, 20), labels=['test'])
 The `box2d_xywh` argument specifies the coordinates of the annotation's bounding box, and the `labels` argument specifies
 a list of labels for the annotation.
 
-When adding an annotation there are a few options for entering the annotation's boundaries, including:
+Enter the annotation's boundaries in one of the following ways:
 * `poly2d_xy` - A list of floating points (x,y) to create for single polygon, or a list of floating points lists for a
 complex polygon.
 * `ellipse2d_xyrrt` - A List consisting of cx, cy, rx, ry, and theta for an ellipse.
@@ -8,7 +8,7 @@ source data to the ClearML Enterprise platform. That source data is a **mask**.
 Masks are used in deep learning for semantic segmentation.
 
 Masks correspond to raw data where the objects to be detected are marked with colors in the masks. The colors
-are RGB values and represent the objects, which are labeled for segmentation.
+are RGB values and represent the objects that are labeled for segmentation.
 
 In frames used for semantic segmentation, the metadata connecting the mask files / images to the ClearML Enterprise platform,
 and the RGB values and labels used for segmentation are separate. They are contained in two different dictionaries of
@@ -1,8 +1,7 @@
 ---
 title: Comparing Experiments
 ---
-It is always useful to be able to do some forensics on what causes an experiment to succeed and to better understand
-performance issues.
+It is always useful to investigate what causes an experiment to succeed.
 The **ClearML Web UI** provides a deep experiment comparison, allowing to locate, visualize, and analyze differences including:
 
 * [Details](#details)
@@ -170,7 +170,7 @@ Click the checkbox in the top left corner of the table to select all items curre
 An extended bulk selection tool is available through the down arrow next to the checkbox in the top left corner, enabling
 selecting items beyond the items currently on-screen:
 * **All** - Select all experiments in the project
-* **None** - Clear Selection
+* **None** - Clear selection
 * **Filtered** - Select **all experiments in the project** that match the current active filters in the project
 
 ## Creating an Experiment Leaderboard
@@ -11,7 +11,7 @@ meaning that it's the first thing that is seen when opening the project.
 
 ## Metric Snapshot
 
-On the top of the **OVERVIEW** tab, there is an option to display a **metric snapshot**. Choose a metric and variant,
+On the top of the **OVERVIEW** tab, you can display a **metric snapshot**. Choose a metric and variant,
 and then the window will present an aggregated view of the value for that metric and the time that each
 experiment scored that value. This way, the project's progress can be quickly deduced.
 