mirror of
https://github.com/clearml/clearml
synced 2025-06-26 18:16:07 +00:00
clearml initial version 0.17.0
136
docs/clearml-task.md
Normal file
@@ -0,0 +1,136 @@
|
||||
# `clearml-task` - Execute ANY python code on a remote machine
|
||||
|
||||
If you are already familiar with `clearml`, then you can think of `clearml-task` as a way to create a Task/experiment
|
||||
from any script without the need to add even a single line of code to the original codebase.
|
||||
|
||||
`clearml-task` allows a user to **take any python code/repository and launch it on a remote machine**.
|
||||
|
||||
The remote execution is fully monitored: all outputs, including console / TensorBoard / matplotlib,
are logged in real-time into the ClearML UI.
|
||||
|
||||
## What does it do?
|
||||
|
||||
`clearml-task` creates a new experiment on your `clearml-server`; it populates the experiment's environment with:
|
||||
|
||||
* repository/commit/branch, as specified by the command-line invocation.
|
||||
* optional: the base docker image to be used as underlying environment
|
||||
* optional: alternative python requirements, in case `requirements.txt` is not found inside the repository.
|
||||
|
||||
Once the new experiment is created and populated, it will enqueue the experiment to the selected execution queue.
|
||||
|
||||
When the experiment is executed on the remote machine (performed by an available `clearml-agent`), all the console outputs
|
||||
will be logged in real-time, alongside your TensorBoard and matplotlib outputs.
|
||||
|
||||
### Use-cases for `clearml-task` remote execution
|
||||
|
||||
- You have off-the-shelf code, and you want to launch it on a remote machine with a specific resource (e.g., a GPU)
|
||||
- You want to run the [hyper-parameter optimization]() on a codebase that is still not connected with `clearml`
|
||||
- You want to create a [pipeline]() from an assortment of scripts, and you need to create Tasks for those scripts
|
||||
- Sometimes, you just want to run some code on a remote machine, either using an on-prem cluster or on the cloud...
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- A single python script, or an up-to-date repository containing the codebase.
|
||||
- `clearml-agent` running on at least one machine (to execute the experiment)
|
||||
|
||||
## Tutorial
|
||||
|
||||
### Launching a job from a repository
|
||||
|
||||
We will be launching this [script](https://github.com/allegroai/trains/blob/master/examples/frameworks/scikit-learn/sklearn_matplotlib_example.py) on a remote machine. The following are the command-line options we will be using:
|
||||
- First, we have to give the experiment a name and select a project (`--project examples --name remote_test`)
|
||||
- Then, we select the repository with our code. If we do not specify branch / commit, it will take the latest commit
|
||||
from the master branch (`--repo https://github.com/allegroai/clearml.git`)
|
||||
- Lastly, we need to specify which script in the repository needs to be run (`--script examples/frameworks/scikit-learn/sklearn_matplotlib_example.py`)
|
||||
Notice that by default, the execution working directory will be the root of the repository. If we need to change it, add `--cwd <folder>`
|
||||
|
||||
If we additionally need to pass an argument to our scripts, use the `--args` switch.
|
||||
The names of the arguments should match the argparse arguments, removing the '--' prefix
|
||||
(e.g. instead of --key=value -> use `--args key=value` )
|
||||
|
||||
``` bash
clearml-task --project examples --name remote_test --repo https://github.com/allegroai/clearml.git \
  --script examples/frameworks/scikit-learn/sklearn_matplotlib_example.py \
  --queue single_gpu
```
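
The naming rule for `--args` described above (drop the `--` prefix and pass `key=value`) can be pictured with a small sketch. This is an illustration only: `args_to_argv` is a hypothetical helper, and how `clearml-task` applies the values internally is not described in this document.

```python
import argparse
from typing import List


def args_to_argv(pairs: List[str]) -> List[str]:
    """Turn ["lr=0.003", "batch_size=64"] into ["--lr", "0.003", "--batch_size", "64"]."""
    argv: List[str] = []
    for pair in pairs:
        # Split on the first '=' only, so a value may itself contain '='.
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected <argument>=<value>, got {pair!r}")
        argv.extend([f"--{key}", value])
    return argv


# The script's own argparse definition (names must match the keys above).
parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.1)
parser.add_argument("--batch_size", type=int, default=32)

args = parser.parse_args(args_to_argv(["lr=0.003", "batch_size=64"]))
print(args.lr, args.batch_size)
```

The point of the sketch is that the key in `--args key=value` has to match the name the script registered with `argparse`; a key that the parser does not know would be rejected here.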
|
||||
|
||||
### Launching a job from a local script
|
||||
|
||||
We will be launching a single local script file (no git repo needed) on a remote machine.
|
||||
|
||||
- First, we have to give the experiment a name and select a project (`--project examples --name remote_test`)
|
||||
- Then, we select the script file on our machine, `--script /path/to/my/script.py`
|
||||
- If we need specific packages, we can specify them manually with `--packages "tqdm>=4" "torch>1.0"`
|
||||
or we can pass a requirements file `--requirements /path/to/my/requirements.txt`
|
||||
- Same as in the repo case, if we need to pass arguments to `argparse` we can add `--args key=value`
|
||||
- If we have a docker container with an entire environment we want our script to run inside,
|
||||
add e.g., `--docker nvcr.io/nvidia/pytorch:20.11-py3`
|
||||
|
||||
Note: In this example, the exact version of PyTorch to install will be resolved by the `clearml-agent` depending on the CUDA environment available at runtime.
|
||||
|
||||
``` bash
clearml-task --project examples --name remote_test --script /path/to/my/script.py \
  --packages "tqdm>=4" "torch>1.0" --args verbose=true \
  --queue dual_gpu
```
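
If the same launch needs to be scripted, the command line above could be assembled programmatically. The sketch below is one possible way to do it; `build_clearml_task_command` is a hypothetical helper (not part of ClearML), and it only uses flags that appear in the `clearml-task --help` output in this document.

```python
import shlex
from typing import Dict, List, Optional


def build_clearml_task_command(
    project: str,
    name: str,
    script: str,
    queue: Optional[str] = None,
    packages: Optional[List[str]] = None,
    args: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build the argv list for a clearml-task invocation (nothing is executed here)."""
    cmd = ["clearml-task", "--project", project, "--name", name, "--script", script]
    if packages:
        cmd += ["--packages", *packages]
    if args:
        cmd += ["--args", *[f"{key}={value}" for key, value in args.items()]]
    if queue:
        # Without --queue the task is created but not launched (see the CLI help).
        cmd += ["--queue", queue]
    return cmd


cmd = build_clearml_task_command(
    project="examples",
    name="remote_test",
    script="/path/to/my/script.py",
    queue="dual_gpu",
    packages=["tqdm>=4", "torch>1.0"],
    args={"verbose": "true"},
)

# Print a shell-safe version of the command (requires Python 3.8+ for shlex.join).
print(shlex.join(cmd))
# To actually launch it, pass `cmd` to subprocess.run(cmd, check=True)
# on a machine where clearml-task is installed and configured.
```

Passing the command as a list avoids shell quoting problems with requirement specifiers such as `tqdm>=4`, where `>` would otherwise be treated as a redirection.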
|
||||
|
||||
### CLI options
|
||||
|
||||
``` bash
|
||||
clearml-task --help
|
||||
```
|
||||
|
||||
``` console
|
||||
ClearML launch - launch any codebase on remote machines running clearml-agent
|
||||
|
||||
optional arguments:
|
||||
-h, --help show this help message and exit
|
||||
--version Display the Allegro.ai utility version
|
||||
--project PROJECT Required: set the project name for the task. If
|
||||
--base-task-id is used, this arguments is optional.
|
||||
--name NAME Required: select a name for the remote task
|
||||
--repo REPO remote URL for the repository to use. Example: --repo
|
||||
https://github.com/allegroai/clearml.git
|
||||
--branch BRANCH Select specific repository branch/tag (implies the
|
||||
latest commit from the branch)
|
||||
--commit COMMIT Select specific commit id to use (default: latest
|
||||
commit, or when used with local repository matching
|
||||
the local commit id)
|
||||
--folder FOLDER Remotely execute the code in the local folder. Notice!
|
||||
It assumes a git repository already exists. Current
|
||||
state of the repo (commit id and uncommitted changes)
|
||||
is logged and will be replicated on the remote machine
|
||||
--script SCRIPT Specify the entry point script for the remote
|
||||
execution. When used in tandem with --repo the script
|
||||
should be a relative path inside the repository, for
|
||||
example: --script source/train.py .When used with
|
||||
--folder it supports a direct path to a file inside
|
||||
the local repository itself, for example: --script
|
||||
~/project/source/train.py
|
||||
--cwd CWD Working directory to launch the script from. Default:
|
||||
repository root folder. Relative to repo root or local
|
||||
folder
|
||||
--args [ARGS [ARGS ...]]
|
||||
Arguments to pass to the remote execution, list of
|
||||
<argument>=<value> strings.Currently only argparse
|
||||
arguments are supported. Example: --args lr=0.003
|
||||
batch_size=64
|
||||
--queue QUEUE Select the queue to launch the task. If not provided a
|
||||
Task will be created but it will not be launched.
|
||||
--requirements REQUIREMENTS
|
||||
Specify requirements.txt file to install when setting
|
||||
the session. If not provided, the requirements.txt
|
||||
from the repository will be used.
|
||||
--packages [PACKAGES [PACKAGES ...]]
|
||||
Manually specify a list of required packages. Example:
|
||||
--packages "tqdm>=2.1" "scikit-learn"
|
||||
--docker DOCKER Select the docker image to use in the remote session
|
||||
--skip-task-init If set, Task.init() call is not added to the entry
|
||||
point, and is assumed to be called in within the
|
||||
script. Default: add Task.init() call entry point
|
||||
script
|
||||
--base-task-id BASE_TASK_ID
|
||||
Use a pre-existing task in the system, instead of a local repo/script.
|
||||
Essentially clones an existing task and overrides arguments/requirements.
|
||||
|
||||
```
|
||||
196
docs/clearml.conf
Normal file
@@ -0,0 +1,196 @@
|
||||
# ClearML SDK configuration file
|
||||
api {
|
||||
# web_server on port 8080
|
||||
web_server: "http://localhost:8080"
|
||||
|
||||
# Notice: 'api_server' is the api server (default port 8008), not the web server.
|
||||
api_server: "http://localhost:8008"
|
||||
|
||||
# file server on port 8081
|
||||
files_server: "http://localhost:8081"
|
||||
|
||||
# Credentials are generated using the webapp, http://localhost:8080/profile
|
||||
credentials {"access_key": "EGRTCO8JMSIGI6S39GTP43NFWXDQOW", "secret_key": "x!XTov_G-#vspE*Y(h$Anm&DIc5Ou-F)jsl$PdOyj5wG1&E!Z8"}
|
||||
|
||||
# verify host ssl certificate, set to False only if you have a very good reason
|
||||
verify_certificate: True
|
||||
}
|
||||
sdk {
|
||||
# ClearML - default SDK configuration
|
||||
|
||||
storage {
|
||||
cache {
|
||||
# Defaults to system temp folder / cache
|
||||
default_base_dir: "~/.clearml/cache"
|
||||
}
|
||||
}
|
||||
|
||||
metrics {
|
||||
# History size for debug files per metric/variant. For each metric/variant combination with an attached file
|
||||
# (e.g. debug image event), file names for the uploaded files will be recycled in such a way that no more than
|
||||
# X files are stored in the upload destination for each metric/variant combination.
|
||||
file_history_size: 100
|
||||
|
||||
# Max history size for matplotlib imshow files per plot title.
|
||||
# File names for the uploaded images will be recycled in such a way that no more than
|
||||
# X images are stored in the upload destination for each matplotlib plot title.
|
||||
matplotlib_untitled_history_size: 100
|
||||
|
||||
# Limit the number of digits after the dot in plot reporting (reducing plot report size)
|
||||
# plot_max_num_digits: 5
|
||||
|
||||
# Settings for generated debug images
|
||||
images {
|
||||
format: JPEG
|
||||
quality: 87
|
||||
subsampling: 0
|
||||
}
|
||||
|
||||
# Support plot-per-graph fully matching Tensorboard behavior (i.e. if this is set to true, each series should have its own graph)
|
||||
tensorboard_single_series_per_graph: false
|
||||
}
|
||||
|
||||
network {
|
||||
metrics {
|
||||
# Number of threads allocated to uploading files (typically debug images) when transmitting metrics for
|
||||
# a specific iteration
|
||||
file_upload_threads: 4
|
||||
|
||||
# Warn about upload starvation if no uploads were made in specified period while file-bearing events keep
|
||||
# being sent for upload
|
||||
file_upload_starvation_warning_sec: 120
|
||||
}
|
||||
|
||||
iteration {
|
||||
# Max number of retries when getting frames if the server returned an error (http code 500)
|
||||
max_retries_on_server_error: 5
|
||||
# Backoff factor for consecutive retry attempts.
|
||||
# SDK will wait for {backoff factor} * (2 ^ ({number of total retries} - 1)) between retries.
|
||||
retry_backoff_factor_sec: 10
|
||||
}
|
||||
}
|
||||
aws {
|
||||
s3 {
|
||||
# S3 credentials, used for read/write access by various SDK elements
|
||||
|
||||
# default, used for any bucket not specified below
|
||||
key: ""
|
||||
secret: ""
|
||||
region: ""
|
||||
|
||||
credentials: [
|
||||
# specifies key/secret credentials to use when handling s3 urls (read or write)
|
||||
# {
|
||||
# bucket: "my-bucket-name"
|
||||
# key: "my-access-key"
|
||||
# secret: "my-secret-key"
|
||||
# },
|
||||
# {
|
||||
# # This will apply to all buckets in this host (unless key/value is specifically provided for a given bucket)
|
||||
# host: "my-minio-host:9000"
|
||||
# key: "12345678"
|
||||
# secret: "12345678"
|
||||
# multipart: false
|
||||
# secure: false
|
||||
# }
|
||||
]
|
||||
}
|
||||
boto3 {
|
||||
pool_connections: 512
|
||||
max_multipart_concurrency: 16
|
||||
}
|
||||
}
|
||||
google.storage {
|
||||
# # Default project and credentials file
|
||||
# # Will be used when no bucket configuration is found
|
||||
# project: "clearml"
|
||||
# credentials_json: "/path/to/credentials.json"
|
||||
|
||||
# # Specific credentials per bucket and sub directory
|
||||
# credentials = [
|
||||
# {
|
||||
# bucket: "my-bucket"
|
||||
# subdir: "path/in/bucket" # Not required
|
||||
# project: "clearml"
|
||||
# credentials_json: "/path/to/credentials.json"
|
||||
# },
|
||||
# ]
|
||||
}
|
||||
azure.storage {
|
||||
# containers: [
|
||||
# {
|
||||
# account_name: "clearml"
|
||||
# account_key: "secret"
|
||||
# # container_name:
|
||||
# }
|
||||
# ]
|
||||
}
|
||||
|
||||
log {
|
||||
# debugging feature: set this to true to make null log propagate messages to root logger (so they appear in stdout)
|
||||
null_log_propagate: false
|
||||
task_log_buffer_capacity: 66
|
||||
|
||||
# disable urllib info and lower levels
|
||||
disable_urllib3_info: true
|
||||
}
|
||||
|
||||
development {
|
||||
# Development-mode options
|
||||
|
||||
# dev task reuse window
|
||||
task_reuse_time_window_in_hours: 72.0
|
||||
|
||||
# Run VCS repository detection asynchronously
|
||||
vcs_repo_detect_async: true
|
||||
|
||||
# Store uncommitted git/hg source code diff in experiment manifest when training in development mode
|
||||
# This stores "git diff" or "hg diff" into the experiment's "script.requirements.diff" section
|
||||
store_uncommitted_code_diff: true
|
||||
store_code_diff_from_remote: false
|
||||
|
||||
# Support stopping an experiment in case it was externally stopped, status was changed or task was reset
|
||||
support_stopping: true
|
||||
|
||||
# Default Task output_uri. if output_uri is not provided to Task.init, default_output_uri will be used instead.
|
||||
default_output_uri: ""
|
||||
|
||||
# Default auto generated requirements optimize for smaller requirements
|
||||
# If True, analyze the entire repository regardless of the entry point.
|
||||
# If False, first analyze the entry point script; if it does not reference other local files,
|
||||
# do not analyze the entire repository.
|
||||
force_analyze_entire_repo: false
|
||||
|
||||
# If set to true, *clearml* update message will not be printed to the console
|
||||
# this value can be overwritten with os environment variable CLEARML_SUPPRESS_UPDATE_MESSAGE=1
|
||||
suppress_update_message: false
|
||||
|
||||
# If this flag is true (default is false), instead of analyzing the code with Pigar, analyze with `pip freeze`
|
||||
detect_with_pip_freeze: false
|
||||
detect_with_conda_freeze: false
|
||||
|
||||
# Log specific environment variables. OS environments are enlisted in the "Environment" section
|
||||
# of the Hyper-Parameters.
|
||||
# multiple selected variables are supported including the suffix '*'.
|
||||
# For example: "AWS_*" will log any OS environment variable starting with 'AWS_'.
|
||||
# This value can be overwritten with os environment variable CLEARML_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
|
||||
# Example: log_os_environments: ["AWS_*", "CUDA_VERSION"]
|
||||
log_os_environments: []
|
||||
|
||||
# Development mode worker
|
||||
worker {
|
||||
# Status report period in seconds
|
||||
report_period_sec: 2
|
||||
|
||||
# ping to the server - check connectivity
|
||||
ping_period_sec: 30
|
||||
|
||||
# Log all stdout & stderr
|
||||
log_stdout: true
|
||||
|
||||
# compatibility feature, report memory usage for the entire machine
|
||||
# default (false), report only on the running process and its sub-processes
|
||||
report_global_mem_used: false
|
||||
}
|
||||
}
|
||||
}
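
As a worked example of the `retry_backoff_factor_sec` comment in the `network.iteration` section above, the sketch below computes the wait before each retry. It assumes that `^` in the comment means exponentiation and that retries are counted from 1; `retry_wait_seconds` is a hypothetical helper, and the SDK's actual retry code is not shown in this document.

```python
def retry_wait_seconds(backoff_factor_sec: float, retry_number: int) -> float:
    """Wait before retry `retry_number` (1-based): factor * 2 ** (retry_number - 1)."""
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    return backoff_factor_sec * 2 ** (retry_number - 1)


# Values taken from the configuration above:
# retry_backoff_factor_sec: 10, max_retries_on_server_error: 5
schedule = [retry_wait_seconds(10, n) for n in range(1, 6)]
print(schedule)       # [10, 20, 40, 80, 160]
print(sum(schedule))  # 310
```

Under these assumptions, the default settings would spend a little over five minutes waiting before giving up on a failing request.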
|
||||
@@ -2,39 +2,40 @@
|
||||
|
||||
Firstly, we thank you for taking the time to contribute!
|
||||
|
||||
The following is a set of guidelines for contributing to TRAINS.
|
||||
Contribution comes in many forms:
|
||||
* Reporting [issues](https://github.com/allegroai/clearml/issues) you've come upon
|
||||
* Participating in issue discussions in the [issue tracker](https://github.com/allegroai/clearml/issues) and the [ClearML community slack space](https://join.slack.com/t/allegroai-trains/shared_invite/enQtOTQyMTI1MzQxMzE4LTY5NTUxOTY1NmQ1MzQ5MjRhMGRhZmM4ODE5NTNjMTg2NTBlZGQzZGVkMWU3ZDg1MGE1MjQxNDEzMWU2NmVjZmY)
|
||||
* Suggesting new features or enhancements
|
||||
* Implementing new features or fixing outstanding issues
|
||||
|
||||
The following is a set of guidelines for contributing to ClearML.
|
||||
These are primarily guidelines, not rules.
|
||||
Use your best judgment and feel free to propose changes to this document in a pull request.
|
||||
|
||||
## Reporting Bugs
|
||||
## Reporting Issues
|
||||
|
||||
This section guides you through submitting a bug report for TRAINS.
|
||||
By following these guidelines, you
|
||||
help maintainers and the community understand your report, reproduce the behavior, and find related reports.
|
||||
By following these guidelines, you help maintainers and the community understand your report, reproduce the behavior, and find related reports.
|
||||
|
||||
Before creating bug reports, please check whether the bug you want to report already appears [here](link to issues).
|
||||
You may discover that you do not need to create a bug report.
|
||||
When you are creating a bug report, please include as much detail as possible.
|
||||
Before reporting an issue, please check whether it already appears [here](https://github.com/allegroai/clearml/issues).
|
||||
If it does, join the on-going discussion instead.
|
||||
|
||||
**Note**: If you find a **Closed** issue that may be the same issue which you are currently experiencing,
|
||||
then open a **New** issue and include a link to the original (Closed) issue in the body of your new one.
|
||||
|
||||
Explain the problem and include additional details to help maintainers reproduce the problem:
|
||||
When reporting an issue, please include as much detail as possible: explain the problem and include additional details to help maintainers reproduce the problem:
|
||||
|
||||
* **Use a clear and descriptive title** for the issue to identify the problem.
|
||||
* **Describe the exact steps necessary to reproduce the problem** in as much detail as possible. Please do not just summarize what you did. Make sure to explain how you did it.
|
||||
* **Provide the specific environment setup.** Include the `pip freeze` output, specific environment variables, Python version, and other relevant information.
|
||||
* **Provide specific examples to demonstrate the steps.** Include links to files or GitHub projects, or copy/paste snippets which you use in those examples.
|
||||
* **If you are reporting any TRAINS crash,** include a crash report with a stack trace from the operating system. Make sure to add the crash report in the issue and place it in a [code block](https://help.github.com/en/articles/getting-started-with-writing-and-formatting-on-github#multiple-lines),
|
||||
* **If you are reporting any ClearML crash,** include a crash report with a stack trace from the operating system. Make sure to add the crash report in the issue and place it in a [code block](https://help.github.com/en/articles/getting-started-with-writing-and-formatting-on-github#multiple-lines),
|
||||
a [file attachment](https://help.github.com/articles/file-attachments-on-issues-and-pull-requests/), or just put it in a [gist](https://gist.github.com/) (and provide link to that gist).
|
||||
* **Describe the behavior you observed after following the steps** and the exact problem with that behavior.
|
||||
* **Explain which behavior you expected to see and why.**
|
||||
* **For Web-App issues, please include screenshots and animated GIFs** which recreate the described steps and clearly demonstrate the problem. You can use [LICEcap](https://www.cockos.com/licecap/) to record GIFs on macOS and Windows, and [silentcast](https://github.com/colinkeenan/silentcast) or [byzanz](https://github.com/threedaymonk/byzanz) on Linux.
|
||||
|
||||
## Suggesting Enhancements
|
||||
## Suggesting New Features and Enhancements
|
||||
|
||||
This section guides you through submitting an enhancement suggestion for TRAINS, including
|
||||
completely new features and minor improvements to existing functionality.
|
||||
By following these guidelines, you help maintainers and the community understand your suggestion and find related suggestions.
|
||||
|
||||
Enhancement suggestions are tracked as GitHub issues. After you determine which repository your enhancement suggestion is related to, create an issue on that repository and provide the following:
|
||||
@@ -43,12 +44,18 @@ Enhancement suggestions are tracked as GitHub issues. After you determine which
|
||||
* **A step-by-step description of the suggested enhancement** in as much detail as possible.
|
||||
* **Specific examples to demonstrate the steps.** Include copy/pasteable snippets which you use in those examples as [Markdown code blocks](https://help.github.com/articles/markdown-basics/#multiple-lines).
|
||||
* **Describe the current behavior and explain which behavior you expected to see instead and why.**
|
||||
* **Include screenshots or animated GIFs** which help you demonstrate the steps or point out the part of TRAINS which the suggestion is related to. You can use [LICEcap](https://www.cockos.com/licecap/) to record GIFs on macOS and Windows, and [silentcast](https://github.com/colinkeenan/silentcast) or [byzanz](https://github.com/threedaymonk/byzanz) on Linux.
|
||||
|
||||
|
||||
|
||||
* **Include screenshots or animated GIFs** which help you demonstrate the steps or point out the part of ClearML which the suggestion is related to. You can use [LICEcap](https://www.cockos.com/licecap/) to record GIFs on macOS and Windows, and [silentcast](https://github.com/colinkeenan/silentcast) or [byzanz](https://github.com/threedaymonk/byzanz) on Linux.
|
||||
|
||||
## Pull Requests
|
||||
|
||||
Before you submit a new PR:
|
||||
|
||||
* Verify the work you plan to merge addresses an existing [issue](https://github.com/allegroai/clearml/issues) (If not, open a new one)
|
||||
* Check related discussions in the [ClearML slack community](https://join.slack.com/t/allegroai-trains/shared_invite/enQtOTQyMTI1MzQxMzE4LTY5NTUxOTY1NmQ1MzQ5MjRhMGRhZmM4ODE5NTNjMTg2NTBlZGQzZGVkMWU3ZDg1MGE1MjQxNDEzMWU2NmVjZmY) (Or start your own discussion on the `#clearml-dev` channel)
|
||||
* Make sure your code conforms to the ClearML coding standards by running:
|
||||
`flake8 --max-line-length=120 --statistics --show-source --extend-ignore=E501 ./clearml*`
|
||||
|
||||
In your PR include:
|
||||
* A reference to the issue it addresses
|
||||
* A brief description of the approach you've taken for implementing
|
||||
|
||||
|
||||
BIN
docs/dataset_screenshots.gif
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 447 KiB |
139
docs/datasets.md
Normal file
@@ -0,0 +1,139 @@
|
||||
# ClearML introducing Dataset management!
|
||||
|
||||
## Decoupling Data from Code - The Dataset Paradigm
|
||||
|
||||
### The ultimate goal of `clearml-data` is to transform datasets into configuration parameters
|
||||
Just like any other argument, the dataset argument should retrieve a full local copy of the
|
||||
dataset to be used by the experiment.
|
||||
This means datasets can be efficiently retrieved by any machine in a reproducible way.
|
||||
Together it creates a full version control solution for all your data,
|
||||
that is both machine and environment agnostic.
|
||||
|
||||
|
||||
### Design Goals : Simple / Agnostic / File-based / Efficient
|
||||
|
||||
## Key Concepts:
|
||||
1) **Dataset** is a **collection of files** : e.g. folder with all subdirectories and files included in the dataset
|
||||
2) **Differential storage** : Efficient storage / network
|
||||
3) **Flexible**: support addition / removal / merge of files and datasets
|
||||
4) **Descriptive, transparent & searchable**: support projects, names, descriptions, tags and searchable fields
|
||||
5) **Simple interface** (CLI and programmatic)
|
||||
6) **Accessible**: get a copy of the dataset files from anywhere on any machine
|
||||
|
||||
### Workflow:
|
||||
|
||||
#### Simple dataset creation with CLI:
|
||||
|
||||
- Create a dataset
|
||||
``` bash
|
||||
clearml-data create --project <my_project> --name <my_dataset_name>
|
||||
```
|
||||
- Add local files to the dataset
|
||||
``` bash
|
||||
clearml-data add --id <dataset_id_from_previous_command> --files ~/datasets/best_dataset/
|
||||
```
|
||||
- Upload files (Optional: specify storage `--storage` `s3://bucket` or `gs://` or `azure://` or `/mnt/shared/`)
|
||||
``` bash
|
||||
clearml-data upload --id <dataset_id>
|
||||
```
|
||||
- Close dataset
|
||||
``` bash
|
||||
clearml-data close --id <dataset_id>
|
||||
```
|
||||
|
||||
|
||||
#### Integrating datasets into your code:
|
||||
``` python
from argparse import ArgumentParser
from clearml import Dataset, Task

# adding command line interface, so it is easy to use
parser = ArgumentParser()
parser.add_argument('--dataset', default='aayyzz', type=str, help='Dataset ID to train on')
args = parser.parse_args()

# creating a task, so that later we could override the argparse from UI
task = Task.init(project_name='examples', task_name='dataset demo')

# getting a local copy of the dataset
dataset_folder = Dataset.get(dataset_id=args.dataset).get_local_copy()

# go over the files in `dataset_folder` and train your model
```
|
||||
|
||||
|
||||
#### Modifying a dataset with CLI:
|
||||
|
||||
- Create a new dataset (specify the parent dataset id)
|
||||
``` bash
|
||||
clearml-data create --name <improved_dataset> --parents <existing_dataset_id>
|
||||
```
|
||||
- Get a mutable copy of the current dataset
|
||||
``` bash
|
||||
clearml-data get --id <created_dataset_id> --copy ~/datasets/working_dataset
|
||||
```
|
||||
- Change / add / remove files from the dataset folder
|
||||
``` bash
|
||||
vim ~/datasets/working_dataset/everything.csv
|
||||
```
|
||||
- Sync local changes
|
||||
``` bash
|
||||
clearml-data sync --id <created_dataset_id> --folder ~/datasets/working_dataset
|
||||
```
|
||||
- Upload files (Optional: specify storage `--storage` `s3://bucket` or `gs://` or `azure://` or `/mnt/shared/`)
|
||||
``` bash
|
||||
clearml-data upload --id <created_dataset_id>
|
||||
```
|
||||
- Close dataset
|
||||
``` bash
|
||||
clearml-data close --id <created_dataset_id>
|
||||
```
|
||||
|
||||
|
||||
#### Command Line Interface Summary:
|
||||
|
||||
- **`search`** Search a dataset based on project / name / description / tag etc.
|
||||
- **`list`** List the file directory content of a dataset (no need to download a copy of the dataset)
|
||||
- **`verify`** Verify a local copy of a dataset (verify the dataset files SHA2 hash)
|
||||
- **`create`** Create a new dataset (support extending/inheriting multiple parents)
|
||||
- **`delete`** Delete a dataset
|
||||
- **`add`** Add local files to a dataset
|
||||
- **`sync`** Sync dataset with a local folder (source-of-truth being the local folder)
|
||||
- **`remove`** Remove files from dataset (no need to download a copy of the dataset)
|
||||
- **`get`** Get a local copy of the dataset (either readonly --link, or writable --copy)
|
||||
- **`upload`** Upload the dataset (use --storage to specify storage target such as S3/GS/Azure/Folder, default: file server)
|
||||
|
||||
|
||||
#### Under the hood (how it all works):
|
||||
|
||||
Each dataset instance stores the collection of files added/modified from the previous version (parent).
|
||||
|
||||
When requesting a copy of the dataset all parent datasets on the graph are downloaded and a new folder
|
||||
is merged with all changes introduced in the dataset DAG.
|
||||
|
||||
Implementation details:
|
||||
|
||||
Dataset differential snapshot is stored in a single zip file for efficiency in storage and network
|
||||
bandwidth. Local cache is built into the process making sure datasets are downloaded only once.
|
||||
Dataset contains SHA2 hash of all the files in the dataset.
|
||||
In order to increase dataset fetching speed, only file size is verified automatically,
|
||||
the SHA2 hash is verified only on user's request.
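
The trade-off between the automatic size check and the on-request hash check can be sketched as follows. This is a minimal illustration, not ClearML's implementation: `verify_file` and `sha256_of` are hypothetical helpers, and SHA-256 is assumed as the SHA2 variant because the text does not say which one is used.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: str, expected_size: int, expected_sha256: str, full: bool = False) -> bool:
    """Quick check compares size only; full=True also compares the hash."""
    if os.path.getsize(path) != expected_size:
        return False
    if full:
        return sha256_of(path) == expected_sha256
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "everything.csv")
    with open(path, "wb") as f:
        f.write(b"a,b\n1,2\n")
    recorded_size, recorded_hash = os.path.getsize(path), sha256_of(path)

    # Overwrite with different content of the same length.
    with open(path, "wb") as f:
        f.write(b"a,b\n9,9\n")

    quick_ok = verify_file(path, recorded_size, recorded_hash)
    full_ok = verify_file(path, recorded_size, recorded_hash, full=True)
    print(quick_ok, full_ok)  # True False
```

The size check is cheap because it reads only file metadata, but as the example shows it cannot detect a same-size modification; that is the case the explicit hash verification covers.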
|
||||
|
||||
The design supports multiple parents per dataset, essentially merging all parents based on order.
|
||||
To improve deep dataset DAG storage and speed, dataset squashing was introduced. A user can squash
|
||||
a dataset, merging down all changes introduced in the DAG, creating a new flat version without parent datasets.
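
The parent merging and squashing described above can be modeled with a toy data structure. The sketch assumes that each version stores only its own changes, that parents are merged in the order given, and that a later parent wins when two parents contain the same path; `DatasetVersion` is a hypothetical class for illustration and is not the ClearML `Dataset` API.

```python
from typing import Dict, List, Optional

# A differential snapshot maps relative path -> content hash; None marks a removed file.
Snapshot = Dict[str, Optional[str]]


class DatasetVersion:
    def __init__(self, name: str, changes: Snapshot,
                 parents: Optional[List["DatasetVersion"]] = None):
        self.name = name
        self.changes = changes
        self.parents = parents or []

    def flatten(self) -> Dict[str, str]:
        """Resolve the full file listing by walking the DAG, parents first."""
        merged: Dict[str, str] = {}
        for parent in self.parents:
            merged.update(parent.flatten())  # later parents override earlier ones
        for path, file_hash in self.changes.items():
            if file_hash is None:
                merged.pop(path, None)       # file removed in this version
            else:
                merged[path] = file_hash     # file added or modified
        return merged

    def squash(self, name: str) -> "DatasetVersion":
        """Create a flat version with the same content and no parents."""
        return DatasetVersion(name, dict(self.flatten()), parents=[])


base = DatasetVersion("base", {"train.csv": "h1", "test.csv": "h2"})
extra = DatasetVersion("extra", {"labels.csv": "h3", "test.csv": "h9"})
improved = DatasetVersion(
    "improved", {"train.csv": "h4", "labels.csv": None}, parents=[base, extra]
)

flat = improved.squash("improved_flat")
print(sorted(improved.flatten().items()))  # [('test.csv', 'h9'), ('train.csv', 'h4')]
print(flat.parents)                        # []
```

Resolving `improved` requires visiting both parents, while `flat` carries the same listing without any parents, which is the benefit squashing is meant to provide for deep DAGs.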
|
||||
|
||||
|
||||
### Datasets UI:
|
||||
|
||||
A dataset is represented as a special `Task` in the system. <br>
|
||||
It is of type `data-processing` with a special tag `dataset`.
|
||||
|
||||
- Full log (calls / CLI) of the dataset creation process can be found in the "Execution" section.
|
||||
- A listing of the dataset differential snapshot, a summary of files added / modified / removed, and details of the files
in the differential snapshot (location / size / hash) are available in the Artifacts section.
|
||||
- The full dataset listing (all files included) is available in the Configuration section under `Dataset Content`.
|
||||
This allows you to quickly compare two dataset contents and visually see the difference.
|
||||
- The dataset genealogy DAG and change-set summary table is visualized in Results / Plots
|
||||
|
||||
<a href="https://app.community.clear.ml"><img src="https://github.com/allegroai/clearml/blob/master/docs/dataset_screenshots.gif?raw=true" width="80%"></a>
|
||||
@@ -1,6 +1,6 @@
|
||||
# TRAINS Explicit Logging
|
||||
# ClearML Explicit Logging
|
||||
|
||||
Using the **TRAINS** [Logger](https://github.com/allegroai/trains/blob/master/trains/logger.py) module and other **TRAINS** features, you can explicitly log any of the following:
|
||||
Using the **ClearML** [Logger](https://github.com/allegroai/clearml/blob/master/clearml/logger.py) module and other **ClearML** features, you can explicitly log any of the following:
|
||||
|
||||
* Report graphs and images
|
||||
* [Scalar metrics](#scalar-metrics)
|
||||
@@ -19,10 +19,10 @@ Using the **TRAINS** [Logger](https://github.com/allegroai/trains/blob/master/tr
|
||||
* Message logging
|
||||
* [Reporting text without formatting](#reporting-text-without-formatting)
|
||||
|
||||
Additionally, the **TRAINS** Logger module provides methods that allow you to do the following:
|
||||
Additionally, the **ClearML** Logger module provides methods that allow you to do the following:
|
||||
|
||||
* Get the [current logger]()
|
||||
* Overrride the TRAINS configuration file with a [default upload destination]() for images and files
|
||||
* Override the ClearML configuration file with a [default upload destination]() for images and files
|
||||
|
||||
## Graphs and Images
|
||||
|
||||
@@ -30,7 +30,7 @@ Additionally, the **TRAINS** Logger module provides methods that allow you to do
|
||||
|
||||
Use to report scalar metrics by iteration as a line plot.
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/scalar_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/scalar_reporting.py)) with the following method.
|
||||
|
||||
**Method**:
|
||||
|
||||
@@ -101,7 +101,7 @@ def report_scalar(self, title, series, value, iteration)
|
||||
|
||||
Use to report any data by iteration as a histogram.
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
|
||||
|
||||
**Method**:
|
||||
|
||||
@@ -199,7 +199,7 @@ def report_histogram(self, title, series, values, iteration, labels=None, xlabel
|
||||
|
||||
Use to report any data by iteration as a single or multiple line plot.
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
|
||||
|
||||
**Method**:
|
||||
|
||||
@@ -325,7 +325,7 @@ def report_line_plot(self, title, series, iteration, xaxis, yaxis, mode='lines',
|
||||
|
||||
Use to report any vector data as a 2D scatter diagram.
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
|
||||
|
||||
**Method**:
|
||||
|
||||
@@ -461,7 +461,7 @@ def report_scatter2d(self, title, series, scatter, iteration, xaxis=None, yaxis=
|
||||
|
||||
Use to report any array data as a 3D scatter diagram.
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/3d_plots_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/3d_plots_reporting.py)) with the following method.
|
||||
|
||||
**Method**:
|
||||
|
||||
@@ -597,7 +597,7 @@ def report_scatter3d(self, title, series, scatter, iteration, labels=None, mode=
|
||||
|
||||
Use to report a heat-map matrix as a confusion matrix. You can also plot a heat-map as a [surface diagram](#surface-diagrams).
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/scatter_hist_confusion_mat_reporting.py)) with the following method.
|
||||
|
||||
**Method**:
|
||||
|
||||
@@ -689,7 +689,7 @@ def report_confusion_matrix(self, title, series, matrix, iteration, xlabels=None
|
||||
|
||||
Use to plot a heat-map matrix as a surface diagram. You can also plot a heat-map as a [confusion matrix](#confusion-matrices).
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/3d_plots_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/3d_plots_reporting.py)) with the following method.
|
||||
|
||||
**Method**:
|
||||
|
||||
@@ -818,10 +818,10 @@ def report_surface(self, title, series, matrix, iteration, xlabels=None, ylabels
|
||||
|
||||
### Images
|
||||
|
||||
Use to report an image and upload its contents to the bucket specified in the **TRAINS** configuration file,
|
||||
Use to report an image and upload its contents to the bucket specified in the **ClearML** configuration file,
|
||||
or to [a default upload destination](#set-default-upload-destination), if you set a default.
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/manual_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/manual_reporting.py)) with the following method.
|
||||
|
||||
**Method**:
|
||||
|
||||
@@ -929,7 +929,7 @@ def report_image(self, title, series, iteration, local_path=None, matrix=None, m
|
||||
|
||||
### Logging Experiment Parameter Dictionaries
|
||||
|
||||
In order for **TRAINS** to log a dictionary of parameters, use the `Task.connect` method.
|
||||
In order for **ClearML** to log a dictionary of parameters, use the `Task.connect` method.
|
||||
|
||||
For example, to log the hyper-parameters <code>learning_rate</code>, <code>batch_size</code>, <code>display_step</code>, <code>model_path</code>, <code>n_hidden_1</code>, and <code>n_hidden_2</code>:
|
||||
|
||||
@@ -938,27 +938,27 @@ For example, to log the hyper-parameters <code>learning_rate</code>, <code>batch
|
||||
parameters_dict = { 'learning_rate': 0.001, 'batch_size': 100, 'display_step': 1,
|
||||
'model_path': "/tmp/model.ckpt", 'n_hidden_1': 256, 'n_hidden_2': 256 }
|
||||
|
||||
# Connect the dictionary to your TRAINS Task
|
||||
# Connect the dictionary to your ClearML Task
|
||||
parameters_dict = Task.current_task().connect(parameters_dict)
|
||||
```
|
||||
|
||||
### Specifying Environment Variables to Track
|
||||
|
||||
By setting the `TRAINS_LOG_ENVIRONMENT` environment variable, make **TRAINS** log either:
|
||||
By setting the `CLEARML_LOG_ENVIRONMENT` environment variable, make **ClearML** log either:
|
||||
|
||||
* All environment variables
|
||||
|
||||
export TRAINS_LOG_ENVIRONMENT="*"
|
||||
export CLEARML_LOG_ENVIRONMENT="*"
|
||||
|
||||
* Specific environment variables
|
||||
|
||||
For example, log `PWD` and `PYTHONPATH`
|
||||
|
||||
export TRAINS_LOG_ENVIRONMENT="PWD,PYTHONPATH"
|
||||
export CLEARML_LOG_ENVIRONMENT="PWD,PYTHONPATH"
|
||||
|
||||
* No environment variables
|
||||
|
||||
export TRAINS_LOG_ENVIRONMENT=
|
||||
export CLEARML_LOG_ENVIRONMENT=
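
The three settings above can be read as a comma-separated list of name patterns. The sketch below illustrates that reading using only the Python standard library. It is not ClearML's implementation: `select_logged_env` and `sample_env` are hypothetical names, and the `AWS_*` suffix wildcard is taken from the comments in the SDK configuration file.

```python
import fnmatch


def select_logged_env(spec, environ):
    """Return the entries of `environ` whose names match the comma-separated `spec`.

    Hypothetical helper for illustration only; ClearML's own matching may differ.
    """
    # Split the value on commas and drop empty entries, so "" selects nothing.
    patterns = [p.strip() for p in spec.split(",") if p.strip()]
    return {
        name: value
        for name, value in environ.items()
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
    }


# A made-up environment, used instead of os.environ to keep the example deterministic.
sample_env = {"PWD": "/home/user", "PYTHONPATH": "/opt/lib", "AWS_REGION": "us-east-1"}

print(select_logged_env("*", sample_env))               # all variables
print(select_logged_env("PWD,PYTHONPATH", sample_env))  # specific variables
print(select_logged_env("", sample_env))                # no variables
```

Passing `os.environ` instead of `sample_env` would apply the same selection to the real process environment.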
|
||||
|
||||
## Logging Messages
|
||||
|
||||
@@ -972,7 +972,7 @@ Use the methods in this section to log various types of messages. The method nam
|
||||
def debug(self, msg, *args, **kwargs)
|
||||
```
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
@@ -1010,7 +1010,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
|
||||
def info(self, msg, *args, **kwargs)
|
||||
```
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
@@ -1048,7 +1048,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
|
||||
def warn(self, msg, *args, **kwargs)
|
||||
```
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
|
||||
**Arguments**:<a name="log_arguments"></a>
|
||||
|
||||
@@ -1087,7 +1087,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
|
||||
def error(self, msg, *args, **kwargs)
|
||||
```
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
@@ -1125,7 +1125,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
|
||||
def critical(self, msg, *args, **kwargs)
|
||||
```
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
@@ -1163,7 +1163,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
|
||||
def fatal(self, msg, *args, **kwargs)
|
||||
```
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
@@ -1201,7 +1201,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
|
||||
def console(self, msg, level=logging.INFO, omit_console=False, *args, **kwargs)
|
||||
```
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
@@ -1279,7 +1279,7 @@ First [get the current logger](#get-the-current-logger) and then use it (see an
|
||||
def report_text(self, msg, level=logging.INFO, print_console=False, *args, **_)
|
||||
```
|
||||
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/trains/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
First [get the current logger](#get-the-current-logger) and then use it (see an [example script](https://github.com/allegroai/clearml/blob/master/examples/reporting/text_reporting.py)) with the following method.
|
||||
|
||||
**Arguments**:
|
||||
|
||||
@@ -1371,7 +1371,7 @@ None.
|
||||
Use to specify the default destination storage location used for uploading images.
|
||||
Images are uploaded and a link to the image is reported.
|
||||
|
||||
Credentials for the storage location are in the global configuration file (for example, on Linux, <code>~/trains.conf</code>).
|
||||
Credentials for the storage location are in the global configuration file (for example, on Linux, <code>~/clearml.conf</code>).
|
||||
|
||||
**Method**:
|
||||
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# TRAINS SDK configuration file
|
||||
# ClearML SDK configuration file - Please use ~/clearml.conf
|
||||
api {
|
||||
# web_server on port 8080
|
||||
web_server: "http://localhost:8080"
|
||||
@@ -16,12 +16,12 @@ api {
|
||||
verify_certificate: True
|
||||
}
|
||||
sdk {
|
||||
# TRAINS - default SDK configuration
|
||||
# ClearML - default SDK configuration
|
||||
|
||||
storage {
|
||||
cache {
|
||||
# Defaults to system temp folder / cache
|
||||
default_base_dir: "~/.trains/cache"
|
||||
default_base_dir: "~/.clearml/cache"
|
||||
}
|
||||
}
|
||||
|
||||
@@ -103,7 +103,7 @@ sdk {
|
||||
google.storage {
|
||||
# # Default project and credentials file
|
||||
# # Will be used when no bucket configuration is found
|
||||
# project: "trains"
|
||||
# project: "clearml"
|
||||
# credentials_json: "/path/to/credentials.json"
|
||||
|
||||
# # Specific credentials per bucket and sub directory
|
||||
@@ -111,7 +111,7 @@ sdk {
|
||||
# {
|
||||
# bucket: "my-bucket"
|
||||
# subdir: "path/in/bucket" # Not required
|
||||
# project: "trains"
|
||||
# project: "clearml"
|
||||
# credentials_json: "/path/to/credentials.json"
|
||||
# },
|
||||
# ]
|
||||
@@ -119,7 +119,7 @@ sdk {
|
||||
azure.storage {
|
||||
# containers: [
|
||||
# {
|
||||
# account_name: "trains"
|
||||
# account_name: "clearml"
|
||||
# account_key: "secret"
|
||||
# # container_name:
|
||||
# }
|
||||
@@ -161,8 +161,8 @@ sdk {
|
||||
# do not analyze the entire repository.
|
||||
force_analyze_entire_repo: false
|
||||
|
||||
# If set to true, *trains* update message will not be printed to the console
|
||||
# this value can be overwritten with os environment variable TRAINS_SUPPRESS_UPDATE_MESSAGE=1
|
||||
# If set to true, *clearml* update message will not be printed to the console
|
||||
# this value can be overwritten with os environment variable CLEARML_SUPPRESS_UPDATE_MESSAGE=1
|
||||
suppress_update_message: false
|
||||
|
||||
# If this flag is true (default is false), instead of analyzing the code with Pigar, analyze with `pip freeze`
|
||||
@@ -173,7 +173,7 @@ sdk {
|
||||
# of the Hyper-Parameters.
|
||||
# multiple selected variables are supported including the suffix '*'.
|
||||
# For example: "AWS_*" will log any OS environment variable starting with 'AWS_'.
|
||||
# This value can be overwritten with os environment variable TRAINS_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
|
||||
# This value can be overwritten with os environment variable CLEARML_LOG_ENVIRONMENT="[AWS_*, CUDA_VERSION]"
|
||||
# Example: log_os_environments: ["AWS_*", "CUDA_VERSION"]
|
||||
log_os_environments: []
|
||||
|
||||
|
||||