mirror of
https://github.com/clearml/clearml
synced 2025-06-26 18:16:07 +00:00
Documentation
@@ -1,22 +1,22 @@
# Guidelines for Contributing

Firstly, we thank you for taking the time to contribute!

The following is a set of guidelines for contributing to TRAINS.
These are primarily guidelines, not rules.
Use your best judgment and feel free to propose changes to this document in a pull request.

## Reporting Bugs

This section guides you through submitting a bug report for TRAINS.
By following these guidelines, you
help maintainers and the community understand your report, reproduce the behavior, and find related reports.

Before creating bug reports, please check whether the bug you want to report already appears [here](link to issues).
You may discover that you do not need to create a bug report.
When you are creating a bug report, please include as much detail as possible.

**Note**: If you find a **Closed** issue that may be the same issue which you are currently experiencing,
then open a **New** issue and include a link to the original (Closed) issue in the body of your new one.

Explain the problem and include additional details to help maintainers reproduce the problem:

@@ -24,8 +24,8 @@ Explain the problem and include additional details to help maintainers reproduce

* **Use a clear and descriptive title** for the issue to identify the problem.
* **Describe the exact steps necessary to reproduce the problem** in as much detail as possible. Please do not just summarize what you did. Make sure to explain how you did it.
* **Provide the specific environment setup.** Include the `pip freeze` output, specific environment variables, Python version, and other relevant information.
* **Provide specific examples to demonstrate the steps.** Include links to files or GitHub projects, or copy/paste snippets which you use in those examples.
* **If you are reporting any TRAINS crash,** include a crash report with a stack trace from the operating system. Make sure to add the crash report in the issue and place it in a [code block](https://help.github.com/en/articles/getting-started-with-writing-and-formatting-on-github#multiple-lines),
a [file attachment](https://help.github.com/articles/file-attachments-on-issues-and-pull-requests/), or just put it in a [gist](https://gist.github.com/) (and provide a link to that gist).
* **Describe the behavior you observed after following the steps** and the exact problem with that behavior.
* **Explain which behavior you expected to see and why.**
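
The environment details requested in the list above can be gathered with a short script. The following is a minimal sketch, not part of TRAINS: the function name `collect_environment_report` and the `TRAINS_` environment-variable prefix are assumptions made for illustration, so adjust the prefixes to whatever variables matter for your setup.

```python
import os
import platform
import subprocess
import sys


def collect_environment_report(env_prefixes=("TRAINS_",)):
    """Build a plain-text summary of the interpreter, OS, selected env vars, and installed packages."""
    lines = [
        "Python: " + sys.version.replace("\n", " "),
        "Platform: " + platform.platform(),
    ]
    # Only variables whose names start with one of the given prefixes are included.
    # The "TRAINS_" default is an assumption, not a documented list.
    for name in sorted(os.environ):
        if name.startswith(tuple(env_prefixes)):
            lines.append("env {}={}".format(name, os.environ[name]))
    try:
        # Run pip through the current interpreter so the active environment is reported.
        result = subprocess.run(
            [sys.executable, "-m", "pip", "freeze"],
            capture_output=True, text=True, timeout=120,
        )
        frozen = result.stdout.strip()
    except (OSError, subprocess.TimeoutExpired):
        frozen = "(pip freeze unavailable)"
    lines.append("pip freeze:")
    lines.append(frozen)
    return "\n".join(lines)


if __name__ == "__main__":
    print(collect_environment_report())
```

Review the output before pasting it into an issue, since environment variables may contain access keys or other secrets.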
@@ -33,8 +33,8 @@ a [file attachment](https://help.github.com/articles/file-attachments-on-issues-

## Suggesting Enhancements

This section guides you through submitting an enhancement suggestion for TRAINS, including
completely new features and minor improvements to existing functionality.
By following these guidelines, you help maintainers and the community understand your suggestion and find related suggestions.

Enhancement suggestions are tracked as GitHub issues. After you determine which repository your enhancement suggestion is related to, create an issue on that repository and provide the following:

52
docs/faq.md
@@ -1,17 +1,17 @@
# FAQ

**Can I store more information on the models? For example, can I store enumeration of classes?**

YES!

Use the SDK `set_model_label_enumeration` method:

```python
Task.current_task().set_model_label_enumeration({'label': 0})
```
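
The dictionary passed to `set_model_label_enumeration` maps each class name to an integer ID. As a minimal sketch, the mapping can be built from a list of class names; the helper `build_label_enumeration` and the class names are hypothetical and used only for illustration, and the SDK call is left as a comment because it needs an initialized task.

```python
def build_label_enumeration(class_names):
    """Map each class name to an integer ID, following the order of the list."""
    return {name: index for index, name in enumerate(class_names)}


labels = build_label_enumeration(["background", "cat", "dog"])
print(labels)  # {'background': 0, 'cat': 1, 'dog': 2}

# With the TRAINS SDK installed and a task initialized, the mapping would be stored with:
# Task.current_task().set_model_label_enumeration(labels)
```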

**Can I store the model configuration file as well?**

YES!

Use the SDK `set_model_design` method:

@@ -24,7 +24,7 @@ Task.current_task().set_model_design( ‘a very long text of the configuration f

YES!

Use an SDK [Logger](https://github.com/allegroai/trains/blob/master/trains/logger.py) object. An instance can always be retrieved with `Task.current_task().get_logger()`:

```python
logger = Task.current_task().get_logger()
@@ -33,11 +33,11 @@ logger.report_scalar("loss", "classification", iteration=42, value=1.337)

TRAINS supports scalars, plots, 2d/3d scatter diagrams, histograms, surface diagrams, confusion matrices, images, and text logging.

An example can be found [here](https://github.com/allegroai/trains/blob/master/examples/manual_reporting.py).

**I noticed that all of my experiments appear as “Training”. Are there other options?**

YES!

When creating experiments and calling `Task.init`, you can pass an experiment type.
The currently supported types are `Task.TaskTypes.training` and `Task.TaskTypes.testing`:
@@ -46,20 +46,20 @@ The currently supported types are `Task.TaskTypes.training` and `Task.TaskTypes.
task = Task.init(project_name, task_name, Task.TaskTypes.testing)
```

If you feel we should add a few more, let us know in the [issues](https://github.com/allegroai/trains/issues) section.

**I noticed I keep getting a message “warning: uncommitted code”. What does it mean?**

TRAINS not only detects your current repository and git commit,
but it also warns you if you are using uncommitted code. TRAINS does this
because uncommitted code means it will be difficult to reproduce this experiment.
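
To check for uncommitted code yourself before starting an experiment, the idea can be sketched with plain git. This is an illustration only, not how TRAINS implements its detection: the function names are hypothetical, and the sketch assumes `git` is available on the `PATH`.

```python
import subprocess


def has_uncommitted_changes(porcelain_output):
    """Return True if `git status --porcelain` output lists any modified, staged, or untracked path."""
    return any(line.strip() for line in porcelain_output.splitlines())


def repo_is_dirty(repo_dir="."):
    """Run `git status --porcelain` in repo_dir and report whether anything is uncommitted."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return has_uncommitted_changes(result.stdout)
```

Parsing is kept separate from the subprocess call so the check can be exercised without a repository, for example `has_uncommitted_changes(" M train.py\n")` returns `True` and `has_uncommitted_changes("")` returns `False`.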

**Is there something you can do about uncommitted code running?**

YES!

TRAINS currently stores the git diff together with the project.
The Web-App will soon present the git diff as well. This is coming very soon!

**I read that there is a feature for centralized model storage. How do I use it?**

@@ -70,7 +70,7 @@ Task.init(project_name, task_name, output_uri=’/mnt/shared/folder’)
```

All of the stored snapshots are copied into a subfolder whose name contains the task ID, for example:

`/mnt/shared/folder/task_6ea4f0b56d994320a713aeaf13a86d9d/models/`
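
To locate the snapshots of a given task programmatically, the layout shown in the example path can be reproduced with a few lines. This is a sketch that assumes the `task_<task ID>/models/` pattern seen in that single example holds in general; the helper `snapshot_folder` is hypothetical and not part of the SDK.

```python
import posixpath


def snapshot_folder(output_uri, task_id):
    """Compose <output_uri>/task_<task_id>/models/ following the example layout."""
    return posixpath.join(output_uri, "task_" + task_id, "models") + "/"


print(snapshot_folder("/mnt/shared/folder", "6ea4f0b56d994320a713aeaf13a86d9d"))
# /mnt/shared/folder/task_6ea4f0b56d994320a713aeaf13a86d9d/models/
```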

Other options include:

@@ -83,11 +83,11 @@ Task.init(project_name, task_name, output_uri=’s3://bucket/folder’)
Task.init(project_name, task_name, output_uri='gs://bucket/folder')
```

These require configuring the cloud storage credentials in `~/trains.conf` (see an [example](https://github.com/allegroai/trains/blob/master/docs/trains.conf)).

**I am training multiple models at the same time, but I only see one of them. What happened?**

This will be fixed in a future version. Currently, TRAINS does support multiple models
from the same task/experiment so you can find all the models in the project Models tab.
In the Task view, we only present the last one.

@@ -95,25 +95,25 @@ In the Task view, we only present the last one.

YES!

See [InputModel](https://github.com/allegroai/trains/blob/master/trains/model.py#L319) and [OutputModel](https://github.com/allegroai/trains/blob/master/trains/model.py#L539).

For example:

```python
input_model = InputModel.import_model(link_to_initial_model_file)
Task.current_task().connect(input_model)
OutputModel(Task.current_task()).update_weights(link_to_new_model_file_here)
```

**I am using Jupyter Notebook. Is this supported?**

YES!

Jupyter Notebook is supported.

**I do not use ArgParser for hyper-parameters. Do you have a solution?**

YES!

TRAINS supports using a Python dictionary for hyper-parameter logging.

@@ -125,22 +125,22 @@ From this point onward, not only are the dictionary key/value pairs stored, but

**Git is not well supported in Jupyter. We just gave up on properly committing our code. Do you have a solution?**

YES!

Check our [trains-jupyter-plugin](https://github.com/allegroai/trains-jupyter-plugin). It is a Jupyter plugin that allows you to commit your notebook directly from Jupyter. It also saves the Python version of the code and creates an updated `requirements.txt` so you know which packages you were using.

**Can I use TRAINS with scikit-learn?**

YES!

scikit-learn is supported. Everything you do is logged, with the caveat that models are not logged automatically.
Models are not logged automatically because, in most cases, scikit-learn is simply pickling the object to files so there is no underlying frame to connect to.

**I am working with PyCharm and remotely debugging a machine, but the git repo is not detected. Do you have a solution?**

YES!

This is such a common occurrence that we created a PyCharm plugin that allows for a remote debugger to grab your local repository / commit ID. See our [trains-pycharm-plugin](https://github.com/allegroai/trains-pycharm-plugin) repository for instructions and [latest release](https://github.com/allegroai/trains-pycharm-plugin/releases).

**How do I know a new version came out?**

@@ -148,11 +148,11 @@ Unfortunately, TRAINS currently does not support auto-update checks. We hope to

**Sometimes I see experiments as running while they are not. What is it?**

When the Python process exits in an orderly fashion, TRAINS closes the experiment.
If a process crashes, then sometimes the stop signal is missed. You can safely right click on the experiment in the Web-App and stop it.

**In the experiment log tab, I’m missing the first log lines. Where are they?**

Unfortunately, due to speed/optimization issues, we opted to display only the last several hundred log lines. The full log can be downloaded from the Web-App.

131
docs/trains.conf
Normal file
@@ -0,0 +1,131 @@
# TRAINS SDK configuration file
api {
    host: http://localhost:8008
    credentials {"access_key": "EGRTCO8JMSIGI6S39GTP43NFWXDQOW", "secret_key": "x!XTov_G-#vspE*Y(h$Anm&DIc5Ou-F)jsl$PdOyj5wG1&E!Z8"}
}
sdk {
    # TRAINS - default SDK configuration

    storage {
        cache {
            # Defaults to system temp folder / cache
            default_base_dir: "~/.trains/cache"
        }
    }

    metrics {
        # History size for debug files per metric/variant. For each metric/variant combination with an attached file
        # (e.g. debug image event), file names for the uploaded files will be recycled in such a way that no more than
        # X files are stored in the upload destination for each metric/variant combination.
        file_history_size: 100

        # Settings for generated debug images
        images {
            format: JPEG
            quality: 87
            subsampling: 0
        }
    }

    network {
        metrics {
            # Number of threads allocated to uploading files (typically debug images) when transmitting metrics for
            # a specific iteration
            file_upload_threads: 4

            # Warn about upload starvation if no uploads were made in specified period while file-bearing events keep
            # being sent for upload
            file_upload_starvation_warning_sec: 120
        }

        iteration {
            # Max number of retries when getting frames if the server returned an error (http code 500)
            max_retries_on_server_error: 5
            # Backoff factor for consecutive retry attempts.
            # SDK will wait for {backoff factor} * (2 ^ ({number of total retries} - 1)) between retries.
            retry_backoff_factor_sec: 10
        }
    }
    aws {
        s3 {
            # S3 credentials, used for read/write access by various SDK elements

            # default, used for any bucket not specified below
            key: ""
            secret: ""
            region: ""

            credentials: [
                # specifies key/secret credentials to use when handling s3 urls (read or write)
                # {
                #     bucket: "my-bucket-name"
                #     key: "my-access-key"
                #     secret: "my-secret-key"
                # },
                # {
                #     # This will apply to all buckets in this host (unless key/value is specifically provided for a given bucket)
                #     host: "my-minio-host:9000"
                #     key: "12345678"
                #     secret: "12345678"
                #     multipart: false
                #     secure: false
                # }
            ]
        }
        boto3 {
            pool_connections: 512
            max_multipart_concurrency: 16
        }
    }
    google.storage {
        # # Default project and credentials file
        # # Will be used when no bucket configuration is found
        # project: "trains"
        # credentials_json: "/path/to/credentials.json"

        # # Specific credentials per bucket and sub directory
        # credentials = [
        #     {
        #         bucket: "my-bucket"
        #         subdir: "path/in/bucket" # Not required
        #         project: "trains"
        #         credentials_json: "/path/to/credentials.json"
        #     },
        # ]
    }

    log {
        # debugging feature: set this to true to make null log propagate messages to root logger (so they appear in stdout)
        null_log_propagate: False
        task_log_buffer_capacity: 66

        # disable urllib info and lower levels
        disable_urllib3_info: True
    }

    development {
        # Development-mode options

        # dev task reuse window
        task_reuse_time_window_in_hours: 72.0

        # Run VCS repository detection asynchronously
        vcs_repo_detect_async: False

        # Store uncommitted git/hg source code diff in experiment manifest when training in development mode
        # This stores "git diff" or "hg diff" into the experiment's "script.requirements.diff" section
        store_uncommitted_code_diff_on_train: True

        # Support stopping an experiment in case it was externally stopped, status was changed or task was reset
        support_stopping: True

        # Development mode worker
        worker {
            # Status report period in seconds
            report_period_sec: 2

            # Log all stdout & stderr
            log_stdout: True
        }
    }
}
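
The retry wait described in the `retry_backoff_factor_sec` comment of `docs/trains.conf` grows exponentially with each attempt. As a rough sketch of that arithmetic, assuming the formula is applied literally from the first retry onward (the SDK's actual retry handling may differ), the delays for the default values work out as follows. The function name `retry_delays` is hypothetical.

```python
def retry_delays(backoff_factor_sec=10, max_retries=5):
    """Delay in seconds before each retry: backoff_factor * 2 ** (retry_number - 1)."""
    # Note: "2 ^ n" in the config comment means exponentiation, which is ** in Python.
    return [backoff_factor_sec * 2 ** (n - 1) for n in range(1, max_retries + 1)]


print(retry_delays())  # [10, 20, 40, 80, 160]
```

With the defaults (`retry_backoff_factor_sec: 10`, `max_retries_on_server_error: 5`), the waits add up to 310 seconds, which is worth keeping in mind when lowering or raising either value.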