mirror of
https://github.com/clearml/clearml
synced 2025-06-26 18:16:07 +00:00
Documentation
parent
22b18e7338
commit
a0ecc6d516
149
README.md
@ -1,127 +1,72 @@
|
|||||||
# TRAINS
|
# TRAINS
|
||||||
## Auto-Magical Experiment Manager & Version Control for AI
|
## Auto-Magical Experiment Manager & Version Control for AI
|
||||||
|
|
||||||
<p style="font-size:1.2rem; font-weight:700;">"Because it’s a jungle out there"</p>
|
"Because it’s a jungle out there"
|
||||||
|
|
||||||
[](https://img.shields.io/github/license/allegroai/trains.svg)
|
[](https://img.shields.io/github/license/allegroai/trains.svg)
|
||||||
[](https://img.shields.io/pypi/pyversions/trains.svg)
|
[](https://img.shields.io/pypi/pyversions/trains.svg)
|
||||||
[](https://img.shields.io/pypi/v/trains.svg)
|
[](https://img.shields.io/pypi/v/trains.svg)
|
||||||
[](https://pypi.python.org/pypi/trains/)
|
[](https://pypi.python.org/pypi/trains/)
|
||||||
|
|
||||||
Behind every great scientist are great repeatable methods. Sadly, this is easier said than done.
|
TRAINS is our solution to a problem we share with countless other researchers and developers in the machine
|
||||||
|
learning/deep learning universe: Training production-grade deep learning models is a glorious but messy process.
|
||||||
When talented scientists, engineers, or developers work on their own, a mess may be unavoidable.
|
TRAINS tracks and controls the process by associating code version control, research projects,
|
||||||
Yet, it may still be manageable. However, with time and more people joining your project, managing the clutter takes
|
performance metrics, and model provenance.
|
||||||
its toll on productivity. As your project moves toward production, visibility and provenance for scaling your
|
|
||||||
deep-learning efforts are a must.
|
|
||||||
|
|
||||||
For teams or entire companies, TRAINS logs everything in one central server and takes on the responsibilities for
|
|
||||||
visibility and provenance so productivity does not suffer. TRAINS records and manages various deep learning
|
|
||||||
research workloads and does so with practically zero integration costs.
|
|
||||||
|
|
||||||
We designed TRAINS specifically to require effortless integration so that teams can preserve their existing methods
|
We designed TRAINS specifically to require effortless integration so that teams can preserve their existing methods
|
||||||
and practices. Use it on a daily basis to boost collaboration and visibility, or use it to automatically collect
|
and practices. Use it on a daily basis to boost collaboration and visibility, or use it to automatically collect
|
||||||
your experimentation logs, outputs, and data to one centralized server.
|
your experimentation logs, outputs, and data to one centralized server.
|
||||||
|
|
||||||
(See TRAINS live at [https://demoapp.trainsai.io](https://demoapp.trainsai.io))
|
(Experience TRAINS live at [https://demoapp.trainsai.io](https://demoapp.trainsai.io))
|
||||||

|

|
||||||
|
|
||||||
|
|
||||||
## Main Features
|
## Main Features
|
||||||
|
|
||||||
TRAINS is our solution to a problem we shared with countless other researchers and developers in the machine
|
* Seamless integration with leading frameworks, including: *PyTorch*, *TensorFlow*, *Keras*, and others coming soon
|
||||||
learning/deep learning universe: Training production-grade deep learning models is a glorious but messy process.
|
* Support for *Jupyter Notebook* and *PyCharm* remote debugging
|
||||||
TRAINS tracks and controls the process by associating code version control, research projects,
|
* Automatic log collection.
|
||||||
performance metrics, and model provenance.
|
* Query, Filter, and Compare your experiment data and results
|
||||||
|
|
||||||
* Start today!
|
|
||||||
* TRAINS is free and open-source
|
|
||||||
* TRAINS requires only two lines of code for full integration
|
|
||||||
* Use it with your favorite tools
|
|
||||||
* Seamless integration with leading frameworks, including: *PyTorch*, *TensorFlow*, *Keras*, and others coming soon
|
|
||||||
* Support for *Jupyter Notebook* (see [trains-jupyter-plugin](https://github.com/allegroai/trains-jupyter-plugin))
|
|
||||||
and *PyCharm* remote debugging (see [trains-pycharm-plugin](https://github.com/allegroai/trains-pycharm-plugin))
|
|
||||||
* Log everything. Experiments become truly repeatable
|
|
||||||
* Model logging with **automatic association** of **model + code + parameters + initial weights**
|
|
||||||
* Automatically create a copy of models on centralized storage
|
|
||||||
([supports shared folders, S3, GS,](https://github.com/allegroai/trains/blob/master/docs/faq.md#i-read-there-is-a-feature-for-centralized-model-storage-how-do-i-use-it-) and Azure is coming soon!)
|
|
||||||
* Share and collaborate
|
* Share and collaborate
|
||||||
* Multi-user process tracking and collaboration
|
|
||||||
* Centralized server for aggregating logs, records, and general bookkeeping
|
**A detailed overview of the TRAINS offering and system design can be found [here](https://github.com/allegroai/trains/blob/master/docs/brief.md).**
|
||||||
* Increase productivity
|
|
||||||
* Comprehensive **experiment comparison**: code commits, initial weights, hyper-parameters and metric results
|
|
||||||
* Order & Organization
|
|
||||||
* Manage and organize your experiments in projects
|
|
||||||
* Query capabilities; sort and filter experiments by results metrics
|
|
||||||
* And more
|
|
||||||
* Stop an experiment on a remote machine using the web-app
|
|
||||||
* A field-tested, feature-rich SDK for your on-the-fly customization needs
|
|
||||||
|
|
||||||
|
|
||||||
## TRAINS Automatically Logs
|
## Using TRAINS
|
||||||
|
|
||||||
* Git repository, branch, commit id and entry point (git diff coming soon)
|
|
||||||
* Hyper-parameters, including
|
|
||||||
* ArgParser for command line parameters with currently used values
|
|
||||||
* Tensorflow Defines (absl-py)
|
|
||||||
* Explicit parameters dictionary
|
|
||||||
* Initial model weights file
|
|
||||||
* Model snapshots
|
|
||||||
* stdout and stderr
|
|
||||||
* Tensorboard/TensorboardX scalars, metrics, histograms, images (with audio coming soon)
|
|
||||||
* Matplotlib
|
|
||||||
|
|
||||||
|
|
||||||
## See for Yourself
|
|
||||||
|
|
||||||
We have a demo server up and running at https://demoapp.trainsai.io. You can try out TRAINS and test your code with it.
|
We have a demo server up and running at https://demoapp.trainsai.io. You can try out TRAINS and test your code with it.
|
||||||
Note that it resets every 24 hours and all of the data is deleted.
|
Note that it resets every 24 hours and all of the data is deleted.
|
||||||
|
|
||||||
Connect your code with TRAINS:
|
When you are ready to use your own TRAINS server, go ahead and [install *TRAINS-server*](#configuring-your-own-trains).
|
||||||
|
|
||||||
|
TRAINS requires only two lines of code for full integration.
|
||||||
|
|
||||||
|
To connect your code with TRAINS:
|
||||||
|
|
||||||
1. Install TRAINS
|
1. Install TRAINS
|
||||||
|
|
||||||
pip install trains
|
pip install trains
|
||||||
|
|
||||||
1. Add the following lines to your code
|
2. Add the following lines to your code
|
||||||
|
|
||||||
from trains import Task
|
from trains import Task
|
||||||
task = Task.init(project_name="my project", task_name="my task")
|
task = Task.init(project_name="my project", task_name="my task")
|
||||||
|
|
||||||
1. Run your code. When TRAINS connects to the server, a link is printed. For example
|
* If project_name is not provided, the repository name will be used instead
|
||||||
|
* If task_name (experiment) is not provided, the current filename will be used instead
|
||||||
|
|
||||||
|
3. Run your code. When TRAINS connects to the server, a link is printed. For example
|
||||||
|
|
||||||
TRAINS Results page:
|
TRAINS Results page:
|
||||||
https://demoapp.trainsai.io/projects/76e5e2d45e914f52880621fe64601e85/experiments/241f06ae0f5c4b27b8ce8b64890ce152/output/log
|
https://demoapp.trainsai.io/projects/76e5e2d45e914f52880621fe64601e85/experiments/241f06ae0f5c4b27b8ce8b64890ce152/output/log
|
||||||
|
|
||||||
1. Open the link and view your experiment parameters, model and tensorboard metrics
|
4. Open the link and view your experiment parameters, model and tensorboard metrics
|
||||||
|
|
||||||
|
## Configuring Your Own TRAINS
|
||||||
|
|
||||||
## How TRAINS Works
|
1. Install and run *TRAINS-server* (see [Installing the TRAINS Server](https://github.com/allegroai/trains-server))
|
||||||
|
|
||||||
TRAINS is a two part solution:
|
2. Run the initial configuration wizard for your TRAINS installation and follow the instructions to set up the TRAINS package
|
||||||
|
(http://**_trains-server-ip_**:__port__ and user credentials)
|
||||||
1. TRAINS [python package](https://pypi.org/project/trains/) (auto-magically connects your code, see [Using TRAINS](#using-trains))
|
|
||||||
2. [TRAINS-server](https://github.com/allegroai/trains-server) for logging, querying, control and UI ([Web-App](https://github.com/allegroai/trains-web))
|
|
||||||
|
|
||||||
The following diagram illustrates the interaction of the [TRAINS-server](https://github.com/allegroai/trains-server)
|
|
||||||
and a GPU training machine using the TRAINS python package
|
|
||||||
|
|
||||||
<!---
|
|
||||||

|
|
||||||
-->
|
|
||||||
<img src="https://github.com/allegroai/trains/blob/master/docs/system_diagram.png?raw=true" width="50%">
|
|
||||||
|
|
||||||
|
|
||||||
## Installing and Configuring TRAINS
|
|
||||||
|
|
||||||
1. Install and run trains-server (see [Installing the TRAINS Server](https://github.com/allegroai/trains-server))
|
|
||||||
|
|
||||||
2. Install TRAINS package
|
|
||||||
|
|
||||||
pip install trains
|
|
||||||
|
|
||||||
3. Run the initial configuration wizard and follow the instructions to set up the TRAINS package
|
|
||||||
(http://**_trains-server ip_**:__port__ and user credentials)
|
|
||||||
|
|
||||||
trains-init
|
trains-init
|
||||||
|
|
||||||
@ -129,30 +74,10 @@ After installing and configuring, you can access your configuration file at `~/t
|
|||||||
|
|
||||||
Sample configuration file available [here](https://github.com/allegroai/trains/blob/master/docs/trains.conf).
|
Sample configuration file available [here](https://github.com/allegroai/trains/blob/master/docs/trains.conf).
|
||||||
|
|
||||||
## Using TRAINS
|
|
||||||
|
|
||||||
Add the following two lines to the beginning of your code
|
|
||||||
|
|
||||||
from trains import Task
|
|
||||||
task = Task.init(project_name, task_name)
|
|
||||||
|
|
||||||
* If project_name is not provided, the repository name will be used instead
|
|
||||||
* If task_name (experiment) is not provided, the current filename will be used instead
|
|
||||||
|
|
||||||
Executing your script prints a direct link to the experiment results page, for example:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
TRAINS Results page:
|
|
||||||
|
|
||||||
https://demoapp.trainsai.io/projects/76e5e2d45e914f52880621fe64601e85/experiments/241f06ae0f5c4b27b8ce8b64890ce152/output/log
|
|
||||||
```
|
|
||||||
|
|
||||||
*For more examples and use cases*, see [examples](https://github.com/allegroai/trains/blob/master/docs/trains_examples.md).
|
*For more examples and use cases*, see [examples](https://github.com/allegroai/trains/blob/master/docs/trains_examples.md).
|
||||||
|
|
||||||

|
## Who We Are
|
||||||
|
|
||||||
|
|
||||||
## Who Supports TRAINS?
|
|
||||||
|
|
||||||
TRAINS is supported by the same team behind *allegro.ai*,
|
TRAINS is supported by the same team behind *allegro.ai*,
|
||||||
where we build deep learning pipelines and infrastructure for enterprise companies.
|
where we build deep learning pipelines and infrastructure for enterprise companies.
|
||||||
@ -172,13 +97,19 @@ even though this project is currently in the beta stage, your logs and data will
|
|||||||
|
|
||||||
Apache License, Version 2.0 (see the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0.html) for more information)
|
Apache License, Version 2.0 (see the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0.html) for more information)
|
||||||
|
|
||||||
## Guidelines for Contributing
|
## Community
|
||||||
|
|
||||||
|
If you have any questions, look to the TRAINS [FAQ](https://github.com/allegroai/trains/blob/master/docs/faq.md), or
|
||||||
|
tag your questions on [stackoverflow](https://stackoverflow.com/questions/tagged/trains) with the 'trains' tag.
|
||||||
|
|
||||||
|
For feature requests or bug reports, please use [GitHub issues](https://github.com/allegroai/trains/issues).
|
||||||
|
|
||||||
|
Additionally, you can always find us at support@allegro.ai.
|
||||||
|
|
||||||
|
## Contributing
|
||||||
|
|
||||||
See the TRAINS [Guidelines for Contributing](https://github.com/allegroai/trains/blob/master/docs/contributing.md).
|
See the TRAINS [Guidelines for Contributing](https://github.com/allegroai/trains/blob/master/docs/contributing.md).
|
||||||
|
|
||||||
## FAQ
|
|
||||||
|
|
||||||
See the TRAINS [FAQ](https://github.com/allegroai/trains/blob/master/docs/faq.md).
|
_May the force (and the goddess of learning rates) be with you!_
|
||||||
|
|
||||||
<p style="font-size:0.9rem; font-weight:700; font-style:italic">May the force (and the goddess of learning rates) be with you!</p>
|
|
||||||
|
|
||||||
|
66
docs/brief.md
Normal file
@ -0,0 +1,66 @@
|
|||||||
|
# What is TRAINS?
|
||||||
|
Behind every great scientist are great repeatable methods. Sadly, this is easier said than done.
|
||||||
|
|
||||||
|
When talented scientists, engineers, or developers work on their own, a mess may be unavoidable.
|
||||||
|
Yet, it may still be manageable. However, with time and more people joining your project, managing the clutter takes
|
||||||
|
its toll on productivity. As your project moves toward production, visibility and provenance for scaling your
|
||||||
|
deep-learning efforts are a must.
|
||||||
|
|
||||||
|
For teams or entire companies, TRAINS logs everything in one central server and takes on the responsibilities for
|
||||||
|
visibility and provenance so productivity does not suffer. TRAINS records and manages various deep learning
|
||||||
|
research workloads and does so with practically zero integration costs.
|
||||||
|
|
||||||
|
We designed TRAINS specifically to require effortless integration so that teams can preserve their existing methods
|
||||||
|
and practices. Use it on a daily basis to boost collaboration and visibility, or use it to automatically collect
|
||||||
|
your experimentation logs, outputs, and data to one centralized server.
|
||||||
|
|
||||||
|
## Main Features
|
||||||
|
|
||||||
|
* Integrate with your current work flow with minimal effort
|
||||||
|
* Seamless integration with leading frameworks, including: *PyTorch*, *TensorFlow*, *Keras*, and others coming soon
|
||||||
|
* Support for *Jupyter Notebook* (see [trains-jupyter-plugin](https://github.com/allegroai/trains-jupyter-plugin))
|
||||||
|
and *PyCharm* remote debugging (see [trains-pycharm-plugin](https://github.com/allegroai/trains-pycharm-plugin))
|
||||||
|
* Log everything. Experiments become truly repeatable
|
||||||
|
* Model logging with **automatic association** of **model + code + parameters + initial weights**
|
||||||
|
* Automatically create a copy of models on centralized storage
|
||||||
|
([supports shared folders, S3, GS,](https://github.com/allegroai/trains/blob/master/docs/faq.md#i-read-there-is-a-feature-for-centralized-model-storage-how-do-i-use-it-) and Azure is coming soon!)
|
||||||
|
* Share and collaborate
|
||||||
|
* Multi-user process tracking and collaboration
|
||||||
|
* Centralized server for aggregating logs, records, and general bookkeeping
|
||||||
|
* Increase productivity
|
||||||
|
* Comprehensive **experiment comparison**: code commits, initial weights, hyper-parameters and metric results
|
||||||
|
* Order & Organization
|
||||||
|
* Manage and organize your experiments in projects
|
||||||
|
* Query capabilities; sort and filter experiments by results metrics
|
||||||
|
* And more
|
||||||
|
* Stop an experiment on a remote machine using the web-app
|
||||||
|
* A field-tested, feature-rich SDK for your on-the-fly customization needs
|
||||||
|
|
||||||
|
## TRAINS Automatically Logs
|
||||||
|
|
||||||
|
* Git repository, branch, commit id and entry point (git diff coming soon)
|
||||||
|
* Hyper-parameters, including
|
||||||
|
* ArgParser for command line parameters with currently used values
|
||||||
|
* Tensorflow Defines (absl-py)
|
||||||
|
* Explicit parameters dictionary
|
||||||
|
* Initial model weights file
|
||||||
|
* Model snapshots
|
||||||
|
* stdout and stderr
|
||||||
|
* Tensorboard/TensorboardX scalars, metrics, histograms, images (with audio coming soon)
|
||||||
|
* Matplotlib
|
||||||
|
|
||||||
|
## How TRAINS Works
|
||||||
|
|
||||||
|
TRAINS is a two part solution:
|
||||||
|
|
||||||
|
1. TRAINS [python package](https://pypi.org/project/trains/) (auto-magically connects your code, see [Using TRAINS](https://github.com/allegroai/trains#using-trains))
|
||||||
|
2. [TRAINS-server](https://github.com/allegroai/trains-server) for logging, querying, control and UI ([Web-App](https://github.com/allegroai/trains-web))
|
||||||
|
|
||||||
|
The following diagram illustrates the interaction of the [TRAINS-server](https://github.com/allegroai/trains-server)
|
||||||
|
and a GPU training machine using the TRAINS python package
|
||||||
|
|
||||||
|
<!---
|
||||||
|

|
||||||
|
-->
|
||||||
|
<img src="https://github.com/allegroai/trains/blob/master/docs/system_diagram.png?raw=true" width="50%">
|
||||||
|
|
291
docs/faq.md
@ -1,29 +1,69 @@
|
|||||||
# TRAINS FAQ
|
# TRAINS FAQ
|
||||||
|
|
||||||
* [How to change the location of TRAINS configuration file](#change-config-path)
|
General Information
|
||||||
* [How to override TRAINS credentials from OS environment](#credentials-os-env)
|
|
||||||
* [How to sort models by a certain metric?](#custom-columns)
|
* [How do I know a new version came out?](#new-version-auto-update)
|
||||||
|
|
||||||
|
Configuration
|
||||||
|
|
||||||
|
* [How can I change the location of TRAINS configuration file?](#change-config-path)
|
||||||
|
* [How can I override TRAINS credentials from the OS environment?](#credentials-os-env)
|
||||||
|
|
||||||
|
Models
|
||||||
|
|
||||||
|
* [How can I sort models by a certain metric?](#custom-columns)
|
||||||
* [Can I store more information on the models?](#store-more-model-info)
|
* [Can I store more information on the models?](#store-more-model-info)
|
||||||
* [Can I store the model configuration file as well?](#store-model-configuration)
|
* [Can I store the model configuration file as well?](#store-model-configuration)
|
||||||
* [I want to add more graphs, not just with Tensorboard. Is this supported?](#more-graph-types)
|
|
||||||
* [Is there a way to create a graph comparing hyper-parameters vs model accuracy?](#compare-graph-parameters)
|
|
||||||
* [I noticed that all of my experiments appear as `Training`. Are there other options?](#other-experiment-types)
|
|
||||||
* [I noticed I keep getting the message `warning: uncommitted code`. What does it mean?](#uncommitted-code-warning)
|
|
||||||
* [Is there something TRAINS can do about uncommitted code running?](#help-uncommitted-code)
|
|
||||||
* [I read there is a feature for centralized model storage. How do I use it?](#centralized-model-storage)
|
|
||||||
* [I am training multiple models at the same time, but I only see one of them. What happened?](#only-last-model-appears)
|
* [I am training multiple models at the same time, but I only see one of them. What happened?](#only-last-model-appears)
|
||||||
* [Can I log input and output models manually?](#manually-log-models)
|
* [Can I log input and output models manually?](#manually-log-models)
|
||||||
* [I am using Jupyter Notebook. Is this supported?](#jupyter-notebook)
|
|
||||||
|
Experiments
|
||||||
|
|
||||||
|
* [I noticed I keep getting the message `warning: uncommitted code`. What does it mean?](#uncommitted-code-warning)
|
||||||
* [I do not use ArgParser for hyper-parameters. Do you have a solution?](#dont-want-argparser)
|
* [I do not use ArgParser for hyper-parameters. Do you have a solution?](#dont-want-argparser)
|
||||||
* [Git is not well supported in Jupyter, so we just gave up on committing our code. Do you have a solution?](#commit-git-in-jupyter)
|
* [I noticed that all of my experiments appear as `Training`. Are there other options?](#other-experiment-types)
|
||||||
* [Can I use TRAINS with scikit-learn?](#use-scikit-learn)
|
|
||||||
* [When using PyCharm to remotely debug a machine, the git repo is not detected. Do you have a solution?](#pycharm-remote-debug-detect-git)
|
|
||||||
* [How do I know a new version came out?](#new-version-auto-update)
|
|
||||||
* [Sometimes I see experiments as running when in fact they are not. What's going on?](#experiment-running-but-stopped)
|
* [Sometimes I see experiments as running when in fact they are not. What's going on?](#experiment-running-but-stopped)
|
||||||
|
* [My code throws an exception, but my experiment status is not "Failed". What happened?](#exception-not-failed)
|
||||||
|
* [When I run my experiment, I get an SSL Connection error [CERTIFICATE_VERIFY_FAILED]. Do you have a solution?](#ssl-connection-error)
|
||||||
|
|
||||||
|
Graphs and Logs
|
||||||
|
|
||||||
* [The first log lines are missing from the experiment log tab. Where did they go?](#first-log-lines-missing)
|
* [The first log lines are missing from the experiment log tab. Where did they go?](#first-log-lines-missing)
|
||||||
|
* [Can I create a graph comparing hyper-parameters vs model accuracy?](#compare-graph-parameters)
|
||||||
|
* [I want to add more graphs, not just with Tensorboard. Is this supported?](#more-graph-types)
|
||||||
|
|
||||||
|
GIT and Storage
|
||||||
|
|
||||||
|
* [Is there something TRAINS can do about uncommitted code running?](#help-uncommitted-code)
|
||||||
|
* [I read there is a feature for centralized model storage. How do I use it?](#centralized-model-storage)
|
||||||
|
* [When using PyCharm to remotely debug a machine, the git repo is not detected. Do you have a solution?](#pycharm-remote-debug-detect-git)
|
||||||
|
* [Git is not well supported in Jupyter, so we just gave up on committing our code. Do you have a solution?](#commit-git-in-jupyter)
|
||||||
|
|
||||||
|
Jupyter and scikit-learn
|
||||||
|
|
||||||
|
* [I am using Jupyter Notebook. Is this supported?](#jupyter-notebook)
|
||||||
|
* [Can I use TRAINS with scikit-learn?](#use-scikit-learn)
|
||||||
|
* Also see, [Git and Jupyter](#commit-git-in-jupyter)
|
||||||
|
|
||||||
|
## General Information
|
||||||
|
|
||||||
|
### How do I know a new version came out? <a name="new-version-auto-update"></a>
|
||||||
|
|
||||||
|
Starting with v0.9.3, TRAINS notifies you when a new version is released.
|
||||||
|
|
||||||
|
Example: a new client version is available
|
||||||
|
```bash
|
||||||
|
TRAINS new package available: UPGRADE to vX.Y.Z is recommended!
|
||||||
|
```
|
||||||
|
Example: a new server version is available
|
||||||
|
```bash
|
||||||
|
TRAINS-SERVER new version available: upgrade to vX.Y is recommended!
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
## How to change the location of TRAINS configuration file? <a name="change-config-path"></a>
|
## Configuration
|
||||||
|
|
||||||
|
### How can I change the location of TRAINS configuration file? <a name="change-config-path"></a>
|
||||||
|
|
||||||
Set the `TRAINS_CONFIG_FILE` OS environment variable to override the default configuration file location.
|
Set the `TRAINS_CONFIG_FILE` OS environment variable to override the default configuration file location.
|
||||||
|
|
||||||
@ -31,19 +71,19 @@ Set "TRAINS_CONFIG_FILE" OS environment variable to override the default configu
|
|||||||
export TRAINS_CONFIG_FILE="/home/user/mytrains.conf"
|
export TRAINS_CONFIG_FILE="/home/user/mytrains.conf"
|
||||||
```
|
```
|
||||||
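The same override can also be applied from Python. The following is a minimal sketch, assuming the `trains` package reads `TRAINS_CONFIG_FILE` when it is imported or initialized, so the variable must be set first. The helper name `use_trains_config` and the path are hypothetical.

```python
import os
from pathlib import Path


def use_trains_config(path):
    """Point TRAINS at a custom configuration file (hypothetical helper)."""
    # Expand "~" and normalize so the variable holds an absolute path
    config_path = Path(path).expanduser().resolve()
    # Set the variable documented above; do this before importing trains
    os.environ["TRAINS_CONFIG_FILE"] = str(config_path)
    return config_path


config_path = use_trains_config("/home/user/mytrains.conf")
print(os.environ["TRAINS_CONFIG_FILE"])
```

Setting the variable inside the script keeps the choice of configuration file next to the code that depends on it, which can be convenient when several projects share one machine.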
|
|
||||||
|
### How can I override TRAINS credentials from the OS environment? <a name="credentials-os-env"></a>
|
||||||
## How to override TRAINS credentials from OS environment? <a name="credentials-os-env"></a>
|
|
||||||
|
|
||||||
Set the OS environment variables below, in order to override the configuration file / defaults.
|
Set the OS environment variables below, in order to override the configuration file / defaults.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
export TRAINS_API_ACCESS_KEY="key_here"
|
export TRAINS_API_ACCESS_KEY="key_here"
|
||||||
export TRAINS_API_SECRET_KEY="secret_here"
|
export TRAINS_API_SECRET_KEY="secret_here"
|
||||||
export TRAINS_API_HOST="http://localhost:8080"
|
export TRAINS_API_HOST="http://localhost:8008"
|
||||||
```
|
```
|
||||||
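For scripts that should not change the shell environment permanently, the same three variables can be set from Python for the duration of a block. This is only a sketch: the context manager `trains_credentials` is a hypothetical helper, and it assumes TRAINS reads the variables when it initializes inside the `with` block.

```python
import os
from contextlib import contextmanager


@contextmanager
def trains_credentials(access_key, secret_key, api_host):
    """Temporarily override TRAINS credentials via the environment (hypothetical helper)."""
    overrides = {
        "TRAINS_API_ACCESS_KEY": access_key,
        "TRAINS_API_SECRET_KEY": secret_key,
        "TRAINS_API_HOST": api_host,
    }
    # Remember the previous values so they can be restored afterwards
    previous = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for name, value in previous.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value


with trains_credentials("key_here", "secret_here", "http://localhost:8008"):
    # Code that initializes TRAINS would run here
    print(os.environ["TRAINS_API_HOST"])
```

Restoring the previous values in the `finally` clause means a failing experiment does not leave stale credentials behind in the process environment.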
|
|
||||||
|
## Models
|
||||||
|
|
||||||
## How to sort models by a certain metric? <a name="custom-columns"></a>
|
### How can I sort models by a certain metric? <a name="custom-columns"></a>
|
||||||
|
|
||||||
Models are associated with the experiments that created them.
|
Models are associated with the experiments that created them.
|
||||||
In order to sort experiments by a specific metric, add a custom column in the experiments table,
|
In order to sort experiments by a specific metric, add a custom column in the experiments table,
|
||||||
@ -51,8 +91,7 @@ In order to sort experiments by a specific metric, add a custom column in the ex
|
|||||||
<img src="https://github.com/allegroai/trains/blob/master/docs/screenshots/set_custom_column.png?raw=true" width=25%>
|
<img src="https://github.com/allegroai/trains/blob/master/docs/screenshots/set_custom_column.png?raw=true" width=25%>
|
||||||
<img src="https://github.com/allegroai/trains/blob/master/docs/screenshots/custom_column.png?raw=true" width=25%>
|
<img src="https://github.com/allegroai/trains/blob/master/docs/screenshots/custom_column.png?raw=true" width=25%>
|
||||||
|
|
||||||
|
### Can I store more information on the models? <a name="store-more-model-info"></a>
|
||||||
## Can I store more information on the models? <a name="store-more-model-info"></a>
|
|
||||||
|
|
||||||
#### For example, can I store enumeration of classes?
|
#### For example, can I store enumeration of classes?
|
||||||
|
|
||||||
@ -62,7 +101,7 @@ Yes! Use the `Task.set_model_label_enumeration()` method:
|
|||||||
Task.current_task().set_model_label_enumeration( {"label": int(0), } )
|
Task.current_task().set_model_label_enumeration( {"label": int(0), } )
|
||||||
```
|
```
|
||||||
|
|
||||||
## Can I store the model configuration file as well? <a name="store-model-configuration"></a>
|
### Can I store the model configuration file as well? <a name="store-model-configuration"></a>
|
||||||
|
|
||||||
Yes! Use the `Task.set_model_design()` method:
|
Yes! Use the `Task.set_model_design()` method:
|
||||||
|
|
||||||
@ -70,34 +109,101 @@ Yes! Use the `Task.set_model_design()` method:
|
|||||||
Task.current_task().set_model_design("a very long text with the configuration file's content")
|
Task.current_task().set_model_design("a very long text with the configuration file's content")
|
||||||
```
|
```
|
||||||
|
|
||||||
## I want to add more graphs, not just with Tensorboard. Is this supported? <a name="more-graph-types"></a>
|
### I am training multiple models at the same time, but I only see one of them. What happened? <a name="only-last-model-appears"></a>
|
||||||
|
|
||||||
Yes! Use a [Logger](https://github.com/allegroai/trains/blob/master/trains/logger.py) object. An instance can always be retrieved using the `Task.current_task().get_logger()` method:
|
All models can be found under the project's **Models** tab.
|
||||||
|
However, the experiment's information panel currently shows only the last associated model.
|
||||||
|
|
||||||
|
This will be fixed in a future version.
|
||||||
|
|
||||||
|
### Can I log input and output models manually? <a name="manually-log-models"></a>
|
||||||
|
|
||||||
|
Yes! For example:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
# Get a logger object
|
from trains import Task, InputModel, OutputModel

input_model = InputModel.import_model(link_to_initial_model_file)
|
||||||
logger = Task.current_task().get_logger()
|
Task.current_task().connect(input_model)
|
||||||
|
|
||||||
# Report some scalar
|
OutputModel(Task.current_task()).update_weights(link_to_new_model_file_here)
|
||||||
logger.report_scalar("loss", "classification", iteration=42, value=1.337)
|
|
||||||
```
|
```
|
||||||
|
|
||||||
#### **TRAINS supports:**
|
See [InputModel](https://github.com/allegroai/trains/blob/master/trains/model.py#L319) and [OutputModel](https://github.com/allegroai/trains/blob/master/trains/model.py#L539) for more information.
|
||||||
* Scalars
|
|
||||||
* Plots
|
|
||||||
* 2D/3D Scatter Diagrams
|
|
||||||
* Histograms
|
|
||||||
* Surface Diagrams
|
|
||||||
* Confusion Matrices
|
|
||||||
* Images
|
|
||||||
* Text logs
|
|
||||||
|
|
||||||
For a more detailed example, see [here](https://github.com/allegroai/trains/blob/master/examples/manual_reporting.py).
|
## Experiments
|
||||||
|
|
||||||
|
### I noticed I keep getting the message `warning: uncommitted code`. What does it mean? <a name="uncommitted-code-warning"></a>
|
||||||
|
|
||||||
|
TRAINS not only detects your current repository and git commit,
|
||||||
|
but also warns you if you are using uncommitted code. TRAINS does this
|
||||||
|
because uncommitted code means this experiment will be difficult to reproduce.
|
||||||
|
|
||||||
|
If you still don't care, just ignore this message - it is merely a warning.
|
||||||
|
|
||||||
|
### I do not use ArgParser for hyper-parameters. Do you have a solution? <a name="dont-want-argparser"></a>
|
||||||
|
|
||||||
|
Yes! TRAINS supports using a Python dictionary for hyper-parameter logging. Just use:
|
||||||
|
|
||||||
|
```python
|
||||||
|
parameters_dict = Task.current_task().connect(parameters_dict)
|
||||||
|
```
|
||||||
|
|
||||||
|
From this point onward, not only are the dictionary key/value pairs stored as part of the experiment, but any changes to the dictionary will be automatically updated in the task's information.
|
||||||
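The one-line call above can be wrapped so the dictionary is defined in one place and connected in another. This is a hedged sketch: the parameter names and values are made up for illustration, the helper names are hypothetical, and `connect_parameters` assumes `Task.init()` has already been called and that the `trains` package is installed.

```python
def default_parameters():
    """Hypothetical hyper-parameters used only for illustration."""
    return {"learning_rate": 0.001, "batch_size": 32, "epochs": 10}


def connect_parameters(parameters):
    """Register the dictionary with the current TRAINS task."""
    # Imported lazily: requires `pip install trains` and a configured server
    from trains import Task

    # Assumes Task.init() was already called earlier in the script
    return Task.current_task().connect(parameters)
```

In a training script, call `parameters = connect_parameters(default_parameters())` right after `Task.init(...)` and read all hyper-parameters from the returned dictionary.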
|
|
||||||
|
|
||||||
## Is there a way to create a graph comparing hyper-parameters vs model accuracy? <a name="compare-graph-parameters"></a>
|
### I noticed that all of my experiments appear as `Training`. Are there other options? <a name="other-experiment-types"></a>
|
||||||
|
|
||||||
Yes, you can manually create a plot with a single point X-axis for the hyper-parameter value,
|
Yes! When creating experiments and calling `Task.init`, you can provide an experiment type.
|
||||||
|
The currently supported types are `Task.TaskTypes.training` and `Task.TaskTypes.testing`. For example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
task = Task.init(project_name, task_name, Task.TaskTypes.testing)
|
||||||
|
```
|
||||||
|
|
||||||
|
If you feel we should add a few more, let us know in the [issues](https://github.com/allegroai/trains/issues) section.
|
||||||
|
|
||||||
|
### Sometimes I see experiments as running when in fact they are not. What's going on? <a name="experiment-running-but-stopped"></a>
|
||||||
|
|
||||||
|
TRAINS monitors your Python process. When the process exits in an orderly fashion, TRAINS closes the experiment.
|
||||||
|
|
||||||
|
When the process crashes and terminates abnormally, the stop signal is sometimes missed. In such a case, you can safely right click the experiment in the Web-App and stop it.
|
||||||
|
|
||||||
|
### My code throws an exception, but my experiment status is not "Failed". What happened? <a name="exception-not-failed"></a>
|
||||||
|
|
||||||
|
This issue was resolved in v0.9.2. Upgrade TRAINS:
|
||||||
|
|
||||||
|
```pip install -U trains```
|
||||||
|
|
||||||
|
## When I run my experiment, I get an SSL Connection error [CERTIFICATE_VERIFY_FAILED]. Do you have a solution? <a name="ssl-connection-error"></a>
|
||||||
|
|
||||||
|
Your firewall may be preventing the connection. Try one of the following solutions:
|
||||||
|
|
||||||
|
1. Direct the Python `requests` package to use the enterprise certificate file by setting the OS environment variable `CURL_CA_BUNDLE` or `REQUESTS_CA_BUNDLE`.
|
||||||
|
|
||||||
|
You can see a detailed discussion at [https://stackoverflow.com/questions/48391750/disable-python-requests-ssl-validation-for-an-imported-module](https://stackoverflow.com/questions/48391750/disable-python-requests-ssl-validation-for-an-imported-module).
|
||||||
|
|
||||||
|
2. Disable certificate verification (for security reasons, this is not recommended):
|
||||||
|
|
||||||
|
1. Upgrade TRAINS to the current version:
|
||||||
|
|
||||||
|
```pip install -U trains```
|
||||||
|
|
||||||
|
1. Create a new **trains.conf** configuration file (sample file [here](https://github.com/allegroai/trains/blob/master/docs/trains.conf)), containing:
|
||||||
|
|
||||||
|
```api { verify_certificate = False }```
|
||||||
|
|
||||||
|
1. Copy the new **trains.conf** file to ~/trains.conf (on Windows: C:\Users\your_username\trains.conf)
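
For the solution that points `requests` at the enterprise certificate file, the environment variables can also be set from inside the Python process instead of in the OS. The following is only a sketch: `use_ca_bundle` is a hypothetical helper, not part of TRAINS, and the certificate path you pass it is whatever your IT department provides.

```python
import os


def use_ca_bundle(ca_bundle_path):
    """Point `requests` at a custom CA bundle through its environment variables.

    `ca_bundle_path` is the path to your enterprise certificate file
    (a placeholder in this sketch).
    """
    # `requests` consults REQUESTS_CA_BUNDLE first and falls back to CURL_CA_BUNDLE.
    os.environ["REQUESTS_CA_BUNDLE"] = ca_bundle_path
    os.environ["CURL_CA_BUNDLE"] = ca_bundle_path
    return ca_bundle_path
```

Call the helper before the first HTTPS request is made, for example at the very top of the script and before `Task.init` is called. Unlike disabling verification, this keeps certificate checks enabled.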
|
||||||
|
|
||||||
|
## Graphs and Logs
|
||||||
|
|
||||||
|
### The first log lines are missing from the experiment log tab. Where did they go? <a name="first-log-lines-missing"></a>
|
||||||
|
|
||||||
|
Due to speed/optimization issues, we opted to display only the last several hundred log lines.
|
||||||
|
|
||||||
|
You can always download the full log as a file using the Web-App.
|
||||||
|
|
||||||
|
### Can I create a graph comparing hyper-parameters vs model accuracy? <a name="compare-graph-parameters"></a>
|
||||||
|
|
||||||
|
Yes, you can manually create a plot with a single point X-axis for the hyper-parameter value,
|
||||||
and Y-Axis for the accuracy. For example:
|
and Y-Axis for the accuracy. For example:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
@ -123,34 +229,40 @@ Task.current_task().get_logger().report_vector(
|
|||||||
|
|
||||||
<img src="https://github.com/allegroai/trains/blob/master/docs/screenshots/compare_plots_hist.png?raw=true" width="50%">
|
<img src="https://github.com/allegroai/trains/blob/master/docs/screenshots/compare_plots_hist.png?raw=true" width="50%">
|
||||||
|
|
||||||
## I noticed that all of my experiments appear as `Training`. Are there other options? <a name="other-experiment-types"></a>
|
### I want to add more graphs, not just with Tensorboard. Is this supported? <a name="more-graph-types"></a>
|
||||||
|
|
||||||
Yes! When creating experiments and calling `Task.init`, you can provide an experiment type.
|
Yes! Use a [Logger](https://github.com/allegroai/trains/blob/master/trains/logger.py) object. An instance can always be retrieved using the `Task.current_task().get_logger()` method:
|
||||||
The currently supported types are `Task.TaskTypes.training` and `Task.TaskTypes.testing`. For example:
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
task = Task.init(project_name, task_name, Task.TaskTypes.testing)
|
# Get a logger object
|
||||||
|
logger = Task.current_task().get_logger()
|
||||||
|
|
||||||
|
# Report some scalar
|
||||||
|
logger.report_scalar("loss", "classification", iteration=42, value=1.337)
|
||||||
```
|
```
|
||||||
|
|
||||||
If you feel we should add a few more, let us know in the [issues](https://github.com/allegroai/trains/issues) section.
|
#### **TRAINS supports:**
|
||||||
|
|
||||||
|
* Scalars
|
||||||
|
* Plots
|
||||||
|
* 2D/3D Scatter Diagrams
|
||||||
|
* Histograms
|
||||||
|
* Surface Diagrams
|
||||||
|
* Confusion Matrices
|
||||||
|
* Images
|
||||||
|
* Text logs
|
||||||
|
|
||||||
## I noticed I keep getting the message `warning: uncommitted code`. What does it mean? <a name="uncommitted-code-warning"></a>
|
For a more detailed example, see [here](https://github.com/allegroai/trains/blob/master/examples/manual_reporting.py).
|
||||||
|
|
||||||
TRAINS not only detects your current repository and git commit,
|
## Git and Storage
|
||||||
but also warns you if you are using uncommitted code. TRAINS does this
|
|
||||||
because uncommitted code means this experiment will be difficult to reproduce.
|
|
||||||
|
|
||||||
If you still don't care, just ignore this message - it is merely a warning.
|
### Is there something TRAINS can do about uncommitted code running? <a name="help-uncommitted-code"></a>
|
||||||
|
|
||||||
|
|
||||||
## Is there something TRAINS can do about uncommitted code running? <a name="help-uncommitted-code"></a>
|
|
||||||
|
|
||||||
Yes! TRAINS currently stores the git diff as part of the experiment's information.
|
Yes! TRAINS currently stores the git diff as part of the experiment's information.
|
||||||
The Web-App will soon present the git diff as well. This is coming very soon!
|
The Web-App will soon present the git diff as well. This is coming very soon!
|
||||||
|
|
||||||
|
|
||||||
## I read there is a feature for centralized model storage. How do I use it? <a name="centralized-model-storage"></a>
|
### I read there is a feature for centralized model storage. How do I use it? <a name="centralized-model-storage"></a>
|
||||||
|
|
||||||
When calling `Task.init()`, providing the `output_uri` parameter allows you to specify the location in which model snapshots will be stored.
|
When calling `Task.init()`, providing the `output_uri` parameter allows you to specify the location in which model snapshots will be stored.
|
||||||
|
|
||||||
@ -182,74 +294,25 @@ taks = Task.init(project_name, task_name, output_uri="gs://bucket-name/folder")
|
|||||||
For a more detailed example, see [here](https://github.com/allegroai/trains/blob/master/docs/trains.conf#L55).
|
For a more detailed example, see [here](https://github.com/allegroai/trains/blob/master/docs/trains.conf#L55).
|
||||||
|
|
||||||
|
|
||||||
## I am training multiple models at the same time, but I only see one of them. What happened? <a name="only-last-model-appears"></a>
|
|
||||||
|
|
||||||
Although all models can be found under the project's **Models** tab, TRAINS currently shows only the last model associated with an experiment in the experiment's information panel.
|
### When using PyCharm to remotely debug a machine, the git repo is not detected. Do you have a solution? <a name="pycharm-remote-debug-detect-git"></a>
|
||||||
|
|
||||||
This will be fixed in a future version.
|
Yes! Since this is such a common occurrence, we created a PyCharm plugin that allows a remote debugger to grab your local repository / commit ID. See our [TRAINS PyCharm Plugin](https://github.com/allegroai/trains-pycharm-plugin) repository for instructions and the [latest release](https://github.com/allegroai/trains-pycharm-plugin/releases).
|
||||||
|
|
||||||
## Can I log input and output models manually? <a name="manually-log-models"></a>
|
### Git is not well supported in Jupyter, so we just gave up on committing our code. Do you have a solution? <a name="commit-git-in-jupyter"></a>
|
||||||
|
|
||||||
Yes! For example:
|
|
||||||
|
|
||||||
```python
|
|
||||||
input_model = InputModel.import_model(link_to_initial_model_file)
|
|
||||||
Task.current_task().connect(input_model)
|
|
||||||
|
|
||||||
OutputModel(Task.current_task()).update_weights(link_to_new_model_file_here)
|
|
||||||
```
|
|
||||||
|
|
||||||
See [InputModel](https://github.com/allegroai/trains/blob/master/trains/model.py#L319) and [OutputModel](https://github.com/allegroai/trains/blob/master/trains/model.py#L539) for more information.
|
|
||||||
|
|
||||||
|
|
||||||
## I am using Jupyter Notebook. Is this supported? <a name="jupyter-notebook"></a>
|
|
||||||
|
|
||||||
Yes! Jupyter Notebook is supported. See [TRAINS Jupyter Plugin](https://github.com/allegroai/trains-jupyter-plugin).
|
|
||||||
|
|
||||||
|
|
||||||
## I do not use Argparser for hyper-parameters. Do you have a solution? <a name="dont-want-argparser"></a>
|
|
||||||
|
|
||||||
Yes! TRAINS supports using a Python dictionary for hyper-parameter logging. Just call:
|
|
||||||
|
|
||||||
```python
|
|
||||||
parameters_dict = Task.current_task().connect(parameters_dict)
|
|
||||||
```
|
|
||||||
|
|
||||||
From this point onward, not only are the dictionary key/value pairs stored as part of the experiment, but any changes to the dictionary will be automatically updated in the task's information.
|
|
||||||
|
|
||||||
|
|
||||||
## Git is not well supported in Jupyter, so we just gave up on committing our code. Do you have a solution? <a name="commit-git-in-jupyter"></a>
|
|
||||||
|
|
||||||
Yes! Check our [TRAINS Jupyter Plugin](https://github.com/allegroai/trains-jupyter-plugin). This plugin allows you to commit your notebook directly from Jupyter. It also saves the Python version of your code and creates an updated `requirements.txt` so you know which packages you were using.
|
Yes! Check our [TRAINS Jupyter Plugin](https://github.com/allegroai/trains-jupyter-plugin). This plugin allows you to commit your notebook directly from Jupyter. It also saves the Python version of your code and creates an updated `requirements.txt` so you know which packages you were using.
|
||||||
|
|
||||||
|
|
||||||
## Can I use TRAINS with scikit-learn? <a name="use-scikit-learn"></a>
|
## Jupyter and scikit-learn
|
||||||
|
|
||||||
|
### I am using Jupyter Notebook. Is this supported? <a name="jupyter-notebook"></a>
|
||||||
|
|
||||||
|
Yes! Jupyter Notebook is supported. See [TRAINS Jupyter Plugin](https://github.com/allegroai/trains-jupyter-plugin).
|
||||||
|
|
||||||
|
|
||||||
|
### Can I use TRAINS with scikit-learn? <a name="use-scikit-learn"></a>
|
||||||
|
|
||||||
Yes! `scikit-learn` is supported. Everything you do is logged.
|
Yes! `scikit-learn` is supported. Everything you do is logged.
|
||||||
|
|
||||||
**NOTE**: Models are not automatically logged because in most cases, scikit-learn will simply pickle the object to files so there is no underlying frame we can connect to.
|
**NOTE**: Models are not automatically logged because in most cases, scikit-learn will simply pickle the object to files so there is no underlying frame we can connect to.
|
||||||
|
|
||||||
|
|
||||||
## When using PyCharm to remotely debug a machine, the git repo is not detected. Do you have a solution? <a name="pycharm-remote-debug-detect-git"></a>
|
|
||||||
|
|
||||||
Yes! Since this is such a common occurrence, we created a PyCharm plugin that allows a remote debugger to grab your local repository / commit ID. See our [TRAINS PyCharm Plugin](https://github.com/allegroai/trains-pycharm-plugin) repository for instructions and [latest release](https://github.com/allegroai/trains-pycharm-plugin/releases).
|
|
||||||
|
|
||||||
|
|
||||||
## How do I know a new version came out? <a name="new-version-auto-update"></a>
|
|
||||||
|
|
||||||
TRAINS does not yet support auto-update checks. We hope to add this feature soon.
|
|
||||||
|
|
||||||
|
|
||||||
## Sometimes I see experiments as running when in fact they are not. What's going on? <a name="experiment-running-but-stopped"></a>
|
|
||||||
|
|
||||||
TRAINS monitors your Python process. When the process exits in an orderly fashion, TRAINS closes the experiment.
|
|
||||||
|
|
||||||
When the process crashes and terminates abnormally, the stop signal is sometimes missed. In such a case, you can safely right click the experiment in the Web-App and stop it.
|
|
||||||
|
|
||||||
|
|
||||||
## The first log lines are missing from the experiment log tab. Where did they go? <a name="first-log-lines-missing"></a>
|
|
||||||
|
|
||||||
Due to speed/optimization issues, we opted to display only the last several hundred log lines.
|
|
||||||
|
|
||||||
You can always download the full log as a file using the Web-App.
|
|
||||||
|
|
||||||
|
Binary file not shown.
Before Width: | Height: | Size: 1.6 MiB After Width: | Height: | Size: 1.6 MiB |