ClearML/clearml

mirror of https://github.com/clearml/clearml synced 2025-06-26 18:16:07 +00:00

allegroai a0ecc6d516 Documentation

2019-07-08 23:29:09 +03:00

TRAINS FAQ

General Information

How do I know a new version came out?

Configuration

How can I change the location of TRAINS configuration file?
How can I override TRAINS credentials from the OS environment?

Models

How can I sort models by a certain metric?
Can I store more information on the models?
Can I store the model configuration file as well?
I am training multiple models at the same time, but I only see one of them. What happened?
Can I log input and output models manually?

Experiments

I noticed I keep getting the message warning: uncommitted code. What does it mean?
I do not use argparse for hyper-parameters. Do you have a solution?
I noticed that all of my experiments appear as Training. Are there other options?
Sometimes I see experiments as running when in fact they are not. What's going on?
My code throws an exception, but my experiment status is not "Failed". What happened?
When I run my experiment, I get an SSL Connection error [CERTIFICATE_VERIFY_FAILED]. Do you have a solution?

Graphs and Logs

The first log lines are missing from the experiment log tab. Where did they go?
Can I create a graph comparing hyper-parameters vs model accuracy?
I want to add more graphs, not just with Tensorboard. Is this supported?

Git and Storage

Is there something TRAINS can do about uncommitted code running?
I read there is a feature for centralized model storage. How do I use it?
When using PyCharm to remotely debug a machine, the git repo is not detected. Do you have a solution?
Git is not well supported in Jupyter, so we just gave up on committing our code. Do you have a solution?

Jupyter and scikit-learn

I am using Jupyter Notebook. Is this supported?
Can I use TRAINS with scikit-learn?
Also see Git and Jupyter

General Information

How do I know a new version came out?

Starting with v0.9.3, TRAINS notifies you when a new version is released.

Example: new client version available

TRAINS new package available: UPGRADE to vX.Y.Z is recommended!

Example: new server version available

TRAINS-SERVER new version available: upgrade to vX.Y is recommended!

Configuration

How can I change the location of TRAINS configuration file?

Set the TRAINS_CONFIG_FILE OS environment variable to override the default configuration file location.

export TRAINS_CONFIG_FILE="/home/user/mytrains.conf"
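
The same override can also be applied from inside a Python script. The following is a minimal sketch: the path is the illustrative one from the shell example, and it assumes (this FAQ does not state it) that TRAINS reads the variable when the package is first imported, so the assignment should run before the import.

```python
import os

# Hypothetical path, reused from the shell example above.
CONFIG_PATH = "/home/user/mytrains.conf"

# Set the variable for this process and any child processes it starts.
# Assumption: TRAINS reads it when the package is first imported, so this
# assignment should run before "from trains import Task".
os.environ["TRAINS_CONFIG_FILE"] = CONFIG_PATH

print(os.environ["TRAINS_CONFIG_FILE"])
```

Setting the variable in the shell remains the simpler choice when the script is always launched the same way.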

How can I override TRAINS credentials from the OS environment?

Set the following OS environment variables to override the values in the configuration file and the defaults.

export TRAINS_API_ACCESS_KEY="key_here"
export TRAINS_API_SECRET_KEY="secret_here"
export TRAINS_API_HOST="http://localhost:8008"
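
When it is unclear whether the configuration file or the environment is supplying the credentials, it can help to check which of the three variables are actually set. The sketch below uses only the standard library; `active_overrides` is a hypothetical helper written for this example and is not part of TRAINS.

```python
import os

# The three override variables named in this FAQ.
OVERRIDE_VARS = (
    "TRAINS_API_ACCESS_KEY",
    "TRAINS_API_SECRET_KEY",
    "TRAINS_API_HOST",
)


def active_overrides(environ=None):
    """Return the names of the override variables that are set and non-empty."""
    environ = os.environ if environ is None else environ
    return [name for name in OVERRIDE_VARS if environ.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"TRAINS_API_HOST": "http://localhost:8008"}
print(active_overrides(example_env))
```

Calling `active_overrides()` with no argument inspects the real process environment.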

Models

How can I sort models by a certain metric?

Models are associated with the experiments that created them. To sort experiments by a specific metric, add a custom column for that metric in the experiments table, then sort by that column.

Can I store more information on the models?

For example, can I store enumeration of classes?

Yes! Use the Task.set_model_label_enumeration() method:

Task.current_task().set_model_label_enumeration({"label": 0})
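
With more than a handful of classes, the dictionary is easier to build from a list than to write by hand. This is a small sketch with hypothetical class names; it only constructs the label-to-integer mapping and leaves the TRAINS call itself as a comment.

```python
# Hypothetical class names, in the order the model's output indices use.
class_names = ["background", "cat", "dog"]

# Map each label to its integer index, the same shape as the call above.
label_enumeration = {name: index for index, name in enumerate(class_names)}

print(label_enumeration)

# The dictionary can then be passed on:
# Task.current_task().set_model_label_enumeration(label_enumeration)
```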

Can I store the model configuration file as well?

Yes! Use the Task.set_model_design() method:

Task.current_task().set_model_design("a very long text with the configuration file's content")
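
In practice the text usually comes from a file on disk. The sketch below writes a small stand-in configuration file (its name and contents are hypothetical) and reads it back into a single string, which is the form the call above expects.

```python
import os
import tempfile

# Write a small stand-in configuration file so the sketch is self-contained.
# The file name and contents are hypothetical.
config_text = "layers: 10\nactivation: relu\n"
config_path = os.path.join(tempfile.mkdtemp(), "model.yaml")
with open(config_path, "w") as f:
    f.write(config_text)

# Read the whole file into one string.
with open(config_path) as f:
    model_design = f.read()

print(len(model_design))

# The string can then be passed on:
# Task.current_task().set_model_design(model_design)
```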

I am training multiple models at the same time, but I only see one of them. What happened?

All models can be found under the project's Models tab. Currently, however, the experiment's information panel shows only the last associated model.

This will be fixed in a future version.

Can I log input and output models manually?

Yes! For example:

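from trains import Task, InputModel, OutputModel

# Register an existing model file as the experiment's input model
input_model = InputModel.import_model(link_to_initial_model_file)
Task.current_task().connect(input_model)

# Register a newly created model file as the experiment's output model
OutputModel(Task.current_task()).update_weights(link_to_new_model_file_here)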

See InputModel and OutputModel for more information.

Experiments

I noticed I keep getting the message `warning: uncommitted code`. What does it mean?

TRAINS not only detects your current repository and git commit, but also warns you if you are using uncommitted code. TRAINS does this because uncommitted code means this experiment will be difficult to reproduce.

If reproducibility is not a concern for this run, you can ignore the message. It is only a warning.

I do not use argparse for hyper-parameters. Do you have a solution?

Yes! TRAINS supports using a Python dictionary for hyper-parameter logging. Just use:

parameters_dict = Task.current_task().connect(parameters_dict)

From this point onward, not only are the dictionary key/value pairs stored as part of the experiment, but any changes to the dictionary will be automatically updated in the task's information.
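
A slightly fuller sketch of the same pattern follows. The parameter names and values are hypothetical, and the guards exist only so the snippet also runs where TRAINS is not installed or no task has been created yet; in a real training script you would call Task.init() first and connect the dictionary unconditionally.

```python
# Import TRAINS if it is installed; the sketch still runs without it.
try:
    from trains import Task
except ImportError:
    Task = None

# Hypothetical hyper-parameters.
parameters_dict = {"learning_rate": 0.001, "batch_size": 32}

# connect() is only called when a task was already created with Task.init().
task = Task.current_task() if Task is not None else None
if task is not None:
    parameters_dict = task.connect(parameters_dict)

# Later changes go through the same dictionary.
parameters_dict["batch_size"] = 64
print(parameters_dict["batch_size"])
```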

I noticed that all of my experiments appear as `Training`. Are there other options?

Yes! When creating experiments and calling Task.init, you can provide an experiment type. The currently supported types are Task.TaskTypes.training and Task.TaskTypes.testing. For example:

task = Task.init(project_name, task_name, Task.TaskTypes.testing)

If you feel we should add a few more, let us know in the issues section.

Sometimes I see experiments as running when in fact they are not. What's going on?

TRAINS monitors your Python process. When the process exits in an orderly fashion, TRAINS closes the experiment.

When the process crashes and terminates abnormally, the stop signal is sometimes missed. In such a case, you can safely right-click the experiment in the Web-App and stop it.

My code throws an exception, but my experiment status is not "Failed". What happened?

This issue was resolved in v0.9.2. Upgrade TRAINS:

pip install -U trains

When I run my experiment, I get an SSL Connection error [CERTIFICATE_VERIFY_FAILED]. Do you have a solution?

Your firewall may be preventing the connection. Try one of the following solutions:

Direct the Python requests package to use the enterprise certificate file by setting the OS environment variable CURL_CA_BUNDLE or REQUESTS_CA_BUNDLE.

You can see a detailed discussion at https://stackoverflow.com/questions/48391750/disable-python-requests-ssl-validation-for-an-imported-module.

Disable certificate verification (for security reasons, this is not recommended):
1. Upgrade TRAINS to the current version:
  
  pip install -U trains
2. Create a new trains.conf configuration file (sample file here), containing:
  
  api { verify_certificate = False }
3. Copy the new trains.conf file to ~/trains.conf (on Windows: C:\Users\your_username\trains.conf)
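
For the first option (pointing requests at the enterprise certificate file), the variable can also be set from Python before any connection is made. This is a minimal sketch; the certificate path is hypothetical and should be replaced with the bundle your IT department provides.

```python
import os

# Hypothetical location of the enterprise CA bundle.
CA_BUNDLE_PATH = "/etc/ssl/certs/enterprise-ca.pem"

# requests consults this variable when it verifies HTTPS certificates,
# so set it before the first connection to the server is opened.
os.environ["REQUESTS_CA_BUNDLE"] = CA_BUNDLE_PATH

print(os.environ["REQUESTS_CA_BUNDLE"])
```

Unlike disabling verification, this keeps certificate checks in place.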

Graphs and Logs

The first log lines are missing from the experiment log tab. Where did they go?

For performance reasons, the Web-App displays only the last several hundred log lines.

You can always download the full log as a file using the Web-App.

Can I create a graph comparing hyper-parameters vs model accuracy?

Yes, you can manually create a plot with a single point, using the X-axis for the hyper-parameter value and the Y-axis for the accuracy. For example:

number_layers = 10
accuracy = 0.95
Task.current_task().get_logger().report_scatter2d(
    "performance", "accuracy", iteration=0, 
    mode='markers', scatter=[(number_layers, accuracy)])

This assumes the hyper-parameter is "number_layers" with a current value of 10, and that the accuracy of the trained model is 0.95. The point then appears in the experiment comparison graph.

Another option is a histogram chart:

number_layers = 10
accuracy = 0.95
Task.current_task().get_logger().report_vector(
    "performance", "accuracy", iteration=0, labels=['accuracy'],
    values=[accuracy], xlabels=['number_layers %d' % number_layers])

I want to add more graphs, not just with Tensorboard. Is this supported?

Yes! Use a Logger object. An instance can always be retrieved using the Task.current_task().get_logger() method:

# Get a logger object
logger = Task.current_task().get_logger()

# Report some scalar 
logger.report_scalar("loss", "classification", iteration=42, value=1.337)

TRAINS supports:

Scalars
Plots
2D/3D Scatter Diagrams
Histograms
Surface Diagrams
Confusion Matrices
Images
Text logs

For a more detailed example, see here.

Git and Storage

Is there something TRAINS can do about uncommitted code running?

Yes! TRAINS currently stores the git diff as part of the experiment's information. The Web-App will also present the git diff in an upcoming release.

I read there is a feature for centralized model storage. How do I use it?

When calling Task.init(), providing the output_uri parameter allows you to specify the location in which model snapshots will be stored.

For example, calling:

task = Task.init(project_name, task_name, output_uri="/mnt/shared/folder")

This call tells TRAINS to copy all stored snapshots into a sub-folder under /mnt/shared/folder. The sub-folder's name will contain the experiment's ID. Assuming the experiment's ID in this example is 6ea4f0b56d994320a713aeaf13a86d9d, the following folder will be used:

/mnt/shared/folder/task_6ea4f0b56d994320a713aeaf13a86d9d/models/
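
To locate the snapshots from another script, the folder name can be rebuilt from the output_uri and the experiment's ID. The helper below is hypothetical and simply mirrors the single example above; it is not taken from the TRAINS source, so treat the layout as an assumption for anything other than that example.

```python
import posixpath


def snapshot_folder(output_uri, task_id):
    """Build the folder layout shown in the example above (hypothetical helper)."""
    return posixpath.join(output_uri, "task_" + task_id, "models") + "/"


print(snapshot_folder("/mnt/shared/folder", "6ea4f0b56d994320a713aeaf13a86d9d"))
```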

TRAINS supports more storage types for output_uri:

# AWS S3 bucket
task = Task.init(project_name, task_name, output_uri="s3://bucket-name/folder")

# Google Cloud Storage bucket
task = Task.init(project_name, task_name, output_uri="gs://bucket-name/folder")

NOTE: These require configuring the storage credentials in ~/trains.conf. For a more detailed example, see here.

When using PyCharm to remotely debug a machine, the git repo is not detected. Do you have a solution?

Yes! Since this is such a common occurrence, we created a PyCharm plugin that allows a remote debugger to grab your local repository / commit ID. See our TRAINS PyCharm Plugin repository for instructions and latest release.

Git is not well supported in Jupyter, so we just gave up on committing our code. Do you have a solution?

Yes! Check our TRAINS Jupyter Plugin. This plugin allows you to commit your notebook directly from Jupyter. It also saves the Python version of your code and creates an updated requirements.txt so you know which packages you were using.

Jupyter and scikit-learn

I am using Jupyter Notebook. Is this supported?

Yes! Jupyter Notebook is supported. See TRAINS Jupyter Plugin.

Can I use TRAINS with scikit-learn?

Yes! scikit-learn is supported. Everything you do is logged.

NOTE: Models are not automatically logged, because in most cases scikit-learn simply pickles the object to a file, so there is no underlying framework we can connect to.
