Version 0.16
:::important Trains is now ClearML. :::
Version 0.16.4
Trains
Features
- Add Hydra support (GitHub trains Issue 219).
- Add cifar ignite example (GitHub trains Issue 237).
- Add auto extraction of `tar.gz` files when using `StorageManager` (GitHub trains Issue 237).
- Add `Task.init()` argument `auto_connect_streams` controlling stdout / stderr / logging capture (GitHub trains Issue 181).
- Add carriage return flush support using the `sdk.development.worker.console_cr_flush_period` configuration setting (GitHub trains Issue 181).
- Add `Task.create_function_task()` to allow creating a new task, using a function and arguments, to be executed remotely (GitHub trains Issue 230).
- Allow disabling SSL certificates verification using `Task.setup_upload()` argument `verify` or AWS S3 bucket configuration `verify` property (GitHub trains Issue 256).
- Add `StorageManager.get_files_server()`.
- Add `Task.get_project_id()` using project name.
- Add `project_name` argument to `Task.set_project()`.
- Add `Task.connect()` support for class / instance objects.
- Add `Task.get_configuration_object()` and `Task.set_configuration_object()` for easier automation.
- Improve Auto-Scaler: allow extra configurations; key name and security group are now optional and default to empty strings.
- Use a built-in matplotlib converter.
- Add reporting text as debug sample example.
Bug Fixes
- Fix Optuna HPO parameter serializing (GitHub trains Issue 254).
- Fix connect dictionary `''` cast to `None` (GitHub trains Issue 258).
- Fix lightgbm binding keyword argument issue (GitHub trains Issue 251).
- Fix artifact preview if artifact body is remote URI (GitHub trains Issue 239).
- Fix infinite recursion in `StorageManager` upload (GitHub trains Issue 253).
- Fix keras reusing model object only if the filename is the same (GitHub trains Issue 252).
- Fix running remotely with no configuration should not crash but output a warning (GitHub trains Issue 243).
- Fix matplotlib 3.3.3 support:
  - Fix global figure enumeration.
  - Fix binding without a title reported a single plot (`untitled 00`) instead of increasing the counter.
- Fix Python 2.7 / 3.5 support.
- Fix quote issue when reporting debug images.
- Fix replace quote safe characters in upload file to include `;=@$`.
- Fix `at_exit` called from another process should be ignored.
- Fix `Task.set_tags()` for completed / published tasks.
- Fix `Task.add_tags()` not working when running remotely.
- Fix `Task.set_user_properties()` docstring and interface.
- Fix preview with JSON (dict) artifacts did not store the artifact.
- Fix `Logger.report_text()` on task created using `Task.create()` was not supported.
- Fix initialization for torch: only call torch `get_worker_info` if torch was loaded.
- Fix flush (wait) on auxiliary task (obtained using `Task.get_task()`) should wait on all upload events.
- Fix server was not updated with the defaults from the code when running remotely and configuration section is missing.
- Fix connect dict containing `None` default values, blocked the remote execution from passing string instead of None.
- Fix `Task.upload_artifact()` argument `delete_after_upload=True` used in conjunction with `wait_for_upload=True` was not supported.
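
The quoting fix in the list above is about which characters stay unescaped when an upload file name is turned into a URL. The sketch below shows the general idea with the standard library only. The helper name `quote_upload_name` is hypothetical, and keeping `/` unescaped is an assumption made here so that path separators survive; it is not taken from the Trains source.

```python
from urllib.parse import quote

# Characters left untouched when percent-encoding. ";=@$" comes from the
# release note; keeping "/" as well is an assumption of this sketch.
SAFE_CHARS = "/;=@$"


def quote_upload_name(name: str) -> str:
    """Percent-encode a file name or relative path for use in a URL (hypothetical helper)."""
    return quote(name, safe=SAFE_CHARS)


# The listed characters pass through unchanged, other special characters are escaped.
print(quote_upload_name("debug/run=1;user@host$.png"))
print(quote_upload_name("my image#1.png"))
```

Characters outside the safe set, such as spaces and `#`, are still percent-encoded, so the resulting URL stays valid.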
Version 0.16.3
Trains
Features
- Add LightGBM support.
- Add initial Hydra support (GitHub trains Issue 219).
- Add synchronous support for `Task.upload_artifact()` (GitHub trains Issue 231).
- Add `sdk.development.store_code_diff_from_remote` (default `false`) to store diff from remote HEAD instead of local HEAD (GitHub trains Issue 222).
- Add `sdk.development.detect_with_conda_freeze` (default `true`) for full conda freeze (requires trains-agent >= 0.16.2).
- Add user properties support in Task object.
- Add `Logger.report_table()` support for table as list of lists.
- Add support to split DAG and Table in pipeline DAG plot. Pipeline DAG single nodes are now round circles below the DAG graph.
- Pipeline / Optimization can now be attached to any Task (not just the current task).
- Add `force_download` flag to `StorageManager.get_local_copy()`.
- Add control over the artifact preview using `Task.upload_artifact()` `preview` argument.
- Add `Logger.report_matplotlib_figure()` with examples.
- Add `Task.set_task_type()`.
- Improve AWS auto-scaler:
  - Add key pair and security groups support.
  - Add multi-line support for both extra bash script and extra `trains.conf` data.
- Update examples.
Bug Fixes
- Fix `Task.update_output_model()` wrong argument order (GitHub trains Issue 220).
- Fix initializing task on argparse parse in remote mode. Do not call `Task.init()` to avoid auto connect, use `Task.get_task()` instead.
- Fix detected task cwd outside of repository root folder.
- Fix `Task.connect(dict)` to place non-existing entries on the section name instead of General.
- Fix `Task.clone()` support for trains-server < 0.16.
- Fix `StorageManager` cache extract zipped artifacts. Use modified time instead of access time for cached files.
- Fix diff command output was stripped.
- Make sure local packages with multi-files are marked as `package`.
- Fix `Task.set_base_docker()` should be skipped when running remotely.
- Fix ArgParser binding handling of string argument with boolean default value (affects PyTorch Lightning integration).
- When using `detect_with_pip_freeze` make sure that `package @ file://` lines are replaced with `package==x.y.z` as local file will probably not be available.
- Fix git packages to new pip standard `package @ git+`.
- Improve conda package naming `_` and `-` support.
- Do not add specific setuptools version to requirements (pip can't install it anyway).
- Fix image URL quoting when uploading from a file path.
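
The `package @ file://` replacement described in the list above can be pictured as a small rewrite pass over `pip freeze` output. This is only a sketch of the idea: the function name `pin_local_file_requirements` and the `installed_versions` mapping are hypothetical, and the real detection code may resolve versions differently.

```python
import re

# Matches "name @ file://..." lines as emitted by recent pip versions.
_LOCAL_FILE_REQ = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*@\s*file://")


def pin_local_file_requirements(freeze_lines, installed_versions):
    """Replace 'pkg @ file://...' lines with 'pkg==version' when the version is known.

    freeze_lines: iterable of requirement lines (e.g. from `pip freeze`).
    installed_versions: dict mapping package name to version (assumed to be
    collected elsewhere, e.g. from the current environment).
    """
    pinned = []
    for line in freeze_lines:
        match = _LOCAL_FILE_REQ.match(line)
        version = installed_versions.get(match.group(1)) if match else None
        # Lines that are not local-file references, or whose version is
        # unknown, are passed through unchanged.
        pinned.append("{}=={}".format(match.group(1), version) if version else line)
    return pinned


lines = [
    "numpy==1.19.2",
    "mypkg @ file:///tmp/build/mypkg",
    "trains @ git+https://github.com/allegroai/trains.git",
]
print(pin_local_file_requirements(lines, {"mypkg": "0.3.1"}))
```

Only the local-file line is rewritten; `git+` references are left alone because a remote machine can still fetch them.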
Version 0.16.2
Trains
Features
- Add `Task.set_resource_monitor_iteration_timeout()` to set ResourceMonitor iteration wait duration timeout (GitHub trains Issue 208).
- Add PyTorch Lightning save/restore model binding (GitHub trains Issue 212).
- Add `git diff` for repository submodule (requires git 2.14 or above).
- Add `TrainsJob.is_completed()` and `TrainsJob.is_aborted()`.
- Add `Task.logger` property.
- Add Pipeline Controller automation and example.
- Add improved trace filtering capabilities in `trains.debugging.trace.trace_trains()`.
- Add default help per argument (if not provided) in ArgParser binding.
- Deprecate `Task.reporter`.
- Update PyTorch example.
- Remove warning on skipped auto-magic model logging (GitHub trains Issue 206).
- Support Keras restructuring for Network, Model and Sequential.
- Update autokeras requirements according to https://github.com/keras-team/autokeras#installation.
Bug Fixes
- Fix joblib auto logging models failing on compressed streams (GitHub trains Issue 203).
- Fix sending empty reports (GitHub trains Issue 205).
- Fix scatter2d sub-sampling and rounding.
- Fix plots reporting:
  - `NaN` representation (matplotlib conversion).
  - Limit the number of digits in a plot to reduce plot size (using `sdk.metrics.plot_max_num_digits` configuration value).
- Fix `Task.wait_for_status()` to reload after it ends.
- Fix thread wait Ctrl-C interrupt did not exit process.
- Improve Windows support for installed packages analysis.
- Fix auto model logging using relative path.
- Fix Hyperparameter Optimization example.
- Fix `Task.clone()` when working with TrainsServer < 0.16.0.
- Fix pandas artifact handling.
  - Avoid adding `unnamed:0` column.
  - Return original pandas object.
- Fix `TrainsJob` hyper-params overriding order was not guaranteed.
- Fix ArgParse auto-connect to support default function type.
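
To see why limiting digits shrinks a plot, consider the sketch below, which rounds every float in a plot-like dictionary before serializing it. It assumes the limit applies to digits after the decimal point; the function name `limit_plot_digits` and the sample plot are hypothetical and do not reproduce the SDK code.

```python
import json


def limit_plot_digits(value, max_digits=5):
    """Recursively round floats in a nested plot structure (hypothetical helper)."""
    if isinstance(value, float):
        return round(value, max_digits)
    if isinstance(value, dict):
        return {key: limit_plot_digits(item, max_digits) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [limit_plot_digits(item, max_digits) for item in value]
    return value  # ints, strings, None and so on are left untouched


plot = {"data": [{"x": [0, 1, 2], "y": [0.123456789, 2 / 3, 1.0]}]}
trimmed = limit_plot_digits(plot, max_digits=3)

# The serialized plot gets shorter once the floats carry fewer digits.
print(len(json.dumps(plot)), "->", len(json.dumps(trimmed)))
```

For plots with many thousands of points, dropping digits that are invisible at screen resolution can reduce the stored JSON noticeably.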
Trains-Agent
Features
- conda:
  - Add `agent.package_manager.conda_env_as_base_docker` allowing "docker_cmd" to contain link to a full pre-packaged conda environment (`tar.gz` created by `conda-pack`). Use `TRAINS_CONDA_ENV_PACKAGE` environment variable to specify `conda tar.gz` file.
  - Add conda support for read-only pre-built environment (pass conda folder as `docker_cmd` on Task).
  - Improve conda executable detection.
- k8s glue:
  - Add support for limited number of services exposing ports.
  - Add support for k8s pod custom user properties.
  - Allow selecting external `trains.conf` file for the pod itself.
  - Allow providing pod template, extra bash init script, alternate SSH server port, gateway address (k8s ingress / ELB).
- Allow specifying `cudatoolkit` version in the "installed packages" section when using conda as package manager (GitHub trains Issue 229).
- Add `agent.package_manager.force_repo_requirements_txt`. If True, "Installed Packages" on Task are ignored, and only repository `requirements.txt` is used.
- Pass `TRAINS_DOCKER_IMAGE` into docker for interactive sessions.
- Add `torchcsprng` and `torchtext` to PyTorch resolving.
Bug Fixes
- When logging suppress "\r" when reading a current chunk of a file / stream. Add `agent.suppress_carriage_return` (default True) to support previous behavior.
- Make sure `TRAINS_AGENT_K8S_HOST_MOUNT` is used only once per mount.
- Fix k8s glue script to trains-agent default docker script.
- Fix apply git diff from submodule only.
- conda:
  - Fix conda pip freeze to be consistent with trains 0.16.3.
  - Fix conda environment support for trains 0.16.3 full env. Add `agent.package_manager.conda_full_env_update` to allow conda to update back the requirements (default False, to preserve previous behavior).
  - Fix running from conda environment: `conda.sh` not found in first conda PATH match.
- Fix docker mode ubuntu / debian support by making sure not to ask for input (fix `tzdata` install).
- Fix repository detection: ignore environment `SSH_AUTH_SOCK`, only check if git user/pass are configured.
- git diff:
  - Fix support for non-ascii diff.
  - Fix diff with empty line at the end will cause corrupt diff apply message.
  - Allow zero context diffs (useful when blind patching repository).
- Fix `daemon --stop` when agent UID cannot be located.
- Fix nvidia docker support on some linux distros (SUSE).
- Fix nvidia pytorch dockers support.
- Fix torch CUDA 11.1 support.
- Fix requirements dict with null entry in `pip`: it should be treated as None, i.e. install from the repository's `requirements.txt`.
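
Carriage-return suppression matters for progress bars, which rewrite the same console line many times. One plausible reading of the behavior is sketched below: for each line in a chunk, keep only the text after the last `\r`. The function name `suppress_carriage_return` mirrors the configuration key but is otherwise hypothetical, and the agent's actual logic may differ.

```python
def suppress_carriage_return(chunk: str) -> str:
    """Keep, for every line in the chunk, only the text after its last carriage return."""
    cleaned = []
    for line in chunk.split("\n"):
        line = line.rstrip("\r")  # tolerate Windows-style "\r\n" line endings
        cleaned.append(line.rsplit("\r", 1)[-1])
    return "\n".join(cleaned)


# A progress bar that redraws itself three times, followed by a normal line.
chunk = "epoch 1:  10%\repoch 1:  55%\repoch 1: 100%\ndone\n"
print(suppress_carriage_return(chunk))
```

With this kind of filtering, a log shows only the final state of each redrawn line instead of every intermediate update.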
Version 0.16.1
Trains
Features
- Enhance HyperParameter optimizer.
Bug Fixes
- Fix typing dependency for `Python<3.5` (GitHub trains Issue 184).
- Fix git+https requirements handling, resolve top_level.txt package name (kerastuner from git was not detected).
- Fix `Task.get_reported_console_output()` for new Trains Server API v2.9.
- Fix cache handling for different partitions / drives / devices.
- Disable offline mode when running remotely (i.e. executed by Trains Agent).
- Fix artifact upload to only use file stream when not uploading a locally stored file (multipart upload is not supported on stream upload) (GitHub trains Issue 189).
- Fix double-escaped model design text when connecting OutputModel.
Trains Server
:::important Upgrading to this version requires a manual data migration. :::
Bug Fixes
- Fix model page issue causing N/A to show after switching tabs (Trains Slack channel thread).
- Raise the experiments comparison limit from 10 to 100, configurable using `services.tasks.multi_task_histogram_limit` (Trains Slack channel thread).
- Fix scalar plots sometimes not calculated by the server in lower iteration values (Trains Slack channel thread).
- Fix error while retrieving experiment log when only a few lines were reported (GitHub trains-server Issue 59).
- Update Fixed User full-name on restart (Trains Slack channel thread).
- Fix project ordering issue.
- When loading plots, display a spinner and don't show "no data".
- Improve logging to provide more coherent ElasticSearch connection status in server log.
Trains Agent
Features
- Add `sdk.metrics.plot_max_num_digits` configuration option to reduce plot storage size.
- Add `agent.package_manager.post_packages` and `agent.package_manager.post_optional_packages` configuration options to control packages install order (e.g. horovod).
- Add `agent.git_host` configuration option for limiting git credential usage for a specific host (overridable using `TRAINS_AGENT_GIT_HOST` environment variable).
- Add `agent.force_git_ssh_port` configuration option to control HTTPS to SSH link conversion for non-standard SSH ports.
- Add requirements detection features. Improve support for detecting new pip version (20+) supporting `package @ scheme://link`.
Bug Fixes
- Fix pre-installed packages are ignored when installing a git package wheel. Reinstalling a `git+http` link is enough to make sure all requirements are met / installed (GitHub Issue #196).
- Fix incorrect check for spaces in current execution folder.
- Fix requirements detection:
  - Update torch version after using downloaded / system pre-installed version.
  - Do not install git packages twice when a new pip version is used (pip freeze will detect the correct git link version).
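
The HTTPS to SSH link conversion mentioned for `agent.force_git_ssh_port` can be illustrated with a small URL rewrite. This is a sketch under assumptions: the function name `https_to_ssh`, the default `git` user, and the exact output forms are chosen here for illustration and are not taken from the agent source.

```python
from urllib.parse import urlparse


def https_to_ssh(url, ssh_port=None, user="git"):
    """Convert an HTTPS git link to an SSH link (hypothetical helper).

    With a non-standard port the explicit ssh:// form is needed, because the
    short "user@host:path" form has no place for a port number.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return url  # leave anything that is not HTTPS untouched
    if ssh_port:
        return "ssh://{}@{}:{}{}".format(user, parsed.hostname, ssh_port, parsed.path)
    return "{}@{}:{}".format(user, parsed.hostname, parsed.path.lstrip("/"))


print(https_to_ssh("https://github.com/allegroai/trains.git"))
print(https_to_ssh("https://github.com/allegroai/trains.git", ssh_port=2222))
```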
Version 0.16.0
Trains
Features
- Add continuing of previously executed experiments. Add `Task.init()` argument `continue_last_task` to continue a previously used Task (GitHub Issue #160).
- Allow Task editing / creation from code. `Task.export_task/import_task/update_task()` (GitHub Issue #128).
- Add offline mode. Use `Task.set_offline()` and `Task.import_offline_session()`:
  - Support setting offline mode via `TRAINS_OFFLINE_MODE=1` environment variable.
  - Support setting offline API version via `TRAINS_OFFLINE_MODE=2.9` environment variable.
- Automatically pickle all objects uploaded as artifacts, `task.upload_artifact()` argument `auto_pickle=True` (GitHub Issue #153).
- Add multiple sections / groups support for Task hyperparameters, using `Task.connect()`.
- Add multiple configurations (files) using `Task.connect_configuration()`.
- Allow enabling OS environment logging using the `sdk.development.log_os_environments` configuration parameter (complements the `TRAINS_LOG_ENVIRONMENT` environment variable).
- Add Optuna support for hyperparameter optimization controller. `OptimizerOptuna` is now the default optimizer.
- Add initial Keras-Tuner support (GitHub Issue keras-team/keras-tuner #334).
- Add automatic FastAI logging. It is disabled if Tensorboard is loaded (assuming TensorBoardLogger will be used).
- Support Tensorboard text logging (`add_text()`) as debug samples (`.txt` files), instead of as console output.
- Allow for more standard confusion matrix reporting. `Logger.report_confusion_matrix()` argument `yaxis_reversed` (flips the confusion matrix if `True`, default `False`) (GitHub Issue #165).
- Add support for Trains Server 0.16.0 (API v2.9 support).
- Allow disabling Trains update message from the log using the `TRAINS_SUPPRESS_UPDATE_MESSAGE` environment variable (GitHub Issue #157).
- Add AWS EC2 Auto-Scaler service wizard and Service.
- Improved and updated examples:
  - Add Keras Tuner CIFAR10 example.
  - Add FastAI example.
  - Update PyTorch Jupyter notebook examples (GitHub Issue #150).
- Support global requirements detection using `pip freeze` (set `sdk.development.detect_with_pip_freeze` configuration in `trains.conf`).
- Add `Task.get_projects()` to get all projects in the system, sorted by last update time.
Bug Fixes
- Fix UTC to time stamp in comment (GitHub Issue #152).
- Fix and enhance GPU monitoring:
- Fix GPU stats on Windows machines (GitHub Issue #177).
- More robust GPU monitoring (GitHub Issue #170).
- Fix filename too long bug (GitHub trains-server Issue #49).
- Fix TensorFlow image logging to allow images with no width / height / color metadata (GitHub Issue #182).
- Fix multiprocessing Pool throw exception in pool hangs execution. Call original signal handler and re-flush `stdout`.
- Fix `plotly` support for `matplotlib` 3.3.
- Add Python 2.7 support for `get_current_thread_id()`.
- Update examples requirements.
- Fix and improve signal handling.
- Fix Tensorboard 2D convolution histogram, improve histogram accuracy on very small histograms.
- Fix auto logging multiple argparse calls before `Task.init()`.
- Limit experiment Git diff logging to 500Kb. If larger than 500Kb, diff section will contain a warning and entire diff will be uploaded as an artifact named `auxiliary_git_diff`.
- Fix requirements detection:
  - Fix Trains installed from `git+`.
  - Fix when Trains is not directly imported.
  - Fix multiple `-e` packages were not detected (only the first one).
  - Fix running with Trains in `PYTHONPATH` resulted in double entry of trains.
- Fix `Task.set_base_docker()` on main task to do nothing when running remotely.
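
The Git diff size limit in the list above amounts to a simple branch: a small diff is stored inline, a large one is replaced by a warning and kept whole as an artifact. The sketch below illustrates that branch only. The function name `split_git_diff`, the warning text, reading "500Kb" as 500 * 1024 bytes, and listing the changed files in the warning are all assumptions of this example.

```python
MAX_DIFF_BYTES = 500 * 1024  # assumed reading of the "500Kb" limit
ARTIFACT_NAME = "auxiliary_git_diff"  # artifact name used by this sketch


def split_git_diff(diff_text, limit=MAX_DIFF_BYTES):
    """Return (text for the diff section, dict of artifacts to upload)."""
    if len(diff_text.encode("utf-8")) <= limit:
        return diff_text, {}
    # Too large: keep only a warning plus the per-file headers inline,
    # and hand the complete diff over as an artifact.
    headers = [line for line in diff_text.splitlines() if line.startswith("diff --git ")]
    warning = "# git diff too large, full diff stored as artifact '{}'\n".format(ARTIFACT_NAME)
    return warning + "\n".join(headers), {ARTIFACT_NAME: diff_text}


small = "diff --git a/x.py b/x.py\n+print('hi')\n"
large = "diff --git a/big.txt b/big.txt\n" + "+x\n" * 200000

section, artifacts = split_git_diff(large)
print(section)
print(list(artifacts))
```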
Trains Server
:::important Upgrading to this version requires a manual data migration. :::
Features
- Add experiment hyperparameter grouping:
  - HYPER PARAMETERS tab renamed to CONFIGURATION.
  - CONFIGURATION tab contains the sections USER PROPERTIES, HYPER PARAMETERS, CONFIGURATION OBJECTS.
  - Add user properties group. Key-value pairs always editable (USER PROPERTIES section).
  - Add command line options group for argparse and older experiments' parameters (CONFIGURATION / HYPER PARAMETERS / Args).
  - Add TensorFlow definitions group (CONFIGURATION / HYPER PARAMETERS / TF_DEFINE).
  - Add environment variables group (CONFIGURATION / HYPER PARAMETERS / Environment).
- Improve experiment model configuration:
  - Model design is in the ARTIFACTS tab.
  - Experiment model description is in the CONFIGURATION OBJECTS section in the CONFIGURATION tab.
- Improve experiment comparison:
  - In hyperparameter parallel coordinate comparison, hover over an experiment name to highlight it on plot (GitHub Issue #53).
  - Remove fields providing no additional information from comparison.
- Improve the model framework filter. Filter contains only frameworks used by models in the project.
- Add configurable Trains services examples.
- Add support for text debug samples in the DEBUG SAMPLES section in the RESULTS tab.
- Add legend on / off toggle control for every plot.
- Add clear button for text areas (GitHub trains-server Issue #42).
- Reinstate the bottom bar Archive button.
- Add Trains community links to left bar.
- Add Hi-DPI display support.
- Add `debug.ping` endpoint for simple health monitoring.
- Add support for field exclusion in `*.get_all` endpoints.
- Move to ElasticSearch 7. Requires manual data migration.
Bug Fixes
- Auto-fit column width on column resize double click.
- Allow top-bar search if fewer than three characters are entered, and `Enter` is pressed.
Trains Agent
Features
- Add `agent.docker_init_bash_script` configuration section to allow finer control over Docker startup script.
- Changed default Docker image from `nvidia/cuda` to `nvidia/cuda:10.1-runtime-ubuntu18.04` to support `cudnn` frameworks (e.g. TF).
- Improve support for Dockers with preinstalled `conda` environment.
- Improve trains-agent-docker spinning.
- Add `daemon --order-fairness` for round-robin queue pulling.
- Add `daemon --stop` to terminate a running agent (assuming other arguments are the same). If no additional arguments, Agents are terminated in lexicographical order.
- Support cleanup of all log files on termination unless executed with `--debug`.
- Add error message when Trains API Server is not accessible on startup.
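
Round-robin queue pulling, as enabled by `daemon --order-fairness`, contrasts with draining queues strictly in priority order. The sketch below models the round-robin idea with in-memory queues; the function name `pull_round_robin` and the queue names are hypothetical, and a real agent polls the server rather than local deques.

```python
from collections import deque


def pull_round_robin(queues):
    """Yield (queue_name, task_id) pairs, taking one task from each queue in turn.

    queues: dict mapping queue name to a deque of task ids, listed in the
    order the agent was told to serve them.
    """
    order = deque(queues)
    while any(queues.values()):
        name = order[0]
        order.rotate(-1)  # the next pull starts from the following queue
        if queues[name]:
            yield name, queues[name].popleft()


queues = {"high": deque(["a", "b", "c"]), "low": deque(["x"])}
pulled = list(pull_round_robin(queues))
print(pulled)
```

In strict priority order, task `x` would wait until `high` is empty; with round-robin it is pulled right after the first `high` task.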
Bug Fixes
- Fix GPU Windows monitoring support (GitHub Issue #177).
- Fix `.git-credentials` and `.gitconfig` mapping into docker.
- Fix non-root docker image usage.
- Fix docker to use `UTF-8` encoding, so prints won't break it.
- Fix `--debug` to set all loggers to `DEBUG`.
- Fix task status change to `queued` should never happen during Task runtime.
- Fix `requirement_parser` to support `package @ git+http` lines.
- Fix GIT user/password in requirements and support for `-e git+http` lines.
- Fix configuration wizard to generate `trains.conf` matching latest Trains definitions.