| title |
|---|
| Version 0.16 |
:::important Trains is now ClearML. :::
Version 0.16.4
Trains
Features
- Add Hydra support (GitHub trains Issue 219).
- Add cifar ignite example (GitHub trains Issue 237).
- Add auto extraction of `tar.gz` files when using `StorageManager` (GitHub trains Issue 237).
- Add `Task.init()` argument `auto_connect_streams` controlling stdout / stderr / logging capture (GitHub trains Issue 181).
- Add carriage return flush support using the `sdk.development.worker.console_cr_flush_period` configuration setting (GitHub trains Issue 181).
- Add `Task.create_function_task()` to allow creating a new task, using a function and arguments, to be executed remotely (GitHub trains Issue 230).
- Allow disabling SSL certificates verification using `Task.setup_upload()` argument `verify` or AWS S3 bucket configuration `verify` property (GitHub trains Issue 256).
- Add `StorageManager.get_files_server()`.
- Add `Task.get_project_id()` using project name.
- Add `project_name` argument to `Task.set_project()`.
- Add `Task.connect()` support for class / instance objects.
- Add `Task.get_configuration_object()` and `Task.set_configuration_object()` for easier automation.
- Improve Auto-Scaler - allow extra configurations, key name and security group are now optional, defaults using empty strings.
- Use a built-in matplotlib convertor.
- Add reporting text as debug sample example.
Bug Fixes
- Fix Optuna HPO parameter serializing (GitHub trains Issue 254).
- Fix connect dictionary `''` cast to `None` (GitHub trains Issue 258).
- Fix lightgbm binding keyword argument issue (GitHub trains Issue 251).
- Fix artifact preview if artifact body is remote URI (GitHub trains Issue 239).
- Fix infinite recursion in `StorageManager` upload (GitHub trains Issue 253).
- Fix keras reusing model object only if the filename is the same (GitHub trains Issue 252).
- Fix running remotely with no configuration should not crash but output a warning (GitHub trains Issue 243).
- Fix matplotlib 3.3.3 support:
  - Fix global figure enumeration.
  - Fix binding without a title reported a single plot (`untitled 00`) instead of increasing the counter.
- Fix Python 2.7 / 3.5 support.
- Fix quote issue when reporting debug images.
- Fix replace quote safe characters in upload file to include `;=@$`.
- Fix `at_exit` called from another process should be ignored.
- Fix `Task.set_tags()` for completed / published tasks.
- Fix `Task.add_tags()` not working when running remotely.
- Fix `Task.set_user_properties()` docstring and interface.
- Fix preview with JSON (dict) artifacts did not store the artifact.
- Fix `Logger.report_text()` on task created using `Task.create()` was not supported.
- Fix initialization for torch: only call torch `get_worker_info` if torch was loaded.
- Fix flush (wait) on auxiliary task (obtained using `Task.get_task()`) should wait on all upload events.
- Fix server was not updated with the defaults from the code when running remotely and configuration section is missing.
- Fix connect dict containing `None` default values, blocked the remote execution from passing string instead of None.
- Fix `Task.upload_artifact()` argument `delete_after_upload=True` used in conjunction with `wait_for_upload=True` was not supported.
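As a rough illustration of the quote-safe characters fix listed above, the sketch below uses the standard library's `urllib.parse.quote` to show how adding `;=@$` to the safe set changes an encoded file name. The file name and the exact safe set are assumptions made for the example; this is not the SDK's upload code.

```python
from urllib.parse import quote

# Hypothetical file name containing the characters named in the fix.
filename = "run;lr=0.1@step$5 final.png"

# Default behaviour of quote(): only "/" is left unescaped.
default_quoted = quote(filename)

# Assumed safe set for illustration: "/" plus the characters ;=@$
relaxed_quoted = quote(filename, safe="/;=@$")

print(default_quoted)
print(relaxed_quoted)
```

With the relaxed safe set only the space is percent-encoded, so the name stays readable in the resulting URL.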
Version 0.16.3
Trains
Features
- Add LightGBM support.
- Add initial Hydra support (GitHub trains Issue 219).
- Add synchronous support for `Task.upload_artifact()` (GitHub trains Issue 231).
- Add `sdk.development.store_code_diff_from_remote` (default `false`) to store diff from remote HEAD instead of local HEAD (GitHub trains Issue 222).
- Add `sdk.development.detect_with_conda_freeze` (default `true`) for full conda freeze (requires trains-agent >= 16.2).
- Add user properties support in Task object.
- Add `Logger.report_table()` support for table as list of lists.
- Add support to split DAG and Table in pipeline DAG plot. Pipeline DAG single nodes are now round circles below the DAG graph.
- Add Pipeline / Optimization can be attached to any Task (not just the current task).
- Add `force_download` flag to `StorageManager.get_local_copy()`.
- Add control over the artifact preview using `Task.upload_artifact()` `preview` argument.
- Add `Logger.report_matplotlib_figure()` with examples.
- Add `Task.set_task_type()`.
- Improve AWS auto-scaler:
  - Add key pair and security groups support.
  - Add multi-line support for both extra bash script and extra `trains.conf` data.
- Update examples.
Bug Fixes
- Fix `Task.update_output_model()` wrong argument order (GitHub trains Issue 220).
- Fix initializing task on argparse parse in remote mode. Do not call `Task.init()` to avoid auto connect, use `Task.get_task()` instead.
- Fix detected task cwd outside of repository root folder.
- Fix `Task.connect(dict)` to place non-existing entries on the section name instead of General.
- Fix `Task.clone()` support for trains-server < 0.16.
- Fix `StorageManager` cache extract zipped artifacts. Use modified time instead of access time for cached files.
- Fix diff command output was stripped.
- Make sure local packages with multi-files are marked as `package`.
- Fix `Task.set_base_docker()` should be skipped when running remotely.
- Fix ArgParser binding handling of string argument with boolean default value (affects PyTorch Lightning integration).
- When using `detect_with_pip_freeze` make sure that `package @ file://` lines are replaced with `package==x.y.z` as local file will probably not be available.
- Fix git packages to new pip standard `package @ git+`.
- Improve conda package naming `_` and `-` support.
- Do not add specific setuptools version to requirements (pip can't install it anyway).
- Fix image URL quoting when uploading from a file path.
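The `detect_with_pip_freeze` item above replaces `package @ file://` lines with pinned versions. A minimal sketch of that idea follows; the helper name `pin_local_file_requirements` and the `installed_versions` mapping are hypothetical and only stand in for whatever the SDK does internally.

```python
import re

# Matches direct references to local files, e.g. "mypkg @ file:///tmp/build/mypkg".
_LOCAL_REF = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*@\s*file://\S*\s*$")


def pin_local_file_requirements(freeze_lines, installed_versions):
    """Replace 'package @ file://...' lines with 'package==x.y.z'.

    installed_versions is a hypothetical {name: version} mapping; lines
    without a known version are left untouched.
    """
    pinned = []
    for line in freeze_lines:
        match = _LOCAL_REF.match(line)
        if match and match.group(1) in installed_versions:
            name = match.group(1)
            pinned.append("{}=={}".format(name, installed_versions[name]))
        else:
            pinned.append(line)
    return pinned


freeze = [
    "numpy==1.19.4",
    "mypkg @ file:///tmp/build/mypkg",
    "otherpkg @ git+https://example.com/otherpkg.git",
]
result = pin_local_file_requirements(freeze, {"mypkg": "0.3.1"})
print(result)
```

Only the local file reference is rewritten; the `git+` line is kept because the remote machine can still resolve it.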
Version 0.16.2
Trains
Features
- Add `Task.set_resource_monitor_iteration_timeout()` to set ResourceMonitor iteration wait duration timeout (GitHub trains Issue 208).
- Add PyTorch Lightning save/restore model binding (GitHub trains Issue 212).
- Add `git diff` for repository submodule (requires git 2.14 or above).
- Add `TrainsJob.is_completed()` and `TrainsJob.is_aborted()`.
- Add `Task.logger` property.
- Add Pipeline Controller automation and example.
- Add improved trace filtering capabilities in `trains.debugging.trace.trace_trains()`.
- Add default help per argument (if not provided) in ArgParser binding.
- Deprecate `Task.reporter`.
- Update PyTorch example.
- Remove warning on skipped auto-magic model logging (GitHub trains Issue 206).
- Support Keras restructuring for Network, Model and Sequential.
- Update autokeras requirements according to https://github.com/keras-team/autokeras#installation.
Bug Fixes
- Fix joblib auto logging models failing on compressed streams (GitHub trains Issue 203).
- Fix sending empty reports (GitHub trains Issue 205).
- Fix scatter2d sub-sampling and rounding.
- Fix plots reporting:
  - `NaN` representation (matplotlib conversion).
  - Limit the number of digits in a plot to reduce plot size (using `sdk.metrics.plot_max_num_digits` configuration value).
- Fix `Task.wait_for_status()` to reload after it ends.
- Fix thread wait Ctrl-C interrupt did not exit process.
- Improve Windows support for installed packages analysis.
- Fix auto model logging using relative path.
- Fix Hyperparameter Optimization example.
- Fix `Task.clone()` when working with TrainsServer < 0.16.0.
- Fix pandas artifact handling.
  - Avoid adding `unnamed:0` column.
  - Return original pandas object.
- Fix `TrainsJob` hyper-params overriding order was not guaranteed.
- Fix ArgParse auto-connect to support default function type.
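To picture why limiting the number of digits reduces plot size, here is a small sketch that rounds every float in a plot-like structure before serializing it. It assumes the limit means decimal places; the exact semantics of `sdk.metrics.plot_max_num_digits` may differ, and `limit_digits` is a hypothetical helper.

```python
import json


def limit_digits(value, max_digits):
    """Recursively round floats inside nested dicts / lists."""
    if isinstance(value, float):
        return round(value, max_digits)
    if isinstance(value, dict):
        return {key: limit_digits(item, max_digits) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [limit_digits(item, max_digits) for item in value]
    return value


# Hypothetical plot payload, loosely shaped like a single scatter trace.
plot = {"name": "loss", "x": [0, 1, 2], "y": [0.123456789, 0.0987654321, 0.0712345678]}
limited = limit_digits(plot, 4)

print(len(json.dumps(plot)), len(json.dumps(limited)))
```

Non-float values such as the integer x-axis and the trace name pass through unchanged.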
Trains-Agent
Features
- conda:
  - Add `agent.package_manager.conda_env_as_base_docker` allowing "docker_cmd" to contain link to a full pre-packaged conda environment (`tar.gz` created by `conda-pack`). Use `TRAINS_CONDA_ENV_PACKAGE` environment variable to specify `conda tar.gz` file.
  - Add conda support for read-only pre-built environment (pass conda folder as `docker_cmd` on Task).
  - Improve trying to find conda executable.
- k8s glue:
  - Add support for limited number of services exposing ports.
  - Add support for k8s pod custom user properties.
  - Allow selecting external `trains.conf` file for the pod itself.
  - Allow providing pod template, extra bash init script, alternate SSH server port, gateway address (k8s ingress / ELB).
- Allow specifying `cudatoolkit` version in the "installed packages" section when using conda as package manager (GitHub trains Issue 229).
- Add `agent.package_manager.force_repo_requirements_txt`. If True, "Installed Packages" on Task are ignored, and only repository `requirements.txt` is used.
- Pass `TRAINS_DOCKER_IMAGE` into docker for interactive sessions.
- Add `torchcsprng` and `torchtext` to PyTorch resolving.
Bug Fixes
- When logging suppress "\r" when reading a current chunk of a file / stream. Add `agent.suppress_carriage_return` (default True) to support previous behavior.
- Make sure `TRAINS_AGENT_K8S_HOST_MOUNT` is used only once per mount.
- Fix k8s glue script to trains-agent default docker script.
- Fix apply git diff from submodule only.
- conda:
  - Fix conda pip freeze to be consistent with trains 0.16.3.
  - Fix conda environment support for trains 0.16.3 full env. Add `agent.package_manager.conda_full_env_update` to allow conda to update back the requirements (default False, to preserve previous behavior).
  - Fix running from conda environment - `conda.sh` not found in first conda PATH match.
- Fix docker mode ubuntu / debian support by making sure not to ask for input (fix `tzdata` install).
- Fix repository detection - ignore environment `SSH_AUTH_SOCK`, only check if git user/pass are configured.
- git diff:
  - Fix support for non-ascii diff.
  - Fix diff with empty line at the end will cause corrupt diff apply message.
  - Allow zero context diffs (useful when blind patching repository).
- Fix `daemon --stop` when agent UID cannot be located.
- Fix nvidia docker support on some linux distros (SUSE).
- Fix nvidia pytorch dockers support.
- Fix torch CUDA 11.1 support.
- Fix requirements dict with null entry in `pip` should be considered None install from repository's `requirements.txt`.
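The carriage return item above concerns progress-bar style output, where each `\r` rewinds the cursor and overwrites the current line. One way to picture the suppression is to keep only the text after the last `\r` on each line of a chunk. The function below is a hypothetical sketch of that idea, not the agent's implementation.

```python
def suppress_carriage_returns(chunk):
    """Keep only the final overwrite of each line in a chunk of log text."""
    cleaned = []
    for line in chunk.split("\n"):
        # Drop a trailing "\r" (Windows line ending) before looking for overwrites.
        line = line.rstrip("\r")
        # Everything before the last "\r" would have been overwritten on a terminal.
        cleaned.append(line.rsplit("\r", 1)[-1])
    return "\n".join(cleaned)


raw = "10%\r50%\r100%\ndone\r\n"
print(repr(suppress_carriage_returns(raw)))
```

Without this step every intermediate progress update would be stored in the log as part of the same line.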
Version 0.16.1
Trains
Features
- Enhance HyperParameter optimizer.
Bug Fixes
- Fix typing dependency for Python<3.5 (GitHub trains Issue 184).
- Fix git+https requirements handling, resolve top_level.txt package name (kerastuner from git was not detected).
- Fix `Task.get_reported_console_output()` for new Trains Server API v2.9.
- Fix cache handling for different partitions / drives / devices.
- Disable offline mode when running remotely (i.e. executed by Trains Agent).
- Fix artifact upload to only use file stream when not uploading a locally stored file (multipart upload is not supported on stream upload) (GitHub trains Issue 189).
- Fix double-escaped model design text when connecting OutputModel.
Trains Server
:::important Upgrading to this version requires a manual data migration. :::
Bug Fixes
- Fix model page issue causing N/A to show after switching tabs (Trains Slack channel thread).
- Removed experiments comparison limit (only 10 were allowed). Limit is now 100, configurable using `services.tasks.multi_task_histogram_limit` (Trains Slack channel thread).
- Fix scalar plots sometimes not calculated by the server in lower iteration values (Trains Slack channel thread).
- Fix error while retrieving experiment log when only a few lines were reported (GitHub trains-server Issue 59).
- Update Fixed User full-name on restart (Trains Slack channel thread).
- Fix project ordering issue.
- When loading plots, display a spinner and don't show "no data".
- Improve logging to provide more coherent ElasticSearch connection status in server log.
Trains Agent
Features
- Add `sdk.metrics.plot_max_num_digits` configuration option to reduce plot storage size.
- Add `agent.package_manager.post_packages` and `agent.package_manager.post_optional_packages` configuration options to control packages install order (e.g. horovod).
- Add `agent.git_host` configuration option for limiting git credential usage for a specific host (overridable using `TRAINS_AGENT_GIT_HOST` environment variable).
- Add `agent.force_git_ssh_port` configuration option to control HTTPS to SSH link conversion for non-standard SSH ports.
- Add requirements detection features. Improve support for detecting new pip version (20+) supporting `package @ scheme://link`.
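For the HTTPS to SSH link conversion mentioned above, a non-standard port cannot be written in the short `git@host:path` form, so an explicit `ssh://` URL is needed. The sketch below shows one way such a conversion could look; the function name, the `git` user, and the example hosts are assumptions rather than the agent's actual logic.

```python
from urllib.parse import urlparse


def https_to_ssh(url, ssh_port=None):
    """Convert an HTTP(S) git link to an SSH link (illustrative only)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return url  # already SSH or another scheme, leave untouched
    path = parsed.path.lstrip("/")
    if ssh_port:
        # An explicit port requires the full ssh:// URL form.
        return "ssh://git@{}:{}/{}".format(parsed.hostname, ssh_port, path)
    return "git@{}:{}".format(parsed.hostname, path)


print(https_to_ssh("https://github.com/allegroai/trains.git"))
print(https_to_ssh("https://git.example.com/team/repo.git", ssh_port=2222))
```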
Bug Fixes
- Fix pre-installed packages are ignored when installing a git package wheel. Reinstalling a `git+http` link is enough to make sure all requirements are met / installed (GitHub Issue #196).
- Fix incorrect check for spaces in current execution folder.
- Fix requirements detection:
  - Update torch version after using downloaded / system pre-installed version.
  - Do not install git packages twice when a new pip version is used (pip freeze will detect the correct git link version).
Version 0.16.0
Trains
Features
- Add continuing of previously executed experiments. Add `Task.init()` argument `continue_last_task` to continue a previously used Task (GitHub Issue #160).
- Allow Task editing / creation from code. `Task.export_task/import_task/update_task()` (GitHub Issue #128).
- Add offline mode. Use `Task.set_offline()` and `Task.import_offline_session()`:
  - Support setting offline mode via `TRAINS_OFFLINE_MODE=1` environment variable.
  - Support setting offline API version via `TRAINS_OFFLINE_MODE=2.9` environment variable.
- Automatically pickle all objects uploaded as artifacts, `task.upload_artifact()` argument `auto_pickle=True` (GitHub Issue #153).
- Add multiple sections / groups support for Task hyperparameters, using `Task.connect()`.
- Add multiple configurations (files) using `Task.connect_configuration()`.
- Allow enabling OS environment logging using the `sdk.development.log_os_environments` configuration parameter (complements the `TRAINS_LOG_ENVIRONMENT` environment variable).
- Add Optuna support for hyperparameter optimization controller. `OptimizerOptuna` is now the default optimizer.
- Add initial Keras-Tuner support (GitHub Issue keras-team/keras-tuner #334).
- Add automatic FastAI logging. It is disabled if Tensorboard is loaded (assuming TensorBoardLogger will be used).
- Support Tensorboard text logging (`add_text()`) as debug samples (`.txt` files), instead of as console output.
- Allow for more standard confusion matrix reporting. `Logger.report_confusion_matrix()` argument `yaxis_reversed` (flips the confusion matrix if `True`, default `False`) (GitHub Issue #165).
- Add support for Trains Server 0.16.0 (API v2.9 support).
- Allow disabling Trains update message from the log using the `TRAINS_SUPPRESS_UPDATE_MESSAGE` environment variable (GitHub Issue #157).
- Add AWS EC2 Auto-Scaler service wizard and Service.
- Improved and updated examples:
  - Add Keras Tuner CIFAR10 example.
  - Add FastAI example.
  - Update PyTorch Jupyter notebook examples (GitHub Issue #150).
- Support global requirements detection using `pip freeze` (set `sdk.development.detect_with_pip_freeze` configuration in `trains.conf`).
- Add `Task.get_projects()` to get all projects in the system, sorted by last update time.
Bug Fixes
- Fix UTC to time stamp in comment (GitHub Issue #152).
- Fix and enhance GPU monitoring:
  - Fix GPU stats on Windows machines (GitHub Issue #177).
  - More robust GPU monitoring (GitHub Issue #170).
- Fix filename too long bug (GitHub trains-server Issue #49).
- Fix TensorFlow image logging to allow images with no width / height / color metadata (GitHub Issue #182).
- Fix multiprocessing Pool throw exception in pool hangs execution. Call original signal handler and re-flush `stdout`.
- Fix `plotly` support for `matplotlib` 3.3.
- Add Python 2.7 support for `get_current_thread_id()`.
- Update examples requirements.
- Fix and improve signal handling.
- Fix Tensorboard 2D convolution histogram, improve histogram accuracy on very small histograms.
- Fix auto logging multiple argparse calls before `Task.init()`.
- Limit experiment Git diff logging to 500Kb. If larger than 500Kb, diff section will contain a warning and entire diff will be uploaded as an artifact named `auxiliary_git_dif`.
- Fix requirements detection:
  - Fix Trains installed from `git+`.
  - Fix when Trains is not directly imported.
  - Fix multiple `-e` packages were not detected (only the first one).
  - Fix running with Trains in `PYTHONPATH` resulted in double entry of trains.
- Fix `Task.set_base_docker()` on main task to do nothing when running remotely.
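The Git diff size limit described above can be sketched as a simple size check: small diffs are stored inline, while large ones are replaced by a warning and handed off as an artifact payload. The helper name, the warning text, and the reading of "500Kb" as 500 * 1024 bytes are assumptions for illustration.

```python
MAX_DIFF_BYTES = 500 * 1024  # assumed interpretation of "500Kb"


def split_large_diff(diff_text, limit=MAX_DIFF_BYTES):
    """Return (text for the diff section, artifact payload or None)."""
    size = len(diff_text.encode("utf-8"))
    if size <= limit:
        return diff_text, None
    # Hypothetical warning text; the real message is not given in these notes.
    warning = "# diff is {} bytes, larger than {}; see uploaded artifact".format(size, limit)
    return warning, diff_text


small_section, small_artifact = split_large_diff("diff --git a/x b/x\n")
big_diff = "+" * (MAX_DIFF_BYTES + 1)
big_section, big_artifact = split_large_diff(big_diff)
print(small_artifact is None, big_section)
```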
Trains Server
:::important Upgrading to this version requires a manual data migration. :::
Features
- Add experiment hyperparameter grouping:
  - HYPER PARAMETERS tab renamed to CONFIGURATION.
  - CONFIGURATION tab contains the sections USER PROPERTIES, HYPER PARAMETERS, CONFIGURATION OBJECTS.
  - Add user properties group. Key-value pairs always editable (USER PROPERTIES section).
  - Add command line options group - argparse and older experiments parameters (CONFIGURATIONS / HYPER PARAMETERS / Args).
  - Add TensorFlow definitions group (CONFIGURATIONS / HYPER PARAMETERS / TF_DEFINE).
  - Add environment variables group (CONFIGURATIONS / HYPER PARAMETERS / Environment).
- Improve experiment model configuration:
  - Model design is in the ARTIFACTS tab.
  - Experiment model description is in the CONFIGURATION OBJECTS section in the CONFIGURATION tab.
- Improve experiment comparison:
  - In hyperparameter parallel coordinate comparison, hover over an experiment name to highlight it on plot (GitHub Issue #53).
  - Remove fields providing no additional information from comparison.
- Improve the model framework filter. Filter contains only frameworks used by models in the project.
- Add configurable Trains services examples.
- Add support for text debug samples in the DEBUG SAMPLES section in the RESULTS tab.
- Add legend on / off toggle control for every plot.
- Add clear button for text areas (GitHub trains-server Issue #42).
- Reinstate the bottom bar Archive button.
- Add Trains community links to left bar.
- Add Hi-DPI display support.
- Add `debug.ping` endpoint for simple health monitoring.
- Add support for field exclusion in `*.get_all` endpoints.
- Move to ElasticSearch 7. Requires manual data migration.
Bug Fixes
- Auto-fit column width on column resize double click.
- Allow top-bar search if fewer than three characters are entered, and `Enter` is pressed.
Trains Agent
Features
- Add `agent.docker_init_bash_script` configuration section to allow finer control over Docker startup script.
- Changed default Docker image from `nvidia/cuda` to `nvidia/cuda:10.1-runtime-ubuntu18.04` to support `cudnn` frameworks (e.g. TF).
- Improve support for Dockers with preinstalled `conda` environment.
- Improve trains-agent-docker spinning.
- Add `daemon --order-fairness` for round-robin queue pulling.
- Add `daemon --stop` to terminate a running agent (assuming other arguments are the same). If no additional arguments, Agents are terminated in lexicographical order.
- Support cleanup of all log files on termination unless executed with `--debug`.
- Add error message when Trains API Server is not accessible on startup.
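To make the difference between strict queue priority and the round-robin pulling enabled by `daemon --order-fairness` concrete, the sketch below simulates the order in which a single worker would take tasks from several queues. It is a simplified model under assumed semantics (one task per pull, queues listed in priority order), not the agent's scheduler.

```python
from collections import deque


def pull_order(queues, order_fairness=False):
    """Return task ids in the order a single worker would pull them."""
    pending = [deque(tasks) for tasks in queues]
    order = []
    start = 0  # index of the queue to check first on the next pull
    while any(pending):
        count = len(pending)
        for offset in range(count):
            index = (start + offset) % count
            if pending[index]:
                order.append(pending[index].popleft())
                if order_fairness:
                    # Round-robin: the next pull starts at the following queue.
                    start = (index + 1) % count
                break
    return order


queues = [["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]
print(pull_order(queues))
print(pull_order(queues, order_fairness=True))
```

In the strict model the first queue is drained before the others are touched, while the round-robin model interleaves them so lower-priority queues are not starved.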
Bug Fixes
- Fix GPU Windows monitoring support (GitHub Issue #177).
- Fix `.git-credentials` and `.gitconfig` mapping into docker.
- Fix non-root docker image usage.
- Fix docker to use `UTF-8` encoding, so prints won't break it.
- Fix `--debug` to set all loggers to `DEBUG`.
- Fix task status change to `queued` should never happen during Task runtime.
- Fix `requirement_parser` to support `package @ git+http` lines.
- Fix GIT user/password in requirements and support for `-e git+http` lines.
- Fix configuration wizard to generate `trains.conf` matching latest Trains definitions.