clearml-docs/ver_0_16.md at d166467a3291549e10205db0f68bbdf4e01dcd19

mirror of https://github.com/clearml/clearml-docs synced 2025-01-31 14:37:18 +00:00

2024-06-23 10:00:06 +03:00

23 KiB

Raw Blame History

title
Version 0.16

:::important Trains is now ClearML. :::

Version 0.16.4

Trains

Features

Add Hydra support (GitHub trains Issue 219).
Add cifar ignite example (GitHub trains Issue 237).
Add auto extraction of tar.gz files when using StorageManager (GitHub trains Issue 237).
Add Task.init() argument auto_connect_streams controlling stdout / stderr / logging capture (GitHub trains Issue 181).
Add carriage return flush support using the sdk.development.worker.console_cr_flush_period configuration setting (GitHub trains Issue 181).
Add Task.create_function_task() to allow creating a new task, using a function and arguments, to be executed remotely (GitHub trains Issue 230).
Allow disabling SSL certificates verification using Task.setup_upload() argument verify or AWS S3 bucket configuration verify property (GitHub trains Issue 256).
Add StorageManager.get_files_server().
Add Task.get_project_id() using project name.
Add project_name argument to Task.set_project().
Add Task.connect() support for class / instance objects.
Add Task get_configuration_object() and Task.set_configuration_object() for easier automation.
Improve Auto-Scaler - allow extra configurations, key name and security group are now optional, defaults using empty strings.
Use a built-in matplotlib convertor.
Add reporting text as debug sample example.

Bug Fixes

Fix Optuna HPO parameter serializing (GitHub trains Issue 254).
Fix connect dictionary '' cast to None (GitHub trains Issue 258).
Fix lightgbm binding keyword argument issue (GitHub trains Issue 251).
Fix artifact preview if artifact body is remote URI (GitHub trains Issue 239).
Fix infinite recursion in StorageManager upload (GitHub trains Issue 253).
Fix keras reusing model object only if the filename is the same (GitHub trains Issue 252).
Fix running remotely with no configuration should not crash but output a warning (GitHub trains Issue 243).
Fix matplotlib 3.3.3 support:
- Fix global figure enumeration.
- Fix binding without a title reported a single plot (untitled 00) instead of increasing the counter.
Fix Python 2.7 / 3.5 support.
Fix quote issue when reporting debug images.
Fix replace quote safe characters in upload file to include ;=@$.
Fix at_exit called from another process should be ignored.
Fix Task.set_tags() for completed / published tasks.
Fix Task.add_tags() not working when running remotely.
Fix Task.set_user_properties() docstring and interface.
Fix preview with JSON (dict) artifacts did not store the artifact.
Fix Logger.report_text() on task created using Task.create() was not supported.
Fix initialization for torch: only call torch get_worker_info if torch was loaded.
Fix flush (wait) on auxiliary task (obtained using Task.get_task()) should wait on all upload events.
Fix server was not updated with the defaults from the code when running remotely and configuration section is missing.
Fix connect dict containing None default values, blocked the remote execution from passing string instead of None.
Fix Task.upload_artifact() argument delete_after_upload=True used in conjunction with wait_for_upload=True was not supported.

Version 0.16.3

Trains

Features

Add LightGBM support.
Add initial Hydra support (GitHub trains Issue 219).
Add synchronous support for Task.upload_artifact() (GitHub trains Issue 231).
Add sdk.development.store_code_diff_from_remote (default false) to store diff from remote HEAD instead of local HEAD (GitHub trains Issue 222).
Add sdk.development.detect_with_conda_freeze (default true) for full conda freeze (requires trains-agent >= 16.2).
Add user properties support in Task object.
Add Logger.report_table() support for table as list of lists.
Add support to split DAG and Table in pipeline DAG plot. Pipeline DAG single nodes are now round circles below the DAG graph..
Add Pipeline / Optimization can be attached to any Task (not just the current task).
Add force_download flag to StorageManager.get_local_copy().
Add control over the artifact preview using Task.upload_artifact() preview argument.
Add Logger.report_matplotlib_figure() with examples.
Add Task.set_task_type().
Improve AWS auto-scaler:
- Add key pair and security groups support.
- Add multi-line support for both extra bash script and extra trains.conf data.
Update examples.

Bug Fixes

Fix Task.update_output_model() wrong argument order (GitHub trains Issue 220).
Fix initializing task on argparse parse in remote mode. Do not call Task.init() to avoid auto connect, use Task.get_task() instead.
Fix detected task cwd outside of repository root folder.
Fix Task.connect(dict) to place non-existing entries on the section name instead of General.
Fix Task.clone() support for trains-server < 0.16.
Fix StorageManager cache extract zipped artifacts. Use modified time instead of access time for cached files.
Fix diff command output was stripped.
Make sure local packages with multi-files are marked as package.
Fix Task.set_base_docker() should be skipped when running remotely.
Fix ArgParser binding handling of string argument with boolean default value (affects PyTorch Lightning integration).
When using detect_with_pip_freeze make sure that package @ file:// lines are replaced with package==x.y.z as local file will probably not be available.
Fix git packages to new pip standard package @ git+.
Improve conda package naming _ and - support.
Do not add specific setuptools version to requirements (pip can't install it anyway).
Fix image URL quoting when uploading from a file path.

Version 0.16.2

Trains

Features

Add Task.set_resource_monitor_iteration_timeout() to set ResourceMonitor iteration wait duration timeout (GitHub trains Issue 208).
Add PyTorch Lightning save/restore model binding (GitHub trains Issue 212).
Add git diff for repository submodule (requires git 2.14 or above).
Add TrainsJob.is_completed() and TrainsJob.is_aborted().
Add Task.logger property.
Add Pipeline Controller automation and example (see here).
Add improved trace filtering capabilities in trains.debugging.trace.trace_trains().
Add default help per argument (if not provided) in ArgParser binding.
Deprecate Task.reporter.
Update PyTorch example.
Remove warning on skipped auto-magic model logging (GitHub trains Issue 206).
Support Keras restructuring for Network, Model and Sequential.
Update autokeras requirements according to https://github.com/keras-team/autokeras#installation.

Bug Fixes

Fix joblib auto logging models failing on compressed streams (GitHub trains Issue 203).
Fix sending empty reports (GitHub trains Issue 205).
Fix scatter2d sub-sampling and rounding.
Fix plots reporting:
- NaN representation (matplotlib conversion).
- Limit the number of digits in a plot to reduce plot size (using sdk.metrics.plot_max_num_digits configuration value).
Fix Task.wait_for_status() to reload after it ends.
Fix thread wait Ctrl-C interrupt did not exit process.
Improve Windows support for installed packages analysis.
Fix auto model logging using relative path.
Fix Hyperparameter Optimization example.
Fix Task.clone() when working with TrainsServer < 0.16.0.
Fix pandas artifact handling.
Avoid adding unnamed:0 column.
Return original pandas object.
Fix TrainsJob hyper-params overriding order was not guaranteed.
Fix ArgParse auto-connect to support default function type.

Trains-Agent

Features

conda:
- Add agent.package_manager.conda_env_as_base_docker allowing "docker_cmd" to contain link to a full pre-packaged conda environment (tar.gz created by conda-pack). Use TRAINS_CONDA_ENV_PACKAGE environment variable to specify conda tar.gz file.
- Add conda support for read-only pre-built environment (pass conda folder as docker_cmd on Task).
- Improve trying to find conda executable.
k8s glue:
- Add support for limited number of services exposing ports.
- Add support for k8s pod custom user properties.
- Allow selecting external trains.conf file for the pod itself.
- Allow providing pod template, extra bash init script, alternate SSH server port, gateway address (k8s ingress / ELB).
Allow specifying cudatoolkit version in the "installed packages" section when using conda as package manager (GitHub trains Issue 229).
Add agent.package_manager.force_repo_requirements_txt. If True, "Installed Packages" on Task are ignored, and only repository requirements.txt is used.
Pass TRAINS_DOCKER_IMAGE into docker for interactive sessions.
Add torchcsprng and torchtext to PyTorch resolving.

Bug Fixes

When logging suppress "\r" when reading a current chunk of a file / stream. Add agent.suppress_carriage_return (default True) to support previous behavior.
Make sure TRAINS_AGENT_K8S_HOST_MOUNT is used only once per mount.
Fix k8s glue script to trains-agent default docker script.
Fix apply git diff from submodule only.
conda:
- Fix conda pip freeze to be consistent with trains 0.16.3.
- Fix conda environment support for trains 0.16.3 full env. Add agent.package_manager.conda_full_env_update to allow conda to update back the requirements (default False, to preserve previous behavior).
- Fix running from conda environment - conda.sh not found in first conda PATH match.
Fix docker mode ubuntu / debian support by making sure not to ask for input (fix tzdata install).
Fix repository detection - ignore environment SSH_AUTH_SOCK, only check if git user/pass are configured.
git diff:
- Fix support for non-ascii diff.
- Fix diff with empty line at the end will cause corrupt diff apply message.
- Allow zero context diffs (useful when blind patching repository).
Fix daemon --stop when agent UID cannot be located.
Fix nvidia docker support on some linux distros (SUSE).
Fix nvidia pytorch dockers support.
Fix torch CUDA 11.1 support.
Fix requirements dict with null entry in pip should be considered None install from repository's requirements.txt.

Version 0.16.1

Trains

Features

Enhance HyperParameter optimizer.

Bug Fixes

Fix typing dependency for Python<3.5 (GitHub trains Issue 184).
Fix git+https requirements handling, resolve top_level.txt package name (kerastuner from git was not detected).
Fix Task.get_reported_console_output() for new Trains Server API v2.9.
Fix cache handling for different partitions / drives / devices.
Disable offline mode when running remotely (i.e. executed by Trains Agent).
Fix artifact upload to only use file stream when not uploading a locally stored file (multipart upload is not supported on stream upload) (GitHub trains Issue 189).
Fix double-escaped model design text when connecting OutputModel.

Trains Server

:::important Upgrading to this version requires a manual data migration. :::

Bug Fixes

Fix model page issue causing N/A to show after switching tabs (Trains Slack channel thread).
Removed experiments comparison limit (only 10 were allowed). Limit is now 100, configurable using services.tasks.multi_task_histogram_limit. (Trains Slack channel thread).
Fix scalar plots sometimes not calculated by the server in lower iteration values (Trains Slack channel thread).
Fix error while retrieving experiment log when only a few lines were reported (GitHub trains-server Issue 59).
Update Fixed User full-name on restart (Trains Slack channel thread).
Fix project ordering issue.
When loading plots, display a spinner and don't show "no data".
Improve logging to provide more coherent ElasticSearch connection status in server log.

Trains Agent

Features

Add sdk.metrics.plot_max_num_digits configuration option to reduce plot storage size.
Add agent.package_manager.post_packages and agent.package_manager.post_optional_packages configuration options to control packages install order (e.g. horovod).
Add agent.git_host configuration option for limiting git credential usage for a specific host (overridable using TRAINS_AGENT_GIT_HOST environment variable).
Add agent.force_git_ssh_port configuration option to control HTTPS to SSH link conversion for non-standard SSH ports.
Add requirements detection features. Improve support for detecting new pip version (20+) supporting package @ scheme://link.

Bug Fixes

Fix pre-installed packages are ignored when installing a git package wheel. Reinstalling a git+http link is enough to make sure all requirements are met / installed (GitHub Issue #196).
Fix incorrect check for spaces in current execution folder.
Fix requirements detection:
- Update torch version after using downloaded / system pre-installed version.
- Do not install git packages twice when a new pip version is used (pip freeze will detect the correct git link version).

Version 0.16.0

Trains

Features

Add continuing of previously executed experiments. Add Task.init() argument continue_last_task to continue a previously used Task (GitHub Issue #160).
Allow Task editing / creation from code. Task.export_task/import_task/update_task() (GitHub Issue #128).
Add offline mode. Use Task.set_offline() and Task.import_offline_session():
- Support setting offline mode via TRAINS_OFFLINE_MODE=1 environment variable.
- Support setting offline API version via TRAINS_OFFLINE_MODE=2.9 environment variable.
Automatically pickle all objects uploaded as artifacts, task.upload_artifact() argument auto_pickle=True (GitHub Issue #153).
Add multiple sections / groups support for Task hyperparameters, using Task.connect().
Add multiple configurations (files) using Task.connect_configuration().
Allow enabling OS environment logging using the sdk.development.log_os_environments configuration parameter (complements the TRAINS_LOG_ENVIRONMENT environment variable).
Add Optuna support for hyperparameter optimization controller. OptimizerOptuna is now the default optimizer.
Add initial Keras-Tuner support (GitHub Issue keras-team/keras-tuner #334).
Add automatic FastAI logging. It is disabled if Tensorboard is loaded (assuming TensorBoardLogger will be used).
Support Tensorboard text logging (add_text()) as debug samples (.txt files), instead of as console output.
Allow for more standard confusion matrix reporting. Logger.report_confusion_matrix() argument yaxis_reversed (flips the confusion matrix if True, default False) (GitHub Issue #165).
Add support for Trains Server 0.16.0 (API v2.9 support).
Allow disabling Trains update message from the log using the TRAINS_SUPPRESS_UPDATE_MESSAGE environment variable (GitHub Issue #157).
Add AWS EC2 Auto-Scaler service wizard and Service.
Improved and updated examples:
- Add Keras Tuner CIFAR10 example.
- Add FastAI example.
- Update PyTorch Jupyter notebook examples (GitHub Issue #150).
Support global requirements detection using pip freeze (set sdk.development.detect_with_pip_freeze configuration in trains.conf).
Add Task.get_projects() to get all projects in the system, sorted by last update time.

Bug Fixes

Fix UTC to time stamp in comment (GitHub Issue #152).
Fix and enhance GPU monitoring:
- Fix GPU stats on Windows machines (GitHub Issue #177).
- More robust GPU monitoring (GitHub Issue #170).
Fix filename too long bug (GitHub trains-server Issue #49).
Fix TensorFlow image logging to allow images with no width / height / color metadata (GitHub Issue #182).
Fix multiprocessing Pool throw exception in pool hangs execution. Call original signal handler and re-flush stdout.
Fix plotly support for matplotlib 3.3.
Add Python 2.7 support for get_current_thread_id().
Update examples requirements.
Fix and improve signal handling.
Fix Tensorboard 2D convolution histogram, improve histogram accuracy on very small histograms.
Fix auto logging multiple argparse calls before Task.init().
Limit experiment Git diff logging to 500Kb. If larger than 500Kb, diff section will contain a warning and entire diff will be uploaded as an artifact named auxiliary_git_dif.
Fix requirements detection:
- Fix Trains installed from git+.
- Fix when Trains is not directly imported.
- Fix multiple -e packages were not detected (only the first one).
- Fix running with Trains in PYTHONPATH resulted in double entry of trains.
Fix Task.set_base_docker() on main task to do nothing when running remotely.

Trains Server

:::important Upgrading to this version requires a manual data migration. :::

Features

Add experiment hyperparameter grouping:
- HYPER PARAMETERS tab renamed to CONFIGURATION.
- CONFIGURATION tab contains the sections USER PROPERTIES, HYPER PARAMETERS, CONFIGURATION OBJECTS
- Add user properties group. Key-value pairs always editable (USER PROPERTIES section).
- Add command line options group * argparse and older experiments parameters (CONFIGURATIONS / HYPER PARAMETERS / Args).
- Add TensorFlow definitions group (CONFIGURATIONS / HYPER PARAMETERS / TF_DEFINE).
- Add environment variables group (CONFIGURATIONS / HYPER PARAMETERS / Environment).
Improve experiment model configuration:
- Model design is in the ARTIFACTS tab.
- Experiment model description is in the CONFIGURATION OBJECTS section in the CONFIGURATION tab.
Improve experiment comparison:
- In hyperparameter parallel coordinate comparison, hover over an experiment name to highlight it on plot (GitHub Issue #53).
- Remove fields providing no additional information from comparison.
Improve the model framework filter. Filter contains only frameworks used by models in the project.
Add configurable Trains services examples.
Add support for text debug samples in the DEBUG SAMPLES section in the RESULTS tab.
Add legend on / off toggle control for every plot.
Add clear button for text areas (GitHub trains-server Issue #42).
Reinstate the bottom bar Archive button.
Add Trains community links to left bar.
Add Hi-DPI display support.
Add debug.ping endpoint for simple health monitoring.
Add support for field exclusion in *.get_all endpoints.
Move to ElasticSearch 7. Requires manual data migration.

Bug Fixes

Auto-fit column width on column resize double click.
Allow top-bar search if fewer than three characters are entered, and Enter is pressed.

Trains Agent

Features

Add agent.docker_init_bash_script configuration section to allow finer control over Docker startup script.
Changed default Docker image from nvidia/cuda to nvidia/cuda:10.1-runtime-ubuntu18.04 to support cudnn frameworks (e.g. TF).
Improve support for Dockers with preinstalled conda environment.
Improve trains-agent-docker spinning.
Add daemon --order-fairness for round-robin queue pulling.
Add daemon --stop to terminate a running agent (assuming other arguments are the same). If no additional arguments, Agents are terminated in lexicographical order.
Support cleanup of all log files on termination unless executed with --debug.
Add error message when Trains API Server is not accessible on startup.

Bug Fixes

Fix GPU Windows monitoring support (GitHub Issue #177).
Fix .git-credentials and .gitconfig mapping into docker.
Fix non-root docker image usage.
Fix docker to use UTF-8 encoding, so prints won't break it.
Fix --debug to set all loggers to DEBUG.
Fix task status change to queued should never happen during Task runtime.
Fix requirement_parser to support package @ git+http lines.
Fix GIT user/password in requirements and support for -e git+http lines.
Fix configuration wizard to generate trains.conf matching latest Trains definitions.

23 KiB Raw Blame History

Version 0.16.4

Trains

Version 0.16.3

Trains

Version 0.16.2

Trains

Trains-Agent

Version 0.16.1

Trains

Trains Server

Trains Agent

Version 0.16.0

Trains

Trains Server

Trains Agent

23 KiB

Raw Blame History