allegroai
121dec2a62
Version bump to v0.16.0
2020-08-10 17:28:00 +03:00
allegroai
4aacf9005e
Fix GPU Windows monitoring support (Trains Issue #177)
2020-08-10 08:07:51 +03:00
allegroai
6b333202e9
Sync generated conf file with latest Trains
2020-08-08 14:44:45 +03:00
allegroai
ce6831368f
Fix GPU monitoring on Windows machines
2020-08-08 14:43:25 +03:00
allegroai
e4111c830b
Fix GIT user/pass in requirements and support for '-e git+http' lines
2020-07-30 14:30:23 +03:00
allegroai
52c1772b04
Embed requirement_parser into trains-agent instead of using it as a dependency. Fix requirement_parser to support 'package @ git+http' lines
2020-07-30 14:29:37 +03:00
allegroai
699d13bbb3
Fix task status change to queued, which should also never happen during Task runtime
2020-07-14 23:42:11 +03:00
allegroai
2c8d7d3d9a
Fix --debug to set all specified loggers to DEBUG
Add set_urllib_log_level; in debug mode, set urllib log level to DEBUG
2020-07-11 01:45:46 +03:00
allegroai
b13cc1e8e7
Add error message when Trains API Server is not accessible on startup
2020-07-11 01:44:45 +03:00
allegroai
17d2bf2a3e
Change daemon --stop without any specific flag to terminate the agents in lexicographic order of worker id
2020-07-11 01:43:54 +03:00
allegroai
94997f9c88
Add daemon --order-fairness for round-robin queue pulling
Add daemon --stop to terminate running agent (assume all the rest of the arguments are the same)
Clean up all log files on termination unless executed with --debug
2020-07-11 01:42:56 +03:00
allegroai
c6d998c4df
Add terminate process and rmtree utilities
2020-07-11 01:40:50 +03:00
allegroai
712efa208b
version bump
2020-07-06 21:09:21 +03:00
allegroai
09b6b6a9de
Fix non-root docker image usage
Fix broken trains-agent build
Improve support for dockers with preinstalled conda env
Improve trains-agent-docker spinning
2020-07-06 21:09:11 +03:00
allegroai
98ff9a50e6
Changed agent.docker_init_bash_script default value in comment
2020-07-06 21:05:55 +03:00
allegroai
1f4d358316
Changed default docker image from nvidia/cuda to "nvidia/cuda:10.1-runtime-ubuntu18.04" to support cudnn frameworks (TF)
2020-07-02 01:35:57 +03:00
allegroai
f693fa165c
Fix .git-credentials and .gitconfig mapping into docker
Add agent.docker_init_bash_script to allow finer control over the docker startup script
2020-07-02 01:33:13 +03:00
allegroai
c43084825c
Version bump to v0.15.1
2020-06-21 23:23:44 +03:00
allegroai
f1abee91dd
Add FORCE_LOCAL_TRAINS_AGENT_WHEEL to force the install of local trains agent wheel into the docker image
2020-06-21 23:23:26 +03:00
allegroai
c6b04edc34
version bump
2020-06-18 01:55:30 +03:00
allegroai
1f53a06299
Add agent.force_git_ssh_protocol option to force all git links to ssh:// (issue #16)
Add git user/pass credentials for pip git packages (git+http and git+ssh) (issue #22)
2020-06-18 01:55:14 +03:00
allegroai
257dd95401
Add warning on --gpus without detected CUDA version (see issue #24)
2020-06-18 01:52:58 +03:00
allegroai
6fef58df6c
Embed jsonmodels 2.4 into trains-agent
2020-06-18 00:30:40 +03:00
allegroai
473a8de8bb
Fix trains-agent init (max two verification retries, then print error)
2020-06-11 15:39:38 +03:00
allegroai
c58ffdb9f8
Version bump to v0.15.0
2020-06-01 19:56:59 +03:00
allegroai
54d9d77294
Allow services mode to re-register (docker can kill it and not exit gracefully)
2020-06-01 16:34:33 +03:00
allegroai
ce02385420
Fix services mode abort docker while installing, detect docker crash
2020-06-01 16:33:47 +03:00
allegroai
87ffd95eaa
Upgrade default pip version to <20.2
2020-06-01 16:33:00 +03:00
allegroai
522dd85d7b
Fix docker build with no --entry-point to use bash as an entrypoint
2020-06-01 11:05:06 +03:00
allegroai
3651c85fcd
Fix print if no repo (standalone script)
2020-05-31 14:03:31 +03:00
allegroai
566427d550
Fix build failing due to missing session
2020-05-31 14:02:42 +03:00
allegroai
cc99077c92
Do not monitor GPU when running with --cpu-only
2020-05-31 14:01:14 +03:00
allegroai
5f112447f7
CUDA_VISIBLE_DEVICES should not be set to "all"
2020-05-31 14:00:51 +03:00
allegroai
22c5f043aa
Fix detached mode to correctly use cache folder slots
2020-05-31 14:00:14 +03:00
allegroai
860ff8911c
Fix status message check containing "worker" (deprecated test)
2020-05-31 13:58:39 +03:00
allegroai
799b292146
Support running code from module (i.e. '-m' in execution entry point)
2020-05-31 13:54:13 +03:00
allegroai
fffe8e1c3f
Fix init wizard, correctly display the input servers
2020-05-31 13:53:34 +03:00
allegroai
8245293f7f
Fix request endpoint constant version numbers
2020-05-31 13:52:53 +03:00
allegroai
829b1d8f15
Use deep copy to clone configuration, always write configuration before launching a docker
2020-05-09 20:12:29 +03:00
allegroai
f6be64a4b5
Print conda install output if running in debug mode, turn on debugging if --debug flag is used
2020-05-09 20:11:01 +03:00
allegroai
21f6a73f66
Include CUDA version in the pytorch package fail error
2020-05-09 20:09:18 +03:00
allegroai
77c4c79a2f
Support pip 20.1 local/http package reference in pip freeze
2020-05-09 20:08:17 +03:00
allegroai
2ad929fa00
Add torch_nightly flag support (if torch wheel is not found on stable try the nightly builds), improve support for torch in freeze (add actually used HTTP link as comment to the original package)
2020-05-09 20:08:05 +03:00
allegroai
53f511f536
Improve docker host-mount support, use TRAINS_AGENT_DOCKER_HOST_MOUNT env var
2020-05-09 20:02:46 +03:00
allegroai
7c87797a40
Pass git credentials to dockerized task execution
2020-05-09 19:59:58 +03:00
allegroai
272fa07c29
Fix and enhance "build --docker"
- Fix standalone docker execution
- Add --install-globally option to install required packages in the docker's system python
- Add --entry-point option to allow automatic task cloning when running the docker
2020-05-09 19:57:25 +03:00
allegroai
6ce9cf7c2a
Fix version control links in requirements when using conda
2020-05-09 19:52:51 +03:00
allegroai
abb30ac2b8
Move --gpus and --cpu-only to worker args (used by daemon, execute and build)
2020-05-09 19:51:45 +03:00
allegroai
5bb257c46c
Add daemon --create-queue to automatically create a queue and use it if queue name doesn't exist in server
2020-05-09 19:50:53 +03:00
allegroai
c65b28ed92
Update venv_update URL
2020-05-09 19:47:00 +03:00
allegroai
fce8eb6782
Add OS environment configuration for git user/pass using TRAINS_AGENT_GIT_USER/TRAINS_AGENT_GIT_PASS
2020-05-09 19:46:46 +03:00
allegroai
9cb71b9526
Add daemon service mode to allow multiple tasks to be launched simultaneously on the same machine (--service-mode)
2020-05-09 19:45:14 +03:00
allegroai
38e02ca5cd
Add worker command state enforcement conforming and verification callback
2020-05-09 19:42:51 +03:00
allegroai
06bfea80bc
Fix read file scope
2020-04-09 11:27:04 +03:00
allegroai
e660c7f2be
Fix comments in config files
2020-04-09 11:23:45 +03:00
allegroai
fc28467080
Improve error message when failing to locate a task
2020-04-09 11:23:13 +03:00
allegroai
8d47905982
Show host information when failing to obtain a task
2020-04-01 19:12:45 +03:00
allegroai
a6a0b01f71
Remove deprecated OS environment variables
2020-04-01 19:11:37 +03:00
allegroai
2b561f6066
Version bump to v0.14.1
2020-03-24 20:37:18 +02:00
allegroai
61232d05dd
Fix run as user support in Windows and add fall-back for created user folders
2020-03-22 19:16:11 +02:00
allegroai
b3418e4496
Add daemon detached mode (--detached, -d) that runs agent in the background and returns immediately
2020-03-22 19:00:29 +02:00
allegroai
5ef627165c
Fix PyTorch support to ignore minor versions when looking for package to install or to download
2020-03-20 10:48:48 +02:00
allegroai
98a983d9a2
Add TRAINS_AGENT_EXTRA_PYTHON_PATH to allow adding additional python path for task execution (helpful when using extra untracked modules)
2020-03-20 10:46:56 +02:00
allegroai
482007c4ce
Fix run as user feature (TRAINS_AGENT_EXEC_USER)
2020-03-20 10:42:32 +02:00
allegroai
98198b8006
Auto mount ~/.git-credentials into docker container if file exists
2020-03-20 10:39:59 +02:00
allegroai
94bb11a81a
Change message when using local torch
2020-03-20 10:37:42 +02:00
allegroai
58ab67ea31
Fix execution output handling
2020-03-20 10:35:25 +02:00
allegroai
ea0ed4807e
Version bump to v0.14.0
2020-03-12 19:42:32 +02:00
allegroai
389600b91e
Fix git checkout with submodules
2020-03-12 18:39:47 +02:00
allegroai
5fb2550212
Update to backend API v2.5
2020-03-12 18:39:10 +02:00
allegroai
15e9e6b778
Fix "execute --clone" support
2020-03-12 18:38:35 +02:00
allegroai
aa75b92e46
Prefer docker image from command line over the one in the experiment
2020-03-12 18:35:49 +02:00
allegroai
757210d5b3
Add support for "execute --docker" and for cloning an experiment before execution
2020-03-12 18:33:07 +02:00
allegroai
00eb2f10ec
Version bump to v0.13.3
2020-03-09 16:07:50 +02:00
allegroai
3393372b9c
Do not share apt cache among agents on the same machine
2020-03-09 12:38:51 +02:00
allegroai
f2d2d702de
Fix k8s support to allow a specific network for the docker (do not use the parent daemon network definition)
2020-03-09 12:38:32 +02:00
allegroai
e3d0680d39
Improve Unicode/UTF stdout handling
2020-03-09 12:34:48 +02:00
allegroai
618c2ac5c4
Add default storage environment vars to generated agent configuration
2020-03-09 12:33:03 +02:00
allegroai
0272c4c79c
Add "--force-current-version" daemon command-line flag
2020-03-09 12:31:43 +02:00
allegroai
ff8cf63abf
Add "--force-current-version" daemon command-line flag
2020-03-09 12:27:39 +02:00
allegroai
2c7c7f5b44
Add K8s/trains glue service example
2020-03-05 14:10:08 +02:00
allegroai
47bcd3839a
Pass correct GPU limit when skipping gpus flag in docker mode
2020-03-05 14:07:44 +02:00
allegroai
0a3a8a1c52
Add support for mounting dockerized experiment folders to host when running on K8s in daemon mode
2020-03-05 13:13:03 +02:00
allegroai
231a907cff
Add support for running daemon inside a K8s pod in daemon mode
2020-03-05 13:03:36 +02:00
allegroai
8f95eecf2e
Add TRAINS_AGENT_EXEC_USER support for multiple daemon instances
2020-03-05 12:46:53 +02:00
allegroai
81008ee00e
Add support for launching a specific python version based on Task.script.binary
2020-03-01 17:15:18 +02:00
allegroai
f838c8fc70
Allow providing queue names to daemon
2020-02-26 16:58:25 +02:00
allegroai
596093aac6
Version bump to v0.13.2
2020-02-23 16:25:14 +02:00
allegroai
8f23f3b4c0
Add support for pulling recursive git modules as well as the main project
2020-02-23 15:48:12 +02:00
allegroai
95d503afdd
Fix pip install or upgrade with limit in conda
2020-02-23 15:47:28 +02:00
allegroai
73ee33be99
Print error in case Poetry configuration failed
2020-02-23 14:43:21 +02:00
allegroai
afec38a50e
Add missing models service
2020-02-18 11:31:58 +02:00
allegroai
f9c60904f4
version bump
2020-02-12 11:23:53 +02:00
allegroai
5d74f4b376
version bump
2020-02-10 10:47:20 +02:00
allegroai
d558c66d3c
Do not stop experiments if network is down
2020-02-10 10:47:13 +02:00
allegroai
43b2f7f41d
version bump
2020-02-04 18:06:45 +02:00
allegroai
28d752d568
Preinstall numpy if it exists in the requirements (temporary fix)
2020-02-04 18:06:25 +02:00
allegroai
5c6b3ccc94
Version bump to v0.13.1
2020-01-27 19:45:26 +02:00
allegroai
df10e6ed46
Fix conda support to install graphviz packages even if matplotlib was installed from pip
2020-01-27 19:22:51 +02:00
allegroai
8ef78fd058
version bump
2020-01-27 16:23:23 +02:00