Commit Graph

73 Commits

Author SHA1 Message Date
allegroai
17d2bf2a3e Change daemon --stop without any specific flag to terminate the agents by worker id lexicographic order 2020-07-11 01:43:54 +03:00
allegroai
94997f9c88 Add daemon --order-fairness for round-robin queue pulling
Add daemon --stop to terminate running agent (assume all the rest of the arguments are the same)
Clean up all log files on termination unless executed with --debug
2020-07-11 01:42:56 +03:00
allegroai
09b6b6a9de Fix non-root docker image usage
Fix broken trains-agent build
Improve support for dockers with preinstalled conda env
Improve trains-agent-docker spinning
2020-07-06 21:09:11 +03:00
allegroai
f693fa165c Fix .git-credentials and .gitconfig mapping into docker
Add agent.docker_init_bash_script allow finer control over docker startup script
2020-07-02 01:33:13 +03:00
allegroai
f1abee91dd Add FORCE_LOCAL_TRAINS_AGENT_WHEEL to force the install of local trains agent wheel into the docker image 2020-06-21 23:23:26 +03:00
allegroai
257dd95401 Add warning on --gpus without detected CUDA version (see issue #24) 2020-06-18 01:52:58 +03:00
allegroai
473a8de8bb Fix trains-agent init (max two verification retries, then print error) 2020-06-11 15:39:38 +03:00
allegroai
54d9d77294 Allow services mode to re-register (docker can kill it and not exit gracefully) 2020-06-01 16:34:33 +03:00
allegroai
ce02385420 Fix services mode abort docker while installing, detect docker crash 2020-06-01 16:33:47 +03:00
allegroai
522dd85d7b Fix docker build with no --entry-point to use bash as an entrypoint 2020-06-01 11:05:06 +03:00
allegroai
3651c85fcd Fix print if no repo (standalone script) 2020-05-31 14:03:31 +03:00
allegroai
566427d550 Fix build failing due to missing session 2020-05-31 14:02:42 +03:00
allegroai
22c5f043aa Fix detached mode to correctly use cache folder slots 2020-05-31 14:00:14 +03:00
allegroai
860ff8911c Fix status message check containing "worker" (deprecated test) 2020-05-31 13:58:39 +03:00
allegroai
799b292146 Support running code from module (i.e. '-m' in execution entry point) 2020-05-31 13:54:13 +03:00
allegroai
fffe8e1c3f Fix init wizard, correctly display the input servers 2020-05-31 13:53:34 +03:00
allegroai
829b1d8f15 Use deep copy to clone configuration, always write configuration before launching a docker 2020-05-09 20:12:29 +03:00
allegroai
53f511f536 Improve docker host-mount support, use TRAINS_AGENT_DOCKER_HOST_MOUNT env var 2020-05-09 20:02:46 +03:00
allegroai
7c87797a40 Pass git credentials to dockerized task execution 2020-05-09 19:59:58 +03:00
allegroai
272fa07c29 Fix and enhance "build --docker"
- Fix standalone docker execution
- Add --install-globally option to install required packages in the docker's system python
- Add --entry-point option to allow automatic task cloning when running the docker
2020-05-09 19:57:25 +03:00
allegroai
5bb257c46c Add daemon --create-queue to automatically create a queue and use it if queue name doesn't exist in server 2020-05-09 19:50:53 +03:00
allegroai
9cb71b9526 Add daemon service mode to allow multiple tasks to be launched simultaneously on the same machine (--service-mode) 2020-05-09 19:45:14 +03:00
allegroai
38e02ca5cd Add worker command state enforcement conforming and verification callback 2020-05-09 19:42:51 +03:00
allegroai
06bfea80bc Fix read file scope 2020-04-09 11:27:04 +03:00
allegroai
fc28467080 Improve error message when failing to locate a task 2020-04-09 11:23:13 +03:00
allegroai
8d47905982 Show host information when failing to obtain a task 2020-04-01 19:12:45 +03:00
allegroai
a6a0b01f71 Remove deprecated OS environment variables 2020-04-01 19:11:37 +03:00
allegroai
61232d05dd Fix run as user support in Windows and add fall-back for created user folders 2020-03-22 19:16:11 +02:00
allegroai
b3418e4496 Add daemon detached mode (--detached, -d) that runs agent in the background and returns immediately 2020-03-22 19:00:29 +02:00
allegroai
98a983d9a2 Add TRAINS_AGENT_EXTRA_PYTHON_PATH to allow adding additional python path for task execution (helpful when using extra untracked modules) 2020-03-20 10:46:56 +02:00
allegroai
482007c4ce Fix run as user feature (TRAINS_AGENT_EXEC_USER) 2020-03-20 10:42:32 +02:00
allegroai
98198b8006 Auto mount ~/.git-credentials into docker container if file exists 2020-03-20 10:39:59 +02:00
allegroai
58ab67ea31 Fix execution output handling 2020-03-20 10:35:25 +02:00
allegroai
15e9e6b778 Fix "execute --clone" support 2020-03-12 18:38:35 +02:00
allegroai
aa75b92e46 Prefer docker image from command line over the one in the experiment 2020-03-12 18:35:49 +02:00
allegroai
757210d5b3 Add support for "execute --docker" and for cloning an experiment before execution 2020-03-12 18:33:07 +02:00
allegroai
3393372b9c Do not share apt cache among agents on the same machine 2020-03-09 12:38:51 +02:00
allegroai
f2d2d702de Fix k8s support to allow a specific network for the docker (do not use the parent daemon network definition) 2020-03-09 12:38:32 +02:00
allegroai
e3d0680d39 Improve Unicode/UTF stdout handling 2020-03-09 12:34:48 +02:00
allegroai
0272c4c79c Add "--force-current-version" daemon command-line flag 2020-03-09 12:31:43 +02:00
allegroai
47bcd3839a Pass correct GPU limit when skipping gpus flag in docker mode 2020-03-05 14:07:44 +02:00
allegroai
0a3a8a1c52 Add support for mounting dockerized experiment folders to host when running on K8s in daemon mode 2020-03-05 13:13:03 +02:00
allegroai
231a907cff Add support for running daemon inside a K8s pod in daemon mode 2020-03-05 13:03:36 +02:00
allegroai
8f95eecf2e Add TRAINS_AGENT_EXEC_USER support for multiple daemon instances 2020-03-05 12:46:53 +02:00
allegroai
81008ee00e Add support for launching a specific python version based on Task.script.binary 2020-03-01 17:15:18 +02:00
allegroai
f838c8fc70 Allow providing queue names to daemon 2020-02-26 16:58:25 +02:00
allegroai
d558c66d3c Do not stop experiments if network is down 2020-02-10 10:47:13 +02:00
allegroai
a57a5b151c Daemon support for conda and poetry 2020-01-26 15:05:20 +02:00
allegroai
284271c654 Support limiting pip version, limit to <20 by default 2020-01-22 12:02:12 +02:00
allegroai
06897f7606 Fix poetry support 2020-01-21 16:23:36 +02:00