Commit Graph

410 Commits

Author SHA1 Message Date
allegroai
a671692832 Fix --services-mode with instance limit 2021-04-20 18:11:36 +03:00
allegroai
5c8675e43a Add support for dynamic gpus opportunistic scheduling (with min/max gpus per queue) 2021-04-20 18:11:16 +03:00
allegroai
60a58f6fad Fix poetry support (issue #57) 2021-04-14 11:22:07 +03:00
allegroai
537b67e0cd Fix agent can return non-zero error code and pods will end up restarting forever (issue #56) 2021-04-12 23:00:59 +03:00
allegroai
82c5e55fe4 Fix usage of not_set in k8s template merge 2021-04-07 21:30:13 +03:00
allegroai
945dd816ad Fix no docker arguments 2021-04-07 18:47:13 +03:00
allegroai
45009e6cc2 Add support for updating back docker on new API v2.13 2021-04-07 18:46:58 +03:00
allegroai
3774fa6abd Add support for new container base setup script feature 2021-04-07 18:46:14 +03:00
allegroai
e71e6865d2 Add agent.docker_install_opencv_libs (default: True) to enable auto opencv libs install for faster docker spin-up 2021-04-07 18:45:44 +03:00
allegroai
0e8f1528b1 Remove redundant py2 code 2021-04-07 18:44:59 +03:00
allegroai
c331babf51 Add stopping message on Task process termination
Fix --stop on dynamic gpus venv mode
2021-04-07 18:44:33 +03:00
allegroai
c59d268995 Fix venv cache crash on bad symbolic links 2021-04-07 18:44:11 +03:00
allegroai
9e9fcb0ba9 Add dynamic mode terminate dockers on sig_term 2021-04-07 18:43:44 +03:00
allegroai
f33e0b2f78 Verify docker command exists when running in docker mode 2021-04-07 18:42:27 +03:00
allegroai
0e4b99351f Add --stop support for dynamic gpus
Fix --stop mark tasks as aborted (not failed as before)
2021-04-07 18:42:10 +03:00
allegroai
81edd2860f Fix --dynamic-gpus should keep original queue priority order 2021-03-31 23:55:12 +03:00
allegroai
14ac584577 Support k8s glue container env vars merging 2021-03-31 23:53:58 +03:00
allegroai
9ce6baf074 Fix broken k8s glue docker args parsing
Fix empty env prevents override when merging template
2021-03-26 12:26:15 +03:00
allegroai
92a1e07b33 Fix local path replace back when using cache 2021-03-26 12:16:05 +03:00
allegroai
cb6bdece39 Fix cuda version from driver does not return minor version 2021-03-18 10:07:59 +02:00
allegroai
2ea38364bb Change the default conda channel order, so it pulls the correct pytorch 2021-03-18 10:07:58 +02:00
allegroai
cf6fdc0d81 Add support for PyJWT v2 2021-03-18 10:07:58 +02:00
allegroai
91eec99563 Add conda debug prints (--debug) 2021-03-18 10:07:58 +02:00
allegroai
d9b9b4984b Version bump to v0.17.2 2021-03-04 20:12:50 +02:00
allegroai
205f9dd816 Fix k8s glue does not pass docker environment variables
Remove deprecated flags
2021-03-03 15:07:06 +02:00
allegroai
9dfa1294e2 Add agent.enable_task_env set the OS environment based on the Environment section of the Task. 2021-02-28 19:47:44 +02:00
allegroai
f019905720 Fix venv cache support for local folders 2021-02-28 19:47:09 +02:00
allegroai
9c257858dd Fix venv cache support for local folders 2021-02-23 18:54:38 +02:00
allegroai
2006ab20dd Fix conda support for git+http links 2021-02-23 12:46:06 +02:00
allegroai
0caf31719c Fix venv caching always reinstall git repositories and local repositories 2021-02-23 12:45:34 +02:00
allegroai
5da7184276 Add agent.ignore_requested_python_version (control for multi python environments) 2021-02-23 12:45:00 +02:00
allegroai
50fccdab96 PEP8 2021-02-23 12:44:26 +02:00
allegroai
77d6ff6630 Fix docker mode without venvs cache dir 2021-02-17 00:04:07 +02:00
allegroai
58cb344ee6 Upgrade pynvml add detect CUDA version from driver level 2021-02-17 00:03:16 +02:00
allegroai
22d5892b12 Use shared git cache between multiple agents on the same machine 2021-02-14 13:49:29 +02:00
allegroai
f619969efc Add venvs_cache configuration 2021-02-14 13:48:57 +02:00
allegroai
ca242424ab Fix service-mode support for venvs
Fix --services-mode with venvs
2021-02-14 13:45:17 +02:00
allegroai
407deb84e9 Fix multi instances on Windows 2021-02-14 13:44:39 +02:00
allegroai
14589aa094 Fix CPU mode 2021-02-14 13:44:00 +02:00
allegroai
1260e3d942 Update cache entries on conda package manager 2021-02-11 14:47:26 +02:00
allegroai
b22d926d94 Fix cache to take cuda version into account 2021-02-11 14:47:05 +02:00
allegroai
410cc8c7be Add --dynamic-gpus and limit in --services-mode 2021-02-11 14:46:37 +02:00
allegroai
784c676f5b Fix "from clearml" runtime diff patching (make sure we move it to after all the __future__ imports) include handling triple quotes in comments 2021-02-11 14:46:06 +02:00
allegroai
296f7970df Fix file not found error (no 2) interpreted as aborted (i.e. ctrl-c) 2021-02-11 14:44:54 +02:00
allegroai
cd59933c9c Remove unused packages 2021-02-11 14:44:35 +02:00
allegroai
b95d3f5300 Add venv caching with docker mode support 2021-02-11 14:44:19 +02:00
allegroai
fa0d5d8469 Fix --detached not supported on Windows, ignore and issue warning 2021-02-11 14:40:09 +02:00
allegroai
8229843018 Add base-pod-number parameter to k8s glue and example 2021-01-26 20:00:18 +02:00
allegroai
c578b37c6d Change dump configuration and ssh on every docker run 2021-01-24 08:48:10 +02:00
allegroai
8ea062c0bd Fix environment variables CLEARML_WEB_HOST/CLEARML_FILES_HOST not passed to running tasks (or updated on the config object) 2021-01-24 08:47:33 +02:00
allegroai
5d8bbde434 Fix applying git diff on new added file 2021-01-24 08:46:42 +02:00
allegroai
0462af6a3d Allow providing namespace in k8s glue and k8s glue example 2021-01-20 19:01:03 +02:00
allegroai
161993f66f Add agent.force_git_ssh_user configuration value (issue #42)
Change default docker to nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
2021-01-10 12:38:45 +02:00
allegroai
b7f87fb8d3 Detect and delete "stuck" k8s pods k8s glue 2021-01-10 12:37:13 +02:00
allegroai
8fdb87f1f5 Fix docker --network returns None 2020-12-30 16:57:04 +02:00
allegroai
a1f2941ffd version bump 2020-12-25 02:10:06 +02:00
allegroai
428781af86 Fix support for Windows pip and Conda requirements.txt 2020-12-25 02:06:40 +02:00
allegroai
a455003c7f version bump 2020-12-23 00:13:51 +02:00
allegroai
b4d143812e initial clearml-agent v0.17.0 2020-12-22 23:00:57 +02:00
allegroai
6e1f74402e Rename trains-agent -> clearml-agent 2020-12-22 21:21:29 +02:00