350 Commits

Author SHA1 Message Date
allegroai
531e514003 Add custom build script support
Add extra configurations when starting daemon
Propagate token to docker in case credentials are not available
2022-03-15 10:04:25 +02:00
allegroai
2cd9e706c8 Fix user-provided " is unnecessarily replaced to \\" 2022-03-15 10:02:28 +02:00
Idan Tene
e3e6a1dda8 Fix virtualenv python interpreter used (#98)
* Add virtualenv version logging
* Force using requested python interpreter
2022-02-27 11:25:25 +02:00
pollfly
36073ad488 Fix links (#100) 2022-02-17 12:04:11 +02:00
allegroai
d89d0f9ff5 Fix pathlib2 six conflict, version bump 2022-02-09 18:29:04 +02:00
allegroai
14c48d0a78 Fix FORCE_LOCAL_CLEARML_AGENT_WHEEL when running from a Windows host 2022-02-09 18:28:17 +02:00
allegroai
b1ee3e105b Version bump 2022-02-07 20:05:03 +02:00
allegroai
1f53c4fd1b Fix agent fails to check out code from main branch when branch/commit is not explicitly specified 2022-02-07 20:04:08 +02:00
allegroai
bfed3ccf4d Fix agent attempts to check out code when in standalone mode 2022-02-07 20:03:08 +02:00
pollfly
d521482409 Add spaces to help menu (#96) 2022-02-06 12:45:21 +02:00
allegroai
53eba5658f Fix conda package manager listed packages with local links (@ file://) should ignore the local package if it does not exist
Fix cuda patch version support in conda
2022-02-02 16:33:07 +02:00
allegroai
bb64e4a850 Fix hide_docker_command_env_vars mode to include URL passwords and handle env vars containing docker commands 2022-02-02 16:30:34 +02:00
allegroai
771690d5c0 Fix ENV_API_DEFAULT_REQ_METHOD no default value causes ValueError if not specified 2022-01-31 12:39:39 +02:00
pollfly
d39e30995a Fix links (#93) 2022-01-27 12:15:36 +02:00
allegroai
363aaeaba8 Fix symbolic links not copied from cached VCS into working copy. Windows platform will result with default copy content instead of original symbolic link (issue #89) 2022-01-23 10:42:11 +02:00
allegroai
fa1307e62c Add agent.poetry_version to specify poetry version (and force installation of poetry if missing) 2022-01-23 10:40:05 +02:00
allegroai
e7c9e9695b Fix using deprecated abc support 2022-01-23 10:39:13 +02:00
Mal Miller
bf07b7f76d Add environment variable for request method (#91)
* Add environment variable for default request method
2022-01-12 20:29:17 +02:00
allegroai
5afb604e3d Fix default_python set to None 2022-01-07 15:12:27 +02:00
allegroai
b3e8be6296 Add agent.force_git_root_python_path configuration setting to force adding the git repository root folder to the PYTHONPATH (if set, the working directory is not added to the PYTHONPATH) 2022-01-07 15:11:59 +02:00
allegroai
2cb452b1c2 Version bump 2021-12-29 13:21:31 +02:00
allegroai
938fcc4530 Add build --force-docker command line argument to allow ignoring task container data 2021-12-29 13:21:25 +02:00
allegroai
73625bf00f Version bump 2021-12-21 14:29:43 +02:00
allegroai
f41ed09dc1 Add support for custom docker image resolving 2021-12-21 14:29:43 +02:00
allegroai
f03c4576f7 Update default docker image 2021-12-21 14:29:43 +02:00
allegroai
5a6caf6399 Fix "git+git://" requirements 2021-10-29 22:58:28 +03:00
allegroai
a07053d961 Version bump to v1.1.1 2021-10-26 10:12:21 +03:00
allegroai
aa9a9a25fb version bump 2021-10-21 12:03:29 +03:00
allegroai
cd4a39d8fc Fix config example 2021-10-21 12:03:07 +03:00
allegroai
92e3f00435 Add support for truncating task log file after reporting to server 2021-10-21 12:02:31 +03:00
allegroai
a890e36a36 Fix PY2.7 support for pytorch 2021-10-19 10:47:09 +03:00
allegroai
bed94ee431 Add support for configuration env and files section 2021-10-19 10:46:43 +03:00
allegroai
175e99b12b Fix if queue tag default does not exist and --queue not specified, try queue name "default" 2021-10-16 23:21:45 +03:00
allegroai
2a941e3abf Fix --stop checking default queue tag (issue #80) 2021-10-16 23:21:12 +03:00
allegroai
3c8e0ae5db Improve PyJWT resiliency support 2021-10-10 09:08:36 +03:00
allegroai
e416ab526b Fix Python 3.5 compatibility 2021-09-26 00:05:08 +03:00
pollfly
e17246d8ea Fix docstring typos (#79)
* edit docstring typo
* fix typos
2021-09-14 18:42:18 +03:00
allegroai
f6f043d1ca Version bump to v1.1.0 2021-09-13 15:25:25 +03:00
allegroai
db57441c5d Fix sensitive environment variable values are not masked in "executing docker" printout (issue #67) 2021-09-13 14:00:11 +03:00
allegroai
31d90be0a1 Fix package manager config documentation (issue #78) 2021-09-10 13:11:39 +03:00
allegroai
5a080798cb Add support for overriding initial server connection behavior using the CLEARML_AGENT_INITIAL_CONNECT_RETRY_OVERRIDE env var (defaults to true, allows boolean value or an explicit number specifying the number of connect retries) 2021-08-27 19:15:14 +03:00
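The value format described in 5a080798cb (a boolean or an explicit retry count, defaulting to true) could be parsed roughly as follows. This is a minimal sketch, not the agent's own code: the helper name `parse_connect_retry_override` and the accepted boolean spellings are assumptions, and only the environment variable name comes from the log entry.

```python
import os


def parse_connect_retry_override(raw):
    """Return True/False for a boolean string, or an int for an explicit retry count.

    Hypothetical helper: the accepted spellings below are assumptions.
    """
    if raw is None or not raw.strip():
        return True  # the log entry states that the setting defaults to true
    text = raw.strip().lower()
    if text in ("true", "yes", "on"):
        return True
    if text in ("false", "no", "off"):
        return False
    try:
        return int(text)  # explicit number of connect retries
    except ValueError:
        raise ValueError("expected a boolean or an integer, got {!r}".format(raw))


override = parse_connect_retry_override(
    os.environ.get("CLEARML_AGENT_INITIAL_CONNECT_RETRY_OVERRIDE")
)
```

In this sketch purely numeric strings such as "1" or "0" are treated as retry counts rather than booleans; the log entry does not say how the agent resolves that ambiguity.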
pollfly
21c4857795 Fix docstring typo (#75) 2021-08-22 08:19:55 +03:00
allegroai
4149afa896 Add agent.docker_internal_mounts to control containers internal mounts (non-root containers) 2021-08-21 16:03:37 +03:00
allegroai
b196ab5793 Do not overwrite PYTHONIOENCODING if defined 2021-08-20 00:37:21 +03:00
allegroai
b39b54bbaf Add poetry cache into docker mapping (issue #74) 2021-08-13 11:02:21 +03:00
allegroai
26d76f52ac Fix venv cache cannot reinstall package from git with http credentials 2021-08-13 11:00:54 +03:00
allegroai
2fff28845d Fix support for unicode standalone scripts, changing default 'ascii' encoding to UTF-8. 2021-08-12 13:39:11 +03:00
allegroai
5e4c495d62 Add support for naming docker containers. Use agent.docker_container_name_format to configure the name format (disabled by default) (issue clearml/#412)
Add missing entries in docs/clearml.conf
2021-08-12 13:38:26 +03:00
allegroai
5c5802c089 Fix python package with git+git:// links or git+ssh:// conversion 2021-08-12 13:37:10 +03:00
allegroai
06010ef1b7 Disable default demo server (still available with CLEARML_NO_DEFAULT_SERVER=0) 2021-08-12 13:36:49 +03:00
allegroai
bd411a1984 version bump 2021-08-05 19:23:23 +03:00
allegroai
0fbbe774fa Fix support for "-r requirements.txt" in installed packages 2021-08-05 19:19:54 +03:00
allegroai
6b602889a5 Fix import loop 2021-08-03 01:28:08 +03:00
allegroai
cd046927f3 Add k8s glue update task status_message in hanging pods daemon
Fix k8s glue not throwing error when failing to push to queue
2021-08-02 22:59:31 +03:00
allegroai
5ed47d2d2c Add support for CLEARML_NO_DEFAULT_SERVER env var to prevent agent from using the demo server
Add support for FORCE_CLEARML_AGENT_REPO env var to allow installing agent from a repo url when executing a task
Implement skip venv installation on execute and allow custom binary
Fix services mode limit implementation in docker mode
2021-08-02 22:51:26 +03:00
allegroai
fd068c0933 Add support for env vars containing bash-style string lists using shlex
Add support for CLEARML_AGENT_SKIP_PIP_VENV_INSTALL env var to skip venv installation on execute and allow custom binary
Add support for CLEARML_AGENT_VENV_CACHE_PATH env var to allow overriding venv cache folder configuration
Add support for CLEARML_AGENT_EXTRA_DOCKER_ARGS env var to allow overriding extra docker args configuration
2021-08-02 22:38:36 +03:00
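The shlex-based handling of bash-style string lists mentioned in fd068c0933 can be illustrated with a small sketch. It is not taken from the agent's source: the helper name `parse_env_list` and the sample value are hypothetical, and only the variable name CLEARML_AGENT_EXTRA_DOCKER_ARGS comes from the log entry.

```python
import os
import shlex


def parse_env_list(name, default=None):
    """Split an env var holding a bash-style argument string into a list.

    Hypothetical helper, for illustration only.
    """
    value = os.environ.get(name)
    if value is None or not value.strip():
        return list(default or [])
    # shlex.split honours shell quoting, so quoted values with spaces stay whole
    return shlex.split(value)


# Sample value (an assumption): a quoted mount path that contains a space
os.environ["CLEARML_AGENT_EXTRA_DOCKER_ARGS"] = '--ipc=host -v "/mnt/my data:/data"'
extra_args = parse_env_list("CLEARML_AGENT_EXTRA_DOCKER_ARGS")
print(extra_args)
```

A plain `str.split()` would break the quoted mount path into two items, which is the case shell-style splitting avoids.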
Simon Gasse
9456e493ac Enable rewriting SSH URLs
ClearML Agent allows forcing git cloning via SSH and also has a
setting to force a username. The relevant settings are:
agent.force_git_ssh_protocol: true
agent.force_git_ssh_user: "git"

However, forcing a specific username or port only worked so far if the
agent translated either from https->ssh or from ssh->https. A given
ssh URL was not rewritten.

This commit adds a helper function and includes it in `_set_ssh_url`
to allow rewriting ssh URLs with the username and/or port given in the
config `agent.force_git_ssh_user`.
If neither username nor port are forced in the config, the URL is not
touched.

This is somewhat related to issue #42.
Note that rewriting https->https is not covered in this commit.
2021-07-31 23:34:27 +03:00
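The behaviour described in 9456e493ac (rewrite an existing SSH URL with a forced username and/or port, and leave it untouched when neither is forced) might look roughly like the sketch below. It is an illustration under assumptions, not the helper added in that commit: the function name `rewrite_ssh_url`, its parameters, and the regular expressions are all hypothetical.

```python
import re

# ssh://[user@]host[:port]/path
_SSH_URL = re.compile(
    r"^ssh://(?:(?P<user>[^@/]+)@)?(?P<host>[^:/]+)(?::(?P<port>\d+))?/(?P<path>.+)$"
)
# scp-like form: [user@]host:path (this syntax cannot carry a port)
_SCP_URL = re.compile(r"^(?:(?P<user>[^@/]+)@)?(?P<host>[^:/]+):(?P<path>[^/].*)$")


def rewrite_ssh_url(url, force_user=None, force_port=None):
    """Apply a forced username and/or port to an SSH git URL (hypothetical helper)."""
    if not force_user and not force_port:
        return url  # nothing forced: the URL is not touched
    match = _SSH_URL.match(url)
    scp_like = False
    if match is None:
        if "://" in url:
            return url  # another scheme such as https: not covered here
        match = _SCP_URL.match(url)
        scp_like = True
    if match is None:
        return url
    user = force_user or match.group("user")
    port = force_port or (None if scp_like else match.group("port"))
    prefix = "{}@".format(user) if user else ""
    if scp_like and not port:
        # keep the scp-like form when no port has to be expressed
        return "{}{}:{}".format(prefix, match.group("host"), match.group("path"))
    netloc = "{}{}".format(prefix, match.group("host"))
    if port:
        netloc += ":{}".format(port)
    return "ssh://{}/{}".format(netloc, match.group("path"))


print(rewrite_ssh_url("git@github.com:allegroai/clearml-agent.git", force_user="deploy"))
print(rewrite_ssh_url("ssh://git@example.com:2222/group/repo.git", force_user="ci"))
```

Because the scp-like syntax has no place for a port, this sketch converts such URLs to the `ssh://` form whenever a port is forced. That is a design choice of the example and not something the commit message specifies.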
allegroai
42606d9247 Fix multiple k8s glue instances with pod limits
Version bump
2021-07-15 10:28:43 +03:00
allegroai
499b3dfa66 Fix k8s glue, do not reset Task before re-enqueuing as it will remove runtime properties 2021-07-15 10:27:54 +03:00
allegroai
ca360b7d43 Improve max pod limit check 2021-07-15 10:26:49 +03:00
allegroai
6470b16b70 Add k8s set task container if using default image/arguments 2021-07-15 10:26:09 +03:00
allegroai
4c9410c5fe Fix auto mount SSH_AUTH_SOCK into docker (issue #45) 2021-07-11 09:44:49 +03:00
allegroai
382604e923 Fix services mode killing child processes when running in services mode + venv 2021-06-30 23:58:25 +03:00
allegroai
0e7546f248 Fix docker force pull in k8s glue _kubectl_apply() 2021-06-27 09:42:14 +03:00
allegroai
e3c8bd5666 Add support for agent.docker_force_pull configuration setting in k8s glue 2021-06-25 17:36:08 +03:00
allegroai
3ae1741343 Fix k8s glue task container arguments not supported in kubectl_run command
Fix k8s glue not passing required extra_docker_bash_script to string format
2021-06-25 17:35:01 +03:00
allegroai
53c106c3af Fix k8s glue task container handling fails parsing docker image
Fix k8s glue uses task container image arguments when no image is specified
2021-06-25 17:34:28 +03:00
allegroai
44fc7dffe6 Fix key/secret usage printout 2021-06-24 19:37:59 +03:00
allegroai
aaa6b32f9f Fix support for "-r requirements.txt" inside "installed packages" 2021-06-24 19:26:35 +03:00
allegroai
821a0c4a2b Fix parsing VCS links starting with "git+git@" (notice "git+git://" was already supported) 2021-06-24 19:25:41 +03:00
allegroai
176b4a4cde Fix --services-mode when the execute agent fails when starting to run with error code 0 2021-06-16 18:32:29 +03:00
allegroai
29bf993be7 Add printout when using key/secret from env vars 2021-06-02 21:15:48 +03:00
allegroai
eda597dea5 Version bump 2021-06-02 13:17:57 +03:00
allegroai
8c56777125 Add CLEARML_AGENT_DISABLE_SSH_MOUNT allowing disabling the auto .ssh mount into the docker 2021-06-02 13:16:58 +03:00
allegroai
7e90ebd5db Fix _dynamic_gpu_get_available worker timeout increase to 10 minutes 2021-06-02 13:16:17 +03:00
allegroai
3a07bfe1d7 Version bump 2021-05-31 23:19:46 +03:00
allegroai
742cbf5767 Add docker environment arguments log masking support (issue #67) 2021-05-25 19:31:45 +03:00
allegroai
e93384b99b Fix --stop with dynamic gpus 2021-05-20 10:58:46 +03:00
allegroai
3c4e976093 Add agent.ignore_requested_python_version to config file 2021-05-19 15:20:44 +03:00
allegroai
1e795beec8 Fix support for spaces in docker arguments (issue #358) 2021-05-19 15:20:03 +03:00
allegroai
4f7407084d Fix standalone script with pre-existing conda venv 2021-05-12 15:46:25 +03:00
allegroai
ae3d034531 Protect against None in execution.repository 2021-05-12 15:45:31 +03:00
allegroai
a2db1f5ab5 Remove queue name from pod name in k8s glue, add queue name and ID to pod labels (issue #64) 2021-05-05 12:03:35 +03:00
allegroai
cec6420c8f Version bump to v1.0.0 2021-05-03 18:33:53 +03:00
allegroai
4f18bb7ea0 Add k8s glue default restartPolicy=Never to template to prevent pods from restarting 2021-04-28 13:20:13 +03:00
Revital
24dc59e31f add space to help message 2021-04-27 13:50:44 +03:00
allegroai
08ff5e6db7 Add number of pods limit to k8s glue 2021-04-25 10:47:49 +03:00
allegroai
e60a6f9d14 Fix --stop support for dynamic gpus 2021-04-25 10:46:43 +03:00
Revital
35e714d8d9 fix --downtime help 2021-04-21 09:13:47 +03:00
allegroai
6f8d5710d6 Fix dynamic gpus priority queue 2021-04-20 18:11:59 +03:00
allegroai
a671692832 Fix --services-mode with instance limit 2021-04-20 18:11:36 +03:00
allegroai
5c8675e43a Add support for dynamic gpus opportunistic scheduling (with min/max gpus per queue) 2021-04-20 18:11:16 +03:00
allegroai
60a58f6fad Fix poetry support (issue #57) 2021-04-14 11:22:07 +03:00
allegroai
537b67e0cd Fix agent can return non-zero error code and pods will end up restarting forever (issue #56) 2021-04-12 23:00:59 +03:00
allegroai
82c5e55fe4 Fix usage of not_set in k8s template merge 2021-04-07 21:30:13 +03:00
allegroai
945dd816ad Fix no docker arguments 2021-04-07 18:47:13 +03:00
allegroai
45009e6cc2 Add support for updating back docker on new API v2.13 2021-04-07 18:46:58 +03:00
allegroai
3774fa6abd Add support for new container base setup script feature 2021-04-07 18:46:14 +03:00
allegroai
e71e6865d2 Add agent.docker_install_opencv_libs (default: True) to enable auto opencv libs install for faster docker spin-up 2021-04-07 18:45:44 +03:00
allegroai
0e8f1528b1 Remove redundant py2 code 2021-04-07 18:44:59 +03:00
allegroai
c331babf51 Add stopping message on Task process termination
Fix --stop on dynamic gpus venv mode
2021-04-07 18:44:33 +03:00
allegroai
c59d268995 Fix venv cache crash on bad symbolic links 2021-04-07 18:44:11 +03:00
allegroai
9e9fcb0ba9 Add dynamic mode terminate dockers on sig_term 2021-04-07 18:43:44 +03:00
allegroai
f33e0b2f78 Verify docker command exists when running in docker mode 2021-04-07 18:42:27 +03:00
allegroai
0e4b99351f Add --stop support for dynamic gpus
Fix --stop mark tasks as aborted (not failed as before)
2021-04-07 18:42:10 +03:00
allegroai
81edd2860f Fix --dynamic-gpus should keep original queue priority order 2021-03-31 23:55:12 +03:00
allegroai
14ac584577 Support k8s glue container env vars merging 2021-03-31 23:53:58 +03:00
allegroai
9ce6baf074 Fix broken k8s glue docker args parsing
Fix empty env prevents override when merging template
2021-03-26 12:26:15 +03:00
allegroai
92a1e07b33 Fix local path replace back when using cache 2021-03-26 12:16:05 +03:00
allegroai
cb6bdece39 Fix cuda version from driver does not return minor version 2021-03-18 10:07:59 +02:00
allegroai
2ea38364bb Change the default conda channel order, so it pulls the correct pytorch 2021-03-18 10:07:58 +02:00
allegroai
cf6fdc0d81 Add support for PyJWT v2 2021-03-18 10:07:58 +02:00
allegroai
91eec99563 Add conda debug prints (--debug) 2021-03-18 10:07:58 +02:00
allegroai
d9b9b4984b Version bump to v0.17.2 2021-03-04 20:12:50 +02:00
allegroai
205f9dd816 Fix k8s glue does not pass docker environment variables
Remove deprecated flags
2021-03-03 15:07:06 +02:00
allegroai
9dfa1294e2 Add agent.enable_task_env set the OS environment based on the Environment section of the Task. 2021-02-28 19:47:44 +02:00
allegroai
f019905720 Fix venv cache support for local folders 2021-02-28 19:47:09 +02:00
allegroai
9c257858dd Fix venv cache support for local folders 2021-02-23 18:54:38 +02:00
allegroai
2006ab20dd Fix conda support for git+http links 2021-02-23 12:46:06 +02:00
allegroai
0caf31719c Fix venv caching always reinstall git repositories and local repositories 2021-02-23 12:45:34 +02:00
allegroai
5da7184276 Add agent.ignore_requested_python_version (control for multi python environments) 2021-02-23 12:45:00 +02:00
allegroai
50fccdab96 PEP8 2021-02-23 12:44:26 +02:00
allegroai
77d6ff6630 Fix docker mode without venvs cache dir 2021-02-17 00:04:07 +02:00
allegroai
58cb344ee6 Upgrade pynvml, add detection of CUDA version from driver level 2021-02-17 00:03:16 +02:00
allegroai
22d5892b12 Use shared git cache between multiple agents on the same machine 2021-02-14 13:49:29 +02:00
allegroai
f619969efc Add venvs_cache configuration 2021-02-14 13:48:57 +02:00
allegroai
ca242424ab Fix service-mode support for venvs
Fix --services-mode with venvs
2021-02-14 13:45:17 +02:00
allegroai
407deb84e9 Fix multi instances on Windows 2021-02-14 13:44:39 +02:00
allegroai
14589aa094 Fix CPU mode 2021-02-14 13:44:00 +02:00
allegroai
1260e3d942 Update cache entries on conda package manager 2021-02-11 14:47:26 +02:00
allegroai
b22d926d94 Fix cache to take cuda version into account 2021-02-11 14:47:05 +02:00
allegroai
410cc8c7be Add --dynamic-gpus and limit in --services-mode 2021-02-11 14:46:37 +02:00
allegroai
784c676f5b Fix "from clearml" runtime diff patching (make sure we move it to after all the __future__ imports) include handling triple quotes in comments 2021-02-11 14:46:06 +02:00
allegroai
296f7970df Fix file not found error (no 2) interpreted as aborted (i.e. ctrl-c) 2021-02-11 14:44:54 +02:00
allegroai
cd59933c9c Remove unused packages 2021-02-11 14:44:35 +02:00
allegroai
b95d3f5300 Add venv caching with docker mode support 2021-02-11 14:44:19 +02:00
allegroai
fa0d5d8469 Fix --detached not supported on Windows, ignore and issue warning 2021-02-11 14:40:09 +02:00
allegroai
8229843018 Add base-pod-number parameter to k8s glue and example 2021-01-26 20:00:18 +02:00
allegroai
c578b37c6d Change dump configuration and ssh on every docker run 2021-01-24 08:48:10 +02:00
allegroai
8ea062c0bd Fix environment variables CLEARML_WEB_HOST/CLEARML_FILES_HOST not passed to running tasks (or updated on the config object) 2021-01-24 08:47:33 +02:00
allegroai
5d8bbde434 Fix applying git diff on new added file 2021-01-24 08:46:42 +02:00
allegroai
0462af6a3d Allow providing namespace in k8s glue and k8s glue example 2021-01-20 19:01:03 +02:00
allegroai
161993f66f Add agent.force_git_ssh_user configuration value (issue #42)
Change default docker to nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
2021-01-10 12:38:45 +02:00
allegroai
b7f87fb8d3 Detect and delete "stuck" k8s pods in k8s glue 2021-01-10 12:37:13 +02:00
allegroai
8fdb87f1f5 Fix docker --network returns None 2020-12-30 16:57:04 +02:00
allegroai
a1f2941ffd version bump 2020-12-25 02:10:06 +02:00
allegroai
428781af86 Fix support for Windows pip and Conda requirements.txt 2020-12-25 02:06:40 +02:00
allegroai
a455003c7f version bump 2020-12-23 00:13:51 +02:00
allegroai
b4d143812e initial clearml-agent v0.17.0 2020-12-22 23:00:57 +02:00
allegroai
6e1f74402e Rename trains-agent -> clearml-agent 2020-12-22 21:21:29 +02:00