349 Commits

Author SHA1 Message Date
allegroai
6a4fcda1bf Improve resource monitor 2024-03-17 19:06:57 +02:00
allegroai
a4ebf8293d Fix role support 2024-03-17 19:00:59 +02:00
allegroai
10fb157d58 Fix queue handling for backwards compatibility 2024-03-17 19:00:18 +02:00
allegroai
56058beec2 Update deprecated references 2024-03-17 18:59:48 +02:00
allegroai
9f207d5155 Fix dynamic GPU sometimes misses the initial print - if we found the closing print it should be good enough to signal everything is okay 2024-03-17 18:59:04 +02:00
allegroai
8a2bea3c14 Fix comment lines (#) are not ignored in docker startup bash script 2024-03-17 18:58:14 +02:00
allegroai
f1f9278928 Fix torch resolver settings applied to PytorchRequirement instance are not used 2024-03-17 18:56:47 +02:00
nfzd
2de1c926bf
Use correct Python version in Poetry init (#179)
* Use correct Python version in Poetry init

* Use interpreter override if configured

* Don't use agent.python_binary if it is empty

---------

Co-authored-by: Michael Mueller <michael.mueller@wsa.com>
2024-03-11 23:36:10 +02:00
ae-ae
8b2970350c
Fix FileNotFoundException crash in find_python_executable_for_version… (#192)
* Fix FileNotFoundException crash in find_python_executable_for_version (#164)

* Add a Windows check for error 9009 when searching for Python

---------

Co-authored-by: 12037964+ae-ae@users.noreply.github.com <ae-ae>
2024-03-06 09:17:31 +02:00
FeU-aKlos
a2758250b2
Fix queue handling in K8sIntegration and k8s_glue_example.py (#183)
* Fix queue handling in K8sIntegration and k8s_glue_example.py

* Update Dockerfile and k8s_glue_example.py

* Add executable permission to provider_entrypoint.sh

* ADJUST docker

* Update clearml-agent version

* ADJUST stuff

* ADJUST queue string handling

* DELETE pip install from own repo
2024-02-29 14:20:54 +02:00
allegroai
01e8ffd854 Improve venv cache handling:
- Add FileLock readonly mode, default is write mode (i.e. exclusive lock, preserving behavior)
- Venv cache now uses a readonly lock when copying folders from the venv cache into the target folder. This enables multiple-reader, single-writer operation
- Do not lock the cache folder if we do not need to delete old entries
2024-02-29 14:19:24 +02:00
allegroai
74edf6aa36 Fix IOError on file lock when using shared folder 2024-02-29 14:16:25 +02:00
allegroai
09c5ef99af Fix Python 3.12 support by removing distutils imports 2024-02-29 14:12:21 +02:00
allegroai
17ae28a62f Add agent.venvs_cache.lock_timeout to control the venv cache folder lock timeout (in seconds, default 30) 2024-02-29 14:06:06 +02:00
allegroai
059a9385e9 Fix temp console pipe log files not being deleted after Task execution is completed. This is important for long-lasting services agents, to avoid accumulating temp files on the host machine 2024-02-29 14:03:30 +02:00
allegroai
9a321a410f Add CLEARML_AGENT_FORCE_TASK_INIT to allow runtime patching of script even if no repo is specified and the code is running a preinstalled docker 2024-02-29 14:02:27 +02:00
allegroai
919013d4fe Add CLEARML_AGENT_FORCE_POETRY to allow forcing poetry even when using pip requirements manager 2024-02-29 13:59:26 +02:00
allegroai
05530b712b Fix sanitization did not cover all keys 2024-02-29 13:56:14 +02:00
allegroai
8d15fd8798 Fix pippip is returned as a pip version if no value exists in agent.package_manager.pip_version 2024-02-29 13:55:41 +02:00
allegroai
b34329934b Add queue ID report before pulling task 2024-02-29 13:52:17 +02:00
allegroai
85049d8705 Move configuration sanitization settings to the default config file 2024-02-29 13:51:40 +02:00
allegroai
6fbd70786e Add protection for truncate() call 2024-02-29 13:51:09 +02:00
allegroai
05a65548da Fix agent.enable_git_ask_pass does not show in configuration dump 2024-02-29 13:50:52 +02:00
allegroai
6657003d65 Fix using controller-uid will not always return required pods 2024-02-29 13:49:30 +02:00
allegroai
c9fc092f4e Support force_system_packages argument in k8s glue class 2023-12-26 10:12:32 +02:00
allegroai
432ee395e1 Version bump to v1.7.0 2023-12-20 18:08:38 +02:00
allegroai
98fc4f0fb9 Add agent.resource_monitoring.disk_use_path configuration option to allow monitoring a different volume than the one containing the home folder 2023-12-20 17:49:33 +02:00
allegroai
111e774c21 Add extra_index_url sanitization in configuration printout 2023-12-20 17:49:04 +02:00
allegroai
3dd8d783e1 Fix agent.git_host setting will cause git@domain URLs to not be replaced by SSH URLs since furl cannot parse them to obtain host 2023-12-20 17:48:18 +02:00
allegroai
7c3e420df4 Add git clone verbosity using CLEARML_AGENT_GIT_CLONE_VERBOSE env var 2023-12-20 17:47:52 +02:00
allegroai
55b065a114 Update GPU stats and pynvml support 2023-12-20 17:47:19 +02:00
allegroai
faa97b6cc2 Set worker ID in k8s glue mode 2023-12-20 17:45:34 +02:00
allegroai
f5861b1e4a Change default agent.enable_git_ask_pass to True 2023-12-20 17:44:41 +02:00
allegroai
030cbb69f1 Fix check if process return code is SIGKILL (-9 or 137) and abort callback was called, do not mark as failed but as aborted 2023-12-20 17:43:02 +02:00
allegroai
564f769ff7 Add agent.docker_args_extra_precedes_task, agent.protected_docker_extra_args
to prevent the same switch from being used by both `extra_docker_args` and a Task's docker args
2023-12-20 17:42:36 +02:00
allegroai
dd5d24b0ca Add CLEARML_AGENT_TEMP_STDOUT_FILE_DIR to allow specifying temp dir used for storing agent log files and temporary log files (daemon and execution) 2023-11-14 11:45:13 +02:00
allegroai
996bb797c3 Add env var in case we're running a service task 2023-11-14 11:44:36 +02:00
allegroai
9ad49a0d21 Fix KeyError if container does not contain the arguments field 2023-11-01 15:11:07 +02:00
allegroai
ba4fee7b19 Fix agent.package_manager.poetry_install_extra_args are used in all Poetry commands and not just in install (#173) 2023-11-01 15:10:40 +02:00
allegroai
0131db8b7d Add support for resource_applied() callback in k8s glue
Add support for sending log events with k8s-provided timestamps
Refactor env vars infrastructure
2023-11-01 15:10:08 +02:00
allegroai
d2384a9a95 Add example and support for prebuilt containers including services-mode support with overrides CLEARML_AGENT_FORCE_CODE_DIR CLEARML_AGENT_FORCE_EXEC_SCRIPT 2023-11-01 15:05:57 +02:00
allegroai
5b86c230c1 Fix an environment variable that should be set with a numerical value of 0 (i.e. end up as "0" or "0.0") is set to an empty string 2023-11-01 15:04:59 +02:00
allegroai
21e4be966f Fix recursion issue when deep-copying a session 2023-11-01 15:04:24 +02:00
allegroai
9c6cb421b3 When cleaning up pending pods, verify task is still aborted and pod is still pending before deleting the pod 2023-11-01 15:04:01 +02:00
allegroai
52405c343d Fix k8s glue configuration might be contaminated when changed during apply 2023-11-01 15:03:37 +02:00
allegroai
46f0c991c8 Add status reason when aborting before moving to k8s_scheduler queue 2023-11-01 15:02:24 +02:00
allegroai
0254279ed5 Version bump to v1.6.1 2023-09-06 15:41:29 +03:00
allegroai
58e0dc42ec Version bump to v1.6.0 2023-09-05 15:05:11 +03:00
allegroai
d16825029d Add new pytorch no resolver mode and CLEARML_AGENT_PACKAGE_PYTORCH_RESOLVE to change resolver on a Task basis, now supports "pip", "direct", "none" 2023-09-02 17:45:10 +03:00
allegroai
fb639afcb9 Fix PyTorch extra index pip resolver 2023-09-02 17:43:41 +03:00