Commit Graph

170 Commits

Author SHA1 Message Date
allegroai
0bb267115b Add venvs_cache.path mount override for non-root containers (use: agent.docker_internal_mounts.venvs_cache) 2024-07-24 17:59:18 +03:00
allegroai
2f0553b873 Fix CLEARML_MULTI_NODE_SINGLE_TASK should be read once not every reported line 2024-07-24 17:45:02 +03:00
allegroai
b2a4bf08ac Fix pass --docker only (i.e. no default container image) for --dynamic-gpus feature 2024-07-24 17:44:35 +03:00
allegroai
f18c6b809f Fix slurm multi-node rank detection 2024-07-24 17:44:05 +03:00
allegroai
cd5b4d2186 Add "-m module args" in script entry now supports standalone script, standalone script is converted to "untitled.py" by default or if specified in working_dir such as <dir>:<target_file> for example ".:standalone.py" 2024-07-24 17:43:21 +03:00
allegroai
5f1bab6711 Add default docker match_rules for enterprise users,
NOTICE: matching_rules are ignored if `--docker container` is passed in command line
2024-07-24 17:42:55 +03:00
allegroai
ab9b9db0c9 Add CLEARML_MULTI_NODE_SINGLE_TASK (values -1, 0, 1, 2) for easier multi-node singe Task workloads 2024-07-24 17:42:25 +03:00
allegroai
93df021108 Add support for .ipynb script entry files (install nbconvert in runtime, copnvert to python and execute the python script), including CLEARML_AGENT_FORCE_TASK_INIT patching of ipynb files (post python conversion) 2024-07-24 17:41:59 +03:00
allegroai
ebc5944b44 Fix setting tasks that someone just marked as aborted to started - only force Task to started after dequeuing it otherwise lease it as is 2024-07-24 17:39:26 +03:00
allegroai
8f41002845 Add task.script.binary /bin/bash support
Fix -m module $env to support parsing the $env before launching
2024-07-24 17:37:26 +03:00
allegroai
7e8670d57f Find the correct python version when using a pre-installed python environment 2024-07-21 14:10:38 +03:00
allegroai
41fc4ec646 Fix disabling vcs cache should not add vcs mount point to container 2024-04-19 23:48:50 +03:00
allegroai
502e266b6b Fix polling interval missing when not using daemon mode 2024-04-14 18:17:57 +03:00
Liron Ilouz
98cc0d86ba
Add option to set daemon polling interval (#197)
* add option to set worker polling interval

* polling interval minimum value

---------

Co-authored-by: Liron <liron@tapwithus.com>
2024-04-03 14:33:52 +03:00
allegroai
6a4fcda1bf Improve resource monitor 2024-03-17 19:06:57 +02:00
allegroai
a4ebf8293d Fix role support 2024-03-17 19:00:59 +02:00
allegroai
9f207d5155 Fix dynamic GPU sometimes misses the initial print - if we found the closing print it should be good enough to signal everything is okay 2024-03-17 18:59:04 +02:00
allegroai
8a2bea3c14 Fix comment lines (#) are not ignored in docker startup bash script 2024-03-17 18:58:14 +02:00
nfzd
2de1c926bf
Use correct Python version in Poetry init (#179)
* Use correct Python version in Poetry init

* Use interpreter override if configured

* Don't use agent.python_binary if it is empty

---------

Co-authored-by: Michael Mueller <michael.mueller@wsa.com>
2024-03-11 23:36:10 +02:00
ae-ae
8b2970350c
Fix FileNotFoundException crash in find_python_executable_for_version… (#192)
* Fix FileNotFoundException crash in find_python_executable_for_version (#164)

* Add a Windows check for error 9009 when searching for Python

---------

Co-authored-by: 12037964+ae-ae@users.noreply.github.com 12037964+ae-ae@users.noreply.github.com <ae-ae>
2024-03-06 09:17:31 +02:00
allegroai
01e8ffd854 Improve venv cache handling:
- Add FileLock readonly mode, default is write mode (i.e. exclusive lock, preserving behavior)
- Add venv cache now uses readonly lock when copying folders from venv cache into target folder. This enables multiple read, single write operation
- Do not lock the cache folder if we do not need to delete old entries
2024-02-29 14:19:24 +02:00
allegroai
09c5ef99af Fix Python 3.12 support by removing distutil imports 2024-02-29 14:12:21 +02:00
allegroai
059a9385e9 Fix delete temp console pipe log files after Task execution is completed. This is important for long lasting services agents, avoiding collecting temp files on host machine 2024-02-29 14:03:30 +02:00
allegroai
9a321a410f Add CLEARML_AGENT_FORCE_TASK_INIT to allow runtime patching of script even if no repo is specified and the code is running a preinstalled docker 2024-02-29 14:02:27 +02:00
allegroai
6fbd70786e Add protection for truncate() call 2024-02-29 13:51:09 +02:00
allegroai
030cbb69f1 Fix check if process return code is SIGKILL (-9 or 137) and abort callback was called, do not mark as failed but as aborted 2023-12-20 17:43:02 +02:00
allegroai
564f769ff7 Add agent.docker_args_extra_precedes_task, agent.protected_docker_extra_args
to prevent the same switch to be used by both `extra_docker_args` and the a Task's docker args
2023-12-20 17:42:36 +02:00
allegroai
dd5d24b0ca Add CLEARML_AGENT_TEMP_STDOUT_FILE_DIR to allow specifying temp dir used for storing agent log files and temporary log files (daemon and execution) 2023-11-14 11:45:13 +02:00
allegroai
996bb797c3 Add env var in case we're running a service task 2023-11-14 11:44:36 +02:00
allegroai
0131db8b7d Add support for resource_applied() callback in k8s glue
Add support for sending log events with k8s-provided timestamps
Refactor env vars infrastructure
2023-11-01 15:10:08 +02:00
allegroai
d2384a9a95 Add example and support for prebuilt containers including services-mode support with overrides CLEARML_AGENT_FORCE_CODE_DIR CLEARML_AGENT_FORCE_EXEC_SCRIPT 2023-11-01 15:05:57 +02:00
allegroai
21e4be966f Fix recursion issue when deep-copying a session 2023-11-01 15:04:24 +02:00
allegroai
52405c343d Fix k8s glue configuration might be contaminated when changed during apply 2023-11-01 15:03:37 +02:00
Alex Burlacu
ed1356976b Move extra configurations to Worker init to make sure all available configurations can be overridden 2023-08-24 19:00:36 +03:00
allegroai
159a6e9a5a Fix runtime property overriding existing properties 2023-07-20 10:41:15 +03:00
allegroai
4c056a17b9 Add support for k8s jobs execution
Strip docker container obtained from task in k8s apply
2023-07-04 14:45:00 +03:00
allegroai
21d98afca5 Add support for extra docker arguments referencing machines environment variables using the agent.docker_allow_host_environ configuration option to allow users to also be able to use $ENV in the task's docker arguments 2023-07-04 14:42:28 +03:00
allegroai
6a1bf11549 Fix Task docker arguments passed twice 2023-07-04 14:41:07 +03:00
allegroai
3ed63e2154 Fix docker container backwards compatibility for API <2.13
Fix default docker match rules resolver (used incorrect field "container" instead of "image")
Remove "container" (image) match rule option from default docker image resolver
2023-07-04 14:37:18 +03:00
allegroai
a1274299ce Add support for CLEARML_AGENT_EXTRA_DOCKER_LABELS env var 2023-07-03 11:08:59 +03:00
allegroai
c77224af68 Add support for task field injection into container docker name 2023-07-03 11:07:12 +03:00
allegroai
fec0ce1756 Better message for agent init when an existing clearml.conf is found 2023-05-21 22:51:11 +03:00
allegroai
1e09b88b7a Add alias CLEARML_AGENT_DOCKER_AGENT_REPO env var for the FORCE_CLEARML_AGENT_REPO env var 2023-05-21 22:50:01 +03:00
allegroai
ebb6231f5a Add CLEARML_AGENT_STANDALONE_CONFIG_BC to support backwards compatibility in standalone mode 2023-05-11 16:15:06 +03:00
allegroai
787c7d88bb Fix additional poetry cwd support feature 2023-03-28 14:35:41 +03:00
allegroai
46ded2864d Fix restart feature should be tested against agent session 2023-03-28 14:33:33 +03:00
allegroai
40456be948 Black formatting
Refactor path support
2023-03-05 18:05:00 +02:00
Niels ten Boom
3cedc104df
Add poetry cwd support (#142)
Closes #138
2023-03-05 14:19:57 +02:00
allegroai
95e996bfda Reintroduce CLEARML_AGENT_SERVICES_DOCKER_RESTART accidentally reverted by a previous merge 2023-02-05 10:34:38 +02:00
allegroai
b6d132b226 Fix build fails when target is relative path 2023-02-05 10:33:32 +02:00