clearml
47d35ef48f
Fix managed Python environment inside container (PEP 668): remove usr/lib/python3.*/EXTERNALLY-MANAGED
2024-12-26 18:59:42 +02:00
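A minimal sketch of what the commit above describes, assuming the fix amounts to deleting the PEP 668 marker files so that pip is allowed to install into the container's system interpreter. The function name `remove_externally_managed` and its `root` parameter are hypothetical and are not taken from the agent's code.

```python
from pathlib import Path


def remove_externally_managed(root="/"):
    """Delete PEP 668 marker files below <root>/usr/lib/python3.*/.

    Hypothetical helper for illustration only; returns the paths removed.
    """
    removed = []
    for marker in Path(root).glob("usr/lib/python3.*/EXTERNALLY-MANAGED"):
        try:
            marker.unlink()
            removed.append(str(marker))
        except OSError:
            # Read-only filesystem or missing permissions: skip this marker.
            pass
    return removed
```

Passing a scratch directory as `root` lets the helper be tried without touching a real interpreter installation.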
clearml
26d748a4d8
Support creating queue with tags
2024-12-12 23:40:57 +02:00
clearml
d8366dedc6
Fix UV priority
Fix UV cache is disabled, UV handles its own cache
Fix UV freeze
Fix make sure we do not use pip cache if poetry/uv is used (even if we reverted to pip we can't know if someone changed the repository and now in a new version, a lock file exists)
2024-12-12 23:38:42 +02:00
mads-oestergaard
cc656e2969
Add support for uv as package manager ( #218 )
* add uv as a package manager
* update configs
* update worker and defs
* update environ
* Update configs to highlight sync command
* rename to sync_extra_args and set UV_CACHE_DIR
2024-11-27 13:44:55 +02:00
clearml
b65e5fed94
Scan more Python 3 versions
2024-11-17 13:55:51 +02:00
clearml
9af0f9fe41
Fix reload method is found in the config object
2024-10-28 18:12:22 +02:00
clearml
205cd47cb9
Fix use req_token_expiration_sec when creating a task session and not the default value
2024-10-28 18:11:42 +02:00
Matteo Destro
bf8d9c96e9
Handle OSError when checking for is_file ( #215 )
2024-10-13 10:08:03 +03:00
allegroai
a88487ff25
Add support for pip legacy resolver for versions specified in the agent.package_manager.pip_legacy_resolver configuration option
Add skip existing packages
2024-09-22 22:36:06 +03:00
allegroai
760bbca74e
Fix failed Task in services mode logged "User aborted" instead of failed, add Task reason string
2024-08-27 22:56:37 +03:00
allegroai
e63fd31420
Fix string format
2024-08-27 22:55:49 +03:00
allegroai
99e1e54f94
Add support for tasks containing only bash script or python module command
2024-08-27 22:53:14 +03:00
allegroai
a4d3b5bad6
Fix only set Task started status on node rank 0
2024-08-27 22:52:31 +03:00
allegroai
b21665ed6e
Fix do not cache venv cache if venv/python skip env var was set
2024-08-27 22:52:01 +03:00
allegroai
d9f2a1999a
Fix Only send pip freeze update on RANK 0, only update task status on exit on RANK 0
2024-07-29 17:40:24 +03:00
allegroai
6213ef4c02
Add /bin/bash -c "command" support. Task binary should be set to /bin/bash and entry_point should be set to -c command
2024-07-24 18:00:13 +03:00
allegroai
aef6aa9fc8
Fix a race condition where in rare conditions popping a Task from a queue that was aborted did not set it to started before the watchdog killed it. Does not happen in k8s/slurm
2024-07-24 17:59:46 +03:00
allegroai
0bb267115b
Add venvs_cache.path mount override for non-root containers (use: agent.docker_internal_mounts.venvs_cache)
2024-07-24 17:59:18 +03:00
allegroai
2f0553b873
Fix CLEARML_MULTI_NODE_SINGLE_TASK should be read once not every reported line
2024-07-24 17:45:02 +03:00
allegroai
b2a4bf08ac
Fix pass --docker only (i.e. no default container image) for --dynamic-gpus feature
2024-07-24 17:44:35 +03:00
allegroai
f18c6b809f
Fix slurm multi-node rank detection
2024-07-24 17:44:05 +03:00
allegroai
cd5b4d2186
Add "-m module args" in script entry now supports standalone scripts; a standalone script is converted to "untitled.py" by default, or to the target file if one is specified in working_dir as <dir>:<target_file>, for example ".:standalone.py"
2024-07-24 17:43:21 +03:00
allegroai
5f1bab6711
Add default docker match_rules for enterprise users
NOTICE: matching_rules are ignored if `--docker container` is passed in command line
2024-07-24 17:42:55 +03:00
allegroai
ab9b9db0c9
Add CLEARML_MULTI_NODE_SINGLE_TASK (values -1, 0, 1, 2) for easier multi-node single Task workloads
2024-07-24 17:42:25 +03:00
allegroai
93df021108
Add support for .ipynb script entry files (install nbconvert at runtime, convert to Python and execute the Python script), including CLEARML_AGENT_FORCE_TASK_INIT patching of ipynb files (post Python conversion)
2024-07-24 17:41:59 +03:00
allegroai
ebc5944b44
Fix setting tasks that someone just marked as aborted to started - only force Task to started after dequeuing it, otherwise leave it as is
2024-07-24 17:39:26 +03:00
allegroai
8f41002845
Add task.script.binary /bin/bash support
Fix -m module $env to support parsing the $env before launching
2024-07-24 17:37:26 +03:00
allegroai
7e8670d57f
Find the correct python version when using a pre-installed python environment
2024-07-21 14:10:38 +03:00
allegroai
41fc4ec646
Fix disabling vcs cache should not add vcs mount point to container
2024-04-19 23:48:50 +03:00
allegroai
502e266b6b
Fix polling interval missing when not using daemon mode
2024-04-14 18:17:57 +03:00
Liron Ilouz
98cc0d86ba
Add option to set daemon polling interval ( #197 )
* add option to set worker polling interval
* polling interval minimum value
---------
Co-authored-by: Liron <liron@tapwithus.com>
2024-04-03 14:33:52 +03:00
allegroai
6a4fcda1bf
Improve resource monitor
2024-03-17 19:06:57 +02:00
allegroai
a4ebf8293d
Fix role support
2024-03-17 19:00:59 +02:00
allegroai
9f207d5155
Fix dynamic GPU sometimes misses the initial print - if we found the closing print it should be good enough to signal everything is okay
2024-03-17 18:59:04 +02:00
allegroai
8a2bea3c14
Fix comment lines (#) are not ignored in docker startup bash script
2024-03-17 18:58:14 +02:00
nfzd
2de1c926bf
Use correct Python version in Poetry init ( #179 )
* Use correct Python version in Poetry init
* Use interpreter override if configured
* Don't use agent.python_binary if it is empty
---------
Co-authored-by: Michael Mueller <michael.mueller@wsa.com>
2024-03-11 23:36:10 +02:00
ae-ae
8b2970350c
Fix FileNotFoundException crash in find_python_executable_for_version… ( #192 )
* Fix FileNotFoundException crash in find_python_executable_for_version (#164 )
* Add a Windows check for error 9009 when searching for Python
---------
Co-authored-by: 12037964+ae-ae@users.noreply.github.com 12037964+ae-ae@users.noreply.github.com <ae-ae>
2024-03-06 09:17:31 +02:00
allegroai
01e8ffd854
Improve venv cache handling:
- Add FileLock readonly mode, default is write mode (i.e. exclusive lock, preserving behavior)
- Add venv cache now uses readonly lock when copying folders from venv cache into target folder. This enables multiple read, single write operation
- Do not lock the cache folder if we do not need to delete old entries
2024-02-29 14:19:24 +02:00
allegroai
09c5ef99af
Fix Python 3.12 support by removing distutils imports
2024-02-29 14:12:21 +02:00
allegroai
059a9385e9
Fix delete temp console pipe log files after Task execution is completed. This is important for long-lasting services agents, to avoid accumulating temp files on the host machine
2024-02-29 14:03:30 +02:00
allegroai
9a321a410f
Add CLEARML_AGENT_FORCE_TASK_INIT to allow runtime patching of script even if no repo is specified and the code is running a preinstalled docker
2024-02-29 14:02:27 +02:00
allegroai
6fbd70786e
Add protection for truncate() call
2024-02-29 13:51:09 +02:00
allegroai
030cbb69f1
Fix check if process return code is SIGKILL (-9 or 137) and abort callback was called, do not mark as failed but as aborted
2023-12-20 17:43:02 +02:00
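The rule in the commit above can be sketched as follows, assuming the agent has the child's return code and a flag recording whether the abort callback ran. The name `final_status` and the status strings are hypothetical. The two codes correspond to `-9` (how Python's `subprocess` reports a child killed by signal 9) and `137` (the shell convention of 128 + 9).

```python
SIGKILL_NUM = 9  # POSIX signal number for SIGKILL
SIGKILL_CODES = {-SIGKILL_NUM, 128 + SIGKILL_NUM}  # {-9, 137}


def final_status(return_code, abort_requested):
    """Map a process return code to a task status (illustrative sketch)."""
    if return_code == 0:
        return "completed"
    # A SIGKILL exit that follows an abort request is a user abort, not a failure.
    if return_code in SIGKILL_CODES and abort_requested:
        return "aborted"
    return "failed"
```

In this sketch a SIGKILL exit without a preceding abort request is still reported as failed, which matches the condition stated in the commit message.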
allegroai
564f769ff7
Add agent.docker_args_extra_precedes_task, agent.protected_docker_extra_args
to prevent the same switch from being used by both `extra_docker_args` and a Task's docker args
2023-12-20 17:42:36 +02:00
allegroai
dd5d24b0ca
Add CLEARML_AGENT_TEMP_STDOUT_FILE_DIR to allow specifying temp dir used for storing agent log files and temporary log files (daemon and execution)
2023-11-14 11:45:13 +02:00
allegroai
996bb797c3
Add env var in case we're running a service task
2023-11-14 11:44:36 +02:00
allegroai
0131db8b7d
Add support for resource_applied() callback in k8s glue
Add support for sending log events with k8s-provided timestamps
Refactor env vars infrastructure
2023-11-01 15:10:08 +02:00
allegroai
d2384a9a95
Add example and support for prebuilt containers including services-mode support with overrides CLEARML_AGENT_FORCE_CODE_DIR CLEARML_AGENT_FORCE_EXEC_SCRIPT
2023-11-01 15:05:57 +02:00
allegroai
21e4be966f
Fix recursion issue when deep-copying a session
2023-11-01 15:04:24 +02:00
allegroai
52405c343d
Fix k8s glue configuration might be contaminated when changed during apply
2023-11-01 15:03:37 +02:00