clearml
66494c598d
Fix on_abort bash callback: if the main processes exit while the on_abort callback is running, wait for on_abort to complete
2025-02-24 13:46:55 +02:00
clearml
0e2657421f
Added CLEARML_AGENT_ABORT_CALLBACK_CMD and CLEARML_AGENT_ABORT_CALLBACK_TIMEOUT (default 180 sec) to define callback command to be called on abort status change
2025-02-24 13:46:00 +02:00
clearml
ee286e2fb7
Fix container default arguments should never be a list
2025-02-24 13:44:52 +02:00
clearml
d87521c36c
Add support for container rulebook overrides ('force_container_rules: true') and container rulebook task update ('update_back_task: true').
...
This addition allows users to forcefully override container arguments based on the task's properties (repo, tags, project, user, etc.), and to offer additional defaults based on required Python packages or Python versions
2025-02-24 13:44:26 +02:00
clearml
8887453328
Cleanup error prints on bash startup script
2025-02-24 13:42:37 +02:00
clearml
8d3cb34390
Add default support for dnf, i.e. rocky/centos/fedora containers
2025-02-24 13:41:32 +02:00
clearml
4f91c45d38
Fix untitled file is now py/sh based on the requested binary
2025-02-24 13:29:56 +02:00
clearml
c9f5b3d19a
Force the stop command to avoid a potential race
2025-02-24 13:26:26 +02:00
clearml
d32b82cb01
Integrate docker port mapping, to control non network=host port mapping, including port reassigning for multiple running agents on the same machine
2025-02-24 13:24:58 +02:00
clearml
8f28d2882a
Fix pip freeze dump to comply with yaml fancy print
2025-02-24 13:23:50 +02:00
clearml
546ffff95d
Fix cached venv tried to reinstall priority packages even though they are preinstalled
2025-02-24 13:23:00 +02:00
clearml
28e9280a4f
Reduce required packages
2025-01-26 23:03:16 +02:00
clearml
47d35ef48f
Fix managed python environment inside container (PEP 668) remove usr/lib/python3.*/EXTERNALLY-MANAGED
2024-12-26 18:59:42 +02:00
clearml
26d748a4d8
Support creating queue with tags
2024-12-12 23:40:57 +02:00
clearml
d8366dedc6
Fix UV priority
...
Fix UV cache is disabled, UV handles its own cache
Fix UV freeze
Fix make sure we do not use pip cache if poetry/uv is used (even if we reverted to pip we can't know if someone changed the repository and now in a new version, a lock file exists)
2024-12-12 23:38:42 +02:00
mads-oestergaard
cc656e2969
Add support for uv as package manager (#218)
...
* add uv as a package manager
* update configs
* update worker and defs
* update environ
* Update configs to highlight sync command
* rename to sync_extra_args and set UV_CACHE_DIR
2024-11-27 13:44:55 +02:00
clearml
b65e5fed94
Scan more Python 3 versions
2024-11-17 13:55:51 +02:00
clearml
9af0f9fe41
Fix reload method is found in the config object
2024-10-28 18:12:22 +02:00
clearml
205cd47cb9
Fix use req_token_expiration_sec when creating a task session and not the default value
2024-10-28 18:11:42 +02:00
Matteo Destro
bf8d9c96e9
Handle OSError when checking for is_file (#215)
2024-10-13 10:08:03 +03:00
allegroai
a88487ff25
Add support for pip legacy resolver for versions specified in the agent.package_manager.pip_legacy_resolver configuration option
...
Add skip existing packages
2024-09-22 22:36:06 +03:00
allegroai
760bbca74e
Fix failed Task in services mode logged "User aborted" instead of failed, add Task reason string
2024-08-27 22:56:37 +03:00
allegroai
e63fd31420
Fix string format
2024-08-27 22:55:49 +03:00
allegroai
99e1e54f94
Add support for tasks containing only bash script or python module command
2024-08-27 22:53:14 +03:00
allegroai
a4d3b5bad6
Fix only set Task started status on node rank 0
2024-08-27 22:52:31 +03:00
allegroai
b21665ed6e
Fix do not cache venv cache if venv/python skip env var was set
2024-08-27 22:52:01 +03:00
allegroai
d9f2a1999a
Fix Only send pip freeze update on RANK 0, only update task status on exit on RANK 0
2024-07-29 17:40:24 +03:00
allegroai
6213ef4c02
Add /bin/bash -c "command" support. Task binary should be set to /bin/bash and entry_point should be set to -c command
2024-07-24 18:00:13 +03:00
allegroai
aef6aa9fc8
Fix a race condition where in rare conditions popping a Task from a queue that was aborted did not set it to started before the watchdog killed it. Does not happen in k8s/slurm
2024-07-24 17:59:46 +03:00
allegroai
0bb267115b
Add venvs_cache.path mount override for non-root containers (use: agent.docker_internal_mounts.venvs_cache)
2024-07-24 17:59:18 +03:00
allegroai
2f0553b873
Fix CLEARML_MULTI_NODE_SINGLE_TASK should be read once, not on every reported line
2024-07-24 17:45:02 +03:00
allegroai
b2a4bf08ac
Fix pass --docker only (i.e. no default container image) for --dynamic-gpus feature
2024-07-24 17:44:35 +03:00
allegroai
f18c6b809f
Fix slurm multi-node rank detection
2024-07-24 17:44:05 +03:00
allegroai
cd5b4d2186
Add "-m module args" in script entry now supports a standalone script; the standalone script is converted to "untitled.py" by default, or to the target file if one is specified in working_dir as <dir>:<target_file>, for example ".:standalone.py"
2024-07-24 17:43:21 +03:00
allegroai
5f1bab6711
Add default docker match_rules for enterprise users,
...
NOTICE: matching_rules are ignored if `--docker container` is passed in command line
2024-07-24 17:42:55 +03:00
allegroai
ab9b9db0c9
Add CLEARML_MULTI_NODE_SINGLE_TASK (values -1, 0, 1, 2) for easier multi-node single Task workloads
2024-07-24 17:42:25 +03:00
allegroai
93df021108
Add support for .ipynb script entry files (install nbconvert at runtime, convert to python and execute the python script), including CLEARML_AGENT_FORCE_TASK_INIT patching of ipynb files (post python conversion)
2024-07-24 17:41:59 +03:00
allegroai
ebc5944b44
Fix setting tasks that someone just marked as aborted to started - only force Task to started after dequeuing it, otherwise leave it as is
2024-07-24 17:39:26 +03:00
allegroai
8f41002845
Add task.script.binary /bin/bash support
...
Fix -m module $env to support parsing the $env before launching
2024-07-24 17:37:26 +03:00
allegroai
7e8670d57f
Find the correct python version when using a pre-installed python environment
2024-07-21 14:10:38 +03:00
allegroai
41fc4ec646
Fix disabling vcs cache should not add vcs mount point to container
2024-04-19 23:48:50 +03:00
allegroai
502e266b6b
Fix polling interval missing when not using daemon mode
2024-04-14 18:17:57 +03:00
Liron Ilouz
98cc0d86ba
Add option to set daemon polling interval (#197)
...
* add option to set worker polling interval
* polling interval minimum value
---------
Co-authored-by: Liron <liron@tapwithus.com>
2024-04-03 14:33:52 +03:00
allegroai
6a4fcda1bf
Improve resource monitor
2024-03-17 19:06:57 +02:00
allegroai
a4ebf8293d
Fix role support
2024-03-17 19:00:59 +02:00
allegroai
9f207d5155
Fix dynamic GPU sometimes misses the initial print - if we found the closing print it should be good enough to signal everything is okay
2024-03-17 18:59:04 +02:00
allegroai
8a2bea3c14
Fix comment lines (#) are not ignored in docker startup bash script
2024-03-17 18:58:14 +02:00
nfzd
2de1c926bf
Use correct Python version in Poetry init (#179)
...
* Use correct Python version in Poetry init
* Use interpreter override if configured
* Don't use agent.python_binary if it is empty
---------
Co-authored-by: Michael Mueller <michael.mueller@wsa.com>
2024-03-11 23:36:10 +02:00
ae-ae
8b2970350c
Fix FileNotFoundException crash in find_python_executable_for_version… (#192)
...
* Fix FileNotFoundException crash in find_python_executable_for_version (#164)
* Add a Windows check for error 9009 when searching for Python
---------
Co-authored-by: 12037964+ae-ae@users.noreply.github.com 12037964+ae-ae@users.noreply.github.com <ae-ae>
2024-03-06 09:17:31 +02:00
allegroai
01e8ffd854
Improve venv cache handling:
...
- Add FileLock readonly mode, default is write mode (i.e. exclusive lock, preserving behavior)
- Add venv cache now uses readonly lock when copying folders from venv cache into target folder. This enables multiple read, single write operation
- Do not lock the cache folder if we do not need to delete old entries
2024-02-29 14:19:24 +02:00