allegroai
6213ef4c02
Add /bin/bash -c "command" support. Task binary should be set to /bin/bash and entry_point should be set to -c command
2024-07-24 18:00:13 +03:00
allegroai
aef6aa9fc8
Fix a race condition where, in rare cases, popping an aborted Task from a queue did not set it to started before the watchdog killed it. Does not happen in k8s/slurm
2024-07-24 17:59:46 +03:00
allegroai
0bb267115b
Add venvs_cache.path mount override for non-root containers (use: agent.docker_internal_mounts.venvs_cache)
2024-07-24 17:59:18 +03:00
allegroai
f89a92556f
Fix check logger is not None
2024-07-24 17:55:02 +03:00
allegroai
8ba4d75e80
Add CLEARML_TASK_ID and auth token to pod env vars in original entrypoint flow
2024-07-24 17:47:48 +03:00
allegroai
edc333ba5f
Add K8S_GLUE_POD_USE_IMAGE_ENTRYPOINT to allow running images without overriding the entrypoint (useful for agents using prebuilt images in k8s)
2024-07-24 17:46:27 +03:00
allegroai
2f0553b873
Fix CLEARML_MULTI_NODE_SINGLE_TASK should be read once, not on every reported line
2024-07-24 17:45:02 +03:00
allegroai
b2a4bf08ac
Fix pass --docker only (i.e. no default container image) for --dynamic-gpus feature
2024-07-24 17:44:35 +03:00
allegroai
f18c6b809f
Fix slurm multi-node rank detection
2024-07-24 17:44:05 +03:00
allegroai
cd5b4d2186
Add "-m module args" in script entry now supports standalone script. The standalone script is converted to "untitled.py" by default, or to the target file if one is specified in working_dir as <dir>:<target_file>, for example ".:standalone.py"
2024-07-24 17:43:21 +03:00
allegroai
5f1bab6711
Add default docker match_rules for enterprise users
NOTICE: matching_rules are ignored if `--docker container` is passed in command line
2024-07-24 17:42:55 +03:00
allegroai
ab9b9db0c9
Add CLEARML_MULTI_NODE_SINGLE_TASK (values -1, 0, 1, 2) for easier multi-node single Task workloads
2024-07-24 17:42:25 +03:00
allegroai
93df021108
Add support for .ipynb script entry files (install nbconvert at runtime, convert to python and execute the python script), including CLEARML_AGENT_FORCE_TASK_INIT patching of ipynb files (post python conversion)
2024-07-24 17:41:59 +03:00
allegroai
700ae85de0
Fix file mode should be optional in configuration files section
2024-07-24 17:41:06 +03:00
allegroai
f367c5a571
Fix git fetch did not update new tags #209
2024-07-24 17:39:53 +03:00
allegroai
ebc5944b44
Fix setting tasks that someone just marked as aborted to started - only force Task to started after dequeuing it, otherwise leave it as is
2024-07-24 17:39:26 +03:00
allegroai
8f41002845
Add task.script.binary /bin/bash support
Fix -m module $env to support parsing the $env before launching
2024-07-24 17:37:26 +03:00
allegroai
7e8670d57f
Find the correct python version when using a pre-installed python environment
2024-07-21 14:10:38 +03:00
allegroai
77de343863
Use "venv" module if virtualenv is not supported
2024-07-19 13:18:07 +03:00
allegroai
6b31883e45
Fix queue resolution when no queue is passed
2024-05-15 18:30:24 +03:00
allegroai
e48b4756fa
Add Python 3.12 support
2024-05-15 18:25:29 +03:00
allegroai
47147e3237
Fix cached repositories were not passing user/token when pulling. agent.vcs_cache.clone_on_pull_fail now defaults to false
2024-04-19 23:50:17 +03:00
allegroai
41fc4ec646
Fix disabling vcs cache should not add vcs mount point to container
2024-04-19 23:48:50 +03:00
allegroai
441e5a73b2
Fix conda env should not be cached if installing into base conda or an existing conda env
2024-04-19 23:48:10 +03:00
allegroai
27ed6821c4
Add mirrorD config files to gitignore
2024-04-19 23:47:34 +03:00
allegroai
10c6629982
Support skipping re-enqueue on suspected preempted k8s pods
2024-04-19 23:46:57 +03:00
allegroai
6fb48a4c6e
Revert version to v1.8.1
2024-04-19 23:44:31 +03:00
allegroai
105ade31f1
Version bump to v1.8.2
2024-04-14 18:18:10 +03:00
allegroai
502e266b6b
Fix polling interval missing when not using daemon mode
2024-04-14 18:17:57 +03:00
allegroai
cd9a3b9f4e
Version bump to v1.8.1
2024-04-12 20:30:11 +03:00
allegroai
4179ac5234
Fix git pulling on cached invalid git entry. On error, re-clone the entire repo again (disable using "agent.vcs_cache.clone_on_pull_fail: false")
2024-04-12 20:29:36 +03:00
Liron Ilouz
98cc0d86ba
Add option to set daemon polling interval (#197)
* add option to set worker polling interval
* polling interval minimum value
---------
Co-authored-by: Liron <liron@tapwithus.com>
2024-04-03 14:33:52 +03:00
allegroai
293cbc0ac6
Version bump to v1.8.0
2024-04-02 16:38:22 +03:00
allegroai
4387ed73b6
Fix None handling when no limits exist
2024-04-02 16:36:09 +03:00
allegroai
43443ccf08
Pass task_id when resolving k8s template
2024-04-01 11:37:01 +03:00
allegroai
3d43240c8f
Improve conda package manager support
Add agent.package_manager.use_conda_base_env (CLEARML_USE_CONDA_BASE_ENV) to allow using the base conda environment (instead of installing a new one)
Fix conda support for python packages with markers and multiple specifications
Add "nvidia" conda channel and support for cuda-toolkit >= 12
2024-04-01 11:36:26 +03:00
allegroai
fc58ba947b
Update requirements
2024-04-01 11:35:07 +03:00
allegroai
22672d2444
Improve GPU monitoring
2024-03-17 19:13:57 +02:00
allegroai
6a4fcda1bf
Improve resource monitor
2024-03-17 19:06:57 +02:00
allegroai
a4ebf8293d
Fix role support
2024-03-17 19:00:59 +02:00
allegroai
10fb157d58
Fix queue handling for backwards compatibility
2024-03-17 19:00:18 +02:00
allegroai
56058beec2
Update deprecated references
2024-03-17 18:59:48 +02:00
allegroai
9f207d5155
Fix dynamic GPU sometimes misses the initial print - if we found the closing print it should be good enough to signal everything is okay
2024-03-17 18:59:04 +02:00
allegroai
8a2bea3c14
Fix comment lines (#) are not ignored in docker startup bash script
2024-03-17 18:58:14 +02:00
allegroai
f1f9278928
Fix torch resolver settings applied to PytorchRequirement instance are not used
2024-03-17 18:56:47 +02:00
nfzd
2de1c926bf
Use correct Python version in Poetry init (#179)
* Use correct Python version in Poetry init
* Use interpreter override if configured
* Don't use agent.python_binary if it is empty
---------
Co-authored-by: Michael Mueller <michael.mueller@wsa.com>
2024-03-11 23:36:10 +02:00
allegroai
e1104e60bb
Update README
2024-03-11 16:58:38 +02:00
ae-ae
8b2970350c
Fix FileNotFoundException crash in find_python_executable_for_version… (#192)
* Fix FileNotFoundException crash in find_python_executable_for_version (#164)
* Add a Windows check for error 9009 when searching for Python
---------
Co-authored-by: ae-ae <12037964+ae-ae@users.noreply.github.com>
2024-03-06 09:17:31 +02:00
FeU-aKlos
a2758250b2
Fix queue handling in K8sIntegration and k8s_glue_example.py (#183)
* Fix queue handling in K8sIntegration and k8s_glue_example.py
* Update Dockerfile and k8s_glue_example.py
* Add executable permission to provider_entrypoint.sh
* ADJUST docker
* Update clearml-agent version
* ADJUST stuff
* ADJUST queue string handling
* DELETE pip install from own repo
2024-02-29 14:20:54 +02:00
allegroai
01e8ffd854
Improve venv cache handling:
- Add FileLock readonly mode, default is write mode (i.e. exclusive lock, preserving behavior)
- Add venv cache now uses readonly lock when copying folders from venv cache into target folder. This enables multiple read, single write operation
- Do not lock the cache folder if we do not need to delete old entries
2024-02-29 14:19:24 +02:00