Commit Graph

  • 0bb267115b Add venvs_cache.path mount override for non-root containers (use: agent.docker_internal_mounts.venvs_cache) allegroai 2024-07-24 17:59:18 +0300
  • f89a92556f Fix check logger is not None allegroai 2024-07-24 17:55:02 +0300
  • 8ba4d75e80 Add CLEARML_TASK_ID and auth token to pod env vars in original entrypoint flow allegroai 2024-07-24 17:47:48 +0300
  • edc333ba5f Add K8S_GLUE_POD_USE_IMAGE_ENTRYPOINT to allow running images without overriding the entrypoint (useful for agents using prebuilt images in k8s) allegroai 2024-07-24 17:46:27 +0300
  • 2f0553b873 Fix CLEARML_MULTI_NODE_SINGLE_TASK should be read once not every reported line allegroai 2024-07-24 17:45:02 +0300
  • b2a4bf08ac Fix pass --docker only (i.e. no default container image) for --dynamic-gpus feature allegroai 2024-07-24 17:44:35 +0300
  • f18c6b809f Fix slurm multi-node rank detection allegroai 2024-07-24 17:44:05 +0300
  • cd5b4d2186 Add "-m module args" in script entry now supports standalone script, standalone script is converted to "untitled.py" by default or if specified in working_dir such as <dir>:<target_file> for example ".:standalone.py" allegroai 2024-07-24 17:43:21 +0300
  • 5f1bab6711 Add default docker match_rules for enterprise users, NOTICE: matching_rules are ignored if --docker container is passed in command line allegroai 2024-07-24 17:42:55 +0300
  • ab9b9db0c9 Add CLEARML_MULTI_NODE_SINGLE_TASK (values -1, 0, 1, 2) for easier multi-node single Task workloads allegroai 2024-07-24 17:42:25 +0300
  • 93df021108 Add support for .ipynb script entry files (install nbconvert in runtime, convert to python and execute the python script), including CLEARML_AGENT_FORCE_TASK_INIT patching of ipynb files (post python conversion) allegroai 2024-07-24 17:41:59 +0300
  • 700ae85de0 Fix file mode should be optional in configuration files section allegroai 2024-07-24 17:41:06 +0300
  • f367c5a571 Fix git fetch did not update new tags #209 allegroai 2024-07-24 17:39:53 +0300
  • ebc5944b44 Fix setting tasks that someone just marked as aborted to started - only force Task to started after dequeuing it, otherwise leave it as is allegroai 2024-07-24 17:39:26 +0300
  • 8f41002845 Add task.script.binary /bin/bash support; Fix -m module $env to support parsing the $env before launching allegroai 2024-07-24 17:37:26 +0300
  • 7e8670d57f Find the correct python version when using a pre-installed python environment allegroai 2024-07-21 14:10:38 +0300
  • 77de343863 Use "venv" module if virtualenv is not supported allegroai 2024-07-19 13:18:07 +0300
  • f71f13629c Merge remote-tracking branch 'origin/master' revital 2024-07-03 13:33:54 +0300
  • 07beb13846 add queue priority comment revital 2024-07-03 13:33:11 +0300
  • 3e328d2461 Merge https://github.com/allegroai/clearml-agent revital 2024-07-03 13:20:19 +0300
  • 94a44c34c5 Add NO_DOCKER flag to clearml-agent-services entrypoint Valentin Schabschneider 2024-05-27 08:09:33 +0000
  • 6b31883e45 Fix queue resolution when no queue is passed v1.8.1 allegroai 2024-05-15 18:30:24 +0300
  • e48b4756fa Add Python 3.12 support allegroai 2024-05-15 18:25:29 +0300
  • 16dc08eebd Update Docker base image to Ubuntu 22.04 and Kubectl to 1.29.3 Surya Kasturi 2024-05-10 11:07:00 +0100
  • 47147e3237 Fix cached repositories were not passing user/token when pulling, agent.vcs_cache.clone_on_pull_fail now defaults to false allegroai 2024-04-19 23:50:17 +0300
  • 41fc4ec646 Fix disabling vcs cache should not add vcs mount point to container allegroai 2024-04-19 23:48:50 +0300
  • 441e5a73b2 Fix conda env should not be cached if installing into base conda or conda existing env exists allegroai 2024-04-19 23:48:10 +0300
  • 27ed6821c4 Add mirrorD config files to gitignore allegroai 2024-04-19 23:47:34 +0300
  • 10c6629982 Support skipping re-enqueue on suspected preempted k8s pods allegroai 2024-04-19 23:46:57 +0300
  • 6fb48a4c6e Revert version to v1.8.1 allegroai 2024-04-19 23:44:31 +0300
  • 105ade31f1 Version bump to v1.8.2 allegroai 2024-04-14 18:18:10 +0300
  • 502e266b6b Fix polling interval missing when not using daemon mode allegroai 2024-04-14 18:17:57 +0300
  • cd9a3b9f4e Version bump to v1.8.1 allegroai 2024-04-12 20:30:11 +0300
  • 4179ac5234 Fix git pulling on cached invalid git entry. On error, re-clone the entire repo again (disable using "agent.vcs_cache.clone_on_pull_fail: false") allegroai 2024-04-12 20:29:36 +0300
  • 98cc0d86ba Add option to set daemon polling interval (#197) Liron Ilouz 2024-04-03 14:33:52 +0300
  • 0c8748fd83 polling interval minimum value Liron 2024-04-03 12:46:29 +0300
  • 238290122d Merge branch 'allegroai:master' into master Liron Ilouz 2024-04-03 11:57:18 +0300
  • 293cbc0ac6 Version bump to v1.8.0 v1.8.0 allegroai 2024-04-02 16:38:22 +0300
  • 4387ed73b6 Fix None handling when no limits exist allegroai 2024-04-02 16:36:09 +0300
  • 05600504b6 Merge branch 'allegroai:master' into master Liron Ilouz 2024-04-01 16:59:15 +0300
  • 43443ccf08 Pass task_id when resolving k8s template allegroai 2024-04-01 11:37:01 +0300
  • 3d43240c8f Improve conda package manager support; Add agent.package_manager.use_conda_base_env (CLEARML_USE_CONDA_BASE_ENV) allowing to use base conda environment (instead of installing a new one); Fix conda support for python packages with markers and multiple specifications; Added "nvidia" conda channel and support for cuda-toolkit >= 12 allegroai 2024-04-01 11:36:26 +0300
  • fc58ba947b Update requirements allegroai 2024-04-01 11:35:07 +0300
  • f9874252cb initial commit Meshcheryakov Ilya 2024-03-25 21:04:36 +0300
  • 22672d2444 Improve GPU monitoring allegroai 2024-03-17 19:13:57 +0200
  • 6a4fcda1bf Improve resource monitor allegroai 2024-03-17 19:06:57 +0200
  • a4ebf8293d Fix role support allegroai 2024-03-17 19:00:59 +0200
  • 10fb157d58 Fix queue handling for backwards compatibility allegroai 2024-03-17 19:00:18 +0200
  • 56058beec2 Update deprecated references allegroai 2024-03-17 18:59:48 +0200
  • 9f207d5155 Fix dynamic GPU sometimes misses the initial print - if we found the closing print it should be good enough to signal everything is okay allegroai 2024-03-17 18:59:04 +0200
  • 8a2bea3c14 Fix comment lines (#) are not ignored in docker startup bash script allegroai 2024-03-17 18:58:14 +0200
  • f1f9278928 Fix torch resolver settings applied to PytorchRequirement instance are not used allegroai 2024-03-17 18:56:47 +0200
  • 437daa4938 add option to set worker polling interval Liron 2024-03-14 14:03:49 +0200
  • 2de1c926bf Use correct Python version in Poetry init (#179) nfzd 2024-03-11 22:36:10 +0100
  • e1104e60bb Update README allegroai 2024-03-11 16:58:28 +0200
  • f0b083d8c5 Don't use agent.python_binary if it is empty Michael Mueller 2024-03-08 16:14:16 +0100
  • 4ce3fed3c3 Use interpreter override if configured Michael Mueller 2024-02-21 09:46:27 +0100
  • 8833184787 Use correct Python version in Poetry init Michael Mueller 2024-01-05 11:21:17 +0100
  • 8b2970350c Fix FileNotFoundException crash in find_python_executable_for_version… (#192) ae-ae 2024-03-06 10:17:31 +0300
  • 4f620fa089 Add a Windows check for error 9009 when searching for Python ae-ae 2024-03-01 17:49:10 +0300
  • a2758250b2 Fix queue handling in K8sIntegration and k8s_glue_example.py (#183) FeU-aKlos 2024-02-29 13:20:54 +0100
  • 01e8ffd854 Improve venv cache handling: - Add FileLock readonly mode, default is write mode (i.e. exclusive lock, preserving behavior) - Add venv cache now uses readonly lock when copying folders from venv cache into target folder. This enables multiple read, single write operation - Do not lock the cache folder if we do not need to delete old entries allegroai 2024-02-29 14:19:24 +0200
  • 74edf6aa36 Fix IOError on file lock when using shared folder allegroai 2024-02-29 14:16:25 +0200
  • 09c5ef99af Fix Python 3.12 support by removing distutils imports allegroai 2024-02-29 14:12:21 +0200
  • 17ae28a62f Add agent.venvs_cache.lock_timeout to control the venv cache folder lock timeout (in seconds, default 30) allegroai 2024-02-29 14:06:06 +0200
  • 059a9385e9 Fix delete temp console pipe log files after Task execution is completed. This is important for long lasting services agents, avoiding collecting temp files on host machine allegroai 2024-02-29 14:03:30 +0200
  • 9a321a410f Add CLEARML_AGENT_FORCE_TASK_INIT to allow runtime patching of script even if no repo is specified and the code is running a preinstalled docker allegroai 2024-02-29 14:02:27 +0200
  • 919013d4fe Add CLEARML_AGENT_FORCE_POETRY to allow forcing poetry even when using pip requirements manager allegroai 2024-02-29 13:59:26 +0200
  • 05530b712b Fix sanitization did not cover all keys allegroai 2024-02-29 13:56:14 +0200
  • 8d15fd8798 Fix pippip is returned as a pip version if no value exists in agent.package_manager.pip_version allegroai 2024-02-29 13:53:32 +0200
  • b34329934b Add queue ID report before pulling task allegroai 2024-02-29 13:52:17 +0200
  • 85049d8705 Move configuration sanitization settings to the default config file allegroai 2024-02-29 13:51:40 +0200
  • 6fbd70786e Add protection for truncate() call allegroai 2024-02-29 13:51:09 +0200
  • 05a65548da Fix agent.enable_git_ask_pass does not show in configuration dump allegroai 2024-02-29 13:50:52 +0200
  • 6657003d65 Fix using controller-uid will not always return required pods allegroai 2024-02-29 13:49:30 +0200
  • 4f6f1e29d4 Fix FileNotFoundException crash in find_python_executable_for_version (#164) 12037964+ae-ae@users.noreply.github.com 12037964+ae-ae@users.noreply.github.com 2024-02-25 23:43:47 +0300
  • f94c13fbe1 Merge branch 'master' into master KuldipN 2024-02-23 15:52:25 +0530
  • cf64f9ba7e Update requirements.txt for security changes. KuldipN 2024-02-21 15:04:08 +0530
  • 56dcacf339 DELETE pip install from own repo Andreas Klos 2024-02-09 11:58:15 +0100
  • 040b62e577 ADJUST queue string handling Andreas Klos 2024-02-09 11:16:56 +0100
  • 77cab6a569 ADJUST stuff Andreas Klos 2024-02-07 23:03:10 +0100
  • c85eac0ce6 Update clearml-agent version Andreas Klos 2024-02-05 21:09:30 +0100
  • a4f2daa8ea ADJUST docker Andreas Klos 2024-02-05 21:02:36 +0100
  • 06e423258f Add executable permission to provider_entrypoint.sh Andreas Klos 2024-02-05 20:59:00 +0100
  • 53e54a8edc Update Dockerfile and k8s_glue_example.py Andreas Klos 2024-02-05 15:30:05 +0100
  • f640eb6707 Move ssh filter into regex and use list.insert in path Mads Oestergaard 2024-02-02 10:50:52 +0100
  • 04fa8a9a86 Fix ssh to http conversion for azure devops repos Mads Oestergaard 2024-01-29 14:39:22 +0100
  • b134aa9d31 Fix queue handling in K8sIntegration and k8s_glue_example.py Andreas Klos 2024-01-25 11:07:32 +0100
  • 95dde6ca0c Update README allegroai 2024-01-25 11:27:56 +0200
  • c9fc092f4e Support force_system_packages argument in k8s glue class v1.7.0 allegroai 2023-12-26 10:12:32 +0200
  • 432ee395e1 Version bump to v1.7.0 allegroai 2023-12-20 18:08:38 +0200
  • 98fc4f0fb9 Add agent.resource_monitoring.disk_use_path configuration option to allow monitoring a different volume than the one containing the home folder allegroai 2023-12-20 17:49:33 +0200
  • 111e774c21 Add extra_index_url sanitization in configuration printout allegroai 2023-12-20 17:49:04 +0200
  • 3dd8d783e1 Fix agent.git_host setting will cause git@domain URLs to not be replaced by SSH URLs since furl cannot parse them to obtain host allegroai 2023-12-20 17:48:18 +0200
  • 7c3e420df4 Add git clone verbosity using CLEARML_AGENT_GIT_CLONE_VERBOSE env var allegroai 2023-12-20 17:47:52 +0200
  • 55b065a114 Update GPU stats and pynvml support allegroai 2023-12-20 17:47:19 +0200
  • faa97b6cc2 Set worker ID in k8s glue mode allegroai 2023-12-20 17:45:34 +0200
  • f5861b1e4a Change default agent.enable_git_ask_pass to True allegroai 2023-12-20 17:44:41 +0200
  • 030cbb69f1 Fix check if process return code is SIGKILL (-9 or 137) and abort callback was called, do not mark as failed but as aborted allegroai 2023-12-20 17:43:02 +0200
  • 564f769ff7 Add agent.docker_args_extra_precedes_task, agent.protected_docker_extra_args to prevent the same switch from being used by both extra_docker_args and a Task's docker args allegroai 2023-12-20 17:42:36 +0200