Commit Graph

  • 7e8670d57f Find the correct python version when using a pre-installed python environment allegroai 2024-07-21 14:10:38 +03:00
  • 77de343863 Use "venv" module if virtualenv is not supported allegroai 2024-07-19 13:18:07 +03:00
  • 6b31883e45 Fix queue resolution when no queue is passed v1.8.1 allegroai 2024-05-15 18:30:24 +03:00
  • e48b4756fa Add Python 3.12 support allegroai 2024-05-15 18:25:29 +03:00
  • 47147e3237 Fix cached repositories were not passing user/token when pulling, agent.vcs_cache.clone_on_pull_fail now defaults to false allegroai 2024-04-19 23:50:17 +03:00
  • 41fc4ec646 Fix disabling vcs cache should not add vcs mount point to container allegroai 2024-04-19 23:48:50 +03:00
  • 441e5a73b2 Fix conda env should not be cached if installing into base conda or into an existing conda env allegroai 2024-04-19 23:48:10 +03:00
  • 27ed6821c4 Add mirrorD config files to gitignore allegroai 2024-04-19 23:47:34 +03:00
  • 10c6629982 Support skipping re-enqueue on suspected preempted k8s pods allegroai 2024-04-19 23:46:57 +03:00
  • 6fb48a4c6e Revert version to v1.8.1 allegroai 2024-04-19 23:44:31 +03:00
  • 105ade31f1 Version bump to v1.8.2 allegroai 2024-04-14 18:18:10 +03:00
  • 502e266b6b Fix polling interval missing when not using daemon mode allegroai 2024-04-14 18:17:57 +03:00
  • cd9a3b9f4e Version bump to v1.8.1 allegroai 2024-04-12 20:30:11 +03:00
  • 4179ac5234 Fix git pulling on cached invalid git entry. On error, re-clone the entire repo again (disable using "agent.vcs_cache.clone_on_pull_fail: false") allegroai 2024-04-12 20:29:36 +03:00
  • 98cc0d86ba Add option to set daemon polling interval (#197) Liron Ilouz 2024-04-03 14:33:52 +03:00
  • 293cbc0ac6 Version bump to v1.8.0 v1.8.0 allegroai 2024-04-02 16:38:22 +03:00
  • 4387ed73b6 Fix None handling when no limits exist allegroai 2024-04-02 16:36:09 +03:00
  • 43443ccf08 Pass task_id when resolving k8s template allegroai 2024-04-01 11:37:01 +03:00
  • 3d43240c8f Improve conda package manager support: add agent.package_manager.use_conda_base_env (CLEARML_USE_CONDA_BASE_ENV) to allow using the base conda environment (instead of installing a new one); fix conda support for python packages with markers and multiple specifications; add "nvidia" conda channel and support for cuda-toolkit >= 12 allegroai 2024-04-01 11:36:26 +03:00
  • fc58ba947b Update requirements allegroai 2024-04-01 11:35:07 +03:00
  • 22672d2444 Improve GPU monitoring allegroai 2024-03-17 19:13:57 +02:00
  • 6a4fcda1bf Improve resource monitor allegroai 2024-03-17 19:06:57 +02:00
  • a4ebf8293d Fix role support allegroai 2024-03-17 19:00:59 +02:00
  • 10fb157d58 Fix queue handling for backwards compatibility allegroai 2024-03-17 19:00:18 +02:00
  • 56058beec2 Update deprecated references allegroai 2024-03-17 18:59:48 +02:00
  • 9f207d5155 Fix dynamic GPU sometimes misses the initial print - if we found the closing print it should be good enough to signal everything is okay allegroai 2024-03-17 18:59:04 +02:00
  • 8a2bea3c14 Fix comment lines (#) are not ignored in docker startup bash script allegroai 2024-03-17 18:58:14 +02:00
  • f1f9278928 Fix torch resolver settings applied to PytorchRequirement instance are not used allegroai 2024-03-17 18:56:47 +02:00
  • 2de1c926bf Use correct Python version in Poetry init (#179) nfzd 2024-03-11 22:36:10 +01:00
  • e1104e60bb Update README allegroai 2024-03-11 16:58:28 +02:00
  • 8b2970350c Fix FileNotFoundException crash in find_python_executable_for_version… (#192) ae-ae 2024-03-06 10:17:31 +03:00
  • a2758250b2 Fix queue handling in K8sIntegration and k8s_glue_example.py (#183) FeU-aKlos 2024-02-29 13:20:54 +01:00
  • 01e8ffd854 Improve venv cache handling: add FileLock readonly mode, default is write mode (i.e. exclusive lock, preserving behavior); venv cache now uses a readonly lock when copying folders from venv cache into target folder, enabling multiple-read, single-write operation; do not lock the cache folder if we do not need to delete old entries allegroai 2024-02-29 14:19:24 +02:00
  • 74edf6aa36 Fix IOError on file lock when using shared folder allegroai 2024-02-29 14:16:25 +02:00
  • 09c5ef99af Fix Python 3.12 support by removing distutils imports allegroai 2024-02-29 14:12:21 +02:00
  • 17ae28a62f Add agent.venvs_cache.lock_timeout to control the venv cache folder lock timeout (in seconds, default 30) allegroai 2024-02-29 14:06:06 +02:00
  • 059a9385e9 Fix delete temp console pipe log files after Task execution is completed. This is important for long lasting services agents, avoiding collecting temp files on host machine allegroai 2024-02-29 14:03:30 +02:00
  • 9a321a410f Add CLEARML_AGENT_FORCE_TASK_INIT to allow runtime patching of script even if no repo is specified and the code is running a preinstalled docker allegroai 2024-02-29 14:02:27 +02:00
  • 919013d4fe Add CLEARML_AGENT_FORCE_POETRY to allow forcing poetry even when using pip requirements manager allegroai 2024-02-29 13:59:26 +02:00
  • 05530b712b Fix sanitization did not cover all keys allegroai 2024-02-29 13:56:14 +02:00
  • 8d15fd8798 Fix pippip is returned as a pip version if no value exists in agent.package_manager.pip_version allegroai 2024-02-29 13:53:32 +02:00
  • b34329934b Add queue ID report before pulling task allegroai 2024-02-29 13:52:17 +02:00
  • 85049d8705 Move configuration sanitization settings to the default config file allegroai 2024-02-29 13:51:40 +02:00
  • 6fbd70786e Add protection for truncate() call allegroai 2024-02-29 13:51:09 +02:00
  • 05a65548da Fix agent.enable_git_ask_pass does not show in configuration dump allegroai 2024-02-29 13:50:52 +02:00
  • 6657003d65 Fix using controller-uid will not always return required pods allegroai 2024-02-29 13:49:30 +02:00
  • 95dde6ca0c Update README allegroai 2024-01-25 11:27:56 +02:00
  • c9fc092f4e Support force_system_packages argument in k8s glue class v1.7.0 allegroai 2023-12-26 10:12:32 +02:00
  • 432ee395e1 Version bump to v1.7.0 allegroai 2023-12-20 18:08:38 +02:00
  • 98fc4f0fb9 Add agent.resource_monitoring.disk_use_path configuration option to allow monitoring a different volume than the one containing the home folder allegroai 2023-12-20 17:49:33 +02:00
  • 111e774c21 Add extra_index_url sanitization in configuration printout allegroai 2023-12-20 17:49:04 +02:00
  • 3dd8d783e1 Fix agent.git_host setting will cause git@domain URLs to not be replaced by SSH URLs since furl cannot parse them to obtain host allegroai 2023-12-20 17:48:18 +02:00
  • 7c3e420df4 Add git clone verbosity using CLEARML_AGENT_GIT_CLONE_VERBOSE env var allegroai 2023-12-20 17:47:52 +02:00
  • 55b065a114 Update GPU stats and pynvml support allegroai 2023-12-20 17:47:19 +02:00
  • faa97b6cc2 Set worker ID in k8s glue mode allegroai 2023-12-20 17:45:34 +02:00
  • f5861b1e4a Change default agent.enable_git_ask_pass to True allegroai 2023-12-20 17:44:41 +02:00
  • 030cbb69f1 Fix: if process return code is SIGKILL (-9 or 137) and the abort callback was called, mark as aborted rather than failed allegroai 2023-12-20 17:43:02 +02:00
  • 564f769ff7 Add agent.docker_args_extra_precedes_task, agent.protected_docker_extra_args to prevent the same switch from being used by both extra_docker_args and a Task's docker args allegroai 2023-12-20 17:42:36 +02:00
  • 2c7f091e57 Update example (#177) pollfly 2023-12-09 12:52:44 +02:00
  • dd5d24b0ca Add CLEARML_AGENT_TEMP_STDOUT_FILE_DIR to allow specifying temp dir used for storing agent log files and temporary log files (daemon and execution) allegroai 2023-11-14 11:45:13 +02:00
  • 996bb797c3 Add env var in case we're running a service task allegroai 2023-11-14 11:44:36 +02:00
  • 9ad49a0d21 Fix KeyError if container does not contain the arguments field allegroai 2023-11-01 15:11:07 +02:00
  • ba4fee7b19 Fix agent.package_manager.poetry_install_extra_args are used in all Poetry commands and not just in install (#173) allegroai 2023-11-01 15:10:40 +02:00
  • 0131db8b7d Add support for resource_applied() callback in k8s glue; add support for sending log events with k8s-provided timestamps; refactor env vars infrastructure allegroai 2023-11-01 15:10:08 +02:00
  • d2384a9a95 Add example and support for prebuilt containers including services-mode support with overrides CLEARML_AGENT_FORCE_CODE_DIR, CLEARML_AGENT_FORCE_EXEC_SCRIPT allegroai 2023-11-01 15:05:57 +02:00
  • 5b86c230c1 Fix an environment variable that should be set with a numerical value of 0 (i.e. end up as "0" or "0.0") is set to an empty string allegroai 2023-11-01 15:04:59 +02:00
  • 21e4be966f Fix recursion issue when deep-copying a session allegroai 2023-11-01 15:04:24 +02:00
  • 9c6cb421b3 When cleaning up pending pods, verify task is still aborted and pod is still pending before deleting the pod allegroai 2023-11-01 15:04:01 +02:00
  • 52405c343d Fix k8s glue configuration might be contaminated when changed during apply allegroai 2023-11-01 15:03:37 +02:00
  • 46f0c991c8 Add status reason when aborting before moving to k8s_scheduler queue allegroai 2023-11-01 15:02:24 +02:00
  • 0254279ed5 Version bump to v1.6.1 v1.6.1 allegroai 2023-09-06 15:41:29 +03:00
  • 0e1750f90e Fix requests library lower constraint breaks backwards compatibility allegroai 2023-09-06 15:40:48 +03:00
  • 58e0dc42ec Version bump to v1.6.0 v1.6.0 allegroai 2023-09-05 15:05:11 +03:00
  • d16825029d Add new pytorch no resolver mode and CLEARML_AGENT_PACKAGE_PYTORCH_RESOLVE to change resolver on a Task basis, now supports "pip", "direct", "none" allegroai 2023-09-02 17:45:10 +03:00
  • fb639afcb9 Fix PyTorch extra index pip resolver allegroai 2023-09-02 17:43:41 +03:00
  • eefb94d1bc Add Python 3.11 support allegroai 2023-09-02 17:42:27 +03:00
  • f1e9266075 Adjust docker image versions in a couple more places Alex Burlacu 2023-08-24 19:03:24 +03:00
  • e1e3c84a8d Update docker versions Alex Burlacu 2023-08-24 19:01:26 +03:00
  • ed1356976b Move extra configurations to Worker init to make sure all available configurations can be overridden Alex Burlacu 2023-08-24 19:00:36 +03:00
  • 2b815354e0 Improve file mode comment Alex Burlacu 2023-08-24 18:53:00 +03:00
  • edae380a9e Version bump Alex Burlacu 2023-08-24 18:51:47 +03:00
  • 946e9d9ce9 Fix invalid reference Alex Burlacu 2023-08-24 18:51:27 +03:00
  • a56343ffc7 Upgrade requests library (#162) jday1 2023-08-01 08:41:22 +01:00
  • 159a6e9a5a Fix runtime property overriding existing properties v1.6.0rc1 allegroai 2023-07-20 10:41:15 +03:00
  • 6b7ee12dc1 Edit README (#156) pollfly 2023-07-19 16:51:14 +03:00
  • 3838247716 Update k8s glue docker build resources allegroai 2023-07-19 16:47:50 +03:00
  • 6e7d35a42a Improve configuration files (#160) pollfly 2023-07-11 10:32:01 +03:00
  • 4c056a17b9 Add support for k8s jobs execution; strip docker container obtained from task in k8s apply v1.6.0rc0 allegroai 2023-07-04 14:45:00 +03:00
  • 21d98afca5 Add support for extra docker arguments referencing the machine's environment variables using the agent.docker_allow_host_environ configuration option, allowing users to also use $ENV in the task's docker arguments allegroai 2023-07-04 14:42:28 +03:00
  • 6a1bf11549 Fix Task docker arguments passed twice allegroai 2023-07-04 14:41:07 +03:00
  • 7115a9b9a7 Add CLEARML_EXTRA_PIP_INSTALL_FLAGS / agent.package_manager.extra_pip_install_flags to control additional pip install flags; fix pip version marking in "installed packages" is now preserved for and reinstalled allegroai 2023-07-04 14:39:40 +03:00
  • 450df2f8d3 Support skipping agent pip upgrade in container bash script using the CLEARML_AGENT_NO_UPDATE env var allegroai 2023-07-04 14:38:50 +03:00
  • ccf752c4e4 Add support for setting mode on files applied by the agent v1.5.3rc4 allegroai 2023-07-04 14:37:58 +03:00
  • 3ed63e2154 Fix docker container backwards compatibility for API <2.13; fix default docker match rules resolver (used incorrect field "container" instead of "image"); remove "container" (image) match rule option from default docker image resolver allegroai 2023-07-04 14:37:18 +03:00
  • a535f93cd6 Add support for CLEARML_AGENT_FORCE_MAX_API_VERSION for testing allegroai 2023-07-04 14:35:54 +03:00
  • b380ec54c6 Improve config file comments allegroai 2023-07-04 14:34:43 +03:00
  • a1274299ce Add support for CLEARML_AGENT_EXTRA_DOCKER_LABELS env var allegroai 2023-07-03 11:08:59 +03:00
  • c77224af68 Add support for task field injection into container docker name allegroai 2023-07-03 11:07:12 +03:00
  • 95dadca45c Refactor k8s glue running/used pods getter v1.5.3rc3 allegroai 2023-05-21 22:56:12 +03:00
  • 685918fd9b Version bump to v1.5.3rc3 allegroai 2023-05-21 22:54:38 +03:00