clearml
fc1abbab0b
Refactor k8s glue
2024-12-26 18:58:00 +02:00
clearml
4fa61dde1f
Support ignoring kubectl errors
2024-12-12 23:41:31 +02:00
clearml
b65e5fed94
Scan more Python 3 versions
2024-11-17 13:55:51 +02:00
allegroai
6302d43990
Add support for skipping container apt installs using CLEARML_AGENT_SKIP_CONTAINER_APT env var in k8s
Add runtime callback support for setting runtime properties per task in k8s
Fix remove task from pending queue and set to failed when kubectl apply fails
2024-08-27 23:01:27 +03:00
allegroai
b8c762401b
Fix use same state transition if supported by the server (instead of stopping the task before re-enqueue)
2024-08-27 22:54:45 +03:00
allegroai
8ba4d75e80
Add CLEARML_TASK_ID and auth token to pod env vars in original entrypoint flow
2024-07-24 17:47:48 +03:00
allegroai
edc333ba5f
Add K8S_GLUE_POD_USE_IMAGE_ENTRYPOINT to allow running images without overriding the entrypoint (useful for agents using prebuilt images in k8s)
2024-07-24 17:46:27 +03:00
allegroai
10c6629982
Support skipping re-enqueue on suspected preempted k8s pods
2024-04-19 23:46:57 +03:00
allegroai
43443ccf08
Pass task_id when resolving k8s template
2024-04-01 11:37:01 +03:00
allegroai
10fb157d58
Fix queue handling for backwards compatibility
2024-03-17 19:00:18 +02:00
FeU-aKlos
a2758250b2
Fix queue handling in K8sIntegration and k8s_glue_example.py (#183)
* Fix queue handling in K8sIntegration and k8s_glue_example.py
* Update Dockerfile and k8s_glue_example.py
* Add executable permission to provider_entrypoint.sh
* ADJUST docker
* Update clearml-agent version
* ADDJUST stuff
* ADJUST queue string handling
* DELETE pip install from own repo
2024-02-29 14:20:54 +02:00
allegroai
b34329934b
Add queue ID report before pulling task
2024-02-29 13:52:17 +02:00
allegroai
6657003d65
Fix using controller-uid will not always return required pods
2024-02-29 13:49:30 +02:00
allegroai
c9fc092f4e
Support force_system_packages argument in k8s glue class
2023-12-26 10:12:32 +02:00
allegroai
55b065a114
Update GPU stats and pynvml support
2023-12-20 17:47:19 +02:00
allegroai
faa97b6cc2
Set worker ID in k8s glue mode
2023-12-20 17:45:34 +02:00
allegroai
9ad49a0d21
Fix KeyError if container does not contain the arguments field
2023-11-01 15:11:07 +02:00
allegroai
0131db8b7d
Add support for resource_applied() callback in k8s glue
Add support for sending log events with k8s-provided timestamps
Refactor env vars infrastructure
2023-11-01 15:10:08 +02:00
allegroai
9c6cb421b3
When cleaning up pending pods, verify task is still aborted and pod is still pending before deleting the pod
2023-11-01 15:04:01 +02:00
allegroai
46f0c991c8
Add status reason when aborting before moving to k8s_scheduler queue
2023-11-01 15:02:24 +02:00
Alex Burlacu
946e9d9ce9
Fix invalid reference
2023-08-24 18:51:27 +03:00
allegroai
4c056a17b9
Add support for k8s jobs execution
Strip docker container obtained from task in k8s apply
2023-07-04 14:45:00 +03:00
allegroai
450df2f8d3
Support skipping agent pip upgrade in container bash script using the CLEARML_AGENT_NO_UPDATE env var
2023-07-04 14:38:50 +03:00
allegroai
95dadca45c
Refactor k8s glue running/used pods getter
2023-05-21 22:56:12 +03:00
allegroai
7f5b3c8df4
Fix None config file in session causes k8s agent to raise exception
2023-03-28 14:33:55 +03:00
allegroai
40456be948
Black formatting
Refactor path support
2023-03-05 18:05:00 +02:00
allegroai
4f17a2c17d
Fix K8s glue does not delete pending pods if the tasks they represent were aborted
2023-02-05 10:32:16 +02:00
allegroai
af6a77918f
Fix _ is allowed in k8s label names
2023-02-05 10:29:48 +02:00
allegroai
b2da639582
Add CLEARML_AGENT_FORCE_SYSTEM_SITE_PACKAGES env var (default true) to allow overriding default "system_site_packages: true" behavior when running tasks in containers (docker mode and k8s-glue)
2022-12-10 20:00:46 +02:00
allegroai
dc5e0033c8
Remove support for kubectl run
Allow customizing pod name prefix and limit pod label
Return deleted pods from cleanup
Some refactoring
2022-12-05 11:40:19 +02:00
allegroai
3dd5973734
Filter by phase when detecting hanging pods
More debug print-outs
Use task session when possible
Push task into k8s scheduler queue only if running from the same tenant
Make sure we pass git_user/pass to the task pod
Fix cleanup command not issued when no pods exist in a multi-queue setup
2022-12-05 11:29:59 +02:00
allegroai
6e7fb5f331
Fix sending task logs fails when agent is not running in the same tenant
2022-12-05 11:19:14 +02:00
allegroai
d794b047be
Fix system_site_packages is not turned on in k8s glue
2022-10-23 13:03:59 +03:00
allegroai
857a750eb1
Fix GCP load balancer not fwd GET request body, allow to change default request Action to Put/Post/Get. see api.http.default_method or CLEARML_API_DEFAULT_REQ_METHOD
2022-09-15 20:15:42 +03:00
allegroai
26aa50f1b5
Fix k8s glue extra_bash_init_cmd location in initial bash script
2022-09-02 23:50:03 +03:00
allegroai
8b4f1eefc2
Add more debug printouts in k8s glue
2022-09-02 23:49:28 +03:00
allegroai
97c2e21dcc
Fix resolving k8s pending queue may cause a queue with a uuid name to be created
2022-09-02 23:49:28 +03:00
allegroai
7292263f86
Add CLEARML_K8S_GLUE_START_AGENT_SCRIPT_PATH to allow customizing the agent startup script location for k8s glue agent
2022-08-23 23:16:36 +03:00
allegroai
820ab4dc0c
Fix k8s glue debug mode, refactoring
2022-08-01 18:55:49 +03:00
allegroai
d96b8ff906
Fix template namespace should override default namespace
2022-07-22 22:44:32 +03:00
allegroai
e687418194
Refactor k8s glue template handling
2022-07-22 22:43:07 +03:00
allegroai
2e5298b737
Add support for use-owner-token in k8s glue
2022-04-27 14:59:27 +03:00
allegroai
4c120d7cd0
Add ability to override container LOCAL_PYTHON, add auto python support (max 3.15)
2022-03-24 21:58:07 +02:00
allegroai
cd046927f3
Add k8s glue update task status_message in hanging pods daemon
Fix k8s glue not throwing error when failing to push to queue
2021-08-02 22:59:31 +03:00
allegroai
42606d9247
Fix multiple k8s glue instances with pod limits
Version bump
2021-07-15 10:28:43 +03:00
allegroai
499b3dfa66
Fix k8s glue, do not reset Task before re-enqueuing as it will remove runtime properties
2021-07-15 10:27:54 +03:00
allegroai
ca360b7d43
Improve max pod limit check
2021-07-15 10:26:49 +03:00
allegroai
6470b16b70
Add k8s set task container if using default image/arguments
2021-07-15 10:26:09 +03:00
allegroai
0e7546f248
Fix docker force pull in k8s glue _kubectl_apply()
2021-06-27 09:42:14 +03:00
allegroai
e3c8bd5666
Add support for agent.docker_force_pull configuration setting in k8s glue
2021-06-25 17:36:08 +03:00