allegroai
10c6629982
Support skipping re-enqueue on suspected preempted k8s pods
2024-04-19 23:46:57 +03:00
allegroai
43443ccf08
Pass task_id when resolving k8s template
2024-04-01 11:37:01 +03:00
allegroai
10fb157d58
Fix queue handling for backwards compatibility
2024-03-17 19:00:18 +02:00
FeU-aKlos
a2758250b2
Fix queue handling in K8sIntegration and k8s_glue_example.py (#183)
...
* Fix queue handling in K8sIntegration and k8s_glue_example.py
* Update Dockerfile and k8s_glue_example.py
* Add executable permission to provider_entrypoint.sh
* ADJUST docker
* Update clearml-agent version
* ADDJUST stuff
* ADJUST queue string handling
* DELETE pip install from own repo
2024-02-29 14:20:54 +02:00
allegroai
b34329934b
Add queue ID report before pulling task
2024-02-29 13:52:17 +02:00
allegroai
6657003d65
Fix using controller-uid will not always return required pods
2024-02-29 13:49:30 +02:00
allegroai
c9fc092f4e
Support force_system_packages argument in k8s glue class
2023-12-26 10:12:32 +02:00
allegroai
55b065a114
Update GPU stats and pynvml support
2023-12-20 17:47:19 +02:00
allegroai
faa97b6cc2
Set worker ID in k8s glue mode
2023-12-20 17:45:34 +02:00
allegroai
9ad49a0d21
Fix KeyError if container does not contain the arguments field
2023-11-01 15:11:07 +02:00
allegroai
0131db8b7d
Add support for resource_applied() callback in k8s glue
...
Add support for sending log events with k8s-provided timestamps
Refactor env vars infrastructure
2023-11-01 15:10:08 +02:00
allegroai
9c6cb421b3
When cleaning up pending pods, verify task is still aborted and pod is still pending before deleting the pod
2023-11-01 15:04:01 +02:00
allegroai
46f0c991c8
Add status reason when aborting before moving to k8s_scheduler queue
2023-11-01 15:02:24 +02:00
Alex Burlacu
946e9d9ce9
Fix invalid reference
2023-08-24 18:51:27 +03:00
allegroai
4c056a17b9
Add support for k8s jobs execution
...
Strip docker container obtained from task in k8s apply
2023-07-04 14:45:00 +03:00
allegroai
450df2f8d3
Support skipping agent pip upgrade in container bash script using the CLEARML_AGENT_NO_UPDATE env var
2023-07-04 14:38:50 +03:00
allegroai
95dadca45c
Refactor k8s glue running/used pods getter
2023-05-21 22:56:12 +03:00
allegroai
7f5b3c8df4
Fix None config file in session causes k8s agent to raise exception
2023-03-28 14:33:55 +03:00
allegroai
40456be948
Black formatting
...
Refactor path support
2023-03-05 18:05:00 +02:00
allegroai
4f17a2c17d
Fix K8s glue does not delete pending pods if the tasks they represent were aborted
2023-02-05 10:32:16 +02:00
allegroai
af6a77918f
Fix _ is allowed in k8s label names
2023-02-05 10:29:48 +02:00
allegroai
b2da639582
Add CLEARML_AGENT_FORCE_SYSTEM_SITE_PACKAGES env var (default true) to allow overriding default "system_site_packages: true" behavior when running tasks in containers (docker mode and k8s-glue)
2022-12-10 20:00:46 +02:00
allegroai
dc5e0033c8
Remove support for kubectl run
...
Allow customizing pod name prefix and limit pod label
Return deleted pods from cleanup
Some refactoring
2022-12-05 11:40:19 +02:00
allegroai
3dd5973734
Filter by phase when detecting hanging pods
...
More debug print-outs
Use task session when possible
Push task into k8s scheduler queue only if running from the same tenant
Make sure we pass git_user/pass to the task pod
Fix cleanup command not issued when no pods exist in a multi-queue setup
2022-12-05 11:29:59 +02:00
allegroai
6e7fb5f331
Fix sending task logs fails when agent is not running in the same tenant
2022-12-05 11:19:14 +02:00
allegroai
d794b047be
Fix system_site_packages is not turned on in k8s glue
2022-10-23 13:03:59 +03:00
allegroai
857a750eb1
Fix GCP load balancer not fwd GET request body, allow to change default request Action to Put/Post/Get. see api.http.default_method or CLEARML_API_DEFAULT_REQ_METHOD
2022-09-15 20:15:42 +03:00
allegroai
26aa50f1b5
Fix k8s glue extra_bash_init_cmd location in initial bash script
2022-09-02 23:50:03 +03:00
allegroai
8b4f1eefc2
Add more debug printouts in k8s glue
2022-09-02 23:49:28 +03:00
allegroai
97c2e21dcc
Fix resolving k8s pending queue may cause a queue with a uuid name to be created
2022-09-02 23:49:28 +03:00
allegroai
7292263f86
Add CLEARML_K8S_GLUE_START_AGENT_SCRIPT_PATH to allow customizing the agent startup script location for k8s glue agent
2022-08-23 23:16:36 +03:00
allegroai
820ab4dc0c
Fix k8s glue debug mode, refactoring
2022-08-01 18:55:49 +03:00
allegroai
d96b8ff906
Fix template namespace should override default namespace
2022-07-22 22:44:32 +03:00
allegroai
e687418194
Refactor k8s glue template handling
2022-07-22 22:43:07 +03:00
allegroai
2e5298b737
Add support for use-owner-token in k8s glue
2022-04-27 14:59:27 +03:00
allegroai
4c120d7cd0
Add ability to override container LOCAL_PYTHON, add auto python support (max 3.15)
2022-03-24 21:58:07 +02:00
allegroai
cd046927f3
Add k8s glue update task status_message in hanging pods daemon
...
Fix k8s glue not throwing error when failing to push to queue
2021-08-02 22:59:31 +03:00
allegroai
42606d9247
Fix multiple k8s glue instances with pod limits
...
Version bump
2021-07-15 10:28:43 +03:00
allegroai
499b3dfa66
Fix k8s glue, do not reset Task before re-enqueuing as it will remove runtime properties
2021-07-15 10:27:54 +03:00
allegroai
ca360b7d43
Improve max pod limit check
2021-07-15 10:26:49 +03:00
allegroai
6470b16b70
Add k8s set task container if using default image/arguments
2021-07-15 10:26:09 +03:00
allegroai
0e7546f248
Fix docker force pull in k8s glue _kubectl_apply()
2021-06-27 09:42:14 +03:00
allegroai
e3c8bd5666
Add support for agent.docker_force_pull configuration setting in k8s glue
2021-06-25 17:36:08 +03:00
allegroai
3ae1741343
Fix k8s glue task container arguments not supported in kubectl_run command
...
Fix k8s glue not passing required extra_docker_bash_script to string format
2021-06-25 17:35:01 +03:00
allegroai
53c106c3af
Fix k8s glue task container handling fails parsing docker image
...
Fix k8s glue uses task container image arguments when no image is specified
2021-06-25 17:34:28 +03:00
allegroai
a2db1f5ab5
Remove queue name from pod name in k8s glue, add queue name and ID to pod labels (issue #64)
2021-05-05 12:03:35 +03:00
allegroai
4f18bb7ea0
Add k8s glue default restartPolicy=Never to template to prevent pods from restarting
2021-04-28 13:20:13 +03:00
allegroai
08ff5e6db7
Add number of pods limit to k8s glue
2021-04-25 10:47:49 +03:00
allegroai
537b67e0cd
Fix agent can return non-zero error code and pods will end up restarting forever (issue #56)
2021-04-12 23:00:59 +03:00
allegroai
e71e6865d2
Add agent.docker_install_opencv_libs (default: True) to enable auto opencv libs install for faster docker spin-up
2021-04-07 18:45:44 +03:00