ClearML Kubernetes Agent

Version: 5.1.3 Type: application AppVersion: 1.24

MLOps platform Task running agent

Homepage: https://clear.ml

Maintainers

| Name | Email | Url |
|------|-------|-----|
| valeriano-manassero |  | https://github.com/valeriano-manassero |

Introduction

The clearml-agent is the Kubernetes agent for ClearML. It allows you to schedule distributed experiments on a Kubernetes cluster.

Upgrading Chart

Upgrades / Values upgrades

Updating to the latest version of this chart takes two steps:

helm repo update
helm upgrade clearml-agent allegroai/clearml-agent

Changing values on an existing installation can be done with:

helm upgrade clearml-agent allegroai/clearml-agent --version <CURRENT CHART VERSION> -f custom_values.yaml
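
As a minimal sketch, the custom_values.yaml passed with -f might look like the following. Every key comes from the Values table in this README; the values shown are the chart defaults or placeholders and should be replaced with your own.

```yaml
# custom_values.yaml (illustrative sketch; values are defaults or placeholders)
clearml:
  # ClearML API credentials used by the agent; replace the placeholders.
  agentk8sglueKey: "ACCESSKEY"
  agentk8sglueSecret: "SECRETKEY"

agentk8sglue:
  # Endpoints of the ClearML server the agent talks to (chart defaults shown).
  apiServerUrlReference: "https://api.clear.ml"
  fileServerUrlReference: "https://files.clear.ml"
  webServerUrlReference: "https://app.clear.ml"
  # ClearML queue this agent consumes.
  queue: "default"
```

Only the keys you want to override need to appear in the file; anything omitted keeps the chart default.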

Major upgrade from 3.* to 4.*

Before issuing helm upgrade:

  • if using securityContexts, check the new value format in values.yaml (podSecurityContext and containerSecurityContext)
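
For orientation, a values fragment using the two keys named above might look like this. It is a sketch that assumes both objects are passed through as standard Kubernetes securityContext definitions; the specific settings are illustrative, may not suit the agent image, and should be checked against the comments in values.yaml.

```yaml
agentk8sglue:
  # Pod-level securityContext for the Agent pod (illustrative values).
  podSecurityContext:
    runAsUser: 1000
    fsGroup: 1000
  # Container-level securityContext for the Agent container (illustrative values).
  containerSecurityContext:
    allowPrivilegeEscalation: false
```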

Source Code

Requirements

Kubernetes: >= 1.21.0-0 < 1.30.0-0
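
A requirement like this is normally declared through the kubeVersion field of Chart.yaml, which Helm checks against the cluster version before installing. The fragment below is a reconstruction from the line above, not a copy of the chart's actual Chart.yaml.

```yaml
# Chart.yaml (excerpt, reconstructed from the requirement above)
# The "-0" suffixes let the range match pre-release style versions,
# such as vendor builds that append a suffix to the Kubernetes version.
kubeVersion: ">= 1.21.0-0 < 1.30.0-0"
```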

Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| agentk8sglue | object | `{"additionalClusterRoleBindings":[],"additionalRoleBindings":[],"affinity":{},"annotations":{},"apiServerUrlReference":"https://api.clear.ml","basePodTemplate":{"affinity":{},"annotations":{},"containerSecurityContext":{},"env":[],"fileMounts":[],"hostAliases":[],"initContainers":[],"labels":{},"nodeSelector":{},"podSecurityContext":{},"priorityClassName":"","resources":{},"schedulerName":"","tolerations":[],"volumeMounts":[],"volumes":[]},"clearmlcheckCertificate":true,"containerSecurityContext":{},"defaultContainerImage":"ubuntu:18.04","extraEnvs":[],"fileMounts":[],"fileServerUrlReference":"https://files.clear.ml","image":{"registry":"","repository":"allegroai/clearml-agent-k8s-base","tag":"1.24-21"},"initContainers":{"resources":{}},"labels":{},"nodeSelector":{},"podSecurityContext":{},"queue":"default","replicaCount":1,"resources":{},"serviceExistingAccountName":"","tolerations":[],"volumeMounts":[],"volumes":[],"webServerUrlReference":"https://app.clear.ml"}` | This agent spawns queued experiments in new pods; a good use case is to combine it with GPU autoscaling nodes. https://github.com/allegroai/clearml-agent/tree/master/docker/k8s-glue |
| agentk8sglue.additionalClusterRoleBindings | list | `[]` | additional existing ClusterRoleBindings |
| agentk8sglue.additionalRoleBindings | list | `[]` | additional existing RoleBindings |
| agentk8sglue.affinity | object | `{}` | affinity setup for Agent pod (example in values.yaml comments) |
| agentk8sglue.annotations | object | `{}` | annotations setup for Agent pod (example in values.yaml comments) |
| agentk8sglue.apiServerUrlReference | string | `"https://api.clear.ml"` | Reference to API server URL |
| agentk8sglue.basePodTemplate | object | `{"affinity":{},"annotations":{},"containerSecurityContext":{},"env":[],"fileMounts":[],"hostAliases":[],"initContainers":[],"labels":{},"nodeSelector":{},"podSecurityContext":{},"priorityClassName":"","resources":{},"schedulerName":"","tolerations":[],"volumeMounts":[],"volumes":[]}` | base template for pods spawned to consume ClearML Task |
| agentk8sglue.basePodTemplate.affinity | object | `{}` | affinity setup for pods spawned to consume ClearML Task |
| agentk8sglue.basePodTemplate.annotations | object | `{}` | annotations setup for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.basePodTemplate.containerSecurityContext | object | `{}` | securityContext setup for containers spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.basePodTemplate.env | list | `[]` | environment variables for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.basePodTemplate.fileMounts | list | `[]` | file definition for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.basePodTemplate.hostAliases | list | `[]` | hostAliases setup for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.basePodTemplate.initContainers | list | `[]` | initContainers definition for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.basePodTemplate.labels | object | `{}` | labels setup for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.basePodTemplate.nodeSelector | object | `{}` | nodeSelector setup for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.basePodTemplate.podSecurityContext | object | `{}` | securityContext setup for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.basePodTemplate.priorityClassName | string | `""` | priorityClassName setup for pods spawned to consume ClearML Task |
| agentk8sglue.basePodTemplate.resources | object | `{}` | resources declaration for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.basePodTemplate.schedulerName | string | `""` | schedulerName setup for pods spawned to consume ClearML Task |
| agentk8sglue.basePodTemplate.tolerations | list | `[]` | tolerations setup for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.basePodTemplate.volumeMounts | list | `[]` | volume mounts definition for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.basePodTemplate.volumes | list | `[]` | volumes definition for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.clearmlcheckCertificate | bool | `true` | Check certificate validity for every UrlReference below. |
| agentk8sglue.containerSecurityContext | object | `{}` | container securityContext setup for Agent pod (example in values.yaml comments) |
| agentk8sglue.defaultContainerImage | string | `"ubuntu:18.04"` | default container image for ClearML Task pod |
| agentk8sglue.extraEnvs | list | `[]` | Extra environment variables for Glue Agent |
| agentk8sglue.fileMounts | list | `[]` | file definition for Glue Agent (example in values.yaml comments) |
| agentk8sglue.fileServerUrlReference | string | `"https://files.clear.ml"` | Reference to File server URL |
| agentk8sglue.image | object | `{"registry":"","repository":"allegroai/clearml-agent-k8s-base","tag":"1.24-21"}` | Glue Agent image configuration |
| agentk8sglue.initContainers | object | `{"resources":{}}` | Glue Agent pod initContainers configs |
| agentk8sglue.initContainers.resources | object | `{}` | Glue Agent pod initContainers resources |
| agentk8sglue.labels | object | `{}` | labels setup for Agent pod (example in values.yaml comments) |
| agentk8sglue.nodeSelector | object | `{}` | nodeSelector setup for Agent pod (example in values.yaml comments) |
| agentk8sglue.podSecurityContext | object | `{}` | pod securityContext setup for Agent pod (example in values.yaml comments) |
| agentk8sglue.queue | string | `"default"` | ClearML queue this agent will consume |
| agentk8sglue.replicaCount | int | `1` | Glue Agent number of pods |
| agentk8sglue.resources | object | `{}` | Glue Agent pod resources |
| agentk8sglue.serviceExistingAccountName | string | `""` | if set, don't create a ServiceAccount but use the existing one defined here |
| agentk8sglue.tolerations | list | `[]` | tolerations setup for Agent pod (example in values.yaml comments) |
| agentk8sglue.volumeMounts | list | `[]` | volume mounts definition for Glue Agent (example in values.yaml comments) |
| agentk8sglue.volumes | list | `[]` | volumes definition for Glue Agent (example in values.yaml comments) |
| agentk8sglue.webServerUrlReference | string | `"https://app.clear.ml"` | Reference to Web server URL |
| clearml | object | `{"agentk8sglueKey":"ACCESSKEY","agentk8sglueSecret":"SECRETKEY","clearmlConfig":"sdk {\n}","existingAgentk8sglueSecret":"","existingClearmlConfigSecret":""}` | ClearML generic configurations |
| clearml.agentk8sglueKey | string | `"ACCESSKEY"` | Agent k8s Glue basic auth key |
| clearml.agentk8sglueSecret | string | `"SECRETKEY"` | Agent k8s Glue basic auth secret |
| clearml.clearmlConfig | string | `"sdk {\n}"` | ClearML configuration file |
| clearml.existingAgentk8sglueSecret | string | `""` | If this is set, chart will not generate a secret but will use what is defined here |
| clearml.existingClearmlConfigSecret | string | `""` | If this is set, chart will not generate a secret but will use what is defined here |
| global | object | `{"imageRegistry":"docker.io"}` | Global parameters section |
| global.imageRegistry | string | `"docker.io"` | Images registry |
| imageCredentials | object | `{"email":"someone@host.com","enabled":false,"existingSecret":"","password":"pwd","registry":"docker.io","username":"someone"}` | Private image registry configuration |
| imageCredentials.email | string | `"someone@host.com"` | Email |
| imageCredentials.enabled | bool | `false` | Use private authentication mode |
| imageCredentials.existingSecret | string | `""` | If this is set, chart will not generate a secret but will use what is defined here |
| imageCredentials.password | string | `"pwd"` | Registry password |
| imageCredentials.registry | string | `"docker.io"` | Registry name |
| imageCredentials.username | string | `"someone"` | Registry username |
| sessions | object | `{"externalIP":"0.0.0.0","maxServices":20,"portModeEnabled":false,"startingPort":30000,"svcAnnotations":{},"svcType":"NodePort"}` | Sessions internal service configuration |
| sessions.externalIP | string | `"0.0.0.0"` | External IP sessions clients can connect to |
| sessions.maxServices | int | `20` | maximum number of NodePorts exposed |
| sessions.portModeEnabled | bool | `false` | Enable/disable sessions portmode. WARNING: only one Agent deployment can have this set to true |
| sessions.startingPort | int | `30000` | starting range of exposed NodePorts |
| sessions.svcAnnotations | object | `{}` | specific annotations for session services |
| sessions.svcType | string | `"NodePort"` | service type ("NodePort" or "ClusterIP" or "LoadBalancer") |
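
The agentk8sglue description mentions pairing the agent with GPU autoscaling nodes. A hedged sketch of such a setup through basePodTemplate follows. It assumes the template fields accept standard Kubernetes pod spec fragments and that the cluster exposes GPUs as the nvidia.com/gpu resource (for example via the NVIDIA device plugin). The queue name, node label, and toleration are hypothetical and must match your own cluster.

```yaml
agentk8sglue:
  # Hypothetical queue dedicated to GPU tasks.
  queue: "gpu"
  basePodTemplate:
    # Request one GPU for each pod spawned to run a ClearML Task.
    resources:
      limits:
        nvidia.com/gpu: 1
    # Hypothetical label identifying the GPU node pool.
    nodeSelector:
      accelerator: "nvidia"
    # Allow scheduling onto GPU nodes tainted for GPU workloads only
    # (taint key is an assumption about the cluster setup).
    tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
```

Settings under agentk8sglue.basePodTemplate apply to the pods that run Tasks, while the same-named keys directly under agentk8sglue (resources, nodeSelector, tolerations) apply to the Agent pod itself.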