ClearML Ecosystem for Kubernetes

Version: 4.0.0 Type: application AppVersion: 1.5.0

MLOps platform



The clearml-server is the backend service infrastructure for ClearML. It allows multiple users to collaborate and manage their experiments.

clearml-server contains the following components:

  • The ClearML Web-App, a single-page UI for experiment management and browsing
  • RESTful API for:
    • Documenting and logging experiment information, statistics and results
    • Querying experiments history, logs and results
  • Locally-hosted file server for storing images and models, making them easily accessible from the Web-App

Local environment

For development or evaluation, it's possible to use kind. Once kind is installed, the following commands create a complete ClearML installation:

cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  # API server's default nodePort is 30008. If you customize it in helm values by
  # `apiserver.service.nodePort`, `containerPort` should match it
  - containerPort: 30008
    hostPort: 30008
    listenAddress: ""
    protocol: TCP
  # Web server's default nodePort is 30080. If you customize it in helm values by
  # `webserver.service.nodePort`, `containerPort` should match it
  - containerPort: 30080
    hostPort: 30080
    listenAddress: ""
    protocol: TCP
  # File server's default nodePort is 30081. If you customize it in helm values by
  # `fileserver.service.nodePort`, `containerPort` should match it
  - containerPort: 30081
    hostPort: 30081
    listenAddress: ""
    protocol: TCP
  # Host folder that backs the volumes created by kind's local-path provisioner
  extraMounts:
  - hostPath: /tmp/clearml-kind/
    containerPath: /var/local-path-provisioner
EOF

helm install clearml allegroai/clearml

After deployment, the services will be exposed on localhost on the following ports:

  • API server on 30008
  • Web server on 30080
  • File server on 30081

Data that ClearML persists in any Kubernetes volume will be accessible in the /tmp/clearml-kind folder on the host.
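
The comments in the kind configuration say that a customized nodePort must match the corresponding containerPort. A minimal sketch of such a customization is shown here; the file name custom_ports.yaml and the port numbers 31008, 31080 and 31081 are arbitrary assumptions, chosen only because they fall in the default Kubernetes NodePort range.

```yaml
# custom_ports.yaml (hypothetical file name, illustrative port numbers)
apiserver:
  service:
    type: NodePort
    nodePort: 31008   # kind: containerPort/hostPort for the API server must be 31008 too
webserver:
  service:
    type: NodePort
    nodePort: 31080   # kind: containerPort/hostPort for the web server must be 31080 too
fileserver:
  service:
    type: NodePort
    nodePort: 31081   # kind: containerPort/hostPort for the file server must be 31081 too
```

Such a file would be passed at install time with the -f option, for example `helm install clearml allegroai/clearml -f custom_ports.yaml`, after creating the kind cluster with the same three port numbers.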

Production cluster environment

In a production environment, it's suggested to install an ingress controller and verify that it is working correctly. When deploying ClearML, enable the ingress section of the chart values. This will create three ingress rules:

  • app.<your domain name>
  • files.<your domain name>
  • api.<your domain name>

Pointing the DNS records for these host names to the IP address where the ingress controller is responding completes the deployment process.
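
As a rough sketch of enabling the ingress section, the fragment below uses the ingress.api and ingress.files keys listed in the values table. The host names are placeholders, the file name is hypothetical, and the `app` block is an assumption that the web app rule takes the same fields as the other two; check the chart's values.yaml before relying on it.

```yaml
# ingress_values.yaml (hypothetical file name; host names are placeholders)
ingress:
  app:                     # assumed to mirror the api and files blocks
    enabled: true
    hostName: "app.clearml.example.com"
    path: "/"
  api:
    enabled: true
    hostName: "api.clearml.example.com"
    path: "/"
    tlsSecretName: ""      # default value; set only if a TLS secret exists
  files:
    enabled: true
    hostName: "files.clearml.example.com"
    path: "/"
    tlsSecretName: ""
```

Controller-specific settings, if any are needed, would presumably go under ingress.annotations or the per-rule annotations keys.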

Upgrades / Values upgrades

Updating to the latest version of this chart can be done in two steps:

helm repo update
helm upgrade clearml allegroai/clearml

Changing values on an existing installation can be done with:

helm upgrade clearml allegroai/clearml --version <CURRENT CHART VERSION> -f custom_values.yaml

Please note: when updating values only, always set an explicit chart version to avoid an unintended chart upgrade. Keeping version upgrades and values changes as separate procedures is good practice, because a problem can then be traced to one kind of change.
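
For illustration, a custom_values.yaml for the command above might look like the following sketch. The keys come from the values table in this README; the specific numbers are assumptions chosen only to show the shape of an override file.

```yaml
# custom_values.yaml: only the keys that differ from the chart defaults
apiserver:
  authCookiesMaxAge: 86400     # one day, instead of the default 864000 seconds
webserver:
  replicaCount: 2              # illustrative value
elasticsearch:
  esJavaOpts: "-Xmx4g -Xms4g"  # illustrative heap size
  resources:
    limits:
      memory: "8Gi"
    requests:
      memory: "8Gi"
```

Keys that are not listed in the file keep the chart defaults, so the file can stay short and be tracked in version control.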

Additional Configuration for ClearML Server

You can also configure the clearml-server for:

  • fixed users (users with credentials)
  • non-responsive experiment watchdog settings

For detailed instructions, see the Optional Configuration section in the clearml-server repository README file.
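
A possible way to express both settings through apiserver.additionalConfigs is sketched below. The configuration file names (apiserver.conf, services.conf), the setting names inside them, and the sample user are assumptions based on the clearml-server Optional Configuration instructions; verify them against the examples in the chart's values.yaml before use.

```yaml
# Hypothetical values fragment; file names, settings and the user are assumptions
apiserver:
  additionalConfigs:
    # Fixed users (users with credentials)
    apiserver.conf: |
      auth {
        fixed_users {
          enabled: true
          pass_hashed: false
          users: [
            {
              username: "jane"
              password: "12345678"
              name: "Jane Doe"
            }
          ]
        }
      }
    # Non-responsive experiment watchdog
    services.conf: |
      tasks {
        non_responsive_tasks_watchdog {
          threshold_sec: 7200
          watch_interval_sec: 900
        }
      }
```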

Key Type Default Description
apiserver.additionalConfigs object {} Additional configuration files for the API server; see the examples in the values.yaml file
apiserver.affinity object {}
apiserver.authCookiesMaxAge int 864000 Number of seconds the authorization cookie lasts in the user's browser
apiserver.configDir string "/opt/clearml/config"
apiserver.extraEnvs list []
apiserver.image.pullPolicy string "IfNotPresent"
apiserver.image.repository string "allegroai/clearml"
apiserver.image.tag string "1.5.0"
apiserver.livenessDelay int 60
apiserver.nodeSelector object {}
apiserver.podAnnotations object {}
apiserver.prepopulateArtifactsPath string "/mnt/fileserver"
apiserver.prepopulateEnabled string "true"
apiserver.prepopulateZipFiles string "/opt/clearml/db-pre-populate"
apiserver.readinessDelay int 60
apiserver.replicaCount int 1
apiserver.resources object {}
apiserver.service.nodePort int 30008 Used as the service's nodePort field when service.type is NodePort; ignored for any other service type
apiserver.service.port int 8008
apiserver.service.type string "NodePort" Sets the service's spec.type field
apiserver.tolerations list []
clearml object {"defaultCompany":"d1bd92a3b039400cbafc60a7a5b1e52b"} ClearML generic configurations
elasticsearch.clusterHealthCheckParams string "wait_for_status=yellow&timeout=1s"
elasticsearch.clusterName string "clearml-elastic"
elasticsearch.enabled bool true
elasticsearch.esConfig."elasticsearch.yml" string " false\n"
elasticsearch.esJavaOpts string "-Xmx2g -Xms2g"
elasticsearch.extraEnvs[0].name string "bootstrap.memory_lock"
elasticsearch.extraEnvs[0].value string "false"
elasticsearch.extraEnvs[1].name string "cluster.routing.allocation.node_initial_primaries_recoveries"
elasticsearch.extraEnvs[1].value string "500"
elasticsearch.extraEnvs[2].name string "cluster.routing.allocation.disk.watermark.low"
elasticsearch.extraEnvs[2].value string "500mb"
elasticsearch.extraEnvs[3].name string "cluster.routing.allocation.disk.watermark.high"
elasticsearch.extraEnvs[3].value string "500mb"
elasticsearch.extraEnvs[4].name string "cluster.routing.allocation.disk.watermark.flood_stage"
elasticsearch.extraEnvs[4].value string "500mb"
elasticsearch.extraEnvs[5].name string "http.compression_level"
elasticsearch.extraEnvs[5].value string "7"
elasticsearch.extraEnvs[6].name string "reindex.remote.whitelist"
elasticsearch.extraEnvs[6].value string "*.*"
elasticsearch.extraEnvs[7].name string "xpack.monitoring.enabled"
elasticsearch.extraEnvs[7].value string "false"
elasticsearch.extraEnvs[8].name string ""
elasticsearch.extraEnvs[8].value string "false"
elasticsearch.httpPort int 9200
elasticsearch.minimumMasterNodes int 1
elasticsearch.persistence.enabled bool true
elasticsearch.replicas int 1
elasticsearch.resources.limits.memory string "4Gi"
elasticsearch.resources.requests.memory string "4Gi"
string "true"
elasticsearch.roles.ingest string "true"
elasticsearch.roles.master string "true"
elasticsearch.roles.remote_cluster_client string "true"
elasticsearch.volumeClaimTemplate.accessModes[0] string "ReadWriteOnce"
string "50Gi"
externalServices.elasticsearchHost string "" Existing Elasticsearch hostname to use if elasticsearch.enabled is false
externalServices.elasticsearchPort int 9200 Existing Elasticsearch port to use if elasticsearch.enabled is false
externalServices.mongodbHost string "" Existing MongoDB hostname to use if mongodb.enabled is false
externalServices.mongodbPort int 27017 Existing MongoDB port to use if mongodb.enabled is false
externalServices.redisHost string "" Existing Redis hostname to use if redis.enabled is false
externalServices.redisPort int 6379 Existing Redis port to use if redis.enabled is false
fileserver.affinity object {}
fileserver.extraEnvs list []
fileserver.image.pullPolicy string "IfNotPresent"
fileserver.image.repository string "allegroai/clearml"
fileserver.image.tag string "1.5.0"
fileserver.nodeSelector object {}
fileserver.podAnnotations object {}
fileserver.replicaCount int 1
fileserver.resources object {}
fileserver.service.nodePort int 30081 Used as the service's nodePort field when service.type is NodePort; ignored for any other service type
fileserver.service.port int 8081
fileserver.service.type string "NodePort" Sets the service's spec.type field
string ""
string "50Gi"
fileserver.tolerations list []
imageCredentials object {"email":"","enabled":false,"existingSecret":"","password":"pwd","registry":"","username":"someone"} Private image registry configuration
string "" Email
imageCredentials.enabled bool false Use private authentication mode
imageCredentials.existingSecret string "" If set, the chart does not generate a secret and uses the one named here instead
imageCredentials.password string "pwd" Registry password
imageCredentials.registry string "" Registry name
imageCredentials.username string "someone" Registry username
ingress.annotations object {}
ingress.api.annotations object {}
ingress.api.enabled bool false
ingress.api.hostName string ""
ingress.api.path string "/"
ingress.api.tlsSecretName string ""
object {}
bool false
string ""
string "/"
string ""
ingress.files.annotations object {}
ingress.files.enabled bool false
ingress.files.hostName string ""
ingress.files.path string "/"
ingress.files.tlsSecretName string ""
string "clearml-server-ingress"
mongodb.architecture string "standalone"
mongodb.auth.enabled bool false
mongodb.enabled bool true
mongodb.persistence.accessModes[0] string "ReadWriteOnce"
mongodb.persistence.enabled bool true
mongodb.persistence.size string "50Gi"
mongodb.replicaCount int 1
string "{{ .Release.Name }}-mongodb"
mongodb.service.port int 27017
mongodb.service.portName string "mongo-service"
mongodb.service.type string "ClusterIP"
redis.cluster.enabled bool false
redis.databaseNumber int 0
redis.enabled bool true
string "{{ .Release.Name }}-redis-master"
redis.master.persistence.accessModes[0] string "ReadWriteOnce"
redis.master.persistence.enabled bool true
redis.master.persistence.size string "5Gi"
redis.master.port int 6379
redis.usePassword bool false
secret.authToken string "1SCf0ov3Nm544Td2oZ0gXSrsNx5XhMWdVlKz1tOgcx158bD5RV" Set for auth_token field
secret.credentials.apiserver.accessKey string "5442F3443MJMORWZA3ZH" Set for apiserver_key field
secret.credentials.apiserver.secretKey string "BxapIRo9ZINi8x25CRxz8Wdmr2pQjzuWVB4PNASZqCtTyWgWVQ" Set for apiserver_secret field
secret.credentials.tests.accessKey string "ENP39EQM4SLACGD5FXB7" Set for tests_user_key field
secret.credentials.tests.secretKey string "lPcm0imbcBZ8mwgO7tpadutiS3gnJD05x9j7afwXPS35IKbpiQ" Set for tests_user_secret field
secret.httpSession string "9Tw20RbhJ1bLBiHEOWXvhplKGUbTgLzAtwFN2oLQvWwS0uRpD5" Set for http_session field
webserver.additionalConfigs object {}
webserver.affinity object {}
webserver.extraEnvs list []
webserver.image.pullPolicy string "IfNotPresent"
webserver.image.repository string "allegroai/clearml"
webserver.image.tag string "1.5.0"
webserver.nodeSelector object {}
webserver.podAnnotations object {}
webserver.replicaCount int 1
webserver.resources object {}
webserver.service.nodePort int 30080 Used as the service's nodePort field when service.type is NodePort; ignored for any other service type
webserver.service.port int 80
webserver.service.type string "NodePort" Sets the service's spec.type field
webserver.tolerations list []
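
As an example of combining the externalServices keys from the table above, the following sketch disables the bundled Elasticsearch, MongoDB and Redis and points the chart at existing instances. The host names are placeholders, and it is assumed that each external host is used only when the corresponding bundled service is disabled.

```yaml
# external_services_values.yaml (hypothetical file name; hosts are placeholders)
elasticsearch:
  enabled: false
mongodb:
  enabled: false
redis:
  enabled: false
externalServices:
  elasticsearchHost: "elasticsearch.internal.example"
  elasticsearchPort: 9200
  mongodbHost: "mongodb.internal.example"
  mongodbPort: 27017
  redisHost: "redis.internal.example"
  redisPort: 6379
```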