
ClearML Ecosystem for Kubernetes

Version: 3.2.0 Type: application AppVersion: 1.1.1

MLOps platform

Homepage: https://clear.ml

Maintainers

| Name | Email | Url |
| ---- | ----- | --- |
| valeriano-manassero |  | https://github.com/valeriano-manassero |

Introduction

The clearml-server is the backend service infrastructure for ClearML. It allows multiple users to collaborate and manage their experiments.

clearml-server contains the following components:

  • The ClearML Web-App, a single-page UI for experiment management and browsing
  • RESTful API for:
    • Documenting and logging experiment information, statistics and results
    • Querying experiment history, logs and results
  • Locally-hosted file server for storing images and models, making them easily accessible from the Web-App

Local environment

For development or evaluation it's possible to use kind. After installing kind, the following commands will create a complete ClearML installation:

mkdir -pm 777 /tmp/clearml-kind

cat <<EOF > /tmp/clearml-kind.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  # API server's default nodePort is 30008. If you customize it in helm values by
  # `apiserver.service.nodePort`, `containerPort` should match it
  - containerPort: 30008
    hostPort: 30008
    listenAddress: "127.0.0.1"
    protocol: TCP
  # Web server's default nodePort is 30080. If you customize it in helm values by
  # `webserver.service.nodePort`, `containerPort` should match it
  - containerPort: 30080
    hostPort: 30080
    listenAddress: "127.0.0.1"
    protocol: TCP
  # File server's default nodePort is 30081. If you customize it in helm values by
  # `fileserver.service.nodePort`, `containerPort` should match it
  - containerPort: 30081
    hostPort: 30081
    listenAddress: "127.0.0.1"
    protocol: TCP
  extraMounts:
  - hostPath: /tmp/clearml-kind/
    containerPath: /var/local-path-provisioner
EOF

kind create cluster --config /tmp/clearml-kind.yaml

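# Register the chart repository first (skip if the allegroai repository is already configured)
helm repo add allegroai https://allegroai.github.io/clearml-helm-charts
helm repo update

helm install clearml allegroai/clearml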

After deployment, the services will be exposed on localhost on the following ports:

  • API server on 30008
  • Web server on 30080
  • File server on 30081

Data that ClearML persists in Kubernetes volumes is accessible in the /tmp/clearml-kind folder on the host.
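
The comments in the kind configuration note that each containerPort has to match the nodePort of the corresponding service. As a minimal sketch, a values file that moves the three services to other ports could look like the following. The file name and the port numbers 31008, 31080 and 31081 are arbitrary examples, not chart defaults; the keys are taken from the Values table of this chart.

```yaml
# values-kind.yaml (hypothetical file name)
# The port numbers are illustrative. Each one must also appear as a
# containerPort/hostPort pair in the kind cluster configuration.
apiserver:
  service:
    type: NodePort
    nodePort: 31008
webserver:
  service:
    type: NodePort
    nodePort: 31080
fileserver:
  service:
    type: NodePort
    nodePort: 31081
```

A file like this can be passed to helm install with the -f flag.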

Production cluster environment

In a production environment, install an ingress controller and verify that it is working correctly before deploying ClearML. During the ClearML deployment, enable the ingress section of the chart values. This creates three ingress rules:

  • app.<your domain name>
  • files.<your domain name>
  • api.<your domain name>

(for example, app.clearml.mydomainname.com, files.clearml.mydomainname.com and api.clearml.mydomainname.com)

To complete the deployment, point the DNS records for these hosts to the IP address on which the ingress controller responds.
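
As a rough sketch of the ingress section described above, the values below enable the three rules using the example domain from this section. The file name is hypothetical and the host names are placeholders to replace with your own; the keys are taken from the Values table of this chart.

```yaml
# values-ingress.yaml (hypothetical file name)
ingress:
  enabled: true
  app:
    hostName: "app.clearml.mydomainname.com"    # placeholder host
  files:
    hostName: "files.clearml.mydomainname.com"  # placeholder host
  api:
    hostName: "api.clearml.mydomainname.com"    # placeholder host
```

A file like this can be passed to helm install with the -f flag.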

Additional Configuration for ClearML Server

You can also configure the clearml-server for:

  • fixed users (users with credentials)
  • non-responsive experiment watchdog settings

For detailed instructions, see the Optional Configuration section in the clearml-server repository README file.

Source Code

Requirements

| Repository | Name | Version |
| ---------- | ---- | ------- |
| https://charts.bitnami.com/bitnami | mongodb | ~10.3.2 |
| https://charts.bitnami.com/bitnami | redis | ~10.9.0 |
| https://helm.elastic.co | elasticsearch | ~7.10.1 |

Values

| Key | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| agentGroups.agent-group-cpu.affinity | object | `{}` |  |
| agentGroups.agent-group-cpu.agentVersion | string | `""` |  |
| agentGroups.agent-group-cpu.awsAccessKeyId | string | `nil` |  |
| agentGroups.agent-group-cpu.awsDefaultRegion | string | `nil` |  |
| agentGroups.agent-group-cpu.awsSecretAccessKey | string | `nil` |  |
| agentGroups.agent-group-cpu.azureStorageAccount | string | `nil` |  |
| agentGroups.agent-group-cpu.azureStorageKey | string | `nil` |  |
| agentGroups.agent-group-cpu.clearmlAccessKey | string | `nil` |  |
| agentGroups.agent-group-cpu.clearmlConfig | string | `"sdk {\n}"` |  |
| agentGroups.agent-group-cpu.clearmlGitPassword | string | `nil` |  |
| agentGroups.agent-group-cpu.clearmlGitUser | string | `nil` |  |
| agentGroups.agent-group-cpu.clearmlSecretKey | string | `nil` |  |
| agentGroups.agent-group-cpu.enabled | bool | `true` |  |
| agentGroups.agent-group-cpu.image.pullPolicy | string | `"IfNotPresent"` |  |
| agentGroups.agent-group-cpu.image.repository | string | `"ubuntu"` |  |
| agentGroups.agent-group-cpu.image.tag | string | `"18.04"` |  |
| agentGroups.agent-group-cpu.name | string | `"agent-group-cpu"` |  |
| agentGroups.agent-group-cpu.nodeSelector | object | `{}` |  |
| agentGroups.agent-group-cpu.nvidiaGpusPerAgent | int | `0` |  |
| agentGroups.agent-group-cpu.podAnnotations | object | `{}` |  |
| agentGroups.agent-group-cpu.queues | string | `"default"` |  |
| agentGroups.agent-group-cpu.replicaCount | int | `1` |  |
| agentGroups.agent-group-cpu.tolerations | list | `[]` |  |
| agentGroups.agent-group-cpu.updateStrategy | string | `"Recreate"` |  |
| agentGroups.agent-group-gpu.affinity | object | `{}` |  |
| agentGroups.agent-group-gpu.agentVersion | string | `""` |  |
| agentGroups.agent-group-gpu.awsAccessKeyId | string | `nil` |  |
| agentGroups.agent-group-gpu.awsDefaultRegion | string | `nil` |  |
| agentGroups.agent-group-gpu.awsSecretAccessKey | string | `nil` |  |
| agentGroups.agent-group-gpu.azureStorageAccount | string | `nil` |  |
| agentGroups.agent-group-gpu.azureStorageKey | string | `nil` |  |
| agentGroups.agent-group-gpu.clearmlAccessKey | string | `nil` |  |
| agentGroups.agent-group-gpu.clearmlConfig | string | `"sdk {\n}"` |  |
| agentGroups.agent-group-gpu.clearmlGitPassword | string | `nil` |  |
| agentGroups.agent-group-gpu.clearmlGitUser | string | `nil` |  |
| agentGroups.agent-group-gpu.clearmlSecretKey | string | `nil` |  |
| agentGroups.agent-group-gpu.enabled | bool | `true` |  |
| agentGroups.agent-group-gpu.image.pullPolicy | string | `"IfNotPresent"` |  |
| agentGroups.agent-group-gpu.image.repository | string | `"nvidia/cuda"` |  |
| agentGroups.agent-group-gpu.image.tag | string | `"11.0-base-ubuntu18.04"` |  |
| agentGroups.agent-group-gpu.name | string | `"agent-group-gpu"` |  |
| agentGroups.agent-group-gpu.nodeSelector | object | `{}` |  |
| agentGroups.agent-group-gpu.nvidiaGpusPerAgent | int | `1` |  |
| agentGroups.agent-group-gpu.podAnnotations | object | `{}` |  |
| agentGroups.agent-group-gpu.queues | string | `"default"` |  |
| agentGroups.agent-group-gpu.replicaCount | int | `0` |  |
| agentGroups.agent-group-gpu.tolerations | list | `[]` |  |
| agentGroups.agent-group-gpu.updateStrategy | string | `"Recreate"` |  |
| agentservices.affinity | object | `{}` |  |
| agentservices.agentVersion | string | `""` |  |
| agentservices.awsAccessKeyId | string | `nil` |  |
| agentservices.awsDefaultRegion | string | `nil` |  |
| agentservices.awsSecretAccessKey | string | `nil` |  |
| agentservices.azureStorageAccount | string | `nil` |  |
| agentservices.azureStorageKey | string | `nil` |  |
| agentservices.clearmlFilesHost | string | `nil` |  |
| agentservices.clearmlGitPassword | string | `nil` |  |
| agentservices.clearmlGitUser | string | `nil` |  |
| agentservices.clearmlHostIp | string | `nil` |  |
| agentservices.clearmlWebHost | string | `nil` |  |
| agentservices.clearmlWorkerId | string | `"clearml-services"` |  |
| agentservices.enabled | bool | `false` |  |
| agentservices.extraEnvs | list | `[]` |  |
| agentservices.googleCredentials | string | `nil` |  |
| agentservices.image.pullPolicy | string | `"IfNotPresent"` |  |
| agentservices.image.repository | string | `"allegroai/clearml-agent-services"` |  |
| agentservices.image.tag | string | `"latest"` |  |
| agentservices.nodeSelector | object | `{}` |  |
| agentservices.podAnnotations | object | `{}` |  |
| agentservices.replicaCount | int | `1` |  |
| agentservices.resources | object | `{}` |  |
| agentservices.storage.data.class | string | `"standard"` |  |
| agentservices.storage.data.size | string | `"50Gi"` |  |
| agentservices.tolerations | list | `[]` |  |
| apiserver.additionalConfigs | object | `{}` |  |
| apiserver.affinity | object | `{}` |  |
| apiserver.authCoockiesMaxAge | int | `864000` | Number of seconds the authorization cookie lasts in the user's browser |
| apiserver.configDir | string | `"/opt/clearml/config"` |  |
| apiserver.extraEnvs | list | `[]` |  |
| apiserver.image.pullPolicy | string | `"IfNotPresent"` |  |
| apiserver.image.repository | string | `"allegroai/clearml"` |  |
| apiserver.image.tag | string | `"1.1.1"` |  |
| apiserver.livenessDelay | int | `60` |  |
| apiserver.nodeSelector | object | `{}` |  |
| apiserver.podAnnotations | object | `{}` |  |
| apiserver.prepopulateArtifactsPath | string | `"/mnt/fileserver"` |  |
| apiserver.prepopulateEnabled | string | `"true"` |  |
| apiserver.prepopulateZipFiles | string | `"/opt/clearml/db-pre-populate"` |  |
| apiserver.readinessDelay | int | `60` |  |
| apiserver.replicaCount | int | `1` |  |
| apiserver.resources | object | `{}` |  |
| apiserver.service.nodePort | int | `30008` | Value for the service's nodePort field when service.type is NodePort; ignored for any other service type |
| apiserver.service.port | int | `8008` |  |
| apiserver.service.type | string | `"NodePort"` | Sets the service's spec.type field |
| apiserver.tolerations | list | `[]` |  |
| clearml.defaultCompany | string | `"d1bd92a3b039400cbafc60a7a5b1e52b"` |  |
| elasticsearch.clusterHealthCheckParams | string | `"wait_for_status=yellow&timeout=1s"` |  |
| elasticsearch.clusterName | string | `"clearml-elastic"` |  |
| elasticsearch.enabled | bool | `true` |  |
| elasticsearch.esConfig."elasticsearch.yml" | string | `"xpack.security.enabled: false\n"` |  |
| elasticsearch.esJavaOpts | string | `"-Xmx2g -Xms2g"` |  |
| elasticsearch.extraEnvs[0].name | string | `"bootstrap.memory_lock"` |  |
| elasticsearch.extraEnvs[0].value | string | `"false"` |  |
| elasticsearch.extraEnvs[1].name | string | `"cluster.routing.allocation.node_initial_primaries_recoveries"` |  |
| elasticsearch.extraEnvs[1].value | string | `"500"` |  |
| elasticsearch.extraEnvs[2].name | string | `"cluster.routing.allocation.disk.watermark.low"` |  |
| elasticsearch.extraEnvs[2].value | string | `"500mb"` |  |
| elasticsearch.extraEnvs[3].name | string | `"cluster.routing.allocation.disk.watermark.high"` |  |
| elasticsearch.extraEnvs[3].value | string | `"500mb"` |  |
| elasticsearch.extraEnvs[4].name | string | `"cluster.routing.allocation.disk.watermark.flood_stage"` |  |
| elasticsearch.extraEnvs[4].value | string | `"500mb"` |  |
| elasticsearch.extraEnvs[5].name | string | `"http.compression_level"` |  |
| elasticsearch.extraEnvs[5].value | string | `"7"` |  |
| elasticsearch.extraEnvs[6].name | string | `"reindex.remote.whitelist"` |  |
| elasticsearch.extraEnvs[6].value | string | `"*.*"` |  |
| elasticsearch.extraEnvs[7].name | string | `"xpack.monitoring.enabled"` |  |
| elasticsearch.extraEnvs[7].value | string | `"false"` |  |
| elasticsearch.extraEnvs[8].name | string | `"xpack.security.enabled"` |  |
| elasticsearch.extraEnvs[8].value | string | `"false"` |  |
| elasticsearch.httpPort | int | `9200` |  |
| elasticsearch.minimumMasterNodes | int | `1` |  |
| elasticsearch.persistence.enabled | bool | `true` |  |
| elasticsearch.replicas | int | `1` |  |
| elasticsearch.resources.limits.memory | string | `"4Gi"` |  |
| elasticsearch.resources.requests.memory | string | `"4Gi"` |  |
| elasticsearch.roles.data | string | `"true"` |  |
| elasticsearch.roles.ingest | string | `"true"` |  |
| elasticsearch.roles.master | string | `"true"` |  |
| elasticsearch.roles.remote_cluster_client | string | `"true"` |  |
| elasticsearch.volumeClaimTemplate.accessModes[0] | string | `"ReadWriteOnce"` |  |
| elasticsearch.volumeClaimTemplate.resources.requests.storage | string | `"50Gi"` |  |
| externalServices.elasticsearchHost | string | `""` | Existing Elasticsearch hostname to use if elasticsearch.enabled is false |
| externalServices.elasticsearchPort | int | `9200` | Existing Elasticsearch port to use if elasticsearch.enabled is false |
| externalServices.mongodbHost | string | `""` | Existing MongoDB hostname to use if mongodb.enabled is false |
| externalServices.mongodbPort | int | `27017` | Existing MongoDB port to use if mongodb.enabled is false |
| externalServices.redisHost | string | `""` | Existing Redis hostname to use if redis.enabled is false |
| externalServices.redisPort | int | `6379` | Existing Redis port to use if redis.enabled is false |
| fileserver.affinity | object | `{}` |  |
| fileserver.extraEnvs | list | `[]` |  |
| fileserver.image.pullPolicy | string | `"IfNotPresent"` |  |
| fileserver.image.repository | string | `"allegroai/clearml"` |  |
| fileserver.image.tag | string | `"1.1.1"` |  |
| fileserver.nodeSelector | object | `{}` |  |
| fileserver.podAnnotations | object | `{}` |  |
| fileserver.replicaCount | int | `1` |  |
| fileserver.resources | object | `{}` |  |
| fileserver.service.nodePort | int | `30081` | Value for the service's nodePort field when service.type is NodePort; ignored for any other service type |
| fileserver.service.port | int | `8081` |  |
| fileserver.service.type | string | `"NodePort"` | Sets the service's spec.type field |
| fileserver.storage.data.class | string | `"standard"` |  |
| fileserver.storage.data.size | string | `"50Gi"` |  |
| fileserver.tolerations | list | `[]` |  |
| ingress.annotations | object | `{}` |  |
| ingress.api.hostName | string | `"api.clearml.127-0-0-1.nip.io"` |  |
| ingress.api.tlsSecretName | string | `""` |  |
| ingress.app.hostName | string | `"app.clearml.127-0-0-1.nip.io"` |  |
| ingress.app.tlsSecretName | string | `""` |  |
| ingress.enabled | bool | `false` |  |
| ingress.files.hostName | string | `"files.clearml.127-0-0-1.nip.io"` |  |
| ingress.files.tlsSecretName | string | `""` |  |
| ingress.name | string | `"clearml-server-ingress"` |  |
| mongodb.architecture | string | `"standalone"` |  |
| mongodb.auth.enabled | bool | `false` |  |
| mongodb.enabled | bool | `true` |  |
| mongodb.persistence.accessModes[0] | string | `"ReadWriteOnce"` |  |
| mongodb.persistence.enabled | bool | `true` |  |
| mongodb.persistence.size | string | `"50Gi"` |  |
| mongodb.replicaCount | int | `1` |  |
| mongodb.service.name | string | `"{{ .Release.Name }}-mongodb"` |  |
| mongodb.service.port | int | `27017` |  |
| mongodb.service.portName | string | `"mongo-service"` |  |
| mongodb.service.type | string | `"ClusterIP"` |  |
| redis.cluster.enabled | bool | `false` |  |
| redis.databaseNumber | int | `0` |  |
| redis.enabled | bool | `true` |  |
| redis.master.name | string | `"{{ .Release.Name }}-redis-master"` |  |
| redis.master.persistence.accessModes[0] | string | `"ReadWriteOnce"` |  |
| redis.master.persistence.enabled | bool | `true` |  |
| redis.master.persistence.size | string | `"5Gi"` |  |
| redis.master.port | int | `6379` |  |
| redis.usePassword | bool | `false` |  |
| secret.authToken | string | `"1SCf0ov3Nm544Td2oZ0gXSrsNx5XhMWdVlKz1tOgcx158bD5RV"` | Sets the auth_token field |
| secret.credentials.apiserver.accessKey | string | `"5442F3443MJMORWZA3ZH"` | Sets the apiserver_key field |
| secret.credentials.apiserver.secretKey | string | `"BxapIRo9ZINi8x25CRxz8Wdmr2pQjzuWVB4PNASZqCtTyWgWVQ"` | Sets the apiserver_secret field |
| secret.credentials.tests.accessKey | string | `"ENP39EQM4SLACGD5FXB7"` | Sets the tests_user_key field |
| secret.credentials.tests.secretKey | string | `"lPcm0imbcBZ8mwgO7tpadutiS3gnJD05x9j7afwXPS35IKbpiQ"` | Sets the tests_user_secret field |
| secret.httpSession | string | `"9Tw20RbhJ1bLBiHEOWXvhplKGUbTgLzAtwFN2oLQvWwS0uRpD5"` | Sets the http_session field |
| webserver.affinity | object | `{}` |  |
| webserver.extraEnvs | list | `[]` |  |
| webserver.image.pullPolicy | string | `"IfNotPresent"` |  |
| webserver.image.repository | string | `"allegroai/clearml"` |  |
| webserver.image.tag | string | `"1.1.1"` |  |
| webserver.nodeSelector | object | `{}` |  |
| webserver.podAnnotations | object | `{}` |  |
| webserver.replicaCount | int | `1` |  |
| webserver.resources | object | `{}` |  |
| webserver.service.nodePort | int | `30080` | Value for the service's nodePort field when service.type is NodePort; ignored for any other service type |
| webserver.service.port | int | `80` |  |
| webserver.service.type | string | `"NodePort"` | Sets the service's spec.type field |
| webserver.tolerations | list | `[]` |  |
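
The values above can be combined into an override file. The following is a minimal sketch, not a recommended production configuration: it assumes an Elasticsearch instance already exists outside the chart and that the secrets published as defaults in this README should be replaced. The file name, the host name and every angle-bracket value are placeholders introduced for illustration; the keys are taken from the table above.

```yaml
# values-override.yaml (hypothetical file name)

# Use an existing Elasticsearch instead of the bundled dependency.
elasticsearch:
  enabled: false
externalServices:
  elasticsearchHost: "elasticsearch.example.internal"  # placeholder host
  elasticsearchPort: 9200

# Replace the default secrets; every value below is a placeholder.
secret:
  authToken: "<your-auth-token>"
  httpSession: "<your-http-session-secret>"
  credentials:
    apiserver:
      accessKey: "<your-apiserver-access-key>"
      secretKey: "<your-apiserver-secret-key>"
    tests:
      accessKey: "<your-tests-access-key>"
      secretKey: "<your-tests-secret-key>"
```

A file like this can be passed to helm install or helm upgrade with the -f flag.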