# ClearML Ecosystem for Kubernetes

![Version: 3.10.4](https://img.shields.io/badge/Version-3.10.4-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.4.0](https://img.shields.io/badge/AppVersion-1.4.0-informational?style=flat-square)

MLOps platform

**Homepage:**

## Maintainers

| Name | Email | Url |
| ---- | ------ | --- |
| valeriano-manassero |  |  |

## Introduction

The **clearml-server** is the backend service infrastructure for [ClearML](https://github.com/allegroai/clearml).
It allows multiple users to collaborate and manage their experiments.
**clearml-server** contains the following components:

* The ClearML Web-App, a single-page UI for experiment management and browsing
* RESTful API for:
    * Documenting and logging experiment information, statistics and results
    * Querying experiment history, logs and results
* Locally-hosted file server for storing images and models, making them easily accessible from the Web-App

## Local environment

For development or evaluation it is possible to use [kind](https://kind.sigs.k8s.io).
After installing kind, the following commands will create a complete ClearML installation:

```
cat <
```

* `files.`
* `api.`

(*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)

Pointing the domain records to the IP where the ingress controller is responding completes the deployment process.

## Upgrades/ Values upgrades

Updating to the latest version of this chart can be done in two steps:

```
helm repo update
helm upgrade clearml allegroai/clearml
```

Changing values on an existing installation can be done with:

```
helm upgrade clearml allegroai/clearml --version -f custom_values.yaml
```

Please note: when updating values only, always set an explicit chart version to avoid an unintended chart update.
Keeping version upgrades and value changes as separate procedures is good practice, because it separates potential concerns.

## Additional Configuration for ClearML Server

You can also configure the **clearml-server** for:

* fixed users (users with credentials)
* non-responsive experiment watchdog settings

For detailed instructions, see the [Optional Configuration](https://github.com/allegroai/clearml-server#optional-configuration) section in the **clearml-server** repository README file.

## Source Code

*
*

## Requirements

| Repository | Name | Version |
|------------|------|---------|
| file://../../dependency_charts/elasticsearch | elasticsearch | 7.16.2 |
| file://../../dependency_charts/mongodb | mongodb | 10.3.4 |
| file://../../dependency_charts/redis | redis | 10.9.0 |

## Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| agentGroups.agent-group-cpu.affinity | object | `{}` |  |
| agentGroups.agent-group-cpu.agentVersion | string | `""` |  |
| agentGroups.agent-group-cpu.awsAccessKeyId | string | `nil` |  |
| agentGroups.agent-group-cpu.awsDefaultRegion | string | `nil` |  |
| agentGroups.agent-group-cpu.awsSecretAccessKey | string | `nil` |  |
| agentGroups.agent-group-cpu.azureStorageAccount | string | `nil` |  |
| agentGroups.agent-group-cpu.azureStorageKey | string | `nil` |  |
| agentGroups.agent-group-cpu.clearmlAccessKey | string | `nil` |  |
| agentGroups.agent-group-cpu.clearmlConfig | string | `"sdk {\n}"` |  |
| agentGroups.agent-group-cpu.clearmlGitPassword | string | `nil` |  |
| agentGroups.agent-group-cpu.clearmlGitUser | string | `nil` |  |
| agentGroups.agent-group-cpu.clearmlSecretKey | string | `nil` |  |
| agentGroups.agent-group-cpu.enabled | bool | `false` |  |
| agentGroups.agent-group-cpu.extraEnvs | list | `[]` |  |
| agentGroups.agent-group-cpu.image.pullPolicy | string | `"IfNotPresent"` |  |
| agentGroups.agent-group-cpu.image.repository | string | `"ubuntu"` |  |
| agentGroups.agent-group-cpu.image.tag | string | `"18.04"` |  |
| agentGroups.agent-group-cpu.name | string | `"agent-group-cpu"` |  |
| agentGroups.agent-group-cpu.nodeSelector | object | `{}` |  |
| agentGroups.agent-group-cpu.nvidiaGpusPerAgent | int | `0` |  |
| agentGroups.agent-group-cpu.podAnnotations | object | `{}` |  |
| agentGroups.agent-group-cpu.queues | string | `"default"` |  |
| agentGroups.agent-group-cpu.replicaCount | int | `1` |  |
| agentGroups.agent-group-cpu.tolerations | list | `[]` |  |
| agentGroups.agent-group-cpu.updateStrategy | string | `"Recreate"` |  |
| agentGroups.agent-group-gpu.affinity | object | `{}` |  |
| agentGroups.agent-group-gpu.agentVersion | string | `""` |  |
| agentGroups.agent-group-gpu.awsAccessKeyId | string | `nil` |  |
| agentGroups.agent-group-gpu.awsDefaultRegion | string | `nil` |  |
| agentGroups.agent-group-gpu.awsSecretAccessKey | string | `nil` |  |
| agentGroups.agent-group-gpu.azureStorageAccount | string | `nil` |  |
| agentGroups.agent-group-gpu.azureStorageKey | string | `nil` |  |
| agentGroups.agent-group-gpu.clearmlAccessKey | string | `nil` |  |
| agentGroups.agent-group-gpu.clearmlConfig | string | `"sdk {\n}"` |  |
| agentGroups.agent-group-gpu.clearmlGitPassword | string | `nil` |  |
| agentGroups.agent-group-gpu.clearmlGitUser | string | `nil` |  |
| agentGroups.agent-group-gpu.clearmlSecretKey | string | `nil` |  |
| agentGroups.agent-group-gpu.enabled | bool | `false` |  |
| agentGroups.agent-group-gpu.image.pullPolicy | string | `"IfNotPresent"` |  |
| agentGroups.agent-group-gpu.image.repository | string | `"nvidia/cuda"` |  |
| agentGroups.agent-group-gpu.image.tag | string | `"11.0-base-ubuntu18.04"` |  |
| agentGroups.agent-group-gpu.name | string | `"agent-group-gpu"` |  |
| agentGroups.agent-group-gpu.nodeSelector | object | `{}` |  |
| agentGroups.agent-group-gpu.nvidiaGpusPerAgent | int | `1` |  |
| agentGroups.agent-group-gpu.podAnnotations | object | `{}` |  |
| agentGroups.agent-group-gpu.queues | string | `"default"` |  |
| agentGroups.agent-group-gpu.replicaCount | int | `0` |  |
| agentGroups.agent-group-gpu.tolerations | list | `[]` |  |
| agentGroups.agent-group-gpu.updateStrategy | string | `"Recreate"` |  |
| agentk8sglue.defaultDockerImage | string | `"nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu20.04"` |  |
| agentk8sglue.enabled | bool | `true` |  |
| agentk8sglue.id | string | `"k8s-agent"` |  |
| agentk8sglue.image.repository | string | `"allegroai/clearml-agent-k8s"` |  |
| agentk8sglue.image.tag | string | `"base-1.21"` |  |
| agentk8sglue.maxPods | int | `10` |  |
| agentk8sglue.podTemplate.env | list | `[]` |  |
| agentk8sglue.podTemplate.nodeSelector | object | `{}` |  |
| agentk8sglue.podTemplate.resources | object | `{}` |  |
| agentk8sglue.podTemplate.tolerations | list | `[]` |  |
| agentk8sglue.podTemplate.volumes | list | `[]` |  |
| agentk8sglue.queue | string | `"default"` |  |
| agentk8sglue.serviceAccountName | string | `"default"` |  |
| agentservices.affinity | object | `{}` |  |
| agentservices.agentVersion | string | `""` |  |
| agentservices.awsAccessKeyId | string | `nil` |  |
| agentservices.awsDefaultRegion | string | `nil` |  |
| agentservices.awsSecretAccessKey | string | `nil` |  |
| agentservices.azureStorageAccount | string | `nil` |  |
| agentservices.azureStorageKey | string | `nil` |  |
| agentservices.clearmlFilesHost | string | `nil` |  |
| agentservices.clearmlGitPassword | string | `nil` |  |
| agentservices.clearmlGitUser | string | `nil` |  |
| agentservices.clearmlHostIp | string | `nil` |  |
| agentservices.clearmlWebHost | string | `nil` |  |
| agentservices.clearmlWorkerId | string | `"clearml-services"` |  |
| agentservices.enabled | bool | `false` |  |
| agentservices.extraEnvs | list | `[]` |  |
| agentservices.googleCredentials | string | `nil` |  |
| agentservices.image.pullPolicy | string | `"IfNotPresent"` |  |
| agentservices.image.repository | string | `"allegroai/clearml-agent-services"` |  |
| agentservices.image.tag | string | `"latest"` |  |
| agentservices.nodeSelector | object | `{}` |  |
| agentservices.podAnnotations | object | `{}` |  |
| agentservices.replicaCount | int | `1` |  |
| agentservices.resources | object | `{}` |  |
| agentservices.storage.data.class | string | `""` |  |
| agentservices.storage.data.size | string | `"50Gi"` |  |
| agentservices.tolerations | list | `[]` |  |
| apiserver.additionalConfigs | object | `{}` | Additional configurations that can be used by the API server; see the examples in the values.yaml file |
| apiserver.affinity | object | `{}` |  |
| apiserver.authCookiesMaxAge | int | `864000` | Number of seconds the authorization cookie lasts in the user's browser |
| apiserver.configDir | string | `"/opt/clearml/config"` |  |
| apiserver.extraEnvs | list | `[]` |  |
| apiserver.image.pullPolicy | string | `"IfNotPresent"` |  |
| apiserver.image.repository | string | `"allegroai/clearml"` |  |
| apiserver.image.tag | string | `"1.4.0"` |  |
| apiserver.livenessDelay | int | `60` |  |
| apiserver.nodeSelector | object | `{}` |  |
| apiserver.podAnnotations | object | `{}` |  |
| apiserver.prepopulateArtifactsPath | string | `"/mnt/fileserver"` |  |
| apiserver.prepopulateEnabled | string | `"true"` |  |
| apiserver.prepopulateZipFiles | string | `"/opt/clearml/db-pre-populate"` |  |
| apiserver.readinessDelay | int | `60` |  |
| apiserver.replicaCount | int | `1` |  |
| apiserver.resources | object | `{}` |  |
| apiserver.service.nodePort | int | `30008` | If service.type is NodePort, this sets the service's nodePort field; for any other service type it is ignored |
| apiserver.service.port | int | `8008` |  |
| apiserver.service.type | string | `"NodePort"` | Sets the service's spec.type field |
| apiserver.tolerations | list | `[]` |  |
| clearml.defaultCompany | string | `"d1bd92a3b039400cbafc60a7a5b1e52b"` |  |
| elasticsearch.clusterHealthCheckParams | string | `"wait_for_status=yellow&timeout=1s"` |  |
| elasticsearch.clusterName | string | `"clearml-elastic"` |  |
| elasticsearch.enabled | bool | `true` |  |
| elasticsearch.esConfig."elasticsearch.yml" | string | `"xpack.security.enabled: false\n"` |  |
| elasticsearch.esJavaOpts | string | `"-Xmx2g -Xms2g"` |  |
| elasticsearch.extraEnvs[0].name | string | `"bootstrap.memory_lock"` |  |
| elasticsearch.extraEnvs[0].value | string | `"false"` |  |
| elasticsearch.extraEnvs[1].name | string | `"cluster.routing.allocation.node_initial_primaries_recoveries"` |  |
| elasticsearch.extraEnvs[1].value | string | `"500"` |  |
| elasticsearch.extraEnvs[2].name | string | `"cluster.routing.allocation.disk.watermark.low"` |  |
| elasticsearch.extraEnvs[2].value | string | `"500mb"` |  |
| elasticsearch.extraEnvs[3].name | string | `"cluster.routing.allocation.disk.watermark.high"` |  |
| elasticsearch.extraEnvs[3].value | string | `"500mb"` |  |
| elasticsearch.extraEnvs[4].name | string | `"cluster.routing.allocation.disk.watermark.flood_stage"` |  |
| elasticsearch.extraEnvs[4].value | string | `"500mb"` |  |
| elasticsearch.extraEnvs[5].name | string | `"http.compression_level"` |  |
| elasticsearch.extraEnvs[5].value | string | `"7"` |  |
| elasticsearch.extraEnvs[6].name | string | `"reindex.remote.whitelist"` |  |
| elasticsearch.extraEnvs[6].value | string | `"*.*"` |  |
| elasticsearch.extraEnvs[7].name | string | `"xpack.monitoring.enabled"` |  |
| elasticsearch.extraEnvs[7].value | string | `"false"` |  |
| elasticsearch.extraEnvs[8].name | string | `"xpack.security.enabled"` |  |
| elasticsearch.extraEnvs[8].value | string | `"false"` |  |
| elasticsearch.httpPort | int | `9200` |  |
| elasticsearch.minimumMasterNodes | int | `1` |  |
| elasticsearch.persistence.enabled | bool | `true` |  |
| elasticsearch.replicas | int | `1` |  |
| elasticsearch.resources.limits.memory | string | `"4Gi"` |  |
| elasticsearch.resources.requests.memory | string | `"4Gi"` |  |
| elasticsearch.roles.data | string | `"true"` |  |
| elasticsearch.roles.ingest | string | `"true"` |  |
| elasticsearch.roles.master | string | `"true"` |  |
| elasticsearch.roles.remote_cluster_client | string | `"true"` |  |
| elasticsearch.volumeClaimTemplate.accessModes[0] | string | `"ReadWriteOnce"` |  |
| elasticsearch.volumeClaimTemplate.resources.requests.storage | string | `"50Gi"` |  |
| externalServices.elasticsearchHost | string | `""` | Existing Elasticsearch hostname to use if elasticsearch.enabled is false |
| externalServices.elasticsearchPort | int | `9200` | Existing Elasticsearch port to use if elasticsearch.enabled is false |
| externalServices.mongodbHost | string | `""` | Existing MongoDB hostname to use if mongodb.enabled is false |
| externalServices.mongodbPort | int | `27017` | Existing MongoDB port to use if mongodb.enabled is false |
| externalServices.redisHost | string | `""` | Existing Redis hostname to use if redis.enabled is false |
| externalServices.redisPort | int | `6379` | Existing Redis port to use if redis.enabled is false |
| fileserver.affinity | object | `{}` |  |
| fileserver.extraEnvs | list | `[]` |  |
| fileserver.image.pullPolicy | string | `"IfNotPresent"` |  |
| fileserver.image.repository | string | `"allegroai/clearml"` |  |
| fileserver.image.tag | string | `"1.4.0"` |  |
| fileserver.nodeSelector | object | `{}` |  |
| fileserver.podAnnotations | object | `{}` |  |
| fileserver.replicaCount | int | `1` |  |
| fileserver.resources | object | `{}` |  |
| fileserver.service.nodePort | int | `30081` | If service.type is NodePort, this sets the service's nodePort field; for any other service type it is ignored |
| fileserver.service.port | int | `8081` |  |
| fileserver.service.type | string | `"NodePort"` | Sets the service's spec.type field |
| fileserver.storage.data.class | string | `""` |  |
| fileserver.storage.data.size | string | `"50Gi"` |  |
| fileserver.tolerations | list | `[]` |  |
| ingress.annotations | object | `{}` |  |
| ingress.api.annotations | object | `{}` |  |
| ingress.api.enabled | bool | `false` |  |
| ingress.api.hostName | string | `"api.clearml.127-0-0-1.nip.io"` |  |
| ingress.api.path | string | `"/"` |  |
| ingress.api.tlsSecretName | string | `""` |  |
| ingress.app.annotations | object | `{}` |  |
| ingress.app.enabled | bool | `false` |  |
| ingress.app.hostName | string | `"app.clearml.127-0-0-1.nip.io"` |  |
| ingress.app.path | string | `"/"` |  |
| ingress.app.tlsSecretName | string | `""` |  |
| ingress.files.annotations | object | `{}` |  |
| ingress.files.enabled | bool | `false` |  |
| ingress.files.hostName | string | `"files.clearml.127-0-0-1.nip.io"` |  |
| ingress.files.path | string | `"/"` |  |
| ingress.files.tlsSecretName | string | `""` |  |
| ingress.name | string | `"clearml-server-ingress"` |  |
| mongodb.architecture | string | `"standalone"` |  |
| mongodb.auth.enabled | bool | `false` |  |
| mongodb.enabled | bool | `true` |  |
| mongodb.persistence.accessModes[0] | string | `"ReadWriteOnce"` |  |
| mongodb.persistence.enabled | bool | `true` |  |
| mongodb.persistence.size | string | `"50Gi"` |  |
| mongodb.replicaCount | int | `1` |  |
| mongodb.service.name | string | `"{{ .Release.Name }}-mongodb"` |  |
| mongodb.service.port | int | `27017` |  |
| mongodb.service.portName | string | `"mongo-service"` |  |
| mongodb.service.type | string | `"ClusterIP"` |  |
| redis.cluster.enabled | bool | `false` |  |
| redis.databaseNumber | int | `0` |  |
| redis.enabled | bool | `true` |  |
| redis.master.name | string | `"{{ .Release.Name }}-redis-master"` |  |
| redis.master.persistence.accessModes[0] | string | `"ReadWriteOnce"` |  |
| redis.master.persistence.enabled | bool | `true` |  |
| redis.master.persistence.size | string | `"5Gi"` |  |
| redis.master.port | int | `6379` |  |
| redis.usePassword | bool | `false` |  |
| secret.authToken | string | `"1SCf0ov3Nm544Td2oZ0gXSrsNx5XhMWdVlKz1tOgcx158bD5RV"` | Sets the auth_token field |
| secret.credentials.apiserver.accessKey | string | `"5442F3443MJMORWZA3ZH"` | Sets the apiserver_key field |
| secret.credentials.apiserver.secretKey | string | `"BxapIRo9ZINi8x25CRxz8Wdmr2pQjzuWVB4PNASZqCtTyWgWVQ"` | Sets the apiserver_secret field |
| secret.credentials.tests.accessKey | string | `"ENP39EQM4SLACGD5FXB7"` | Sets the tests_user_key field |
| secret.credentials.tests.secretKey | string | `"lPcm0imbcBZ8mwgO7tpadutiS3gnJD05x9j7afwXPS35IKbpiQ"` | Sets the tests_user_secret field |
| secret.httpSession | string | `"9Tw20RbhJ1bLBiHEOWXvhplKGUbTgLzAtwFN2oLQvWwS0uRpD5"` | Sets the http_session field |
| webserver.additionalConfigs | object | `{}` |  |
| webserver.affinity | object | `{}` |  |
| webserver.extraEnvs | list | `[]` |  |
| webserver.image.pullPolicy | string | `"IfNotPresent"` |  |
| webserver.image.repository | string | `"allegroai/clearml"` |  |
| webserver.image.tag | string | `"1.4.0"` |  |
| webserver.nodeSelector | object | `{}` |  |
| webserver.podAnnotations | object | `{}` |  |
| webserver.replicaCount | int | `1` |  |
| webserver.resources | object | `{}` |  |
| webserver.service.nodePort | int | `30080` | If service.type is NodePort, this sets the service's nodePort field; for any other service type it is ignored |
| webserver.service.port | int | `80` |  |
| webserver.service.type | string | `"NodePort"` | Sets the service's spec.type field |
| webserver.tolerations | list | `[]` |  |
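
The dotted keys in the Values table correspond to nested keys in a values file such as the `custom_values.yaml` passed to `helm upgrade` with `-f`. The following is a minimal sketch rather than a recommended configuration. It assumes an ingress controller is already running in the cluster, and it reuses the example hostnames from this README as placeholders for your own domain:

```yaml
# custom_values.yaml (illustrative sketch; the hostnames are placeholders)
ingress:
  app:
    enabled: true                               # ingress.app.enabled
    hostName: "app.clearml.mydomainname.com"    # ingress.app.hostName
  api:
    enabled: true                               # ingress.api.enabled
    hostName: "api.clearml.mydomainname.com"    # ingress.api.hostName
  files:
    enabled: true                               # ingress.files.enabled
    hostName: "files.clearml.mydomainname.com"  # ingress.files.hostName
```

Keys left out of the file keep the defaults listed in the table, so a values file only needs the settings that differ.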