# ClearML Ecosystem for Kubernetes ![Version: 7.3.2](https://img.shields.io/badge/Version-7.3.2-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.12.0](https://img.shields.io/badge/AppVersion-1.12.0-informational?style=flat-square) MLOps platform **Homepage:** ## Maintainers | Name | Email | Url | | ---- | ------ | --- | | valeriano-manassero | | | ## Introduction The **clearml-server** is the backend service infrastructure for [ClearML](https://github.com/allegroai/clearml). It allows multiple users to collaborate and manage their experiments. **clearml-server** contains the following components: * The ClearML Web-App, a single-page UI for experiment management and browsing * RESTful API for: * Documenting and logging experiment information, statistics and results * Querying experiments history, logs and results * Locally-hosted file server for storing images and models making them easily accessible using the Web-App ## Local environment For development/evaluation it's possible to use [kind](https://kind.sigs.k8s.io). After installation, following commands will create a complete ClearML insatllation: ``` cat <` * `files.` * `api.` (*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*) Just pointing the domain records to the IP where ingress controller is responding will complete the deployment process. A production ready cluster should also have some different configuration like the one proposed in `values-production.yaml` that can be applied with: ``` helm install clearml allegroai/clearml -f values-production.yaml ``` ## Upgrades/ Values upgrades Updating to latest version of this chart can be done in two steps: ``` helm repo update helm upgrade clearml allegroai/clearml ``` Changing values on existing installation can be done with: ``` helm upgrade clearml allegroai/clearml --version -f custom_values.yaml ``` Please note: updating values only should always be done setting explicit chart version to avoid a possible chart update. Keeping separate updates procedures between version and values can be a good practice to seprate potential concerns. ### Major upgrade from 5.* to 6.* Before issuing helm upgrade: * delete Redis statefulset(s) * scale MongoDB deployment(s) replicas to 0 * if using securityContexts check for new value form in values.yaml (podSecurityContext and containerSecurityContext) ## Additional Configuration for ClearML Server You can also configure the **clearml-server** for: * fixed users (users with credentials) * non-responsive experiment watchdog settings For detailed instructions, see the [Optional Configuration](https://github.com/allegroai/clearml-server#optional-configuration) section in the **clearml-server** repository README file. ## Source Code * * ## Requirements Kubernetes: `>= 1.21.0-0 < 1.29.0-0` | Repository | Name | Version | |------------|------|---------| | https://charts.bitnami.com/bitnami | mongodb | 12.1.31 | | https://charts.bitnami.com/bitnami | redis | 17.8.3 | | https://helm.elastic.co | elasticsearch | 7.17.3 | ## Values | Key | Type | Default | Description | |-----|------|---------|-------------| | apiserver | object | `{"additionalConfigs":{},"affinity":{},"containerSecurityContext":{},"enabled":true,"existingAdditionalConfigsConfigMap":"","existingAdditionalConfigsSecret":"","extraEnvs":[],"image":{"pullPolicy":"IfNotPresent","registry":"","repository":"allegroai/clearml","tag":"1.12.0-393"},"ingress":{"annotations":{},"enabled":false,"hostName":"api.clearml.127-0-0-1.nip.io","ingressClassName":"","path":"/","tlsSecretName":""},"initContainers":{"resources":{"limits":{"cpu":"10m","memory":"64Mi"},"requests":{"cpu":"10m","memory":"64Mi"}}},"nodeSelector":{},"podAnnotations":{},"podSecurityContext":{},"prepopulateEnabled":true,"processes":{"count":8,"maxRequests":1000,"maxRequestsJitter":300,"timeout":24000},"replicaCount":1,"resources":{"limits":{"cpu":"2000m","memory":"1Gi"},"requests":{"cpu":"100m","memory":"256Mi"}},"service":{"annotations":{},"nodePort":30008,"port":8008,"type":"NodePort"},"tolerations":[]}` | Api Server configurations | | apiserver.additionalConfigs | object | `{}` | files declared in this parameter will be mounted and read by apiserver (examples in values.yaml) if not overridden by existingAdditionalConfigsSecret | | apiserver.affinity | object | `{}` | Api Server affinity setup | | apiserver.containerSecurityContext | object | `{}` | Api Server containers security context | | apiserver.enabled | bool | `true` | Enable/Disable component deployment | | apiserver.existingAdditionalConfigsConfigMap | string | `""` | reference for files declared in existing ConfigMap will be mounted and read by apiserver (examples in values.yaml) | | apiserver.existingAdditionalConfigsSecret | string | `""` | reference for files declared in existing Secret will be mounted and read by apiserver (examples in values.yaml) if not overridden by existingAdditionalConfigsConfigMap | | apiserver.extraEnvs | list | `[]` | Api Server extra envrinoment variables | | apiserver.image | object | `{"pullPolicy":"IfNotPresent","registry":"","repository":"allegroai/clearml","tag":"1.12.0-393"}` | Api Server image configuration | | apiserver.ingress | object | `{"annotations":{},"enabled":false,"hostName":"api.clearml.127-0-0-1.nip.io","ingressClassName":"","path":"/","tlsSecretName":""}` | Ingress configuration for Api Server component | | apiserver.ingress.annotations | object | `{}` | Ingress annotations | | apiserver.ingress.enabled | bool | `false` | Enable/Disable ingress | | apiserver.ingress.hostName | string | `"api.clearml.127-0-0-1.nip.io"` | Ingress hostname domain | | apiserver.ingress.ingressClassName | string | `""` | ClassName (must be defined if no default ingressClassName is available) | | apiserver.ingress.path | string | `"/"` | Ingress root path url | | apiserver.ingress.tlsSecretName | string | `""` | Reference to secret containing TLS certificate. If set, it enables HTTPS on ingress rule. | | apiserver.initContainers | object | `{"resources":{"limits":{"cpu":"10m","memory":"64Mi"},"requests":{"cpu":"10m","memory":"64Mi"}}}` | Api Server resources per initContainers pod | | apiserver.nodeSelector | object | `{}` | Api Server nodeselector | | apiserver.podAnnotations | object | `{}` | specific annotation for Api Server pods | | apiserver.podSecurityContext | object | `{}` | Api Server pod security context | | apiserver.prepopulateEnabled | bool | `true` | Enable/Disable example data load | | apiserver.processes | object | `{"count":8,"maxRequests":1000,"maxRequestsJitter":300,"timeout":24000}` | Api Server internal processes configuration | | apiserver.processes.count | int | `8` | Api Server internal listing processes | | apiserver.processes.maxRequests | int | `1000` | Api Server maximum number of concurrent requests | | apiserver.processes.maxRequestsJitter | int | `300` | Api Server max jitter on api request | | apiserver.processes.timeout | int | `24000` | Api timeout (ms) | | apiserver.replicaCount | int | `1` | Api Server number of pods | | apiserver.resources | object | `{"limits":{"cpu":"2000m","memory":"1Gi"},"requests":{"cpu":"100m","memory":"256Mi"}}` | Api Server resources per pod; these are minimal requirements, it's suggested to increase these values in production environments | | apiserver.service | object | `{"annotations":{},"nodePort":30008,"port":8008,"type":"NodePort"}` | Api Server internal service configuration | | apiserver.service.annotations | object | `{}` | specific annotation for Api Server service | | apiserver.service.nodePort | int | `30008` | If service.type set to NodePort, this will be set to service's nodePort field. If service.type is set to others, this field will be ignored | | apiserver.tolerations | list | `[]` | Api Server tolerations setup | | clearml | object | `{"apiserverKey":"GGS9F4M6XB2DXJ5AFT9F","apiserverSecret":"2oGujVFhPfaozhpuz2GzQfA5OyxmMsR3WVJpsCR5hrgHFs20PO","clientConfigurationApiUrl":"","clientConfigurationFilesUrl":"","cookieDomain":"","cookieName":"clearml-token-k8s","defaultCompany":"d1bd92a3b039400cbafc60a7a5b1e52b","existingSecret":"","fileserverKey":"XXCRJ123CEE2KSQ068WO","fileserverSecret":"YIy8EVAC7QCT4FtgitxAQGyW7xRHDZ4jpYlTE7HKiscpORl1hG","readinessprobeKey":"GK4PRTVT3706T25K6BA1","readinessprobeSecret":"ymLh1ok5k5xNUQfS944Xdx9xjf0wueokqKM2dMZfHuH9ayItG2","secureAuthTokenSecret":"ymLh1ok5k5xNUQfS944Xdx9xjf0wueokqKM2dMZfHuH9ayItG2","testUserKey":"ENP39EQM4SLACGD5FXB7","testUserSecret":"lPcm0imbcBZ8mwgO7tpadutiS3gnJD05x9j7afwXPS35IKbpiQ"}` | ClearMl generic configurations | | clearml.apiserverKey | string | `"GGS9F4M6XB2DXJ5AFT9F"` | Api Server basic auth key | | clearml.apiserverSecret | string | `"2oGujVFhPfaozhpuz2GzQfA5OyxmMsR3WVJpsCR5hrgHFs20PO"` | Api Server basic auth secret | | clearml.clientConfigurationApiUrl | string | `""` | Override the API Urls displayed when showing an example of the SDK's clearml.conf configuration | | clearml.clientConfigurationFilesUrl | string | `""` | Override the Files Urls displayed when showing an example of the SDK's clearml.conf configuration | | clearml.cookieDomain | string | `""` | Cookie domain to be left empty if not exposed with an ingress | | clearml.cookieName | string | `"clearml-token-k8s"` | Name fo the UI cookie | | clearml.defaultCompany | string | `"d1bd92a3b039400cbafc60a7a5b1e52b"` | Company name | | clearml.existingSecret | string | `""` | Pass Clearml secrets using an existing secret must contain the keys: apiserver_key, apiserver_secret, secure_auth_token_secret, test_user_key, test_user_secret | | clearml.fileserverKey | string | `"XXCRJ123CEE2KSQ068WO"` | File Server basic auth key | | clearml.fileserverSecret | string | `"YIy8EVAC7QCT4FtgitxAQGyW7xRHDZ4jpYlTE7HKiscpORl1hG"` | File Server basic auth secret | | clearml.readinessprobeKey | string | `"GK4PRTVT3706T25K6BA1"` | Readiness probe basic auth key | | clearml.readinessprobeSecret | string | `"ymLh1ok5k5xNUQfS944Xdx9xjf0wueokqKM2dMZfHuH9ayItG2"` | Readiness probe basic auth secret | | clearml.secureAuthTokenSecret | string | `"ymLh1ok5k5xNUQfS944Xdx9xjf0wueokqKM2dMZfHuH9ayItG2"` | Secure Auth secret | | clearml.testUserKey | string | `"ENP39EQM4SLACGD5FXB7"` | Test Server basic auth key | | clearml.testUserSecret | string | `"lPcm0imbcBZ8mwgO7tpadutiS3gnJD05x9j7afwXPS35IKbpiQ"` | Test File Server basic auth secret | | elasticsearch | object | `{"clusterHealthCheckParams":"wait_for_status=yellow&timeout=1s","clusterName":"clearml-elastic","enabled":true,"esConfig":{"elasticsearch.yml":"xpack.security.enabled: false\n"},"esJavaOpts":"-Xmx2g -Xms2g","extraEnvs":[{"name":"bootstrap.memory_lock","value":"false"},{"name":"cluster.routing.allocation.node_initial_primaries_recoveries","value":"500"},{"name":"cluster.routing.allocation.disk.watermark.low","value":"500mb"},{"name":"cluster.routing.allocation.disk.watermark.high","value":"500mb"},{"name":"cluster.routing.allocation.disk.watermark.flood_stage","value":"500mb"},{"name":"http.compression_level","value":"7"},{"name":"reindex.remote.whitelist","value":"*.*"},{"name":"xpack.monitoring.enabled","value":"false"},{"name":"xpack.security.enabled","value":"false"}],"httpPort":9200,"minimumMasterNodes":1,"persistence":{"enabled":true},"rbac":{"create":true},"replicas":1,"resources":{"limits":{"cpu":"2000m","memory":"4Gi"},"requests":{"cpu":"100m","memory":"2Gi"}},"roles":{"data":"true","ingest":"true","master":"true","remote_cluster_client":"true"},"volumeClaimTemplate":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"50Gi"}},"storageClassName":null}}` | Configuration from https://github.com/elastic/helm-charts/blob/7.16/elasticsearch/values.yaml | | externalServices | object | `{"elasticsearchConnectionString":"[{\"host\":\"es_hostname1\",\"port\":9200},{\"host\":\"es_hostname2\",\"port\":9200},{\"host\":\"es_hostname3\",\"port\":9200}]","mongodbConnectionStringAuth":"mongodb://mongodb_hostname:27017/auth","mongodbConnectionStringBackend":"mongodb://mongodb_hostnamehostname:27017/backend","redisHost":"redis_hostname","redisPort":6379}` | Definition of external services to use if not enabled as dependency charts here | | externalServices.elasticsearchConnectionString | string | `"[{\"host\":\"es_hostname1\",\"port\":9200},{\"host\":\"es_hostname2\",\"port\":9200},{\"host\":\"es_hostname3\",\"port\":9200}]"` | Existing ElasticSearch connectionstring if elasticsearch.enabled is false (example in values.yaml) | | externalServices.mongodbConnectionStringAuth | string | `"mongodb://mongodb_hostname:27017/auth"` | Existing MongoDB connection string for BACKEND to use if mongodb.enabled is false (example in values.yaml) | | externalServices.mongodbConnectionStringBackend | string | `"mongodb://mongodb_hostnamehostname:27017/backend"` | Existing MongoDB connection string for AUTH to use if mongodb.enabled is false (example in values.yaml) | | externalServices.redisHost | string | `"redis_hostname"` | Existing Redis Hostname to use if redis.enabled is false (example in values.yaml) | | externalServices.redisPort | int | `6379` | Existing Redis Port to use if redis.enabled is false | | fileserver | object | `{"affinity":{},"containerSecurityContext":{},"enabled":true,"extraEnvs":[],"image":{"pullPolicy":"IfNotPresent","registry":"","repository":"allegroai/clearml","tag":"1.12.0-393"},"ingress":{"annotations":{},"enabled":false,"hostName":"files.clearml.127-0-0-1.nip.io","ingressClassName":"","path":"/","tlsSecretName":""},"initContainers":{"resources":{"limits":{"cpu":"10m","memory":"64Mi"},"requests":{"cpu":"10m","memory":"64Mi"}}},"nodeSelector":{},"podAnnotations":{},"podSecurityContext":{},"replicaCount":1,"resources":{"limits":{"cpu":"2000m","memory":"1Gi"},"requests":{"cpu":"100m","memory":"256Mi"}},"service":{"annotations":{},"nodePort":30081,"port":8081,"type":"NodePort"},"storage":{"data":{"accessMode":"ReadWriteOnce","class":"","existingPVC":"","size":"50Gi"},"enabled":true},"tolerations":[]}` | File Server configurations | | fileserver.affinity | object | `{}` | File Server affinity setup | | fileserver.containerSecurityContext | object | `{}` | File Server containers security context | | fileserver.enabled | bool | `true` | Enable/Disable component deployment | | fileserver.extraEnvs | list | `[]` | File Server extra envrinoment variables | | fileserver.image | object | `{"pullPolicy":"IfNotPresent","registry":"","repository":"allegroai/clearml","tag":"1.12.0-393"}` | File Server image configuration | | fileserver.ingress | object | `{"annotations":{},"enabled":false,"hostName":"files.clearml.127-0-0-1.nip.io","ingressClassName":"","path":"/","tlsSecretName":""}` | Ingress configuration for File Server component | | fileserver.ingress.annotations | object | `{}` | Ingress annotations | | fileserver.ingress.enabled | bool | `false` | Enable/Disable ingress | | fileserver.ingress.hostName | string | `"files.clearml.127-0-0-1.nip.io"` | Ingress hostname domain | | fileserver.ingress.ingressClassName | string | `""` | ClassName (must be defined if no default ingressClassName is available) | | fileserver.ingress.path | string | `"/"` | Ingress root path url | | fileserver.ingress.tlsSecretName | string | `""` | Reference to secret containing TLS certificate. If set, it enables HTTPS on ingress rule. | | fileserver.initContainers | object | `{"resources":{"limits":{"cpu":"10m","memory":"64Mi"},"requests":{"cpu":"10m","memory":"64Mi"}}}` | File Server resources per initContainers pod | | fileserver.nodeSelector | object | `{}` | File Server nodeselector | | fileserver.podAnnotations | object | `{}` | specific annotation for File Server pods | | fileserver.podSecurityContext | object | `{}` | File Server pod security context | | fileserver.replicaCount | int | `1` | File Server number of pods | | fileserver.resources | object | `{"limits":{"cpu":"2000m","memory":"1Gi"},"requests":{"cpu":"100m","memory":"256Mi"}}` | File Server resources per pod; these are minimal requirements, it's suggested to increase these values in production environments | | fileserver.service | object | `{"annotations":{},"nodePort":30081,"port":8081,"type":"NodePort"}` | File Server internal service configuration | | fileserver.service.annotations | object | `{}` | specific annotation for File Server service | | fileserver.service.nodePort | int | `30081` | If service.type set to NodePort, this will be set to service's nodePort field. If service.type is set to others, this field will be ignored | | fileserver.storage | object | `{"data":{"accessMode":"ReadWriteOnce","class":"","existingPVC":"","size":"50Gi"},"enabled":true}` | File server persistence settings | | fileserver.storage.data.accessMode | string | `"ReadWriteOnce"` | Access mode (must be ReadWriteMany if fileserver replica > 1) | | fileserver.storage.data.class | string | `""` | Storage class (use default if empty) | | fileserver.storage.data.existingPVC | string | `""` | If set, it uses an already existing PVC instead of dynamic provisioning | | fileserver.storage.enabled | bool | `true` | If set to false no PVC is created and emptyDir is used | | fileserver.tolerations | list | `[]` | File Server tolerations setup | | global | object | `{"imageRegistry":"docker.io"}` | Global parameters section | | global.imageRegistry | string | `"docker.io"` | Images registry | | imageCredentials | object | `{"email":"someone@host.com","enabled":false,"existingSecret":"","password":"pwd","registry":"docker.io","username":"someone"}` | Container registry configuration | | imageCredentials.email | string | `"someone@host.com"` | Email | | imageCredentials.enabled | bool | `false` | Use private authentication mode | | imageCredentials.existingSecret | string | `""` | If this is set, chart will not generate a secret but will use what is defined here | | imageCredentials.password | string | `"pwd"` | Registry password | | imageCredentials.registry | string | `"docker.io"` | Registry name | | imageCredentials.username | string | `"someone"` | Registry username | | mongodb | object | `{"architecture":"standalone","auth":{"enabled":false},"enabled":true,"persistence":{"accessModes":["ReadWriteOnce"],"enabled":true,"size":"50Gi","storageClass":null},"replicaCount":1}` | Configuration from https://github.com/bitnami/charts/blob/master/bitnami/mongodb/values.yaml | | redis | object | `{"architecture":"standalone","auth":{"enabled":false},"databaseNumber":0,"enabled":true,"master":{"name":"{{ .Release.Name }}-redis-master","persistence":{"accessModes":["ReadWriteOnce"],"enabled":true,"size":"5Gi","storageClass":null},"port":6379}}` | Configuration from https://github.com/bitnami/charts/blob/master/bitnami/redis/values.yaml | | webserver | object | `{"additionalConfigs":{},"affinity":{},"containerSecurityContext":{},"enabled":true,"extraEnvs":[],"image":{"pullPolicy":"IfNotPresent","registry":"","repository":"allegroai/clearml","tag":"1.12.0-393"},"ingress":{"annotations":{},"enabled":false,"hostName":"app.clearml.127-0-0-1.nip.io","ingressClassName":"","path":"/","tlsSecretName":""},"initContainers":{"resources":{"limits":{"cpu":"10m","memory":"64Mi"},"requests":{"cpu":"10m","memory":"64Mi"}}},"nodeSelector":{},"podAnnotations":{},"podSecurityContext":{},"replicaCount":1,"resources":{"limits":{"cpu":"2000m","memory":"1Gi"},"requests":{"cpu":"100m","memory":"256Mi"}},"service":{"annotations":{},"nodePort":30080,"port":8080,"type":"NodePort"},"tolerations":[]}` | Web Server configurations | | webserver.additionalConfigs | object | `{}` | Additional specific webserver configurations | | webserver.affinity | object | `{}` | Web Server affinity setup | | webserver.containerSecurityContext | object | `{}` | Web Server containers security context | | webserver.enabled | bool | `true` | Enable/Disable component deployment | | webserver.extraEnvs | list | `[]` | Web Server extra envrinoment variables | | webserver.image | object | `{"pullPolicy":"IfNotPresent","registry":"","repository":"allegroai/clearml","tag":"1.12.0-393"}` | Web Server image configuration | | webserver.ingress | object | `{"annotations":{},"enabled":false,"hostName":"app.clearml.127-0-0-1.nip.io","ingressClassName":"","path":"/","tlsSecretName":""}` | Ingress configuration for Web Server component | | webserver.ingress.annotations | object | `{}` | Ingress annotations | | webserver.ingress.enabled | bool | `false` | Enable/Disable ingress | | webserver.ingress.hostName | string | `"app.clearml.127-0-0-1.nip.io"` | Ingress hostname domain | | webserver.ingress.ingressClassName | string | `""` | ClassName (must be defined if no default ingressClassName is available) | | webserver.ingress.path | string | `"/"` | Ingress root path url | | webserver.ingress.tlsSecretName | string | `""` | Reference to secret containing TLS certificate. If set, it enables HTTPS on ingress rule. | | webserver.initContainers | object | `{"resources":{"limits":{"cpu":"10m","memory":"64Mi"},"requests":{"cpu":"10m","memory":"64Mi"}}}` | Web Server resources per initContainers pod | | webserver.nodeSelector | object | `{}` | Web Server nodeselector | | webserver.podAnnotations | object | `{}` | specific annotation for Web Server pods | | webserver.podSecurityContext | object | `{}` | Web Server pod security context | | webserver.replicaCount | int | `1` | Web Server number of pods | | webserver.resources | object | `{"limits":{"cpu":"2000m","memory":"1Gi"},"requests":{"cpu":"100m","memory":"256Mi"}}` | Web Server resources per pod; these are minimal requirements, it's suggested to increase these values in production environments | | webserver.service | object | `{"annotations":{},"nodePort":30080,"port":8080,"type":"NodePort"}` | Web Server internal service configuration | | webserver.service.annotations | object | `{}` | specific annotation for Web Server service | | webserver.service.nodePort | int | `30080` | If service.type set to NodePort, this will be set to service's nodePort field. If service.type is set to others, this field will be ignored | | webserver.tolerations | list | `[]` | Web Server tolerations setup |