# ClearML Ecosystem for Kubernetes

![Version: 4.2.1](https://img.shields.io/badge/Version-4.2.1-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.6.0](https://img.shields.io/badge/AppVersion-1.6.0-informational?style=flat-square)

MLOps platform

**Homepage:** <https://clear.ml>

## Maintainers
| Name | Email | Url |
| ---- | ------ | --- |
| valeriano-manassero |  | <https://github.com/valeriano-manassero> |

## Introduction
The **clearml-server** is the backend service infrastructure for [ClearML](https://github.com/allegroai/clearml).
It allows multiple users to collaborate and manage their experiments.
**clearml-server** contains the following components:
* The ClearML Web-App, a single-page UI for experiment management and browsing
* RESTful API for:
    * Documenting and logging experiment information, statistics and results
    * Querying experiments history, logs and results
* Locally-hosted file server for storing images and models, making them easily accessible using the Web-App
## Local environment
For development/evaluation it's possible to use [kind](https://kind.sigs.k8s.io).
After installing kind, the following commands will create a complete ClearML installation:
```
cat <<'EOF' | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  # API server's default nodePort is 30008. If you customize it in helm values by
  # `apiserver.service.nodePort`, `containerPort` should match it
  - containerPort: 30008
    hostPort: 30008
    listenAddress: "127.0.0.1"
    protocol: TCP
  # Web server's default nodePort is 30080. If you customize it in helm values by
  # `webserver.service.nodePort`, `containerPort` should match it
  - containerPort: 30080
    hostPort: 30080
    listenAddress: "127.0.0.1"
    protocol: TCP
  # File server's default nodePort is 30081. If you customize it in helm values by
  # `fileserver.service.nodePort`, `containerPort` should match it
  - containerPort: 30081
    hostPort: 30081
    listenAddress: "127.0.0.1"
    protocol: TCP
  extraMounts:
  - hostPath: /tmp/clearml-kind/
    containerPath: /var/local-path-provisioner
EOF

helm install clearml allegroai/clearml
```
After deployment, the services will be exposed on localhost on the following ports:
* API server on `30008`
* Web server on `30080`
* File server on `30081`

Data persisted by ClearML in any Kubernetes volume will be accessible in the `/tmp/clearml-kind` folder on the host.
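
As the comments in the kind configuration note, custom node ports have to match the `containerPort` entries of the cluster. A minimal sketch of a values file that changes them is shown below; the file name and the port numbers `31008`, `31080` and `31081` are illustrative assumptions, and the kind `extraPortMappings` would need the same numbers.

```yaml
# custom_ports.yaml (hypothetical file name; port numbers are illustrative)
apiserver:
  service:
    type: NodePort
    nodePort: 31008   # kind containerPort for the API server must also be 31008
webserver:
  service:
    type: NodePort
    nodePort: 31080   # kind containerPort for the web server must also be 31080
fileserver:
  service:
    type: NodePort
    nodePort: 31081   # kind containerPort for the file server must also be 31081
```

Such a file could then be passed at install time with `helm install clearml allegroai/clearml -f custom_ports.yaml`.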
## Production cluster environment
In a production environment it's suggested to install an ingress controller and verify that it is working correctly.
During ClearML deployment, enable the `ingress` section of the chart values.
This will create 3 ingress rules:
* `app.<your domain name>`
* `files.<your domain name>`
* `api.<your domain name>`

(*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)
Pointing the DNS records for these hosts to the IP address where the ingress controller is responding completes the deployment process.
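
The ingress settings described above might look like the following values fragment. This is only a sketch built from the `ingress.*` keys listed in the Values table; the host names reuse the example domain from this section, and the TLS secret name is a hypothetical placeholder for a secret that would have to exist in the cluster already.

```yaml
# Illustrative values fragment; host names and secret name are assumptions
ingress:
  app:
    enabled: true
    hostName: "app.clearml.mydomainname.com"
    tlsSecretName: ""                  # leave empty for plain HTTP
  api:
    enabled: true
    hostName: "api.clearml.mydomainname.com"
    tlsSecretName: ""
  files:
    enabled: true
    hostName: "files.clearml.mydomainname.com"
    tlsSecretName: "clearml-files-tls" # hypothetical pre-existing TLS secret
```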
## Upgrades / Values upgrades
Updating to the latest version of this chart can be done in two steps:
```
helm repo update
helm upgrade clearml allegroai/clearml
```
Changing values on an existing installation can be done with:
```
helm upgrade clearml allegroai/clearml --version <CURRENT CHART VERSION> -f custom_values.yaml
```
Please note: when updating values only, always set an explicit chart version to avoid an unintended chart upgrade.
Keeping version upgrades and values changes as separate procedures is good practice, because it keeps potential problems from the two kinds of change apart.
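
For illustration, a `custom_values.yaml` used for a values-only change could be as small as the fragment below. The keys come from the Values table; the specific numbers are assumptions chosen for the example, not recommendations.

```yaml
# custom_values.yaml (illustrative values only)
apiserver:
  authCookiesMaxAge: 86400   # one day instead of the default 864000 seconds
  resources:
    requests:
      cpu: 500m              # assumed figure, tune for your cluster
      memory: 1Gi            # assumed figure, tune for your cluster
```

Only the keys present in the file override the chart defaults; everything else keeps the value shipped with the pinned chart version.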
## Additional Configuration for ClearML Server
You can also configure the **clearml-server** for:
* fixed users (users with credentials)
* non-responsive experiment watchdog settings

For detailed instructions, see the [Optional Configuration](https://github.com/allegroai/clearml-server#optional-configuration) section in the **clearml-server** repository README file.
## Source Code
* <https://github.com/allegroai/clearml-helm-charts>
* <https://github.com/allegroai/clearml>
## Requirements
Kubernetes: `>= 1.21.0-0 < 1.26.0-0`

| Repository | Name | Version |
|------------|------|---------|
| file://../../dependency_charts/elasticsearch | elasticsearch | 7.16.2 |
| file://../../dependency_charts/mongodb | mongodb | 10.3.4 |
| file://../../dependency_charts/redis | redis | 10.9.0 |
## Values
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| apiserver.affinity | object | `{}` | |
| apiserver.authCookiesMaxAge | int | `864000` | Number of seconds the authorization cookie lasts in the user's browser |
| apiserver.configDir | string | `"/opt/clearml/config"` |  |
| apiserver.configuration | object | `{"additionalConfigs":{},"configRefName":"","secretRefName":""}` | Additional configurations that can be used by the API server; see the examples in the values.yaml file |
| apiserver.extraEnvs | list | `[]` | |
| apiserver.image.pullPolicy | string | `"IfNotPresent"` | |
| apiserver.image.repository | string | `"allegroai/clearml"` | |
| apiserver.image.tag | string | `"1.6.0"` | |
| apiserver.livenessDelay | int | `60` | |
| apiserver.nodeSelector | object | `{}` | |
| apiserver.podAnnotations | object | `{}` | |
| apiserver.prepopulateArtifactsPath | string | `"/mnt/fileserver"` | |
| apiserver.prepopulateEnabled | string | `"true"` | |
| apiserver.prepopulateZipFiles | string | `"/opt/clearml/db-pre-populate"` | |
| apiserver.readinessDelay | int | `60` | |
| apiserver.replicaCount | int | `1` | |
| apiserver.resources | object | `{}` | |
| apiserver.service.nodePort | int | `30008` | If service.type is NodePort, this value is used for the service's nodePort field; for any other service.type it is ignored |
| apiserver.service.port | int | `8008` |  |
| apiserver.service.type | string | `"NodePort"` | Sets the service's spec.type field |
| apiserver.tolerations | list | `[]` | |
| clearml | object | `{"defaultCompany":"d1bd92a3b039400cbafc60a7a5b1e52b"}` | ClearML generic configuration |
| elasticsearch.clusterHealthCheckParams | string | `"wait_for_status=yellow&timeout=1s"` | |
| elasticsearch.clusterName | string | `"clearml-elastic"` | |
| elasticsearch.enabled | bool | `true` | |
| elasticsearch.esConfig."elasticsearch.yml" | string | `"xpack.security.enabled: false\n"` | |
| elasticsearch.esJavaOpts | string | `"-Xmx2g -Xms2g"` | |
| elasticsearch.extraEnvs[0].name | string | `"bootstrap.memory_lock"` | |
| elasticsearch.extraEnvs[0].value | string | `"false"` | |
| elasticsearch.extraEnvs[1].name | string | `"cluster.routing.allocation.node_initial_primaries_recoveries"` | |
| elasticsearch.extraEnvs[1].value | string | `"500"` | |
| elasticsearch.extraEnvs[2].name | string | `"cluster.routing.allocation.disk.watermark.low"` | |
| elasticsearch.extraEnvs[2].value | string | `"500mb"` | |
| elasticsearch.extraEnvs[3].name | string | `"cluster.routing.allocation.disk.watermark.high"` | |
| elasticsearch.extraEnvs[3].value | string | `"500mb"` | |
| elasticsearch.extraEnvs[4].name | string | `"cluster.routing.allocation.disk.watermark.flood_stage"` | |
| elasticsearch.extraEnvs[4].value | string | `"500mb"` | |
| elasticsearch.extraEnvs[5].name | string | `"http.compression_level"` | |
| elasticsearch.extraEnvs[5].value | string | `"7"` | |
| elasticsearch.extraEnvs[6].name | string | `"reindex.remote.whitelist"` | |
| elasticsearch.extraEnvs[6].value | string | `"*.*"` | |
| elasticsearch.extraEnvs[7].name | string | `"xpack.monitoring.enabled"` | |
| elasticsearch.extraEnvs[7].value | string | `"false"` | |
| elasticsearch.extraEnvs[8].name | string | `"xpack.security.enabled"` | |
| elasticsearch.extraEnvs[8].value | string | `"false"` | |
| elasticsearch.httpPort | int | `9200` | |
| elasticsearch.minimumMasterNodes | int | `1` | |
| elasticsearch.persistence.enabled | bool | `true` | |
| elasticsearch.replicas | int | `1` | |
| elasticsearch.resources.limits.memory | string | `"4Gi"` | |
| elasticsearch.resources.requests.memory | string | `"4Gi"` | |
| elasticsearch.roles.data | string | `"true"` | |
| elasticsearch.roles.ingest | string | `"true"` | |
| elasticsearch.roles.master | string | `"true"` | |
| elasticsearch.roles.remote_cluster_client | string | `"true"` | |
| elasticsearch.volumeClaimTemplate.accessModes[0] | string | `"ReadWriteOnce"` | |
| elasticsearch.volumeClaimTemplate.resources.requests.storage | string | `"50Gi"` | |
| externalServices.elasticsearchHost | string | `""` | Existing ElasticSearch Hostname to use if elasticsearch.enabled is false |
| externalServices.elasticsearchPort | int | `9200` | Existing ElasticSearch Port to use if elasticsearch.enabled is false |
| externalServices.mongodbHost | string | `""` | Existing MongoDB Hostname to use if mongodb.enabled is false |
| externalServices.mongodbPort | int | `27017` | Existing MongoDB Port to use if mongodb.enabled is false |
| externalServices.redisHost | string | `""` | Existing Redis Hostname to use if redis.enabled is false |
| externalServices.redisPort | int | `6379` | Existing Redis Port to use if redis.enabled is false |
| fileserver.affinity | object | `{}` | |
| fileserver.extraEnvs | list | `[]` | |
| fileserver.image.pullPolicy | string | `"IfNotPresent"` | |
| fileserver.image.repository | string | `"allegroai/clearml"` | |
| fileserver.image.tag | string | `"1.6.0"` | |
| fileserver.nodeSelector | object | `{}` | |
| fileserver.podAnnotations | object | `{}` | |
| fileserver.replicaCount | int | `1` | |
| fileserver.resources | object | `{}` | |
| fileserver.service.nodePort | int | `30081` | If service.type is NodePort, this value is used for the service's nodePort field; for any other service.type it is ignored |
| fileserver.service.port | int | `8081` |  |
| fileserver.service.type | string | `"NodePort"` | Sets the service's spec.type field |
| fileserver.storage.data.class | string | `""` |  |
| fileserver.storage.data.size | string | `"50Gi"` | |
| fileserver.tolerations | list | `[]` | |
| imageCredentials | object | `{"email":"someone@host.com","enabled":false,"existingSecret":"","password":"pwd","registry":"docker.io","username":"someone"}` | Private image registry configuration |
| imageCredentials.email | string | `"someone@host.com"` | Email |
| imageCredentials.enabled | bool | `false` | Use private authentication mode |
| imageCredentials.existingSecret | string | `""` | If this is set, chart will not generate a secret but will use what is defined here |
| imageCredentials.password | string | `"pwd"` | Registry password |
| imageCredentials.registry | string | `"docker.io"` | Registry name |
| imageCredentials.username | string | `"someone"` | Registry username |
| ingress.annotations | object | `{}` |  |
| ingress.api.annotations | object | `{}` |  |
| ingress.api.enabled | bool | `false` |  |
| ingress.api.hostName | string | `"api.clearml.127-0-0-1.nip.io"` |  |
| ingress.api.path | string | `"/"` |  |
| ingress.api.tlsSecretName | string | `""` |  |
| ingress.app.annotations | object | `{}` |  |
| ingress.app.enabled | bool | `false` |  |
| ingress.app.hostName | string | `"app.clearml.127-0-0-1.nip.io"` |  |
| ingress.app.path | string | `"/"` |  |
| ingress.app.tlsSecretName | string | `""` |  |
| ingress.files.annotations | object | `{}` |  |
| ingress.files.enabled | bool | `false` |  |
| ingress.files.hostName | string | `"files.clearml.127-0-0-1.nip.io"` |  |
| ingress.files.path | string | `"/"` |  |
| ingress.files.tlsSecretName | string | `""` |  |
| ingress.name | string | `"clearml-server-ingress"` | |
| mongodb.architecture | string | `"standalone"` | |
| mongodb.auth.enabled | bool | `false` | |
| mongodb.enabled | bool | `true` | |
| mongodb.persistence.accessModes[0] | string | `"ReadWriteOnce"` | |
| mongodb.persistence.enabled | bool | `true` | |
| mongodb.persistence.size | string | `"50Gi"` | |
| mongodb.replicaCount | int | `1` | |
| mongodb.service.name | string | `"{{ .Release.Name }}-mongodb"` | |
| mongodb.service.port | int | `27017` | |
| mongodb.service.portName | string | `"mongo-service"` | |
| mongodb.service.type | string | `"ClusterIP"` | |
| redis.cluster.enabled | bool | `false` | |
| redis.databaseNumber | int | `0` | |
| redis.enabled | bool | `true` | |
| redis.master.name | string | `"{{ .Release.Name }}-redis-master"` | |
| redis.master.persistence.accessModes[0] | string | `"ReadWriteOnce"` | |
| redis.master.persistence.enabled | bool | `true` | |
| redis.master.persistence.size | string | `"5Gi"` | |
| redis.master.port | int | `6379` | |
| redis.usePassword | bool | `false` | |
| secret.authToken | string | `"1SCf0ov3Nm544Td2oZ0gXSrsNx5XhMWdVlKz1tOgcx158bD5RV"` | Set for auth_token field |
| secret.credentials.apiserver.accessKey | string | `"5442F3443MJMORWZA3ZH"` | Set for apiserver_key field |
| secret.credentials.apiserver.secretKey | string | `"BxapIRo9ZINi8x25CRxz8Wdmr2pQjzuWVB4PNASZqCtTyWgWVQ"` | Set for apiserver_secret field |
| secret.credentials.tests.accessKey | string | `"ENP39EQM4SLACGD5FXB7"` | Set for tests_user_key field |
| secret.credentials.tests.secretKey | string | `"lPcm0imbcBZ8mwgO7tpadutiS3gnJD05x9j7afwXPS35IKbpiQ"` | Set for tests_user_secret field |
| secret.existingSecret | string | `""` | If this is set, chart will not generate a secret but will use what is defined here |
| secret.httpSession | string | `"9Tw20RbhJ1bLBiHEOWXvhplKGUbTgLzAtwFN2oLQvWwS0uRpD5"` | Set for http_session field |
| webserver.additionalConfigs | object | `{}` |  |
| webserver.affinity | object | `{}` | |
| webserver.extraEnvs | list | `[]` | |
| webserver.image.pullPolicy | string | `"IfNotPresent"` | |
| webserver.image.repository | string | `"allegroai/clearml"` | |
| webserver.image.tag | string | `"1.6.0"` | |
| webserver.nodeSelector | object | `{}` | |
| webserver.podAnnotations | object | `{}` | |
| webserver.replicaCount | int | `1` | |
| webserver.resources | object | `{}` | |
| webserver.service.nodePort | int | `30080` | If service.type is NodePort, this value is used for the service's nodePort field; for any other service.type it is ignored |
| webserver.service.port | int | `80` |  |
| webserver.service.type | string | `"NodePort"` | Sets the service's spec.type field |
| webserver.tolerations | list | `[]` | |
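
As an example of how the `externalServices.*` keys relate to the bundled dependencies, the sketch below disables the bundled Elasticsearch and points the chart at an existing instance. The host name is a hypothetical placeholder; MongoDB and Redis are left on their bundled defaults here.

```yaml
# Illustrative values fragment; the host name is an assumption
elasticsearch:
  enabled: false                                  # do not deploy the bundled Elasticsearch
externalServices:
  elasticsearchHost: "elasticsearch.example.internal"  # hypothetical existing instance
  elasticsearchPort: 9200
```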