Mirror of https://github.com/clearml/clearml-helm-charts (synced 2025-04-17 01:31:13 +00:00)
One default agent (#10)

* one cpu only agent by default
* helm-docs update
* suggest kind for single node cluster
* bump up version
* fix trailing space

parent cc8789d71f
commit d269374a49

@@ -8,6 +8,10 @@ Helm charts provided by [Allegro AI](https://clear.ml), ready to launch on Kuber

 For setting up Kubernetes on various platforms refer to the Kubernetes [getting started guide](http://kubernetes.io/docs/getting-started-guides/).

+### Setup a single node LOCAL Kubernetes on laptop/desktop
+
+For setting up Kubernetes on your laptop/desktop we suggest [kind](https://kind.sigs.k8s.io).
+
 ### Install Helm

 Helm is a tool for managing Kubernetes charts. Charts are packages of pre-configured Kubernetes resources.

@@ -2,7 +2,7 @@ apiVersion: v2
 name: clearml
 description: MLOps platform
 type: application
-version: "2.0.0-alpha2"
+version: "2.0.0-beta1"
 appVersion: "1.0.2"
 home: https://clear.ml
 icon: https://raw.githubusercontent.com/allegroai/clearml/master/docs/clearml-logo.svg

@@ -1,6 +1,6 @@
 # ClearML Ecosystem for Kubernetes

-
+

 MLOps platform

@@ -16,8 +16,6 @@ MLOps platform

 The **clearml-server** is the backend service infrastructure for [ClearML](https://github.com/allegroai/clearml).
 It allows multiple users to collaborate and manage their experiments.
-By default, *ClearML is set up to work with the ClearML Demo Server, which is open to anyone and resets periodically.
-In order to host your own server, you will need to install **clearml-server** and point ClearML to it.

 **clearml-server** contains the following components:

@@ -27,33 +25,59 @@ In order to host your own server, you will need to install **clearml-server** an
 * Querying experiments history, logs and results
 * Locally-hosted file server for storing images and models making them easily accessible using the Web-App

-## Port Mapping
+## Local environment

-After **clearml-server** is deployed, the services expose the following node ports:
+For development/evaluation it's possible to use [kind](https://kind.sigs.k8s.io).
+After installation, the following commands will create a complete ClearML installation:

+```
+cat <<EOF > /tmp/clearml-kind.yaml
+kind: Cluster
+apiVersion: kind.x-k8s.io/v1alpha4
+nodes:
+- role: control-plane
+  extraPortMappings:
+  - containerPort: 30008
+    hostPort: 30008
+    listenAddress: "127.0.0.1"
+    protocol: TCP
+  - containerPort: 30080
+    hostPort: 30080
+    listenAddress: "127.0.0.1"
+    protocol: TCP
+  - containerPort: 30081
+    hostPort: 30081
+    listenAddress: "127.0.0.1"
+    protocol: TCP
+  extraMounts:
+  - hostPath: /var/folders/kind/
+    containerPath: /var/local-path-provisioner
+EOF
+
+kind create cluster --config /tmp/clearml-kind.yaml
+
+helm install clearml allegroai/clearml
+```
+
+After deployment, the services will be exposed on localhost on the following ports:
+
 * API server on `30008`
 * Web server on `30080`
 * File server on `30081`

-## Accessing ClearML Server
+## Production cluster environment

-Access **clearml-server** by creating a load balancer and domain name with records pointing to the load balancer.
+In a production environment it's suggested to install an ingress controller and verify that it is working correctly.
+During ClearML deployment enable the `ingress` section of chart values.
+This will create 3 ingress rules:

-Once you have a load balancer and domain name set up, follow these steps to configure access to clearml-server on your k8s cluster:
+* `app.<your domain name>`
+* `files.<your domain name>`
+* `api.<your domain name>`

-1. Create domain records
+(*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)

-* Create 3 records to be used for Web-App, File server and API access using the following rules:
-* `app.<your domain name>`
-* `files.<your domain name>`
-* `api.<your domain name>`
-
-(*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)
-2. Point the records you created to the load balancer
-3. Configure the load balancer to redirect traffic coming from the records you created:
-* `app.<your domain name>` should be redirected to k8s cluster nodes on port `30080`
-* `files.<your domain name>` should be redirected to k8s cluster nodes on port `30081`
-* `api.<your domain name>` should be redirected to k8s cluster nodes on port `30008`
+Just pointing the domain records to the IP where the ingress controller is responding will complete the deployment process.

 ## Additional Configuration for ClearML Server

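The local environment introduced in the hunk above publishes three node ports on `127.0.0.1`. A minimal sketch for checking that they accept TCP connections once `helm install` has finished, assuming bash (for its `/dev/tcp` redirection) and the port numbers listed in the README; the `check_port` helper is hypothetical and not part of the chart:

```shell
#!/usr/bin/env bash
# check_port HOST PORT: succeed if a TCP connection can be opened.
# Relies on bash's /dev/tcp pseudo-device; hypothetical helper, not part of the chart.
check_port() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Ports taken from the README: API server, web server, file server.
for entry in api:30008 web:30080 files:30081; do
  name=${entry%%:*}
  port=${entry##*:}
  if check_port 127.0.0.1 "$port"; then
    echo "$name server: port $port is reachable"
  else
    echo "$name server: port $port is not reachable"
  fi
done
```

An open port only shows that the kind port mapping and the NodePort service are in place; the pods behind it may still be starting.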
@@ -81,28 +105,50 @@ For detailed instructions, see the [Optional Configuration](https://github.com/a

 | Key | Type | Default | Description |
 |-----|------|---------|-------------|
-| agentGroups.agent-group0.affinity | object | `{}` | |
-| agentGroups.agent-group0.agentVersion | string | `""` | |
-| agentGroups.agent-group0.awsAccessKeyId | string | `nil` | |
-| agentGroups.agent-group0.awsDefaultRegion | string | `nil` | |
-| agentGroups.agent-group0.awsSecretAccessKey | string | `nil` | |
-| agentGroups.agent-group0.azureStorageAccount | string | `nil` | |
-| agentGroups.agent-group0.azureStorageKey | string | `nil` | |
-| agentGroups.agent-group0.clearmlAccessKey | string | `nil` | |
-| agentGroups.agent-group0.clearmlConfig | string | `"sdk {\n}"` | |
-| agentGroups.agent-group0.clearmlGitPassword | string | `nil` | |
-| agentGroups.agent-group0.clearmlGitUser | string | `nil` | |
-| agentGroups.agent-group0.clearmlSecretKey | string | `nil` | |
-| agentGroups.agent-group0.image.pullPolicy | string | `"IfNotPresent"` | |
-| agentGroups.agent-group0.image.repository | string | `"nvidia/cuda"` | |
-| agentGroups.agent-group0.image.tag | string | `"11.0-base-ubuntu18.04"` | |
-| agentGroups.agent-group0.name | string | `"agent-group0"` | |
-| agentGroups.agent-group0.nodeSelector | object | `{}` | |
-| agentGroups.agent-group0.nvidiaGpusPerAgent | int | `1` | |
-| agentGroups.agent-group0.podAnnotations | object | `{}` | |
-| agentGroups.agent-group0.queues | string | `"default"` | |
-| agentGroups.agent-group0.replicaCount | int | `0` | |
-| agentGroups.agent-group0.tolerations | list | `[]` | |
+| agentGroups.agent-group-cpu.affinity | object | `{}` | |
+| agentGroups.agent-group-cpu.agentVersion | string | `""` | |
+| agentGroups.agent-group-cpu.awsAccessKeyId | string | `nil` | |
+| agentGroups.agent-group-cpu.awsDefaultRegion | string | `nil` | |
+| agentGroups.agent-group-cpu.awsSecretAccessKey | string | `nil` | |
+| agentGroups.agent-group-cpu.azureStorageAccount | string | `nil` | |
+| agentGroups.agent-group-cpu.azureStorageKey | string | `nil` | |
+| agentGroups.agent-group-cpu.clearmlAccessKey | string | `nil` | |
+| agentGroups.agent-group-cpu.clearmlConfig | string | `"sdk {\n}"` | |
+| agentGroups.agent-group-cpu.clearmlGitPassword | string | `nil` | |
+| agentGroups.agent-group-cpu.clearmlGitUser | string | `nil` | |
+| agentGroups.agent-group-cpu.clearmlSecretKey | string | `nil` | |
+| agentGroups.agent-group-cpu.image.pullPolicy | string | `"IfNotPresent"` | |
+| agentGroups.agent-group-cpu.image.repository | string | `"ubuntu"` | |
+| agentGroups.agent-group-cpu.image.tag | string | `"18.04"` | |
+| agentGroups.agent-group-cpu.name | string | `"agent-group-cpu"` | |
+| agentGroups.agent-group-cpu.nodeSelector | object | `{}` | |
+| agentGroups.agent-group-cpu.nvidiaGpusPerAgent | int | `0` | |
+| agentGroups.agent-group-cpu.podAnnotations | object | `{}` | |
+| agentGroups.agent-group-cpu.queues | string | `"default"` | |
+| agentGroups.agent-group-cpu.replicaCount | int | `1` | |
+| agentGroups.agent-group-cpu.tolerations | list | `[]` | |
+| agentGroups.agent-group-gpu.affinity | object | `{}` | |
+| agentGroups.agent-group-gpu.agentVersion | string | `""` | |
+| agentGroups.agent-group-gpu.awsAccessKeyId | string | `nil` | |
+| agentGroups.agent-group-gpu.awsDefaultRegion | string | `nil` | |
+| agentGroups.agent-group-gpu.awsSecretAccessKey | string | `nil` | |
+| agentGroups.agent-group-gpu.azureStorageAccount | string | `nil` | |
+| agentGroups.agent-group-gpu.azureStorageKey | string | `nil` | |
+| agentGroups.agent-group-gpu.clearmlAccessKey | string | `nil` | |
+| agentGroups.agent-group-gpu.clearmlConfig | string | `"sdk {\n}"` | |
+| agentGroups.agent-group-gpu.clearmlGitPassword | string | `nil` | |
+| agentGroups.agent-group-gpu.clearmlGitUser | string | `nil` | |
+| agentGroups.agent-group-gpu.clearmlSecretKey | string | `nil` | |
+| agentGroups.agent-group-gpu.image.pullPolicy | string | `"IfNotPresent"` | |
+| agentGroups.agent-group-gpu.image.repository | string | `"nvidia/cuda"` | |
+| agentGroups.agent-group-gpu.image.tag | string | `"11.0-base-ubuntu18.04"` | |
+| agentGroups.agent-group-gpu.name | string | `"agent-group-gpu"` | |
+| agentGroups.agent-group-gpu.nodeSelector | object | `{}` | |
+| agentGroups.agent-group-gpu.nvidiaGpusPerAgent | int | `1` | |
+| agentGroups.agent-group-gpu.podAnnotations | object | `{}` | |
+| agentGroups.agent-group-gpu.queues | string | `"default"` | |
+| agentGroups.agent-group-gpu.replicaCount | int | `0` | |
+| agentGroups.agent-group-gpu.tolerations | list | `[]` | |
 | agentservices.affinity | object | `{}` | |
 | agentservices.agentVersion | string | `""` | |
 | agentservices.awsAccessKeyId | string | `nil` | |

@@ -13,8 +13,6 @@

 The **clearml-server** is the backend service infrastructure for [ClearML](https://github.com/allegroai/clearml).
 It allows multiple users to collaborate and manage their experiments.
-By default, *ClearML is set up to work with the ClearML Demo Server, which is open to anyone and resets periodically.
-In order to host your own server, you will need to install **clearml-server** and point ClearML to it.

 **clearml-server** contains the following components:

@@ -24,33 +22,59 @@ In order to host your own server, you will need to install **clearml-server** an
 * Querying experiments history, logs and results
 * Locally-hosted file server for storing images and models making them easily accessible using the Web-App

-## Port Mapping
+## Local environment

-After **clearml-server** is deployed, the services expose the following node ports:
+For development/evaluation it's possible to use [kind](https://kind.sigs.k8s.io).
+After installation, the following commands will create a complete ClearML installation:

+```
+cat <<EOF > /tmp/clearml-kind.yaml
+kind: Cluster
+apiVersion: kind.x-k8s.io/v1alpha4
+nodes:
+- role: control-plane
+  extraPortMappings:
+  - containerPort: 30008
+    hostPort: 30008
+    listenAddress: "127.0.0.1"
+    protocol: TCP
+  - containerPort: 30080
+    hostPort: 30080
+    listenAddress: "127.0.0.1"
+    protocol: TCP
+  - containerPort: 30081
+    hostPort: 30081
+    listenAddress: "127.0.0.1"
+    protocol: TCP
+  extraMounts:
+  - hostPath: /var/folders/kind/
+    containerPath: /var/local-path-provisioner
+EOF
+
+kind create cluster --config /tmp/clearml-kind.yaml
+
+helm install clearml allegroai/clearml
+```
+
+After deployment, the services will be exposed on localhost on the following ports:
+
 * API server on `30008`
 * Web server on `30080`
 * File server on `30081`

-## Accessing ClearML Server
+## Production cluster environment

-Access **clearml-server** by creating a load balancer and domain name with records pointing to the load balancer.
+In a production environment it's suggested to install an ingress controller and verify that it is working correctly.
+During ClearML deployment enable the `ingress` section of chart values.
+This will create 3 ingress rules:

-Once you have a load balancer and domain name set up, follow these steps to configure access to clearml-server on your k8s cluster:
+* `app.<your domain name>`
+* `files.<your domain name>`
+* `api.<your domain name>`

-1. Create domain records
+(*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)

-* Create 3 records to be used for Web-App, File server and API access using the following rules:
-* `app.<your domain name>`
-* `files.<your domain name>`
-* `api.<your domain name>`
-
-(*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)
-2. Point the records you created to the load balancer
-3. Configure the load balancer to redirect traffic coming from the records you created:
-* `app.<your domain name>` should be redirected to k8s cluster nodes on port `30080`
-* `files.<your domain name>` should be redirected to k8s cluster nodes on port `30081`
-* `api.<your domain name>` should be redirected to k8s cluster nodes on port `30008`
+Just pointing the domain records to the IP where the ingress controller is responding will complete the deployment process.

 ## Additional Configuration for ClearML Server

@@ -180,8 +180,40 @@ agentservices:
       size: 50Gi

 agentGroups:
-  agent-group0:
-    name: agent-group0
+  agent-group-cpu:
+    name: agent-group-cpu
+    replicaCount: 1
+    nvidiaGpusPerAgent: 0
+    agentVersion: "" # if set, it *MUST* include comparison operator (e.g. ">=0.16.1")
+    queues: "default" # multiple queues can be specified separated by a space (e.g. "important_jobs default")
+    clearmlGitUser: null
+    clearmlGitPassword: null
+    clearmlAccessKey: null
+    clearmlSecretKey: null
+    awsAccessKeyId: null
+    awsSecretAccessKey: null
+    awsDefaultRegion: null
+    azureStorageAccount: null
+    azureStorageKey: null
+    clearmlConfig: |-
+      sdk {
+      }
+
+    image:
+      repository: "ubuntu"
+      pullPolicy: IfNotPresent
+      tag: "18.04"
+
+    podAnnotations: {}
+
+    nodeSelector: {}
+
+    tolerations: []
+
+    affinity: {}
+
+  agent-group-gpu:
+    name: agent-group-gpu
     replicaCount: 0
     nvidiaGpusPerAgent: 1
     agentVersion: "" # if set, it *MUST* include comparison operator (e.g. ">=0.16.1")
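
With this change only `agent-group-cpu` runs by default (`replicaCount: 1`), while `agent-group-gpu` stays at `replicaCount: 0`. A minimal sketch of enabling one GPU agent through an override file, assuming the release is named `clearml` and the chart is referenced as `allegroai/clearml`, as in the README commands; the file path `/tmp/clearml-agents.yaml` is hypothetical:

```shell
# Write an override file; the keys mirror the agentGroups section of values.yaml.
cat <<'EOF' > /tmp/clearml-agents.yaml
agentGroups:
  agent-group-gpu:
    replicaCount: 1
    nvidiaGpusPerAgent: 1
EOF

# Apply the override to the existing release; skipped when helm is not on the PATH.
if command -v helm >/dev/null 2>&1; then
  helm upgrade clearml allegroai/clearml -f /tmp/clearml-agents.yaml \
    || echo "helm upgrade failed" >&2
fi
```

Helm merges the override into the chart defaults key by key, so the remaining `agent-group-gpu` settings (image, queues and so on) keep the values shown in the hunk above.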