One default agent (#10)

* one cpu only agent by default
* helm-docs update
* suggest kind for single node cluster
* bump up version
* fix trailing space
2025-04-17 01:31:13 +00:00 · 2021-07-15 17:34:29 +02:00 · d269374a49
commit d269374a49
parent cc8789d71f
5 changed files with 170 additions and 64 deletions
--- a/README.md
+++ b/README.md
@@ -8,6 +8,10 @@ Helm charts provided by [Allegro AI](https://clear.ml), ready to launch on Kuber
 For setting up Kubernetes on various platforms refer to the Kubernetes [getting started guide](http://kubernetes.io/docs/getting-started-guides/).
 ### Setup a single node LOCAL Kubernetes on laptop/desktop
 For setting up Kubernetes on your laptop/desktop we suggest [kind](https://kind.sigs.k8s.io).
 ### Install Helm
 Helm is a tool for managing Kubernetes charts. Charts are packages of pre-configured Kubernetes resources.
--- a/charts/clearml/Chart.yaml
+++ b/charts/clearml/Chart.yaml
@@ -2,7 +2,7 @@ apiVersion: v2
 name: clearml
 description: MLOps platform
 type: application
-version: "2.0.0-alpha2"
+version: "2.0.0-beta1"
 appVersion: "1.0.2"
 home: https://clear.ml
 icon: https://raw.githubusercontent.com/allegroai/clearml/master/docs/clearml-logo.svg
--- a/charts/clearml/README.md
+++ b/charts/clearml/README.md
@@ -1,6 +1,6 @@
 # ClearML Ecosystem for Kubernetes
-![Version: 2.0.0-alpha2](https://img.shields.io/badge/Version-2.0.0--alpha2-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.0.2](https://img.shields.io/badge/AppVersion-1.0.2-informational?style=flat-square)
+![Version: 2.0.0-beta1](https://img.shields.io/badge/Version-2.0.0--beta1-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.0.2](https://img.shields.io/badge/AppVersion-1.0.2-informational?style=flat-square)
 MLOps platform
@@ -16,8 +16,6 @@ MLOps platform
 The **clearml-server** is the backend service infrastructure for [ClearML](https://github.com/allegroai/clearml).
 It allows multiple users to collaborate and manage their experiments.
 By default, *ClearML is set up to work with the ClearML Demo Server, which is open to anyone and resets periodically.
 In order to host your own server, you will need to install **clearml-server** and point ClearML to it.
 **clearml-server** contains the following components:
@@ -27,33 +25,59 @@ In order to host your own server, you will need to install **clearml-server** an
    * Querying experiments history, logs and results
 * Locally-hosted file server for storing images and models making them easily accessible using the Web-App
-## Port Mapping
+## Local environment
-After **clearml-server** is deployed, the services expose the following node ports:
+For development/evaluation it's possible to use [kind](https://kind.sigs.k8s.io).
 After installation, the following commands will create a complete ClearML installation:
 ```
 cat <<EOF > /tmp/clearml-kind.yaml
 kind: Cluster
 apiVersion: kind.x-k8s.io/v1alpha4
 nodes:
 - role: control-plane
  extraPortMappings:
  - containerPort: 30008
    hostPort: 30008
    listenAddress: "127.0.0.1"
    protocol: TCP
  - containerPort: 30080
    hostPort: 30080
    listenAddress: "127.0.0.1"
    protocol: TCP
  - containerPort: 30081
    hostPort: 30081
    listenAddress: "127.0.0.1"
    protocol: TCP
  extraMounts:
  - hostPath: /var/folders/kind/
    containerPath: /var/local-path-provisioner
 EOF
 kind create cluster --config /tmp/clearml-kind.yaml
 helm install clearml allegroai/clearml
 ```
 After deployment, the services will be exposed on localhost on the following ports:
 * API server on `30008`
 * Web server on `30080`
 * File server on `30081`
-## Accessing ClearML Server
+## Production cluster environment
-Access **clearml-server** by creating a load balancer and domain name with records pointing to the load balancer.
+In a production environment it's suggested to install an ingress controller and verify that it is working correctly.
 During ClearML deployment enable `ingress` section of chart values.
 This will create 3 ingress rules:
 Once you have a load balancer and domain name set up, follow these steps to configure access to clearml-server on your k8s cluster:
 1. Create domain records
   * Create 3 records to be used for Web-App, File server and API access using the following rules:
 * `app.<your domain name>`
 * `files.<your domain name>`
 * `api.<your domain name>`
 (*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)
-2. Point the records you created to the load balancer
+
-3. Configure the load balancer to redirect traffic coming from the records you created:
+Just pointing the domain records to the IP where ingress controller is responding will complete the deployment process.
     * `app.<your domain name>` should be redirected to k8s cluster nodes on port `30080`
     * `files.<your domain name>` should be redirected to k8s cluster nodes on port `30081`
     * `api.<your domain name>` should be redirected to k8s cluster nodes on port `30008`
 ## Additional Configuration for ClearML Server
@@ -81,28 +105,50 @@ For detailed instructions, see the [Optional Configuration](https://github.com/a
 | Key | Type | Default | Description |
 |-----|------|---------|-------------|
-| agentGroups.agent-group0.affinity | object | `{}` |  |
+| agentGroups.agent-group-cpu.affinity | object | `{}` |  |
-| agentGroups.agent-group0.agentVersion | string | `""` |  |
+| agentGroups.agent-group-cpu.agentVersion | string | `""` |  |
-| agentGroups.agent-group0.awsAccessKeyId | string | `nil` |  |
+| agentGroups.agent-group-cpu.awsAccessKeyId | string | `nil` |  |
-| agentGroups.agent-group0.awsDefaultRegion | string | `nil` |  |
+| agentGroups.agent-group-cpu.awsDefaultRegion | string | `nil` |  |
-| agentGroups.agent-group0.awsSecretAccessKey | string | `nil` |  |
+| agentGroups.agent-group-cpu.awsSecretAccessKey | string | `nil` |  |
-| agentGroups.agent-group0.azureStorageAccount | string | `nil` |  |
+| agentGroups.agent-group-cpu.azureStorageAccount | string | `nil` |  |
-| agentGroups.agent-group0.azureStorageKey | string | `nil` |  |
+| agentGroups.agent-group-cpu.azureStorageKey | string | `nil` |  |
-| agentGroups.agent-group0.clearmlAccessKey | string | `nil` |  |
+| agentGroups.agent-group-cpu.clearmlAccessKey | string | `nil` |  |
-| agentGroups.agent-group0.clearmlConfig | string | `"sdk {\n}"` |  |
+| agentGroups.agent-group-cpu.clearmlConfig | string | `"sdk {\n}"` |  |
-| agentGroups.agent-group0.clearmlGitPassword | string | `nil` |  |
+| agentGroups.agent-group-cpu.clearmlGitPassword | string | `nil` |  |
-| agentGroups.agent-group0.clearmlGitUser | string | `nil` |  |
+| agentGroups.agent-group-cpu.clearmlGitUser | string | `nil` |  |
-| agentGroups.agent-group0.clearmlSecretKey | string | `nil` |  |
+| agentGroups.agent-group-cpu.clearmlSecretKey | string | `nil` |  |
-| agentGroups.agent-group0.image.pullPolicy | string | `"IfNotPresent"` |  |
+| agentGroups.agent-group-cpu.image.pullPolicy | string | `"IfNotPresent"` |  |
-| agentGroups.agent-group0.image.repository | string | `"nvidia/cuda"` |  |
+| agentGroups.agent-group-cpu.image.repository | string | `"ubuntu"` |  |
-| agentGroups.agent-group0.image.tag | string | `"11.0-base-ubuntu18.04"` |  |
+| agentGroups.agent-group-cpu.image.tag | string | `"18.04"` |  |
-| agentGroups.agent-group0.name | string | `"agent-group0"` |  |
+| agentGroups.agent-group-cpu.name | string | `"agent-group-cpu"` |  |
-| agentGroups.agent-group0.nodeSelector | object | `{}` |  |
+| agentGroups.agent-group-cpu.nodeSelector | object | `{}` |  |
-| agentGroups.agent-group0.nvidiaGpusPerAgent | int | `1` |  |
+| agentGroups.agent-group-cpu.nvidiaGpusPerAgent | int | `0` |  |
-| agentGroups.agent-group0.podAnnotations | object | `{}` |  |
+| agentGroups.agent-group-cpu.podAnnotations | object | `{}` |  |
-| agentGroups.agent-group0.queues | string | `"default"` |  |
+| agentGroups.agent-group-cpu.queues | string | `"default"` |  |
-| agentGroups.agent-group0.replicaCount | int | `0` |  |
+| agentGroups.agent-group-cpu.replicaCount | int | `1` |  |
-| agentGroups.agent-group0.tolerations | list | `[]` |  |
+| agentGroups.agent-group-cpu.tolerations | list | `[]` |  |
 | agentGroups.agent-group-gpu.affinity | object | `{}` |  |
 | agentGroups.agent-group-gpu.agentVersion | string | `""` |  |
 | agentGroups.agent-group-gpu.awsAccessKeyId | string | `nil` |  |
 | agentGroups.agent-group-gpu.awsDefaultRegion | string | `nil` |  |
 | agentGroups.agent-group-gpu.awsSecretAccessKey | string | `nil` |  |
 | agentGroups.agent-group-gpu.azureStorageAccount | string | `nil` |  |
 | agentGroups.agent-group-gpu.azureStorageKey | string | `nil` |  |
 | agentGroups.agent-group-gpu.clearmlAccessKey | string | `nil` |  |
 | agentGroups.agent-group-gpu.clearmlConfig | string | `"sdk {\n}"` |  |
 | agentGroups.agent-group-gpu.clearmlGitPassword | string | `nil` |  |
 | agentGroups.agent-group-gpu.clearmlGitUser | string | `nil` |  |
 | agentGroups.agent-group-gpu.clearmlSecretKey | string | `nil` |  |
 | agentGroups.agent-group-gpu.image.pullPolicy | string | `"IfNotPresent"` |  |
 | agentGroups.agent-group-gpu.image.repository | string | `"nvidia/cuda"` |  |
 | agentGroups.agent-group-gpu.image.tag | string | `"11.0-base-ubuntu18.04"` |  |
 | agentGroups.agent-group-gpu.name | string | `"agent-group-gpu"` |  |
 | agentGroups.agent-group-gpu.nodeSelector | object | `{}` |  |
 | agentGroups.agent-group-gpu.nvidiaGpusPerAgent | int | `1` |  |
 | agentGroups.agent-group-gpu.podAnnotations | object | `{}` |  |
 | agentGroups.agent-group-gpu.queues | string | `"default"` |  |
 | agentGroups.agent-group-gpu.replicaCount | int | `0` |  |
 | agentGroups.agent-group-gpu.tolerations | list | `[]` |  |
 | agentservices.affinity | object | `{}` |  |
 | agentservices.agentVersion | string | `""` |  |
 | agentservices.awsAccessKeyId | string | `nil` |  |
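
With the defaults listed in the table above, only `agent-group-cpu` runs (`replicaCount` is `1`), while `agent-group-gpu` stays at `replicaCount` `0`. The following is a minimal override sketch for enabling the GPU group. It assumes a hypothetical values file named `my-values.yaml` and relies on Helm's usual merging of nested maps, so keys not listed here should keep their chart defaults:

```yaml
# my-values.yaml (hypothetical file name, not part of this commit)
agentGroups:
  agent-group-gpu:
    replicaCount: 1        # chart default is 0, so no GPU agent runs by default
    nvidiaGpusPerAgent: 1  # chart default, repeated here only for clarity
```

Such a file could be passed at install time with `helm install clearml allegroai/clearml -f my-values.yaml`. Whether the GPU agent pods are actually scheduled depends on the cluster having NVIDIA GPU nodes available, which this sketch does not check.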
--- a/charts/clearml/README.md.gotmpl
+++ b/charts/clearml/README.md.gotmpl
@@ -13,8 +13,6 @@
 The **clearml-server** is the backend service infrastructure for [ClearML](https://github.com/allegroai/clearml).
 It allows multiple users to collaborate and manage their experiments.
 By default, *ClearML is set up to work with the ClearML Demo Server, which is open to anyone and resets periodically. 
 In order to host your own server, you will need to install **clearml-server** and point ClearML to it.
 **clearml-server** contains the following components:
@@ -24,33 +22,59 @@ In order to host your own server, you will need to install **clearml-server** an
    * Querying experiments history, logs and results
 * Locally-hosted file server for storing images and models making them easily accessible using the Web-App
-## Port Mapping
+## Local environment
-After **clearml-server** is deployed, the services expose the following node ports:
+For development/evaluation it's possible to use [kind](https://kind.sigs.k8s.io).
 After installation, the following commands will create a complete ClearML installation:
 ```
 cat <<EOF > /tmp/clearml-kind.yaml
 kind: Cluster
 apiVersion: kind.x-k8s.io/v1alpha4
 nodes:
 - role: control-plane
  extraPortMappings:
  - containerPort: 30008
    hostPort: 30008
    listenAddress: "127.0.0.1"
    protocol: TCP
  - containerPort: 30080
    hostPort: 30080
    listenAddress: "127.0.0.1"
    protocol: TCP
  - containerPort: 30081
    hostPort: 30081
    listenAddress: "127.0.0.1"
    protocol: TCP
  extraMounts:
  - hostPath: /var/folders/kind/
    containerPath: /var/local-path-provisioner
 EOF
 kind create cluster --config /tmp/clearml-kind.yaml
 helm install clearml allegroai/clearml
 ```
 After deployment, the services will be exposed on localhost on the following ports:
 * API server on `30008`
 * Web server on `30080`
 * File server on `30081`
-## Accessing ClearML Server
+## Production cluster environment
-Access **clearml-server** by creating a load balancer and domain name with records pointing to the load balancer.
+In a production environment it's suggested to install an ingress controller and verify that it is working correctly.
 During ClearML deployment enable `ingress` section of chart values.
 This will create 3 ingress rules:
 Once you have a load balancer and domain name set up, follow these steps to configure access to clearml-server on your k8s cluster:
 1. Create domain records
   * Create 3 records to be used for Web-App, File server and API access using the following rules: 
 * `app.<your domain name>` 
 * `files.<your domain name>`
 * `api.<your domain name>`
 (*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)
-2. Point the records you created to the load balancer
+
-3. Configure the load balancer to redirect traffic coming from the records you created:
+Just pointing the domain records to the IP where ingress controller is responding will complete the deployment process.
     * `app.<your domain name>` should be redirected to k8s cluster nodes on port `30080`
     * `files.<your domain name>` should be redirected to k8s cluster nodes on port `30081`
     * `api.<your domain name>` should be redirected to k8s cluster nodes on port `30008`
 ## Additional Configuration for ClearML Server
--- a/charts/clearml/values.yaml
+++ b/charts/clearml/values.yaml
@@ -180,8 +180,40 @@ agentservices:
      size: 50Gi
 agentGroups:
-  agent-group0:
+  agent-group-cpu:
-    name: agent-group0
+    name: agent-group-cpu
    replicaCount: 1
    nvidiaGpusPerAgent: 0
    agentVersion: ""  # if set, it *MUST* include comparison operator (e.g. ">=0.16.1")
    queues: "default"  # multiple queues can be specified separated by a space (e.g. "important_jobs default")
    clearmlGitUser: null
    clearmlGitPassword: null
    clearmlAccessKey: null
    clearmlSecretKey: null
    awsAccessKeyId: null
    awsSecretAccessKey: null
    awsDefaultRegion: null
    azureStorageAccount: null
    azureStorageKey: null
    clearmlConfig: |-
      sdk {
      }
    image:
      repository: "ubuntu"
      pullPolicy: IfNotPresent
      tag: "18.04"
    podAnnotations: {}
    nodeSelector: {}
    tolerations: []
    affinity: {}
  agent-group-gpu:
    name: agent-group-gpu
    replicaCount: 0
    nvidiaGpusPerAgent: 1
    agentVersion: ""  # if set, it *MUST* include comparison operator (e.g. ">=0.16.1")