diff --git a/README.md b/README.md
index 551ef7c..d8c71a0 100644
--- a/README.md
+++ b/README.md
@@ -8,6 +8,10 @@ Helm charts provided by [Allegro AI](https://clear.ml), ready to launch on Kuber
 
 For setting up Kubernetes on various platforms refer to the Kubernetes [getting started guide](http://kubernetes.io/docs/getting-started-guides/).
 
+### Set up a single-node local Kubernetes cluster on your laptop/desktop
+
+To set up Kubernetes on your laptop/desktop, we suggest [kind](https://kind.sigs.k8s.io).
+
 ### Install Helm
 
 Helm is a tool for managing Kubernetes charts. Charts are packages of pre-configured Kubernetes resources.
diff --git a/charts/clearml/Chart.yaml b/charts/clearml/Chart.yaml
index a33ec0e..f2e5525 100644
--- a/charts/clearml/Chart.yaml
+++ b/charts/clearml/Chart.yaml
@@ -2,7 +2,7 @@ apiVersion: v2
 name: clearml
 description: MLOps platform
 type: application
-version: "2.0.0-alpha2"
+version: "2.0.0-beta1"
 appVersion: "1.0.2"
 home: https://clear.ml
 icon: https://raw.githubusercontent.com/allegroai/clearml/master/docs/clearml-logo.svg
diff --git a/charts/clearml/README.md b/charts/clearml/README.md
index 80c8c9b..21a9a36 100644
--- a/charts/clearml/README.md
+++ b/charts/clearml/README.md
@@ -1,6 +1,6 @@
 # ClearML Ecosystem for Kubernetes
 
-![Version: 2.0.0-alpha2](https://img.shields.io/badge/Version-2.0.0--alpha2-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.0.2](https://img.shields.io/badge/AppVersion-1.0.2-informational?style=flat-square)
+![Version: 2.0.0-beta1](https://img.shields.io/badge/Version-2.0.0--beta1-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.0.2](https://img.shields.io/badge/AppVersion-1.0.2-informational?style=flat-square)
 
 MLOps platform
 
@@ -16,8 +16,6 @@ MLOps platform
 The **clearml-server** is the backend service infrastructure for
 [ClearML](https://github.com/allegroai/clearml).
 It allows multiple users to collaborate and manage their experiments.
-By default, *ClearML is set up to work with the ClearML Demo Server, which is open to anyone and resets periodically.
-In order to host your own server, you will need to install **clearml-server** and point ClearML to it.
 
 **clearml-server** contains the following components:
 
@@ -27,33 +25,59 @@ In order to host your own server, you will need to install **clearml-server** an
 * Querying experiments history, logs and results
 * Locally-hosted file server for storing images and models making them easily accessible using the Web-App
 
-## Port Mapping
+## Local environment
 
-After **clearml-server** is deployed, the services expose the following node ports:
+For development or evaluation, you can use [kind](https://kind.sigs.k8s.io).
+After installing kind, the following commands create a complete ClearML installation:
+
+```
+cat <<EOF > /tmp/clearml-kind.yaml
+kind: Cluster
+apiVersion: kind.x-k8s.io/v1alpha4
+nodes:
+- role: control-plane
+  extraPortMappings:
+  - containerPort: 30008
+    hostPort: 30008
+    listenAddress: "127.0.0.1"
+    protocol: TCP
+  - containerPort: 30080
+    hostPort: 30080
+    listenAddress: "127.0.0.1"
+    protocol: TCP
+  - containerPort: 30081
+    hostPort: 30081
+    listenAddress: "127.0.0.1"
+    protocol: TCP
+  extraMounts:
+  - hostPath: /var/folders/kind/
+    containerPath: /var/local-path-provisioner
+EOF
+
+kind create cluster --config /tmp/clearml-kind.yaml
+
+helm install clearml allegroai/clearml
+```
+
+After deployment, the services are exposed on localhost on the following ports:
 
 * API server on `30008`
 * Web server on `30080`
 * File server on `30081`
 
-## Accessing ClearML Server
+## Production cluster environment
 
-Access **clearml-server** by creating a load balancer and domain name with records pointing to the load balancer.
+In a production environment, we suggest installing an ingress controller and verifying that it is working correctly.
+When deploying ClearML, enable the `ingress` section of the chart values.
+This creates three ingress rules:
 
-Once you have a load balancer and domain name set up, follow these steps to configure access to clearml-server on your k8s cluster:
+* `app.<your domain name>`
+* `files.<your domain name>`
+* `api.<your domain name>`
 
-1. Create domain records
+(*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)
 
-   * Create 3 records to be used for Web-App, File server and API access using the following rules:
-     * `app.<your domain name>`
-     * `files.<your domain name>`
-     * `api.<your domain name>`
-
-   (*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)
-2. Point the records you created to the load balancer
-3. Configure the load balancer to redirect traffic coming from the records you created:
-   * `app.<your domain name>` should be redirected to k8s cluster nodes on port `30080`
-   * `files.<your domain name>` should be redirected to k8s cluster nodes on port `30081`
-   * `api.<your domain name>` should be redirected to k8s cluster nodes on port `30008`
+To complete the deployment, point the domain records to the IP address on which the ingress controller is listening.
 
 ## Additional Configuration for ClearML Server
 
@@ -81,28 +105,50 @@ For detailed instructions, see the [Optional Configuration](https://github.com/a
 
 | Key | Type | Default | Description |
 |-----|------|---------|-------------|
-| agentGroups.agent-group0.affinity | object | `{}` | |
-| agentGroups.agent-group0.agentVersion | string | `""` | |
-| agentGroups.agent-group0.awsAccessKeyId | string | `nil` | |
-| agentGroups.agent-group0.awsDefaultRegion | string | `nil` | |
-| agentGroups.agent-group0.awsSecretAccessKey | string | `nil` | |
-| agentGroups.agent-group0.azureStorageAccount | string | `nil` | |
-| agentGroups.agent-group0.azureStorageKey | string | `nil` | |
-| agentGroups.agent-group0.clearmlAccessKey | string | `nil` | |
-| agentGroups.agent-group0.clearmlConfig | string | `"sdk {\n}"` | |
-| agentGroups.agent-group0.clearmlGitPassword | string | `nil` | |
-| agentGroups.agent-group0.clearmlGitUser | string | `nil` | |
-| agentGroups.agent-group0.clearmlSecretKey | string | `nil` | |
-| agentGroups.agent-group0.image.pullPolicy | string | `"IfNotPresent"` | |
-| agentGroups.agent-group0.image.repository | string | `"nvidia/cuda"` | |
-| agentGroups.agent-group0.image.tag | string | `"11.0-base-ubuntu18.04"` | |
-| agentGroups.agent-group0.name | string | `"agent-group0"` | |
-| agentGroups.agent-group0.nodeSelector | object | `{}` | |
-| agentGroups.agent-group0.nvidiaGpusPerAgent | int | `1` | |
-| agentGroups.agent-group0.podAnnotations | object | `{}` | |
-| agentGroups.agent-group0.queues | string | `"default"` | |
-| agentGroups.agent-group0.replicaCount | int | `0` | |
-| agentGroups.agent-group0.tolerations | list | `[]` | |
+| agentGroups.agent-group-cpu.affinity | object | `{}` | |
+| agentGroups.agent-group-cpu.agentVersion | string | `""` | |
+| agentGroups.agent-group-cpu.awsAccessKeyId | string | `nil` | |
+| agentGroups.agent-group-cpu.awsDefaultRegion | string | `nil` | |
+| agentGroups.agent-group-cpu.awsSecretAccessKey | string | `nil` | |
+| agentGroups.agent-group-cpu.azureStorageAccount | string | `nil` | |
+| agentGroups.agent-group-cpu.azureStorageKey | string | `nil` | |
+| agentGroups.agent-group-cpu.clearmlAccessKey | string | `nil` | |
+| agentGroups.agent-group-cpu.clearmlConfig | string | `"sdk {\n}"` | |
+| agentGroups.agent-group-cpu.clearmlGitPassword | string | `nil` | |
+| agentGroups.agent-group-cpu.clearmlGitUser | string | `nil` | |
+| agentGroups.agent-group-cpu.clearmlSecretKey | string | `nil` | |
+| agentGroups.agent-group-cpu.image.pullPolicy | string | `"IfNotPresent"` | |
+| agentGroups.agent-group-cpu.image.repository | string | `"ubuntu"` | |
+| agentGroups.agent-group-cpu.image.tag | string | `"18.04"` | |
+| agentGroups.agent-group-cpu.name | string | `"agent-group-cpu"` | |
+| agentGroups.agent-group-cpu.nodeSelector | object | `{}` | |
+| agentGroups.agent-group-cpu.nvidiaGpusPerAgent | int | `0` | |
+| agentGroups.agent-group-cpu.podAnnotations | object | `{}` | |
+| agentGroups.agent-group-cpu.queues | string | `"default"` | |
+| agentGroups.agent-group-cpu.replicaCount | int | `1` | |
+| agentGroups.agent-group-cpu.tolerations | list | `[]` | |
+| agentGroups.agent-group-gpu.affinity | object | `{}` | |
+| agentGroups.agent-group-gpu.agentVersion | string | `""` | |
+| agentGroups.agent-group-gpu.awsAccessKeyId | string | `nil` | |
+| agentGroups.agent-group-gpu.awsDefaultRegion | string | `nil` | |
+| agentGroups.agent-group-gpu.awsSecretAccessKey | string | `nil` | |
+| agentGroups.agent-group-gpu.azureStorageAccount | string | `nil` | |
+| agentGroups.agent-group-gpu.azureStorageKey | string | `nil` | |
+| agentGroups.agent-group-gpu.clearmlAccessKey | string | `nil` | |
+| agentGroups.agent-group-gpu.clearmlConfig | string | `"sdk {\n}"` | |
+| agentGroups.agent-group-gpu.clearmlGitPassword | string | `nil` | |
+| agentGroups.agent-group-gpu.clearmlGitUser | string | `nil` | |
+| agentGroups.agent-group-gpu.clearmlSecretKey | string | `nil` | |
+| agentGroups.agent-group-gpu.image.pullPolicy | string | `"IfNotPresent"` | |
+| agentGroups.agent-group-gpu.image.repository | string | `"nvidia/cuda"` | |
+| agentGroups.agent-group-gpu.image.tag | string | `"11.0-base-ubuntu18.04"` | |
+| agentGroups.agent-group-gpu.name | string | `"agent-group-gpu"` | |
+| agentGroups.agent-group-gpu.nodeSelector | object | `{}` | |
+| agentGroups.agent-group-gpu.nvidiaGpusPerAgent | int | `1` | |
+| agentGroups.agent-group-gpu.podAnnotations | object | `{}` | |
+| agentGroups.agent-group-gpu.queues | string | `"default"` | |
+| agentGroups.agent-group-gpu.replicaCount | int | `0` | |
+| agentGroups.agent-group-gpu.tolerations | list | `[]` | |
 | agentservices.affinity | object | `{}` | |
 | agentservices.agentVersion | string | `""` | |
 | agentservices.awsAccessKeyId | string | `nil` | |
diff --git a/charts/clearml/README.md.gotmpl b/charts/clearml/README.md.gotmpl
index fd78885..a8597e3 100644
--- a/charts/clearml/README.md.gotmpl
+++ b/charts/clearml/README.md.gotmpl
@@ -13,8 +13,6 @@ The **clearml-server** is the backend service infrastructure for
 [ClearML](https://github.com/allegroai/clearml).
 It allows multiple users to collaborate and manage their experiments.
-By default, *ClearML is set up to work with the ClearML Demo Server, which is open to anyone and resets periodically.
-In order to host your own server, you will need to install **clearml-server** and point ClearML to it.
 
 **clearml-server** contains the following components:
 
@@ -24,33 +22,59 @@ In order to host your own server, you will need to install **clearml-server** an
 * Querying experiments history, logs and results
 * Locally-hosted file server for storing images and models making them easily accessible using the Web-App
 
-## Port Mapping
+## Local environment
 
-After **clearml-server** is deployed, the services expose the following node ports:
+For development or evaluation, you can use [kind](https://kind.sigs.k8s.io).
+After installing kind, the following commands create a complete ClearML installation:
+
+```
+cat <<EOF > /tmp/clearml-kind.yaml
+kind: Cluster
+apiVersion: kind.x-k8s.io/v1alpha4
+nodes:
+- role: control-plane
+  extraPortMappings:
+  - containerPort: 30008
+    hostPort: 30008
+    listenAddress: "127.0.0.1"
+    protocol: TCP
+  - containerPort: 30080
+    hostPort: 30080
+    listenAddress: "127.0.0.1"
+    protocol: TCP
+  - containerPort: 30081
+    hostPort: 30081
+    listenAddress: "127.0.0.1"
+    protocol: TCP
+  extraMounts:
+  - hostPath: /var/folders/kind/
+    containerPath: /var/local-path-provisioner
+EOF
+
+kind create cluster --config /tmp/clearml-kind.yaml
+
+helm install clearml allegroai/clearml
+```
+
+After deployment, the services are exposed on localhost on the following ports:
 
 * API server on `30008`
 * Web server on `30080`
 * File server on `30081`
 
-## Accessing ClearML Server
+## Production cluster environment
 
-Access **clearml-server** by creating a load balancer and domain name with records pointing to the load balancer.
+In a production environment, we suggest installing an ingress controller and verifying that it is working correctly.
+When deploying ClearML, enable the `ingress` section of the chart values.
+This creates three ingress rules:
 
-Once you have a load balancer and domain name set up, follow these steps to configure access to clearml-server on your k8s cluster:
+* `app.<your domain name>`
+* `files.<your domain name>`
+* `api.<your domain name>`
 
-1. Create domain records
+(*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)
 
-   * Create 3 records to be used for Web-App, File server and API access using the following rules:
-     * `app.<your domain name>`
-     * `files.<your domain name>`
-     * `api.<your domain name>`
-
-   (*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)
-2. Point the records you created to the load balancer
-3. Configure the load balancer to redirect traffic coming from the records you created:
-   * `app.<your domain name>` should be redirected to k8s cluster nodes on port `30080`
-   * `files.<your domain name>` should be redirected to k8s cluster nodes on port `30081`
-   * `api.<your domain name>` should be redirected to k8s cluster nodes on port `30008`
+To complete the deployment, point the domain records to the IP address on which the ingress controller is listening.
 
 ## Additional Configuration for ClearML Server
 
diff --git a/charts/clearml/values.yaml b/charts/clearml/values.yaml
index 3426f59..dfdc1d5 100644
--- a/charts/clearml/values.yaml
+++ b/charts/clearml/values.yaml
@@ -180,8 +180,40 @@ agentservices:
     size: 50Gi
 
 agentGroups:
-  agent-group0:
-    name: agent-group0
+  agent-group-cpu:
+    name: agent-group-cpu
+    replicaCount: 1
+    nvidiaGpusPerAgent: 0
+    agentVersion: "" # if set, it *MUST* include comparison operator (e.g. ">=0.16.1")
+    queues: "default" # multiple queues can be specified separated by a space (e.g. "important_jobs default")
+    clearmlGitUser: null
+    clearmlGitPassword: null
+    clearmlAccessKey: null
+    clearmlSecretKey: null
+    awsAccessKeyId: null
+    awsSecretAccessKey: null
+    awsDefaultRegion: null
+    azureStorageAccount: null
+    azureStorageKey: null
+    clearmlConfig: |-
+      sdk {
+      }
+
+    image:
+      repository: "ubuntu"
+      pullPolicy: IfNotPresent
+      tag: "18.04"
+
+    podAnnotations: {}
+
+    nodeSelector: {}
+
+    tolerations: []
+
+    affinity: {}
+
+  agent-group-gpu:
+    name: agent-group-gpu
     replicaCount: 0
     nvidiaGpusPerAgent: 1
     agentVersion: "" # if set, it *MUST* include comparison operator (e.g. ">=0.16.1")
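With the new defaults, the chart starts one CPU agent group (`replicaCount: 1`) and one GPU agent group (`replicaCount: 0`). The override below is a minimal sketch for scaling both groups, assuming the cluster has nodes with NVIDIA GPUs. The file name `clearml-values.yaml` and the queue names `cpu_jobs` and `gpu_jobs` are hypothetical; every key comes from the `agentGroups` section of `values.yaml` shown above. Any key not listed here keeps its chart default, because Helm deep-merges user-supplied maps into the chart's values.

```yaml
# clearml-values.yaml (hypothetical file name)
# Keys mirror agentGroups in charts/clearml/values.yaml; omitted keys keep chart defaults.
agentGroups:
  agent-group-cpu:
    replicaCount: 2                # run two CPU-only agents instead of the default one
    queues: "cpu_jobs default"     # space-separated list; "cpu_jobs" is a hypothetical queue
  agent-group-gpu:
    replicaCount: 1                # the chart default of 0 starts no GPU agents
    nvidiaGpusPerAgent: 1
    agentVersion: ">=0.16.1"       # if set, it must include a comparison operator
    queues: "gpu_jobs"             # hypothetical queue name
```

You could then pass the file to Helm with `helm install clearml allegroai/clearml -f clearml-values.yaml`. For a release that already exists, use `helm upgrade clearml allegroai/clearml -f clearml-values.yaml`.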