Compare commits

...

12 Commits

Author SHA1 Message Date
Valeriano Manassero
df2843ec1d Ingress deprecation management (#24)
* if k8s version is >=1.19 use v1

* bump up version
2021-09-06 12:04:58 +02:00
Valeriano Manassero
618a269c97 Fix service url generation (#21)
* service url generation functions

* use generation functions

* bump up version
2021-08-26 10:58:06 +02:00
Valeriano Manassero
3f215d2d90 Use many ingresses (#20)
* use many ingresses

* bump up version
2021-08-25 14:49:43 +02:00
Valeriano Manassero
03223fc1c1 Use Recreate as strategy (#19)
* use Recreate as strategybump up version

* fix strategy indentation and position

* updatesStrategy configurable

* updateStrategy parameter

* use 2.2.0 instead of patch release
2021-08-17 14:59:13 +02:00
Valeriano Manassero
898089b7fb Fix agent clearml conf (#18)
* fix agent mount clearml.conf

* bump up version
2021-08-14 12:00:23 +02:00
Valeriano Manassero
732bb970aa Configurable prefix ingress (#17)
* ingress configurable prefixes

* chart version bump up

* fix version number
2021-08-12 12:02:26 +03:00
Valeriano Manassero
91d45281fa bump up app version to 1.1.1 (#16) 2021-08-06 14:18:53 +02:00
Valeriano Manassero
28b6e9f4e4 Remove badges from root README since we have them in chart one (#14) 2021-07-27 16:18:02 +02:00
Valeriano Manassero
7f6df85ec5 Bump up version (#13)
* bump up to 1.1.0

* helm-docs update
2021-07-27 15:26:51 +02:00
Valeriano Manassero
97f1077072 Readme improvements (#12)
* better contributing guidelins

* added more info on repo itself
2021-07-27 13:30:55 +02:00
Valeriano Manassero
189de106c9 Kind data folder (#11)
* explain data folder for kind

* bump up version
2021-07-16 07:30:09 +02:00
Valeriano Manassero
d269374a49 One default agent (#10)
* one cpu only agent by default

* helm-docs update

* suggest kind for single done cluster

* bump up version

* fix trailing space
2021-07-15 17:34:29 +02:00
13 changed files with 468 additions and 134 deletions

View File

@@ -1,16 +1,60 @@
# Contributing to AllegroAI helm charts repository
# Guidelines for Contributing
:+1::tada: First off, thank you for taking the time to contribute! :tada::+1:
:+1::tada: Firstly, we thank you for taking the time to contribute! :tada::+1:
This page contains information about reporting issues as well as some tips and guidelines useful to experienced open source contributors.
Contribution comes in many forms:
* Reporting [issues](https://github.com/allegroai/clearml-helm-charts/issues) you've come upon
* Participating in issue discussions in the [issue tracker](https://github.com/allegroai/clearml-helm-charts/issues) and the [ClearML community slack space](https://join.slack.com/t/allegroai-trains/shared_invite/enQtOTQyMTI1MzQxMzE4LTY5NTUxOTY1NmQ1MzQ5MjRhMGRhZmM4ODE5NTNjMTg2NTBlZGQzZGVkMWU3ZDg1MGE1MjQxNDEzMWU2NmVjZmY)
* Suggesting new features or enhancements
* Implementing new features or fixing outstanding issues
## Reporting issues
A great way to contribute to the project is to send a detailed report when you encounter an issue. We always appreciate a well-written, thorough bug report, and will thank you for it!
The following is a set of guidelines for contributing to ClearML.
These are primarily guidelines, not rules.
Use your best judgment and feel free to propose changes to this document in a pull request.
Check that our [issue database](https://github.com/allegroai/clearml-helm-charts/issues) doesn't already include that problem or suggestion before submitting an issue. If you find a match, you can use the "subscribe" button to get notified on updates. Do not leave random "+1" or "I have this too" comments, as they only clutter the discussion, and don't help resolving it. However, if you have ways to reproduce the issue or have additional information that may help resolving the issue, please leave a comment.
## Reporting Issues
By following these guidelines, you help maintainers and the community understand your report, reproduce the behavior, and find related reports.
## Contributing code
Pull requests are always welcome! Not sure if that typo is worth a pull request? Found a bug and know how to fix it? Do it! We appreciate the help. Any significant improvement should be documented as [a GitHub issue](https://github.com/allegroai/clearml-helm-charts/issues) before starting working on it.
Before reporting an issue, please check whether it already appears [here](https://github.com/allegroai/clearml-helm-charts/issues).
If it does, join the on-going discussion instead.
We are always thrilled to receive pull requests and we do our best to process them quickly and provide feedback
**Note**: If you find a **Closed** issue that may be the same issue which you are currently experiencing,
then open a **New** issue and include a link to the original (Closed) issue in the body of your new one.
When reporting an issue, please include as much detail as possible: explain the problem and include additional details to help maintainers reproduce the problem:
* **Use a clear and descriptive title** for the issue to identify the problem.
* **Describe the exact steps necessary to reproduce the problem** in as much detail as possible. Please do not just summarize what you did. Make sure to explain how you did it.
* **Provide the specific environment setup.** Include the `pip freeze` output, specific environment variables, Python version, and other relevant information.
* **Provide specific examples to demonstrate the steps.** Include links to files or GitHub projects, or copy/paste snippets which you use in those examples.
* **If you are reporting any ClearML crash,** include a crash report with a stack trace from the operating system. Make sure to add the crash report in the issue and place it in a [code block](https://help.github.com/en/articles/getting-started-with-writing-and-formatting-on-github#multiple-lines),
a [file attachment](https://help.github.com/articles/file-attachments-on-issues-and-pull-requests/), or just put it in a [gist](https://gist.github.com/) (and provide link to that gist).
* **Describe the behavior you observed after following the steps** and the exact problem with that behavior.
* **Explain which behavior you expected to see and why.**
* **For Web-App issues, please include screenshots and animated GIFs** which recreate the described steps and clearly demonstrate the problem. You can use [LICEcap](https://www.cockos.com/licecap/) to record GIFs on macOS and Windows, and [silentcast](https://github.com/colinkeenan/silentcast) or [byzanz](https://github.com/threedaymonk/byzanz) on Linux.
## Suggesting New Features and Enhancements
By following these guidelines, you help maintainers and the community understand your suggestion and find related suggestions.
Enhancement suggestions are tracked as GitHub issues. After you determine which repository your enhancement suggestion is related to, create an issue on that repository and provide the following:
* **A clear and descriptive title** for the issue to identify the suggestion.
* **A step-by-step description of the suggested enhancement** in as much detail as possible.
* **Specific examples to demonstrate the steps.** Include copy/pasteable snippets which you use in those examples as [Markdown code blocks](https://help.github.com/articles/markdown-basics/#multiple-lines).
* **Describe the current behavior and explain which behavior you expected to see instead and why.**
* **Include screenshots or animated GIFs** which help you demonstrate the steps or point out the part of ClearML which the suggestion is related to. You can use [LICEcap](https://www.cockos.com/licecap/) to record GIFs on macOS and Windows, and [silentcast](https://github.com/colinkeenan/silentcast) or [byzanz](https://github.com/threedaymonk/byzanz) on Linux.
## Pull Requests
Before you submit a new PR:
* Verify the work you plan to merge addresses an existing [issue](https://github.com/allegroai/clearml-helm-charts/issues) (If not, open a new one)
* Check related discussions in the [ClearML slack community](https://join.slack.com/t/allegroai-trains/shared_invite/enQtOTQyMTI1MzQxMzE4LTY5NTUxOTY1NmQ1MzQ5MjRhMGRhZmM4ODE5NTNjMTg2NTBlZGQzZGVkMWU3ZDg1MGE1MjQxNDEzMWU2NmVjZmY) (Or start your own discussion on the `#clearml-dev` channel)
* Make sure your code conforms to the ClearML coding standards by running:
`flake8 --max-line-length=120 --statistics --show-source --extend-ignore=E501 ./clearml*`
In your PR include:
* A reference to the issue it addresses
* A brief description of the approach you've taken for implementing

View File

@@ -1,13 +1,55 @@
# ClearML Helm Charts Library for Kubernetes
## Auto-Magical Experiment Manager & Version Control for AI
Helm charts provided by [Allegro AI](https://clear.ml), ready to launch on Kubernetes using [Kubernetes Helm](https://github.com/helm/helm).
## Introduction
The **clearml-server** is the backend service infrastructure for [ClearML](https://github.com/allegroai/clearml).
It allows multiple users to collaborate and manage their experiments.
By default, **ClearML** is set up to work with the **ClearML** demo server, which is open to anyone and resets periodically.
In order to host your own server, you will need to install **clearml-server** and point **ClearML** to it.
**clearml-server** contains the following components:
* The **ClearML** Web-App, a single-page UI for experiment management and browsing
* RESTful API for:
* Documenting and logging experiment information, statistics and results
* Querying experiments history, logs and results
* Locally-hosted file server for storing images and models making them easily accessible using the Web-App
Use this repository to deploy **clearml-server** on Kubernetes clusters.
## Provided in this repository
### [All around Helm Chart](https://github.com/allegroai/clearml-helm-charts/tree/main/charts/clearml)
## Who We Are
ClearML is supported by the team behind *allegro.ai*,
where we build deep learning pipelines and infrastructure for enterprise companies.
We built ClearML to track and control the glorious but messy process of training production-grade deep learning models.
We are committed to vigorously supporting and expanding the capabilities of ClearML.
We promise to always be backwardly compatible, making sure all your logs, data and pipelines
will always upgrade with you.
## License
Server Side Public License, Version 1 (see the [LICENSE](https://en.wikipedia.org/wiki/Server_Side_Public_License) for more information)
## Requirements
### Setup a Kubernetes Cluster
For setting up Kubernetes on various platforms refer to the Kubernetes [getting started guide](http://kubernetes.io/docs/getting-started-guides/).
### Setup a single node LOCAL Kubernetes on laptop/desktop
For setting up Kubernetes on your laptop/desktop we suggest [kind](https://kind.sigs.k8s.io).
### Install Helm
Helm is a tool for managing Kubernetes charts. Charts are packages of pre-configured Kubernetes resources.
@@ -22,3 +64,20 @@ $ helm repo update
$ helm search repo allegroai
$ helm install <release-name> allegroai/<chart>
```
## Documentation, Community & Support
More information in the [official documentation](https://allegro.ai/clearml/docs) and [on YouTube](https://www.youtube.com/c/ClearML).
If you have any questions: post on our [Slack Channel](https://join.slack.com/t/clearml/shared_invite/zt-c0t13pty-aVUZZW1TSSSg2vyIGVPBhg), or tag your questions on [stackoverflow](https://stackoverflow.com/questions/tagged/clearml) with '**[clearml](https://stackoverflow.com/questions/tagged/clearml)**' tag (*previously [trains](https://stackoverflow.com/questions/tagged/trains) tag*).
For feature requests or bug reports, please use [GitHub issues](https://github.com/allegroai/clearml-helm-charts/issues).
Additionally, you can always find us at *clearml@allegro.ai*
## Contributing
**PRs are always welcomed** :heart: See more details in the ClearML [Guidelines for Contributing](https://github.com/allegroai/clearml-helm-charts/blob/main/CONTRIBUTING.md).
_May the force (and the goddess of learning rates) be with you!_

View File

@@ -2,8 +2,8 @@ apiVersion: v2
name: clearml
description: MLOps platform
type: application
version: "2.0.0-alpha2"
appVersion: "1.0.2"
version: "2.2.3"
appVersion: "1.1.1"
home: https://clear.ml
icon: https://raw.githubusercontent.com/allegroai/clearml/master/docs/clearml-logo.svg
sources:

View File

@@ -1,6 +1,6 @@
# ClearML Ecosystem for Kubernetes
![Version: 2.0.0-alpha2](https://img.shields.io/badge/Version-2.0.0--alpha2-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.0.2](https://img.shields.io/badge/AppVersion-1.0.2-informational?style=flat-square)
![Version: 2.2.3](https://img.shields.io/badge/Version-2.2.3-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.1.1](https://img.shields.io/badge/AppVersion-1.1.1-informational?style=flat-square)
MLOps platform
@@ -16,8 +16,6 @@ MLOps platform
The **clearml-server** is the backend service infrastructure for [ClearML](https://github.com/allegroai/clearml).
It allows multiple users to collaborate and manage their experiments.
By default, *ClearML is set up to work with the ClearML Demo Server, which is open to anyone and resets periodically.
In order to host your own server, you will need to install **clearml-server** and point ClearML to it.
**clearml-server** contains the following components:
@@ -27,33 +25,63 @@ In order to host your own server, you will need to install **clearml-server** an
* Querying experiments history, logs and results
* Locally-hosted file server for storing images and models making them easily accessible using the Web-App
## Port Mapping
## Local environment
After **clearml-server** is deployed, the services expose the following node ports:
For development/evaluation it's possible to use [kind](https://kind.sigs.k8s.io).
After installation, following commands will create a complete ClearML insatllation:
```
mkdir -pm 777 /tmp/clearml-kind
cat <<EOF > /tmp/clearml-kind.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
extraPortMappings:
- containerPort: 30008
hostPort: 30008
listenAddress: "127.0.0.1"
protocol: TCP
- containerPort: 30080
hostPort: 30080
listenAddress: "127.0.0.1"
protocol: TCP
- containerPort: 30081
hostPort: 30081
listenAddress: "127.0.0.1"
protocol: TCP
extraMounts:
- hostPath: /tmp/clearml-kind/
containerPath: /var/local-path-provisioner
EOF
kind create cluster --config /tmp/clearml-kind.yaml
helm install clearml allegroai/clearml
```
After deployment, the services will be exposed on localhost on the following ports:
* API server on `30008`
* Web server on `30080`
* File server on `30081`
## Accessing ClearML Server
Data persisted in every Kubernetes volume by ClearML will be accessible in /tmp/clearml-kind folder on the host.
Access **clearml-server** by creating a load balancer and domain name with records pointing to the load balancer.
## Production cluster environment
Once you have a load balancer and domain name set up, follow these steps to configure access to clearml-server on your k8s cluster:
In a production environment it's suggested to install an ingress controller and verify that is working correctly.
During ClearML deployment enable `ingress` section of chart values.
This will create 3 ingress rules:
1. Create domain records
* `app.<your domain name>`
* `files.<your domain name>`
* `api.<your domain name>`
* Create 3 records to be used for Web-App, File server and API access using the following rules:
* `app.<your domain name>`
* `files.<your domain name>`
* `api.<your domain name>`
(*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)
2. Point the records you created to the load balancer
3. Configure the load balancer to redirect traffic coming from the records you created:
* `app.<your domain name>` should be redirected to k8s cluster nodes on port `30080`
* `files.<your domain name>` should be redirected to k8s cluster nodes on port `30081`
* `api.<your domain name>` should be redirected to k8s cluster nodes on port `30008`
(*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)
Just pointing the domain records to the IP where ingress controller is responding will complete the deployment process.
## Additional Configuration for ClearML Server
@@ -81,28 +109,52 @@ For detailed instructions, see the [Optional Configuration](https://github.com/a
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| agentGroups.agent-group0.affinity | object | `{}` | |
| agentGroups.agent-group0.agentVersion | string | `""` | |
| agentGroups.agent-group0.awsAccessKeyId | string | `nil` | |
| agentGroups.agent-group0.awsDefaultRegion | string | `nil` | |
| agentGroups.agent-group0.awsSecretAccessKey | string | `nil` | |
| agentGroups.agent-group0.azureStorageAccount | string | `nil` | |
| agentGroups.agent-group0.azureStorageKey | string | `nil` | |
| agentGroups.agent-group0.clearmlAccessKey | string | `nil` | |
| agentGroups.agent-group0.clearmlConfig | string | `"sdk {\n}"` | |
| agentGroups.agent-group0.clearmlGitPassword | string | `nil` | |
| agentGroups.agent-group0.clearmlGitUser | string | `nil` | |
| agentGroups.agent-group0.clearmlSecretKey | string | `nil` | |
| agentGroups.agent-group0.image.pullPolicy | string | `"IfNotPresent"` | |
| agentGroups.agent-group0.image.repository | string | `"nvidia/cuda"` | |
| agentGroups.agent-group0.image.tag | string | `"11.0-base-ubuntu18.04"` | |
| agentGroups.agent-group0.name | string | `"agent-group0"` | |
| agentGroups.agent-group0.nodeSelector | object | `{}` | |
| agentGroups.agent-group0.nvidiaGpusPerAgent | int | `1` | |
| agentGroups.agent-group0.podAnnotations | object | `{}` | |
| agentGroups.agent-group0.queues | string | `"default"` | |
| agentGroups.agent-group0.replicaCount | int | `0` | |
| agentGroups.agent-group0.tolerations | list | `[]` | |
| agentGroups.agent-group-cpu.affinity | object | `{}` | |
| agentGroups.agent-group-cpu.agentVersion | string | `""` | |
| agentGroups.agent-group-cpu.awsAccessKeyId | string | `nil` | |
| agentGroups.agent-group-cpu.awsDefaultRegion | string | `nil` | |
| agentGroups.agent-group-cpu.awsSecretAccessKey | string | `nil` | |
| agentGroups.agent-group-cpu.azureStorageAccount | string | `nil` | |
| agentGroups.agent-group-cpu.azureStorageKey | string | `nil` | |
| agentGroups.agent-group-cpu.clearmlAccessKey | string | `nil` | |
| agentGroups.agent-group-cpu.clearmlConfig | string | `"sdk {\n}"` | |
| agentGroups.agent-group-cpu.clearmlGitPassword | string | `nil` | |
| agentGroups.agent-group-cpu.clearmlGitUser | string | `nil` | |
| agentGroups.agent-group-cpu.clearmlSecretKey | string | `nil` | |
| agentGroups.agent-group-cpu.image.pullPolicy | string | `"IfNotPresent"` | |
| agentGroups.agent-group-cpu.image.repository | string | `"ubuntu"` | |
| agentGroups.agent-group-cpu.image.tag | string | `"18.04"` | |
| agentGroups.agent-group-cpu.name | string | `"agent-group-cpu"` | |
| agentGroups.agent-group-cpu.nodeSelector | object | `{}` | |
| agentGroups.agent-group-cpu.nvidiaGpusPerAgent | int | `0` | |
| agentGroups.agent-group-cpu.podAnnotations | object | `{}` | |
| agentGroups.agent-group-cpu.queues | string | `"default"` | |
| agentGroups.agent-group-cpu.replicaCount | int | `1` | |
| agentGroups.agent-group-cpu.tolerations | list | `[]` | |
| agentGroups.agent-group-cpu.updateStrategy | string | `"Recreate"` | |
| agentGroups.agent-group-gpu.affinity | object | `{}` | |
| agentGroups.agent-group-gpu.agentVersion | string | `""` | |
| agentGroups.agent-group-gpu.awsAccessKeyId | string | `nil` | |
| agentGroups.agent-group-gpu.awsDefaultRegion | string | `nil` | |
| agentGroups.agent-group-gpu.awsSecretAccessKey | string | `nil` | |
| agentGroups.agent-group-gpu.azureStorageAccount | string | `nil` | |
| agentGroups.agent-group-gpu.azureStorageKey | string | `nil` | |
| agentGroups.agent-group-gpu.clearmlAccessKey | string | `nil` | |
| agentGroups.agent-group-gpu.clearmlConfig | string | `"sdk {\n}"` | |
| agentGroups.agent-group-gpu.clearmlGitPassword | string | `nil` | |
| agentGroups.agent-group-gpu.clearmlGitUser | string | `nil` | |
| agentGroups.agent-group-gpu.clearmlSecretKey | string | `nil` | |
| agentGroups.agent-group-gpu.image.pullPolicy | string | `"IfNotPresent"` | |
| agentGroups.agent-group-gpu.image.repository | string | `"nvidia/cuda"` | |
| agentGroups.agent-group-gpu.image.tag | string | `"11.0-base-ubuntu18.04"` | |
| agentGroups.agent-group-gpu.name | string | `"agent-group-gpu"` | |
| agentGroups.agent-group-gpu.nodeSelector | object | `{}` | |
| agentGroups.agent-group-gpu.nvidiaGpusPerAgent | int | `1` | |
| agentGroups.agent-group-gpu.podAnnotations | object | `{}` | |
| agentGroups.agent-group-gpu.queues | string | `"default"` | |
| agentGroups.agent-group-gpu.replicaCount | int | `0` | |
| agentGroups.agent-group-gpu.tolerations | list | `[]` | |
| agentGroups.agent-group-gpu.updateStrategy | string | `"Recreate"` | |
| agentservices.affinity | object | `{}` | |
| agentservices.agentVersion | string | `""` | |
| agentservices.awsAccessKeyId | string | `nil` | |
@@ -133,7 +185,7 @@ For detailed instructions, see the [Optional Configuration](https://github.com/a
| apiserver.extraEnvs | list | `[]` | |
| apiserver.image.pullPolicy | string | `"IfNotPresent"` | |
| apiserver.image.repository | string | `"allegroai/clearml"` | |
| apiserver.image.tag | string | `"1.0.2"` | |
| apiserver.image.tag | string | `"1.1.1"` | |
| apiserver.livenessDelay | int | `60` | |
| apiserver.nodeSelector | object | `{}` | |
| apiserver.podAnnotations | object | `{}` | |
@@ -189,7 +241,7 @@ For detailed instructions, see the [Optional Configuration](https://github.com/a
| fileserver.extraEnvs | list | `[]` | |
| fileserver.image.pullPolicy | string | `"IfNotPresent"` | |
| fileserver.image.repository | string | `"allegroai/clearml"` | |
| fileserver.image.tag | string | `"1.0.2"` | |
| fileserver.image.tag | string | `"1.1.1"` | |
| fileserver.nodeSelector | object | `{}` | |
| fileserver.podAnnotations | object | `{}` | |
| fileserver.replicaCount | int | `1` | |
@@ -202,6 +254,9 @@ For detailed instructions, see the [Optional Configuration](https://github.com/a
| ingress.annotations | object | `{}` | |
| ingress.enabled | bool | `false` | |
| ingress.host | string | `""` | |
| ingress.hostPrefixApi | string | `"api."` | |
| ingress.hostPrefixApp | string | `"app."` | |
| ingress.hostPrefixFiles | string | `"files."` | |
| ingress.name | string | `"clearml-server-ingress"` | |
| ingress.tls.secretName | string | `""` | |
| mongodb.architecture | string | `"standalone"` | |
@@ -228,7 +283,7 @@ For detailed instructions, see the [Optional Configuration](https://github.com/a
| webserver.extraEnvs | list | `[]` | |
| webserver.image.pullPolicy | string | `"IfNotPresent"` | |
| webserver.image.repository | string | `"allegroai/clearml"` | |
| webserver.image.tag | string | `"1.0.2"` | |
| webserver.image.tag | string | `"1.1.1"` | |
| webserver.nodeSelector | object | `{}` | |
| webserver.podAnnotations | object | `{}` | |
| webserver.replicaCount | int | `1` | |

View File

@@ -13,8 +13,6 @@
The **clearml-server** is the backend service infrastructure for [ClearML](https://github.com/allegroai/clearml).
It allows multiple users to collaborate and manage their experiments.
By default, *ClearML is set up to work with the ClearML Demo Server, which is open to anyone and resets periodically.
In order to host your own server, you will need to install **clearml-server** and point ClearML to it.
**clearml-server** contains the following components:
@@ -24,33 +22,63 @@ In order to host your own server, you will need to install **clearml-server** an
* Querying experiments history, logs and results
* Locally-hosted file server for storing images and models making them easily accessible using the Web-App
## Port Mapping
## Local environment
After **clearml-server** is deployed, the services expose the following node ports:
For development/evaluation it's possible to use [kind](https://kind.sigs.k8s.io).
After installation, following commands will create a complete ClearML insatllation:
```
mkdir -pm 777 /tmp/clearml-kind
cat <<EOF > /tmp/clearml-kind.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
extraPortMappings:
- containerPort: 30008
hostPort: 30008
listenAddress: "127.0.0.1"
protocol: TCP
- containerPort: 30080
hostPort: 30080
listenAddress: "127.0.0.1"
protocol: TCP
- containerPort: 30081
hostPort: 30081
listenAddress: "127.0.0.1"
protocol: TCP
extraMounts:
- hostPath: /tmp/clearml-kind/
containerPath: /var/local-path-provisioner
EOF
kind create cluster --config /tmp/clearml-kind.yaml
helm install clearml allegroai/clearml
```
After deployment, the services will be exposed on localhost on the following ports:
* API server on `30008`
* Web server on `30080`
* File server on `30081`
## Accessing ClearML Server
Data persisted in every Kubernetes volume by ClearML will be accessible in /tmp/clearml-kind folder on the host.
Access **clearml-server** by creating a load balancer and domain name with records pointing to the load balancer.
## Production cluster environment
Once you have a load balancer and domain name set up, follow these steps to configure access to clearml-server on your k8s cluster:
In a production environment it's suggested to install an ingress controller and verify that is working correctly.
During ClearML deployment enable `ingress` section of chart values.
This will create 3 ingress rules:
1. Create domain records
* `app.<your domain name>`
* `files.<your domain name>`
* `api.<your domain name>`
* Create 3 records to be used for Web-App, File server and API access using the following rules:
* `app.<your domain name>`
* `files.<your domain name>`
* `api.<your domain name>`
(*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)
2. Point the records you created to the load balancer
3. Configure the load balancer to redirect traffic coming from the records you created:
* `app.<your domain name>` should be redirected to k8s cluster nodes on port `30080`
* `files.<your domain name>` should be redirected to k8s cluster nodes on port `30081`
* `api.<your domain name>` should be redirected to k8s cluster nodes on port `30008`
(*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)
Just pointing the domain records to the IP where ingress controller is responding will complete the deployment process.
## Additional Configuration for ClearML Server

View File

@@ -95,3 +95,48 @@ Create the name of the service account to use
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}
{{/*
Create the name of the App service to use
*/}}
{{- define "clearml.serviceApp" -}}
{{- if .Values.ingress.enabled }}
{{- if .Values.ingress.tls.secretName }}
{{- printf "%s%s%s" "https://" .Values.ingress.hostPrefixApp .Values.ingress.host }}
{{- else }}
{{- printf "%s%s%s" "http://" .Values.ingress.hostPrefixApp .Values.ingress.host }}
{{- end }}
{{- else }}
{{- printf "%s%s%s%s" "http://" (include "clearml.fullname" .) "-webserver:" (.Values.webserver.service.port | quote) }}
{{- end }}
{{- end }}
{{/*
Create the name of the Api service to use
*/}}
{{- define "clearml.serviceApi" -}}
{{- if .Values.ingress.enabled }}
{{- if .Values.ingress.tls.secretName }}
{{- printf "%s%s%s" "https://" .Values.ingress.hostPrefixApi .Values.ingress.host }}
{{- else }}
{{- printf "%s%s%s" "http://" .Values.ingress.hostPrefixApi .Values.ingress.host }}
{{- end }}
{{- else }}
{{- printf "%s%s%s%s" "http://" (include "clearml.fullname" .) "-apiserver:" (.Values.apiserver.service.port | quote) }}
{{- end }}
{{- end }}
{{/*
Create the name of the Files service to use
*/}}
{{- define "clearml.serviceFiles" -}}
{{- if .Values.ingress.enabled }}
{{- if .Values.ingress.tls.secretName }}
{{- printf "%s%s%s" "https://" .Values.ingress.hostPrefixFiles .Values.ingress.host }}
{{- else }}
{{- printf "%s%s%s" "http://" .Values.ingress.hostPrefixFiles .Values.ingress.host }}
{{- end }}
{{- else }}
{{- printf "%s%s%s%s" "http://" (include "clearml.fullname" .) "-fileserver:" (.Values.fileserver.service.port | quote) }}
{{- end }}
{{- end }}

View File

@@ -9,6 +9,8 @@ metadata:
{{- include "clearml.labels" $ | nindent 4 }}
spec:
replicas: {{ .replicaCount }}
strategy:
type: {{ .updateStrategy }}
selector:
matchLabels:
{{- include "clearml.selectorLabelsAgent" $ | nindent 6 }}
@@ -38,7 +40,7 @@ spec:
- -c
- >
set -x;
while [ $(curl -sw '%{http_code}' "http://{{ include "clearml.fullname" $ }}-apiserver:{{ $.Values.apiserver.service.port }}/debug.ping" -o /dev/null) -ne 200 ] ; do
while [ $(curl -sw '%{http_code}' "{{ include "clearml.serviceApi" $ }}/debug.ping" -o /dev/null) -ne 200 ] ; do
echo "waiting for apiserver" ;
sleep 5 ;
done
@@ -54,11 +56,11 @@ spec:
{{ .nvidiaGpusPerAgent }}
env:
- name: CLEARML_API_HOST
value: 'http://{{ include "clearml.fullname" $ }}-apiserver:{{ $.Values.apiserver.service.port }}'
value: {{ include "clearml.serviceApi" $ }}
- name: CLEARML_WEB_HOST
value: 'http://{{ include "clearml.fullname" $ }}-webserver:{{ $.Values.webserver.service.port }}'
value: {{ include "clearml.serviceApp" $ }}
- name: CLEARML_FILES_HOST
value: 'http://{{ include "clearml.fullname" $ }}-fileserver:{{ $.Values.fileserver.service.port }}'
value: {{ include "clearml.serviceFiles" $ }}
- name: CLEARML_AGENT_GIT_USER
value: {{ .clearmlGitUser}}
- name: CLEARML_AGENT_GIT_PASS
@@ -91,6 +93,13 @@ spec:
python3 -m pip install -U pip ;
python3 -m pip install clearml-agent{{ .agentVersion}} ;
CLEARML_AGENT_K8S_HOST_MOUNT=/root/.clearml:/root/.clearml clearml-agent daemon --queue {{ .queues}}"
{{ if .clearmlConfig }}
volumeMounts:
- name: agent-clearml-conf-volume
mountPath: /root/clearml.conf
subPath: clearml.conf
readOnly: true
{{- end }}
{{- with .nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}

View File

@@ -30,7 +30,7 @@ spec:
- -c
- >
set -x;
while [ $(curl -sw '%{http_code}' "http://{{ include "clearml.fullname" . }}-apiserver:{{ .Values.apiserver.service.port }}/debug.ping" -o /dev/null) -ne 200 ] ; do
while [ $(curl -sw '%{http_code}' "{{ include "clearml.serviceApi" $ }}/debug.ping" -o /dev/null) -ne 200 ] ; do
echo "waiting for apiserver" ;
sleep 5 ;
done
@@ -42,7 +42,7 @@ spec:
- name: CLEARML_HOST_IP
value: {{ .Values.agentservices.clearmlHostIp }}
- name: CLEARML_API_HOST
value: "http://{{ include "clearml.fullname" . }}-apiserver:{{ .Values.apiserver.service.port }}"
value: {{ include "clearml.serviceApi" $ }}
- name: CLEARML_WEB_HOST
value: {{ .Values.agentservices.clearmlWebHost }}
- name: CLEARML_FILES_HOST

View File

@@ -0,0 +1,35 @@
{{- if .Values.ingress.enabled -}}
{{- $fullName := include "clearml.fullname" . -}}
{{- if semverCompare ">=1.19-0" .Capabilities.KubeVersion.GitVersion -}}
apiVersion: networking.k8s.io/v1
{{- else if semverCompare ">=1.14-0" .Capabilities.KubeVersion.GitVersion -}}
apiVersion: networking.k8s.io/v1beta1
{{- else -}}
apiVersion: extensions/v1beta1
{{- end }}
kind: Ingress
metadata:
name: {{ $fullName }}-api
labels:
{{- include "clearml.labels" . | nindent 4 }}
{{- with .Values.ingress.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
{{- if .Values.ingress.tls.secretName }}
tls:
- hosts:
- "{{ .Values.ingress.hostPrefixApi }}{{ .Values.ingress.host }}"
secretName: {{ .Values.ingress.tls.secretName }}
{{- end }}
rules:
- host: "{{ .Values.ingress.hostPrefixApi }}{{ .Values.ingress.host }}"
http:
paths:
- path: "/"
pathType: Prefix
backend:
serviceName: {{ include "clearml.fullname" . }}-apiserver
servicePort: {{ .Values.apiserver.service.port }}
{{- end }}

View File

@@ -0,0 +1,35 @@
{{- if .Values.ingress.enabled -}}
{{- $fullName := include "clearml.fullname" . -}}
{{- if semverCompare ">=1.19-0" .Capabilities.KubeVersion.GitVersion -}}
apiVersion: networking.k8s.io/v1
{{- else if semverCompare ">=1.14-0" .Capabilities.KubeVersion.GitVersion -}}
apiVersion: networking.k8s.io/v1beta1
{{- else -}}
apiVersion: extensions/v1beta1
{{- end }}
kind: Ingress
metadata:
name: {{ $fullName }}-app
labels:
{{- include "clearml.labels" . | nindent 4 }}
{{- with .Values.ingress.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
{{- if .Values.ingress.tls.secretName }}
tls:
- hosts:
- "{{ .Values.ingress.hostPrefixApp }}{{ .Values.ingress.host }}"
secretName: {{ .Values.ingress.tls.secretName }}
{{- end }}
rules:
- host: "{{ .Values.ingress.hostPrefixApp }}{{ .Values.ingress.host }}"
http:
paths:
- path: "/"
pathType: Prefix
backend:
serviceName: {{ include "clearml.fullname" . }}-webserver
servicePort: {{ .Values.webserver.service.port }}
{{- end }}

View File

@@ -0,0 +1,35 @@
{{- if .Values.ingress.enabled -}}
{{- $fullName := include "clearml.fullname" . -}}
{{- if semverCompare ">=1.19-0" .Capabilities.KubeVersion.GitVersion -}}
apiVersion: networking.k8s.io/v1
{{- else if semverCompare ">=1.14-0" .Capabilities.KubeVersion.GitVersion -}}
apiVersion: networking.k8s.io/v1beta1
{{- else -}}
apiVersion: extensions/v1beta1
{{- end }}
kind: Ingress
metadata:
name: {{ $fullName }}-files
labels:
{{- include "clearml.labels" . | nindent 4 }}
{{- with .Values.ingress.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
{{- if .Values.ingress.tls.secretName }}
tls:
- hosts:
- "{{ .Values.ingress.hostPrefixFiles }}{{ .Values.ingress.host }}"
secretName: {{ .Values.ingress.tls.secretName }}
{{- end }}
rules:
- host: "{{ .Values.ingress.hostPrefixFiles }}{{ .Values.ingress.host }}"
http:
paths:
- path: "/"
pathType: Prefix
backend:
serviceName: {{ include "clearml.fullname" . }}-fileserver
servicePort: {{ .Values.fileserver.service.port }}
{{- end }}

View File

@@ -1,48 +0,0 @@
{{- if .Values.ingress.enabled -}}
{{- $fullName := include "clearml.fullname" . -}}
{{- if semverCompare ">=1.14-0" .Capabilities.KubeVersion.GitVersion -}}
apiVersion: networking.k8s.io/v1beta1
{{- else -}}
apiVersion: extensions/v1beta1
{{- end }}
kind: Ingress
metadata:
name: {{ $fullName }}
labels:
{{- include "clearml.labels" . | nindent 4 }}
{{- with .Values.ingress.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
{{- if .Values.ingress.tls.secretName }}
tls:
- hosts:
- "app.{{ .Values.ingress.host }}"
- "files.{{ .Values.ingress.host }}"
- "api.{{ .Values.ingress.host }}"
secretName: {{ .Values.ingress.tls.secretName }}
{{- end }}
rules:
- host: "app.{{ .Values.ingress.host }}"
http:
paths:
- path: "/*"
backend:
serviceName: {{ include "clearml.fullname" . }}-webserver
servicePort: {{ .Values.webserver.service.port }}
- host: "api.{{ .Values.ingress.host }}"
http:
paths:
- path: "/*"
backend:
serviceName: {{ include "clearml.fullname" . }}-apiserver
servicePort: {{ .Values.apiserver.service.port }}
- host: "files.{{ .Values.ingress.host }}"
http:
paths:
- path: "/*"
backend:
serviceName: {{ include "clearml.fullname" . }}-fileserver
servicePort: {{ .Values.fileserver.service.port }}
{{- end }}

View File

@@ -5,6 +5,9 @@ ingress:
name: clearml-server-ingress
annotations: {}
host: ""
hostPrefixApp: "app."
hostPrefixApi: "api."
hostPrefixFiles: "files."
tls:
secretName: ""
@@ -26,7 +29,7 @@ apiserver:
image:
repository: "allegroai/clearml"
pullPolicy: IfNotPresent
tag: "1.0.2"
tag: "1.1.1"
extraEnvs: []
@@ -67,7 +70,7 @@ fileserver:
image:
repository: "allegroai/clearml"
pullPolicy: IfNotPresent
tag: "1.0.2"
tag: "1.1.1"
extraEnvs: []
@@ -108,7 +111,7 @@ webserver:
image:
repository: "allegroai/clearml"
pullPolicy: IfNotPresent
tag: "1.0.2"
tag: "1.1.1"
podAnnotations: {}
@@ -180,9 +183,43 @@ agentservices:
size: 50Gi
agentGroups:
agent-group0:
name: agent-group0
agent-group-cpu:
name: agent-group-cpu
replicaCount: 1
updateStrategy: Recreate
nvidiaGpusPerAgent: 0
agentVersion: "" # if set, it *MUST* include comparison operator (e.g. ">=0.16.1")
queues: "default" # multiple queues can be specified separated by a space (e.g. "important_jobs default")
clearmlGitUser: null
clearmlGitPassword: null
clearmlAccessKey: null
clearmlSecretKey: null
awsAccessKeyId: null
awsSecretAccessKey: null
awsDefaultRegion: null
azureStorageAccount: null
azureStorageKey: null
clearmlConfig: |-
sdk {
}
image:
repository: "ubuntu"
pullPolicy: IfNotPresent
tag: "18.04"
podAnnotations: {}
nodeSelector: {}
tolerations: []
affinity: {}
agent-group-gpu:
name: agent-group-gpu
replicaCount: 0
updateStrategy: Recreate
nvidiaGpusPerAgent: 1
agentVersion: "" # if set, it *MUST* include comparison operator (e.g. ">=0.16.1")
queues: "default" # multiple queues can be specified separated by a space (e.g. "important_jobs default")