Compare commits

...

20 Commits

Author SHA1 Message Date
Valeriano Manassero
979e73fe3d Fix ingress compat (#29)
* fix ingress compatibility with different k8s version

* bump up version
2021-09-16 10:54:25 +02:00
Valeriano Manassero
7352f35836 Helpers fix (#28)
* fix wrong service names

* bump up version
2021-09-16 09:11:58 +02:00
Valeriano Manassero
82ad17860d New ingress style (#27)
* new ingress style

* bump up version

* hostName fix

* helm-docs update
2021-09-16 08:51:07 +02:00
Valeriano Manassero
aa761dd450 Agent enable switch (#26)
* enable/disable switch

* bump up chart
2021-09-15 08:13:01 +02:00
Valeriano Manassero
7ff2f94d1a Apiserver configmap (#25)
* metadata name fix

* use toString

* use configmap for apiserver configs

* bump up version

* indentation fix

* fix trailing whitespaces
2021-09-14 15:43:10 +02:00
Valeriano Manassero
618a269c97 Fix service url generation (#21)
* service url generation functions

* use generation functions

* bump up version
2021-08-26 10:58:06 +02:00
Valeriano Manassero
3f215d2d90 Use many ingresses (#20)
* use many ingresses

* bump up version
2021-08-25 14:49:43 +02:00
Valeriano Manassero
03223fc1c1 Use Recreate as strategy (#19)
* use Recreate as strategy

* bump up version

* fix strategy indentation and position

* updateStrategy configurable

* updateStrategy parameter

* use 2.2.0 instead of patch release
2021-08-17 14:59:13 +02:00
Valeriano Manassero
898089b7fb Fix agent clearml conf (#18)
* fix agent mount clearml.conf

* bump up version
2021-08-14 12:00:23 +02:00
Valeriano Manassero
732bb970aa Configurable prefix ingress (#17)
* ingress configurable prefixes

* chart version bump up

* fix version number
2021-08-12 12:02:26 +03:00
Valeriano Manassero
91d45281fa bump up app version to 1.1.1 (#16) 2021-08-06 14:18:53 +02:00
Valeriano Manassero
28b6e9f4e4 Remove badges from root README since we have them in chart one (#14) 2021-07-27 16:18:02 +02:00
Valeriano Manassero
7f6df85ec5 Bump up version (#13)
* bump up to 1.1.0

* helm-docs update
2021-07-27 15:26:51 +02:00
Valeriano Manassero
97f1077072 Readme improvements (#12)
* better contributing guidelines

* added more info on repo itself
2021-07-27 13:30:55 +02:00
Valeriano Manassero
189de106c9 Kind data folder (#11)
* explain data folder for kind

* bump up version
2021-07-16 07:30:09 +02:00
Valeriano Manassero
d269374a49 One default agent (#10)
* one cpu only agent by default

* helm-docs update

* suggest kind for single node cluster

* bump up version

* fix trailing space
2021-07-15 17:34:29 +02:00
Valeriano Manassero
cc8789d71f Clearml chart readme improvements (#7)
* clearml chart LICENSE

* bump up version

* improved readme

* clearml chart name fix
2021-07-07 11:44:21 +02:00
Valeriano Manassero
6a2e3ed47e typo fixes (#6) 2021-07-07 09:39:04 +02:00
Valeriano Manassero
873fb6f7f0 added repo update reference (#5) 2021-07-07 09:28:30 +02:00
Valeriano Manassero
d6e967c9f5 filter release trigger (#4) 2021-07-07 09:20:48 +02:00
18 changed files with 634 additions and 149 deletions


@@ -4,6 +4,8 @@ on:
  push:
    branches:
      - main
    paths:
      - 'charts/**'
jobs:
  release:


@@ -1,16 +1,60 @@
# Contributing to AllegroAI helm charts repository
# Guidelines for Contributing
:+1::tada: First off, thank you for taking the time to contribute! :tada::+1:
:+1::tada: Firstly, we thank you for taking the time to contribute! :tada::+1:
This page contains information about reporting issues as well as some tips and guidelines useful to experienced open source contributors.
Contribution comes in many forms:
* Reporting [issues](https://github.com/allegroai/clearml-helm-charts/issues) you've come upon
* Participating in issue discussions in the [issue tracker](https://github.com/allegroai/clearml-helm-charts/issues) and the [ClearML community slack space](https://join.slack.com/t/allegroai-trains/shared_invite/enQtOTQyMTI1MzQxMzE4LTY5NTUxOTY1NmQ1MzQ5MjRhMGRhZmM4ODE5NTNjMTg2NTBlZGQzZGVkMWU3ZDg1MGE1MjQxNDEzMWU2NmVjZmY)
* Suggesting new features or enhancements
* Implementing new features or fixing outstanding issues
## Reporting issues
A great way to contribute to the project is to send a detailed report when you encounter an issue. We always appreciate a well-written, thorough bug report, and will thank you for it!
The following is a set of guidelines for contributing to ClearML.
These are primarily guidelines, not rules.
Use your best judgment and feel free to propose changes to this document in a pull request.
Check that our [issue database](https://github.com/allegroai/clearml-helm-charts/issues) doesn't already include that problem or suggestion before submitting an issue. If you find a match, you can use the "subscribe" button to get notified of updates. Do not leave random "+1" or "I have this too" comments, as they only clutter the discussion and don't help resolve it. However, if you have ways to reproduce the issue or have additional information that may help resolve it, please leave a comment.
## Reporting Issues
By following these guidelines, you help maintainers and the community understand your report, reproduce the behavior, and find related reports.
## Contributing code
Pull requests are always welcome! Not sure if that typo is worth a pull request? Found a bug and know how to fix it? Do it! We appreciate the help. Any significant improvement should be documented as [a GitHub issue](https://github.com/allegroai/clearml-helm-charts/issues) before you start working on it.
Before reporting an issue, please check whether it already appears [here](https://github.com/allegroai/clearml-helm-charts/issues).
If it does, join the on-going discussion instead.
We are always thrilled to receive pull requests, and we do our best to process them quickly and provide feedback.
**Note**: If you find a **Closed** issue that may be the same issue which you are currently experiencing,
then open a **New** issue and include a link to the original (Closed) issue in the body of your new one.
When reporting an issue, please include as much detail as possible: explain the problem and include additional details to help maintainers reproduce the problem:
* **Use a clear and descriptive title** for the issue to identify the problem.
* **Describe the exact steps necessary to reproduce the problem** in as much detail as possible. Please do not just summarize what you did. Make sure to explain how you did it.
* **Provide the specific environment setup.** Include the `pip freeze` output, specific environment variables, Python version, and other relevant information.
* **Provide specific examples to demonstrate the steps.** Include links to files or GitHub projects, or copy/paste snippets which you use in those examples.
* **If you are reporting any ClearML crash,** include a crash report with a stack trace from the operating system. Make sure to add the crash report in the issue and place it in a [code block](https://help.github.com/en/articles/getting-started-with-writing-and-formatting-on-github#multiple-lines),
a [file attachment](https://help.github.com/articles/file-attachments-on-issues-and-pull-requests/), or just put it in a [gist](https://gist.github.com/) (and provide link to that gist).
* **Describe the behavior you observed after following the steps** and the exact problem with that behavior.
* **Explain which behavior you expected to see and why.**
* **For Web-App issues, please include screenshots and animated GIFs** which recreate the described steps and clearly demonstrate the problem. You can use [LICEcap](https://www.cockos.com/licecap/) to record GIFs on macOS and Windows, and [silentcast](https://github.com/colinkeenan/silentcast) or [byzanz](https://github.com/threedaymonk/byzanz) on Linux.
## Suggesting New Features and Enhancements
By following these guidelines, you help maintainers and the community understand your suggestion and find related suggestions.
Enhancement suggestions are tracked as GitHub issues. After you determine which repository your enhancement suggestion is related to, create an issue on that repository and provide the following:
* **A clear and descriptive title** for the issue to identify the suggestion.
* **A step-by-step description of the suggested enhancement** in as much detail as possible.
* **Specific examples to demonstrate the steps.** Include copy/pasteable snippets which you use in those examples as [Markdown code blocks](https://help.github.com/articles/markdown-basics/#multiple-lines).
* **Describe the current behavior and explain which behavior you expected to see instead and why.**
* **Include screenshots or animated GIFs** which help you demonstrate the steps or point out the part of ClearML which the suggestion is related to. You can use [LICEcap](https://www.cockos.com/licecap/) to record GIFs on macOS and Windows, and [silentcast](https://github.com/colinkeenan/silentcast) or [byzanz](https://github.com/threedaymonk/byzanz) on Linux.
## Pull Requests
Before you submit a new PR:
* Verify the work you plan to merge addresses an existing [issue](https://github.com/allegroai/clearml-helm-charts/issues) (If not, open a new one)
* Check related discussions in the [ClearML slack community](https://join.slack.com/t/allegroai-trains/shared_invite/enQtOTQyMTI1MzQxMzE4LTY5NTUxOTY1NmQ1MzQ5MjRhMGRhZmM4ODE5NTNjMTg2NTBlZGQzZGVkMWU3ZDg1MGE1MjQxNDEzMWU2NmVjZmY) (Or start your own discussion on the `#clearml-dev` channel)
* Make sure your code conforms to the ClearML coding standards by running:
`flake8 --max-line-length=120 --statistics --show-source --extend-ignore=E501 ./clearml*`
In your PR include:
* A reference to the issue it addresses
* A brief description of the approach you've taken for implementing


@@ -1,13 +1,55 @@
# ClearML Helm Charts Library for Kubernetes
## Auto-Magical Experiment Manager & Version Control for AI
Helm charts provided by [Allegro AI](https://clear.ml), ready to launch on Kubernetes using [Kubernetes Helm](https://github.com/helm/helm).
## Introduction
The **clearml-server** is the backend service infrastructure for [ClearML](https://github.com/allegroai/clearml).
It allows multiple users to collaborate and manage their experiments.
By default, **ClearML** is set up to work with the **ClearML** demo server, which is open to anyone and resets periodically.
In order to host your own server, you will need to install **clearml-server** and point **ClearML** to it.
**clearml-server** contains the following components:
* The **ClearML** Web-App, a single-page UI for experiment management and browsing
* RESTful API for:
  * Documenting and logging experiment information, statistics and results
  * Querying experiment history, logs and results
* Locally-hosted file server for storing images and models, making them easily accessible using the Web-App
Use this repository to deploy **clearml-server** on Kubernetes clusters.
## Provided in this repository
### [All around Helm Chart](https://github.com/allegroai/clearml-helm-charts/tree/main/charts/clearml)
## Who We Are
ClearML is supported by the team behind *allegro.ai*,
where we build deep learning pipelines and infrastructure for enterprise companies.
We built ClearML to track and control the glorious but messy process of training production-grade deep learning models.
We are committed to vigorously supporting and expanding the capabilities of ClearML.
We promise to always be backward compatible, making sure all your logs, data and pipelines
will always upgrade with you.
## License
Server Side Public License, Version 1 (see the [LICENSE](https://en.wikipedia.org/wiki/Server_Side_Public_License) for more information)
## Requirements
### Set up a Kubernetes Cluster
To set up Kubernetes on various platforms, refer to the Kubernetes [getting started guide](http://kubernetes.io/docs/getting-started-guides/).
### Set up a single-node LOCAL Kubernetes cluster on a laptop/desktop
To set up Kubernetes on your laptop/desktop, we suggest [kind](https://kind.sigs.k8s.io).
### Install Helm
Helm is a tool for managing Kubernetes charts. Charts are packages of pre-configured Kubernetes resources.
@@ -18,6 +60,24 @@ To install Helm, refer to the [Helm install guide](https://github.com/helm/helm#
```bash
$ helm repo add allegroai https://allegroai.github.io/clearml-helm-charts
$ helm repo update
$ helm search repo allegroai
$ helm install my-release allegroai/<chart>
$ helm install <release-name> allegroai/<chart>
```
## Documentation, Community & Support
You can find more information in the [official documentation](https://allegro.ai/clearml/docs) and [on YouTube](https://www.youtube.com/c/ClearML).
If you have any questions, post on our [Slack Channel](https://join.slack.com/t/clearml/shared_invite/zt-c0t13pty-aVUZZW1TSSSg2vyIGVPBhg), or tag your questions on [Stack Overflow](https://stackoverflow.com/questions/tagged/clearml) with the '**[clearml](https://stackoverflow.com/questions/tagged/clearml)**' tag (*previously the [trains](https://stackoverflow.com/questions/tagged/trains) tag*).
For feature requests or bug reports, please use [GitHub issues](https://github.com/allegroai/clearml-helm-charts/issues).
Additionally, you can always find us at *clearml@allegro.ai*
## Contributing
**PRs are always welcome** :heart: See more details in the ClearML [Guidelines for Contributing](https://github.com/allegroai/clearml-helm-charts/blob/main/CONTRIBUTING.md).
_May the force (and the goddess of learning rates) be with you!_


@@ -2,8 +2,8 @@ apiVersion: v2
name: clearml
description: MLOps platform
type: application
version: "2.0.0-alpha1"
appVersion: "1.0.2"
version: "3.0.2"
appVersion: "1.1.1"
home: https://clear.ml
icon: https://raw.githubusercontent.com/allegroai/clearml/master/docs/clearml-logo.svg
sources:


@@ -9,7 +9,7 @@
TERMS AND CONDITIONS
0. Definitions.
“This License” refers to Server Side Public License.
“Copyright” also means copyright-like laws that apply to other kinds of
@@ -173,7 +173,7 @@
access or legal rights of the compilation's users beyond what the
individual works permit. Inclusion of a covered work in an aggregate does
not cause this License to apply to the other parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms of
@@ -185,7 +185,7 @@
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium customarily
used for software interchange.
b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a written
offer, valid for at least three years and valid for as long as you
@@ -196,12 +196,12 @@
for software interchange, for a price no more than your reasonable cost
of physically performing this conveying of source, or (2) access to
copy the Corresponding Source from a network server at no charge.
c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This alternative is
allowed only occasionally and noncommercially, and only if you received
the object code with such an offer, in accord with subsection 6b.
d) Convey the object code by offering access from a designated place
(gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
@@ -214,7 +214,7 @@
Regardless of what server hosts the Corresponding Source, you remain
obligated to ensure that it is available for as long as needed to
satisfy these requirements.
e) Convey the object code using peer-to-peer transmission, provided you
inform other peers where the object code and Corresponding Source of
the work are being offered to the general public at no charge under
@@ -496,7 +496,7 @@
application program interfaces, automation software, monitoring software,
backup software, storage software and hosting software, all such that a
user could run an instance of the service using the Service Source Code
you make available.
14. Revised Versions of this License.
@@ -532,9 +532,9 @@
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING
@@ -544,7 +544,7 @@
OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided above
@@ -553,5 +553,5 @@
waiver of all civil liability in connection with the Program, unless a
warranty or assumption of liability accompanies a copy of the Program in
return for a fee.
END OF TERMS AND CONDITIONS


@@ -1,6 +1,6 @@
# clearml
# ClearML Ecosystem for Kubernetes
![Version: 2.0.0-alpha1](https://img.shields.io/badge/Version-2.0.0--alpha1-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.0.2](https://img.shields.io/badge/AppVersion-1.0.2-informational?style=flat-square)
![Version: 3.0.2](https://img.shields.io/badge/Version-3.0.2-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.1.1](https://img.shields.io/badge/AppVersion-1.1.1-informational?style=flat-square)
MLOps platform
@@ -12,6 +12,86 @@ MLOps platform
| ---- | ------ | --- |
| valeriano-manassero | | https://github.com/valeriano-manassero |
## Introduction
The **clearml-server** is the backend service infrastructure for [ClearML](https://github.com/allegroai/clearml).
It allows multiple users to collaborate and manage their experiments.
**clearml-server** contains the following components:
* The ClearML Web-App, a single-page UI for experiment management and browsing
* RESTful API for:
  * Documenting and logging experiment information, statistics and results
  * Querying experiment history, logs and results
* Locally-hosted file server for storing images and models, making them easily accessible using the Web-App
## Local environment
For development or evaluation, you can use [kind](https://kind.sigs.k8s.io).
After installing kind, the following commands will create a complete ClearML installation:
```bash
mkdir -pm 777 /tmp/clearml-kind
cat <<EOF > /tmp/clearml-kind.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 30008
    hostPort: 30008
    listenAddress: "127.0.0.1"
    protocol: TCP
  - containerPort: 30080
    hostPort: 30080
    listenAddress: "127.0.0.1"
    protocol: TCP
  - containerPort: 30081
    hostPort: 30081
    listenAddress: "127.0.0.1"
    protocol: TCP
  extraMounts:
  - hostPath: /tmp/clearml-kind/
    containerPath: /var/local-path-provisioner
EOF
kind create cluster --config /tmp/clearml-kind.yaml
# Add the chart repository (skip if already added)
helm repo add allegroai https://allegroai.github.io/clearml-helm-charts
helm repo update
helm install clearml allegroai/clearml
```
After deployment, the services will be exposed on localhost on the following ports:
* API server on `30008`
* Web server on `30080`
* File server on `30081`
Data that ClearML persists in Kubernetes volumes will be accessible in the `/tmp/clearml-kind` folder on the host.
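To confirm that the three NodePorts listed above are reachable from the host, a small check like the following can help. It is a minimal sketch, not part of the chart: `check_port` and `check_clearml_ports` are hypothetical helper names, the script relies on bash's `/dev/tcp` redirection and GNU coreutils `timeout`, and an open port only shows that something accepts TCP connections, not that the ClearML service behind it is healthy.

```bash
#!/usr/bin/env bash
# Hypothetical smoke check for the ports mapped by the kind config above.

# check_port HOST PORT -> exit status 0 if a TCP connection succeeds.
check_port() {
  local host="$1" port="$2"
  # bash's /dev/tcp pseudo-device opens a TCP connection;
  # timeout stops the attempt after 2 seconds on filtered ports.
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# check_clearml_ports HOST NAME:PORT... -> prints one status line per service.
check_clearml_ports() {
  local host="$1"; shift
  local entry name port
  for entry in "$@"; do
    name="${entry%%:*}"   # text before the first ':'
    port="${entry##*:}"   # text after the last ':'
    if check_port "$host" "$port"; then
      echo "${name} (${port}): open"
    else
      echo "${name} (${port}): closed"
    fi
  done
}

# Example: check_clearml_ports 127.0.0.1 api:30008 web:30080 files:30081
```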
## Production cluster environment
In a production environment, we suggest installing an ingress controller and verifying that it works correctly.
During the ClearML deployment, enable the `ingress` section of the chart values.
This will create 3 ingress rules:
* `app.<your domain name>`
* `files.<your domain name>`
* `api.<your domain name>`
(*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)
To complete the deployment, point the domain records to the IP address on which the ingress controller responds.
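As a sketch of the ingress setup described above, the hypothetical `clearml_ingress_values` function below prints a values file using the per-service keys from the values table (`ingress.enabled`, `ingress.<service>.hostName`, `ingress.<service>.tlsSecretName`). It assumes the `app.`, `api.` and `files.` prefixes from the text and a single TLS secret (for example, a wildcard certificate) shared by all three hostnames; an empty secret name keeps plain HTTP.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print chart values that enable the ingress rules.
# clearml_ingress_values DOMAIN [TLS_SECRET]
clearml_ingress_values() {
  local domain="$1" tls_secret="${2:-}"   # empty secret => plain HTTP
  local svc
  echo "ingress:"
  echo "  enabled: true"
  for svc in app api files; do
    echo "  ${svc}:"
    echo "    hostName: \"${svc}.${domain}\""
    echo "    tlsSecretName: \"${tls_secret}\""
  done
}

# Example (assumes a release named "clearml"):
#   clearml_ingress_values clearml.mydomainname.com > ingress-values.yaml
#   helm upgrade --install clearml allegroai/clearml -f ingress-values.yaml
```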
## Additional Configuration for ClearML Server
You can also configure the **clearml-server** for:
* fixed users (users with credentials)
* non-responsive experiment watchdog settings
For detailed instructions, see the [Optional Configuration](https://github.com/allegroai/clearml-server#optional-configuration) section in the **clearml-server** repository README file.
## Source Code
* <https://github.com/allegroai/clearml-helm-charts>
@@ -29,28 +109,54 @@ MLOps platform
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| agentGroups.agent-group0.affinity | object | `{}` | |
| agentGroups.agent-group0.agentVersion | string | `""` | |
| agentGroups.agent-group0.awsAccessKeyId | string | `nil` | |
| agentGroups.agent-group0.awsDefaultRegion | string | `nil` | |
| agentGroups.agent-group0.awsSecretAccessKey | string | `nil` | |
| agentGroups.agent-group0.azureStorageAccount | string | `nil` | |
| agentGroups.agent-group0.azureStorageKey | string | `nil` | |
| agentGroups.agent-group0.clearmlAccessKey | string | `nil` | |
| agentGroups.agent-group0.clearmlConfig | string | `"sdk {\n}"` | |
| agentGroups.agent-group0.clearmlGitPassword | string | `nil` | |
| agentGroups.agent-group0.clearmlGitUser | string | `nil` | |
| agentGroups.agent-group0.clearmlSecretKey | string | `nil` | |
| agentGroups.agent-group0.image.pullPolicy | string | `"IfNotPresent"` | |
| agentGroups.agent-group0.image.repository | string | `"nvidia/cuda"` | |
| agentGroups.agent-group0.image.tag | string | `"11.0-base-ubuntu18.04"` | |
| agentGroups.agent-group0.name | string | `"agent-group0"` | |
| agentGroups.agent-group0.nodeSelector | object | `{}` | |
| agentGroups.agent-group0.nvidiaGpusPerAgent | int | `1` | |
| agentGroups.agent-group0.podAnnotations | object | `{}` | |
| agentGroups.agent-group0.queues | string | `"default"` | |
| agentGroups.agent-group0.replicaCount | int | `0` | |
| agentGroups.agent-group0.tolerations | list | `[]` | |
| agentGroups.agent-group-cpu.affinity | object | `{}` | |
| agentGroups.agent-group-cpu.agentVersion | string | `""` | |
| agentGroups.agent-group-cpu.awsAccessKeyId | string | `nil` | |
| agentGroups.agent-group-cpu.awsDefaultRegion | string | `nil` | |
| agentGroups.agent-group-cpu.awsSecretAccessKey | string | `nil` | |
| agentGroups.agent-group-cpu.azureStorageAccount | string | `nil` | |
| agentGroups.agent-group-cpu.azureStorageKey | string | `nil` | |
| agentGroups.agent-group-cpu.clearmlAccessKey | string | `nil` | |
| agentGroups.agent-group-cpu.clearmlConfig | string | `"sdk {\n}"` | |
| agentGroups.agent-group-cpu.clearmlGitPassword | string | `nil` | |
| agentGroups.agent-group-cpu.clearmlGitUser | string | `nil` | |
| agentGroups.agent-group-cpu.clearmlSecretKey | string | `nil` | |
| agentGroups.agent-group-cpu.enabled | bool | `true` | |
| agentGroups.agent-group-cpu.image.pullPolicy | string | `"IfNotPresent"` | |
| agentGroups.agent-group-cpu.image.repository | string | `"ubuntu"` | |
| agentGroups.agent-group-cpu.image.tag | string | `"18.04"` | |
| agentGroups.agent-group-cpu.name | string | `"agent-group-cpu"` | |
| agentGroups.agent-group-cpu.nodeSelector | object | `{}` | |
| agentGroups.agent-group-cpu.nvidiaGpusPerAgent | int | `0` | |
| agentGroups.agent-group-cpu.podAnnotations | object | `{}` | |
| agentGroups.agent-group-cpu.queues | string | `"default"` | |
| agentGroups.agent-group-cpu.replicaCount | int | `1` | |
| agentGroups.agent-group-cpu.tolerations | list | `[]` | |
| agentGroups.agent-group-cpu.updateStrategy | string | `"Recreate"` | |
| agentGroups.agent-group-gpu.affinity | object | `{}` | |
| agentGroups.agent-group-gpu.agentVersion | string | `""` | |
| agentGroups.agent-group-gpu.awsAccessKeyId | string | `nil` | |
| agentGroups.agent-group-gpu.awsDefaultRegion | string | `nil` | |
| agentGroups.agent-group-gpu.awsSecretAccessKey | string | `nil` | |
| agentGroups.agent-group-gpu.azureStorageAccount | string | `nil` | |
| agentGroups.agent-group-gpu.azureStorageKey | string | `nil` | |
| agentGroups.agent-group-gpu.clearmlAccessKey | string | `nil` | |
| agentGroups.agent-group-gpu.clearmlConfig | string | `"sdk {\n}"` | |
| agentGroups.agent-group-gpu.clearmlGitPassword | string | `nil` | |
| agentGroups.agent-group-gpu.clearmlGitUser | string | `nil` | |
| agentGroups.agent-group-gpu.clearmlSecretKey | string | `nil` | |
| agentGroups.agent-group-gpu.enabled | bool | `true` | |
| agentGroups.agent-group-gpu.image.pullPolicy | string | `"IfNotPresent"` | |
| agentGroups.agent-group-gpu.image.repository | string | `"nvidia/cuda"` | |
| agentGroups.agent-group-gpu.image.tag | string | `"11.0-base-ubuntu18.04"` | |
| agentGroups.agent-group-gpu.name | string | `"agent-group-gpu"` | |
| agentGroups.agent-group-gpu.nodeSelector | object | `{}` | |
| agentGroups.agent-group-gpu.nvidiaGpusPerAgent | int | `1` | |
| agentGroups.agent-group-gpu.podAnnotations | object | `{}` | |
| agentGroups.agent-group-gpu.queues | string | `"default"` | |
| agentGroups.agent-group-gpu.replicaCount | int | `0` | |
| agentGroups.agent-group-gpu.tolerations | list | `[]` | |
| agentGroups.agent-group-gpu.updateStrategy | string | `"Recreate"` | |
| agentservices.affinity | object | `{}` | |
| agentservices.agentVersion | string | `""` | |
| agentservices.awsAccessKeyId | string | `nil` | |
@@ -76,12 +182,13 @@ MLOps platform
| agentservices.storage.data.class | string | `"standard"` | |
| agentservices.storage.data.size | string | `"50Gi"` | |
| agentservices.tolerations | list | `[]` | |
| apiserver.additionalConfigs | object | `{}` | |
| apiserver.affinity | object | `{}` | |
| apiserver.configDir | string | `"/opt/clearml/config"` | |
| apiserver.extraEnvs | list | `[]` | |
| apiserver.image.pullPolicy | string | `"IfNotPresent"` | |
| apiserver.image.repository | string | `"allegroai/clearml"` | |
| apiserver.image.tag | string | `"1.0.2"` | |
| apiserver.image.tag | string | `"1.1.1"` | |
| apiserver.livenessDelay | int | `60` | |
| apiserver.nodeSelector | object | `{}` | |
| apiserver.podAnnotations | object | `{}` | |
@@ -93,9 +200,6 @@ MLOps platform
| apiserver.resources | object | `{}` | |
| apiserver.service.port | int | `8008` | |
| apiserver.service.type | string | `"NodePort"` | |
| apiserver.storage.config.class | string | `"standard"` | |
| apiserver.storage.config.size | string | `"1Gi"` | |
| apiserver.storage.enableConfigVolume | bool | `false` | |
| apiserver.tolerations | list | `[]` | |
| clearml.defaultCompany | string | `"d1bd92a3b039400cbafc60a7a5b1e52b"` | |
| elasticsearch.clusterHealthCheckParams | string | `"wait_for_status=yellow&timeout=1s"` | |
@@ -137,7 +241,7 @@ MLOps platform
| fileserver.extraEnvs | list | `[]` | |
| fileserver.image.pullPolicy | string | `"IfNotPresent"` | |
| fileserver.image.repository | string | `"allegroai/clearml"` | |
| fileserver.image.tag | string | `"1.0.2"` | |
| fileserver.image.tag | string | `"1.1.1"` | |
| fileserver.nodeSelector | object | `{}` | |
| fileserver.podAnnotations | object | `{}` | |
| fileserver.replicaCount | int | `1` | |
@@ -148,10 +252,14 @@ MLOps platform
| fileserver.storage.data.size | string | `"50Gi"` | |
| fileserver.tolerations | list | `[]` | |
| ingress.annotations | object | `{}` | |
| ingress.api.hostName | string | `"api.clearml.127-0-0-1.nip.io"` | |
| ingress.api.tlsSecretName | string | `""` | |
| ingress.app.hostName | string | `"app.clearml.127-0-0-1.nip.io"` | |
| ingress.app.tlsSecretName | string | `""` | |
| ingress.enabled | bool | `false` | |
| ingress.host | string | `""` | |
| ingress.files.hostName | string | `"files.clearml.127-0-0-1.nip.io"` | |
| ingress.files.tlsSecretName | string | `""` | |
| ingress.name | string | `"clearml-server-ingress"` | |
| ingress.tls.secretName | string | `""` | |
| mongodb.architecture | string | `"standalone"` | |
| mongodb.auth.enabled | bool | `false` | |
| mongodb.enabled | bool | `true` | |
@@ -176,7 +284,7 @@ MLOps platform
| webserver.extraEnvs | list | `[]` | |
| webserver.image.pullPolicy | string | `"IfNotPresent"` | |
| webserver.image.repository | string | `"allegroai/clearml"` | |
| webserver.image.tag | string | `"1.0.2"` | |
| webserver.image.tag | string | `"1.1.1"` | |
| webserver.nodeSelector | object | `{}` | |
| webserver.podAnnotations | object | `{}` | |
| webserver.replicaCount | int | `1` | |
@@ -184,6 +292,3 @@ MLOps platform
| webserver.service.port | int | `80` | |
| webserver.service.type | string | `"NodePort"` | |
| webserver.tolerations | list | `[]` | |
----------------------------------------------
Autogenerated from chart metadata using [helm-docs v1.5.0](https://github.com/norwoodj/helm-docs/releases/v1.5.0)


@@ -0,0 +1,96 @@
# ClearML Ecosystem for Kubernetes
{{ template "chart.deprecationWarning" . }}
{{ template "chart.badgesSection" . }}
{{ template "chart.description" . }}
{{ template "chart.homepageLine" . }}
{{ template "chart.maintainersSection" . }}
## Introduction
The **clearml-server** is the backend service infrastructure for [ClearML](https://github.com/allegroai/clearml).
It allows multiple users to collaborate and manage their experiments.
**clearml-server** contains the following components:
* The ClearML Web-App, a single-page UI for experiment management and browsing
* RESTful API for:
  * Documenting and logging experiment information, statistics and results
  * Querying experiment history, logs and results
* Locally-hosted file server for storing images and models, making them easily accessible using the Web-App
## Local environment
For development or evaluation, you can use [kind](https://kind.sigs.k8s.io).
After installing kind, the following commands will create a complete ClearML installation:
```bash
mkdir -pm 777 /tmp/clearml-kind
cat <<EOF > /tmp/clearml-kind.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 30008
    hostPort: 30008
    listenAddress: "127.0.0.1"
    protocol: TCP
  - containerPort: 30080
    hostPort: 30080
    listenAddress: "127.0.0.1"
    protocol: TCP
  - containerPort: 30081
    hostPort: 30081
    listenAddress: "127.0.0.1"
    protocol: TCP
  extraMounts:
  - hostPath: /tmp/clearml-kind/
    containerPath: /var/local-path-provisioner
EOF
kind create cluster --config /tmp/clearml-kind.yaml
# Add the chart repository (skip if already added)
helm repo add allegroai https://allegroai.github.io/clearml-helm-charts
helm repo update
helm install clearml allegroai/clearml
```
After deployment, the services will be exposed on localhost on the following ports:
* API server on `30008`
* Web server on `30080`
* File server on `30081`
Data that ClearML persists in Kubernetes volumes will be accessible in the `/tmp/clearml-kind` folder on the host.
## Production cluster environment
In a production environment, we suggest installing an ingress controller and verifying that it works correctly.
During the ClearML deployment, enable the `ingress` section of the chart values.
This will create 3 ingress rules:
* `app.<your domain name>`
* `files.<your domain name>`
* `api.<your domain name>`
(*for example, `app.clearml.mydomainname.com`, `files.clearml.mydomainname.com` and `api.clearml.mydomainname.com`*)
To complete the deployment, point the domain records to the IP address on which the ingress controller responds.
## Additional Configuration for ClearML Server
You can also configure the **clearml-server** for:
* fixed users (users with credentials)
* non-responsive experiment watchdog settings
For detailed instructions, see the [Optional Configuration](https://github.com/allegroai/clearml-server#optional-configuration) section in the **clearml-server** repository README file.
{{ template "chart.sourcesSection" . }}
{{ template "chart.requirementsSection" . }}
{{ template "chart.valuesSection" . }}


@@ -95,3 +95,48 @@ Create the name of the service account to use
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}
{{/*
Create the name of the App service to use
*/}}
{{- define "clearml.serviceApp" -}}
{{- if .Values.ingress.enabled }}
{{- if .Values.ingress.app.tlsSecretName }}
{{- printf "%s%s" "https://" .Values.ingress.app.hostName }}
{{- else }}
{{- printf "%s%s" "http://" .Values.ingress.app.hostName }}
{{- end }}
{{- else }}
{{- printf "%s%s%s%s" "http://" (include "clearml.fullname" .) "-webserver:" (.Values.webserver.service.port | toString) }}
{{- end }}
{{- end }}
{{/*
Create the name of the Api service to use
*/}}
{{- define "clearml.serviceApi" -}}
{{- if .Values.ingress.enabled }}
{{- if .Values.ingress.api.tlsSecretName }}
{{- printf "%s%s" "https://" .Values.ingress.api.hostName }}
{{- else }}
{{- printf "%s%s" "http://" .Values.ingress.api.hostName }}
{{- end }}
{{- else }}
{{- printf "%s%s%s%s" "http://" (include "clearml.fullname" .) "-apiserver:" (.Values.apiserver.service.port | toString) }}
{{- end }}
{{- end }}
{{/*
Create the name of the Files service to use
*/}}
{{- define "clearml.serviceFiles" -}}
{{- if .Values.ingress.enabled }}
{{- if .Values.ingress.files.tlsSecretName }}
{{- printf "%s%s" "https://" .Values.ingress.files.hostName }}
{{- else }}
{{- printf "%s%s" "http://" .Values.ingress.files.hostName }}
{{- end }}
{{- else }}
{{- printf "%s%s%s%s" "http://" (include "clearml.fullname" .) "-fileserver:" (.Values.fileserver.service.port | toString) }}
{{- end }}
{{- end }}
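
The three helpers share one shape. With the ingress enabled, they return the public `hostName`, using HTTPS when a TLS secret is configured. Otherwise they return the in-cluster Service URL. The sketch below reproduces `clearml.serviceApi` with Go's standard `text/template`, whose built-in `printf` is the one Helm templates use. `Values`, `Ingress` and the literal full name are simplified stand-ins for the chart's values and its `clearml.fullname` helper, and `%d` replaces the `toString` pipe. The sketch also shows why a `printf` format needs exactly one verb per argument: Go's `fmt` appends `%!s(MISSING)` for every verb left without an argument.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// Ingress and Values are simplified, hypothetical stand-ins for
// .Values.ingress.api and .Values.apiserver.service.
type Ingress struct {
	Enabled       bool
	HostName      string
	TLSSecretName string
}

type Values struct {
	FullName string // stands in for (include "clearml.fullname" .)
	Port     int    // stands in for .Values.apiserver.service.port
	Ingress  Ingress
}

// helpers mirrors the branching of "clearml.serviceApi", one verb per printf argument.
const helpers = `
{{- define "serviceApi" -}}
{{- if .Ingress.Enabled -}}
{{- if .Ingress.TLSSecretName -}}
{{- printf "%s%s" "https://" .Ingress.HostName -}}
{{- else -}}
{{- printf "%s%s" "http://" .Ingress.HostName -}}
{{- end -}}
{{- else -}}
{{- printf "%s%s%s%d" "http://" .FullName "-apiserver:" .Port -}}
{{- end -}}
{{- end -}}`

var tpl = template.Must(template.New("helpers").Parse(helpers))

// serviceAPI renders the named template for the given values.
func serviceAPI(v Values) (string, error) {
	var b strings.Builder
	if err := tpl.ExecuteTemplate(&b, "serviceApi", v); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	inCluster, _ := serviceAPI(Values{FullName: "clearml", Port: 8008})
	fmt.Println(inCluster) // http://clearml-apiserver:8008

	viaIngress, _ := serviceAPI(Values{Ingress: Ingress{
		Enabled: true, HostName: "api.clearml.example.com", TLSSecretName: "clearml-tls",
	}})
	fmt.Println(viaIngress) // https://api.clearml.example.com

	// Three %s verbs but only two arguments: fmt fills the gap with %!s(MISSING).
	bad := template.Must(template.New("bad").Parse(`{{ printf "%s%s%s" "https://" . }}`))
	_ = bad.Execute(os.Stdout, "api.clearml.example.com") // https://api.clearml.example.com%!s(MISSING)
	fmt.Println()
}
```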

@@ -0,0 +1,13 @@
{{- if .Values.apiserver.additionalConfigs -}}
apiVersion: v1
kind: ConfigMap
metadata:
name: "{{ include "clearml.fullname" . }}-apiserver-configmap"
labels:
{{- include "clearml.labels" . | nindent 4 }}
data:
{{- range $key, $val := .Values.apiserver.additionalConfigs }}
{{ $key }}: |
{{- $val | nindent 4 }}
{{- end }}
{{- end -}}
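
With `additionalConfigs`, each key becomes a file in this ConfigMap, which the apiserver Deployment mounts at `/opt/clearml/config`. The sketch below renders the `data:` section with Go's `text/template`. `nindent` comes from the Sprig library rather than Go's standard library, so a small hand-written equivalent is registered in its place. Treat its exact whitespace handling as an approximation of Sprig's.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// nindent is a hand-written stand-in for Sprig's nindent: it prepends a
// newline and indents every line of s by n spaces.
func nindent(n int, s string) string {
	pad := strings.Repeat(" ", n)
	return "\n" + pad + strings.ReplaceAll(s, "\n", "\n"+pad)
}

// dataTpl reproduces the data: section of the apiserver ConfigMap template.
const dataTpl = `data:
{{- range $key, $val := . }}
  {{ $key }}: |
{{- $val | nindent 4 }}
{{- end }}
`

var dataT = template.Must(template.New("data").
	Funcs(template.FuncMap{"nindent": nindent}).
	Parse(dataTpl))

// renderData turns file-name -> contents entries into ConfigMap data keys;
// text/template ranges over map keys in sorted order.
func renderData(configs map[string]string) (string, error) {
	var b strings.Builder
	if err := dataT.Execute(&b, configs); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := renderData(map[string]string{
		"services.conf": "tasks {\n  non_responsive_tasks_watchdog {\n    threshold_sec: 21000\n  }\n}",
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```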

@@ -1,5 +1,6 @@
{{- range $key, $value := .Values.agentGroups }}
{{- with $value }}
{{- if .enabled }}
---
apiVersion: apps/v1
kind: Deployment
@@ -9,6 +10,8 @@ metadata:
{{- include "clearml.labels" $ | nindent 4 }}
spec:
replicas: {{ .replicaCount }}
strategy:
type: {{ .updateStrategy }}
selector:
matchLabels:
{{- include "clearml.selectorLabelsAgent" $ | nindent 6 }}
@@ -38,7 +41,7 @@ spec:
- -c
- >
set -x;
while [ $(curl -sw '%{http_code}' "http://{{ include "clearml.fullname" $ }}-apiserver:{{ $.Values.apiserver.service.port }}/debug.ping" -o /dev/null) -ne 200 ] ; do
while [ $(curl -sw '%{http_code}' "{{ include "clearml.serviceApi" $ }}/debug.ping" -o /dev/null) -ne 200 ] ; do
echo "waiting for apiserver" ;
sleep 5 ;
done
@@ -54,11 +57,11 @@ spec:
{{ .nvidiaGpusPerAgent }}
env:
- name: CLEARML_API_HOST
value: 'http://{{ include "clearml.fullname" $ }}-apiserver:{{ $.Values.apiserver.service.port }}'
value: {{ include "clearml.serviceApi" $ }}
- name: CLEARML_WEB_HOST
value: 'http://{{ include "clearml.fullname" $ }}-webserver:{{ $.Values.webserver.service.port }}'
value: {{ include "clearml.serviceApp" $ }}
- name: CLEARML_FILES_HOST
value: 'http://{{ include "clearml.fullname" $ }}-fileserver:{{ $.Values.fileserver.service.port }}'
value: {{ include "clearml.serviceFiles" $ }}
- name: CLEARML_AGENT_GIT_USER
value: {{ .clearmlGitUser}}
- name: CLEARML_AGENT_GIT_PASS
@@ -91,6 +94,13 @@ spec:
python3 -m pip install -U pip ;
python3 -m pip install clearml-agent{{ .agentVersion}} ;
CLEARML_AGENT_K8S_HOST_MOUNT=/root/.clearml:/root/.clearml clearml-agent daemon --queue {{ .queues}}"
{{ if .clearmlConfig }}
volumeMounts:
- name: agent-clearml-conf-volume
mountPath: /root/clearml.conf
subPath: clearml.conf
readOnly: true
{{- end }}
{{- with .nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
@@ -105,3 +115,4 @@ spec:
{{- end }}
{{- end }}
{{- end }}
{{- end }}
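
The new `enabled` switch works because the `if .enabled` guard wraps the whole Deployment document inside the `range`/`with` loop. A disabled group therefore renders nothing, not even its `---` separator. The sketch below keeps only that control flow and a few fields, and runs it with Go's `text/template` over `map[string]interface{}` values. This is roughly the form in which Helm passes decoded `values.yaml` data. The group contents are illustrative.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// agentTpl keeps only the control flow of the agent Deployment template:
// range over agentGroups, then skip any group whose enabled flag is false.
const agentTpl = `{{- range $key, $value := . }}
{{- with $value }}
{{- if .enabled }}
---
name: {{ .name }}
replicas: {{ .replicaCount }}
strategy:
  type: {{ .updateStrategy }}
{{- end }}
{{- end }}
{{- end }}
`

var agentT = template.Must(template.New("agents").Parse(agentTpl))

// renderAgents emits one YAML document per enabled group.
func renderAgents(groups map[string]map[string]interface{}) (string, error) {
	var b strings.Builder
	if err := agentT.Execute(&b, groups); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := renderAgents(map[string]map[string]interface{}{
		"agent-group-cpu": {"enabled": true, "name": "agent-group-cpu", "replicaCount": 1, "updateStrategy": "Recreate"},
		"agent-group-gpu": {"enabled": false, "name": "agent-group-gpu", "replicaCount": 0, "updateStrategy": "Recreate"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```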

@@ -30,7 +30,7 @@ spec:
- -c
- >
set -x;
while [ $(curl -sw '%{http_code}' "http://{{ include "clearml.fullname" . }}-apiserver:{{ .Values.apiserver.service.port }}/debug.ping" -o /dev/null) -ne 200 ] ; do
while [ $(curl -sw '%{http_code}' "{{ include "clearml.serviceApi" $ }}/debug.ping" -o /dev/null) -ne 200 ] ; do
echo "waiting for apiserver" ;
sleep 5 ;
done
@@ -42,7 +42,7 @@ spec:
- name: CLEARML_HOST_IP
value: {{ .Values.agentservices.clearmlHostIp }}
- name: CLEARML_API_HOST
value: "http://{{ include "clearml.fullname" . }}-apiserver:{{ .Values.apiserver.service.port }}"
value: {{ include "clearml.serviceApi" $ }}
- name: CLEARML_WEB_HOST
value: {{ .Values.agentservices.clearmlWebHost }}
- name: CLEARML_FILES_HOST

@@ -18,12 +18,6 @@ spec:
labels:
{{- include "clearml.selectorLabelsApiServer" . | nindent 8 }}
spec:
{{- if .Values.apiserver.storage.enableConfigVolume }}
volumes:
- name: apiserver-config
persistentVolumeClaim:
claimName: {{ include "clearml.fullname" . }}-apiserver-config
{{- end }}
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.apiserver.image.repository }}:{{ .Values.apiserver.image.tag | default .Chart.AppVersion }}"
@@ -101,13 +95,19 @@ spec:
httpGet:
path: /debug.ping
port: 8008
{{- if .Values.apiserver.storage.enableConfigVolume }}
{{- if .Values.apiserver.additionalConfigs }}
volumeMounts:
- name: apiserver-config
mountPath: /opt/clearml/config
{{- end }}
resources:
{{- toYaml .Values.apiserver.resources | nindent 12 }}
{{- if .Values.apiserver.additionalConfigs }}
volumes:
- name: apiserver-config
configMap:
name: "{{ include "clearml.fullname" . }}-apiserver-configmap"
{{- end }}
{{- with .Values.apiserver.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}

@@ -0,0 +1,42 @@
{{- if .Values.ingress.enabled -}}
{{- if semverCompare ">=1.19-0" .Capabilities.KubeVersion.GitVersion -}}
apiVersion: networking.k8s.io/v1
{{- else if semverCompare ">=1.14-0" .Capabilities.KubeVersion.GitVersion -}}
apiVersion: networking.k8s.io/v1beta1
{{- else -}}
apiVersion: extensions/v1beta1
{{- end }}
kind: Ingress
metadata:
name: {{ include "clearml.fullname" . }}-api
labels:
{{- include "clearml.labels" . | nindent 4 }}
{{- with .Values.ingress.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
{{- if .Values.ingress.api.tlsSecretName }}
tls:
- hosts:
- {{ .Values.ingress.api.hostName }}
secretName: {{ .Values.ingress.api.tlsSecretName }}
{{- end }}
rules:
- host: {{ .Values.ingress.api.hostName }}
http:
paths:
- path: "/"
{{ if semverCompare ">=1.19-0" .Capabilities.KubeVersion.GitVersion }}
pathType: Prefix
backend:
service:
name: {{ include "clearml.fullname" . }}-apiserver
port:
number: {{ .Values.apiserver.service.port }}
{{ else }}
backend:
serviceName: {{ include "clearml.fullname" . }}-apiserver
servicePort: {{ .Values.apiserver.service.port }}
{{ end }}
{{- end }}
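
Each Ingress template picks its `apiVersion` and backend shape from the cluster version:

* `networking.k8s.io/v1` from 1.19, with `pathType` and `service.name`/`service.port.number`.
* `networking.k8s.io/v1beta1` from 1.14.
* `extensions/v1beta1` before that.

The Go sketch below approximates that `semverCompare` chain by comparing only the major and minor numbers. This is sufficient here because every threshold has the form `>=X.Y-0`. The `-0` suffix also lets pre-release-style provider versions, such as a GKE-style `v1.19.9-gke.1900`, match. The function name is hypothetical, and this is not Sprig's implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// leadingInt parses the digits at the start of s (e.g. "18" from "18+").
func leadingInt(s string) (int, error) {
	end := 0
	for end < len(s) && s[end] >= '0' && s[end] <= '9' {
		end++
	}
	return strconv.Atoi(s[:end])
}

// ingressAPIVersion approximates the chart's semverCompare chain on
// .Capabilities.KubeVersion.GitVersion using only major.minor.
func ingressAPIVersion(gitVersion string) (string, error) {
	parts := strings.SplitN(strings.TrimPrefix(gitVersion, "v"), ".", 3)
	if len(parts) < 2 {
		return "", fmt.Errorf("unexpected version %q", gitVersion)
	}
	major, err := leadingInt(parts[0])
	if err != nil {
		return "", err
	}
	minor, err := leadingInt(parts[1])
	if err != nil {
		return "", err
	}
	switch {
	case major > 1 || (major == 1 && minor >= 19):
		return "networking.k8s.io/v1", nil // pathType + backend.service.name/port.number
	case major == 1 && minor >= 14:
		return "networking.k8s.io/v1beta1", nil // backend.serviceName/servicePort
	default:
		return "extensions/v1beta1", nil
	}
}

func main() {
	for _, v := range []string{"v1.21.2", "v1.19.9-gke.1900", "v1.16.15", "v1.13.12"} {
		api, err := ingressAPIVersion(v)
		if err != nil {
			fmt.Println(v, err)
			continue
		}
		fmt.Printf("%-18s -> %s\n", v, api)
	}
}
```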

@@ -0,0 +1,42 @@
{{- if .Values.ingress.enabled -}}
{{- if semverCompare ">=1.19-0" .Capabilities.KubeVersion.GitVersion -}}
apiVersion: networking.k8s.io/v1
{{- else if semverCompare ">=1.14-0" .Capabilities.KubeVersion.GitVersion -}}
apiVersion: networking.k8s.io/v1beta1
{{- else -}}
apiVersion: extensions/v1beta1
{{- end }}
kind: Ingress
metadata:
name: {{ include "clearml.fullname" . }}-app
labels:
{{- include "clearml.labels" . | nindent 4 }}
{{- with .Values.ingress.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
{{- if .Values.ingress.app.tlsSecretName }}
tls:
- hosts:
- {{ .Values.ingress.app.hostName }}
secretName: {{ .Values.ingress.app.tlsSecretName }}
{{- end }}
rules:
- host: {{ .Values.ingress.app.hostName }}
http:
paths:
- path: "/"
{{ if semverCompare ">=1.19-0" .Capabilities.KubeVersion.GitVersion }}
pathType: Prefix
backend:
service:
name: {{ include "clearml.fullname" . }}-webserver
port:
number: {{ .Values.webserver.service.port }}
{{ else }}
backend:
serviceName: {{ include "clearml.fullname" . }}-webserver
servicePort: {{ .Values.webserver.service.port }}
{{ end }}
{{- end }}

@@ -0,0 +1,42 @@
{{- if .Values.ingress.enabled -}}
{{- if semverCompare ">=1.19-0" .Capabilities.KubeVersion.GitVersion -}}
apiVersion: networking.k8s.io/v1
{{- else if semverCompare ">=1.14-0" .Capabilities.KubeVersion.GitVersion -}}
apiVersion: networking.k8s.io/v1beta1
{{- else -}}
apiVersion: extensions/v1beta1
{{- end }}
kind: Ingress
metadata:
name: {{ include "clearml.fullname" . }}-files
labels:
{{- include "clearml.labels" . | nindent 4 }}
{{- with .Values.ingress.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
{{- if .Values.ingress.files.tlsSecretName }}
tls:
- hosts:
- {{ .Values.ingress.files.hostName }}
secretName: {{ .Values.ingress.files.tlsSecretName }}
{{- end }}
rules:
- host: {{ .Values.ingress.files.hostName }}
http:
paths:
- path: "/"
{{ if semverCompare ">=1.19-0" .Capabilities.KubeVersion.GitVersion }}
pathType: Prefix
backend:
service:
name: {{ include "clearml.fullname" . }}-fileserver
port:
number: {{ .Values.fileserver.service.port }}
{{ else }}
backend:
serviceName: {{ include "clearml.fullname" . }}-fileserver
servicePort: {{ .Values.fileserver.service.port }}
{{ end }}
{{- end }}
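
Taken together, the three Ingress templates differ only in name suffix, hostname key and backend Service. The table-driven Go sketch below records that mapping as read from the `-app`, `-api` and `-files` templates. The `route` type and `describe` function are illustrative helpers, not part of the chart.

```go
package main

import "fmt"

// route describes one of the three Ingress objects the chart renders.
type route struct {
	IngressSuffix string // appended to the release fullname
	HostKey       string // values key holding the hostname
	Service       string // backend Service suffix
	PortKey       string // values key holding the backend port
}

var routes = []route{
	{"-app", "ingress.app.hostName", "-webserver", "webserver.service.port"},
	{"-api", "ingress.api.hostName", "-apiserver", "apiserver.service.port"},
	{"-files", "ingress.files.hostName", "-fileserver", "fileserver.service.port"},
}

// describe returns a one-line summary of each Ingress for a given fullname.
func describe(fullName string) []string {
	out := make([]string, 0, len(routes))
	for _, r := range routes {
		out = append(out, fmt.Sprintf("%s%s: host=.Values.%s -> %s%s:.Values.%s",
			fullName, r.IngressSuffix, r.HostKey, fullName, r.Service, r.PortKey))
	}
	return out
}

func main() {
	for _, line := range describe("clearml") {
		fmt.Println(line)
	}
}
```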

@@ -1,48 +0,0 @@
{{- if .Values.ingress.enabled -}}
{{- $fullName := include "clearml.fullname" . -}}
{{- if semverCompare ">=1.14-0" .Capabilities.KubeVersion.GitVersion -}}
apiVersion: networking.k8s.io/v1beta1
{{- else -}}
apiVersion: extensions/v1beta1
{{- end }}
kind: Ingress
metadata:
name: {{ $fullName }}
labels:
{{- include "clearml.labels" . | nindent 4 }}
{{- with .Values.ingress.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
{{- if .Values.ingress.tls.secretName }}
tls:
- hosts:
- "app.{{ .Values.ingress.host }}"
- "files.{{ .Values.ingress.host }}"
- "api.{{ .Values.ingress.host }}"
secretName: {{ .Values.ingress.tls.secretName }}
{{- end }}
rules:
- host: "app.{{ .Values.ingress.host }}"
http:
paths:
- path: "/*"
backend:
serviceName: {{ include "clearml.fullname" . }}-webserver
servicePort: {{ .Values.webserver.service.port }}
- host: "api.{{ .Values.ingress.host }}"
http:
paths:
- path: "/*"
backend:
serviceName: {{ include "clearml.fullname" . }}-apiserver
servicePort: {{ .Values.apiserver.service.port }}
- host: "files.{{ .Values.ingress.host }}"
http:
paths:
- path: "/*"
backend:
serviceName: {{ include "clearml.fullname" . }}-fileserver
servicePort: {{ .Values.fileserver.service.port }}
{{- end }}

@@ -1,15 +0,0 @@
{{- if .Values.apiserver.storage.enableConfigVolume }}
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: {{ include "clearml.fullname" . }}-apiserver-config
labels:
{{- include "clearml.labels" . | nindent 4 }}
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: {{ .Values.apiserver.storage.config.size | quote }}
storageClassName: {{ .Values.apiserver.storage.config.class | quote }}
{{- end }}

@@ -4,9 +4,15 @@ ingress:
enabled: false
name: clearml-server-ingress
annotations: {}
host: ""
tls:
secretName: ""
app:
hostName: "app.clearml.127-0-0-1.nip.io"
tlsSecretName: ""
api:
hostName: "api.clearml.127-0-0-1.nip.io"
tlsSecretName: ""
files:
hostName: "files.clearml.127-0-0-1.nip.io"
tlsSecretName: ""
apiserver:
prepopulateEnabled: "true"
@@ -26,7 +32,7 @@ apiserver:
image:
repository: "allegroai/clearml"
pullPolicy: IfNotPresent
tag: "1.0.2"
tag: "1.1.1"
extraEnvs: []
@@ -50,12 +56,16 @@ apiserver:
affinity: {}
# Optional: used in pvc-apiserver containing optional server configuration files
storage:
enableConfigVolume: false
config:
class: "standard"
size: 1Gi
additionalConfigs: {}
# services.conf: |
# tasks {
# non_responsive_tasks_watchdog {
# # In-progress tasks that haven't been updated for at least 'value' seconds will be stopped by the watchdog
# threshold_sec: 21000
# # Watchdog will sleep for this number of seconds after each cycle
# watch_interval_sec: 900
# }
# }
fileserver:
service:
@@ -67,7 +77,7 @@ fileserver:
image:
repository: "allegroai/clearml"
pullPolicy: IfNotPresent
tag: "1.0.2"
tag: "1.1.1"
extraEnvs: []
@@ -108,7 +118,7 @@ webserver:
image:
repository: "allegroai/clearml"
pullPolicy: IfNotPresent
tag: "1.0.2"
tag: "1.1.1"
podAnnotations: {}
@@ -180,9 +190,45 @@ agentservices:
size: 50Gi
agentGroups:
agent-group0:
name: agent-group0
agent-group-cpu:
enabled: true
name: agent-group-cpu
replicaCount: 1
updateStrategy: Recreate
nvidiaGpusPerAgent: 0
agentVersion: "" # if set, it *MUST* include comparison operator (e.g. ">=0.16.1")
queues: "default" # multiple queues can be specified separated by a space (e.g. "important_jobs default")
clearmlGitUser: null
clearmlGitPassword: null
clearmlAccessKey: null
clearmlSecretKey: null
awsAccessKeyId: null
awsSecretAccessKey: null
awsDefaultRegion: null
azureStorageAccount: null
azureStorageKey: null
clearmlConfig: |-
sdk {
}
image:
repository: "ubuntu"
pullPolicy: IfNotPresent
tag: "18.04"
podAnnotations: {}
nodeSelector: {}
tolerations: []
affinity: {}
agent-group-gpu:
enabled: true
name: agent-group-gpu
replicaCount: 0
updateStrategy: Recreate
nvidiaGpusPerAgent: 1
agentVersion: "" # if set, it *MUST* include comparison operator (e.g. ">=0.16.1")
queues: "default" # multiple queues can be specified separated by a space (e.g. "important_jobs default")