Compare commits

6 Commits

Author SHA1 Message Date
Valeriano Manassero
8747bceb4e 1.7.0 upgrade (#112)
* Changed: mage update

* Changed: version update
2022-10-05 14:34:47 +02:00
Valeriano Manassero
6aea682b0d Fix: agent release (#109)
* Fix: agent release

* Changed: version bump
2022-09-16 08:42:34 +02:00
Valeriano Manassero
4704415662 Make PDR compatible with k8s 1.25 (#108)
* Changed: pdr version

* Changed: dependency update

* Changed: removed eol k8s

* Changed: kind versions update

* Removed: incompatible version with GH actions

* Changed: updated action
2022-09-16 08:28:41 +02:00
Brett Cullen
8374ece563 Added missing brackets around .Values.imageCredentials.existingSecret (#107) 2022-09-16 00:12:03 +03:00
Brett Cullen
0871e73831 Fixed missing brackets for k8 secret (docker config) (#106) 2022-09-15 23:35:36 +03:00
Niels ten Boom
a90b91f024 feat: expand volumemount capabilities for agent (#104)
* upgrade

* add upgrade instruction

* fix readme for agent

* Added newline at the end

* Try to fix CI

* Edited type added

* Update README.md

Co-authored-by: Valeriano Manassero <14011549+valeriano-manassero@users.noreply.github.com>
2022-09-13 14:53:44 +02:00
19 changed files with 125 additions and 72 deletions

View File

@@ -2,6 +2,7 @@ name: Lint and Test Charts
on:
pull_request:
types: [opened, synchronize, edited, reopened]
paths:
- 'charts/**'
@@ -21,16 +22,16 @@ jobs:
strategy:
matrix:
k8s:
- v1.22.7
- v1.23.6
- v1.24.0
- v1.22.13
- v1.23.10
- v1.24.4
- v1.25.0
steps:
- name: Checkout
uses: actions/checkout@v1
- name: Create kind ${{ matrix.k8s }} cluster
uses: helm/kind-action@v1.2.0
uses: helm/kind-action@v1.3.0
with:
version: v0.13.0
node_image: kindest/node:${{ matrix.k8s }}
- name: Set up chart-testing
uses: helm/chart-testing-action@v2.2.1
@@ -42,11 +43,6 @@ jobs:
echo "::set-output name=changed::true"
echo "::set-output name=changed_charts::\"${changed//$'\n'/,}\""
fi
- name: Inject secrets
run: |
find ./charts/*/ci/*.yaml -type f -exec sed -i "s/AGENTK8SGLUEKEY/${{ secrets.agentk8sglueKey }}/g" {} \;
find ./charts/*/ci/*.yaml -type f -exec sed -i "s/AGENTK8SGLUESECRET/${{ secrets.agentk8sglueSecret }}/g" {} \;
if: steps.list-changed.outputs.changed == 'true'
- name: Run chart-testing (lint and install)
run: ct lint-and-install --chart-dirs=charts --target-branch=main --helm-extra-args="--timeout=15m" --charts=${{steps.list-changed.outputs.changed_charts}} --debug=true
if: steps.list-changed.outputs.changed == 'true'

View File

@@ -2,9 +2,9 @@ apiVersion: v2
name: clearml-agent
description: MLOps platform
type: application
version: "1.3.0"
version: "2.0.1"
appVersion: "1.24"
kubeVersion: ">= 1.19.0-0 < 1.25.0-0"
kubeVersion: ">= 1.19.0-0 < 1.26.0-0"
home: https://clear.ml
icon: https://raw.githubusercontent.com/allegroai/clearml/master/docs/clearml-logo.svg
sources:
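
The widened `kubeVersion` range can be sanity-checked against the Kubernetes versions listed in the CI matrix. This is a minimal sketch, assuming plain `vMAJOR.MINOR.PATCH` version strings. `parse_version` and `in_kube_range` are hypothetical helpers, not part of Helm, and they drop pre-release suffixes, whereas Helm's own semver comparison takes them into account.

```python
import re


def parse_version(text):
    """Extract (major, minor, patch) from strings such as 'v1.25.0' or '1.19.0-0'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"not a version: {text!r}")
    return tuple(int(part) for part in match.groups())


def in_kube_range(version, lower="1.19.0-0", upper="1.26.0-0"):
    """Approximate a '>= lower < upper' constraint; pre-release suffixes are ignored."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Kubernetes versions from the updated workflow matrix.
ci_matrix = ["v1.22.13", "v1.23.10", "v1.24.4", "v1.25.0"]

# Any entry listed here would be rejected by the chart's kubeVersion constraint.
unsupported = [v for v in ci_matrix if not in_kube_range(v)]
```

With the previous upper bound (`1.25.0-0`), the same check would flag `v1.25.0`, which is why the bound had to move together with the matrix.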

View File

@@ -1,6 +1,6 @@
# clearml-agent
# ClearML Kubernetes Agent
![Version: 1.3.0](https://img.shields.io/badge/Version-1.3.0-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.24](https://img.shields.io/badge/AppVersion-1.24-informational?style=flat-square)
![Version: 2.0.1](https://img.shields.io/badge/Version-2.0.1-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.24](https://img.shields.io/badge/AppVersion-1.24-informational?style=flat-square)
MLOps platform
@@ -12,6 +12,11 @@ MLOps platform
| ---- | ------ | --- |
| valeriano-manassero | | <https://github.com/valeriano-manassero> |
## Introduction
The **clearml-agent** is the Kubernetes agent for [ClearML](https://github.com/allegroai/clearml).
It allows you to schedule distributed experiments on a Kubernetes cluster.
## Source Code
* <https://github.com/allegroai/clearml-helm-charts>
@@ -19,13 +24,13 @@ MLOps platform
## Requirements
Kubernetes: `>= 1.19.0-0 < 1.25.0-0`
Kubernetes: `>= 1.19.0-0 < 1.26.0-0`
## Values
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| agentk8sglue | object | `{"apiServerUrlReference":"https://api.clear.ml","clearmlcheckCertificate":true,"defaultContainerImage":"ubuntu:18.04","extraEnvs":[],"fileServerUrlReference":"https://files.clear.ml","id":"k8s-agent","image":{"repository":"allegroai/clearml-agent-k8s-base","tag":"1.24-18"},"maxPods":10,"podTemplate":{"env":[],"nodeSelector":{},"resources":{},"tolerations":[],"volumes":[]},"queue":"default","replicaCount":1,"serviceAccountName":"default","webServerUrlReference":"https://app.clear.ml"}` | This agent will spawn queued experiments in new pods, a good use case is to combine this with GPU autoscaling nodes. https://github.com/allegroai/clearml-agent/tree/master/docker/k8s-glue |
| agentk8sglue | object | `{"apiServerUrlReference":"https://api.clear.ml","clearmlcheckCertificate":true,"defaultContainerImage":"ubuntu:18.04","extraEnvs":[],"fileServerUrlReference":"https://files.clear.ml","id":"k8s-agent","image":{"repository":"allegroai/clearml-agent-k8s-base","tag":"1.24-18"},"maxPods":10,"podTemplate":{"env":[],"nodeSelector":{},"resources":{},"tolerations":[],"volumeMounts":[],"volumes":[]},"queue":"default","replicaCount":1,"serviceAccountName":"default","webServerUrlReference":"https://app.clear.ml"}` | This agent will spawn queued experiments in new pods, a good use case is to combine this with GPU autoscaling nodes. https://github.com/allegroai/clearml-agent/tree/master/docker/k8s-glue |
| agentk8sglue.apiServerUrlReference | string | `"https://api.clear.ml"` | Reference to Api server url |
| agentk8sglue.clearmlcheckCertificate | bool | `true` | Check certificates validity for every UrlReference below. |
| agentk8sglue.defaultContainerImage | string | `"ubuntu:18.04"` | default container image for ClearML Task pod |
@@ -34,11 +39,12 @@ Kubernetes: `>= 1.19.0-0 < 1.25.0-0`
| agentk8sglue.id | string | `"k8s-agent"` | ClearML worker ID (must be unique across the entire ClearML environment) |
| agentk8sglue.image | object | `{"repository":"allegroai/clearml-agent-k8s-base","tag":"1.24-18"}` | Glue Agent image configuration |
| agentk8sglue.maxPods | int | `10` | maximum concurrent consume ClearML Task pod |
| agentk8sglue.podTemplate | object | `{"env":[],"nodeSelector":{},"resources":{},"tolerations":[],"volumes":[]}` | template for pods spawned to consume ClearML Task |
| agentk8sglue.podTemplate | object | `{"env":[],"nodeSelector":{},"resources":{},"tolerations":[],"volumeMounts":[],"volumes":[]}` | template for pods spawned to consume ClearML Task |
| agentk8sglue.podTemplate.env | list | `[]` | environment variables for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.podTemplate.nodeSelector | object | `{}` | nodeSelector setup for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.podTemplate.resources | object | `{}` | resources declaration for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.podTemplate.tolerations | list | `[]` | tolerations setup for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.podTemplate.volumeMounts | list | `[]` | volumeMounts definition for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.podTemplate.volumes | list | `[]` | volumes definition for pods spawned to consume ClearML Task (example in values.yaml comments) |
| agentk8sglue.queue | string | `"default"` | ClearML queue this agent will consume |
| agentk8sglue.replicaCount | int | `1` | Glue Agent number of pods |
@@ -58,5 +64,26 @@ Kubernetes: `>= 1.19.0-0 < 1.25.0-0`
| imageCredentials.registry | string | `"docker.io"` | Registry name |
| imageCredentials.username | string | `"someone"` | Registry username |
----------------------------------------------
Autogenerated from chart metadata using [helm-docs v1.11.0](https://github.com/norwoodj/helm-docs/releases/v1.11.0)
# Upgrading Chart
### From v1.x to v2.x
Chart 1.x assumed that all mounted volumes were PVCs. Version 2.x and later is more flexible: it injects the YAML from podTemplate.volumes and podTemplate.volumeMounts directly into the pod template.
v1.x
```
volumes:
- name: "yourvolume"
path: "/yourpath"
```
v2.x
```
volumes:
- name: "yourvolume"
persistentVolumeClaim:
claimName: "yourvolume"
volumeMounts:
- name: "yourvolume"
mountPath: "/yourpath"
```
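
The v1.x to v2.x conversion shown above can also be scripted for values files with many volumes. This is a minimal sketch, assuming the v1.x entries carry only `name` and `path`, and that the claim name equals the volume name, as the 1.x template rendered it. `migrate_volumes` is a hypothetical helper, not something shipped with the chart.

```python
def migrate_volumes(v1_volumes):
    """Convert v1.x volume entries ({'name', 'path'}) into v2.x volumes and volumeMounts."""
    volumes = []
    volume_mounts = []
    for entry in v1_volumes:
        name = entry["name"]
        # Chart 1.x always rendered a PVC whose claim name was the volume name.
        volumes.append({"name": name, "persistentVolumeClaim": {"claimName": name}})
        # The old 'path' key becomes 'mountPath' in the separate volumeMounts list.
        volume_mounts.append({"name": name, "mountPath": entry["path"]})
    return {"volumes": volumes, "volumeMounts": volume_mounts}


v2_values = migrate_volumes([{"name": "yourvolume", "path": "/yourpath"}])
```

The returned dictionary maps onto the `podTemplate.volumes` and `podTemplate.volumeMounts` keys of the v2.x values.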

View File

@@ -0,0 +1,45 @@
# ClearML Kubernetes Agent
{{ template "chart.deprecationWarning" . }}
{{ template "chart.badgesSection" . }}
{{ template "chart.description" . }}
{{ template "chart.homepageLine" . }}
{{ template "chart.maintainersSection" . }}
## Introduction
The **clearml-agent** is the Kubernetes agent for [ClearML](https://github.com/allegroai/clearml).
It allows you to schedule distributed experiments on a Kubernetes cluster.
{{ template "chart.sourcesSection" . }}
{{ template "chart.requirementsSection" . }}
{{ template "chart.valuesSection" . }}
# Upgrading Chart
### From v1.x to v2.x
Chart 1.x assumed that all mounted volumes were PVCs. Version 2.x and later is more flexible: it injects the YAML from podTemplate.volumes and podTemplate.volumeMounts directly into the pod template.
v1.x
```
volumes:
- name: "yourvolume"
path: "/yourpath"
```
v2.x
```
volumes:
- name: "yourvolume"
persistentVolumeClaim:
claimName: "yourvolume"
volumeMounts:
- name: "yourvolume"
mountPath: "/yourpath"
```

View File

@@ -11,27 +11,24 @@ data:
{{- if .Values.imageCredentials.enabled }}
imagePullSecrets:
{{- if .Values.imageCredentials.existingSecret }}
- name: .Values.imageCredentials.existingSecret
- name: {{.Values.imageCredentials.existingSecret}}
{{- else }}
- name: {{ include "agentk8sglue.referenceName" . }}-clearml-agent-registry-key
{{- end }}
{{- end }}
serviceAccountName: {{ .Values.agentk8sglue.serviceAccountName }}
{{- with .Values.agentk8sglue.podTemplate.volumes }}
volumes:
{{- range .Values.agentk8sglue.podTemplate.volumes }}
- name: {{ .name }}
persistentVolumeClaim:
claimName: {{ .name }}
{{- toYaml . | nindent 8 }}
{{- end }}
containers:
- resources:
{{- toYaml .Values.agentk8sglue.podTemplate.resources | nindent 10 }}
ports:
- containerPort: 10022
{{- with .Values.agentk8sglue.podTemplate.volumeMounts }}
volumeMounts:
{{- range .Values.agentk8sglue.podTemplate.volumes }}
- mountPath: {{ .path }}
name: {{ .name }}
{{- toYaml . | nindent 10 }}
{{- end }}
env:
- name: CLEARML_API_HOST

View File

@@ -19,31 +19,11 @@ spec:
{{- if .Values.imageCredentials.enabled }}
imagePullSecrets:
{{- if .Values.imageCredentials.existingSecret }}
- name: .Values.imageCredentials.existingSecret
- name: "{{.Values.imageCredentials.existingSecret}}"
{{- else }}
- name: {{ include "agentk8sglue.referenceName" . }}-clearml-agent-registry-key
{{- end }}
{{- end }}
initContainers:
- name: init-k8s-glue
image: "{{ .Values.agentk8sglue.image.repository }}:{{ .Values.agentk8sglue.image.tag }}"
command:
- /bin/sh
- -c
- >
set -x;
while [ $(curl {{ if not .Values.agentk8sglue.clearmlcheckCertificate }}--insecure{{ end }} -sw '%{http_code}' "{{.Values.agentk8sglue.apiServerUrlReference}}/debug.ping" -o /dev/null) -ne 200 ] ; do
echo "waiting for apiserver" ;
sleep 5 ;
done;
while [[ $(curl {{ if not .Values.agentk8sglue.clearmlcheckCertificate }}--insecure{{ end }} -sw '%{http_code}' "{{.Values.agentk8sglue.fileServerUrlReference}}/" -o /dev/null) =~ 403|405 ]] ; do
echo "waiting for fileserver" ;
sleep 5 ;
done;
while [ $(curl {{ if not .Values.agentk8sglue.clearmlcheckCertificate }}--insecure{{ end }} -sw '%{http_code}' "{{.Values.agentk8sglue.webServerUrlReference}}/" -o /dev/null) -ne 200 ] ; do
echo "waiting for webserver" ;
sleep 5 ;
done
containers:
- name: k8s-glue
image: "{{ .Values.agentk8sglue.image.repository }}:{{ .Values.agentk8sglue.image.tag }}"

View File

@@ -71,7 +71,12 @@ agentk8sglue:
# -- volumes definition for pods spawned to consume ClearML Task (example in values.yaml comments)
volumes: []
# - name: "yourvolume"
# path: "/yourpath"
# persistentVolumeClaim:
# claimName: "yourvolume"
# -- volumeMounts definition for pods spawned to consume ClearML Task (example in values.yaml comments)
volumeMounts: []
# - name: "yourvolume"
# mountPath: "/yourpath"
# -- environment variables for pods spawned to consume ClearML Task (example in values.yaml comments)
env: []
# # to setup access to private repo, setup secret with git credentials:

View File

@@ -2,8 +2,9 @@ apiVersion: v2
name: clearml
description: MLOps platform
type: application
version: "4.2.0"
appVersion: "1.6.0"
version: "4.3.0"
appVersion: "1.7.0"
kubeVersion: ">= 1.21.0-0 < 1.26.0-0"
home: https://clear.ml
icon: https://raw.githubusercontent.com/allegroai/clearml/master/docs/clearml-logo.svg
sources:

View File

@@ -1,6 +1,6 @@
# ClearML Ecosystem for Kubernetes
![Version: 4.2.0](https://img.shields.io/badge/Version-4.2.0-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.6.0](https://img.shields.io/badge/AppVersion-1.6.0-informational?style=flat-square)
![Version: 4.3.0](https://img.shields.io/badge/Version-4.3.0-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 1.7.0](https://img.shields.io/badge/AppVersion-1.7.0-informational?style=flat-square)
MLOps platform
@@ -119,6 +119,8 @@ For detailed instructions, see the [Optional Configuration](https://github.com/a
## Requirements
Kubernetes: `>= 1.21.0-0 < 1.26.0-0`
| Repository | Name | Version |
|------------|------|---------|
| file://../../dependency_charts/elasticsearch | elasticsearch | 7.16.2 |
@@ -136,7 +138,7 @@ For detailed instructions, see the [Optional Configuration](https://github.com/a
| apiserver.extraEnvs | list | `[]` | |
| apiserver.image.pullPolicy | string | `"IfNotPresent"` | |
| apiserver.image.repository | string | `"allegroai/clearml"` | |
| apiserver.image.tag | string | `"1.6.0"` | |
| apiserver.image.tag | string | `"1.7.0"` | |
| apiserver.livenessDelay | int | `60` | |
| apiserver.nodeSelector | object | `{}` | |
| apiserver.podAnnotations | object | `{}` | |
@@ -188,15 +190,15 @@ For detailed instructions, see the [Optional Configuration](https://github.com/a
| elasticsearch.volumeClaimTemplate.resources.requests.storage | string | `"50Gi"` | |
| externalServices.elasticsearchHost | string | `""` | Existing ElasticSearch Hostname to use if elasticsearch.enabled is false |
| externalServices.elasticsearchPort | int | `9200` | Existing ElasticSearch Port to use if elasticsearch.enabled is false |
| externalServices.mongodbHost | string | `""` | Existing MongoDB Hostname to use if elasticsearch.enabled is false |
| externalServices.mongodbPort | int | `27017` | Existing MongoDB Port to use if elasticsearch.enabled is false |
| externalServices.redisHost | string | `""` | Existing Redis Hostname to use if elasticsearch.enabled is false |
| externalServices.redisPort | int | `6379` | Existing Redis Port to use if elasticsearch.enabled is false |
| externalServices.mongodbHost | string | `""` | Existing MongoDB Hostname to use if mongodb.enabled is false |
| externalServices.mongodbPort | int | `27017` | Existing MongoDB Port to use if mongodb.enabled is false |
| externalServices.redisHost | string | `""` | Existing Redis Hostname to use if redis.enabled is false |
| externalServices.redisPort | int | `6379` | Existing Redis Port to use if redis.enabled is false |
| fileserver.affinity | object | `{}` | |
| fileserver.extraEnvs | list | `[]` | |
| fileserver.image.pullPolicy | string | `"IfNotPresent"` | |
| fileserver.image.repository | string | `"allegroai/clearml"` | |
| fileserver.image.tag | string | `"1.6.0"` | |
| fileserver.image.tag | string | `"1.7.0"` | |
| fileserver.nodeSelector | object | `{}` | |
| fileserver.podAnnotations | object | `{}` | |
| fileserver.replicaCount | int | `1` | |
@@ -263,7 +265,7 @@ For detailed instructions, see the [Optional Configuration](https://github.com/a
| webserver.extraEnvs | list | `[]` | |
| webserver.image.pullPolicy | string | `"IfNotPresent"` | |
| webserver.image.repository | string | `"allegroai/clearml"` | |
| webserver.image.tag | string | `"1.6.0"` | |
| webserver.image.tag | string | `"1.7.0"` | |
| webserver.nodeSelector | object | `{}` | |
| webserver.podAnnotations | object | `{}` | |
| webserver.replicaCount | int | `1` | |

View File

@@ -83,7 +83,7 @@ apiserver:
image:
repository: "allegroai/clearml"
pullPolicy: IfNotPresent
tag: "1.6.0"
tag: "1.7.0"
extraEnvs: []
@@ -155,7 +155,7 @@ fileserver:
image:
repository: "allegroai/clearml"
pullPolicy: IfNotPresent
tag: "1.6.0"
tag: "1.7.0"
extraEnvs: []
@@ -200,7 +200,7 @@ webserver:
image:
repository: "allegroai/clearml"
pullPolicy: IfNotPresent
tag: "1.6.0"
tag: "1.7.0"
podAnnotations: {}
@@ -229,13 +229,13 @@ externalServices:
elasticsearchHost: ""
# -- Existing ElasticSearch Port to use if elasticsearch.enabled is false
elasticsearchPort: 9200
# -- Existing MongoDB Hostname to use if elasticsearch.enabled is false
# -- Existing MongoDB Hostname to use if mongodb.enabled is false
mongodbHost: ""
# -- Existing MongoDB Port to use if elasticsearch.enabled is false
# -- Existing MongoDB Port to use if mongodb.enabled is false
mongodbPort: 27017
# -- Existing Redis Hostname to use if elasticsearch.enabled is false
# -- Existing Redis Hostname to use if redis.enabled is false
redisHost: ""
# -- Existing Redis Port to use if elasticsearch.enabled is false
# -- Existing Redis Port to use if redis.enabled is false
redisPort: 6379
redis: # configuration from https://github.com/bitnami/charts/blob/master/bitnami/redis/values.yaml

View File

@@ -1,6 +1,6 @@
---
{{- if .Values.maxUnavailable }}
apiVersion: policy/v1beta1
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: "{{ template "elasticsearch.uname" . }}-pdb"

View File

@@ -1,6 +1,6 @@
{{- if .Values.podSecurityPolicy.create -}}
{{- $fullName := include "elasticsearch.uname" . -}}
apiVersion: policy/v1beta1
apiVersion: policy/v1
kind: PodSecurityPolicy
metadata:
name: {{ default $fullName .Values.podSecurityPolicy.name | quote }}

View File

@@ -1,5 +1,5 @@
{{- if and (include "mongodb.arbiter.enabled" .) .Values.arbiter.pdb.create }}
apiVersion: policy/v1beta1
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: {{ include "mongodb.fullname" . }}-arbiter

View File

@@ -1,5 +1,5 @@
{{- if and (eq .Values.architecture "replicaset") .Values.pdb.create }}
apiVersion: policy/v1beta1
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: {{ include "mongodb.fullname" . }}

View File

@@ -58,7 +58,7 @@ Return the appropriate apiVersion for PodSecurityPolicy.
*/}}
{{- define "podSecurityPolicy.apiVersion" -}}
{{- if semverCompare ">=1.14-0" .Capabilities.KubeVersion.GitVersion -}}
{{- print "policy/v1beta1" -}}
{{- print "policy/v1" -}}
{{- else -}}
{{- print "extensions/v1beta1" -}}
{{- end -}}

View File

@@ -1,5 +1,5 @@
{{- if .Values.podDisruptionBudget.enabled }}
apiVersion: policy/v1beta1
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: {{ template "redis.fullname" . }}
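
The templates above hard-code `policy/v1`. As a rough sketch of which policy API version serves each kind, based on the Kubernetes deprecation timeline: `PodDisruptionBudget` is available under `policy/v1` from 1.21 and lost `policy/v1beta1` in 1.25, while `PodSecurityPolicy` was never promoted to `policy/v1` and was removed in 1.25. `policy_api_version` is a hypothetical helper, not part of these charts, and it assumes Kubernetes 1.19 or later.

```python
def policy_api_version(kind, kube_minor):
    """Return the policy API version serving `kind` on Kubernetes 1.<kube_minor>, or None."""
    if kind == "PodDisruptionBudget":
        # policy/v1 is served from 1.21; policy/v1beta1 was removed in 1.25.
        return "policy/v1" if kube_minor >= 21 else "policy/v1beta1"
    if kind == "PodSecurityPolicy":
        # PodSecurityPolicy only ever existed as v1beta1 and was removed in 1.25.
        return "policy/v1beta1" if kube_minor < 25 else None
    raise ValueError(f"unsupported kind: {kind!r}")


# Minor versions covered by the updated CI matrix.
pdb_versions = {minor: policy_api_version("PodDisruptionBudget", minor) for minor in (22, 23, 24, 25)}
```

For every minor version in the matrix the PodDisruptionBudget lookup yields `policy/v1`, which matches the change in the PDB templates. The same lookup for `PodSecurityPolicy` returns `None` on 1.25, so those templates only render usefully on older clusters.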