diff --git a/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md b/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md index 604d169c..14f74dfa 100644 --- a/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md +++ b/docs/deploying_clearml/enterprise_deploy/multi_tenant_k8s.md @@ -4,668 +4,298 @@ title: Multi-Tenant Service on Kubernetes This guide provides step-by-step instructions for installing a ClearML multi-tenant service on a Kubernetes cluster. -It covers the installation and configuration steps necessary to set up ClearML in a cloud environment, including -enabling specific features and setting up necessary components. +## ClearML Server -## Prerequisites +To install the ClearML Server, follow the [ClearML Kubernetes Installation guide](k8s.md). -* A Kubernetes cluster -* Credentials for the ClearML Enterprise Helm chart repository -* Credentials for the ClearML Enterprise DockerHub repository -* Credentials for the ClearML billing DockerHub repository -* URL for downloading the ClearML Enterprise applications configuration -* ClearML Billing server Helm chart +Update the Server's `clearml-values.override.yaml` with the following values: -## Setting up ClearML Helm Repository - -You need to add the ClearML Enterprise Helm repository to your local Helm setup. This repository contains the Helm -charts required for deploying the ClearML Server and its components. - -To add the ClearML Enterprise repository using the following command. 
Replace `` with the private tokens sent to -you by ClearML: - -``` -helm repo add allegroai-enterprise --username --password +```yaml +apiserver: + extraEnvs: + - name: CLEARML__services__organization__features__user_management_advanced + value: "true" + - name: CLEARML__services__auth__ui_features_per_role__user__show_datasets + value: "false" + - name: CLEARML__services__auth__ui_features_per_role__user__show_orchestration + value: "false" + - name: CLEARML__services__workers__resource_usages__supervisor_company + value: "" + - name: CLEARML__secure__credentials__supervisor__role + value: "system" + - name: CLEARML__secure__credentials__supervisor__allow_login + value: "true" + - name: CLEARML__secure__credentials__supervisor__user_key + value: "" + - name: CLEARML__secure__credentials__supervisor__user_secret + value: "" + - name: CLEARML__secure__credentials__supervisor__sec_groups + value: "[\"users\", \"admins\", \"queue_admins\"]" + - name: CLEARML__secure__credentials__supervisor__email + value: "\"\"" + - name: CLEARML__apiserver__company__unique_names + value: "true" ``` -## Enabling Dynamic MIG GPUs +These settings configure the **supervisor**, an administrative user belonging to a designated supervisor company. +Admins in the supervisor company can view the resource usage dashboards across all tenants. -Allocating GPU fractions dynamically make use of the NVIDIA GPU operator. +The `` and `` can be used to log in as the +supervisor user from the ClearML Web UI via `app.`. -1. Add the NVIDIA Helm repository: +Note that the `` value must be explicitly quoted. To do so, put `\"` around the quoted value. +Example `"\"email@example.com\""`. - ``` - helm repo add nvidia - helm repo update - ``` +For configuring SSO, see the [SSO (Identity Provider) Setup guide](extra_configs/sso_login.md). -2. 
Install the NVIDIA GPU operator with the following configuration: +### Create a Tenant - ``` - helm install -n gpu-operator \\ - gpu-operator \\ - nvidia/gpu-operator \\ - --create-namespace \\ - --set migManager.enabled=false \\ - --set mig.strategy=mixed - ``` +This section explains how to create a new tenant (company) in the ClearML Server using the ClearML API. -## Install CDMO Chart +Note that placeholders (``) in the following configuration should be substituted with a valid domain based +on your installation values. -The ClearML Dynamic MIG Operator (CDMO) enables running AI workloads on k8s with optimized hardware utilization and -workload performance by facilitating MIG GPUs partitioning. +* **Define variables to use in the next steps:** -1. Prepare the `overrides.yaml` file so it will contain the following content. Replace `` - with the private token provided by ClearML: - - ``` - imageCredentials: - password: "" - ``` - -2. Install the CDMO chart: - - ``` - helm install -n cdmo-operator \\ - cdmo \\ - allegroai-enterprise/clearml-dynamic-mig-operator \\ - --create-namespace \\ - -f overrides.yaml - ``` - -### Enable MIG support - -1. Enable dynamic MIG support on your cluster by running the following command on **all nodes used for training** (run - for **each GPU** ID in your cluster): - - ``` - nvidia-smi -i -mig 1 - ``` - - This command can be issued from inside the `nvidia-device-plugin-daemonset` pod on the related node. - - If the result of the previous command indicates that a node reboot is necessary, perform the reboot. - -2. After enabling MIG support, label the MIG GPU nodes accordingly. This labeling helps in identifying nodes configured - with MIG support for resource management and scheduling: - - ``` - kubectl label nodes "cdmo.clear.ml/gpu-partitioning=mig" - ``` - -## Install ClearML Chart - -Install the ClearML chart with the required configuration: - -1. Prepare the `overrides.yaml` file and input the following content. 
Make sure to replace `` and `` - with a valid domain that will have records pointing to the ingress controller accordingly. - The credentials specified in `` and `` can be used to log in as the - supervisor user in the web UI. - - Note that the `` value must be explicitly quoted. To do so, put `\\"` around the quoted value. - For example `"\\"email@example.com\\””`. - - ``` - imageCredentials: - password: "" - clearml: - cookieDomain: "" - apiserver: - image: - tag: "3.21.6-1443" - ingress: - enabled: true - hostName: "api." - service: - type: ClusterIP - extraEnvs: - - name: CLEARML__billing__enabled: - value: "true" - - name: CLEARML__HOSTS__KAFKA__BILLING__HOST - value: "[clearml-billing-kafka.clearml-billing:9092]" - - name: CLEARML__HOSTS__REDIS__BILLING__HOST - value: clearml-billing-redis-master.clearml-billing - - name: CLEARML__HOSTS__REDIS__BILLING__DB - value: "2" - - name: CLEARML__SECURE__KAFKA__BILLING__security_protocol - value: SASL_PLAINTEXT - - name: CLEARML__SECURE__KAFKA__BILLING__sasl_mechanism - value: SCRAM-SHA-512 - - name: CLEARML__SECURE__KAFKA__BILLING__sasl_plain_username - value: billing - - name: CLEARML__SECURE__KAFKA__BILLING__sasl_plain_password - value: "jdhfKmsd1" - - name: CLEARML__secure__login__sso__oauth_client__auth0__client_id - value: "" - - name: CLEARML__secure__login__sso__oauth_client__auth0__client_secret - value: "" - - name: CLEARML__services__login__sso__oauth_client__auth0__base_url - value: "" - - name: CLEARML__services__login__sso__oauth_client__auth0__authorize_url - value: "" - - name: CLEARML__services__login__sso__oauth_client__auth0__access_token_url - value: "" - - name: CLEARML__services__login__sso__oauth_client__auth0__audience - value: "" - - name: CLEARML__services__organization__features__user_management_advanced - value: "true" - - name: CLEARML__services__auth__ui_features_per_role__user__show_datasets - value: "false" - - name: 
CLEARML__services__auth__ui_features_per_role__user__show_orchestration - value: "false" - - name: CLEARML__services__applications__max_running_apps_per_company - value: "3" - - name: CLEARML__services__auth__default_groups__users__features - value: "[\\"applications\\"]" - - name: CLEARML__services__auth__default_groups__admins__features - value: "[\\"config_vault\\", \\"experiments\\", \\"queues\\", \\"show_projects\\", \\"resource_dashboard\\", \\"user_management\\", \\"user_management_advanced\\", \\"app_management\\", \\"sso_management\\", \\"service_users\\", \\"resource_policy\\"]" - - name: CLEARML__services__workers__resource_usages__supervisor_company - value: "d1bd92a3b039400cbafc60a7a5b1e52b" # Default company - - name: CLEARML__secure__credentials__supervisor__role - value: "system" - - name: CLEARML__secure__credentials__supervisor__allow_login - value: "true" - - name: CLEARML__secure__credentials__supervisor__user_key - value: "" - - name: CLEARML__secure__credentials__supervisor__user_secret - value: "" - - name: CLEARML__secure__credentials__supervisor__sec_groups - value: "[\\"users\\", \\"admins\\", \\"queue_admins\\"]" - - name: CLEARML__secure__credentials__supervisor__email - value: "\\"\\"" - - name: CLEARML__apiserver__company__unique_names - value: "true" - fileserver: - ingress: - enabled: true - hostName: "file." - service: - type: ClusterIP - webserver: - image: - tag: "3.21.3-1657" - ingress: - enabled: true - hostName: "app." - service: - type: ClusterIP - clearmlApplications: - enabled: true - ``` - -2. Install ClearML: - - ``` - helm install -n clearml \\ - clearml \\ - allegroai-enterprise/clearml-enterprise \\ - --create-namespace \\ - -f overrides.yaml - ``` - -## Shared Redis installation - -Set up a shared Redis instance that multiple components of your ClearML deployment can use: - -1. lf not there already, add Bitnami repository: - - ``` - helm repo add bitnami - ``` - -2. 
Prepare the `overrides.yaml` with the following content: - - ``` - auth: - password: "sdkWoq23" - ``` - -3. Install Redis: - - ``` - helm install -n redis-shared \\ - redis \\ - bitnami/redis \\ - --create-namespace \\ - --version=17.8.3 \\ - -f overrides.yaml - ``` - -## Install Billing Chart - -The billing chart is not available as part of the ClearML private Helm repo. `clearml-billing-1.1.0.tgz` is directly -provided by the ClearML team. - -1. Prepare `values.override.yaml` - Create the file with the following content, replacing `` - with the appropriate value: - - ``` - imageCredentials: - username: dockerhubcustpocbillingaccess - password: "" - ``` - -1. Install the billing chart: - - ``` - helm install -n clearml-billing \\ - clearml-billing \\ - clearml-billing-1.0.0.tgz \\ - --create-namespace \\ - -f overrides.yaml - ``` - -## Namespace Isolation using Network Policies - -For enhanced security, isolate namespaces using the following NetworkPolicies: - -``` -apiVersion: networking.k8s.io/v1 -kind: NetworkPolicy -metadata: - name: default-deny-ingress - namespace: clearml -spec: - podSelector: {} - policyTypes: - - Ingress - ingress: - - from: - - podSelector: {} ---- -apiVersion: networking.k8s.io/v1 -kind: NetworkPolicy -metadata: - name: allow-clearml-ingress - namespace: clearml -spec: - podSelector: - matchLabels: - app.kubernetes.io/name: clearml-clearml-enterprise - policyTypes: - - Ingress - ingress: - - from: - - ipBlock: - cidr: 0.0.0.0/0 ---- -apiVersion: networking.k8s.io/v1 -kind: NetworkPolicy -metadata: - name: allow-clearml-ingress - namespace: clearml-billing -spec: - podSelector: {} - policyTypes: - - Ingress - ingress: - - from: - - podSelector: {} - - namespaceSelector: - matchLabels: - kubernetes.io/metadata.name: clearml -``` - -## Application Installation - -To install ClearML GUI applications: - -1. 
Get the apps to install and the installation script by downloading and extracting the archive provided by ClearML - - ``` - wget -O apps.zip "" - unzip apps.zip - ``` - -2. Install the apps: - - ``` - python upload_apps.py \\ --host $APISERVER_ADDRESS \\ --user $APISERVER_USER --password $APISERVER_PASSWORD \\ --dir apps -ml - ``` - -## Tenant Configuration - -Create tenants and corresponding admin users, and set up an SSO domain whitelist for secure access. To configure tenants, -follow these steps (all requests must be authenticated by root or admin). Note that placeholders like `` -must be substituted with valid domain names or values from responses. - -1. Define the following variables: - - ``` + ```bash APISERVER_URL="https://api." APISERVER_KEY="" APISERVER_SECRET="" ``` -2. Create a **Tenant** (company): + :::note + The apiserver key and secret should be the same as those used for installing the ClearML Enterprise Server Chart. + ::: - ``` - curl $APISERVER_URL/system.create_company \\ - -H "Content-Type: application/json" \\ - -u $APISERVER_KEY:$APISERVER_SECRET \\ - -d '{"name":""}' +* **Create a Tenant (company):** + + ```bash + curl $APISERVER_URL/system.create_company \ + -H "Content-Type: application/json" \ + -u $APISERVER_KEY:$APISERVER_SECRET \ + -d '{"name":""}' ``` - This returns the new Company ID (``). If needed, you can list all companies with the following command: + The result returns the new Company ID (``). - ``` + To view existing tenants: + + ```bash curl -u $APISERVER_KEY:$APISERVER_SECRET $APISERVER_URL/system.get_companies ``` -3. 
Create an **Admin User**: +* **Create an Admin User for the Tenant:** - ``` - curl $APISERVER_URL/auth.create_user \\ - -H "Content-Type: application/json" \\ - -u $APISERVER_KEY:$APISERVER_SECRET \\ - -d '{"name":"","company":"","email":"","role":"admin"}' + ```bash + curl $APISERVER_URL/auth.create_user \ + -H "Content-Type: application/json" \ + -u $APISERVER_KEY:$APISERVER_SECRET \ + -d '{"name":"","company":"","email":"","role":"admin","internal":"true"}' ``` - This returns the new User ID (``). + The result returns the new User ID (``). -4. Generate **Credentials** for the new Admin User: +* **Create Credentials for the new Admin User:** - ``` - curl $APISERVER_URL/auth.create_credentials \\ - -H "Content-Type: application/json" \\ - -H "X-Clearml-Impersonate-As: " \\ - -u $APISERVER_KEY:$APISERVER_SECRET + ```bash + curl $APISERVER_URL/auth.create_credentials \ + -H "Content-Type: application/json" \ + -H "X-Clearml-Impersonate-As: " \ + -u $APISERVER_KEY:$APISERVER_SECRET ``` - This returns a set of key and secret credentials associated with the new Admin User. + The result returns a set of key and secret credentials associated with the new Admin User. -5. Create an SSO Domain **Whitelist**. The `` is the email domain setup for users to access through SSO. + :::note + You can use this set of credentials to set up a [ClearML Agent](agent_k8s.md) or [App Gateway](appgw.md) for the new Tenant. + ::: - ``` - curl $APISERVER_URL/login.set_domains \\ - -H "Content-Type: application/json" \\ - -H "X-Clearml-Act-As: " \\ - -u $APISERVER_KEY:$APISERVER_SECRET \\ - -d '{"domains":[""]}' +#### Create IDP/SSO Sign-in Rules + +You can configure how new users are assigned to tenants upon first signing in to the system using one or more of the +following methods: + +* **Route an email to a tenant based on the email's domain:** + + Automatically assign new users to a tenant based on their email domain. 
+ + :::caution + Note that providing the same domain name for multiple tenants will result in unstable behavior and should be avoided. + ::: + + ```bash + curl $APISERVER_URL/login.set_domains \ + -H "Content-Type: application/json" \ + -H "X-Clearml-Act-As: " \ + -u $APISERVER_KEY:$APISERVER_SECRET \ + -d '{"domains":[""]}' ``` -### Install ClearML Agent Chart + * `` is the email domain set up for users to access through SSO (e.g. `"acme.io"`, `"clear.ml"`). + * All new users with matching domains will be routed to the associated tenant. -To install the ClearML Agent Chart, follow these steps: +* **Route specific email(s) to a tenant:** -1. Prepare the `overrides.yaml` file with the following content. Make sure to replace placeholders like - ``, ``, and `` with the appropriate values: + Assign specific email addresses to a tenant. You can + use the `is_admin` property to choose whether these users will be set as admins in this tenant upon login. - ``` - imageCredentials: - password: "" - clearml: - agentk8sglueKey: "-" # TODO --> Generate credentials from API in the new tenant - agentk8sglueSecret: "-" # TODO --> Generate credentials from API in the new tenant - agentk8sglue: - extraEnvs: - - name: CLEARML_K8S_SUPPORT_SUSPENSION - value: "1" - - name: CLEARML_K8S_PORTS_MODE_ON_REQUEST_ONLY - value: "1" - - name: CLEARML_AGENT_REDIS_HOST - value: "redis-master.redis-shared" - - name: CLEARML_AGENT_REDIS_PORT - value: "6379" - - name: CLEARML_AGENT_REDIS_DB - value: "0" - - name: CLEARML_AGENT_REDIS_PASSWORD - value: "sdkWoq23" - image: - tag: 1.24-1.8.1rc99-159 - monitoredResources: - maxResources: 3 - minResourcesFieldName: "metadata|labels|required-resources" - maxResourcesFieldName: "metadata|labels|required-resources" - apiServerUrlReference: "https://api." - fileServerUrlReference: "https://file." - webServerUrlReference: "https://app." 
- defaultContainerImage: "python:3.9" - debugMode: true - createQueues: true - queues: - default: - templateOverrides: - labels: - required-resources: "0.5" - billing-monitored: "true" - queueSettings: - maxPods: 10 - gpu-fraction-1_00: - templateOverrides: - labels: - required-resources: "1" - billing-monitored: "true" - resources: - limits: - nvidia.com/mig-7g.40gb: 1 - clear.ml/fraction-1: "1" - queueSettings: - maxPods: 10 - gpu-fraction-0_50: - templateOverrides: - labels: - required-resources: "0.5" - billing-monitored: "true" - resources: - limits: - nvidia.com/mig-3g.20gb: 1 - clear.ml/fraction-1: "0.5" - queueSettings: - maxPods: 10 - gpu-fraction-0_25: - templateOverrides: - labels: - required-resources: "0.25" - billing-monitored: "true" - resources: - limits: - nvidia.com/mig-2g.10gb: 1 - clear.ml/fraction-1: "0.25" - queueSettings: - maxPods: 10 - sessions: - portModeEnabled: false # set to true when using TCP ports mode - agentID: "" - externalIP: 0.0.0.0 # IP of one of the workers - startingPort: 31010 # be careful to not overlap other tenants (startingPort + maxServices) - maxServices: 10 + Note that you can create more than one list per tenant (using multiple API calls). This way you can create one list + for admin users and another for non-admin users. + + :::caution + Note that including the same email address in more than a single tenant's list will result in unstable behavior and + should be avoided. + ::: + + ```bash + curl $APISERVER_URL/login.add_whitelist_entries \ + -H "Content-Type: application/json" \ + -H "X-Clearml-Act-As: " \ + -u $APISERVER_KEY:$APISERVER_SECRET \ + -d '{"emails":["", "", ...],"is_admin":false}' ``` -2. Install the ClearML Agent Chart in the specified tenant namespace: + To remove an email(s) from these lists, use the following API call. 
Note that this will not affect a user who has + already logged in using one of these email addresses: - ``` - helm install -n \\ - clearml-agent \\ - allegroai-enterprise/clearml-enterprise-agent \\ - --create-namespace \\ - -f overrides.yaml + ```bash + curl $APISERVER_URL/login.remove_whitelist_entries \ + -H "Content-Type: application/json" \ + -H "X-Clearml-Act-As: " \ + -u $APISERVER_KEY:$APISERVER_SECRET \ + -d '{"emails":["", "", ...]}' ``` -3. Create a queue via the API: +##### View Current Login Routing Rules - ``` - curl $APISERVER_URL/queues.create \\ - -H "Content-Type: application/json" \\ - -H "X-Clearml-Impersonate-As: 75557e2ab172405bbe153705e91d1782" \\ - -u $APISERVER_KEY:$APISERVER_SECRET \\ - -d '{"name":"default"}' - ``` - -### Tenant Namespace Isolation with NetworkPolicies - -To ensure network isolation for each tenant, you need to create a `NetworkPolicy` in the tenant namespace. This way -the entire namespace/tenant will not accept any connection from other namespaces. - -Create a `NetworkPolicy` in the tenant namespace with the following configuration: - - ``` - apiVersion: networking.k8s.io/v1 - kind: NetworkPolicy - metadata: - name: default-deny-ingress - spec: - podSelector: {} - policyTypes: - - Ingress - ingress: - - from: - - podSelector: {} - ``` - -### Install the App Gateway Router Chart - -Install the App Gateway Router in your Kubernetes cluster, allowing it to manage and route tasks: - -1. Prepare the `overrides.yaml` file with the following content: - - ``` - imageCredentials: - password: "" - clearml: - apiServerUrlReference: "" - apiserverKey: "" - apiserverSecret: "" - ingress: - enabled: true - hostName: "" - ``` - -2. 
Install App Gateway Router in the specified tenant namespace: - - ``` - helm install -n \\ - clearml-ttr \\ - clearml-enterprise/clearml-task-traffic-router \\ - --create-namespace \\ - -f overrides.yaml - ``` - -## Configuring Options per Tenant - -### Override Options When Creating a New Tenant - -When creating a new tenant company, you can specify several tenant options. These include: - -* `features` - Add features to a company -* `exclude_features` - Exclude features from a company. -* `allowed_users` - Set the maximum number of users for a company. - -#### Example: Create a New Tenant with a Specific Feature Set - -``` -curl $APISERVER_URL/system.create_company \ --H "Content-Type: application/json" \ --u $APISERVER_KEY:$APISERVER_SECRET \ --d '{"name":"", "defaults": { "allowed_users": "10", "features": ["experiments"], "exclude_features": ["app_management", "applications", "user_management"] }}' +To get the current IDP/SSO login rule settings for this tenant: + +```bash +curl $APISERVER_URL/login.get_settings \ + -H "X-Clearml-Act-As: " \ + -u $APISERVER_KEY:$APISERVER_SECRET ``` -**Note**: make sure to replace the `` placeholder. +### Feature Control by User Group -### Limit Features for all Users +The server's `clearml-values.override.yaml` can control the features available to +some users or groups in the system. -This Helm Chart value in the `overrides.yaml` will have priority over all tenants, and will limit the features -available to any user in the system. This means that even if the feature is enabled for the tenant, if it's not in this -list, the user will not see it. +Example: with the following configuration, all users in the `users` group will only have the `applications` feature enabled. -Example: all users will only have the `applications` feature enabled. 
- -``` +```yaml apiserver: extraEnvs: - name: CLEARML__services__auth__default_groups__users__features value: "[\"applications\"]" ``` -**Available Features**: +See a list of available features under [Available Features](#available-features). -* `applications` - Viewing and running applications -* `data_management` - Working with hyper-datasets and dataviews -* `experiments` - Viewing experiment table and launching experiments -* `queues` - Viewing the queues screen -* `queue_management` - Creating and deleting queues -* `pipelines` - Viewing/managing pipelines in the system -* `reports` - Viewing and managing reports in the system -* `show_dashboard` - Show the dashboard screen -* `show_projects` - Show the projects menu option -* `resource_dashboard` - Display the resource dashboard in the orchestration page +## Workers +To install and configure the ClearML components that run user workloads, refer to: +* [ClearML Enterprise Agent](agent_k8s.md) +* [App Gateway](appgw.md). -## Configuring Groups +:::note +Make sure to set up Agent and App Gateway using a Tenant's admin user credentials created with the Tenant creation APIs +described [above](#create-a-tenant). +::: -Groups in ClearML are used to manage user permissions and control access to specific features within the platform. -The following section explains the different types of groups and how to configure them, with a focus on configuration-based, -cross-tenant groups. +### Tenant Separation -### Types of Groups +In multi-tenant setups, you can separate the tenant workers in different namespaces: + +* Create a Kubernetes Namespace for each tenant +* Deploy a dedicated ClearML Agent and AI Application Gateway in each Namespace. +* Configure a tenant Agent and Gateway with credentials created on the ClearML Server by a user of the same tenant. + +Additional network separation can be achieved via Kubernetes Network Policies. 
+ +## Additional Configuration + +### Override Options for New Tenants + +When creating a new tenant company, you can configure the following tenant options: + +* `features` - Add features to a company. +* `exclude_features` - Exclude features from a company. +* `allowed_users` - Set the maximum number of users for a company. + +```bash +curl $APISERVER_URL/system.create_company \ + -H "Content-Type: application/json" \ + -u $APISERVER_KEY:$APISERVER_SECRET \ + -d '{"name":"", "defaults": { "allowed_users": "10", "features": ["experiments"], "exclude_features": ["app_management", "applications", "user_management"] }}' +``` + +### Global Features Limits + +The following setting in `clearml-values.override.yaml` defines a global feature whitelist. It overrides all tenant-specific +configurations, ensuring that only the listed features are available to any user in the system. + +Example: Restrict all users to only the `applications` feature: + +```yaml +apiserver: + extraEnvs: + - name: CLEARML__services__auth__default_groups__users__features + value: "[\"applications\"]" +``` + +For the complete list of available features, see [Available Features](#available-features). + +### Configuring Groups + +ClearML groups are used to control user permissions and access to the platform. +This section describes the types of groups available and how to configure them, especially cross-tenant groups. + +#### Group Types ClearML utilizes several types of groups: -* **Built-in Groups** - These groups exist by default in every ClearML installation: - * **`users`**: All registered users automatically belong to this group. It typically defines the baseline set of +* **Built-in Groups** (default in every ClearML installation): + * **`users`**: All registered users belong to this group. It defines the baseline set of permissions and features available to everyone. * **`admins`**: Users in this group have administrative privileges. 
- * **`queue_admins`**: Users in this group have specific permissions to manage execution queues. -* **Tenant-Specific Groups (UI)** - Additional groups can be created specific to a tenant (organization workspace) + * **`queue_admins`**: Users in this group can manage task execution queues. +* **Tenant-Specific Groups (via UI)** - Additional tenant-specific groups can be created directly through the ClearML Web UI (under **Settings > Users & Groups**). Users can be assigned to these groups via - the UI. These groups are managed *within* a specific tenant. For more information, see [Users & Groups](../../webapp/settings/webapp_settings_users.md). + the UI. For more information, see [Users & Groups](../../webapp/settings/webapp_settings_users.md). * **Cross-Tenant Groups (Configuration)** - These groups are defined centrally in the ClearML configuration files (e.g., Helm chart values, docker-compose environment variables). They offer several advantages: - * **Cross-Tenant Definition:** Defined once in the configuration, applicable across the deployment. - * **Fine-Grained Feature Control:** Allows precise assignment of specific ClearML features to groups. - * **Automation:** Suitable for infrastructure-as-code and automated deployment setups. + * Reusable across tenants + * Fine-grained control over enabled features +#### Configuring Cross-Tenant Groups +Cross-tenant groups are defined using environment variables (e.g., in `apiserver`). The naming convention is: -### Configuring Cross-Tenant Groups - -To define a cross-tenant group, you need to set specific configuration variables. These are typically set as environment -variables for the relevant ClearML services (like `apiserver`). The naming convention follows this -pattern: `CLEARML__services__auth__default_groups____`. +``` +CLEARML__services__auth__default_groups____ +``` Replace `` with the desired name for your group (e.g., `my_group_name`, `Data_Scientists`, `MLOps_Engineers`). 
+##### Configuration Variables For each group you define in the configuration, you need to specify the following properties: -* **`id`**: A unique identifier for the group. This **must** be a standard UUID (Universally Unique Identifier). You can - generate one using various online tools or libraries. - - * Variable Name: `CLEARML__services__auth__default_groups____id` - * Example Value: `"abcd-1234-abcd-1234"` +| Property | Description | Variable Name | Example Value | +| ------------ | -------------------------------------------------------------------------- | ------------------------------------ | ------------------------------- | +| `id` | A unique identifier for the group. This **must** be a standard UUID (Universally Unique Identifier). You can generate one using various online tools or libraries. | `CLEARML__services__auth__default_groups____id` | `"abcd-1234-abcd-1234"` | +| `name` | Display name for the group (should match `` used in the variable path) | `CLEARML__services__auth__default_groups____name` | `"My Group Name"`, `"MLOps Team"` | +| `features` | JSON list of features to enable for this group. For the complete list of available features, see [Available Features](#available-features). Note that the features must be defined for the tenant or for the entire server in order to affect the group. By default, all the features of the tenant are available to all users. | `CLEARML__services__auth__default_groups____features` | `'["applications", "experiments", "pipelines", "reports", "show_dashboard", "show_projects"]'` (Note the single quotes wrapping the JSON string if setting via YAML/environment variables). | +| `assignable` | Whether admins can assign users to this group from the ClearML Web UI (`true`/`false`). If `false`, group membership is managed externally or implicitly. 
| `CLEARML__services__auth__default_groups____assignable` | `"false"` | +| `system` | Always set to `"false"` for custom groups | `CLEARML__services__auth__default_groups____system` | `"false"` | -* **`name`**: The display name of the group. This should match the `` used in the variable path. - - * Variable Name: `CLEARML__services__auth__default_groups____name` - * Example Value: `"My Group Name"` -* **`features`**: A JSON-formatted list of strings, where each string is a feature name to be enabled for this group. See - [Available Features](#available-features) for a list of valid feature names. Note that the features must be defined - for the tenant or for the entire server in order to affect the group. By default, all the features of the tenant are - available to all users. - - * Variable Name: `CLEARML__services__auth__default_groups____features` - * Example Value: `'["applications", "experiments", "pipelines", "reports", "show_dashboard", "show_projects"]'` (Note - the single quotes wrapping the JSON string if setting via YAML/environment variables). - -* **`assignable`**: A boolean (`"true"` or `"false"`) indicating whether administrators can add users to this group via - the ClearML Web UI. If `false`, group membership is managed externally or implicitly. Configuration-defined groups - often have this set to `false`. - - * Variable Name: `CLEARML__services__auth__default_groups____assignable` - * Example Value: `"false"` - -* **`system`**: A boolean flag. This should **always be set to `"false"`** for custom-defined groups. 
- - * Variable Name: `CLEARML__services__auth__default_groups____system` - * Example Value: `"false"` - -#### Example Configuration +##### Example Configuration The following example demonstrates how you would define a group named `my_group_name` with a specific set of features that cannot be assigned via the UI: -``` +```yaml # Example configuration snippet (e.g., in Helm values.yaml or docker-compose.yml environment section) # Unique group id for my_group_name @@ -689,37 +319,11 @@ that cannot be assigned via the UI: value: "false" ``` -### Available Features - -The following features can be assigned to groups via the `features` configuration variable: - -| Feature Name | Description | Notes | -| :---- | :---- | :---- | -| `user_management` | Allows viewing company users and groups, and editing group memberships. | Only effective if the group is `assignable`. | -| `user_management_advanced` | Allows direct creation of users (bypassing invites) by admins and system users. | Often requires enabling at the organization level too. | -| `permissions` | Enables editing of Role-Based Access Control (RBAC) rules. | No | -| `applications` | Allows users to work with ClearML Applications (viewing, running). | Excludes management operations (upload/delete). | -| `app_management` | Allows application management operations: upload, delete, enable, disable. | No | -| `experiments` | Allows working with experiments. | *Deprecated/Not Used.* All users have access regardless of this flag. | -| `queues` | Allows working with queues. | *Deprecated/Not Used.* All users have access regardless of this flag. | -| `queue_management` | Allows create, update, and delete operations on queues. | No | -| `data_management` | Controls access to Hyper-Datasets. | Actual access might also depend on `apiserver.services.excluded`. | -| `config_vault` | Enables the configuration vaults feature. | No | -| `pipelines` | Enables access to Pipelines (building and running). 
| No | -| `reports` | Enables access to Reports. | No | -| `resource_dashboard` | Enables access to the compute resource dashboard feature. | No | -| `sso_management` | Enables the SSO (Single Sign-On) configuration wizard. | No | -| `service_users` | Enables support for creating and managing service accounts (API keys). | No | -| `resource_policy` | Enables the resource policy feature. | May default to a trial feature if not explicitly enabled. | -| `model_serving` | Enables access to the model serving endpoints feature. | No | -| `show_dashboard` | Makes the "Dashboard" menu item visible in the UI sidebar. | No | -| `show_model_view` | Makes the "Models" menu item visible in the UI sidebar. | No | -| `show_projects` | Makes the "Projects" menu item visible in the UI sidebar. | No | -| `show_orchestration` | Makes the "Orchestration" menu item visible in the UI sidebar. | Available from apiserver version 3.25 | -| `show_datasets` | Makes the "Datasets" menu item visible in the UI sidebar. | Available from apiserver version 3.25 | - ### Feature Assignment Strategy +ClearML uses a feature-based permission model, where each user’s access is determined by the groups they belong to. +This section explains how feature assignment works and how to configure it effectively. + #### Combining Features If a user belongs to multiple groups (e.g., the default `users` group and a custom `my_group_name` group), their @@ -727,45 +331,49 @@ effective feature set is the **union** (combination) of all features from all gr #### Configuring the Default 'users' Group -Because all users belong to the `users` group, and features are combined, it's crucial to configure the `users` group +Since all users belong to the `users` group, you should configure the `users` group appropriately. You generally have two options: -1. **Minimum Shared Features:** Assign only the absolute minimum set of features that *every single user* should have to - the `users` group. +1. 
**Minimum Shared Features:** Only assign features that every user should always have. 2. **Empty Feature Set:** Assign an empty list (`[]`) to the `users` group's features. This means users only get features - explicitly granted by other groups they are members of. This is often the cleanest approach when using multiple custom groups. + explicitly granted to other groups they are members of. -**Example: Disabling all features by default for the `users` group:** - -``` -- name: CLEARML__services__auth__default_groups__users__features - value: '[]' - -``` + ```yaml + - name: CLEARML__services__auth__default_groups__users__features + value: '[]' + ``` :::note -You typically don't need to define the id, name, assignable, or system properties for built-in groups like users unless -you need to override default behavior, but you do configure their features. +For built-in groups like `users`, you typically only need to define the `features` property. You do not need to redefine +`id`, `name`, `assignable`, or `system` unless you need to override defaults. ::: -### Setting Server-Level or Tenant-level Features +#### Setting Server-Level or Tenant-Level Features -Features must be enabled for the entire server or for the tenant in order to allow setting it for specific groups. -Setting server wide feature is done using a different configuration pattern: `CLEARML__services__organization__features__`. +To assign a feature to a group, that feature must first be enabled globally (server-level) or per tenant. + +##### Enabling Features Globally +To enable a feature for the entire deployment, use: + +``` +CLEARML__services__organization__features__ +``` Setting one of these variables to `"true"` enables the feature globally.
**Example: Enabling `user_management_advanced` for the entire organization:** -``` +```yaml - name: CLEARML__services__organization__features__user_management_advanced value: "true" ``` +##### Enabling Features Per Tenant + To enable a feature for a specific tenant, use the following API call: -``` +```bash curl $APISERVER_URL/system.update_company_settings \ -H "Content-Type: application/json" \ -u $APISERVER_KEY:$APISERVER_SECRET \ @@ -775,15 +383,15 @@ curl $APISERVER_URL/system.update_company_settings \ }' ``` -By default, all users have access to all features, but this can be changed by setting specific features set per group as described above. +By default, all users have access to all features. You can restrict this by explicitly setting feature lists per group. -### Example: Defining Full Features for Admins +#### Example: Granting All Features for Admins While the `admins` group has inherent administrative privileges, you might want to explicitly ensure they have access to *all* configurable features defined via the `features` list, especially if you've restricted the default `users` group significantly. You might also need to enable certain features organization-wide. -``` +```yaml # Enable advanced user management for the whole organization - name: CLEARML__services__organization__features__user_management_advanced value: "true" @@ -806,5 +414,70 @@ significantly. You might also need to enable certain features organization-wide. ``` By combining configuration-defined groups, careful management of the default users group features, and organization-level -settings, you can create a flexible and secure permission model tailored to your ClearML deployment. Remember to -restart the relevant ClearML services after applying configuration changes. +settings, you can create a flexible and secure permission model tailored to your ClearML deployment. + +:::important +Remember to restart the relevant ClearML services after applying configuration changes. 
+::: + +### Per-Tenant Applications Settings + +You may want your users' applications in different tenants to have their own configuration and template on Kubernetes. +The ClearML Enterprise Server and Agent support different queue modes: + +- `global` (default) - A single Apps Agent on the server. The application's controllers will start on the server. +- `per_tenant` - Multiple Apps Agents, one per tenant (requires `agentk8sglue.appsQueue.enabled=true` on the Agents). The + application's controllers will start on the worker. + +Configure the Server's `clearml-values.override.yaml`: + +```yaml +clearmlApplications: + queueMode: "per_tenant" +``` + +Configure the Agent's `clearml-agent-values.override.yaml`: + +```yaml +agentk8sglue: + appsQueue: + enabled: true + # -- Here you can define queueSettings and templateOverrides as for other queues. + # queueSettings: + # templateOverrides: +``` + +:::note +This feature requires the Agent to be configured with internal admin credentials, as described in the +"Create an Admin User for the new tenant" section. Make sure to pass `"internal":"true"` and use the output +credentials for `clearml.agentk8sglueKey` and `clearml.agentk8sglueSecret` (or `existingAgentk8sglueSecret`). +::: + +## Available Features + +The following features can be assigned to groups via the `features` configuration variable: + +| Feature Name | Description | Notes | +| :---- | :---- | :---- | +| `user_management` | Allows viewing tenant users and groups, and editing group memberships. | Only effective if the group is `assignable`. | +| `user_management_advanced` | Allows direct creation of users (bypassing invites) by admins and system users. | Often also requires enabling at the organization level. | +| `permissions` | Enables editing of Role-Based Access Control (RBAC) rules. | No | +| `applications` | Allows users to work with [ClearML Applications](../../webapp/applications/apps_overview.md) (viewing, running).
| Excludes management operations (upload/delete). | +| `app_management` | Allows application management operations: upload, delete, enable, disable. | No | +| `experiments` | Allows working with experiments. | *Deprecated/Not Used.* All users have access regardless of this flag. | +| `queues` | Allows working with queues. | *Deprecated/Not Used.* All users have access regardless of this flag. | +| `queue_management` | Allows create, update, and delete operations on queues. | No | +| `data_management` | Controls access to [Hyper-Datasets](../../hyperdatasets/overview.md). | Access may also depend on `apiserver.services.excluded`. | +| `config_vault` | Enables the [configuration vaults](../../webapp/settings/webapp_settings_profile.md#configuration-vault) feature. | No | +| `pipelines` | Enables access to Pipelines (building and running). | No | +| `reports` | Enables access to [Reports](../../webapp/webapp_reports.md). | No | +| `resource_dashboard` | Enables access to the [orchestration dashboard](../../webapp/webapp_orchestration_dash.md) feature. | No | +| `sso_management` | Enables the SSO (Single Sign-On) configuration wizard. | No | +| `service_users` | Enables support for creating and managing service accounts (API keys). | No | +| `resource_policy` | Enables the [Resource Policies](../../webapp/resource_policies.md) feature. | May default to a trial feature if not explicitly enabled. | +| `model_serving` | Enables access to the [Model Endpoints](../../webapp/webapp_model_endoints.md) feature. | No | +| `show_dashboard` | Makes the "Dashboard" menu item visible in the UI sidebar. | No | +| `show_model_view` | Makes the "Models" menu item visible in the UI sidebar. | No | +| `show_projects` | Makes the "Projects" menu item visible in the UI sidebar. | No | +| `show_orchestration` | Makes the "Orchestration" menu item visible in the UI sidebar. 
| Available from apiserver version 3.25 | +| `show_datasets` | Makes the "Datasets" menu item visible in the UI sidebar. | Available from apiserver version 3.25 | \ No newline at end of file
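
The feature-combination rule described in this change (a user's effective feature set is the union of the features of every group they belong to, and a feature only takes effect if it is enabled for the server or the tenant) can be illustrated with a small Python sketch. This is only a model of the documented behavior, not ClearML code: the function names `effective_features` and `features_env_value`, and the sample group and feature sets, are hypothetical.

```python
import json


def effective_features(user_groups, group_features, enabled_features):
    """Model of the documented rule (an assumption, not ClearML's implementation):
    union the features of all the user's groups, then keep only the features
    that are enabled at the server or tenant level."""
    combined = set()
    for group in user_groups:
        combined |= set(group_features.get(group, []))
    return combined & set(enabled_features)


def features_env_value(features):
    """Render a feature list as the JSON string expected by a
    `...__features` environment variable value."""
    return json.dumps(sorted(features))


# Hypothetical sample data: `users` is left empty, so access comes only
# from the custom group.
group_features = {
    "users": [],
    "my_group_name": ["applications", "pipelines", "reports", "show_projects"],
}
# Features assumed to be enabled for the server or tenant in this example.
enabled = {"applications", "pipelines", "show_projects", "show_dashboard"}

result = effective_features(["users", "my_group_name"], group_features, enabled)
print(sorted(result))
print(features_env_value(group_features["users"]))
```

In this sample, `reports` is granted by the group but not enabled for the server or tenant, so it drops out of the result, while `show_dashboard` is enabled but not granted by any group, so it is also absent.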