clearml-docs/docs/webapp/settings/webapp_settings_resource_configs.md
2024-09-29 12:20:11 +03:00

Resource Configuration

Administrators can define Resource Policies to implement resource quotas and reservations for different user groups to prioritize workload usage across available resources.

Under the Resource Configuration section, administrators define the available resources and the way in which they will be allocated to different workloads.

[Image: Resource configuration page]

The Resource Configuration settings page shows the currently provisioned configuration: the defined resource pools, resource profiles, and the resource allocation architecture.

Resource Pools

A resource pool is an aggregation of resources available for use, such as a Kubernetes cluster or a GPU superpod. Administrators specify the total number of resources available in each pool, and the resource policy manager assigns workloads to a pool only up to that number of available resources.
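
Here is a minimal Python sketch of the capacity rule. It assumes a pool is just a declared total plus a running count of resources in use. `ResourcePool` and `try_assign` are hypothetical names used for illustration and are not part of ClearML's API. The sketch also assumes that a request the pool cannot cover is simply left pending.

```python
from dataclasses import dataclass


@dataclass
class ResourcePool:
    """Hypothetical model of a resource pool (not a ClearML API)."""

    name: str
    total: float        # total resources the administrator declared for the pool
    in_use: float = 0   # resources currently assigned to running jobs

    @property
    def available(self) -> float:
        return self.total - self.in_use

    def try_assign(self, amount: float) -> bool:
        """Assign `amount` resources only if the pool can still cover the request."""
        if amount <= self.available:
            self.in_use += amount
            return True
        return False  # assumption: the job keeps waiting; the pool is never over-committed


h100 = ResourcePool("H100", total=8)
print(h100.try_assign(4), h100.try_assign(4), h100.try_assign(4))  # True True False
```

The `in_use` and `total` fields correspond to the "in use out of the total available" figure shown on each pool card.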

Administrators control the execution priority within a pool across the resource profiles making use of it (e.g. if jobs of profile A and jobs of profile B currently need to run in a pool, allocate resources for profile A jobs first or vice versa).

The resource pool cards are displayed at the top of the Resource Configuration settings page. Each card displays the following information:

[Image: Resource pool card]

  • Pool name
  • Number of resources currently in use out of the total available resources
  • Execution Priority - List of linked profiles in order of execution priority.

Resource Profiles

Resource profiles represent the resource consumption requirements of jobs, such as the number of GPUs needed. Through Resource Policies, administrators use profiles to give users access to the available resource pools according to their jobs' resource requirements.

Administrators can control the resource pool allocation precedence within a profile (e.g. only run jobs on pool B if pool A cannot currently satisfy the profile's resource requirements).
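
The pool precedence rule can be sketched as a first-fit search over the profile's linked pools. `choose_pool` and its arguments are hypothetical. The sketch assumes that a job is placed only in a pool that can cover its whole allotment.

```python
def choose_pool(pool_precedence, available, allotment):
    """Return the first pool, in the profile's precedence order, that can satisfy `allotment`.

    pool_precedence: pool names as ordered on the profile card (highest precedence first)
    available: mapping of pool name -> currently free resources
    """
    for pool in pool_precedence:
        if available.get(pool, 0) >= allotment:
            return pool
    return None  # no linked pool can run the job right now; it stays queued


precedence = ["Pool A", "Pool B"]
print(choose_pool(precedence, {"Pool A": 2, "Pool B": 8}, allotment=4))  # Pool B
```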

Administrators can control the queuing priority within a profile across resource policies making use of it (e.g. if the R&D team and DevOps team both have pending jobs - run the R&D team's jobs first or vice versa).
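
The following sketches queuing priority within a single profile. It assumes that each resource policy feeds the profile through its own first-in, first-out queue. `next_job` is a hypothetical helper. The strict ordering is also an assumption drawn from the example above: a lower-priority policy is served only when every higher-priority queue is empty.

```python
from collections import deque


def next_job(policy_queues, policy_priority):
    """Pop the next job for a profile from its highest-priority non-empty policy queue.

    policy_queues: mapping of policy name -> deque of pending job IDs (oldest first)
    policy_priority: policy names in the order set on the profile, highest first
    """
    for policy in policy_priority:
        queue = policy_queues.get(policy)
        if queue:  # skip policies with no pending jobs
            return policy, queue.popleft()
    return None


queues = {"R&D": deque(["rd-1", "rd-2"]), "DevOps": deque(["ops-1"])}
priority = ["R&D", "DevOps"]
while (job := next_job(queues, priority)) is not None:
    print(job)
# ('R&D', 'rd-1')
# ('R&D', 'rd-2')
# ('DevOps', 'ops-1')
```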

The resource profile cards are displayed at the bottom of the Resource Configuration settings page. Each card displays the following information:

[Image: Resource profile card]

  • Profile name
  • Number of resources - Number of resources allocated to each job in this profile
  • List of pool links
  • Queued jobs - Number of currently pending jobs
  • Running jobs - Number of currently running jobs
  • Number of resource policies - Click to open the resource policy list and set the queuing priority order

Example Workflow

You have GPUs spread across a local H100 and additional bare metal servers, as well as on AWS (managed by an autoscaler). Assume that currently most of your resources are already assigned to jobs, and only 16 resources are available: 8 in the H100 resource pool and 8 in the Bare Metal pool:

[Image: Example resource pools]

Teams' jobs have varying resource requirements of 0.5, 2, 4, and 8 GPUs. Resource profiles are defined to reflect these:

[Image: Example resource profiles]

Connecting the profiles to the resource pools routes each profile's jobs to specific pools. Jobs enqueued through a profile run in its connected pools when resources are available there, according to each pool's execution priority. For example, the H100 pool runs jobs with the following precedence: 2 GPU jobs first, then 4 GPU jobs, then 8 GPU jobs, and lastly 0.5 GPU jobs.
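
The H100 pool's precedence can be expressed as a sort over pending jobs. The profile names (`2xGPU`, `0.5xGPU`, and so on) and `service_order` are illustrative assumptions. The sketch models only the order in which the pool considers jobs. It does not model whether a smaller, lower-priority job may start while a larger, higher-priority job waits for resources.

```python
# Execution priority of the H100 pool from the example, highest first.
# The profile names are illustrative labels (assumption), not required identifiers.
H100_PRIORITY = ["2xGPU", "4xGPU", "8xGPU", "0.5xGPU"]


def service_order(pending, execution_priority):
    """Sort pending (job_id, profile) pairs by the pool's execution priority.

    sorted() is stable, so jobs of the same profile keep their enqueue order.
    """
    rank = {profile: i for i, profile in enumerate(execution_priority)}
    return [job_id for job_id, _ in sorted(pending, key=lambda job: rank[job[1]])]


pending = [("j1", "0.5xGPU"), ("j2", "8xGPU"), ("j3", "2xGPU"), ("j4", "4xGPU"), ("j5", "2xGPU")]
print(service_order(pending, H100_PRIORITY))  # ['j3', 'j5', 'j4', 'j2', 'j1']
```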

[Image: Example profile priority]

Resource policies are implemented for two teams:

  • Dev team
  • Research team

Each team has a resource policy configured with 8 reserved resources and a 16-resource limit. Both teams use the 4xGPU profile (i.e. each job running through this profile requires 4 resources).
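
The sketch below shows the arithmetic behind this policy. With 4-resource jobs, the 8-resource reservation backs two jobs per team, and the 16-resource limit caps each team at four concurrent jobs. `jobs_covered` is a hypothetical helper. Treating the limit as a cap on concurrently used resources is an assumption.

```python
def jobs_covered(budget, job_size):
    """How many whole jobs of `job_size` resources fit in a resource budget."""
    return int(budget // job_size)


RESERVED, LIMIT, JOB_SIZE = 8, 16, 4  # per-team values from the example policy
print(jobs_covered(RESERVED, JOB_SIZE))  # 2 jobs are backed by the reservation
print(jobs_covered(LIMIT, JOB_SIZE))     # 4 jobs at most (assuming the limit caps concurrent use)
```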

[Image: Example resource policy]

The Dev team is prioritized over the Research team by placing it higher in the Resource Profile's Policies Priority list:

[Image: Example resource policy priority]

The Dev team and the Research team each enqueue four 4-resource jobs, and the Dev team's jobs are allocated resources first. The 4xGPU resource profile is connected to two resource pools: Bare Metal Low END GPUs (with the 4 GPU Low End link) and H100 Half a Superpod (with the 4 GPU H100 link).

[Image: Example resource profile-pool connections]

Resources are assigned from the Bare Metal pool first (precedence set on the resource profile card):

[Image: Example resource pool precedence]

If the first pool cannot currently satisfy the profile's resource requirements, resources are assigned from the next listed pool. Let's look at the first pool in the image below. The pool has 8 available resources, so it can run two 4-resource jobs, which go to the higher-priority Dev team.

[Image: Example resource pool card]

Since the Bare Metal pool has no more available resources, additional jobs are assigned resources from the next pool the resource profile is connected to: the H100 pool, which has 8 available resources. There are still 2 jobs pending from the Dev team, requiring 8 resources in total, and 4 jobs pending from the Research team, requiring 16 resources in total. The Dev team is already using its 8 reserved resources in the Bare Metal pool, while the Research team has not yet received any of its reservation. To honor the Research team's resource reservation, its first two jobs are assigned the required 8 resources from the H100 pool, even though the Dev team has higher priority.

With all available resources assigned, 2 jobs from each team remain pending until some of the currently running jobs finish and resources become available.
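
The walkthrough can be reproduced with a small simulation that assumes a two-pass allocation. In the first pass, each policy is filled up to its reservation, in queuing priority order. In the second pass, any leftover capacity is shared in the same order, up to each policy's limit. `TeamPolicy`, `place`, and `schedule` are hypothetical names. This is an illustrative model consistent with the example, not ClearML's actual scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class TeamPolicy:
    """Hypothetical model of a resource policy; not ClearML's data structure."""

    name: str
    reserved: int      # resources guaranteed to the team
    limit: int         # maximum resources the team may use at once
    pending: int       # queued jobs, all using the same profile
    in_use: int = 0
    placements: list = field(default_factory=list)


def place(policy, pools, job_size):
    """Run one job in the first pool (in profile precedence order) with room for it."""
    for pool_name, free in pools.items():
        if free >= job_size:
            pools[pool_name] = free - job_size
            policy.in_use += job_size
            policy.pending -= 1
            policy.placements.append(pool_name)
            return True
    return False


def schedule(pools, policies, job_size):
    """Assumed two-pass model: fill reservations first, then share leftovers up to limits."""
    for cap in ("reserved", "limit"):
        for policy in policies:  # policies listed in queuing priority order
            while policy.pending and policy.in_use + job_size <= getattr(policy, cap):
                if not place(policy, pools, job_size):
                    break


pools = {"Bare Metal": 8, "H100": 8}  # free resources, in the 4xGPU profile's precedence
dev = TeamPolicy("Dev", reserved=8, limit=16, pending=4)
research = TeamPolicy("Research", reserved=8, limit=16, pending=4)
schedule(pools, [dev, research], job_size=4)
print(dev.placements, dev.pending)            # ['Bare Metal', 'Bare Metal'] 2
print(research.placements, research.pending)  # ['H100', 'H100'] 2
```

With these inputs, the sketch ends with both pools fully used and two jobs pending per team, which matches the outcome described above. A single priority-only pass would instead place all four Dev jobs across the two pools and leave the Research team with none of its reserved resources. That is why this model runs the reservation pass first.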

Applying Resource Configuration

Administrators can globally activate or deactivate resource policy management. To enable the currently provisioned configuration, click the Enable resource management toggle. When resource management is enabled, the policy queues are serviced according to the provisioned resource profile and pool assignments. When it is disabled, the policy queues are no longer serviced, and tasks in these queues remain pending until resource policy management is re-enabled.

Administrators can add, edit, delete, and connect resource pools and profiles in the Resource Configuration settings page.

To make any change (create, delete, or modify a component) to the resource configuration, follow these steps:

  1. Click Open Editor to enter Editor mode
  2. After making the desired changes you have the following options:
    • Save - Save the changes you made. These changes will not be applied until you click on Provision
    • Provision - Apply the saved changes
    • Reset Configuration - Set the editor to the currently provisioned values. This will delete any unprovisioned changes (both saved and unsaved)
  3. Click Exit to leave Editor mode. The page will show the provisioned configuration. Unprovisioned saved changes will still be available in Editor mode.
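
The editor workflow involves three versions of the configuration: the provisioned configuration the page shows, the saved configuration, and the unsaved edits in the open editor. The hypothetical `ConfigEditor` class below models how Save, Provision, Reset Configuration, and Exit move data between these versions. It assumes that Exit discards unsaved edits, which the steps above do not state.

```python
import copy


class ConfigEditor:
    """Hypothetical model of the editor's three configuration states."""

    def __init__(self, provisioned):
        self.provisioned = provisioned                # what the page shows and the system uses
        self.saved = copy.deepcopy(provisioned)       # saved but not yet provisioned
        self.draft = None                             # edits in the open editor

    def open_editor(self):
        self.draft = copy.deepcopy(self.saved)        # editing resumes from saved changes

    def save(self):
        self.saved = copy.deepcopy(self.draft)

    def provision(self):
        self.provisioned = copy.deepcopy(self.saved)  # applies saved changes only

    def reset(self):
        # Discard every unprovisioned change, saved or unsaved.
        self.saved = copy.deepcopy(self.provisioned)
        self.draft = copy.deepcopy(self.provisioned)

    def exit_editor(self):
        self.draft = None                             # assumption: unsaved edits are dropped
        return self.provisioned                       # the page shows the provisioned config


editor = ConfigEditor({"H100": 8})
editor.open_editor()
editor.draft["H100"] = 16
editor.save()
print(editor.exit_editor())  # {'H100': 8}: saved change not yet provisioned
editor.open_editor()
print(editor.draft)          # {'H100': 16}: saved change still available
```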

Resource Pool

To create a resource pool:

  1. Click + Add Pool
  2. In the Create Pool modal, input:
    • Name - The resource pool's name. This name appears on the pool's information card on the Resource Configuration settings page
    • Number of Resources - Number of resources available in this pool
    • Description - Optional free form text for additional descriptive information
  3. Click Create

To modify a resource pool:

  1. Click Menu on the relevant resource pool card > click Edit
  2. In the Edit Pool modal, change the pool's name, number of resources, or description
  3. Click Save

You can also change the Execution Priority of the linked resource profiles. Click and drag a profile's connection anchor to change its position in the order of priority.

Resource Profile

To create a resource profile:

  1. Click + Add Profile
  2. In the Create Profile modal, input:
    • Name - The resource profile's name. This name appears on the profile's information card on the Resource Configuration settings page
    • Resource Allotment - Number of resources allocated to each job running in this profile
  3. Click Create

To modify a resource profile:

  1. Click Menu on the relevant resource profile card > click Edit
  2. In the Edit Profile modal, change the profile's name or resource allotment
  3. Click Save

To control which pool's resources are assigned first, click and drag the pool connection anchor to change its position in the order of priority.

You can also change the queuing priority of the resource policies that use this profile. Open the policy list, then click and drag a policy's anchor to change its position in the order of priority.

To delete a resource profile:

  1. Click Menu on the relevant resource profile card
  2. Click Delete

Connecting Profiles to Pools

Connect a resource profile to a resource pool to allow jobs assigned to the profile to use the pool's resources.

To connect a profile to a pool:

  1. Click Open Editor
  2. Drag the Profile-pool link of the relevant profile to the resource pool you want to connect the profile to. This opens the Connect Profile modal
  3. In the Connect Profile modal, input a name for this connection. This connection name will appear on the profile card

The settings page shows a line linking the profile and pool cards. The linked profile appears on the pool card, showing its place in the order of execution. To change the profile's priority placement, drag its connection anchor to a new position.

To disconnect a profile from a pool:

  1. Click Open Editor
  2. On the relevant profile card, hover over the connection name and click X

Jobs assigned to this resource profile will no longer be able to use the pool's resources.
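
Connecting and disconnecting can be modeled as keeping two ordered lists in sync: each pool's execution priority list and each profile's list of named links. `ResourceGraph` and its methods are hypothetical. The example reuses the pool and link names from the example workflow. The sketch assumes that a new connection is added at the lowest priority position.

```python
class ResourceGraph:
    """Hypothetical model of profile-pool links; not a ClearML API."""

    def __init__(self):
        self.pool_priority = {}   # pool -> linked profiles, in execution priority order
        self.profile_links = {}   # profile -> [(connection name, pool)], in pool precedence order

    def connect(self, profile, pool, connection_name):
        # Assumption: a new link is appended at the lowest priority/precedence position.
        self.pool_priority.setdefault(pool, []).append(profile)
        self.profile_links.setdefault(profile, []).append((connection_name, pool))

    def disconnect(self, profile, connection_name):
        links = self.profile_links[profile]
        pool = next(p for name, p in links if name == connection_name)
        links.remove((connection_name, pool))
        self.pool_priority[pool].remove(profile)  # the profile's jobs can no longer use this pool


graph = ResourceGraph()
graph.connect("4xGPU", "Bare Metal Low END GPUs", "4 GPU Low End")
graph.connect("4xGPU", "H100 Half a Superpod", "4 GPU H100")
graph.disconnect("4xGPU", "4 GPU H100")
print(graph.profile_links["4xGPU"])                 # [('4 GPU Low End', 'Bare Metal Low END GPUs')]
print(graph.pool_priority["H100 Half a Superpod"])  # []
```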