---
title: Resource Configuration
---

Administrators can define [Resource Policies](../resource_policies.md) to implement resource quotas and
reservations for different user groups to prioritize workload usage across available resources.

Under the **Resource Configuration** section, administrators define the available resources and the way in which they
will be allocated to different workloads.

[Image: Resource configuration page]

The Resource Configuration settings page shows the [currently provisioned](#applying-resource-configuration) configuration:
the defined resource pools, resource profiles, and the resource allocation architecture.

## Resource Pools
A resource pool is an aggregation of resources available for use, such as a Kubernetes cluster or a GPU superpod.
Administrators specify the total number of resources available in each pool. The resource policy manager assigns
workloads only up to the number of resources available in the pool.

Administrators control the execution priority within a pool across the resource profiles making use of it (e.g. if jobs
of profile A and jobs of profile B currently need to run in a pool, allocate resources for profile A jobs first or vice
versa).

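As a rough illustration of execution priority within a pool, the following Python sketch starts pending jobs by walking the linked profiles from highest to lowest priority. The function `serve_pool`, its parameters, and the strict-priority behavior are assumptions made for this example. It is not ClearML code, and the actual resource policy manager may behave differently.

```python
def serve_pool(free, priority_order, pending):
    """Start pending jobs in one pool, visiting profiles in execution-priority order.

    free: number of resources currently free in the pool
    priority_order: list of (profile_name, resources_per_job), highest priority first
    pending: dict mapping profile_name to its number of pending jobs
    Returns the profile name of each started job, in start order.
    """
    started = []
    for name, size in priority_order:
        # Keep starting jobs of this profile while it has work and the pool has room.
        while pending.get(name, 0) > 0 and free >= size:
            free -= size
            pending[name] -= 1
            started.append(name)
    return started


# Hypothetical profiles: profile A needs 2 resources per job, profile B needs 4.
order_a_first = [("profile A", 2), ("profile B", 4)]
order_b_first = [("profile B", 4), ("profile A", 2)]

started_a_first = serve_pool(8, order_a_first, {"profile A": 3, "profile B": 2})
started_b_first = serve_pool(8, order_b_first, {"profile A": 3, "profile B": 2})
print(started_a_first)  # ['profile A', 'profile A', 'profile A']
print(started_b_first)  # ['profile B', 'profile B']
```

With the same 8 free resources and the same pending jobs, swapping the order changes which profile's jobs run.
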
The resource pool cards are displayed at the top of the Resource Configuration settings page. Each card displays the
following information:

<div class="max-w-50">

[Image: Resource pool card]

</div>

* Pool name
* Number of resources currently in use out of the total available resources
* Execution Priority - List of [linked profiles](#connecting-profiles-to-pools) in order of execution priority.

## Resource Profiles
Resource profiles represent the resource consumption requirements of jobs, such as the number of GPUs needed. They are
the interface that administrators use to provide users with access to the available resource pools based on their job
resource requirements via [Resource Policies](../resource_policies.md).

Administrators can control the resource pool allocation precedence within a profile (e.g. only run jobs on `pool B` if
`pool A` cannot currently satisfy the profile's resource requirements).

Administrators can control the queuing priority within a profile across resource policies making use of it (e.g. if the
R&D team and DevOps team both have pending jobs, run the R&D team's jobs first or vice versa).

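The two orderings described above can be pictured with a short Python sketch. The helper names `pick_pool` and `next_policy`, the dictionaries, and the sample numbers are hypothetical and only model the behavior described in the text. They are not part of any ClearML API.

```python
def pick_pool(free_by_pool, precedence, job_size):
    """Return the first pool, in precedence order, that can fit the job (or None)."""
    for pool in precedence:
        if free_by_pool[pool] >= job_size:
            return pool
    return None


def next_policy(policy_order, pending):
    """Return the highest-priority policy that has a pending job (or None)."""
    for policy in policy_order:
        if pending.get(policy, 0) > 0:
            return policy
    return None


# pool A is preferred, but it has only 2 free resources for a 4-resource job.
chosen_pool = pick_pool({"pool A": 2, "pool B": 8}, ["pool A", "pool B"], job_size=4)
print(chosen_pool)  # pool B

# Both teams have pending jobs; R&D is listed first, so its job goes next.
chosen_policy = next_policy(["R&D", "DevOps"], {"R&D": 1, "DevOps": 3})
print(chosen_policy)  # R&D
```
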
The resource profile cards are displayed at the bottom of the Resource Configuration settings page. Each card displays
the following information:

<div class="max-w-50">

[Image: Resource profile card]

</div>

* Profile name
* <img src="/docs/latest/icons/ico-resource-number.svg" alt="Number of resources" className="icon size-md space-sm" /> - Number
  of resources allocated to jobs in this profile
* List of [pool links](#connecting-profiles-to-pools)
* <img src="/docs/latest/icons/ico-queued-jobs.svg" alt="Queued jobs" className="icon size-md space-sm" /> - Number of currently pending jobs
* <img src="/docs/latest/icons/ico-running-jobs.svg" alt="Running jobs" className="icon size-md space-sm" /> - Number of currently running jobs
* Number of resource policies. Click to open the resource policy list and set the queuing priority.

## Example Workflow

You have GPUs spread across a local H100 and additional bare metal servers, as well as on AWS (managed
by an autoscaler). Assume that currently most of your resources are already assigned to jobs, and only 16 resources are available: 8 in the
H100 resource pool and 8 in the Bare Metal pool:

[Image: Example resource pools]

Teams' jobs have varying resource requirements of 0.5, 2, 4, and 8 GPUs. Resource profiles are defined to reflect these:

[Image: Example resource profiles]

The different jobs will be routed to different resource pools by connecting the profiles to the resource pools. Jobs
enqueued through the profiles will be run in the pools where there are available resources in order of their priority.
For example, the H100 pool will run jobs with the following precedence: 2 GPU jobs first, then 4 GPU ones, then 8 GPU,
and lastly 0.5 GPU.

[Image: Example profile priority]

Resource policies are implemented for two teams:
* Dev team
* Research team

Each team has a resource policy configured with 8 reserved resources and a 16-resource limit. Both teams make use of the
`4xGPU` profile (i.e. each job running through this profile requires 4 resources).

[Image: Example resource policy]

The Dev team is prioritized over the Research team by placing it higher in the Resource Profile's Policies Priority list:

<div class="max-w-75">

[Image: Example resource policy priority]

</div>

Both the Dev team and the Research team enqueue four 4-resource jobs each. Dev team jobs will be allocated resources
first. The `4xGPU` resource profile is connected to two resource pools: `Bare Metal Low END GPUs` (with the
`4 GPU Low End` link) and `H100 Half a Superpod` (with the `4 GPU H100` link).

[Image: Example resource profile-pool connections]

Resources are assigned from the `Bare Metal` pool first (precedence set on the resource profile card):

<div class="max-w-50">

[Image: Example resource pool precedence]

</div>

If the first pool cannot currently satisfy the profile’s resource requirements, resources are assigned from the next
listed pool. Let's look at the first pool in the image below. Notice that the pool has 8 available resources, so
it can run two 4-resource jobs. Because the Dev team has queuing priority, both of these are Dev team jobs.

<div class="max-w-50">

[Image: Example resource pool card]

</div>

Since the Bare Metal pool does not have any more available resources, additional jobs will be assigned resources from
the next pool that the Resource Profile is connected to. The H100 pool has 8 available resources. There are still 2 jobs
pending from the Dev team requiring 8 resources in total, and 4 jobs from the Research team requiring 16 resources in
total. The Dev team's two running jobs already cover its 8 reserved resources, so in order to honor the Research team’s
resource reservation, the Research team's first two jobs will be assigned the required 8 resources from the H100 pool.

With all available resources assigned, 2 jobs from each team will remain pending until some of the currently
running jobs finish and resources become available.

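The arithmetic of this example can be replayed with a small Python simulation. It assumes a two-pass rule: reservations are honored first, in queuing-priority order, and any remaining resources are then handed out up to each policy's limit. That rule, the class names, and the function names are assumptions chosen to reproduce the outcome described above, not ClearML's actual scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    name: str
    available: int  # resources currently free in the pool


@dataclass
class Policy:
    name: str
    reserved: int
    limit: int
    pending_jobs: int
    in_use: int = 0
    placements: list = field(default_factory=list)  # pool name per started job


def place_job(pools, job_size):
    """Take resources from the first pool (in precedence order) that fits the job."""
    for pool in pools:
        if pool.available >= job_size:
            pool.available -= job_size
            return pool
    return None


def allocate(policies, pools, job_size):
    # Pass 1 honors reservations; pass 2 fills up to each policy's limit (assumed rule).
    for cap in ("reserved", "limit"):
        for policy in policies:  # listed in queuing-priority order
            while policy.pending_jobs > 0 and policy.in_use + job_size <= getattr(policy, cap):
                pool = place_job(pools, job_size)
                if pool is None:
                    break  # no pool can fit another job right now
                policy.pending_jobs -= 1
                policy.in_use += job_size
                policy.placements.append(pool.name)


# Pools in the 4xGPU profile's precedence order, with the free resources from the example.
pools = [Pool("Bare Metal", 8), Pool("H100", 8)]
# Policies in priority order: Dev first, then Research. Each enqueues four 4-resource jobs.
policies = [Policy("Dev", 8, 16, 4), Policy("Research", 8, 16, 4)]
allocate(policies, pools, job_size=4)

for policy in policies:
    print(policy.name, policy.placements, "pending:", policy.pending_jobs)
# Dev ['Bare Metal', 'Bare Metal'] pending: 2
# Research ['H100', 'H100'] pending: 2
```
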
## Applying Resource Configuration
Administrators can globally activate/deactivate resource policy management. To enable the currently provisioned
configuration, click on the `Enable resource management` toggle. When resource management is enabled, the policy
queues are served according to the provisioned resource profile and pool assignments. Disabling resource management stops
serving the policy queues. Tasks on these queues will remain pending until resource policy management is re-enabled.

Administrators can add, edit, delete, and connect resource pools and profiles in the Resource Configuration settings
page.

To make any change (create, delete, or modify a component) to the resource configuration, follow these steps:
1. Click **Open Editor** to go into Editor mode
1. After making the desired changes you have the following options:
   * **Save** - Save the changes you made. These changes will not be applied until you click **Provision**
   * **Provision** - Apply the saved changes
   * **Reset Configuration** - Set the editor to the currently provisioned values. This will delete any unprovisioned
     changes (both saved and unsaved)
1. Click **Exit** to leave Editor mode. The page will show the provisioned configuration. Unprovisioned saved changes will
   still be available in Editor mode.

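One way to picture the difference between the options above is as three copies of the configuration: unsaved edits, saved changes, and the provisioned configuration. The `ConfigEditor` class below is a hypothetical Python model of that idea, with an invented configuration key. It does not correspond to any ClearML API.

```python
class ConfigEditor:
    """Toy model of the editor: unsaved draft, saved changes, provisioned config."""

    def __init__(self, provisioned):
        self.provisioned = dict(provisioned)  # what the settings page shows
        self.saved = dict(provisioned)        # saved, but not necessarily applied
        self.draft = dict(provisioned)        # unsaved edits in the editor

    def edit(self, key, value):
        self.draft[key] = value

    def save(self):
        # Save keeps the edits but does not apply them.
        self.saved = dict(self.draft)

    def provision(self):
        # Provision applies only what has been saved.
        self.provisioned = dict(self.saved)

    def reset(self):
        # Reset Configuration discards saved and unsaved unprovisioned changes.
        self.saved = dict(self.provisioned)
        self.draft = dict(self.provisioned)


editor = ConfigEditor({"H100 pool size": 8})
editor.edit("H100 pool size", 16)
editor.save()
print(editor.provisioned["H100 pool size"])  # 8, saved but not yet provisioned
editor.provision()
print(editor.provisioned["H100 pool size"])  # 16
editor.edit("H100 pool size", 32)
editor.save()
editor.reset()
print(editor.saved["H100 pool size"])  # 16, the unprovisioned change was dropped
```
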
### Resource Pool

**To create a resource pool:**
1. Click **+ Add Pool**
1. In the **Create Pool** modal, input:
   * Name - The resource pool’s name. This will appear in the pool’s information card in the Resource Configuration settings page
   * Number of Resources - Number of resources available in this pool
   * Description - Optional free form text for additional descriptive information
1. Click **Create**

**To modify a resource pool:**
1. Click <img src="/docs/latest/icons/ico-bars-menu.svg" alt="Menu" className="icon size-md space-sm" /> on the relevant
   resource pool card **>** click **Edit**
1. In the **Edit Pool** modal, change the pool’s name, number of resources, or description
1. Click **Save**

You can also change the Execution Priority of the [linked resource profiles](#connecting-profiles-to-pools). Click and
drag the profile connection anchor <img src="/docs/latest/icons/ico-resource-anchor.svg" alt="Resource anchor" className="icon size-md space-sm" />
to change its position in the order of priority.

### Resource Profile
**To create a resource profile:**
1. Click **+ Add Profile**
1. In the **Create Profile** modal, input:
   * Name - The resource profile’s name. This will appear in the profile’s information card in the Resource Configuration settings page
   * Resource Allotment - Number of resources allocated to each job running in this profile
1. Click **Create**

**To modify a resource profile:**
1. Click <img src="/docs/latest/icons/ico-bars-menu.svg" alt="Menu" className="icon size-md space-sm" /> on the relevant
   resource profile card **>** click **Edit**
1. In the **Edit Profile** modal, change the profile's name or resource allotment
1. Click **Save**

To control which pool's resources will be assigned first, click and drag the pool connection anchor <img src="/docs/latest/icons/ico-resource-anchor.svg" alt="connection anchor" className="icon size-md space-sm" />
to change its position in order of priority.

You can also change the queuing priority of the resource policies making use of this profile. Open the policy list,
then click the policy anchor <img src="/docs/latest/icons/ico-drag-vertical.svg" alt="policy anchor" className="icon size-md space-sm" />
and drag the policy to change its position in order of priority.

**To delete a resource profile:**
1. Click <img src="/docs/latest/icons/ico-bars-menu.svg" alt="Menu" className="icon size-md space-sm" /> on the relevant resource profile card
1. Click **Delete**

### Connecting Profiles to Pools
Connect a resource profile to a resource pool to allow jobs assigned to the profile to make use of the pool’s resources.

**To connect a profile to a pool:**
1. Click **Open Editor**
1. Drag the <img src="/docs/latest/icons/ico-profile-link.svg" alt="Profile-pool link" className="icon size-md space-sm" />
   of the relevant profile to the resource pool you want to connect the profile to. This opens the **Connect Profile** modal
1. In the **Connect Profile** modal, input a name for this connection. This connection name will appear on the profile
   card

The settings page will show a line linking the profile and the pool cards. The linked profile appears on the pool card,
showing its place in the order of execution. To change the profile's priority placement, drag its connection anchor <img src="/docs/latest/icons/ico-resource-anchor.svg" alt="connection anchor" className="icon size-md space-sm" />
to a new position.

**To disconnect a profile from a pool:**
1. Click **Open Editor**
1. On the relevant profile card, hover over the connection name and click `X`

Jobs assigned to this resource profile will no longer be able to utilize the pool’s resources.

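The connect and disconnect steps above can be thought of as edits to two ordered lists: the named links on each profile card and the linked profiles on each pool card. The `Config` class below is a hypothetical Python sketch of that bookkeeping. Appending a new connection at the lowest priority is an assumption of this example, and none of it reflects ClearML's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    # pool name -> profile names, in execution-priority order
    pool_profiles: dict = field(default_factory=dict)
    # profile name -> (link name, pool name) pairs, in pool-precedence order
    profile_links: dict = field(default_factory=dict)

    def connect(self, profile, pool, link_name):
        # Assumption: a new connection starts at the lowest priority on both cards.
        self.profile_links.setdefault(profile, []).append((link_name, pool))
        self.pool_profiles.setdefault(pool, []).append(profile)

    def disconnect(self, profile, link_name):
        links = self.profile_links.get(profile, [])
        for link in links:
            if link[0] == link_name:
                links.remove(link)
                self.pool_profiles[link[1]].remove(profile)
                return
        raise KeyError(link_name)


config = Config()
config.connect("4xGPU", "Bare Metal Low END GPUs", "4 GPU Low End")
config.connect("4xGPU", "H100 Half a Superpod", "4 GPU H100")
config.disconnect("4xGPU", "4 GPU Low End")

print(config.profile_links["4xGPU"])  # [('4 GPU H100', 'H100 Half a Superpod')]
print(config.pool_profiles["Bare Metal Low END GPUs"])  # []
```

After the disconnect, the profile no longer appears on the Bare Metal pool's list, which mirrors the statement that its jobs can no longer use that pool's resources.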