diff --git a/docs/hyperdatasets/webapp/webapp_datasets_versioning.md b/docs/hyperdatasets/webapp/webapp_datasets_versioning.md index c9745a67..7cec49c7 100644 --- a/docs/hyperdatasets/webapp/webapp_datasets_versioning.md +++ b/docs/hyperdatasets/webapp/webapp_datasets_versioning.md @@ -71,7 +71,7 @@ for column customization options. ![Frame browser list](../../img/hyperdatasets/frame_browser_list.png) -The dataset version's frames can be filtered by multiple criteria. The resulting frames can be exported as a JSON file. +The dataset version's frames can be filtered by multiple criteria. The resulting frames can be [exported as a JSON file](#exporting-frames). To view the details of a specific frame, click on its preview, which will open the [Frame Viewer](webapp_datasets_frames.md#frame-viewer). @@ -174,6 +174,20 @@ Lucene queries can also be used in ROI label filters and frame rules. +### Sorting Frames + +Sort the dataset version’s frames by any of the following attributes: +* ID +* Last update time +* Dimensions (height) +* Timestamp +* Context ID +* Metadata key - Click `+ Metadata Key` and select the desired key for sorting + +Click Sort order to toggle between ascending and descending sort orders. + +![Dataset frame sorting](../../img/hyperdatasets/dataset_frame_sorting.png) + ### Exporting Frames To export (download) the filtered frames as a JSON file, click Menu > **EXPORT FRAMES**. @@ -185,12 +199,51 @@ frame browser configuration settings. ![Frame browser config menu](../../img/hyperdatasets/frame_browser_menu.png) #### Grouping Previews -FrameGroups or SingleFrames can share the same `context_id` (URL). For example, users can set the same `context_id` -to multiple FrameGroups that represent frames in a single video. -Use the **Grouping** menu to select one of the following options: -* Split Preview - Show separate previews for each individual FrameGroup, regardless of shared context. -* Group by URL - Show a single preview for all FrameGroups with the same context +Use the **Grouping** menu to set how to display frames that share a common property: +* **Split Preview** - Show a separate preview for each individual FrameGroup +* **Group by URL** - Show a single preview for all FrameGroups with the same context ID. For example, users can set the +same `context_id` to multiple FrameGroups that represent frames in a single video. +* **Sample by Property** - Specify a frame or ROI property whose value to group frames by and set the number of frames +to preview for each group. For example, in the image below, frames are grouped by ROI labels. Each group displays six +samples of frames that contain an ROI with the same label. + +![Sample by property](../../img/hyperdatasets/dataset_sample_by_roi_property.png) + +**To sample by property:** +1. In the **Grouping** menu, click **Sample by Property** +1. In the **Sample by Property** modal, input the following: + * Select the Property type: + * ROI - Properties associated with the frame ROIs (e.g. ROI label names, IDs, confidence, etc.) + * Frame - Properties associated with the frames (e.g. update time, metadata, timestamp, etc.) + * Property name - Property whose value to group the frames by + * Sample size - Number of frames to preview for each group + * ROI match query (*For grouping by ROI property only*) - A Lucene query to filter which of a frame's ROIs + to use in grouping by their properties. 
For example, in a Hyper-Dataset where ROIs have object labels and type labels,
+    view a sample of frames with different types of the same object by grouping frames according to `label.keyword`
+    with a match query for the object of interest.
+
+    ![Sample by Property modal](../../img/hyperdatasets/sample_by_property_modal.png)
+
+    The image below shows a sample of 3 frames which have ROIs of each type (`pedestrian`, `rider`, `sitting`) of `person`.
+
+    ![ROI Match Query](../../img/hyperdatasets/roi_match_query.png)
+    :::note Property N/A group
+    If there are frames which have no value for the grouped-by property, a sample of them will be provided as a final
+    group. If you sample according to an ROI property, this group will NOT include frames that have no ROIs at all.
+    :::
+1. Click **Save**
+
+Once saved, whenever you select the **Sample by Property** option in the **Grouping** menu, frames will be grouped
+according to the previously configured setting.
+
+**To modify the grouping property:**
+1. Hover over **Sample by Property**
+1. Click the edit (pencil) icon
+1. Modify the **Sample by Property** configuration
+1. Click **Save**
+
+
 #### Preview Source
 
 When using multi-source FrameGroups, users can choose which of the FrameGroups' sources will be displayed as the preview.
@@ -204,11 +257,34 @@ If a FrameGroup doesn't have the selected preview source, the preview displays t
 
 ## Statistics
 
-The **Statistics** tab displays a dataset version's label usage stats.
-* Dataset total count - number of annotations, annotated frames, and total frames
-* Each label is listed along with the number of times it was used in the version
-* The pie chart visualizes these stats. Hover over a chart slice and its associated label and usage
-  percentage will appear at the center of the chart.
+The **Statistics** tab allows exploring frame and ROI property distribution across a Hyper-Dataset version:
+1. Query the frames to include in the statistics calculations under **Filter by label**. Use [simple](#simple-frame-filtering)
+or [advanced](#advanced-frame-filtering) frame filters. If no filter is applied, all frames in the dataset version will
+be included in the calculation.
+1. Select the property whose distribution should be calculated:
+    * Select the property **Type**:
+        * **ROI** - Frame ROI properties (e.g. ROI label, ID, confidence, etc.). This will calculate the distribution of
+        the specified property across all ROIs in the version's frames.
+        * **Frame** - Frame properties (e.g. update time, metadata keys, timestamp, etc.)
+    * Input the **Property** key (e.g. `meta.location`)
+    * If the **ROI** property type was selected, you can also limit the scope of ROIs included in the calculation with the
+    **Count ROIs matching** filter: input a Lucene query to specify which ROIs to count
+1. Click **Apply** to calculate the statistics
+
+For example, calculating the distribution of the `label` ROI property while specifying `rois.confidence: 1` for ROI
+matching will show the label distribution across only ROIs with a confidence level of 1.
+
+![Distribution by ROI property](../../img/hyperdatasets/dataset_version_statistics_roi.png)
+
+By default, the ROI label distribution across the entire Hyper-Dataset version is shown. 
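+
+The same Lucene syntax used in the **Count ROIs matching** filter can also be used to query ROIs programmatically.
+The following is a minimal, illustrative sketch (assuming the ClearML Enterprise `allegroai` SDK is installed; the
+dataset name, version name, and query below are placeholders) that retrieves only the frames containing at least one
+ROI matching such a query:
+
+```python
+from allegroai import DataView
+
+# Illustrative only: pull frames that have at least one ROI matching a Lucene
+# ROI query, using the same query syntax as the "Count ROIs matching" filter.
+dataview = DataView()
+dataview.add_query(
+    dataset_name="MyDataset",        # placeholder dataset name
+    version_name="Current",          # placeholder version name
+    roi_query='label.keyword:"person"',
+)
+
+frames = dataview.to_list()
+print(f"{len(frames)} frames contain at least one matching ROI")
+```
+
+The Statistics tab itself requires no code; the sketch above only shows how the same ROI query syntax is used with the
+SDK.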
+
+The tab displays the following information:
+* Object counts:
+    * Number of annotations matching the specification
+    * Number of annotated frames in the current frame filter selection
+    * Total number of frames in the current frame filter selection
+* Each property is listed along with its number of occurrences in the current frame filter selection
+* The pie chart visualizes this distribution. Hover over a chart segment to see its associated property and count in a
+tooltip; its usage percentage appears at the center of the chart.
 
 ![Version label statistics](../../img/hyperdatasets/dataset_version_statistics.png)
diff --git a/docs/img/hyperdatasets/dataset_frame_sorting.png b/docs/img/hyperdatasets/dataset_frame_sorting.png
new file mode 100644
index 00000000..c7c10532
Binary files /dev/null and b/docs/img/hyperdatasets/dataset_frame_sorting.png differ
diff --git a/docs/img/hyperdatasets/dataset_sample_by_roi_property.png b/docs/img/hyperdatasets/dataset_sample_by_roi_property.png
new file mode 100644
index 00000000..54afb2dc
Binary files /dev/null and b/docs/img/hyperdatasets/dataset_sample_by_roi_property.png differ
diff --git a/docs/img/hyperdatasets/dataset_version_statistics.png b/docs/img/hyperdatasets/dataset_version_statistics.png
index b0acdc3b..03db6c68 100644
Binary files a/docs/img/hyperdatasets/dataset_version_statistics.png and b/docs/img/hyperdatasets/dataset_version_statistics.png differ
diff --git a/docs/img/hyperdatasets/dataset_version_statistics_roi.png b/docs/img/hyperdatasets/dataset_version_statistics_roi.png
new file mode 100644
index 00000000..d4b985f9
Binary files /dev/null and b/docs/img/hyperdatasets/dataset_version_statistics_roi.png differ
diff --git a/docs/img/hyperdatasets/roi_match_query.png b/docs/img/hyperdatasets/roi_match_query.png
new file mode 100644
index 00000000..c46b96c6
Binary files /dev/null and b/docs/img/hyperdatasets/roi_match_query.png differ
diff --git a/docs/img/hyperdatasets/sample_by_property_modal.png b/docs/img/hyperdatasets/sample_by_property_modal.png
new file mode 100644
index 00000000..ac29f92f
Binary files /dev/null and b/docs/img/hyperdatasets/sample_by_property_modal.png differ
diff --git a/docs/img/resource_configuration.png b/docs/img/resource_configuration.png
new file mode 100644
index 00000000..d968072e
Binary files /dev/null and b/docs/img/resource_configuration.png differ
diff --git a/docs/img/resource_configuration_pool_card.png b/docs/img/resource_configuration_pool_card.png
new file mode 100644
index 00000000..8d6e89ca
Binary files /dev/null and b/docs/img/resource_configuration_pool_card.png differ
diff --git a/docs/img/resource_configuration_profile_card.png b/docs/img/resource_configuration_profile_card.png
new file mode 100644
index 00000000..9e3a122e
Binary files /dev/null and b/docs/img/resource_configuration_profile_card.png differ
diff --git a/docs/img/resource_example_policy.png b/docs/img/resource_example_policy.png
new file mode 100644
index 00000000..07ce56f2
Binary files /dev/null and b/docs/img/resource_example_policy.png differ
diff --git a/docs/img/resource_example_policy_priority.png b/docs/img/resource_example_policy_priority.png
new file mode 100644
index 00000000..6e237d02
Binary files /dev/null and b/docs/img/resource_example_policy_priority.png differ
diff --git a/docs/img/resource_example_pool_card.png b/docs/img/resource_example_pool_card.png
new file mode 100644
index 00000000..1051a1a3
Binary files /dev/null and b/docs/img/resource_example_pool_card.png differ
diff --git 
a/docs/img/resource_example_pool_priority.png b/docs/img/resource_example_pool_priority.png
new file mode 100644
index 00000000..7f8341dd
Binary files /dev/null and b/docs/img/resource_example_pool_priority.png differ
diff --git a/docs/img/resource_example_pools.png b/docs/img/resource_example_pools.png
new file mode 100644
index 00000000..f1ddc396
Binary files /dev/null and b/docs/img/resource_example_pools.png differ
diff --git a/docs/img/resource_example_profile.png b/docs/img/resource_example_profile.png
new file mode 100644
index 00000000..374fc4f9
Binary files /dev/null and b/docs/img/resource_example_profile.png differ
diff --git a/docs/img/resource_example_profile_pool_links.png b/docs/img/resource_example_profile_pool_links.png
new file mode 100644
index 00000000..e2f13cf8
Binary files /dev/null and b/docs/img/resource_example_profile_pool_links.png differ
diff --git a/docs/img/resource_example_profile_priority.png b/docs/img/resource_example_profile_priority.png
new file mode 100644
index 00000000..1c8ee3b6
Binary files /dev/null and b/docs/img/resource_example_profile_priority.png differ
diff --git a/docs/img/resource_policies_policy_card.png b/docs/img/resource_policies_policy_card.png
new file mode 100644
index 00000000..2fdd511e
Binary files /dev/null and b/docs/img/resource_policies_policy_card.png differ
diff --git a/docs/img/resource_policies_profile_card_admin.png b/docs/img/resource_policies_profile_card_admin.png
new file mode 100644
index 00000000..0062778e
Binary files /dev/null and b/docs/img/resource_policies_profile_card_admin.png differ
diff --git a/docs/img/resource_policies_profile_card_non_admin.png b/docs/img/resource_policies_profile_card_non_admin.png
new file mode 100644
index 00000000..e38a9d62
Binary files /dev/null and b/docs/img/resource_policies_profile_card_non_admin.png differ
diff --git a/docs/img/resource_policies_remove_profile.png b/docs/img/resource_policies_remove_profile.png
new file mode 100644
index 00000000..e0ce43b2
Binary files /dev/null and b/docs/img/resource_policies_remove_profile.png differ
diff --git a/docs/webapp/resource_policies.md b/docs/webapp/resource_policies.md
new file mode 100644
index 00000000..3950fa1b
--- /dev/null
+++ b/docs/webapp/resource_policies.md
@@ -0,0 +1,128 @@
+---
+title: Resource Policies
+---
+
+:::note ENTERPRISE FEATURE
+This feature is available under the ClearML Enterprise plan
+:::
+
+
+Resource policies let administrators define user group resource quotas and reservations to enable workload prioritization
+across available resources.
+
+Administrators make the allocated resources available to users through designated execution queues, each matching a
+specific resource consumption profile (i.e. the amount of resources allocated to jobs run through the queue).
+
+Workspace administrators can use the resource policy manager to create, modify, or delete resource policies:
+* Set resource reservation and limits for user groups
+* Connect resource profiles to a policy, making them available to its user group via ClearML queues
+
+Non-administrator users can see the resource policies currently applied to them.
+
+## Create a Policy
+
+**To create a policy:**
+1. Click `+`
+1. In the **Create Resource Policy** modal, fill in the following:
+    * Name - Resource policy name. 
This name will appear on the Policies list + * Reservation - The number of resources guaranteed to be available for the policy’s users + * Limit - The maximum amount of resources that jobs run through this policy’s queues can concurrently use. + * User Group - The [User groups](webapp_profile.md#user-groups) to which the policy applies + * Description - Optional free form text for additional descriptive information +1. Click **Add** + +Once the policy is defined, you can connect profiles to it (Resource profiles are defined in the [Resource Configuration](webapp_profile.md#resource-configuration) +settings page, available to administrators). Resource profiles serve as an interface for resource policies to provide +users with access to the available resource pools based on their job resource requirements (i.e. a job running through a +profile is allocated the profile’s defined amount of resources). + +**To connect a resource profile to a policy:** +1. In the policy’s details panel, click **Edit** +1. Click **Connect Profile** +1. In the **Connect Profile** modal, input the following information: + * Queue name - The name for the ClearML queue the policy’s users will use to enqueue jobs using this resource + profile. Jobs enqueued to this queue will be allocated the number of resources defined for its profile + * Profile - select the resource profile. +1. Click **Connect** + +:::note Available Profiles +Only profiles that are part of the currently provisioned [resource configuration](webapp_profile.md#resource-configuration) +are available for selection (Profiles that are part of a configuration that has been saved but not yet provisioned +will not appear in the list). + +Profiles whose resource requirement exceeds the policy's resource limit will appear in the list but are not available +for selection. +::: + +## Policy Details +The policy details panel displays: +* Policy quota and reservation +* Resource profiles associated with the policy +* Queues the policy makes available +* Number of current jobs in each profile (pending or running) + +The top card displays the policy information: +* Policy name +* Current usage - The number of resources currently in use (i.e. by currently running jobs) +* Reserved resources +* Resource limit +* User group that the policy applies to - click to show list of users in the group + +![Resource policy card](../img/resource_policies_policy_card.png) + +The cards below the policy card display the profiles that are connected to the policy: +* Resource profile name +* Number of resources - Number +of resources consumed by each job enqueued through this profile's queue +* Queued jobs - Currently queued jobs +* Running jobs - Currently running jobs + +![Resource profile card non-admin view](../img/resource_policies_profile_card_non_admin.png) + +Administrators can also see each resource profile’s resource pool links listed in order of routing priority. + +![Resource profile card admin view](../img/resource_policies_profile_card_admin.png) + +The arrow connecting the policy card with a profile card is labeled with the name of the queue the policy’s users should +use to run tasks through that resource profile. + +## Modify Policy + +To modify a resource policy, click **Edit** to open the details panel in editor mode. + +### To Modify Policy Parameters + +1. On the resource policy card, click Menu **> Edit** +1. In the Edit Resource Policy modal, you can modify the policy’s name, number of reserved resources, resource limit, +and description +1. 
Click **Save** + +### To Add a Resource Profile to a Policy +1. Click **Connect Profile** +1. In the **Connect Profile** modal, input the following information: + * Queue name - The name for the ClearML queue the policy’s users will use to enqueue jobs using this resource + profile. Jobs enqueued to this queue will be allocated the number of resources defined for its profile + * Profile - select the resource profile. Note that you will only be able to connect profiles that have not already + been connected to the policy +1. Click **Connect** + +### To Remove a Resource Profile + +**To remove a resource profile:** On the relevant resource profile box, click `X`. + +![Remove resource profile](../img/resource_policies_remove_profile.png) + +Removing a profile from a policy will also delete the queue which made this profile available to the policy’s users. +Any tasks enqueued on this queue will be set to `draft` status. + +Click **Exit** to close editor mode + +## Delete Policy + +**To delete a resource policy** +1. Click **Edit** to open the details panel in editor mode +1. On the resource policy box, click Menu +2. Click **Delete** + +Deleting a policy also deletes its queues (i.e. the queues to access the resource profiles). Additionally, any pending +tasks will be dequeued. \ No newline at end of file diff --git a/docs/webapp/webapp_exp_table.md b/docs/webapp/webapp_exp_table.md index 7bc189d6..60ea375b 100644 --- a/docs/webapp/webapp_exp_table.md +++ b/docs/webapp/webapp_exp_table.md @@ -29,6 +29,60 @@ The downloaded data consists of the currently displayed table columns. ![Experiment table](../img/webapp_experiment_table.png) +## Creating Experiments + +You can create experiments by: +* Running code instrumented with ClearML (see [Task Creation](../clearml_sdk/task_sdk.md#task-creation)) +* [Cloning an existing experiment](webapp_exp_reproducing.md) +* Through the UI interface: Input the experiment's details, including its source code and python requirements, and then +run it through a [ClearML Queue](../fundamentals/agents_and_queues.md#what-is-a-queue) or save it as a *draft*. + +To create an experiment through the UI interface: +1. Click `+ New Experiment` +1. In the `Create Experiment` modal, input the following information: + * **Code** + * Experiment name + * Git + * Repository URL + * Version specification - one of the following: + * Tag + * Branch + * Commit ID + * Execution Entry Point + * Working Directory + * One of the following + * Script name + * Module (see [python module specification](https://docs.python.org/3/using/cmdline.html#cmdoption-m)) + * Add `Task.init` call - If selected, [`Task.init()`](../references/sdk/task.md#taskinit) call is added to the + entry point. Select if it is not already called within your code + * **Arguments** (*optional*) - Add [hyperparameter](../fundamentals/hyperparameters.md) values. + * **Environment** (*optional*) - Set up the experiment’s python execution environment using either of the following + options: + * Use Poetry specification - Requires specifying a docker image for the experiment to be executed in. + * Manually specify the python environment configuration: + * Python binary - The python executable to use + * Preinstalled venv - A specific existing virtual environment to use. Requires specifying a docker image for the + experiment to be executed in. + * Python package specification: + * Skip - Assume system packages are available. Requires specifying a docker image for the experiment to be + executed in. 
+ * Use an existing `requirements.txt` file + * Explicitly specify the required packages + * **Docker** (*optional*) - Specify Docker container configuration for executing the experiment + * Image - Docker image to use for running the experiment + * Arguments - Add Docker arguments as a single string + * Startup Script - Add a bash script to be executed inside the Docker before setting up the experiment's environment + * **Run** + * Queue - [ClearML Queue](../fundamentals/agents_and_queues.md#what-is-a-queue) where the experiment should be + enqueued for execution + * Output Destination - A URI where experiment outputs should be stored (ClearML file server by default). +1. Once you have input all the information, click one of the following options + * Save as Draft - Save the experiment as a new draft task. + * Run - Enqueue the experiment for execution in the queue specified in the **Run** tab + +Once you have completed the experiment creation wizard, the experiment will be saved in your current project (where +you clicked `+ New Experiment`). See what you can do with your experiment in [Experiment Actions](#experiment-actions). + ## Experiments Table Columns The experiments table default and customizable columns are described in the following table. diff --git a/docs/webapp/webapp_profile.md b/docs/webapp/webapp_profile.md index 13ac1eba..7d30db69 100644 --- a/docs/webapp/webapp_profile.md +++ b/docs/webapp/webapp_profile.md @@ -22,6 +22,8 @@ The Settings page consists of the following sections: * [Users & Groups](#users--groups) - Manage the users that have access to a workspace * [Access Rules](#access-rules) (ClearML Enterprise Server) - Manage per-resource access privileges * [Identity Providers](#identity-providers) (ClearML Enterprise Server) - Manage server identity providers + * [Resource Configuration](#resource-configuration) (ClearML Enterprise Server) - Define the available resources and the way in which they + will be allocated to different workloads * [Usage & Billing](#usage--billing) (ClearML Hosted Service) - View current usage information and billing details ## Profile @@ -39,13 +41,16 @@ The profile tab presents user information. Under **USER PREFERENCES**, users can set a few web UI options: * **Show Hidden Projects** - Show ClearML infrastructure projects alongside your own projects. Disabled by default. When enabled, these projects are labeled with Hidden project. -* **Don't show ClearML Examples** - Hide the preloaded ClearML example content (project, pipeline, dataset, etc.) -* **HiDPI browser scale override** - Adjust scaling on High-DPI monitors to improve the web UI experience. - Enabled by default. +* **Don't show ClearML examples** - Hide the preloaded ClearML example content (project, pipeline, dataset, etc.). +* **Disable HiDPI browser scale override** - ClearML dynamically sets the browser scaling factor for an optimal page layout. +Disable for default desktop scale. * **Don't show pro tips periodically** - Stop showing ClearML usage tips on login. Disabled by default. +* **Block running user's scripts in the browser** - Block any user and 3rd party scripts from running anywhere in the +WebApp. Note that if enabled, the WebApp will not display debug samples, [Hyper-Dataset frame previews](../hyperdatasets/previews.md), +and embedded resources in [reports](webapp_reports.md). * **Hide specific container arguments** - Specify which Docker environment variable values should be hidden in logs. When printed, the variable values are replaced with `********`. 
By default, `CLEARML_API_SECRET_KEY`, `CLEARML_AGENT_GIT_PASS`, -`AWS_SECRET_ACCESS_KEY`, and `AZURE_STORAGE_KEY` values are redacted. +`AWS_SECRET_ACCESS_KEY`, and `AZURE_STORAGE_KEY` values are redacted. To modify the hidden container argument list, click **Edit**. :::info Self-hosted ClearML Server The self-hosted ClearML Server has an additional option to enable sharing anonymous telemetry data with the ClearML @@ -574,6 +579,206 @@ Hover over a connection in the table to **Edit** or **Delete** it. ![Identity provider chart](../img/settings_identity_chart.png) +## Resource Configuration + +Administrators can define [Resource Policies](../webapp/resource_policies.md) to implement resource quotas and +reservations for different user groups to prioritize workload usage across available resources. + +Under the **Resource Configuration** section, administrators define the available resources and the way in which they +will be allocated to different workloads. + +![Resource configuration page](../img/resource_configuration.png) + +The Resource Configuration settings page shows the [currently provisioned](#applying-resource-configuration) configuration: +the defined resource pools, resource profiles, and the resource allocation architecture. + +### Resource Pools +A resource pool is an aggregation of resources available for use, such as a Kubernetes cluster or a GPU superpod. +Administrators specify the total number of resources available in each pool. The resource policy manager ensures +workload assignment up to the available number of resources. + +Administrators control the execution priority within a pool across the resource profiles making use of it (e.g. if jobs +of profile A and jobs of profile B currently need to run in a pool, allocate resources for profile A jobs first or vice +versa). + +The resource pool cards are displayed on the top of the Resource Configuration settings page. Each card displays the +following information: + +![Resource pool card](../img/resource_configuration_pool_card.png) + +* Pool name +* Number of resources currently in use out of the total available resources +* Execution Priority - List of [linked profiles](#connecting-profiles-to-pools) in order of execution priority. + +### Resource Profiles +Resource profiles represent the resource consumption requirements of jobs, such as the number of GPUs needed. They are +the interface that administrators use to provide users with access to the available resource pools based on their job +resource requirements via [Resource Policies](../webapp/resource_policies.md). + +Administrators can control the resource pool allocation precedence within a profile (e.g. only run jobs on `pool B` if +`pool A` cannot currently satisfy the profile's resource requirements). + +Administrators can control the queuing priority within a profile across resource policies making use of it (e.g. if the +R&D team and DevOps team both have pending jobs - run the R&D team's jobs first or vice versa). + +The resource profile cards are displayed on the bottom of the Resource Configuration settings page. Each card displays +the following information: + +![Resource profile card](../img/resource_configuration_profile_card.png) + +* Profile name +* Number of resources - Number +of resources allocated to jobs in this profile +* List of [pool links](#connecting-profiles-to-pools) +* Queued jobs - Number of currently pending jobs +* Running jobs - Number of currently running jobs +* Number of resource policies. 
Click to open the resource policy list and to order queuing priority.
+
+### Example Workflow
+
+You have GPUs spread across a local H100 and additional bare metal servers, as well as on AWS (managed
+by an autoscaler). Assume that currently most of your resources are already assigned to jobs, and only 16 resources are available: 8 in the
+H100 resource pool and 8 in the Bare Metal pool:
+
+![Example resource pools](../img/resource_example_pools.png)
+
+Teams' jobs have varying resource requirements of 0.5, 2, 4, and 8 GPUs. Resource profiles are defined to reflect these:
+
+![Example resource profiles](../img/resource_example_profile.png)
+
+The different jobs will be routed to different resource pools by connecting the profiles to the resource pools. Jobs
+enqueued through the profiles will be run in the pools where there are available resources in order of their priority.
+For example, the H100 pool will run jobs with the following precedence: 2 GPU jobs first, then 4 GPU ones, then 8 GPU,
+and lastly 0.5 GPU.
+
+![Example profile priority](../img/resource_example_profile_priority.png)
+
+Resource policies are implemented for two teams:
+* Dev team
+* Research team
+
+Each team has a resource policy configured with 8 reserved resources and a 16 resource limit. Both teams make use of the
+4xGPU profile (i.e. each job running through this profile requires 4 resources).
+
+![Example resource policy](../img/resource_example_policy.png)
+
+The Dev team is prioritized over the Research team by placing it higher in the Resource Profile's Policies Priority list:
+
+![Example resource policy priority](../img/resource_example_policy_priority.png)
+
+Both the Dev team and the Research team enqueue four 4-resource jobs each: Dev team jobs will be allocated resources
+first. The `4xGPU` resource profile is connected to two resource pools: `Bare Metal Low END GPUs` (with the
+`4 GPU Low End` link) and `H100 Half a Superpod` (with the `4 GPU H100` link).
+
+![Example resource profile-pool connections](../img/resource_example_profile_pool_links.png)
+
+Resources are assigned from the `Bare Metal` pool first (precedence set on the resource profile card):
+
+![Example resource pool precedence](../img/resource_example_pool_priority.png)
+
+If the first pool cannot currently satisfy the profile’s resource requirements, resources are assigned from the next
+listed pool. Let's look at the first pool in the image below. Notice that the pool has 8 available resources, therefore
+it can run two 4-resource jobs. 
+
+ +![Example resource pool card](../img/resource_example_pool_card.png) + +
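+
+To make the allocation arithmetic in this example easier to follow, the short sketch below reproduces the assignment
+described in this walkthrough. It is illustrative only (not ClearML's actual scheduling code): it honors each policy's
+reservation first, then serves remaining capacity in policy-priority order, subject to each policy's limit and to the
+profile's pool precedence.
+
+```python
+# Illustrative only: walk through the example's numbers.
+pools = {"Bare Metal": 8, "H100": 8}       # available resources per pool
+pool_order = ["Bare Metal", "H100"]        # the profile's pool precedence
+job_size = 4                               # the 4xGPU profile
+policies = {                               # per-policy reservation / limit
+    "Dev":      {"reserved": 8, "limit": 16, "pending": 4, "used": 0},
+    "Research": {"reserved": 8, "limit": 16, "pending": 4, "used": 0},
+}
+priority = ["Dev", "Research"]             # Dev is higher in the priority list
+
+def assign_one(team):
+    """Assign one pending job of `team` to the first pool with enough capacity."""
+    p = policies[team]
+    if p["pending"] == 0 or p["used"] + job_size > p["limit"]:
+        return False
+    for pool in pool_order:
+        if pools[pool] >= job_size:
+            pools[pool] -= job_size
+            p["used"] += job_size
+            p["pending"] -= 1
+            print(f"{team} job -> {pool}")
+            return True
+    return False
+
+# Phase 1: honor each policy's reservation. Phase 2: serve remaining capacity by priority.
+for phase in ("reservation", "priority"):
+    for team in priority:
+        while True:
+            p = policies[team]
+            if phase == "reservation" and p["used"] >= p["reserved"]:
+                break                      # reservation satisfied
+            if not assign_one(team):
+                break                      # no pending jobs, limit reached, or pools full
+
+print("Still pending:", {team: p["pending"] for team, p in policies.items()})
+```
+
+Running this assigns the Dev team's first two jobs to the Bare Metal pool and, to honor the Research team's
+reservation, the Research team's first two jobs to the H100 pool, leaving two jobs of each team pending, matching the
+walkthrough that continues below.
+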
+
+Since the Bare Metal pool does not have any more available resources, additional jobs will be assigned resources from
+the next pool that the Resource Profile is connected to. The H100 pool has 8 available resources. There are still 2 jobs
+pending from the Dev team requiring 8 resources in total, and 4 jobs from the Research team requiring 16 resources in
+total. In order to honor the Research team’s resource reservation, its first two jobs will be assigned the required 8
+resources from the H100 pool.
+
+With all available resources assigned, 2 jobs of each team will remain pending until some of the currently
+running jobs finish and resources become available.
+
+### Applying Resource Configuration
+Administrators can globally activate/deactivate resource policy management. To enable the currently provisioned
+configuration, click on the `Enable resource management` toggle. Enabling resource management will service the policy
+queues according to the provisioned resource profile and pool assignments. Disabling resource management will stop
+serving the policy queues. Tasks on these queues will remain pending until resource policy management is re-enabled.
+
+Administrators can add, edit, delete, and connect resource pools and profiles in the Resource Configuration settings
+page.
+
+To make any change (create, delete, or modify a component) to the resource configuration, follow these steps:
+1. Click **Open Editor** to go into Editor mode
+1. After making the desired changes you have the following options:
+    * **Save** - Save the changes you made. These changes will not be applied until you click **Provision**
+    * **Provision** - Apply the resource policy’s saved changes
+    * **Reset Configuration** - Set the editor to the currently provisioned values. This will delete any unprovisioned
+    changes (both saved and unsaved)
+1. Click **Exit** to leave Editor mode. The page will show the provisioned configuration. Unprovisioned saved changes will
+still be available in Editor mode.
+
+#### Resource Pool
+
+**To create a resource pool:**
+1. Click **+ Add Pool**
+1. In the **Create Pool** modal, input:
+    * Name - The resource pool’s name. This will appear in the pool’s information card in the Resource Configuration settings page
+    * Number of Resources - Number of resources available in this pool
+    * Description - Optional free form text for additional descriptive information
+1. Click **Create**
+
+**To modify a resource pool:**
+1. Click Menu on the relevant resource pool card **>** click **Edit**
+1. In the **Edit Pool** modal, change the pool’s name, number of resources, or description
+1. Click **Save**
+
+You can also change the Execution Priority of the [linked resource profiles](#connecting-profiles-to-pools). Click and
+drag the profile connection anchor to change its position in the order of priority.
+
+#### Resource Profile
+**To create a resource profile:**
+1. Click **+ Add Profile**
+1. In the **Create Profile** modal, input:
+    * Name - The resource profile’s name. This will appear in the profile’s information card in the Resource Configuration settings page
+    * Resource Allotment - Number of resources allocated to each job running in this profile
+1. Click **Create**
+
+**To modify a resource profile:**
+1. Click Menu on the relevant resource profile card > click **Edit**
+1. In the **Edit Profile** modal, change the profile's name, number of resources, or description
+1. 
Click **Save**
+
+To control which pool's resources will be assigned first, click and drag the pool connection anchor
+to change its position in the order of priority.
+
+You can also change the Execution Priority of the resource policies making use of this profile. Open the policy list,
+then click the policy anchor
+and drag the policy to change its position in the order of priority.
+
+**To delete a resource profile:**
+1. Click Menu on the relevant resource profile card
+1. Click **Delete**
+
+#### Connecting Profiles to Pools
+Connect a resource profile to a resource pool to allow jobs assigned to the profile to make use of the pool’s resources.
+
+**To connect a profile to a pool:**
+1. Click **Open Editor**
+1. Drag the profile-pool link
+of the relevant profile to the resource pool you want to connect the profile to. This opens the **Connect Profile** modal
+1. In the **Connect Profile** modal, input a name for this connection. This connection name will appear on the profile
+card
+
+The settings page will show a line linking the profile and the pool cards. The linked profile appears on the pool card,
+showing its place in the order of execution. To change the profile's priority placement, drag its connection anchor
+to a new position.
+
+**To disconnect a profile from a pool:**
+1. Click **Open Editor**
+1. On the relevant profile card, hover over the connection name and click `X`
+
+Jobs assigned to this resource profile will no longer be able to utilize the pool’s resources.
+
 ## Usage & Billing
 
 The **USAGE & BILLING** section displays your ClearML workspace usage information including:
diff --git a/docs/webapp/webapp_workers_queues.md b/docs/webapp/webapp_workers_queues.md
index f9dbd0f5..d2b5a906 100644
--- a/docs/webapp/webapp_workers_queues.md
+++ b/docs/webapp/webapp_workers_queues.md
@@ -15,6 +15,8 @@ consumption as needed–-with no code (available under the ClearML Pro plan)
 * Monitor queue utilization
 * Reorder, move, and remove experiments from queues
 * Monitor all of your available and in-use compute resources (available in the ClearML Enterprise plan. See [Orchestration Dashboard](webapp_orchestration_dash.md))
+* Set user group resource quotas and reservations to enable workload prioritization across available resources (available
+in the ClearML Enterprise plan. 
See [Resource Policies](resource_policies.md)) ## Autoscalers diff --git a/sidebars.js b/sidebars.js index 79a7c37b..5add46d1 100644 --- a/sidebars.js +++ b/sidebars.js @@ -130,8 +130,13 @@ module.exports = { ] }, 'webapp/webapp_reports', - 'webapp/webapp_workers_queues', - 'webapp/webapp_orchestration_dash', + { + 'Orchestration': [ + 'webapp/webapp_workers_queues', + 'webapp/webapp_orchestration_dash', + 'webapp/resource_policies' + ] + }, { 'ClearML Applications': [ 'webapp/applications/apps_overview', diff --git a/static/icons/ico-drag-vertical.svg b/static/icons/ico-drag-vertical.svg new file mode 100644 index 00000000..6342bd02 --- /dev/null +++ b/static/icons/ico-drag-vertical.svg @@ -0,0 +1,3 @@ + + + \ No newline at end of file diff --git a/static/icons/ico-profile-link.svg b/static/icons/ico-profile-link.svg new file mode 100644 index 00000000..3967480c --- /dev/null +++ b/static/icons/ico-profile-link.svg @@ -0,0 +1,4 @@ + + + + \ No newline at end of file diff --git a/static/icons/ico-queued-jobs.svg b/static/icons/ico-queued-jobs.svg new file mode 100644 index 00000000..db2ac1b4 --- /dev/null +++ b/static/icons/ico-queued-jobs.svg @@ -0,0 +1,3 @@ + + + diff --git a/static/icons/ico-resource-anchor.svg b/static/icons/ico-resource-anchor.svg new file mode 100644 index 00000000..c3353b1a --- /dev/null +++ b/static/icons/ico-resource-anchor.svg @@ -0,0 +1,3 @@ + + + \ No newline at end of file diff --git a/static/icons/ico-resource-number.svg b/static/icons/ico-resource-number.svg new file mode 100644 index 00000000..2ca7bedb --- /dev/null +++ b/static/icons/ico-resource-number.svg @@ -0,0 +1,3 @@ + + + \ No newline at end of file diff --git a/static/icons/ico-running-jobs.svg b/static/icons/ico-running-jobs.svg new file mode 100644 index 00000000..9e897a71 --- /dev/null +++ b/static/icons/ico-running-jobs.svg @@ -0,0 +1,4 @@ + + + + \ No newline at end of file diff --git a/static/icons/ico-sort.svg b/static/icons/ico-sort.svg new file mode 100644 index 00000000..4605c2ce --- /dev/null +++ b/static/icons/ico-sort.svg @@ -0,0 +1,3 @@ + + + \ No newline at end of file