clearml-docs/scaling_resources.md at cdbde18610c34458e873dc50a80b6040a32c4e5b


ClearML/clearml-docs


mirror of https://github.com/clearml/clearml-docs synced 2025-05-03 04:22:14 +00:00

revital cdbde18610 Add scaling resources

2025-03-26 13:47:41 +02:00

2.5 KiB


Autoscaling Resources

Autoscaling allows organizations to dynamically manage compute resources based on demand, optimizing efficiency and cost.

When running machine learning experiments or large-scale compute tasks, demand for resources fluctuates. Autoscaling ensures that:

- Resources are available when needed, preventing delays in task execution.
- Idle resources are automatically spun down, reducing unnecessary costs.
- Workloads can be distributed efficiently.
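The first two behaviors (scale up when tasks are waiting, scale down when instances sit idle) can be sketched as one decision function. This is a minimal, hypothetical illustration in plain Python, assuming a single task queue, a per-instance idle timer, and a hard cap on running instances. The names `ScalingDecision`, `decide`, `max_instances`, and `idle_timeout_s` are illustrative and are not part of the ClearML API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScalingDecision:
    spin_up: int = 0                                    # new instances to start
    spin_down: List[str] = field(default_factory=list)  # idle instances to stop


def decide(pending_tasks: int,
           idle_seconds: Dict[str, float],
           max_instances: int,
           idle_timeout_s: float = 600.0) -> ScalingDecision:
    """Hypothetical demand-based scaling policy (not the ClearML API).

    pending_tasks:  number of tasks waiting in the queue
    idle_seconds:   running instance ID -> seconds it has been idle (0 = busy)
    max_instances:  hard cap on the number of running instances
    idle_timeout_s: how long an instance may stay idle before it is stopped
    """
    idle = [iid for iid, secs in idle_seconds.items() if secs > 0]

    # Idle instances pick up pending tasks first; only the rest need new
    # instances, and never more than the cap allows.
    unserved = max(0, pending_tasks - len(idle))
    headroom = max(0, max_instances - len(idle_seconds))
    spin_up = min(unserved, headroom)

    # Idle instances not needed for the backlog are stopped once they pass
    # the timeout, longest-idle first.
    surplus = max(0, len(idle) - pending_tasks)
    expired = sorted((iid for iid in idle if idle_seconds[iid] >= idle_timeout_s),
                     key=lambda iid: idle_seconds[iid], reverse=True)
    return ScalingDecision(spin_up=spin_up, spin_down=expired[:surplus])


if __name__ == "__main__":
    # Backlog of 3 tasks, two busy instances, cap of 4: start 2 more.
    print(decide(3, {"i-1": 0, "i-2": 0}, max_instances=4))
    # → ScalingDecision(spin_up=2, spin_down=[])

    # No backlog: stop only the instance idle past the 600 s timeout.
    print(decide(0, {"i-1": 0, "i-2": 900, "i-3": 120}, max_instances=4))
    # → ScalingDecision(spin_up=0, spin_down=['i-2'])
```

The idle timeout keeps the sketch from stopping an instance the moment a queue empties, which would cause churn if new tasks arrive shortly after. Real autoscalers typically add further safeguards (for example, minimum instance counts or spin-up delays) that are omitted here.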

ClearML offers the following resource autoscaling solutions:

- Built-in GUI applications: no-code autoscaling applications (available under the Pro and Enterprise plans)
  - AWS Autoscaler
  - GCP Autoscaler
- Kubernetes autoscaling
- Custom autoscaler implementation using the AutoScaler class

GUI Autoscaler Applications

For users on Pro and Enterprise plans, ClearML provides UI applications for configuring autoscaling of cloud resources. These applications include:

- AWS Autoscaler: Automatically provisions and shuts down AWS EC2 instances based on workload demand.
- GCP Autoscaler: Dynamically manages Google Cloud instances according to defined budgets.

These applications allow users to set up autoscaling with minimal configuration, defining compute budgets and resource limits directly through the UI.

Kubernetes Autoscaling

ClearML integrates with Kubernetes, allowing agents to be deployed within a cluster. Kubernetes handles:

- Automatic pod creation for executing tasks.
- Resource allocation and scaling based on workload.
- Optional integration with Kubernetes' Cluster Autoscaler, which adjusts the number of nodes dynamically.

This is particularly useful for organizations already using Kubernetes for workload orchestration.
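To make node-level scaling "based on workload" concrete, the sketch below estimates how many identical nodes a set of pending pods would need, using first-fit-decreasing bin packing on CPU and memory requests. This is a simplified, hypothetical model and not the Cluster Autoscaler's actual implementation; the function name `nodes_needed` and the node capacity values are assumptions for illustration.

```python
from typing import List, Tuple

# (cpu_cores, memory_gib): requested by one pending pod, or available on one node
Resources = Tuple[float, float]


def nodes_needed(pending_pods: List[Resources], node_capacity: Resources) -> int:
    """Estimate how many identical nodes are needed to fit all pending pods.

    First-fit decreasing: pods are sorted by CPU request (largest first) and
    placed on the first simulated node with enough free CPU and memory; a new
    node is opened when none fits. Hypothetical illustration only.
    """
    cap_cpu, cap_mem = node_capacity
    free: List[List[float]] = []  # remaining [cpu, mem] on each simulated node
    for cpu, mem in sorted(pending_pods, key=lambda p: p[0], reverse=True):
        if cpu > cap_cpu or mem > cap_mem:
            raise ValueError(f"pod ({cpu} CPU, {mem} GiB) can never fit on a node")
        for node in free:
            if node[0] >= cpu and node[1] >= mem:
                node[0] -= cpu
                node[1] -= mem
                break
        else:
            # No existing node has room: open a new one with this pod on it.
            free.append([cap_cpu - cpu, cap_mem - mem])
    return len(free)


if __name__ == "__main__":
    pods = [(2, 8), (2, 8), (4, 16), (1, 4)]
    print(nodes_needed(pods, node_capacity=(4, 16)))  # → 3
```

Here the pods request 9 CPU cores in total against 4-core nodes, so at least three nodes are required, which matches the estimate. A pod that exceeds a single node's capacity raises an error instead of silently inflating the count.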

Custom Autoscaler Implementation

Users can build their own autoscaler using the clearml.automation.auto_scaler.AutoScaler class, which enables:

- Direct control over instance scaling logic.
- Custom rules for resource allocation.
- Budget-conscious decision-making based on predefined policies.

This method requires some scripting. An example demonstrates how to use the clearml.automation.auto_scaler module to implement a service that optimizes AWS EC2 instance scaling according to a defined instance budget.
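The kind of custom, budget-conscious allocation rule a custom autoscaler might encode can be sketched as a small planning function. This sketch is plain Python and does not call the AutoScaler class itself; the `ResourceType` and `plan_instances` names, the resource type names, and the integer cost units are hypothetical. Each resource type has its own instance budget, a global hourly cost cap applies on top, and the resource type with the largest backlog is served first.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ResourceType:
    hourly_cost: int    # cost of one instance-hour, in integer units (e.g. cents)
    max_instances: int  # per-type instance budget


def plan_instances(backlog: Dict[str, int],
                   resources: Dict[str, ResourceType],
                   running: Dict[str, int],
                   hourly_budget: int) -> Dict[str, int]:
    """Return a target instance count per resource type (hypothetical policy).

    backlog:       resource type -> pending tasks not covered by running instances
    resources:     resource type -> cost and instance budget
    running:       resource type -> instances currently running
    hourly_budget: cap on total hourly cost of all instances, same units as hourly_cost

    Running instances are kept; then one instance is added per pending task,
    serving the resource type with the largest backlog first, until that type's
    max_instances or the total hourly budget would be exceeded.
    """
    target = {name: running.get(name, 0) for name in resources}
    spend = sum(resources[name].hourly_cost * count for name, count in target.items())
    for name in sorted(backlog, key=backlog.get, reverse=True):
        spec = resources[name]
        for _ in range(backlog[name]):
            if target[name] >= spec.max_instances or spend + spec.hourly_cost > hourly_budget:
                break  # this type is capped; move on to the next resource type
            target[name] += 1
            spend += spec.hourly_cost
    return target


if __name__ == "__main__":
    types = {"cpu": ResourceType(hourly_cost=10, max_instances=4),
             "gpu": ResourceType(hourly_cost=100, max_instances=2)}
    # GPU backlog is served first, but the budget of 200 only covers one GPU instance.
    print(plan_instances({"cpu": 2, "gpu": 5}, types, {"cpu": 1}, hourly_budget=200))
    # → {'cpu': 3, 'gpu': 1}
```

Because the planner stops only the capped resource type and then moves on, cheaper resource types can still use leftover budget after an expensive type hits its limit. Costs are integers to avoid floating-point rounding at the budget boundary.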
