
---
id: overview
title: What is ClearML?
slug: /
---

## Overview

Welcome to the documentation for ClearML, the end-to-end platform for streamlining AI development and deployment. ClearML consists of three essential layers:

  1. Infrastructure Control Plane (Cloud/On-Prem Agnostic)
  2. AI Development Center
  3. GenAI App Engine

Each layer provides distinct functionality to ensure an efficient and scalable AI workflow from development to deployment.

*Webapp gif*


## Infrastructure Control Plane

The Infrastructure Control Plane is the foundation of the ClearML platform. It provisions and manages compute resources, enabling administrators to offer compute as GPU-as-a-Service (GPUaaS) with minimal configuration.
Using the Infrastructure Control Plane, DevOps and IT teams can manage and optimize GPU resources for high performance and cost efficiency.

### Features

- Resource Management: Automates the allocation and management of GPU resources.
- Workload Autoscaling: Seamlessly scales GPU resources based on workload demands.
- Monitoring and Logging: Provides comprehensive monitoring and logging of GPU utilization and performance.
- Cost Optimization: Consolidates cloud and on-premises compute into a seamless GPUaaS offering.
- Deployment Flexibility: Runs your workloads on both cloud and on-premises compute.

*Infrastructure control plane*


## AI Development Center

The AI Development Center offers a robust environment for developing, training, and testing AI models. It is designed to be cloud and on-premises agnostic, providing flexibility in deployment.

### Features

- Integrated Development Environment: A comprehensive IDE for training, testing, and debugging AI models.
- Model Training: Scalable, distributed model training and hyperparameter optimization.
- Data Management: Tools for data preprocessing, management, and versioning.
- Experiment Tracking: Track metrics, artifacts, and logs; manage versions; and compare results.
- Workflow Automation: Build pipelines to formalize your workflows.

*AI Dev center*


## GenAI App Engine

The GenAI App Engine is designed to deploy large language models (LLMs) into GPU clusters and manage various AI workloads, including Retrieval-Augmented Generation (RAG) tasks. This layer also handles networking, authentication, and role-based access control (RBAC) for deployed services.

### Features

- LLM Deployment: Seamlessly deploy LLMs into GPU clusters.
- RAG Workloads: Efficiently manage and execute RAG workloads.
- Networking and Authentication: Expose GenAI applications through secure, authenticated network endpoints.
- RBAC: Implement role-based access control over deployed services.

*GenAI engine*


## Getting Started

To begin using ClearML, follow these steps:

  1. Set Up Infrastructure Control Plane: Allocate and manage your GPU resources.
  2. Develop AI Models: Use the AI Development Center to develop and train your models.
  3. Deploy AI Models: Deploy your models using the GenAI App Engine.

For detailed instructions on each step, refer to the respective sections in this documentation.


## Support

For feature requests or bug reports, see the ClearML repository on GitHub.

If you have any questions, join the discussion on the ClearML Slack channel, or tag your questions on Stack Overflow with the clearml tag.

Lastly, you can always find us at support@clearml.ai.