Efficiently Managing AI Chips in Kubernetes: A Deep Dive into the HAMi Architecture

Introduction

As AI workloads grow exponentially, the demand for efficient GPU utilization and flexible compute management has become critical. Vanilla Kubernetes lacks native support for GPU sharing and heterogeneous AI chip management, leading to poor resource utilization and complex scheduling workflows. This article explores the challenges of managing AI chips in Kubernetes and introduces HAMi, a CNCF sandbox project designed to address these limitations through device virtualization and advanced scheduling.

Key Challenges in AI Chip Management

1. GPU Sharing Limitations

Kubernetes allocates GPUs to containers exclusively and whole-card, so resources go underutilized: a task that needs only 2G of device memory still occupies the entire card, leaving the rest of its capacity idle. Because tasks must compete for exclusive GPU access, pending GPU workloads queue behind one another and overall system efficiency drops.

2. Heterogeneous Cluster Management

The Chinese market features multiple AI chip vendors (e.g., non-Nvidia alternatives), and each traditionally requires its own custom scheduler extender. The result is fragmented scheduling logic, increased operational complexity, and reduced scheduling performance.

3. DRA Scheme Limitations

Dynamic Resource Allocation (DRA) requires Kubernetes 1.32, where it reached beta, and NVIDIA's DRA driver is still under development. It also requires manual configuration of ResourceClaims and DeviceClasses, which limits its practicality for widespread adoption today.
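
To make the manual-configuration point concrete, here is roughly what a single-GPU request looks like under DRA. This is a sketch based on the Kubernetes v1beta1 DRA API; the `gpu.nvidia.com` device class name is an assumption drawn from NVIDIA's example driver and may change as the driver matures:

```yaml
# A DRA workload needs an explicit ResourceClaim referencing a vendor DeviceClass...
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com  # assumed class installed by NVIDIA's DRA driver
---
# ...and the pod must wire that claim into each container that uses it.
apiVersion: v1
kind: Pod
metadata:
  name: dra-pod
spec:
  restartPolicy: Never
  resourceClaims:
    - name: gpu
      resourceClaimName: single-gpu
  containers:
    - name: ctr
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        claims:
          - name: gpu
```

Compared with a one-line `nvidia.com/gpu: 1` request, this per-workload claim plumbing is the usability gap the rest of this article addresses.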

HAMi Architecture: A Unified Solution

HAMi (Heterogeneous AI Computing Virtualization Middleware) is a lightweight, non-intrusive AI chip virtualization middleware in the CNCF sandbox. It enables unified management of multi-vendor devices and fine-grained resource sharing, addressing the challenges above.

Core Components

  • Mutating Webhook: Intercepts pod creation and rewrites device resource requests so shared-GPU pods are routed to the HAMi scheduler.
  • Scheduler Extender: Implements custom scheduling logic for heterogeneous devices.
  • Device Plugins: Provide vendor-specific device support (e.g., Nvidia, Huawei Ascend).
  • Container Resource Control: Uses the HAMi-core injection library to hijack the CUDA runtime, enforcing container-level resource limits (see the sketch below).
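
Concretely, when a pod requests HAMi-managed resources, the webhook redirects it to HAMi's scheduler, which picks a node and card; the device plugin and HAMi-core then enforce the limits at runtime. The pod below sketches the post-mutation result; the `hami-scheduler` name is the default in typical deployments and may differ in yours:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-pod
spec:
  schedulerName: hami-scheduler  # injected by the mutating webhook
  containers:
    - name: app
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1       # one vGPU slice, not a whole card
          nvidia.com/gpumem: 2000 # device-memory cap in MiB, enforced by HAMi-core
```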

Key Features and Functionalities

1. Device Sharing

HAMi supports dynamic GPU/chip resource sharing, pushing utilization toward 100%. Tasks declare the GPU memory and compute limits they need through ordinary resource requests, without modifying container images or application code. For instance, on a 24G card, a task that requests 2G of GPU memory leaves the remaining 22G available for other tasks, as shown below.
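
The resource names here follow HAMi's documented NVIDIA integration: `nvidia.com/gpumem` is a device-memory cap in MiB and `nvidia.com/gpucores` a percentage of the card's compute. A minimal sharing request looks like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vgpu-demo
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["bash", "-c", "nvidia-smi && sleep 86400"]
      resources:
        limits:
          nvidia.com/gpu: 1        # one vGPU slice
          nvidia.com/gpumem: 2000  # cap this container at 2000 MiB of device memory
          nvidia.com/gpucores: 30  # cap at roughly 30% of the card's compute
```

Inside the container, `nvidia-smi` reports only the 2000 MiB slice, because the injected library intercepts the CUDA and NVML calls that query device memory.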

2. Resource Control Mechanism

The in-container control layer is compatible with CUDA >= 10.2 and Nvidia drivers >= 440. It also supports device-type targeting (e.g., scheduling a task only onto A100s) and black/white lists for fine-grained control over which cards a task may use.
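
Device targeting is expressed through pod annotations. The keys below (`nvidia.com/use-gputype` as a whitelist, `nvidia.com/nouse-gputype` as a blacklist) follow HAMi's documentation; the card names are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: a100-only-pod
  annotations:
    nvidia.com/use-gputype: "A100"        # whitelist: only schedule onto A100 cards
    nvidia.com/nouse-gputype: "1080,2080" # blacklist: never schedule onto these types
spec:
  containers:
    - name: app
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1
```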

3. Task Priority

Tasks can be assigned a priority via an environment variable (0 = high, 1 = low). When both share a card, a high-priority task preempts low-priority ones, pausing their execution until the high-priority task completes.
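
In HAMi's documentation this is controlled by the `CUDA_TASK_PRIORITY` environment variable set on the container; a high-priority task would be declared roughly as follows:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: high-priority-task
spec:
  containers:
    - name: trainer
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      env:
        - name: CUDA_TASK_PRIORITY
          value: "0"  # 0 = high priority, 1 = low priority
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 4000
```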

4. Dynamic MIG (Multi-Instance GPU)

Automatically selects a suitable MIG profile (e.g., 1g.10gb) from the task's resource request, without requiring users to name instance profiles manually. Beyond Nvidia, HAMi applies similar template-based slicing to Huawei Ascend and other chip architectures.
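
In HAMi's dynamic MIG documentation, a pod opts into MIG-backed slices via an annotation; the `nvidia.com/vgpu-mode` key below comes from that documentation, and the scheduler derives the profile from the requested memory:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-demo
  annotations:
    nvidia.com/vgpu-mode: "mig"  # request a MIG-backed slice instead of a software (HAMi-core) slice
spec:
  containers:
    - name: app
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 8000  # scheduler picks a MIG profile (e.g., 1g.10gb) that fits 8000 MiB
```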

5. Topology-Aware Scheduling

Minimizes communication cost between GPUs by using device interconnect topology (e.g., NVLink versus PCIe paths). For multi-GPU tasks, HAMi allocates the set of cards with the lowest mutual communication cost, improving distributed training performance.

6. Scheduling Strategies

  • Binpack: Packs tasks onto already-allocated GPUs to reduce fragmentation.
  • Spread: Distributes tasks across idle GPUs to maximize per-task performance (see the sketch after this list).
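
The policy can be selected per task through scheduler-policy annotations. The annotation keys below follow the project's documentation as of this writing (node-level and GPU-level policies are set independently) and should be verified against the deployed version:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spread-task
  annotations:
    hami.io/node-scheduler-policy: "spread"  # assumed key: spread across nodes
    hami.io/gpu-scheduler-policy: "spread"   # assumed key: spread across GPUs within a node
spec:
  containers:
    - name: app
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 2
```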

Monitoring and Integration

  • DCGM Exporter: Exposes GPU memory allocation, remaining capacity, and per-workload metrics (see the sketch below).
  • Volcano Integration: Works with the Volcano vGPU device plugin and scheduler to bring GPU sharing to batch workloads.
  • Grafana Visualization: Dashboards built on the DCGM data provide real-time GPU resource monitoring.
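
One way to wire the metrics pipeline together is a Prometheus Operator ServiceMonitor pointed at the DCGM exporter. The service labels, namespace, and port name below are illustrative and depend on how the exporter was installed:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dcgm-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: dcgm-exporter  # assumed label on the exporter's Service
  namespaceSelector:
    matchNames: ["gpu-monitoring"]           # assumed namespace for the exporter
  endpoints:
    - port: metrics                          # assumed port name
      interval: 30s
```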

Application Scenarios

  • Production Environments: Cloud providers such as UCloud and enterprises (e.g., PN Security, banks) use HAMi to raise GPU utilization in production.
  • Heterogeneous Chip Management: Unified handling of Nvidia, Huawei Ascend, and other device types simplifies operations in multi-vendor clusters.

Future Roadmap

  1. Expand Vendor Support: Add compatibility with more chip architectures, such as Qualcomm, AMD, Intel, and AWS accelerators.
  2. Web UI Development: Simplify user interaction with a web interface; the project also aims to apply for CNCF incubation by year-end.
  3. DRA Integration: Explore compatibility with Dynamic Resource Allocation (DRA) for enhanced resource management.
  4. Ecosystem Collaboration: Partner with vendors such as DaoCloud and SiliconCloud to integrate HAMi into their products.

Conclusion

HAMi addresses critical challenges in Kubernetes-based AI chip management by enabling efficient resource sharing, flexible scheduling, and heterogeneous device support. Its lightweight architecture and integration with the CNCF ecosystem make it a promising option for optimizing compute in AI workloads. By adopting HAMi, organizations can achieve higher GPU utilization, reduce operational complexity, and scale their AI infrastructure effectively.