As AI workloads grow exponentially, the demand for efficient GPU utilization and flexible computing power management has become critical. Traditional Kubernetes lacks native support for GPU sharing and heterogeneous AI chip management, leading to suboptimal resource utilization and complex scheduling workflows. This article explores the challenges of managing AI chips in Kubernetes and introduces Hami, a CNCF sandbox project designed to address these limitations through advanced virtualization and scheduling capabilities.
Traditional Kubernetes does not support GPU sharing, so each GPU is allocated to a single task and often sits underutilized: a task may consume only 2 GB of a card's memory while the rest of its capacity stays idle. Tasks are therefore forced to compete for exclusive GPU access, causing GPU-bound tasks to queue and reducing overall system efficiency.
The Chinese market features multiple AI chip vendors beyond Nvidia, and each has traditionally required its own scheduler extender. This fragments scheduling logic, increases operational complexity, and degrades scheduling performance.
The Dynamic Resource Allocation (DRA) scheme requires Kubernetes 1.32, and Nvidia's DRA driver is still under development. It also requires users to hand-configure resource claims and device classes, limiting its practicality for widespread adoption.
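To illustrate the configuration burden, here is a minimal sketch of the objects DRA expects users to write by hand, assuming the beta API that ships with Kubernetes 1.32 and a hypothetical `gpu.nvidia.com` device class published by the vendor's DRA driver (both the API version and the class name should be checked against the actual cluster and driver):

```yaml
# Sketch only: API version and device class name are assumptions.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com   # DeviceClass published by the vendor's DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-pod
spec:
  resourceClaims:
    - name: gpu
      resourceClaimName: single-gpu       # bind the claim to this pod
  containers:
    - name: cuda-container
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        claims:
          - name: gpu                      # reference the claim from the container
```

Every workload needs its own claim (or claim template) plus the matching device class, which is the manual overhead the article refers to.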
Hami is a lightweight, non-intrusive AI chip virtualization middleware and a CNCF sandbox project. It provides unified management of devices from multiple vendors and enables resource sharing, addressing the challenges described above.
Hami supports dynamic GPU/chip resource sharing, boosting utilization toward 100%. Tasks can specify the GPU memory and usage limits they need without modifying container configurations. For instance, two tasks that together need only 2 GB of GPU memory can share a single 24 GB card, leaving the remaining 22 GB available for other tasks.
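A minimal sketch of such a request, using the `nvidia.com/gpu`, `nvidia.com/gpumem`, and `nvidia.com/gpucores` resource names documented by the Hami project (names and units may differ depending on the deployed version):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-pod
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["bash", "-c", "sleep infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1         # one virtual GPU slice
          nvidia.com/gpumem: 2000   # ~2 GB of device memory for this task
          nvidia.com/gpucores: 30   # optional: cap at ~30% of the card's compute
```

Several pods declared this way can land on the same physical card; Hami enforces the memory and core limits inside each container, so the remaining capacity stays schedulable for other tasks.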
Hami ensures compatibility with CUDA versions above 10.2 and Nvidia drivers above 440. It also supports device type specification (e.g., A100) and black/white list management for fine-grained control over which devices a task may use, as sketched below.
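Hami's documentation describes pod annotations for this; the exact annotation keys shown here (`nvidia.com/use-gputype`, `nvidia.com/nouse-gputype`) should be verified against the deployed version:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: typed-gpu-pod
  annotations:
    nvidia.com/use-gputype: "A100"        # whitelist: only schedule onto A100 cards
    nvidia.com/nouse-gputype: "1080,2080" # blacklist: never schedule onto these types
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1
```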
Tasks can be prioritized via environment variables (0=high, 1=low). High-priority tasks preempt low-priority ones, pausing the latter until the former completes.
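A sketch of how a task might declare its priority, assuming the `CUDA_TASK_PRIORITY` environment variable described in Hami's documentation (the variable name is an assumption to verify against your version):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: high-priority-pod
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      env:
        - name: CUDA_TASK_PRIORITY
          value: "0"                # 0 = high priority; 1 = low priority
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 2000
```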
Hami automatically identifies optimal MIG configurations (e.g., a 1g.10gb profile) without requiring manual specification of instance names. It supports Nvidia, Huawei Ascend, and other chip architectures.
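A sketch of dynamic MIG usage: the pod only states how much memory it needs, and Hami picks a fitting MIG profile. The `nvidia.com/vgpu-mode` annotation used here to opt into MIG-backed virtualization is an assumption drawn from Hami's dynamic-MIG documentation:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-pod
  annotations:
    nvidia.com/vgpu-mode: "mig"   # assumption: opt into MIG-backed virtualization
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpumem: 10000  # ~10 GB; Hami can map this to a 1g.10gb instance
```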
Hami minimizes communication costs between GPUs by leveraging device topology data, allocating the GPUs closest to one another for a given task and enhancing performance for distributed training.
Hami addresses critical challenges in Kubernetes-based AI chip management by enabling efficient resource sharing, flexible scheduling, and heterogeneous device support. Its lightweight architecture and integration with CNCF tools make it a promising solution for optimizing computing power in AI workloads. By adopting Hami, organizations can achieve higher GPU utilization, reduce operational complexity, and scale their AI infrastructure effectively.