CubeFS: A Cloud-Native Storage System Empowering Modern Applications Through Case Studies

Introduction

In the era of cloud-native computing, storage systems must evolve to meet the demands of scalability, performance, and flexibility. CubeFS emerges as a robust solution, designed to address the challenges of modern data-intensive applications. This article explores CubeFS’s architecture, core features, and real-world applications, highlighting its role in the CNCF ecosystem. By analyzing case studies, we demonstrate how CubeFS optimizes storage for AI, hybrid cloud environments, and high-throughput workloads.

Technical Overview

System Architecture

CubeFS is a cloud-native storage system built with modular components to ensure scalability and reliability. Its architecture includes:

Client Subsystem: Supports the S3, HDFS, and POSIX protocols, enabling seamless integration with diverse applications.
Cache Subsystem: Optimized for high-throughput scenarios, it uses consistent hashing and segment-based data mapping (1 MB segments) to accelerate access, and can cache data in memory or on disk.
Metadata Subsystem: Ensures strong consistency and scalability, featuring garbage collection and automated POSIX interface management.
Object Access Zone: Acts as a proxy layer that enforces capability-based access control.
Storage Subsystem: Combines multi-replica and erasure coding engines, with independent storage management, metadata tracking, and fault detection.
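
The cache subsystem's two mapping steps, splitting a byte range into 1 MB segments and then placing each segment on a cache node by consistent hashing, can be sketched as follows. This is a minimal illustration rather than CubeFS's implementation: the HashRing class, the virtual-node count, the SHA-256 based hash, and the "file_id:segment" key format are all assumptions made for the example.

```python
import bisect
import hashlib

SEGMENT_SIZE = 1 << 20  # 1 MB segments, as described above


def segments_for_range(offset: int, length: int) -> range:
    """Return the indices of the segments covering [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // SEGMENT_SIZE
    last = (offset + length - 1) // SEGMENT_SIZE
    return range(first, last + 1)


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes, vnodes: int = 64):
        # Each node contributes `vnodes` points to smooth out the distribution.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


def cache_node_for(ring: HashRing, file_id: str, segment: int) -> str:
    """Pick the cache node for one segment (key format is an assumption)."""
    return ring.node_for(f"{file_id}:{segment}")
```

The property that makes consistent hashing attractive for a cache is that removing a node only remaps the segments that node owned; segments held by other nodes keep their placement, so most of the cache stays warm.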

Core Features

CubeFS excels in several critical areas:

Multi-Protocol Support: Integrates S3, HDFS, and POSIX, allowing compatibility with a wide range of cloud-native tools.
High-Performance Storage Engines: Multi-replica and erasure coding ensure low latency and high throughput, critical for AI and big data workloads.
Strong Consistency: Achieved through Raft and quorum-based replication, ensuring data integrity across distributed nodes.
Distributed Caching: Combines public cloud and on-premises storage to reduce latency and costs, with intelligent data migration based on access patterns.
Elastic Caching: Adapts to business needs with dynamic replication, load balancing, and distance-aware optimization.
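
The arithmetic behind Raft-style and quorum-based replication is short enough to write down. The sketch below shows the general rules only (majority size, tolerated failures, and the read/write overlap condition); the function names are illustrative, and nothing here reflects CubeFS's actual replica counts or configuration.

```python
def majority(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Replicas that can fail while a majority can still be reached."""
    return n - majority(n)


def quorums_overlap(n: int, write_quorum: int, read_quorum: int) -> bool:
    """True if every read quorum shares at least one replica with every write quorum."""
    return write_quorum + read_quorum > n
```

With three replicas a majority is two and one failure is tolerated; going to four replicas raises the majority to three without tolerating any more failures, which is why odd group sizes are the usual choice.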

Case Studies

1. AI Storage Application

Challenge: AI training and inference require high throughput and low latency, while data preprocessing involves massive filtering and cleaning.
Solution: CubeFS’s cache subsystem accelerates data retrieval using PyTorch storage plugins. Lifecycle management (LC Node) automates cold data migration to cost-effective storage tiers, while hybrid cloud integration reduces transmission costs.
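
Cold-data migration of this kind usually starts from a simple rule: anything not accessed for a given period becomes a candidate for a cheaper tier. A minimal sketch of such a rule, assuming an idle-time threshold; the FileInfo type, the select_cold_files function, and the 30-day default are hypothetical and are not the LC Node's actual interface.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86400


@dataclass(frozen=True)
class FileInfo:
    path: str
    size_bytes: int
    last_access: float  # Unix timestamp in seconds


def select_cold_files(files, now: float, max_idle_days: int = 30):
    """Return files whose last access is older than the idle threshold."""
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    return [f for f in files if f.last_access < cutoff]


def bytes_to_migrate(cold_files) -> int:
    """Total size of the selected files, useful for estimating transfer cost."""
    return sum(f.size_bytes for f in cold_files)
```

A real policy would likely also consider path prefixes, object size, and how often data is re-read after migration; the threshold alone is only the starting point.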

2. Compute-Storage Separation

Challenge: Single-node storage limits capacity and performance, complicating data migration and balancing.
Solution: CubeFS’s multi-replica model stores data across clusters, enhancing scalability and cost-efficiency. Regular consistency checks and automated fault recovery via tracking modules ensure reliability. Transitioning to shared storage simplifies operations and improves stability.
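
One common way to run a consistency check across replicas is to checksum each copy and flag any replica that disagrees with the majority. The sketch below illustrates that idea under stated assumptions: CRC32 as the checksum, replica contents passed in as bytes, and the function name find_divergent_replicas are all choices made for the example, not a description of how CubeFS performs its checks.

```python
import zlib
from collections import Counter


def find_divergent_replicas(replicas: dict) -> list:
    """Return the nodes whose data differs from the majority checksum.

    `replicas` maps a node name to the bytes that node holds for one extent.
    """
    if not replicas:
        return []
    checksums = {node: zlib.crc32(data) for node, data in replicas.items()}
    majority_sum, count = Counter(checksums.values()).most_common(1)[0]
    if count <= len(replicas) // 2:
        # No checksum is held by more than half the replicas, so there is
        # no safe way to decide which copy is correct.
        raise ValueError("no majority checksum; manual inspection needed")
    return sorted(node for node, c in checksums.items() if c != majority_sum)
```

The nodes returned are the candidates for repair, typically by re-copying the extent from a replica that matches the majority.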

3. SDK Application

Challenge: Traditional FUSE-based clients limit performance in high-throughput scenarios.
Solution: CubeFS’s SDK operates in user mode, bypassing kernel limitations to achieve higher throughput and stability. It supports key-value stores (e.g., Redis, RocksDB) and append-write applications, making it well suited to real-time data processing.

Future Outlook

CubeFS is continuously evolving to meet emerging demands:

Performance Optimization: Enhancing distributed caching for hybrid cloud architectures.
Hybrid Cloud Integration: Supporting S3 external storage to enable seamless data flow across environments.
Version Updates: The 3.1 release focuses on distributed caching, while 3.5.2 strengthens migration and stability. Future updates aim to reduce memory costs and support multi-cloud environments.

SDK Characteristics

High Performance: User-mode operation avoids kernel restrictions, ensuring stability and throughput.
Adaptability: Tailored for specialized use cases, such as append-write applications.
Scalability: Designed for large-scale key-value storage systems, supporting online services with high availability.

Application Scenarios

Compute Services: Stores WAL (write-ahead log) and SST files for LSM-tree-based services.
Append-Write Applications: Log-based systems and other append-only workloads are a natural fit for CubeFS’s design.
K-V Storage Architecture: Scales for massive key-value storage, supporting services like Redis and RocksDB.
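
To make the append-write pattern concrete, the sketch below shows a length-prefixed, checksummed record log of the kind a write-ahead log typically uses. It is a generic illustration that writes to any seekable binary file object; the record layout and function names are assumptions for the example and do not represent the CubeFS SDK's API.

```python
import io
import struct
import zlib

# Record header: payload length and CRC32 of the payload, little-endian.
_HEADER = struct.Struct("<II")


def append_record(log, payload: bytes) -> int:
    """Append one record at the end of the log and return its start offset."""
    offset = log.seek(0, io.SEEK_END)
    log.write(_HEADER.pack(len(payload), zlib.crc32(payload)))
    log.write(payload)
    return offset


def read_records(log):
    """Yield payloads in write order, stopping at a torn or corrupt tail."""
    log.seek(0)
    while True:
        header = log.read(_HEADER.size)
        if len(header) < _HEADER.size:
            return  # clean end of log, or a partially written header
        length, crc = _HEADER.unpack(header)
        payload = log.read(length)
        if len(payload) < length or zlib.crc32(payload) != crc:
            return  # incomplete or corrupted record: ignore the tail
        yield payload
```

Because existing bytes are never rewritten, recovery only has to decide where the valid prefix ends, which is what makes this access pattern friendly to distributed storage.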

Data Storage Requirements

High Stability: P99 latency must stay below 1 ms.
Optimization: Background processes refine performance, though specific details remain undisclosed.
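
For readers less familiar with tail-latency targets, a P99 requirement means that 99 percent of requests complete within the stated bound. The sketch below computes a percentile with the nearest-rank method and checks it against the 1 ms target mentioned above; the function names and the choice of nearest-rank (rather than an interpolating method) are assumptions for illustration.

```python
import math


def percentile(samples, p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def meets_p99_target(latencies_ms, target_ms: float = 1.0) -> bool:
    """True if the P99 of the observed latencies is below the target."""
    return percentile(latencies_ms, 99) < target_ms
```

Note that a P99 target says nothing about the slowest 1 percent of requests, so a single very slow request among a hundred fast ones can still pass the check.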

Conclusion

CubeFS stands as a versatile cloud-native storage system, combining strong consistency, distributed caching, and hybrid cloud capabilities to address modern workloads. Its modular architecture and focus on performance make it a compelling choice for AI, big data, and hybrid cloud environments. By leveraging CubeFS’s features, organizations can achieve scalable, cost-effective storage solutions tailored to their specific needs.