Event Processing on the Edge at Scale Using Kubernetes Native Tech

Introduction

Edge computing has become a critical paradigm for processing data close to its source in distributed, real-time systems. Intuit, a global fintech company, uses Kubernetes-native technologies to address the challenges of large-scale edge event processing. This article explores how Intuit’s open-source project, NemoFlow (formerly Nemo), enables efficient, scalable, and language-agnostic event processing at the edge, powered by Kubernetes and CNCF projects like Argo.

Challenges in Edge Event Processing

Edge environments present unique challenges that traditional cloud-native solutions struggle to address:

Diverse Event Sources: Edge devices generate data from heterogeneous sensors and hardware, requiring flexible integration strategies.
Real-Time Constraints: Cloud stream-processing frameworks such as Apache Flink or Spark Streaming are resource-intensive, which makes them a poor fit for edge devices with limited CPU and memory.
Language and Resource Limitations: JVM-based applications (e.g., Java) are common in cloud environments but are often impractical for edge deployment due to high memory overhead.

Solution: NemoFlow

NemoFlow is a Kubernetes-native event processing framework designed to overcome these challenges. Its key design principles include:

Kubernetes First: Built as a Kubernetes operator, it abstracts infrastructure management so that developers can focus on their processing logic.
Language Agnosticism: Supports Java, Python, Golang, and Rust via SDKs, enabling cross-platform processing.
Serverless-Like Abstraction: Auto-scaling capabilities ensure resources are dynamically allocated based on workload, reducing costs during idle periods.
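
The scale-with-workload idea can be illustrated with a minimal sketch. This is not NemoFlow's actual autoscaler; the function name, the parameters, and the default thresholds are all hypothetical. It assumes the controller can observe how many events are pending and roughly how many events per second one replica can process.

```python
import math


def desired_replicas(pending: int, rate_per_replica: float,
                     target_seconds: float = 10.0,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Hypothetical backlog-based scaling rule (not NemoFlow's real logic).

    Returns the number of replicas needed to drain `pending` events within
    `target_seconds`, clamped to [min_replicas, max_replicas].
    """
    if rate_per_replica <= 0:
        raise ValueError("rate_per_replica must be positive")
    if pending <= 0:
        # Idle: fall back to the minimum, which may be zero replicas.
        return min_replicas
    needed = math.ceil(pending / (rate_per_replica * target_seconds))
    # With work pending, always keep at least one replica running.
    return max(min_replicas, min(max_replicas, max(1, needed)))
```

With `min_replicas=0`, an empty backlog yields zero replicas, which is the behavior that saves resources during idle periods.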

Core Features

Decoupled Architecture: Separates data sources, processing logic, and sinks, enabling modular plugin-based workflows.
Lightweight Deployment: Runs on any Kubernetes cluster (e.g., EKS, GKE, K3S), ensuring compatibility with edge and on-prem environments.
Event-Driven Pipelines: Supports complex stream processing with features like windowed aggregations (fixed, sliding, session windows) and custom functions.
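
The three window types mentioned above differ only in how an event timestamp is mapped to one or more time intervals. The following sketch shows that mapping in plain Python. It is an illustration of the general technique, not NemoFlow's implementation, and all function names are assumptions. Windows are half-open intervals [start, end) measured in seconds.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open interval [start, end) in seconds


def fixed_window(ts: int, size: int) -> Window:
    """Each timestamp belongs to exactly one non-overlapping window."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts: int, size: int, slide: int) -> List[Window]:
    """All windows of length `size`, starting at multiples of `slide`,
    that contain `ts`. Windows overlap when slide < size."""
    windows = []
    start = ts - ts % slide  # latest window start that is <= ts
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def session_windows(timestamps: List[int], gap: int) -> List[Window]:
    """Group timestamps into sessions; a session closes after `gap`
    seconds without a new event."""
    sessions: List[Window] = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            # Event arrived before the session timed out: extend it.
            sessions[-1] = (sessions[-1][0], ts + gap)
        else:
            sessions.append((ts, ts + gap))
    return sessions
```

A fixed window assigns each event once, a sliding window assigns it to every overlapping interval, and a session window is defined by the data itself rather than by the clock.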

Architecture and Abstraction Concepts

MonoVertex: Simple Event Processing

A MonoVertex describes a complete source-to-sink event processing workflow as a single unit, declared through Kubernetes Custom Resource Definitions (CRDs). A typical workflow includes:

Source Container: Reads data from Kafka or custom endpoints.
Transformer Container: Parses and transforms data using language-agnostic SDKs.
Sink Container: Forwards processed data to downstream systems via Unix Domain Sockets.
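
The read, transform, and forward flow above can be sketched in a few lines. In a real deployment each stage runs in its own container and the stages talk over sockets; here they are plain Python callables so the data flow is easy to follow. The names `run_mono_vertex` and `parse_reading`, and the temperature payload, are hypothetical and are not part of any NemoFlow SDK.

```python
import json
from typing import Callable, Iterable, Optional


def run_mono_vertex(source: Iterable[bytes],
                    transform: Callable[[bytes], Optional[dict]],
                    sink: Callable[[dict], None]) -> int:
    """Read each raw message, transform it, and hand it to the sink.

    Returns the number of events delivered to the sink.
    """
    delivered = 0
    for raw in source:
        event = transform(raw)
        if event is None:
            # The transformer chose to drop this message.
            continue
        sink(event)
        delivered += 1
    return delivered


def parse_reading(raw: bytes) -> Optional[dict]:
    """Example transformer: parse JSON and convert Celsius to Fahrenheit.

    Returns None for malformed input so the message is dropped.
    """
    try:
        data = json.loads(raw)
    except ValueError:
        return None
    if not isinstance(data, dict):
        return None
    celsius = data.get("celsius")
    if isinstance(celsius, bool) or not isinstance(celsius, (int, float)):
        return None
    return {"device": data.get("device", "unknown"),
            "fahrenheit": celsius * 9 / 5 + 32}
```

Because the transformer is just a function from bytes to an optional event, the same contract could be implemented in any language, which is the point of the language-agnostic SDKs.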

Pipeline: Complex Stream Processing

Pipelines are composed of independent Vertex components, each responsible for a specific processing stage. Key capabilities include:

Data Aggregation: Map/Reduce operations, grouping, and windowed aggregations.
Custom Processing: User-defined sources, sinks, and functions (UDSource, UDSink, and UDF) for tailored logic.
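
As a rough illustration of how map and reduce stages compose, the sketch below chains a map vertex that assigns a key to each event with a reduce vertex that aggregates per key. The names `map_vertex` and `reduce_vertex` and the event shape are assumptions made for this example; they do not mirror a NemoFlow API.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Keyed = Tuple[str, float]


def map_vertex(events: Iterable[dict],
               fn: Callable[[dict], Keyed]) -> Iterator[Keyed]:
    """Apply a user-defined function to each event, emitting (key, value)."""
    for event in events:
        yield fn(event)


def reduce_vertex(pairs: Iterable[Keyed],
                  fn: Callable[[float, float], float],
                  initial: float) -> Dict[str, float]:
    """Group values by key and fold each group with a user-defined function."""
    acc: Dict[str, float] = {}
    for key, value in pairs:
        acc[key] = fn(acc.get(key, initial), value)
    return acc


def max_temp_per_device(events: Iterable[dict]) -> Dict[str, float]:
    """Two vertices wired together: key by device, then keep the maximum."""
    keyed = map_vertex(events, lambda e: (e["device"], float(e["temp"])))
    return reduce_vertex(keyed, max, float("-inf"))
```

Each vertex only sees the output of the previous one, so either stage can be swapped out without touching the other.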

Real-World Applications

Demo Environment

A demonstration pipeline runs on a K3S cluster, processing data through four stages:

Source: Simulates data ingestion from an HTTP endpoint.
Map UDF: Classifies data as even or odd.
Aggregation Vertex: Computes sums every 5 minutes using sliding windows.
Sink: Writes results to Kafka.
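
The logic of the middle two stages can be approximated in plain Python. This is a sketch under stated assumptions, not the demo's actual code: the window length (600 seconds) is a guess, since the article only states that sums are produced every 5 minutes, and the HTTP source and Kafka sink are replaced by an in-memory list and a returned dictionary. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def classify(n: int) -> str:
    """Map UDF: label a number as even or odd."""
    return "even" if n % 2 == 0 else "odd"


def window_starts(ts: int, size: int, slide: int) -> List[int]:
    """Start times of every sliding window [start, start + size) containing ts."""
    first = ts - ts % slide
    return list(range(first, ts - size, -slide))


def run_demo(events: Iterable[Tuple[int, int]],
             size: int = 600, slide: int = 300) -> Dict[Tuple[int, str], int]:
    """Sum values per (window start, even/odd key).

    `events` are (timestamp_seconds, value) pairs. The 600 s window length
    is an assumption; the 300 s slide matches the 5-minute cadence.
    """
    sums: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, value in events:
        key = classify(value)
        for start in window_starts(ts, size, slide):
            sums[(start, key)] += value
    return dict(sums)
```

Because the windows overlap, each event contributes to two sums here: the window that just opened and the one that opened five minutes earlier.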

Performance metrics show:

Simple event processing achieves 4,000 events per second with 11 nodes.
Complex pipelines support machine learning inference and multi-language processing.

Community Use Cases

BCube: Processes edge signals without network connectivity using NemoFlow’s offline capabilities.
Boomer Groups: Monitors device temperatures and health metrics in real time.
NT (Japan Telecom): Accelerates AR machine learning and network optimization.

Summary

NemoFlow addresses edge event processing challenges by combining Kubernetes-native scalability, language flexibility, and lightweight deployment. Its key advantages include:

Kubernetes Integration: Seamless orchestration with minimal infrastructure overhead.
Language Agnosticism: Enables cross-platform development with SDKs for multiple languages.
Resource Efficiency: Low memory and CPU usage, ideal for constrained edge environments.

Ideal use cases include edge automation (e.g., retail inventory), predictive maintenance (e.g., vibration analysis), and smart infrastructure (e.g., traffic control). By leveraging Kubernetes and CNCF projects like Argo, NemoFlow provides a robust foundation for scalable edge computing.