Introduction
Progressive delivery has emerged as a critical practice for modern software systems, enabling teams to release changes with reduced risk and increased reliability. At EarnIn, a financial technology company serving 4 million users and handling 40 billion API requests per month, scaling progressive delivery required robust infrastructure to manage complex microservices and frequent deployments. This article explores how EarnIn leveraged Linkerd, Gateway API, and Argo Rollouts to implement advanced deployment strategies, including Canary and Blue Green releases, within the CNCF ecosystem.
Technical Implementation
Progressive Delivery Overview
Progressive delivery involves releasing changes in controlled stages, such as Canary or Blue Green deployments, to minimize risk. Key components include:
- Canary Releases: Gradually routing traffic to new versions while monitoring performance.
- Blue Green Deployments: Switching traffic between identical environments to ensure zero-downtime updates.
- Observability: Real-time monitoring of metrics like latency and error rates.
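The Canary pattern above can be sketched as an Argo Rollouts strategy. This is a minimal illustration, not EarnIn's actual configuration; the service name, replica count, and step values are hypothetical.

```yaml
# Illustrative Argo Rollouts canary strategy (names and values are hypothetical)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-api
spec:
  replicas: 4
  strategy:
    canary:
      steps:
        - setWeight: 20          # route 20% of traffic to the new version
        - pause: {duration: 5m}  # observe latency/error metrics before continuing
        - setWeight: 50
        - pause: {duration: 5m}
        - setWeight: 100         # full cutover once checks pass
```

A Blue Green variant replaces the `canary` block with a `blueGreen` block that switches an active Service between the stable and preview ReplicaSets in a single step.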
Linkerd and Gateway API Integration
Linkerd, a service mesh, provides fine-grained traffic control and built-in observability. By adopting Gateway API (a CNCF standard), EarnIn standardized load balancing and traffic management for both north-south and east-west flows. This integration enabled:
- Traffic Splitting: Configuring Canary ratios (e.g., 20% new version, 80% stable).
- Dynamic Routing: Using HTTP/gRPC routes defined in Helm charts to direct traffic based on predefined policies.
- Zero-Downtime Migration: Gradually shifting traffic from legacy Deployments to Argo Rollouts without disrupting external URLs.
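A Canary ratio like the 20/80 split mentioned above can be expressed in Gateway API as weighted `backendRefs` on an HTTPRoute. The gateway, route, and service names below are placeholders, not EarnIn's actual resources.

```yaml
# Illustrative Gateway API HTTPRoute splitting traffic 80/20
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-api-route
spec:
  parentRefs:
    - name: example-gateway
  rules:
    - backendRefs:
        - name: example-api-stable   # current stable version
          port: 8080
          weight: 80
        - name: example-api-canary   # new version under evaluation
          port: 8080
          weight: 20
```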
Argo Rollouts for Canary and Blue Green
Argo Rollouts, a CNCF project, provides automated deployment strategies. EarnIn implemented:
- Canary Analysis: Custom thresholds (e.g., error rate >10%) to trigger rollbacks.
- Metric Provider: Automated evaluation of deployment health during rollouts.
- Rollout Phases: Three-stage migration from Deployments to Rollouts, ensuring backward compatibility with existing Service Profiles.
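A rollback trigger such as "error rate >10%" is typically implemented as an Argo Rollouts AnalysisTemplate backed by a metric provider. The sketch below assumes a Prometheus provider; the address, query, and metric names are placeholders, not EarnIn's actual analysis configuration.

```yaml
# Illustrative AnalysisTemplate: fail (and roll back) if error rate exceeds 10%
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
spec:
  metrics:
    - name: error-rate
      interval: 1m
      failureLimit: 1                # one bad sample aborts the rollout
      failureCondition: result[0] > 0.10
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090  # placeholder address
          query: |
            sum(rate(http_requests_total{app="example-api",status=~"5.."}[5m]))
            / sum(rate(http_requests_total{app="example-api"}[5m]))
```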
Challenges and Solutions
Helm Configuration Abstraction
To maintain long-term configurability, EarnIn abstracted Helm charts to allow developers to customize:
- Traffic Ratios: Customizable Canary steps (e.g., 10%/50%/90% instead of default 20/40/60/80).
- Deployment Policies: Defining rollback triggers based on latency or error thresholds.
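One way to expose such knobs is a small set of Helm values that chart templates expand into Rollout steps and analysis thresholds. This is a hypothetical shape for illustration, not EarnIn's chart interface.

```yaml
# Hypothetical values.yaml exposing canary knobs to service teams
rollout:
  strategy: canary
  canarySteps: [10, 50, 90]    # overrides the default 20/40/60/80 ladder
  pauseDuration: 5m
  analysis:
    errorRateThreshold: 0.10   # roll back if more than 10% of requests fail
    latencyThresholdSeconds: 10
```

The chart template would then iterate over `canarySteps` to emit paired `setWeight`/`pause` entries, so teams tune ratios and thresholds without touching the Rollout manifest itself.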
Gateway API CRD Integration
Migrating from Deployment to Rollout required aligning HTTP/gRPC routes with Gateway API CRDs. Key steps included:
- Parallel running of Deployments and Rollouts during the transition.
- Gradual replacement of Deployment configurations with Rollout-based Helm charts.
- Ensuring service.cluster.local URLs remained unchanged during traffic shifts.
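Aligning Rollouts with Gateway API routes is commonly done through the community `argoproj-labs/gatewayAPI` traffic-routing plugin, which rewrites the backendRef weights on a named HTTPRoute as the canary progresses. The route, namespace, and Rollout names below are placeholders, and the plugin usage is a plausible sketch rather than a confirmed detail of EarnIn's setup.

```yaml
# Illustrative canary trafficRouting via the Gateway API plugin
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-api
spec:
  strategy:
    canary:
      trafficRouting:
        plugins:
          argoproj-labs/gatewayAPI:
            httpRoute: example-api-route  # existing HTTPRoute to reweight
            namespace: example
      steps:
        - setWeight: 20
        - pause: {}   # wait for manual or analysis-driven promotion
```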
Zero-Downtime Migration
EarnIn executed a phased rollout:
- Referencing existing Deployments in Rollout configurations.
- Publishing two Helm chart patches: one for parallel execution, another for full Rollout adoption.
- Validating Argo Rollout Controller behavior during the transition to avoid disruptions.
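"Referencing existing Deployments" maps naturally onto Argo Rollouts' `workloadRef`, which lets a Rollout adopt an existing Deployment's Pod template instead of duplicating it, keeping the two in lockstep during the parallel phase. Names here are illustrative.

```yaml
# Illustrative Rollout adopting an existing Deployment via workloadRef
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-api
spec:
  replicas: 4
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-api        # the legacy Deployment being migrated
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: {duration: 10m}
```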
Self-Service Tooling
Argo CD UI Plugin
A custom plugin enhanced Argo CD with:
- Visual Deployment Progress: Real-time tracking of traffic ratios and Canary evaluations.
- Direct Configuration: Developers could adjust Canary parameters directly within the Argo CD interface.
Internal Platform Integration
Automated workflows synchronized deployment changes with GitOps repositories and provided feedback to internal platforms, reducing manual intervention and improving developer autonomy.
Deployment and Testing
Demo Workflow
EarnIn tested Canary deployments in a sandbox environment, simulating:
- 20% of traffic routed to canary ReplicaSets.
- 80% of traffic routed to stable ReplicaSets.
- Three staged rollback checks based on predefined metric thresholds (e.g., latency >10s, error rate >10%).
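The demo sequence above (a 20% canary gated by staged metric checks) could be wired together by attaching an analysis step after each weight increase. The template name mirrors the hypothetical error-rate example used earlier in this article and is not a verified configuration.

```yaml
# Illustrative canary steps gated by metric analysis at each stage
strategy:
  canary:
    steps:
      - setWeight: 20                          # 20% to canary, 80% stays on stable
      - analysis:
          templates:
            - templateName: error-rate-check   # hypothetical AnalysisTemplate
      - setWeight: 50
      - analysis:
          templates:
            - templateName: error-rate-check
      - setWeight: 100
```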
Future Roadmap
- Canary Adoption: Targeting 90% Canary usage for core services.
- Infrastructure Expansion: Extending progressive delivery to non-web services (e.g., databases).
- Monitoring Optimization: Enhancing rollback mechanisms and observability tools.
Technical Architecture
Core Components
- Argo CD: GitOps pipeline for declarative configuration management.
- Argo Rollouts: Controller for Canary and Blue Green deployments.
- Linkerd: Service mesh for traffic control and observability.
- Gateway API: Standardized API for load balancing and traffic routing.
Traffic Control Flow
- HTTP/gRPC routes are defined in Helm charts.
- Argo Rollouts dynamically adjusts traffic ratios.
- The metric provider evaluates deployment metrics and triggers rollbacks if thresholds are exceeded.
- Linkerd and Gateway API manage both north-south and east-west traffic seamlessly.
Conclusion
EarnIn’s adoption of progressive delivery through Linkerd, Gateway API, and Argo Rollouts has significantly improved deployment reliability and risk management. By standardizing traffic control and enabling self-service deployment strategies, the team reduced manual intervention and enhanced observability. As the company scales, expanding progressive delivery to infrastructure layers and refining monitoring tools will further solidify its position in the CNCF ecosystem.