Prometheus and OpenTelemetry are two cornerstone tools in the observability landscape, each offering distinct approaches to metrics collection and processing. As organizations adopt hybrid observability strategies, interoperability between these systems has become critical. This article explores the current state of their integration, focusing on architectural differences, technical challenges, and future directions.
Prometheus employs a pull-based model: it discovers targets and scrapes their metrics at a fixed interval. Because Prometheus initiates every scrape, a failed scrape is itself a health signal, which enables detailed service health monitoring. The trade-off is that each target must hold its current metric values in memory between scrapes, and repeatedly pulling values that have not changed can waste resources.
In contrast, OpenTelemetry uses a push-based model, in which services export metrics to a Collector or directly to a storage backend. This reduces the need for service discovery, but the receiving side has less visibility into service downtime or network issues, because a missing push does not say why it is missing. Where metrics are only transmitted when they change, tracking service state becomes harder still.
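To make the health-signal point concrete, here is a minimal sketch of a pull loop, assuming each target is a plain Python callable standing in for an HTTP scrape (the scrape_once helper and the target names are hypothetical). It mirrors the way Prometheus records an up value for every scrape, so a failed pull becomes visible data rather than silence.

```python
from typing import Callable, Dict

# Hypothetical stand-in for an HTTP scrape: a target is a callable that
# returns its current metrics, or raises if it cannot be reached.
Target = Callable[[], Dict[str, float]]


def scrape_once(targets: Dict[str, Target]) -> Dict[str, Dict[str, float]]:
    """Pull metrics from every target and attach an `up` sample to each."""
    results: Dict[str, Dict[str, float]] = {}
    for name, fetch in targets.items():
        try:
            samples = dict(fetch())
            samples["up"] = 1.0  # scrape succeeded
        except Exception:
            samples = {"up": 0.0}  # scrape failed: the failure itself is recorded
        results[name] = samples
    return results


def healthy_target() -> Dict[str, float]:
    return {"http_requests_total": 42.0}


def unreachable_target() -> Dict[str, float]:
    raise ConnectionError("target unreachable")


result = scrape_once({"api": healthy_target, "worker": unreachable_target})
print(result)
```

In a push model there is no equivalent of the failed branch: if the worker stops exporting, the receiver simply sees nothing.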
Prometheus 3.0 introduced native UTF-8 support, resolving common issues with special characters such as periods (.) in metric names. This brings Prometheus in line with OpenTelemetry’s semantic conventions, which use dot-separated names. It also requires configuration changes in the OpenTelemetry Collector: avoiding the default endpoints and selecting a name conversion strategy (e.g., keeping UTF-8 names or replacing unsupported characters with underscores).
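The underscore strategy mentioned above can be sketched as follows. This is a simplified illustration, not the Collector's actual translation code: the function names are hypothetical, and real translators apply further rules (for example unit and type suffixes) that are left out here.

```python
import re

# Characters allowed in a legacy (pre-UTF-8) Prometheus metric name.
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_:]")


def to_legacy_name(name: str) -> str:
    """Replace every character outside the legacy charset with an underscore."""
    sanitized = _INVALID_CHARS.sub("_", name)
    # Legacy names may not start with a digit, so prefix an underscore.
    if sanitized and sanitized[0].isdigit():
        sanitized = "_" + sanitized
    return sanitized


def translate(name: str, allow_utf8: bool) -> str:
    """Keep the name untouched when the backend accepts UTF-8 names."""
    return name if allow_utf8 else to_legacy_name(name)


print(translate("http.server.request.duration", allow_utf8=False))
print(translate("http.server.request.duration", allow_utf8=True))
```

With allow_utf8 set, the OpenTelemetry name reaches the backend unchanged; without it, the dots become underscores.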
Prometheus 3.0 added Delta to Cumulative processing to convert OpenTelemetry’s delta metrics into the cumulative form Prometheus expects. The conversion is stateful: a running total must be kept for every series so that each incoming delta can be added to it. Future work aims to move more of this processing to query time or to introduce PromQL functions that handle deltas more efficiently.
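The state requirement is easiest to see in code. The sketch below is a minimal illustration of the idea, assuming in-order delivery and ignoring timestamps, resets, and state expiry; the DeltaToCumulative class is hypothetical and is not the actual Prometheus or Collector implementation.

```python
from typing import Dict, Tuple

# A series is identified by its metric name plus its sorted label pairs.
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


class DeltaToCumulative:
    """Accumulate delta samples into cumulative totals, one total per series."""

    def __init__(self) -> None:
        # This dictionary is the state the conversion has to keep in memory.
        self._totals: Dict[SeriesKey, float] = {}

    def add(self, name: str, labels: Dict[str, str], delta: float) -> float:
        key: SeriesKey = (name, tuple(sorted(labels.items())))
        total = self._totals.get(key, 0.0) + delta
        self._totals[key] = total
        return total


converter = DeltaToCumulative()
print(converter.add("requests", {"method": "GET"}, 5.0))   # first delta
print(converter.add("requests", {"method": "GET"}, 3.0))   # added to the same series
print(converter.add("requests", {"method": "POST"}, 2.0))  # separate series, separate total
```

Memory use grows with the number of distinct series, which is why moving the work to query time is attractive.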
The introduction of native histograms in Prometheus 3.0 improves performance and accuracy by allowing a complete histogram to be carried within a single request rather than split across separate bucket series. Native histograms are now supported in the OpenTelemetry Collector’s remote receivers, though further work is needed to stabilize the OpenMetrics text format implementation and to address query-specific requirements.
Remote Write v2 introduces support for native and classic histograms, exemplars (links to traces), metadata (applied to series rather than metric names), and timestamps. It also reports partial write statistics (e.g., successful and failed write volumes). There are plans to integrate these features into the remote write exporters.
Current strategies map resource attributes to a single metric (target_info), which is combined with other metrics via PromQL joins. However, these joins are considered a barrier to use. Future work will allow users to customize which attributes are converted to labels, though this requires Prometheus restarts and may alter time series IDs, increasing memory overhead. UX research is ongoing to refine resource attribute handling, with plans to integrate OpenTelemetry’s Entities proposal to distinguish Kubernetes, container, and other attributes.
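The target_info join described above can be pictured with a small simulation. This is a sketch of the matching logic only, assuming series are plain label dictionaries and that job and instance are the matching labels; the join_target_info helper and the sample label values are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence

Labels = Dict[str, str]


def join_target_info(
    series: Iterable[Labels],
    target_info: Iterable[Labels],
    on: Sequence[str] = ("job", "instance"),
    include: Sequence[str] = (),
) -> List[Labels]:
    """Copy selected target_info labels onto series that share the `on` labels."""
    index = {tuple(info.get(k) for k in on): info for info in target_info}
    joined: List[Labels] = []
    for labels in series:
        info = index.get(tuple(labels.get(k) for k in on))
        if info is None:
            continue  # no matching target_info: the series drops out of the result
        merged = dict(labels)
        for attr in include:
            if attr in info:
                merged[attr] = info[attr]
        joined.append(merged)
    return joined


series = [
    {"__name__": "http_requests_total", "job": "shop/api", "instance": "pod-1"},
    {"__name__": "http_requests_total", "job": "shop/api", "instance": "pod-9"},
]
target_info = [
    {"job": "shop/api", "instance": "pod-1",
     "k8s_namespace_name": "prod", "service_version": "1.4.2"},
]

joined = join_target_info(series, target_info, include=("k8s_namespace_name",))
print(joined)
```

Every query that needs a resource attribute has to repeat this matching step, which is the usability cost the paragraph refers to.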
Prometheus users can use the promote_resource_attributes configuration to convert selected resource attributes into labels, which improves usability. However, changing the list of promoted attributes requires a restart and can cause time series ID changes, memory spikes, and monitoring interruptions.
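Why changing the promoted attributes changes series identity can be shown with a short sketch. It assumes a series is identified by its full label set and that promoted attributes keep their original names; the promote and series_id helpers are hypothetical and do not reflect Prometheus internals.

```python
from typing import Dict, Iterable, Tuple

Labels = Dict[str, str]


def promote(resource_attrs: Labels, metric_labels: Labels,
            promoted: Iterable[str]) -> Labels:
    """Copy the promoted resource attributes into the metric's label set."""
    labels = dict(metric_labels)
    for attr in promoted:
        if attr in resource_attrs:
            # Assumption: an existing metric label with the same name wins.
            labels.setdefault(attr, resource_attrs[attr])
    return labels


def series_id(labels: Labels) -> Tuple[Tuple[str, str], ...]:
    """A series is identified by its complete, sorted label set."""
    return tuple(sorted(labels.items()))


resource = {"service.name": "api", "k8s.namespace.name": "prod"}
metric_labels = {"method": "GET"}

before = promote(resource, metric_labels, promoted=[])
after = promote(resource, metric_labels, promoted=["k8s.namespace.name"])

print(series_id(before))
print(series_id(after))
print(series_id(before) != series_id(after))
```

Because the label set differs, the same measurement continues as a new series after the change, and both the old and new series occupy memory for a while.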
Ongoing UX research investigates resource attribute design in monitoring data models, gathering feedback on database requirements. Future plans include integrating OpenTelemetry’s entities concept by May 2024 to differentiate Kubernetes, container, and other attributes.
Support for OpenTelemetry’s OTAP proposal enables entity-based identification of resource attributes, improving how Prometheus parses and visualizes this data. UTF-8 support is now available, though unit and type suffixes are still appended to names with underscores.
Work on remote write receivers and exporters aims to support writing delta data incrementally, which may require protocol adjustments for delta mode. Native histogram support and UTF-8 handling are also being optimized for transmission efficiency.
A Parquet file processing working group is addressing high cardinality resource attributes (e.g., 200+ attributes) to improve storage and query performance. Parquet is being explored as a storage solution to reduce memory pressure.
OpenTelemetry Collector processors enable data transformation and filtering of resource attributes before remote export. Proposals include adding deny lists for custom attribute filtering rules.
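A deny list of the kind proposed could work roughly as follows. This is a sketch under assumptions: glob-style patterns are an illustrative choice, the filter_attributes helper is hypothetical, and none of this is an existing Collector processor configuration.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable

Attributes = Dict[str, str]


def filter_attributes(attrs: Attributes, deny: Iterable[str]) -> Attributes:
    """Drop every attribute whose key matches one of the deny patterns."""
    patterns = list(deny)
    return {
        key: value
        for key, value in attrs.items()
        if not any(fnmatchcase(key, pattern) for pattern in patterns)
    }


resource = {
    "service.name": "api",
    "k8s.namespace.name": "prod",
    "k8s.pod.uid": "6f1c2d",
    "process.pid": "4711",
}

# Hypothetical rule set: drop high-churn identifiers before remote export.
kept = filter_attributes(resource, deny=["k8s.pod.*", "process.*"])
print(kept)
```

Filtering before export keeps short-lived identifiers such as pod UIDs from multiplying the number of series downstream.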
A new info() function replaces traditional join operations when querying OpenTelemetry data, though it is regarded as an interim solution rather than a final one.
Prometheus and OpenTelemetry interoperability hinges on addressing architectural differences, enhancing UTF-8 support, optimizing Delta-to-Cumulative conversions, and refining resource attribute handling. While challenges like memory overhead and query complexity persist, ongoing efforts in UX research, protocol improvements, and integration with OpenTelemetry’s Entities proposal signal a path toward more seamless observability. Organizations should prioritize testing Delta metrics, leveraging native histograms, and planning for resource attribute customization to align with evolving requirements.