The State of Prometheus and OpenTelemetry Interoperability

Prometheus and OpenTelemetry are two cornerstone tools in the observability landscape, each offering distinct approaches to metrics collection and processing. As organizations adopt hybrid observability strategies, interoperability between these systems has become critical. This article explores the current state of their integration, focusing on architectural differences, technical challenges, and future directions.

Model Differences

Prometheus employs a pull-based model: it discovers targets and scrapes their metrics endpoints at a regular interval. Because Prometheus initiates every scrape, a failed scrape is itself a health signal, which enables detailed service health monitoring. The trade-off is that each target must keep its current metric state in memory between scrapes, and resources may be wasted on pulls that return nothing new.

In contrast, OpenTelemetry uses a push-based model, where applications export metrics to a Collector or directly to a storage backend. This reduces the need for service discovery, but the receiving side cannot easily tell whether a silent service is down, cut off by a network problem, or simply has nothing to report. When metrics are only transmitted as they change, tracking service state becomes harder still.
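The health signal that the pull model provides can be illustrated with a minimal sketch. It is not Prometheus code: the `scrape` function and the two target functions are hypothetical stand-ins, and the only behavior borrowed from Prometheus is the convention of recording a synthetic `up` value of 1 or 0 for each scrape.

```python
from typing import Callable, Dict

Samples = Dict[str, float]


def scrape(fetch: Callable[[], Samples]) -> Samples:
    """Pull metrics from one target and record whether the pull worked."""
    try:
        samples = dict(fetch())
        samples["up"] = 1.0  # the target answered the scrape
    except Exception:
        samples = {"up": 0.0}  # the failure itself becomes a data point
    return samples


def healthy_target() -> Samples:
    # Hypothetical target that exposes one counter.
    return {"http_requests_total": 42.0}


def broken_target() -> Samples:
    # Hypothetical target that cannot be reached.
    raise ConnectionError("connection refused")


ok = scrape(healthy_target)
failed = scrape(broken_target)
```

In a push model there is no equivalent of the `except` branch on the receiving side: if `broken_target` never sends anything, the backend sees only an absence of data.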

UTF-8 Support

Prometheus 3.0 introduced native UTF-8 support, resolving common issues with special characters like periods (.) in metric names. This aligns with OpenTelemetry’s semantic conventions, which use dotted names. It still requires configuration adjustments in the OpenTelemetry Collector, both to avoid the default endpoints and to select a name translation strategy (the legacy strategy, for example, replaces characters outside the classic Prometheus character set with underscores).
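The difference between the two kinds of strategy can be sketched as follows. This is a simplified illustration, not the actual translator: the function names are hypothetical, and the real conversion also deals with unit and type suffixes, which are left out here. The only rule assumed is the classic Prometheus metric name character set of letters, digits, underscores, and colons, with no leading digit.

```python
import re

# Characters outside the classic Prometheus metric name character set.
_LEGACY_INVALID = re.compile(r"[^a-zA-Z0-9_:]")


def escape_to_legacy_name(name: str) -> str:
    """Replace every unsupported character with an underscore."""
    escaped = _LEGACY_INVALID.sub("_", name)
    if escaped and escaped[0].isdigit():
        escaped = "_" + escaped  # classic names may not start with a digit
    return escaped


def translate(name: str, keep_utf8: bool) -> str:
    """Hypothetical switch between a UTF-8 strategy and the legacy one."""
    return name if keep_utf8 else escape_to_legacy_name(name)


otel_name = "http.server.request.duration"
legacy = translate(otel_name, keep_utf8=False)
preserved = translate(otel_name, keep_utf8=True)
```

With the legacy strategy, queries have to use the underscore form; with a UTF-8 strategy, the name stored in Prometheus matches the one in the OpenTelemetry semantic conventions.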

Delta to Cumulative Conversion

Prometheus 3.0 added a delta-to-cumulative processor that converts OpenTelemetry’s delta metrics into the cumulative form Prometheus stores. The conversion is stateful: the running total of each series has to be kept so that every incoming delta can be added to it. Future work aims to move more of this processing to query time, or to introduce PromQL functions that handle deltas more efficiently.
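Why the conversion needs state is easiest to see in code. The sketch below is an assumption-laden illustration, not the processor's implementation: the class name and method are hypothetical, and concerns such as out-of-order points, series expiry, and start timestamps are ignored.

```python
from typing import Dict, Tuple

SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


class DeltaToCumulative:
    """Keep one running total per series and add each delta to it."""

    def __init__(self) -> None:
        self._totals: Dict[SeriesKey, float] = {}

    def add(self, name: str, labels: Dict[str, str], delta: float) -> float:
        # A series is identified by its name plus its sorted label pairs.
        key = (name, tuple(sorted(labels.items())))
        total = self._totals.get(key, 0.0) + delta
        self._totals[key] = total  # this entry is the state that must be retained
        return total


conv = DeltaToCumulative()
first = conv.add("requests", {"code": "200"}, 5.0)
second = conv.add("requests", {"code": "200"}, 3.0)
other = conv.add("requests", {"code": "500"}, 2.0)
```

The `_totals` dictionary grows with the number of active series, which is the memory cost that query-time handling of deltas would avoid.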

Native Histograms

The introduction of native histograms in Prometheus 3.0 improves performance and accuracy by carrying a complete histogram as a single unit instead of spreading it across separate per-bucket series. Native histograms are now supported in the OpenTelemetry Collector’s remote write receiver, though further work is needed to stabilize the OpenMetrics text format implementation and address query-specific requirements.

Remote Write v2

Remote Write v2 introduces support for native histograms alongside classic ones, exemplars (links from samples to traces), metadata (attached to each series rather than to a metric name), and created timestamps. Responses also carry partial write statistics (e.g., successful and failed write volumes), and the plan is to bring these features to the remote write exporters.
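A sender can use the partial write statistics to work out how much of a request was actually stored. The sketch below assumes the statistics arrive as response headers named `X-Prometheus-Remote-Write-Samples-Written`, `X-Prometheus-Remote-Write-Histograms-Written`, and `X-Prometheus-Remote-Write-Exemplars-Written`; that is how I recall the Remote-Write 2.0 specification, so the names should be checked against it. The helper functions themselves are hypothetical.

```python
from typing import Dict, Mapping

# Assumed header naming scheme; verify against the Remote-Write 2.0 spec.
_PREFIX = "x-prometheus-remote-write-"
_SUFFIX = "-written"


def written_counts(headers: Mapping[str, str]) -> Dict[str, int]:
    """Extract per-type written counts from response headers."""
    counts: Dict[str, int] = {}
    for name, value in headers.items():
        lowered = name.lower()  # HTTP header names are case-insensitive
        if lowered.startswith(_PREFIX) and lowered.endswith(_SUFFIX):
            kind = lowered[len(_PREFIX):-len(_SUFFIX)]
            counts[kind] = int(value)
    return counts


def unwritten(sent: Mapping[str, int], written: Mapping[str, int]) -> Dict[str, int]:
    """Compare what was sent with what the receiver reports as written."""
    return {kind: n - written.get(kind, 0) for kind, n in sent.items()}


response_headers = {
    "X-Prometheus-Remote-Write-Samples-Written": "90",
    "X-Prometheus-Remote-Write-Histograms-Written": "4",
    "Content-Type": "text/plain",
}
written = written_counts(response_headers)
missing = unwritten({"samples": 100, "histograms": 4, "exemplars": 2}, written)
```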

Resource Attributes

Current strategies map resource attributes to a single metric (target_info), which is combined with other metrics through PromQL joins. However, these joins are considered a usability barrier. Future work will let users choose which attributes are converted to labels, though changing that choice requires a Prometheus restart and may alter time series IDs, increasing memory overhead. UX research is ongoing to refine resource attribute handling, with plans to integrate OpenTelemetry’s Entities proposal to distinguish Kubernetes, container, and other attributes.
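What such a join does can be sketched in plain Python. This is a simulation under simplifying assumptions, not PromQL's evaluator: series are represented as label dictionaries, the join key is assumed to be the job and instance labels, and the label values are made up for illustration.

```python
from typing import Dict, Iterable, List

Labels = Dict[str, str]


def join_target_info(
    series: Iterable[Labels],
    target_info: Iterable[Labels],
    wanted: List[str],
) -> List[Labels]:
    """Copy selected target_info labels onto series with matching job/instance.

    Roughly what this PromQL expression does to the label sets:
      http_requests_total * on (job, instance) group_left (k8s_cluster_name) target_info
    """
    info_by_target = {(t["job"], t["instance"]): t for t in target_info}
    joined: List[Labels] = []
    for labels in series:
        info = info_by_target.get((labels["job"], labels["instance"]))
        if info is None:
            continue  # series without a matching target_info drop out of the result
        enriched = dict(labels)
        for name in wanted:
            if name in info:
                enriched[name] = info[name]
        joined.append(enriched)
    return joined


series = [
    {"__name__": "http_requests_total", "job": "shop/checkout", "instance": "pod-1", "code": "200"},
    {"__name__": "http_requests_total", "job": "shop/cart", "instance": "pod-9", "code": "200"},
]
info = [
    {"job": "shop/checkout", "instance": "pod-1",
     "k8s_cluster_name": "prod-eu", "k8s_namespace_name": "shop"},
]
result = join_target_info(series, info, ["k8s_cluster_name"])
```

Every query that needs a resource attribute has to repeat this matching step and name the attributes it wants, which is a large part of why the join is felt to be a barrier.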

Future Directions

  • Enhance UTF-8 support by eliminating name conversion (e.g., unit and type suffixes).
  • Improve query support for delta metrics and explore more efficient conversion mechanisms.
  • Deepen integration with OpenTelemetry’s Entities design to enhance context-aware resource attribute processing.

Resource Attribute Handling and Configuration

Prometheus users can use the promote_resource_attributes configuration to convert selected resource attributes into labels, improving usability. However, changing the list of promoted attributes requires a restart and can change time series IDs, causing memory spikes and monitoring interruptions.
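The effect of promotion on series identity can be sketched as follows. The functions are hypothetical and only model the idea: promoted attributes are copied onto every series as labels, and a series is identified by its full label set. Rewriting dots to underscores in label names is an assumption made for the example; with UTF-8 names the attribute name may be kept unchanged.

```python
from typing import Dict, FrozenSet, List, Tuple

Labels = Dict[str, str]


def series_labels(metric_labels: Labels, resource_attrs: Labels, promoted: List[str]) -> Labels:
    """Copy each promoted resource attribute onto the series as a label."""
    labels = dict(metric_labels)
    for attr in promoted:
        if attr in resource_attrs:
            # Assumed legacy-style label naming: dots become underscores.
            labels[attr.replace(".", "_")] = resource_attrs[attr]
    return labels


def series_id(labels: Labels) -> FrozenSet[Tuple[str, str]]:
    """A series is identified by its complete set of label pairs."""
    return frozenset(labels.items())


resource = {"service.name": "checkout", "k8s.pod.name": "checkout-7d9f"}
metric = {"__name__": "http_requests_total", "code": "200"}

before = series_labels(metric, resource, ["service.name"])
after = series_labels(metric, resource, ["service.name", "k8s.pod.name"])
identity_changed = series_id(before) != series_id(after)
```

Adding one attribute to the list gives every affected series a new identity, so old and new series coexist for a while. That overlap is where the memory spike comes from.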

User Experience and UX Research

Ongoing UX research investigates how resource attributes should be represented in the monitoring data model and gathers feedback on what users need from the database. Future plans include integrating OpenTelemetry’s entities concept by May 2024 to differentiate Kubernetes, container, and other attributes.

OpenTelemetry Integration and Extensions

Support for OpenTelemetry’s Entities OTEP (OpenTelemetry Enhancement Proposal) enables entity-based identification of resource attributes, improving how Prometheus parses and visualizes the data. UTF-8 support is now available, though unit and type suffixes are still appended to metric names with underscores.

Remote Write and Protocol Improvements

Work on remote write receivers and exporters includes support for writing delta data incrementally, which may require protocol adjustments for delta modes. Native histogram support and UTF-8 handling are also being optimized for transmission efficiency.

High Cardinality Attribute Handling

A Parquet file processing working group is addressing high cardinality resource attributes (e.g., 200+ attributes) to improve storage and query performance. Parquet is being explored as a storage solution to reduce memory pressure.

Data Transformation and Filtering

OpenTelemetry Collector processors enable data transformation and filtering of resource attributes before remote export. Proposals include adding deny lists for custom attribute filtering rules.
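The deny-list idea can be sketched as a simple filter. This is not the Collector's configuration syntax or any existing processor's API: the function name, the use of shell-style wildcard patterns, and the attribute values are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def filter_attributes(attrs: Dict[str, str], deny_patterns: List[str]) -> Dict[str, str]:
    """Drop every attribute whose name matches one of the deny patterns."""
    return {
        name: value
        for name, value in attrs.items()
        if not any(fnmatchcase(name, pattern) for pattern in deny_patterns)
    }


resource = {
    "service.name": "checkout",
    "process.pid": "4312",
    "process.command_line": "/usr/bin/checkout --port 8080",
    "k8s.pod.uid": "0f3c9a1e",
}
kept = filter_attributes(resource, ["process.*", "k8s.pod.uid"])
```

Filtering before export keeps volatile or high-cardinality attributes from ever reaching Prometheus, which is cheaper than dropping them after ingestion.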

Prometheus Language Features

A new info() function offers an alternative to traditional join operations when processing OpenTelemetry data, though it is regarded as an interim solution rather than a final resolution.
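The difference in query shape is easiest to see side by side. The sketch below only builds query strings. The helper names and the example metric are hypothetical, and the info() form shown is its simplest one, which to my knowledge adds all data labels from target_info; the function is experimental and may have to be enabled explicitly, so its current status should be checked in the PromQL documentation.

```python
from typing import List


def join_query(metric_expr: str, attrs: List[str]) -> str:
    """Classic form: explicit join against target_info on job and instance."""
    wanted = ", ".join(attrs)
    return f"{metric_expr} * on (job, instance) group_left ({wanted}) target_info"


def info_query(metric_expr: str) -> str:
    """info() form: the matching against target_info is handled by the function."""
    return f"info({metric_expr})"


expr = "rate(http_server_request_duration_seconds_count[2m])"
classic = join_query(expr, ["k8s_cluster_name"])
with_info = info_query(expr)
```

The info() form no longer spells out the join labels or the attribute list, which is the usability gain. The underlying data model, with attributes still stored on target_info, is unchanged.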

Conclusion

Prometheus and OpenTelemetry interoperability hinges on addressing architectural differences, enhancing UTF-8 support, optimizing delta-to-cumulative conversion, and refining resource attribute handling. While challenges like memory overhead and query complexity persist, ongoing efforts in UX research, protocol improvements, and integration with OpenTelemetry’s Entities proposal signal a path toward more seamless observability. Organizations should prioritize testing delta metrics, adopting native histograms, and planning for resource attribute customization to align with evolving requirements.