Skip to content

Latest commit

 

History

History
274 lines (212 loc) · 13 KB

File metadata and controls

274 lines (212 loc) · 13 KB

OpenTelemetry Protocol with Apache Arrow

Slack Go-CI Rust-CI OpenSSF Scorecard for otel-arrow OpenSSF Best Practices codecov

The OTel-Arrow project is building a high-performance, end-to-end column-oriented telemetry pipeline for OpenTelemetry data based on Apache Arrow.

The OpenTelemetry Protocol with Apache Arrow (OTAP) is designed as the column-oriented equivalent of OpenTelemetry Protocol (OTLP) that dramatically reduces network usage, and our protocol specification ensures that OTAP and OTLP are always convertible, without loss, in both directions.

The OTAP Dataflow Engine is our new Rust OpenTelemetry code base, a pipeline engine that dramatically reduces network, memory, and CPU usage, gaining efficiency through a number of optimizations, including the shared-nothing and thread-per-core design patterns and extensive use of zero-copy data types.

The OTAP Dataflow Engine is embeddable software. Our repo builds a demonstration df_engine artifact with core nodes included and features YAML configuration, but really it's designed for safely adding OpenTelemetry features and capabilities in other programs, anywhere that Rust can be compiled, with fine-grained control over memory and CPU resources.

The OTAP Dataflow Engine has built-in support for OTAP and OTLP, receivers and exporters, and built-in processors featuring batching, fanout, failover, retry, and routing by signal type. It has processors for common forms of filtering, transforming, sampling, and temporal aggregation. We have a durable buffer processor, based on the Arrow IPC format, introducing disk-based storage into pipelines for reliable delivery, and there are others such as a Syslog receiver and a Console exporter.

Our transform processor is built using Apache Datafusion, the industry-leading embedded query engine, itself based on Apache Arrow, and our Parquet exporter for OTAP makes OpenTelemetry data directly accessible to a wide range of tools, thanks to the Apache Parquet ecosystem.

We're self-instrumenting ourselves with an experimental OpenTelemetry SDK that emits OTAP directly, meaning we have an end-to-end column-oriented telemetry pipeline in Rust.

Our Golang Collector components otelarrowreceiver and otelarrowexporter have been included in the OpenTelemetry Collector-Contrib distribution since the July 2024 release of v0.104.0.

Our project is growing, new contributors are welcome. Join us in #otel-arrow on the CNCF Slack!

What is Apache Arrow?

The Apache Arrow is a major open-source project for in-memory and on-wire data exchange using a column-oriented representation. For OpenTelemetry readers, Apache Arrow is a lot like us, as the project encompasses a data format, a set of libraries, and an ecosystem.

The Apache Arrow format is a specification for the in-memory layout of a record batch, including details about the schema, the column names and types, and a length, and then a set of Arrays, one per column of the correct type and matching length. Arrow record batches support a number of types, including scalars of various width, strings and binary data, arrays, lists, structs, maps, as well as dictionary-encodings over the other types.

With Apache Arrow, we can build a record batch in one language and pass it to another language using shared memory. Apache Arrow specifies Arrow IPC, an encoding for column-oriented data that extends zero-copy to network and file-based communications.

What is OTAP?

OTAP is formally the OpenTelemetry Protocol with Apache Arrow, abbreviated OTAP for OTel-Arrow Protocol, a column-oriented representation for OpenTelemetry data supporting efficient in-memory and on-wire telemetry exchange. Where OpenTelemetry's OTLP is a row-oriented protocol, OTel-Arrow's OTAP protocol uses Apache Arrow to encode telemetry in a columnar format that is more efficient for CPUs to work with, because of vectorization, and compresses dramatically better, especially over long-lived streams with the use of Arrow IPC stream encoding.

OTAP maintains 100% compatibility with the OpenTelemetry data model for logs, traces, and metrics, with a straight-forward and non-lossy round trip from OTLP to OTAP and back, and support for the OpenTelemetry Profiles signal in OTAP is important future work.

In the OTAP Dataflow Engine, batches of OTAP data are represented using multiple record batches in an arrangement referred to as a "star schema". The number of record batches varies by OpenTelemetry signal type, see our data model documentation for details. OTAP Dataflow Engine also transports OTLP protocol bytes directly and efficiently by avoiding protocol message objects.

Adapter libraries for conversion between OTAP and OTLP representations are provided in Rust and Golang in this repository.

Quick Start

OpenTelemetry Collector example

OTel-Arrow components ship in the OpenTelemetry Collector-Contrib distribution. The two components extend the configuration model and settings of the core OTLP receiver and exporter. By design, you can swap otlp for otelarrow in your Collector configuration. To locate one, see OpenTelemetry Collector releases.

See the Exporter and Receiver docs for complete and up-to-date configuration details and Collector-specific examples.

OTAP Dataflow Engine example

We are not at this time providing pre-built OTAP Dataflow Engine releases. Developers can build the OTAP Dataflow Engine in a minimal configuration with the following:

git clone https://github.com/open-telemetry/otel-arrow.git
cd otel-arrow/rust/otap-dataflow
cargo build --bin df_engine --no-default-features

A directory of example configurations provides a number of examples (e.g., syslog-console.yaml). For example, to receive syslog messages with our Syslog/CEF receiver on port 5140 and print them to the console:

./target/debug/df_engine -c ./configs/syslog-console.yaml

Linux/MacOS users can test this with:

logger -n 127.0.0.1 -P 5140 -d --rfc3164 "hello world"

PowerShell users can test this with:

$t=Get-Date -Format 'MMM dd HH:mm:ss';$u=New-Object Net.Sockets.UdpClient;$b=[Text.Encoding]::ASCII.GetBytes("<14>$t powershell test: hello world");$u.Send($b,$b.Length,'127.0.0.1',5140);$u.Close()

See the admin console on port 8080, or visit http://localhost:8080/metrics to see engine metrics in Prometheus format.

syslog-to-console admin console page

See the OTAP Dataflow Engine documentation for more details.

Roadmap

See our project phases document for project goals and history. Phase 1 established the OTAP representation and proved that a column-oriented representation for OpenTelemetry is good for compression performance.

We are currently completing Phase 2, delivering the OTAP Dataflow engine. Phase 2 has demonstrated that a column-oriented, Arrow-based pipeline delivers new levels of performance for OpenTelemetry. See our live continuous benchmarks and nightly benchmark suite.

As a community, we are planning phase 3, see the links below to join us.

Contributing

We meet weekly, alternating between Tuesday at 4:00 PM PT and Thursday at 8:00 AM PT. Check the OpenTelemetry community calendar for dates and Zoom links.

Whether you're a seasoned OpenTelemetry developer, just starting your journey, or simply curious about the work we do, you're more than welcome to participate!

Maintainers

For more information about the maintainer role, see the community repository.

Approvers

For more information about the approver role, see the community repository.

Triagers

For more information about the approver role, see the community repository.

Emeritus

Thanks to all of our contributors

OpenTelemetry-Arrow contributors

Documentation

Here are some of our important documents. You can find more work-in-progress design documentation for the OTAP Dataflow Engine.

Document Description
OTAP Spec Formal protocol specification
OTAP Basics Introduction to the OTAP protocol
Data Model Arrow schema mappings for OTLP entities
Phase 1 Overview Wire protocol details and historical benchmarks
Phase 2 Design End-to-end pipeline architecture
Engine Design Engine architecture
Benchmarks Current performance results
Validation Process Encoding/decoding validation process
Dataflow Engine Rust crate architecture and component reference

References