dynamicroutingconnector

package module v0.34.0
Published: Feb 24, 2026 License: Apache-2.0 Imports: 24 Imported by: 0

Dynamic Routing Connector

The Dynamic Routing Connector is an OpenTelemetry Collector connector that routes telemetry data (traces, logs, and metrics) to different pipelines based on the estimated cardinality of unique combinations of configured metadata keys. It uses the HyperLogLog algorithm to efficiently estimate cardinality without storing all unique identifiers, making it memory-efficient even at scale. The metadata keys you configure determine what type of cardinality is being measured—whether that's unique connections, unique services, unique pods, or any other combination of metadata attributes.

Status

Supported Pipeline Types

Exporter Pipeline Type Receiver Pipeline Type Stability Level
logs logs development
metrics metrics development
traces traces development

Overview

The Dynamic Routing Connector enables intelligent, data-driven routing of telemetry signals to different processing pipelines based on the observed cardinality of unique combinations of metadata keys. The connector estimates how many unique combinations exist for a given partition key (e.g., tenant ID) based on the configured metadata keys. This is particularly useful in multi-tenant environments or scenarios where different workloads require different processing strategies based on their cardinality characteristics.

The Problem It Solves

Traditional OpenTelemetry Collector configurations use static routing rules that are defined at configuration time. This approach has limitations:

  1. Static Configuration: Routing decisions are fixed and cannot adapt to changing traffic patterns
  2. One-Size-Fits-All: All data flows through the same pipeline regardless of volume or connection patterns
  3. Inefficient Resource Usage: High-cardinality tenants may need different batching, sampling, or processing strategies than low-cardinality ones
  4. Manual Tuning Required: Operators must manually configure routing rules based on assumptions about traffic patterns

The Dynamic Routing Connector fills this gap by providing adaptive, cardinality-based routing that automatically adjusts pipeline selection based on observed cardinality patterns derived from configured metadata keys.

How It Works

The connector uses the following approach:

  1. Metadata Extraction: For each incoming telemetry signal, the connector extracts metadata from the client context using the configured routing_keys.partition_by and routing_keys.measure_by. The partition_by keys are used to create a composite key that partitions cardinality estimates (e.g., per tenant, per tenant+type, per region+environment). Multiple keys can be specified to create composite partitions. The measure_by keys define what unique combinations are being counted for each composite partition key value.

  2. Cardinality Estimation: The connector uses the HyperLogLog algorithm to estimate the number of unique combinations of the configured measure_by keys for each composite value of the partition_by keys. Composite keys are constructed by concatenating values from all specified partition keys, separated by colons (:) for multiple values of the same key and semicolons (;) for different keys.

  3. Threshold-Based Routing: Based on the estimated cardinality for each partition key value, the connector routes data to different pipelines defined by threshold boundaries. For example:

    • Low cardinality (0-10 unique combinations) → Pipeline A
    • Medium cardinality (11-100 unique combinations) → Pipeline B
    • High cardinality (101+ unique combinations) → Pipeline C
  4. Periodic Re-evaluation: At configurable intervals, the connector re-evaluates routing decisions based on the most recent cardinality estimates, allowing it to adapt to changing patterns.

  5. Memory Efficiency: The HyperLogLog algorithm provides accurate cardinality estimates with minimal memory overhead, making it suitable for high-throughput scenarios.

Configuration

Basic Configuration
connectors:
  dynamicrouting:
    routing_keys:
      partition_by: ["x-tenant-id"]
      measure_by:
        - "x-forwarded-for"
        - "user-agent"
    routing_pipelines:
      - pipelines: ["traces/low_cardinality"]
        max_cardinality: 10
      - pipelines: ["traces/medium_cardinality"]
        max_cardinality: 100
      - pipelines: ["traces/high_cardinality"]
        max_cardinality: 500
      - pipelines: ["traces/very_high_cardinality"]
        max_cardinality: .inf
    default_pipelines: ["traces/default"]
    evaluation_interval: 30s
Composite Partition Keys

You can specify multiple keys in routing_keys.partition_by to create composite partitions. This is useful when you want to track cardinality per combination of multiple dimensions (e.g., per tenant AND tenant type, or per region AND environment).

connectors:
  dynamicrouting:
    routing_keys:
      # Composite key: partitions by both tenant and tenant type
      partition_by: ["x-tenant-id", "x-tenant-type"]
      measure_by:
        - "x-forwarded-for"
        - "user-agent"
    routing_pipelines:
      - pipelines: ["traces/low_cardinality"]
        max_cardinality: 10
      - pipelines: ["traces/high_cardinality"]
        max_cardinality: .inf
    default_pipelines: ["traces/default"]
    evaluation_interval: 30s

In this example, the connector will:

  • Create separate cardinality estimates for each unique combination of x-tenant-id and x-tenant-type
  • For example: tenant-a;premium, tenant-a;standard, tenant-b;premium, etc.
  • Each composite key will have its own routing decision based on its cardinality

Composite Key Construction: Values from multiple keys are concatenated with colons (:) separating multiple values of the same key, and semicolons (;) separating different keys. If a key is missing from the metadata, it's skipped in the composite key construction.
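
The construction rule above can be sketched as follows. This is an illustrative helper, not the connector's actual code; the metadata shape mirrors the multi-valued headers a client context typically carries:

```go
package main

import (
	"fmt"
	"strings"
)

// buildCompositeKey joins metadata values for the configured partition_by
// keys: multiple values of one key are joined with ":", and different keys
// are joined with ";". Keys absent from the metadata are skipped.
func buildCompositeKey(partitionBy []string, metadata map[string][]string) string {
	parts := make([]string, 0, len(partitionBy))
	for _, key := range partitionBy {
		if values, ok := metadata[key]; ok {
			parts = append(parts, strings.Join(values, ":"))
		}
	}
	return strings.Join(parts, ";")
}

func main() {
	md := map[string][]string{
		"x-tenant-id":   {"tenant-a"},
		"x-tenant-type": {"premium"},
	}
	fmt.Println(buildCompositeKey([]string{"x-tenant-id", "x-tenant-type"}, md))
	// tenant-a;premium
}
```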

Configuration Fields
Field Type Description Required
routing_keys RoutingKeys Configuration object for routing keys. Contains partition_by and measure_by fields. Yes
routing_keys.partition_by []string Array of metadata keys used to create a composite key for partitioning cardinality estimates. Multiple keys can be specified to create composite partitions (e.g., ["x-tenant-id"] for per-tenant, or ["x-tenant-id", "x-tenant-type"] for per-tenant+type). Composite keys are constructed by concatenating values from all specified keys. Each unique composite key value will have its own cardinality estimate. At least one key must be specified. Yes
routing_keys.measure_by []string Metadata keys used to define unique combinations for cardinality estimation. The connector counts how many unique combinations of these keys exist for each composite value of partition_by. The choice of keys determines what type of cardinality is measured (e.g., unique connections, unique pods, unique deployments). No
routing_pipelines []RoutingPipeline Array of pipeline configurations, each containing pipelines (array of pipeline IDs) and max_cardinality (float64). Pipelines must be defined in ascending order of max_cardinality, and the last pipeline must have max_cardinality set to .inf (positive infinity). The connector routes to the first pipeline where the estimated cardinality is less than or equal to max_cardinality. Yes
default_pipelines []pipeline.ID Pipelines to use when all partition keys are missing from the client context. Yes
evaluation_interval duration How often to re-evaluate routing decisions based on new cardinality estimates. Default: 30s No
Configuration Rules
  • routing_keys.partition_by must contain at least one key
  • routing_pipelines must contain at least one pipeline configuration
  • routing_pipelines must be defined in ascending order of max_cardinality values
  • The last pipeline in routing_pipelines must have max_cardinality set to .inf (positive infinity)
  • Each pipeline configuration must specify at least one pipeline ID in the pipelines array
Routing Logic

The connector routes data based on the estimated cardinality for the composite partition key:

  • If all routing_keys.partition_by keys are missing from the client context → routes to default_pipelines
  • Otherwise, constructs a composite key from the partition_by keys and routes to the first pipeline in routing_pipelines where estimated_cardinality ≤ max_cardinality
    • The connector iterates through routing_pipelines in order and selects the first pipeline where the condition is met
    • Since the last pipeline must have max_cardinality: .inf, all cardinality values will match at least one pipeline
    • Composite keys are created by concatenating values from all partition_by keys (values separated by :, keys separated by ;)

Use Cases

Dynamic Batching Based on Cardinality

One of the most powerful use cases for the Dynamic Routing Connector is implementing dynamic batching strategies based on cardinality. By configuring routing_keys.measure_by to represent unique connections (e.g., source IP and user agent), you can route tenants with different connection volumes to different batching pipelines.

Scenario: You're operating a multi-tenant observability platform where different tenants have vastly different cardinality patterns. Some tenants have a few unique combinations (e.g., few connections, few pods, few services), while others have many.

Problem: Using a single batching configuration for all tenants leads to:

  • Low-cardinality tenants: Small batches that are inefficient and increase overhead
  • High-cardinality tenants: Large batches that may cause memory pressure and latency spikes

Solution: Use the Dynamic Routing Connector to route tenants to different pipelines with optimized batching configurations based on their cardinality:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

connectors:
  dynamicrouting:
    routing_keys:
      partition_by: ["x-tenant-id"]
      measure_by:
        - "x-forwarded-for"
        - "user-agent"
    routing_pipelines:
      # ≤10 unique connections: Small batches, frequent flush
      - pipelines: ["traces/small_batch"]
        max_cardinality: 10
      # ≤50 unique connections: Medium batches
      - pipelines: ["traces/medium_batch"]
        max_cardinality: 50
      # ≤200 unique connections: Large batches
      - pipelines: ["traces/large_batch"]
        max_cardinality: 200
      # >200 unique connections: Very large batches, aggressive batching
      - pipelines: ["traces/xlarge_batch"]
        max_cardinality: .inf
    default_pipelines: ["traces/default"]
    evaluation_interval: 30s

processors:
  batch/small:
    timeout: 1s
    send_batch_size: 100
    send_batch_max_size: 200
  
  batch/medium:
    timeout: 5s
    send_batch_size: 500
    send_batch_max_size: 1000
  
  batch/large:
    timeout: 10s
    send_batch_size: 2000
    send_batch_max_size: 5000
  
  batch/xlarge:
    timeout: 30s
    send_batch_size: 5000
    send_batch_max_size: 10000

exporters:
  otlp/elastic:
    endpoint: https://elastic-cloud-endpoint:443
    headers:
      Authorization: "Bearer ${ELASTIC_API_KEY}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      connectors: [dynamicrouting]
  
    traces/small_batch:
      processors: [batch/small]
      exporters: [otlp/elastic]
  
    traces/medium_batch:
      processors: [batch/medium]
      exporters: [otlp/elastic]
  
    traces/large_batch:
      processors: [batch/large]
      exporters: [otlp/elastic]
  
    traces/xlarge_batch:
      processors: [batch/xlarge]
      exporters: [otlp/elastic]
  
    traces/default:
      processors: [batch/medium]
      exporters: [otlp/elastic]

How It Works:

  1. Cardinality Tracking: For each tenant (identified by x-tenant-id), the connector tracks unique combinations of the configured routing_keys.measure_by keys (x-forwarded-for and user-agent in this example). This measures the cardinality of unique connection combinations per tenant.

  2. Cardinality Estimation: Using HyperLogLog, the connector estimates how many unique combinations each tenant has without storing all identifiers. In this case, it estimates unique connection combinations.

  3. Dynamic Routing: Based on the estimated cardinality:

    • Tenant A (5 unique connection combinations) → traces/small_batch pipeline with 1s timeout and 100-item batches
    • Tenant B (25 unique connection combinations) → traces/medium_batch pipeline with 5s timeout and 500-item batches
    • Tenant C (150 unique connection combinations) → traces/large_batch pipeline with 10s timeout and 2000-item batches
    • Tenant D (500 unique connection combinations) → traces/xlarge_batch pipeline with 30s timeout and 5000-item batches
  4. Adaptive Behavior: Every 30 seconds, the connector re-evaluates routing decisions. If Tenant A's cardinality grows to 15, it automatically switches to the traces/medium_batch pipeline.

Benefits:

  • Optimized Throughput: High-cardinality tenants benefit from larger batches, reducing overhead
  • Lower Latency: Low-cardinality tenants get faster processing with smaller batches
  • Resource Efficiency: Memory and CPU usage are optimized per tenant workload
  • Automatic Adaptation: No manual intervention needed as traffic patterns change

Implementation Details

HyperLogLog Algorithm

The connector uses the HyperLogLog probabilistic data structure to estimate cardinality. This provides:

  • Memory Efficiency: Constant memory usage regardless of the number of unique combinations being tracked
  • Accuracy: Typical error rate of ~1% for cardinality estimation
  • Performance: O(1) insertion and estimation operations
Evaluation Interval

The evaluation_interval determines how frequently routing decisions are updated:

  • Shorter intervals: More responsive to changes but higher CPU usage
  • Longer intervals: More stable routing but slower adaptation to traffic changes
  • Recommended: 30-60 seconds for most use cases
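
The re-evaluation loop can be sketched as a ticker that periodically rebuilds a route table from the latest estimates. All names here are illustrative, not the connector's API; the threshold slice plays the role of routing_pipelines, with the last pipeline acting as the .inf catch-all:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// router caches one routing decision per composite partition key and
// refreshes the cache from the latest estimates at a fixed interval.
type router struct {
	mu     sync.RWMutex
	routes map[string]string // composite key -> selected pipeline
}

// evaluate recomputes the route table from current cardinality estimates.
// thresholds must be ascending; the final pipeline is the catch-all.
func (r *router) evaluate(estimates map[string]float64, thresholds []float64, pipelines []string) {
	routes := make(map[string]string, len(estimates))
	for key, est := range estimates {
		selected := pipelines[len(pipelines)-1]
		for i, max := range thresholds {
			if est <= max {
				selected = pipelines[i]
				break
			}
		}
		routes[key] = selected
	}
	r.mu.Lock()
	r.routes = routes
	r.mu.Unlock()
}

// run re-evaluates on every tick until stop is closed, mirroring the
// evaluation_interval behavior described above.
func (r *router) run(interval time.Duration, estimates func() map[string]float64,
	thresholds []float64, pipelines []string, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			r.evaluate(estimates(), thresholds, pipelines)
		case <-stop:
			return
		}
	}
}

func main() {
	r := &router{routes: map[string]string{}}
	r.evaluate(map[string]float64{"tenant-a": 5, "tenant-b": 250},
		[]float64{10, 100}, []string{"traces/small", "traces/medium", "traces/xlarge"})
	fmt.Println(r.routes["tenant-a"], r.routes["tenant-b"])
	// traces/small traces/xlarge
}
```

Swapping the whole route table under a lock on each tick keeps the hot path (a read-locked map lookup) cheap, which is one reason shorter intervals mainly cost CPU during evaluation rather than per-signal latency.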

Warnings

Statefulness

This connector maintains state (HyperLogLog sketches) in memory. Important considerations:

  • Memory Usage: Memory usage scales with the number of unique partition key values, not the total number of unique combinations being tracked
  • State Loss: State is lost on collector restart. Routing decisions will rebuild over the evaluation interval
  • High Cardinality Partition Keys: If you have many unique composite values for routing_keys.partition_by, memory usage will increase proportionally. Using multiple keys in partition_by will create more partitions (one per unique combination), which increases memory usage.
Metadata Requirements
  • The connector requires client metadata to be set in the context. Ensure your receivers/proxies propagate metadata appropriately
  • Missing all routing_keys.partition_by keys will route to default_pipelines. If some (but not all) keys are missing, the composite key will be constructed from the available keys.
  • If routing_keys.measure_by keys are missing, routing still works, but cardinality is estimated from the partition key alone, which may not produce meaningful measurements
  • The choice of measure_by keys determines what type of cardinality is being measured—choose keys that represent the unique combinations you want to track

Troubleshooting

All Data Routes to Default Pipeline
  • Check: Verify that at least one of the routing_keys.partition_by keys is present in client metadata
  • Solution: Ensure your receiver or proxy is setting the metadata in the context
Routing Not Updating
  • Check: Verify evaluation_interval is not too long
  • Solution: Reduce the interval, or restart the collector to force a fresh evaluation
High Memory Usage
  • Check: Number of unique composite values for routing_keys.partition_by
  • Solution: Consider using a more selective set of partition_by keys to reduce the number of tracked partitions

Contributing

Contributions are welcome! Please see the main repository contributing guidelines for details.

Documentation

Overview

Package dynamicroutingconnector provides a connector for dynamically routing requests to different pipelines depending on the configuration.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewFactory

func NewFactory() connector.Factory

NewFactory returns a connector.Factory.

Types

type Config

type Config struct {
	RoutingKeys        RoutingKeys       `mapstructure:"routing_keys"`
	DefaultPipelines   []pipeline.ID     `mapstructure:"default_pipelines"`
	EvaluationInterval time.Duration     `mapstructure:"evaluation_interval"`
	RoutingPipelines   []RoutingPipeline `mapstructure:"routing_pipelines"`
}

func (*Config) Validate

func (c *Config) Validate() error

type RoutingKeys

type RoutingKeys struct {
	PartitionBy []string `mapstructure:"partition_by"`
	MeasureBy   []string `mapstructure:"measure_by"`
}

type RoutingPipeline

type RoutingPipeline struct {
	Pipelines      []pipeline.ID `mapstructure:"pipelines"`
	MaxCardinality float64       `mapstructure:"max_cardinality"`
}

Directories

Path Synopsis
internal
