spanpruningprocessor

v0.0.0-...-ec5cef7
Warning

This package is not in the latest version of its module.

Published: Feb 27, 2026 License: Apache-2.0 Imports: 25 Imported by: 0

README

Span Pruning Processor

Status
Stability alpha: traces
Distributions contrib
Code Owners @portertech, @csmarchbanks

Overview

The Span Pruning Processor identifies duplicate or similar leaf spans within a single trace, groups them, and replaces each group with a single aggregated summary span. When leaf spans are aggregated, the processor also recursively aggregates their parent spans, up to max_parent_depth levels, if all children of those parents are being aggregated.

Leaf spans are spans that are not referenced as a parent by any other span in the trace. They typically represent the last actions in an execution call stack (e.g., individual database queries, HTTP calls to external services).
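As a minimal sketch of leaf detection, a span is a leaf when no other span in the trace names it as its parent. The `span` struct and `leafSpans` function are hypothetical simplifications with string IDs, not the processor's pdata-based code:

```go
package main

import (
	"fmt"
	"sort"
)

// span is a hypothetical, simplified stand-in for a trace span.
type span struct {
	ID       string
	ParentID string // empty for the root span
	Name     string
}

// leafSpans returns the IDs of spans that no other span references as a parent.
func leafSpans(trace []span) []string {
	referenced := make(map[string]bool, len(trace))
	for _, s := range trace {
		if s.ParentID != "" {
			referenced[s.ParentID] = true
		}
	}
	var leaves []string
	for _, s := range trace {
		if !referenced[s.ID] {
			leaves = append(leaves, s.ID)
		}
	}
	sort.Strings(leaves) // deterministic output only
	return leaves
}

func main() {
	trace := []span{
		{ID: "r", Name: "root-span"},
		{ID: "h", ParentID: "r", Name: "handler"},
		{ID: "q1", ParentID: "h", Name: "SELECT"},
		{ID: "q2", ParentID: "h", Name: "SELECT"},
	}
	fmt.Println(leafSpans(trace)) // [q1 q2]
}
```

Because the check only looks at parent references inside the given span set, a parent whose children are missing from the set would also be reported as a leaf.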

Spans are grouped by:

  1. Span name - spans must have the same name
  2. Span kind - spans must have the same kind (Internal, Server, Client, Producer, Consumer)
  3. Status code - spans must have the same status (OK, Error, or Unset)
  4. TraceState - spans must have identical TraceState values (for Consistent Probability Sampling compatibility)
  5. Configured attributes - spans must have matching values for attributes specified in group_by_attributes
  6. Parent span name - leaf spans must share the same parent span name to be grouped together
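The six criteria above can be combined into a single grouping key. The sketch below is hypothetical: the `leaf` struct uses plain strings, attribute keys are passed in already resolved (glob expansion is left out), an absent attribute is treated as different from an empty one (an assumption), and only groups that reach min_spans_to_aggregate are kept. The `main` function mirrors the Basic Example further down, assuming all of its spans have kind Client:

```go
package main

import (
	"fmt"
	"strings"
)

// leaf is a hypothetical, simplified view of the fields used for grouping.
type leaf struct {
	Name, Kind, Status, TraceState, ParentName string
	Attrs                                      map[string]string
}

// groupKey concatenates the six grouping criteria into one comparable string.
func groupKey(l leaf, attrKeys []string) string {
	parts := []string{l.Name, l.Kind, l.Status, l.TraceState, l.ParentName}
	for _, k := range attrKeys {
		if v, ok := l.Attrs[k]; ok {
			parts = append(parts, k+"="+v)
		} else {
			parts = append(parts, k+"!") // marks the attribute as absent
		}
	}
	return strings.Join(parts, "\x1f") // unit separator between fields
}

// groupLeaves buckets leaf indices by key, preserving first-seen order, and
// keeps only groups with at least minSpans members (min_spans_to_aggregate).
func groupLeaves(leaves []leaf, attrKeys []string, minSpans int) [][]int {
	byKey := map[string][]int{}
	var order []string
	for i, l := range leaves {
		k := groupKey(l, attrKeys)
		if _, seen := byKey[k]; !seen {
			order = append(order, k)
		}
		byKey[k] = append(byKey[k], i)
	}
	var groups [][]int
	for _, k := range order {
		if len(byKey[k]) >= minSpans {
			groups = append(groups, byKey[k])
		}
	}
	return groups
}

func main() {
	mk := func(name, status, op string) leaf {
		return leaf{Name: name, Kind: "Client", Status: status, ParentName: "root-span",
			Attrs: map[string]string{"db.operation": op}}
	}
	leaves := []leaf{
		mk("SELECT", "OK", "select"), mk("SELECT", "OK", "select"), mk("SELECT", "OK", "select"),
		mk("SELECT", "Error", "select"), mk("SELECT", "Error", "select"),
		mk("INSERT", "OK", "insert"),
	}
	fmt.Println(groupLeaves(leaves, []string{"db.operation"}, 2)) // [[0 1 2] [3 4]]
}
```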

Parent spans are eligible for aggregation when all of their children are aggregated, they share the same name, kind, and status code, and they are not root spans.

Optionally, the processor can detect duration outliers using statistical methods (IQR or MAD), annotate summary spans with the attribute values that correlate with those outliers, and preserve outlier spans as individual spans for debugging while still aggregating the normal spans.

This processor is useful for reducing trace data volume while preserving meaningful information about repeated operations.

Use Cases

  • Database query optimization: When an application makes many similar database queries (e.g., N+1 queries), aggregate them into a single summary span
  • Batch operations: Consolidate many similar leaf operations into a single representative span
  • Cost reduction: Reduce trace storage costs by eliminating redundant span data

Configuration

processors:
  spanpruning:
    # OTTL conditions to select which traces to prune
    # When empty, all traces are pruned (default behavior)
    # When set, only traces where at least one span matches any condition are pruned
    # Example: only prune traces from specific services
    # conditions:
    #   - 'resource.attributes["service.name"] == "loki-query-engine"'

    # Attributes to use for grouping similar leaf spans (supports glob patterns)
    # Leaf spans are grouped only when they match on name, kind, status,
    # TraceState, and parent span name AND have the same values for all
    # attributes matching these patterns
    # Examples:
    #   - "db.*" matches db.operation, db.name, db.statement, etc.
    #   - "http.request.*" matches http.request.method, http.request.header, etc.
    #   - "db.operation" matches only the exact key "db.operation"
    group_by_attributes:
      - "db.*"
      - "http.method"

    # Minimum number of similar leaf spans required before aggregation
    # Default: 5
    min_spans_to_aggregate: 3

    # Maximum depth of parent span aggregation above leaf spans
    # 0 = only aggregate leaf spans (no parent aggregation)
    # -1 = unlimited depth
    # Default: 1
    max_parent_depth: 1

    # Prefix for aggregation statistics attributes
    # Default: "aggregation."
    aggregation_attribute_prefix: "batch."

    # Upper bounds for histogram buckets (latency distribution)
    # Default: [5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s]
    # Set to empty list to disable histogram
    aggregation_histogram_buckets: [10ms, 50ms, 100ms, 500ms, 1s]

    # Enable attribute loss analysis during aggregation
    # Default: false (reduces telemetry overhead)
    # When enabled, analyzes attribute differences, records metrics, and adds summary attributes
    enable_attribute_loss_analysis: false

    # Attribute loss exemplar sampling rate
    # Fraction of attribute-loss metric recordings that include trace exemplars.
    # Range: 0.0 (disabled) to 1.0 (always)
    # Default: 0.01 (1%)
    attribute_loss_exemplar_sample_rate: 0.01

    # Enable measurement of serialized trace sizes before and after pruning
    # When enabled, records bytes_received and bytes_emitted metrics
    # This requires serializing the trace data which can be expensive for large batches
    # Default: false
    enable_bytes_metrics: false

    # Enable IQR or MAD outlier detection and attribute correlation
    # When enabled, adds duration_median_ns and outlier_correlated_attributes
    # to summary spans
    # Default: false
    enable_outlier_analysis: false

    # Outlier analysis configuration (optional)
    outlier_analysis:
      # Statistical method for outlier detection
      # "iqr" (default): Interquartile Range method
      # "mad": Median Absolute Deviation method (more robust to extreme outliers)
      method: iqr

      # IQR multiplier for outlier detection threshold (when method=iqr)
      # Outliers are spans with duration > Q3 + (iqr_multiplier * IQR)
      # Common values: 1.5 (standard), 3.0 (extreme only)
      # Default: 1.5
      iqr_multiplier: 1.5

      # MAD multiplier for outlier detection threshold (when method=mad)
      # Outliers are spans with duration > median + (mad_multiplier * MAD * 1.4826)
      # Common values: 2.5-3.0 (standard), 3.5+ (extreme only)
      # Default: 3.0
      mad_multiplier: 3.0

      # Minimum group size for reliable outlier statistics (IQR or MAD)
      # Groups smaller than this skip outlier analysis
      # Must be at least 4 (need quartiles)
      # Default: 7
      min_group_size: 7

      # Minimum fraction of outliers that must share an attribute value
      # for it to be reported as correlated
      # Range: (0.0, 1.0]
      # Default: 0.75 (75% of outliers must share the value)
      correlation_min_occurrence: 0.75

      # Maximum fraction of normal spans that can have the correlated value
      # Lower values mean stronger signal
      # Range: [0.0, 1.0)
      # Default: 0.25 (at most 25% of normal spans can have the value)
      correlation_max_normal_occurrence: 0.25

      # Maximum correlated attributes to report in summary span attribute
      # Default: 5
      max_correlated_attributes: 5

      # Preserve outlier spans as individual spans instead of aggregating
      # When true, only normal spans are aggregated; outliers remain in the trace
      # Default: false
      preserve_outliers: false

      # Maximum number of outlier spans to preserve per aggregation group
      # Spans are selected by most extreme duration first
      # 0 = preserve all detected outliers
      # Default: 2
      max_preserved_outliers: 2

      # Only preserve outliers when a strong attribute correlation is found
      # This avoids preserving outliers that are just random variance
      # Default: false
      preserve_only_with_correlation: false

Configuration Options

Field | Type | Default | Description
conditions | []string | [] | OTTL conditions for selective pruning; empty = prune all traces
group_by_attributes | []string | [] | Attribute patterns for grouping (supports glob patterns like db.*)
min_spans_to_aggregate | int | 5 | Minimum group size before aggregation occurs
max_parent_depth | int | 1 | Max depth of parent aggregation (0 = none, -1 = unlimited)
aggregation_attribute_prefix | string | "aggregation." | Prefix for aggregation statistics attributes
aggregation_histogram_buckets | []time.Duration | [5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s] | Upper bounds for histogram buckets
enable_attribute_loss_analysis | bool | false | Enable attribute loss analysis (adds metrics and span attributes showing attribute differences)
attribute_loss_exemplar_sample_rate | float64 | 0.01 | Fraction of attribute-loss metric recordings that include trace exemplars (0.0–1.0). Only applies when enable_attribute_loss_analysis is true.
enable_bytes_metrics | bool | false | Enable measurement of serialized trace sizes (bytes_received/bytes_emitted metrics)
enable_outlier_analysis | bool | false | Enable outlier detection and correlation analysis
outlier_analysis.method | string | "iqr" | Statistical method: "iqr" or "mad"
outlier_analysis.iqr_multiplier | float64 | 1.5 | IQR threshold multiplier (when method=iqr)
outlier_analysis.mad_multiplier | float64 | 3.0 | MAD threshold multiplier (when method=mad)
outlier_analysis.min_group_size | int | 7 | Minimum group size for outlier analysis
outlier_analysis.correlation_min_occurrence | float64 | 0.75 | Minimum outlier occurrence fraction for correlation
outlier_analysis.correlation_max_normal_occurrence | float64 | 0.25 | Maximum normal occurrence fraction for correlation
outlier_analysis.max_correlated_attributes | int | 5 | Maximum correlated attributes to report
outlier_analysis.preserve_outliers | bool | false | Keep outliers as individual spans instead of aggregating
outlier_analysis.max_preserved_outliers | int | 2 | Max outliers to preserve per group (0 = preserve all)
outlier_analysis.preserve_only_with_correlation | bool | false | Only preserve outliers if a strong correlation is found

Glob Pattern Support

The group_by_attributes field supports glob patterns for matching attribute keys:

Pattern | Matches
db.* | db.operation, db.name, db.statement, etc.
http.request.* | http.request.method, http.request.header.content-type, etc.
rpc.* | rpc.method, rpc.service, rpc.system, etc.
db.operation | Only the exact key db.operation

When multiple attributes match a pattern, they are all included in the grouping key (sorted alphabetically for consistency).
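A sketch of key matching that uses Go's standard `path.Match` as a stand-in glob matcher. This is an assumption: the processor may use a different glob implementation. With `path.Match`, `*` matches any run of characters except `/`, so dotted and hyphenated keys such as http.request.header.content-type still match http.request.*. The `matchingKeys` helper is hypothetical:

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// matchingKeys returns the attribute keys that match any group_by_attributes
// pattern, sorted alphabetically so the grouping key is stable.
func matchingKeys(attrs map[string]string, patterns []string) []string {
	var keys []string
	for k := range attrs {
		for _, p := range patterns {
			if ok, err := path.Match(p, k); err == nil && ok {
				keys = append(keys, k)
				break // one matching pattern is enough
			}
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	attrs := map[string]string{
		"db.statement": "SELECT 1", "db.operation": "select",
		"http.method": "GET", "net.peer.name": "db1",
	}
	fmt.Println(matchingKeys(attrs, []string{"db.*", "http.method"}))
	// [db.operation db.statement http.method]
}
```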

Summary Span

When spans are aggregated, the summary span includes:

Properties
  • Name: Original span name (e.g., SELECT)
  • TraceID: Same as original spans
  • SpanID: Newly generated unique ID
  • ParentSpanID: Same as original spans (common parent)
  • Kind: Same as original spans (kind is part of the grouping key); copied from the template span, i.e. the slowest span in the group
  • StartTimestamp: Earliest start time of all spans in the group
  • EndTimestamp: Latest end time of all spans in the group
  • Status: Same as original spans (spans are grouped by status code)
  • TraceState: Inherited from the template span (preserved for Consistent Probability Sampling compatibility)
  • Attributes: Inherited from the slowest span in the group

Note: The summary span's duration (EndTimestamp - StartTimestamp) is the total time window covered by all aggregated spans, so it can exceed <prefix>duration_max_ns. For example, when spans run one after another or start at staggered times, the window is longer than any individual span's duration. Use <prefix>duration_max_ns to find the slowest individual operation.

What Gets Aggregated Away

When spans are aggregated into a summary span, the following data from non-template spans is lost:

Data | Behavior
Span Events | Only the template (slowest) span's events are preserved
Span Links | Only the template span's links are preserved
Attributes | Non-matching attribute values are lost (see attribute loss analysis)
Individual Timestamps | Original start/end times replaced by the group's time range
SpanIDs | Original SpanIDs are replaced by a single summary SpanID

To understand attribute loss, set enable_attribute_loss_analysis: true, which adds <prefix>diverse_attributes and <prefix>missing_attributes to summary spans.

Aggregation Attributes

The following attributes are added to the summary span (shown with default aggregation_attribute_prefix: "aggregation."):

Attribute | Type | Description
<prefix>is_summary | bool | Always true to identify summary spans
<prefix>span_count | int64 | Number of spans that were aggregated
<prefix>duration_min_ns | int64 | Minimum duration in nanoseconds
<prefix>duration_max_ns | int64 | Maximum duration in nanoseconds
<prefix>duration_avg_ns | int64 | Average duration in nanoseconds
<prefix>duration_total_ns | int64 | Total duration in nanoseconds
<prefix>histogram_bucket_bounds_s | []float64 | Bucket upper bounds in seconds (excludes +Inf)
<prefix>histogram_bucket_counts | []int64 | Cumulative count per bucket (includes +Inf bucket)

Optional Outlier Analysis Attributes

When enable_outlier_analysis: true, the following additional attributes are added:

Attribute | Type | Description
<prefix>duration_median_ns | int64 | Median duration (more robust than average for skewed distributions)
<prefix>outlier_correlated_attributes | string | Attributes that distinguish outliers from normal spans (format: key=value(outlier%/normal%), ...)

Histogram Buckets

The histogram provides a latency distribution of the aggregated spans. The buckets are cumulative, meaning each bucket count includes all spans with duration less than or equal to the bucket boundary.

Example with buckets [10ms, 50ms, 100ms] and 5 spans with durations [5ms, 15ms, 25ms, 75ms, 150ms]:

  • histogram_bucket_bounds_s: [0.01, 0.05, 0.1]
  • histogram_bucket_counts: [1, 3, 4, 5]
    • Bucket 0 (≤10ms): 1 span (5ms)
    • Bucket 1 (≤50ms): 3 spans (5ms, 15ms, 25ms)
    • Bucket 2 (≤100ms): 4 spans (5ms, 15ms, 25ms, 75ms)
    • Bucket 3 (+Inf): 5 spans (all)
Outlier Analysis (Optional)

When enable_outlier_analysis: true, the processor detects duration outliers and identifies attributes that correlate with slow spans.

Detection Methods

The processor supports two statistical methods for outlier detection:

Method | Formula | Characteristics
IQR (default) | threshold = Q3 + (multiplier × IQR) | Standard method; sensitive to moderate outliers; uses quartiles
MAD | threshold = median + (multiplier × MAD × 1.4826) | More robust to extreme outliers; uses median

When to use each:

  • IQR: Best for typical distributions with moderate outliers. Standard choice for most use cases.
  • MAD: Better when you have extreme outliers that would skew IQR calculations, or when you need more stable detection thresholds.
How It Works

IQR (Interquartile Range) Method:

  1. Sort spans by duration
  2. Calculate Q1 (25th percentile) and Q3 (75th percentile)
  3. Calculate IQR = Q3 - Q1
  4. Flag spans with duration > Q3 + (iqr_multiplier × IQR) as outliers
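A sketch of the IQR steps, applied to the ten durations from the Preserving Outlier Spans example below. The quartile definition (linear interpolation between closest ranks) is an assumption, since the README does not specify how the processor computes percentiles; other definitions can shift Q1 and Q3 slightly. Function names are hypothetical:

```go
package main

import (
	"fmt"
	"sort"
)

// quantile linearly interpolates between the closest ranks of sorted data.
func quantile(sorted []float64, p float64) float64 {
	pos := p * float64(len(sorted)-1)
	lo := int(pos)
	if lo+1 >= len(sorted) {
		return sorted[lo]
	}
	frac := pos - float64(lo)
	return sorted[lo] + frac*(sorted[lo+1]-sorted[lo])
}

// iqrOutliers returns Q3 + multiplier*IQR and the durations above it.
func iqrOutliers(durations []float64, multiplier float64) (float64, []float64) {
	sorted := append([]float64(nil), durations...) // step 1: sort a copy
	sort.Float64s(sorted)
	q1, q3 := quantile(sorted, 0.25), quantile(sorted, 0.75) // step 2
	threshold := q3 + multiplier*(q3-q1)                       // steps 3 and 4
	var outliers []float64
	for _, d := range sorted {
		if d > threshold {
			outliers = append(outliers, d)
		}
	}
	return threshold, outliers
}

func main() {
	ms := []float64{5, 6, 7, 8, 9, 10, 11, 12, 500, 600}
	threshold, outliers := iqrOutliers(ms, 1.5)
	fmt.Println(threshold, outliers) // 18.5 [500 600]
}
```

Here Q1 = 7.25 and Q3 = 11.75, so IQR = 4.5 and the threshold is 11.75 + 1.5 × 4.5 = 18.5ms.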

MAD (Median Absolute Deviation) Method:

  1. Sort spans by duration and find the median
  2. Calculate |duration - median| for each span
  3. MAD = median of those deviations
  4. Flag spans with duration > median + (mad_multiplier × MAD × 1.4826) as outliers

Note: The 1.4826 scale factor makes MAD comparable to standard deviation for normal distributions.

Attribute Correlation (same for both methods):

  • Compare attribute values between outliers and normal spans
  • Find attribute values that appear frequently in outliers but rarely in normal spans
  • Report the strongest correlations based on the configured thresholds
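One way to implement the correlation step, under stated assumptions. Attributes are plain string maps. A key=value pair qualifies when its outlier fraction is at least correlation_min_occurrence and its normal fraction is at most correlation_max_normal_occurrence. Strength is ranked by the difference between the two fractions, which is an assumption because the README only says "strongest". The output uses the documented key=value(outlier%/normal%) format, and the sample data is chosen to reproduce the correlation string in the Example Output below:

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// correlation records how often one key=value pair appears among outlier
// spans versus normal spans (hypothetical type).
type correlation struct {
	key, value              string
	outlierFrac, normalFrac float64
}

// correlatedAttributes formats qualifying pairs, strongest first, capped at maxAttrs.
func correlatedAttributes(outliers, normals []map[string]string,
	minOccurrence, maxNormalOccurrence float64, maxAttrs int) string {
	if len(outliers) == 0 {
		return ""
	}
	counts := map[[2]string]int{} // key=value pair -> number of outliers carrying it
	for _, attrs := range outliers {
		for k, v := range attrs {
			counts[[2]string{k, v}]++
		}
	}
	var found []correlation
	for kv, n := range counts {
		oFrac := float64(n) / float64(len(outliers))
		if oFrac < minOccurrence {
			continue
		}
		hits := 0
		for _, attrs := range normals {
			if v, ok := attrs[kv[0]]; ok && v == kv[1] {
				hits++
			}
		}
		nFrac := 0.0
		if len(normals) > 0 {
			nFrac = float64(hits) / float64(len(normals))
		}
		if nFrac > maxNormalOccurrence {
			continue
		}
		found = append(found, correlation{kv[0], kv[1], oFrac, nFrac})
	}
	// Rank by separation (assumed metric); tie-break on key, then value.
	sort.Slice(found, func(i, j int) bool {
		di := found[i].outlierFrac - found[i].normalFrac
		dj := found[j].outlierFrac - found[j].normalFrac
		if di != dj {
			return di > dj
		}
		if found[i].key != found[j].key {
			return found[i].key < found[j].key
		}
		return found[i].value < found[j].value
	})
	if len(found) > maxAttrs {
		found = found[:maxAttrs]
	}
	parts := make([]string, 0, len(found))
	for _, c := range found {
		parts = append(parts, fmt.Sprintf("%s=%s(%d%%/%d%%)", c.key, c.value,
			int(math.Round(c.outlierFrac*100)), int(math.Round(c.normalFrac*100))))
	}
	return strings.Join(parts, ", ")
}

func main() {
	var outliers, normals []map[string]string
	for i := 0; i < 5; i++ { // 4 of 5 outliers on shard 7
		shard := "7"
		if i == 4 {
			shard = "3"
		}
		outliers = append(outliers, map[string]string{"db.cache_hit": "false", "db.shard": shard})
	}
	for i := 0; i < 10; i++ { // 1 of 10 normal spans on shard 7
		shard := "2"
		if i == 0 {
			shard = "7"
		}
		normals = append(normals, map[string]string{"db.cache_hit": "true", "db.shard": shard})
	}
	fmt.Println(correlatedAttributes(outliers, normals, 0.75, 0.25, 5))
	// db.cache_hit=false(100%/0%), db.shard=7(80%/10%)
}
```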
Configuration Example
processors:
  spanpruning:
    enable_outlier_analysis: true
    outlier_analysis:
      method: iqr                # or "mad" for more robustness
      iqr_multiplier: 1.5        # Standard outlier threshold (IQR method)
      mad_multiplier: 3.0        # Standard outlier threshold (MAD method)
      min_group_size: 7          # Skip groups with <7 spans
      correlation_min_occurrence: 0.75   # 75% of outliers must share value
      correlation_max_normal_occurrence: 0.25  # <25% of normal spans can have it
      max_correlated_attributes: 5       # Report top 5 correlations
Example Output
SELECT (summary, span_count: 20)
  aggregation.duration_avg_ns: 45000000
  aggregation.duration_median_ns: 8000000
  aggregation.outlier_correlated_attributes: "db.cache_hit=false(100%/0%), db.shard=7(80%/10%)"

Interpretation:

  • Median vs Avg: Large difference (8ms vs 45ms) indicates outliers are skewing the average
  • Primary correlation: All outliers (100%) had db.cache_hit=false, while 0% of normal spans did
  • Secondary correlation: 80% of outliers had db.shard=7, but only 10% of normal spans did

This helps identify root causes of latency issues:

  • Cache misses
  • Specific database shards
  • Failed retries
  • Timeout scenarios
When to Use
  • Enable when you need to understand why some operations are slow
  • Disable (default) to minimize overhead when outlier analysis isn't needed
  • Works best with groups of 10+ spans for statistical reliability
Performance Impact
  • Computational overhead: Sorts durations, calculates quartiles, counts attribute occurrences
  • When disabled: zero overhead (no sorting or calculations)
  • Recommended: Use min_group_size: 7 or higher to skip analysis on small groups
Preserving Outlier Spans (Optional)

When outlier_analysis.preserve_outliers: true, detected outlier spans are kept as individual spans instead of being aggregated. This provides:

  • Full visibility into slow operations for debugging
  • Preserved context: Original attributes, events, and links remain intact
  • Selective aggregation: Only prune repetitive normal spans
Configuration
processors:
  spanpruning:
    enable_outlier_analysis: true
    outlier_analysis:
      preserve_outliers: true         # Keep outliers as individual spans
      max_preserved_outliers: 2       # Keep top 2 slowest outliers per group
      preserve_only_with_correlation: false  # Preserve even without correlation
Configuration Options
Field | Type | Default | Description
preserve_outliers | bool | false | Keep outliers as individual spans instead of aggregating
max_preserved_outliers | int | 2 | Max outliers to preserve per group (0 = preserve all detected)
preserve_only_with_correlation | bool | false | Only preserve outliers if a strong attribute correlation is found

Example Output

Before (10 similar SELECT spans, 2 are outliers):

handler
├── SELECT - 5ms (normal)
├── SELECT - 6ms (normal)
├── SELECT - 7ms (normal)
├── SELECT - 8ms (normal)
├── SELECT - 9ms (normal)
├── SELECT - 10ms (normal)
├── SELECT - 11ms (normal)
├── SELECT - 12ms (normal)
├── SELECT - 500ms (outlier, cache_hit=false)
└── SELECT - 600ms (outlier, cache_hit=false)

After (with preserve_outliers: true, max_preserved_outliers: 2):

handler
├── SELECT (summary, span_count=8)      ← Normal spans aggregated
│   - aggregation.preserved_outlier_count: 2
│   - aggregation.outlier_correlated_attributes: "cache_hit=false(100%/0%)"
├── SELECT - 500ms                       ← Outlier preserved
│   - aggregation.is_preserved_outlier: true
│   - aggregation.summary_span_id: "abc123"
│   - cache_hit: false
└── SELECT - 600ms                       ← Outlier preserved
    - aggregation.is_preserved_outlier: true
    - aggregation.summary_span_id: "abc123"
    - cache_hit: false
Summary Span Attributes (When Preserving Outliers)
Attribute | Type | Description
<prefix>preserved_outlier_count | int64 | Number of outlier spans preserved
<prefix>preserved_outlier_span_ids | []string | SpanIDs of preserved outliers

Preserved Outlier Span Attributes
Attribute | Type | Description
<prefix>is_preserved_outlier | bool | Identifies span as a preserved outlier
<prefix>summary_span_id | string | SpanID of the associated summary span

Behavior Notes
  • Parent aggregation: Parents can still be aggregated if all their children are either aggregated or preserved as outliers
  • Skip aggregation: If preserving outliers leaves too few normal spans (below min_spans_to_aggregate), the entire group is left unchanged
  • Selection order: Outliers are preserved starting with the most extreme (longest duration) first
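The behavior notes above can be sketched as a selection function. `selectPreserved` is hypothetical and works on bare durations plus a parallel outlier flag slice. It assumes that outliers beyond max_preserved_outliers are aggregated together with the normal spans, which the README does not state explicitly:

```go
package main

import (
	"fmt"
	"sort"
)

// selectPreserved splits a group into preserved outliers and spans to aggregate.
// ok is false when the whole group should be left unchanged.
func selectPreserved(durations []float64, isOutlier []bool, maxPreserved, minSpans int,
	onlyWithCorrelation, hasCorrelation bool) (preserved, aggregated []float64, ok bool) {
	if onlyWithCorrelation && !hasCorrelation {
		// No strong correlation: nothing is preserved; aggregate as usual.
		return nil, durations, len(durations) >= minSpans
	}
	var outliers []float64
	for i, d := range durations {
		if isOutlier[i] {
			outliers = append(outliers, d)
		} else {
			aggregated = append(aggregated, d)
		}
	}
	// Most extreme (longest) outliers are preserved first.
	sort.Sort(sort.Reverse(sort.Float64Slice(outliers)))
	n := len(outliers)
	if maxPreserved > 0 && maxPreserved < n { // 0 means preserve all
		n = maxPreserved
	}
	preserved = outliers[:n]
	aggregated = append(aggregated, outliers[n:]...) // assumption: extras are aggregated
	if len(aggregated) < minSpans {
		return nil, nil, false // too few spans left to aggregate
	}
	return preserved, aggregated, true
}

func main() {
	d := []float64{5, 6, 7, 8, 9, 10, 11, 12, 500, 600}
	flags := make([]bool, len(d))
	flags[8], flags[9] = true, true
	p, a, ok := selectPreserved(d, flags, 2, 5, false, true)
	fmt.Println(p, len(a), ok) // [600 500] 8 true
	p, a, ok = selectPreserved(d, flags, 1, 5, false, true)
	fmt.Println(p, len(a), ok) // [600] 9 true
}
```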

Pipeline Placement

Leaf detection depends on seeing every span of a trace (see Limitations), so this processor works best when placed after a processor that assembles complete traces:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [groupbytrace, spanpruning, batch]
      exporters: [otlp]

Or with tail sampling:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, spanpruning, batch]
      exporters: [otlp]

Example

Basic Example

A trace with repeated database queries (some failing):

Before Processing:

root-span (parent)
├── SELECT (leaf) - duration: 10ms, db.operation: select, status: OK
├── SELECT (leaf) - duration: 15ms, db.operation: select, status: OK
├── SELECT (leaf) - duration: 12ms, db.operation: select, status: OK
├── SELECT (leaf) - duration: 50ms, db.operation: select, status: Error
├── SELECT (leaf) - duration: 45ms, db.operation: select, status: Error
└── INSERT (leaf) - duration: 20ms, db.operation: insert, status: OK

After Processing (with min_spans_to_aggregate: 2):

root-span (parent)
├── SELECT (summary, status: OK)
│   - aggregation.is_summary: true
│   - aggregation.span_count: 3
│   - aggregation.duration_min_ns: 10000000
│   - aggregation.duration_max_ns: 15000000
│   - aggregation.duration_avg_ns: 12333333
├── SELECT (summary, status: Error)
│   - aggregation.is_summary: true
│   - aggregation.span_count: 2
│   - aggregation.duration_min_ns: 45000000
│   - aggregation.duration_max_ns: 50000000
│   - aggregation.duration_avg_ns: 47500000
└── INSERT (unchanged - only 1 span, below threshold)

Note: Spans with different status codes are grouped separately, preserving error information.

Recursive Parent Aggregation Example

When spans are aggregated, the processor also checks if their parent spans can be aggregated. Parent spans are eligible for aggregation when:

  1. All of their children are being aggregated
  2. They share the same name, kind, and status code with other eligible parents
  3. They are not root spans (must have a parent)
  4. At least 2 parents meet the criteria
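A sketch of these four criteria for a single level above the leaves (equivalent to max_parent_depth: 1). The `node` type is hypothetical, `Aggregated` stands for "folded into a summary or preserved as an outlier", and the span kinds in `main` are assumed. It reproduces the parent groups of the example that follows:

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical span view used only for this sketch.
type node struct {
	ID, ParentID, Name, Kind, Status string
	Aggregated                       bool
}

// eligibleParents returns parent IDs grouped by name|kind|status, keeping only
// groups with at least two members.
func eligibleParents(spans []node) map[string][]string {
	children := map[string][]node{}
	for _, s := range spans {
		if s.ParentID != "" {
			children[s.ParentID] = append(children[s.ParentID], s)
		}
	}
	candidates := map[string][]string{}
	for _, s := range spans {
		kids := children[s.ID]
		if len(kids) == 0 || s.ParentID == "" { // skip leaves and root spans
			continue
		}
		allAggregated := true
		for _, k := range kids {
			if !k.Aggregated {
				allAggregated = false
				break
			}
		}
		if allAggregated {
			key := s.Name + "|" + s.Kind + "|" + s.Status
			candidates[key] = append(candidates[key], s.ID)
		}
	}
	for key, ids := range candidates {
		if len(ids) < 2 {
			delete(candidates, key) // deleting during range is safe in Go
		} else {
			sort.Strings(ids)
		}
	}
	return candidates
}

func main() {
	spans := []node{{ID: "root", Name: "root"}}
	add := func(id, name, status, child string, childAgg bool) {
		spans = append(spans,
			node{ID: id, ParentID: "root", Name: name, Kind: "Server", Status: status},
			node{ID: id + "-c", ParentID: id, Name: child, Kind: "Client", Status: status, Aggregated: childAgg})
	}
	add("h1", "handler", "OK", "SELECT", true)
	add("h2", "handler", "OK", "SELECT", true)
	add("h3", "handler", "OK", "SELECT", true)
	add("h4", "handler", "Error", "SELECT", true)
	add("h5", "handler", "Error", "SELECT", true)
	add("h6", "handler", "OK", "INSERT", false)
	add("w1", "worker", "OK", "SELECT", false)
	g := eligibleParents(spans)
	fmt.Println(g["handler|Server|OK"], g["handler|Server|Error"], len(g))
	// [h1 h2 h3] [h4 h5] 2
}
```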

Before Processing (with min_spans_to_aggregate: 2, group_by_attributes: ["db.op"]):

root
├── handler (status: OK)
│   └── SELECT (db.op=select, status: OK) ───┐
├── handler (status: OK)                      │ leaf group A: 3 OK SELECTs
│   └── SELECT (db.op=select, status: OK) ───┤
├── handler (status: OK)                      │
│   └── SELECT (db.op=select, status: OK) ───┘
├── handler (status: Error)
│   └── SELECT (db.op=select, status: Error) ┐ leaf group B: 2 Error SELECTs
├── handler (status: Error)                   │
│   └── SELECT (db.op=select, status: Error) ┘
├── handler (status: OK)
│   └── INSERT (db.op=insert, status: OK) ──── only 1, below threshold
└── worker (status: OK)
    └── SELECT (db.op=select, status: OK) ──── different parent name

After Processing:

root
├── handler (summary, status: OK, span_count: 3)
│   └── SELECT (summary, status: OK, span_count: 3)
├── handler (summary, status: Error, span_count: 2)
│   └── SELECT (summary, status: Error, span_count: 2)
├── handler (status: OK)
│   └── INSERT (status: OK) ─────────────────────────── unchanged
└── worker (status: OK)
    └── SELECT (status: OK) ─────────────────────────── unchanged

Why each span was handled this way:

Span | Result | Reason
3x handler (OK) with SELECT children | Aggregated | All children aggregated, same name+kind+status
3x SELECT (OK) under handler | Aggregated | Same name + kind + status + attributes + parent name
2x handler (Error) with SELECT children | Aggregated | All children aggregated, same name+kind+status
2x SELECT (Error) under handler | Aggregated | Same name + kind + status + attributes + parent name
handler (OK) with INSERT child | Unchanged | Child not aggregated (only 1 INSERT)
INSERT (OK) | Unchanged | Below threshold (only 1 span)
worker (OK) | Unchanged | Child not aggregated
SELECT (OK) under worker | Unchanged | Different parent name than other SELECTs

OTTL Condition Filtering

The conditions field allows selective trace pruning using OTTL (OpenTelemetry Transformation Language) expressions. Only traces where at least one span matches any condition will be pruned.

Behavior
Conditions | Result
Empty/not configured | All traces are pruned (default behavior)
Configured | Only matching traces are pruned; others pass through unchanged

Syntax

Conditions use OTTL span context syntax. Each condition is a boolean expression evaluated against each span in a trace. If any span matches any condition, the entire trace is eligible for pruning.
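The any-span, any-condition rule can be sketched with plain Go predicates standing in for compiled OTTL conditions. The `spanView` and `condition` types are hypothetical; the processor itself parses OTTL strings:

```go
package main

import "fmt"

// spanView is a hypothetical, minimal view of a span for condition checks.
type spanView struct {
	ServiceName string
	Attrs       map[string]string
}

// condition stands in for one compiled OTTL boolean expression.
type condition func(spanView) bool

// shouldPrune is true when no conditions are configured, otherwise when any
// span in the trace matches any condition (OR logic).
func shouldPrune(trace []spanView, conds []condition) bool {
	if len(conds) == 0 {
		return true
	}
	for _, s := range trace {
		for _, c := range conds {
			if c(s) {
				return true
			}
		}
	}
	return false
}

func main() {
	isLoki := func(s spanView) bool { return s.ServiceName == "loki-query-engine" }
	isPostgres := func(s spanView) bool { return s.Attrs["db.system"] == "postgresql" }
	trace := []spanView{
		{ServiceName: "frontend"},
		{ServiceName: "api", Attrs: map[string]string{"db.system": "postgresql"}},
	}
	fmt.Println(shouldPrune(trace, nil))                             // true
	fmt.Println(shouldPrune(trace, []condition{isLoki}))             // false
	fmt.Println(shouldPrune(trace, []condition{isLoki, isPostgres})) // true
}
```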

Common Examples

Filter by service name:

conditions:
  - 'resource.attributes["service.name"] == "loki-query-engine"'

Filter by span attributes:

conditions:
  - 'attributes["db.system"] == "postgresql"'

Filter by HTTP route:

conditions:
  - 'attributes["http.route"] == "/api/v1/query"'

Multiple conditions (OR logic):

conditions:
  - 'resource.attributes["service.name"] == "loki-query-engine"'
  - 'attributes["db.system"] == "postgresql"'

A trace is pruned if any span matches any condition.

Filter by span name:

conditions:
  - 'name == "SELECT"'

Filter by status:

conditions:
  - 'status.code == 2'  # Error status
Use Cases
  • Targeted pruning: Only prune traces from specific services known to generate repetitive spans
  • Environment filtering: Prune only production traces while preserving development traces
  • Operation-specific: Prune only database-heavy traces while keeping HTTP traces intact
  • Debugging: Temporarily disable pruning for specific services to investigate issues

Limitations

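  • Requires complete traces for accurate leaf detection: if a trace's spans arrive in separate batches, a parent whose children have not arrived yet looks like a leaf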
  • Summary span inherits attributes from the slowest span in the group
  • Parent spans are only aggregated when ALL their children are aggregated

Consistent Probability Sampling (CPS) Compatibility

The processor is designed to be compatible with Consistent Probability Sampling (CPS). CPS uses TraceState to carry sampling metadata (ot=th:...;rv:...) where:

  • th (threshold) indicates the sampling probability threshold
  • rv (randomness value) provides consistent randomness for sampling decisions

Why TraceState matters for aggregation:

Spans with different TraceState values represent different sampling populations with different "adjusted counts" (weights). Aggregating them together would produce statistically incorrect summaries and break downstream sampling decisions.

The processor uses exact TraceState matching (not just the th value) because:

  • The rv value affects sampling decisions
  • Vendor-specific keys may have semantic meaning
  • Key ordering may be significant

Telemetry

The processor emits the following metrics to help monitor its operation:

Counters
Metric | Description
otelcol_processor_spanpruning_spans_received | Total number of spans received by the processor
otelcol_processor_spanpruning_spans_pruned | Total number of spans removed by aggregation
otelcol_processor_spanpruning_aggregations_created | Total number of aggregation summary spans created
otelcol_processor_spanpruning_traces_processed | Total number of traces processed
otelcol_processor_spanpruning_outliers_detected | Total spans identified as outliers by analysis (when enable_outlier_analysis: true)
otelcol_processor_spanpruning_outliers_preserved | Total outlier spans kept as individual spans (when preserve_outliers: true)
otelcol_processor_spanpruning_outliers_correlations_detected | Total aggregation groups where outliers had correlated attributes
otelcol_processor_spanpruning_bytes_received | Total bytes of serialized traces received (when enable_bytes_metrics: true)
otelcol_processor_spanpruning_bytes_emitted | Total bytes of serialized traces emitted after pruning (when enable_bytes_metrics: true)

Histograms
Metric | Description
otelcol_processor_spanpruning_aggregation_group_size | Distribution of the number of spans per aggregation group
otelcol_processor_spanpruning_processing_duration | Time taken to process each batch of traces (in seconds)

Optional Attribute Loss Metrics

When enable_attribute_loss_analysis: true, the processor also emits metrics about attribute loss during aggregation. These metrics help you understand how much information is being lost when spans are grouped together.

To correlate these metrics back to traces, a configurable fraction of these metric recordings can include trace exemplars via attribute_loss_exemplar_sample_rate. Sampling is applied per aggregation group, and the exemplar context is taken from the slowest span in the group.

Histograms (Optional)
Metric | Description
otelcol_processor_spanpruning_leaf_attribute_diversity_loss | Attribute values lost due to diversity per leaf aggregation (when leaf spans have different attribute values)
otelcol_processor_spanpruning_leaf_attribute_loss | Attribute keys lost due to absence per leaf aggregation (when some spans don't have an attribute that others do)
otelcol_processor_spanpruning_parent_attribute_diversity_loss | Attribute values lost due to diversity per parent aggregation
otelcol_processor_spanpruning_parent_attribute_loss | Attribute keys lost due to absence per parent aggregation

Attribute loss analysis is disabled by default (enable_attribute_loss_analysis: false) to reduce overhead. When enabled, the processor:

  • Analyzes attribute differences across spans being aggregated
  • Records histogram metrics for loss tracking
  • Adds <prefix>diverse_attributes and <prefix>missing_attributes summary attributes to aggregated spans

The processor's metrics can be used to:

  • Monitor the effectiveness of span pruning (compare spans_received vs spans_pruned)
  • Track the compression ratio achieved by aggregation
  • Identify processing bottlenecks via processing_duration
  • Understand aggregation patterns via aggregation_group_size

Documentation

Overview

Package spanpruningprocessor detects duplicate or similar leaf spans within a single trace and replaces each set with a single aggregated summary span. Leaf spans are spans that are never referenced as a parent by another span. When all children of a parent are aggregated, the parent can also be aggregated, preserving the trace structure while reducing volume.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewFactory

func NewFactory() processor.Factory

NewFactory returns a new factory for the Span Pruning processor.

Types

type Config

type Config struct {
	// GroupByAttributes lists attribute patterns used to decide which leaf spans
	// belong in the same aggregation group. Spans must share the span name and
	// have identical values for every matched attribute to be grouped. Patterns
	// accept glob syntax, for example:
	//   - "db.*" matches db.operation, db.name, db.statement, etc.
	//   - "http.request.*" matches http.request.method, http.request.header, etc.
	//   - "service" matches only the exact key "service"
	// Examples: ["db.*", "http.method"], ["rpc.*"].
	GroupByAttributes []string `mapstructure:"group_by_attributes"`

	// MinSpansToAggregate is the minimum number of similar spans required before
	// aggregation occurs. Groups smaller than this threshold are preserved.
	// Default: 5
	MinSpansToAggregate int `mapstructure:"min_spans_to_aggregate"`

	// MaxParentDepth bounds how many ancestor levels above the aggregated leaves
	// can also be aggregated. Use 0 to aggregate only leaves, -1 for unlimited
	// depth, or a positive integer to cap traversal.
	// Default: 1
	MaxParentDepth int `mapstructure:"max_parent_depth"`

	// AggregationAttributePrefix prefixes all aggregation-related attributes that
	// are added to summary spans.
	// Default: "aggregation."
	AggregationAttributePrefix string `mapstructure:"aggregation_attribute_prefix"`

	// AggregationHistogramBuckets lists cumulative histogram bucket upper bounds
	// for latency tracking on aggregated spans. Empty slice disables histograms.
	// Example: [5*time.Millisecond, 10*time.Millisecond, 100*time.Millisecond]
	// Default: [5*time.Millisecond, 10*time.Millisecond, 25*time.Millisecond, 50*time.Millisecond, 100*time.Millisecond, 250*time.Millisecond, 500*time.Millisecond, time.Second, 2500*time.Millisecond, 5*time.Second, 10*time.Second]
	AggregationHistogramBuckets []time.Duration `mapstructure:"aggregation_histogram_buckets"`

	// EnableAttributeLossAnalysis toggles analysis of attribute loss during
	// aggregation. When enabled, the processor compares attribute sets across
	// aggregated spans, records loss metrics, and annotates summary spans.
	// Default: false (to reduce telemetry overhead)
	EnableAttributeLossAnalysis bool `mapstructure:"enable_attribute_loss_analysis"`

	// AttributeLossExemplarSampleRate controls the fraction of attribute-loss
	// metric recordings that include exemplars when loss analysis is enabled.
	// Range: 0.0 (disabled) to 1.0 (always). Default: 0.01 (1%).
	AttributeLossExemplarSampleRate float64 `mapstructure:"attribute_loss_exemplar_sample_rate"`

	// EnableOutlierAnalysis toggles duration outlier detection (IQR or MAD,
	// selected by OutlierAnalysis.Method) and attribute correlation. When
	// enabled, adds duration_median_ns and outlier_correlated_attributes to
	// summary spans.
	// Default: false
	EnableOutlierAnalysis bool `mapstructure:"enable_outlier_analysis"`

	// EnableBytesMetrics toggles measurement of serialized trace sizes before
	// and after pruning. When enabled, records bytes_received and bytes_emitted
	// metrics. This requires serializing the trace data, which can be expensive
	// for large batches.
	// Default: false
	EnableBytesMetrics bool `mapstructure:"enable_bytes_metrics"`

	// OutlierAnalysis configures outlier detection (IQR or MAD) and
	// attribute correlation for aggregation groups.
	OutlierAnalysis OutlierAnalysisConfig `mapstructure:"outlier_analysis"`

	// Conditions is a list of OTTL conditions that determine which traces
	// should be pruned. Conditions use OTTL span context syntax. When empty,
	// all traces are pruned (the default). When set, only traces in which
	// at least one span matches any condition are pruned.
	// Example: `resource.attributes["service.name"] == "loki-query-engine"`
	Conditions []string `mapstructure:"conditions"`
}

Config defines the configuration options for the span pruning processor and the rules used to identify and aggregate similar spans.
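The GroupByAttributes patterns can be illustrated with a minimal sketch that assumes glob semantics equivalent to Go's `path.Match` (the processor's actual matcher may differ). The helpers `matchedAttributes` and `groupKey` are hypothetical, and the key covers only the span name plus matched attribute values; the real grouping also includes kind, status code, TraceState, and parent span name.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// matchedAttributes returns, sorted, the attribute keys that match at least
// one group_by_attributes pattern. path.Match stands in for the processor's
// glob matching (an assumption); its '*' matches any run of non-'/' chars.
func matchedAttributes(patterns []string, attrs map[string]string) []string {
	var keys []string
	for key := range attrs {
		for _, p := range patterns {
			if ok, err := path.Match(p, key); err == nil && ok {
				keys = append(keys, key)
				break
			}
		}
	}
	sort.Strings(keys)
	return keys
}

// groupKey is a hypothetical grouping key: span name plus the values of all
// matched attributes. Spans group together only if their keys are equal.
func groupKey(name string, patterns []string, attrs map[string]string) string {
	key := name
	for _, k := range matchedAttributes(patterns, attrs) {
		key += "|" + k + "=" + attrs[k]
	}
	return key
}

func main() {
	patterns := []string{"db.*", "http.method"}
	a := map[string]string{"db.operation": "SELECT", "db.name": "users", "net.peer.port": "5432"}
	b := map[string]string{"db.operation": "SELECT", "db.name": "orders", "net.peer.port": "5432"}
	fmt.Println(groupKey("query", patterns, a)) // query|db.name=users|db.operation=SELECT
	fmt.Println(groupKey("query", patterns, a) == groupKey("query", patterns, b)) // false: db.name differs
}
```

Unmatched keys such as `net.peer.port` do not affect grouping, so spans that differ only in those attributes can still be aggregated.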

func (*Config) Validate

func (cfg *Config) Validate() error

Validate checks whether the processor configuration is valid.

type OutlierAnalysisConfig

type OutlierAnalysisConfig struct {
	// Method selects the statistical method for outlier detection.
	// Valid values: "iqr" (default), "mad"
	Method OutlierMethod `mapstructure:"method"`

	// IQRMultiplier sets the threshold for IQR-based outlier detection.
	// Outliers are spans with duration > Q3 + (IQRMultiplier * IQR).
	// Common values: 1.5 (standard), 3.0 (extreme only).
	// Default: 1.5
	IQRMultiplier float64 `mapstructure:"iqr_multiplier"`

	// MADMultiplier sets the threshold for MAD-based outlier detection.
	// Outliers are spans with duration > median + (MADMultiplier * MAD * 1.4826).
	// Common values: 2.5-3.0 (standard), 3.5+ (extreme only).
	// Default: 3.0
	MADMultiplier float64 `mapstructure:"mad_multiplier"`

	// MinGroupSize is the minimum number of spans needed for reliable
	// outlier detection. Groups smaller than this skip outlier analysis.
	// Must be at least 4 so that quartiles can be computed.
	// Default: 7
	MinGroupSize int `mapstructure:"min_group_size"`

	// CorrelationMinOccurrence is the minimum fraction of outliers that must
	// share an attribute value for it to be reported as correlated.
	// Range: (0.0, 1.0]
	// Default: 0.75 (75% of outliers must share the value)
	CorrelationMinOccurrence float64 `mapstructure:"correlation_min_occurrence"`

	// CorrelationMaxNormalOccurrence is the maximum fraction of normal spans
	// that can have the correlated value. Lower values mean stronger signal.
	// Range: [0.0, 1.0)
	// Default: 0.25 (at most 25% of normal spans can have the value)
	CorrelationMaxNormalOccurrence float64 `mapstructure:"correlation_max_normal_occurrence"`

	// MaxCorrelatedAttributes limits how many correlated attributes are
	// reported in the summary span attribute.
	// Default: 5
	MaxCorrelatedAttributes int `mapstructure:"max_correlated_attributes"`

	// PreserveOutliers controls whether outlier spans are kept as individual
	// spans instead of being aggregated. When true, only normal spans are
	// aggregated; outliers remain in the trace.
	// Default: false (aggregate all, add summary attributes)
	PreserveOutliers bool `mapstructure:"preserve_outliers"`

	// MaxPreservedOutliers limits how many outlier spans are preserved per
	// aggregation group. Spans are selected by most extreme duration first.
	// 0 = preserve all detected outliers.
	// Default: 2
	MaxPreservedOutliers int `mapstructure:"max_preserved_outliers"`

	// PreserveOnlyWithCorrelation only preserves outliers when a strong
	// attribute correlation is found. This avoids preserving outliers that
	// are just random variance.
	// Default: false
	PreserveOnlyWithCorrelation bool `mapstructure:"preserve_only_with_correlation"`

	// MinOutlierThresholdPercent sets a minimum percentage above median that
	// a span must exceed to be considered an outlier, regardless of statistical
	// method. This prevents overly sensitive outlier detection when IQR or MAD
	// is zero (tightly clustered data) or produces very small thresholds.
	// Range: 0.0 or greater (values above 1.0, i.e. more than 100% above
	// the median, are allowed)
	// Default: 0.1 (10% above median)
	MinOutlierThresholdPercent float64 `mapstructure:"min_outlier_threshold_percent"`
}

OutlierAnalysisConfig controls outlier detection and attribute correlation.
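Attribute correlation, as described by CorrelationMinOccurrence, CorrelationMaxNormalOccurrence, and MaxCorrelatedAttributes, can be sketched as follows: a key=value pair is reported when it is common among outliers but rare among normal spans. The `correlate` helper and `correlation` type are hypothetical, and the ordering of results (by outlier share, then key and value) is an assumption rather than the processor's documented behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// correlation is a hypothetical result type for this sketch.
type correlation struct {
	Key, Value  string
	OutlierFrac float64 // share of outlier spans carrying Key=Value
	NormalFrac  float64 // share of normal spans carrying Key=Value
}

// fractions returns, for each key=value pair, the share of spans that carry it.
func fractions(spans []map[string]string) map[[2]string]float64 {
	shares := map[[2]string]float64{}
	for _, attrs := range spans {
		for k, v := range attrs {
			shares[[2]string{k, v}]++
		}
	}
	for kv := range shares {
		shares[kv] /= float64(len(spans))
	}
	return shares
}

// correlate keeps pairs seen in >= minOcc of outliers and <= maxNormal of
// normal spans, ordered by outlier share and capped at maxAttrs
// (maxAttrs <= 0 means no cap in this sketch).
func correlate(outliers, normals []map[string]string, minOcc, maxNormal float64, maxAttrs int) []correlation {
	outShare, normShare := fractions(outliers), fractions(normals)
	var out []correlation
	for kv, o := range outShare {
		if n := normShare[kv]; o >= minOcc && n <= maxNormal {
			out = append(out, correlation{Key: kv[0], Value: kv[1], OutlierFrac: o, NormalFrac: n})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].OutlierFrac != out[j].OutlierFrac {
			return out[i].OutlierFrac > out[j].OutlierFrac
		}
		if out[i].Key != out[j].Key {
			return out[i].Key < out[j].Key
		}
		return out[i].Value < out[j].Value
	})
	if maxAttrs > 0 && len(out) > maxAttrs {
		out = out[:maxAttrs]
	}
	return out
}

func main() {
	outliers := []map[string]string{
		{"db.name": "archive", "region": "eu"},
		{"db.name": "archive", "region": "eu"},
		{"db.name": "archive", "region": "eu"},
		{"db.name": "users", "region": "eu"},
	}
	normals := []map[string]string{
		{"db.name": "users", "region": "eu"},
		{"db.name": "users", "region": "eu"},
		{"db.name": "users", "region": "eu"},
		{"db.name": "users", "region": "eu"},
	}
	// Defaults: 0.75 min occurrence, 0.25 max normal occurrence, 5 attributes.
	for _, c := range correlate(outliers, normals, 0.75, 0.25, 5) {
		fmt.Printf("%s=%s outliers=%.2f normals=%.2f\n", c.Key, c.Value, c.OutlierFrac, c.NormalFrac)
	}
	// Prints: db.name=archive outliers=0.75 normals=0.00
}
```

`region=eu` appears on every outlier but also on every normal span, so it carries no signal and is filtered out by the normal-occurrence limit.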

func (*OutlierAnalysisConfig) Validate

func (cfg *OutlierAnalysisConfig) Validate(enabled bool) error

Validate checks OutlierAnalysisConfig for invalid values.
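The ranges documented on OutlierAnalysisConfig map naturally onto a validation sketch. The `outlierConfig` type below is a hypothetical subset, and two behaviors are assumptions: skipping all checks when `enabled` is false, and treating an empty method as the default `"iqr"`. The remaining checks mirror the field comments.

```go
package main

import (
	"errors"
	"fmt"
)

// outlierConfig is a hypothetical subset of OutlierAnalysisConfig.
type outlierConfig struct {
	Method                         string
	MinGroupSize                   int
	CorrelationMinOccurrence       float64
	CorrelationMaxNormalOccurrence float64
	MinOutlierThresholdPercent     float64
}

func (c outlierConfig) validate(enabled bool) error {
	if !enabled {
		return nil // assumption: settings are not checked when analysis is off
	}
	var errs []error
	switch c.Method {
	case "", "iqr", "mad": // "" treated as the default "iqr" (assumption)
	default:
		errs = append(errs, fmt.Errorf("method must be \"iqr\" or \"mad\", got %q", c.Method))
	}
	if c.MinGroupSize < 4 {
		errs = append(errs, fmt.Errorf("min_group_size must be at least 4, got %d", c.MinGroupSize))
	}
	// Documented range: (0.0, 1.0]
	if c.CorrelationMinOccurrence <= 0 || c.CorrelationMinOccurrence > 1 {
		errs = append(errs, fmt.Errorf("correlation_min_occurrence must be in (0.0, 1.0], got %v", c.CorrelationMinOccurrence))
	}
	// Documented range: [0.0, 1.0)
	if c.CorrelationMaxNormalOccurrence < 0 || c.CorrelationMaxNormalOccurrence >= 1 {
		errs = append(errs, fmt.Errorf("correlation_max_normal_occurrence must be in [0.0, 1.0), got %v", c.CorrelationMaxNormalOccurrence))
	}
	if c.MinOutlierThresholdPercent < 0 {
		errs = append(errs, fmt.Errorf("min_outlier_threshold_percent must be >= 0, got %v", c.MinOutlierThresholdPercent))
	}
	return errors.Join(errs...)
}

func main() {
	defaults := outlierConfig{Method: "iqr", MinGroupSize: 7, CorrelationMinOccurrence: 0.75,
		CorrelationMaxNormalOccurrence: 0.25, MinOutlierThresholdPercent: 0.1}
	fmt.Println(defaults.validate(true)) // <nil>

	bad := defaults
	bad.Method, bad.MinGroupSize = "zscore", 3
	fmt.Println(bad.validate(true)) // reports both the method and the group size problem
}
```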

type OutlierMethod

type OutlierMethod string

OutlierMethod defines the statistical method for outlier detection.

const (
	// OutlierMethodIQR uses Interquartile Range for outlier detection.
	// Threshold: Q3 + (IQRMultiplier * IQR)
	OutlierMethodIQR OutlierMethod = "iqr"

	// OutlierMethodMAD uses Median Absolute Deviation for outlier detection.
	// Threshold: median + (MADMultiplier * MAD * 1.4826)
	// MAD is more robust to extreme outliers than IQR.
	OutlierMethodMAD OutlierMethod = "mad"
)
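The two thresholds, the MinOutlierThresholdPercent floor, and MaxPreservedOutliers selection can be combined in one sketch. It assumes quartiles and medians are computed by linear interpolation between sorted values (one of several common definitions) and that the floor is applied as `max(threshold, median * (1 + minPct))`; the processor's exact computation may differ. `threshold` and `selectOutliers` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile interpolates linearly between the closest ranks of sorted data.
func quantile(sorted []float64, q float64) float64 {
	pos := q * float64(len(sorted)-1)
	lo, hi := int(math.Floor(pos)), int(math.Ceil(pos))
	return sorted[lo] + (pos-float64(lo))*(sorted[hi]-sorted[lo])
}

// threshold returns the duration above which a span counts as an outlier.
func threshold(durations []float64, method string, k, minPct float64) float64 {
	if len(durations) == 0 {
		return math.Inf(1) // nothing to flag
	}
	s := append([]float64(nil), durations...)
	sort.Float64s(s)
	median := quantile(s, 0.5)
	var t float64
	switch method {
	case "mad":
		// median + k * MAD * 1.4826
		dev := make([]float64, len(s))
		for i, d := range s {
			dev[i] = math.Abs(d - median)
		}
		sort.Float64s(dev)
		t = median + k*quantile(dev, 0.5)*1.4826
	default: // "iqr": Q3 + k * IQR
		q1, q3 := quantile(s, 0.25), quantile(s, 0.75)
		t = q3 + k*(q3-q1)
	}
	// Floor: a span must exceed the median by at least minPct.
	return math.Max(t, median*(1+minPct))
}

// selectOutliers returns durations above t, most extreme first,
// keeping at most maxPreserved (0 = keep all).
func selectOutliers(durations []float64, t float64, maxPreserved int) []float64 {
	var out []float64
	for _, d := range durations {
		if d > t {
			out = append(out, d)
		}
	}
	sort.Sort(sort.Reverse(sort.Float64Slice(out)))
	if maxPreserved > 0 && len(out) > maxPreserved {
		out = out[:maxPreserved]
	}
	return out
}

func main() {
	d := []float64{10, 11, 12, 12, 13, 14, 15, 95, 120} // durations in ms
	t := threshold(d, "iqr", 1.5, 0.1)
	fmt.Println(t, selectOutliers(d, t, 2)) // 19.5 [120 95]
}
```

With tightly clustered data such as six 100 ms spans and one 105 ms span, the IQR is zero and the raw threshold equals the median, so 105 ms would be flagged; the 10% floor raises the threshold to about 110 ms and suppresses that false positive.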

Directories

Path Synopsis
internal
