Metrics & Tracing

Definitive reference for Isartor's OpenTelemetry traces, metrics, structured logging, and observability stack — from local development to Kubernetes.


Overview

Isartor uses OpenTelemetry for distributed tracing and metrics, plus tracing-subscriber with a JSON layer for structured logging.

| Signal  | Protocol  | Default Endpoint      |
|---------|-----------|-----------------------|
| Traces  | OTLP gRPC | http://localhost:4317 |
| Metrics | OTLP gRPC | http://localhost:4317 |
| Logs    | n/a       | stdout (JSON)         |

When ISARTOR__ENABLE_MONITORING=false (default), only the console log layer is active — zero OTel overhead.

Architecture

┌─────────────┐                   ┌───────────────────┐
│  Isartor    │    OTLP gRPC      │  OTel Collector   │
│  Gateway    │──────────────────▶│  :4317            │
│             │    (traces +      │                   │
│             │     metrics)      │  Pipelines:       │
└─────────────┘                   │  traces → Jaeger  │
                                  │  metrics → Prom   │
                                  └───┬──────────┬────┘
                                      │          │
                           ┌──────────▼──┐  ┌────▼──────────┐
                           │   Jaeger    │  │  Prometheus   │
                           │   :16686    │  │  :9090        │
                           │   (UI)      │  │  (scrape)     │
                           └─────────────┘  └───────┬───────┘
                                                    │
                                            ┌───────▼───────┐
                                            │   Grafana     │
                                            │   :3000       │
                                            │  (dashboards) │
                                            └───────────────┘

Enabling Monitoring

ISARTOR__ENABLE_MONITORING=true
ISARTOR__OTEL_EXPORTER_ENDPOINT=http://localhost:4317
RUST_LOG=info,h2=warn,hyper=warn,tower=warn       # optional override

When ISARTOR__ENABLE_MONITORING=false (the default), Isartor uses console-only logging via tracing-subscriber with RUST_LOG filtering. No OTel SDK is initialised — zero overhead.
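The default-off branch can be sketched with plain std code. This is an illustration of the precedence, not the actual src/telemetry.rs; the monitoring_enabled helper is hypothetical:

```rust
use std::env;

/// True only when the raw value is exactly "true" (case-insensitive);
/// anything else, or an unset variable, keeps the console-only default.
fn monitoring_enabled(raw: Option<&str>) -> bool {
    raw.map(|v| v.eq_ignore_ascii_case("true")).unwrap_or(false)
}

fn main() {
    let raw = env::var("ISARTOR__ENABLE_MONITORING").ok();
    if monitoring_enabled(raw.as_deref()) {
        // Real code would install the OTLP trace + metric exporters here.
        println!("monitoring: OTLP exporters + JSON log layer");
    } else {
        println!("monitoring: pretty console layer only (no OTel SDK)");
    }
}
```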


Telemetry Initialisation (src/telemetry.rs)

init_telemetry() returns an OtelGuard (RAII). The guard holds the SdkTracerProvider and SdkMeterProvider; dropping it flushes pending telemetry and shuts down exporters gracefully.

| Component            | Description                                                    |
|----------------------|----------------------------------------------------------------|
| JSON stdout layer    | Structured logs emitted as JSON when monitoring is on          |
| Pretty console layer | Human-readable output when monitoring is off                   |
| OTLP trace exporter  | gRPC via opentelemetry-otlp → Collector                        |
| OTLP metric exporter | gRPC via opentelemetry-otlp → Collector                        |
| EnvFilter            | Reads RUST_LOG; defaults to info,h2=warn,hyper=warn,tower=warn |

Service identity:

service.name    = "isartor-gateway"
service.version = env!("CARGO_PKG_VERSION")   # e.g. "0.1.0"
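The RAII shape of the guard can be illustrated with std types only. A minimal sketch, assuming a stand-in flag in place of the real SdkTracerProvider / SdkMeterProvider (the Providers struct and flushed flag are hypothetical):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Stand-in for the SDK providers: records whether flush ran.
struct Providers {
    flushed: Arc<AtomicBool>,
}

/// Mirrors OtelGuard: dropping it flushes and shuts down telemetry.
struct OtelGuard {
    providers: Providers,
}

impl Drop for OtelGuard {
    fn drop(&mut self) {
        // Real code would call force_flush() and shutdown() on the providers.
        self.providers.flushed.store(true, Ordering::SeqCst);
    }
}

fn init_telemetry(flushed: Arc<AtomicBool>) -> OtelGuard {
    OtelGuard { providers: Providers { flushed } }
}

fn main() {
    let flushed = Arc::new(AtomicBool::new(false));
    {
        let _guard = init_telemetry(flushed.clone());
        // ... serve requests while the guard is alive ...
    } // guard dropped here: pending telemetry is flushed
    assert!(flushed.load(Ordering::SeqCst));
}
```

The point of the pattern: keep the guard alive for the whole lifetime of main, so a clean shutdown (or early return) still flushes buffered spans and metrics.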

Distributed Traces — Span Reference

Every request gets a root span (gateway_request) from the monitoring middleware. Child spans are created per-layer:

Root Span

| Span Name       | Source                       | Key Attributes                                                                 |
|-----------------|------------------------------|--------------------------------------------------------------------------------|
| gateway_request | src/middleware/monitoring.rs | http.method, http.route, http.status_code, client.address, isartor.final_layer |

http.status_code and isartor.final_layer are recorded after the response returns (empty → filled pattern).

Layer 0 — Auth

| Span Name                      | Source                 | Key Attributes |
|--------------------------------|------------------------|----------------|
| (inline tracing::debug!/warn!) | src/middleware/auth.rs | n/a            |

Auth is lightweight; no dedicated span is created. Events are logged at debug/warn level.

Layer 1a — Exact Cache

| Span Name           | Source                | Key Attributes                                        |
|---------------------|-----------------------|-------------------------------------------------------|
| l1a_exact_cache_get | src/adapters/cache.rs | cache.backend (memory \| redis), cache.key, cache.hit |
| l1a_exact_cache_put | src/adapters/cache.rs | cache.backend, cache.key, response_len                |

Layer 1b — Semantic Cache

| Span Name                 | Source              | Key Attributes                                      |
|---------------------------|---------------------|-----------------------------------------------------|
| l1b_semantic_cache_search | src/vector_cache.rs | cache.entries_scanned, cache.hit, cosine_similarity |
| l1b_semantic_cache_insert | src/vector_cache.rs | cache.evicted, cache.size_after                     |

cosine_similarity — the best-match score formatted to 4 decimal places. This is the key attribute for tuning the similarity threshold.
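For reference, the score is ordinary cosine similarity between the query embedding and each cached embedding. A sketch, assuming format!("{:.4}") is what produces the 4-decimal span attribute:

```rust
/// Cosine similarity between two embedding vectors of equal length.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

fn main() {
    let best = cosine_similarity(&[1.0, 0.0], &[1.0, 0.0]);
    // Recorded on the span as a 4-decimal string:
    let attr = format!("{:.4}", best);
    assert_eq!(attr, "1.0000");
}
```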

Layer 2 — SLM Triage

| Span Name          | Source                       | Key Attributes                                                                                          |
|--------------------|------------------------------|---------------------------------------------------------------------------------------------------------|
| layer2_slm         | src/middleware/slm_triage.rs | slm.complexity_score (TEMPLATE \| SNIPPET \| COMPLEX; legacy binary mode: SIMPLE \| COMPLEX)            |
| l2_classify_intent | src/adapters/router.rs       | router.backend (embedded_candle \| remote_vllm), router.decision, router.model, router.url, prompt_len  |

Layer 2.5 — Context Optimiser

| Span Name                  | Source                              | Key Attributes                                                                           |
|----------------------------|-------------------------------------|------------------------------------------------------------------------------------------|
| layer2_5_context_optimizer | src/middleware/context_optimizer.rs | context.bytes_saved, context.strategy (e.g. "classifier+dedup", "classifier+log_crunch") |

When L2.5 modifies the request body, it also sets the response header x-isartor-context-optimized: bytes_saved=<N>.

Layer 3 — Cloud LLM

| Span Name  | Source         | Key Attributes                               |
|------------|----------------|----------------------------------------------|
| layer3_llm | src/handler.rs | ai.prompt.length_bytes, provider.name, model |

Custom Span Attributes — Quick Reference

These are the Isartor-specific attributes (beyond standard OTel semantic conventions) that appear on spans and are useful for filtering in Jaeger / Tempo:

| Attribute             | Type   | Where Set                 | Purpose                                                                                 |
|-----------------------|--------|---------------------------|------------------------------------------------------------------------------------------|
| isartor.final_layer   | string | Root gateway_request span | Which layer resolved the request                                                         |
| cache.hit             | bool   | L1a and L1b spans         | Whether the cache lookup succeeded                                                       |
| cosine_similarity     | string | L1b search span           | Best cosine-similarity score (4 d.p.)                                                    |
| cache.entries_scanned | u64    | L1b search span           | Entries scanned during similarity search                                                 |
| cache.backend         | string | L1a get/put spans         | "memory" or "redis"                                                                      |
| router.decision       | string | L2 classify span          | "TEMPLATE", "SNIPPET", or "COMPLEX" (tiered mode); "SIMPLE" or "COMPLEX" (binary mode)   |
| router.backend        | string | L2 classify span          | "embedded_candle" or "remote_vllm"                                                       |
| context.bytes_saved   | u64    | L2.5 optimizer span       | Bytes removed by compression pipeline                                                    |
| context.strategy      | string | L2.5 optimizer span       | Pipeline stages that modified content (e.g. "classifier+dedup")                          |
| provider.name         | string | L3 handler span           | e.g. "openai", "xai", "azure"                                                            |
| model                 | string | L3 handler span           | e.g. "gpt-4o", "grok-beta"                                                               |
| http.status_code      | u16    | Root span                 | HTTP response status code                                                                |
| client.address        | string | Root span                 | Client IP (from x-forwarded-for)                                                         |

OTel Metrics (src/metrics.rs)

Seven instruments are registered as a singleton GatewayMetrics via OnceLock:

| Metric Name                      | Type      | Attributes                                                               | Description                             |
|----------------------------------|-----------|--------------------------------------------------------------------------|-----------------------------------------|
| isartor_requests_total           | Counter   | final_layer, status_code, traffic_surface, client, endpoint_family, tool | Total prompts processed                 |
| isartor_request_duration_seconds | Histogram | final_layer, status_code, traffic_surface, client, endpoint_family       | End-to-end request duration             |
| isartor_layer_duration_seconds   | Histogram | layer_name, tool                                                         | Per-layer latency                       |
| isartor_tokens_saved_total       | Counter   | final_layer, traffic_surface, client, endpoint_family, tool              | Estimated tokens saved by early resolve |
| isartor_errors_total             | Counter   | layer, error_class, tool                                                 | Error occurrences by layer / agent      |
| isartor_retries_total            | Counter   | operation, attempts, outcome, tool                                       | Retry outcomes by agent                 |
| isartor_cache_events_total       | Counter   | cache_layer, outcome, tool                                               | L1 / L1a / L1b hit-miss safety by agent |
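The OnceLock singleton pattern looks roughly like this. A simplified sketch: the stand-in GatewayMetrics holds one atomic counter rather than the real OTel instrument set:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

/// Simplified stand-in for the real instrument set.
struct GatewayMetrics {
    requests_total: AtomicU64,
}

static METRICS: OnceLock<GatewayMetrics> = OnceLock::new();

/// Every call site gets the same instance; instruments are built exactly once,
/// on first access, with no lazy-static dependency.
fn metrics() -> &'static GatewayMetrics {
    METRICS.get_or_init(|| GatewayMetrics {
        requests_total: AtomicU64::new(0),
    })
}

fn main() {
    metrics().requests_total.fetch_add(1, Ordering::Relaxed);
    metrics().requests_total.fetch_add(1, Ordering::Relaxed);
    assert_eq!(metrics().requests_total.load(Ordering::Relaxed), 2);
    // Both lookups resolve to the same singleton.
    assert!(std::ptr::eq(metrics(), metrics()));
}
```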

Where Metrics Are Recorded

| Call Site                             | Metrics Recorded                                                             |
|---------------------------------------|------------------------------------------------------------------------------|
| root_monitoring_middleware            | record_request_with_context(), record_tokens_saved_with_context() (if early) |
| proxy::connect::emit_proxy_decision() | record_request_with_context(), record_tokens_saved_with_context() (if early) |
| cache_middleware (L1 hit)             | record_layer_duration("L1a_ExactCache" \| "L1b_SemanticCache")               |
| slm_triage_middleware (L2 hit)        | record_layer_duration("L2_SLM")                                              |
| context_optimizer_middleware          | record_layer_duration("L2_5_ContextOptimiser") (when bytes saved > 0)        |
| chat_handler (L3)                     | record_layer_duration("L3_Cloud")                                            |

Request Dimensions

Unified prompt telemetry distinguishes:

  • traffic_surface: gateway or proxy
  • client: direct, openai, anthropic, copilot, claude, antigravity, etc.
  • endpoint_family: native, openai, or anthropic

Token Estimation

estimate_tokens(prompt) uses the heuristic: max(1, prompt.len() / 4). This is intentionally conservative — the metric tracks relative savings rather than precise token counts.
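The heuristic above in code (a sketch matching the description, not necessarily the exact signature in src/metrics.rs):

```rust
/// Roughly 4 bytes per token; never returns 0, so every early-resolved
/// request counts at least one saved token.
fn estimate_tokens(prompt: &str) -> usize {
    std::cmp::max(1, prompt.len() / 4)
}

fn main() {
    assert_eq!(estimate_tokens(""), 1);         // floor of 1
    assert_eq!(estimate_tokens("abc"), 1);      // 3 bytes / 4 rounds to 0, floored
    assert_eq!(estimate_tokens("abcdefgh"), 2); // 8 bytes / 4
}
```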


ROI — isartor_tokens_saved_total

This is the headline business metric. Every request resolved before Layer 3 (exact cache, semantic cache, or local SLM) avoids a round-trip to the external LLM provider.

# Daily token savings
sum(increase(isartor_tokens_saved_total[24h]))

# Savings by layer
sum by (final_layer) (rate(isartor_tokens_saved_total[1h]))

# Prompt volume by traffic surface
sum by (traffic_surface) (rate(isartor_requests_total[5m]))

# Prompt volume by client
sum by (client) (rate(isartor_requests_total[5m]))

# Estimated cost savings (assuming $0.01 per 1K tokens)
sum(increase(isartor_tokens_saved_total[24h])) / 1000 * 0.01

Use this metric to justify infrastructure spend for the caching / SLM layers.
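The cost-savings arithmetic from the last query, as a worked example (the $0.01 per 1K tokens price is the same illustrative assumption used above, not a real provider rate):

```rust
/// Dollar savings for a period's saved tokens at a flat per-1K-token price.
fn estimated_savings_usd(tokens_saved: u64, price_per_1k_usd: f64) -> f64 {
    tokens_saved as f64 / 1000.0 * price_per_1k_usd
}

fn main() {
    // 2.5M tokens deflected in 24h at $0.01 per 1K tokens = $25/day.
    let usd = estimated_savings_usd(2_500_000, 0.01);
    assert!((usd - 25.0).abs() < 1e-9);
}
```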


Docker Compose — Local Observability Stack

Use the provided compose file for local development:

cd docker
docker compose -f docker-compose.observability.yml up -d

| Service        | Port  | Purpose                      |
|----------------|-------|------------------------------|
| OTel Collector | 4317  | OTLP gRPC receiver           |
| Jaeger         | 16686 | Trace UI                     |
| Prometheus     | 9090  | Metrics scrape + query       |
| Grafana        | 3000  | Dashboards (anonymous admin) |

Configuration files:

| File                              | Purpose             |
|-----------------------------------|---------------------|
| docker/otel-collector-config.yaml | Collector pipelines |
| docker/prometheus.yml             | Scrape targets      |

Pipeline Flow

Isartor  ──OTLP gRPC──▶  OTel Collector ──▶  Jaeger    (traces)
                                          └──▶  Prometheus (metrics)
                                                     │
                                                     ▼
                                                  Grafana

OTel Collector Configuration

The collector config is at docker/otel-collector-config.yaml:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
  otlp:
    endpoint: "jaeger:4317"
    tls:
      insecure: true
  debug:
    verbosity: basic

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp, debug]
    metrics:
      receivers: [otlp]
      exporters: [prometheus, debug]

Prometheus Configuration

The Prometheus config is at docker/prometheus.yml:

scrape_configs:
  - job_name: 'otel-collector'
    scrape_interval: 5s
    static_configs:
      - targets: ['otel-collector:8889']

Prometheus scrapes the OTel Collector's Prometheus exporter on port 8889 every 5 seconds.


Per-Tier Setup

Level 1 — Minimal (Console Logs Only)

No observability stack is needed. Use RUST_LOG for structured console output:

ISARTOR__ENABLE_MONITORING=false
RUST_LOG=isartor=info

For debug-level output during development:

RUST_LOG=isartor=debug,tower_http=trace

Level 2 — Docker Compose (Full Stack)

The docker-compose.sidecar.yml includes the complete observability stack:

cd docker
docker compose -f docker-compose.sidecar.yml up --build

Services included:

| Service        | URL                    | Purpose                             |
|----------------|------------------------|-------------------------------------|
| OTel Collector | localhost:4317 (gRPC)  | Receives OTLP from gateway          |
| Jaeger UI      | http://localhost:16686 | View distributed traces             |
| Prometheus     | http://localhost:9090  | Query metrics                       |
| Grafana        | http://localhost:3000  | Dashboards (anonymous admin access) |

The gateway is pre-configured with:

ISARTOR__ENABLE_MONITORING=true
ISARTOR__OTEL_EXPORTER_ENDPOINT=http://otel-collector:4317

Level 3 — Kubernetes (Managed or Self-Hosted)

| Approach      | Recommended Stack                                                   | Notes                                  |
|---------------|---------------------------------------------------------------------|----------------------------------------|
| Self-managed  | OTel Collector DaemonSet + Jaeger Operator + kube-prometheus-stack  | Full control, higher ops burden        |
| AWS           | AWS X-Ray + CloudWatch + Managed Grafana                            | ADOT Collector as sidecar/DaemonSet    |
| GCP           | Cloud Trace + Cloud Monitoring + Cloud Logging                      | Use OTLP exporter to Cloud Trace       |
| Azure         | Application Insights + Azure Monitor                                | Use Azure Monitor OpenTelemetry exporter |
| Grafana Cloud | Grafana Alloy + Grafana Cloud                                       | Low ops, managed Prometheus + Tempo    |
| Datadog       | Datadog Agent + OTel Collector                                      | Enterprise APM                         |

For all options, point the gateway at the collector:

ISARTOR__OTEL_EXPORTER_ENDPOINT=http://otel-collector.isartor:4317

Grafana Dashboard Queries (PromQL)

| Panel                 | PromQL                                                                                            |
|-----------------------|---------------------------------------------------------------------------------------------------|
| Request Rate          | rate(isartor_requests_total[5m])                                                                  |
| P95 Latency           | histogram_quantile(0.95, rate(isartor_request_duration_seconds_bucket[5m]))                       |
| Layer Resolution      | sum by (final_layer) (rate(isartor_requests_total[5m]))                                           |
| Traffic Surface Split | sum by (traffic_surface) (rate(isartor_requests_total[5m]))                                       |
| Client Split          | sum by (client) (rate(isartor_requests_total[5m]))                                                |
| Per-Layer Latency     | histogram_quantile(0.95, sum by (le, layer_name) (rate(isartor_layer_duration_seconds_bucket[5m]))) |
| Tokens Saved / Hour   | sum(increase(isartor_tokens_saved_total[1h]))                                                     |
| Tokens Saved by Layer | sum by (final_layer) (rate(isartor_tokens_saved_total[5m]))                                       |
| Cache Hit Rate        | sum(rate(isartor_requests_total{final_layer=~"L1.*"}[5m])) / sum(rate(isartor_requests_total[5m])) |

Note the sum() wrapping in the Cache Hit Rate query: without it, Prometheus attempts one-to-one vector matching on all labels (including final_layer) and the ratio is wrong.

Jaeger — Useful Searches

| Goal                     | Search                                                                                         |
|--------------------------|------------------------------------------------------------------------------------------------|
| Slow requests (> 500 ms) | Service isartor-gateway, Min Duration 500ms                                                    |
| Cache misses             | Tag cache.hit=false                                                                            |
| Semantic cache tuning    | Tag cosine_similarity, sorted by value                                                         |
| Layer 3 fallbacks        | Tag isartor.final_layer=L3_Cloud                                                               |
| SLM local resolutions    | Tag router.decision=TEMPLATE or router.decision=SNIPPET (tiered); router.decision=SIMPLE (binary) |

Trace Anatomy

A typical trace for a cache-miss request resolved locally by the SLM layer (durations illustrative; span names follow the span reference above):

isartor-gateway
  └─ gateway_request  POST /api/chat                     [250ms]
       ├─ l1a_exact_cache_get        cache.hit=false       [1ms]
       ├─ l1b_semantic_cache_search  cache.hit=false       [5ms]
       └─ layer2_slm                                     [240ms]
            └─ l2_classify_intent  router.decision=TEMPLATE

Auth produces no span of its own (see Layer 0 above), so it does not appear in the trace.

Built-in User Views

For quick operator checks without a separate telemetry stack:

isartor stats --gateway-url http://localhost:8080
isartor stats --gateway-url http://localhost:8080 --by-tool

Add --gateway-api-key <key> only when gateway auth is enabled.

--by-tool prints richer per-agent stats: requests, cache hits/misses, average latency, retry count, error count, and L1a/L1b safety ratios.

Built-in JSON endpoints:

  • GET /health
  • GET /debug/proxy/recent
  • GET /debug/stats/prompts
  • GET /debug/stats/agents

Alerting Rules

Prometheus Alerting Rules

Create docker/prometheus-alerts.yml:

groups:
  - name: isartor
    rules:
      - alert: HighErrorRate
        expr: >
          sum(rate(isartor_requests_total{status_code=~"5.."}[5m])) /
          sum(rate(isartor_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Isartor error rate > 5% for 5 minutes"

      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(isartor_request_duration_seconds_bucket[5m])) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Isartor P95 latency > 2s for 5 minutes"

      - alert: LowCacheHitRate
        expr: >
          sum(rate(isartor_requests_total{final_layer=~"L1.*"}[15m])) /
          sum(rate(isartor_requests_total[15m])) < 0.3
        for: 15m
        labels:
          severity: info
        annotations:
          summary: "Cache hit rate below 30% — consider tuning similarity threshold"

      - alert: LowDeflectionRate
        expr: |
          1 - (
            sum(rate(isartor_requests_total{final_layer="L3_Cloud"}[1h]))
            /
            sum(rate(isartor_requests_total[1h]))
          ) < 0.5
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Isartor deflection rate below 50%"

      - alert: GatewayDown
        expr: up{job="isartor"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Isartor gateway is down"
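The LowDeflectionRate expression computes the share of requests resolved before Layer 3. The same arithmetic as a sketch (the request rates are illustrative numbers, not measured values):

```rust
/// Deflection rate = 1 - (L3 request rate / total request rate).
fn deflection_rate(l3_rate: f64, total_rate: f64) -> f64 {
    1.0 - l3_rate / total_rate
}

fn main() {
    // 4 req/s reach the cloud out of 10 req/s total -> 60% deflected.
    let rate = deflection_rate(4.0, 10.0);
    assert!((rate - 0.6).abs() < 1e-9);
    // The alert fires when this value stays below 0.5 for 30 minutes.
    assert!(rate >= 0.5);
}
```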

Troubleshooting

| Symptom                               | Cause                             | Fix                                            |
|---------------------------------------|-----------------------------------|------------------------------------------------|
| No traces in Jaeger                   | Monitoring disabled               | Set ISARTOR__ENABLE_MONITORING=true            |
| No traces in Jaeger                   | Collector unreachable             | Verify ISARTOR__OTEL_EXPORTER_ENDPOINT + port 4317 |
| No metrics in Prometheus              | Prometheus can't scrape collector | Check prometheus.yml targets                   |
| Grafana "No data"                     | Data source misconfigured         | URL should be http://prometheus:9090           |
| Console shows "OTel disabled"         | Config precedence                 | Check env vars override file config            |
| isartor_layer_duration_seconds empty  | No requests yet                   | Send a test request                            |

See also: Configuration Reference · Performance Tuning · Troubleshooting