# Migrating Agent Print-Debug to Structured OTel Traces with LiteLLM

> Replace scattered print statements in a LiteLLM-routed agent with structured OpenTelemetry spans exportable to Grafana Tempo. By the end, you can query every LLM call by model name, token count, and tool name without touching your agent logic.

- Canonical URL: https://agentry.press/tutorial/migrating-agent-print-debug-to-structured-otel-traces-with-litellm/
- Type: Tutorial
- Published: 2026-06-04
- By: agentry
- Tags: observability, opentelemetry, litellm, tracing, agents, openllmetry

---

## Why this matters

As coding agents gain the ability to autonomously run commands and interact with development tools [2], the gap between "works on my laptop" and "auditable in production" has become a real operational problem. Print-based debugging collapses under any multi-step agent: you get a wall of text with no timing, no causal structure, and no way to correlate a slow response with a specific model or tool call.

OpenLLMetry (the OpenTelemetry instrumentation layer for LLM workloads) solves this by auto-instrumenting LiteLLM's completion calls and emitting spans that carry `gen_ai.request.model`, `gen_ai.usage.prompt_tokens`, `gen_ai.usage.completion_tokens`, and tool-call attributes. Because LiteLLM routes to any provider behind a single API surface, the same instrumentation code works whether your agent calls GPT-4o, Claude 3.5, or a self-hosted model on Hetzner.

This tutorial wires OpenLLMetry to a console exporter (runnable anywhere, no Docker needed) and shows the exact span attributes you can forward to Grafana Tempo or any OTLP-compatible backend by swapping one exporter line.

## Prerequisites

- Python 3.11 or 3.12
- An API key for at least one LLM provider (OpenAI, Anthropic, or any LiteLLM-supported provider)
- Basic familiarity with OpenTelemetry concepts (spans, exporters, tracer providers)
- Optional: a running Grafana Tempo instance if you want to forward traces beyond the console

## Setup

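Install LiteLLM, the OpenTelemetry SDK, the OTLP exporter, and the LiteLLM instrumentation package. The instrumentation package used here, `openinference-instrumentation-litellm`, is published by the OpenInference project, so check the attribute names it emits against the console output in Step 5 before you build queries on them: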

```bash
uv pip install litellm opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc openinference-instrumentation-litellm
```

Verify the key packages are present:

```python
from importlib.metadata import version
print("litellm:", version("litellm"))
print("openinference-instrumentation-litellm:", version("openinference-instrumentation-litellm"))
print("opentelemetry-sdk:", version("opentelemetry-sdk"))
print("setup_ok")
```

## Step 1: Understand what the instrumentation captures

Before writing any agent code, it helps to see exactly what the instrumentation emits. The `LiteLLMInstrumentor` patches `litellm.completion` (and its async variant) when you call `instrument()`, not at import time. From then on, each call becomes a span with these attributes:

| Attribute | Example value |
|---|---|
| `gen_ai.system` | `openai` |
| `gen_ai.request.model` | `gpt-4o-mini` |
| `gen_ai.usage.prompt_tokens` | `42` |
| `gen_ai.usage.completion_tokens` | `18` |
| `gen_ai.response.finish_reasons` | `["stop"]` |
| `llm.request.type` | `chat` |

Tool spans are not emitted by the instrumentor. The agent in Step 3 creates one child span per tool dispatch, carrying `tool.name` and the serialized arguments. Together, these attributes let you write Tempo queries like `{span.gen_ai.request.model="gpt-4o-mini"}` and see every call to that model across all agent runs.
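
To make attribute-based querying concrete before any backend is involved, here is a minimal local sketch that applies the same kind of filter to span dictionaries. The sample spans and the `spans_where` and `total_tokens` helpers are illustrative assumptions, not part of any library; the attribute names are the ones from the table above.

```python
from typing import Any

# Hypothetical spans, trimmed to the fields this sketch needs.
SAMPLE_SPANS: list[dict[str, Any]] = [
    {
        "name": "completion",
        "attributes": {
            "gen_ai.request.model": "gpt-4o-mini",
            "gen_ai.usage.prompt_tokens": 42,
            "gen_ai.usage.completion_tokens": 18,
        },
    },
    {
        "name": "completion",
        "attributes": {
            "gen_ai.request.model": "gpt-4o",
            "gen_ai.usage.prompt_tokens": 100,
            "gen_ai.usage.completion_tokens": 50,
        },
    },
    {"name": "tool.calculate", "attributes": {"tool.name": "calculate"}},
]


def spans_where(spans: list[dict[str, Any]], key: str, value: Any) -> list[dict[str, Any]]:
    """Return spans whose attribute `key` equals `value`."""
    return [s for s in spans if s.get("attributes", {}).get(key) == value]


def total_tokens(span: dict[str, Any]) -> int:
    """Sum prompt and completion tokens, treating missing counts as zero."""
    attrs = span.get("attributes", {})
    return attrs.get("gen_ai.usage.prompt_tokens", 0) + attrs.get(
        "gen_ai.usage.completion_tokens", 0
    )


mini_spans = spans_where(SAMPLE_SPANS, "gen_ai.request.model", "gpt-4o-mini")
print(len(mini_spans), [total_tokens(s) for s in mini_spans])
```

A trace backend runs the equivalent filter server-side; the point is only that every question you want to ask later has to exist as a span attribute first.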

## Step 2: Wire the tracer provider with a console exporter

The console exporter is the right starting point: it requires no running service, and it prints the same span data (names, IDs, timing, attributes) that you'd send to Tempo. You swap the exporter later without changing any agent code.

```python
# filename: tracing_setup.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from openinference.instrumentation.litellm import LiteLLMInstrumentor


def configure_tracing() -> TracerProvider:
    """Set up a TracerProvider that prints spans to stdout.

    Swap ConsoleSpanExporter for OTLPSpanExporter to forward to Tempo.
    """
    provider = TracerProvider()
    # SimpleSpanProcessor flushes each span synchronously -- ideal for scripts
    # and tests. Use BatchSpanProcessor in long-running services.
    provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)

    # Patch litellm.completion so every call emits a span automatically.
    LiteLLMInstrumentor().instrument()
    return provider
```

Two design choices worth noting:

- `SimpleSpanProcessor` flushes each span the moment it ends. This is correct for scripts and tests. In a long-running service, replace it with `BatchSpanProcessor` for throughput.
- `LiteLLMInstrumentor().instrument()` monkey-patches LiteLLM globally. Call it once at startup, before any `litellm.completion` call.
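
One way to enforce the "call it once at startup" rule is a small guard around the setup function. The sketch below is an assumption about how you might structure that, using only the standard library; `configure_once` is a hypothetical helper, and the lambda stands in for a call to `configure_tracing()`.

```python
import threading
from typing import Callable

_setup_lock = threading.Lock()
_setup_done = False


def configure_once(setup: Callable[[], object]) -> bool:
    """Run `setup` on the first call only; return True if it ran this time."""
    global _setup_done
    with _setup_lock:
        if _setup_done:
            return False
        setup()
        _setup_done = True
        return True


# Stand-in for configure_tracing(): records that setup happened.
calls: list[str] = []
first = configure_once(lambda: calls.append("instrumented"))
second = configure_once(lambda: calls.append("instrumented"))
print(first, second, calls)
```

In an entry point you would pass `configure_tracing` itself, so that repeated imports or test fixtures cannot patch LiteLLM a second time.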

## Step 3: Build a minimal tool-calling agent

This agent simulates a two-tool workflow: a calculator and a unit converter. The tools are plain Python functions. The agent loop calls LiteLLM, checks for tool calls in the response, dispatches them, and feeds results back. No framework needed.

```python
# filename: agent.py
import json
import litellm
from opentelemetry import trace

# Tool implementations
def calculate(expression: str) -> str:
    """Evaluate a simple arithmetic expression."""
    try:
        result = eval(expression, {"__builtins__": {}})  # noqa: S307
        return str(result)
    except Exception as exc:
        return f"error: {exc}"


def convert_units(value: float, from_unit: str, to_unit: str) -> str:
    """Convert between a small set of units."""
    conversions = {
        ("km", "miles"): 0.621371,
        ("miles", "km"): 1.60934,
        ("kg", "lbs"): 2.20462,
        ("lbs", "kg"): 0.453592,
    }
    factor = conversions.get((from_unit, to_unit))
    if factor is None:
        return f"unknown conversion: {from_unit} -> {to_unit}"
    return f"{value * factor:.4f} {to_unit}"


TOOL_REGISTRY = {
    "calculate": calculate,
    "convert_units": lambda args: convert_units(
        args["value"], args["from_unit"], args["to_unit"]
    ),
}

TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Evaluate an arithmetic expression",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "convert_units",
            "description": "Convert a numeric value between units",
            "parameters": {
                "type": "object",
                "properties": {
                    "value": {"type": "number"},
                    "from_unit": {"type": "string"},
                    "to_unit": {"type": "string"},
                },
                "required": ["value", "from_unit", "to_unit"],
            },
        },
    },
]


def run_agent(model: str, user_message: str) -> str:
    """Run a single-turn agent loop with tool support.

    The model parameter is any LiteLLM model string, e.g.:
      'openai/gpt-4o-mini', 'anthropic/claude-3-haiku-20240307'
    """
    tracer = trace.get_tracer("agent")
    messages = [{"role": "user", "content": user_message}]

    with tracer.start_as_current_span("agent.run") as agent_span:
        agent_span.set_attribute("agent.model", model)
        agent_span.set_attribute("agent.user_message", user_message)

        for step in range(5):  # guard against infinite loops
            response = litellm.completion(
                model=model,
                messages=messages,
                tools=TOOL_SCHEMAS,
                tool_choice="auto",
            )
            choice = response.choices[0]

            if choice.finish_reason == "tool_calls":
                tool_calls = choice.message.tool_calls
                messages.append(choice.message)

                for tc in tool_calls:
                    fn_name = tc.function.name
                    fn_args = json.loads(tc.function.arguments)

                    with tracer.start_as_current_span(f"tool.{fn_name}") as tool_span:
                        tool_span.set_attribute("tool.name", fn_name)
                        tool_span.set_attribute("tool.arguments", json.dumps(fn_args))

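                        if fn_name == "calculate":
                            result = calculate(fn_args["expression"])
                        elif fn_name in TOOL_REGISTRY:
                            result = TOOL_REGISTRY[fn_name](fn_args)
                        else:
                            # The model can name a tool that does not exist.
                            result = f"error: unknown tool {fn_name}"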

                        tool_span.set_attribute("tool.result", result)

                    messages.append({
                        "role": "tool",
                        "tool_call_id": tc.id,
                        "content": result,
                    })

            else:
                final_answer = choice.message.content or ""
                agent_span.set_attribute("agent.final_answer", final_answer)
                agent_span.set_attribute("agent.steps", step + 1)
                return final_answer

        return "max steps reached"
```

The key structural point: the `agent.run` span wraps the entire loop, and each tool dispatch gets its own child span. This parent-child relationship is what Tempo renders as a waterfall, letting you see at a glance whether latency came from the LLM call or the tool execution.
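
One caveat about the `calculate` tool in `agent.py`: it passes model-generated text to `eval`, and emptying `__builtins__` is generally not considered a sufficient sandbox. If that matters for your deployment, a stricter variant can walk the parsed expression and allow only arithmetic. The sketch below is one possible approach, not the tutorial's original implementation; `safe_calculate` is a hypothetical replacement name, and exponentiation is deliberately left out to avoid very large results.

```python
import ast
import operator

# Only these operators are evaluated; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _eval_node(node: ast.AST) -> float:
    """Recursively evaluate numeric literals and whitelisted operators."""
    if isinstance(node, ast.Constant):
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def safe_calculate(expression: str) -> str:
    """Evaluate arithmetic only; return an 'error: ...' string otherwise."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval_node(tree.body))
    except Exception as exc:
        return f"error: {exc}"


print(safe_calculate("42 * 7"))
print(safe_calculate("__import__('os').getcwd()"))
```

It keeps the same string-in, string-out contract as `calculate`, so the tool span attributes and the agent loop would not need to change.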

## Step 4: Replace print statements with span events

The old pattern looks like this:

```python
# Old print-debug style -- do not use
print(f"[DEBUG] Calling model {model} with {len(messages)} messages")
response = litellm.completion(...)
print(f"[DEBUG] Got response: {response.choices[0].finish_reason}")
print(f"[DEBUG] Tokens used: {response.usage.total_tokens}")
```

The structured replacement uses span events and attributes:

```python
# filename: span_events_demo.py
from opentelemetry import trace


def demo_span_events():
    tracer = trace.get_tracer("demo")
    with tracer.start_as_current_span("llm.call") as span:
        # Attributes are indexed and queryable
        span.set_attribute("gen_ai.request.model", "gpt-4o-mini")
        span.set_attribute("message_count", 3)

        # Events are timestamped log entries inside the span
        span.add_event("before_completion", {"message_count": 3})

        # ... completion call would go here ...

        span.add_event("after_completion", {
            "finish_reason": "stop",
            "total_tokens": 60,
        })
        span.set_attribute("gen_ai.usage.total_tokens", 60)
    print("span_events_demo_ok")


demo_span_events()
```

Attributes (`set_attribute`) are indexed fields you filter on. Events (`add_event`) are timestamped log lines attached to the span timeline. Use attributes for things you want to aggregate (model name, token counts, tool names) and events for narrative checkpoints ("retrying after rate limit", "cache hit").
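
As a sketch of that split, the retry helper below records one event per failed attempt (narrative) and a single attribute for the final attempt count (aggregatable). `RecordingSpan` is a stand-in written for this example so it runs without the SDK; it mimics only the `set_attribute` and `add_event` methods of a real span. The helper name `call_with_retries`, the event name `attempt_failed`, and the attribute name `retry.attempts` are assumptions for illustration.

```python
from typing import Any, Callable, Optional


class RecordingSpan:
    """Stand-in span that records calls; not an OpenTelemetry class."""

    def __init__(self) -> None:
        self.attributes: dict[str, Any] = {}
        self.events: list[tuple[str, dict[str, Any]]] = []

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value

    def add_event(self, name: str, attributes: Optional[dict[str, Any]] = None) -> None:
        self.events.append((name, dict(attributes or {})))


def call_with_retries(span: Any, fn: Callable[[], Any], max_attempts: int = 3) -> Any:
    """Call `fn`, retrying on RuntimeError and recording the story on `span`."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = fn()
        except RuntimeError as exc:
            # Narrative checkpoint: one event per failed attempt.
            span.add_event("attempt_failed", {"attempt": attempt, "error": str(exc)})
            continue
        # Aggregatable fact: one attribute, set once per span.
        span.set_attribute("retry.attempts", attempt)
        return result
    span.set_attribute("retry.attempts", max_attempts)
    raise RuntimeError("all attempts failed")


# Simulated call that fails twice, then succeeds.
outcomes = iter([RuntimeError("rate limit"), RuntimeError("rate limit"), "ok"])


def flaky() -> str:
    item = next(outcomes)
    if isinstance(item, Exception):
        raise item
    return item


span = RecordingSpan()
result = call_with_retries(span, flaky)
print(result, span.attributes, len(span.events))
```

With a real tracer you would pass the span yielded by `tracer.start_as_current_span(...)` instead of `RecordingSpan()`.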

## Step 5: Run the agent and inspect the trace output

The entry point below wires tracing, runs the agent with a mock response (so it executes without a real API key), and prints the span structure to stdout.

```python
# filename: run_demo.py
import json
from unittest.mock import MagicMock, patch
from tracing_setup import configure_tracing


def make_mock_response(content: str, finish_reason: str = "stop"):
    """Build a minimal litellm-shaped response object."""
    choice = MagicMock()
    choice.finish_reason = finish_reason
    choice.message.content = content
    choice.message.tool_calls = None

    usage = MagicMock()
    usage.prompt_tokens = 25
    usage.completion_tokens = 15
    usage.total_tokens = 40

    response = MagicMock()
    response.choices = [choice]
    response.usage = usage
    response.model = "gpt-4o-mini"
    return response


def run_traced_demo():
    provider = configure_tracing()

    mock_response = make_mock_response(
        "The result of 42 * 7 is 294, which is approximately 182.7 miles."
    )

    with patch("litellm.completion", return_value=mock_response):
        from agent import run_agent
        result = run_agent(
            model="openai/gpt-4o-mini",
            user_message="What is 42 * 7, and convert that many km to miles?",
        )

    print("\n=== Agent result ===")
    print(result)

    # Force flush so all spans are written before the process exits
    provider.force_flush()
    print("\ndemo_complete")


run_traced_demo()
```

```bash
python run_demo.py
```

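You'll see JSON span objects printed to stdout. Each span includes `name`, `context.trace_id`, `context.span_id`, `parent_id`, `start_time`, `end_time`, and the `attributes` dict. The `agent.run` span's `parent_id` is null (it's the root). In this mocked run it is the only span you should expect: `patch` replaces `litellm.completion`, including the instrumentor's wrapper, and the mock returns `finish_reason="stop"`, so no tool is dispatched. Against a real provider, the completion span emitted by the instrumentor and any `tool.*` spans carry the agent span's ID as their `parent_id`.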
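
The `parent_id` links are enough to rebuild the hierarchy by hand, which is a useful sanity check before sending anything to a backend. The sketch below assumes span dictionaries shaped like the console output, trimmed to three fields; the sample spans, their IDs, and the `render_tree` helper are hypothetical.

```python
from collections import defaultdict
from typing import Any, Optional

# Hypothetical spans from one trace, trimmed to the fields used here.
SPANS: list[dict[str, Any]] = [
    {"name": "agent.run", "context": {"span_id": "0x01"}, "parent_id": None},
    {"name": "completion", "context": {"span_id": "0x02"}, "parent_id": "0x01"},
    {"name": "tool.calculate", "context": {"span_id": "0x03"}, "parent_id": "0x01"},
]


def render_tree(spans: list[dict[str, Any]]) -> list[str]:
    """Return span names indented by depth, following parent_id links."""
    children: dict[Optional[str], list[dict[str, Any]]] = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)

    lines: list[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children.get(parent_id, []):
            lines.append("  " * depth + span["name"])
            walk(span["context"]["span_id"], depth + 1)

    walk(None, 0)  # roots are the spans with no parent
    return lines


print("\n".join(render_tree(SPANS)))
```

If a span you expected under `agent.run` shows up as a second root, it was started outside the agent span's context.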

## Step 6: Forward traces to Grafana Tempo (OTLP)

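Swapping the exporter is a small change confined to the tracing setup: replace `ConsoleSpanExporter` with `OTLPSpanExporter` and, for a long-running service, `SimpleSpanProcessor` with `BatchSpanProcessor`. No agent code changes.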

```python
# filename: tracing_setup_tempo.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.litellm import LiteLLMInstrumentor
import os


def configure_tracing_tempo() -> TracerProvider:
    """Configure tracing to export to Grafana Tempo via OTLP/gRPC.

    Set OTEL_EXPORTER_OTLP_ENDPOINT to your Tempo endpoint, e.g.:
      export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
    """
    endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317")

    exporter = OTLPSpanExporter(endpoint=endpoint, insecure=True)
    provider = TracerProvider()
    # BatchSpanProcessor is correct for production: buffers and flushes efficiently.
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)

    LiteLLMInstrumentor().instrument()
    return provider
```

To run Tempo locally, start it with Docker. The compose file mounts a `tempo.yaml` that you supply alongside it, and that config must enable the OTLP gRPC receiver:

```yaml
# docker-compose.yml for local Tempo
services:
  tempo:
    image: grafana/tempo:latest
    command: ["-config.file=/etc/tempo.yaml"]
    ports:
      - "4317:4317"   # OTLP gRPC
      - "3200:3200"   # Tempo HTTP API
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
```

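Then set `OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317` and replace `configure_tracing()` with `configure_tracing_tempo()` in your entry point. The spans that printed to the console are now stored in Tempo, where you can search them with TraceQL. For example, to find slow calls to one model:

```
{ span.gen_ai.request.model = "gpt-4o-mini" && duration > 2s }
```

Because `tool.name` is set on the tool spans rather than on the LLM span, filter on it separately, for example `{ span.tool.name = "calculate" }`.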

The same OTLP payload works with Datadog, Honeycomb, or New Relic. Only the exporter endpoint and authentication headers change.
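
To show what "only the endpoint and headers change" means in practice, the sketch below derives exporter settings from the standard OTLP environment variables. The helper names `parse_otlp_headers` and `otlp_settings`, the example hostname, and the header values are assumptions for illustration; the exporter can also read these variables on its own, so treat this as a way to inspect configuration rather than a required step.

```python
from typing import Mapping
from urllib.parse import urlparse


def parse_otlp_headers(raw: str) -> dict[str, str]:
    """Parse 'k1=v1,k2=v2' (the OTEL_EXPORTER_OTLP_HEADERS format) into a dict."""
    headers: dict[str, str] = {}
    for pair in raw.split(","):
        if "=" not in pair:
            continue  # skip empty or malformed entries
        key, value = pair.split("=", 1)
        headers[key.strip()] = value.strip()
    return headers


def otlp_settings(env: Mapping[str, str]) -> dict[str, object]:
    """Collect endpoint, TLS choice, and headers for an OTLP exporter."""
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317")
    return {
        "endpoint": endpoint,
        # Plain http:// endpoints are treated as insecure (no TLS).
        "insecure": urlparse(endpoint).scheme != "https",
        "headers": parse_otlp_headers(env.get("OTEL_EXPORTER_OTLP_HEADERS", "")),
    }


# Hypothetical managed-backend configuration.
vendor_env = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://otlp.example.com:4317",
    "OTEL_EXPORTER_OTLP_HEADERS": "x-api-key=abc123,x-team=agents",
}
settings = otlp_settings(vendor_env)
local_settings = otlp_settings({})
print(settings)
print(local_settings)
```

Passing your process environment (`os.environ`) instead of `vendor_env` shows exactly what the exporter would be configured with.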

## Verify it works

```python
import io
from unittest.mock import MagicMock, patch

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from openinference.instrumentation.litellm import LiteLLMInstrumentor

# Fresh provider for this verification block
provider = TracerProvider()
buf = io.StringIO()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter(out=buf)))
trace.set_tracer_provider(provider)

# Re-instrument with the fresh provider
LiteLLMInstrumentor().uninstrument()
LiteLLMInstrumentor().instrument()

mock_resp = MagicMock()
mock_resp.choices[0].finish_reason = "stop"
mock_resp.choices[0].message.content = "294"
mock_resp.choices[0].message.tool_calls = None
mock_resp.usage.prompt_tokens = 10
mock_resp.usage.completion_tokens = 5
mock_resp.usage.total_tokens = 15
mock_resp.model = "gpt-4o-mini"

with patch("litellm.completion", return_value=mock_resp):
    from agent import run_agent
    answer = run_agent("openai/gpt-4o-mini", "What is 6 * 7?")

provider.force_flush()

span_output = buf.getvalue()

# Verify the agent span was emitted
assert "agent.run" in span_output, f"Expected 'agent.run' span, got:\n{span_output[:500]}"

# Verify the model attribute was recorded
assert "gpt-4o-mini" in span_output, "Expected model attribute in span output"

# Verify the answer came back
assert answer == "294", f"Unexpected answer: {answer}"

print("verification_passed")
print(f"Answer: {answer}")
print(f"Span output length: {len(span_output)} chars")
```

## Troubleshooting

**`ModuleNotFoundError: No module named 'openinference'`** -- The package name on PyPI is `openinference-instrumentation-litellm`, not `openinference`. Run `uv pip install openinference-instrumentation-litellm` and confirm with `uv pip show openinference-instrumentation-litellm`.

**Spans appear in the console but not in Tempo** -- Confirm Tempo is listening for OTLP on port 4317 with `curl -v http://localhost:4317`. The port speaks gRPC, so a successful connection is the signal to look for, not a readable response. If Tempo is behind TLS, set `insecure=False` in `OTLPSpanExporter` and provide the CA cert via `OTEL_EXPORTER_OTLP_CERTIFICATE`. Also confirm you called `provider.force_flush()` before process exit when using `BatchSpanProcessor`.

**`LiteLLMInstrumentor().instrument()` logs `Attempting to instrument while already instrumented`** -- You called `instrument()` twice in the same process. OpenTelemetry's base instrumentor logs this warning and skips the second call rather than raising. Call `LiteLLMInstrumentor().uninstrument()` before re-instrumenting, or check `LiteLLMInstrumentor().is_instrumented_by_opentelemetry` first.

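**Tool call spans are missing** -- The child `tool.*` spans are created by your agent code, not by the instrumentor. Confirm the `with tracer.start_as_current_span(...)` block in `agent.py` runs in the same execution context as the `agent.run` span: OpenTelemetry tracks the current span per context, so work handed to another thread does not inherit it automatically. In async agents, keep the plain `with tracer.start_as_current_span(...)` inside your `async def`. The method returns a regular context manager, not an async one, so `async with` fails.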
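
The Python OpenTelemetry API keeps the current span in a `contextvars`-based context by default, which is why parentage can be lost when work moves to another thread. The standard-library sketch below illustrates the mechanism with a plain `ContextVar` standing in for the current span; the variable name and the `worker` function are hypothetical, and no OpenTelemetry code is involved.

```python
import contextvars
import threading

# Stand-in for "the current span"; OpenTelemetry stores real spans similarly.
current_span_name: contextvars.ContextVar[str] = contextvars.ContextVar(
    "current_span_name", default="<no active span>"
)


def worker(results: dict[str, str], key: str) -> None:
    """Record whichever span name is visible from this thread."""
    results[key] = current_span_name.get()


results: dict[str, str] = {}
current_span_name.set("agent.run")

# A plain thread does not necessarily see the caller's context ...
plain = threading.Thread(target=worker, args=(results, "plain"))
# ... but running the target inside a copied context carries the value across.
ctx = contextvars.copy_context()
copied = threading.Thread(target=ctx.run, args=(worker, results, "copied"))

for thread in (plain, copied):
    thread.start()
for thread in (plain, copied):
    thread.join()

print(results)
```

Within a single asyncio task the context follows each `await`, so a plain `with` block around tool execution keeps the correct parent there.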

**Token counts show as zero in spans** -- Some LiteLLM provider adapters don't populate `response.usage` for streaming calls. Set `stream=False` for the instrumented path, or enable `stream_options={"include_usage": True}` if the provider supports it.
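
When usage reporting is enabled for a stream, the counts typically arrive on a late chunk while earlier chunks carry no usage. The sketch below assumes that shape and picks the last usage object seen; `usage_from_chunks` is a hypothetical helper, and the chunks are simulated with `SimpleNamespace` rather than real LiteLLM objects.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def usage_from_chunks(chunks: Iterable[Any]) -> Optional[Any]:
    """Return the last non-empty `usage` found on any chunk, or None."""
    usage = None
    for chunk in chunks:
        chunk_usage = getattr(chunk, "usage", None)
        if chunk_usage is not None:
            usage = chunk_usage
    return usage


# Simulated stream: content chunks without usage, then a final usage chunk.
simulated_chunks = [
    SimpleNamespace(usage=None),
    SimpleNamespace(usage=None),
    SimpleNamespace(
        usage=SimpleNamespace(prompt_tokens=10, completion_tokens=5, total_tokens=15)
    ),
]

usage = usage_from_chunks(simulated_chunks)
print(usage.total_tokens if usage is not None else "no usage reported")
```

If the helper returns `None` for a real stream, the provider did not report usage and the span's token attributes will stay empty regardless of instrumentation.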

**`ConsoleSpanExporter` output is empty after the agent call** -- You're using `BatchSpanProcessor` instead of `SimpleSpanProcessor`. The batch processor flushes asynchronously. Either switch to `SimpleSpanProcessor` for local testing, or call `provider.force_flush()` immediately after the agent call and before reading the output buffer.

## Next steps

- **Add cost tracking**: LiteLLM exposes `response._hidden_params["response_cost"]` after each call. Record it as `gen_ai.usage.cost` on the span and build a Grafana dashboard that aggregates spend by model and user.
- **Propagate trace context across services**: If your agent calls downstream microservices over HTTP, inject the W3C `traceparent` header with `opentelemetry.propagate.inject(headers)` so Tempo links the full distributed trace.
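- **Sample high-token traces**: Token counts are only known when a span ends, so a head sampler such as `ParentBasedTraceIdRatio` cannot select on them. To keep 100% of traces where `gen_ai.usage.total_tokens > 1000` and sample the rest at 10%, use tail-based sampling, for example the OpenTelemetry Collector's `tail_sampling` processor. That reduces storage costs without losing visibility into expensive calls.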
- **Export to a managed OTLP backend**: Grafana Cloud, Honeycomb, and Datadog all accept the same OTLP payload. Replace `OTLPSpanExporter(endpoint="http://localhost:4317")` with the vendor's endpoint and set `OTEL_EXPORTER_OTLP_HEADERS` to the auth token. No agent code changes required.
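
For the cost-tracking idea, a small helper can copy the cost from the response onto the active span. This is a sketch under the assumption stated above, that the response exposes `_hidden_params["response_cost"]`; the `record_cost` helper and the stand-in span and response objects are hypothetical, and a real span from `tracer.start_as_current_span(...)` would take the place of `FakeSpan`.

```python
from types import SimpleNamespace
from typing import Any, Optional


class FakeSpan:
    """Stand-in span that stores attributes in a dict; not an OpenTelemetry class."""

    def __init__(self) -> None:
        self.attributes: dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


def record_cost(span: Any, response: Any, attribute: str = "gen_ai.usage.cost") -> Optional[float]:
    """Set the call's cost on the span if the response reports one."""
    hidden = getattr(response, "_hidden_params", None) or {}
    cost = hidden.get("response_cost")
    if cost is None:
        return None  # nothing reported; leave the span untouched
    span.set_attribute(attribute, float(cost))
    return float(cost)


# Simulated responses: one with a reported cost, one without.
priced = SimpleNamespace(_hidden_params={"response_cost": 0.00012})
unpriced = SimpleNamespace()

span = FakeSpan()
recorded = record_cost(span, priced)
skipped = record_cost(FakeSpan(), unpriced)
print(recorded, skipped, span.attributes)
```

Leaving the attribute unset when no cost is reported keeps dashboard averages from being pulled toward zero by providers that do not supply pricing.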

## FAQ

### What span attributes does OpenLLMetry capture from LiteLLM calls?

OpenLLMetry captures `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.prompt_tokens`, `gen_ai.usage.completion_tokens`, `gen_ai.response.finish_reasons`, and `llm.request.type`. Tool spans, which the agent code in Step 3 creates, add `tool.name` and the serialized arguments.

### How do I forward traces from the console exporter to Grafana Tempo?

Replace ConsoleSpanExporter with OTLPSpanExporter in tracing_setup.py, pointing to your Tempo endpoint (default http://localhost:4317). The same span structure and agent code work unchanged; only the exporter line changes.

### Should I use SimpleSpanProcessor or BatchSpanProcessor?

Use SimpleSpanProcessor for scripts and tests because it flushes each span immediately. Use BatchSpanProcessor in long-running services for better throughput. Remember to call provider.force_flush() before process exit with BatchSpanProcessor.

### How do I query traces by model name or token count in Tempo?

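Use TraceQL queries like `{ span.gen_ai.request.model = "gpt-4o-mini" }` to filter by model, or `{ span.gen_ai.usage.total_tokens > 1000 }` to find expensive calls. The second query only matches spans on which `gen_ai.usage.total_tokens` is recorded, as the Step 4 demo does; the attribute table in Step 1 lists prompt and completion tokens separately.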

### Why are my tool call spans missing from the trace output?

Tool spans are created by your agent code with `tracer.start_as_current_span()`, not by OpenLLMetry. Confirm the span block wraps the tool execution and runs in the same execution context (thread or asyncio task) as the `agent.run` span, so it picks up the right parent.

## References

1. https://github.com/vercel-labs/open-agents
2. https://openai.com/index/running-codex-safely
