# Tracing LangGraph Agents with OpenTelemetry and Phoenix

> Build a two-tool LangGraph agent that emits OpenInference-compliant OpenTelemetry spans, then view per-node cost and latency attribution in a locally-running Phoenix instance. No proprietary infrastructure required: the full stack runs on your laptop with a single API key.

- Canonical URL: https://agentry.press/tutorial/tracing-langgraph-agents-with-opentelemetry-and-phoenix/
- Type: Tutorial
- Published: 2026-06-06
- By: agentry
- Tags: langgraph, opentelemetry, observability, tracing, phoenix, agents

---

## Why this matters

As agentic systems grow more capable, the gap between "it works in a notebook" and "it runs reliably in production" widens fast. Multi-step agents that call tools, branch on LLM output, and accumulate context across turns are notoriously hard to debug when something goes wrong. You need to know which node consumed the most tokens, which tool call added 800 ms of latency, and whether a retry loop is silently burning budget.

The open-source agent ecosystem has converged on two complementary standards: LangGraph for stateful agent orchestration and OpenInference (an OpenTelemetry semantic convention layer) for portable trace data. Phoenix, Arize's open-source observability UI, speaks OpenInference natively and runs entirely in-process, so you get a full trace viewer without standing up a separate collector or paying for a hosted service. Wiring these three pieces together into a single runnable project is the gap this tutorial fills. The result is a pattern you can drop into any LangGraph agent and immediately see per-node cost and latency in a browser tab.

## Prerequisites

- Python 3.11 or newer
- An OpenAI API key (only Step 6 makes a live call; you can substitute Anthropic with minor edits)
- Familiarity with async/await in Python
- Basic LangGraph knowledge (nodes, edges, `StateGraph`)
- No Docker required: Phoenix runs in-process via its embedded server mode

## Setup

Install all dependencies in one shot. `openinference-instrumentation-langchain` provides the auto-instrumentation hook that intercepts LangGraph's underlying LangChain calls and emits OpenInference-compliant spans.

```bash
uv pip install langgraph langchain-openai openai \
  arize-phoenix openinference-instrumentation-langchain \
  opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc \
  opentelemetry-exporter-otlp-proto-http httpx
```

Verify the key packages are present:

```python
from importlib.metadata import version
for pkg in ["langgraph", "arize-phoenix", "openinference-instrumentation-langchain", "opentelemetry-sdk"]:
    print(f"{pkg}: {version(pkg)}")
print("all packages found")
```
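
The check above stops with `PackageNotFoundError` at the first missing distribution. A slightly more forgiving sketch, assuming you would rather see every gap at once, collects the results into a dictionary. The helper name `installed_versions` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its version, or None if not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Record the gap instead of aborting the whole check.
            found[name] = None
    return found


required = [
    "langgraph",
    "arize-phoenix",
    "openinference-instrumentation-langchain",
    "opentelemetry-sdk",
]
for name, ver in installed_versions(required).items():
    print(f"{name}: {ver or 'MISSING'}")
```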

## Step 1: Start Phoenix in-process

Phoenix ships an embedded ASGI server that you can launch from Python. It listens on port 6006 by default and exposes an OTLP/HTTP endpoint at `/v1/traces`. The block below starts it in a background Python process under `nohup` so that it stays alive across subsequent code blocks.

```bash
nohup python -c "
import phoenix as px
px.launch_app()
import time
while True:
    time.sleep(60)
" > /tmp/phoenix.log 2>&1 & disown
sleep 6
curl -sf http://localhost:6006 -o /dev/null && echo "phoenix_up" || (echo "phoenix failed" >&2; cat /tmp/phoenix.log; exit 1)
```
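
The fixed `sleep 6` above is a guess at startup time. A minimal sketch of a readiness poll, using only the standard library, retries until the UI answers or a deadline passes. The helper name `wait_for_http` and its defaults are hypothetical and not part of Phoenix:

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout_s=30.0, interval_s=0.5):
    """Poll `url` until the server answers or `timeout_s` elapses.

    Returns True as soon as a response with a status below 500 arrives,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status < 500:
                    return True
        except urllib.error.HTTPError as exc:
            # The server is up but returned an error status such as 404.
            if exc.code < 500:
                return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the server is not ready yet.
            pass
        time.sleep(interval_s)
    return False
```

Calling `wait_for_http("http://localhost:6006")` after launching Phoenix would stand in for the sleep and the `curl` check, returning as soon as the server is reachable.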

## Step 2: Configure the OpenTelemetry pipeline

Set up a `TracerProvider` that exports spans to Phoenix over OTLP/HTTP, then register the LangChain/LangGraph auto-instrumentation. The `LangChainInstrumentor` patches LangChain's callback system, which LangGraph uses internally, so every node invocation, LLM call, and tool execution gets a span automatically.

```python
# filename: otel_setup.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, SimpleSpanProcessor, ConsoleSpanExporter
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.langchain import LangChainInstrumentor

PHOENIX_OTLP_ENDPOINT = "http://localhost:6006/v1/traces"

def setup_tracing(also_console: bool = False) -> TracerProvider:
    """Create and register a TracerProvider that sends spans to Phoenix."""
    provider = TracerProvider()

    # Primary exporter: Phoenix via OTLP/HTTP
    otlp_exporter = OTLPSpanExporter(endpoint=PHOENIX_OTLP_ENDPOINT)
    provider.add_span_processor(BatchSpanProcessor(otlp_exporter))

    if also_console:
        # Secondary exporter: console (synchronous, useful for local debugging)
        provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))

    trace.set_tracer_provider(provider)

    # Instrument LangChain/LangGraph callbacks
    LangChainInstrumentor().instrument(tracer_provider=provider)

    return provider
```
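
`PHOENIX_OTLP_ENDPOINT` is hard-coded above, which becomes awkward once Phoenix moves to another port or host. One possible sketch derives the traces URL from an environment variable instead. The variable name `PHOENIX_COLLECTOR_ENDPOINT` and the helper `resolve_otlp_endpoint` are assumptions made for illustration:

```python
import os
from urllib.parse import urlsplit, urlunsplit

DEFAULT_BASE_URL = "http://localhost:6006"
TRACES_PATH = "/v1/traces"


def resolve_otlp_endpoint(env=None):
    """Return the OTLP/HTTP traces URL for the configured Phoenix base URL."""
    env = os.environ if env is None else env
    # Assumed variable name; falls back to the local default used in this tutorial.
    base = env.get("PHOENIX_COLLECTOR_ENDPOINT", DEFAULT_BASE_URL)
    parts = urlsplit(base)
    path = parts.path.rstrip("/")
    # Append the traces path only if the caller did not already include it.
    if not path.endswith(TRACES_PATH):
        path += TRACES_PATH
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(resolve_otlp_endpoint({}))
```

The result could be passed as `endpoint=` to `OTLPSpanExporter` in place of the constant.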

## Step 3: Define the two tools

The agent will have access to a weather lookup tool and a unit-conversion tool. Both are synchronous functions decorated with `@tool`. Keeping them deterministic means the structural tests run without API keys.

```python
# filename: tools.py
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Return a mock current weather report for a city."""
    data = {
        "london": "12°C, overcast",
        "tokyo": "28°C, sunny",
        "new york": "19°C, partly cloudy",
    }
    return data.get(city.lower(), f"No data for {city}")

@tool
def convert_temperature(value: float, from_unit: str, to_unit: str) -> str:
    """Convert a temperature between Celsius and Fahrenheit."""
    from_unit = from_unit.lower()
    to_unit = to_unit.lower()
    if from_unit == "celsius" and to_unit == "fahrenheit":
        result = value * 9 / 5 + 32
        return f"{value}°C = {result:.1f}°F"
    elif from_unit == "fahrenheit" and to_unit == "celsius":
        result = (value - 32) * 5 / 9
        return f"{value}°F = {result:.1f}°C"
    return f"Unsupported conversion: {from_unit} -> {to_unit}"

ALL_TOOLS = [get_weather, convert_temperature]
```

## Step 4: Build the LangGraph agent

The agent follows the standard ReAct pattern: an `agent` node calls the LLM with tools bound, and a `tools` node executes whichever tool the LLM requested. The graph loops until the LLM emits a final answer with no tool calls.

The model client is created lazily inside `build_agent` so that the graph structure can be verified without a live API key.

```python
# filename: agent.py
from typing import Annotated, Sequence
from typing_extensions import TypedDict

from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langchain_core.runnables import RunnableConfig
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode

from tools import ALL_TOOLS


class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]


def build_agent(model=None):
    """Construct the LangGraph agent. Pass a model instance, or leave None to
    create a default ChatOpenAI (requires OPENAI_API_KEY in the environment)."""
    if model is None:
        from langchain_openai import ChatOpenAI
        model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    model_with_tools = model.bind_tools(ALL_TOOLS)
    tool_node = ToolNode(ALL_TOOLS)

    def agent_node(state: AgentState, config: RunnableConfig) -> dict:
        response = model_with_tools.invoke(state["messages"], config)
        return {"messages": [response]}

    def should_continue(state: AgentState) -> str:
        last = state["messages"][-1]
        if isinstance(last, AIMessage) and last.tool_calls:
            return "tools"
        return END

    graph = StateGraph(AgentState)
    graph.add_node("agent", agent_node)
    graph.add_node("tools", tool_node)
    graph.set_entry_point("agent")
    graph.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
    graph.add_edge("tools", "agent")

    return graph.compile()
```
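
The control flow that the graph encodes can be sketched without LangGraph at all. The following plain-Python illustration walks the same agent/tools loop; it is not how LangGraph executes internally, and `fake_llm`, `run_tool`, and the message dictionaries are hypothetical stand-ins:

```python
def fake_llm(messages):
    """Stand-in for the model: request one tool call, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        call = {"name": "get_weather", "args": {"city": "london"}}
        return {"role": "ai", "content": "", "tool_calls": [call]}
    answer = f"It is {messages[-1]['content']}."
    return {"role": "ai", "content": answer, "tool_calls": []}


def run_tool(call):
    """Stand-in for the tools node: execute one requested tool call."""
    weather = {"london": "12°C, overcast"}
    return {"role": "tool", "content": weather.get(call["args"]["city"], "unknown")}


def run_loop(question, max_turns=5):
    messages = [{"role": "human", "content": question}]
    for _ in range(max_turns):
        reply = fake_llm(messages)          # the "agent" node
        messages.append(reply)
        if not reply["tool_calls"]:         # should_continue returns END
            return messages
        for call in reply["tool_calls"]:    # the "tools" node
            messages.append(run_tool(call))
    raise RuntimeError("agent did not finish within max_turns")


history = run_loop("What is the weather in London?")
print([m["role"] for m in history])
```

The `max_turns` guard is there because a loop of this shape can spin indefinitely if the model keeps requesting tools; LangGraph enforces a comparable bound through its `recursion_limit` setting.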

## Step 5: Verify graph structure without an API key

Before making any live calls, confirm the graph compiles correctly and has the expected nodes. This block uses a stub model so no credentials are needed.

```python
from unittest.mock import MagicMock
from langchain_core.messages import AIMessage
from agent import build_agent

# Stub model that never calls OpenAI
stub = MagicMock()
stub.bind_tools.return_value = stub
stub.invoke.return_value = AIMessage(content="stub response", tool_calls=[])

app = build_agent(model=stub)
nodes = list(app.get_graph().nodes.keys())
print("nodes:", sorted(nodes))
assert "agent" in nodes, "missing agent node"
assert "tools" in nodes, "missing tools node"
print("graph_structure_ok")
```

## Step 6: Run the agent with tracing enabled

This block starts the OTel pipeline and invokes the agent with a real question. It is the only block that makes a live model call, so it requires `OPENAI_API_KEY` to be set in your environment.

```python
import os
from langchain_core.messages import HumanMessage
from otel_setup import setup_tracing
from agent import build_agent

# Boot the tracing pipeline (sends to Phoenix + console for visibility)
provider = setup_tracing(also_console=True)

app = build_agent()  # uses ChatOpenAI, needs OPENAI_API_KEY

question = (
    "What is the current weather in London? "
    "Also convert that temperature to Fahrenheit."
)

result = app.invoke({"messages": [HumanMessage(content=question)]})

final = result["messages"][-1]
print("Agent answer:", final.content)

# Flush all buffered spans to Phoenix before the process exits
provider.force_flush()
print("Spans flushed to Phoenix at http://localhost:6006")
```

## Step 7: Inspect traces in Phoenix

With the agent run complete and spans flushed, open `http://localhost:6006` in your browser. You will see a project called `default` containing a single trace. Expand it to find:

- A root span named after the LangGraph run
- Child spans for each `agent` node invocation, each containing an LLM span that carries the `llm.token_count.prompt`, `llm.token_count.completion`, and `llm.model_name` attributes
- Child spans for each `tools` node invocation, each carrying the tool name and its input/output

Phoenix aggregates token counts across spans and displays per-trace cost estimates in the "Cost" column of the traces table, using published OpenAI pricing by model name.

To filter by node type, enter a condition such as `span_kind == 'CHAIN'` in the span filter for LangGraph nodes, or `span_kind == 'LLM'` for model calls.
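
The arithmetic behind a cost estimate of this kind is simple to sketch. Assuming each LLM span carries prompt and completion token counts, the following groups span records by node and multiplies by per-million-token prices. The prices, the `node` field, and the sample spans are illustrative placeholders, not Phoenix's pricing table or data model:

```python
from collections import defaultdict

# Hypothetical USD prices per one million tokens; substitute real rates.
PRICE_PER_MTOK = {"example-model": {"prompt": 1.00, "completion": 2.00}}

# Hypothetical span records, flattened for the example.
spans = [
    {"node": "agent", "model": "example-model", "prompt": 1000, "completion": 500, "latency_ms": 800},
    {"node": "agent", "model": "example-model", "prompt": 3000, "completion": 250, "latency_ms": 600},
    {"node": "tools", "model": None, "prompt": 0, "completion": 0, "latency_ms": 5},
]


def span_cost(span):
    """Cost of one span in USD; spans without a priced model cost nothing."""
    price = PRICE_PER_MTOK.get(span["model"])
    if price is None:
        return 0.0
    tokens_cost = span["prompt"] * price["prompt"] + span["completion"] * price["completion"]
    return tokens_cost / 1_000_000


def attribute_by_node(records):
    """Sum cost and latency per node name."""
    totals = defaultdict(lambda: {"cost_usd": 0.0, "latency_ms": 0})
    for span in records:
        totals[span["node"]]["cost_usd"] += span_cost(span)
        totals[span["node"]]["latency_ms"] += span["latency_ms"]
    return dict(totals)


for node, total in attribute_by_node(spans).items():
    print(f"{node}: ${total['cost_usd']:.6f}, {total['latency_ms']} ms")
```

With these placeholder numbers the `agent` node accounts for all of the cost and nearly all of the latency, which is the kind of imbalance the per-node view is meant to surface.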

## Step 8: Add a custom span attribute

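Auto-instrumentation covers the standard attributes. For domain-specific metadata (for example, tagging a trace with the user's session ID or a feature flag), you can add attributes to the active span from inside any node. If no span is active in the OpenTelemetry context at that point, `get_current_span()` returns a non-recording span and `set_attribute` silently does nothing, so check `span.is_recording()` if the attributes fail to appear in Phoenix.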

```python
# filename: agent_with_custom_spans.py
from typing import Annotated, Sequence
from typing_extensions import TypedDict

from opentelemetry import trace as otel_trace
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langchain_core.runnables import RunnableConfig
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode

from tools import ALL_TOOLS

tracer = otel_trace.get_tracer(__name__)


class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    session_id: str


def build_traced_agent(model=None):
    if model is None:
        from langchain_openai import ChatOpenAI
        model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    model_with_tools = model.bind_tools(ALL_TOOLS)
    tool_node = ToolNode(ALL_TOOLS)

    def agent_node(state: AgentState, config: RunnableConfig) -> dict:
        span = otel_trace.get_current_span()
        span.set_attribute("session.id", state.get("session_id", "unknown"))
        span.set_attribute("agent.turn", len(state["messages"]))
        response = model_with_tools.invoke(state["messages"], config)
        return {"messages": [response]}

    def should_continue(state: AgentState) -> str:
        last = state["messages"][-1]
        if isinstance(last, AIMessage) and last.tool_calls:
            return "tools"
        return END

    graph = StateGraph(AgentState)
    graph.add_node("agent", agent_node)
    graph.add_node("tools", tool_node)
    graph.set_entry_point("agent")
    graph.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
    graph.add_edge("tools", "agent")

    return graph.compile()
```

Verify the extended graph compiles:

```python
from unittest.mock import MagicMock
from langchain_core.messages import AIMessage
from agent_with_custom_spans import build_traced_agent

stub = MagicMock()
stub.bind_tools.return_value = stub
stub.invoke.return_value = AIMessage(content="stub", tool_calls=[])

app2 = build_traced_agent(model=stub)
nodes2 = sorted(app2.get_graph().nodes.keys())
print("extended nodes:", nodes2)
assert "agent" in nodes2
print("extended_graph_ok")
```

## Step 9: Emit a console-only trace for local verification

This block wires a `SimpleSpanProcessor` with a `ConsoleSpanExporter` and runs the agent against the stub model. It proves the OTel pipeline fires without needing Phoenix or an API key.

```python
import io
from unittest.mock import MagicMock
from langchain_core.messages import AIMessage, HumanMessage
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from opentelemetry import trace
from openinference.instrumentation.langchain import LangChainInstrumentor
from agent import build_agent

# Fresh provider for this verification block
verify_provider = TracerProvider()
buf = io.StringIO()
verify_provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter(out=buf)))
trace.set_tracer_provider(verify_provider)

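# Re-instrument with the new provider. instrument() is ignored while the
# instrumentor is already active, so uninstrument first if needed.
instrumentor = LangChainInstrumentor()
if instrumentor.is_instrumented_by_opentelemetry:
    instrumentor.uninstrument()
instrumentor.instrument(tracer_provider=verify_provider, skip_dep_check=True)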

stub = MagicMock()
stub.bind_tools.return_value = stub
stub.invoke.return_value = AIMessage(content="The weather is fine.", tool_calls=[])

app = build_agent(model=stub)
app.invoke({"messages": [HumanMessage(content="What is the weather in Tokyo?")]})

verify_provider.force_flush()
output = buf.getvalue()

if output.strip():
    print("spans_emitted: YES")
    # Print first 400 chars of span JSON for inspection
    print(output[:400])
else:
    print("spans_emitted: NO (stub model may not trigger LangChain callbacks)")

print("console_trace_verification_done")
```
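
Printing the first 400 characters is enough to confirm that something was emitted, but not to inspect individual spans. The console exporter writes one JSON object per span, back to back, so a sketch along these lines can split the buffer into dictionaries. The function name `split_json_objects` and the trimmed sample are hypothetical:

```python
import json


def split_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated JSON."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Hypothetical, heavily trimmed sample in the shape the console exporter prints.
sample = """
{
    "name": "agent",
    "attributes": {"openinference.span.kind": "CHAIN"}
}
{
    "name": "LangGraph",
    "attributes": {"openinference.span.kind": "CHAIN"}
}
"""

names = [span["name"] for span in split_json_objects(sample)]
print(names)
```

Applied to `buf.getvalue()` from the block above, the same function would let you assert on span names or attributes rather than on raw text.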

## Verify it works

Run this end-to-end smoke test. It checks that all modules import cleanly, the graph compiles, and the OTel provider registers without errors.

```python
# End-to-end smoke test (no API key needed)
from importlib.metadata import version
from unittest.mock import MagicMock
from langchain_core.messages import AIMessage, HumanMessage
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from opentelemetry import trace
from openinference.instrumentation.langchain import LangChainInstrumentor
from agent import build_agent
from tools import get_weather, convert_temperature

# 1. Tool correctness
assert "12" in get_weather.invoke({"city": "london"}), "weather tool broken"
assert "53.6" in convert_temperature.invoke({"value": 12.0, "from_unit": "celsius", "to_unit": "fahrenheit"}), "conversion tool broken"

# 2. Graph structure
stub = MagicMock()
stub.bind_tools.return_value = stub
stub.invoke.return_value = AIMessage(content="done", tool_calls=[])
app = build_agent(model=stub)
assert "agent" in app.get_graph().nodes
assert "tools" in app.get_graph().nodes

# 3. OTel provider registers
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
LangChainInstrumentor().instrument(tracer_provider=provider, skip_dep_check=True)

print(f"langgraph {version('langgraph')} | arize-phoenix {version('arize-phoenix')}")
print("smoke_test_passed")
```

## Troubleshooting

**`ModuleNotFoundError: No module named 'openinference'`** — The package name on PyPI is `openinference-instrumentation-langchain`, not `openinference`. Re-run the install block and confirm the package appears in `uv pip list`.

**Phoenix UI shows no traces after the agent run** — The `BatchSpanProcessor` buffers spans and flushes asynchronously. Always call `provider.force_flush()` before your script exits, as shown in Step 6. If traces still don't appear, check `/tmp/phoenix.log` to confirm the embedded server started successfully.

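**Re-running `LangChainInstrumentor().instrument()` has no effect** — Calling `instrument()` a second time in the same process (common in notebooks) does not raise. OpenTelemetry logs `Attempting to instrument while already instrumented` and keeps the original configuration, so spans continue to go to the first provider. Call `LangChainInstrumentor().uninstrument()` before instrumenting again. Passing `skip_dep_check=True` only bypasses the dependency version check and does not change this behavior.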

**Token counts are missing from spans** — `gpt-4o-mini` returns usage metadata by default. If you switch to a different model or provider, confirm the LangChain integration for that provider populates token usage on the response (for example `usage_metadata` on the returned `AIMessage`, or `response_metadata["token_usage"]`), since the OpenInference instrumentor derives token counts from the model output.

**Port 6006 already in use** — Another Phoenix instance or a TensorBoard process may be bound to 6006. Call `px.launch_app(port=6007)` instead and update `PHOENIX_OTLP_ENDPOINT` in `otel_setup.py` accordingly.

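**`ChatOpenAI` fails before any request is sent** — The client checks that an API key is present at construction time, so a missing `OPENAI_API_KEY` raises as soon as `build_agent()` runs. An invalid key passes that check and surfaces as `AuthenticationError` on the first model call. Confirm `OPENAI_API_KEY` is exported in your shell before running Step 6.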

## Next steps

- **Add a retrieval tool**: Wire a vector-store lookup as a third tool and observe how Phoenix breaks down retrieval latency versus LLM latency in the waterfall view.
- **Export to Grafana Tempo**: Replace the OTLP/HTTP exporter endpoint with a Tempo instance (`http://localhost:4318/v1/traces`) to correlate agent traces with infrastructure metrics in Grafana dashboards. The span structure is identical; only the exporter endpoint changes.
- **Structured evaluation**: Use Phoenix's `run_evals` API to score each trace for hallucination or relevance automatically, feeding results back as span annotations.
- **Multi-agent tracing**: Extend the pattern to a supervisor-worker LangGraph topology and observe how Phoenix groups child agent spans under the parent trace context.

## FAQ

### How does Phoenix display per-node cost and latency for LangGraph agents?

Phoenix receives OpenTelemetry spans emitted by the LangChainInstrumentor, which auto-instruments LangGraph's underlying LangChain calls. Each node invocation and tool execution generates a span carrying token counts and timing data. Phoenix aggregates token counts across spans and calculates per-trace cost estimates using published OpenAI pricing by model name, then displays breakdowns in its waterfall view.

### What is OpenInference and why does it matter for agent observability?

OpenInference is a semantic convention layer on top of OpenTelemetry that standardizes how LLM and agent spans are structured and attributed. Because the spans remain ordinary OpenTelemetry data, traces emitted by a LangGraph agent can be viewed in Phoenix or sent to Grafana Tempo and other OTLP-compatible backends by changing only the exporter endpoint.

### Do I need Docker or a separate collector to run Phoenix with LangGraph?

No. Phoenix ships an embedded ASGI server that runs in-process on your laptop, listening on port 6006 with an OTLP/HTTP endpoint at `/v1/traces`. You launch it from Python and export spans directly to it without standing up external infrastructure.

### How do I add custom metadata like session IDs to agent traces?

Inside any LangGraph node, call `otel_trace.get_current_span()` and use `span.set_attribute(key, value)` to attach domain-specific metadata. The OpenTelemetry SDK automatically associates these attributes with the active span and includes them in exported traces.

### What happens if I forget to call provider.force_flush() before the script exits?

The BatchSpanProcessor buffers spans and flushes asynchronously. Without an explicit `force_flush()` call, buffered spans may not reach Phoenix before the process terminates. Always call `provider.force_flush()` at the end of your script to ensure all spans are exported.
