Why this matters
As agentic systems grow more capable, the gap between “it works in a notebook” and “it runs reliably in production” widens fast. Multi-step agents that call tools, branch on LLM output, and accumulate context across turns are notoriously hard to debug when something goes wrong. You need to know which node consumed the most tokens, which tool call added 800 ms of latency, and whether a retry loop is silently burning budget.
The open-source agent ecosystem has converged on two complementary standards: LangGraph for stateful agent orchestration and OpenInference (an OpenTelemetry semantic convention layer) for portable trace data. Phoenix, Arize’s open-source observability UI, speaks OpenInference natively and runs entirely in-process, so you get a full trace viewer without standing up a separate collector or paying for a hosted service. Wiring these three pieces together into a single runnable project is the gap this tutorial fills. The result is a pattern you can drop into any LangGraph agent and immediately see per-node cost and latency in a browser tab.
Prerequisites
- Python 3.11 or newer
- An OpenAI API key (the live-call blocks are marked skip; you can substitute Anthropic with minor edits)
- Familiarity with async/await in Python
- Basic LangGraph knowledge (nodes, edges, StateGraph)
- No Docker required: Phoenix runs in-process via its embedded server mode
Setup
Install all dependencies in one shot. openinference-instrumentation-langchain provides the auto-instrumentation hook that intercepts LangGraph’s underlying LangChain calls and emits OpenInference-compliant spans.
uv pip install langgraph langchain-openai openai \
arize-phoenix openinference-instrumentation-langchain \
opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc \
opentelemetry-exporter-otlp-proto-http httpx
Verify the key packages are present:
from importlib.metadata import version
for pkg in ["langgraph", "arize-phoenix", "openinference-instrumentation-langchain", "opentelemetry-sdk"]:
    print(f"{pkg}: {version(pkg)}")
print("all packages found")
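If one of the packages is absent, the version lookup stops at the first failure with PackageNotFoundError. A minimal sketch that instead collects every missing distribution in one pass is shown here; the helper name find_missing and the REQUIRED list are illustrative, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the install command above.
REQUIRED = [
    "langgraph",
    "arize-phoenix",
    "openinference-instrumentation-langchain",
    "opentelemetry-sdk",
]


def find_missing(packages):
    """Return the distribution names that are not installed (hypothetical helper)."""
    missing = []
    for pkg in packages:
        try:
            version(pkg)
        except PackageNotFoundError:
            missing.append(pkg)
    return missing


missing = find_missing(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```

An empty result means the install step completed; otherwise the output names exactly which packages to reinstall.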
Step 1: Start Phoenix in-process
Phoenix ships an embedded ASGI server that you can launch from Python. It listens on port 6006 by default and exposes an OTLP/HTTP endpoint at /v1/traces. Launching it with nohup keeps it alive across subsequent code blocks.
nohup python -c "
import phoenix as px
px.launch_app()
import time
while True:
    time.sleep(60)
" > /tmp/phoenix.log 2>&1 & disown
sleep 6
curl -sf http://localhost:6006 -o /dev/null && echo "phoenix_up" || (echo "phoenix failed" >&2; cat /tmp/phoenix.log; exit 1)
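The fixed six-second sleep can be too short on a slow machine and is wasted time on a fast one. One alternative, sketched here with the standard library only, is to poll the UI address until it answers. The function name wait_for_http and its default timings are assumptions made for illustration; only the URL comes from this tutorial.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll `url` until it answers with a 2xx/3xx status or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if 200 <= resp.status < 400:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet
        time.sleep(interval_s)
    return False
```

Calling wait_for_http("http://localhost:6006") returns True as soon as the server responds, and False if it never comes up within the timeout, at which point /tmp/phoenix.log is the place to look.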
Step 2: Configure the OpenTelemetry pipeline
Set up a TracerProvider that exports spans to Phoenix over OTLP/HTTP, then register the LangChain/LangGraph auto-instrumentation. The LangChainInstrumentor patches LangChain’s callback system, which LangGraph uses internally, so every node invocation, LLM call, and tool execution gets a span automatically.
# filename: otel_setup.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, SimpleSpanProcessor, ConsoleSpanExporter
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.langchain import LangChainInstrumentor

PHOENIX_OTLP_ENDPOINT = "http://localhost:6006/v1/traces"

def setup_tracing(also_console: bool = False) -> TracerProvider:
    """Create and register a TracerProvider that sends spans to Phoenix."""
    provider = TracerProvider()
    # Primary exporter: Phoenix via OTLP/HTTP
    otlp_exporter = OTLPSpanExporter(endpoint=PHOENIX_OTLP_ENDPOINT)
    provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
    if also_console:
        # Secondary exporter: console (synchronous, useful for local debugging)
        provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)
    # Instrument LangChain/LangGraph callbacks
    LangChainInstrumentor().instrument(tracer_provider=provider)
    return provider
Step 3: Define the two tools
The agent will have access to a weather lookup tool and a unit-conversion tool. Both are synchronous functions decorated with @tool. Keeping them deterministic means the structural tests run without API keys.
# filename: tools.py
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Return a mock current weather report for a city."""
    data = {
        "london": "12°C, overcast",
        "tokyo": "28°C, sunny",
        "new york": "19°C, partly cloudy",
    }
    return data.get(city.lower(), f"No data for {city}")

@tool
def convert_temperature(value: float, from_unit: str, to_unit: str) -> str:
    """Convert a temperature between Celsius and Fahrenheit."""
    from_unit = from_unit.lower()
    to_unit = to_unit.lower()
    if from_unit == "celsius" and to_unit == "fahrenheit":
        result = value * 9 / 5 + 32
        return f"{value}°C = {result:.1f}°F"
    elif from_unit == "fahrenheit" and to_unit == "celsius":
        result = (value - 32) * 5 / 9
        return f"{value}°F = {result:.1f}°C"
    return f"Unsupported conversion: {from_unit} -> {to_unit}"

ALL_TOOLS = [get_weather, convert_temperature]
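In the live agent the LLM reads the weather string and decides which number to pass to the conversion tool. For a deterministic test of that hand-off, the same extraction can be done in plain Python. The sketch below is independent of LangChain; parse_celsius and celsius_to_fahrenheit are hypothetical helpers that mirror the mock data format and the formula used in tools.py.

```python
import re
from typing import Optional

# Matches an optionally signed number directly before "°C", e.g. "12°C" or "-3.5 °C".
_CELSIUS = re.compile(r"(-?\d+(?:\.\d+)?)\s*°C")


def parse_celsius(report: str) -> Optional[float]:
    """Extract the Celsius reading from a report such as '12°C, overcast'."""
    match = _CELSIUS.search(report)
    return float(match.group(1)) if match else None


def celsius_to_fahrenheit(celsius: float) -> float:
    """Same formula as the Celsius-to-Fahrenheit branch of convert_temperature."""
    return celsius * 9 / 5 + 32


report = "12°C, overcast"  # same shape as the mock data in tools.py
celsius = parse_celsius(report)
if celsius is not None:
    print(f"{celsius}°C = {celsius_to_fahrenheit(celsius):.1f}°F")
```

A report without a temperature, such as the "No data for ..." fallback, yields None rather than raising, so a test can assert on both paths.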
Step 4: Build the LangGraph agent
The agent follows the standard ReAct pattern: an agent node calls the LLM with tools bound, and a tools node executes whichever tool the LLM requested. The graph loops until the LLM emits a final answer with no tool calls.
The model client is created lazily inside build_agent so that the graph structure can be verified without a live API key.
# filename: agent.py
from typing import Annotated, Sequence
from typing_extensions import TypedDict
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langchain_core.runnables import RunnableConfig
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode
from tools import ALL_TOOLS

class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]

def build_agent(model=None):
    """Construct the LangGraph agent. Pass a model instance or leave None to
    create a default ChatOpenAI (requires OPENAI_API_KEY at call time)."""
    if model is None:
        from langchain_openai import ChatOpenAI
        model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    model_with_tools = model.bind_tools(ALL_TOOLS)
    tool_node = ToolNode(ALL_TOOLS)

    def agent_node(state: AgentState, config: RunnableConfig) -> dict:
        response = model_with_tools.invoke(state["messages"], config)
        return {"messages": [response]}

    def should_continue(state: AgentState) -> str:
        last = state["messages"][-1]
        if isinstance(last, AIMessage) and last.tool_calls:
            return "tools"
        return END

    graph = StateGraph(AgentState)
    graph.add_node("agent", agent_node)
    graph.add_node("tools", tool_node)
    graph.set_entry_point("agent")
    graph.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
    graph.add_edge("tools", "agent")
    return graph.compile()
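To see the control flow without LangGraph or a model, the loop can be imitated in plain Python. This is a toy model of the routing described above, not how LangGraph executes a graph: Msg, run_loop, scripted_agent, and fake_tools are invented for illustration, and END is a local stand-in for the constant imported in agent.py.

```python
from dataclasses import dataclass, field

END = "__end__"  # local stand-in for langgraph.graph.END


@dataclass
class Msg:
    role: str  # "human", "ai", or "tool"
    content: str = ""
    tool_calls: list = field(default_factory=list)


def should_continue(messages):
    """Route to the tools step only when the last AI message requests a tool."""
    last = messages[-1]
    return "tools" if last.role == "ai" and last.tool_calls else END


def run_loop(messages, agent_step, tool_step, max_steps=10):
    """Alternate agent and tools steps until the router returns END."""
    visited = []
    for _ in range(max_steps):
        messages = messages + [agent_step(messages)]
        visited.append("agent")
        if should_continue(messages) == END:
            return messages, visited
        messages = messages + tool_step(messages[-1])
        visited.append("tools")
    raise RuntimeError("step limit reached without a final answer")


def scripted_agent(messages):
    # Request a tool on the first turn; answer once a tool result is present.
    if any(m.role == "tool" for m in messages):
        return Msg("ai", "London is 12°C, overcast.")
    return Msg("ai", tool_calls=[{"name": "get_weather", "args": {"city": "london"}}])


def fake_tools(ai_msg):
    # One tool message per requested call, like the tools node.
    return [Msg("tool", f"result of {call['name']}") for call in ai_msg.tool_calls]


final, path = run_loop([Msg("human", "Weather in London?")], scripted_agent, fake_tools)
print(path)  # ['agent', 'tools', 'agent']
```

The visited path corresponds to the node spans you should expect in a trace for a single-tool question: one agent span that requests the tool, one tools span, and a final agent span that produces the answer.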
Step 5: Verify graph structure without an API key
Before making any live calls, confirm the graph compiles correctly and has the expected nodes. This block uses a stub model so no credentials are needed.
from unittest.mock import MagicMock
from langchain_core.messages import AIMessage
from agent import build_agent
# Stub model that never calls OpenAI
stub = MagicMock()
stub.bind_tools.return_value = stub
stub.invoke.return_value = AIMessage(content="stub response", tool_calls=[])
app = build_agent(model=stub)
nodes = list(app.get_graph().nodes.keys())
print("nodes:", sorted(nodes))
assert "agent" in nodes, "missing agent node"
assert "tools" in nodes, "missing tools node"
print("graph_structure_ok")
Step 6: Run the agent with tracing enabled
This block starts the OTel pipeline and invokes the agent with a real question. It requires OPENAI_API_KEY to be set in your environment, so it is marked as a skip block in the sandbox.
import os
from langchain_core.messages import HumanMessage
from otel_setup import setup_tracing
from agent import build_agent
# Boot the tracing pipeline (sends to Phoenix + console for visibility)
provider = setup_tracing(also_console=True)
app = build_agent() # uses ChatOpenAI, needs OPENAI_API_KEY
question = (
"What is the current weather in London? "
"Also convert that temperature to Fahrenheit."
)
result = app.invoke({"messages": [HumanMessage(content=question)]})
final = result["messages"][-1]
print("Agent answer:", final.content)
# Flush all buffered spans to Phoenix before the process exits
provider.force_flush()
print("Spans flushed to Phoenix at http://localhost:6006")
Step 7: Inspect traces in Phoenix
With the agent run complete and spans flushed, open http://localhost:6006 in your browser. You will see a project called default containing a single trace. Expand it to find:
- A root span named after the LangGraph run
- Child spans for each agent node invocation, each carrying llm.token_count.prompt, llm.token_count.completion, and llm.model_name attributes
- Child spans for each tools node invocation, each carrying the tool name and its input/output
Phoenix aggregates token counts across spans and displays per-trace cost estimates in the “Cost” column of the traces table, using published OpenAI pricing by model name.
To filter by span type, enter a condition in the filter bar of the spans view: span_kind == 'CHAIN' for LangGraph nodes or span_kind == 'LLM' for model calls. The underlying OpenInference attribute is openinference.span.kind.
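The cost estimate is simple arithmetic over the token-count attributes listed above. The sketch below reproduces that arithmetic on plain dictionaries so the numbers in the UI can be sanity-checked by hand. It is not Phoenix's implementation; the Price class, the helper names, and above all the prices are placeholders, so substitute the provider's current published rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    prompt_per_million: float      # USD per 1M prompt tokens
    completion_per_million: float  # USD per 1M completion tokens


# Placeholder prices for illustration only; these are NOT real pricing.
PRICES = {"example-model": Price(prompt_per_million=1.00, completion_per_million=2.00)}


def span_cost(attrs, prices=PRICES):
    """Estimate the cost of one LLM span, or None if the model is not priced."""
    price = prices.get(attrs.get("llm.model_name"))
    if price is None:
        return None
    prompt = attrs.get("llm.token_count.prompt", 0)
    completion = attrs.get("llm.token_count.completion", 0)
    total = prompt * price.prompt_per_million + completion * price.completion_per_million
    return total / 1_000_000


def trace_cost(spans, prices=PRICES):
    """Sum the estimates of every priced span in a trace."""
    costs = (span_cost(span, prices) for span in spans)
    return sum(cost for cost in costs if cost is not None)


spans = [
    {"llm.model_name": "example-model", "llm.token_count.prompt": 1000, "llm.token_count.completion": 500},
    {"name": "tools"},  # non-LLM spans carry no token counts and are skipped
]
print(f"estimated cost: ${trace_cost(spans):.6f}")
```

With the placeholder rates, 1,000 prompt tokens and 500 completion tokens come to (1000 × 1.00 + 500 × 2.00) / 1,000,000 = 0.002 USD.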
Step 8: Add a custom span attribute
Auto-instrumentation covers the standard attributes. For domain-specific metadata (for example, tagging a trace with the user’s session ID or a feature flag), you can add attributes to the active span from inside any node.
# filename: agent_with_custom_spans.py
from typing import Annotated, Sequence
from typing_extensions import TypedDict
from opentelemetry import trace as otel_trace
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langchain_core.runnables import RunnableConfig
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode
from tools import ALL_TOOLS

tracer = otel_trace.get_tracer(__name__)

class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    session_id: str

def build_traced_agent(model=None):
    if model is None:
        from langchain_openai import ChatOpenAI
        model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    model_with_tools = model.bind_tools(ALL_TOOLS)
    tool_node = ToolNode(ALL_TOOLS)

    def agent_node(state: AgentState, config: RunnableConfig) -> dict:
        span = otel_trace.get_current_span()
        span.set_attribute("session.id", state.get("session_id", "unknown"))
        span.set_attribute("agent.turn", len(state["messages"]))
        response = model_with_tools.invoke(state["messages"], config)
        return {"messages": [response]}

    def should_continue(state: AgentState) -> str:
        last = state["messages"][-1]
        if isinstance(last, AIMessage) and last.tool_calls:
            return "tools"
        return END

    graph = StateGraph(AgentState)
    graph.add_node("agent", agent_node)
    graph.add_node("tools", tool_node)
    graph.set_entry_point("agent")
    graph.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
    graph.add_edge("tools", "agent")
    return graph.compile()
Verify the extended graph compiles:
from unittest.mock import MagicMock
from langchain_core.messages import AIMessage
from agent_with_custom_spans import build_traced_agent
stub = MagicMock()
stub.bind_tools.return_value = stub
stub.invoke.return_value = AIMessage(content="stub", tool_calls=[])
app2 = build_traced_agent(model=stub)
nodes2 = sorted(app2.get_graph().nodes.keys())
print("extended nodes:", nodes2)
assert "agent" in nodes2
print("extended_graph_ok")
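OpenTelemetry attribute values are limited to primitives (strings, booleans, numbers) and homogeneous sequences of them, so nested metadata such as a feature-flag dictionary has to be flattened or serialized before it is passed to set_attribute. The following sketch shows one way to do that. The helper names and the "app" prefix are illustrative assumptions, and serializing to JSON is just one reasonable choice.

```python
import json
from typing import Any

_PRIMITIVES = (str, bool, int, float)


def to_attribute_value(value: Any):
    """Return `value` unchanged if it is a valid attribute value, else a JSON string."""
    if isinstance(value, _PRIMITIVES):
        return value
    if (
        isinstance(value, (list, tuple))
        and value
        and isinstance(value[0], _PRIMITIVES)
        and all(type(item) is type(value[0]) for item in value)
    ):
        return list(value)  # homogeneous sequence of primitives
    return json.dumps(value, default=str, sort_keys=True)


def flatten_metadata(prefix: str, metadata: dict) -> dict:
    """Build a flat {attribute_name: value} mapping, skipping None values."""
    return {
        f"{prefix}.{key}": to_attribute_value(value)
        for key, value in metadata.items()
        if value is not None
    }


attrs = flatten_metadata(
    "app",
    {"session_id": "abc", "turn": 3, "flags": {"beta": True}, "tags": ["a", "b"], "user": None},
)
for name, value in attrs.items():
    print(name, "=", value)
```

Inside a node, the resulting mapping can be applied with a loop of span.set_attribute(name, value) calls on the current span.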
Step 9: Emit a console-only trace for local verification
This block wires a SimpleSpanProcessor with a ConsoleSpanExporter and runs the agent against the stub model. It proves the OTel pipeline fires without needing Phoenix or an API key.
import io, sys
from unittest.mock import MagicMock
from langchain_core.messages import AIMessage, HumanMessage
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from opentelemetry import trace
from openinference.instrumentation.langchain import LangChainInstrumentor
from agent import build_agent
# Fresh provider for this verification block
verify_provider = TracerProvider()
buf = io.StringIO()
verify_provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter(out=buf)))
trace.set_tracer_provider(verify_provider)
# Re-instrument with the new provider
LangChainInstrumentor().instrument(tracer_provider=verify_provider, skip_dep_check=True)
stub = MagicMock()
stub.bind_tools.return_value = stub
stub.invoke.return_value = AIMessage(content="The weather is fine.", tool_calls=[])
app = build_agent(model=stub)
app.invoke({"messages": [HumanMessage(content="What is the weather in Tokyo?")]})
verify_provider.force_flush()
output = buf.getvalue()
if output.strip():
    print("spans_emitted: YES")
    # Print first 400 chars of span JSON for inspection
    print(output[:400])
else:
    print("spans_emitted: NO (stub model may not trigger LangChain callbacks)")
print("console_trace_verification_done")
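Printing the first 400 characters confirms that something was emitted, but not which spans. Assuming the exporter's default formatter, which writes each span as a pretty-printed JSON object followed by a newline, the buffer can be split into individual documents with the standard library. The helper names and the sample text are illustrative; real output contains many more fields per span.

```python
import json


def iter_json_documents(text: str):
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def span_names(text: str):
    """Return the 'name' field of every span document in the text."""
    return [doc.get("name") for doc in iter_json_documents(text)]


# Hand-written sample in the assumed shape, not captured exporter output.
sample = '{\n    "name": "LangGraph",\n    "kind": "SpanKind.INTERNAL"\n}\n{\n    "name": "agent"\n}\n'
print(span_names(sample))  # ['LangGraph', 'agent']
```

Passing buf.getvalue() to span_names then gives a list you can assert against, for example to check that a span for the agent node is present.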
Verify it works
Run this end-to-end smoke test. It checks that all modules import cleanly, the graph compiles, and the OTel provider registers without errors.
# End-to-end smoke test (no API key needed)
from importlib.metadata import version
from unittest.mock import MagicMock
from langchain_core.messages import AIMessage, HumanMessage
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from opentelemetry import trace
from openinference.instrumentation.langchain import LangChainInstrumentor
from agent import build_agent
from tools import get_weather, convert_temperature
# 1. Tool correctness
assert "12" in get_weather.invoke({"city": "london"}), "weather tool broken"
assert "53.6" in convert_temperature.invoke({"value": 12.0, "from_unit": "celsius", "to_unit": "fahrenheit"}), "conversion tool broken"
# 2. Graph structure
stub = MagicMock()
stub.bind_tools.return_value = stub
stub.invoke.return_value = AIMessage(content="done", tool_calls=[])
app = build_agent(model=stub)
assert "agent" in app.get_graph().nodes
assert "tools" in app.get_graph().nodes
# 3. OTel provider registers
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
LangChainInstrumentor().instrument(tracer_provider=provider, skip_dep_check=True)
print(f"langgraph {version('langgraph')} | arize-phoenix {version('arize-phoenix')}")
print("smoke_test_passed")
Troubleshooting
ModuleNotFoundError: No module named 'openinference' — The package name on PyPI is openinference-instrumentation-langchain, not openinference. Re-run the install block and confirm the package appears in uv pip list.
Phoenix UI shows no traces after the agent run — The BatchSpanProcessor buffers spans and flushes asynchronously. Always call provider.force_flush() before your script exits, as shown in Step 6. If traces still don’t appear, check /tmp/phoenix.log to confirm the embedded server started successfully.
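A second LangChainInstrumentor().instrument() call has no effect — OpenTelemetry instrumentors skip instrumentation when they are already active in the process (common in notebooks). The call logs a warning such as "Attempting to instrument while already instrumented" and keeps the tracer provider from the first call, so spans never reach the new provider. Call LangChainInstrumentor().uninstrument() first, then instrument() again with the new provider. Note that skip_dep_check=True only skips the dependency version check and does not change this behavior.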
Token counts are missing from spans — gpt-4o-mini returns usage metadata by default. If you switch to a different model or provider, confirm the LangChain integration for that provider populates response_metadata["token_usage"]. The OpenInference instrumentor reads from that key.
Port 6006 already in use — Another Phoenix instance or a TensorBoard process may be bound to 6006. Call px.launch_app(port=6007) instead, then update PHOENIX_OTLP_ENDPOINT in otel_setup.py and the curl health check in Step 1 to use the new port.
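ChatOpenAI fails with a missing-key error or AuthenticationError — The client checks that a key is present when it is constructed, so an unset OPENAI_API_KEY fails inside build_agent() before any request is sent. A key that is set but invalid or revoked passes construction and raises AuthenticationError on the first model call. Confirm OPENAI_API_KEY is exported in your shell before running Step 6.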
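For the port conflict in particular, it can help to check availability before launching Phoenix. A minimal sketch using only the socket module follows. The helper names are hypothetical, and the check is inherently racy, since another process can claim the port between the check and the launch.

```python
import socket
from typing import Optional


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start: int = 6006, attempts: int = 10, host: str = "127.0.0.1") -> Optional[int]:
    """Return the first free port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if port_is_free(port, host):
            return port
    return None


print("first free port from 6006:", first_free_port())
```

Whatever port this returns is the one to use consistently for the Phoenix launch, the health check, and PHOENIX_OTLP_ENDPOINT.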
Next steps
- Add a retrieval tool: Wire a vector-store lookup as a third tool and observe how Phoenix breaks down retrieval latency versus LLM latency in the waterfall view.
- Export to Grafana Tempo: Replace the OTLP/HTTP exporter endpoint with a Tempo instance (http://localhost:4318/v1/traces) to correlate agent traces with infrastructure metrics in Grafana dashboards. The span structure is identical; only the exporter endpoint changes.
- Structured evaluation: Use Phoenix’s run_evals API to score each trace for hallucination or relevance automatically, feeding results back as span annotations.
- Multi-agent tracing: Extend the pattern to a supervisor-worker LangGraph topology and observe how Phoenix groups child agent spans under the parent trace context.
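Because switching backends only changes the exporter endpoint, one option is to resolve that endpoint from the environment instead of hard-coding it. The sketch below follows the standard OpenTelemetry variable names: OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is used as given, while OTEL_EXPORTER_OTLP_ENDPOINT is treated as a base URL to which /v1/traces is appended. The function name and the fallback to the Phoenix address from this tutorial are assumptions.

```python
import os
from typing import Mapping, Optional

DEFAULT_ENDPOINT = "http://localhost:6006/v1/traces"  # Phoenix address used in this tutorial


def resolve_traces_endpoint(
    env: Optional[Mapping[str, str]] = None,
    default: str = DEFAULT_ENDPOINT,
) -> str:
    """Pick the OTLP/HTTP traces endpoint from the environment, else the default."""
    env = os.environ if env is None else env
    specific = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if specific:
        return specific  # signal-specific URL is used exactly as given
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if base:
        return base.rstrip("/") + "/v1/traces"  # base URL gets the traces path
    return default


print(resolve_traces_endpoint({"OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318"}))
# http://localhost:4318/v1/traces
```

The returned string can be passed as the endpoint argument of OTLPSpanExporter in otel_setup.py, which leaves the rest of the pipeline unchanged.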
FAQ
How does Phoenix display per-node cost and latency for LangGraph agents?
Phoenix receives OpenTelemetry spans emitted by the LangChainInstrumentor, which auto-instruments LangGraph’s underlying LangChain calls. Each node invocation and tool execution generates a span carrying token counts and timing data. Phoenix aggregates token counts across spans and calculates per-trace cost estimates using published OpenAI pricing by model name, then displays breakdowns in its waterfall view.
What is OpenInference and why does it matter for agent observability?
OpenInference is a semantic convention layer on top of OpenTelemetry that standardizes how LLM and agent spans are structured and attributed. It ensures portable trace data across different observability backends, so traces emitted by a LangGraph agent can be viewed in Phoenix, Grafana Tempo, or other OpenInference-compatible systems without code changes.
Do I need Docker or a separate collector to run Phoenix with LangGraph?
No. Phoenix ships an embedded ASGI server that runs in-process on your laptop, listening on port 6006 with an OTLP/HTTP endpoint at /v1/traces. You launch it from Python and export spans directly to it without standing up external infrastructure.
How do I add custom metadata like session IDs to agent traces?
Inside any LangGraph node, call otel_trace.get_current_span() and use span.set_attribute(key, value) to attach domain-specific metadata. The OpenTelemetry SDK automatically associates these attributes with the active span and includes them in exported traces.
What happens if I forget to call provider.force_flush() before the script exits?
The BatchSpanProcessor buffers spans and flushes asynchronously. Without an explicit force_flush() call, buffered spans may not reach Phoenix before the process terminates. Always call provider.force_flush() at the end of your script to ensure all spans are exported.
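A flush placed on the last line of a script is skipped when the agent raises, which is exactly when the trace matters most. One way to guard against that, sketched here, is to wrap the work in try/finally. The helper run_and_flush is hypothetical; it only assumes the provider exposes force_flush(timeout_millis), as the SDK's TracerProvider does.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_and_flush(provider, work: Callable[[], T], timeout_millis: int = 30_000) -> T:
    """Run `work()` and flush buffered spans afterwards, even if it raises."""
    try:
        return work()
    finally:
        # Runs on success and on failure, so error traces are exported too.
        provider.force_flush(timeout_millis)
```

In Step 6 this would look like run_and_flush(provider, lambda: app.invoke({"messages": [HumanMessage(content=question)]})), replacing the separate invoke and force_flush calls.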