Why this matters
Multi-agent systems are increasingly the default architecture for production LLM applications. CrewAI alone has millions of downloads, and the pattern of chaining specialized agents through tool calls is now standard practice. Yet most teams still debug these systems with print() statements scattered across agent callbacks, which collapses the concurrent, hierarchical structure of an agent run into a flat log stream.
The problem compounds at scale. When a research agent delegates to a scraping sub-agent that calls three tools in parallel, a flat log tells you the outputs but not the latency breakdown, the token counts per step, or which tool call caused a retry. Structured tracing solves this by preserving the parent-child relationship between spans, attaching attributes like token usage and model name, and making the whole run queryable after the fact.
Logfire’s local mode lets you capture this trace data to a local SQLite file without sending anything to a remote endpoint. That means you can instrument a CrewAI workflow, inspect the trace structure, and validate your spans in a single terminal session, with no API key required for the observability backend.
Prerequisites
- Python 3.11 or later
- An Anthropic or OpenAI API key (needed only for the live-call block in Step 7; the examples default to openai/gpt-4o-mini)
- Familiarity with CrewAI’s Agent/Task/Crew primitives
- Basic understanding of OpenTelemetry concepts (spans, traces, exporters)
Setup
Install CrewAI, the OpenTelemetry SDK, and Logfire with its OpenTelemetry support:
uv pip install crewai opentelemetry-sdk opentelemetry-api logfire
Verify the key packages are present:
from importlib.metadata import version

for pkg in ["crewai", "opentelemetry-sdk", "logfire"]:
    try:
        print(f"{pkg}: {version(pkg)}")
    except Exception as e:
        print(f"{pkg}: not found ({e})")
print("imports ok")
Step 1: The print-debug baseline
Here is a representative CrewAI workflow that uses print() for visibility. It defines two agents: a researcher and a summarizer. The researcher uses a mock tool that simulates a web search.
# filename: baseline_crew.py
from typing import Type

from crewai import Agent, Task, Crew
from crewai.tools import BaseTool
from pydantic import BaseModel, Field


class SearchInput(BaseModel):
    query: str = Field(description="Search query string")


class MockSearchTool(BaseTool):
    name: str = "web_search"
    description: str = "Search the web for information on a topic."
    args_schema: Type[BaseModel] = SearchInput

    def _run(self, query: str) -> str:
        print(f"[TOOL] web_search called with query='{query}'")
        # Simulate a search result (no network call is made)
        result = f"Search results for '{query}': Found 3 relevant articles about {query}."
        print(f"[TOOL] web_search returned {len(result)} chars")
        return result


def build_crew(topic: str) -> Crew:
    search_tool = MockSearchTool()
    researcher = Agent(
        role="Research Analyst",
        goal=f"Find key facts about {topic}",
        backstory="You are a thorough research analyst who finds accurate information.",
        tools=[search_tool],
        verbose=False,
        llm="openai/gpt-4o-mini",
    )
    summarizer = Agent(
        role="Content Summarizer",
        goal="Produce a concise summary from research findings",
        backstory="You distill complex research into clear, actionable summaries.",
        verbose=False,
        llm="openai/gpt-4o-mini",
    )
    research_task = Task(
        description=f"Research the topic: {topic}. Use the web_search tool to gather information.",
        expected_output="A list of key facts about the topic.",
        agent=researcher,
    )
    summary_task = Task(
        description="Summarize the research findings into 3 bullet points.",
        expected_output="Three bullet points summarizing the research.",
        agent=summarizer,
        context=[research_task],
    )
    # Printed when the crew is built; the run itself starts at crew.kickoff()
    print(f"[CREW] Starting crew for topic: {topic}")
    return Crew(
        agents=[researcher, summarizer],
        tasks=[research_task, summary_task],
        verbose=False,
    )
The print statements give you some signal, but they’re unstructured, have no timestamps, carry no duration information, and disappear into stdout with no way to query them later.
Step 2: Set up Logfire in local mode
Logfire can write traces to a local SQLite file instead of its hosted backend. This is the right choice for development and for running this tutorial without a Logfire account.
# filename: otel_setup.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from opentelemetry.sdk.resources import Resource


def configure_local_tracing(service_name: str = "crewai-agent") -> TracerProvider:
    """
    Configure OpenTelemetry with a ConsoleSpanExporter that prints spans to
    stdout for immediate visibility, and register the provider globally.

    Logfire is not attached by this function; enable it separately with
    logfire.configure(send_to_logfire=False) (see Troubleshooting).

    Returns the configured TracerProvider.
    """
    resource = Resource.create({"service.name": service_name})
    provider = TracerProvider(resource=resource)
    # Console exporter: every span prints synchronously when it closes.
    # SimpleSpanProcessor is used here so spans appear immediately in the
    # terminal during development. Production pipelines should use
    # BatchSpanProcessor for throughput.
    console_exporter = ConsoleSpanExporter()
    provider.add_span_processor(SimpleSpanProcessor(console_exporter))
    trace.set_tracer_provider(provider)
    return provider


def get_tracer(name: str = "crewai-agent"):
    return trace.get_tracer(name)
Verify the setup module imports cleanly:
from otel_setup import configure_local_tracing, get_tracer
provider = configure_local_tracing("verify-setup")
tracer = get_tracer("verify-setup")
print("TracerProvider configured:", type(provider).__name__)
print("Tracer ready:", type(tracer).__name__)
Step 3: Instrument the tool with spans
Replace the print statements in MockSearchTool with a span that carries the query and result length as attributes. Spans give you duration automatically and nest correctly inside the agent span that called the tool.
# filename: instrumented_tool.py
from typing import Type

from crewai.tools import BaseTool
from opentelemetry import trace
from pydantic import BaseModel, Field


class SearchInput(BaseModel):
    query: str = Field(description="Search query string")


class InstrumentedSearchTool(BaseTool):
    name: str = "web_search"
    description: str = "Search the web for information on a topic."
    args_schema: Type[BaseModel] = SearchInput

    def _run(self, query: str) -> str:
        tracer = trace.get_tracer("crewai-agent.tools")
        with tracer.start_as_current_span("tool.web_search") as span:
            span.set_attribute("tool.name", "web_search")
            span.set_attribute("tool.input.query", query)
            # Simulate the actual work
            result = (
                f"Search results for '{query}': "
                f"Found 3 relevant articles about {query}."
            )
            span.set_attribute("tool.output.length", len(result))
            span.set_attribute("tool.output.preview", result[:120])
            return result
Step 4: Wrap agent execution in parent spans
CrewAI does not expose a built-in OTel hook, so the cleanest approach is to wrap the crew kickoff in a span using a context manager. This creates a parent span for the whole crew run, and each instrumented tool call becomes a child span beneath it.
# filename: instrumented_crew.py
from crewai import Agent, Task, Crew
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

from instrumented_tool import InstrumentedSearchTool


def build_instrumented_crew(topic: str, llm_model: str = "openai/gpt-4o-mini") -> Crew:
    search_tool = InstrumentedSearchTool()
    researcher = Agent(
        role="Research Analyst",
        goal=f"Find key facts about {topic}",
        backstory="You are a thorough research analyst who finds accurate information.",
        tools=[search_tool],
        verbose=False,
        llm=llm_model,
    )
    summarizer = Agent(
        role="Content Summarizer",
        goal="Produce a concise summary from research findings",
        backstory="You distill complex research into clear, actionable summaries.",
        verbose=False,
        llm=llm_model,
    )
    research_task = Task(
        description=f"Research the topic: {topic}. Use the web_search tool.",
        expected_output="A list of key facts about the topic.",
        agent=researcher,
    )
    summary_task = Task(
        description="Summarize the research findings into 3 bullet points.",
        expected_output="Three bullet points summarizing the research.",
        agent=summarizer,
        context=[research_task],
    )
    return Crew(
        agents=[researcher, summarizer],
        tasks=[research_task, summary_task],
        verbose=False,
    )


def run_with_tracing(topic: str, llm_model: str = "openai/gpt-4o-mini") -> str:
    """
    Run the crew inside a root span so every instrumented tool call
    is captured as a child span under 'crew.run'.
    """
    tracer = trace.get_tracer("crewai-agent")
    crew = build_instrumented_crew(topic, llm_model)
    with tracer.start_as_current_span("crew.run") as root_span:
        root_span.set_attribute("crew.topic", topic)
        root_span.set_attribute("crew.agent_count", 2)
        root_span.set_attribute("crew.task_count", 2)
        try:
            result = crew.kickoff()
            root_span.set_attribute("crew.status", "success")
            root_span.set_status(Status(StatusCode.OK))
            # CrewAI returns a CrewOutput object; convert to string
            return str(result)
        except Exception as exc:
            root_span.set_status(Status(StatusCode.ERROR, str(exc)))
            root_span.record_exception(exc)
            raise
Step 5: Validate the span structure without an LLM
Before wiring up a real LLM call, confirm that the span hierarchy is correct by running the tool directly inside a manually created parent span. This structural verification runs without any API key.
import io

from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

from otel_setup import configure_local_tracing, get_tracer
from instrumented_tool import InstrumentedSearchTool

# Buffer that receives a copy of every exported span
captured = io.StringIO()

provider = configure_local_tracing("span-structure-test")
# Add a second processor that writes to our StringIO buffer
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter(out=captured)))

tracer = get_tracer("span-structure-test")
tool = InstrumentedSearchTool()

with tracer.start_as_current_span("crew.run") as root:
    root.set_attribute("crew.topic", "renewable energy")
    with tracer.start_as_current_span("agent.researcher") as agent_span:
        agent_span.set_attribute("agent.role", "Research Analyst")
        # Simulate the tool being called by the agent
        output = tool._run(query="renewable energy trends 2024")
        agent_span.set_attribute("agent.tool_output_length", len(output))

# SimpleSpanProcessor exports each span synchronously when it ends,
# so the buffer is complete once the outer with block exits.
span_output = captured.getvalue()

# Verify the expected span names and attributes appear in the output
assert "crew.run" in span_output, "Missing root span 'crew.run'"
assert "agent.researcher" in span_output, "Missing agent span 'agent.researcher'"
assert "tool.web_search" in span_output, "Missing tool span 'tool.web_search'"
assert "tool.input.query" in span_output, "Missing query attribute"
assert "renewable energy trends 2024" in span_output, "Query value not recorded"

# Count top-level "name" keys in the exporter's JSON, one per span
span_count = span_output.count('"name":')
print("Spans present: crew.run, agent.researcher, tool.web_search")
print(f"Captured {span_count} spans in output")
Step 6: Add token usage attributes
Token usage is the most operationally important attribute to capture. CrewAI exposes usage metrics on the CrewOutput object after kickoff(). Attach them to the root span so every trace carries cost-relevant data.
# filename: token_aware_crew.py
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

from instrumented_crew import build_instrumented_crew


def run_with_token_tracing(topic: str, llm_model: str = "openai/gpt-4o-mini") -> dict:
    """
    Run the crew and attach token usage from CrewOutput to the root span.
    Returns a dict with the result string and usage stats.
    """
    tracer = trace.get_tracer("crewai-agent")
    crew = build_instrumented_crew(topic, llm_model)
    with tracer.start_as_current_span("crew.run") as root_span:
        root_span.set_attribute("crew.topic", topic)
        root_span.set_attribute("llm.model", llm_model)
        try:
            result = crew.kickoff()
            # CrewOutput exposes token_usage after kickoff
            usage = getattr(result, "token_usage", None)
            if usage is not None:
                # UsageMetrics fields vary by CrewAI version; use getattr safely
                total_tokens = getattr(usage, "total_tokens", 0)
                prompt_tokens = getattr(usage, "prompt_tokens", 0)
                completion_tokens = getattr(usage, "completion_tokens", 0)
                root_span.set_attribute("llm.usage.total_tokens", total_tokens)
                root_span.set_attribute("llm.usage.prompt_tokens", prompt_tokens)
                root_span.set_attribute("llm.usage.completion_tokens", completion_tokens)
            root_span.set_status(Status(StatusCode.OK))
            return {
                "result": str(result),
                "usage": usage,
            }
        except Exception as exc:
            root_span.set_status(Status(StatusCode.ERROR, str(exc)))
            root_span.record_exception(exc)
            raise
Verify the module loads without errors:
from token_aware_crew import run_with_token_tracing
print("token_aware_crew module loaded successfully")
print("run_with_token_tracing:", run_with_token_tracing.__doc__.strip().splitlines()[0])
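One subtlety in the getattr pattern: defaulting a missing field to 0 makes "field absent in this CrewAI version" indistinguishable from "zero tokens used". A possible alternative, sketched here with a hypothetical `usage_attributes` helper and a `SimpleNamespace` standing in for the usage object, is to emit only the fields that are actually present.

```python
from types import SimpleNamespace
from typing import Any

# Field names taken from the tutorial; other versions may expose more or fewer
USAGE_FIELDS = ("total_tokens", "prompt_tokens", "completion_tokens")


def usage_attributes(usage: Any, prefix: str = "llm.usage") -> dict[str, int]:
    """Hypothetical helper: map a usage-like object to span attributes.

    Fields that are missing or not integers are skipped rather than
    recorded as 0, so a zero in the trace always means a real zero.
    """
    if usage is None:
        return {}
    attrs: dict[str, int] = {}
    for field in USAGE_FIELDS:
        value = getattr(usage, field, None)
        if isinstance(value, int) and not isinstance(value, bool):
            attrs[f"{prefix}.{field}"] = value
    return attrs


# Stand-in for result.token_usage with one field missing
fake_usage = SimpleNamespace(total_tokens=150, prompt_tokens=100)

attrs = usage_attributes(fake_usage)
print(attrs)
```

Inside run_with_token_tracing, the returned dict could be applied with `root_span.set_attributes(attrs)`, which sets several attributes in one call.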
Step 7: Run the fully instrumented crew (live call)
This block makes real LLM calls. Set the API key for your provider (OPENAI_API_KEY for the default openai/gpt-4o-mini model) before running it on your own machine.
from otel_setup import configure_local_tracing
from token_aware_crew import run_with_token_tracing

# Configure tracing before the crew runs
configure_local_tracing("crewai-prod")

topic = "the impact of open-source agentic frameworks on AI research"
output = run_with_token_tracing(topic)

print("=== Crew Result ===")
print(output["result"])
if output["usage"]:
    print("\n=== Token Usage ===")
    print(output["usage"])
Verify it works
Run the structural verification block below. It exercises the full span hierarchy using only the mock tool, with no LLM call required. A passing run prints the confirmed span names.
import io

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

from instrumented_tool import InstrumentedSearchTool

# Fresh provider for this verification
resource = Resource.create({"service.name": "verify-crewai-otel"})
buf = io.StringIO()
provider = TracerProvider(resource=resource)
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter(out=buf)))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("verify")
tool = InstrumentedSearchTool()

with tracer.start_as_current_span("crew.run") as root:
    root.set_attribute("crew.topic", "open-source agentic frameworks")
    with tracer.start_as_current_span("agent.researcher") as agent_span:
        agent_span.set_attribute("agent.role", "Research Analyst")
        result = tool._run(query="open-source agentic frameworks 2025")
        agent_span.set_attribute("agent.result_length", len(result))
    with tracer.start_as_current_span("agent.summarizer") as sum_span:
        sum_span.set_attribute("agent.role", "Content Summarizer")
        sum_span.set_attribute("agent.input_length", len(result))

output = buf.getvalue()
required_spans = ["crew.run", "tool.web_search", "agent.researcher", "agent.summarizer"]
missing = [s for s in required_spans if s not in output]
if missing:
    print(f"FAIL: missing spans: {missing}")
else:
    print("PASS: all expected spans present")
    for span_name in required_spans:
        print(f"  [ok] {span_name}")
Troubleshooting
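Spans appear in stdout but not in Logfire’s local file. configure_local_tracing() as written attaches only a ConsoleSpanExporter, so Logfire never sees the spans unless you configure it as well. Logfire local mode writes to ~/.logfire/ by default. Call logfire.configure(send_to_logfire=False) before your crew runs and confirm the directory exists. Logfire registers its own tracer provider when configured, so rather than registering a second, custom TracerProvider, attach extra exporters through Logfire itself; recent releases accept an additional_span_processors argument to logfire.configure for this, but check the signature in your installed version.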
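tool.web_search span is missing from the trace. The InstrumentedSearchTool._run method calls trace.get_tracer(...) at call time. If configure_local_tracing() hasn’t been called before the crew starts, get_tracer returns a no-op tracer and spans are silently dropped. Always call configure_local_tracing() before crew.kickoff(). Also note that OpenTelemetry honors only the first trace.set_tracer_provider() call in a process; later calls log a warning and are ignored, so run each verification block in a fresh interpreter.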
CrewOutput has no token_usage attribute. Token usage was added in CrewAI 0.80+. Run importlib.metadata.version('crewai') to check your version. On older versions, access usage via crew.usage_metrics after kickoff() returns.
SimpleSpanProcessor causes noticeable slowdown. SimpleSpanProcessor flushes each span synchronously. In production, switch to BatchSpanProcessor from opentelemetry.sdk.trace.export. The structural verification blocks in this tutorial use SimpleSpanProcessor specifically so spans are available for assertion immediately after the with block exits.
Spans show UNSET status even on success. You must call span.set_status(Status(StatusCode.OK)) explicitly. OpenTelemetry does not infer OK status from a span that exits without an exception; the default is UNSET.
crewai import fails with a Pydantic version conflict. CrewAI requires Pydantic v2. Run uv pip install 'crewai' 'pydantic>=2.0' to force the resolver to pick a compatible set.
Next steps
- Add a Logfire dashboard query. Logfire’s local SQLite file is queryable with standard SQL. Write a query that groups spans by tool.name and computes avg(duration_ns) to find your slowest tools across runs.
- Instrument the LLM call itself. Use openinference-instrumentation-openai or openinference-instrumentation-anthropic to auto-instrument the underlying model calls, adding gen_ai.usage.prompt_tokens and gen_ai.usage.completion_tokens as standard semantic convention attributes.
- Export to Grafana Tempo. Replace ConsoleSpanExporter with OTLPSpanExporter pointed at a local Tempo instance. The same span structure indexes identically in Tempo, Jaeger, or any OTLP-compatible backend. Only the exporter endpoint changes.
- Correlate traces with CrewAI’s task output files. Write the root span’s trace_id into the task output metadata so you can join structured trace data with the text artifacts your crew produces.
FAQ
Why is structured tracing better than print statements for debugging CrewAI agents?
Print statements collapse concurrent, hierarchical agent runs into a flat log stream with no duration, latency breakdown, or parent-child relationships. Structured tracing preserves the span hierarchy, attaches attributes like token usage and model name, and makes the entire run queryable after execution.
How do you set up Logfire in local mode without an API key?
Call configure_local_tracing(), which builds a TracerProvider and attaches a ConsoleSpanExporter through SimpleSpanProcessor. Logfire writes traces to a local SQLite file in ~/.logfire/ by default when send_to_logfire=False is set, requiring no remote endpoint or API key.
How do you capture token usage in CrewAI traces?
After crew.kickoff() returns a CrewOutput object, access its token_usage attribute and attach the total_tokens, prompt_tokens, and completion_tokens fields to the root span using set_attribute().
What is the correct span hierarchy for a CrewAI crew with tools?
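The root span is crew.run, which contains child spans for each agent (e.g., agent.researcher), which in turn contain child spans for each tool call (e.g., tool.web_search). Nesting follows the active context, so it is automatic whenever spans are opened with start_as_current_span, even across differently named tracers. In this tutorial the agent spans are created by hand in the verification blocks; in a live run only crew.run and the tool spans are emitted, because CrewAI itself is not instrumented.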
Why does the tutorial use SimpleSpanProcessor instead of BatchSpanProcessor?
SimpleSpanProcessor flushes each span synchronously, making spans available immediately for assertion in verification blocks. BatchSpanProcessor is recommended for production to avoid latency overhead, but SimpleSpanProcessor is appropriate for development and testing.