Multi-agent pipelines fail in production in ways that single-agent systems don’t. When a research agent hands off to a writing agent and the final response is slow or expensive, you need to know which agent, which LLM call, and which tool invocation caused the problem. Without structured traces, you’re guessing.

Why this matters

CrewAI is one of the more widely adopted multi-agent frameworks in Python, but many teams running it in production have no per-agent cost or latency attribution: they see a single wall-clock time and a total token count. OpenLLMetry (the OpenTelemetry instrumentation layer for LLM frameworks) adds automatic span generation for CrewAI agent runs, LLM calls, and tool invocations without requiring changes to your agent definitions. Because the spans are exported over OTLP, the same span structure works on any OTLP-compatible observability platform: SigNoz, Grafana Tempo, Honeycomb, or Datadog. Only the exporter endpoint changes.

This tutorial wires OpenLLMetry into a two-agent crew, exports spans to the local console exporter (no Docker required for the runnable path), and shows you how to read per-agent cost and latency from the resulting trace tree.

Prerequisites

  • Python 3.11 or 3.12
  • An OpenAI or Mistral API key (blocks that call the LLM are marked skip_execution_reason so the sandbox skips them; you run those locally)
  • Familiarity with CrewAI’s Agent, Task, and Crew primitives
  • Optional: Docker Compose if you want to point spans at a real SigNoz instance (covered in Step 6)

Setup

Install CrewAI, the OpenTelemetry SDK (which includes the console exporter), the OTLP gRPC exporter used in Step 6, and traceloop-sdk, which provides OpenLLMetry’s CrewAI instrumentation.

uv pip install crewai opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc traceloop-sdk

Export your LLM key. The blocks that actually call the API are skipped in the sandbox; set this in your own shell before running those steps.

export OPENAI_API_KEY="sk-your-key-here"

Step 1: Understand the instrumentation model

OpenLLMetry’s traceloop-sdk wraps the OpenTelemetry SDK and ships auto-instrumentation patches for CrewAI, LangChain, and several other frameworks. When you call Traceloop.init(), it monkey-patches CrewAI’s Agent.execute_task and the underlying LLM client so that each invocation becomes a child span under a root workflow span.

The span hierarchy looks like this:

workflow: <crew_name>
  agent: researcher
    llm.call (model=gpt-4o-mini, tokens=312)
    tool: SerperDevTool
  agent: writer
    llm.call (model=gpt-4o-mini, tokens=891)

Each span carries attributes like gen_ai.usage.prompt_tokens, gen_ai.usage.completion_tokens, and gen_ai.response.model, which are the building blocks for cost attribution.
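To make the parent/child relationship concrete, here is a minimal sketch that rebuilds the indented tree from a flat list of exported spans. It assumes each span is a dict with a context.span_id, a parent_id (None for the root), and an ISO start_time, which is the shape the console exporter prints. The span names and IDs in SAMPLE_SPANS are hypothetical placeholders, not real output.

```python
from collections import defaultdict

# Hypothetical, trimmed-down spans; real exports carry many more fields.
SAMPLE_SPANS = [
    {"name": "llm.call", "context": {"span_id": "0x03"}, "parent_id": "0x02",
     "start_time": "2024-11-01T10:00:01.200000Z"},
    {"name": "agent: researcher", "context": {"span_id": "0x02"}, "parent_id": "0x01",
     "start_time": "2024-11-01T10:00:01.100000Z"},
    {"name": "workflow: demo", "context": {"span_id": "0x01"}, "parent_id": None,
     "start_time": "2024-11-01T10:00:01.000000Z"},
    {"name": "agent: writer", "context": {"span_id": "0x04"}, "parent_id": "0x01",
     "start_time": "2024-11-01T10:00:04.000000Z"},
    {"name": "llm.call", "context": {"span_id": "0x05"}, "parent_id": "0x04",
     "start_time": "2024-11-01T10:00:04.100000Z"},
]


def render_tree(spans):
    """Return indented span names, parents before children."""
    # Group spans by the ID of their parent; root spans land under None.
    children = defaultdict(list)
    for span in spans:
        children[span.get("parent_id")].append(span)

    def walk(parent_id, depth):
        lines = []
        # ISO timestamps in the same format sort correctly as strings.
        siblings = sorted(children.get(parent_id, []), key=lambda s: s["start_time"])
        for span in siblings:
            lines.append("  " * depth + span["name"])
            lines.extend(walk(span["context"]["span_id"], depth + 1))
        return lines

    return walk(None, 0)


if __name__ == "__main__":
    print("\n".join(render_tree(SAMPLE_SPANS)))
```

Spans are exported as they finish, so children usually appear before their parents in the raw output; grouping by parent_id restores the hierarchy regardless of export order.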

Step 2: Configure the OTLP exporter

For the runnable sandbox path, use the OpenTelemetry console exporter so spans print to stdout. In production, swap the exporter endpoint to your SigNoz collector (http://localhost:4317) or any OTLP-compatible backend.

# filename: otel_setup.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
)
from opentelemetry.sdk.resources import Resource, SERVICE_NAME


def build_tracer_provider(service_name: str = "crewai-pipeline") -> TracerProvider:
    """Return a TracerProvider that exports spans to stdout.

    Swap ConsoleSpanExporter for OTLPSpanExporter to point at SigNoz:

        from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
        exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
    """
    resource = Resource(attributes={SERVICE_NAME: service_name})
    provider = TracerProvider(resource=resource)
    exporter = ConsoleSpanExporter()
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)
    return provider


def get_tracer(name: str = "crewai-pipeline"):
    return trace.get_tracer(name)

Step 3: Define the two-agent crew

The crew has a researcher agent that summarises a topic and a writer agent that turns the summary into a short report. Both agents use the same lightweight model to keep token costs low during testing.

# filename: crew_definition.py
from crewai import Agent, Task, Crew, Process


def build_crew() -> Crew:
    researcher = Agent(
        role="Research Analyst",
        goal="Produce a concise factual summary on the given topic.",
        backstory=(
            "You are a meticulous analyst who distils complex topics "
            "into clear, accurate bullet points."
        ),
        verbose=True,
        allow_delegation=False,
        # model string accepted by litellm; swap for mistral/mistral-small etc.
        llm="gpt-4o-mini",
    )

    writer = Agent(
        role="Technical Writer",
        goal="Transform research notes into a readable short report.",
        backstory=(
            "You are an experienced technical writer who turns bullet-point "
            "research into clear prose for a developer audience."
        ),
        verbose=True,
        allow_delegation=False,
        llm="gpt-4o-mini",
    )

    research_task = Task(
        description="Summarise the key concepts behind OpenTelemetry distributed tracing in 5 bullet points.",
        expected_output="A numbered list of 5 bullet points, each under 30 words.",
        agent=researcher,
    )

    write_task = Task(
        description=(
            "Using the research notes provided, write a 150-word report on "
            "OpenTelemetry distributed tracing suitable for a developer blog."
        ),
        expected_output="A 150-word prose report with a title.",
        agent=writer,
        context=[research_task],
    )

    crew = Crew(
        agents=[researcher, writer],
        tasks=[research_task, write_task],
        process=Process.sequential,
        verbose=True,
    )
    return crew

Step 4: Wire OpenLLMetry and run the crew

Traceloop.init() must be called before the crew runs. Pass disable_batch=False so spans are flushed via the batch processor you configured. The app_name becomes the root workflow span name.

# filename: run_crew.py
from otel_setup import build_tracer_provider
from crew_definition import build_crew

# Initialise the provider BEFORE importing traceloop so it picks up our provider.
provider = build_tracer_provider(service_name="crewai-otel-tutorial")

from traceloop.sdk import Traceloop  # noqa: E402 — intentional late import

Traceloop.init(
    app_name="crewai-otel-tutorial",
    disable_batch=False,
    # exporter_endpoint is ignored when we set the provider manually above;
    # traceloop will use the globally registered TracerProvider.
)


def main():
    crew = build_crew()
    result = crew.kickoff()
    # Force-flush so all spans are written before the process exits.
    provider.force_flush()
    print("\n=== CREW OUTPUT ===")
    print(result)


if __name__ == "__main__":
    main()

Run the pipeline (requires your API key):

python run_crew.py

Step 5: Read the span output

With the console exporter, each span prints as a JSON-like block. Here is an abbreviated, illustrative example of an agent span carrying LLM usage attributes:

{
    "name": "crewai.agent",
    "context": {
        "trace_id": "0x4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "0x00f067aa0ba902b7"
    },
    "attributes": {
        "traceloop.workflow.name": "crewai-otel-tutorial",
        "traceloop.entity.name": "Research Analyst",
        "gen_ai.usage.prompt_tokens": 312,
        "gen_ai.usage.completion_tokens": 148,
        "gen_ai.response.model": "gpt-4o-mini",
        "llm.request.type": "chat"
    },
    "start_time": "2024-11-01T10:00:01.123Z",
    "end_time":   "2024-11-01T10:00:03.456Z"
}

The fields you care about for cost attribution:

Attribute                        Meaning
traceloop.entity.name            Which agent produced this span
gen_ai.usage.prompt_tokens       Input tokens (drives cost)
gen_ai.usage.completion_tokens   Output tokens (drives cost)
end_time - start_time            Per-agent latency
gen_ai.response.model            Model used (affects per-token price)
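
As a rough sketch of how these fields combine, the snippet below computes latency and cost for one span shaped like the example above. The prices in PRICE_PER_MILLION are made-up round numbers chosen to keep the arithmetic easy to follow, not real rates; summarise_span is a hypothetical helper name.

```python
from datetime import datetime

# Hypothetical prices in USD per one million tokens: (input, output).
# Replace these with your provider's current rates.
PRICE_PER_MILLION = {"gpt-4o-mini": (1.00, 2.00)}

SAMPLE_SPAN = {
    "attributes": {
        "traceloop.entity.name": "Research Analyst",
        "gen_ai.usage.prompt_tokens": 312,
        "gen_ai.usage.completion_tokens": 148,
        "gen_ai.response.model": "gpt-4o-mini",
    },
    "start_time": "2024-11-01T10:00:01.123Z",
    "end_time": "2024-11-01T10:00:03.456Z",
}


def parse_ts(value: str) -> datetime:
    # Normalise the trailing "Z" so older Python versions can parse it too.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarise_span(span: dict) -> dict:
    """Return agent name, latency in seconds, and estimated cost in USD."""
    attrs = span["attributes"]
    prompt = attrs.get("gen_ai.usage.prompt_tokens", 0)
    completion = attrs.get("gen_ai.usage.completion_tokens", 0)
    in_price, out_price = PRICE_PER_MILLION[attrs["gen_ai.response.model"]]
    cost = (prompt * in_price + completion * out_price) / 1_000_000
    latency = (parse_ts(span["end_time"]) - parse_ts(span["start_time"])).total_seconds()
    return {
        "agent": attrs.get("traceloop.entity.name"),
        "latency_s": latency,
        "cost_usd": cost,
    }


if __name__ == "__main__":
    print(summarise_span(SAMPLE_SPAN))
```

Input and output tokens are priced separately because providers typically charge different rates for each, which is why the span records them as two attributes.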

Step 6: Point spans at SigNoz (optional, requires Docker)

If you have Docker Compose available, spin up SigNoz with its all-in-one image and replace the console exporter with the OTLP gRPC exporter.

# filename: docker-compose.yml
version: "3.8"
services:
  signoz:
    image: signoz/signoz:latest
    ports:
      - "3301:3301"   # SigNoz UI
      - "4317:4317"   # OTLP gRPC receiver
      - "4318:4318"   # OTLP HTTP receiver

Then create a variant of otel_setup.py that uses the OTLP exporter, and import build_tracer_provider from it in run_crew.py:

# filename: otel_setup_signoz.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.resources import Resource, SERVICE_NAME
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter


def build_tracer_provider(service_name: str = "crewai-pipeline") -> TracerProvider:
    resource = Resource(attributes={SERVICE_NAME: service_name})
    provider = TracerProvider(resource=resource)
    exporter = OTLPSpanExporter(
        endpoint="http://localhost:4317",
        insecure=True,
    )
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)
    return provider

After docker compose up -d, open http://localhost:3301, navigate to the Traces tab, and filter by service name crewai-otel-tutorial (the name run_crew.py passes to build_tracer_provider; crewai-pipeline is only the default). You’ll see the full trace tree with per-agent spans, token counts, and latency breakdowns. The same span structure indexes the same way on Datadog or Honeycomb. Only the exporter endpoint changes.

Verify it works

This block imports otel_setup, emits a test span, and syntax-checks the other two modules without importing them. It does not call the LLM.

import ast
import pathlib

from otel_setup import build_tracer_provider, get_tracer

# Import otel_setup and emit one span to confirm the provider works.
provider = build_tracer_provider("verify-test")
tracer = get_tracer("verify-test")

with tracer.start_as_current_span("smoke-test-span") as span:
    span.set_attribute("test.key", "hello")

provider.force_flush()
print("otel_setup: OK")

# Syntax-check the remaining modules at the AST level (no import, no LLM call).
# The /workspace paths match the sandbox; adjust them if your files live elsewhere.
for module in ("crew_definition", "run_crew"):
    src = pathlib.Path(f"/workspace/{module}.py").read_text()
    ast.parse(src)
    print(f"{module}: syntax OK")

print("verify_all_modules_ok")

Troubleshooting

ModuleNotFoundError: No module named 'traceloop' — The package name on PyPI is traceloop-sdk, not traceloop. Run uv pip install traceloop-sdk and confirm the install succeeded.

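Spans appear in the console but not in SigNoz — The OTLP gRPC port 4317 must be reachable. Run curl -v http://localhost:4317 to check that the port accepts connections; because 4317 speaks gRPC rather than plain HTTP, curl may report a protocol error after connecting, which still means the port is reachable. If SigNoz is on a remote host, replace localhost with the host IP and ensure the firewall allows 4317/tcp.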
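
If curl is not available, a plain TCP connection attempt gives the same reachability signal. This is a minimal sketch using only the standard library; port_is_open is a hypothetical helper name, and it only confirms that something is listening, not that it is an OTLP receiver.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    print("4317 reachable:", port_is_open("localhost", 4317))
```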

AttributeError: 'Crew' object has no attribute 'kickoff' — You are on a CrewAI version older than 0.28. Run uv pip install --upgrade crewai to get a version that ships the kickoff method.

All spans share the same trace_id — This is correct behaviour. CrewAI’s sequential process runs tasks in a single workflow, so all agent spans are children of the same root span.

force_flush returns before spans appear — The BatchSpanProcessor exports on a schedule: by default it waits up to 5 seconds (schedule_delay_millis=5000) or until 512 spans are queued (max_export_batch_size=512). For local debugging, pass a shorter schedule_delay_millis (for example 500) to BatchSpanProcessor, or switch to SimpleSpanProcessor during development.

Token attributes missing from spans — Some older versions of traceloop-sdk do not patch the litellm callback that CrewAI uses internally. Pin to traceloop-sdk>=0.0.80 where the litellm integration is stable.

Next steps

  • Add tool spans: Wrap any custom CrewAI tool’s _run method with tracer.start_as_current_span("tool.<name>") to get tool-level latency separate from LLM latency.
  • Cost rollup script: Read the exported OTLP JSON (redirect stdout to a file), parse gen_ai.usage.prompt_tokens and gen_ai.usage.completion_tokens per agent, and multiply by the model’s per-token price to produce a per-run cost report.
  • SigNoz alerting: Once spans land in SigNoz, create an alert on p99(crewai.agent duration) > 10s to catch regressions before users notice.
  • Swap to Grafana Tempo: Replace the SigNoz compose file with the Grafana Tempo OSS image and point the OTLP exporter at port 4317 of the Tempo container. The span structure is identical; only the UI changes.
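
The cost rollup idea above can be started with a sketch like the following, which sums token counts per agent. It assumes the captured file contains only the JSON objects printed by the console exporter, one after another; with verbose=True CrewAI also writes its own logs to stdout, so you would need to turn that off or filter those lines first. SAMPLE_EXPORT is hypothetical data in that shape, and the helper names are illustrative.

```python
import json
from collections import defaultdict

# Hypothetical capture: concatenated JSON objects, as the console exporter prints them.
SAMPLE_EXPORT = """
{"name": "crewai.agent", "attributes": {"traceloop.entity.name": "Research Analyst",
 "gen_ai.usage.prompt_tokens": 312, "gen_ai.usage.completion_tokens": 148}}
{"name": "crewai.agent", "attributes": {"traceloop.entity.name": "Technical Writer",
 "gen_ai.usage.prompt_tokens": 600, "gen_ai.usage.completion_tokens": 291}}
{"name": "smoke-test-span", "attributes": {"test.key": "hello"}}
"""


def iter_spans(text: str):
    """Yield each JSON object from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return
        span, pos = decoder.raw_decode(text, pos)
        yield span


def tokens_by_agent(spans) -> dict:
    """Sum prompt and completion tokens per traceloop.entity.name."""
    totals = defaultdict(lambda: {"prompt": 0, "completion": 0})
    for span in spans:
        attrs = span.get("attributes", {})
        agent = attrs.get("traceloop.entity.name")
        if agent is None:
            continue  # not an agent-attributed span
        totals[agent]["prompt"] += attrs.get("gen_ai.usage.prompt_tokens", 0)
        totals[agent]["completion"] += attrs.get("gen_ai.usage.completion_tokens", 0)
    return dict(totals)


if __name__ == "__main__":
    for agent, usage in tokens_by_agent(iter_spans(SAMPLE_EXPORT)).items():
        print(agent, usage)
```

Multiplying each agent’s totals by the per-token price for its model then gives the per-run cost report.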

FAQ

How does OpenLLMetry instrument CrewAI without changing agent code?

OpenLLMetry’s traceloop-sdk monkey-patches CrewAI’s Agent.execute_task method and the underlying LLM client when Traceloop.init() is called, automatically emitting spans for each agent run, LLM call, and tool invocation without requiring modifications to agent definitions.

What span attributes are available for cost attribution?

Each span includes gen_ai.usage.prompt_tokens, gen_ai.usage.completion_tokens, gen_ai.response.model, and traceloop.entity.name (agent name), which together enable per-agent cost and latency calculation.

Can I use this with observability platforms other than SigNoz?

Yes. The span structure is OTLP-compatible and works with any OTLP backend including Grafana Tempo, Honeycomb, and Datadog. Only the exporter endpoint changes; the span hierarchy and attributes remain identical.

What is the span hierarchy for a multi-agent crew?

The root span is the workflow (crew name), with child spans for each agent, and grandchild spans for each LLM call and tool invocation within that agent, allowing attribution of costs and latency to specific agents.

How do I debug if spans are not appearing in my backend?

Verify the OTLP gRPC port 4317 is reachable with curl, ensure the exporter endpoint is correct, and check that traceloop-sdk is version 0.0.80 or later for stable litellm integration.