# Wiring OpenTelemetry Spans into a Multi-Agent CrewAI Pipeline

> Build a two-agent CrewAI pipeline where every LLM call and tool invocation emits structured OpenTelemetry spans via OpenLLMetry, printed by the OpenTelemetry console exporter with an optional OTLP path to SigNoz. You get per-agent latency and token-cost attribution without touching your agent logic.

- Canonical URL: https://agentry.press/tutorial/wiring-opentelemetry-spans-into-a-multi-agent-crewai-pipeline/
- Type: Tutorial
- Published: 2026-05-25
- By: agentry
- Tags: crewai, opentelemetry, observability, multi-agent, tracing, llmops

---

Multi-agent pipelines fail in production in ways that single-agent systems don't. When a research agent hands off to a writing agent and the final response is slow or expensive, you need to know which agent, which LLM call, and which tool invocation caused the problem. Without structured traces, you're guessing.

## Why this matters

CrewAI has become one of the most-adopted multi-agent frameworks in Python, but most teams running it in production have no per-agent cost or latency attribution. They see a single wall-clock time and a total token count. OpenLLMetry (the OpenTelemetry instrumentation layer for LLM frameworks) adds automatic span generation for CrewAI agent runs, LLM calls, and tool invocations without requiring changes to your agent definitions. Because the output is standard OpenTelemetry data, the same span structure works on any OTLP-compatible backend: SigNoz, Grafana Tempo, Honeycomb, or Datadog. Only the exporter endpoint changes.

This tutorial wires OpenLLMetry into a two-agent crew, prints spans with the OpenTelemetry SDK's console exporter (no Docker required for the runnable path), and shows you how to read per-agent cost and latency from the resulting trace tree.

## Prerequisites

- Python 3.11 or 3.12
- An OpenAI or Mistral API key (blocks that call the LLM are marked `skip_execution_reason` so the sandbox skips them; you run those locally)
- Familiarity with CrewAI's `Agent`, `Task`, and `Crew` primitives
- Optional: Docker Compose if you want to point spans at a real SigNoz instance (covered in Step 6)

## Setup

Install CrewAI, OpenLLMetry's `traceloop-sdk`, the OpenTelemetry SDK (which includes the console exporter), and the OTLP gRPC exporter used in Step 6.

```bash
uv pip install crewai opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc traceloop-sdk
```

Export your LLM key. The blocks that actually call the API are skipped in the sandbox; set this in your own shell before running those steps.

```bash
export OPENAI_API_KEY="sk-your-key-here"
```

## Step 1: Understand the instrumentation model

OpenLLMetry's `traceloop-sdk` wraps the OpenTelemetry SDK and ships auto-instrumentation patches for CrewAI, LangChain, and several other frameworks. When you call `Traceloop.init()`, it monkey-patches CrewAI's `Agent.execute_task` and the underlying LLM client so that each invocation becomes a child span under a root workflow span.
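To make the monkey-patching idea concrete, here is a toy sketch that uses only the standard library. `ToyAgent`, `instrument`, and `recorded_spans` are hypothetical names invented for illustration; this is not OpenLLMetry's actual code, only the general pattern of replacing a method with a wrapper that records timing around the original call.

```python
import functools
import time


class ToyAgent:
    """Stand-in for a framework class; NOT CrewAI's real Agent."""

    def __init__(self, role):
        self.role = role

    def execute_task(self, task):
        return f"{self.role} finished: {task}"


recorded_spans = []  # stand-in for a span exporter


def instrument(cls, method_name):
    """Replace cls.method_name with a wrapper that records a span-like dict."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return original(self, *args, **kwargs)
        finally:
            # Recorded even if the wrapped call raises.
            recorded_spans.append({
                "name": f"{cls.__name__}.{method_name}",
                "entity": self.role,
                "duration_s": time.perf_counter() - start,
            })

    setattr(cls, method_name, wrapper)


# The class definition is untouched; only the attribute is swapped at runtime.
instrument(ToyAgent, "execute_task")
result = ToyAgent("Research Analyst").execute_task("summarise tracing")
print(result)
print(recorded_spans[0]["name"], recorded_spans[0]["entity"])
```

The agent class itself never changes, which is why this style of instrumentation needs no edits to agent definitions.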

The span hierarchy looks like this:

```
workflow: <crew_name>
  agent: researcher
    llm.call (model=gpt-4o-mini, tokens=312)
    tool: SerperDevTool
  agent: writer
    llm.call (model=gpt-4o-mini, tokens=891)
```

Each span carries attributes like `gen_ai.usage.prompt_tokens`, `gen_ai.usage.completion_tokens`, and `gen_ai.response.model`, which are the building blocks for cost attribution.
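A backend reconstructs that hierarchy from flat span records by following each span's parent reference. The sketch below shows the idea with the standard library only; the `span_id` and `parent_id` field names and the sample records are assumptions chosen for illustration, not the exact shape any exporter emits.

```python
from collections import defaultdict

# Hypothetical flat span records mirroring the hierarchy above.
spans = [
    {"span_id": "a", "parent_id": None, "name": "workflow: crew"},
    {"span_id": "b", "parent_id": "a", "name": "agent: researcher"},
    {"span_id": "c", "parent_id": "b", "name": "llm.call"},
    {"span_id": "d", "parent_id": "b", "name": "tool: SerperDevTool"},
    {"span_id": "e", "parent_id": "a", "name": "agent: writer"},
    {"span_id": "f", "parent_id": "e", "name": "llm.call"},
]


def render_tree(spans):
    """Return indented lines, one per span, in parent-before-child order."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children.get(parent_id, []):
            lines.append("  " * depth + span["name"])
            walk(span["span_id"], depth + 1)

    walk(None, 0)  # root spans have no parent
    return lines


tree_lines = render_tree(spans)
print("\n".join(tree_lines))
```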

## Step 2: Configure the OTLP exporter

For the runnable sandbox path, use the OpenTelemetry console exporter so spans print to stdout. In production, replace it with an OTLP exporter pointed at your SigNoz collector (`http://localhost:4317`) or any OTLP-compatible backend.

```python
# filename: otel_setup.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
)
from opentelemetry.sdk.resources import Resource, SERVICE_NAME


def build_tracer_provider(service_name: str = "crewai-pipeline") -> TracerProvider:
    """Return a TracerProvider that exports spans to stdout.

    Swap ConsoleSpanExporter for OTLPSpanExporter to point at SigNoz:

        from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
        exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)
    """
    resource = Resource(attributes={SERVICE_NAME: service_name})
    provider = TracerProvider(resource=resource)
    exporter = ConsoleSpanExporter()
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)
    return provider


def get_tracer(name: str = "crewai-pipeline"):
    return trace.get_tracer(name)
```

## Step 3: Define the two-agent crew

The crew has a researcher agent that summarises a topic and a writer agent that turns the summary into a short report. Both agents use the same lightweight model to keep token costs low during testing.

```python
# filename: crew_definition.py
from crewai import Agent, Task, Crew, Process


def build_crew() -> Crew:
    researcher = Agent(
        role="Research Analyst",
        goal="Produce a concise factual summary on the given topic.",
        backstory=(
            "You are a meticulous analyst who distils complex topics "
            "into clear, accurate bullet points."
        ),
        verbose=True,
        allow_delegation=False,
        # model string accepted by litellm; swap for mistral/mistral-small etc.
        llm="gpt-4o-mini",
    )

    writer = Agent(
        role="Technical Writer",
        goal="Transform research notes into a readable short report.",
        backstory=(
            "You are an experienced technical writer who turns bullet-point "
            "research into clear prose for a developer audience."
        ),
        verbose=True,
        allow_delegation=False,
        llm="gpt-4o-mini",
    )

    research_task = Task(
        description="Summarise the key concepts behind OpenTelemetry distributed tracing in 5 bullet points.",
        expected_output="A numbered list of 5 bullet points, each under 30 words.",
        agent=researcher,
    )

    write_task = Task(
        description=(
            "Using the research notes provided, write a 150-word report on "
            "OpenTelemetry distributed tracing suitable for a developer blog."
        ),
        expected_output="A 150-word prose report with a title.",
        agent=writer,
        context=[research_task],
    )

    crew = Crew(
        agents=[researcher, writer],
        tasks=[research_task, write_task],
        process=Process.sequential,
        verbose=True,
    )
    return crew
```

## Step 4: Wire OpenLLMetry and run the crew

`Traceloop.init()` must be called before the crew runs. Pass `disable_batch=False` so spans are flushed via the batch processor you configured. The `app_name` becomes the root workflow span name.

```python
# filename: run_crew.py
from otel_setup import build_tracer_provider
from crew_definition import build_crew

# Initialise the provider BEFORE importing traceloop so it picks up our provider.
provider = build_tracer_provider(service_name="crewai-otel-tutorial")

from traceloop.sdk import Traceloop  # noqa: E402 — intentional late import

Traceloop.init(
    app_name="crewai-otel-tutorial",
    disable_batch=False,
    # No endpoint is passed here: the tutorial relies on traceloop reusing
    # the globally registered TracerProvider configured above.
)


def main():
    crew = build_crew()
    result = crew.kickoff()
    # Force-flush so all spans are written before the process exits.
    provider.force_flush()
    print("\n=== CREW OUTPUT ===")
    print(result)


if __name__ == "__main__":
    main()
```

Run the pipeline (requires your API key):

```bash
python run_crew.py
```

## Step 5: Read the span output

With the console exporter, each span prints as a JSON-like block. Here is an abbreviated example of an agent span carrying LLM usage attributes:

```
{
    "name": "crewai.agent",
    "context": {
        "trace_id": "0x4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "0x00f067aa0ba902b7"
    },
    "attributes": {
        "traceloop.workflow.name": "crewai-otel-tutorial",
        "traceloop.entity.name": "Research Analyst",
        "gen_ai.usage.prompt_tokens": 312,
        "gen_ai.usage.completion_tokens": 148,
        "gen_ai.response.model": "gpt-4o-mini",
        "llm.request.type": "chat"
    },
    "start_time": "2024-11-01T10:00:01.123Z",
    "end_time":   "2024-11-01T10:00:03.456Z"
}
```

The fields you care about for cost attribution:

| Attribute | Meaning |
|---|---|
| `traceloop.entity.name` | Which agent produced this span |
| `gen_ai.usage.prompt_tokens` | Input tokens (drives cost) |
| `gen_ai.usage.completion_tokens` | Output tokens (drives cost) |
| `end_time - start_time` | Per-agent latency |
| `gen_ai.response.model` | Model used (affects per-token price) |
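As a minimal sketch of how those fields combine, the snippet below computes latency and cost for one span shaped like the abbreviated example above. The prices in `PRICE_PER_MILLION` are placeholder numbers, not real pricing, and the helper names are hypothetical; substitute your provider's current rates.

```python
from datetime import datetime

# Placeholder prices in USD per 1M tokens. NOT real pricing.
PRICE_PER_MILLION = {"gpt-4o-mini": {"prompt": 1.0, "completion": 2.0}}


def parse_ts(ts):
    """Parse an ISO-8601 timestamp with a trailing Z into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def latency_seconds(span):
    return (parse_ts(span["end_time"]) - parse_ts(span["start_time"])).total_seconds()


def cost_usd(span, prices=PRICE_PER_MILLION):
    attrs = span["attributes"]
    price = prices[attrs["gen_ai.response.model"]]
    total = (
        attrs["gen_ai.usage.prompt_tokens"] * price["prompt"]
        + attrs["gen_ai.usage.completion_tokens"] * price["completion"]
    )
    return total / 1_000_000


span = {
    "attributes": {
        "traceloop.entity.name": "Research Analyst",
        "gen_ai.usage.prompt_tokens": 312,
        "gen_ai.usage.completion_tokens": 148,
        "gen_ai.response.model": "gpt-4o-mini",
    },
    "start_time": "2024-11-01T10:00:01.123Z",
    "end_time": "2024-11-01T10:00:03.456Z",
}

latency = latency_seconds(span)
cost = cost_usd(span)
print(span["attributes"]["traceloop.entity.name"], latency, cost)
```

Prompt and completion tokens are priced separately because most providers charge different rates for input and output.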

## Step 6: Point spans at SigNoz (optional, requires Docker)

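If you have Docker Compose available, spin up SigNoz and replace the console exporter with the OTLP gRPC exporter. The compose file below is a minimal single-service sketch; SigNoz's official deployment runs several services (including ClickHouse for storage), so fall back to the compose file in the SigNoz repository if this one does not start cleanly.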

```yaml
# filename: docker-compose.yml
version: "3.8"
services:
  signoz:
    image: signoz/signoz:latest
    ports:
      - "3301:3301"   # SigNoz UI
      - "4317:4317"   # OTLP gRPC receiver
      - "4318:4318"   # OTLP HTTP receiver
```

Then save an OTLP variant of the setup module as `otel_setup_signoz.py` and change the import in `run_crew.py` from `otel_setup` to `otel_setup_signoz`:

```python
# filename: otel_setup_signoz.py
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.resources import Resource, SERVICE_NAME
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter


def build_tracer_provider(service_name: str = "crewai-pipeline") -> TracerProvider:
    resource = Resource(attributes={SERVICE_NAME: service_name})
    provider = TracerProvider(resource=resource)
    exporter = OTLPSpanExporter(
        endpoint="http://localhost:4317",
        insecure=True,
    )
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)
    return provider
```

After `docker compose up -d`, open `http://localhost:3301`, navigate to the Traces tab, and filter by the service name that `run_crew.py` passes to `build_tracer_provider`, `crewai-otel-tutorial`. You'll see the full trace tree with per-agent spans, token counts, and latency breakdowns. The same span structure indexes the same way on Datadog or Honeycomb. Only the exporter endpoint changes.

## Verify it works

This block imports `otel_setup`, emits a smoke-test span, and syntax-checks the other two modules without importing them. It does not call the LLM.

```python
import ast
import pathlib

# Verify otel_setup imports cleanly and can emit a span
from otel_setup import build_tracer_provider, get_tracer

provider = build_tracer_provider("verify-test")
tracer = get_tracer("verify-test")

with tracer.start_as_current_span("smoke-test-span") as span:
    span.set_attribute("test.key", "hello")

provider.force_flush()
print("otel_setup: OK")

# Syntax-check the remaining modules without importing them, so nothing
# initialises Traceloop or calls the LLM. Paths are relative to the
# directory where you saved the files.
for filename in ("crew_definition.py", "run_crew.py"):
    ast.parse(pathlib.Path(filename).read_text())
    print(f"{filename.removesuffix('.py')}: syntax OK")

print("verify_all_modules_ok")
```

## Troubleshooting

**`ModuleNotFoundError: No module named 'traceloop'`** — The package name on PyPI is `traceloop-sdk`, not `traceloop`. Run `uv pip install traceloop-sdk` and confirm the install succeeded.

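**Spans appear in the console but not in SigNoz** — The OTLP gRPC port 4317 must be reachable. Run `curl -v http://localhost:4317` to check; the endpoint speaks gRPC, so curl will not get a normal HTTP response, but a successful TCP connection confirms the port is open. If SigNoz is on a remote host, replace `localhost` with the host IP and ensure the firewall allows 4317/tcp.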

**`AttributeError: 'Crew' object has no attribute 'kickoff'`** — You are on a CrewAI version older than 0.28. Run `uv pip install --upgrade crewai` to get a version that ships the `kickoff` method.

**All spans share the same `trace_id`** — This is correct behaviour. CrewAI's sequential process runs tasks in a single workflow, so all agent spans are children of the same root span.

**`force_flush` returns before spans appear** — The `BatchSpanProcessor` exports on a schedule, with a default `schedule_delay_millis` of 5000 and a default `max_export_batch_size` of 512. For local debugging, pass a smaller `schedule_delay_millis` (for example 500) to `BatchSpanProcessor`, or switch to `SimpleSpanProcessor`, which exports each span as soon as it ends.

**Token attributes missing from spans** — Some older versions of `traceloop-sdk` do not patch the litellm callback that CrewAI uses internally. Pin to `traceloop-sdk>=0.0.80` where the litellm integration is stable.
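For the port-reachability check mentioned above, a plain TCP connection test is enough, since it does not depend on how the tool handles gRPC. A minimal sketch using only the standard library, where `port_is_open` is a hypothetical helper name:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS failures.
        return False


# Prints True when something is listening on the OTLP gRPC port.
print(port_is_open("localhost", 4317))
```

A `True` result only proves the port accepts connections; it does not confirm that the listener is an OTLP receiver.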

## Next steps

- **Add tool spans**: Wrap any custom CrewAI tool's `_run` method with `tracer.start_as_current_span("tool.<name>")` to get tool-level latency separate from LLM latency.
- **Cost rollup script**: Redirect stdout to a file to capture the console exporter's JSON output, parse `gen_ai.usage.prompt_tokens` and `gen_ai.usage.completion_tokens` per agent, and multiply by the model's per-token price to produce a per-run cost report.
- **SigNoz alerting**: Once spans land in SigNoz, create an alert on `p99(crewai.agent duration) > 10s` to catch regressions before users notice.
- **Swap to Grafana Tempo**: Replace the SigNoz compose file with the Grafana Tempo OSS image and point the OTLP exporter at port 4317 of the Tempo container. The span structure is identical; only the UI changes.
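The cost rollup idea can be sketched as follows. This assumes the captured file contains span objects printed one after another as JSON, possibly interleaved with other log lines, and that agent spans carry both `traceloop.entity.name` and the token attributes as in the Step 5 example. `iter_json_objects` and `tokens_by_agent` are hypothetical helper names.

```python
import json
from collections import defaultdict


def iter_json_objects(text):
    """Yield each JSON object found in text, skipping non-JSON noise."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here; keep scanning
            continue
        yield obj
        pos = end


def tokens_by_agent(text):
    """Sum prompt and completion tokens per traceloop.entity.name."""
    totals = defaultdict(lambda: {"prompt": 0, "completion": 0})
    for obj in iter_json_objects(text):
        attrs = obj.get("attributes", {})
        agent = attrs.get("traceloop.entity.name")
        if agent is None:
            continue
        totals[agent]["prompt"] += attrs.get("gen_ai.usage.prompt_tokens", 0)
        totals[agent]["completion"] += attrs.get("gen_ai.usage.completion_tokens", 0)
    return dict(totals)


# Illustrative sample standing in for a captured stdout file.
sample = """
[verbose crew log line]
{"name": "crewai.agent", "attributes": {"traceloop.entity.name": "Research Analyst", "gen_ai.usage.prompt_tokens": 312, "gen_ai.usage.completion_tokens": 148}}
some other log output
{"name": "crewai.agent", "attributes": {"traceloop.entity.name": "Technical Writer", "gen_ai.usage.prompt_tokens": 891, "gen_ai.usage.completion_tokens": 240}}
{"name": "crewai.agent", "attributes": {"traceloop.entity.name": "Research Analyst", "gen_ai.usage.prompt_tokens": 100, "gen_ai.usage.completion_tokens": 50}}
"""

totals = tokens_by_agent(sample)
print(totals)
```

Log lines that themselves contain braces may be misread as JSON fragments, so setting `verbose=False` on the crew while capturing output keeps the file cleaner.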

## FAQ

### How does OpenLLMetry instrument CrewAI without changing agent code?

OpenLLMetry's traceloop-sdk monkey-patches CrewAI's Agent.execute_task method and the underlying LLM client when Traceloop.init() is called, automatically emitting spans for each agent run, LLM call, and tool invocation without requiring modifications to agent definitions.

### What span attributes are available for cost attribution?

Each span includes gen_ai.usage.prompt_tokens, gen_ai.usage.completion_tokens, gen_ai.response.model, and traceloop.entity.name (agent name), which together enable per-agent cost and latency calculation.

### Can I use this with observability platforms other than SigNoz?

Yes. The span structure is OTLP-compatible and works with any OTLP backend including Grafana Tempo, Honeycomb, and Datadog. Only the exporter endpoint changes; the span hierarchy and attributes remain identical.

### What is the span hierarchy for a multi-agent crew?

The root span is the workflow (crew name), with child spans for each agent, and grandchild spans for each LLM call and tool invocation within that agent, allowing attribution of costs and latency to specific agents.

### How do I debug if spans are not appearing in my backend?

Verify the OTLP gRPC port 4317 is reachable with curl, ensure the exporter endpoint is correct, and check that traceloop-sdk is version 0.0.80 or later for stable litellm integration.

