Why this matters

LightRAG, accepted at EMNLP 2025, replaces flat vector search with a dual-level graph retrieval strategy that links entities and relations across your corpus [1]. In benchmarks it outperforms naive RAG on multi-hop questions, but the architecture introduces new failure modes: graph traversal can silently fall back to local-only retrieval, LLM extraction calls can balloon token costs on large chunks, and cache misses are invisible without instrumentation. None of this is surfaced by LightRAG’s default logging. Engineers running it in production today have no way to tell whether a slow query is caused by graph traversal, embedding latency, or an upstream LLM timeout. OpenTelemetry with the OpenInference semantic conventions closes that gap: every retrieval span carries chunk scores, token counts, and model identifiers in a schema that Phoenix, Jaeger, or any OTLP-compatible backend can index and alert on.

Prerequisites

  • Python 3.11 or 3.12
  • An OpenAI API key (or a compatible endpoint such as Azure OpenAI or a local vLLM server)
  • Docker, to run the Phoenix trace collector locally
  • Familiarity with basic RAG concepts (chunking, embedding, retrieval)
  • curl available on your machine

Setup

Start the Phoenix OSS container in one terminal. Phoenix accepts OTLP/gRPC on port 4317 and serves its UI on port 6006.

docker run -d --name phoenix \
  -p 6006:6006 \
  -p 4317:4317 \
  arizephoenix/phoenix:latest

Install the Python dependencies. lightrag-hku is the PyPI package for the LightRAG implementation described in the EMNLP 2025 paper [1]. openinference-semantic-conventions provides the OpenInference attribute names (importable as openinference.semconv); opentelemetry-exporter-otlp-proto-grpc ships spans to Phoenix.

uv pip install lightrag-hku \
  opentelemetry-api \
  opentelemetry-sdk \
  opentelemetry-exporter-otlp-proto-grpc \
  openinference-semantic-conventions \
  openai \
  tiktoken

Export your OpenAI key so the pipeline can reach the API:

export OPENAI_API_KEY="sk-replace-me"

Step 1: Configure the OpenTelemetry tracer

Create a module that sets up a TracerProvider wired to Phoenix over OTLP/gRPC. Every subsequent module imports get_tracer() from here.

# filename: otel_setup.py
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource

PHOENIX_ENDPOINT = os.environ.get("PHOENIX_ENDPOINT", "http://localhost:4317")

def configure_tracer(service_name: str = "lightrag-pipeline") -> trace.Tracer:
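    # Phoenix groups traces into projects by the openinference.project.name
    # resource attribute, so set it alongside the standard service.name.
    resource = Resource.create({
        "service.name": service_name,
        "openinference.project.name": service_name,
    })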
    exporter = OTLPSpanExporter(endpoint=PHOENIX_ENDPOINT, insecure=True)
    provider = TracerProvider(resource=resource)
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)
    return trace.get_tracer(service_name)

def get_tracer() -> trace.Tracer:
    return trace.get_tracer("lightrag-pipeline")

Step 2: Build the instrumented retrieval wrapper

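LightRAG exposes an aquery() coroutine. The wrapper below intercepts each call, opens an OpenTelemetry span, records the input value, retrieval mode, output value, and latency as span attributes, and re-raises exceptions so they appear as error spans in Phoenix.

OpenInference defines input.value, output.value, llm.token_count.prompt, and llm.token_count.completion as first-class span attributes. The wrapper sets the first two; the token-count keys are declared but left unset, because aquery() does not return usage data. Once they are populated, Phoenix can compute cost-per-query without any post-processing. retrieval.mode and rag.query_latency_ms are custom keys chosen for this tutorial and are not part of the OpenInference conventions.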

# filename: instrumented_rag.py
import time
from typing import Any

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

# OpenInference semantic convention keys (string literals, so no extra import is needed)
INPUT_VALUE = "input.value"
OUTPUT_VALUE = "output.value"
LLM_PROMPT_TOKENS = "llm.token_count.prompt"
LLM_COMPLETION_TOKENS = "llm.token_count.completion"

# Custom keys specific to this tutorial (not defined by OpenInference)
RETRIEVAL_MODE = "retrieval.mode"
RAG_QUERY_LATENCY = "rag.query_latency_ms"


class InstrumentedLightRAG:
    """
    Thin wrapper around a LightRAG instance that emits OTel spans
    for every query, including retrieval mode and latency.
    """

    def __init__(self, rag_instance: Any, tracer: trace.Tracer):
        self._rag = rag_instance
        self._tracer = tracer

    async def query(self, question: str, mode: str = "hybrid") -> str:
        with self._tracer.start_as_current_span("lightrag.query") as span:
            span.set_attribute(INPUT_VALUE, question)
            span.set_attribute(RETRIEVAL_MODE, mode)
            start = time.perf_counter()
            try:
                from lightrag import QueryParam
                result = await self._rag.aquery(
                    question, param=QueryParam(mode=mode)
                )
                elapsed_ms = (time.perf_counter() - start) * 1000
                span.set_attribute(OUTPUT_VALUE, str(result))
                span.set_attribute(RAG_QUERY_LATENCY, round(elapsed_ms, 2))
                span.set_status(Status(StatusCode.OK))
                return result
            except Exception as exc:
                span.set_status(Status(StatusCode.ERROR, str(exc)))
                span.record_exception(exc)
                raise

    async def insert(self, text: str) -> None:
        """Index a document, wrapped in its own span."""
        with self._tracer.start_as_current_span("lightrag.insert") as span:
            span.set_attribute("document.length_chars", len(text))
            try:
                await self._rag.ainsert(text)
                span.set_status(Status(StatusCode.OK))
            except Exception as exc:
                span.set_status(Status(StatusCode.ERROR, str(exc)))
                span.record_exception(exc)
                raise
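
Of the keys used by the wrapper, input.value, output.value, and the llm.token_count.* pair come from the OpenInference conventions, while retrieval.mode and rag.query_latency_ms are names chosen for this tutorial. The sketch below gathers them in one place. build_query_attributes is a hypothetical helper, not part of LightRAG or OpenInference, and tagging the span with openinference.span.kind set to CHAIN is an assumption about how you want Phoenix to classify it. Unset values are dropped because OpenTelemetry attribute values cannot be None.

```python
from typing import Dict, Optional, Union

AttrValue = Union[str, int, float, bool]


def build_query_attributes(
    question: str,
    mode: str,
    answer: Optional[str] = None,
    latency_ms: Optional[float] = None,
    prompt_tokens: Optional[int] = None,
    completion_tokens: Optional[int] = None,
) -> Dict[str, AttrValue]:
    """Hypothetical helper: collect the span attributes for one query."""
    candidates = {
        # Keys defined by the OpenInference semantic conventions.
        "openinference.span.kind": "CHAIN",  # assumption: classify the query as a CHAIN span
        "input.value": question,
        "output.value": answer,
        "llm.token_count.prompt": prompt_tokens,
        "llm.token_count.completion": completion_tokens,
        # Custom keys used only by this tutorial.
        "retrieval.mode": mode,
        "rag.query_latency_ms": latency_ms,
    }
    # Drop fields that were never filled in; None is not a valid attribute value.
    return {key: value for key, value in candidates.items() if value is not None}


attrs = build_query_attributes(
    "What is a knowledge graph?", "hybrid", latency_ms=812.4
)
print(sorted(attrs))
```

The resulting dictionary can be passed to span.set_attributes(attrs) in one call, which keeps the key names out of the wrapper's control flow.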

Step 3: Wire up the pipeline end-to-end

This script initialises LightRAG with the OpenAI backend, indexes a short sample document, and runs three queries, one per retrieval mode (local, global, hybrid). The insert and each query produce their own span, visible in Phoenix.

# filename: pipeline.py
import asyncio
import os
import tempfile

from otel_setup import configure_tracer
from instrumented_rag import InstrumentedLightRAG

SAMPLE_DOCUMENT = """
Knowledge graphs represent information as entities connected by typed relations.
In a knowledge graph, a node represents an entity such as a person, place, or concept,
while an edge represents a relation between two entities.
LightRAG builds a knowledge graph from your documents during indexing, then uses
both local context (nearby entities) and global context (high-level summaries)
to answer queries. The hybrid retrieval mode combines both strategies.
Graph-based retrieval outperforms naive vector search on multi-hop questions
because it can traverse relation chains that a flat embedding index cannot capture.
"""


async def main():
    tracer = configure_tracer("lightrag-pipeline")

    working_dir = tempfile.mkdtemp(prefix="lightrag_")
    print(f"LightRAG working directory: {working_dir}")

    from lightrag import LightRAG
    from lightrag.llm.openai import gpt_4o_mini_complete, openai_embed

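    rag = LightRAG(
        working_dir=working_dir,
        llm_model_func=gpt_4o_mini_complete,
        embedding_func=openai_embed,
    )
    # Recent lightrag-hku releases expect storages to be initialised explicitly
    # before the first insert or query. Check the README for your installed
    # version: some releases also require initialize_pipeline_status().
    await rag.initialize_storages()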

    pipeline = InstrumentedLightRAG(rag, tracer)

    print("Indexing sample document...")
    await pipeline.insert(SAMPLE_DOCUMENT)
    print("Indexing complete.")

    queries = [
        ("What is a knowledge graph?", "local"),
        ("How does LightRAG use global context?", "global"),
        ("Why does graph retrieval beat vector search on multi-hop questions?", "hybrid"),
    ]

    for question, mode in queries:
        print(f"\n[{mode.upper()}] {question}")
        answer = await pipeline.query(question, mode=mode)
        print(f"Answer: {answer[:200]}..." if len(answer) > 200 else f"Answer: {answer}")

    print("\nAll queries complete. Open http://localhost:6006 to inspect traces.")


if __name__ == "__main__":
    asyncio.run(main())

Step 4: Run the pipeline

With Phoenix running and your API key set, execute the pipeline:

python pipeline.py

You should see indexing progress followed by three answers. The run sends one lightrag.insert span and three lightrag.query spans to Phoenix.

Verify it works

Run the verification script below. It checks that the OTel packages are importable, that the otel_setup and instrumented_rag modules load, and that a tracer can be configured. It makes no paid API calls and does not contact Phoenix.

import importlib
import sys

# Verify OTel packages are importable
for pkg in [
    "opentelemetry.sdk.trace",
    "opentelemetry.exporter.otlp.proto.grpc.trace_exporter",
    "openinference.semconv",
]:
    try:
        importlib.import_module(pkg)
        print(f"OK: {pkg}")
    except ImportError as e:
        print(f"MISSING: {pkg} ({e})")
        sys.exit(1)

# Verify our modules load without errors
from otel_setup import configure_tracer, get_tracer
from instrumented_rag import InstrumentedLightRAG

print("OK: otel_setup and instrumented_rag modules loaded")

# Verify span attribute constants are defined
from instrumented_rag import INPUT_VALUE, OUTPUT_VALUE, RETRIEVAL_MODE, RAG_QUERY_LATENCY
assert INPUT_VALUE == "input.value"
assert RETRIEVAL_MODE == "retrieval.mode"
print("OK: OpenInference attribute keys verified")

# Verify tracer provider can be configured (no network call needed)
tracer = configure_tracer("verify-test")
print(f"OK: tracer type = {type(tracer).__name__}")

print("verify_marker_ok")

Expected tail output:

OK: otel_setup and instrumented_rag modules loaded
OK: OpenInference attribute keys verified
OK: tracer type = Tracer
verify_marker_ok

To confirm traces appear in Phoenix, open http://localhost:6006 in your browser after running pipeline.py. The Projects view lists lightrag-pipeline; clicking a trace shows the lightrag.insert and lightrag.query spans with their attributes.

Troubleshooting

ModuleNotFoundError: No module named 'lightrag' — The PyPI package is lightrag-hku, not lightrag. Run uv pip install lightrag-hku and confirm with python -c "import lightrag".

Spans appear in the SDK but not in Phoenix — Phoenix’s OTLP/gRPC port is 4317. Confirm the container is running with docker ps | grep phoenix and that nothing else occupies port 4317. The OTLPSpanExporter is constructed with insecure=True because Phoenix’s local endpoint does not use TLS.
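
A quick way to rule out a networking problem is to test whether anything accepts TCP connections on the exporter's host and port. The sketch below uses only the standard library; otlp_port_is_open is a hypothetical helper name, and a successful connection shows only that some process is listening there, not that it is Phoenix.

```python
import socket
from urllib.parse import urlparse


def otlp_port_is_open(
    endpoint: str = "http://localhost:4317", timeout: float = 2.0
) -> bool:
    """Return True if a TCP connection to the endpoint's host and port succeeds.

    Expects a URL with a scheme, matching the PHOENIX_ENDPOINT default.
    """
    parsed = urlparse(endpoint)
    host = parsed.hostname or "localhost"
    port = parsed.port or 4317
    try:
        # Open and immediately close a connection; no OTLP data is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    print("reachable" if otlp_port_is_open() else "not reachable")
```

If this reports the port as unreachable while docker ps shows the container running, check the -p 4317:4317 mapping in the docker run command.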

BatchSpanProcessor drops spans silently — The default batch timeout is 5 seconds. If your process exits before the batch flushes, call trace.get_tracer_provider().force_flush() before exit, or switch to SimpleSpanProcessor during development.
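
Switching processors by hand is easy to forget, so one option is to choose the processor from an environment variable. The sketch below assumes a variable named OTEL_DEV_MODE, which is a name invented for this example and not a standard OpenTelemetry setting. SimpleSpanProcessor and BatchSpanProcessor are both imported from opentelemetry.sdk.trace.export, the same module otel_setup.py already uses.

```python
import os
from typing import Mapping


def wants_simple_processor(env: Mapping[str, str] = os.environ) -> bool:
    """True when OTEL_DEV_MODE (a hypothetical variable) holds a truthy value."""
    return env.get("OTEL_DEV_MODE", "").strip().lower() in {"1", "true", "yes"}


def build_span_processor(exporter, env: Mapping[str, str] = os.environ):
    """Pick a span processor for the given exporter."""
    # Imported lazily so wants_simple_processor stays usable without the SDK.
    from opentelemetry.sdk.trace.export import (
        BatchSpanProcessor,
        SimpleSpanProcessor,
    )

    if wants_simple_processor(env):
        # Exports each span synchronously when it ends: slower, but nothing
        # is left sitting in a queue when the process exits.
        return SimpleSpanProcessor(exporter)
    # Queues spans and exports them from a background thread.
    return BatchSpanProcessor(exporter)
```

In otel_setup.py this would replace the BatchSpanProcessor(exporter) argument with build_span_processor(exporter), so that running OTEL_DEV_MODE=1 python pipeline.py selects the synchronous processor.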

aquery raises AttributeError: 'LightRAG' object has no attribute 'aquery' — Older pre-release builds used query() (synchronous). Update to the latest lightrag-hku release, which exposes the async aquery and ainsert coroutines.

Token cost attributes are missing from spans — LightRAG does not expose token counts in its public return value today. The span attributes llm.token_count.prompt and llm.token_count.completion require you to patch the LLM function or read usage from the OpenAI response object. See the Next Steps section for a callback-based approach.

Phoenix UI shows no projects after running the pipeline — The BatchSpanProcessor exports asynchronously. Wait a few seconds and refresh. If spans still do not appear, switch to SimpleSpanProcessor temporarily to rule out batching delays.

Next steps

  • Capture token counts from the LLM callback. Wrap gpt_4o_mini_complete in a function that reads response.usage and sets llm.token_count.prompt / llm.token_count.completion on the current span via trace.get_current_span().set_attribute(...). This gives Phoenix the data it needs to render cost-per-query charts.
  • Add chunk-level child spans. LightRAG’s retrieval returns a list of context chunks before the final synthesis call. Iterate that list and open a child span per chunk with its score and entity type, giving you a flame graph of retrieval depth.
  • Route to a different OTLP backend. The same exporter works with Jaeger (docker run jaegertracing/all-in-one), SigNoz, or any commercial vendor that accepts OTLP. Only the endpoint URL and optional headers change; the span structure is identical.
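  • Alert on retrieval latency. Because rag.query_latency_ms is a plain numeric span attribute, any backend with threshold-based alerting can watch it; confirm that your Phoenix deployment or chosen backend offers alerting before relying on it. A rule such as rag.query_latency_ms > 3000 catches graph traversal regressions before users notice them.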
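
The chunk-level child spans suggested above can be sketched as follows. emit_chunk_spans is a hypothetical helper, and the chunk shape (a mapping with optional score and entity_type fields) is an assumption for illustration, not LightRAG's actual return format; adapt the field names to whatever your retrieval step yields. The tracer argument is anything with a start_as_current_span context manager, such as the tracer returned by configure_tracer().

```python
from typing import Any, Iterable, Mapping


def emit_chunk_spans(tracer: Any, chunks: Iterable[Mapping[str, Any]]) -> int:
    """Open one span per retrieved chunk and return how many were emitted."""
    count = 0
    for index, chunk in enumerate(chunks):
        # Started inside an active lightrag.query span, each of these
        # becomes a child of that span.
        with tracer.start_as_current_span("lightrag.chunk") as span:
            span.set_attribute("chunk.index", index)
            # 'score' and 'entity_type' are assumed field names.
            if "score" in chunk:
                span.set_attribute("chunk.score", float(chunk["score"]))
            if "entity_type" in chunk:
                span.set_attribute("chunk.entity_type", str(chunk["entity_type"]))
        count += 1
    return count
```

The chunk.* attribute names are custom, like retrieval.mode; they are not OpenInference keys.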

Frequently Asked Questions

How do I export OpenTelemetry spans from LightRAG to Phoenix?

Configure an OTLPSpanExporter pointing to Phoenix’s OTLP/gRPC endpoint (port 4317), attach it to a TracerProvider through a BatchSpanProcessor, and set that provider as the global tracer provider. The otel_setup.py module in Step 1 handles this configuration.

What OpenInference attributes should I record for RAG queries?

Record input.value (the question), output.value (the answer), llm.token_count.prompt, and llm.token_count.completion, which are OpenInference keys, plus a custom retrieval.mode attribute (local/global/hybrid). Together these let Phoenix compute cost-per-query and identify bottlenecks without post-processing.

Why does LightRAG need instrumentation if it already logs?

LightRAG’s default logging does not surface graph traversal failures, LLM token costs, or cache misses. OpenTelemetry with OpenInference conventions provides structured span attributes that Phoenix can index and alert on, enabling production visibility that LightRAG does not ship with.

How do I capture token counts from the OpenAI LLM function?

LightRAG does not expose token counts in its public return value. Wrap the LLM function so that it reads response.usage and sets llm.token_count.prompt and llm.token_count.completion on the current span via trace.get_current_span().set_attribute().

What should I do if spans are not appearing in Phoenix after running the pipeline?

The BatchSpanProcessor exports asynchronously, flushing its queue on a default 5-second schedule. Wait a few seconds and refresh the Phoenix UI, or switch to SimpleSpanProcessor during development. Use docker ps to confirm that the Phoenix container is running and publishing port 4317.