Why this matters

Most agent tutorials start with a framework: LangGraph, CrewAI, AutoGen. That’s fine for production, but it hides the mechanics. When a tool call silently fails, or the loop runs forever, or token costs spike, you need to know exactly what the runtime is doing at the protocol level.

The learn-claude-code repository [1] demonstrates that a working agent harness needs fewer than 200 lines of bash. The core loop is: send a chat-completion request, check whether the model returned a tool_calls array, dispatch the named function, append the result as a tool role message, and repeat. Everything else (retry logic, cost tracking, trace export) is additive.
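At the protocol level, one tool round trip adds two kinds of messages to the history. The first is an assistant message that carries a tool_calls array. The second is one tool message per call, whose tool_call_id echoes that call's id. The sketch below assembles that history by hand, with a stand-in result in place of a real read_file call. The id call_abc123 and the file notes.txt are made-up values, and the field layout follows OpenAI's chat-completions format.

```python
import json

# Conversation before the model acts
messages = [
    {"role": "system", "content": "You can read files."},
    {"role": "user", "content": "What is in notes.txt?"},
]

# 1. The model replies with a tool call instead of text.
#    `arguments` is a JSON-encoded string, not a nested object.
assistant_msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_abc123",  # made-up id for illustration
            "type": "function",
            "function": {"name": "read_file", "arguments": json.dumps({"path": "notes.txt"})},
        }
    ],
}
messages.append(assistant_msg)

# 2. The harness runs each tool and answers with a `tool` message
#    whose tool_call_id matches the call's id.
for call in assistant_msg["tool_calls"]:
    args = json.loads(call["function"]["arguments"])
    result = {"content": f"(contents of {args['path']})"}  # stand-in for a real tool
    messages.append({"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)})

# 3. The whole list is sent again; the model calls more tools or answers in text.
print([m["role"] for m in messages])  # ['system', 'user', 'assistant', 'tool']
```

Every request resends this growing list, which is why the model can reason over earlier tool results.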

This tutorial builds that loop from scratch. You'll write three pieces:

  • the tool dispatcher and the agent loop, in Python (for clean JSON handling)
  • a thin bash entry point that checks your environment and launches the run
  • a structured logger that emits one JSON object per step

The log records borrow their field names from LangSmith's run schema. With a small transform, you can upload and review full traces in LangSmith without running a framework.

Prerequisites

  • Python 3.11 or later
  • bash 4+ for the entry-point script (macOS ships bash 3.2; install a newer bash with Homebrew)
  • jq 1.6+ installed and on PATH
  • An API key for any OpenAI-compatible endpoint (OpenAI, Together, Groq, a local vLLM server, etc.)
  • Optional: a LangSmith account and LANGSMITH_API_KEY for trace upload

Setup

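Install the Python dependencies. The harness uses httpx for HTTP calls and rich for readable terminal output during development. python-dotenv is optional. The code below reads its settings straight from the environment, but you can use python-dotenv to load them from a .env file.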

uv pip install httpx rich python-dotenv

Export your endpoint credentials. The harness reads these three variables, plus an optional AGENT_MAX_STEPS (default 10). Swap the base URL for any OpenAI-compatible server.

export OPENAI_API_KEY="sk-replace-me"
export OPENAI_BASE_URL="https://api.openai.com/v1"
export AGENT_MODEL="gpt-4o-mini"

Step 1: Define the tools

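The harness ships with three tools: read_file, write_file, and shell_exec. Each tool is a plain Python function that the harness calls directly. The exception to "in-process" is shell_exec, which spawns a subprocess with shell=True, so the model can run any command your user account can. A registry maps the name in each JSON schema to its callable.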

# filename: tools.py
import subprocess
import json
from pathlib import Path


def read_file(path: str) -> dict:
    """Read a file and return its contents."""
    p = Path(path)
    if not p.exists():
        return {"error": f"File not found: {path}"}
    try:
        return {"content": p.read_text(errors="replace"), "size": p.stat().st_size}
    except Exception as exc:
        return {"error": str(exc)}


def write_file(path: str, content: str) -> dict:
    """Write content to a file, creating parent directories as needed."""
    try:
        p = Path(path)
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(content)
        return {"written": p.stat().st_size, "path": str(p)}
    except Exception as exc:
        return {"error": str(exc)}


def shell_exec(command: str, timeout: int = 10) -> dict:
    """Run a shell command and return stdout, stderr, and exit code."""
    try:
        result = subprocess.run(
            command,
            shell=True,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return {
            "stdout": result.stdout,
            "stderr": result.stderr,
            "returncode": result.returncode,
        }
    except subprocess.TimeoutExpired:
        return {"error": f"Command timed out after {timeout}s"}
    except Exception as exc:
        return {"error": str(exc)}


# JSON schemas sent to the model in the `tools` array
TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a file at the given path.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Absolute or relative file path"}
                },
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write text content to a file.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"},
                },
                "required": ["path", "content"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "shell_exec",
            "description": "Execute a shell command and return its output.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {"type": "string"},
                    "timeout": {"type": "integer", "default": 10},
                },
                "required": ["command"],
            },
        },
    },
]

# Dispatch table: name -> callable
TOOL_REGISTRY = {
    "read_file": read_file,
    "write_file": write_file,
    "shell_exec": shell_exec,
}


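def dispatch(name: str, arguments_json: str) -> str:
    """Call the named tool and return its result (or an error) as a JSON string."""
    if name not in TOOL_REGISTRY:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        # Treat an empty arguments string as "no arguments"
        args = json.loads(arguments_json or "{}")
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"Bad arguments JSON: {exc}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "Arguments must be a JSON object"})
    try:
        result = TOOL_REGISTRY[name](**args)
    except TypeError as exc:
        # Missing or unexpected parameter names: report to the model instead of crashing the loop
        return json.dumps({"error": f"Bad arguments for {name}: {exc}"})
    return json.dumps(result)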
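TOOL_SCHEMAS and TOOL_REGISTRY are maintained by hand, so they can drift apart. A schema might name a tool with no callable, or a function might require a parameter the schema never mentions. The sketch below is a startup check for that drift. It runs against a toy registry, where list_dir is a hypothetical tool that is not part of tools.py. Passing TOOL_SCHEMAS and TOOL_REGISTRY would check the real ones.

```python
import inspect
from typing import Callable


def check_tools(schemas: list[dict], registry: dict[str, Callable]) -> list[str]:
    """Report mismatches between tool schemas and the registry; [] means they agree."""
    problems: list[str] = []
    schema_names = {s["function"]["name"] for s in schemas}
    for name in sorted(schema_names - set(registry)):
        problems.append(f"schema without callable: {name}")
    for name in sorted(set(registry) - schema_names):
        problems.append(f"callable without schema: {name}")
    for s in schemas:
        name = s["function"]["name"]
        fn = registry.get(name)
        if fn is None:
            continue
        params = inspect.signature(fn).parameters
        declared = set(s["function"].get("parameters", {}).get("properties", {}))
        takes_kwargs = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
        # Parameters the model may send but the function would reject with TypeError
        if not takes_kwargs:
            for param in sorted(declared - set(params)):
                problems.append(f"{name}: schema declares '{param}' but the function does not accept it")
        # Parameters the function needs but the model is never told about
        for pname, p in params.items():
            variadic = p.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
            if p.default is inspect.Parameter.empty and not variadic and pname not in declared:
                problems.append(f"{name}: function requires '{pname}' but the schema does not declare it")
    return problems


# Toy example: a hypothetical list_dir tool whose schema misspells its parameter
def list_dir(path: str) -> dict:
    return {"entries": []}


bad_schemas = [{"type": "function", "function": {
    "name": "list_dir",
    "parameters": {"type": "object", "properties": {"pth": {"type": "string"}}, "required": ["pth"]},
}}]
for problem in check_tools(bad_schemas, {"list_dir": list_dir}):
    print(problem)
```

A line such as assert not check_tools(TOOL_SCHEMAS, TOOL_REGISTRY) at the bottom of tools.py would surface drift as soon as the module loads.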

Step 2: Write the structured logger

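Every agent step emits one JSON object to a log file. The field names follow LangSmith's run schema (parent_run_id, name, run_type, inputs, outputs, error, start_time, end_time), with two differences:

  • The primary key is called run_id here; LangSmith uses id.
  • Timestamps are epoch seconds rather than date-time strings.

This keeps the file easy to read with jq . /workspace/agent_trace.jsonl during a run. A small transform makes it uploadable to LangSmith afterward.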

# filename: logger.py
import json
import time
import uuid
from pathlib import Path
from typing import Any

LOG_FILE = Path("/workspace/agent_trace.jsonl")


class StepLogger:
    """Append one JSON line per agent step to a JSONL trace file."""

    def __init__(self, session_id: str | None = None):
        self.session_id = session_id or str(uuid.uuid4())
        LOG_FILE.parent.mkdir(parents=True, exist_ok=True)

    def log(
        self,
        name: str,
        run_type: str,
        inputs: dict[str, Any],
        outputs: dict[str, Any],
        start_time: float,
        end_time: float,
        parent_run_id: str | None = None,
        error: str | None = None,
    ) -> str:
        run_id = str(uuid.uuid4())
        record = {
            "run_id": run_id,
            "session_id": self.session_id,
            "parent_run_id": parent_run_id,
            "name": name,
            "run_type": run_type,
            "inputs": inputs,
            "outputs": outputs,
            "error": error,
            "start_time": start_time,
            "end_time": end_time,
            "latency_ms": round((end_time - start_time) * 1000, 2),
        }
        with LOG_FILE.open("a") as fh:
            fh.write(json.dumps(record) + "\n")
        return run_id
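Every LLM record stores the response's usage object, so the trace doubles as a cost log. The sketch below totals tokens across a trace. It assumes the endpoint reports OpenAI-style prompt_tokens and completion_tokens fields. If a server omits usage, its calls add zero.

```python
import json
from pathlib import Path


def token_totals(lines: list[str]) -> dict[str, int]:
    """Sum prompt and completion tokens over all `llm` records in a JSONL trace."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "llm_calls": 0}
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("run_type") != "llm":
            continue
        usage = (record.get("outputs") or {}).get("usage") or {}
        totals["prompt_tokens"] += usage.get("prompt_tokens") or 0
        totals["completion_tokens"] += usage.get("completion_tokens") or 0
        totals["llm_calls"] += 1
    return totals


if __name__ == "__main__":
    trace = Path("/workspace/agent_trace.jsonl")
    if trace.exists():
        print(token_totals(trace.read_text().splitlines()))
```

The loop resends the whole message history on every step, so prompt_tokens per call grows as a run gets longer. That growth is what the per-step usage records let you spot.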

Step 3: Build the agent loop

The loop is the heart of the harness. It mirrors the architecture described in [1]: send messages, check finish_reason, dispatch tools if needed, append results, repeat until the model returns stop or a step limit is reached.

# filename: agent.py
import os
import time
from typing import Any

import httpx
from rich.console import Console
from rich.markup import escape
from rich.panel import Panel

from tools import TOOL_SCHEMAS, dispatch
from logger import StepLogger

console = Console()

API_KEY = os.environ.get("OPENAI_API_KEY", "")
BASE_URL = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1").rstrip("/")
MODEL = os.environ.get("AGENT_MODEL", "gpt-4o-mini")
MAX_STEPS = int(os.environ.get("AGENT_MAX_STEPS", "10"))


def chat_completion(messages: list[dict], tools: list[dict]) -> dict[str, Any]:
    """Call the chat completions endpoint and return the parsed response body."""
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": MODEL,
        "messages": messages,
        "tools": tools,
        "tool_choice": "auto",
    }
    with httpx.Client(timeout=60) as client:
        resp = client.post(f"{BASE_URL}/chat/completions", headers=headers, json=payload)
        resp.raise_for_status()
        return resp.json()


def run_agent(user_prompt: str) -> str:
    """Run the tool-call loop and return the final assistant message."""
    logger = StepLogger()
    messages: list[dict] = [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant with access to file system and shell tools. "
                "Use the tools to complete the user's request. "
                "When you have finished, respond with a plain text summary."
            ),
        },
        {"role": "user", "content": user_prompt},
    ]

    # escape() stops square brackets in user, model, or tool text being parsed as rich markup
    console.print(Panel(f"[bold cyan]Task:[/bold cyan] {escape(user_prompt)}", expand=False))
    parent_run_id: str | None = None

    for step in range(1, MAX_STEPS + 1):
        console.print(f"\n[dim]--- step {step} ---[/dim]")
        t0 = time.time()

        try:
            response = chat_completion(messages, TOOL_SCHEMAS)
        except httpx.HTTPStatusError as exc:
            # The response body usually says what went wrong (bad key, unknown model, ...)
            console.print(f"[red]HTTP error {exc.response.status_code}:[/red] {escape(exc.response.text[:500])}")
            raise

        t1 = time.time()
        choice = response["choices"][0]
        message = choice["message"]
        finish_reason = choice["finish_reason"]
        usage = response.get("usage") or {}

        # Log the LLM call
        llm_run_id = logger.log(
            name=f"llm_step_{step}",
            run_type="llm",
            inputs={"messages": messages, "model": MODEL},
            outputs={"message": message, "usage": usage},
            start_time=t0,
            end_time=t1,
            parent_run_id=parent_run_id,
        )
        # The first LLM call becomes the parent of every later LLM call in the trace
        if parent_run_id is None:
            parent_run_id = llm_run_id

        # Append assistant message to history
        messages.append(message)

        if finish_reason == "stop":
            final_text = message.get("content") or ""
            console.print(Panel(f"[green]{escape(final_text)}[/green]", title="Final answer"))
            # Zero-duration marker run that records the final answer
            logger.log(
                name="agent_finish",
                run_type="chain",
                inputs={"prompt": user_prompt},
                outputs={"answer": final_text, "steps": step},
                start_time=t1,
                end_time=t1,
                parent_run_id=parent_run_id,
            )
            return final_text

        if finish_reason == "tool_calls":
            tool_calls = message.get("tool_calls") or []
            for tc in tool_calls:
                fn_name = tc["function"]["name"]
                fn_args = tc["function"]["arguments"]
                call_id = tc["id"]

                call_repr = f"{fn_name}({fn_args[:80]})"
                console.print(f"  [yellow]tool call:[/yellow] {escape(call_repr)}")

                t2 = time.time()
                result_json = dispatch(fn_name, fn_args)
                t3 = time.time()

                logger.log(
                    name=fn_name,
                    run_type="tool",
                    inputs={"arguments": fn_args},
                    outputs={"result": result_json},
                    start_time=t2,
                    end_time=t3,
                    parent_run_id=llm_run_id,
                )

                messages.append(
                    {
                        "role": "tool",
                        "tool_call_id": call_id,
                        "content": result_json,
                    }
                )
            continue

        # Any other finish_reason, e.g. "length" when the reply hit the token limit
        console.print(f"[red]Unexpected finish_reason: {finish_reason}[/red]")
        break

    return "Agent reached step limit without a final answer."

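You can exercise that control flow without an API key by driving it with a scripted fake model. The sketch below is a stripped-down, self-contained copy of the loop, with no HTTP, logging, or console output. It takes the model call and the dispatcher as plain functions. scripted_model and fake_dispatch are hypothetical stand-ins.

```python
import json
from typing import Callable


def run_loop(complete: Callable[[list[dict]], dict], dispatch: Callable[[str, str], str],
             prompt: str, max_steps: int = 10) -> str:
    """Same control flow as run_agent: call the model, run tools, repeat until stop."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        choice = complete(messages)["choices"][0]
        message = choice["message"]
        messages.append(message)
        if choice["finish_reason"] == "stop":
            return message.get("content") or ""
        if choice["finish_reason"] == "tool_calls":
            for tc in message.get("tool_calls") or []:
                result = dispatch(tc["function"]["name"], tc["function"]["arguments"])
                messages.append({"role": "tool", "tool_call_id": tc["id"], "content": result})
            continue
        break  # unexpected finish_reason
    return "Agent reached step limit without a final answer."


def scripted_model(messages: list[dict]) -> dict:
    """Fake endpoint: first requests a tool, then answers from the tool result."""
    if messages[-1]["role"] == "user":
        call = {"id": "call_1", "type": "function",
                "function": {"name": "shell_exec", "arguments": json.dumps({"command": "echo ping"})}}
        return {"choices": [{"finish_reason": "tool_calls",
                             "message": {"role": "assistant", "content": None, "tool_calls": [call]}}]}
    output = json.loads(messages[-1]["content"])["stdout"].strip()
    return {"choices": [{"finish_reason": "stop",
                         "message": {"role": "assistant", "content": f"The command printed {output}."}}]}


def fake_dispatch(name: str, arguments_json: str) -> str:
    """Stand-in dispatcher that always returns a canned shell_exec result."""
    return json.dumps({"stdout": "ping\n", "stderr": "", "returncode": 0})


print(run_loop(scripted_model, fake_dispatch, "Run echo ping"))  # The command printed ping.
```

You can extend this into regression tests for the loop's edge cases by replacing scripted_model with a function that replays saved API responses.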

Step 4: Write the bash entry point

The bash wrapper does four things:

  • checks that the required environment variables are set
  • takes the task string as its first argument, falling back to a default task
  • runs the Python agent
  • prints a one-line-per-step trace summary with jq

cat > /workspace/run_agent.sh << 'BASH_EOF'
#!/usr/bin/env bash
set -euo pipefail

REQUIRED_VARS=(OPENAI_API_KEY OPENAI_BASE_URL AGENT_MODEL)
for var in "${REQUIRED_VARS[@]}"; do
  if [[ -z "${!var:-}" ]]; then
    echo "ERROR: environment variable $var is not set" >&2
    exit 1
  fi
done

TASK="${1:-List the files in /workspace and summarize what you see.}"
LOG_FILE="/workspace/agent_trace.jsonl"

# Clear previous trace
rm -f "$LOG_FILE"

echo "Running agent with task: $TASK"
python3 /workspace/agent_main.py "$TASK"

echo ""
echo "=== Trace summary ==="
if command -v jq &>/dev/null && [[ -f "$LOG_FILE" ]]; then
  jq -r '[.name, .run_type, (.latency_ms | tostring) + "ms"] | join("  ")' "$LOG_FILE"
else
  echo "(jq not found or no trace file — skipping summary)"
fi
BASH_EOF
chmod +x /workspace/run_agent.sh
echo "run_agent.sh written"

agent.py only defines functions, so the command-line entry point lives in a separate file, agent_main.py, in the same directory. It joins all command-line arguments into the task string and falls back to a default task:

# filename: agent_main.py
import sys
from agent import run_agent

if __name__ == "__main__":
    task = " ".join(sys.argv[1:]) if len(sys.argv) > 1 else "List the files in /workspace."
    run_agent(task)

Step 5: Smoke-test the tool dispatcher

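Before running the full loop (which requires an API key), verify that the tool dispatcher works in isolation. This test calls each tool through dispatch, the same path the agent loop uses, and checks the return shapes. Run it from /workspace so that from tools import dispatch resolves.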

import json
from tools import dispatch

# write_file
result = json.loads(dispatch("write_file", json.dumps({"path": "/workspace/hello.txt", "content": "hello world"})))
assert "written" in result, f"write_file failed: {result}"
print("write_file OK:", result)

# read_file
result = json.loads(dispatch("read_file", json.dumps({"path": "/workspace/hello.txt"})))
assert result["content"] == "hello world", f"read_file mismatch: {result}"
print("read_file OK:", result)

# shell_exec
result = json.loads(dispatch("shell_exec", json.dumps({"command": "echo ping"})))
assert result["returncode"] == 0, f"shell_exec failed: {result}"
print("shell_exec OK:", result)

# unknown tool
result = json.loads(dispatch("nonexistent", "{}"))
assert "error" in result
print("unknown tool OK:", result)

print("All tool dispatch tests passed.")

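Step 6: Smoke-test the logger

This test writes a single record and reads it back from the JSONL file. It deletes /workspace/agent_trace.jsonl first, so don't run it while you still need a previous trace.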

import time
import json
from pathlib import Path
from logger import StepLogger, LOG_FILE

# Remove any previous trace
LOG_FILE.unlink(missing_ok=True)

logger = StepLogger(session_id="test-session")
t0 = time.time()
time.sleep(0.01)
t1 = time.time()

run_id = logger.log(
    name="test_llm",
    run_type="llm",
    inputs={"messages": [{"role": "user", "content": "hello"}]},
    outputs={"message": {"role": "assistant", "content": "hi"}},
    start_time=t0,
    end_time=t1,
)

lines = LOG_FILE.read_text().strip().splitlines()
assert len(lines) == 1, f"Expected 1 log line, got {len(lines)}"
record = json.loads(lines[0])
assert record["run_id"] == run_id
assert record["run_type"] == "llm"
assert record["latency_ms"] > 0
print("Logger test passed. run_id:", run_id)
print("Logged record keys:", list(record.keys()))

Verify it works

Run the full agent against a real endpoint. This step needs a live API key and network access, so run it on your own machine.

export OPENAI_API_KEY="your-key-here"
export OPENAI_BASE_URL="https://api.openai.com/v1"
export AGENT_MODEL="gpt-4o-mini"
bash /workspace/run_agent.sh "Write a Python file at /workspace/fib.py that prints the first 10 Fibonacci numbers, then run it and show me the output."

After the run, inspect the trace:

jq -r '[.name, .run_type, (.latency_ms | tostring) + "ms"] | join("  ")' /workspace/agent_trace.jsonl

Expected output shape (step count and timings vary by task; latencies may print with decimals):

llm_step_1  llm  843ms
shell_exec  tool  12ms
llm_step_2  llm  612ms
write_file  tool  3ms
llm_step_3  llm  590ms
shell_exec  tool  8ms
llm_step_4  llm  480ms
agent_finish  chain  0ms

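To upload the trace to LangSmith for visual review, POST it to the batch-ingest endpoint. The records need a light transform first. LangSmith names the run key id rather than run_id, and it documents its timestamps as date-time strings. The jq filter below renames the key and converts the epoch seconds with todate, which drops sub-second precision. Check the LangSmith API reference for any further fields your version requires; the langsmith SDK, for example, adds trace_id and dotted_order to batched runs.

jq -s '{post: map({id: .run_id, parent_run_id, name, run_type, inputs, outputs, error: .error,
                   start_time: (.start_time | todate), end_time: (.end_time | todate)})}' \
  /workspace/agent_trace.jsonl |
curl -s -X POST https://api.smith.langchain.com/runs/batch \
  -H "x-api-key: $LANGSMITH_API_KEY" \
  -H "Content-Type: application/json" \
  --data-binary @- | jq .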
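The raw trace records don't carry everything LangSmith's batch ingest may expect. They use run_id where LangSmith uses id, they store epoch seconds, and they lack the trace_id and dotted_order fields that the langsmith Python SDK attaches to batched runs. The sketch below is a converter that fills those in. It assumes the dotted_order format the SDK generates: each run contributes a segment made of its UTC start time formatted as %Y%m%dT%H%M%S%fZ followed by its id, and segments are joined with dots from the root down. Verify that format against the current API reference before relying on it. The script name to_langsmith.py is hypothetical.

```python
import json
import sys
from datetime import datetime, timezone
from pathlib import Path


def _utc(ts: float) -> datetime:
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def to_batch(records: list[dict]) -> dict:
    """Convert harness trace records into a {"post": [...]} batch payload."""
    by_id = {r["run_id"]: r for r in records}
    dotted: dict[str, str] = {}
    trace_of: dict[str, str] = {}

    def resolve(run_id: str) -> None:
        # Assumed format: ancestor segments joined by ".", each = start time + run id
        if run_id in dotted:
            return
        rec = by_id[run_id]
        segment = _utc(rec["start_time"]).strftime("%Y%m%dT%H%M%S%fZ") + run_id
        parent = rec.get("parent_run_id")
        if parent in by_id:
            resolve(parent)
            dotted[run_id] = f"{dotted[parent]}.{segment}"
            trace_of[run_id] = trace_of[parent]
        else:
            dotted[run_id] = segment
            trace_of[run_id] = run_id  # a root run is its own trace

    post = []
    for rec in records:
        rid = rec["run_id"]
        resolve(rid)
        parent = rec.get("parent_run_id")
        post.append({
            "id": rid,
            "trace_id": trace_of[rid],
            "dotted_order": dotted[rid],
            "parent_run_id": parent if parent in by_id else None,
            "name": rec["name"],
            "run_type": rec["run_type"],
            "inputs": rec["inputs"],
            "outputs": rec["outputs"],
            "error": rec.get("error"),
            "start_time": _utc(rec["start_time"]).isoformat(),
            "end_time": _utc(rec["end_time"]).isoformat(),
        })
    return {"post": post}


# Runs only when given an existing trace file as the single argument
if __name__ == "__main__" and len(sys.argv) == 2 and Path(sys.argv[1]).is_file():
    lines = Path(sys.argv[1]).read_text().splitlines()
    print(json.dumps(to_batch([json.loads(line) for line in lines if line.strip()])))
```

To send the converted payload, pipe the script's output to curl:

python3 to_langsmith.py /workspace/agent_trace.jsonl | curl -s -X POST https://api.smith.langchain.com/runs/batch -H "x-api-key: $LANGSMITH_API_KEY" -H "Content-Type: application/json" --data-binary @-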

Troubleshooting

ERROR: environment variable OPENAI_API_KEY is not set. The script checks each required variable before doing anything else. A plain assignment such as OPENAI_API_KEY=... only creates a shell variable, and child processes like run_agent.sh never see it. Use export OPENAI_API_KEY=... in the same shell session that runs the script.
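You can see the difference without the harness: a child shell only inherits variables that its parent exported. This sketch runs sh from Python. DEMO_AGENT_VAR is a throwaway name used only for this demonstration.

```python
import os
import subprocess

# Start from the current environment minus the demo variable, so the result
# does not depend on what your shell already has set.
env = {k: v for k, v in os.environ.items() if k != "DEMO_AGENT_VAR"}
CHILD = "sh -c 'echo \"${DEMO_AGENT_VAR:-unset}\"'"


def child_sees(setup: str) -> str:
    """Run `setup` in a shell, then report what a child shell sees."""
    out = subprocess.run(["sh", "-c", f"{setup}; {CHILD}"], env=env,
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()


print(child_sees("DEMO_AGENT_VAR=1"))         # plain assignment: prints "unset"
print(child_sees("export DEMO_AGENT_VAR=1"))  # exported: prints "1"
```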

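httpx.HTTPStatusError: 401 Unauthorized. The harness sends Authorization: Bearer <key>. OpenAI uses this scheme, as do compatible providers such as Together and Groq, so first check that the key belongs to the endpoint in OPENAI_BASE_URL. Azure OpenAI works differently:

  • it expects the key in an api-key header
  • BASE_URL must include the deployment path
  • its deployment-based endpoints also require an api-version query parameter on every request

To target Azure, change the header in chat_completion to "api-key": API_KEY and add the api-version parameter to the request URL.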

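The loop hits MAX_STEPS without finishing. Either the model is repeating the same tool calls, or the task needs more steps than the limit allows. The trace already records every call, so list the tool calls in order to see which it is:

jq -c 'select(.run_type == "tool") | {name, arguments: .inputs.arguments}' /workspace/agent_trace.jsonl

Then narrow the task, or raise the limit for longer tasks with export AGENT_MAX_STEPS=20.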
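The loop can also catch a cycle on its own. The sketch below is a hypothetical guard, not part of the harness, that counts identical tool calls after normalizing their JSON arguments. The threshold of 3 is an arbitrary choice. In run_agent you might create one guard per run and break out of the loop when is_repeating returns True.

```python
import json
from collections import Counter


class RepeatGuard:
    """Flags a tool call once the same name + arguments have been seen `limit` times."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.seen: Counter[tuple[str, str]] = Counter()

    def is_repeating(self, name: str, arguments_json: str) -> bool:
        try:
            # Normalize so key order and whitespace don't make identical calls look different
            key_args = json.dumps(json.loads(arguments_json), sort_keys=True)
        except json.JSONDecodeError:
            key_args = arguments_json
        self.seen[(name, key_args)] += 1
        return self.seen[(name, key_args)] >= self.limit
```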

The trace summary is skipped. The script checks for jq before printing the summary and prints a fallback message if jq is missing. Install jq with apt-get install -y jq (Debian/Ubuntu) or brew install jq (macOS).
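Where jq cannot be installed, a few lines of Python produce the same one-line-per-step summary. The function name summarize is hypothetical. One small difference: Python prints 0.0 where jq prints 0.

```python
import json
import sys
from pathlib import Path


def summarize(lines: list[str]) -> list[str]:
    """Format each trace record as 'name  run_type  <latency>ms'."""
    rows = []
    for line in lines:
        if line.strip():
            r = json.loads(line)
            rows.append(f"{r['name']}  {r['run_type']}  {r['latency_ms']}ms")
    return rows


# Runs only when given an existing trace file as the single argument
if __name__ == "__main__" and len(sys.argv) == 2 and Path(sys.argv[1]).is_file():
    print("\n".join(summarize(Path(sys.argv[1]).read_text().splitlines())))
```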

ModuleNotFoundError: No module named 'tools'. agent.py and agent_main.py import their sibling modules by bare name. Python finds those modules only because it puts the running script's own directory on sys.path. Keep tools.py, logger.py, agent.py, and agent_main.py together in /workspace, or add their directory to PYTHONPATH if you import them from elsewhere.

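Tool results are too large. Nothing limits the size of result_json. Reading a big file or running a verbose command therefore sends the whole output to the model on every later step, which inflates token costs. It can also push the request past the model's context window, which typically fails with an HTTP 400 error. Define a MAX_TOOL_OUTPUT constant in tools.py and truncate result_json before it is appended as a tool message.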
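A minimal version of that cap is sketched below. truncate_tool_output is a hypothetical helper name, and the 8,000-character default is arbitrary; note that it counts characters, not tokens. The helper appends a marker so the model knows the output was cut. The truncated string is no longer valid JSON, which is acceptable because a tool message's content is plain text.

```python
MAX_TOOL_OUTPUT = 8_000  # characters; tune for your model's context window


def truncate_tool_output(result_json: str, limit: int = MAX_TOOL_OUTPUT) -> str:
    """Cut an oversized tool result and say so, instead of sending it whole."""
    if len(result_json) <= limit:
        return result_json
    omitted = len(result_json) - limit
    return result_json[:limit] + f"\n...[truncated {omitted} characters]"
```

In run_agent this would wrap the dispatch call: result_json = truncate_tool_output(dispatch(fn_name, fn_args)).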

Next steps

  • Add a search_web tool using the SerpAPI or Brave Search API. The dispatcher pattern in tools.py makes adding a new tool a three-step process: write the function, add the JSON schema to TOOL_SCHEMAS, and add the entry to TOOL_REGISTRY.
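  • Stream the LLM response by adding "stream": true to the request payload and accumulating the delta chunks the server sends back as server-sent events. Tool calls arrive in fragments too, so each call's arguments string must be reassembled before dispatch. Streaming lets the terminal show partial output in real time and reduces perceived latency on long answers.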
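  • Export to OpenTelemetry by replacing the JSONL logger with an OTLP span exporter. Each logger.log call maps naturally to a span: parent_run_id gives the parent-child link, and start_time and end_time give the span's timestamps. The IDs don't carry over directly, because OpenTelemetry span IDs are 8 bytes while run_id is a 16-byte UUID. Let the SDK generate span IDs and keep run_id as a span attribute.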
  • Wrap the loop in a FastAPI endpoint to turn the harness into an HTTP service: POST a task, and get back the final answer plus a session_id for retrieving the trace. run_agent currently returns only the text, so it would also need to return logger.session_id.

FAQ

How does the agent loop handle tool calls from the model?

The loop checks the model’s finish_reason field. When it equals tool_calls, the agent extracts each tool name and arguments from the response, dispatches the function via a Python registry, and appends the result as a tool role message before sending the next request.

What tools are included in the harness by default?

The harness ships with three built-in tools:

  • read_file reads a file's contents.
  • write_file writes text to a file, creating parent directories as needed.
  • shell_exec runs a shell command in a subprocess, with a timeout of 10 seconds by default.

Each tool is a Python function with a corresponding JSON schema sent to the model.

How can I upload the agent trace to LangSmith?

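The logger writes one JSON object per step to a JSONL file, using field names modeled on LangSmith's run schema (run_id, parent_run_id, name, run_type, inputs, outputs, start_time, end_time). Before POSTing the records to LangSmith's batch-ingest endpoint with curl and your API key, rename run_id to id and convert the epoch timestamps to ISO 8601 strings. Also check the LangSmith API reference for any other fields the endpoint requires.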

What happens if the model returns an unexpected finish reason?

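The loop prints a warning to the console and breaks out of the step iteration; the warning is not written to the trace. run_agent then returns the same "Agent reached step limit without a final answer." string it uses when MAX_STEPS runs out, so check the console output to tell the two cases apart. One value you may see is length, which means the reply hit the token limit.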

Can this harness work with endpoints other than OpenAI?

Yes, provided the model behind the endpoint supports tool calling. Together, Groq, and vLLM accept the same request format and Bearer-token header, so changing OPENAI_BASE_URL (and AGENT_MODEL) is usually enough. Azure OpenAI additionally needs the key in an api-key header and an api-version query parameter on each request.