Why this matters
Most agent tutorials start with a framework: LangGraph, CrewAI, AutoGen. That’s fine for production, but it hides the mechanics. When a tool call silently fails, or the loop runs forever, or token costs spike, you need to know exactly what the runtime is doing at the protocol level.
The learn-claude-code repository [1] demonstrates that a working agent harness needs fewer than 200 lines of bash. The core loop is: send a chat-completion request, check whether the model returned a tool_calls array, dispatch the named function, append the result as a tool role message, and repeat. Everything else (retry logic, cost tracking, trace export) is additive.
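At the protocol level, two message shapes do all the work. The sketch below uses the OpenAI chat-completions format. An assistant message carries a tool_calls array, and each entry's function.arguments is a JSON-encoded string, not a dict. Each call is answered by one tool role message, linked to it by tool_call_id. The helper names tool_calls_of and tool_result_message are hypothetical, and the call ID is a made-up placeholder.

```python
import json


def tool_calls_of(message: dict) -> list[dict]:
    """Return the tool calls in an assistant message (empty list if none)."""
    # content may be null when the model only calls tools
    return message.get("tool_calls") or []


def tool_result_message(tool_call: dict, result: dict) -> dict:
    """Build the tool-role message that answers one tool call."""
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],  # must echo the id of the call it answers
        "content": json.dumps(result),    # content is always a string
    }


# An assistant message as it appears in response["choices"][0]["message"].
# The id value is an illustrative placeholder.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_example_1",
            "type": "function",
            "function": {
                "name": "read_file",
                # arguments arrive as a JSON string and must be decoded
                "arguments": "{\"path\": \"README.md\"}",
            },
        }
    ],
}

for call in tool_calls_of(assistant_message):
    args = json.loads(call["function"]["arguments"])
    reply = tool_result_message(call, {"path": args["path"], "content": "(file text here)"})
    print(reply)
```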
This tutorial builds that loop from scratch. You'll write three pieces:

- the tool dispatcher and the agent loop, in Python (for clean JSON handling)
- a thin bash entry point that validates the environment and summarizes the trace
- a structured logger that emits one JSON object per step

The log records borrow their field names from LangSmith's run schema. You can inspect full traces with jq and map them onto LangSmith's batch-ingest API without running a framework.
Prerequisites
- Python 3.11 or later
- bash 4+ (macOS ships bash 3; install a newer bash with Homebrew: brew install bash)
- jq 1.6+ installed and on PATH
- An API key for any OpenAI-compatible endpoint (OpenAI, Together, Groq, a local vLLM server, etc.)
- Optional: a LangSmith account and LANGSMITH_API_KEY for trace upload
Setup
Install the Python dependencies. The harness uses httpx for HTTP calls and rich for readable terminal output during development. python-dotenv is optional, since the harness code itself only reads plain environment variables.
uv pip install httpx rich python-dotenv
Export your endpoint credentials. The harness reads these three variables; swap the base URL for any OpenAI-compatible server.
export OPENAI_API_KEY="sk-replace-me"
export OPENAI_BASE_URL="https://api.openai.com/v1"
export AGENT_MODEL="gpt-4o-mini"
Step 1: Define the tools
The harness ships with three tools that run locally, with no external services: read_file, write_file, and shell_exec (which runs commands in a subprocess). Each tool is a Python function. A registry maps the name used in its JSON schema to the callable.
# filename: tools.py
import json
import subprocess
from pathlib import Path


def read_file(path: str) -> dict:
    """Read a file and return its contents."""
    p = Path(path)
    if not p.exists():
        return {"error": f"File not found: {path}"}
    try:
        return {"content": p.read_text(errors="replace"), "size": p.stat().st_size}
    except Exception as exc:
        return {"error": str(exc)}


def write_file(path: str, content: str) -> dict:
    """Write content to a file, creating parent directories as needed."""
    try:
        p = Path(path)
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(content)
        return {"written": p.stat().st_size, "path": str(p)}
    except Exception as exc:
        return {"error": str(exc)}


def shell_exec(command: str, timeout: int = 10) -> dict:
    """Run a shell command and return stdout, stderr, and exit code."""
    try:
        result = subprocess.run(
            command,
            shell=True,  # run through the shell so pipes and globs work
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return {
            "stdout": result.stdout,
            "stderr": result.stderr,
            "returncode": result.returncode,
        }
    except subprocess.TimeoutExpired:
        return {"error": f"Command timed out after {timeout}s"}
    except Exception as exc:
        return {"error": str(exc)}


# JSON schemas sent to the model in the `tools` array
TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a file at the given path.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Absolute or relative file path"}
                },
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write text content to a file.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"},
                },
                "required": ["path", "content"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "shell_exec",
            "description": "Execute a shell command and return its output.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {"type": "string"},
                    "timeout": {"type": "integer", "default": 10},
                },
                "required": ["command"],
            },
        },
    },
]

# Dispatch table: name -> callable
TOOL_REGISTRY = {
    "read_file": read_file,
    "write_file": write_file,
    "shell_exec": shell_exec,
}


def dispatch(name: str, arguments_json: str) -> str:
    """Call the named tool and return its result as a JSON string.

    Errors come back as {"error": ...} payloads instead of exceptions, so a bad
    tool call is reported to the model rather than crashing the loop.
    """
    if name not in TOOL_REGISTRY:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        # Treat an empty arguments string as "no arguments"
        args = json.loads(arguments_json or "{}")
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"Bad arguments JSON: {exc}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "Tool arguments must be a JSON object"})
    try:
        result = TOOL_REGISTRY[name](**args)
    except TypeError as exc:
        # Missing or unexpected keyword arguments sent by the model
        return json.dumps({"error": f"Bad arguments for {name}: {exc}"})
    return json.dumps(result)
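The model only sees TOOL_SCHEMAS, so it can still send arguments that the Python function does not accept. One option is to check the decoded arguments against the schema's required and properties lists before calling the tool. The sketch below is a hypothetical validate_args helper, not part of the harness. It checks only top-level keys and a handful of JSON Schema type names, so it is far from a full JSON Schema validator.

```python
import json

# A subset of JSON Schema type names mapped to Python types.
# bool is rejected separately for integer/number, because bool subclasses int.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_args(schema: dict, args: dict) -> list[str]:
    """Return a list of problems with args for one tool schema (empty if OK)."""
    params = schema["function"]["parameters"]
    props = params.get("properties", {})
    problems = []
    for key in params.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, value in args.items():
        if key not in props:
            problems.append(f"unexpected argument: {key}")
            continue
        expected = props[key].get("type")
        py_type = _JSON_TYPES.get(expected)
        if py_type is None:
            continue  # type not declared, or not handled by this sketch
        if isinstance(value, bool) and expected in ("integer", "number"):
            problems.append(f"{key} should be {expected}, got boolean")
        elif not isinstance(value, py_type):
            problems.append(f"{key} should be {expected}, got {type(value).__name__}")
    return problems


shell_schema = {
    "type": "function",
    "function": {
        "name": "shell_exec",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string"},
                "timeout": {"type": "integer", "default": 10},
            },
            "required": ["command"],
        },
    },
}

print(validate_args(shell_schema, json.loads('{"command": "ls", "timeout": "5"}')))
```

If the list is non-empty, dispatch could return it as an error payload. The model then sees exactly what was wrong and can correct its call on the next step.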
Step 2: Write the structured logger
Every agent step emits one JSON object to a log file. The record borrows its field names from LangSmith's run schema: run_id, parent_run_id, name, run_type, inputs, outputs, start_time, and end_time. It also adds session_id, error, and latency_ms. During a run you can watch the trace with cat agent_trace.jsonl | jq . and afterward upload the records to LangSmith, renaming fields first if the API requires it.
# filename: logger.py
import json
import time
import uuid
from pathlib import Path
from typing import Any

LOG_FILE = Path("/workspace/agent_trace.jsonl")


class StepLogger:
    """Append one JSON line per agent step to a JSONL trace file."""

    def __init__(self, session_id: str | None = None):
        self.session_id = session_id or str(uuid.uuid4())
        LOG_FILE.parent.mkdir(parents=True, exist_ok=True)

    def log(
        self,
        name: str,
        run_type: str,
        inputs: dict[str, Any],
        outputs: dict[str, Any],
        start_time: float,  # time.time() epoch seconds
        end_time: float,
        parent_run_id: str | None = None,
        error: str | None = None,
    ) -> str:
        run_id = str(uuid.uuid4())
        record = {
            "run_id": run_id,
            "session_id": self.session_id,
            "parent_run_id": parent_run_id,
            "name": name,
            "run_type": run_type,
            "inputs": inputs,
            "outputs": outputs,
            "error": error,
            "start_time": start_time,
            "end_time": end_time,
            "latency_ms": round((end_time - start_time) * 1000, 2),
        }
        with LOG_FILE.open("a") as fh:
            fh.write(json.dumps(record) + "\n")
        return run_id
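Every record carries run_id and parent_run_id, so the flat JSONL file can be turned back into a tree. The sketch below is a hypothetical build_tree_lines helper, not part of the harness. It groups records by parent and renders them indented by depth, with their latency. This is a quick way to see which LLM step triggered which tool calls.

```python
import json
from collections import defaultdict


def build_tree_lines(records: list[dict]) -> list[str]:
    """Render trace records as indented lines, children under their parent."""
    children: dict[str, list[dict]] = defaultdict(list)
    ids = {r["run_id"] for r in records}
    roots = []
    for r in records:
        parent = r.get("parent_run_id")
        if parent in ids:
            children[parent].append(r)
        else:
            roots.append(r)  # no parent, or parent not present in this file

    lines: list[str] = []

    def walk(record: dict, depth: int) -> None:
        indent = "  " * depth
        lines.append(f"{indent}{record['name']} ({record['run_type']}, {record['latency_ms']}ms)")
        for child in children[record["run_id"]]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


def load_trace(path: str) -> list[dict]:
    """Read a JSONL trace file written by StepLogger."""
    with open(path) as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

After a run, print("\n".join(build_tree_lines(load_trace("/workspace/agent_trace.jsonl")))) shows the hierarchy. In this harness, later LLM steps hang under the first LLM run, and each tool call hangs under the LLM step that requested it.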
Step 3: Build the agent loop
The loop is the heart of the harness. It mirrors the architecture described in [1]: send messages, check finish_reason, dispatch tools if needed, append results, repeat until the model returns stop or a step limit is reached.
# filename: agent.py
import os
import time
from typing import Any

import httpx
from rich.console import Console
from rich.panel import Panel

from tools import TOOL_SCHEMAS, dispatch
from logger import StepLogger

console = Console()

API_KEY = os.environ.get("OPENAI_API_KEY", "")
BASE_URL = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1").rstrip("/")
MODEL = os.environ.get("AGENT_MODEL", "gpt-4o-mini")
MAX_STEPS = int(os.environ.get("AGENT_MAX_STEPS", "10"))


def chat_completion(messages: list[dict], tools: list[dict]) -> dict[str, Any]:
    """Call the chat completions endpoint and return the parsed response body."""
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": MODEL,
        "messages": messages,
        "tools": tools,
        "tool_choice": "auto",
    }
    with httpx.Client(timeout=60) as client:
        resp = client.post(f"{BASE_URL}/chat/completions", headers=headers, json=payload)
        resp.raise_for_status()
        return resp.json()


def run_agent(user_prompt: str) -> str:
    """Run the tool-call loop and return the final assistant message."""
    logger = StepLogger()
    messages: list[dict] = [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant with access to file system and shell tools. "
                "Use the tools to complete the user's request. "
                "When you have finished, respond with a plain text summary."
            ),
        },
        {"role": "user", "content": user_prompt},
    ]
    console.print(Panel(f"[bold cyan]Task:[/bold cyan] {user_prompt}", expand=False))
    parent_run_id: str | None = None

    for step in range(1, MAX_STEPS + 1):
        console.print(f"\n[dim]--- step {step} ---[/dim]")
        t0 = time.time()
        try:
            response = chat_completion(messages, TOOL_SCHEMAS)
        except httpx.HTTPStatusError as exc:
            console.print(f"[red]HTTP error: {exc.response.status_code}[/red]")
            raise
        t1 = time.time()

        choice = response["choices"][0]
        message = choice["message"]
        finish_reason = choice["finish_reason"]
        usage = response.get("usage", {})

        # Log the LLM call; the first LLM run becomes the parent of later runs
        llm_run_id = logger.log(
            name=f"llm_step_{step}",
            run_type="llm",
            inputs={"messages": messages, "model": MODEL},
            outputs={"message": message, "usage": usage},
            start_time=t0,
            end_time=t1,
            parent_run_id=parent_run_id,
        )
        if parent_run_id is None:
            parent_run_id = llm_run_id

        # Append the assistant message; its tool calls must precede their results
        messages.append(message)

        # Some OpenAI-compatible servers may report "stop" even when tool calls
        # are present, so check the tool_calls array as well as finish_reason.
        tool_calls = message.get("tool_calls") or []

        if finish_reason == "stop" and not tool_calls:
            final_text = message.get("content") or ""
            console.print(Panel(f"[green]{final_text}[/green]", title="Final answer"))
            logger.log(
                name="agent_finish",
                run_type="chain",
                inputs={"prompt": user_prompt},
                outputs={"answer": final_text, "steps": step},
                # Zero-duration marker record for the end of the run
                start_time=t1,
                end_time=t1,
                parent_run_id=parent_run_id,
            )
            return final_text

        if finish_reason == "tool_calls" or tool_calls:
            for tc in tool_calls:
                fn_name = tc["function"]["name"]
                fn_args = tc["function"]["arguments"]
                call_id = tc["id"]
                console.print(f"  [yellow]tool call:[/yellow] {fn_name}({fn_args[:80]})")
                t2 = time.time()
                result_json = dispatch(fn_name, fn_args)
                t3 = time.time()
                logger.log(
                    name=fn_name,
                    run_type="tool",
                    inputs={"arguments": fn_args},
                    outputs={"result": result_json},
                    start_time=t2,
                    end_time=t3,
                    parent_run_id=llm_run_id,
                )
                # One tool-role message per call, linked by tool_call_id
                messages.append(
                    {
                        "role": "tool",
                        "tool_call_id": call_id,
                        "content": result_json,
                    }
                )
            continue

        # Unexpected finish reason (for example "length"): stop the loop
        console.print(f"[red]Unexpected finish_reason: {finish_reason}[/red]")
        break

    return "Agent reached step limit without a final answer."
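You can exercise the control flow without an API key by injecting the model call. The sketch below is a stripped-down, hypothetical run_loop variant of run_agent that takes call_model and dispatch as parameters. It is paired with a scripted fake model (scripted_model) that replays canned responses in the chat-completions shape, and a stand-in echo tool. It drops logging and console output, so treat it as a testing aid, not a replacement for agent.py.

```python
import json
from typing import Callable

CallModel = Callable[[list[dict]], dict]
Dispatch = Callable[[str, str], str]


def run_loop(call_model: CallModel, dispatch: Dispatch, prompt: str, max_steps: int = 10) -> str:
    """Same control flow as run_agent, minus logging and console output."""
    messages: list[dict] = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        choice = call_model(messages)["choices"][0]
        message = choice["message"]
        messages.append(message)
        tool_calls = message.get("tool_calls") or []
        if choice["finish_reason"] == "stop" and not tool_calls:
            return message.get("content") or ""
        if not tool_calls:
            break  # unexpected finish_reason, e.g. "length"
        for tc in tool_calls:
            result = dispatch(tc["function"]["name"], tc["function"]["arguments"])
            messages.append({"role": "tool", "tool_call_id": tc["id"], "content": result})
    return "Agent reached step limit without a final answer."


def scripted_model(responses: list[dict]) -> CallModel:
    """Return a fake model that replays canned responses in order, ignoring its input."""
    queue = list(responses)

    def call(messages: list[dict]) -> dict:
        return queue.pop(0)

    return call


def echo_dispatch(name: str, arguments_json: str) -> str:
    """Hypothetical stand-in for tools.dispatch: echoes the 'text' argument."""
    return json.dumps({"tool": name, "echo": json.loads(arguments_json)["text"]})


# Canned responses: one tool call, then a final answer.
RESPONSES = [
    {"choices": [{"finish_reason": "tool_calls", "message": {
        "role": "assistant",
        "content": None,
        "tool_calls": [{"id": "call_1", "type": "function",
                        "function": {"name": "echo", "arguments": json.dumps({"text": "hi"})}}],
    }}]},
    {"choices": [{"finish_reason": "stop", "message": {
        "role": "assistant", "content": "The tool said hi."}}]},
]

print(run_loop(scripted_model(RESPONSES), echo_dispatch, "Say hi via the echo tool."))
```

Passing the model call in as a function keeps the loop itself identical between tests and real runs. Only the transport changes.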
Step 4: Write the bash entry point
The bash wrapper validates that the required environment variables are set, takes the task string as its first argument, and calls the Python entry point. After the run, it prints a one-line-per-step trace summary with jq.
cat > /workspace/run_agent.sh << 'BASH_EOF'
#!/usr/bin/env bash
set -euo pipefail

# Fail early if any required variable is missing or empty
REQUIRED_VARS=(OPENAI_API_KEY OPENAI_BASE_URL AGENT_MODEL)
for var in "${REQUIRED_VARS[@]}"; do
  if [[ -z "${!var:-}" ]]; then
    echo "ERROR: environment variable $var is not set" >&2
    exit 1
  fi
done

TASK="${1:-List the files in /workspace and summarize what you see.}"
LOG_FILE="/workspace/agent_trace.jsonl"

# Clear previous trace
rm -f "$LOG_FILE"

echo "Running agent with task: $TASK"
# agent.py only defines run_agent; agent_main.py is the command-line entry point
python /workspace/agent_main.py "$TASK"

echo ""
echo "=== Trace summary ==="
if command -v jq &>/dev/null && [[ -f "$LOG_FILE" ]]; then
  jq -r '[.name, .run_type, (.latency_ms | tostring) + "ms"] | join(" ")' "$LOG_FILE"
else
  echo "(jq not found or no trace file; skipping summary)"
fi
BASH_EOF
chmod +x /workspace/run_agent.sh
echo "run_agent.sh written"
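If jq is not available, a similar summary can be produced in Python. The sketch below is a hypothetical summarize_trace helper, not part of the harness. It prints roughly the same name, run type, and latency columns as the jq filter; Python and jq format some floats differently. It then appends total latency per run type, which shows at a glance whether the time goes to model calls or to tools.

```python
import json
from collections import defaultdict


def summarize_trace(lines: list[str]) -> list[str]:
    """Mirror the jq summary, then append total latency per run_type."""
    out = []
    totals: dict[str, float] = defaultdict(float)
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the JSONL file
        rec = json.loads(line)
        out.append(f"{rec['name']} {rec['run_type']} {rec['latency_ms']}ms")
        totals[rec["run_type"]] += rec["latency_ms"]
    for run_type, total in sorted(totals.items()):
        out.append(f"total {run_type}: {round(total, 2)}ms")
    return out
```

To use it, read the trace with open("/workspace/agent_trace.jsonl").read().splitlines(), pass the lines to summarize_trace, and print the returned lines.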
agent.py defines run_agent but has no command-line entry point of its own. Create a small agent_main.py next to the other modules. It reads the task from the command-line arguments and calls run_agent:
# filename: agent_main.py
import sys

from agent import run_agent

if __name__ == "__main__":
    # Join all arguments so an unquoted multi-word task still works
    task = " ".join(sys.argv[1:]) if len(sys.argv) > 1 else "List the files in /workspace."
    run_agent(task)
Step 5: Smoke-test the tool dispatcher
Before running the full loop (which requires an API key), verify that the tool dispatcher works correctly in isolation. This test calls each tool directly and checks the return shapes.
import json
from tools import dispatch
# write_file
result = json.loads(dispatch("write_file", json.dumps({"path": "/workspace/hello.txt", "content": "hello world"})))
assert "written" in result, f"write_file failed: {result}"
print("write_file OK:", result)
# read_file
result = json.loads(dispatch("read_file", json.dumps({"path": "/workspace/hello.txt"})))
assert result["content"] == "hello world", f"read_file mismatch: {result}"
print("read_file OK:", result)
# shell_exec
result = json.loads(dispatch("shell_exec", json.dumps({"command": "echo ping"})))
assert result["returncode"] == 0, f"shell_exec failed: {result}"
print("shell_exec OK:", result)
# unknown tool
result = json.loads(dispatch("nonexistent", "{}"))
assert "error" in result
print("unknown tool OK:", result)
print("All tool dispatch tests passed.")
Step 6: Smoke-test the logger
import time
import json
from pathlib import Path
from logger import StepLogger, LOG_FILE
# Remove any previous trace
LOG_FILE.unlink(missing_ok=True)
logger = StepLogger(session_id="test-session")
t0 = time.time()
time.sleep(0.01)
t1 = time.time()
run_id = logger.log(
name="test_llm",
run_type="llm",
inputs={"messages": [{"role": "user", "content": "hello"}]},
outputs={"message": {"role": "assistant", "content": "hi"}},
start_time=t0,
end_time=t1,
)
lines = LOG_FILE.read_text().strip().splitlines()
assert len(lines) == 1, f"Expected 1 log line, got {len(lines)}"
record = json.loads(lines[0])
assert record["run_id"] == run_id
assert record["run_type"] == "llm"
assert record["latency_ms"] > 0
print("Logger test passed. run_id:", run_id)
print("Logged record keys:", list(record.keys()))
Verify it works
Run the full agent against a real endpoint. This step needs a live API key, so run it on your own machine.
export OPENAI_API_KEY="your-key-here"
export OPENAI_BASE_URL="https://api.openai.com/v1"
export AGENT_MODEL="gpt-4o-mini"
bash /workspace/run_agent.sh "Write a Python file at /workspace/fib.py that prints the first 10 Fibonacci numbers, then run it and show me the output."
After the run, inspect the trace:
jq -r '[.name, .run_type, (.latency_ms | tostring) + "ms"] | join(" ")' /workspace/agent_trace.jsonl
Expected output shape (step count, tool order, and latencies vary by task):
llm_step_1 llm 843ms
shell_exec tool 12ms
llm_step_2 llm 612ms
write_file tool 3ms
llm_step_3 llm 590ms
shell_exec tool 8ms
llm_step_4 llm 480ms
agent_finish chain 0ms
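Token costs can be checked from the same trace file, because each llm record stores the response's usage object under outputs. The sketch below is a hypothetical token_usage helper. It sums prompt_tokens and completion_tokens over llm records and reports the largest single prompt. It assumes the endpoint returns the standard usage fields and counts any missing ones as zero.

```python
def token_usage(records: list[dict]) -> dict[str, int]:
    """Sum token counts over llm records; missing usage fields count as 0."""
    prompt = completion = largest_prompt = 0
    for rec in records:
        if rec.get("run_type") != "llm":
            continue  # tool and chain records carry no usage
        usage = (rec.get("outputs") or {}).get("usage") or {}
        p = usage.get("prompt_tokens", 0)
        prompt += p
        completion += usage.get("completion_tokens", 0)
        largest_prompt = max(largest_prompt, p)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "largest_prompt": largest_prompt,
    }
```

Load the records with json.loads on each line of /workspace/agent_trace.jsonl. run_agent resends the whole message history on every step, so prompt_tokens tends to grow step by step. An unusually large largest_prompt often points at one oversized tool result.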
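To upload the trace to LangSmith for visual review, wrap the records in a post array and send them to the batch-ingest endpoint. The logger's field names only approximate LangSmith's run schema. For example, LangSmith identifies runs by id rather than run_id and expects ISO-8601 timestamps. Check the LangSmith API reference and transform the records if the request is rejected:
jq -s '{post: .}' /workspace/agent_trace.jsonl | curl -s -X POST https://api.smith.langchain.com/runs/batch \
  -H "x-api-key: $LANGSMITH_API_KEY" \
  -H "Content-Type: application/json" \
  --data-binary @- \
  -w '%{http_code}\n'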
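If the batch endpoint rejects the raw records, the mismatch is most likely in field names and timestamp formats. The sketch below converts trace records into a {"post": [...]} payload. It rests on explicit assumptions that you should verify against the current LangSmith API reference:

- LangSmith identifies runs by id.
- It accepts ISO-8601 UTC timestamps.
- It routes runs to a project via session_name.

The logger's own session_id is a random UUID, not a LangSmith project ID, so the sketch drops it. Newer LangSmith versions may also require trace_id and dotted_order, which this sketch does not generate. to_langsmith_payload is a hypothetical helper name.

```python
import json
from datetime import datetime, timezone


def _iso(ts: float) -> str:
    """Convert a time.time() float (as StepLogger writes it) to ISO-8601 UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


def to_langsmith_payload(records: list[dict], project_name: str) -> dict:
    """Map logger records to a batch-ingest payload; field names are assumptions."""
    runs = []
    for rec in records:
        run = {
            "id": rec["run_id"],                  # assumption: LangSmith uses "id"
            "parent_run_id": rec.get("parent_run_id"),
            "name": rec["name"],
            "run_type": rec["run_type"],
            "inputs": rec.get("inputs") or {},
            "outputs": rec.get("outputs") or {},
            "start_time": _iso(rec["start_time"]),
            "end_time": _iso(rec["end_time"]),
            "session_name": project_name,         # assumption: project chosen by name
        }
        if rec.get("error"):
            run["error"] = rec["error"]
        runs.append(run)
    return {"post": runs}


if __name__ == "__main__":
    with open("/workspace/agent_trace.jsonl") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    with open("/workspace/langsmith_payload.json", "w") as out:
        json.dump(to_langsmith_payload(records, "agent-harness"), out)
```

Send the resulting file with the same curl command, replacing the jq pipe and --data-binary @- with --data-binary @/workspace/langsmith_payload.json.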
Troubleshooting
ERROR: environment variable OPENAI_API_KEY is not set. The script checks each required variable and exits if it is empty. A variable assigned without export (OPENAI_API_KEY=...) exists only in your current shell and is not passed to the bash process that runs the script. Use export OPENAI_API_KEY=... before calling it.
httpx.HTTPStatusError: 401 Unauthorized. chat_completion sends the key as Authorization: Bearer <key>. Together and Groq use the same scheme, but Azure OpenAI expects the key in an api-key header. If you're targeting Azure, make three changes:

- Change the header in chat_completion to "api-key": API_KEY.
- Point BASE_URL at your deployment path.
- Add the api-version query parameter that Azure requires.
The loop hits MAX_STEPS without finishing. Usually the model is calling the same tools in a cycle, or the task needs more steps than allowed. Add a console.print inside the loop to inspect the full message list, or narrow the task scope. For longer tasks, raise the limit with AGENT_MAX_STEPS=20.
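To catch cycles before they burn the whole step budget, you can track the tool calls the model makes and stop when the same call repeats. The sketch below is a hypothetical RepeatDetector class. It counts identical (name, arguments) pairs, normalizing the arguments JSON so that key order and whitespace do not matter. It reports once a pair has been seen limit times; the default of 3 is an arbitrary choice.

```python
import json
from collections import Counter


class RepeatDetector:
    """Flag a tool call once the same name + arguments has been seen `limit` times."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.counts: Counter[tuple[str, str]] = Counter()

    def seen(self, name: str, arguments_json: str) -> bool:
        try:
            # Normalize so {"a":1,"b":2} and {"b": 2, "a": 1} count as the same call
            key_args = json.dumps(json.loads(arguments_json or "{}"), sort_keys=True)
        except json.JSONDecodeError:
            key_args = arguments_json  # fall back to the raw string
        self.counts[(name, key_args)] += 1
        return self.counts[(name, key_args)] >= self.limit
```

Inside the tool_calls branch of run_agent, you could call detector.seen(fn_name, fn_args) and break out of the loop when it returns True.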
jq: command not found on the trace summary. Install jq with apt-get install -y jq (Debian/Ubuntu) or brew install jq (macOS). The bash script degrades gracefully and skips the summary if jq is absent.
ModuleNotFoundError: No module named 'tools' when running agent_main.py. The harness assumes all .py files live in the same directory. Python adds the directory of the script you run to sys.path, so agent_main.py finds agent, tools, and logger only if they sit next to it. Keep all four files in /workspace, or add their directory to PYTHONPATH.
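Tool result is truncated or the request fails. The content field of a tool role message is a plain string, and the harness sends it back unmodified. A very large result_json (for example, a file with thousands of lines) inflates the cost of every later request. It can also overflow the model's context window, in which case some models or servers truncate or ignore it and others reject the request. Add a MAX_TOOL_OUTPUT constant in tools.py and shorten long string fields before the result is serialized. Slicing result_json itself can cut the JSON mid-string.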
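Truncating string fields before serialization keeps the tool message valid JSON and tells the model how much was cut. The sketch below is a hypothetical truncate_result helper with an arbitrary MAX_TOOL_OUTPUT of 8,000 characters per field; tune the limit to your model's context window. It only looks at top-level values, which is enough for this harness because its tools return flat dicts.

```python
import json

MAX_TOOL_OUTPUT = 8000  # characters per string field; arbitrary default


def truncate_result(result: dict, limit: int = MAX_TOOL_OUTPUT) -> dict:
    """Return a copy of a tool result with long string values shortened."""
    out = {}
    for key, value in result.items():
        if isinstance(value, str) and len(value) > limit:
            dropped = len(value) - limit
            # Keep the head and tell the model how much was cut
            out[key] = value[:limit] + f"\n...[truncated {dropped} characters]"
        else:
            out[key] = value
    return out


encoded = json.dumps(truncate_result({"stdout": "a" * 20000, "returncode": 0}))
print(len(encoded))
```

In tools.py, dispatch would then end with return json.dumps(truncate_result(result)).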
Next steps
- Add a search_web tool using the SerpAPI or Brave Search API. The dispatcher pattern in tools.py makes adding a new tool a three-step process: write the function, add the JSON schema to TOOL_SCHEMAS, and add the entry to TOOL_REGISTRY.
- Stream the LLM response by setting stream: true in the request payload and accumulating the delta chunks. This lets the terminal show partial output in real time and reduces perceived latency on long answers.
- Export to OpenTelemetry by replacing the JSONL logger with an OTLP span exporter. Each logger.log call maps onto a span. run_id identifies the span, parent_run_id its parent, and start_time and end_time set the span's timestamps, which together imply latency_ms. OpenTelemetry span IDs are 8 bytes rather than 16-byte UUIDs, so either convert the IDs or let the SDK generate its own and keep run_id as an attribute.
- Wrap the loop in a FastAPI endpoint to turn the harness into a stateless HTTP service. POST a task, get back the final answer and a session_id you can use to retrieve the trace.
FAQ
How does the agent loop handle tool calls from the model?
The loop checks the model’s finish_reason field. When it equals tool_calls, the agent extracts each tool name and arguments from the response, dispatches the function via a Python registry, and appends the result as a tool role message before sending the next request.
What tools are included in the harness by default?
The harness ships with three local tools: read_file (reads file contents), write_file (writes text to a file), and shell_exec (runs shell commands with a timeout). Each tool is a Python function with a corresponding JSON schema sent to the model.
How can I upload the agent trace to LangSmith?
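The logger writes one JSON object per step to a JSONL file, using field names modeled on LangSmith's run schema (run_id, parent_run_id, name, inputs, outputs, start_time, end_time). After the run completes, wrap the records in a post array and send them to LangSmith's batch-ingest endpoint using curl with your API key. Depending on the API version, you may first need to rename fields (for example, run_id to id) and convert the timestamps to ISO-8601.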
What happens if the model returns an unexpected finish reason?
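The loop prints a warning to the console (it is not written to the trace) and breaks out of the step iteration. The agent then returns the same message it uses for the step limit, "Agent reached step limit without a final answer.", even though the limit was not actually reached.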
Can this harness work with endpoints other than OpenAI?
Yes. The harness works with any OpenAI-compatible endpoint, such as Together, Groq, or vLLM; change the OPENAI_BASE_URL environment variable to point at it. For endpoints with a different auth scheme, such as Azure OpenAI, also adjust the header in chat_completion and, for Azure, the URL format.