Human + AI workflow observability

See why your AI workflows fail, stall, or burn budget.

reflect turns local coding-agent sessions and OpenTelemetry GenAI spans into evidence: where time went, which tools looped, which agent burned tokens, and what context or skill should improve the next run.

message limits · overloaded sessions · runaway tool loops · unclear token burn · bad context handoff

What You Get

Observe

One view across agents

Compare sessions, tool usage, failures, model mix, token volume, MCP calls, and estimated cost across Claude Code, Cursor, Copilot, Gemini CLI, OpenCode, and Antigravity.

Understand

Session autopsies

Move from "the agent got stuck" to the specific sequence: retries, broad scans, shell failures, context churn, quota pressure, or tool loops.
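As an illustration of what "tool loops" can mean in practice, here is a minimal sketch that flags runs of identical consecutive tool calls in a session. The ToolCall record, its fields, and the find_tool_loops helper are hypothetical names chosen for this example; they are not reflect's actual data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical record of one tool invocation in a session."""
    tool: str
    args: str
    ok: bool


def find_tool_loops(calls: list[ToolCall], min_repeats: int = 3) -> list[tuple[int, int, str]]:
    """Return (start_index, repeat_count, tool) for each run of identical consecutive calls."""
    loops = []
    i = 0
    while i < len(calls):
        key = (calls[i].tool, calls[i].args)
        j = i
        # Extend the run while the same tool is called with the same arguments.
        while j + 1 < len(calls) and (calls[j + 1].tool, calls[j + 1].args) == key:
            j += 1
        run_length = j - i + 1
        if run_length >= min_repeats:
            loops.append((i, run_length, calls[i].tool))
        i = j + 1
    return loops


# Made-up session: the agent re-runs the same failing test command three times.
session = [
    ToolCall("read_file", "src/app.py", True),
    ToolCall("shell", "pytest -q", False),
    ToolCall("shell", "pytest -q", False),
    ToolCall("shell", "pytest -q", False),
    ToolCall("grep", "TODO", True),
]
print(find_tool_loops(session))  # [(1, 3, 'shell')]
```

A real autopsy would likely also consider near-duplicate arguments and interleaved calls; exact consecutive matches are simply the easiest case to show.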

Improve

Better future runs

Use real workflow behavior to shape context, skills, helper tools, and operational guidance instead of guessing what would have helped.

60-Second Local Path

Start with reflect. It wires up supported agents, reads local telemetry, and opens the same dashboard used by this showcase. Use opentelemetry-hooks directly only when you want capture without the report workflow.

pipx install o11y-reflect
reflect setup
reflect

# no telemetry yet?
reflect --demo

Mission

Agent Agnostic

Observe the workflow across Claude Code, Cursor, Copilot, Gemini CLI, OpenCode, Antigravity, and future agents without tying the product story to one vendor.

Governance at the Edge

Keep control close to the hook boundary, where sessions can be inspected, denied, asked about, or mutated before waste and risk compound.
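To make the four outcomes concrete, here is a minimal sketch of a policy function that a hook could call before a shell command runs. Everything in it is an assumption made for illustration: the Action and Decision types, the review_shell_command name, and the specific rules are hypothetical and do not describe the actual hook interface of reflect or of any agent.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK = "ask"
    MUTATE = "mutate"


@dataclass
class Decision:
    """Hypothetical result of a pre-execution hook check."""
    action: Action
    command: str
    reason: str = ""


def review_shell_command(command: str, tokens_used: int, token_budget: int) -> Decision:
    """Decide what to do with a shell command before it runs (illustrative rules only)."""
    if tokens_used >= token_budget:
        return Decision(Action.DENY, command, "token budget exhausted")
    if "rm -rf" in command:
        return Decision(Action.ASK, command, "destructive command needs confirmation")
    if command.startswith("grep -r ") and "--include" not in command:
        # Narrow a broad scan instead of blocking it outright.
        narrowed = command.replace("grep -r ", "grep -r --include='*.py' ", 1)
        return Decision(Action.MUTATE, narrowed, "narrowed broad scan")
    return Decision(Action.ALLOW, command)


decision = review_shell_command("grep -r TODO .", tokens_used=100, token_budget=1000)
print(decision.action.value, "->", decision.command)
```

The point of the sketch is the placement: the decision happens before the command executes, so a wasteful or risky step can be stopped or reshaped rather than only reported afterwards.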

Vendor Neutral

Build on OpenTelemetry GenAI conventions and local session data so teams can bring their own backend, dashboard, and retention model.
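As a rough sketch of why shared conventions help, the snippet below totals token usage per model from span attributes. The attribute keys follow the OpenTelemetry GenAI semantic conventions as currently published (gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens). The spans are plain dictionaries with made-up values, and tokens_by_model is a hypothetical helper, not part of reflect.

```python
from collections import defaultdict

# Made-up span attributes; a real pipeline would read these from an exporter or backend.
spans = [
    {"gen_ai.request.model": "model-a", "gen_ai.usage.input_tokens": 1200, "gen_ai.usage.output_tokens": 300},
    {"gen_ai.request.model": "model-b", "gen_ai.usage.input_tokens": 800, "gen_ai.usage.output_tokens": 150},
    {"gen_ai.request.model": "model-a", "gen_ai.usage.input_tokens": 500, "gen_ai.usage.output_tokens": 100},
]


def tokens_by_model(span_attributes: list[dict]) -> dict[str, int]:
    """Sum input and output tokens per requested model."""
    totals: dict[str, int] = defaultdict(int)
    for attrs in span_attributes:
        model = attrs.get("gen_ai.request.model", "unknown")
        # Missing usage attributes count as zero rather than raising.
        totals[model] += attrs.get("gen_ai.usage.input_tokens", 0)
        totals[model] += attrs.get("gen_ai.usage.output_tokens", 0)
    return dict(totals)


print(tokens_by_model(spans))  # {'model-a': 2100, 'model-b': 950}
```

Because the keys are standardized, the same aggregation works regardless of which agent or backend produced the spans.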

Agent Benchmarking

Compare agents by real workflow outcomes: completion, retries, tool loops, failure rate, token burn, MCP dependency, and operational quality.
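A comparison like this can be sketched as a simple per-agent aggregation over session summaries. The SessionSummary fields, the benchmark function, the placeholder agent names, and the numbers are all hypothetical and exist only to show the shape of the calculation; they are not reflect's schema or real measurements.

```python
from dataclasses import dataclass


@dataclass
class SessionSummary:
    """Hypothetical per-session summary used for comparison."""
    agent: str
    completed: bool
    retries: int
    tokens: int


def benchmark(sessions: list[SessionSummary]) -> dict[str, dict[str, float]]:
    """Aggregate completion rate, average retries, and average tokens per agent."""
    by_agent: dict[str, list[SessionSummary]] = {}
    for s in sessions:
        by_agent.setdefault(s.agent, []).append(s)

    report = {}
    for agent, runs in by_agent.items():
        n = len(runs)
        report[agent] = {
            "sessions": n,
            "completion_rate": sum(r.completed for r in runs) / n,
            "avg_retries": sum(r.retries for r in runs) / n,
            "avg_tokens": sum(r.tokens for r in runs) / n,
        }
    return report


# Made-up data with placeholder agent names.
sessions = [
    SessionSummary("agent-a", True, 1, 4000),
    SessionSummary("agent-a", False, 5, 12000),
    SessionSummary("agent-b", True, 0, 3000),
]
for agent, stats in benchmark(sessions).items():
    print(agent, stats)
```

Averages like these are only meaningful when the agents worked on comparable tasks, so a real comparison would need to group or normalize by task type.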