Open source observability for AI agents. See every run, every failure, every dollar. The moment it happens.

Open source
Three-line install
Any custom agent supported

Works with

LangChain
CrewAI
LlamaIndex
OpenAI
Anthropic
OpenClaw
Hermes

Your agents are running. You have no idea how they are performing.

The first sign is usually a surprise invoice.

$0 burned before you noticed it.
1 in 3 agent failures go undetected until a user reports them.
0 visibility into which call caused it.

Most developers find out too late. AgentMetrics tells you first.

What you see from run one

agentmetrics.dev
Live
research-agent

Total runs: 2,847
Avg cost: $0.031
Avg latency: 2.3s
Success rate: 94.2%

Runs / cost, last 12h

Run    Duration  Cost    Tokens  Status
#4821  2.1s      $0.028  3,842   ok
#4820  8.4s      $0.094  11,201  retry
#4819  1.9s      $0.031  4,102   ok
#4818  2.3s      $0.029  3,956   ok

Switch to claude-3-haiku for 90% of runs

Save $612/mo

Complete visibility into every agent run.

Performance

How long every step takes and exactly where your agent slows down.

Spot the bottleneck before your users do.

Cost

What each run costs, which model is spending the most, and why.

Stop paying for runs that do not work.

Quality

Which runs failed, what the top error signatures are, and how the failure rate trended this week.

Fix the right thing first.
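One way to think about "top error signatures" is as a grouping problem: normalize each error message so that near-identical failures collapse into one bucket, then count. The sketch below is an illustration of that idea only; the `signature` and `top_signatures` helpers are hypothetical names, not part of AgentMetrics.

```python
import re
from collections import Counter


def signature(error_message):
    """Collapse digits so messages that differ only by a number share a signature."""
    return re.sub(r"\d+", "<n>", error_message)


def top_signatures(errors, k=3):
    """Return the k most common error signatures with their counts."""
    return Counter(signature(e) for e in errors).most_common(k)


# Hypothetical error messages, for illustration only.
errors = ["timeout after 30s", "timeout after 45s", "rate limit 429"]
print(top_signatures(errors, k=1))
```

Ranking by count is what makes "fix the right thing first" actionable: the signature at the top of the list accounts for the most failed runs.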

Reliability

When retry storms hit, how bad they got, and what triggered them.

Know the moment a threshold is crossed.
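A threshold check on retries can be sketched as a sliding-window counter. This is an assumption about how such an alert might work, not a description of AgentMetrics internals; the `RetryStormDetector` class, its 60-second window, and its threshold are all hypothetical.

```python
from collections import deque


class RetryStormDetector:
    """Hypothetical detector: flags when retries in a time window exceed a threshold."""

    def __init__(self, window_s=60.0, threshold=10):
        self.window_s = window_s
        self.threshold = threshold
        self._events = deque()  # timestamps of recent retries, oldest first

    def record_retry(self, timestamp):
        """Record one retry and return True if the window now exceeds the threshold."""
        self._events.append(timestamp)
        # Drop retries that have fallen out of the window.
        while self._events and self._events[0] <= timestamp - self.window_s:
            self._events.popleft()
        return len(self._events) > self.threshold


detector = RetryStormDetector(window_s=60.0, threshold=3)
flags = [detector.record_retry(t) for t in (0, 1, 2, 3)]
print(flags)
```

The fourth retry inside the window is the first to exceed the threshold of 3, so only the last flag is set.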

Business

Team-wide fleet view, per-agent SLA monitoring, and AI-generated cost optimization recommendations.

The number your CFO will ask for.

From install to production monitoring in minutes.

01

Install AgentMetrics

pip install agentmetrics. Add the decorator. Run your agent. Works with LangChain, CrewAI, LlamaIndex, Anthropic, OpenAI Agents, OpenClaw, Hermes, and any custom Python code.

pip install agentmetrics

import agentmetrics

@agentmetrics.track()
def my_agent(task):
    # llm is your existing model client
    return llm.complete(task)
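To make the decorator pattern concrete, here is a minimal sketch of what a tracking decorator could do in general: time the call, note whether it succeeded, and store a record. It is not the AgentMetrics implementation; the standalone `track` function and the in-memory `RUNS` list are hypothetical stand-ins.

```python
import functools
import time

RUNS = []  # hypothetical in-memory store; a real tracker would export these records


def track():
    """Hypothetical tracking decorator: records duration and outcome of each call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            record = {"agent": fn.__name__, "status": "ok"}
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                record["status"] = "error"
                record["error"] = type(exc).__name__
                raise  # never swallow the agent's own failure
            finally:
                record["duration_s"] = time.perf_counter() - start
                RUNS.append(record)
        return wrapper
    return decorator


@track()
def my_agent(task):
    return task.upper()  # placeholder for a real LLM call


result = my_agent("summarize this")
print(RUNS[-1]["agent"], RUNS[-1]["status"])
```

Wrapping the call in `try`/`finally` means a record is written whether the agent returns or raises, which is what lets failed runs show up alongside successful ones.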
02

Live from run one

Every token, every tool call, and every retry is captured automatically from the first run. No sampling. No extra config.

Tokens: 4,218 in / 891 out
Latency: 2.3s avg
Retries: 12 detected
Cost: $0.031 / run
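Per-run cost generally comes down to token counts multiplied by per-token prices. The sketch below shows that arithmetic with the token counts above; the prices are hypothetical placeholders, not any provider's actual rates, so the result is not meant to reproduce the cost figure shown.

```python
def run_cost(tokens_in, tokens_out, price_in_per_million, price_out_per_million):
    """Cost of one run in dollars, given token counts and prices per million tokens."""
    return (
        tokens_in * price_in_per_million + tokens_out * price_out_per_million
    ) / 1_000_000


# Hypothetical prices in dollars per million tokens; substitute your model's rates.
cost = run_cost(4218, 891, price_in_per_million=3.00, price_out_per_million=15.00)
print(f"${cost:.3f}")
```

Output tokens are often priced higher than input tokens, which is why the two counts are tracked separately.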
03

Know exactly what to fix

See which agents burn the most money, which fail silently, and what each run costs. The full picture, from run one to a million.

Before: $1,240/mo
After: $498/mo
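The before and after figures above can be turned into a monthly saving and a percentage with two lines of arithmetic:

```python
before_per_month = 1240  # dollars, from the "Before" figure
after_per_month = 498    # dollars, from the "After" figure

saved = before_per_month - after_per_month
percent_saved = saved / before_per_month * 100

print(f"Saved ${saved}/mo ({percent_saved:.1f}%)")
```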

See exactly how your agents are performing.

Everyone who runs AgentMetrics finds something worth fixing in the first run.

Open source. MIT licensed.