AgentMetrics | Open-Source Observability for AI Agents

Open-source observability for AI agents. See every run, every failure, every dollar. The moment it happens.

Open source

Three-line install

Any custom agent supported

Works with

LangChain

CrewAI

LlamaIndex

OpenAI

Anthropic

OpenClaw

Hermes

Your agents are running. You have no idea how they are performing.

The first sign is usually a surprise invoice.

$0 burned before you noticed it.

1 in 3 agent failures go undetected until a user reports them.

0 visibility into which call caused it.

Most developers find out too late.
AgentMetrics tells you first.

What you see from run one

agentmetrics.dev

Live

Agents

research-agent

Live

Total runs: 2,847

Avg cost: $0.031

Avg latency: 2.3s

Success rate: 94.2%

Runs / cost, last 12h

Run     Duration  Cost    Tokens  Status
#4821   2.1s      $0.028  3,842   ok
#4820   8.4s      $0.094  11,201  retry
#4819   1.9s      $0.031  4,102   ok
#4818   2.3s      $0.029  3,956   ok

Switch to claude-3-haiku for 90% of runs

Save $612/mo

Complete visibility into every agent run.

Performance

How long every step takes and exactly where your agent slows down.

Spot the bottleneck before your users do.

Cost

What each run costs, which model is spending the most, and why.

Stop paying for runs that do not work.

Quality

Which runs failed, what the top error signatures are, and how the failure rate trended this week.

Fix the right thing first.

Reliability

When retry storms hit, how bad they got, and what triggered them.

Know the moment a threshold is crossed.

Business

Team-wide fleet view, per-agent SLA monitoring, and AI-generated cost optimization recommendations.

The number your CFO will ask for.

From install to production monitoring in minutes.

Install AgentMetrics

pip install agentmetrics. Add the decorator. Run your agent. Works with LangChain, CrewAI, LlamaIndex, Anthropic, OpenAI Agents, OpenClaw, Hermes, and any custom Python code.

pip install agentmetrics

import agentmetrics

@agentmetrics.track()
def my_agent(task):
    # llm stands in for your own model client
    return llm.complete(task)
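
For readers curious what a tracking decorator does in general, here is a minimal sketch. It is not the AgentMetrics implementation: the `track` function, the `RUNS` list, and the recorded fields are all hypothetical, chosen only to show how a decorator can time a call and record whether it succeeded.

```python
import functools
import time

# Hypothetical in-memory store; a real tool would send records elsewhere.
RUNS = []

def track(name=None):
    """Illustrative stand-in for a tracking decorator (not the AgentMetrics API)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise  # re-raise so the caller still sees the failure
            finally:
                # Record the run whether it succeeded or failed.
                RUNS.append({
                    "agent": name or func.__name__,
                    "duration_s": time.perf_counter() - start,
                    "status": status,
                })
        return wrapper
    return decorator

@track()
def my_agent(task):
    return task.upper()  # placeholder for a real LLM call

result = my_agent("summarize this")
```

Because the record is written in the `finally` clause, failed runs are captured as well as successful ones, and the wrapped function's behavior is otherwise unchanged.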

Live from run one

Every token, every tool call, every retry — captured automatically from the first run. No sampling. No extra config.

Tokens: 4,218 in / 891 out

Latency: 2.3s avg

Retries: 12 detected

Cost: $0.031 / run
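
Per-run cost is typically derived from token counts and per-token prices. The sketch below shows that arithmetic using the token counts above; the prices are placeholder assumptions, not real model rates, so the result is not expected to match the $0.031 figure, and `run_cost` is a hypothetical helper, not part of AgentMetrics.

```python
# Placeholder prices in dollars per million tokens (assumptions, not real rates).
PRICE_IN_PER_M = 1.00
PRICE_OUT_PER_M = 5.00

def run_cost(tokens_in: int, tokens_out: int) -> float:
    """Estimate the dollar cost of one run from its token counts."""
    return (tokens_in * PRICE_IN_PER_M + tokens_out * PRICE_OUT_PER_M) / 1_000_000

# Token counts taken from the example run above.
cost = run_cost(4218, 891)
print(f"${cost:.4f} per run")
```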

Know exactly what to fix

See which agents burn the most, which fail silently, and what each run costs. The full picture, from run one to a million.

Before: $1,240/mo

After: $498/mo
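
The before and after figures work out as follows; this is plain arithmetic on the two numbers shown, with variable names chosen for illustration.

```python
before = 1240  # monthly spend before, in dollars
after = 498    # monthly spend after, in dollars

savings = before - after
pct = savings / before * 100

print(f"${savings}/mo saved ({pct:.1f}% lower)")  # $742/mo saved (59.8% lower)
```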