You know who's using AI.
You don't know who's using it well.

Maestro shows you.

AI makes velocity metrics meaningless.

One is mastering the craft of working with AI. The other is winging it. Your metrics can't tell them apart.

Sarah Chen
  PRs merged: 11
  AI sessions: 34
  Tokens used: 1.4M
  Avg cycle time: 3.1 hrs
  Build pass rate: 96%

Alex Rockwell
  PRs merged: 13
  AI sessions: 31
  Tokens used: 1.8M
  Avg cycle time: 2.8 hrs
  Build pass rate: 91%

The new craft of engineering

Well-run agent sessions produce excellent code. Poorly run sessions produce AI slop.


The new blind spot

Which engineers are your AI leaders?

Some have already mastered the new craft.

Others are struggling with every session.

And some are still skeptics.

All of their sessions are invisible.
To you. And to each other.

No shared learning. No tribal knowledge. No visibility into what works. And what doesn't.


Two agent coding sessions.
Two outcomes.

Add OAuth2 PKCE flow for mobile clients
  Prompts: 9 · Focus: 9 · Review: 9 · Efficiency: 8
  Session Quality: 9/10

Maestro Insight: High-quality verification loop
Sarah scoped the problem before letting the agent code, and caught the expired-token edge case the agent missed.

Evidence from transcript (2 of 38 turns):

Turn 1 · opening prompt · scope before code
"Before you write code, list the security tradeoffs of PKCE vs. implicit flow for our mobile case."

Turn 23 · +18 min · caught edge case
"Run the test against an expired refresh token — that's the case I'm worried about."

38 turns · 14 tools · 145k tokens · $4.20 · PR #1247 merged
Fix auth token refresh race condition
  Prompts: 3 · Focus: 3 · Review: 2 · Efficiency: 5
  Session Quality: 3/10

Maestro Insight: Agent-led, no verification
Alex gave the agent no context and approved the first approach without testing. The expired-token edge case was never caught.

Evidence from transcript (2 of 87 turns):

Turn 1 · opening prompt · no context given
"Just fix the token refresh bug."

Turn 3 · +2 min · blind approval
"Ok sure, go ahead."

87 turns · 31 tools · 312k tokens · $8.40 · PR #1231 open

How it works

Insights powered by a coding agent plugin.

01
Lightweight agent plugin

Maestro installs as a plugin to Claude Code and Codex. It collects session data as engineers work. Deploy via your existing MDM in minutes.

02
Maestro extracts insights

Sessions linked to shipped PRs are analyzed across five dimensions of craft. Only work that reached a PR is evaluated — exploratory sessions stay private.

03
Surfaced in two places

Engineers see their session review as a PR comment — same as a code review, visible to them first. Leaders see team-wide patterns on the Maestro dashboard.

Developer coding session
   ↓  agent plugin collects session data
Maestro: session + PR analysis
   ↓  insights

Surfaced as:
  • VibeCheck: session findings posted as a PR comment, visible to everyone so the team can learn together.
  • Session Scorecards: surface strengths and gaps so the team can grow its practice.
  • Leadership Dashboard: team-wide patterns and trends; see which engineers are leveling up and where the gaps are.
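
To make the pipeline concrete, here is a rough sketch of the data shapes it implies. Everything below is an illustrative assumption; no type or field name comes from Maestro's actual API.

    // Hypothetical sketch only: these types do not come from Maestro's
    // published API. They illustrate what an agent plugin might collect
    // and what the analysis step might post back to a PR.

    interface SessionTurn {
      index: number;                  // position in the transcript
      role: "engineer" | "agent";
      text: string;                   // prompt or agent response
      toolCalls: string[];            // tools invoked on this turn
      tokens: number;                 // tokens consumed on this turn
    }

    interface SessionRecord {
      sessionId: string;              // assigned by the plugin
      repo: string;
      branch: string;
      turns: SessionTurn[];
      linkedPr?: number;              // set once the work reaches a PR;
                                      // unlinked sessions stay private
    }

    interface VibeCheckComment {
      prNumber: number;
      insight: string;                // e.g. "High-quality verification loop"
      evidenceTurns: number[];        // transcript turns cited as evidence
      scores: Record<string, number>; // per-dimension scores, 0-10
      sessionQuality: number;         // single roll-up score, 0-10
    }

Note the optional linkedPr field: per step 02, only sessions that reach a PR are ever analyzed, so unlinked records never leave the private stage.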

The five dimensions of session craft

Not a vibe. A standard.

Five dimensions. One score per session. Calibrated against shipped outcomes.

Agent Focus

Did the engineer stay scoped, or did the session drift?

HumanLayer's analysis of ~100k sessions found recall degrades when context fill exceeds 40%. Drifting sessions hit that threshold faster — and the agent stops following instructions reliably.

Efficiency

Did they reach the solution with minimal wasted motion?

GitClear's 2026 study of 2,172 developer-weeks found AI power users author 4–10× more durable code than average AI users. Efficiency, not volume, is the separator.

Session Management

Did they control the session, or let the agent run loose?

DORA 2025: code-review time rose 91% and PR size rose 154% in high-AI teams. Scoped sessions — one task per thread — are what keep review from collapsing.

Verification Rigor

Did they verify the agent's output before calling it done?

Stanford/MIT (Mar 2026): 14.3% of AI-generated code contained vulnerabilities vs. 9.1% for human-written. Skipping verification is where that gap surfaces in production.

Prompt Quality

Did they give the agent enough context to succeed?

Practitioners from Anthropic to Cursor converge: a tight plan before editing prevents most mid-session corrections. Boris Cherny attributes "one-shot" implementations to plan quality, not model quality.
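
For illustration, the five dimensions above could roll up into the single per-session score like this. A minimal sketch, assuming equal weights on a 0-10 scale; Maestro's actual weighting and its calibration against shipped outcomes are not public, so treat every detail as an assumption.

    // Minimal sketch, assuming equal weights and a 0-10 scale.
    // Maestro's real calibration against shipped outcomes is not public.

    type Dimension =
      | "promptQuality"
      | "agentFocus"
      | "sessionManagement"
      | "verificationRigor"
      | "efficiency";

    function sessionScore(scores: Record<Dimension, number>): number {
      const values = Object.values(scores);
      const mean = values.reduce((sum, s) => sum + s, 0) / values.length;
      return Math.round(mean); // e.g. the 9/10 vs. 3/10 sessions above
    }

    // A session strong on scoping but weak on verification:
    sessionScore({
      promptQuality: 9,
      agentFocus: 8,
      sessionManagement: 8,
      verificationRigor: 4,
      efficiency: 7,
    }); // => 7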

Team momentum, at a glance

See which teams are leveling up — and which need coaching.

5 teams · 38 engineers · Last 8 weeks
Team                    Members   Effectiveness ▼   Strongest                Weakest
Platform Engineering    8         63 (−5)           Agent Focus +29          Verification Rigor −21
Payments Engineering    6         56 (−15)          Agent Focus +12          Prompt Quality −19
Security Engineering    5         51 (+2)           Session Management +5    Verification Rigor −18
Product Engineering     12        45 (+6)           Agent Focus −3           Verification Rigor −25
Data & Analytics        7         45 (−2)           Session Management ±4    Prompt Quality −23
  • Coach Platform and Product on Verification Rigor.
  • Coach Payments and Data on Prompt Quality.
  • Security has the right habits; it just needs better verification.
"My team was shipping more code than ever. That's not the same as shipping better code."

CEO, Series B Fintech

3.1×
session quality gap between top-quartile and bottom-quartile engineers on the same team
62%
of "low-quality" sessions came from engineers whose PR throughput looked healthy
+18 pts
average AI Effectiveness gain in 12 weeks for teams using Maestro

Every engineer on your team is building a habit right now. Maestro tells you which one.

Book a demo →

getmaestro.ai