
AI Usage Metrics Every Engineering Manager Should Track
You're six months into rolling out AI tools across your engineering team. Everyone seems to be using them, productivity feels higher, but when your CEO asks for concrete ROI data, you find yourself scrambling. Sound familiar?
Here's the reality check: 76% of all respondents are using or planning to use AI tools, up from 70% in 2023, while AI's favorability rating decreased from 77% last year to 72% [1]. This widening gap between adoption and satisfaction reveals a critical blind spot in how engineering teams measure AI usage.
You need more than gut feelings and developer testimonials. You need a systematic approach to measure AI usage that connects to real business outcomes. Let's explore the metrics that actually matter and introduce you to Weave, a platform that's revolutionizing how engineering teams track and optimize their AI investments.
The challenge: why traditional metrics fail
The problem with measuring AI effectiveness starts with the metrics themselves. Here's the thing: almost every engineering leader already measures output, either openly or behind closed doors. But they rely on metrics like lines of code (correlation with effort: ~0.3), number of PRs, or story points (slightly better at ~0.35). These metrics are, frankly, terrible proxies for productivity. Even more concerning, recent research shows mixed results about AI's actual impact.
On one hand, some studies show promise. Three separate randomized controlled trials involving over 4,000 developers found that those using Copilot achieved a 26% increase in productivity [2]. However, other research contradicts these findings. In one study, developers initially predicted that AI tools would reduce their completion time by 24%. After using the tools, they estimated the time savings at 20%. But when researchers measured actual performance, they discovered that AI tools actually increased completion time by 19%—meaning the AI tooling slowed developers down [3].
This disconnect between perception and reality is reflected in developer attitudes. Nearly half (45%) of professional developers believe AI tools are bad or very bad at handling complex tasks [4]. Yet despite these concerns, usage continues to climb. The takeaway is clear: as AI adoption grows, building better intuition around when and how to use these tools effectively is becoming increasingly important.
Introducing Weave: the AI measurement solution
Before diving into specific metrics, let's address the elephant in the room: how do you actually measure AI usage effectively? Weave solves this problem by combining LLMs and domain-specific machine learning to understand engineering work.
Unlike traditional analytics tools that rely on surface-level code metrics, Weave provides deep insight into your team's actual productivity. Weave uses AI to measure engineering: we scan every PR to understand how much work people are doing and how good it is. This level of analysis is crucial when you need to measure AI usage beyond simple acceptance rates.
The foundation: core AI usage metrics
1. AI adoption and engagement patterns
Start by understanding who's actually using AI tools and how frequently:
Feature Utilization Rate: The top three AI search tools used by developers are ChatGPT (82%), GitHub Copilot (41%), and Google Gemini (24%) [1]. Weave shows you who is using which tools
Time Investment: Monitor how much time developers spend with AI assistance
Task-Specific Usage: Identify which development activities benefit most from AI
The key insight? You're not just tracking usage; you're understanding patterns. Weave excels here by classifying PRs into new features, bug fixes, or "keeping the lights on", so we can tell you how much of your engineering bandwidth is going to each bucket.
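To make these adoption metrics concrete, here is a minimal sketch of how you might compute an adoption rate and per-tool utilization from usage logs. The record shape, field names, and figures are illustrative assumptions, not Weave's actual data model; real data would come from your AI tools' admin APIs or your analytics platform.

```python
from datetime import date

# Hypothetical usage-log records (field names and values are assumptions).
usage_logs = [
    {"dev": "alice", "tool": "copilot", "day": date(2024, 6, 3)},
    {"dev": "alice", "tool": "chatgpt", "day": date(2024, 6, 3)},
    {"dev": "bob",   "tool": "copilot", "day": date(2024, 6, 4)},
    {"dev": "carol", "tool": "gemini",  "day": date(2024, 6, 4)},
]
team = ["alice", "bob", "carol", "dave"]

def adoption_rate(logs, roster):
    """Share of the roster with at least one AI tool interaction."""
    active = {entry["dev"] for entry in logs}
    return len(active & set(roster)) / len(roster)

def tool_utilization(logs):
    """Count of distinct developers using each tool."""
    users_per_tool = {}
    for entry in logs:
        users_per_tool.setdefault(entry["tool"], set()).add(entry["dev"])
    return {tool: len(devs) for tool, devs in users_per_tool.items()}

print(adoption_rate(usage_logs, team))   # 0.75
print(tool_utilization(usage_logs))      # {'copilot': 2, 'chatgpt': 1, 'gemini': 1}
```

Tracking distinct users per tool, rather than raw event counts, keeps one enthusiastic power user from inflating your adoption picture.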
2. Quality and outcome metrics
Raw usage numbers mean nothing without quality indicators:
AI Code Acceptance Rate: What percentage of AI suggestions actually make it to production?
Post-AI Review Time: How do code reviews change when AI assistance is involved?
Defect Rate Analysis: Track bugs originating from AI-assisted versus human-written code
Code Turnover: How much of your code is rewritten in the months that follow?
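The first two quality metrics above can be sketched in a few lines. This is an illustrative calculation over hypothetical PR records; the labels (which PRs were AI-assisted, how many suggestions were offered and accepted, defect counts) are assumptions you would supply from your review tooling or an analytics platform.

```python
# Hypothetical PR records (field names and values are assumptions).
prs = [
    {"id": 1, "ai_assisted": True,  "suggestions": 40, "accepted": 12, "defects": 1},
    {"id": 2, "ai_assisted": True,  "suggestions": 25, "accepted": 10, "defects": 0},
    {"id": 3, "ai_assisted": False, "suggestions": 0,  "accepted": 0,  "defects": 2},
    {"id": 4, "ai_assisted": False, "suggestions": 0,  "accepted": 0,  "defects": 0},
]

def acceptance_rate(records):
    """Accepted AI suggestions as a share of all suggestions offered."""
    offered = sum(r["suggestions"] for r in records)
    taken = sum(r["accepted"] for r in records)
    return taken / offered if offered else 0.0

def defect_rate(records, ai_assisted):
    """Average defects per PR, split by AI assistance."""
    subset = [r for r in records if r["ai_assisted"] == ai_assisted]
    return sum(r["defects"] for r in subset) / len(subset)

print(round(acceptance_rate(prs), 3))       # 0.338
print(defect_rate(prs, ai_assisted=True))   # 0.5
print(defect_rate(prs, ai_assisted=False))  # 1.0
```

Splitting defect rates by AI assistance is what turns a raw acceptance number into an actual quality signal: high acceptance with a rising AI-side defect rate is a warning, not a win.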
Advanced metrics: business impact and ROI
3. Productivity and velocity enhancement
Move beyond simple output measures:
Time to First Working Solution: How quickly can developers reach functioning implementations?
Feature Delivery Velocity: Track actual business value delivered with AI assistance
Context Switching Reduction: Measure efficiency gains from staying in flow
Knowledge Transfer: How AI tools improve documentation and team learning
The research shows significant variation by experience level: the Copilot trials found that less experienced developers got more benefit from AI assistance than their senior colleagues [2]. Understanding these patterns is crucial for ROI calculations.
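A velocity comparison like Feature Delivery Velocity above can be sketched simply: measure time-to-ship with and without AI assistance. The records below are hypothetical; in practice the timestamps would come from your issue tracker and deploy pipeline, and the AI-assistance flag from your tooling.

```python
from datetime import datetime

# Hypothetical delivery records (field names and dates are assumptions).
features = [
    {"ai": True,  "started": datetime(2024, 6, 1), "shipped": datetime(2024, 6, 4)},
    {"ai": True,  "started": datetime(2024, 6, 2), "shipped": datetime(2024, 6, 7)},
    {"ai": False, "started": datetime(2024, 6, 1), "shipped": datetime(2024, 6, 8)},
]

def mean_days_to_ship(records, ai):
    """Average calendar days from start to ship, split by AI assistance."""
    durations = [(r["shipped"] - r["started"]).days for r in records if r["ai"] == ai]
    return sum(durations) / len(durations)

print(mean_days_to_ship(features, ai=True))   # 4.0
print(mean_days_to_ship(features, ai=False))  # 7.0
```

As the perception-versus-reality research earlier in this piece shows, measuring this split directly matters: developers' own estimates of AI time savings can point in the opposite direction from the measured numbers.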
4. Enhanced DORA metrics for the AI era
Traditional DORA metrics need enhancement to capture AI's impact:
Deployment Frequency: Deployment frequency measures how often a team pushes changes to production. High deployment frequency indicates rapid iteration and delivery, a hallmark of agile and DevOps methodologies.
Lead Time for Changes: Lead time for changes measures the total time between when work on a change request is initiated to when that change has been deployed to production and thus delivered to the customer.
Change Failure Rate: Change failure rate measures the rate at which production changes result in incidents, rollbacks, or failures. It quantifies the percentage of changes that result in service degradation or disruptions. Low change failure rates indicate robust testing procedures and reliable deployment practices.
Mean Time to Recovery (MTTR): Enhanced with AI-specific incident analysis [5]
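The four DORA definitions above reduce to straightforward arithmetic over a deployment log. Here is a minimal sketch; the record shape and numbers are illustrative assumptions, and real data would come from your CI/CD system and incident tracker.

```python
from datetime import datetime

# Hypothetical deployment log over a one-week window (fields are assumptions).
deploys = [
    {"at": datetime(2024, 6, 3), "lead_time_h": 20, "failed": False},
    {"at": datetime(2024, 6, 5), "lead_time_h": 30, "failed": True, "recovered_h": 2},
    {"at": datetime(2024, 6, 6), "lead_time_h": 10, "failed": False},
    {"at": datetime(2024, 6, 7), "lead_time_h": 16, "failed": False},
]
window_days = 7

freq = len(deploys) / window_days                               # deployment frequency
lead = sum(d["lead_time_h"] for d in deploys) / len(deploys)    # lead time for changes
failures = [d for d in deploys if d["failed"]]
cfr = len(failures) / len(deploys)                              # change failure rate
mttr = sum(d["recovered_h"] for d in failures) / len(failures)  # mean time to recovery

print(round(freq, 2), lead, cfr, mttr)  # 0.57 19.0 0.25 2.0
```

The "enhancement" for the AI era is then a grouping step: compute these same four numbers separately for AI-assisted and human-only changes and compare the splits, just as in the defect-rate example earlier.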
Weave helps you understand these enhanced DORA metrics by providing the context that traditional tools miss. One manager noticed that code review quality had the highest correlation with output; he reset code review standards, and team output went up by 15%. This is the kind of insight that drives real improvement.
Business outcome metrics that matter
5. Financial and strategic impact
Connect AI usage to outcomes that matter to leadership:
Development Cost per Feature: Compare costs before and after AI adoption
Technical Debt Impact: How AI affects long-term code maintainability
Innovation Acceleration: Track new feature experimentation enabled by AI efficiency
Customer Value Delivery: Connect AI-assisted development to user satisfaction
Recent industry data shows promising ROI potential: generative AI usage jumped from 55% in 2023 to 75% in 2024, and for every $1 a company invests in generative AI, the average return is $3.70. The top leaders using generative AI are realizing returns of $10.30 per dollar invested [6].
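Development Cost per Feature, the first metric in this section, is the simplest of these to compute, and it is worth seeing how directly it falls out of two numbers per quarter. The figures below are purely illustrative assumptions, not benchmarks; substitute your own engineering cost and delivery data.

```python
# Hypothetical quarterly figures (all numbers are assumptions for illustration).
features_before, cost_before = 12, 240_000  # quarter before AI rollout
features_after, cost_after = 18, 252_000    # quarter after, AI tooling costs included

cost_per_feature_before = cost_before / features_before
cost_per_feature_after = cost_after / features_after
savings_per_feature = cost_per_feature_before - cost_per_feature_after

print(cost_per_feature_before, cost_per_feature_after, savings_per_feature)
# 20000.0 14000.0 6000.0
```

Note that the "after" cost must include the AI tooling spend itself; omitting it is the most common way this comparison gets flattered.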
Implementation strategy: your roadmap to success
Phase 1: Foundation Setting (Weeks 1-4)
Establish baseline productivity measurements
Deploy Weave to start comprehensive tracking
Survey team sentiment and current AI usage patterns
Phase 2: Core Metrics Implementation (Weeks 5-12)
Begin systematic AI usage measurement
Start tracking code quality and review impact
Correlate AI usage with enhanced DORA metrics
Phase 3: Advanced Analytics (Months 4-6)
Implement business outcome tracking
Develop team-specific optimization strategies
Create continuous improvement feedback loops
Weave accelerates this timeline by providing immediate visibility: you can connect your repo to Weave and get started in about 30 seconds.
Why Weave is your competitive advantage
Unlike traditional engineering analytics platforms, Weave was built specifically for the AI era. Engineering leaders are flying blind. They can't dive in everywhere, so they need to rely on gut feel or shoddy metrics to try to get a handle on what's going on and what needs fixing. Engineering is unique in that there are no good metrics to solve this problem. And that's why we built Weave.
Weave provides:
AI-Powered Analysis: We run LLMs plus our own models on every PR and review, analyzing both output and quality, then summarize the data and insights in dashboards.
Real Productivity Insights: Weave gives you the data you need to optimize your engineering team, with our intelligent algorithms that deeply understand your work.