How Weave is Replacing Story Points with LLMs and AI

Jun 17, 2025

Software engineering teams have long relied on story points to estimate and track work. But as teams grow and projects become more complex, traditional metrics often fall short. This gap leads to missed deadlines, unclear productivity signals, and frustration for both engineers and managers. Weave is changing this by using LLMs and domain-specific machine learning to provide a clearer, more objective view of engineering team performance.

Why Traditional Story Points Fall Short

The Problem with Story Points

Story points were designed to help teams estimate effort and complexity. But they are subjective, often inconsistent across teams, and can be influenced by team dynamics or external pressure. This makes it hard to compare work across teams or track progress over time.

  • Story points are not standardized, so a “5” for one team might be a “2” for another.

  • Teams often inflate or deflate points to meet targets.

  • Story points don’t capture the quality or impact of work, only perceived effort.

Industry Frameworks and Metrics

To address these gaps, many organizations have adopted frameworks like DORA, SPACE, and CORE 4 metrics. These models focus on outcomes such as deployment frequency, lead time, and team satisfaction. While they offer a broader view, they often still rely on manual data entry and subjective reporting.

  • DORA metrics: Deployment frequency, lead time for changes, change failure rate, and time to restore service.

  • SPACE metrics: Satisfaction and well-being, Performance, Activity, Communication and collaboration, and Efficiency and flow.

  • CORE 4 metrics: Speed, Effectiveness, Quality, and Impact.
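To make the DORA definitions above concrete, here is a minimal Python sketch that computes the four metrics from a list of deployment records. The `Deployment` record shape, the field names, and the choice of medians are assumptions made for illustration; they are not taken from DORA tooling or from Weave.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    commit_time: datetime                      # when the change was committed
    deploy_time: datetime                      # when it reached production
    failed: bool = False                       # did it cause a production failure?
    restored_time: Optional[datetime] = None   # when service was restored, if it failed


def dora_metrics(deployments: list, period_days: int) -> dict:
    """Compute the four DORA metrics over one reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_hours = [
        (d.deploy_time - d.commit_time).total_seconds() / 3600 for d in deployments
    ]
    failures = [d for d in deployments if d.failed]
    restore_hours = [
        (d.restored_time - d.deploy_time).total_seconds() / 3600
        for d in failures
        if d.restored_time is not None
    ]
    return {
        "deploys_per_day": len(deployments) / period_days,
        "median_lead_time_hours": median(lead_hours),
        "change_failure_rate": len(failures) / len(deployments),
        "median_restore_hours": median(restore_hours) if restore_hours else 0.0,
    }


# Toy data for one week: four deployments, one of which failed.
base = datetime(2025, 6, 2, 9, 0)
deploys = [
    Deployment(base, base + timedelta(hours=4)),
    Deployment(base, base + timedelta(hours=10), failed=True,
               restored_time=base + timedelta(hours=12)),
    Deployment(base + timedelta(days=1), base + timedelta(days=1, hours=6)),
    Deployment(base + timedelta(days=2), base + timedelta(days=2, hours=8)),
]
metrics = dora_metrics(deploys, period_days=7)
print(metrics)
```

All four numbers here come from timestamps and a failure flag, so they can be automated once deployment events are recorded. SPACE dimensions such as satisfaction typically need survey data instead.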

How Weave Uses LLMs and AI for Engineering Analytics

Objective Measurement with LLMs

Weave analyzes every pull request (PR) and code review using a combination of LLMs and proprietary machine learning models. Instead of relying on subjective estimates, Weave’s models are trained on expert-labeled datasets to answer a key question: “How long would this PR take for an expert engineer?”

  • Each PR is evaluated for complexity, scope, and quality.

  • The system estimates the actual time and effort required, not just lines of code or number of commits.

  • Weave classifies work into categories like new features, bug fixes, and maintenance, giving teams a clear view of where their time goes.
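As a rough illustration of how per-PR estimates and categories could roll up into a "where does our time go" view, the Python sketch below aggregates estimated expert hours by category. The `PrEstimate` record, its field names, and the category labels are hypothetical; Weave's actual data model is not described in this article.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PrEstimate:
    """Hypothetical per-PR record; field names are illustrative only."""
    pr_number: int
    category: str        # e.g. "feature", "bug_fix", "maintenance"
    expert_hours: float  # estimated time for an expert engineer


def time_share_by_category(estimates: list) -> dict:
    """Return each category's share of the total estimated expert hours."""
    totals = defaultdict(float)
    for estimate in estimates:
        totals[estimate.category] += estimate.expert_hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: hours / grand_total for category, hours in totals.items()}


# Toy estimates, invented for this example.
estimates = [
    PrEstimate(101, "feature", 4.0),
    PrEstimate(102, "bug_fix", 1.5),
    PrEstimate(103, "feature", 2.0),
    PrEstimate(104, "maintenance", 0.5),
]
shares = time_share_by_category(estimates)
print(shares)  # {'feature': 0.75, 'bug_fix': 0.1875, 'maintenance': 0.0625}
```

Because every PR is expressed in the same unit (estimated expert hours), the shares stay comparable across teams in a way that raw story points are not.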

Key Features of Weave’s Analytics Platform

  • Tracks real output over time, not just activity.

  • Summarizes data and insights in dashboards for easy review.

  • Measures both output and quality, providing a balanced view of team performance.

  • Monitors time spent on code review and the usefulness of those reviews.

  • Measures the quality of code reviews by assessing the depth and practicality of review comments.
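Tracking output over time is, at its simplest, a time-series aggregation. The sketch below sums estimated expert hours per ISO week, the kind of series a dashboard might plot. The input shape (merge date plus estimated hours) is an assumption for illustration, not Weave's schema.

```python
from collections import defaultdict
from datetime import date


def weekly_output(merged: list) -> dict:
    """Sum estimated expert hours per ISO (year, week).

    `merged` is a list of (merge_date, expert_hours) pairs.
    """
    totals = defaultdict(float)
    for merged_on, expert_hours in merged:
        iso = merged_on.isocalendar()  # (ISO year, ISO week, ISO weekday)
        totals[(iso[0], iso[1])] += expert_hours
    return dict(sorted(totals.items()))


# Toy data: two PRs merged in one week, one in the next.
series = weekly_output([
    (date(2025, 6, 2), 3.0),
    (date(2025, 6, 6), 1.5),
    (date(2025, 6, 9), 2.0),
])
for (year, week), hours in series.items():
    print(f"{year}-W{week:02d}: {hours:.1f} estimated expert hours")
```

Grouping by ISO week rather than by calendar month keeps each bucket the same length, which makes week-over-week comparisons less noisy.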

Technical Deep Dive: How the Model Works

Weave’s custom machine learning model is trained on a large, expert-labeled dataset of PRs. The model considers factors such as:

  • Code complexity and dependencies

  • Size and scope of changes

  • Review comments and feedback cycles

  • Historical performance data
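Weave's model is proprietary, so the following Python sketch is not its implementation. It only shows how the factors listed above could be encoded as features and fed to the simplest possible estimator, a linear model. Every feature name and every weight is a placeholder invented for illustration; a real system would learn its parameters from the expert-labeled PRs.

```python
from dataclasses import dataclass


@dataclass
class PrFeatures:
    """Hypothetical numeric features, one per factor in the list above."""
    complexity_delta: int         # code complexity (e.g. added branches)
    files_touched: int            # rough proxy for dependencies
    lines_changed: int            # size and scope of the change
    review_rounds: int            # feedback cycles during review
    repo_median_pr_hours: float   # historical performance data


# Placeholder weights, NOT learned from real data.
BASE_HOURS = 0.5
WEIGHTS = {
    "complexity_delta": 0.1,
    "files_touched": 0.25,
    "lines_changed": 0.01,
    "review_rounds": 0.5,
    "repo_median_pr_hours": 0.1,
}


def estimate_expert_hours(features: PrFeatures) -> float:
    """Linear baseline: base hours plus a weighted sum of the features."""
    return BASE_HOURS + sum(
        weight * getattr(features, name) for name, weight in WEIGHTS.items()
    )


example = PrFeatures(
    complexity_delta=5,
    files_touched=4,
    lines_changed=200,
    review_rounds=2,
    repo_median_pr_hours=10.0,
)
estimate = estimate_expert_hours(example)
print(f"Estimated expert hours: {estimate:.1f}")
```

A linear baseline like this is easy to inspect but cannot read the code itself, which is presumably why the approach described in this article combines LLMs with trained models rather than relying on counts alone.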

Comparing Weave’s Approach to Traditional Metrics

Criteria                  Story Points    Weave LLM/AI Analytics
------------------------  --------------  ----------------------
Subjectivity              High            Low
Standardization           Low             High
Measures Output Quality   No              Yes
Real-Time Insights        No              Yes
Tracks Review Quality     No              Yes
Gameable                  High            Low

When to Use Each Approach

  • Story points work best for small, co-located teams with stable membership.

  • Weave’s analytics are ideal for distributed teams, organizations with multiple squads, or teams of any size seeking objective, scalable performance tracking.

Integrating Weave with Your Engineering Workflow

Seamless Integration with Existing Tools

Weave connects with popular platforms like GitHub and Jira, making it easy to start tracking engineering analytics without changing your workflow.

Step-by-Step: Getting Started with Weave

  1. Connect your code repository (e.g., GitHub).

  2. Allow Weave to analyze your PRs (this usually takes 5 hours).

  3. Dive into the dashboards for output, quality, and time allocation.

  4. Use insights to adjust team processes and improve performance.
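To give a sense of the raw data a connected repository exposes, the sketch below parses a response shaped like GitHub's REST "list pull requests" endpoint and computes the open-to-merge time for each merged PR. It uses only the `number`, `created_at`, and `merged_at` fields, and the sample payload is invented. This is an assumption-laden illustration of step 1, not a description of how Weave ingests data.

```python
import json
from datetime import datetime, timezone

# GitHub's REST API returns timestamps in this UTC format.
GITHUB_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_time(value: str) -> datetime:
    """Parse a GitHub timestamp into a timezone-aware datetime."""
    return datetime.strptime(value, GITHUB_TIME_FORMAT).replace(tzinfo=timezone.utc)


def merged_pr_hours(payload: str) -> dict:
    """Map PR number to hours between creation and merge, skipping unmerged PRs."""
    result = {}
    for pr in json.loads(payload):
        if pr.get("merged_at") is None:
            continue  # still open, or closed without merging
        opened = parse_time(pr["created_at"])
        merged = parse_time(pr["merged_at"])
        result[pr["number"]] = (merged - opened).total_seconds() / 3600
    return result


# Invented sample payload: one merged PR and one that was never merged.
SAMPLE = """
[
  {"number": 12, "created_at": "2025-06-02T09:00:00Z", "merged_at": "2025-06-02T15:30:00Z"},
  {"number": 13, "created_at": "2025-06-03T10:00:00Z", "merged_at": null}
]
"""
hours = merged_pr_hours(SAMPLE)
print(hours)  # {12: 6.5}
```

Open-to-merge time is a wall-clock measure that includes waiting for review, so it is not the same thing as the estimated expert effort discussed earlier. It is shown here only because it can be derived directly from repository data.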

The Future of Engineering Team Performance Tracking

The shift from story points to AI-driven analytics marks a significant step forward for engineering management. By providing objective, real-time insights into both output and quality, Weave helps teams identify strengths, address weaknesses, and deliver projects more reliably.

Teams that adopt data-driven performance tracking are better equipped to:

  • Spot and resolve bottlenecks quickly.

  • Allocate resources more effectively.

  • Improve code quality and team collaboration.

For engineering leaders looking to move beyond subjective metrics, Weave offers a clear, actionable path to better team performance.