
From Stack Ranking to Smart Metrics: How StrongSuit Brought Objectivity to Engineering Performance

About StrongSuit

StrongSuit is a litigation AI platform that amplifies attorney judgment rather than replacing it. The company helps legal professionals handle case research, document drafting, and discovery analysis while keeping lawyers in control of strategic decisions. Over 1,000 law firms and legal organizations use StrongSuit to close cases 3-5x faster.

StrongSuit's engineering team operates on tight six-week development cycles, where shipping speed directly impacts their ability to serve customers in the fast-moving legal tech space. The team's culture emphasizes ownership and execution. Engineers operate under a "you build it, you own it" philosophy: leadership focuses on delivering quality outcomes rather than micromanaging how engineers achieve them.

The Challenge

When gut feelings need data to back them up

Susheel Daswani, StrongSuit's Head of Engineering, had spent decades navigating the tension between subjective and objective performance evaluation. Earlier in his career, he'd participated in stack ranking exercises where managers would create "baseball cards" for every engineer, then debate their relative value in a room full of other managers defending their own teams.

The process was frustrating. Backend engineers would argue their work was more critical because it improved system performance, while frontend contributions were dismissed as "just changing a widget." Managers had to cherry-pick three accomplishments for each engineer's card, subjectively choosing what mattered most from months of work. The whole system felt political, incomplete, and disconnected from the wealth of objective data sitting in code repositories and project management tools.

"I was always a little upset that objective data did not come out," Susheel reflects. "We have all this data in GitHub and Linear, but we weren’t using it as much as we could."

At StrongSuit, where shipping speed was critical and the team was still small, this challenge became acute. Leadership needed to make decisions about the team and had gut feelings about individual performance, but they lacked the objective data to validate those instincts. They couldn't answer fundamental questions: How is each engineer affecting our KPIs? How do our engineers compare to their peers, both internally and across the industry? When unplanned customer issues disrupt a six-week roadmap, how can we demonstrate that our team's effort remained strong even if we missed deliverables?

The typical DORA metrics that other platforms provided weren't enough. They measured velocity and deployment frequency but missed the nuance of engineering work. A two-line code change that required understanding an entire system and finding an elegant solution looked identical to routine maintenance in traditional metrics. Complex refactoring work that improved maintainability could make a productive engineer look inactive for weeks.

The Solution

Objective metrics that account for complexity and effort

When the leadership team evaluated Weave, they immediately recognized something different. Rather than counting lines of code or PRs, Weave used machine learning models trained on expert engineering work to measure both the complexity and effort behind every pull request. The platform's code output metric resonated immediately because it captured what traditional metrics missed. "What I appreciate most about Weave is this code output metric, and that it specifically aligns to effort," Susheel explains.

Susheel had worked with expert engineers who put in little effort and non-expert engineers who worked extremely hard. Weave could distinguish between those two dimensions, skill and effort, and compare engineers not just to their internal peers but to the broader industry.

The team made a critical decision early: they would implement Weave transparently. Everyone could see their own code output and how they compared to team averages and industry benchmarks. Engineers could click into individual PRs to see how Weave evaluated different contributions and understand why a complex two-line fix might score higher than a large but straightforward refactor.

Susheel checks the platform daily, primarily focusing on code output while also using the performance review features to generate comprehensive summaries. He implemented a simple rule: individual code output should be evaluated over three-month periods to account for natural variation, but team output per week became a key metric for demonstrating progress to cross-functional leaders.

The Results

Transparency that builds trust instead of resistance and helps StrongSuit ship fast

Despite initial concerns that engineers might resist being measured, the opposite happened. Even an engineer who had been stressed by stack ranking at their previous company found Weave useful because they could see exactly how different PRs were evaluated and confirm it matched their lived experience.

"People want to see how they're scoring," Susheel observes. "They appreciate that a two-line code change can get you significant code output because you've done the work to understand the whole system and arrived at an elegant solution."

The platform transformed conversations with cross-functional leadership when roadmap deliverables got disrupted. "If my engineers are performing well comparatively in code output and other metrics, I can say, yes, we didn't hit these deliverables, but there's a reason for it outside of my engineers' effort." Having engineers consistently performing in the top 10% throughout a cycle provided strong evidence that missed deliverables likely stemmed from urgent customer needs or a change in priorities, not from a lack of ability or effort.

The objective data solved the political problems Susheel had experienced at larger companies. No more debates about whether backend or frontend work mattered more. No more subjectively choosing three accomplishments to put on a baseball card. The code output metric captured the full picture of engineering contribution in a way that felt fair to engineers across different specialties.

Most importantly, the transparency created trust and enabled StrongSuit to improve its shipping speed. Engineers understood how they were being evaluated, could verify it matched their own assessment of their work, and appreciated that leadership was using objective data as one key input alongside other evaluation methods rather than as the sole determinant of performance. Further, Susheel used Weave’s code output metric, which benchmarks his team’s performance against industry counterparts, to help his team ship faster and with higher quality. StrongSuit now ships faster than 95% of similarly sized startups.

"I thought Weave was a strong answer to what frustrated me at previous companies: where are the objective metrics? Subjective metrics are useful, but you also need the objective, and ideally they should have some alignment." — Susheel Daswani, Head of Engineering, StrongSuit
