
We Tried to Measure AI's Impact on Codebases. Here's Why It's So Hard.

Everyone claims AI is transforming software development, but actually quantifying that impact in a real codebase is a minefield of confounding variables, vanity metrics, and uncomfortable questions. Here's what we learned trying.

There's a stat floating around that AI coding assistants make developers "55% faster." Another one says "40% more code." A third claims "3x productivity." They get dropped into pitch decks and blog posts without much scrutiny, and everyone nods along because the conclusion feels right — AI tools do feel helpful when you're using them.

But when we actually tried to measure AI's impact on real codebases — looking at commit histories, PR patterns, contributor behaviour, and code quality signals — we ran into a wall. Not because AI isn't changing how people write code. It clearly is. But because measuring how much it's changing, and whether the change is good, turns out to be a genuinely hard problem that most people are hand-waving past.

Here's what we found.

The Obvious Metric Is the Wrong One

The first instinct is to measure output. More lines of code, more commits, more PRs merged per week. If a developer adopted Copilot in March and their commit frequency doubled by April, that's the AI working, right?

Not necessarily.

Lines of code has been a discredited productivity metric for decades, and AI doesn't magically rehabilitate it. If anything, AI makes it worse. A developer who lets an LLM generate a 200-line utility function they would have written in 40 lines hasn't become 5x more productive — they've added 160 lines of future maintenance burden. The codebase got bigger. Whether it got better is a completely separate question.

We looked at repositories where teams had visibly adopted AI tooling (detectable through commit message patterns, PR descriptions referencing AI assistance, and characteristic code patterns). Commit volume did tend to increase. But so did several less flattering metrics:

- Revert rates, and "fix" commits landing shortly after AI-assisted changes
- PR cycle times, as review became the bottleneck
- The proportion of production code shipped without accompanying tests
- Structural repetition: new code that is near-identical to existing code

None of this means AI is making things worse. It means that raw output metrics don't capture what's actually happening, and optimising for them leads you somewhere misleading.

The Before/After Problem

The cleanest way to measure AI impact would be a controlled experiment: same team, same project, same time pressure, with and without AI tools. But nobody works that way. You can't rewind a codebase. You can't un-know what Copilot would have suggested.

What you can do is compare time periods — before adoption and after. But this introduces a cascade of confounding variables:

- The team changes: people join, leave, and simply get more experienced over time
- The project changes phase: greenfield work produces commits at a very different rate than maintenance
- Deadlines, reorgs, and process changes land in the same window
- Adoption is self-selected: the most enthusiastic developers adopt first, and they were often the most productive to begin with
- Novelty effects: early usage is unusually energetic, and it fades

We found that almost every "AI made us X% faster" claim, when you actually inspect the methodology, is comparing a period of enthusiastic adoption against a baseline that wasn't controlled for any of the above. It's not lying, exactly. But it's not science either.

What AI Actually Changes (That You Can Measure)

If gross output metrics are misleading and before/after comparisons are confounded, what can you actually observe?

After looking at hundreds of repositories, we think the measurable impacts fall into a few categories — and they're more nuanced than the headlines suggest.

Code homogeneity increases

AI-assisted codebases tend to become more internally consistent over time. The same patterns get repeated. The same abstractions get used. This is partly good — consistency reduces cognitive load. But it also means the codebase can develop a kind of monoculture. The same mistakes get replicated everywhere. The same suboptimal patterns become load-bearing before anyone notices.

You can detect this by looking at file similarity metrics and the ratio of novel patterns versus repeated structures across commits. We've seen repos where 60%+ of new code in a month is structurally near-identical to existing code — a level of repetition that's unusual in purely human-authored codebases.
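One crude way to approximate that repetition signal, as a sketch: normalise each file into token shingles, with identifiers collapsed to a placeholder so structure rather than naming drives the match, then flag file pairs whose shingle sets overlap heavily. The shingle size and the 0.6 threshold here are illustrative, not a prescribed metric.

```python
import re
from itertools import combinations

def shingles(code: str, k: int = 5) -> set:
    """Normalise code into k-token shingles.

    Identifiers are collapsed to a placeholder so that two blocks
    with the same structure but different names still match.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\S", code)
    norm = ["ID" if re.match(r"[A-Za-z_]", t) else t for t in tokens]
    return {tuple(norm[i:i + k]) for i in range(len(norm) - k + 1)}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity of shingle sets; 1.0 = structurally identical."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def near_duplicates(files: dict, threshold: float = 0.6):
    """Pairs of files whose structure overlaps beyond the threshold."""
    return [(x, y) for x, y in combinations(files, 2)
            if similarity(files[x], files[y]) >= threshold]
```

Run over a month of newly added files, the fraction of files that land in at least one near-duplicate pair gives a rough version of the repetition figure above.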

The "first draft" problem

AI is excellent at producing a first draft. It's less good at producing the right draft. In practice, this means developers spend less time writing code and more time editing, reviewing, and course-correcting code that an AI wrote.

This isn't captured in any commit-level metric. The commit lands looking clean. But the developer spent 20 minutes coaxing the AI toward the right solution, then another 10 minutes removing the unnecessary abstraction it added, then another 5 minutes fixing the edge case it missed. The commit history says "30 minutes, 80 lines." The reality was "35 minutes, and the developer could have written it in 25 without the AI, but in 45 without any prior knowledge of the library."

The AI helped. But not by the amount the metrics suggest.

Review burden shifts

This is the one we found most consistently across repositories. When AI generates more code, someone still has to review it. And reviewing AI-generated code is different from reviewing human-written code:

- There's more of it, arriving faster than reviewers can absorb
- It looks plausible on the surface, so subtle mistakes are easier to wave through
- The author can't always explain the intent behind a given choice, because they didn't make it
- Reviewer fatigue sets in, and "looks fine" starts to replace genuine scrutiny

In several repositories we analysed, the introduction of AI tools coincided with a measurable increase in PR cycle time and a decrease in review thoroughness (measured by review comment density per line changed). The bottleneck didn't disappear. It moved from writing to reviewing.
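Both signals are cheap to compute once you have PR metadata. A minimal sketch; the `PullRequest` shape here is a stand-in for whatever your Git host's API actually returns:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class PullRequest:
    opened: datetime
    merged: datetime
    lines_changed: int
    review_comments: int

def cycle_time(pr: PullRequest) -> timedelta:
    """Time from opening a PR to merging it."""
    return pr.merged - pr.opened

def comment_density(pr: PullRequest) -> float:
    """Review comments per line changed: a rough proxy for thoroughness."""
    return pr.review_comments / pr.lines_changed if pr.lines_changed else 0.0

def period_stats(prs):
    """Mean cycle time (hours) and mean comment density for a set of PRs."""
    hours = sum(cycle_time(p).total_seconds() for p in prs) / 3600 / len(prs)
    density = sum(comment_density(p) for p in prs) / len(prs)
    return hours, density
```

Comparing `period_stats` for PRs before and after tool adoption is what surfaces the shift: cycle time up, density down.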

Test coverage becomes more important (but doesn't always increase)

If you're generating code faster, your test infrastructure needs to keep up. In theory, AI also helps write tests. In practice, we see a common pattern: AI-assisted PRs ship more production code but proportionally less test code than the pre-AI baseline.

This isn't universal — some teams are disciplined about this. But the temptation is clear. The AI generates the feature code quickly, and the developer ships it while their enthusiasm is high, planning to "add tests later." The commit history is littered with these deferred promises.

The Metrics Nobody Wants to Talk About

Here's where it gets uncomfortable.

Knowledge erosion

When developers lean heavily on AI for code generation, they sometimes stop building deep familiarity with the codebase. We've observed repositories where contributor breadth appears to increase (more people touching more files) but contributor depth decreases (fewer people who deeply understand any given module).

This shows up as a subtle shift in the bus factor calculation. Traditionally, a bus factor of 1 means one person dominates a file or module. With AI-assisted development, you can have a bus factor of 3 where none of the three contributors could confidently explain the code without re-reading it. The metric looks healthy. The reality is fragile.
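For reference, the healthy-looking number being criticised here is usually computed something like this: the smallest set of contributors who together account for a majority of a file's commits. A sketch (per-author commit counts are assumed to come from something like `git shortlog -s -- <path>`):

```python
def bus_factor(commits_by_author: dict) -> int:
    """Smallest number of contributors covering more than half of a
    file's commits. A concentration proxy only: it says nothing about
    whether any of them actually understand the code."""
    counts = sorted(commits_by_author.values(), reverse=True)
    total = sum(counts)
    covered = 0
    for i, c in enumerate(counts, start=1):
        covered += c
        if covered > total / 2:
            return i
    return len(counts)
```

Note the limitation baked into the definition: it counts commits, not comprehension, which is exactly why the metric can look healthy while the reality is fragile.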

You can't measure this from commit data alone. But you can detect proxy signals: increased frequency of "refactor" commits shortly after AI-assisted feature commits (suggesting the initial code wasn't well-understood), higher revert rates, and more "fix" commits in the 48 hours following AI-heavy PRs.
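Those proxy signals can be approximated from commit history alone. A sketch, assuming you have already tagged which commits were AI-assisted (however you detect that) and parsed timestamps and messages:

```python
import re
from datetime import datetime, timedelta

# Illustrative pattern; refine for your team's commit conventions.
FIX_PATTERN = re.compile(r"\b(fix|revert|hotfix)\b", re.IGNORECASE)

def followup_fix_rate(commits, window=timedelta(hours=48)):
    """Fraction of flagged (e.g. AI-assisted) commits that are followed
    by a fix/revert commit within `window`.

    `commits` is a list of (timestamp, message, is_flagged) tuples.
    """
    flagged = [(t, m) for t, m, f in commits if f]
    if not flagged:
        return 0.0
    fixes = [t for t, m, _ in commits if FIX_PATTERN.search(m)]
    hit = sum(1 for t, _ in flagged
              if any(t < ft <= t + window for ft in fixes))
    return hit / len(flagged)
```

Compared against the same rate for unflagged commits, this gives a rough, hedged read on whether AI-heavy changes need more immediate patching.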

Dependency on generation context

AI-generated code is only as good as the context window it had when generating. When that context is wrong or incomplete — which happens silently — the resulting code can be subtly misaligned with the project's actual architecture.

We've seen this manifest as a slow drift in code style and architectural patterns within a single repository. The codebase starts developing "dialects" — sections that use different patterns for the same operations, because different AI sessions had different context when generating them. Over months, this creates a maintenance burden that's invisible in any single commit but obvious when you zoom out.

The productivity paradox

Perhaps the most uncomfortable finding: in repositories with heavy AI tool adoption, we consistently see more code but not proportionally more capability. Features get built. But the ratio of code volume to user-facing functionality trends upward. The codebase grows faster than the product does.

This isn't unique to AI — it's the classic "accidental complexity" problem that software engineering has always struggled with. But AI accelerates it, because the cost of writing code drops toward zero while the cost of understanding, maintaining, and evolving that code stays constant.

So How Should You Actually Measure This?

After all of this, we think the honest answer is: you probably can't measure AI's impact with a single metric or a simple before/after comparison. But you can track a set of signals that, taken together, give you a reasonable picture:

Track what matters, not what's easy to count:

- Review comment density per line changed, not raw PR throughput
- The ratio of test code to production code per PR, not total lines shipped
- How code volume grows relative to user-facing functionality
- Contributor depth per module, not just contributor counts

Watch for warning signs:

- Rising revert rates, and clusters of "fix" commits shortly after large feature PRs
- New code that is structurally near-identical to existing code
- PR cycle times creeping up while review comments go down
- A bus factor that looks healthy on paper while nobody can confidently explain the module

Be honest about what you're measuring and why:

If you're tracking commit volume because it's easy to extract, say so, and don't present it as productivity. A before/after comparison that ignores team changes, project phase, and selection effects is an anecdote, not a measurement.

The Bigger Picture

AI coding tools are genuinely useful. We use them ourselves. They reduce the friction of getting started, help with unfamiliar APIs, and handle boilerplate that nobody enjoys writing. The developers who use them effectively treat them as a drafting aid — not an autopilot — and maintain their own understanding of what's being generated.

But the industry's rush to quantify AI's impact has produced a lot of misleading numbers. "55% faster" doesn't mean 55% more productive. More code doesn't mean better code. And a metric that goes up isn't automatically a metric that matters.

The real impact of AI on a codebase is visible — but only if you're looking at the right signals, over the right timescale, with the right context. It shows up in how PR review patterns shift, how contributor knowledge distributes (or doesn't), how test coverage keeps pace (or doesn't), and how code complexity evolves relative to product complexity.

These are the kinds of signals we think about constantly at RepoShark. Our health scoring and risk detection were built to surface exactly these patterns — the slow shifts in repository health that are invisible in any single commit but obvious when you have the data to zoom out. Whether AI is part of your workflow or not, the question is the same: is this codebase getting healthier or quietly degrading? The answer is always in the patterns.


If you want to see what your repository's patterns actually look like — commit distribution, contributor depth, PR cycle times, risk signals — try a free analysis. No setup, no config. Just paste a repo URL.

