How AI authorship reshapes code structure.
Forty-eight public open-source repositories, five languages, three strata of AI-authorship intensity. We attribute every line of every function to AI or human authorship via git blame against Claude-tagged commits, score each function’s structural similarity to its peers via semble, and ask whether the resulting AI-vs-human uniformity gap holds across languages, across AI authorship intensity, and across function size [1].
[1] Methodology pre-registered in full at v0.1.0 before any sampling. Six hypotheses (H1–H6). We report results regardless of direction: three confirmed, two counter-prediction, one partially confirmed.
For each public function in each repo, we compute its semantic similarity to the other functions in the same repo (via semble’s hybrid retrieval — Model2Vec embeddings + BM25 + reciprocal rank fusion), its cohesion-vs-coupling profile, and its cyclomatic complexity (Python only, via radon). Every line is attributed to AI or human authorship through git blame against commits carrying Co-Authored-By: Claude or the Claude Code footer. Functions with ≥70% AI lines are classified as AI-authored; those with ≤10% AI lines as human-authored. Per-repo metrics aggregate across all functions; per-language and per-bucket metrics aggregate across the 48-repo sample.
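The classification rule above can be sketched in a few lines. This is a minimal illustration, not the package's implementation: the function names, the `"mixed"` label for functions between the two thresholds, and the case-insensitive trailer match are assumptions, and the Claude Code footer check is left out because its exact text is not given here.

```python
# Thresholds taken from the text: >=70% AI lines is AI-authored,
# <=10% AI lines is human-authored.
AI_THRESHOLD = 0.70
HUMAN_THRESHOLD = 0.10

# Trailer named in the text; matching it case-insensitively is an assumption.
AI_TRAILER = "co-authored-by: claude"


def is_ai_commit(message: str) -> bool:
    """Return True if a commit message carries the Claude co-author trailer."""
    return AI_TRAILER in message.lower()


def classify_function(ai_lines: int, total_lines: int) -> str:
    """Label a function by its share of AI-attributed lines.

    The "mixed" label for the band between the thresholds is hypothetical;
    the text only defines the two outer classes.
    """
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    share = ai_lines / total_lines
    if share >= AI_THRESHOLD:
        return "ai"
    if share <= HUMAN_THRESHOLD:
        return "human"
    return "mixed"


if __name__ == "__main__":
    print(is_ai_commit("Fix parser\n\nCo-Authored-By: Claude <noreply@anthropic.com>"))
    print(classify_function(ai_lines=14, total_lines=20))
```

Functions that fall between the two thresholds belong to neither comparison group under this reading, which keeps the AI and human samples cleanly separated.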
AI code is the outlier in human codebases.
AI code is the norm in AI codebases.
Same code, opposite role. The AI-vs-human uniformity gap reverses sign across the AI authorship range — and the relationship is roughly linear.
| Bucket | Repos | Mean AI gap | Interpretation |
|---|---|---|---|
| Low (<30% AI authored) | 18 | -0.00229 | AI is the outlier |
| Mid (30 – 70%) | 15 | +0.00080 | Roughly equal |
| High (>70%) | 15 | +0.00283 | AI is the norm |
The gap is small in absolute terms, a few thousandths at most on a similarity scale that runs roughly 0–0.05, but it is directionally consistent and correlated with AI share. In the eighteen repositories where humans wrote most of the code, AI-authored functions read structurally differently from the surrounding code. In the fifteen repositories where AI dominated, AI functions cluster together and human contributions become the structural minority. The mid bucket sits near zero. Pearson r between AI ratio and gap is +0.58 across the forty-five of the forty-eight repositories that have sufficient samples in both groups.
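The bucketing and the correlation can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, repos at exactly 30% or 70% are placed in the mid bucket (the table does not say which side the boundaries fall on), and the sample values at the bottom are made up for the demo rather than taken from the run.

```python
from math import sqrt


def bucket(ai_ratio: float) -> str:
    """Assign a repo to a stratum by its AI-authored share (0.0 to 1.0)."""
    if ai_ratio < 0.30:
        return "low"
    if ai_ratio > 0.70:
        return "high"
    return "mid"  # boundary handling at exactly 0.30 / 0.70 is an assumption


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical per-repo (AI ratio, gap) pairs, for illustration only.
    ratios = [0.05, 0.20, 0.45, 0.60, 0.85, 0.95]
    gaps = [-0.003, -0.001, 0.002, -0.001, 0.001, 0.004]
    print([bucket(r) for r in ratios])
    print(round(pearson_r(ratios, gaps), 3))
```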
AI converges in JavaScript.
AI varies in TypeScript.
The sign reversal between TypeScript (−0.00254, humans more uniform within-repo) and JavaScript (+0.00383, AI more uniform) is the largest cross-language signal in the run. TypeScript is the only language whose mean AI gap is negative; the other four are positive. Python (+0.00178) sits between the extremes, and Go and Rust are close to zero.
| Language | Repos | Mean AI gap | Uniformity index |
|---|---|---|---|
| javascript | 7 | +0.00383 | 0.01355 |
| python | 8 | +0.00178 | 0.01274 |
| rust | 10 | +0.00028 | 0.01330 |
| go | 12 | +0.00013 | 0.01493 |
| typescript | 11 | -0.00254 | 0.01506 |
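The signs in the table can be read off programmatically. The snippet below only restates the published per-language means; the variable names are illustrative.

```python
# Mean AI gap per language, copied from the table above.
MEAN_AI_GAP = {
    "javascript": 0.00383,
    "python": 0.00178,
    "rust": 0.00028,
    "go": 0.00013,
    "typescript": -0.00254,
}

# Languages where humans are the more uniform group (negative gap).
negative = [lang for lang, gap in MEAN_AI_GAP.items() if gap < 0]

# Languages where AI is the more uniform group (positive gap).
positive = [lang for lang, gap in MEAN_AI_GAP.items() if gap > 0]

# Distance between the two extremes of the table.
spread = max(MEAN_AI_GAP.values()) - min(MEAN_AI_GAP.values())

if __name__ == "__main__":
    print("negative:", negative)
    print("positive:", positive)
    print("spread:", round(spread, 5))
```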
The likely cause: TypeScript’s type system forces explicit structural choices, so humans converge to type-driven idioms while AI takes the structural freedom and varies. JavaScript has weaker conventions, so AI imposes its own pattern rigidly. Go and Rust both have strong cultural conventions (gofmt, rustfmt, idiomatic style guides) that flatten the AI-vs-human distinction to noise.
AI does not generate more duplicate code than humans.
We expected DRY-cluster density — the number of similar-looking function pairs above the per-language similarity threshold, normalized by function count — to climb with AI authorship. The intuition: AI generates fourteen copies of pagination, twenty subtly different validation helpers, sprawling near-duplicates that humans would have factored out.
The data does not show that. AI-heavy repos sit slightly below human-heavy repos on this measure (7.37 vs 8.24 pairs per function), although the mid bucket is the highest at 8.36, so the trend is not monotonic. AI-generated functions cluster differently from human functions, but they do not cluster more. The “AI generates fourteen versions of pagination” concern is not visible at this benchmark’s scale.
| Bucket | DRY pairs / fn |
|---|---|
| low | 8.24 |
| mid | 8.36 |
| high | 7.37 |
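The density measure, similar pairs above a threshold divided by function count, can be sketched like this. The function name, the input shape (a dict of pre-computed pair scores), the inclusive `>=` comparison, and the threshold value in the demo are all assumptions; the per-language thresholds used in the run are not stated in this report.

```python
def dry_pairs_per_function(
    pair_scores: dict[tuple[str, str], float],
    function_count: int,
    threshold: float,
) -> float:
    """Count function pairs at or above the similarity threshold, per function.

    pair_scores maps an unordered pair of function names to a similarity
    score; each pair is expected to appear once.
    """
    if function_count <= 0:
        raise ValueError("function_count must be positive")
    similar_pairs = sum(1 for score in pair_scores.values() if score >= threshold)
    return similar_pairs / function_count


if __name__ == "__main__":
    # Hypothetical scores for a four-function repo.
    scores = {
        ("paginate_users", "paginate_orders"): 0.91,
        ("paginate_users", "validate_email"): 0.12,
        ("validate_email", "validate_phone"): 0.84,
        ("paginate_orders", "validate_phone"): 0.08,
    }
    print(dry_pairs_per_function(scores, function_count=4, threshold=0.80))
```

Because the count is normalized by function count rather than by the number of possible pairs, the value can exceed 1, which is consistent with the 7 to 8 pairs per function reported in the table.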
The rare hand-coded edges aren’t rarer in AI’s share.
We expected that in AI-dominated codebases the most-isolated functions (the rare hand-coded utilities, the one-off infrastructure pieces) would be disproportionately human. Instead, across the 15 high-AI repos, the bottom-10% most-isolated functions typically had AI authorship within ±10% of the repo’s overall AI rate.
The “humans hand-craft the rare edges” mental model is wrong on average, though three repos do show it clearly — bmad-module-skill-forge is 81% AI overall but only 54% AI in the isolated tail. AI writes uniform code; AI also writes the rare-shaped code. Mean AI surplus across the 15 high-AI repos is +0.002 — indistinguishable from zero.
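One way to compute the AI surplus in the isolated tail is sketched below. It assumes "isolated" means lowest mean similarity to peers and that the surplus is the tail's AI share minus the repo's overall AI share; the function name and input shape are hypothetical.

```python
import math


def ai_surplus_in_tail(
    functions: list[tuple[float, bool]],
    tail_fraction: float = 0.10,
) -> float:
    """AI share among the most-isolated functions minus the overall AI share.

    Each entry is (mean_similarity_to_peers, is_ai). A negative result means
    the isolated tail is more human than the repo as a whole.
    """
    if not functions:
        raise ValueError("need at least one function")
    # Least similar to peers first, i.e. most isolated first (assumption).
    ranked = sorted(functions, key=lambda item: item[0])
    tail_size = max(1, math.floor(len(ranked) * tail_fraction))
    tail = ranked[:tail_size]
    overall_share = sum(1 for _, is_ai in functions if is_ai) / len(functions)
    tail_share = sum(1 for _, is_ai in tail if is_ai) / len(tail)
    return tail_share - overall_share


if __name__ == "__main__":
    # Hypothetical repo: 8 of 10 functions are AI, the most isolated is human.
    demo = [(0.010, False)] + [(0.020 + i * 0.001, True) for i in range(8)] + [(0.050, False)]
    print(round(ai_surplus_in_tail(demo), 2))
```

Under this definition, the repo cited above (54% AI in the tail against 81% overall) has a surplus of about −0.27, while the mean across the 15 high-AI repos is the reported +0.002.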
AI Python is simpler at most sizes, except the 21–50-line band.
For Python functions across five repos with both AI and human samples (cyclomatic complexity via radon), AI is measurably simpler in three of four line-count bins. The 21–50-line band — which contains most “real function” sizes in production code — goes the other way. AI is 11% more complex per function there.
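The matched-size comparison can be sketched as a binning step followed by a relative difference of means. Only the 21–50-line band comes from the text; the other three bin edges, the function names, and the input shape are assumptions. Complexity values are taken as already computed (for example by radon), so the sketch has no third-party dependency.

```python
from collections import defaultdict
from statistics import mean

# Line-count bins. Only (21, 50) is stated in the text; the rest are hypothetical.
BINS = [(1, 10), (11, 20), (21, 50), (51, None)]


def bin_label(line_count: int) -> str:
    """Return the label of the line-count bin a function falls into."""
    for low, high in BINS:
        if line_count >= low and (high is None or line_count <= high):
            return f"{low}-{high}" if high is not None else f"{low}+"
    raise ValueError("line_count must be >= 1")


def relative_cc_gap(samples: list[tuple[int, float, bool]]) -> dict[str, float]:
    """Per bin, (mean AI complexity - mean human complexity) / mean human complexity.

    Each sample is (line_count, cyclomatic_complexity, is_ai). Bins missing
    either group are skipped.
    """
    groups: dict[str, dict[str, list[float]]] = defaultdict(lambda: {"ai": [], "human": []})
    for line_count, complexity, is_ai in samples:
        groups[bin_label(line_count)]["ai" if is_ai else "human"].append(complexity)
    return {
        label: (mean(g["ai"]) - mean(g["human"])) / mean(g["human"])
        for label, g in groups.items()
        if g["ai"] and g["human"]
    }


if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    demo = [(8, 2, True), (9, 3, False), (30, 10, False), (45, 11, True)]
    print(relative_cc_gap(demo))
```

A positive value for a bin means AI functions are more complex there; the 11% figure reported for the 21–50-line band corresponds to a value of about +0.11 under this definition.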
What we predicted.
What we found.
Six hypotheses locked in the methodology document at v0.1.0 before any sampling. We report the result regardless of direction.
| ID | Prediction | Result | Magnitude |
|---|---|---|---|
| H1 | AI more uniform than human within repo | ✓ confirmed | +0.00034 mean (CI95 [−0.0013, +0.0020]) |
| H2 | Gap correlates with repo AI ratio | ✓ confirmed | Pearson r = 0.579 |
| H3 | DRY density higher in AI-heavy repos | ✗ counter-prediction | high = 7.37, low = 8.24 |
| H4 | Isolated functions disproportionately human | ✗ counter-prediction | mean surplus +0.002, 3 / 15 repos show human surplus |
| H5 | AI lower CC than human at matched size | ≈ partial | 3 / 4 line-count bins; 21–50-line band reverses |
| H6 | AI uniformity gap varies across languages | ✓ confirmed | TS −0.0025 ↔ JS +0.0038 (sign reversal; TypeScript is the only one of the five languages with a negative mean gap) |
Same code path we ran.
Every number in this report came from the public agent-uniformity package. Install it, point it at any of the 48 sampled repos at their locked SHAs, and you should match within ~1% (semble’s BM25 has small non-determinism).
```shell
# install the public reference implementation
$ pip install agent-uniformity
# verify any single published number by rerunning one repo
$ agent-uniformity run-one davila7-claude-code-templates --output ./out
# rerun the full sample sequentially (~6-8 hr on a laptop)
$ agent-uniformity run-all --output ./out
```
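A rerun can be compared against the published figures with a simple tolerance check. This sketch reads "within ~1%" as a 1% relative tolerance, which is an assumption; the function name is hypothetical, the published values are copied from the tables in this report, and the rerun values are made up for the demo. It does not read the package's output files, whose format is not described here.

```python
import math


def matches_published(published: float, rerun: float, rel_tol: float = 0.01) -> bool:
    """Return True if a rerun value is within the relative tolerance of the published one."""
    return math.isclose(published, rerun, rel_tol=rel_tol)


if __name__ == "__main__":
    # Published values from this report.
    published = {"dry_low": 8.24, "dry_mid": 8.36, "dry_high": 7.37, "pearson_r": 0.579}
    # Hypothetical rerun values, for illustration only.
    rerun = {"dry_low": 8.27, "dry_mid": 8.35, "dry_high": 7.52, "pearson_r": 0.580}
    for key, value in published.items():
        status = "ok" if matches_published(value, rerun[key]) else "differs"
        print(f"{key}: published={value} rerun={rerun[key]} {status}")
```

A purely relative tolerance is strict for values very close to zero, such as the Go and Rust mean gaps, so an absolute tolerance may be the more sensible check for those.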
Datta, Y. (saucam). (2026). Code Uniformity Q2 2026 — How AI authorship reshapes the structure of public open-source code. Agent Almanac. https://github.com/saucam/agent-uniformity-q2-2026
- run id: 2026-05-09T15-02-36Z-4931
- run date: 2026-05-09
- methodology: v0.1.0
- published: 2026-05-10
- byline: Yash Datta · saucam