Pop Software and the Johnnie Walker Problem


Johnnie Walker Black Label has a reputation for quality. It’s served at business dinners, gifted to clients, stocked by hotels that want to signal sophistication. The brand carries decades of accumulated prestige.

It’s also not particularly good whisky.

That’s not a bug; it’s the business model. Reputation is a lagging indicator. It reflects what a thing was, not what it is. As long as most consumers can’t tell the difference, the gap between signal and substance is exploitable indefinitely. The entire market structure of luxury goods is built on this: the product is the brand. The contents are secondary.

The music industry codified the playbook in the 1990s. Once distribution costs collapsed, producers figured out you don’t need a good artist to have a successful one. Pop stars aren’t discovered — they’re assembled. Manufacture social proof, reach critical mass before the market can assess quality, collect revenue while the gap persists.

Venture capital ran the same formula after 2010. Dev advocacy, conference presence, Hacker News seeding, bought GitHub stars: the mechanisms were conscious and the incentives were clear. MongoDB’s data loss problems were known and documented. The reputation signal was louder than the failure signal. That’s not a coincidence; it’s the point.

This is the Johnnie Walker problem applied to software. And it is now structural.

Reputation networks degrade as they scale

In a small network, reputation is a social graph property. You know the recommender, or know someone who does. The signal carries identity weight. Gaming it requires infiltrating real relationships — expensive.

As the network grows, recommendations become anonymous. The signal stops being a graph traversal and becomes a vote count. Vote counts are cheap to manipulate. The attack economics invert: earning legitimate reputation gets harder (stand out in noise) while faking it gets cheaper (buy reviews, manufacture downloads) — simultaneously, as a function of the same growth.

Growth destroys the property that made the network valuable in the first place.

The standard response is to superimpose a small expert network on top of the broken large one. Consumer Reports, Rotten Tomatoes, Red Hat: these work by ignoring popular opinion and substituting accountable curation. They don’t fix the large network. They bypass it, by preserving the properties that only work at small scale: known identities, skin in the game, consequences for bad calls. This is why you paid Red Hat: not for the software, but for a named entity with something to lose standing behind it.

Identity authentication solves the wrong problem

The industry’s response to supply chain attacks was identity verification: if we know who wrote this, we can build trust. Package signing is the canonical example: verify that what you downloaded genuinely comes from entity X. Sigstore, verified publishers, signed releases: all are variations on this.

This solves authentication. It does not solve trust.

Internet identity is an opaque handle with unknown real-world binding. A GitHub handle could be a person, a team, or an agent working for Chinese intelligence. Even if the binding is known today, it degrades: maintainers hand off projects, companies get acquired, incentives change. The reputation score persists through all of it. OpenSSL had a strong reputation until Heartbleed triggered a retrospective audit that revealed the reputation was never deserved: the codebase had been poorly maintained for years. The signature was valid throughout.

The failure mode isn’t impersonation. It’s identifier continuity masking quality discontinuity. npm’s event-stream was published by its legitimate maintainer right up until that maintainer transferred control to someone who shipped malicious code to millions of dependents. Authentic label. Different contents. (As a side note, identity is one of those commonly-used-but-rarely-understood concepts; I’ll blabber about it some other time.)
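To make that concrete, here is a minimal sketch of what a signature check can and cannot tell you. It is a toy, not how npm or Sigstore actually work: an HMAC from Python's standard library stands in for a real asymmetric signature, and the key, the two release payloads and the function names are all made up for illustration.

```python
import hashlib
import hmac

# Hypothetical credential bound to the publisher's handle (assumption).
PUBLISHER_KEY = b"key-bound-to-the-handle"


def sign(artifact: bytes, key: bytes) -> str:
    """Return a hex MAC over the artifact (stand-in for a real signature)."""
    return hmac.new(key, artifact, hashlib.sha256).hexdigest()


def verify(artifact: bytes, signature: str, key: bytes) -> bool:
    """Check that the artifact was signed with the key bound to the handle."""
    return hmac.compare_digest(sign(artifact, key), signature)


# Release 1: shipped by the original maintainer.
benign = b"def parse(x): return x.split(',')"
# Release 2: shipped after the credential changed hands. Same key, different intent.
malicious = b"def parse(x): exfiltrate(x); return x.split(',')"

releases = [
    (benign, sign(benign, PUBLISHER_KEY)),
    (malicious, sign(malicious, PUBLISHER_KEY)),
]

# Both releases authenticate: the check only knows which key signed them.
results = [verify(artifact, sig, PUBLISHER_KEY) for artifact, sig in releases]

# What the check does catch: bytes altered after signing.
tampered = verify(benign + b"#", releases[0][1], PUBLISHER_KEY)
```

Both entries in `results` come back true, and only the tampered artifact fails. The check answers "did the holder of this key publish these bytes?" and nothing else. Whether the holder is still the person you trusted, or whether the bytes are any good, is outside its vocabulary.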

Identity helps at the margins. It is not a building block for scalable trust. The only functional identity is one tied to properties of the thing itself, and that usually requires intimate knowledge: people who actually know the actor. Which is the small network condition, restated.

The feedback loop that isn’t

A library’s real value isn’t the code as written on day one. It’s the accumulated knowledge: corrections, edge cases found and fixed in production, failure modes discovered under adversarial conditions, behaviors that only manifest at scale or under specific hardware conditions. This is what makes a mature library trustworthy in a way a freshly-written equivalent is not.

In popular open source, that experience routinely dissipates. Production failures become internal postmortems. Workarounds get applied locally. Issues get filed and go stale. The maintainer sees noise, not signal.

This isn’t accidental — it’s structural. Low switching costs, which the ecosystem treats as a feature, are also the mechanism that destroys feedback. When exit is cheap, users exit. When exit is expensive — as with Linux or Postgres — users are forced to engage, file bugs, work with maintainers to fix things. Linux and Postgres are not zero-cost software: they’re backed by funded foundations, governed by accountable maintainers, and depended on by organizations for whom migration is a multi-year project. The feedback loop works because it’s paid for and because the switching cost makes engagement the rational choice.

npm projects have none of this. No funding, no governance, trivially replaceable. The ecosystem trains users to exit rather than engage, and operational knowledge dissipates accordingly.

The cost of curation didn’t disappear with zero-cost software. It was externalized — onto security teams, breach victims, and developers running npm audit and hoping for the best.

The code was never the value

LLM code generation makes obvious something that was always true: the implementation is not the moat.

You don’t run the commit history. You run the final artifact — which is the residue of years of modifications, not the modifications themselves. The actual value is the verification work behind it: the test suite encoding known failure modes, the CVE history documenting adversarial exposure, the regression suite built from production incidents. SQLite and MySQL are open source; significant parts of their test suites are not. That’s not an accident. The implementation is a solvable engineering problem. The tests encode decades of institutional memory about how the implementation fails under conditions you haven’t thought of yet. That’s the moat.
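A small sketch of what "the tests are the moat" looks like in practice. Everything here is hypothetical: `safe_join`, the `/srv/files` base directory and the regression table are invented for illustration and are not taken from any real library. The implementation is a dozen lines anyone could write. The table underneath it is the part that normally costs an incident to learn.

```python
from pathlib import PurePosixPath


def safe_join(base: str, user_path: str) -> str:
    """Join user_path under base, refusing anything that escapes base."""
    candidate = PurePosixPath(user_path)
    if candidate.is_absolute():
        raise ValueError("absolute paths are rejected")
    parts = []
    for part in candidate.parts:
        if part == "..":
            if not parts:
                # Walking above the base directory is never allowed.
                raise ValueError("path escapes the base directory")
            parts.pop()
        else:
            parts.append(part)
    return str(PurePosixPath(base).joinpath(*parts))


# Each row is a failure mode somebody had to discover the hard way
# (illustrative entries, not a real incident log).
REGRESSIONS = [
    ("../etc/passwd", "parent traversal"),
    ("a/../../etc/passwd", "traversal hidden behind a valid prefix"),
    ("/etc/passwd", "absolute path replaces the base"),
]


def run_regressions() -> list[str]:
    """Return the descriptions of every known failure mode that got through."""
    failures = []
    for path, why in REGRESSIONS:
        try:
            safe_join("/srv/files", path)
        except ValueError:
            continue  # rejected, as the regression demands
        failures.append(why)
    return failures
```

Delete `REGRESSIONS` and the function still looks finished, and a naive rewrite that just concatenates strings passes every happy-path check. Nothing in the signature of `safe_join` tells you which of those rows a replacement has forgotten.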

The reputation signal worth having isn’t “popular” or “recently updated.” It’s: how much of the behavioral space has been adversarially probed, by whom, and is the evidence accessible? Download counts measure exposure. Adversarial test coverage measures trustworthiness. These are uncorrelated, and reputation networks treat them as identical. We have no good metrics for “project quality” other than “expert opinion”.

LLMs and the problem you didn’t know you had

LLMs generate plausible-looking code from pattern-matching on training data. The code compiles, passes happy-path tests, looks correct. The model cannot signal whether it’s drawing on deep operational knowledge or shallow syntactic completion. The output is identical either way.

Three structural problems make this worse than it looks:

Training data was not curated for quality. The model learned from the distribution of all published code — dominated by mediocre code (Sturgeon’s law: 90% of everything is crap). Bad patterns are arguably over-represented: they generate more Stack Overflow questions, more blog posts, more issue threads. There’s no strong prior toward quality.

Code encodes what to do, not what to avoid. The road not taken is silent. A correct implementation doesn’t document the failure modes it was written to prevent. That knowledge lives in tests, in CVE histories, in maintainer memory — none of which is recoverable from the API surface. Generated code that looks correct has no way to signal which failure modes it hasn’t considered.

Verification cost exceeds generation cost. Previously, writing code was expensive and verification was of a comparable order of magnitude. Now generation is essentially free and verification is the entire cost, and nobody is going to spend 100x the generation cost auditing an ad-hoc library. Generated code ships unverified at scale.

Sometimes generation is the right call. Avoiding a large, bloated, poorly-maintained dependency by generating exactly what you need is strictly better than the alternative. But knowing whether that’s the case requires the same type of domain experience as writing the library correctly — knowing what failure modes exist, what questions to ask, which behavioral space is adversarial. Judgment can be fail-fast; obvious wrongness is detectable without deep expertise. But you need to know what to look for. That knowledge doesn’t come from the generated output.

The tab comes due

The mechanisms here aren’t new. Supply chain attacks, feedback loop failures, reputation manipulation — all documented, all visible for years.

What’s changed is the rate. LLMs accelerate generation of plausible-looking software while doing nothing for the accumulation of operating experience. The gap between surface legitimacy and actual trustworthiness, already exploitable, is now cheap to manufacture at scale. And the quality normalization that started with tolerating crashes in consumer apps has been propagating upward into infrastructure, financial systems, and medical software for a decade. The engineering culture that never had to care about quality doesn’t recalibrate at the domain boundary. And now we have handed fancy code synthesizers to a generation of developers who grew up on shitty Pop software.

Reputation networks and identity verification were never the answer. Download counts were never a good signal. The answer is good engineering: expensive to grow, rational to pay for, impossible to fake at scale.

The solution to the Johnnie Walker problem was always, and will always be: good taste.
