Conventional wisdom is that AI coding raises the floor — that it allows mediocre engineers and even non-coders to be more productive. To some degree this is true, but when it comes to serious engineering projects it’s largely backwards. AI is making bad engineering expensive for the first time.

I’m a consultant. I dive into codebases I’ve never seen and get productive faster than the developers who’ve worked on them for years. When people ask how, the honest answer isn’t “I read faster” or “I ask good questions.” It’s that I apply working knowledge of how systems behave before reading the specific code. I can infer a lot about structure, behavior, and potential bugs from seemingly superficial details, because there are rules that govern all systems. Doing something over the network or async? Somewhere there should be a timeout, and most likely it’s ill-defined. Using timestamps? I already know what time-related bugs to look for, because developers always make wrong assumptions about time. This isn’t magic. It’s years of experience and theoretical knowledge about systems, hardware, and physics that have amalgamated into intuition.

LLMs cannot do this. They apply such reasoning reactively — when you prompt them to look for it. They’ll answer “does this have clock sync problems?” competently. But they won’t notice unprompted while doing something else.

That gap isn’t a context window problem. The instinct to retrieve more context misses something: retrieval requires knowing what’s relevant to retrieve. Knowing what’s relevant requires the understanding you’re trying to apply. No amount of RAG architecture fixes an epistemological constraint. You need the prior to know what evidence to collect.

For most of software history, that prior lived in developers’ heads. The code didn’t need to make intent explicit because the humans reading it brought the intent with them. That subsidy is ending.

APIs Are Context Compression

A well-designed API lets you operate on a module without reading its implementation. This was always the point — not aesthetics, not purity, but operating with less context. You work with the interface; the implementation is someone else’s problem.

When humans are the readers, this is valuable. When agents are the readers, it’s load-bearing.

Consider the difference between calculate(data) and calculate(data, timeout=100). The timeout parameter is a telltale sign — it tells you this call is slow, probably crosses a network boundary, and is likely to fail. One extra parameter triggers a whole thread of reasoning about failure modes. That’s the interface carrying intent.
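To make that concrete, here is a minimal sketch of what such a signature might sit on top of. Everything in it is an assumption for illustration: the millisecond unit, the `CalculationTimeout` exception, the `_remote_sum` stand-in for a network call, and the `_simulated_delay_s` knob that exists only so the failure path can be exercised.

```python
import concurrent.futures
import time


class CalculationTimeout(Exception):
    """Raised when no result arrives within the caller's time budget."""


def _remote_sum(data, delay_s):
    # Hypothetical stand-in for a network call; delay_s simulates latency.
    time.sleep(delay_s)
    return sum(data)


def calculate(data, timeout=100, _simulated_delay_s=0.0):
    """Return the sum of data, waiting at most `timeout` milliseconds (assumed unit)."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(_remote_sum, data, _simulated_delay_s)
        return future.result(timeout=timeout / 1000)
    except concurrent.futures.TimeoutError:
        # Translate the library error into one that belongs to this interface.
        raise CalculationTimeout(f"no result within {timeout} ms") from None
    finally:
        # Do not block the caller on an abandoned worker.
        pool.shutdown(wait=False)
```

Note that a bare `timeout=100` still leaves the unit to guesswork, which is exactly the kind of ill-defined timeout worth checking for. Stating the unit in the docstring or the parameter name removes one inference the reader would otherwise have to make.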

LLM context windows are finite. The question isn’t file size — it’s whether the right subset of the codebase can be loaded without the rest. Good module boundaries answer that question structurally. The agent reads the interface, infers the contract, and works without traversing the implementation. A leaky abstraction — one where you have to understand implementation details to use the interface correctly — forces the agent to load more context for every task. That cost compounds across every interaction.
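One way to picture a boundary that can be loaded on its own is an explicit contract type. The sketch below uses Python's `typing.Protocol`; the names `RateSource`, `convert_from_usd`, and `FixedRates` are hypothetical and chosen only for illustration.

```python
from typing import Protocol


class RateSource(Protocol):
    """Hypothetical contract: return units of `currency` per one USD."""

    def rate(self, currency: str) -> float: ...


def convert_from_usd(amount: float, currency: str, source: RateSource) -> float:
    # Depends only on the contract above, not on how rates are obtained.
    return amount * source.rate(currency)


class FixedRates:
    """One possible implementation; callers never need to read it."""

    def __init__(self, table: dict):
        self._table = table

    def rate(self, currency: str) -> float:
        return self._table[currency]
```

A reader, human or agent, working on `convert_from_usd` only needs the few lines of `RateSource` in context. Whether rates come from a table, a cache, or a remote service stays behind the boundary.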

File structure is a secondary heuristic. Tools like ctags and symbol search help agents navigate (well-configured agents, at least), and splitting code along natural context boundaries makes loading easier. But the underlying principle is API design. Files are how you deliver context to the agent’s toolchain; module boundaries are where context actually lives.

People still argue about naming things, but the naming problem worth caring about isn’t inconsistency. Inconsistency is noise — recoverable. Semantic overloading is signal corruption. When “node” means graph node in one module, cluster node in another, and DOM node in a third, no tool resolves the ambiguity because it isn’t a navigation problem. It’s a domain boundary that was never drawn. The agent hallucinates because the codebase itself is ambiguous.
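Drawing that boundary can be as simple as giving each meaning its own type. A minimal sketch, assuming two of the three senses above; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphNode:
    """A vertex in an in-memory graph."""
    vertex_id: int
    neighbors: tuple = ()


@dataclass(frozen=True)
class ClusterNode:
    """A machine participating in a cluster."""
    hostname: str
    healthy: bool


def healthy_hosts(nodes: list[ClusterNode]) -> list[str]:
    # The signature says which kind of "node" is meant; nothing has to be guessed.
    return [n.hostname for n in nodes if n.healthy]
```

With the names split, a search for `ClusterNode` returns only cluster code, and a type checker can flag a `GraphNode` passed where a machine was expected.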

The Substrate Is Always Someone’s Problem

There’s a persistent fiction in software teams: “infrastructure” is what other people manage. Developers ship features; ops/platform/SRE manages the substrate. The fiction works well enough when teams are large enough to specialize — the silos are invisible when handoffs are smooth.

But it was always a fiction. Infrastructure is someone’s code. Developers understood this intuitively when it came to upstream dependencies — they filed bugs, contributed patches, migrated to better libraries when the substrate was failing them. They just drew the “my problem” boundary at their own codebase and refused to extend it inward.

I’ve spent years asking developers why they’re doing something manually instead of automating it. The answer is either “that’s ops” or a blank stare. The blank stare is more honest. It isn’t laziness. It’s a failure to recognize that the local environment is substrate, and that substrate is the developer’s problem.

This failure compounds. Teams that don’t improve their own substrate accumulate drag: no dev tooling, no scripted migrations, no reproducible environments, no test harness for the hard cases. Each of these is a friction point that slows down every future change. For humans, that friction is partly absorbed by experience — the developer who’s been there two years knows the manual workaround. For agents, there is no two-year developer. Every run starts cold (and no, AGENTS.md and skills are not a solution).

The organizations that avoided this trap weren’t staffed with more virtuous engineers. They had feedback loops: mechanisms that converted friction encountered during work into investment in the thing causing the friction. AWS didn’t outcompete on infrastructure by having brilliant people. It outcompeted by having a system where operational pain became platform improvement, continuously, at scale. The teams that only ever fixed the immediate problem — without improving the substrate — moved slower every year.

The idea that a workaround piles up garbage instead of fixing the actual problem never caught on, because the cost of fixing (technical, organizational, and often personal) was high. It was easier to just shovel the garbage.

AI changes the cost structure of building that feedback loop. Implementation is nearly free now. Writing tooling, automating dev tasks, building a migration script, improving CI — these used to require staffing, learning obscure build system APIs, justifying the sprint time. An agent can do most of the implementation. What remains expensive is diagnosis: identifying what to fix, understanding why it’s failing, knowing what the right solution looks like. That’s still human work, because it requires the first-principles reasoning agents lack.

The result is more experiments per unit of human attention. The feedback loop can spin faster, if you have one.

The Culmination of a Split

The commoditization of web development in the early 2010s split the industry in two: engineers who understand systems, and application developers who use commodity tools to build products. When you can build a working web application without understanding what happens below the framework, most people don’t bother.

AI doesn’t change that split. It reveals which side you’re on.

Fred Brooks identified “tool-builders” as a core part of software teams in 1975. Not a nice-to-have but a necessary role, because compounding capability requires someone whose job is the process itself, not just the output. Brooks extended this further in The Design of Design, where iterative improvement of the engineering process is the meta-skill. The same insight runs through Deming and Taiichi Ohno’s Toyota Production System: you don’t just build cars, you build the factory that builds cars, and you continuously improve the factory. The product is the output; the system is the work.

We told developers for years: “automate yourself out of a job.” Nobody understood what that meant. It wasn’t advice to use tools. It was a description of the engineering mindset: you build the machines, not just the product. You build the factory, not just what the factory produces. The developer who scripts their dev environment, builds observability and CI tooling, fixes libraries instead of working around the bugs and deficiencies — that developer was always doing something different from the one who treats those as someone else’s problem.

Agentic development makes the loop non-optional. The agent surfaces friction continuously — every blocked run, every hallucinated path, every wrong assumption is a signal that something in the substrate is failing. The question is whether the developer has a mechanism to act on those signals, or treats them as model limitations and moves on. The engineer stops the line and fixes the machine. The app developer files a ticket and works around it.

The goal was never the code. It was the working system, the solved problem, the delivered outcome. For most of software’s history, the gap between “write code” and “pursue goal” was bridged by human cognition — judgment, intuition, the ability to infer intent from behavior. That cognition was invisible because it was always present.

Now part of the execution is done by agents that come without that context, every time. The substrate has to carry intent that humans used to carry implicitly. And carrying that intent explicitly — in APIs, module boundaries, legible goals, documented decisions — is not a new skill. It’s what engineering was always for.

AI didn’t change what good engineering is. It stopped subsidizing the cost of ignoring it.

Published June 10, 2026

Disclaimer: This post was created by a human with the assistance of an LLM.

software-engineering ai agentic-development systems-thinking