AI Agents Don’t Modernize Legacy Code on Their Own

AI Agents promise faster and easier modernization of legacy systems. In this interview, Markus Harrer explains why sustainable software modernization still requires clear guardrails, architectural expertise, and proven techniques such as characterization testing and seams.

Many companies currently hope for “push-button software modernization” through AI Agents. What already works well today, and where does the illusion begin?

A lot is already possible in clearly scoped and manageable domains using widely adopted programming languages. AI can generate unit tests, create explanatory documentation for small but complex code sections where it is actually needed, assist with refactoring, or extend functionality in well-structured codebases.

Personally, I am especially excited about how much easier it has become to create highly specialized analysis tools for older software systems. Where I previously had to write Python code myself, a coding agent powered by a large language model now does much of the work for me almost automatically. I am also a big fan of hybrid AI approaches where coding agents create refactoring scripts for me, which I then apply deterministically and safely to the codebase using code transformation tools. I also frequently use agents as sparring partners to explore alternative design options and overcome mental blocks or blind spots.

At the same time, there are many use cases that sound highly convincing in sales pitches but do not hold up in practice. One company advertises that its modernization platform has saved hundreds of thousands of development hours by using AI to generate code documentation for entire codebases. Using an open-source project on GitHub as an example, the company demonstrated how a documentation agent generated almost thirty PDF pages for a COBOL routine of roughly one hundred lines of code. Impressive at first glance, but only at first glance. I always ask myself who is supposed to validate all that output and keep it up to date later on. To me, these kinds of AI applications fall into the category Peter Drucker once described as: “There is nothing so useless as doing efficiently that which should not be done at all.”

You say that an AI Agent can already restructure a legacy codebase overnight. Why is that still not enough to modernize a system successfully?

Technically, we are now able to orchestrate multiple agents in parallel to continuously and automatically apply changes to a codebase, provided there is sufficient budget. For example, the company Cursor let agents run for almost a week to rebuild a browser engine from scratch. The numbers sounded impressive: trillions of consumed tokens, millions of generated lines of code, thousands of created files. Sounds fantastic. But much of the result was either copied from existing implementations or simply faked by the agents.

If I were a company operating business-critical systems, that would make me very uncomfortable. Most enterprise code is proprietary, often implemented in highly unconventional ways, and full of implicit domain knowledge that is documented nowhere. This makes it extremely difficult for agents to produce meaningful results because that information is not part of the models’ training data. An agent should actually struggle in such situations, but it does not notice and simply keeps generating text based on probabilities.

What I find even more problematic is the lack of perspective beyond the source code itself. Anyone modernizing purely at the code level quickly ends up with what I call “modern legacy.” By that I mean code that has been technically polished by an agent while the implemented business processes behind it have long become outdated. Focusing solely on the code misses the real business leverage. True modernization must also question the processes implemented by the system.

Before using AI for modernization, I keep Scott Hanselman’s old saying in mind: “The less you do, the more of it you can do.” Everything I do not need to modernize with AI in the first place makes modernization dramatically faster right from the start.

In your session, you revisit classic legacy modernization techniques. Why are methods such as characterization testing or seams becoming especially relevant again in the age of AI Agents?

Characterization testing, also known as golden master testing, helps us establish at least a basic safety net in previously untested codebases before agents start making changes. The principle is a classic in software modernization: you take representative input data, let the current system process it, and record the outputs. After refactoring or migration, the modernized system processes the same input data, and its outputs are compared with the recorded ones. Any difference signals a change in behavior.

Why is this relevant again in AI-driven modernization projects? Because it allows me to generate reliable and machine-readable feedback that tells agents whether they preserved the system’s behavior or not. That feedback can then be fed directly back into the agent workflow to enable autonomous bug detection and correction.
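A minimal Python sketch of such a characterization test with machine-readable feedback. It assumes the legacy behavior can be called as a plain function; `legacy_price`, `refactored_price`, and the report format are hypothetical and serve only as illustration.

```python
import json
from typing import Any, Callable, Iterable


def legacy_price(quantity: int, unit_price: float) -> float:
    """Hypothetical stand-in for the legacy routine whose behavior is pinned."""
    total = quantity * unit_price
    if quantity >= 10:
        total *= 0.9  # undocumented bulk discount, only visible in recorded outputs
    return round(total, 2)


def refactored_price(quantity: int, unit_price: float) -> float:
    """Hypothetical agent-produced rewrite that silently lost the discount."""
    return round(quantity * unit_price, 2)


def record_golden_master(system: Callable[..., Any], inputs: Iterable[tuple]) -> list[dict]:
    """Run the current system on representative inputs and record its outputs."""
    return [{"input": list(args), "output": system(*args)} for args in inputs]


def check_against_golden_master(system: Callable[..., Any], golden: list[dict]) -> dict:
    """Replay the recorded inputs and return a machine-readable report."""
    mismatches = []
    for case in golden:
        actual = system(*case["input"])
        if actual != case["output"]:
            mismatches.append(
                {"input": case["input"], "expected": case["output"], "actual": actual}
            )
    return {"passed": not mismatches, "cases": len(golden), "mismatches": mismatches}


# Record once against the legacy system, then replay against the rewrite.
golden = record_golden_master(legacy_price, [(1, 9.99), (10, 5.0), (3, 2.5)])
report = check_against_golden_master(refactored_price, golden)
print(json.dumps(report, indent=2))  # JSON that could be handed back to an agent
```

In this sketch the report flags exactly one mismatch, the bulk order, which is the kind of concrete, structured signal an agent loop can act on without a human reading logs.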

But we also need more precise intervention points directly in the code. This is where seams become important. Seams are places where the behavior of a system can be changed without modifying the code at that exact location. Existing interfaces or abstract classes are typical examples. The key advantage for agents is that when I give an agent a clearly defined seam, it no longer needs to understand the entire system. It knows the entry point and the surrounding code area. In effect, we place the agent inside its own small sandbox. Unwanted changes across the entire codebase become far less likely. That is the crucial difference between an agent wandering freely through the entire system and one operating within clearly defined boundaries.
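A seam based on an interface can be sketched in Python as follows. The tax example and all class and function names are hypothetical. The point is only that `invoice_total` changes its behavior through the object passed in, not through edits to its own body.

```python
from typing import Protocol


class TaxCalculator(Protocol):
    """The seam: callers depend on this interface, not on an implementation."""

    def tax_for(self, net_amount: float) -> float: ...


class LegacyTaxCalculator:
    """Hypothetical legacy implementation; stays untouched."""

    def tax_for(self, net_amount: float) -> float:
        return round(net_amount * 0.19, 2)


class ModernTaxCalculator:
    """Hypothetical replacement that an agent could work on in isolation."""

    RATE = 0.19

    def tax_for(self, net_amount: float) -> float:
        return round(net_amount * self.RATE, 2)


def invoice_total(net_amount: float, calculator: TaxCalculator) -> float:
    """Existing caller: behavior is swapped via the argument, not by editing this code."""
    return round(net_amount + calculator.tax_for(net_amount), 2)


# Both implementations plug into the same seam and can be compared directly.
legacy_total = invoice_total(100.0, LegacyTaxCalculator())
modern_total = invoice_total(100.0, ModernTaxCalculator())
```

An agent that is pointed at `TaxCalculator` and told to produce a new implementation only needs the interface and its immediate surroundings, which is the sandbox effect described above.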

You talk about the “Agent Blast Radius.” What risks arise when AI Agents are unleashed on existing systems without clear guardrails?

Without proper guidance and supporting tools, coding agents rely on simple text-based search mechanisms such as grep or glob. They modify code purely based on text patterns, not on semantic understanding. In large codebases with many dependencies, this quickly leads to situations where related implementations, overrides, or references are overlooked and quietly break.
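The difference between text patterns and structure can be illustrated with a small Python sketch that uses the standard `ast` module. The embedded source snippet and both helper functions are hypothetical. The text search stands in for a grep-like lookup, and the hierarchy search for a very simple form of semantic analysis.

```python
import ast
import re

SOURCE = '''
class Invoice:
    def total(self):
        return 100

class CreditNote(Invoice):
    def total(self):  # override that must change together with the base method
        return -100

class Cart:
    def total(self):  # unrelated method that only shares the name
        return 0
'''


def text_hits(source: str, method: str) -> list[int]:
    """Line numbers where a grep-like pattern matches a definition of `method`."""
    pattern = re.compile(rf"\bdef {re.escape(method)}\b")
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]


def hierarchy_hits(source: str, root: str, method: str) -> list[str]:
    """Classes that are `root` or derive from it and define `method` themselves."""
    classes = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.ClassDef)]
    family = {root}
    changed = True
    while changed:  # repeat until no further (transitive) subclasses are found
        changed = False
        for cls in classes:
            bases = {base.id for base in cls.bases if isinstance(base, ast.Name)}
            if cls.name not in family and bases & family:
                family.add(cls.name)
                changed = True
    return sorted(
        cls.name
        for cls in classes
        if cls.name in family
        and any(isinstance(item, ast.FunctionDef) and item.name == method for item in cls.body)
    )


print(text_hits(SOURCE, "total"))                  # every textual match, related or not
print(hierarchy_hits(SOURCE, "Invoice", "total"))  # only the Invoice class hierarchy
```

The text search reports three matches and cannot tell the unrelated `Cart.total` from the override in `CreditNote`, whereas the structural search returns only the two classes that actually belong together.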

This is why orientation aids are needed on multiple levels and at different points in time. I distinguish between a phase before the language model is used and a phase after the agent has produced its result.

In the preparation phase, the focus is on clearly formulating the modernization intent through context files such as CLAUDE.md or AGENTS.md and making the codebase more AI-friendly. Techniques such as domain-driven refactoring or introducing a shared pattern language in the code help make the structure and semantics of the source code easier to navigate and modify more precisely. This has very practical benefits. If a class is called “CustomerScreenController” and resides in a module named “customer-management.partner-data” instead of simply being named “ACS002,” the language model immediately gains more orientation, understands the business context better, and can work far more precisely instead of randomly digging through the system.

In the post-generation phase, there are several ways to keep agent-based modernization on track. The now classic approach uses feedback mechanisms such as compiler feedback, linter rules, and fitness functions. These already cover a great deal and can be incrementally refined based on what agent critics or human reviewers still discover afterward.
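A fitness function of this kind can be as small as an automated layering check. The following Python sketch assumes a hypothetical rule that code in a `domain` layer must not import from `ui` or `database`. The rule table and function names are illustrative only.

```python
import ast

# Hypothetical layering rule: which top-level modules a layer must not import.
FORBIDDEN = {"domain": {"ui", "database"}}


def imported_top_level_modules(source: str) -> set[str]:
    """Collect the top-level names of all absolutely imported modules."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def layering_violations(layer: str, source: str) -> list[str]:
    """Return the forbidden modules that code in `layer` imports, sorted by name."""
    return sorted(imported_top_level_modules(source) & FORBIDDEN.get(layer, set()))


# Hypothetical agent-generated change that sneaks in forbidden dependencies.
generated = "import json\nimport ui.widgets\nfrom database import session\n"
violations = layering_violations("domain", generated)
print(violations)
```

Run after every agent change, a check like this turns an architectural rule into the same kind of automatic feedback that a compiler or linter provides.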

An alternative is to avoid ending up in a situation where everything has to be validated manually in the first place. For that, I rely on two approaches: guided AI, where the language model only changes what was previously identified deterministically by an analysis script, and code transformation recipes generated by AI. Depending on the problem, this gives me the best of both worlds: nondeterministic intelligence where it helps and deterministic reliability where it matters most. Agents provide speed, guardrails provide accuracy.
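The guided approach can be sketched in Python as a deterministic scope check. An analysis step identifies the lines that may change, and any proposed rewrite that touches other lines is rejected. The deprecated function `old_log`, the sample snippets, and the helper names are all hypothetical. The rewrite itself would come from a language model and is represented here by plain strings.

```python
import ast
import difflib


def find_targets(source: str, deprecated: str) -> set[int]:
    """Deterministic analysis: line numbers of direct calls to `deprecated`."""
    return {
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == deprecated
    }


def out_of_scope_edits(before: str, after: str, allowed: set[int]) -> list[str]:
    """Describe edits that touch lines outside `allowed`; pure insertions are rejected."""
    matcher = difflib.SequenceMatcher(None, before.splitlines(), after.splitlines())
    violations = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        touched = set(range(i1 + 1, i2 + 1))  # 1-based line numbers in `before`
        if tag == "insert" or not touched <= allowed:
            violations.append(f"{tag} near original line {i1 + 1}")
    return violations


BEFORE = "old_log('start')\nvalue = compute()\nold_log('done')\n"
IN_SCOPE = "logger.info('start')\nvalue = compute()\nlogger.info('done')\n"
TOO_BROAD = "logger.info('start')\nvalue = compute_fast()\nlogger.info('done')\n"

targets = find_targets(BEFORE, "old_log")
print(out_of_scope_edits(BEFORE, IN_SCOPE, targets))   # empty: only target lines changed
print(out_of_scope_edits(BEFORE, TOO_BROAD, targets))  # flags the change to compute()
```

The design choice is that the nondeterministic part only proposes text, while the decision about which lines are eligible and whether a proposal is accepted stays with deterministic code.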

Is there a risk that companies using AI Agents will change code faster while understanding less and less about what is actually happening inside the system?

Absolutely. Margaret-Anne Storey coined a very fitting term for this: cognitive debt. It describes the gradual erosion of a team’s shared understanding of its own system. Teams start thinking, “Well, it somehow works.” But as soon as a bug needs to be fixed, nobody can really explain anymore why the system was designed the way it was.

Closely related is another concept described by her: intent debt. Intent debt arises when goals, constraints, and decision rationales are never documented in the first place or when existing artifacts become outdated and nobody maintains them anymore. This is where my old fascination with epistemology from my computer science studies suddenly comes back to life: what can we actually know, and what always remains implicit?

AI Agents work exclusively with externalized knowledge. An agent only knows the artifacts it has access to, not the context or reasoning that led to those artifacts.

Put differently, generated code often lacks purpose and intent. Nobody remembers anymore why an agent introduced something into the codebase. The agent itself does not care either. It simply keeps generating, emotionless and extremely fast, without understanding the overall vision or involving other people in its reasoning. Cognitive debt and intent debt reinforce each other. If people no longer understand the goals, they cannot build good mental models. And without good mental models, they can no longer formulate clear goals.

That is exactly why I consider careless use of coding agents in existing systems to be very dangerous. Companies can seriously harm themselves very quickly. This is why my talk advocates responsible use of AI and complementary or alternative approaches to software modernization.

How will software modernization evolve in the coming years through agent-based approaches? Will AI Agents remain tools for architects, or become independent modernization partners?

AGI will arrive and solve the legacy modernization problem once and for all! No, of course not. Many people already know my views on this. Even AGI will not magically take over our jobs because by then the agents would probably be smart enough not to touch our legacy code at all.

But seriously, I think it is important to stay calm amid all the hype. Sure, I can generate the thousandth to-do list application in TypeScript within minutes using Claude Code. But expecting that this also means we can effortlessly modernize a thirty-year-old payroll system written in assembler on a mainframe is simply naive.

Instead, we need to remain grounded when using AI-powered modernization tools and carefully think about where nondeterministic approaches actually provide benefits compared to deterministic ones and how both can be combined effectively. Companies should also think beyond the codebase itself. Can parts of the system be replaced with standard software? Can outdated business capabilities be outsourced? Do we even need all the custom solutions for customers that no longer exist? These and many other questions should be asked first.

As modernization partners, though? In certain areas, I already use agents or even simple chats as modernization buddies for improving naming, performing ad hoc reverse engineering, or acting as constant sparring partners. And yes, I can absolutely imagine more autonomous modernization partners in the future. Right now, I am thinking about counselor agents that work a bit like Commander Deanna Troi on the Enterprise: evaluating situations, uncovering contradictions, and pointing out blind spots. But perhaps that is still a bit too much science fiction.

Ultimately, every development team needs to figure out what works best in its own situation. There is a reason why “It depends!” is the favorite phrase of software architects everywhere.

More on this topic at the Software Architecture Forum 2026. In his session, Markus Harrer will demonstrate how AI Agents can be used responsibly in legacy modernization projects and why proven techniques such as seams, characterization testing, and fitness functions are more important than ever.

About the Author Markus Harrer

Country: Germany

Markus Harrer has been working in software development for several years, primarily in conservative industries. As a Senior Consultant, he helps organizations develop and improve software in a sustainable and economically meaningful way. He is an active contributor to communities focused on software analytics, software architecture, software modernization, and Wardley Maps. Markus Harrer is also an accredited trainer for the iSAQB Foundation Level and the Advanced Level module IMPROVE.
