Using LLM agents for large-scale code migrations
How to run a framework upgrade or codebase-wide refactor with agents: discovery, per-file transforms, verification, isolation, and the review gates that keep it safe.
Large mechanical migrations (bumping a major framework version, replacing a deprecated API across hundreds of call sites, converting synchronous code to async) share an uncomfortable property: they’re too big to hand-edit and too nuanced for a dumb codemod. The pattern is recognisable enough for an LLM to understand but irregular enough that a regex will silently break something. That gap is exactly where agent-driven migration shines.
This is a playbook for running those migrations safely.
Start with discovery, not transforms
The worst mistake is sending an agent straight into editing files. Before touching a single line, build the work-list.
A discovery pass asks the agent to read, and only read, then produce a structured inventory: which files match the migration scope, what pattern each file follows, and a rough complexity score (straightforward rename, needs context, potentially risky). This pass is cheap and fast, and it gives you two things: an accurate estimate of the total surface area, and a triage that lets you tackle simple cases in bulk while flagging edge cases for manual review.
Run discovery as a dry run. Nothing is written. The agent outputs a JSON manifest of affected files, and you review it before any editing begins.
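A discovery pass of this kind might be sketched as below. This is only an illustration under stated assumptions: a plain substring count stands in for the agent's reading step, and the pattern `legacyFetch(`, the complexity heuristic, and the manifest field names are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path

# Hypothetical deprecated call that the migration targets.
DEPRECATED_CALL = "legacyFetch("


@dataclass
class ManifestEntry:
    path: str
    occurrences: int
    complexity: str  # "straightforward" | "needs_context" | "risky"


def classify(text: str, occurrences: int) -> str:
    """Toy heuristic standing in for the agent's judgement."""
    if "retry" in text or "transaction" in text:
        return "risky"
    if occurrences > 3:
        return "needs_context"
    return "straightforward"


def discover(root: Path, suffix: str = ".ts") -> list[ManifestEntry]:
    """Read-only pass: files are opened for reading and never written."""
    entries = []
    for file in sorted(root.rglob(f"*{suffix}")):
        text = file.read_text(encoding="utf-8")
        count = text.count(DEPRECATED_CALL)
        if count:
            rel = file.relative_to(root).as_posix()
            entries.append(ManifestEntry(rel, count, classify(text, count)))
    return entries


def to_manifest(entries: list[ManifestEntry]) -> str:
    """Serialise the inventory as JSON for review before any editing."""
    return json.dumps([asdict(e) for e in entries], indent=2)
```

Because `discover` returns data rather than acting on it, the manifest can be reviewed, trimmed, or re-sorted by complexity before a single transform is scheduled.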
Transform each unit in isolation
Once you have the work-list, process each file as an independent task. One agent call per file, one diff per file, no shared mutable state between tasks. This matters for two reasons.
First, errors stay contained. A botched transform on payments/invoice.ts cannot cascade into auth/session.ts. Second, isolated tasks are parallelisable: you can fan out across many files without worrying about race conditions, and you can retry a single file without re-running the whole migration.
Each transform should produce a diff that can be reviewed, accepted, or rejected independently. The agent does not commit. It proposes.
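The "propose, don't commit" step could look roughly like the following sketch. The `Proposal` type and the `rename_call` transform (a hypothetical rename of `legacyFetch` to `fetchV2`) are assumptions for illustration; in a real pipeline the transform would be the agent call.

```python
import difflib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Proposal:
    path: str
    diff: str  # unified diff text; empty if the transform changed nothing


def propose(path: str, original: str, transform: Callable[[str], str]) -> Proposal:
    """Run one transform on one file's text and return a diff. Nothing is written."""
    updated = transform(original)
    diff = "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            updated.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )
    return Proposal(path, diff)


# Stand-in for the agent call: a hypothetical mechanical rename.
def rename_call(source: str) -> str:
    return source.replace("legacyFetch(", "fetchV2(")
```

Each `Proposal` depends only on its own file's text, so proposals can be generated in any order, retried individually, and rejected without affecting the rest.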
- migration.discover: read-only inventory pass, outputs file manifest. Risk: Low
- migration.transform: single-file diff proposal, no writes yet. Risk: Low
- migration.verify: build + typecheck + tests on the proposed diff. Risk: Low
- migration.commit: accepts the diff and stages the file. Risk: Medium
- migration.bulk_commit: stages all verified files at once. Risk: High
- migration.revert: discards all staged changes. Risk: Critical
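One way to make these tiers enforceable is a small registry that the pipeline consults before running a tool. The tool names and tiers below come from the list above; the approval threshold and the treatment of unknown tools are assumptions, not part of the original list.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Tool names and tiers as listed above.
TOOL_RISK = {
    "migration.discover": Risk.LOW,
    "migration.transform": Risk.LOW,
    "migration.verify": Risk.LOW,
    "migration.commit": Risk.MEDIUM,
    "migration.bulk_commit": Risk.HIGH,
    "migration.revert": Risk.CRITICAL,
}

# Assumed policy: anything above this tier needs a human in the loop.
AUTO_APPROVE_MAX = Risk.LOW


def requires_approval(tool: str) -> bool:
    """Unknown tools are treated as critical rather than silently allowed."""
    return TOOL_RISK.get(tool, Risk.CRITICAL) > AUTO_APPROVE_MAX
```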
Verify before accepting
A proposed diff is not a merged diff. After each transform, run verification: compile the project, run the type-checker, execute the test suite for that module. If verification passes, the change is accepted. If it fails, the agent gets the error output and attempts a correction, or the file is flagged for human review.
This is the step most agent migration scripts skip, and it’s the one that matters most. A migration that produces green builds at every step is a migration you can trust. One that batches hundreds of transforms and runs tests at the end is a migration that produces a confusing, entangled failure.
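The verify-then-correct loop can be sketched as follows. The function names and the three-attempt limit are assumptions; `check` would typically wrap `run_check` around whatever build, type-check, and test commands the project already uses.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class VerifyResult:
    ok: bool
    output: str


def run_check(cmd: Sequence[str]) -> VerifyResult:
    """Run one verification command and capture its combined output."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return VerifyResult(proc.returncode == 0, proc.stdout + proc.stderr)


def verify_with_retries(
    attempt: Callable[[str], None],     # applies a (re)proposed change; receives the last error output
    check: Callable[[], VerifyResult],  # build + typecheck + tests for the module
    max_attempts: int = 3,
) -> str:
    """Return "accepted" on the first passing check, else "needs_human_review"."""
    feedback = ""
    for _ in range(max_attempts):
        attempt(feedback)
        result = check()
        if result.ok:
            return "accepted"
        feedback = result.output  # fed back to the agent on the next attempt
    return "needs_human_review"
```

Capping the attempts matters: a file the agent cannot fix in a few rounds is exactly the kind of edge case that belongs in front of a person.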
Isolate parallel work
If you’re running transforms in parallel (which you should be for large codebases), keep them in separate branches or worktrees. Git worktrees let you check out the same repository into multiple directories simultaneously, each with its own working tree. An agent working in worktree/migrate-auth cannot stomp on changes being made in worktree/migrate-payments.
Merge order then becomes a deliberate decision rather than a race condition. Finish and verify each branch independently, then integrate them one at a time, running the full test suite at each merge. The integration step surfaces any cross-file interactions that per-file verification missed.
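A rough sketch of both halves, creating one worktree per branch and then integrating one branch at a time, is shown below. The `worktree/<name>` layout follows the paths mentioned above; the base branch `main` and the `merge` and `full_suite_passes` callbacks are assumptions that a real pipeline would supply.

```python
from typing import Callable, Sequence


def worktree_add_cmd(name: str, base: str = "main") -> list[str]:
    """Command that creates branch `name` from `base` in its own directory."""
    return ["git", "worktree", "add", "-b", name, f"worktree/{name}", base]


def integrate(
    branches: Sequence[str],
    merge: Callable[[str], None],
    full_suite_passes: Callable[[], bool],
) -> list[str]:
    """Merge branches in the given order and stop at the first failing suite.

    Returns the branches that merged cleanly. The branch merged just before
    the failure is the one to investigate (and to undo, which is left to
    the caller).
    """
    merged = []
    for branch in branches:
        merge(branch)
        if not full_suite_passes():
            break
        merged.append(branch)
    return merged
```

Running the full suite after every merge means a failure points at exactly one branch, rather than at the sum of all of them.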
Human review gates for risky edits
Not every change in a migration is equivalent. Renaming an import is low-risk. Changing a function signature that crosses a public API boundary is not. Modifying retry logic, auth checks, or database transaction patterns warrants a human eye regardless of whether the build passes.
Critical: Classify files before the migration starts. Any file touching security controls, billing logic, or external I/O contracts should require explicit human approval before its diff is accepted, no matter how confident the agent is. Build that gate into the pipeline as a hard stop, not a soft warning.
The agent proposes; the human approves. For low-risk mechanical changes, approval can be a fast bulk review of a clean diff. For critical files, it should be a proper code review.
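A hard-stop gate might be sketched like this. The path patterns are hypothetical and would be replaced by a project's own classification; the point is that a critical file without sign-off raises an error instead of logging a warning.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical path patterns; a real project would maintain its own list.
CRITICAL_PATTERNS = ["auth/*", "payments/*", "billing/*"]


class ApprovalRequired(Exception):
    """Raised as a hard stop when a critical diff lacks human sign-off."""


def is_critical(path: str) -> bool:
    return any(fnmatchcase(path, pattern) for pattern in CRITICAL_PATTERNS)


def accept_diff(path: str, verified: bool, approved_by: Optional[str] = None) -> str:
    """Accept a diff only if it is verified and, for critical files, approved."""
    if not verified:
        raise ValueError(f"{path}: diff has not passed verification")
    if is_critical(path) and approved_by is None:
        raise ApprovalRequired(f"{path}: explicit human approval needed")
    return "accepted"
```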
Keep an audit log of every change
Every accepted diff should be logged: the file path, the nature of the transform, the verification result, and, if a human reviewed it, who approved it and when. This is not bureaucracy. When a regression surfaces three weeks after a migration, the log is how you answer “did the migration touch this code?” in under a minute.
The log also gives you a re-run target. If the migration was interrupted (a deployment freeze, a dependency updated mid-run), you can resume from the last verified checkpoint rather than starting over.
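An append-only JSON Lines file is one simple way to get both the audit trail and the resume point. The record fields and function names below are assumptions chosen to match the items listed above, not a prescribed schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, Optional


def log_change(log: Path, path: str, transform: str, verified: bool,
               approved_by: Optional[str] = None) -> None:
    """Append one JSON line per accepted diff."""
    record = {
        "path": path,
        "transform": transform,
        "verified": verified,
        "approved_by": approved_by,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def completed_paths(log: Path) -> set[str]:
    """Paths that already have a verified entry in the log."""
    done = set()
    if log.exists():
        for line in log.read_text(encoding="utf-8").splitlines():
            if line.strip():
                record = json.loads(line)
                if record["verified"]:
                    done.add(record["path"])
    return done


def remaining(manifest: Iterable[str], log: Path) -> list[str]:
    """Files still to process when resuming an interrupted run."""
    done = completed_paths(log)
    return [p for p in manifest if p not in done]
```

Answering "did the migration touch this code?" then reduces to searching the log for a path.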
The MCP-first parallel
This playbook mirrors the principles that MCP-first applies to agent systems generally. Discovery before execution. Dry-run before bulk writes. Verification before acceptance. Explicit risk tiers with hard gates for destructive actions. An audit trail of every change.
Those principles exist in the MCP-first manifest because agents executing at scale, whether against an API or a codebase, share the same failure modes: silent side effects, hard-to-reverse errors, and the compounding of small mistakes into large ones. A migration that follows this playbook is not slower than one that doesn’t. It’s the same work, ordered correctly. See the tools reference for more on how to structure the verification and review steps in your own agent pipeline.
An agent that proposes, verifies, and gates is safer than one that just writes.