April 15, 2026 at 10:30 PM · By 单一鸣 · 5 min read

How Codex shifts the boundaries of responsibility

Generating code is fast. The really expensive part is that it quietly changes the answer to “who is responsible for checking”.

When I first rolled Codex out across the team, the first benefit was that small changes finally got made.

In the past, everyone knew perfectly well that this code needed refactoring, this script needed tests, and this boundary needed logging. But it all stalled on the same reality: starting to write costs attention, and whatever gets written then has to be reviewed.

Codex lowers that “getting started” friction.

The problems started that same day.

As the friction drops, the boundaries of responsibility quietly move too.

On the surface you are buying a faster author, but what you actually get is more changes. As the volume of changes grows, the system needs clearer acceptance criteria and stronger control mechanisms.

Without them, the efficiency Codex provides comes back in another form: production incidents, review fatigue, and repeated rework on code that “looks right but is unstable”.

How “the reviewer is responsible” gradually became the default

In the first week after introducing Codex, everyone used it with restraint.

Write a small utility
Rename something in a block of code
Add an if check

Even when imperfect, changes like these carry limited risk.

Starting in the second week, the changes got bigger.

Some people started asking Codex to “tidy up the surrounding code while it was at it”, some asked it to “refactor this whole module too”, and some even replaced an entire section of complex logic with a generated version.

At this stage, the most common sentence is:

Look, the code is all written and the logic is correct.

The problem is that this sentence hides a substitution.

In the past, the author was the one mainly responsible for whether the logic was correct. Writing the code forced the author through a full convergence: why the change is needed, where its boundaries are, what happens if it fails, and how to roll it back.

Codex flattens that convergence step. The author becomes the “one who states requirements” and leaves the constraining to the reviewer.

So the responsibility boundary shifts from:

The author is responsible for converging the changes until they are ready for release

to:

The reviewer is responsible for filtering the changes until they cause no problems

This is not a moral issue; it is a matter of mechanism.

Code generation naturally leans toward “write a version first” rather than “think the boundaries through first”. Unless the process makes the boundaries explicit, this transfer of responsibility will happen.

The price usually shows up in three places first

1) Review becomes “archaeology”

In the past, review examined the author’s choices:

Implementing it with this approach
Splitting it into these functions
Adding this guard here

Now review examines the output:

Did the generated code miss any boundaries?
Did it introduce new side effects?
Did it quietly change the semantics of the old logic?

These are not the same kind of work.

The former judges the author’s reasoning; the latter hunts for traps in unfamiliar code. The latter is more expensive and depends more heavily on how well the reviewer knows the context.

You will soon see a pattern:

The code is merged
Something breaks in production
In the postmortem, no one can say why it was written that way

Because the “why” was never written down.

2) Testing is treated as “optional”

The biggest temptation of generated code is that it looks complete.

The functions are neatly decomposed, the naming is consistent, and it even comes with some unit tests.

The problem is that these unit tests often “verify the implementation” rather than “verify the requirements”.

Typical symptoms:

Coverage goes up
Incidents keep happening

What is missing is acceptance criteria, not tests in JUnit or pytest format.
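To make the difference concrete, here is a minimal sketch in plain Python with pytest-style test functions. The function `apply_discount` and its rules are hypothetical, invented only for illustration. The first test restates the implementation; the second is written from acceptance criteria: concrete inputs, expected outputs, and the failure behavior.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under test: price after a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents - (price_cents * percent) // 100


def test_mirrors_implementation():
    # Restates the formula from the implementation, so it keeps passing
    # even if that formula is not what the requirement asked for.
    price, percent = 1000, 15
    assert apply_discount(price, percent) == price - (price * percent) // 100


def test_acceptance_criteria():
    # Written from the requirement: concrete inputs and expected outputs...
    assert apply_discount(1000, 15) == 850
    assert apply_discount(1000, 0) == 1000
    assert apply_discount(1000, 100) == 0
    # ...and the expected behavior on invalid input.
    try:
        apply_discount(1000, 120)
    except ValueError:
        pass
    else:
        raise AssertionError("an out-of-range discount must be rejected")
```

Both tests raise coverage by the same amount, but only the second one can fail when the behavior drifts away from what was asked for.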

3) The rollback path is forgotten

Writing a change by hand naturally prompts the thought, “what if this doesn’t work?”

Generated changes easily create the illusion that this is a cleaner implementation and should be fine.

But the most painful situation in production is “the change is too big to back out.”

When a generated change spans multiple files and refactors several places at once, rollback is no longer a matter of reverting a commit; it means rebuilding the old semantics.

This is the most expensive bill that comes due after the responsibility boundary moves.

Restoring the boundaries of responsibility does not mean “disabling Codex”

When teams see these problems, the first reaction of many is to restrict its use:

No generating large blocks of code
No generated changes to core modules
Key logic must be written by hand

These rules help in the short term, but they quickly turn into a formality.

What actually works is moving the “author’s responsibility” forward, into the process of using Codex.

I eventually narrowed it down to four hard requirements. If any one of them is missing, the change does not get through:

The intent of the change must be stated clearly: one sentence describing the behavior being changed, not “refactor it”.
The acceptance criteria must be enforceable: what the inputs are, what the expected outputs are, and how the code behaves on failure.
The risk boundary must be listed: which callers this change affects, and what the worst case is.
The rollback plan must be explicit: which version to roll back to, how to handle the data, and where the fallback switch is.

Codex can help write implementations, but it cannot replace these four items.

More importantly, once these four items are written down, review gets lighter.

The reviewer no longer has to guess “what exactly are you trying to do” from the code, and only needs to judge “whether the stated intent and the implementation are consistent.”
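One way to make the four requirements hard to skip is to treat them as a small structured record that is checked before review starts. The sketch below is only an illustration under my own assumptions: the `ChangeDescription` class, its field names, and the list of vague intents are all hypothetical, not part of Codex or any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeDescription:
    """Hypothetical record the author fills in before requesting review."""
    intent: str = ""  # one sentence on the behavior being changed
    acceptance_criteria: list[str] = field(default_factory=list)
    risk_boundary: list[str] = field(default_factory=list)  # affected callers, worst case
    rollback_plan: str = ""


# Assumed examples of intents that describe no behavior at all.
VAGUE_INTENTS = {"refactor", "refactor it", "cleanup", "clean up"}


def missing_items(change: ChangeDescription) -> list[str]:
    """Return the names of the requirements that are still unmet."""
    missing = []
    intent = change.intent.strip().lower()
    if not intent or intent in VAGUE_INTENTS:
        missing.append("intent")
    if not change.acceptance_criteria:
        missing.append("acceptance_criteria")
    if not change.risk_boundary:
        missing.append("risk_boundary")
    if not change.rollback_plan.strip():
        missing.append("rollback_plan")
    return missing


change = ChangeDescription(
    intent="Reject orders whose total is negative instead of storing them",
    acceptance_criteria=["total=-1 is rejected", "total=0 is accepted"],
)
print(missing_items(change))  # ['risk_boundary', 'rollback_plan']
```

A check like this cannot judge whether the content is any good; it only guarantees that the reviewer never starts from a blank description.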

The most common misunderstandings

Misunderstanding 1: Treating Codex as “a newcomer who can write code”

Newcomers get stuck in places they don’t know, ask questions, and expose uncertainty.

Codex does not.

When it is unsure, it still gives you the answer that reads most smoothly. Without acceptance criteria, it buries that uncertainty in the code.

Misunderstanding 2: Using more prompting to make up for a missing engineering process

Prompts can help it match the style more closely, but they cannot replace a gatekeeping mechanism.

Stuffing the missing process into the prompt leaves you with an “organizational memory” that keeps getting longer but has no version control, no rollback strategy, and no failure drills.

Misunderstanding 3: Only counting “how much time is saved”

The most dangerous metric is how much development time is saved per requirement.

It drives everyone to generate at an ever larger scope.

I prefer to watch two metrics:

Rework rate of generated changes after they are merged
Share of incidents introduced by generated changes

Both are tied directly to the boundaries of responsibility.
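As a rough sketch of how these two metrics could be computed, assume each merged change is tagged with three flags. The `MergedChange` record and its fields are hypothetical; how a team actually labels a change as generated, reworked, or incident-causing is left open here.

```python
from dataclasses import dataclass


@dataclass
class MergedChange:
    """Hypothetical per-change record; the labeling process is assumed."""
    generated: bool        # mostly produced by Codex
    reworked: bool         # needed a follow-up fix or revert after merge
    caused_incident: bool  # linked to a production incident


def rework_rate(changes: list[MergedChange]) -> float:
    """Fraction of generated changes that needed rework after merge."""
    generated = [c for c in changes if c.generated]
    if not generated:
        return 0.0
    return sum(c.reworked for c in generated) / len(generated)


def incident_share(changes: list[MergedChange]) -> float:
    """Fraction of incident-causing changes that were generated."""
    incidents = [c for c in changes if c.caused_incident]
    if not incidents:
        return 0.0
    return sum(c.generated for c in incidents) / len(incidents)
```

Neither number says anything about speed, which is the point: they only move when generated changes slip past the author’s own checks.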

Where this applies

Not all teams need such heavy constraints.

If the system is small enough, releases are frequent enough, and rollbacks are simple enough, the risk Codex introduces gets absorbed.

But once any of the following holds, the responsibility boundaries must be made explicit:

Changes touch high-cost paths such as payments, risk control, and approvals
Release cycles are long and rollbacks are costly
Code ownership is ambiguous and review is already a bottleneck

In these scenarios, Codex’s “faster” makes you slower first.

Summary

Codex certainly makes writing code faster.

But what it really changes is the team’s default answer to “who is responsible for change?”

If intent, acceptance, boundaries, and rollbacks are not moved forward, the responsibility will naturally slide to the reviewer.

In the end you will find that you have not saved development time; you have spent the attention of the entire team. That bill is far more expensive than tokens.
