
How Codex Shifts the Boundaries of Responsibility

Generating code is fast. The truly expensive part is that it quietly changes the answer to “who is responsible for checking.”

When I first rolled Codex out across the team, the first benefit was that small changes finally started getting made.

It used to be common that everyone knew perfectly well that this code needed refactoring, this script needed tests, and this boundary needed logging. But all of those tasks were stuck on the same reality: starting to write costs attention, and whatever gets written then has to be reviewed.

Codex lowers this “getting started” friction.

The problems started that same day.

As the friction drops, the boundaries of responsibility quietly move with it.

On the surface you are buying faster authors. What you are actually buying is more changes, and when there are more changes, the system needs clearer acceptance criteria and stronger control mechanisms.

Without them, the efficiency Codex gives you is paid back in another form: production incidents, review fatigue, and round after round of rework on code that “looks right but is unstable.”

How things gradually became “the reviewer is responsible”

In the first week after we introduced Codex, everyone used it with restraint.

  • Write a small utility
  • Rename something in a block of code
  • Add an if check

Even when imperfect, changes like these carry limited risk.

Starting in the second week, the changes got bigger.

Some people started asking Codex to “tidy up the surrounding code while you’re at it,” some asked it to “refactor this module as well,” and some replaced an entire section of complex logic with a generated version.

At this stage, the sentence you hear most often is:

“Look, the code is all written and the logic is correct.”

The problem is that this sentence hides a substitution.

In the past, the author was the one mainly responsible for whether the logic was correct. Writing the code forced the author through a complete convergence: why the change is needed, where its boundaries are, what happens if it fails, and how to roll it back.

Codex flattens that convergence step in the middle. The author becomes “the one who states the request” and leaves the constraints to the reviewer.

So the boundary of responsibility moves from:

  • The author is responsible for converging the change until it is ready for release

to:

  • The reviewer is responsible for filtering the change until it causes no problems

This is not a moral problem. It is a problem of mechanism.

Code generation naturally leans toward “write a version first” rather than “think the boundaries through first.” Unless the process makes the boundaries explicit, this transfer of responsibility will happen.

The price is usually paid in three places first

1) Review becomes “archaeology”

In the past, review examined the author’s choices:

  • Why this approach was used
  • Why the code was split into these functions
  • Why this guard was added here

Now review examines the output:

  • Did this generated code miss any boundary cases?
  • Did it introduce any new side effects?
  • Did it quietly change the semantics of the old logic?

These are not the same kind of work.

The former judges the author’s reasoning; the latter hunts for traps in unfamiliar code. The latter is more expensive and depends more heavily on how well the reviewer knows the context.

You will soon see a pattern:

  • The code is merged
  • Something goes wrong in production
  • In the postmortem, no one can explain why it was written this way

Because the “why” was never written down.

2) Testing is treated as “optional”

The biggest temptation of generated code is that it looks complete.

The functions are neatly decomposed, the naming is consistent, and it even comes with a few unit tests.

The problem is that these unit tests often “verify the implementation” rather than “verify the requirement.”

The typical symptoms are:

  • Coverage goes up
  • Incidents keep happening

What is missing is acceptance criteria, not tests in JUnit or pytest format.
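A minimal sketch of the difference, in Python. The `apply_discount` function and its rule (10% off orders of 10,000 cents or more, negative totals rejected) are hypothetical and invented only for illustration; they stand in for whatever requirement the change is supposed to satisfy.

```python
def apply_discount(total_cents: int) -> int:
    """Hypothetical rule: orders of 10,000 cents or more get 10% off."""
    if total_cents < 0:
        raise ValueError("total must be non-negative")
    if total_cents >= 10_000:
        return total_cents * 90 // 100
    return total_cents


def test_mirrors_implementation():
    # Recomputes the answer with the same formula as the code under test.
    # It raises coverage, but it would still pass if the formula were wrong.
    total = 15_000
    assert apply_discount(total) == total * 90 // 100


def test_states_requirements():
    # Each assertion is an acceptance criterion written independently of the code.
    assert apply_discount(9_999) == 9_999    # just below the threshold: no discount
    assert apply_discount(10_000) == 9_000   # the threshold itself is included
    try:
        apply_discount(-1)                   # invalid input must fail loudly
    except ValueError:
        pass
    else:
        raise AssertionError("negative totals must be rejected")
```

Both tests pass today. Only the second one would catch a generated refactor that silently turned `>=` into `>`.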

3) The rollback path is forgotten

Writing a change by hand tends to bring up the thought “what if this doesn’t work?” on its own.

A generated change easily creates the illusion that this is a cleaner implementation and should be fine.

But the most painful situation in production is “the change is too big to undo.”

When a generated change spans multiple files and refactors several places at once, rolling back is no longer a matter of reverting a commit. It means rebuilding the old semantics.

This is the most expensive bill that arrives after the boundaries of responsibility have moved.
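One way to keep the rollback path cheap is to leave the old behavior in place behind a switch until the generated version has proven itself. The following is only a sketch under assumptions: the function names and the `USE_GENERATED_TOTAL` environment variable are hypothetical, and a real system might use a configuration service or feature-flag tool instead.

```python
import os


def legacy_total(items: list[int]) -> int:
    """Old behavior, kept untouched so its semantics never need rebuilding."""
    return sum(items)


def generated_total(items: list[int]) -> int:
    """New (generated) behavior: hypothetically ignores negative line items."""
    return sum(i for i in items if i >= 0)


def order_total(items: list[int]) -> int:
    # USE_GENERATED_TOTAL is a hypothetical switch read on every call.
    # Any value other than "1" falls back to the old semantics.
    if os.environ.get("USE_GENERATED_TOTAL", "0") == "1":
        return generated_total(items)
    return legacy_total(items)
```

With this shape, rolling back means flipping the switch rather than reconstructing what the old code used to do. The cost is that both paths have to be maintained until the old one is deliberately removed.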

Pulling the boundaries of responsibility back does not mean “banning Codex”

When teams see these problems, their first reaction is often to restrict its use:

  • No generating large blocks of code
  • No modifying core modules
  • Key logic must be written by hand

These rules help in the short term, but they quickly turn into box-ticking.

What actually works is moving the author’s share of the responsibility forward, into the way Codex is used.

I eventually narrowed it down to four hard requirements. If any one of them is missing, I do not let the change through:

  1. The intent of the change must be written clearly: describe the behavior to be changed in one sentence, not “refactor it.”
  2. The acceptance criteria must be executable: what the inputs are, what the expected outputs are, and how the code should behave on failure.
  3. The risk boundary must be listed: which callers the change affects, and what the worst case is.
  4. The rollback plan must be clear: which version to roll back to, how to handle the data, and where the downgrade switch is.
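The four items can be enforced mechanically before a change reaches a reviewer. Here is a minimal sketch in Python; the `ChangeRequest` record and its field names are hypothetical, and in practice the same check could live in a pull request template or a CI step.

```python
from dataclasses import dataclass, fields


@dataclass
class ChangeRequest:
    """Hypothetical record the author fills in before asking for review."""
    intent: str = ""               # one sentence: the behavior being changed
    acceptance_criteria: str = ""  # inputs, expected outputs, failure behavior
    risk_boundary: str = ""        # affected callers, worst case
    rollback_plan: str = ""        # target version, data handling, downgrade switch


def missing_items(change: ChangeRequest) -> list[str]:
    """Return the names of the required items that are still blank."""
    return [f.name for f in fields(change) if not getattr(change, f.name).strip()]


def ready_for_review(change: ChangeRequest) -> bool:
    """A change is ready only when none of the four items is missing."""
    return not missing_items(change)


draft = ChangeRequest(intent="Reject orders with a negative quantity.")
print(missing_items(draft))
# prints ['acceptance_criteria', 'risk_boundary', 'rollback_plan']
```

A check like this only verifies that the items exist. Whether the intent is specific enough, or the rollback plan is realistic, still takes human judgment.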

Codex can help write the implementation, but it cannot replace these four items.

More importantly, once these four items are written down, review gets lighter.

The reviewer no longer has to guess “what exactly were you trying to do” from the code, and only needs to judge “does the implementation match the stated intent.”

The most common misunderstandings

Misunderstanding 1: Treating Codex as “a newcomer who can write code”

Newcomers get stuck in places they don’t know, ask questions, and expose uncertainty.

Codex does not.

When it is unsure, it gives you the answer that looks most plausible. Without acceptance criteria, it buries the uncertainty in the code.

Misunderstanding 2: Using longer prompts to make up for a missing engineering process

Prompts can help it match your style more closely, but they cannot replace a gatekeeping mechanism.

Stuffing the missing process into the prompt leaves you with an “organizational memory” that grows longer and longer but has no version control, no rollback strategy, and no failure drills.

Misunderstanding 3: Only counting “how much time is saved”

The most dangerous metric is how much development time is saved per requirement.

It pushes everyone to generate ever larger scopes of change.

I prefer to watch two indicators:

  • Rework rate of generated changes after they are merged
  • Share of incidents introduced by generated changes

Both track the boundary of responsibility directly.
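Both indicators are simple ratios once changes and incidents are labeled. The sketch below assumes that each merged change is tagged as generated or not and that each incident is traced back to its causing change; how that labeling is done is an assumption of this example, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MergedChange:
    generated: bool  # was the change mostly produced by Codex?
    reworked: bool   # did it need a follow-up fix or a revert after merge?


def rework_rate(changes: list[MergedChange]) -> float:
    """Share of merged generated changes that later needed rework."""
    generated = [c for c in changes if c.generated]
    if not generated:
        return 0.0
    return sum(c.reworked for c in generated) / len(generated)


def generated_incident_share(caused_by_generated: list[bool]) -> float:
    """Share of incidents traced to a generated change.

    caused_by_generated[i] is True when incident i was introduced
    by a generated change.
    """
    if not caused_by_generated:
        return 0.0
    return sum(caused_by_generated) / len(caused_by_generated)
```

For example, if two of three merged changes were generated and one of those two needed a follow-up fix, `rework_rate` returns 0.5.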

Where this applies

Not all teams need such heavy constraints.

If the system is small enough, releases are frequent enough, and rollbacks are simple enough, the risk Codex introduces gets absorbed.

But once any of the following holds, the boundaries of responsibility must be made explicit:

  • Changes touch high-cost paths such as payment, risk control, or approval
  • Release cycles are long and rollbacks are expensive
  • Code ownership is ambiguous and review is already a bottleneck

In these scenarios, Codex’s “faster” will first make you slower.

Summary

Codex certainly makes writing code faster.

But what it really changes is the team’s default answer to “who is responsible for a change?”

If intent, acceptance criteria, boundaries, and rollback are not moved forward, the responsibility will naturally slide onto the reviewer.

In the end you will find that you have not saved development time. You have spent the attention of the entire team, and that bill is far more expensive than tokens.
