How to use Codex and its boundaries in real projects

Think of it as a piece of the change pipeline, not a faster author

When people ask “how to use Codex”, they usually mean keyboard shortcuts, prompt templates, and how to get it to write more code.

I later realized that the question points in the wrong direction.

Codex’s biggest value is that it lets a team initiate changes more often and more cheaply. That is also where the real risk lies: once changes become more frequent, the team’s existing gatekeeping may no longer be enough.

So this article does not cover “ten prompting tricks”. It follows a single thread: wiring Codex into a change pipeline in which every change can be accepted, rolled back, and traced.

1. First pick a “unit of work that suits Codex”

The usage most likely to go wrong is handing Codex a vague requirement directly:

  • “Refactor this module”
  • “Optimize performance”
  • “Make this place more elegant”

It will certainly produce a pile of changes, and they will all look plausible. The problem is that there is no way to accept or review them.

I group the tasks that suit Codex into three categories, each with a clear “completion condition”.

A. Behavior changes that can be written as assertions

The defining trait: the input and output can be described in one sentence.

  • Add boundary handling to a function
  • Fix a well-defined bug
  • Migrate a piece of logic to a new interface while keeping its semantics unchanged

The best form of acceptance is a test written directly, or at least a runnable assertion script.
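As a minimal sketch of such an assertion script, consider a hypothetical task: “add boundary handling to a page-size function”. The function name `clamp_page_size` and its limits are assumptions made up for illustration. The point is that both the new behavior and the behavior that must stay unchanged are written as assertions.

```python
def clamp_page_size(requested: int, default: int = 20, maximum: int = 100) -> int:
    """Hypothetical function under change: keep the page size within [1, maximum]."""
    if requested <= 0:
        return default  # new boundary handling: non-positive input falls back
    return min(requested, maximum)  # new boundary handling: cap at maximum


def run_acceptance() -> None:
    # Behavior this change introduces.
    assert clamp_page_size(0) == 20
    assert clamp_page_size(-5) == 20
    assert clamp_page_size(1000) == 100
    # Behavior that must not change: in-range input passes through untouched.
    assert clamp_page_size(1) == 1
    assert clamp_page_size(50) == 50
    assert clamp_page_size(100) == 100


if __name__ == "__main__":
    run_acceptance()
    print("acceptance passed")
```

A script like this can be handed to Codex together with the task, so that “done” means “the script exits cleanly”.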

B. Enumerable mechanical changes

The defining traits: clear rules and a controllable scope.

  • Batch renaming
  • Upgrading API calls
  • Filling in nullability / error handling

Codex is strong at this kind of work, but it must be required to output a “list of modifications” and a “searchable scope of changes”; otherwise it tends to miss corner cases.
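One way to make the scope searchable is to check it independently after the patch is applied. The sketch below assumes a rename task and scans a source tree for whole-word leftovers of the old name. `find_remaining` and the default `.py` suffix filter are illustrative choices, not part of any tool.

```python
import re
from pathlib import Path


def find_remaining(
    root: Path, old_name: str, suffixes: tuple[str, ...] = (".py",)
) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for each whole-word hit."""
    # \b keeps "fetch_user" from matching "fetch_user_v2".
    pattern = re.compile(rf"\b{re.escape(old_name)}\b")
    hits: list[tuple[str, int, str]] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits
```

An empty result after the rename is the completion condition. Any hit is either a missed corner or an intentional exception that belongs in the list of modifications.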

C. Local optimizations with hard metrics

The defining trait: you can define how the result is measured.

  • Remove one IO operation from the startup phase
  • Remove one serialization step from a given call path
  • Reduce unnecessary reflows on a given page

Don’t optimize without metrics. Codex will mistake “it looks faster” for “it really is faster”.
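A hard metric can be as simple as a timing comparison with an explicit target. The sketch below uses the standard library's `timeit`. The helper names and the 10% default threshold are assumptions for illustration, and real measurements should run on representative inputs.

```python
import timeit
from typing import Callable


def best_time(fn: Callable[[], object], repeat: int = 5, number: int = 100) -> float:
    """Best-of-`repeat` wall time in seconds for `number` calls of fn."""
    return min(timeit.repeat(fn, repeat=repeat, number=number))


def meets_target(baseline_s: float, candidate_s: float, min_gain: float = 0.10) -> bool:
    """True only if the candidate is at least `min_gain` (a fraction) faster."""
    return candidate_s <= baseline_s * (1.0 - min_gain)


if __name__ == "__main__":
    data = list(range(10_000))
    baseline = best_time(lambda: sorted(data, reverse=True))
    candidate = best_time(lambda: data[::-1])
    print(f"baseline={baseline:.6f}s candidate={candidate:.6f}s "
          f"accepted={meets_target(baseline, candidate)}")
```

Writing the threshold down before the change is what turns “looks faster” into an acceptance condition.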

2. Write the requirements as “acceptance criteria” first, then write the prompt

A common habit is to write prompts like wishes:

  • “Please write elegantly”
  • “Please follow best practices”

Neither can be checked at acceptance.

Before letting Codex act, I always write the acceptance criteria first and put them in the same task description:

  • Which behavior this change alters
  • Which behaviors must not change
  • How it should behave on failure
  • How to roll back

You will find that once the acceptance criteria are written clearly, the prompt itself gets shorter.

A structure I often use is:

  1. Background (two or three sentences, skip the history)
  2. Goal (only write behavior, not implementation)
  3. Constraints (what must not be touched)
  4. Acceptance (test points or scripts)
  5. Output format (give plan first, then patch)
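The five-part structure above can be kept as data so that no section is forgotten. This is a sketch under assumptions: `TaskSpec` and its field names are hypothetical, and the rendered text is simply what gets pasted into the task description.

```python
from dataclasses import dataclass


@dataclass
class TaskSpec:
    """Hypothetical container for the five-part task description."""
    background: str
    goal: list[str]
    constraints: list[str]
    acceptance: list[str]
    output_format: tuple[str, ...] = ("Change plan first", "Then the patch")

    def render(self) -> str:
        # Refuse to build a prompt that has nothing to accept against.
        if not self.acceptance:
            raise ValueError("write the acceptance criteria before prompting")
        sections = [
            ("Background", [self.background]),
            ("Goal", self.goal),
            ("Constraints", self.constraints),
            ("Acceptance", self.acceptance),
            ("Output format", list(self.output_format)),
        ]
        lines: list[str] = []
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"  - {item}" for item in items)
        return "\n".join(lines)
```

Failing fast on an empty acceptance list enforces the ordering this section argues for: criteria first, prompt second.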

3. Ask Codex for a “change plan” first, rather than the final code

The most expensive thing in real projects is “discovering that the semantics have changed after merging”.

So I made the default output of Codex a two-step process:

  • Step 1: List which files will change, why, and the risk of each change
  • Step 2: Only then produce the patch

If it opens with a whole block of code, I send it back and have it supply the plan first.

The reason is simple:

  • The planning stage reveals whether it understands the boundaries
  • Excessive changes can be cut off in time at the planning stage
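The plan also gives you something to check the patch against. The sketch below assumes the patch is a git-style unified diff and the plan is a list of file paths. It reports files the patch touches that the plan never mentioned. The header parsing is deliberately simple and could misread unusual content lines that begin with `--- ` or `+++ `.

```python
def files_in_patch(patch: str) -> set[str]:
    """Collect file paths from the ---/+++ headers of a git-style unified diff."""
    files: set[str] = set()
    for line in patch.splitlines():
        if not line.startswith(("--- ", "+++ ")):
            continue
        path = line[4:].split("\t")[0].strip()
        if path == "/dev/null":  # marks a created or deleted file
            continue
        if path.startswith(("a/", "b/")):  # strip git's a/ and b/ prefixes
            path = path[2:]
        files.add(path)
    return files


def unplanned_files(planned: list[str], patch: str) -> set[str]:
    """Files changed by the patch that the change plan did not list."""
    return files_in_patch(patch) - set(planned)
```

A non-empty result means the patch has drifted from the plan, which is a reason to stop and ask for a revised plan.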

4. Let Codex output a “reviewable patch” instead of just a bunch of text

Copying and pasting hundreds of lines of code into the IDE turns review into manual labor.

The right approach is to have Codex output a diff/patch, or to have it make only the smallest mergeable change.

I enforce two constraints in the team:

  • A single PR does not span two unrelated intents
  • The core diff of a single PR stays within 200 lines (otherwise it must be split)

Codex tends to write more and more, so PR constraints are needed to keep it “shut in a small box”.
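The 200-line rule is easy to check mechanically. The sketch below counts added and removed lines in a unified diff and compares them to a budget. It counts every changed line, so deciding what belongs to the “core” diff (for example, excluding generated files) is left as an assumption for each team to settle.

```python
def changed_lines(patch: str) -> int:
    """Count added and removed lines in a unified diff, ignoring file headers."""
    count = 0
    for line in patch.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file header lines, not content (a simplification)
        if line.startswith(("+", "-")):
            count += 1
    return count


def within_budget(patch: str, limit: int = 200) -> bool:
    """True if the patch stays within the per-PR line budget."""
    return changed_lines(patch) <= limit
```

A check like this can run before review, so an oversized patch is sent back for splitting instead of being read.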

5. Let the test be the “first output”, not the “last addition”

The most common pitfall with Codex is that the implementation is fully written and tests appear to exist, but those tests only validate the implementation it just wrote.

So I ask it to produce output in this order:

  1. List of test cases (which boundaries are covered)
  2. Key test code
  3. Implementation patch

If it cannot provide test points, it either means that the requirements are not clearly written, or it means that this change is not suitable for it to make.
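The ordering can be sketched as follows: the case list is written from the requirement, the checker accepts any candidate implementation, and the implementation comes last. The requirement here (parse a percentage string into a fraction) and all names are hypothetical.

```python
from typing import Callable

# Step 1: the case list, derived from the requirement before any code exists.
CASES: list[tuple[str, float]] = [
    ("0%", 0.0),       # lower boundary
    ("100%", 1.0),     # upper boundary
    ("50%", 0.5),
    (" 25% ", 0.25),   # surrounding whitespace is tolerated
]


# Step 2: the test code, independent of any particular implementation.
def failing_cases(impl: Callable[[str], float]) -> list[str]:
    failures: list[str] = []
    for raw, expected in CASES:
        try:
            actual = impl(raw)
        except Exception as exc:
            failures.append(f"{raw!r}: raised {type(exc).__name__}")
            continue
        if abs(actual - expected) > 1e-9:
            failures.append(f"{raw!r}: expected {expected}, got {actual}")
    return failures


# Step 3: the implementation, judged by the cases above.
def parse_percent(raw: str) -> float:
    return float(raw.strip().rstrip("%")) / 100.0
```

Because the cases come from the requirement and not from the code, an implementation that skips the whitespace boundary fails, where a test written after the fact might have left that case out.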

6. Write the “rollback path” into the change

People who write changes by hand often leave themselves a fallback path without thinking about it.

People who generate changes easily forget to.

I require every generated change to satisfy at least one rollback strategy:

  • Feature flag
  • Configuration switch
  • Keeping the old path for a while (dual run)

Major changes without a rollback path, no matter who wrote them, should not go live. Codex just makes it easier to make “big changes”.
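A minimal sketch of a flag combined with a dual run looks like this. The flag name `USE_NEW_TOTAL`, the environment-variable mechanism, and both `total_cents_*` functions are assumptions for illustration. A real project would use its own flag system.

```python
import logging
import os
from typing import Mapping, Sequence

log = logging.getLogger("rollout")


def total_cents_legacy(prices: Sequence[float]) -> int:
    """Old path: kept in place until the new path has proven itself."""
    return int(round(sum(prices) * 100))


def total_cents_new(prices: Sequence[float]) -> int:
    """New path: the generated change (rounds each item before summing)."""
    return sum(int(round(p * 100)) for p in prices)


def total_cents(prices: Sequence[float], env: Mapping[str, str] = os.environ) -> int:
    old = total_cents_legacy(prices)
    if env.get("USE_NEW_TOTAL", "0") != "1":
        return old  # rolling back means unsetting the flag, not reverting code
    new = total_cents_new(prices)
    if new != old:
        # Dual run: record disagreements while both paths still exist.
        log.warning("total mismatch: legacy=%d new=%d", old, new)
    return new
```

With this shape, the old path stays in the codebase for a while and rollback is a configuration change.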

7. Treat traceability as a “cost item” of using Codex

If you only count “how much development time is saved”, you will be pushed toward letting Codex write more and more.

What I care about more is whether a problem can be reproduced.

So the process makes recording three things mandatory:

  • The task description given to Codex (including acceptance criteria)
  • The change plan Codex produced
  • The final patch and test results

These three items form the chain of evidence in an incident review.

Without these records, the more you use Codex, the more it feels like running an unreproducible system.
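The three records can be stored as one small structured entry per change. The sketch below is one possible shape, not a prescribed format. `ChangeRecord` and its fields are hypothetical, and the patch hash is added so the stored record can later be matched against what was actually merged.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ChangeRecord:
    """Hypothetical audit entry for one Codex-assisted change."""
    task_description: str  # includes the acceptance criteria
    change_plan: str       # the plan Codex gave before patching
    patch: str             # the final diff
    tests_passed: bool     # outcome of the acceptance run

    def to_json(self) -> str:
        payload = asdict(self)
        # A digest makes it cheap to verify that the record matches the merge.
        payload["patch_sha256"] = hashlib.sha256(self.patch.encode("utf-8")).hexdigest()
        payload["recorded_at"] = datetime.now(timezone.utc).isoformat()
        return json.dumps(payload, ensure_ascii=False, indent=2, sort_keys=True)
```

Where the JSON is stored (PR description, commit trailer, or a separate log) is a team decision. What matters is that all three items are kept together.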

8. A prompt skeleton you can use directly

The skeleton below is aimed at getting it to output the smallest change that is fit to ship.

You can copy it directly and replace the square brackets:

Changes need to be made in a real project.

Background: [Two or three sentences describing the current situation and problems]

Goals (behavioral level):

  • Must do: [List 2-4 items]
  • Never change: [List 2-4 items]

Constraints:

  • Not allowed to introduce new dependencies / not allowed to change public API / must be compatible with old data (select as needed)

Acceptance:

  • Give a list of test points
  • Give at least [N] key test cases as code

Output format:

  1. Give a change plan first (file list + risk points)
  2. Give the smallest patch (diff), without long explanatory text

Applicable boundaries

The scenarios where Codex is not suitable are also clear:

  • You cannot even state the behavior you want and can only “try it and see”
  • The change is high-risk, touching data migration, permissions, or money, and there is no complete acceptance environment
  • The change depends on undocumented tacit knowledge (production switches, gradual rollout strategies, past incidents)

In these scenarios, Codex will solidify uncertainty directly into code.

Summary

The answer to “how to use Codex” is a set of engineering constraints.

Treating it as a faster author pushes the responsibility onto review and production.

Treat it as one section of the pipeline, contain changes with acceptance criteria, tests, rollback, and traceability, and it becomes a real efficiency tool.
