
Swift Concurrency Series 06|Common problems in Swift concurrency: race conditions, repeated requests and state confusion

The real trouble is that these problems usually surface as sporadic business glitches rather than as obvious failures.

The most frustrating thing about concurrency bugs is that they often don’t feel like bugs.

In production they more often show up as vague reports like these:

  • A user says “sometimes it flickers”
  • QA says “occasionally old data appears”
  • Product asks “I just switched the filter, why did it jump back?”
  • The logs show no clear crash, but the page state is simply wrong

In other words, many concurrency problems look more like “occasional business anomalies” than like “something obviously broken at the technical level”.

So in this article I don’t want to only define terms. Instead I’ll take a realistic list-page scenario and break down the three most common types of problems:

  • Race conditions
  • Duplicate requests
  • State confusion

And how they grow in real code.

1. First, look at a page that is about as realistic as it gets

Suppose there is an article list page that supports these operations:

  • Load automatically when the page first appears
  • Pull down to refresh
  • Switch categories
  • Search by keyword
  • Tap “Retry”

Many projects start out with something like this:

import Combine

@MainActor
final class ArticlesViewModel: ObservableObject {
  @Published var items: [Article] = []
  @Published var isLoading = false
  @Published var errorMessage: String?
  @Published var selectedCategory: String = "all"
  @Published var keyword: String = ""

  let repository: ArticlesRepository

  init(repository: ArticlesRepository) {
    self.repository = repository
  }

  func onAppear() {
    Task {
      await load()
    }
  }

  func refresh() {
    Task {
      await load()
    }
  }

  func retry() {
    Task {
      await load()
    }
  }

  func categoryChanged(to value: String) {
    selectedCategory = value
    Task {
      await load()
    }
  }

  func keywordChanged(to value: String) {
    keyword = value
    Task {
      await load()
    }
  }

  func load() async {
    isLoading = true
    errorMessage = nil

    do {
      items = try await repository.fetchArticles(
        category: selectedCategory,
        keyword: keyword
      )
    } catch {
      errorMessage = error.localizedDescription
    }

    isLoading = false
  }
}

When this code is first written, everyone usually thinks it reads “quite smoothly”:

  • It uses async/await
  • The code is straightforward
  • Every entry point works

But as soon as the page sees real use, concurrency problems show up quickly.

2. The first type of problem: a race condition is an assumed order that does not exist

Take the same code. Its core problem is not that it spawns many Tasks, but that it silently assumes things happen in the order you want:

  • The request sent first returns first
  • When an old request returns, the current filters have not changed
  • Every start of loading is matched by exactly one end

But an asynchronous system guarantees none of these orderings.

For example, suppose the user does the following:

  1. Enters the page, which sends request A
  2. Immediately switches to the “iOS” category, which sends request B
  3. Types the keyword swift, which sends request C

Now suppose the responses arrive in this order:

  1. C returns first
  2. A returns next
  3. B returns last

With the current code, all three results are written to items in turn. What the page finally shows depends on which response arrives last (here B, the category-only request), not on which one matches the user’s current intent (C).

This is the most typical race condition:

The code secretly relies on an ordering, but nothing enforces that ordering.
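To make the A/B/C scenario concrete, here is a small synchronous simulation. It is only a sketch: there is no real networking, and the names `Response` and `lastArrivalWins` are hypothetical, introduced just to model “every response overwrites items”.

```swift
// Hypothetical model of a finished request: the parameters it was sent with
// and the items it returned.
struct Response {
    let label: String
    let category: String
    let keyword: String
    let items: [String]
}

// The policy of the first ViewModel: every response overwrites `items`,
// so whichever response arrives last decides what the page shows.
func lastArrivalWins(_ arrivals: [Response]) -> [String] {
    var items: [String] = []
    for response in arrivals {
        items = response.items
    }
    return items
}

let a = Response(label: "A", category: "all", keyword: "", items: ["all articles"])
let b = Response(label: "B", category: "ios", keyword: "", items: ["iOS articles"])
let c = Response(label: "C", category: "ios", keyword: "swift", items: ["iOS + swift articles"])

// Sent in the order A, B, C, but completed in the order C, A, B.
let shown = lastArrivalWins([c, a, b])
print(shown) // ["iOS articles"]
```

The user’s latest intent is request C, yet the page ends up showing the result of B, simply because B happened to finish last.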

3. The second type of problem: duplicate requests usually come from entry points that were never consolidated

Looking at the ViewModel above, at least five entry points trigger load():

  • onAppear
  • refresh
  • retry
  • categoryChanged
  • keywordChanged

Each entry point starts its own Task. That is perfectly legal syntax, but from an engineering perspective it means:

  • Tasks of the same kind have no single scheduling point
  • Nobody knows whether a task of the same kind is already running
  • When a new task starts, the fate of the old one is undefined

At that point duplicate requests are no longer an accident but a natural product of the structure.

So when managing concurrency, I rarely ask:

“Why is there an extra request here?”

I more often ask:

“How many entry points does this kind of task have, and does a new one replace an old one?”

If you cannot answer both questions, duplicate requests are almost inevitable.
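One way to force an answer to both questions is to route every trigger through a single decision point. The sketch below is only an illustration under assumed rules: `LoadKey`, `Decision`, and `LoadScheduler` are hypothetical names, and the policy (skip an identical in-flight load, replace a different one) is one possible choice. A real page might prefer that refresh and retry always replace.

```swift
// Hypothetical key identifying "the same kind of load".
struct LoadKey: Equatable {
    let category: String
    let keyword: String
}

// What the single scheduling point decides for an incoming trigger.
enum Decision: Equatable {
    case start           // nothing is running, start a load
    case skipDuplicate   // an identical load is already in flight
    case replaceExisting // a different load is in flight, cancel it and start anew
}

struct LoadScheduler {
    private(set) var inFlight: LoadKey?

    // Every entry point (appear, refresh, retry, filter change) calls this.
    mutating func decide(for key: LoadKey) -> Decision {
        guard let current = inFlight else {
            inFlight = key
            return .start
        }
        if current == key { return .skipDuplicate }
        inFlight = key
        return .replaceExisting
    }

    // Called when a load finishes; only the current load may clear the slot.
    mutating func finish(_ key: LoadKey) {
        if inFlight == key { inFlight = nil }
    }
}

var scheduler = LoadScheduler()
let allArticles = LoadKey(category: "all", keyword: "")
let iosArticles = LoadKey(category: "ios", keyword: "")

let first = scheduler.decide(for: allArticles)   // .start
let second = scheduler.decide(for: allArticles)  // .skipDuplicate
let third = scheduler.decide(for: iosArticles)   // .replaceExisting
scheduler.finish(iosArticles)
let fourth = scheduler.decide(for: iosArticles)  // .start
```

The exact rules matter less than the fact that they now live in one place, where “is something similar already running?” has an answer.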

4. The third type of problem: state confusion usually means expired results are still allowed to write

A common assumption is that any request that returns successfully should have its result accepted.

That usually holds in synchronous code, but is often wrong in concurrent code.

The most important question in a concurrent scenario is:

**Is this result still valid for the current page?**

For example:

  • The page has already switched to keyword = "swift"
  • The result comes from the old request with keyword = ""

The result is real, successful, and correctly formatted, but it has expired. If it is still allowed to write to the UI, the state ends up wrong.

So in a concurrent system, “the result is correct” and “the result is valid” are two different things. Many page bugs look like wrong results on the surface, but the actual cause is that the code cannot tell whether a result is still entitled to be applied.
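The difference between “correct” and “valid” can be shown with a small synchronous sketch. The types `RequestContext` and `Response` and the function `applyValid` are assumptions made for illustration; the only rule they encode is that a response may write only if it was sent with the parameters the page shows right now.

```swift
// Parameters a request was sent with.
struct RequestContext: Equatable {
    let category: String
    let keyword: String
}

// A successful response together with the context it belongs to.
struct Response {
    let context: RequestContext
    let items: [String]
}

// Applies responses in arrival order, but only lets a response write `items`
// when its context matches what the page is currently showing.
func applyValid(_ arrivals: [Response], current: RequestContext) -> [String] {
    var items: [String] = []
    for response in arrivals where response.context == current {
        items = response.items
    }
    return items
}

let current = RequestContext(category: "ios", keyword: "swift")
let stale = Response(
    context: RequestContext(category: "ios", keyword: ""),
    items: ["iOS articles"]
)
let fresh = Response(context: current, items: ["iOS + swift articles"])

// The stale response arrives last, yet it is not allowed to overwrite the page.
let visible = applyValid([fresh, stale], current: current)
print(visible) // ["iOS + swift articles"]
```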

5. Don’t rush to reach for complex tools. The first step is to consolidate tasks of the same kind

What the code above needs most is one very simple thing:

**Give tasks of the same kind a single entry point.**

For example, start by restructuring list loading like this:

import Combine

@MainActor
final class ArticlesViewModel: ObservableObject {
  @Published private(set) var state: ViewState = .idle
  @Published private(set) var items: [Article] = []
  @Published var selectedCategory: String = "all"
  @Published var keyword: String = ""

  private let repository: ArticlesRepository
  private var loadTask: Task<Void, Never>?

  init(repository: ArticlesRepository) {
    self.repository = repository
  }

  func reload() {
    let request = RequestContext(
      category: selectedCategory,
      keyword: keyword
    )

    loadTask?.cancel()
    loadTask = Task {
      await performLoad(request: request)
    }
  }

  private func performLoad(request: RequestContext) async {
    state = .loading

    do {
      let result = try await repository.fetchArticles(
        category: request.category,
        keyword: request.keyword
      )

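      // Drop the result if this task was cancelled or the filters changed meanwhile.
      guard !Task.isCancelled else { return }
      guard request.category == selectedCategory,
            request.keyword == keyword else { return }

      items = result
      state = .loaded
    } catch is CancellationError {
      // Cancelled: do not update the page.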
    } catch {
      guard !Task.isCancelled else { return }
      state = .failed(error.localizedDescription)
    }
  }
}

This code does several key things:

  • Loading tasks of the same kind have a single holder, loadTask
  • When a new task arrives, the old one is cancelled first
  • The “current context” is frozen into a RequestContext when the request is sent
  • When the result returns, it is checked against the current page before being applied

What really matters here is that the relationships between tasks start to become explicit.
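The listing above references several types it never defines: Article, ArticlesRepository, ViewState, and RequestContext. The article does not show them, so the following is only a guess at minimal shapes that would let the ViewModel compile. Every field and case here is an assumption.

```swift
// Assumed shape of the model; the real fields are not shown in the article.
struct Article: Equatable, Sendable {
    let id: Int
    let title: String
}

// Assumed repository interface, matching the call made in performLoad.
protocol ArticlesRepository: Sendable {
    func fetchArticles(category: String, keyword: String) async throws -> [Article]
}

// One enum instead of separate isLoading / errorMessage flags, so the page
// cannot be "loading" and "failed" at the same time.
enum ViewState: Equatable {
    case idle
    case loading
    case loaded
    case failed(String)
}

// Snapshot of the parameters a request was sent with.
struct RequestContext: Equatable, Sendable {
    let category: String
    let keyword: String
}
```

Because this RequestContext is Equatable, the two-field guard in performLoad could also be written as a single comparison against a context built from the current selectedCategory and keyword.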

6. Why “freezing the request context” is so critical

Many concurrency articles cover task cancellation but say too little about the “context snapshot”. In page-level business code it matters a great deal.

For example, suppose a request is sent with:

  • selectedCategory = "ios"
  • keyword = "swift"

Once the request is in flight, the task should not go back and read those two values from the live ViewModel as if they were still its own parameters. Otherwise you often end up in a strange state:

  • The request is sent with one set of parameters
  • The result is attributed to a different set when it is verified

So a very practical principle is:

When you start an asynchronous task, freeze the business context that the task actually depends on.

That gives you a clear basis later for deciding whether a result is still the current one.
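A minimal sketch of what “freezing” means in practice, using a value type as the snapshot. `FilterState` is a hypothetical stand-in for the ViewModel’s mutable filters, and `RequestContext` is assumed to be a plain Equatable struct.

```swift
// Frozen copy of the parameters a request depends on.
struct RequestContext: Equatable {
    let category: String
    let keyword: String
}

// Hypothetical stand-in for the mutable filter state on the ViewModel.
final class FilterState {
    var category = "ios"
    var keyword = "swift"

    // Copies the current values; later mutations do not affect the copy.
    func snapshot() -> RequestContext {
        RequestContext(category: category, keyword: keyword)
    }
}

let filters = FilterState()
let sentWith = filters.snapshot()  // frozen at send time

filters.keyword = "actor"          // the user keeps typing while the request is in flight

// The snapshot still describes the request that was actually sent...
let frozenKeyword = sentWith.keyword                 // "swift"
// ...so comparing it with the live state shows that the result is stale.
let isStillCurrent = sentWith == filters.snapshot()  // false
```

The snapshot answers “which request does this result belong to?”, while the live state answers “what does the page want now?”. Keeping the two separate is what makes the comparison meaningful.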

7. Many concurrency bugs come down to “too many places that write state”

When a concurrency problem appears, the first reactions are often:

  • Should I add a lock?
  • Should this become an actor?
  • Should I switch threads?

These sometimes matter, but in page-level scenarios the more common problems are:

  • Too many places write items
  • Too many places change isLoading
  • Too many entry points send requests directly

Once state writes are scattered, the combined state can be wrong even when there is no actual data race.

So when troubleshooting this kind of issue, I usually ask these questions first:

  • Which code is allowed to change this state?
  • Which tasks are allowed to end the current loading?
  • Which results are allowed to overwrite the current list?

If these questions have no clear answers, bugs are usually only a matter of time.
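One way to answer those three questions in code is to funnel every write through a small type that owns the state. The sketch below is an assumption-laden illustration, not the article’s implementation: `ListState` and its load identifiers are hypothetical, and the rule it encodes is that only the most recently started load may end loading or overwrite the list.

```swift
struct ListState {
    // Readable from anywhere, writable only through the methods below.
    private(set) var items: [String] = []
    private(set) var isLoading = false
    private var currentLoadID = 0

    // The only place that starts loading. Returns a ticket for this load.
    mutating func beginLoad() -> Int {
        currentLoadID += 1
        isLoading = true
        return currentLoadID
    }

    // The only place that ends loading and writes items.
    // A load that has been superseded is rejected and changes nothing.
    @discardableResult
    mutating func finishLoad(id: Int, items newItems: [String]) -> Bool {
        guard id == currentLoadID else { return false }
        items = newItems
        isLoading = false
        return true
    }
}

var state = ListState()
let firstLoad = state.beginLoad()
let secondLoad = state.beginLoad()  // supersedes the first load

let staleAccepted = state.finishLoad(id: firstLoad, items: ["old"])  // false
let loadingAfterStale = state.isLoading                              // true

let freshAccepted = state.finishLoad(id: secondLoad, items: ["new"]) // true
```

With this shape, “who may end loading” and “who may overwrite the list” are no longer conventions spread across call sites. They are enforced by the only two methods that can write.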

8. An evolution path closer to real projects

To really fix this class of problem, I suggest evolving in the following order rather than introducing many mechanisms up front:

1. Consolidate the entry points of similar tasks

First give “list loading” a single entry point, instead of letting every UI event send its own request.

2. Clarify how tasks replace one another

Decide which tasks may run concurrently and which should cancel older tasks so that only the latest survives.

3. Freeze the request context

Collect the key business parameters a request depends on into one explicit object.

4. Add a validity check to results

Not every successfully returned result is entitled to change the current page.

5. Only then consider more complex shared-state isolation

For cross-page shared caches or cross-module resource coordination, look at actors, a unified coordinator, and similar solutions.

This order is more stable because it sorts out the business-level concurrency relationships first, instead of reaching for more complex technical machinery first.
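For step 5, a cache shared across pages is the kind of state where an actor starts to pay off. The following is a deliberately small sketch: `ArticleCache`, its string keys, and the `demo` function are all hypothetical, and a real cache would also need eviction and probably de-duplication of in-flight fetches.

```swift
// Hypothetical cross-page cache. The actor serializes access to `storage`,
// so callers on different tasks cannot corrupt the dictionary.
actor ArticleCache {
    private var storage: [String: [String]] = [:]

    func articles(for key: String) -> [String]? {
        storage[key]
    }

    func store(_ articles: [String], for key: String) {
        storage[key] = articles
    }
}

// Callers reach the actor through `await`; each call is an isolated step.
func demo() async -> [String]? {
    let cache = ArticleCache()
    await cache.store(["Actors in practice"], for: "ios|swift")
    return await cache.articles(for: "ios|swift")
}
```

Note that the actor only protects the dictionary itself. Whether a cached result is still valid for the current page remains a business question that steps 1 to 4 have to answer.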

9. Conclusion: most business concurrency problems come down to “task relationships were never modeled”

Race conditions, duplicate requests, and state confusion look like three separate problems, but their root causes are often very close:

  • Which tasks count as the same task: not modeled
  • What happens to old tasks when a new one arrives: not modeled
  • Whether a result is still valid: not modeled
  • Where state may be written: not consolidated

To put this article in one sentence:

Most concurrency problems in business code look like unfamiliarity with concurrency syntax, but are really a failure to model task relationships, result validity, and state write permissions clearly.

Once these three things become clear, a lot of “occasional confusion” goes away more easily than you might expect.
