BugZero | MongoDB BugID 3175639 - Committed migration recovery on step-up may leave ...

OPERATIONAL DEFECT DATABASE

MongoDB - Defect ID: 3175639

Committed migration recovery on step-up may leave range deleter service in inconsistent state on donor

Last updated on September 12th, 2025

BugZero Risk Score
5.3 Medium

Overall: 5.3

Severity: 6.4

Community: 3.7

Lifecycle: 4.6

Bug Details

Priority: Major - P3
Status: Needs Scheduling
Resolution: Unresolved

Description

Info

In sharded clusters, the following two flows may execute concurrently on step-up:

(I) The range deleter service scans config.rangeDeletions and enqueues all range deletion tasks that are marked as non-pending.
(II) The step-up procedure spawns the recovery of migration coordinators based on the content of config.migrationCoordinators.

Regarding (II), when a committed migration needs to be recovered, the donor re-executes the entire local part of the commit upon step-up, which includes:

(A) Enqueuing a PENDING range deletion task.
(B) Updating the persistent state in config.rangeDeletions by marking the document as non-pending.
(C) Asynchronously observing the update performed at step (B), which causes the task enqueued at step (A) to be marked as non-pending so that it is eventually served.

Steps (A) and (B) may already have been executed before the step-down. In that case, on step-up, flows (I) and (II) may interleave as follows:

1. [flow (I)] The range deleter service enqueues task T because it is already marked as ready in config.rangeDeletions.
2. [flow (I)] The range deleter service quickly serves T, deleting the persistent document from config.rangeDeletions.
3. [flow (II)] The pending range deletion task T is scheduled at step (A).
4. [flow (II)] Step (B) is a no-op because the document was deleted at (2).

Since (4) is a no-op, step (C) never happens. That is the correct behavior, because the range deletion task no longer exists and the node may be starting to receive the same range, so no deletion should be performed. The problem is rather that (3) creates a dangling range deletion task that should never be marked as ready, because it should not have been enqueued in the first place.

Proposed solution

The migration coordinator should execute the procedure that marks the range deletion task as non-pending only once. This code should be executed conditionally, only if the range deletion document is still pending.
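The interleaving described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not MongoDB's implementation: the class `Donor`, its methods, and the `conditional` flag are hypothetical names invented for illustration, a plain dict stands in for config.rangeDeletions, and the asynchronous observation at step (C) is collapsed into a synchronous assignment. It models only the interleaving from this report.

```python
from dataclasses import dataclass, field


@dataclass
class Donor:
    """Toy model of a donor shard. All names are hypothetical."""
    # Stands in for config.rangeDeletions: task id -> pending flag.
    persisted: dict = field(default_factory=dict)
    # Stands in for the range deleter service's in-memory tasks:
    # task id -> "pending" or "ready".
    in_memory: dict = field(default_factory=dict)

    def scan_and_enqueue_ready(self):
        """Flow (I): enqueue every persisted task marked non-pending."""
        for task_id, pending in self.persisted.items():
            if not pending:
                self.in_memory[task_id] = "ready"

    def serve_ready_tasks(self):
        """Flow (I): serve ready tasks and delete their documents."""
        ready = [t for t, state in self.in_memory.items() if state == "ready"]
        for task_id in ready:
            del self.in_memory[task_id]
            self.persisted.pop(task_id, None)

    def recover_committed_migration(self, task_id, conditional):
        """Flow (II): redo the local part of the commit on step-up."""
        if conditional and not self.persisted.get(task_id, False):
            # Sketch of the proposed fix: skip everything unless the
            # range deletion document still exists and is pending.
            return
        self.in_memory[task_id] = "pending"        # step (A)
        if task_id in self.persisted:              # step (B): no-op if deleted
            self.persisted[task_id] = False
            self.in_memory[task_id] = "ready"      # step (C), made synchronous


def run_interleaving(conditional):
    # Steps (A) and (B) ran before the step-down, so the document for
    # task T is already persisted as non-pending.
    donor = Donor(persisted={"T": False})
    donor.scan_and_enqueue_ready()                        # interleaving step 1
    donor.serve_ready_tasks()                             # interleaving step 2
    donor.recover_committed_migration("T", conditional)   # steps 3 and 4
    return donor.in_memory


print(run_interleaving(conditional=False))  # a dangling pending task remains
print(run_interleaving(conditional=True))   # nothing is left enqueued
```

In this model the unconditional recovery leaves task T in memory in the pending state with no persisted document behind it, which corresponds to the dangling task created at (3). With the conditional check, recovery is skipped when the document is missing or already non-pending, so no such task is created.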

Change history

2025-09-12 Added: 7.0.0, 8.0.0, 8.1.0, 8.2.0

Relevant Products

Affected versions: 7.0.0, 8.0.0, 8.1.0, 8.2.0

Fixed versions: No known fixed versions

Top MongoDB Defects

5.5 Defect ID: 3192414
Sharded DDL commands may complete while the DDL coordinator is still active in-memory (cleaning up)
5.4 Defect ID: 3194150
Shard role API stashing doesn't abandon the snapshot for recursive acquisitions
5.3 Defect ID: 3175837
$bottom returns wrong document
5.3 Defect ID: 3191790
Change Stream breaks on document with top-level $v
5.3 Defect ID: 3175639
Committed migration recovery on step-up may leave range deleter service in inconsistent state on donor
