MongoDB - Defect ID: 3423956

Config server crashes with invariant failure in QueryAnalysisCoordinator::onSamplerDelete when documents are deleted from config.mongos

Last updated on March 14th, 2026

BugZero Risk Score
8.5 High

Overall: 8.5

Severity: 10.0

Community: 3.7

Lifecycle: 3.7

Bug Details

Priority: Blocker - P1
Status: Needs Verification
Resolution: Unresolved

Description

Info

[SERVER-121686] Config server crashes with invariant failure in QueryAnalysisCoordinator::onSamplerDelete when documents are deleted from config.mongos

Steps to Reproduce

Environment:
- MongoDB Community Server 7.0.25 and 7.0.30
- Ubuntu 22.04.5 LTS (kernel 5.15.0)
- x86_64 architecture
- Sharded cluster: 3 shards (11 data nodes total), 5-member config replica set, 26 active mongos routers
- TLS enabled in preferTLS mode across all components
- No authentication enabled

Severity: Critical. Causes a full config server replica set outage with an unrecoverable crash loop.

Any delete operation on the config.mongos collection causes all config server mongod processes to crash with an invariant failure in QueryAnalysisCoordinator::onSamplerDelete. The crash occurs during live operations (when the primary processes the delete), during oplog replay on secondaries, and during startup recovery, creating an unrecoverable crash loop that takes down the entire config server replica set.

Steps to Reproduce:
1. Deploy a sharded cluster with a config server replica set (we used 5 members)
2. Ensure there are entries in the config.mongos collection (normal mongos registrations)
3. From a mongos, execute any delete on the config.mongos collection:

    db.getSiblingDB("config").mongos.deleteOne({ _id: "any-mongos-host:27018" })

4. The config server primary crashes immediately
5. Other config server members crash when they attempt to replicate the delete via oplog application
6. All config servers enter a crash loop: every restart attempt replays the delete from the oplog during startup recovery, triggering the same crash

Expected Behavior:
Delete operations on config.mongos should succeed without crashing the config server. The QueryAnalysisCoordinator::onSamplerDelete handler should gracefully handle the deletion, including during oplog replay and startup recovery contexts.

Actual Behavior:
The config server process aborts with an invariant failure. The crash occurs in the ReplWriterWorker thread during WriteUnitOfWork::commit(), specifically in the commit handler registered by QueryAnalysisCoordinator for monitoring changes to the config.mongos collection.

Fatal log entry:

    {"s":"F", "c":"ASSERT", "id":23080, "ctx":"ReplWriterWorker-50604", "msg":"\n\n***aborting after invariant() failure\n\n"}

Crash stack trace (demangled):

    mongo::invariantFailed(char const*, char const*, unsigned int)
    mongo::analyze_shard_key::QueryAnalysisCoordinator::onSamplerDelete(mongo::MongosType const&)
    mongo::RecoveryUnit::_executeCommitHandlers(boost::optional)
    mongo::RecoveryUnit::commitRegisteredChanges(boost::optional)
    mongo::WiredTigerRecoveryUnit::_commit()
    mongo::RecoveryUnit::commitUnitOfWork()
    mongo::WriteUnitOfWork::commit()
    mongo::repl::applyOperation_inlock(...) lambda #17
    mongo::repl::applyOperation_inlock(...)
    mongo::repl::OplogApplierUtils::applyOplogEntryOrGroupedInsertsCommon(...)
    mongo::repl::applyOplogEntryOrGroupedInserts(...)
    mongo::repl::OplogApplierUtils::applyOplogBatchCommon(...)
    mongo::repl::OplogApplierImpl::applyOplogBatchPerWorker(...)

Key observation: The crash is in a RecoveryUnit commit handler. The QueryAnalysisCoordinator::onSamplerDelete callback appears to hit an invariant because the coordinator is not in a valid state to process the delete: it is potentially not fully initialized during oplog replay/recovery mode, or it encounters a state inconsistency when the sampler being deleted is not tracked by the coordinator.

Impact:
This is a total config server outage scenario:
1. A single delete on config.mongos crashes the primary config server
2. Secondaries crash when applying the same oplog entry
3. All config servers enter an unrecoverable crash loop: startup recovery replays the delete from the oplog, triggering the same invariant failure
4. The sharded cluster operates on cached metadata only (no balancing, no chunk migrations, no metadata changes, no writes to config database)
5. If any mongos router restarts while config servers are down, it may fail to start

The crash loop cannot be resolved through normal restart; manual intervention with oplog surgery is required (see workaround below).

Workaround:
We developed the following recovery procedure after extensive troubleshooting:
1. Stop the config server mongod process
2. Apply the oplog in standalone mode without the --configsvr flag (this avoids loading the QueryAnalysisCoordinator and its onSamplerDelete callback):

    sudo -u mongodb mongod --dbpath /var/lib/mongo-config --port 27019 \
      --setParameter recoverFromOplogAsStandalone=true

3. Wait for it to accept connections, then shut it down.
4. Start as a writable standalone (without recoverFromOplogAsStandalone):

    sudo -u mongodb mongod --dbpath /var/lib/mongo-config --port 27019 \
      --fork --logpath /tmp/mongod-standalone.log

5. Set the oplogTruncateAfterPoint to skip the problematic delete entry:

    var local = db.getSiblingDB("local");
    var entries = local.oplog.rs.find().sort({$natural: -1}).limit(5).toArray();
    // Find the last entry that is NOT a delete on config.mongos
    for (var i = 0; i < entries.length; i++) {
      if (!(entries[i].ns === "config.mongos" && entries[i].op === "d")) {
        local.getCollection("replset.oplogTruncateAfterPoint").updateOne(
          {_id: "oplogTruncateAfterPoint"},
          {$set: {oplogTruncateAfterPoint: entries[i].ts}}
        );
        break;
      }
    }

6. Shut down the standalone and restart as a config server via systemctl. The oplog entry containing the delete will be truncated during startup, and the config server will start normally.
7. Repeat for each config server replica set member.

Important: After recovery, do NOT attempt any delete operations on config.mongos; the same crash will occur again. Stale mongos entries must remain until a fix is available.
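The selection logic in the workaround's "Set the oplogTruncateAfterPoint" step can be pulled out into a small function and checked against sample data before it is run against a real oplog. What follows is a minimal sketch, not the reporter's exact script: the names `isConfigMongosDelete` and `findTruncateAfterPoint`, the sample entries, and the use of plain numbers for `ts` are assumptions made for illustration (real oplog `ts` values are BSON Timestamps). It expects entries ordered newest first and returns `null` when every entry in the window is a config.mongos delete, a case in which the workaround's loop would leave the truncate point unchanged.

```javascript
// True when an oplog entry is a delete on config.mongos,
// the entry shape the report identifies as the crash trigger.
function isConfigMongosDelete(entry) {
  return entry.ns === "config.mongos" && entry.op === "d";
}

// `entries` must be ordered newest first, as produced by
// sort({$natural: -1}) in the workaround.
// Returns the `ts` of the newest entry that is NOT a config.mongos delete,
// or null if every entry in the window is such a delete.
function findTruncateAfterPoint(entries) {
  for (const entry of entries) {
    if (!isConfigMongosDelete(entry)) {
      return entry.ts;
    }
  }
  return null;
}

// Hypothetical sample data; numbers stand in for BSON Timestamps.
const sampleEntries = [
  { ts: 105, ns: "config.mongos", op: "d", o: { _id: "host-a:27018" } },
  { ts: 104, ns: "config.mongos", op: "d", o: { _id: "host-b:27018" } },
  { ts: 103, ns: "config.chunks", op: "u" },
  { ts: 102, ns: "config.mongos", op: "u" },
];

const truncatePoint = findTruncateAfterPoint(sampleEntries);
console.log(truncatePoint); // prints 103
```

In mongosh the array would come from the same `local.oplog.rs` query used in the workaround. A `null` result suggests widening the `limit` before writing anything to `replset.oplogTruncateAfterPoint`.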

Additional Notes:
- We confirmed the bug exists in both 7.0.25 (original crash) and 7.0.30 (tested during recovery attempts; same crash with gitVersion 67480f41dfa5802ce14af5c95bd0e9826d3b2131)
- The crash does NOT occur when the oplog is replayed without the --configsvr flag, confirming the issue is specific to the QueryAnalysisCoordinator callback registered in config server mode
- The triggering oplog entry is a standard delete: {ns: "config.mongos", op: "d", o: {_id: ":"}}
- Our cluster has no analyzeShardKey operations configured; this is purely the coordinator's change-stream observer on config.mongos
- The config.mongos collection is purely informational (used by sh.status()); deletes on it should never be able to crash the server
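Because stale entries have to stay in config.mongos until a fix is available, they can be reported read-only instead of being deleted. The sketch below is an illustration under stated assumptions, not part of the original report: the function name `findStaleMongos`, the one-hour threshold, and the sample documents are hypothetical, and it assumes each config.mongos document carries a `ping` date recording the router's last check-in. It performs no writes.

```javascript
// Returns the _id of every router document whose last ping is older than maxAgeMs.
// Documents without a Date-valued `ping` field are skipped rather than guessed at.
function findStaleMongos(docs, now, maxAgeMs) {
  return docs
    .filter((doc) => doc.ping instanceof Date &&
                     now.getTime() - doc.ping.getTime() > maxAgeMs)
    .map((doc) => doc._id);
}

// Hypothetical sample documents shaped like config.mongos entries.
const now = new Date("2026-03-14T12:00:00Z");
const sampleDocs = [
  { _id: "router-a:27017", ping: new Date("2026-03-14T11:59:30Z") },
  { _id: "router-b:27017", ping: new Date("2026-03-01T08:00:00Z") },
  { _id: "router-c:27017" }, // no ping recorded
];

const ONE_HOUR_MS = 60 * 60 * 1000; // assumed staleness threshold
const stale = findStaleMongos(sampleDocs, now, ONE_HOUR_MS);
console.log(stale); // prints [ 'router-b:27017' ]
```

Against a live cluster, the input could come from a plain `find()` on config.mongos through a mongos, which only reads the collection and so does not produce the delete oplog entry described above.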

Relevant Products

Affected versions: 7.0.25, 7.0.30

Fixed versions: No known fixed versions

Top MongoDB Defects

8.4 Defect ID: 3423956
Config server crashes with invariant failure in QueryAnalysisCoordinator::onSamplerDelete when documents are deleted from config.mongos
8.4 Defect ID: 3392546
SIGSEGV (Exit Code 139) exactly 30s after start on AMD Zen 5 due to hardware Shadow Stacks (user_shstk) clashing with coroutines
8.4 Defect ID: 3380084
aggregate sort on string field inconsistent on Linux
6.8 Defect ID: 3422474
$project silently drops root-level fields after $lookup + $unwind when multiple nested documents contain a type field with different value types
5.5 Defect ID: 3407629
[v8.0] Fix write_without_shard_key_base.js to avoid issuing getMore command
