The MongoDB config server crashes and cannot start normally in a cluster that was upgraded from version 6.0. The crash log is as follows.

With version 7.0+:

    {"t":{"$date":"2026-03-19T17:15:03.886+08:00"},"s":"F", "c":"ASSERT", "id":23079, "ctx":"ReplWriterWorker-2","msg":"Invariant failure","attr":{"expr":"erased","file":"src/mongo/db/s/query_analysis_coordinator.cpp","line":164}}

With version 8.0+:

    {"t":{"$date":"2026-03-17T21:21:24.897+08:00"},"s":"F", "c":"ASSERT", "id":23079, "svc":"S", "ctx":"ReplWriterWorker-3","msg":"Invariant failure","attr":{"expr":"erased","file":"src/mongo/db/s/query_analysis_coordinator.cpp","line":189}}

The main reason is that QueryAnalysisCoordinator adds an entry to `_samplers` when a document is inserted into `config.mongos` and removes the entry when the document is deleted. `QueryAnalysisCoordinator::onSamplerDelete` also runs an invariant check immediately after calling `_samplers.erase`. However, in clusters upgraded from earlier versions, `config.mongos` still holds documents written by the older version. These documents were not inserted after the upgrade, so they were never recorded in `_samplers`. When one of them is deleted, `_samplers.erase` returns 0, the invariant fails, and the process crashes.

    void QueryAnalysisCoordinator::onSamplerDelete(const MongosType& doc) {
        invariant(serverGlobalParams.clusterRole.has(ClusterRole::ConfigServer));
        stdx::lock_guard lk(_mutex);
        auto erased = _samplers.erase(doc.getName());
        invariant(erased);
    }

I think we need to optimize this logic and remove `invariant(erased)`.
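To make the suggestion concrete, here is a minimal sketch of a delete path that tolerates an untracked name instead of asserting. It is not the MongoDB source or an actual patch: `SamplerRegistry`, `onSamplerInsert`, the `size()` helper, the boolean return value, and the use of `std::unordered_set` are all assumptions made for illustration. Only the `erase`-returns-0 behavior comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <mutex>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the coordinator's sampler bookkeeping.
// This is NOT MongoDB code; names echo the ticket for readability only.
class SamplerRegistry {
public:
    // Assumed insert path: remember a sampler when its config.mongos
    // document is inserted.
    void onSamplerInsert(const std::string& name) {
        std::lock_guard<std::mutex> lk(_mutex);
        _samplers.insert(name);
    }

    // Tolerant delete path: returns true if the sampler was tracked and
    // false if it was never recorded (for example, a document that predates
    // the upgrade). The untracked case is logged rather than treated as fatal.
    bool onSamplerDelete(const std::string& name) {
        std::lock_guard<std::mutex> lk(_mutex);
        const std::size_t erased = _samplers.erase(name);  // 0 or 1
        if (erased == 0) {
            std::clog << "ignoring delete for untracked sampler: " << name << '\n';
        }
        return erased == 1;
    }

    // Number of samplers currently tracked.
    std::size_t size() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _samplers.size();
    }

private:
    mutable std::mutex _mutex;
    std::unordered_set<std::string> _samplers;
};
```

With this shape, deleting a name that was never inserted leaves the set unchanged and returns `false`, so a stale `config.mongos` document would no longer bring the process down. Whether the real fix should log, silently ignore, or rebuild the in-memory state is a design decision for the server team.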
JIRAUSER1282335 commented on Fri, 17 Apr 2026 18:21:32 +0000:

Based on the available information, this looks like another instance of the known QueryAnalysisCoordinator::onSamplerDelete bug tracked in SERVER-106903 (with SERVER-121686 already marked as a duplicate there). We don’t have a full backtrace for this ticket yet, but:

- The crash log shows an invariant failure with expr: "erased" in query_analysis_coordinator.cpp (line 164/189), which matches the existing issue.
- The failure is triggered by deleting documents from config.mongos after a 6.0 → 7.0/8.0 upgrade, which is consistent with deleting stale config.mongos entries that were never inserted into the in-memory _samplers map.
- The resulting behavior (config servers crashing and entering a crash loop when applying the delete) aligns with the repro and impact described in SERVER-121686 / SERVER-106903.

Given the matching crash signature and nearly identical repro pattern, I’m treating this as a suspected duplicate of SERVER-106903 despite the missing full backtrace. If further diagnostics show a different root cause, we can undup and re-open the investigation.
1. Create a cluster running version 6.0.27 with a mongos.
2. Ensure that config.mongos contains a mongos record.
3. Upgrade the cluster to version 7.0 in the order described.
4. Connect to the cluster using mongosh, execute db.getSiblingDB("config").mongos.remove({}), and then check your config server.

The config servers will likely crash and remain unrecoverable until we intervene and correct the oplog data.