
The shard role API's resource stashing mechanism fails to abandon the snapshot when it runs under a recursive GlobalLock acquisition. If the snapshot is not abandoned on yield/stash and the node steps down, the node can crash on restore: the API tries to change the read source, which is illegal while a snapshot is still active and trips an invariant.

Problem

Stashing normally abandons the snapshot by destroying the GlobalLock object, whose destructor abandons it. This only works for non-recursive locks. When an aggregation runs via dbDirectClient (which happens, for example, for $collStats when the balancer registry isn't initialized), the lock acquisition is recursive, so the snapshot stays active after stashing. This happens because dbDirectClient operations only detach the resources instead of stashing them, which leaves the GlobalLock acquired by the parent operation still in scope.

Possible Solutions:

This bug could affect other code paths as well. We should probably either make sure the snapshot is abandoned at stashing, or avoid stashing when using dbDirectClient. The fix depends on which assumption the code should be based on:

- A stash should never happen under recursion (the issue is in dbDirectClient).
- Every stash should handle recursion (the issue is in the stashing logic).
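
The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not MongoDB's actual code: `OperationState`, `ToyGlobalLock`, `stash`, `restoreAfterStepDown` and `snapshotSurvivesStash` are hypothetical names invented for illustration, and the invariant is modelled as a thrown exception. The guard abandons the snapshot only when the outermost acquisition is released, so destroying an inner (recursive) guard leaves the snapshot active unless the stash abandons it explicitly.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Toy stand-in for per-operation state (hypothetical, not a MongoDB type).
struct OperationState {
    int globalLockDepth = 0;      // recursion depth of the global lock
    bool snapshotActive = false;  // whether a storage snapshot is open
};

// Hypothetical RAII guard: the snapshot is abandoned only when the
// outermost acquisition is released, as the report describes.
class ToyGlobalLock {
public:
    explicit ToyGlobalLock(OperationState& st) : st_(st) { ++st_.globalLockDepth; }
    ~ToyGlobalLock() {
        assert(st_.globalLockDepth > 0);
        if (--st_.globalLockDepth == 0) {
            st_.snapshotActive = false;  // abandon snapshot on final release
        }
    }
    ToyGlobalLock(const ToyGlobalLock&) = delete;
    ToyGlobalLock& operator=(const ToyGlobalLock&) = delete;

private:
    OperationState& st_;
};

// Stash by destroying the guard. With abandonExplicitly set, the snapshot is
// also abandoned directly (one of the two candidate fixes in the report).
void stash(OperationState& st, std::unique_ptr<ToyGlobalLock>& lock, bool abandonExplicitly) {
    lock.reset();
    if (abandonExplicitly) {
        st.snapshotActive = false;
    }
}

// Models restore after a step-down: changing the read source while a
// snapshot is active is illegal, represented here by an exception.
void restoreAfterStepDown(const OperationState& st) {
    if (st.snapshotActive) {
        throw std::logic_error("read source changed while a snapshot is active");
    }
}

// Returns true if the snapshot is still active right after a stash.
// `nested` simulates a parent operation that already holds the lock.
bool snapshotSurvivesStash(bool nested, bool abandonExplicitly) {
    OperationState st;
    std::unique_ptr<ToyGlobalLock> parent;
    if (nested) {
        parent = std::make_unique<ToyGlobalLock>(st);
    }
    auto inner = std::make_unique<ToyGlobalLock>(st);
    st.snapshotActive = true;  // the operation opens a snapshot
    stash(st, inner, abandonExplicitly);
    return st.snapshotActive;
}
```

In this model, `snapshotSurvivesStash(false, false)` reports that a non-recursive stash abandons the snapshot, while `snapshotSurvivesStash(true, false)` reports that it survives, at which point `restoreAfterStepDown` would throw. Passing `abandonExplicitly = true` corresponds to fixing the stashing logic; keeping `nested` false corresponds to forbidding stashes under recursion.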