ShardServerCatalogCacheLoader::getChunksSince can throw StaleConfig under some interleavings between a read of the cache and the background thread that persists the materialized cache. In practice, the CatalogCache handles this by retrying, so it causes no harm. However, the race can cause failures in the shard_server_catalog_cache_loader_test unit test (e.g., here). We can address this by making the test expect this failure and retry. Alternatively, we could make ShardServerCatalogCacheLoader itself retry.

The interleaving that can cause this is:

1. SSCCL (ShardServerCatalogCacheLoader) discovers the new epoch.
2. It schedules an asynchronous task to update the persisted metadata.
3. It calls `_getLoaderMetadata`, which calls `getIncompletePersistedMetadataSinceVersion`, which calls `getPersistedMetadataSinceVersion`, which finally calls `readShardChunks`. `readShardChunks` reads from the config.cache.xxxx collection.
4. Concurrently with the read in (3), the task scheduled in (2) drops the config.cache.xxxx collection (because the epoch has changed).
5. The read started in (3) yields, and on restore it discovers that the collection no longer exists, so it fails with QueryPlanKilled.
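Either option (retrying in the test or inside the loader) amounts to wrapping the call in a bounded retry loop that catches only the recoverable error. The following is a minimal standalone sketch of that idea, not the actual patch: `RecoverableError`, `retryOnRecoverable`, and `FakeLoader` are hypothetical names invented for illustration and do not exist in the MongoDB code base, and the fake loader merely simulates a read that fails while the collection is being dropped.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a recoverable failure such as QueryPlanKilled.
struct RecoverableError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Calls fn at least once and retries while it throws RecoverableError,
// up to maxAttempts calls in total. The last RecoverableError is rethrown
// once the attempts are exhausted; any other exception propagates at once.
template <typename Fn>
auto retryOnRecoverable(Fn&& fn, int maxAttempts) -> decltype(fn()) {
    for (int attempt = 1;; ++attempt) {
        try {
            return fn();
        } catch (const RecoverableError&) {
            if (attempt >= maxAttempts) {
                throw;
            }
        }
    }
}

// Hypothetical fake loader: the first `failures` calls fail as if the
// cache collection had been dropped mid-read; later calls succeed.
struct FakeLoader {
    explicit FakeLoader(int failures) : failuresLeft(failures) {}

    int getChunksSince() {
        ++calls;
        if (failuresLeft > 0) {
            --failuresLeft;
            throw RecoverableError("collection dropped during read");
        }
        return 3;  // pretend three chunks were loaded
    }

    int failuresLeft;
    int calls = 0;
};
```

A caller would use it as `retryOnRecoverable([&] { return loader.getChunksSince(); }, 5)`. Bounding the number of attempts keeps a genuinely broken loader from hanging the test.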
xgen-internal-githook commented on Thu, 21 Dec 2023 16:32:34 +0000:
Author: david-dominguez-sal <97509688+david-dominguez-sal@users.noreply.github.com>
Message: SERVER-83530: Fix shard_server_catalog_cache_loader_test. Retry on recoverable errors the usages of getChunksSince. (#17657)
GitOrigin-RevId: 6a84385455be57880336ab8c2329825b52c24a72
Branch: master
https://github.com/mongodb/mongo/commit/d1f998f81838f79006f5b8c59e3ba5ac5e6096d2