
OPERATIONAL DEFECT DATABASE

The explain command for write operations (e.g. insert/delete/update) on a legacy tracked timeseries collection, executed from a router with a stale cache, could fail with a StaleConfig exception after exhausting the 10 retries.

At a high level, this is what happens:

1. The stale router thinks the collection is a tracked legacy timeseries collection.
2. It forwards the command to the shard(s) using the timeseries view namespace, but attaches the shard version of the associated timeseries buckets collection.
3. The shard receives the command and checks the shard version against the received namespace (the timeseries view). Since the collection has been dropped and recreated as a normal collection, the shard sends a StaleVersion error to the router, informing it that the view namespace is now unsharded.
4. The router receives the StaleVersion error, refreshes the timeseries view namespace but not the buckets namespace, and retries the operation from the beginning.

Because the buckets namespace is never refreshed, each retry repeats the same steps until the retries are exhausted.

The solution is to make the router always send a matching namespace and shard version along with the command. In this case, since the router thinks the collection is a tracked legacy timeseries collection, it should convert the command to target the system.buckets namespace and forward the translated command to the shard(s) along with the shard version of the system.buckets namespace. If we do this, the shard will correctly send a StaleVersion error for the system.buckets namespace, the router will refresh the cache entry associated with the system.buckets namespace, and the two will eventually agree on the shard version of the collection.
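The retry loop and the proposed fix can be illustrated with a small toy model. This is only a sketch and not MongoDB code: the `Router`, `Shard`, and `StaleConfig` classes, the `UNSHARDED` marker, the `translate_to_buckets` flag, and the `db.ts` namespace are all hypothetical names chosen for illustration, and a single exception type stands in for both errors mentioned above. The model assumes the router decides that a collection is a tracked timeseries collection purely from the presence of a cached buckets entry.

```python
UNSHARDED = 0     # hypothetical marker meaning "not a tracked collection"
MAX_RETRIES = 10  # retry budget mentioned in the report


class StaleConfig(Exception):
    """Raised when the attached shard version does not match the namespace."""

    def __init__(self, nss):
        super().__init__(f"stale shard version for {nss}")
        self.nss = nss


class Shard:
    def __init__(self, versions):
        self.versions = versions  # authoritative state: namespace -> version

    def explain(self, nss, shard_version):
        # The version is checked against the namespace that was received.
        if self.versions.get(nss, UNSHARDED) != shard_version:
            raise StaleConfig(nss)
        return f"explain ok on {nss}"


class Router:
    def __init__(self, cache, shard, translate_to_buckets):
        self.cache = cache  # possibly stale: namespace -> version
        self.shard = shard
        self.translate_to_buckets = translate_to_buckets

    def refresh(self, nss):
        # Refresh exactly one namespace, the one named in the error.
        version = self.shard.versions.get(nss, UNSHARDED)
        if version == UNSHARDED:
            self.cache.pop(nss, None)
        else:
            self.cache[nss] = version

    def explain_write(self, view_nss):
        db, coll = view_nss.split(".", 1)
        buckets_nss = f"{db}.system.buckets.{coll}"
        for _ in range(MAX_RETRIES):
            if buckets_nss in self.cache:
                # Router believes this is a tracked legacy timeseries.
                version = self.cache[buckets_nss]
                # Buggy path keeps the view namespace; fixed path translates.
                target = buckets_nss if self.translate_to_buckets else view_nss
            else:
                target = view_nss
                version = self.cache.get(view_nss, UNSHARDED)
            try:
                return self.shard.explain(target, version)
            except StaleConfig as err:
                self.refresh(err.nss)
        raise StaleConfig(view_nss)


def run_scenario(translate_to_buckets):
    # The collection was dropped and recreated as a normal collection,
    # so the shard knows no tracked namespaces; the router cache is stale.
    shard = Shard({})
    router = Router({"db.system.buckets.ts": 5}, shard, translate_to_buckets)
    return router.explain_write("db.ts")
```

In this model, `run_scenario(False)` sends the view namespace with the buckets version, so every refresh targets the view namespace, the stale buckets entry survives, and the call ends by raising `StaleConfig`. `run_scenario(True)` sends the buckets namespace instead, so the first error refreshes the buckets entry and the next attempt succeeds.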
Execute the attached repro in the no_passthrough suite at commit r8.3.0-alpha0-3261-gb815656be09.