Write operations that use the two-phase protocol can fail with a NamespaceNotSharded error when executed against an unsharded collection through a stale router. The two-phase protocol is used for write operations (updates and deletes) that cannot be targeted directly to a single shard. The problem occurs when the router serving the write is stale: it believes the collection is sharded and decides to use the two-phase write protocol. After receiving a StaleInfo error from the shard, it refreshes its cache, retries the two-phase protocol, and finally fails with "NamespaceNotSharded". The root cause is that the ClusterQueryWithoutShardKey command implements a router loop that swallows (intercepts and retries) the StaleInfo error. Instead, the error should bubble up to the write executor so that, after refreshing the cache and restarting the operation, the executor chooses the correct write protocol (single-phase vs. two-phase) according to the refreshed metadata. After the first failure, executing the write operation again succeeds because the cache has already been updated.
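The control-flow difference described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the server's actual code: the names `Router`, `execute_write`, `two_phase_write`, `single_phase_write`, and the `swallow_stale` flag are hypothetical, and only the error names mirror the ones in this report. With `swallow_stale=True` the inner loop refreshes the cache and retries the two-phase path, which then fails; with `swallow_stale=False` the error reaches the executor, which re-selects the protocol.

```python
# Toy model only: every name except the two error names is hypothetical.
class StaleInfo(Exception):
    pass

class NamespaceNotSharded(Exception):
    pass

ACTUAL_SHARDED = False  # ground truth: the collection is unsharded

class Router:
    def __init__(self, cached_sharded):
        self.cached_sharded = cached_sharded  # possibly stale view

    def refresh(self):
        self.cached_sharded = ACTUAL_SHARDED

def shard_check(router):
    # The shard rejects requests that carry stale routing info.
    if router.cached_sharded != ACTUAL_SHARDED:
        raise StaleInfo()

def two_phase_write(router, swallow_stale):
    # Stand-in for the inner command of the two-phase protocol.
    while True:
        if not router.cached_sharded:
            raise NamespaceNotSharded("two-phase write on unsharded collection")
        try:
            shard_check(router)
            return "two-phase"
        except StaleInfo:
            if not swallow_stale:
                raise  # let the write executor handle it
            router.refresh()  # inner loop swallows the error and retries

def single_phase_write(router):
    shard_check(router)
    return "single-phase"

def execute_write(router, swallow_stale):
    # Stand-in for the write executor: picks a protocol from cached metadata.
    while True:
        try:
            if router.cached_sharded:
                return two_phase_write(router, swallow_stale)
            return single_phase_write(router)
        except StaleInfo:
            router.refresh()  # refresh, then re-select the protocol

# Reported behavior: the stale error is swallowed by the inner loop.
stale = Router(cached_sharded=True)
try:
    execute_write(stale, swallow_stale=True)
    first_attempt = "ok"
except NamespaceNotSharded:
    first_attempt = "NamespaceNotSharded"
second_attempt = execute_write(stale, swallow_stale=True)

# Proposed behavior: the stale error bubbles up to the executor.
fixed = execute_write(Router(cached_sharded=True), swallow_stale=False)
```

In this model the first attempt on the stale router ends in `NamespaceNotSharded`, the second attempt succeeds because the cache was refreshed along the way, and the bubbling variant succeeds on the first attempt through the single-phase path.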
Run the attached reproducer in the no_passthrough suite at commit r8.3.0-alpha0-3321-g01c498aae4a.