Mongos instances which do not receive any requests with the primary read preference do not get their chunk location configuration updated after a chunk migration. This results in missing data in query results in cases where the query includes the shard key and the mongos routes the query to the wrong shard. The only workaround I have come up with so far is to hit every mongos instance with a dummy primary-read-preference query for each sharded collection (or perhaps to call the refresh command against the mongos) at some regular interval.

Background info: I run a single 5-node replica set which spans 3 data centers: 3 nodes in the central "primary" DC, and 1 node in each of our regional "secondary" DCs. My application is read-only, runs in all 3 DCs, has high read performance requirements, and a high tolerance for eventual consistency. As a result, I run with the "nearest" read preference so that my app running in a regional DC will prefer to read from the MongoDB secondary running in the same DC, rather than going all the way back to the MongoDB primary in the central DC.

We've hit VM RAM capacity issues, and are now attempting to shard in-place into 3 shards, with a mongos instance co-located with each app instance. Everything went smoothly at first: I allowed the balancer to migrate some chunks to the new shards. After a few chunks had moved I disabled the balancer to verify there were no production errors, and found that objects which had migrated were no longer coming back in queries by shard key. If I make an identical query against the mongos from the shell (which defaults to primary read preference) I see the following in the logs and get correct results:

    2017-08-10T17:30:45.750+0000 D QUERY [conn87] Received error status for query query: { guid: "some_guid" } sort: {} projection: {} on attempt 1 of 10: SendStaleConfig: [MyDb.myCollection] shard version not ok: version mismatch detected for MyDb.myCollection ( ns : MyDb.myCollection, received : 118|0||598b5cf1b6ff8d56d195d96f, wanted : 121|1||598b5cf1b6ff8d56d195d96f, send )

Afterwards, my app's queries (using readPref=nearest) correctly return the same results.
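For reference, a minimal shell sketch of the workaround described above, run against each mongos. The database and collection names come from the log line above; the dummy guid value is an assumption, and flushRouterConfig is presumably what is meant by "the refresh command":

    // Run against each mongos at a regular interval.
    // Option 1: a dummy query with primary read preference, which lets the
    // mongos detect its stale shard version and refresh its routing table.
    db.getMongo().setReadPref("primary");
    db.getSiblingDB("MyDb").myCollection.findOne({ guid: "dummy" }); // result discarded

    // Option 2: ask the mongos to drop its cached routing table outright,
    // forcing a reload from the config servers.
    db.adminCommand({ flushRouterConfig: 1 });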
mark.agarunov commented on Thu, 17 Aug 2017 19:15:21 +0000: Hello skelly, As this behavior seems to be due to the same underlying issue as SERVER-28948, I've closed this ticket as a duplicate. Please follow SERVER-28948 for updates on this issue. Thanks, Mark

schwerin commented on Sat, 12 Aug 2017 14:51:51 +0000: I think that's your best choice today. Maybe also disable the balancer, if your data naturally has an even distribution.

skelly commented on Fri, 11 Aug 2017 18:53:36 +0000: Do any of you have a recommendation for a workaround in the meantime? I was thinking I could run a separate thread that makes a simple query against each mongos using the primary read preference at some reasonable interval. Thoughts on that approach?

skelly commented on Fri, 11 Aug 2017 18:49:10 +0000: Great, thanks a lot guys for the quick response. I'm hoping I can develop a reasonable workaround until the feature comes along. I expect chunk migrations will be rare in my deployment anyway, at least after the initial balancing of the data.

schwerin commented on Fri, 11 Aug 2017 16:56:16 +0000: OK. skelly, this is a duplicate of a feature request that we've been developing for the 3.6 release. It's sufficiently complicated that it cannot be backported, I'm afraid, but should be available later this year. dianna.hohensee, can you help mark.agarunov out by selecting an appropriate ticket that is duplicated by this one, so Seth can track this work if he wants?

dianna.hohensee commented on Fri, 11 Aug 2017 13:35:28 +0000: Yes, it will be resolved by our safe secondary reads project in v3.6. Secondaries do not currently (v3.4 or earlier) use routing information to filter results, and in v3.6 they will. To fully resolve his problem, he will likely need to use after cluster time reads (also a v3.6 feature) in order to ensure secondaries are not lagging behind their primaries, in case a mongos that is used to do the secondary reads has a stale shardVersion.

schwerin commented on Fri, 11 Aug 2017 12:21:10 +0000: I believe the safe secondary reads project will resolve this. dianna.hohensee?
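For anyone landing here before 3.6, a shell sketch of the periodic-query workaround skelly proposes above: a background loop that issues a primary-read-preference query against each mongos so its routing table refreshes after chunk migrations. The host list, database/collection names, query value, and 60-second interval are all assumptions:

    // Hypothetical watchdog loop, run in a dedicated mongo shell session.
    var mongosHosts = ["mongos-dc1:27017", "mongos-dc2:27017", "mongos-dc3:27017"];
    while (true) {
        mongosHosts.forEach(function (host) {
            var conn = new Mongo(host);
            conn.setReadPref("primary");
            // Any query against the sharded collection will do; the result is discarded.
            conn.getDB("MyDb").myCollection.findOne({ guid: "warmup" });
        });
        sleep(60 * 1000); // shell built-in; interval in milliseconds
    }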
Stand up and configure the following MongoDB deployment (a shell sketch of the query steps follows the list):

- Two 3-node replica sets (the two shards)
- Two mongos instances
- One config server replica set

1. Create an unsharded database and populate a collection with enough test data that it would be split into multiple chunks upon sharding.
2. From the first mongos, shard the collection and wait for a chunk to auto-balance over to the second shard.
3. Using the second mongos, query for data that is in the migrated chunk by shard key using readPreference=secondary. 0 results will be returned.
4. Using the second mongos, query for data that is in the migrated chunk by shard key using readPreference=primary. The correct results will be returned.
5. Using the second mongos, again query for data that is in the migrated chunk by shard key using readPreference=secondary. Correct results will now be returned.
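A rough shell sketch of steps 2-5, assuming hypothetical names testDb.testColl with a shard key field named shardKey and an example key value of 42:

    // On the first mongos: shard the pre-populated collection.
    sh.enableSharding("testDb");
    sh.shardCollection("testDb.testColl", { shardKey: 1 });
    // ...wait for the balancer to migrate at least one chunk to the second shard.

    // On the second mongos: query a migrated document by shard key.
    db.getMongo().setReadPref("secondary");
    db.getSiblingDB("testDb").testColl.find({ shardKey: 42 }).count(); // 0 results (the bug)

    db.getMongo().setReadPref("primary");
    db.getSiblingDB("testDb").testColl.find({ shardKey: 42 }).count(); // correct results

    db.getMongo().setReadPref("secondary");
    db.getSiblingDB("testDb").testColl.find({ shardKey: 42 }).count(); // now also correct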