
OPERATIONAL DEFECT DATABASE
...

...
Mongos is reporting inconsistent information for sharded collections.

Our Setup:
A DB Cluster with three shards. Each shard has PSA Architecture. Three config servers.
Mongo Versions: 4.0.X

Scenario:
Mongos reports that collection alerts_20191026 is not sharded. But config.chunks reports that there are three chunks for that namespace that are in different shards.

    mongos> db.alerts_20191026.getShardDistribution()
    Collection hitron.alerts_20191026 is not sharded.

    mongos> use config
    switched to db config
    mongos> db.chunks.find({ns: "hitron.alerts_20191026"})
    { "_id" : "hitron.alerts_20191026-lineId_-3074457345618258602", "lastmod" : Timestamp(2, 0), "lastmodEpoch" : ObjectId("5da687c8a664d0a846cf713f"), "ns" : "hitron.alerts_20191026", "min" : { "lineId" : NumberLong("-3074457345618258602") }, "max" : { "lineId" : NumberLong("3074457345618258602") }, "shard" : "rs2", "history" : [ { "validAfter" : Timestamp(1571194825, 261), "shard" : "rs2" }, { "validAfter" : Timestamp(1571194824, 926), "shard" : "rs1" } ] }
    { "_id" : "hitron.alerts_20191026-lineId_3074457345618258602", "lastmod" : Timestamp(3, 0), "lastmodEpoch" : ObjectId("5da687c8a664d0a846cf713f"), "ns" : "hitron.alerts_20191026", "min" : { "lineId" : NumberLong("3074457345618258602") }, "max" : { "lineId" : { "$maxKey" : 1 } }, "shard" : "rs3", "history" : [ { "validAfter" : Timestamp(1571194825, 616), "shard" : "rs3" }, { "validAfter" : Timestamp(1571194824, 926), "shard" : "rs1" } ] }
    { "_id" : "hitron.alerts_20191026-lineId_MinKey", "lastmod" : Timestamp(3, 1), "lastmodEpoch" : ObjectId("5da687c8a664d0a846cf713f"), "ns" : "hitron.alerts_20191026", "min" : { "lineId" : { "$minKey" : 1 } }, "max" : { "lineId" : NumberLong("-3074457345618258602") }, "shard" : "rs1", "history" : [ { "validAfter" : Timestamp(1571194824, 926), "shard" : "rs1" } ] }

    mongos> db.alerts_20191026.stats().nchunks
    1

    mongos> sh.status()
    hitron.alerts_20191026
        shard key: { "lineId" : "hashed" }
        unique: false
        balancing: true
        chunks:
            rs1 1
            rs2 1
            rs3 1
        { "lineId" : { "$minKey" : 1 } } -->> { "lineId" : NumberLong("-3074457345618258602") } on : rs1 Timestamp(3, 1)
        { "lineId" : NumberLong("-3074457345618258602") } -->> { "lineId" : NumberLong("3074457345618258602") } on : rs2 Timestamp(2, 0)
        { "lineId" : NumberLong("3074457345618258602") } -->> { "lineId" : { "$maxKey" : 1 } } on : rs3 Timestamp(3, 0)
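
The mismatch described above (chunk metadata on three shards, while mongos treats the collection as unsharded) can be expressed as a small check. The following is only a sketch in plain JavaScript, not something from the original report: the function names `chunksPerShard` and `findShardingMismatch` are hypothetical, the sample chunk documents are trimmed to the two fields the check uses, and it assumes the collStats result obtained through mongos carries a boolean `sharded` field.

```javascript
// Count config.chunks documents per shard for one namespace.
function chunksPerShard(chunkDocs) {
  const counts = {};
  for (const chunk of chunkDocs) {
    counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
  }
  return counts;
}

// Hypothetical check: chunk metadata exists for the namespace, yet the
// stats seen through mongos say the collection is not sharded.
function findShardingMismatch(ns, chunkDocs, collStats) {
  const counts = chunksPerShard(chunkDocs);
  const reportedSharded = collStats.sharded === true; // assumed field
  return {
    ns: ns,
    shardsWithChunks: Object.keys(counts).sort(),
    totalChunks: chunkDocs.length,
    reportedSharded: reportedSharded,
    inconsistent: chunkDocs.length > 0 && !reportedSharded,
  };
}

// Sample data trimmed from the config.chunks output in this report.
const sampleChunks = [
  { ns: "hitron.alerts_20191026", shard: "rs2" },
  { ns: "hitron.alerts_20191026", shard: "rs3" },
  { ns: "hitron.alerts_20191026", shard: "rs1" },
];
// nchunks is taken from the report; sharded: false is an assumption
// inferred from the "is not sharded" message.
const sampleStats = { sharded: false, nchunks: 1 };

const mismatch = findShardingMismatch(
  "hitron.alerts_20191026", sampleChunks, sampleStats);
console.log(JSON.stringify(mismatch, null, 2));
```

In a live mongo shell session, the inputs would presumably come from `db.getSiblingDB("config").chunks.find({ns: ...}).toArray()` and `db.runCommand({collStats: ...})`. The sketch keeps them as plain objects so it runs without a cluster.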

carl.champain commented on Mon, 11 Nov 2019 15:48:41 +0000:
Hi sadithela@assia-inc.com,
Have you found out what the java responses are?
Thanks,
Carl

sadithela@assia-inc.com commented on Sat, 2 Nov 2019 01:42:23 +0000:
Hi Carl,
I had uploaded a zip file called "alerts_20191111" to your secure uploader. It should have all the mongos, mongocfg, mongodb related logs. It should also have shardVersion output. Regarding java responses, I have to follow up with my colleagues. Sorry for late reply.
Thanks,
Stephen

carl.champain commented on Tue, 29 Oct 2019 15:34:19 +0000:
Hi again sadithela@assia-inc.com, icruz,
To help us look more into how the deployment reaches this state:
1. Can you please run getShardVersion() on shards which have chunks for the alerts_20191030 collection (or any collection affected by the described behavior) and share the output?
2. In the Java code, what response(s) are returned by the shardCollection commands?
3. Can you provide the logs from the mongos, shard primary and config servers?
Ideally, please provide these all for the same improperly sharded collection. Please upload your files to our secure uploader here. Only MongoDB engineers can view these files and they will expire after a period of time.
Thank you,
Carl

carl.champain commented on Mon, 28 Oct 2019 20:06:55 +0000:
Hi sadithela@assia-inc.com, icruz,
Very sorry for the confusion about the Java code. I re-opened the ticket for additional investigation.
You mentioned that you dropped and re-sharded the collection; I want to make sure that you are aware of SERVER-17397 which provides a way to do so properly.
Back to your initial issue, we are currently investigating what the cause might be and are attempting to reproduce the described behavior. We will keep you updated and will reach out if questions come up.
Kind regards,
Carl

icruz commented on Thu, 24 Oct 2019 16:27:56 +0000:
That difference is due to shardCollection command vs sh.shardCollection helper, which have a different syntax.
Only differences between java code and shell are:
- In java we are using numInitialChunks parameter, while on shell we use the default.
- In java there are several servers issuing a shardCollection command (for different collections) at the same time. Not sure if this concurrent shardCollection might cause the bug.
Please reopen this ticket because it is indeed a server bug, no matter what we do from a client, the DB should not end up with an inconsistent sharding configuration.

carl.champain commented on Thu, 24 Oct 2019 15:09:24 +0000:
Hi sadithela@assia-inc.com,
In the Java code, you are using key as the hashed key:

    new BasicDBObject("shardCollection", collection.getFullName())
        .append("key", new BasicDBObject(shardKey, "hashed"))
        .append("numInitialChunks", shardCount))

This is different from the shell code in which you are using lineId as the hashed key:

    sh.shardCollection('hitron.alerts_20191030',{lineId: 'hashed'},false,{numInitialChunks: 3})

It seems that the Java code should be:

    new BasicDBObject("shardCollection", collection.getFullName())
        .append("lineId", new BasicDBObject(shardKey, "hashed"))
        .append("numInitialChunks", shardCount))

That said, the SERVER project is for bugs and feature suggestions for the MongoDB server. As this ticket does not appear to be a bug, I will now close it. If you need further assistance troubleshooting, I encourage you to ask our community by posting on the mongodb-user group or on Stack Overflow with the mongodb tag.
Kind regards,
Carl

sadithela@assia-inc.com commented on Wed, 23 Oct 2019 18:26:57 +0000:
Hi Carl,
From mongo cli:

    sh.shardCollection('hitron.alerts_20191030',{lineId: 'hashed'},false,{numInitialChunks: 3})

From java code:
Attached code snippet in file: shardcolcreation.txt

carl.champain commented on Wed, 23 Oct 2019 18:15:42 +0000:
Hi sadithela@assia-inc.com,
Thanks for sharing the stats. I still need more details to determine what is happening.
Can you please share some sample code showing how you created the collection via the Java driver and via the Shell?

sadithela@assia-inc.com commented on Tue, 22 Oct 2019 22:13:28 +0000:
Hi Carl,
As this collection (alerts_20191026) has to be sharded for our production systems to work properly, we have dropped that collection and re-created again as a sharded collection from mongo CLI. That collection was initially created by java driver in our software.
Right now, that collection's shard distribution output:

    Shard rs1 at rs1/hitron-db-01a:27018,hitron-db-01b:27018
     data : 0B docs : 0 chunks : 1
     estimated data per chunk : 0B
     estimated docs per chunk : 0

    Shard rs2 at rs2/hitron-db-02b:27018,hitron-db-02c:27018
     data : 0B docs : 0 chunks : 1
     estimated data per chunk : 0B
     estimated docs per chunk : 0

    Shard rs3 at rs3/hitron-db-03a:27018,hitron-db-03b:27018
     data : 0B docs : 0 chunks : 1
     estimated data per chunk : 0B
     estimated docs per chunk : 0

    Totals
     data : 0B docs : 0 chunks : 3
     Shard rs1 contains NaN% data, NaN% docs in cluster, avg obj size on shard : NaNGiB
     Shard rs2 contains NaN% data, NaN% docs in cluster, avg obj size on shard : NaNGiB
     Shard rs3 contains NaN% data, NaN% docs in cluster, avg obj size on shard : NaNGiB

This is the right output we expect when we create a sharded collection.
The collections being created from our software using java driver are still facing the same issue as mentioned in the description of this ticket. Here are results of a similar collection (alerts_20191030):

    mongos> db.alerts_20191030.getShardDistribution()
    Collection hitron.alerts_20191030 is not sharded.

    mongos> db.chunks.find({ns: "hitron.alerts_20191030"})
    { "_id" : "hitron.alerts_20191030-lineId_-3074457345618258602", "lastmod" : Timestamp(2, 0), "lastmodEpoch" : ObjectId("5dabcdd2a664d0a8465ae63b"), "ns" : "hitron.alerts_20191030", "min" : { "lineId" : NumberLong("-3074457345618258602") }, "max" : { "lineId" : NumberLong("3074457345618258602") }, "shard" : "rs2", "history" : [ { "validAfter" : Timestamp(1571540435, 16), "shard" : "rs2" }, { "validAfter" : Timestamp(1571540434, 306), "shard" : "rs1" } ] }
    { "_id" : "hitron.alerts_20191030-lineId_3074457345618258602", "lastmod" : Timestamp(3, 0), "lastmodEpoch" : ObjectId("5dabcdd2a664d0a8465ae63b"), "ns" : "hitron.alerts_20191030", "min" : { "lineId" : NumberLong("3074457345618258602") }, "max" : { "lineId" : { "$maxKey" : 1 } }, "shard" : "rs3", "history" : [ { "validAfter" : Timestamp(1571540435, 208), "shard" : "rs3" }, { "validAfter" : Timestamp(1571540434, 306), "shard" : "rs1" } ] }
    { "_id" : "hitron.alerts_20191030-lineId_MinKey", "lastmod" : Timestamp(3, 1), "lastmodEpoch" : ObjectId("5dabcdd2a664d0a8465ae63b"), "ns" : "hitron.alerts_20191030", "min" : { "lineId" : { "$minKey" : 1 } }, "max" : { "lineId" : NumberLong("-3074457345618258602") }, "shard" : "rs1", "history" : [ { "validAfter" : Timestamp(1571540434, 306), "shard" : "rs1" } ] }

Also, as you have mentioned, I am attaching collstats of alerts_20191030 collection to this ticket: alerts_20191030_stats.txt
Please let me know if you are in need of any other info.
Thank you

Useful Info:
mongo-java-driver version: 2.14.3
mongo-async-driver version: 2.0.2

carl.champain commented on Mon, 21 Oct 2019 17:07:27 +0000:
Hi sadithela@assia-inc.com,
Thank you for the report. Can you please run db.runCommand({ collStats : "alerts_20191026" }) in mongos and share the output here? This will help me better understand what is happening.
Kind regards,
Carl
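
For reference, the point made in this thread that the shardCollection command and the sh.shardCollection helper use different syntax can be illustrated with a short sketch. It assumes, as far as the 4.0 shell helper is understood here, that the helper turns its positional arguments into a command document whose shard key sits under a field literally named `key`. The function `buildShardCollectionCommand` is hypothetical, written for illustration, and not part of the shell or any driver.

```javascript
// Hypothetical helper: builds the admin command document that is assumed
// to correspond to sh.shardCollection(fullName, key, unique, options).
function buildShardCollectionCommand(fullName, key, unique, options) {
  if (typeof fullName !== "string" || !fullName.includes(".")) {
    throw new Error("fullName must look like 'db.collection'");
  }
  if (key === null || typeof key !== "object") {
    throw new Error("key must be an object such as { lineId: 'hashed' }");
  }
  // "key" is the command's field name; the shard key field (lineId)
  // lives inside that sub-document.
  const cmd = { shardCollection: fullName, key: key };
  if (unique) {
    cmd.unique = true;
  }
  // Extra options such as numInitialChunks are merged at the top level.
  return Object.assign(cmd, options || {});
}

const command = buildShardCollectionCommand(
  "hitron.alerts_20191030", { lineId: "hashed" }, false, { numInitialChunks: 3 });
console.log(JSON.stringify(command));
// {"shardCollection":"hitron.alerts_20191030","key":{"lineId":"hashed"},"numInitialChunks":3}
```

Under that assumption, a driver call that appends `"key"` with a `{ <shardKey>: "hashed" }` sub-document has the same shape as the shell helper call quoted in the thread, provided the `shardKey` variable holds `lineId`.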