...
The ChunkType::genID method uses the BSONElement::toString method, which was changed to provide better formatting for UUID BinData. Unfortunately, ChunkType::genID is used throughout the sharding-related code as the value of the "_id" field in the "config.chunks" collection. When the chunk's minimum bound contains a UUID value, the "_id" values generated by v3.6 and by v3.4 (and earlier versions) differ. We hit this when trying to move chunks manually in a cluster we recently upgraded from v3.4 to v3.6:

2019-03-10T16:54:39.264+0300 I COMMAND [conn469729] command admin.$cmd appName: "MongoDB Shell" command: _configsvrMoveChunk { _configsvrMoveChunk: 1, _id: "a.fs.chunks-files_id_UUID("05660000-0000-e000-96c2-dc81ca6fa911")n_0", ns: "a.fs.chunks", min: { files_id: UUID("05660000-0000-e000-96c2-dc81ca6fa911"), n: 0 }, max: { files_id: UUID("05666100-0000-e000-9252-7b82dea0b186"), n: 3 }, shard: "driveFS-2", lastmod: Timestamp(571033, 1), lastmodEpoch: ObjectId('51793868331d54dfcf8e0032'), toShard: "driveFS-17", maxChunkSizeBytes: 536870912, secondaryThrottle: {}, waitForDelete: false, writeConcern: { w: "majority", wtimeout: 15000 }, lsid: { id: UUID("605f316b-6296-4010-9f26-835b60f923ff"), uid: BinData(0, EE3A53D0CA965E6112DBEBF842D31DC81E8CE7E7548256DE28D08422B2C59D3B) }, $replData: 1, $clusterTime: { clusterTime: Timestamp(0, 0), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $client: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "3.6.9" }, os: { type: "Windows", name: "Microsoft Windows 10", architecture: "x86_64", version: "10.0 (build 17134)" }, mongos: { host: "dorado2:27017", client: "10.254.3.70:1334", version: "3.6.10" } }, $configServerState: { opTime: { ts: Timestamp(1552225736, 38), t: 95 } }, $db: "admin" } exception: Chunk move was not successful due to E11000 duplicate key error collection: config.chunks index: ns_1_min_1 dup key: { : "a.fs.chunks", : { files_id: UUID("05660000-0000-e000-96c2-dc81ca6fa911"), n: 0 } } code:DuplicateKey numYields:0 reslen:562 locks:{ Global: { acquireCount: { r: 10, w: 6 } }, Database: { acquireCount: { r: 2, w: 6 } }, Collection: { acquireCount: { r: 2, w: 3 } }, oplog: { acquireCount: { w: 3 } } } protocol:op_msg 340766ms

Of course, the "config.chunks" collection contains this:

> db.chunks.find({ns:"a.fs.chunks",min: { files_id: UUID("05660000-0000-e000-96c2-dc81ca6fa911"), n: 0 } })
{ "_id" : "a.fs.chunks-files_id_BinData(4, 056600000000E00096C2DC81CA6FA911)n_0", "lastmod" : Timestamp(539637, 1290), "lastmodEpoch" : ObjectId("51793868331d54dfcf8e0032"), "ns" : "a.fs.chunks", "min" : { "files_id" : UUID("05660000-0000-e000-96c2-dc81ca6fa911"), "n" : 0 }, "max" : { "files_id" : UUID("05666100-0000-e000-9252-7b82dea0b186"), "n" : 3 }, "shard" : "driveFS-2" }

Since I do not know what other operations use the "_id" field, I cannot estimate the full scope of this problem, but a cursory inspection of the codebase shows at least some places where the update is performed without checking the number of matched/modified documents, so the metadata (that is, the chunk structure) could be silently lost or damaged.
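To make the formatting difference concrete, here is a minimal mongo shell (JavaScript) sketch that rebuilds both "_id" strings from the concatenation pattern visible in the values above. The chunkId helper is hypothetical; it is not the server's actual ChunkType::genID implementation, only an illustration of the observed behavior.

// Hypothetical helper mirroring the "_id" concatenation pattern seen above;
// NOT the server's actual ChunkType::genID code.
function chunkId(ns, stringifiedMinFields) {
    var id = ns + "-";
    stringifiedMinFields.forEach(function (f) {
        id += f.name + "_" + f.value;
    });
    return id;
}

var hex = "056600000000E00096C2DC81CA6FA911";            // raw BinData bytes
var dashed = "05660000-0000-e000-96c2-dc81ca6fa911";      // same bytes in UUID notation

// v3.4 BSONElement::toString rendered the subtype-4 BinData as BinData(4, <hex>):
var idV34 = chunkId("a.fs.chunks", [
    { name: "files_id", value: "BinData(4, " + hex + ")" },
    { name: "n", value: "0" }
]);

// v3.6 renders the same value as UUID("<dashed>") after the toString change:
var idV36 = chunkId("a.fs.chunks", [
    { name: "files_id", value: 'UUID("' + dashed + '")' },
    { name: "n", value: "0" }
]);

print(idV34);           // a.fs.chunks-files_id_BinData(4, 056600000000E00096C2DC81CA6FA911)n_0
print(idV36);           // a.fs.chunks-files_id_UUID("05660000-0000-e000-96c2-dc81ca6fa911")n_0
print(idV34 === idV36); // false -- the same chunk gets two different "_id" strings

The first string matches the "_id" stored in config.chunks by v3.4, the second matches the "_id" the v3.6 code computes for the same chunk, which is why the lookup by "_id" misses the existing document.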
amardeepsg@gmail.com commented on Mon, 15 Apr 2019 15:07:34 +0000:
Any chance of this getting closed in 3.6.7? Also, what are the alternative ways for us to ensure shard data balancing if empty chunks continue to exist and we can't merge them?

onyxmaster commented on Sat, 16 Mar 2019 07:33:16 +0000:
Kaloian, thank you! I'm relieved to know that there is no data loss involved. Thank you for investigating this; looking forward to the fix. I hope this gets backported to 3.6 too, since we're not ready to move to 4.x yet (at least until this summer).

kaloian.manassiev commented on Fri, 15 Mar 2019 20:21:47 +0000:
Hi onyxmaster,
Thank you very much for uploading your config server's metadata! Thanks to it I was able to reproduce the problem locally, and we now have everything we need to continue investigating this issue, so feel free to remove the upload. For now I don't have anything to update, but we will post the results of the investigation here. Thanks again for your help!
Best regards,
-Kal.
PS: I also edited the ticket title to indicate that there is no actual data loss.

onyxmaster commented on Mon, 11 Mar 2019 13:13:12 +0000:
There is nothing specifically sensitive in this data, so I uploaded it at -https://drive-public-eu.s3.amazonaws.com/mongodb/server-40061-config.zip- (10MB). Still, I would prefer not to keep it up longer than needed, so please tell me when you no longer need it to be online.

kaloian.manassiev commented on Mon, 11 Mar 2019 12:56:48 +0000:
Thank you very much for the prompt response. From the message, this looks like a duplicate key that was generated in the ns, min index, not the _id. Would it be possible to attach a dump of your config database so I can recreate the cluster (without data) and try the move command that you are issuing? If attaching it to this ticket poses any concern on your side, we can provide a secure upload portal to which only MongoDB employees have access.

onyxmaster commented on Mon, 11 Mar 2019 12:51:41 +0000:
I'd also like to note that this problem reproduces in 100% of cases for our cluster that was upgraded from v3.4 – all chunks that have a UUID field in the "min" key fail at migration commit every time. I believe splits are failing too, because I don't see any splits in the logs after the upgrade (although I might be looking in the wrong place, since IIRC v3.6 moved a lot of chunk management onto the shards themselves).

onyxmaster commented on Mon, 11 Mar 2019 12:48:26 +0000:
As for my data loss concerns, Balancer::_splitOrMarkJumbo will probably always silently fail to mark chunks as jumbo (no data loss, but suboptimal), and ShardingCatalogManager::commitChunkSplit will probably never be able to split a chunk (I'm not sure how the failure will be handled, since it's an upsert).

onyxmaster commented on Mon, 11 Mar 2019 12:42:55 +0000:
No, I'm not using the internal command; I'm issuing the sh.moveChunk command from the shell (or the "moveChunk" command from our C# application). Consider the following log from one of the shard primaries: ...
2019-03-10T16:54:38.231+0300 I SHARDING [conn788179] moveChunk data transfer progress: { waited: true, active: true, sessionId: "driveFS-2_driveFS-17_5c8515ca7bc16fcff631675b", ns: "a.fs.chunks", from: "driveFS-2/d1.s2.fs.drive.bru:27022,d2.s2.fs.drive.bru:27022,d3.s2.fs.drive.bru:27022", min: { files_id: UUID("05660000-0000-e000-96c2-dc81ca6fa911"), n: 0 }, max: { files_id: UUID("05666100-0000-e000-9252-7b82dea0b186"), n: 3 }, shardKeyPattern: { files_id: 1.0, n: 1.0 }, state: "clone", counts: { cloned: 2576, clonedBytes: 346861214, catchup: 0, steady: 0 }, ok: 1.0, $gleStats: { lastOpTime: Timestamp(0, 0), electionId: ObjectId('7fffffff000000000000000f') }, $configServerState: { opTime: { ts: Timestamp(1552226074, 5), t: 95 } } } mem used: 0 documents remaining to clone: 0
2019-03-10T16:54:38.598+0300 I SHARDING [conn788179] moveChunk data transfer progress: { waited: true, active: true, sessionId: "driveFS-2_driveFS-17_5c8515ca7bc16fcff631675b", ns: "a.fs.chunks", from: "driveFS-2/d1.s2.fs.drive.bru:27022,d2.s2.fs.drive.bru:27022,d3.s2.fs.drive.bru:27022", min: { files_id: UUID("05660000-0000-e000-96c2-dc81ca6fa911"), n: 0 }, max: { files_id: UUID("05666100-0000-e000-9252-7b82dea0b186"), n: 3 }, shardKeyPattern: { files_id: 1.0, n: 1.0 }, state: "steady", counts: { cloned: 2576, clonedBytes: 346861214, catchup: 0, steady: 0 }, ok: 1.0, $gleStats: { lastOpTime: Timestamp(0, 0), electionId: ObjectId('7fffffff000000000000000f') }, $configServerState: { opTime: { ts: Timestamp(1552226074, 5), t: 95 } } } mem used: 0 documents remaining to clone: 0
2019-03-10T16:54:38.635+0300 I SHARDING [conn788179] Migration successfully entered critical section
2019-03-10T16:54:39.011+0300 I SHARDING [conn788179] Error occurred while committing the migration. Performing a majority write against the config server to obtain its latest optime :: caused by :: DuplicateKey: E11000 duplicate key error collection: config.chunks index: ns_1_min_1 dup key: { : "a.fs.chunks", : { files_id: UUID("05660000-0000-e000-96c2-dc81ca6fa911"), n: 0 } }
2019-03-10T16:54:39.011+0300 I SHARDING [conn788179] about to log metadata event into changelog: { _id: "dorado3-2019-03-10T16:54:39.011+0300-5c85171f7bc16fcff6319a41", server: "dorado3", clientAddr: "10.3.1.11:51332", time: new Date(1552226079011), what: "moveChunk.validating", ns: "a.fs.chunks", details: { min: { files_id: UUID("05660000-0000-e000-96c2-dc81ca6fa911"), n: 0 }, max: { files_id: UUID("05666100-0000-e000-9252-7b82dea0b186"), n: 3 }, from: "driveFS-2", to: "driveFS-17" } }
2019-03-10T16:54:39.187+0300 I SHARDING [conn788179] about to log metadata event into changelog: { _id: "dorado3-2019-03-10T16:54:39.187+0300-5c85171f7bc16fcff6319a4e", server: "dorado3", clientAddr: "10.3.1.11:51332", time: new Date(1552226079187), what: "moveChunk.error", ns: "a.fs.chunks", details: { min: { files_id: UUID("05660000-0000-e000-96c2-dc81ca6fa911"), n: 0 }, max: { files_id: UUID("05666100-0000-e000-9252-7b82dea0b186"), n: 3 }, from: "driveFS-2", to: "driveFS-17" } }
2019-03-10T16:54:39.190+0300 W SHARDING [conn788179] Chunk move failed :: caused by :: DuplicateKey: Chunk move was not successful due to E11000 duplicate key error collection: config.chunks index: ns_1_min_1 dup key: { : "a.fs.chunks", : { files_id: UUID("05660000-0000-e000-96c2-dc81ca6fa911"), n: 0 } }
2019-03-10T16:54:39.190+0300 I SHARDING [conn788179] about to log metadata event into changelog: { _id: "dorado3-2019-03-10T16:54:39.190+0300-5c85171f7bc16fcff6319a50", server: "dorado3", clientAddr: "10.3.1.11:51332", time: new Date(1552226079190), what: "moveChunk.from", ns: "a.fs.chunks", details: { min: { files_id: UUID("05660000-0000-e000-96c2-dc81ca6fa911"), n: 0 }, max: { files_id: UUID("05666100-0000-e000-9252-7b82dea0b186"), n: 3 }, step 1 of 6: 0, step 2 of 6: 88, step 3 of 6: 90, step 4 of 6: 339694, step 5 of 6: 386, to: "driveFS-17", from: "driveFS-2", note: "aborted" } }

kaloian.manassiev commented on Mon, 11 Mar 2019 12:36:20 +0000:
Hi onyxmaster,
Thank you very much for your report and your concern. First, I would like to alleviate your concerns about data loss: the _id field in the config.chunks collection is not used for routing of queries and/or updates, but only for generating chunk modifications internally (SERVER-27735). So even if there is a change in how it is generated, it will not cause any operations to be mis-routed or the balancer to move chunks to the wrong shards. Rest assured that there is no data loss.
For the DuplicateKey error: I notice that the application name for the _configsvrMoveChunk command which encountered it is "MongoDB Shell". Are you issuing this command directly against the config server? This is an internal command which should not be used directly; instead, moveChunk against MongoS should be used.
Best regards,
-Kal.

onyxmaster commented on Mon, 11 Mar 2019 07:28:34 +0000:
The "changed" link in the description is wrong, it should be this one.