...
BugZero found this defect 2592 days ago.
The node performing the initial sync appears to be able to retain the documents that were inserted prior to the collection being dropped and re-created after changing the featureCompatibilityVersion to 3.6. This issue is related to UUIDs and their impact on oplog application, and therefore doesn't affect the 3.2 or 3.4 branches.

2017-09-10T19:21:06.495-0400 The following documents are missing on the primary:
2017-09-10T19:21:06.495-0400 { "_id" : "while in fCV=3.4" }
...
2017-09-10T19:21:06.498-0400 checkReplicatedDataHashes, the primary and secondary have a different hash for the test database: {
2017-09-10T19:21:06.498-0400     "master" : {
2017-09-10T19:21:06.499-0400         "host" : "hanamizu:20010",
2017-09-10T19:21:06.499-0400         "collections" : {
2017-09-10T19:21:06.499-0400             "mycoll" : "09aabf5621c57d91db16b98b365d8e65"
2017-09-10T19:21:06.499-0400         },
2017-09-10T19:21:06.499-0400         "md5" : "2105eeb0b1ec2ade59f08fa1f3f40ba9",
2017-09-10T19:21:06.499-0400         "timeMillis" : 0,
2017-09-10T19:21:06.499-0400         "ok" : 1,
2017-09-10T19:21:06.499-0400         "operationTime" : Timestamp(1505085665, 18)
2017-09-10T19:21:06.499-0400     },
2017-09-10T19:21:06.499-0400     "slaves" : [
2017-09-10T19:21:06.499-0400         {
2017-09-10T19:21:06.500-0400             "host" : "hanamizu:20011",
2017-09-10T19:21:06.500-0400             "collections" : {
2017-09-10T19:21:06.500-0400                 "mycoll" : "b8b6211fb0b559d95ae6df5cc4071420"
2017-09-10T19:21:06.500-0400             },
2017-09-10T19:21:06.500-0400             "md5" : "072bbaef3649d98b3270e6a2a6eac21f",
2017-09-10T19:21:06.500-0400             "timeMillis" : 0,
2017-09-10T19:21:06.500-0400             "ok" : 1,
2017-09-10T19:21:06.500-0400             "operationTime" : Timestamp(1505085665, 18)
2017-09-10T19:21:06.500-0400         }
2017-09-10T19:21:06.500-0400     ]
2017-09-10T19:21:06.500-0400 }
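To make the hash output above easier to read, here is a minimal sketch of how the mismatched collections could be picked out of it. The helper name `findMismatchedCollections` is hypothetical and not part of the test framework; it only assumes the "master"/"slaves" shape printed in the log, reduced to the "host" and "collections" fields.

```javascript
// Hypothetical helper (not part of ReplSetTest): lists collections whose hash
// on a secondary differs from the primary's hash for the same collection.
// Input mirrors the "master"/"slaves" structure shown in the log output.
function findMismatchedCollections(hashes) {
    const mismatches = [];
    const primaryCollections = hashes.master.collections;
    for (const secondary of hashes.slaves) {
        // Consider collections that exist on either node.
        const names = new Set([
            ...Object.keys(primaryCollections),
            ...Object.keys(secondary.collections),
        ]);
        for (const name of names) {
            if (primaryCollections[name] !== secondary.collections[name]) {
                mismatches.push({host: secondary.host, collection: name});
            }
        }
    }
    return mismatches;
}

// Values copied from the log output in this ticket.
const mismatched = findMismatchedCollections({
    master: {
        host: "hanamizu:20010",
        collections: {mycoll: "09aabf5621c57d91db16b98b365d8e65"},
    },
    slaves: [{
        host: "hanamizu:20011",
        collections: {mycoll: "b8b6211fb0b559d95ae6df5cc4071420"},
    }],
});
console.log(mismatched);
```

With the values from this ticket, the only entry reported is "mycoll" on hanamizu:20011, which matches the collection the repro script drops and re-creates.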
xgen-internal-githook commented on Mon, 9 Oct 2017 15:20:08 +0000:
Author: {'email': 'judah@mongodb.com', 'name': 'Judah Schvimer', 'username': 'judahschvimer'}
Message: SERVER-31019 fail initial sync if fCV changes during oplog application
Branch: master
https://github.com/mongodb/mongo/commit/d7a30a716243db13644a16618a939df6bc1344fc

spencer commented on Fri, 15 Sep 2017 16:22:06 +0000:
To fix this we should just fail initial sync if the featureCompatibilityVersion changes in the middle of it. To do this, we should make sure that the very first collection we clone is admin.system.version, so that we know the FCV of the sync source at the beginning of initial sync. Then during initial sync oplog application, we should fail and restart initial sync if we replicate a change to the FCV.
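The second half of the proposal above (fail when a replicated operation changes the FCV) can be sketched roughly as follows. This is an illustration only, not the code from the linked commit: the function names are hypothetical, oplog entries are reduced to the "op", "ns", "o", and "o2" fields, and it assumes the FCV is stored in admin.system.version as the document with _id "featureCompatibilityVersion".

```javascript
// Illustrative sketch only; names are hypothetical and the oplog entry shape
// is simplified to the fields used here.
const FCV_NAMESPACE = "admin.system.version";
const FCV_DOC_ID = "featureCompatibilityVersion";

// Returns true if the oplog entry inserts, updates, or deletes the FCV document.
function touchesFcvDocument(entry) {
    if (entry.ns !== FCV_NAMESPACE) {
        return false;
    }
    // Updates carry the target _id in "o2"; inserts and deletes carry it in "o".
    const target = entry.op === "u" ? entry.o2 : entry.o;
    return target !== undefined && target !== null && target._id === FCV_DOC_ID;
}

// Applies a batch of entries, aborting as soon as an FCV change is seen so
// that the caller can restart initial sync from scratch.
function applyInitialSyncBatch(entries, applyEntry) {
    for (const entry of entries) {
        if (touchesFcvDocument(entry)) {
            throw new Error(
                "featureCompatibilityVersion changed during initial sync; restart initial sync");
        }
        applyEntry(entry);
    }
}
```

In this sketch the check runs before each entry is applied, so no operation at or after the FCV change reaches the node's data before the attempt is abandoned.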
I've only had success in reproducing this issue with the MMAPv1 storage engine, and not with the WiredTiger or EphemeralForTest storage engines; however, it isn't clear to me why this issue would be storage engine-specific.

python buildscripts/resmoke.py --suites=no_server repro_server31019.js --storageEngine=mmapv1 --repeat=5

repro_server31019.js

(function() {
    "use strict";

    const verbositySettings = tojson({
        verbosity: 1,
        replication: 2,
        storage: 2,
    });

    const rst = new ReplSetTest({
        nodes: 1,
        nodeOptions: {
            setParameter: {logComponentVerbosity: verbositySettings},
        }
    });
    rst.startSet();
    rst.initiate();

    const primaryDB = rst.getPrimary().getDB("test");

    rst.add({
        setParameter: {
            "failpoint.initialSyncHangBeforeCopyingDatabases": tojson({mode: "alwaysOn"}),
            logComponentVerbosity: verbositySettings
        }
    });

    // We disallow the secondary node from voting so that the primary's featureCompatibilityVersion
    // can be modified while the secondary node is still waiting to complete its initial sync.
    {
        const replSetConfig = rst.getReplSetConfigFromNode(0);
        replSetConfig.members = rst.getReplSetConfig().members;
        replSetConfig.members[1].priority = 0;
        replSetConfig.members[1].votes = 0;
        ++replSetConfig.version;
        assert.commandWorked(primaryDB.adminCommand({replSetReconfig: replSetConfig}));
    }

    // We set the primary's featureCompatibilityVersion to "3.4" and implicitly create a collection
    // without a UUID via an insert operation.
    {
        assert.commandWorked(primaryDB.adminCommand({setFeatureCompatibilityVersion: "3.4"}));
        primaryDB.mycoll.drop();
        assert.writeOK(primaryDB.mycoll.insert({_id: "while in fCV=3.4"}));
    }

    // Next, we set the primary's featureCompatibilityVersion to "3.6" and drop the collection that
    // was previously created. We then implicitly create another collection of the same name (but
    // with a UUID this time) via an insert operation.
    {
        assert.commandWorked(primaryDB.adminCommand({setFeatureCompatibilityVersion: "3.6"}));
        primaryDB.mycoll.drop();
        assert.writeOK(primaryDB.mycoll.insert({_id: "while in fCV=3.6"}));
    }

    // Finally, we allow the secondary node to proceed with its initial sync. It should end up with
    // only the document that was inserted into the collection when the primary's
    // featureCompatibilityVersion was "3.6".
    const secondaryDB = rst.getSecondary().getDB("test");
    assert.commandWorked(secondaryDB.adminCommand({
        configureFailPoint: "initialSyncHangBeforeCopyingDatabases",
        mode: "off",
    }));

    rst.checkReplicatedDataHashes();
    rst.stopSet();
})();