...
If a test shuts down a node without committing or aborting a prepared transaction, collection validation will run into a prepare conflict. Previously, this would result in a hang. After SERVER-40936, this results in an invariant failure. While this is easy to avoid, it contradicts our general philosophy that we should not be able to hit invariants in tests. It might also be worth investigating if there are any other dangers that might come of this.
xgen-internal-githook commented on Mon, 8 Jul 2019 21:30:45 +0000: Author: {'name': 'Lingzhi Deng', 'username': 'ldennis', 'email': 'lingzhi.deng@mongodb.com'} Message: SERVER-41888: Disallow validate to accept a read concern or afterClusterTime (cherry picked from commit 5915e1114d1e45e79642767c8184d514ac957245) Branch: v4.2 https://github.com/mongodb/mongo/commit/3ed19ce7c51a84999fa85ba5b644f4972e7037cb xgen-internal-githook commented on Mon, 8 Jul 2019 18:34:42 +0000: Author: {'name': 'Lingzhi Deng', 'username': 'ldennis', 'email': 'lingzhi.deng@mongodb.com'} Message: SERVER-41888: Disallow validate to accept a read concern or afterClusterTime Branch: master https://github.com/mongodb/mongo/commit/5915e1114d1e45e79642767c8184d514ac957245 vesselina.ratcheva commented on Thu, 27 Jun 2019 21:07:00 +0000: I left some relevant TODOs in a test I wrote, in this section. judah.schvimer commented on Wed, 26 Jun 2019 15:08:53 +0000: We do not document validate taking a read concern, so this isn't an API change. alyson.cabral, do you see any problems with not allowing validate to accept a read concern or afterClusterTime? geert.bosch commented on Wed, 26 Jun 2019 03:07:49 +0000: I think it is OK for validate to ignore prepare conflicts: we'd ignore the same conflict on both the collection and all indexes, so everything should still be consistent. It is expected that the result of validate does not include uncommitted transactions. geert.bosch commented on Wed, 26 Jun 2019 03:02:01 +0000: One of the reasons validate acquires an X lock is that the storage engine (at least in the case of WiredTiger) may not be able to actually do a full validation if there still are open cursors on the collection. Another reason is that validate may actually write to update some stats, such as document count and storage size, that may not be tracked perfectly in presence of failures and rollbacks, but are recomputed anyway during validation. Rather than trying to improve this, I think it is better to focus on background validation and eventually making that the default and only way to validate collections. judah.schvimer commented on Mon, 24 Jun 2019 18:22:03 +0000: The investigation in SERVER-40723 didn't cover X locks. I don't really remember why not, but I think it was because I assumed we couldn't take them on secondaries. Doing something similar to SERVER-40938 feels appropriate for this ticket. geert.bosch, why does validate acquire an X lock instead of an S lock? Is it ok for it to ignore prepare conflicts? Are there any other X locks we should be worried about? Is it a problem for validate to not accept afterClusterTime? bruce.lucas@10gen.com commented on Mon, 24 Jun 2019 16:58:13 +0000: Sounds like this is something a customer could encounter as well, and hitting an invariant during shutdown would not be an optimal behavior from a customer perspective.
--- a/jstests/replsets/shutdown_with_prepared_transaction.js +++ b/jstests/replsets/shutdown_with_prepared_transaction.js @@ -34,5 +34,5 @@ jsTestLog("Shutting down the set with the transaction still in prepare state"); // Skip validation during ReplSetTest cleanup since validate() will block behind the prepared // transaction's locks when trying to take a collection X lock. - replTest.stopSet(null /*signal*/, false /*forRestart*/, {skipValidation: true}); + replTest.stopSet(null /*signal*/, false /*forRestart*/); }());