...
When a collection is dropped with w:majority, the server is supposed to wait internally for that drop and all earlier pending collection drops to physically complete, i.e., to be completely removed from the catalog. This guarantees that no strong database locks will be taken as a result of asynchronous collection drops after a w:majority drop is acknowledged. In some cases, though, a MODE_X database lock may still be taken as a result of an asynchronous collection drop after a w:majority collection drop has already been acknowledged to a client.

Consider two collections in the same database, test.coll1 and test.coll2. Say a client drops test.coll1 at w:1 and then drops test.coll2 at w:majority. By the time the drop of test.coll2 completes, both test.coll1 and test.coll2 should be guaranteed to be physically dropped. It may still be possible for us to acquire a database lock after the test.coll2 drop completes, though.

Whenever our commit point advances, we notify oplog waiters, which in turn causes us to schedule a call to DropPendingCollectionReaper::dropCollectionsOlderThan if there are drop-pending collections and the new commit point is greater than or equal to the earliest drop-pending optime. So, in the case of our two collections, we may advance our commit point to the optime of the test.coll1 drop, which schedules such a call, and then advance it to the optime of the test.coll2 drop. We now have two outstanding calls to dropCollectionsOlderThan.

It is possible that the first scheduled call starts to run and reads the list of drop-pending namespaces, concluding that it has to drop one collection, i.e., test.coll1. The second scheduled call, though, may end up running first and drop both drop-pending collections, test.coll1 and test.coll2. Because there are now no drop-pending collections, a majority write should be allowed to complete and return to the client.
The originally scheduled call to dropCollectionsOlderThan, though, may then complete, and go through its list of collections (it only has one) and try to drop them. Since coll1 was already physically dropped, this will have no effect, but it will still try to acquire a MODE_X lock on the database, even after the w:majority collection drop completed and returned to the client.
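The interleaving described above can be replayed deterministically in a small model. This is only a sketch under stated assumptions: `ToyReaper`, `snapshot`, `drop`, and the integer optimes are hypothetical names invented for illustration, not the server's real classes or signatures, and each call is assumed to work in two steps (copy the list of eligible drops, then take the lock and drop each entry).

```python
from dataclasses import dataclass, field


@dataclass
class ToyReaper:
    """Toy stand-in for the drop-pending reaper; not the server's real class."""
    pending: dict = field(default_factory=dict)  # drop optime -> namespace
    events: list = field(default_factory=list)   # ordered trace of what happened

    def should_schedule(self, commit_point):
        # Schedule a reaper call only if drops are pending and the commit
        # point has reached the earliest drop-pending optime.
        return bool(self.pending) and commit_point >= min(self.pending)

    def snapshot(self, commit_point):
        # Step 1 of a call: copy the drops at or before the commit point.
        return [(t, ns) for t, ns in sorted(self.pending.items())
                if t <= commit_point]

    def drop(self, caller, snapshot):
        # Step 2 of a call: take the database X lock for every snapshot
        # entry, even if another call already dropped that collection.
        for optime, ns in snapshot:
            self.events.append((caller, "MODE_X", ns))
            self.pending.pop(optime, None)  # no-op if already dropped


def simulate():
    reaper = ToyReaper(pending={1: "test.coll1", 2: "test.coll2"})

    assert reaper.should_schedule(1)   # commit point reaches the coll1 drop
    first = reaper.snapshot(1)         # first call sees only test.coll1
    assert reaper.should_schedule(2)   # commit point reaches the coll2 drop
    second = reaper.snapshot(2)        # second call sees both collections

    reaper.drop("call-2", second)      # second call overtakes the first
    if not reaper.pending:             # nothing pending: w:majority drop returns
        reaper.events.append(("client", "ACK", "test.coll2"))
    reaper.drop("call-1", first)       # stale snapshot: late X lock
    return reaper.events


events = simulate()
```

In this model the last entry of `events` is the first call's MODE_X acquisition on test.coll1, recorded after the client acknowledgement, which is the ordering the ticket reports.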
xiangyu.yao commented on Mon, 28 Jan 2019 22:33:17 +0000: Closing this ticket. We might have future QW tickets that could potentially fix the EMRC=false case.

xiangyu.yao commented on Mon, 28 Jan 2019 20:18:48 +0000: In 4.2, we will use the new two-phase drop so this issue can never happen. In 4.0 and 4.2 with EMRC=false, this can still happen. But there is no easy way to fix it because of the intrinsic flaws in the old two-phase drop algorithm.

geert.bosch commented on Thu, 17 Jan 2019 23:45:38 +0000: judah.schvimer, while this is relevant on 4.0 as well, I do not think we should backport this change. As you indicate, the implementation on 4.0 would have to be significantly different, and therefore would require extra 4.0-only work. Significant changes to locking semantics in stable releases are risky. Any fix would affect all uses of MongoDB 4.0, while the locking behavior is really only an issue with applications that use interactive transactions, where the extra locking would degrade performance in the database where a collection is dropped.

milkie commented on Tue, 15 Jan 2019 14:45:46 +0000: In the description of this ticket, it doesn't explain why the acquisition of the MODE_X lock is undesirable. Does the undesirability extend to 4.0 and 3.6?

judah.schvimer commented on Tue, 15 Jan 2019 14:39:00 +0000: I think that this can happen on 4.0 as well (I don't think 3.6, but maybe). Do we want to backport it? I expect a backport might look significantly different from the master version.

geert.bosch commented on Mon, 14 Jan 2019 18:44:25 +0000: Indeed. It would make sense to have this be part of that project.

tess.avitabile commented on Mon, 14 Jan 2019 18:42:35 +0000: milkie, geert.bosch, will this issue go away once we do two-phase drops in WT? My understanding is that after we have two-phase drops in WT, we will never take locks for a drop after the drop command returns.
william.schultz commented on Fri, 21 Dec 2018 19:15:20 +0000: Note that this appeared as an issue for transaction-related tests, since many of them rely on the fact that a w:majority drop guarantees no more strong locks will be taken in the middle of the test, where such locks could interfere and potentially deadlock with locks taken by transactions.

william.schultz commented on Fri, 21 Dec 2018 19:14:01 +0000: This is somewhat difficult to reproduce, but it is possible to do by adding sleeps in the server. By adding a sleep of around 100 milliseconds at this line and running the attached script, two_phase_drop_locks.js, it is possible to see this issue appear. If there are two scheduled calls to dropCollectionsOlderThan, where the first one tries to drop one collection and the second tries to drop two collections, the issue can be exposed by sleeping a while during the first call, but not the second. One way to trigger this is by randomly sleeping 0 or 100 milliseconds at the mentioned point in the code. After several runs, it becomes fairly likely to see a case where the first scheduled thread sleeps 100 milliseconds and the second scheduled thread sleeps 0 milliseconds, causing an X lock acquisition well after the w:majority drop already succeeded.
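To see why a few runs with random sleeps are usually enough, the four sleep combinations can be enumerated in a rough timing model. This is a back-of-the-envelope sketch, not a measurement: it assumes the injected sleep sits between a call reading the drop-pending list and taking its lock, that each sleep is an independent fair choice between 0 and 100 ms, and that the second call is scheduled a hypothetical 10 ms after the first (`SCHEDULE_GAP_MS` is an invented parameter).

```python
from itertools import product

SLEEPS_MS = (0, 100)     # the two injected sleep durations from the comment
SCHEDULE_GAP_MS = 10     # assumed delay between the two scheduled calls


def late_lock(sleep_first_ms, sleep_second_ms, gap_ms=SCHEDULE_GAP_MS):
    """True if the first call takes its lock after the second call has run."""
    first_locks_at = sleep_first_ms
    second_locks_at = gap_ms + sleep_second_ms
    return first_locks_at > second_locks_at


# Evaluate every (first call sleep, second call sleep) combination.
outcomes = {pair: late_lock(*pair) for pair in product(SLEEPS_MS, repeat=2)}

# Share of combinations that expose the late X lock in a single run.
p_single_run = sum(outcomes.values()) / len(outcomes)


def p_within(runs, p=p_single_run):
    """Probability of seeing the late lock at least once in `runs` runs."""
    return 1 - (1 - p) ** runs
```

Under these assumptions only the (100, 0) combination exposes the late lock, so one run in four is expected to hit it, and `p_within(10)` is about 0.94. A different scheduling gap or sleep placement would change the numbers but not the shape of the argument.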