...
A (albeit unlikely) crash could occur in the following scenario:

1. A resharding recipient is in kCreatingCollection and receives an abort after creating the collection.
2. The recipient cleans up the temporary collection (which means the collection gets renamed to a drop-pending collection, keeping the same UUID) but steps down before it can persist its transition to kDone.
3. The recipient steps back up, still in kCreatingCollection.
4. Before the coordinator re-sends _shardsvrAbortReshardCollection to the new primary, the recipient state machine begins to run again.
5. ensureTempReshardingCollectionExistsWithIndexes will then cause an invariant to be hit, since the drop-pending renamed temporary collection with the ReshardingUUID is still in the catalog.
JIRAUSER1259052 commented on Wed, 6 Oct 2021 18:28:54 +0000:
Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

xgen-internal-githook commented on Fri, 20 Aug 2021 16:06:59 +0000:
Author: {'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}
Message: SERVER-58603 ensureTempReshardingCollectionExistsWithIndexes may hit an invariant if collection was previously dropped (cherry picked from commit 6bf3ad77bb5bcc9c07ef38110e687d3a55ef40f7)
Branch: v5.0
https://github.com/mongodb/mongo/commit/f2233a69550028c0a3dc4b15e040026cdba8bf1e

xgen-internal-githook commented on Thu, 29 Jul 2021 15:47:45 +0000:
Author: {'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}
Message: SERVER-58603 ensureTempReshardingCollectionExistsWithIndexes may hit an invariant if collection was previously dropped
Branch: master
https://github.com/mongodb/mongo/commit/6bf3ad77bb5bcc9c07ef38110e687d3a55ef40f7

max.hirschhorn@10gen.com commented on Wed, 21 Jul 2021 14:43:00 +0000:
I think a solution here could be for the RecipientStateMachine to throw an exception in the situation where a namespace with the reshardingUUID already exists and has a namespace string other than the temporary resharding collection. It is fine to drop and reacquire locks around this check because only the thread calling RecipientStateMachine::run() would ever create or drop the temporary resharding collection, so there's no real concurrency for these operations.
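
The check proposed in the last comment can be illustrated with a small toy model. This is only a sketch in Python, not the server's C++ code and not the committed fix: `ToyCatalog`, `ensure_temp_collection_exists`, `TempCollectionUUIDConflict`, and the namespace strings are all hypothetical names made up for illustration, and the drop-pending rename is simplified to a fixed `system.drop.` prefix.

```python
import uuid
from dataclasses import dataclass, field


class TempCollectionUUIDConflict(Exception):
    """Hypothetical error: the resharding UUID maps to an unexpected namespace."""


@dataclass
class ToyCatalog:
    # Maps collection UUID -> current namespace string (toy stand-in for a catalog).
    by_uuid: dict = field(default_factory=dict)

    def create(self, nss, coll_uuid):
        self.by_uuid[coll_uuid] = nss

    def drop(self, coll_uuid):
        # Simplified drop: rename to a drop-pending namespace and keep the
        # same UUID registered, as described in the scenario above.
        db, coll = self.by_uuid[coll_uuid].split(".", 1)
        self.by_uuid[coll_uuid] = f"{db}.system.drop.{coll}"


def ensure_temp_collection_exists(catalog, temp_nss, resharding_uuid):
    """Create the temporary collection unless the UUID is already registered.

    Raises an exception (rather than tripping an invariant) when the UUID
    exists under a namespace other than the temporary resharding namespace.
    """
    existing_nss = catalog.by_uuid.get(resharding_uuid)
    if existing_nss is None:
        catalog.create(temp_nss, resharding_uuid)
        return "created"
    if existing_nss != temp_nss:
        raise TempCollectionUUIDConflict(
            f"UUID {resharding_uuid} already belongs to {existing_nss}, "
            f"expected {temp_nss}"
        )
    return "already exists"


if __name__ == "__main__":
    catalog = ToyCatalog()
    resharding_uuid = uuid.uuid4()
    temp_nss = "test.system.resharding.example"  # hypothetical namespace

    print(ensure_temp_collection_exists(catalog, temp_nss, resharding_uuid))
    catalog.drop(resharding_uuid)  # abort cleanup; step-down happens before kDone
    try:
        # After step-up, the state machine runs again from kCreatingCollection.
        ensure_temp_collection_exists(catalog, temp_nss, resharding_uuid)
    except TempCollectionUUIDConflict as exc:
        print("recoverable error:", exc)
```

In this model the second call surfaces an ordinary exception that a caller could handle or retry once the abort is re-sent, which is the behavior the comment argues for in place of a process-terminating invariant.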