...
It seems like attempting to run ReplicationCoordinator::stepDown() is unnecessary when the replica set configuration is known to only contain one node electable as primary. The extra time it takes to shut down the replica set is mildly annoying for certain aspects of my local development workflow.

2019-03-21T03:09:22.009-0400 I CONTROL [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
2019-03-21T03:09:22.009-0400 I REPL [RstlKillOpthread] Starting to kill user operations
2019-03-21T03:09:22.009-0400 I REPL [RstlKillOpthread] Stopped killing user operations
2019-03-21T03:09:32.020-0400 I REPL [RstlKillOpthread] Starting to kill user operations
2019-03-21T03:09:32.020-0400 I REPL [RstlKillOpthread] Stopped killing user operations
2019-03-21T03:09:32.020-0400 I STORAGE [signalProcessingThread] Failed to stepDown in non-command initiated shutdown path ExceededTimeLimit: No electable secondaries caught up as of 2019-03-21T03:09:32.020-0400. Please use the replSetStepDown command with the argument {force: true} to force node to step down.
2019-03-21T03:09:32.020-0400 I NETWORK [signalProcessingThread] shutdown: going to close listening sockets...
jason.carey commented on Mon, 15 Apr 2019 14:33:46 +0000:
Closing this out after the change made in SERVER-40335. I think that satisfies the intent of this ticket.

vesselina.ratcheva commented on Mon, 25 Mar 2019 23:00:15 +0000:
I think the fix Jason pointed out in the topology coordinator is the way to go implementation-wise (it can also be made in isSafeToStepDown), provided we come to a consensus about user-facing behavior. In the same spirit as the proposition in SERVER-40335, I would also propose making a parameter to gate that new behavior directly in topo instead.

jason.carey commented on Mon, 25 Mar 2019 21:14:31 +0000:
After some reflection (and conversation with max.hirschhorn), I'm going to features we're not sure of this, for now. If we don't want to tackle allowing shutdown in more configurations, we should probably just make the timeout configurable (and make it 0 for most tests). I've opened SERVER-40335 to explore that avenue.

schwerin commented on Fri, 22 Mar 2019 16:38:30 +0000:
Absolutely. My point is we shouldn't fix this regression by trading it for another user-facing behavior change without considering it.

daniel.hatcher commented on Fri, 22 Mar 2019 13:44:56 +0000:
If SERVER-38994 is what caused this, there is an argument to be made that as-is it's a client-facing regression (albeit a small one).

schwerin commented on Fri, 22 Mar 2019 04:01:04 +0000:
I am reluctant to change the user-facing behavior of the stepDown and shutDown commands in this instance to make our tests run faster. I made a conscious decision to require the user to force shutdown whenever there is no other electable node. At the very least, we should let product weigh in. We might also have to update the documentation.

max.hirschhorn@10gen.com commented on Thu, 21 Mar 2019 17:19:51 +0000:
FWIW, I filed this ticket because of my use of 1-node replica sets locally, but I think the change should apply to any replica set where electableCount == 1.
Stepping down a single voting replica set may still be useful for testing purposes, i.e. to have the primary actually transition to state SECONDARY, but to just skip the election handoff part.

jason.carey commented on Thu, 21 Mar 2019 15:31:10 +0000:
I think the fix here is to make repl coordinator stepDown, or topology coordinator attemptStepDown, return quickly if the configured set has 1 node. That would fix the slowness on sigterm, and make the shutdown command do something sane for 1 node repl sets. At a glance, I'd probably change https://github.com/mongodb/mongo/blob/2a4d8ed5bb64af081b887f17dabf298831866b1d/src/mongo/db/repl/topology_coordinator.cpp#L2237

    bool TopologyCoordinator::_canCompleteStepDownAttempt(Date_t now, Date_t waitUntil, bool force) {
        const bool forceNow = force && (now >= waitUntil);
        if (forceNow) {
            return true;
        }

        return isSafeToStepDown();
    }

so that there is an additional check for single node sets.

judah.schvimer commented on Thu, 21 Mar 2019 15:27:00 +0000:
This feels pretty costly in terms of evergreen time spent. CC mira.carey@mongodb.com for any thoughts.

max.hirschhorn@10gen.com commented on Thu, 21 Mar 2019 14:28:11 +0000:
I would vote for changing the replSetStepDown command because you also cannot use the shutdown command without force=true to shut down a 1-node replica set.
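
For illustration only, the "additional check for single node sets" suggested for _canCompleteStepDownAttempt in the thread above could look roughly like the standalone sketch below. This is not MongoDB source and not necessarily what was merged (the ticket was closed in favor of SERVER-40335). StepDownModel, electableNodeCount, and caughtUpElectableSecondary are hypothetical names, std::chrono stands in for Date_t, and the real isSafeToStepDown() logic is reduced to a single flag. It follows the electableCount == 1 framing from the comments rather than a literal node count.

```cpp
#include <cassert>
#include <chrono>

// Assumption: std::chrono replaces MongoDB's Date_t in this standalone model.
using TimePoint = std::chrono::steady_clock::time_point;

// Hypothetical, simplified stand-in for the TopologyCoordinator state
// that the step-down decision depends on.
struct StepDownModel {
    // Assumption: nodes eligible to become primary, including this one.
    int electableNodeCount;
    // Assumption: collapses the real isSafeToStepDown() checks into one flag.
    bool caughtUpElectableSecondary;

    bool isSafeToStepDown() const {
        return caughtUpElectableSecondary;
    }

    bool canCompleteStepDownAttempt(TimePoint now, TimePoint waitUntil, bool force) const {
        // The current primary is itself electable, so the count is at least 1.
        assert(electableNodeCount >= 1);

        // Existing behavior: a forced step-down completes once the wait expires.
        const bool forceNow = force && (now >= waitUntil);
        if (forceNow) {
            return true;
        }

        // Proposed addition: with no other electable node there is nobody to
        // hand off to, so waiting for a secondary to catch up cannot succeed.
        if (electableNodeCount == 1) {
            return true;
        }

        return isSafeToStepDown();
    }
};
```

In this model a set with one electable node completes the attempt immediately, while a larger set still waits for a caught-up electable secondary or for a forced step-down to reach its deadline. Whether that is acceptable user-facing behavior is exactly the open question raised in the comments.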