BugZero | MongoDB BugID 720416 - Signaling 1-node replica set to shut down now take...

MongoDB - Defect ID: 720416

Signaling 1-node replica set to shut down now takes an extra 10 seconds

Last updated on January 8th, 2024

BugZero Risk Score
4.4 Medium

Overall: 4.4

Severity: 4.6

Community: 6.4

Lifecycle: 9.1

Bug Details

Priority: Minor - P4
Status: Closed
Views: 4

Description

Info

It seems like attempting to run ReplicationCoordinator::stepDown() is unnecessary when the replica set configuration is known to only contain one node electable as primary. The extra time it takes to shut down the replica set is mildly annoying for certain aspects of my local development workflow.

    2019-03-21T03:09:22.009-0400 I CONTROL [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
    2019-03-21T03:09:22.009-0400 I REPL [RstlKillOpthread] Starting to kill user operations
    2019-03-21T03:09:22.009-0400 I REPL [RstlKillOpthread] Stopped killing user operations
    2019-03-21T03:09:32.020-0400 I REPL [RstlKillOpthread] Starting to kill user operations
    2019-03-21T03:09:32.020-0400 I REPL [RstlKillOpthread] Stopped killing user operations
    2019-03-21T03:09:32.020-0400 I STORAGE [signalProcessingThread] Failed to stepDown in non-command initiated shutdown path ExceededTimeLimit: No electable secondaries caught up as of 2019-03-21T03:09:32.020-0400. Please use the replSetStepDown command with the argument {force: true} to force node to step down.
    2019-03-21T03:09:32.020-0400 I NETWORK [signalProcessingThread] shutdown: going to close listening sockets...
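The roughly 10-second gap in the log (03:09:22.009 to 03:09:32.020) is what one would expect from a wait loop that can only end early when an electable secondary has caught up. The following is a minimal sketch of that behavior, not MongoDB's actual code: `SetState`, `StepDownResult`, and `tryStepDown` are hypothetical names, the 10-second timeout is inferred from the log timestamps, and the clock is simulated so the example runs instantly.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using Millis = std::chrono::milliseconds;

// Hypothetical stand-in for the replica set state that matters to a stepdown.
struct SetState {
    int caughtUpElectableSecondaries;  // always 0 for a 1-node replica set
};

// Outcome of the simulated stepdown attempt.
struct StepDownResult {
    bool ok;        // true if a caught-up electable secondary was found
    Millis waited;  // simulated time spent waiting
};

// Polls until an electable secondary is caught up or the timeout elapses.
// The clock is simulated: each iteration advances it by pollInterval
// (which must be positive), capped at the timeout.
StepDownResult tryStepDown(const SetState& state, Millis timeout, Millis pollInterval) {
    Millis elapsed{0};
    while (true) {
        if (state.caughtUpElectableSecondaries > 0) {
            return {true, elapsed};
        }
        if (elapsed >= timeout) {
            return {false, elapsed};  // analogous to ExceededTimeLimit in the log
        }
        elapsed += std::min(pollInterval, timeout - elapsed);
    }
}
```

With `SetState{0}` and a 10-second timeout, the sketch always reports the full 10 seconds as waited, because nothing can change the outcome during the wait. With a timeout of 0 it fails immediately, which corresponds to the configurable-timeout idea raised later in the comments.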

Top User Comments

jason.carey commented on Mon, 15 Apr 2019 14:33:46 +0000:
Closing this out after the change made in SERVER-40335. I think that satisfies the intent of this ticket

vesselina.ratcheva commented on Mon, 25 Mar 2019 23:00:15 +0000:
I think the fix Jason pointed out in the topology coordinator is the way to go implementation-wise (it can also be made in isSafeToStepDown), provided we come to a consensus about user-facing behavior. In the same spirit as the proposition SERVER-40335, I would also propose making a parameter to gate that new behavior directly in topo instead.

jason.carey commented on Mon, 25 Mar 2019 21:14:31 +0000:
After some reflection (and conversation with max.hirschhorn), I'm going to features we're not sure of this, for now. If we don't want to tackle allowing shutdown in more configurations, we should probably just make the timeout configurable (and make it 0 for most tests). I've opened SERVER-40335 to explore that avenue.

schwerin commented on Fri, 22 Mar 2019 16:38:30 +0000:
Absolutely. My point is we shouldn't fix this regression by trading it for another user-facing behavior change without considering it.

daniel.hatcher commented on Fri, 22 Mar 2019 13:44:56 +0000:
If SERVER-38994 is what caused this, there is an argument to be made that as-is it's a client-facing regression (albeit a small one).

schwerin commented on Fri, 22 Mar 2019 04:01:04 +0000:
I am reluctant to change the user-facing behavior of the stepDown and shutDown commands in this instance to make our tests run faster. I made a conscious decision to require the user to force shutdown whenever there is no other electable node. At the very least, we should let product weigh in. We might also have to update the documentation.

max.hirschhorn@10gen.com commented on Thu, 21 Mar 2019 17:19:51 +0000:
FWIW, I filed this ticket because of my use of 1-node replica sets locally, but I think the change should apply to any replica set where electableCount == 1. Stepping down a single voting replica set may still be useful for testing purposes, i.e. to have the primary actually transition to state SECONDARY, but to just skip the election handoff part.

jason.carey commented on Thu, 21 Mar 2019 15:31:10 +0000:
I think the fix here is to make repl coordinator stepDown, or topology coordinator attemptStepDown, return quickly if the configured set has 1 node. That would fix the slowness on sigterm, and make the shutdown command do something sane for 1 node repl sets. At a glance, I'd probably change https://github.com/mongodb/mongo/blob/2a4d8ed5bb64af081b887f17dabf298831866b1d/src/mongo/db/repl/topology_coordinator.cpp#L2237

    bool TopologyCoordinator::_canCompleteStepDownAttempt(Date_t now, Date_t waitUntil, bool force) {
        const bool forceNow = force && (now >= waitUntil);
        if (forceNow) {
            return true;
        }
        return isSafeToStepDown();
    }

so that there is an additional check for single node sets

judah.schvimer commented on Thu, 21 Mar 2019 15:27:00 +0000:
This feels pretty costly in terms of evergreen time spent. CC mira.carey@mongodb.com for any thoughts.

max.hirschhorn@10gen.com commented on Thu, 21 Mar 2019 14:28:11 +0000:
I would vote for changing the replSetStepDown command because you also cannot use the shutdown command without force=true to shut down a 1-node replica set.
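The "additional check for single node sets" proposed in the thread could look roughly like the sketch below. This is a standalone illustration, not the change that was merged: the ticket was closed in favor of SERVER-40335, and the thread did not reach agreement on this behavior. `Member`, `electableCount`, and the free function `canCompleteStepDownAttempt` are hypothetical stand-ins; timestamps are plain integers instead of `Date_t`; "electable" is simplified to a non-arbiter member with priority greater than 0; and the caller is assumed to be the current primary, so a count of 1 means no other node could take over.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified view of one replica set member's configuration.
struct Member {
    double priority;   // 0 means the member can never become primary
    bool arbiterOnly;  // arbiters vote but hold no data
};

// Counts members that could become primary under the simplified rule above.
int electableCount(const std::vector<Member>& config) {
    int count = 0;
    for (const Member& m : config) {
        if (!m.arbiterOnly && m.priority > 0) {
            ++count;
        }
    }
    return count;
}

// Mirrors the shape of the quoted _canCompleteStepDownAttempt, with the
// proposed extra check. safeToStepDown stands in for isSafeToStepDown().
bool canCompleteStepDownAttempt(long long nowMs,
                                long long waitUntilMs,
                                bool force,
                                bool safeToStepDown,
                                const std::vector<Member>& config) {
    const bool forceNow = force && (nowMs >= waitUntilMs);
    if (forceNow) {
        return true;
    }
    // Proposed addition: with only one electable node (assumed to be this
    // primary), no secondary can ever catch up, so waiting is pointless.
    if (electableCount(config) == 1) {
        return true;
    }
    return safeToStepDown;
}
```

For a config of `{{1.0, false}}`, the function returns true immediately even without `force`, which would remove the delay on SIGTERM. For a three-member set it falls through to the existing safety check, so behavior there is unchanged.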

Steps to Reproduce

Relevant Products

Click on a version to see all relevant bugs

Affected versions: No known affected versions

Fixed versions: No known fixed versions

Top MongoDB Defects

5.5 Defect ID: 3311853
Balancer does not make progress when the most loaded shard is already balanced within its zones
5.5 Defect ID: 3305743
Fix detecting current git repo on windows
5.5 Defect ID: 3277022
CollectionCatalog can't replace a collection with a view in the same WriteUnitOfWork
5.4 Defect ID: 3287683
[v6.0] Ensure update fails on primary when doc_diff::computeDiff() produces empty sub-diffs
5.3 Defect ID: 3312454
Extended Range control block inconsistency
