...
The subject replica set has 3 nodes (see rs.conf() below):

t1 IP address is 10.3.1.12
t2 IP address is 10.3.1.13
t3 IP address is 10.3.1.16

After a transient network failure (switch ports were disabled and then re-enabled) on the secondary (t3), it became primary, causing rollbacks on the previous primary (t1) and the other secondary (t2). All writes are done with w:majority, so this is really strange. Logs from all three machines are attached.

rs.conf():

{
    "_id" : "driveFS-temp-1",
    "version" : 4,
    "protocolVersion" : NumberLong(1),
    "writeConcernMajorityJournalDefault" : false,
    "members" : [
        {
            "_id" : 0,
            "host" : "t1.s1.fs.drive.bru:27231",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : { },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 1,
            "host" : "t2.s1.fs.drive.bru:27231",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : { },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 2,
            "host" : "t3.s1.fs.drive.bru:27231",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : { },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatIntervalMillis" : 2000,
        "heartbeatTimeoutSecs" : 10,
        "electionTimeoutMillis" : 5000,
        "catchUpTimeoutMillis" : 2000,
        "getLastErrorModes" : { },
        "getLastErrorDefaults" : {
            "w" : 1,
            "wtimeout" : 0
        },
        "replicaSetId" : ObjectId("58c9657b40aba377920b23f2")
    }
}
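Note that the settings above lower electionTimeoutMillis to 5000 ms (the server default is 10000 ms), so a few seconds of lost heartbeats are enough for a member such as t3 to call an election. A minimal mongo-shell sketch of restoring the default, assuming it is run against the current primary:

    // Run on the current primary of driveFS-temp-1.
    var cfg = rs.conf();
    // Raise the election timeout back to the 10-second default so that
    // short switch-port flaps are less likely to trigger a failover.
    cfg.settings.electionTimeoutMillis = 10000;
    // The rs.reconfig() helper bumps the config version itself.
    rs.reconfig(cfg);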
onyxmaster commented on Mon, 1 May 2017 15:47:10 +0000:

Thank you for the information. I was more surprised that the election allowed a secondary to be elected as primary while the primary was available and connected to the other secondary. Well, since this preserves the acknowledged majority writes, it's okay.

thomas.schubert commented on Mon, 1 May 2017 02:37:17 +0000:

Hi onyxmaster,

After reviewing the logs, there is no indication of a bug during this failover. While w: majority guarantees that acknowledged writes will not be rolled back, writes issued with this write concern that have not yet been acknowledged to the application are liable to be rolled back on failover. In this case, it appears that the writes were completed on the secondary, but the rest of the replica set (and the application, by extension) was not yet aware that these writes had been completed. Consequently, the secondary and the old primary rolled back on failover.

Kind regards,
Thomas

thomas.schubert commented on Wed, 19 Apr 2017 14:25:42 +0000:

Hi onyxmaster,

Thank you for the detailed report and logs. We're investigating this behavior and will update this ticket after we've finished reviewing the logs.

Kind regards,
Thomas
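To make the explanation above concrete: w: "majority" only protects writes that the application has actually seen acknowledged. A minimal mongo-shell sketch of the window thomas.schubert describes, using a hypothetical files collection and document:

    try {
        db.files.insertOne(
            { _id: "chunk-42", node: "t3" },  // hypothetical document
            { writeConcern: { w: "majority", wtimeout: 5000 } }
        );
        // Only once this call returns is the write guaranteed
        // not to be rolled back on failover.
    } catch (e) {
        // On a network error or wtimeout the outcome is unknown: the write
        // may exist on some members without a majority acknowledgment, and
        // such a write is still eligible for rollback after an election.
        print("write outcome unknown: " + e);
    }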