...
For secondary batch application, an ApplyBatchFinalizer is used to advance optimes after application of an oplog batch completes. We currently use ApplyBatchFinalizerForJournal for durable storage engines and the base ApplyBatchFinalizer, which only updates the lastApplied optime, for non-durable storage engines. On primaries with non-durable storage engines, the replication system keeps the lastDurable optime up to date with the lastApplied optime, since the lastDurable optime has no functional meaning on a non-durable storage engine. We should keep this behavior consistent between primaries and secondaries, so we should also update the lastDurable optime during batch application on non-durable storage engines.
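To make the proposal concrete, here is a minimal, hypothetical sketch of a finalizer hierarchy where the non-journal finalizer advances lastDurable in lockstep with lastApplied. The OpTime, ReplCoordinator, and finalizer types below are simplified stand-ins, not the actual server classes.

```cpp
// Sketch only: simplified stand-ins for the real ApplyBatchFinalizer hierarchy.
#include <iostream>
#include <memory>

struct OpTime {
    long long ts = 0;  // simplified: a single logical timestamp
};

struct ReplCoordinator {
    OpTime lastApplied;
    OpTime lastDurable;
    void setMyLastAppliedOpTimeForward(OpTime t) {
        if (t.ts > lastApplied.ts) lastApplied = t;
    }
    void setMyLastDurableOpTimeForward(OpTime t) {
        if (t.ts > lastDurable.ts) lastDurable = t;
    }
};

// Base finalizer, used for non-durable engines: today it only advances
// lastApplied after a batch.
class ApplyBatchFinalizer {
public:
    explicit ApplyBatchFinalizer(ReplCoordinator* replCoord) : _replCoord(replCoord) {}
    virtual ~ApplyBatchFinalizer() = default;

    virtual void record(OpTime newOpTime) {
        _replCoord->setMyLastAppliedOpTimeForward(newOpTime);
        // Proposed change: without a journal, lastDurable has no independent
        // meaning, so keep it in lockstep with lastApplied, matching what the
        // primary already does on non-durable engines.
        _replCoord->setMyLastDurableOpTimeForward(newOpTime);
    }

protected:
    ReplCoordinator* _replCoord;
};

// Journal-aware finalizer: advances lastApplied immediately and lastDurable
// only after the journal has been flushed (flush elided in this sketch).
class ApplyBatchFinalizerForJournal : public ApplyBatchFinalizer {
public:
    using ApplyBatchFinalizer::ApplyBatchFinalizer;

    void record(OpTime newOpTime) override {
        _replCoord->setMyLastAppliedOpTimeForward(newOpTime);
        waitForJournalFlush();  // placeholder for the real durability wait
        _replCoord->setMyLastDurableOpTimeForward(newOpTime);
    }

private:
    void waitForJournalFlush() { /* flush the journal here */ }
};

int main() {
    ReplCoordinator replCoord;
    bool storageEngineIsDurable = false;  // e.g. inMemory or ephemeralForTest

    std::unique_ptr<ApplyBatchFinalizer> finalizer;
    if (storageEngineIsDurable)
        finalizer = std::make_unique<ApplyBatchFinalizerForJournal>(&replCoord);
    else
        finalizer = std::make_unique<ApplyBatchFinalizer>(&replCoord);

    finalizer->record(OpTime{42});  // end of an applied oplog batch
    std::cout << "lastApplied=" << replCoord.lastApplied.ts
              << " lastDurable=" << replCoord.lastDurable.ts << "\n";
}
```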
spencer commented on Wed, 29 Aug 2018 19:19:35 +0000: Spent some time investigating the current behavior here and it's a bit interesting. The following describes the behavior of a single insert with various configurations and writeConcerns specified.

Primary inMemory, Secondary WT, writeConcernMajorityJournalDefault: true
1. No writeConcern specified: No error. lastApplied updated but not lastDurable on the primary; both lastApplied and lastDurable updated on the secondary.
2. w:majority without 'j' specified: No error. Both lastApplied and lastDurable advance on both nodes.
3. w:majority with j: true specified explicitly: Error "cannot use 'j' option when a host does not have journaling enabled" returned. No write is performed, so no optimes advance.
4. w:majority with j: false specified: Write is successful, writeConcern times out. lastApplied is updated but not lastDurable on the primary; both updated on the secondary.

#2 here I believe can be explained by this line, which waits for durability and then unconditionally sets lastOpDurable to the lastApplied. This seems like problematic behavior, however: since 'j' wasn't specified and writeConcernMajorityJournalDefault is true, I'd expect this to error in some way. #4 above is pretty surprising, and I haven't yet looked into why this is the behavior.

Primary WT, Secondary inMemory, writeConcernMajorityJournalDefault: true
1. No writeConcern specified: No error. Both optimes advanced on the primary; lastApplied but not lastDurable updated on the secondary.
2. w:majority without 'j' specified: Write is successful, writeConcern times out. Both optimes advanced on the primary; lastApplied but not lastDurable updated on the secondary.
3. w:majority with j: true specified explicitly: Write is successful, writeConcern times out. Both optimes advanced on the primary; lastApplied but not lastDurable updated on the secondary.
4. w:majority with j: false specified: Write is successful, writeConcern times out. Both optimes advanced on the primary; lastApplied but not lastDurable updated on the secondary.

All 4 cases behave the same. It's a bit surprising that #4 still errors even though j: false is specified.

Primary WT, Secondary inMemory, writeConcernMajorityJournalDefault: false
1. No writeConcern specified: No error. Both optimes advanced on the primary; lastApplied but not lastDurable updated on the secondary.
2. w:majority without 'j' specified: No error. Both optimes advanced on the primary; lastApplied but not lastDurable updated on the secondary.
3. w:majority with j: true specified explicitly: Write is successful, writeConcern times out. Both optimes advanced on the primary; lastApplied but not lastDurable updated on the secondary.
4. w:majority with j: false specified: No error. Both optimes advanced on the primary; lastApplied but not lastDurable updated on the secondary.

No surprises here.

Primary inMemory, Secondary WT, writeConcernMajorityJournalDefault: false
1. No writeConcern specified: No error. lastApplied updated but not lastDurable on the primary; both lastApplied and lastDurable updated on the secondary.
2. w:majority without 'j' specified: No error. lastApplied updated but not lastDurable on the primary; both lastApplied and lastDurable updated on the secondary.
3. w:majority with j: true specified explicitly: Error "cannot use 'j' option when a host does not have journaling enabled" returned. No write is performed, so no optimes advance.
4. w:majority with j: false specified: No error. lastApplied updated but not lastDurable on the primary; both lastApplied and lastDurable updated on the secondary.

No surprises here.
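The behavior spencer describes for case 2 of the first configuration can be illustrated with a small hypothetical sketch (illustrative names only, not the server line the comment refers to): on a non-journaling engine the durability wait is effectively a no-op, after which lastDurable is unconditionally advanced to lastApplied.

```cpp
// Hypothetical sketch of the primary-side path referenced above; names are
// illustrative stand-ins, not the actual server code.
struct OpTime {
    long long ts = 0;
};

struct StorageEngine {
    bool isDurable = false;       // false for inMemory / ephemeralForTest
    void waitUntilDurable() {
        if (!isDurable) return;   // nothing to flush; returns immediately
        /* flush the journal for a durable engine */
    }
};

struct ReplCoordinator {
    OpTime lastApplied;
    OpTime lastDurable;
};

// Called when a writer needs its write journaled before acknowledgment.
void onDurabilityRequested(StorageEngine& engine, ReplCoordinator& replCoord) {
    engine.waitUntilDurable();
    // Unconditional: lastDurable is set to lastApplied even when the engine
    // never journaled anything, which matches the behavior observed in case 2.
    if (replCoord.lastApplied.ts > replCoord.lastDurable.ts)
        replCoord.lastDurable = replCoord.lastApplied;
}

int main() {
    StorageEngine inMemoryEngine;        // isDurable == false
    ReplCoordinator replCoord;
    replCoord.lastApplied = OpTime{7};   // the write was applied in memory

    onDurabilityRequested(inMemoryEngine, replCoord);
    // lastDurable is now 7 even though nothing was journaled, so a
    // majority (journaled) write-concern wait can see this node as durable.
}
```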
tess.avitabile commented on Tue, 3 Jul 2018 18:18:06 +0000: We should investigate whether having an in-memory primary node keep lastDurable up to date with lastApplied causes us to incorrectly confirm majority (durable) writes.

william.schultz commented on Mon, 25 Jun 2018 20:08:49 +0000: milkie, I came across this behavior when diagnosing a build failure that occurred specifically on the ephemeralForTest storage engine. What I observed was that the lastDurable optime on a secondary was not advancing during normal steady state replication, but an update to it was later triggered by another (internal) write happening in the system; in this case it was the writing of our "last vote" document to storage. This then seemed to cause the durable optime to advance, and because of this, we triggered an updatePosition request to our sync source, which ended up interfering with other commands in an unintended way (SERVER-35766). That isn't explicitly related to this issue, but that is how I discovered this. When I noticed that we weren't updating our lastDurable optime during batch application, it seemed incorrect. Perhaps the behavior I was observing could also be due to an ephemeralForTest engine bug? I wasn't entirely sure. Maybe the existing behavior is acceptable, but I suppose we should at least decide what we want the behavior to be, since it certainly appears that we try to keep lastDurable optimes up to date with lastApplied optimes on the primary.

milkie commented on Mon, 25 Jun 2018 19:49:54 +0000: I'm not sure we should be making this change unless there's an advantage to making it. I suspect it won't be trivial to change this behavior, and it will change the use of the writeConcernMajorityJournalDefault parameter, since you would no longer need to change it when setting up a replica set with non-durable nodes.
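The concern raised in tess.avitabile's comment above can be sketched as follows. This is a hypothetical, simplified model of majority durable acknowledgment (counting members whose reported lastDurable has reached the write's optime); the names and the counting rule are illustrative, not the real write-concern machinery.

```cpp
// Hypothetical sketch: if a non-durable member reports lastDurable in lockstep
// with lastApplied, a majority *durable* write concern can be confirmed without
// a true majority of members having journaled the write.
#include <vector>

struct OpTime {
    long long ts = 0;
};

struct MemberProgress {
    OpTime lastApplied;
    OpTime lastDurable;
};

// Returns true when a majority of members report the write as durable.
bool majorityDurablySatisfied(const std::vector<MemberProgress>& members, OpTime writeOpTime) {
    int durableCount = 0;
    for (const auto& m : members)
        if (m.lastDurable.ts >= writeOpTime.ts)
            ++durableCount;
    return durableCount > static_cast<int>(members.size()) / 2;
}

int main() {
    // Three-node set; member 0 is an inMemory node that (under the proposal)
    // reports lastDurable == lastApplied even though it never journals.
    std::vector<MemberProgress> members = {
        {{10}, {10}},  // inMemory node: lastDurable advanced in lockstep
        {{10}, {10}},  // WiredTiger node that has journaled the write
        {{10}, {0}},   // WiredTiger node that has not journaled it yet
    };
    // The write at ts=10 is reported majority-durable even though only one
    // member has actually journaled it -- the case worth investigating.
    bool ok = majorityDurablySatisfied(members, OpTime{10});
    return ok ? 0 : 1;
}
```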