...
On a secondary, rsBackgroundSync creates a new operation context to read minValid, and that read conflicts with the PBWM (ParallelBatchWriterMode) lock. In steady state, the applier is the only writer of minValid; it advances minValid after writing the oplog. Bgsync reads minValid to choose a sync source if minValid is ahead of the last fetched optime. This read of minValid does not need to be synchronized with batch boundaries, which is what taking the PBWM lock forces.

Thread 54: "rsBackgroundSync" (Thread 0x7e983ac4ee60 (LWP 16775))
#0 0x00007e9857b82b94 in pthread_cond_timedwait@@GLIBC_2.17 () from /lib/powerpc64le-linux-gnu/libpthread.so.0
#1 0x00000b5b64fa1e68 in __gthread_cond_timedwait (__abs_timeout=0x7e983ac4cb68, __mutex=, __cond=) at /opt/mongodbtoolchain/revisions/94dac13bc8c0b50beff286acac77adeb2e81761e/stow/gcc-v3.tmq/include/c++/8.2.0/ppc64le-mongodb-linux/bits/gthr-default.h:871
#2 std::condition_variable::__wait_until_impl > > (__atime=..., __lock=..., this=0xb5b909d72d0) at /opt/mongodbtoolchain/revisions/94dac13bc8c0b50beff286acac77adeb2e81761e/stow/gcc-v3.tmq/include/c++/8.2.0/condition_variable:178
#3 std::condition_variable::wait_until > > (__atime=..., __lock=..., this=0xb5b909d72d0) at /opt/mongodbtoolchain/revisions/94dac13bc8c0b50beff286acac77adeb2e81761e/stow/gcc-v3.tmq/include/c++/8.2.0/condition_variable:106
#4 std::condition_variable::wait_until >, mongo::CondVarLockGrantNotification::wait(mongo::Milliseconds):: > (__p=..., __atime=..., __lock=..., this=0xb5b909d72d0) at /opt/mongodbtoolchain/revisions/94dac13bc8c0b50beff286acac77adeb2e81761e/stow/gcc-v3.tmq/include/c++/8.2.0/condition_variable:129
#5 std::condition_variable::wait_for, mongo::CondVarLockGrantNotification::wait(mongo::Milliseconds):: > (__p=..., __rtime=..., __lock=..., this=0xb5b909d72d0) at /opt/mongodbtoolchain/revisions/94dac13bc8c0b50beff286acac77adeb2e81761e/stow/gcc-v3.tmq/include/c++/8.2.0/condition_variable:156
#6 mongo::CondVarLockGrantNotification::wait (this=this@entry=0xb5b909d72a0, timeout=..., timeout@entry=...) at src/mongo/db/concurrency/lock_state.cpp:213
#7 0x00000b5b64fa3eec in mongo::LockerImpl::lockComplete (this=0xb5b909d7200, opCtx=0x0, resId=..., mode=, deadline=...) at src/mongo/db/concurrency/lock_state.cpp:867
#8 0x00000b5b64fa67a0 in mongo::LockerImpl::lock (this=, resId=..., mode=, deadline=...) at src/mongo/db/concurrency/lock_state.h:173
#9 0x00000b5b64f9155c in mongo::Lock::ResourceLock::lock (this=0x7e983ac4d290, mode=) at src/mongo/db/concurrency/d_concurrency.cpp:343
#10 0x00000b5b64f9172c in mongo::Lock::GlobalLock::_enqueue (this=this@entry=0x7e983ac4d280, lockMode=lockMode@entry=mongo::MODE_IS, deadline=...) at src/mongo/db/concurrency/d_concurrency.cpp:181
#11 0x00000b5b64f918e8 in mongo::Lock::GlobalLock::GlobalLock (this=0x7e983ac4d280, opCtx=, lockMode=, deadline=..., behavior=, enqueueOnly=...) at src/mongo/db/concurrency/d_concurrency.cpp:158
#12 0x00000b5b64f91974 in mongo::Lock::GlobalLock::GlobalLock (this=0x7e983ac4d280, opCtx=, lockMode=, deadline=..., behavior=) at src/mongo/db/concurrency/d_concurrency.cpp:140
#13 0x00000b5b64f91a74 in mongo::Lock::DBLock::DBLock (this=0x7e983ac4d268, opCtx=0xb5b91332940, db="local", mode=, deadline=...) at src/mongo/db/concurrency/lock_manager_defs.h:101
#14 0x00000b5b642664e4 in mongo::AutoGetDb::AutoGetDb (this=, opCtx=, dbName=..., mode=, deadline=...) at src/mongo/db/catalog_raii.cpp:55
#15 0x00000b5b6426732c in mongo::AutoGetCollection::AutoGetCollection (this=0x7e983ac4d268, opCtx=0xb5b91332940, nsOrUUID=..., modeDB=, modeColl=, viewMode=, deadline=...) at src/mongo/base/string_data.h:61
#16 0x00000b5b62d5f99c in mongo::AutoGetCollection::AutoGetCollection (deadline=..., viewMode=mongo::AutoGetCollection::kViewsForbidden, modeAll=, nsOrUUID=..., opCtx=, this=) at src/mongo/db/catalog_raii.h:91
#17 mongo::repl::(anonymous namespace)::::operator()(void) const (__closure=__closure@entry=0x7e983ac4d3c8) at src/mongo/db/repl/storage_interface_impl.cpp:606
#18 0x00000b5b62d60b28 in mongo::writeConflictRetry, mongo::repl::StorageInterface::ScanDirection, const mongo::BSONObj&, const mongo::BSONObj&, mongo::BoundInclusion, std::size_t, mongo::repl::(anonymous namespace)::FindDeleteMode):: > (f=..., ns=..., opStr=..., opCtx=0xb5b91332940) at /opt/mongodbtoolchain/revisions/94dac13bc8c0b50beff286acac77adeb2e81761e/stow/gcc-v3.tmq/include/c++/8.2.0/bits/atomic_base.h:390
#19 mongo::repl::(anonymous namespace)::_findOrDeleteDocuments (opCtx=, opCtx@entry=0xb5b91332940, nsOrUUID=..., indexName=..., scanDirection=, scanDirection@entry=mongo::repl::StorageInterface::ScanDirection::kForward, startKey=unowned empty BSONObj @ 0xb5b65398a10 , endKey=..., boundInclusion=, limit=, limit@entry=0, mode=mode@entry=mongo::repl::(anonymous namespace)::FindDeleteMode::kFind) at src/mongo/db/repl/storage_interface_impl.cpp:712
#20 0x00000b5b62d698cc in mongo::repl::StorageInterfaceImpl::findDocuments (this=, opCtx=0xb5b91332940, nss=..., indexName=..., scanDirection=, startKey=unowned empty BSONObj @ 0xb5b65398a10 , boundInclusion=, limit=2) at src/mongo/bson/bsonobj.h:128
#21 0x00000b5b62d5d828 in mongo::repl::StorageInterfaceImpl::findSingleton (this=, opCtx=, nss=...) at src/mongo/bson/bsonobj.h:128
#22 0x00000b5b62da0380 in mongo::repl::ReplicationConsistencyMarkersImpl::_getMinValidDocument (this=, opCtx=) at src/mongo/db/repl/replication_consistency_markers_impl.cpp:74
#23 0x00000b5b62da09bc in mongo::repl::ReplicationConsistencyMarkersImpl::getMinValid (this=, opCtx=) at src/mongo/db/repl/replication_consistency_markers_impl.cpp:179
#24 0x00000b5b62e3361c in mongo::repl::BackgroundSync::_produce (this=this@entry=0xb5b8b357e00) at /opt/mongodbtoolchain/revisions/94dac13bc8c0b50beff286acac77adeb2e81761e/stow/gcc-v3.tmq/include/c++/8.2.0/bits/unique_ptr.h:342
#25 0x00000b5b62e35564 in mongo::repl::BackgroundSync::_runProducer (this=this@entry=0xb5b8b357e00) at src/mongo/db/repl/bgsync.cpp:213
#26 0x00000b5b62e357ac in mongo::repl::BackgroundSync::_run (this=0xb5b8b357e00) at src/mongo/db/repl/bgsync.cpp:174
#27 0x00000b5b65377694 in std::execute_native_thread_routine (__p=) at ../../../../../src/combined/libstdc++-v3/src/c++11/thread.cc:80
#28 0x00007e9857b7885c in start_thread () from /lib/powerpc64le-linux-gnu/libpthread.so.0
#29 0x00007e9857a99028 in clone () from /lib/powerpc64le-linux-gnu/libc.so.6
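
The description's point, that a single-writer value can be read without waiting for a batch boundary, can be illustrated with a toy model. This is only a sketch under stated assumptions, not MongoDB code: `ToyMarkers`, `applierAdvanceMinValid`, `readMinValidNoBatchLock`, and `minValidAheadOfLastFetched` are hypothetical names, optimes are reduced to plain integers, a `std::mutex` stands in for the PBWM lock, and an atomic stands in for the minValid document.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Toy stand-ins (hypothetical, not MongoDB types).
struct ToyMarkers {
    std::mutex batchLock;                    // stand-in for the PBWM lock
    std::atomic<std::uint64_t> minValid{0};  // stand-in for the minValid document
};

// The applier is the only writer: it advances minValid while it holds the batch lock.
void applierAdvanceMinValid(ToyMarkers& m, std::uint64_t batchEnd) {
    std::lock_guard<std::mutex> lk(m.batchLock);
    m.minValid.store(batchEnd, std::memory_order_release);
}

// A reader that does not take the batch lock, so it never waits for a batch to end.
std::uint64_t readMinValidNoBatchLock(const ToyMarkers& m) {
    return m.minValid.load(std::memory_order_acquire);
}

// The check bgsync makes before choosing a sync source, in toy form.
bool minValidAheadOfLastFetched(const ToyMarkers& m, std::uint64_t lastFetched) {
    return readMinValidNoBatchLock(m) > lastFetched;
}
```

In this model the reader may observe either the value from before or after the current batch, which is acceptable when the read does not need to line up with batch boundaries.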
siyuan.zhou@10gen.com commented on Thu, 31 Oct 2019 05:18:46 +0000:
To make the concurrency correct, we actually need finishRecoveryIfEligible to be exclusive with oplog application. In 4.0, a global lock protected reading both the last applied optime and minValid. In 4.2, we changed that to the RSTL, which introduces a race: the last applied optime is read while the applier is running, but the minValid read conflicts with the applier on the PBWM lock. Thus it's more likely that the last applied optime is smaller than minValid. I'd suggest acquiring the PBWM lock before the RSTL in finishRecoveryIfEligible.

greg.mckeon commented on Mon, 6 May 2019 17:26:04 +0000:
Whoever investigates should see if the proposed solution is safe.
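
The lock ordering suggested in the 31 Oct 2019 comment can be sketched with a toy model. This is an illustration under assumptions, not the server's implementation: `ToyReplState`, `applyBatchSketch`, and `finishRecoveryIfEligibleSketch` are hypothetical, optimes are plain integers, and `std::mutex` stands in for the PBWM lock and the RSTL, which in the real server are lock-manager resources with modes.

```cpp
#include <cstdint>
#include <mutex>

// Toy stand-ins (hypothetical, not MongoDB types).
struct ToyReplState {
    std::mutex pbwm;  // stand-in for the PBWM lock, held for a whole applier batch
    std::mutex rstl;  // stand-in for the RSTL
    std::uint64_t lastApplied = 0;
    std::uint64_t minValid = 0;
};

// One applier batch: minValid moves first, lastApplied only at the end,
// so mid-batch lastApplied < minValid.
void applyBatchSketch(ToyReplState& s, std::uint64_t batchEnd) {
    std::lock_guard<std::mutex> batch(s.pbwm);
    s.minValid = batchEnd;     // advanced after the oplog write
    // ... operations would be applied here ...
    s.lastApplied = batchEnd;  // advanced once the batch is applied
}

// Taking the batch lock before the RSTL stand-in means both values are read
// at a batch boundary, never from the middle of a batch.
bool finishRecoveryIfEligibleSketch(ToyReplState& s) {
    std::lock_guard<std::mutex> batch(s.pbwm);
    std::lock_guard<std::mutex> transition(s.rstl);
    return s.lastApplied >= s.minValid;
}
```

Because both reads happen while the batch lock is held, the check cannot pair a pre-batch last applied optime with a post-batch minValid, which is the race the comment describes. Whether this ordering is safe in the real server is the open question raised in the other comment.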