...
Hello, I am trying to upgrade our MongoDB 4.2.1 replica set to 4.4.2. Our Spring Boot 2.4 application (Java driver 4.1.1) uses multi-document transactions and manipulates many documents in the same transaction (batch writing). We have no issues on MongoDB 4.2, but these transactions fail with a WriteConflict on 4.4.

Below is a simple Java/Spring unit test that passes on MongoDB 4.2 but not on MongoDB 4.4. It initializes the collection outside the transaction with 20,000 documents (two properties: an id and a 1,000-character string), then, in a transaction, removes all 20,000 documents and re-inserts 20,000 similar documents.

Steps to reproduce:

Run the following test on MongoDB 4.2.1: it passes.

import java.util.ArrayList;
import java.util.Collection;
import java.util.stream.IntStream;

import org.apache.commons.lang3.RandomStringUtils;
import org.bson.Document;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.autoconfigure.data.mongo.MongoDataAutoConfiguration;
import org.springframework.boot.autoconfigure.mongo.MongoAutoConfiguration;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.context.annotation.Import;
import org.springframework.data.mongodb.MongoTransactionManager;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.test.context.ActiveProfiles;
import org.springframework.test.context.junit4.SpringRunner;
import org.springframework.transaction.TransactionStatus;
import org.springframework.transaction.support.TransactionCallbackWithoutResult;
import org.springframework.transaction.support.TransactionTemplate;

@RunWith(SpringRunner.class)
@ActiveProfiles(value = "test-mongo44")
@Import({MongoAutoConfiguration.class, MongoDataAutoConfiguration.class})
@SpringBootTest()
public class Mongo44Test {

    @Autowired
    private MongoTemplate mongoTemplate;

    @Autowired
    private MongoTransactionManager mongoTransactionManager;

    @Test
    public void test1() {
        // init the collection with 20 000 docs, each carrying a 1000-char string
        Collection<Document> elements = new ArrayList<>();
        IntStream.range(0, 20000).forEach(i -> {
            Document doc = new Document();
            doc.put("id", "site");
            doc.put("string", RandomStringUtils.random(1000, true, true));
            elements.add(doc);
        });
        mongoTemplate.insert(elements, "mongo44");

        TransactionTemplate transactionTemplate = new TransactionTemplate(mongoTransactionManager);
        transactionTemplate.execute(new TransactionCallbackWithoutResult() {
            @Override
            protected void doInTransactionWithoutResult(TransactionStatus status) {
                // remove all docs in the transaction
                mongoTemplate.remove(Query.query(Criteria.where("id").is("site")), "mongo44");
                elements.clear();
                // and re-insert 20 000 docs with a 1000-char string in the same transaction
                IntStream.range(0, 20000).forEach(i -> {
                    Document doc = new Document();
                    doc.put("id", "site");
                    doc.put("string", RandomStringUtils.random(1000, true, true));
                    elements.add(doc);
                });
                mongoTemplate.insert(elements, "mongo44");
            }
        });
    }
}

Run the same test on MongoDB 4.4.2: it fails with the following error.
org.springframework.transaction.TransactionSystemException: Could not commit Mongo transaction for session [ClientSessionImpl@6d4a8c85 id = {"id": {"$binary": {"base64": "uCifnjaYRkqu73ku3PC4Tw==", "subType": "04"}}}, causallyConsistent = true, txActive = false, txNumber = 3, error = d != java.lang.Boolean].; nested exception is com.mongodb.MongoCommandException: Command failed with error 112 (WriteConflict): 'WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction.' on server localhost:27017. The full response is {"errorLabels": ["TransientTransactionError"], "operationTime": {"$timestamp": {"t": 1608556898, "i": 20000}}, "ok": 0.0, "errmsg": "WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction.", "code": 112, "codeName": "WriteConflict", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1608556900, "i": 70000}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": 0}}}
    at org.springframework.data.mongodb.MongoTransactionManager.doCommit(MongoTransactionManager.java:203)
    at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:743)
    at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:711)
    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:152)
    at com.myproject.mongo44.Mongo44Test.test1(Mongo44Test.java:54)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.springframework.test.context.junit4.statements.RunBeforeTestExecutionCallbacks.evaluate(RunBeforeTestExecutionCallbacks.java:74)
    at org.springframework.test.context.junit4.statements.RunAfterTestExecutionCallbacks.evaluate(RunAfterTestExecutionCallbacks.java:84)
    at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
    at org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
    at org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:251)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:97)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
    at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
    at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
    at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
    at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
    at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
    at com.sun.proxy.$Proxy5.processTestClass(Unknown Source)
    at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
    at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182)
    at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164)
    at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:413)
    at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
    at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.mongodb.MongoCommandException: Command failed with error 112 (WriteConflict): 'WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction.' on server localhost:27017. The full response is {"errorLabels": ["TransientTransactionError"], "operationTime": {"$timestamp": {"t": 1608556898, "i": 20000}}, "ok": 0.0, "errmsg": "WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction.", "code": 112, "codeName": "WriteConflict", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1608556900, "i": 70000}}, "signature": {"hash": {"$binary": {"base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "subType": "00"}}, "keyId": 0}}}
    at com.mongodb.internal.connection.ProtocolHelper.getCommandFailureException(ProtocolHelper.java:175)
    at com.mongodb.internal.connection.InternalStreamConnection.receiveCommandMessageResponse(InternalStreamConnection.java:302)
    at com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:258)
    at com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:99)
    at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:500)
    at com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:71)
    at com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:224)
    at com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:202)
    at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:118)
    at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:110)
    at com.mongodb.internal.operation.CommandOperationHelper$13.call(CommandOperationHelper.java:710)
    at com.mongodb.internal.operation.OperationHelper.withReleasableConnection(OperationHelper.java:500)
    at com.mongodb.internal.operation.CommandOperationHelper.executeRetryableCommand(CommandOperationHelper.java:703)
    at com.mongodb.internal.operation.TransactionOperation.execute(TransactionOperation.java:69)
    at com.mongodb.internal.operation.CommitTransactionOperation.execute(CommitTransactionOperation.java:132)
    at com.mongodb.internal.operation.CommitTransactionOperation.execute(CommitTransactionOperation.java:53)
    at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:207)
    at com.mongodb.client.internal.ClientSessionImpl.commitTransaction(ClientSessionImpl.java:129)
    at org.springframework.data.mongodb.MongoTransactionManager$MongoTransactionObject.commitTransaction(MongoTransactionManager.java:469)
    at org.springframework.data.mongodb.MongoTransactionManager.doCommit(MongoTransactionManager.java:236)
    at org.springframework.data.mongodb.MongoTransactionManager.doCommit(MongoTransactionManager.java:200)
    ... 60 more

Is there a change in MongoDB 4.4 that explains this behavior?
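For context, the failed commit carries the TransientTransactionError label, and the server message explicitly asks the client to retry. The Java driver's ClientSession.withTransaction helper re-runs the transaction body and retries the commit while that label is returned (within the driver's built-in time limit), so one application-level workaround is to run the batch through it. The sketch below is illustrative only and is not taken from the report; the connection string, database, and collection names are assumptions.

import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.apache.commons.lang3.RandomStringUtils;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class RetryingBatchRewrite {
    public static void main(String[] args) {
        // Placeholder connection string and names, for illustration only.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017/?replicaSet=rs0")) {
            MongoCollection<Document> collection =
                    client.getDatabase("test").getCollection("mongo44");
            try (ClientSession session = client.startSession()) {
                // withTransaction re-runs the body (and retries the commit) as long as the
                // server answers with the TransientTransactionError label, which is exactly
                // the label present in the failure above.
                session.withTransaction(() -> {
                    collection.deleteMany(session, Filters.eq("id", "site"));
                    List<Document> docs = new ArrayList<>();
                    IntStream.range(0, 20000).forEach(i -> docs.add(
                            new Document("id", "site")
                                    .append("string", RandomStringUtils.random(1000, true, true))));
                    collection.insertMany(session, docs);
                    return Boolean.TRUE;
                });
            }
        }
    }
}

As far as I can tell, the Spring TransactionTemplate path used in the test does not retry transient transaction errors on its own, so with that approach the retry loop would have to be added around the execute(...) call instead.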
luke.pearson commented on Sun, 31 Jan 2021 21:47:56 +0000: Thanks gregory.wlodarek. We introduced that change to handle the scenario where the oldest transaction, which was pinning content, was also getting stuck in forced eviction and consequently taking longer to complete, pinning even more content. At the storage layer we expected the operations to which we returned WT_ROLLBACK to be translated into a WCE and retried. Is that not the case in the oplog? However, we should also determine why that thread got pulled into forced eviction with only 0.281% cache usage.

gregory.wlodarek commented on Sun, 31 Jan 2021 18:39:38 +0000: luke.pearson, that did the trick! I can no longer reproduce this issue with your patch.

luke.pearson commented on Thu, 28 Jan 2021 22:57:06 +0000: I could be wrong, but this could be WT-6444. gregory.wlodarek, could you try running the reproducer with this patch?

diff --git a/src/btree/bt_read.c b/src/btree/bt_read.c
index f44bc11cb..1c0fef980 100644
--- a/src/btree/bt_read.c
+++ b/src/btree/bt_read.c
@@ -348,10 +348,10 @@ read:
          * Forced eviction failed: check if this transaction is keeping content pinned
          * in cache.
          */
-        if (force_attempts > 1 &&
+        /*if (force_attempts > 1 &&
           (ret = __wt_txn_is_blocking(session, true)) == WT_ROLLBACK)
             WT_STAT_CONN_INCR(session, cache_eviction_force_rollback);
-        WT_RET(ret);
+        WT_RET(ret);*/
         stalled = true;
         break;
     }

gregory.wlodarek commented on Fri, 15 Jan 2021 20:12:09 +0000: louis.williams pointed out WT-6627 to me, could these be related?

gregory.wlodarek commented on Wed, 13 Jan 2021 21:12:00 +0000: I've attached some FTDC data from the reproducer. I found out that WiredTiger can return WT_ROLLBACK on write transactions to free up cache space, but looking at the FTDC data we're only using 0.281% of the cache at the time, so I don't believe that's the problem. Does WiredTiger impose some sort of per-transaction limit on large transactions that forces them to be rolled back?

gregory.wlodarek commented on Wed, 13 Jan 2021 18:53:51 +0000: I've verified Bruce's observations. The write conflict exceptions are being thrown when inserting operations into the oplog collection. In the reproducer provided by Bruce, there are no concurrent operations running at the time of the transaction, so I'm not sure why we're getting the write conflict exception. From what I can see, when we're committing the transaction there are multiple oplog entries associated with it (identical txnNumber), as it's a large transaction and gets split up.
The first oplog entry inserts successfully into the table, with the RecordId key 6917308536905007105 and the following contents:

{
  "op": {
    "lsid": {
      "id": { "$uuid": "94c1804b-cd1f-4091-b201-173215e52c14" },
      "uid": { "$binary": { "base64": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=", "subType": "0" } }
    },
    "txnNumber": 0,
    "op": "c",
    "ns": "admin.$cmd",
    "o": {
      "applyOps": [
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff376f524b3165ab754a2b" }, "x": "xxxxx.........xxxxx" } },
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff376f524b3165ab754a2c" }, "x": "xxxxx.........xxxxx" } },
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff376f524b3165ab754a2d" }, "x": "xxxxx.........xxxxx" } },
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff376f524b3165ab754a2e" }, "x": "xxxxx.........xxxxx" } },
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff376f524b3165ab754a2f" }, "x": "xxxxx.........xxxxx" } },
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff376f524b3165ab754a30" }, "x": "xxxxx.........xxxxx" } },
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff376f524b3165ab754a31" }, "x": "xxxxx.........xxxxx" } },
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff376f524b3165ab754a32" }, "x": "xxxxx.........xxxxx" } },
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff376f524b3165ab754a33" } } }
      ]
    }
  },
  "ts": { "$timestamp": { "t": 1610561399, "i": 1 } }
}

The insertion of the second oplog entry into the table is the one that is consistently returning WT_ROLLBACK (-31800) for me, resulting in the write conflict exception being thrown.
For this entry, we use the RecordId key 6917308536905007106, with the following contents:

{
  "op": {
    "lsid": {
      "id": { "$uuid": "94c1804b-cd1f-4091-b201-173215e52c14" },
      "uid": { "$binary": { "base64": "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=", "subType": "0" } }
    },
    "txnNumber": 0,
    "op": "c",
    "ns": "admin.$cmd",
    "o": {
      "applyOps": [
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff3770524b3165ab75860e" }, "x": "xxxxx.........xxxxx" } },
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff3770524b3165ab75860f" }, "x": "xxxxx.........xxxxx" } },
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff3770524b3165ab758610" }, "x": "xxxxx.........xxxxx" } },
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff3770524b3165ab758611" }, "x": "xxxxx.........xxxxx" } },
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff3770524b3165ab758612" }, "x": "xxxxx.........xxxxx" } },
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff3770524b3165ab758613" }, "x": "xxxxx.........xxxxx" } },
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff3770524b3165ab758614" }, "x": "xxxxx.........xxxxx" } },
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff3770524b3165ab758615" }, "x": "xxxxx.........xxxxx" } },
        { "op": "i", "ns": "test.c", "ui": { "$uuid": "1f4d0d9a-27b3-495c-bd37-20f94e452e10" }, "o": { "_id": { "$oid": "5fff3770524b3165ab758616" } } }
      ]
    }
  },
  "ts": { "$timestamp": { "t": 1610561399, "i": 2 } }
}

During this time there have been no other operations with the RecordId key 6917308536905007106, so I don't see how we can be getting a WT_ROLLBACK from WiredTiger. I'm going to send this over to WiredTiger for further investigation. This issue also reproduces consistently for me on v4.9.
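For reference, the splitting described above is also visible from a client: each chunk of a large transaction appears in local.oplog.rs as its own applyOps command entry sharing the transaction's lsid and txnNumber. The following is a small, hypothetical inspection sketch, not part of the original thread; the connection string is an assumption, and it requires a direct connection to a replica-set member with read access to the local database.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Sorts;
import org.bson.Document;

import java.util.List;

public class OplogTxnInspector {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017/?replicaSet=rs0")) {
            MongoCollection<Document> oplog =
                    client.getDatabase("local").getCollection("oplog.rs");
            // Transaction commits are written as applyOps command entries; a transaction
            // too large for a single entry is split into several entries that share the
            // same lsid/txnNumber, which is what the comment above observes.
            for (Document entry : oplog.find(Filters.exists("o.applyOps"))
                                       .sort(Sorts.ascending("ts"))) {
                Document o = entry.get("o", Document.class);
                List<?> ops = o.getList("applyOps", Object.class);
                System.out.printf("ts=%s txnNumber=%s ops=%d%n",
                        entry.get("ts"), entry.get("txnNumber"), ops.size());
            }
        }
    }
}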
bruce.lucas@10gen.com commented on Mon, 21 Dec 2020 18:17:46 +0000: It looks like the write conflict exceptions are occurring while inserting the operations into the oplog.

(gdb) bt
#0  0x000055da5f67ef98 in mongo::WriteConflictException::WriteConflictException() ()
#1  0x000055da5f7ea8b4 in mongo::wtRCToStatus_slow(int, char const*) ()
#2  0x000055da5f7d5b1a in mongo::WiredTigerRecordStore::_insertRecords(mongo::OperationContext*, mongo::Record*, mongo::Timestamp const*, unsigned long) ()
#3  0x000055da5f7d5c74 in mongo::WiredTigerRecordStore::insertRecords(mongo::OperationContext*, std::vector >*, std::vector > const&) ()
#4  0x000055da6000c7a3 in mongo::CollectionImpl::insertDocumentsForOplog(mongo::OperationContext*, std::vector >*, std::vector > const&) ()
#5  0x000055da600739b8 in mongo::repl::_logOpsInner(mongo::OperationContext*, mongo::NamespaceString const&, std::vector >*, std::vector > const&, mongo::Collection*, mongo::repl::OpTime, mongo::Date_t) ()
#6  0x000055da600742a8 in mongo::repl::logOp(mongo::OperationContext*, mongo::repl::MutableOplogEntry*) ()
#7  0x000055da5fe165a4 in mongo::(anonymous namespace)::logOperation(mongo::OperationContext*, mongo::repl::MutableOplogEntry*) ()
#8  0x000055da5fe181b0 in mongo::(anonymous namespace)::logApplyOpsForTransaction(mongo::OperationContext*, mongo::repl::MutableOplogEntry*, boost::optional, boost::optional, bool) [clone .constprop.961] ()
#9  0x000055da5fe1a7da in mongo::(anonymous namespace)::logOplogEntriesForTransaction(mongo::OperationContext*, std::vector >*, std::vector > const&, unsigned long, bool) ()
#10 0x000055da5fe1df93 in mongo::OpObserverImpl::onUnpreparedTransactionCommit(mongo::OperationContext*, std::vector >*, unsigned long) ()
#11 0x000055da5f756ce5 in mongo::OpObserverRegistry::onUnpreparedTransactionCommit(mongo::OperationContext*, std::vector >*, unsigned long) ()
#12 0x000055da600ae2bb in mongo::TransactionParticipant::Participant::commitUnpreparedTransaction(mongo::OperationContext*) ()
#13 0x000055da5fc452dc in mongo::(anonymous namespace)::CmdCommitTxn::run(mongo::OperationContext*, std::__cxx11::basic_string, std::allocator > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&) ()
#14 0x000055da603ebe5a in mongo::BasicCommandWithReplyBuilderInterface::Invocation::run(mongo::OperationContext*, mongo::rpc::ReplyBuilderInterface*) ()
#15 0x000055da603e639f in mongo::CommandHelpers::runCommandInvocation(mongo::OperationContext*, mongo::OpMsgRequest const&, mongo::CommandInvocation*, mongo::rpc::ReplyBuilderInterface*) ()
#16 0x000055da5fb8e8b1 in mongo::(anonymous namespace)::invokeWithSessionCheckedOut(mongo::OperationContext*, mongo::OpMsgRequest const&, mongo::CommandInvocation*, mongo::OperationSessionInfoFromClient const&, mongo::rpc::ReplyBuilderInterface*) ()
#17 0x000055da5fb8f828 in mongo::(anonymous namespace)::runCommandImpl(mongo::OperationContext*, mongo::CommandInvocation*, mongo::OpMsgRequest const&, mongo::rpc::ReplyBuilderInterface*, mongo::LogicalTime, mongo::ServiceEntryPointCommon::Hooks const&, mongo::BSONObjBuilder*, mongo::OperationSessionInfoFromClient const&) ()
#18 0x000055da5fb91ff9 in mongo::(anonymous namespace)::execCommandDatabase(mongo::OperationContext*, mongo::Command*, mongo::OpMsgRequest const&, mongo::rpc::ReplyBuilderInterface*, mongo::ServiceEntryPointCommon::Hooks const&) ()
#19 0x000055da5fb932e0 in mongo::(anonymous namespace)::receivedCommands(mongo::OperationContext*, mongo::Message const&, mongo::ServiceEntryPointCommon::Hooks const&) ()
#20 0x000055da5fb93f6d in mongo::ServiceEntryPointCommon::handleRequest(mongo::OperationContext*, mongo::Message const&, mongo::ServiceEntryPointCommon::Hooks const&) ()
#21 0x000055da5fb8195c in mongo::ServiceEntryPointMongod::handleRequest(mongo::OperationContext*, mongo::Message const&) ()
#22 0x000055da5fb8c48c in mongo::ServiceStateMachine::_processMessage(mongo::ServiceStateMachine::ThreadGuard) ()
#23 0x000055da5fb8a1c5 in mongo::ServiceStateMachine::_runNextInGuard(mongo::ServiceStateMachine::ThreadGuard) ()
#24 0x000055da5fb8b176 in std::_Function_handler::_M_invoke(std::_Any_data const&) ()
#25 0x000055da60e47b22 in mongo::transport::ServiceExecutorSynchronous::schedule(std::function, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName) ()
#26 0x000055da5fb895db in mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName, mongo::ServiceStateMachine::Ownership) ()
#27 0x000055da5fb8a93d in mongo::ServiceStateMachine::_sourceCallback(mongo::Status) ()
#28 0x000055da5fb8aa20 in auto mongo::future_details::call)#2}&, mongo::StatusWith >(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::{lambda(mongo::StatusWith)#2}&, mongo::StatusWith&&) ()
#29 0x000055da5fb8aca5 in mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard) ()
#30 0x000055da5fb8a23a in mongo::ServiceStateMachine::_runNextInGuard(mongo::ServiceStateMachine::ThreadGuard) ()
#31 0x000055da5fb8b176 in std::_Function_handler::_M_invoke(std::_Any_data const&) ()
#32 0x000055da60e48188 in std::_Function_handler, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName)::{lambda()#2}>::_M_invoke(std::_Any_data const&) ()
#33 0x000055da61197b06 in std::_Function_handler)::{lambda()#3}>::_M_invoke(std::_Any_data const&) ()
#34 0x000055da61197b74 in mongo::(anonymous namespace)::runFunc(void*) ()
#35 0x00007f18e63da6db in start_thread (arg=0x7f18bb27d700) at pthread_create.c:463
#36 0x00007f18e610388f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

eric.sedor commented on Mon, 21 Dec 2020 18:05:57 +0000: Thanks o.boudet@gmail.com, we're passing this report on to an appropriate team to investigate further.
bruce.lucas@10gen.com commented on Mon, 21 Dec 2020 16:21:48 +0000: I was able to reproduce this with the following:

function repro() {
    const doc = {x: "x".repeat(1000)}
    const many = new Array(20000).fill(doc)
    const session = db.getMongo().startSession()
    db = session.getDatabase("test")
    db.createCollection("c")
    for (var i = 0; i < 10; i++) {
        try {
            session.startTransaction({readConcern: {level: "snapshot"}})
            db.c.insertMany(many)
            session.commitTransaction()
            print("success")
        } catch (e) {
            print(e)
        }
    }
}

- Removing documents is not necessary to reproduce the problem.
- Interestingly, the bulk insert fails on the first try, succeeds on the second, then fails on most successive tries, but succeeds occasionally.
- Does not reproduce as readily if you cut either the size of the string or the number of documents in half, but it still occasionally fails; however, those occasional failures are on the insert, not on the commit.
- Neither the serverStatus counter nor the slow query log message shows any write conflicts, even though the error returned from the commit is a write conflict error.
- Fails in the same way if the bulk insert is replaced by a loop inserting the same number of documents individually.
- Fails on 4.4.0 and 4.4.1 as well.

esha.bhargava commented on Mon, 21 Dec 2020 15:04:07 +0000: o.boudet@gmail.com, seems like this is a change in server behavior, so we've moved this to the SERVER project.
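For completeness, a rough Java-driver equivalent of the shell reproducer above, which also takes Spring out of the picture. This is a sketch, not code from the thread: the connection string, database, and collection names are assumptions, and fresh documents are built on every attempt so successful runs do not collide on _id.

import com.mongodb.MongoException;
import com.mongodb.ReadConcern;
import com.mongodb.TransactionOptions;
import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

public class LargeTxnRepro {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017/?replicaSet=rs0")) {
            MongoCollection<Document> c = client.getDatabase("test").getCollection("c");
            String payload = "x".repeat(1000); // Java 11+
            for (int i = 0; i < 10; i++) {
                // Same shape as the shell repro: 20,000 documents of roughly 1 KB each.
                List<Document> many = new ArrayList<>();
                for (int j = 0; j < 20000; j++) {
                    many.add(new Document("x", payload));
                }
                try (ClientSession session = client.startSession()) {
                    session.startTransaction(
                            TransactionOptions.builder().readConcern(ReadConcern.SNAPSHOT).build());
                    c.insertMany(session, many);
                    session.commitTransaction();
                    System.out.println("success");
                } catch (MongoException e) {
                    // Per the report, on 4.4 the commit intermittently fails with code 112
                    // (WriteConflict) carrying the TransientTransactionError label.
                    System.out.println(e.getCode() + " " + e.getErrorLabels() + " " + e.getMessage());
                }
            }
        }
    }
}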