...
I have a sharded cluster with 4000 sharded collections. After executing stepdown on one of the shards, many transactions fail with errors like:

    (StaleConfig) Transaction 6ba3e857-e289-4ab1-a63c-c038a18bfc6c:614 was aborted on statement 1 due to: an error from cluster data placement change :: caused by :: Encountered error from xx.xx.xx.xx:xxxx during a transaction :: caused by :: epoch mismatch detected for xx.xx, the collection may have been dropped and recreated

The config server's log shows:

    [PeriodicShardedIndexConsistencyChecker] Attempt 0 to check index consistency for millionGroup.g_m_version1 received StaleShardVersion error :: caused by :: StaleConfig{ ns: "millionGroup.g_m_version1", vReceived: Timestamp(1, 3), vReceivedEpoch: ObjectId('6189f321bbcd3f66776bbe8a'), vWanted: Timestamp(0, 0), vWantedEpoch: ObjectId('000000000000000000000000') }: epoch mismatch detected for millionGroup.g_m_version1, the collection may have been dropped and recreated

Similarly, after adding a shard to the sharded cluster, many transactions fail with:

    (StaleConfig) Transaction 324be44f-a3d4-4ee5-9fc4-9bbff6d53ffe:25 was aborted on statement 0 due to: an error from cluster data placement change :: caused by :: Encountered error from xx.xx.xx.xx:xxxx during a transaction :: caused by :: version mismatch detected for xx.xx

For the latter case, `jstests/sharding/transactions_stale_shard_version_errors.js` explains that transaction failure is expected behavior after a chunk migration. I can work around both problems by executing findOne (readpref is PrimaryMode) on each collection before executing transactions after a stepdown or chunk migration.

So my questions and suggestions are:

1. Is it an expected behavior that the first transaction executed on each collection is aborted after the stepdown is complete?
2. Why can't the catalog cache (or something else) on the shard be updated in time to ensure that the transaction will not be aborted because of epoch/version mismatch?

max.hirschhorn@10gen.com commented on Tue, 6 Jun 2023 15:31:44 +0000:

We haven’t heard back from you for some time, so I'm going to close this ticket. If this is still an issue for you or if there are further questions, then please provide additional information and we will reopen the ticket.

max.hirschhorn@10gen.com commented on Mon, 5 Dec 2022 16:00:40 +0000:

Hi beatjean1314@gmail.com,

It looks like you were already able to determine some of the system behaviors by reading through the server codebase and tests. Nice! I wanted to take a moment and clarify the answers to your initial questions and double check the overall behavior you are seeing.

1. Is it an expected behavior that the first transaction executed on each collection is aborted after the stepdown is complete?

Yes, for sharded collections and {$readPreference: {mode: "primary"}} it is likely that the first statement will trigger a stale shard version exception (StaleConfig, StaleEpoch, etc.). This is true both for operations which happen within a transaction and for operations which happen outside a transaction. The difference between the two is whether mongos can automatically retry.

It is straightforward for mongos to automatically and solely retry operations outside transactions because no work has been done for the operation if it fails with a stale shard version: the shard version check happens before any work has been done for the operation. It is similarly straightforward for mongos to automatically retry the first statement within a multi-statement transaction because no work has yet been done for the transaction as a whole if it fails on the first statement. SERVER-39704 would be an improvement allowing mongos to automatically retry non-first statements within a multi-statement transaction.

In cases where mongos is not prepared to automatically retry for a transaction, the error response will also include a "TransientTransactionError" label. When using the WithTransaction() API, the driver will automatically start a new transaction and (re)run its statements. It is strongly recommended to always use the WithTransaction() driver API over the low-level StartTransaction() API for this reason.

Would you clarify for me whether you are seeing WithTransactionExample() return a non-nil error? Application errors would be unexpected because the driver is capable of automatically retrying in this scenario. I'd like to make certain I have understood your question to be about the presence of the StaleConfig log messages and not about an application error.

Each member of the replica set shard tracks the sharding metadata for its collections in memory, independently of the other members. So if a shard-versioned request has never been sent to a given member, that member won't have initialized the sharding metadata and will first need to recover it from the config server.

2. Why can't the catalog cache (or something else) on the shard be updated in time to ensure that the transaction will not be aborted because of epoch/version mismatch?

Great question, and there is some positive news in this direction. SERVER-55412 was an enhancement to the mirrored reads for cache warming feature from MongoDB 4.4 which enables secondaries to have initialized the sharding metadata in memory already, even when the application is using {$readPreference: {mode: "primary"}}. The default sample rate for mirrored reads is 1%, so it is still possible for a secondary which steps up to be the new primary to be stale, but the window for it being stale is at least narrower. There are other ideas for making secondaries more proactive that the Sharding team is considering exploring.
However, if the secondary member never becomes the primary and is never targeted for reads either, then some work was wasted, and that's part of the tradeoff here.

JIRAUSER1269413 commented on Tue, 26 Jul 2022 10:15:44 +0000:

I found tickets related to this issue. SERVER-39624 states that if a transaction hits a stale version error on some shards, the transaction is aborted and TransientTransactionError is returned to the client. The WithTransaction method in the driver automatically retries the entire transaction on TransientTransactionError, but the core API in the driver does not retry automatically, so there is still a problem here. SERVER-39704 looks like it is dedicated to following up on this issue, but it doesn't seem to have been resolved.

JIRAUSER1269413 commented on Mon, 18 Jul 2022 11:04:11 +0000:

Hi Chris Kelly,

With some testing and code reading, I have made some progress. I found that the problem only occurs on newly started Mongod processes: after the StepDown completes, if a newly started Mongod process becomes the primary, the problem occurs. According to the code, the request sent by Mongos to Mongod carries the correct shard version, while the newly started Mongod process regards all collections as UNSHARDED. The two are inconsistent, so the version check fails and the transaction is aborted. The newly started Mongod process treats all collections as unsharded because std::list<...> _metadata; is empty.

JIRAUSER1265262 commented on Fri, 1 Jul 2022 09:44:09 +0000:

We haven’t heard back from you for some time, so I’m going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket.

JIRAUSER1265262 commented on Fri, 10 Jun 2022 13:49:08 +0000:

Hi beatjean1314@gmail.com,

We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide the requested information?

Regards,
Christopher

JIRAUSER1265262 commented on Tue, 17 May 2022 13:00:09 +0000:

Hi beatjean1314@gmail.com,

Thank you for your patience. To start answering your questions:

1) It doesn't appear that this is normal; in fact, there is an eerily similar error that occurred with the same workaround of doing a single find() in SERVER-28019. This duplicated SERVER-27286, which was eventually attributed to a PHP driver passing in a BatchSize: 0 instead of BatchSize: null in transactions by default. In your case, there's a chance something similar could be going on. A workaround that was used for this behavior before was to simply upgrade the driver you're using.

2) While not exactly the same output, this does seem to be normal behavior according to the docs on what happens to transactions during chunk migration. However, there is also a chance this could be related to the issue above.

In order to get any more information, we'd need more log data to see what's going on. Could you please provide logs covering the following two events:

- Executing the transaction with findOne() - specify the timestamp
- Executing the transaction without findOne() - specify the timestamp

Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location?

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Regards,
Christopher

Tested on the following versions: 4.2.11-rc1 - 5.0.6
OS version: CentOS 7

Reproduce code:

    package main

    import (
        "context"
        "strconv"
        "time"

        "github.com/sirupsen/logrus"
        "go.mongodb.org/mongo-driver/bson"
        "go.mongodb.org/mongo-driver/mongo"
        // The reporter's own log and mongoutil packages are imported as well;
        // their import paths are not given in the ticket.
    )

    const collNum = 4096
    const testDbName = "TestDB"

    // connectToMongos and connectMongo are the reporter's helpers (not shown).

    func main() {
        log.InitLogrus("./transactions.log")
        mongosUri := "mongodb://xxxx"
        rs0Primary := "mongodb://xxxx"

        cli, err := connectToMongos(mongosUri)
        if err != nil {
            logrus.Error(err)
            return
        }
        defer cli.Disconnect(context.Background())

        // Shard every test collection on a hashed "name" key.
        for i := 0; i <= collNum; i++ {
            coll := "test" + strconv.Itoa(i)
            err = mongoutil.EnableShard(cli, testDbName, coll, bson.M{"name": "hashed"}, false)
            if err != nil {
                logrus.Error(err)
                return
            }
        }

        // Step the shard primary down after 10 seconds while transactions run.
        go doStepDown(rs0Primary, 10*time.Second)
        doTransactions(cli)
    }

    func doStepDown(uri string, delay time.Duration) {
        time.Sleep(delay)
        cli, err := connectMongo(context.Background(), uri)
        if err != nil {
            logrus.Error(err)
            return
        }
        result := cli.Database("admin").RunCommand(context.Background(), bson.M{"replSetStepDown": 120})
        if result.Err() != nil {
            logrus.Error(result.Err())
            return
        }
        logrus.Infof("run stepdown command for %s success", uri)
    }

    func doTransactions(cli *mongo.Client) {
        for {
            WithTransactionExample(cli, testDbName)
        }
    }

The WithTransactionExample function is copied from https://docs.mongodb.com/manual/core/transactions/#transactions-api