...
The TemporarilyUnavailable error indicates that the operation has been aborted, likely due to excessive server load (e.g. a transaction rolled back for eviction). The server retries this error with an increasingly larger backoff. Internal operations are retried indefinitely; user operations are retried up to a fixed number of attempts before TemporarilyUnavailable is returned to the client.

------

Original title: Instead of WriteConflict, return a more specialized error when oldest transactions are rolled back for eviction

Original description: Currently, when a write operation hits the WiredTiger dirty threshold limit, we take the error from WiredTiger, a WT_ROLLBACK, and up-convert it to a WriteConflict. This is misleading; the server should report something more specific that indicates the actual reason.
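The retry policy in the summary above can be illustrated with a minimal Python sketch. It is not the server's implementation: the function name `retry_with_backoff`, the exponential doubling, the delay bounds, and the default attempt limit are all assumptions made for illustration. The ticket only states that the backoff grows, that internal operations retry indefinitely, and that user operations retry a fixed number of times.

```python
import time


class TemporarilyUnavailable(Exception):
    """Hypothetical stand-in for the server's TemporarilyUnavailable error."""


def retry_with_backoff(op, *, is_user_operation, max_user_attempts=5,
                       base_delay_s=0.001, max_delay_s=1.0, sleep=time.sleep):
    """Run op(), retrying on TemporarilyUnavailable with a growing backoff.

    Internal operations (is_user_operation=False) retry indefinitely.
    User operations give up after max_user_attempts failed attempts and
    let the error escape to the caller.
    All numeric defaults are illustrative assumptions.
    """
    attempt = 0
    while True:
        try:
            return op()
        except TemporarilyUnavailable:
            attempt += 1
            if is_user_operation and attempt >= max_user_attempts:
                raise  # surface the error to the client
            # Assumed policy: double the delay on each failure, capped.
            delay = min(base_delay_s * 2 ** (attempt - 1), max_delay_s)
            sleep(delay)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a caller would normally leave it at the default.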
xgen-internal-githook commented on Tue, 15 Feb 2022 23:53:52 +0000:
Author: {'name': 'Josef Ahmad', 'email': 'josef.ahmad@mongodb.com', 'username': 'josefahmad'}
Message: SERVER-60839 Add TemporarilyUnavailable error
Introduce a TemporarilyUnavailable error and exception type for load shedding. This error indicates that the operation has been aborted, likely due to excessive server load. Errors are retried with an increasingly larger backoff. Internal operations are retried indefinitely, user operations up to a fixed number of attempts.
Branch: master
https://github.com/mongodb/mongo/commit/581c58c475a872e25b2e3bf7cf5ccd52425ef7c7

louis.williams commented on Mon, 7 Feb 2022 09:13:21 +0000:
kevin.jernigan, there are 2 cases to consider:
1. Tests that use multi-document transactions handle WriteConflictExceptions as a TransientTransactionError and retry indefinitely. This is what we tell users to do, and in fact, newer drivers do this automatically for users.
2. For operations outside multi-document transactions, this error is currently retried indefinitely inside the server. The proposed behavior is to retry a finite number of times before eventually letting it escape.
The problem here is that our multi-document transaction tests were designed to handle this type of error, but the rest of our tests (i.e. most of them) are not.

JIRAUSER1258778 commented on Fri, 4 Feb 2022 17:52:37 +0000:
When this condition happens today, i.e. when a write operation hits the WiredTiger dirty threshold limit, we convert to a WriteConflict. How do we handle this in our test infrastructure? Don't we fail entire tests for commands that aren't retryable? If so, what changes if we return a more specialized error for this condition? Won't the same tests fail that would fail without the changes in this ticket?

xgen-internal-githook commented on Wed, 2 Feb 2022 13:56:40 +0000:
Author: {'name': 'Josef Ahmad', 'email': 'josef.ahmad@mongodb.com', 'username': 'josefahmad'}
Message: SERVER-60839 Make wtRCToStatus require a WT_SESSION pointer
This is groundwork for further differentiating WT return codes.
Branch: master
https://github.com/mongodb/mongo/commit/f4aaa34d623e7385b2ac5b332ee07ece1f22c428

xgen-internal-githook commented on Wed, 2 Feb 2022 13:56:38 +0000:
Author: {'name': 'Josef Ahmad', 'email': 'josef.ahmad@mongodb.com', 'username': 'josefahmad'}
Message: SERVER-60839 Make wtRCToStatus require a WT_SESSION pointer
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/7cfa78a4e20eb59c4d592bb12b6493c451b8dd13

milkie commented on Fri, 14 Jan 2022 20:38:28 +0000:
Thanks for the clarifications; I modified the title of this ticket for better specificity. Should we close SERVER-61454 as a duplicate?

louis.williams commented on Fri, 14 Jan 2022 09:06:25 +0000:
milkie, after discussing with keith.smith, he confirmed that there is only one scenario for a transaction being rolled back due to pinning cache space, and that is the "oldest pinned transaction ID rolled back for eviction". The "synchronous" case you described is just a generalization of the asynchronous case. When a very large transaction pins cache space and is unable to evict pages, WiredTiger will start to roll back transactions, starting from the oldest, until it gets to the large one. So these two cases that you described are not distinguishable from WiredTiger's perspective.

milkie commented on Thu, 13 Jan 2022 13:26:26 +0000:
It sounds like this ticket is starting to overlap with SERVER-61454.
There are actually two similar cases for transaction rollback: one is asynchronous, performed by other threads doing eviction, and is based on transaction ID age; the other, I believe, is synchronous within the transaction thread itself once that transaction pins too many pages with uncommitted writes, regardless of transaction age. I was assuming this ticket, SERVER-60839, was dealing with the latter situation. In any event, I think we should treat these two cases differently with respect to retry logic.

louis.williams commented on Thu, 13 Jan 2022 10:17:16 +0000:
We should consider retrying internally once or twice in the existing writeConflictRetry path before ultimately letting this error escape. Additionally, we are considering labeling this error code as retryable so that drivers can retry once on their end. We won't be able to let this error escape internal threads. We can only let the error escape for user-originating operations.

louis.williams commented on Wed, 5 Jan 2022 09:29:09 +0000:
Using the work from WT-8290, we can now call WT_SESSION::get_rollback_reason after receiving a WT_ROLLBACK. If the reason is "oldest pinned transaction ID rolled back for eviction", we will return an error code indicating that the operation exceeded a memory limit. Perhaps the existing ExceededMemoryLimit would be a good error code to use.
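
The classification step discussed in this thread (inspect the rollback reason after a WT_ROLLBACK and pick a more specific error) might look roughly like the Python sketch below. It models the reason as a plain string rather than calling WiredTiger, and the function name `error_for_rollback` is hypothetical. The returned names follow the ticket summary (TemporarilyUnavailable) rather than the earlier ExceededMemoryLimit suggestion; the real server logic lives in C++ and may differ.

```python
from typing import Optional

# Reason string quoted in the comments above.
EVICTION_ROLLBACK_REASON = "oldest pinned transaction ID rolled back for eviction"


def error_for_rollback(reason: Optional[str]) -> str:
    """Map a rollback reason to an error name (illustrative only).

    A rollback caused by eviction of the oldest pinned transaction is
    reported as TemporarilyUnavailable; any other rollback keeps the
    existing WriteConflict classification.
    """
    if reason == EVICTION_ROLLBACK_REASON:
        return "TemporarilyUnavailable"
    return "WriteConflict"
```

Comparing against the exact reason string is an assumption of this sketch; it simply mirrors the condition stated in the comment.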