...
In shard_remote, the maxTimeMS of a query is used as the request timeout. If the time limit is exceeded at the network layer, NetworkInterfaceExceededTimeLimit is returned instead of MaxTimeMSExpired (e.g. here, here).
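The behavior described above can be illustrated with a toy model. This is a minimal Python sketch, not server code: `run_remote`, `RemoteResult`, `pool_wait_ms`, and `exec_ms` are hypothetical names, and the model assumes the error code depends only on which layer notices the deadline first.

```python
from dataclasses import dataclass


@dataclass
class RemoteResult:
    code_name: str   # error code name, or "OK" on success
    elapsed_ms: int  # time at which the result was produced


def run_remote(max_time_ms: int, pool_wait_ms: int, exec_ms: int) -> RemoteResult:
    """Toy model in which maxTimeMS doubles as the network-layer request timeout.

    Assumption: pool_wait_ms is time spent before the command reaches the
    remote node (for example, waiting for a pooled connection), and exec_ms
    is the time the remote node would need to run the command.
    """
    if pool_wait_ms >= max_time_ms:
        # The network layer notices the deadline first.
        return RemoteResult("NetworkInterfaceExceededTimeLimit", max_time_ms)
    if pool_wait_ms + exec_ms >= max_time_ms:
        # The remote node notices the deadline first.
        return RemoteResult("MaxTimeMSExpired", max_time_ms)
    return RemoteResult("OK", pool_wait_ms + exec_ms)
```

In both failure branches the caller has waited the full maxTimeMS, yet it sees two different error codes depending on where the time was spent.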
shane.harvey commented on Thu, 5 May 2022 18:51:45 +0000:
max.hirschhorn@mongodb.com brought to my attention some subtlety that I missed in the previous question. If the client sends a maxTimeMS of 60 seconds and the server only waits 30 seconds before returning MaxTimeMSExpired, that behavior seems unexpected. In this case it should be fine to return a non-MaxTimeMSExpired error code since the time limit hasn’t actually expired. Ideally MaxTimeMSExpired would only be returned when maxTimeMS has actually expired.
Note that the converse (i.e. "when maxTimeMS expires the server always returns MaxTimeMSExpired") isn't necessarily true. It's expected that the server may return other error codes. For example, the server could be in the process of returning a shutdown error when maxTimeMS expires, in which case returning ShutdownInProgress is fine.

shane.harvey commented on Fri, 29 Apr 2022 20:08:34 +0000:
It would be great for the server to return MaxTimeMSExpired instead of NetworkInterfaceExceededTimeLimit. Clients expect the MaxTimeMSExpired error to be returned when a request exceeds MaxTimeMS.

JIRAUSER1263891 commented on Fri, 29 Apr 2022 16:24:38 +0000:
Hello shane.harvey@mongodb.com, after discussing this SERVER ticket with both Max and Sam we've identified two scenarios which might concern drivers:
1. If a command sets a MaxTimeMS of 60 seconds, the connection pool might return NetworkInterfaceExceededTimeLimit.
2. If a command sets a MaxTimeMS of 60 seconds, but the connection pool has a refreshTimeout of 30, it might return the same NetworkInterfaceExceededTimeLimit.
This exception only tells us that a connection wasn't available within the requested timeframe. The underlying reason is hidden from the requester, and we believe the request should be retried. The easiest approach would be to have it rewritten to MaxTimeMSExpired at either the client or the sharding layer.
I hope you can answer in terms of client expectations: whether it would make sense to rewrite one exception into the other and, if so, whether that could be done even if the specified timeout was shorter than what the command specified, or whether the executor should add retry logic itself. Please let me know if all of that makes sense,

JIRAUSER1263891 commented on Tue, 26 Apr 2022 17:33:47 +0000:
After discussing this with amirsaman.memaripour@mongodb.com, MaxTimeMSExpired relates only to client operations.
executor
    mongo::executor::ConnectionPool::SpecificPool::getConnection
    mongo::executor::ConnectionPool::get
    mongo::executor::NetworkInterfaceTL::startCommand
    mongo::executor::ThreadPoolTaskExecutor::scheduleRemoteCommandOnAny
    mongo::executor::ShardingTaskExecutor::scheduleRemoteCommandOnAny
    mongo::executor::TaskExecutor::scheduleRemoteCommand
client
    mongo::RemoteCommandRetryScheduler::_schedule_inlock
    mongo::RemoteCommandRetryScheduler::startup
    mongo::Fetcher::schedule
shard/client
    mongo::ShardRemote::_runExhaustiveCursorCommand
Based on the above stack trace, I think the exception should go on RemoteCommandRetryScheduler, which is a client's concern.

JIRAUSER1263891 commented on Tue, 12 Apr 2022 22:48:50 +0000:
PR: https://github.com/10gen/mongo/pull/4597

JIRAUSER1261330 commented on Tue, 12 Apr 2022 11:28:34 +0000:
I have recently seen BF-24814, which seems to be related to this. It happens on the {taskExecutorPoolSize: 4} build variant. I will temporarily allow the NetworkInterfaceExceededTimeLimit code and leave a TODO for this ticket.

JIRAUSER1263891 commented on Mon, 11 Apr 2022 22:47:20 +0000:
SERVER-43155 tells a compelling story for BF-20554.
The BF, which still occurs both on 5.0 and at the tip of the repository, triggers `deferredStateUpdateFunc`, a lambda inside `src/mongo/executor/connection_pool.cpp` which sweeps through `requests`, checking each request's `Date_t` expiration time and possibly assigning the `NetworkInterfaceExceededTimeLimit` error to its associated promise. The error surfaces while running `jstests/core/txns/many_txns.js` on a `rhel80-small` instance. The test, which runs concurrently with other tests, stresses the host by committing many parallel machines, and then querying for them. Given the load, query commands may exceed their `maxTimeMS` and may get `NetworkInterfaceExceededTimeLimit` mistakenly assigned as a result. It is worth mentioning that this could not be reproduced locally.

max.hirschhorn@10gen.com commented on Fri, 8 Apr 2022 04:03:08 +0000:
Reopening this ticket because the MaxTimeMSExpired error code (50) is semantically meaningful to drivers (e.g. the ExecutionTimeout exception type, and the behavior of adding the UnknownTransactionCommitResult label to commitTransaction with maxTimeMS) and must be used for errors related to using maxTimeMS. See also SERVER-35031.

JIRAUSER1262719 commented on Thu, 24 Feb 2022 21:55:19 +0000:
We haven’t heard back from you for at least one calendar year, so this issue is being closed. If this is still an issue for you, please provide additional information and we will reopen the ticket.

david.storch commented on Mon, 9 Sep 2019 15:45:26 +0000:
The query team believes that this should be triaged by the service architecture team.
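
The rule discussed in this thread (report MaxTimeMSExpired only once maxTimeMS has really elapsed, and leave every other error untouched) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the server's implementation: `client_facing_code` and its parameters are hypothetical, and error codes are represented by their names only.

```python
from typing import Optional


def client_facing_code(code_name: str, elapsed_ms: int,
                       max_time_ms: Optional[int]) -> str:
    """Decide which error code name to surface to the client.

    Assumption: elapsed_ms is the time spent on the operation so far and
    max_time_ms is the client's maxTimeMS (None if the client set none).
    """
    timed_out_in_network_layer = code_name == "NetworkInterfaceExceededTimeLimit"
    budget_spent = max_time_ms is not None and elapsed_ms >= max_time_ms
    if timed_out_in_network_layer and budget_spent:
        # The client's own time limit has expired, so report it as such.
        return "MaxTimeMSExpired"
    # A shorter internal timeout fired, or this is an unrelated error
    # (for example ShutdownInProgress): keep the original code.
    return code_name
```

With a maxTimeMS of 60 seconds, a network-layer timeout observed after 60 seconds becomes MaxTimeMSExpired, while the same error observed after 30 seconds (the refreshTimeout scenario) is left as is and remains a candidate for retry.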