...
Since the upgrade from 4.4.4 to 5.0.3, we have observed a steadily increasing number of user assertions in the metrics. At debug log level, user assertions with the error message "HostUnreachable: Connection closed by peer" became visible. As the cause of these user assertions we identified our Kubernetes readiness probe. It runs successfully every 10 seconds and opens a TCP socket on the MongoDB port. Further analysis showed that this probe also generates "HostUnreachable" error messages in 4.4.4, but there these messages do not result in user assertions.

This is the relevant log excerpt from 4.4.4:

--->> {"t":{"$date":"2021-10-12T14:38:47.811+00:00"},"s":"D2", "c":"NETWORK", "id":22986, "ctx":"conn8778","msg":"Session from remote encountered a network error during SourceMessage","attr":{"remote":"172.23.43.233:44014","error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}}
{"t":{"$date":"2021-10-12T14:38:47.811+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn8778","msg":"Connection ended","attr":{"remote":"172.23.43.233:44014","connectionId":8778,"connectionCount":24}}

And this is the excerpt from 5.0.3:

{"t":{"$date":"2021-10-12T14:22:34.700+00:00"},"s":"D4", "c":"NETWORK", "id":4898000, "ctx":"listener","msg":"Setting initial ServiceExecutor context for client","attr":{"client":"conn3385","threadingModel":"dedicated","canUseReserved":false}}
{"t":{"$date":"2021-10-12T14:22:34.700+00:00"},"s":"D3", "c":"-", "id":5127801, "ctx":"thread3756","msg":"Setting the Client","attr":{"client":"conn3385"}}
-->> {"t":{"$date":"2021-10-12T14:22:34.700+00:00"},"s":"D2", "c":"NETWORK", "id":22986, "ctx":"conn3385","msg":"Session from remote encountered a network error during SourceMessage","attr":{"remote":"172.23.38.209:53800","error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}}}
{"t":{"$date":"2021-10-12T14:22:34.700+00:00"},"s":"D1", "c":"-", "id":23074, "ctx":"conn3385","msg":"User assertion","attr":{"error":"HostUnreachable: Connection closed by peer","file":"src/mongo/transport/service_state_machine.cpp","line":394}}
{"t":{"$date":"2021-10-12T14:22:34.700+00:00"},"s":"D1", "c":"-", "id":23074, "ctx":"conn3385","msg":"User assertion","attr":{"error":"HostUnreachable: Connection closed by peer","file":"src/mongo/transport/service_state_machine.cpp","line":536}}
{"t":{"$date":"2021-10-12T14:22:34.700+00:00"},"s":"D2", "c":"NETWORK", "id":5763901, "ctx":"conn3385","msg":"Terminating session due to error","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}}}
{"t":{"$date":"2021-10-12T14:22:34.700+00:00"},"s":"D2", "c":"NETWORK", "id":5127900, "ctx":"conn3385","msg":"Ending session","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}}}
{"t":{"$date":"2021-10-12T14:22:34.700+00:00"},"s":"D4", "c":"NETWORK", "id":4898001, "ctx":"conn3385","msg":"Resetting ServiceExecutor context for client","attr":{"client":"conn3385","threadingModel":"dedicated","canUseReserved":false}}
{"t":{"$date":"2021-10-12T14:22:34.700+00:00"},"s":"D4", "c":"NETWORK", "id":4898001, "ctx":"conn3385","msg":"Resetting ServiceExecutor context for client","attr":{"client":"conn3385","threadingModel":"dedicated","canUseReserved":false}}
{"t":{"$date":"2021-10-12T14:22:34.700+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn3385","msg":"Connection ended","attr":{"remote":"172.23.38.209:53800","uuid":"3402de9f-26f6-4a35-aff3-6806a83695c1","connectionId":3385,"connectionCount":30}}
{"t":{"$date":"2021-10-12T14:22:34.700+00:00"},"s":"D3", "c":"-", "id":5127803, "ctx":"conn3385","msg":"Released the Client","attr":{"client":"conn3385"}}
{"t":{"$date":"2021-10-12T14:22:34.700+00:00"},"s":"D3", "c":"-", "id":5127803, "ctx":"conn3385","msg":"Released the Client","attr":{"client":"conn3385"}}

Please note how the marked lines appear in both versions, but only in 5.0.3 are they followed by the two additional user assertions.

As an attempt to work around the problem, we changed the TCP socket probe to the following command:

mongo -u user -p password --authenticationDatabase admin --quiet --eval 'db.adminCommand("ping").ok'

However, the user assertions persisted.

Our questions:
Why do both commands return successfully but still generate a DEBUG message indicating an error?
Why is the DEBUG message followed by a user assertion in version 5.0.3?
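The probe's effect can also be reproduced outside Kubernetes by opening and closing a plain TCP connection while watching the user-assertion counter. This is a minimal sketch, assuming mongod listens on localhost:27017, the shell is bash with /dev/tcp support, and the same user/password credentials as in the command above:

# read the user-assertion counter before the probe
mongo -u user -p password --authenticationDatabase admin --quiet --eval 'db.serverStatus().asserts.user'

# simulate the readiness probe: open a TCP socket on the MongoDB port, then close it immediately
bash -c 'exec 3<>/dev/tcp/localhost/27017; exec 3>&-'

# read the counter again; per the observation above it rises on 5.0.3 but not on 4.4.4
mongo -u user -p password --authenticationDatabase admin --quiet --eval 'db.serverStatus().asserts.user'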
JIRAUSER1259347 commented on Mon, 27 Nov 2023 07:35:55 +0000:
Same issue on 5.0.22.

JIRAUSER1275472 commented on Fri, 13 Oct 2023 22:41:38 +0000:
Experiencing a similar behaviour on MongoDB 6.0.5 (TLS-enabled server). The user assert error below appears on a connection close via PyMongo, the Go client and the Mongo shell.

{"t":{"$date":"2023-10-13T23:13:51.279+01:00"},"s":"D2", "c":"NETWORK", "id":22986, "ctx":"conn565","msg":"Session from remote encountered a network error during SourceMessage","attr":{"remote":"10.77.154.173:41086","error":{"code":9001,"codeName":"SocketException","errmsg":"stream truncated"}}}
{"t":{"$date":"2023-10-13T23:13:51.279+01:00"},"s":"D1", "c":"ASSERT", "id":23074, "ctx":"conn565","msg":"User assertion","attr":{"error":"SocketException: stream truncated","file":"src/mongo/transport/service_state_machine.cpp","line":299}}

This makes the monitoring of user assert errors useless. Even if I turn the log level back down and enable quiet mode, the counter keeps rising on a simple connection close with the above output.

JIRAUSER1264757 commented on Wed, 9 Feb 2022 17:31:06 +0000:
Hi, I can confirm this for version 5.0.6, using mongo-ruby-driver v2.16.1 as well as PyMongo 3.12.3. For Python clients I can confirm successful query processing on the client side without any assertions or connection issues.

{"t":{"$date":"2022-02-09T16:50:03.735+00:00"},"s":"D4", "c":"NETWORK", "id":4898000, "ctx":"listener","msg":"Setting initial ServiceExecutor context for client","attr":{"client":"conn100897","threadingModel":"dedicated","canUseReserved":false}}
{"t":{"$date":"2022-02-09T16:50:03.736+00:00"},"s":"D3", "c":"-", "id":5127801, "ctx":"thread101909","msg":"Setting the Client","attr":{"client":"conn100897"}}
{"t":{"$date":"2022-02-09T16:50:03.736+00:00"},"s":"D2", "c":"COMMAND", "id":21965, "ctx":"conn100897","msg":"About to run the command","attr":{"db":"admin","commandArgs":{"hello":1,"topologyVersion":{"processId":{"$oid":"61fd4c4120b8dce937fc9bbe"},"counter":6},"maxAwaitTimeMS":10000,"$db":"admin"}}}
{"t":{"$date":"2022-02-09T16:50:03.736+00:00"},"s":"D3", "c":"FTDC", "id":23904, "ctx":"conn100897","msg":"Using maxAwaitTimeMS for awaitable hello protocol","attr":{"maxAwaitTimeMS":10000}}
{"t":{"$date":"2022-02-09T16:50:03.736+00:00"},"s":"D1", "c":"REPL", "id":21342, "ctx":"conn100897","msg":"Waiting for a hello response from a topology change or until deadline","attr":{"deadline":{"$date":"2022-02-09T16:50:13.736Z"},"currentTopologyVersionCounter":6}}
{"t":{"$date":"2022-02-09T16:50:04.125+00:00"},"s":"I", "c":"-", "id":20883, "ctx":"conn100897","msg":"Interrupted operation as its client disconnected","attr":{"opId":11445983}}
{"t":{"$date":"2022-02-09T16:50:04.126+00:00"},"s":"D3", "c":"-", "id":4892201, "ctx":"conn100897","msg":"Internal assertion","attr":{"error":{"code":279,"codeName":"ClientDisconnect","errmsg":"operation was interrupted"},"location":"{fileName:\"src/mongo/util/interruptible.h\", line:395, functionName:\"operator()\"}"}}
{"t":{"$date":"2022-02-09T16:50:04.127+00:00"},"s":"D1", "c":"-", "id":23074, "ctx":"conn100897","msg":"User assertion","attr":{"error":"ClientDisconnect: operation was interrupted","file":"src/mongo/db/repl/replication_coordinator_impl.cpp","line":2339}}
{"t":{"$date":"2022-02-09T16:50:04.127+00:00"},"s":"D1", "c":"-", "id":23074, "ctx":"conn100897","msg":"User assertion","attr":{"error":"ClientDisconnect: operation was interrupted","file":"src/mongo/util/future_impl.h","line":1104}}
{"t":{"$date":"2022-02-09T16:50:04.127+00:00"},"s":"D1", "c":"-", "id":23074, "ctx":"conn100897","msg":"User assertion","attr":{"error":"ClientDisconnect: operation was interrupted","file":"src/mongo/util/future_impl.h","line":1087}}
{"t":{"$date":"2022-02-09T16:50:04.127+00:00"},"s":"D1", "c":"-", "id":23074, "ctx":"conn100897","msg":"User assertion","attr":{"error":"ClientDisconnect: operation was interrupted","file":"src/mongo/util/future_impl.h","line":1104}}
{"t":{"$date":"2022-02-09T16:50:04.127+00:00"},"s":"D1", "c":"-", "id":23074, "ctx":"conn100897","msg":"User assertion","attr":{"error":"ClientDisconnect: operation was interrupted","file":"src/mongo/util/future_impl.h","line":1087}}
{"t":{"$date":"2022-02-09T16:50:04.127+00:00"},"s":"D1", "c":"COMMAND", "id":21962, "ctx":"conn100897","msg":"Assertion while executing command","attr":{"command":"hello","db":"admin","commandArgs":{"hello":1,"topologyVersion":{"processId":{"$oid":"61fd4c4120b8dce937fc9bbe"},"counter":6},"maxAwaitTimeMS":10000,"$db":"admin"},"error":"ClientDisconnect: operation was interrupted"}}
{"t":{"$date":"2022-02-09T16:50:04.127+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn100897","msg":"Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"hello":1,"topologyVersion":{"processId":{"$oid":"61fd4c4120b8dce937fc9bbe"},"counter":6},"maxAwaitTimeMS":10000,"$db":"admin"},"numYields":0,"ok":0,"errMsg":"operation was interrupted","errName":"ClientDisconnect","errCode":279,"reslen":242,"locks":{},"remote":"192.168.63.148:36478","protocol":"op_msg","durationMillis":0}}
{"t":{"$date":"2022-02-09T16:50:04.127+00:00"},"s":"D2", "c":"QUERY", "id":22783, "ctx":"conn100897","msg":"Received interrupt request for unknown op","attr":{"opId":11445983,"knownOps":[]}}
{"t":{"$date":"2022-02-09T16:50:04.128+00:00"},"s":"D3", "c":"-", "id":5127803, "ctx":"conn100897","msg":"Released the Client","attr":{"client":"conn100897"}}
{"t":{"$date":"2022-02-09T16:50:04.128+00:00"},"s":"D3", "c":"-", "id":5127801, "ctx":"conn100897","msg":"Setting the Client","attr":{"client":"conn100897"}}
{"t":{"$date":"2022-02-09T16:50:04.128+00:00"},"s":"D2", "c":"NETWORK", "id":22986, "ctx":"conn100897","msg":"Session from remote encountered a network error during SourceMessage","attr":{"remote":"192.168.63.148:36478","error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}}}
{"t":{"$date":"2022-02-09T16:50:04.128+00:00"},"s":"D1", "c":"-", "id":23074, "ctx":"conn100897","msg":"User assertion","attr":{"error":"HostUnreachable: Connection closed by peer","file":"src/mongo/transport/service_state_machine.cpp","line":394}}
{"t":{"$date":"2022-02-09T16:50:04.128+00:00"},"s":"D1", "c":"-", "id":23074, "ctx":"conn100897","msg":"User assertion","attr":{"error":"HostUnreachable: Connection closed by peer","file":"src/mongo/transport/service_state_machine.cpp","line":536}}
{"t":{"$date":"2022-02-09T16:50:04.128+00:00"},"s":"D2", "c":"NETWORK", "id":5763901, "ctx":"conn100897","msg":"Terminating session due to error","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}}}
{"t":{"$date":"2022-02-09T16:50:04.128+00:00"},"s":"D2", "c":"NETWORK", "id":5127900, "ctx":"conn100897","msg":"Ending session","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"}}}
{"t":{"$date":"2022-02-09T16:50:04.128+00:00"},"s":"D4", "c":"NETWORK", "id":4898001, "ctx":"conn100897","msg":"Resetting ServiceExecutor context for client","attr":{"client":"conn100897","threadingModel":"dedicated","canUseReserved":false}}
{"t":{"$date":"2022-02-09T16:50:04.128+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn100897","msg":"Connection ended","attr":{"remote":"192.168.63.148:36478","uuid":"0803cc6e-3f12-41c9-a9a5-204af0bfc1de","connectionId":100897,"connectionCount":97}}
{"t":{"$date":"2022-02-09T16:50:04.128+00:00"},"s":"D3", "c":"-", "id":5127803, "ctx":"conn100897","msg":"Released the Client","attr":{"client":"conn100897"}}

UPDATE: Also no problems on the client side for Ruby clients. There isn't even a correlation between client load and user assertions within MongoDB.

grossmueller@e-spirit.com commented on Mon, 7 Feb 2022 11:36:23 +0000:
Hi team, this issue makes the metric "db.serverStatus().asserts.user" entirely useless, as it keeps rising for no reason at all. Your documentation states that this metric is supposed to show "errors that user may generate, such as out of disk space or duplicate key" and that we "can prevent these assertions by fixing a problem with your application or deployment." As long as a "ping" command (usually used for health checks) lets the value increase, genuine problems may get masked. Is there any chance to see some progress on this in the near future?

JIRAUSER1257066 commented on Mon, 25 Oct 2021 17:33:38 +0000:
Hi grossmueller@e-spirit.com,
Thanks for your patience as we investigate this issue. I've been able to reproduce this behavior and can see clearly that there are additional user asserts in 5.0.3 compared with 4.4.10. I performed the following steps to reproduce this issue on each version:

mlaunch init --single --auth
mongo -u user -p password --authenticationDatabase admin --quiet --eval 'db.setLogLevel(5);db.adminCommand("ping").ok'
mongo -u user -p password --authenticationDatabase admin --quiet --eval 'db.getSiblingDB("admin").shutdownServer()'

I've attached mongod.4.4.10.log and mongod.5.0.3.log, which show isolated log lines from the ping command used above. Looking at the FTDC, we see 3 uasserts in 5.0.3, up from 1 uassert in 4.4.10. I'll assign this ticket to the appropriate team for further investigation.
Best,
Edwin

grossmueller@e-spirit.com commented on Thu, 14 Oct 2021 07:22:28 +0000:
@dmitry.agranat@mongodb.com Yes, the user assertions also increase when the default log verbosity of 0 is used. The number of user assertions can be retrieved from the metric "db.serverStatus().asserts.user" - so this is really the place to watch it rising (independently of the respective log message). Since the plain number does not give any insight into the reason for the assertion, I increased the log verbosity level to find that out, as described in your documentation https://docs.mongodb.com/manual/reference/command/serverStatus/#mongodb-serverstatus-serverstatus.asserts.

dmitry.agranat commented on Wed, 13 Oct 2021 13:08:46 +0000:
grossmueller@e-spirit.com, do you see the same increase in user assertions when using the default log verbosity level? Based on the logs, you are currently using log debug levels D2, D3 and D4.
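The per-command increase can also be checked directly from the asserts.user metric mentioned above, without looking at the FTDC. A rough sketch, assuming the same user/password credentials as in the reproduction commands above and the legacy mongo shell:

before=$(mongo -u user -p password --authenticationDatabase admin --quiet --eval 'db.serverStatus().asserts.user')

# the health-check command from the report; its connection closes when the shell exits
mongo -u user -p password --authenticationDatabase admin --quiet --eval 'db.adminCommand("ping").ok'

after=$(mongo -u user -p password --authenticationDatabase admin --quiet --eval 'db.serverStatus().asserts.user')

# the disconnect of the first shell invocation may itself bump the counter, so treat the delta as approximate
echo "asserts.user delta: $((after - before))"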
Set the log verbosity to 5 and execute the command:

mongo -u user -p password --authenticationDatabase admin --quiet --eval 'db.adminCommand("ping").ok'

In the logs you will find user assertions relating to HostUnreachable even though the command returns successfully.
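Those entries can be isolated from the log with a simple filter on the log id of the "User assertion" message shown above. A sketch; the path /var/log/mongodb/mongod.log is an assumption and should be adjusted to the actual log destination:

grep '"id":23074' /var/log/mongodb/mongod.log | grep HostUnreachable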