...
We are running a mongodb v3.6 replSet for several months (or even years) already without any problems so far. However, recently our database client, which is using mongodb java driver v3.11 for a decent time already, got the following error: ERROR 2020-06-07 18:31:11,705 [ttp-nio-8080-exec-13] Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is com.mongodb.MongoQueryException: Query failed with error code 261 and error message 'cannot add session into the cache' on server mongo-opensearch-03.db00.pro06.eu.idealo.com:27017] with root cause [DirectJDKLog.java:175] com.mongodb.MongoQueryException: Query failed with error code 261 and error message 'cannot add session into the cache' on server mongo-opensearch-03.db00.pro06.eu.idealo.com:27017 at com.mongodb.operation.FindOperation$1.call(FindOperation.java:735) It turned out that the default of 1 million logical sessions had been reached. We even could not kill any sesssion: opensearch:PRIMARY> db.runCommand( { killAllSessions: [ ] } ) { "ok" : 0, "errmsg" : "cannot add session into the cache", "code" : 261, "codeName" : "TooManyLogicalSessions" } We had to restart mongod in order to reset all logical sessions. opensearch:SECONDARY> db.aggregate( [ { $listLocalSessions: { allUsers: true } },{$group:{_id:"$user",count:{$sum:1}}} ] ) { "_id" : "admin@admin", "count" : 1 } { "_id" : "mms@admin", "count" : 3 } We then upgraded from mongodb v3.6 to 4.0.18 to see if this solves the issue. It did not. After 12 days, the number of logical sessions is still constantly increasing: opensearch:SECONDARY> db.aggregate( [ { $listLocalSessions: { allUsers: true } },{$group:{_id:"$user",count:{$sum:1}}} ] ) { "_id" : "opensearch@admin", "count" : 115 } { "_id" : "admin@admin", "count" : 2388 } { "_id" : "mms@admin", "count" : 1370 } I'll attach also a screenshot of our dashboard where you see that the number of logical sessions is constantly increasing. The user "opensearch" is our usual database client. The number of logical sessions of 125 is reasonably low. The user "mms" is mongodb's CloudManager agent. It's been updated recently to the newest version. However, the number of logical sessions of 1370 is quite high already. Although we have been running dozens of mongodb v3.6 replSets in production for several years now, this has never happened before. Since everything is rolled out automatically, and therefore uses identical configuration, we cannot explain the error and therefore suspect a mongodb bug.
kay.agahd@idealo.de commented on Wed, 15 Jul 2020 16:51:14 +0000: Hi guys, I found the reason for the constantly increasing number of sessions. I had to remove the following parameter from the config and restart mongod: sharding.clusterRole: shardsvr Since then the number of sessions is almost constant, going little up and down. I think this setting has been wrongly copied & pasted from a shard config. And because it's working for so long time already, nobody thought of misconfiguration. Maybe mongod could print a warning if it's detecting such a misconfiguration as you do already with other detected misconfigurations. So, you can close this ticket. Thanks for your time! kay.agahd@idealo.de commented on Fri, 10 Jul 2020 06:32:20 +0000: Hi Dmitry, no, this deployment never was part of a sharded cluster in the past. Also, the config.system.sessions collection does not exist: MongoDB server version: 3.6.17 opensearch:PRIMARY> use config switched to db config opensearch:PRIMARY> db.system.sessions.stats() { "ns" : "config.system.sessions", "ok" : 0, "errmsg" : "Collection [config.system.sessions] not found." } opensearch:PRIMARY> This is the number of current sessions on the primary: opensearch:PRIMARY> db.aggregate( [ { $listLocalSessions: { allUsers: true } },{$group:{_id:"$user",count:{$sum:1}}} ] ) { "_id" : "mms@admin", "count" : 15125 } { "_id" : "admin@admin", "count" : 8184 } { "_id" : "opensearch@admin", "count" : 264003 } dmitry.agranat commented on Thu, 9 Jul 2020 10:06:37 +0000: Thanks kay.agahd@idealo.de, a couple of follow up questions: Was this deployment a part of a sharded cluster in the past? Does the config.system.sessions collection exist? kay.agahd@idealo.de commented on Wed, 8 Jul 2020 21:39:31 +0000: Hi Dmitry, here are the results from the query you asked me to execute again: opensearch:PRIMARY> db.aggregate( [ { $listLocalSessions: { allUsers: true } },{$group:{_id:"$user",count:{$sum:1}}} ] ) { "_id" : "mms@admin", "count" : 14970 } { "_id" : "admin@admin", "count" : 8115 } { "_id" : "opensearch@admin", "count" : 254256 } opensearch:SECONDARY> db.aggregate( [ { $listLocalSessions: { allUsers: true } },{$group:{_id:"$user",count:{$sum:1}}} ] ) { "_id" : "opensearch@admin", "count" : 230 } { "_id" : "mms@admin", "count" : 3177 } { "_id" : "admin@admin", "count" : 5646 } opensearch:SECONDARY> db.aggregate( [ { $listLocalSessions: { allUsers: true } },{$group:{_id:"$user",count:{$sum:1}}} ] ) { "_id" : "opensearch@admin", "count" : 70 } { "_id" : "admin@admin", "count" : 1577 } { "_id" : "mms@admin", "count" : 3438 } The version of the mms agents has been at the time I wrote this ticket 10.14.6.6374 but has now be updated to the latest version (10.15.2.6472). The link to our project is https://cloud.mongodb.com/v2/4f5f582287d1d86fa8b88186 The link to the replSet this ticket is about is: https://cloud.mongodb.com/v2/4f5f582287d1d86fa8b88186#metrics/replicaSet/5e50561fedfef105da6670ab/overview We verified again our software and couldn't find any unclosed cursors. dmitry.agranat commented on Wed, 8 Jul 2020 08:32:35 +0000: Hi kay.agahd@idealo.de, I am taking over this case from Carl and getting myself familiar with the issue. It is not clear at the moment where the session leak is coming from, driver, server or CM agent. Could you execute again this command on all 3 members and post results here? This will give us another indication (since the last time this command was executed) which user is responsible for the most session creation: db.aggregate( [ { $listLocalSessions: { allUsers: true } },{$group:{_id:"$user",count:{$sum:1}}} ] ) Also, can you check if your application calls either of these methods? We'd like to make sure there is no a situation where either a com.mongodb.DBCursor or a com.mongodb.client.MongoCursor is not closed by the application, e.g: MongoCursor cursor = collection.find().iterator(); while (cursor.hasNext()) { System.out.println(cursor.next()); } // cursor not closed! We know that mongod version is 4.0.18, Java driver version is 3.11, what is the CM agent version? You can post a link to your project so we can have a look. kay.agahd@idealo.de commented on Tue, 7 Jul 2020 15:38:28 +0000: May I ask if you need any additional information for your investigation? kay.agahd@idealo.de commented on Wed, 24 Jun 2020 05:16:30 +0000: The requested log and diagnostic files of the other two replSet members have now also been uploaded. carl.champain commented on Tue, 23 Jun 2020 19:04:53 +0000: Yes, please provide the logs and diagnostic files for the other two members! kay.agahd@idealo.de commented on Tue, 23 Jun 2020 18:55:54 +0000: Hi carl.champain, I've uploaded the requested files. I can send also log and diagnostic files of the two other replSet members if you need them. Thank you! carl.champain commented on Tue, 23 Jun 2020 17:58:43 +0000: Hi kay.agahd@idealo.de, Thank you for the report. Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) covering this behavior and upload them to this support uploader location? Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. Kind regards, Carl