System environment
System: CentOS release 6.6
Kernel: 2.6.32-504.el6.x86_64
Memory: 256G
Disk: PCIe SSD
File system: xfs
MongoDB version: 3.4.10

The config server in my cluster is a replica set with three members; one of the secondaries shows some slow queries in its log file. When I executed sh.disableAutoSplit(), even more slow queries appeared in that secondary's log, and the ops on all members rose to 2k/s.

2017-12-19T20:14:12.749+0800 I COMMAND [conn4490] command config.settings command: find { find: "settings", filter: { _id: "autosplit" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1513685647000|1, t: 9 } }, limit: 1, maxTimeMS: 30000 } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:391 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 4997ms
2017-12-19T20:14:12.749+0800 I COMMAND [conn4585] command config.settings command: find { find: "settings", filter: { _id: "balancer" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1513685647000|1, t: 9 } }, limit: 1, maxTimeMS: 30000 } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:404 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 4997ms
2017-12-19T20:14:12.749+0800 I COMMAND [conn4602] command config.settings command: find { find: "settings", filter: { _id: "chunksize" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1513685647000|1, t: 9 } }, limit: 1, maxTimeMS: 30000 } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:392 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 4989ms
2017-12-19T20:14:12.749+0800 I COMMAND [conn4583] command config.settings command: find { find: "settings", filter: { _id: "balancer" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1513685647000|1, t: 9 } }, limit: 1, maxTimeMS: 30000 } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:404 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 4989ms
2017-12-19T20:14:12.749+0800 I COMMAND [conn4580] command config.settings command: find { find: "settings", filter: { _id: "chunksize" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1513685647000|1, t: 9 } }, limit: 1, maxTimeMS: 30000 } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:392 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 4994ms
2017-12-19T20:14:12.749+0800 I COMMAND [conn4545] command config.settings command: find { find: "settings", filter: { _id: "autosplit" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1513685647000|1, t: 9 } }, limit: 1, maxTimeMS: 30000 } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:391 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 4997ms
2017-12-19T20:14:12.749+0800 I COMMAND [conn4600] command config.settings command: find { find: "settings", filter: { _id: "balancer" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1513685647000|1, t: 9 } }, limit: 1, maxTimeMS: 30000 } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:404 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 4963ms
2017-12-19T20:14:12.749+0800 I COMMAND [conn4451] command config.settings command: find { find: "settings", filter: { _id: "balancer" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1513685647000|1, t: 9 } }, limit: 1, maxTimeMS: 30000 } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:404 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 4988ms
2017-12-19T20:14:12.749+0800 I COMMAND [conn4572] command config.settings command: find { find: "settings", filter: { _id: "chunksize" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1513685647000|1, t: 9 } }, limit: 1, maxTimeMS: 30000 } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:392 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 4976ms
2017-12-19T20:14:12.749+0800 I COMMAND [conn4387] command config.settings command: find { find: "settings", filter: { _id: "balancer" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1513685647000|1, t: 9 } }, limit: 1, maxTimeMS: 30000 } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:404 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 4908ms
2017-12-19T20:14:12.749+0800 I COMMAND [conn4578] command config.settings command: find { find: "settings", filter: { _id: "chunksize" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1513685647000|1, t: 9 } }, limit: 1, maxTimeMS: 30000 } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:392 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 4974ms
2017-12-19T20:14:12.749+0800 I COMMAND [conn4607] command config.settings command: find { find: "settings", filter: { _id: "autosplit" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1513685647000|1, t: 9 } }, limit: 1, maxTimeMS: 30000 } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:391 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 4990ms
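Note: every slow entry above is the same single-document read that mongos periodically issues against config.settings. A minimal sketch of that read shape, run from the mongo shell against a config server, is shown below. The filter, limit and maxTimeMS are copied from the log entries; the afterOpTime values are illustrative (in practice they are the latest config optime known to the mongos), and whether an externally supplied afterOpTime is honored is an assumption here, not something confirmed in this ticket.

    // Sketch: reproduce the find issued by mongos against config.settings
    db.getSiblingDB("config").runCommand({
        find: "settings",
        filter: { _id: "autosplit" },          // also queried with "balancer" and "chunksize"
        readConcern: {
            level: "majority",
            // illustrative optime taken from the log lines above
            afterOpTime: { ts: Timestamp(1513685647, 1), t: NumberLong(9) }
        },
        limit: 1,
        maxTimeMS: 30000
    })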
dmitry.agranat commented on Sun, 4 Mar 2018 11:59:21 +0000:
Hi ufozhangliang@163.com,
Thank you for providing the requested information. As suspected, when there is replication lag, the reported query on the config server is slow. This is expected, as explained in my previous comment. Looking at two such occurrences (the times presented above are in UTC; A and B represent the start of the slow query, which correlates with the lag):
2018-03-02T15:38:23.554+0800 I COMMAND [conn405762] command config.settings command: find { find: "settings", filter: { _id: "balancer" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1519976298000|5, t: 9 } }, limit: 1, maxTimeMS: 30000 } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:404 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 4998ms
2018-03-02T15:45:23.509+0800 I COMMAND [conn405848] command config.settings command: find { find: "settings", filter: { _id: "chunksize" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1519976718000|4, t: 9 } }, limit: 1, maxTimeMS: 30000 } planSummary: IDHACK keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:392 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 4998ms
I am going to close this ticket as "works as designed". However, I'd like to mention one abnormality I've noticed:
Server BJ-DB-107 is 3.2.13
Server ZW-10-0-1-76 is 3.4.10
Server BJ-DB-109 is 3.4.10
Mixing different major MongoDB versions in a replica set is not recommended (except during an upgrade). Fixing this misconfiguration might also resolve the periodic lag in your config server replica set.
Thanks,
Dima

ufozhangliang@163.com commented on Fri, 2 Mar 2018 09:46:31 +0000:
The files have been uploaded.

dmitry.agranat commented on Wed, 28 Feb 2018 13:53:17 +0000:
Hi ufozhangliang@163.com,
Thank you for providing the requested information. I'd like to validate that there is no replication lag between your config servers. Please note that the read uses afterOpTime in combination with readConcern: majority, so the query will block, holding no locks, until a majority of members have reached the specified optime. Please archive (tar or zip) the $dbpath/diagnostic.data directory from all the config servers, covering the time when the query is very slow on the config server, and upload it to this ticket, along with a fresh set of config server logs from a period when the query is very slow.
Thanks,
Dima

ufozhangliang@163.com commented on Fri, 23 Feb 2018 09:56:27 +0000:
Sorry for the delayed reply. I have uploaded the logs of all nodes. All nodes have this problem.

thomas.schubert commented on Fri, 16 Feb 2018 22:39:41 +0000:
Hi ufozhangliang@163.com,
We still need additional information to diagnose the problem. If this is still an issue for you, would you please address the questions in Mark's most recent comment?
Thank you,
Kelsey

mark.agarunov commented on Tue, 23 Jan 2018 21:54:25 +0000:
Hello ufozhangliang@163.com,
Thank you for the additional information, and my apologies for the delay in replying. Unfortunately we have not yet been able to determine the cause of this behavior. If possible, could you provide the complete logs from all nodes in the cluster? Additionally, is this behavior causing adverse issues such as performance regressions? Do all the nodes experience the same issue, or only this specific node?
Thanks,
Mark
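Note: to check Dmitry's point above about replication lag between the config servers (an afterOpTime + readConcern: majority read blocks until a majority of members have reached that optime), lag can be inspected from the mongo shell connected to the config server replica set. The following is only a sketch built on the standard rs.status() fields, not a command taken from this ticket; rs.printSlaveReplicationInfo() gives a similar summary in 3.4.

    // Sketch: print how far each config server member is behind the primary
    var s = rs.status();
    var primary = s.members.filter(function (m) { return m.stateStr === "PRIMARY"; })[0];
    s.members.forEach(function (m) {
        var lagSecs = (primary.optimeDate - m.optimeDate) / 1000;   // Date subtraction yields milliseconds
        print(m.name + " (" + m.stateStr + "): " + lagSecs + "s behind the primary");
    });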
ufozhangliang@163.com commented on Wed, 27 Dec 2017 09:20:51 +0000:
If autosplit is re-enabled, there are fewer of these slow queries than when it is disabled, but they still occur. With autosplit re-enabled, the ops drop a lot, and fewer slow queries show up in the logs.

mark.agarunov commented on Tue, 26 Dec 2017 19:35:38 +0000:
Hello ufozhangliang@163.com,
Thank you for providing this information. After looking over the logs and diagnostic data, it seems that the slow queries are correlated with saslStart and saslContinue commands. This leads me to suspect that the slow queries may be due to a slow response from the SASL authentication. Was this behavior present in any capacity before disabling autosplitting, and does it continue if autosplit is re-enabled?
Thanks,
Mark

mark.agarunov commented on Thu, 21 Dec 2017 18:32:29 +0000:
Hello ufozhangliang@163.com,
Thank you for the report. To get a better idea of what may be causing this behavior, could you please provide the complete log files from all affected nodes, and archive (tar or zip) the $dbpath/diagnostic.data directory and upload it to this ticket? This should give some insight into the issue.
Thanks,
Mark

ufozhangliang@163.com commented on Wed, 20 Dec 2017 11:57:18 +0000:
The config server queries take about 2s, and sometimes they block our client queries.
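Note: for reference on the re-enabling discussed above, auto-splitting is toggled from a mongos with the standard shell helpers, and the documents these slow reads target can be inspected directly. This is a generic sketch of the usual helpers and config.settings contents, not something specific to this ticket.

    // Sketch: run on a mongos
    sh.enableAutoSplit()        // sets { _id: "autosplit", enabled: true } in config.settings
    sh.getBalancerState()       // true if the balancer is enabled
    // Inspect the autosplit, balancer and chunksize documents
    // (the chunksize document exists only if a non-default chunk size was ever set)
    db.getSiblingDB("config").settings.find()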