...
Hello Guys,

We have been facing an issue with one of our sharded MongoDB clusters (Community Edition).

Cluster MongoDB version: 4.4.18
Shards: 3
Data nodes per shard: 4 + 1 arbiter
Config nodes: 5
Individual mongos clients on all of the service machines (~18)

A read-heavy client is seeing mongos query data from only 1 shard instead of all 3 shards. The queried collection is a sharded collection. We tried running the flushRouterConfig command before the scripts that process that collection, to make sure the router refreshes its cache and recognizes it as a sharded collection, but the behavior is intermittent.

For example, running flushRouterConfig at the database level and on that particular collection generates the following log entries:

{"t":{"$date":"2023-07-17T00:10:02.074+00:00"},"s":"I", "c":"SH_REFR", "id":24104, "ctx":"ConfigServerCatalogCacheLoader-6957","msg":"Refreshed cached collection","attr":{"namespace":"dbname.pe_20230716","newVersion":{"0":{"$timestamp":{"t":1,"i":2}},"1":{"$oid":"64a399837a590637cc1a9994"}},"oldVersion":"","durationMillis":1}}
{"t":{"$date":"2023-07-17T01:30:02.216+00:00"},"s":"I", "c":"SH_REFR", "id":24104, "ctx":"ConfigServerCatalogCacheLoader-6976","msg":"Refreshed cached collection","attr":{"namespace":"dbname.pe_20230716","newVersion":{"0":{"$timestamp":{"t":1,"i":2}},"1":{"$oid":"64a399837a590637cc1a9994"}},"oldVersion":"","durationMillis":5}}

However, a log entry from the ConfigServerCatalogCacheLoader context confirming that that collection was indeed refreshed is expected after these lines, and most of the time we do not observe it (it sometimes works). For example:

{"t":{"$date":"2023-07-18T04:18:36.325+00:00"},"s":"I", "c":"SH_REFR", "id":24104, "ctx":"ConfigServerCatalogCacheLoader-0","msg":"Refreshed cached collection","attr":{"namespace":"dbname.pe_20230730","newVersion":{"0":{"$timestamp":{"t":1,"i":2}},"1":{"$oid":"64b60e8d7a590637cc95899e"}},"oldVersion":"","durationMillis":2}}

The workaround we have resorted to after many issues is to restart mongos just before our critical processes begin, to force it to refresh and re-cache all collections and databases.

Could you advise on this?

Thanks & Regards
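For reference, the database-level and collection-level flushes described above are typically issued from a mongos shell as shown below (the dbname.pe_20230716 namespace is taken from the log entries; the exact names are illustrative):

// Run against the mongos whose cache should be refreshed.
// Flush the cached routing metadata for the whole database:
mongos> db.adminCommand({ flushRouterConfig: "dbname" })
// Flush the cached routing metadata for a single collection:
mongos> db.adminCommand({ flushRouterConfig: "dbname.pe_20230716" })

After the collection-level flush, the next operation that needs the routing table for that namespace should trigger a fresh load from the config servers.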
stephenpaul2727@gmail.com commented on Wed, 26 Jul 2023 18:59:08 +0000:
Hi Team, any updates from your end on this case?

stephenpaul2727@gmail.com commented on Fri, 21 Jul 2023 14:49:03 +0000:
I have uploaded the data now. Thanks!!

JIRAUSER1265262 commented on Thu, 20 Jul 2023 16:04:04 +0000:
Sorry, it doesn't look like we've gotten any files in that bucket. I've made a new link for you here - you can try that one and see if it works better. Alternatively, that data can be uploaded to this ticket if you like.

stephenpaul2727@gmail.com commented on Wed, 19 Jul 2023 23:47:31 +0000:
> As an aside, it looks like the balancer is disabled on wifiPe_20230719 because of 'noBalance': true. This means that MongoDB won't automatically move chunks between shards to balance the distribution of data in this collection. This also would seem to imply you are relying on the hashed shard key to balance across single jumbo chunks instead of relying on chunk migration from the mongos' balancer. Is there a compelling reason you chose to have single chunks and the balancer disabled? This is not a typical recommended setup. These huge jumbo chunks can become a bottleneck, especially if the shard key value occurs with high frequency. The mongos would definitely be interesting around the time period you are referencing.

Regarding this, yes, we have been using this approach for a while because the chunk splitter (autosplit) creates too many chunks. These dated collections are set to expire within a week to a month and are then dropped, so we don't want the additional overhead of rebalancing these huge collections affecting our cluster performance. There are other collections that are long-term, and we do let the balancer take care of those collections.

> Once we get the diagnostic data, I am going to assign this ticket to the relevant team to gauge whether there are any weird edge case behavior we could expect around this sort of deployment, whether there are any follow-up questions, or whether we want to continue investigating this as a possible bug.

I have uploaded the diagnostic data to the portal already. Could you confirm from your end? Thanks!!
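The ticket does not show how these dated collections are created. As a rough illustration of the setup described in the preceding comment, a hashed-sharded collection with one initial chunk per shard and per-collection balancing disabled can be set up like this (the database, collection, and shard-key names are taken from the thread; the commands themselves are an assumed sketch, not the reporter's actual tooling):

// Assumed setup for a daily collection (illustrative only):
mongos> sh.enableSharding("test")
// Hashed shard key with one initial chunk per shard (3 shards -> 3 chunks):
mongos> sh.shardCollection("test.wifiPe_20230719", { lineId: "hashed" }, false, { numInitialChunks: 3 })
// Disable balancing for this collection only, which sets 'noBalance': true in config.collections:
mongos> sh.disableBalancing("test.wifiPe_20230719")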
JIRAUSER1265262 commented on Wed, 19 Jul 2023 23:11:58 +0000:
stephenpaul2727@gmail.com, can you try to re-upload to that link? I didn't see any files there. Alternatively, you can upload them to the ticket directly if you like.

I don't have enough compelling evidence to implicate the mongos in anything here yet. nShards can change based on the nature of the query, where the data exists, or whether the shard key was used in the query (and a scatter-gather was performed across all shards). In itself, and without logs, this appears to be expected. If you're seeing more nShards: 1 than usual, this would be expected to be due to:
- changes in your query patterns (accessing data in the chunk on node 1 more often)
- changes in the data

Your shard key is hashed: this means that each shard should receive approximately the same number of writes, assuming the shard key has a high cardinality (many unique values). This appears to be true from your distribution across shards. If your queries often need to retrieve documents based on the shard key, and the query condition is an equality condition or an exact match (e.g., find({shardKey: value})), then the mongos router can target the query directly to the shard that holds the matching document(s), similar to writes. If that data happens to be on shard 1, that would explain an increase in queries to shard 1. However, this would seem to be unlikely if you are evenly distributing across each shard (which appears to be true).

As an aside, it looks like the balancer is disabled on wifiPe_20230719 because of 'noBalance': true. This means that MongoDB won't automatically move chunks between shards to balance the distribution of data in this collection. This also would seem to imply you are relying on the hashed shard key to balance across single jumbo chunks instead of relying on chunk migration from the mongos' balancer. Is there a compelling reason you chose to have single chunks and the balancer disabled? This is not a typical recommended setup. These huge jumbo chunks can become a bottleneck, especially if the shard key value occurs with high frequency. The mongos would definitely be interesting around the time period you are referencing.

Once we get the diagnostic data, I am going to assign this ticket to the relevant team to gauge whether there are any weird edge case behavior we could expect around this sort of deployment, whether there are any follow-up questions, or whether we want to continue investigating this as a possible bug.

Christopher

stephenpaul2727@gmail.com commented on Wed, 19 Jul 2023 22:15:48 +0000:
> It sounds like flushRouterConfig is a bit of a distraction here, does it actually help with this behavior normally? Has its behavior suddenly changed? If so, pointing out specific timestamps this occurs would also be helpful.

Regarding this, we have slow queries that are logged by mongos, and we noticed that queries against the wifiPe_2023xxxx dated collections are logged with nShards: 1 (instead of 3); this is how we noticed that mongos is only querying 1 shard instead of 3. When flushRouterConfig was run, sometimes it does refresh the collection cache, and if we run our script afterwards, nShards is 3. Sometimes, even with that command run, the mongos cache for that collection doesn't refresh. A mongos restart solves the problem in this case as of now. When mongos starts, it re-caches the config state of every database and collection, I believe.
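As an illustration of the shard-targeting behaviour discussed in the two preceding comments, the set of shards a query is sent to can also be checked directly from the mongos with explain(), independently of the slow-query log (the collection name and the lineId shard key come from the thread; otherField is a placeholder for any non-shard-key field, and the exact explain field layout varies by server version):

// Equality match on the hashed shard key: expected to be targeted at a single shard.
mongos> db.wifiPe_20230718.find({ lineId: "some-line-id" }).explain()
// Query without the shard key: expected to be broadcast, so the queryPlanner
// section of the explain output should list all three shards (rs1, rs2, rs3)
// rather than just one.
mongos> db.wifiPe_20230718.find({ otherField: "x" }).explain()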
stephenpaul2727@gmail.com commented on Wed, 19 Jul 2023 22:06:37 +0000:
Thanks for your reply chris.kelly@mongodb.com. I can confirm that the shard distribution of the collection is not the cause. We have a specific dated collection that is created by one of our applications every day, and we ensure it is created with a number of chunks equal to the shard count; later, when writing, all data is distributed equally across all 3 shards. Here is a snippet of some commands. For example, wifiPe_20230719 is the dated collection created today with 3 chunks on 3 shards respectively, which will be populated later; wifiPe_20230718 is the dated collection created yesterday, which is fully populated.

mongos> use test
switched to db test
mongos> use config
switched to db config
mongos> db.collections.findOne({_id: "test.wifiPe_20230719"})
{
  "_id" : "test.wifiPe_20230719",
  "lastmodEpoch" : ObjectId("64a78e057a590637ccd200c3"),
  "lastmod" : ISODate("1970-02-19T17:02:47.298Z"),
  "dropped" : false,
  "key" : { "lineId" : "hashed" },
  "unique" : false,
  "uuid" : UUID("997789ed-6736-4079-b6f0-5feee3ecabcd"),
  "distributionMode" : "sharded",
  "noBalance" : true
}
mongos> use test
switched to db test
mongos> db.wifiPe_20230719.getShardDistribution()
Shard rs3 at rs3/test-db-03a:27018,test-db-03b:27018,test-db-03c:27018,test-db-03d:27018
 data : 0B docs : 0 chunks : 1
 estimated data per chunk : 0B
 estimated docs per chunk : 0
Shard rs2 at rs2/test-db-02a:27018,test-db-02b:27018,test-db-02c:27018,test-db-02d:27018
 data : 0B docs : 0 chunks : 1
 estimated data per chunk : 0B
 estimated docs per chunk : 0
Shard rs1 at rs1/test-db-01a:27018,test-db-01b:27018,test-db-01c:27018,test-db-01d:27018
 data : 0B docs : 0 chunks : 1
 estimated data per chunk : 0B
 estimated docs per chunk : 0
Totals
 data : 0B docs : 0 chunks : 3
 Shard rs3 contains 0% data, 0% docs in cluster, avg obj size on shard : 0B
 Shard rs2 contains 0% data, 0% docs in cluster, avg obj size on shard : 0B
 Shard rs1 contains 0% data, 0% docs in cluster, avg obj size on shard : 0B
mongos> db.wifiPe_20230718.getShardDistribution()
Shard rs2 at rs2/test-db-02a:27018,test-db-02b:27018,test-db-02c:27018,test-db-02d:27018
 data : 106.93GiB docs : 2009122 chunks : 1
 estimated data per chunk : 106.93GiB
 estimated docs per chunk : 2009122
Shard rs3 at rs3/test-db-03a:27018,test-db-03b:27018,test-db-03c:27018,test-db-03d:27018
 data : 107.01GiB docs : 2009791 chunks : 1
 estimated data per chunk : 107.01GiB
 estimated docs per chunk : 2009791
Shard rs1 at rs1/test-db-01a:27018,test-db-01b:27018,test-db-01c:27018,test-db-01d:27018
 data : 106.85GiB docs : 2007546 chunks : 1
 estimated data per chunk : 106.85GiB
 estimated docs per chunk : 2007546
Totals
 data : 320.8GiB docs : 6026459 chunks : 3
 Shard rs2 contains 33.33% data, 33.33% docs in cluster, avg obj size on shard : 55KiB
 Shard rs3 contains 33.35% data, 33.34% docs in cluster, avg obj size on shard : 55KiB
 Shard rs1 contains 33.3% data, 33.31% docs in cluster, avg obj size on shard : 55KiB
mongos>
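For completeness, on a 4.4 cluster the owning shard of each of the three pre-created chunks can also be read straight from the chunk metadata (an illustrative query; newer server versions key config.chunks by the collection UUID rather than the ns field):

mongos> use config
switched to db config
// One document per chunk, showing its hashed-key range and the shard that owns it:
mongos> db.chunks.find({ ns: "test.wifiPe_20230719" }, { min: 1, max: 1, shard: 1 })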
JIRAUSER1265262 commented on Wed, 19 Jul 2023 21:51:13 +0000:
Thanks for the prompt reply stephenpaul2727@gmail.com. I want to preface this with the fact that asymmetric performance like this is most likely to be solved first by checking our documentation or reaching out on the community forums.

It sounds like flushRouterConfig is a bit of a distraction here; does it actually help with this behavior normally? Has its behavior suddenly changed? If so, pointing out specific timestamps when this occurs would also be helpful.

Before proceeding further, please review your sharding implementation to ensure it's being used effectively. Here are some recommendations I would like you to fully consider, and share the results of, before we engage further in a server bug investigation.

If a mongos is directing queries to one shard over another, it may be helpful to also have information such as shard distribution. See https://www.mongodb.com/docs/manual/reference/method/db.collection.getShardDistribution/ for an example. It's possible that your shard key choice or the distribution of the data within the key is causing the uneven load. A well-chosen shard key should evenly distribute writes and queries across shards. Check your shard key design and the chunk distribution with the command sh.status(). It is especially important to confirm whether you are using a monotonically increasing shard key, since a shard key on a value that increases or decreases monotonically is more likely to direct inserts to a single chunk within the cluster.

The root of this problem is often related to data distribution. If data is not evenly distributed across the shards, MongoDB will naturally end up querying the shard with the most data more often. If this is the case, you might want to consider rebalancing your shards. This could involve changing your shard key, splitting your chunks, resharding, or manually balancing your data around your existing shard key more effectively.

If your issue persists, and you suspect this is a server bug and not an implementation issue, we can proceed once the results above are provided.

Christopher

stephenpaul2727@gmail.com commented on Wed, 19 Jul 2023 21:30:05 +0000:
Hi chris.kelly@mongodb.com, thanks for taking the time with this issue. I have uploaded the diagnostic-data tgz files to the upload portal from all data nodes and config nodes. For the mongod logs, we do log some sensitive queries, so I am unsure whether I can upload those, but I will ask internally. Would you be able to do some analysis with the diagnostic info itself?
Thanks,

JIRAUSER1265262 commented on Wed, 19 Jul 2023 20:51:59 +0000:
Hi stephenpaul2727@gmail.com,
Thanks for your report. To begin investigating the issue, it would be helpful to have some diagnostic data. I've created a secure upload portal for you. Files uploaded to this portal are hosted on Box, are visible only to MongoDB employees, and are routinely deleted after some time.

For each node in the replica set, spanning a time period that includes the incident, would you please archive (tar or zip) and upload to that link:
- the mongod logs
- the $dbpath/diagnostic.data directory (the contents are described here)

Thank you!
Christopher
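As a side note on gathering the requested $dbpath/diagnostic.data directories, the configured data path can be confirmed from a shell connected to each mongod before archiving (a minimal sketch; if dbPath is not set explicitly in the configuration, the server default applies and the field below will be absent):

// Run against each mongod (not the mongos):
> db.adminCommand({ getCmdLineOpts: 1 }).parsed.storage.dbPath
// diagnostic.data is a subdirectory of that path.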