...
If an index build on a hashed index is concurrent with any collection writes, the index may be incorrectly marked multikey. As a result, this index cannot be used as the shard key on a sharded collection.

Original title: shardCollection fails with "couldn't find valid index for shard key" despite index existing

Original description: I'm attempting to shard a number of existing collections. They were all created and populated in a standalone mongod which has now been converted to a sharded cluster. Some of the collections have sharded successfully, others have failed with the above error. A possible cause of this was that the indexes were in the process of being created (with the background: true option) when the shardCollection command was first run, due to the script used not waiting for the background creation to complete.
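One way to see whether an index carries the multikey flag is to look at the isMultiKey field that explain() reports on index-scan stages. The helper below is a minimal sketch, not part of the original report: the function name multikeyFlagsFor and the trimmed sampleExplain document are hypothetical, and the exact shape of explain output on a given server version or through mongos is an assumption, which is why the helper walks the whole document instead of relying on a fixed path.

```javascript
// Hypothetical helper: recursively search an explain() document for IXSCAN
// stages that use the named index and collect their isMultiKey flags.
function multikeyFlagsFor(explainDoc, indexName) {
  const flags = [];
  const visit = (node) => {
    if (Array.isArray(node)) {
      node.forEach(visit);
      return;
    }
    if (node === null || typeof node !== "object") {
      return;
    }
    if (node.stage === "IXSCAN" &&
        node.indexName === indexName &&
        typeof node.isMultiKey === "boolean") {
      flags.push(node.isMultiKey);
    }
    // Descend into every nested field (inputStage, shards, and so on).
    Object.values(node).forEach(visit);
  };
  visit(explainDoc);
  return flags;
}

// Hypothetical, heavily trimmed explain document for illustration only.
// It is not output captured from the cluster in this ticket.
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: {
        stage: "IXSCAN",
        keyPattern: { item: "hashed" },
        indexName: "item_hashed",
        isMultiKey: true
      }
    }
  }
};

const sampleFlags = multikeyFlagsFor(sampleExplain, "item_hashed"); // [true]
```

In the shell, the document passed in would presumably come from something like db.diagnosticsmetrics.find({ item: someValue }).hint("item_hashed").explain(), where someValue is a placeholder. A true flag on a hashed index would be consistent with the behavior summarized above.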
carl.champain commented on Tue, 16 Jun 2020 18:31:23 +0000:
Hi gavin.aiken@netcuras.com,
To investigate a specific issue as a bug we would want to understand in detail what has happened:
- You mentioned that the log files have rotated, so do you think you could reproduce the behavior and share the full mongod.log and mongos.log files
- Please share the output of sh.status()
- Please connect directly to each shard and share the outputs of getIndexes()
We've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.
Kind regards,
Carl

gavin.aiken@netcuras.com commented on Wed, 10 Jun 2020 20:23:38 +0000:
Looks like I can't upload the full log files. I don't have the mongod.log for the time when I first ran the sharding command, the log files have already been rotated and removed, so that's impossible. I do have the full mongos.log but even gzipped it is too big to upload. I have removed some lines from it which were obviously irrelevant, particularly the ones for connections opening and closing such as these below, and uploaded the rest as mongos.log.gz:
2020-06-10T12:57:33.694-0600 I ACCESS [conn7837] Successfully authenticated as principal monitor on admin from client 172.28.16.20:60013
2020-06-10T12:57:33.738-0600 I NETWORK [conn7837] end connection 172.28.16.20:60013 (7 connections now open)
2020-06-10T12:57:34.686-0600 I NETWORK [listener] connection accepted from 172.28.16.20:60025 #7838 (8 connections now open)
2020-06-10T12:57:34.686-0600 I NETWORK [conn7838] received client metadata from 172.28.16.20:60025 conn7838: { driver: { name: "nodejs", version: "3.5.7" }, os: { type: "Linux", name: "linux", architecture: "x64", version: "3.10.0-327.3.1.el7.x86_64" }, platform: "'Node.js v10.13.0, LE (legacy)" }

gavin.aiken@netcuras.com commented on Wed, 10 Jun 2020 18:58:24 +0000:
I've just added a section of the mongod.log for the primary config server as well, in case that is useful.

gavin.aiken@netcuras.com commented on Wed, 10 Jun 2020 18:56:48 +0000:
I've attached sections of the log files where I ran the sh.shardCollection command a few seconds after the start of the logs, and then included all the lines for a few minutes afterwards. Let me know if you need complete log files for the last few days instead and I can provide them ASAP.

gavin.aiken@netcuras.com commented on Wed, 10 Jun 2020 18:52:23 +0000:
Do you want the whole files for the last few days, or just an extract from the files when I run the above commands? I tried watching the logs when I run the above and the only relevant line I see is this:
2020-06-10T12:50:20.352-0600 I SH_REFR [ConfigServerCatalogCacheLoader-3273] Refresh for collection metric.diagnosticsmetrics took 1 ms and found the collection is not sharded

carl.champain commented on Tue, 9 Jun 2020 17:57:01 +0000:
Hi gavin.aiken@netcuras.com,
To help us investigate what's happening, can you please provide the mongod.log and mongos.log files covering this behavior?
Thank you,
Carl

gavin.aiken@netcuras.com commented on Sun, 7 Jun 2020 17:08:35 +0000:
It is definitely not a typo. The script worked successfully for about half of the collections with the same schema, and failed on others. However, for what it's worth, here is the output from the manual commands showing that it fails:
mongos> use metric
switched to db metric
mongos> db.diagnosticsmetrics.createIndex({ item: 'hashed' })
{ "raw" : { "slc-stage-mongo11:27017" : { "numIndexesBefore" : 4, "numIndexesAfter" : 4, "note" : "all indexes already exist", "ok" : 1 } }, "ok" : 1, "operationTime" : Timestamp(1591549268, 1), "$clusterTime" : { "clusterTime" : Timestamp(1591549268, 1), "signature" : { "hash" : BinData(0,"x8qUBOA9PgeFqQUSGBOOUbGn1aA="), "keyId" : NumberLong("6829344041860071455") } } }
mongos> sh.shardCollection('metric.diagnosticsmetrics', { item: 'hashed' })
{ "ok" : 0, "errmsg" : "couldn't find valid index for shard key", "code" : 96, "codeName" : "OperationFailed", "operationTime" : Timestamp(1591549304, 4), "$clusterTime" : { "clusterTime" : Timestamp(1591549304, 4), "signature" : { "hash" : BinData(0,"Gd/VpV/fzSc0qF+t1TmxbGlVt+E="), "keyId" : NumberLong("6829344041860071455") } } }

dmitry.agranat commented on Sun, 7 Jun 2020 10:29:57 +0000:
Hi gavin.aiken@netcuras.com, thank you for the report. Could you reproduce this issue manually (w/o using your script) and post the results here? In addition, please post all commands including index creation and sh.shardCollection. I suspect there is indeed some issue with the script or a typo.
Thanks,
Dima

gavin.aiken@netcuras.com commented on Sat, 30 May 2020 12:48:43 +0000:
I tried dropping and recreating the hashed index on "item" on the "diagnosticsmetrics" collection, waiting for the index build to complete before trying to shard the collection again, but I still get the same error, "couldn't find valid index for shard key". I have checked the log on the mongos, both mongod, and the config server mongod, and see no relevant errors when I run the command.

gavin.aiken@netcuras.com commented on Thu, 28 May 2020 15:56:00 +0000:
Sorry, I accidentally created this issue before I had finished filling in all the fields and there doesn't seem to be any way for me to edit it now? Here's an example of a collection which is exhibiting the problem:
mongos> use metric
switched to db metric
mongos> db.diagnosticsmetrics.getIndices()
[
{ "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "metric.diagnosticsmetrics" },
{ "v" : 2, "key" : { "company" : 1, "timeMs" : 1 }, "name" : "company_1_timeMs_1", "ns" : "metric.diagnosticsmetrics", "background" : true },
{ "v" : 2, "unique" : true, "key" : { "item" : 1, "timeMs" : 1 }, "name" : "item_1_timeMs_1", "ns" : "metric.diagnosticsmetrics", "background" : true },
{ "v" : 2, "key" : { "item" : "hashed" }, "name" : "item_hashed", "ns" : "metric.diagnosticsmetrics", "background" : true }
]
mongos> sh.shardCollection('metric.diagnosticsmetrics', { item: 'hashed' })
{ "ok" : 0, "errmsg" : "couldn't find valid index for shard key", "code" : 96, "codeName" : "OperationFailed", "operationTime" : Timestamp(1591705370, 5), "$clusterTime" : { "clusterTime" : Timestamp(1591705370, 5), "signature" : { "hash" : BinData(0,"aBxREKsRZ8JLyK5PCYJ05KVWTWw="), "keyId" : NumberLong("6829344041860071455") } } }
This is on version 4.2.6 on RedHat Linux.
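
For readers following along, the index listing in the original report can be checked against the shard key with a small prefix test. This is only a sketch: indexesPrefixedBy is a hypothetical helper, and it models just the documented rule that a supporting index must start with the shard key fields. It knows nothing about the multikey flag, which does not appear in this listing.

```javascript
// Hypothetical helper: return the names of indexes whose key pattern starts
// with the given shard key (same fields, same order, same direction/type).
function indexesPrefixedBy(indexes, shardKey) {
  const wanted = Object.entries(shardKey);
  return indexes
    .filter((idx) => {
      const have = Object.entries(idx.key);
      return wanted.every(([field, type], i) =>
        have[i] !== undefined && have[i][0] === field && have[i][1] === type);
    })
    .map((idx) => idx.name);
}

// Key patterns and names copied from the getIndices() output in the report.
const reportedIndexes = [
  { key: { _id: 1 }, name: "_id_" },
  { key: { company: 1, timeMs: 1 }, name: "company_1_timeMs_1" },
  { key: { item: 1, timeMs: 1 }, name: "item_1_timeMs_1" },
  { key: { item: "hashed" }, name: "item_hashed" }
];

const candidates = indexesPrefixedBy(reportedIndexes, { item: "hashed" });
// candidates is ["item_hashed"]
```

Because item_hashed passes this listing-level check, the "couldn't find valid index for shard key" error has to come from a property that the listing does not show, which fits the multikey explanation in the summary at the top of the ticket.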