...
This ticket is an extension of SERVER-62272 and uses the same reproduction code. Once sharding has been enabled for a collection that contains documents failing schema validation, and no chunks have successfully moved from the donor shard to a recipient shard, the recipient shard's collection info cannot be modified using collMod. For example, given a 2-shard sharded cluster:

MongoDB Enterprise mongos> sh.status()
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("61ccb31408be338b6e44a5c0")
  }
  shards:
        {  "_id" : "shard01",  "host" : "shard01/localhost:27018,localhost:27019,localhost:27020",  "state" : 1 }
        {  "_id" : "shard02",  "host" : "shard02/localhost:27021,localhost:27022,localhost:27023",  "state" : 1 }
  active mongoses:
        "4.4.10" : 1
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled: yes
        Currently running: no
        Failed balancer rounds in last 5 attempts: 0
        Migration Results for the last 24 hours:
                18 : Failed with error 'aborted', from shard01 to shard02
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
        {  "_id" : "test",  "primary" : "shard01",  "partitioned" : true,  "version" : {  "uuid" : UUID("776a14de-913b-4ed4-a1c1-6533fbbf1ab7"),  "lastMod" : 1 } }
                test.c3
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                shard01  17

If we issue a collMod while connected to the cluster via a mongos, the primary shard will be updated, but any other shards that had this collection created via the shardCollection command will not be. For example, if we send the following command, we would expect validationLevel to be updated to off from the default of strict:

MongoDB Enterprise mongos> db.getSiblingDB("test").runCommand({
...     collMod: "c3",
...     validator: { $jsonSchema: {
...         bsonType: "object",
...         properties: {
...             i: {
...                 bsonType: "string",
...                 description: "must be a string"
...             },
...         }
...     } },
...     "validationLevel" : "off"
... })
{
        "raw" : {
                "shard01/localhost:27018,localhost:27019,localhost:27020" : {
                        "ok" : 1
                }
        },
        "ok" : 1,
        "operationTime" : Timestamp(1640805401, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1640805401, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}

If we now connect to each shard individually and run db.getCollectionInfos(), the result shows that shard01 in our cluster has the updated collection metadata whereas shard02 does not:

MongoDB Enterprise mongos> var db1 = new Mongo(db.getSiblingDB("config").shards.findOne({ _id: "shard01" }).host)
MongoDB Enterprise mongos> db1.getDB("test").getCollectionInfos()
[
        {
                "name" : "c3",
                "type" : "collection",
                "options" : {
                        "validator" : {
                                "$jsonSchema" : {
                                        "bsonType" : "object",
                                        "properties" : {
                                                "i" : {
                                                        "bsonType" : "string",
                                                        "description" : "must be a string"
                                                }
                                        }
                                }
                        },
                        "validationLevel" : "off",
                        "validationAction" : "error"
                },
                "info" : {
                        "readOnly" : false,
                        "uuid" : UUID("2669f807-557f-4cb9-8d21-3ca5d1bc744f")
                },
                "idIndex" : {
                        "v" : 2,
                        "key" : {
                                "_id" : 1
                        },
                        "name" : "_id_"
                }
        }
]
MongoDB Enterprise mongos> var db2 = new Mongo(db.getSiblingDB("config").shards.findOne({ _id: "shard02" }).host)
MongoDB Enterprise mongos> db2.getDB("test").getCollectionInfos()
[
        {
                "name" : "c3",
                "type" : "collection",
                "options" : {
                        "validator" : {
                                "$jsonSchema" : {
                                        "bsonType" : "object",
                                        "properties" : {
                                                "i" : {
                                                        "bsonType" : "string",
                                                        "description" : "must be a string"
                                                }
                                        }
                                }
                        },
                        "validationLevel" : "strict",
                        "validationAction" : "error"
                },
                "info" : {
                        "readOnly" : false,
                        "uuid" : UUID("2669f807-557f-4cb9-8d21-3ca5d1bc744f")
                },
                "idIndex" : {
                        "v" : 2,
                        "key" : {
                                "_id" : 1
                        },
                        "name" : "_id_"
                }
        }
]
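
To check every shard at once from the same mongos session, a small helper along these lines can be used. This is only a sketch built from the shell calls shown above; the compareCollOptions name is illustrative and is not a server or shell built-in.

// Sketch: print the mutable collection options for a collection on every shard.
// Run from a mongos shell with access to the config database.
function compareCollOptions(dbName, collName) {
    db.getSiblingDB("config").shards.find().forEach(function(shard) {
        // Connect directly to the shard (host is a replica set connection string).
        var infos = new Mongo(shard.host).getDB(dbName).getCollectionInfos({ name: collName });
        var options = infos.length > 0 ? infos[0].options : null;
        print(shard._id + ": " + tojson(options === null ? null : {
            validationLevel: options.validationLevel,
            validationAction: options.validationAction
        }));
    });
}

compareCollOptions("test", "c3");
// With the repro above, shard01 would be expected to report validationLevel "off"
// while shard02 still reports "strict".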
jack.mulrow commented on Fri, 22 Jul 2022 20:24:54 +0000:

I didn't finish this before my sabbatical, so I'm moving it to the backlog. That said, I have a couple of things to add:

"I believe the history here is that relying on the shard version protocol avoids the complexity needed to swallow NamespaceNotFound errors if the shard doesn't own any chunks for the sharded collection."

That's partially true, but the main motivation was that, when we didn't use the shard version protocol, index operations had the same problem as multi-writes: they can apply 0, 1, or more times on a particular shard under an unlucky interleaving with chunk migrations, despite returning ok:1 to the user. Using the shard version protocol was meant to avoid that.

For point 1), I agree an option is to revert the collMod-related changes from SERVER-45103, but I want to note that I don't believe we need to revert the create/dropIndexes-related changes, and that if we did, that would reintroduce the problem I mentioned above.

For 2), we solved a similar issue for create/dropIndexes by having a shard receiving its first chunk drop any indexes it has locally that are not on the donor shard (SERVER-44914). We could likely extend that logic so a shard receiving its first chunk will also drop the chunk's collection if it exists locally, so that it syncs the collection from the donor with the latest options.

As for 3), someone should verify this, but to handle create/dropIndexes commands running concurrently with chunk migrations and leaving divergent indexes, we made create/dropIndexes abort any active chunk migrations when they complete, so a migration can only finish if the collection's indexes haven't changed since they were copied at the beginning of the migration; we made the same change for collMod. So I don't believe this is an issue (beyond the problem from point 2, which I do believe is a problem).

And finally, the switch to using DDL coordinators for these operations likely fixed a lot of these issues, so it's possible these problems only exist on earlier branches.

kaloian.manassiev commented on Sun, 16 Jan 2022 10:22:07 +0000:

CC marcos.grillo, who is also working on something related to collMod, which is now written as a DDL coordinator.

max.hirschhorn@10gen.com commented on Wed, 29 Dec 2021 21:22:59 +0000:

"collMod command not sent to all shards for a sharded collection if no chunks have been received"

The collMod command not targeting a shard if it doesn't own chunks for the collection and isn't the primary shard for the database is by design (SERVER-44720). This behavior change to rely on the shard version protocol was done together with targeting changes for the createIndexes and dropIndexes commands. (I believe the history here is that relying on the shard version protocol avoids the complexity needed to swallow NamespaceNotFound errors if the shard doesn't own any chunks for the sharded collection.)

Failed chunk migrations don't clean up the created collection. Moreover, nothing in the system remembers that the chunk migration was attempted and didn't succeed. (The routing table only records shards which actively own chunks for the sharded collection.) The recipient shard therefore may have a stale notion of the sharded collection.

While I don't think we would want to go back to having collMod target all shards in the cluster, I think there are a few related problems involving collMod here:

1. collMod can modify the collection options but isn't guaranteed to always target the primary shard. This means running the listCollections command through mongos may return a stale notion of the mutable collection options (e.g. validator, validationLevel, validationAction). The change to stop targeting the primary shard if it doesn't own any chunks was done under SERVER-45103 for the createIndexes, dropIndexes, and collMod commands. I think we should either revert the changes from SERVER-45103 or consider changing the listCollections command to always target a shard which owns a chunk for the sharded collection (i.e. target the MinKey-chunk-owning shard like we do for listIndexes). CC jack.mulrow, cheahuychou.mao

   Edit: Realized changing the targeting for listCollections would be too complicated because that could be a different shard for each sharded collection, and the primary shard is already convenient for handling unsharded collections. Reverting SERVER-45103 seems like the only option to me.

   (A rough shell sketch illustrating this problem follows at the end of this comment.)

2. A recipient shard which had already created the (empty) sharded collection in a previous failed chunk migration won't attempt to update its collection options by comparing with those of the donor shard, like it does for the set of indexes. This can leave shards with permanently divergent collection options (much like can happen for shards' sets of existing indexes in the face of concurrent createIndexes or dropIndexes). SERVER-32722 was closed as "Works as designed" but didn't leave any additional context. I feel the proposal in its description makes sense and is required here.

3. collMod may run concurrently with a recipient shard receiving a chunk for the sharded collection for the first time. This means (i) the recipient shard may read the current collection options, (ii) the donor shard may update its collection options, and (iii) the recipient shard never learns of the new collection options. SERVER-60694 incidentally(?) addresses this by having the collMod command acquire the distributed lock on the sharded collection namespace, which chunk migration also acquires. There is an additional problem with the listCollections command the recipient shard runs on the donor shard being speculative: it could read a version of the collection options which ultimately rolls back on the donor shard. (However, it is probably fair to expect the client to retry with the same new collection options again.) Having chunk migration rely on SERVER-62007 and SERVER-62008 would avoid this issue, though.
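
To make problem 1) concrete, the following is a rough shell sketch (not part of any proposed fix) that compares the options mongos reports via listCollections with the options on the shard that owns the MinKey chunk. It assumes a mongos shell session and the test.c3 collection from the repro above, and the config.chunks query assumes the pre-5.0 schema where chunks are keyed by ns (as in the 4.4 repro); the variable names are illustrative.

// Options as seen through mongos (listCollections is routed to the primary shard).
var viaMongos = db.getSiblingDB("test").getCollectionInfos({ name: "c3" })[0].options;

// Options as seen on the shard owning the MinKey chunk (the shard listIndexes would target).
var minKeyChunk = db.getSiblingDB("config").chunks.findOne({ ns: "test.c3", "min._id": MinKey });
var owner = db.getSiblingDB("config").shards.findOne({ _id: minKeyChunk.shard });
var viaShard = new Mongo(owner.host).getDB("test").getCollectionInfos({ name: "c3" })[0].options;

printjson({ viaMongos: viaMongos.validationLevel, viaMinKeyChunkOwner: viaShard.validationLevel });
// If collMod skipped the primary shard because it owns no chunks, these two values
// can differ, which is the stale listCollections result described in problem 1).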