
OPERATIONAL DEFECT DATABASE
...

...
Hello, this is a continuation of SERVER-44143 (the same environment).

We are getting the following error on our PROD sharded cluster, which was migrated from Docker to other servers (without using Docker) by following the "Restore a Sharded Cluster" procedure https://docs.mongodb.com/v4.0/tutorial/restore-sharded-cluster/#d-restore-each-shard-replica-set (with mongodump/mongorestore):

===========================================
2020-01-25T14:13:31.655+0100 I SHARDING [migrateThread] migrate failed: InvalidUUID: Cannot create collection productrepository.products because we already have an identically named collection with UUID 55ab81fa-7d21-4742-8d71-f4ef8f741ec2, which differs from the donor's UUID 3db9aaae-c037-4162-b0a8-9eec312df936. Manually drop the collection on this shard if it contains data from a previous incarnation of productrepository.products
===========================================

Here are the collection UUIDs we get if we connect to each shard.

PROD:

shard1:PRIMARY> db.getCollectionInfos()
[ { "name" : "products", "type" : "collection", "options" : { }, "info" : { "readOnly" : false, "uuid" : UUID("b8dd9615-f861-4535-a434-9638f5e5c452") }, "idIndex" : { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "productrepository.products" } },

shard2:PRIMARY> db.getCollectionInfos()
{ "name" : "products", "type" : "collection", "options" : { }, "info" : { "readOnly" : false, "uuid" : UUID("55ab81fa-7d21-4742-8d71-f4ef8f741ec2") }, "idIndex" : { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "productrepository.products" } },

shard3:PRIMARY> db.getCollectionInfos()
[ { "name" : "products", "type" : "collection", "options" : { }, "info" : { "readOnly" : false, "uuid" : UUID("3db9aaae-c037-4162-b0a8-9eec312df936") }, "idIndex" : { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "productrepository.products" } },

Here we can see that the UUID of the products collection recorded in the config replica set (config.collections) is the same as the UUID the shards have cached in config.cache.collections:

configserver:PRIMARY> db.collections.find()
{ "_id" : "config.system.sessions", "lastmodEpoch" : ObjectId("5bb4b070aec28d86b2174284"), "lastmod" : ISODate("1970-02-19T17:02:47.296Z"), "dropped" : false, "key" : { "_id" : 1 }, "unique" : false, "uuid" : UUID("68a953b8-1136-4891-ac27-acac48925d00") }
{ "_id" : "productrepository.products", "lastmodEpoch" : ObjectId("5bb4b060aec28d86b2174007"), "lastmod" : ISODate("1970-02-19T17:02:47.298Z"), "dropped" : false, "key" : { "productId" : "hashed" }, "unique" : false, "uuid" : UUID("0024b33d-2295-45b7-8bd1-d8ffb58a438d") }

shard1:PRIMARY> db.cache.collections.find()
{ "_id" : "config.system.sessions", "epoch" : ObjectId("5bb4b070aec28d86b2174284"), "key" : { "_id" : 1 }, "refreshing" : false, "unique" : false, "uuid" : UUID("68a953b8-1136-4891-ac27-acac48925d00"), "lastRefreshedCollectionVersion" : Timestamp(1, 0) }
{ "_id" : "productrepository.products", "epoch" : ObjectId("5bb4b060aec28d86b2174007"), "key" : { "productId" : "hashed" }, "refreshing" : false, "unique" : false, "uuid" : UUID("0024b33d-2295-45b7-8bd1-d8ffb58a438d"), "lastRefreshedCollectionVersion" : Timestamp(42, 91), "enterCriticalSectionCounter" : 13 }

shard2:PRIMARY> db.cache.collections.find()
{ "_id" : "config.system.sessions", "epoch" : ObjectId("5bb4b070aec28d86b2174284"), "key" : { "_id" : 1 }, "refreshing" : false, "unique" : false, "uuid" : UUID("68a953b8-1136-4891-ac27-acac48925d00"), "lastRefreshedCollectionVersion" : Timestamp(1, 0) }
{ "_id" : "productrepository.products", "epoch" : ObjectId("5bb4b060aec28d86b2174007"), "key" : { "productId" : "hashed" }, "refreshing" : false, "unique" : false, "uuid" : UUID("0024b33d-2295-45b7-8bd1-d8ffb58a438d"), "lastRefreshedCollectionVersion" : Timestamp(42, 91), "enterCriticalSectionCounter" : 17 }

shard3:PRIMARY> db.cache.collections.find()
{ "_id" : "config.system.sessions", "epoch" : ObjectId("5bb4b070aec28d86b2174284"), "key" : { "_id" : 1 }, "refreshing" : false, "unique" : false, "uuid" : UUID("68a953b8-1136-4891-ac27-acac48925d00"), "lastRefreshedCollectionVersion" : Timestamp(1, 0) }
{ "_id" : "productrepository.products", "epoch" : ObjectId("5bb4b060aec28d86b2174007"), "key" : { "productId" : "hashed" }, "refreshing" : false, "unique" : false, "uuid" : UUID("0024b33d-2295-45b7-8bd1-d8ffb58a438d"), "lastRefreshedCollectionVersion" : Timestamp(42, 91), "enterCriticalSectionCounter" : 11 }

The thing is that we have 2 TEST environments (sharded clusters) that we clone our PROD to every week (we restore PROD's backup to the 2 TEST clusters, following the same procedure stated above). On those two weekly-cloned environments the UUID of the products collection (productrepository.products) comes out unique and different between shards every time, as if mongorestore assigns a new UUID to the sharded collection on each shard as we restore the shards one by one.

TEST cluster 1:

shard1:PRIMARY> db.getCollectionInfos()
[ { "name" : "products", "type" : "collection", "options" : { }, "info" : { "readOnly" : false, "uuid" : UUID("aeffed7c-f1b0-453c-9614-1b42a70991ef") }, "idIndex" : { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "productrepository.products" }

shard2:PRIMARY> db.getCollectionInfos()
{ "name" : "products", "type" : "collection", "options" : { }, "info" : { "readOnly" : false, "uuid" : UUID("e0cc3f43-ffec-4dd7-a67c-6e38b9635c7a") }, "idIndex" : { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "productrepository.products" } },

shard3:PRIMARY> db.getCollectionInfos()
[ { "name" : "products", "type" : "collection", "options" : { }, "info" : { "readOnly" : false, "uuid" : UUID("95694fff-8918-4c23-900e-99a110476b0c") }, "idIndex" : { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "productrepository.products" } },

TEST cluster 2:

shard1:PRIMARY> db.getCollectionInfos()
[ { "name" : "products", "type" : "collection", "options" : { }, "info" : { "readOnly" : false, "uuid" : UUID("37823188-b8f6-4a8d-9a22-dd609d54302e") }, "idIndex" : { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "productrepository.products" } }

shard2:PRIMARY> db.getCollectionInfos()
{ "name" : "products", "type" : "collection", "options" : { }, "info" : { "readOnly" : false, "uuid" : UUID("b4de5c76-90f2-4075-9aa1-f6a99cef5608") }, "idIndex" : { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "productrepository.products" } } ]

shard3:PRIMARY> db.getCollectionInfos()
[ { "name" : "products", "type" : "collection", "options" : { }, "info" : { "readOnly" : false, "uuid" : UUID("f43c6927-b6b4-4bd9-8e81-6e69478c5823") }, "idIndex" : { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "productrepository.products" } } ]

I have tried manually moving chunks in the TEST environment using the moveChunk command and, as expected, I hit the same issue with the differing UUIDs.

Is mongorestore supposed to assign new UUIDs to restored sharded collections?
As I understand it, the only way to rectify the chunk migration issue is to drop the collection through mongos (following the procedure described in SERVER-17397) and then restore the collection through mongos and shard it again (see the sketch below). But if, in case of a disaster, we are forced to restore the PROD cluster from backup, would we have to recreate all sharded collections this way (so far we have only one, but that will change in the future), and also rework our clone procedure to recreate the sharded collections this way?

P.S. After another clone that took place a few hours after I wrote the information above, I checked the UUIDs of the sharded collection once again: they are again different between shards and do not correlate with the UUIDs from PROD.

Thank you.
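For illustration, the recreate-through-mongos workaround referenced above could look roughly like the following sketch. The mongos hostname, the dump path and the exact ordering of the steps are assumptions for illustration, not commands taken verbatim from SERVER-17397.

# 1. Dump the collection's data somewhere safe, through mongos
#    (hostname and output path are placeholders).
mongodump --host mongos.example.net:27017 \
  --db productrepository --collection products --out /backup/products-rescue

# 2. Drop the collection through mongos so the sharding metadata is removed;
#    if a shard still holds an old copy of the namespace, drop it there directly too.
mongo --host mongos.example.net:27017 --eval \
  "db.getSiblingDB('productrepository').products.drop()"

# 3. Shard the (now empty) namespace again through mongos, then restore the data
#    through mongos; every shard then creates the collection with the same UUID.
mongo --host mongos.example.net:27017 --eval \
  "sh.shardCollection('productrepository.products', { productId: 'hashed' })"
mongorestore --host mongos.example.net:27017 \
  --db productrepository --collection products \
  /backup/products-rescue/productrepository/products.bson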
dmitry.agranat commented on Tue, 7 Jul 2020 07:28:00 +0000:
Hi whispers2035@gmail.com, We haven't heard back from you for some time, so I'm going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket. Regards, Dima

carl.champain commented on Mon, 9 Mar 2020 20:43:14 +0000:
Hi whispers2035@gmail.com, Sorry for the late response! If you want to see a documentation change, our DOCS project is open source, so feel free to open a new ticket describing the changes you'd like to see and why. Thank you, Carl

whispers2035@gmail.com commented on Sat, 8 Feb 2020 19:05:25 +0000:
Thank you for your response! In the procedure we followed (https://docs.mongodb.com/v4.0/tutorial/restore-sharded-cluster, with mongodump and mongorestore), we are not exactly encountering the issue described in SERVER-17397, as we don't drop any collections from mongos or elsewhere. In the case of the migration, the backup of our initial PROD was restored to new servers that had never held any data; and in the case of a disaster, or if we simply stop all the services and purge the data dirs on all servers from the shell (rm -rf /mongo/data/*), our backup still holds the config metadata saying the collections were never dropped and are intact, yet the behaviour of mongorestore leads to different, unique UUIDs across shards. In my opinion, there are only two ways of preventing further confusion for anyone else who backs up and restores their sharded cluster with the mongodump and mongorestore utilities: 1. Update the documentation to state that after a restore from backup via mongorestore, all sharded collections must be dropped according to the procedure stated in SERVER-17397, restored through mongos and sharded again. 2. Address mongorestore's behaviour of assigning new UUIDs when restoring sharded collections. Since 4.2, mongodump and mongorestore are no longer the tools to use for backing up sharded clusters, so I think the first option is the most optimal. Please let me know your thoughts. Best regards, Max

carl.champain commented on Fri, 7 Feb 2020 17:05:51 +0000:
Hi whispers2035@gmail.com,
"Is mongorestore supposed to assign new UUIDs to restored sharded collections?"
mongorestore will intentionally result in a new UUID for a collection; it indicates that a namespace has been reused. We really appreciate you writing this detailed ticket. I was able to recreate the migration error, and as you mentioned, this issue can be solved with the workaround in SERVER-17397, then restoring the collection through mongos and sharding it. Your last question appears to address a situation related to your topology, and unless it reveals a bug in MongoDB, it is outside of our scope to help you manage it. However, do you think you are encountering a bug that is not addressed in SERVER-17397 and that should require our attention? Kind regards, Carl

whispers2035@gmail.com commented on Thu, 30 Jan 2020 16:37:35 +0000:
It was unintentional; I meant to link to the whole procedure. Yes, first of all we restore the config replica set, then the shards.

daniel.hatcher commented on Mon, 27 Jan 2020 20:50:59 +0000:
You specifically linked to the section describing restoring the shards. Are you performing the procedure of restoring the config servers beforehand?
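For context, the migration error that both the reporter and Carl reproduced can be provoked by a manual chunk move through mongos; the following is only a sketch, and the mongos hostname and shard names are placeholders rather than values from this ticket.

# Sketch only: move one chunk of the affected namespace between shards via mongos.
mongo --host mongos.example.net:27017 <<'EOF'
// Pick one chunk of the sharded collection that currently lives on the donor shard.
var chunk = db.getSiblingDB("config").chunks.findOne(
    { ns: "productrepository.products", shard: "shard3" });

// With a hashed shard key the chunk is addressed by its bounds rather than by
// a shard key value.
var res = db.adminCommand({
    moveChunk: "productrepository.products",
    bounds: [ chunk.min, chunk.max ],
    to: "shard1"
});
printjson(res); // fails with InvalidUUID when the per-shard collection UUIDs disagree
EOF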
Follow "Restore a Sharded Cluster" https://docs.mongodb.com/v4.0/tutorial/restore-sharded-cluster/#d-restore-each-shard-replica-set (with mongodump/mongorestore)