Summary

The balancer does not make progress in certain scenarios where the most loaded shard belongs to a balanced zone: it keeps selecting that shard as donor even when all shards in its zone are already balanced, and then fails to find a suitable recipient because the remaining underloaded shards belong to different zones.

Details

When the cluster has zones configured and the most overloaded shard (by data size) is in a zone that is already internally balanced, the balancer repeatedly tries to move chunks from that shard. However, since the other shards in the same zone are already balanced, there are no valid chunk candidates that can be donated while still honoring the existing zone configuration. As a result:

- The balancer keeps choosing the most loaded shard as the donor.
- No migrations are actually performed, so the overall balancing does not make progress.
- Zones themselves are respected at all times; the issue is with donor selection and lack of progress when the top candidate shard cannot actually donate any chunks.

Impact

Balancer rounds can appear to be "stuck" or not making progress, even though the system is correctly enforcing the configured zones. This mainly affects situations where:

- One shard is globally the most loaded shard.
- That shard is in a zone that is already locally balanced.
- Other zones may remain unbalanced.

Expected Behavior

If the most loaded shard in a zone cannot donate any further chunks without violating zone constraints, the balancer should:

- Skip it as a donor candidate for that round, and
- Consider other shards/zones where valid migrations would still respect the zone configuration and effectively reduce imbalance.
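A minimal sketch of the per-zone donor selection described above, assuming a simplified model where each shard carries a single zone tag and a total data size. The function name `select_migration`, the dict layout, and the greedy largest-vs-smallest pairing are illustrative assumptions, not MongoDB's actual balancer code (which also applies per-collection migration thresholds and chunk-level checks).

```python
from collections import defaultdict


def select_migration(shards):
    """Pick one (donor, recipient, zone) triple, considering each zone independently.

    `shards` is a list of dicts: {"name": str, "zone": str, "size_gb": float}.
    Returns None when every zone is already balanced.
    """
    by_zone = defaultdict(list)
    for shard in shards:
        by_zone[shard["zone"]].append(shard)

    best = None
    for zone, members in by_zone.items():
        if len(members) < 2:
            # A single-shard zone can never donate within the zone, so it is
            # skipped instead of being selected as a dead-end donor.
            continue
        donor = max(members, key=lambda s: s["size_gb"])
        recipient = min(members, key=lambda s: s["size_gb"])
        imbalance = donor["size_gb"] - recipient["size_gb"]
        if imbalance <= 0:
            # Zone already balanced; do not get stuck on its most loaded shard.
            continue
        if best is None or imbalance > best[0]:
            best = (imbalance, donor["name"], recipient["name"], zone)

    if best is None:
        return None
    _, donor_name, recipient_name, zone = best
    return donor_name, recipient_name, zone
```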
Example:

- Shard1 [Zone_US]: 500 GB
- Shard2 [Zone_EU]: 300 GB
- Shard3 [Zone_EU]: 100 GB

In this scenario, the balancer fails to balance Zone_EU and does not move any chunks. Instead it should move 100 GB from Shard2 to Shard3.
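Running the illustrative sketch above on this scenario picks the intra-zone move the report expects, even though Shard1 is globally the most loaded shard:

```python
shards = [
    {"name": "Shard1", "zone": "Zone_US", "size_gb": 500},
    {"name": "Shard2", "zone": "Zone_EU", "size_gb": 300},
    {"name": "Shard3", "zone": "Zone_EU", "size_gb": 100},
]

print(select_migration(shards))
# ('Shard2', 'Shard3', 'Zone_EU') -- Zone_US has only one shard and is skipped;
# the imbalance inside Zone_EU drives the chosen migration.
```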