...
Hi all, we set the balancing window as below:

db.settings.update(
    { _id: "balancer" },
    { $set: { activeWindow: { start: "15:00", stop: "16:00" } } },
    { upsert: true }
)

But we found that migrations and balancing still run outside the activeWindow. I then checked the balancer source code in "mongo/src/mongo/s/grid.cpp":

bool Grid::shouldBalance(const SettingsType& balancerSettings) const {
    if (balancerSettings.isBalancerStoppedSet() && balancerSettings.getBalancerStopped()) {
        return false;
    }

    if (balancerSettings.isBalancerActiveWindowSet()) {
        boost::posix_time::ptime now = boost::posix_time::second_clock::local_time();
        return balancerSettings.inBalancingWindow(now);
    }

    return true;
}

It looks like the return value is always "true", whether or not the time is inside the activeWindow.
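For reference, the server-side check can be mirrored from a mongos shell. The following is only a sketch of the same logic, not server code: the minutesOfDay() helper and the midnight-wrap handling are illustrative assumptions.

    // Read the same settings document the balancer consults.
    var s = db.getSiblingDB("config").settings.findOne({ _id: "balancer" });

    // Hypothetical helper: convert "HH:MM" to minutes since midnight.
    function minutesOfDay(hhmm) {
        var p = hhmm.split(":");
        return parseInt(p[0], 10) * 60 + parseInt(p[1], 10);
    }

    // Like the C++ above, compare against the *local* wall clock of this host.
    var now = new Date();
    var nowMin = now.getHours() * 60 + now.getMinutes();
    var start = minutesOfDay(s.activeWindow.start);
    var stop = minutesOfDay(s.activeWindow.stop);

    // Allow for a window that wraps past midnight (e.g. 23:00 -> 02:00).
    var inWindow = (start <= stop) ? (nowMin >= start && nowMin < stop)
                                   : (nowMin >= start || nowMin < stop);
    print("in balancing window: " + inWindow);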
stefan@close.io commented on Fri, 6 Nov 2015 20:04:47 +0000:
Yes, all the machines are running NTP and are set to the UTC timezone.

renctan commented on Fri, 6 Nov 2015 19:56:14 +0000:
stefan@close.io You should not need to upgrade the mongod. Do the mongos have the same timezone settings as your shards?

stefan@close.io commented on Tue, 27 Oct 2015 21:47:59 +0000:
Following up on this issue.

stefan@close.io commented on Tue, 20 Oct 2015 19:03:07 +0000:
Thank you Randolph! I upgraded my mongos and config servers to v3.0.7, but I still see the issue:

mongos> db.version()
3.0.7
mongos> db.getSiblingDB('config').find({ _id: 'balancer' })
2015-10-20T18:57:57.110+0000 E QUERY    TypeError: Property 'find' of object config is not a function
    at (shell):1:27
mongos> db.getSiblingDB('config').settings.find({ _id: 'balancer' })
{ "_id" : "balancer", "stopped" : false, "activeWindow" : { "start" : "04:00", "stop" : "11:00" } }
mongos> db.getSiblingDB('closeio').system.profile.find({ 'command.moveChunk': { $exists: true } }).limit(1)[0]
{ "op" : "command", "ns" : "admin.$cmd", "command" : { "moveChunk" : "closeio.activity", "from" : "elastic-sales-rs/ciodb17.close.io:27017,ciodb18.close.io:27017", "to" : "elastic-sales-rs-2/ciodb5.close.io:27017,ciodb6.close.io:27017", "fromShard" : "elastic-sales-rs", "toShard" : "elastic-sales-rs-2", "min" : { "organization" : DBRef("organization", "orga_YGfgxqRu92pVGqDHQEADNp6bSdKJVD9PtrFSxOv1KOA"), "_id" : "acti_8hDpMYRhi3tWzNcyVmsyOIu5CaHqQS8xcIqhhudbKNp" }, "max" : { "organization" : DBRef("organization", "orga_YGfgxqRu92pVGqDHQEADNp6bSdKJVD9PtrFSxOv1KOA"), "_id" : "acti_NVXQM2O5vI9CPbJjYiezfvATBB1CAp2BbjeBi1hur6P" }, "maxChunkSizeBytes" : NumberLong(67108864), "shardId" : "closeio.activity-organization_{ $ref: \"organization\", $id: \"orga_YGfgxqRu92pVGqDHQEADNp6bSdKJVD9PtrFSxOv1KOA\" }_id_\"acti_8hDpMYRhi3tWzNcyVmsyOIu5CaHqQS8xcIqhhudbKNp\"", "configdb" : "cioconfigdb1.close.io:27019,cioconfigdb2.close.io:27019,cioconfigdb3.close.io:27019", "secondaryThrottle" : true, "waitForDelete" : false, "maxTimeMS" : 0, "epoch" : ObjectId("548cf45a19c734310e9db104") }, "ntoreturn" : 1, "keyUpdates" : 0, "numYield" : 1, "lockStats" : { "timeLockedMicros" : { "R" : NumberLong(0), "W" : NumberLong(42), "r" : NumberLong(23978), "w" : NumberLong(3758) }, "timeAcquiringMicros" : { "R" : NumberLong(0), "W" : NumberLong(557), "r" : NumberLong(9460), "w" : NumberLong(1639) } }, "responseLength" : 37, "millis" : 36939, "execStats" : { }, "ts" : ISODate("2015-10-20T13:13:22.785Z"), "client" : "10.20.0.182", "allUsers" : [ { "user" : "__system", "db" : "local" } ], "user" : "__system@local" }

Do I need to upgrade the data mongods too before I see this issue fixed?
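Run directly on a shard primary (as in the profile queries quoted in these comments), an aggregation along the following lines lists migrations that ran outside the window. This is a sketch: the closeio database name and the 04:00-11:00 window are taken from this ticket, and $hour extracts the UTC hour, which matches the UTC window used here.

    db.getSiblingDB("closeio").system.profile.aggregate([
        // Only profiled moveChunk commands.
        { $match: { "command.moveChunk": { $exists: true } } },
        // UTC hour of each operation's timestamp.
        { $project: { ts: 1, hour: { $hour: "$ts" } } },
        // Keep operations outside 04:00-11:00 UTC.
        { $match: { $or: [ { hour: { $lt: 4 } }, { hour: { $gte: 11 } } ] } }
    ])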
renctan commented on Tue, 13 Oct 2015 19:52:55 +0000:
stefan@close.io I believe you are experiencing SERVER-20557, which we just fixed for v3.0.7.

stefan@close.io commented on Tue, 13 Oct 2015 19:29:00 +0000:
Hi there, I believe this is still an issue.

1) Based on the attached screenshots (all_mongos_servers.png and all_mongos_servers_time.png), you can see that the time is in sync for all the machines running mongos.

2) The balancing window is set to 4am - 11am UTC:

mongos> sh.getBalancerWindow()
{ "start" : "04:00", "stop" : "11:00" }

3) And yet, looking at the profile collection on one of our primaries, we can see that some chunks were moved between shards outside of the balancing window (pay attention to the "ts" value):

elastic-sales-rs:PRIMARY> db.system.profile.find({ millis: { $gte: 10000 } }).limit(5).sort({ ts: -1 })[0]
{ "op" : "command", "ns" : "admin.$cmd", "command" : { "moveChunk" : "closeio.activity", "from" : "elastic-sales-rs/ciodb17.close.io:27017,ciodb18.close.io:27017", "to" : "elastic-sales-rs-3/ciodb12.close.io:27017,ciodb13.close.io:27017", "fromShard" : "elastic-sales-rs", "toShard" : "elastic-sales-rs-3", "min" : { "organization" : DBRef("organization", "orga_XhqqlrZYbajFQNhdk56EYESbWODMcgAaxbTa04oiZlV"), "_id" : "acti_OOCabjA7iyL6TKnkus66qsRFDHbRoNR5otAvnUU4r3J" }, "max" : { "organization" : DBRef("organization", "orga_XhqqlrZYbajFQNhdk56EYESbWODMcgAaxbTa04oiZlV"), "_id" : "acti_RUL2i8eZXkrIrksg1O3DQYV4xlvH2wGorkMQ0xnDFZI" }, "maxChunkSizeBytes" : NumberLong(67108864), "shardId" : "closeio.activity-organization_{ $ref: \"organization\", $id: \"orga_XhqqlrZYbajFQNhdk56EYESbWODMcgAaxbTa04oiZlV\" }_id_\"acti_OOCabjA7iyL6TKnkus66qsRFDHbRoNR5otAvnUU4r3J\"", "configdb" : "cioconfigdb1.close.io:27019,cioconfigdb2.close.io:27019,cioconfigdb3.close.io:27019", "secondaryThrottle" : true, "waitForDelete" : false, "maxTimeMS" : 0, "epoch" : ObjectId("548cf45a19c734310e9db104") }, "ntoreturn" : 1, "keyUpdates" : 0, "numYield" : 9, "lockStats" : { "timeLockedMicros" : { "R" : NumberLong(0), "W" : NumberLong(49), "r" : NumberLong(34634), "w" : NumberLong(3720) }, "timeAcquiringMicros" : { "R" : NumberLong(0), "W" : NumberLong(431), "r" : NumberLong(18709), "w" : NumberLong(1712) } }, "responseLength" : 37, "millis" : 24417, "execStats" : { }, "ts" : ISODate("2015-10-13T19:04:18.206Z"), "client" : "10.34.8.156", "allUsers" : [ { "user" : "__system", "db" : "local" } ], "user" : "__system@local" }

There are many more examples where a "moveChunk" happened outside of the balancing window's hours. Note that our mongos instances and config servers run MongoDB v3.0.6, but our data servers run mongod v2.6.11 (mongod v3.0.6 performed very poorly for us and we had to downgrade; we have a ticket open about that at jira.mongodb.org/browse/SUPPORT-1448). Could that be an issue here? Let me know if you need anything else from me to debug this issue.

ramon.fernandez commented on Sat, 26 Sep 2015 11:56:37 +0000:
eshujiushiwo, we haven't heard back from you for a while, so we're going to close this ticket. If this is still an issue for you, please provide the additional information requested above. Regards, Ramón.

spencer commented on Fri, 14 Aug 2015 21:55:16 +0000:
What version are you running? Prior to 3.0.1 there was also SERVER-5004, in which new chunk migrations on different collections could be started after disabling the balancer if they are performed as part of the same balancing "round".

renctan commented on Fri, 14 Aug 2015 14:21:05 +0000:
Hi, how many mongos processes do you have? Are they all in the same time zone? The active window setting is currently ambiguous, as each mongos will use its own local wall clock to compare with the active window, which can be an issue if you have multiple mongos in different time zones. Also see SERVER-4963. Thanks!
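To make renctan's point concrete: because each mongos compares the window against its own local wall clock, two hosts in different timezones can disagree about the same instant. A hypothetical shell example, using the 04:00-11:00 window from this ticket (inWindow() is an illustrative helper, simplified to whole hours):

    function inWindow(localHour, startHour, stopHour) {
        return localHour >= startHour && localHour < stopHour;
    }

    var utcHour = 20;  // 20:00 UTC, well outside the intended 04:00-11:00 UTC window
    print(inWindow(utcHour, 4, 11));             // false: a mongos set to UTC stays idle
    print(inWindow((utcHour + 8) % 24, 4, 11));  // true:  a mongos at UTC+8 sees 04:00 local and balances

So a single mongos with a mismatched timezone is enough to produce migrations outside the intended window.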
"bool Grid::shouldBalance(const SettingsType& balancerSettings) const { if (balancerSettings.isBalancerStoppedSet() && balancerSettings.getBalancerStopped()) { return false; } if (balancerSettings.isBalancerActiveWindowSet()) { boost::posix_time::ptime now = boost::posix_time::second_clock::local_time(); return balancerSettings.inBalancingWindow(now); } return true; }" is ok. But i still don`t know why the the migration is woking outside the activewindow..
1. Set the activeWindow via the config database:

   db.settings.update(
       { _id: "balancer" },
       { $set: { activeWindow: { start: "time to start", stop: "time to stop" } } },
       { upsert: true }
   )

2. Check it: sh.getBalancerWindow()
3. Start inserting documents outside the activeWindow time, with a loop such as for (i = 1; i <= N; i++) { db.<collection>.insert({ num: i }) } (the shard key is num).
4. Check the chunks and the log: sh.status() and vim mongod.log (the config changelog can also be checked, as sketched below).
5. Found that the migration was done anyway.
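One way to double-check step 5 without grepping mongod.log (a sketch, not part of the original report): the config servers record every migration in config.changelog with a timestamp, so recent moveChunk events can be listed from any mongos and compared against the activeWindow.

    // Newest migrations first; "what" is moveChunk.start/.commit/.from/.to.
    db.getSiblingDB("config").changelog.find(
        { what: /^moveChunk/ },
        { time: 1, what: 1, ns: 1, server: 1 }
    ).sort({ time: -1 }).limit(10)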