Info
Ever since we upgraded to 3.2.10, running db.shutdownServer() consistently fails with:
shutdownServer failed: {
"ok" : 0,
"errmsg" : "No electable secondaries caught up as of 2016-11-14T06:51:05.073+0000. Please use {force: true} to force node to step down.",
"code" : 50
}
There are 2 very "tight" secondaries with 0-2 seconds of replication lag.
rs.stepDown() works, so we have to step down first and only then shut down the server.
Top User Comments
yonido commented on Sat, 19 Nov 2016 15:35:56 +0000:
Sounds great. Thanks!
spencer commented on Fri, 18 Nov 2016 20:09:26 +0000:
Hi Yoni,
This is most likely due to the fact that by default the shutdown command will only succeed on a primary if the secondaries are fully caught up at the exact moment that the shutdown command is executed. There is a 'timeoutSecs' argument that can be provided to the shutdown command to give it more time for the secondaries to catch up before it fails. I filed SERVER-27118 to change the default value of the 'timeoutSecs' argument to the shutdown command from 0 to 10, to match the behavior of the replSetStepDown command. In the meantime you can provide that argument explicitly as a workaround.
-Spencer
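The workaround described above can be sketched for the mongo shell roughly as follows. This is a minimal sketch, not an official recipe: `shutdownWithTimeout` is a hypothetical helper name, and the value 10 is only the default proposed in SERVER-27118. It assumes the shell helper `db.shutdownServer()` accepts an options document with `timeoutSecs` and is called on the `admin` database.

```javascript
// Hypothetical helper: ask the primary to wait up to `timeoutSecs` seconds
// for a secondary to catch up before the shutdown attempt fails.
// `adminDb` is assumed to be a handle to the "admin" database.
function shutdownWithTimeout(adminDb, timeoutSecs) {
  return adminDb.shutdownServer({ timeoutSecs: timeoutSecs });
}

// Usage in the mongo shell, connected to the primary:
//   shutdownWithTimeout(db.getSiblingDB("admin"), 10);
```

Passing the database handle as a parameter keeps the helper from shutting anything down merely by being loaded.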
yonido commented on Mon, 14 Nov 2016 13:10:55 +0000:
As mentioned, they are in sync:
source: in.db3m2.xx.com:27017
syncedTo: Mon Nov 14 2016 13:09:03 GMT+0000 (UTC)
1 secs (0 hrs) behind the primary
Neither the primary's nor the secondaries' logs mention anything related to this. This reproduces every time, on all of our shard replicas.
ramon.fernandez commented on Mon, 14 Nov 2016 12:53:44 +0000:
yonido, when this happens, can you please provide the output of rs.printSlaveReplicationInfo()? Can you also upload the logs for this node at the time you run db.shutdownServer()? In the meantime I'll try to reproduce on our end.
Thanks,
Ramón.