...
Hello, we have encountered the issue SERVER-47553 in MongoDB instances hosted on Alibaba Cloud. We noticed that the mongos crash has been fixed. It was caused by an exception thrown in a destructor when ON_BLOCK_EXIT was used to call appendRequiredFieldsToResponse, which made mongos call std::terminate().

However, the fix for SERVER-47553 only prevents mongos from crashing; the root issue appears to be unresolved. After we applied the patch, mongos still could not connect to the mongod server nodes, so we investigated further.

There seems to be a bug in the monitoring-keys-for-HMAC thread on the primary node of the config server that prevents new signing keys from being generated at the KeysRotationIntervalSec interval. When mongos calls KeysCollectionManager::refreshNow to ask the config server for new signing keys, the call fails with a timeout exception, which causes this problem.

I am sure the root cause is a bug in the "howMuchSleepNeedFor" function, which calculates the wake-up interval for the monitoring-keys-for-HMAC thread on the primary node of the config server:

auto millisBeforeExpire = 1000 * (expiredSecs - currentSecs);

Here expiredSecs and currentSecs are of type unsigned int, and the default wake-up interval is 90 days (7776000 seconds). After conversion to milliseconds the value is 7776000000, which overflows because the maximum unsigned int value is 4294967295.

This causes a serious problem, because mongos cannot reconnect to the mongod server nodes even after many restarts. A feasible workaround is to restart the config server nodes, which triggers the monitoring-keys-for-HMAC thread to generate new signing keys; mongos can reconnect successfully after that.
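The arithmetic in the report can be checked with a small standalone sketch. This is not the MongoDB source: the function names millisBeforeExpireNarrow and millisBeforeExpireWide are hypothetical, and the sketch assumes only what the report states, namely that both second counts are 32-bit unsigned values. The second function widens the multiplication to a 64-bit type, in the spirit of the unsigned long long change suggested in this ticket.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the calculation quoted in the report.
// Both operands are 32-bit unsigned, so the product is reduced modulo 2^32.
std::uint32_t millisBeforeExpireNarrow(std::uint32_t expiredSecs,
                                       std::uint32_t currentSecs) {
    assert(expiredSecs > currentSecs);  // assumes the expired case is handled elsewhere
    return static_cast<std::uint32_t>(1000u * (expiredSecs - currentSecs));
}

// Same calculation, but one operand is widened to 64 bits before multiplying,
// so the full millisecond count is preserved.
std::uint64_t millisBeforeExpireWide(std::uint32_t expiredSecs,
                                     std::uint32_t currentSecs) {
    assert(expiredSecs > currentSecs);  // assumes the expired case is handled elsewhere
    return std::uint64_t{1000} * (expiredSecs - currentSecs);
}
```

For a 90-day gap (7776000 seconds), the narrow version returns 3481032704 (7776000000 modulo 2^32, roughly 40.3 days) while the wide version returns 7776000000.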
JIRAUSER1253295 commented on Tue, 10 Nov 2020 08:34:54 +0000:
OK, I have opened a new ticket, SERVER-52654. Thanks!

garaudy.etienne commented on Fri, 6 Nov 2020 14:55:56 +0000:
jcli.china@gmail.com can you reopen the ticket or file a new ticket mentioning the lingering issue you're observing please?

JIRAUSER1253295 commented on Thu, 5 Nov 2020 11:13:39 +0000:
Hello @Carl Champain, I see the overflow issue is fixed, but the problem still happens after we upgraded the config server to version 4.2.10: new signing keys are not generated by the monitoring-keys-for-HMAC thread, and mongos can't reconnect to the mongod server nodes. I think the root cause of SERVER-47553 and of this issue is the same, but it has not been dug out yet. As this issue may cause unexpected downtime for our service, it is a very serious problem. We hope it can be fixed ASAP. Thanks!

xgen-internal-githook commented on Wed, 12 Aug 2020 18:30:48 +0000:
Author: {'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}
Message: SERVER-48709 Fix overflow in key manager wake up calculation (cherry picked from commit 0cb70e9577c46257798d0385b15ec6bff8dbd28d)
Branch: v4.0
https://github.com/mongodb/mongo/commit/86d5aa1f6e698d7b89a614cce25479e20cc6ae6c

xgen-internal-githook commented on Wed, 12 Aug 2020 18:26:32 +0000:
Author: {'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}
Message: SERVER-48709 Fix overflow in key manager wake up calculation (cherry picked from commit 0cb70e9577c46257798d0385b15ec6bff8dbd28d)
Branch: v3.6
https://github.com/mongodb/mongo/commit/bdc96c3cab7889f4f3ba7adb20527a4d68445288

xgen-internal-githook commented on Wed, 12 Aug 2020 13:49:18 +0000:
Author: {'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}
Message: SERVER-48709 Fix overflow in key manager wake up calculation (cherry picked from commit 0cb70e9577c46257798d0385b15ec6bff8dbd28d)
Branch: v4.2
https://github.com/mongodb/mongo/commit/7a2af2b1a2f63fea3c66ce5d5c52dc11ad02a903

xgen-internal-githook commented on Wed, 12 Aug 2020 13:36:24 +0000:
Author: {'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}
Message: SERVER-48709 Fix overflow in key manager wake up calculation (cherry picked from commit 0cb70e9577c46257798d0385b15ec6bff8dbd28d)
Branch: v4.4
https://github.com/mongodb/mongo/commit/b732c725b34f0d582c69ada999864618b7ff7eb1

xgen-internal-githook commented on Tue, 21 Jul 2020 17:39:56 +0000:
Author: {'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}
Message: SERVER-48709 Fix overflow in key manager wake up calculation
Branch: master
https://github.com/mongodb/mongo/commit/0cb70e9577c46257798d0385b15ec6bff8dbd28d

carl.champain commented on Mon, 15 Jun 2020 19:59:39 +0000:
Hi jcli.china@gmail.com, Thanks for the report. We are passing this ticket along to the appropriate team for further investigation. Updates will be posted on this ticket as they happen. Kind regards, Carl

JIRAUSER1253295 commented on Thu, 11 Jun 2020 07:13:54 +0000:
Changing the type of "millisBeforeExpire" to unsigned long long should fix it. Our instances run MongoDB 4.2.1 Community Edition. Looking forward to your feedback. Thanks!