...
Our MongoDB crashes sporadicly in production, currently it happens every 2-3 weeks so we can't reliably reproduce the issue. After a database restart the issue is gone for the next week(s). This database is being accessed by potentially 5 other dedicated servers this are it's mongostats: insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn time *0 176 11486 *0 0 16|0 2.6% 5.0% 0 2.42G 1.63G 0|0 0|1 1.83m 17.9m 484 May 12 16:50:44.252 2 105 16974 *0 0 18|0 2.6% 5.0% 0 2.42G 1.63G 0|0 0|0 2.66m 1.44m 484 May 12 16:50:45.252 *0 144 18391 *0 0 12|0 2.6% 5.0% 0 2.42G 1.63G 0|0 1|0 2.89m 1.55m 485 May 12 16:50:46.253 *0 35 21393 *0 0 12|0 2.6% 5.0% 0 2.42G 1.63G 0|0 0|0 3.34m 1.71m 485 May 12 16:50:47.252 *0 23 17219 *0 1 13|0 2.6% 5.0% 0 2.42G 1.63G 0|0 1|0 2.69m 1.42m 485 May 12 16:50:48.252 1 47 14004 *0 0 17|0 2.6% 5.0% 0 2.42G 1.63G 0|0 2|0 2.20m 17.9m 486 May 12 16:50:49.252 *0 56 17955 *0 0 24|0 2.6% 5.0% 0 2.42G 1.63G 0|0 0|0 2.83m 1.47m 486 May 12 16:50:50.253 2017-05-12T15:50:33.727+0200 F - [conn36561] Invalid access at address: 0x4000013a7 2017-05-12T15:50:33.735+0200 F - [conn36561] Got signal: 11 (Segmentation fault). 0x5654d14eeee1 0x5654d14edfd9 0x5654d14ee646 0x7ff8f8773370 0x5654d1e36c74 0x5654d1e02459 0x5654d1e52b88 0x5654d11ef44e 0x5654d0b4aa8b 0x5654d0af0e76 0x5654d0b12c63 0x5654d0ae4538 0x5654d0e199c2 0x5654d0e1bd68 0x5654d0e1ca1c 0x5654d0dd5ca2 0x5654d0dd685b 0x5654d09fdad8 0x5654d09d4867 0x5654d09d5c00 0x5654d0ff688d 0x5654d0bf3302 0x5654d0bf5276 0x5654d07f1f1d 0x5654d07f285d 0x5654d1454722 0x7ff8f876bdc5 0x7ff8f849a73d ----- BEGIN BACKTRACE ----- {"backtrace":[{"b":"5654CFF71000","o":"157DEE1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"5654CFF71000","o":"157CFD9"},{"b":"5654CFF71000","o":"157D646"},{"b":"7FF8F8764000","o":"F370"},{"b":"5654CFF71000","o":"1EC5C74"},{"b":"5654CFF71000","o":"1E91459"},{"b":"5654CFF71000","o":"1EE1B88"},{"b":"5654CFF71000","o":"127E44E","s":"_ZN5mongo21WiredTigerRecordStore6Cursor9seekExactERKNS_8RecordIdE"},{"b":"5654CFF71000","o":"BD9A8B","s":"_ZN5mongo16WorkingSetCommon5fetchEPNS_16OperationContextEPNS_10WorkingSetEmNS_11unowned_ptrINS_20SeekableRecordCursorEEE"},{"b":"5654CFF71000","o":"B7FE76","s":"_ZN5mongo10FetchStage6doWorkEPm"},{"b":"5654CFF71000","o":"BA1C63","s":"_ZN5mongo9PlanStage4workEPm"},{"b":"5654CFF71000","o":"B73538","s":"_ZN5mongo15CachedPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE"},{"b":"5654CFF71000","o":"EA89C2","s":"_ZN5mongo12PlanExecutor12pickBestPlanENS0_11YieldPolicyEPKNS_10CollectionE"},{"b":"5654CFF71000","o":"EAAD68","s":"_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_11YieldPolicyE"},{"b":"5654CFF71000","o":"EABA1C","s":"_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionENS0_11YieldPolicyE"},{"b":"5654CFF71000","o":"E64CA2","s":"_ZN5mongo11getExecutorEPNS_16OperationContextEPNS_10CollectionESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS5_EENS_12PlanExecutor11YieldPolicyEm"},{"b":"5654CFF71000","o":"E6585B","s":"_ZN5mongo15getExecutorFindEPNS_16OperationContextEPNS_10CollectionERKNS_15NamespaceStringESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS8_EENS_12PlanExecutor11YieldPolicyE"},{"b":"5654CFF71000","o":"A8CAD8","s":"_ZN5mongo7FindCmd3runEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERNS_7BSONObjEiRS8_RNS_14BSONObjBuilderE"},{"b":"5654CFF71000","o":"A63867","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"5654CFF71000","o":"A64C00","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"5654CFF71000","o":"108588D","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"5654CFF71000","o":"C82302"},{"b":"5654CFF71000","o":"C84276","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"5654CFF71000","o":"880F1D","s":"_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE"},{"b":"5654CFF71000","o":"88185D"},{"b":"5654CFF71000","o":"14E3722"},{"b":"7FF8F8764000","o":"7DC5"},{"b":"7FF8F83A3000","o":"F773D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.4.4", "gitVersion" : "888390515874a9debd1b6c5d36559ca86b44babd", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.4.30-mod-std-ipv6-64", "version" : "#9 SMP Tue Nov 1 17:58:26 CET 2016", "machine" : "x86_64" }, "somap" : [ { "b" : "5654CFF71000", "elfType" : 3, "buildId" : "BAB2B823978D8F6F84A5C63BFA4F54F3F06A65A4" }, { "b" : "7FFF133FA000", "elfType" : 3, "buildId" : "963A363281907ED5AFD9334B132661E125712016" }, { "b" : "7FF8F968E000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "90EAF65D9B0EEEB1424241281F7F197451D4317D" }, { "b" : "7FF8F92A4000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "7278C69EE161D98DDD0FA00F92B67AD78C7B7F40" }, { "b" : "7FF8F909C000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "82E77ADE22BC9FFF8D3458BD37331E7EDF174C28" }, { "b" : "7FF8F8E98000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "C5F560504E1AF52E29679C3B52FF11121015D6BB" }, { "b" : "7FF8F8B96000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "721C7CC9488EFA25F83B48AF713AB27DBE48EF3E" }, { "b" : "7FF8F8980000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "408B46E291B2D4C9612E27C0509D165D7E186D40" }, { "b" : "7FF8F8764000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "C3DEB1FA27CD0C1C3CC575B944ABACBA0698B0F2" }, { "b" : "7FF8F83A3000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "8B2C421716985B927AA0CAF2A05D0B1F452367F7" }, { "b" : "7FF8F98FC000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "8F3E366E2DB73C330A3791DEAE31AE9579099B44" }, { "b" : "7FF8F8155000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "A2499C359AA179EE23324ED949C0E508E4434F10" }, { "b" : "7FF8F7E6E000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "E09A34D9083DC6FEAF7018C09D55631DEEE2836D" }, { "b" : "7FF8F7C6A000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "BF54B7C8932E450769FBBB8B18864D1DD70BBC67" }, { "b" : "7FF8F7A38000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "BF8F00D7CB849ADB0B7A4703BC7B8D66AEE6A49C" }, { "b" : "7FF8F7822000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "EA8E45DC8E395CC5E26890470112D97A1F1E0B65" }, { "b" : "7FF8F7613000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "1E7A92FDD6FB3871DA97F4BCA2E147E72B6B6E1F" }, { "b" : "7FF8F740F000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7FF8F71F5000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "FE7AE845A123A3DFC0FDC2408BCBC2BA8B61B158" }, { "b" : "7FF8F6FCE000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "76687CA31A406854DF3BCF8D03055656F56E6892" }, { "b" : "7FF8F6D6D000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "AE64AA461A26E01F60408013D361749D56DD0AE1" } ] }} mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x5654d14eeee1] mongod(+0x157CFD9) [0x5654d14edfd9] mongod(+0x157D646) [0x5654d14ee646] libpthread.so.0(+0xF370) [0x7ff8f8773370] mongod(+0x1EC5C74) [0x5654d1e36c74] mongod(+0x1E91459) [0x5654d1e02459] mongod(+0x1EE1B88) [0x5654d1e52b88] mongod(_ZN5mongo21WiredTigerRecordStore6Cursor9seekExactERKNS_8RecordIdE+0x4E) [0x5654d11ef44e] mongod(_ZN5mongo16WorkingSetCommon5fetchEPNS_16OperationContextEPNS_10WorkingSetEmNS_11unowned_ptrINS_20SeekableRecordCursorEEE+0xAB) [0x5654d0b4aa8b] mongod(_ZN5mongo10FetchStage6doWorkEPm+0x106) [0x5654d0af0e76] mongod(_ZN5mongo9PlanStage4workEPm+0x63) [0x5654d0b12c63] mongod(_ZN5mongo15CachedPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE+0x198) [0x5654d0ae4538] mongod(_ZN5mongo12PlanExecutor12pickBestPlanENS0_11YieldPolicyEPKNS_10CollectionE+0xF2) [0x5654d0e199c2] mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_11YieldPolicyE+0x2D8) [0x5654d0e1bd68] mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionENS0_11YieldPolicyE+0xEC) [0x5654d0e1ca1c] mongod(_ZN5mongo11getExecutorEPNS_16OperationContextEPNS_10CollectionESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS5_EENS_12PlanExecutor11YieldPolicyEm+0x132) [0x5654d0dd5ca2] mongod(_ZN5mongo15getExecutorFindEPNS_16OperationContextEPNS_10CollectionERKNS_15NamespaceStringESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS8_EENS_12PlanExecutor11YieldPolicyE+0x8B) [0x5654d0dd685b] mongod(_ZN5mongo7FindCmd3runEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERNS_7BSONObjEiRS8_RNS_14BSONObjBuilderE+0xC98) [0x5654d09fdad8] mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0x527) [0x5654d09d4867] mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0x9E0) [0x5654d09d5c00] mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x26D) [0x5654d0ff688d] mongod(+0xC82302) [0x5654d0bf3302] mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x6D6) [0x5654d0bf5276] mongod(_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE+0x1ED) [0x5654d07f1f1d] mongod(+0x88185D) [0x5654d07f285d] mongod(+0x14E3722) [0x5654d1454722] libpthread.so.0(+0x7DC5) [0x7ff8f876bdc5] libc.so.6(clone+0x6D) [0x7ff8f849a73d] ----- END BACKTRACE -----
irwks commented on Sun, 21 May 2017 20:08:04 +0000: Hello Thomas, thank you very much for your work. We moved our database to another hardware instance since then everything kept running smoothly, even under a more heavy load which our network is currently experiencing since the one dedicated root server ist missing. If this is fine for you, I will refer to this issue when talking to our hosting provider since the RAM-check on the mentioned server did not return any errors. I will also keep this issue updated if something happens in the case. Thank you! thomas.schubert commented on Fri, 19 May 2017 15:58:42 +0000: Hi irwks, We've analyzed the provided stack traces. This behavior appears to be the result of fault hardware, specifically memory corruption would most likely explain these segmentation faults. Would you move this mongod instance to a host with new hardware and confirm the that issue is resolved? Thank you, Thomas irwks commented on Wed, 17 May 2017 12:35:23 +0000: Brief Update: After running smoothly for the last 3 days our system is now failing in production right after restart and and running a couple of queries. I just uploaded the current mongod.log file, the diagnostic data and the matching syslog snippet. irwks commented on Mon, 15 May 2017 19:17:06 +0000: Alright, @keith.bostic. To clear out any potential OS installation issues I reinstalled the host completely from scratch and only restored a database backup. If it happens again, I will upload all the needed diagnostic data. keith.bostic commented on Mon, 15 May 2017 19:12:15 +0000: Thank you, ramon.fernandez. irwks, I haven't given up yet, but so far nothing here is pointing to the problem. Since you're seeing repeated failures, would you upload the same information for your next failure, as you did for this one? Thank you! ramon.fernandez commented on Mon, 15 May 2017 17:21:36 +0000: Unfortunately the uploaded logs don't have any additional information that can help us here. We'll let you know if there's other information you may collect that can give us a clue here – thanks in advance for your continued patience. Regards, Ramón. irwks commented on Fri, 12 May 2017 22:22:07 +0000: Of course, @Keith Bostic, I just uploaded the relevant part of the messages.log file via the given upload portal. keith.bostic commented on Fri, 12 May 2017 21:35:58 +0000: irwks, if more failures occur, would you please get us copies of those logs, too? And, could we please have copies of the system diagnostic data (/var/log/messages) covering the time period of this failure, and, of course, any future failures. Thank you! irwks commented on Fri, 12 May 2017 19:21:57 +0000: Hello Thomas, thank you for your fast response. I have uploaded a zip archive of all requested data, beware that the files are pretty big so the interesting part of mongod.log starts at line 1265526. Thanks in advance, Ilja thomas.schubert commented on Fri, 12 May 2017 18:35:29 +0000: Hi irwks, Would you please upload an archive of the diagnostic.data directory as well as the complete log files of the affected mongod? I've created a secure upload portal for you to provide these files. Files uploaded to this portal are only visible to MongoDB employees investigating this issue and are routinely deleted after some time. Thank you, Thomas
Sadly it seems pretty sporadic, so I can't provide steps to reproduce.