BugZero found this defect 2771 days ago.
After a power failure, two MongoDB servers will not start again. On the first one I get:

WiredTiger error (0) [1491533102:978650][31243:0x70b9da9a2dc0], file:WiredTiger.wt, WT_CURSOR.next: read checksum error for 28672B block at offset 9068544: block header checksum of 0 doesn't match expected checksum of 4059665721

And on the other:

WiredTiger error (-31802) [1491534981:938109][32667:0x65917b0f2dc0], file:collection-30-4121866540730348039.wt, WT_SESSION.open_cursor: /data2/instagram_bak/collection-30-4121866540730348039.wt: handle-read: pread: failed to read 4094 bytes at offset 2: WT_ERROR: non-specific WiredTiger error

Would you be willing to try repairing the .wt files for both servers? I've attached them as two sets of files, one per server. Could you also explain the methods used to perform the repair attempt?

Thanks
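For triage, the two WiredTiger messages above can be split into fields so that the affected files and operations are easy to list. The following is a minimal Python sketch, assuming the layout seen in these two messages: error code, [seconds:microseconds], [pid:thread], file, operation, message. The regular expression and the name parse_wt_error are illustrative and are not part of MongoDB or WiredTiger.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the messages quoted in this report. It is an
# assumption, not an official WiredTiger log grammar.
WT_ERROR = re.compile(
    r"WiredTiger error \((?P<code>-?\d+)\) "
    r"\[(?P<secs>\d+):(?P<usecs>\d+)\]"
    r"\[(?P<pid>\d+):0x[0-9a-f]+\], "
    r"file:(?P<file>[^,]+), "
    r"(?P<op>[A-Za-z_.]+): "
    r"(?P<message>.*)"
)


def parse_wt_error(line):
    """Return the fields of a WiredTiger error line, or None if no match."""
    m = WT_ERROR.search(line)
    if m is None:
        return None
    # The first bracket holds epoch seconds and microseconds.
    when = datetime.fromtimestamp(int(m["secs"]), tz=timezone.utc)
    when = when.replace(microsecond=int(m["usecs"]))
    return {
        "code": int(m["code"]),
        "time": when,
        "pid": int(m["pid"]),
        "file": m["file"],
        "operation": m["op"],
        "message": m["message"],
    }


# The two messages from this report, copied verbatim.
CHECKSUM_ERROR = (
    "WiredTiger error (0) [1491533102:978650][31243:0x70b9da9a2dc0], "
    "file:WiredTiger.wt, WT_CURSOR.next: read checksum error for 28672B "
    "block at offset 9068544: block header checksum of 0 doesn't match "
    "expected checksum of 4059665721"
)
READ_ERROR = (
    "WiredTiger error (-31802) [1491534981:938109][32667:0x65917b0f2dc0], "
    "file:collection-30-4121866540730348039.wt, WT_SESSION.open_cursor: "
    "/data2/instagram_bak/collection-30-4121866540730348039.wt: handle-read: "
    "pread: failed to read 4094 bytes at offset 2: WT_ERROR: non-specific "
    "WiredTiger error"
)

if __name__ == "__main__":
    for line in (CHECKSUM_ERROR, READ_ERROR):
        fields = parse_wt_error(line)
        print(fields["time"].isoformat(), fields["code"],
              fields["file"], fields["operation"])
```

Because the function uses a search rather than a full match, it also accepts lines that carry the mongod log prefix (timestamp, severity, component) in front of the WiredTiger text.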
thomas.schubert commented on Fri, 7 Apr 2017 22:29:29 +0000:

Hi Cezam,

Unfortunately, this indicates that there was additional corruption on disk following the power failure. In this situation, my best recommendation would be to resync the affected nodes or restore from a backup.

Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support, see our Technical Support page for additional resources.

Kind regards,
Thomas

cezam commented on Fri, 7 Apr 2017 13:21:36 +0000:

Now set2 has failed as well, after getting through 80% of the db. Here is the error trace:

2017-04-07T09:19:35.311-0400 I STORAGE [initandlisten] Repairing collection big_daddy_instagram.accounts_2016_04_28
2017-04-07T09:19:35.311-0400 E STORAGE [initandlisten] WiredTiger error (2) [1491571175:311700][17451:0x760b7e753dc0], file:collection-30-4121866540730348039.wt, WT_SESSION.verify: /data2/instagram_bak/collection-30-4121866540730348039.wt: handle-open: open: No such file or directory
2017-04-07T09:19:35.311-0400 I STORAGE [initandlisten] Verify failed on uri table:collection-30-4121866540730348039. Running a salvage operation.
2017-04-07T09:19:35.311-0400 E STORAGE [initandlisten] WiredTiger error (2) [1491571175:311923][17451:0x760b7e753dc0], file:collection-30-4121866540730348039.wt, WT_SESSION.salvage: /data2/instagram_bak/collection-30-4121866540730348039.wt: handle-open: open: No such file or directory
2017-04-07T09:19:36.513-0400 I - [initandlisten] Invariant failure rs.get() src/mongo/db/catalog/database.cpp 195
2017-04-07T09:19:36.513-0400 I - [initandlisten] ***aborting after invariant() failure
2017-04-07T09:19:36.519-0400 F - [initandlisten] Got signal: 6 (Aborted).
0xc71b0964ac1 0xc71b0963bb9 0xc71b096409d 0x760b7d3bd370 0x760b7d0221d7 0x760b7d0238c8 0xc71afbf5cdc 0xc71afdd4b80 0xc71afddb537 0xc71afddf214 0xc71b02dbda8 0xc71afbdefcb 0xc71afbe1e95 0xc71afc01a94 0x760b7d00eb35 0xc71afc5f87f ----- BEGIN BACKTRACE ----- {"backtrace":[{"b":"C71AF3E6000","o":"157EAC1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"C71AF3E6000","o":"157DBB9"},{"b":"C71AF3E6000","o":"157E09D"},{"b":"760B7D3AE000","o":"F370"},{"b":"760B7CFED000","o":"351D7","s":"gsignal"},{"b":"760B7CFED000","o":"368C8","s":"abort"},{"b":"C71AF3E6000","o":"80FCDC","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"C71AF3E6000","o":"9EEB80","s":"_ZN5mongo8Database30_getOrCreateCollectionInstanceEPNS_16OperationContextENS_10StringDataE"},{"b":"C71AF3E6000","o":"9F5537","s":"_ZN5mongo8DatabaseC1EPNS_16OperationContextENS_10StringDataEPNS_20DatabaseCatalogEntryE"},{"b":"C71AF3E6000","o":"9F9214","s":"_ZN5mongo14DatabaseHolder6openDbEPNS_16OperationContextENS_10StringDataEPb"},{"b":"C71AF3E6000","o":"EF5DA8","s":"_ZN5mongo14repairDatabaseEPNS_16OperationContextEPNS_13StorageEngineERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbb"},{"b":"C71AF3E6000","o":"7F8FCB"},{"b":"C71AF3E6000","o":"7FBE95"},{"b":"C71AF3E6000","o":"81BA94","s":"main"},{"b":"760B7CFED000","o":"21B35","s":"__libc_start_main"},{"b":"C71AF3E6000","o":"87987F"}],"processInfo":{ "mongodbVersion" : "3.4.3", "gitVersion" : "f07437fb5a6cca07c10bafa78365456eb1d6d5e1", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.14.32-xxxx-grs-ipv6-64", "version" : "#9 SMP Thu Oct 20 14:53:52 CEST 2016", "machine" : "x86_64" }, "somap" : [ { "b" : "C71AF3E6000", "elfType" : 3, "buildId" : "E7548BC9521159DC8B80A4D768E5544FA00942D3" }, { "b" : "760B7F064000", "elfType" : 3, "buildId" : "11BE4D720B58B4AAC3FB4BF8311F6F3005C84E6B" }, { "b" : "760B7E2D8000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "90EAF65D9B0EEEB1424241281F7F197451D4317D" }, { "b" : "760B7DEEE000", "path" 
: "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "7278C69EE161D98DDD0FA00F92B67AD78C7B7F40" }, { "b" : "760B7DCE6000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "82E77ADE22BC9FFF8D3458BD37331E7EDF174C28" }, { "b" : "760B7DAE2000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "C5F560504E1AF52E29679C3B52FF11121015D6BB" }, { "b" : "760B7D7E0000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "721C7CC9488EFA25F83B48AF713AB27DBE48EF3E" }, { "b" : "760B7D5CA000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "408B46E291B2D4C9612E27C0509D165D7E186D40" }, { "b" : "760B7D3AE000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "C3DEB1FA27CD0C1C3CC575B944ABACBA0698B0F2" }, { "b" : "760B7CFED000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "8B2C421716985B927AA0CAF2A05D0B1F452367F7" }, { "b" : "760B7E546000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "8F3E366E2DB73C330A3791DEAE31AE9579099B44" }, { "b" : "760B7CD9F000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "A2499C359AA179EE23324ED949C0E508E4434F10" }, { "b" : "760B7CAB8000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "E09A34D9083DC6FEAF7018C09D55631DEEE2836D" }, { "b" : "760B7C8B4000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "BF54B7C8932E450769FBBB8B18864D1DD70BBC67" }, { "b" : "760B7C682000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "BF8F00D7CB849ADB0B7A4703BC7B8D66AEE6A49C" }, { "b" : "760B7C46C000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "EA8E45DC8E395CC5E26890470112D97A1F1E0B65" }, { "b" : "760B7C25D000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "1E7A92FDD6FB3871DA97F4BCA2E147E72B6B6E1F" }, { "b" : "760B7C059000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "760B7BE3F000", "path" : "/lib64/libresolv.so.2", "elfType" 
: 3, "buildId" : "FE7AE845A123A3DFC0FDC2408BCBC2BA8B61B158" }, { "b" : "760B7BC18000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "76687CA31A406854DF3BCF8D03055656F56E6892" }, { "b" : "760B7B9B7000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "AE64AA461A26E01F60408013D361749D56DD0AE1" } ] }}
mongod(_ZN5mongo15printStackTraceERSo+0x41) [0xc71b0964ac1]
mongod(+0x157DBB9) [0xc71b0963bb9]
mongod(+0x157E09D) [0xc71b096409d]
libpthread.so.0(+0xF370) [0x760b7d3bd370]
libc.so.6(gsignal+0x37) [0x760b7d0221d7]
libc.so.6(abort+0x148) [0x760b7d0238c8]
mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0x0) [0xc71afbf5cdc]
mongod(_ZN5mongo8Database30_getOrCreateCollectionInstanceEPNS_16OperationContextENS_10StringDataE+0xE0) [0xc71afdd4b80]
mongod(_ZN5mongo8DatabaseC1EPNS_16OperationContextENS_10StringDataEPNS_20DatabaseCatalogEntryE+0x677) [0xc71afddb537]
mongod(_ZN5mongo14DatabaseHolder6openDbEPNS_16OperationContextENS_10StringDataEPb+0xD44) [0xc71afddf214]
mongod(_ZN5mongo14repairDatabaseEPNS_16OperationContextEPNS_13StorageEngineERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbb+0x418) [0xc71b02dbda8]
mongod(+0x7F8FCB) [0xc71afbdefcb]
mongod(+0x7FBE95) [0xc71afbe1e95]
mongod(main+0x964) [0xc71afc01a94]
libc.so.6(__libc_start_main+0xF5) [0x760b7d00eb35]
mongod(+0x87987F) [0xc71afc5f87f]
----- END BACKTRACE -----

cezam commented on Fri, 7 Apr 2017 12:55:33 +0000:

Hi again,

set1 ended up failing, although the repair ran much longer than it did before I sent you the files. The error I'm getting now is the following.

Thanks,
Marc

2017-04-07T08:50:10.523-0400 I STORAGE [initandlisten] Repairing collection bid_daddy_twitter.accounts_2012_08_12
2017-04-07T08:50:10.523-0400 I STORAGE [initandlisten] Verify failed on uri table:collection-1720--1122495060098508656. Running a salvage operation.
2017-04-07T08:50:10.646-0400 I - [initandlisten] Invariant failure rs.get() src/mongo/db/catalog/database.cpp 195 2017-04-07T08:50:10.646-0400 I - [initandlisten] ***aborting after invariant() failure 2017-04-07T08:50:10.652-0400 F - [initandlisten] Got signal: 6 (Aborted). 0x177e13ac1 0x177e12bb9 0x177e1309d 0x688fe952e370 0x688fe91931d7 0x688fe91948c8 0x1770a4cdc 0x177283b80 0x17728a537 0x17728e214 0x17778ada8 0x17708dfcb 0x177090e95 0x1770b0a94 0x688fe917fb35 0x17710e87f ----- BEGIN BACKTRACE ----- {"backtrace":[{"b":"176895000","o":"157EAC1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"176895000","o":"157DBB9"},{"b":"176895000","o":"157E09D"},{"b":"688FE951F000","o":"F370"},{"b":"688FE915E000","o":"351D7","s":"gsignal"},{"b":"688FE915E000","o":"368C8","s":"abort"},{"b":"176895000","o":"80FCDC","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"176895000","o":"9EEB80","s":"_ZN5mongo8Database30_getOrCreateCollectionInstanceEPNS_16OperationContextENS_10StringDataE"},{"b":"176895000","o":"9F5537","s":"_ZN5mongo8DatabaseC1EPNS_16OperationContextENS_10StringDataEPNS_20DatabaseCatalogEntryE"},{"b":"176895000","o":"9F9214","s":"_ZN5mongo14DatabaseHolder6openDbEPNS_16OperationContextENS_10StringDataEPb"},{"b":"176895000","o":"EF5DA8","s":"_ZN5mongo14repairDatabaseEPNS_16OperationContextEPNS_13StorageEngineERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbb"},{"b":"176895000","o":"7F8FCB"},{"b":"176895000","o":"7FBE95"},{"b":"176895000","o":"81BA94","s":"main"},{"b":"688FE915E000","o":"21B35","s":"__libc_start_main"},{"b":"176895000","o":"87987F"}],"processInfo":{ "mongodbVersion" : "3.4.3", "gitVersion" : "f07437fb5a6cca07c10bafa78365456eb1d6d5e1", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.14.32-xxxx-grs-ipv6-64", "version" : "#9 SMP Thu Oct 20 14:53:52 CEST 2016", "machine" : "x86_64" }, "somap" : [ { "b" : "176895000", "elfType" : 3, "buildId" : "E7548BC9521159DC8B80A4D768E5544FA00942D3" }, { "b" : "688FEB1D5000", 
"elfType" : 3, "buildId" : "11BE4D720B58B4AAC3FB4BF8311F6F3005C84E6B" }, { "b" : "688FEA449000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "90EAF65D9B0EEEB1424241281F7F197451D4317D" }, { "b" : "688FEA05F000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "7278C69EE161D98DDD0FA00F92B67AD78C7B7F40" }, { "b" : "688FE9E57000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "82E77ADE22BC9FFF8D3458BD37331E7EDF174C28" }, { "b" : "688FE9C53000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "C5F560504E1AF52E29679C3B52FF11121015D6BB" }, { "b" : "688FE9951000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "721C7CC9488EFA25F83B48AF713AB27DBE48EF3E" }, { "b" : "688FE973B000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "408B46E291B2D4C9612E27C0509D165D7E186D40" }, { "b" : "688FE951F000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "C3DEB1FA27CD0C1C3CC575B944ABACBA0698B0F2" }, { "b" : "688FE915E000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "8B2C421716985B927AA0CAF2A05D0B1F452367F7" }, { "b" : "688FEA6B7000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "8F3E366E2DB73C330A3791DEAE31AE9579099B44" }, { "b" : "688FE8F10000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "A2499C359AA179EE23324ED949C0E508E4434F10" }, { "b" : "688FE8C29000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "E09A34D9083DC6FEAF7018C09D55631DEEE2836D" }, { "b" : "688FE8A25000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "BF54B7C8932E450769FBBB8B18864D1DD70BBC67" }, { "b" : "688FE87F3000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "BF8F00D7CB849ADB0B7A4703BC7B8D66AEE6A49C" }, { "b" : "688FE85DD000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "EA8E45DC8E395CC5E26890470112D97A1F1E0B65" }, { "b" : "688FE83CE000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : 
"1E7A92FDD6FB3871DA97F4BCA2E147E72B6B6E1F" }, { "b" : "688FE81CA000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "688FE7FB0000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "FE7AE845A123A3DFC0FDC2408BCBC2BA8B61B158" }, { "b" : "688FE7D89000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "76687CA31A406854DF3BCF8D03055656F56E6892" }, { "b" : "688FE7B28000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "AE64AA461A26E01F60408013D361749D56DD0AE1" } ] }} mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x177e13ac1] mongod(+0x157DBB9) [0x177e12bb9] mongod(+0x157E09D) [0x177e1309d] libpthread.so.0(+0xF370) [0x688fe952e370] libc.so.6(gsignal+0x37) [0x688fe91931d7] libc.so.6(abort+0x148) [0x688fe91948c8] mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0x0) [0x1770a4cdc] mongod(_ZN5mongo8Database30_getOrCreateCollectionInstanceEPNS_16OperationContextENS_10StringDataE+0xE0) [0x177283b80] mongod(_ZN5mongo8DatabaseC1EPNS_16OperationContextENS_10StringDataEPNS_20DatabaseCatalogEntryE+0x677) [0x17728a537] mongod(_ZN5mongo14DatabaseHolder6openDbEPNS_16OperationContextENS_10StringDataEPb+0xD44) [0x17728e214] mongod(_ZN5mongo14repairDatabaseEPNS_16OperationContextEPNS_13StorageEngineERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbb+0x418) [0x17778ada8] mongod(+0x7F8FCB) [0x17708dfcb] mongod(+0x7FBE95) [0x177090e95] mongod(main+0x964) [0x1770b0a94] libc.so.6(__libc_start_main+0xF5) [0x688fe917fb35] mongod(+0x87987F) [0x17710e87f] ----- END BACKTRACE ----- cezam commented on Fri, 7 Apr 2017 12:42:36 +0000: Hey Thomas, I've launched a repair on both databases using your files and now awaiting result. I'll report back on how everything went. Thank you Marc thomas.schubert commented on Fri, 7 Apr 2017 04:18:37 +0000: Hi Cezam, I've attached a tarball with repair attempts for both sets. 
Please extract them and replace the existing files in their respective paths, and let us know if that resolves the issue. Unfortunately, the process we use to attempt these repairs is not ready to be shared publicly. We're tracking the work to make repair and recovery of the WiredTiger storage engine more robust in SERVER-19815. Please feel free to watch and vote for SERVER-19815.

Thank you,
Thomas
/etc/init.d/mongod_tw start and /etc/init.d/mongod_ig start
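
When reading the backtraces in the comments above, it helps to know how the JSON block relates to the human-readable frames that follow it. In the data shown, each frame's address equals the module base "b" plus the offset "o", both in hexadecimal, and "s" is the mangled symbol when one is available. The Python sketch below illustrates that arithmetic on two frames copied from the set2 backtrace; the function name frame_addresses is hypothetical, and the interpretation of the fields is inferred from this thread rather than from MongoDB documentation.

```python
import json


def frame_addresses(backtrace_json):
    """Return (address, symbol) pairs for a mongod backtrace JSON string.

    Assumes each frame has hexadecimal "b" (base) and "o" (offset) fields,
    as in the backtraces quoted in this thread.
    """
    doc = json.loads(backtrace_json)
    frames = []
    for frame in doc["backtrace"]:
        address = int(frame["b"], 16) + int(frame["o"], 16)
        frames.append((hex(address), frame.get("s", "<no symbol>")))
    return frames


# Two frames copied from the set2 backtrace; the rest is omitted for brevity.
SAMPLE = (
    '{"backtrace":['
    '{"b":"C71AF3E6000","o":"157EAC1","s":"_ZN5mongo15printStackTraceERSo"},'
    '{"b":"760B7CFED000","o":"368C8","s":"abort"}'
    ']}'
)

if __name__ == "__main__":
    for address, symbol in frame_addresses(SAMPLE):
        print(address, symbol)
```

The two computed addresses, 0xc71b0964ac1 and 0x760b7d0238c8, are the ones printed next to printStackTrace and abort in the set2 frame list.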