...
Hi,

I got the same issue as this one: SERVER-25285

One detail is different: I was able to start mongo using --repair, but it went down a few seconds later when running some queries.

The database crashed when I tried to dump it using mongodump. Then I tried to dump the collections one by one. That worked for all the collections except one, called "user_photos". It is not a big collection, only 4554 documents:

> db.user_photos.count()
4554

Running mongodump for this specific collection makes mongo crash with the same message in the log:

2017-04-23T00:19:33.460-0300 E STORAGE [conn6] WiredTiger error (0) [1492917573:460488][21787:0x7f7449c2d700], file:collection-182-5605041607111941117.wt, WT_CURSOR.next: read checksum error for 8192B block at offset 614400: calculated block checksum of 3087355304 doesn't match expected checksum of 2418908030
2017-04-23T00:19:33.460-0300 E STORAGE [conn6] WiredTiger error (0) [1492917573:460554][21787:0x7f7449c2d700], file:collection-182-5605041607111941117.wt, WT_CURSOR.next: collection-182-5605041607111941117.wt: encountered an illegal file format or internal value
2017-04-23T00:19:33.460-0300 E STORAGE [conn6] WiredTiger error (-31804) [1492917573:460568][21787:0x7f7449c2d700], file:collection-182-5605041607111941117.wt, WT_CURSOR.next: the process must exit and restart: WT_PANIC: WiredTiger library panic
2017-04-23T00:19:33.460-0300 I - [conn6] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2017-04-23T00:19:33.460-0300 I - [conn6] ***aborting after fassert() failure

Can you help me fix this? The production system is down.

Thank you!
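The checksum error lines above carry the affected data file, block size, offset, and both checksums. As a minimal sketch (the function name `parse_checksum_error` and the regular expression are illustrative assumptions based only on the log format shown in this thread, not part of MongoDB or WiredTiger), they can be pulled out of a mongod log like this:

```python
import re
from typing import Optional

# Pattern written against the "read checksum error" lines quoted in this
# thread; other MongoDB versions may format the message differently.
CHECKSUM_RE = re.compile(
    r"file:(?P<file>\S+?\.wt), .*?"
    r"read checksum error for (?P<size>\d+)B block at offset (?P<offset>\d+): "
    r"calculated block checksum of (?P<calculated>\d+) "
    r"doesn't match expected checksum of (?P<expected>\d+)"
)


def parse_checksum_error(line: str) -> Optional[dict]:
    """Return the fields of a WiredTiger checksum error line, or None."""
    match = CHECKSUM_RE.search(line)
    if match is None:
        return None
    return {
        "file": match.group("file"),
        "block_size": int(match.group("size")),
        "offset": int(match.group("offset")),
        "calculated": int(match.group("calculated")),
        "expected": int(match.group("expected")),
    }
```

Running this over each line of the log and collecting the distinct "file" values gives the list of damaged .wt files. To match a file to a collection, the wiredTiger.uri field in the output of db.<collection>.stats() should contain the same collection-N-... identifier.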
thomas.schubert commented on Tue, 2 May 2017 22:15:28 +0000:

Hi victor@surfmappers.com,

Unfortunately, we are not able to repair this type of corruption. This error and your responses suggest that the storage layer is likely at fault. Therefore, I would strongly recommend moving your mongod to a new host. If you encounter this issue on new hardware, please let us know and we will continue to investigate.

Thank you,
Thomas

victor@surfmappers.com commented on Tue, 2 May 2017 22:02:11 +0000:

Files from the crash on 05/02. Are you able to restore them? What can I do to stop these problems?

Thanks

victor@surfmappers.com commented on Tue, 2 May 2017 21:37:10 +0000:

It just happened again =((( Here is the log:

2017-05-02T18:06:26.616-0300 I COMMAND [conn175414] command surfmappers.spots command: getMore { getMore: 107007301872, collection: "spots" } originatingCommand: { find: "spots", filter: {}, sort: { name: 1 } } planSummary: COLLSCAN cursorid:107007301872 keysExamined:0 docsExamined:0 hasSortStage:1 numYields:39 nreturned:5058 reslen:16773463 locks:{ Global: { acquireCount: { r: 80 } }, Database: { acquireCount: { r: 40 } }, Collection: { acquireCount: { r: 40 } } } protocol:op_query 117ms
2017-05-02T18:07:49.842-0300 E STORAGE [conn175529] WiredTiger error (0) [1493759269:747168][25972:0x7f82a01fe700], file:collection-0-7906981721518985878.wt, WT_CURSOR.next: read checksum error for 12288B block at offset 27873280: calculated block checksum of 289468493 doesn't match expected checksum of 1316217320
2017-05-02T18:07:49.842-0300 E STORAGE [conn175529] WiredTiger error (0) [1493759269:842987][25972:0x7f82a01fe700], file:collection-0-7906981721518985878.wt, WT_CURSOR.next: collection-0-7906981721518985878.wt: encountered an illegal file format or internal value
2017-05-02T18:07:49.843-0300 E STORAGE [conn175529] WiredTiger error (-31804) [1493759269:843054][25972:0x7f82a01fe700], file:collection-0-7906981721518985878.wt, WT_CURSOR.next: the process must exit and restart: WT_PANIC: WiredTiger library panic
2017-05-02T18:07:49.869-0300 I - [conn175529] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2017-05-02T18:07:49.882-0300 I - [conn175529] ***aborting after fassert() failure
2017-05-02T18:07:49.933-0300 I - [WTJournalFlusher] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 64
2017-05-02T18:07:49.933-0300 I - [WTJournalFlusher] ***aborting after fassert() failure
2017-05-02T18:07:49.958-0300 I - [conn175527] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 64
2017-05-02T18:07:49.958-0300 I - [conn175527] ***aborting after fassert() failure
2017-05-02T18:07:50.000-0300 I - [ftdc] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 64
2017-05-02T18:07:50.000-0300 I - [ftdc] ***aborting after fassert() failure
2017-05-02T18:07:50.088-0300 F - [conn175529] Got signal: 6 (Aborted).
0x7f82ad179b81 0x7f82ad178c79 0x7f82ad17915d 0x7f82aa7eb390 0x7f82aa446428 0x7f82aa44802a 0x7f82ac408293 0x7f82ace896c6 0x7f82ac412505 0x7f82ac4125f9 0x7f82ac412851 0x7f82ada7d6a5 0x7f82ada9ae70 0x7f82ada9f298 0x7f82adab707d 0x7f82ada82c01 0x7f82adadd28f 0x7f82ace7cf24 0x7f82ac777cb5 0x7f82ac7a2f93 0x7f82acaa4d8a 0x7f82acaa56ab 0x7f82ac68eab6 0x7f82ac665a47 0x7f82ac666dd9 0x7f82acc7fc0d 0x7f82ac882742 0x7f82ac8846b6 0x7f82ac476a5d 0x7f82ac47739d 0x7f82ad0df8b2 0x7f82aa7e16ba 0x7f82aa51782d

victor@surfmappers.com commented on Sun, 30 Apr 2017 21:46:32 +0000:

Answering your questions:

1. What kind of underlying storage mechanism are you using? Are the storage devices attached locally or over the network? Are the disks SSDs or HDDs? What kind of RAID and/or volume management system are you using?
   Local SSD drive. No RAID.
2. Would you please check the integrity of your disks?
   Yes. I'll send the results when I have them.
3. Has the database always been running this version of MongoDB? If not please describe the upgrade/downgrade cycles the database has been through.
   No. I first started this system in 2014, I think on mongod 2.6. Early last year I updated to 3.4.2. It has been running this way since then. This kind of problem started just now.
4. Have you manipulated (copied or moved) the underlying database files? If so, was mongod running?
   I have not.
5. Have you ever restored this instance from backups?
   Yes.
6. What method do you use to create backups?
   mongodump & mongorestore
7. When was the underlying filesystem last checked and is it currently marked clean?
   I have never checked, actually. When I upgraded mongo to version 3.4 I also replaced the server, because it was running a very old OS. So I ordered a brand new server and installed everything from scratch.
8. Has this node recently gone through an unclean shutdown?
   No.

victor@surfmappers.com commented on Sun, 30 Apr 2017 16:23:17 +0000:

Hello Ramon,

Thank you for your help!

Before you sent these files I was able to recover my system by restoring the collection user_photos from a previous backup. It wasn't an old backup and I didn't lose data, so this worked fine.

However, today the problem happened again in a different collection!
=( Log below:

2017-04-30T12:40:36.863-0300 I NETWORK [conn1] received client metadata from 127.0.0.1:50018 conn1: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "3.4.2" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2017-04-30T12:43:26.287-0300 I COMMAND [conn12] command surfmappers.notifications command: getMore { getMore: 66955036196, collection: "notifications" } originatingCommand: { find: "notifications", skip: 0, snapshot: true } planSummary: COLLSCAN cursorid:66955036196 keysExamined:0 docsExamined:32159 numYields:254 nreturned:32158 reslen:16776795 locks:{ Global: { acquireCount: { r: 510 } }, Database: { acquireCount: { r: 255 } }, Collection: { acquireCount: { r: 255 } } } protocol:op_query 213ms
2017-04-30T12:46:26.630-0300 E STORAGE [conn27] WiredTiger error (0) [1493567186:630867][4995:0x7fe7d0c7b700], file:collection-178-5605041607111941117.wt, WT_CURSOR.next: read checksum error for 12288B block at offset 356352: calculated block checksum of 2166949970 doesn't match expected checksum of 3768224026
2017-04-30T12:46:26.630-0300 E STORAGE [conn27] WiredTiger error (0) [1493567186:630953][4995:0x7fe7d0c7b700], file:collection-178-5605041607111941117.wt, WT_CURSOR.next: collection-178-5605041607111941117.wt: encountered an illegal file format or internal value
2017-04-30T12:46:26.630-0300 E STORAGE [conn27] WiredTiger error (-31804) [1493567186:630970][4995:0x7fe7d0c7b700], file:collection-178-5605041607111941117.wt, WT_CURSOR.next: the process must exit and restart: WT_PANIC: WiredTiger library panic
2017-04-30T12:46:26.630-0300 I - [conn27] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2017-04-30T12:46:26.631-0300 I - [conn27] ***aborting after fassert() failure
2017-04-30T12:46:26.634-0300 I - [WTJournalFlusher] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 64
2017-04-30T12:46:26.634-0300 I - [WTJournalFlusher]

I tried the same strategy: backing up the collections one by one to identify where the problem was. I found out the problem was in the collection db.users and managed to restore it from an old backup successfully. However, after starting mongo I had the same problem again, but it seems to be in a different collection. See the log below:

2017-04-30T13:04:09.387-0300 I NETWORK [conn7] received client metadata from 127.0.0.1:55176 conn7: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "3.4.2" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
2017-04-30T13:04:27.378-0300 E STORAGE [conn17] WiredTiger error (0) [1493568267:378172][5455:0x7fa969eed700], file:collection-174-5605041607111941117.wt, WT_CURSOR.next: read checksum error for 8192B block at offset 80494592: calculated block checksum of 148133422 doesn't match expected checksum of 607399529
2017-04-30T13:04:27.378-0300 E STORAGE [conn17] WiredTiger error (0) [1493568267:378317][5455:0x7fa969eed700], file:collection-174-5605041607111941117.wt, WT_CURSOR.next: collection-174-5605041607111941117.wt: encountered an illegal file format or internal value
2017-04-30T13:04:27.378-0300 E STORAGE [conn17] WiredTiger error (-31804) [1493568267:378333][5455:0x7fa969eed700], file:collection-174-5605041607111941117.wt, WT_CURSOR.next: the process must exit and restart: WT_PANIC: WiredTiger library panic
2017-04-30T13:04:27.378-0300 I - [conn17] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2017-04-30T13:04:27.378-0300 I - [conn17] ***aborting after fassert() failure

I've attached new Mongo files.
They are named with the prefix 0430_.

ramon.fernandez commented on Mon, 24 Apr 2017 14:23:51 +0000:

I've attached the result of a repair attempt, but please note that the error message above points at data corruption in the collection-182-5605041607111941117.wt file, which is most likely caused by a flaky storage layer but can happen in other scenarios:

1. What kind of underlying storage mechanism are you using? Are the storage devices attached locally or over the network? Are the disks SSDs or HDDs? What kind of RAID and/or volume management system are you using?
2. Would you please check the integrity of your disks?
3. Has the database always been running this version of MongoDB? If not please describe the upgrade/downgrade cycles the database has been through.
4. Have you manipulated (copied or moved) the underlying database files? If so, was mongod running?
5. Have you ever restored this instance from backups?
6. What method do you use to create backups?
7. When was the underlying filesystem last checked and is it currently marked clean?
8. Has this node recently gone through an unclean shutdown?

I'd recommend you resync this node from a healthy one or, if this node was not part of a replica set, that you restore this data from backups.
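The collection-by-collection mongodump used earlier in this thread to locate the damaged collection could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names `build_dump_command` and `first_failing_collection` are hypothetical, the output directory is a placeholder, and it assumes mongodump is on the PATH and exits with a non-zero status when the dump fails. It stops at the first failure because, as the logs show, mongod aborts when it reads the corrupted block, so later dumps would fail for an unrelated reason until mongod is restarted.

```python
import subprocess
from typing import Callable, Iterable, List, Optional


def build_dump_command(db: str, collection: str, out_dir: str) -> List[str]:
    """Build a mongodump invocation for a single collection."""
    return ["mongodump", "--db", db, "--collection", collection, "--out", out_dir]


def first_failing_collection(
    db: str,
    collections: Iterable[str],
    out_dir: str,
    run: Callable[[List[str]], int] = subprocess.call,
) -> Optional[str]:
    """Dump collections in order; return the first one whose dump fails.

    `run` takes a command list and returns its exit status. It defaults to
    subprocess.call and can be swapped out for a dry run.
    """
    for name in collections:
        if run(build_dump_command(db, name, out_dir)) != 0:
            return name
    return None
```

For example, `first_failing_collection("surfmappers", ["spots", "notifications", "user_photos"], "/tmp/dump")` would return the name of the first collection that could not be dumped, or None if all of them succeeded. The collection that fails is the one to restore from backup after restarting mongod.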