...
During the powercycle testing the following fatal assertion occurred: 081-0400 I CONTROL [main] ***** SERVER RESTARTED ***** 2017-08-25T13:39:12.286-0400 I CONTROL [initandlisten] MongoDB starting : pid=1594 port=37017 dbpath=/home/jonathan/pc_val_WT-3/data/db 64-bit host=CAP-JA-Ubuntu14 2017-08-25T13:39:12.286-0400 I CONTROL [initandlisten] db version v3.5.11-251-g3128427f77 2017-08-25T13:39:12.286-0400 I CONTROL [initandlisten] git version: 3128427f77397916967e3eb684dfed845bb1638a 2017-08-25T13:39:12.286-0400 I CONTROL [initandlisten] allocator: tcmalloc 2017-08-25T13:39:12.286-0400 I CONTROL [initandlisten] modules: none 2017-08-25T13:39:12.286-0400 I CONTROL [initandlisten] build environment: 2017-08-25T13:39:12.286-0400 I CONTROL [initandlisten] distarch: x86_64 2017-08-25T13:39:12.286-0400 I CONTROL [initandlisten] target_arch: x86_64 2017-08-25T13:39:12.286-0400 I CONTROL [initandlisten] options: { net: { bindIp: "0.0.0.0", port: 37017 }, processManagement: { fork: true }, setParameter: { logLevel: "3" }, storage: { dbPath: "/home/jonathan/pc_val_WT-3/data/db", engine: "wiredTiger" }, systemLog: { destination: "file", logAppend: true, path: "/home/jonathan/pc_val_WT-3/log/mongod.log" } } 2017-08-25T13:39:12.305-0400 W - [initandlisten] Detected unclean shutdown - /home/jonathan/pc_val_WT-3/data/db/mongod.lock is not empty. 2017-08-25T13:39:12.306-0400 E STORAGE [initandlisten] Unable to read the storage engine metadata file: InvalidPath: Metadata file /home/jonathan/pc_val_WT-3/data/db/storage.bson cannot be empty. 2017-08-25T13:39:12.306-0400 F - [initandlisten] Fatal Assertion 28661 at src/mongo/db/storage/storage_engine_metadata.cpp 76 2017-08-25T13:39:12.306-0400 F - [initandlisten] ***aborting after fassert() failure 2017-08-25T13:39:12.769-0400 F - [initandlisten] Got signal: 6 (Aborted). 0x7f1bdcbf2f11 0x7f1bdcbf2129 0x7f1bdcbf260d 0x7f1bd9e00330 0x7f1bd9a61c37 0x7f1bd9a65028 0x7f1bdb42bfef 0x7f1bdb9895d0 0x7f1bdb989758 0x7f1bdb69cfb1 0x7f1bdb3c619b 0x7f1bdb4987bc 0x7f1bdb42dfa9 0x7f1bd9a4cf45 0x7f1bdb488581 ----- BEGIN BACKTRACE ----- {"backtrace":[{"b":"7F1BDAB5C000","o":"2096F11","s":"_ZN5mongo15printStackTraceERSo"},{"b":"7F1BDAB5C000","o":"2096129"},{"b":"7F1BDAB5C000","o":"209660D"},{"b":"7F1BD9DF0000","o":"10330"},{"b":"7F1BD9A2B000","o":"36C37","s":"gsignal"},{"b":"7F1BD9A2B000","o":"3A028","s":"abort"},{"b":"7F1BDAB5C000","o":"8CFFEF","s":"_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj"},{"b":"7F1BDAB5C000","o":"E2D5D0","s":"_ZN5mongo21StorageEngineMetadata7forPathERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"},{"b":"7F1BDAB5C000","o":"E2D758","s":"_ZN5mongo21StorageEngineMetadata23getStorageEngineForPathERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"},{"b":"7F1BDAB5C000","o":"B40FB1","s":"_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv"},{"b":"7F1BDAB5C000","o":"86A19B"},{"b":"7F1BDAB5C000","o":"93C7BC","s":"_ZN5mongo11mongoDbMainEiPPcS1_"},{"b":"7F1BDAB5C000","o":"8D1FA9","s":"main"},{"b":"7F1BD9A2B000","o":"21F45","s":"__libc_start_main"},{"b":"7F1BDAB5C000","o":"92C581"}],"processInfo":{ "mongodbVersion" : "3.5.11-251-g3128427f77", "gitVersion" : "3128427f77397916967e3eb684dfed845bb1638a", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.16.0-31-generic", "version" : "#43~14.04.1-Ubuntu SMP Tue Mar 10 20:13:38 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "b" : "7F1BDAB5C000", "elfType" : 3, "buildId" : "B10A43F8CCC8E0D4E054EDBA11F9129CFF7FF704" }, { "b" : "7FFF99AF1000", "elfType" : 3, "buildId" : "5552B9335DDE9349419BA10896C1E75C9432A946" }, { "b" : "7F1BDA72F000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "E2A6DD5048A0A051FD61043BDB69D8CC68192AB7" }, { "b" : "7F1BDA52B000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "DA9B8C234D0FE9FD8CAAC8970A7EC1B6C8F6623F" }, { "b" : "7F1BDA225000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "D144258E614900B255A31F3FD2283A878670D5BC" }, { "b" : "7F1BDA00E000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "E4AFDDDF7C56BCC9B8DD5EBD99DA89B02C17F7FD" }, { "b" : "7F1BD9DF0000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "31E9F21AE8C10396171F1E13DA15780986FA696C" }, { "b" : "7F1BD9A2B000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "CF699A15CAAE64F50311FC4655B86DC39A479789" }, { "b" : "7F1BDA937000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "D0F537904076D73F29E4A37341F8A449E2EF6CD0" } ] }} mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x7f1bdcbf2f11] mongod(+0x2096129) [0x7f1bdcbf2129] mongod(+0x209660D) [0x7f1bdcbf260d] libpthread.so.0(+0x10330) [0x7f1bd9e00330] libc.so.6(gsignal+0x37) [0x7f1bd9a61c37] libc.so.6(abort+0x148) [0x7f1bd9a65028] mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x7f1bdb42bfef] mongod(_ZN5mongo21StorageEngineMetadata7forPathERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x2E0) [0x7f1bdb9895d0] mongod(_ZN5mongo21StorageEngineMetadata23getStorageEngineForPathERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x38) [0x7f1bdb989758] mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x91) [0x7f1bdb69cfb1] mongod(+0x86A19B) [0x7f1bdb3c619b] mongod(_ZN5mongo11mongoDbMainEiPPcS1_+0x86C) [0x7f1bdb4987bc] mongod(main+0x9) [0x7f1bdb42dfa9] libc.so.6(__libc_start_main+0xF5) [0x7f1bd9a4cf45] mongod(+0x92C581) [0x7f1bdb488581] ----- END BACKTRACE -----
ramon.fernandez commented on Fri, 15 Sep 2017 18:11:40 +0000: Author: {'username': u'dgottlieb', 'name': u'Daniel Gottlieb', 'email': u'daniel.gottlieb@mongodb.com'} Message:SERVER-30836: Dont rely on boost's file rename functionality for necessary fsyncing on POSIX Branch:master https://github.com/mongodb/mongo/commit/33908bf59707e36b43ac58b50da4bde3e9cbae84 jonathan.abrahams commented on Wed, 13 Sep 2017 14:56:56 +0000: The provided patch does protect the storage,bson from becoming empty during system crashes. jonathan.abrahams commented on Tue, 12 Sep 2017 18:48:08 +0000: It seems this ticket has morphed, as the original issue of asserting when the storage.bson is empty is "works as designed". The secondary issue of the file becoming empty is the issue of interest and the ticket is being retitled. daniel.gottlieb@10gen.com commented on Mon, 11 Sep 2017 17:33:23 +0000: Thanks for the logs. It seems there are two issues going on that I want to discuss separately. 1) Starting up mongod with an empty storage.bson file is a crash. It seems reasonable for this case to have the same behavior as the storage.bson file not existing. At the very least it shouldn't crash but result in an appropriate error message. With regards to recreating the storage.bson file when it does not exist: I'm not sure how that does the right thing (I'm skeptical that's by accident from the upgrade process when storage.bson files were introduced), so I think any work that has the empty storage.bson behave similarly should evaluate that the code. 2) The issue I'm more concerned with is a mongod that can come up and after a process/node crash and the storage.bson file not being intact. From our conversation of when the crash occurs, the crash is only issued after the server starts up and a client can connect. It's my expectation that the storage.bson file is safely persisted on disk at this time. I've found that our creation of the storage.bson file, which is done via a rename, doesn't do all of the "tricks" POSIX renames do[1][2]. I've attached a patch that better conforms to what I've seen other implementations do. Can you try running the patched version by hand and see if the problem becomes less frequent (or hopefully just goes away)? [1]https://github.com/wiredtiger/wiredtiger/blob/95d911ab246e444192f34dc395652dba2653dd3c/src/os_posix/os_fs.c#L214-L223 [2]https://github.com/HewlettPackard/sparkle/blob/b0ae7cc87158ba5941616ecd16aec2c45aba9157/alps/src/common/os.hh#L104-L119 jonathan.abrahams commented on Fri, 8 Sep 2017 19:37:21 +0000: Note, if the storage.bson file does not exist then mongod will create one. jonathan.abrahams commented on Fri, 8 Sep 2017 19:35:01 +0000: Steps to reproduce: Start mongod Stop mongod Create empty storage.bson: rm storage.bson touch storage.bson Start mongod jonathan.abrahams commented on Thu, 7 Sep 2017 16:59:13 +0000: Attached tar file has data and logs. milkie commented on Thu, 7 Sep 2017 14:16:43 +0000: jonathan.abrahams I'm tabling work on this until it reproduces again and we can get log and data files to help diagnose further. daniel.gottlieb@10gen.com commented on Fri, 1 Sep 2017 15:25:48 +0000: jonathan.abrahams Can you attach the entire dbpath after one of these failures? And the mongod log files as that seems to be in a separate path. ian@10gen.com commented on Fri, 1 Sep 2017 14:16:30 +0000: Assigning to Dan to investigate a bit further.