...
The powercycle test created a DB that was not recoverable: 2015-07-02T15:10:36.727-0400 I CONTROL [main] ***** SERVER RESTARTED ***** 2015-07-02T15:10:36.788-0400 W - [initandlisten] Detected unclean shutdown - /home/jonathan/pt-315/data/wiredTiger/mongod.lock is not empty. 2015-07-02T15:10:36.788-0400 W STORAGE [initandlisten] Recovering data from the last clean checkpoint. 2015-07-02T15:10:36.788-0400 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=8G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0), 2015-07-02T15:10:36.861-0400 E STORAGE [initandlisten] WiredTiger (2) [1435864236:861392][2600:0x7f6b3b048b80], file:collection-181-5805430730046625089.wt: /home/jonathan/pt-315/data/wiredTiger/collection-181-5805430730046625089.wt: No such file or directory 2015-07-02T15:10:36.861-0400 E STORAGE [initandlisten] WiredTiger (2) [1435864236:861464][2600:0x7f6b3b048b80], file:collection-181-5805430730046625089.wt: Operation failed during recovery: No such file or directory 2015-07-02T15:10:36.880-0400 I - [initandlisten] Fatal assertion 28548 NoSuchKey Unable to find metadata for table:collection-175-5805430730046625089 2015-07-02T15:10:36.880-0400 I - [initandlisten] ***aborting after fassert() failure
jonathan.abrahams commented on Wed, 26 Aug 2015 14:28:18 +0000: I am rerunning the tests using 3.1.8-pre-, and so far I have not been able to reproduce this issue. It's possible the fixes have corrected this issue. Some tests run on VMs, some on physical machines. I have pushed the latest powertest.sh to the GitHub repo. Stay tuned and if this is not reproducible, it will be marked resolved. alexander.gorrod commented on Wed, 26 Aug 2015 07:13:20 +0000: Thanks for re-testing jonathan.abrahams. Could you provide some more information: When you say latest - which version do you mean? Do you know how long the test ran for before the failure happened? Were you running the crash cycling on physical hardware or with a virtual machine? Are you running with the power-cycle script as in the GitHub repository? I've updated to the newest version of the script and starting running locally again. jonathan.abrahams commented on Fri, 14 Aug 2015 18:17:32 +0000: Reopening based on last comment. jonathan.abrahams commented on Fri, 14 Aug 2015 14:46:42 +0000: I reran this test on latest and still have this problem: 2015-08-13T16:57:17.880-0400 I CONTROL [main] ***** SERVER RESTARTED ***** 2015-08-13T16:57:17.937-0400 W - [initandlisten] Detected unclean shutdown - /home/jonathan/pt-317-wt-100m/data/wiredTiger/mongod.lock is not empty. 2015-08-13T16:57:17.937-0400 W STORAGE [initandlisten] Recovering data from the last clean checkpoint. 2015-08-13T16:57:17.937-0400 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=8G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0), 2015-08-13T16:59:33.576-0400 E STORAGE [initandlisten] WiredTiger (2) [1439499573:576196][2591:0x7f547b92ccc0], file:index-27-2812787096723087782.wt: /home/jonathan/pt-317-wt-100m/data/wiredTiger/index-27-2812787096723087782.wt: No such file or directory 2015-08-13T16:59:33.576-0400 E STORAGE [initandlisten] WiredTiger (2) [1439499573:576268][2591:0x7f547b92ccc0], file:index-27-2812787096723087782.wt: Operation failed during recovery: No such file or directory 2015-08-13T16:59:47.078-0400 I STORAGE [initandlisten] dropping unused ident: collection-28-2812787096723087782 2015-08-13T16:59:47.080-0400 I STORAGE [initandlisten] dropping unused ident: collection-30-2812787096723087782 2015-08-13T16:59:47.081-0400 I STORAGE [initandlisten] dropping unused ident: collection-32-2812787096723087782 2015-08-13T16:59:47.082-0400 I STORAGE [initandlisten] dropping unused ident: collection-34-2812787096723087782 2015-08-13T16:59:47.083-0400 I STORAGE [initandlisten] dropping unused ident: collection-36-2812787096723087782 2015-08-13T16:59:47.084-0400 I STORAGE [initandlisten] dropping unused ident: collection-38-2812787096723087782 2015-08-13T16:59:47.085-0400 I STORAGE [initandlisten] dropping unused ident: collection-40-2812787096723087782 2015-08-13T16:59:47.086-0400 I STORAGE [initandlisten] dropping unused ident: collection-55-2812787096723087782 2015-08-13T16:59:47.086-0400 I STORAGE [initandlisten] dropping unused ident: index-173-2812787096723087782 2015-08-13T16:59:47.087-0400 I STORAGE [initandlisten] dropping unused ident: index-174-2812787096723087782 2015-08-13T16:59:47.088-0400 I STORAGE [initandlisten] dropping unused ident: index-175-2812787096723087782 2015-08-13T16:59:47.089-0400 I STORAGE [initandlisten] dropping unused ident: index-176-2812787096723087782 2015-08-13T16:59:47.090-0400 I STORAGE [initandlisten] dropping unused ident: index-177-2812787096723087782 2015-08-13T16:59:47.091-0400 I STORAGE [initandlisten] dropping unused ident: index-178-2812787096723087782 2015-08-13T16:59:47.092-0400 I STORAGE [initandlisten] dropping unused ident: index-179-2812787096723087782 2015-08-13T16:59:47.093-0400 I STORAGE [initandlisten] dropping unused ident: index-180-2812787096723087782 2015-08-13T16:59:47.094-0400 I STORAGE [initandlisten] dropping unused ident: index-181-2812787096723087782 2015-08-13T16:59:47.094-0400 I STORAGE [initandlisten] dropping unused ident: index-182-2812787096723087782 2015-08-13T16:59:47.095-0400 I STORAGE [initandlisten] dropping unused ident: index-183-2812787096723087782 2015-08-13T16:59:47.096-0400 I STORAGE [initandlisten] dropping unused ident: index-184-2812787096723087782 2015-08-13T16:59:47.097-0400 I STORAGE [initandlisten] dropping unused ident: index-189-2812787096723087782 2015-08-13T16:59:47.098-0400 I STORAGE [initandlisten] dropping unused ident: index-190-2812787096723087782 2015-08-13T16:59:47.099-0400 I STORAGE [initandlisten] dropping unused ident: index-191-2812787096723087782 2015-08-13T16:59:47.100-0400 I STORAGE [initandlisten] dropping unused ident: index-192-2812787096723087782 2015-08-13T16:59:47.101-0400 I STORAGE [initandlisten] dropping unused ident: index-213-2812787096723087782 2015-08-13T16:59:47.102-0400 I STORAGE [initandlisten] dropping unused ident: index-214-2812787096723087782 2015-08-13T16:59:47.103-0400 I STORAGE [initandlisten] dropping unused ident: index-215-2812787096723087782 2015-08-13T16:59:47.104-0400 I STORAGE [initandlisten] dropping unused ident: index-216-2812787096723087782 2015-08-13T16:59:47.105-0400 I STORAGE [initandlisten] dropping unused ident: index-221-2812787096723087782 2015-08-13T16:59:47.106-0400 I STORAGE [initandlisten] dropping unused ident: index-222-2812787096723087782 2015-08-13T16:59:47.107-0400 I STORAGE [initandlisten] dropping unused ident: index-223-2812787096723087782 2015-08-13T16:59:47.108-0400 I STORAGE [initandlisten] dropping unused ident: index-224-2812787096723087782 2015-08-13T16:59:47.108-0400 I STORAGE [initandlisten] dropping unused ident: index-233-2812787096723087782 2015-08-13T16:59:47.109-0400 I STORAGE [initandlisten] dropping unused ident: index-234-2812787096723087782 2015-08-13T16:59:47.110-0400 I STORAGE [initandlisten] dropping unused ident: index-235-2812787096723087782 2015-08-13T16:59:47.111-0400 I STORAGE [initandlisten] dropping unused ident: index-236-2812787096723087782 2015-08-13T16:59:47.112-0400 I STORAGE [initandlisten] dropping unused ident: index-241-2812787096723087782 2015-08-13T16:59:47.113-0400 I STORAGE [initandlisten] dropping unused ident: index-242-2812787096723087782 2015-08-13T16:59:47.114-0400 I STORAGE [initandlisten] dropping unused ident: index-243-2812787096723087782 2015-08-13T16:59:47.115-0400 I STORAGE [initandlisten] dropping unused ident: index-244-2812787096723087782 2015-08-13T16:59:47.116-0400 I STORAGE [initandlisten] dropping unused ident: index-245-2812787096723087782 2015-08-13T16:59:47.117-0400 I STORAGE [initandlisten] dropping unused ident: index-246-2812787096723087782 2015-08-13T16:59:47.118-0400 I STORAGE [initandlisten] dropping unused ident: index-247-2812787096723087782 2015-08-13T16:59:47.119-0400 I STORAGE [initandlisten] dropping unused ident: index-248-2812787096723087782 2015-08-13T16:59:47.120-0400 I STORAGE [initandlisten] dropping unused ident: index-253-2812787096723087782 2015-08-13T16:59:47.121-0400 I STORAGE [initandlisten] dropping unused ident: index-254-2812787096723087782 2015-08-13T16:59:47.122-0400 I STORAGE [initandlisten] dropping unused ident: index-255-2812787096723087782 2015-08-13T16:59:47.123-0400 I STORAGE [initandlisten] dropping unused ident: index-256-2812787096723087782 2015-08-13T16:59:47.124-0400 I STORAGE [initandlisten] dropping unused ident: index-257-2812787096723087782 2015-08-13T16:59:47.124-0400 I STORAGE [initandlisten] dropping unused ident: index-258-2812787096723087782 2015-08-13T16:59:47.125-0400 I STORAGE [initandlisten] dropping unused ident: index-259-2812787096723087782 2015-08-13T16:59:47.126-0400 I STORAGE [initandlisten] dropping unused ident: index-260-2812787096723087782 2015-08-13T16:59:47.128-0400 I STORAGE [initandlisten] dropping unused ident: index-261-2812787096723087782 2015-08-13T16:59:47.128-0400 I STORAGE [initandlisten] dropping unused ident: index-262-2812787096723087782 2015-08-13T16:59:47.129-0400 I STORAGE [initandlisten] dropping unused ident: index-263-2812787096723087782 2015-08-13T16:59:47.130-0400 I STORAGE [initandlisten] dropping unused ident: index-264-2812787096723087782 2015-08-13T16:59:47.131-0400 I STORAGE [initandlisten] dropping unused ident: index-265-2812787096723087782 2015-08-13T16:59:47.132-0400 I STORAGE [initandlisten] dropping unused ident: index-266-2812787096723087782 2015-08-13T16:59:47.133-0400 I STORAGE [initandlisten] dropping unused ident: index-267-2812787096723087782 2015-08-13T16:59:47.134-0400 I STORAGE [initandlisten] dropping unused ident: index-268-2812787096723087782 2015-08-13T16:59:47.135-0400 I STORAGE [initandlisten] dropping unused ident: index-269-2812787096723087782 2015-08-13T16:59:47.136-0400 I STORAGE [initandlisten] dropping unused ident: index-270-2812787096723087782 2015-08-13T16:59:47.137-0400 I STORAGE [initandlisten] dropping unused ident: index-271-2812787096723087782 2015-08-13T16:59:47.138-0400 I STORAGE [initandlisten] dropping unused ident: index-272-2812787096723087782 2015-08-13T16:59:47.138-0400 I STORAGE [initandlisten] dropping unused ident: index-54-2812787096723087782 2015-08-13T16:59:47.139-0400 I STORAGE [initandlisten] dropping unused ident: index-67-2812787096723087782 2015-08-13T16:59:47.140-0400 I STORAGE [initandlisten] dropping unused ident: index-79-2812787096723087782 2015-08-13T16:59:47.142-0400 I CONTROL [initandlisten] MongoDB starting : pid=2591 port=37017 dbpath=/home/jonathan/pt-317-wt-100m/data/wiredTiger 64-bit host=capja2 2015-08-13T16:59:47.142-0400 I CONTROL [initandlisten] 2015-08-13T16:59:47.142-0400 I CONTROL [initandlisten] ** NOTE: This is a development version (3.1.7-pre-) of MongoDB. 2015-08-13T16:59:47.142-0400 I CONTROL [initandlisten] ** Not recommended for production. 2015-08-13T16:59:47.142-0400 I CONTROL [initandlisten] 2015-08-13T16:59:47.143-0400 I CONTROL [initandlisten] 2015-08-13T16:59:47.143-0400 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'. 2015-08-13T16:59:47.143-0400 I CONTROL [initandlisten] ** We suggest setting it to 'never' 2015-08-13T16:59:47.143-0400 I CONTROL [initandlisten] 2015-08-13T16:59:47.143-0400 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'. 2015-08-13T16:59:47.143-0400 I CONTROL [initandlisten] ** We suggest setting it to 'never' 2015-08-13T16:59:47.143-0400 I CONTROL [initandlisten] 2015-08-13T16:59:47.143-0400 I CONTROL [initandlisten] db version v3.1.7-pre- 2015-08-13T16:59:47.143-0400 I CONTROL [initandlisten] git version: a067d03eca22d67a1ca036ea61544cd51df1490b 2015-08-13T16:59:47.143-0400 I CONTROL [initandlisten] allocator: tcmalloc 2015-08-13T16:59:47.143-0400 I CONTROL [initandlisten] options: { net: { port: 37017 }, processManagement: { fork: true }, setParameter: { enableTestCommands: "1" }, storage: { dbPath: "pt-317-wt-100m/data/wiredTiger", engine: "wiredTiger" }, systemLog: { destination: "file", logAppend: true, path: "pt-317-wt-100m/logs/mongod-wiredTiger-recovery.log" } } 2015-08-13T16:59:47.146-0400 I - [initandlisten] Fatal assertion 28579 NoSuchKey Unable to find metadata for table:index-21-2812787096723087782 2015-08-13T16:59:47.146-0400 I - [initandlisten] ***aborting after fassert() failure alexander.gorrod commented on Thu, 23 Jul 2015 05:28:55 +0000: I've been running this test for 24 hours without a failure, when it would fail reliably within 5 minutes before. The fixes in WT-2008 and WT-2009 have fixed the issues I can reproduce here. alexander.gorrod commented on Wed, 15 Jul 2015 07:16:00 +0000: I've found two bugs WT-2008 and WT-2009 that are causing the failures reported here. I have a fix for WT-2008 awaiting review, WT-2009 needs further discussion before implementing a fix. alexander.gorrod commented on Tue, 14 Jul 2015 07:28:26 +0000: I've reproduced the failure - I'm actively chasing it. The failure I see is: 2015-07-14T05:54:38.664+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0), 2015-07-14T05:54:40.216+0000 E STORAGE [initandlisten] WiredTiger (2) [1436853280:216378][1604:0x7f984eb1cbc0], file:collection-183--5187426057094844684.wt: /home/vagrant/pt-latest/data/wiredTiger/collection-183--5187426057094844684.wt: No such file or directory 2015-07-14T05:54:40.216+0000 E STORAGE [initandlisten] WiredTiger (2) [1436853280:216510][1604:0x7f984eb1cbc0], file:collection-183--5187426057094844684.wt: Operation failed during recovery: No such file or directory A printlog on the failed database reveals: { "lsn" : [24,33857152], "hdr_flags" : "", "rec_len" : 256, "mem_len" : 256, "type" : "commit", "txnid" : 115824, "ops": [ { "optype": "row_put", "fileid": "0", "key": "table:collection-183--5187426057094844684\u0000", "value": "app_metadata=(formatVersion=1),colgroups=,collator=,columns=,key_format=q,value_format=u\u0000" } ] }, Which indicates that the file has been written to the WiredTiger metadata. The file definitely doesn't exist on disk: $ ls wiredTiger-beforerecovery/collection* wiredTiger-beforerecovery/collection-0--6622855524004531093.wt wiredTiger-beforerecovery/collection-193--5187426057094844684.wt wiredTiger-beforerecovery/collection-1--5187426057094844684.wt wiredTiger-beforerecovery/collection-2--6622855524004531093.wt wiredTiger-beforerecovery/collection-1-5882861703348662567.wt wiredTiger-beforerecovery/collection-3--5187426057094844684.wt wiredTiger-beforerecovery/collection-179--5187426057094844684.wt wiredTiger-beforerecovery/collection-3-5882861703348662567.wt wiredTiger-beforerecovery/collection-181--5187426057094844684.wt wiredTiger-beforerecovery/collection-4--6622855524004531093.wt wiredTiger-beforerecovery/collection-185--5187426057094844684.wt wiredTiger-beforerecovery/collection-6--6622855524004531093.wt wiredTiger-beforerecovery/collection-189--5187426057094844684.wt I'll do some more digging into create code paths tomorrow, to see what may be causing the failure. alexander.gorrod commented on Tue, 14 Jul 2015 03:23:34 +0000: Thanks for reporting this issue jonathan.abrahams. Can you reproduce the failure reliably? How long do you need to run the power-cycle script before seeing the failure? I've been running the power cycle script with the configuration options you specified for over 2 hours now without failure. jonathan.abrahams commented on Tue, 7 Jul 2015 14:19:53 +0000: I also ran the same scenario with a new option for the CRUD client, using writeConcern of {j: true}. This produced the same result. jonathan.abrahams commented on Thu, 2 Jul 2015 19:57:24 +0000: Files were split, cat them back together: cat wiredTiger-afterrecovery.tar* > wiredTiger-afterrecovery.tar
Using the powercycle.sh script with the following arguments: ./powertest.sh -d https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-3.1.5.tgz -r pt-315 -F ~/mongodb/mongo -s 10.4.1.54 -C mpower -O output1 -S "-i /Users/jonathan/.ssh/mFi.pem admin@10.4.117.131" -R -N 10 -l 50 Note - If you need to run this on an AWS instance you would use it as follows. The -F option is the root of the mongo source on your client. ./powertest.sh -d https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-3.1.5.tgz -r pt-315 -F ~/mongodb/mongo -s "-i -M -R -N 10 -l 50
Click on a version to see all relevant bugs
MongoDB Integration
Learn more about where this data comes from
Bug Scrub Advisor
Streamline upgrades with automated vendor bug scrubs
BugZero Enterprise
Wish you caught this bug sooner? Get proactive today.