...
After upgrading yesterday from MongoDB 2.4.x to MongoDB 2.6.x we hit the exact same error, in our case on an update() command:

collection.update({'_id': {'$in': remote_ids}}, {"$set": {"_oc": True}}, multi=True, w=1, j=True)

When the _oc flag is set, entries are eventually removed. Although the update operation ended in an exception, it seems that the flag was correctly set, since the documents were removed and the error was not reported again for them. It's unclear whether the document in question was updated correctly, or was reprocessed without an error a second time later. Can MongoDB confirm or deny that this could happen? Running .validate(true) on the collections returned no errors.

Is there any more information about this issue on the MongoDB side? How could this corruption occur? Could it be related to upgrading from 2.4 to 2.6? We have more MongoDB instances that we would like to upgrade, but these kinds of corruption issues make us very scared to do so.
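Since the assertion reportedly does not recur on retry, one defensive pattern on the application side (a sketch only, not an official MongoDB recommendation; the helper names `update_with_verify`, `do_update` and `already_applied` are hypothetical) is to retry the write and, on failure, verify out-of-band whether the flag was actually set:

```python
import time


def update_with_verify(do_update, already_applied, attempts=3, delay=0.0):
    """Run do_update(); if it raises, check via already_applied() whether
    the write took effect anyway (the assertion was raised during an index
    scan, so the update may or may not have been applied).  Retry a few
    times before giving up and re-raising the last error."""
    last_exc = None
    for _ in range(attempts):
        try:
            do_update()
            return True
        except Exception as exc:  # with pymongo this would be OperationFailure
            last_exc = exc
            if already_applied():
                return True  # flag is set despite the error
            if delay:
                time.sleep(delay)
    raise last_exc
```

With pymongo this could be wired up (hypothetically, matching the schema above) as `do_update=lambda: collection.update({'_id': {'$in': remote_ids}}, {'$set': {'_oc': True}}, multi=True, w=1, j=True)` and `already_applied=lambda: collection.find({'_id': {'$in': remote_ids}, '_oc': {'$ne': True}}).count() == 0`.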
ramon.fernandez commented on Thu, 25 Sep 2014 19:52:32 +0000: Hi nifan, without more detailed logs and/or a copy of the data itself it's very difficult to understand why you're seeing malformed index keys. Also, when the error happens it's already too late to extract useful information from the logs, as additional logging would only tell us what we already know (there's a malformed index entry) but not when and why it happened. I'm going to mark this ticket as resolved for now. If you try to reproduce the problem on a non-production setup and succeed, please re-open and provide additional information. Regards, Ramón.

nifan commented on Wed, 17 Sep 2014 16:05:18 +0000: It's a shame that this is not easily reproducible... It's been almost two weeks now without this error occurring on any of our MongoDB servers. I tried turning the logLevel up, but it hurts too much on our production nodes, especially for something that happens so infrequently. Would you consider adding logLevels specifically for errors, so that we might configure MongoDB to log more information only when this error happens?

dan@10gen.com commented on Sat, 6 Sep 2014 21:02:28 +0000: Increasing the logLevel will give some context around the exact operation that is happening, but won't provide much detail at the btree storage level. This would still be useful because it could help identify the problem index entry. Would it be possible to turn the logLevel up to 1? This will log every operation. You can set it on an already running server with a command, or at startup either on the command line or in a config file.

nifan commented on Sat, 6 Sep 2014 16:54:40 +0000: @Ramon, the MongoDB instances running into this only have collections that function as a queue. There are 6 collections, one of which is a capped collection of 10GB. Applications running on the same node insert documents into these collections, and a remote application reads the collections (queues) and removes the documents.
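The ways dan@10gen.com mentions to raise the log level can be sketched as follows (a config/CLI fragment; verify the exact option spelling against your 2.6 deployment before using it in production):

```shell
# On an already running server, via the mongo shell:
mongo admin --eval 'db.adminCommand({setParameter: 1, logLevel: 1})'

# At startup, on the command line:
mongod --setParameter logLevel=1

# Or in the 2.6-era YAML config file (mongod.conf):
#   systemLog:
#     verbosity: 1
```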
The collections have only one index, the default one on _id. The value in the _id field is not an ObjectId but BinData, the bytes themselves being similar to an ObjectId. After the remote application receives the error it stops processing the current collection and skips ahead to the next one. It loops around and eventually processes the same collection again; at that moment everything works fine and the error does not happen again. The errors happen infrequently and there are no documents 'left behind'. So I could send the collection, but I would think that won't help to find out what's going wrong, since the documents are so volatile.

Recap: the error (logically) must have happened in the default _id index. It has happened on find() and update() operations. When the operation is retried some seconds later it does not error out again. Could MongoDB not include a verbosity level for errors, so that we can add log detail about the error when it occurs? Especially for these kinds of hard-to-reproduce bugs...

ramon.fernandez commented on Sat, 6 Sep 2014 14:35:39 +0000: nifan, would you be able to send us a dump of one of the affected collections so we can investigate further? Please let me know and I'll send upload instructions. The upgrade should not have affected your indexes, but it is possible that there are bugs in 2.4 that are being exposed now by the changes in 2.6. Have you by any chance tried removing the indexes for one of the affected collections and re-building them? If so, does the assertion still happen? Regards, Ramón.

nifan commented on Sat, 6 Sep 2014 12:31:50 +0000: On about 15 MongoDB instances (all upgraded from 2.4) we see this error fairly often, on both find() and update() operations, in different databases and collections. It seems to happen only on collections that have a high insert and remove rate. On capped collections with basically the same usage pattern we have not seen this error. Is there any update on why this could be happening?
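Rebuilding the indexes as ramon.fernandez suggests could be sketched like this (a CLI fragment with a hypothetical mydb.queue namespace; note that on 2.6/MMAPv1, reIndex() takes a write lock and blocks the database while it runs):

```shell
# Drop and rebuild all indexes on one affected collection:
mongo mydb --eval 'printjson(db.queue.reIndex())'

# Then re-check the collection and its indexes for structural problems:
mongo mydb --eval 'printjson(db.queue.validate(true))'
```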
And could it be related to the upgrade from 2.4? The error at key.cpp:433 seems to indicate that the key in the index is of an unknown type, but regretfully it does not log more information about that, and since the error does not repeat itself in production it is hard to reproduce (or to get a test data set that reliably triggers it).

nifan commented on Thu, 28 Aug 2014 16:37:52 +0000: For completeness' sake, here is the exception I got:

2014-08-28T09:46:27.952+0200 [conn5331] collector_wn.wn Assertion failure false src/mongo/db/structure/btree/key.cpp 433
2014-08-28T09:46:28.061+0200 [conn5331] collector_wn.wn 0x11e6111 0x1187e49 0x116c15e 0xf1ef0b 0xb72e3a 0xb6e88d 0xaa7164 0xc4ad4b 0xc563f6 0xa0b861 0xa0bff0 0xa0e19f 0xa0ecad 0xa1288e 0xa2889a 0xa29ce2 0xa2bea6 0xd5dd6d 0xb9fe62 0xba1440
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11e6111]
/usr/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x1187e49]
/usr/bin/mongod(_ZN5mongo12verifyFailedEPKcS1_j+0x17e) [0x116c15e]
/usr/bin/mongod(_ZNK5mongo5KeyV16toBsonEv+0x19b) [0xf1ef0b]
/usr/bin/mongod(_ZNK5mongo18BtreeInterfaceImplINS_12BtreeData_V1EE5keyAtEPKNS_17IndexCatalogEntryENS_7DiskLocEi+0x9a) [0xb72e3a]
/usr/bin/mongod(_ZNK5mongo16BtreeIndexCursor6getKeyEv+0x3d) [0xb6e88d]
/usr/bin/mongod(_ZN5mongo9IndexScan14prepareToYieldEv+0x64) [0xaa7164]
/usr/bin/mongod(_ZN5mongo6updateERKNS_13UpdateRequestEPNS_7OpDebugEPNS_12UpdateDriverEPNS_14CanonicalQueryE+0xe1b) [0xc4ad4b]
/usr/bin/mongod(_ZN5mongo14UpdateExecutor7executeEv+0x66) [0xc563f6]
/usr/bin/mongod() [0xa0b861]
/usr/bin/mongod(_ZN5mongo18WriteBatchExecutor10execUpdateERKNS_12BatchItemRefEPNS_7BSONObjEPPNS_16WriteErrorDetailE+0x80) [0xa0bff0]
/usr/bin/mongod(_ZN5mongo18WriteBatchExecutor11bulkExecuteERKNS_21BatchedCommandRequestEPSt6vectorIPNS_19BatchedUpsertDetailESaIS6_EEPS4_IPNS_16WriteErrorDetailESaISB_EE+0x1df) [0xa0e19f]
/usr/bin/mongod(_ZN5mongo18WriteBatchExecutor12executeBatchERKNS_21BatchedCommandRequestEPNS_22BatchedCommandResponseE+0x8cd) [0xa0ecad]
/usr/bin/mongod(_ZN5mongo8WriteCmd3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3de) [0xa1288e]
/usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0xa2889a]
/usr/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x1042) [0xa29ce2]
/usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x6c6) [0xa2bea6]
/usr/bin/mongod(_ZN5mongo11newRunQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x22ed) [0xd5dd6d]
/usr/bin/mongod() [0xb9fe62]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x580) [0xba1440]

nifan commented on Thu, 28 Aug 2014 16:34:39 +0000: @Ramon, thanks! I will try that in the future. (Regretfully, IMHO Jira is much harder to use than GitHub, Trac, etc., which adds a barrier to the community. I was happy to find the clone function so as not to have to go through the new-ticket process, since re-opening the previous ticket was also not possible...) Back to more important issues: would it be possible to find out (from the developers of MongoDB?) whether the running update command succeeds or not when this exception happens? In the previous ticket you mention that it's most likely "transient index corruption". Would that have the effect that the update command itself completes, saving all changes on disk, but that the document would no longer be in the index?

ramon.fernandez commented on Thu, 28 Aug 2014 16:24:37 +0000: nifan, thanks for your report. I've moved your comment to the "Description" field for the sake of clarity. Given the limitations with cloning, in the future please open new tickets so you can fill in the "Description" and "Steps to reproduce" fields as needed.
nifan commented on Thu, 28 Aug 2014 10:23:50 +0000: Weirdly enough, after pressing "clone" on the https://jira.mongodb.org/browse/SERVER-14529 issue (since I cannot reopen that one), one cannot change the ticket with new/updated information. So instead I will put everything in the comments.