...
One of the replicas got fatal assertion 16360 during replication:

Thu Nov 14 07:44:15.896 [repl writer worker 1] production.messages Assertion failure type == cbindata src/mongo/db/key.cpp 585
0xde05e1 0xda15bd 0xa2a7fc 0x7fa6ed 0x7fd056 0x7fdefd 0x7fc8c3 0x7fc9ab 0x7fc9ab 0x7fc9ab 0x7fc9ab 0x8009ea 0x9d75bb 0x9e03fa 0xac587d 0xac6b7f 0xa9168a 0xa93757 0xa6d7a0 0xc29713
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde05e1]
/usr/bin/mongod(_ZN5mongo12verifyFailedEPKcS1_j+0xfd) [0xda15bd]
/usr/bin/mongod(_ZNK5mongo5KeyV18dataSizeEv+0x12c) [0xa2a7fc]
/usr/bin/mongod(_ZN5mongo12BucketBasicsINS_12BtreeData_V1EE16_packReadyForModERKNS_8OrderingERi+0xdd) [0x7fa6ed]
/usr/bin/mongod(_ZN5mongo11BtreeBucketINS_12BtreeData_V1EE5splitENS_7DiskLocEiS3_RKNS_5KeyV1ERKNS_8OrderingES3_S3_RNS_12IndexDetailsE+0x5a6) [0x7fd056]
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE10insertHereENS_7DiskLocEiS3_RKNS_5KeyV1ERKNS_8OrderingES3_S3_RNS_12IndexDetailsE+0xabd) [0x7fdefd]
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE7_insertENS_7DiskLocES3_RKNS_5KeyV1ERKNS_8OrderingEbS3_S3_RNS_12IndexDetailsE+0x363) [0x7fc8c3]
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE7_insertENS_7DiskLocES3_RKNS_5KeyV1ERKNS_8OrderingEbS3_S3_RNS_12IndexDetailsE+0x44b) [0x7fc9ab]
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE7_insertENS_7DiskLocES3_RKNS_5KeyV1ERKNS_8OrderingEbS3_S3_RNS_12IndexDetailsE+0x44b) [0x7fc9ab]
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE7_insertENS_7DiskLocES3_RKNS_5KeyV1ERKNS_8OrderingEbS3_S3_RNS_12IndexDetailsE+0x44b) [0x7fc9ab]
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE7_insertENS_7DiskLocES3_RKNS_5KeyV1ERKNS_8OrderingEbS3_S3_RNS_12IndexDetailsE+0x44b) [0x7fc9ab]
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE9bt_insertENS_7DiskLocES3_RKNS_7BSONObjERKNS_8OrderingEbRNS_12IndexDetailsEb+0x11a) [0x8009ea]
/usr/bin/mongod(_ZNK5mongo18IndexInterfaceImplINS_12BtreeData_V1EE9bt_insertENS_7DiskLocES3_RKNS_7BSONObjERKNS_8OrderingEbRNS_12IndexDetailsEb+0x8b) [0x9d75bb]
/usr/bin/mongod(_ZN5mongo24indexRecordUsingTwoStepsEPKcPNS_16NamespaceDetailsENS_7BSONObjENS_7DiskLocEb+0xbca) [0x9e03fa]
/usr/bin/mongod(_ZN5mongo11DataFileMgr6insertEPKcPKvibbbPb+0x123d) [0xac587d]
/usr/bin/mongod(_ZN5mongo11DataFileMgr16insertWithObjModEPKcRNS_7BSONObjEbb+0x4f) [0xac6b7f]
/usr/bin/mongod(_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEPNS_11RemoveSaverEbRKNS_24QueryPlanSelectionPolicyEb+0x2eda) [0xa9168a]
/usr/bin/mongod(_ZN5mongo27updateObjectsForReplicationEPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEbRKNS_24QueryPlanSelectionPolicyE+0xb7) [0xa93757]
/usr/bin/mongod(_ZN5mongo21applyOperation_inlockERKNS_7BSONObjEbb+0x600) [0xa6d7a0]
/usr/bin/mongod(_ZN5mongo7replset8SyncTail9syncApplyERKNS_7BSONObjEb+0x713) [0xc29713]
Thu Nov 14 07:44:15.912 [repl writer worker 1] ERROR: writer worker caught exception: assertion src/mongo/db/key.cpp:585 on: { ts: Timestamp 1384415055000|7, h: 5648881096906150481, v: 2, op: "i", ns: "production.messages", o: { _id: ObjectId('52847f4f3032ca9060ab9c95'), ... } }
Thu Nov 14 07:44:15.912 [repl writer worker 1] Fatal Assertion 16360
0xde05e1 0xda03d3 0xc28f3c 0xdadf21 0xe28e69 0x7f793cbb5e9a 0x7f793bec83fd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde05e1]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xda03d3]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12c) [0xc28f3c]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdadf21]
/usr/bin/mongod() [0xe28e69]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f793cbb5e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f793bec83fd]
Thu Nov 14 07:44:15.913 [repl writer worker 1] ***aborting after fassert() failure
Thu Nov 14 07:44:15.913 Got signal: 6 (Aborted).
Thu Nov 14 07:44:15.915 Backtrace:
0xde05e1 0x6d0559 0x7f793be0a4a0 0x7f793be0a425 0x7f793be0db8b 0xda040e 0xc28f3c 0xdadf21 0xe28e69 0x7f793cbb5e9a 0x7f793bec83fd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde05e1]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6d0559]
/lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f793be0a4a0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f793be0a425]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f793be0db8b]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0xda040e]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12c) [0xc28f3c]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdadf21]
/usr/bin/mongod() [0xe28e69]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f793cbb5e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f793bec83fd]

After that, mongod no longer starts; it crashes with the same assertion every time. The document contained an indexed array field in which one entry was "schöne", so it included a non-ASCII character. It is hard to say whether that is relevant. The other replica handled the document just fine. We have seen these replication errors every few months, previously with 2.2.x and now with 2.4.x.

It takes a week to resync a replica, so is it safe to manually insert that document (or an empty one with the same ts and h) into oplog.rs in the local database, so that the replica then continues replicating normally?
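To check whether the non-ASCII observation holds across the documents that trigger the crash, a small helper can flag array entries that contain non-ASCII characters and show their UTF-8 bytes. This is only a sketch for triage: the function name `non_ascii_entries` and the sample `tags` list are hypothetical, it does not talk to mongod, and it says nothing about whether such entries actually cause the assertion.

```python
def non_ascii_entries(values):
    """Return (value, utf8_bytes) pairs for string entries with non-ASCII characters."""
    flagged = []
    for value in values:
        # Non-string entries (numbers, nested documents, ...) are skipped.
        if isinstance(value, str) and not value.isascii():
            flagged.append((value, value.encode("utf-8")))
    return flagged


if __name__ == "__main__":
    # Hypothetical contents of the indexed array field; only "schöne" comes from the report.
    tags = ["nice", "schöne", 42]
    for value, raw in non_ascii_entries(tags):
        print(value, raw, len(raw))
```

Running it over the array field of each document that crashed the replica would show whether every failing document really carries such an entry, or whether that is a coincidence.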
devastor commented on Tue, 18 Mar 2014 13:56:54 +0000:
Hi Stephen, Yes, sorry for not updating the ticket. We ended up replacing the disks after the issue kept repeating, and we haven't seen it again since. So it does seem to have had something to do with the disks, although SMART, swraid, fsck, etc. didn't find anything and there was nothing in the logs or dmesg. Feel free to close the ticket.

stennie commented on Tue, 18 Mar 2014 04:33:35 +0000:
Hi Tuomas, Apologies for the delay in following up. Are you still seeing this issue, or were you able to resolve it? Thanks, Stephen

devastor commented on Mon, 18 Nov 2013 21:43:34 +0000:
There are no disk errors in syslog/dmesg, and SMART doesn't show any errors. The disks are SSDs, about 5 months old.

eliot commented on Mon, 18 Nov 2013 17:57:27 +0000:
Can you check the system logs for disk errors?

devastor commented on Fri, 15 Nov 2013 12:10:18 +0000:
We inserted that entry into the oplog and replication then continued successfully, but after a day the same mongod crashed again on another document (that one also contained ö and ü characters). So I guess the index there is just somehow corrupted.