BugZero | MongoDB BugID 110174 - Two background index builds in parallel with uniqu...

MongoDB - Defect ID: 110174

Two background index builds in parallel with unique key constraint violations crashes secondary

MongoDB - Defect ID: 110174

Two background index builds in parallel with unique key constraint violations crashes secondary

Last updated on July 11th, 2016

BugZero Risk Score
8.0 High

Overall: 8.0

Severity: 8.2

Community: 6.4

Lifecycle: 9.1

What is the BugZero Risk Score?

MongoDB Integration

Learn more about where this data comes from

MongoDB Integration

Learn more

Bug Scrub Advisor

Streamline upgrades with automated vendor bug scrubs

Bug Scrub Advisor

Learn more

BugZero Enterprise

Wish you caught this bug sooner? Get proactive today.

BugZero Enterprise

Learn more

Bug Details

Priority: Critical - P2
Status: Closed
Views: 1

Description

Info

Issue Status as of March 27, 2014 ISSUE SUMMARY If a background index build is already in progress, issuing the same index creation command will report success immediately. While this in itself is not a problem, if replication is active, the index creation command will be replicated causing possible problems on secondaries. This will become a problem if the index build ultimately has an error, due to a unique key constraint violation for example. Because of how replication works this can lead to secondary nodes shutting down and not recovering on a restart, since this is an unrecoverable data replication error which causes the server to to shutdown (Fatal Assertion 16360, exception during application of oplog operations on secondaries) for data integrity reasons. USER IMPACT Users are affected if they create the same background index multiple times concurrently, for example in environments with automated index creation. Due to the fact that the secondaries will shut down during an unrecoverable replication error (to protect data integrity and consistency), this can cause loss of majority and a primary step-down, rendering the replica set read-only. SOLUTION The fix is to only replicate index creation commands if they successfully build an index, and not in the case the index already exists. This will prevent secondaries from attempting an index build that could fail, and wasn't successful on the primary. WORKAROUNDS To avoid the issue, ensure that background indexes creation commands are only issued one at a time, for each index definition. AFFECTED VERSIONS All versions from 2.4.0 to 2.4.9 are affected. The 2.2 and 2.6 production release cycles are unaffected by this issue. PATCHES The fix is included in the production release version 2.4.10. Original Description If you attempt to build an index that is already in the process of being built (e.g. a background index), the attempt succeeds immediately and also is replicated immediately, even if the in-progress index build might eventually fail (due to, say, a unique constraint violation). This could result in SECONDARY nodes in a replica set attempting index builds that are destined to fail. Such SECONDARY nodes will shut down when the index build fails, since this is a data replication error that cannot be automatically resolved by the server. The code fix was to not replicate index builds in the case that such an index already exists. ---------------------------------------------------------------------------- Original description and example reproducer: On two-member replica set prepare collection like this: function prepare() { db.c.drop() for (var i=0; i<10000; i++) db.c.insert({x:i}) db.c.insert({x:0}) printjson(db.getLastError(2)) } Then start two threads in parallel each doing the following index creations: function index() { while (true) { db.c.ensureIndex({x:1}, {unique:true, background:true}) } } (I think the while loop isn't necessary as long as the two threads start at close enough to the same time.) Index build fails as expected on primary: Sun Feb 9 10:46:53.380 [conn7] background addExistingToIndex exception E11000 duplicate key error index: test.c.$x_1 dup key: { : null } Sun Feb 9 10:46:53.380 [conn7] end connection 127.0.0.1:52421 (0 connections now open) But secondary fasserts: Sun Feb 9 10:46:52.164 [repl writer worker 1] info: indexing in foreground on this replica; was a background index build on the primary Sun Feb 9 10:46:52.164 [repl writer worker 1] build index test.c { x: 1.0 } Sun Feb 9 10:46:52.182 [repl writer worker 1] ERROR: writer worker caught exception: E11000 duplicate key error index: test.c.$x_1 dup key: { : null } on: { ts: Timestamp 1391960812000|1778, h: 5159704\ 80723141670, v: 2, op: "i", ns: "test.system.indexes", o: { _id: ObjectId('52f7a2ec7ddfdca0480c6170'), ns: "test.c", key: { x: 1.0 }, name: "x_1", unique: true, background: true } } Sun Feb 9 10:46:52.183 [repl writer worker 1] Fatal Assertion 16360 0xde46e1 0xda44d3 0xc2ca7c 0xdb2021 0xe2cf69 0x30040079d1 0x30014e8b6d mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde46e1] mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xda44d3] mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12c) [0xc2ca7c] mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdb2021] mongod() [0xe2cf69] /lib64/libpthread.so.0() [0x30040079d1] /lib64/libc.so.6(clone+0x6d) [0x30014e8b6d] Sun Feb 9 10:46:52.184 [repl writer worker 1] It appears that an index creation oplog entry gets made in this case in spite of the index creation failure.

Top User Comments

xgen-internal-githook commented on Fri, 14 Mar 2014 14:49:44 +0000: Author: {u'username': u'erh', u'name': u'Eliot Horowitz', u'email': u'eliot@10gen.com'} Message: SERVER-12662: fix index name creation Branch: v2.4 https://github.com/mongodb/mongo/commit/317cb98a19baa2e9731c735ee223a746b60bb334 xgen-internal-githook commented on Fri, 14 Mar 2014 13:29:04 +0000: Author: {u'username': u'erh', u'name': u'Eliot Horowitz', u'email': u'eliot@10gen.com'} Message: SERVER-12662: Two background index builds in parallel with unique key constraint violations crashes secondary Branch: v2.4 https://github.com/mongodb/mongo/commit/8dd9cf06d7df65baf16eac7725e837c8303eef07 amalia.hawkins@10gen.com commented on Thu, 13 Mar 2014 17:51:49 +0000: Attached is a js script which replicates the problem. xgen-internal-githook commented on Thu, 13 Feb 2014 14:11:40 +0000: Author: {u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'} Message: Revert "SERVER-12662 do not logOp inserts that have no associated diskloc (such as index-already-exists)" This reverts commit 2f3bd6fc8f3b50bf4802f4477c55660355cff9fe. The commit does not work for _id index builds and will require a more comprehensive solution. Branch: v2.4 https://github.com/mongodb/mongo/commit/a003add92d538a4de60d64fa382376177149a0e7 milkie commented on Thu, 13 Feb 2014 13:25:10 +0000: Adding an _id index explicitly (on a collection that didn't have one already) doesn't work. Will fix this. xgen-internal-githook commented on Wed, 12 Feb 2014 22:28:27 +0000: Author: {u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'} Message: SERVER-12662 do not logOp inserts that have no associated diskloc (such as index-already-exists) Branch: v2.4 https://github.com/mongodb/mongo/commit/2f3bd6fc8f3b50bf4802f4477c55660355cff9fe milkie commented on Wed, 12 Feb 2014 17:34:18 +0000: I diagnosed the issue and have a proposed fix for 2.4 branch. I think the problem should already not occur in the master branch; can you check? For now I am targeting this only at 2.4.10. The issue is that if the index already exists, we still log the index-create in the oplog. In 2.6.0rc0 this behavior is changed such that nothing is logged if the index already exists.

Steps to Reproduce

Relevant Products

Click on a version to see all relevant bugs

Affected versions:2.4.9

Fixed versions: 2.4.10

Relevant Products

Click on a version to see all relevant bugs

Affected versions:2.4.9

Fixed versions: 2.4.10

Top MongoDB Defects

7.0Defect ID: 3266121
Error message when adding a node to the shard cluster: "{"error":{"code":26,"codeName":"NamespaceNotFound","errmsg":"Unable to retrieve storageStats in $collStats stage :: caused by :: Collection [local.oplog.rs] not found."},"$db":"local"}}}"
5.5Defect ID: 3269537
Remove unnecessary balancer configuration refresh that may crash the process during addShard
5.5Defect ID: 3277022
CollectionCatalog can't replace a collection with a view in the same WriteUnitOfWork
5.4Defect ID: 3263496
Fix legacy timeseries namespace translation in findAndModify explain command
5.4Defect ID: 3287683
[v6.0] Ensure update fails on primary when doc_diff::computeDiff() produces empty sub-diffs

Ready to prevent the next vendor outage?

Get a demo

MongoDB - Defect ID: 110174

Two background index builds in parallel with unique key constraint violations crashes secondary

MongoDB - Defect ID: 110174

Two background index builds in parallel with unique key constraint violations crashes secondary

Last updated on July 11th, 2016

BugZero Risk Score8.0 High

Bug Details

Info

Top User Comments

Steps to Reproduce

Top MongoDB Defects

Ready to prevent the next vendor outage?

Links

BugZero Risk Score
8.0 High