...
Possible bug with secondary read preference and causal consistency on. The reindex_background.js workload was failing with an "IndexNotFound" error somewhat frequently in the concurrency_sharded_causal_consistency(_and_balancer) suites once I turned secondary read preference on. Example stack trace:

[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 2017-08-31T22:33:38.519+0000 E QUERY [thread1] Error: 3 threads threw
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 Foreground jstests/concurrency/fsm_workloads/reindex_background.js
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 Error: error: {
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "ok" : 0,
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "errmsg" : "text index required for $text query",
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "code" : 27,
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "codeName" : "IndexNotFound",
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "$clusterTime" : {
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "clusterTime" : Timestamp(1504218621, 2),
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "signature" : {
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "hash" : BinData(0,"zON3is6ERc9naFc6aUw6RnuPsrU="),
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "keyId" : NumberLong("6460566295715774465")
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 }
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 },
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "operationTime" : Timestamp(1504218621, 2)
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 }
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 _getErrorWithCode@src/mongo/shell/utils.js:25:13
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 DBCommandCursor@src/mongo/shell/query.js:702:1
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 DBQuery.prototype._exec@src/mongo/shell/query.js:117:28
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 DBQuery.prototype.hasNext@src/mongo/shell/query.js:288:5
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 DBQuery.prototype.itcount@src/mongo/shell/query.js:415:12
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 query/<@jstests/concurrency/fsm_workloads/reindex.js:76:25
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 assertWithLevel/assertWithLevel@jstests/concurrency/fsm_libs/assert.js:75:1
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 query@jstests/concurrency/fsm_workloads/reindex.js:69:13
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 runFSM@jstests/concurrency/fsm_libs/fsm.js:37:13
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 @ line 6 > eval:10:9
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 main@jstests/concurrency/fsm_libs/worker_thread.js:135:17
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 @ line 6 > eval:7:1
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 @ line 6 > eval:5:24
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 _threadStartWrapper@:24:16
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 :
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 throwError@jstests/concurrency/fsm_libs/runner.js:344:23
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 runWorkloads@jstests/concurrency/fsm_libs/runner.js:739:17
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 serial@jstests/concurrency/fsm_libs/runner.js:752:1
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 @jstests\concurrency\fsm_all_sharded_causal_consistency_and_balancer.js:105:1
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 failed to load: jstests\concurrency\fsm_all_sharded_causal_consistency_and_balancer.

Evergreen link: https://evergreen.mongodb.com/task/mongodb_mongo_master_windows_64_2k8_debug_concurrency_sharded_causal_consistency_and_balancer_WT_patch_c351caa6815218c5b4a9801342ccbb1b050f6aea_59a87624e3c3314259001bc1_17_08_31_20_58_16
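For reference, a minimal sketch of the failing pattern (not the actual workload code): a causally consistent shell session against the sharded cluster, a background text index build, and a $text read routed to a secondary. The collection name, index spec, and search term below are illustrative.

    // Hypothetical reproduction sketch; assumes a shell connected to the mongos
    // of a sharded cluster.
    const session = db.getMongo().startSession({causalConsistency: true});
    const coll = session.getDatabase("test").getCollection("reindex_background");

    // The background build is logged to the primary's oplog as a single
    // createIndexes entry; secondaries start their own builds when they apply it.
    assert.commandWorked(coll.createIndex({text: "text"}, {background: true}));

    // A causally consistent read targeting a secondary only waits for that
    // createIndexes oplog entry, not for the secondary's build to finish, so it
    // can fail with IndexNotFound ("text index required for $text query").
    coll.find({$text: {$search: "ipsum"}}).readPref("secondary").itcount();

Whether the read succeeds depends only on whether the secondary's background build is still running when the find arrives, which is why the failure is intermittent.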
spencer commented on Tue, 5 Dec 2017 20:18:58 +0000:
Agreed with Will's summary. I don't think there's anything replication can do about this at the moment. Sounds like this likely depends on PM-253 (index builds with two-phase commit), which is on the storage backlog. Sending back to Sharding.

william.schultz commented on Tue, 5 Dec 2017 20:02:44 +0000:
misha.tyulenev, to your comment, this seems like a fundamental incompatibility between replicated index builds and causal consistency, at least as the system stands today. As far as I understand, the only operation about an index build that we actually log in the oplog is the createIndex op. If the index creation is synchronous (i.e. background: false), the client can explicitly wait until the build has finished and then, presumably, use the operationTime of the index build operation to read after. If the build is asynchronous (i.e. background: true), the only information the client receives about that index's lifecycle is the optime of the createIndex operation. The operations carried out internally to build the physical index are not explicitly "replicated", since they are not logged in the oplog, and there is no way, from the oplog's perspective, to determine when a background index build finished. So there appears to be no existing way for a client to check for the completion of a background index build using the causal consistency machinery alone; the most a client can do is issue a read that is guaranteed to happen after the start of the index build, since that is the only operation time that exists in relation to the lifetime of a replicated index. Let me know if I am misunderstanding any of the fundamental concepts here. There doesn't seem to be a fix for this specific issue unless the behavior of replicated index builds is significantly changed.

benety.goh commented on Tue, 31 Oct 2017 20:00:26 +0000:
Assigning to william.schultz to get a minimal reproduction of the issue.

misha.tyulenev commented on Fri, 27 Oct 2017 19:01:26 +0000:
max.hirschhorn, until this issue is resolved, all tests that involve background index building are not causally consistent.

misha.tyulenev commented on Fri, 27 Oct 2017 18:59:00 +0000:
The issue is caused by how the ensureIndex({background: true}) command is replicated:
1. On the primary, the command is logged to the oplog when it finishes.
2. When that entry is replicated to the secondary, the secondary starts its own background build.
Hence the afterClusterTime in the find command cannot wait until index creation has finished. Over to the replication team to triage. Please file a DOCS ticket to document this if it is not being fixed in 3.6.

esha.maharishi@10gen.com commented on Fri, 20 Oct 2017 19:14:15 +0000:
Could this be related to BF-6752, where there is a race between migrations and createIndexes that causes an index not to be created on a recipient shard?

max.hirschhorn@10gen.com commented on Mon, 9 Oct 2017 01:34:57 +0000:
misha.tyulenev, the failures in the reindex_background.js FSM workload are not caused by the issue described in SERVER-31456, as the reindex_background.js FSM workload has a no-op $config.setup() function and creates/drops indexes as part of its $config.states functions.
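As Will notes above, the causal consistency machinery can only order a read after the createIndexes oplog entry, never after the completion of a secondary's background build. Until index builds get a replicated commit point (the PM-253 work spencer mentions), the only reliable check is to ask the secondary directly. A rough sketch of such a check, assuming a direct shell connection to the secondary; the host, database, and collection names are illustrative:

    // Connect straight to the secondary and allow reads on it.
    const secondary = new Mongo("localhost:27018");
    secondary.setSlaveOk();
    const secondaryDB = secondary.getDB("test");

    // Poll listIndexes until the background build has actually produced the text
    // index on this member; afterClusterTime alone cannot provide this guarantee.
    assert.soon(function() {
        const res = secondaryDB.runCommand({listIndexes: "reindex_background"});
        return res.ok === 1 &&
            res.cursor.firstBatch.some(function(idx) {
                return idx.key.hasOwnProperty("_fts");  // text indexes key on _fts
            });
    }, "text index never finished building on the secondary");

In a sharded cluster the same polling would have to be repeated against a secondary of every shard that owns chunks for the collection before a secondary $text read could be considered safe.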