...
Possible bug with secondary read preference and causal consistency on. The reindex_background.js workload was failing with an "IndexNotFound" error somewhat frequently in the concurrency_sharded_causal_consistency(_and_balancer) suites once I turned secondary read preference on. Example stack trace:

[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 2017-08-31T22:33:38.519+0000 E QUERY [thread1] Error: 3 threads threw
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 Foreground jstests/concurrency/fsm_workloads/reindex_background.js
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 Error: error: {
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "ok" : 0,
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "errmsg" : "text index required for $text query",
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "code" : 27,
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "codeName" : "IndexNotFound",
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "$clusterTime" : {
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "clusterTime" : Timestamp(1504218621, 2),
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "signature" : {
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "hash" : BinData(0,"zON3is6ERc9naFc6aUw6RnuPsrU="),
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "keyId" : NumberLong("6460566295715774465")
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 }
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 },
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 "operationTime" : Timestamp(1504218621, 2)
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 }
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 _getErrorWithCode@src/mongo/shell/utils.js:25:13
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 DBCommandCursor@src/mongo/shell/query.js:702:1
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 DBQuery.prototype._exec@src/mongo/shell/query.js:117:28
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 DBQuery.prototype.hasNext@src/mongo/shell/query.js:288:5
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.519+0000 DBQuery.prototype.itcount@src/mongo/shell/query.js:415:12
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 query/<@jstests/concurrency/fsm_workloads/reindex.js:76:25
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 assertWithLevel/assertWithLevel@jstests/concurrency/fsm_libs/assert.js:75:1
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 query@jstests/concurrency/fsm_workloads/reindex.js:69:13
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 runFSM@jstests/concurrency/fsm_libs/fsm.js:37:13
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 @ line 6 > eval:10:9
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 main@jstests/concurrency/fsm_libs/worker_thread.js:135:17
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 @ line 6 > eval:7:1
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 @ line 6 > eval:5:24
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 _threadStartWrapper@:24:16
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 :
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 throwError@jstests/concurrency/fsm_libs/runner.js:344:23
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 runWorkloads@jstests/concurrency/fsm_libs/runner.js:739:17
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 serial@jstests/concurrency/fsm_libs/runner.js:752:1
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 @jstests\concurrency\fsm_all_sharded_causal_consistency_and_balancer.js:105:1
[js_test:fsm_all_sharded_causal_consistency_and_balancer] 2017-08-31T22:33:38.520+0000 failed to load: jstests\concurrency\fsm_all_sharded_causal_consistency_and_balancer.

Evergreen link: https://evergreen.mongodb.com/task/mongodb_mongo_master_windows_64_2k8_debug_concurrency_sharded_causal_consistency_and_balancer_WT_patch_c351caa6815218c5b4a9801342ccbb1b050f6aea_59a87624e3c3314259001bc1_17_08_31_20_58_16
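For reference, a minimal sketch of the failing pattern (not the actual workload code): a causally consistent shell session against the sharded cluster, a background text index build, and a $text read routed to a secondary. The collection name, index spec, and search term below are illustrative.

    // Hypothetical reproduction sketch; assumes a shell connected to the mongos
    // of a sharded cluster.
    const session = db.getMongo().startSession({causalConsistency: true});
    const coll = session.getDatabase("test").getCollection("reindex_background");

    // The background build is logged to the primary's oplog as a single
    // createIndexes entry; secondaries start their own builds when they apply it.
    assert.commandWorked(coll.createIndex({text: "text"}, {background: true}));

    // A causally consistent read targeting a secondary only waits for that
    // createIndexes oplog entry, not for the secondary's build to finish, so it
    // can fail with IndexNotFound ("text index required for $text query").
    coll.find({$text: {$search: "ipsum"}}).readPref("secondary").itcount();

Whether the read succeeds depends only on whether the secondary's background build is still running when the find arrives, which is why the failure is intermittent.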
spencer commented on Tue, 5 Dec 2017 20:18:58 +0000:
Agreed with Will's summary. I don't think there's anything replication can do about this at the moment. Sounds like this likely depends on PM-253 (index builds with two-phase commit), which is on the storage backlog. Sending back to Sharding.

william.schultz commented on Tue, 5 Dec 2017 20:02:44 +0000:
misha.tyulenev, to your comment, this seems like a fundamental incompatibility between replicated index builds and causal consistency, at least as the system stands today. As far as I understand, the only operation about an index build that we actually log in the oplog is the createIndex op. If the index creation is synchronous (i.e. background: false), the client can explicitly wait until the build has finished and then, presumably, use the operationTime of the index build operation to read after. If the build is asynchronous (i.e. background: true), the only information the client receives about that index's lifecycle is the optime of the createIndex operation. The operations carried out internally to build the physical index are not explicitly "replicated", since they are not logged in the oplog, and there is no way, from the oplog's perspective, to determine when a background index build finished. So there appears to be no existing way for a client to check for the completion of a background index build using the causal consistency machinery alone; the most a client can do is issue a read that is guaranteed to happen after the start of the index build, since that is the only operation time that exists in relation to the lifetime of a replicated index. Let me know if I am misunderstanding any of the fundamental concepts here. There doesn't seem to be a fix for this specific issue unless the behavior of replicated index builds is significantly changed.

benety.goh commented on Tue, 31 Oct 2017 20:00:26 +0000:
Assigning to william.schultz to get a minimal reproduction of the issue.

misha.tyulenev commented on Fri, 27 Oct 2017 19:01:26 +0000:
max.hirschhorn, until this issue is resolved, all tests that involve background index building are not causally consistent.

misha.tyulenev commented on Fri, 27 Oct 2017 18:59:00 +0000:
The issue is caused by how the ensureIndex({background: true}) command is replicated:
1. On the primary, the command is logged to the oplog when it finishes.
2. When that entry is replicated to the secondary, the secondary starts its own background build.
Hence the afterClusterTime in the find command cannot wait until index creation has finished. Over to the replication team to triage. Please file a DOCS ticket to document this if it is not being fixed in 3.6.

esha.maharishi@10gen.com commented on Fri, 20 Oct 2017 19:14:15 +0000:
Could this be related to BF-6752, where there is a race between migrations and createIndexes that causes an index not to be created on a recipient shard?

max.hirschhorn@10gen.com commented on Mon, 9 Oct 2017 01:34:57 +0000:
misha.tyulenev, the failures in the reindex_background.js FSM workload are not caused by the issue described in SERVER-31456, as the reindex_background.js FSM workload has a no-op $config.setup() function and creates/drops indexes as part of its $config.states functions.
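As Will notes above, the causal consistency machinery can only order a read after the createIndexes oplog entry, never after the completion of a secondary's background build. Until index builds get a replicated commit point (the PM-253 work spencer mentions), the only reliable check is to ask the secondary directly. A rough sketch of such a check, assuming a direct shell connection to the secondary; the host, database, and collection names are illustrative:

    // Connect straight to the secondary and allow reads on it.
    const secondary = new Mongo("localhost:27018");
    secondary.setSlaveOk();
    const secondaryDB = secondary.getDB("test");

    // Poll listIndexes until the background build has actually produced the text
    // index on this member; afterClusterTime alone cannot provide this guarantee.
    assert.soon(function() {
        const res = secondaryDB.runCommand({listIndexes: "reindex_background"});
        return res.ok === 1 &&
            res.cursor.firstBatch.some(function(idx) {
                return idx.key.hasOwnProperty("_fts");  // text indexes key on _fts
            });
    }, "text index never finished building on the secondary");

In a sharded cluster the same polling would have to be repeated against a secondary of every shard that owns chunks for the collection before a secondary $text read could be considered safe.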