
OPERATIONAL DEFECT DATABASE
...

...
We had an issue where a query on a collection containing a wildcard index seemingly randomly returned zero results, even though matching objects existed. The issue first arose one days after introducing wildcard indexes, so a causal relation seemed likely. The problem seems to arise if a first query successfully filters for a non-null value using the wildcard index... filter: { value1: 1.0, value2: 1.0, value3: 1.0, deleted: null } planSummary: IXSCAN { $_path: 1, value3: 1 } And then a second nearly identical query filters for null on the same field... filter: { value1: 1.0, value2: 1.0, value3: null, deleted: null } planSummary: IXSCAN { $_path: 1, value3: 1 } Seemingly, the query planner simply reuses the plan from the first query, not realizing that the wildcard index is sparse by definition.
xgen-internal-githook commented on Tue, 16 Jun 2020 22:19:59 +0000: Author: {'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'} Message: SERVER-48614 Fix plan cache discriminators for partial wildcard indexes Previously, discrimination based on the partial filter expression was done for all paths included in the wildcard projection. This could lead to a situation where two queries were erroneously assigned the same plan cache key. The fix is to ensure that for wildcard indexes, partial index discriminators are instead registered only for those paths mentioned in the partial filter expression. Unlike other kinds of wildcard index discriminators (e.g. handling concerns of null equality or collation), the paths in the partial filter expression are known a priori. Therefore, discrimination based on the partial filter can be done in the same way for wildcard and non-wildcard indexes. (cherry picked from commit 57edf434219c9659354f01fb6bf4f4e6c0370bc4) Branch: v4.2 https://github.com/mongodb/mongo/commit/3d7175997102253e0bf2640ec95657371dd65041 xgen-internal-githook commented on Tue, 16 Jun 2020 18:29:53 +0000: Author: {'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'} Message: SERVER-48614 Fix plan cache discriminators for partial wildcard indexes Previously, discrimination based on the partial filter expression was done for all paths included in the wildcard projection. This could lead to a situation where two queries were erroneously assigned the same plan cache key. The fix is to ensure that for wildcard indexes, partial index discriminators are instead registered only for those paths mentioned in the partial filter expression. Unlike other kinds of wildcard index discriminators (e.g. handling concerns of null equality or collation), the paths in the partial filter expression are known a priori. Therefore, discrimination based on the partial filter can be done in the same way for wildcard and non-wildcard indexes. (cherry picked from commit 80f424c02df47469792917673ab7e6dd77b01421) Branch: v4.4 https://github.com/mongodb/mongo/commit/57edf434219c9659354f01fb6bf4f4e6c0370bc4 xgen-internal-githook commented on Mon, 15 Jun 2020 21:44:49 +0000: Author: {'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'} Message: SERVER-48614 Fix plan cache discriminators for partial wildcard indexes Previously, discrimination based on the partial filter expression was done for all paths included in the wildcard projection. This could lead to a situation where two queries were erroneously assigned the same plan cache key. The fix is to ensure that for wildcard indexes, partial index discriminators are instead registered only for those paths mentioned in the partial filter expression. Unlike other kinds of wildcard index discriminators (e.g. handling concerns of null equality or collation), the paths in the partial filter expression are known a priori. Therefore, discrimination based on the partial filter can be done in the same way for wildcard and non-wildcard indexes. Branch: master https://github.com/mongodb/mongo/commit/80f424c02df47469792917673ab7e6dd77b01421 david.storch commented on Wed, 10 Jun 2020 19:54:08 +0000: ralf.strobel one other note: I filed related ticket SERVER-48700 which you might be interested in. This tracks some ideas for how to increase our test coverage for the plan cache in order to help avoid future plan cache-related query correctness issues such as this one. Feel free to follow and/or vote for SERVER-48700 if you are interested. Also, the code to fix this bug is now being reviewed internally by the Server Query Team. Stay tuned! david.storch commented on Wed, 10 Jun 2020 19:14:08 +0000: ralf.strobel I have a fix in progress that should be suitable for backport to the 4.4 and 4.2 branches. I'm checking with our release team regarding timing of the 4.2.x release into which the fix is likely to be included. ralf.strobel commented on Wed, 10 Jun 2020 06:27:19 +0000: Thank you for the quick and thorough analysis! Can you already estimate how long a fix in 4.2.x might be out? Because wildcard indexes with partialFilterExpressions are currently in place essentially in all of our collections in all of our customers. Given that we only had one bug report so far, it probably only affects rare edge cases, but it still doesn't feel great to know there might be more cases like that lingering somewhere. david.storch commented on Tue, 9 Jun 2020 22:04:56 +0000: Here's an even simpler version of the repro. Notably, this repro shows that the problem can occur when the only secondary index in the collection is a wildcard index with a partialFilterExpression: const coll = db.c; coll.drop(); assert.commandWorked(coll.createIndex({"$**": 1}, { partialFilterExpression: {value1: 1}, })); assert.commandWorked(coll.insert({value1: 1})); const query1 = { value1: 1, value2: 1, }; // does not match the document const query2 = { value1: 1, value2: null, }; // matches the document // Create an active plan cache entry by running 'query1' twice. assert.eq(0, coll.find(query1).itcount()); assert.eq(0, coll.find(query1).itcount()); // This query will incorrectly use the cached plan. assert.eq(1, coll.find(query2).itcount()); david.storch commented on Tue, 9 Jun 2020 21:29:27 +0000: As suspected, this is a bug with the plan cache indexability discriminators. I added some logging which dumps the complete value of the plan cache key, including both the "shape string" and indexability discriminator components. For both query1 and query2 from the repro above, the plan cache key looks like this: MongoDFixture:job0] {"t":{"$date":"2020-06-09T17:21:12.441-04:00"},"s":"I", "c":"QUERY", "id":4861400, "ctx":"conn5","msg":"[storchprint] full plan cache key","attr":{"shapeStr":"an[eqvalue1,eqvalue2]","indexabilityStr":""}} The indexability string has two sections of discriminator bits, one for each predicate. The "" component indicates that the value1 predicate can be assigned to either index, {value1: 1} or the wildcard index. The "" applies to the subsequent predicate on value2. Since there is no non-wildcard index created explicitly over the path value2, there is just one discriminator bit for the wildcard index. For both queries, the "" indicates that the value2 predicate cannot be answered by the wildcard index. In the $eq:null case, this is due to the implicit sparseness property of wildcard indexes. In the case of the query with {value2: {$eq: 1}}, the situation is more subtle. Here, the "" discriminator is due to the partialFilterExpression, since the documents that match {value2: {$eq: 1}} are not provably a subset of the partialFilterExpression. This leads to a scenario where the two queries share a plan cache key, even though they have different indexability properties. Specifically, query1 cannot use the wildcard index to answer the predicate on value2, but query2 can use the wildcard index to answer the predicate on value2. david.storch commented on Tue, 9 Jun 2020 20:25:32 +0000: I've created a slightly simpler version of the attached repro script: const collection = db.c; collection.drop(); collection.createIndex({value1: 1}); collection.createIndex({"$**": 1}, { // The partial filter expression seems to be necessary for the problem to occur. // Note that all queries contain a corresponding filter, or the database would never consider // this index. partialFilterExpression: {value1: 1}, }); collection.insert({value1: 1}); // Note that both queries could use the wildcard index to match "value2", // but only query2 should be able to match "value3" since the expected value is non-null. // Both queries could also use dedicated "value1" index, but only query1 decides to do so. const query1 = { value1: 1, value2: null, }; // matches the document const query2 = { value1: 1, value2: 1, }; // does not match the document // Execute query1 for the first time, it finds the document (via the dedicated "value1" index). // filter: { value1: 1.0, value2: 1.0, value3: null, deleted: null } // planSummary: IXSCAN { value1: 1 } keysExamined:1 docsExamined:1 fromMultiPlanner:1 // cursorExhausted:1 numYields:0 nreturned:1 queryHash:DE2DA7E4 planCacheKey:F2E93F9D assert.eq(1, collection.find(query1).itcount()); // Then execute query2, which correctly does not find the document (using the wildcard index to // match non-null "value3"). filter: { value1: 1.0, value2: 1.0, value3: 1.0, deleted: null } // planSummary: IXSCAN { $_path: 1, value3: 1 } keysExamined:0 docsExamined:0 fromMultiPlanner:1 // cursorExhausted:1 numYields:0 nreturned:0 queryHash:DE2DA7E4 planCacheKey:F2E93F9D assert.eq(0, collection.find(query2).itcount()); // Now execute query1 again. Suddenly it also tries to match "value3" via the wildcard index. // But since we are looking for null, it cannot find an entry in the sparse index. filter: { // value1: 1.0, value2: 1.0, value3: null, deleted: null } planSummary: IXSCAN { $_path: 1, // value3: 1 } keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0 // queryHash:DE2DA7E4 planCacheKey:F2E93F9D assert.eq(1, collection.find(query1).itcount()); This looks likely to be a bug with the plan caching system. thomas.schubert commented on Mon, 8 Jun 2020 20:08:15 +0000: Thanks for the detailed report and clear repro script. We can reproduce and are investigating.
Run attached reproduction script. Note that the script loops over the relevant steps until the error occurs, since the problem does not seem to be entirely deterministic but may have some sort of timing sensitivity. Also note the rather specific setup steps. We could only reproduce the issue if the wildcard index is defined with a partialFilterExpression and when a second non-wildcard index is present. But we are not sure if either of these aspects are causally relevant.
Click on a version to see all relevant bugs
MongoDB Integration
Learn more about where this data comes from
Bug Scrub Advisor
Streamline upgrades with automated vendor bug scrubs
BugZero Enterprise
Wish you caught this bug sooner? Get proactive today.