BugZero | MongoDB BugID 2125947 - COUNT_SCAN plan not selected when unfiltered $grou...

MongoDB - Defect ID: 2125947

COUNT_SCAN plan not selected when unfiltered $group + $sum on _id performed

MongoDB - Defect ID: 2125947

COUNT_SCAN plan not selected when unfiltered $group + $sum on _id performed

Last updated on March 13th, 2025

BugZero Risk Score
6.0 Medium

Overall: 6.0

Severity: 6.4

Community: 8.1

Lifecycle: 6.4

What is the BugZero Risk Score?

MongoDB Integration

Learn more about where this data comes from

MongoDB Integration

Learn more

Bug Scrub Advisor

Streamline upgrades with automated vendor bug scrubs

Bug Scrub Advisor

Learn more

BugZero Enterprise

Wish you caught this bug sooner? Get proactive today.

BugZero Enterprise

Learn more

Bug Details

Priority: Major - P3
Status: Backlog

Description

Info

As of DRIVERS-501 the driver CRUD specification defines a pipeline to be used to count documents without resorting to the count command (due to potential inaccurate counts without query predicates). db.foo.drop(); db.foo.insertMany([{},{},{},{},{}]); // Driver CRUD spec definition for countDocuments //////////////////// filter = {} skip = null limit = null pipeline = [{'$match': filter}] if (skip) { pipeline.push({'$skip': skip}) } if (limit) { pipeline.push({'$limit': limit}) } pipeline.push({'$group': {'_id': 1, 'n': {'$sum': 1}}}) ////////////////////////////////////////////////////////////////////// db.foo.explain("executionStats").aggregate(pipeline) Without specifying a filter the empty $match stage would result in a COLLSCAN plan winning, though a COUNT_SCAN should be appropriate. /* "nReturned": 5, "executionTimeMillis": 0, "totalKeysExamined": 0, "totalDocsExamined": 5, "executionStages": { "stage": "COLLSCAN", */ This could be worked around by unshifting a {$sort: { _id: 1 }} stage to the pipeline, however after discussing with david.storch@mongodb.com this may be addressable server-side.

Top User Comments

JIRAUSER1273272 commented on Tue, 15 Aug 2023 11:58:39 +0000: Marking this as blocked by SERVER-23406 as the current performance numbers don't allow this patch to be merged as long as the index scans are significantly slower than a col scan. JIRAUSER1255446 commented on Mon, 7 Aug 2023 11:21:21 +0000: Local benchmark shows that for collections with a relatively small number of fields the collection scan is more efficient than the COUNT_SCAN over the _id index. Bellow are results for collections of size 10000, 100000, and 1 000 000. The documents have 6 fields {_id: i, a: i * 5, b: 100, c: 1000, d: "xxx", e: "abc".repeat(100) }. Each test was run 7 times and the first and last measurements were removed when computing the average. The following results were obtained when running with the SBE engine. Running for collection size 10000 Testing scan for hint { "$natural" : 1 } Average execution time (ms): 3 Median execution time (ms): 3 Testing scan for hint { "_id" : 1 } Average execution time (ms): 5 Median execution time (ms): 5 Testing scan for hint { "a" : 1 } Average execution time (ms): 4 Median execution time (ms): 4 Running for collection size 100000 Testing scan for hint { "$natural" : 1 } Average execution time (ms): 29.6 Median execution time (ms): 30 Testing scan for hint { "_id" : 1 } Average execution time (ms): 47 Median execution time (ms): 47 Testing scan for hint { "a" : 1 } Average execution time (ms): 42 Median execution time (ms): 42 Running for collection size 1000000 Testing scan for hint { "$natural" : 1 } Average execution time (ms): 304.6 Median execution time (ms): 305 Testing scan for hint { "_id" : 1 } Average execution time (ms): 470.6 Median execution time (ms): 468 Testing scan for hint { "a" : 1 } Average execution time (ms): 415.2 Median execution time (ms): 415 Similar results for collections with 3 fields, one idea faster over the large collection sizes. charlie.swanson commented on Tue, 6 Sep 2022 16:34:41 +0000: This does seem to be pretty easily addressed. This is a pretty hack-y patch I drummed up which seems to "fix" it and prefer a full COUNT_SCAN of the "_id" index over a COLLSCAN: Unable to find source-code formatter for language: diff. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml diff --git a/src/mongo/db/query/query_planner.cpp b/src/mongo/db/query/query_planner.cpp index ba398f446cb..4e23e3aa435 100644 --- a/src/mongo/db/query/query_planner.cpp +++ b/src/mongo/db/query/query_planner.cpp @@ -1494,6 +1494,17 @@ StatusWith>> QueryPlanner::plan( } } + // If there are no indexed alternatives, but this is a count request without a query predicate, + // we can use a COUNT_SCAN of the entire _id index (if it exists). + if (out.empty() && (params.options & QueryPlannerParams::IS_COUNT) && + query.getQueryObj().isEmpty()) { + for (auto&& index : params.indices) { + if (isIdIndex(index.keyPattern)) { + out.push_back(buildWholeIXSoln(index, query, params)); + } + } + } + // Check whether we're eligible to use the columnar index, assuming no other indexes can be // used. if (out.empty()) { I'll nominate this for a quick win. It won't be super easy since I believe we'll want to benchmark it and possibly consider whether a non-id index should be eligible? Only non-partial, non-multi key indexes would work. I'm not sure how we'd effectively choose amongst them though, so the "_id" index may be the sensible default. charlie.swanson commented on Tue, 6 Sep 2022 16:31:19 +0000: Oops! Disregard my last comment. I confused myself by looking at the example in that linked ticket, which does not include a filter or a field path access: db.items.aggregate([ {$group: {_id: category, count: {$sum: 1}}}]) It was later clarified that this was a typo though and "category" is meant to be a field path reference: db.items.aggregate([ {$group: {_id: "$category", count: {$sum: 1}}}]) That's a pretty different query than the pure count in this ticket, so I will re-open and undo the duplication. I'll also fix up the example on that ticket charlie.swanson commented on Tue, 6 Sep 2022 16:12:58 +0000: Thanks christopher.harris@mongodb.com and nicholas.zolnierz@mongodb.com. This does appear to be a duplicate of SERVER-29444. The empty predicate is the same as no $match, and the two are both impacted by this bug of not generating an entire-index COUNT_SCAN. The ticket christopher.harris@mongodb.com linked is worth keeping in mind for any change like this, but I think described a filtered COUNT, which makes it eligible for index scans.

Steps to Reproduce

Change history

No changes to display

Links

Relevant Products

Click on a version to see all relevant bugs

Affected versions:No known affected versions

Fixed versions: No known fixed versions

Relevant Products

Click on a version to see all relevant bugs

Affected versions:No known affected versions

Fixed versions: No known fixed versions

Top MongoDB Defects by Risk Score

5.9Defect ID: 2956672
Some time-series tests implicitly rely on measurement insertion order for unordered inserts when checking bucket catalog stats
6.14Defect ID: 2965528
Remove push, publish_packages, and crypt_push tasks from Graviton 4 variants in v7.0 and v8.0
6.14Defect ID: 2947969
[SBE] Release storage engine resources when saveState() or restoreState() throws
5.68Defect ID: 2919474
StackLocator broken by v5 toolchain ASAN
5.88Defect ID: 2968769
Make new write path helper functions use acquireAndValidateBucketsCollection instead of acquireCollection

MongoDB Integration

Learn more about where this data comes from

MongoDB Integration

Learn more

Bug Scrub Advisor

Streamline upgrades with automated vendor bug scrubs

Bug Scrub Advisor

Learn more

BugZero Enterprise

Wish you caught this bug sooner? Get proactive today.

BugZero Enterprise

Learn more

Ready to prevent the next vendor outage?

Get a demo

OPERATIONAL DEFECT DATABASE

MongoDB - Defect ID: 2125947

COUNT_SCAN plan not selected when unfiltered $group + $sum on _id performed

MongoDB - Defect ID: 2125947

COUNT_SCAN plan not selected when unfiltered $group + $sum on _id performed

Last updated on March 13th, 2025

BugZero Risk Score6.0 Medium

Bug Details

Info

Top User Comments

Steps to Reproduce

Links

Top MongoDB Defects by Risk Score

Ready to prevent the next vendor outage?

BugZero Risk Score
6.0 Medium