...
A while (usually several days) after starting up the database (a sharded cluster), newly created time series collections end up with exactly as many buckets as documents (as far as I have personally observed), regardless of granularity or metaField settings. The stats() command shows that the buckets were closed due to cache pressure (numBucketsClosedDueToCachePressure). I also hit the issue after deleting a time series collection and then inserting either the same data or other collections. This behavior has also been documented/observed in:
https://www.mongodb.com/community/forums/t/suboptimal-bucket-creation-in-timeseries-collection-due-to-cache-pressure/249863
https://dba.stackexchange.com/questions/336013/mongodb-timeseries-data-at-high-load-causes-cache-pressure-in-time
https://www.mongodb.com/community/forums/t/why-timeseries-bucket-count-is-different-on-2-independent-environment-with-same-data/265066/
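For reference, a minimal mongosh sketch for checking this symptom (the collection name "readings" is a placeholder, not taken from the report; the exact layout of the timeseries section of collStats may vary by server version):

// Compare bucket count to measurement count and check why buckets were closed.
const ts = db.readings.stats().timeseries;
print("documents: " + db.readings.countDocuments({}));
print("buckets:   " + ts.bucketCount);
print("closed due to cache pressure: " + ts.numBucketsClosedDueToCachePressure);
// In the degraded state described above, bucketCount approaches the document count
// and numBucketsClosedDueToCachePressure keeps growing with every insert batch.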
xgen-internal-githook commented on Thu, 14 Nov 2024 19:09:08 +0000:
Author: {'name': 'henrikedin', 'email': 'henrik.edin@mongodb.com', 'username': 'henrikedin'}
Message: SERVER-89233 Track cardinality per-collection for time-series (#27028) (#27817)
GitOrigin-RevId: 827eab4828967a9b7a1d4c36f82251c44ff5f659
Branch: v8.0
https://github.com/mongodb/mongo/commit/c70c2202139416d6c87b63c956455faeb4d3294a

xgen-internal-githook commented on Thu, 3 Oct 2024 13:21:33 +0000:
Author: {'name': 'henrikedin', 'email': 'henrik.edin@mongodb.com', 'username': 'henrikedin'}
Message: SERVER-89233 Track cardinality per-collection for time-series (#27028)
GitOrigin-RevId: 3648368d1823dfa5594b50c4e989dc3d995717f8
Branch: master
https://github.com/mongodb/mongo/commit/46e8db22d75751f6469d95f87a5ad700f51264d6

JIRAUSER1279104 commented on Fri, 21 Jun 2024 18:40:30 +0000:
dan.larkin-york@mongodb.com Yes, my use case also creates multiple time series collections per day. I did not mention it because I did not realize it works this way in the background. I also do not remember encountering the problem with one large collection instead of multiple smaller ones.

JIRAUSER1258161 commented on Fri, 21 Jun 2024 18:20:00 +0000:
zongli@ethz.ch In the linked posts, there's a description of a process of creating new collections on a regular basis. Does your use case also follow this pattern? The reason I'm asking is that the identifier for a given series is of the form (collectionUUID, metaFieldValue). So if you are creating collections over time, then each new collection increases the total cardinality of "active" buckets, which is a very similar problem to the case with collection drops/recreation, and is also solved by a more aggressive pruning policy. If that's not the case, and your set of unique (collectionUUID, metaFieldValue) pairs is stable over time, and you're still seeing this behavior, then I'd like to dig in further on your access patterns to see if we can get to the bottom of it.

JIRAUSER1279104 commented on Fri, 21 Jun 2024 18:01:26 +0000:
dan.larkin-york@mongodb.com Thank you for the quick response. I understand completely. I just want to elaborate (in case it was not clear) that this behavior also appears without any deletion of time series collections. It simply "happens" after a while of inserting and reading.

JIRAUSER1258161 commented on Fri, 21 Jun 2024 17:45:02 +0000:
Hi zongli@ethz.ch, thanks for checking in, and for your patience. The team has been balancing work on several different types of issues, and we haven't had a chance to tackle this one just yet, but we're hoping to get to it in the next few weeks (or, worst case, next few months).

JIRAUSER1279104 commented on Fri, 21 Jun 2024 15:30:06 +0000:
Hi all! Are there any updates on this matter? Since it has not been added to another sprint yet, will the solution be included in the next update or something? Thank you in advance.

JIRAUSER1258161 commented on Thu, 11 Apr 2024 13:58:37 +0000:
Okay, I've got a pretty good idea what's going on here. The number of "active" buckets tracked by the catalog is high enough that we dynamically adjust the bucket size limit downward so that all active buckets will fit in cache; however, in this case, many of the "active" buckets are actually idle, and in fact have been cleared, e.g. due to collection drop/recreate. Normally we would prune these long-idle, cleared buckets, but our pruning logic only kicks in if the bucket catalog memory usage is above the configured threshold (which doesn't appear to be the case here).
We need to revisit the conditions for when the pruning kicks in, and be more aggressive about it.

JIRAUSER1278047 commented on Thu, 11 Apr 2024 09:36:37 +0000:
Hi chris.kelly@mongodb.com, thanks for the fast response. I uploaded one of the cluster members' logs and diagnostics. The main point of interest should be 9th April, when the issue occurred; specifically, the log messages below seem interesting (repeated a lot during this time). As to the other questions: the number and nature of the documents was not relevant in my case. As soon as this behavior of one bucket per document occurs, the issue persists for time series collections created afterwards, regardless of the specifics of those collections. A clear timestamp is not available to me, as I did not track the cache pressure metric etc. the whole time, but a reasonable guess would be the timestamp indicated in the log messages below. There is also no "problem" with the workload per se; the data do get stored. It's just that there are far more buckets than there should be, and it affects performance heavily. I would also like to note that I personally only experience this issue inside a sharded database, not in an unsharded one. Also, I first experienced this after deleting time-series collections and re-inserting them, but it also occurs "after some time", as described in the links I posted.

{"t":{"$date":"2024-04-09T17:30:00.120+00:00"},"s":"I", "c":"STORAGE", "id":6936300, "ctx":"TimestampMonitor","msg":"Drop-pending ident is still in use","attr":{"ident":"index-84-1883709576645939057","dropTimestamp":{"$timestamp":{"t":1711129555,"i":16}},"error":{"code":314,"codeName":"ObjectIsBusy","errmsg":"Failed to remove drop-pending ident index-84-1883709576645939057"}}}
{"t":{"$date":"2024-04-09T17:30:00.120+00:00"},"s":"I", "c":"STORAGE", "id":22237, "ctx":"TimestampMonitor","msg":"Completing drop for ident","attr":{"ident":"index-96--1883709576645939057","dropTimestamp":{"$timestamp":{"t":1711375050,"i":38}}}}
{"t":{"$date":"2024-04-09T17:30:00.120+00:00"},"s":"I", "c":"STORAGE", "id":6936300, "ctx":"TimestampMonitor","msg":"Drop-pending ident is still in use","attr":{"ident":"index-96-1883709576645939057","dropTimestamp":{"$timestamp":{"t":1711375050,"i":38}},"error":{"code":314,"codeName":"ObjectIsBusy","errmsg":"Failed to remove drop-pending ident index-96-1883709576645939057"}}}

JIRAUSER1265262 commented on Thu, 11 Apr 2024 01:16:44 +0000:
Hi zongli@amzracing.ch,
Thanks for your report. In order to proceed, would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location? Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. To give more context around our investigation, please also provide:
A clear description of the nature of your insert/delete workload (such as a sample document and number of documents)
A clear timestamp of when the offending workload begins
A clear timestamp of when you observe an articulable issue with the workload
Chris
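As a rough illustration of the state discussed in the analysis above, the in-memory bucket catalog and the idle-bucket pruning threshold can be inspected from mongosh against a shard's mongod. The field and parameter names below are assumptions that may vary by server version; verify they exist on your build before relying on them:

// Bucket catalog state, including idle buckets that have not been pruned yet
// (assumed serverStatus section; names may differ between versions).
printjson(db.serverStatus().bucketCatalog);
// Assumed server parameter gating idle-bucket expiry, per the analysis above.
printjson(db.adminCommand({ getParameter: 1, timeseriesIdleBucketExpiryMemoryUsageThreshold: 1 }));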
In a sharded MongoDB cluster (in my case 7.0.4): insert data into a time-series collection in a sharded database. Either keep inserting, or, as in my case, drop the collection and re-insert the same time-series data. The bucket count then equals the document count.
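A minimal mongosh sketch of these reproduction steps (the database, collection, and field names are placeholders, not taken from the original report):

const tsdb = db.getSiblingDB("tsdb");
sh.enableSharding("tsdb");
tsdb.createCollection("metrics", { timeseries: { timeField: "ts", metaField: "meta", granularity: "seconds" } });
sh.shardCollection("tsdb.metrics", { meta: 1, ts: 1 });

// Insert many measurements for only a handful of distinct series.
const docs = [];
for (let i = 0; i < 10000; i++) {
  docs.push({ ts: new Date(Date.now() + i * 1000), meta: { sensor: i % 10 }, value: Math.random() });
}
tsdb.metrics.insertMany(docs);

// Optionally drop and repeat the insert, which is the second trigger described above:
// tsdb.metrics.drop(); then recreate the collection and insert again.

// Expected: far fewer buckets than documents (roughly one open bucket per series).
// Once the issue appears, bucketCount approaches countDocuments().
const ts = tsdb.metrics.stats().timeseries;
print("buckets: " + ts.bucketCount + ", documents: " + tsdb.metrics.countDocuments({}));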