Issue Status as of October 23rd 2024

ISSUE DESCRIPTION AND IMPACT

When a Time Series collection is cloned, the internal timeseriesBucketsMayHaveMixedSchemaData flag is defaulted to false instead of being carried over from the source collection. If the user created a Time Series collection in v5.0-v5.1 and then upgraded to a later version before the collection was cloned, some queries may miss matching documents.

All cloning procedures set the timeseriesBucketsMayHaveMixedSchemaData flag incorrectly: initial synchronization, chunk migration, movePrimary, resharding, mongodump/mongorestore, and others.

DIAGNOSIS

Users on v6.0+ can determine whether they are impacted by running validate() on their Time Series collections and checking the validate.warnings field for detected mixed-schema buckets.

Running validate can have a significant performance impact. To minimize it, issue validate to a secondary by following the hidden member steps outlined here.

Example (standalone/replica set):

    // Call validate on a mongod process for replica sets.
    coll.validate();

    // The warnings field detects mixed-schema buckets.
    {
      "ns" : "db.system.buckets.coll",
      ...
      "errors" : [
        "Detected a Time Series bucket with mixed schema data when timeseriesBucketsMayHaveMixedSchemaData is false. You can run the collMod command to set this flag"
      ],
      ...
    }

Example (sharded cluster):

    // Call validate on mongos for sharded clusters.
    coll.validate();

    // The warnings field detects mixed-schema buckets.
    // For sharded clusters, this output is an object with a result for every shard in
    // the "raw" field.
    {
      "ns" : "db.system.buckets.coll",
      ...
      "errors" : [
        "Detected a Time Series bucket with mixed schema data when timeseriesBucketsMayHaveMixedSchemaData is false. You can run the collMod command to set this flag"
      ],
      ...
      "raw" : {
        "shard-0-name" : {
          "ns" : "db.system.buckets.coll",
          ...
          "warnings" : [
            "Detected a Time Series bucket with mixed schema data"
          ],
          ...
        },
        "shard-1-name" : {
          ...
        },
        ...
      }
    }

AFFECTED VERSIONS

The issue affects Time Series collections created before v5.2 that were later cloned. The affected versions are 6.0.0-6.0.16, 7.0.0-7.0.12, and rapid releases 7.1-7.3.3. Later (sub-)versions are not affected.

Warning: Even if the cluster currently runs a safe version, it may still be impacted if its upgrade history includes any unsafe version.

REMEDIATION AND WORKAROUNDS

If impacted, upgrade to a fixed version and set the timeseriesBucketsMayHaveMixedSchemaData flag to true for each affected collection so that future queries on the collection return correct results. For example:

    // Replace "coll" with the name of the affected Time Series collection.
    db.runCommand({ collMod: "coll", timeseriesBucketsMayHaveMixedSchemaData: true });

Please see the remediation steps provided in our support-tools repo for a complete step-by-step guide on resolving the impacted collections.

Note: SERVER-91193 describes a related issue with the same root cause but does not require any manual intervention, because it only impacts Rapid Release versions 7.1.0-7.3.3 and those clusters have been automatically upgraded.
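The diagnosis and version checks described above can be scripted. The following is a minimal sketch in plain JavaScript (runnable in Node.js, or pasteable into mongosh). All function and constant names are hypothetical and are not part of MongoDB or mongosh. Two assumptions are made. First, the mixed-schema message contains the phrase "mixed schema data", as in the sample output, and it is matched case-insensitively because the exact wording may differ between versions. Second, the version ranges are the ones listed under AFFECTED VERSIONS. This check does not cover the other precondition, that the collection was created before v5.2 and later cloned.

```javascript
// Hypothetical triage helpers (not part of MongoDB or mongosh).

// True if any message in `msgs` mentions mixed-schema bucket data.
// Case-insensitive, since the exact wording may vary by server version.
function mentionsMixedSchema(msgs) {
  return (
    Array.isArray(msgs) &&
    msgs.some((m) => String(m).toLowerCase().includes("mixed schema data"))
  );
}

// Checks a validate() result. On sharded clusters, per-shard results are
// nested under the "raw" field, so those are inspected too.
function hasMixedSchemaBuckets(validateResult) {
  const results = [validateResult, ...Object.values(validateResult.raw || {})];
  return results.some(
    (r) => mentionsMixedSchema(r.warnings) || mentionsMixedSchema(r.errors)
  );
}

// Affected ranges from the ticket (inclusive); "7.1" is read as 7.1.0.
const AFFECTED_RANGES = [
  ["6.0.0", "6.0.16"],
  ["7.0.0", "7.0.12"],
  ["7.1.0", "7.3.3"],
];

// Compares dotted numeric versions; a result > 0 means a > b.
// Missing components count as 0. Suffixes such as "-rc0" are not handled.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

function isAffectedVersion(version) {
  return AFFECTED_RANGES.some(
    ([lo, hi]) =>
      compareVersions(version, lo) >= 0 && compareVersions(version, hi) <= 0
  );
}

// Per the warning above, a cluster may be impacted if ANY version it ever
// ran is in an affected range, even if it now runs a fixed version.
function upgradeHistoryMayBeAffected(versions) {
  return versions.some(isAffectedVersion);
}
```

For example, `upgradeHistoryMayBeAffected(["5.0.14", "6.0.10", "7.0.14"])` returns true because 6.0.10 falls in the 6.0.0-6.0.16 range. In mongosh, one might pass the document returned by `coll.validate()` directly to `hasMixedSchemaBuckets`. A positive result from either helper is only a hint. The authoritative steps are the ones in the support-tools remediation guide.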
Original description

The timeseriesBucketsMayHaveMixedSchemaData collection option is:
- Only set on time-series collections
- Serialized as a top-level field (not part of the options sub-object)
- Defaulted to false at bucket collection creation time

Defaulting timeseriesBucketsMayHaveMixedSchemaData to false (bullet 3) is problematic because collections are not only created from scratch. They may also be created by data cloning in the following cases:
- Initial synchronization: cloning a collection as part of adding a new node to a replica set
- Chunk migration: cloning a collection as part of migrating a sharded collection's data
- MovePrimary: cloning a collection as part of changing its database primary in a sharded cluster
- Resharding (and moveCollection): cloning a collection as part of redistributing all its data
- Mongodump/mongorestore: restoring a cluster from a backup

This means that in all clusters starting from v5.2 (SERVER-60574), the value of timeseriesBucketsMayHaveMixedSchemaData may be incorrect.

The short-term solution is to always behave as if the option were set to true, without changing the actual value stored in the catalog. SERVER-91195 will design a long-term solution to avoid hitting the issue in the future.
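The cloning problem can be illustrated with a toy model. This is a hypothetical sketch, not MongoDB source code. The catalog entry shape, the function names, and the source entry (whose flag is assumed to be true) are all made up for illustration. The model shows two things. First, recreating a collection from its `options` alone silently resets a top-level field to its creation-time default. Second, it shows what the short-term "always behave as if true" rule amounts to.

```javascript
// Hypothetical, simplified model of the catalog behavior (NOT MongoDB code).

// Creating a bucket collection: the flag is a top-level field (outside
// `options`) and defaults to false, as in bullet 3.
function createBucketCollection(name, options) {
  return { name, options, timeseriesBucketsMayHaveMixedSchemaData: false };
}

// Buggy clone path: the target is recreated from the source's options only,
// so the top-level flag falls back to its default of false.
function cloneDefaultingFlag(source) {
  return createBucketCollection(source.name, { ...source.options });
}

// Flag-preserving clone path: the top-level flag is copied from the source.
function cloneCopyingFlag(source) {
  const target = createBucketCollection(source.name, { ...source.options });
  target.timeseriesBucketsMayHaveMixedSchemaData =
    source.timeseriesBucketsMayHaveMixedSchemaData;
  return target;
}

// Short-term behavior: ignore the stored value and always act as if the
// buckets may contain mixed-schema data.
function mayHaveMixedSchemaForQueries(_catalogEntry) {
  return true;
}

// Usage with a hypothetical source entry whose flag is true.
const source = {
  name: "system.buckets.coll",
  options: { timeseries: { timeField: "t" } },
  timeseriesBucketsMayHaveMixedSchemaData: true,
};
console.log(cloneDefaultingFlag(source).timeseriesBucketsMayHaveMixedSchemaData); // false
console.log(cloneCopyingFlag(source).timeseriesBucketsMayHaveMixedSchemaData); // true
console.log(mayHaveMixedSchemaForQueries(cloneDefaultingFlag(source))); // true
```

In this model, the short-term rule makes the lost flag harmless for queries without touching the catalog. A long-term fix would instead need every clone path to preserve the flag.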
pierlauro.sciarelli commented on Thu, 3 Apr 2025 08:13:47 +0000:
The ticket had already been solved on all affected versions under SERVER-91195. I'm flagging that in the "fix version" field because the lack of this information on this ticket has caused confusion several times.

pierlauro.sciarelli commented on Fri, 12 Jul 2024 15:55:06 +0000:
Closing the ticket as fixed in v8.1 (it will also be solved in v8.0, assuming the backport goes in soon). With the changes committed under SERVER-91195:
- New clusters in v8.0+ will not be affected by the bug.
- For old clusters, users (or Atlas?) can simply call collMod to fix potentially inconsistent catalog options. Such options are then guaranteed to be properly cloned upon any kind of migration that uses listCollections' output (resharding/moveChunk/initial sync/mongorestore/etc.).
- Potential inconsistencies can be spotted from the output of the checkers being added under SERVER-84699 and SERVER-91754.
Since no code was committed under this ticket, I am also closing the backport tickets.