...
The changes from SERVER-76855 ensure that the aggregate command will correctly use the collator in mongos when targeting for an untracked collection. There are places outside of the aggregate command in mongos which attempt to utilize the collation. Further investigation is needed to determine the extent to which mongos would be doing post-processing of results after merging cursor results (e.g. $group followed by $match) where the collator used by mongos is relevant for the correctness of query results.

Here is a reference to my simple audit from SERVER-76855 along with a more recent output from searching the codebase:

$ git grep -E 'collation.*isEmpty' -- src/mongo/s/
src/mongo/s/chunk_manager.cpp:674: const bool hasSimpleCollation = (collation.isEmpty() && !_rt->optRt->getDefaultCollator()) ||
src/mongo/s/cluster_commands_helpers.cpp:119: const auto noCollationSpecified = collation.isEmpty();
src/mongo/s/cluster_commands_helpers.cpp:226: if (!collation.isEmpty()) {
src/mongo/s/collection_routing_info_targeter.cpp:406: if (!collation.isEmpty()) {
src/mongo/s/collection_routing_info_targeter.cpp:777: if (!collation.isEmpty()) {
src/mongo/s/commands/cluster_distinct_cmd.cpp:373: !collation.isEmpty()
src/mongo/s/commands/cluster_map_reduce_agg.cpp:118: if (!collationObj.isEmpty()) {
src/mongo/s/commands/cluster_map_reduce_agg.cpp:161: if (!cm.hasRoutingTable() && collationObj.isEmpty()) {
src/mongo/s/commands/cluster_query_without_shard_key_cmd.cpp:150: if (!parsedInfo.collation.isEmpty()) {
src/mongo/s/commands/cluster_query_without_shard_key_cmd.cpp:427: if (parsedInfoFromRequest.collation.isEmpty()) {
src/mongo/s/commands/sharding_expressions.cpp:119: if (auto collation = indexDescriptor->collation(); !collation.isEmpty()) {
src/mongo/s/query/cluster_aggregate.cpp:140: if (!collationObj.isEmpty()) {
src/mongo/s/query/cluster_aggregate.cpp:159: if ((!cri || !cri->cm.hasRoutingTable()) && collationObj.isEmpty()) {
src/mongo/s/query/cluster_aggregation_planner.cpp:633: !collationToReturn.isEmpty());
src/mongo/s/query/cluster_aggregation_planner.cpp:816: if (nss.isCollectionlessAggregateNS() || !collation.isEmpty() || !cm) {
src/mongo/s/shard_key_pattern_query_util.cpp:468: if (!collation.isEmpty()) {
src/mongo/s/shard_key_pattern_query_util.cpp:476: if (!cm.hasRoutingTable() && collation.isEmpty()) {
src/mongo/s/write_ops/write_without_shard_key_util.cpp:267: if (collation.isEmpty()) {
max.hirschhorn@10gen.com commented on Mon, 22 Jan 2024 22:53:48 +0000: As an example, mihai.andrei@mongodb.com and I spot-checked this block in the distinct command, which appears to use the simple collation to process the values array returned by the distinct command. This behavior in mongos looks suspicious at first because an untracked collection may have a non-simple default collation that governs how the contents of the values array ought to be compared. However, the loop over shardResponses is guaranteed to execute exactly once for an unsharded collection, so the BSONObjSet all will still correctly contain the distinct values of the one shard's response (the shard has already applied the collection's default collation to the values array itself).
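The single-response argument above can be illustrated with a small toy model. This is only a sketch and not the mongos code: it uses std::string in place of BSON values, a case-insensitive comparator as a stand-in for a non-simple default collation, and the names ShardValues, CaseInsensitiveLess, distinctOnShard, mergeWithSimpleCollation and demoSingleShard are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the "values" array in one shard's distinct reply.
using ShardValues = std::vector<std::string>;

// Hypothetical stand-in for a non-simple collation: case-insensitive ordering.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// Models the shard: dedupe under the collection's default collation,
// keeping the first spelling seen for each collation-equal group.
ShardValues distinctOnShard(const std::vector<std::string>& docs) {
    std::set<std::string, CaseInsensitiveLess> seen;
    ShardValues out;
    for (const auto& d : docs) {
        if (seen.insert(d).second) {
            out.push_back(d);
        }
    }
    return out;
}

// Models the router-side merge: a set keyed on simple (binary) comparison.
std::set<std::string> mergeWithSimpleCollation(
    const std::vector<ShardValues>& responses) {
    std::set<std::string> all;
    for (const auto& response : responses) {
        all.insert(response.begin(), response.end());
    }
    return all;
}

// With exactly one response, the binary merge cannot add or drop values:
// the shard already removed everything the collation considers equal.
void demoSingleShard() {
    ShardValues fromShard = distinctOnShard({"a", "A", "b"});
    std::set<std::string> all =
        mergeWithSimpleCollation(std::vector<ShardValues>{fromShard});
    assert(all.size() == fromShard.size());
}
```

In this toy model, a binary merge of a single response is a no-op with respect to distinctness, which matches the reasoning in the comment. It also shows why the number of responses matters: if two responses were merged this way, values that the collation treats as equal (such as "a" and "A" here) would both survive the merge.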