...
The DocumentKey field of the change event does not capture a shard key if the value of the shard key is null. Consider a collection emp with a shard key name. Inserting a document with no value for name, like {{db.emp.insertOne({"_id":"1001"}) }}produces a change event with DocumentKey that contains only the _id field fullDocument=Document{{_id=1002, documentKey={"_id": "1001"}}} Similarly, updating the name field does not have the previous value of the shard key in the document key. That is, db.emp.updateOne({"_id":"1001","name":null},{$set:{"name":"Adam") }}also produces a change event with DocumentKey that contains only the _id field{{{} documentKey={"_id": "1001"}, updateDescription=UpdateDescription{removedFields=[], updatedFields= {"name": "Adam"} , truncatedArrays=[]}{}}} In both these operations, the DocumentKey field should contain a field for the name as well, like documentKey={"_id": "1001", "name": null} In our use case, we rely on the DocumentKey field to know the shard keys and their previous values. As a workaround we couldn't depend on retrieving the shard key from the config database when Change Streams is run as there is a possibility shard keys may have been refined between when the operation was performed and Change Streams was run. Expected behavior: Provide all the shard keys in DocumentKey even if their value is null.
bernard.gorman commented on Thu, 18 May 2023 10:52:17 +0000: Hi ajay2589@gmail.com, Thank you for raising this issue to our attention. The behaviour you're observing is due to the semantics of null and missing, and the fact that our query language treats a query for null as matching both documents where a field is explicitly null and where that field is absent. When you first insert the document {"_id":"1001"}, the value of "name" is NOT actually null in this case, it is missing. When we started allowing shard keys to be omitted, we took the approach of treating them as-if those keys were null for data-distribution and query-routing purposes, so documents where a shard key field is missing clump together with documents where the field is null. We did this, as mentioned above, because queries for null match documents where that field is explicitly null or where it is missing, which is a special behaviour of our query language. This is why your update on {"_id":"1001", "name":null} matches this document; if you were to instead use a query that actually does match only null: db.emp.updateOne({"_id":"1001", $expr: {$eq: ["$name", null]}}, {$set:{"name":"Adam"}}) ... then it would not match or update this document. But a document with an explicitly null field and one in which the field is missing are still in two distinct states, which is what the documentKey in the change stream output is indicating. We cannot simply report a non-existent shard key field as null, because then use-cases which rely upon distinguishing between shard keys that are present or absent would have no way to differentiate between these two states. As shown below, if the name field is set to explicitly null then subsequent updates will indeed show that field in the documentKey. Incidentally, the reason that the name field is still absent in the update documentKey even though you set {name: "Adam"} is because the documentKey is from before the update occurred - otherwise, there would be no way to tell which document the update applied to. [direct: mongos] test> db.emp.insertOne({ "_id": "1001" }) [direct: mongos] test> var csCursor = db.emp.watch() // Note that we're explicitly setting the value of "name" from to here. // If it was already null, this would be a no-op and no update would occur. [direct: mongos] test> db.emp.updateOne({"_id":"1001", "name": null}, {$set:{"name": null}}) [direct: mongos] test> csCursor.next() { _id: { _data: '...' }, operationType: 'update', clusterTime: Timestamp({ t: 1684401795, i: 1 }), wallTime: ISODate("2023-05-18T09:23:15.126Z"), ns: { db: 'test', coll: 'emp' }, documentKey: { _id: '1001' }, // "name" still missing because documentKey is *before* update updateDescription: { updatedFields: { name: null }, removedFields: [], truncatedArrays: [] } } [direct: mongos] test> db.emp.updateOne({"_id":"1001", "name": null},{$set:{"name": "Adam"} }) [direct: mongos] test> csCursor.next() { _id: { _data: '...' }, operationType: 'update', clusterTime: Timestamp({ t: 1684401826, i: 29 }), wallTime: ISODate("2023-05-18T09:23:46.925Z"), ns: { db: 'test', coll: 'emp' }, documentKey: { name: null, _id: '1001' }, // "name" is now reported as explicitly null updateDescription: { updatedFields: { name: 'Adam' }, removedFields: [], truncatedArrays: [] } } I hope this helps to clarify the behaviour you're seeing. We plan to revisit our null and missing semantics in the future, so I'll add this ticket to the set of issues for consideration as part of that effort.
Environment used: MongoDB Atlas M30 cluster, version: 6.0.5