...
https://docs.mongodb.com/manual/reference/bson-type-comparison-order/#bindata Says MongoDB sorts BinData in the following order: .... Finally, by the data, performing a byte-by-byte comparison. We had engineered this around the expectation that this order would compare the bytes in the raw data however we have discovered to our dismay that it appears to be comparing the Base64-encoded strings instead which is not the same order as comparing the bytes of the raw data. Note that this this doesn't just apply to the shell; I was led to this conclusion when queries were not returning data I was expecting and assume I could demonstrate the same behavior with a server-side query as well. This is actually a serious bug because it means you can't properly sequence binary data (like IPv6 addresses) in a way that supports range compares but I have no expectation that this will actually be fixed – I suspect instead that this will turn into a documentation bug that warns others off from using this datatype.
glen.miner commented on Wed, 31 Oct 2018 18:01:38 +0000: Thank you for investigating! I'm not sure if it's worth making a new bug or migrating this one for the shell compare operators – I don't care about that personally but it might be a trap people fall into like I did. I retraced my steps and I was seeing this with aggregate commands matching on $gte $lte bounds but I cannot reproduce this with the test data above. It's possible that a TTL index was removing items that made aggregates come back empty but I kind of doubt it. I will re-open when if I see it again. Thanks again! asya commented on Mon, 22 Oct 2018 22:20:26 +0000: > Note that this this doesn't just apply to the shell; This only seems to apply to the shell - when storing these values in the server, I can only get correct results, I cannot reproduce any query or aggregation where comparison happens as a string nor returns unexpected/incorrect result: > db.binary.drop() true > db.binary.insert({a:a, b:b, c:c}) WriteResult({ "nInserted" : 1 }) > db.binary.aggregate({$addFields:{aLessB:{$lt:["$a","$b"]}, bLessC:{$lt:["$b", "$c"]}, aLessC:{$lt:["$a","$c"]}}}).pretty() { "_id" : ObjectId("5bce4aac64bac589b170f3d0"), "a" : BinData(0,"JA4A8gAxqTwciCuF5GGzAA=="), "b" : BinData(0,"JA4A8gAxqTwciCuF5GGzTw=="), "c" : BinData(0,"JA4A8gAxqTwciCuF5GGz/w=="), "aLessB" : true, "bLessC" : true, "aLessC" : true } > db.binary.drop() true > db.binary.insert({ip:a}) WriteResult({ "nInserted" : 1 }) > db.binary.insert({ip:b}) WriteResult({ "nInserted" : 1 }) > db.binary.insert({ip:c}) WriteResult({ "nInserted" : 1 }) > db.binary.find().sort({ip:1}) { "_id" : ObjectId("5bce497264bac589b170f3cc"), "ip" : BinData(0,"JA4A8gAxqTwciCuF5GGzAA==") } { "_id" : ObjectId("5bce497464bac589b170f3cd"), "ip" : BinData(0,"JA4A8gAxqTwciCuF5GGzTw==") } { "_id" : ObjectId("5bce497764bac589b170f3ce"), "ip" : BinData(0,"JA4A8gAxqTwciCuF5GGz/w==") } > db.binary.find().sort({ip:-1}) { "_id" : ObjectId("5bce497764bac589b170f3ce"), "ip" : BinData(0,"JA4A8gAxqTwciCuF5GGz/w==") } { "_id" : ObjectId("5bce497464bac589b170f3cd"), "ip" : BinData(0,"JA4A8gAxqTwciCuF5GGzTw==") } { "_id" : ObjectId("5bce497264bac589b170f3cc"), "ip" : BinData(0,"JA4A8gAxqTwciCuF5GGzAA==") } > db.binary.find({ip:{$lt:a}}) > db.binary.find({ip:{$lt:b}}) { "_id" : ObjectId("5bce4c6f64bac589b170f3d1"), "ip" : BinData(0,"JA4A8gAxqTwciCuF5GGzAA==") } > db.binary.find({ip:{$lt:c}}) { "_id" : ObjectId("5bce4c6f64bac589b170f3d1"), "ip" : BinData(0,"JA4A8gAxqTwciCuF5GGzAA==") } { "_id" : ObjectId("5bce4c7264bac589b170f3d2"), "ip" : BinData(0,"JA4A8gAxqTwciCuF5GGzTw==") } > db.binary.find({ip:{$lte:c}}) { "_id" : ObjectId("5bce4c6f64bac589b170f3d1"), "ip" : BinData(0,"JA4A8gAxqTwciCuF5GGzAA==") } { "_id" : ObjectId("5bce4c7264bac589b170f3d2"), "ip" : BinData(0,"JA4A8gAxqTwciCuF5GGzTw==") } { "_id" : ObjectId("5bce4c7464bac589b170f3d3"), "ip" : BinData(0,"JA4A8gAxqTwciCuF5GGz/w==") } I'm closing this ticket as "Cannot reproduce" but if you can find an example of data/query that returns the wrong results from the server please reopen and include any details you can about exact query so we can reproduce and fix it.
I can make 3 binary encoded IPv6 addresses like this: > var a = BinData(0,"JA4A8gAxqTwciCuF5GGzAA==") > var b = BinData(0,"JA4A8gAxqTwciCuF5GGzTw==") > var c = BinData(0,"JA4A8gAxqTwciCuF5GGz/w==") You can confirm these sort properly a > a.hex() 240e00f20031a93c1c882b85e461b300 > b.hex() 240e00f20031a93c1c882b85e461b34f > c.hex() 240e00f20031a93c1c882b85e461b3ff And of course as you would expect the shell agrees: > a.hex() true > b.hex() true > a.hex() true However if I compare the objects themselves I get a peculiar answer: > a true > b false > a false After some digging this peculiar result seemed to match the string-compare of the base-64 encoded data: > a.toString() true > b.toString() false > a.toString() false Which tragically is inconsistent with the "byte-wise lexicographic sort" implied by the documentation. Anecdotal evidence suggests that server-side sorting has the same behavior.