...
dianna.hohensee commented on Thu, 22 Dec 2022 19:56:54 +0000: Here's what the commit msg was supposed to be: SERVER-71664 Column key generation, return found duplicate field when the arrayInfo of the latest leaf node std::mismatch with the previous node's arrayInfo is at the beginning of an array. Column key generation creates column values by building up arrayInfo information per leaf, and then combining information per path. Two leaves with duplicate field names where one leaf is an object and the other is an array must be caught. xgen-internal-githook commented on Thu, 22 Dec 2022 19:48:41 +0000: Author: {'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'} Message: SERVER-71664 Column key generation, return found duplicate field when the arrayInfo of the latest ... Branch: master https://github.com/mongodb/mongo/commit/1e666d5a6728c800873165d14fd49aa2ea421ce2 dianna.hohensee commented on Thu, 8 Dec 2022 18:09:13 +0000: The last thing worth doing is probably to explore the code directly, and see if it makes logical sense that there's an issue we want to correct. I'll do that next. dianna.hohensee commented on Thu, 8 Dec 2022 17:53:58 +0000: Just tweaked my patch to run the exact example, without invariant. So definitely need another ingredient. rs0:PRIMARY> db.mm.drop() true rs0:PRIMARY> db.mm.insert({ "_id" : 0}) WriteResult({ "nInserted" : 1 }) rs0:PRIMARY> db.mm.find() { "_id" : 0, "bad" : { "c" : { "obj" : 42 } }, "c" : [ 42 ] } rs0:PRIMARY> db.mm.createIndex({"$**": "columnstore"}) { "numIndexesBefore" : 1, "numIndexesAfter" : 2, "createdCollectionAutomatically" : false, "commitQuorum" : "votingMembers", "ok" : 1, "$clusterTime" : { "clusterTime" : Timestamp(1670521968, 3), "signature" : { "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="), "keyId" : NumberLong(0) } }, "operationTime" : Timestamp(1670521968, 3) } dianna.hohensee commented on Thu, 8 Dec 2022 17:42:51 +0000: I've been trying to reproduce the invariant. However, a document with the same shape doesn't appear to trigger it, in my repro attempt. I opted to hack the insert path in order to get duplicate fields inserted; as well as adding logging of the document in the find command code so that I can see the duplicate fields, which may otherwise be eliminated in the shell results. This is my patch. So I inserted a random document rs0:PRIMARY> db.mm.insert({ "_id" : 25, "item" : "ABC3", dimensions: [ [ "koed", 25 ] ] }) WriteResult({ "nInserted" : 1 }) And logs in the insert path show what I added via CPP code. {"t":{"$date":"2022-12-08T17:29:35.382+00:00"},"s":"I", "c":"-", "id":0, "ctx":"conn3","msg":"~~~fixDocumentForInsert start","attr":{"doc":"{ _id: 25.0, item: \"ABC3\", dimensions: [ [ \"koed\", 25.0 ] ] }"}} {"t":{"$date":"2022-12-08T17:29:35.382+00:00"},"s":"I", "c":"-", "id":0, "ctx":"conn3","msg":"~~~fixDocumentForInsert start","attr":{"newDoc":"{ _id: 25.0, item: \"ABC3\", dimensions: [ [ \"koed\", 25.0 ] ], dimensions: \"out-of-this-world\", bad: { c: { obj: 42 } }, c: [ 42 ] }"}} {"t":{"$date":"2022-12-08T17:29:35.383+00:00"},"s":"I", "c":"STORAGE", "id":20320, "ctx":"conn3","msg":"createCollection","attr":{"namespace":"test.mm","uuidDisposition":"generated","uuid":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"options":{}}} {"t":{"$date":"2022-12-08T17:29:35.395+00:00"},"s":"I", "c":"INDEX", "id":20345, "ctx":"conn3","msg":"Index build: done building","attr":{"buildUUID":null,"collectionUUID":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"namespace":"test.mm","index":"_id_","ident":"index-1-3867456837185354573","collectionIdent":"collection-0-3867456837185354573","commitTimestamp":{"$timestamp":{"t":1670520575,"i":1}}}} Double checked with a find command rs0:PRIMARY> db.mm.find() { "_id" : 25, "item" : "ABC3", "dimensions" : [ [ "koed", 25 ] ], "bad" : { "c" : { "obj" : 42 } }, "c" : [ 42 ] } {"t":{"$date":"2022-12-08T17:29:41.232+00:00"},"s":"I", "c":"QUERY", "id":0, "ctx":"conn3","msg":"~~~Find cmd: result->getBodyBuilder().asTempObj()","attr":{"result":{"cursor":{"firstBatch":[{"_id":25,"item":"ABC3","dimensions":[["koed",25]],"dimensions":"out-of-this-world","bad":{"c":{"obj":42}},"c":[42]}],"id":0,"ns":"test.mm"}}}} Then I created the column store index, successfully without invariant. rs0:PRIMARY> db.mm.createIndex({"$**": "columnstore"}) { "numIndexesBefore" : 1, "numIndexesAfter" : 2, "createdCollectionAutomatically" : false, "commitQuorum" : "votingMembers", "ok" : 1, "$clusterTime" : { "clusterTime" : Timestamp(1670520603, 5), "signature" : { "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="), "keyId" : NumberLong(0) } }, "operationTime" : Timestamp(1670520603, 5) } {"t":{"$date":"2022-12-08T17:30:03.192+00:00"},"s":"I", "c":"INDEX", "id":20438, "ctx":"conn3","msg":"Index build: registering","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"namespace":"test.mm","collectionUUID":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"indexes":1,"firstIndex":{"name":"$**_columnstore"},"command":{"createIndexes":"mm","v":2,"indexes":[{"key":{"$**":"columnstore"},"name":"$**_columnstore"}],"ignoreUnknownIndexOptions":false}}} {"t":{"$date":"2022-12-08T17:30:03.208+00:00"},"s":"I", "c":"INDEX", "id":20384, "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: starting","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"collectionUUID":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"namespace":"test.mm","properties":{"v":2,"key":{"$**":"columnstore"},"name":"$**_columnstore"},"specIndex":0,"numSpecs":1,"method":"Hybrid","ident":"index-2-3867456837185354573","collectionIdent":"collection-0-3867456837185354573","maxTemporaryMemoryUsageMB":200}} {"t":{"$date":"2022-12-08T17:30:03.208+00:00"},"s":"I", "c":"INDEX", "id":20346, "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: initialized","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"collectionUUID":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"namespace":"test.mm","initializationTimestamp":{"$timestamp":{"t":1670520603,"i":2}}}} {"t":{"$date":"2022-12-08T17:30:03.209+00:00"},"s":"I", "c":"INDEX", "id":20440, "ctx":"conn3","msg":"Index build: waiting for index build to complete","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"deadline":{"$date":{"$numberLong":"9223372036854775807"}}}} {"t":{"$date":"2022-12-08T17:30:03.209+00:00"},"s":"I", "c":"STORAGE", "id":4847600, "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: waiting for last optime before interceptors to be majority committed","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"collectionUUID":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"deadline":{"$date":"2022-12-08T17:30:13.208Z"},"timeoutMillis":10000,"lastOpTime":{"ts":{"$timestamp":{"t":1670520603,"i":2}},"t":21}}} {"t":{"$date":"2022-12-08T17:30:03.306+00:00"},"s":"I", "c":"INDEX", "id":20391, "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: collection scan done","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"collectionUUID":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"namespace":"test.mm","totalRecords":1,"readSource":"kMajorityCommitted","durationMillis":0}} {"t":{"$date":"2022-12-08T17:30:03.307+00:00"},"s":"I", "c":"COMMAND", "id":20685, "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: inserted keys from external sorter into index","attr":{"namespace":"test.mm","index":"$**_columnstore","keysInserted":8,"durationMillis":0}} {"t":{"$date":"2022-12-08T17:30:03.313+00:00"},"s":"I", "c":"CONNPOOL", "id":22576, "ctx":"ReplNetwork","msg":"Connecting","attr":{"hostAndPort":"localhost:50000"}} {"t":{"$date":"2022-12-08T17:30:03.314+00:00"},"s":"I", "c":"NETWORK", "id":22943, "ctx":"listener","msg":"Connection accepted","attr":{"remote":"127.0.0.1:34834","uuid":{"uuid":{"$uuid":"bdda1cdc-4634-4518-acd2-a9f4e7f53a2a"}},"connectionId":5,"connectionCount":2}} {"t":{"$date":"2022-12-08T17:30:03.315+00:00"},"s":"I", "c":"NETWORK", "id":51800, "ctx":"conn5","msg":"client metadata","attr":{"remote":"127.0.0.1:34834","client":"conn5","doc":{"driver":{"name":"NetworkInterfaceTL-ReplNetwork","version":"6.2.0"},"os":{"type":"Linux","name":"Ubuntu","architecture":"x86_64","version":"18.04"}}}} {"t":{"$date":"2022-12-08T17:30:03.318+00:00"},"s":"I", "c":"STORAGE", "id":3856201, "ctx":"conn5","msg":"Index build: commit quorum satisfied","attr":{"indexBuildEntry":{"_id":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"},"collectionUUID":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"},"commitQuorum":"votingMembers","indexNames":["$**_columnstore"],"commitReadyMembers":["localhost:50000"]}}} {"t":{"$date":"2022-12-08T17:30:03.319+00:00"},"s":"I", "c":"CONNPOOL", "id":22566, "ctx":"ReplNetwork","msg":"Ending connection due to bad connection status","attr":{"hostAndPort":"localhost:50000","error":"CallbackCanceled: onInvoke :: caused by :: Callback was canceled","numOpenConns":0}} {"t":{"$date":"2022-12-08T17:30:03.319+00:00"},"s":"I", "c":"STORAGE", "id":3856203, "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: waiting for next action before completing final phase","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}}}} {"t":{"$date":"2022-12-08T17:30:03.319+00:00"},"s":"I", "c":"STORAGE", "id":3856204, "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: received signal","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"action":"Commit quorum Satisfied"}} {"t":{"$date":"2022-12-08T17:30:03.320+00:00"},"s":"I", "c":"NETWORK", "id":22943, "ctx":"listener","msg":"Connection accepted","attr":{"remote":"127.0.0.1:34840","uuid":{"uuid":{"$uuid":"546bfe2d-f6fd-4aa7-9149-a98b726b1a23"}},"connectionId":7,"connectionCount":3}} {"t":{"$date":"2022-12-08T17:30:03.320+00:00"},"s":"I", "c":"INDEX", "id":20345, "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: done building","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"collectionUUID":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"namespace":"test.mm","index":"$**_columnstore","ident":"index-2-3867456837185354573","collectionIdent":"collection-0-3867456837185354573","commitTimestamp":{"$timestamp":{"t":1670520603,"i":4}}}} {"t":{"$date":"2022-12-08T17:30:03.321+00:00"},"s":"I", "c":"NETWORK", "id":51800, "ctx":"conn7","msg":"client metadata","attr":{"remote":"127.0.0.1:34840","client":"conn7","doc":{"driver":{"name":"NetworkInterfaceTL-ReplNetwork","version":"6.2.0"},"os":{"type":"Linux","name":"Ubuntu","architecture":"x86_64","version":"18.04"}}}} {"t":{"$date":"2022-12-08T17:30:03.321+00:00"},"s":"I", "c":"STORAGE", "id":20663, "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: completed successfully","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"collectionUUID":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"namespace":"test.mm","indexesBuilt":["$**_columnstore"],"numIndexesBefore":1,"numIndexesAfter":2}} {"t":{"$date":"2022-12-08T17:30:03.321+00:00"},"s":"I", "c":"INDEX", "id":20447, "ctx":"conn3","msg":"Index build: completed","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}}}} {"t":{"$date":"2022-12-08T17:30:03.323+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn5","msg":"Connection ended","attr":{"remote":"127.0.0.1:34834","uuid":{"uuid":{"$uuid":"bdda1cdc-4634-4518-acd2-a9f4e7f53a2a"}},"connectionId":5,"connectionCount":2}} {"t":{"$date":"2022-12-08T17:30:03.325+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn3","msg":"Slow query","attr":{"type":"command","ns":"test.mm","appName":"MongoDB Shell","command":{"createIndexes":"mm","indexes":[{"key":{"$**":"columnstore"},"name":"$**_columnstore"}],"lsid":{"id":{"$uuid":"0d2f3210-e9b4-4c79-98b3-4cf7dfa5fbcd"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1670520575,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test"},"numYields":0,"reslen":271,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":5}},"FeatureCompatibilityVersion":{"acquireCount":{"r":4,"w":4}},"ReplicationStateTransition":{"acquireCount":{"w":8}},"Global":{"acquireCount":{"r":4,"w":4}},"Database":{"acquireCount":{"r":2,"w":3}},"Collection":{"acquireCount":{"r":2,"w":2,"W":1}},"Mutex":{"acquireCount":{"r":8}}},"flowControl":{"acquireCount":3,"timeAcquiringMicros":16},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{},"remote":"127.0.0.1:49200","protocol":"op_msg","durationMillis":133}} {"t":{"$date":"2022-12-08T17:30:03.325+00:00"},"s":"I", "c":"EXECUTOR", "id":6983000, "ctx":"conn3","msg":"Slow SessionWorkflow loop","attr":{"elapsed":{"totalMillis":22078,"activeMillis":134,"receiveWorkMillis":21943,"processWorkMillis":134,"sendResponseMillis":0,"finalizeMillis":0}}} {"t":{"$date":"2022-12-08T17:30:04.250+00:00"},"s":"I", "c":"STORAGE", "id":22260, "ctx":"TimestampMonitor","msg":"Removing drop-pending idents with drop timestamps before timestamp","attr":{"timestamp":{"$timestamp":{"t":1670520303,"i":5}}}} {"t":{"$date":"2022-12-08T17:30:04.250+00:00"},"s":"I", "c":"STORAGE", "id":22237, "ctx":"TimestampMonitor","msg":"Completing drop for ident","attr":{"ident":"internal-3-3867456837185354573","dropTimestamp":{"$timestamp":{"t":0,"i":0}}}} {"t":{"$date":"2022-12-08T17:30:04.251+00:00"},"s":"I", "c":"STORAGE", "id":6776600, "ctx":"TimestampMonitor","msg":"The ident was successfully dropped","attr":{"ident":"internal-3-3867456837185354573","dropTimestamp":{"$timestamp":{"t":0,"i":0}}}} {"t":{"$date":"2022-12-08T17:30:36.002+00:00"},"s":"W", "c":"COMMAND", "id":7024600, "ctx":"ftdc","msg":"The collStats command is deprecated. For more information, see https://dochub.mongodb.org/core/collStats-deprecated"} Then just sanity checked with a find command again. rs0:PRIMARY> db.mm.find() { "_id" : 25, "item" : "ABC3", "dimensions" : [ [ "koed", 25 ] ], "bad" : { "c" : { "obj" : 42 } }, "c" : [ 42 ] } {"t":{"$date":"2022-12-08T17:32:34.681+00:00"},"s":"I", "c":"QUERY", "id":0, "ctx":"conn3","msg":"~~~Find cmd: result->getBodyBuilder().asTempObj()","attr":{"result":{"cursor":{"firstBatch":[{"_id":25,"item":"ABC3","dimensions":[["koed",25]],"dimensions":"out-of-this-world","bad":{"c":{"obj":42}},"c":[42]}],"id":0,"ns":"test.mm"}}}} So it looks like there might be something more at play to hit that invariant. I'm not sure what it is, though.
Insert into a collection a document with duplicated fields like this: {_id: 0, bad: {c: {obj: 42}, c: [42]}} (note: I don't know how to do it from the mongo shell but it looks like mongocxx driver can do that, at least, this is what I'm observing when using Genny that uses mongocxx). coll.createIndex({"$**": "columnstore"}) -> triggers an invariant in ColumnShredder::appendToArrayInfo (and, therefore, crashes the server). invariant(oldIt < oldPosition.end()); oldPosition = {_data = 0x5651220464b0 "{", _size = 1} newPosition = {_data = 0x7f44d33e5df0 "{[", _size = 2} std::mismatch returns newIt set to "[" and oldIt set to oldPosition.end()