...
I'm running a query against a collection with 27,576 documents. Occasionally, queries against this collection that normally return in a few milliseconds will suddenly take 50 seconds or more. Any idea what's happening here?

The query (I'm using the MongooseJS driver):

    var regExp = [new RegExp("^[A-Z]+$")];
    var query = models.MeSH.find(
        { "TreeNumberList.TreeNumber": { $in: regExp } }
    ).hint({ "TreeNumberList.TreeNumber": 1 });

The array of regular expressions might look odd, but this is simplified code; occasionally it contains 2 regexes.

Profile data:

    {
        "op" : "query",
        "ns" : "figure1.meshdescriptors",
        "query" : {
            "$query" : { "TreeNumberList.TreeNumber" : { "$in" : [ /^[A-Z]+$/ ] } },
            "orderby" : { "TreeNumberList.TreeNumber" : 1 },
            "$hint" : { "TreeNumberList.TreeNumber" : 1 }
        },
        "ntoreturn" : 0,
        "ntoskip" : 0,
        "nscanned" : 56348,
        "nscannedObjects" : 27463,
        "keyUpdates" : 0,
        "numYield" : 3246,
        "lockStats" : {
            "timeLockedMicros" : { "r" : NumberLong(4649093), "w" : NumberLong(0) },
            "timeAcquiringMicros" : { "r" : NumberLong(2417056), "w" : NumberLong(2780) }
        },
        "nreturned" : 16,
        "responseLength" : 5751,
        "millis" : 51795,
        "execStats" : {
            "type" : "FETCH",
            "works" : 59541,
            "yields" : 3394,
            "unyields" : 3394,
            "invalidates" : 0,
            "advanced" : 16,
            "needTime" : 56332,
            "needFetch" : 3192,
            "isEOF" : 1,
            "alreadyHasObj" : 0,
            "forcedFetches" : 0,
            "matchTested" : 16,
            "children" : [
                {
                    "type" : "IXSCAN",
                    "works" : 56348,
                    "yields" : 3394,
                    "unyields" : 3394,
                    "invalidates" : 0,
                    "advanced" : 27463,
                    "needTime" : 28885,
                    "needFetch" : 0,
                    "isEOF" : 1,
                    "keyPattern" : "{ TreeNumberList.TreeNumber: 1 }",
                    "isMultiKey" : 1,
                    "boundsVerbose" : "field #0['TreeNumberList.TreeNumber']: [\"\", {}), [/^[A-Z]+$/, /^[A-Z]+$/]",
                    "yieldMovedCursor" : 0,
                    "dupsTested" : 56348,
                    "dupsDropped" : 28885,
                    "seenInvalidated" : 0,
                    "matchTested" : 0,
                    "keysExamined" : 56348,
                    "children" : [ ]
                }
            ]
        },
        "ts" : ISODate("2015-02-25T19:14:18.433Z"),
        "client" : "10.165.78.55",
        "allUsers" : [ ],
        "user" : ""
    }

Collection stats:

    {
        "ns" : "figure1.meshdescriptors",
        "count" : 27465,
        "size" : 355400384,
        "avgObjSize" : 12940,
        "storageSize" : 429027328,
        "numExtents" : 15,
        "nindexes" : 6,
        "lastExtentSize" : 114012160,
        "paddingFactor" : 1,
        "systemFlags" : 1,
        "userFlags" : 1,
        "totalIndexSize" : 37789472,
        "indexSizes" : {
            "_id_" : 899360,
            "ConceptList.Concept.TermList.Term.String_text" : 7031360,
            "ConceptList.Concept.TermList.Term.String_1" : 13032544,
            "ConceptList.Concept.TermList.Term.StringLower_1" : 12983488,
            "TreeNumberList.TreeNumber_1" : 3066000,
            "DescriptorUI_1" : 776720
        },
        "ok" : 1
    }
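The boundsVerbose line in the profile output above is the key detail: the index bounds for /^[A-Z]+$/ cover the entire string range of the index, ["", {}), because the regex is anchored but has no literal prefix. Below is a minimal shell sketch of the difference, using a hypothetical prefix regex /^A01/ for contrast (illustrative only; the explain output is paraphrased in comments rather than shown in full):

    // Anchored regex with a literal prefix: the planner can turn it into a
    // narrow index range, so only keys starting with "A01" are examined.
    db.meshdescriptors.find(
        { "TreeNumberList.TreeNumber": /^A01/ }
    ).explain(true);

    // Anchored regex with no literal prefix (the query from this ticket):
    // the bounds fall back to the whole string range of the index,
    //   field #0['TreeNumberList.TreeNumber']: ["", {}), [/^[A-Z]+$/, /^[A-Z]+$/]
    // so all 56348 multikey entries are examined to return 16 documents.
    db.meshdescriptors.find(
        { "TreeNumberList.TreeNumber": { $in: [ /^[A-Z]+$/ ] } }
    ).explain(true);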
richardpenner commented on Thu, 5 Mar 2015 14:42:24 +0000: Ramón, why does it need to scan the collection rather than just scanning the index? We're only returning 16 documents, so I don't see why the collection is loaded into memory at all. This doesn't seem to be "working as designed."

ramon.fernandez commented on Wed, 4 Mar 2015 00:21:58 +0000: richardpenner, the issue is not really the size of the index so much as the fact that your query can't benefit from the existence of such an index. Looking at the following data from the profile output:

    "nscanned" : 56348,
    "nscannedObjects" : 27463,
    "numYield" : 3246,
    "nreturned" : 16,

I see that to return 16 documents the server had to scan 56348 index keys and 27463 documents. The size of the index doesn't matter that much – the server needs to page in all 400MB+ of data from this collection, and every time it detects that data needs to be paged in it yields to avoid blocking for a long time waiting for I/O. This yielding gives other threads on the machine the opportunity to run, so the busier the machine, the longer the query will take. Using the touch command to make sure the data is in memory before use helps, and there's probably room for improvement in the server when it comes to calculating index bounds for regular expressions and yielding – but you may want to reconsider your schema so your queries can take advantage of an index. Since we're not able to provide support in the SERVER project, if you need further assistance I'd recommend you post details about your schema, indexes, and data access needs in the mongodb-user group. Regards, Ramón.

richardpenner commented on Mon, 2 Mar 2015 22:15:02 +0000: I'm a bit surprised this ticket is being closed. Our indexes are 37MB in size, yet they take 50 seconds to query when cold, and this is on a very large server. Is this really the behaviour we should expect from MongoDB?

ramon.fernandez commented on Mon, 2 Mar 2015 22:04:11 +0000: richardpenner, you can take a look at the code for more information about needFetch:

    // How many fetches weren't in memory? it's common.needFetch.

It indeed seems that the data you were querying needed to be paged in, which explains the behavior you're observing. You may want to look at the touch command to make sure the data you need is in memory before it is used. I'm now going to resolve this ticket as I don't see a bug in mongod, and the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. A question like this, which involves more discussion, would be best posted on the mongodb-user group. Regards, Ramón.
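Both of Ramón's comments above point to the touch command as a way to pre-warm cold data. As a minimal sketch of what that looks like for the collection in this ticket, assuming the MMAPv1 storage engine used by MongoDB 2.6 (the command pages data and/or index files into memory; it does not change the query plan):

    // Pre-warm the collection's data files and all of its indexes
    // before the query runs (run against the "figure1" database).
    use figure1
    db.runCommand({ touch: "meshdescriptors", data: true, index: true })

This only helps until the pages are evicted again, which is why Ramón also suggests reconsidering the schema so the query can use tight index bounds.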
richardpenner commented on Mon, 2 Mar 2015 20:24:44 +0000: Thanks for the help. The trickiest part about this bug is that it occurs suddenly, and once it has occurred the data is clearly loaded into memory, so subsequent queries are very fast. The queries I mentioned above are the same queries. Here's the output from the command you requested (small note: I had to add a $ to the "natural" hint you typed):

    db.meshdescriptors.find(
        { "TreeNumberList.TreeNumber": { $in: regExp } }
    ).hint({ $natural: 1 }).explain(true)
    {
        "cursor" : "BasicCursor",
        "isMultiKey" : false,
        "n" : 16,
        "nscannedObjects" : 27465,
        "nscanned" : 27465,
        "nscannedObjectsAllPlans" : 27465,
        "nscannedAllPlans" : 27465,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 214,
        "nChunkSkips" : 0,
        "millis" : 296,
        "allPlans" : [
            {
                "cursor" : "BasicCursor",
                "isMultiKey" : false,
                "n" : 16,
                "nscannedObjects" : 27465,
                "nscanned" : 27465,
                "scanAndOrder" : false,
                "indexOnly" : false,
                "nChunkSkips" : 0
            }
        ],
        "server" : "ip-10-165-78-55:27017",
        "filterSet" : false,
        "stats" : {
            "type" : "COLLSCAN",
            "works" : 27467,
            "yields" : 214,
            "unyields" : 214,
            "invalidates" : 0,
            "advanced" : 16,
            "needTime" : 27450,
            "needFetch" : 0,
            "isEOF" : 1,
            "docsTested" : 27465,
            "children" : [ ]
        }
    }

Can you explain how you knew the data was cold? I've been trying to figure out the meaning of the output, but many fields, like "needFetch", seem to be undocumented. I think what's most confusing is that this seems like a really small collection: only 37MB of indexes and 377MB of total storageSize. This collection is for a currently unreleased user-facing feature, so at the moment it's accessed very little, which may explain why the data is cold. But it's hard for me to ship the feature hoping the data will eventually stay in memory, especially if the penalty is a 50-second wait. Thanks Ramón.

ramon.fernandez commented on Mon, 2 Mar 2015 19:44:37 +0000: richardpenner, the query you posted is doing a full index scan of a multikey index, which may be worse than doing a full collection scan. The profiler output also shows that the system was accessing cold data, so if the complete TreeNumberList.TreeNumber index had to be paged into memory, that would explain why this query took over 50s to complete. Could you clarify whether the query returning in a few milliseconds and the one taking over 50s are the exact same query, or are they different queries? Also, can you try running

    var regExp = [new RegExp("^[A-Z]+$")];
    models.MeSH.find({ "TreeNumberList.TreeNumber": { $in: regExp } }).hint({ natural: 1 }).explain(true)

and post the output here? Thanks, Ramón.

richardpenner commented on Wed, 25 Feb 2015 20:04:36 +0000: This is on version 2.6.7
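As one possible illustration of the schema change Ramón suggests (a sketch only; the field name HasTopLevelTreeNumber and the backfill below are assumptions, not part of the original schema or this ticket), the result of the regex could be precomputed at write time so the hot-path query becomes an indexed equality match:

    // Hypothetical one-time backfill: flag the documents whose
    // TreeNumberList.TreeNumber contains a purely alphabetic (top-level) code.
    // This pays the slow regex scan once, outside the request path.
    db.meshdescriptors.update(
        { "TreeNumberList.TreeNumber": { $in: [ /^[A-Z]+$/ ] } },
        { $set: { HasTopLevelTreeNumber: true } },
        { multi: true }
    );
    db.meshdescriptors.ensureIndex({ HasTopLevelTreeNumber: 1 });

    // The application query is then an equality match with tight index bounds
    // instead of a scan of every key in the multikey index:
    db.meshdescriptors.find({ HasTopLevelTreeNumber: true });

The application would also have to maintain the flag on inserts and updates; whether that trade-off is worthwhile depends on the access pattern, which is the kind of discussion Ramón redirects to the mongodb-user group.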