...
Tests were run on Ubuntu under VMWare with 6 cores, using a heavy mixed workload (read, update, insert) with 100 clients and a recent pre-rc4 build from master (6105f06402fe1e7578d41f3e4e583a1476ef2455); the behavior was subsequently reproduced under rc4.

Run with a replica set of 1: throughout the run the mongod write queues (second line of the graph) sporadically build up, indicating a write bottleneck in mongod. This may be the same issue as SERVER-16651. At a couple of points during the run (A, B, C), the mongod log reports isolated (single) write operations (updates or inserts) taking extreme times, up to 111891 ms in this test (you can read this off the "max logged query" graph: the vertical axis is the longest operation, in ms, logged in each 0.1-second interval). It is unknown whether this is the same bottleneck or a different issue, so this ticket is being opened to investigate. The pattern seen in the graph above, of roughly three such isolated extreme latencies of increasing size over the course of the run, is repeatable with this workload; the length of each one is roughly double that of the preceding one.

For comparison, here is the same workload standalone: write queues do not build. We see pauses in WT (e.g. A, B; the subject of SERVER-16662). Those pauses are less evident in the replica set case above, probably because the mongod bottleneck in that case throttles the workload going into WT. Associated with the pauses are numerous long-latency operations of around 11 seconds, but no extreme long-latency operations like those seen in the preceding test. Overall throughput is much higher in the standalone case, possibly because of the bottleneck in the replica set case.

Here are the specific queries, two updates and one insert, that showed extreme latencies:

2014-12-24T12:17:14.182-0500 I QUERY [conn95] command testsrv.$cmd command: update { update: "data", writeConcern: {}, ordered: true, updates: [ { q: { _id: { o: ObjectId('549af4f61d41c883ae5622d2'), t: "This is a string, not a small string either, it's in here to make the index larger", seq: -1729382253017956352 } }, u: { $set: { f0: -1097749151, f1: -1097691211, f2: "bkfiv Wfi vWfiv Wf ivW fivW fivW fiv Wfiv Wf i ", f3: "dsz szs zsz szs zs zszszs z szsz szszs z szs zs", f4: "xa jmH szsz sz sz sz sz szszs zszsz sz s zszszs z", f5: "szsz szs zsz szs zs zszszs z szsz szszs z szs zs", f6: "ivWfi vWfi vWf ivW fiv Wf i vW f ivW f ivWfiv W", f7: "yvWfiv Wfiv Wf iv Wf iv WfivW fivWf iv W fivWfi v", f8: "mH s zszs zsz szs zsz sz s zs z szs z szszsz s", f9: "rqjmH szs zszsz sz szs zszs zszs zsz szsz sz s ", fa: 47242, fb: 91501, fc: "ajmH s zszs zs zszsz sz szsz szsz szszs zszsz sz", fd: "mHszsz szsz sz sz sz sz szszs zszsz sz s zszszs z", fe: "uHsz szs zsz szs zs zszszs z szsz szszs z szs zs", ff: "zszsz szs zszsz sz szs zszs zszs zsz szsz sz s ", f10: "kfivW f i v Wf ivWfi vW fi v Wfiv WfivW f ivW fi", f11: "tkfiv Wfi v W f i vWfiv Wf iv WfivW fivWf ivWfi vW", f12: "zszs zszs zsz szs zsz sz s zs z szs z szszsz s", f13: "pKNkf ivW fivWf iv Wfi vWfi vWfi vWf ivWf iv W ", f14: 51733, f15: 90312, f16: "qjmHs zs zszs zsz s zsz szs zsz szs zszsz sz szsz ", f17: "yv W fivW fiv Wfi vWf iv W fi v Wfi v WfivWf i", f18: "bkf i vWfiv Wf ivWf ivW fiv W f iv Wfi v WfivW fi", f19: "ajmHs zs zszs zsz s zsz szs zsz szs zszsz sz szsz ", f1a: "bkfivW fivW fiv Wfi vWf iv W fi v Wfi v WfivWf i", f1b: "cnajmH szs zszsz sz szs zszs zszs zsz szsz sz s ", f1c: "cn a jmHs zsz szs zsz sz s zs z szs z szszsz s", f1d: "gt k fivW fiv Wfi vWf iv W fi v Wfi v WfivWf i" } }, multi: false, upsert: false } ] } ntoreturn:1 keyUpdates:0 reslen:95 24494ms

2014-12-24T12:18:06.958-0500 I QUERY
[conn100] command testsrv.$cmd command: insert { insert: "data", writeConcern: {}, ordered: true, documents: [ { _id: { o: ObjectId('549af4f61d41c883a908b7c2'), t: "This is a string, not a small string either, it's in here to make the index larger", seq: 3819052484899373056 }, f0: 1826216753, f1: 1826277117, f2: "bkfivW fivW fiv Wfi vWf iv W fi v Wfi v WfivWf i", f3: "lKNkf iv W f ivW f ivWf ivWf ivWf ivWfiv WfivW fi", f4: "wNkfi vWfi vWf ivW fiv Wf i vW f ivW f ivWfiv W", f5: "qj m Hszs zsz szs zsz sz s zs z szs z szszsz s", f6: "xajmH szs zszsz sz szs zszs zszs zsz szsz sz s ", f7: "oHszs zs zszs zsz s zsz szs zsz szs zszsz sz szsz ", f8: "zsz szs zsz szs zs zszszs z szsz szszs z szs zs", f9: "sz s zszs zsz szs zsz sz s zs z szs z szszsz s", fa: 54913, fb: 94391, fc: "szsz szs zsz szs zs zszszs z szsz szszs z szs zs", fd: "ivWfi vW fivW fiv W fiv Wfi vWf ivW fivWf iv Wfiv ", fe: "jmHszs zszs zsz szs zsz sz s zs z szs z szszsz s", ff: "fivWf iv W f ivW f ivWf ivWf ivWf ivWfiv WfivW fi", f10: "rq jmH szsz sz sz sz sz szszs zszsz sz s zszszs z", f11: "wNkfi vW fivW fiv W fiv Wfi vWf ivW fivWf iv Wfiv ", f12: "lKN k fivWf iv Wfiv Wfi vWf i v Wf ivW f ivWfi vW", f13: "vWf i vWfiv Wf ivWf ivW fiv W f iv Wfi v WfivW fi", f14: 50071, f15: 107864, f16: "mHsz szs zszsz sz szs zszs zszs zsz szsz sz s ", f17: "ezszsz szsz sz sz sz sz szszs zszsz sz s zszszs z", f18: "yvWfiv Wfiv Wf iv Wf iv WfivW fivWf iv W fivWfi v", f19: "tkfiv Wf i v Wfi v Wfiv Wfiv Wfiv WfivWf ivWfi vW", f1a: "ivWfi vWfi vWf ivW fiv Wf i vW f ivW f ivWfiv W", f1b: "fivWf ivW fivWf iv Wfi vWfi vWfi vWf ivWf iv W ", f1c: "mHsz szs zszsz sz szs zszs zszs zsz szsz sz s ", f1d: "oHsz szs zszsz sz szs zszs zszs zsz szsz sz s " } ] } ntoreturn:1 keyUpdates:0 reslen:80 52917ms 2014-12-24T12:19:58.656-0500 I QUERY [conn39] command testsrv.$cmd command: update { update: "data", writeConcern: {}, ordered: true, updates: [ { q: { _id: { o: ObjectId('549af4f61d41c8837743de22'), t: "This is a string, not a small string either, it's in here to make the index larger", seq: 3602879702735257600 } }, u: { $set: { f0: -1097747785, f1: -1097704122, f2: "rqj m Hszsz sz szsz szs zsz s z sz szs z szszs zs", f3: "mHszs zszsz sz szsz szs zsz s z sz szs z szszs zs", f4: "bkfivW fivW fiv Wfi vWf iv W fi v Wfi v WfivWf i", f5: "vWfiv Wfi vWfiv Wf ivW fivW fivW fiv Wfiv Wf i ", f6: "fivW fivW fiv Wfi vWf iv W fi v Wfi v WfivWf i", f7: "xajmH sz s z szs z szsz szsz szsz szszsz szszs zs", f8: "jmH szs zsz szs zs zszszs z szsz szszs z szs zs", f9: "ajmH s zszs zs zszsz sz szsz szsz szszs zszsz sz", fa: 44944, fb: 82734, fc: "qjmHs zs zszs zsz s zsz szs zsz szs zszsz sz szsz ", fd: "uH s zszs zsz szs zsz sz s zs z szs z szszsz s", fe: "mHszsz szsz sz sz sz sz szszs zszsz sz s zszszs z", ff: "sz s zszs zsz szs zsz sz s zs z szs z szszsz s", f10: "dszszs zszs zs zszsz sz szsz szsz szszs zszsz sz", f11: "tkfiv Wfi vWfiv Wf ivW fivW fivW fiv Wfiv Wf i ", f12: "ezszs zszsz sz szsz szs zsz s z sz szs z szszs zs", f13: "ivWf ivW fiv Wfi vW fivWfi v Wfiv WfivW f ivW fi", f14: 40990, f15: 88315, f16: "oHsz szs zszsz sz szs zszs zszs zsz szsz sz s ", f17: "ivWfi vW fivW fiv W fiv Wfi vWf ivW fivWf iv Wfiv ", f18: "xa jmH szsz sz sz sz sz szszs zszsz sz s zszszs z", f19: "iv W fivW fiv Wfi vWf iv W fi v Wfi v WfivWf i", f1a: "fivW fivW fiv Wfi vWf iv W fi v Wfi v WfivWf i", f1b: "bk fiv Wfiv Wf iv Wf iv WfivW fivWf iv W fivWfi v", f1c: "kfivWf ivWf iv Wf iv Wf ivWfi vWfiv Wf i vWfivW f", f1d: "bkfi vWfi vWf ivW fiv Wf i vW f ivW f ivWfiv W" } }, multi: false, upsert: false 
} ] } ntoreturn:1 keyUpdates:0 reslen:95 111891ms
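The three operations above were pulled out of the mongod log by looking for the elapsed-time field that terminates each slow-operation line ("24494ms", "52917ms", "111891ms"). A minimal sketch of that kind of scan is below; it is a hypothetical helper rather than part of the test harness, and it assumes only that each slow-op line ends with the duration as "<N>ms" and begins with the timestamp, as in the excerpts above.

#!/usr/bin/env python
# Hypothetical helper: scan a mongod log for slow operations.
# Assumes each slow-op line ends with the elapsed time as "<N>ms", as in the
# excerpts above; the threshold and invocation are made up for illustration.
import re
import sys

DURATION_RE = re.compile(r'(\d+)ms\s*$')

def slow_ops(path, threshold_ms=10000):
    """Yield (timestamp, duration_ms, line) for ops slower than threshold_ms."""
    with open(path) as f:
        for line in f:
            m = DURATION_RE.search(line)
            if not m:
                continue
            ms = int(m.group(1))
            if ms >= threshold_ms:
                # The timestamp is the first whitespace-separated token.
                ts = line.split(None, 1)[0]
                yield ts, ms, line.rstrip()

if __name__ == '__main__':
    for ts, ms, line in slow_ops(sys.argv[1]):
        print('%s %8d ms  %s...' % (ts, ms, line[:120]))

Run against the log from the replica set test, a scan like this reports the three operations quoted above, plus the cluster of roughly 11-second operations seen in the standalone run.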
michael.cahill commented on Tue, 6 Jan 2015 03:56:26 +0000:
The issues about scaling with many more threads than cores are captured in SERVER-16662.

xgen-internal-githook commented on Fri, 2 Jan 2015 23:32:54 +0000:
Author: Michael Cahill (michaelcahill) <michael.cahill@wiredtiger.com>
Message: SERVER-16667 Fix WT truncate of all records in a capped collection.
Branch: master
https://github.com/mongodb/mongo/commit/53d7bce974d18fc4a92262de5ac6e604b79d8a04

xgen-internal-githook commented on Fri, 2 Jan 2015 05:24:20 +0000:
Author: Michael Cahill (michaelcahill) <michael.cahill@wiredtiger.com>
Message: SERVER-16667 Only truncate if we find some documents.
Branch: master
https://github.com/mongodb/mongo/commit/2f6817253ebfef52f06c55f5a58a799997e014ad

xgen-internal-githook commented on Fri, 2 Jan 2015 04:33:26 +0000:
Author: Michael Cahill (michaelcahill) <michael.cahill@wiredtiger.com>
Message: SERVER-16667 Faster capped collections in WT. Use a single WiredTiger truncate operation rather than multiple delete operations to delete from a capped collection.
Branch: master
https://github.com/mongodb/mongo/commit/1b522cf5745c40754f9d5aa4f4e35def4e8fee74

xgen-internal-githook commented on Mon, 29 Dec 2014 05:31:49 +0000:
Author: Michael Cahill (michaelcahill) <michael.cahill@wiredtiger.com>
Message: SERVER-16667 Have threads block if they insert enough additional records while a capped collection is being truncated.
Branch: capped-truncate
https://github.com/mongodb/mongo/commit/e15538f67138812d647d047efea4c3fb5e5023de

xgen-internal-githook commented on Mon, 29 Dec 2014 05:31:48 +0000:
Author: Michael Cahill (michaelcahill) <michael.cahill@wiredtiger.com>
Message: SERVER-16667 Use a single truncate operation to manage capped collections rather than multiple deletes. Bound the work done managing capped collections for any individual update to avoid long latencies if deletes can't keep up with inserts.
Branch: capped-truncate
https://github.com/mongodb/mongo/commit/79a784b2809ad1ed7ce5be361f66f86d8e7049ea
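To make the idea described in those commit messages concrete: excess documents in a capped collection are allowed to accumulate and are then removed in one range truncate, bounding the work any single writer does, with writers blocking only if they get far ahead of the trimming. The following is a simplified, illustrative Python model of that idea only; it is not the mongod/WiredTiger implementation, and the class name, thresholds, and locking here are invented.

import threading

class CappedCollectionModel:
    """Toy model of amortized capped-collection trimming (illustrative only).

    Instead of deleting one old document per insert, excess documents are
    allowed to accumulate; once a batch threshold is reached, one writer
    removes the whole excess range in a single truncate. Other writers keep
    inserting during the truncate and block only if they get too far ahead.
    """
    TRUNCATE_BATCH = 128      # invented: trim once this many excess docs exist
    HARD_LIMIT_FACTOR = 4     # invented: block writers at 4x the batch size

    def __init__(self, max_docs):
        self.max_docs = max_docs
        self.docs = []                       # stand-in for the record store, oldest first
        self.lock = threading.Lock()
        self.trimmed = threading.Condition(self.lock)
        self.truncating = False

    def _excess(self):
        return len(self.docs) - self.max_docs

    def insert(self, doc):
        with self.lock:
            self.docs.append(doc)
            # Bounded wait: block only while a truncate is in progress and we
            # are still far ahead of it.
            while self.truncating and self._excess() >= self.TRUNCATE_BATCH * self.HARD_LIMIT_FACTOR:
                self.trimmed.wait()
            if self.truncating or self._excess() < self.TRUNCATE_BATCH:
                return                       # another writer will trim, or nothing to trim yet
            self.truncating = True
            end = self._excess()             # everything older than this index is excess
        try:
            # One range truncate instead of `end` individual deletes.
            with self.lock:
                del self.docs[:end]
        finally:
            with self.lock:
                self.truncating = False
                self.trimmed.notify_all()

In this model a writer pays the trimming cost only once per TRUNCATE_BATCH inserts, which is what bounds the per-operation latency even when deletes would otherwise fall behind inserts.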
bruce.lucas@10gen.com commented on Fri, 26 Dec 2014 15:06:07 +0000:
Looks like the isolated ops with extreme latencies are stuck inside WT, not in the mongod layer.

Here's a long op that ran for 78 seconds on conn76, from about 09:34:43 to 09:36:01:

2014-12-26T09:36:01.448-0500 I QUERY [conn76] command testsrv.$cmd command: update { update: "data", writeConcern: {}, ordered: true, updates: [ { q: { _id: { o: ObjectId('549d718a1d41c848573ac682'), t: "This is a string, not a small string either, it's in here to make the index larger", seq: -1297036684394758144 } }, u: { $set: { f0: -295740523, f1: -295695630, f2: "dszsz szs zsz szs zs zszszs z szsz szszs z szs zs", f3: "jmHsz sz s z szs z szsz szsz szsz szszsz szszs zs", f4: "hgt kfi vWf ivW fi vWfivW f ivWf ivWfi v Wfi vW", f5: "uHsz s zszs zs zszsz sz szsz szsz szszs zszsz sz", f6: "rqjmH szs zsz szs zs zszszs z szsz szszs z szs zs", f7: "qj m Hszs zsz szs zsz sz s zs z szs z szszsz s", f8: "kfivW f i v Wf ivWfi vW fi v Wfiv WfivW f ivW fi", f9: "wNkfi v W f iv WfivW fi vW f ivWf ivWfi v Wfi vW", fa: 38995, fb: 103702, fc: "ds zsz szsz sz sz sz sz szszs zszsz sz s zszszs z", fd: "uHszs zszs zsz szs zsz sz s zs z szs z szszsz s", fe: "jm Hsz szsz sz sz sz sz szszs zszsz sz s zszszs z", ff: "kfivWf ivWf iv Wf iv Wf ivWfi vWfiv Wf i vWfivW f", f10: "gtkfiv Wfiv Wf iv Wf iv WfivW fivWf iv W fivWfi v", f11: "zsz szs zsz szs zs zszszs z szsz szszs z szs zs", f12: "ezszs zszs zsz szs zsz sz s zs z szs z szszsz s", f13: "uHsz s zszs zs zszsz sz szsz szsz szszs zszsz sz", f14: 52194, f15: 104076, f16: "fivWf ivW fivWf iv Wfi vWfi vWfi vWf ivWf iv W ", f17: "mHsz szs zsz szs zs zszszs z szsz szszs z szs zs", f18: "naj mHs zsz szs zs zszszs z szsz szszs z szs zs", f19: "gtkfiv Wfiv Wf iv Wf iv WfivW fivWf iv W fivWfi v", f1a: "mH s zszs zsz szs zsz sz s zs z szs z szszsz s", f1b: "hgtk fivW fiv Wfi vWf iv W fi v Wfi v WfivWf i", f1c: "oHsz szs zsz szs zs zszszs z szsz szszs z szs zs", f1d: "najmH szs z s z s zszsz sz sz szszs zszsz szszs zs" } }, multi: false, upsert: false } ] } ntoreturn:1 keyUpdates:0 reslen:95 78144ms

I had enhanced the "end connection" message to include the tid at the end of the message, which is 18578 in this case:

2014-12-26T09:36:01.449-0500 I NETWORK [conn76] end connection 127.0.0.1:33528 (1 connection now open) 18578

The above run was done with gdb sampling every 5 seconds throughout the run. Focusing on the call tree of those stack samples for just tid (LWP) 18578, we see that it spends its time during that long-running op (marked with blue lines below) doing sched_yield in one of three places inside WT, all within __wt_log_write.

Also attaching the raw stack trace samples from gdb as gdbmon.log, from which the above call tree was generated; there we can double-check that LWP 18578 is seen in __wt_log_write calling sched_yield in every sample from 09:34:46.342897 onwards, corresponding to the time that the op was stuck.
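For reference, a per-thread summary like the call tree described above can be produced from raw gdb output with a small filter along the lines of the sketch below. This is only an assumption-laden sketch: the exact layout of gdbmon.log is not reproduced here, so the script assumes gdb's usual "Thread ... (LWP <tid>)" headers and "#<n>  0x... in <function> (...)" frame lines.

import re
import sys
from collections import Counter

# Hypothetical parser for raw gdb "thread apply all bt" samples such as gdbmon.log.
# Assumes gdb's usual "Thread N (Thread 0x... (LWP <tid>))" headers and
# "#<k>  0x... in <function> (...)" frame lines; the real file layout may differ.
THREAD_RE = re.compile(r'LWP (\d+)')
FRAME_RE = re.compile(r'#\d+\s+(?:0x[0-9a-f]+\s+in\s+)?(\S+)')

def frame_counts(path, lwp):
    """Count how often each function appears in stack samples for one LWP."""
    counts = Counter()
    current = None
    with open(path) as f:
        for line in f:
            m = THREAD_RE.search(line)
            if m:
                current = m.group(1)
                continue
            if current == str(lwp):
                m = FRAME_RE.match(line.strip())
                if m:
                    counts[m.group(1)] += 1
    return counts

if __name__ == '__main__':
    for func, n in frame_counts(sys.argv[1], int(sys.argv[2])).most_common(10):
        print('%6d  %s' % (n, func))

Run as, for example, "python stack_histogram.py gdbmon.log 18578" (invented script name), it would list sched_yield and __wt_log_write near the top if that thread is indeed stuck spinning in the log-write path.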