...
First, excuse me for the uninformative description. The issue does not happen often, so I haven't collected more information. During some rough testing the following weird behaviour was observed:

Test configuration: AWS i3.2xlarge (2TB NVMe), Ubuntu 14.04, MongoDB 3.6.3. A collection with 3 fields, {_id: , touch: , kv: }, with a TTL index on touch with expireAfterSeconds of 12 hours. "Upserts" are done with db.ctx.update({_id: }, {$set: {"kv.": , "touch": }}); the blobs are near 512 bytes (a msgpack-coded structure), is near 40 bytes and highly random. Upserts come in two streams: the first is ingesting a huge set of records, so it runs at high speed (~12k per second), and the second is replaying replicated commands (~2k per second). At the same time, documents are randomly requested by _id (from the same stream of replicated commands, at ~10k per second).

Weird behaviour: in some rare cases mongod starts to consume a lot of system CPU (instead of user CPU). strace showed a lot of calls to the sched_yield system call. The number of context switches jumps to 3 million per second instead of the usual 80 thousand. The first time it happened, the on-disk database size had reached 64GB. Currently, after a couple of days, the on-disk size is 234GB and ingestion has finished. The data size is starting to decrease (because of expiration), and this behaviour occurs more often. (The picture with process stats is from the time when ingestion stopped; the "perf top" screenshot is from the time when I first encountered the behaviour.)
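For concreteness, here is a minimal mongo shell sketch of the workload described above. The collection name ctx, the field names touch and kv, and the 12-hour TTL come from the description; the _id value, the kv key, and the BinData blob are placeholders, since the actual values are elided in the report, and upsert:true is implied by the word "upsert" rather than shown explicitly.

// TTL index on "touch": documents expire 12 hours after their "touch" timestamp
db.ctx.createIndex({touch: 1}, {expireAfterSeconds: 12 * 3600})

// Upsert pattern from the description; "exampleId" and "someKey" are illustrative,
// and the BinData value stands in for a ~512-byte msgpack-coded blob
db.ctx.update(
    {_id: "exampleId"},
    {$set: {"kv.someKey": BinData(0, "3q2+7w=="), touch: new Date()}},
    {upsert: true}
)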
dmitry.agranat commented on Wed, 9 May 2018 07:05:15 +0000: Hi funny_falcon, I have analyzed the tests performed on April 20th. After upgrading to 3.6.4, I do not see any significant difference compared to the test performed on 3.6.3: the same drops in operation throughput and latency, the same 70% kernel CPU, and the same high ~2.6M context switches. All of this is correlated with the checkpoint scrub dirty target and checkpoint blocking page eviction. Even though there are still periodic drops in operation throughput and latency, after setting eviction_checkpoint_target=0 none of the above indicators are present. The high kernel CPU was replaced with high iowait and only up to 160K context switches. The most visible changes are all related to IO, which is where the bottleneck has moved. There is also some indication of data/journal contention, since both are colocated on the same volume. For improved performance, please review our production notes, specifically, this section. Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag. A question like this involving more discussion would be best posted on the mongodb-user group. Thanks, Dima

JIRAUSER1254842 commented on Fri, 20 Apr 2018 20:26:23 +0000: Today we encountered the issue in production when we tried to shard a big collection across two shards. Production is running MongoDB 3.6.1. Since we need to shard an even larger collection, I have to return to this investigation, so I'm repeating the experiment with MongoDB 3.6.4 (to discover whether an upgrade will help us or not). So far, it doesn't look like the upgrade will help. I started with a clean mongod on an i3.xlarge Amazon instance and pushed data at an average rate of 9ktps with selects at an average rate of 6ktps, starting near 18:16. Without adding `eviction_checkpoint_target=0`, 3.6.4 starts to spike very early: the first spike of context switches was at 18:32:19. At that moment MongoDB had consumed only 7.2GB of memory, while it consumes 13.1GB at peak. Spikes: 18:32:19, 18:35:50, 18:39:22, 18:40:36, 18:41:47, 18:44:07, 18:45:18, 18:45:45, 18:49:58, 18:51:11, 18:51:11, 18:52:28, 18:53:47, 18:55:04, 18:56:20. During a spike some requests are stalled for more than 1 second. Screenshot during 18:41:47: Then I stopped mongod (at 18:57), added `eviction_checkpoint_target=0`, and started it again (19:09). At first it performed adequately: there were not so many context switches during stalls, and no system time was used. But once mongod filled more memory, it became clear that it still stalls, only now it stalls on iowait for disk operations. Notable stalls: 20:05:04, 20:06:50, 20:08:36, 20:10:24, and earlier, smaller ones: 19:40:40, 19:42:15, 19:43:50. During these stalls some requests freeze for about 1.5-2.0 seconds. Screenshot during 20:10:24: These stalls and spikes are not as huge as what I saw in the previous experiment, but this experiment has only just started, and I'm not optimistic about its future. I don't see how I can reopen this issue. Could you do it, please?

michael.cahill commented on Mon, 16 Apr 2018 20:59:33 +0000: We haven't heard back from you about the recommended configuration change, so I am going to resolve this ticket. If you have any further issues with excessive yielding, please reopen and follow up.
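As a side note on the eviction and checkpoint pressure discussed in dmitry.agranat's analysis above, the relevant WiredTiger cache counters can be sampled from the mongo shell. This is only a sketch: the exact statistic names differ between WiredTiger versions, so it filters by name rather than hard-coding any particular counter.

// Dump WiredTiger cache statistics whose names mention eviction or dirty data.
// db.serverStatus().wiredTiger.cache is present whenever WiredTiger is the engine.
var cache = db.serverStatus().wiredTiger.cache;
Object.keys(cache)
      .filter(function (k) { return /evict|dirty/i.test(k); })
      .forEach(function (k) { print(k + ": " + cache[k]); });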
michael.cahill commented on Thu, 22 Mar 2018 10:30:17 +0000: funny_falcon, after examining all of the data that you have attached to this ticket, it looks like the default configuration of checkpoints is causing these inefficient cases. You can disable the code causing this with the following in your config file:

storage:
  wiredTiger:
    engineConfig:
      configString: eviction_checkpoint_target=0

Based on the data you have attached, I would not expect this change to cause problems, but you should verify locally before making this change in production. I am not completely sure about the cause of the segfault based on the information available. I think it could be caused by the bug fixed in WT-3952, which should be in the next release of MongoDB.

JIRAUSER1254842 commented on Tue, 20 Mar 2018 09:09:07 +0000: I've attached the latest diagnostic file.

JIRAUSER1254842 commented on Tue, 20 Mar 2018 09:05:32 +0000: Today mongod segfaulted. Unfortunately, I didn't tune the core path, so Ubuntu's apport (still) tries to write the crash dump to the root partition, and I believe there is not enough space for it. Last lines in the log:

2018-03-20T04:55:35.133+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:35526 #764 (283 connections now open)
2018-03-20T08:43:39.023+0000 F -        [thread766] Invalid access at address: 0xe8
2018-03-20T08:43:39.054+0000 F -        [thread766] Got signal: 11 (Segmentation fault).
 0x56322cc15071 0x56322cc14289 0x56322cc148f6 0x7fca0f9c8390 0x56322b4fded1 0x56322b500e6b 0x56322b480674 0x56322b4815e4 0x56322b479941 0x56322b479e73 0x56322b480277 0x56322b4cf1a9 0x7fca0f9be6ba 0x7fca0f6f441d
----- BEGIN BACKTRACE -----
mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x56322cc15071]
mongod(+0x2211289) [0x56322cc14289]
mongod(+0x22118F6) [0x56322cc148f6]
libpthread.so.0(+0x11390) [0x7fca0f9c8390]
mongod(+0xAFAED1) [0x56322b4fded1]
mongod(__wt_split_reverse+0x11B) [0x56322b500e6b]
mongod(+0xA7D674) [0x56322b480674]
mongod(__wt_evict+0xF04) [0x56322b4815e4]
mongod(+0xA76941) [0x56322b479941]
mongod(+0xA76E73) [0x56322b479e73]
mongod(__wt_evict_thread_run+0x57) [0x56322b480277]
mongod(+0xACC1A9) [0x56322b4cf1a9]
libpthread.so.0(+0x76BA) [0x7fca0f9be6ba]
libc.so.6(clone+0x6D) [0x7fca0f6f441d]

JIRAUSER1254842 commented on Mon, 19 Mar 2018 17:56:26 +0000: I've attached diagnostic_data and stack traces. Stacks were taken (or at least attempted) at 10-second intervals. All stacks at 17:36 and 17:37 were taken while the issue was in progress. The last 3 stack traces (at 17:38) were collected when system CPU was decreasing but total CPU usage was still 100%. Some of the earlier traces were also taken while the issue was in progress, but one was not, and I don't remember which (most probably the one at 17:34:42). There is only a single mongod process. By the way, when the issue is happening, request latency can be as large as 2 seconds, even for lookups by _id. Every occurrence of the issue increases latency, and some occurrences are that bad (at least 10 times today).

ramon.fernandez commented on Mon, 19 Mar 2018 12:13:23 +0000: If this has happened in the last 7 days or so, can you please upload the contents of dbpath/diagnostic.data to this ticket? Also, can you please clarify how many mongod processes are running on this computer? I'm attaching below a small shell script to capture stack traces that you can run if this happens again.
This will also provide additional information:

#!/bin/bash
DELAY=15
for i in {1..5}
do
    gdb -p $(pidof mongod) -batch -ex 'thread apply all bt' > /tmp/stacks-$(date '+%Y%m%dT%H.%M.%S').txt
    sleep $DELAY
done

JIRAUSER1254842 commented on Sun, 18 Mar 2018 15:37:11 +0000: I forgot to mention that it looks to correlate with the IO latency on the stat screenshot, and, probably, with the TTL thread doing its work.
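Regarding the suspected TTL-thread correlation in the last comment, one way to check it from the mongo shell is to watch the TTL counters and, purely as a temporary experiment, pause the TTL monitor via the documented ttlMonitorEnabled server parameter. This is only a sketch: expired documents accumulate while the monitor is off, so it should be re-enabled afterwards.

// TTL monitor activity so far (number of passes and documents deleted)
printjson(db.serverStatus().metrics.ttl)

// Temporarily pause the TTL monitor to see whether the stalls stop tracking it
db.adminCommand({setParameter: 1, ttlMonitorEnabled: false})
// ... observe whether the context-switch spikes still occur, then turn it back on ...
db.adminCommand({setParameter: 1, ttlMonitorEnabled: true})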