...
A two-shard MongoDB database regularly crashes with an out-of-memory error or is killed by the oom-killer. The system runs on GCE (Debian 9.4) with MongoDB v3.6.5 and the WiredTiger storage engine. The servers are n1-highmem-4 (4 vCPUs, 26 GB memory). Each server runs only mongod; there are no other services. The mongos routers are on different servers. Initially the servers had no swap, as is the practice on Google Cloud; a small 1 GB swap was added later, but the problem remains. We have experimented with the cacheSizeGB parameter and reduced it to 10 GB, but the crashes still happen. The process exit/crash usually happens at least once a day.

A chunk-moving process is currently underway on the database, and crashes happen only on the shard instances that receive chunks; the instances sending chunks have never crashed.

If the mongod process is killed by the oom-killer, this can be seen in the syslog:

Jun 15 14:45:17 server4 kernel: [1731430.432189] Out of memory: Kill process 13130 (mongod) score 980 or sacrifice child
Jun 15 14:45:17 server4 kernel: [1731430.441717] Killed process 13130 (mongod) total-vm:28280536kB, anon-rss:26174876kB, file-rss:0kB, shmem-rss:0kB

Sometimes mongod exits, leaving this in mongod.log:

2018-06-15T02:14:32.456+0200 F - [rsSync] out of memory.
0x55cbc8535751 0x55cbc8534d84 0x55cbc8623b4b 0x55cbc86c665c 0x55cbc70fccff 0x55cbc70f8b02 0x55cbc707b3f1 0x55cbc86449b0 0x7fbbf3507494 0x7fbbf3249acf
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"55CBC6305000","o":"2230751","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55CBC6305000","o":"222FD84","s":"_ZN5mongo29reportOutOfMemoryErrorAndExitEv"},{"b":"55CBC6305000","o":"231EB4B"},{"b":"55CBC6305000","o":"23C165C","s":"_Znam"},{"b":"55CBC6305000","o":"DF7CFF","s":"_ZN5mongo4repl8SyncTail7OpQueueC1Ev"},{"b":"55CBC6305000","o":"DF3B02","s":"_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_22ReplicationCoordinatorE"},{"b":"55CBC6305000","o":"D763F1","s":"_ZN5mongo4repl10RSDataSync4_runEv"},{"b":"55CBC6305000","o":"233F9B0"},{"b":"7FBBF3500000","o":"7494"},{"b":"7FBBF3161000","o":"E8ACF","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.6.5", "gitVersion" : "a20ecd3e3a174162052ff99913bc2ca9a839d618", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.9.0-6-amd64", "version" : "#1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07)", "machine" : "x86_64" }, "somap" : [ { "b" : "55CBC6305000", "elfType" : 3, "buildId" : "7D4592BDFAA6C15459D2319DEAB7F10E9EB4E7D7" }, { "b" : "7FFC48D98000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "A3207CC9FE1CAA3374AE7061AA5C3C5619B8A0E5" }, { "b" : "7FBBF4743000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "713D47D5F599289C0A91ADE8F0122B2B4AA78B2E" }, { "b" : "7FBBF42B0000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1", "elfType" : 3, "buildId" : "2CFE882A331D7857E9CE1B5DE3255E6DA76EF899" }, { "b" : "7FBBF4044000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.1", "elfType" : 3, "buildId" : "E2AA3B39763D943F56B3BD05C8E36E639BA95E12" }, { "b" : "7FBBF3E40000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "B895F0831F623C5F23603401D4069F9F94C24761" }, { "b" : "7FBBF3C38000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "5D83E0642E645026DBB11F89F7DF7106BD821495" }, { "b" : "7FBBF3934000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1B95E3A8B8788B07E4F59EE69B1877F9DEB42033" }, { "b" : "7FBBF371D000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "51AD5FD294CD6C813BED40717347A53434B80B7A" }, { "b" : "7FBBF3500000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "4285CD3158DDE596765C747AE210AB6CBD258B22" }, { "b" : "7FBBF3161000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "AA889E26A70F98FA8D230D088F7CC5BF43573163" }, { "b" : "7FBBF495A000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "263F909DBE11A66F7C6233E3FF0521148D9F8370" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x55cbc8535751]
 mongod(_ZN5mongo29reportOutOfMemoryErrorAndExitEv+0x84) [0x55cbc8534d84]
 mongod(+0x231EB4B) [0x55cbc8623b4b]
 mongod(_Znam+0x21C) [0x55cbc86c665c]
 mongod(_ZN5mongo4repl8SyncTail7OpQueueC1Ev+0x7F) [0x55cbc70fccff]
 mongod(_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_22ReplicationCoordinatorE+0x402) [0x55cbc70f8b02]
 mongod(_ZN5mongo4repl10RSDataSync4_runEv+0x111) [0x55cbc707b3f1]
 mongod(+0x233F9B0) [0x55cbc86449b0]
 libpthread.so.0(+0x7494) [0x7fbbf3507494]
 libc.so.6(clone+0x3F) [0x7fbbf3249acf]
----- END BACKTRACE -----

I suspect memory management is not working properly while chunks are being received.
bruce.lucas@10gen.com commented on Thu, 21 Jun 2018 18:01:02 +0000:
Hi Sasa, we have prepared a 3.6.6 (not 3.6.7 as I mistakenly typed above) pre-release tarball containing the fix for SERVER-35444 that you can download from here; click on the "binaries" link. Please install and run this with heap profiling enabled at your convenience. Thanks, Bruce

sasa.skevin@gmail.com commented on Thu, 21 Jun 2018 17:13:14 +0000:
Thanks for the update. Yes, I will try the 3.6.7 pre-release version, but I can't do that before Sunday. If a crash happens in the meantime, I will try to send the data.

bruce.lucas@10gen.com commented on Thu, 21 Jun 2018 15:33:24 +0000:
Thanks for pointing that out Sasa, and apologies for missing it. Unfortunately the profiling data collection wasn't successful due to SERVER-35444: if there are a lot of unique and/or large stack traces that need to be recorded by the profiling, it can overflow the 16 MB document size limit for the serverStatus document. The fix is to stop recording the stack traces themselves in serverStatus and record only the amount of memory used for each stack; the traces are not needed there because they are recorded in the log file. We've fixed this for 4.0, but have not backported it to 3.6. We are working to do that and prepare a 3.6.7 pre-release for you to install for collecting heap profile data on one node, if you are willing to do that. Meanwhile you might continue to run with heap profiling enabled in your current version of 3.6. In this case the serverStatus document is only 16977838 bytes, just a little larger than 16 MB, so it is possible that, due to random variation in the heap profiler samples, a future incident may not hit the 16 MB limit. Let me know if and when you have another incident you would like me to look at. Thanks for your help, Bruce

sasa.skevin@gmail.com commented on Wed, 20 Jun 2018 20:12:57 +0000:
The node has been a secondary for the last couple of hours, since it crashed. Before that it was a primary all night long and was receiving chunks. Could you please look at that period?

bruce.lucas@10gen.com commented on Wed, 20 Jun 2018 17:03:35 +0000:
Thanks Sasa. It looks like this node is not experiencing memory growth, as it is currently a secondary. In the previous data, in each case the node transitioned from secondary to primary about 6 hours after it was started, and at that point the memory growth began because the node started processing chunk moves. This is seen in the graph above, where the member state chart goes from yellow (secondary) to green (primary). I didn't look into why the node transitioned from secondary to primary; I assumed either there was some action you took to cause it to happen, or perhaps the current primary that was processing chunk moves hit OOM, causing a failover. In any case please allow this node to continue collecting heap profiling data until it becomes primary, or if necessary force a failover by stepping down the current primary.
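As a side note on the failover Bruce mentions above, the step-down can be issued from the mongo shell against the current primary; a minimal sketch, with the 120-second value chosen purely for illustration:

// Run against the current primary of the receiving shard's replica set.
// rs.stepDown(stepDownSecs) makes the primary relinquish its role for the given number of seconds,
// triggering an election so another member (e.g. the one collecting heap profiles) can become primary.
rs.stepDown(120)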
sasa.skevin@gmail.com commented on Wed, 20 Jun 2018 16:34:13 +0000:
New diagnostic data with the profiling option turned on has been uploaded to the secure portal.

bruce.lucas@10gen.com commented on Tue, 19 Jun 2018 19:11:13 +0000:
Some numbers: in one instance there were 430 chunk moves before crashing, and this accumulated about 15 GB of excess memory, or about 35 MB per chunk move. The rate at which the excess memory increased ("allocated minus wt cache") seems to roughly correspond to the rate at which chunk moves are occurring (_recvChunkStart), suggesting this is memory accumulated per chunk move. There are several things happening at that rate that are possible suspects; the heap profiler information should tell us.

bruce.lucas@10gen.com commented on Tue, 19 Jun 2018 19:02:44 +0000:
Hi Sasa, thanks for the data. It shows the node, as you say, repeatedly crashing after allocating 15 GB of memory aside from the WiredTiger cache. This allocated memory does indeed seem to be related to the chunk moves, as the rate at which it increases appears to be roughly proportional to the number of chunk moves executed. Unfortunately the normal data collection is not able to identify a specific cause. Would you be able to enable the built-in heap profiler on the primary of a shard that is receiving chunks? To do this, please restart mongod with the heapProfilingEnabled parameter set; for example, on the command line:

mongod --setParameter heapProfilingEnabled=true ...

Then, when the node encounters OOM, please upload the contents of diagnostic.data and the complete mongod log file covering the time from the restart to the OOM. Thanks, Bruce

sasa.skevin@gmail.com commented on Tue, 19 Jun 2018 18:33:13 +0000:
@Ramon Fernandez diagnostic data and the last two logs have been uploaded.

ramon.fernandez commented on Tue, 19 Jun 2018 12:02:58 +0000:
ssasa, will you please upload the contents of the diagnostic.data directory for the affected node as well as the full logs? You can use this secure portal. Thanks, Ramón.
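As a rough way to see how close the heap profiler pushes serverStatus to the 16 MB limit discussed above, the document's BSON size can be checked from the mongo shell; a minimal sketch, where the "heapProfile" field name is an assumption about where the profiler reports its samples:

// With heapProfilingEnabled=true, the profiler adds its statistics to serverStatus,
// which is what can overflow the 16 MB BSON limit described in SERVER-35444.
var ss = db.serverStatus();
print("serverStatus BSON size in bytes: " + Object.bsonsize(ss));   // e.g. 16977838 in the incident above
printjson(Object.keys(ss.heapProfile || {}));                       // assumed section name; lists what the profiler reports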
To reproduce:
1. Initiate chunk moving (see the sketch below).
2. Wait for several hours; mongod will either exit with an out-of-memory error or be killed by the oom-killer.
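For step 1, a chunk move can be initiated manually from a mongos; a minimal sketch, assuming a hypothetical sharded namespace test.coll, shard key myShardKey, and destination shard name shardB (in the incident described here the moves were driven by the ongoing migration rather than manual commands):

// Run on a mongos. Moves the chunk containing the given shard-key value to the named shard.
sh.moveChunk("test.coll", { myShardKey: 42 }, "shardB")

// Alternatively, let the balancer migrate chunks in the background.
sh.setBalancerState(true)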