...
We created an index on our cluster in the foreground, which blocked not only the database on which the index was being built, but also all other databases for the entire duration of the build. Here are the related log lines from the primary during the index build:

2020-01-30T17:09:25.975+0100 I INDEX [conn26758520] build index on: price_alert.priceAlert properties: { v: 2, key: { priceAlertId: 1 }, name: "priceAlertId", ns: "price_alert.priceAlert" }
2020-01-30T17:09:25.975+0100 I INDEX [conn26758520] building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2020-01-30T17:09:28.004+0100 I - [conn26758520] Index Build: 74000/22140565 0%
...
2020-01-30T17:16:06.002+0100 I - [conn26758520] Index Build: 21622800/22140565 97%
2020-01-30T17:16:18.046+0100 I INDEX [conn26758520] build index done. scanned 22140565 total records. 412 secs
2020-01-30T17:16:18.048+0100 I COMMAND [conn26758520] command price_alert.$cmd command: createIndexes { createIndexes: "priceAlert", indexes: [ { key: { priceAlertId: 1 }, name: "priceAlertId", ns: "price_alert.priceAlert" } ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("1e225ae9-0a74-4a71-99ca-dcca84b62fb5"), uid: BinData(0, CA28DAE5564B626F92E2D21DD43742531AE0FED5604FFFEB8AB31BB5F101B7EC) }, $clusterTime: { clusterTime: Timestamp(1580400565, 19066), signature: { hash: BinData(0, 738B9B6A8BEB897ADDAB8B5D9E04DF77626A574E), keyId: 6729445335938105350 } }, $client: { driver: { name: "mongo-java-driver", version: "3.8.2" }, os: { type: "Linux", name: "Linux", architecture: "amd64", version: "3.16.0-10-amd64" }, platform: "Java/Oracle Corporation/1.8.0_171-8u171-b11-1~bpo8+1-b11", mongos: { host: "mongo-hotel-01:27017", client: "10.142.145.194:44050", version: "3.6.13" } }, $configServerState: { opTime: { ts: Timestamp(1580400565, 18247), t: 14 } }, $db: "price_alert" } numYields:0 reslen:364 locks:{ Global: { acquireCount: { r: 3, w: 3 } }, Database: { acquireCount: { w: 3, W: 1 } }, Collection: { acquireCount: { w: 2 } }, oplog: { acquireCount: { w: 1 } } } protocol:op_msg 412115ms

As you can see, the index was built from 17:09:25 to 17:16:18. Our log-file analysis with mtools shows (see attached) that everything was blocked during this period: many requests were queued and executed only after the index build had finished. It may be a coincidence, but one of the six routers even crashed during the index build:

2020-01-30T17:14:21.324+0100 I - [listener] pthread_create failed: Resource temporarily unavailable
2020-01-30T17:14:21.324+0100 W EXECUTOR [conn332411] Terminating session due to error: InternalError: failed to create service entry worker thread
2020-01-30T17:14:21.329+0100 F - [LogicalSessionCacheRefresh] std::exception::what(): Resource temporarily unavailable
Actual exception type: std::system_error
0x5581643011b1 0x558164300b95 0x5581643f5276 0x5581643f52c1 0x558163ac8c6f 0x558163ac94c8 0x558163abd042 0x558163ab1f8a 0x558163ab6d22 0x558163ab7562 0x5581637a8994 0x5581637a8ca3 0x558163bbed54 0x558163bc0238 0x55816378e02c 0x5581644104d0 0x7fe04e1c14a4 0x7fe04df03d0f
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"55816320E000","o":"10F31B1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55816320E000","o":"10F2B95"},{"b":"55816320E000","o":"11E7276","s":"_ZN10__cxxabiv111__terminateEPFvvE"},{"b":"55816320E000","o":"11E72C1"},{"b":"55816320E000","o":"8BAC6F","s":"_ZN5mongo10ThreadPool25_startWorkerThread_inlockEv"},{"b":"55816320E000","o":"8BB4C8","s":"_ZN5mongo10ThreadPool8scheduleESt8functionIFvvEE"},{"b":"55816320E000","o":"8AF042","s":"_ZN5mongo30ConfigServerCatalogCacheLoader14getChunksSinceERKNS_15NamespaceStringENS_12ChunkVersionESt8functionIFvPNS_16OperationContextENS_10StatusWithINS_18CatalogCacheLoader26CollectionAndChangedChunksEEEEE"},{"b":"55816320E000","o":"8A3F8A","s":"_ZN5mongo12CatalogCache26_scheduleCollectionRefreshENS_8WithLockESt10shared_ptrINS0_17DatabaseInfoEntryEES2_INS_12ChunkManagerEERKNS_15NamespaceStringEi"},{"b":"55816320E000","o":"8A8D22","s":"_ZN5mongo12CatalogCache24getCollectionRoutingInfoEPNS_16OperationContextERKNS_15NamespaceStringE"},{"b":"55816320E000","o":"8A9562","s":"_ZN5mongo12CatalogCache42getShardedCollectionRoutingInfoWithRefreshEPNS_16OperationContextERKNS_15NamespaceStringE"},{"b":"55816320E000","o":"59A994","s":"_ZN5mongo25SessionsCollectionSharded32_checkCacheForSessionsCollectionEPNS_16OperationContextE"},{"b":"55816320E000","o":"59ACA3","s":"_ZN5mongo25SessionsCollectionSharded23setupSessionsCollectionEPNS_16OperationContextE"},{"b":"55816320E000","o":"9B0D54","s":"_ZN5mongo23LogicalSessionCacheImpl8_refreshEPNS_6ClientE"},{"b":"55816320E000","o":"9B2238","s":"_ZN5mongo23LogicalSessionCacheImpl16_periodicRefreshEPNS_6ClientE"},{"b":"55816320E000","o":"58002C"},{"b":"55816320E000","o":"12024D0"},{"b":"7FE04E1BA000","o":"74A4"},{"b":"7FE04DE1B000","o":"E8D0F","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.6.13", "gitVersion" : "db3c76679b7a3d9b443a0e1b3e45ed02b88c539f", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.9.0-11-amd64", "version" : "#1 SMP Debian 4.9.189-3+deb9u1 
(2019-09-20)", "machine" : "x86_64" }, "somap" : [ { "b" : "55816320E000", "elfType" : 3, "buildId" : "072ABC16ACA2D355506A39C14C0655EBD3C10559" }, { "b" : "7FFEDF5D4000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "E4710F184535B7465A293C8D84196F236A12D2CF" }, { "b" : "7FE04F404000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "EAD5FD817712E63C1212D1EE7D7EE1B9C29F93A7" }, { "b" : "7FE04EF6A000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1", "elfType" : 3, "buildId" : "BC8265C9A7EB28F1A9677B7437E4561F383C44D9" }, { "b" : "7FE04ECFE000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.1", "elfType" : 3, "buildId" : "4EAE4DD6DA20F6CECBEA6688C8ED8E9987CD800F" }, { "b" : "7FE04EAFA000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "DB2CAEEEC37482A98AB1416D0A9AFE2944930DE9" }, { "b" : "7FE04E8F2000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "86B35D63FACD97D22973E99EE9863F7714C4F53A" }, { "b" : "7FE04E5EE000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "4E49714C557CE0472C798F39365CA10F9C0E1933" }, { "b" : "7FE04E3D7000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "51AD5FD294CD6C813BED40717347A53434B80B7A" }, { "b" : "7FE04E1BA000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "16D609487BCC4ACBAC29A4EAA2DDA0D2F56211EC" }, { "b" : "7FE04DE1B000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "775143E680FF0CD4CD51CCE1CE8CA216E635A1D6" }, { "b" : "7FE04F61B000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "606DF9C355103E82140D513BC7A25A635591C153" }, { "b" : "7FE04DC09000", "path" : "/lib/x86_64-linux-gnu/libnss_files.so.2", "elfType" : 3, "buildId" : "8ADB8C8CDDE0E4C3C3BABB8874AE69DCA15A666A" }, { "b" : "7FE04DA03000", "path" : "/lib/x86_64-linux-gnu/libnss_dns.so.2", "elfType" : 3, "buildId" : 
"B65AA56F4CECC441FCE7F85F0B4AA0F2815E4E34" } ] }}
mongos(_ZN5mongo15printStackTraceERSo+0x41) [0x5581643011b1]
mongos(+0x10F2B95) [0x558164300b95]
mongos(_ZN10__cxxabiv111__terminateEPFvvE+0x6) [0x5581643f5276]
mongos(+0x11E72C1) [0x5581643f52c1]
mongos(_ZN5mongo10ThreadPool25_startWorkerThread_inlockEv+0x99F) [0x558163ac8c6f]
mongos(_ZN5mongo10ThreadPool8scheduleESt8functionIFvvEE+0x398) [0x558163ac94c8]
mongos(_ZN5mongo30ConfigServerCatalogCacheLoader14getChunksSinceERKNS_15NamespaceStringENS_12ChunkVersionESt8functionIFvPNS_16OperationContextENS_10StatusWithINS_18CatalogCacheLoader26CollectionAndChangedChunksEEEEE+0x222) [0x558163abd042]
mongos(_ZN5mongo12CatalogCache26_scheduleCollectionRefreshENS_8WithLockESt10shared_ptrINS0_17DatabaseInfoEntryEES2_INS_12ChunkManagerEERKNS_15NamespaceStringEi+0xB2A) [0x558163ab1f8a]
mongos(_ZN5mongo12CatalogCache24getCollectionRoutingInfoEPNS_16OperationContextERKNS_15NamespaceStringE+0x952) [0x558163ab6d22]
mongos(_ZN5mongo12CatalogCache42getShardedCollectionRoutingInfoWithRefreshEPNS_16OperationContextERKNS_15NamespaceStringE+0x52) [0x558163ab7562]
mongos(_ZN5mongo25SessionsCollectionSharded32_checkCacheForSessionsCollectionEPNS_16OperationContextE+0x64) [0x5581637a8994]
mongos(_ZN5mongo25SessionsCollectionSharded23setupSessionsCollectionEPNS_16OperationContextE+0x33) [0x5581637a8ca3]
mongos(_ZN5mongo23LogicalSessionCacheImpl8_refreshEPNS_6ClientE+0x164) [0x558163bbed54]
mongos(_ZN5mongo23LogicalSessionCacheImpl16_periodicRefreshEPNS_6ClientE+0x28) [0x558163bc0238]
mongos(+0x58002C) [0x55816378e02c]
mongos(+0x12024D0) [0x5581644104d0]
libpthread.so.0(+0x74A4) [0x7fe04e1c14a4]
libc.so.6(clone+0x3F) [0x7fe04df03d0f]
----- END BACKTRACE -----

The other 5 routers survived. I can provide the mongod and mongos logs.
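For reference, one way to sketch a non-blocking alternative on this version: in MongoDB 3.6 a foreground createIndexes (as logged above) holds an exclusive database lock for the whole build, while a background build does not. A minimal sketch in the mongo shell, using the key and index name from the log above:

// Sketch only: same index as in the log, requested as a background build.
// Background builds in 3.6 trade build speed for not holding the exclusive
// database lock during the collection scan.
db.priceAlert.createIndex(
    { priceAlertId: 1 },
    { name: "priceAlertId", background: true }
)

This does not explain why the foreground build blocked databases other than price_alert, which is the anomaly reported here.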
carl.champain commented on Tue, 18 Feb 2020 22:59:28 +0000: kay.agahd@idealo.de, Apologies for the delayed response. Unfortunately, there is no diagnostic data covering the index build for the primary or for the mongos that crashed. This makes it difficult for us to investigate this issue in greater detail, since we can't correlate the events in the log with other diagnostic indicators. If this behavior happens again, please reopen a ticket and provide the diagnostic data and logs covering the event so we can continue to investigate. Kind regards, Carl

carl.champain commented on Fri, 7 Feb 2020 15:27:34 +0000: kay.agahd@idealo.de, Yes, the upload succeeded. Thank you! We are investigating the issue and will let you know.

kay.agahd@idealo.de commented on Fri, 7 Feb 2020 14:40:06 +0000: Anyone?

kay.agahd@idealo.de commented on Wed, 5 Feb 2020 21:26:52 +0000: Hi carl.champain, I've uploaded the router and configserver logs again, and also the diagnostic data. Did the upload succeed this time?

carl.champain commented on Tue, 4 Feb 2020 20:34:57 +0000: Sorry, I misspoke in my previous comment: I can open sharedSystem01.primary.log.mongo-hotel01-01.db00.pro07.eu.idealo.com.gz.

carl.champain commented on Tue, 4 Feb 2020 20:10:43 +0000: kay.agahd@idealo.de, Unfortunately, the upload was not successful. There is just sharedSystem01.primary.log.mongo-hotel01-01.db00.pro07.eu.idealo.com.gz, but, once downloaded, I can't open it. You only need to replace file.name and keep the "@": -F "file=@file.name" Can you also please archive (tar or zip) the $dbpath/diagnostic.data directory (the contents are described here) and upload it? Thanks, Carl

kay.agahd@idealo.de commented on Sun, 2 Feb 2020 09:17:50 +0000: Hi carl.champain, I've uploaded the router, configserver and mongod log files, but I doubt that the upload succeeded because it terminated very quickly. However, no error message was shown. The curl upload instructions said to replace file.name with the file to upload.
In order not to overwrite the uploaded files with each other (they all have identical file names), I added the hostname to each file name:

hn=`hostname -f`
mv router.log.gz router.log.${hn}.gz
filename="router.log.${hn}.gz"

The lines I replaced in the curl statement are:

curl -F "key=uploads_valid_180_days/SERVER-45899/\$(unknown)" ... -F "file=@\$(unknown)"

Is this correct? I also removed the @: -F "file=\$(unknown)" What is the correct way to use the curl statement?

carl.champain commented on Fri, 31 Jan 2020 20:55:03 +0000: Hi kay.agahd@idealo.de, Thank you for the report. Can you please share with us the logs for each of the mongos and primary nodes? I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. Kind regards, Carl
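Based on Carl's clarification above (only file.name is replaced, and the "@" is kept so curl sends the file's contents rather than the literal string), the intended invocation can be sketched roughly as follows. The upload URL is shown as a placeholder here, since the portal address is not part of this ticket; `echo` is used so the sketch only prints the command it would run:

```shell
# Sketch, assuming the per-host renaming scheme from the comment above.
hn=$(hostname -f 2>/dev/null || echo unknown-host)
filename="router.log.${hn}.gz"

# Substitute the same file name into BOTH -F arguments; keep the "@" on
# file= so curl uploads the file's contents, not the literal string.
echo curl -F "key=uploads_valid_180_days/SERVER-45899/${filename}" \
          -F "file=@${filename}" \
          "<portal-upload-url>"
```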