...
We are running a cluster consisting of:
- 1 replSet consisting of 3 mongod hardware nodes (each has 378 GB RAM and 5.8 TB SSD in RAID 10)
- 4 config servers as a replSet on VMs, each having 8 CPUs and 4 GB RAM
- 6 mongoS routers on VMs, of which only 3 are heavily used, each having 8 CPUs and 4 GB RAM
While running the command convertToCapped, all databases were blocked. Neither the command killOp nor killSession could kill this operation. Clients of all other databases had to wait until the command terminated. Furthermore, due to the command convertToCapped, index builds slowed down massively on the primary even though they were built in the background. Again, neither killOp nor killSession could kill the operation. Last but not least, replication got stuck during convertToCapped, so both secondaries fell further and further behind the primary. Please see attached some screenshots of our dashboards of the primary replSet member. They show nicely when the primary was massively or even completely blocked. I'll also upload log and diagnostic data files from all mongoDs and mongoSs. This ticket is related to SERVER-45899, where a createIndex command run in the foreground blocked all databases of the same cluster.
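For context, the kill attempts mentioned above were made with the standard shell helpers, along these lines (a sketch rather than the verbatim commands we ran; the currentOp filter on command.convertToCapped is just one way to locate the operation):

// Locate the running convertToCapped operation on the primary and try to kill it.
// In our case the kill had no visible effect while the database exclusive lock was held.
db.currentOp({ "command.convertToCapped": { $exists: true } }).inprog.forEach(function (op) {
  print("killing opid " + op.opid + " (running " + op.secs_running + "s on " + op.ns + ")");
  db.killOp(op.opid);
});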
kay.agahd@idealo.de commented on Thu, 2 Apr 2020 23:05:54 +0000: Hi carl.champain and milkie, Many thanks for the analysis! It's scary to learn that a MongoDB system, which usually processes more than 70k ops/sec continuously 24/7, was paralyzed within only 30 minutes by a total of only 5745 queries (= 3 ops/sec) on a single collection that was just being converted, and that this, moreover, qualifies as "works as designed". Why let these queries compromise the entire DBS, when the DBS knows that this collection is being converted and is therefore not accessible? Furthermore, the command convertToCapped was only active from 11:54 to 13:15 according to the log file, but the whole DBS was quasi blocked until 17:30. From 13:15 to 17:58 the indices of the new capped collection were built. Although the indexes were built in the background, the whole DBS was still blocked. Last but not least, the killOp commands executed at 12:56, 13:00, 13:03, 14:42 and 14:43 did not kill anything but my nerves. Only the shutdown of 99% of all database clients at 15:25 allowed the DBS to return to normal after 2 hours. Is this really the expected behaviour of a DBS?

carl.champain commented on Wed, 1 Apr 2020 20:16:57 +0000: kay.agahd@idealo.de, Here is what happened:
- convertToCapped held a database exclusive lock for its entire duration
- Each new connection consumed resources
- All resources were exhausted (connections, tickets, etc.)
- All subsequent operations were blocked
DOCS-13557 was opened to add warnings to the documentation about this undesired behavior. We recommend you run convertToCapped during a maintenance window or reduce the volume of reads/writes going to the locked database. I'm going to close this ticket as this is not a bug. Kind regards, Carl

milkie commented on Wed, 1 Apr 2020 19:18:41 +0000: Note that a viable workaround for this situation is to do what Kay tried, by raising the number of read tickets temporarily. This will work until the incoming connection limit is hit (or some other resource limit).

milkie commented on Wed, 1 Apr 2020 14:08:35 +0000: The convertToCapped command holds a Database exclusive lock for its entire duration. The creation of the temp collection and the final rename also take Database exclusive locks, but those are nested (recursive) and thus have no bearing on the state of the lock queue for the Database. What appears to have happened here is that reads to the Database were not stopped during the run of the command, and those blocked connections did not affect future connection attempts (probably a web server or something as the client application). Each blocked connection consumed a ticket, until finally all the free read tickets were consumed. At this point, all incoming readers for any database were blocked. It looks like something similar did not happen for writers, so this means there were fewer than 256 incoming connections that attempted to write to the Database prior to point B. The documentation has a brief warning about this behavior, but it doesn't go so far as to mention that this could affect the entire instance: "This holds a database exclusive lock for the duration of the operation. Other operations which lock the same database will be blocked until the operation completes. See What locks are taken by some common client operations? for operations that lock the database." We haven't made any changes to this behavior of the command even up to version 4.4.
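For anyone hitting the same situation, the temporary-ticket workaround milkie mentions corresponds to shell commands along these lines (the parameter names match those in our logs below; 512 is simply the doubling we tried, not a recommendation):

// Temporarily raise the WiredTiger concurrency tickets on the affected mongod
// (the default is 256). Remember to set them back once the exclusive lock is released.
db.adminCommand({ setParameter: 1, wiredTigerConcurrentReadTransactions: 512 })
db.adminCommand({ setParameter: 1, wiredTigerConcurrentWriteTransactions: 512 })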
carl.champain commented on Mon, 23 Mar 2020 17:37:57 +0000: Hi kay.agahd@idealo.de, We have received a total of 26 files. Thank you!

kay.agahd@idealo.de commented on Sun, 22 Mar 2020 17:14:20 +0000: Thanks dmitry.agranat for the upload location. All files but the last one have been uploaded. Can you confirm that you received them? It should be 26 files in total. The last file is quite huge (>600 MB compressed) and is still uploading.

dmitry.agranat commented on Sun, 22 Mar 2020 10:55:35 +0000: Hi kay.agahd@idealo.de, I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. Thanks, Dima

kay.agahd@idealo.de commented on Fri, 20 Mar 2020 23:04:18 +0000: Could you please give me the link to your upload portal for uploading the log and diagnostic files? Thanks! I tried this one but it did not work:
curl -F "key=uploads_valid_180_days/SERVER-47032/\${filename}" \
  -F "AWSAccessKeyId=AKIAIU2OWL4F5XDSYVJQ" \
  -F "acl=private" \
  -F "x-amz-server-side-encryption=AES256" \
  -F "policy=eyJleHBpcmF0aW9uIjogIjIwMjAtMDctMjlUMDA6MDA6MDBaIiwKICAgICAgICAgICAgICAgICAgImNvbmRpdGlvbnMiOiBbCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgeyAiYnVja2V0IiA6ICIxMGdlbi1odHRwc3VwbG9hZCIgfSwKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBbICJzdGFydHMtd2l0aCIsICIka2V5IiwgInVwbG9hZHNfdmFsaWRfMTgwX2RheXMvU0VSVkVSLTQ1ODk5LyJdLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHsgImFjbCIgOiAicHJpdmF0ZSJ9LAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHsgIngtYW16LXNlcnZlci1zaWRlLWVuY3J5cHRpb24iIDogIkFFUzI1NiIgfSwKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBbICJjb250ZW50LWxlbmd0aC1yYW5nZSIsIDAsIDUzNjg3MDkxMjAgXQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBdCiAgICAgICAgICAgICAgICAgIH0=" \
  -F "signature=W2+z1k21YqB6nF2/VeQdEfnUIq4=" \
  -F "file=@configserver01.log.tgz" \
  https://10gen-httpsupload.s3.amazonaws.com
Here are some chronological traces. The command convertToCapped was executed at 11:54:13 and terminated at 13:15:34:

PRO [21:39][root@mongo-hotel01-01.db00.pro07:/home/kay.agahd]# ✔ grep "convertToCapped" /data/sharedSystem01/log/sharedSystem01.log
2020-03-20T11:54:13.749+0100 I STORAGE [conn39183053] createCollection: mongodba.tmphDRLk.convertToCapped.slowops with generated UUID: 90759c88-dcce-463a-9e95-e29c54ab1d00
2020-03-20T13:15:08.832+0100 I COMMAND [conn39183053] renameCollection: renaming collection no UUID from mongodba.tmphDRLk.convertToCapped.slowops to mongodba.slowops
2020-03-20T13:15:34.323+0100 I COMMAND [conn39183053] command mongodba.tmphDRLk.convertToCapped.slowops appName: "MongoDB Shell" command: convertToCapped { convertToCapped: "slowops", size: 69793218560.0, shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("fbe556af-08f9-46af-89fc-09fbde81142a"), uid: BinData(0, 3B408CB48548B5037822C10EB0976B3CBF2CEE3BF9C708796BF03941FBECD80F) }, $clusterTime: { clusterTime: Timestamp(1584701653, 8698), signature: { hash: BinData(0, BC39D22EAB7430F38A7C3701C473DD02FCCB3D2B), keyId: 6796240667325497611 } }, $client: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "3.6.13" }, os: { type: "Linux", name: "PRETTY_NAME="Debian GNU/Linux 9 (stretch)"", architecture: "x86_64", version: "Kernel 4.9.0-11-amd64" }, mongos: { host: "mongo-hotel-01:27017", client: "127.0.0.1:42800", version: "3.6.13" } }, $configServerState: { opTime: { ts: Timestamp(1584701649, 3098), t: 14 } }, $db: "mongodba" } numYields:0 reslen:288 locks:{ Global: { acquireCount: { r: 223917487, w: 223917487 } }, Database: { acquireCount: { w: 223917484, W: 3 } }, Metadata: { acquireCount: { W: 223917482 } }, oplog: { acquireCount: { w: 223917484 } } } protocol:op_msg 4880614ms

The index build started at 13:15:38 and finished at 17:58:25:

2020-03-20T13:15:38.654+0100 I - [conn39189395] Index Build (background): 2200/223918168 0%
...
2020-03-20T14:45:33.113+0100 I - [conn39232652] Index Build (background): 5407600/224961851 2%
2020-03-20T17:58:15.033+0100 I - [conn39266887] Index Build (background): 230800/1292208 17%
2020-03-20T17:58:18.003+0100 I - [conn39266887] Index Build (background): 529900/1292208 41%
2020-03-20T17:58:21.002+0100 I - [conn39266887] Index Build (background): 858200/1292208 66%
2020-03-20T17:58:24.006+0100 I - [conn39266887] Index Build (background): 1142800/1292208 88%
2020-03-20T17:58:25.451+0100 I INDEX [conn39266887] build index done. scanned 1292208 total records. 13 secs

The first 17% of the index build took nearly 5 hours, while the rest took only a few seconds. Or was the index build eventually killed by my killOp or killSession command executed at 14:53:25?
2020-03-20T14:53:25.208+0100 I COMMAND [conn39249255] command admin.$cmd appName: "MongoDB Shell" command: killSessions { killSessions: [ { id: UUID("81914921-f985-464d-ad3a-7ab007c2b895"), uid: BinData(0, 4E5C22C0742A0A942E4361737D42A397DBF0E56F30DC26CB617A186C91FF4A48) } ], lsid: { id: UUID("be3079b7-ffb7-435c-938e-262ca10da9a6") }, $clusterTime: { clusterTime: Timestamp(1584712403, 84), signature: { hash: BinData(0, A820811A96105BC4B5D4ADF5FBACAD96714937D2), keyId: 6796240667325497611 } }, $db: "admin" } numYields:0 reslen:268 locks:{} protocol:op_msg 231ms

At around 14:20 we notice that the read and write tickets are exhausted, so we increase them from 256 to 512:

2020-03-20T14:24:08.435+0100 I COMMAND [conn39247774] successfully set parameter wiredTigerConcurrentWriteTransactions to 512.0 (was 256)
2020-03-20T14:24:27.886+0100 I COMMAND [conn39247774] successfully set parameter wiredTigerConcurrentReadTransactions to 512.0 (was 256)

At around 15:00 we notice that replication got stuck on both secondaries. At 15:19:41 we restart one secondary to see if replication will work again, but without success:

2020-03-20T15:19:41.790+0100 I CONTROL [main] ***** SERVER RESTARTED *****

At around 15:25 we shut down the 3 most heavily loaded routers (mongo-hotel-01, mongo-hotel-02, mongo-hotel-03), so that the DB no longer receives load from their clients. At about 15:50 the first secondary is back in sync, so we step down the primary because we (wrongly) suspect a hardware problem:

2020-03-20T15:52:11.954+0100 I COMMAND [conn39249255] Attempting to step down in response to replSetStepDown command
2020-03-20T15:52:11.973+0100 I COMMAND [conn39249255] command admin.$cmd appName: "MongoDB Shell" command: replSetStepDown { replSetStepDown: 60.0, lsid: { id: UUID("be3079b7-ffb7-435c-938e-262ca10da9a6") }, $clusterTime: { clusterTime: Timestamp(1584715911, 226), signature: { hash: BinData(0, 0A644227C73C9075B61DF7053FB6C4B6A467F789), keyId: 6796240667325497611 } }, $db: "admin" } numYields:0 reslen:268 locks:{ Global: { acquireCount: { r: 2, W: 2 }, acquireWaitCount: { W: 2 }, timeAcquiringMicros: { W: 10861 } } } protocol:op_msg 148ms

At about 17:20, we promote the stepped-down server to be primary again because we no longer think it is a hardware problem.
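For reference, the step-down and the later re-promotion were done with the usual replica-set shell helpers, roughly as follows (a sketch; the exact timeouts we used may have differed, and freezing the other secondary is only one way to steer the election):

// On the primary: step down for 60 seconds so a secondary takes over.
rs.stepDown(60)

// Later, to let the original member become primary again:
// on the other secondary, make it ineligible for election for a while...
rs.freeze(120)
// ...then step down the current primary.
rs.stepDown(60)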