
OPERATIONAL DEFECT DATABASE
...

...
Technical Details: Replica Set Size: 7 members TLS Configuration: mongod --tlsMode allowTLS --tlsAllowInvalidHostnames --tlsCertificateKeyFile /etc/mongo/certs/server.pem --tlsCAFile /etc/mongo/certs/ca.pem --setParameter enableFlowControl=false --ipv6 --bind_ip_all --port 27020 --dbpath=/data/db/wt-27020 --replSet rs-shard-1 --quiet --slowms 500 --logpath /data/db/mongo-27020.log --setParameter minSnapshotHistoryWindowInSeconds=1 --setParameter diagnosticDataCollectionEnabled=false --setParameter oplogFetcherUsesExhaust=false --setParameter allowMultipleArbiters=true --setParameter storageEngineConcurrentWriteTransactions=8 --setParameter storageEngineConcurrentReadTransactions=8 --logappend --oplogSize 100 --logRotate reopen --wiredTigerCacheSizeGB .26 Observed Behavior: Suddenly Slow queries are started after that we see high CPU spike randomly. Replication connections (inter-node) over TLS show higher latency during handshake and connection establishment compared to non-TLS (plain TCP). This is consistent across all replica set members. The issue persists with the provided TLS configuration and valid certificates. High CPU usage: We observe increased CPU utilization on mongod processes when TLS is enabled, especially during connection establishment and replication. Impact: Increased time for replica set members to connect and sync. Overall replication lag is higher when TLS is enabled. Performance issues observed: Due to the increased latency and CPU usage, we are experiencing slower replication, delayed data consistency, and degraded application performance. Steps to Reproduce: Deploy a 7-member replica set with the above TLS configuration. Measure connection establishment time and CPU usage between members with and without TLS. Compare replication lag, initial sync times, and CPU metrics. Request: Investigate the cause of increased TLS connection times and high CPU usage in replica set communication. Provide recommendations or fixes to optimize TLS handshake, connection performance, and CPU utilization in MongoDB. Logs {"t":\{"$date":"2025-08-11T01:41:36.293+00:00"} ,"s":"I", "c":"ACCESS", "id":6788604, "ctx":"conn497","msg":"Auth metrics report","attr":{"metric":"acquireUser","micros":0}} {"t":\{"$date":"2025-08-11T01:41:36.404+00:00"} ,"s":"W", "c":"ACCESS", "id":5626700, "ctx":"conn497","msg":"Client has attempted to reauthenticate as a single user","attr":{"user": {"user":"__system","db":"local"} }} {"t":\{"$date":"2025-08-11T01:41:36.404+00:00"} ,"s":"I", "c":"ACCESS", "id":5286306, "ctx":"conn497","msg":"Successfully authenticated","attr":{"client":"[fd00:3984:3989::1]:35738","isSpeculative":false,"isClusterMember":true,"mechanism":"SCRAM-SHA-256","user":"__system","db":"local","result":0,"metrics":{"conversation_duration":{"micros":110971,"summary":{"0": {"step":1,"step_total":2,"duration_micros":80} ,"1":{"step":2,"step_total":2,"duration_micros":23}}}},"extraInfo":{}}} {"t":\{"$date":"2025-08-11T01:46:36.295+00:00"} ,"s":"I", "c":"ACCESS", "id":6788604, "ctx":"conn497","msg":"Auth metrics report","attr":{"metric":"acquireUser","micros":0}} {"t":\{"$date":"2025-08-11T01:46:36.406+00:00"} ,"s":"W", "c":"ACCESS", "id":5626700, "ctx":"conn497","msg":"Client has attempted to reauthenticate as a single user","attr":{"user": {"user":"__system","db":"local"} }} {"t":\{"$date":"2025-08-11T01:46:36.407+00:00"} ,"s":"I", "c":"ACCESS", "id":5286306, "ctx":"conn497","msg":"Successfully authenticated","attr":{"client":"[fd00:3984:3989::1]:35738","isSpeculative":false,"isClusterMember":true,"mechanism":"SCRAM-SHA-256","user":"__system","db":"local","result":0,"metrics":{"conversation_duration":{"micros":112016,"summary":{"0": {"step":1,"step_total":2,"duration_micros":87} ,"1":{"step":2,"step_total":2,"duration_micros":40}}}},"extraInfo":{}}} {"t":\{"$date":"2025-08-11T01:51:36.294+00:00"} ,"s":"I", "c":"ACCESS", "id":6788604, "ctx":"conn497","msg":"Auth metrics report","attr":{"metric":"acquireUser","micros":0}} TLS connection slow logs and seen "authTimeMillis":334 {"t":\{"$date":"2025-08-08T07:24:20.594+00:00"} ,"s":"I", "c":"ASIO", "id":6496400, "ctx":"MirrorMaestro","msg":"Slow connection establishment","attr":{"hostAndPort":"[2606:ae........244:61]:27045","dnsResolutionTimeMillis":1,"tcpConnectionTimeMillis":111,"tlsHandshakeTime":null,"authTimeMillis":334,"hookTime":null,"totalTimeMillis":446}} {"t":\{"$date":"2025-08-08T07:24:21.333+00:00"} ,"s":"I", "c":"ASIO", "id":6496400, "ctx":"MirrorMaestro","msg":"Slow connection establishment","attr":{"hostAndPort":"[2606:ae......244:ac]:27045","dnsResolutionTimeMillis":0,"tcpConnectionTimeMillis":111,"tlsHandshakeTime":null,"authTimeMillis":332,"hookTime":null,"totalTimeMillis":443}} {"t":\{"$date":"2025-08-08T07:24:21.401+00:00"} ,"s":"I", "c":"ASIO", "id":6496400, "ctx":"MirrorMaestro","msg":"Slow connection establishment","attr":{"hostAndPort":"[2606:ae00:3001:8311:172:16:244:bc]:27045","dnsResolutionTimeMillis":0,"tcpConnectionTimeMillis":110,"tlsHandshakeTime":null,"authTimeMillis":334,"hookTime":null,"totalTimeMillis":444}} TCP connection slow logs and seen "authTimeMillis":123 {"t":\{"$date":"2025-02-17T16:16:06.688+00:00"} ,"s":"I", "c":"ASIO", "id":6496400, "ctx":"MirrorMaestro","msg":"Slow connection establishment","attr":{"hostAndPort":"[2606:ae00:3.......44:9]:27030","dnsResolutionTimeMillis":0,"tcpConnectionTimeMillis":40,"tlsHandshakeTime":null,"authTimeMillis":123,"hookTime":null,"totalTimeMillis":163}} {"t":\{"$date":"2025-02-17T16:19:27.967+00:00"} ,"s":"I", "c":"ASIO", "id":6496400, "ctx":"MirrorMaestro","msg":"Slow connection establishment","attr":{"hostAndPort":"[2606:a....:244:9]:27030","dnsResolutionTimeMillis":0,"tcpConnectionTimeMillis":38,"tlsHandshakeTime":null,"authTimeMillis":126,"hookTime":null,"totalTimeMillis":164}} {"t":\{"$date":"2025-02-17T16:21:41.482+00:00"} ,"s":"I", "c":"ASIO", "id":6496400, "ctx":"MirrorMaestro","msg":"Slow connection establishment","attr":{"hostAndPort":"[2606:.........244:9]:27030","dnsResolutionTimeMillis":0,"tcpConnectionTimeMillis":40,"tlsHandshakeTime":null,"authTimeMillis":123,"hookTime":null,"totalTimeMillis":163}} {"t":\{"$date":"2025-02-17T16:24:57.232+00:00"} ,"s":"I", "c":"ASIO", "id":6496400, "ctx":"MirrorMaestro","msg":"Slow connection establishment","attr":{"hostAndPort":"[2606.........44:9]:27030","dnsResolutionTimeMillis":0,"tcpConnectionTimeMillis":40,"tlsHandshakeTime":null,"authTimeMillis":125,"hookTime":null,"totalTimeMillis":165}} {"t":\{"$date":"2025-02-17T16:26:34.810+00:00"} ,"s":"I", "c":"ASIO", "id":6496400, "ctx":"MirrorMaestro","msg":"Slow connection establishment","attr":{"hostAndPort":"[2606.........16:244:9]:27030","dnsResolutionTimeMillis":0,"tcpConnectionTimeMillis":40,"tlsHandshakeTime":null,"authTimeMillis":121,"hookTime":null,"totalTimeMillis":161}} {"t":\{"$date":"2025-02-17T16:26:38.594+00:00"} ,"s":"I", "c":"ASIO", "id":6496400, "ctx":"ReplNetwork","msg":"Slow connection establishment","attr":{"hostAndPort":"[260..........4:a8]:27030","dnsResolutionTimeMillis":0,"tcpConnectionTimeMillis":38,"tlsHandshakeTime":null,"authTimeMillis":124,"hookTime":null,"totalTimeMillis":162}} Cluster Member Authentication (2-step SCRAM-SHA-256): temp.txt (__system user): Average: ~111,000 microseconds (111ms) Step breakdown: Step 1 (60-270μs), Step 2 (18-31μs) temp-tcp.txt (__system user): Average: ~41,000 microseconds (41ms) Step breakdown: Step 1 (92-883μs), Step 2 (23-32μs)
JIRAUSER1265262 commented on Tue, 2 Sep 2025 20:03:37 +0000: Hi gonugunta.akhil@gmail.com! For diagnosing cpu spikes, we need diagnostic data. I've created a secure upload portal for you. Files uploaded to this portal are hosted on Box, are visible only to MongoDB employees, and are routinely deleted after some time. For each node in the replica set spanning a time period that includes the incident, would you please archive (tar or zip) and upload to that link: the mongod logs the $dbpath/diagnostic.data directory (the contents are described here) In general I would expect TLS to require some additional cpu, but maxing cpu sounds suspicious to me. If your cluster is already in a degraded state, adding additional overhead is possibly going to exasperate it. To set expectations, it's possible this may be working as expected. However, we will be able to investigate whether this is a bug or not.
Deploy a 7-member replica set with the above TLS configuration. Measure connection establishment time and CPU usage between members with and without TLS. Compare replication lag, initial sync times, and CPU metrics.
Click on a version to see all relevant bugs
MongoDB Integration
Learn more about where this data comes from
Bug Scrub Advisor
Streamline upgrades with automated vendor bug scrubs
BugZero Enterprise
Wish you caught this bug sooner? Get proactive today.