...
Opening many (10+) change stream cursors causes massive delays (up to several minutes) between database writes and notification arrival. A single change stream (or 2-3 of them) does not produce the same issue.

In a synthetic test, I wrote 100 small documents per second into a database and listened for changes using change streams. I opened 50 change streams and ran the test for 100 seconds. The average delay between a DB write and the arrival of the corresponding change event was 7.1 seconds; the largest delay was 205 seconds (not a typo, over three minutes).

MongoDB version: 3.6.2
Test setup #1: MongoDB Atlas M10 (3-member replica set)
Test setup #2: DigitalOcean Ubuntu box + single-instance MongoDB deployment in Docker

I used a Node.js client; CPU and memory usage was minimal. I tried two ways to set up change streams:

let cursor = collection.watch([
  {$match: {"fullDocument.room": roomId}},
]);
cursor.stream().on("data", doc => {...});

and

let cursor = collection.aggregate([
  {$changeStream: {}},
  {$match: {"fullDocument.room": roomId}},
]);
cursor.forEach(doc => {...});

Both had the same effect.
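For concreteness, here is a minimal sketch (not the original test harness, which is linked in the comments below) of how such a write-to-notification delay can be measured, assuming a local replica set and illustrative database, collection, and field names:

const {MongoClient} = require("mongodb");

async function main() {
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const collection = client.db("test").collection("messages");

  // Listener: each document embeds its write time, so the notification
  // delay is (local arrival time - embedded timestamp); writer and
  // listener share a process, so the clocks are directly comparable.
  collection.watch().on("change", change => {
    console.log("delay (ms):", Date.now() - change.fullDocument.ts);
  });

  // Writer: one small timestamped document every 10 ms (~100 docs/sec).
  setInterval(() => {
    collection.insertOne({ts: Date.now(), room: "room-1"});
  }, 10);
}

main().catch(console.error);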
shane.harvey commented on Wed, 12 Jul 2023 16:15:39 +0000:
Note that there is an application-side workaround which allows many change streams to share a single connection. I've described the PyMongo workaround in this comment on SERVER-42885. Similar patterns should be possible in other drivers as well.

yatin.p@kidsxap.com.au commented on Mon, 9 Aug 2021 00:54:02 +0000:
Hello guys, we are using v4.2.8 with a singleton MongoDB connection. We are seeing that multiple change streams cause a severe performance drop; all of the change streams scan all collections. Do you know what the reasons might be?

bruce.lucas@10gen.com commented on Wed, 31 Jan 2018 14:54:14 +0000:
Hi Gabor, the client issues a getMore operation on the change stream cursor to obtain each notification, so each change stream that is waiting for its next notification will have an outstanding getMore in progress until the next notification becomes available. Each connection can only service one operation at a time, so the number of connections needed will be as large as the number of change streams waiting for their next notification. I opened DOCS-11270 to make the requested clarification to the documentation. Bruce

aedm commented on Tue, 30 Jan 2018 20:22:06 +0000:
Thank you, Bruce! Adjusting the connection pool size really solved the issue. To be honest, I don't quite understand why each change stream opens a new connection, since the documentation is rather tight-lipped about when connections are opened. My expectation was that I could open tens of thousands of cursors and get Firebase-like real-time notifications only on the appropriate channels. I probably misunderstood the use case; I guess I'll need to listen to all changes in a collection and filter them manually instead.

bruce.lucas@10gen.com commented on Tue, 30 Jan 2018 16:28:44 +0000:
Hi Gabor, thanks for the data and for the repro code. I was able to reproduce the high notification latencies at low operation rates that you saw, and I traced the issue to the poolSize setting. The default poolSize in the Node.js driver is small, which limits the number of concurrent connections. Each change stream ties up a connection with a getMore operation for as long as it waits for its next event; since your test sends events at a very low rate on each change stream, all connections can be tied up for some time, and meanwhile change streams that do have results available at the server cannot get a connection in the client to see those results, which produces the large notification latencies. To avoid this, ensure that poolSize is at least as large as the number of change streams, for example:

mongoConnection = await MongoClient.connect(MONGO_URL, {appname: APP_NAME, poolSize: 200});

With this change I got good latencies in a 60-second run:

Average delay (ms): 9.801522842639594
Largest delay (ms): 27

While investigating this I did observe a bottleneck in the server at much higher operation rates that I want to look into further (and may open a ticket for), but that bottleneck was not in play in your test at low operation rates, so I'll close this ticket. Bruce

aedm commented on Tue, 30 Jan 2018 00:55:13 +0000:
Hi Bruce, test setup #2 was a single mongod process (a one-member replica set, since change streams require one), not a multi-node replica set. This is how I created it:

$ docker run --restart always -d --name mongo -p 27017:27017 mongo --replSet="rs0"

And then rs.initiate() in the mongo console.
Here's a copy of the /data/db directory after the test was run: https://f001.backblazeb2.com/file/korteur/mongo-changestreamtest-db-20180129.zip

I uploaded the test code with instructions to GitHub: https://github.com/aedm/mongochangestreamtest

bruce.lucas@10gen.com commented on Mon, 29 Jan 2018 14:36:58 +0000:
Hi Gabor, can you clarify your test setup #2: is this a single standalone mongod process, not a replica set? Can you please upload the mongod log file and the archived contents of $dbpath/diagnostic.data for one of the tests? If setup #2 is indeed not a replica set, this will be simpler to analyze. Also, please tell us the exact command used and the timeline (including timezone) of the test that you upload. Thanks, Bruce

bernard.gorman commented on Mon, 29 Jan 2018 12:59:49 +0000:
schwerin, I do see an increase in latency as additional $changeStreams are opened, but not until we reach considerably higher levels of write throughput than the case described here. For instance, in a 90-second test with 15 $changeStreams open on a collection and 4 threads running a mixed ~7K ops/s write workload, average and max latency remain negligible. By the time we reach 105 parallel $changeStreams with the same workload, I see latencies more in line with the figures given above.

schwerin commented on Mon, 29 Jan 2018 02:00:38 +0000:
bernard.gorman, is this consistent with the data you're seeing?

aedm commented on Sun, 28 Jan 2018 22:53:40 +0000:
Sorry for the code formatting issues; I can't edit my submission now. I'll try again:

let cursor = collection.watch([
  {$match: {"fullDocument.room": roomId}},
]);
cursor.stream().on("data", doc => {...});

and

let cursor = collection.aggregate([
  {$changeStream: {}},
  {$match: {"fullDocument.room": roomId}},
]);
cursor.forEach(doc => {...});
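The "listen to all changes and filter them manually" fallback aedm mentions above (and the single-connection workaround shane.harvey points to) could look roughly like the following sketch; the subscription registry, function names, and field names are illustrative assumptions, not a documented driver API:

// One shared change stream means one pooled connection and one
// outstanding getMore, no matter how many rooms are subscribed.
const handlers = new Map(); // hypothetical registry: roomId -> callbacks

function subscribe(roomId, callback) {
  if (!handlers.has(roomId)) handlers.set(roomId, []);
  handlers.get(roomId).push(callback);
}

// `collection` is assumed to be an open collection handle on a replica set.
function startDispatcher(collection) {
  collection.watch().on("change", change => {
    const roomId = change.fullDocument && change.fullDocument.room;
    (handlers.get(roomId) || []).forEach(cb => cb(change));
  });
}

The trade-off is that the server ships every change in the collection to the client, which then discards the events no subscriber wants.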
Steps to reproduce (a runnable sketch follows):
1. Write documents to the collection at 100+/sec; each document should contain a timestamp.
2. Open 50+ change streams on the same collection, each with a unique $match condition.
3. Use the embedded timestamp to measure the delay between write and change event arrival.
4. Observe a huge lag.
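A minimal runnable sketch of these steps, assuming a local single-node replica set (change streams require one), the Node.js driver, and illustrative names; the pool-size comment reflects bruce.lucas's analysis above (the option is poolSize in 3.x drivers and maxPoolSize in 4.x and later):

const {MongoClient} = require("mongodb");

async function main() {
  // The default pool size is small; per the analysis above, passing a pool
  // size of at least the number of change streams makes the lag disappear.
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const collection = client.db("test").collection("messages");

  // Step 2: open 50 change streams, each with a unique $match condition.
  for (let i = 0; i < 50; i++) {
    collection.watch([
      {$match: {"fullDocument.room": `room-${i}`}},
    ]).on("change", change => {
      // Step 3: the embedded timestamp gives the write-to-arrival delay.
      console.log(`room-${i} delay (ms):`, Date.now() - change.fullDocument.ts);
    });
  }

  // Step 1: write ~100 timestamped documents per second across the rooms.
  let n = 0;
  setInterval(() => {
    collection.insertOne({ts: Date.now(), room: `room-${n++ % 50}`});
  }, 10);
}

main().catch(console.error);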