...
I frequently run two benchmark workloads: Linkbench and the insert benchmark. Secondary indexes are created for both after there is some data in the indexed table. Linkbench has one secondary index, while for the insert benchmark I use 3 secondary indexes per collection and usually 8 collections. For the insert benchmark, the secondary indexes are created concurrently with one client per collection (so 8 in parallel).

Creating the secondary index is faster in 4.4 than 4.2 for Linkbench. But for the insert benchmark it is almost 2X faster in 4.2 than 4.4. My scripts report performance as "indexed rows/second" rather than time (so larger == faster); that is the ips column in the data my test scripts report.

There is a lot of data here, but I will explain what I see using the results in this section for IO-bound Linkbench:
- Indexed rows/s is 247146 and 271190 for 4.2.8 without and with Snappy compression for the database, compared to 113216 and 145950 for 4.4.0rc14 without and with Snappy compression.
- 4.4 uses less CPU than 4.2: see the cpupq column, which is CPU per indexed row.
- 4.4 databases (see dbgb1) are larger than 4.2, which might explain why 4.4 reads more from storage per indexed row than 4.2 (see rkbpq). But this doesn't explain a 2X perf difference.
- rkbps (storage read KB/s from iostat) is almost 2X larger in 4.2 than 4.4, and that might explain the 2X perf difference.
- However, in a CPU-bound test where the collections are cached and there are no storage reads, 4.2 is still a lot faster (maybe 1.5X) than 4.4. Results are here.

Some performance results from Linkbench are here, and the indexed rows/s rate (ips) is better for 4.4 than 4.2.

Next steps for me:
- Figure out why 4.2 can read from storage almost 2X faster despite using the same setup via DSI (c3.8xlarge with 20k EBS IOPS).
- Re-learn what the limits are for memory used by create index sorts in 4.2 and 4.4.
- Upload FTDC.
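To make the "almost 2X" claim concrete, here is a small sketch that computes the 4.2.8 to 4.4.0rc14 ratio from the four ips figures quoted in the description. The dictionary layout and the `speedup` helper are illustrative only and are not part of the benchmark scripts.

```python
# Indexed rows/second (ips) figures quoted in the description.
# The keys and this layout are illustrative, not the scripts' own format.
ips = {
    ("4.2.8", "no compression"): 247146,
    ("4.2.8", "snappy"): 271190,
    ("4.4.0rc14", "no compression"): 113216,
    ("4.4.0rc14", "snappy"): 145950,
}


def speedup(config: str) -> float:
    """Return how many times faster 4.2.8 is than 4.4.0rc14 for one config."""
    return ips[("4.2.8", config)] / ips[("4.4.0rc14", config)]


ratios = {c: round(speedup(c), 2) for c in ("no compression", "snappy")}

for config, ratio in ratios.items():
    print(f"{config}: 4.2.8 indexes {ratio}x as many rows/s as 4.4.0rc14")
```

The ratio comes out at roughly 2.2 without compression and roughly 1.9 with Snappy, which is consistent with "almost 2X".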
louis.williams commented on Tue, 28 Jul 2020 17:08:13 +0000:
Since this has come up before, index builds in 4.4 should be faster in almost every way due to KeyString sorting improvements we made. That said, 4.4 introduced some slowness due to the two-phase commit protocol, but unless there’s a large amount of replication lag, it shouldn’t be significant at scale. The observation that index builds are 2x slower seems related only to concurrent behavior when building more indexes than the default limit of 3, but I am interested to see what we find here.

louis.williams commented on Tue, 28 Jul 2020 16:51:50 +0000:
The maxNumActiveUserIndexBuilds server parameter is hardcoded in 4.2 and not tunable. I filed SERVER-49948 to fix that.

louis.williams commented on Tue, 28 Jul 2020 16:23:03 +0000:
Yes, actually. WiredTiger has "read once" cursors that we use for index builds. They are enabled by default via useReadOnceCursorsForIndexBuilds. I did a small writeup in SERVER-37590.

mark.callaghan commented on Tue, 28 Jul 2020 16:16:14 +0000:
louis.williams - are there plans to get and use a "don't cache" flag in WiredTiger so that scans done for create index don't wipe the cache? We added such an option to MyRocks (MySQL+RocksDB) and used it for full scans done during daily logical backup.

mark.callaghan commented on Tue, 28 Jul 2020 16:08:23 +0000:
I have more to investigate. I have index build results for all 4.2 releases and I don't see a change at 4.2.3.

louis.williams commented on Tue, 28 Jul 2020 15:28:45 +0000:
There is actually more history than I remember:
- A limit of 10 was originally introduced by SERVER-38323 (for 4.1.7). Undocumented as far as I can tell.
- The limit was lowered to 3 by SERVER-44984 (for 4.2.3 and 4.4.0-rc0) and the default memory usage was lowered to 200MB. Only the memory usage change was documented by DOCS-13432.
- SERVER-47155 changed the implementation to be less buggy and introduced the new parameter that only exists in 4.4.
In your tests, both 4.2.8 and 4.4.0-rc* should be limited to 3 concurrent index builds, so maybe there is more to investigate there.

louis.williams commented on Tue, 28 Jul 2020 15:01:49 +0000:
I'm sorry you ran into this issue. I updated the description in SERVER-47155 to provide more context for the change, which was to bound resource utilization of index builds by a safe default.

mark.callaghan commented on Tue, 28 Jul 2020 14:37:21 +0000:
I was not aware of that change and will repeat my tests. Can SERVER-47155 get updated to explain why this was added? Also, I don't see docs for this: https://docs.mongodb.com/master/reference/operator/aggregation/lookup/?searchProperty=current&query=maxNumActiveUserIndexBuilds

louis.williams commented on Tue, 28 Jul 2020 14:01:55 +0000:
In 4.4, we introduced a limit for the maximum number of concurrent index builds to 3 by default (see SERVER-47155). Are you raising the maxNumActiveUserIndexBuilds parameter in your test that is running 8 concurrent index builds?

mark.callaghan commented on Tue, 28 Jul 2020 01:43:57 +0000:
For insert benchmark with 1 collection and serial create index, creating secondary indexes is slightly faster in 4.4 than in 4.2. I will run a few more tests to determine how that changes as concurrency is increased.
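For anyone repeating the 8-way test on 4.4, the following is a minimal sketch of the command document that would raise the limit discussed in the comments. It assumes maxNumActiveUserIndexBuilds can be changed at runtime with the setParameter admin command, which the thread does not confirm, and per the comments it does not apply to 4.2, where the value is hardcoded. The helper name `index_build_limit_command` is hypothetical.

```python
def index_build_limit_command(limit: int) -> dict:
    """Build a setParameter command document for the index build limit.

    Assumption: the server accepts maxNumActiveUserIndexBuilds through
    setParameter at runtime (4.4 only, per the comments in this ticket).
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # The command name must be the first key; Python dicts keep insertion order.
    return {"setParameter": 1, "maxNumActiveUserIndexBuilds": limit}


# One build per collection, 8 collections, as in the insert benchmark setup.
cmd = index_build_limit_command(8)
print(cmd)

# With a driver such as pymongo, the document would be sent to the admin
# database, for example: client.admin.command(cmd)
# That call is shown only as a comment because it needs a running mongod.
```

If the parameter turns out to be settable only at startup in a given build, the same name would go on the mongod command line via --setParameter instead.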
Create indexes concurrently for large collections