...
For certain queries where there is no suitable index available, WiredTiger takes far too long compared with MMAPv1. In the example below the difference is 6 min vs 0.2 sec.

Example:

    db.forms.find({'dimension': 2, 'degree': 2}).count()

There is no entry with the key "degree".

WiredTiger config:

    engine: 'wiredTiger'
    wiredTiger:
      collectionConfig:
        blockCompressor: none

We have also tried snappy and zlib. The filesystem is ext4, and we have also tried xfs without a significant difference (when compared to MMAPv1).

log entry:

    2016-06-28T05:09:22.454+0000 I COMMAND [conn4] command hmfs.forms command: count { count: "forms", query: { dimension: 2, degree: 2 } } planSummary: IXSCAN { dimension: -1, field_label: 1, level_norm: -1 } fromMultiPlanner:1 keyUpdates:0 writeConflicts:0 numYields:13831 reslen:62 locks:{ Global: { acquireCount: { r: 27664 } }, Database: { acquireCount: { r: 13832 } }, Collection: { acquireCount: { r: 13832 } } } protocol:op_query 369464ms

MMAPv1 config:

    engine: 'mmapv1'

log entry:

    2016-06-28T05:10:12.412+0000 I COMMAND [conn15] command hmfs.forms command: count { count: "forms", query: { dimension: 2, degree: 2 } } planSummary: IXSCAN { dimension: -1, field_label: 1, level_norm: -1 } keyUpdates:0 writeConflicts:0 numYields:447 reslen:62 locks:{ Global: { acquireCount: { r: 896 } }, MMAPV1Journal: { acquireCount: { r: 448 } }, Database: { acquireCount: { r: 448 } }, Collection: { acquireCount: { R: 448 } } } protocol:op_query 181ms

These two examples were run on the same machine, on different empty dbs, where the only difference is the storage engine. In both cases the collection was restored from the same dump file. Perhaps we are doing something wrong; if so, please enlighten us, as we are planning to switch all our servers to MMAPv1.
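The planSummary above shows the count running against the existing { dimension: -1, field_label: 1, level_norm: -1 } index, so each candidate document still has to be examined for "degree". As a minimal mongo shell sketch, one could confirm the chosen plan and, if appropriate, try a compound index on both filter fields; the index key below is only an illustration and is not something proposed in the original report:

    // Inspect the plan and execution stats the server chooses for this count
    db.forms.explain("executionStats").count({ dimension: 2, degree: 2 })

    // Hypothetical compound index covering both filter fields, which should let the
    // count be satisfied from index keys alone; weigh the build cost on a 70GB
    // collection before creating it.
    db.forms.createIndex({ dimension: -1, degree: 1 })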
thomas.schubert commented on Mon, 3 Oct 2016 21:44:18 +0000:

Hi edgarcosta,

We have not been able to reproduce this issue. I see in your logs that once the WiredTiger cache is hot, the query takes ~600ms to complete. As you know, MMAPv1 relies on the filesystem cache, whereas WiredTiger uses both its own cache and the filesystem cache. This difference may explain the results you are observing.

Kind regards,
Thomas

edgarcosta commented on Fri, 1 Jul 2016 06:29:37 +0000:

Thank you. Also, I ran these experiments on an "n1-highmem-4" machine from Google Compute Engine, i.e., 4 CPUs with 26 GB of memory. Further, MongoDB was installed directly from the MongoDB repository "http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.2":

    mongodb-org/trusty,now 3.2.7 amd64 [installed]
    mongodb-org-mongos/trusty,now 3.2.7 amd64 [installed,automatic]
    mongodb-org-server/trusty,now 3.2.7 amd64 [installed,automatic]
    mongodb-org-shell/trusty,now 3.2.7 amd64 [installed,automatic]
    mongodb-org-tools/trusty,now 3.2.7 amd64 [installed,automatic]

I have attached ticket.tar, where:

    $ tar -tf ticket.tar
    ticket/
    ticket/diagnostic.data.tar.bz2
    ticket/mongod-mmap.conf
    ticket/log-mmap
    ticket/log-wt
    ticket/mongod-wt.conf

    $ tar -tf diagnostic.data.tar.bz2
    mongodb-mmap/diagnostic.data/
    mongodb-mmap/diagnostic.data/metrics.2016-06-28T03-03-13Z-00000
    mongodb-mmap/diagnostic.data/metrics.interim
    mongodb-wt/diagnostic.data/
    mongodb-wt/diagnostic.data/metrics.interim
    mongodb-wt/diagnostic.data/metrics.2016-06-28T05-03-02Z-00000
    mongodb-wt/diagnostic.data/metrics.2016-06-28T03-03-04Z-00000

thomas.schubert commented on Wed, 29 Jun 2016 05:49:02 +0000:

Hi edgarcosta,

Thank you for opening this ticket. I downloaded the dataset and ran the same query twice on WiredTiger with the default configuration (the first run with the cache empty, the second with the cache hot). When the cache was empty, the query took 14 seconds; with the cache hot, it took 0.2 seconds. The performance is on the same order of magnitude for MMAPv1. Please see the logs of the queries below.
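For context on the cache comparison above, a minimal mongo shell sketch to check how the WiredTiger cache is being used on the WiredTiger node; the printed labels are mine, while the statistic names come from the standard serverStatus() output:

    // Compare WiredTiger's configured cache size with its current occupancy.
    // A cold cache on the first run of a large scan would explain a slow first query.
    var cache = db.serverStatus().wiredTiger.cache;
    print("configured cache (GB): " + (cache["maximum bytes configured"] / 1e9).toFixed(1));
    print("bytes in cache (GB):   " + (cache["bytes currently in the cache"] / 1e9).toFixed(1));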
WiredTiger:

    2016-06-29T01:38:57.747-0400 I COMMAND [conn3] command dump.forms command: count { count: "forms", query: { dimension: 2.0, degree: 2.0 }, fields: {} } planSummary: IXSCAN { dimension: -1, field_label: 1, level_norm: -1 } fromMultiPlanner:1 keyUpdates:0 writeConflicts:0 numYields:1700 reslen:47 locks:{ Global: { acquireCount: { r: 3402 } }, Database: { acquireCount: { r: 1701 } }, Collection: { acquireCount: { r: 1701 } } } protocol:op_command 14476ms

    2016-06-29T01:39:01.427-0400 I COMMAND [conn3] command dump.forms command: count { count: "forms", query: { dimension: 2.0, degree: 2.0 }, fields: {} } planSummary: IXSCAN { dimension: -1, field_label: 1, level_norm: -1 } keyUpdates:0 writeConflicts:0 numYields:447 reslen:47 locks:{ Global: { acquireCount: { r: 896 } }, Database: { acquireCount: { r: 448 } }, Collection: { acquireCount: { r: 448 } } } protocol:op_command 216ms

MMAPv1:

    2016-06-30T15:35:24.301-0400 I COMMAND [conn8] command dump.forms command: count { count: "forms", query: { dimension: 2.0, degree: 2.0 }, fields: {} } planSummary: IXSCAN { dimension: -1, field_label: 1, level_norm: -1 } fromMultiPlanner:1 keyUpdates:0 writeConflicts:0 numYields:3015 reslen:47 locks:{ Global: { acquireCount: { r: 6032 } }, MMAPV1Journal: { acquireCount: { r: 3016 } }, Database: { acquireCount: { r: 3016 } }, Collection: { acquireCount: { R: 3016 } } } protocol:op_command 18768ms

    2016-06-30T15:35:27.790-0400 I COMMAND [conn8] command dump.forms command: count { count: "forms", query: { dimension: 2.0, degree: 2.0 }, fields: {} } planSummary: IXSCAN { dimension: -1, field_label: 1, level_norm: -1 } keyUpdates:0 writeConflicts:0 numYields:447 reslen:47 locks:{ Global: { acquireCount: { r: 896 } }, MMAPV1Journal: { acquireCount: { r: 448 } }, Database: { acquireCount: { r: 448 } }, Collection: { acquireCount: { R: 448 } } } protocol:op_command 161ms

To continue to investigate this issue, would you please archive (tar or zip) the $dbpath/diagnostic.data directory and attach it to this ticket for both the WiredTiger and MMAPv1 nodes?

Thank you,
Thomas
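A rough way to reproduce the cold-vs-hot comparison above from the mongo shell, rather than reading the slow-query log, is sketched below; the loop and labels are illustrative only:

    // Run the count twice and print the server-reported execution time for each run;
    // the first run populates the cache, the second should be much faster.
    for (var i = 1; i <= 2; i++) {
        var res = db.forms.explain("executionStats").count({ dimension: 2, degree: 2 });
        print("run " + i + ": " + res.executionStats.executionTimeMillis + " ms");
    }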
Start two mongod servers, one with MMAPv1 and the other with WiredTiger.

Download:

http://lmfdb.warwick.ac.uk/data/dump/latest/hmfs/forms.bson (70GB)
http://lmfdb.warwick.ac.uk/data/dump/latest/hmfs/forms.metadata.json

Restore that collection, and perform the query on both servers.
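A rough terminal sketch of those steps, assuming the hmfs database seen in the logs above; the ports, dbpaths, and log paths are placeholders and not part of the original report:

    # Start one mongod per storage engine (dbpaths/ports are placeholders)
    mongod --storageEngine mmapv1     --dbpath /data/mmapv1 --port 27017 --fork --logpath mmapv1.log
    mongod --storageEngine wiredTiger --dbpath /data/wt     --port 27018 --fork --logpath wt.log

    # Restore the downloaded collection into each instance
    mongorestore --port 27017 --db hmfs --collection forms forms.bson
    mongorestore --port 27018 --db hmfs --collection forms forms.bson

    # Run the query against each server
    mongo --port 27017 hmfs --eval 'print(db.forms.find({dimension: 2, degree: 2}).count())'
    mongo --port 27018 hmfs --eval 'print(db.forms.find({dimension: 2, degree: 2}).count())'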