...
Normal write permissions are the only requirement for writing to the following collections in `local`:

- replset.election
- replset.minvalid
- replset.oplogTruncateAfterPoint
- startup_log

Could we require more explicit write privileges to prevent changes to these collections?

original description

I'm testing interaction between our code and a MongoDB replica set. During the test suite we start up a MongoDB instance and clean it up between tests. One of our tests checks that the infrastructure can recover if a test resets a password and the server needs to be torn down and set up again. Without --replSet having been set, the test passes just fine. With --replSet, I get a failure with this backtrace:

```
mongod(_ZN5mongo15printStackTraceERSo+0x3A) [0x5650b0d4597a]
mongod(+0x222EC3E) [0x5650b0d44c3e]
mongod(+0x222F112) [0x5650b0d45112]
libpthread.so.0(+0x12890) [0x7ff109b80890]
libc.so.6(gsignal+0xC7) [0x7ff1097bbe97]
libc.so.6(abort+0x141) [0x7ff1097bd801]
mongod(_ZN5mongo22invariantFailedWithMsgEPKcRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES1_j+0x0) [0x5650af3d90e4]
mongod(_ZNK5mongo4repl33ReplicationConsistencyMarkersImpl11getMinValidEPNS_16OperationContextE+0x258) [0x5650af6c7d78]
mongod(_ZN5mongo4repl8SyncTail17_oplogApplicationEPNS0_11OplogBufferEPNS0_22ReplicationCoordinatorEPNS1_14OpQueueBatcherE+0x170) [0x5650af79c2c0]
mongod(_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_11OplogBufferEPNS0_22ReplicationCoordinatorE+0x11E) [0x5650af79d8ee]
mongod(+0xC76EB6) [0x5650af78ceb6]
mongod(_ZN5mongo8executor22ThreadPoolTaskExecutor11runCallbackESt10shared_ptrINS1_13CallbackStateEE+0x19A) [0x5650b05dc13a]
mongod(+0x1AC682A) [0x5650b05dc82a]
mongod(_ZN5mongo10ThreadPool10_doOneTaskEPSt11unique_lockISt5mutexE+0x1AF) [0x5650b056181f]
mongod(_ZN5mongo10ThreadPool13_consumeTasksEv+0xAA) [0x5650b0561b5a]
mongod(_ZN5mongo10ThreadPool17_workerThreadBodyEPS0_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x95) [0x5650b05624c5]
libstdc++.so.6(+0xBD57F) [0x7ff10a40057f]
libpthread.so.0(+0x76DB) [0x7ff109b756db]
libc.so.6(clone+0x3F) [0x7ff10989e88f]
----- END BACKTRACE -----
2019-04-10T17:36:10.691+0400 I NETWORK [conn6] end connection 127.0.0.1:44610 (0 connections now open)
```

Digging through the code, it seems to be this invariant:
https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/replication_consistency_markers_impl.cpp#L117

```cpp
bool ReplicationConsistencyMarkersImpl::getInitialSyncFlag(OperationContext* opCtx) const {
    auto doc = _getMinValidDocument(opCtx);
    invariant(doc);  // Initialized at startup so it should never be missing.
```

I don't know why the doc is missing in this case.
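To make the privilege concern concrete, here is a minimal sketch using the mgo driver that appears later in this report (the address and the absence of credentials are assumptions): an ordinary client with normal write access to `local` can remove the consistency-marker document that the invariant above later depends on.

```go
package main

import (
	"log"

	mgo "gopkg.in/mgo.v2"
)

func main() {
	// Dial as an ordinary client; no special role is required.
	session, err := mgo.Dial("127.0.0.1:27017")
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Normal write permission on the local database is enough to delete
	// the minvalid document, even though mongod later invariant()s that
	// the document exists.
	if _, err := session.DB("local").C("replset.minvalid").RemoveAll(nil); err != nil {
		log.Fatal(err)
	}
}
```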
xgen-internal-githook commented on Mon, 27 Jan 2020 23:27:08 +0000:
Author: {'email': 'spencer.jackson@mongodb.com', 'username': 'spencerjackson', 'name': 'Spencer Jackson'}
Message: SERVER-40560 Make local.replset.* collections non-normal resources
Branch: master
https://github.com/mongodb/mongo/commit/e846a6d700a9e4a5b67b7fe4360de9ee97ffe846

eric.sedor commented on Tue, 16 Apr 2019 19:50:42 +0000:
jameinel, we are assigning this to an appropriate team to evaluate against our planned work. Updates will be posted as they happen.

eric.sedor commented on Thu, 11 Apr 2019 20:34:22 +0000:
Hi jameinel, thank you for your report. We especially appreciate the additional proactive effort you've taken to find the triggering action for the failure and the related documentation. We'll look into this.

jameinel commented on Thu, 11 Apr 2019 14:55:53 +0000:
So if there is a bug, it is that local.replset.minvalid isn't local.system.replset.minvalid, and/or that mongod doesn't prevent you from deleting a document whose existence it will later invariant() on. I'm happy that this is a bug on our side that we can fix, but it should be something that users can't actually do.

jameinel commented on Thu, 11 Apr 2019 13:22:31 +0000:
Oddly, I found it with:

--setParameter "logComponentVerbosity={verbosity:1}"

With that, I was able to see that we were indeed removing the collection:

```
I COMMAND [conn7] command config.$cmd command: delete { delete: "transactions", ordered: true, writeConcern: { getLastError: 1 }, $db: "config" } numYields:0 reslen:229 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { w: 1 } } } protocol:op_query 0ms
I COMMAND [conn7] command local.$cmd command: listCollections { listCollections: 1, cursor: { batchSize: 0 }, $db: "local" } numYields:0 reslen:239 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { R: 1 } } } protocol:op_query 0ms
I COMMAND [conn7] getmore local.$cmd.listCollections command: { getMore: 6923256052014468026, collection: "$cmd.listCollections", batchSize: 0 } originatingCommand: { listCollections: 1, cursor: { batchSize: 0 }, $db: "local" } planSummary: QUEUED_DATA cursorid:6923256052014468026 ntoreturn:0 keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:7 reslen:1354 locks:{} 0ms
I COMMAND [conn7] command local.me command: collStats { collstats: "me", $db: "local" } numYields:0 reslen:13597 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 1ms
I WRITE [conn7] remove local.me command: { q: {}, limit: 0 } planSummary: COLLSCAN keysExamined:0 docsExamined:1 ndeleted:1 keysDeleted:1 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { w: 1 } } } 0ms
I COMMAND [conn7] command local.$cmd command: delete { delete: "me", ordered: true, writeConcern: { getLastError: 1 }, $db: "local" } numYields:0 reslen:229 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { w: 1 } } } protocol:op_query 0ms
I COMMAND [conn7] command local.replset.election command: collStats { collstats: "replset.election", $db: "local" } numYields:0 reslen:13641 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 0ms
I WRITE [conn7] remove local.replset.election command: { q: {}, limit: 0 } planSummary: COLLSCAN keysExamined:0 docsExamined:1 ndeleted:1 keysDeleted:1 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { w: 1 } } } 0ms
I COMMAND [conn7] command local.$cmd command: delete { delete: "replset.election", ordered: true, writeConcern: { getLastError: 1 }, $db: "local" } numYields:0 reslen:229 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { w: 1 } } } protocol:op_query 0ms
I COMMAND [conn7] command local.replset.minvalid command: collStats { collstats: "replset.minvalid", $db: "local" } numYields:0 reslen:13639 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 1ms
I WRITE [conn7] remove local.replset.minvalid command: { q: {}, limit: 0 } planSummary: COLLSCAN keysExamined:0 docsExamined:1 ndeleted:1 keysDeleted:1 numYields:0 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { w: 1 } } } 0ms
I COMMAND [conn7] command local.$cmd command: delete { delete: "replset.minvalid", ordered: true, writeConcern: { getLastError: 1 }, $db: "local" } numYields:0 reslen:229 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { w: 1 } }, Collection: { acquireCount: { w: 1 } } } protocol:op_query 0ms
```

Looking here: https://docs.mongodb.com/manual/reference/local-database/

It doesn't define quite a few of those collections ("local.replset.election", "config.transactions"). We already have code that explicitly skips all collections that start with "system.", so is there a reason that there are now "local.replset" collections that aren't under "local.system.replset"?

jameinel commented on Thu, 11 Apr 2019 12:59:26 +0000:
More debugging today. If I launch the mongodb server with:

--profile=2

then it stops crashing. (I was hoping that by setting profiling, I would see what the bad query was that causes it to fail.)

jameinel commented on Thu, 11 Apr 2019 09:36:02 +0000:
I'm able to replicate the invariant failure even more easily with this change:

```diff
--- a/mgo.go
+++ b/mgo.go
@@ -332,9 +332,14 @@ func (inst *MgoInstance) run() error {
 	defer session.Close()
 	session.SetMode(mgo.Monotonic, true)
 	if err := session.Run(bson.D{{"replSetInitiate", nil}}, nil); err != nil {
-		panic(err)
+		return err
 	}
-
+	// now wait until we can get a valid connection before continuing.
+	session2, err := inst.Dial()
+	if err != nil {
+		return err
+	}
+	session2.Close()
 	return nil
 }
```

Essentially, the failure appears if I just connect again immediately after calling replSetInitiate. Note that I am waiting for replSetInitiate to return, but maybe there is something else I should be waiting for?
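One possible answer to that question: replSetInitiate returning only means the config was accepted, not that the node has finished electing itself primary. Below is a hedged sketch of waiting for that state — waitForPrimary is hypothetical (not part of the juju/testing suite), and it simply polls isMaster with the mgo driver:

```go
package main

import (
	"fmt"
	"log"
	"time"

	mgo "gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// waitForPrimary polls the isMaster command until the node reports itself
// primary or the timeout expires. replSetInitiate returning does not mean
// the election has completed, so tests that reconnect immediately can race
// with replication startup.
func waitForPrimary(session *mgo.Session, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		var res struct {
			IsMaster bool `bson:"ismaster"`
		}
		if err := session.Run(bson.D{{"isMaster", 1}}, &res); err == nil && res.IsMaster {
			return nil
		}
		time.Sleep(100 * time.Millisecond)
	}
	return fmt.Errorf("timed out waiting for a primary after %s", timeout)
}

func main() {
	session, err := mgo.Dial("127.0.0.1:27017")
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()
	if err := waitForPrimary(session, 30*time.Second); err != nil {
		log.Fatal(err)
	}
}
```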
I haven't been able to replicate this with the mongo shell:

```bash
#!/bin/bash
rm -rf 0
mkdir 0
# Start mongo in the background, and wait for its message
mongod --dbpath 0 --bind_ip 127.0.0.1 --nssize 1 --noprealloc --smallfiles --replSet=sstxn-test --nounixsocket --setParameter enableTestCommands=1 --oplogSize 10 >0/out.txt &
# Wait until mongod is waiting for connections
(tail -f -n0 0/out.txt) | grep -q --line-buffered "waiting for"
mongo local --eval "
  db.replset.minvalid.find().pretty();
  rs.initiate();
  db.replset.minvalid.find().pretty();
"
mongo local --eval "db.replset.minvalid.find().pretty()"
killall -SIGINT mongod
wait
```

But I also notice that the mongo shell always prints out an

Implicit session: session { "id" : UUID("41b5734c-74b0-4837-a6cb-8a73cd1033c5") }

line, so it is doing more than just connecting to mongod.
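If the implicit session is suspected of changing behavior, newer versions of the legacy mongo shell have a flag to start without one (treat the flag name as an assumption and check mongo --help):

```bash
mongo --disableImplicitSessions local --eval "db.replset.minvalid.find().pretty()"
```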
original reproduction

I'll give the exact steps I used to reproduce, including the binary that I am testing against. I'm assuming the issue is not about the binary, though.

```bash
snap install juju-db
PATH="/snap/juju-db/current/bin:$PATH"
go get -v github.com/juju/testing/...
cd github.com/juju/testing
git remote add jameinel github.com/jameinel/juju-testing
git checkout mgo-with-replSet
export TEST_LOGGING_CONFIG="juju.testing=TRACE"
go test -check.v -check.f 'mgoSuite.*(Reset|Proxied)'
```

It does seem to require that both TestNewProxiedSession and TestResetWhenUnauthorizied are run, so I'm guessing it is actually an issue with the test suite tear-down followed by another setup. I realize this isn't a minimal reproduction with only existing mongo tools in play, but given that the failing piece is an invariant deep in the code, it does seem like a missed expectation. Note also that the test does not fail 100% of the time, but probably close to 90%, so there is a bit of a race involved, just one that typically fails.

The setup of MongoDB during the test suite is being done here:
https://github.com/jameinel/juju-testing/blob/d2cd4adac5bf0330ef9cea5dcc18d485c2cb2b3c/mgo.go#L221

If there is no --replSet (and also no replSetInitiate call), then the tests pass 100% of the time. I can run other tests/try to find a smaller reproducer if that is relevant.
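Given the eventual fix (the commit above makes local.replset.* non-normal resources), a defensive version of the test-suite cleanup could also skip server-managed namespaces on its own. A sketch with the mgo driver — cleanupLocal and its skip list are illustrative, not code from juju/testing:

```go
package main

import (
	"log"
	"strings"

	mgo "gopkg.in/mgo.v2"
)

// cleanupLocal empties user collections in the local database between tests
// while skipping namespaces the server manages itself. The skip list reflects
// the collections named in this ticket plus the oplog.
func cleanupLocal(session *mgo.Session) error {
	db := session.DB("local")
	names, err := db.CollectionNames()
	if err != nil {
		return err
	}
	for _, name := range names {
		// Leave replication bookkeeping alone: deleting from
		// replset.minvalid is what trips the invariant.
		if strings.HasPrefix(name, "system.") ||
			strings.HasPrefix(name, "replset.") ||
			name == "startup_log" || name == "oplog.rs" {
			continue
		}
		if _, err := db.C(name).RemoveAll(nil); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	session, err := mgo.Dial("127.0.0.1:27017")
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()
	if err := cleanupLocal(session); err != nil {
		log.Fatal(err)
	}
}
```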