MongoDB 4.0.12 (~2.3 TB of data). After an unclean shutdown of a replica set member, the member attempted replication recovery by replaying stored operations from the oplog:

2020-01-23T21:44:10.390-0600 I REPL [initandlisten] Replaying stored operations from { : Timestamp(1578569519, 42) } (exclusive) to { : Timestamp(1579831824, 148) } (inclusive).

I couldn't wait for it to replay 15 days of stored operations because this was the only member left in the replica set (the secondary was down for an unrelated problem). I decided to start it as a standalone and drop the local database, but got:

> db.dropDatabase()
{ "ok" : 0, "errmsg" : "not authorized on local to execute command { dropDatabase: 1.0, lsid: { id: UUID(\"60d44cb0-c23e-4c0e-a9b1-c6939e326c40\") }, $readPreference: { mode: \"secondaryPreferred\" }, $db: \"local\" }", "code" : 13, "codeName" : "Unauthorized" }

I was only able to delete the oplog.rs collection; when the node then starts as a replica set member, it sits in `STARTUP2` state and tries to perform an initial sync:

2020-01-23T1... E REPL [replication-0] Initial sync attempt failed -- attempts left: 0 cause: InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.
2020-01-23T... F REPL [replication-0] The maximum number of retries have been exhausted for initial sync.
2020-01-23T... I STORAGE [replication-0] Finishing collection drop for local.temp_oplog_buffer (d663e044-dbec-4167-b95a-c9a703595672).
2020-01-23T22:... E REPL [replication-0] Initial sync failed, shutting down now. Restart the server to attempt a new initial sync.
2020-01-23T22:... F - [replication-0] Fatal assertion 40088 InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync. at src/mongo/db/repl/replication_coordinator_impl.cpp 727
2020-01-23T22:... F - [replication-0] ***aborting after fassert() failure

Starting the node with a different replica set name doesn't help either:

2020-01-23T19:... I REPL [conn117] replSet set names do not match, ours: rs1; remote node's: rs5
...
2020-01-23T1... E REPL [replication-0] Initial sync attempt failed -- attempts left: 0 cause: InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.
2020-01-23T22:... F REPL [replication-0] The maximum number of retries have been exhausted for initial sync.
2020-01-23T22:... E REPL [replication-0] Initial sync failed, shutting down now. Restart the server to attempt a new initial sync.
2020-01-23T22:... F - [replication-0] Fatal assertion 40088 InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync. at src/mongo/db/repl/replication_coordinator_impl.cpp 727
2020-01-23T22:... F - [replication-0] ***aborting after fassert() failure

I want to be able to restart the node as a replica set member, but it looks like this can't be done without dropping the local database.
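The Unauthorized error above is a privileges problem, not a replication problem: dropping the local database requires an account with sufficient rights (or a mongod started without access control). A minimal mongo shell sketch, assuming the node is running as a standalone and that an admin user with adequate privileges exists; the credentials here are placeholders, not taken from the ticket:

```javascript
// Hedged sketch: connect to the standalone, authenticate, and drop "local".
// "admin" / "secret" are placeholder credentials.
var conn = new Mongo("localhost:27017");
var adminDB = conn.getDB("admin");
adminDB.auth("admin", "secret");   // needs a role permitting dropDatabase on "local"
var localDB = conn.getDB("local");
printjson(localDB.dropDatabase()); // should report ok: 1 on success
```

Alternatively, temporarily restarting mongod without access control enabled avoids the authentication step entirely.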
carl.champain commented on Mon, 3 Feb 2020 18:36:39 +0000: anas.mansouri10@gmail.com, The following log indicates that the node is being restarted with a replica set config that already has members in it, which implies that it is not a new replica set:

2020-01-29T19:17:10.598-0600 I REPL [replexec-0] New replica set config in use: { _id: "rs5", version: 321542, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 1, host: "nj-mongo1.chi-dc-10.com:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "mongo10.chi-dc-10.com:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 3, host: "mongo12.chi-dc-10.com:26016", arbiterOnly: true, buildIndexes: true, hidden: false, priority: 0.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 60000, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 } } }

You are using the name of an existing replica set; that's why the node is still trying to initial sync from other members of that set. You should first remove the member from the replica set. If you are unable to do so, restart the node on a different port number, which will make it unreachable by the other members. Finally, try to convert the node into a new replica set (with a new, unused name). That said, the SERVER project is for bugs and feature suggestions for the MongoDB server. As this ticket does not appear to be a bug, I will now close it. If you need further assistance troubleshooting, I encourage you to ask our community by posting on the mongodb-user group or on Stack Overflow with the mongodb tag.
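The isolation step described above can be sketched as follows; the port number, dbpath, and bind address are placeholders, assuming the node normally listens on 27017:

```shell
# Hedged sketch: restart the node on a different port, and without --replSet,
# so the other members of the old replica set can no longer reach it.
# 27018 and /data/db are placeholders for the actual port and dbpath.
mongod --port 27018 --dbpath /data/db --bind_ip localhost
```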
Kind regards, Carl

anas.mansouri10@gmail.com commented on Mon, 3 Feb 2020 03:40:08 +0000: @carl.champain I can't remove the member from the replica set because this node is a standalone, and what we're trying to do is restart it as a new replica set. As mentioned before, the node was in a replica set (1 primary, 1 secondary (node A, down), 1 arbiter) before the unclean shutdown, but I decided to restart it as a standalone because it tried to perform a replication recovery of 15 days of stored operations. I followed the steps to convert the standalone into a new replica set, but I can't execute step 4 because the node is in `STARTUP2` state and tries to do a resync (even though I dropped the local database while the node was a standalone). What I think is happening is that even after dropping the local database, the member still remembers the old replica set config and tries to perform a resync. I attached a copy of the log. mongod-log

carl.champain commented on Fri, 31 Jan 2020 20:04:32 +0000: anas.mansouri10@gmail.com, A couple of things: Did you follow the steps to remove a member from a replica set? Then you can convert the standalone into a new replica set. Can you please provide the full mongod.log file for this node? Thanks, Carl

anas.mansouri10@gmail.com commented on Thu, 30 Jan 2020 01:48:43 +0000: We were able to drop the local database. However, the member is still in `STARTUP2` and tries to perform an initial sync when restarted as a replica set member:

2020-01-29T19:18:49.876-0600 I REPL [replication-0] Initial sync attempt finishing up.
2020-01-29T19:18:49.876-0600 I REPL [replication-0] Initial Sync Attempt Statistics: { failedInitialSyncAttempts: 9, maxFailedInitialSyncAttempts: 10, initialSyncStart: new Date(1580347030615), initialSyncAttempts: [ { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" }, { durationMillis: 0, status: "InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.", syncSource: ":27017" } ] }
2020-01-29T19:18:49.876-0600 E REPL [replication-0] Initial sync attempt failed -- attempts left: 0 cause: InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync.
2020-01-29T19:18:49.876-0600 F REPL [replication-0] The maximum number of retries have been exhausted for initial sync.
2020-01-29T19:18:49.876-0600 I STORAGE [replication-0] Finishing collection drop for local.temp_oplog_buffer (1a12ab88-9ae7-46ff-919c-d2a0d6a8028d).
2020-01-29T19:18:49.887-0600 E REPL [replication-0] Initial sync failed, shutting down now. Restart the server to attempt a new initial sync.
2020-01-29T19:18:49.887-0600 F - [replication-0] Fatal assertion 40088 InitialSyncOplogSourceMissing: No valid sync source found in current replica set to do an initial sync. at src/mongo/db/repl/replication_coordinator_impl.cpp 727
2020-01-29T19:18:49.888-0600 F - [replication-0] ***aborting after fassert() failure

We can't perform an initial sync since the second node's (A) data dir was wiped out and that node is currently down. We were hoping to restart the standalone as a replica set and then add (A) to it.

carl.champain commented on Mon, 27 Jan 2020 16:51:14 +0000: Hi anas.mansouri10@gmail.com, Thank you for the report. Unfortunately, the described approach is detrimental to the replica set's state. First, the error from running db.dropDatabase() indicates that you don't have permission to run such operations, so please check your user's privileges; for example, the clusterManager role should allow you to drop a database. Second, dropping only the oplog.rs collection is not enough; that's why you are getting the "Initial sync attempt failed" error. You need to drop the entire local database to avoid this behavior. Third, changing the name of the replica set is not the correct procedure to restart the node as a member of a new replica set, since the local database wasn't dropped. A few options to get this node back into an operational state:
1. Perform an initial sync: you will need one of the other nodes back online; the node will not replay from the oplog, but the sync will take time to complete.
2. Restore the data from a backup.
3. Convert the standalone into a new replica set: that should also avoid replaying the oplog.
Kind regards, Carl
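The third option, converting the standalone into a new replica set, roughly follows MongoDB's documented standalone-to-replica-set conversion procedure. A sketch only; the port, dbpath, and the set name "rsNew" are placeholders, and the mongod commands are shown in the foreground (in practice they would run under a service manager or with --fork and --logpath):

```shell
# Hedged sketch of the standalone-to-new-replica-set conversion.
# Port, dbpath, and the set name "rsNew" are placeholders.

# 1. Start the node WITHOUT --replSet so it comes up as a true standalone.
mongod --port 27017 --dbpath /data/db

# 2. Drop the stale local database, removing the old replica set config.
mongo --port 27017 --eval 'db.getSiblingDB("local").dropDatabase()'

# 3. Restart the node with a new, unused replica set name.
mongod --port 27017 --dbpath /data/db --replSet rsNew

# 4. Initiate the new single-member replica set.
mongo --port 27017 --eval 'rs.initiate()'
```

Once the node is PRIMARY in the new set, node (A) can be re-added with rs.add() after its data directory is rebuilt via initial sync.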