Info
This is a very unlikely bug that has only been reproduced by building an index on a tiny capped collection (maxSize=1) with a high number of concurrent inserts.
Update: This was also observed in SERVER-56062 on collections that were not trivially small.
If an index build's collection scan resumes after yielding and cannot restore its cursor because the saved position was deleted, the index build crashes at this invariant with a CappedPositionLost error.
Example:
Invariant failure","attr":{"expr":"status.isA() || status.isA()","msg":"Unnexpected error code during index build cleanup: CappedPositionLost: CollectionScan died due to position in capped collection being deleted. Last seen record id: RecordId(1)
Just aborting the index build would not be a complete solution, because a secondary could hit this error independently of the primary and still crash.
I think we can safely restart the collection scan if we hit a CappedPositionLost error. This does pose a liveness issue, since a scan that keeps losing its position could restart indefinitely, but I think the circumstances of hitting this bug are extreme enough to warrant this solution.
Top User Comments
gregory.wlodarek commented on Mon, 3 May 2021 22:46:42 +0000:
Marking this as a duplicate of SERVER-56062. SERVER-56062 restarts the collection scan phase when it encounters CappedPositionLost.
louis.williams commented on Thu, 18 Jun 2020 19:23:00 +0000:
Assuming we agree on the solution, I think this would involve moving this call to initiateBulk inside insertAllDocumentsInCollection and then wrapping that in a retry if we hit a CappedPositionLost exception.
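
The retry described above could look roughly like the sketch below. This is not the server's code: CappedPositionLost, BulkBuilder, and scanWithRetry are hypothetical stand-ins, and the maxAttempts bound is an assumption added here to cap the liveness risk (the ticket itself only proposes restarting the scan). The point it illustrates is that each attempt starts from a freshly initialized builder, so keys gathered by an abandoned scan are discarded.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for the server's CappedPositionLost error.
struct CappedPositionLost : std::runtime_error {
    CappedPositionLost() : std::runtime_error("CappedPositionLost") {}
};

// Hypothetical stand-in for the state that initiateBulk would set up.
struct BulkBuilder {
    int keysInserted = 0;
};

// Runs the scan, restarting it from scratch whenever it reports
// CappedPositionLost. Every attempt gets a fresh BulkBuilder, so partial
// results from a failed scan never leak into the final result.
// Returns the number of attempts used. Any other exception propagates.
// The maxAttempts bound is an assumption of this sketch, not part of the ticket.
int scanWithRetry(const std::function<void(BulkBuilder&)>& scanCollection,
                  BulkBuilder& result,
                  int maxAttempts) {
    assert(maxAttempts > 0);
    for (int attempt = 1;; ++attempt) {
        BulkBuilder fresh;  // analogous to calling initiateBulk again
        try {
            scanCollection(fresh);
            result = fresh;  // publish only a fully completed scan
            return attempt;
        } catch (const CappedPositionLost&) {
            if (attempt == maxAttempts) {
                throw;  // give up instead of restarting forever
            }
        }
    }
}
```

Because the builder is re-created inside the loop, moving the initialization into the retried function (as suggested for initiateBulk) is what makes the restart safe; retrying with a builder that already holds keys from the failed scan would risk duplicates.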