
OPERATIONAL DEFECT DATABASE

Inserting a record with thousands of values in an indexed field caused the system to hang.

The database has approximately 60 billion records. I inserted a record with ~100,000 values for one key and the mongod process essentially hung. It spent 3 days processing the request, and during that time each shard in the replicaset would not accept new connections. Reads meanwhile still worked just fine from connections that were already open.

The application I wrote can handle write/read failures gracefully, so I restarted the primary shard, and neither secondary shard was promoted (PSS architecture, 26 shards total). The newly restarted PRIMARY couldn't connect to any secondary. I then restarted all the shards in the replicaset, and none of them will come back up. They keep replaying the oplog, trying to write this same transaction, erroring, and trying again. They also try to connect outbound, but none of them are listening because they are all busy processing the oplog.

Ubuntu 22.04 - 5.15.0-86-generic #96-Ubuntu SMP
Mongo 6.0.12

..."attempts":13463}}...
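For context on why one record can be this expensive: on an index over an array field, each distinct array element generally becomes its own index key, so a single insert carrying ~100,000 values implies on the order of 100,000 index writes inside one operation. The following is only a minimal sketch of one possible mitigation, splitting such a record into several smaller documents before inserting them. It assumes the `linenum` and `sessionids` field names that appear in the log below; the helper names, the `part` field, and the chunk size of 1,000 are hypothetical and not part of any MongoDB API.

```python
from typing import Any, Dict, Iterator, List


def multikey_entry_count(values: List[Any]) -> int:
    """Approximate number of index keys one document adds to an array index.

    Assumption: one key per distinct array element.
    """
    return len(set(values))


def split_record(base: Dict[str, Any], field: str, values: List[Any],
                 chunk_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield copies of `base`, each holding at most `chunk_size` values in `field`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for part, start in enumerate(range(0, len(values), chunk_size)):
        doc = dict(base)
        doc[field] = values[start:start + chunk_size]
        doc["part"] = part  # hypothetical field that keeps the chunks ordered
        yield doc


if __name__ == "__main__":
    ids = [f"id-{i}" for i in range(100_000)]
    docs = list(split_record({"linenum": 35}, "sessionids", ids))
    # Number of documents produced and the largest per-document key count.
    print(len(docs), max(multikey_entry_count(d["sessionids"]) for d in docs))
```

Whether splitting is acceptable depends on how the application queries the data; the sketch only shows that the per-write index work can be bounded.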
{"t":{"$date":"2023-12-03T04:27:39.130+00:00"},"s":"D3", "c":"STORAGE", "id":22413, "ctx":"ReplWriterWorker-0","msg":"WT rollback_transaction","attr":{"snapshotId":1518010}}
{"t":{"$date":"2023-12-03T04:27:39.130+00:00"},"s":"D1", "c":"WRITE", "id":4640401, "ctx":"ReplWriterWorker-0","msg":"Caught WriteConflictException","attr":{"operation":"applyOplogEntryOrGroupedInserts_CRUD","namespace":"database.collection","attempts":13463}}
{"t":{"$date":"2023-12-03T04:27:39.185+00:00"},"s":"D1", "c":"WTWRTLOG", "id":22430, "ctx":"JournalFlusher","msg":"WiredTiger message","attr":{"message":{"ts_sec":1701577659,"ts_usec":185089,"thread":"4152853:0x7fdc39a99640","session_name":"WT_SESSION.log_flush","category":"WT_VERB_LOG","category_id":20,"verbose_level":"DEBUG","verbose_level_id":1,"msg":"log_flush: flags 0x4 LSN 28293/7000192"}}}
{"t":{"$date":"2023-12-03T04:27:39.185+00:00"},"s":"D4", "c":"STORAGE", "id":22419, "ctx":"JournalFlusher","msg":"flushed journal"}
{"t":{"$date":"2023-12-03T04:27:39.236+00:00"},"s":"D3", "c":"REPL", "id":21254, "ctx":"ReplWriterWorker-0","msg":"Applying op (or grouped inserts)","attr":{"op":{"lsid":{"id":{"$uuid":"64649147-795b-4d60-b933-75f9342cf688"},"uid":{"$binary":{"base64":"u4nTF1+wmByGgmwndZCCo3FgRx9gUEtGEkFRhsYwq3A=","subType":"0"}}},"txnNumber":1151,"op":"i","ns":"database.collection","ui":{"$uuid":"4e33280f-7578-49a7-8cf0-4eb149c9d6a2"},"o":{"_id":{"$oid":"6566a09432acf3ad46c309d5"}
,"linenum":35,"sessionids":["+++KIRT3UUNDXVYN7YCSCNJZ","+++UMMUA1LJAMWCME2GDTYD0","++-FPRX7E1RKPDKBEUIAK9AR","++-IKOOJT1XDCBXJALSRTCDX","++1RWUL5LRKT17YZGOPYYYOO","++1VB7TRZDLHIQDLEF1KAOO7","++37XZVPOOX60IY4LFQQDNZE","++3MQNL4LYTBNLBEPSTRQ8SK","++3ZDK4AUGOVTGKOKW2EKLS0","++4WQCQ3TP4ISAD930LVSO-8","++8NKSSBTUZCQWKQPWXMD-GN","++9HZQFHPOCKEKNGKEC--ASM","++9YB9GIEKTTHPYTBCVI+6SY","++AOULKEEIN37QO98A276S+I","++AQJA3IN3THJBJJ6IRMEV-P","++BEJTRTWPOR-H-6361MTSYG","++BEK0DPY0KJ4LXV1A9SBMNG","++BXGBXMFTIEGTVSX8M9SDZM","++BY-PBDWMJAJ16Y7S75VHEU","++C5DWFXPQPR1CTMFVRWPUY9","++CNGNVNNKEB+1OF0Z5LMRWM","++CS8PDWEGROS4B75E-IBGWG","++D0SZNJJ18BI213TPAG5VC3","++DN8PF8JENFEORYBVAZQXEV","++DPO8TMSJRAXQGWBINV1RGY","++E0ZMXRGP3CXMNUOCSJ8ZBV","++E2PJRT08CLLINTZL38KTH4","++EAOELV8LADXNNMIAAU6ZL2","++EBP4XO2ZGCISAPRKG1QKUO","++EGPO0VVQJHTCNIZEL8C8G7","++EI88+SHXUTJK4MSC18QKTE","++EWKA+HFIRLBSHBTXOETUC1","++EXZDPX08TGDJA+BTK9OKXP","++FXT-QXSYHICYUC+BULLXIV","++G64XD4BSY7KHJZIMPHU444","++GDS5VBIZCKQNDWALSDXVCT","++GTXVO5KSG-W+SXH5IHZTLF","++HF91OASMTDLNH2DCELLZBR","++HGSLNWUNA+SDSJH9HUX7EN","++I6NLNZHZNQL0OTSPIXDCP8","++ITOMUQIVWLYMM22ERISKNR","++IXC7LZ5E6JM+TGPRZXXYC1","++IY3LN4VHEWTLWSMJCT0YRA","++IYUNJUVVUXRFUNEJMTY5CZ","++J9JYB5ZORHBGYOQUNWWBIR","++JGOJ3DXZTXQSH7+TYRA6LE","++JH1737JT5GTYPZULVRWQVZ","++JIZ6TXRCKMW8GOD6GCIQM+","++JRX2QSUYQMT-SWMRGZYJRS","++KNX5O5VYPDFVJQB49VUW95","++KSYFOKXELJRJL2VMHO9G27","++L25IY4BR9FH9W0I0G-4ABV","++LI4KECOOTV5JM9HHRVMUTI","++LSSYTOXNR-BHBXVL90AXPC","++LX9K5V0L5ZWKLXL7J8BK6J","++M5DGSW2+AOEXD-H6XZGAZ3","++MJ63-8HJCMNDQUKIWOE3IO","++O+GA3R6MMIOQDGEL08JERI","++ODDSV3IC20TKQIR3WSJASE","++OU1Z3OVRRPGE5POKZMJN+H","++OZLPHL2MV45ZERWSBXT76Q","++PDP3-1DGSCGAPETITMLSEY","++POF8VYYFIDCGM9NULFDIIY","++Q1MO3L9MBHDCNN40CZDZGE","++Q6VYHQNWW3CIQ+B5ELYBGF","++QDAOR4RZLMI2GUGIQ8TQDL","++QQ8JYRAJP25DALW+HPTYAE","++QVTCINSTMZY1KQXJDM1GPJ","++R6YUDKAK5WHXD3D7XZKT6J","++RI9AMLJHVLKFEVIFYIROMY","++RUYJ8OCYVTFGIX8V2ZT4JA","++SYHLP2DH8NPCAOOQ10WRCB","++TGOTE-UGEOAIGYJ8HQBCSC","
++THIK9VJYWDDTPQDRSI302B","++TSAF6T8YAI2JR-DIDHR7VD","++U6HWDH3E5WV7J3LAK0MWLB","++UK52G4VQR9IAYA5ILMJQGP","++URNZ3TITOEMAGWANHT+RHN","++UVNYHMV+HS+9FPQZ4LIQ06","++VGOM71BRJTPB1TM0GGQGOE","++VJTQBV4NNHDB8CJKUALX1V","++W-864G0W0KRSV7GTUZWIQ3","++WACSHB3NRHBVYPSVVI3XTX","++WAQ852AAH-DHOB5BBR48QX","++WIDTRBGRV9ZNHB6G0PFQ5L","++WM05POAA0GLAI41IGZXNRJ","++WNZERWKUIHR9JQADR0ANIN","++WOCC8VGLKAWYQW9WXM2TTN","++X-SUWR0M5FOJDRC--JNSUL","++XPRKKQW4SVU4CC+EVVPXHP","++XSDL6AS1YN+FO+URXGN88G","++XURA-PITSNPFWYWK7KAAPI","++YG0WP6KAVHEFLFXV34L7-S","++YNRYYWV+U8ZA2NBBUP5F1Q","++YP+W-OCVM4POQNCO03O3PV","++YTLQBI0KJ-PPBEVMXV7BPV","++ZDDIUJFLDZCUCSDXG-GRL9","++ZNMSBNQ+FAR-NA4ABN7SAL","++ZO0MVVQZULPJWT70TOW4N5","++ZTWHOCG+SXSGSHG5FWBJ6G","++ZZC8CXSDMMGCLUAZI+ACSI","+-+AWAFVNG51VRT0LVUTHCDW","+-+WODNRZJJSWYJSXQ2WVILZ","+---C-OIVPQCDZTZ2HEAQVOP","+--ZPFHVPEAMMDAPN7T3YUT4","+-2+D2EZRG2ZN5A+2+ERSXAY","+-2-O-ZDB0XM3RY-NTDGQQIL","+-2TMEMNXQ8UHZEJRFAQRZTY","+-4PTRHL9ZOC-ZL2Y3+VMOL7","+-50STQP+0DECDNBEKSKHB-E","+-5CUYOTSH3AIG-IMJRL-L8G","+-5J3DBY3Y-N-63PTFVIGODW","+-5VDAN766MQIVHV4E2FBJHR","+-7IXR3HTXJNDAQW-E3MVHKQ","+-7LMR5TZ8VFYOCFI3FIF5IG","+-8KVICJWARD2M5HJFSGY3ML","+-9KUIB-JGMXYTVGM38COIHG","+-9T1TWRRTNW0W-K14R7Y0J7","+-9U0BJZHVLCW7CHH9O-K-QC","+-ACUUGGLTEITH3P+A+R19I2","+-B-XPRGAVX9TMPWTDTVRZ5V","+-BBEXSUOL9JEQUTRH9GZI6P","+-BGF3JLNOQS3EZOMA2OV0CX","+-BKHNJYXDVOXRJ98I2ER8KN","+-BW-OES0BUOIDRHJCAPG0ER","+-CMVZJGNA5DOJYGCGH-JZ1+","+-CQHTSKRXNL-OSI81J6Y2I8","+-CWQNEW7JLQNY-A+6WU-P+I","+-DJ-I1-GLTZYOOSEZL-LXGP","+-DPNUWWTBV+EMBMXZWYL03S","+-DQWANHPA04GO0CYOMHLPGY","+-DWMWYZ6UUHYC8ILGYYYFJT","+-E5F-7U9QH8-JEWONQDDYQP","+-EJC-A73HK4TRK+WFBLP4WS","+-EPGZCXFDYERPI6HBSWPMP0","+-F2+AAX9LF6GFTTDLDALKMD","+-FBZAW0MRRJXOUXGNTCZAPV","+-FCP7F6MH2RADZAYVM3DHVG","+-FSXRO44UVIKCZ6R50Z7OCM","+-HMVAFCLAALREWNBWC5-OKP","+-HODPM7AZ3R6OE2ZS8HJSFL","+-HQKVU7VTSIVSYWMXJBB6LX","+-HRVYLQVJOZELZ4KWV6LR8B","+-IIAZ+MYAU+RQSWX5HK22JM","+-JDJE37IE72BUD3NSSMTF2G","+-JGF3DJR1+AQKAGD04D+MYM","+-JL7GHDIML-ZNYX4CISEVSZ","+-
JPR0KJBB-2QTIVI-TGWH-R","+-JXRJ3PJMKQCG8GOW-NQVFL","+-K28AS7CF6SX3PQ6QIOQ7ZI","+-KJYZVXEVW1SBAI4V1RELGW","+-L2P98FMONRPWVEVUF23BZV","+-MJM1I0ZJFKT6C-N-F2Z6QU","+-MRWJJ4E+DKETFYF9-1NN+2","+-MT4DDIWQRGLD2T-VVRLQE+","+-MTA+PWK3EF0XK-AQB5A29Z","+-N+SO5ST4F5SOWXPEQBAJX6","+-NSZMW5V71TOUGP+74P5JKJ","+-OH0MSJEJXKWYOMX9RG7UAV","+-OUMIL5D6IYEL1S6KHIR+9H","+-P11OJZJA3RZ+P2FIRYDWC2","+-P7FD-C9PHXEIJCEASDDOHZ","+-PH-YJGUFD8PAGY2RBXZWOQ","+-PI4LAME4HKV15LI+CZY2DY","+-PNRASMXI3QPBAF+FSZUQFH","+-Q7GUECU7JUVRIZO67L1VRD","+-QCUYF+6M8TTMJOARGGCPBE","+-QHOYKH27EQUA+3OJSQSYPV","+-RADDP6NXGB3YSOVO80ZKYY","+-RAGF5KSEMF9IQE1ZMFYDZ9","+-RGTKBZQV1AXEKQSOK9KT3G","+-RKTJCHEA1DZF3S81MM6BH+","+-RUJ+LC5CE8+YPPXIOR17JH","+-RY0L8S3YL8AROA+JKC0COW","+-SBMTKIBEYZUPIOX6D89UCC","+-SECVDFOSRTE2S5REITZ1WZ","+-SLLFJIGFYN52MPS-OASVRI","+-SNDMVYUWH2-GAGF86520TG","+-SQRHJONZHBZUDQYBCV8BQA","+-T2Y4OA1O7Y2ZECGKZTT7OL","+-U2TO6VLPTELEE1RTXVN5CR","+-UE+WEHVBVHQGGP9WBYWTHL","+-UEG6ASOJDR94ZI8RNIWZP7","+-UGHTK-8EBKCYJ5C95U7MSD","+-UIBM24NX8PZCDMJILFNHGN","+-UXUG2F04YJYVGLMJFOE3-A","+-UYYYMYWFL6RBWFEFLXVPQ2","+-VL2VMTO0ZCNEE-XQMKY+RT","+-WA5ZM0QKQGSZKVTWDK6E7A","+-WD3NEW5IJAMHRY0VAHY5UW","+-WNTAM5XSWWGTJSYO0QGC49","+-XC7V9ZCCUIFSVVCQRUX3V9","+-YGVJIIQZ+XTVQPGLYBUSMD","+-YK-DHG5KAV-Q89DVTNYY78","+-YME3PNZ3QTBXQP0JK2ZUPE","+-YXIJKZBNJ6GOY4SSBBNDSK","+-Z4KWO3OBBVHUCYHEN9XP8Q","+-ZPCCUKPFHFDGAYGNIHHBM-","+-ZTELTHVROW4LZSKF-KW8NQ","+-ZTUC3HS6KBKRMOMY0AN8XQ","+-ZZBPMCXWHQBT2LA6V39K-F","+0+CK-EEUUXY1DM3XMJUC9Q-","+0-DKMCV1LE4XU7KPQZPCJCF","+0-LLV1ULPUD0PXNYPJ6T5YH","+00JE2MRU3QRWOMAU1VHVLXH","+01EEBAETPCIBRKGUPIAM25H","+01N8GCPCTZIKTKR649BW1O7","+01QV9E2AIJFJRC5TPYMO0-D","+02HPZQPP07BN5SKSZONIFQP","+02OBZB9SC3DBD1SCJTCVTFG","+036-8FHZIVLZ+INC6X6TYNI","+03YRMSZYZ2GWYNOD8SUQUEM","+05FARD7M3FMZRFUEH6QBY6L","+05T8MEPQMLSYRANULM2RERP","+06L52FSMIQTEPV9OI5UHQMC","+06YM-ALNECUCMG7BM0C4PLR","+07HLLKFRUTFD1AAZ6ID8X5+","+07TWY0YQZRU1GV2OWZ0BAWU","+08MHDMCBE6G+2R3FATH4A+J","+08V9YAFQNVDHES6FXCHBO+M","+08Y4WNJQ2UNLMXJX4C1GFJB","+09-
2UGNXYT5CHEUECIZOVBS","+0A2RXHECHEC1QVAAQFU2GO9","+0ALA4C-OODZXIXG-QKWEKSR","+0AR3D4NMAX6A4WX+2SB3O65","+0AU+N9U44V6I4EGKJE4PYBL","+0BDLICC8DZ0DJKDU6YZSTC8","+0BTQNHHUFKSV7POVQ0OGQAR","+0C++KWLLTG4R2AYXEBPHZM7","+0CADRGL7BJNAHJQN1WA2HWG","+0CLDLZQ6DZPEPUDCVNB7QXB","+0CLZ6NUO0FMDIHM7BVQUITF","+0CNK2OAVMNVD+-E9ATXGHMQ","+0D6MNSXYQ2O2KTBDXYZQX4A","+0DD06NEKUX1NCAVFMSBLB9I","+0DJXT3PQCQCGXHCHWRWEHUO","+0DQA822YTUJP5LM8D5LXEK8","+0ETFSGVKFFLOHSCDYKQBPVN","+0FB1KO8J4S05HA2LLBPBOR9","+0FD8U1PCV8RHDQMN8VDPQZK","+0FUDWDTX9TDS7WMDDSBARN7","+0G1B24WYOJLKJZ-S6AYPVSM","+0G4T8G4RUCHQATQPBZX7V0R","+0GBXEESZESGRTR5+SPZMINC","+0GLPTOTGYECFNQYTZUS9GTM","+0GQM-MNT+V6MUBVOCTALTKT","+0GYWW6ZWN0JZYEVVJ2N7F2A","+0HYDCXBFVPFXBWGIKYCBBAI","+0HZRQY2MK8W3CNRCGHC1SNG","+0IBNKJ92A4N-1ZUPWBQYSZU","+0ICKIPBZUG6NL2FKQWMRRV0","+0IXHCKNPWDPAH2GNAPU3V0V","+0IXUE8NLIYYCYXOQCXL3ZBC","+0J6JC7THCLMO6V49R5LI646","+0J96AIEWUJFEBHZWVJHALHI","+0JMOAIIYBVGRPUJEMCQTAJL","+0JMXYDBSUOALA49GEDC5AIG","+0JVNNMB2MGXC0FDSZ5NUEU8","+0K-V408F78JT07-D2ZN3HMZ","+0KB-2RX6G6R20KWBJRCQPN-","+0KFXODTZKXMLBI-NKBMBPEN","+0LKKBZF+VX6HQ4SSGEEU26J","+0LN0H6NCMUXM9I3AREAOOEJ","+0LNGP0W4DZXXJUF2YDL3WTY","+0MDHHMGNIFTP-9OB3W3O140","+0MMR8X52M5A09+B22QNHHUF","+0MWJCASDO8UTDT-QUJN9IA2","+0MYKL7LXJWBZLANKGN1DJQV","+0MZKK2O714HLW8LVWG9FBVD","+0NNP5IPSBABWSHBC342KNBP","+0NYNIMSBMCIRWDXS7+O3SOE","+0O3M-TCSNX972G9OMPVUIM1","+0OJ+RJXW2IYLFB3T0XFFCD5","+0OJCL-B5MDUG+FH2ZB3IETF","+0OLAQSTKA7ACVKYFW94PDUA","+0PTYS-CWAVQXNOWDWXRM+CR","+0QGJ8U3297WBHTWESP2FWTM","+0R+0DROSOXD0QE5LZMUHUZT","+0S6L0E1QBKU4UTRVALAQBNI","+0S7NRLHLCQSWQXSCHZMGNA7","+0SCP8MJH8SISI237E2VA3EK","+0SDFH1EGEAEYEQMQHFDBGAL","+0SZOP5DS7XMGUREJEIUHR-V","+0T2ZWUOWAMIL0JIUOOM807S","+0TFOY9ZUXJNYBVBS2P1IMAT","+0TKUTMOAZD8AGCQSVAWF30A","+0UIY95RXIKOGSSFV2GBB0KU","+0ULQPDUQFFXXJQE9U4PW+KV","+0UOUTSX+UGLTM9FVHX2LRTF","+0UTW2-WFMN1MPF2717I3V92","+0UYTEH-N-DX-HIKMY78QJJ8","+0VYT-QGUNWUT854EYXNRM2D","+0W81L55OSXXBNJFNLIVDGYH","+0WNEZPR4BZWRDG5NNSIRAQX","+0XO25UA4P+CLLWYKBBNKAH6","+0XPAV
-CD7DA-KEGJZYR0EIF","+0YD7WQVCXSWQPWF39FZLSNE","+0YZDHFUEEP6RAF+MPVSNXBY","+0Z-EPTUEFPQB8NWAKRP478B","+0ZER+ZQLAQJYIWZ5BDUAHY4","+0ZJLQGFI+SOQW3ZYW7RJYGN","+0ZPNUYSJFGEXZD9NWG4JSZW","+0ZSMRKHGTXLO4SFL8CPA6JG","+1++TDWVTNNQXN4DLS9QII+P","+1+F5VMU2JOARIUNTR-+FMYW","+1+KB+I8IZHEVJHJOXVSVVUW","+1-KPPJU7IQALSWUGF7QZZ4+","+1-L27J0B1P-YYOKQMFEIRCX","+10Q0WMELFLDSJ93EY-4JYUL","+12CKX-LJ3EWL1TOAX4EP41F","+13S+SD8NX2QTIAYFRKOLRQH","+13SPNQGWD0YHKBDCDW8UJCB","+14PRSMKDHHO-T3DEZG5BTIF","+15QW2PYAX25EJJYKFMEBDHU","+179MDCTAVRFHW84NMDBQYWS","+17GUGBXIBMDX7HGTWTI7UUA","+17ICNPG+OKZBNPXGKF7UOW1","+17IEWUYUK7FGENDF+D8ZEZ8","+192E6DHEXD7PRUKKLFQXXJA","+19WOMK5U8L0WZ-GTKZAOC8Y","+1A54IYNNBPZ-R7QOBXAJSFM","+1AB2ROEM2T2I+DDTTSJ08HR","+1ANP-8+BOEIMCC-S+LYHNTC","+1BFVXQ42BSLNK5UTD8JS6WR","+1CEWTNV1MTD-LKP7CKQ2UN8","+1CF7P3AKIUE3BZN3WMLA8NV","+1CGIHBTPKUBPR45PCOR7SC2","+1CSBZ0OITD2VE9QF2HDHTK2","+1CTBVJ8UIN6G41DJ71OLDLT","+1CX+C6ENG4IK9UWMTK1T2X+","+1DGKCR7E0QZFIOUQYQ9QZBS","+1DZQLTAN7VMNQ1P5LZVRR6N","+1E2EADOXCJJS3USXWMWOFJW","+1F0MFRNQMOOM5GOWQ6S9BZF","+1FKMRKYAHWWJEQWZWXEIPUS","+1FTZBMEQIH0TMVAICRHVHUR","+1G9SAIZGA4TTU--HXZVPXI-","+1GHLQPAFNW8G7BHZ5NAOGTJ","+1GLHM-VIUW0G4LPFMLD0C3G","+1GN5GMFBRK0ELI9BS83F3-9","+1GW8VPHEU9ZAAWHRPYZKKIC","+1GYE+DLD3NGJKW-JAIJCLHY","+1IP9F5EV1Y4S10ZDSI0OKAR","+1IQE+MIFOBNQ6FOCG1P2AX4","+1IVUQQMBI7ECWAMNYJMFVL1","+1IW8XVTJILT7GONOJRAUPHR","+1JSGFN7ATAB49KZG6W-D7BT","+1JVEB-MMKHXHBDFIIB528SN","+1KYSDIX1AKPYW1NCUQYXP3O","+1LE701ZU1KKNV36WPQ6EACF","+1LGPSLO+NDDW1KG3JKBUUUD","+1LKVG2OQL9ZK1QHCEKB9SZX","+1LM-5BJVE3AHH3BURVQEXKB","+1LYOWX+NNVKOY1ETOUMY5S-","+1MHATVU-SIPQ9FCYVPCCVDP","+1MX+YXDQDWZZG07Q32LUOPB","+1NFVHVFXTW3BBJACQOEODK1","+1NJL99FL+XAZDXUOVDROA37"]}},"oplogApplicationMode":"StableRecovering"},"truncated":{"op":{"o":{"sessionids":{"358": {"type":"string","size":29} }}}},"size":{"op":14493733}} {"t":\{"$date":"2023-12-03T04:27:39.236+00:00"} ,"s":"D3", "c":"STORAGE", "id":22414, "ctx":"ReplWriterWorker-0","msg":"WT 
begin_transaction","attr":{"snapshotId":1518124,"readSource":"kNoTimestamp"}}
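The retry loop visible in the log above (a `Caught WriteConflictException` entry with `"attempts":13463`) can be spotted mechanically. The following is a minimal sketch, assuming the log file holds one JSON object per line and that the entries are emitted at the debug verbosity shown above. The function name and the threshold of 1,000 attempts are illustrative choices, not MongoDB defaults.

```python
import json
from typing import Iterable, List, Tuple


def find_retry_storms(lines: Iterable[str],
                      min_attempts: int = 1000) -> List[Tuple[str, str, int]]:
    """Return (timestamp, namespace, attempts) for each WriteConflictException
    log entry whose retry count is at least `min_attempts`."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(entry, dict):
            continue
        if entry.get("msg") != "Caught WriteConflictException":
            continue
        attr = entry.get("attr", {})
        attempts = attr.get("attempts", 0)
        if attempts >= min_attempts:
            timestamp = entry.get("t", {}).get("$date", "")
            hits.append((timestamp, attr.get("namespace", ""), attempts))
    return hits


if __name__ == "__main__":
    sample = ('{"t":{"$date":"2023-12-03T04:27:39.130+00:00"},"s":"D1","c":"WRITE",'
              '"id":4640401,"ctx":"ReplWriterWorker-0",'
              '"msg":"Caught WriteConflictException",'
              '"attr":{"operation":"applyOplogEntryOrGroupedInserts_CRUD",'
              '"namespace":"database.collection","attempts":13463}}')
    for hit in find_retry_storms([sample]):
        print(hit)
```

In practice the function could be fed an open log file object, since iterating a file yields lines.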
eric.sedor commented on Wed, 17 Apr 2024 21:48:24 +0000:
We haven’t heard back from you for some time, so I’m going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket.

eric.sedor commented on Wed, 13 Dec 2023 22:35:45 +0000:
Hi mzimmerman@gmail.com, I'm glad to hear you found a path forward that didn't involve forking. My strong suspicion, based on your report and reported solution, is that this is related to current transaction and cache size necessities. It may or may not be a bug that we couldn't get to a better error during a state of high cache thrash. I've created a secure upload portal for you. Files uploaded to this portal are hosted on Box, are visible only to MongoDB employees, and are routinely deleted after some time. If you still have this information available, then for each node in the replica set, spanning a time period that includes the incident, would you please archive (tar or zip) and upload to that link: the mongod logs, and the $dbpath/diagnostic.data directory (the contents are described here).
Gratefully, Eric

mzimmerman@gmail.com commented on Mon, 4 Dec 2023 14:28:31 +0000:
Cache full was a hint that I then followed. The size of our cache was configured to 16GB (our workload is very write-heavy, so having all indexes in RAM wasn't something I was trying to do, though I know that's best practice). Using unmodified code, increasing that cache size allowed the transaction to be written fully from the oplog, and the system didn't hang. After this particular transaction was inserted and the shard/replicaset was clean in a PSS architecture, I was able to stop/restart each shard with the configured 16GB of memory, and they've been running fine for ~8 hours without my modified code. Will I hit this limit again somehow in the future? What is it about inserting a large transaction that would cause this issue? Any one record can only be 16MB in size, so how could writing just one larger record hang a system with 16GB of memory configured for the cache?

mzimmerman@gmail.com commented on Mon, 4 Dec 2023 03:39:40 +0000:
After commenting out some of the error handling (attempting to tell mongo that the write succeeded and to just move on with recovering everything else in the oplog; I don't care about this one transaction that it's stuck on), I got another error message further along:
{"t":{"$date":"2023-12-04T03:38:49.357+00:00"},"s":"D1", "c":"WTTXN", "id":22430, "ctx":"ReplWriterWorker-1","msg":"WiredTiger message","attr":{"message":{"ts_sec":1701661129,"ts_usec":357612,"thread":"390727:0x7fb4944ca640","session_dhandle_name":"file:index-76--2567016947708944443.wt","session_name":"WT_CURSOR.insert","category":"WT_VERB_TRANSACTION","category_id":39,"verbose_level":"DEBUG","verbose_level_id":1,"msg":"Rollback reason: Cache full"}}}
I can see where this is called/triggered in the code, but I'm not sure how to "skip" this to continue along with this troubleshooting idea.
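Since the comments above point at cache pressure ("Rollback reason: Cache full"), one way to watch for a repeat is to compare WiredTiger cache usage against its configured maximum. The following is a minimal sketch, assuming the `wiredTiger.cache` section of a `serverStatus` result has already been fetched as a dictionary. The function name is hypothetical, and the sample numbers are illustrative rather than measurements from this incident.

```python
from typing import Any, Dict


def cache_fill_ratios(server_status: Dict[str, Any]) -> Dict[str, float]:
    """Return the used and dirty fractions of the configured WiredTiger cache.

    Expects the statistic names that serverStatus reports under wiredTiger.cache.
    """
    cache = server_status["wiredTiger"]["cache"]
    maximum = cache["maximum bytes configured"]
    return {
        "used": cache["bytes currently in the cache"] / maximum,
        "dirty": cache["tracked dirty bytes in the cache"] / maximum,
    }


if __name__ == "__main__":
    gib = 1024 ** 3
    # Illustrative values only: a 16 GiB cache that is mostly full.
    sample_status = {
        "wiredTiger": {
            "cache": {
                "maximum bytes configured": 16 * gib,
                "bytes currently in the cache": 12 * gib,
                "tracked dirty bytes in the cache": 4 * gib,
            }
        }
    }
    print(cache_fill_ratios(sample_status))
```

With a driver such as pymongo, the input dictionary would typically come from running the `serverStatus` command against each member; that call is not included here so the sketch runs without a server.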