Symptom
On a software SDR reload, the old active node takes too long to go down, resulting in HEADLESS_SDR warning logs and core dumps.
Conditions
The exact scenario may vary, but a software reload with more than 100 interfaces present in Linux is likely to hit this issue. This count includes interfaces in the default VRF unless TPA is disabled for the default VRF.
Workaround
If TPA is not required (e.g. if neither ZTP nor gRPC telemetry is used), TPA can be disabled for the default VRF using "tpa vrf default disable".
If TPA is required and a software reload is needed, the kim process should be manually shut down ("process shutdown kim") before the reload to avoid hitting this issue.
Further Problem Description
The issue is caused by the TPA Linux kernel module taking a long time to clean up while the VM/container is being shut down. For small numbers of interfaces the delay is not significant, but with more than roughly 100 interfaces the router may hit this issue.
For accurate diagnosis, the kernel logs (e.g. kern.log from 'show tech os admin') should be checked for a large number of entries such as:
Oct 26 20:16:08 host kernel: [ 2832.169911] lcndklm_vrf: inf: Deleting netdevice: ifh Tg0_0_0_0/0x120 vrf default/0x60000000 (via lci)
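
As a rough aid for the log check described above, the following Python sketch counts "Deleting netdevice" entries per VRF in a kern.log file that has already been extracted from the show tech archive. The regular expression is derived only from the single sample line in this note, so other releases may format the message differently. The function name count_netdevice_deletions is hypothetical and is not part of any Cisco tooling.

```python
import re
from collections import Counter

# Pattern assumed from the sample log line in this note; adjust it if
# the kernel module logs the message differently on your release.
DELETE_RE = re.compile(
    r"lcndklm_vrf: inf: Deleting netdevice: "
    r"ifh (?P<ifname>\S+?)/0x[0-9a-fA-F]+ "
    r"vrf (?P<vrf>\S+?)/0x[0-9a-fA-F]+"
)


def count_netdevice_deletions(lines):
    """Return a Counter mapping VRF name to the number of matching entries."""
    per_vrf = Counter()
    for line in lines:
        match = DELETE_RE.search(line)
        if match:
            per_vrf[match.group("vrf")] += 1
    return per_vrf
```

The function accepts any iterable of lines, so it can be called with an open file, for example count_netdevice_deletions(open("kern.log", errors="replace")). A count in the default VRF above roughly 100 around the time of the reload is consistent with the condition described in this note, but it is not proof on its own.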