Symptom
The N3500 reloads with "spm hap reset", or only the SPM process crashes. A core file may or may not be generated.
Conditions
The following error may be seen before the crash:
%AFM-2-AFM_TCAM_UTIL_ABOVE_THRESHOLD: nat tcam utilization has gone above the threshold of 90% on ASIC id 0
In other words, NAT must be configured and heavily utilized for this issue to occur.
In addition, the current allocated memory of the following library in SPM keeps increasing. For example:
TYPE NAME                                    ALLOCS            BYTES
                                             CURR     MAX      CURR       MAX
...
116 [r-xp]/isan/plugin/1/isan/lib/libspm.so  2913705  2916326  46619280   46771792
...
116 [r-xp]/isan/plugin/1/isan/lib/libspm.so  7628055  7630322  122048880  122196488
This can be checked with "show system internal spm mem-stats detail".
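To compare two captures of the libspm.so row, the counters can be parsed and subtracted. The following is a minimal sketch in Python, not a Cisco tool. It assumes the row layout shown above (TYPE, NAME, current and maximum ALLOCS, current and maximum BYTES), and the function names parse_libspm_row and growth are hypothetical.

```python
def parse_libspm_row(line):
    """Parse one libspm.so row of the mem-stats output into named counters.

    Assumed layout (taken from the header shown above):
    TYPE NAME ALLOCS-CURR ALLOCS-MAX BYTES-CURR BYTES-MAX
    """
    fields = line.split()
    if len(fields) != 6 or not fields[1].endswith("libspm.so"):
        raise ValueError("not a libspm.so mem-stats row: %r" % line)
    allocs_curr, allocs_max, bytes_curr, bytes_max = (int(f) for f in fields[2:])
    return {
        "allocs_curr": allocs_curr,
        "allocs_max": allocs_max,
        "bytes_curr": bytes_curr,
        "bytes_max": bytes_max,
    }


def growth(earlier, later):
    """Return how much the current allocation counters grew between two samples."""
    return {
        "allocs": later["allocs_curr"] - earlier["allocs_curr"],
        "bytes": later["bytes_curr"] - earlier["bytes_curr"],
    }


# The two example rows from this report.
first = parse_libspm_row(
    "116 [r-xp]/isan/plugin/1/isan/lib/libspm.so 2913705 2916326 46619280 46771792"
)
second = parse_libspm_row(
    "116 [r-xp]/isan/plugin/1/isan/lib/libspm.so 7628055 7630322 122048880 122196488"
)
delta = growth(first, second)
print(delta["allocs"], delta["bytes"])
```

A steadily positive difference between successive captures is consistent with the leak described here. In both example rows the current byte count is exactly 16 times the current allocation count.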
Workaround
There is currently no workaround other than removing the NAT configuration or reloading the switch to release the memory. To determine when to reload the switch, monitor the output of the following command:
show system internal process memory | i spm
which should generate output similar to this:
`show system internal process memory | i spm`
3660 ? Ss 03:24:27 0 0 370684 586484 2.3 /isan/bin/spm
29971 pts/0 S+ 00:00:00 0 99 956 3012 0.0 egrep -- spm
The column immediately before the memory percentage (586484 in the /isan/bin/spm line above, the third field from the end of the line) is the virtual memory used by the process. In other words, it is the total memory currently used by SPM. If it exceeds 500000, reload the switch. The above output was captured from a switch that was about to crash.
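The check described above can be sketched as a small Python helper that works on captured command output. This is an illustration only. It assumes the SPM value is the third field from the end of the /isan/bin/spm line (586484 in the sample), and the names spm_virtual_memory, should_reload and RELOAD_THRESHOLD are hypothetical.

```python
RELOAD_THRESHOLD = 500000  # value named in the workaround text


def spm_virtual_memory(output):
    """Return the SPM virtual memory value from captured output, or None.

    Assumes the value is the third field from the end of the line whose
    last field is /isan/bin/spm, as in the sample output above.
    """
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[-1] == "/isan/bin/spm":
            return int(fields[-3])
    return None


def should_reload(output, threshold=RELOAD_THRESHOLD):
    """True when the SPM value was found and is above the threshold."""
    value = spm_virtual_memory(output)
    return value is not None and value > threshold


sample = """`show system internal process memory | i spm`
3660 ? Ss 03:24:27 0 0 370684 586484 2.3 /isan/bin/spm
29971 pts/0 S+ 00:00:00 0 99 956 3012 0.0 egrep -- spm
"""
print(spm_virtual_memory(sample), should_reload(sample))
```

The egrep line is skipped because its last field is not /isan/bin/spm, so only the SPM process itself is evaluated.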
Further Problem Description
Simply put: this is a day-one issue triggered by the number of translations the switch is handling. If translations are cleared in large quantities at once, SPM does not correctly release TVL allocations.