...
A Cat9K switch may experience high kernel memory utilization due to a memory leak in tams_proc. Users can check overall memory utilization with the following command:

Cat9K# show platform software status control-processor brief
Memory (kB)
Slot   Status    Total    Used (Pct)     Free (Pct)    Committed (Pct)
1-RP0  Critical  7748648  7357056 (95%)  391592 ( 5%)  8454256 (109%)  <<<----- "Used" is at 95% here.

Users can also check how much memory tams_proc specifically is holding:

Cat9K# show processes memory platform sorted
System memory: 7748556K total, 7340948K used, 407608K free,
Pid    Text  Data     Stack  Dynamic  RSS      Name
----------------------------------------------------------------------
13101  63    3264356  136    3261012  3264356  tams_proc  <<<----- The RSS value shows that tams_proc is holding over 3GB here
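When the check has to be repeated across many switches, the per-process lookup above can be scripted. The following is a minimal sketch, assuming the text of show processes memory platform sorted has already been captured as a string (for example over SSH) and that the columns appear in the order shown above. The function name find_process_rss_kb and the SAMPLE text are illustrative, not part of any Cisco tooling.

```python
def find_process_rss_kb(output, process_name):
    """Return the RSS column (kB) for process_name, or None if it is not listed.

    Assumes the column order shown in the advisory:
    Pid Text Data Stack Dynamic RSS Name
    """
    for line in output.splitlines():
        # Drop any trailing annotation such as '<<<----- note'.
        fields = line.split("<<<")[0].split()
        if len(fields) == 7 and fields[0].isdigit() and fields[6] == process_name:
            return int(fields[5])
    return None


# Sample text copied from the output in this advisory.
SAMPLE = """\
System memory: 7748556K total, 7340948K used, 407608K free,
Pid    Text  Data     Stack  Dynamic  RSS      Name
----------------------------------------------------------------------
13101  63    3264356  136    3261012  3264356  tams_proc  <<<----- over 3GB
"""

rss_kb = find_process_rss_kb(SAMPLE, "tams_proc")
if rss_kb is not None:
    # 1024 * 1024 kB per GiB
    print(f"tams_proc RSS: {rss_kb} kB ({rss_kb / (1024 * 1024):.2f} GiB)")
```

Comparing the returned value between two captures taken some time apart is one way to see whether tams_proc is still growing.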
When the underlying ACT2 chip is offline, entropy collection fails, and the failed collection allocates memory that is never freed. The following log indicates an issue with collecting entropy:

%ENTROPY-0-ENTROPY_ERROR: Unable to collect sufficient entropy

Users can check the status of the ACT2 chip using the following command:

Cat9K# show crypto entropy status
#  Entropy source  Type  Status   Requests  Entropy Bits
1  ACT-2           HW    Offline  112861    --               <<<-----------------
2  randfill        SW    Working  112861    128/14461056(*)
3  getrandombytes  SW    Working  112861    160/18076320(*)

Secure mode enabled
(*) - The entropy collected from SW sources were not counted as a part of achieving the entropy target
Fresh entropy collected once every 60 minutes
Entropy most recently collected 72192 minutes ago
Entropy target = 256 bits; entropy actually collected = 384 bits
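The status check above can also be automated. The following is a minimal sketch, assuming the text of show crypto entropy status has already been captured as a string and that data rows follow the column order shown above (index, source, type, status, requests, entropy bits). The function name offline_entropy_sources and the ENTROPY_SAMPLE text are illustrative only.

```python
def offline_entropy_sources(output):
    """Return the names of entropy sources whose Status column reads 'Offline'.

    Assumes data rows look like: <index> <source> <HW|SW> <status> <requests> <bits>
    """
    offline = []
    for line in output.splitlines():
        # Drop any trailing annotation such as '<<<-----'.
        fields = line.split("<<<")[0].split()
        is_data_row = (
            len(fields) >= 5
            and fields[0].isdigit()
            and fields[2] in ("HW", "SW")
        )
        if is_data_row and fields[3].lower() == "offline":
            offline.append(fields[1])
    return offline


# Sample text copied from the output in this advisory.
ENTROPY_SAMPLE = """\
#  Entropy source  Type  Status   Requests  Entropy Bits
1  ACT-2           HW    Offline  112861    --               <<<-----
2  randfill        SW    Working  112861    128/14461056(*)
3  getrandombytes  SW    Working  112861    160/18076320(*)
Secure mode enabled
Fresh entropy collected once every 60 minutes
"""

for name in offline_entropy_sources(ENTROPY_SAMPLE):
    print(f"Entropy source {name} is offline")
```

An empty list from the same function after a reload would suggest the ACT2 chip has come back online.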
A reload can potentially reset the ACT2 chip, bringing it back online. After the reload, check the output of show crypto entropy status to confirm the status of the ACT2 chip. A reload also frees the memory leaked by tams_proc.
This bug is caused by a hardware malfunction in the ACT2 chip. The memory leak itself has been fixed in 17.13, but that fix does not repair the broken entropy source on the affected device; it only pushes the problem down the line. Because this bug is not triggered on devices with a working ACT2 chip, the fix will not be backported to software versions earlier than 17.13.