Symptom
Cisco NX-OS crashes with an aclqos hap reset on Cisco Nexus 9300-GX/GX2 switches when a redirect action is used as part of a port or VLAN ACL. The issue can also occur after an interface flap on the switch, even when the configuration has not changed.
Conditions
The bug is seen only when a redirect action is used as part of port or VLAN ACLs on the GX platform, and a single redirect ACL is enough to trigger it.
This is a timing issue: it occurs when the aclqos process receives more than one down notification for the same port, where that port is part of the redirect list.
The crash happens when aclqos tries to remove the already-down port from the redirect list for the second time.
For example:
ip access-list REDIRECT_ACL
10 permit ip any any vlan 10 redirect port-channel12
interface port-channel115
switchport
switchport mode trunk
ip port access-group REDIRECT_ACL in
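The double-removal pattern described in the conditions can be illustrated with a small model. This is only a sketch under stated assumptions, not the aclqos implementation: the `RedirectList` class and its `remove_unguarded` and `remove_guarded` methods are hypothetical names, and the failure is modeled as a Python exception rather than a process crash. It shows why a second down notification for the same port is harmful when the removal path assumes the port is still present, and how an idempotent removal would tolerate the duplicate.

```python
# Hypothetical model of a redirect list; names are illustrative only
# and do not correspond to real aclqos data structures.

class RedirectList:
    def __init__(self, ports):
        # Ports currently used as redirect destinations.
        self.ports = list(ports)

    def remove_unguarded(self, port):
        # Assumes the port is still in the list. A duplicate port-down
        # notification makes this raise ValueError (stand-in for the crash).
        self.ports.remove(port)

    def remove_guarded(self, port):
        # Idempotent removal: a duplicate notification is a no-op.
        if port in self.ports:
            self.ports.remove(port)
            return True
        return False


if __name__ == "__main__":
    # Two down notifications for the same port, as in the timing issue.
    notifications = ["port-channel12", "port-channel12"]

    safe = RedirectList(["port-channel12"])
    for port in notifications:
        safe.remove_guarded(port)  # second call does nothing

    unsafe = RedirectList(["port-channel12"])
    try:
        for port in notifications:
            unsafe.remove_unguarded(port)
    except ValueError:
        print("second removal failed: port no longer in redirect list")
```

In this model, the guarded variant simply ignores the repeated notification, while the unguarded variant fails on the second removal. Whether the actual software fix takes this form is not stated in this note.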
Reset reason:
Reason: Reset Requested due to Fatal Module Error
Service: aclqos hap reset
Decoded Stack Trace:
1: 0x7f302a31bfd2 aclqos_nx_redirect_mcast_group_update ---> ../platform/dc3/aclqos/nx/aclqos_nx_redirect.c:1587
2: 0x7f302caf301d aclqos_update_redirect_mcast_group ---> ../platform/dc3/aclqos//common/cl/aclqos_pltfm_init.c:2973
3: 0x561c3583ae1c aclqos_process_pacl_txlist ---> ../feature/forwarding-sw/aclqos/server/aclqos_msg_handlers.c:5163
4: 0x561c357e7d1b aclqos_mts_msg_handler ---> ../feature/forwarding-sw/aclqos/server/aclqos_fu.c:379
5: 0x561c357e80cf aclqos_demux ---> ../feature/forwarding-sw/aclqos/server/aclqos_fu.c:446
6: 0x7f302997cf8f fu_fsm_engine_process_app_ev ---> ../utils/fsmutils/fsm.c:2216
7: 0x7f302997eb5f fu_fsm_engine ---> ../utils/fsmutils/fsm.c:2548
8: 0x561c357ef106 main ---> ../feature/forwarding-sw/aclqos/server/aclqos_main.c:851
Decoded core file/aclqos:
Service: aclqos
Description: ACLQOS Daemon
Executable: /lc/isan/bin/aclqos
Start type: SRV_OPTION_RESTART_STATEFUL (24)
Death reason: SYSMGR_DEATH_REASON_FAILURE_SIGNAL (2)