...
If a partition qkview is generated, a number of issues can subsequently occur if blades in the partition have not been rebooted:

- If both system controllers are rebooted, backplane connectivity between the blades will fail, resulting in a complete loss of tenant connectivity. The system controllers do not need to be rebooted simultaneously. There will be repeated messages in the partition VELOS.log showing VQF IMM and EMM watchdogs from all blades in the partition:

  fpgamgr[14]: priority="Info" version=1.0 msgid=0x305000000000021 msg="Adding slot to VQF database set" slot=4.
  fpgamgr[14]: priority="Info" version=1.0 msgid=0x305000000000023 msg="Enabling VQF synchronization with slot" slot=4.
  fpgamgr[14]: priority="Info" version=1.0 msgid=0x305000000000024 msg="Slot activation status updated in VQF" slot=4 status=1.
  fpgamgr[14]: priority="Info" version=1.0 msgid=0x305000000000005 msg="VoQ programmed" blade=4 port=15 module="EMM" state="enabled".
  fpgamgr[14]: priority="Info" version=1.0 msgid=0x305000000000027 msg="VQF remote slot update" slot=4 ports="4 9 15 ".
  fpgamgr[14]: priority="Info" version=1.0 msgid=0x305000000000005 msg="VoQ programmed" blade=4 port=15 module="EMM" state="enabled".
  fpgamgr[14]: priority="Info" version=1.0 msgid=0x305000000000005 msg="VoQ programmed" blade=4 port=15 module="IMM" state="enabled".
  fpgamgr[14]: priority="Info" version=1.0 msgid=0x305000000000027 msg="VQF remote slot update" slot=4 ports="4 9 15 ".
  fpgamgr[14]: priority="Warn" version=1.0 msgid=0x305000000000008 msg="VQF IMM Watchdog." slot=4 port=15.
  fpgamgr[14]: priority="Info" version=1.0 msgid=0x305000000000005 msg="VoQ programmed" blade=4 port=15 module="IMM" state="disabled".
  fpgamgr[14]: priority="Info" version=1.0 msgid=0x305000000000027 msg="VQF remote slot update" slot=4 ports="".

- F5OS software will report incorrect link status for external interfaces if links change state.
- The system may continue to treat interfaces configured as members of a LAG as being up even after the interface goes down, resulting in traffic failures.
- Interfaces that were down at the point a partition Qkview was generated may fail to pass traffic properly even if the link comes up. Traffic will appear in a packet capture, but the system will not transmit egress traffic out the interface.
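To gauge how often the watchdog warnings described above appear, a saved copy of the partition log can be summarized per slot. The following is a minimal sketch, not an F5-provided tool: the function name count_vqf_watchdogs is hypothetical, the log path is whatever the caller supplies, and it assumes one message per line and that EMM watchdog messages follow the same format as the IMM message quoted above.

```shell
#!/bin/sh
# Hypothetical helper: summarize VQF watchdog warnings per slot and module.
# Argument: path to a saved copy of the partition log (supplied by the caller).
# Assumption: EMM watchdog lines look like the IMM line shown in the article.
count_vqf_watchdogs() {
    # Keep only watchdog lines, reduce each to "slot=N module=X", then count.
    grep -E 'msg="VQF (IMM|EMM) Watchdog\."' "$1" \
        | sed -E 's/.*msg="VQF (IMM|EMM) Watchdog\.".* slot=([0-9]+).*/slot=\2 module=\1/' \
        | sort | uniq -c
}
```

Each output line is a count followed by the slot and module, so a steadily growing count across all slots would be consistent with the symptom described above.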
- Tenant connectivity is broken.
- Incorrect link status reported in F5OS.
- Interface link state changes not reflected in LAG state.
A VELOS partition qkview is generated.
An Engineering Hotfix (EHF) is available on MyF5 Downloads at https://my.f5.com/manage/s/downloads?productFamily=F5OS&productLine=F5OS_for_VELOS&version=1.6.2&container=1.6.2-EHF. The EHF has fixes that prevent ID 1559525 and ID 1576241 from occurring.

To recover an affected partition without the EHF, reboot all the blades in the partition.

To prevent a partition qkview from triggering this issue, do the following:

1. Log in to the VELOS system controller CLI as root.
2. Log in to each blade in the partition via SSH: ssh blade-<number> (for example, ssh blade-1).
3. While logged in to the blade, run the following command:

   for layer in $(docker inspect partition_fpga | jq --join-output '.[0].GraphDriver.Data | .LowerDir, ":", .MergedDir' | tr ':' '\n'); do if [ -f "$layer/etc/qkview-collect/fpgatoolFpgaVqfSnapshotCmds.txt" ]; then sed -i 's/^linkscan/#linkscan/g' $layer/etc/qkview-collect/fpgatoolFpgaVqfSnapshotCmds.txt; fi; done

4. Reboot the affected blade.
5. Repeat steps 2 through 4 for every other blade in the partition, but only after the previous blade has returned to service.
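The sed expression in step 3 comments out every line that begins with linkscan in each copy of fpgatoolFpgaVqfSnapshotCmds.txt it finds. The sketch below applies the same substitution to a scratch file so its effect can be seen without touching a blade. The function name and the file contents are hypothetical placeholders, not the real contents of the F5 file, and the sketch prints the result instead of using -i to edit in place.

```shell
#!/bin/sh
# Hypothetical demo: show the effect of the workaround's sed substitution
# on a scratch file. The two lines written below are illustrative only.
demo_comment_out_linkscan() {
    scratch=$(mktemp)
    printf '%s\n' 'linkscan example-args' 'other example-command' > "$scratch"
    # Same expression as step 3; the workaround adds -i to modify the file itself.
    sed 's/^linkscan/#linkscan/g' "$scratch"
    rm -f "$scratch"
}

demo_comment_out_linkscan
```

Only the line that starts with linkscan gains a leading #; other lines pass through unchanged. Because the pattern is anchored with ^, running the substitution a second time on an already modified file changes nothing.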
None