...
The ScaleIO Hardware Awareness feature fills the PERC termlog with its messages. The log quickly fills up with Hardware Awareness output and becomes unusable for analysis. The PERC termlog output looks like the following:

Example 1:

    07/11/18 18:00:18: C0:process_dcdb_callback: No valid issuer Sense address for the sense Data sdAddr:0 sdLen:0 cmdId:c
    07/11/18 18:00:18: C0:cmdId= c: cmd=3, cmdStat=0, num_sg_elements=1, status=1 [PCI_COMMAND], chainFrameAllocated=1
    07/11/18 18:00:18: C0:mfa=366f0400, mf=4164a000, mfSge=4164a030, bytesTransferred=0, next ffff
    07/11/18 18:00:18: C0:lmid=25b smid=39b ioCmplType=2 iotype=0 LockGranted=0 lockType=0 isRunOnSecondary 0
    07/11/18 18:00:18: C0:startTime=1a07ed3, waitTime=0 lines=4479bee0, lineMap=0, activeRecoveryCount=0, lockPromotedByRec=0
    07/11/18 18:00:18: C0:ldbbmAlreadyTried=0, ldbbmIssueWriteAsWV=0, recoveryAttempted 0 bypassIo 0
    07/11/18 18:00:18: C0:SGLIsIEEE=1, dontGroupFurther=0, isParentOfSplitCmd=0, isChildOfSplitCmd=0
    07/11/18 18:00:18: C0:hdr.length=4 hdr.targetId=e rsLen=0 flags=c02dbb70, msg=40d7be00
    07/11/18 18:00:18: C0:CDB: 4d 00 51 00 00 00 00 00 04 00
    07/11/18 18:00:18: C0:CMD_PCI: cmd=03, cmdId=c, nsg=1, pd=0e, timeout=0, cdb= 4d 00 51 00 00 00 00 00 04 00
    07/11/18 18:00:18: C0:process_dcdb_callback: No valid issuer Sense address for the sense Data sdAddr:0 sdLen:0 cmdId:73
    07/11/18 18:00:18: C0:cmdId= 73: cmd=3, cmdStat=0, num_sg_elements=1, status=1 [PCI_COMMAND], chainFrameAllocated=1
    07/11/18 18:00:18: C0:mfa=366f0400, mf=4158f000, mfSge=4158f030, bytesTransferred=0, next ffff
    07/11/18 18:00:18: C0:lmid=235 smid=39b ioCmplType=2 iotype=0 LockGranted=0 lockType=0 isRunOnSecondary 0
    07/11/18 18:00:18: C0:startTime=1a07ed3, waitTime=0 lines=448857d8, lineMap=0, activeRecoveryCount=0, lockPromotedByRec=0
    07/11/18 18:00:18: C0:ldbbmAlreadyTried=0, ldbbmIssueWriteAsWV=0, recoveryAttempted 0 bypassIo 0
    07/11/18 18:00:18: C0:SGLIsIEEE=1, dontGroupFurther=0, isParentOfSplitCmd=0, isChildOfSplitCmd=0
    07/11/18 18:00:18: C0:hdr.length=4 hdr.targetId=e rsLen=0 flags=c02dbb70, msg=40cbec00
    07/11/18 18:00:18: C0:CDB: 4d 00 51 00 00 00 00 00 04 00
    07/11/18 18:00:18: C0:CMD_PCI: cmd=03, cmdId=73, nsg=1, pd=0e, timeout=0, cdb= 4d 00 51 00 00 00 00 00 04 00

Example 2:

    03/02/21 9:43:21: C0:EVT#8287997-03/02/21 9:43:21: 113=Unexpected sense: PD 08(e0x20/s8) Path 500056b3f17ebac8, CDB: 4d 00 51 00 00 00 00 00 04 00, Sense: 5/24/00
    03/02/21 9:43:21: C0:Raw Sense for PD 8: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00
    03/02/21 9:43:21: C0:EVT#8287998-03/02/21 9:43:21: 113=Unexpected sense: PD 08(e0x20/s8) Path 500056b3f17ebac8, CDB: 4d 00 51 00 00 00 00 00 04 00, Sense: 5/24/00
    03/02/21 9:43:21: C0:Raw Sense for PD 8: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00
    03/02/21 9:43:21: C0:EVT#8287999-03/02/21 9:43:21: 113=Unexpected sense: PD 03(e0x20/s3) Path 500056b3f17ebac3, CDB: 4d 00 51 00 00 00 00 00 04 00, Sense: 5/24/00
    03/02/21 9:43:21: C0:Raw Sense for PD 3: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00
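The raw sense bytes in Example 2 use the SCSI fixed sense format, so the "Sense: 5/24/00" summary can be recovered from them directly. The following is a minimal Python sketch, assuming the fixed-format layout from the T10 SPC specification (response code 70h or 71h, sense key in the low nibble of byte 2, ASC and ASCQ in bytes 12 and 13); `decode_fixed_sense` is an illustrative helper, not part of any PERC or ScaleIO tool. Sense key 5 with ASC/ASCQ 24h/00h means ILLEGAL REQUEST, INVALID FIELD IN CDB: the drive is rejecting the request itself, not reporting a media fault.

```python
def decode_fixed_sense(raw: bytes) -> tuple[int, int, int]:
    """Return (sense key, ASC, ASCQ) from fixed-format SCSI sense data."""
    # Bit 7 of byte 0 is the VALID bit; the low 7 bits are the response code.
    response_code = raw[0] & 0x7F
    if response_code not in (0x70, 0x71):  # 70h = current, 71h = deferred
        raise ValueError(f"not fixed-format sense data (0x{response_code:02x})")
    if len(raw) < 14:
        raise ValueError("sense data too short to contain ASC/ASCQ")
    sense_key = raw[2] & 0x0F  # low nibble of byte 2
    return sense_key, raw[12], raw[13]


if __name__ == "__main__":
    # "Raw Sense for PD 8", copied from Example 2 of the termlog.
    raw = bytes.fromhex("70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00")
    key, asc, ascq = decode_fixed_sense(raw)
    print(f"Sense: {key:x}/{asc:02x}/{ascq:02x}")  # prints "Sense: 5/24/00"
```

The output uses the same key/ASC/ASCQ notation as the termlog's "Unexpected sense" entries, which makes it easy to confirm that each flood entry is the same rejection.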
The ScaleIO Hardware Awareness feature collects S.M.A.R.T. information by sending page-code requests to the SDS devices. One of these commands (cdb= 4d 00 51 00 00 00 00 00 04 00, a SCSI LOG SENSE command) is not supported by some vendors' SSDs, which reject it with sense 5/24/00 as shown in Example 2. Because the command is issued once every minute, each rejection adds another set of the entries shown above to the termlog. SSD models on which this issue has been observed include: TOSHIBA 3.84TB KHK6YRSE3T84, SKhynix 3.84TB HFS3T8G32FEH 741, INTEL 3.84TB SSDSC2KB038T8R, and DELL 3.84TB MZ7LH3T8HMLT0D3.
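The CDB quoted above is a 10-byte SCSI LOG SENSE command (operation code 4Dh). A minimal Python sketch that splits it into fields, assuming the LOG SENSE(10) layout from the T10 SPC specification, shows that it requests log page 0x11 with page control 1 (current cumulative values) and a 4-byte allocation length. As far as we can tell from SBC-3, page 0x11 is the Solid State Media log page; drives that do not implement it answer with the Illegal Request sense seen in Example 2. The `LogSense10` class and `decode_log_sense10` function are illustrative names, not part of any ScaleIO or PERC tool.

```python
from dataclasses import dataclass


@dataclass
class LogSense10:
    """Fields of a 10-byte SCSI LOG SENSE CDB (operation code 0x4D)."""
    sp: int                 # byte 1, bit 0: save parameters
    page_control: int       # byte 2, bits 7-6 (1 = current cumulative values)
    page_code: int          # byte 2, bits 5-0
    subpage_code: int       # byte 3
    parameter_pointer: int  # bytes 5-6, big-endian
    allocation_length: int  # bytes 7-8, big-endian
    control: int            # byte 9


def decode_log_sense10(cdb: bytes) -> LogSense10:
    """Split a LOG SENSE(10) CDB into its fields."""
    if len(cdb) != 10 or cdb[0] != 0x4D:
        raise ValueError("not a LOG SENSE(10) CDB")
    return LogSense10(
        sp=cdb[1] & 0x01,
        page_control=cdb[2] >> 6,
        page_code=cdb[2] & 0x3F,
        subpage_code=cdb[3],
        parameter_pointer=int.from_bytes(cdb[5:7], "big"),
        allocation_length=int.from_bytes(cdb[7:9], "big"),
        control=cdb[9],
    )


if __name__ == "__main__":
    # CDB copied from the termlog output above.
    fields = decode_log_sense10(bytes.fromhex("4d 00 51 00 00 00 00 00 04 00"))
    # prints "page 0x11, PC=1, alloc_len=4"
    print(f"page 0x{fields.page_code:02x}, PC={fields.page_control}, "
          f"alloc_len={fields.allocation_length}")
```

The single byte 0x51 carries two fields, which is why the page code does not appear literally in the CDB: the top two bits (01) are the page control and the remaining six bits (010001) are page 0x11.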
WORKAROUND: Customers can disable the Hardware Awareness feature to avoid this issue. Also see the NOTE below.

1. Edit /opt/emc/scaleio/sds/cfg/conf.txt:
   vi /opt/emc/scaleio/sds/cfg/conf.txt
2. Add the following line:
   tgt_dev__enable_metadata_polling=0
3. Save the file.
4. Place the SDS into Instant Maintenance Mode.
5. Restart the SDS process:
   pkill sds
6. Exit the SDS from Instant Maintenance Mode.
7. Repeat steps 1 to 6 for the remaining SDSs in the cluster.

To revert the workaround, remove the "tgt_dev__enable_metadata_polling=0" line, save the file, and restart the SDSs using the procedure above.

NOTE: Disabling this feature has a side effect. The scli --query_sds_device_info command no longer returns up-to-date S.M.A.R.T. information, and the S.M.A.R.T.-related alerts "SMART_TEMPERATURE_STATE_FAILED_NOW", "SMART_END_OF_LIFE_STATE_FAILED_NOW", and "SMART_AGGREGATED_STATE_FAILED_NOW" are not generated.
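The file edit in steps 1 to 3, and its revert, can be scripted so that it is safe to run more than once on each SDS. The following is a minimal Python sketch, assuming conf.txt holds one setting per line and that the script runs with write access to the file; `disable_metadata_polling` and `revert_metadata_polling` are hypothetical helper names. It only edits the file: placing the SDS into Instant Maintenance Mode, restarting the SDS process, and exiting maintenance mode still have to be done as described in steps 4 to 6.

```python
from pathlib import Path

# Path and setting taken from the workaround steps.
CONF_PATH = Path("/opt/emc/scaleio/sds/cfg/conf.txt")
SETTING = "tgt_dev__enable_metadata_polling=0"


def disable_metadata_polling(conf: Path = CONF_PATH) -> bool:
    """Append SETTING to conf.txt if it is absent. Returns True if the file changed."""
    text = conf.read_text()
    if SETTING in (line.strip() for line in text.splitlines()):
        return False  # already applied; leave the file untouched
    if text and not text.endswith("\n"):
        text += "\n"  # keep the new setting on its own line
    conf.write_text(text + SETTING + "\n")
    return True


def revert_metadata_polling(conf: Path = CONF_PATH) -> bool:
    """Remove every SETTING line from conf.txt. Returns True if the file changed."""
    lines = conf.read_text().splitlines()
    kept = [line for line in lines if line.strip() != SETTING]
    if len(kept) == len(lines):
        return False  # nothing to revert
    conf.write_text("".join(line + "\n" for line in kept))
    return True
```

Returning whether the file changed lets a wrapper skip the maintenance-mode restart on SDSs where the setting is already in the desired state.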