
Contents
Safety notices ................................. v
Beginning troubleshooting and problem analysis .................. 1
Determining the problem analysis procedure to perform ..................... 1
Resolving a BMC access problem ............................ 2
Resolving a power problem .............................. 3
Resolving a system firmware boot failure .......................... 4
Resolving a VGA monitor problem ............................ 8
Resolving an operating system boot failure ......................... 9
Resolving a sensor indicator problem ........................... 11
Resolving a hardware problem ............................. 12
Resolving a GPU, PCIe adapter, or device problem ...................... 13
Resolving a RAID adapter problem .......................... 14
Resolving a network adapter problem ......................... 15
Resolving a graphics processing unit problem ....................... 16
Resolving an NVMe Flash adapter problem ....................... 19
Resolving a storage device problem .......................... 20
Identifying the location of the PCIe adapter by using the slot number ............... 21
Identifying the location of the GPU .......................... 22
Identifying the location of the NVMe Flash adapter ..................... 23
Identifying the location of the storage device ....................... 24
User guides for GPUs and PCIe adapters ........................ 25
Resolving an over temperature problem for a water-cooled 8335-GTB system ............. 26
Identifying a service action .............................. 27
Identifying a service action by using system event logs .................... 27
Identifying service action keywords in system event logs ................... 36
Identifying a service action by using sensor and event information ................ 37
Identifying a service action by using sensor and event information for the 8335-GCA and 8335-GTA ... 37
Identifying a service action by using sensor and event information for the 8335-GTB ......... 57
Identifying a service action by using sensor and event information for the 8348-21C ......... 78
Isolation procedures ................................ 96
EPUB_PRC_FIND_DECONFIGURE_PART isolation procedure ................. 96
EPUB_PRC_SP_CODE isolation procedure ........................ 97
EPUB_PRC_PHYP_CODE isolation procedure ....................... 97
EPUB_PRC_ALL_PROCS isolation procedure ....................... 98
EPUB_PRC_ALL_MEMCRDS isolation procedure ...................... 98
EPUB_PRC_LVL_SUPPORT isolation procedure ...................... 99
EPUB_PRC_MEMORY_PLUGGING_ERROR isolation procedure ................ 100
EPUB_PRC_FSI_PATH isolation procedure ....................... 100
EPUB_PRC_PROC_AB_BUS isolation procedure ...................... 101
EPUB_PRC_PROC_XYZ_BUS isolation procedure ..................... 101
EPUB_PRC_EIBUS_ERROR isolation procedure ...................... 102
EPUB_PRC_POWER_ERROR isolation procedure ..................... 103
EPUB_PRC_MEMORY_UE isolation procedure ...................... 104
EPUB_PRC_HB_CODE isolation procedure ....................... 104
EPUB_PRC_TOD_CLOCK_ERR isolation procedure .................... 106
EPUB_PRC_COOLING_SYSTEM_ERR isolation procedure .................. 106
EPUB_PRC_GPU_ISOLATION_PROCEDURE isolation procedure ................ 107
Verifying a repair ................................. 108
Collecting diagnostic data .............................. 109
Contacting IBM service and support ........................... 110
Finding parts and locations .......................... 111
8335-GCA and 8335-GTA locations ........................... 111
8335-GCA and 8335-GTA parts ............................ 115
© Copyright IBM Corp. 2015, 2019