Express5800/A2040b, A2020b, A2010b, A1040b
Machine Check Monitoring Service
User's Guide
(Release 1.5)
May 2015
NEC Corporation
© 2015 NEC Corporation
855-900937

Notes on Using This Manual
No part of this manual may be reproduced in any form without the prior written
permission of NEC Corporation.
The contents of this manual may be revised without prior notice.
The contents of this manual shall not be copied or altered without the prior written
permission of NEC Corporation.
Trademarks
Linux is a trademark or registered trademark of Linus Torvalds in Japan and other
countries.
Red Hat® and Red Hat Enterprise Linux are trademarks or registered trademarks of
Red Hat, Inc. in the United States and other countries.
Intel and its logo are registered trademarks of Intel Corporation in the United States and
other countries.
Emulex and LightPulse are registered trademarks of Emulex Corporation.
Broadcom, NetXtreme, Ethernet@Wirespeed, LiveLink, and Smart Load Balancing are
trademarks of Broadcom Corporation and/or its associated company in the United
States and other countries.
All other product, brand, or trade names used in this publication are the trademarks or
registered trademarks of their respective trademark owners.
Related Documents
Express5800/A2040b, A2020b, A2010b, A1040b User's Guide
Capacity Optimization (COPT) User's Guide

Contents
1. Introduction
1.1 Overview
1.2 Operating Environment
1.3 Terminology
1.4 Access Limitation
2. Features of Machine Check Monitoring Service
2.1 Features of Machine Check Monitoring Service
2.2 System Configuration of Machine Check Monitoring Service
2.3 Functional Drawing of Machine Check Monitoring Service
2.4 Features of Machine Check Monitoring Service
3. Installation and Configuration
3.1 Installation
3.1.1 Installing acpi_call
3.1.2 Installing capmonitor
3.1.3 Installing mcemonitor
3.2 Upgrade
3.2.1 Upgrading acpi_call
3.2.2 Upgrading capmonitor
3.2.3 Upgrading mcemonitor
3.3 Configuration
3.3.1 capmonitor configuration file
3.3.2 mcemonitor configuration file
3.3.3 Disabling CMCI
3.3.4 Disabling kdump restart on udev triggered by logical processor offline
3.3.5 Script file to be executed after Core Offline
3.3.6 Disabling EDAC
3.4 Uninstallation
3.4.1 Uninstalling acpi_call
3.4.2 Uninstalling capmonitor
3.4.3 Uninstalling mcemonitor
4. Log
4.1 Logging Destination
4.2 Output Format
5. Command Reference
5.1 Show CPU / Memory Status
6. Messages
6.1 On-screen Message
6.1.1 On-screen messages output from mcemonitor
6.1.2 On-screen messages output from capmonitor
6.1.3 On-screen messages output from acpi_call
6.1.4 Other on-screen messages
6.2 syslog Messages
6.3 Operation Log Messages
6.3.1 Operation log messages output from mcemonitor
6.3.2 Operation log messages output from capmonitor
7. Restrictions and Precautions
7.1 Manual Onlining CPU being Core Offlined
7.2 cpuspeed Error Message Output at OS Shutdown

1. Introduction
1.1 Overview
Machine Check Monitoring Service helps identify faulty hardware components by sending logs of
correctable errors that occur on the CPU and memory of a Linux server to the firmware in the server.
If the number of correctable errors exceeds a threshold value, Machine Check Monitoring Service
performs Core Offline (offlining of a CPU) or Page Offline (offlining of a memory page) to prevent the
system from going down due to an uncorrectable error. If the OS supports the Core Online feature and
the system has a spare CPU, Machine Check Monitoring Service adds the spare CPU automatically
(Core Online) after Core Offline completes. The Offline and Online operations are performed in
cooperation with the kernel on the Linux server.
Machine Check Monitoring Service consists of firmware and software on the Linux server. The software
includes mcemonitor (Machine Check Monitoring Service) and capmonitor (Capacity Monitoring
Service).
Note
Refer to "Capacity Optimization (COPT) User's Guide" for details of Core
Online feature.
Core Offline, Core Online, and Page Offline are not supported on
Express5800/A1040b.
1.2 Operating Environment
Machine Check Monitoring Service requires the following operating environment:
Table 1-1 Operating Environment
Hardware: Express5800/A1040b, Express5800/A2010b, Express5800/A2020b, Express5800/A2040b
OS: Red Hat Enterprise Linux 6.6

1.3 Terminology
The terms used in Machine Check Monitoring Service are as follows:
Table 1-2 Terminology
mcemonitor
    Software that provides enhanced RAS features. mcemonitor receives logs from the MCE
    mechanism of the Linux kernel, analyzes them, and monitors fault occurrence in cooperation
    with the system. mcemonitor instructs the kernel to perform Core Offline and Page Offline.
capmonitor
    Software that controls Core Offline for a failed core and the Core Online that the COPT
    feature provides. Refer to "Capacity Optimization (COPT) User's Guide" for details of the
    COPT feature.
acpi_call
    Driver used to access ACPI.
ACPI
    Advanced Configuration and Power Interface. An open industry specification for power
    management and hardware configuration.
MCE
    Machine Check Exception. A hardware error detected by the CPU.
CMC
    Corrected Machine Check. A correctable error detected by the CPU.
CPU socket
    A single Intel Xeon processor. One CPU socket can have several cores. On the
    Express5800/A2040b, up to 4 CPU sockets can be installed in the server.
CPU core
    The part of the CPU that performs arithmetic and other processing. A CPU socket contains
    one or more cores.
Physical CPU socket number
    The physical mounting position of a CPU socket in the server. A number from No. 1 to
    No. 4 is assigned to each CPU socket.
Logical processor
    The processor on which the OS actually executes tasks and threads. When the
    Hyper-Threading feature is enabled, each CPU core has two logical processors. When the
    Hyper-Threading feature is disabled, each CPU core has only one logical processor.
1.4 Access Limitation
Only the privileged user (root account) can use mcemonitor.

2. Features of Machine Check Monitoring Service
This section describes the features and characteristics of Machine Check Monitoring Service.
2.1 Features of Machine Check Monitoring Service
A server used in a mission-critical domain must be able to identify a failing component, degrade it
online, and have it replaced online before the system goes down.
If Machine Check Monitoring Service detects a correctable failure in the CPU or memory of a Linux
server, it sends a log to the firmware in the server to identify the failed component. When the number
of correctable errors exceeds the threshold value, Machine Check Monitoring Service degrades the CPU
or memory page online (Core Offline, Page Offline). In addition, if the server uses an OS that supports
the Core Online feature and is equipped with a spare CPU, Machine Check Monitoring Service adds the
spare CPU automatically (Core Online) after Core Offline. This prevents performance from
deteriorating.
Note
Refer to "Capacity Optimization (COPT) User's Guide" for details of Core
Online feature.
Express5800/A1040b does not support Core Offline, Core Online, and Page
Offline.
2.2 System Configuration of Machine Check Monitoring Service
The system configuration of Machine Check Monitoring Service is shown below.
Figure 2-1 System Configuration of Machine Check Monitoring Service
(Figure labels: Server, OS, MC Scope, mcemonitor, capmonitor, acpi_call, Firmware)

2.3 Functional Drawing of Machine Check Monitoring Service
The functional drawing of Machine Check Monitoring Service and its associated components is shown
below.
Figure 2-2 Functional drawing
(Figure labels: Hardware, CPU, Memory, Fault, kernel, acpi_call, mcemonitor, capmonitor, Firmware,
syslog, mcemonitor (log), capmonitor (log))

2.4 Features of Machine Check Monitoring Service
The process flow of Machine Check Monitoring Service is shown below.
Table 2-1 Process flow of Machine Check Monitoring Service
Monitoring CPU failure
    When mcemonitor detects a CPU failure, it sends CPU fault information to the firmware.
    When the firmware receives the CPU fault information, it determines the failed component.
    The firmware manages the failure occurrence count. When the count exceeds the threshold
    value, the firmware instructs mcemonitor to perform Core Offline.
    When mcemonitor receives the Core Offline instruction from the firmware, it issues a CPU
    Offline instruction to the kernel.
    If Hyper Threading Mode is set to OFF, the one logical CPU in the CPU core is made offline.
    If Hyper Threading Mode is set to ON, the two logical CPUs in the CPU core are made offline.
    When CPU Offline succeeds, the relevant CPU is disabled for the OS and software. Thus, the
    number of available CPUs is reduced.
    Note: Express5800/A1040b does not support the Core Offline feature.
    mcemonitor notifies the firmware of the result of CPU Offline.
    When CPU Offline succeeds and the server has a spare CPU, the spare CPU is added
    automatically (Core Online feature).
    Note: For details of Core Online, refer to "Capacity Optimization (COPT) User's Guide".
    CPU fault information and the result of CPU Offline can be confirmed with the mcemonitor
    command. See 5.1 Show CPU / Memory Status for details of the mcemonitor command.
Monitoring memory failure
    If the number of correctable memory errors on a memory page exceeds the threshold value,
    the firmware instructs mcemonitor to perform Memory Page Offline.
    When mcemonitor receives the Memory Page Offline instruction from the firmware, it sends a
    Memory Page Offline instruction to the kernel.
    Memory Page Offline is performed in units of 4 KB.
    When Memory Page Offline succeeds, the relevant memory page is disabled for the OS and
    software. Thus, the available memory capacity is reduced.
    Note: Express5800/A1040b does not support the Page Offline feature.
    mcemonitor notifies the firmware of the result of Memory Page Offline.
    The result of Memory Page Offline can be confirmed with the mcemonitor command. See 5.1
    Show CPU / Memory Status for details of the mcemonitor command.
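As a rough illustration of how the effect of a CPU Offline can be observed from the OS, the following sketch expands a CPU list written in the range format that the Linux kernel uses in /sys/devices/system/cpu/offline (for example 2-3,7) into individual logical processor numbers. The helper name expand_cpu_list is hypothetical and is not part of mcemonitor. The sysfs file can also list processors that were never brought online, so treat the result as a hint only; the mcemonitor command described in 5.1 is the way this guide confirms results.

```shell
# expand_cpu_list: hypothetical helper, not part of mcemonitor.
# Expands a kernel CPU list such as "2-3,7" into one CPU number per line.
expand_cpu_list() {
    local list="$1" item first last
    local IFS=','
    for item in $list; do
        [ -z "$item" ] && continue
        first="${item%-*}"   # text before the hyphen (or the whole item)
        last="${item#*-}"    # text after the hyphen (or the whole item)
        seq "$first" "$last"
    done
}
```

A possible use on a running system is expand_cpu_list "$(cat /sys/devices/system/cpu/offline)".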

3. Installation and Configuration
This section describes how to install, configure, and start Machine Check Monitoring Service.
3.1 Installation
Machine Check Monitoring Service is provided as RPM packages. Install them with the rpm command as
shown below. Install the packages acpi_call, capmonitor, and mcemonitor in that order.
3.1.1 Installing acpi_call
1. Log in to the target machine as the root user.
2. Download the most recent version of the RPM from the following website.
http://www.58support.nec.co.jp/global/download/index.html
3. Install acpi_call RPM package of Machine Check Monitoring Service using rpm command.
# rpm -ivh mcl-acpicall-2.4-3.01.2.6.32.504.23.4.el6.x86_64.rpm
Preparing... ########################################## [100%]
1:mcl-acpi_call ########################################## [100%]
Starting acpi_call driver[ OK ]
4. Confirm that acpi_call RPM package of Machine Check Monitoring Service is installed
correctly. The following is displayed when installation completes successfully.
# rpm -qa | grep acpicall
mcl-acpicall-2.4-3.01.2.6.32.504.23.4.el6.x86_64
5. Check that the acpi_call driver has started normally. If the following three acpi_call modules
are displayed, the acpi_call driver has started normally.
# lsmod | grep acpi
acpi_clpcall 6897 0
acpi_capcall 6897 0
acpi_mcecall 6897 0
6. If one of the following messages is displayed, the installation may not have completed. Follow
the "Solution" and repeat from Step 3.
Error message
package mcl-acpicall-2.4-3.01.2.6.32.504.23.4.el6.x86_64 is already installed
Solution
Uninstall acpi_call, and install it again.
Error message
error: unpacking of archive failed
on file: cpio: write failed - No space left on device
Solution
Disk space is insufficient. Increase free space, and install it again.

7. Configure /etc/sysconfig/kdump.
Creation of the initrd file for kdump may fail if an external module that is unnecessary for dump
collection is incorporated. To prevent this, add MKDUMPRD_ARGS="--allow-missing".
Sample configuration of /etc/sysconfig/kdump
MKDUMPRD_ARGS="--allow-missing"
With this configuration, the following warning may appear when the kdump service is started. This
message indicates that the external module was not incorporated; it does not indicate a problem.
WARNING: No module xxx found for kernel 2.6.32-504.23.4.el6.x86_64, continuing anyway
(xxx represents the external module name)
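The setting from Step 7 can be checked with a short script. The sketch below only tests whether an uncommented MKDUMPRD_ARGS line containing --allow-missing exists in the file; the helper name kdump_allows_missing is hypothetical and the function does not modify anything.

```shell
# kdump_allows_missing: hypothetical helper, not shipped with the product.
# Succeeds if the given file (default /etc/sysconfig/kdump) has an
# uncommented MKDUMPRD_ARGS line that contains --allow-missing.
kdump_allows_missing() {
    grep -Eq '^[[:space:]]*MKDUMPRD_ARGS=.*--allow-missing' \
        "${1:-/etc/sysconfig/kdump}"
}
```

For example, kdump_allows_missing || echo "MKDUMPRD_ARGS is not set as described in Step 7" prints a reminder when the setting is missing.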

3.1.2 Installing capmonitor
1. Log in to the target machine as the root user.
2. Download the most recent version of the RPM from the following website.
http://www.58support.nec.co.jp/global/download/index.html
3. Install capmonitor RPM package of Machine Check Monitoring Service using rpm command.
# rpm -ivh mcl-capmonitor-2.4-2.12.el6.x86_64.rpm
Preparing... ######################################## [100%]
1:mcl-capmonitor ######################################## [100%]
Starting capmonitor daemon[ OK ]
Note: acpi_call must be installed before installing capmonitor.
If capmonitor is installed while acpi_call has not been installed, the following message
is output and installation of capmonitor fails.
# rpm -ivh mcl-capmonitor-2.4-2.12.el6.x86_64.rpm
error: Failed dependencies:
mcl-acpicall is needed by mcl-capmonitor-2.4-2.12.el6.x86_64
4. Confirm that capmonitor RPM package of Machine Check Monitoring Service is installed
correctly. The following is displayed when installation completes successfully.
# rpm -qa | grep capmonitor
mcl-capmonitor-2.4-2.12.el6.x86_64
5. Check if capmonitor is started normally. If the following is displayed, capmonitor is started
normally.
# ps aux | grep monitor
root 6044 0.0 0.0 4068 324 ? Ss 06:18 0:00 /opt/nec/capmonitor/capmonitor
6. If one of the following messages is displayed, the installation may not have completed. Follow
the "Solution" and repeat from Step 3.
Error message
package mcl-capmonitor-2.4-2.12.el6.x86_64 is already installed
Solution
Uninstall capmonitor, and install it again.
Error message
error: unpacking of archive failed
on file: cpio: write failed - No space left on device
Solution
Disk space is insufficient. Increase free space, and install it again.

3.1.3 Installing mcemonitor
1. Log in to the target machine as the root user.
2. Download the most recent version of the RPM from the following website.
http://www.58support.nec.co.jp/global/download/index.html
3. Install mcemonitor RPM package of Machine Check Monitoring Service using rpm command.
# rpm -ivh mcl-mcemonitor1-2.4-2.02.el6.x86_64.rpm
Preparing... ######################################### [100%]
1:mcl-mcemonitor1 ######################################### [100%]
Starting mcemonitor daemon[ OK ]
Note: acpi_call must be installed before installing mcemonitor.
If mcemonitor is installed while acpi_call has not been installed, the following message
is output and installation of mcemonitor fails.
# rpm -ivh mcl-mcemonitor1-2.4-2.02.el6.x86_64.rpm
error: Failed dependencies:
mcl-acpicall is needed by mcl-mcemonitor1-2.4-2.02.el6.x86_64
4. Confirm that mcemonitor RPM package of Machine Check Monitoring Service is installed
correctly. The following is displayed when installation completes successfully.
# rpm -qa | grep mcemonitor
mcl-mcemonitor1-2.4-2.02.el6.x86_64
5. Check if mcemonitor is started normally. If the following is displayed, mcemonitor is started
normally.
# ps aux | grep monitor
root 6078 0.0 0.0 4076 328 ? Ss 06:19 0:00 /opt/nec/mcemonitor/mcemonitor
6. If one of the following messages is displayed, the installation may not have completed. Follow
the "Solution" and repeat from Step 3.
Error message
package mcl-mcemonitor1-2.4-2.02.el6.x86_64 is already installed
Solution
Uninstall mcemonitor, and install it again.
Error message
error: unpacking of archive failed
on file: cpio: write failed - No space left on device
Solution
Disk space is insufficient. Increase free space, and install it again.
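The "ps aux | grep monitor" check in Step 5 can also match the grep process itself. As a minimal sketch, the following function reads ps output and succeeds only when a daemon is listed under the installation paths shown in Step 5 of 3.1.2 and 3.1.3. The helper name daemon_in_ps is hypothetical.

```shell
# daemon_in_ps: hypothetical helper, not shipped with the product.
# Reads "ps aux" output on stdin and succeeds if a process whose command
# is /opt/nec/<name>/<name> is listed.
daemon_in_ps() {
    grep -Eq "[[:space:]]/opt/nec/$1/$1([[:space:]]|\$)"
}
```

For example, ps aux | daemon_in_ps capmonitor && ps aux | daemon_in_ps mcemonitor succeeds only when both daemons are running.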

3.2 Upgrade
Use the rpm command to upgrade Machine Check Monitoring Service from an old version to a new version.
3.2.1 Upgrading acpi_call
1. Log in to the target machine as the root user.
2. Confirm that the installed version of the acpi_call RPM package of Machine Check Monitoring
Service is older than the version you are going to upgrade to.
# rpm -qa | grep acpicall
mcl-acpicall-2.4-3.01.2.6.32.504.23.4.el6.x86_64
3. Copy RPM to desired directory in target machine.
The most recent version of RPM is available for download from the following website.
http://www.58support.nec.co.jp/global/download/index.html
4. Upgrade acpi_call RPM package of Machine Check Monitoring Service using rpm command.
# rpm -Uvh mcl-acpicall-2.4-3.02.2.6.32.504.23.4.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:mcl-acpi_call ########################################### [100%]
Starting acpi_call driver[ OK ]
5. Confirm that acpi_call RPM package of Machine Check Monitoring Service is upgraded
correctly. The following is displayed when upgrade completes successfully.
# rpm -qa | grep acpicall
mcl-acpicall-2.4-3.02.2.6.32.504.23.4.el6.x86_64
6. Check that the acpi_call driver has started normally. If the following three acpi_call modules
are displayed, the acpi_call driver has started normally.
# lsmod | grep acpi
acpi_clpcall 6897 0
acpi_capcall 6897 0
acpi_mcecall 6897 0
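The module check in Step 6 can be scripted. The following sketch reads lsmod output and succeeds only when all three module names shown above are present. The helper name acpi_call_loaded is hypothetical.

```shell
# acpi_call_loaded: hypothetical helper, not shipped with the product.
# Reads "lsmod" output on stdin and succeeds only if acpi_clpcall,
# acpi_capcall, and acpi_mcecall are all listed in the first column.
acpi_call_loaded() {
    local listing mod
    listing="$(awk '{ print $1 }')"
    for mod in acpi_clpcall acpi_capcall acpi_mcecall; do
        printf '%s\n' "$listing" | grep -qx "$mod" || return 1
    done
}
```

For example, lsmod | acpi_call_loaded && echo "acpi_call driver is loaded".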

3.2.2 Upgrading capmonitor
1. Log in to the target machine as the root user.
2. Confirm that the current version of capmonitor RPM package of Machine Check Monitoring
Service is older than that of capmonitor RPM package you are going to upgrade.
# rpm -qa | grep capmonitor
mcl-capmonitor-2.4-2.12.el6.x86_64
3. Copy RPM to desired directory in target machine.
The most recent version of RPM is available for download from the following website.
http://www.58support.nec.co.jp/global/download/index.html
4. Upgrade capmonitor RPM package of Machine Check Monitoring Service using rpm
command.
# rpm -Uvh mcl-capmonitor-2.4-2.13.el6.x86_64.rpm
Preparing... ######################################### [100%]
4048 /opt/nec/capmonitor/capmonitor
Stopping capmonitor[ OK ]
1:mcl-capmonitor ######################################### [100%]
Starting capmonitor daemon[ OK ]
If capmonitor.conf was changed, the following message will be displayed. The message can
be safely ignored because your configuration of the capmonitor.conf is preserved.
capmonitor.conf.rpmnew is the default capmonitor.conf file.
warning: /opt/nec/capmonitor/conf/capmonitor.conf created as
/opt/nec/capmonitor/conf/capmonitor.conf.rpmnew
5. Confirm that capmonitor RPM package of Machine Check Monitoring Service is upgraded
correctly. The following is displayed when upgrade completes successfully.
# rpm -qa | grep capmonitor
mcl-capmonitor-2.4-2.13.el6.x86_64
6. Check if capmonitor is started normally. If the following is displayed, capmonitor is started
normally.
# ps aux | grep monitor
root 4141 0.0 0.0 4068 352 ? Ss 13:54 0:00 /opt/nec/capmonitor/capmonitor

3.2.3 Upgrading mcemonitor
1. Log in to the target machine as the root user.
2. Confirm that the current version of mcemonitor RPM package of Machine Check Monitoring
Service is older than that of mcemonitor RPM package you are going to upgrade.
# rpm -qa | grep mcemonitor
mcl-mcemonitor1-2.4-2.02.el6.x86_64
3. Copy RPM to desired directory in target machine.
The most recent version of RPM is available for download from the following website.
http://www.58support.nec.co.jp/global/download/index.html
4. Upgrade mcemonitor RPM package of Machine Check Monitoring Service using rpm
command.
# rpm -Uvh mcl-mcemonitor1-2.4-2.03.el6.x86_64.rpm
Preparing... ######################################### [100%]
4083 /opt/nec/mcemonitor/mcemonitor
Stopping mcemonitor[ OK ]
1:mcl-mcemonitor1 ######################################### [100%]
Starting mcemonitor daemon[ OK ]
If mcemonitor.conf was changed, the following message will be displayed. The message can
be safely ignored because your configuration of the mcemonitor.conf is preserved.
mcemonitor.conf.rpmnew is the default mcemonitor.conf file.
warning: /opt/nec/mcemonitor/conf/mcemonitor.conf created as
/opt/nec/mcemonitor/conf/mcemonitor.conf.rpmnew
5. Confirm that mcemonitor RPM package of Machine Check Monitoring Service is upgraded
correctly. The following is displayed when upgrade completes successfully.
# rpm -qa | grep mcemonitor
mcl-mcemonitor1-2.4-2.03.el6.x86_64
6. Check if mcemonitor is started normally. If the following is displayed, mcemonitor is started
normally.
# ps aux | grep monitor
root 4189 0.0 0.0 4076 364 ? Ss 13:56 0:00 /opt/nec/mcemonitor/mcemonitor
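Step 2 of each upgrade procedure asks you to confirm that the installed package is older than the one to be installed. As a rough aid, the sketch below compares two package strings with GNU "sort -V". This ordering is an approximation and is not the comparison algorithm that rpm itself uses, and the helper name is_older is hypothetical.

```shell
# is_older: hypothetical helper, not shipped with the product.
# Succeeds if package string $1 sorts strictly before $2 in GNU
# "sort -V" version order (an approximation of rpm's own ordering).
is_older() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
```

For example, is_older mcl-mcemonitor1-2.4-2.02.el6.x86_64 mcl-mcemonitor1-2.4-2.03.el6.x86_64 succeeds, which matches the upgrade shown in 3.2.3.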

3.3 Configuration
Machine Check Monitoring Service provides the following two configuration files. You can change
behavior of Machine Check Monitoring Service by modifying these configuration files. This section
describes available parameters and how to specify them.
/opt/nec/capmonitor/conf/capmonitor.conf
/opt/nec/mcemonitor/conf/mcemonitor.conf
3.3.1 capmonitor configuration file
capmonitor configuration file /opt/nec/capmonitor/conf/capmonitor.conf is used for configuration related
to CPU Core Online.
Note
For details of capmonitor configuration file, refer to "Capacity Optimization
(COPT) User's Guide".
3.3.2 mcemonitor configuration file
mcemonitor configuration file /opt/nec/mcemonitor/conf/mcemonitor.conf is used for configuration
related to CPU Core Offline and Memory Page Offline. Modify this file according to description below.
・mcemonitor.conf
# vi /opt/nec/mcemonitor/conf/mcemonitor.conf
#
# Config file for mcemonitor
#
# specify the internal action in mcemonitor to a cpu error
# off no action
# account only account errors
# soft try to offline CPU
core-ce-action = soft
# specify the internal action in mcemonitor to a page error
# off no action
# soft try to soft-offline page without killing any processes
memory-ce-action = soft

Table 3-1 mcemonitor configuration file (core-ce-action)
core-ce-action = soft
    Collects logs and performs CPU Core Offline if the CPU error count exceeds the threshold
    value. (Default)
core-ce-action = account
    Collects logs but does not perform CPU Core Offline even if the CPU error count exceeds
    the threshold value.
core-ce-action = off
    Neither collects logs nor performs CPU Core Offline.
Table 3-2 mcemonitor configuration file (memory-ce-action)
memory-ce-action = soft
    Collects logs and performs Memory Page Offline if the memory error count exceeds the
    threshold value. (Default)
    The process running on the relevant memory is transferred to another memory.
memory-ce-action = off
    Neither collects logs nor performs Memory Page Offline.
The system must be rebooted after the configuration file is modified.
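To see which action is currently configured without opening an editor, a small helper can print the value assigned to a key in mcemonitor.conf. This is a minimal sketch that assumes the "key = value" layout shown in the sample file above; the helper name conf_value is hypothetical and it does not validate the value.

```shell
# conf_value: hypothetical helper, not shipped with the product.
# Prints the value assigned to key $1 in file $2 (default: the
# mcemonitor.conf path used in this section). Comment lines are skipped
# because they start with '#', not with the key.
conf_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p" \
        "${2:-/opt/nec/mcemonitor/conf/mcemonitor.conf}"
}
```

For example, conf_value core-ce-action and conf_value memory-ce-action print the two settings described in Table 3-1 and Table 3-2.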
3.3.3 Disabling CMCI
With the RHEL 6.6 kernel 2.6.32-504.23.4.el6.x86_64, it has been reported that frequent occurrence of
CMCI (Corrected Machine Check Interrupt), which notifies the operating system of a detected
correctable error, may cause a system panic.
To change the error detection mode from "interrupt mode" to "polling mode", add "mce=no_cmci" to
the kernel line in "/boot/efi/EFI/redhat/grub.conf", as in the example below.
The system must be rebooted after the configuration file is modified.
title Red Hat Enterprise Linux Server (2.6.32-504.23.4.el6.x86_64)
    root (hd0,0)
    kernel /vmlinuz-2.6.32-504.23.4.el6.x86_64 ro root=/dev/mapper/VolGroup00-LogVol00 rd_LVM_LV=VolGroup00/LogVol00 rd_NO_LUKS nomodeset rd_NO_MD rhgb quiet crashkernel=256M KEYBOARDTYPE=pc KEYTABLE=jp106 LANG=ja_JP.UTF-8 rd_NO_DM mce=no_cmci
    initrd /initramfs-2.6.32-504.23.4.el6.x86_64.img
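After the reboot, the kernel command line in /proc/cmdline shows whether the parameter took effect. The sketch below checks a command-line string for mce=no_cmci as a separate word. The helper name has_no_cmci is hypothetical.

```shell
# has_no_cmci: hypothetical helper, not shipped with the product.
# Succeeds if the kernel command line given as $1 contains the
# mce=no_cmci parameter as a separate word.
has_no_cmci() {
    case " $1 " in
        *" mce=no_cmci "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, has_no_cmci "$(cat /proc/cmdline)" && echo "CMCI is disabled (polling mode)".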
3.3.4 Disabling kdump restart on udev triggered by logical processor offline
To disable the rule, add # at the beginning of the following line in the
/etc/udev/rules.d/98-kexec.rules file.
#SUBSYSTEM=="cpu", ACTION=="offline", PROGRAM="/etc/init.d/kdump restart"
Reload the udev rules after modifying the file.
# udevadm control --reload-rules
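The edit to the rules file can also be made with sed instead of an editor. The following is a minimal sketch, assuming GNU sed and that the rule appears in the file exactly as shown above. It keeps a backup copy with a .bak suffix and leaves an already commented line unchanged. The helper name disable_kexec_cpu_rule is hypothetical.

```shell
# disable_kexec_cpu_rule: hypothetical helper, not shipped with the product.
# Prefixes '#' to the active udev rule that restarts kdump on the cpu
# subsystem "offline" action. The original file is kept as <file>.bak.
disable_kexec_cpu_rule() {
    sed -i.bak \
        '/^SUBSYSTEM=="cpu", ACTION=="offline", PROGRAM="\/etc\/init.d\/kdump restart"/s/^/#/' \
        "${1:-/etc/udev/rules.d/98-kexec.rules}"
}
```

Run disable_kexec_cpu_rule as root, and then reload the udev rules with the udevadm command shown above.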
Note
kdump is restarted when capmonitor executes script upon completion of Core
Offline. You need to place the script file to be used after Core Offline according to
"3.3.5 Script file to be executed after Core Offline".

3.3.5 Script file to be executed after Core Offline
When Core Offline completes, capmonitor executes all script files stored in the directory
/opt/nec/capmonitor/script/cpu/offline.d.
If several logical processors are made offline by a single Core Offline, the script files are executed
only once, after the last processor is offlined.
To restart kdump in place of the udev-triggered restart that was disabled in 3.3.4, place the script
/opt/nec/capmonitor/script/03kdump.sh under the directory
/opt/nec/capmonitor/script/cpu/offline.d.
If you use software that requires a reboot after Core Offline (because the number of logical processors
is reduced), create a script file containing the necessary processes and store it under the directory
/opt/nec/capmonitor/script/cpu/offline.d.
Table 3-3 Script under /opt/nec/capmonitor/script/cpu/offline.d
03kdump.sh
    Description: Script that restarts the kdump daemon as needed so that a crash dump can be
    collected after Core Offline.
    How to install script file: Copy from /opt/nec/capmonitor/script/ to
    /opt/nec/capmonitor/script/cpu/offline.d.
XX~.sh
    User script.
    XX: Two-digit decimal number that specifies the execution order. (Scripts are executed
    starting from the lowest number.)
    ~: Arbitrary character string
    Description: If you use software that requires a reboot after Core Offline, create a script
    file containing the necessary processes.
    How to install script file: Create a script and store it under
    /opt/nec/capmonitor/script/cpu/offline.d.
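To preview the order implied by the two-digit prefixes, the scripts in the directory can be listed sorted by name. This is only a sketch: it assumes that sorting by file name reflects the numbering rule in Table 3-3, and it does not query capmonitor itself. The helper name list_offline_scripts is hypothetical.

```shell
# list_offline_scripts: hypothetical helper, not shipped with the product.
# Prints the *.sh files in the given directory sorted by name, so that
# lower two-digit prefixes come first.
list_offline_scripts() {
    local dir="${1:-/opt/nec/capmonitor/script/cpu/offline.d}"
    find "$dir" -maxdepth 1 -type f -name '*.sh' | LC_ALL=C sort
}
```

For example, after copying 03kdump.sh into the directory as described in Table 3-3, list_offline_scripts should list it before any user script with a higher prefix.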
3.3.6 Disabling EDAC
If EDAC is running in the system, Machine Check Monitoring Service does not run correctly. Disable
EDAC by creating the file /etc/modprobe.d/disable_edac.conf with the following contents:
install *_edac /bin/true
install edac_* /bin/true
After saving the file, reboot the system. After the reboot, confirm that EDAC is disabled by running
the following command. If nothing is displayed, no EDAC module is loaded.
# lsmod | grep edac
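The same check can name the modules that are still loaded. The sketch below reads lsmod output and prints every module whose name matches the two patterns used in disable_edac.conf (names ending in _edac or starting with edac_). The helper name edac_modules is hypothetical.

```shell
# edac_modules: hypothetical helper, not shipped with the product.
# Reads "lsmod" output on stdin and prints the names of loaded modules
# that end in _edac or start with edac_. Prints nothing if none are loaded.
edac_modules() {
    awk '$1 ~ /_edac$/ || $1 ~ /^edac_/ { print $1 }'
}
```

For example, lsmod | edac_modules prints nothing when EDAC has been disabled successfully.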

3.4 Uninstallation
Use the rpm command to uninstall Machine Check Monitoring Service.
Uninstall the packages mcemonitor, capmonitor, and acpi_call in that order.
3.4.1 Uninstalling acpi_call
1. Log in to the target machine as the root user.
2. Uninstall acpi_call RPM package of Machine Check Monitoring Service using rpm command.
# rpm -e mcl-acpicall-2.4-3.01.2.6.32.504.23.4.el6.x86_64
Note: mcemonitor and capmonitor must be uninstalled before uninstalling acpi_call.
If acpi_call is uninstalled while mcemonitor and capmonitor have not been uninstalled,
the following message is output and uninstallation of acpi_call fails.
# rpm -e mcl-acpicall-2.4-3.01.2.6.32.504.23.4.el6.x86_64
error: Failed dependencies:
mcl-acpicall is needed by mcl-capmonitor-2.4-2.12.el6.x86_64
mcl-acpicall is needed by mcl-mcemonitor1-2.4-2.02.el6.x86_64
3. Confirm that the acpi_call RPM package of Machine Check Monitoring Service is uninstalled
correctly. Uninstallation completed successfully if the following command displays nothing.
# rpm -qa | grep acpicall
4. Check that the acpi_call driver is unloaded. If none of the three acpi_call modules is displayed,
the acpi_call driver was uninstalled correctly.
# lsmod | grep acpi