Cray Urika-GX Instruction Manual

Urika®-GX System Administration Guide
(2.2.UP00)
S-3016

Contents
1 About the Urika®-GX System Administration Guide............................................................................................... 7
2 The Urika-GX System.............................................................................................................................................9
2.1 Administrative Components of Urika-GX...................................................................................................9
2.2 Network Components.............................................................................................................................. 10
2.3 File Systems............................................................................................................................................ 11
2.4 System Nodes......................................................................................................................................... 12
2.5 Restrictions on Use................................................................................................................................. 12
3 System Management............................................................................................................................................15
3.1 Check the Current Service Mode............................................................................................................ 15
3.2 Urika-GX Component Naming Conventions............................................................................................15
3.3 System Management Workstation (SMW)...............................................................................................17
3.3.1 Power On the System Management Workstation (SMW).......................................................... 17
3.3.2 About the Integrated Dell Remote Access Controller (iDRAC).................................................. 17
3.3.3 Control System Management Workstation (SMW) Power with the iDRAC8 Web Console........17
3.3.4 Synchronize the System Management Workstation (SMW) to the Site NTP Server..................20
3.3.5 Synchronize Time of Day on System Nodes.............................................................................. 21
3.3.6 Reboot a Stopped System Management Workstation (SMW)................................................... 22
3.4 Hardware Supervisory System (HSS)..................................................................................................... 22
3.4.1 Hardware Supervisory System (HSS) Architecture Overview....................................................24
3.4.2 The xtdiscover Command.......................................................................................................... 25
3.4.3 Hardware Supervisory System (HSS) Component Location Discovery..................................... 25
3.4.4 Hardware Supervisory System (HSS) Daemons........................................................................26
3.4.5 Hardware Supervisory System (HSS) Administration and Diagnostic Commands Supported on Urika-GX.......27
3.4.6 Hardware Supervisory System (HSS) Environments................................................................. 30
3.4.7 High Speed Network (HSN) Management................................................................................. 32
3.4.8 Create Direct Connection between the System Management Workstation (SMW) and a Compute Node Console.......32
3.4.9 Disable Hardware Components................................................................................................. 33
3.4.10 Enable Hardware Components................................................................................................ 33
3.4.11 Set Hardware Components to EMPTY...................................................................................... 34
3.4.12 Stop Components Using the Hardware Supervisory System (HSS)........................................ 34
3.4.13 Unlock Hardware Components................................................................................................ 35
3.4.14 Capture and Analyze System-level and Node-level Dumps.....................................................35
3.4.15 Collect Debug Information From Hung Nodes Using the xtnmi Command.............................. 36
3.4.16 Find Node Information..............................................................................................................36
3.4.17 Request and Display System Routing......................................................................................37
3.4.18 Initiate a Network Discovery Process.......................................................................................38
3.4.19 Power Up a Rack or Dual Aries Network Card (dANC)............................................................38
3.4.20 Check the Status of System Components................................................................................38
3.4.21 Check Compute Node High Speed Network (HSN) Connection..............................................39
3.4.22 Monitor the Health of PCIe Channels.......................................................................................39
3.4.23 Poll a Response from an HSS Daemon, Manager, or the Event Router..................................39
3.4.24 View Component Alert, Warning, and Location History............................................................40
3.4.25 Display Alerts and Warnings.................................................................................................... 40
3.4.26 Display Error Codes................................................................................................................. 40
3.4.27 Display Component State Information......................................................................................41
3.4.28 Clear Component Flags........................................................................................................... 41
3.4.29 Flash Management on Urika-GX..............................................................................................41
3.4.30 Create and Modify the authorized_keys File Using the xtcc-ssh-keys Command....... 42
3.4.31 Change the Passwords of RC, dANCCs and iSCB using the xtccpasswd Command.......... 42
3.4.32 Gather Troubleshooting Information Using the xtdumpsys Command.....................................43
3.5 Dual Aries Network Card (dANC) Management...................................................................................... 43
3.6 Analyze Node Memory Dump Using the kdump and crash Utilities on a Node.....................................44
3.7 Cray Lightweight Log Management (LLM) System................................................................................. 45
3.8 Urika-GX Node Power Management....................................................................................................... 45
3.9 Power Up the Urika-GX System.............................................................................................................. 46
3.10 Power Down the Urika-GX System....................................................................................................... 49
3.11 Urika-GX CLI Commands for Managing Services................................................................................. 51
3.12 Remote HDFS Remote Access and Multihoming on Urika-GX.............................................................54
3.13 Update the InfluxDB Data Retention Policy...........................................................................................54
3.14 Service to Node Mapping...................................................................................................................... 55
3.15 Image Management with Docker and Kubernetes................................................................................ 59
3.15.1 Execute Spark Jobs on Kubernetes......................................................................................... 60
3.15.2 Multi-tenant Spark Thrift Server on Urika-GX...........................................................................62
4 System Monitoring................................................................................................................................................ 65
4.1 System Monitoring Tools......................................................................................................................... 65
4.2 Monitor Resource Utilization and Node Status Using Nagios................................................................. 66
4.2.1 Configure SSL/TLS for Nagios Core.......................................................................................... 67
4.2.2 Configure the Nagios Server to Send Email Notifications.......................................................... 70
4.2.3 Change the Default Log File Path and Rotation Interval............................................................ 73
4.2.4 Configure Email Alerts................................................................................................................74
4.2.5 Modify Nagios Plug-in Threshold............................................................................................... 75
4.3 Get Started with Using Grafana...............................................................................................................77
4.4 Default Grafana Dashboards...................................................................................................................79
4.5 Update InfluxDB Security Settings.......................................................................................................... 90
4.6 Update the InfluxDB Data Retention Policy.............................................................................................91
4.7 Configuration Settings of Grafana........................................................................................................... 93
4.8 Change the Default Timezone Displayed on Grafana............................................................................. 93
4.9 Create a New Grafana Dashboard.......................................................................................................... 95
4.10 Add a New Graph to the Grafana Dashboard....................................................................................... 97
4.11 Start InfluxDB Before Hadoop Services.............................................................................................. 100
4.12 Monitor Subrack Attributes.................................................................................................................. 101
4.13 Analyze Node Memory Dump Using the kdump and crash Utilities on a Node.................................102
4.14 Retrieve System Status Information Using the urika-check-platform Command.................................103
4.15 iSCB Description................................................................................................................................. 104
4.15.1 Log on to the iSCB................................................................................................................. 104
4.15.2 iSCB Command Reference.................................................................................................... 105
5 Resource Management...................................................................................................................................... 124
5.1 Manage Resources on Urika-GX...........................................................................................................124
5.2 Use Apache Mesos on Urika-GX ..........................................................................................................126
5.2.1 Access the Apache Mesos Web UI.......................................................................................... 128
5.3 Use mrun to Retrieve Information About Marathon and Mesos Frameworks........................................129
5.4 Launch an HPC Job Using mrun........................................................................................................... 133
5.5 Manage Long Running Services Using Marathon................................................................................. 133
5.6 Manage the Spark Thrift Server as a Non-Admin User......................................................................... 136
5.7 Manage Jobs Using the Cray Application Management UI................................................................... 137
5.7.1 Overview of the Cray Application Management UI...................................................................138
6 Cray DVS............................................................................................................................................................140
6.1 Introduction to DVS............................................................................................................................... 140
6.1.1 Use Cray DVS on Urika-GX..................................................................................................... 141
6.1.2 DVS ioctl Interfaces..................................................................................................................141
6.1.3 DVS Client Mount Point Options.............................................................................................. 143
6.1.4 DVS Environment Variables..................................................................................................... 149
6.1.5 Modes.......................................................................................................................................150
6.1.6 Resiliency and Diagnostics...................................................................................................... 154
6.1.7 Caveats.................................................................................................................................... 157
6.1.8 Administrative Tasks.................................................................................................................158
7 Security...............................................................................................................................................................175
7.1 Authentication and Authorization...........................................................................................................175
7.2 Urika-GX Service Modes....................................................................................................................... 177
7.2.1 Modify the Service Mode..........................................................................................................180
7.2.2 User Interface Access in the Secure Service Mode................................................................. 181
7.3 Security Architecture Overview............................................................................................................. 181
7.4 Set up Passwordless SSH.....................................................................................................................182
7.5 Tenancy................................................................................................................................................. 183
7.5.1 Configure a Bridge Port............................................................................................................185
7.5.2 Tenant Management.................................................................................................................189
7.5.3 Tenant Virtual Machine States..................................................................................................195
7.5.4 Tenant Management CLI Commands.......................................................................................196
7.5.5 Execution of Lustre Sub-Commands Inside Tenant VMs......................................................... 198
7.5.6 Get Started with Tenant Management......................................................................................199
7.5.7 Multi-Tenancy........................................................................................................................... 202
7.5.8 Multi-tenant HDFS....................................................................................................................203
7.6 Authorized User Management...............................................................................................................205
7.7 Guidance on LDAP Forwarding.............................................................................................................208
7.8 Authentication Mechanisms...................................................................................................................216
7.9 Change Default Passwords................................................................................................................... 217
7.9.1 Default Urika-GX System Accounts......................................................................................... 219
7.9.2 Change the Default Nagios Password..................................................................................... 220
7.9.3 Change the Default iDRAC8 Password....................................................................................221
7.9.4 Change the Default System Management Workstation (SMW) Passwords.............................223
7.9.5 Change LDAP Password on Urika-GX.....................................................................................224
7.9.6 Reset a Forgotten Password for the Cray Application Management UI...................................224
7.9.7 Reset an Administrator LDAP Password on Systems Using Urika-GX 1.2UP01 and Earlier Releases....... 225
7.9.8 Reset an Administrator LDAP Password when the OLC Schema Password is Unknown....... 226
7.9.9 Reset an Administrator LDAP Password when the OLC Schema Password is Known........... 228
7.10 Tableau Authorization and Authentication Mechanisms...................................................................... 229
7.11 Enable SSL..........................................................................................................................................229
7.12 Enable SSL for Spark Thrift Server of a Tenant.................................................................................. 234
7.13 Install a Trusted SSL Certificate on Urika-GX..................................................................................... 235
7.14 Enable LDAP Authentication on Urika-GX ......................................................................................... 236
7.14.1 Enable LDAP for Connecting Tableau to HiveServer2........................................................... 238
7.15 Enable SQL Standard based Authorization for HiveServer2...............................................................239
7.16 File System Permissions..................................................................................................................... 240
7.17 Urika-GX Security Quick Reference Information................................................................................. 240
7.18 Port Assignments................................................................................................................................ 241
8 Troubleshooting.................................................................................................................................................. 245
8.1 System Management Log File Locations.............................................................................................. 245
8.2 Default Log Settings.............................................................................................................................. 246
8.3 Analytic Applications Log File Locations............................................................................................... 248
8.4 Security Related Troubleshooting Information.......................................................................................250
8.4.1 Save and Restore Tenant Information......................................................................................254
8.4.2 LDAP Server Start-up Issues................................................................................................... 256
8.5 Modify the Secret of a Mesos Framework............................................................................................. 256
8.6 Clean Up Log Data................................................................................................................................ 257
8.7 Diagnose and Troubleshoot Orphaned Mesos Tasks............................................................................258
8.8 Troubleshoot Common Analytic and System Management Issues ...................................................... 259
8.9 Troubleshoot mrun Issues.....................................................................................................................268
8.10 Troubleshoot: Application Hangs as a Result of NFS File Locking..................................................... 270
8.11 Troubleshoot: DVS does not Start after Data Store Move...................................................................270
8.12 Troubleshoot: DVS Ignores User Environment Variables....................................................................271
8.13 Clear Leftover hugetlbfs Files...............................................................................................................271
8.14 Remove Temporary Spark Files from SSDs........................................................................................271

1 About the Urika®-GX System Administration Guide
This publication contains administrative information about using the Cray® Urika®-GX system.
Typographic Conventions
Monospace            Indicates program code, reserved words, library functions, command-line prompts,
                     screen output, file/path names, key strokes (e.g., Enter and Alt-Ctrl-F), and
                     other software constructs.
Monospaced Bold      Indicates commands that must be entered on a command line or in response to an
                     interactive prompt.
Oblique or Italics   Indicates user-supplied values in commands or syntax definitions.
Proportional Bold    Indicates a graphical user interface window or element.
\ (backslash)        At the end of a command line, indicates the Linux® shell line continuation character
                     (lines joined by a backslash are parsed as a single line). Do not type anything after
                     the backslash or the continuation feature will not work correctly.
Scope and Audience
This publication is intended for system administrators of the Urika®-GX system. It does not provide
detailed information about the open source products used in the system; references to online
documentation are included where applicable.
Record of Revision
Date                 Addressed Release
September, 2018      2.2UP00
May, 2018            2.1UP00
December, 2017       2.0UP00
April, 2017          1.2UP00
December, 2016       1.1UP00
August, 2016         1.0UP00
March, 2016          0.5UP00
Record of Revision
This revision includes updates to Tableau-related topics.

Trademarks
The following are trademarks of Cray Inc. and are registered in the United States and other countries: CRAY and
design, SONEXION, Urika-GX, Urika-XA, Urika-GD, and YARCDATA. The following are trademarks of Cray Inc.:
APPRENTICE2, CHAPEL, CLUSTER CONNECT, CRAYDOC, CRAYPAT, CRAYPORT, DATAWARP, ECOPHLEX,
LIBSCI, NODEKARE. The following system family marks, and associated model number marks, are trademarks
of Cray Inc.: CS, CX, XC, XE, XK, XMT, and XT. The registered trademark LINUX is used pursuant to a
sublicense from LMI, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis. Other
trademarks used in this document are the property of their respective owners.

2 The Urika-GX System
The Urika-GX system is a big data analytics platform optimized for analytic workflows. It combines a highly
advanced hardware platform with a comprehensive analytic software stack to help derive optimal business value
from data. The Urika-GX platform provides the tools required for capturing and organizing a wide variety of data
types from different sources and enables analyzing big data and discovering hidden relationships.
The Urika-GX system also features a number of workload management tools as well as an optimized system
administration tool for performing monitoring and management tasks.
For a list of features of the Urika-GX system, see S-3017, "Urika®-GX System Overview".
2.1 Administrative Components of Urika-GX
Urika-GX platforms have been developed by tightly integrating commodity hardware components, open-source
software, and Cray proprietary hardware to provide users with a high-performance, scalable, and open compute
platform.
Major administrative components of Urika-GX include:
● System Management Workstation (SMW) - The SMW is a server that acts as a single-point interface to a
system administrator's environment. It provides an interface for performing administrative and monitoring
tasks.
○ Hardware Supervisory System (HSS) - HSS is an integrated system of hardware and software
components that are used for managing and monitoring the system.
○ Cobbler - Cobbler is used on Urika-GX for provisioning and deployment.
● Rack Controller (RC) - The RC monitors the environmental sensors within the rack and manages
communication between the SMW and other physical system components, including the rack, sub-rack and
dANC (Dual Aries Network Card).
● Intelligent Subrack Control Board (iSCB) - The iSCB status command can be used to monitor the physical
attributes of the sub-rack, such as the power supply, amperage, fan status, and temperature.
● Aries Network Card Controller (ANCC) - Each sub-rack chassis of the Urika-GX system contains two
dANCs (dual Aries Network Cards). Each dANC contains 2 Aries chips, an Advanced RISC Machines (ARM)
processor, and a number of environmental sensors to help monitor the system.
● Integrated Dell Remote Access Controller (iDRAC) - The iDRAC is a hardware component that provides
advanced agentless system management functionality for the SMW. It operates independently of the SMW's
CPU and operating system. The version of iDRAC used on the Urika-GX system is iDRAC8.
● System Monitoring and Performance Analysis Tools - Urika-GX ships with Grafana and Nagios. These
tools enable monitoring system resources and viewing performance statistics of various system components.
For more information, see S-3015, "Urika®-GX Analytic Applications Guide".
● Data Analytic Components - Urika-GX features a number of data analytic tools that help perform analytic
tasks, including managing and monitoring clusters, executing Hadoop and Spark jobs, and performing graph
analytics. For more information, see S-3015, "Urika®-GX Analytic Applications Guide" and S-3010,
"Cray™ Graph Engine User Guide".
● Security and Tenant Management Tools - Secret files used on the system are managed by the Urika-GX
Secret Manager. Tenancy is implemented through the use of a tenant VM that runs on physical nodes and
provides controlled access to services on the physical nodes through a command proxy mechanism. For
more information, refer to Urika-GX Service Modes on page 177 and Tenancy on page 183.
NOTE: Only Spark and HDFS commands can be executed within a tenant VM in this release. Commands
for flexing the cluster, mrun, and Cray Graph Engine (CGE) CLI commands cannot be executed within
a tenant VM.
In addition, Urika-GX features a number of CLI scripts that facilitate system management and
monitoring.
2.2 Network Components
Three networks are deployed on the Urika®-GX platform:
● Aries High Speed Network (HSN) - The Aries HSN provides high-speed application and data network
connectivity between nodes. This network provides node interconnect via high-bandwidth, low-latency DMA
access. The hardware to support this network consists of an Aries Interface Board (AIB) connected to an
available PCIe slot on each Urika-GX node and integrated into the node chassis assembly. The AIB is
connected to the dANC integrated in the Urika-GX sub-rack. Copper cables provide an all-to-all connection of
all dANCs in the system.
● Operational Ethernet network - The operational Ethernet network is used for ingesting user data. This
network consists of a single-unit 48-port GigE switch that provides dual 1GigE and/or dual 10GigE
interfaces to the site network. Urika-GX's login nodes do not route through this switch and need to be directly
connected to the site network. The operational network connects nodes externally, from Urika-GX to
the site network. The Urika-GX compute and I/O nodes are connected to a single managed Brocade ICX
6450-48, 48-port switch with a single power supply. Connectivity of this network to the site network is made
possible by two available Gigabit Ethernet ports and/or two 10 Gigabit Ethernet ports on the ICX 6450-48
switch.
The operational network can also be used to access data streaming applications and services directly from
compute nodes.
● Management Ethernet network - The management Ethernet network is primarily used for system
management, and not for user data. The management Ethernet network consists of two stacked 1U 48-
port switches, which are located at the top of the Urika-GX rack, and can optionally contain redundant switch
power supplies. These switches provide GigE management Ethernet connectivity to every node, System
Management Workstation (SMW), Rack Controller (RC), Intelligent Subrack Control Board (iSCB), Power
Distribution Units (PDUs), Dual Aries Network Cards (dANCs) and to the operational network that connects to
the nodes.
The Urika-GX system also contains the following subnets:
○ SMW subnet, which provides connectivity to the SMW and the RC.
○ Rack subnet, which provides connectivity to the dANCs and iSCB module.
This network is supported by two managed Brocade ICX 6450-48, 48-port switches stacked together with two
10GigE optical interconnects. Each switch contains a single power supply, and can optionally contain
redundant switch power supplies. The following VLANs are defined for this network to support management
network traffic:
○ VLAN 102 - Uses ports 1-5 on each ICX 6450-48 switch. This is a dual-mode (tagged dual-mode for
VLAN 102 and tagged for VLAN 103) VLAN. Untagged traffic on these ports belongs to VLAN 102. Traffic
can be tagged for VLAN 103. The SMW HSS interface, the RC for a given rack, and the PDUs for a given
rack are connected to these ports.
○ VLAN 103 - Uses ports 6-12 on each ICX 6450-48 switch. Untagged traffic on these ports belongs to
VLAN 103. The iSCBs and dANC cards are connected to these ports.
○ VLAN 104 - Uses ports 13-48 on each ICX 6450-48 switch. Untagged traffic on these ports belongs to
VLAN 104. The compute nodes and the SMW node-side network are connected to these ports.
NOTE: Traffic on this VLAN may be reduced if VLAN 105 is needed for storage, as long as each
compute node is connected to VLAN 104.
○ VLAN 105 - Uses some of ports 13-48 on each ICX 6450-48 switch, as needed for storage
management. Untagged traffic on these ports belongs to VLAN 105. The storage management ports are
connected to these ports.
○ VLAN 1 (default) is unused.
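The port-to-VLAN layout above can be summarized as a small lookup. The following Python sketch is illustrative only and does not query or configure the switches. The function name untagged_vlan and the storage_ports parameter are hypothetical; which ports in the 13-48 range, if any, are assigned to VLAN 105 is site-specific and must be supplied by the caller.

```python
# Default untagged VLAN for each port range on an ICX 6450-48 switch,
# taken from the VLAN list in this section.
DEFAULT_PORT_VLANS = [
    (range(1, 6), 102),    # SMW HSS interface, RC, PDUs
    (range(6, 13), 103),   # iSCBs and dANCs
    (range(13, 49), 104),  # compute nodes, SMW node-side network
]


def untagged_vlan(port, storage_ports=frozenset()):
    """Return the VLAN that untagged traffic on `port` belongs to.

    `storage_ports` is a hypothetical, site-specific set of ports in the
    13-48 range that have been assigned to VLAN 105 for storage management.
    """
    if port in storage_ports:
        if port not in range(13, 49):
            raise ValueError("VLAN 105 ports must be in the range 13-48")
        return 105
    for ports, vlan in DEFAULT_PORT_VLANS:
        if port in ports:
            return vlan
    raise ValueError(f"port {port} is outside the range 1-48")
```

For example, untagged_vlan(3) returns 102, and untagged_vlan(48, storage_ports={48}) returns 105.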
Traffic from the SMW to the subcomponents in the rack subnet, and vice versa, is routed through the
corresponding RC.
For additional information, see the Urika®-GX Hardware Guide.
2.3 File Systems
Supported file system types on Urika-GX include:
● Internal file systems
○ Hadoop Distributed File System (HDFS) - Hadoop uses HDFS for storing data. HDFS is highly fault-
tolerant, provides high throughput access to application data, and is suitable for applications that have
large data sets. Urika-GX also features tiered HDFS storage. HDFS data is transferred over the Aries
network.
○ Network File System (NFS) - The Urika-GX SMW hosts NFS, which is made available to every node via
the management network.
○ /mnt/lustre - This directory hosts Lustre file system data if DAL/Sonexion is used.
CAUTION: Avoid using NFS for high-volume data transfers and/or large writes, as this causes the network
to operate much more slowly or to time out. NFS, as configured for Urika-GX home directories, is not
capable of handling large parallel writes from multiple nodes without data loss. Though it is possible to
configure NFS to handle parallel writes, doing so would require a hard mount, which would have undesired
consequences.
File Locations
● Home directories are mounted on (internal) NFS, with limited space.
● Distributed file system (Lustre), if provisioned, is mounted at /mnt/lustre and is suitable for larger files.
Lustre mounts are isolated, with individual tenants having their own mount point.
2.4 System Nodes
Each Urika-GX node is a logical grouping of a processor, memory, and a data routing resource. Nodes can be
categorized as compute, I/O, service and login nodes.
Table 1. Node Types and Descriptions
Node Type | Description
Compute nodes | Compute nodes run application programs.
I/O nodes | I/O nodes facilitate connecting to the supported external storage system.
Login nodes | Users log in to the Urika-GX system via login nodes and virtual machines (VMs). Login nodes store users' local files and facilitate launching jobs from the command line. They also offer the environment for users to build, compile, and monitor analytics applications.
Service nodes | Service nodes handle support functions such as user login and I/O.
All Urika-GX nodes run the CentOS operating system (version 7.3) as well as portions of the Cray Linux
Environment (CLE).
2.5 Restrictions on Use
Hardware Considerations
Consider the following items when using Urika-GX hardware:
● High speed network/management network switches must not be modified, as this network is internal to
Urika-GX.
● Moving the system from the Cray-supplied rack to customer-provided racks is not supported.
● Sub-rack and SMW hardware configuration must not be changed.
● PCIe devices should not be modified.
● Hardware and drivers installed on the SMW and nodes should not be modified.
● PDUs installed on the system should not be replaced.
Contact Cray Support if nodes need to be swapped between slots.
The following options are supported:
● Connecting to the internal PDU power switches.
● Changing the host names of login nodes and the SMW.
● The single top of rack switch used for the operational network may be modified to meet site-specific needs.
This switch is expected to be used to enable a direct connection from the site network to the compute and I/O
nodes to support data ingestion and streaming analytics. This network may be modified to reflect site-specific
IP addresses and node names that would be directly exposed to the site network. For information on how to
configure the operational network, contact Cray Support.
● The available space in the rack can be used for additional hardware; however, proper power and cooling
for that equipment must be ensured.
Contact Cray Support for information related to:
● Optionally switching to higher bandwidth NICs on the login nodes or SMW connections to the site network.
● Changing the internal range of Cray's IP addresses in case there is a conflict.
Software Considerations
Consider the following items when using Urika-GX software:
● Spark Shells using Kubernetes (i.e., those launched under the secure service mode) are limited to 16
cores and 60 GiB of memory, and this limit cannot be overridden at the command line. The limit exists
because the Spark on Kubernetes project lacks native Spark Shell support, for which Cray has provided a
workaround in this release.
● Modifying the iSCB firmware is not supported.
● Modifying switch firmware (both Ethernet and InfiniBand) is not supported.
● Modifying node BIOS settings is not supported.
● Modifying the kernel and/or kernel modules is not supported.
● Deleting any factory installed software is not supported.
● Changing the default configurations of Mesos, Marathon, mrun, and Grafana is not supported.
● Launching of Docker containers through Docker commands is not supported. Users must use the Marathon
interface for launching containers. For more information, refer to S-3015, "Urika®-GX Analytic Applications
Guide".
● Building and managing new Docker images is currently not supported on Urika-GX. For more information,
contact Cray Support.
Before installing any additional software on the Urika-GX system, a ticket should be opened with Cray Support to
verify that the software will have no impact on the system. The following options are supported:
● Adding CentOS 7 packages that do not cause dependency issues with the Cray installed software. Only
Cray-provided Linux updates and YUM repositories should be used.
● Installing additional HDP 2.6.1.0-129 compliant packages and modifying these packages for integrating into
the existing software stack. This applies in the default service mode for HDP related items, except Spark. For
more information, refer to Urika-GX Service Modes on page 177.
● Tuning Hadoop and Spark configuration parameters listed in section "Tunable Hadoop and Spark
Configuration Parameters" of S-3015, "Urika®-GX Analytic Applications Guide".
NOTE: Contact Cray Support if additional software configurations need to be modified.
Security Considerations
If the Urika-GX system is running in the secure mode in production, Cray does not recommend toggling back to
the default mode while in production because, in the default mode, the security assurances provided by secure
mode are not in place, and the security of data that was protected by secure mode may be compromised while
running in the default mode. Cray cannot extend the secure mode security assurances to any system that has run
in a production state in the default mode until that system has been fully re-deployed.
The following actions are not supported:
● Changing the default set of Kubernetes and Kerberos configurations without consulting Cray Support, as
doing so can adversely affect the functionality of the system in the secure service mode.
● Enabling of the PermitUserEnvironment option in sshd_config(5) or the passing of environment
variables beyond those listed on the ssh(1) manual page.
● Changing any settings listed in /etc/environment into login sessions on Urika-GX physical nodes outside
of the system through the login mechanism.
● Modifying the list of whitelist commands. For a list of commands that are part of the whitelist, see Tenancy on
page 183.
● Tenant NameNode configuration is managed automatically by the Urika-GX tenant management scripts.
Manually altering the configurations of the tenant NameNode is not supported.
● Tenant names may only contain:
○ the letters a-z
○ the numbers 0-9
○ the characters '-' and '.'
NOTE: The name 'default' is reserved for the sample tenant configuration and cannot be used as a
tenant name.
● Keep the following in mind when using the ux-tenant-alter-vm command:
○ Number of CPUs: At least 2 CPUs must remain available when the number of CPUs is changed by this
script. That is, if there are N CPUs, a maximum of N-2 CPUs can be assigned to a VM.
○ Amount of memory: At least 50% of the memory must remain available after a VM has been assigned
memory using this command.
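The CPU and memory limits for ux-tenant-alter-vm can be checked before a change is requested. The Python sketch below only does the arithmetic described above; it does not invoke ux-tenant-alter-vm, and the function names are hypothetical. It assumes that N and the memory total refer to the node hosting the VM, and whether the command enforces these limits itself is not stated here.

```python
def max_vm_cpus(total_cpus):
    """At least 2 CPUs must remain available, so a VM may use at most N-2."""
    return max(total_cpus - 2, 0)


def max_vm_memory_gib(total_memory_gib):
    """At least 50% of the memory must remain available after assignment."""
    return total_memory_gib / 2


def check_vm_request(total_cpus, total_memory_gib, vm_cpus, vm_memory_gib):
    """Return a list of problems with the requested VM size (empty if none)."""
    problems = []
    if vm_cpus > max_vm_cpus(total_cpus):
        problems.append(
            f"{vm_cpus} CPUs requested, at most {max_vm_cpus(total_cpus)} allowed"
        )
    if vm_memory_gib > max_vm_memory_gib(total_memory_gib):
        problems.append(
            f"{vm_memory_gib} GiB requested, "
            f"at most {max_vm_memory_gib(total_memory_gib)} GiB allowed"
        )
    return problems
```

For a host with 32 CPUs and 256 GiB of memory, a request for 30 CPUs and 128 GiB passes both checks, while a request for 31 CPUs does not.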
3 System Management
3.1 Check the Current Service Mode
Prerequisites
This procedure requires root privileges on the SMW.
About this task
Urika-GX supports two service modes, which dictate the list of services available. These modes include:
● Default
● Secure
Use the following instructions to determine the service mode the system is currently running in.
Procedure
1. Log on to the SMW as root.
# ssh root@hostname-smw
2. Display the current service mode by using one of the following options:
● Execute the urika-state command. This displays the current service mode as well as the status of all
the services that are supported in that mode.
● Execute the urika-service-mode command.
# urika-service-mode
Current mode is: default
For more information, refer to the urika-service-mode and urika-state man pages.
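When the service mode needs to be checked from a script, the output shown in the procedure can be parsed. The following Python sketch assumes the command prints a line of the form "Current mode is: <mode>", as in the example above; the output format for secure mode is assumed to be the same. The helper names are hypothetical, and current_service_mode must be run as root on the SMW.

```python
import subprocess

MODE_PREFIX = "Current mode is:"


def parse_service_mode(output):
    """Extract the mode name from urika-service-mode output."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(MODE_PREFIX):
            return line[len(MODE_PREFIX):].strip()
    raise ValueError("no service mode line found in output")


def current_service_mode():
    """Run urika-service-mode locally and return the reported mode."""
    result = subprocess.run(
        ["urika-service-mode"],
        capture_output=True,  # requires Python 3.7 or later
        text=True,
        check=True,
    )
    return parse_service_mode(result.stdout)
```

Separating the parsing from the command invocation keeps the parsing logic testable on a machine where urika-service-mode is not installed.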
3.2 Urika-GX Component Naming Conventions
The following table contains the component naming format for Urika®-GX systems:
Table 2. Urika-GX Component Naming Conventions
Component/Subject | Naming Pattern | Range
SMW | s0 | N/A
Wild card, similar to s0 | all | N/A
Wild card, which refers to all the compute nodes | all_comp | N/A
Wild card, which refers to all the service nodes | all_serv | N/A
Machine partition | pP | p0
Rack | rR | R:0 to 161
Sub-rack. There are up to 4 sub-racks per rack. Each sub-rack contains up to 2 dual Aries Network Card (dANC) cards, up to 16 compute nodes, and up to 2 Intelligent Subrack Control Boards (iSCBs) | rRsS | S:0 to 3
Intelligent Subrack Control Board (iSCB) | rRsSiI | I:0 to 1
Dual Aries Network Card (dANC). There are up to 2 dANCs per sub-rack, accommodating up to 16 nodes | rRsScC | C:0 to 1
High Speed Network (HSN) cable. The "j" name is visible on the cable connector face plate | rRsScCjJ | J:0 to 15
Aries ASIC within a dANC card | rRsScCaA | A:0 to 1
Aries link control block within an Aries ASIC | rRsScCaAlRC | R:0 to 5, C:0 to 7
Network Interface Controller (NIC) within an Aries ASIC | rRsScCaAnN | N:0 to 3
Node within a dANC card | rRsScCnN | N:0 to 7
Accelerator within a node | rRsScCnNaA | A:0 to 7
Board Management Control (BMC) within a node | rRsScCnNbB | B:0
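As an illustration of how the node naming pattern rRsScCnN decomposes, the following Python sketch splits a node name into its rack, sub-rack, dANC, and node indices and checks each against the ranges in the table. The function name parse_node_name is hypothetical and is not part of any Urika-GX tool; only the node pattern is handled.

```python
import re

# rRsScCnN: rack, sub-rack, dANC, and node indices.
NODE_NAME = re.compile(r"r([0-9]+)s([0-9]+)c([0-9]+)n([0-9]+)")

# Upper bounds from the Range column of the naming table.
LIMITS = {"rack": 161, "subrack": 3, "danc": 1, "node": 7}


def parse_node_name(name):
    """Split a node name such as 'r0s1c0n3' into its numeric fields."""
    match = NODE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"{name!r} does not match the rRsScCnN pattern")
    fields = dict(zip(LIMITS, (int(group) for group in match.groups())))
    for key, value in fields.items():
        if value > LIMITS[key]:
            raise ValueError(f"{key} index {value} exceeds {LIMITS[key]}")
    return fields
```

For example, parse_node_name("r0s1c0n3") returns the rack, sub-rack, dANC, and node indices 0, 1, 0, and 3.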
Tenant Naming Conventions
Tenant names may only contain:
● the letters a-z
● the numbers 0-9
● the characters '-' and '.'
NOTE: The name default is reserved for the sample tenant configuration and cannot be used as a
tenant name.
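The tenant naming rules above can be expressed as a simple check. This Python sketch covers only the rules stated in this section (allowed characters and the reserved name default); any additional constraints, such as length limits, are not stated here and are not checked. The function name is hypothetical.

```python
import re

# Allowed characters: lowercase letters a-z, digits 0-9, '-' and '.'.
TENANT_NAME = re.compile(r"[a-z0-9.-]+")

# 'default' is reserved for the sample tenant configuration.
RESERVED_NAMES = {"default"}


def is_valid_tenant_name(name):
    """Return True if `name` satisfies the stated tenant naming rules."""
    return TENANT_NAME.fullmatch(name) is not None and name not in RESERVED_NAMES
```

For example, is_valid_tenant_name("tenant-1.dev") is True, while "default" and "My_Tenant" are rejected.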
3.3 System Management Workstation (SMW)
The System Management Workstation (SMW) is the system administrator's console for managing a Cray system.
The SMW is a server that runs the CentOS (version 7.3) operating system, Cray developed software, and third-
party software. The SMW is also a point of control for the Hardware Supervisory System (HSS). The HSS data is
stored on an internal hard drive of the SMW.
The SMW provides shell and web access to authorized users to perform administrative and monitoring tasks. The
Nagios Core service also runs on the SMW.
Most system logs are collected and stored on the SMW. The SMW plays no role in computation after the system
is booted. From the SMW an administrator can initiate the boot process, access the database that keeps track of
system hardware, analyze log messages, and perform standard administrative tasks.
CAUTION: The SMW is a critical system component, which facilitates the operation of other hardware
and software components. Therefore, it is important that all instructions in this publication be followed
before making any changes or reconfigurations to the SMW, as well as before restarting the SMW.
3.3.1 Power On the System Management Workstation (SMW)
The SMW can be turned on by:
● Physically turning the SMW on via the power button.
● Using the iDRAC.
CAUTION:
The SMW is a critical system component, which facilitates the operation of other hardware and software
components. Therefore, it is important that all instructions in this publication be followed before making
any changes or reconfigurations to the SMW, as well as before restarting the SMW.
3.3.2 About the Integrated Dell Remote Access Controller (iDRAC)
The iDRAC is a systems management hardware and software solution that provides remote management
capabilities, crashed system recovery, and power control functions for the System Management Workstation
(SMW). The iDRAC alerts administrators to server issues, helps them perform remote server management, and
reduces the need for physical access to the server. The iDRAC also facilitates inventory management and
monitoring, deployment and troubleshooting. To help diagnose the probable cause of a system crash, the iDRAC
can log event data and capture an image of the screen when it detects that the system has crashed.
For more information about the iDRAC, refer to online documentation at http://www.dell.com.
3.3.3 Control System Management Workstation (SMW) Power with the iDRAC8 Web Console
Prerequisites
Ensure that the SMW is up and running.
About this task
Use the iDRAC's web console to start up and shut down the System Management Workstation (SMW).
Procedure
1. Point a browser to the site-specific iDRAC IP address, such as https://system-smw-ras
The iDRAC console's login screen appears.
2. Enter root and initial0 in the Username and Password fields respectively. These are the default
credentials; use them only if the credentials have not been changed.
Figure 1. iDRAC Login Screen
3. In the Quick Launch Tasks section of the iDRAC UI, click the Power ON/OFF link to control the SMW's
power.
Figure 2. iDRAC Console
For more information about the iDRAC, visit http://www.dell.com.
3.3.4 Synchronize the System Management Workstation (SMW) to the Site NTP Server
Prerequisites
This procedure requires root privileges.
About this task
The components of the Cray system synchronize time with the System Management Workstation (SMW) through
Network Time Protocol (NTP). By default, NTP on the SMW is configured to stand alone; however, the SMW
can optionally be configured to synchronize with a site NTP server. Follow this procedure to configure the
SMW to synchronize to a site NTP server.