Graphcore IPU-POD128 Manual

IPU-POD128 build and test guide
Version latest
Graphcore Ltd
Nov 25, 2021

CONTENTS
1 Overview
1.1 Acronyms and abbreviations
2 IPU-POD128 design components
2.1 IPU-POD64 components
2.2 IPU-M2000s
2.2.1 Overview
2.2.2 QR code label
2.2.3 LED indicators
2.3 Server
2.4 Switches
2.4.1 100GE RoCE/RDMA switch (ToR switch)
2.4.2 1GE management switch
2.5 Power distribution units
2.6 Rack
2.7 Supplementary mounting components
2.8 Cables
2.8.1 RJ45 cables
2.8.2 OSFP cables
2.8.3 QSFP cables
2.9 Connecting cables between IPU-POD64 logical racks
3 IPU-POD64 rack assembly
3.1 Equipment checklist
3.2 Document reproduction
3.3 Required tools
3.4 Preparing the rack
3.4.1 Rail distance
3.4.2 Unpacking the rack
3.4.3 Removing the side panels and doors
3.4.4 Removing the vertical accessory channels
3.4.5 Adjusting the rear accessory channels
3.4.6 Adjusting the rear vertical rails
3.4.7 Adjusting the front vertical rails
3.4.8 Installing the rack rails
3.4.9 Installing PDU brackets
3.5 Installing the equipment
3.5.1 Installing the IPU-M2000s
3.5.2 Installing the management switch
3.5.3 Installing the ToR switch
3.5.4 Installing the PDUs
3.5.5 Installing the Dell R6525 server(s)
3.6 Cabling the rack
3.6.1 IPU-M2000 to IPU-M2000 IPU-Link connectivity (OSFP)
3.6.2 IPU-M2000 to IPU-M2000 Sync-Link cabling
3.6.3 IPU-M2000 to management switch cabling (RJ45)
3.6.4 Management switch: BMC cabling
3.6.5 Management switch: BMC + GW SoC cabling
3.6.6 IPU-M2000 to ToR switch cabling (QSFP)
3.6.7 Dell R6525 server(s) cabling
3.6.8 ToR switch to Dell server(s)
3.6.9 Management switch to Dell server(s): iDRAC
3.6.10 Management switch to Dell server(s): network connector
3.6.11 Management switch to Dell server(s): switch management
3.6.12 Management switch to PDUs
3.7 Power cabling
3.7.1 IPU-M2000 power cabling
3.7.2 Server power cabling: Dell R6525
3.7.3 Switch power cabling
3.8 Completing the rack
3.8.1 Blanking panels
3.8.2 Front and rear doors
3.8.3 Side panels
3.8.4 PDU plugs
3.8.5 Packaging
4 IPU-POD64 server and switch configuration
4.1 Server configuration
4.1.1 Hardware recommendations
4.1.2 Storage configuration recommendations
4.1.3 Memory configuration recommendations
4.1.4 BIOS configuration
4.1.5 Operating system installation
4.1.6 User accounts and groups
4.1.7 DHCP Service (Dynamic Host Configuration Protocol)
4.1.8 Rsyslog service
4.1.9 NTP service (Network Time Protocol)
4.1.10 Other configuration files and folders
4.2 Network configuration
4.2.1 Overview
4.2.2 IPU-POD64 network interfaces
4.2.3 Management switch configuration
4.2.4 ToR switch configuration
4.2.5 IPU-POD64 VLAN assignments
4.2.6 Server network configuration
5 IPU-POD64 software installation and configuration
5.1 Management server
5.2 V-IPU software installation and configuration
5.3 IPU-M2000 software installation and configuration
5.3.1 Download IPU-M2000 software update bundle
5.3.2 Software update of all IPU-M2000s
5.3.3 IPU-M2000 GW root file system config files
5.4 Rack tool
6 IPU-POD64 manual installation tests
6.1 Running system tests
6.2 Troubleshooting
6.2.1 BMC BISTs
6.2.2 V-IPU built in self tests
7 IPU-POD128 installation
8 IPU-POD128 network configuration
8.1 Overview
8.2 Useful resources
8.3 IP addressing
8.4 Merging IPU-POD64 racks to create IPU-POD128
8.4.1 Networking pre-requisites
8.4.2 Phase 1: Edit configuration files
8.4.3 Phase 2: Activate new configuration
8.5 IPU-M2000 setup files
8.5.1 Syslog and chrony on the IPU-Gateway
8.5.2 Syslog on BMC
8.6 DHCP files
8.6.1 Lrack1 and lrack2: /etc/dhcp/dhcpd.conf
8.6.2 Lrack1: /etc/dhcp/dhcpd.d/ipum-dhcp.conf
8.6.3 Lrack2: /etc/dhcp/dhcpd.d/ipum-dhcp.conf
8.6.4 Lrack1 and lrack2: /etc/dhcp/dhcpd.d/ files
8.6.5 Lrack1: /etc/dhcp/dhcpd.d/lrack1 files
8.6.6 Lrack1: /etc/dhcp/dhcpd.d/lrack2 files
8.6.7 Lrack2: /etc/dhcp/dhcpd.d/lrack2 files
8.7 /etc/netplan files
8.7.1 1GbE management interface on lrack1 server
8.7.2 RNIC interfaces on the servers
8.7.3 Lrack1: rack_config.json file
9 System integration testing
9.1 Cluster tests
10 Revision history
11 Trademarks & copyright

CHAPTER
ONE
OVERVIEW
The IPU-POD128 is a rack solution containing 32 IPU-M2000s, up to four host servers, network switches and
IPU-POD software. There are 128 Mk2 GC200 IPUs in total with four IPUs in each IPU-M2000. For more information
on IPU-POD systems available from Graphcore see https://www.graphcore.ai/products.
Warning: This guide is for properly trained service personnel and technicians who are required to install the
IPU-POD128.
If you have any questions then please contact your Graphcore representative or use the resources on the
Graphcore support portal: https://www.graphcore.ai/support.
1.1 Acronyms and abbreviations
This is a short list that describes some of the most commonly used terms in this document.
Table 1.1: Glossary
Term | Description
AOC | Active optical cable
BMC | Baseboard Management Controller: standby power domain service processor doing system hardware management
BOM | Bill of Materials
GCD | A graph compile domain is operated by a single Poplar Instance within the system, either within a single IPU-M2000 unit or within several units connected by IPU-Link cables
IPU-Gateway | A device that disaggregates the server(s) and the four IPUs in the IPU-M2000 across a RoCE network, provides external IPU Exchange Memory, and enables IPU scaleout across 100GbE (GW-Links) for rack-to-rack connectivity
GW-Link | High speed communication links that connect IPU-M2000s horizontally across IPU-POD64 racks. Special cables are required for GW-Links between IPU-M2000 units
IPU-Link | High speed communication links that connect IPUs within an IPU-M2000 and between IPU-M2000s within an IPU-POD64. Special cables are required for IPU-Links between IPU-M2000 units
PDU | Power Distribution Unit
RDMA | Remote DMA
RNIC | RDMA Network Interface Controller
RoCE | RDMA over converged Ethernet
ToR | Top of Rack. Often used in combination with the ToR RDMA switch that is placed on top of the IPU-M2000 stacked units

CHAPTER
TWO
IPU-POD128 DESIGN COMPONENTS
This section describes the components in the IPU-POD128. Each IPU-POD128 is made from two IPU-POD64 logical
racks with GW-Links connected between them.
2.1 IPU-POD64 components
Each IPU-POD64 has the following:
•16 IPU-M2000s
•1 server (default configuration is one host server; up to four can be supported)
•2 switches (one 1GbE management switch and one 100GbE ToR switch)
•2 power distribution units
•1 rack
•Supplementary mounting components
•Cables
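As a quick arithmetic check of the figures above, the following sketch tallies the components of an IPU-POD128 built from two IPU-POD64 logical racks. The dictionary keys and the function name are illustrative only and are not part of any Graphcore tool; the counts come from the list above and from the figure of four IPUs per IPU-M2000 given in the overview. It assumes both racks use the same number of servers.

```python
# Per-rack counts for a default IPU-POD64, taken from the component list above.
IPU_POD64_COMPONENTS = {
    "ipu_m2000": 16,
    "server": 1,   # default; up to four can be supported
    "switch": 2,   # one 1GbE management switch and one 100GbE ToR switch
    "pdu": 2,
    "rack": 1,
}
IPUS_PER_IPU_M2000 = 4
RACKS_PER_IPU_POD128 = 2


def ipu_pod128_totals(servers_per_rack=1):
    """Return component totals for an IPU-POD128 (hypothetical helper).

    Assumes both logical racks are built with the same number of servers.
    """
    if not 1 <= servers_per_rack <= 4:
        raise ValueError("each IPU-POD64 supports between one and four servers")
    per_rack = dict(IPU_POD64_COMPONENTS, server=servers_per_rack)
    totals = {name: count * RACKS_PER_IPU_POD128 for name, count in per_rack.items()}
    totals["ipu"] = totals["ipu_m2000"] * IPUS_PER_IPU_M2000
    return totals


if __name__ == "__main__":
    for name, count in ipu_pod128_totals().items():
        print(f"{name}: {count}")
```

With the default of one server per rack this gives 32 IPU-M2000s, 128 IPUs and two servers, which matches the totals stated elsewhere in this guide.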
2.2 IPU-M2000s
2.2.1 Overview
There are 32 IPU-M2000s in each IPU-POD128 (16 in each IPU-POD64), making a total of 128 IPUs (4 IPUs per
IPU-M2000). The IPU-M2000 front panel contains:
•2 RNIC ports
•8 IPU-Link ports
•2 management GbE ports (BMC/GW SoC management ports)
•2 GW-Link ports
•8 Sync-Link ports
•3 LED indicators
Fig. 2.1: Front panel
The IPU-M2000 back panel contains:
•2 power connectors per IPU-M2000
•5 fan units
•5 LED indicators
•Unit QR code
Fig. 2.2: Back panel
2.2.2 QR code label
There is a QR code label on the back panel of each IPU-M2000. The QR code contains the following information
for each IPU-M2000:
•Company name (Graphcore)
•Serial number
•Part number
•BMC Ethernet MAC address
•GW Ethernet MAC address
•URL for Graphcore support portal
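If you record the label contents in an inventory, a small record type can hold the six fields listed above. The sketch below is illustrative only: this guide does not specify how the fields are encoded inside the QR code, so the class name, the field names and the colon-separated MAC address format are all assumptions.

```python
import re
from dataclasses import dataclass

# Assumption: MAC addresses are written as six colon-separated hex pairs.
_MAC_RE = re.compile(r"^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$")


@dataclass(frozen=True)
class IpuM2000Label:
    """Hypothetical record of the fields carried by the IPU-M2000 QR code label."""

    company_name: str
    serial_number: str
    part_number: str
    bmc_mac: str
    gw_mac: str
    support_url: str

    def __post_init__(self):
        # Reject obviously malformed MAC addresses at construction time.
        for field_name in ("bmc_mac", "gw_mac"):
            value = getattr(self, field_name)
            if not _MAC_RE.match(value):
                raise ValueError(f"{field_name} is not a valid MAC address: {value!r}")


if __name__ == "__main__":
    # Placeholder values only; real serial and part numbers come from the label.
    label = IpuM2000Label(
        company_name="Graphcore",
        serial_number="SERIAL-PLACEHOLDER",
        part_number="PART-PLACEHOLDER",
        bmc_mac="00:00:5E:00:53:01",
        gw_mac="00:00:5E:00:53:02",
        support_url="https://www.graphcore.ai/support",
    )
    print(label)
```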
2.2.3 LED indicators
The IPU-M2000 has LED indicators on both sides of the chassis.
Rear side LEDs
The rear side LEDs (Fig. 2.3) indicate the state of the 5 fans on the IPU-M2000. All the indicators should normally
be off. A lit LED (amber) indicates a fan module fault and the corresponding fan module should be replaced as
soon as possible to maintain maximum cooling.
Fig. 2.3: Rear side LED indicators
Front side LEDs
The front side LEDs indicate the status of the IPU-M2000. Fig. 2.4 and Table 2.1 show the colour scheme and
indications.
Fig. 2.4: Front side LEDs
Table 2.1: Front LED indicators
LED | Colour | Function
1 | Green | “OK”, “Normal”, “Satisfactory operation”, “Active”, or “In service”
  |       | 10 Hz: BMC running on flash (instruction fetch from flash)
  |       | 2 Hz: BMC running on DRAM without interrupt enabled (instruction fetch from DRAM)
  |       | 0.5 Hz: BMC running on DRAM with interrupt enabled (system in standby mode)
  |       | 0.1 Hz: BMC abnormal mode, some interrupts are not serviced for over 2 seconds
  |       | Steady green light: System operational
2 | Amber | “Attention” or “Service action required”
3 | White | “Here I am”, “This is the item being sought” or “Unit ID”
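The green LED states in Table 2.1 can be kept as a simple lookup when writing up observations during bring-up. The sketch below is a convenience only: the names are illustrative, the frequencies are read here as blink rates, and a steady light is represented by None. It does not query the hardware in any way.

```python
# Frequencies (Hz) listed for the green LED (LED 1) in Table 2.1 and the state
# each one indicates. A steady green light is represented here by None.
GREEN_LED_STATES = {
    10.0: "BMC running on flash (instruction fetch from flash)",
    2.0: "BMC running on DRAM without interrupt enabled (instruction fetch from DRAM)",
    0.5: "BMC running on DRAM with interrupt enabled (system in standby mode)",
    0.1: "BMC abnormal mode, some interrupts are not serviced for over 2 seconds",
    None: "System operational",
}


def describe_green_led(blink_hz):
    """Return the Table 2.1 description for an observed green LED frequency."""
    try:
        return GREEN_LED_STATES[blink_hz]
    except KeyError:
        raise ValueError(f"{blink_hz!r} Hz is not listed in Table 2.1") from None


if __name__ == "__main__":
    print(describe_green_led(0.5))
    print(describe_green_led(None))
```

An exact-match lookup is used deliberately: the table gives no tolerance bands, so an observed rate that is not listed is reported as an error rather than guessed at.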
2.3 Server
The default configuration of each IPU-POD64 uses a single PowerEdge R6525 server but up to four servers can be
used. Contact Graphcore sales for details of other supported server types. This document describes the default
server (PowerEdge R6525) installation only. Other servers may have different installation requirements.
The default server configuration is described in Section 4.1, Server configuration.
Since there is at least one server per IPU-POD64 there will be a minimum of two servers in the IPU-POD128.
2.4 Switches
Each IPU-POD64 contains two network switches serving different purposes.
2.4.1 100GE RoCE/RDMA switch (ToR switch)
The 100GbE RoCE/RDMA switch (also referred to as the ToR switch) is used by the end user’s machine learning
(ML) jobs as a data-plane, connecting the host servers running the Poplar® SDK with the IPUs running the ML
model in the IPU-M2000s. The default ToR switch is an Arista DCS-7060CX-32S-F. Contact Graphcore sales for
details of other supported switch types. This document describes the default switch (7060CX) installation only.
Other switches may have different installation requirements.
2.4.2 1GE management switch
The 1GbE management switch is used for connecting the management ports together inside the rack. The default
management switch is an Arista DCS-7010T-48-F. Contact Graphcore sales for details of other supported switch
types. This document describes the default switch (7010T) installation only. Other switches may have different
installation requirements.
2.5 Power distribution units
Two power distribution units (PDUs) are installed in each IPU-POD64. The default unit is an APC AP8886.
2.6 Rack
The IPU-M2000s, servers, switches, and PDUs for each of the two IPU-POD64 racks are installed in an APC
AR3300SP rack. This rack has a packing system designed to safely transport and unload the rack.
It is important to follow the instructions carefully when packing or unpacking the rack.
2.7 Supplementary mounting components
The supplementary components listed below also need to be installed in each rack.
•Cable organizer
•Blanking panel
2.8 Cables
Each of the two IPU-POD64 racks has three types of cabling:
1) RJ45 cables
2) OSFP cables
3) QSFP cables
There are also GW-Link cables between the two IPU-POD64 racks.
2.8.1 RJ45 cables
•Red: IPU-M2000 to IPU-M2000 within-rack Sync-Link connectivity
•Blue: Connecting IPU-M2000s to the management switch (BMC + IPU-Gateway management)
•Blue: Connecting servers to the management switch
•Yellow: Connecting IPU-M2000s to the management switch (BMC only management)
2.8.2 OSFP cables
•IPU-M2000 to IPU-M2000 (IPU-Link) connectivity
2.8.3 QSFP cables
•IPU-M2000 to ToR switch connectivity
•Server to ToR switch connectivity
All IPU-POD64 cable connections are described in Section 3, IPU-POD64 rack assembly.
2.9 Connecting cables between IPU-POD64 logical racks
The GW-Link connecting cables can be either electrical (copper) or optical Ethernet cables; electrical cables can
be used when the two IPU-POD64 racks are installed next to each other, and optical cables can be used when
they are installed further apart. This cabling is described in Section 7, IPU-POD128 installation.

CHAPTER
THREE
IPU-POD64 RACK ASSEMBLY
Follow the instructions in this section for each of the two IPU-POD64 logical racks that make up
the IPU-POD128.
Note the correct orientation of the IPU-M2000, server and switch units in the rack to ensure correct airflow.
The front interface of the IPU-M2000 units (connectivity ports) should be matched with the front door of the rack
(cold aisle). The rear interface of the server and switches (power and fans) should be matched with the rear door
of the rack (hot aisle).
Fig. 3.1: Completed rack: cold aisle (four-server version)
Note that Fig. 3.1 shows a four-server version of the IPU-POD64. The default reference design has one server
which would be the server in the lowest position, closest to the switches.
Fig. 3.2: Completed rack: hot aisle (four-server version)
Note that Fig. 3.2 shows three blue RJ45 cables in each R6525 server. In the default build, servers 2 to 4 only
have two blue RJ45 cables. See Section 3.6.7, Dell R6525 server(s) cabling for more information about server
cabling.
Note also that Fig. 3.2 shows a four-server version of the IPU-POD64. The default reference design has one server
which would be the server in the lowest position, closest to the switches.
3.1 Equipment checklist
Table 3.1: Equipment checklist
Description | Quantity (1 server) | Quantity (4 server)
Rack (AR3300SP) | 1 | 1
Blanking panels (APC AR8136BLK) | 23 pieces for 42U reference rack (delivered in packs of 10) | 20 pieces for 42U reference rack (delivered in packs of 10)
AP8886 PDU | 2 | 2
Hardware mounting kit APC AR8100 | 1 | 1
PDU bracket kit APC AR7711 | 2 | 2
Graphcore IPU-M2000 | 16 | 16
IPU-M2000 slider kits | 16 | 16
Dell R6525 server | 1 | 4
Arista DCS-7010T-48-F switch | 1 | 1
Arista DCS-7060CX-32S-F switch | 1 | 1
2m purple Ethernet | 2 | 2
1.5m blue Ethernet | 11 | 17
1m blue Ethernet | 9 | 9
1m yellow Ethernet | 12 | 12
1.5m yellow Ethernet | 4 | 4
1m red Ethernet | 2 | 2
0.15m red Ethernet | 30 | 30
1m QSFP28 | 8 | 8
1.5m QSFP28 | 10 | 16
0.3m OSFP | 60 | 60
1m OSFP | 4 | 4
0.5m red 10A C14 to C15 | 12 | 12
1m red 10A C14 to C15 | 4 | 4
0.5m blue 10A C14 to C15 | 12 | 12
1m blue 10A C14 to C15 | 4 | 4
1m red C13 to C14 | 2 | 5
1.5m red C13 to C14 | 1 | 1
1m blue C13 to C14 | 2 | 5
1.5m blue C13 to C14 | 1 | 1
Velcro | 1 | 1
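Because an IPU-POD128 is built from two IPU-POD64 logical racks, the per-rack quantities in Table 3.1 need to be doubled when ordering. The sketch below does this for a subset of rows. The names are illustrative, it assumes both racks use the same server count, and it covers only the one-server and four-server columns that the table provides. The GW-Link cables that connect the two racks are not part of Table 3.1 and are not counted here.

```python
# (quantity for 1 server, quantity for 4 servers) per IPU-POD64 rack,
# copied from a subset of the rows in Table 3.1.
CHECKLIST = {
    "Graphcore IPU-M2000": (16, 16),
    "Dell R6525 server": (1, 4),
    "1.5m blue Ethernet": (11, 17),
    "1.5m QSFP28": (10, 16),
    "0.3m OSFP": (60, 60),
    "1m red C13 to C14": (2, 5),
}
RACKS_PER_IPU_POD128 = 2
_COLUMN_FOR_SERVER_COUNT = {1: 0, 4: 1}


def pod128_quantities(servers_per_rack):
    """Return per-item totals for two identically configured racks."""
    if servers_per_rack not in _COLUMN_FOR_SERVER_COUNT:
        # Table 3.1 only lists the 1-server and 4-server builds.
        raise ValueError("Table 3.1 only covers 1-server and 4-server racks")
    column = _COLUMN_FOR_SERVER_COUNT[servers_per_rack]
    return {
        item: RACKS_PER_IPU_POD128 * quantities[column]
        for item, quantities in CHECKLIST.items()
    }


if __name__ == "__main__":
    for item, quantity in pod128_quantities(1).items():
        print(f"{item}: {quantity}")
```

Two-server and three-server builds are rejected rather than interpolated, since the table does not give quantities for them.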
3.2 Document reproduction
Ideally you should reference this document from a tablet device to allow you to zoom in on photographs. If you
reproduce this document on paper it should be done in colour otherwise you will not be able to see cable colours
and other notations properly.
3.3 Required tools
You will need the following tools and equipment:
•Pallet truck with wide fork spacing (at least 32 cm between forks) and a safe working load (SWL) of at least 750 kg
•Knife suitable for cutting plastic tape
•No 2 Phillips screwdriver
•Torx TX30 screwdriver
•Scissors suitable for cutting Velcro
If you will be re-packaging the rack after building it then you will also need:
•Pallet banding tape, banding clips and banding crimping tool
•Packaging tape
3.4 Preparing the rack
3.4.1 Rail distance
The IPU-M2000 mounting system requires a rail-to-rail distance of 720mm. This document describes the adjustments
required for an AR3300SP rack. If using a different rack this rail distance must be observed.
3.4.2 Unpacking the rack
Follow the instructions to remove the outer packaging of the APC AR3300SP rack, ensuring that you safely store
these materials for later repackaging. Do not remove the rack from the shock pallet. Remove the white bag (Fig.
3.3) from the rack. This contains screws and cage nuts to be used in the assembly of the components into the
rack.
Fig. 3.3: White bag with rack
3.4.3 Removing the side panels and doors
Remove the front and rear doors from the rack. Ensure the earth straps (Fig. 3.4) are disconnected before the
doors are removed.
Fig. 3.4: Rack earth straps
Remove the top and bottom side panels. The vertical accessory channels should be positioned at the very front
and very rear of the rack. If necessary, move these from their shipping positions (Fig. 3.5).
Fig. 3.5: Top and bottom side panel removal
3.4.4 Removing the vertical accessory channels
Using a Torx TX30 screwdriver, remove two accessory channels from the rack (Fig. 3.6).
Fig. 3.6: Accessory channel removal
3.4.5 Adjusting the rear accessory channels
Set the rear accessory channel to the furthest position in the rack. Tighten up the screws ensuring the teeth
engage into the slots in the rail, as shown in Fig. 3.7.
Fig. 3.7: Rear accessory channel (ensure teeth engage in slots)
3.4.6 Adjusting the rear vertical rails
Using a Torx TX30 screwdriver, make both rear vertical rack rails loose and freely movable.
Position the rear vertical rack rails such that there is 20mm of distance between the rear face of the vertical rack
rail and the rack's rear frame. This should result in a square symbol being visible through the alignment window at
the top and bottom of the rail, as shown in Fig. 3.8.
Fig. 3.8: Rear vertical rails
Secure the rail into position by moving the TX30 screws back upwards such that the teeth engage with both the
supporting rails. This must be done at the top and bottom of the bracket.
3.4.7 Adjusting the front vertical rails
Using a Torx TX30 screwdriver, make both front vertical rack rails loose and freely movable. Install the accessory
channels in the front of the rack (one on the left hand side, one on the right hand side) at the frontmost position
possible, moving the TX30 screws back upwards such that the teeth engage with both the supporting rails. This
must be done at the top and bottom of the bracket.
Note: To ensure the clips on the accessory channels align with the channel in the rack, lift the accessory channels
through the cut-out in the top of the rack and then drop them down onto the channels.
Move the vertical rack rails tight against the vertical cable organisers such that only a single diamond symbol is
visible through the alignment window at the top and bottom of the rail, see Fig. 3.9.
Fig. 3.9: Front vertical rail alignment
Secure the rail into position by moving the TX30 screws back upwards such that the teeth engage with both the
supporting rails. This must be done at the top and bottom of the bracket.
3.4.8 Installing the rack rails
Fig. 3.10: M2000 rack rail kit: unboxed
The IPU-M2000 rail kit comprises two mated inner and outer rack rails and an accessory bag containing screws.
The inner rail affixes to the body of the IPU-M2000 and the outer rail affixes to the vertical rack rails in the server
cabinet. Firstly, separate the mated inner and outer rails:
1) Fully extend the rails by pulling on the end which has the captive thumb screw attached (Fig. 3.11):