Nortel DMS-100 Series User manual

Critical Release Notice

Publication number: 297-8021-545

Publication release: Standard 14.03

The content of this customer NTP supports the

SN09 software release.

Bookmarks used in this NTP highlight the changes between the NA015 baseline and the current

release. The bookmarks provided are color-coded to identify release-specific content changes. NTP

volumes that do not contain bookmarks indicate that the NA015 baseline remains unchanged and is

valid for the current release.

Bookmark Color Legend

Orange: Applies to new or modified content for SN09 that is valid through the current

release.

Attention!

Adobe Acrobat Reader™ 5.0 or higher is required to view bookmarks in color.

Publication History

January 2006

Standard release 14.03 for software release SN09. Updates made for this release are shown

below:

Changed procedure: Emergency power conservation shutdown. The procedure now shows that the XPM

unit to be powered down must match the ENET/JNET plane number that is to be powered

down.

297-8021-545

DMS-100 Family

North American DMS-100

Recovery Procedures

LET0015 and up Standard 14.02 May 2001

Publication number: 297-8021-545

Product release: LET0015 and up

Document release: Standard 14.02

Date: May 2001

Printed in the United States of America

NORTEL NETWORKS CONFIDENTIAL: The information contained herein is the property of Nortel Networks and is

strictly confidential. Except as expressly authorized in writing by Nortel Networks, the holder shall keep all information contained

herein confidential, shall disclose the information only to its employees with a need to know, and shall protect the information, in

whole or in part, from disclosure and dissemination to third parties with the same degree of care it uses to protect its own

confidential information, but with no less than reasonable care. Except as expressly authorized in writing by Nortel Networks, the

holder is granted no rights to use the information contained herein.

Information is subject to change without notice. Nortel Networks reserves the right to make changes in design or components as

progress in engineering and manufacturing may warrant. Changes or modification to the DMS-100 without the express consent of

Nortel Networks may void its warranty and void the user’s authority to operate the equipment.

Nortel Networks, the Nortel Networks logo, the Globemark, How the World Shares Ideas, Unified Networks, DMS, DMS-100,

Helmsman, MAP, Meridian, Nortel, Northern Telecom, NT, SuperNode, and TOPS are trademarks of Nortel Networks.

DMS-100 Family NA100 Recovery Procedures LET0015 and up

Contents

Recovery Procedures

About this document

How to check the version and issue of this document

References in this document

What precautionary messages mean

How commands, parameters, and responses are represented

Input prompt (>)

Commands and fixed parameters

Variables

Responses

1 System recovery controller

About the system recovery controller

SRC functions

Required SRC conditions for PM recovery

SRC triggers

SRC dependency manager

SRC group manager

Limit of concurrent load activities

Example of automatic broadcast loading

Automatic single loading

SRC recovery methods

Series I PMs that the SRC automatically recovers

Series II PMs that the SRC automatically recovers

Series II XMS-based PMs that the SRC automatically recovers

Series III LPP-based PMs that the SRC automatically recovers

How to monitor SRC operation

MAP terminal displays

Log reports

Failure of SRC to recover a PM

Manual override of the SRC

2 System level recovery procedures

Introduction to system level recovery procedures

Explanatory and context-setting information

Summary flowchart

Step-action instructions

297-8021-545 Standard 14.02 May 2001

Booting a DMS switch

DMS-Spectrum Peripheral Module recovery process

Performing a cold restart

Performing a reload-restart

Performing a warm restart

Recovering a composite clock

Recovering from a dead system in a SuperNode switch

Recovering from a dead system in a SuperNode SE switch

3 Node level recovery procedures

Introduction to node level recovery procedures

Explanatory and context-setting information

Summary flowchart

Step-action instructions

Recovering the enhanced network

Recovering link peripheral processors

Recovering SuperNode SE application specific units

4 Recovery procedures for individual devices and services

Introduction to recovery procedures for individual devices and services

Explanatory and context-setting information

Summary flowchart

Step-action instructions

Checking for call completion

Checking for message throughput

MP position (integrated) recovery

MP position (standalone) recovery

PM TPC recovery

Recovering AMA data with block numbers

Recovering AMA data without DIRP block numbers

Recovering CCS7 linksets

Recovering CompuCALL

Recovering data from a disk to tape

Recovering a dead DIRP utility

Recovering enhanced link peripheral processors

Recovering a stuck HLIU or HSLR

Recovering a stuck HLIU under a composite clock failure

Recovering a stuck LIU7

Recovering volumes marked INERROR

5 Emergency power conservation recovery procedures

Introduction to emergency power conservation recovery procedures

Explanatory and context-setting information

Summary flowchart

Step-action instructions

Restoration

Restoring the CM to duplex operation in SuperNode

Restoring the CM to duplex operation in SuperNode SE

Restoring the ELPP LIM to duplex operation

Restoring the junctored network to duplex operation

Restoring the LCMs to duplex operation

Restoring the LGCs, LTCs, and DTCs to duplex operation

Restoring the line modules to duplex operation

Restoring the LPP LIM to duplex operation

Restoring the maintenance trunk modules to service

Restoring the MS to duplex operation

Restoring the MSB7 to duplex operation

Restoring the remote oscillator shelf to duplex operation

Restoring a SuperNode ENET to duplex operation

Restoring a SuperNode SE ENET to duplex operation

Shutdown

Emergency shutdown of DMS system

Emergency shutdown of maintenance trunk modules

Emergency shutdown of one DMS SuperNode CM plane

Emergency shutdown of one DMS SuperNode MS plane

Emergency shutdown of one enhanced network plane

Emergency shutdown of one half of a line module pair

Emergency shutdown of one junctored network plane

Emergency shutdown of one LGC, LTC, and DTC unit

Emergency shutdown of one LIM unit on each ELPP

Emergency shutdown of one LIM unit on each LPP

Emergency shutdown of one MS plane

Emergency shutdown of one remote oscillator shelf plane

Emergency shutdown of one SuperNode SE CM plane

Emergency shutdown of one SuperNode SE MS plane

Emergency shutdown of one unit of LCMs

Emergency shutdown of one unit of MSB7s

About this document

How to check the version and issue of this document

The version and issue of the document are indicated by numbers, for example,

01.01.

The first two digits indicate the version. The version number increases each

time the document is updated to support a new software release. For example,

the first release of a document is 01.01. In the next software release cycle, the

first release of the same document is 02.01.

The second two digits indicate the issue. The issue number increases each

time the document is revised but rereleased in the same software release cycle.

For example, the second release of a document in the same software release

cycle is 01.02.
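
As a minimal sketch of this numbering scheme, the following Python fragment splits a release string into its version and issue. The function name `parse_release` is hypothetical and is not part of any Nortel tool; it only illustrates how the two pairs of digits are read.

```python
from typing import Tuple


def parse_release(release: str) -> Tuple[int, int]:
    """Split a document release such as '01.02' into (version, issue)."""
    version_text, issue_text = release.strip().split(".")
    if len(version_text) != 2 or len(issue_text) != 2:
        raise ValueError(f"expected the form NN.NN, got {release!r}")
    return int(version_text), int(issue_text)


# Second release of a document within its first software release cycle.
print(parse_release("01.02"))  # prints (1, 2)
```

Under the same reading, release 14.02 of this document is issue 2 of version 14.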

To determine which version of this document applies to the software in your

office and how documentation for your product is organized, check the release

information in Product Documentation Directory, 297-8991-001.

References in this document

The following documents are referred to in this document:

• Alarm Clearing and Performance Monitoring Procedures

• Bellcore Format Automatic Message Accounting Reference Guide, 297-1001-830

• Card Replacement Procedures

• DMS-100 Family Commands Reference Manual, 297-1001-822

• Log Report Reference Manual

• Magnetic Tape Reference Manual, 297-1001-118

• Operational Measurements Reference Manual

• Routine Maintenance Procedures

As of NA0011 (LEC and LET) and EUR010 (EUR) releases, any references

to the data schema section of the Translations Guide will be mapped to the

Customer Data Schema Reference Manual.

What precautionary messages mean

The types of precautionary messages used in Nortel Networks documents

include attention boxes and danger, warning, and caution messages.

An attention box identifies information that is necessary for the proper

performance of a procedure or task or the correct interpretation of information

or data. Danger, warning, and caution messages indicate possible risks.

Examples of the precautionary messages follow.

ATTENTION - Information needed to perform a task

ATTENTION
If the unused DS-3 ports are not deprovisioned before a DS-1/VT Mapper is installed, the DS-1 traffic will not be carried through the DS-1/VT Mapper, even though the DS-1/VT Mapper is properly provisioned.

DANGER - Possibility of personal injury

DANGER
Risk of electrocution
Do not open the front panel of the inverter unless fuses F1, F2, and F3 have been removed. The inverter contains high-voltage lines. Until the fuses are removed, the high-voltage lines are active, and you risk being electrocuted.

WARNING - Possibility of equipment damage

WARNING
Damage to the backplane connector pins
Align the card before seating it, to avoid bending the backplane connector pins. Use light thumb pressure to align the card with the connectors. Next, use the levers on the card to seat the card into the connectors.

CAUTION - Possibility of service interruption or degradation

CAUTION
Possible loss of service
Before continuing, confirm that you are removing the card from the inactive unit of the peripheral module. Subscriber service will be lost if you remove a card from the active unit.

How commands, parameters, and responses are represented

Commands, parameters, and responses in this document conform to the following conventions.

Input prompt (>)

An input prompt (>) indicates that the information that follows is a command:

>BSY

Commands and fixed parameters

Commands and fixed parameters that are entered at a MAP terminal are shown in uppercase letters:

>BSY CTRL

Variables

Variables are shown in lowercase letters:

>BSY CTRL ctrl_no

The letters or numbers that the variable represents must be entered. Each variable is explained in a list that follows the command string.

Responses

Responses correspond to the MAP display and are shown in a different type:

FP 3 Busy CTRL 0: Command request has been submitted.

FP 3 Busy CTRL 0: Command passed.

1 System recovery controller

This chapter describes the operation of the system recovery controller (SRC).

The subsequent sections explain the operation of the SRC as follows:

About the system recovery controller on this page describes the SRC

functions, triggers, and dependencies.

SRC recovery methods in this document describes how the SRC recovers PMs,

and lists the PMs that the SRC recovers.

How to monitor SRC operation in this document describes the responses on the

MAP (maintenance and administration position) display when the SRC is

recovering a node. This section also describes the logs generated during node

recovery, SRC failure, and how to manually override the SRC.

About the system recovery controller

The SRC coordinates recovery activities in a DMS switch. The SRC

optimizes recovery through the correct use of resources and automatic

operation.

The SRC coordinates the recovery of nodes in the DMS switch so that when

one node is dependent on another for operation, the node that is depended

upon must be in service before a recovery attempt is made on the dependent

node. As it progresses through the dependency hierarchy, the SRC schedules

recovery activities to run at appropriate times, thereby reducing the length of

outages.

The SRC makes several attempts to recover a node. During each recovery

attempt, the SRC performs a more detailed analysis. If necessary, the SRC

reloads a node's software and returns the node to service as part of a full

recovery process. When the SRC reloads a node, removal of the node from

service occurs for a period of time, so the SRC only reloads nodes when

required.

SRC functions

The SRC coordinates the recovery activities of different subsystems outside

the DMS-core, also referred to as the computing module (CM).

The subsystems include the following:

• the message switch (MS)

• network (JNET or ENET)

• series I, II, and III peripheral modules (PM).

Figure 1-1, "System recovery controller," shows how the

SRC interfaces with the DMS-core and with the subsystems.

The SRC performs the following functions:

• The dependency manager of the SRC enforces inter-subsystem

dependencies. Before the SRC recovers a node, the subsystems

that the node depends on must be operating.

• The group manager groups nodes for broadcast loading in conditions

where the process applies. The SRC sends common commands to a group

of nodes at the same time, instead of one after another.

• The concurrent activity manager balances the amount of recovery work

against other activities that occur on the switch. The SRC attempts to

recover as many critical subsystems as the CM allows.

• The SRC initiates recovery applications and monitors each step in the

application to make sure that the application ends quickly.

The SRC coordinates two separate activities for series II XMS-based PMs

(XPM) and line concentrating modules (LCM):

• system recovery of PM nodes after core restart or core switch of activity

through the use of the dependency manager

Note: System recovery of DLMs and IPEs does not always occur on

core switch of activity.

• loading of PM units after system maintenance detects a load loss through

the use of the group manager

For LCMs, an audit verifies the node status of each LCM unit before the

execution of the recovery activity. If both units are SysB, the audit executes

and forces the units into service. If one unit is SysB, an evaluation of the fault

occurs and the SRC attempts a recovery. A recovery attempt occurs a

maximum of three times in 1-min intervals.
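
The LCM audit described above can be sketched as follows. This is an illustration under stated assumptions, not the switch software: the function `audit_lcm`, the `try_recover` callback, and the returned strings are hypothetical, and only the SysB label, the limit of three attempts, and the 1-min interval come from the text.

```python
import time
from typing import Callable, Sequence

MAX_ATTEMPTS = 3        # from the text: at most three recovery attempts
RETRY_INTERVAL_S = 60   # from the text: attempts are 1 min apart


def audit_lcm(unit_states: Sequence[str],
              try_recover: Callable[[int], bool],
              wait: Callable[[float], None] = time.sleep) -> str:
    """Return a description of the action taken for one LCM.

    unit_states holds one state label per unit. "SysB" is the label used in
    the text; any other label is treated as "not system busy".
    try_recover is a hypothetical callback that attempts recovery of the
    given unit number and returns True on success.
    """
    sysb_units = [n for n, state in enumerate(unit_states) if state == "SysB"]
    if not sysb_units:
        return "no action"
    if len(sysb_units) == len(unit_states):
        # Both units are SysB: the audit forces the units into service.
        return "force units into service"
    unit = sysb_units[0]
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if try_recover(unit):
            return f"unit {unit} recovered on attempt {attempt}"
        if attempt < MAX_ATTEMPTS:
            wait(RETRY_INTERVAL_S)
    # The text does not say what follows a third failure; this is a placeholder.
    return f"unit {unit} not recovered"
```

Passing a no-op function as `wait` lets the logic be exercised without real one-minute delays.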

One connection is present between the two activities that the SRC coordinates.

The connection is that PM maintenance initiated through the dependency

manager can lead to loading of one or more PM units.

Figure 1-1 System recovery controller

[Block diagram not reproduced. Labels in the diagram: DMS core, switch operating system, system recovery controller, dependency manager, concurrent activity manager, group manager, database, subsystems, message switch maintenance, network maintenance, series I PM maintenance, series II PM maintenance, series III PM maintenance.]

Required SRC conditions for PM recovery

The PM recovery that the SRC coordinates requires the following conditions:

• all equipment must have power

• for automatic broadcast loading, series II XPMs must have NT6X45BA or

newer processor cards installed

Note: Series II XPMs with pre-NT6X45BA control cards are loaded

one by one instead of in groups for broadcast loading.

• all PM load names (including series I PM load names) must be entered in

table PMLOADS

SRC triggers

The following events trigger the SRC to query and, if necessary, begin

recovery of subsystems:

• warm restart of the core

• cold restart of the core

• reload-restart of the core

• loss of load in a PM

• manual RESTART SWACT, ABORT SWACT, or NORESTART SWACT

of the core

Additional SRC triggers to load series II XMS-based PMs again

There are four additional triggers for the SRC to reload series II XPMs:

• the XPM reports a memory parity error during a periodic audit by the

switch operating system

• the ROM/RAM query step in the series II XPM return-to-service task

detects a loss of load

• the initialization of the series II XPM during a return-to-service task fails

two consecutive times. This failure indicates a problem with the software

load

• the ROM/RAM query step in the series II XPM system busy task detects a

load loss

Core restarts

During a restart, the switch operating system initializes again. Reinitialization

restores the operating system software and the subsystems outside the

DMS-core to a known, steady state.

A system restart includes initialization of the modules in the CM, MS,

network, and PMs. A system restart also includes the restoration of services.

The period of a restart is the time taken to recover the whole system to the point

that all services are available again.

The symbol A1 flashes on the reset terminal interface (RTIF) when

initialization of the software in the CM is complete. The recovery for the rest

of the system starts after the flashing A1 appears.

The following list describes how each type of restart affects calls in progress

and billing data:

• A warm restart of the core is the least severe type of restart. Audits of XPMs

occur. The XPMs remain in service during a warm restart. Calls in

progress that reached the talking state continue. Any calls that did not

reach the talking state are disconnected. Any calls that disconnect during

the restart are disconnected after the restart is complete and the system

records billing data.

• A cold restart of the core is more severe than a warm restart. Audits of

XPMs occur. The XPMs remain in service during a cold restart. Calls that

reach the talking state retain the connections during the restarts. The calls

can disconnect if their connections are used again by new calls after the

restart. There is no record made of calls in progress during a cold restart

and no billing data is recorded for these calls. A manual cold restart occurs

on DTCs while the equipment is in service. This manual cold restart

means that all calls are dropped, but the two units are removed from service

one at a time. This process minimizes the length of the XPM outage.

• A reload-restart of the core is the most severe restart. During a

reload-restart, all PMs initialize again. All calls in progress are dropped.

Loss of billing data for the dropped calls occurs.
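
The effects listed above can be summarized in a small lookup table. The sketch below is only a restatement of the list in Python; the class name, field names, and dictionary keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestartEffect:
    pms_reinitialized: bool   # do all PMs initialize again?
    talking_calls_kept: bool  # do calls in the talking state keep their connections?
    billing_recorded: bool    # is billing data recorded for calls in progress?


RESTART_EFFECTS = {
    "warm": RestartEffect(pms_reinitialized=False, talking_calls_kept=True,
                          billing_recorded=True),
    # Cold: connections are kept during the restart, but can be lost afterwards
    # if new calls reuse them.
    "cold": RestartEffect(pms_reinitialized=False, talking_calls_kept=True,
                          billing_recorded=False),
    "reload": RestartEffect(pms_reinitialized=True, talking_calls_kept=False,
                            billing_recorded=False),
}


def restarts_that_keep_billing() -> list:
    """Names of the restart types for which billing data is recorded."""
    return [name for name, effect in RESTART_EFFECTS.items()
            if effect.billing_recorded]


print(restarts_that_keep_billing())  # prints ['warm']
```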

Loss of load in a PM

The removal of a card loaded with software causes a loss of load in a PM. The

interruption of power to a card loaded with software also causes a loss of load

in a PM. A PM becomes system busy when a loss of load occurs. The SRC

begins recovery when system maintenance detects a loss of load.

Manual commands

The SRC initializes PMs again after the use of one of the following manual

commands during an upgrade of BCS software:

• RESTART SWACT

• ABORT SWACT

• NORESTART SWACT

SRC dependency manager

For some recovery actions on objects to occur, other objects must be in a given

state to support the action. The dependency manager of the SRC uses the set

of dependencies that applies to the type of restart to manage object

dependencies. The SRC dependency manager prevents failure caused by

early starts. The SRC dependency manager also reduces recovery times.

An object is any entity in the DMS switch. An object can be:

• physical (for example, an ENET plane, an XPM, an IPML, or a set of lines)

• a service (for example, line trunk server [LTS] call processing)

• software (for example, an entry code)

• an event (for example, the initialization of core software)

Management of dependencies

The action on the dependent object must not proceed until the object depended

on is in the required state. The dependency manager makes sure that the

requirements for an action on an object are satisfied before the action

proceeds.

Dependencies are specified for each action for each object. Examples of

dependencies in DMS include

• one part of the software that must initialize before another

• ordered initialization of nodes to make sure that paths to the nodes are in

service before a recovery attempt of a node occurs

• data that must download to a node after other nodes return to service

• the recovery of a service in one node after the recovery of other parts of the

service in other nodes

A dependency can apply to one type of recovery but not to

another. For example, an action can have different dependencies in different

restart types. The SRC provides the applications with the means of indicating

which dependencies are applicable.
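
One way to picture dependency management is as an ordering problem over a dependency graph. The sketch below is an illustration only, not the SRC implementation: the dependency sets, the object names in them, and the function `recovery_order` are assumptions made for the example. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to produce an order in which no action starts before the objects it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency sets, one per restart type. Each object maps to the
# objects that must be in the required state before the action on it proceeds.
DEPENDENCIES = {
    "reload": {"MS": {"CM"}, "ENET": {"MS"}, "XPM": {"ENET"}, "LCM": {"XPM"}},
    "warm": {"MS": {"CM"}, "ENET": {"MS"}},
}


def recovery_order(restart_type: str) -> list:
    """Return one order of actions that never starts an object early."""
    return list(TopologicalSorter(DEPENDENCIES[restart_type]).static_order())


print(recovery_order("reload"))  # prints ['CM', 'MS', 'ENET', 'XPM', 'LCM']
```

Keeping a separate dependency set per restart type mirrors the statement that an action can have different dependencies in different restart types.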

SRC group manager

PMs can be grouped for bulk maintenance action. The group manager

coordinates the PM groups. The group manager designates one PM as the

"seed" PM in a group. The CM sends messages to the seed PM. The seed PM

forwards the messages to the other PMs in the group.

Note: Series I PMs do not support broadcast loading. The group manager

does not group the series I PMs.

The group manager uses several standards when it groups PMs for a bulk

action. For example, when the SRC broadcast loads to nodes, the group

manager can use the following:

• a group of PMs with the same node type

• a group of PMs with the same load file name

• a group of PMs with the same loading method

How to group series II XMS-based PMs

The group manager uses the following standards when it groups series II

XPMs together for broadcast loading:

• the load file name

• the CMR (class modem resource) file name

• the presence of NT6X45BA or higher controller cards

For example, two XPMs can have the same load file name and have

NT6X45BA controller cards, but have different CMR file names. The group

manager puts these two XPMs in different groups.

The XPMs that cannot be in the same group with other XPM units for

broadcast loading are single-loaded. The XPMs are single-loaded when the

XPM units do not have the hardware to support broadcast loading. The XPMs

are also single-loaded when the XPMs cannot be in a group with other units

during dynamic grouping. The group manager only groups the XPMs that

have NT6X45BA or higher controller cards. The XPMs that do not have

NT6X45BA or higher controller cards are not in the same group as other

XPMs. This condition occurs even if the other XPMs have the same load files.

The SRC continues to coordinate single loading for purposes of concurrency

management.
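
The grouping rules for series II XPMs can be sketched as follows. This is a simplified illustration, not the group manager itself: the `Xpm` record, the function `plan_loading`, and the sample PM, load file, and CMR file names are all hypothetical. XPMs without NT6X45BA or higher controller cards, and XPMs that end up alone in a group, are single-loaded.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Xpm(NamedTuple):
    name: str
    load_file: str
    cmr_file: str
    has_6x45ba_or_higher: bool


def plan_loading(xpms: List[Xpm]) -> Tuple[List[List[str]], List[str]]:
    """Return (broadcast groups, single-loaded XPMs) as lists of names."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    single: List[str] = []
    for xpm in xpms:
        if xpm.has_6x45ba_or_higher:
            # Same load file name and same CMR file name: same group.
            groups[(xpm.load_file, xpm.cmr_file)].append(xpm.name)
        else:
            # No hardware support for broadcast loading.
            single.append(xpm.name)
    broadcast: List[List[str]] = []
    for members in groups.values():
        if len(members) > 1:
            broadcast.append(members)
        else:
            single.extend(members)  # nothing to share a broadcast with
    return broadcast, single


xpms = [
    Xpm("LTC 0", "LOAD_A", "CMR_X", True),
    Xpm("LTC 1", "LOAD_A", "CMR_X", True),
    Xpm("LTC 2", "LOAD_A", "CMR_Y", True),   # different CMR file name
    Xpm("LTC 3", "LOAD_A", "CMR_X", False),  # older controller card
]
print(plan_loading(xpms))  # prints ([['LTC 0', 'LTC 1']], ['LTC 3', 'LTC 2'])
```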

Static and dynamic groups

You can identify PMs in the same group from datafill. The datafill specifies

the load file names and hardware configurations of the PMs. The system

maintains these static groups automatically over time as the dataﬁll changes.

During recovery, the SRC forms dynamic groups from the subgroups based on

which elements require recovery and availability of resources to perform the

recovery.

Broadcast loading

Broadcast loading is a bulk action. Broadcast loading can operate on more

than one PM at the same time. To save time, the SRC performs an action on a

group of PMs instead of many separate PMs.

Automatic broadcast loading sends a request to load software to several PMs

at the same time.
