Nortel DMS-100 Series User manual

Critical Release Notice
Publication number: 297-8021-545
Publication release: Standard 14.03
The content of this customer NTP supports the
SN09 software release.
Bookmarks used in this NTP highlight the changes between the NA015 baseline and the current
release. The bookmarks provided are color-coded to identify release-specific content changes. NTP
volumes that do not contain bookmarks indicate that the NA015 baseline remains unchanged and is
valid for the current release.
Bookmark Color Legend
Orange: Applies to new or modified content for SN09 that is valid through the current
release.
Attention!
Adobe® Acrobat® Reader™ 5.0 or higher is required to view bookmarks in color.

Publication History
January 2006
Standard release 14.03 for software release SN09. Updates made for this release are shown
below:
Changed procedure: Emergency power conservation shutdown. The procedure now shows that the
XPM unit to be powered down must match the ENET/JNET plane number that is to be powered
down.

297-8021-545
DMS-100 Family
North American DMS-100
Recovery Procedures
LET0015 and up Standard 14.02 May 2001


Publication number: 297-8021-545
Product release: LET0015 and up
Document release: Standard 14.02
Date: May 2001
Copyright © 1996-2001 Nortel Networks,
All Rights Reserved
Printed in the United States of America
NORTEL NETWORKS CONFIDENTIAL: The information contained herein is the property of Nortel Networks and is
strictly confidential. Except as expressly authorized in writing by Nortel Networks, the holder shall keep all information contained
herein confidential, shall disclose the information only to its employees with a need to know, and shall protect the information, in
whole or in part, from disclosure and dissemination to third parties with the same degree of care it uses to protect its own
confidential information, but with no less than reasonable care. Except as expressly authorized in writing by Nortel Networks, the
holder is granted no rights to use the information contained herein.
Information is subject to change without notice. Nortel Networks reserves the right to make changes in design or components as
progress in engineering and manufacturing may warrant. Changes or modifications to the DMS-100 without the express consent of
Nortel Networks may void its warranty and void the user’s authority to operate the equipment.
Nortel Networks, the Nortel Networks logo, the Globemark, How the World Shares Ideas, Unified Networks, DMS, DMS-100,
Helmsman, MAP, Meridian, Nortel, Northern Telecom, NT, SuperNode, and TOPS are trademarks of Nortel Networks.


DMS-100 Family NA100 Recovery Procedures LET0015 and up
Contents
Recovery Procedures
About this document
How to check the version and issue of this document
References in this document
What precautionary messages mean
How commands, parameters, and responses are represented
Input prompt (>)
Commands and fixed parameters
Variables
Responses
1 System recovery controller
About the system recovery controller
SRC functions
Required SRC conditions for PM recovery
SRC triggers
SRC dependency manager
SRC group manager
Limit of concurrent load activities
Example of automatic broadcast loading
Automatic single loading
SRC recovery methods
Series I PMs that the SRC automatically recovers
Series II PMs that the SRC automatically recovers
Series II XMS-based PMs that the SRC automatically recovers
Series III LPP-based PMs that the SRC automatically recovers
How to monitor SRC operation
MAP terminal displays
Log reports
Failure of SRC to recover a PM
Manual override of the SRC
2 System level recovery procedures
Introduction to system level recovery procedures
Explanatory and context-setting information
Summary flowchart
Step-action instructions

Booting a DMS switch
DMS-Spectrum Peripheral Module recovery process
Performing a cold restart
Performing a reload-restart
Performing a warm restart
Recovering a composite clock
Recovering from a dead system in a SuperNode switch
Recovering from a dead system in a SuperNode SE switch
3 Node level recovery procedures
Introduction to node level recovery procedures
Explanatory and context-setting information
Summary flowchart
Step-action instructions
Recovering the enhanced network
Recovering link peripheral processors
Recovering SuperNode SE application specific units
4 Recovery procedures for individual devices and services
Introduction to recovery procedures for individual devices and services
Explanatory and context-setting information
Summary flowchart
Step-action instructions
Checking for call completion
Checking for message throughput
MP position (integrated) recovery
MP position (standalone) recovery
PM TPC recovery
Recovering AMA data with block numbers
Recovering AMA data without DIRP block numbers
Recovering CCS7 linksets
Recovering CompuCALL
Recovering data from a disk to tape
Recovering a dead DIRP utility
Recovering enhanced link peripheral processors
Recovering a stuck HLIU or HSLR
Recovering a stuck HLIU under a composite clock failure
Recovering a stuck LIU7
Recovering volumes marked INERROR
5 Emergency power conservation recovery procedures
Introduction to emergency power conservation recovery procedures
Explanatory and context-setting information
Summary flowchart
Step-action instructions
Restoration
Restoring the CM to duplex operation in SuperNode
Restoring the CM to duplex operation in SuperNode SE
Restoring the ELPP LIM to duplex operation
Restoring the junctored network to duplex operation

Restoring the LCMs to duplex operation
Restoring the LGCs, LTCs, and DTCs to duplex operation
Restoring the line modules to duplex operation
Restoring the LPP LIM to duplex operation
Restoring the maintenance trunk modules to service
Restoring the MS to duplex operation
Restoring the MSB7 to duplex operation
Restoring the remote oscillator shelf to duplex operation
Restoring a SuperNode ENET to duplex operation
Restoring a SuperNode SE ENET to duplex operation
Shutdown
Emergency shutdown of DMS system
Emergency shutdown of maintenance trunk modules
Emergency shutdown of one DMS SuperNode CM plane
Emergency shutdown of one DMS SuperNode MS plane
Emergency shutdown of one enhanced network plane
Emergency shutdown of one half of a line module pair
Emergency shutdown of one junctored network plane
Emergency shutdown of one LGC, LTC, and DTC unit
Emergency shutdown of one LIM unit on each ELPP
Emergency shutdown of one LIM unit on each LPP
Emergency shutdown of one MS plane
Emergency shutdown of one remote oscillator shelf plane
Emergency shutdown of one SuperNode SE CM plane
Emergency shutdown of one SuperNode SE MS plane
Emergency shutdown of one unit of LCMs
Emergency shutdown of one unit of MSB7s

About this document
How to check the version and issue of this document
The version and issue of the document are indicated by numbers, for example,
01.01.
The first two digits indicate the version. The version number increases each
time the document is updated to support a new software release. For example,
the first release of a document is 01.01. In the next software release cycle, the
first release of the same document is 02.01.
The second two digits indicate the issue. The issue number increases each
time the document is revised but rereleased in the same software release cycle.
For example, the second release of a document in the same software release
cycle is 01.02.
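The numbering rule above can be expressed as a short Python sketch that parses a version.issue number and computes the next number for a rerelease in the same cycle or for the first release in a new software release cycle. The function names are hypothetical and are not part of any Nortel tool.

```python
def parse_release(number: str) -> tuple:
    """Split a document release number such as "14.02" into (version, issue)."""
    version_text, issue_text = number.split(".")
    return int(version_text), int(issue_text)


def format_release(version: int, issue: int) -> str:
    """Format a (version, issue) pair as two two-digit fields, for example "01.02"."""
    return f"{version:02d}.{issue:02d}"


def next_issue(number: str) -> str:
    """Document revised and rereleased in the same software release cycle."""
    version, issue = parse_release(number)
    return format_release(version, issue + 1)


def next_version(number: str) -> str:
    """First release of the document in the next software release cycle."""
    version, _ = parse_release(number)
    return format_release(version + 1, 1)


print(next_issue("01.01"))    # 01.02
print(next_version("01.01"))  # 02.01
```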
To determine which version of this document applies to the software in your
office and how documentation for your product is organized, check the release
information in Product Documentation Directory, 297-8991-001.
References in this document
The following documents are referred to in this document:
• Alarm Clearing and Performance Monitoring Procedures
• Bellcore Format Automatic Message Accounting Reference Guide, 297-1001-830
• Card Replacement Procedures
• DMS-100 Family Commands Reference Manual, 297-1001-822
• Log Report Reference Manual
• Magnetic Tape Reference Manual, 297-1001-118
• Operational Measurements Reference Manual
• Routine Maintenance Procedures

As of NA0011 (LEC and LET) and EUR010 (EUR) releases, any references
to the data schema section of the Translations Guide will be mapped to the
Customer Data Schema Reference Manual.
What precautionary messages mean
The types of precautionary messages used in Nortel Networks documents
include attention boxes and danger, warning, and caution messages.
An attention box identifies information that is necessary for the proper
performance of a procedure or task or the correct interpretation of information
or data. Danger, warning, and caution messages indicate possible risks.
Examples of the precautionary messages follow.
ATTENTION - Information needed to perform a task
ATTENTION
If the unused DS-3 ports are not deprovisioned before a DS-1/VT
Mapper is installed, the DS-1 traffic will not be carried through the
DS-1/VT Mapper, even though the DS-1/VT Mapper is properly
provisioned.
DANGER - Possibility of personal injury
DANGER
Risk of electrocution
Do not open the front panel of the inverter unless fuses F1,
F2, and F3 have been removed. The inverter contains
high-voltage lines. Until the fuses are removed, the
high-voltage lines are active, and you risk being
electrocuted.
WARNING - Possibility of equipment damage
WARNING
Damage to the backplane connector pins
Align the card before seating it to avoid bending the
backplane connector pins. Use light thumb pressure to
align the card with the connectors. Next, use the levers on
the card to seat the card into the connectors.
CAUTION - Possibility of service interruption or degradation
CAUTION
Possible loss of service
Before continuing, confirm that you are removing the card
from the inactive unit of the peripheral module.
Subscriber service will be lost if you remove a card from
the active unit.
How commands, parameters, and responses are represented
Commands, parameters, and responses in this document conform to the
following conventions.
Input prompt (>)
An input prompt (>) indicates that the information that follows is a command:
>BSY
Commands and fixed parameters
Commands and fixed parameters that are entered at a MAP terminal are shown
in uppercase letters:
>BSY CTRL
Variables
Variables are shown in lowercase letters:
>BSY CTRL ctrl_no
You must enter the letters or numbers that the variable represents. Each
variable is explained in a list that follows the command string.
Responses
Responses correspond to the MAP display and are shown in a different type:
FP 3 Busy CTRL 0: Command request has been submitted.
FP 3 Busy CTRL 0: Command passed.
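The notation for commands, fixed parameters, and variables can be illustrated with a small Python sketch that checks an entered command against a documented template such as `>BSY CTRL ctrl_no`. Uppercase template tokens are treated as fixed and lowercase tokens as variables. The function name is hypothetical, and the case-insensitive comparison of fixed tokens is an assumption of this sketch, not a statement about MAP behavior.

```python
from typing import Dict, Optional


def match_command(template: str, entered: str) -> Optional[Dict[str, str]]:
    """Match an entered command against a documented command template.

    Returns a dict that maps each variable name to the value entered,
    or None if the entered command does not fit the template.
    """
    # The ">" prompt marks a command; it is not part of the command itself.
    template_tokens = template.lstrip(">").split()
    entered_tokens = entered.lstrip(">").split()
    if len(template_tokens) != len(entered_tokens):
        return None
    bindings: Dict[str, str] = {}
    for expected, actual in zip(template_tokens, entered_tokens):
        if expected.isupper():
            # Command or fixed parameter: must be entered as shown
            # (case-insensitive comparison is an assumption of this sketch).
            if actual.upper() != expected:
                return None
        else:
            # Variable: the operator supplies the letters or numbers.
            bindings[expected] = actual
    return bindings


print(match_command(">BSY CTRL ctrl_no", ">BSY CTRL 0"))  # {'ctrl_no': '0'}
```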


1 System recovery controller
This chapter describes the operation of the system recovery controller (SRC)
in the following sections:
• "About the system recovery controller" describes the SRC functions,
triggers, and dependencies.
• "SRC recovery methods" describes how the SRC recovers PMs and lists the
PMs that the SRC recovers.
• "How to monitor SRC operation" describes the responses on the MAP
(maintenance and administration position) display when the SRC is
recovering a node. This section also describes the logs generated during
node recovery, what happens when the SRC fails to recover a PM, and how to
manually override the SRC.
About the system recovery controller
The SRC coordinates recovery activities in a DMS switch. The SRC
optimizes recovery through the correct use of resources and automatic
operation.
The SRC coordinates the recovery of nodes in the DMS switch so that, when
one node depends on another for operation, the node that is depended upon
must be in service before a recovery attempt is made on the dependent node.
As it progresses through the dependency hierarchy, the SRC schedules
recovery activities to run at appropriate times, which reduces the length of
outages.
The SRC makes several attempts to recover a node and performs a more
detailed analysis on each successive attempt. If necessary, the SRC reloads
the node's software and returns the node to service as part of a full
recovery process. Reloading takes the node out of service for a period of
time, so the SRC reloads a node only when required.
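The escalation described above, from less disruptive attempts to a full reload only when required, can be sketched in Python as an ordered list of recovery steps that stops at the first success. This is a minimal model under stated assumptions: the step labels and the `recover_node` function are hypothetical and do not describe the actual SRC recovery steps.

```python
from typing import Callable, List, Optional, Tuple

# One escalation step: a label and an action that returns True if the node
# is back in service afterward. The labels are hypothetical, not SRC internals.
Step = Tuple[str, Callable[[str], bool]]


def recover_node(node: str, steps: List[Step]) -> Optional[str]:
    """Run progressively more thorough recovery steps, stopping at the first success.

    Returns the label of the step that recovered the node, or None if all failed.
    Because the reload step is placed last, the node is taken out of service
    for a reload only after the less disruptive steps have failed.
    """
    for label, action in steps:
        if action(node):
            return label
    return None


ladder: List[Step] = [
    ("return to service", lambda node: False),
    ("detailed diagnosis, then return to service", lambda node: False),
    ("reload, then return to service", lambda node: True),
]
print(recover_node("LGC 0", ladder))  # reload, then return to service
```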

SRC functions
The SRC coordinates the recovery activities of different subsystems outside
the DMS-core, also referred to as the computing module (CM).
The subsystems include the following:
• the message switch (MS)
• network (JNET or ENET)
• series I, II, and III peripheral modules (PM).
Figure 1-1, "System recovery controller," shows how the SRC interfaces with
the DMS-core and with the subsystems.
The SRC performs the following functions:
• The dependency manager of the SRC enforces inter-subsystem
dependencies. Before the SRC recovers a node, the subsystems that the
node depends on must be operating.
• The group manager groups nodes for broadcast loading where broadcast
loading applies. The SRC sends common commands to a group of nodes at
the same time, instead of one after another.
• The concurrent activity manager balances the amount of recovery work
against other activities that occur on the switch. The SRC attempts to
recover as many critical subsystems as the CM allows.
• The SRC initiates recovery applications and monitors each step in the
application to make sure that the application completes promptly.
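The balancing role of the concurrent activity manager can be sketched in Python as a simple admission queue: recovery tasks start while spare capacity remains, and the rest wait until a running task finishes. This is a hypothetical model; the class, its methods, and the `max_concurrent` value stand in for "as much recovery work as the CM allows" and are not taken from this section, which does not state the actual limit.

```python
from collections import deque
from typing import Deque, List


class ConcurrentActivityManager:
    """Hypothetical model: admit recovery tasks only while spare capacity remains."""

    def __init__(self, max_concurrent: int) -> None:
        # Stand-in for the amount of recovery work the CM can absorb
        # alongside its other activities (assumed value, not documented here).
        self.max_concurrent = max_concurrent
        self.running: List[str] = []
        self.waiting: Deque[str] = deque()

    def request(self, task: str) -> bool:
        """Start the task now if capacity allows; otherwise queue it.

        Returns True if the task started immediately.
        """
        if len(self.running) < self.max_concurrent:
            self.running.append(task)
            return True
        self.waiting.append(task)
        return False

    def finish(self, task: str) -> None:
        """Mark a task complete and start the oldest queued task, if any."""
        self.running.remove(task)
        if self.waiting:
            self.running.append(self.waiting.popleft())
```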
The SRC coordinates two separate activities for series II XMS-based PMs
(XPM) and line concentrating modules (LCM):
• system recovery of PM nodes after core restart or core switch of activity
through the use of the dependency manager
Note: System recovery of DLMs and IPEs does not always occur on
core switch of activity.
• loading of PM units after system maintenance detects a load loss through
the use of the group manager
For LCMs, an audit verifies the node status of each LCM unit before the
recovery activity executes. If both units are SysB, the audit forces the
units into service. If one unit is SysB, the fault is evaluated and the SRC
attempts a recovery. The SRC attempts the recovery a maximum of three times,
at 1-min intervals.
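The LCM audit rules above can be sketched in Python as a decision on the two unit states, followed by at most three recovery attempts spaced one minute apart. This reflects one reading of "a maximum of three times in 1-min intervals"; the function name, the state strings, and the `attempt_recovery` and `force_in_service` hooks are hypothetical. The `sleep` parameter is injectable so that the timing can be tested without waiting.

```python
import time
from typing import Callable, List

MAX_ATTEMPTS = 3        # "a maximum of three times"
RETRY_INTERVAL_S = 60   # "in 1-min intervals"


def lcm_audit(unit_states: List[str],
              force_in_service: Callable[[], None],
              attempt_recovery: Callable[[int], bool],
              sleep: Callable[[float], None] = time.sleep) -> str:
    """Decide the recovery action for one LCM from the states of its two units.

    attempt_recovery(unit) is a hypothetical hook that evaluates the fault on
    that unit and tries to recover it, returning True on success.
    """
    sysb_units = [unit for unit, state in enumerate(unit_states) if state == "SysB"]
    if len(sysb_units) == 2:
        # Both units SysB: the audit forces the units into service.
        force_in_service()
        return "forced into service"
    if len(sysb_units) == 1:
        unit = sysb_units[0]
        for attempt in range(1, MAX_ATTEMPTS + 1):
            if attempt_recovery(unit):
                return f"unit {unit} recovered on attempt {attempt}"
            if attempt < MAX_ATTEMPTS:
                sleep(RETRY_INTERVAL_S)  # wait 1 minute before the next attempt
        return f"unit {unit} not recovered"
    return "no recovery needed"
```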

The two activities that the SRC coordinates are connected: PM maintenance
initiated through the dependency manager can lead to the loading of one or
more PM units.
Figure 1-1 System recovery controller
[Diagram: within the DMS core, the system recovery controller (dependency manager, group manager, concurrent activity manager, and database) and the switch operating system. The SRC interfaces with the subsystems: message switch maintenance, network maintenance, and series I, II, and III PM maintenance.]

Required SRC conditions for PM recovery
The PM recovery that the SRC coordinates requires the following conditions:
• all equipment must have power
• for automatic broadcast loading, series II XPMs must have NT6X45BA or
newer processor cards installed
Note: Series II XPMs with pre-NT6X45BA control cards are loaded
one by one instead of in groups for broadcast loading.
• all PM load names (including series I PM load names) must be entered in
table PMLOADS
SRC triggers
The following events trigger the SRC to query and, if necessary, begin
recovery of subsystems:
• warm restart of the core
• cold restart of the core
• reload-restart of the core
• loss of load in a PM
• manual RESTART SWACT, ABORT SWACT, or NORESTART SWACT
of the core
Additional SRC triggers to load series II XMS-based PMs again
There are four additional triggers for the SRC to reload series II XPMs:
• the XPM reports a memory parity error during a periodic audit by the
switch operating system
• the ROM/RAM query step in the series II XPM return-to-service task
detects a loss of load
• the initialization of the series II XPM during a return-to-service task fails
two consecutive times. This failure indicates a problem with the software
load
• the ROM/RAM query step in the series II XPM system busy task detects a
load loss
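The four additional reload triggers for series II XPMs can be summarized as a single Python predicate. This is an illustrative sketch only; the function and parameter names are hypothetical and do not correspond to DMS software fields.

```python
def xpm_reload_triggered(parity_error_in_audit: bool,
                         rts_romram_load_lost: bool,
                         consecutive_rts_init_failures: int,
                         sysb_romram_load_lost: bool) -> bool:
    """Return True if any of the four additional series II XPM reload triggers applies."""
    return (
        parity_error_in_audit                   # memory parity error in a periodic audit
        or rts_romram_load_lost                 # ROM/RAM query in the return-to-service task
        or consecutive_rts_init_failures >= 2   # initialization failed two consecutive times
        or sysb_romram_load_lost                # ROM/RAM query in the system busy task
    )
```

A single initialization failure during a return-to-service task does not trigger a reload in this model; only the second consecutive failure does, because that pattern indicates a problem with the software load.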
Core restarts
During a restart, the switch operating system initializes again. Reinitialization
restores the operating system software and the subsystems outside the
DMS-core to a known, steady state.
A system restart includes initialization of the modules in the CM, MS,
network, and PMs. A system restart also includes the restoration of services.

The period of a restart is the time taken to recover the whole system to the
point that all services are available again.
The symbol A1 flashes on the reset terminal interface (RTIF) when
initialization of the software in the CM is complete. The recovery for the rest
of the system starts after the flashing A1 appears.
The following list describes how each type of restart affects calls in progress
and billing data:
• A warm restart of the core is the least severe type of restart. Audits of
XPMs occur. The XPMs remain in service during a warm restart. Calls in
progress that reached the talking state continue. Any calls that did not
reach the talking state are disconnected. Any calls that disconnect during
the restart are disconnected after the restart is complete, and the system
records billing data.
• A cold restart of the core is more severe than a warm restart. Audits of
XPMs occur. The XPMs remain in service during a cold restart. Calls that
reached the talking state keep their connections during the restart. These
calls can be disconnected if new calls reuse their connections after the
restart. No record is made of calls in progress during a cold restart, and
no billing data is recorded for these calls. A manual cold restart can be
performed on DTCs while the equipment is in service. A manual cold restart
drops all calls, but the two units are removed from service one at a time.
This process minimizes the length of the XPM outage.
• A reload-restart of the core is the most severe restart. During a
reload-restart, all PMs initialize again. All calls in progress are dropped,
and the billing data for the dropped calls is lost.
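The restart descriptions above can be condensed into a small Python lookup table, which makes the severity ordering and the effect on calls and billing easy to compare. The class and field names are hypothetical. The `False` value for XPMs in a reload-restart is an inference from "all PMs initialize again", not an explicit statement in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RestartEffect:
    severity: int                 # 1 = least severe
    xpms_remain_in_service: bool
    talking_calls_kept: bool      # calls that already reached the talking state
    billing_recorded: bool        # billing data for calls in progress


RESTART_EFFECTS: Dict[str, RestartEffect] = {
    "warm": RestartEffect(1, True, True, True),
    # Talking-state calls keep their connections, but can be disconnected
    # if new calls reuse those connections after the restart.
    "cold": RestartEffect(2, True, True, False),
    # All PMs initialize again, so XPMs are treated here as leaving service.
    "reload": RestartEffect(3, False, False, False),
}


def by_severity() -> List[str]:
    """Restart types ordered from least to most severe."""
    return sorted(RESTART_EFFECTS, key=lambda name: RESTART_EFFECTS[name].severity)


print(by_severity())  # ['warm', 'cold', 'reload']
```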
Loss of load in a PM
Removing a card that is loaded with software, or interrupting power to such
a card, causes a loss of load in a PM. A PM becomes system busy when a loss
of load occurs. The SRC begins recovery when system maintenance detects a
loss of load.
Manual commands
The SRC initializes PMs again after the use of one of the following manual
commands during an upgrade of BCS software:
• RESTART SWACT
• ABORT SWACT
• NORESTART SWACT
SRC dependency manager
For some recovery actions on objects to occur, other objects must be in a given
state to support the action. The dependency manager of the SRC uses the set
of dependencies that applies to the type of restart to manage object
dependencies. The SRC dependency manager prevents failures caused by
early starts. The SRC dependency manager also reduces recovery times.
An object is any entity in the DMS switch. An object can be:
• physical (for example, an ENET plane, an XPM, an IPML, or a set of lines)
• a service (for example, line trunk server [LTS] call processing)
• software (for example, an entry code)
• an event (for example, the initialization of core software)
Management of dependencies
The action on the dependent object must not proceed until the object depended
on is in the required state. Before an action on an object proceeds, the
dependency manager makes sure that the requirements for that action are
satisfied.
Dependencies are specified for each action for each object. Examples of
dependencies in DMS include the following:
• one part of the software that must initialize before another
• ordered initialization of nodes to make sure that paths to the nodes are in
service before a recovery attempt of a node occurs
• data that must download to a node after other nodes return to service
• the recovery of a service in one node after the recovery of other parts of the
service in other nodes
A dependency can apply to one type of recovery but not to another. For
example, an action can have different dependencies in different restart
types. The SRC provides the applications with the means of indicating
which dependencies are applicable.
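Ordering recovery so that every object follows the objects it depends on is a topological sort. The following Python sketch uses the standard library `graphlib.TopologicalSorter` (Python 3.9 or later) with a separate dependency table for each restart type, reflecting the point that dependencies can differ between restart types. The object names and the dependency tables are hypothetical illustrations, not actual DMS dependency data.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency tables: for each restart type, the objects that
# each object depends on. Names and relationships are illustrative only.
DEPENDENCIES: Dict[str, Dict[str, Set[str]]] = {
    "reload": {
        "MS": set(),
        "ENET": {"MS"},
        "LGC 0": {"ENET"},
        "LCM 0": {"LGC 0"},
    },
    "warm": {
        "LGC 0": set(),
        "LCM 0": {"LGC 0"},
    },
}


def recovery_order(restart_type: str) -> List[str]:
    """Return an order in which every object follows the objects it depends on.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(DEPENDENCIES[restart_type]).static_order())


print(recovery_order("reload"))  # ['MS', 'ENET', 'LGC 0', 'LCM 0']
```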
SRC group manager
PMs can be grouped for bulk maintenance action. The group manager
coordinates the PM groups. The group manager designates one PM as the
"seed" PM in a group. The CM sends messages to the seed PM. The seed PM
forwards the messages to the other PMs in the group.
Note: Series I PMs do not support broadcast loading. The group manager
does not group the series I PMs.

The group manager uses several criteria when it groups PMs for a bulk
action. For example, when the SRC broadcast-loads nodes, the group
manager can form the following groups:
• a group of PMs with the same node type
• a group of PMs with the same load file name
• a group of PMs with the same loading method
How to group series II XMS-based PMs
The group manager uses the following criteria when it groups series II
XPMs together for broadcast loading:
• the load file name
• the CMR (class modem resource) file name
• the presence of NT6X45BA or higher controller cards
For example, two XPMs can have the same load file name and have
NT6X45BA controller cards, but have different CMR file names. The group
manager puts these two XPMs in different groups.
XPMs that cannot be grouped with other XPM units for broadcast loading are
single-loaded. This happens when the XPM units do not have the hardware to
support broadcast loading, or when the XPMs cannot be grouped with other
units during dynamic grouping. The group manager groups only XPMs that have
NT6X45BA or higher controller cards. An XPM that does not have an NT6X45BA
or higher controller card is never grouped with other XPMs, even if the
other XPMs have the same load files.
The SRC continues to coordinate single loading for purposes of concurrency
management.
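The grouping criteria for series II XPMs can be sketched in Python: XPMs without an NT6X45BA or higher controller card are always single-loaded, the rest are grouped by load file name and CMR file name, and a group of one is also single-loaded because it cannot share a broadcast load with other units. The class, function, node names, and file names below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Xpm:
    name: str
    load_file: str
    cmr_file: str
    has_6x45ba_or_higher: bool   # NT6X45BA or higher controller card installed


def group_for_broadcast(xpms: List[Xpm]) -> Tuple[List[List[str]], List[str]]:
    """Split XPMs into broadcast-loading groups and a list of single-loaded XPMs."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    single: List[str] = []
    for xpm in xpms:
        if not xpm.has_6x45ba_or_higher:
            # No broadcast-capable controller card: always single-loaded.
            single.append(xpm.name)
        else:
            groups[(xpm.load_file, xpm.cmr_file)].append(xpm.name)
    broadcast: List[List[str]] = []
    for members in groups.values():
        if len(members) > 1:
            broadcast.append(members)
        else:
            # Cannot be grouped with any other XPM, so it is single-loaded.
            single.extend(members)
    return broadcast, single


xpms = [
    Xpm("LGC 0", "LOADA", "CMR1", True),
    Xpm("LGC 1", "LOADA", "CMR1", True),
    Xpm("LTC 0", "LOADA", "CMR2", True),   # same load file, different CMR file
    Xpm("DTC 0", "LOADA", "CMR1", False),  # pre-NT6X45BA controller card
]
print(group_for_broadcast(xpms))  # ([['LGC 0', 'LGC 1']], ['DTC 0', 'LTC 0'])
```

As in the CMR example in the text, LGC 0 and LTC 0 share a load file name but land in different groups because their CMR file names differ.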
Static and dynamic groups
You can identify PMs in the same group from datafill. The datafill specifies
the load file names and hardware configurations of the PMs. The system
maintains these static groups automatically over time as the datafill changes.
During recovery, the SRC forms dynamic groups from the subgroups based on
which elements require recovery and availability of resources to perform the
recovery.
Broadcast loading
Broadcast loading is a bulk action. Broadcast loading can operate on more
than one PM at the same time. To save time, the SRC performs an action on a
group of PMs instead of many separate PMs.
Automatic broadcast loading sends a request to load software to several PMs
at the same time.
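The seed-PM mechanism described for the group manager can be modeled in Python to show why broadcast loading saves time: the CM sends the load request once, to the seed PM, and the seed forwards it to the other PMs in the group, whereas single loading needs one CM message per PM. This is a message-count model only, not the DMS loading protocol. Choosing the first PM in the group as the seed is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import List, Tuple

Delivery = Tuple[str, str, str]  # (sender, receiver, load file)


def broadcast_load(group: List[str], load_file: str) -> List[Delivery]:
    """Model one broadcast load through a seed PM.

    The first PM in the group is used as the seed (an assumption of this sketch).
    The CM sends the load only to the seed; the seed forwards it to the others.
    """
    if not group:
        return []
    seed, others = group[0], group[1:]
    deliveries = [("CM", seed, load_file)]
    deliveries += [(seed, pm, load_file) for pm in others]
    return deliveries


def cm_messages(deliveries: List[Delivery]) -> int:
    """Count the messages that the CM itself had to send."""
    return sum(1 for sender, _, _ in deliveries if sender == "CM")


group = ["LGC 0", "LGC 1", "LGC 2", "LGC 3"]
print(cm_messages(broadcast_load(group, "LOADA")))  # 1 (single loading would need 4)
```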