Scotch and libScotch 5.1 User’s Guide
(version 5.1.10)
François Pellegrini
Bacchus team, INRIA Bordeaux Sud-Ouest
ENSEIRB & LaBRI, UMR CNRS 5800
Université Bordeaux I
351 cours de la Libération, 33405 TALENCE, FRANCE
pelegrin@labri.fr
August 29, 2010
Abstract
This document describes the capabilities and operations of Scotch and
libScotch, a software package and a software library devoted to static
mapping, partitioning, and sparse matrix block ordering of graphs and
meshes/hypergraphs. It gives brief descriptions of the algorithms, details
the input/output formats and instructions for use, describes the installation
procedures, and provides a number of examples.
Scotch is distributed as free/libre software, and has been designed such
that new partitioning or ordering methods can be added in a straightforward
manner. It can therefore be used as a testbed for the easy and quick coding
and testing of such new methods, and may also be redistributed, as a library,
along with third-party software that makes use of it, either in its original or
in updated forms.

Contents
1 Introduction
1.1 Static mapping
1.2 Sparse matrix ordering
1.3 Contents of this document
2 The Scotch project
2.1 Description
2.2 Availability
3 Algorithms
3.1 Static mapping by Dual Recursive Bipartitioning
3.1.1 Static mapping
3.1.2 Cost function and performance criteria
3.1.3 The Dual Recursive Bipartitioning algorithm
3.1.4 Partial cost function
3.1.5 Execution scheme
3.1.6 Graph bipartitioning methods
3.1.7 Mapping onto variable-sized architectures
3.2 Sparse matrix ordering by hybrid incomplete nested dissection
3.2.1 Minimum Degree
3.2.2 Nested dissection
3.2.3 Hybridization
3.2.4 Performance criteria
3.2.5 Ordering methods
3.2.6 Graph separation methods
4 Updates
4.1 Changes from version 4.0
4.2 Changes from version 5.0
5 Files and data structures
5.1 Graph files
5.2 Mesh files
5.3 Geometry files
5.4 Target files
5.4.1 Decomposition-defined architecture files
5.4.2 Algorithmically-coded architecture files
5.4.3 Variable-sized architecture files
5.5 Mapping files
5.6 Ordering files
5.7 Vertex list files
6 Programs
6.1 Invocation
6.2 Using compressed files
6.3 Description
6.3.1 acpl
6.3.2 amk_*
6.3.3 amk_grf
6.3.4 atst
6.3.5 gcv
6.3.6 gmap/gpart
6.3.7 gmk_*
6.3.8 gmk_msh
6.3.9 gmtst
6.3.10 gord
6.3.11 gotst
6.3.12 gout
6.3.13 gtst
6.3.14 mcv
6.3.15 mmk_*
6.3.16 mord
6.3.17 mtst
7 Library
7.1 Calling the routines of libScotch
7.1.1 Calling from C
7.1.2 Calling from Fortran
7.1.3 Compiling and linking
7.1.4 Machine word size issues
7.2 Data formats
7.2.1 Architecture format
7.2.2 Graph format
7.2.3 Mesh format
7.2.4 Geometry format
7.2.5 Block ordering format
7.3 Strategy strings
7.3.1 Using default strategy strings
7.3.2 Mapping strategy strings
7.3.3 Graph bipartitioning strategy strings
7.3.4 Ordering strategy strings
7.3.5 Node separation strategy strings
7.4 Target architecture handling routines
7.4.1 SCOTCH_archInit
7.4.2 SCOTCH_archExit
7.4.3 SCOTCH_archLoad
7.4.4 SCOTCH_archSave
7.4.5 SCOTCH_archName
7.4.6 SCOTCH_archSize
7.4.7 SCOTCH_archBuild
7.4.8 SCOTCH_archCmplt
7.4.9 SCOTCH_archCmpltw
7.4.10 SCOTCH_archHcub
7.4.11 SCOTCH_archMesh2D
7.4.12 SCOTCH_archMesh3D
7.4.13 SCOTCH_archTleaf
7.4.14 SCOTCH_archTorus2D
7.4.15 SCOTCH_archTorus3D
7.5 Graph handling routines
7.5.1 SCOTCH_graphInit
7.5.2 SCOTCH_graphExit
7.5.3 SCOTCH_graphFree
7.5.4 SCOTCH_graphLoad
7.5.5 SCOTCH_graphSave
7.5.6 SCOTCH_graphBuild
7.5.7 SCOTCH_graphBase
7.5.8 SCOTCH_graphCheck
7.5.9 SCOTCH_graphSize
7.5.10 SCOTCH_graphData
7.5.11 SCOTCH_graphStat
7.6 Graph mapping and partitioning routines
7.6.1 SCOTCH_graphPart
7.6.2 SCOTCH_graphMap
7.6.3 SCOTCH_graphMapInit
7.6.4 SCOTCH_graphMapExit
7.6.5 SCOTCH_graphMapLoad
7.6.6 SCOTCH_graphMapSave
7.6.7 SCOTCH_graphMapCompute
7.6.8 SCOTCH_graphMapView
7.7 Graph ordering routines
7.7.1 SCOTCH_graphOrder
7.7.2 SCOTCH_graphOrderInit
7.7.3 SCOTCH_graphOrderExit
7.7.4 SCOTCH_graphOrderLoad
7.7.5 SCOTCH_graphOrderSave
7.7.6 SCOTCH_graphOrderSaveMap
7.7.7 SCOTCH_graphOrderSaveTree
7.7.8 SCOTCH_graphOrderCheck
7.7.9 SCOTCH_graphOrderCompute
7.7.10 SCOTCH_graphOrderComputeList
7.8 Mesh handling routines
7.8.1 SCOTCH_meshInit
7.8.2 SCOTCH_meshExit
7.8.3 SCOTCH_meshLoad
7.8.4 SCOTCH_meshSave
7.8.5 SCOTCH_meshBuild
7.8.6 SCOTCH_meshCheck
7.8.7 SCOTCH_meshSize
7.8.8 SCOTCH_meshData
7.8.9 SCOTCH_meshStat
7.8.10 SCOTCH_meshGraph
7.9 Mesh ordering routines
7.9.1 SCOTCH_meshOrder
7.9.2 SCOTCH_meshOrderInit
7.9.3 SCOTCH_meshOrderExit
7.9.4 SCOTCH_meshOrderSave
7.9.5 SCOTCH_meshOrderSaveMap
7.9.6 SCOTCH_meshOrderSaveTree
7.9.7 SCOTCH_meshOrderCheck
7.9.8 SCOTCH_meshOrderCompute
7.10 Strategy handling routines
7.10.1 SCOTCH_stratInit
7.10.2 SCOTCH_stratExit
7.10.3 SCOTCH_stratSave
7.10.4 SCOTCH_stratGraphBipart
7.10.5 SCOTCH_stratGraphMap
7.10.6 SCOTCH_stratGraphMapBuild
7.10.7 SCOTCH_stratGraphOrder
7.10.8 SCOTCH_stratGraphOrderBuild
7.10.9 SCOTCH_stratMeshOrder
7.10.10 SCOTCH_stratMeshOrderBuild
7.11 Geometry handling routines
7.11.1 SCOTCH_geomInit
7.11.2 SCOTCH_geomExit
7.11.3 SCOTCH_geomData
7.11.4 SCOTCH_graphGeomLoadChac
7.11.5 SCOTCH_graphGeomSaveChac
7.11.6 SCOTCH_graphGeomLoadHabo
7.11.7 SCOTCH_graphGeomLoadScot
7.11.8 SCOTCH_graphGeomSaveScot
7.11.9 SCOTCH_meshGeomLoadHabo
7.11.10 SCOTCH_meshGeomLoadScot
7.11.11 SCOTCH_meshGeomSaveScot
7.12 Error handling routines
7.12.1 SCOTCH_errorPrint
7.12.2 SCOTCH_errorPrintW
7.12.3 SCOTCH_errorProg
7.13 Miscellaneous routines
7.13.1 SCOTCH_randomReset
7.14 MeTiS compatibility library
7.14.1 METIS_EdgeND
7.14.2 METIS_NodeND
7.14.3 METIS_NodeWND
7.14.4 METIS_PartGraphKway
7.14.5 METIS_PartGraphRecursive
7.14.6 METIS_PartGraphVKway
8 Installation
8.1 Thread issues
8.2 File compression issues
8.3 Machine word size issues
9 Examples
10 Adding new features to Scotch
10.1 Graphs and meshes
10.2 Methods and partition data
10.3 Adding a new method to Scotch
10.4 Licensing of new methods and of derived works

1 Introduction
1.1 Static mapping
The efficient execution of a parallel program on a parallel machine requires that the communicating processes of the program be assigned to the processors of the machine so as to minimize its overall running time. When processes have a limited duration and their logical dependencies are accounted for, this optimization problem is referred to as scheduling. When processes are assumed to coexist simultaneously for the entire duration of the program, it is referred to as mapping. It amounts to balancing the computational weight of the processes among the processors of the machine, while reducing the cost of communication by keeping intensively inter-communicating processes on nearby processors. In most cases, the underlying computational structure of the parallel programs to map can be conveniently modeled as a graph in which vertices correspond to processes that handle distributed pieces of data, and edges reflect data dependencies. The mapping problem can then be addressed by assigning processor labels to the vertices of the graph, so that all processes assigned to some processor are loaded and run on it. In a SPMD context, this is equivalent to the distribution across processors of the data structures of parallel programs; in this case, all pieces of data assigned to some processor are handled by a single process located on this processor.
A mapping is called static if it is computed prior to the execution of the program.
Static mapping is NP-complete in the general case [13]. Therefore, many studies
have been carried out in order to find sub-optimal solutions in reasonable time,
including the development of specific algorithms for common topologies such as the
hypercube [11, 21]. When the target machine is assumed to have a communication
network in the shape of a complete graph, the static mapping problem turns into the
partitioning problem, which has also been intensely studied [4, 22, 31, 33, 49]. However, when mapping onto parallel machines the communication network of which is not a bus, not accounting for the topology of the target machine usually leads to worse running times, because simple cut minimization can induce more expensive long-distance communication [21, 56].
1.2 Sparse matrix ordering
Many scientific and engineering problems can be modeled by sparse linear systems,
which are solved either by iterative or direct methods. To achieve efficiency with
direct methods, one must minimize the fill-in induced by factorization. This fill-in
is a direct consequence of the order in which the unknowns of the linear system are
numbered, and its effects are critical both in terms of memory and computation
costs.
An efficient way to compute fill-reducing orderings of symmetric sparse matrices is to use recursive nested dissection [17]. It amounts to computing a vertex set $S$ that separates the graph into two parts $A$ and $B$, ordering $S$ with the highest indices that are still available, and proceeding recursively on parts $A$ and $B$ until their sizes become smaller than some threshold value. This ordering guarantees that, at each step, no non-zero term can appear in the factorization process between unknowns of $A$ and unknowns of $B$.
The main issue of the nested dissection ordering algorithm is thus to find small
vertex separators that balance the remaining subgraphs as evenly as possible, in
order to minimize fill-in and to increase concurrency in the factorization process.

1.3 Contents of this document
This document describes the capabilities and operations of Scotch, a software
package devoted to static mapping, graph and mesh partitioning, and sparse matrix
block ordering. Scotch allows the user to map efficiently any kind of weighted
process graph onto any kind of weighted architecture graph, and provides high-
quality block orderings of sparse matrices. The rest of this manual is organized
as follows. Section 2 presents the goals of the Scotch project, and section 3
outlines the most important aspects of the mapping and ordering algorithms that it
implements. Section 4 summarizes the most important changes between version 5.1
and previous versions. Section 5 defines the formats of the files used in Scotch,
section 6 describes the programs of the Scotch distribution, and section 7 defines
the interface and operations of the libScotch library. Section 8 explains how
to obtain and install the Scotch distribution. Finally, some practical examples
are given in section 9, and instructions on how to implement new methods in the
libScotch library are provided in section 10.
2 The Scotch project
2.1 Description
Scotch is a project carried out at the Laboratoire Bordelais de Recherche en Informatique (LaBRI) of the Université Bordeaux I, and now within the ScAlApplix project of INRIA Bordeaux Sud-Ouest. Its goal is to study the applications of graph
theory to scientific computing, using a “divide and conquer” approach.
It focused first on static mapping, and has resulted in the development of the
Dual Recursive Bipartitioning (or DRB) mapping algorithm and in the study of
several graph bipartitioning heuristics [41], all of which have been implemented in
the Scotch software package [45]. Then, it focused on the computation of high-
quality vertex separators for the ordering of sparse matrices by nested dissection,
by extending the work that has been done on graph partitioning in the context
of static mapping [46, 47]. More recently, the ordering capabilities of Scotch
have been extended to native mesh structures, thanks to hypergraph partitioning
algorithms. New graph partitioning methods have also been recently added [8, 42].
Version 5.0 of Scotch is the first one to comprise parallel graph ordering routines. The parallel features of Scotch are referred to as PT-Scotch (“Parallel Threaded Scotch”). While both packages share a significant amount of code, because PT-Scotch transfers control to the sequential routines of the libScotch library when the subgraphs on which it operates are located on a single processor, the two sets of routines are documented in separate user's manuals. Readers interested in the parallel features of Scotch should refer to the PT-Scotch 5.1 User’s Guide [43].
2.2 Availability
Starting from version 4.0, which has been developed at INRIA within the ScAlApplix project, Scotch is available under a dual licensing basis. On the one hand, it
is downloadable from the Scotch web page as free/libre software, to all interested
parties willing to use it as a library or to contribute to it as a testbed for new
partitioning and ordering methods. On the other hand, it can also be distributed,
under other types of licenses and conditions, to parties willing to embed it tightly
into closed, proprietary software.

The free/libre software license under which Scotch 5.1 is distributed is
the CeCILL-C license [6], which has basically the same features as the GNU
LGPL (“Lesser General Public License”): ability to link the code as a library
to any free/libre or even proprietary software, ability to modify the code and to
redistribute these modifications. Version 4.0 of Scotch was distributed under the
LGPL itself.
Please refer to section 8 to see how to obtain the free/libre distribution of
Scotch.
3 Algorithms
3.1 Static mapping by Dual Recursive Bipartitioning
For a detailed description of the mapping algorithm and an extensive analysis of its
performance, please refer to [41, 44]. In the next sections, we will only outline the
most important aspects of the algorithm.
3.1.1 Static mapping
The parallel program to be mapped onto the target architecture is modeled by a valuated unoriented graph $S$, called source graph or process graph, the vertices of which represent the processes of the parallel program, and the edges of which represent the communication channels between communicating processes. Vertex and edge valuations associate with every vertex $v_S$ and every edge $e_S$ of $S$ integer numbers $w_S(v_S)$ and $w_S(e_S)$, which estimate the computation weight of the corresponding process and the amount of communication to be transmitted on the channel, respectively.

The target machine onto which the parallel program is mapped is also modeled by a valuated unoriented graph $T$, called target graph or architecture graph. Vertices $v_T$ and edges $e_T$ of $T$ are assigned integer weights $w_T(v_T)$ and $w_T(e_T)$, which estimate the computational power of the corresponding processor and the cost of traversal of the inter-processor link, respectively.

A mapping from $S$ to $T$ consists of two applications $\tau_{S,T} : V(S) \longrightarrow V(T)$ and $\rho_{S,T} : E(S) \longrightarrow \mathcal{P}(E(T))$, where $\mathcal{P}(E(T))$ denotes the set of all simple loopless paths which can be built from $E(T)$. $\tau_{S,T}(v_S) = v_T$ if process $v_S$ of $S$ is mapped onto processor $v_T$ of $T$, and $\rho_{S,T}(e_S) = \{e^1_T, e^2_T, \ldots, e^n_T\}$ if communication channel $e_S$ of $S$ is routed through communication links $e^1_T, e^2_T, \ldots, e^n_T$ of $T$. $|\rho_{S,T}(e_S)|$ denotes the dilation of edge $e_S$, that is, the number of edges of $E(T)$ used to route $e_S$.
3.1.2 Cost function and performance criteria
The computation of efficient static mappings requires an a priori knowledge of the
dynamic behavior of the target machine with respect to the programs which are
run on it. This knowledge is synthesized in a cost function, the nature of which
determines the characteristics of the desired optimal mappings. The goal of our
mapping algorithm is to minimize some communication cost function, while keeping
the load balance within a specified tolerance. The communication cost function fC
that we have chosen is the sum, for all edges, of their dilation multiplied by their
weight:
\[
f_C(\tau_{S,T}, \rho_{S,T}) \stackrel{\mathrm{def}}{=} \sum_{e_S \in E(S)} w_S(e_S) \, |\rho_{S,T}(e_S)| .
\]

This function, which has already been considered by several authors for hypercube target topologies [11, 21, 25], has several interesting properties: it is easy to compute, allows incremental updates performed by iterative algorithms, and its minimization favors the mapping of intensively intercommunicating processes onto nearby processors; regardless of the type of routing implemented on the target machine (store-and-forward or cut-through), it models the traffic on the interconnection network and thus the risk of congestion.
The strong positive correlation between values of this function and effective
execution times has been experimentally verified by Hammond [21] on the CM-2,
and by Hendrickson and Leland [26] on the nCUBE 2.
The quality of mappings is evaluated with respect to the criteria for quality that
we have chosen: the balance of the computation load across processors, and the
minimization of the interprocessor communication cost modeled by function fC.
These criteria lead to the definition of several parameters, which are described
below.
For load balance, one can define µmap, the average load per computational
power unit (which does not depend on the mapping), and δmap, the load imbalance
ratio, as
\[
\mu_{\mathrm{map}} \stackrel{\mathrm{def}}{=} \frac{\sum_{v_S \in V(S)} w_S(v_S)}{\sum_{v_T \in V(T)} w_T(v_T)}
\qquad\text{and}\qquad
\delta_{\mathrm{map}} \stackrel{\mathrm{def}}{=} \frac{\sum_{v_T \in V(T)} \left| \frac{1}{w_T(v_T)} \left( \sum_{\substack{v_S \in V(S) \\ \tau_{S,T}(v_S) = v_T}} w_S(v_S) \right) - \mu_{\mathrm{map}} \right|}{\sum_{v_S \in V(S)} w_S(v_S)} .
\]
However, since the maximum load imbalance ratio is provided by the user as an input of the mapping, the information given by these parameters is of little interest: what matters is the minimization of the communication cost function under this load balance constraint.
For communication, the straightforward parameter to consider is fC. It can be
normalized as µexp, the average edge expansion, which can be compared to µdil,
the average edge dilation; these are defined as
\[
\mu_{\mathrm{exp}} \stackrel{\mathrm{def}}{=} \frac{f_C}{\sum_{e_S \in E(S)} w_S(e_S)}
\qquad\text{and}\qquad
\mu_{\mathrm{dil}} \stackrel{\mathrm{def}}{=} \frac{\sum_{e_S \in E(S)} |\rho_{S,T}(e_S)|}{|E(S)|} .
\]
Their ratio, $\delta_{\mathrm{exp}} \stackrel{\mathrm{def}}{=} \mu_{\mathrm{exp}} / \mu_{\mathrm{dil}}$, is smaller than 1 when the mapper succeeds in putting heavily intercommunicating processes closer to each other than it does for lightly communicating processes; the two are equal if all edges have the same weight.
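As an illustration, the following C sketch (a hypothetical helper, not part of the libScotch API) computes $f_C$, $\mu_{\mathrm{exp}}$, $\mu_{\mathrm{dil}}$ and $\delta_{\mathrm{exp}}$ from per-edge weights and dilations, assuming that both are available as plain arrays and that edge weights are positive:

/* Hypothetical helper: compute f_C, mu_exp, mu_dil and delta_exp from
   the weight w_S(e_S) and dilation |rho_{S,T}(e_S)| of every edge.
   Assumes edgenbr > 0 and positive edge weights.                      */

typedef struct CommStat_ {
  double fC;                              /* Communication cost function f_C */
  double muexp;                           /* Average edge expansion          */
  double mudil;                           /* Average edge dilation           */
  double dltexp;                          /* Ratio mu_exp / mu_dil           */
} CommStat;

CommStat
commStat (const int * edgewgt,            /* Edge weight array   */
          const int * edgedil,            /* Edge dilation array */
          int         edgenbr)            /* Number of edges     */
{
  CommStat statdat = { 0.0, 0.0, 0.0, 0.0 };
  long     wgtsum  = 0;                   /* Sum of edge weights   */
  long     dilsum  = 0;                   /* Sum of edge dilations */
  int      edgenum;

  for (edgenum = 0; edgenum < edgenbr; edgenum ++) {
    statdat.fC += (double) edgewgt[edgenum] * (double) edgedil[edgenum];
    wgtsum     += edgewgt[edgenum];
    dilsum     += edgedil[edgenum];
  }
  statdat.muexp  = statdat.fC / (double) wgtsum; /* f_C normalized by total edge weight */
  statdat.mudil  = (double) dilsum / (double) edgenbr;
  statdat.dltexp = statdat.muexp / statdat.mudil;
  return (statdat);
}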
3.1.3 The Dual Recursive Bipartitioning algorithm
Our mapping algorithm uses a divide and conquer approach to recursively allocate
subsets of processes to subsets of processors [41]. It starts by considering a set of
processors, also called domain, containing all the processors of the target machine,
and with which is associated the set of all the processes to map. At each step, the
algorithm bipartitions a yet unprocessed domain into two disjoint subdomains, and
calls a graph bipartitioning algorithm to split the subset of processes associated with
the domain across the two subdomains, as sketched in the following.
mapping (D, P)
Set_Of_Processors D;
Set_Of_Processes  P;
{
  Set_Of_Processors D0, D1;
  Set_Of_Processes  P0, P1;

  if (|P| == 0) return;      /* If nothing to do.     */
  if (|D| == 1) {            /* If one processor in D */
    result (D, P);           /* P is mapped onto it.  */
    return;
  }
  (D0, D1) = processor_bipartition (D);
  (P0, P1) = process_bipartition   (P, D0, D1);
  mapping (D0, P0);          /* Perform recursion. */
  mapping (D1, P1);
}
The association of a subdomain with every process defines a partial mapping of the process graph. As bipartitionings are performed, the subdomain sizes decrease, yielding a complete mapping when all subdomains are of size one.
The above algorithm relies on the ability to define five main objects:

• a domain structure, which represents a set of processors in the target architecture;

• a domain bipartitioning function, which, given a domain, bipartitions it into two disjoint subdomains;

• a domain distance function, which gives, in the target graph, a measure of the distance between two disjoint domains. Since domains may not be convex nor connected, this distance may be estimated. However, it must respect certain homogeneity properties, such as giving more accurate results as domain sizes decrease. The domain distance function is used by the graph bipartitioning algorithms to compute the communication function to minimize, since it allows the mapper to estimate the dilation of the edges that link vertices which belong to different domains. Using such a distance function amounts to considering that all routings will use shortest paths on the target architecture, as most parallel machines actually do. We have thus chosen that our program would not provide routings for the communication channels, leaving their handling to the communication system of the target machine;

• a process subgraph structure, which represents the subgraph induced by a subset of the vertex set of the original source graph;

• a process subgraph bipartitioning function, which bipartitions subgraphs into two disjoint pieces to be mapped onto the two subdomains computed by the domain bipartitioning function.
All these routines are seen as black boxes by the mapping program, which can thus
accept any kind of target architecture and process bipartitioning functions.
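As a hypothetical illustration of this black-box decomposition (the actual libScotch data structures are richer), the interface between the mapper and these five objects could be sketched in C as follows:

/* Hypothetical sketch of the black-box objects listed above. */

typedef struct Domain_   Domain;        /* Set of processors of the target architecture     */
typedef struct Subgraph_ Subgraph;      /* Subgraph induced by a subset of process vertices */

typedef struct TargetOps_ {
  void   (* domainBipartition) (const Domain * domptr,   /* Split a domain into two */
                                Domain * dom0ptr,        /* disjoint subdomains     */
                                Domain * dom1ptr);
  double (* domainDistance)    (const Domain * dom0ptr,  /* Estimated distance      */
                                const Domain * dom1ptr); /* between two subdomains  */
} TargetOps;

typedef struct MapperOps_ {
  void (* subgraphBipartition) (Subgraph * grafptr,        /* Split the processes of a subgraph */
                                const Domain * dom0ptr,    /* across two subdomains, using the  */
                                const Domain * dom1ptr,    /* domain distance function to       */
                                const TargetOps * tgtptr); /* estimate edge dilations           */
} MapperOps;

Since the mapper manipulates these objects only through such functions, any target architecture or process bipartitioning method that implements them can be plugged in.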

3.1.4 Partial cost function
The production of efficient complete mappings requires that all graph bipartitionings favor the criteria that we have chosen. Therefore, the bipartitioning of a subgraph $S'$ of $S$ should maintain load balance within the user-specified tolerance, and minimize the partial communication cost function $f'_C$, defined as
\[
f'_C(\tau_{S,T}, \rho_{S,T}) \stackrel{\mathrm{def}}{=} \sum_{\substack{v \in V(S') \\ \{v, v'\} \in E(S)}} w_S(\{v, v'\}) \, |\rho_{S,T}(\{v, v'\})| ,
\]
which accounts for the dilation of edges internal to subgraph $S'$, as well as for that of the edges which belong to the cocycle of $S'$, as shown in Figure 1. Taking into account the partial mapping results issued by previous bipartitionings makes it possible to avoid local choices that might prove globally bad, as explained below. This amounts to incorporating additional constraints into the standard graph bipartitioning problem, turning it into a more general optimization problem termed skewed graph partitioning by some authors [27].
Figure 1: Edges accounted for in the partial communication cost function when bipartitioning the subgraph associated with domain D between the two subdomains D0 and D1 of D (a: initial position; b: after one vertex is moved). Dotted edges are of dilation zero, their two ends being mapped onto the same subdomain. Thin edges are cocycle edges.
3.1.5 Execution scheme
From an algorithmic point of view, our mapper behaves as a greedy algorithm: the mapping of a process to a subdomain is never reconsidered, though iterative algorithms can be applied at each step. The double recursive call performed at each step induces a recursion scheme in the shape of a binary tree, each vertex of which corresponds to a bipartitioning job, that is, the bipartitioning of both a domain and its associated subgraph.
In the case of depth-first sequencing, as written in the above sketch, bipartitioning jobs run in the left branches of the tree have no information on the distance between the vertices they handle and neighbor vertices to be processed in the right branches. On the contrary, sequencing the jobs according to a by-level (breadth-first) traversal of the tree allows any bipartitioning job of a given level to have information on the subdomains to which all the processes have been assigned at the previous level. Thus, when deciding in which subdomain to put a given process, a bipartitioning job can account for the communication costs induced by its
neighbor processes, whether they are handled by the job itself or not, since it can estimate in $f'_C$ the dilation of the corresponding edges. This results in an interesting feedback effect: once an edge has been kept in a cut between two subdomains, the distance between its end vertices will be accounted for in the partial communication cost function to be minimized, and following jobs will be more likely to keep these vertices close to each other, as illustrated in Figure 2.

Figure 2: Influence of depth-first and breadth-first sequencings on the bipartitioning of a domain D belonging to the leftmost branch of the bipartitioning tree (a: depth-first sequencing; b: breadth-first sequencing). With breadth-first sequencing, the partial mapping data regarding vertices belonging to the right branches of the bipartitioning tree are more accurate (C.L. stands for “Cut Level”).

The relative efficiency of depth-first and breadth-first sequencing schemes with respect to the structure of the source and target graphs is discussed in [44].
3.1.6 Graph bipartitioning methods
The core of our recursive mapping algorithm uses process graph bipartitioning methods as black boxes. It allows the mapper to run any type of graph bipartitioning method compatible with our criteria for quality. Bipartitioning jobs maintain an internal image of the current bipartition, indicating for every vertex of the job whether it is currently assigned to the first or to the second subdomain. It is therefore possible to apply several different methods in sequence, each one starting from the result of the previous one, and to select the methods with respect to the job characteristics, thus enabling us to define mapping strategies. The currently implemented graph bipartitioning methods are listed below.
Band
Like the multi-level method which will be described below, the band method
is a meta-algorithm, in the sense that it does not itself compute partitions, but
rather helps other partitioning algorithms perform better. It is a refinement
algorithm which, from a given initial partition, extracts a band graph of given
width (which only contains graph vertices that are at most at this distance
from the separator), calls a partitioning strategy on this band graph, and
projects back the refined partition on the original graph. This method was
designed to be able to use expensive partitioning heuristics, such as genetic
algorithms, on large graphs, as it dramatically reduces the problem space by
several orders of magnitude. However, it was found that, in a multi-level context, it also improves partition quality, by coercing partitions into a problem space that derives from the one which was globally defined at the coarsest level, thus preventing local optimization refinement algorithms from being trapped in local optima of the finer graphs [8].
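The extraction of the band graph can be sketched as a plain breadth-first traversal seeded at the frontier vertices. The following C code is a hypothetical illustration, not the libScotch implementation; it flags the vertices located at distance at most distmax from the current separator, using a compact adjacency representation:

#include <stdlib.h>

/* Hypothetical sketch of band graph vertex selection. verttab (of size
   vertnbr + 1) and edgetab form a compact adjacency structure; fronttab
   holds the frontnbr frontier vertices of the current partition.
   Returns an array of flags set to 1 for vertices in the band.          */

int *
bandSelect (const int * verttab, const int * edgetab, int vertnbr,
            const int * fronttab, int frontnbr, int distmax)
{
  int * flagtab  = malloc (vertnbr * sizeof (int)); /* Vertex distances */
  int * queutab  = malloc (vertnbr * sizeof (int)); /* BFS queue        */
  int   queuhead = 0;
  int   queutail = 0;
  int   vertnum;
  int   i;

  for (vertnum = 0; vertnum < vertnbr; vertnum ++)
    flagtab[vertnum] = -1;                /* -1 means: not yet reached    */
  for (i = 0; i < frontnbr; i ++) {       /* Seed traversal with frontier */
    flagtab[fronttab[i]] = 0;
    queutab[queutail ++] = fronttab[i];
  }
  while (queuhead < queutail) {           /* Plain breadth-first traversal */
    vertnum = queutab[queuhead ++];
    if (flagtab[vertnum] >= distmax)      /* Do not expand beyond band width */
      continue;
    for (i = verttab[vertnum]; i < verttab[vertnum + 1]; i ++) {
      int vertend = edgetab[i];
      if (flagtab[vertend] < 0) {
        flagtab[vertend] = flagtab[vertnum] + 1;
        queutab[queutail ++] = vertend;
      }
    }
  }
  for (vertnum = 0; vertnum < vertnbr; vertnum ++) /* Turn distances into flags */
    flagtab[vertnum] = (flagtab[vertnum] >= 0) ? 1 : 0;
  free (queutab);
  return (flagtab);
}

In the actual method, the selected vertices are then used to build the band graph proper, plus anchor vertices that represent all of the removed vertices of each part.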
Diffusion
This global optimization method, presented in [42], flows two kinds of antagonistic liquids, scotch and anti-scotch, from two source vertices, and sets the new frontier as the limit between vertices which contain scotch and the ones which contain anti-scotch. In order to add load-balancing constraints to the algorithm, a constant amount of liquid disappears from every vertex per unit of time, so that no domain can spread across more than half of the vertices. Because selecting the source vertices is essential to obtaining useful results, this method has been hard-coded so that the two source vertices are the two vertices of highest indices, since in the band method these are the anchor vertices which represent all of the removed vertices of each part. Therefore, this method must be used on band graphs only, or on specifically crafted graphs.
Exactifier
This greedy algorithm refines the current partition so as to reduce load imbalance as much as possible, while keeping the value of the communication cost function as small as possible. The vertex set is scanned in order of decreasing vertex weights, and vertices are moved from one subdomain to the other if doing so reduces load imbalance. When several vertices have the same weight, the vertex whose swap most decreases the communication cost function is selected first. This method is used as a post-processing step for other methods when load balance is mandatory. For weighted graphs, the strict enforcement of load balance may cause the swapping of isolated vertices of small weight, thus greatly increasing the cut. Therefore, great care should be taken when using this method if connectivity or cut minimization is mandatory.
Fiduccia-Mattheyses
The Fiduccia-Mattheyses heuristics [12] is an almost-linear improvement of the famous Kernighan-Lin algorithm [35]. It tries to improve the bipartition that is input to it by incrementally moving vertices between the subsets of the partition, as long as it can find sequences of moves that lower its communication cost. By considering sequences of moves instead of single swaps, the algorithm allows hill-climbing from local minima of the cost function. As an extension to the original Fiduccia-Mattheyses algorithm, we have developed new data structures, based on logarithmic indexings of arrays, that allow us to handle weighted graphs while preserving the almost-linearity in time of the algorithm [44].

As several authors have noted before [24, 32], the Fiduccia-Mattheyses algorithm gives better results when trying to optimize a good starting partition. Therefore, it should not be used on its own, but rather after greedy starting methods such as the Gibbs-Poole-Stockmeyer or the greedy graph growing methods.
Gibbs-Poole-Stockmeyer
This greedy bipartitioning method derives from an algorithm proposed by Gibbs, Poole, and Stockmeyer to minimize the dilation of graph orderings, that is, the maximum absolute value of the difference between the numbers of neighbor vertices [18]. The graph is sliced by using a breadth-first spanning tree rooted at a randomly chosen vertex, and this process is iterated by selecting a new root vertex within the last layer as long as the number of layers increases. Then, starting from the current root vertex, vertices are assigned layer after layer to the first subdomain, until half of the total weight has been processed. Remaining vertices are then allocated to the second subdomain. As for the original Gibbs, Poole, and Stockmeyer algorithm, it is assumed that maximizing the number of layers minimizes the sizes (and therefore the cocycles) of the layers. This property has already been used by George and Liu to reorder sparse linear systems using the nested dissection method [17], and by Simon in [54].
Greedy graph growing
This greedy algorithm, which has been proposed by Karypis and Kumar [31],
belongs to the GRASP (“Greedy Randomized Adaptive Search Procedure”)
class [36]. It consists in selecting an initial vertex at random, and repeatedly
adding vertices to this growing subset, such that each added vertex results
in the smallest increase in the communication cost function. This process,
which stops when load balance is achieved, is repeated several times in order
to explore (mostly in a gradient-like fashion) different areas of the solution
space, and the best partition found is kept.
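A single pass of this method can be sketched as follows, under simplifying assumptions (unit vertex and edge weights, compact adjacency arrays); this is hypothetical illustration code, in which the naive rescan of all vertices stands in for the priority structures used by real implementations:

#include <limits.h>
#include <stdlib.h>

/* Hypothetical sketch of one pass of greedy graph growing, for unit
   vertex and edge weights. parttab receives 0 or 1 for every vertex;
   part 0 is grown from a random seed until it holds half the vertices. */

void
greedyGrow (const int * verttab, const int * edgetab, int vertnbr, int * parttab)
{
  int grownbr;                            /* Number of vertices in part 0 */
  int i, j;

  for (i = 0; i < vertnbr; i ++)
    parttab[i] = 1;                       /* All vertices start in part 1 */
  parttab[rand () % vertnbr] = 0;         /* Random initial seed vertex   */
  grownbr = 1;

  while (grownbr < (vertnbr + 1) / 2) {   /* Until load balance achieved  */
    int bestvert = -1;
    int bestcost = INT_MAX;

    for (i = 0; i < vertnbr; i ++) {      /* Naive scan of all vertices     */
      int cost = 0;                       /* Change in cut size if moved    */
      int conn = 0;                       /* Connections to growing subset  */

      if (parttab[i] == 0)                /* Skip already selected vertices */
        continue;
      for (j = verttab[i]; j < verttab[i + 1]; j ++) {
        if (parttab[edgetab[j]] == 0) {   /* Edge becomes internal: gain */
          cost --;
          conn ++;
        }
        else                              /* Edge remains or becomes cut */
          cost ++;
      }
      if ((conn > 0) && (cost < bestcost)) { /* Grow from the frontier only */
        bestcost = cost;
        bestvert = i;
      }
    }
    if (bestvert < 0)                     /* Disconnected graph: take any */
      for (bestvert = 0; parttab[bestvert] == 0; bestvert ++) ;
    parttab[bestvert] = 0;                /* Add best vertex to part 0    */
    grownbr ++;
  }
}

Running several such passes from different random seeds and keeping the partition of smallest cost yields the behavior described above.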
Multi-level
This algorithm, which has been studied by several authors [4, 23, 31] and
should be considered as a strategy rather than as a method since it uses other
methods as parameters, repeatedly reduces the size of the graph to bipartition
by finding matchings that collapse vertices and edges, computes a partition
for the coarsest graph obtained, and projects the result back to the original
graph, as shown in Figure 3.

Figure 3: The multi-level partitioning process. In the uncoarsening phase, the light and bold lines represent for each level the projected partition obtained from the coarser graph, and the partition obtained after refinement, respectively.
The multi-level method, when used in conjunction with the Fiduccia-Mattheyses method to compute the initial partitions and refine the projected partitions at every level, usually leads to a significant improvement in quality with respect to the plain Fiduccia-Mattheyses method. By coarsening the graph used by the Fiduccia-Mattheyses method to compute and project back the initial partition, the multi-level algorithm broadens the scope of the Fiduccia-Mattheyses algorithm, and makes it possible for it to account for topological structures of the graph that would otherwise be of too high a level for it to encompass in its local optimization process.
3.1.7 Mapping onto variable-sized architectures
Several constrained graph partitioning problems can be modeled as mapping the
problem graph onto a target architecture, the number of vertices and topology of
which depend dynamically on the structure of the subgraphs to bipartition at each
step.
Variable-sized architectures are supported by the DRB algorithm in the following way: at the end of each bipartitioning step, if any of the variable subdomains is empty (that is, all vertices of the subgraph are mapped to only one of the subdomains), then the DRB process stops for both subdomains, and all of the vertices are assigned to their parent subdomain; else, if a variable subdomain has only one vertex mapped onto it, the DRB process stops for this subdomain, and the vertex is assigned to it.

The moment at which to stop the DRB process for a specific subgraph can be controlled by defining a bipartitioning strategy that tests the validity of a criterion at each bipartitioning step, and maps all of the subgraph vertices to one of the subdomains when it becomes false.
3.2 Sparse matrix ordering by hybrid incomplete nested dissection
When solving large sparse linear systems of the form $Ax = b$, it is common to precede the numerical factorization by a symmetric reordering. This reordering is chosen in such a way that pivoting down the diagonal in order on the resulting permuted matrix $PAP^{T}$ produces much less fill-in and work than computing the factors of $A$ by pivoting down the diagonal in the original order (the fill-in is the set of zero entries in $A$ that become non-zero in the factored matrix).
3.2.1 Minimum Degree
The minimum degree algorithm [55] is a local heuristic that performs its pivot
selection by iteratively selecting from the graph a node of minimum degree.
The minimum degree algorithm is known to be a very fast and general purpose
algorithm, and has received much attention over the last three decades (see for
example [1, 16, 39]). However, the algorithm is intrinsically sequential, and very
little can be theoretically proved about its efficiency.
3.2.2 Nested dissection
The nested dissection algorithm [17] is a global, heuristic, recursive algorithm which computes a vertex set $S$ that separates the graph into two parts $A$ and $B$, ordering $S$ with the highest remaining indices. It then proceeds recursively on parts $A$ and $B$ until their sizes become smaller than some threshold value. This ordering guarantees that, at each step, no non-zero term can appear in the factorization process between unknowns of $A$ and unknowns of $B$.
Many theoretical studies have been carried out on nested dissection ordering [7, 38], and its divide and conquer nature makes it easily parallelizable. The main issue of the nested dissection ordering algorithm is thus to find small vertex separators that balance the remaining subgraphs as evenly as possible. Most often,
vertex separators are computed by using direct heuristics [28, 37], or from edge
separators [48, and included references] by minimum cover techniques [9, 30], but
other techniques such as spectral vertex partitioning have also been used [49].
Provided that good vertex separators are found, the nested dissection algorithm
produces orderings which, both in terms of fill-in and operation count, compare
favorably [19, 31, 46] to the ones obtained with the minimum degree algorithm [39].
Moreover, the elimination trees induced by nested dissection are broader, shorter,
and better balanced, and therefore exhibit much more concurrency in the context
of parallel Cholesky factorization [3, 14, 15, 19, 46, 53, and included references].
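The numbering principle of nested dissection can be illustrated by the following self-contained C example, which orders a chain of 15 vertices. The middle vertex of a chain is its obvious separator, standing in here for the graph separation methods of section 3.2.6, and small subchains are ordered directly, standing in for the minimum degree method of the hybrid algorithm:

#include <stdio.h>

static int permnum;                       /* Next highest index to assign */

static void
orderChain (int * permtab, int first, int last)
{
  int mid;

  if (first > last)                       /* Empty subchain                 */
    return;
  if (last - first < 3) {                 /* Small subgraphs: order locally */
    while (last >= first)
      permtab[last --] = permnum --;
    return;
  }
  mid = (first + last) / 2;               /* Middle vertex separates chain    */
  permtab[mid] = permnum --;              /* Separator gets the highest index */
  orderChain (permtab, first, mid - 1);   /* Recurse on both separated parts  */
  orderChain (permtab, mid + 1, last);
}

int
main (void)
{
  int permtab[15];
  int i;

  permnum = 14;                           /* Highest index for 15 vertices */
  orderChain (permtab, 0, 14);
  for (i = 0; i < 15; i ++)
    printf ("vertex %2d -> index %2d\n", i, permtab[i]);
  return (0);
}

Since separator vertices always receive indices higher than those of the two parts they separate, no fill can link unknowns located on opposite sides of a separator.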
3.2.3 Hybridization
Due to their complementary nature, several schemes have been proposed to
hybridize the two methods [28, 34, 46]. However, to our knowledge, only loose
couplings have been achieved: incomplete nested dissection is performed on the
graph to order, and the resulting subgraphs are passed to some minimum degree
algorithm. As a result, the minimum degree algorithm does not have exact degree values for all of the boundary vertices of the subgraphs, leading to a misbehavior of the vertex selection process.
Our ordering program implements a tight coupling of the nested dissection and minimum degree algorithms, which allows each of them to take advantage of the information computed by the other. First, the nested dissection algorithm provides exact degree values for the boundary vertices of the subgraphs passed to the minimum degree algorithm (called halo minimum degree, since it has a partial visibility of the neighborhood of the subgraph). Second, the minimum degree algorithm returns the assembly tree that it computes for each subgraph, thus allowing for supervariable amalgamation, in order to obtain column-blocks of a size suitable for BLAS3 block computations.
As for our mapping program, it is possible to combine ordering methods into
ordering strategies, which allow the user to select the proper methods with respect
to the characteristics of the subgraphs.
The ordering program is completely parametrized by its ordering strategy. The
nested dissection method allows the user to take advantage of all of the graph
partitioning routines that have been developed in the earlier stages of the Scotch
project. Internal ordering strategies for the separators are relevant in the case of
sequential or parallel [20, 50, 51, 52] block solving, to select ordering algorithms
that minimize the number of extra-diagonal blocks [7], thus allowing for efficient
use of BLAS3 primitives, and to reduce inter-processor communication.
3.2.4 Performance criteria
The quality of orderings is evaluated with respect to several criteria. The first
one, NNZ, is the number of non-zero terms in the factored reordered matrix. The
second one, OPC, is the operation count, that is the number of arithmetic operations
required to factor the matrix. The operation count that we have considered takes into account all operations (additions, subtractions, multiplications, divisions) required by Cholesky factorization, except square roots; it is equal to
\[
\mathrm{OPC} = \sum_{c} n_c^2 ,
\]
where $n_c$ is the number of non-zeros of column $c$ of the factored matrix, diagonal included.
A third criterion for quality is the shape of the elimination tree: concurrency in parallel solving is all the higher as the elimination tree is broad and short. To measure its quality, several parameters can be defined: $h_{\min}$, $h_{\max}$, and $h_{\mathrm{avg}}$ denote the minimum, maximum, and average heights of the tree, respectively (we do not consider as leaves the disconnected vertices that are present in some meshes, since they do not participate in the solving process), and $h_{\mathrm{dlt}}$ is the variance, expressed as a percentage of $h_{\mathrm{avg}}$. Since small separators result in small chains in the elimination tree, $h_{\mathrm{avg}}$ should also indirectly reflect the quality of separators.
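For instance, given the column counts $n_c$ of the factored matrix, NNZ and OPC can be computed as in the following hypothetical C helper:

/* Hypothetical helper: compute NNZ and OPC from the column counts of
   the factored matrix. coltab[c] holds n_c, the number of non-zeros of
   column c of the factor, diagonal included.                           */

void
factorStats (const long * coltab, int colnbr, long * nnzptr, long * opcptr)
{
  long nnzval = 0;
  long opcval = 0;
  int  c;

  for (c = 0; c < colnbr; c ++) {
    nnzval += coltab[c];                  /* NNZ: non-zeros of the factor */
    opcval += coltab[c] * coltab[c];      /* OPC: sum of the squared n_c  */
  }
  *nnzptr = nnzval;
  *opcptr = opcval;
}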
3.2.5 Ordering methods
The core of our ordering algorithm uses graph ordering methods as black boxes,
which allows the orderer to run any type of ordering method. In addition to yielding
orderings of the subgraphs that are passed to them, these methods may compute
column block partitions of the subgraphs, which are recorded in a separate tree structure. The currently implemented graph ordering methods are listed below.
Halo approximate minimum degree
The halo approximate minimum degree method [47] is an improvement of
the approximate minimum degree [1] algorithm, suited for use on subgraphs
produced by nested dissection methods. Its interest compared to classical minimum degree algorithms is that boundary vertices are processed using their
real degree in the global graph rather than their (much smaller) degree in the
subgraph, resulting in smaller fill-in and operation count. This method also
implements amalgamation techniques that result in efficient block computa-
tions in the factoring and the solving processes.
Halo approximate minimum fill
The halo approximate minimum fill method is a variant of the halo approxi-
mate minimum degree algorithm, where the criterion to select the next vertex
to permute is not based on its current estimated degree but on the minimization of the induced fill.
Graph compression
The graph compression method [2] merges cliques of vertices into single nodes, so as to speed up the ordering of the compressed graph. It also results in some improvement of the quality of separators, especially for stiffness matrices.
Gibbs-Poole-Stockmeyer
This method is mainly used on separators to reduce the number and extent
of extra-diagonal blocks.
Simple method
Vertices are ordered consecutively, in the same order as they are stored in the
graph. This is the fastest method to use on separators when the shape of
extra-diagonal structures is not a concern.
Nested dissection
Incomplete nested dissection method. Separators are computed recursively on
subgraphs, and specific ordering methods are applied to the separators and
to the resulting subgraphs (see sections 3.2.2 and 3.2.3).

3.2.6 Graph separation methods
The core of our incomplete nested dissection algorithm uses graph separation
methods as black boxes. It allows the orderer to run any type of graph separation
method compatible with our criteria for quality, that is, reducing the size of the
vertex separator while maintaining the loads of the separated parts within some
user-specified tolerance. Separation jobs maintain an internal image of the current
vertex separator, indicating for every vertex of the job whether it is currently
assigned to one of the two parts, or to the separator. It is therefore possible to
apply several different methods in sequence, each one starting from the result of
the previous one, and to select the methods with respect to the job characteristics,
thus enabling the definition of separation strategies.
The currently implemented graph separation methods are listed below.
Fiduccia-Mattheyses
This is a vertex-oriented version of the original, edge-oriented, Fiduccia-Mattheyses heuristics described in section 3.1.6.
Greedy graph growing
This is a vertex-oriented version of the edge-oriented greedy graph growing algorithm described in section 3.1.6.
Multi-level
This is a vertex-oriented version of the edge-oriented multi-level algorithm described in section 3.1.6.
Thinner
This greedy algorithm refines the current separator by removing all of the
exceeding vertices, that is, vertices that do not have neighbors in both parts.
It is provided as a simple gradient refinement algorithm for the multi-level
method, and is clearly outperformed by the Fiduccia-Mattheyses algorithm.
Vertex cover
This algorithm computes a vertex separator by first computing an edge sepa-
rator, that is, a bipartition of the graph, and then turning it into a vertex sep-
arator by using the method proposed by Pothen and Fang [48]. This method
requires the computation of maximal matchings in the bipartite graphs as-
sociated with the edge cuts, which are built using Duff’s variant [9] of the Hopcroft and Karp algorithm [30]. Edge separators are computed by using a bipartitioning strategy, which can use any of the graph bipartitioning methods described in section 3.1.6.
4 Updates
4.1 Changes from version 4.0
Scotch has gone parallel with the release of PT-Scotch, the Parallel Threaded
Scotch. People interested in these parallel routines should refer to the PT-Scotch
and libScotch 5.1 User’s Guide [43], which extends this manual.
A compatibility library has been developed to allow users to try and use Scotch
in programs that were designed to use MeTiS. Please refer to Section 7.14 for more
information.

Scotch can now handle compressed streams on the fly, in several widely used formats such as gzip, bzip2 or lzma. Please refer to Section 6.2 for more information.
4.2 Changes from version 5.0
A new integer index type has been created in the Fortran interface, to address array
indices larger than the maximum value which can be stored in a regular integer.
Please refer to Section 8.3 for more information.
5 Files and data structures
For the sake of portability, readability, and reduction of storage space, all the data
files shared by the different programs of the Scotch project are coded in plain
ASCII text exclusively. Although we may speak of “lines” when describing file for-
mats, text-formatting characters such as newlines or tabulations are not mandatory,
and are not taken into account when files are read. They are only used to provide
better readability and understanding. Whenever numbers are used to label objects, and unless explicitly stated, numberings always start from zero, not one.
5.1 Graph files
Graph files, which usually end in “.grf” or “.src”, describe valuated graphs, which
can be valuated process graphs to be mapped onto target architectures, or graphs
representing the adjacency structures of matrices to order.
Graphs are represented by means of adjacency lists: the definition of each
vertex is accompanied by the list of all of its neighbors, i.e. all of its adjacent arcs.
Therefore, the overall number of edge data is twice the number of edges.
Version 3.3 introduced a new file format, referred to as the “new-style” file format, which replaces the previous, “old-style”, file format. The two advantages of the new-style format over its predecessor are its greater compactness, which results in shorter I/O times, and its ability to handle easily graphs output by C or by Fortran programs.
Starting from version 4.0, only the new format is supported. To convert
remaining old-style graph files into new-style graph files, one should get version 3.4
of the Scotch distribution, which comprises the scv file converter, and use it to
produce new-style Scotch graph files from the old-style Scotch graph files which
it is able to read. See section 6.3.5 for a description of gcv, formerly called scv.
The first line of a graph file holds the graph file version number, which is currently 0. The second line holds the number of vertices of the graph (referred to as vertnbr in libScotch; see for instance Figure 16 for a detailed example), followed by its number of arcs (somewhat inappropriately called edgenbr, as it is in fact equal to twice the actual number of edges). The third line holds two figures: the graph base index value (baseval), and a numeric flag.
The graph base index value records the value of the starting index used to
describe the graph; it is usually 0 when the graph has been output by C programs,
and 1 for Fortran programs. Its purpose is to ease the manipulation of graphs within
each of these two environments, while providing compatibility between them.

The numeric flag, similar to the one used by the Chaco graph format [24], is made of three decimal digits. A non-zero value in the units digit indicates that vertex weights are provided. A non-zero value in the tens digit indicates that edge weights are provided. A non-zero value in the hundreds digit indicates that vertex labels are provided; if this is the case, vertices can be stored in any order in the file; else, natural order is assumed, starting from the graph base index. For instance, a flag of 011 indicates that both vertex and edge weights are provided, but no vertex labels.
This header data is then followed by as many lines as there are vertices in the
graph, that is, vertnbr lines. Each of these lines begins with the vertex label,
if necessary, the vertex load, if necessary, and the vertex degree, followed by the
description of the arcs. An arc is defined by the load of the edge, if necessary, and
by the label of its other end vertex. The arcs of a given vertex can be provided
in any order in its neighbor list. If vertex labels are provided, vertices can also be
stored in any order in the file.
Figure 4 shows the contents of a graph file modeling a cube with unity vertex
and edge weights and base 0.
0
8 24
0 000
3 4 2 1
3 5 3 0
3 6 0 3
3 7 1 2
3 0 6 5
3 1 7 4
3 2 4 7
3 3 5 6
Figure 4: Graph file representing a cube.
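Assuming this file has been saved under the hypothetical name cube.grf, it can be loaded into a libScotch graph object as sketched below; SCOTCH_graphLoad and the other routines used here are described in section 7.5, and the program must be linked against the libScotch library (see section 7.1.3):

#include <stdio.h>
#include "scotch.h"

/* Minimal sketch: load the cube graph of Figure 4 and print its size.
   Error handling is reduced to bare exits.                             */

int
main (void)
{
  SCOTCH_Graph grafdat;                   /* Graph object               */
  SCOTCH_Num   vertnbr;                   /* Number of vertices         */
  SCOTCH_Num   edgenbr;                   /* Number of arcs (2 x edges) */
  FILE *       stream;

  if (SCOTCH_graphInit (&grafdat) != 0)
    return (1);
  if ((stream = fopen ("cube.grf", "r")) == NULL)
    return (1);
  if (SCOTCH_graphLoad (&grafdat, stream, -1, 0) != 0) /* -1: keep the base */
    return (1);                                        /* value of the file */
  fclose (stream);

  SCOTCH_graphSize (&grafdat, &vertnbr, &edgenbr);
  printf ("%ld vertices, %ld arcs\n", (long) vertnbr, (long) edgenbr);

  SCOTCH_graphExit (&grafdat);
  return (0);
}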
5.2 Mesh files
Mesh files, which usually end in “.msh”, describe valuated meshes, made of elements
and nodes, the elements of which can be mapped onto target architectures, and the
nodes of which can be reordered.
Meshes are bipartite graphs, in the sense that every element is connected to the
nodes that it comprises, and every node is connected to the elements to which it
belongs. No edge connects any two element vertices, nor any two node vertices.
One can also think of meshes as hypergraphs, such that nodes are the vertices
of the hypergraph and elements are hyper-edges which connect multiple nodes, or
reciprocally such that elements are the vertices of the hypergraph and nodes are
hyper-edges which connect multiple elements.
Since meshes are graphs, the structure of mesh files closely resembles that of the graph files described above in section 5.1, and differs only by its header, which indicates which of the vertices are node vertices and which are element vertices.
The first line of a mesh file holds the mesh file version number, which is currently
1. Graph and mesh version numbers will always differ, which enables application
programs to accept both file formats and adapt their behavior according to the
type of input data. The second line holds the number of elements of the mesh
(velmnbr), followed by its number of nodes (vnodnbr), and by its overall number of
arcs (edgenbr, that is, twice the number of edges which connect elements to nodes
and vice-versa).