# Introduction to Parallel Computing

Exercise 2 (Deadline: June 21)

## Wolfgang Schreiner

RISC-Linz

### May 14, 1999

Write on the Convex SPP a parallel (Fortran or C) program solving the "All
Pairs Shortest Path" problem.Take the sequential program on the course page
as the starting point and use the MPI library to develop the parallel program
in the message passing model.

- Compile the sequential program with
`-O2`

. Measure the execution
times of function `path`

for matrix sizes 256 *and* 384 and take
these times as the base times of all your comparisons with the parallel
program. Make sure that you apply the same optimizations to the sequential
code that you apply to the parallel code for a fair comparison.
- Write a MPI program that implements a row/column-oriented solution
organizing all processors in a ring (as for the matrix multiplication
algorithm) discussed in class. After each "communication round", the result
has to be
*gathered* and *scattered* for the next round.
For developing the MPI program

- call
`mpa -sc parallel`

to move to subcomplex "parallel",
- include
`/usr/convex/mpich/bin`

in your PATH,
- include
`/usr/convex/mpich/man`

in your MANPATH,
- compile the program with
`mpicc`

or `mpif77`

(`-O2`

),
- call
`mpirun -np ``n` `program`

to
run `program` with `n` processes.
- use
`MPI_Wtime`

to measure wall clock times,

Manual pages are available for `MPI`

, `mpicc`

, `mpif77`

,
`mpirun`

, `mpiclean`

, and for all the MPI library calls. Various
environment variables can be used to customize the execution (see the man
pages).
- Compile the program with
`-O2`

and and measure the execution times
for both input sizes and 1, 2, 4, 8, 12 processors.
- Determine the asymptotic
*isoefficiency function* for the algorithm
using *T(s)=s+1* as the time required for sending a message of *s* words
(assume that a scatter to *n* processors takes the same time as sending *n*
partial messages). By which factor should the dimension of the matrix grow to
maintain constant efficiency if we increase the processor number from 1 to 2,
4, 8, 12?
Test this claim by running benchmarks with 1, 2, 4, 8, 12 processors
increasing your matrix size (from 256 respectively 384) correspondingly.

- Write a report documenting your results including
- the source code,
- execution times (numerical tables and graphical diagrams),
- speedups (numerical tables and graphical diagrams),
- efficiencies (numerical tables and graphical diagrams),
- the isoefficiency analysis and the corresponding benchmarks,
- other findings, problems, comments, etc.