Memory Access
Row algorithm has computation/memory ratio $O(n/p)$.
Block algorithm has computation/memory ratio $O(n/\sqrt{p})$.
Block algorithm has higher data locality.
Cache performance of the algorithm improves.
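The two ratios can be checked numerically. The sketch below assumes a partitioning that this slide does not spell out. In the row algorithm, each of the p processes computes n/p rows of C and reads its n/p rows of A plus all of B. In the block algorithm, each process computes one (n/√p) × (n/√p) block of C from √p blocks of A and √p blocks of B. Under these assumptions the ratios come out as about n/(p+1) and n/(2√p). The function names `row_ratio` and `block_ratio` are hypothetical.

```python
import math


def row_ratio(n: int, p: int) -> float:
    """Computation/memory ratio of the row algorithm (assumed partitioning).

    Each of the p processes computes n/p rows of C = A*B:
    n**3/p multiply-adds, reading n**2/p elements of A and all n**2 of B.
    """
    flops = n**3 / p
    accesses = n**2 / p + n**2
    return flops / accesses


def block_ratio(n: int, p: int) -> float:
    """Computation/memory ratio of the block algorithm (assumed partitioning).

    Each process computes one (n/sqrt(p)) x (n/sqrt(p)) block of C,
    reading sqrt(p) blocks of A and sqrt(p) blocks of B (n**2/p elements each).
    """
    q = math.isqrt(p)
    if q * q != p:
        raise ValueError("p must be a perfect square for a sqrt(p) x sqrt(p) grid")
    flops = n**3 / p
    accesses = 2 * q * n**2 / p
    return flops / accesses
```

For n = 1024 and p = 16, `row_ratio` gives 1024/17 ≈ 60.2 and `block_ratio` gives 128. The block algorithm therefore performs roughly twice as many operations per memory access, and the gap widens as p grows.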
For large input matrices:
Row algorithm: subsequent accesses to $B$ cannot be cached $\rightarrow O(n^3/p)$ memory operations.
Block algorithm: subsequent accesses to $B$ can be cached $\rightarrow O(n^3/(pc))$ memory operations.
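The difference in access patterns can be sketched sequentially. `matmul_rows` sweeps through all of B for every row of C. `matmul_blocked` works on bs × bs tiles, so a tile of B is reused for several rows of A before the loop moves on. This is a sequential loop-tiling analogue of the block algorithm, not the parallel algorithm itself. Both function names and the tile size `bs` are hypothetical. In practice bs would be chosen so that the working tiles fit in the cache. Python itself will not show the cache effect, so the point here is only the loop order.

```python
def matmul_rows(A, B):
    """Row-oriented order: each row i of C sweeps through all of B."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Ci = C[i]
        for k in range(n):
            a = A[i][k]
            Bk = B[k]
            for j in range(n):
                Ci[j] += a * Bk[j]
    return C


def matmul_blocked(A, B, bs):
    """Block-oriented (tiled) order with hypothetical tile size bs.

    C[ii:ii+bs, jj:jj+bs] += A[ii:ii+bs, kk:kk+bs] * B[kk:kk+bs, jj:jj+bs],
    so the B tile is reused for all bs rows of the current A tile.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    Ci = C[i]
                    for k in range(kk, min(kk + bs, n)):
                        a = A[i][k]
                        Bk = B[k]
                        for j in range(jj, min(jj + bs, n)):
                            Ci[j] += a * Bk[j]
    return C
```

Both functions compute the same product. The `min(..., n)` bounds also handle matrix sizes that are not a multiple of bs.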
This is especially important for distributed shared memory architectures.
Increasing locality reduces the average memory latency.
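A simple model makes the latency argument concrete. Suppose a fraction h of accesses is served locally (from the cache or local memory) at cost t_local, and the rest go to remote memory at cost t_remote. The average latency is then h · t_local + (1 - h) · t_remote. This standard textbook model is an assumption here and is not taken from the slide. The function name and the numbers below are hypothetical.

```python
def average_latency(hit_fraction: float, t_local: float, t_remote: float) -> float:
    """Average access latency when `hit_fraction` of accesses are served
    locally (cache or local memory) and the remainder go to remote memory."""
    if not 0.0 <= hit_fraction <= 1.0:
        raise ValueError("hit_fraction must lie in [0, 1]")
    return hit_fraction * t_local + (1.0 - hit_fraction) * t_remote
```

With t_local = 1 and t_remote = 100 cycles, raising the local fraction from 0.9 to 0.99 lowers the average latency from about 10.9 to about 1.99 cycles. Even modest gains in locality pay off when remote accesses are expensive.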
Author: Wolfgang Schreiner
Last modification: November 15, 1996