Parallel Functional Models
==========================
The world consists of expressions to be evaluated.
Handouts:
Hudak: Para-Functional Programming
Schreiner: A Para-Functional Programming Interface ...
Program = Set of (mathematical) function definitions
qsort :: [Int] -> [Int]
qsort([]) = []
qsort(h:t) = qsort(filter(t, fun(x)(x=h)))
filter :: ([Int], Int->Bool) -> [Int]
filter([], _) = []
filter(h:t, p) = h:filter(t, p), if p(h)
filter(h:t, p) = filter(t, p), otherwise
plus expression to be evaluated
> qsort([7,5,9,6,1])
[1,5,6,7,9]
Mathematical Basis
------------------
Syntactic sugar for term lambda-calculus
(lambda l. ...)(...)
x variable
(lambda x. T) abstraction
T T application
(lambda x. T1)T2 beta reduction
=>
T1[T2/x]
Church-Rosser property
Any terminating sequence of reductions with
same start term yields same result term
Corollary
If multiple redices (reducible subterms) exist,
they may be evaluated in any order, also *simultaneously*
Relationship to Imperative Programming
--------------------------------------
Functional programming is programming without side-effects:
evaluting f(E1, E2), the evaluation of E1 has *no* effect
on the evaluation of E2
Functional programming is programming without variables/assignments:
variable = box containing an object
assignment updates content of box
qsort(h:t) = qsort(a) ++ [h] ++ qsort(b) where
a = filter(t, fun(x)(x=h))
qsort(h:t) =
let
a = filter(t, fun(x)(x=h))
in
qsort(a) ++ [h] ++ qsort(b)
functional languages only have *constants8
Functional programming is single-assignment programming:
intlist qsort(const intlist l)
{
if (isempty(l)) return(nil);
const int h = head(l);
const intlist t = tail(l);
const intlist a = lesser(t, h)
const intlist b = a = greaterequal(t, h)
return(qsort(a) + list(h) + qsort(b));
}
each variable receives value only *once*
Functional programming is constructive/compositional programming
result is *constructed* of (*composed* from)
input arguments
f(x) =
if (term(x))
then x
else c(h(x), f(s(x)))
(or: f(update(x)))
[ in imperative languages, structures
are destructively transformed to yield result ]
f(x)
{
y = x;
while (!term(y))
update(y)
return(y)
}
Parallelism
-----------
Any subexpression may be evaluated sequentially
or by a parallel process without changing the result
Functional languages yield deterministic parallel programs!
[ advantage but also restriction:
no non-deterministic stream merge possible! ]
Evaluation Strategies
---------------------
Strict languages
in f(E1, E2) both and E1 and E2 are fully evaluated
(possibly concurrently) *before* f is executed.
f(E1, E2):
-> E1 -> E2 -> f
-> E1 -> f
E2
MultiLisp: pcall(f, E1, E2)
[ p1 = start(E1)
p2 = start(E2)
r1 = wait(p1)
r2 = wait(p2)
return f(r1, r2) ]
"Horizontal parallelism": parallel evaluation of function arguments
Non-strict languages
in f(E1, E2) f may be executed *before* both
E1 and E2 are evaluated
f(E1, E2):
-> E1
E2
f
MultiLisp: f(future(E1), future(E2))
[ p1 = start(E1)
p2 = start(E2)
return f(p1, p2) ]
"Vertical parallelism": parallel evaluation of a function parameter
with the application of the function
Lazy languages
in f(E1, E2) neither E1 nor E2 is evaluated
before it is *guaranteed* that the corresponding
result is required
f(E1, E2):
-> f -- -- ->
E1 E2
Special case of non-strict evaluation strategy
(sequential evaluation)
Dataflow model
--------------
Model for *data-driven* ("bottom-up")
parallel execution of functional program
Program is represented by a *dataflow graph*
- Directed acyclic graph (DAG)
- Nodes represent operations to be performed
- Arcs represent data dependencies
- output of each node is linked to the input of
each node that requires corresponding result
Example:
f(x) = g(y,z) where
y = h(x)
z = i(y, j(x))
f(x) = g(h(x), i(h(x), j(x)))
GRAPH
Program execution modeled by *tokens* flowing through graph
- each token represents corresponding program value
Each node obeyes *firing rule*:
- node may fire when tokens are available on all input arcs
- node consumes input tokens and reads their values
- node produces output token with corresponding result value
Concurrency:
Tokens flow simultaneously through graph,
nodes may fire independently and simultaneously.
Determinism:
Although many firing sequences are possible,
they all yield a result token with the same value.
Non-strictness:
Basic operations are strict (firing rule!)
User defined functions may be non-strict:
- input tokens enter graph individually
Above example:
i(x,y) = x*x
Data structures:
Tokens carry *references* to data structure
Select nodes extract components
SLIDE Figure 2.3
Recursion:
Program represented by finite set of DAGs
(one for each function definition)
Different invocations of same function
may be distinguished by attaching a different
color to each invocation and extend the
firing rule to only combine tokens of the same color
Typically used for parallel implementations of non-lazy
functional languages.
Parallel Graph Reduction
------------------------
Model for *demand-driven* ("top-down") parallel execution of
functional program
Program represented by a graph for the expression to be evaluated
Node = [ Function | Pointers to Argument Nodes ]
When node comes to be evaluated, parallel threads are started for the
evaluation of the function and for *some* of the argument graphs
(chosen by user annotation or compile-time analysis or run-time analysis)
- when thread requires value of unevaluated argument graph,
graph is evaluated sequentially
- when thread requires value of graph currently under evaluation
by another thread, it suspends on node and is awaked by that
thread when it has completed the evaluation
Typically used for parallel implementations of lazy
functional languages.
Selecting Expressions for Parallel Evaluation
---------------------------------------------
Crucial problem of parallel functional programming
Various strategies possible:
a) Let *any* expression represent a parallel task
Millions of tasks even for moderate-size programs
Slow execution
Large memory consumption
Not usable even with special hardware support (dataflow machines,
graph reduction machines)
b) Let *runtime system* decide which expressions to evaluate in parallel
Requires uniform representation of program expressions
Prevents compiler optimizations of sequential code
(each value potentially represents a parallel task)
c) Let *compiler* decide which expressions to evaluate in parallel
+: no problem with result semantics
(in contrast to automatic parallelization of imperative languages!)
-: still problem with termination semantics
(lazy languages, parallel implementation must terminate
when sequential one does, sophisticated analysis techniques required)
-: still not clear where to spark tasks for *efficient* parallel
execution
d) Let *programmer* decide which expressions to evaluate in parallel
=> para-functional programming
Para-Functional Programming
---------------------------
Functional language extensions for parallel execution
Imperative parallel constructs
Languages with a large functional subset but imperative features
(Lisp, ML) may be extended by parallel imperative language
constructs => parallel programming takes place on imperative level
SML Threads
let
val m = mutex()
val l = ref []
in
(fork (fn () => push(m,l));
fork (fn () => pop(m,l)))
Higher abstractions may be built on top of these features
(cf. section on Erlang)
Functional parallel constructs
Parallel constructs with functional semantics
(in non-pure functional languages); may change
result if applied improperly!
MultiLisp:
(pcall f e1 ... en)
evaluate e1 ... en in parallel and call
f with results
=> f(e1, ... en)
(future e)
create a parallel thread evaluating e and immediately
return a "future" for the result; whenever a thread
accesses the future, it is blocked until result becomes available
=> e
[ requires tag checking of all primitive operations ]
Data parallel constructs
Apply parallel operations on data aggregates
map f [l1,...,ln] => [f(l1),...,f(ln)]
[ f(x) | x <- l ]
(cf. section on data parallelism, CM-Lisp, xappings, FP, ...)
Parallel annotations
In pure functional languages, provide annotations that
tell system to evaluate in parallel; cannot change
result semantics (perhaps termination semantics)
parAlfl: placed expressions
E $on P
expression E is evaluated on processor P
(f(x) $on 0) + (g(x) $on 1)
(f(x) $on left($self)) +
(f(x) $on right($self))
Calban: process network language
a+b*c where
a = f(x)
b = g(x)
c = h(x)
moreover
(arc a b) and (arc b c)
expressions denoted by a,b,c shall be evaluated by
independent processes and shall be placed on corresponding
processors such that a is connected to b and b is connected to c
pH: parallel declarations
par {
a = f(x)
b = g(x)
}
All definitions are evaluated in parallel
pD: parallel type annotations
para l :: [Task[IntPoly]]
l = ....
individual list elements shall be computed by
parallel tasks
Algorithmic skeletons
Express program in terms of a set of predefined higher-order
functions that represent the structure of typical parallel
algorithms and that are efficiently implementable on parallel
architectures (cf. data-parallel constructs)
DivideAndConquer ::
(*a -> Bool, // is base case?
*a -> *b, // solve base case
*a -> [*a], // decompose problem
[*b] -> *b, // combine solutions
*a) -> *b // argument and result
QuickSort = DivideAndConquer isnil id split combine
...
QuckSort [3,2,1] => [1,2,3]
Single Assignment Languages
Functional languages in an "imperative disguise"
iD, Sisal
f(x1, x2) =
let s = 0.0
x = x1
in
while (x <= x2) do
next s = s + f(x)
next x = x + delta_x
finally s
Syntactic sugar for tail recursive function!
Parallelization concentrates on (but is not restricted to)
iteration constructs over data structures.
Examples:
--------
a) Streams (non-strict lists as communication channels)
producer(n) = [], if n <= 0
producer(n) = f(n):(producer(n-1)@), otherwise
consumer([]) = []
consumer(h:t) = g(x)+consumer(t)
consumer(l) where
l = producer(1000)@
Pipelined parallelism!
b) Efficient quicksort
quicksort(l) = qsort(l,[])
qsort([], r) = r
qsort(h:t, r) = qsort(a, (h:qsort(b, r))@)
where (a, b) = split(h, t)
split(e, []) = ([], [])
split(e, h:t) =
let (a,b) = split(e, t)
in if e < h
then (a, h:b)
else (h:a, b)
Tree of tasks with leaves performing the list concatenation!
c) Cyclic process structures
SLIDE Figure 8.1 (prime numbers)
With non-strict lists as communication "channels",
arbitrary process networks can be constructed!