Parallel Functional Models ========================== The world consists of expressions to be evaluated. Handouts: Hudak: Para-Functional Programming Schreiner: A Para-Functional Programming Interface ... Program = Set of (mathematical) function definitions qsort :: [Int] -> [Int] qsort([]) = [] qsort(h:t) = qsort(filter(t, fun(x)(x=h))) filter :: ([Int], Int->Bool) -> [Int] filter([], _) = [] filter(h:t, p) = h:filter(t, p), if p(h) filter(h:t, p) = filter(t, p), otherwise plus expression to be evaluated > qsort([7,5,9,6,1]) [1,5,6,7,9] Mathematical Basis ------------------ Syntactic sugar for term lambda-calculus (lambda l. ...)(...) x variable (lambda x. T) abstraction T T application (lambda x. T1)T2 beta reduction => T1[T2/x] Church-Rosser property Any terminating sequence of reductions with same start term yields same result term Corollary If multiple redices (reducible subterms) exist, they may be evaluated in any order, also *simultaneously* Relationship to Imperative Programming -------------------------------------- Functional programming is programming without side-effects: evaluting f(E1, E2), the evaluation of E1 has *no* effect on the evaluation of E2 Functional programming is programming without variables/assignments: variable = box containing an object assignment updates content of box qsort(h:t) = qsort(a) ++ [h] ++ qsort(b) where a = filter(t, fun(x)(x=h)) qsort(h:t) = let a = filter(t, fun(x)(x=h)) in qsort(a) ++ [h] ++ qsort(b) functional languages only have *constants8 Functional programming is single-assignment programming: intlist qsort(const intlist l) { if (isempty(l)) return(nil); const int h = head(l); const intlist t = tail(l); const intlist a = lesser(t, h) const intlist b = a = greaterequal(t, h) return(qsort(a) + list(h) + qsort(b)); } each variable receives value only *once* Functional programming is constructive/compositional programming result is *constructed* of (*composed* from) input arguments f(x) = if (term(x)) then x else c(h(x), f(s(x))) (or: f(update(x))) [ in imperative languages, structures are destructively transformed to yield result ] f(x) { y = x; while (!term(y)) update(y) return(y) } Parallelism ----------- Any subexpression may be evaluated sequentially or by a parallel process without changing the result Functional languages yield deterministic parallel programs! [ advantage but also restriction: no non-deterministic stream merge possible! ] Evaluation Strategies --------------------- Strict languages in f(E1, E2) both and E1 and E2 are fully evaluated (possibly concurrently) *before* f is executed. f(E1, E2): -> E1 -> E2 -> f -> E1 -> f E2 MultiLisp: pcall(f, E1, E2) [ p1 = start(E1) p2 = start(E2) r1 = wait(p1) r2 = wait(p2) return f(r1, r2) ] "Horizontal parallelism": parallel evaluation of function arguments Non-strict languages in f(E1, E2) f may be executed *before* both E1 and E2 are evaluated f(E1, E2): -> E1 E2 f MultiLisp: f(future(E1), future(E2)) [ p1 = start(E1) p2 = start(E2) return f(p1, p2) ] "Vertical parallelism": parallel evaluation of a function parameter with the application of the function Lazy languages in f(E1, E2) neither E1 nor E2 is evaluated before it is *guaranteed* that the corresponding result is required f(E1, E2): -> f -- -- -> E1 E2 Special case of non-strict evaluation strategy (sequential evaluation) Dataflow model -------------- Model for *data-driven* ("bottom-up") parallel execution of functional program Program is represented by a *dataflow graph* - Directed acyclic graph (DAG) - Nodes represent operations to be performed - Arcs represent data dependencies - output of each node is linked to the input of each node that requires corresponding result Example: f(x) = g(y,z) where y = h(x) z = i(y, j(x)) f(x) = g(h(x), i(h(x), j(x))) GRAPH Program execution modeled by *tokens* flowing through graph - each token represents corresponding program value Each node obeyes *firing rule*: - node may fire when tokens are available on all input arcs - node consumes input tokens and reads their values - node produces output token with corresponding result value Concurrency: Tokens flow simultaneously through graph, nodes may fire independently and simultaneously. Determinism: Although many firing sequences are possible, they all yield a result token with the same value. Non-strictness: Basic operations are strict (firing rule!) User defined functions may be non-strict: - input tokens enter graph individually Above example: i(x,y) = x*x Data structures: Tokens carry *references* to data structure Select nodes extract components SLIDE Figure 2.3 Recursion: Program represented by finite set of DAGs (one for each function definition) Different invocations of same function may be distinguished by attaching a different color to each invocation and extend the firing rule to only combine tokens of the same color Typically used for parallel implementations of non-lazy functional languages. Parallel Graph Reduction ------------------------ Model for *demand-driven* ("top-down") parallel execution of functional program Program represented by a graph for the expression to be evaluated Node = [ Function | Pointers to Argument Nodes ] When node comes to be evaluated, parallel threads are started for the evaluation of the function and for *some* of the argument graphs (chosen by user annotation or compile-time analysis or run-time analysis) - when thread requires value of unevaluated argument graph, graph is evaluated sequentially - when thread requires value of graph currently under evaluation by another thread, it suspends on node and is awaked by that thread when it has completed the evaluation Typically used for parallel implementations of lazy functional languages. Selecting Expressions for Parallel Evaluation --------------------------------------------- Crucial problem of parallel functional programming Various strategies possible: a) Let *any* expression represent a parallel task Millions of tasks even for moderate-size programs Slow execution Large memory consumption Not usable even with special hardware support (dataflow machines, graph reduction machines) b) Let *runtime system* decide which expressions to evaluate in parallel Requires uniform representation of program expressions Prevents compiler optimizations of sequential code (each value potentially represents a parallel task) c) Let *compiler* decide which expressions to evaluate in parallel +: no problem with result semantics (in contrast to automatic parallelization of imperative languages!) -: still problem with termination semantics (lazy languages, parallel implementation must terminate when sequential one does, sophisticated analysis techniques required) -: still not clear where to spark tasks for *efficient* parallel execution d) Let *programmer* decide which expressions to evaluate in parallel => para-functional programming Para-Functional Programming --------------------------- Functional language extensions for parallel execution Imperative parallel constructs Languages with a large functional subset but imperative features (Lisp, ML) may be extended by parallel imperative language constructs => parallel programming takes place on imperative level SML Threads let val m = mutex() val l = ref [] in (fork (fn () => push(m,l)); fork (fn () => pop(m,l))) Higher abstractions may be built on top of these features (cf. section on Erlang) Functional parallel constructs Parallel constructs with functional semantics (in non-pure functional languages); may change result if applied improperly! MultiLisp: (pcall f e1 ... en) evaluate e1 ... en in parallel and call f with results => f(e1, ... en) (future e) create a parallel thread evaluating e and immediately return a "future" for the result; whenever a thread accesses the future, it is blocked until result becomes available => e [ requires tag checking of all primitive operations ] Data parallel constructs Apply parallel operations on data aggregates map f [l1,...,ln] => [f(l1),...,f(ln)] [ f(x) | x <- l ] (cf. section on data parallelism, CM-Lisp, xappings, FP, ...) Parallel annotations In pure functional languages, provide annotations that tell system to evaluate in parallel; cannot change result semantics (perhaps termination semantics) parAlfl: placed expressions E $on P expression E is evaluated on processor P (f(x) $on 0) + (g(x) $on 1) (f(x) $on left($self)) + (f(x) $on right($self)) Calban: process network language a+b*c where a = f(x) b = g(x) c = h(x) moreover (arc a b) and (arc b c) expressions denoted by a,b,c shall be evaluated by independent processes and shall be placed on corresponding processors such that a is connected to b and b is connected to c pH: parallel declarations par { a = f(x) b = g(x) } All definitions are evaluated in parallel pD: parallel type annotations para l :: [Task[IntPoly]] l = .... individual list elements shall be computed by parallel tasks Algorithmic skeletons Express program in terms of a set of predefined higher-order functions that represent the structure of typical parallel algorithms and that are efficiently implementable on parallel architectures (cf. data-parallel constructs) DivideAndConquer :: (*a -> Bool, // is base case? *a -> *b, // solve base case *a -> [*a], // decompose problem [*b] -> *b, // combine solutions *a) -> *b // argument and result QuickSort = DivideAndConquer isnil id split combine ... QuckSort [3,2,1] => [1,2,3] Single Assignment Languages Functional languages in an "imperative disguise" iD, Sisal f(x1, x2) = let s = 0.0 x = x1 in while (x <= x2) do next s = s + f(x) next x = x + delta_x finally s Syntactic sugar for tail recursive function! Parallelization concentrates on (but is not restricted to) iteration constructs over data structures. Examples: -------- a) Streams (non-strict lists as communication channels) producer(n) = [], if n <= 0 producer(n) = f(n):(producer(n-1)@), otherwise consumer([]) = [] consumer(h:t) = g(x)+consumer(t) consumer(l) where l = producer(1000)@ Pipelined parallelism! b) Efficient quicksort quicksort(l) = qsort(l,[]) qsort([], r) = r qsort(h:t, r) = qsort(a, (h:qsort(b, r))@) where (a, b) = split(h, t) split(e, []) = ([], []) split(e, h:t) = let (a,b) = split(e, t) in if e < h then (a, h:b) else (h:a, b) Tree of tasks with leaves performing the list concatenation! c) Cyclic process structures SLIDE Figure 8.1 (prime numbers) With non-strict lists as communication "channels", arbitrary process networks can be constructed!