Go backward to 2 The System Model Go up to Top Go forward to 4 Correctness of the Algorithm |
We will extend the System interface to
SystemTn in N(such thatact
in Zn->B,chan
in Zn x Zn->Seq(MSG),term
in B)
term
initialized by
Init(...) :<=>signals termination of the system.
/\
- ...
- ~
term
Basic Idea The derivation of the termination detection algorithm is based on the following idea (see Figure 2): If Process 0 wants to detect termination, it sends a signal to process n-1. If an inactive process i > 0 receives the signal, it forwards the signal to process i-1. An active process keeps the signal until it becomes inactive. The signal therefore represents a "token" that circulates through the ring of processes.
We model this circulation by introducing a variable
sig
in Zn->B
initialized by
Init(...) :<=>
/\
- ...
- forall i in Zn: ~
sig
i
and by two actions
- Starti(...) :<=>
/\
- i = 0
sig
'
=sig
[i-n1 |-> true]- Forwardi(...) :<=>
/\
sig
'
= if i != 0 thensig
[i |-> false, i-n1 |-> true] elsesig
[i |-> false]
where a-nb denotes the difference modulo n, i.e., 0-n1 = n-1).
(Modeling token circulation by an array of boolean signals reflects the behavior of the distributed algorithm closer than modeling it by a single integer position that "automatically" guarantees token unicity.)
Figure 2: Token Circulation
However, above specification allows Process 0 to submit a new token before it has received the token it has previously submitted. We therefore introduce another variable
run
in B
that records whether there is a token in the system and then disables Start:
- Starti(...) :<=>
/\
- ...
- ~
run
/\run
'
- Forwardi(...) :<=>
/\
- ...
run
'
= if i != 0 thenrun
else false
However, when Process 0 finally receives the token, it can only deduce that each process has been inactive at some time in the past. Since a process mave have received a message after forwarding the token, it may have become active again. Only if the algorithm could maintain the invariant
forall k in Zn:sig
k =>
forall j in Zn: j > k => ~act
j,
Process 0 could deduce from the receipt of the token (i.e.,
sig
0) and its own
inactivity (i.e., ~
act
0) that all processes are
currently inactive.
Even if Process 0 receives the token under above invariant, there may be still messages pending in the network that can cause processes to become active again. We therefore need some mean to keep track of the number of messages pending in the network. Since a process only knows about the messages this it itself sends or receives, this can be only achieved by introducing a distributed counter
cnt
in Zn->Z
such that every process sending a message increases its counter and every process receiving a message decreases its counter (see Figure 3):
- Sendi, j(...) :<=>
/\
- ...
cnt
'
=cnt
[i |->cnt
i+1],- Receivei, j(...) :<=>
/\
- ...
cnt
'
=cnt
[j |->cnt
j-1]
Figure 3: Message Counters
Then clearly the total sum of the distributed counter equals the number of the messages in the network, i.e., the system maintains the invariant
sumi in Z_ncnt
i = sumi in Z_nsumj in Z_nlen(chan
i, j)
where len(ch
) := such l in N:
exists m0,
..., ml-1:
ch
= < m0, ...,
ml-1 > .
How can Process 0 learn about this sum? Apparently this is only possible, if the circulating token collects this information and delivers it to this process. We therefore introduce
tval
in Z
modeling the value carried by the token. Process 0 initializes the value to 0 when submitting the token; each forwarding process adds the value of its counter (see Figure 4):
- Starti(...) :<=>
/\
- ...
tval
'
= 0- Forwardi(...) :<=>
/\
- ...
tval
'
=tval
+cnt
i
Figure 4: Token Value
If the algorithm could maintain the invariant
forall k in Zn:sig
k =>
/\
- forall j in Zn: j > k => ~
act
jtval
= sumj in Z_n, j > kcnt
j
then Process 0 could on receipt of the token deduce from
~act
0 that all processes are inactive and from
cnt
0+tval
= 0 that no message is
in the network, i.e., that the system has terminated.
Unfortunately, the algorithm cannot maintain this invariant. We therefore have to disjoin the conclusion part (the "core") of the invariant with some weakening conditions such that the algorithm is able to maintain the weaker invariant but Process 0 can still conclude termination from the information stored locally and received by the token.
Figure 5: Invalidating the Core Invariant
Let us investigate how above invariant can be falsified by an action (i.e., the invariant holds in the state before the action but does not any more hold in the state after the action). Since all processes that have already forwarded the token are inactive, the only possibility is that such a process receives a message and becomes active again (see Figure 5). This however means that there must be still pending messages in the network. Since the invariant still holds before invalidation, a weaker version of the variant is
forall k in Zn:sig
k =>
\/
/\
- forall j in Zn: j > k => ~
act
jtval
= sumj in Z_n, j > kcnt
j- 0 <
tval
+sumj in Z_n, j <= kcnt
j
If Process 0 receives the token (i.e., sig
0
holds), it can still conclude termination from ~act
0
and cnt
0+tval
= 0.
Is it still possible for an action to falsify the weaker invariant? Clearly it can be invalidated, if a process that has not yet forwarded the token receives a message, which causes the summation term in above condition to drop by one (see Figure 6). A simple way to record whether such an incident has taken place is to associate to each process a marker that is set on message receipt. We therefore introduce
mark
in Zn->B
which is initialized to false in every position and which is set to true when a message is received:
- Init(...) :<=>
/\
- ...
- forall i in Zn: ~
mark
i- Receivei, j(...) :<=>
/\
- ...
mark
'
=mark
[j |-> true],
Then we can weaken the invariant further to
forall k in Zn:sig
k =>
\/
/\
- forall j in Zn: j > k => ~
act
jtval
= sumj in Z_n, j > kcnt
j- 0 <
tval
+sumj in Z_n, j <= kcnt
j- exists j in Zn: j <= k/\
mark
j
If Process 0 receives the token, i.e., sig
0
holds, it can still conclude termination from
~act
0,
cnt
0+tval
= 0, and
~mark
0.
Figure 6: Process Markers
Clearly this weakened invariant is still falsified when the marked process with the least index forwards the token (see Figure 7). Therefore the token itself has to record the information when it passes a marked process. We introduce a token marker
tmark
in B
which is initialized by Process 0 and forwarded by each process as
- Starti(...) :<=>
/\
- ...
- ~
tmark
'
- Forwardi(...) :<=>
/\
- ...
tmark
'
=tmark
\/mark
i
Then we can weaken the invariant further by
forall k in Zn:sig
k =>
\/
/\
- forall j in Zn: j > k => ~
act
jtval
= sumj in Z_n, j > kcnt
j- 0 <
tval
+sumj in Z_n, j <= kcnt
j- exists j in Zn: j <= k/\
mark
jtmark
This invariant cannot be falsified any more by the activity of any process.
Process 0 can then conclude termination from
sig
0, ~act
0,
cnt
0+tval
= 0,
cnt
0+tval
= 0, and
~tmark
. By definition of tmark
'
and
tval
'
, this can be stated as
Forwardi(...) :<=>
/\
- ...
term
'
<=> if i != 0 thenterm
else (~tmark
'
/\tval
'
= 0).
Figure 7: Token Marker
If Process 0 cannot yet conclude termination after a termination detection round, it may later start another round. Without resetting the process markers, however, such an attempt is doomed to fail. Fortunately, after it has marked the token, the process marker has fulfilled its task and can be reset:
Forwardi(...) :<=>
/\
- ...
mark
'
=mark
[i |-> false]
Since this action takes place simultaneously with token transmission (i.e.,
~sig
'
i holds), it cannot falsify the invariant.
If the system terminates during a termination detection round, Process 0 may not yet conclude termination at the end of this round. However, after this run, no process is active, there are no messages in the network any more, and the sum of the message counters equals 0. Still some processes may be marked; the next termination detection round may thus fail, too. Anyway, after this run all processes are unmarked, and the next termination detection round will succeed.
The complete algorithm is compiled in Specification 2. The existential quantifier
var v
in T: ...
introduces a local program variable v
in a formula like
(exists v in T: ...) introduces a local mathematical
variable v. The crucial difference between both kinds is that the
"variable" v
may have different values in different states of a
program behavior while the "constant" v has always the same value in
all states.
Specification 2: The Termination Detection Algorithm
act
in Zn->B,
chan
in Zn x Zn->Seq(MSG),
term
in B) :<=>
act
, in chan
, in term
, in run
,
in cnt
, in sig
, in mark
) :<=>
term
/\ ~run
cnt
i = 0/\ ~sig
i/\ ~mark
i,
a
, io chan
, io cnt
) :<=>
cnt
'
= cnt
[i |-> cnt
i+1],
act
,
io chan
, io cnt
, io mark
) :<=>
cnt
'
= cnt
[j |-> cnt
j-1]
mark
'
= mark
[j |-> true],
act
) :<=> ...(as in the basic system)
run
,
io sig
, io mark
,
io tval
, out tmark
) :<=>
run
run
'
sig
'
= sig
[i-n1 |-> true]
mark
'
= mark
[i |-> false]
tval
'
= 0
tmark
'
,
a
, in c
, io sig
,
io mark
, io tval
, io tmark
, io term
) :<=> (process i gets token, forwards it, or detects termination)
a
/\ sig
i
run
'
= if i != 0 then run
else false
sig
'
= if i != 0 then sig
[i |-> false,
i-n1 |-> true] else sig
[i |-> false]
mark
'
= mark
[i |-> false]
tval
'
= tval
+c
tmark
'
= tmark
\/ mark
i
term
'
<=> if i != 0 then term
else (~tmark
'
/\ tval
'
= 0),
act
, io chan
,
io term
, io run
,
io cnt
, io sig
, io mark
, io tval
,
io tmark
) :<=>
act
i, io chan
, io cnt
)
act
, io chan
, io cnt
, io mark
)
act
)
run
,
io sig
, io mark
,
out tval
, out tmark
)
act
i,
in cnt
i, io sig
,
io mark
, io tval
, io tmark
, io term
):
var
(internal variables of the algorithm)
run
in B,
cnt
in Zn->Z,
sig
in Zn->B,
mark
in Zn->B,
tval
in Z,
tmark
in B:
act
, in chan
, in term
, in run
,
in cnt
, in sig
, in mark
)
act
, io chan
,
io term
, io run
,
io cnt
, io sig
, io mark
, io tval
,
io tmark
)]