Synchronous versus asynchronous programming

(15 nov 2001)

Synchronous: communication between processes is unbuffered, and processes wait until the data between them has been transferred.

Asynchronous: communication between processes is buffered using buffers of unlimited size. The sender never waits after sending data. The receiver only waits when its buffer is empty.

There is a variety of asynchronous protocols which vary in strictness of message ordering. Some address ordering between a pair of processes, some between multiple processes. However, even when one of the strictest, and hence easiest, asynchronous ordering is used (such as Causal ordering (CO)), asynchronous programming is still hard. However, some degree of asynchrony is required for efficient operation of distributed systems.

In fact, asynchrony is more difficult than most people seem to realise. Various systems use some degree of asynchrony, sometimes without documenting in a sufficiently precise manner what model is used. Applications are written without a clear conception of the underlying communication model. For example, some GUI applications are easily crashed by operating their controls rapidly enough. My own hypothesis is that in many of these cases, a race condition occurs somewhere that crashes the program. Several VR models even exist that use a highly asynchronous model without telling exacly how it works. Take the following example from the DIVE manual:

`[...] you can't be sure that a new value of a property is updated immediately in all connected databases. Therefore, you should take care when writing applications that changes the contents of the database often. If you're changing the database faster than the changes can be distributed you'll probably end up with a paradox situation sooner or later. This is a problem that's inherent in all programs that use the peer-to-peer communication model and cannot really be avoided. Therefore, if you need to make frequent changes to the contents of the world database (or if you're on a slow network), DIVE may not be the appropriate platform to use.'

Instead of explaining precisely how to deal with DIVE's asynchronous model, it is advised that one does not change the database `too' frequently, otherwise, the program `probably' ends up in an illegal state `sooner or later'. So, asynchronous programming is explained as being more or less the same as synchronous programming, instead of explaining the differences. For example, what does happen if changes are made too frequently? Are they replicated in the order they were made? What if multiple parallel nodes update concurrently? What if a node reacts to the update of another?

People can often get away with not understanding the precise communication model because in many cases, it is not fatal to an application that some of its parts do not function correctly. If a screen update goes wrong, the result is only a badly-rendered screen, rather than a crashed program. If a mouse event does not arrive, it is noticed by the user soon enough, who will click again to correct the error, probably thinking it was his/her fault. Users are sometimes so accustomed to such small failures that they correct them without even noticing anymore.

But this is not acceptable if reliable behaviour is required. One would perhaps want the easier synchronous model in some cases, and an asynchronous model only if necessary. However, mixing the two to form a uniform model is not easy. I will try to explain this below by discussing several alternative communication models.

Alternative models

The algorithms below are written in Java-like pseudocode (using the same thread model as Java). The following methods are assumed to be present:

The ordered message queue model.

Each process has a queued channel to the other process. When a process sends a message, it is put in its output queue until the other process is ready to handle the message. For simplicity, assume there is no channel latency. Assume the messages are handled in the order they arrive. Each process consists of an event handler loop:
mainloop() {
	while (TRUE) {
		message = waitformessage();
		switch(message.type) {
			.....
			case msg_type_i :
				/* some code, processing message.data,
				 * possibly sending messages */
				.....
				send_async(somerecipient,somemessage);
				.....
			break;
			.....
		}
	}
}
This form assumes that a message belongs to one of n discrete types, which may for example stand for methods or procedures.

Nothing will happen if all processes just wait for message. There should be one or more sources that generate messages spontaneously (such as timer or user event messages).

The synchronous model

When a process sends a message to the other, it will wait until the other process is finished processing its message. But, on the other hand, it should remain ready for processing calls from the other process, called from within the procedure processing the message. This is an implementation of the regular procedure call semantics of your regular programming language in terms of concurrent processes.
mainloop() {
	while (TRUE) {
		message = waitformessage();
		start_new_thread(message);
	}
}
handling_thread(message) {
		switch (message.type) {
			.....
			case msg_type_i :
				/* some code, processing message.data,
				 * possibly sending messages */
				.....
				sendmessage(somerecipient,somemessage);
				.....
			break;
			.....
		}
		send_async(message.return_signal);
	}
}
returnID sendmessage(recipient,message) {
	send_async(recipient,message);
	waitforreturn(message);
}
Each process effectively maintains multiple local contexts: one is found in each thread. This is the equivalent of the regular call/return stack typically found in a regular programming language.

When only one spontaneous message is generated at a time, this system works fine: the message may cause a number of recursive calls, as a regular proceduce call would. Within each process, only one thread at a time is active: all other threads are waiting on return signals.

A problem occurs when a new message is generated while an old one is still being handled. In this case, uncontrolled concurrency occurs: a process starts handling multiple messages concurrently. So, this is not an acceptable concurrency model. A simple scheme may be used to fix this problem: the spontaneously-generated messages are globally serialised, and each message is queued until the previous one is handled completely.

Combining the two

At first sight, it appears the two may be combined in a straightforward manner, distinguising the two kinds of procedure call (synchronous and asynchronous) within the main handling loop.
mainloop() {
	while (TRUE) {
		message = waitformessage();
		if (message.mode == synchronous) {
			start_new_thread(message);
		} else {
			handling_thread(message);
		}
	}
}
handling_thread(message) {
		switch (message.type) {
			.....
			case msg_type_i :
				/* some code, processing message.data,
				 * possibly sending messages */
				.....
				sendmessage(somerecipient,somemessage);
				.....
				send_async(somerecipient,somemessage);
				.....
			break;
			.....
		}
		if (message.return_signal!=null)
			send_async(message.return_signal);
	}
}
returnID sendmessage(recipient,message) { ..... }
However, this scheme is not acceptable without further modifications. Even when the spontaneous messages are serialised and queued as in the synchronous case, uncontrolled concurrency may occur. Consider the following scheme: a spontaneous message arrives at a process that reacts by sending a number of asynchronous messages to a number of other processes. These processes react by sending a synchronous message back to the sender. The sender will now start handling new messages while it is still handling old ones.

A simple solution is to serialise asynchronous messages in the same way as spontaneous messages. There is effectively a global queue in which all asynchronous and spontaneous messages are processed one at a time. A problem with this approach is that asynchrony is not quite taken advantage of anymore: while asynchronous messages are sent asynchronously, they are effectively treated as synchronous as soon as they enter into the queue. The asynchronous handlers now need to send return signals as well. The amount of parallelism in this model is very limited: only the sending of messages to the queue is executed in parallel with the handler in which the messages were sent.

Further refinements could be made to increase parallelism and reduce latency problems. A straightforward improvement is distinguishing the handlers that are fully-asynchronous, that is, handlers in which no synchronous sendmessages occur. The queue does not have to wait for messages addressed to them to finish, and they can be sent directly to their recipients. They may be handled by a recipient in the spare time between the end of the processing of one queue message and the beginning of the next, i.e. if the process has no more synchronous handler threads running, it may process the messages handled by asynchronous handlers.

The algorithm is described below. First, we need a new waitformessage function:

mainloop() {
	threads_running=0;
	while (TRUE) {
		message = waitformessage(threads_running!=0);
		if (message.mode == synchronous) {
			threads_running++;
			start_new_thread(message);
		} else {
			handling_thread(message);
		}
	}
}
handling_thread(message) {
	.....
	threads_running--;
}
returnID sendmessage(recipient,message) { ..... }

In case all handlers are asynchronous, we obtain the asynchronous model. In case all handlers are synchronous, we obtain the synchronous model.

Adding multicast

Multicast models require decisions to be made about delivery and ordering of messages among more than two processes.

Synchronous multicast

We cannot call multiple recipients in parallel, as they will then execute concurrently, and may call each other synchronously, in which case uncontrolled concurrency will occur. This can however easily be circumvented by having the queue call each recipient serially. This guarantees depth-first handling of a synchronous multicast. A special exception may be made for asynchronous handlers: these may be executed in parallel, but note that these now generate return signals. Note that the order in which multiple recipients are called is not defined, and may make a difference.

Asynchronous multicast

There are various asynchronous multicast message ordering schemes. We will use the following: all recipients receive the messages from different senders in the same order, while for each sender it is guaranteed that its outgoing messages are received in order as well. (which constraint is this?) This may be easily implemented using a queue to serialise messages. Each asynchronous multicast message is sent to the queue, which will put it in the recipients' queues.

Combining the synchronous model with a publisher-subscriber model

In VETk, a publisher-subscriber model is used. This is a kind of multicast model: a publish action is a multicast to all subscribers. Processes communicate by making changes in a database, resulting in updates to be sent to the processes that are subscribed to that part of the database. Naturally, it is a requirement that the subscription information received by the different subscribers is consistent: it should reflect the sequence of states the database has been in.

This places restrictions upon message ordering. For an asynchronous system, the ordering constraint we have described is sufficient to ensure consistency. We may use the central queue implementation to serialise database actions and send subscription updates.

However, when we introduce synchrony the story becomes more complex. As we have seen, regular synchrony means depth-first execution of calls. This does not combine with a publisher-subscriber model. As an example, see the sequence diagram below:

view                                                   view
     P1                  DATABASE                  P2 set bag
{a}  |     delete(a)        |                       | {a} {a}
     |--------------------->|                       |
     |  update(deleted(a))  |                       |
     |<---------------------|                       |
{}   |     add(a)           |                       |
     |--------------------->|                       |
     |  update(added(a))    |                       |
     |<---------------------|                       |
{a}  |        done          |                       |
     |--------------------->|                       |
     |        done          |                       |
     |--------------------->|   update(added(a))    |
     |                      |---------------------->|
     |                      |         done          | {a} {a,a}
     |                      |<----------------------|
     |                      |   update(deleted(a))  |
     |                      |---------------------->|
     |                      |         done          | {}  {a}
     |                      |<----------------------|
     |                      |                       |
The database contains an element, a, which the processes P1 and P2 are subscribed to. Process P1 sends a delete action, deleting a, which causes an update to be sent to both processes. It is however first sent to P1, which reacts by re-adding the element. This update is then sent to both processes before the first update is sent to P2. This results in both processes seeing different sequences of updates.

The problem can be circumvented by avoiding more than one call level for publish actions. Note that, by reducing the strictness of the subscription semantics, we can allow recursive publish actions. What we do is treat the subscription result as a bag instead of a set. While a call tree is being executed, the subscription state of the different processes may differ. However, at the end of the synchronous call, the states are consistent again. This can be seen easily by noting that the amount of add and delete notifications for different subscribers is the same, even though the order may be different. Note that the `bag' representation we need here is not a normal bag: it should also allow negative numbers of elements! However, the usefulness of this scheme is probably limited.

Another option is to recalculate the updates necessary after every call.

Combining the asynchronous model with a publisher-subscriber model

This is easily done by using the asynchronous multicast model as described above. The `central queue' becomes the `central database'.

Combining the synchronous-asynchronous model with publisher-subscriber

Using the described synchronous-asynchronous model combined with the described multicast is not quite sufficient to obtain a valid publisher-subscriber model.

This can be seen by considering the following things: In order to ensure updates arrive in the same order, the order of queueing of all messages should be the same. The algorithm, however, reorders messages when participating in a synchronous call. In other words, updates are received in different order when some processes start participating late, i.e. when the synchronous call begins, they process their asynchronous updates while the other processes queue their asynchronous updates.

The following scheme can be used to circumvent this: When a synchronous handler is called (by an asynchronous or spontaneous message), the queue blocks processing of further asynchronous messages for the entire group of participating processes until call is completed. A group is defined as all processes that may possibly participate in the call. It is not easy to determine this group from the code. As a simple solution the group may be determined by hand.

In other words, the queue ensures that message reordering will never occur. This means the algorithm of each process can be simplified.

mainloop() {
	while (TRUE) {
		message = waitformessage();
		if (message.mode == synchronous) {
			start_new_thread(message);
		} else {
			handling_thread(message);
		}
	}
}
handling_thread(message) { ..... }
returnID sendmessage(recipient,message) { ..... }
The algorithm of the central database is however quite involved.
mainloop() {
	while (TRUE) {
		message = nextmessage();
		if (is_returnsignal(message)) {
			if (call_has_more_recipients(message)) {
				call_next(message);
			} else {
				if (call_was_synchronous(message)) {
					send_return(message);
				} else {
					free_sync_group(message);
				}
			}
		}
		if (is_update(message)) {
			if (has_blocked_recipients(message)) {
				postpone_until_freed(message);
			} else {
				update_msgs = db_update(message);
				if (contains_sync_handler_calls(update_msgs)) {
					foreach group in sync_groups(update_msgs) {
						block_sync_group(group);
						call_first(group,update_msgs);
					}
				}
				call_async_recip_only(update_msgs);
			}
		}
	}
}

Example application: collaborative list

To illustrate the algorithm, we discuss a small example here. It is a multi-user list, in which items can be added by filling in text in a text field, and deleted by clicking on the items displayed in a list. A combination of synchrony and asynchrony is used.

Adding items is asynchronous, since there is no problem if users add items concurrently. Deleting items, however, is synchronous, in order to prevent two users from trying to delete the same item simultaneously.