Boris van Schooten
Parlevink group,
Faculty of Computer Science,
University of Twente, Enschede
P.O. Box 217, 7500 AE, Netherlands
schooten@cs.utwente.nl
http://wwwhome.cs.utwente.nl/~schooten/vetk/
This paper explores some system design issues regarding the structuring and information flow of complex distributed virtual environments (VEs) containing one or more users and dialogue agents. A specification technique is proposed for modelling such systems. We define a specification technique as a suite of interrelated specification languages.
It is argued that specification techniques should support gradual design refinement. Therefore, the proposed technique provides several detail levels of specification. Furthermore, the technique tries to explicitly address distributed implementation issues right from the start. Even prototypes made using the technique are suited for distributed usage.
The basic idea behind the technique is that all physical objects, such as user interface components and objects in the VE, are modelled as separate software components. These are arranged in a specific structure, reflecting the actual physical or conceptual structure of the objects. A change notification model is provided that enables each component to react to events from its local structural environment, including additions and removals of other components.
The technique is illustrated by means of a Web-based multi-user VE incorporating an environment-aware dialogue agent. The example shows how such a system may be structured in a natural way using the technique and may be specified concisely. The provided change notification model may be considered effective, as it largely replaces the need to directly pass messages between components. However, the still limited expressivity of the change notification model was sometimes found cumbersome, and needs to be improved.
This paper continues previous research on specification techniques for the development of multimodal dialogue systems and inhabited virtual environments (VEs) ([van Schooten et al., 1999, van Schooten, 2000]). A specification technique is defined here as a suite of specification languages.
Even if good guidelines are provided, development of complex human-computer systems is iterative and interdisciplinary in nature. This holds for systems ranging from dialogue systems ([Bernsen et al., 1998]) to graphical user interfaces (GUIs) ([Shneiderman, 1998]). The resulting system is necessarily tailored to the specific application domain, and each new domain requires a new system.
A specification technique may facilitate development in the following ways: it may be used to document (enabling display-based reasoning ([Davies, 1993]), and communication among developers or with users), execute (enabling generation of quick alternative prototypes including some behaviour, or a complete system to test with users, producing evaluation results), and prove well-behavedness (making some technical issues more tractable, for example in the case of systems with a high degree of concurrency).
The suitability of a language to serve as documentation is a particularly subtle matter. While a specific type of solution might be specified using different languages, it will be easier in some and harder in others. Even apparently small differences in language may have significant impact on the difficulty of the problem solving process ([Sime et al., 1977, Petre, 1990]). A good specification language may facilitate documentation by providing often-occurring or well-established functionality as language constructs, enabling natural expression of this functionality.
Ideally, a single language or small set of tightly-coupled languages should be directly usable for all of these purposes, reducing the difficulty of matching and maintaining consistency between different specifications. Often, though, there are `gaps' between different specifications ([Wegener, 1995]), which have to be bridged constantly during development. If the gaps are too large, then some of the languages will cease to be used. In particular, the programming language in which the system is eventually encoded often gets to play an overly important role, because of its obviously superior execution capabilities. Even keeping natural-language specifications (comments or API specifications) complete and consistent with respect to a program is often considered a burden, and rarely succeeds in practice. This leads to specification practices such as those found in extreme programming ([Beck, 1999]), in which the programming language is viewed as the primary specification language, in keeping with the old programmers' saying `the source code is the documentation'. This is actually the viewpoint taken here. The languages provided by the technique are in fact high-level programming languages, and are meant to be suitable for both execution and documentation.
The emphasis of the technique is on the `macro' level of system architecture. It facilitates a fine-grained and highly-structured architecture, in which physical objects are modelled as components, and are structured in a way that maps closely to physical or conceptual structure. Physical objects may include both basic GUI components and VE objects. In fact, the VE concept may be seen as a generalisation of the GUI concept: VR systems are often a complex mixture of traditional GUI elements combined with more novel and `realistic' elements.
The specification technique proposed here is centred around a programming language providing a suite of high-level communication primitives. The language is called Virtual Environment Agent Language (VEAL). The component structure plays a very explicit role in the language, because components' interaction is specified directly in terms of this structure. This should lead to an architecture that can be more closely related to the system's conceptual model (i.e. the users' views of the system), and hence, also to the model that a dialogue agent in the system should have.
To facilitate prototyping, a dataflow model is provided, enabling a rough prototype to be specified by plugging together existing VEAL components. It is shown how it may be combined with a layout language, such as HTML or VRML, to obtain a prototype screen layout with some basic behaviour.
To address technical issues with respect to concurrency and distributed implementation, asynchronous message passing and change notification facilities are provided. To enable a great degree of control in the later stages of development, VEAL is closely integrated with a traditional general-purpose programming language (Java). The relation of VEAL with the underlying programming language is similar to that found in coordination languages ([Gelernter and Carriero, 1992]): a coordination language specifies the integration and coordination of functionality of regular software components into concurrent applications.
Figure 1 gives a rough idea of how the technique might be used. Informal specifications typically found at the beginning of the development cycle (such as screen layouts and models of the system's concepts and behaviour, see for example the technique used in MUSE ([Lim and Long, 1994])) should be quickly translatable to more formal specifications. Further development of the system should be possible by means of incremental changes to the specifications.
Figure 1: Example development cycle using HTML or VRML.
Note that dialogue models ([Bernsen et al., 1998]) and task models are as yet not integrated in the technique. Both may be mapped to behaviour specifications of individual agents within the model, namely that of user agents and dialogue agents. Such `micro-level' specifications are (still) outside the current research focus.
Furthermore, there are no facilities for supporting formal proof. This is currently out of research focus as well. Since proofs generally require abstraction in order to be tractable, proofs typically need to be preceded by some abstraction step, based on the most difficult and relevant parts of the system. The necessary kind of abstraction will only be known after some experience with often-made mistakes using the technique.
The communication model used by the technique is based on communication schemes found in multi-agent models and in user interface component models. In multi-agent models, a system is modelled as a number of concurrent agents with private data, which communicate using messages. Various communication models are found, which are discussed in the next two sections. In user interface component models, a system is often modelled as a number of interrelated components, through which events propagate in specific ways. These may be viewed as change notification models, and will be discussed in sections 1.3.3 and 1.3.4.
While user interface component models are obviously more tailored to user interface design than multi-agent models, they are typically not suited for distributed usage, or are implicit about the precise communication model that is or should be used. The model proposed here tries to combine ideas from both areas.
Many multi-agent systems support asynchronous communication, or provide it as the sole communication model ([Skarmeas, 1999]). Asynchronous communication provides a relatively easy to understand communication model for concurrent systems. It means that a sender or caller process does not wait for a reply after sending a message, but continues execution and picks up any reply later. Note that this requires some degree of message buffering between processes. The main reason for using an asynchronous model as opposed to a completely non-concurrent synchronous model is efficiency. In an asynchronous program, one may parallelise execution of a series of `send-reply' calls by sending them in a burst to different destinations, and then listening for the replies to trickle in. This scheme also reduces the effects of communication latency, which is often found to be a fundamental problem in distributed user interfaces ([Bhola et al., 1998]). This does mean that asynchronous programs are typically more complex than non-concurrent synchronous ones. It is also likely that re-writing a non-concurrent synchronous program to a program involving asynchrony requires some difficult changes.
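As a minimal sketch of the burst pattern described above, the following Java fragment sends one request to several destinations without waiting in between, and then collects the replies as they become available. The names BurstSender, Destination and sendBurst are hypothetical and are not part of VEAL; CompletableFuture merely stands in for whatever messaging layer is actually used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch: send a burst of requests, then collect the replies. */
final class BurstSender {
    /** Stand-in for a remote agent; a real one would sit behind a network link. */
    interface Destination {
        String handle(String request);
    }

    /** Sends the request to every destination without waiting in between. */
    static List<String> sendBurst(List<Destination> destinations, String request) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (Destination d : destinations) {
            // supplyAsync returns immediately; the call itself runs on a pool thread.
            pending.add(CompletableFuture.supplyAsync(() -> d.handle(request)));
        }
        List<String> replies = new ArrayList<>();
        for (CompletableFuture<String> f : pending) {
            // Pick up each reply once it is available (join blocks only if it is not).
            replies.add(f.join());
        }
        return replies;
    }
}
```

The replies are returned in the order of the destination list, so the caller can relate each reply to its destination without extra tagging.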
There are several variants of asynchronous communication. The easiest to understand is buffered communication, in which there is a message queue between each pair of processes. Messages sent in a specific order arrive in a specific order. If this were not the case, replies would typically have to be tagged with context information, in particular references to the original messages they replied to. For this reason, some models provide a standard `reply-to' field in their messages (see for example KQML, [Labrou and Finin, 1997]). In the buffered case, only the replies to messages sent to different processes may arrive out of order. This means that replies only need to be tagged with the identity of the process they came from. This is the model provided here.
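The buffered variant can be illustrated with a small Java sketch, given here under the assumption that one FIFO queue is kept per sending process. Mailbox and Tagged are hypothetical names, not VEAL constructs. Because each queue preserves the order in which its sender delivered messages, a received message only needs to carry the identity of its sender.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch of buffered communication: one FIFO queue per sender. */
final class Mailbox {
    /** A received message, tagged only with the identity of its sender. */
    static final class Tagged {
        final String sender;
        final String body;

        Tagged(String sender, String body) {
            this.sender = sender;
            this.body = body;
        }
    }

    // One queue per sending process keeps the per-sender order intact.
    private final Map<String, Queue<String>> queues = new HashMap<>();

    /** Called on behalf of a sender; the sender never waits for the receiver. */
    synchronized void deliver(String sender, String body) {
        queues.computeIfAbsent(sender, k -> new ArrayDeque<>()).add(body);
    }

    /** Returns the oldest pending message from the given sender, or null if none. */
    synchronized Tagged receiveFrom(String sender) {
        Queue<String> q = queues.get(sender);
        String body = (q == null) ? null : q.poll();
        return (body == null) ? null : new Tagged(sender, body);
    }
}
```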
Note that non-distributed applications may also benefit from an easy-to-use asynchronous model. In particular, computations taking a lot of time may be more easily set to work in the background. In fact, there is often some asynchrony or concurrency in user applications or user interface models, but it is typically not well-documented and is hand-coded using low-level constructs. This leads to programming errors. Too many GUI and web-based applications can be crashed by clicking on buttons and menus too rapidly, or leaving open multiple windows.
Many multi-agent systems also have explicit support for multi-casting messages. Multi-casting means sending the same message to multiple destinations at once. There are at least two reasons for having a multi-cast facility. One is efficiency: it enables multicast facilities of underlying network protocols to be used. Another is semantics: some systems ([Kaashoek and Tanenbaum, 1996]) model a multicast operation to indicate a specific point in time in which a message is sent. When two agents multicast a message, all recipients receive the two messages in the same order. This model ensures that all agents see things happening in the same order. While there are several other multi-cast semantics possible ([Birman, 1985]), this is one of the easiest to understand, and is the model provided here.
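One common way to obtain this ordering guarantee is to route every multicast through a single sequencing point. The text does not state how the model provided here achieves it, so the Java sketch below is only an illustration under that assumption; Sequencer, join and multicast are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a central sequencer gives every multicast one global position. */
final class Sequencer {
    private final List<List<String>> inboxes = new ArrayList<>();

    /** Registers a recipient and returns its inbox (a plain list in this sketch). */
    synchronized List<String> join() {
        List<String> inbox = new ArrayList<>();
        inboxes.add(inbox);
        return inbox;
    }

    /** The lock serialises multicasts, so all inboxes see the messages in the same order. */
    synchronized void multicast(String message) {
        for (List<String> inbox : inboxes) {
            inbox.add(message);
        }
    }
}
```

Whatever order concurrent senders happen to acquire the lock in, every recipient observes that same order, which is the property described above.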
With the general increase in software interactivity, change notification models have enjoyed increased interest lately. They provide a tractable framework for distribution of information in concurrent interactive systems. The most basic form of change notification is that a component can subscribe with another component for information satisfying specific criteria. Each time new information satisfying the criteria becomes available, the subscriber is automatically notified. Note that change notification may be viewed as an implicit kind of multi-cast. The semantics of `regular' multi-casts may be used.
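The basic subscription scheme can be sketched in a few lines of Java. Publisher, subscribe and publish are hypothetical names chosen for illustration; the criteria are modelled as a predicate, which is an assumption about how such criteria might be expressed rather than a description of any particular system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch of criteria-based change notification. */
final class Publisher<T> {
    /** One subscription: the criteria plus the subscriber to notify. */
    private static final class Subscription<U> {
        final Predicate<U> criteria;
        final Consumer<U> subscriber;

        Subscription(Predicate<U> criteria, Consumer<U> subscriber) {
            this.criteria = criteria;
            this.subscriber = subscriber;
        }
    }

    private final List<Subscription<T>> subscriptions = new ArrayList<>();

    /** Registers interest in all information satisfying the criteria. */
    void subscribe(Predicate<T> criteria, Consumer<T> subscriber) {
        subscriptions.add(new Subscription<>(criteria, subscriber));
    }

    /** Notifies every subscriber whose criteria match the new information. */
    void publish(T info) {
        for (Subscription<T> s : subscriptions) {
            if (s.criteria.test(info)) {
                s.subscriber.accept(info);
            }
        }
    }
}
```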
When taking a fine-grained view of the concept of `component' (as is usually done in object-oriented design), many interactive systems can be said to have dynamically changing component structures, of which both changes within components and re-orderings of components need to be propagated through the system. This may be modelled using a change notification scheme.
Support for subscription to changes within a component is often found. Interactor models (such as Model-View-Controller (MVC), [Krasner and Pope, 1988]) prescribe a separate type of component (such as the Model component in MVC) that stores data and notifies changes to subscribers, which are typically supposed to display the data.
A different kind of subscription is found in `one-way constraint' user interface models (in particular Amulet, [Myers et al., 1997]). The `constraints' are, in fact, subscriptions specified in terms of the component structure. These models enable a concise and `declarative' notation of relatively complex subscriptions. In Amulet for example, one may specify a component to subscribe to a data field named X found in a component tree sibling called s. We will call this kind of subscription structure-based. Similar but more implicit structure-based subscriptions are found in other models. In VRML for example, there is a special node called the sensor node, which enables click and drag operations on all nodes within its subtree. To some extent, structure-based subscriptions enable a system to be specified by arranging standard components in a specific manner.
Support for subscription to changes in the component structure is also found, but is typically more basic. Dynamic environments like MOO VEs (see for example LogiMOO, [Tarau et al., 1998]) have the most extensive support. Typically, moving from one room to another inside the VE generates `enter' and `leave' events, which are received by the other objects residing in the rooms. Similar support for these events is found in other user interface models, for example Java's ContainerEvent and VRML's children field.
Typically, models that provide some kind of component structuring mechanism that is used for subscriptions arrange them in trees. In MOOs and GUIs they are called container hierarchies, in VRML they are called scene trees. The proposed model tries to generalise upon this concept by enabling the components to be arranged in any directed labelled graph structure (comparable to an entity-relationship model or a semantic network). Subscriptions may be specified in terms of this structure, and result in sets of values. This enables both structure-based subscriptions and subscription to changes in component structure.
Unlike most of these models, the proposed model is suitable for a distributed implementation, due to its asynchrony. In effect, the component structure may be viewed as a `distributed' shared database through which the components communicate. A similar idea is found in the well-known distributed programming language Linda ([Bjornson, 1992]), which uses an asynchronously-accessed data space called the shared tuple space for this purpose. Linda, though, does not prescribe a specific structuring of its tuples.
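To make the idea of subscriptions over a directed labelled graph more concrete, the following Java sketch stores labelled edges between named components and notifies subscribers when the set of targets reachable over a given label changes. It is a single-process illustration only: ComponentGraph, Listener and the event names "enter" and "leave" are hypothetical, and distribution and asynchrony are deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: a directed labelled graph with structure-based subscriptions. */
final class ComponentGraph {
    interface Listener {
        void changed(String event, String target);
    }

    // Both maps are keyed by (source, label); the targets form the query result set.
    private final Map<String, Set<String>> edges = new HashMap<>();
    private final Map<String, List<Listener>> listeners = new HashMap<>();

    private static String key(String source, String label) {
        return source + " -" + label + "->";
    }

    /** Subscribes to changes in the set of targets of the given label at the given source. */
    void subscribe(String source, String label, Listener listener) {
        listeners.computeIfAbsent(key(source, label), k -> new ArrayList<>()).add(listener);
    }

    /** Returns a copy of the current result set of the query. */
    Set<String> query(String source, String label) {
        Set<String> targets = edges.get(key(source, label));
        return targets == null ? new LinkedHashSet<String>() : new LinkedHashSet<String>(targets);
    }

    void addEdge(String source, String label, String target) {
        boolean added = edges.computeIfAbsent(key(source, label), k -> new LinkedHashSet<>()).add(target);
        if (added) {
            fire(source, label, "enter", target);
        }
    }

    void removeEdge(String source, String label, String target) {
        Set<String> targets = edges.get(key(source, label));
        if (targets != null && targets.remove(target)) {
            fire(source, label, "leave", target);
        }
    }

    private void fire(String source, String label, String event, String target) {
        List<Listener> interested = listeners.get(key(source, label));
        if (interested == null) {
            return;
        }
        for (Listener l : interested) {
            l.changed(event, target);
        }
    }
}
```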
Dataflow models may be viewed as an alternative way to specify change notification structures. Dataflows are specified by means of `routes' which are specified from the outside, coupling information sources to subscribers, enabling detailed `plugging' together of standard components into a new system. Dataflow models are often found in `end-user programming' systems, typically using visual representations ([Walton and Dewar, 1996, Banavar et al., 1998]), but they are also found in programming languages. For example, VRML has the ROUTE command, and Java's AWT has the add...Listener methods.
Dataflow models provide a high-level manner of specification that may be suitable to build a system quickly at the expense of precise control. So, dataflows may be particularly suited for prototyping purposes.
However, the basic dataflow model is not compatible with dynamically changing component structures. Its `visual' metaphor, programming by defining individual nodes and drawing arcs between them, is not meaningful when the nodes or even the number of nodes is not known beforehand, and will change dynamically. In the model proposed here, dataflows are specified in a symbolic manner. They are, in fact, specified on top of the provided change notification model, enabling part of agents' subscriptions to be specified as run-time configuration.
The VEAL language is explained in detail in this section. The communication primitives provided by VEAL were originally implemented as a Java API. Java was chosen, because Java classes are potentially embeddable in HTML or VRML code. The VEAL language was created later as a layer on top of Java, for the sake of conciseness. Regular Java expressions and objects may still be used inside VEAL specifications.
The basic unit of specification is the agent, which is similar to a process or object. Each agent may have any private data or have access to private resources. Agents act in parallel, and are specified in an event-driven manner using condition-action rules. Next to agents, there are properties and relations. Properties are data fields that may be published by an agent and subscribed to by other agents, and relations are communication channels between pairs of agents.
To establish a relation, it must be explicitly requested and acknowledged. This way, agents can enforce multiplicity constraints, and concurrent access policies can be modelled using relations. Both parties of a relation have the freedom to remove it at any time, informing the other party of this by means of an event. The relation will also be removed in this manner if one of the agents crashes or is disconnected. This provides a clean model for dealing with component and network failure. The lifecycle of a relation is illustrated in figure 2.
Figure 2: The lifecycle of a relation R(a,b).
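The lifecycle can also be read as a small state machine. The Java sketch below is derived from the textual description only; the state names and the class RelationLifecycle are hypothetical, and it assumes that removal applies to established relations, which the text does not spell out.

```java
/** Hypothetical sketch of the lifecycle of a single relation between two agents. */
final class RelationLifecycle {
    enum State { REQUESTED, ESTABLISHED, REFUSED, REMOVED }

    // A relation comes into existence when one party requests it.
    private State state = State.REQUESTED;

    State state() {
        return state;
    }

    /** The other party acknowledges the request. */
    void acknowledge() {
        require(State.REQUESTED);
        state = State.ESTABLISHED;
    }

    /** The other party refuses, for example to enforce a multiplicity constraint. */
    void refuse() {
        require(State.REQUESTED);
        state = State.REFUSED;
    }

    /** Either party removes the relation, or one of them crashes or is disconnected. */
    void remove() {
        require(State.ESTABLISHED);
        state = State.REMOVED;
    }

    private void require(State expected) {
        if (state != expected) {
            throw new IllegalStateException("expected " + expected + " but was " + state);
        }
    }
}
```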
The following example is a client agent which tries to connect to a specific server, issue a query, and process its answer.
1 relation R {
2     servermsgs { answer(String s) }
3     clientmsgs { query(String s) }
4 }
5 agent client(Agent server) {
6     R.request(server);
7     when enters R.servers(self) set {
8         R.clientcast(query("some_query"));
9     }
10     when leaves R.servers(self) set { exit("failure"); }
11     when serverrefuses R agt { exit("failure"); }
12     when servermsg agt R answer a {
13         QueryProcessing.process(a.s);
14         exit();
15     }
16 }
The client-server protocol is specified using the relation statement. Relations may define any number of server and client messages with arbitrary parameters. Message parameters are passed by value, i.e. a copy is made of each parameter's contents.
The client agent has one configuration parameter, namely the identity of the server agent it should connect to (line 5). Note that agent identities are specified by the data type Agent, which is a regular Java class. The client first tries to connect to this server (line 6). After that, it starts listening to messages, which are handled by the four when statements. When a message arrives, the condition of each when statement is checked, from top to bottom. As soon as one condition is found true, the body of the statement is executed, and the next message is handled. The first condition (line 7-9) specifies a set change trigger. It should be read as `trigger when elements are added to the set specified by the expression R.servers(self)'. This expression specifies the servers of protocol R that the agent has. If an element is added, the server connection was obviously successfully established. The agent reacts by sending a query message to the server (line 8). Actually, clientcast is a multi-cast statement, sending a message to all the agent's servers of protocol R. When the server sends an answer, it will trigger the fourth when statement (line 12-15), which processes the query using a private module QueryProcessing (line 13), and then exits (line 14). In the spirit of a coordination language, the details of this processing are not specified in the agent's specification, but are delegated to the underlying programming language. The other two when statements specify error conditions: Line 10 specifies the condition that the connection was aborted before an answer was sent, and line 11 specifies the condition that the server refused the R protocol.
The extra complexity introduced by asynchrony and the possibility of failure can clearly be seen here, introducing several extra intermediate states in the agent. On the other hand, the agent's specification does deal with these different situations in a concise way.
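The top-to-bottom evaluation of when statements can be mimicked in plain Java as an ordered list of condition-action rules, of which only the first matching one fires for each message. This is a sketch of the evaluation order only, not of the VEAL runtime; RuleAgent, when and handle are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch: condition-action rules checked from top to bottom. */
final class RuleAgent<M> {
    private static final class Rule<N> {
        final Predicate<N> condition;
        final Consumer<N> action;

        Rule(Predicate<N> condition, Consumer<N> action) {
            this.condition = condition;
            this.action = action;
        }
    }

    private final List<Rule<M>> rules = new ArrayList<>();

    /** Appends a rule; rules are checked in the order in which they were added. */
    RuleAgent<M> when(Predicate<M> condition, Consumer<M> action) {
        rules.add(new Rule<>(condition, action));
        return this;
    }

    /** Runs the first rule whose condition holds; returns false if none matched. */
    boolean handle(M message) {
        for (Rule<M> rule : rules) {
            if (rule.condition.test(message)) {
                rule.action.accept(message);
                return true; // later rules are not considered for this message
            }
        }
        return false;
    }
}
```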
Next to these basic message passing facilities, a change notification model is provided. It may be considered an alternative to message passing, and various kinds of communication could be naturally implemented using either. Subscription is done in the form of queries. The most basic kind of query is in fact already found in the above example program, namely the query R.servers(self). Additionally, the servers may have published properties, which can then be subscribed to. For example, servers having the property P may be queried like this: P.has(R.servers(self)). This results in the set of servers that actually published the property P, along with the newest values of the properties. Any additions to and deletions from this set are notified, as well as changes in the properties' values. The properties may contain any serializable Java objects. Updates in property values are sent incrementally, using the built-in object caching scheme found in Java's object streams. More complex queries can be built by recursively applying query operators. In addition to the servers(...) and has(...) operators, there are several other ones, in particular a clients(...) operator.
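Notifying additions to and deletions from a query result amounts to comparing two snapshots of the result set. The following Java sketch shows that comparison in isolation; SetChange and between are hypothetical names, and how the actual engine computes its notifications is not described in the text.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: the elements that entered and left a query result set. */
final class SetChange<T> {
    final Set<T> entered;
    final Set<T> left;

    private SetChange(Set<T> entered, Set<T> left) {
        this.entered = entered;
        this.left = left;
    }

    /** Compares two snapshots of the same query result. */
    static <T> SetChange<T> between(Set<T> before, Set<T> after) {
        Set<T> entered = new HashSet<>(after);
        entered.removeAll(before); // present now, absent before
        Set<T> left = new HashSet<>(before);
        left.removeAll(after);     // present before, absent now
        return new SetChange<>(entered, left);
    }
}
```

The entered and left sets correspond to the occasions on which an `enters' or `leaves' trigger, as used in the client example, would be expected to fire.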
Figure 3: Implementation of a simple blackboard model.
As an example, suppose that we want to specify a simple blackboard architecture (see figure 3). In the blackboard or `publisher-subscriber' model ([Eugster et al., 2000]), there is a blackboard agent which keeps track of information submitted by multiple publishers. Subscribers may access this information by subscribing with the blackboard, specifying what kind of information they are interested in. We model publishers using a relation named P, and subscribers using a relation named S. Neither relation needs to have any direct messages; they just signify the publisher and subscriber roles. A subscriber may easily obtain published values of some property X from the blackboard by a query X.has(S.clients(P.servers(self))).
Simple queries like the above are likely to be the most-used ones in a VE system ([Vander Zanden and Halterman, 1999]). The above query, for example, can be used to model several types of environmental awareness needed in VEs. `Blackboard' may be taken quite literally, and the above model is a model for a shared blackboard system, with the publishers and subscribers being users, and X the type of information that the subscriber is currently viewing. Alternatively, the blackboard might be a room, and the publishers are objects residing in the room.
While the kinds of query currently supported are rather limited, the query model may be extended as needed. Still, as we shall see, the current model is already suitable for some simple yet interesting applications.
While writing applications, there was often a need to pass part of the agents' queries as configuration parameters. For this purpose, a dataflow-like scheme was added to the model, which could conveniently be built on top of the existing query model. Each agent has a special property routenode, which may be given a name. Arbitrary data values may then be published by setting fields within the routenode. Fields are set by specifying the field's name and its value.
In existing dataflow models, a typical dataflow leads from a specific field in a specific component to another specific field or message handler in another. For example, a ROUTE statement in VRML specifies a source node and field and a destination node and field. The statement itself may be placed inside the syntactic scope of a node, and may address any child nodes within that scope by name. In analogy with this concept of scope based on component structure, VEAL provides the routescope statement, coupling a scope name to a query. The query defines the set of agents which the routes may use as sources. Routes are defined by passing a list of routes as a parameter to the destination agent. Each route specifies the routescope, the name of the source node and field, and the name of the event handler to be invoked in the destination agent. In textual form, it is specified as <routescope> : <nodename> . <nodevalue> > <handler>.
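The textual route form can be taken apart with a simple pattern. The Java sketch below assumes that all four parts are plain identifiers made of letters, digits and underscores and that whitespace around the separators is optional; the text does not define the lexical rules, and Route and parse are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: parses "<routescope> : <nodename> . <nodevalue> > <handler>". */
final class Route {
    // Assumption: every part is an identifier matched by \w+.
    private static final Pattern SYNTAX = Pattern.compile(
            "\\s*(\\w+)\\s*:\\s*(\\w+)\\s*\\.\\s*(\\w+)\\s*>\\s*(\\w+)\\s*");

    final String scope;
    final String node;
    final String field;
    final String handler;

    private Route(String scope, String node, String field, String handler) {
        this.scope = scope;
        this.node = node;
        this.field = field;
        this.handler = handler;
    }

    /** Parses one route; throws IllegalArgumentException if the text is malformed. */
    static Route parse(String text) {
        Matcher m = SYNTAX.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a route: " + text);
        }
        return new Route(m.group(1), m.group(2), m.group(3), m.group(4));
    }
}
```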
Like regular dataflows, multiple routes may be laid from different sources to the same destination. Unlike regular dataflows, it is even possible that one route specifies multiple sources, as there is no uniqueness constraint for node names within a scope. When using a regular event handler, changes in any of the sources generate events that invoke the handler. In the spirit of set-based query facilities, it is also possible to specify the set of all route sources of a specific handler as a query.
As an example, consider again the blackboard example in figure 3. Suppose the blackboard stands for a room, and suppose that agents may speak by publishing last_said properties, containing String values. One of the subscriber agents might have the following specification:
1 agent generic_eavesdropper(Agent room) {
2     routescope objects_in_room = S.clients(P.servers(self))
3     S.provide(room);
4     when routeout speak src String str {
5         if (str!=null) last_said.have("I notice: " + str);
6     }
7 }
This agent defines a routescope objects_in_room, specifying the occupants of any room it is subscribed to (line 2). The routeout statement (line 4-6) specifies a handler called speak, which expects String values. Any invocations will cause the agent to publish the last_said property (line 5) with the contents of the invocation's source. It might be used to track information about a specific kind of object in a specific room. For example, it may be started with a route objects_in_room: chair.description > speak. It will now recite the description fields of any `chair' agents inside the room or entering the room.
While limited and somewhat ad-hoc, this dataflow scheme already enables quite concise specifications for relatively simple applications. It is particularly useful for embedding agents in HTML pages, as will be shown below.
A first implementation of the messaging engine has been built using a `central server' setup. A server process (written in Java) keeps track of the system structure and handles all messages. An agent has to connect to the server once at startup through the server's IP address, and is able to use the VEAL communication facilities from then on. Internally, the communication is handled using an asynchronous remote method invocation scheme, which is built on top of TCP sockets and Java's object streams. While obviously far from optimal, this scheme works reasonably well for small-scale distributed applications.
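The wire format of the engine is not described here, so the following Java sketch only illustrates how an invocation might be packaged with Java's object streams. A byte array takes the place of the TCP socket, and Invocation, encode and decode are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical sketch: a remote method invocation packaged as a serializable message. */
final class Invocation implements Serializable {
    private static final long serialVersionUID = 1L;

    final String method;
    final Serializable[] args;

    Invocation(String method, Serializable... args) {
        this.method = method;
        this.args = args;
    }

    /** Serializes this invocation; a real sender would write to a socket stream instead. */
    byte[] encode() {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reconstructs an invocation on the receiving side. */
    static Invocation decode(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Invocation) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The sender does not wait for a result after writing the message, which is what makes the invocation asynchronous; any reply would travel back as a separate message.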
As an example of high-level specification using a combination of dataflows with visual layout, the technique enables easy embedding of components in HTML pages. Some standard components have been defined, including hyperlink agents which enable the system to track the users' pointing and clicking behaviour. While in this case the layout / dataflow language has a textual form, similar specifications may of course be generated using a visual tool.
While HTML is only two-dimensional and rather static, it is likely that a similar embedding scheme will also work for 3D systems, for example, using VRML. One or more VEAL agents might be embedded in each VRML object, interacting with their local VRML code through internal messages, and integrating it with the rest of the system. A browser agent might control the loading and unloading of VRML objects, driven by requests from the VRML agents. Issues regarding the conceptual structuring of realistic 3D environments are discussed in a paper elsewhere in these proceedings ([Zwiers et al., 2000]).
Figure 1 already suggested the use of traditional entity-relationship diagrams (ERDs) to model the concepts found in a system. In general, an ERD of the objects in a VE or GUI may be used to specify object types, their relations, and their multiplicity. As an example, consider the embodiment of a dialogue agent in a multi-user VE. Should the agent's presence be modelled in the same way as users' presence? Should there be a private agent for each user or should it have a global presence? In this section, we will demonstrate how ERDs relate to the technique, and how such different choices may be tried out by re-plugging components.
In fact, ERDs map quite closely to the specification technique. The structure used by the VEAL standard web component library is given in figure 4. The diagram documents that each page may contain multiple components, that one user may be viewing multiple pages, and that users participate in sessions, with participation being mediated through specific web page components. The entities and relations in the ERD have a one-to-one mapping to agents and relations in the VEAL specifications.
Figure 4: Entity-relationship diagram for a multi-user Web system.
The standard component library enables simple user interfaces to be specified easily. For example, a simple multi-user chat room may be specified using the following piece of HTML:
1 <HTML><HEAD> <TITLE>Web Chat</TITLE> </HEAD><BODY>
2 <applet code="textarea.class" width=600 height=400 align="middle">
3 <param name="name" value="chatlog">
4 <param name="routes" value="ve: textout.text_value > new_text">
5 </applet>
6 <BR>
7 <applet code="textfield.class" width=600 height=30 align="middle">
8 <param name="name" value="textout"> </applet>
9 <applet code = "ve_user_daemon.class" width = 0 height = 0> </applet>
10 <applet code = "webpage_daemon.class" width = 0 height = 0>
11 <param name="page_name" value="web_chat"> </applet>
12 </BODY></HTML>
A standard session agent needs to be started separately. Then, users may log in by loading this HTML page. The page embeds four components. The ve_user_daemon and webpage_daemon correspond to the user and webpage entities in figure 4, respectively. These agents actively try to connect to other agents so as to form the relational structure in the figure. Two agents (textarea and textfield) are wrappers around standard GUI components. The textarea is simply made to listen to the text_value output of textout nodes (line 4), in other words, to the text fields' output.
We may want to specify a dialogue agent that tracks the utterances of each user separately and participates in the conversation. If we don't want to change the above code, we may choose to model the agent's participation in the system to `simulate' that of a regular user, i.e. it operates from a web page. This can be done by means of the following code:
 1 <HTML><HEAD></HEAD><BODY>
 2 <applet code="dialogue_agent.class" width=600 height=400 align="middle">
 3   <param name="name" value="textout">
 4   <param name="routes" value="ve: textout.text_value > user_utterances">
 5 </applet>
 6 <applet code = "ve_user_daemon.class" width = 0 height = 0> </applet>
 7 <applet code = "webpage_daemon.class" width = 0 height = 0>
 8   <param name="page_name" value="web_chat"> </applet>
 9 </BODY></HTML>
Note that HTML is now only used as a dataflow specification, as the dialogue agent requires no screen output. The webpage_daemon and ve_user_daemon take care of the participation in the session. The dialogue agent might have the following behaviour:
 1 agent dialogue_agent(String name) {
 2   routescope webpage=is_on_webpage.servers(is_on_webpage.clients(self))
 3   routescope ve = is_on_webpage.servers(is_on_webpage.clients(USERDAEMONS_IN_VE))
 4   routenode.have_name(name);
 5   AttrSet dialogue_mgrs = new AttrSet();
 6   when requests is_on_webpage a { is_on_webpage.serverack(a); }
 7   when enters user_utterances.routesources() new_utterances {
 8     foreach (user,String utterance; new_utterances) {
 9       if (routesource_agent(user).id() != self.id()) {
10         DialogueManager dm=(DialogueManager)dialogue_mgrs.attr_of(user.id());
11         if (dm == null) {
12           dialogue_mgrs.add(user.id(), dm = new DialogueManager());
13         }
14         String reply = dm.process(utterance);
15         if (reply!=null) routenode.have_field("text_value",reply);
16       }
17     }
18   }
19 }
Since it is a web page component, it accepts is_on_webpage relations requested by the webpage_daemon (line 6). The agent then enables route sources from within its own web page (line 2) and from any other web page within the VE, as mediated through the ve_user_daemons (line 3). Note that USERDAEMONS_IN_VE is a macro specifying an often-used query. A route handler user_utterances is specified in lines 7-18; it states explicitly that the agent maintains a separate dialogue state (encapsulated in DialogueManager) for each user. The new utterances are handled one by one using a foreach loop, which executes the loop body for each element in the set after assigning the element and its attribute to the variables user and utterance. The replies coming from the dialogue manager (line 14) are published through a route field text_value (line 15). Note that the dialogue agent must explicitly ignore its own utterances, which it does by comparing the source of each utterance with its own identity (line 9). The global structure of this scheme is illustrated in figure 5.
Figure 5: `Web chat' with one user and a dialogue agent logged in.
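The per-user bookkeeping of the dialogue_agent handler (lines 9-15 of its listing) can be restated in plain Java. The sketch below is only an analogy under stated assumptions: TurnCountingManager is a hypothetical stand-in for DialogueManager, user identities are plain strings, and the list published stands in for the text_value route field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for DialogueManager: it only counts turns. */
final class TurnCountingManager {
    private int turns = 0;

    String process(String utterance) {
        turns++;
        return "turn " + turns + ": " + utterance;
    }
}

final class PerUserDialogueSketch {
    private final String selfId;
    // One dialogue state per user, created lazily on the first utterance.
    private final Map<String, TurnCountingManager> managers = new HashMap<>();
    // Stands in for publishing through the text_value route field.
    final List<String> published = new ArrayList<>();

    PerUserDialogueSketch(String selfId) {
        this.selfId = selfId;
    }

    void onUtterance(String userId, String utterance) {
        if (userId.equals(selfId)) {
            return; // ignore our own replies, as in line 9 of the listing
        }
        TurnCountingManager dm =
            managers.computeIfAbsent(userId, id -> new TurnCountingManager());
        String reply = dm.process(utterance);
        if (reply != null) {
            published.add(reply);
        }
    }

    public static void main(String[] args) {
        PerUserDialogueSketch agent = new PerUserDialogueSketch("agent");
        agent.onUtterance("alice", "hi");
        agent.onUtterance("agent", "echo of my own reply");
        System.out.println(agent.published);
    }
}
```

Without the identity check, the agent would treat each of its own replies as a new utterance and answer itself indefinitely, since it publishes under the same node name as the users' text fields.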
Various alternatives are possible if we are willing to change the original code. For example, the dialogue agent may conveniently be run stand-alone by simply including an extra glue agent within each user's page which listens specifically to the dialogue agent. Alternatively, we might wish to include the dialogue_agent in each user's webpage by specifying an extra applet in the former HTML code. Now, each user gets a private dialogue agent. The latter scheme is obtained by simply re-routing the existing agents:
 1 <HTML><HEAD> <TITLE>Web Chat</TITLE> </HEAD><BODY>
 2 <applet code="dialogue_agent.class" width=600 height=400 align="middle">
 3   <param name="name" value="dialogueagent">
 4   <param name="routes" value="webpage: textout.text_value > user_utterances">
 5 </applet>
 6 <applet code="textarea.class" width=600 height=400 align="middle">
 7   <param name="name" value="chatlog">
 8   <param name="routes" value="
 9     ve: textout.text_value > new_text
10     webpage: dialogueagent.text_value > new_text ">
11 </applet>
12 <BR>
13 <applet code="textfield.class" width=600 height=30 align="middle">
14   <param name="name" value="textout"> </applet>
15 <applet code = "ve_user_daemon.class" width = 0 height = 0> </applet>
16 <applet code = "webpage_daemon.class" width = 0 height = 0>
17   <param name="page_name" value="web_chat"> </applet>
18 </BODY></HTML>
The route specifications of the dialogue agent (line 4) and the textarea (lines 9-10) specify the new structure. This scheme is illustrated in figure 6.
Figure 6: `Web chat' with one user logged in, having a private dialogue agent.
In this section, a more complex example VE application is discussed. Locations in the environment are modelled using web pages. Other web pages may present textual information. Inside the VE is a dialogue agent, which is aware of the pages each user is viewing, and the objects each user is pointing to.
This application is in fact a two-dimensional version of the Virtual Music Centre (VMC) ([Nijholt et al., 1998]), which is a 3D web-based VE. While this version is only static 2D and does not contain fancy animations like the real VMC, it does support multiple users and a dialogue agent at the same time, and adds some environmental awareness to the dialogue agent.
The application was actually developed by starting with a screen layout sketch (figure 7). Such a sketch is useful for reasoning about alternative solutions. Conveniently, the structure of the graphical objects in the sketch may be made to correspond closely to that of agents in the system.
Figure 7: Sketch of two-dimensional VMC.
The basic idea is to have one `chat' window, like the example application in the previous section. A similar solution is found in the `real' VMC, in both the single-user and multi-user versions. In this chat window, there are also facilities for choosing the user's appearance, that is, a name and avatar. Next to the chat window, there may be VE windows or regular Web pages. VE windows contain an image displaying the VE action, and a small map window.
The screen layout shows one way to display these three kinds of window, namely by dividing the screen into three areas, one for the chat window, one for a VE window, and one for a Web page. This setup may be implemented using HTML frames. In a regular Web browser though, the user is free to open more windows at will. It is even possible that a user is present in multiple VE locations at once. At first glance, there seems to be no good reason not to allow this kind of freedom, and the system can easily be made to support these situations.
This layout and some basic behaviour are readily implementable in HTML using standard agents. The chat functionality was implemented in the same way as in the previous example. The VE locations were modelled using a standard canvas agent, which can be used to specify an image with clickable areas that bring the user to other locations or bring up regular web pages. This resulted in a prototype which illustrates the chat interface and enables browsing around the (still static) world.
Managing the users' appearances in the VE was next on the menu. To accomplish this, three custom agents were created. The general setup is illustrated in figure 8. A custom agent room_listener was added to the chat window, which converts the user's avatar selection into a set of graphics, and publishes this information. The room_listener also has a route handler converting positional information into a telepointer graphic. This way, telepointers could be added by routing the mouse position published by each VE canvas to the room_listener. In order to simplify other queries, this agent also publishes the set of all rooms the user is in. Each VE location picks up relevant avatar information by means of another custom agent avatar_listener, embedded in the window, which publishes the avatars of all users that are currently in that location. The resulting information is routed to the location's canvas agent. A third custom agent user_textout is used to concatenate the user name to user utterances.
Figure 8: Layout and dataflows of the second VMC prototype.
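The selection performed by the avatar_listener, publishing the avatars of all users currently in one location, can be sketched in plain Java as follows. The record UserPresence and the method avatarsIn are hypothetical names; the sketch assumes that each room_listener publishes one avatar name together with the set of rooms its user is in, and it ignores telepointers and graphics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** What one room_listener is assumed to publish for its user. */
final class UserPresence {
    final String avatar;
    final Set<String> rooms;

    UserPresence(String avatar, String... rooms) {
        this.avatar = avatar;
        this.rooms = new HashSet<>(Arrays.asList(rooms));
    }
}

final class AvatarListenerSketch {
    /** Avatars of all users currently in the given room, sorted for stable output. */
    static List<String> avatarsIn(String room, List<UserPresence> users) {
        List<String> avatars = new ArrayList<>();
        for (UserPresence user : users) {
            if (user.rooms.contains(room)) {
                avatars.add(user.avatar);
            }
        }
        Collections.sort(avatars);
        return avatars;
    }

    public static void main(String[] args) {
        List<UserPresence> users = Arrays.asList(
            new UserPresence("cat", "lobby", "hall"), // present in two rooms at once
            new UserPresence("dog", "hall"));
        System.out.println(avatarsIn("hall", users));
    }
}
```

A user who has several VE windows open simply appears in the result for each of those rooms, which matches the freedom allowed by the screen layout.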
Finally, a dialogue agent was added to obtain a full prototype. Like the first solution of the text-based example, the dialogue agent is global to the system, and logs in like a regular user. The dialogue agent is a simplified version of the one in the `real' VMC, which is called Karin. Basically, Karin uses a parser which converts a natural language utterance to a tuple which contains a fixed number of fields, relevant to the application domain. One of the fields is the requested operation. Zero or more of these fields may actually be filled in for each utterance. The dialogue manager then reacts by interpreting the tuple with respect to the current dialogue state, which may be thought of as a relatively simple state automaton. Often, users just have queries which Karin answers by presenting query results.
The parser is also capable of resolving anaphoric references by using information from previous utterances. This information is stored in a context object, which contains a history list of tuples. Karin was made `context-aware' by adding information from other modalities to this context. Two kinds of context-awareness are modelled: the objects the user is pointing to, and the pages the user is viewing. The user can issue a query by pointing at an object in the VE (such as a poster) and typing `This'. Karin is similar in structure to the dialogue_agent in the previous example.
 1 agent karin(String name) {
 2   ... some initialisation code ...
 3   AttrSet user_infos = query vmc_user_info.has(is_on_webpage.servers(
 4       is_on_webpage.clients( USERDAEMONS_IN_VE ) ));
 5   AttrSet my_page_name = query webpage_info.has(is_on_webpage.clients(self));
 6   AttrSet dialogue_mgrs = new AttrSet();
 7   when requests is_on_webpage a { is_on_webpage.serverack(a); }
 8
 9   when changes vmc_user_utterance.has(is_on_webpage.servers(
10       is_on_webpage.clients( USERDAEMONS_IN_VE ) )) user_utterances {
11     foreach (usr,vmc_user_utterance utt; user_utterances) {
12       foreach (i,webpage_info wpinfo; my_page_name) {
13         vmc_user_info user_info = (vmc_user_info)
14             user_infos.attr_of(usr.id());
15         if (user_info!=null) {
16           foreach (r,webpage_info roominfo; user_info.pages) {
17             if (wpinfo.name.equals(roominfo.name)) {
18               // User `usr' is in my room. Get user's context.
19               DialogueManager dm = (DialogueManager)dialogue_mgrs.attr_of(r.id());
20               if (dm == null) {
21                 dialogue_mgrs.add(r.id(), dm = new DialogueManager());
22               }
23               String answer = dm.process(
24                   utt.contents, // The utterance
25                   user_info.pages.attrs_vector(), // Names of pages being viewed
26                   user_info.pointing, // Names of items being pointed at
27                   db // handle to database
28               );
29               if (answer!=null) routenode.have_field("text_value",answer);
30             }
31           }
32         }
33       }
34     }
35   }
36 }
Lines 3 and 4 specify the query resulting in the information from other modalities. Line 5 queries the web page the agent is on. Lines 9-35 specify the reaction to new user utterances. In lines 12-17, the agent checks if the user is in the same room as the agent. Lines 19-29 are similar to the utterance handling found in dialogue_agent, except that extra context information is passed to the dialogue manager.
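How pointing information might be combined with the utterance history to resolve a typed `This' can be illustrated with a toy Java sketch. It is not Karin's parser: the class DialogueContextSketch is hypothetical, the history holds plain object names instead of tuples, any other utterance is treated as directly naming an object, and the resolution order (pointed-at object first, then the most recent referent) is an assumption made for the illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Locale;

final class DialogueContextSketch {
    // Most recent referent first; stands in for the history list of tuples.
    private final Deque<String> history = new ArrayDeque<>();
    // Names of the items the user is currently pointing at.
    private List<String> pointing = new ArrayList<>();

    void setPointing(List<String> items) {
        pointing = new ArrayList<>(items);
    }

    /** Returns the object the utterance refers to, or null if it cannot be resolved. */
    String resolve(String utterance) {
        String word = utterance.trim().toLowerCase(Locale.ROOT);
        boolean demonstrative = word.equals("this") || word.equals("that");
        String referent;
        if (!demonstrative) {
            referent = utterance.trim();      // toy case: the utterance names the object
        } else if (!pointing.isEmpty()) {
            referent = pointing.get(0);       // assumed: pointing takes precedence
        } else {
            referent = history.peekFirst();   // fall back on history; null if empty
        }
        if (referent != null) {
            history.addFirst(referent);
        }
        return referent;
    }

    public static void main(String[] args) {
        DialogueContextSketch context = new DialogueContextSketch();
        context.setPointing(Arrays.asList("poster"));
        System.out.println(context.resolve("This"));
    }
}
```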
Most specifications were found to be concise. Typically, no more than one page of code was required per agent. The amount of additional VEAL code in the four non-standard agents is less than 200 lines.
Surprisingly, asynchronous communication did not pose any serious problems, which may be attributed to the relatively simple dynamics of most components. Sometimes, though, concurrency introduces subtle quirks. Relations are created in a different order on each run, so information also becomes available in a different order each time. This necessitates extra checks, such as null pointer checks (karin, line 15). Still, the number of problems encountered due to asynchrony and concurrency was surprisingly limited. Some experience with other example applications indicates that problems occur especially in applications involving concurrent access to shared resources (using relations to model access rights), or the dynamic maintenance of complex structural constraints in general. Often, care had to be taken that the delays introduced by asynchrony did not lead to states that the system did not account for. A more precise account of exactly which kinds of systems are difficult to model would be interesting.
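The ordering issue can be made concrete with a small Java sketch. Note that it shows a variation rather than the approach of the listings: where karin skips an utterance whose user information has not yet arrived (the null check), the hypothetical class below buffers such utterances and handles them once the information arrives, so that both arrival orders give the same result. All names are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LateInfoSketch {
    private final Map<String, String> userInfo = new HashMap<>();
    // Utterances that arrived before their user's information did.
    private final Map<String, List<String>> pending = new HashMap<>();
    final List<String> handled = new ArrayList<>();

    void onUtterance(String user, String text) {
        String info = userInfo.get(user);
        if (info == null) {
            // The counterpart of the null check: defer instead of dropping.
            pending.computeIfAbsent(user, u -> new ArrayList<>()).add(text);
            return;
        }
        handled.add(info + ": " + text);
    }

    void onUserInfo(String user, String info) {
        userInfo.put(user, info);
        List<String> waiting = pending.remove(user);
        if (waiting != null) {
            for (String text : waiting) {
                handled.add(info + ": " + text);
            }
        }
    }

    public static void main(String[] args) {
        LateInfoSketch agent = new LateInfoSketch();
        agent.onUtterance("alice", "hi");    // arrives before the user info
        agent.onUserInfo("alice", "lobby");
        System.out.println(agent.handled);
    }
}
```

Whether buffering or skipping is preferable depends on the application; buffering costs extra state in every agent that needs it.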
Actually, the most common errors in this example were not concurrency problems, but rather, type errors in the queries and routes. Sometimes a set was expected while a set of sets was actually delivered, and sometimes a property and that property's contents were interchanged accidentally. These kinds of errors could be detected by a compile-time type checking system.
The example systems that were written displayed reasonable robustness against component failures. For example, if an agent on a page threw an exception for whatever reason, the page could be `revived' by simply reloading it, which restarts the agents. If larger parts of the system failed, other parts would typically keep running. In fact, development was typically done while part of the system kept running. In `open' environments like the Web, such robustness is desirable. In some environments, such as MOOs, where people actually do their development within a running system, such robustness is crucial. Since the example systems so far have not focussed primarily on robustness in `open' environments, a more explicit account of this aspect could still be given.
The query facilities were sometimes found too limited. They should be extended with at least the ability to match specific values of properties, rather than just their existence. This would also enable routes to be specified directly in terms of regular queries. Furthermore, instead of the simple route specifications currently provided, full queries could be passed as agent parameters. In fact, it may even be a good idea to specify most of each agent's queries at runtime (given appropriate shorthands, maximising readability of the query specifications). The agent then only has to specify that it expects certain dynamically-changing information sets to which it reacts in a specific way.
With sufficiently powerful query facilities, the room_listener agent might have been reduced to providing avatar information only, and the avatar_listener glue agent might have been done away with altogether. Also, some rather ugly pieces of code (such as karin, lines 13-17) could be greatly simplified.
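The difference between matching the existence of a property and matching its value can be sketched in Java as follows. The classes AgentRecord and QuerySketch and the method hasValue are hypothetical; they do not reflect VEAL's query syntax or its implementation, and properties are reduced to single string values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AgentRecord {
    final String id;
    final Map<String, String> properties = new HashMap<>();

    AgentRecord(String id) {
        this.id = id;
    }

    AgentRecord with(String property, String value) {
        properties.put(property, value);
        return this;
    }
}

final class QuerySketch {
    /** Existence match: ids of all agents that have the property at all. */
    static List<String> has(List<AgentRecord> agents, String property) {
        List<String> ids = new ArrayList<>();
        for (AgentRecord agent : agents) {
            if (agent.properties.containsKey(property)) {
                ids.add(agent.id);
            }
        }
        return ids;
    }

    /** Value match (the proposed extension): the property must equal the given value. */
    static List<String> hasValue(List<AgentRecord> agents, String property, String value) {
        List<String> ids = new ArrayList<>();
        for (AgentRecord agent : agents) {
            if (value.equals(agent.properties.get(property))) {
                ids.add(agent.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<AgentRecord> agents = Arrays.asList(
            new AgentRecord("p1").with("webpage_info", "lobby"),
            new AgentRecord("p2").with("webpage_info", "hall"));
        System.out.println(has(agents, "webpage_info"));
        System.out.println(hasValue(agents, "webpage_info", "hall"));
    }
}
```

With only the existence match, an agent has to fetch every page and compare names itself, as karin does; a value match moves that comparison into the query.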
We haven't actually shown that the system is suitable for more than prototyping. At the least, if the model should be usable for `mature' or large-scale systems, the efficiency of the current implementation should be worked on. In particular, the current `central server' implementation should be replaced by a more distributed one.
A Specification Technique for Building Interface Agents
in a Web Environment
The translation was initiated by Boris van Schooten on Tue Oct 24 12:50:34 MET DST 2000