| Submission date | Author | #dialogues/#utterances |
| - | Boris van Schooten | 7/54 |
| 20 jan 2005 | Jori Mur | 5/26 |
| 20 jan 2005 | Piroska Lendvai | 4/31 |
| 21 jan 2005 | Lonneke van der Plas | 5/20 |
| 21 jan 2005 | Simon Keizer | 6/53 |
| 21 jan 2005 | Erwin Marsi & Emiel Krahmer | 7/73 |
| 23 jan 2005 | Mariët Theune | 6/49 |
| 31 jan 2005 | Johan de Veth & Els den Os | 3/24 |
| 11 feb 2005 | Erik Tjong Kim Sang | 6/61 |
| 23 feb 2005 | Rieks op den Akker | 2/46 |
| Total | | 55/391 |
Below is the original call for dialogues, together with the specification of the notation format.
Submissions should be sent to: schooten@cs.utwente.nl. Extended deadline: 21 February 2005
If you like, you can specify speech by an "s" and text by a "t" after the agent's name, like this: "Us:", "Ss:", "Ut:", "St:", "Sst:". Otherwise, the choice of speech or text is left open.
Actions in modalities other than speech or text can be denoted by descriptions in square brackets. You can denote the precise timing of a nonlinguistic action by inserting it at the appropriate place in the linguistic utterance. If you do not want to indicate precise timings, put the nonlinguistic actions on a separate line.
A dialogue should end with the word "end" specified on a separate line.
Comments may be added between /* and */ symbols, or between the // symbol and the end of a line (i.e. C++-style comments).
Special events in an utterance (sounds etc.) may be denoted between curly braces { and }.
Example:
/* multimodaal voorbeelddialoog */
U: Wat is een goede oefening tegen een muisarm? // variant op corpusvraag
S: Een belangrijke oefening is de strekoefening, aangegeven in figuur 1. [plaatje van naar voren gestrekte armen]
U: moet ik [wijst op arm in plaatje] die nu naar voren houden?
S: Ja, houd de armen naar voren en strek ze volledig.
end
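The notation rules above are simple enough to read mechanically. The following is a minimal sketch of such a reader, not part of the original call: the names `Utterance` and `parse_dialogue` are hypothetical, and the sketch assumes one utterance per line, attaches any unlabelled line (for example a nonlinguistic action on a separate line) to the preceding utterance, and treats every `//` as the start of a comment.

```python
import re
from dataclasses import dataclass, field

# Speaker label: U (user) or S (system), optionally followed by "s" (speech)
# and/or "t" (text), then a colon. Examples: "U:", "Us:", "St:", "Sst:".
LABEL = re.compile(r"^(?P<agent>[US])(?P<mod>s?t?):\s*(?P<body>.*)$")
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)  # /* ... */
LINE_COMMENT = re.compile(r"//[^\n]*")               # // up to end of line
ACTION = re.compile(r"\[([^\]]*)\]")                 # [nonlinguistic action]
EVENT = re.compile(r"\{([^}]*)\}")                   # {special event}


@dataclass
class Utterance:
    agent: str        # "U" or "S"
    modalities: str   # "", "s", "t" or "st"; "" means the choice is left open
    text: str         # linguistic content with actions and events removed
    actions: list = field(default_factory=list)
    events: list = field(default_factory=list)


def _split_markup(body):
    """Separate the plain text from [actions] and {events}."""
    actions = [a.strip() for a in ACTION.findall(body)]
    events = [e.strip() for e in EVENT.findall(body)]
    text = EVENT.sub(" ", ACTION.sub(" ", body))
    return " ".join(text.split()), actions, events


def parse_dialogue(source):
    """Parse one dialogue; parsing stops at a line containing only "end"."""
    source = LINE_COMMENT.sub("", BLOCK_COMMENT.sub("", source))
    utterances = []
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "end":
            break
        match = LABEL.match(line)
        if match:
            text, actions, events = _split_markup(match.group("body"))
            utterances.append(Utterance(match.group("agent"),
                                        match.group("mod"),
                                        text, actions, events))
        elif utterances:
            # Assumption: an unlabelled line belongs to the previous utterance.
            text, actions, events = _split_markup(line)
            last = utterances[-1]
            last.text = " ".join(part for part in (last.text, text) if part)
            last.actions.extend(actions)
            last.events.extend(events)
        else:
            raise ValueError("line without speaker label: " + line)
    return utterances


example = """/* multimodaal voorbeelddialoog */
U: Wat is een goede oefening tegen een muisarm? // variant op corpusvraag
S: Een belangrijke oefening is de strekoefening, aangegeven in figuur 1. [plaatje van naar voren gestrekte armen]
U: moet ik [wijst op arm in plaatje] die nu naar voren houden?
S: Ja, houd de armen naar voren en strek ze volledig.
end
"""

for utterance in parse_dialogue(example):
    print(utterance.agent, utterance.modalities or "-", utterance.text, utterance.actions)
```

Note that this sketch records which actions occur in an utterance but discards their position within it; a reader that needs the precise timing would have to keep the character offset of each bracketed action as well.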
The IMIX demonstrator is a multimodal dialogue system which answers questions in the RSI or medical domain using a question answering (QA) engine. We produced a first version of IMIX which answers isolated spoken or typed questions from the user with text, speech, and pictures. To test this first version of the demonstrator, we have already produced a corpus of 487 example questions in the RSI domain (from which 10 were later selected).
In a similar vein, we now wish to produce a number of example dialogues for the second version of IMIX, which will incorporate a dialogue manager. The corpus may be used to cover the range of possibilities and abilities of an IMIX-like dialogue system, and to serve as informal input and test input for the second demonstrator (and the dialogue manager in particular).
Multimodal dialogue in the next IMIX version means that the system can output speech, text, and pictures (diagrams, photos, medical images), and that the user can speak, type text, and point at the screen or encircle an object, a part of an object, or a word on the screen with a pen. The user can point and speak simultaneously. The user may also combine pointing with text typing (though this will not be simultaneous).
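The constraints on combining user input modalities can be written down as a small lookup. This is only an illustrative sketch: the names `Modality` and `can_be_simultaneous` are hypothetical, and the treatment of speech combined with typing, which the text above does not mention, is an assumption (the sketch treats it as not simultaneous).

```python
from enum import Enum


class Modality(Enum):
    SPEECH = "speech"
    TEXT = "text"          # typed text
    POINTING = "pointing"  # pointing at or encircling something with a pen


# Combinations described above as possible at the same moment.
# Text + pointing is possible within one turn, but not simultaneously.
SIMULTANEOUS_OK = {
    frozenset({Modality.SPEECH, Modality.POINTING}),
}


def can_be_simultaneous(modalities):
    """Return True if the given user modalities may be used at the same time."""
    if len(modalities) <= 1:
        return True
    return frozenset(modalities) in SIMULTANEOUS_OK


print(can_be_simultaneous({Modality.SPEECH, Modality.POINTING}))
print(can_be_simultaneous({Modality.TEXT, Modality.POINTING}))
```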
IMIX currently uses GUI buttons to start the user's turn and to go on to the next question. We assume that the dialogue in future versions will be less GUI-controlled, so that it flows more smoothly. At the least, the "New question"/"Stop" buttons will be removed. We also assume, however, that dialogue turns cannot be interrupted (i.e. no barge-in), except by a crude method such as pressing a button. In this corpus we will not account for barge-in.
Important differences between user speech and user text are the limited performance of ASR technology and the extra (prosodic and perhaps other) information conveyed through speech. Important differences between system speech and system text are the ability to add prosody in speech, the perceptual persistence of text, the ability for text to be visually related to pictures, and the ability for the user to point at a word or piece of text. In some cases, the system may produce lists (for example, lists of answers or options), which may be presented in table format. The user may select an item naturally by pointing at it.
The level of complexity of the dialogues should be realistic as to what a dialogue system might produce, but need not be limited to what will be supported by IMIX. Even if some kinds of dialogue moves will never be supported, the occurrence of certain problems and the possibility of the user trying to initiate certain kinds of dialogue still have to be accounted for.
We assume here that the system will react intelligently (like a "wizard of oz"). We assume that misunderstandings by both system and user will be limited to a level similar to that of human-human dialogue. We also assume that failures of subsystems that are obviously to be expected (such as failed speech recognition) do not have to be covered extensively in this corpus, since these are non-IMIX-specific problems, often with standard solutions. Nevertheless, if you have ideas about specific failures or interactions of failures that are of particular interest here, you are invited to illustrate them.
User utterances.
The dialogue may start with a question from the example question list. Other kinds of questions are most certainly welcome, as they enrich the corpus both with different classes of questions and with questions that cause problems which a dialogue will need to solve. After the "first question", users may for example pose follow-up or clarification questions. Follow-up questions are new questions that refer to a system answer; clarification questions ask the system to zoom in further on terms or topics in a system answer.
Besides questions, the user may perform various other dialogue acts, such as reacting to system initiative, indicating a misunderstanding or an unsatisfactory answer, and asking counter-questions (answering a question with a question). Also possible are politeness utterances, such as "Ok, bedankt" and "Goedendag".
User utterances may be either terse and telegram-style (as may be expected from routine users or users experienced with other dialogue systems) or more verbose, with possible sidetracks (as may be expected from incidental or naive users).
System utterances.
How will the system be able to react? It may be helpful to classify some of the possible system utterances into different classes.
Note that an important part of multimodal dialogue is formed by linguistic references to nonlinguistic modalities. We suggest the following examples:
S: [shows a list of terms]
U: De tweede uit de lijst.

S: [shows a picture]
U: Wat betekent dat plaatje?

U: [points at a picture of a body]
S: Bedoel je de arm of de hand?

We suggest the following example nonlinguistic actions:

U: [encircles or underlines the word "muisarm"]
U: [encircles a part of a picture]
U: [encircles multiple objects on the screen, one after the other]
S: [shows a picture of for example a body part, a work setting]
S: [shows a selectable list]
S: [highlights a picture or a certain part of a picture]
In the current IMIX, an answer is usually a single sentence. Imogen adds a picture to this, though there is no serious database of pictures at the moment. In future versions, however, an answer may consist of several sentences, forming a more self-contained story. Appropriate pictures will be added where available from the QA corpus.
Boris van Schooten