Annotation manual for IMIX dialogue act classes

Version: 16 feb 2006

The IMIX dialogue acts are classified into 13 classes. The first 7 are variants of regular followup questions, which can potentially be answered by a QA system; they differ in how the system should handle them. The remaining 6 classes are other kinds of followup utterances.

Some followup utterances consist of multiple sentences. In some cases they should be treated as a unit (for example, a multi-sentence followup question or a multi-sentence remark). In other cases only the most important sentence should be considered (for example, when that sentence is itself a rewritable followup question).

The followup question classes

The 7 followup question classes are distinguished mainly by how the question can be rewritten into a self-contained question, by whether it cannot or need not be rewritten, or by whether it should be handled in a different manner.

The classes "anaphor", "anaphor-pp", "other-pp", and "elliptic" are potentially rewritable using simple rewriting techniques. The class "self-contained" indicates that the question does not need rewriting; the class "referencing-other" indicates that the question may be rewritable, but cannot be rewritten using any of the simple techniques we identify.

The class "missing-referent" refers to the user requesting information about something that appears to be missing in the answer text fragment.
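As a rough aid for checking annotated labels, the class inventory can be written down as a small Python sketch. The class names are taken from this manual (the six non-question classes are described in the section "The other utterance classes"); the constant and function names are illustrative assumptions, not part of any IMIX tool.

```python
# Class names copied from this manual; all identifiers below are hypothetical.
FOLLOWUP_QUESTION_CLASSES = (
    "anaphor", "anaphor-pp", "other-pp", "elliptic",
    "referencing-other", "missing-referent", "self-contained",
)
OTHER_CLASSES = (
    "verify-question", "negative-question", "negative-statement",
    "reformulation", "acknowledge", "other",
)
ALL_CLASSES = FOLLOWUP_QUESTION_CLASSES + OTHER_CLASSES

# Classes the manual describes as rewritable with simple techniques.
SIMPLE_REWRITE_CLASSES = frozenset({"anaphor", "anaphor-pp", "other-pp", "elliptic"})


def is_valid_label(label: str) -> bool:
    """Return True if the label is one of the 13 classes (case-insensitive)."""
    return label.strip().lower() in ALL_CLASSES


def is_simply_rewritable(label: str) -> bool:
    """Return True if the class is one of the four simple rewriting classes."""
    return label.strip().lower() in SIMPLE_REWRITE_CLASSES
```

A check such as is_valid_label("anaphor-pp") can be run over a column of labels to catch typing errors before the annotations are merged.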

anaphor

Summary: The sentence can be rewritten by replacing each anaphor in it with a noun phrase (NP, zelfstandige naamwoordsgroep) antecedent from a previous utterance.

Detail:

Examples:

U1:Waar staat de term grondstofwisseling voor?
S1:De minimale stofwisselingssnelheid, nodig om het organisme normaal te laten functioneren, heet grondstofwisseling.
U2:Hoe meet je dat?
(rewrites to: Hoe meet je grondstofwisseling?)

U1:Wat is een cytostaticum?
S1:Een Cytostaticum (meervoud cytostatica) is een lid van een groep medicijnen die gebruikt worden bij de behandeling van kanker. Een cytostaticum beoogt de deling van cellen te stoppen.
U2:Wordt dit medicijn voor alle kankersoorten gebruikt?
(rewrites to: Worden cytostatica voor alle kankersoorten gebruikt?)

(Note that the rewritten question is adjusted to agree in number with the plural antecedent "cytostatica": "Wordt" becomes "Worden".)
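The substitution described for this class can be sketched in Python as follows. This is a minimal illustration under the assumption that the annotator (or some earlier processing step) has already identified each anaphor and its antecedent; the function name substitute_anaphors is hypothetical.

```python
import re
from typing import Mapping


def substitute_anaphors(question: str, antecedents: Mapping[str, str]) -> str:
    """Replace each listed anaphor (whole word, case-insensitive) with its antecedent NP."""
    rewritten = question
    for anaphor, antecedent in antecedents.items():
        pattern = re.compile(r"\b" + re.escape(anaphor) + r"\b", re.IGNORECASE)
        # A function replacement keeps any backslashes in the antecedent literal.
        rewritten = pattern.sub(lambda _m, np=antecedent: np, rewritten, count=1)
    return rewritten


print(substitute_anaphors("Hoe meet je dat?", {"dat": "grondstofwisseling"}))
# prints: Hoe meet je grondstofwisseling?
```

The sketch performs plain substitution only. For the second example it would produce "Wordt cytostatica voor alle kankersoorten gebruikt?", so the number agreement of the verb still has to be corrected separately.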

anaphor-pp

Summary: The sentence contains a "pp anaphor", an anaphor standing for a prepositional phrase (PP, voorzetselvoorwerp). The pp anaphor consists of an "er" particle ("er", "hier", "daar") and a preposition (voorzetsel), as in "hiervan" ("hier" + "van"). It should be rewritable by removing the pp anaphor and attaching a pp at the end of the sentence. This pp consists of the preposition found in the pp anaphor, followed by an appropriate NP from previous utterances.

Detail:

Examples:

U1:Waardoor ontstaat het syndroom van Frei?
S1:Het syndroom ontstaat wanneer de zenuwen die naar de speekselklier lopen beschadigd raken, waarna uitlopers van deze zenuw in de huid groeien en daar de zweetklieren bereiken.
U2:Hoe kun je hiervan genezen?
(rewrites to: Hoe kun je genezen van het syndroom van Frei?)
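The rewriting rule for this class (remove the pp anaphor, then attach preposition plus NP at the end) can be sketched in Python. This is an illustration only: the preposition list is a small, non-exhaustive assumption, the antecedent NP is supplied by hand, and the function name rewrite_pp_anaphor is hypothetical.

```python
import re

# Assumption: a short, non-exhaustive list of prepositions for illustration.
# Forms whose preposition changes shape (e.g. "ermee" for "met") are not handled.
PREPOSITIONS = ("van", "voor", "door", "over", "op", "aan", "bij", "uit", "in")
PARTICLES = ("er", "hier", "daar")

PP_ANAPHOR = re.compile(
    r"\b(?:%s)(%s)\b" % ("|".join(PARTICLES), "|".join(PREPOSITIONS)),
    re.IGNORECASE,
)


def rewrite_pp_anaphor(question: str, antecedent: str) -> str:
    """Remove the first pp anaphor and append 'preposition + antecedent' at the end."""
    match = PP_ANAPHOR.search(question)
    if match is None:
        return question  # no pp anaphor found; leave the question unchanged
    preposition = match.group(1).lower()
    # Cut the pp anaphor out of the question.
    without_anaphor = question[:match.start()] + question[match.end():]
    # Drop the final question mark and normalise whitespace before re-attaching.
    body = " ".join(without_anaphor.rstrip(" ?").split())
    return f"{body} {preposition} {antecedent}?"


print(rewrite_pp_anaphor("Hoe kun je hiervan genezen?", "het syndroom van Frei"))
# prints: Hoe kun je genezen van het syndroom van Frei?
```

Pp anaphors written as two separate words (such as "hier ... van") are not detected by this pattern and would need additional handling.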

other-pp

Summary: The sentence does not contain an explicit anaphoric phrase, but can be rewritten into a self-contained sentence by attaching a pp (prepositional phrase, voorzetselvoorwerp) at the end. The pp is constructed from an appropriate phrase (either a pp or an np) in previous utterances. (Note: such an implicit anaphor is also known as a "zero anaphor".)

Detail:

Examples:

U1:Wat zijn de symptomen van een hersentumor ?
S1:De klachten zijn hoofdpijn en vooral 's morgens braken , zonder misselijkheid .
U2:En wat zijn de genezingskansen?
(rewrites to: Wat zijn de genezingskansen van een hersentumor?)

U1:Is schizofrenie erfelijk?
S1:De reactieve schizofrenie is een reactie op moeilijke levensomstandigheden en duurt in de regel ook korter (tot enkele maanden).
U2:welke andere vormen van schizofrenie zijn er dan ?
(rewrites to: Welke vormen van schizofrenie zijn er behalve reactieve schizofrenie?)

elliptic

Summary: The question is an elliptic sentence (a sentence without a verb) that can be completed into a self-contained question by adding constituents from a specific sentence in one of the previous utterances. The added constituents consist of one or more verbs and one or more other appropriate phrases.

Detail:

Examples:

U1:Welke behandeling wordt geadviseerd bij een pianoarm?
S1:Rust, eventueel met behulp van een spalk.
U2:geen fysiotherapie ?
(rewrites to: Wordt er geen fysiotherapie geadviseerd bij een pianoarm?)

U1:Hoeveel negroïde mannen hebben een tekort aan Glucose-6-fosfaat-dehydrogenase?
S1:Ongeveer 10%
U2:En hoeveel blanke mannen?
(rewrites to: Hoeveel blanke mannen hebben een tekort aan Glucose-6-fosfaat-dehydrogenase?)

referencing-other

Referencing-other is a non-self-contained followup question that does not fall into any of the rewriting classes (anaphor, anaphor-pp, other-pp, elliptic) and is not of type missing-referent. In other words, it is a regular non-self-contained followup question for which no automatic rewriting method is known.

missing-referent

The utterance is a question asking for information that the user thinks is missing from the answer. The question can be a request for a missing antecedent (the answer contains an anaphor, but it refers to a nonexistent antecedent), or another request for missing information. This may indicate that the QA system selected a text fragment that is not self-contained. It may be seen as a request to show more of the document fragment that the answer came from, or as a request for an alternative, more self-contained answer.

A question requesting a missing antecedent typically does not have regular question form, but statement form, with "wat" in the place of the missing referent. For example: "Hartkloppingen zijn de symptomen van wat?".

Often, there are no clear syntactic cues that indicate that the user's request is about something discourse-specific. Then, the missing-referent class can only be determined by looking at the semantics of the user utterance. The following is a sufficient condition for missing-referent: a question is of class missing-referent if there is no way it can be practically rewritten so that it becomes a self-contained question.

Examples:

U1:wat zijn hartkloppingen?
S1:De patiënt neemt dit waar als hartkloppingen.
U2:Hartkloppingen zijn de symptomen van wat?

U1:Hoe lang moet ik een RSI oefening tenminste laten duren ?
S1:tenminste één minuut
U2:een minuut per wat ?

U1:Hoe wordt een schimmelinfectie vastgesteld?
S1:Artsen kunnen een schimmelinfectie vermoeden in geval van een rode, geïrriteerde uitslag in een van de gebieden die meestal worden aangetast.
U2:Maar het is dus niet zeker of er dan sprake is van een schimmelinfectie?

self-contained

A followup question that does not need rewriting and can be interpreted as a self-contained QA question. In some cases a minor rewrite, such as removing "de" or replacing "die" with "de", would improve the sentence, but such a rewrite should not be necessary to make the sentence understandable and answerable.

The other utterance classes

verify-question

A question that is meant to verify the meaning or validity of the answer. Such questions signal that the user does not trust or understand the answer. They look similar to followup questions, but they cannot be meaningfully interpreted as QA questions, because QA systems have no concept of checking the validity of something. These questions often start with "dus".

Examples:

U1:Is snurken slecht voor de gezondheid?
S1:Tot slot kunnen het zachte gehemelte en de huig van nature al lang en slap zijn. Vaak echter wordt voor het snurken echter geen oorzaak gevonden.
U2:Dus het antwoord is "nee"?

negative-question

This is an utterance which indicates that the answer is bad: incorrect or not relevant to the question. A negative-question usually has the form of a domain question (or a domain query in non-question form). Usually, it is a rhetorical question challenging the validity of the answer, or a literal repetition of (the content of) the original question. In its rhetorical form, the question looks like a verify-question. The two can be told apart by looking at the semantics of the question and the last answer.

Examples:

U1:Is snurken slecht voor de gezondheid?
S1:Tot slot kunnen het zachte gehemelte en de huig van nature al lang en slap zijn. Vaak echter wordt voor het snurken echter geen oorzaak gevonden.
U2:Maar is het nou slecht voor de gezondheid of niet?

negative-statement

An utterance, other than a negative-question, indicating that the answer is bad. Usually, it is a basic negative remark (such as "dat vroeg ik niet"), or an attempt to correct the system's answer or interpretation.

reformulation

A reformulation of the original user question, in an attempt to get a better answer. Reformulations may either be essentially the same question in a different form, or a different question with a meaning that mostly overlaps that of the original question. If the utterance mirrors the question exactly (but usually starting with "maar" or some other negative conjunction word) then it is usually a negative-question.

acknowledge

Any form of acknowledgement that indicates that the answer is valid and the user understands it, or the user accepts that there is no answer. It is usually a simple "ok" or "bedankt", or it may be a remark that the answer is unexpected or interesting.

other

Anything that could not be classified as one of the above. You will find a number of "meta" or "out of domain" type questions in this category. For example: "heb je daar een voorbeeld van?", "uit welke bron komt deze informatie?".

Decision tree

The decision tree below illustrates schematically how to distinguish between the classes.

Using the Excel sheet

Each annotation sheet contains 191-192 dialogues of three utterances. Each followup utterance (marked with FUP) has to be annotated with a class. The cell to the left of each followup utterance has a drop-down menu containing the classes.

You may want to write down a rewritten followup question to see if it is properly rewritable. This can be done in the cell below the actual followup question.

You may want to write down comments regarding your annotation choice. The column to the right of the utterances can be used for comments.