Rough notes on QA dialogue and dialogue act types.

Latest update: 2 nov 2004

User model. We start with a short account of the model that the system might maintain of the user. The user's task in our domain is to find the answer to a certain question. There will usually be another intention behind that of finding the answer, but we will not concern ourselves with that now. The dialogue between system and user will be asymmetrical: some dialogue acts will be typical for the user, others for the system. The system might estimate characteristics of the user, such as experience with the system (extra explanation and more verbose answers for beginners) and experience with the domain (we might model this as a set of questions to which the user already knows the answer).
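A rough sketch of what such a user model could look like. All names here (UserModel, sessions_with_system, known_questions, the threshold of 3 sessions) are assumptions made for illustration, not part of any existing system.

```python
from dataclasses import dataclass, field


@dataclass
class UserModel:
    """Hypothetical user model; all field names are illustrative assumptions."""

    # Experience with the system, approximated by the number of past sessions.
    sessions_with_system: int = 0
    # Experience with the domain, modelled as the set of questions
    # to which the user already knows the answer.
    known_questions: set[str] = field(default_factory=set)

    def is_beginner(self, threshold: int = 3) -> bool:
        # Assumption: a user with fewer sessions than the threshold is a beginner.
        return self.sessions_with_system < threshold

    def wants_verbose_answer(self) -> bool:
        # Beginners get extra explanation and more verbose answers.
        return self.is_beginner()

    def already_knows(self, question: str) -> bool:
        return question in self.known_questions


novice = UserModel()
expert = UserModel(sessions_with_system=10, known_questions={"what is rsi?"})
```

The system would consult wants_verbose_answer() when generating output, and already_knows() to avoid repeating information.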

Cognitive model. Also important is the model that the user maintains of the system. While IMIX more or less assumes that humans will talk to the system as they would to another human, it is likely they will behave differently from the outset, and adapt to the system's behaviour. At the social level, users are likely to drop politeness forms and unimportant words, and revert to an elliptical form of language sooner or later. At the pragmatic level, the user may look at the system as human-like, answering questions accurately and cooperatively, or as a search engine, providing one or more "hits" to the user's query. The user may adapt his/her strategies accordingly.

Domain model. QA may initially be seen as a regular search engine, which will produce a flat set of answers for each question. So, each question will have a certain list of N answers. When we add more semantics, the information inside the QA system may be seen as a compositional (or otherwise) structure of information. A question may then be seen as having as its answer a compositional hierarchy of "information units" or a heterarchy of related information units. Each question may be answered by a primary information unit, followed by links to or embeddings of related information units.
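A minimal sketch of the "primary information unit plus related units" view. The class InformationUnit, the function build_answer and the example unit identifiers are hypothetical; the unit texts are placeholders, not real content.

```python
from dataclasses import dataclass, field


@dataclass
class InformationUnit:
    """Hypothetical information unit; names are illustrative assumptions."""

    unit_id: str
    text: str
    # Identifiers of related units. Links may form any graph (a heterarchy),
    # not only a strict hierarchy.
    related: list[str] = field(default_factory=list)


def build_answer(primary_id: str, units: dict[str, InformationUnit]) -> list[InformationUnit]:
    """Return the primary unit followed by the units it links to."""
    primary = units[primary_id]
    linked = [units[uid] for uid in primary.related if uid in units]
    return [primary] + linked


units = {
    "rsi": InformationUnit("rsi", "General description of RSI.", related=["causes", "prevention"]),
    "causes": InformationUnit("causes", "Description of common causes."),
    "prevention": InformationUnit("prevention", "Description of preventive measures."),
}
answer = build_answer("rsi", units)
```

A flat search-engine style answer list corresponds to units with no links at all.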

QA dialogue strategies. For QA, we may distinguish data-driven dialogue and dialogue-act-based dialogue. Data-driven dialogue generates dialogue acts by looking at the data that is still required for the system to give an answer. Data may be missing from both the question and the answer, and the dialogue continues until all missing data is gathered. A simple and common example is slot filling. Data-driven systems do not need to have a concept of dialogue history or dialogue state. Dialogue-act-based systems, on the other hand, look at the nature of previous dialogue acts. Typically, they use a classification of dialogue act types. Both strategies may be combined.
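Slot filling can be sketched as follows. The slot names and prompts are assumptions chosen for illustration. Note that the function looks only at the data gathered so far, not at any dialogue history or dialogue act types.

```python
from typing import Optional

# Hypothetical slots needed before an answer can be given (assumption).
REQUIRED_SLOTS = ("task", "age_group")

# Hypothetical system prompts, one per slot (assumption).
PROMPTS = {
    "task": "Which task are you asking about?",
    "age_group": "Which age group are you asking about?",
}


def next_dialogue_act(filled: dict[str, str]) -> Optional[str]:
    """Return a prompt for the first missing slot, or None when all data is present."""
    for slot in REQUIRED_SLOTS:
        if slot not in filled:
            return PROMPTS[slot]
    return None
```

The dialogue loop would call next_dialogue_act after every user turn and stop asking once it returns None.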

Modalities. Each dialogue act may be coded in one or more modalities. Speech and text are typical; new here is the user's pen input.

Pen + speech: if the user uses both pen and speech in one act, it is very likely the pen is subordinate to the speech, and certainly not independent. This means the dialogue act type will be mostly indicated by the speech act. The pen action may be redundant, or the speech may lack information that can be reconstructed using the pen action.

Pen actions: Pen actions without speech will indicate a much more limited set of dialogue act types. For simple search engine type QA, simply pointing at a word is likely to indicate "do a search on this word". For pictures, pointing to a specific part of a picture may indicate "what does this mean?" or "give me more information on this". Additionally, pen actions may well be answers to a system question, and the computer may even expect the user to answer by pen, for example, by presenting a table of choices (a la GUI).
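The mapping from speech-less pen actions to dialogue act types might be sketched like this. The act type labels ("answer", "search", "more_info", "unknown") and the target kinds are assumptions made for this sketch.

```python
from typing import Optional, Sequence


def interpret_pen_action(
    target_kind: str,
    target: str,
    pending_choices: Optional[Sequence[str]] = None,
) -> tuple[str, str]:
    """Map a pen action without speech to a (dialogue act type, argument) pair."""
    # If the system has just presented a table of choices, pointing at one
    # of them is most plausibly an answer to the system's question.
    if pending_choices is not None and target in pending_choices:
        return ("answer", target)
    # Pointing at a word: "do a search on this word".
    if target_kind == "word":
        return ("search", target)
    # Pointing at part of a picture: "give me more information on this".
    if target_kind == "picture_region":
        return ("more_info", target)
    return ("unknown", target)
```

Checking for a pending system question first reflects the idea that the system may expect an answer by pen.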

An overview of possible dialogue acts. This list is meant to give a broad view of the dialogue acts that may occur. We cannot support all of these, but they should be accounted for (i.e. we must choose whether or not to support them, or whether to present an intelligible error message when the user tries to do things that the system cannot understand).

Note that the user may pose different kinds of queries with the goal of finding information. Not all queries can be naturally formulated as questions. For example: "give me a list of common computer RSIs". It is not yet clear exactly which classes of queries we are talking about.

Overview of clarification dialogues

A number of topics:

We propose a classification of clarification dialogues, based on the understanding level at which they take place, and the kinds of problems they try to solve. We distinguish the following classes:

Search Task

We may view the QA process as a shared task. If we allow more than just one question and one answer, defining an underlying task is a meaningful way to look at the dialogue.

User information need

When the user poses a follow-up question, this may be viewed as a step in the user's private task of fulfilling a specific information need. That is, behind user questions lie intentions which may be hidden, and which may help the system if it knows them (assuming it is intelligent enough, which it usually isn't). When designing a system, we must at least account for this information need.

Suppose the user asks: "do you risk RSI doing <task>?". Suppose the answer depends on age: older people run a greater risk. Then, the system may ask a clarification question: "are we talking about someone aged more than 40?". A human would likely assume that the user is talking about his own situation, and could ask instead: "what's your age?". Or perhaps the user's age is already known, and the system may answer promptly (provided we're using implicit verification on age or individual). If it turns out the user was trying to find out information about someone else, he can always reply with a pragmatic repair: "no it's not for me but for someone else".
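The age example could be sketched as below. The function name, the about_self flag and the returned strings are hypothetical; only the age boundary of 40 comes from the example above.

```python
from typing import Optional


def respond(user_age: Optional[int], about_self: bool = True) -> str:
    """Choose the next system act for a question whose answer depends on age."""
    if not about_self:
        # After a pragmatic repair ("it's for someone else") the stored
        # age of the user no longer applies, so ask about the other person.
        return "ask: are we talking about someone aged more than 40?"
    if user_age is None:
        # Assume the user asks about his own situation and ask directly.
        return "ask: what's your age?"
    group = "over 40" if user_age > 40 else "40 or under"
    # Implicit verification: the answer mentions the assumed age group,
    # so the user can object if the assumption is wrong.
    return f"answer for someone {group}"
```

Setting about_self to False models the state after the user's repair.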

User/system shared search process

The system too has its own agenda to complete. Finding an answer requires working with the underlying QA systems and database. Some questions are unanswerable: they may have many answers, no answers, or they may be ambiguous. For each case, there are several possible actions, depending on various parameters of both system and user. Consider the following example plan:

if multiple_answers:
    if too_many_answers:
        # Too many to show: try to narrow down before giving up.
        if answer_depends_on_parameter:
            ask_user_parameter_value
        elif answers_can_be_summarised:
            let_user_make_choice
        else:
            prompt_too_many_answers
    else:
        output_answers_in_table
if single_answer:
    output_answer
if qa_reported_ambiguous_question:
    let_user_make_choice
if qa_reported_unanswerable_question:
    prompt_unanswerable_question
if no_answer:
    if known_likely_related_topics:
        let_user_make_choice
    else:
        prompt_no_answer

This plan incorporates feedback from the underlying system as well as perceived user intention. We may extend such a plan with other mechanisms for tracking user intention. In particular, the dialogue history may provide information about what a user wants. The most obvious reference to the dialogue history is a follow-up question, which is often elliptical and has to be augmented with information from the previous question or answer. Various other kinds of communication based on the dialogue history are possible: for example, the user may explicitly refer to a previous dialogue act, trying to take the system back to a previous search operation. We may call this dialogue history browsing.
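The example plan above can be turned into a small runnable sketch. The QAResult fields and the threshold MAX_ANSWERS_IN_TABLE are assumptions. Unlike the plan, which lists its cases as independent tests, this sketch treats them as mutually exclusive and checks the flags reported by the QA system first; that ordering is a design choice of the sketch.

```python
from dataclasses import dataclass, field

# Assumption: more answers than this counts as "too many".
MAX_ANSWERS_IN_TABLE = 5


@dataclass
class QAResult:
    """Hypothetical result reported by the underlying QA system."""

    answers: list[str] = field(default_factory=list)
    ambiguous_question: bool = False
    unanswerable_question: bool = False
    depends_on_parameter: bool = False
    can_be_summarised: bool = False
    related_topics: list[str] = field(default_factory=list)


def choose_action(result: QAResult) -> str:
    """Return the name of the system action selected by the example plan."""
    if result.ambiguous_question:
        return "let_user_make_choice"
    if result.unanswerable_question:
        return "prompt_unanswerable_question"
    n = len(result.answers)
    if n == 0:
        # No answer: offer related topics if any are known.
        if result.related_topics:
            return "let_user_make_choice"
        return "prompt_no_answer"
    if n == 1:
        return "output_answer"
    if n <= MAX_ANSWERS_IN_TABLE:
        return "output_answers_in_table"
    # Too many answers: try to narrow down before giving up.
    if result.depends_on_parameter:
        return "ask_user_parameter_value"
    if result.can_be_summarised:
        return "let_user_make_choice"
    return "prompt_too_many_answers"
```

Returning action names rather than performing the actions keeps the plan separate from output generation, so a dialogue history component could inspect or override the chosen action.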