User model. We start with a short account of the model that the system might maintain of the user. The user's task in our domain is to find the answer to a certain question. There will usually be a further intention behind that of finding the answer, but we will not concern ourselves with that here. The dialogue between system and user will be asymmetrical: some dialogue acts will be typical for the user, others for the system. The system might estimate characteristics of the user, such as experience with the system (beginners get extra explanation and more verbose answers) and experience with the domain (we might model this as the set of questions to which the user already knows the answer).
Cognitive model. Also important is the model that the user maintains of the system. While IMIX more or less assumes that humans will talk to the system as they would to another human, it is likely that they will behave differently from the outset, and adapt to the system's behaviour. At the social level, users are likely to drop politeness forms and unimportant words, and revert to an elliptical form of language sooner or later. At the pragmatic level, the user may regard the system as human-like, answering questions accurately and cooperatively, or as a search engine, providing one or more "hits" for the user's query. The user may adapt his/her strategies accordingly.
Domain model. QA may initially be seen as a regular search engine, which will produce a flat set of answers for each question. So, each question will have a certain list of N answers. When we add more semantics, the information inside the QA system may be seen as a compositional (or otherwise) structure of information. A question may then be seen as having as its answer a compositional hierarchy of "information units" or a heterarchy of related information units. Each question may be answered by a primary information unit, followed by links to or embeddings of related information units.
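As an illustration of the second view, a primary information unit with embedded and linked units could be represented as below. This is only a sketch: the class name `InformationUnit`, its fields, and the sample RSI content are assumptions made for the example, not part of any existing QA system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InformationUnit:
    """One unit of answer content (hypothetical structure)."""
    title: str
    text: str
    # Compositional children: shown as part of this unit.
    embedded: List["InformationUnit"] = field(default_factory=list)
    # Non-hierarchical links: only referred to, not expanded.
    related: List["InformationUnit"] = field(default_factory=list)


def render(unit: InformationUnit, depth: int = 0) -> List[str]:
    """Flatten a primary unit: expand embedded units, link related ones."""
    indent = "  " * depth
    lines = [f"{indent}{unit.title}: {unit.text}"]
    for child in unit.embedded:
        lines.extend(render(child, depth + 1))
    for other in unit.related:
        lines.append(f"{indent}  see also: {other.title}")
    return lines


# Made-up sample content, for illustration only.
causes = InformationUnit("Causes", "Repetitive movements.")
prevention = InformationUnit("Prevention", "Take regular breaks.")
rsi = InformationUnit("RSI", "Repetitive strain injury.",
                      embedded=[causes], related=[prevention])
```

A flat search-engine answer list corresponds to units with no `embedded` or `related` entries; the richer view only adds structure on top of it.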
QA dialogue strategies. For QA, we may distinguish data driven dialogue and dialogue act based dialogue. Data driven dialogue generates dialogue acts by looking at the data that is still required for the system to give an answer. Data may be missing from both the question and the answer, and the dialogue continues until all missing data is gathered. A simple and common example is slot filling. Data driven systems do not need to have a concept of dialogue history or dialogue state. Dialogue act based systems, on the other hand, look at the nature of previous dialogue acts. Typically, they use a classification of dialogue act types. Both strategies may be combined.
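Slot filling, the data driven example mentioned above, can be sketched as follows. The slot names, prompts and the `"ANSWER"` marker are hypothetical and chosen only for this example; the point is that the next system act is computed from the missing data alone, without any dialogue history.

```python
from typing import Dict, List

# Hypothetical slots for an RSI risk question; not taken from a real system.
REQUIRED_SLOTS: List[str] = ["task", "age_group"]

PROMPTS: Dict[str, str] = {
    "task": "Which task are you asking about?",
    "age_group": "Is the person older than 40?",
}


def missing_slots(frame: Dict[str, str]) -> List[str]:
    """Return the required slots that have no value yet, in a fixed order."""
    return [slot for slot in REQUIRED_SLOTS if not frame.get(slot)]


def next_system_act(frame: Dict[str, str]) -> str:
    """Data driven choice: ask for the first missing slot, otherwise answer."""
    missing = missing_slots(frame)
    if missing:
        return PROMPTS[missing[0]]
    return "ANSWER"
```

A dialogue act based system would, in addition, inspect the type of the previous acts (for instance, whether the last user act was a repair) before choosing its next act.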
Modalities. Each dialogue act may be coded in one or more modalities. Typical is speech or text. New here is the user's pen input.
Pen + speech: if the user uses both pen and speech in one act, it is very likely that the pen is subordinate to the speech, and certainly not independent of it. This means the dialogue act type will mostly be indicated by the speech. The pen action may be redundant, or the speech may be missing information that can be reconstructed from the pen action.
Pen actions: Pen actions without speech will indicate a much more limited set of dialogue act types. For simple search engine type QA, pointing at a word is likely to indicate "do a search on this word". For pictures, pointing to a specific part of a picture may indicate "what does this mean?" or "give me more information on this". Additionally, pen actions may well be answers to a system question, and the computer may even expect the user to answer by pen, for example, by presenting a table of choices (à la GUI).
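The division of labour between pen and speech described above might be sketched like this. The function name, the list of deictic words and the default interpretations are assumptions for illustration; a real system would work on recognised dialogue acts rather than on strings.

```python
from typing import Optional

DEICTIC_WORDS = ("this", "that", "here")  # assumed, minimal list


def interpret(speech: Optional[str], pen_target: Optional[str],
              target_is_picture: bool = False) -> str:
    """Combine speech and pen input into one interpreted request string."""
    if speech and pen_target:
        # Speech decides the act type; the pen only resolves a deictic word.
        words = [
            pen_target if w.strip("?.,!").lower() in DEICTIC_WORDS else w
            for w in speech.split()
        ]
        return " ".join(words)
    if speech:
        return speech
    if pen_target:
        # Pen alone: fall back to a small set of default act types.
        if target_is_picture:
            return f"give more information on {pen_target}"
        return f"search for {pen_target}"
    return "NO_INPUT"
```

If the speech contains no deictic word, the pen action is treated as redundant and the speech is returned unchanged.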
An overview of possible dialogue acts. This list is meant to give a broad view of the dialogue acts that may occur. We cannot support all of these, but they should be accounted for (i.e. we must choose whether or not to support them, or whether to present an intelligible error message when the user tries to do things that the system cannot understand).
Note that the user may pose different kinds of queries with the goal of finding information. Not all queries can be naturally formulated as questions. For example: "give me a list of common computer RSIs". It is not clear exactly which classes of queries we are talking about.
Questions may be quite complex. The kinds of questions supported depend on
the QA systems. Some of the more complex questions are questions with
presuppositions and conditional questions. Presupposition questions assume
something that may not be true, for example "who is the king of the U.S.",
although the U.S. (formally) has no such thing as a king. Conditional questions are
questions with a format like "What about
A dialogue manager may respond meaningfully to such questions if the QA system
can handle them.
Subclasses: "let's start all over again", "take me out of this
subdialogue", "take me to the previous question"
The dialogue control acts that a user is likely to use are related to the
user's expectations of the system. For example, if the system is presented as
a web browser, explicit "back/forward" type dialogue control is more likely.
Subclasses: meta-questions about the functioning of the system: "What's
this?", "what am I supposed to do?", "what
does this button mean?", Out-of-model-boundary: "do you know something about
<non-domain subject>", Out-of-context:
"where can I find your maker?", Social: "hello how are you", "thanks".
Subclasses: repeating information: "please speak again/play this video again",
clarification of content of images: "what does this red thing mean?", request
of a certain kind of presentation: "could you use more detail?", "do you have
a picture of that?", "please zoom in here", "could you give me a list of ..."
Subclasses: "What does X ... uh ... Y mean?", bad pen action followed by a
better pen action, "no no I meant ...".
Subclasses: shifting the question because the answer is unsatisfactory (note:
in many cases this may imply that the system didn't really understand the
question); asking a new question as a result of information gathered from the
answer (a follow-up question), usually by referring to the answer; asking a
new, unrelated question; narrowing the search because there is too much
information.
Subclasses: No information or no response at all, speech is (partially)
unintelligible, pen input is unintelligible, cannot find relation between pen
input and speech, user utterance is ambiguous.
If we have no answers or too many answers, the system will need to report this
and, if possible, start a dialogue in which the user can change the question.
If nothing is known about the user's underlying intentions, we are left only
with giving helpful feedback concerning the answer set, amounting to things
like "there are N answers, [summary of answers]" and "a related question, X,
has M answers".
Subclasses: no answers, too many answers.
Overview of clarification dialogues
A number of topics:
Suppose the user asks: "do you risk RSI doing <task>?". Suppose the answer depends on age: older people run a greater risk. Then the system may ask a clarification question: "are we talking about someone aged more than 40?". A human would likely assume that the user is talking about his own situation, and could ask instead: "what's your age?". Or perhaps the user's age is already known, and the system may answer promptly (provided we're using implicit verification on age or individual). If it turns out the user was trying to find information about someone else, he can always reply with a pragmatic repair: "no it's not for me but for someone else".
if multiple_answers:
    if too_many_answers:
        if answer_depends_on_parameter:
            ask_user_parameter_value
        elseif answers_can_be_summarised:
            let_user_make_choice
        else:
            prompt_too_many_answers
    else:
        output_answers_in_table
if single_answer:
    output_answer
if qa_reported_ambiguous_question:
    let_user_make_choice
if qa_reported_unanswerable_question:
    prompt_unanswerable_question
if no_answer:
    if known_likely_related_topics:
        let_user_make_choice
    else:
        prompt_no_answer
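The plan above can also be written as a small runnable sketch, reading the branches with the nesting that their order implies. The `QAResult` fields and the threshold `TOO_MANY` are assumptions introduced for this example; the action names are kept from the plan, and the function only returns them rather than performing them.

```python
from dataclasses import dataclass, field
from typing import List

TOO_MANY = 5  # assumed threshold for "too many answers"


@dataclass
class QAResult:
    """Hypothetical summary of what the QA system reported."""
    answers: List[str] = field(default_factory=list)
    depends_on_parameter: bool = False
    can_be_summarised: bool = False
    ambiguous_question: bool = False
    unanswerable_question: bool = False
    related_topics: List[str] = field(default_factory=list)


def plan(result: QAResult) -> List[str]:
    """Return the system actions selected by the plan, in order."""
    actions: List[str] = []
    n = len(result.answers)
    if n > 1:
        if n > TOO_MANY:
            if result.depends_on_parameter:
                actions.append("ask_user_parameter_value")
            elif result.can_be_summarised:
                actions.append("let_user_make_choice")
            else:
                actions.append("prompt_too_many_answers")
        else:
            actions.append("output_answers_in_table")
    if n == 1:
        actions.append("output_answer")
    if result.ambiguous_question:
        actions.append("let_user_make_choice")
    if result.unanswerable_question:
        actions.append("prompt_unanswerable_question")
    if n == 0:
        if result.related_topics:
            actions.append("let_user_make_choice")
        else:
            actions.append("prompt_no_answer")
    return actions
```

As in the plan, the top-level conditions are tested independently, so more than one action can be selected for a single QA result.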
This plan incorporates feedback from the underlying system as well as
perceived user intention. We may extend such a plan with other user intention
tracking mechanisms. In particular, the dialogue history may provide
information about what a user wants. The most obvious dialogue history
reference is a follow-up question, which is often an elliptical question which
has to be augmented with information from the previous question or answer.
Various other kinds of dialogue history based communication are possible: for
example, the user may explicitly refer to a previous dialogue act, trying to
take the system back to a previous search operation. We may call this dialogue
history browsing.
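Both uses of the dialogue history, augmenting an elliptical follow-up question and browsing back to an earlier question, can be illustrated with a minimal sketch. The classes `Turn` and `DialogueHistory`, the `topic` field, and the way the follow-up is augmented are all assumptions; real ellipsis resolution would need linguistic analysis of the previous question and answer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    question: str
    topic: str  # hypothetical: main topic extracted from the question


class DialogueHistory:
    def __init__(self) -> None:
        self._turns: List[Turn] = []
        self._pos: int = -1  # index of the current turn, -1 if empty

    def add(self, turn: Turn) -> None:
        """Record a new question, dropping any turns 'ahead' of the current one."""
        self._turns = self._turns[: self._pos + 1]
        self._turns.append(turn)
        self._pos = len(self._turns) - 1

    def current(self) -> Optional[Turn]:
        return self._turns[self._pos] if self._pos >= 0 else None

    def back(self) -> Optional[Turn]:
        """Dialogue history browsing: return to the previous question, if any."""
        if self._pos > 0:
            self._pos -= 1
        return self.current()

    def resolve_follow_up(self, elliptical: str) -> str:
        """Augment an elliptical follow-up with the topic of the current turn."""
        cur = self.current()
        if cur is None:
            return elliptical
        return f"{elliptical} (about {cur.topic})"
```

Keeping a position into the history, rather than only the last turn, is what makes "take me to the previous question" style acts possible.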