Notes on domain and possible questions

Latest update: 8 dec 2004

We assume the medical QA domain. The tags provided by ROLAQUAD suggest that only a limited number of concepts are involved. Some of these concepts may be little used in the RSI domain but seem useful for the general medical domain.

Overview of domain tags

The main concept is aandoening (disease). Most other concepts and relations are secondary to this concept. We have:

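aandoening (disease)
Symptoom (symptom)
Diagnosemethode (diagnostic method)
Persoon (person: who can get the disease, who can treat it)
Persoonseigenschap (personal characteristic)
Lichaamsdeel of stof (body part or substance)
Lichaamsfunctie (bodily function)
Behandeling (treatment)
Advies (advice: prevention)
micro-organisme (micro-organism)
duur/tijdstip/periode (duration/point in time/period: any time-related info)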

Relations
veroorzaakt (causes: factors that cause a disease)
voorkomt (prevents: prevention)
behandelt (treats)
diagnosticeert (diagnoses: links a diagnosemethode to a disease)
komt voor bij (occurs in/with: may be related to persoon / persoonseigenschap but also to any other circumstance, in which case it is similar to veroorzaakt)
synoniem van (synonym of)
is een soort (is a kind of: subclass?)
is een symptoom van (is a symptom of)
is een bijwerking van (is a side effect of: treatment side effect)
lijkt op (resembles: similar-looking diseases)
wordt overgedragen door (is transmitted by: info about disease carrier or genetic cause)
is eigenschap van (is a property of) (?)
is definitie van (is a definition of) (?)
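
One way to make the concept and relation inventory above concrete is to store every annotated statement as a (subject, relation, object) triple whose parts are checked against the two lists. The sketch below is only an illustration: the class names Entity and Fact, the field names, and the placeholder entities are assumptions made here, not part of ROLAQUAD; only the tag and relation strings are copied from the lists.

```python
from dataclasses import dataclass

# Tag and relation names copied from the lists above.
CONCEPTS = {
    "aandoening", "Symptoom", "Diagnosemethode", "Persoon",
    "Persoonseigenschap", "Lichaamsdeel of stof", "Lichaamsfunctie",
    "Behandeling", "Advies", "micro-organisme", "duur/tijdstip/periode",
}
RELATIONS = {
    "veroorzaakt", "voorkomt", "behandelt", "diagnosticeert",
    "komt voor bij", "synoniem van", "is een soort", "is een symptoom van",
    "is een bijwerking van", "lijkt op", "wordt overgedragen door",
    "is eigenschap van", "is definitie van",
}


@dataclass(frozen=True)
class Entity:
    """Hypothetical container for one tagged span."""
    text: str      # surface string
    concept: str   # must be one of CONCEPTS

    def __post_init__(self):
        if self.concept not in CONCEPTS:
            raise ValueError(f"unknown concept: {self.concept}")


@dataclass(frozen=True)
class Fact:
    """Hypothetical (subject, relation, object) triple."""
    subject: Entity
    relation: str  # must be one of RELATIONS
    obj: Entity

    def __post_init__(self):
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {self.relation}")


# Placeholder entities, not taken from the corpus.
symptom = Entity("symptom-y", "Symptoom")
disease = Entity("disease-x", "aandoening")
fact = Fact(symptom, "is een symptoom van", disease)
```

Because aandoening is the main concept, most facts in such a store would have a disease as subject or object; the sketch does not enforce that.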

Possible questions and the questions corpus

If the domain corpus is sufficiently annotated and the QA systems are sufficiently powerful, a great number of questions can be imagined.

We already obtained a question corpus by asking people to submit questions. The corpus is available as one file, or as multiple files sorted by author and type.

Most questions would center around a specific disease and request info related to it, but others are possible. I try to give a classification here:

Single disease (or disease class) centered:
(1) What properties does [disease Y] have?
(2) Which disease has [property X]?

Some may be complex, for example by relating multiple concepts or properties to a disease simultaneously. Ex.
How do I treat [disease x]?
Give me a list of diseases related to [property x]; or diagnosable by [method y] (note: explicitly concerns a list of diseases)
Does [activity Y] cause RSI?
Am I particularly vulnerable to getting RSI?
How do you prevent RSI from becoming chronic?
What tips are there for schools to prevent RSI in children?
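
Question types (1) and (2) can be read as two lookups in opposite directions over the same set of facts. The sketch below assumes a simplified store of (disease, concept tag, value) triples; the function names properties_of and diseases_with and all data values are placeholders invented for illustration, not corpus content.

```python
# Simplified fact store: (disease, concept tag, value). Placeholder data only.
FACTS = [
    ("disease-a", "Symptoom", "symptom-1"),
    ("disease-a", "Behandeling", "treatment-1"),
    ("disease-b", "Symptoom", "symptom-1"),
    ("disease-b", "Diagnosemethode", "method-1"),
]


def properties_of(disease, kind=None):
    """Type (1): properties of one disease, optionally restricted to one tag."""
    return [(k, v) for d, k, v in FACTS if d == disease and kind in (None, k)]


def diseases_with(kind, value):
    """Type (2): all diseases that have the given property (a list answer)."""
    return sorted({d for d, k, v in FACTS if k == kind and v == value})


# "How do I treat [disease x]?" is type (1) restricted to Behandeling.
treatments = properties_of("disease-a", kind="Behandeling")

# "Give me a list of diseases related to [property x]" is type (2).
candidates = diseases_with("Symptoom", "symptom-1")
```

Type (1) starts from a known disease, whereas type (2) has to search across all diseases and naturally returns a list.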

Multiple disease centered:
(3) How is [disease x] related to [disease y]?
Ex.
What is the difference between specific and non-specific RSI disorders?
What is the difference between static and dynamic RSI?
What is the difference between RSI and rheumatism?
Is RSI the same as a mouse arm?

Others:
(4) What properties has [entity x]? (asking info about something that is not a disease)

Ex.
What is break-reminder software (pauzeersoftware)?
What are people's experiences with programs that warn you at regular intervals to take a break?
What is the literal meaning of RSI?
What is the Dutch translation of RSI?
With which symptoms should I consult a doctor?
What should I do if I have [symptom x]? (note: this is something for a diagnosis system)
What are the side effects of [treatment x]?

If we look at the 487 example questions submitted by the IMIX members, almost all of them are questions of type (1)! I've included the most important non-(1) questions in the examples above.

A dialogue manager would apparently get very far with a slot-filling approach that treats the disease under discussion as its highest priority (a mandatory slot), with other information supplied or requested in relation to this disease (optional slots that, together with the mandatory slot, define the request). This seems a practical solution, but certainly not a general one. Conversely, if the domain yields no other kinds of questions, we might still design a general dialogue manager, but we could only test it on what is in fact a degenerate domain! It would be hard to prove that our "general" dialogue manager is really general.
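
The slot-filling idea can be sketched as a small frame with one mandatory slot and a few optional ones. This is a minimal illustration under stated assumptions, not a design for the actual dialogue manager: the class name Frame, its fields, and the action labels "ask_disease" and "answer" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Frame:
    """Hypothetical request frame for one dialogue."""
    disease: Optional[str] = None      # mandatory slot
    requested: Optional[str] = None    # optional: concept tag being asked about
    constraints: Dict[str, str] = field(default_factory=dict)  # optional: info supplied

    def fill(self, disease=None, requested=None, **constraints):
        """Merge slots from one user turn; slots not mentioned keep their value."""
        if disease is not None:
            self.disease = disease
        if requested is not None:
            self.requested = requested
        self.constraints.update(constraints)

    def next_action(self):
        """The disease slot has priority: ask for it before answering."""
        if self.disease is None:
            return "ask_disease"
        return "answer"


frame = Frame()
frame.fill(requested="Behandeling")   # e.g. "How do I treat it?"
first = frame.next_action()           # disease slot is still empty
frame.fill(disease="RSI")             # a later turn names the disease
second = frame.next_action()
```

A frame like this has no way to represent a question without a single central disease, such as type (3), which is the limitation the paragraph above points at.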

In practice, it is likely we will also have many questions of type (2). It is really a weakness of the supplied 487 questions that so few are of this type. Note that type (2) is more closely related to diagnosis-type questions.

Adding support for type (3) and other questions would make life more difficult for the different parties.