This web page is a companion to our papers
"On Hypothesis Testing for Statistical Model Checking",
by Daniël Reijsbergen, Pieter-Tjerk de Boer, Werner Scheinhardt, and Boudewijn Haverkort,
in the STTT journal (August 2015),
and
"Interactive comparison of hypothesis tests for statistical model checking",
by Pieter-Tjerk de Boer, Daniël Reijsbergen, and Werner Scheinhardt,
at the Valuetools conference (December 2015).
It is a "living document" that gets enhanced with new features and/or tests as they are developed.
Below the figure and its controls, we
describe the meaning of the figure.
For more details, we refer to the papers;
the STTT paper gives all the mathematical details, while the Valuetools paper gives a gentle
introduction and describes this website.
[Figure controls, summarized: the user can set the allowed error probability (α = β),
the indifference level (δ = ζ), a guess for p − p₀ (γ), the decision threshold (p₀),
the N- and Z-ranges of the plot, and the real p used for simulation.
Toggles select what is drawn: class-I tests (risking a wrong conclusion if p is close to p₀),
class-II tests (risking no conclusion if p is close to p₀),
class-III tests (risking a long calculation if p is close to p₀),
and the "expected" lines.]
Example:
Suppose our threshold is 0.6 (i.e., 60%).
Let's run our simulator for 1000 experiments,
and let's assume that 800 of those 1000 experiments return 1.
Intuitively, that suggests the probability of interest is about 80%, which is well above our
threshold of 60%.
But how sure can we be of this?
That is what this webpage and our papers are about: techniques for drawing statistically
sound conclusions, like the following: "Based on our experiments, we can be 95% sure
that the probability of this behaviour is above the threshold of 60%".
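As a purely illustrative sketch (this is a textbook normal-approximation test, not necessarily one of the tests compared in the papers), the 800-out-of-1000 observation can be turned into such a statement by measuring how far the observed fraction lies above the threshold, in units of the standard error:

```python
import math

def one_sided_pvalue(successes, n, p0):
    """Normal-approximation p-value for testing p <= p0 against p > p0.
    Illustrative only; see the papers for statistically careful tests."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under the threshold p0
    z = (p_hat - p0) / se
    # Upper-tail probability of a standard normal variable
    return 0.5 * math.erfc(z / math.sqrt(2))

# Running example: 800 ones out of 1000 samples, threshold 0.6
print(one_sided_pvalue(800, 1000, 0.6))  # far below 0.05: conclude p > 0.6
```

A small p-value (here far below 0.05) is what justifies a "95% sure" style conclusion.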
More precisely:
We do N experiments, and call their outputs (0 or 1) X_i for i = 1, ..., N.
We call the threshold p₀, and
define Z = X_1 + ... + X_N − N·p₀.
Clearly, Z > 0 hints that the true probability p is above p₀,
and Z < 0 hints that it is below.
The question is then how large to choose N, and for which values of Z we can conclude
with sufficient confidence that p > p₀ or p < p₀.
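For the running example, this Z statistic can be computed directly from the definition above; a minimal sketch:

```python
def z_statistic(samples, p0):
    """Z = X_1 + ... + X_N - N*p0 for a list of 0/1 samples."""
    return sum(samples) - len(samples) * p0

# Running example: 800 ones out of N = 1000 samples, threshold p0 = 0.6
samples = [1] * 800 + [0] * 200
print(z_statistic(samples, 0.6))  # 800 - 600 = 200.0 > 0: hints that p > 0.6
```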
A set of rules for drawing such conclusions from N and Z is called a "statistical hypothesis test",
and our work consists of comparing several such tests for this class of problems.
Each of the coloured lines in the figure represents one such test. Experiments consist of repeatedly drawing samples and recording the result, thus following some path in the N,Z-plane that the figure represents. As soon as this path crosses the coloured line of the chosen test, a conclusion may be drawn. In principle, each test comes with a guarantee that its probability of drawing a wrong conclusion is below some user-chosen value, called α or β in the figure.
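This sample-by-sample procedure can be sketched as follows. The straight-line boundary |Z| > a + b·N used here is made up for illustration; the actual tests in the figure use various boundary shapes:

```python
import random

def run_until_decision(p_true, p0, a, b, max_n=100_000, seed=1):
    """Draw Bernoulli(p_true) samples one at a time, following the path
    (N, Z) with Z = X_1 + ... + X_N - N*p0, and stop as soon as the path
    crosses a hypothetical straight-line boundary |Z| > a + b*N."""
    rng = random.Random(seed)
    ones = 0
    for n in range(1, max_n + 1):
        ones += rng.random() < p_true  # draw X_n and add it to the count
        z = ones - n * p0
        if z > a + b * n:
            return n, z, "conclude p > p0"
        if z < -(a + b * n):
            return n, z, "conclude p < p0"
    return max_n, ones - max_n * p0, "no conclusion yet"

# With p_true = 0.8 well above p0 = 0.6, the path typically exits upward
# after a few dozen samples.
n, z, verdict = run_until_decision(p_true=0.8, p0=0.6, a=10.0, b=0.0)
print(n, verdict)
```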
Intuitively, drawing a conclusion is harder when the true p is closer to the threshold p₀; more samples are needed in such cases to still draw a conclusion. On the other hand, if the real p is indeed very close to the threshold, then perhaps the user does not really care whether the conclusion is correct. Thus, tests can be sorted into three classes, based on their behaviour when p gets close to p₀: class-I tests risk drawing a wrong conclusion, class-II tests risk drawing no conclusion at all, and class-III tests risk a very long calculation.