Test Theory
Graham Tall research@grahamtall.com
September 2003
Concepts
I. Samples and Populations. In statistical research a population must, if it is not self-evident, be clearly defined. In a small-scale piece of research involving a fair sample of children in a school, the population is the children in that school; generalising beyond that, to all children in all schools, is simply not justified. In educational research, any generalisation must match the population from which the sample was drawn.
Thus, if one is to discover the views of Year 10 pupils in a multi-ethnic, coeducational comprehensive school, the sample must include the correct proportions of:
different ability groups,
male and female pupils, and
each ethnic group,
in order to ensure that any generalisations are fair.
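One way to arrive at the "correct proportions" is proportional allocation: each group receives a share of the sample equal to its share of the population. The sketch below is only an illustration. The function name `proportional_allocation` and the cohort figures are hypothetical, and a real study would need to cross ability, sex and ethnic group rather than treat a single factor.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across groups in proportion to their population sizes."""
    total = sum(stratum_sizes.values())
    # Ideal (possibly fractional) share for each group.
    exact = {name: sample_size * size / total
             for name, size in stratum_sizes.items()}
    # Start from the rounded-down shares...
    allocation = {name: int(share) for name, share in exact.items()}
    # ...then give any leftover places to the largest fractional remainders.
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(exact,
                          key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical Year 10 cohort of 200 pupils, split by sex only.
year_10 = {"girls": 110, "boys": 90}
sample = proportional_allocation(year_10, 40)
print(sample)
```

With these invented figures a sample of 40 would contain 22 girls and 18 boys, mirroring the 55% to 45% split in the cohort.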
II. Hypothesis and Null Hypothesis (NoH)
Hypotheses are simply ideas, ideas such as:
MEd, MPhil and PhD students are worried about using statistics in their research.
Generally speaking girls do better in modern languages than boys.
The problem in assessing the validity of such hypotheses is that they raise further questions: how worried? how much better? In statistics it is much easier to test the null hypothesis, namely that there is no difference.
The null hypothesis is often abbreviated to: NoH.
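To make the idea of testing "no difference" concrete, here is a minimal sketch of an exact permutation test. This is a technique chosen for illustration and is not one named in the text. If the null hypothesis holds, the group labels are arbitrary, so every way of relabelling the pooled scores is equally likely. The p-value is the fraction of relabellings that give a difference in means at least as large as the one observed. The function name and the scores are hypothetical.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    at_least_as_extreme = 0
    relabellings = 0
    # Try every way of choosing which pooled scores are labelled "group A".
    for indices in combinations(range(len(pooled)), n_a):
        a_sum = sum(pooled[i] for i in indices)
        diff = abs(a_sum / n_a - (total - a_sum) / n_b)
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
        relabellings += 1
    return at_least_as_extreme / relabellings


# Hypothetical modern-language scores for two small groups.
girls = [62, 71, 68, 75, 66]
boys = [58, 64, 60, 69, 61]
p = permutation_p_value(girls, boys)
print(f"p = {p:.3f}")
```

A small p-value means that a difference this large would rarely arise from relabelling alone, which is the ground for rejecting the null hypothesis.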
III. Levels of Significance
Verbal Description | Probability | Meaning
Not Significant | N.S. | This result could have been obtained by chance
Significant (Sig.) | .05 or 5% | Less than one chance in 20 of the data being obtained by chance
Highly Significant (H.S.) | .01 or 1% | Less than one chance in 100 of the data being obtained by chance
Very Highly Significant (V.H.S.) | .001 or .1% | Less than one chance in 1000 of the data being obtained by chance
A researcher can be much more confident in rejecting a null hypothesis at the 1% or .1% level than when the result is significant at only the 5% level. In that sense the % levels act as a ruler: the higher the level reported (the smaller the %), the more convincing the findings (see Types of Error below).
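The levels above can be expressed as a simple lookup. The sketch below assumes that a test has already produced a p-value, and it uses strict "less than" comparisons to match the wording of the table. The function name `significance_label` is hypothetical.

```python
def significance_label(p_value):
    """Map a p-value to the verbal significance levels used in this section."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.001:
        return "V.H.S."  # less than one chance in 1000
    if p_value < 0.01:
        return "H.S."    # less than one chance in 100
    if p_value < 0.05:
        return "Sig."    # less than one chance in 20
    return "N.S."        # could have been obtained by chance


for p in (0.2, 0.03, 0.004, 0.0004):
    print(p, significance_label(p))
```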
IV. Types of Error
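Tests cannot prove that something could not have been due to chance factors; they can only tell us the likelihood of getting such a result by chance. Hence a null hypothesis may be accepted, or rejected, in error. Rejecting a null hypothesis that is in fact true is known as a Type I error; accepting one that is in fact false is known as a Type II error.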
Lack of statistical awareness sometimes results in experiments being designed in which it is impractical, and may even be impossible, to achieve statistically significant differences, typically because the groups are too small.
How small? If the numbers in each group are in single figures, the chance of getting a statistically significant result with simpler tests, like chi-square and analysis of variance, is very small. When the observed data suggest that there is a difference but the test result is Not Significant, the result might best be described using the Scottish verdict of "not proven". In such circumstances use a more sophisticated test.
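A rough calculation shows why very small groups can make significance unattainable. It uses an exact relabelling argument rather than chi-square or analysis of variance, so it is an illustration only. With two equal groups of n and no tied scores, the most extreme outcome possible (every score in one group above every score in the other) is matched by just 2 of the C(2n, n) equally likely ways of assigning scores to groups. No two-sided exact test of this kind can therefore report a p-value below 2 / C(2n, n). The function name is hypothetical.

```python
from math import comb


def smallest_two_sided_p(n_per_group):
    """Smallest p-value an exact two-sided relabelling test can report
    for two equal-sized groups with no tied scores."""
    return 2 / comb(2 * n_per_group, n_per_group)


for n in range(2, 7):
    p = smallest_two_sided_p(n)
    verdict = "can reach the 5% level" if p < 0.05 else "cannot reach the 5% level"
    print(f"n = {n} per group: smallest p = {p:.4f}, {verdict}")
```

With three per group the floor is 2/20 = 0.1, so even perfectly separated groups would be reported as Not Significant.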
V. One and Two-Tailed Tests. Whilst some tests are labelled as one- or two-tailed, the crucial difference is purely in how they distribute the area of probability (i.e. 5%, 1% or .1%). In a two-tailed test, the null hypothesis is simply that there is no difference between, say, boys' and girls' achievements in modern languages, and the area of probability is split between the two possible directions of difference. In a one-tailed test the underlying assumption differs: it is assumed in advance that, say, girls will achieve better results, and the whole area of probability is placed in that direction. If you can predict the direction of the result, then use of a one-tailed test increases the likelihood of obtaining a statistically significant result. You must, however, then treat any result in the opposite direction as an artefact.
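The effect of the two choices can be sketched numerically. The example assumes a test statistic z that follows the standard normal distribution when the null hypothesis is true. The text does not name a particular test, so this is an assumption, and the value z = 1.8 is invented. A positive z stands for a difference in the predicted direction.

```python
from statistics import NormalDist


def tail_p_values(z):
    """Return (one_tailed, two_tailed) p-values for a standard-normal statistic z.

    The one-tailed value assumes the predicted direction is positive z.
    """
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # All of the probability area sits in the predicted (upper) tail.
    one_tailed = 1.0 - standard_normal.cdf(z)
    # The area is shared between both tails, so distance from zero is what counts.
    two_tailed = 2.0 * (1.0 - standard_normal.cdf(abs(z)))
    return one_tailed, two_tailed


one, two = tail_p_values(1.8)
print(f"z = 1.8:  one-tailed p = {one:.4f}, two-tailed p = {two:.4f}")

contrary, _ = tail_p_values(-1.8)
print(f"z = -1.8: one-tailed p = {contrary:.4f}")
```

For z = 1.8 the one-tailed p-value falls below .05 while the two-tailed value does not. A result of the same size in the opposite direction gives a one-tailed p-value close to 1, which is why contrary results cannot be claimed as significant under a one-tailed test.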

