The domain of any subject is vast and varied, a landscape stretching beyond the horizon with peaks and valleys of content and skills. Our students embark on a journey across this terrain. Our assessments are not merely tools to measure what our students can recall; they help us understand how students navigate, integrate, and apply what they know across the subject's topography.
If time were unlimited, we could spend almost as much time assessing a pupil’s understanding of our subject as we do teaching it to them. But, as we discussed in an earlier post, we usually have only a short window in which to work out what pupils know (and don’t know). For this reason, we need to sample.
How do we choose the most representative sample of questions that will provide us with a true reflection of each student's capabilities?
By sampling from the domain thoughtfully, we aim to construct assessments that are practical in length yet still tell us something meaningful. Our goal is to build a coherent picture of student capability within the domain. We want each item we select for an assessment to be not just a discrete point of knowledge or skill but a reflection of the domain’s integrated whole.
Revisiting scope
In a previous post, we discussed an assessment’s scope: deciding what terrain an assessment will sample from. On the surface, this sounds like an easy task. We may want to test pupils’ understanding of the topic we have just taught, or perhaps of this year’s content. However, there are three complicating factors:
Mastery of some content is a better indicator than other content of what it means to be good at a subject. For example, being good at algebra is a better sign of maths competency than being skilled in base number systems (at least the maths competency we expect of a post-1970s school pupil).
Knowledge-building in most subjects is cumulative to some extent, so an assessment will inevitably draw on learning from before the period in question. Prior learning will usually show up in assessment performance.
We value the retention of some content more than other content. For example, future learning may only be accessible if pupils have understood certain content or mastered certain skills. Success in the future will be contingent on being secure in these aspects of the curriculum.
For these reasons, we may choose to treat some content as in scope for an assessment and exclude other content.
The scope of an assessment must be decided before issues of sampling are considered. Once this domain is determined, the question then is whether to sample evenly from across the domain or to favour particular parts over others.
Choosing between random and cluster sampling
Once teachers have decided what is in scope for an assessment, they face a pivotal decision: should they randomly sample knowledge from the domain or opt for cluster sampling? The former often looks as though we are scraping a thin sample across the knowledge domain, whereas the latter looks more like a deep dive into one or two areas. Which is more appropriate will, as always, depend on the subject’s knowledge architecture and on how you intend to use the assessment.
Random sampling involves selecting elements from the knowledge domain so that each element has an equal probability of being chosen. This method aims to give a broad and unbiased representation of the domain, and it is particularly useful where you need a wide view of the class’s strengths and weaknesses.
If you, as the teacher, sampled randomly (by picking assessment tasks out of a hat, say), you would minimise any bias in topic selection that might unfairly favour some students over others. This may be particularly useful, for example, if you want to ensure fairness across several classes when you have taught some of them but not others. However, this breadth of coverage can come at the cost of depth, potentially giving only a superficial picture of pupils’ understanding of each topic. This is more of a problem in knowledge domains where we need to set relatively few, longer tasks to assess knowledge of more complex concepts.
Cluster sampling involves selecting specific clusters or sections of the knowledge domain for in-depth assessment. This approach enables a detailed exploration of certain areas, which may be essential where we need to introduce context-specific prompts to frame the assessment, or where the knowledge architecture is best suited to extended tasks such as essays.
Clearly, whilst this approach is the best choice in many circumstances, it does affect the validity of the assessment. By purposefully selecting some clusters of the knowledge domain over others, we risk bias: students whose strengths lie in the unselected areas are unfairly disadvantaged, and their performance on the assessment does not reflect their overall attainment across the assessment’s scope. In general, cluster sampling makes it harder to draw valid inferences. If we set students an end-of-Year-8 history assessment and ask them to revise the entire curriculum, yet pick just one topic for the exam, to what extent can we legitimately infer how well they know history across the parts of the curriculum we did not assess? In assessment, we are usually trying to generalise about performance beyond the tasks we set, but the further those generalisations reach beyond the domain of tasks assessed, the less dependable they are likely to be.
While each method has its merits, combining them can be advantageous. A strategy that employs initial random sampling to ensure broad coverage, followed by cluster sampling for an in-depth exploration of certain areas, can provide a well-rounded assessment approach.
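To make the distinction concrete, here is a minimal Python sketch of the three approaches described above: random sampling, cluster sampling and the combined strategy. It assumes the in-scope domain has already been split into topics (clusters), each with a few candidate tasks. The topic and task labels and the function names are hypothetical placeholders, not taken from any real scheme of work, and the sketch is illustrative rather than a recommended procedure.

```python
import random

# Hypothetical in-scope domain: each topic (cluster) maps to candidate tasks.
# Labels are placeholders, not taken from any real curriculum.
DOMAIN = {
    "Topic A": ["A1", "A2", "A3", "A4"],
    "Topic B": ["B1", "B2", "B3", "B4"],
    "Topic C": ["C1", "C2", "C3", "C4"],
    "Topic D": ["D1", "D2", "D3", "D4"],
    "Topic E": ["E1", "E2", "E3", "E4"],
}


def all_tasks(domain, exclude=()):
    """Flatten the domain into (topic, task) pairs, skipping excluded topics."""
    return [(topic, task)
            for topic, tasks in domain.items() if topic not in exclude
            for task in tasks]


def random_sample(domain, n, rng):
    """Random sampling: every in-scope task has the same chance of selection."""
    return rng.sample(all_tasks(domain), n)


def cluster_sample(domain, chosen_topics):
    """Cluster sampling: take every task from the topics the teacher chooses."""
    return [(topic, task) for topic in chosen_topics for task in domain[topic]]


def combined_sample(domain, n_broad, deep_topics, rng):
    """Random tasks from the other topics for breadth, plus deep-dive clusters."""
    broad = rng.sample(all_tasks(domain, exclude=deep_topics), n_broad)
    return broad + cluster_sample(domain, deep_topics)


if __name__ == "__main__":
    rng = random.Random(8)  # fixed seed so the "out of a hat" draw is repeatable
    print("Random:  ", random_sample(DOMAIN, 5, rng))
    print("Cluster: ", cluster_sample(DOMAIN, ["Topic C"]))
    print("Combined:", combined_sample(DOMAIN, 4, ["Topic C"], rng))
```

In the combined version, the random tasks are drawn only from outside the deep-dive topics, so the breadth sample is not spent on areas already covered in depth. That is one design choice among several; drawing the broad sample from the whole domain would be equally defensible.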
In this post, we have considered two important factors in assessment design: scope and sampling. Deciding what is in scope for an assessment involves making value judgements about the relative importance of subject content. Deciding how to sample from this domain is about the validity of the inferences you will later make about pupils’ attainment.
There is one more decision to make: what types of questions and tasks will best serve your purposes. We will turn to this question in a future post.
If a test is only to be used within a school, and students have access to past papers and use them to revise, I believe there is another important aspect that I don't think you have mentioned. It is preferable to ensure that the syllabus from which the tests sample is fully covered by the tests set over multiple years. This is because the content of past assessments shapes students' (and teachers') perceptions of what is within the syllabus, and so their future behaviour. This approach therefore discourages gambling on revising only certain elements of the syllabus, and it ensures that past papers cover the syllabus comprehensively, making them a more effective resource for future exam preparation. Over many years, random sampling would presumably achieve more or less the same thing anyway, but I think it is worth being purposeful about.