Can We (and Should We) Write Difficult Multiple-Choice Questions?
Not all hard questions are worth asking
In our recent post on open-ended assessment tasks, we explored questions that invite varied, constructed responses. At the other end of the spectrum lie multiple-choice questions (MCQs) - tightly closed tasks with a correct answer from a fixed set.
MCQs have grown in popularity in English secondary schools, not least because they reduce marking time and offer scalable, standardised assessment. When well designed, they can give us precise insight into student knowledge.
But what happens when we try to make MCQs hard? Can we, and should we, use them to stretch students or distinguish high attainers?
This post explores that question. As with all assessment design, it comes down to cognition, curriculum, motivation and purpose. We’ll consider what makes an MCQ difficult, when difficulty is helpful, and when it undermines learning.
Discrimination by design
Multiple-choice questions function as difficulty models of assessment. Instead of placing students on a scale by evaluating the quality of their constructed responses, they present a series of hurdles - questions of varying difficulty - and record whether each is successfully cleared. Their power lies not in any single item, but in the combination of many. A well-constructed MCQ assessment includes some questions that almost all students will answer correctly, others that only the highest attainers will get right, and many that fall in between. This spread allows us to distinguish between students across the full range of attainment.
Including difficult MCQs that most students are likely to get wrong is usually necessary to distinguish performance at the top end of the distribution. There are good reasons to do this. One is motivation. In mixed-attainment classrooms, the most secure students often achieve full marks on standard assessments by design, even without revising at home. If our aim is to encourage effort from all students, this is far from ideal. It can breed complacency or send misleading signals about how the subject will develop in later stages.
We might also include difficult MCQs for diagnostic purposes. These might help the teacher or school make informed decisions not just about who has mastered the basics, but who can apply ideas in more abstract or unfamiliar contexts.
When difficulty goes wrong: Bad practice in MCQ design
If a multiple-choice question is difficult, we must ask: why is it difficult, and what does success or failure actually reveal? The strength of a well-designed MCQ lies in its ability to discriminate fairly, identifying who has grasped concepts we care about without introducing too much noise. When difficulty arises for the wrong reasons, the question becomes misleading.
One pitfall is including out-of-scope content, whether untaught material or peripheral topics not recently studied. Even well-meant ‘stretch’ items can feel arbitrary or punitive if the assessment’s scope hasn't been clearly signalled.
Poor cueing is also a problem. MCQs rely on cues - words, phrases, or diagrams - to direct students to the right domain of knowledge. Vague or ambiguous cues may confuse even well-prepared students, producing incorrect answers that reflect misinterpretation rather than lack of understanding.
Another source of unfair difficulty is limited access. If a question rewards knowledge only available to a subset of students, such as that from a school trip or enrichment activity, it measures prior exposure, not secure understanding.
Timing pressure is a common issue. We generally design assessments (the exception being those designed to measure fluency) so that all students have time to attempt all questions. If the ‘difficult’ question sits at the end of an assessment where most students run out of time, it may discriminate on speed rather than on the concept being assessed. Equally, if a question embedded in a relatively fast-paced MCQ test demands extended reasoning or calculation, students may rationally choose not to attempt it.
But not all flawed MCQs are too hard. Sometimes the chance of students guessing the right answer, without the underlying knowledge we mean to assess, is very high. With only three options, for example, students have a 33% chance of guessing correctly. Using four or more options reduces this and strengthens reliability.
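To put numbers on this, here is a quick sketch of the binomial arithmetic (the 10-item test and pass mark of 6 are invented for illustration, not from the post) showing how option count changes the odds of a good score by pure guessing:

```python
from math import comb

def p_at_least(k, n_items, p_guess):
    """Chance of getting at least k of n_items right by guessing alone."""
    return sum(comb(n_items, i) * p_guess**i * (1 - p_guess)**(n_items - i)
               for i in range(k, n_items + 1))

# Chance of scoring 6+ on a 10-item test by pure guessing:
for options in (3, 4, 5):
    print(options, round(p_at_least(6, 10, 1 / options), 3))
```

With three options the chance of a lucky 6/10 is roughly 8%; four options cut it to about 2%. Each plausible extra option shrinks the guessing noise and strengthens reliability.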
Implausible distractors - wrong answers so obviously incorrect that they’re easily dismissed - can also reduce the discrimination of difficult questions. In these cases, success requires only test-taking savvy, not actual knowledge. Effective distractors should reflect common misconceptions, tempting students who misunderstand and allowing us to observe whether they can reason correctly.
In all these cases, the message is the same: difficulty alone isn’t useful. A valid MCQ is one where the right students get it right for the right reasons.
What does good difficulty look like in an MCQ?
If we accept that writing difficult multiple-choice questions is useful, then we must ask: what kind of difficulty is worth designing for? The answer is conceptual difficulty that demands more from a student’s understanding: connecting ideas, spotting subtle distinctions, or applying core knowledge in unfamiliar contexts. How we achieve this depends on the knowledge domain of the subject we teach, but there are some common approaches.
In some subjects, it is possible to write MCQs that require a level of precision in understanding a concept that may not be reached by all in the class, whether these assess subtle differences in closely related concepts or fine classifications. These are often the most valuable MCQs because they can be answered quickly without extensive processing.
Another means is through novel application of familiar ideas. The content has been explicitly taught, but the phrasing or context moves beyond rote recall. A science question might ask students to apply a well-understood principle in a new scenario; a grammar item might present a rule in an unusual construction. Strong students will spot the underlying idea despite surface variation; others may be misled. This approach works especially well in hierarchical subjects, where secure knowledge enables flexible application.
It may be appropriate to introduce complex ideas during teaching, with the understanding that not all students will fully grasp them by assessment time. These MCQs may stretch beyond the core curriculum and test whether students are beginning to internalise higher-level content and complex ideas. Particularly in subjects with cumulative knowledge progression, this can be a legitimate way to challenge and differentiate.
Finally, many of the most difficult MCQs involve some form of multi-stage reasoning - but this must still be achievable within a short time. Because each item yields only a right-or-wrong outcome, we need many questions, each cognitively efficient, to build a reliable assessment. This creates a dilemma. The most difficult tasks often ask students to integrate ideas or follow a chain of reasoning, which reveals depth of understanding but takes time. If one question takes two or three minutes, the total number of MCQs must fall, reducing reliability. And of course, when only a few students complete the reasoning successfully, everyone else is grouped into the same ‘wrong’ category with no discrimination between them. Partial understanding, plausible reasoning, and thoughtful errors go unrecognised. (And in any case, a proportion of those answering correctly will merely have guessed.)
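The reliability cost of fewer items can be made concrete with the Spearman-Brown prophecy formula, a standard result in classical test theory (the starting reliability of 0.85 and the 40-item test below are assumed for illustration, not figures from the post):

```python
def spearman_brown(reliability, length_ratio):
    """Predicted reliability when test length is scaled by length_ratio."""
    return (length_ratio * reliability) / (1 + (length_ratio - 1) * reliability)

# If slow multi-step items force a 40-item test (reliability 0.85)
# down to 20 items of the same quality:
print(round(spearman_brown(0.85, 0.5), 2))  # falls to about 0.74
```

Halving the item count to make room for two-to-three-minute questions noticeably degrades the reliability of the whole assessment, which is the trade-off the paragraph above describes.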
The core problem is trying to assess process-heavy reasoning through a binary format. A question that requires several steps of working but produces only a final mark offers no insight into how far the student got or where they faltered. In these cases, short or extended responses might be more appropriate: they reveal reasoning, credit partial success, and discriminate better between levels of performance.
Difficult MCQs are not inherently good or bad. They can support learning in many contexts, but only when used alongside a large number of similar, quick-to-answer items. It’s this combination that builds a reliable picture of attainment. So before stretching the format, we should pause and ask: Is this the kind of thinking an MCQ is best suited to capture?