AI-enhanced custom tasks: how to set criteria
When considering AI for judging student work, a common and valid concern arises: will the AI prioritise polished writing over factual accuracy? Could it reward "well-written nonsense" more than a less articulate but factually correct piece?
The short answer is that a well-instructed AI can effectively evaluate both. The key is to provide the AI with clear, pedagogically sound assessment criteria. Here is some general advice based on this principle.
1. The AI Judges What You Tell It to Judge
An AI assessment tool is not an independent thinker with its own biases; it is a system that follows instructions. If you only ask it to evaluate the "quality of writing", it will focus solely on grammar, syntax, structure, and style. However, if you explicitly include "knowledge accuracy" in your evaluation criteria, the AI is very effective at identifying and assessing the factual content of a submission. For example, a criterion such as "reward responses that are factually accurate, even where the expression is weaker" directs the AI to weigh content alongside style. In the same way, you can ask the AI to consider the ambition of an answer, so that it doesn't prioritise concision and accuracy over creativity.
2. Prioritise Holistically with Assessment Objectives, Not Checklists
It can be tempting to test for specific pieces of knowledge, for example by telling the AI, "The student must mention the Battle of Hastings in 1066." However, this "checklist" approach is often poor assessment design, whether for an AI or a human marker. It can encourage superficial name-dropping rather than genuine understanding, and it can unfairly penalise an otherwise excellent response that demonstrates mastery in a different way. A much better method is to use clear assessment objectives. For example, the following image shows an extract from an AQA English Language mark scheme.
The assessment objectives can be pasted directly into the criteria section of our AI custom tasks.
3. No Need for Levels
Typically, when writing a mark scheme, you need to define levels of achievement. For example, you may wish to assign Level 4 to a "perceptive and detailed understanding of language" and Level 3 to a "clear understanding of language". Luckily, with Comparative Judgement we are spared the need to draw subtle distinctions between levels, which can be hard to apply and can lead to poor marker consistency. Rather than applying levels in advance, we recommend that you simply provide the criteria at the judging phase and set the levels once the judging is complete. You can use our results interface to examine work at key score points and assign levels that correspond to scaled score ranges. For instance, after reviewing scripts at the relevant score points, you might decide that scaled scores of 70 and above correspond to Level 4, and scores of 55 to 69 to Level 3.