What to do if you spot an AI error

Overall, the AI error rate is very low, and much lower than the human error rate. This article explains what we mean by an AI error, how to review your results for errors, and how to fix any you find.


There are two main sources of AI judging error:


  • The AI can mis-transcribe student handwriting. This is the most common source of error. It is also more prevalent in younger year groups, where handwriting can be especially difficult to read. This can result in a transcription that is better than the original handwritten piece, meaning the script receives a higher score than it should. This is why we do not use AI to judge Year 1 writing.
  • The AI can make the wrong judgement. This is relatively rare. Around 80-85% of the time, the AI agrees with the human. Most of the remaining 15-20% are small, legitimate disagreements about quality. About 0.5-1% are larger disagreements that need explanation. Most of these are caused by human error rather than AI error, but a small number are genuine AI errors. These typically happen when responses fall outside the AI's training patterns (for example, a very short response where an essay was expected).


It is difficult to prevent all anomalies in advance, but it is straightforward to spot and fix them afterwards.


  • For large national tasks, we aim to identify and fix all errors before releasing results. Even so, we recommend following the checking process below and letting us know if you spot any issues.
  • For custom tasks, you should always follow the checking process below.



How to check for errors


There are two reports you should review:


  • Check your AI-human agreement report. This lists human-AI disagreements in order of size. If the AI has made a major error, it should appear on the first page.
  • Check every student's infit statistic. Infit measures the extent to which judges agreed on a script. Scripts that provoke high disagreement will have high infit. On the Results & Feedback page, inconsistent (high-infit) scores are highlighted in red. We recommend opening each red-flag script and reviewing it.

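If you are curious about what infit actually measures, the sketch below shows one way an infit-style statistic can be computed for a single script. It is purely illustrative: it assumes a Bradley-Terry style comparative judgement model, the function names and scores are invented for the example, and it is not the exact calculation used on the platform.

```python
# Illustrative only: a rough sketch of how an infit-style statistic can be
# computed for one script in a comparative judgement exercise.
# Assumes a Bradley-Terry style model; this is NOT the platform's exact
# implementation, just a way to see why disagreement pushes infit up.
import math

def win_probability(score_a: float, score_b: float) -> float:
    """Expected probability that script A beats script B, given their scaled scores."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))

def infit(decisions: list[tuple[float, float, int]]) -> float:
    """
    decisions: one tuple per judgement involving the script of interest:
      (script_score, opponent_score, outcome), where outcome is 1 for a win, 0 for a loss.
    Infit is the information-weighted mean of squared residuals: values well
    above 1 suggest the judgements disagree with the model's expectations.
    """
    squared_residuals = 0.0
    information = 0.0
    for script_score, opponent_score, outcome in decisions:
        p = win_probability(score_a=script_score, score_b=opponent_score)
        squared_residuals += (outcome - p) ** 2
        information += p * (1.0 - p)
    return squared_residuals / information

# A script that keeps losing to much weaker scripts gets a high infit.
surprising = [(1.5, -0.5, 0), (1.5, -1.0, 0), (1.5, 0.0, 1), (1.5, -0.8, 0)]
print(round(infit(surprising), 2))  # noticeably above 1
```

In this sketch, a script that keeps losing to much weaker scripts produces an infit well above 1, which is the kind of pattern the red highlighting on the Results & Feedback page is designed to surface.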

Please note: most red flags are not AI errors. There are many reasons a script can have high infit (for example, uneven quality that is genuinely hard to judge). A typical secondary school may have around four red scripts, but true AI errors are usually fewer than this.


In short: all AI errors should show a red flag, but not all red flags are AI errors.




What should I do if I find a clear error?


If the AI is clearly wrong for a pupil, you can remove the incorrect decisions and rejudge that pupil.


Remove incorrect decisions (Candidate Decisions page)


  1. Open the pupil's Candidate Decisions page under Results & Feedback.
  2. Review the Result column (Won/Lost) and decision details.
  3. Use the checkbox column at the start of the table to select one or more decisions.
  4. Click the Delete icon button (enabled only when one or more rows are selected).
  5. Confirm deletion.


This removes the selected decision records for that pupil so they can be judged again. This should fix the majority of issues. If it doesn't, or if you'd like support reviewing a specific case, contact us at support@nomoremarking.com.

Updated on: 10/03/2026
