AI handwriting transcriptions: what happens when the AI hallucinates

AI handwriting hallucinations

The first step in our AI judging process is for the AI to transcribe the handwriting. Sometimes, when the AI encounters particularly hard-to-read handwriting, it can "hallucinate" and see words that aren't there. This is mainly an issue with younger students whose handwriting is still at the mark-making stage.

The problem

AI systems are trained on massive datasets of clean, adult handwriting and typed text. When they transcribe handwriting, they predict what is most likely to come next based on probability. When a child produces non-linear scribbles, stray marks, or "pre-letter" shapes, the AI tries to force those shapes into a known pattern, so a series of loops and dots might be interpreted as a complex sentence. In effect, the AI is trying too hard to make sense of marks that carry no meaning.

This can clearly distort the results: a script that a human finds unreadable may be transcribed by the AI as containing complex and sophisticated sentences. That script may then win its comparisons against other pieces of writing. It's important to note that the failure in this case is not in the AI's judgement of the writing, but in the AI's transcription of the writing.

The solution

We are working on a number of ways to improve how the AI transcribes writing. In the meantime, we recommend that you do the following.

Check the original handwritten scripts when you collect them in. If a child has produced handwriting that a human cannot read, the AI is unlikely to be able to read it either, and the child is unlikely to get a fair score. You might want to exclude these scripts.
Check the transcriptions once the scripts are uploaded. You can see all of the transcriptions on a task's feedback page, and a quick check of these will allow you to exclude the affected pupils before judging.
Check the results. If a result looks like an outlier, for example an unexpectedly high score for a script you know to be hard to read, you might want to check its transcription.
Add more human judgements. If you are unhappy with the AI judging, you can delete or exclude the AI judgements and add more human judgements instead.

Human interpretation of Year 1 handwriting

Many of the problems we have encountered with AI transcriptions involve the kind of handwriting that is also hard for humans to interpret. Even before we added AI judges, we found that humans could disagree substantially about Year 1 writing, depending on how much of the handwriting they could interpret. For example, in the past we have seen one teacher say that they simply cannot understand what a student has written, and therefore cannot select it as the better piece, while another teacher says that they can work out what has been written, and that it is actually the better piece.

Here is an example of what we mean.

Some teachers may look at the script on the left, conclude that they cannot make any meaning from it, and therefore pick the shorter but more readable script on the right.

Other teachers will say that they can generate enough meaning from the script on the left to judge that it is better than the one on the right.

In some ways, the AI hallucinations have something in common with the challenges humans face in interpreting the writing of young children.

Updated on: 12/02/2026
