Reliability

Reliability of Comparative Judgement Tasks



What is reliability?

The most important thing to check when you have finished a comparative judging task is the reliability figure.

The reliability figure is between 0 and 1:

0: Very low reliability
1: Very high reliability

What does a high or low reliability mean?

Simply put, a high reliability means that the scores in your task can be trusted: if you did more judging or added more judges, the scores would be very unlikely to change. A low reliability means the scores in your task are likely to change if you do more judging or add more judges.

How high a reliability is good enough?

Typically, we hope to achieve a reliability of over 0.8 for a task. At this point, adding more judges or doing more judgements will not have a large impact on the scores, so the work involved in additional judging is probably not worth it unless you need very precise scores.

What factors affect reliability?

The main factor that affects reliability is how much judging has been done. In general, the more judging you do, the higher the reliability.

The second most important factor is the extent to which judges agree or disagree in their judging. If you have two judges who never agree about anything, the reliability will stay low however much judging they do!
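
To make the link between the amount of judging and reliability concrete, here is a minimal sketch in Python. This article does not state how the reliability figure is calculated, so the sketch assumes a measure often used for comparative judgement, the scale separation reliability (SSR): the share of the variance in the scores that is not explained by measurement error. The function name, the example scores and the standard errors are all hypothetical, and the use of the population variance is an assumption too.

```python
from statistics import pvariance


def scale_separation_reliability(scores, standard_errors):
    """Assumed SSR formula: (observed variance - mean squared error) / observed variance."""
    if len(scores) != len(standard_errors) or len(scores) < 2:
        raise ValueError("need at least two scores, each with a standard error")

    # Spread of the estimated scores across all scripts (population variance, an assumption).
    observed_variance = pvariance(scores)
    if observed_variance == 0:
        raise ValueError("all scores are identical, so reliability is undefined")

    # Average measurement error: the mean of the squared standard errors.
    mean_squared_error = sum(se ** 2 for se in standard_errors) / len(standard_errors)

    # Clamp at 0 so the result stays within the 0 to 1 range described above.
    return max(0.0, (observed_variance - mean_squared_error) / observed_variance)


# Hypothetical scores for five scripts.
scores = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Less certain scores (larger standard errors) versus more certain scores.
less_certain = scale_separation_reliability(scores, [1.0] * 5)
more_certain = scale_separation_reliability(scores, [0.5] * 5)

print(f"larger standard errors:  {less_certain:.3f}")
print(f"smaller standard errors: {more_certain:.3f}")
```

In this made-up example, halving the standard errors lifts the figure from 0.5 to 0.875, which is above the 0.8 target mentioned earlier. Under the assumed formula, anything that keeps the standard errors large relative to the spread of the scores keeps the reliability low.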

You can check how good your reliability is using our interactive calculator: https://observablehq.com/@nomoremarking/reliability-of-comparative-judgement-tasks

Updated on: 18/11/2024
