
Judge Infit




The infit for a judge measures how closely that judge agrees with the other judges in the decisions they make. High values may indicate disagreement, but infit should be interpreted with care: it is sensitive to the nature of what is being judged, how many judges are judging, and how many decisions each judge makes. Our empirical data suggests the following typical values.

Primary school judging (short texts, lots of judges)



Quota (judgements per judge)    Average infit    95% infit
0 to 20                         1.08             2.18
21+                             1.04             1.74


Secondary school judging (longer texts, fewer judges)



Quota (judgements per judge)    Average infit    95% infit
0 to 20                         0.90             1.36
21+                             0.91             1.29


The 95% infit value is the level above which a judge's infit suggests inconsistency with the other judges on a comparable task.
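As a rough illustration, the reference values in the tables above can be turned into a simple check. This is a sketch only: the names `INFIT_REFERENCE`, `quota_band` and `exceeds_95_infit`, and the "primary"/"secondary" labels, are hypothetical and are not part of the judging site.

```python
# Reference values from the tables above: (average infit, 95% infit).
# Keys are (phase, quota band); "primary"/"secondary" are hypothetical labels.
INFIT_REFERENCE = {
    ("primary", "0-20"): (1.08, 2.18),
    ("primary", "21+"): (1.04, 1.74),
    ("secondary", "0-20"): (0.90, 1.36),
    ("secondary", "21+"): (0.91, 1.29),
}


def quota_band(quota: int) -> str:
    """Map a judge's quota (number of judgements) onto a table row."""
    return "0-20" if quota <= 20 else "21+"


def exceeds_95_infit(infit: float, phase: str, quota: int) -> bool:
    """True if the judge's infit is above the 95% infit value for that kind of task."""
    _average, threshold = INFIT_REFERENCE[(phase, quota_band(quota))]
    return infit > threshold


print(exceeds_95_infit(1.5, "primary", 15))    # False: below 2.18
print(exceeds_95_infit(1.5, "secondary", 30))  # True: above 1.29
```

The same infit of 1.5 is unremarkable for a primary task with a small quota but stands out on a secondary task, which is why the tables are split by task type and quota.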

A detailed look at infit



First, let us look in detail at how the comparative judgement process works.

In each judgement, two scripts are presented and the judge chooses one. The decision is recorded (1 for chosen, 0 for not chosen) as the data for that judgement, together with which two candidates were judged. A judge carries out a series of these judgements, and all of that judge's data is collected. There are usually several judges, so the total data for the task is the combined data from all of them.
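A minimal sketch of how one judgement could be stored, assuming a simple record of the judge, the two scripts shown and the script chosen. The `Judgement` class and its field names are hypothetical, not the site's actual data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    """One comparative judgement (hypothetical record layout)."""
    judge: str
    script_a: str
    script_b: str
    chosen: str  # must be script_a or script_b

    def outcomes(self) -> dict[str, int]:
        """Return 1 for the chosen script and 0 for the other one."""
        if self.chosen not in (self.script_a, self.script_b):
            raise ValueError("chosen must be one of the two scripts shown")
        return {
            self.script_a: int(self.chosen == self.script_a),
            self.script_b: int(self.chosen == self.script_b),
        }


# The task's data is every judge's judgements pooled together.
task_data = [
    Judgement("judge1", "A", "B", chosen="A"),
    Judgement("judge1", "B", "C", chosen="C"),
    Judgement("judge2", "A", "C", chosen="A"),
]
print(task_data[1].outcomes())  # {'B': 0, 'C': 1}
```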

All of this data is fed into a mathematical model to estimate the true score of each candidate. The model takes into account not just how many times a candidate's script was chosen, but also which scripts it was compared against each time and how highly those scripts eventually scored.
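The article does not name the model it uses. The sketch below assumes a Bradley-Terry (Rasch-style) model, a common choice for comparative judgement, in which the probability of script i being chosen over script j is 1 / (1 + exp(-(s_i - s_j))). It is fitted here by plain gradient ascent with a small ridge penalty; `fit_scores` and its parameters are hypothetical, and a production system would likely use a more robust fitting routine.

```python
import math
from collections import defaultdict


def fit_scores(comparisons, n_iter=2000, step=0.1, ridge=0.01):
    """Estimate a score for each script from (winner, loser) pairs.

    Assumed model (Bradley-Terry / Rasch style):
        P(i chosen over j) = 1 / (1 + exp(-(s_i - s_j)))
    Fitted by plain gradient ascent on the log-likelihood. The small ridge
    penalty keeps scores finite when a script wins (or loses) every time.
    """
    scripts = {s for pair in comparisons for s in pair}
    scores = {s: 0.0 for s in scripts}
    for _ in range(n_iter):
        grad = defaultdict(float)
        for winner, loser in comparisons:
            p_win = 1.0 / (1.0 + math.exp(-(scores[winner] - scores[loser])))
            # Observed outcome for the winner (1) minus the model's probability.
            grad[winner] += 1.0 - p_win
            grad[loser] -= 1.0 - p_win
        for s in scripts:
            scores[s] += step * (grad[s] - ridge * scores[s])
        # Scores are only defined up to a constant, so centre them on 0.
        mean = sum(scores.values()) / len(scores)
        scores = {s: v - mean for s, v in scores.items()}
    return scores


# A always wins; B beats C in two of their three meetings.
comparisons = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("B", "C"), ("C", "B")]
scores = fit_scores(comparisons)
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C']
```

Note how the fitted scores depend on who each script met, not only on its number of wins, which is the point the paragraph above makes.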

Using these true scores, we can then look back at each decision and calculate the probability of one script being chosen over the other. Comparing this probability with the decision actually made gives a ‘residual’ for that decision:

residual for a decision = actual decision (1 or 0) - probability of that decision (between 0 and 1)

Because residuals can be negative, we work with the squared residual instead. The squared residual tells us whether the actual decision agrees with what we would expect given all of the decisions: a low squared residual (close to 0) indicates a decision in line with the other decisions, and a high squared residual (close to 1) indicates a decision that goes against them.
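A small worked example of the residual and squared residual, assuming the logistic (Bradley-Terry style) probability sketched for the model. The function names are hypothetical.

```python
import math


def win_probability(score_a: float, score_b: float) -> float:
    """Assumed model probability that script A is chosen over script B."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def squared_residual(chosen_a: int, score_a: float, score_b: float) -> float:
    """chosen_a is the recorded decision for A: 1 if A was chosen, 0 if not."""
    residual = chosen_a - win_probability(score_a, score_b)
    return residual ** 2


# A is much stronger than B (score gap of 2.0), so P(A chosen) is about 0.88.
print(round(squared_residual(1, 1.0, -1.0), 3))  # 0.014: the expected decision
print(round(squared_residual(0, 1.0, -1.0), 3))  # 0.776: a surprising decision
```

The squared residual comes out the same whether it is calculated from A's side or B's side of the decision, so each decision has a single value.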

Judge infit



So for each decision made, we can obtain a squared residual. As a judge makes a series of judgements, we can effectively* average all of their squared residuals to give an overall infit for that judge. The higher the infit, the more out of step the judge is with the other judges and their decisions. In our national assessments, if a judge's infit on the local task is more than 2 standard deviations above that of the other judges, we exclude that judge from the moderation task to keep the moderation consistent.

(*We say ‘effectively’ because it is not quite a straightforward average. Each squared residual is weighted according to how close in quality the two compared scripts are, based on the final true scores. This is why infit values can be greater than 1.)
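The article does not give its exact weighting. A common definition of infit in Rasch measurement is the information-weighted mean square: the sum of squared residuals divided by the sum of the model variances p(1 - p). Since p(1 - p) peaks when two scripts are close in quality, this matches the description above, and it explains values over 1. The sketch below assumes that definition, plus an assumed reading of the 2 standard deviation rule (each judge compared with the mean and standard deviation of the other judges). Function names and the example scores are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def infit_by_judge(decisions, scores):
    """Information-weighted mean square (infit) for each judge.

    decisions: (judge, winner, loser) tuples; scores: fitted score per script.
    """
    sq_resid = defaultdict(float)
    variance = defaultdict(float)
    for judge, winner, loser in decisions:
        p_win = 1.0 / (1.0 + math.exp(-(scores[winner] - scores[loser])))
        sq_resid[judge] += (1.0 - p_win) ** 2
        # p(1 - p) is largest (0.25) when the two scripts are close in quality.
        variance[judge] += p_win * (1.0 - p_win)
    return {j: sq_resid[j] / variance[j] for j in sq_resid}


def outlying_judges(infits, n_sd=2.0):
    """Judges whose infit is more than n_sd SDs above the other judges' mean."""
    flagged = []
    for judge, value in infits.items():
        others = [v for j, v in infits.items() if j != judge]
        if len(others) < 2:
            continue  # need at least two other judges for a standard deviation
        if value > statistics.mean(others) + n_sd * statistics.stdev(others):
            flagged.append(judge)
    return flagged


scores = {"A": 1.0, "B": 0.0, "C": -1.0}  # illustrative, already-fitted scores
agree = [("A", "B"), ("B", "C"), ("A", "C")]
decisions = (
    [("j1", w, l) for w, l in agree]
    + [("j2", w, l) for w, l in agree]
    + [("j3", "A", "B"), ("j3", "C", "B"), ("j3", "A", "C")]  # one odd decision
    + [("j4", w, l) for w, l in agree]
    + [("j5", l, w) for w, l in agree]  # j5 reverses every decision
)
infits = infit_by_judge(decisions, scores)
print({j: round(v, 2) for j, v in infits.items()})
print(outlying_judges(infits))  # ['j5']
```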

Candidate infit



Similarly, we can average over the decisions involving a candidate. A high candidate infit suggests a script about which the judges generally disagreed. As a rule of thumb, it is worth looking at scripts with a candidate infit over 2.0.
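A sketch of candidate infit under the same assumed information-weighted definition, with the 2.0 rule of thumb applied at the end. The function name and the example scores are hypothetical.

```python
import math
from collections import defaultdict


def infit_by_candidate(decisions, scores):
    """Infit per candidate script, from (winner, loser) decisions and fitted scores.

    A decision's squared residual is the same from either script's point of view,
    so each decision counts towards the infit of both scripts involved.
    """
    sq_resid = defaultdict(float)
    variance = defaultdict(float)
    for winner, loser in decisions:
        p_win = 1.0 / (1.0 + math.exp(-(scores[winner] - scores[loser])))
        for script in (winner, loser):
            sq_resid[script] += (1.0 - p_win) ** 2
            variance[script] += p_win * (1.0 - p_win)
    return {s: sq_resid[s] / variance[s] for s in sq_resid}


# Illustrative scores; D (the weakest script) is surprisingly chosen over A and B.
scores = {"A": 1.0, "B": 0.0, "C": -1.0, "D": -2.0}
decisions = [("A", "B"), ("A", "C"), ("D", "A"), ("B", "C"), ("D", "B"), ("C", "D")]
infits = infit_by_candidate(decisions, scores)
flagged = [s for s, v in infits.items() if v > 2.0]  # rule of thumb from the article
print(flagged)  # ['A', 'D']
```

A is flagged as well as D, because one surprising decision raises the infit of both scripts it involved. Looking at the individual decisions helps to tell which script the judges actually disagreed about.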

How can I investigate high infit?



If a judge generally disagrees with the other judges, this is likely to lead to a high infit. This may be because they are making judgements too quickly (so it is worth looking at that judge's median judging time), or simply because they have a very different view of what makes a good script. A high infit doesn't necessarily mean that the judge is ‘wrong’, though; it simply means they have a different view from the other judges. A few mistaken judgements can also increase the infit, so it is always worth looking at the judgements a judge has made. More information on how to do that is provided here:
Exploring the decisions made by each judge. Remember that if a judge makes only a few judgements and one of them is less accurate, they may end up with an unusually high infit, so a low number of judgements can have an impact too.
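To illustrate the small-sample effect, the sketch below computes infit (using the same assumed information-weighted definition) for a judge who makes one surprising decision among 3 judgements, and for one who makes the same surprising decision among 30. It also shows a median judging time; the timing data is made up for illustration.

```python
import math
import statistics


def infit(decisions):
    """Infit from a list of (winner_score, loser_score) pairs (assumed definition)."""
    sq_resid = variance = 0.0
    for winner_score, loser_score in decisions:
        p_win = 1.0 / (1.0 + math.exp(-(winner_score - loser_score)))
        sq_resid += (1.0 - p_win) ** 2
        variance += p_win * (1.0 - p_win)
    return sq_resid / variance


expected = (1.0, 0.0)   # the stronger script (score 1.0) is chosen
surprise = (-1.0, 1.0)  # a much weaker script is chosen: one "less accurate" decision

few = [expected] * 2 + [surprise]
many = [expected] * 29 + [surprise]
print(round(infit(few), 2), round(infit(many), 2))  # 1.85 0.49

# Median judging time in seconds (hypothetical data): very short times can
# suggest judgements made too quickly.
times = [42, 38, 5, 51, 4, 47, 6]
print(statistics.median(times))  # 38
```

The same single surprising decision produces a much higher infit when the judge has made only a few judgements.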

Should I exclude my judge?



If you feel that the infit for one of your judges is too high, you might wish to exclude them from the calculation of your candidates' results. To do that, click the small box to the left of that judge on the Judges page, click the ‘Toggle Exclude’ button, and then click the purple ‘Refresh Scores’ button to recalculate your judging statistics and candidate scores without that judge's judgements.

Updated on: 19/09/2024
