Imagine you're given the task of evaluating the abilities of the individuals in a group -- perhaps in sharp-shooting -- and awarding ratings like A or F. You might plan a series of challenges of increasing difficulty. What should you do if these turn out to be much harder than you planned -- so that while you expected a mean success rate of 60%, it was actually 20%? An obvious solution is to multiply all scores by 3 to bring up the average. But it turns out that this (and any other linear correction) penalizes the weaker performers. Better alternatives exist.

Test Flaws

Ideal Testing

Let's go back to consider the ideal situation, where the test is as hard as you expected. It consists of 20 challenges (or questions) ranging in difficulty from 0 to 1: a challenge of difficulty 0.5 can only be accomplished by a person with 0.5 skill or higher. With such a test bank, 0.5 skill is enough to succeed at exactly 50% of the questions, and the same correspondence holds for all other levels:

achievement equals skill level
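A minimal sketch of this ideal test, assuming the 20 difficulties are evenly spaced at 0.05, 0.10, ..., 1.0 (the post only says they range from 0 to 1) and that a challenge is passed whenever skill is at least the difficulty. The function names are made up for illustration.

```python
def ideal_difficulties(n=20):
    # Assumed layout: evenly spaced difficulties 1/n, 2/n, ..., 1.
    return [(k + 1) / n for k in range(n)]


def score(skill, difficulties):
    # A challenge is passed when skill is at least its difficulty.
    passed = sum(1 for d in difficulties if skill >= d)
    return passed / len(difficulties)


bank = ideal_difficulties()
for skill in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"skill {skill:.2f} -> score {score(skill, bank):.2f}")
```

Under these assumptions the score simply tracks the skill level, which is the baseline the later corrections try to recover.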

Some Impossible Challenges

What if we modify the test so that many of the questions (half) are beyond anyone's ability? If the remaining questions are still arrayed linearly from easy to hard, performance is simply depressed by a constant factor (below, blue line). In this case, the fix proposed above works well: we can multiply all scores by some number so that the best score becomes 1. This makes the scores resemble those from an ideal test (dashed purple line).

impossible questions depress scores evenly
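As a rough illustration of why the linear fix works here, the sketch below builds a hypothetical 20-question bank in which 10 questions are evenly spaced from 0.1 to 1.0 and 10 are impossible. The difficulty value 2.0 for "impossible" and the grid of skill levels are assumptions chosen for the example.

```python
def score(skill, difficulties):
    # A challenge is passed when skill is at least its difficulty.
    passed = sum(1 for d in difficulties if skill >= d)
    return passed / len(difficulties)


possible = [(k + 1) / 10 for k in range(10)]  # 0.1, 0.2, ..., 1.0
impossible = [2.0] * 10                       # assumed: beyond the top skill of 1.0
flawed_bank = possible + impossible

skills = [k / 10 for k in range(11)]          # 0.0, 0.1, ..., 1.0
raw = [score(s, flawed_bank) for s in skills]

# Linear correction: scale every score so that the best score becomes 1.
best = max(raw)
rescaled = [r / best for r in raw]

for s, r, c in zip(skills, raw, rescaled):
    print(f"skill {s:.1f}  raw {r:.2f}  rescaled {c:.2f}")
```

Because every raw score is depressed by the same factor (one half in this setup), dividing by the best score restores the ideal line.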

Hard Bias

This fix does not work if all challenges are possible but nonetheless tend to be hard (that is, there are more hard than easy challenges). On such a test, strong individuals still do well but weak ones suffer greatly, as seen below. A linear correction only amplifies this disparity, so it cannot be used.

An alternative is to take a root of all scores: perhaps the square root, or a higher-order root (a smaller exponent) if performance is very low. This method cannot discriminate between very low performers, but it does raise mediocre scores to where they would be with a better-designed test (if a suitable root is chosen, as shown).

performance of weak and moderate individuals is low, then suddenly increases for strong individuals
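To compare the two corrections, here is a sketch under one assumed model of a hard-biased bank: difficulties set to the square root of evenly spaced values, so that they cluster near 1 and raw scores fall roughly as the square of skill. The post does not specify the bank, so this distribution, the skill grid, and the choice of scaling the mean to match are all assumptions.

```python
import math


def score(skill, difficulties):
    # A challenge is passed when skill is at least its difficulty.
    passed = sum(1 for d in difficulties if skill >= d)
    return passed / len(difficulties)


n = 20
# Assumed hard-biased bank: sqrt of evenly spaced values, clustering near 1.
hard_bank = [math.sqrt((k + 1) / n) for k in range(n)]

skills = [k / 10 for k in range(11)]  # 0.0, 0.1, ..., 1.0
raw = [score(s, hard_bank) for s in skills]

# Linear correction: scale so the mean score matches the mean skill.
factor = (sum(skills) / len(skills)) / (sum(raw) / len(raw))
linear = [r * factor for r in raw]

# Root correction: the square root suits this particular assumed bank.
rooted = [math.sqrt(r) for r in raw]


def mean_abs_error(corrected):
    # Average distance from the ideal result, where score equals skill.
    return sum(abs(c - s) for c, s in zip(corrected, skills)) / len(skills)


for s, r, lin, rt in zip(skills, raw, linear, rooted):
    print(f"skill {s:.1f}  raw {r:.2f}  linear {lin:.2f}  root {rt:.2f}")
print(f"mean error: linear {mean_abs_error(linear):.3f}, root {mean_abs_error(rooted):.3f}")
```

In this setup the linear scaling pushes the top score above 1 while leaving weak scores far below their skill, whereas the root brings mid-range scores close to the ideal line. The lowest skill levels all stay at zero after the root, which matches the caveat about not discriminating between very low performers.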

No Easy Challenges

If absolutely no easy questions exist, the root correction does somewhat less well, particularly for very weak individuals (below, where the grey line indicates performance under an ideal test).

performance is at 0 for weak individuals, low for moderates

Adding some base score might, on average, make the low performers' scores more reasonable, but it would amount to a free ride even for those showing no ability. In no case can correction completely make up for a flawed test, but as long as most scores are not extremely low, correction with roots should adjust them reasonably.
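The trade-off can be sketched with a hypothetical bank whose 20 difficulties run evenly from 0.525 to 1.0, so that nothing is within reach of a skill of 0.5 or lower. The bank layout, the square root, and the base score of 0.25 are all assumptions picked for illustration.

```python
import math


def score(skill, difficulties):
    # A challenge is passed when skill is at least its difficulty.
    passed = sum(1 for d in difficulties if skill >= d)
    return passed / len(difficulties)


# Assumed bank with no easy questions: difficulties 21/40, 22/40, ..., 40/40.
no_easy_bank = [(20 + k + 1) / 40 for k in range(20)]

skills = [k / 10 for k in range(11)]  # 0.0, 0.1, ..., 1.0
raw = [score(s, no_easy_bank) for s in skills]
rooted = [math.sqrt(r) for r in raw]

BASE = 0.25  # hypothetical base score added to everyone, capped at 1
with_base = [min(1.0, r + BASE) for r in rooted]

for s, r, rt, b in zip(skills, raw, rooted, with_base):
    print(f"skill {s:.1f}  raw {r:.2f}  root {rt:.2f}  root+base {b:.2f}")
```

Here every skill up to 0.5 scores zero both before and after the root, and the base score lifts all of them equally, including a skill of zero.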
