
but over an increasing number of trials will almost certainly tend to approach 50%. For example, the chance
of a coin coming up heads on two successive tosses is 1 in 4; on three successive tosses, 1 in 8; on four successive
tosses, 1 in 16. Getting heads on ten tosses in a row is likely to happen only one time in 1024. In other words, large
samples tend to smooth out the aberrations of randomness.
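The odds above come from multiplying a 1-in-2 chance once per toss. The Python sketch below is only an illustration: the function names and the fixed random seed are assumptions, not part of the article. It computes those odds exactly and runs a seeded simulation in which the share of heads in long runs settles close to 50%.

```python
import random
from fractions import Fraction


def chance_of_all_heads(tosses: int) -> Fraction:
    """Exact probability that every one of `tosses` fair coin flips is heads."""
    return Fraction(1, 2) ** tosses


def share_of_heads(tosses: int, seed: int = 1) -> float:
    """Simulate `tosses` fair flips and return the fraction that came up heads."""
    rng = random.Random(seed)  # seeded (illustrative choice) so runs are repeatable
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses


for n in (2, 3, 4, 10):
    print(f"{n} heads in a row: 1 in {chance_of_all_heads(n).denominator}")

for n in (10, 100, 10_000, 100_000):
    print(f"{n:>7} tosses: {share_of_heads(n):.3f} heads")
```

Short runs can land well away from 0.5. The long runs stay close to it, which is the smoothing effect the text describes.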
Therefore, not only do you need to conduct a fair number of trials in a session, but the results have to be
substantially better than 50% to mean that there is an audible difference between the units under test.
The very minimum number of trials you should do in a session is ten. Out of ten trials, if the listener chose the
correct amplifier seven times, he or she might have heard some difference between the two, but the result is really too
inconclusive. Eight correct choices would indicate a probable audible difference between the amplifiers, with
about a 95% level of confidence. Nine out of ten would be an even stronger indication. Ten out of ten would
almost certainly indicate that the listener heard some sonic differences between the amps, since there’s only a 1 in
1024 chance it could happen randomly.
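These figures can be checked by counting outcomes. Assume each trial is a two-way choice, so a listener who is purely guessing has a 1-in-2 chance of being right. The chance of scoring at least k out of n is then the number of ways to get k or more right, divided by 2^n. By that count, guessing alone reaches 7 or more out of 10 in 176 of 1024 cases (about 17%), 8 or more in 56 (about 5.5%, roughly 94.5% confidence), and 9 or more in 11 (about 1.1%). A perfect 10 happens in only 1 case. The Python sketch below uses only the standard library; the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def chance_of_at_least(correct: int, trials: int) -> Fraction:
    """Exact probability that pure guessing (1/2 per trial)
    scores `correct` or more out of `trials`."""
    ways = sum(comb(trials, k) for k in range(correct, trials + 1))
    return Fraction(ways, 2 ** trials)


for correct in range(7, 11):
    p = chance_of_at_least(correct, 10)
    print(f"{correct} or more of 10 by chance: {float(p):.1%}")
```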
What we’re really after in these statistics is a high degree of confidence that the results show real conditions
and not random occurrences. For example, a series of 25 trials has 33,554,432 possible combinations of right/
wrong answers, ranging from 0 correct/25 wrong, up to 25 correct/0 wrong. There are 5,200,300 possible
combinations of 12 correct/13 wrong, and an identical number of possible 13 correct/12 wrong. There is only
one combination of 25 correct/0 wrong, and while it is possible that a random sequence of responses could be
right 25 out of 25 times, there is only a 1 in 33,554,432 chance of it happening. Therefore, we can say that there
is a 33,554,431/33,554,432 (99.999997%) chance of it not happening; that would also be our level of confidence
in the results: 99.999997%.
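The counts in this example come straight from combinatorics. Each of 25 trials is either right or wrong, which gives 2^25 possible sequences. The number of sequences with exactly k right is the binomial coefficient C(25, k). The Python check below assumes Python 3.8 or later, since it uses `math.comb`.

```python
from math import comb

TRIALS = 25

total = 2 ** TRIALS                 # every right/wrong sequence of 25 answers
twelve_right = comb(TRIALS, 12)     # sequences with exactly 12 correct
thirteen_right = comb(TRIALS, 13)   # sequences with exactly 13 correct
all_right = comb(TRIALS, TRIALS)    # only one sequence is 25 correct

print(f"{total:,} possible sequences")
print(f"{twelve_right:,} with 12 correct, {thirteen_right:,} with 13 correct")
print(f"confidence that 25/25 is not luck: {1 - all_right / total:.6%}")
```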
We won’t be quite as picky with the listening tests; a 95% minimum level of confidence will be good enough.
That is, there should be less than a 5% chance that the results are attributable to chance.
# of trials         10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
Minimum # correct    8   8   9   9  10  11  11  12  12  13  14  14  15  15  16  17
The table on this page lists a recommended range of trials, and the minimum number of correct responses
necessary to reach a 95% or better level of confidence.
As you increase the number of trials, your data becomes more dependable. Notice that you need 8 of 10 (80%) but only
17 of 25 (68%) to get the same degree of confidence.
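The table's entries can be recomputed with the same counting approach. For each row, the Python sketch below prints the confidence the tabulated score actually gives. Next to it, it prints the smallest score whose chance probability is at or below 5%. The function names and the `alpha` parameter are illustrative. Worked out exactly, the tabulated 8 of 10 and 17 of 25 each correspond to roughly 94.5% confidence, which matches the "about 95%" wording above. A strict 5% cutoff asks for one more correct answer in every row (9 of 10, 18 of 25). At some intermediate trial counts the tabulated score falls further short of 95%; 8 of 11, for instance, gives about 89%.

```python
from math import comb


def chance_of_at_least(correct: int, trials: int) -> float:
    """Probability that pure guessing (1/2 per trial) scores `correct` or more."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def strict_minimum(trials: int, alpha: float = 0.05) -> int:
    """Smallest score whose chance probability is at most `alpha`."""
    for correct in range(trials + 1):
        if chance_of_at_least(correct, trials) <= alpha:
            return correct
    raise ValueError("no score reaches this significance level")


# Minimum correct answers as printed in the table (10 to 25 trials).
TABLE = dict(zip(range(10, 26),
                 [8, 8, 9, 9, 10, 11, 11, 12, 12, 13, 14, 14, 15, 15, 16, 17]))

for trials, listed in TABLE.items():
    confidence = 1 - chance_of_at_least(listed, trials)
    print(f"{trials:>2} trials: table {listed:>2} ({confidence:.1%}), "
          f"strict 95% minimum {strict_minimum(trials)}")
```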
So, if your listener correctly “guesses” the identity of “X” at least the minimum number of times in the session
of trials, you can confidently conclude that there is an audible difference between the amplifiers under test. Of
course, that may mean that one amp sounds worse than the other, maybe even the amp you favor, if you do
have an interest in one brand or model over the other. On the other hand, if the listener gets fewer than the
minimum number correct, it doesn’t necessarily mean that the amps are audibly indistinguishable, although they
may be. More accurately, it means only that you can’t confidently say there’s an audible difference.
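A passing score supports a difference, but a failing score proves nothing either way. The hypothetical Python helper below captures that asymmetry. The `interpret` name and its return strings are illustrative. It simply looks up the table's minimums for sessions of 10 to 25 trials.

```python
# Minimum correct answers from the table, keyed by number of trials.
MINIMUM_CORRECT = dict(zip(range(10, 26),
                           [8, 8, 9, 9, 10, 11, 11, 12, 12, 13, 14, 14, 15, 15, 16, 17]))


def interpret(correct: int, trials: int) -> str:
    """Read a session score the way the text describes (hypothetical helper)."""
    if trials not in MINIMUM_CORRECT:
        raise ValueError("the table covers sessions of 10 to 25 trials")
    if not 0 <= correct <= trials:
        raise ValueError("score must be between 0 and the number of trials")
    if correct >= MINIMUM_CORRECT[trials]:
        return "audible difference (at the table's confidence level)"
    # Falling short is not evidence that the amps sound the same.
    return "inconclusive: no confident claim of an audible difference"


print(interpret(9, 10))
print(interpret(16, 25))
```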
It’s not a good idea to try more than 25 trials in the same sitting with the same listener. After a while, listener
fatigue sets in and it gets harder for him or her to concentrate and judge the sound quality.