Mathematical Model for the Law of Comparative Judgment in Print Sample Evaluation
Mai Zhou, Dept. of Statistics, University of Kentucky
Luke C. Cui, Lexmark International Inc.
The Problem: When evaluating several print samples, pairwise comparison experiments are often used. A human subject judges two print samples at a time to determine which one is “better”. This is repeated with different pairs and different subjects. The resulting data will look like:
/  5  4 37  6
/  /  7 45 28
/  /  / 46 40
/  /  /  /  4
How to:
• Summarize the data.
• Order the print samples in terms of “strength”.
• State the margin of error of the analysis and conclusions.
• Predict the outcome of future comparisons.
Outline of talk
• Introduction to the Thurstone/Mosteller model
• New model, theoretical formulation: Var-Cor modeling, maximum likelihood estimation, likelihood ratio confidence interval
• New model, application to experimental data
• Comparison with the classical model: how good is the fit?
• Discussion
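For pairwise comparisons of stimuli i and k, the observable outcomes are the signs of $X_i - X_k$, where $X_i$ is the perceived strength of stimulus i in a single judgment. The outcomes from different pairs are independent (within a pair, however, they may or may not be independent). Assume
$$X_i - X_k \sim N(\mu_i - \mu_k,\ \sigma_{ik}^2),$$
where $N(\mu, \sigma^2)$ denotes the normal distribution with mean $\mu$ and variance $\sigma^2$.
If we observe the outcomes of many pairs, the log-likelihood function is
$$\ell = \sum_{i<k} \left[ W_{ik} \log p_{ik} + L_{ik} \log\left(1 - p_{ik}\right) \right],$$
where
$$p_{ik} = P(X_i - X_k > 0) = \Phi\!\left(\frac{\mu_i - \mu_k}{\sigma_{ik}}\right)$$
and $\Phi$ is the cdf of the standard normal distribution (available in many software packages).
Here $W_{ik}$ (or $L_{ik}$) is the number of times stimulus i is judged better (or worse) than stimulus k in the pairwise comparisons. The classical model assumes equal variances for all pairs, $\sigma_{ik} = \sigma$. The new model we propose instead lets the variance $\sigma_{ik}^2$ depend on the pair.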
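As a concrete illustration of this likelihood, here is a minimal Python sketch using only the standard library (Python is our choice; the slides suggest R, S-PLUS, SAS and similar packages). The names `norm_cdf`, `log_likelihood` and `unit_sigma` are hypothetical, and the counts are assumed to be stored as nested lists where `W[i][k]` and `L[i][k]` (for i < k) hold the number of times stimulus i was judged better or worse than stimulus k. The pluggable `sigma(i, k, mu)` function defaults to the equal-variance classical case with σ fixed at 1, an identifiability assumption, since only (μ_i − μ_k)/σ_ik enters the likelihood.

```python
import math


def norm_cdf(x):
    """Standard normal cdf, Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def unit_sigma(i, k, mu):
    """Classical equal-variance case (assumed scale): sigma_ik = 1."""
    return 1.0


def log_likelihood(mu, W, L, sigma=unit_sigma):
    """Probit (Thurstone-type) log-likelihood for paired-comparison counts.

    mu[i]:   strength of stimulus i
    W[i][k]: times i was judged better than k (only i < k is read)
    L[i][k]: times i was judged worse than k (only i < k is read)
    sigma:   function (i, k, mu) returning sigma_ik, the standard
             deviation of the perceived difference X_i - X_k
    """
    total = 0.0
    n = len(mu)
    for i in range(n):
        for k in range(i + 1, n):
            p = norm_cdf((mu[i] - mu[k]) / sigma(i, k, mu))
            # Clamp to avoid log(0) for extreme parameter values.
            p = min(max(p, 1e-12), 1.0 - 1e-12)
            total += W[i][k] * math.log(p) + L[i][k] * math.log(1.0 - p)
    return total
```

A pair-dependent variance can be tried by passing a different `sigma` function; the exact form used in the new model is not given on these slides, so none is assumed here.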
The motivation is that the human perceptual process is highly adaptive and performs best as a null tester, i.e., it is more sensitive to closely matched stimuli.
• Thus the variances should depend on how closely the strengths are matched, e.g. smaller variances for closely matched pairs and larger variances as the strengths move apart.
Computation: Use software such as S-PLUS (commercial), R (GNU), Mathcad (commercial), Matlab (commercial) or SAS (commercial).
1. Define the log-likelihood function llk() as a function of the parameters.
2. Maximize llk(), or equivalently minimize the negative of llk(), using the supplied optimization functions. In R these are nlm() and optim(); in SAS/IML one could use the nlptr() subroutine.
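Steps 1 and 2 can be sketched in Python without any of the packages above. This is a hedged illustration, not the authors' code: `llk` is the classical equal-variance log-likelihood with the first stimulus's strength fixed at 0 and σ fixed at 1 (assumptions for identifiability), and the hypothetical `maximize` helper stands in for R's `nlm()`/`optim()` with plain gradient ascent (central-difference gradient, step halving). That is adequate here because the classical log-likelihood is concave in the strengths; a real analysis would use a library optimizer.

```python
import math


def norm_cdf(x):
    """Standard normal cdf Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def llk(free, W, L):
    """Step 1: classical (equal-variance) log-likelihood.

    Assumed identifiability constraints: sigma_ik = 1 and mu[0] = 0, so
    `free` holds mu[1], ..., mu[n-1]. W[i][k] and L[i][k] (i < k) count
    how often stimulus i was judged better / worse than stimulus k.
    """
    mu = [0.0] + list(free)
    total = 0.0
    for i in range(len(mu)):
        for k in range(i + 1, len(mu)):
            p = min(max(norm_cdf(mu[i] - mu[k]), 1e-12), 1.0 - 1e-12)
            total += W[i][k] * math.log(p) + L[i][k] * math.log(1.0 - p)
    return total


def maximize(f, x0, tol=1e-10, max_iter=5000, h=1e-6):
    """Step 2: maximize f by gradient ascent with a central-difference
    gradient and step halving. Returns (argmax, max), i.e. max1."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        grad = [(f(x[:j] + [x[j] + h] + x[j + 1:])
                 - f(x[:j] + [x[j] - h] + x[j + 1:])) / (2 * h)
                for j in range(len(x))]
        step = 1.0
        while step > 1e-12:
            cand = [xj + step * gj for xj, gj in zip(x, grad)]
            fc = f(cand)
            if fc > fx:
                break
            step /= 2
        else:
            break  # no uphill step found: treat as converged
        gain = fc - fx
        x, fx = cand, fc
        if gain < tol:
            break
    return x, fx


# Usage: two stimuli, stimulus 0 judged better 3 times out of 4.
W = [[0, 3], [0, 0]]
L = [[0, 1], [0, 0]]
estimate, max1 = maximize(lambda v: llk(v, W, L), [0.0])
print(estimate, max1)  # mu[1] near -0.674, because Phi(0.674) is about 0.75
```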
The parameter values that achieve the maximum (max1) are the maximum likelihood estimates of the parameters.
• A confidence interval for a parameter can be obtained by temporarily fixing the parameter at a value θ and maximizing over the remaining parameters; call the resulting maximum max2.
• The values of θ for which max1 – max2 < 3.84/2 form the 95% confidence interval for the parameter (2(max1 – max2) is the likelihood ratio statistic, and 3.84 is the 95% point of the chi-square distribution with 1 degree of freedom).
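The profile-likelihood interval can be sketched as follows. The block repeats a classical `llk` and a simple gradient-ascent `maximize` helper (both illustrative assumptions, not the authors' code) so that it runs on its own; the hypothetical `profile_ci` computes max2(θ) by fixing parameter `j` at θ and maximizing over the rest, then locates each endpoint of {θ : max1 − max2(θ) < 3.84/2} by bisection. Bisection assumes the profile log-likelihood falls steadily on each side of the estimate, which holds for the concave classical model.

```python
import math


def norm_cdf(x):
    """Standard normal cdf Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def llk(free, W, L):
    """Classical model (assumed): sigma_ik = 1, mu[0] fixed at 0."""
    mu = [0.0] + list(free)
    total = 0.0
    for i in range(len(mu)):
        for k in range(i + 1, len(mu)):
            p = min(max(norm_cdf(mu[i] - mu[k]), 1e-12), 1.0 - 1e-12)
            total += W[i][k] * math.log(p) + L[i][k] * math.log(1.0 - p)
    return total


def maximize(f, x0, tol=1e-10, max_iter=5000, h=1e-6):
    """Gradient ascent with step halving; returns (argmax, max)."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        grad = [(f(x[:j] + [x[j] + h] + x[j + 1:])
                 - f(x[:j] + [x[j] - h] + x[j + 1:])) / (2 * h)
                for j in range(len(x))]
        step = 1.0
        while step > 1e-12:
            cand = [xj + step * gj for xj, gj in zip(x, grad)]
            fc = f(cand)
            if fc > fx:
                break
            step /= 2
        else:
            break  # no uphill step found: treat as converged
        gain = fc - fx
        x, fx = cand, fc
        if gain < tol:
            break
    return x, fx


def profile_ci(f, x_hat, max1, j, crit=3.84 / 2, width=1.0):
    """Likelihood-ratio interval for parameter j: all theta with
    max1 - max2(theta) < crit, max2 maximizing over the other parameters."""
    def max2(theta):
        rest = x_hat[:j] + x_hat[j + 1:]

        def g(r):
            return f(r[:j] + [theta] + r[j:])

        return maximize(g, rest)[1] if rest else g([])

    def endpoint(direction):
        inside, outside = x_hat[j], x_hat[j] + direction * width
        while max1 - max2(outside) < crit:  # widen until outside the CI
            outside += direction * width
        for _ in range(60):                 # bisect on the boundary
            mid = 0.5 * (inside + outside)
            if max1 - max2(mid) < crit:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return endpoint(-1.0), endpoint(+1.0)


# Usage: 95% interval for mu[1] with two stimuli (3 wins, 1 loss).
W = [[0, 3], [0, 0]]
L = [[0, 1], [0, 0]]
f = lambda v: llk(v, W, L)
x_hat, max1 = maximize(f, [0.0])
print(profile_ci(f, x_hat, max1, 0))
```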
Example: Colorfulness data
• Nine print samples were compared.
• Pairwise experiment, 50 subjects.
Models fitted:
1. Classic model with equal variances.
2. New model.
Differences (predicted – observed). Model 2 has one more parameter than Model 1.
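The predicted-minus-observed differences can be computed as in the following Python sketch, assuming the probit set-up above and a count layout where `W[i][k]` and `L[i][k]` (i < k) hold the times stimulus i was judged better or worse than k (all names are hypothetical). The predicted count for pair (i, k) is n_ik Φ((μ_i − μ_k)/σ_ik) with n_ik = W_ik + L_ik. Because Model 2 has one extra parameter, one standard way to ask whether it fits significantly better is a likelihood-ratio test against 3.84, the same chi-square cutoff used for the confidence interval; the slides do not say this test was carried out, so treat it as an assumption.

```python
import math


def norm_cdf(x):
    """Standard normal cdf Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def predicted_minus_observed(mu, W, L, sigma=lambda i, k, mu: 1.0):
    """For each pair i < k: expected number of 'i better than k'
    judgments under the fitted model minus the observed W[i][k]."""
    diffs = {}
    for i in range(len(mu)):
        for k in range(i + 1, len(mu)):
            n_ik = W[i][k] + L[i][k]
            p = norm_cdf((mu[i] - mu[k]) / sigma(i, k, mu))
            diffs[(i, k)] = n_ik * p - W[i][k]
    return diffs


def lr_test_one_extra(max_classic, max_new, crit=3.84):
    """Likelihood-ratio test for nested models differing by one
    parameter: 2 * (max_new - max_classic) versus the chi-square(1)
    95% point. Returns (statistic, True if the larger model is preferred)."""
    stat = 2.0 * (max_new - max_classic)
    return stat, stat > crit
```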
We also fitted the Bradley-Terry model to the data (using SAS); its fit is similar to that of the classic model.
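The Bradley-Terry fit above was done in SAS. As a hedged, package-free illustration (not the authors' SAS code), the sketch below fits the model with the classic iterative Zermelo / minorization-maximization update. In the Bradley-Terry model, P(i judged better than k) = π_i / (π_i + π_k); each sweep sets π_i ← w_i / Σ_k n_ik / (π_i + π_k), where w_i is the total number of wins of stimulus i and n_ik the number of comparisons between i and k, then rescales, since only ratios of the π_i are identified. The function name and the `W`/`L` layout (i < k) are assumptions, and the sketch assumes the maximum likelihood estimate exists (for example, no stimulus wins or loses every one of its comparisons).

```python
def bradley_terry(W, L, max_iter=10000, tol=1e-12):
    """Fit Bradley-Terry strengths pi[i] by the iterative MM/Zermelo update.

    W[i][k], L[i][k] (i < k): times i was judged better / worse than k.
    Returns strengths rescaled to sum to the number of stimuli.
    """
    n = len(W)
    wins = [0.0] * n
    pairs = [[0.0] * n for _ in range(n)]  # n_ik, symmetric
    for i in range(n):
        for k in range(i + 1, n):
            wins[i] += W[i][k]
            wins[k] += L[i][k]
            pairs[i][k] = pairs[k][i] = W[i][k] + L[i][k]
    pi = [1.0] * n
    for _ in range(max_iter):
        new = [wins[i] / sum(pairs[i][k] / (pi[i] + pi[k])
                             for k in range(n) if k != i)
               for i in range(n)]
        total = sum(new)
        new = [v * n / total for v in new]  # fix the arbitrary scale
        converged = max(abs(a - b) for a, b in zip(new, pi)) < tol
        pi = new
        if converged:
            break
    return pi
```

Comparing the fitted probabilities π_i / (π_i + π_k) with the classic-model probabilities pair by pair is one way to see how similar the two fits are.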
Acknowledgements: We would like to thank Dr. Shaun Love at Lexmark International Inc. for helpful discussions.
References
1. Engeldrum, P. G. (2000). Psychometric Scaling: A Toolkit for Imaging Systems Development. Imcotek Press.
2. Torgerson, W. S. (1958). Theory and Methods of Scaling. John Wiley & Sons.
3. Bradley, R. A. and Terry, M. E. (1952). Rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika, 39, 324–345.
4. Hall, P. and La Scala, B. (1990). Methodology and algorithms of empirical likelihood. International Statistical Review, 58, 109–127.
R: http://cran.us.r-project.org
Updated manuscript: http://www.ms.uky.edu/~mai/research/