A Flavour of Errors in Variables Modelling Jonathan Gillard GillardJW@Cardiff.ac.uk
Constructing the Model • We have two variables, ξ and η. • ξ and η are linearly related in the form η = α + βξ. • Instead of observing the n pairs (ξ_i, η_i), we observe the n data pairs (x_i, y_i), where x_i = ξ_i + δ_i and y_i = η_i + ε_i, and it is assumed that δ_i and ε_i are independent error terms having zero mean and variances σδ² and σε² respectively.
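As a rough illustration of the model on this slide, the sketch below simulates observed pairs from it. The function name `simulate_eiv`, the uniform distribution chosen for the true values ξ_i, and the normal distribution of the errors are assumptions made only for this example; the slide itself specifies just zero means and the two variances.

```python
import random


def simulate_eiv(n, alpha, beta, sigma_delta, sigma_epsilon, seed=0):
    """Draw n observed pairs (x_i, y_i) from the errors-in-variables model."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        xi = rng.uniform(0.0, 10.0)   # true value ξ_i (distribution is an assumption)
        eta = alpha + beta * xi       # η_i = α + βξ_i
        xs.append(xi + rng.gauss(0.0, sigma_delta))     # x_i = ξ_i + δ_i
        ys.append(eta + rng.gauss(0.0, sigma_epsilon))  # y_i = η_i + ε_i
    return xs, ys


# Illustrative parameter values only; none of these come from the slides' data.
xs, ys = simulate_eiv(n=5, alpha=1.0, beta=2.0, sigma_delta=2.0, sigma_epsilon=1.0)
```

Setting both error standard deviations to zero returns points lying exactly on the line η = α + βξ, which is a quick way to check the construction.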
Down’s Syndrome • Affects 1 in 1000 children born in the UK. • Down’s is caused by the presence of an extra chromosome. An extra copy of chromosome 21 is included when the sperm and the egg combine to form the embryo. • Screening tests are used to calculate the chance of a baby having the condition.
How can we fit a line? • There are clearly errors in both variables. • “To use standard statistical techniques of estimation to estimate β, one needs additional information about the variance of the estimators” – Madansky (1959) • We know the dating error is ±2 days – this is enough information!
Method of Moments • “The method of moments has a long history, involves an enormous amount of literature, has been through periods of severe turmoil associated with its sampling properties compared to other estimation procedures, yet survives as an effective tool, easily implemented and of wide generality” – Bowman and Shenton
Method of Moments • “The maximum likelihood approach to estimation is primarily justified by asymptotic (as the sample size goes to infinity) considerations” – Cheng and Van Ness
Estimating the Parameters • As the dating error is ±2 days, we take σδ = 2, so σδ² = 4. • Use a modified ‘y on x’ regression estimator: β = s_xy / (s_xx − σδ²). • Other parameters, such as the intercept α, can be estimated from the method of moments equations.
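The estimator on this slide can be sketched in a few lines. The function name `mom_known_sigma_delta` is hypothetical, and two details are assumptions: the sample moments use divisor n (the slides do not say whether n or n − 1 is intended), and the intercept is taken from the first-moment equation ȳ = α + βx̄.

```python
def mom_known_sigma_delta(xs, ys, sigma_delta_sq):
    """Method of moments estimates (alpha, beta) when the variance of δ is known."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample variance of x and sample covariance of (x, y), divisor n (assumption).
    s_xx = sum((x - x_bar) ** 2 for x in xs) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    denom = s_xx - sigma_delta_sq
    if denom <= 0:
        # The correction is only meaningful when s_xx exceeds the error variance.
        raise ValueError("s_xx must exceed the known error variance")
    beta_hat = s_xy / denom               # modified 'y on x' slope
    alpha_hat = y_bar - beta_hat * x_bar  # from the first-moment equation
    return alpha_hat, beta_hat


# Toy data for illustration only; with a ±2 day dating error one would pass 4.0.
alpha_hat, beta_hat = mom_known_sigma_delta([0, 1, 2, 3], [1, 3, 5, 7], 0.25)
```

With `sigma_delta_sq=0` the function reduces to ordinary 'y on x' least squares. Subtracting a positive error variance shrinks the denominator, so the slope estimate moves further from zero than the least squares slope.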
Typology of Residuals • What are residuals used for? • Prediction • Model checking • Leverage • Influence • Deletion
Estimating the true points • Two naive m.m.e’s of ξ: The optimal linear combination is:
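The formulas on this slide did not survive in this copy. One plausible reconstruction, offered as an assumption rather than as the author's own expressions, takes the two naive estimators to be the observed x_i and the value obtained by inverting the line at y_i, and reads "optimal" as minimum variance among unbiased linear combinations. It treats α, β, σδ² and σε² as known and uses the independence of δ_i and ε_i.

```latex
\begin{align*}
\hat{\xi}_i^{(1)} &= x_i,
  & \operatorname{Var}\bigl(\hat{\xi}_i^{(1)}\bigr) &= \sigma_\delta^2, \\
\hat{\xi}_i^{(2)} &= \frac{y_i - \alpha}{\beta},
  & \operatorname{Var}\bigl(\hat{\xi}_i^{(2)}\bigr) &= \frac{\sigma_\varepsilon^2}{\beta^2}, \\[4pt]
\hat{\xi}_i &= w\,x_i + (1 - w)\,\frac{y_i - \alpha}{\beta},
  & w &= \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + \beta^2 \sigma_\delta^2}, \\[4pt]
\hat{\xi}_i &= \frac{\sigma_\varepsilon^2\, x_i + \beta\,\sigma_\delta^2\,(y_i - \alpha)}
                    {\sigma_\varepsilon^2 + \beta^2 \sigma_\delta^2},
  & \operatorname{Var}\bigl(\hat{\xi}_i\bigr) &=
    \frac{\sigma_\delta^2\,\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + \beta^2 \sigma_\delta^2}.
\end{align*}
```

The weights are the usual inverse-variance weights, so the combined estimator has a smaller variance than either naive estimator. In practice the unknown parameters would be replaced by their estimates.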
A residual? • Attempt to write as a usual regression model: y = α + βx + (ε − βδ) 1. x is always random due to random error 2. Cov(x, ε − βδ) = −βσδ² 3. Using ordinary least squares estimates leads to inconsistent estimators
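The three points on this slide follow from a short calculation, sketched here under the added assumptions that ξ, δ and ε are mutually uncorrelated and that ξ is random with finite variance. The symbol σ_ξ² for the variance of ξ is introduced only for this sketch.

```latex
\begin{align*}
y &= \eta + \varepsilon = \alpha + \beta\xi + \varepsilon
   = \alpha + \beta(x - \delta) + \varepsilon
   = \alpha + \beta x + (\varepsilon - \beta\delta), \\[4pt]
\operatorname{Cov}(x,\ \varepsilon - \beta\delta)
  &= \operatorname{Cov}(\xi + \delta,\ \varepsilon - \beta\delta)
   = -\beta \operatorname{Var}(\delta)
   = -\beta\sigma_\delta^2, \\[4pt]
\frac{s_{xy}}{s_{xx}}
  &\xrightarrow{\;p\;}
   \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}
   = \frac{\beta\sigma_\xi^2}{\sigma_\xi^2 + \sigma_\delta^2}
   \neq \beta \quad \text{whenever } \beta \neq 0 \text{ and } \sigma_\delta^2 > 0.
\end{align*}
```

Because the regressor x is correlated with the composite error term, the ordinary least squares slope converges to a value shrunk towards zero rather than to β.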