Toward a Characterization of Measurement Error
Sean Canavan, David Hann
Oregon State University
Recall:
• Measurement error enters into forestry in many different ways and forms.
• The errors can have very negative effects on model parameters, model estimates, and the variances of model parameters and model estimates.
• Correction techniques do exist for countering the effects of measurement errors in many situations, but they typically require knowing something about the form of the errors.
• People have generally assumed that the errors are Normal in distribution.
Study Data:
• Dbh:
  • n = 2175
  • Errors: < 0: 529, = 0: 368, > 0: 1278
  • Range: 0.8” – 72.1”
  • Species: DF, TF, PP, SP, IC
• Ht:
  • n = 1238
  • Errors: < 0: 722, = 0: 30, > 0: 486
  • Range: 8.4’ – 231.7’
  • Species: DF, TF, PP, SP, IC
The Normal Assumption:
• It is often assumed that measurement errors follow a Normal distribution (Nester 1981, Garcia 1984, Smith 1986, Päivinen & Yli-Kojola 1989, Gertner 1991, McRoberts et al. 1994, Kozak 1998, Kangas 1998, Kangas & Kangas 1999, Phillips et al. 2000, Williams & Schreuder 2000)
• Bias assumption: μ = 0
• Variance assumption:
  • homogeneous (σ^2 constant)
  • heterogeneous (σ^2 not constant)
[Figure: Normal(0,1) PDF; f(x) versus x, for x from -5 to 5]
The Normal Assumption (continued):
• What happens when there are many correct measurements?
  • Example: Dbh measured to a tenth of an inch
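One way to see why many correct measurements strain the Normal assumption is to look at the CDF of a mixture that puts a point mass at zero. The sketch below is only an illustration: the mixture form, the function names, and the value sigma = 0.2 are assumptions made here, not the authors' simulation design. It shows that the CDF jumps at zero by roughly the share of correct measurements, which a continuous Normal CDF cannot reproduce.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a Normal(mu, sigma^2) variable, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def spiked_error_cdf(x, p_correct, sigma):
    """CDF of an illustrative mixture (an assumption of this sketch):
    exact zeros with probability p_correct, Normal(0, sigma^2) errors otherwise."""
    point_mass = 1.0 if x >= 0 else 0.0
    return p_correct * point_mass + (1.0 - p_correct) * normal_cdf(x, 0.0, sigma)


def jump_at_zero(p_correct, sigma, tiny=1e-9):
    """Size of the step in the CDF at zero, approximated from just below zero."""
    return spiked_error_cdf(0.0, p_correct, sigma) - spiked_error_cdf(-tiny, p_correct, sigma)


if __name__ == "__main__":
    # sigma = 0.2 is a hypothetical value chosen only for illustration.
    for p in (0.25, 0.50, 0.75):
        print(p, round(jump_at_zero(p, sigma=0.2), 4))
```

The step height tracks p_correct almost exactly, so the larger the share of correct measurements, the further the error distribution departs from any single Normal curve.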
[Figure: four panels of cumulative probability versus measurement error value (-1.5 to 1.5), titled 25% Correct, 50% Correct, 75% Correct, and 100% Correct. Panel annotations as extracted: nsig = 115, nsig = 50, nsig = 12, nsig = 6]
Error Distribution Modeling:
• First Approach: PDF modeling
• Second Approach: CDF modeling
  • Part 1: Modeling error type probabilities
  • Part 2: Modeling the positive and negative error portions of the curve
[Figure: Normal(0,1) CDF; F(x) = P(X < x) versus x, for x from -5 to 5]
[Figure: Empirical Dbh Error CDF Surface; cumulative probability (0.25 to 1.00) over Dbh (0 to 40 inches) and error (-0.6 to 1.8 inches)]
[Figure: fitted error CDF; cumulative probability versus error size (-1.5 to 1.5), with the vertical axis partitioned into Pr(ε < 0), Pr(ε = 0), and Pr(ε > 0)]
Fitted CDF Equation:
F(ε) = P(error ≤ ε) =
  Pr(ε < 0) * Negative Error CDF, for ε < 0
  Pr(ε < 0) + Pr(ε = 0), for ε = 0
  Pr(ε < 0) + Pr(ε = 0) + Pr(ε > 0) * Positive Error CDF, for ε > 0
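The three-part CDF on this slide can be sketched in a few lines. In the sketch below, the error type probabilities are simply the pooled Dbh counts from the Study Data slide (529, 368, and 1278 out of 2175); the deck models them as functions of Dbh instead. The two conditional CDFs and the rate BETA = 5.0 are hypothetical placeholders chosen for illustration, not the fitted curves.

```python
import math

# Pooled Dbh error type shares taken from the Study Data counts.
P_NEG = 529 / 2175
P_ZERO = 368 / 2175
P_POS = 1278 / 2175

BETA = 5.0  # hypothetical rate, used only to make the sketch runnable


def neg_cdf(eps):
    """Placeholder conditional CDF of negative errors: rises to 1 as eps -> 0."""
    return math.exp(BETA * eps)


def pos_cdf(eps):
    """Placeholder conditional CDF of positive errors: rises from 0 at eps = 0."""
    return 1.0 - math.exp(-BETA * eps)


def error_cdf(eps, p_neg, p_zero, p_pos, neg, pos):
    """Piecewise CDF: scaled negative part, a step at zero, scaled positive part."""
    if eps < 0:
        return p_neg * neg(eps)
    if eps == 0:
        return p_neg + p_zero
    return p_neg + p_zero + p_pos * pos(eps)


def dbh_error_cdf(eps):
    """Convenience wrapper using the pooled shares and placeholder curves above."""
    return error_cdf(eps, P_NEG, P_ZERO, P_POS, neg_cdf, pos_cdf)


if __name__ == "__main__":
    for e in (-0.5, -0.1, 0.0, 0.1, 0.5):
        print(e, round(dbh_error_cdf(e), 4))
```

Passing the conditional CDFs in as functions keeps the piecewise structure separate from whatever curve forms are eventually fitted to each tail.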
Part 1: Error Type Probability Modeling
• Multinomial regression in S-Plus
• Fitted as a GLM with a Poisson family (log link)
• Overdispersion/quasi-likelihood
• Counts by 1-inch Dbh classes / 5-ft. & 10-ft. Ht classes
• Candidate predictors: Dbh, Dbh^(1/2), Dbh^2, Dbh^(-1); Ht, Ht^(1/2), Ht^2, Ht^(-1)
• Probability model forms:
  P_i = exp(f_i(Dbh)) / [1 + exp(f_1(Dbh)) + exp(f_2(Dbh))], i = 1, 2
  P_3 = 1 / [1 + exp(f_1(Dbh)) + exp(f_2(Dbh))]
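The probability model forms on this slide are the usual multinomial logit shares: two error types get an exponentiated predictor in the numerator and the third acts as the reference. A minimal sketch follows. The linear form of the predictors and the coefficient values in the usage lines are hypothetical, and which error type serves as the reference category is not stated in the deck.

```python
import math


def error_type_probs(f1, f2):
    """Multinomial logit shares for three categories.

    f1 and f2 are the predictor values f_1(Dbh) and f_2(Dbh); the third
    category is the reference, with an implicit predictor of zero.
    """
    denom = 1.0 + math.exp(f1) + math.exp(f2)
    return (math.exp(f1) / denom, math.exp(f2) / denom, 1.0 / denom)


def linear_predictor(dbh, b0, b1):
    """Hypothetical predictor form b0 + b1 * Dbh, for illustration only."""
    return b0 + b1 * dbh


if __name__ == "__main__":
    # Coefficients below are made up; they are not the fitted values.
    for dbh in (5.0, 20.0, 40.0):
        f1 = linear_predictor(dbh, b0=0.5, b1=-0.02)
        f2 = linear_predictor(dbh, b0=-0.2, b1=0.03)
        probs = error_type_probs(f1, f2)
        print(dbh, [round(p, 3) for p in probs], round(sum(probs), 6))
```

Because all three shares use the same denominator, they sum to one at every Dbh, which is what lets them partition the vertical axis of the fitted CDF.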
[Figure: Fits of Error Type Probabilities; probability (0% to 100%) versus Dbh (0 to 46 inches), with series P(e > 0), P(e = 0), and P(e < 0)]
Part 2: Modeling Positive and Negative CDFs
• Negative Errors:
  • CDFs by Dbh class
  • Step 1: Exponential fits
    • model form: exp(β*Error Size)
    • actually fit: 1 - exp(β*Error Size)
  • Step 2: Parameter modeling
    • β_i = f(Dbh)
  • Step 3: Combined equation fit
    • 1 - exp(f(Dbh)*Error Size)
[Figure: three panels of cumulative probability versus error size (-1 to 0 inches) for the 2.5”, 21.5”, and 45.0” Dbh classes]
[Figure: fitted exponential coefficients versus Dbh class (0 to 50), on a broken vertical axis running from 0 to 1200]
Parameter Modeling: 10.04*exp(-0.03*Dbh + 1.77*Dbh^(-1) + 0.59*Dbh^(-2))
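To make the parameter model concrete, the sketch below plugs the coefficients printed on this slide into the exponential model form exp(β*Error Size) for negative errors. Treating the printed expression as β(Dbh) and pairing it with that model form is a reading made here, not something the deck states outright. The printed coefficients are rounded, so values computed this way need not match the plotted ones.

```python
import math


def beta_from_dbh(dbh):
    """Exponential coefficient as a function of Dbh (inches), using the
    rounded coefficients printed on the parameter-modeling slide."""
    if dbh <= 0:
        raise ValueError("Dbh must be positive")
    return 10.04 * math.exp(-0.03 * dbh + 1.77 / dbh + 0.59 / dbh ** 2)


def negative_error_cdf(eps, dbh):
    """Conditional CDF of negative errors, exp(beta * eps), for eps <= 0.

    Assumption of this sketch: the model form from Step 1 is combined
    directly with the parameter model from Step 2.
    """
    if eps > 0:
        raise ValueError("defined only for eps <= 0")
    return math.exp(beta_from_dbh(dbh) * eps)


if __name__ == "__main__":
    for dbh in (2.5, 21.5, 45.0):
        print(dbh, round(beta_from_dbh(dbh), 3), round(negative_error_cdf(-0.2, dbh), 4))
```

A smaller β at large Dbh gives a flatter curve, meaning large negative errors carry more probability for big trees than for small ones under this form.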
Combined equation fit:
• Variable power on error size:
  • 1 - exp[b0*exp(b1*Dbh + b2*Dbh^(-1) + b3*Dbh^(-2))*(error size)^c1]
• Resulting CDF equation:
  • exp[10.04*exp(-0.03*Dbh + 1.77*Dbh^(-1))*(error size)^0.59]
  • adjusted R^2 = 0.8664
[Figure: Fitted Dbh Error CDF Surface; cumulative probability (0.00 to 1.00) over Dbh (8 to 40 inches) and error (-1.00 to 2.00 inches)]
Alternative Surfaces (Dbh):
• Normal 1: Unbiased, homogeneous Normal:
  • μ = 0.0, σ = 0.2237
• Normal 2: Constant bias, homogeneous Normal:
  • μ = 0.0901, σ = 0.2237
• Normal 3: Non-constant bias, homogeneous Normal:
  • μ = 0.003983*Dbh + 0.000121*Dbh^2, σ = 0.2237
• Normal 4: Unbiased, heterogeneous Normal:
  • μ = 0.0, σ = σD*exp[0.1145*Dbh]
• Normal 5: Non-constant bias, heterogeneous Normal:
  • μ = 0.003983*Dbh + 0.000121*Dbh^2, σ = σD*exp[0.1145*Dbh]
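The three homogeneous Normal alternatives can be evaluated directly from the parameters listed on this slide. The sketch below covers Normal 1 to 3 only. Normal 4 and 5 are left out because the slide does not give a value for σD. The function names are introduced here for illustration.

```python
import math

SIGMA = 0.2237  # common standard deviation for Normal 1 to 3, from the slide


def normal_cdf(x, mu, sigma):
    """CDF of a Normal(mu, sigma^2) variable, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def surface_mean(dbh, model):
    """Mean Dbh error (inches) under alternative surfaces Normal 1, 2, or 3."""
    if model == 1:
        return 0.0
    if model == 2:
        return 0.0901
    if model == 3:
        return 0.003983 * dbh + 0.000121 * dbh ** 2
    raise ValueError("only Normal 1, 2, and 3 are covered in this sketch")


def surface_cdf(eps, dbh, model):
    """Cumulative probability of an error of at most eps for a tree of size dbh."""
    return normal_cdf(eps, surface_mean(dbh, model), SIGMA)


if __name__ == "__main__":
    for model in (1, 2, 3):
        print(model, [round(surface_cdf(0.0, d, model), 4) for d in (5.0, 20.0, 40.0)])
```

Evaluating each surface at an error of zero across several Dbh values shows how the bias terms shift the curve, while none of the three can produce a step at zero.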
Conclusions:
• Case of many correct measurements
• Case of few correct measurements
• Drawing random samples
• Species differences
• Changing precision levels (number of correct measurements):
  • Dbh: 0.1” → 1.0”: 368 → 1087 out of 2175
  • Ht: 0.1’ → 1.0’: 30 → 274 out of 1238
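The effect of a coarser precision level on the count of correct measurements can be sketched as follows. This assumes that changing precision means rounding both the true and the measured value to the new unit and counting exact matches; the deck does not describe the procedure, and the four data pairs are made up for illustration.

```python
import math


def round_to(value, precision):
    """Round half up to the nearest multiple of precision, returned as an integer count of units."""
    return math.floor(value / precision + 0.5)


def count_correct(pairs, precision):
    """Number of (true, measured) pairs that agree once both are rounded to the given precision."""
    return sum(1 for true, measured in pairs
               if round_to(true, precision) == round_to(measured, precision))


if __name__ == "__main__":
    # Hypothetical (true, measured) Dbh pairs in inches.
    pairs = [(10.3, 10.3), (10.3, 10.4), (12.1, 12.4), (15.2, 15.9)]
    print(count_correct(pairs, 0.1), count_correct(pairs, 1.0))
```

Coarser units absorb small discrepancies, so the share of correct measurements, and with it the step at zero in the error CDF, grows as precision drops.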
"Sampling gets you to the final answer, if you do it often enough. Measuring everything correctly gets you to the correct answer. Don't get those mixed up."
Olde Statistical Sayings, Inventory and Cruising Newsletter, Issue No. 32, October 1995