Dive into the world of regression analysis applied in survey sampling, exploring model-assisted and model-based estimators, assumptions from regression analysis, least squares regression, geometric mean regression, height-age curves, and percentile models. Also, uncover the relevance of regression analysis in tree growth models, variance projection, mortality fractions, and fitting techniques.
WHY ARE YOU USING THAT REGRESSION? Western Mensurationist Meeting Jim Flewelling July, 2003
FOCUS • POPULATIONS • VARIANCE IN RELATIONSHIPS • OBJECTIVES • USE OF REGRESSION • TECHNIQUES are SECONDARY
TWO WORLDS • SURVEY SAMPLING • Fixed Populations • Objective refers to Population • REGRESSION ANALYSIS • Relationships between variables • Objectives refer to individuals or populations
SURVEY SAMPLING • Fixed Population. • Specified probability-sampling processes. • Estimation of population parameters • unbiased estimators.
SURVEY SAMPLING “If we are to infer from sample to population, the selection process is an integral part of the inference.” - Stuart (1984, p. 4)
REGRESSIONS IN SURVEY SAMPLING • AUXILIARY INFORMATION (X) • Known for population. • Increased precision. • MODEL-ASSISTED ESTIMATORS COMMON (Särndal et al., 1992) • MODEL-BASED ESTIMATORS
MODEL-ASSISTED SURVEY SAMPLING Ratio of Means Estimator: Asymptotically unbiased, whether or not y is proportional to x. Could be used to estimate individual y’s. No claim of unbiasedness here.
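A minimal sketch of the ratio-of-means estimator for a population total, assuming simple random sampling and a known population total of the auxiliary variable x. The function name and the toy numbers are illustrative assumptions, not taken from the talk.

```python
def ratio_of_means_total(sample_x, sample_y, population_x_total):
    """Estimate the population total of y as X_total * (mean(y) / mean(x))."""
    if len(sample_x) != len(sample_y) or not sample_x:
        raise ValueError("sample_x and sample_y must be non-empty and of equal length")
    mean_x = sum(sample_x) / len(sample_x)
    mean_y = sum(sample_y) / len(sample_y)
    ratio = mean_y / mean_x  # ratio of sample means, not mean of ratios
    return population_x_total * ratio


# Hypothetical sample: x is the auxiliary variable, y the variable of interest.
sample_x = [10.0, 20.0, 30.0]
sample_y = [22.0, 38.0, 60.0]
estimate = ratio_of_means_total(sample_x, sample_y, population_x_total=1000.0)
```

The estimator targets the population total; applying the same ratio to a single x gives a prediction for an individual y, which carries no claim of unbiasedness.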
MODEL-BASED SURVEY SAMPLING • Assumptions from Regression Analysis. • True model • E(e|x) = 0 • Errors are independent. • Random selection avoids a source of bias. • Inference from regression theory, not the distribution of samples. • Theory from Royall (1970).
REGRESSION ANALYSIS • Least Squares - Legendre (1805) and Gauss. • Sir Francis Galton (1877, 1885): Offspring of seeds “did not tend to resemble their parent seeds in size, but to be always more mediocre [i.e., more average] than they - to be smaller than the parents, if the parents were large … the mean filial regression towards mediocrity was directly proportional to the parental deviation from it.” (quoted from Draper & Smith)
LEAST SQUARES REGRESSION Var(ŷ) < Var(y)
GEOMETRIC MEAN REGRESSION Preserves Variance Discussion by Ricker (1984)
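The variance contrast between least squares and geometric mean regression can be checked numerically. The sketch below uses simulated data (an assumption, not data from the talk): the least squares slope is r·s_y/s_x, so predictions have variance r²·Var(y), while the geometric mean regression slope is ±s_y/s_x, so predictions reproduce Var(y).

```python
import math
import random
import statistics

random.seed(0)
# Simulated (hypothetical) data with a noisy linear relationship.
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
y = [2.0 * xi + random.gauss(0.0, 2.0) for xi in x]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (len(x) - 1)
r = cov_xy / (sd_x * sd_y)  # sample correlation

slope_ls = r * sd_y / sd_x                  # least squares, y on x
slope_gmr = math.copysign(sd_y / sd_x, r)   # geometric mean regression


def predict(slope):
    """Predictions from a line through the point of means with the given slope."""
    return [mean_y + slope * (xi - mean_x) for xi in x]


var_y = statistics.variance(y)
var_ls = statistics.variance(predict(slope_ls))    # equals r**2 * var_y
var_gmr = statistics.variance(predict(slope_gmr))  # equals var_y
```

The GMR slope is the geometric mean of the y-on-x slope and the reciprocal of the x-on-y slope, which is where the name comes from.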
HEIGHT-AGE CURVES • Site Curves (Curtis) • Site Index Prediction Functions • Geometric Mean Regression • Stochastic Differential Equation • Height Growth Models • Percentile Models
Site Curves and SI Prediction Functions • Curtis et al. (1974) • Site Curve - Yield table construction • H = f(A, SI). • SI Prediction Function - Site Classification • SI = f(A, H).
SITE CURVES, SI PREDICTION, and GMR • SI = H (index age) • HA = H (age A) • 3 Lines: All at mean (HA, SI) • Slope = (σSI/σHA) × {ρ, 1, 1/ρ} • Straight-line assumption valid for bivariate normal.
Stochastic Differential Equation (Garcia, 1979) • dH/dt = (b/c)H{(a/H)^c - 1} • b is plot-specific, (a, c) are global. • Integrates to Chapman-Richards. • Add Wiener process error to growth. • Add measurement errors at intervals. • Fit with Maximum Likelihood. • It’s a growth model; also a base-age invariant site curve.
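A sketch of the deterministic part only, checking that dH/dt = (b/c)H{(a/H)^c - 1} integrates to a Chapman-Richards curve. Substituting y = H^c gives dy/dt = b(a^c - y), so H(t) = [a^c - (a^c - H0^c)·exp(-b·Δt)]^(1/c). The parameter values and function names are hypothetical, and the Wiener process and measurement errors are left out.

```python
import math


def growth_rate(h, a, b, c):
    """Deterministic height growth rate dH/dt = (b/c) * H * ((a/H)**c - 1)."""
    return (b / c) * h * ((a / h) ** c - 1.0)


def chapman_richards(h0, dt, a, b, c):
    """Closed-form height dt years after height h0 (Chapman-Richards form)."""
    return (a ** c - (a ** c - h0 ** c) * math.exp(-b * dt)) ** (1.0 / c)


def integrate_rk4(h0, dt, a, b, c, steps=1000):
    """Integrate the growth equation numerically with classical Runge-Kutta."""
    h = h0
    step = dt / steps
    for _ in range(steps):
        k1 = growth_rate(h, a, b, c)
        k2 = growth_rate(h + 0.5 * step * k1, a, b, c)
        k3 = growth_rate(h + 0.5 * step * k2, a, b, c)
        k4 = growth_rate(h + step * k3, a, b, c)
        h += step * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return h


# Hypothetical parameters: asymptote a, plot-specific rate b, global shape c.
a, b, c = 60.0, 0.03, 0.6
h0 = 5.0
numeric = integrate_rk4(h0, 40.0, a, b, c)
closed = chapman_richards(h0, 40.0, a, b, c)
```

Because the closed form depends only on the starting height and the elapsed time, any (age, height) pair can serve as the starting point, which is the sense in which the curve is base-age invariant.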
Height Growth Model • Family of H-A curves. • From any one age, predict height difference to next or previous age. • Parameters adjusted to minimize errors in predicted growth (Bonnor et al., 1995; Flewelling et al., 2001). • Crude: ignores measurement errors and correlations between periods. Flexible model form. • It’s a growth model - attempts to model H-A trajectories of plots. Base-age invariant.
Percentile Models • Concept by Pienaar and Clutter (Clutter et al., 1983). • Example by Bi et al. (2002). • Extends to irregular data (Flewelling, 1982, unpublished). • Current econometrics theory, rich history.
Percentile Models • Pienaar and Clutter: Percentiles as a labeling device: “useful in illustrating the fact that index age is not a fundamental or required concept in the use of site index to express site quality.”
Percentile Models, Example • Bi et al. (2002) • Temporary plots (age and site assumed orthogonal). • H(t) assumed to have normal distribution. • Q0.75 and Q0.25, fit as functions of t. • Methodology from Koenker and Bassett (1978). • Mean H(t) fit with weighted regression.
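A minimal illustration of the Koenker and Bassett idea, restricted to a single age (an intercept-only model): the value that minimizes the summed check loss at level tau is the tau-th sample quantile, and tau = 0.5 reduces to minimizing summed absolute errors. The heights are hypothetical, and fitting Q0.25 and Q0.75 as functions of t would replace the constant with a curve and use a proper optimizer.

```python
def check_loss(residual, tau):
    """Koenker-Bassett check loss: tau*u for u >= 0, (tau - 1)*u for u < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def constant_quantile_fit(values, tau):
    """Return the data value minimizing the summed check loss.

    The loss is convex and piecewise linear in the fitted constant, so a
    minimizer always occurs at one of the observed values.
    """
    return min(values, key=lambda q: sum(check_loss(v - q, tau) for v in values))


# Hypothetical heights of plots observed at one common age.
heights = [12.1, 14.3, 15.0, 15.8, 16.4, 17.9, 18.2, 19.5, 21.0]
q25 = constant_quantile_fit(heights, 0.25)
q50 = constant_quantile_fit(heights, 0.50)
q75 = constant_quantile_fit(heights, 0.75)
```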
Percentile Model, Irregular Data • Sectioned tree data, height every year. • Younger ages: full data set. • Older ages: reduced data set. • Establish tree percentiles at young age. • Reassign censored percentiles at older ages. • Compute (and model) means and standard deviations from heights and percentiles.
Percentile models, econometrics • Koenker (2000): • Wonderful discussion of least squares, alternative methods, and statistical history. • Minimization of summed absolute errors dates from the 1760s.
Height-Age Curves. Questions • Should height growth models be the same as constant percentile curves? • Are regressions from one age to another wanted? • Is there any use for an index age other than as a label?
POPULATIONS WHICH PROJECTION IS WANTED?
TREE GROWTH MODELS • DBH • Mortality fractions. • What ensures that the variance of projected stand table is correct? • Need variance models as constraints? • Different fitting techniques? • Good luck and occasional checking?
RIGHT INDEPENDENT VARIABLES? Regional H-DBH Curves. Biased by Age or position in stand. Alternative: local curves, another variable.
Bayesian Regression • Neglected in Forestry? • Empirical Bayes used in volume equations (Green and Strawderman, 1985). • Taper and volume equations by forest district (McTague, Stansfield and Lan, 1992). • Other opportunities?
Bayesian Opportunity • Fit y = a0 + a1x1 + a2x2 + a3x3 + ….. • Often by species or other category. • Coefficients tested and omitted if non-significant. • Or, selected coefficients fit in common for all species. • Bayesian regression or other methods better?
OTHER REGRESSION TECHNIQUES • ML with better error characterization. • Mixed models. • Systems: Seemingly unrelated regression, 2SLS, 3SLS …….. • Generally are more efficient, better estimates of parameter variance, possibly avoid some biases. Necessary? • Imputation?
SUMMARY • What does population look like? • What should be described? • What techniques allow that?
REFERENCES • Bi, H., A.D. Kozek and I.S. Ferguson. 2002. Quantile-based site index curves: a brief introductory note. Proc. of IUFRO Symposium on Statistics and Technology in Forestry, Sept 8-12, 2002, Blacksburg. [May be a related 2003 paper in J. of Agr., Biological, and Environmental Statistics.] • Bonnor, G.M., R.J. DeJong, P. Boudewyn and J. Flewelling. 1995. A guide to the STIM growth model. Nat. Res. Canada. Info Rpt X-353. • Clutter, J.L., J.C. Fortson, L.V. Pienaar, G.H. Brister and R.L. Bailey. 1983. Timber management: a quantitative approach. Krieger Publ., Malabar, FL. 333 p. • Curtis, R.O., D.J. DeMars and F.R. Herman. 1974. Which dependent variable in site index-height-age regressions? For. Sci. 20: 74-87. • Draper, N.R. and H. Smith. 1998. Applied Regression Analysis. Wiley, New York. 706 p. • Flewelling, J. 1982. Dominant height trends for plantations of loblolly pine at the Mississippi/Alabama region of Weyerhaeuser Company. Research Rpt 050-3415/3. Weyerhaeuser Forestry Research, Hot Springs. (unpublished) • Flewelling, J., R. Collier, B. Gonyea, D. Marshall and E. Turnblom. 2001. Height-age curves for planted stands of Douglas fir, with adjustments for density. SMC Working Paper No. 1, Univ. of WA, Seattle.
REFERENCES • Garcia, O. 1979. A stochastic differential equation model for height growth of forest stands. Biometrics 39: 1059-1072. • Green, E. and W.E. Strawderman. 1985. The use of Bayes/empirical Bayes estimation in individual tree volume equation development. For. Sci. 31: 975-990. • Koenker, R. 2000. Galton, Edgeworth, Frisch, and prospects for quantile regression in econometrics. J. of Econometrics 95: 347-374. • Koenker, R.W. and G.W. Bassett. 1978. Regression quantiles. Econometrica 46: 33-50. • McTague, J.P., W.F. Stansfield and Z. Lan. 1992. Southwestern ponderosa pine, Douglas fir and white fir volume and taper functions. Report to USFS. Northern Arizona University. • Ricker, W.E. 1984. Computation and uses of central trend lines. Can. J. Zool. 62: 1897-1905. • Royall, R.M. 1970. On finite population sampling theory under certain linear regression models. Biometrika 57: 377-387. • Särndal, C., B. Swensson and J. Wretman. 1992. Model assisted survey sampling. Springer-Verlag, New York. 694 p. • Stuart, A. 1984. The ideas of sampling. Macmillan, New York. 91 p.
COMMENTS? QUESTIONS?