TRIM Workshop • Arco van Strien • Wildlife statistics • Statistics Netherlands (CBS)
What is TRIM? • TRends and Indices for Monitoring data • Computer program for the analysis of time series of count data with missing observations • Loglinear, Poisson regression (GLM) • Made for the production of wildlife statistics by Statistics Netherlands (Jeroen Pannekoek / freeware / version 3.0) Introduction
Why TRIM? • To get better indices? No, GLM in statistical packages (Splus, Genstat...) may produce similar results • But statistical packages are often impractical for large datasets • TRIM is easier to use Introduction
The program of this workshop • Aim: a basic understanding of TRIM • basic theory of imputation • how to use TRIM to impute missing counts and to assess indices etc. • basic theory of weighting procedure to cope with unequal sampling of areas & how to use TRIM to weight particular sites Introduction
INDEX: the total (= sum of all sites) for a year divided by the total of the base year Introduction
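The index definition above can be sketched in a few lines of Python. This is only an illustration, not TRIM code: the function name `indices`, the example counts and the scaling to 100 for the base year are assumptions made here.

```python
def indices(counts, base_year=0):
    """Yearly index = yearly total over all sites / total of the base year.

    counts: one list per site, one count per year, no missing values.
    The result is scaled so that the base year equals 100 (an assumption).
    """
    n_years = len(counts[0])
    totals = [sum(site[y] for site in counts) for y in range(n_years)]
    return [100.0 * t / totals[base_year] for t in totals]


# Hypothetical data: two sites, two years, both sites double in year 2.
counts = [[1, 2],
          [3, 6]]
print(indices(counts))  # [100.0, 200.0]
```

With complete data the index needs nothing more than the yearly totals; the difficulty only starts when some counts are missing.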
Missing values affect indices Theory imputation
How to impute missing values? 2 6 200 ESTIMATION OF SITE 2 IN YEAR 2? SITE 1 SUGGESTS: TWICE THE NUMBER OF YEAR 1 (site & year effect taken into account) Theory imputation
Another example.. 6 8 200 ESTIMATION OF SITE 2 IN YEAR 2? SITE 1 SUGGESTS: TWICE THE NUMBER OF YEAR 1 Theory imputation
And another example ... 9 12 300 ESTIMATION OF SITE 2 IN YEAR 2? SITE 1 SUGGESTS: THREE TIMES AS MANY AS IN YEAR 1 Theory imputation
Try this one….. THERE IS NOT A SINGLE SOLUTION (TRIM will prompt an ERROR) Theory imputation
Difficult to guess missings here.. Theory imputation
Estimating missing values by an iterative procedure (REQUIRED IN CASE OF MORE THAN A FEW MISSING VALUES) Theory imputation
First estimate of site 2, year 2: site total × year total / grand total = 1 × 4/7 = 0.6. Fill in 0.6, so the margin totals become 1.6 (site 2), 4.6 (year 2) and 7.6 (grand total). RECALCULATE THE MARGIN TOTALS AND REPEAT ESTIMATION OF MISSING Theory imputation
2nd estimate of site 2, year 2: 1.6 × 4.6/7.6 = 0.96 REPEAT AGAIN: MISSING VALUE = 1.22, 1.40, 1.54 ETC. … converging to 2 Theory imputation
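The iterative procedure from the last two slides can be sketched as follows. This is a minimal illustration for a single missing cell, not the (much faster) algorithm TRIM itself uses; the function name and the stopping rule are assumptions made here.

```python
def impute_single_missing(row_total, col_total, grand_total,
                          tol=1e-9, max_iter=10_000):
    """Iteratively estimate one missing cell from the margin totals.

    row_total, col_total and grand_total are computed over the observed
    cells only (the missing cell is excluded). In each round the current
    estimate is added to all three totals and the cell is re-estimated
    as site total * year total / grand total.
    """
    estimate = 0.0
    for _ in range(max_iter):
        new = ((row_total + estimate) * (col_total + estimate)
               / (grand_total + estimate))
        if abs(new - estimate) < tol:
            return new
        estimate = new
    return estimate


# Margin totals from the slides: site 2 total = 1, year 2 total = 4, grand total = 7.
value = impute_single_missing(row_total=1, col_total=4, grand_total=7)
print(round(value, 4))
```

The intermediate values differ slightly from those on the slides, which round after each step, but the procedure converges to the same limit of 2.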
To get proper indices, it is necessary to estimate (impute) missings • Missings may be estimated from the margin totals using an iterative procedure (taking into account both site and year effects) (Note: TRIM uses a much faster algorithm to impute missing values). • Assumption: year-to-year changes are similar for all sites (assumption will be relaxed later!) • Test this assumption using a goodness-of-fit (χ²) test Theory imputation
χ²: COMPARE EXPECTED COUNTS WITH REAL COUNTS PER CELL. Expected values: (1.8) (4.2) (1.2) (2.8). χ² IS THE SUMMATION OF (COUNTED - EXPECTED VALUE)² / EXPECTED VALUE: (2 - 1.8)² / 1.8 + (4 - 4.2)² / 4.2 ETC. >> χ² = 0.08 WITH A P-VALUE OF 0.78 >> MODEL NOT REJECTED (FITS, but note: cell values in this example are too small for a proper χ² test) Theory imputation
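The goodness-of-fit calculation on this slide can be reproduced with a short sketch. The observed counts 1 and 3 for the second site are not printed on the slide; they are inferred here from the expected values (1.8, 4.2, 1.2, 2.8), which fix the margin totals. The p-value uses the closed form for 1 degree of freedom, which applies to this 2 × 2 example only.

```python
import math

# Observed counts: rows = sites, columns = years.
# The second row (1, 3) is inferred from the expected values on the slide.
observed = [[2, 4],
            [1, 3]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count per cell = site total * year total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (counted - expected)^2 / expected over all cells.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# Upper-tail p-value of a chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(round(chi2, 2), round(p_value, 2))  # 0.08 0.78
```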
Imputation without covariate (χ² = 18 and p-value = 0.18) Theory imputation
Using a covariate: better imputations & indices, χ² = 1.7, p = 0.99 Theory imputation
What is the best model? Model 1: rejected • Model 2: not rejected • Model 3: not rejected • Both models 2 and 3 are valid Theory imputation
Summary imputation theory • To get proper indices, it is necessary to impute missings • Assumption: year-to-year changes are similar for all sites of the same covariate category • Test assumption using a GOF test; if p-value < 0.05, try better covariates • If these cannot be found, the resulting indices may be of low quality (and standard errors high). See also the FAQs! Theory imputation
The program of this workshop • Aim: a basic understanding of TRIM • basic theory of imputation • how to use TRIM to impute missing counts and to assess indices etc. • basic theory of weighting procedure to cope with unequal sampling of areas & how to use TRIM to weight particular sites Using TRIM
Using TRIM • several statistical models (time effects, linear model) • statistical complications (overdispersion, serial correlation) taken into account • Wald tests to test for significance • model versus imputed indices • interpretation of slope Using TRIM
Time effects model (skylark data) without covariate Using TRIM
Time effects model with covariate • 0 = total, 1 = dunes, 2 = heathland Using TRIM
Linear trend model (uses trend estimate to impute missing values) Using TRIM
Linear trend model with changepoints at year 2 and 3 Using TRIM
Linear trend model with all changepoints = time effects model • Use linear trend model when: • data are too sparse for the time effects model • one is interested in testing trends, e.g. trends before and after a particular year (or let TRIM stepwise search for relevant changepoints) • But be careful with simple linear models! Using TRIM
Statistical complications: • Serial correlation: dependence of counts on those of earlier years (0 = no correlation) • Overdispersion: deviation from the Poisson distribution (1 = Poisson) • Run TRIM with overdispersion = on and serial correlation = on; otherwise standard errors and statistical tests are usually invalid Using TRIM
Running TRIM features • trim command file • output: GOF (as χ²) test and Wald tests • output (fitted values, indices) • indices, time totals • overall trend slope • Frequently Asked Questions • different models (linear trend model, changepoints, covariate) Using TRIM
What is the best model? Both models 2 and 3 are valid. Model 3 is the most parsimonious model. Using TRIM
Model choice • The indices depend on the statistical model! • TRIM allows you to search for the best model using the GOF test, Akaike's Information Criterion and Wald tests • In case of substantial overdispersion, one has to rely on the Wald tests Using TRIM
Wald tests • Different Wald-tests to test for the significance of: • the trend slope parameters • changes in the slope • deviations from a linear trend • the effect of each covariate Using TRIM
TRIM generates both model indices and imputed indices Using TRIM
Imputed vs model indices • Imputed indices: summation of real counts plus - for missing counts - model predictions. Closer to real counts (more realistic course in time) • Model indices: summation of model predictions of all sites. Often more stable Usually Model and Imputed Indices hardly differ! Using TRIM
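The difference between the two kinds of totals can be illustrated with a small sketch. The counts and the model predictions below are hypothetical and do not come from TRIM; `None` marks a missing count.

```python
def yearly_totals(observed, predicted):
    """Return (imputed totals, model totals) per year.

    observed:  one list per site, with None for missing counts
    predicted: model predictions of the same shape (hypothetical here)
    """
    n_years = len(observed[0])
    imputed, model = [], []
    for y in range(n_years):
        # Imputed total: real counts, with predictions only where counts are missing.
        imputed.append(sum(o[y] if o[y] is not None else p[y]
                           for o, p in zip(observed, predicted)))
        # Model total: predictions for every site, observed or not.
        model.append(sum(p[y] for p in predicted))
    return imputed, model


observed = [[2, 5],
            [1, None]]          # site 2 was not counted in year 2
predicted = [[2.1, 4.9],
             [1.0, 2.3]]        # hypothetical model predictions

imputed_totals, model_totals = yearly_totals(observed, predicted)
print(imputed_totals, model_totals)
```

Dividing either series by its base-year total gives the corresponding index; the two differ only as far as the predictions deviate from the counts that were actually made.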
TRIM computes both additive and multiplicative slopes • Additive slope: 0.0485 (s.e. 0.0124) • Multiplicative slope: 1.0497 (s.e. 0.0130) • Relation: ln(1.0497) = 0.0485 • Multiplicative parameters are easier to understand Using TRIM
Interpretation multiplicative slope • Slope of 1.05 means 5% increase a year • Standard error of 0.013 means an approximate 95% confidence interval of 1.05 ± 2 × 0.013 = 1.05 ± 0.026 • Thus, slope between 1.024 and 1.076, or a 2% to 8% increase a year = significantly different from 1 Using TRIM
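The relation between the two slopes and the confidence interval can be checked with a few lines. The conversion of the standard error (multiplicative s.e. ≈ multiplicative slope × additive s.e.) is an assumption made here; it reproduces the 0.0130 on the slide, but the slides do not state how TRIM derives it.

```python
import math

additive_slope = 0.0485
additive_se = 0.0124

# The multiplicative slope is the exponent of the additive slope.
multiplicative_slope = math.exp(additive_slope)

# Assumed approximation for the standard error on the multiplicative scale.
multiplicative_se = multiplicative_slope * additive_se

# Approximate 95% confidence interval: slope +/- 2 standard errors.
lower = multiplicative_slope - 2 * multiplicative_se
upper = multiplicative_slope + 2 * multiplicative_se

print(round(multiplicative_slope, 4), round(multiplicative_se, 4))
print(round(lower, 3), round(upper, 3))

# The trend differs significantly from "no change" if 1 lies outside the interval.
print("significant:", not (lower <= 1.0 <= upper))
```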
Summary use of TRIM: • choice between time effects and linear trend model • include overdispersion & serial correlation in models • use GOF and Wald tests for better models and indices & to test hypotheses • choice between model and imputed indices • use multiplicative slope Using TRIM
The program of this workshop • Aim: a basic understanding of TRIM • basic theory of imputation • how to use TRIM to impute missing counts and to assess indices etc. • basic theory of weighting procedure to cope with unequal sampling of areas & how to use TRIM to weight particular sites Weighting
Unequal sampling due to • stratified random site selection, with oversampling of particular strata. Weighting results in unbiased national indices • site selection by the free choice of observers, with oversampling of particular regions & attractive habitat types. Weighting reduces the bias of indices. Weighting
To cope with unequal sampling: • stratify the data, e.g. into regions and habitat types • strata should be expected to have different indices & trends • weight strata according to (1) the number of sample sites in the stratum and (2) the area of the stratum • or weight by population size per stratum Weighting
Weighting factor for each stratum or 10 or 5 Weighting factor for stratum i = total area of i / area of i sampled Weighting
Another example .. 100/5 = 20 (or 4) 50/10 = 5 (or 1) Weighting factor for stratum i = total area of i / area of i sampled Weighting
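The weighting factor formula and the numbers in this example can be reproduced as follows. The stratum names "A" and "B" and the function name are hypothetical; the relative weights are obtained by dividing by the smallest factor.

```python
def weighting_factors(total_area, sampled_area):
    """Weighting factor per stratum = total area / area sampled.

    Also returns relative weights, scaled so the smallest factor equals 1.
    """
    raw = {s: total_area[s] / sampled_area[s] for s in total_area}
    smallest = min(raw.values())
    relative = {s: w / smallest for s, w in raw.items()}
    return raw, relative


# Areas from the slide: 100 with 5 sampled, and 50 with 10 sampled.
raw, relative = weighting_factors(total_area={"A": 100, "B": 50},
                                  sampled_area={"A": 5, "B": 10})
print(raw)       # {'A': 20.0, 'B': 5.0}
print(relative)  # {'A': 4.0, 'B': 1.0}
```

Only the ratio between the factors matters for the index, which is why 20 and 5 can be replaced by 4 and 1.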
Weighting in TRIM • include weight factor (different per stratum) in data file for each site and year record • weight strata and combine the results to produce a weighted total (= run TRIM with weighting = on and covariate = on) Weighting
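How a weight factor changes the combined total can be sketched as below. This is not TRIM's own procedure, only the arithmetic of a weighted sum; the site names, counts and weights are hypothetical, and the data are assumed complete (in TRIM, missing counts would be imputed first).

```python
def weighted_index(counts, weights, base_year=0):
    """Weighted yearly totals and the index derived from them.

    counts:  site name -> list of counts per year (no missing values)
    weights: site name -> weight factor of the stratum the site belongs to
    """
    n_years = len(next(iter(counts.values())))
    totals = [sum(weights[s] * counts[s][y] for s in counts)
              for y in range(n_years)]
    index = [100.0 * t / totals[base_year] for t in totals]
    return totals, index


# Hypothetical data: the dune site declines, the heathland site increases.
counts = {"dune_site": [4, 2], "heath_site": [10, 20]}

unweighted_totals, unweighted_idx = weighted_index(
    counts, {"dune_site": 1, "heath_site": 1})
weighted_totals, weighted_idx = weighted_index(
    counts, {"dune_site": 10, "heath_site": 1})

print(unweighted_totals, weighted_totals)  # [14, 22] [50, 40]
print(weighted_idx)                        # [100.0, 80.0]
```

In this made-up case the unweighted total increases while the weighted total decreases, because the undersampled dune stratum now counts ten times as heavily.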
Indices for Skylark unweighted • (0 = total index, 1 = dunes, 2 = heathland) Weighting
Indices for Skylark with weight factor for each dune site = 10 • (0 = total index, 1 = dunes, 2 = heathland) Weighting
Final remarks • To facilitate the calculation of many indices on a routine basis: • TRIM in batch mode, using TRIM Command Language (see manual) • Option to incorporate TRIM in your own automation system (e.g. Access or Delphi) (not in manual)
That’s all, but: • if you have any questions about TRIM, see the manual, the FAQs in TRIM or mail Arco van Strien asin@cbs.nl • Good luck!