Environmental statistics
Assoc. Prof. Dr. Aleksandar Markoski
Technical Faculty – Bitola, 2008
Enviromatics 2008 - Environmental statistics

Introduction
• Statistical analysis of environmental data is an important way to extract information on past and current states of ecosystems. The estimates are known as sample statistics and form a basis for forecasts of how environmental systems will develop.
• Topics of statistical analysis of environmental data are:
1. Data analysis for the requirements of environmental administrations and associations (descriptive statistics, frequency distributions, averages, variances, error corrections, significance tests),
2. Data analysis for the requirements of different users such as companies, farmers and tourists (explanatory statistics, multivariate statistics, time series analysis),
3. Basic research (regression and correlation analysis, multivariate statistics, advanced statistical techniques).

Environmental data
• Environmental data are obtained from field samples and/or laboratory analysis.
• They are observed directly (direct observations) or indirectly (through calibrated analytical instruments and sensors).
• Summary data are derived from statistics or from a restricted set of observable indicators.
• Simulated data are obtained from simulation models.
• Measurement errors and outliers have to be removed from data sets. They are not taken into account in data processing.

Probability distributions of environmental data
• Environmental data series represent the time- and space-varying behaviour of environmental processes.
• Some indicators show long-wave cycles overlaid by short-term variations.
• Other indicators exhibit stochastic fluctuations.
• Some indicators show a unique behaviour with a few peak events.

Statistical measures
• Statistical measures of environmental data are represented by
  • averages,
  • variances and
  • measures of correlation.

Averages
1. Arithmetic mean: $x^* = \frac{1}{n}\sum_{i=1}^{n} x_i$
2. Empirical median: $\tilde{x}$
3. Empirical mode: $M$
4. Geometric mean: $x^{\circ}$
5. Weighted arithmetic mean: $x^*_g$
6. Weighted geometric mean: $\lg x^{\circ}$

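The averages listed on this slide can be sketched in a few lines of Python. This is only an illustration: the helper names, the weights and the sample values are hypothetical and not part of the original slides. The geometric means are computed through logarithms (natural logarithms here; the result does not depend on the base), which assumes all values are positive.

```python
import math
import statistics

def geometric_mean(values):
    """n-th root of the product, computed via logarithms (values must be > 0)."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

def weighted_arithmetic_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def weighted_geometric_mean(values, weights):
    """exp of the weighted mean of the logarithms (values must be > 0)."""
    log_sum = sum(w * math.log(x) for x, w in zip(values, weights))
    return math.exp(log_sum / sum(weights))

# Hypothetical measurements and weights (illustrative numbers only)
sample = [2.0, 4.0, 4.0, 8.0]
weights = [1.0, 2.0, 2.0, 1.0]

print("arithmetic mean:", statistics.mean(sample))
print("median:", statistics.median(sample))
print("mode:", statistics.mode(sample))
print("geometric mean:", geometric_mean(sample))
print("weighted arithmetic mean:", weighted_arithmetic_mean(sample, weights))
print("weighted geometric mean:", weighted_geometric_mean(sample, weights))
```

With equal weights the weighted means reduce to the unweighted ones, which is a quick sanity check for the helpers.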
Variances
1. Range: $R = x_{\max} - x_{\min}$
2. Empirical variance: $s^2$
3. Empirical standard deviation: $s = \sqrt{s^2}$
4. Empirical coefficient of variation: $v = \frac{s}{x^*} \cdot 100$ (%)

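A minimal sketch of these measures of variation, using Python's standard `statistics` module. The slide does not say whether the empirical variance divides by n or by n - 1; the sketch assumes the n - 1 form, which is what `statistics.variance` and `statistics.stdev` compute. The function names and data are hypothetical.

```python
import statistics

def value_range(values):
    """Range R: largest value minus smallest value."""
    return max(values) - min(values)

def coefficient_of_variation(values):
    """v = s / mean * 100 (%), with s the (n - 1) standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0

# Hypothetical measurements (illustrative numbers only)
sample = [2.0, 4.0, 4.0, 8.0]

print("range:", value_range(sample))
print("variance:", statistics.variance(sample))
print("standard deviation:", statistics.stdev(sample))
print("coefficient of variation (%):", coefficient_of_variation(sample))
```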
Coefficients of correlation
1. Bivariate correlation coefficient $r$
2. Performance index (coefficient of determination): $B = r^2$
3. Multiple correlation coefficient
4. Multiple performance index
5. Spearman's rank correlation (suitable for small sample sizes; a normal probability distribution is not required)

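The bivariate coefficient, the performance index and Spearman's rank correlation can be sketched as follows. The slides give no formulas for these, so the sketch assumes the usual Pearson product-moment definition of r and computes Spearman's coefficient as the Pearson correlation of the ranks, with tied values sharing their average rank. All names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Bivariate (Pearson) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_r(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical paired observations: monotone but not linear
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]

r = pearson_r(x, y)
print("r:", r)
print("B = r^2:", r ** 2)
print("Spearman:", spearman_r(x, y))
```

Because y increases strictly with x, the rank correlation reaches 1 even though the linear coefficient r stays below 1.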
Statistical tests
• In sample statistics the characteristics of interest are often expressed in terms of parameters such as the mean $\mu$ or the variance $\sigma^2$. Other questions arise from comparing two or more samples; they may be expressed as differences of averages.
• A statistical hypothesis is a statement about the distribution of a random ecological variable.
• Hypothesis testing consists of comparing statistical measures called test criteria (or test statistics), computed from the data sample, with the values these criteria take under the assumption that a given hypothesis is correct.

Hypothesis testing
• In hypothesis testing one examines a null hypothesis $H_0$ against one or more alternative hypotheses $H_1, H_2, \dots, H_n$, which are stated explicitly or implicitly.
• To reach a decision about the hypothesis, a significance level $\alpha$ is chosen in advance (commonly 0.05, 0.01 or 0.001). The confidence coefficient $\varepsilon$ is given by $\varepsilon = 1 - \alpha$.
• A test criterion (or test statistic) is set up. If this statistic falls into the region of acceptance, the null hypothesis cannot be rejected.
• If the statistic falls into the region of rejection, the null hypothesis is rejected. When $H_0$ is true, the probability that the test statistic falls into the region of rejection equals $\alpha$, and the probability that it falls into the region of acceptance equals $\varepsilon$. Both are usually expressed as percentages.

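The link between the significance level α and how often a true null hypothesis gets rejected can be checked with a small simulation. The sketch below is not from the slides: it swaps in a two-sided z-test with known standard deviation (so that the critical value is available from the standard library), and the function name, sample size, trial count and seed are arbitrary assumptions.

```python
import math
import random
import statistics

def rejection_rate(alpha=0.05, n=10, trials=5000, seed=1):
    """Fraction of simulated samples for which a true H0 is rejected."""
    rng = random.Random(seed)
    mu0, sigma = 0.0, 1.0  # H0 is true: the data really have mean mu0
    # two-sided critical value of the standard normal distribution
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    rejected = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        z = abs(mean - mu0) / sigma * math.sqrt(n)
        if z > z_crit:
            rejected += 1
    return rejected / trials

print("share of rejections under a true H0:", rejection_rate())
```

The printed share should lie close to the chosen α, up to simulation noise.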
Procedure for hypothesis testing
1. Formulate the null hypothesis $H_0$ and an alternative hypothesis $H_1$.
2. Select the significance level $\alpha$.
3. Choose the test statistic.
4. Determine the region of rejection of the test statistic from its probability distribution and the significance level.
5. Calculate the test statistic from the data set.
6. Reject the null hypothesis in favour of the alternative hypothesis if the value of the test statistic falls into the region of rejection. Otherwise the null hypothesis is not rejected.

Example
• From sampled data an average $m$ was calculated and is now compared with an expected value $K$ (a fixed number).
• The null hypothesis $H_0: m = K$ is tested against the alternative hypothesis $H_1: m \neq K$. The significance level $\alpha = 0.05$ is selected and the test statistic is chosen: $t = \frac{|m - K|}{s} \cdot \sqrt{n}$.
• If the test statistic falls into the region of acceptance of the null hypothesis, that is, $t < t_{1-\alpha/2}$ (equivalently, the signed statistic $\frac{m - K}{s} \cdot \sqrt{n}$ lies between $t_{\alpha/2}$ and $t_{1-\alpha/2}$), $H_0$ cannot be rejected.
• The power of the test depends on the sample size $n$. The larger the sample (the more information is available), the greater the power of the test.

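A worked instance of this example with made-up numbers may help. The values of m, K, s and n below are hypothetical and chosen only to keep the arithmetic simple; the critical value is the tabulated two-sided 5 % point of Student's t distribution with n - 1 = 8 degrees of freedom.

```latex
% Hypothetical numbers, chosen only to illustrate the arithmetic:
% m = 10.4, K = 10.0, s = 0.5, n = 9, \alpha = 0.05
\begin{align*}
t &= \frac{|m - K|}{s}\,\sqrt{n}
   = \frac{|10.4 - 10.0|}{0.5}\cdot\sqrt{9}
   = 0.8 \cdot 3 = 2.4, \\
t_{1-\alpha/2}(n-1) &= t_{0.975}(8) = 2.306, \\
t &= 2.4 > 2.306 \;\Rightarrow\; H_0 \text{ is rejected at } \alpha = 0.05 .
\end{align*}
```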
t-test (Student's t-test)
• The test statistic is $t_{calc} = \frac{|x^* - \mu_0|}{s} \cdot \sqrt{n}$, where
  • $x^*$ is the sample mean,
  • $\mu_0$ is the expected value of the population,
  • $s$ is the standard deviation,
  • $n$ is the sample size.
• Decision: $H_0$ is retained if $t_{calc} < t_{tab}$; otherwise it is rejected.

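The one-sample t-test above can be sketched as follows. The function names and data are hypothetical. The critical value t_tab is passed in by the caller because it comes from a t table; the value used in the example is the tabulated two-sided 5 % point for n - 1 = 4 degrees of freedom.

```python
import math
import statistics

def one_sample_t(values, mu0):
    """t_calc = |mean - mu0| / s * sqrt(n), with s the (n - 1) standard deviation."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return abs(mean - mu0) / s * math.sqrt(n)

def retain_h0(calc, tab):
    """Decision rule from the slide: retain H0 if the calculated value is below the table value."""
    return calc < tab

# Hypothetical measurements compared with an expected value of 10.0
sample = [9.8, 10.2, 10.4, 10.0, 10.6]
t_calc = one_sample_t(sample, mu0=10.0)
t_tab = 2.776  # t table, alpha = 0.05 (two-sided), 4 degrees of freedom

print("t_calc:", t_calc)
print("H0 retained:", retain_h0(t_calc, t_tab))
```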
Comparison of means (t-test)
• The test statistic is $t = \frac{|x^* - x^{**}|}{s_d} \cdot \sqrt{\frac{n^* \cdot n^{**}}{n^* + n^{**}}}$, where
  • $x^*$ is the first sample mean and $x^{**}$ the second sample mean,
  • $s^*$ is the first standard deviation and $s^{**}$ the second standard deviation,
  • $n^*$ is the first sample size and $n^{**}$ the second sample size,
  • $s_d = \sqrt{\frac{(n^* - 1)\,s^{*2} + (n^{**} - 1)\,s^{**2}}{n^* + n^{**} - 2}}$ is the pooled standard deviation,
  • the number of degrees of freedom is $n^* + n^{**} - 2$.
• Decision: $H_0$ is retained if $t_{calc} < t_{tab}$; otherwise it is rejected.

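A minimal sketch of the two-sample comparison of means with a pooled standard deviation. Function names and data are hypothetical; the critical value is the tabulated two-sided 5 % point of the t distribution for 5 + 5 - 2 = 8 degrees of freedom.

```python
import math
import statistics

def pooled_sd(a, b):
    """s_d: square root of the pooled variance of two samples."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))

def two_sample_t(a, b):
    """t = |mean_a - mean_b| / s_d * sqrt(na * nb / (na + nb))."""
    na, nb = len(a), len(b)
    diff = abs(statistics.mean(a) - statistics.mean(b))
    return diff / pooled_sd(a, b) * math.sqrt(na * nb / (na + nb))

# Hypothetical samples from two sites (illustrative numbers only)
site_a = [9.8, 10.2, 10.4, 10.0, 10.6]
site_b = [10.6, 11.0, 11.2, 10.8, 11.4]

t_calc = two_sample_t(site_a, site_b)
t_tab = 2.306  # t table, alpha = 0.05 (two-sided), 8 degrees of freedom

print("t_calc:", t_calc)
print("H0 (equal means) retained:", t_calc < t_tab)
```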
Comparison of variances (F-test)
• The test statistic is $F = \left(\frac{s^*}{s^{**}}\right)^2 \geq 1$, where $s^*$ is the standard deviation of the first sample and $s^{**}$ is the standard deviation of the second sample. The samples are labelled so that the larger standard deviation is in the numerator.
• Decision: $H_0$ is retained if $F_{calc} < F_{tab}$; otherwise it is rejected.

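The F statistic can be sketched as below; the larger variance is always put in the numerator so that F ≥ 1, as on the slide. Names and data are hypothetical. The table value shown is the upper 5 % point of the F distribution with (4, 4) degrees of freedom; which quantile is appropriate depends on whether the test is set up as one-sided or two-sided, which the slide leaves open.

```python
import statistics

def f_statistic(a, b):
    """Ratio of the larger to the smaller empirical variance, so F >= 1."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)

# Hypothetical samples with visibly different scatter
sample_a = [9.8, 10.2, 10.4, 10.0, 10.6]
sample_b = [9.0, 10.0, 11.0, 10.0, 11.0]

f_calc = f_statistic(sample_a, sample_b)
f_tab = 6.39  # F table, upper 5 % point, (4, 4) degrees of freedom

print("F_calc:", f_calc)
print("H0 (equal variances) retained:", f_calc < f_tab)
```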
Outlier test (Nalimov test)
• The test statistic is $r = \frac{|x^+ - x^*|}{s} \cdot \sqrt{\frac{n}{n-1}}$, where $x^+$ is the suspected outlier, $x^*$ is the sample mean, $s$ is the standard deviation of the sample, and $n$ is the sample size.
• Decision: the value is kept (not treated as an outlier) if $r_{calc} < r_{tab}$; otherwise it is rejected as an outlier.

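A short sketch of the Nalimov statistic. It assumes that the mean and standard deviation are computed from the full sample, including the suspected value, which the slide does not state explicitly. The function name and data are hypothetical, and the comparison value r_tab has to be looked up in a Nalimov table, so no table value is hard-coded here.

```python
import math
import statistics

def nalimov_r(values, suspect):
    """r = |suspect - mean| / s * sqrt(n / (n - 1)) over the full sample."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return abs(suspect - mean) / s * math.sqrt(n / (n - 1))

# Hypothetical series in which the last value looks suspicious
sample = [1.0, 2.0, 3.0, 4.0, 10.0]
r_calc = nalimov_r(sample, suspect=10.0)

print("r_calc:", r_calc)
# Compare r_calc with r_tab from a Nalimov table:
# the value counts as an outlier if r_calc >= r_tab.
```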
Environmental statistics
The End