Describing the variety – Descriptive statistics
Reminder I: • Recall that statistics has two main fields (see Lecture 1): • descriptive statistics • inductive (inferential) statistics
Reminder II: • Types of variables (see Lecture 1): • qualitative variables (nominal and ordinal scale), called factor in R • quantitative variables (interval and ratio scale), called numeric in R, which can be discrete or continuous
Descriptive statistics • The aim of descriptive statistics is twofold: • exploring the statistical nature of our data • summarizing and displaying our data in a concise, compact way • To achieve this aim we • compute statistics of position and dispersion from the data • make graphs to display the data.
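Descriptive statistics computed from the data • Statistics of position • Arithmetic mean (average): the sum of the data values divided by the sample size n, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. (The mean of the statistical population is usually denoted by μ (Greek mu).)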
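A minimal Python sketch of the arithmetic mean (the helper name `arithmetic_mean` is hypothetical): sum the values and divide by n. The data are the seven values used in the median example.

```python
def arithmetic_mean(values):
    """Return the arithmetic mean: the sum of the values divided by n."""
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [4, 6, 2, 5, 3, 2, 3]
print(arithmetic_mean(data))  # 25 / 7, about 3.571
```

Python's standard `statistics.mean` computes the same quantity.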
Median: • This is the middle value of a ranked data set. • If your sample size is an odd number, the median is simply the value in the middle of the ranked data. E.g.: sample size (n): 7; data: 4, 6, 2, 5, 3, 2, 3; ranked data: 2, 2, 3, 3, 4, 5, 6; median: 3 • If your sample size is an even number, the median is the mean of the two middle values of the ranked data set. E.g.: sample size (n): 6; data: 4, 6, 5, 3, 2, 3; ranked data: 2, 3, 3, 4, 5, 6; median: (3 + 4) / 2 = 3.5
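The two cases (odd and even sample size) can be sketched in Python as follows; the function name `median` is illustrative, and the checks reproduce both worked examples.

```python
def median(values):
    """Middle value of the ranked data (mean of the two middle values if n is even)."""
    ranked = sorted(values)  # rank the data
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]  # odd n: the single middle value
    return (ranked[mid - 1] + ranked[mid]) / 2  # even n: mean of the two middle values


print(median([4, 6, 2, 5, 3, 2, 3]))  # 3
print(median([4, 6, 5, 3, 2, 3]))     # 3.5
```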
Mode: • This is the most frequent value in the data set. E.g.: data: 3.5, 1.1, 2.3, 1.9, 2.3, 2.5, 2.3; mode: 2.3
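A minimal Python sketch of the mode, counting how often each value occurs. A data set can have more than one most frequent value, so this hypothetical `modes` helper returns all tied values in order of first appearance.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)       # value -> number of occurrences
    top = max(counts.values())     # highest frequency
    return [value for value, count in counts.items() if count == top]


print(modes([3.5, 1.1, 2.3, 1.9, 2.3, 2.5, 2.3]))  # [2.3]
print(modes([1, 1, 2, 2, 3]))                      # [1, 2]
```

Since Python 3.8 the standard library offers `statistics.multimode`, which behaves the same way.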
Statistics of dispersion • Range: • This is simply the difference between the largest and the smallest value of the data set. E.g.: data: 5, 2, 6, 9, 12, 4; largest value (maximum): 12; smallest value (minimum): 2; range: 12 − 2 = 10
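The range is a one-line computation; in this Python sketch the helper is named `data_range` (a hypothetical name) to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Difference between the largest and the smallest value."""
    return max(values) - min(values)


print(data_range([5, 2, 6, 9, 12, 4]))  # 12 - 2 = 10
```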
Interquartile range (IQR): • This measure of dispersion works on ranked data. • Quartiles (Q): the values that divide the ranked data into four equal parts. • First quartile: the value below which 25% of the ranked data lie. • Third quartile: the value below which 75% of the ranked data lie. • The interquartile range is the difference between the third and the first quartile. (Several conventions exist for computing quartiles, so software may give slightly different values; in the example below, each quartile is the median of the lower or upper half of the ranked data.) E.g.: sample size: 12; ranked data: 2, 3, 3, 4, 4, 5, 6, 6, 6, 7, 8, 9; 1st Q: 3.5; 3rd Q: 6.5; IQR: 6.5 − 3.5 = 3
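The quartiles in the worked example (3.5 and 6.5) match the "median of each half" convention, sketched below in Python. The function names are hypothetical, and excluding the middle value from both halves when n is odd is one choice among several.

```python
def _median(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ranked[:half]        # lower half (middle value excluded when n is odd)
    upper = ranked[n - half:]    # upper half (middle value excluded when n is odd)
    return _median(lower), _median(upper)


data = [2, 3, 3, 4, 4, 5, 6, 6, 6, 7, 8, 9]
q1, q3 = quartiles_by_halves(data)
print(q1, q3, q3 - q1)  # 3.5 6.5 3.0
```

Library functions may use other conventions: for these data, Python's `statistics.quantiles(data, n=4)` (default `method='exclusive'`) gives 3.25 and 6.75 as the first and third quartiles.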
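Variance: • It is essentially the mean of the squared differences of the data values from their mean; for a sample, the sum of the squared differences is usually divided by n − 1 instead of n: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$. (The variance of the statistical population is usually denoted by σ² (Greek sigma).) • Standard deviation (SD): • This is the square root of the variance, $s = \sqrt{s^2}$. (The standard deviation of the statistical population is usually denoted by σ (Greek sigma).)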
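A minimal Python sketch contrasting the two divisors (n for the population formula, n − 1 for the sample formula). The data set is hypothetical and chosen so that the numbers come out round; the standard deviation is the square root of either variance.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Mean of the squared deviations from the mean (divisor n)."""
    n = len(values)
    if n == 0:
        raise ValueError("variance of an empty data set is undefined")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


# Hypothetical data: mean 5, squared deviations sum to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))             # 4.0 (32 / 8)
print(sample_variance(data))                 # 32 / 7, about 4.571
print(math.sqrt(population_variance(data)))  # 2.0
print(math.sqrt(sample_variance(data)))      # about 2.138
```

In Python's standard library, `statistics.variance` and `statistics.stdev` use n − 1, while `statistics.pvariance` and `statistics.pstdev` use n; R's `var()` and `sd()` use n − 1.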
Standard error of the mean (SE): • This is the standard deviation of the sample mean: it describes the random fluctuation of the sample mean from sample to sample, and it is computed as $SE = s / \sqrt{n}$. In other words, if we took a large number of repeated samples from the statistical population and computed the mean of each sample, the standard deviation of these means would be approximately equal to the standard error. E.g.: standard deviation of the data (s): 2.5; sample size (n): 16; standard error of the mean: 2.5 / √16 = 2.5 / 4 = 0.625
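The Python sketch below reproduces the worked example and then checks the "repeated samples" interpretation by simulation. The normal population with mean 10 and SD 2.5, the 5000 repetitions, and the seed are illustrative assumptions, not values from the lecture.

```python
import math
import random
import statistics


def standard_error(sd, n):
    """Standard error of the mean: SE = s / sqrt(n)."""
    return sd / math.sqrt(n)


# Worked example: s = 2.5, n = 16
print(standard_error(2.5, 16))  # 0.625

# Illustrative simulation (assumed normal population, mean 10, SD 2.5):
# draw 5000 samples of size 16 and compute the mean of each one.
rng = random.Random(1)
sample_means = [
    statistics.mean([rng.gauss(10, 2.5) for _ in range(16)])
    for _ in range(5000)
]
# The standard deviation of the sample means should be close to 2.5 / 4.
print(round(statistics.stdev(sample_means), 3))
```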
Confidence interval (CI) of the mean: • Provided we have a large sample from a statistical population, the interval from $\bar{x} - 1.96 \times SE$ to $\bar{x} + 1.96 \times SE$ contains the population mean (μ) with 95% confidence. • This region is called the 95% confidence interval of the population mean: if we repeated the sampling many times, about 95% of the intervals computed this way would contain the population mean. • The endpoints of the confidence interval are called the lower and upper limits of the confidence interval. (If we have a small sample, the multiplier is not 1.96 but a value taken from the t distribution with n − 1 degrees of freedom.)
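A minimal Python sketch of the interval mean ± multiplier × SE. The sample mean of 20.0 is hypothetical; s = 2.5 and n = 16 echo the standard error example. Because n = 16 is small, the call passes the t value for 15 degrees of freedom (about 2.131, from a t table) instead of 1.96.

```python
import math


def mean_confidence_interval(mean, sd, n, multiplier=1.96):
    """Return (lower, upper) = mean -/+ multiplier * SE, where SE = sd / sqrt(n).

    multiplier=1.96 is the large-sample 95% value; for small samples pass the
    t value with n - 1 degrees of freedom instead.
    """
    se = sd / math.sqrt(n)
    return mean - multiplier * se, mean + multiplier * se


# Hypothetical sample: mean 20.0, s = 2.5, n = 16 (so SE = 0.625)
lower, upper = mean_confidence_interval(20.0, 2.5, 16, multiplier=2.131)
print(f"95% CI: {lower:.3f} to {upper:.3f}")  # 95% CI: 18.668 to 21.332
```

Python's standard library has no t quantile function; in R, `qt(0.975, df = 15)` returns the multiplier, and SciPy offers `scipy.stats.t.ppf(0.975, 15)`.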
Basic statistical graphs • Graph for a qualitative variable: • Pie chart: • Shows the relative amounts of the categories of a nominal variable (factor)
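The relative amounts shown by a pie chart are simply the proportions of each category; each slice's angle is its proportion of 360 degrees. A minimal Python sketch with hypothetical nominal data (flower colour of eight plants):

```python
from collections import Counter


def category_shares(values):
    """Proportion of observations falling into each category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Hypothetical nominal data: flower colour of 8 plants
colours = ["red", "white", "red", "blue", "red", "white", "red", "blue"]
shares = category_shares(colours)
for category, share in shares.items():
    print(f"{category}: {share:.1%} of the pie, slice angle {share * 360:.0f} degrees")
```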
Graph for a discrete quantitative variable: • Bar chart: • Displays the frequency of each distinct value in the data set.
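The numbers behind a bar chart form a frequency table: one count per distinct value. This Python sketch (hypothetical helper `frequency_table`) uses the data of the odd-sized median example and draws the bars as text.

```python
from collections import Counter


def frequency_table(values):
    """List of (value, frequency) pairs, sorted by value."""
    return sorted(Counter(values).items())


data = [4, 6, 2, 5, 3, 2, 3]
for value, freq in frequency_table(data):
    print(f"{value}: {'#' * freq} ({freq})")
```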
Graphs for a continuous quantitative variable: • Histogram: • Displays the number of data values that fall within given intervals of the variable. • Each category on the x-axis represents a range of values; in the example figure, one bar shows the number of observations (wheat grains) whose values lie between 23 and 24. N.B.: Do not confuse it with the bar chart, which displays the frequencies of discrete data.
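The counting step behind a histogram can be sketched in Python as below, assuming equal-width, half-open intervals [a, a + width). The measurements are hypothetical and only echo the 23 to 24 interval mentioned for the wheat grain figure.

```python
import math


def histogram_counts(values, start, width, n_bins):
    """Count values in the intervals [start + k*width, start + (k+1)*width).

    Values outside [start, start + n_bins*width) are ignored.
    """
    counts = [0] * n_bins
    for x in values:
        k = math.floor((x - start) / width)  # index of the interval containing x
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


# Hypothetical measurements, e.g. of individual wheat grains
grains = [22.4, 23.1, 23.5, 23.9, 24.2, 24.8, 25.0, 23.3]
counts = histogram_counts(grains, start=22, width=1, n_bins=4)
for k, count in enumerate(counts):
    low = 22 + k
    print(f"[{low}, {low + 1}): {count}")
```

With fractional bin widths, floating-point rounding can push a value lying exactly on a boundary into the neighbouring interval, so plotting libraries handle edges more carefully.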
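Boxplot: • This is a very useful graph that displays many features of the distribution of the data. [Figure: boxplot labelled, from top to bottom, maximum, 3rd Q, median, 1st Q and minimum; the box between the 1st and 3rd Q spans the IQR.]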
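A boxplot like the one labelled in the figure is drawn from five numbers: minimum, 1st Q, median, 3rd Q and maximum. A minimal Python sketch using the "median of each half" quartile convention and the data of the IQR example. As an assumption worth checking: many plotting functions, including R's `boxplot()`, by default stop the whiskers at the most extreme values within 1.5 × IQR of the box and draw points beyond as outliers.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, 1st Q, median, 3rd Q, maximum) of the data.

    Quartiles are medians of the lower and upper halves of the ranked data
    (middle value excluded when n is odd); other conventions differ slightly.
    """
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q1 = statistics.median(ranked[:half])
    q3 = statistics.median(ranked[n - half:])
    return ranked[0], q1, statistics.median(ranked), q3, ranked[-1]


data = [2, 3, 3, 4, 4, 5, 6, 6, 6, 7, 8, 9]
print(five_number_summary(data))  # (2, 3.5, 5.5, 6.5, 9)
```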
Graph for displaying two quantitative variables together: • Scatterplot: • Displays the statistical relationship between two quantitative variables.