Exploratory Data Analysis (EDA); term coined by Tukey (1977)
- Illuminates underlying patterns in noisy data
- Precedes formal analysis
- May lead to a different analysis than originally planned
Data visualization (the first thing you do with your data!)
Important functions of exploratory data visualization
• Spot outliers
• Discriminate clusters
• Check distributional and other assumptions
• Examine relationships
• Compare mean differences
• Observe a time-based process
http://seamonkey.ed.asu.edu/~alex/teaching/WBI/EDA.html
Univariate data (one variable): frequency distributions
Distributions of height, biomass, etc. are often used to describe populations
• How are the data distributed? (including summary/descriptive statistics)
• Are the data normal? (required to meet the assumptions of many statistical techniques; more later)
• If not normal, can they be transformed?
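The questions above can be tried out numerically before any plotting. The following is a minimal sketch, not a formal normality test: it computes the mean, the median, and a moment-based skewness, then repeats the skewness after a log transform. The `biomass` values and the `skewness` helper are hypothetical, introduced only for illustration.

```python
import math
import statistics

def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical right-skewed measurements (not from the slides).
biomass = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]
log_biomass = [math.log(v) for v in biomass]

# A mean well above the median is a quick hint of right skew.
print("mean:", statistics.fmean(biomass), "median:", statistics.median(biomass))
print("skewness raw:", round(skewness(biomass), 2))
print("skewness log:", round(skewness(log_biomass), 2))
```

For these made-up values the log transform pulls in the long right tail, so the skewness moves closer to zero. Whether a transformed variable is "normal enough" still has to be judged from a plot or a formal test.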
Histograms
• Raw data hidden
• Division into categories (bins) is arbitrary
• Available in Excel and many other programs
• Identify skew and non-normality
• Identify outliers
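To see why the division into categories is arbitrary, here is a small sketch that bins the same data with two different bin widths and prints a text histogram for each. It borrows the quiz scores listed on the stem-leaf slide; the `histogram_counts` helper and the choice of widths are assumptions made for illustration.

```python
from collections import Counter

def histogram_counts(data, bin_width, start):
    """Count values per bin; each bin is labeled by its lower edge.

    Empty bins are simply absent from the result.
    """
    counts = Counter(start + bin_width * ((x - start) // bin_width) for x in data)
    return dict(sorted(counts.items()))

quiz_scores = [20, 20, 21, 25, 29, 32, 36, 37, 38, 41, 44, 46, 50, 53, 58]

for width in (10, 20):
    print(f"bin width {width}")
    for lower, count in histogram_counts(quiz_scores, width, start=20).items():
        print(f"  {lower}-{lower + width - 1}: {'#' * count}")
```

With a width of 10 the scores fall into four bins; with a width of 20 they collapse into two, and most of the shape is lost. Neither choice is "correct", which is the point of the bullet above.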
Stem-leaf plots
Quiz scores: 20 20 21 25 29 32 36 37 38 41 44 46 50 53 58
- show original data
- division into categories arbitrary
- easier if the data are ordered first
- a histogram on its side (sort of)
Stem | Leaves
2 | 0 0 1 5 9
3 | 2 6 7 8
4 | 1 4 6
5 | 0 3 8
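The stem-leaf construction is mechanical enough to sketch in a few lines. This version assumes non-negative whole-number scores, with the tens digit as the stem and the ones digit as the leaf; the `stem_and_leaf` name is just an illustrative choice.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted values by tens digit (stem); keep ones digits as leaves."""
    plot = defaultdict(list)
    for value in sorted(values):  # ordering first keeps the leaves in order
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

quiz_scores = [20, 20, 21, 25, 29, 32, 36, 37, 38, 41, 44, 46, 50, 53, 58]

for stem, leaves in stem_and_leaf(quiz_scores).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Run on the quiz scores, this reproduces the four rows of the plot on this slide.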
Box (box-whisker) plots
• Calculate the median and draw a horizontal line at it
• Draw a box with ends at the quartiles Q1 (25%) and Q3 (75%)
• Extend the "whiskers" to the farthest points that are not outliers
• Outliers lie more than 3/2 times the interquartile range (Q3 - Q1) below Q1 or above Q3
• Draw a dot for every outlier
Can be done for a single distribution or to compare several
http://mathworld.wolfram.com/Box-and-WhiskerPlot.html
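The recipe above can be sketched numerically. One caveat: there are several conventions for computing quartiles, and programs differ. This sketch assumes the "inclusive" method of Python's `statistics.quantiles`, so the quartiles may not match Excel or another package exactly. The `box_plot_summary` helper and the extra score of 95 are hypothetical.

```python
import statistics

def box_plot_summary(values):
    """Return the numbers needed to draw a box plot with the 1.5 x IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        # whiskers stop at the farthest points that are not outliers
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }

quiz_scores = [20, 20, 21, 25, 29, 32, 36, 37, 38, 41, 44, 46, 50, 53, 58]
print(box_plot_summary(quiz_scores))
# Add one hypothetical extreme score to see the outlier rule in action.
print(box_plot_summary(quiz_scores + [95]))
```

The original quiz scores produce no outliers, so the whiskers run to the minimum and maximum. With the added score, 95 is drawn as a dot and the upper whisker stays at 58.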
Bivariate (2-variable) data
• Relationship between the 2 variables
• Are there outliers?
• Examined with scatterplots
[Figure: example scatterplots; surviving panel labels: negative, none]
[Figure: scatterplot panel labeled "Non-linear"]
Graphing helps you see relationships.
Formal analysis is guided by a priori knowledge that one variable causes change in the other (more later).
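A short numerical sketch shows why graphing matters here. Pearson's correlation coefficient r measures only linear association, so a strong curved relationship can look like "none" if you read the number without the plot. The data and the `pearson_r` helper below are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: one straight-line decline, one symmetric curve.
x = [-3, -2, -1, 0, 1, 2, 3]
linear_negative = [-2 * v + 1 for v in x]
curved = [v ** 2 for v in x]

print("negative linear r:", pearson_r(x, linear_negative))
print("curved (y = x^2) r:", pearson_r(x, curved))
```

The straight-line data give r = -1, while the parabola gives r = 0 even though y is completely determined by x. A scatterplot would show the curve immediately.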
Classified data: often result from an ecological experiment
- Bar chart
• Shows means and variance
- “Shows” treatment differences and their magnitude
[Figure: bar chart of Epilithon NPP (mg O2/m2/hr) for high light and low light; bars show mean ± one S.E.]
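The bar heights and error bars in a chart like this come from a per-treatment mean and standard error (S.E. = sample standard deviation divided by the square root of n). A minimal sketch follows; the replicate values are hypothetical and are not the data behind the slide's figure.

```python
import math
import statistics

def mean_and_se(values):
    """Return (mean, standard error of the mean) for one treatment group."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.fmean(values), se

# Hypothetical NPP replicates (mg O2/m2/hr) for two light treatments.
npp = {
    "high light": [9.0, 11.0, 13.0, 15.0],
    "low light": [-7.0, -5.0, -4.0, -4.0],
}

for treatment, values in npp.items():
    mean, se = mean_and_se(values)
    print(f"{treatment}: mean = {mean:.2f}, S.E. = {se:.2f}")
```

Each mean becomes a bar and each S.E. becomes the half-length of its error bar. Stating which measure of spread the bars show (S.E., S.D., or a confidence interval) belongs in the caption.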
List things that are wrong with this graph.
[Figure: graph of Epilithon NPP; y-axis ticks from -10 to 15]
Graphing Exercise
• Obtain a dataset, preferably your own or a colleague’s, but it can be anything
• Choose a graphing style that best illustrates the “message” of your data
• Use Excel or another program to make a graph
• Print it on an overhead to show in class