Exploratory Data Analysis and Multi-variate Strategies
Simon French
simon.french@warwick.ac.uk
Aims of Session
• Understand the value of ‘looking at’ your data rather than analysing it
• Be aware of terminology and basic ideas in:
  • Exploratory Data Analysis
  • Multivariate Analysis
  • Data Mining
• Ideas of data presentation and visualisation
Exploring and Visualising Data
• EDA
  • Tables, charts, plots
  • Look for patterns or something interesting in 2 or 3 dimensions
  • Simple presentations of data
• Multivariate Analysis
  • Factor analysis, cluster analysis, etc.
  • Identify patterns or something interesting in more than a few dimensions
• Data mining
  • Automatic/computer search for patterns in (parts of) large data sets
In all cases, anything you find needs checking in all sorts of ways.
Cynefin and statistics
[Diagram labels: Unique events; Exploratory analyses; Repeatable events; Events?; Estimation and confirmatory analysis]
Cynefin and statistics
[Diagram labels: Unique events; Repeatable events; Events?]
• Actually you need exploratory statistics here: outliers, residual analysis, simple model checking
Exploratory analyses
• Look at the data
  • In any, repeat any, analysis, look at the data
  • It is too easy for data to pass from web questionnaire to Excel to SPSS to analysis without your ever looking at the data
• Simple plots and tables
  • Tables: do not think them ‘simple’ to construct!
  • Histograms, boxplots, scatterplots, …
  • Useful in presenting results too
  • Generally easy to produce with Excel or SPSS
    • If you know what you are trying to achieve
• References
  • A.S. Ehrenberg (1986b). "Reading a Table: an Example." Applied Statistics, 35(3), 237-44.
  • M. Chapman and B. Mahon (1986). Plain Figures. London, Her Majesty's Stationery Office.
  • J. W. Tukey (1977). Exploratory Data Analysis. Addison-Wesley.
  • The exploratory data analysis chapter in most statistics texts.
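As a minimal sketch of ‘looking at the data’ before analysing it, the numbers behind a boxplot can be computed directly: the five-number summary, plus the conventional 1.5 × interquartile-range rule for flagging suspect values. The function names and the `ages` data are invented for illustration, and the quartile convention (`method="inclusive"`) is one choice among several; Excel and SPSS may give slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    ordered = sorted(values)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

def tukey_outliers(values, k=1.5):
    """Values lying more than k * IQR beyond the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical questionnaire responses (ages); 210 looks like a typing slip.
ages = [23, 25, 31, 28, 35, 40, 22, 210, 27, 33]
summary = five_number_summary(ages)
suspects = tukey_outliers(ages)
print("five-number summary:", summary)
print("values to check by hand:", suspects)
```

A flagged value is a prompt to go back to the source, not an automatic deletion.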
Tables and Charts
• Clarify in titles and notes
  • What the data are and where they come from
  • Units
• 2 or 3 ideas can be shown/explored in a table or chart … no more
• Do not make them over-‘busy’
  • Plot with plain x’s, not dustbin icons, for data on waste!
• Do not introduce spurious features
  • E.g. number the data and accidentally introduce a ranking
• Watch for cognitive aspects
  • Appropriate scales
  • Appropriate number of significant figures
  • In tables: put important variation down the columns
  • Use of colour
    • Red and green read as bad (‘stop’) and good (‘go’), and some readers are simply colour blind
Regression and Factor Analysis as exploratory analyses
• Often (usually!!!) data is multi-dimensional
• It is difficult to see the key trends and variations by eye
• Regression and factor analyses reduce dimensions to the ‘significant’ ones
Regression Analysis
[Figure: scatterplot of data points, each marked x]
• Describe the cloud of data points
• 16 (x, y) points = 32 numbers
Regression Analysis
[Figure: the same scatterplot with a fitted regression line]
• Describe the cloud of data points
• Regression line: y = mx + c, plus standard deviation
• 3 numbers: trend (m), base case (c), and spread (standard deviation)
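The reduction from 32 numbers to 3 can be sketched as follows. This is an illustrative least-squares fit written out by hand, not the method of any particular package; `fit_line` and the 16 made-up points are assumptions for the example, and the spread is taken to be the standard deviation of the residuals about the line.

```python
import math

def fit_line(points):
    """Least-squares fit of y = m*x + c; returns (m, c, residual_sd)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx                      # trend
    c = mean_y - m * mean_x            # base case (value at x = 0)
    residuals = [y - (m * x + c) for x, y in points]
    # n - 2 degrees of freedom, because m and c were estimated from the data
    residual_sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return m, c, residual_sd

# 16 invented (x, y) points scattered about y = 2x + 1: 32 numbers in all.
points = [(x, 2 * x + 1 + (0.5 if x % 2 else -0.5)) for x in range(1, 17)]
m, c, sd = fit_line(points)
print(f"trend m = {m:.3f}, base case c = {c:.3f}, spread = {sd:.3f}")
```

The `residuals` list inside `fit_line` is also the natural thing to plot when checking the model: a pattern in the residuals suggests the straight line is missing something.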
Factor Analysis
[Figure: scatterplot of data points, each marked x]
• Describe the cloud of data points
• 16 (x, y) points = 32 numbers
Factor Analysis
[Figure: the same scatterplot with each point projected onto the line of greatest variation]
• Describe the cloud of data points
• Project each point onto the line of greatest variation: 16 numbers
• Keeps each item separate in the summary
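Projecting each point onto the line of greatest variation is, strictly speaking, the first principal component; it is used below as a simple stand-in for a one-factor summary, and a full factor analysis involves further modelling choices. The sketch assumes two dimensions, where the direction has a closed form; `first_component` and the data are invented for illustration.

```python
import math

def first_component(points):
    """Direction of greatest variation and each point's score along it."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # Angle of the major axis of the 2x2 scatter matrix [[sxx, sxy], [sxy, syy]]
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    # One number per point: its position along the line through the centroid
    scores = [(x - mean_x) * direction[0] + (y - mean_y) * direction[1]
              for x, y in points]
    return direction, scores

# 16 invented (x, y) points: 32 numbers reduced to 16 scores.
points = [(x, 2 * x + 1 + (0.5 if x % 2 else -0.5)) for x in range(1, 17)]
direction, scores = first_component(points)
print("direction:", direction)
print("scores:", [round(s, 2) for s in scores])
```

Unlike the regression summary, every item keeps its own score, so individual points can still be ranked or compared after the reduction.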
Data Mining
• Huge data sets, many dimensions, very many inhomogeneous objects
  • Biological/genetic data
  • Large scale longitudinal population studies
  • Loyalty cards
  • …
• Computer searches for patterns (often conditional patterns in parts of the data)
• Beware: seek and you will find … SO CHECK!
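The warning can be illustrated with a small simulation, offered as a sketch rather than a data-mining method: 40 variables of pure random noise are searched for the most strongly correlated pair, and that ‘finding’ is then checked on observations that played no part in the search. All sizes, the seed, and the names are arbitrary choices for the example.

```python
import math
import random

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the run is repeatable
n_vars, n_obs = 40, 40
# Pure noise: by construction no variable is related to any other.
data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
half = n_obs // 2

# "Mine" the first half of the observations for the strongest pair.
pairs = [(i, j) for i in range(n_vars) for j in range(i + 1, n_vars)]
i, j = max(pairs, key=lambda p: abs(correlation(data[p[0]][:half],
                                                data[p[1]][:half])))
found_r = correlation(data[i][:half], data[j][:half])
# Check the same pair on the half that was held back from the search.
check_r = correlation(data[i][half:], data[j][half:])

print(f"searched {len(pairs)} pairs; best was variables {i} and {j}")
print(f"correlation found in search: {found_r:.2f}")
print(f"same pair on held-back data: {check_r:.2f}")
```

With hundreds of pairs to choose from, the search reliably turns up a sizeable correlation even though none exists; the held-back data is what exposes it.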