200 likes | 500 Views
Advanced Excel for Data Analysis. Yale University StatLab. What will be covered. Following up on Intro to Excel Workshop Importing and exporting data Data management cleaning and manipulation Statistics and data analysis Pivot tables Graphing and charts Writing custom Macros/VBA
E N D
Advanced Excel for Data Analysis Yale University StatLab
What will be covered • Following up on Intro to Excel Workshop • Importing and exporting data • Data management • cleaning and manipulation • Statistics and data analysis • Pivot tables • Graphing and charts • Writing custom Macros/VBA • More time for specific questions at the end
Quick Review of Some Basics • We’re using Excel 2007 • Excel 2010 looks different again (improvement) • Recent Excel improvements/changes • More rows (not 65536 anymore) • Note new file suffixes: .xlsx and .xlsm • Not backwards compatible -save as .xls to re-open in older versions-free download allows older versions to open • More levels of sorting and filtering • Better charting
When and Why Excel • It’s a spreadsheet • Great for calculations and formulas • Okay for Data Management • But that’s what most people use it for • Should probably be using Access • Everybody has Excel (including Macs) • No VBA or SharePoint access for Macs • That’s how we got the Data • Ubiquitous file format widely imported and exported • We’ve been using it our whole lives, right? • Most of us actually only scratch the surface.
Importing And Exporting • Important for Importing and Exporting: Define Variable Types (string, date, int) • Open • “All Files” (*.*) • Text to Column, Delimited (basic) • Let’s see an example http://statlab.stat.yale.edu/help/workshops/introData/dataintro • StatTransfer • Supports Excel from/to anything • Does batch file conversion • Embedding in Word/Powerpoint • Keeps data linked so if you change data you don’t have to re-import to document
Working with Data • Treat Data Like Data, not a bunch of cells • Defining Data Ranges • Name your Sheets • Name your data ranges • Conditional formatting and Data Validation • Quick way to determine whether your data looks good • Set ranges so that if a value falls outside the allowed range, the cell formatting changes • Create a unique ID row if it doesn’t exist • Allows you to restore order and merge
Filtering, Sorting and Finding Data • Most common question: I need to find unique values or a subset of my data. • Sorting and Filtering – make sure you highlight the entire dataset! • Adding a conditional dummy variable is easy • Remember to copy and paste as values • Lookups • VLOOKUP and HLOOKUP • Used with named ranges
Text Manipulation • If cell A1 = “abcde ” • TRIM(A7)=abcde • gets rid of extra spaces • RIGHT (A7,2)=de • and LEFT, MID • SUBSTITUTE (A7, “c”, “o”)=abode • Comparing Cells (especially in Macros): • If cell A1 = “12345” (a string, not a #) • VALUE(A1)=12345 • Useful for building syntax, repetitive text for other programs
Statistics and Data Analysis • Many basic summary functions • AVERAGE, STDEV, CORREL, etc. • CONFIDENCE for confidence intervals • Probability distributions • NORMDIST, NORMINV, etc. • Array functions • LINEST, FORECAST • Data Analysis Tools • Data tab -> Data Analysis • If not installed, upper left circle -> Add-Ins • Linear regression, Chi-square
Pivot Tables and Pivot Charts • Easy snapshot views of your data. • Pivot charts for Large Data • Great for transforming time series data into tables • Be aware: Default calculation is “COUNT” and you probably want “SUM” • Click-and-drag interface makes this easy to use, but requires manual refresh anytime your data changes
Graphing and Charts • Dependent on column/row as x,y • You may need to copy, paste special/transpose to get the chart type you want • Charts for publishing • Black and white usually preferred • Patterns not gradients • Dots Per Inch (DPI) • Charts should depend on what is being conveyed • Bar graph (measurements) • Scatter plots (correlation, multivariate) • No Pie Charts
More Advanced Charts • Trend lines • Changing the data to series • Removing/Hiding lines • Error Bars • Stock (High-Low-Close) • Excel wants defined variable names, but it can be tricked • Excellent tips for advanced graphing online • http://peltiertech.com/
Writing Your Own Macros • Be careful: You can’t UNDO running a macro. • Use the record function to get the idea and then customize for your needs. • Basic structure is Object Oriented • Object.property() • The built-in Editor will show options • Supports If, then, else, for loops , while loops • Be careful: You can’t UNDO running a macro.
More advanced • Financial Calculation functions • Present value, IRR, amortization • Database connections • What-if analysis • Forms (with Data Validation) • Password protection
Resources • Lots of Yale resources available • StatLab Consultants • Sherlock Campbell • Themba Flowers • Orbis.yale.edu • We have access to several online resources • Excel Hacks (O’Reilly) • YouTube. (Oh really?) • Internet Searches • Bing might actually be better for Office
Thank you Questions?