1 / 13

Data Cleaning Techniques

Data Cleaning Techniques. Tips and tricks for data management in Excel. Workshop on Emergency Information Management Neuhausen , Germany, 18-22 June, 2012 Christian Oxenbøll, Registration Officer , UNHCR. Data Cleaning. Why is it important? Bad data leads to wrong results

tia
Download Presentation

Data Cleaning Techniques

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Cleaning Techniques Tips and tricks for data management in Excel Workshop on Emergency Information Management Neuhausen, Germany, 18-22 June, 2012 Christian Oxenbøll, Registration Officer, UNHCR

  2. Data Cleaning • Why is it important? • Bad data leads to wrong results • Operational and management decisions should not be based on wrong information • Even “a few bad data” can make a whole dataset useless for statistics

  3. What is data cleaning? • Existing data: • Reviewing logic consistency of data • Reviewing reliability of data • Correction of wrong values • Deletion or suppression of erroneous values • Subsequent data cleaning can be reduced by proper design of data collection: • Make a data management strategy • Make sure you know how you will process collected data • Ensure consistency in design • Validation rules in Excel

  4. What are we looking for? • Common errors include: • 0 when it should be “N/A” (not available/not applicable) • Totals do not match underlying data • Typing errors (and use of different location names) • Wrong interpretation of questions • Mismatch of units (cases/persons, days/months, square metres/hectares, pct/ratios, flow/stock, etc.) • Missing data • Percentages e.g. indicator values >100% • Date formats (12/01/06 or 01/12/06)

  5. How do you clean data? • Think logic! • Look at the data • Reflect over whether it makes sense • Logical consistence (Mathematical/Statistical) e.g. Total population vs. children < 18 years • Meaningful (e.g. is it really true that refugees survive without water and the camp is 2 square meters?) • Reliability of source • Ask the data source about how data was collected • What is covered • What was the methodology Note that logical consistency alone does not imply that data is correct. Always check if data is meaningful

  6. How do you clean data? • Be creative! • Use graphs • To spot outliers (high/low values) • Pivot tables • To create summary tables of large datasets • Filters • Easy to spot outliers (note the limit in Excel of 1,000 in drop-down list) • Sorting • To spot outliers and spelling • Conditional formatting • To spot invalid and dubious values or outliers

  7. Example from Uganda SIR

  8. How do you clean data? • Be creative! • Lookup functions • Easy to find non-existing codes (typos) • Formulas • Check of mathematical and logic consistency • Compare with other sources (Triangulation) • Validation of values/expected ranges (do we have approximately the same) • Compare with previous years • Validation of values/expected ranges (do we have approximately the same)

  9. Useful Excel Tools • Validation (allows only certain values) • Auto filters • Conditional Formatting • Pivot Tables • Formulas

  10. Logic And Or If Not Mathematical/Statistical Average Count CountA CountBlank CountIf Dsum SumIf Rank Information Trim Concatenate Left Right Mid Len Find Proper Lower Upper IsBlank Vlookup Yearfrac Today Some useful Excel functions Use the help in Excel which gives guidance on the use of each formula

  11. Data cleaning: some tips • Design good data collection forms • Checking plausibility • Outliers • Trends analyses • Using graphical views • Triangulation • Using filters, functions and formulas

  12. Useful websites • Google your questions • Microsoft Online Help • www.functionx.com/excel • www.pivottableguru.com

  13. Exercises • Open the file: “Excel Training.xls” • Follow the instructions. Ask your neighbour or the facilitators if you need assistance. • In the file: “Excel Training Result.xls” you will find the result of the exercises including the formulas.

More Related