1 / 11

Data Cleaning and Transformation

Data Cleaning and Transformation. Playing in the Mud. The Many Roles of Knowledge Workers. Brilliant IS idea. The Gist of the Problem. Getting data out of some system to: Analyze it (e.g. Excel, Access, stats package) Get it into another system (e.g. ERP)

bryce
Download Presentation

Data Cleaning and Transformation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Cleaning and Transformation Playing in the Mud

  2. The Many Roles of Knowledge Workers Brilliant IS idea

  3. The Gist of the Problem • Getting data out of some system to: • Analyze it (e.g. Excel, Access, stats package) • Get it into another system (e.g. ERP) • “Smart” manipulation of electronic reports with embedded data • Don’t want to do it manually (why?)

  4. A former student describing new job with major consulting firm • “Lots of modeling, muddy data problems, and working with OLAP tools and data warehousing. I know the muddy data area was a particular area of interest to you and it seems that it is a really BIG issue for many businesses - actually bigger than I imagined.”

  5. ACD Report Example Report Header Date Split Blank lines Data! Totals Lines Next report

  6. Why Talk About This? • Very common problem in business • [insert examples] • Huge amount of time wasted doing manual processing • Really useful spreadsheet and database skills (and mindset) • Example: from the EXCEL-L Developers listserv • It’s fun to play in the mud sometimes • Commercial products such as Content Extractor (formerly Cambio) (www.datajunction.com) are available if you need to do this stuff routinely. Doing it here will allow you to become power users of such products quickly. • Next few slides are screen shots from Content Extractor

  7. This is the main window where one defines style definitions for the different types of lines in the data file.

  8. The date is always in the same spot so we can use Fixed Column to get it out of the line we’ve defined as “DAY”.

  9. Use “Floating Tags” for data that could appear in different positions on a line. Ex: we don’t know how long the split name will be, but we do know that it will end with a period. For data appearing in headers/footers, (e.g. Split Name and #), we tell Content Extractor to “Propogate Field Contents” so we get the name and number with each detail line.

  10. Repeating day, split num and split name Check out our progress.

  11. Export to a wide variety of file types. ContExTest-acddata.asc Here’s an export I did to a delimited ASCII file.

More Related