XClean in Action: XML Data Cleaning System

Learn about XClean, an XML data cleaning system designed to address various types of errors in data, such as typos, different data formats, missing and contradictory data, and duplicates. Explore its methodology and possibilities for reuse in data cleaning processes. See a demo of XClean in action.



Presentation Transcript

  1. XClean in Action
     Melanie Weis, HPI Potsdam, Germany
     Ioana Manolescu, INRIA Futurs, France
     CIDR 2007, 05.11.2006

  2. What is XClean?
     • XClean is an XML data cleaning system.
     • Types of errors that require data cleaning:
       • Typos
       • Different data formats (e.g., date, abbreviations, language)
       • Missing data
       • Contradictory data
       • Duplicates
     Melanie Weis, Hasso Plattner Institut Potsdam, 18.01.2007
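To make three of the error types listed above concrete (typos, differing date formats, and duplicates), here is a minimal Python sketch. It is not XClean code and does not use XClean/PL syntax. The element names, the sample data, the list of date formats, and the 0.8 similarity threshold are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from difflib import SequenceMatcher

# Hypothetical dirty input: a typo in a title, two date formats,
# and (as a consequence) a likely duplicate record.
DIRTY_XML = """
<movies>
  <movie><title>The Matrix</title><released>1999-03-31</released></movie>
  <movie><title>The Matrx</title><released>31.03.1999</released></movie>
  <movie><title>Signs</title><released>2002-08-02</released></movie>
</movies>
"""

# Assumed set of date formats that occur in the data.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y")


def normalize_date(text):
    """Return the date in ISO format, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def likely_duplicates(root, threshold=0.8):
    """Pair up movies whose normalized dates agree and whose titles are similar."""
    movies = root.findall("movie")
    pairs = []
    for i, a in enumerate(movies):
        for b in movies[i + 1:]:
            date_a = normalize_date(a.findtext("released", default=""))
            date_b = normalize_date(b.findtext("released", default=""))
            title_a = a.findtext("title", default="")
            title_b = b.findtext("title", default="")
            similarity = SequenceMatcher(None, title_a, title_b).ratio()
            # Unparseable dates never count as a match.
            if date_a is not None and date_a == date_b and similarity >= threshold:
                pairs.append((title_a, title_b))
    return pairs


root = ET.fromstring(DIRTY_XML.strip())
print(likely_duplicates(root))
```

Normalizing the dates first is what lets the second step compare the two records at all; a stricter or looser threshold would trade missed duplicates against false duplicates.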

  3. Where do we find Duplicates?
     [Image label: "False Duplicate"]

  4. How do we get rid of dirty data?
     No!
     • Quick fix (get glasses)
     • Start over again next year (get new, expensive glasses)
     Yes!
     • Clear methodology (clearly defined processing stages that combine)
     • Possibility to reuse (parts of) a solution

  5. Data Cleaning with XClean
     • XClean/PL: set of clearly defined cleaning operators
       • Declarative
       • Modular
       • Readable
     • XClean/PL → XQuery
     • Dirty XML data → XQuery Processor → Clean XML data
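The idea of a cleaning process assembled from clearly defined, reusable operators can be sketched in Python as follows. This only illustrates the modular structure: the operator names (`trim_text`, `fill_missing`, `drop_exact_duplicates`), the `run_pipeline` helper, and the sample data are hypothetical, and the sketch does not reproduce XClean/PL syntax or its compilation to XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical dirty input: stray whitespace, a missing <city>, a duplicate.
DIRTY_XML = """
<people>
  <person><name> Ann </name><city>Potsdam</city></person>
  <person><name>Ann</name><city>Potsdam</city></person>
  <person><name>Bob</name></person>
</people>
"""


def trim_text(root):
    """Operator: strip surrounding whitespace from every element's text."""
    for element in root.iter():
        if element.text is not None:
            element.text = element.text.strip()
    return root


def fill_missing(tag, default):
    """Operator factory: give each record a <tag> child if it lacks one."""
    def operator(root):
        for record in root:
            if record.find(tag) is None:
                ET.SubElement(record, tag).text = default
        return root
    return operator


def drop_exact_duplicates(root):
    """Operator: keep only the first of several records with identical children."""
    seen = set()
    for record in list(root):  # copy, because we remove while iterating
        key = tuple((child.tag, child.text) for child in record)
        if key in seen:
            root.remove(record)
        else:
            seen.add(key)
    return root


def run_pipeline(root, operators):
    """Apply each cleaning operator in order; each stage feeds the next."""
    for operator in operators:
        root = operator(root)
    return root


clean = run_pipeline(
    ET.fromstring(DIRTY_XML.strip()),
    [trim_text, fill_missing("city", "unknown"), drop_exact_duplicates],
)
print(ET.tostring(clean, encoding="unicode"))
```

Because each operator takes and returns the same tree, individual stages can be reordered, dropped, or reused in another pipeline, which is the kind of reuse the slides argue for. Here the order matters: trimming must run before duplicate removal for the two "Ann" records to compare equal.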

  6. Come see the demo!
     • XClean Java plugin
     • Supports
       • Writing XClean/PL
       • Compiling XClean/PL to XQuery
       • Executing XQuery to obtain clean data
