A Brief Survey of Web Data Extraction Tools (WDET) Laender et al.
Introduction
• Web data is hard to query
• Much of it is unstructured
• Wrappers can help extract data
• A wrapper maps the data on a page into a repository
• There are several ways to generate wrappers
• This paper surveys tools for generating wrappers
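To make "a wrapper maps the data on a page into a repository" concrete, here is a minimal hand-written sketch in Python. It is not taken from the paper: the page layout, the class name `BookWrapper`, and the field names `title` and `price` are all assumptions made for illustration, and the "repository" is just a list of dictionaries.

```python
from html.parser import HTMLParser


class BookWrapper(HTMLParser):
    """Hypothetical wrapper: turns each two-cell <tr> into a record."""

    def __init__(self):
        super().__init__()
        self.records = []      # the "repository": a list of dicts
        self._row = None       # cell texts of the row being read
        self._in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Accumulate text only while inside a table cell.
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            cells = [c.strip() for c in self._row]
            if len(cells) == 2:  # assumed layout: title, price
                self.records.append({"title": cells[0], "price": cells[1]})
            self._row = None


def wrap(page):
    """Map one HTML page to a list of structured records."""
    wrapper = BookWrapper()
    wrapper.feed(page)
    wrapper.close()
    return wrapper.records


PAGE = (
    "<table>"
    "<tr><td>Dune</td><td>9.99</td></tr>"
    "<tr><td>Emma</td><td>4.50</td></tr>"
    "</table>"
)
records = wrap(PAGE)
```

A wrapper written by hand like this breaks as soon as the page layout changes, which is one motivation for tools that generate wrappers.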
Taxonomy of WDET
• Languages for Wrapper Development
• HTML-aware Tools
• NLP-based Tools
• Wrapper Induction Tools
• Modeling-based Tools
• Ontology-based Tools
Overview of WDET
• Languages for Wrapper Development: special-purpose languages for writing wrappers (Minerva, TSIMMIS)
• HTML-aware Tools: rely on the HTML structure of the page (W4F, XWRAP, RoadRunner)
• NLP-based Tools: learn extraction rules from free text (RAPIER, SRV, WHISK)
Overview of WDET (continued)
• Wrapper Induction Tools: generate wrappers from labeled example pages (WIEN, SoftMealy, STALKER)
• Modeling-based Tools: based on a user-defined hierarchy of objects (NoDoSE, DEByE)
• Ontology-based Tools: use conceptual models or ontologies (BYU tool)
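The idea of inducing a wrapper from examples can be sketched in a few lines of Python. This is a deliberately simplified, delimiter-based illustration and not the algorithm of WIEN, SoftMealy, or STALKER: it assumes a single attribute per record, learns one left and one right delimiter as the longest context shared by all labeled examples, and uses made-up function names and sample pages.

```python
import os


def common_prefix(strings):
    # os.path.commonprefix compares character by character.
    return os.path.commonprefix(strings)


def common_suffix(strings):
    # Reverse each string, take the common prefix, reverse back.
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def learn_delimiters(examples):
    """examples: (page, value) pairs, where value occurs in page.

    Returns (left, right): the longest text that precedes every
    labeled value and the longest text that follows every one.
    """
    lefts, rights = [], []
    for page, value in examples:
        start = page.index(value)
        lefts.append(page[:start])
        rights.append(page[start + len(value):])
    return common_suffix(lefts), common_prefix(rights)


def extract(page, left, right):
    """Apply the learned delimiters to a new page."""
    if not left or not right:
        raise ValueError("examples share no usable delimiters")
    values, pos = [], 0
    while True:
        start = page.find(left, pos)
        if start == -1:
            break
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            break
        values.append(page[start:end])
        pos = end + len(right)
    return values


# Hypothetical training pages, each with one labeled value.
examples = [
    ("<li><b>Congo</b> <i>242</i></li>", "Congo"),
    ("<li><b>Spain</b> <i>34</i></li>", "Spain"),
]
left, right = learn_delimiters(examples)

new_page = (
    "<ul><li><b>Belize</b> <i>501</i></li>"
    "<li><b>Chile</b> <i>56</i></li></ul>"
)
names = extract(new_page, left, right)
```

With these two examples the learned delimiters are `<li><b>` and `</b> <i>`, and applying them to the unseen page yields `["Belize", "Chile"]`. The induction tools listed above learn richer rules from their training examples; this sketch only shows the general shape of learning from examples and then extracting.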
Qualitative Analysis
• Degree of Automation
• Support for Complex Objects
• Page Contents: Semistructured data or text
• Ease of Use
• XML Output
• Support for Non-HTML Sources
• Resilience and Adaptiveness