Web Crawler Agent (WCA)
Presented by Kirk Martinez, University of Southampton
Introduction
• WCA searches the Web for missing pieces of information (fragments)
• WCA structures the extracted information into an ontology, e.g. the relation “place_of_birth” (Person, Place)
• Techniques used: natural language processing (NLP), information extraction, relation extraction, question answering
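To make the “place_of_birth” (Person, Place) idea concrete, here is a minimal sketch of pattern-based relation extraction. It is not WCA's actual method: the `Fact` type, the `BORN_IN` pattern and the `extract_place_of_birth` function are hypothetical names chosen for illustration, and a single regular expression stands in for the NLP and named-entity recognition steps listed above.

```python
import re
from typing import List, NamedTuple


class Fact(NamedTuple):
    """One ontology fragment: relation(subject, value)."""
    relation: str  # e.g. "place_of_birth"
    subject: str   # the Person
    value: str     # the Place


# Hypothetical surface pattern: "<Person> was born in <Place>".
# Capitalised word sequences are a crude stand-in for a real
# named-entity recogniser.
BORN_IN = re.compile(
    r"(?P<person>[A-Z]\w+(?: [A-Z]\w+)*) was born in "
    r"(?P<place>[A-Z]\w+(?: [A-Z]\w+)*)"
)


def extract_place_of_birth(text: str) -> List[Fact]:
    """Return a place_of_birth fact for every match of the pattern."""
    return [
        Fact("place_of_birth", m.group("person"), m.group("place"))
        for m in BORN_IN.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Rembrandt was born in Leiden in 1606. Georges Seurat was born in Paris."
    for fact in extract_place_of_birth(sample):
        print(fact)
```

A pattern like this only fires on well-formed sentences, which is why a real system would need proper sentence analysis and entity recognition rather than a single regular expression.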
Is it something like “Google”?
• Search for “date_of_birth” (when Rembrandt was born) with Google
Searching for information with Google
• “Old” Web search (e.g. Google) is good for retrieving documents but NOT for extracting concise answers (e.g. “15-July-1606”)
• It performs no analysis to “understand” the documents (e.g. “Rembrandt” can also be the name of a hotel or a bookstore)
Information extraction on the Web
• Data may be low quality and repeated
• e.g. Georges Seurat’s date of death:
  • 29 March 1891 (http://www.ibiblio.org/wm/paint/auth/seurat/)
  • 19 March 1891 (http://www.rickdoble.net/influence/20seurat.htm)
• WCA depends on:
  • Well-structured sentences and documents
  • Good named-entity recognisers
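The Seurat example suggests one simple way to handle repeated and conflicting data: group the extracted values by source and flag any fact for which the sources disagree. The sketch below is an illustration under that assumption, not a description of WCA itself; `group_by_value` and `needs_verification` are hypothetical helper names.

```python
from datetime import date
from typing import Dict, Iterable, Set, Tuple


def group_by_value(candidates: Iterable[Tuple[str, date]]) -> Dict[date, Set[str]]:
    """Map each reported value to the set of sources reporting it.

    Using a set means a source that repeats the same value counts once.
    """
    by_value: Dict[date, Set[str]] = {}
    for source, value in candidates:
        by_value.setdefault(value, set()).add(source)
    return by_value


def needs_verification(by_value: Dict[date, Set[str]]) -> bool:
    """A fact needs checking when sources report more than one value."""
    return len(by_value) > 1


if __name__ == "__main__":
    # The two dates and URLs come from the slide; the third entry is a
    # made-up repeat, added to show de-duplication.
    date_of_death = [
        ("http://www.ibiblio.org/wm/paint/auth/seurat/", date(1891, 3, 29)),
        ("http://www.rickdoble.net/influence/20seurat.htm", date(1891, 3, 19)),
        ("http://www.ibiblio.org/wm/paint/auth/seurat/", date(1891, 3, 29)),
    ]
    grouped = group_by_value(date_of_death)
    for value, sources in grouped.items():
        print(value.isoformat(), sorted(sources))
    print("conflict:", needs_verification(grouped))
```

Detecting the conflict is the easy part. Deciding which value to trust requires more evidence, such as additional sources or some measure of source reliability.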
Future work
• Verification
• Performance
• Autonomy