PAIR project progress report

Yi-Ting Chou, Shui-Lung Chuang, Xuanhui Wang

Motivation
• A lot of information on the Web is distributed and unstructured
• Web IE: extract and organize such information into a structured format
• E.g., Person (name, contact (email, phone, address), research interests, …)
• E.g., Book (title, authors, price, ISBN, …)

Example
Person (name, contact (email, phone, address), research interests, …)
[Figure: Page 1, Page 2, Page 3]

Motivation (cont.)
• Direct Web IE is very hard
• E.g., the information is distributed and unstructured
• This project provides an instance-attribute retrieval engine to address this problem
• In this project, we focus on personal information
• The attribute is supplied as input (e.g., contact)

Flow Chart
[Figure: flow chart with the labels Pages, PageCollector, SegmentTool, Name, Trees, AttributeExpansion, Attribute*, Attribute, Retrieval, Rank List]
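The slides do not say how AttributeExpansion turns Attribute into Attribute*. As a minimal sketch, assuming it is a simple lookup of sub-attributes, the Person example (contact (email, phone, address)) could be expanded as follows. The names `SCHEMA` and `expand_attribute` are hypothetical and are not part of the project.

```python
from typing import Dict, List

# Hypothetical schema, copied from the Person example in the slides.
SCHEMA: Dict[str, List[str]] = {
    "contact": ["email", "phone", "address"],
}


def expand_attribute(attribute: str) -> List[str]:
    """Return the attribute followed by its known sub-attributes.

    Assumption: expansion is a plain dictionary lookup; the real
    AttributeExpansion component may work differently.
    """
    key = attribute.strip().lower()
    return [key] + SCHEMA.get(key, [])


expanded = expand_attribute("Contact")
print(expanded)
```

An attribute that is not in the schema is returned unchanged, so the retrieval step can always work with a list of terms.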

Why a tree structure for page segmentation?
• The parameter that controls the size of the leaf blocks is difficult to tune
• Our solution: score every node of the tree instead of only the leaf blocks, then select the appropriate nodes to rank
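The idea of scoring every node rather than only the leaves can be sketched as below. This is an illustration under stated assumptions, not the project's implementation: the `Node` class, the `score` function (matched attribute terms divided by the square root of the block length), and the sample page are all hypothetical, and the slides leave the actual scoring function open.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Node:
    """A node of the page segmentation tree (hypothetical structure)."""
    text: str = ""                                   # text held directly by this node
    children: List["Node"] = field(default_factory=list)

    def full_text(self) -> str:
        """Text of this node plus all of its descendants."""
        parts = [self.text] + [c.full_text() for c in self.children]
        return " ".join(p for p in parts if p)


def score(node: Node, terms: Set[str]) -> float:
    """Assumed score: matched terms, normalized by sqrt of block length."""
    words = node.full_text().lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.strip(".,:;()") in terms)
    return hits / len(words) ** 0.5


def rank_nodes(root: Node, terms: Set[str]) -> List[Tuple[float, Node]]:
    """Score every node in the tree (not just leaves), best first."""
    scored = []
    stack = [root]
    while stack:
        node = stack.pop()
        scored.append((score(node, terms), node))
        stack.extend(node.children)
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


contact = Node(children=[Node("Email: alice@example.edu"), Node("Phone: 555-0100")])
bio = Node("Alice studies information retrieval and text mining on the Web")
page = Node(children=[contact, bio])

ranked = rank_nodes(page, {"email", "phone", "address"})
best = ranked[0][1]
```

With this sample page, the internal `contact` node outranks both of its leaves and the page root, which is the case a leaf-only ranking with a fixed block size would miss.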

Current Progress
[Figure: flow chart with the labels Pages, PageCollector, SegmentTool, Name, Trees, AttributeExpansion, Attribute*, Attribute, Retrieval, Rank List]

Demo

The remaining tasks
1. Improve the accuracy for a single page
2. Extend to multiple pages:
• INPUT: a person name (instead of a URL) and an attribute name
• OUTPUT: a ranked list of the blocks
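The multi-page input and output described above might look like the following sketch. It assumes the pages have already been collected and segmented into text blocks; the function names, the scoring rule (term counts plus a bonus when the person name appears), and the example URLs are hypothetical.

```python
from typing import Dict, List, Tuple


def block_score(block: str, name: str, terms: List[str]) -> float:
    """Assumed score: attribute term counts, plus 1 if the name appears."""
    text = block.lower()
    hits = sum(text.count(t.lower()) for t in terms)
    bonus = 1.0 if name.lower() in text else 0.0
    return hits + bonus


def rank_blocks(
    name: str, terms: List[str], pages: Dict[str, List[str]]
) -> List[Tuple[float, str, str]]:
    """Merge the blocks of all pages into one ranked list of (score, url, block)."""
    results = []
    for url, blocks in pages.items():
        for block in blocks:
            s = block_score(block, name, terms)
            if s > 0:                      # drop blocks with no evidence at all
                results.append((s, url, block))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


pages = {
    "http://example.org/a": [
        "Alice Smith, email: alice@example.org",
        "Course schedule for fall",
    ],
    "http://example.org/b": [
        "Alice Smith phone: 555-0100, email: a.smith@example.org",
        "Alice Smith gave a talk",
    ],
}
ranked_blocks = rank_blocks("Alice Smith", ["email", "phone"], pages)
for s, url, block in ranked_blocks:
    print(s, url, block)
```

Keeping the source URL next to each block lets a user see which page a result came from when blocks from several pages are interleaved in one list.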

Issues for discussion
• Possible problems with our method
  • E.g., how to effectively score and rank the “nodes” of the page “tree”?
• Ways to improve and extend our method
  • E.g., how to combine it with NLP / named entity extraction on the retrieved blocks
  • E.g., how to deal with multiple pages and duplicated information
• Suggestions for evaluating our method
  • E.g., a user study; anything more?
• The relation to Entity Retrieval
  • ??
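For the duplicated-information question, one possible option (not something the slides propose) is to drop near-duplicate blocks from the ranked list using Jaccard similarity over word sets. The sketch below assumes the blocks arrive best first; the function names and the 0.8 threshold are hypothetical.

```python
from typing import List, Set


def tokens(block: str) -> Set[str]:
    """Lowercased whitespace tokens of a block."""
    return set(block.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def drop_near_duplicates(ranked: List[str], threshold: float = 0.8) -> List[str]:
    """Keep a block only if it is not too similar to a higher-ranked kept block."""
    kept: List[str] = []
    for block in ranked:
        t = tokens(block)
        if all(jaccard(t, tokens(k)) < threshold for k in kept):
            kept.append(block)
    return kept


blocks = [
    "Email: alice@example.org Phone: 555-0100",
    "email: alice@example.org phone: 555-0100",   # same content from another page
    "Office: Room 12, Example Hall",
]
unique_blocks = drop_near_duplicates(blocks)
print(unique_blocks)
```

Because the list is processed in rank order, the highest-ranked copy of each piece of information is the one that survives.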
