Efficient Web Browsing on Handheld Devices Using Page and Form Summarization

Efficient Web Browsing on Handheld Devices Using Page and Form Summarization
Orkut Buyukkokten, Oliver Kaljuvee, Hector Garcia-Molina, Andreas Paepcke and Terry Winograd (2002)
An Overview by Shrenik Sadalgi

Introduction
• A new approach for summarizing and browsing Web pages:
  - Page summarization: each Web page is broken into text units that can each be hidden, partially displayed, made fully visible, or summarized
  - Form summarization: HTML forms are also summarized by displaying just the text labels that prompt the user for input

Page Summarization
• ‘Macro-level’ summarization by structural analysis of Web pages
  - expand and contract pages based on their relative structural nesting
• ‘Micro-level’ summarization uses information retrieval techniques to outline portions of the text for the user

Page Summarization – Macro Level
• Partition the page into ‘Semantic Textual Units’ (STUs)
  - STUs are page fragments (DOM elements)
• The proxy uses font and other structural information to identify a hierarchy of STUs
  - nesting of STUs
• Does not require special formatting at the Web sources
  - a significant advantage over schemes that rely on pages being specially structured for PDAs
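The nesting of STUs can be pictured as a tree whose nodes are expanded or contracted on demand. The following is a minimal sketch under stated assumptions: the `STU` class, its fields, and the `+`/`-` markers are hypothetical names chosen for illustration and are not taken from the paper or the proxy's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class STU:
    """Hypothetical node for one Semantic Textual Unit (illustrative only)."""
    text: str
    children: List["STU"] = field(default_factory=list)
    expanded: bool = False


def render(stu, depth=0):
    """Return the lines currently visible for this STU and its descendants."""
    # '+' marks a collapsed unit that hides children; '-' marks a leaf or an open unit.
    marker = "+" if stu.children and not stu.expanded else "-"
    lines = ["  " * depth + marker + " " + stu.text]
    if stu.expanded:
        for child in stu.children:
            lines.extend(render(child, depth + 1))
    return lines


page = STU("News", [STU("World", [STU("Story A"), STU("Story B")]), STU("Sports")])
collapsed = render(page)   # only the top-level unit is visible
page.expanded = True
expanded = render(page)    # first nesting level becomes visible
```

Expanding a node reveals only its direct children, so a small screen shows one nesting level at a time.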

Page Summarization – Micro Level
• In this two-level approach to Web browsing, users can initially get a good high-level overview of a Web page, and then “zoom into” the most relevant portions
• Simple to implement
• Effectiveness is limited
  - the first sentence of a paragraph is not necessarily the best representation
• Five methods for micro-level summarization

Page Summarization – Micro Level

Processing a web page request from a PDA (on Proxy)

Extracting Keywords
• Evaluate each word’s importance
  - a word is important if it occurs frequently within the text and infrequently in the larger collection
• $W_{ij} = tf_{ij} \cdot \log_2(N/n)$, where
  - $W_{ij}$: weight of term $T_j$ in document $D_i$
  - $tf_{ij}$: frequency of term $T_j$ in document $D_i$
  - $N$: number of documents in the collection
  - $n$: number of documents where term $T_j$ occurs at least once
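The weighting formula on this slide can be sketched in a few lines. This is a minimal illustration, assuming documents are already tokenized into word lists; the function name `tfidf_weights` and the toy collection are made up for the example.

```python
import math
from collections import Counter


def tfidf_weights(doc_tokens, collection):
    """Weight each term of one document as tf * log2(N / n)."""
    n_docs = len(collection)                 # N: documents in the collection
    tf = Counter(doc_tokens)                 # tf: term frequency in this document
    weights = {}
    for term, freq in tf.items():
        # n: number of documents in which the term occurs at least once
        n = sum(1 for doc in collection if term in doc)
        weights[term] = freq * math.log2(n_docs / n) if n else 0.0
    return weights


# Toy collection (an assumption for illustration, not data from the paper).
collection = [
    ["pda", "pda", "web", "browser"],
    ["web", "browser", "proxy"],
    ["web", "page", "summary"],
    ["web", "form", "label"],
]
weights = tfidf_weights(collection[0], collection)
```

In this toy collection "web" appears in every document, so its weight is zero, while "pda" is frequent in the first document and absent elsewhere, so it gets the highest weight.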

Extracting a Summary Sentence
• Each sentence (S) in an STU is assigned a significance factor; the sentence with the highest significance factor becomes the summary sentence
• Mark all the significant words in S
  - a word is significant if its TF/IDF weight is higher than a previously chosen weight cutoff ‘W’
• Find all “clusters” in S
  - a cluster is a sequence of words that starts and ends with a significant word
  - fewer than ‘D’ (distance cutoff) insignificant words must separate any two neighboring significant words within the sequence
• Add the weights of all significant words within a cluster and divide by the total number of words within the cluster
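The cluster scoring described on this slide can be sketched as follows. The slide does not say how several clusters in one sentence are combined, so taking the highest cluster score as the sentence's significance factor is an assumption here, as are the function names, the toy weights, and the cutoff values.

```python
def significance_factor(words, weights, w_cutoff, d_cutoff):
    """Score one sentence (a list of words) by its best cluster."""
    # Positions of significant words: weight above the cutoff W.
    sig = [i for i, w in enumerate(words) if weights.get(w, 0.0) > w_cutoff]
    if not sig:
        return 0.0

    # Group significant words into clusters: neighbors must be separated
    # by fewer than D insignificant words.
    clusters = []
    start = prev = sig[0]
    for pos in sig[1:]:
        if pos - prev - 1 < d_cutoff:
            prev = pos
        else:
            clusters.append((start, prev))
            start = prev = pos
    clusters.append((start, prev))

    # Cluster score: sum of significant-word weights / total words in the cluster.
    sig_set = set(sig)
    best = 0.0
    for first, last in clusters:
        total = sum(weights[words[i]] for i in range(first, last + 1) if i in sig_set)
        best = max(best, total / (last - first + 1))  # assumption: best cluster wins
    return best


def summary_sentence(sentences, weights, w_cutoff, d_cutoff):
    """Pick the sentence with the highest significance factor."""
    return max(sentences, key=lambda s: significance_factor(s, weights, w_cutoff, d_cutoff))


# Toy weights and sentences (illustrative values only).
weights = {"proxy": 3.0, "summarizes": 2.0, "pages": 4.0}
s1 = "the proxy summarizes web pages".split()
s2 = "this text is short".split()
score1 = significance_factor(s1, weights, w_cutoff=1.0, d_cutoff=2)
score2 = significance_factor(s2, weights, w_cutoff=1.0, d_cutoff=2)
best = summary_sentence([s2, s1], weights, w_cutoff=1.0, d_cutoff=2)
```

In the first sentence the three significant words form a single four-word cluster ("proxy summarizes web pages"), giving (3 + 2 + 4) / 4 = 2.25; the second sentence has no significant words and scores 0.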

Results Task completion times for all methods and all tasks

Results I/O activity required for all methods over all tasks

Results Average completion time for each method across all tasks

Form Summarization

Form Summarization Process
• Algorithms for finding a matching label for form input fields from form text
• Chunk Partitioning
  - chunks are small pieces of HTML code delimited by HTML tags (not the same as STUs)
• Label Matching
  - N-Gram: “ants” and “grants”
  - Letter/Word: “First Name” and “Fname”
  - Word/Letter: “PhoneWork” and “PhoneW”
  - Substring: “Password” and “pwd”
  - NULL Algorithm: takes the name of the input tag
• Tables
• Previous / Following
• check_box_label
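The slide lists the label-matching algorithms only by name and example, so the following is a rough sketch of how two of them might work, not the authors' definitions. It assumes the N-Gram matcher compares shared character bigrams (scored here with a Dice coefficient) and that abbreviation-style matches such as “pwd” for “Password” can be tested as an in-order letter match. All function names and the threshold are hypothetical.

```python
def char_ngrams(text, n=2):
    """Set of character n-grams of the lowercased text."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=2):
    """Dice coefficient over character n-gram sets (an assumed scoring rule)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def is_abbreviation(short, full):
    """True if the letters of `short` appear in `full` in the same order."""
    remaining = iter(full.lower())
    return all(ch in remaining for ch in short.lower())


def match_label(field_name, candidate_labels, threshold=0.5):
    """Pick a label for an input field; fall back to the field's own name."""
    for label in candidate_labels:
        if is_abbreviation(field_name, label):
            return label
    scored = [(ngram_similarity(field_name, label), label) for label in candidate_labels]
    score, label = max(scored, default=(0.0, None))
    # Fallback mirrors the NULL algorithm: use the name of the input tag.
    return label if score >= threshold else field_name


sim = ngram_similarity("ants", "grants")
first = match_label("fname", ["Last Name", "First Name", "Email"])
pwd = match_label("pwd", ["User", "Password"])
unmatched = match_label("zip", ["City", "State"])
```

“ants” and “grants” share three of their bigrams (an, nt, ts), which is what makes them a close n-gram match; a field with no plausible label, such as `zip` above, keeps its own name.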

Results
• Each algorithm matching 115 input fields
• Matching performance for algorithm combination over 330 input elements

Thank You
