Index Structures of User Profiles for Efficient Web Page Filtering Services (Part.1)

Index Structures of User Profiles for Efficient Web Page Filtering Services (Part.1) Yi-Hung Wu and Arbee L. P. Chen

Outline • Introduction • Related approaches • 1 Proposed approaches

Introduction • Search engine: a centralized processing approach. • Meta-search engine: a distributed processing approach. • Information filtering services

Search engine • A search engine often collects the representative descriptions of the web pages before the users’ information needs. • 2 ways to reach the goal • Providing a registration form for a web page author to describe what the web page contains. (Yahoo) • Fetching the web pages and to extract the representative information periodically (Google)

Meta-search engine • A meta-search engine only keeps the summary information of the individual search engines, such as the index size, the frequency of updates, the search capability, and the expected response time. • When the user issues a query, a meta-search engine selects the best set of search engines to answer the query

Information filtering services • Why? • the performance of the search engines will get worse if the number of web pages grows rapidly • What? • Finding good matches between the web pages and the users’ information needs

Information filtering services • How? • Users first give descriptions about what they need, which are called user profiles, to start the services. A profile index is built on these profiles. A series of incoming web pages will be put into the matching process. Each incoming web page is represented in the same form of the user profile. • In this way, the users who are interested in an incoming web page can be identified by comparing the descriptions of the web page with each user profile.

Information filtering services

Related approaches • An Example • The counting method • The tree method

An Example

The counting method

The counting method • COUNT is used to keep the number of occurrences for a profile during traversing the inverted lists in the matching process. • During the matching process, the inverted lists of the keywords extracted from the web page (abcf) will be traversed one by one • Pi is a match if the values in the corresponding entries of TOTAL and COUNT are equal.

The tree methoda

The tree method • Index tree • P-node: the nodes which store profiles • K-node: the nodes which store keywords • External path:a path that starts from the root passes a consecutive sequence of k-nodes and ended with a p-node • If the keyword in a k-node is not contained in the set of keywords extracted from a web page, all the external paths containing this k-node will not cover any matched profiles.

Method 1: index path with path signatures

Method 1: index path with path signatures • This method guarantees that no repetition will occur in the index. • Considering the keywords of example page in Table 1(abcf), its path signature equals 11 at node b and 110 at node d. Therefore P1 is a match, but P2 is not. (We know that the keyword of matches most be the subsets of web page`s.)

To Be Continued… Thanks for your attention

Index Structures of User Profiles for Efficient Web Page Filtering Services (Part.1)

Index Structures of User Profiles for Efficient Web Page Filtering Services (Part.1)

Presentation Transcript

Page 0 of 31: Index

Index Structures

Efficient Probe Filtering

Index Structures for Querying the Deep Web

Web Services - Part 1

Index structures

User / Admin / Installer Profiles

Genre and Task for Web Page Filtering

Index Structures

Web Services Navigator: Visualizing the Execution of Web Services

Clustering-based Collaborative filtering for web page recommendation

Linguistic Web Services for Semantic Web

Index Structures

Algorithms for Efficient Collaborative Filtering

WEB SERVICES

securing web services, part 1

Index Structures for Files

Geometric index structures

Index Structures for Files

Genre and Task for Web Page Filtering

Efficient PCF shadowmap filtering

Page 0 of 31: Index