Meaningful Labeling of Integrated Query Interfaces
Eduard C. Dragut (speaker), University of Illinois at Chicago
Clement Yu, University of Illinois at Chicago
Weiyi Meng, SUNY at Binghamton
VLDB 2006, Seoul, Korea
A Motivating Scenario
• Looking for a ticket
  • Chicago – Seoul, September 10th – September 17th
  • [Figure: delta.com, orbitz.com, expedia.com]
• A user looking for the “best” price for a ticket:
  • Has to explore multiple sources
  • It is tedious, frustrating and time-consuming
E. Dragut et al - Meaningful Labeling of Integrated Query Interfaces
The goal
• Provide a unified way to query multiple sources in the same domain
[Figure: the user formulates the query on a unified query interface, which queries sources on the Web such as Airfare.com, priceline.com, united.com, delta.com, nwa.com]
Overview: Integrating Query Interfaces
[Figure: integration pipeline over the (Deep) Web. Extract query interfaces (He05, Zhang04) → Cluster query interfaces into domains such as Airfare, Auto, Car Rental, Books (Peng04) → Match query interfaces (B.He03, Dhamankar04, Doan02, Madhavan05, Wu04) → Integration of interfaces (H.He03, Dragut06). Various formats, e.g. ASCII files.]
Overview: Integrating Query Interfaces
• Integration steps:
  • Structural merging of query interfaces [He03 et al, Dragut06 et al]
    • Grouping constraints
    • Ancestor-descendant relationships
  • Determining the domain of each global field in the integrated interface [He03 et al]
  • Meaningful labeling of the integrated interface
    • The topic of this presentation
Motivation of Naming
• A query interface needs to be easily understood by any user, irrespective of his/her background
• Our study of the query interfaces in the seven domains used in the experiments revealed that query interface designers follow some “hidden” norms:
  • There are certain relationships between the labels of the fields in the same group (e.g., all plurals)
  • The labels of the (super) groups semantically characterize the set of fields underneath them
• The semantic ambiguity problem
  • Synonyms and homonyms are the two sources of naming conflicts [Batini86 et al, Bright94 et al]
The objectives
• Main goal: a systematic way to label the fields of the integrated query interface so that its concepts are easily understood by ordinary users
  • Validated through a survey
• A set of desirable properties that labels must satisfy to be consistent across the attributes of an integrated interface, so that users have no difficulty understanding it
  • Not covered in detail
Naming Algorithm
• The input:
  • A set of query interfaces in the same domain
    • E.g., Airline domain: Delta, AA, NWA, Orbitz, Travelocity
    • Each query interface is represented hierarchically [Wu04]
  • The mapping between the fields of the query interfaces
    • Organized in clusters (e.g., [Wu04 et al, B.He03 et al])
  • The set of groups of fields given by the merge algorithm [Dragut06 et al]
  • The integrated query interface, given by the merge algorithm as a schema tree [Dragut06 et al]
[Figure: vacations.net query interface]
An Example of Input
• Three fragments of query interfaces, represented hierarchically
• The mapping between them, i.e., the set of clusters
Naming Algorithm: Sketch
• Step 1: Consistent labeling of the fields
  • Fields in the same group: use the intersect-and-union strategy
  • Isolated fields: no consistency required
  • Root fields: treated as a group
  • Output: each group of fields (or field) has a set of candidate labels, possibly empty
• Step 2: Consistent labeling of the internal nodes
  • For each internal node, from the lowest level up to the root, apply a set of inference rules on labels
  • Output: each internal node has a set of candidate labels, possibly empty
• Step 3: Enforce consistency within the entire integrated interface
  • Not covered
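Step 2 visits internal nodes from the lowest level up to the root, so a node's descendants are labeled before the node itself. A minimal Python sketch of that visiting order only, assuming a hypothetical `Node` class with a `name` and a list of `children` (not the authors' data structure):

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical schema-tree node; leaves have no children."""
    name: str
    children: List["Node"] = field(default_factory=list)


def internal_nodes_bottom_up(root: Node) -> List[Node]:
    """Return the internal nodes ordered from the deepest level up to the root."""
    found = []  # (depth, node) pairs in breadth-first order
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if node.children:  # only internal nodes need inferred labels
            found.append((depth, node))
            queue.extend((child, depth + 1) for child in node.children)
    # sort is stable, so nodes on the same level keep their breadth-first order
    found.sort(key=lambda pair: -pair[0])
    return [node for _, node in found]


# Hypothetical fragment: g1 and g2 are groups, g3 is nested inside g2.
tree = Node("root", [
    Node("g1", [Node("adults"), Node("children")]),
    Node("from"),
    Node("g2", [Node("g3", [Node("depart_date"), Node("depart_time")]),
                Node("return_date")]),
])
print([n.name for n in internal_nodes_bottom_up(tree)])  # ['g3', 'g1', 'g2', 'root']
```

Processing level by level guarantees that when the inference rules run on a node, all of its nested groups already carry candidate labels.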
Preliminaries
• Normalization [e.g., He03 et al, Madhavan01 et al, Rahm01 et al]
  • E.g., Adults (18-64) becomes adult
• Semantic relationships among complex labels need to be established
  • E.g., synonymy, hypernymy/hyponymy
• Main issues
  • Thesauri provide semantic relationships only for individual content words (e.g., WordNet [Fellbaum98])
  • How to show that Area of Study is a synonym of Field of Work in the Job domain?
  • How to show that Class is a hypernym of Class of Tickets in the Airline domain?
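A minimal Python sketch of normalization, assuming a simple pipeline of our own choosing: lowercasing, dropping parenthesized qualifiers and stop words, and a crude plural-stripping rule in place of a real lemmatizer. The cited works may normalize differently. The label comes back as a set of content words:

```python
import re

# Assumed stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "or", "the", "to"}


def singular(word: str) -> str:
    """Crude plural stripping: 'adults' -> 'adult', but 'class' stays 'class'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(label: str) -> frozenset:
    """Turn a raw field label into a set of normalized content words."""
    label = re.sub(r"\([^)]*\)", " ", label.lower())  # drop qualifiers like "(18-64)"
    words = re.findall(r"[a-z]+", label)               # keep alphabetic tokens only
    return frozenset(singular(w) for w in words if w not in STOPWORDS)


print(sorted(normalize("Adults (18-64)")))    # ['adult']
print(sorted(normalize("Class of Tickets")))  # ['class', 'ticket']
```

The plural rule is deliberately naive (it leaves Children unchanged and would turn Status into statu); a dictionary-based lemmatizer is the natural replacement.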
Preliminaries
• Manipulation of labels
  • A label is seen as a set of normalized content words
    • E.g., {area, study} corresponds to Area of Study
    • E.g., {field, work} corresponds to Field of Work
  • Area of Study is a synonym of Field of Work because:
    • Area is a synonym of Field (by WordNet)
    • Study is a synonym of Work (by WordNet)
• Most descriptive vs. most general labels
  • E.g., Category, Job Category, Area of Work, Function
    • Category and Function: too general
    • Job Category and Area of Work: descriptive, avoid confusion
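The word-by-word reasoning above can be sketched in Python. The `SYNONYM_PAIRS` table is a hypothetical stand-in for WordNet, and treating Class as a hypernym of Class of Tickets because its words match a proper part of the other label's words is our illustrative assumption, not necessarily the paper's exact definition:

```python
from itertools import permutations

# Hypothetical thesaurus standing in for WordNet: unordered synonym pairs.
SYNONYM_PAIRS = {frozenset({"area", "field"}), frozenset({"study", "work"})}


def words_match(a: str, b: str) -> bool:
    """Two content words match if they are identical or listed as synonyms."""
    return a == b or frozenset({a, b}) in SYNONYM_PAIRS


def covered_by(general: frozenset, specific: frozenset) -> bool:
    """True if every word of `general` matches a distinct word of `specific`."""
    gen = list(general)
    # Brute force over one-to-one assignments; labels have only a few words.
    return any(
        all(words_match(g, s) for g, s in zip(gen, chosen))
        for chosen in permutations(specific, len(gen))
    )


def are_synonyms(a: frozenset, b: frozenset) -> bool:
    """Same number of words, paired one-to-one as equal or synonymous."""
    return len(a) == len(b) and covered_by(a, b)


def is_hypernym(a: frozenset, b: frozenset) -> bool:
    """Assumed rule: a is more general if its words match a proper part of b."""
    return len(a) < len(b) and covered_by(a, b)


area_of_study = frozenset({"area", "study"})
field_of_work = frozenset({"field", "work"})
print(are_synonyms(area_of_study, field_of_work))                          # True
print(is_hypernym(frozenset({"class"}), frozenset({"class", "ticket"})))  # True
```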
Consistent Labeling of Groups of Fields
• Assumption:
  • The labels a query interface gives to the fields of the same group are consistent
• Organize the labels of a group in a relation-like form, called the group relation
• General idea for building a consistent solution:
  • Combine multiple rows of consistent labels until a label is assigned to each field in the group
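A minimal Python sketch of the group relation and of combining consistent rows, under assumptions of our own: clusters are dicts from source interface to field label, rows are combined greedily, two rows count as consistent when they give an identical label to some cluster, and the labels are hypothetical. It illustrates the general idea rather than reproducing the paper's intersect-and-union procedure:

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]  # cluster id -> label (None if the source lacks the field)


def group_relation(group: List[str], clusters: Dict[str, Dict[str, str]]) -> List[Row]:
    """Build one row per source interface over the clusters of a group."""
    sources = sorted({src for c in group for src in clusters[c]})
    return [{c: clusters[c].get(src) for c in group} for src in sources]


def consistent(r1: Row, r2: Row) -> bool:
    """Rows agree if they give the identical label to at least one cluster."""
    return any(r1[c] is not None and r1[c] == r2[c] for c in r1)


def combine(rows: List[Row]) -> Optional[Row]:
    """Grow a seed row with labels from consistent rows until every field is labeled."""
    for seed in rows:
        merged = dict(seed)
        changed = True
        while changed and None in merged.values():
            changed = False
            for other in rows:
                if other is not seed and consistent(merged, other):
                    for c, label in other.items():
                        if merged[c] is None and label is not None:
                            merged[c] = label
                            changed = True
        if None not in merged.values():
            return merged
    return None  # no combination labels every field


clusters = {  # hypothetical airline passenger group
    "adult": {"delta": "Adults", "orbitz": "Adults"},
    "senior": {"delta": "Seniors"},
    "child": {"orbitz": "Children", "nwa": "Children"},
    "infant": {"nwa": "Infants"},
}
rows = group_relation(["adult", "senior", "child", "infant"], clusters)
print(combine(rows))
# {'adult': 'Adults', 'senior': 'Seniors', 'child': 'Children', 'infant': 'Infants'}
```

Because the merged row keeps growing, consistency chains transitively: the nwa row shares no label with the delta row, but it joins once the orbitz row has contributed Children.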
Consistent Labeling of Groups of Fields
• Levels of consistency
  • String level
    • Two distinct tuples are consistent at this level if they have the same label (identical strings) for a cluster in the group relation
  • Equality level
    • Two distinct tuples are consistent at this level if they have equal labels (identical after normalization) for a cluster in the group relation
  • Synonymy level
    • Two distinct tuples are consistent at this level if they have synonym labels for a cluster in the group relation
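A minimal Python sketch that reports the strongest level at which two labels for the same cluster agree. It reads "same label" as identical strings and "equal labels" as identical sets of normalized content words. Those readings, the normalization rules, and the toy `SYNONYM_PAIRS` thesaurus (a stand-in for WordNet) are assumptions:

```python
import re
from itertools import permutations
from typing import Optional

STOPWORDS = {"a", "an", "and", "for", "in", "of", "or", "the"}  # assumed list
SYNONYM_PAIRS = {frozenset({"area", "field"}), frozenset({"study", "work"})}  # toy thesaurus


def normalize(label: str) -> frozenset:
    """Lowercase, drop parenthesized qualifiers and stop words, strip simple plurals."""
    words = re.findall(r"[a-z]+", re.sub(r"\([^)]*\)", " ", label.lower()))
    return frozenset(
        w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
        for w in words
        if w not in STOPWORDS
    )


def synonym_sets(a: frozenset, b: frozenset) -> bool:
    """Word sets are synonyms if their words pair up one-to-one as equal or synonymous."""
    if len(a) != len(b):
        return False
    ordered = list(a)
    return any(
        all(x == y or frozenset({x, y}) in SYNONYM_PAIRS for x, y in zip(ordered, p))
        for p in permutations(b)
    )


def consistency_level(l1: str, l2: str) -> Optional[str]:
    """Return the strongest level at which two labels agree, or None."""
    if l1 == l2:
        return "string"
    n1, n2 = normalize(l1), normalize(l2)
    if n1 == n2:
        return "equality"
    if synonym_sets(n1, n2):
        return "synonymy"
    return None


print(consistency_level("Adults", "Adults (18-64)"))       # equality
print(consistency_level("Area of Study", "Field of Work"))  # synonymy
```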
Consistent Labeling of Internal Nodes
• The problem
  • Given an internal node in the integrated interface, determine a label that is semantically suitable for it, i.e., whose semantics is rich enough to cover the semantics of all its descendant leaf nodes
• An example
  • [Figure: a fragment of the integrated interface of the Real Estate domain]
Consistent Labeling of Internal Nodes
• In assigning labels to internal nodes we mainly exploit two types of knowledge:
  • The semantic relationships among the labels of the internal nodes in the individual schema trees
  • The relationships between internal nodes of source schema trees with overlapping sets of descendant leaves
• The two types of knowledge are used to derive a set of logical inference rules among the textual labels
  • Some of them are illustrated next
Consistent Labeling of Internal Nodes
• First logical inference
  • Informally, consider two internal nodes v1 and v2 of two distinct source schema trees such that:
    • v1’s set of descendant leaves is a subset of v2’s set of descendant leaves, and
    • v1’s label is a hypernym of v2’s label
  • Then the labels of the two nodes are semantically equivalent within the given domain of discourse
• An example:
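The first rule can be written as a predicate over two source nodes. A minimal Python sketch, assuming a hypothetical `SourceNode` record (source, label, set of descendant leaf clusters) and a toy `HYPERNYMS` table in place of a thesaurus; the Location/Address labels are illustrative, not the slide's example:

```python
from dataclasses import dataclass

# Hypothetical (general, specific) hypernym pairs; a real system would query a thesaurus.
HYPERNYMS = {("location", "address")}


@dataclass(frozen=True)
class SourceNode:
    """Internal node of a source schema tree (hypothetical representation)."""
    source: str        # which source interface the node belongs to
    label: str
    leaves: frozenset  # cluster ids of its descendant leaves


def is_hypernym(general: str, specific: str) -> bool:
    return (general.lower(), specific.lower()) in HYPERNYMS


def first_inference(v1: SourceNode, v2: SourceNode) -> bool:
    """True if the rule lets us treat v1's and v2's labels as equivalent in this domain."""
    return (
        v1.source != v2.source          # nodes from two distinct source trees
        and v1.leaves <= v2.leaves      # v1's leaves are a subset of v2's
        and is_hypernym(v1.label, v2.label)
    )


v1 = SourceNode("site_a", "Location", frozenset({"city", "state"}))
v2 = SourceNode("site_b", "Address", frozenset({"street", "city", "state", "zip"}))
print(first_inference(v1, v2))  # True
```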
Consistent Labeling of Internal Nodes
• Second logical inference (the idea):
  • The same label is assigned to internal nodes in multiple source query interfaces, and the descendant leaves of each such internal node are among those of the internal node in the integrated interface for which a label is sought
• An example:
  • Fragment of the integrated query interface
  • Within source query interfaces
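The idea behind the second rule can be sketched in Python: collect the labels that several source interfaces give to internal nodes whose descendant leaves all fall under the integrated node being labeled. The `SourceNode` record, the threshold `min_sources=2` for "multiple", exact string comparison of labels, and the data are all assumptions:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class SourceNode:
    """Internal node of a source schema tree (hypothetical representation)."""
    source: str
    label: str
    leaves: frozenset  # cluster ids of its descendant leaves


def second_inference(target_leaves: frozenset, nodes: Iterable[SourceNode],
                     min_sources: int = 2) -> Set[str]:
    """Labels used by at least `min_sources` interfaces on nodes under the target."""
    sources_by_label = defaultdict(set)
    for node in nodes:
        if node.leaves <= target_leaves:  # all of the node's leaves lie under the target
            sources_by_label[node.label].add(node.source)
    return {label for label, srcs in sources_by_label.items() if len(srcs) >= min_sources}


target = frozenset({"adult", "senior", "child", "infant"})
nodes = [
    SourceNode("delta", "Passengers", frozenset({"adult", "child"})),
    SourceNode("aa", "Passengers", frozenset({"adult", "senior", "child"})),
    SourceNode("nwa", "Travelers", frozenset({"adult", "infant"})),
    SourceNode("orbitz", "Passengers", frozenset({"adult", "cabin"})),  # "cabin" is outside the target
]
print(second_inference(target, nodes))  # {'Passengers'}
```

Counting distinct sources rather than nodes keeps one interface that repeats a label from looking like agreement among several.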
Consistent Labeling of Internal Nodes
• Third logical inference (hypernymy scenario)
  • Informally, consider two internal nodes v1 and v2 of two distinct source schema trees such that:
    • v1’s label is a hypernym of v2’s label
  • Then v1’s label semantically covers the union of the descendant nodes of the two nodes
• An example:
  • Fragment of the integrated query interface
  • Within source query interfaces
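A minimal Python sketch of the third rule, assuming the same kind of hypothetical `SourceNode` record and toy `HYPERNYMS` table as a thesaurus stand-in, and representing "descendant nodes" by sets of descendant leaf clusters. It returns the covering label together with the leaves it covers:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

HYPERNYMS = {("location", "address")}  # hypothetical (general, specific) pairs


@dataclass(frozen=True)
class SourceNode:
    """Internal node of a source schema tree (hypothetical representation)."""
    source: str
    label: str
    leaves: frozenset  # cluster ids of its descendant leaves


def third_inference(v1: SourceNode, v2: SourceNode) -> Optional[Tuple[str, frozenset]]:
    """If v1's label is a hypernym of v2's, v1's label covers both nodes' leaves."""
    if v1.source != v2.source and (v1.label.lower(), v2.label.lower()) in HYPERNYMS:
        return v1.label, v1.leaves | v2.leaves
    return None


v1 = SourceNode("site_a", "Location", frozenset({"city", "state"}))
v2 = SourceNode("site_b", "Address", frozenset({"street", "zip"}))
label, covered = third_inference(v1, v2)
print(label, sorted(covered))  # Location ['city', 'state', 'street', 'zip']
```

Unlike the first rule, no subset condition is required here: the leaf sets may differ, and the more general label is stretched over their union.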
Where can the instances help?
• Discarding labels that are values
  • The problem is known as schema element name as value [Xu03, Dhamankar04]
  • E.g., in the Book domain, labels like Hardcover or Paperback are data instances of fields with labels like Format or Binding
• Reconciling most general vs. most descriptive labels
  • The idea is to restrict the meaning of the most general label to a more descriptive one
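For the first use of instances, a minimal Python sketch: a candidate label is discarded when it matches, case-insensitively, an instance value of some field. The exact-match test, the function name `discard_value_labels`, and the data are assumptions for illustration:

```python
from typing import Dict, Iterable, List


def discard_value_labels(candidates: Iterable[str],
                         field_values: Dict[str, Iterable[str]]) -> List[str]:
    """Drop candidate labels that appear among the instance values of some field."""
    values = {v.strip().lower() for vals in field_values.values() for v in vals}
    return [c for c in candidates if c.strip().lower() not in values]


# Hypothetical Book-domain data: instance values of a field labeled Format.
field_values = {"format": ["Hardcover", "Paperback", "Audio CD"]}
print(discard_value_labels(["Format", "Binding", "Hardcover", "Paperback"], field_values))
# ['Format', 'Binding']
```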
Experiment
• Setup
  • Seven real-world domains
    • Also used in Wu04 et al, Madhavan05 et al, Dragut06 et al
Experiment
• Human acceptance
  • Questions asked:
    • Do you have any difficulty in filling in an entry for each field?
    • If you do, identify the fields you had difficulty filling in.
    • Are the fields understandable on the source interfaces?
  • 11 survey respondents reported the following:
Example Integrated Interfaces
• Airfare domain integrated interface
  • Four people found the group confusing
  • [Figure: the source query interface]
Example Integrated Interfaces
• Auto domain integrated interface
  • No surveyed person identified any problem with this integrated query interface
End
• Please visit the project web site: http://www.cs.uic.edu/~edragut/QIProject.html
Thank you for your time and patience!