Entity Queries
Seminar by Pankaj Vanwari, under the guidance of Dr. S. Sudarshan
Overview of Presentation
• Introduction to entity queries
• Keyword search on structured data
• Querying over unstructured data
• Entity queries using ontology-based extraction
• Entity-relationship queries
• Conclusion and future work
Introduction
• Querying databases using keyword search
• Restricted to retrieving pages/documents
• Entity search on the World Wide Web
• Annotations and semantic links added to text
• Wikipedia, WordNet, etc. as sources
• Entity "near" queries, indexing and ranking
• Entity-relationship search to find relationships between entities
Keyword search over graph-structured data
• Simple searching and browsing of data.
• The user types a few keywords and then follows hyperlinks interactively.
• The database is modeled as a graph.
• Uses proximity-based ranking, based on foreign-key and other similar links.
• Useful for searching an enterprise database for information without a query language.
BANKS (Browsing ANd Keyword Searching)
• Tuples of the relational database constitute the nodes of the graph.
• Each foreign key to primary key link is a directed edge (to avoid “hubs”).
• A link with higher importance is given a lower weight.
• A query result is a rooted directed tree.
• For each edge (u, v), a backward edge (v, u) is added, with weight based on the number of links to v from nodes of the same type as u.
Formal database model of BANKS
• s(R(u), R(v)) denotes the similarity between the relations R(u) and R(v) of nodes u and v.
• If edge (u, v) exists but (v, u) does not, then w(u, v) = s(R(u), R(v)).
• If (u, v) does not exist but (v, u) does, then w(u, v) = IN_v(u) * s(R(v), R(u)).
• If both exist, the weight is the minimum of the two expressions above.
• The overall relevance score is obtained from the normalized edge and node scores.
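The three weight rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the BANKS implementation: the toy tuples, the constant similarity of 1.0, and the helper names `in_degree_from` and `edge_weight` are all assumptions, and IN_v(u) is read here as the number of edges into u from tuples of relation R(v), in line with the backward-edge description on the previous slide.

```python
# Toy data (hypothetical): each node is a tuple, mapped to its relation name.
RELATION = {"p1": "Paper", "a1": "Author", "w1": "Writes", "w2": "Writes"}

# Directed foreign-key -> primary-key edges between tuples.
EDGES = {("w1", "p1"), ("w1", "a1"), ("w2", "p1")}


def similarity(rel_a, rel_b):
    """s(R(u), R(v)); assumed constant here for illustration."""
    return 1.0


def in_degree_from(v, u):
    """IN_v(u): edges into u coming from tuples of relation R(v) (assumed reading)."""
    return sum(1 for (src, dst) in EDGES
               if dst == u and RELATION[src] == RELATION[v])


def edge_weight(u, v):
    """Weight w(u, v) following the three cases on the slide."""
    candidates = []
    if (u, v) in EDGES:  # forward edge exists
        candidates.append(similarity(RELATION[u], RELATION[v]))
    if (v, u) in EDGES:  # only reachable through the backward edge
        candidates.append(in_degree_from(v, u)
                          * similarity(RELATION[v], RELATION[u]))
    # If both exist, take the minimum; if neither exists, there is no edge.
    return min(candidates) if candidates else None


print(edge_weight("w1", "p1"))  # forward edge
print(edge_weight("p1", "w1"))  # backward edge, scaled by IN_v(u)
```

In this toy graph the paper p1 is referenced by two Writes tuples, so the backward edge out of p1 is heavier than the forward edge into it, which is how heavily referenced nodes are kept from creating cheap shortcuts.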
Querying over unstructured data
• The World Wide Web supports keyword search but not entity search.
• Entities, rather than pages, are treated as first-class citizens.
• Web documents carry no schema information to browse, unlike the databases used in BANKS.
• Statistics from a large corpus, with scoring and ranking techniques from IR, can be useful.
• Challenges: indexing and annotations.
CSAW
• Scaling entity search to the World Wide Web.
• Major components: catalog, corpus and query processor.
• Data model of CSAW.
• Indexes used in the CSAW system: the stem and full atype indexes, the reachability index and the forward index.
• Scoring in CSAW: selector energy, gap and decay, and aggregation.
Entity Search with Dual-Inversion Index
• Dual-inversion index: a document inverted index and an entity inverted index.
• Document inverted index: given an entity type E, maps to the documents in which entities of type E occur.
• Entity inverted index: takes keywords as input and returns entity instances as output.
• Comparison of document and entity inverted indices.
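The two indexes described above can be sketched over a toy annotated corpus. This is a minimal sketch under strong assumptions: the corpus layout, the function names, and the rule that a keyword is associated with every entity in the same document are all hypothetical simplifications (a real system would use positions and scoring rather than plain co-occurrence).

```python
from collections import defaultdict

# Hypothetical annotated corpus: words plus (entity type, instance) annotations.
DOCS = {
    "d1": {"words": ["amazon", "customer", "service"],
           "entities": [("phone", "555-0100")]},
    "d2": {"words": ["amazon", "books"],
           "entities": [("phone", "555-0199")]},
    "d3": {"words": ["weather"], "entities": []},
}


def build_indexes(docs):
    """Build a document inverted index and an entity inverted index."""
    doc_index = defaultdict(set)     # entity type -> ids of documents containing it
    entity_index = defaultdict(set)  # keyword -> (type, instance) pairs seen with it
    for doc_id, doc in docs.items():
        for etype, instance in doc["entities"]:
            doc_index[etype].add(doc_id)
            for word in doc["words"]:
                entity_index[word].add((etype, instance))
    return doc_index, entity_index


def entity_search(entity_index, keywords, etype):
    """Return instances of `etype` associated with every keyword."""
    postings = [entity_index.get(k, set()) for k in keywords]
    common = set.intersection(*postings) if postings else set()
    return {instance for (t, instance) in common if t == etype}


doc_index, entity_index = build_indexes(DOCS)
print(sorted(doc_index["phone"]))
print(entity_search(entity_index, ["amazon", "service"], "phone"))
```

The contrast is visible in the return types: the document index yields documents that still have to be scanned for entities, while the entity index yields entity instances directly.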
EntityRank (searching directly and holistically)
• Integrates both local and global information in ranking.
• Example query: ow(amazon customer service #phone)
• Entity search needs to be contextual, holistic, uncertain, associative and discriminative.
• Three-layer model: access (global), recognition (local) and validation (hypothesis testing).
Entity queries using ontology-based extraction
• Knowledge representation models such as RDFS, with a general-purpose ontology on top of these representations.
• Two ways of extracting knowledge structures automatically from text corpora: NLP/machine learning, or human annotations.
• YAGO, YAGO2 and ESTER are all based on the second approach, with differences.
YAGO (Yet Another Great Ontology)
• YAGO combines Wikipedia categories with the WordNet ontology.
• Extracts facts based on fixed relations.
• A fact is a triple with a fact identifier: y : I → (I ∪ C ∪ R) × R × (I ∪ C ∪ R)
• Compatible with RDF.
• Relations: Type, SubClassOf, Means, …
• Other relations: BornInYear, PoliticianOf, …
• Meta relations: Describes, Context, …
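The fact model y : I → (I ∪ C ∪ R) × R × (I ∪ C ∪ R) can be read as a mapping from fact identifiers to triples whose arguments may themselves be fact identifiers. The sketch below only illustrates that reading: the identifiers `f1` and `f2`, the entity names, and the function `is_valid_ontology` are hypothetical and not taken from YAGO itself.

```python
# C: common entities (literals such as years are treated as entities here).
ENTITIES = {"AlbertEinstein", "physicist", "1879", "ExampleWikipediaPage"}

# R: relation names (a few of those listed on the slide).
RELATIONS = {"Type", "SubClassOf", "Means", "BornInYear", "Context"}

# y: fact identifier -> (argument, relation, argument).
FACTS = {
    "f1": ("AlbertEinstein", "BornInYear", "1879"),
    # A fact about another fact: its first argument is the identifier f1.
    "f2": ("f1", "Context", "ExampleWikipediaPage"),
}


def is_valid_ontology(facts, entities, relations):
    """Check that every triple lies in (I ∪ C ∪ R) × R × (I ∪ C ∪ R)."""
    universe = set(facts) | entities | relations
    return all(
        subj in universe and rel in relations and obj in universe
        for (subj, rel, obj) in facts.values()
    )


print(is_valid_ontology(FACTS, ENTITIES, RELATIONS))
```

Giving each fact its own identifier is what allows meta relations such as Context to point at other facts rather than only at entities.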
YAGO2 (extension of YAGO)
• Focus on temporal and spatial knowledge.
• Declarative rules stored in text files.
Temporal dimension
• Facts can hold only time points; time spans are represented by two relations.
• Four entity types (people, groups, artifacts and events).
• Nine relations are generalized to two relations (StartsExistingOn and EndsExistingOn).
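Representing a time span by two time-point facts can be sketched as follows. The relation names StartsExistingOn and EndsExistingOn come from the slide; the entities, the dates, and the helper functions are hypothetical, and treating a missing end point as "still existing" is an assumption made only for this illustration.

```python
from datetime import date

# Each fact holds a single time point; a span needs one fact per end.
FACTS = [
    ("ExampleBand", "StartsExistingOn", date(1960, 1, 1)),
    ("ExampleBand", "EndsExistingOn", date(1970, 12, 31)),
    ("ExamplePerson", "StartsExistingOn", date(1950, 5, 17)),
]


def existence_span(entity, facts):
    """Collect the start and end time points of an entity, if present."""
    start = end = None
    for subj, rel, value in facts:
        if subj != entity:
            continue
        if rel == "StartsExistingOn":
            start = value
        elif rel == "EndsExistingOn":
            end = value
    return start, end


def exists_at(entity, when, facts):
    """True if `when` falls inside the span assembled from the two facts."""
    start, end = existence_span(entity, facts)
    if start is None:
        return False
    return start <= when and (end is None or when <= end)


print(exists_at("ExampleBand", date(1965, 6, 1), FACTS))
print(exists_at("ExampleBand", date(1980, 1, 1), FACTS))
```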
YAGO2 continued…
Spatial dimension
• Harvests geo-entities from two sources: Wikipedia and GeoNames.
• The class yagoGeoEntity groups all geo-entities, which are related by hasGeoCoordinates to yagoGeoCoordinates.
• Three entity types (events, groups and artifacts).
• Two relations are generalized to placedIn.
• The relation occursIn holds between a fact and a geo-entity.
ESTER (Efficient Search on Text, Entities and Relations)
• Combined full-text and ontology search system. The input is a corpus and an ontology.
• Three components: an entity recognizer, a query engine and a user interface.
• Entity recognition adds, at position 0, the artificial word <c>:<x> for each top-level category c of which x is an instance.
• For a fact (x, r, y) from YAGO, the following artificial words are added: at position 1, <r>:<p>, and at position p, entity:<y>.
ESTER continued…
• The query engine produces lists of word-in-document occurrences, each item consisting of a document id, a word id, a score and a position within the document.
• Two basic operations: prefix search and join.
• Given two occurrence lists produced by prefix search, the join operation computes a single list of all items whose word ids occur in both lists, sorted by document id.
• Proactive user interface.
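The two basic operations can be sketched directly from the description above. This is a simplified sketch, not ESTER's implementation: the in-memory list `INDEX`, the use of strings as word ids, the optional `doc_ids` filter, and the tie-breaking by position are all assumptions made for illustration.

```python
from typing import NamedTuple


class Occurrence(NamedTuple):
    """One word-in-document occurrence, with the four fields named on the slide."""
    doc_id: int
    word_id: str  # strings stand in for numeric word ids (assumption)
    score: float
    position: int


# Hypothetical index containing ordinary words and artificial entity words.
INDEX = [
    Occurrence(1, "entity:alice", 1.0, 0),
    Occurrence(1, "award", 1.0, 5),
    Occurrence(2, "entity:alice", 1.0, 3),
    Occurrence(2, "entity:bob", 1.0, 7),
    Occurrence(3, "entity:carol", 1.0, 2),
]


def prefix_search(index, prefix, doc_ids=None):
    """All occurrences of words starting with `prefix`, optionally within doc_ids."""
    return [o for o in index
            if o.word_id.startswith(prefix)
            and (doc_ids is None or o.doc_id in doc_ids)]


def join(list_a, list_b):
    """Items whose word ids occur in both lists, sorted by document id."""
    common = {o.word_id for o in list_a} & {o.word_id for o in list_b}
    merged = {o for o in list_a + list_b if o.word_id in common}
    return sorted(merged, key=lambda o: (o.doc_id, o.position, o.word_id))


left = prefix_search(INDEX, "entity:", doc_ids={1})
right = prefix_search(INDEX, "entity:", doc_ids={2, 3})
for item in join(left, right):
    print(item)
```

Because entities and relations are encoded as artificial words, the same two operations serve both the full-text part and the ontology part of a query.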
Entity-relationship queries over the annotated web
• Example query: “Find cities and countries in Europe where cities are capitals of respective countries”.
• Entity-relationship queries (ERQ) handle relationships among entities across several pages.
• High algorithmic complexity.
• Entities are scored individually and the scores are then aggregated.
WikiERQ: SSQ (Shallow Semantic Queries)
• ERQ directly over text.
• Example query: “Find cities and countries in Europe where cities are capitals of respective countries”.
• Position-based BCM for ranking answers; its key components are proximity, ordering and mutual exclusion.
• Single-predicate scoring.
• Multiple-predicate scoring.
WikiBANKS
• The extended graph model combines the graph model of BANKS with a document model.
• Each Wikipedia page/document is represented by a node in the graph.
• Near query model: find C near (K).
• Query evaluation algorithm: selection predicates are evaluated individually as near queries, and the resulting entity lists are then used to evaluate the relation predicates (two approaches).
WikiCSAW
• ERQ over the highly scalable CSAW system.
• Queries in a master-slave configuration.
• Category-keyword mapping.
• Optimizing ERQ over CSAW: entity-type and keyword pair postings to improve the merge step; compound Token-AND iterator.
• Scoring based on entity, relation and node prestige, combined with weights.
Conclusion and Future Work
• Challenges faced by the different approaches.
• Enterprises can add artificial words to link other pages (manually or by defining rules).
• Integration of data through standards such as RDF.
• Domain-centric concept search to handle scalability; ontology-based mapping of user keywords to domains for higher accuracy.
• Need for annotation of relations.
• Complex operations for ad hoc queries.
Questions?
Thank You