TQL (text query language) Alexander Kotov Sungeun Kim Yeonjung Chung
Problem definition • Text is the most common medium for storing and transferring information; • Can knowledge be extracted from such data automatically, and if so, how? • How can the extracted information be stored and accessed efficiently? • Information extracted from natural language text is not completely reliable (sources often contradict each other); • Textual information is highly unstructured, so the relational model does not fit it.
Problem definition • Named entity taggers can extract entities, and dependency parsers can extract relations, with reasonably high accuracy (extraction); • An entity-relation graph is a simple and powerful way of representing textual information (storage); • A probabilistic measure of trustworthiness, which quantifies the probability that a piece of information is correct, turns it into a “fuzzy” entity-relation graph (unreliability); • Access: how should such a graph be queried?
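A minimal sketch of what a "fuzzy" entity-relation graph could look like in code. The slides do not say how the probability is computed, so the noisy-OR rule for merging repeated evidence, the class name `FuzzyERGraph`, the direction of each relation, and the confidence values are all assumptions made for illustration.

```python
from collections import defaultdict


class FuzzyERGraph:
    """Entity-relation graph whose edges carry a confidence in [0, 1]."""

    def __init__(self):
        # (subject, relation, object) -> confidence that the fact is correct
        self.confidence = defaultdict(float)

    def add_evidence(self, subj, rel, obj, p):
        """Merge one extracted fact with confidence p (noisy-OR, an assumption)."""
        if not 0.0 <= p <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        key = (subj, rel, obj)
        prior = self.confidence[key]
        # The fact is wrong only if every independent source is wrong.
        self.confidence[key] = 1.0 - (1.0 - prior) * (1.0 - p)

    def facts(self, threshold=0.0):
        """Return the facts whose confidence is at least `threshold`."""
        return sorted(k for k, c in self.confidence.items() if c >= threshold)


g = FuzzyERGraph()
# Two hypothetical sources report the same birth date with made-up confidences.
g.add_evidence("John Fitzgerald Kennedy", "S-BIRTHDATE", "May 29th, 1917", 0.6)
g.add_evidence("John Fitzgerald Kennedy", "S-BIRTHDATE", "May 29th, 1917", 0.5)
g.add_evidence("Washington, D.C.", "R-CAPITAL-PLACE", "United States", 0.9)
```

Under this rule, two moderately confident sources that agree raise the confidence of a fact above either source alone, and `facts(threshold)` gives a simple way to hide unreliable edges at query time.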
Problem definition [Figure: example entity-relation graph. Entities: Washington, D.C.; United States; Massachusetts; Boston; John Fitzgerald Kennedy; Jacqueline Kennedy; May 29th, 1917. Relation labels: R-CAPITAL-PLACE, A-WHY-FAMOUS-PERSON, S-BIRTHPLACE, S-BIRTHDATE, A-DEFINITION-PERSON.]
Text Query Language • Generality is achieved by defining a minimal yet powerful set of operators; • The language should support multiple application scenarios; • A generalized and flexible structure like that of SQL is not possible; • Declarative vs. functional.
Application scenarios • Inference (infer relations or linkages between entities or find an entity that is "remotely connected" with some known entities); • Navigation (the goal is to navigate from some known entities to other interesting (unknown) entities); • Comparison (the goal is to compare two groups of entities to figure out differences and similarities).
Possible queries • Query about the connection between two entities. The result identifies a path, or a set of paths, in the entity-relation graph, consisting of entities and the relations between them; • Query about the entities related to a particular entity (neighbor finding); • Query about similar entities (entities related to the same entities); • Query about entities satisfying conditions of arbitrary complexity (e.g., find entities that are in particular relations with some other entities).
Data Definition Language
TYPE (TypeName)
REL(Entity(Type), Entity(Type))
INSTANCE (
    REL(Entity(Type), Entity(Type)),
    REL(Entity(Type), Entity(Type)),
    REL(Entity(Type), Entity(Type)),
    REL(Entity(Type), Entity(Type))
)
Data Definition Language
Example sentence: “John goes to school in Urbana.”
[Figure: dependency parse of the sentence, with nodes numbered 0 to 5 and labeled verb, subj:person, mod, pcomp-n, mod, pcomp-n:loc.]
- An instance corresponds to one sentence:
  . John : subject (type subj:person)
  . goes : verb (type verb)
  . to, in : modifier (type mod)
  . school : complement (type pcomp-n)
  . Urbana : complement (type pcomp-n:loc)
- An instance is a set of relations; this sentence yields 5 relations:
INSTANCE(
    REL(John(subj:person), goes(verb)),
    REL(goes, to(mod)),
    REL(to, school(pcomp-n)),
    REL(school, in(mod)),
    REL(in, Urbana(pcomp-n:loc))
)
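To make the instance format concrete, here is a rough Python sketch that reads the INSTANCE text for "John goes to school in Urbana" into a list of relations and a table of entity types. The regular expression and the `parse_instance` helper are hypothetical; the slides do not describe the actual TQL parser, and this sketch only handles the simple `Name(Type)` form shown in the example.

```python
import re

INSTANCE = """
INSTANCE(
    REL(John(subj:person), goes(verb)),
    REL(goes, to(mod)),
    REL(to, school(pcomp-n)),
    REL(school, in(mod)),
    REL(in, Urbana(pcomp-n:loc))
)
"""

# One REL argument: a name, optionally followed by a type in parentheses.
ARG = r"(\w+)(?:\(([^()]*)\))?"
REL = re.compile(r"REL\(\s*" + ARG + r"\s*,\s*" + ARG + r"\s*\)")


def parse_instance(text):
    """Return (relations, types) for one INSTANCE block."""
    relations, types = [], {}
    for left, ltype, right, rtype in REL.findall(text):
        for name, typ in ((left, ltype), (right, rtype)):
            # Record a type the first time it appears; reject contradictions.
            if typ and types.setdefault(name, typ) != typ:
                raise ValueError(f"conflicting types for {name!r}")
        relations.append((left, right))
    return relations, types


relations, types = parse_instance(INSTANCE)
print(len(relations))   # 5
print(types["Urbana"])  # pcomp-n:loc
```

An entity is typed once, where it first appears with a type, and later untyped mentions (such as `goes` in the second relation) reuse that type, which is one simple way to keep typed entities consistent.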
Mining Language
FIND CONNECTION (Entity1, Entity2)
FIND RELATED (Entity)
FIND SIMILAR (Entity)
FIND ENTITY (Entity(Type),
    CONSTRAINTS (
        REL(Entity(Type), Entity(Type)),
        REL(Entity(Type), Entity(Type)),
        REL(Entity(Type), Entity(Type)),
        REL(Entity(Type), Entity(Type)),
        REL(Entity(Type), Entity(Type))
    )
)
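The slides give only the syntax of FIND ENTITY, not its semantics. One possible reading, sketched below in Python, is: return every entity of the requested type for which all constraint relations exist once the unknown entity is substituted in. The `find_entity` function, the use of `None` as the placeholder for the unknown entity, and the sample data (taken from the "John goes to school in Urbana" instance) are assumptions for illustration.

```python
# Relations and types of the example instance (word-level, as in the DDL slide).
RELATIONS = [
    ("John", "goes"),
    ("goes", "to"),
    ("to", "school"),
    ("school", "in"),
    ("in", "Urbana"),
]
TYPES = {
    "John": "subj:person",
    "goes": "verb",
    "to": "mod",
    "school": "pcomp-n",
    "in": "mod",
    "Urbana": "pcomp-n:loc",
}


def find_entity(wanted_type, constraints, relations, types):
    """Return entities of `wanted_type` that satisfy every constraint.

    Each constraint is a (left, right) pair in which None stands for the
    entity being searched for.
    """
    linked = set(relations)
    matches = []
    for candidate, typ in types.items():
        if typ != wanted_type:
            continue
        # Substitute the candidate for the placeholder in every constraint.
        bound = [
            (candidate if left is None else left,
             candidate if right is None else right)
            for left, right in constraints
        ]
        if all(pair in linked for pair in bound):
            matches.append(candidate)
    return sorted(matches)


print(find_entity("pcomp-n:loc", [("in", None)], RELATIONS, TYPES))  # ['Urbana']
print(find_entity("mod", [(None, "school")], RELATIONS, TYPES))      # ['to']
```

With an empty constraint list the function simply returns all entities of the requested type, so constraints only ever narrow the result.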
Query execution
Query: FIND RELATED (John)
Source sentences: • John goes to school in Urbana • John is a brother of Mary • John lives in Urbana • Mary graduated from University of Illinois at Urbana • Mary lives in New York
[Figure: entity-relation graph built from these sentences. Nodes: John, Mary, Urbana, School, Brother, University of Illinois, New York, and the linking words go, to, in, live, be, at, graduate, of, from; every edge is labeled 1.]
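FIND RELATED can be read as neighbor finding. The sketch below is a simplification, not the authors' implementation: it collapses the word-level graph of the slide into an entity-level graph in which each sentence contributes one labeled edge (the sentence about the University of Illinois contributes two). The edge list, `build_graph`, and `find_related` are hypothetical names.

```python
from collections import defaultdict

# Entity-level edges derived by hand from the five example sentences.
EDGES = [
    ("John", "goes to school in", "Urbana"),
    ("John", "brother of", "Mary"),
    ("John", "lives in", "Urbana"),
    ("Mary", "graduated from", "University of Illinois"),
    ("University of Illinois", "at", "Urbana"),
    ("Mary", "lives in", "New York"),
]


def build_graph(edges):
    """Index edges by entity, treating every relation as undirected."""
    graph = defaultdict(set)
    for subj, rel, obj in edges:
        graph[subj].add((rel, obj))
        graph[obj].add((rel, subj))
    return graph


def find_related(graph, entity):
    """Return the entities directly linked to `entity`, sorted by name."""
    return sorted({other for _rel, other in graph.get(entity, ())})


GRAPH = build_graph(EDGES)
print(find_related(GRAPH, "John"))  # ['Mary', 'Urbana']
```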
Query execution
Query: FIND CONNECTION (John, New York)
[Figure: entity-relation graph with nodes John, Mary, Urbana, School, Brother, University of Illinois, New York, and the linking words go, to, in, live, be, at, graduate, of, from; every edge is labeled 1.]
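One straightforward way to answer FIND CONNECTION is a breadth-first search for a shortest path between the two entities. The slides do not state which search strategy TQL uses, so this is a sketch under that assumption, again on a hand-built entity-level edge list rather than the word-level graph of the figure; `find_connection` is a hypothetical name.

```python
from collections import deque

EDGES = [
    ("John", "goes to school in", "Urbana"),
    ("John", "brother of", "Mary"),
    ("John", "lives in", "Urbana"),
    ("Mary", "graduated from", "University of Illinois"),
    ("University of Illinois", "at", "Urbana"),
    ("Mary", "lives in", "New York"),
]


def find_connection(edges, source, target):
    """Return a shortest path as (from, relation, to) steps, or None."""
    adjacency = {}
    for subj, rel, obj in edges:
        # Relations are traversed in both directions.
        adjacency.setdefault(subj, []).append((rel, obj))
        adjacency.setdefault(obj, []).append((rel, subj))

    parents = {source: None}  # node -> (previous node, relation used)
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for rel, nxt in adjacency.get(node, []):
            if nxt not in parents:
                parents[nxt] = (node, rel)
                queue.append(nxt)

    if target not in parents:
        return None
    # Walk back from the target to the source, then reverse.
    path, node = [], target
    while parents[node] is not None:
        prev, rel = parents[node]
        path.append((prev, rel, node))
        node = prev
    path.reverse()
    return path


print(find_connection(EDGES, "John", "New York"))
# [('John', 'brother of', 'Mary'), ('Mary', 'lives in', 'New York')]
```

Breadth-first search returns only one shortest path; returning "a set of paths", as the query description allows, would need a different enumeration strategy.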
Query execution
Query: FIND SIMILAR (John)
[Figure: entity-relation graph with nodes John, Mary, Urbana, School, Brother, University of Illinois, New York, and the linking words go, to, in, live, be, at, graduate, of, from; every edge is labeled 1.]
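Since similar entities are described as "entities related to the same entities", one simple scoring choice is the Jaccard overlap of neighbor sets. The slides do not name a similarity measure, so Jaccard, the `find_similar` function, and the entity-level edge list below are assumptions for illustration only.

```python
EDGES = [
    ("John", "goes to school in", "Urbana"),
    ("John", "brother of", "Mary"),
    ("John", "lives in", "Urbana"),
    ("Mary", "graduated from", "University of Illinois"),
    ("University of Illinois", "at", "Urbana"),
    ("Mary", "lives in", "New York"),
]


def find_similar(edges, entity):
    """Rank other entities by Jaccard overlap of their neighbor sets."""
    neighbors = {}
    for subj, _rel, obj in edges:
        neighbors.setdefault(subj, set()).add(obj)
        neighbors.setdefault(obj, set()).add(subj)

    mine = neighbors.get(entity, set())
    scored = []
    for other, theirs in neighbors.items():
        if other == entity:
            continue
        shared = mine & theirs
        if shared:  # keep only entities with at least one common neighbor
            scored.append((other, len(shared) / len(mine | theirs)))
    # Highest score first; ties broken alphabetically.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


print(find_similar(EDGES, "John"))
# [('University of Illinois', 1.0), ('New York', 0.5)]
```

The top result shows the limits of a purely structural measure: John and the University of Illinois share the neighbors Mary and Urbana but are entities of different kinds, so a practical version would probably also compare entity types or relation labels.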
Technical challenges • A flexible parser; • The entity-relation graph is a complex data structure with a significant level of redundancy (hashing and complex indexing are needed to reduce space); • Maintaining type information and consistency between typed entities; • Implementing efficient query execution strategies.
Future work • Extension of the language by adding new operators (coverage); • Optimization of query execution performance (efficiency); • Automated generation of instances from natural language sentences (usability).