Using Ontological Relationships to Provide Indexing of Plain Text Searches
Research by Fletcher Liverance
fletcher.liverance@gmail.com
November 14th, 2011
How Does a Search Engine Work?
1. User submits a keyword-based query to the search engine
2. The indexer locates all relevant pages containing those keywords
3. The database returns all pages found in the index
4. Pages are ranked and returned to the user
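The four steps can be sketched as a toy inverted index in Python. This is a minimal illustration under stated assumptions, not how any production engine works. The page texts, the tokenizer, and the function names `build_index` and `search` are hypothetical, and ranking simply counts keyword occurrences.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lower-case the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict) -> dict:
    """Build an inverted index: keyword -> set of IDs of pages containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in tokenize(text):
            index[term].add(page_id)
    return index


def search(query: str, pages: dict, index: dict) -> list:
    """Return IDs of pages containing any query keyword, best match first."""
    terms = tokenize(query)  # step 1: the keyword query
    # Steps 2-3: the index yields every page containing at least one keyword.
    candidates = set().union(*(index.get(t, set()) for t in terms))

    # Step 4: rank by total occurrences of the query keywords (ties by page ID).
    def score(page_id):
        words = tokenize(pages[page_id])
        return sum(words.count(t) for t in terms)

    return sorted(candidates, key=lambda p: (-score(p), p))


# Hypothetical example pages.
PAGES = {
    "a": "Winnie the Pooh is a bear created by A. A. Milne",
    "b": "A yellow bear in a red shirt",
    "c": "Piglet is a small pig",
}
INDEX = build_index(PAGES)

if __name__ == "__main__":
    print(search("yellow bear", PAGES, INDEX))  # ['b', 'a']
```

Pure keyword matching is both the strength and the weakness here. A page is found only if it literally shares words with the query.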
How Does a Search Engine Work?
Benefits
• Fast
• Machine-learnable
• Straightforward
Drawbacks
• Pattern matching
• Keyword-based
• Garbage in, garbage out
Garbage in, Garbage out
Scenario: You saw this television series and you’d like to find out more about it, but you don’t know the name of the series or of any of its characters. What do you do?
[Image: Winnie the Pooh wallpaper (http://www.dan-dare.org/FreeFun/Images/CartoonsMoviesTV/WinnieThePoohWallpaper1024.jpg)]
Garbage in, Garbage out
POOR RESULTS!
Garbage in, Garbage out
GOOD RESULTS!
Semantic Relationships
• Ontology
  “An ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents.” (http://www-ksl.stanford.edu/kst/what-is-an-ontology.html)
• Resource Description Framework (RDF)
  “RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link. Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.” (http://www.w3.org/RDF/)
[Figure: example RDF graph. Winnie the Pooh isMadeBy Disney, isA Bear, hasFriend Piglet, hasClothing Shirt, hasColor Yellow; Shirt hasColor Red; Piglet isA Pig]
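Below is a minimal sketch of the Winnie the Pooh example as RDF-style (subject, predicate, object) triples. It assumes plain Python tuples instead of a real RDF store; real RDF names resources with URIs. The edges are one reading of the slide's diagram. The `match` helper is hypothetical and mimics a single triple-pattern query in which `None` acts as a wildcard.

```python
# Each fact is a (subject, predicate, object) triple, the basic unit of RDF.
# Plain strings stand in for the URIs a real RDF graph would use.
TRIPLES = {
    ("Winnie the Pooh", "isMadeBy", "Disney"),
    ("Winnie the Pooh", "isA", "Bear"),
    ("Winnie the Pooh", "hasFriend", "Piglet"),
    ("Winnie the Pooh", "hasClothing", "Shirt"),
    ("Winnie the Pooh", "hasColor", "Yellow"),
    ("Shirt", "hasColor", "Red"),
    ("Piglet", "isA", "Pig"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    # "What color is Winnie the Pooh?"
    print(match(TRIPLES, s="Winnie the Pooh", p="hasColor"))
    # [('Winnie the Pooh', 'hasColor', 'Yellow')]
```

Because the relationship itself is named, a program can ask typed questions such as "what is Pooh's friend?" rather than only "which pages mention Pooh?".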
Semantic Relationships
How can we locate useful semantic relationships?
• Link distance
• Link direction
• Link relationship
[Figure: the Winnie the Pooh graph extended with Bear isA Mammal, Bear hasColor Brown, Disney isA Company, and Yellow hasRGB 0xFFFF00]
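One way to make the three criteria concrete is a breadth-first search over the triples. The search caps the number of hops (link distance), optionally walks edges backwards (link direction), and restricts which predicates it follows (link relationship). This is a hypothetical sketch: `related_concepts` and its parameters are not from the source. The data is one reading of the extended example graph.

```python
from collections import defaultdict, deque

TRIPLES = [
    ("Winnie the Pooh", "isMadeBy", "Disney"),
    ("Winnie the Pooh", "isA", "Bear"),
    ("Winnie the Pooh", "hasFriend", "Piglet"),
    ("Winnie the Pooh", "hasClothing", "Shirt"),
    ("Winnie the Pooh", "hasColor", "Yellow"),
    ("Shirt", "hasColor", "Red"),
    ("Piglet", "isA", "Pig"),
    ("Bear", "isA", "Mammal"),
    ("Bear", "hasColor", "Brown"),
    ("Disney", "isA", "Company"),
    ("Yellow", "hasRGB", "0xFFFF00"),
]


def related_concepts(triples, start, max_distance=2, follow_incoming=True, allowed=None):
    """Breadth-first search from `start`; returns {concept: link distance}.

    max_distance: link distance, the maximum number of hops to follow
    follow_incoming: link direction, also walk edges from object back to subject
    allowed: link relationship, an optional set of predicates to follow
    """
    neighbours = defaultdict(list)
    for s, p, o in triples:
        if allowed is not None and p not in allowed:
            continue
        neighbours[s].append(o)
        if follow_incoming:
            neighbours[o].append(s)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_distance:
            continue  # do not expand beyond the distance cutoff
        for nxt in neighbours[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    del distance[start]
    return distance


if __name__ == "__main__":
    # Follow only "isA" links: Pooh -> Bear -> Mammal
    print(related_concepts(TRIPLES, "Winnie the Pooh", allowed={"isA"}))
    # {'Bear': 1, 'Mammal': 2}
```

Restricting to `isA` yields generalizations ("bear", "mammal"). Allowing every predicate at distance 1 yields attributes and neighbours such as "Piglet" or "Yellow". Each filter produces a different kind of related search term.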
Modified Search Indexing
1. User submits a keyword-based query to the search engine
2. Search analyzer creates additional searches based on ontological information
3. Search engine performs parallel searches of the top search terms
4. Searches are ranked and returned to the user as additional search suggestions
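The modified pipeline can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`, which runs the expanded searches in parallel. Everything here is assumed for illustration. The hand-written `ONTOLOGY` mapping stands in for real ontological lookups. `hit_count` is a toy substring-matching backend over a tiny hypothetical `CORPUS`. Suggestions are ranked by hit count only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical ontology lookup: query -> related terms (e.g. as produced by a
# graph search over RDF triples). Hand-written here for illustration.
ONTOLOGY = {
    "yellow bear": ["winnie the pooh", "bear", "mammal"],
}

# Hypothetical document collection standing in for the search engine's index.
CORPUS = [
    "Winnie the Pooh is a bear who loves honey",
    "Winnie the Pooh and Piglet",
    "Winnie the Pooh in a red shirt",
    "A brown bear in the forest",
]


def hit_count(term, corpus):
    """Toy search backend: number of documents containing the phrase."""
    return sum(term in doc.lower() for doc in corpus)


def suggest(query, ontology, corpus, limit=3):
    """Expand a query with ontology terms and return ranked search suggestions."""
    # Step 2: derive additional searches from the ontology (top `limit` terms).
    extra = ontology.get(query.lower(), [])[:limit]
    # Step 3: run the expanded searches in parallel.
    with ThreadPoolExecutor() as pool:
        counts = list(pool.map(lambda term: hit_count(term, corpus), extra))
    # Step 4: rank by hit count and drop searches that found nothing.
    ranked = sorted(zip(extra, counts), key=lambda pair: -pair[1])
    return [term for term, n in ranked if n > 0]


if __name__ == "__main__":
    print(hit_count("yellow bear", CORPUS))          # 0: the literal query fails
    print(suggest("yellow bear", ONTOLOGY, CORPUS))  # ['winnie the pooh', 'bear']
```

The literal description "yellow bear" matches nothing. The ontology-derived searches do, and they come back as suggestions for the user to pick from.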
Current Work
• NASA SWEET Ontologies
  • 6000 concepts
  • 200 ontologies
  • Scientific
  • Loose relationships
• National Oceanic and Atmospheric Administration
  • 30+ years of scientific research
  • Text-based
  • Unsorted
  • 2+ gigabytes
  • Domain-specific terminology
Challenges & Future Work
• How to rank plain text
  • No links or history
  • No ‘page views’
• Limited ontology coverage
  • 6000 concepts in NASA SWEET ontologies
  • ~170,000 words in the English language
  • Many more unique names and scientific terms
  • How can ontologies be automatically generated?
• Graph matching
  • Identifying related terms in a large graph is difficult
  • Multiple links per node; the appropriate links must be identified
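The open question of how to rank plain text without links or page views has at least one standard, link-free answer: TF-IDF weighting. The sketch below sums the TF-IDF weights of the query terms for each document. This is a swapped-in technique, not the author's method. The sample `DOCS` are made up, and `tfidf_rank` is a hypothetical name.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(query, docs):
    """Return indices of docs matching the query, ranked by summed TF-IDF weight.

    tf  = (occurrences of term in doc) / (terms in doc)
    idf = log(N / number of docs containing term); a term found in every
          document therefore gets weight 0.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scored = []
    for i, toks in enumerate(tokenized):
        counts = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if counts[term]:
                tf = counts[term] / len(toks)
                idf = math.log(n_docs / doc_freq[term])
                score += tf * idf
        scored.append((score, i))
    # Highest score first; ties broken by document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0]


# Hypothetical plain-text records (no links, no page views).
DOCS = [
    "sea surface temperature measured by buoys",
    "ocean temperature and salinity profiles",
    "sea ice extent in the arctic sea",
]

if __name__ == "__main__":
    print(tfidf_rank("sea temperature", DOCS))  # [0, 2, 1]
```

TF-IDF uses only term statistics within the collection, so it needs no link graph or usage history. A practical system would likely add stemming and stop-word handling on top.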