“Find What I Mean, Not What I Say”. Mike Moran, IBM Distinguished Engineer, November 2007. Why do companies use search? How does IBM OmniFind meet those needs? OmniFind Enterprise Edition. OmniFind Yahoo! Edition. Scalable and Secure Enterprise Search for Corporate Intranets.
How does IBM OmniFind meet those needs? • OmniFind Enterprise Edition: Scalable and Secure Enterprise Search for Corporate Intranets • OmniFind Yahoo! Edition: Basic, No-Charge Search • OmniFind Discovery Edition: Search for Self-Service and eCommerce • Insight Solutions with OmniFind: Content Analytics
Why is search so difficult? • It is harder to think of words than to make choices • Choosing the same words as the author is not easy • Words are ambiguous: “Results 1 to 10 of 10 zillion”
The classic search model: Task (“I need to tell Pat.”) → misconception → Information need (“How do I contact Pat?”) → mistranslation → Verbal form (“What’s Pat’s phone number?”) → misformulation → Query (“Pat phone”) → ambiguity → Search engine
Sometimes your word is used too often: searching for “neon” finds both signs and cars.
Sometimes your word isn’t used at all: searching for “Pat phone” finds nothing.
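To make the two failure modes concrete, here is a minimal sketch of plain keyword matching, assuming a Boolean AND over lowercase words. The three-document corpus, the document IDs, and the helpers `tokenize` and `keyword_search` are hypothetical, invented for illustration; this is not how OmniFind matches queries.

```python
import re

# Hypothetical mini-corpus, invented for illustration.
DOCS = {
    "sign-shop": "Custom neon signs for bars and restaurants",
    "car-review": "Test drive of the new Neon compact car",
    "directory": "Pat Smith, telephone number 555-0100",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query: str) -> list[str]:
    """Return IDs of documents that contain every query word (Boolean AND)."""
    terms = tokenize(query)
    return [doc_id for doc_id, text in DOCS.items() if terms <= tokenize(text)]


print(keyword_search("neon"))       # ['sign-shop', 'car-review']
print(keyword_search("Pat phone"))  # []
```

“neon” hits both senses because the engine sees only the string, while “Pat phone” misses the one relevant entry because it says “telephone number”, never “phone”.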
Analytics bridge unstructured and structured data. Text analysis turns unstructured information (text, chat, email, audio, video) into structured information (indices, DBs, KBs). Structured information: • Explicit semantics • Efficient search • Focused content ...BUT... • Slow growing • Narrow coverage • Less current/relevant. Unstructured information: • High-value • Most current • Fastest growing ...BUT... • Buried in huge volumes (noise) • Implicit semantics • Inefficient search
Find what I mean, not what I say • SEARCH: “Going rate for leasing a billboard near Triborough Bridge” (Rate, Billboard, Located in Bronx) • Result: “…We were offered $250,000/year in 2001 for an outdoor sign in Hunts Point overlooking the Bruckner expressway. …” (Rate, Billboard, Located in Bronx) • No keywords in common, but a good answer
Without semantic search, it’s not a pretty picture • SEARCH: “Going rate for leasing a billboard near Triborough Bridge” (Rate, Billboard, Located in Bronx) • Result: “…Simon and Garfunkel's ‘The 59th Street Bridge Song’ was rated highly by the Billboard magazine in the 60's…” (Song Title, Magazine, Located in Queens) • Common keywords, but a bad semantic match
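The contrast between the good answer and the bad match can be sketched by scoring each document twice: once by shared keywords and once by shared concept tags. A minimal sketch, assuming a hand-written regex lexicon (`LEXICON`) that tags text with concepts such as Rate, Billboard, and Located in Bronx; the lexicon, the stopword list, and the function names are hypothetical stand-ins for real annotators, not OmniFind's method.

```python
import re

# Hypothetical concept lexicon: (regex, concept). More specific patterns come
# first, and each match is masked so "Billboard magazine" is not also a sign.
LEXICON = [
    (r"billboard magazine", "Magazine"),
    (r"59th street bridge song", "SongTitle"),
    (r"going rate|\$[\d,]+/year", "Rate"),
    (r"billboard|outdoor sign", "Billboard"),
    (r"triborough bridge|hunts point|bruckner expressway", "LocatedIn:Bronx"),
]
STOPWORDS = {"a", "an", "and", "by", "for", "in", "near", "the", "was", "we", "were"}

QUERY = "Going rate for leasing a billboard near Triborough Bridge"
DOC_A = ("We were offered $250,000/year in 2001 for an outdoor sign "
         "in Hunts Point overlooking the Bruckner expressway.")
DOC_B = ("Simon and Garfunkel's \"The 59th Street Bridge Song\" was rated "
         "highly by the Billboard magazine in the 60's.")


def keywords(text: str) -> set[str]:
    """Content words of the text, lowercased, minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def concepts(text: str) -> set[str]:
    """Concept tags found in the text, applying the lexicon in order."""
    text = text.lower()
    found = set()
    for pattern, concept in LEXICON:
        if re.search(pattern, text):
            found.add(concept)
            text = re.sub(pattern, " ", text)  # mask so later patterns skip it
    return found


def overlaps(doc: str) -> tuple[int, int]:
    """(shared keywords, shared concepts) between QUERY and doc."""
    return (len(keywords(QUERY) & keywords(doc)),
            len(concepts(QUERY) & concepts(doc)))


print(overlaps(DOC_A))  # (0, 3)
print(overlaps(DOC_B))  # (2, 0)
```

The Hunts Point document shares no content words with the query but all three concepts; the Simon and Garfunkel sentence shares “bridge” and “billboard” but no concepts. Real annotators would derive location tags from a gazetteer or knowledge base rather than a fixed list.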
News example • Search: “Bush trip to Middle East” • Document: “President visits shrine in Israel” • Syntactic annotator: NP, VP, PP • Named entity annotator: Gov Official, Title, Person (Bush), Country • Relationship annotator: Located At (Arg1: Entity, Arg2: Location)
Financial services example • Search: “Fred Center’s title” • Search: “head of Center Micros” • Document: “Fred Center is the CEO of Center Micros” • Parser: NP, VP, PP • Named entity: Person (Fred Center), Organization (Center Micros) • Relationship: CeoOf (Arg1: Person, Arg2: Org)
Law enforcement example • Search: Neon car • Search: “Higgins’ car” • Document: “A Neon was driven by Higgins Timothy” • Syntactic annotator: NP, VP, PP • Named entity annotator: Car (Neon), Person (Higgins Timothy) • Relationship annotator: Driven By (Arg1: Car, Arg2: Person)
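The news, financial, and law enforcement examples share one shape: lower layers find phrases and entities, and a relationship annotator ties typed arguments together so a query can be answered from the relation rather than the wording. A minimal sketch, assuming simple surface regexes in place of a real parser and named-entity annotator; `Relation`, `PATTERNS`, `annotate`, and `lookup` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    name: str
    arg1: str
    arg2: str


# Hypothetical surface patterns standing in for the syntactic and named-entity
# layers; named groups mark the relationship's arguments.
NAME = r"[A-Z]\w+(?: [A-Z]\w+)*"
PATTERNS = [
    ("CeoOf", re.compile(rf"(?P<arg1>{NAME}) is the CEO of (?P<arg2>{NAME})")),
    # Passive voice: the car is the grammatical subject, the driver follows "by".
    ("DrivenBy", re.compile(rf"(?:A |The )?(?P<arg1>{NAME}) was driven by (?P<arg2>{NAME})")),
]


def annotate(sentence: str) -> list[Relation]:
    """Emit every relationship whose pattern matches the sentence."""
    return [Relation(name, m["arg1"], m["arg2"])
            for name, pattern in PATTERNS
            for m in pattern.finditer(sentence)]


FACTS = (annotate("Fred Center is the CEO of Center Micros")
         + annotate("A Neon was driven by Higgins Timothy"))


def lookup(name=None, arg1=None, arg2=None) -> list[Relation]:
    """Relations matching every argument that is given (None means any)."""
    return [r for r in FACTS
            if (name is None or r.name == name)
            and (arg1 is None or r.arg1 == arg1)
            and (arg2 is None or r.arg2 == arg2)]


print(lookup(arg1="Fred Center"))                  # "Fred Center's title"
print(lookup("CeoOf", arg2="Center Micros"))       # "head of Center Micros"
print(lookup("DrivenBy", arg2="Higgins Timothy"))  # "Higgins' car"
```

The passive pattern makes the Neon the car and Higgins Timothy the driver, which is what lets a search for Higgins’ car find the Neon. A production annotator would work from parse trees so that each new phrasing does not need its own regex.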
When you search for “IBM phone number”, synonyms expand the query before results are returned. Expanded query: @xmlf2::'ibm <.or>phone <#phonenumber/> "phone nbr" "telephone nbr" "telephone number" </.or> <.or>number <#phonenumber/> "phone nbr" "telephone nbr" "telephone number" </.or>'
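The expanded query for “IBM phone number” has an AND-of-ORs shape: every original word must be satisfied, and each can be satisfied by itself, a synonym phrase, or a `#phonenumber` annotation. A minimal sketch of that evaluation, assuming a hard-coded synonym table and a regex standing in for the phone-number annotator; `SYNONYMS`, `ANNOTATORS`, `expand`, `matches`, and `search` are hypothetical and do not reproduce OmniFind's `@xmlf2` query processing.

```python
import re

# Hypothetical synonym table modeled on the expanded query: both
# "phone" and "number" expand to the same set of alternatives.
PHONE_ALTERNATIVES = ["#phonenumber", "phone nbr", "telephone nbr", "telephone number"]
SYNONYMS = {"phone": PHONE_ALTERNATIVES, "number": PHONE_ALTERNATIVES}

# Hypothetical annotator: "#phonenumber" matches text that looks like a phone number.
ANNOTATORS = {"#phonenumber": re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]\d{4}")}


def expand(query: str) -> list[list[str]]:
    """One OR-group per query word: the word itself plus its synonyms."""
    return [[word] + SYNONYMS.get(word, []) for word in query.lower().split()]


def matches(alternative: str, text: str) -> bool:
    """True if an annotation type fires, or the word/phrase occurs as whole words."""
    if alternative in ANNOTATORS:
        return ANNOTATORS[alternative].search(text) is not None
    return re.search(rf"\b{re.escape(alternative)}\b", text.lower()) is not None


def search(query: str, docs: list[str]) -> list[str]:
    """Documents satisfying every OR-group (an AND of ORs)."""
    groups = expand(query)
    return [d for d in docs if all(any(matches(alt, d) for alt in g) for g in groups)]


DOCS = [
    "IBM help desk: 800-555-0199",
    "Call the IBM telephone number listed on the website",
    "IBM annual report",
]
print(search("IBM phone number", DOCS))  # the help-desk and telephone-number documents
```

A literal AND of “ibm”, “phone”, and “number” would miss both hits: the first contains no word for phone or number, and the second contains “phone” only inside “telephone”.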
Customers need a platform, not just samples • To capture domain-specific knowledge, create a new annotator or modify one already shipped • Or configure any regular expression, with no coding • And it needs to work in many natural languages
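Configuring a regular expression instead of writing code can be sketched as a table of annotation types and patterns that is compiled into one annotator. A minimal sketch, assuming Python's `re` syntax; the `CONFIG` entries (including the `PN-` part-number format) and `build_annotator` are hypothetical, not OmniFind's configuration format.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    kind: str
    start: int
    end: int
    text: str


# Hypothetical configuration: annotation type -> regular expression.
# Teaching the system a new domain concept means adding an entry, not code.
CONFIG = {
    "PhoneNumber": r"\b\d{3}-\d{3}-\d{4}\b",
    "Email": r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b",
    "PartNumber": r"\bPN-\d{5}\b",
}


def build_annotator(config: dict[str, str]):
    """Compile the configured patterns once and return an annotate(text) function."""
    compiled = {kind: re.compile(pattern) for kind, pattern in config.items()}

    def annotate(text: str) -> list[Annotation]:
        found = [Annotation(kind, m.start(), m.end(), m.group())
                 for kind, rx in compiled.items()
                 for m in rx.finditer(text)]
        return sorted(found, key=lambda a: a.start)  # report in document order

    return annotate


annotate = build_annotator(CONFIG)
for a in annotate("Order PN-12345 from sales@example.com or call 800-555-0199."):
    print(a.kind, a.text)
# PartNumber PN-12345
# Email sales@example.com
# PhoneNumber 800-555-0199
```

Keeping the patterns in data rather than code is the design point: a domain expert edits `CONFIG`, and the annotator itself never changes.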
Customers need an open, extensible framework • Text analysis is a complex, multi-step process • No one vendor can satisfy every need you’ll have in text analysis • That’s why you need an open framework • Diagram: text flows through OmniFind Enterprise Edition, where UIMA-based stages (Parse Words, Identify Language, Categorize, Annotate) prepare it for the search index
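A multi-step text analysis flow like the one in the OmniFind diagram can be sketched as a list of stages that each enrich a shared document record before indexing. A minimal sketch, assuming a stage order of language identification, word parsing, annotation, and categorization; the `Doc` record, the toy language heuristic, and every stage function are hypothetical and are not the UIMA or OmniFind APIs.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Shared record that every stage reads and enriches (loosely CAS-like)."""
    doc_id: str
    text: str
    language: str = ""
    tokens: list[str] = field(default_factory=list)
    annotations: list[tuple[str, str]] = field(default_factory=list)  # (type, covered text)
    category: str = ""


def identify_language(doc: Doc) -> None:
    # Toy heuristic (assumption): compare counts of common English and German words.
    words = set(doc.text.lower().split())
    english = len(words & {"the", "is", "of", "and"})
    german = len(words & {"der", "die", "ist", "und"})
    doc.language = "en" if english >= german else "de"


def parse_words(doc: Doc) -> None:
    doc.tokens = re.findall(r"\w+", doc.text.lower())


def annotate(doc: Doc) -> None:
    for m in re.finditer(r"\b\d{3}-\d{3}-\d{4}\b", doc.text):
        doc.annotations.append(("PhoneNumber", m.group()))


def categorize(doc: Doc) -> None:
    has_phone = any(kind == "PhoneNumber" for kind, _ in doc.annotations)
    doc.category = "contact" if has_phone else "general"


# Hypothetical stage order; each stage is a plain function, not a UIMA API.
PIPELINE = [identify_language, parse_words, annotate, categorize]


def build_index(docs: list[Doc]) -> dict[str, set[str]]:
    """Run every stage on each document, then index words, annotations, and categories."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc in docs:
        for stage in PIPELINE:
            stage(doc)
        for token in doc.tokens:
            index[token].add(doc.doc_id)
        for kind, _ in doc.annotations:
            index["#" + kind].add(doc.doc_id)
        index["category:" + doc.category].add(doc.doc_id)
    return index


docs = [Doc("d1", "The help desk number is 800-555-0199"),
        Doc("d2", "Die Firma ist in Berlin und Hamburg")]
index = build_index(docs)
print(index["#PhoneNumber"], docs[1].language)  # {'d1'} de
```

Because stages share only the `Doc` record, swapping in a different vendor's language identifier or a new annotator means replacing one function in `PIPELINE`, which is the kind of openness the slide argues for.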
UIMA is an open standard framework • IBM has submitted the Unstructured Information Management Architecture (UIMA) specification to the Organization for the Advancement of Structured Information Standards (OASIS) • The UIMA source code has been contributed to the Apache Software Foundation, and an Apache Incubator project has been established to foster collaborative, consensus-based development of new software based on UIMA
Support for UIMA and OmniFind • Provide applications that leverage text analysis and enhanced search • Deliver content to the platform for analysis • Provide components that perform text analysis
Read all about it • The search marketing best seller: “Buy this book, read it, and then read it again.” --Chris Sherman, Search Engine Watch; “Indispensable guide” --Kirkus Reports; updated every printing • Internet Marketing: “Act now and read it” --Bryan Eisenberg; “Great book” --Robert Scoble; “Bravo” --Search Engine Watch • For more information about the books, and for the free Biznology newsletter and blog: www.mikemoran.com