180 likes | 194 Views
This article provides an introduction to Oracle interMedia-Text and addresses common problems faced in querying and searching text data. It offers a solution using interMedia-Text query operators, dictionary query operators, and other helpful functions. The article also discusses sorting by relevance and creating indexes in Oracle interMedia-Text.
E N D
Introduction to Oracle interMedia-Text By Derek Mathieson (AS-IDS)
The Problem • Find X by keyword Y SELECT cod FROM bud_codes WHERE desc LIKE ‘%EDH%’; SELECT doc_id FROM edh_docs WHERE UPPER(short_desc) LIKE ‘%PRINTER%’; TOO SLOW! f1
The Solution Oracle interMedia-Text
Searching with interMedia-Text • Rewritten query using interMedia-Text. SELECT doc_id FROM edh_docs WHERE CONTAINS( short_desc, ‘%PRINTER%’, 1) > 0; d3, d4, d5
Query Operators • Algebraic • Dictionary • Other
Algebraic Query Operators • AND, OR, NOT, MINUS Example: • monitor NOT flat • AS-IDS OR AS-SAS d6
Dictionary Query Operators • ABOUT ABOUT(subatomic particles) matches text on the subject of physics • Broader, Narrower, Related or Preferred Term BT(dog) Matches ‘dog’, ‘mammal, ‘animal’ d7
Dictionary Query Operators • Stem stem(sing) matches ‘sing’ ‘sung’ or ‘sang’ • Synonym SYN(tiger) matches ‘tiger’ ‘cat’, etc. • Translated Term TR(chien) matches ‘chien’ or ‘dog’
Other Query Operators • fuzzy, soundex ?apply matches ‘apply’ ‘apple’ ‘applied’ ‘April’ • Wildcards math%, %day, %th%, _ing name
Other Query Operators • NEAR((word1, word2,...) [, max_span [, order]]) NEAR((monday, tuesday, wednesday), 20, TRUE) • WITHIN workflow WITHIN TITLE Derek WITHIN AUTHOR WITHIN BOOK • <TITLE>Workflow</TITLE> • Business process automation…
Sorting by Relevance SELECT doc_id FROM edh_docs WHERE CONTAINS( short_desc, ‘DOG, CAT, MOUSE’, 1) > 0;
Sorting by Relevance SELECT doc_id FROM edh_docs WHERE CONTAINS( short_desc, ‘DOG, CAT, MOUSE’, 1) > 0 ORDER BY SCORE(1) DESC;
Sorting by Relevance SELECT /*+ FIRST_ROWS */doc_id FROM edh_docs WHERE CONTAINS( short_desc, ‘DOG, CAT, MOUSE’, 1) > 0 ORDER BY SCORE(1) DESC; acid
Creating Indexes • Now built in to Oracle kernel • NOT automatically updated after DML • Manually refreshed • ctxsrv process create index edh_docs_idx on edh_docs( short_desc ) indextype is ctxsys.context
Data Sources • Column data (VARCHAR, CLOB, etc.) • Detail Table • External File • URL
Data Types • Text • ASCII, HTML, XML, … • Microsoft • RTF, Word, Works, PowerPoint, Excel, Access, … • Other • PDF, WordPerfect, Lotus 1-2-3, MacWrite, QuattroPro, dBASE, … • Over 160 different File Formats!
Applications • EDH Search Screens • Supplier by keyword, or partial address • Budget Code by description • Document by short description • Document by full text? What about YOUR application?
Thank You For More Information Browse to:http://technet.oracle.com /training/products/intermedia/listing.htm