DB2 Net Search Extender Presenter: Sudeshna Banerji (CIS 595: Bioinformatics)
Topics to discuss:
• Information retrieval
• Text-indexing
• DB2 Text Extenders
• DB2 Net Search Extender
• References
• Questions

A Little Background…
• Information Retrieval (IR):
  • Extraction of “relevant” information from huge volumes of data scattered across different databases.
  • Examples: textual search, image search, video search, etc.
• The efficiency (time and speed) of IR depends on the INDEXING technology used.
• Indexing increases the performance of the system.
• An example of an indexing technology: text-indexing, used for textual search.

A Little Background…
• Text-Indexing:
  • The process of deciding what will be used to represent a given document.
  • A text index consists of significant terms extracted from the text documents, each term stored together with information about the documents that contain it.
  • A search is then handled as a query that looks up the index.

A Little Background…
• Text-Indexing (continued):
  • Involves the following:
    • Parsing the documents to recognize their structure, e.g. title, date, other fields.
    • Scanning for word tokens, handling numbers, special characters, hyphenation, capitalization, etc.
    • Stopword removal, based on a short list of common words like “the”, “and”, “or”.

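The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the "field: value" header layout, the stopword list, and the names `parse_document` and `significant_terms` are all hypothetical, chosen only to make the steps concrete.

```python
import re

# Illustrative stopword list; real systems ship their own, usually per language.
STOPWORDS = {"the", "and", "or", "a", "some"}

def parse_document(raw):
    """Step 1: split 'field: value' header lines from the body (assumed layout)."""
    header, _, body = raw.partition("\n\n")
    fields = dict(line.split(": ", 1) for line in header.splitlines())
    return fields, body

def tokenize(text):
    """Step 2: lower-case the text and keep words, numbers and hyphenated words."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

def significant_terms(text):
    """Step 3: drop stopwords so that only significant terms get indexed."""
    return [token for token in tokenize(text) if token not in STOPWORDS]

raw = "title: Street Cats\ndate: 2001-05-01\n\nThe cat hunts some mice and a well-known dog."
fields, body = parse_document(raw)
print(fields["title"])          # Street Cats
print(significant_terms(body))  # ['cat', 'hunts', 'mice', 'well-known', 'dog']
```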
Indexing only Significant Terms

DB2 Extenders
• A family of IBM products that support data beyond the traditional character and numeric data types.
• Extenders are available for images, voice, video, complex documents (full-text search), spatial objects, etc.
• Trial and beta versions are available for testing.
• Link for extenders: http://www-3.ibm.com/software/data/db2/extenders/index.html

DB2 Text Extenders
• To meet the increasing demands of content management, IBM has introduced three full-text retrieval applications for DB2 Universal Database (DB2 UDB):
  • DB2 Net Search Extender
  • DB2 Text Information Extender
  • DB2 Text Extender
• When to use what?
  • Link for a comparison of the above: http://www-3.ibm.com/software/data/db2/extenders/fulltextcomparison.html

DB2 Net Search Extender
• Replaces DB2 Text Information Extender Version 7.2.
• Some important features:
  • Indexing speed of about 1 GB per hour.
  • Different text formats: plain text (ASCII), HTML, XML, GPP.
  • Base support for 37 languages, including English, Spanish, French, Japanese and Chinese.
  • Sub-second search response times.
  • No decrease in search performance with up to 1000 concurrent queries per second.

DB2 Net Search Extender
• Some text-search capabilities:
  • Searches can be performed using SQL (a fourth-generation language whose queries read almost like English).
  • Searches can include:
    • Boolean operations.
    • Proximity search for words in the same sentence or paragraph (for HTML, XML and GPP).
    • “Fuzzy” searches for words with a spelling similar to the search term, e.g. “Andrew” and “Andru”.
    • Thesaurus-related search.
    • Restricting the search to sections within documents.
  • The user can limit the search results with a “hit count” and can also specify how the results are to be sorted.

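Two of the capabilities listed above, Boolean operations and fuzzy matching, can be illustrated over a toy term index. This is a sketch under stated assumptions, not the algorithm Net Search Extender uses: fuzziness is modelled here with Levenshtein edit distance, and the index contents, the `fuzzy_terms` helper and the `max_distance` threshold are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character edits."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]

def fuzzy_terms(index, term, max_distance=2):
    """Return the indexed terms whose spelling is close to the search term."""
    return sorted(t for t in index if edit_distance(t, term.lower()) <= max_distance)

# Hypothetical index: term -> ids of the documents that contain it.
index = {"andrew": {1}, "andru": {2}, "cat": {2, 3}, "mice": {3}}

# Boolean operations map directly onto set operations.
cat_and_mice = index["cat"] & index["mice"]  # AND
cat_or_mice = index["cat"] | index["mice"]   # OR
cat_not_mice = index["cat"] - index["mice"]  # AND NOT

print(sorted(cat_and_mice))          # [3]
print(sorted(cat_not_mice))          # [2]
print(fuzzy_terms(index, "Andrew"))  # ['andrew', 'andru']
```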
DB2 Net Search Extender
• System requirements
  • DB2 Version 8.1
  • Java Runtime Environment (JRE) Version 1.3.1
• Windows installation
  • Administrative rights are required.
  • Call db2text start to start the DB2 Net Search Extender Instance Services.

DB2 Net Search Extender
• Simple example with SQL queries
• The following steps are required for a basic textual search in DB2 Net Search Extender:
  1. Creating a database
  2. Enabling a database for text search
  3. Creating a table
  4. Creating a full-text index
  5. Loading sample data
  6. Synchronizing the text index
  7. Searching with the text index

DB2 Net Search Extender
1. Creating a database:
     db2 "create database sample"
2. Enabling a database for text search:
   • To start the Net Search Extender service:
       db2text "START"
   • To prepare the database for use with DB2 Net Search Extender:
       db2text "ENABLE DATABASE FOR TEXT CONNECT TO sample"

DB2 Net Search Extender
3. Creating a table:
     db2 "CREATE TABLE books (isbn VARCHAR(18) not null PRIMARY KEY, author VARCHAR(30), story LONG VARCHAR, year INTEGER)"
4. Creating a full-text index:
     db2text "CREATE INDEX db2ext.myTextIndex FOR TEXT ON books (story) CONNECT TO sample"

DB2 Net Search Extender
5. Loading sample data:
     db2 "INSERT INTO books VALUES ('0-13-086755-1', 'John', 'A man was running down the street.', 2001)"
     db2 "INSERT INTO books VALUES ('0-13-086755-2', 'Mike', 'The cat hunts some mice.', 2000)"
6. Synchronizing the text index:
     db2text "UPDATE INDEX db2ext.myTextIndex FOR TEXT CONNECT TO sample"

DB2 Net Search Extender
7. Searching with the text index:
   • Using the CONTAINS scalar search function:
       db2 "SELECT author, story FROM books WHERE CONTAINS (story, '\"cat\"') = 1 AND year >= 2000"
     The following result table is returned:
       AUTHOR   STORY
       Mike     The cat hunts some mice.
   • NOTE: To create a text index, the text column must have one of the following data types: CHAR, VARCHAR, LONG VARCHAR, CLOB.

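Steps 3 to 7 can be mimicked in plain Python to show why synchronizing the index is a step of its own. This is a toy model, not DB2 behaviour in detail: the `BooksTable` class and its methods are hypothetical, and it assumes that the text index only learns about new rows when it is explicitly updated, as the separate synchronization step suggests.

```python
import re

class BooksTable:
    """Toy stand-in for the books table plus a full-text index on story."""

    def __init__(self):
        self.rows = []         # step 3: the table
        self.text_index = {}   # step 4: term -> set of row positions
        self.rows_indexed = 0  # number of rows the index has seen so far

    def insert(self, isbn, author, story, year):
        # Step 5: loading data changes the table only, not the index.
        self.rows.append({"isbn": isbn, "author": author,
                          "story": story, "year": year})

    def update_index(self):
        # Step 6: index the rows added since the last update.
        for position in range(self.rows_indexed, len(self.rows)):
            story = self.rows[position]["story"].lower()
            for term in re.findall(r"[a-z0-9]+", story):
                self.text_index.setdefault(term, set()).add(position)
        self.rows_indexed = len(self.rows)

    def search(self, term, min_year):
        # Step 7: an index lookup combined with an ordinary predicate on year.
        hits = self.text_index.get(term.lower(), set())
        return [(row["author"], row["story"])
                for position, row in enumerate(self.rows)
                if position in hits and row["year"] >= min_year]

books = BooksTable()
books.insert("0-13-086755-1", "John", "A man was running down the street.", 2001)
books.insert("0-13-086755-2", "Mike", "The cat hunts some mice.", 2000)

before_sync = books.search("cat", 2000)  # index not yet synchronized
books.update_index()
after_sync = books.search("cat", 2000)

print(before_sync)  # []
print(after_sync)   # [('Mike', 'The cat hunts some mice.')]
```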
DB2 Net Search Extender
• Thesaurus support:
  • A thesaurus is structured like a network of nodes linked together by relations:
    • Associative relations: RELATED_TO
    • Synonym relations: SYNONYM_OF
    • Hierarchical relations: LOWER_THAN, HIGHER_THAN
  • Creating and compiling a thesaurus:
    1. Create a thesaurus definition file (explained below).
    2. Compile the definition file into a thesaurus dictionary using the DB2EXTTH utility.

DB2 Net Search Extender
• Create a thesaurus definition file.
  • Define its content in a definition file using a text editor. Example of some definition groups:
      :WORDS
      football
      .RELATED_TO goal
      .SYNONYM_OF soccer
      :WORDS
      chapel
      .LOWER_THAN skyscraper
      .HIGHER_THAN house

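To show how such definition groups can drive a search, the sketch below reads the two groups from the slide into a small relation table and uses it to expand a query term. It is a simplified illustration, not the DB2EXTTH compiler: the parser handles only the layout shown on the slide, and the names `parse_thesaurus` and `expand_query` are hypothetical.

```python
DEFINITION = """\
:WORDS
football
.RELATED_TO goal
.SYNONYM_OF soccer
:WORDS
chapel
.LOWER_THAN skyscraper
.HIGHER_THAN house
"""

def parse_thesaurus(text):
    """Build {term: {relation: {related terms}}} from the simplified layout."""
    thesaurus = {}
    term = None
    expect_term = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == ":WORDS":
            expect_term = True  # the next plain line names the group's term
        elif line.startswith("."):
            if term is None:
                raise ValueError("relation line before any term: " + line)
            relation, target = line[1:].split(None, 1)
            thesaurus[term].setdefault(relation, set()).add(target)
        elif expect_term:
            term = line
            thesaurus.setdefault(term, {})
            expect_term = False
    return thesaurus

def expand_query(thesaurus, term, relation):
    """Return the term followed by every term linked to it by the relation."""
    related = thesaurus.get(term, {}).get(relation, set())
    return [term] + sorted(related)

thesaurus = parse_thesaurus(DEFINITION)
print(expand_query(thesaurus, "football", "SYNONYM_OF"))  # ['football', 'soccer']
print(expand_query(thesaurus, "chapel", "HIGHER_THAN"))   # ['chapel', 'house']
```

A thesaurus-related search can then look up every expanded term in the text index rather than only the term the user typed.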
DB2 Net Search Extender
• An example of the structure of a thesaurus:
  [Figure: a thesaurus network with the nodes Game, Ball Game, Soccer, Tennis and Football, linked by HIGHER_THAN and SYNONYM_OF relations.]

DB2 Net Search Extender
• References:
  • http://www-3.ibm.com/cgibin/db2www/data/db2/udb/winos2unix/support/document.d2w/report?fn=desu9m03.htm#ToC
  • Information Retrieval site containing good lecture slides: http://ciir.cs.umass.edu/cmpsci646/
  • Net Search Extender Administration and User’s Guide, Version 8.1 (can be downloaded with the software)

Any questions?