Goat search

Revorg GOAT Search Solution (Powered by Lucene)

About Me Grover Fields • Revorg, LLC (Owner) • M.S. Information System (Troy University) • B.S. Industrial Engineering (Florida A&M University) • Stanford Project Management Courses

About Me • 10+ years of development, analysis, and implementation • 10+ years of ColdFusion experience • 2+ years of Java experience • Commonspot, Strongmail, ClickFix (Developer) • Email: grover_fields@yahoo.com • Web site: http://www.groverfields.com

Agenda • What? • What can we do with GOAT? • Why? • Why do we want to use GOAT and not Verity? • How? • How do we do that? • Conclusion and alternative solutions

What • What is a search engine? • Builds an index on text • Answers queries using that index, a la Verity • Existing database already • What does a search engine offer? • Scalability • Relevance ranking • Tweaking • Integrates different sources (email, web pages, files, DATABASES)

What is a search engine? (cont.) • Works on words, not on substrings • Auto != automatic, automobile • Indexing process: • Convert document • Extract text and meta data • Normalize text • Write (inverted) index
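The indexing steps above (extract text, normalize, write an inverted index) can be sketched in a few lines. This is a plain-Python illustration of the idea, not Lucene's API; the function names `normalize`, `build_index`, and `search` and the sample documents are assumptions made up for the example.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase the text and split it into whole-word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: normalized word -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in normalize(text):
            index[word].add(doc_id)
    return index


def search(index, word):
    """Return the sorted ids of documents that contain the exact word."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical sample documents, keyed by id.
docs = {
    1: "Auto repair shop",
    2: "Automatic transmission",
    3: "Automobile sales",
}
index = build_index(docs)

print(search(index, "auto"))       # [1]
print(search(index, "automatic"))  # [2]
```

Because the index is keyed on whole words, a query for "auto" matches only document 1 and not "automatic" or "automobile", which is the behavior the slide describes.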

Apache Lucene Overview • Lucene Java 2.4 • A high-performance, full-featured text search engine library written entirely in Java. • It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. • No GUI • http://lucene.apache.org

Apache Lucene Overview • Java library for indexing and searching • No dependencies • Works with Java 1.4 or later • Input for indexing: Document objects • Each document: set of Fields, field name, field content • Stores its index as files on disk or in memory • No document converters • No web crawler

Lucene Java users • HBCU.info • LinkedIn • IBM OmniFind Yahoo! Edition • Technorati.com • Eclipse • Monster.com • …

Lucene Java Summary • Java library for indexing and searching • Lightweight / no dependencies • Powerful and fast and tested! • No document conversion • No GUI

Why? • Cost of Enterprise Search Solution • Need for search speed • Java projects to work on • Things to do

Verity Limitations • 10,000 documents for ColdFusion Developer Edition • 125,000 documents for ColdFusion Standard Edition • 250,000 documents for ColdFusion Enterprise Edition • What do developers do in a shared hosting environment? • Is it possible for the hosting company to limit the number of documents per Web site?

T-SQL Limitations? • Search for "Yahoo" on my blog • SELECT entry.id FROM tbl_mango_entry AS entry INNER JOIN tbl_mango_post AS post ON entry.id = post.id WHERE entry.blog_id = 'default' AND (entry.title LIKE '%yahoo%' OR entry.content LIKE '%yahoo%' OR entry.excerpt LIKE '%yahoo%') AND post.posted_on <= getdate() AND entry.status = 'published' ORDER BY post.posted_on DESC • Multiply that by 10, 100, 500, or 1,000 users/hr?

T-SQL Limitations? • Full table scan = 1 THING • PERFORMANCE KILLER!!! • No search sorting • RDBMS isn’t designed to do this but allows it • Use the right tools!
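The full-table-scan point can be checked against a query planner. The sketch below uses SQLite from Python's standard library as a stand-in, not SQL Server, so plan wording and optimizer behavior will differ from T-SQL; the table, index, and rows are hypothetical. It only illustrates that a leading-wildcard LIKE is reported as a scan, while an equality match on an indexed column is reported as an index search.

```python
import sqlite3

# In-memory SQLite database standing in for the blog tables (assumption:
# a simplified, hypothetical "entry" table, not the real Mango Blog schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT, content TEXT)"
)
conn.execute("CREATE INDEX idx_entry_title ON entry (title)")
conn.executemany(
    "INSERT INTO entry (title, content) VALUES (?, ?)",
    [
        ("Yahoo news", "Yahoo released a new search API"),
        ("Lucene tips", "Indexing with Lucene"),
    ],
)


def plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Leading-wildcard LIKE: every row has to be examined.
like_plan = plan("SELECT id FROM entry WHERE content LIKE ?", ("%yahoo%",))

# Equality on an indexed column: the planner can use the index.
exact_plan = plan("SELECT id FROM entry WHERE title = ?", ("Yahoo news",))

print(like_plan)   # detail text starts with SCAN
print(exact_plan)  # detail text starts with SEARCH
```

The exact plan text varies between SQLite versions, so only the leading SCAN or SEARCH keyword should be relied on.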

How? • GOAT Search Solution • Lucene 2.4.0 • ColdFusion MX 8 • MX is fine but GUI needs to be rolled back • Commons IO 1.4 • Simply packaged .jar files • Simple Web-based GUI

How? • Macromedia JDBC Drivers • Same drivers that ColdFusion uses • No additional drivers to install • Supports RDBMS ONLY • MSSQL • MySQL • Oracle • No File system support (Yet)

Basics? • Indexing extracts both meaning and structure from unstructured information by indexing each document • Contains a complete list of all the words used in a given document along with metadata about that document • Lucene creates a collection that normalizes both the structured and unstructured data. • Search requests then check these collections rather than scanning the actual documents and database fields. • This provides a faster search of information, regardless of the file type and whether the source is structured or unstructured.

Basics? • Collection • A special database created by Lucene that contains metadata that describes the documents • Documents • A sequence of fields • Similar to a row in a database table • Row 1 • Row 2, etc. • Fields • A named sequence of terms • Similar to a column in a table • Primary Key • Column 1 • Terms • A term is a string

Knowledge? • Index • A special database created by Lucene that contains metadata that describes the documents • Query Syntax • Similar to Google's advanced search: • field:value • e.g. resume:coldfusion • http://lucene.apache.org/java/2_4_0/queryparsersyntax.html • Results • Primary Key list of values • XML based on the document • CFX Tag integration
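To make the field:value idea concrete, here is a minimal sketch that indexes database-style rows by field and term, then answers a single `field:value` query with a list of primary keys. It is a plain-Python illustration and not how Lucene or GOAT is implemented; the helper names, the `id` key column, and the sample rows are assumptions. It handles only one term per query, none of the rest of Lucene's query syntax.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into whole-word terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_field_index(rows, key="id"):
    """Map (field, term) pairs to the set of primary keys that contain them."""
    index = defaultdict(set)
    for row in rows:
        for field, value in row.items():
            if field == key:
                continue  # the primary key is returned, not searched
            for term in tokenize(str(value)):
                index[(field, term)].add(row[key])
    return index


def parse_query(query):
    """Split a single 'field:value' query into a (field, term) pair."""
    field, sep, value = query.partition(":")
    if not sep or not field.strip() or not value.strip():
        raise ValueError("expected a query of the form field:value")
    return field.strip().lower(), value.strip().lower()


def search(index, query):
    """Return the sorted primary keys of rows matching the query."""
    return sorted(index.get(parse_query(query), set()))


# Hypothetical rows, as they might come back from a database table.
rows = [
    {"id": 101, "name": "Ann", "resume": "ColdFusion and Java developer"},
    {"id": 102, "name": "Bob", "resume": "Python developer"},
    {"id": 103, "name": "Cy", "resume": "ColdFusion administrator"},
]
index = build_field_index(rows)

print(search(index, "resume:coldfusion"))  # [101, 103]
```

Returning only primary keys keeps the index small; the caller can then fetch the full rows from the database by key.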

Alternative Solutions for Search • Commercial vendors: • FAST, $100k • Autonomy, $80k • Google, $50k • Commercial search engines based on Lucene • IBM OmniFind Yahoo Edition • RDBMS with Integrated Search • Oracle • MySQL • MSSQL • PERFORMANCE KILLERS

Road Map • Definition: a set of guidelines, instructions, or explanations ("wrote an ethics code as a road map for the behavior of elected officials") • Overhaul Java programming (still novice) • Integrate with other products • Aperture • Nutch • Solr • File system integration • .txt, .pdf, .doc, .ppt, etc. • Geospatial based searches • e.g. all jobs within a 50-mile radius
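As a rough illustration of the planned geospatial search, the sketch below filters job records by great-circle distance using the haversine formula. It is one possible approach and not a description of how GOAT will implement the feature; the job records, their approximate coordinates, and the function names are assumptions.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def jobs_within(jobs, lat, lon, radius_miles=50):
    """Return the ids of jobs located within radius_miles of (lat, lon)."""
    return [
        job["id"]
        for job in jobs
        if haversine_miles(lat, lon, job["lat"], job["lon"]) <= radius_miles
    ]


# Hypothetical job records with approximate city coordinates.
jobs = [
    {"id": 1, "city": "Tallahassee, FL", "lat": 30.4383, "lon": -84.2807},
    {"id": 2, "city": "Thomasville, GA", "lat": 30.8366, "lon": -83.9788},
    {"id": 3, "city": "Jacksonville, FL", "lat": 30.3322, "lon": -81.6557},
]

# Jobs within 50 miles of Tallahassee.
print(jobs_within(jobs, 30.4383, -84.2807))  # [1, 2]
```

This checks every record, which is fine for a small list; a real index would first narrow the candidates, for example with a bounding box on latitude and longitude.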

References • Apache.org • Adobe.com • Ben Forta’s Blog • Slideshare.net • Multiple authors • Other references
