
NITISH MANOCHA



  1. NITISH MANOCHA

  2. Platforms • AIX workstation • OS/390 • Sun Solaris • Windows NT

  3. Tools to Use • Topic categorization tool • Categorizing emails • Categorizing Web Pages

  4. Text Analysis Tool • Topic Categorization Tool

  5. Text Analysis Tool • Topic Categorization Tool • Category 1 (AI Schedule)

  6. Text Analysis Tool • Category 2 (Database Schedule)

  7. Text Analysis Tool • Target Category (Data Mining Schedule)

  8. Text Analysis Tool • Result - Category 2 (Databases)
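
The categorization walk-through in slides 3 to 8 (two trained categories, one target document, one winning category) can be illustrated with a minimal sketch. It is not the Topic Categorization Tool's algorithm, which the slides do not describe; it assumes a plain bag-of-words cosine similarity, and the training and target texts are hypothetical stand-ins for the schedule pages.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphabetic tokens in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def categorize(target, training):
    """Return the best-matching category name and all similarity scores."""
    target_vec = bag_of_words(target)
    scores = {
        name: cosine(target_vec, bag_of_words(text))
        for name, text in training.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical sample texts, not taken from the presentation.
training = {
    "AI": "search heuristics planning neural networks logic reasoning agents",
    "Databases": "relational model sql queries transactions indexing data warehouses",
}
target = "data mining association rules over data warehouses and sql queries"

best, scores = categorize(target, training)
print(best, scores)
```

With these sample texts the data mining document shares vocabulary only with the database category, which mirrors the result on slide 8.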

  9. Tools to Use • Clustering Tool (Finding Similar Information) • Dividing Documents into Groups • Identifying hidden similarities in documents • Identifying duplicate documents from a collection • Finding Documents that are out of place

  10. Text Analysis Tool • Hierarchical Clustering - imzhclst

  11. Text Analysis Tool • Binary Clustering - imzcrlst

  12. Text Analysis Tool • Results

  13. Text Analysis Tool • Results
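
One of the clustering uses listed on slide 9, identifying duplicate documents in a collection, can be sketched as follows. This is an assumption-laden illustration using Jaccard similarity over word sets, not what imzhclst or the binary clustering tool does internally; the file names and texts are hypothetical.

```python
import re
from itertools import combinations


def word_set(text):
    """Lowercase alphabetic tokens of a text, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicates(docs, threshold=0.8):
    """Return pairs of document names whose word sets overlap heavily."""
    sets = {name: word_set(text) for name, text in docs.items()}
    return [
        (x, y)
        for x, y in combinations(sorted(sets), 2)
        if jaccard(sets[x], sets[y]) >= threshold
    ]


# Hypothetical documents for illustration only.
docs = {
    "a.txt": "The database course meets on Monday and Wednesday",
    "b.txt": "the database course meets on monday and wednesday.",
    "c.txt": "Robot planning seminar meets on Friday",
}

pairs = near_duplicates(docs)
print(pairs)
```

A document that pairs with nothing at a much lower threshold would be a candidate for the "out of place" case on the same slide.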

  14. Tools to Use • Feature Extraction Tool • Name Extraction • Abbreviation Extraction • Relation Extraction

  15. Text Analysis Tool • Using Feature Extraction tool to extract names • imzxrun -b 2 -f C -x n -o faculty.out faculty.htm

  16. Text Analysis Tool

  17. Tools to Use • Language Identification Tool • Organize collection of documents by language • Restrict Search Results to documents in a particular language

  18. Text Analysis Tool • Using Language Identification tool • imzlgini -b 2 -v < mydoc.htm

  19. Text Analysis Tool • Language Identification Tool Results • Supports 13 Languages; New Languages Can Be Trained
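
The idea that a language identifier can be trained on new languages can be sketched with character trigram profiles. This is a common textbook technique and only an assumption here; the slides do not say how imzlgini works. The two training sentences are hypothetical and far too small for real use.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Count overlapping character trigrams in lowercased text."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram-count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def train(samples):
    """Build one trigram profile per language from sample text."""
    return {lang: trigram_profile(text) for lang, text in samples.items()}


def identify(text, profiles):
    """Return the language whose profile is most similar to the text."""
    probe = trigram_profile(text)
    return max(profiles, key=lambda lang: cosine(probe, profiles[lang]))


# Hypothetical training text; "training a new language" means adding an entry.
profiles = train({
    "english": "the quick brown fox jumps over the lazy dog and then the cat runs away",
    "german": "der hund und die katze sind nicht in dem haus sondern im garten",
})

print(identify("the dog and the cat", profiles))
print(identify("die katze und der hund", profiles))
```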

  20. Text Analysis Tool • Using Summarizer tool • imzsum -l 4 project.html

  21. Text Analysis Tool • Summarizer tool - Results

  22. Tools to Use • Web Crawler • Follows the Link topology for a fast search • Produces a Web Site Map • Used to Recognize Authoritative pages • Provides a filtered collection of pages

  23. Web Crawler • imyclean - to define a web space • Created include.re, exclude.re, types.re • imycrawl - to crawl a defined web space • imycrawl url webspace • imystat - to track what happens during a crawl
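
How include and exclude patterns might bound a web space can be sketched as below. The slides name include.re and exclude.re but do not show their contents or syntax, so the patterns, the precedence of exclude over include, and the example URLs are all assumptions made for illustration.

```python
import re


def in_web_space(url, include, exclude):
    """A URL is in scope if no exclude pattern matches and some include pattern does.

    Giving exclude patterns priority is an assumption, not documented behavior.
    """
    if any(re.search(pattern, url) for pattern in exclude):
        return False
    return any(re.search(pattern, url) for pattern in include)


# Hypothetical patterns standing in for include.re and exclude.re.
include = [r"^http://www\.example\.edu/"]
exclude = [r"\.(gif|jpg)$", r"/cgi-bin/"]

candidates = [
    "http://www.example.edu/faculty.htm",
    "http://www.example.edu/logo.gif",
    "http://www.example.edu/cgi-bin/search",
    "http://other.example.com/index.htm",
]

crawlable = [u for u in candidates if in_web_space(u, include, exclude)]
print(crawlable)
```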

  24. Tools to Use • Text Search Engine • Complicated Text Search • Powerful Linguistic Capabilities • Fuzzy searches • Query based on structure of document

  25. Text Search Engine • Operates on a Previously Built Index

  26. Text Search Engine • Types of Index • Linguistic Index (bought as buy) • Feature Index (Linguistics + Names) • Precise Index (bought as bought) • Normalized Precise Index (Case Insensitive) • Ngram Index
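
The difference between the index types on slide 26 comes down to which key a word is stored under. The sketch below illustrates that idea only; the tiny lemma table and the function names are hypothetical, and the Text Search Engine's real linguistic processing is not described in the slides.

```python
# Hypothetical lemma table; a real linguistic index would use full morphology.
LEMMAS = {"bought": "buy", "buying": "buy", "buys": "buy"}


def linguistic_key(token):
    """Linguistic index: reduce a word to its base form (bought as buy)."""
    lowered = token.lower()
    return LEMMAS.get(lowered, lowered)


def precise_key(token):
    """Precise index: store the word exactly as written (bought as bought)."""
    return token


def normalized_precise_key(token):
    """Normalized precise index: exact form, but case-insensitive."""
    return token.lower()


def ngram_keys(token, n=3):
    """Ngram index: store overlapping character sequences of the word."""
    lowered = token.lower()
    return [lowered[i:i + n] for i in range(len(lowered) - n + 1)] or [lowered]


for word in ["Bought", "bought"]:
    print(word, linguistic_key(word), precise_key(word),
          normalized_precise_key(word), ngram_keys(word))
```

Under the precise key "Bought" and "bought" are different entries; under the normalized precise key they collapse to one, and under the linguistic key both match a query for "buy".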

  27. Combining Tools for Solutions • Searching with Categories • combining Text Search Engine and Topic Categorization Tool • Surviving a flood of email • by using Topic Categorization Tools • Selectively indexing Web Pages • by combining Web Crawler, Topic Categorization Tool & Text Search Engine

  28. Views of the Tool • Command Line (Good for Unix) • Not very useful on Windows NT • Not a good stand-alone Tool • Should be viewed as a Library
