
An Evaluation of Text Mining Tools as Applied to Selected Scientific and Engineering Literature



Presentation Transcript


  1. An Evaluation of Text Mining Tools as Applied to Selected Scientific and Engineering Literature
     Walter J. Trybula, Ph.D., IEEE Fellow; Ronald E. Wyllys, Ph.D.
     ASIS 2000 – Chicago, Illinois, November 14, 2000

  2. Introduction
     • Data volume is growing and sources of information are more diverse.
     • There is a need to evaluate this information.
     • There are tools that claim to be able to find information in textbases.
     • An investigation of existing tools would provide a measure of their ability.
     • If such tools worked, it might be possible to discover new knowledge. [1]
     [1] Described by Swanson as "Undiscovered Public Knowledge."
     w.trybula@ieee.org

  3. Objective/Goals
     • Provide a means of testing the existing instruments to determine their ability to “find” knowledge.
     • Determine whether any of these instruments provide useful insight into the data.
     • Evaluate the findings of domain experts to determine whether the instruments are helpful.
     • Develop recommendations based on the results of the experiments.

  4. Overview of Process
     • Selected a technical area with known commonality (lithography masks).
     • Collected the most recent reports available.
     • Compiled the reports into a textbase for analysis by text mining tools.
     • Had domain experts evaluate the results.
     • Analyzed their conclusions and drew recommendations for future directions.

  5. Example of Commonality

  6. Selection of Information
     • Information from leading researchers was collected:
         • Asian efforts on X-ray technology.
         • U.S. efforts on X-ray technology.
         • European efforts on Ion Projection Lithography.
         • U.S. efforts on Electron Projection Lithography.
         • U.S. efforts on Extreme Ultraviolet technology.
     • The data consisted of each group's annual update on technology progress, provided for yearly review.
     • All reports, presentations, and data were assembled into a single textbase for analysis.

  7. Sources of Data
     • Regions: U.S., Europe, Asia
     • Concerns:
         • Language
         • Terminology
         • Program (format)

  8. Text Mining Tools
     • Selected three types of Text Mining Instruments available for desktop operation:
         • Key terms identified with pointers to text
         • Excerpt presentation format
         • Hierarchical tree-structure presentation
     • Did not include Self-Organizing Maps (SOMs).
     • Included a search engine (AltaVista) for baseline evaluation of the results.

  9. Text Mining Tools: Text Mining Tool that returns Key Terms
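The idea of "key terms identified with pointers to text" can be illustrated with a minimal sketch. This is not the tool evaluated in the study, whose internals the slides do not describe; it assumes a simple frequency ranking, and the function name `key_terms_with_pointers`, the stop-word list, and the two sample reports are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real tool would use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "for", "in", "on", "to", "is", "are"}


def key_terms_with_pointers(textbase, top_n=3):
    """Return the top_n most frequent terms, each with (doc_id, char_offset) pointers."""
    pointers = defaultdict(list)
    for doc_id, text in textbase.items():
        # Words of two or more characters; internal hyphens are kept ("x-ray").
        for match in re.finditer(r"[a-z][a-z\-]+", text.lower()):
            term = match.group()
            if term not in STOP_WORDS:
                pointers[term].append((doc_id, match.start()))
    # Rank by number of occurrences, breaking ties alphabetically.
    ranked = sorted(pointers.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return ranked[:top_n]


# Invented sample textbase, for illustration only.
textbase = {
    "report_a": "X-ray mask distortion limits the X-ray lithography mask.",
    "report_b": "The EUV mask blank needs low defect density.",
}
result = key_terms_with_pointers(textbase)
for term, locations in result:
    print(term, locations)
```

Each pointer is a document identifier plus a character offset, so a reader can jump from a key term back to the passage where it occurs.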

  10. Text Mining Tools: Text Mining Tool that returns Excerpts
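An excerpt-style presentation can be sketched as a keyword-in-context listing. The code below is an assumption about what such output might look like, not a description of the evaluated tool; the function name `excerpts`, the `window` parameter, and the sample sentence are hypothetical.

```python
import re


def excerpts(textbase, term, window=30):
    """Return (doc_id, snippet) pairs showing `term` with surrounding context."""
    hits = []
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    for doc_id, text in textbase.items():
        for match in pattern.finditer(text):
            # Clamp the context window to the bounds of the document.
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            hits.append((doc_id, text[start:end].strip()))
    return hits


# Invented sample textbase, for illustration only.
textbase = {
    "report_a": "Ion projection lithography uses a stencil mask to pattern the wafer.",
}
for doc_id, snippet in excerpts(textbase, "stencil mask", window=10):
    print(doc_id, "...", snippet, "...")
```

A fixed character window is the simplest choice; cutting at sentence boundaries would read better but needs a sentence splitter.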

  11. Text Mining Tools: Text Mining Tool that returns Hierarchy

  12. Results
     • No method provided any novel results. There was some difficulty with mixed-format documents.
     • Domain experts were required to evaluate the output and determine the importance of the delivered information.
     • Graphical presentation of information was preferred over simple text.
     • The search engine provided many pointers to occurrences of search terms.
     • There was no evidence that this approach provided any novel knowledge.

  13. Conclusions
     • Text Mining instruments are in a developmental stage and need refinement to be more useful.
     • Text Mining instruments must be able to handle data in various formats, e.g., documents, spreadsheets, and presentations.
     • Without a defined goal for what data will be delivered, there is no commonality among the various instruments.
     • Experts had difficulty retrieving information that was known to be present, because of the methodology used to evaluate information in the textbase.
     • There must be a cohesive direction provided for the development of these instruments.

  14. Future Directions – Information Needs (Recommendations)
     • An Instrument that evaluates the text in the textbase and provides an accurate representation of the information contained therein.
     • An Instrument that provides this information in a manner that can be accurately and quickly evaluated by the intended user.
     • An Instrument that draws the best elements from existing work and provides information based on proven methodologies. (In rapidly evolving technologies, efforts in one area may ignore developments in others. This is not acceptable.)

  15. Data Mining Process Recommendations: Start with existing methodology.

  16. Text Mining Process Recommendations: Develop new methodology from existing ones.

  17. Future Directions – Instrument Needs (Recommendations)
     • There needs to be a cohesive direction for future work. Development must draw on the knowledge built up in the Library Science field.
     • Text Mining functionality can be built from Data Mining. A key concern will remain the method of presenting the results.
     • There needs to be some agreement on the purpose of the Text Mining Instruments:
         • What is the purpose of “mining” text?
         • What kind of user will there be?
         • What is the anticipated outcome?
     • Consider applying the latest software developments for information sharing, e.g., Groove and Napster.

  18. Challenges (Recommendations)
     • Establish a “goal” for the results of Text Mining. What will be accomplished?
     • Drive toward widespread application, e.g., desktop and handheld applications.
     • Incorporate the latest hardware developments, e.g., distributed and parallel processing and wireless communications.
     • Deliver what the intended user needs.
     • Don’t reinvent the “wheel.”
     • Have the Library Science, Information Science, and Computer Science people work together.

  19. Acknowledgements
     • Dean Brooke Sheldon, Sanda Erdelez, Mary Lynn Rice-Lively (GSLIS, University of Texas at Austin).
     • John Konopka of IBM.
     • The International SEMATECH team, including Scott Mackay, Mark Mason, Phil Seidel, and David Stark.
     • The various technology champions, for their efforts in providing the latest technology information.
