Text Mining Overview

Presentation Transcript


    1. Text Mining Overview Piotr Gawrysiak gawrysia@ii.pw.edu.pl Warsaw University of Technology Data Mining Group

    2. Topics: Natural Language Processing; Text Mining vs. Data Mining; The toolbox (language processing methods, single document processing, document corpora processing); Document categorization – a closer look; Applications (classic, profiled document delivery); Related areas (Web Content Mining & Web Farming)

    3. Natural Language Processing: Natural language – test for Artificial Intelligence (Alan Turing); NLP and NLU

    4. Information explosion

    5. Data Mining: This information explosion is a problem not only with text but with all other kinds of data as well. Here data mining comes to the rescue.

    6. Knowledge pyramid

    7. Text Mining – a definition: TM can be described as a statistical method, because KDD is mostly based on statistics.

    8. Text Mining tools: Linguistic analysis (thesauri, dictionaries, grammar analysers etc.); Machine translation; Automatic feature extraction; Automatic summarization; Document categorization; Document clustering; Information retrieval; Visualization methods

    9. Language analysis

    10. Thesaurus construction

    11. Machine translation

    12. Fully automatic approach

    13. Feature extraction

    14. Document summarization: New area – multimedia document summarization

    15. Document categorization & clustering

    16. Categorization/clustering system

    17. Information retrieval

    18. IR – exact match

    19. IR – fuzzy search

    20. Document visualization

    21. Document visualization

    22. Document categorization A closer look

    23. Measuring quality

    24. Metrics: Accuracy is the probability that a document chosen at random from the set D is classified correctly. Precision is the probability that a document chosen at random from those judged relevant is actually relevant. Recall is the probability that a document that is actually relevant is judged relevant by the system. Fallout (noise) is the probability that a document that is not actually relevant is incorrectly judged relevant.
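The four measures on the Metrics slide (accuracy, precision, recall, fallout) can be expressed through the counts of a binary relevance decision. The following is a minimal sketch, not taken from the presentation: the `Counts` class, the function names, and the convention of returning 0.0 for an empty denominator are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Outcome counts over a document set D (hypothetical helper type)."""
    tp: int  # relevant documents judged relevant
    fp: int  # non-relevant documents judged relevant
    fn: int  # relevant documents judged non-relevant
    tn: int  # non-relevant documents judged non-relevant


def _ratio(numerator: int, denominator: int) -> float:
    # Assumed convention: an empty denominator yields 0.0 instead of an error.
    return numerator / denominator if denominator else 0.0


def accuracy(c: Counts) -> float:
    # Probability that a random document from D is classified correctly.
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)


def precision(c: Counts) -> float:
    # Probability that a document judged relevant is actually relevant.
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: Counts) -> float:
    # Probability that an actually relevant document is judged relevant.
    return _ratio(c.tp, c.tp + c.fn)


def fallout(c: Counts) -> float:
    # Probability that a non-relevant document is judged relevant.
    return _ratio(c.fp, c.fp + c.tn)


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    example = Counts(tp=40, fp=10, fn=20, tn=30)
    for measure in (accuracy, precision, recall, fallout):
        print(f"{measure.__name__}: {measure(example):.3f}")
```

Precision and recall look only at the relevant class, while accuracy and fallout also depend on the number of true negatives, so the four values can diverge sharply when relevant documents are rare in D.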

    25. Multiple class scenario

    26. Categorization example

    27. Document representations

    28. Bigram example

    29. Probabilistic interpretation

    30. Positional representation

    31. Creating positional representation

    32. Examples

    33. Processing representations

    34. Expanding and trimming

    35. Representation processing

    36. Attribute selection

    37. Attribute space remapping

    38. Applications

    39. Thank you: Plato wrote in the Phaedrus that the art of writing may be lethal to our knowledge and wisdom, as human beings will no longer rely on their memory and will therefore recall everything from potentially misleading external sources.
