Learn about the interoperability of text and data mining using UIMA, Argo, and OpenMinTeD. Explore tools and techniques for annotating, identifying entities, and structuring unstructured information. Attend the Text Mining and Applications Workshop in Vietnam.
The interoperability of text and data mining: UIMA, Argo, and OpenMinTeD
Minh-Quoc NGHIEM, NaCTeM, The University of Manchester
Outline: Introduction, UIMA, OpenMinTeD, Argo
Welcome everyone to the Text Mining and Applications Workshop in Vietnam.
POS tags: Welcome/VB everyone/NN to/TO the/DT Text/NNP Mining/NNP and/CC Applications/NNP Workshop/NNP in/IN Vietnam/NNP ./.
Parse tree: (ROOT (S (VP (VB Welcome) (NP (NN everyone)) (PP (TO to) (NP (NP (DT the) (NNP Text) (NNP Mining) (CC and) (NNP Applications) (NNP Workshop)) (PP (IN in) (NP (NNP Vietnam)))))) (. .)))
Standoff annotation
• No text modification
• Annotations as offsets
Example text: Welcome everyone to the Text Mining and Applications Workshop in Vietnam.
Character offsets into the example sentence (begin inclusive, end exclusive):
Welcome everyone to the Text Mining and Applications Workshop in Vietnam.
Sentence: 0-73
Token: 0-7, 8-16, 17-19, …
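The offset idea above can be sketched in a few lines of plain Java. This is only an illustration, not UIMA code: the `StandoffDemo` and `Span` names and the simple regex tokenizer are assumptions made for the example. The point is that the text is never changed; every annotation is just a type plus a begin/end pair pointing into it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StandoffDemo {
    /** A standoff annotation: a type label plus [begin, end) character offsets. */
    static class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        /** Recover the annotated text from the untouched original. */
        String coveredText(String text) {
            return text.substring(begin, end);
        }

        @Override
        public String toString() {
            return type + ": " + begin + "-" + end;
        }
    }

    /** Toy tokenizer (an assumption): runs of word characters, or single punctuation marks. */
    static List<Span> tokenize(String text) {
        List<Span> spans = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+|[^\\w\\s]").matcher(text);
        while (m.find()) {
            spans.add(new Span("Token", m.start(), m.end()));
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Welcome everyone to the Text Mining and Applications Workshop in Vietnam.";
        // The sentence annotation covers the whole text.
        System.out.println(new Span("Sentence", 0, text.length()));
        for (Span token : tokenize(text)) {
            System.out.println(token + " -> " + token.coveredText(text));
        }
    }
}
```

Because annotations live beside the text rather than inside it, several tools can annotate the same document without interfering with each other's offsets.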
UIMA: Unstructured Information Management Architecture
Identify semantic entities, induce structure: People, Places, Org, Events … Times, Topics, Opinions, Relationships …
Pipeline of components: Tokenizer → POS Tagger → Chunker → NER
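A pipeline of this kind can be mimicked in plain Java to show the principle: each component reads what earlier components wrote into a shared document and adds its own layer. The sketch below is not the UIMA API; `PipelineSketch`, `Doc`, `Component`, and the toy capitalization tagger standing in for a real POS tagger are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PipelineSketch {
    /** Shared document that every component reads from and writes to. */
    static class Doc {
        final String text;
        final List<int[]> tokens = new ArrayList<>(); // [begin, end) offsets
        final List<String> tags = new ArrayList<>();  // one tag per token
        Doc(String text) { this.text = text; }
    }

    /** One processing step in the pipeline. */
    interface Component {
        void process(Doc doc);
    }

    /** Step 1: whitespace tokenizer that records token offsets. */
    static final Component TOKENIZER = doc -> {
        int i = 0;
        int n = doc.text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(doc.text.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(doc.text.charAt(i))) i++;
            if (i > start) doc.tokens.add(new int[] {start, i});
        }
    };

    /** Step 2: toy tagger that depends on the tokens produced by step 1. */
    static final Component CAPITAL_TAGGER = doc -> {
        for (int[] t : doc.tokens) {
            boolean upper = Character.isUpperCase(doc.text.charAt(t[0]));
            doc.tags.add(upper ? "CAP" : "LOW");
        }
    };

    /** Run the components in order over one document. */
    static Doc run(String text, List<Component> components) {
        Doc doc = new Doc(text);
        for (Component c : components) {
            c.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Doc doc = run("Welcome everyone to Vietnam.", Arrays.asList(TOKENIZER, CAPITAL_TAGGER));
        for (int i = 0; i < doc.tokens.size(); i++) {
            int[] t = doc.tokens.get(i);
            System.out.println(doc.text.substring(t[0], t[1]) + "/" + doc.tags.get(i));
        }
    }
}
```

Since every component only talks to the shared document, any step can be swapped for another implementation as long as it produces the same kind of annotations.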
Type System
• Sentence
  • begin, end
  • …
• Token
  • begin, end
  • POS
  • …
• Named-entity
  • begin, end
  • Entity type
  • …
• Relation
  • Entity 1, Entity 2
  • Relation type
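The type system above maps naturally onto a small class hierarchy. The following is a rough plain-Java sketch rather than a UIMA type system descriptor; the class names mirror the slide, while the example entity types ("Event", "Place") and the relation type ("located_in") are made up for illustration.

```java
class TypeSystemSketch {
    /** Base type: anything anchored to a [begin, end) span of the text. */
    static class Annotation {
        final int begin;
        final int end;

        Annotation(int begin, int end) {
            if (begin < 0 || end < begin) {
                throw new IllegalArgumentException("invalid span " + begin + "-" + end);
            }
            this.begin = begin;
            this.end = end;
        }

        String coveredText(String text) {
            return text.substring(begin, end);
        }
    }

    static class Sentence extends Annotation {
        Sentence(int begin, int end) { super(begin, end); }
    }

    static class Token extends Annotation {
        final String pos; // part-of-speech tag, e.g. "NNP"
        Token(int begin, int end, String pos) {
            super(begin, end);
            this.pos = pos;
        }
    }

    static class NamedEntity extends Annotation {
        final String entityType;
        NamedEntity(int begin, int end, String entityType) {
            super(begin, end);
            this.entityType = entityType;
        }
    }

    /** A relation has no span of its own; it links two entities. */
    static class Relation {
        final NamedEntity entity1;
        final NamedEntity entity2;
        final String relationType;
        Relation(NamedEntity entity1, NamedEntity entity2, String relationType) {
            this.entity1 = entity1;
            this.entity2 = entity2;
            this.relationType = relationType;
        }
    }

    public static void main(String[] args) {
        String text = "Welcome everyone to the Text Mining and Applications Workshop in Vietnam.";
        // Illustrative labels only; a real NER component would supply these.
        NamedEntity workshop = new NamedEntity(24, 61, "Event");
        NamedEntity vietnam = new NamedEntity(65, 72, "Place");
        Relation rel = new Relation(workshop, vietnam, "located_in");
        System.out.println(rel.entity1.coveredText(text) + " --" + rel.relationType
                + "--> " + rel.entity2.coveredText(text));
    }
}
```

Agreeing on such types is what lets components from different authors interoperate: a downstream component only needs to know the types, not which tool produced the annotations.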
Example
    // Read the input text file as a UIMA collection
    CollectionReader reader = UriCollectionReader.getCollectionReaderFromFiles(
        Arrays.asList(options.getTextFile()));

    // Assemble the pipeline: each component adds its own layer of annotations
    AggregateBuilder aggregate = new AggregateBuilder();
    aggregate.add(SentenceBoundaryDetector.getDescription());
    aggregate.add(Tokenizer.getDescription());
    aggregate.add(POSTagger.getDescription());
    aggregate.add(Chunker.getDescription());
    aggregate.add(NamedEntityRecognizer.getDescription());

    // Run every document from the reader through the aggregate
    SimplePipeline.runPipeline(reader, aggregate.createAggregateDescription());
Don’t know programming? ARGO is a web application built on UIMA components that supports interaction and collaboration.