Introduction to Selected Tools for Natural Language Processing and Data Mining
TRẦN MAI VŨ

Vietnamese NLP Tools
• JVnTextPro: http://sourceforge.net/projects/jvntextpro/
  • Sentence segmentation, sentence tokenization, word segmentation, POS tagging
• VnToolkit: http://www.loria.fr/~lehong/softwares.php
  • Software for automatically extracting LTAGs* from treebanks
  • An automatic tagger for Vietnamese texts
  • A tokenizer for automatic word segmentation of Vietnamese texts
  • A sentence detector for automatically detecting sentences in Vietnamese texts
• VLSP Tools: http://vlsp.vietlp.org:8080/demo/?page=resources
  • Vietnamese chunking
(*) Lexicalized Tree Adjoining Grammars
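
The first three stages listed for JVnTextPro (sentence segmentation, tokenization, word segmentation) can be illustrated with a minimal Python sketch. It is not how JVnTextPro or VnToolkit work internally: the regular expressions, the tiny `LEXICON`, and the greedy longest-match strategy are assumptions chosen only to show what each stage produces. POS tagging is left out because it needs a trained model.

```python
import re
import unicodedata


def nfc(text):
    """Normalize to NFC so accented Vietnamese letters are single code points."""
    return unicodedata.normalize("NFC", text)


# Hypothetical toy lexicon of multi-syllable words (illustration only).
LEXICON = {nfc(w) for w in ("công cụ", "xử lý", "ngôn ngữ", "tự nhiên", "dữ liệu")}
MAX_WORD_SYLLABLES = 3


def split_sentences(text):
    """Sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Tokenization: separate syllables from punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def segment_words(tokens):
    """Word segmentation: greedy longest match of syllable runs against LEXICON."""
    words, i = [], 0
    while i < len(tokens):
        longest = min(MAX_WORD_SYLLABLES, len(tokens) - i)
        for n in range(longest, 0, -1):
            candidate = " ".join(tokens[i:i + n])
            # A single token is always accepted as a fallback.
            if n == 1 or candidate.lower() in LEXICON:
                words.append(candidate.replace(" ", "_"))
                i += n
                break
    return words


def process(text):
    """Run the three stages in order; returns one word list per sentence."""
    return [segment_words(tokenize(s)) for s in split_sentences(nfc(text))]
```

Syllables that belong to one word are joined with an underscore, a common output convention for Vietnamese word segmenters. Real tools replace the lexicon lookup with statistical models, since greedy matching cannot resolve ambiguous segmentations.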

NLP Tools
• LingPipe: http://alias-i.com/lingpipe/
• GATE - General Architecture for Text Engineering: http://gate.ac.uk/
• MALLET - Machine Learning for Language Toolkit: http://mallet.cs.umass.edu/
• MinorThird: http://sourceforge.net/projects/minorthird/
• OpenNLP: http://opennlp.sourceforge.net/

Preprocessing Tools
• TextCat - Java Text Categorizing Library: http://textcat.sourceforge.net/
• HTML Parser: http://htmlparser.sourceforge.net/
• CyberNeko HTML Parser: http://nekohtml.sourceforge.net/
• Crawler4J: http://code.google.com/p/crawler4j/
• Lucene: http://lucene.apache.org/

Other Tools
• SVM-Light Support Vector Machine: http://svmlight.joachims.org/
• CRF: http://crf.sourceforge.net/
• Text Clustering Toolkit: http://mlg.ucd.ie/tct
• JGibbLDA - a Java implementation of Latent Dirichlet Allocation (LDA) using Gibbs sampling for parameter estimation and inference: http://jgibblda.sourceforge.net/
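
To make "LDA using Gibbs sampling" concrete, here is a minimal collapsed Gibbs sampler in Python. It is a sketch of the general technique, not JGibbLDA's code: the function name `lda_gibbs`, its parameters, and the default hyperparameters are all assumptions for illustration. Documents are given as lists of integer word ids in the range `0..vocab_size-1`.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (theta, phi, assignments)."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # words in doc d with topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word w has topic k
    topic_total = [0] * num_topics                              # words assigned to topic k
    assignments = []

    # Start from a random topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Parameter estimates: document-topic (theta) and topic-word (phi) distributions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, assignments
```

Each row of `theta` and `phi` is a probability distribution. A production toolkit adds what this sketch omits, such as burn-in handling, model saving, and inference on unseen documents.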

Data Mining Tools
• Weka - Machine Learning Software in Java: http://sourceforge.net/projects/weka/
• RapidMiner - Data Mining, ETL, OLAP, BI: http://sourceforge.net/projects/yale/
• RSES - Rough Set Exploration System: http://logic.mimuw.edu.pl/~rses/

Ontology Tools
• The Protégé Ontology Editor and Knowledge Acquisition System: http://protege.stanford.edu/
• Jena Semantic Web Framework: http://jena.sourceforge.net/