The Ontea Method. NAZOU Workshop, 21-23 September 2007, Poľana
Contribution to the state of the art
• Pattern-based annotation
• Similar methods: C-PANKOW, SemTag
• Languages other than English: Slovak
• Faster and more accurate than C-PANKOW
• Also supports instance creation, which SemTag does not
NAZOU, 21-23 September 2007, Poľana
Contribution to the state of the art: the Ontea tool
• Pattern
• PatternRegExp: annotate(), returns a set of results
• Result: e.g. (Bratislava, region:Settlement)
• ResultRegExp
• ResultOnto
• ResultTransformer
• LuceneRelevance
• SesameIndividualSearch
• SesameIndividualSearchAndCreate
• TvaroslovnikLemmatizer
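
The PatternRegExp and Result items can be illustrated with a minimal sketch. It is not Ontea's actual API: only the names PatternRegExp, Result and annotate() come from the slide, while the constructor arguments, the field names and the example regular expression are assumptions made for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One annotation: the matched text and the ontology class it is mapped to."""
    value: str
    concept: str


class PatternRegExp:
    """Hypothetical stand-in for the slide's PatternRegExp, not the real class."""

    def __init__(self, regex: str, concept: str):
        self.regex = re.compile(regex)
        self.concept = concept

    def annotate(self, text: str) -> set[Result]:
        # The first capture group of every match becomes the instance value.
        return {Result(m.group(1), self.concept) for m in self.regex.finditer(text)}


# Illustrative pattern (an assumption): a capitalized word after "in" is a settlement.
pattern = PatternRegExp(r"\bin ([A-Z][a-z]+)", "region:Settlement")
results = pattern.annotate("We offer a position in Bratislava.")
```

Here `results` holds a single `Result("Bratislava", "region:Settlement")`, matching the example pair on the slide.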
Evaluation: success rate (1)
• New experiment for Ontea Creation
• Ontea Creation + indexing: experiment with RFTS and Lucene indexing
• Lemmatization
Evaluation: speed (2)
• Ontea Creation: instances of ontological concepts are created from the input text collection by matching regular-expression patterns.
• This step produces OWL ontology files, which need to be integrated on a central machine.
• Created instances are evaluated by computing their relevance with the RFTS or Lucene indexing tool. Instances whose relevance is above a given threshold are identified as relevant and written to the resulting domain ontology OWL file (the stage related to the RFTS tool).
• Ontea Search: searches for annotation tags within the annotated text, similarly to step one but using general keyword-matching patterns. This executes more ontology queries and therefore takes more time.
• The last stage integrates the produced semantic metadata into one knowledge base represented by an OWL file.
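
The relevance-threshold stage can be sketched as follows. The slide does not say how RFTS or Lucene compute relevance, so the score below is an assumed stand-in (the share of documents mentioning the instance that also mention a concept keyword). The function names, sample documents and threshold value are likewise hypothetical.

```python
from __future__ import annotations


def relevance(instance: str, concept_keyword: str, documents: list[str]) -> float:
    """Assumed stand-in score, not the RFTS or Lucene formula."""
    with_instance = [d for d in documents if instance.lower() in d.lower()]
    if not with_instance:
        return 0.0
    both = [d for d in with_instance if concept_keyword.lower() in d.lower()]
    return len(both) / len(with_instance)


def filter_relevant(candidates, documents, threshold):
    """Keep only candidates whose relevance is above the threshold."""
    return [
        (inst, kw) for inst, kw in candidates
        if relevance(inst, kw, documents) > threshold
    ]


documents = [
    "Job in the city of Bratislava, full time.",
    "Bratislava city centre office seeks an analyst.",
    "Java developer wanted, remote.",
    "Great coffee and Java beans for the city office.",
]
candidates = [("Bratislava", "city"), ("Java", "city")]
relevant = filter_relevant(candidates, documents, threshold=0.5)
```

With these sample documents, "Bratislava" scores 1.0 and is kept, while "Java" scores 0.5 and is dropped because the threshold must be exceeded.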
Evaluation: speed (3)
• Tests ran on 500 job-offer documents, which takes ~67 minutes
• Intel(R) Pentium(R) 4 CPU 2.40GHz
• About 35000 Slovak offers on the web, many more in English
• This means that periodic annotation of the job offers takes ~78 hours = more than 3 days
• Steps 1 and 3 can run distributed
• When submitting jobs with e.g. 1000 job-offer documents per node: ~134 minutes per node, 1000 docs on each of 35 grid nodes = 35000 docs
• (1000-document set ~ 3M)
• + 10 minutes of grid middleware overhead + ~60 minutes of data integration
• On the grid: ~204 minutes = 3 hours 24 minutes
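
The timing estimates above can be re-derived with a few lines of arithmetic. All input figures come from the slide. The variable names are made up, and the final speedup ratio is derived here rather than stated in the source.

```python
DOCS_TOTAL = 35_000        # Slovak job offers on the web
MINUTES_PER_500 = 67       # measured time for 500 documents on one machine

# Single machine: 70 batches of 500 documents.
single_minutes = DOCS_TOTAL / 500 * MINUTES_PER_500   # 4690 minutes
single_hours = single_minutes / 60                     # about 78 hours

# Grid: 35 nodes, each annotating 1000 documents in parallel.
NODES = 35
docs_per_node = DOCS_TOTAL // NODES                    # 1000 documents
node_minutes = docs_per_node / 500 * MINUTES_PER_500   # 134 minutes
grid_minutes = node_minutes + 10 + 60                  # + middleware + integration
hours, minutes = divmod(int(grid_minutes), 60)         # 3 hours 24 minutes

speedup = single_minutes / grid_minutes                # roughly 23x (derived)
```

The estimate assumes annotation time scales linearly with the number of documents, which is what the slide's own figures imply.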
Publications
• Michal Laclavík, Marek Ciglan, Martin Šeleng, Ladislav Hluchý: Empowering Automatic Semantic Annotation in Grid. PPAM 2007, Springer, LNCS.
• Michal Laclavík, Marek Ciglan, Martin Šeleng, Stanislav Krajčí, Peter Vojtek, Ladislav Hluchý: Semi-automatic Semantic Annotation of Slovak Texts. SLOVKO 2007.
• Michal Laclavík, Marek Ciglan, Martin Šeleng: Ontea: Semi-automatic Pattern based Text Annotation empowered with Information Retrieval Methods. NAZOU-ITAT, 2007.
• Michal Laclavík, Martin Šeleng, Emil Gatial, Zoltan Balogh, Ladislav Hluchý: Ontology based Text Annotation – OnTeA. In: Information Modelling and Knowledge Bases XVIII, Marie Duzi, Hannu Jaakkola, Yasushi Kiyoki, Hannu Kangassalo (Eds.), Frontiers in Artificial Intelligence and Applications, Vol. 154, IOS Press, Amsterdam, February 2007, pp. 311-315. ISBN 978-1-58603-710-9, ISSN 0922-6389.