Primary Research Team & Capabilities
URL: http://ikt.ui.sav.sk
Dept. of Parallel and Distributed Computing
Research and Development Areas:
• Large-scale HPCN, Grid and MapReduce applications
• Intelligent and Knowledge-oriented Technologies
Experience from IST:
• 3 projects in FP5: ANFAS, CrossGrid, Pellucid
• 6 projects in FP6: EGEE II, K-Wf Grid, DEGREE (coordinator), EGEE, int.eu.grid, MEDIGRID
• 4 projects in FP7: Commius, Admire, Secricom, EGEE III
Several national projects (SPVV, VEGA, APVT)
IKT Group Focus:
• Information Processing (Large Scale)
• Graph Processing
• Information Extraction and Retrieval
• Semantic Web
• Knowledge-oriented Technologies
• Parallel and Distributed Information Processing
Solutions:
• SGDB: Simple Graph Database
• gSemSearch: Graph-based Semantic Search
• Ontea: Pattern-based Semantic Annotation
• ACoMA: KM tool for email
• EMBET: Recommendation System
• Experts on MapReduce and IR (Nutch, Solr, Lucene)
Director & leader of PDC: Dr. Ladislav Hluchý
7th May 2013
Large-scale Text and Graph Data Processing
Underlined are the technologies developed by IISAS.
Core Technology
• Web crawling
  • Nutch + plugins
• Full-text indexing and search
  • Lucene, Solr
• Information extraction
  • Ontea, GATE
• All of the above at large scale
  • Hadoop, S4
• Graph processing and querying
  • Simple Graph Database (SGDB)
  • gSemSearch
  • Neo4j
  • Blueprints
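Full-text indexing in Lucene and Solr is built around an inverted index, which maps each term to the documents that contain it. The pure-Python sketch below illustrates only that idea; it is not the Lucene or Solr API, and the simple lowercase tokenizer and the names `tokenize`, `build_index` and `search` are hypothetical choices for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical analyzer: lowercase and split on runs of non-alphanumeric characters.
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_index(docs):
    """Map each term to the set of document ids that contain it (an inverted index)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Boolean AND query: return ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Nutch crawls the web and feeds Solr.",
    2: "Solr is built on Lucene.",
    3: "Lucene provides full-text indexing.",
}
index = build_index(docs)
print(sorted(search(index, "lucene")))       # [2, 3]
print(sorted(search(index, "solr lucene")))  # [2]
```

A production engine adds term positions, relevance scoring and on-disk index segments; the sketch keeps only the term-to-documents mapping and a Boolean AND query.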
Relation to Business Intelligence
• Old BI approaches
  • Data integration from RDBMS
  • Data warehouses
  • OLAP
  • …
• New BI approaches
  • Data structures other than RDBMS: networks, semantics
  • Networks/graphs in telecom, social networks, transactions, Linked Data …
  • NoSQL: key-value (Tokyo Cabinet), column stores (HBase), graph databases, RDF(S)
  • In-memory computing
  • Commodity-PC solutions for large data:
    • MapReduce style: Hadoop; Pregel style: Giraph, Hama
  • Big unstructured data processing (on Hadoop):
    • Sentiment analysis, topic detection, named entity detection
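The MapReduce style mentioned above splits a job into a map step that emits key-value pairs, a shuffle that groups the pairs by key, and a reduce step applied to each key. The single-process sketch below shows that flow on word counting; it is not Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


lines = ["graph data big data", "Big graph"]
print(word_count(lines))  # {'graph': 2, 'data': 2, 'big': 2}
```

On Hadoop the same three phases run distributed across commodity machines, with the framework performing the shuffle between mappers and reducers.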
Ontea: Information Extraction Tool
http://ontea.sf.net
• Regex patterns
• Gazetteers
• Results
  • Key-value pairs
  • Structured into trees
  • Graphs
• Transformers, configuration
• Automatic loading of extractors
• Visual annotation tool
• Integration with external tools
  • GATE, stemmers, Hadoop …
• Multilingual tests
  • English, Slovak, Spanish, Italian
[Figures: tree of annotations; text with annotations; network/graph of annotations]
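The slide lists regex patterns and gazetteers as extraction sources and key-value pairs as results. A minimal sketch of that combination follows, assuming a hypothetical configuration of two patterns and a tiny gazetteer; it does not reproduce Ontea's actual configuration format, transformers or API.

```python
import re

# Hypothetical extractor configuration: annotation type -> regex pattern.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "date": re.compile(r"\b\d{1,2}\.\s?\d{1,2}\.\s?\d{4}\b"),
}

# Hypothetical gazetteer: known surface forms -> annotation type.
GAZETTEER = {
    "Bratislava": "location",
    "Slovakia": "location",
}


def annotate(text):
    """Return (type, value) key-value pairs found by regex patterns and the gazetteer."""
    annotations = []
    for ann_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            annotations.append((ann_type, match.group()))
    for term, ann_type in GAZETTEER.items():
        # Whole-word lookup of each gazetteer entry.
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            annotations.append((ann_type, term))
    return annotations


text = "Meeting in Bratislava on 7. 5. 2013, contact info@example.org"
print(annotate(text))
# [('email', 'info@example.org'), ('date', '7. 5. 2013'), ('location', 'Bratislava')]
```

Grouping the resulting pairs by type, or linking them to the document they came from, gives the tree and graph views of annotations mentioned on the slide.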
Named Entity Recognition (NER)
• Combination of existing NER tools
  • ANNIE (GATE), Apache OpenNLP
  • Illinois NER, Illinois Wikifier
  • LingPipe, OpenCalais
  • Stanford NER, WikiMiner
  • Miscinator
• Machine learning
  • Decision tree models
• Our approach was evaluated among the best 6 of 17 worldwide entries in the MSM 2013 challenge: http://oak.dcs.shef.ac.uk/msm2013/challenge.html
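The slide describes combining the outputs of several existing NER tools with a decision-tree model. As a simpler stand-in, the sketch below uses plain majority voting instead of a learned decision tree; the tool names, entity spans and the `min_votes` threshold are hypothetical.

```python
from collections import Counter


def combine_ner(tool_outputs, min_votes=2):
    """Majority-vote combination of entity labels from several NER tools.

    tool_outputs maps a tool name to {entity_span: label}. An entity is kept
    with the label most tools agree on, if at least min_votes tools assign it.
    """
    votes = {}
    for labels in tool_outputs.values():
        for span, label in labels.items():
            votes.setdefault(span, Counter())[label] += 1
    combined = {}
    for span, counter in votes.items():
        # Ties go to the label seen first (Counter.most_common keeps insertion order).
        label, count = counter.most_common(1)[0]
        if count >= min_votes:
            combined[span] = label
    return combined


outputs = {
    "tool_a": {"Neil Armstrong": "PER", "NASA": "ORG", "Moon": "LOC"},
    "tool_b": {"Neil Armstrong": "PER", "NASA": "ORG"},
    "tool_c": {"Neil Armstrong": "PER", "NASA": "LOC", "Apollo": "MISC"},
}
print(combine_ner(outputs))  # {'Neil Armstrong': 'PER', 'NASA': 'ORG'}
```

A learned model such as a decision tree can replace the fixed voting rule by treating each tool's label as an input feature and learning which tools to trust for which entity types.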
gSemSearch: Graph-based Semantic Search
• http://ikt.ui.sav.sk/esns/
• Entity relation search in semantic networks/graphs
• Search, navigation, data interaction
• Aimed at data integration of
  • Structured data (relational data, Linked Data)
  • Unstructured data (text, documents, communication)
• Applications:
  • Email, web, text documents, Linked Data
17 April 2013
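Entity relation search can be viewed as finding a chain of relations that connects two entities in a graph. A minimal breadth-first search sketch follows, assuming the data are stored as (subject, predicate, object) triples and treating edges as undirected; the entities, predicates and function names are hypothetical, and this is not the gSemSearch implementation.

```python
from collections import deque


def build_graph(triples):
    # Undirected adjacency list; each neighbour entry remembers the original triple.
    graph = {}
    for s, p, o in triples:
        graph.setdefault(s, []).append((o, (s, p, o)))
        graph.setdefault(o, []).append((s, (s, p, o)))
    return graph


def find_relation(graph, start, goal):
    """Breadth-first search for the shortest chain of triples linking start to goal."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # node -> (parent node, triple used to reach it)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parents to reconstruct the path.
            path = []
            while previous[node] is not None:
                parent, triple = previous[node]
                path.append(triple)
                node = parent
            return list(reversed(path))
        for neighbour, triple in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = (node, triple)
                queue.append(neighbour)
    return None


triples = [
    ("Alice", "sentEmailTo", "Bob"),
    ("Bob", "worksFor", "IISAS"),
    ("Carol", "worksFor", "IISAS"),
]
graph = build_graph(triples)
print(find_relation(graph, "Alice", "Carol"))
# [('Alice', 'sentEmailTo', 'Bob'), ('Bob', 'worksFor', 'IISAS'), ('Carol', 'worksFor', 'IISAS')]
```

Because structured sources (Linked Data) and unstructured ones (email, text) can both be reduced to triples, the same search works across them once they share entity identifiers.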
SemSets: Semantic Search
• Answering list-type questions, e.g. "astronauts who walked on the Moon"
• Wikipedia as text and as a network/graph
  • Text: IR methods, Lucene-based
  • Graph/network: spreading activation and SemSets
• Winning solution in the Semantic Search Challenge 2011
Answers to the example question: Eugene_Cernan, Alan_Bean, David_Scott, John_Young_(astronaut), Neil_Armstrong, Pete_Conrad, Harrison_Schmitt, Alan_Shepard, Charles_Duke, Buzz_Aldrin, James_Irwin, Edgar_Mitchell
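Spreading activation starts with energy on seed nodes and lets it flow along edges, decaying at each step, so that nodes closely connected to the seeds accumulate high scores. The sketch below is a generic version of the technique, not the SemSets method itself; the tiny graph, the `decay` factor of 0.5 and the two steps are hypothetical.

```python
def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Basic spreading activation over an undirected adjacency-list graph.

    Each step, every active node passes decay * activation, split evenly,
    to its neighbours; activation accumulates across steps.
    """
    total = {node: 0.0 for node in graph}
    current = {node: 0.0 for node in graph}
    for seed in seeds:
        current[seed] = 1.0
        total[seed] = 1.0
    for _ in range(steps):
        nxt = {node: 0.0 for node in graph}
        for node, value in current.items():
            neighbours = graph[node]
            if value == 0.0 or not neighbours:
                continue
            share = decay * value / len(neighbours)
            for n in neighbours:
                nxt[n] += share
        for node, value in nxt.items():
            total[node] += value
        current = nxt
    return total


graph = {
    "Moon": ["Apollo_11", "Apollo_12"],
    "Apollo_11": ["Moon", "Neil_Armstrong", "Buzz_Aldrin"],
    "Apollo_12": ["Moon", "Pete_Conrad"],
    "Neil_Armstrong": ["Apollo_11"],
    "Buzz_Aldrin": ["Apollo_11"],
    "Pete_Conrad": ["Apollo_12"],
}
scores = spread_activation(graph, ["Moon"])
for node, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{node}: {score:.4f}")
```

With the seed "Moon", the mission nodes score highest, and an astronaut linked through a mission with fewer neighbours receives a larger share of the activation.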
SGDB: Simple Graph Database
• Storage for graphs
• Optimized for graph traversal and spreading activation
• Faster than Neo4j for graph traversal operations
• Supports the Blueprints API
• https://simplegdb.svn.sourceforge.net/svnroot/simplegdb/Sgdb3
• Graph database benchmarks
  • Graph Traversal Benchmark for Graph Databases
  • http://ups.savba.sk/~marek/gbench.html
  • Blueprints API: makes it possible to test any compliant graph database
Image source: http://geza.kzoo.edu/bionet/html/scalefree.html
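A traversal benchmark of the kind referred to above times operations such as k-hop neighbourhood expansion on a large graph. The sketch below generates a scale-free graph by preferential attachment (Barabási-Albert model) and times a 3-hop breadth-first traversal over a plain in-memory adjacency list; it does not reproduce the linked benchmark, SGDB or the Blueprints API, and the graph size, `m` and `k` are hypothetical.

```python
import random
import time
from collections import deque


def barabasi_albert(n, m, seed=42):
    """Scale-free graph via preferential attachment: each new node links to m existing nodes."""
    rng = random.Random(seed)
    graph = {i: set() for i in range(n)}
    # Start from a small clique of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i):
            graph[i].add(j)
            graph[j].add(i)
    # Each edge endpoint appears once here, so sampling is proportional to degree.
    endpoints = [v for v in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            graph[new].add(t)
            graph[t].add(new)
            endpoints.extend((new, t))
    return graph


def k_hop(graph, start, k):
    """Return the set of nodes reachable from start within k hops (BFS traversal)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen


graph = barabasi_albert(n=10_000, m=3)
start_time = time.perf_counter()
reached = k_hop(graph, start=0, k=3)
elapsed = time.perf_counter() - start_time
print(f"3-hop traversal reached {len(reached)} nodes in {elapsed * 1000:.2f} ms")
```

Running the same traversal against different storage back ends is what makes such a comparison meaningful; a common interface like Blueprints lets one benchmark drive several databases.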
Future Direction: Relation Discovery in Large Graph Data
• Motivation
  • Graph/network data are everywhere: social networks, the web, Linked Data, transactions, communication (email, phone).
  • Text can also be converted into a graph.
  • Interconnecting graph data and searching for relations is crucial.
• Approach
  • Forming semantic trees and graphs from text, the web, communication, databases and Linked Data
  • User interaction with graph data to achieve integration and data cleansing
  • Users will do this if their effort has an immediate impact on search results
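One simple way to convert text into a graph is to link entities that occur in the same sentence, weighting each edge by how often the pair co-occurs. The sketch below assumes the entity list is already known (in practice it could come from an extraction step) and uses naive substring matching and a regex sentence splitter; the example text and entities are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_graph(text, entities):
    """Build a weighted entity graph: two entities are linked once per sentence they share."""
    edges = Counter()
    # Split after sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        present = sorted(e for e in entities if e in sentence)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


text = ("Alice emailed Bob about Hadoop. Bob forwarded it to Carol. "
        "Alice and Bob met Carol.")
entities = ["Alice", "Bob", "Carol", "Hadoop"]
graph = cooccurrence_graph(text, entities)
print(graph.most_common(2))  # [(('Alice', 'Bob'), 2), (('Bob', 'Carol'), 2)]
```

The resulting weighted edges can then be merged with graphs from databases or Linked Data and searched for relations in the same way as any other graph.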