
Primary Research Team & Capabilities



Presentation Transcript


  1. Primary Research Team & Capabilities
     URL: http://ikt.ui.sav.sk
     Dept. of Parallel and Distributed Computing
     Research and Development Areas:
     • Large-scale HPCN, Grid and MapReduce applications
     • Intelligent and Knowledge oriented Technologies
     Experience from IST:
     • 3 projects in FP5: ANFAS, CrossGrid, Pellucid
     • 6 projects in FP6: EGEE II, K-Wf Grid, DEGREE (coordinator), EGEE, int.eu.grid, MEDIGRID
     • 4 projects in FP7: Commius, Admire, Secricom, EGEE III
     Several National Projects (SPVV, VEGA, APVT)
     IKT Group Focus:
     • Information Processing (Large Scale)
     • Graph Processing
     • Information Extraction and Retrieval
     • Semantic Web
     • Knowledge oriented Technologies
     • Parallel and Distributed Information Processing
     Solutions:
     • SGDB: Simple Graph Database
     • gSemSearch: Graph based Semantic Search
     • Ontea: Pattern-based Semantic Annotation
     • ACoMA: KM tool in Email
     • EMBET: Recommendation System
     • Experts on MapReduce and IR (Nutch, Solr, Lucene)
     Director & leader of PDC: Dr. Ladislav Hluchý
     11 November 2011

  2. Approach and Solutions

  3. Large-scale Text and Graph Data Processing
     Underlined are the technologies developed by IISAS
     Core Technology
     • Web crawling
        • Nutch + plugins
     • Full-text indexing and search
        • Lucene, Solr
     • Information Extraction
        • Ontea, GATE
     • All of the above at large scale
        • Hadoop, S4
     • Graph processing and querying
        • Simple Graph Database (SGDB)
        • gSemSearch
        • Neo4j
        • Blueprints

  4. Ontea: Information Extraction Tool
     http://ontea.sf.net
     • Regex patterns
     • Gazetteers
     • Results
        • Key-value pairs
        • Structured into trees
        • Graphs
     • Transformers, configuration
     • Automatic loading of extractors
     • Visual Annotation Tool
     • Integration with external tools
        • GATE, stemmers, Hadoop …
     • Multilingual tests
        • English, Slovak, Spanish, Italian
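
The combination of regex patterns and gazetteers producing key-value pairs can be illustrated with a minimal Python sketch. This is not Ontea's API or its pattern set: the `PATTERNS` and `GAZETTEER` tables and the `extract` function are hypothetical names invented for illustration only.

```python
import re

# Hypothetical patterns: each key (entity type) maps to a regex.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d ]{7,}\d"),
}

# Hypothetical gazetteer: known names mapped to an entity type.
GAZETTEER = {"Bratislava": "city", "Enron": "organization"}


def extract(text):
    """Return a list of (key, value) pairs found in the text."""
    results = []
    # Pattern-based extraction: every regex match becomes a key-value pair.
    for key, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            results.append((key, match.group(0)))
    # Gazetteer lookup: whole-word match against the list of known names.
    for name, key in GAZETTEER.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            results.append((key, name))
    return results


pairs = extract("Contact jane.doe@example.com or +421 2 5941 1111 in Bratislava.")
```

Flat key-value pairs like these could then be grouped into trees or linked into a graph, which is the step the slide lists under results.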

  5. Email Search Prototype
     • Use of Social Network from email
     • Includes extracted objects
     • Full text of extracted objects
     • Related objects discovered and ordered by spread activation on social network graph
     • Faceted search, navigation
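
As a rough illustration of how a social network might be derived from email, the sketch below counts sender-to-recipient edges from message headers using Python's standard `email` module. It is an assumption about the general idea, not the prototype's implementation; `build_social_graph` is a hypothetical name.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def build_social_graph(raw_messages):
    """Count directed sender -> recipient edges across raw RFC 822 messages."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [a.lower() for _, a in getaddresses(msg.get_all("From", []))]
        recipients = [
            a.lower()
            for _, a in getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
        ]
        for sender in senders:
            for recipient in recipients:
                if sender != recipient:  # ignore self-addressed mail
                    edges[(sender, recipient)] += 1
    return edges


messages = [
    "From: Alice <alice@example.com>\nTo: bob@example.com, carol@example.com\n\nHi",
    "From: alice@example.com\nTo: Bob <bob@example.com>\n\nHello again",
]
graph_edges = build_social_graph(messages)
```

The edge counts can serve as weights; objects extracted from message bodies (phone numbers, organizations) could be attached to the same graph as additional nodes.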

  6. gSemSearch: Graph based Semantic Search
     • Graph/network of interacting (interconnected) entities
     • Discovering relations in the graph (network) using the spread of activation algorithm
     • Showing relations of a concrete type, e.g. telephone numbers related to a person
     • Navigation over related entities
     • Full-text search of the entities
     • User interface for search
     • User interaction with data (merging, deleting entities) with immediate impact on discovered relations
     • Tested on the Enron email corpus
     • Email Social Network Search
        • http://ikt.ui.sav.sk/esns/
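
The spread of activation idea can be sketched as follows. This is a generic textbook-style variant, not gSemSearch's actual algorithm: the `decay`, `threshold` and `max_steps` parameters and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, threshold=0.01, max_steps=3):
    """Propagate activation from seed nodes over an adjacency dict.

    graph maps node -> list of neighbor nodes.
    Returns a dict node -> accumulated activation.
    """
    activation = defaultdict(float)
    frontier = {seed: 1.0 for seed in seeds}
    for seed in seeds:
        activation[seed] = 1.0
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            neighbors = graph.get(node, [])
            if not neighbors:
                continue
            # Each node passes a decayed, evenly split share to its neighbors.
            share = energy * decay / len(neighbors)
            if share < threshold:
                continue  # too weak to propagate further
            for neighbor in neighbors:
                next_frontier[neighbor] += share
        if not next_frontier:
            break
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return dict(activation)


example_graph = {
    "alice": ["bob", "phone:555"],
    "bob": ["alice", "carol"],
    "carol": ["bob"],
    "phone:555": ["alice"],
}
scores = spread_activation(example_graph, ["alice"], max_steps=2)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Sorting nodes by accumulated activation gives an ordering of related entities; filtering the ranked list by node type would correspond to showing relations of one concrete type, such as phone numbers related to a person.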

  7. SGDB: Simple Graph Database
     • Storage for graphs
     • Optimized for graph traversal and spread of activation
     • Faster than Neo4j for graph traversal operations
     • Supports Blueprints API
     • https://simplegdb.svn.sourceforge.net/svnroot/simplegdb/Sgdb3
     • Graph Database Benchmarks
        • Graph Traversal Benchmark for Graph Databases
        • http://ups.savba.sk/~marek/gbench.html
        • Blueprints API: possibility to test compliant graph databases
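
To give a sense of what a graph traversal measurement involves, here is a minimal in-memory Python sketch that times a depth-limited breadth-first traversal. It does not use SGDB, Neo4j or the Blueprints API, and it is not the benchmark linked above; `traverse` and `time_traversal` are hypothetical helpers.

```python
import time
from collections import deque


def traverse(graph, start, max_depth):
    """Breadth-first traversal; returns the set of nodes within max_depth hops."""
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return visited


def time_traversal(graph, start, max_depth, repeats=5):
    """Return (best wall-clock seconds, number of nodes reached)."""
    best = float("inf")
    reached = 0
    for _ in range(repeats):
        t0 = time.perf_counter()
        reached = len(traverse(graph, start, max_depth))
        best = min(best, time.perf_counter() - t0)
    return best, reached


# A ring of 10 nodes, each linked to its two neighbors.
ring = {i: [(i + 1) % 10, (i - 1) % 10] for i in range(10)}
seconds, reached = time_traversal(ring, 0, 2)
```

Taking the best of several repeats reduces noise from caching and scheduling; a comparison between databases would run the same traversal against each store through a common interface.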

  8. Future Direction: Relations Discovery in Large Graph Data
     • Motivation
        • Graph/network data are everywhere: social networks, web, LinkedData, transactions, communication (email, phone).
        • Text can also be converted to a graph.
        • Interconnecting graph data and searching for relations is crucial.
     • Approach
        • Forming semantic trees and graphs from text, web, communication, databases and LinkedData
        • User interaction with graph data in order to achieve integration and data cleansing
        • Users will do this if their effort has an immediate impact on search results
