Research Meeting
2009-07-30
Jaeseok Myung
SPARQL Processing with MR Framework
SPARQL Query -> SPARQL Algebra -> MR Framework -> Result

SPARQL Query:

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?mbox
    WHERE {
      ?x foaf:name ?name .
      ?x foaf:mbox ?mbox .
      FILTER (regex(?name, "^Tim") && regex(?mbox, "w3c"))
    }
    ORDER BY ?name
    LIMIT 5

SPARQL Algebra, with the MR job that executes each operator:

    MR_SLI   (slice _ 5
    MR_PRJ     (project (?name ?mbox)
    MR_ORD       (order (?name)
    MR_FIL         (filter (&& (regex ?name "^Tim") (regex ?mbox "w3c"))
    MR_BGP           (bgp
                       (triple ?x <http://xmlns.com/foaf/0.1/name> ?name)
                       (triple ?x <http://xmlns.com/foaf/0.1/mbox> ?mbox))))))

Center for E-Business Technology
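The slide pairs each algebra operator with one MR job, from MR_BGP at the bottom to MR_SLI at the top. A minimal plain-Java sketch (Java 16+, no Hadoop), assuming each operator becomes one job that reads the output of the job below it; the `Op` record and the `jobFor`/`jobChain` methods are hypothetical names, not part of the original framework.

```java
import java.util.ArrayList;
import java.util.List;

public class AlgebraToJobs {
    // Hypothetical algebra node: an operator name and its single child (null for the leaf).
    record Op(String name, Op child) {}

    // Operator -> MR job name, following the labels on the slide.
    static String jobFor(String op) {
        return switch (op) {
            case "slice" -> "MR_SLI";
            case "project" -> "MR_PRJ";
            case "order" -> "MR_ORD";
            case "filter" -> "MR_FIL";
            case "bgp" -> "MR_BGP";
            default -> throw new IllegalArgumentException("unsupported operator: " + op);
        };
    }

    // Walk from the root down to the leaf and prepend each job, so the innermost
    // operator's job comes first: each job consumes the output of the one before it.
    static List<String> jobChain(Op root) {
        List<String> jobs = new ArrayList<>();
        for (Op op = root; op != null; op = op.child()) {
            jobs.add(0, jobFor(op.name()));
        }
        return jobs;
    }

    public static void main(String[] args) {
        Op tree = new Op("slice", new Op("project", new Op("order",
                new Op("filter", new Op("bgp", null)))));
        System.out.println(jobChain(tree));  // prints [MR_BGP, MR_FIL, MR_ORD, MR_PRJ, MR_SLI]
    }
}
```

Because every operator in this query has exactly one child, the tree is a chain and a simple loop suffices; operators with two children (e.g. joins or unions) would need a real tree walk.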
SPARQL Algebra
Each clause of the query maps to one algebra operator:

    SELECT ?name ?mbox                                      -> Projection
    { ?x foaf:name ?name . ?x foaf:mbox ?mbox . }           -> BGP
    FILTER (regex(?name, "^Tim") && regex(?mbox, "w3c"))    -> Filter
    ORDER BY ?name                                          -> OrderBy
    LIMIT 5                                                 -> Slice
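Above the BGP, the remaining operators work on solution mappings rather than triples. The sketch below applies Filter, OrderBy, Projection and Slice to in-memory bindings, innermost first as in the algebra tree. It assumes each binding is a `Map<String, String>` keyed by the variable name without `?`, and it uses Java's `Matcher.find()` as a stand-in for SPARQL `regex` (XPath `fn:matches`), which behaves the same for the two simple patterns here. The class name and sample data are hypothetical.

```java
import java.util.*;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class AlgebraOps {
    // Filter -> OrderBy -> Projection -> Slice, in the order the algebra nests them.
    static List<Map<String, String>> evaluate(List<Map<String, String>> bgpResult) {
        Pattern namePat = Pattern.compile("^Tim");
        Pattern mboxPat = Pattern.compile("w3c");
        return bgpResult.stream()
                // filter: regex(?name, "^Tim") && regex(?mbox, "w3c")
                .filter(b -> namePat.matcher(b.get("name")).find()
                          && mboxPat.matcher(b.get("mbox")).find())
                // order: ORDER BY ?name
                .sorted(Comparator.comparing((Map<String, String> b) -> b.get("name")))
                // project: keep only ?name and ?mbox
                .map(b -> Map.of("name", b.get("name"), "mbox", b.get("mbox")))
                // slice: LIMIT 5
                .limit(5)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical bindings, as if produced by the BGP step.
        List<Map<String, String>> rows = List.of(
                Map.of("x", "_:b1", "name", "Tim B", "mbox", "mailto:tb@w3c.example"),
                Map.of("x", "_:b2", "name", "Tim A", "mbox", "mailto:ta@w3c.example"),
                Map.of("x", "_:b3", "name", "Tom C", "mbox", "mailto:tc@w3c.example"),
                Map.of("x", "_:b4", "name", "Tim D", "mbox", "mailto:td@other.example"));
        evaluate(rows).forEach(System.out::println);
    }
}
```

In an MR setting, Filter and Projection act on one binding at a time and could run map-only, whereas OrderBy and Slice need a global view of the data (for example, a single reducer).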
Basic Graph Pattern

    SELECT ?x ?y ?z
    WHERE {
      ?x type GraduateStudent .          # 1
      ?y type University .               # 2
      ?z type Department .               # 3
      ?x memberOf ?z .                   # 4
      ?z subOrganizationOf ?y .          # 5
      ?x undergraduateDegreeFrom ?y .    # 6
    }

    (project (?x ?y ?z)
      (bgp
        (triple ?x <type> <GraduateStudent>)
        (triple ?y <type> <University>)
        (triple ?z <type> <Department>)
        (triple ?x <memberOf> ?z)
        (triple ?z <subOrganizationOf> ?y)
        (triple ?x <undergraduateDegreeFrom> ?y)))

Dependency between TPs
[Figure: dependency graph over triple patterns 1-6; two patterns are linked when they share a variable: 1-4, 1-6, 4-6 via ?x; 2-5, 2-6, 5-6 via ?y; 3-4, 3-5, 4-5 via ?z]
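The dependency graph can be derived mechanically: two triple patterns are connected when they share a variable, and the shared variable is the join key between them. A minimal sketch, with the variable sets of patterns 1 to 6 written out by hand; a real implementation would extract them from the parsed algebra. The class and method names are hypothetical.

```java
import java.util.*;

public class TpDependency {
    // Variables of triple patterns 1-6, numbered as on the slide.
    static final List<Set<String>> VARS = List.of(
            Set.of("?x"),          // 1: ?x type GraduateStudent
            Set.of("?y"),          // 2: ?y type University
            Set.of("?z"),          // 3: ?z type Department
            Set.of("?x", "?z"),    // 4: ?x memberOf ?z
            Set.of("?z", "?y"),    // 5: ?z subOrganizationOf ?y
            Set.of("?x", "?y"));   // 6: ?x undergraduateDegreeFrom ?y

    // One edge "i-j [shared vars]" for every pair of patterns sharing a variable.
    static List<String> edges(List<Set<String>> vars) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < vars.size(); i++) {
            for (int j = i + 1; j < vars.size(); j++) {
                Set<String> shared = new TreeSet<>(vars.get(i));  // sorted for stable output
                shared.retainAll(vars.get(j));
                if (!shared.isEmpty()) {
                    result.add((i + 1) + "-" + (j + 1) + " " + shared);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        edges(VARS).forEach(System.out::println);
    }
}
```

For this query it yields nine edges, three labeled with each of ?x, ?y and ?z, matching the figure.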
Basic Graph Pattern
(Same query, algebra, and dependency graph as the previous slide.)
Optimization: Dependency => Selectivity Estimation => Ordering
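The slide does not say how selectivity is estimated, so the sketch below substitutes a deliberately simple proxy (an assumption, not the author's estimator): a pattern that would bind fewer new variables is treated as more selective, and after the first pick only patterns sharing an already-bound variable are eligible, which avoids cross products. A real estimator would typically use data statistics such as per-predicate counts. Names are hypothetical.

```java
import java.util.*;

public class TpOrdering {
    // Variables of triple patterns 1-6 from the slide.
    static final List<Set<String>> VARS = List.of(
            Set.of("?x"), Set.of("?y"), Set.of("?z"),
            Set.of("?x", "?z"), Set.of("?z", "?y"), Set.of("?x", "?y"));

    // Greedy ordering; returns 1-based pattern numbers.
    static List<Integer> order(List<Set<String>> vars) {
        List<Integer> chosen = new ArrayList<>();   // 0-based indices, in evaluation order
        Set<String> bound = new HashSet<>();        // variables bound so far
        Set<Integer> remaining = new TreeSet<>();   // ascending, so ties go to the lower number
        for (int i = 0; i < vars.size(); i++) remaining.add(i);

        while (!remaining.isEmpty()) {
            int best = -1;
            long bestCost = Long.MAX_VALUE;
            for (int i : remaining) {
                Set<String> v = vars.get(i);
                // after the first pick, only patterns sharing a bound variable qualify
                boolean connected = chosen.isEmpty() || v.stream().anyMatch(bound::contains);
                if (!connected) continue;
                // selectivity proxy: number of variables this pattern would newly bind
                long cost = v.stream().filter(x -> !bound.contains(x)).count();
                if (cost < bestCost) { best = i; bestCost = cost; }
            }
            if (best == -1) best = remaining.iterator().next();  // disconnected component
            chosen.add(best);
            bound.addAll(vars.get(best));
            remaining.remove(best);
        }
        List<Integer> oneBased = new ArrayList<>();
        for (int i : chosen) oneBased.add(i + 1);
        return oneBased;
    }

    public static void main(String[] args) {
        System.out.println(order(VARS));  // prints [1, 4, 3, 5, 2, 6]
    }
}
```

With ties broken by pattern number, every step after the first joins on at least one already-bound variable.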
Basic Graph Pattern - MR
(Same query, algebra, and dependency graph as the previous slide.)
MapReduce => Parallel & Distributed Processing
MapReduce
• Distributed processing framework
• Proposed for parallel processing of large data sets
• Processing flow
  • Map(k, v) -> list(k', v')
  • Reduce(k', list(v')) -> list(v'')
• [Figure: WordCount MapReduce example]
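To make the Map/Reduce signatures above concrete, here is the WordCount example simulated in a single Java process (no Hadoop): `map` emits (word, 1) pairs, a grouping step plays the role of the framework's shuffle, and `reduce` sums each word's list. The class and method names are illustrative; in Hadoop the same logic would sit in Mapper and Reducer classes.

```java
import java.util.*;

public class WordCountSim {
    // Map(k, v) -> list(k', v'): k = line number (standing in for Hadoop's byte offset),
    // v = line text; emit (word, 1) for each word.
    static List<Map.Entry<String, Integer>> map(long lineNo, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : line.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) out.add(Map.entry(w, 1));
        }
        return out;
    }

    // Reduce(k', list(v')) -> v'': sum the counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) sum += c;
        return sum;
    }

    // Driver: run every map call, group values by key (the shuffle a real
    // framework performs across machines), then call reduce once per key.
    static SortedMap<String, Integer> run(List<String> lines) {
        SortedMap<String, List<Integer>> groups = new TreeMap<>();
        for (int i = 0; i < lines.size(); i++) {
            for (Map.Entry<String, Integer> kv : map(i, lines.get(i))) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        SortedMap<String, Integer> result = new TreeMap<>();
        groups.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the quick fox", "the lazy dog")));
        // prints {dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```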
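MR for BGP
(Hadoop org.apache.hadoop.mapred API. Triple, TriplePattern, patterns, matches and joinKey are placeholders for the slide's pseudocode, not Hadoop classes.)

    public void map(LongWritable key, Text val,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        Triple triple = Triple.parse(val.toString());   // one triple per input line
        for (TriplePattern tp : patterns) {             // triple patterns of the BGP
            if (tp.matches(triple)) {                   // triple satisfies s, p, o of tp
                // i.e., if the triple satisfies a pattern, emit it as a (var, triple) pair
                output.collect(new Text(tp.joinKey(triple)), new Text(triple.toString()));
            }
        }
    }

    public void reduce(Text key, Iterator<Text> vals,
                       OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // ???? (open question; the join on the shared variable is meant to happen here)
    }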
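One way to fill in the open reduce step is a standard reduce-side join. The sketch below is an assumption about how that could look, shown for two patterns only, TP1 (?x type GraduateStudent) and TP4 (?x memberOf ?z). The map step keys each matching triple by the value bound to the join variable ?x (one reading of the slide's "(var, triple)") and tags it with its pattern; the reduce step combines the tagged values that meet under the same key. It runs in a single process, and the tags, names and sample triples are hypothetical.

```java
import java.util.*;

public class BgpJoinSketch {
    record Triple(String s, String p, String o) {}

    // Map: emit (value bound to ?x, tag) for each pattern the triple matches.
    static List<Map.Entry<String, String>> map(Triple t) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        if (t.p().equals("type") && t.o().equals("GraduateStudent")) {
            out.add(Map.entry(t.s(), "TP1"));            // TP1 binds only ?x
        }
        if (t.p().equals("memberOf")) {
            out.add(Map.entry(t.s(), "TP4:" + t.o()));   // TP4 also carries the ?z binding
        }
        return out;
    }

    // Reduce: TP1 binds only ?x, so for this key every TP4 binding is kept
    // whenever a TP1 match is present under the same ?x value.
    static List<String> reduce(String x, List<String> vals) {
        List<String> rows = new ArrayList<>();
        if (!vals.contains("TP1")) return rows;
        for (String v : vals) {
            if (v.startsWith("TP4:")) rows.add("?x=" + x + " ?z=" + v.substring(4));
        }
        return rows;
    }

    // Driver: map every triple, group by key (simulated shuffle), reduce per key.
    static List<String> run(List<Triple> triples) {
        SortedMap<String, List<String>> groups = new TreeMap<>();
        for (Triple t : triples)
            for (Map.Entry<String, String> kv : map(t))
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        List<String> result = new ArrayList<>();
        groups.forEach((x, vals) -> result.addAll(reduce(x, vals)));
        return result;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(   // hypothetical sample triples
                new Triple("s1", "type", "GraduateStudent"),
                new Triple("s2", "type", "GraduateStudent"),
                new Triple("s1", "memberOf", "d1"),
                new Triple("s3", "memberOf", "d2"));
        System.out.println(run(data));  // prints [?x=s1 ?z=d1]
    }
}
```

Keying by the bound value rather than by the variable name is what makes all triples that can join on ?x meet in the same reduce call.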
Basic Graph Pattern - MR
[Figure: Query -> Algebra -> MR_BGP -> Result; triple patterns 1-6 are evaluated by MR_BGP jobs over data stored in the Distributed File System]
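SPARQL Processing with MR Framework
• As the BGP example shows, MR processing of the SPARQL algebra looks very promising
• ToDo
  • Implementation
  • Experiment plan
• How to implement it?
  • Some issues need thought when implementing with MR
  • In the Map phase, partition the triples by variable
  • In the Reduce phase, perform the join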
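The plan above (partition triples by variable in Map, join in Reduce) extends to the whole BGP as a chain of jobs, one join per round. A single-process Java sketch under stated assumptions: bindings are sorted maps from variable to value, every binding in one intermediate result has the same variables (true for a plain BGP), grouping both inputs by the shared-variable values stands in for map plus shuffle, and the per-key nested loop is the reduce-side join. The class, method names, fixed pattern order and sample triples are all hypothetical.

```java
import java.util.*;

public class MultiRoundBgp {
    record Triple(String s, String p, String o) {}
    record TriplePattern(String s, String p, String o) {}  // terms starting with '?' are variables

    // Bindings from matching one triple against one pattern, or null if no match.
    static Map<String, String> match(TriplePattern tp, Triple t) {
        String[][] pairs = {{tp.s(), t.s()}, {tp.p(), t.p()}, {tp.o(), t.o()}};
        Map<String, String> b = new TreeMap<>();
        for (String[] pair : pairs) {
            if (pair[0].startsWith("?")) {
                String prev = b.putIfAbsent(pair[0], pair[1]);
                if (prev != null && !prev.equals(pair[1])) return null;  // repeated var, other value
            } else if (!pair[0].equals(pair[1])) {
                return null;                                             // constant term differs
            }
        }
        return b;
    }

    // Group bindings by the values of the shared (join) variables.
    static Map<String, List<Map<String, String>>> groupBy(
            List<Map<String, String>> rows, Set<String> shared) {
        Map<String, List<Map<String, String>>> groups = new TreeMap<>();
        for (Map<String, String> row : rows) {
            StringBuilder key = new StringBuilder();
            for (String v : shared) key.append(row.get(v)).append('|');
            groups.computeIfAbsent(key.toString(), k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    // One simulated MR round: group both inputs by key, then join within each key.
    static List<Map<String, String>> joinRound(List<Map<String, String>> left,
                                               List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        if (left.isEmpty() || right.isEmpty()) return out;
        Set<String> shared = new TreeSet<>(left.get(0).keySet());
        shared.retainAll(right.get(0).keySet());
        Map<String, List<Map<String, String>>> rg = groupBy(right, shared);
        for (Map.Entry<String, List<Map<String, String>>> e : groupBy(left, shared).entrySet()) {
            for (Map<String, String> l : e.getValue()) {
                for (Map<String, String> r : rg.getOrDefault(e.getKey(), List.of())) {
                    Map<String, String> merged = new TreeMap<>(l);
                    merged.putAll(r);
                    out.add(merged);
                }
            }
        }
        return out;
    }

    // Round 1 matches the first pattern; each later round joins the running
    // result with the next pattern's matches.
    static List<Map<String, String>> evaluate(List<TriplePattern> ordered, List<Triple> data) {
        List<Map<String, String>> result = null;
        for (TriplePattern tp : ordered) {
            List<Map<String, String>> matches = new ArrayList<>();
            for (Triple t : data) {
                Map<String, String> b = match(tp, t);
                if (b != null) matches.add(b);
            }
            result = (result == null) ? matches : joinRound(result, matches);
        }
        return result == null ? new ArrayList<>() : result;
    }

    // Hypothetical sample data using the slide's vocabulary.
    static final List<Triple> DATA = List.of(
        new Triple("g1", "type", "GraduateStudent"), new Triple("g2", "type", "GraduateStudent"),
        new Triple("u1", "type", "University"), new Triple("d1", "type", "Department"),
        new Triple("g1", "memberOf", "d1"), new Triple("g2", "memberOf", "d1"),
        new Triple("d1", "subOrganizationOf", "u1"),
        new Triple("g1", "undergraduateDegreeFrom", "u1"),
        new Triple("g2", "undergraduateDegreeFrom", "u2"));

    // Patterns in the order 1, 4, 3, 5, 2, 6: each shares a variable with an earlier one.
    static final List<TriplePattern> ORDERED = List.of(
        new TriplePattern("?x", "type", "GraduateStudent"),
        new TriplePattern("?x", "memberOf", "?z"),
        new TriplePattern("?z", "type", "Department"),
        new TriplePattern("?z", "subOrganizationOf", "?y"),
        new TriplePattern("?y", "type", "University"),
        new TriplePattern("?x", "undergraduateDegreeFrom", "?y"));

    public static void main(String[] args) {
        System.out.println(evaluate(ORDERED, DATA));  // prints [{?x=g1, ?y=u1, ?z=d1}]
    }
}
```

Each round corresponds to one MR job, so the number of jobs grows with the number of patterns; choosing an order in which every round shares a variable keeps the intermediate results from becoming cross products.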