Research Meeting. 2009-12-28. Jaeseok Myung. Summary: classes (grade entry); undergraduate graduation theses (이승재, 김홍찬); Seoul National University mentoring in progress. Research: SPARQL BGP processing with iterative MR; implementation: HBase; WAIM 2010 (1/29), VLDB 2010 (3/9). How does MR work for triples? Why do we need iterative MRs?
Research Meeting
2009-12-28
Jaeseok Myung
Summary
• Classes (grade entry)
• Undergraduate graduation theses (이승재, 김홍찬)
• Seoul National University mentoring in progress
• Research
• SPARQL BGP Processing with Iterative MR
• Implementation: HBase
• WAIM 2010 (1/29), VLDB 2010 (3/9)
• How does MR work for triples?
• Why do we need iterative MRs?
Center for E-Business Technology
Outline
• Introduction
• Related Work
• BGP Processing with MR
• MR Iteration (why joins cause MR iterations, N-Triples storage structure)
• Naïve Approach (Single-Random)
• Our Approach
• Multi-Greedy Algorithm
• Discussion (edge preserving, performance by type, key selection)
• Experiments
• Environment Settings (Hadoop, LUBM, Complex Query, Amazon EC2, Converter)
• SPARQL Processing Results (varying number of nodes, varying data size)
• Dealing with Intermediate Results (intermediate file I/O is expensive; CGL-MR, MR-Online)
• Conclusion (further research needed on storage structures and indexing that are richer than N-Triples and compressible)
• Reference
MapReduce
[Figure: MapReduce overview. Source: 한재선, SearchDay2008, http://nexr.tistory.com]
How MR works for triples? (1/2)
Query: actors who are married to each other and born in the same place.
SELECT ?a ?b WHERE { ?a dbpedia:spouse ?b . ?a dbpedia:wikilink dbpediares:actor . ?b dbpedia:wikilink dbpediares:actor . ?a dbpedia:placeOfBirth ?c . ?b dbpedia:placeOfBirth ?c }
The five triple patterns are numbered 1 to 5 in query order.
[Figure: the Mapper reads the matching triples (spouse, link, place) and emits them keyed by the bound value: a1 → (1), (2), (4); b1 → (1), (3), (5); c1 → (4), (5); …]
How MR works for triples? (2/2)
Query: actors who are married to each other and born in the same place.
SELECT ?a ?b WHERE { ?a dbpedia:spouse ?b . ?a dbpedia:wikilink dbpediares:actor . ?b dbpedia:wikilink dbpediares:actor . ?a dbpedia:placeOfBirth ?c . ?b dbpedia:placeOfBirth ?c }
[Figure: the Reducer receives one group per key. a1 (1, 2, 4): a1 spouse b1, a1 link actor, a1 place c1. b1 (1, 3, 5): a1 spouse b1, b1 link actor, b1 place c1. c1 (4, 5): a1 place c1, b1 place c1. …]
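The two slides above can be sketched as a single in-memory map/reduce round. This is a minimal illustration, not the HBase/Hadoop implementation mentioned in the summary. The toy triples, the shortened predicate names, and the function names (`match`, `map_phase`, `reduce_phase`, `patterns_per_key`) are all hypothetical. One assumption differs from the figure: keys here are (variable, value) pairs rather than bare values, so that a1 bound to ?a is kept apart from a1 bound to ?b.

```python
from collections import defaultdict

# Toy data mirroring the a1 / b1 / c1 example (hypothetical, shortened names).
TRIPLES = [
    ("a1", "spouse", "b1"),
    ("a1", "link", "actor"),
    ("b1", "link", "actor"),
    ("a1", "place", "c1"),
    ("b1", "place", "c1"),
]

# The five triple patterns, numbered in query order. "?x" marks a variable.
PATTERNS = {
    1: ("?a", "spouse", "?b"),
    2: ("?a", "link", "actor"),
    3: ("?b", "link", "actor"),
    4: ("?a", "place", "?c"),
    5: ("?b", "place", "?c"),
}

def match(pattern, triple):
    """Return the variable bindings if the triple matches, else None."""
    bindings = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if bindings.get(p, t) != t:
                return None
            bindings[p] = t
        elif p != t:
            return None
    return bindings

def map_phase(triples, patterns):
    """Emit ((variable, value), (pattern number, triple)) for each binding."""
    for triple in triples:
        for number, pattern in patterns.items():
            bindings = match(pattern, triple)
            if bindings is None:
                continue
            for variable, value in bindings.items():
                yield (variable, value), (number, triple)

def reduce_phase(pairs):
    """Group mapper output by key, as the shuffle would before the reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def patterns_per_key(groups):
    """Summarise each group as the sorted pattern numbers it contains."""
    return {key: sorted({n for n, _ in values}) for key, values in groups.items()}

if __name__ == "__main__":
    summary = patterns_per_key(reduce_phase(map_phase(TRIPLES, PATTERNS)))
    for key in sorted(summary):
        print(key, summary[key])
```

Because every actor can match both ?a and ?b, the sketch also produces candidate groups such as ("?a", "b1"); those lack a match for pattern 1 and would be discarded when the groups are joined.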
Why do we need iterative MR?
Query: actors who are married to each other and born in the same place.
SELECT ?a ?b WHERE { ?a dbpedia:spouse ?b . ?a dbpedia:wikilink dbpediares:actor . ?b dbpedia:wikilink dbpediares:actor . ?a dbpedia:placeOfBirth ?c . ?b dbpedia:placeOfBirth ?c }
Variables per triple pattern: 1: a|b, 2: a, 3: b, 4: a|c, 5: b|c.
[Figure: the same groups as in the reducer output. a1 (1, 2, 4): a1 spouse b1, a1 link actor, a1 place c1. b1 (1, 3, 5): a1 spouse b1, b1 link actor, b1 place c1. c1 (4, 5): a1 place c1, b1 place c1. …]
Why do we need iterative MR? (continued)
Query: actors who are married to each other and born in the same place.
SELECT ?a ?b WHERE { ?a dbpedia:spouse ?b . ?a dbpedia:wikilink dbpediares:actor . ?b dbpedia:wikilink dbpediares:actor . ?a dbpedia:placeOfBirth ?c . ?b dbpedia:placeOfBirth ?c }
Variables per triple pattern: 1: a|b, 2: a, 3: b, 4: a|c, 5: b|c.
[Figure: panels (a) to (d) show triple patterns labelled by their variables. Labels recoverable from the panels: the query above (a|b, a, b, a|c, b|c); a|b, b|c, c|d; a|b, b|c, c|d, d|e; and a|b, a|c, a|d, a|e, a|f, a|g. The assignment of these label sets to individual panels is not recoverable.]
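One way to see why several MR jobs may be needed is to count join rounds under a simple model. The sketch below assumes that one MR job joins on exactly one variable, and that it picks the variable shared by the most remaining relations. Both the model and the function name `join_rounds` are assumptions made for illustration; this is not the Multi-Greedy algorithm listed in the outline, which the slides do not spell out. Shared variables other than the key are assumed to be checked inside the reducer.

```python
from collections import Counter

def join_rounds(patterns):
    """Return the join variable chosen in each round until one relation is left.

    Each pattern is given as the set of variables it contains.
    Assumption: one MR job joins on a single variable per round.
    """
    relations = [frozenset(p) for p in patterns]
    rounds = []
    while len(relations) > 1:
        counts = Counter(v for rel in relations for v in rel)
        # Greedy choice: most widely shared variable, ties broken by name.
        key, shared = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        if shared < 2:
            raise ValueError("remaining relations share no variable")
        # Every relation containing the key is merged into one result.
        joined = frozenset().union(*(r for r in relations if key in r))
        relations = [r for r in relations if key not in r] + [joined]
        rounds.append(key)
    return rounds

if __name__ == "__main__":
    query = [{"a", "b"}, {"a"}, {"b"}, {"a", "c"}, {"b", "c"}]
    chain = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"d", "e"}]
    star = [{"a", x} for x in "bcdefg"]
    for name, shape in [("query", query), ("chain", chain), ("star", star)]:
        print(name, join_rounds(shape))
```

Under this model the star-shaped label set needs a single round, while the chain and the five-pattern query need more than one, since no single variable occurs in all of their patterns.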
Naïve vs. Our Approach
• Write-up in progress
Outline
• Introduction
• Related Work
• Preliminaries
• BGP Processing with MR
• MR Iteration (why joins cause MR iterations, N-Triples storage structure)
• Naïve Approach (Single-Random)
• Our Approach
• Multi-Greedy Algorithm
• Improvement
• Using Advanced Storage for Selection Task
• Using Selectivity Info. for Minimizing BGP Iteration
• Discussion (edge preserving, performance by type, key selection)
• Experiments
• Environment Settings (Hadoop, LUBM, Complex Query, Amazon EC2, Converter)
• SPARQL Processing Results (varying number of nodes, varying data size)
• Dealing with Intermediate Results (intermediate file I/O is expensive; CGL-MR, MR-Online)
• Conclusion (further research needed on storage structures and indexing that are richer than N-Triples and compressible)
• Reference