OrientX: A Native XML Database System XML Group
Outline • Preliminaries • Architecture and Features • Storage management • Achievement • Conclusion and Future Work
XML • XML document and document tree
<bib>
  <vendor>
    <name>LongMark</name>
    <book isbn="isbn1001">
      <title>C++</title>
      <author>
        <fname>Rose</fname>
        <lname>Smith</lname>
      </author>
      <price>50</price>
    </book>
    <book isbn="isbn1002">
      <title>XML</title>
      <author>
        <fname>Steven</fname>
        <lname>Tom</lname>
      </author>
      <price>80</price>
    </book>
  </vendor>
</bib>
[Figure 1: XML document and document tree. Legend: element node, text node. The tree has root bib with child vendor; vendor has name ("LongMark") and two book nodes; each book has title, author (fname, lname), and price.]
XPath&XQuery • XPath
XPath is a language for addressing parts of an XML document.
Example expressions:
• bib
• /bib/vendor/book
• //book
• bib//book
• //@lang
• /bib/vendor/book[last()]
• //book[price>50]
• //book/title | //book/price
[Figure: document tree of the bib example (bib, vendor, name "LongMark", two book subtrees). Legend: element node, text node.]
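The expressions above can be tried against the bib example. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which implements only a subset of XPath: it has no numeric comparison predicates and no union operator, so those two cases are emulated in plain Python. It illustrates what the expressions select, not how OrientX evaluates them.

```python
import xml.etree.ElementTree as ET

BIB_XML = """<bib>
  <vendor>
    <name>LongMark</name>
    <book isbn="isbn1001">
      <title>C++</title>
      <author><fname>Rose</fname><lname>Smith</lname></author>
      <price>50</price>
    </book>
    <book isbn="isbn1002">
      <title>XML</title>
      <author><fname>Steven</fname><lname>Tom</lname></author>
      <price>80</price>
    </book>
  </vendor>
</bib>"""

root = ET.fromstring(BIB_XML)  # root is the <bib> element

# /bib/vendor/book: child steps, written relative to the <bib> root
books = root.findall("./vendor/book")

# //book: book elements at any depth
all_books = root.findall(".//book")

# /bib/vendor/book[last()]: the last book child of vendor
last_book = root.find("./vendor/book[last()]")

# //book[price>50]: ElementTree cannot compare numbers inside a predicate,
# so the filter is applied in Python instead
expensive = [b for b in all_books if float(b.findtext("price")) > 50]

# //book/title | //book/price: the union is emulated by scanning each
# book's children in document order and keeping the two tags
titles_and_prices = [
    child.text
    for b in all_books
    for child in b
    if child.tag in ("title", "price")
]

print([b.get("isbn") for b in books])
print(last_book.findtext("title"))
print([b.findtext("title") for b in expensive])
print(titles_and_prices)
```

A full XPath engine would accept the original expressions directly; the emulation here only keeps the sketch dependency-free.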
XPath&XQuery • XQuery
FLWOR: "For, Let, Where, Order by, Return"
for $x in doc("bib.xml")/bib/vendor/book
where $x/price>30
order by $x/title
return $x/title
bib.xml:
<bib>
  <vendor>
    <name>LongMark</name>
    <book isbn="isbn1001">
      <title>C++</title>
      <author>
        <fname>Rose</fname>
        <lname>Smith</lname>
      </author>
      <price>50</price>
    </book>
    <book isbn="isbn1002">
      <title>XML</title>
      <author>
        <fname>Steven</fname>
        <lname>Tom</lname>
      </author>
      <price>80</price>
    </book>
  </vendor>
</bib>
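To make the clause order of the FLWOR query concrete, the same selection can be written step by step in Python. This is a sketch under assumptions: it uses the standard xml.etree.ElementTree module instead of an XQuery engine, the function name flwor_titles is hypothetical, and it returns title strings where the XQuery would return title elements.

```python
import xml.etree.ElementTree as ET

BIB_XML = """<bib><vendor><name>LongMark</name>
<book isbn="isbn1001"><title>C++</title><price>50</price></book>
<book isbn="isbn1002"><title>XML</title><price>80</price></book>
</vendor></bib>"""


def flwor_titles(xml_text, min_price):
    """Emulate: for $x in /bib/vendor/book where $x/price > min_price
    order by $x/title return $x/title."""
    root = ET.fromstring(xml_text)
    # for: iterate over the book elements under vendor
    books = root.findall("./vendor/book")
    # where: keep books whose price exceeds the threshold
    selected = [b for b in books if float(b.findtext("price")) > min_price]
    # order by: sort the remaining books by title text
    selected.sort(key=lambda b: b.findtext("title"))
    # return: project each book to its title
    return [b.findtext("title") for b in selected]


print(flwor_titles(BIB_XML, 30))
```

With the threshold 30 from the slide, both books pass the where clause, so only the ordering decides the result.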
XQuery/Update • Insert • Delete • Replace • Replacing a Node • Replacing the Value of a Node • Rename • Transform
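The effect of each update operation can be illustrated on an in-memory tree. This is a sketch assuming Python's standard xml.etree.ElementTree and copy modules, not XQuery/Update syntax or OrientX's implementation. The year element and the values "2007", "Modern C++", "55", "cost", and "ShortMark" are made up for illustration.

```python
import copy
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<vendor><name>LongMark</name>"
    "<book isbn='isbn1001'><title>C++</title><price>50</price></book>"
    "</vendor>"
)
book = doc.find("book")

# Insert: add a new child node to the book
year = ET.SubElement(book, "year")
year.text = "2007"

# Delete: remove that node again
book.remove(book.find("year"))

# Replace a node: swap the whole title element for a new one
new_title = ET.Element("title")
new_title.text = "Modern C++"
title_index = list(book).index(book.find("title"))
book[title_index] = new_title

# Replace the value of a node: keep the element, change its text
book.find("price").text = "55"

# Rename: keep the content, change the element name
book.find("price").tag = "cost"

# Transform: modify a copy and leave the source document untouched
transformed = copy.deepcopy(doc)
transformed.find("name").text = "ShortMark"

print(ET.tostring(doc, encoding="unicode"))
print(ET.tostring(transformed, encoding="unicode"))
```

The first five operations change doc in place, while the transform step only changes the copy, which is the distinction the Transform bullet draws.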
Outline • Introduction of XML • Architecture and Features • Storage management • Achievement • Conclusion and Future Work
Introduction of OrientX • OrientX means: Original RUC IDKE Native XML Database • RUC: Renmin University of China • IDKE: Institute of Data and Knowledge Engineering • Native XML database: exposes a logical model for storing and retrieving XML documents directly (a non-native XML database is, for example, one built on top of a relational database)
System Architecture • [Figure: OrientX 3.0 system architecture]
Features • Full support for XML Schema • Support for XQuery 1.0, XPath 2.0, and XQuery/Update (except transform) • A set of programming APIs • Various native storage techniques • Multiple query processing strategies based on native storage • Friendly UI (Java-based)
History
• OrientX 1.0 (2002-2003): OrientXStore, schema manager, document importing and exporting.
• OrientX 1.5 (2003-2004): XPath execution, XML numbering, and index manager.
• OrientX 2.0 (2004-2005): XQuery execution engine based on navigation.
• OrientX 2.5 (2005-2006): XQuery execution engine based on XML algebra.
• OrientX 3.0 (2006-2007): XQuery/Update: insert, delete, etc.
Outline • Preliminaries • Architecture and Features • Storage management • Achievement • Conclusion and Future Work
Different storage granularities
• Document:
  • Do not decompose the document; build an index on it to navigate the structure.
  • Query complexity and efficiency are limited by the power of the index.
• Sub-tree:
  • Decompose the document into sub-trees according to the storage space partition.
  • Persist the structure within each sub-tree.
  • Saves space.
• Node:
  • Decompose the document into a sequence of nodes, each node corresponding to a type (element, attribute, …).
  • May need too many links to persist the relationships between nodes.
Storage Techniques in OrientX
• DEB (depth-first element based): one element is a record, stored in depth-first order of the tree.
• CEB (clustered element based): one element is a record, but all elements with the same tag name are stored clustered together.
• DSB (depth-first sub-tree based): like DEB, the storage order is depth-first, but each record is a sub-tree whose size is close to the physical page size.
• CSB (clustered sub-tree based): similar to DSB, each record is a sub-tree, but all sub-trees with the same structure are stored clustered together.
Implemented techniques are marked in red.
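Example: Element based • [Figure: a source document tree with nodes r, t1, a1, a2, l1, f1, l2, f2, and its record layout under DEB and under CEB. Node sequences as extracted from the figure: "r t1 l1 f1 a1 l2 f2 a2" and "r t1 l1 l2 f1 f2 a1 a2".]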
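The difference between the two element-based layouts can be sketched in a few lines. Several things here are assumptions rather than facts from the slides: the tree shape (r with children t1, a1, a2; a1 with l1, f1; a2 with l2, f2), reading "depth-first" as pre-order document order, deriving the tag name by stripping the trailing digits from a label, and the function names. OrientX's actual record format is not shown.

```python
# Assumed source tree: each node is (label, [children])
TREE = ("r", [
    ("t1", []),
    ("a1", [("l1", []), ("f1", [])]),
    ("a2", [("l2", []), ("f2", [])]),
])


def deb_order(node):
    """DEB-style layout: one record per element, in depth-first (pre-order) order."""
    label, children = node
    records = [label]
    for child in children:
        records.extend(deb_order(child))
    return records


def tag_of(label):
    """Assumed naming scheme: 'a1' and 'a2' are both elements with tag 'a'."""
    return label.rstrip("0123456789")


def ceb_clusters(node):
    """CEB-style layout: one record per element, grouped by tag name.
    Clusters are created in order of first appearance in document order."""
    clusters = {}
    for label in deb_order(node):
        clusters.setdefault(tag_of(label), []).append(label)
    return clusters


print(deb_order(TREE))
print(ceb_clusters(TREE))
```

Under these assumptions, DEB keeps each author's children next to the author, while CEB places a1 and a2 in one cluster, which favors queries that scan all elements of one tag.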
Example: Subtree based • [Figure: the document (DOC) with nodes r, t1, a1, a2, l1, f1, l2, f2, split into sub-tree records under DSB (depth-first sub-tree based) and CSB (clustered sub-tree based). In DSB, a proxy node (virtual node) marks where a sub-tree was cut out; CSB records also have proxy nodes.]
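One way to picture proxy nodes is a greedy split of the tree into records of bounded size. This is a sketch under stated assumptions, not OrientX's partitioning algorithm: the capacity is counted in nodes rather than bytes of a physical page, proxy nodes are treated as free, a child sub-tree is either stored whole inside the current record or cut out entirely, and the names partition, store, and subtree_size are hypothetical.

```python
# Assumed source tree: each node is (label, [children])
TREE = ("r", [
    ("t1", []),
    ("a1", [("l1", []), ("f1", [])]),
    ("a2", [("l2", []), ("f2", [])]),
])


def subtree_size(node):
    """Number of nodes in the sub-tree rooted at node."""
    _, children = node
    return 1 + sum(subtree_size(c) for c in children)


def partition(root, capacity):
    """Split the tree into records holding at most `capacity` real nodes.
    A child that does not fit is replaced by ("proxy", record_id) and
    stored as the root of its own record."""
    records = []

    def store(node):
        label, children = node
        record_id = len(records)
        records.append(None)          # reserve the slot so ids follow depth-first order
        remaining = capacity - 1      # the record root uses one slot
        stored_children = []
        for child in children:
            size = subtree_size(child)
            if size <= remaining:
                stored_children.append(child)   # fits: keep the whole sub-tree inline
                remaining -= size
            else:
                stored_children.append(("proxy", store(child)))  # cut: leave a proxy
        records[record_id] = (label, stored_children)
        return record_id

    store(root)
    return records


for record_id, record in enumerate(partition(TREE, capacity=4)):
    print(record_id, record)
```

With a capacity of 4 nodes, the root record keeps r and t1 and holds two proxy nodes that point to the records for the a1 and a2 sub-trees. A capacity of 8 or more stores the whole example in one record.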
Outline • Preliminaries • Architecture and Features • Storage management • Achievement • Conclusion and Future Work
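Listed by W3C • We developed OrientX, the first XML database prototype system in China. It was published on the W3C website in May 2004 and has been recognized by international peers.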
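High praise from peers • In November 2006, the chair of the Dagstuhl Seminar on XQuery Implementation Paradigms (Germany) specifically noted in the invitation letter that the research on the OrientX system is an important contribution to work in this field ("Your native XML database system OrientX is clearly recognized as a highly significant contribution in this research area"). • Professor Timos Sellis of the National Technical University of Athens (a renowned database expert and inventor of the R+-tree) contacted us on his own initiative to carry out joint research, which was supported by the Ministry of Science and Technology's China-Greece international cooperation and exchange project "Context-based XML data management research".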
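Published papers • Because of the structural characteristics of XML documents, managing XML data with relational systems suffers from problems such as data redundancy and low query efficiency. To address this, with the goal of building a native XML database, we carried out systematic and in-depth research on XML data storage, encoding, indexing, query algebra, and optimization. Papers were published in conferences and journals including VLDB 2003, SIGMOD 2004, ICDE 2005, DASFAA 2003, WWWJ, and the Journal of Software (软件学报), and have been cited 39 times in international conferences and journals including ICDE 2007, VLDB 2007, WWW 2007, VLDB 2005, IEEE Internet Computing 9(2), SIGMOD WebDB, CIKM 2005, EDBT 2005, and DKE 2005.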
Outline • Preliminaries • Architecture and Features • Storage management • Achievement • Conclusion and Future Work
Conclusion and Future Work • Conclusion: • OrientX is an integrated, schema-based native XML database system. • It implements storage and querying of XML data. • Future work: • XQuery/Update: transform • Further implementation of the XML algebra query engine.
Thanks! Q&A