220 likes | 376 Views
Sedna: A Native XML DBMS. Andrey Fomichev Maxim Grinev Sergey Kuznetsov Institute for System Programming of RAS. SOFSEM 2006 23 January. Agenda. Sedna overview and goals Data organization Memory management Query evaluation Conclusion. Challenges.
E N D
Sedna: A Native XML DBMS Andrey Fomichev Maxim Grinev Sergey Kuznetsov Institute for System Programming of RAS SOFSEM 2006 23 January
Agenda • Sedna overview and goals • Data organization • Memory management • Query evaluation • Conclusion
Challenges Fernandez, M.F., Semeon, J.: Growing XQuery. ECOOP 2003 • Extending XQuery with data update facilities • Growing XQuery to a program language Physical layer for supporting these aspects is required. The layer is primarily based on • Data structures • Memory management
Sedna Overview • Full-featured database system (external and main memory management, query and update facilities, concurrency etc.) • Native XML database • Based on the XQuery language and the XQuery/XPath data model • XUpdate language • Implemented in Scheme and C/C++ • Supported platforms are Windows and Linux
Agenda • Sedna overview and goals • Data organization • Memory management • Query evaluation • Conclusion
Data Organization • Descriptive schema driven storage strategy is used, which consists in clustering nodes of XML document according to their position in descriptive schema • Direct pointers are used to represent relations between nodes of an XML document such as parent, child and sibling relationships
library book title Descriptive Schema (Data Guide) library <library> <book> <title>Foundation on databases</title> <author>Abiteboul</author> <author>Hull</author> <author>Vianu</author> </book> . . . <book> <title>An Introduction to Database Systems</title> <author>Date</author> <issue> <publisher>Addison-Wesley</publisher> <year>2004</year> </issue> </book> <paper> <title>A Relational Model for Large Shared Data Banks</title> <author>Codd</author> <paper> . . . <paper> <title>The Complexity of Relational Query Languages</title> <author>Codd</author> <paper> </library> book paper title author issue title book publisher year /child::library/child::book/child::title
Data Structures title . . . Indirection table parent prev-in-block node handle next-in-block left-sibling label right-sibling children “by descriptive schema”
Structural query efficiency When we answer structural queries like We • Read only blocks containing necessary information and do not read other blocks • Every block, which is being read, does contain only those nodes that are to be in the answer /child::library/child::book/child::title
Node updates efficiency • Node descriptors have fixed size aside the block • Node descriptors are partly ordered • Immutable numbering scheme • Indirection table for parents parent left- sibling right- sibling node indirection table child … child
Agenda • Sedna overview and goals • Data organization • Memory management • Query evaluation • Conclusion
Memory Management • Pointers are used to present relationships between nodes and traversing nodes results in intensive pointer dereferencing, so the dereferencing operation should be effective • Database address space should be big enough to represent large volumes of data OS memory management restrictions • Restriction on the size of address space caused by 32-bit architecture that prevails nowadays • We can’t control the page replacement (swapping) procedure
Layered Address Space (LAS) Transaction process Layered Address Space (layer, addr) addr OS Virtual Process Address Space MapViewOfFile(Windows) mmap (Linux) Buffer Manager Buffer Memory VirtualLock (Windows) mlock (Linux) layer * LAYER_SIZE + addr External Memory (Disk)
Sedna Memory Management Benefits • Emulating 64-bit virtual address space on the standard 32-bit architecture allows removing restrictions on the size of database • Pointer dereferencing in LAS is comparable to dereferencing of ordinary pointer in a low-level programming language because we map the layer to process virtual address space on an equality basis • The same pointer representation in main and secondary memory is used that allows avoiding costly pointer swizzling
Agenda • Sedna overview and goals • Data organization • Memory management • Query evaluation • Conclusion
Query Evaluation Aspects • Suspended element constructors • Different strategies for XPath queries evaluation • Combining Lazy and Strict Semantics
Element constructors • XML element construction requires deep copy of its content (so, the operation is heavy) • Suspended element constructors (the copy is performed on demand when some operation gets into the constructed element)
book year /library/book/issue/ year[.=2004]/../.. Different strategies for XPath queries evaluation /library/book[issue/year=2004] library book paper title author issue title book publisher year
Combining Lazy and Strict Semantics (1) • Iterative result computation (open; next; close) • Iterative result computation with functional programming language give lazy evaluation • On the other hand, strict semantic of a language is more efficient comparing with lazy semantics • So, we combine strict and lazy semantics for XQuery
Combining Lazy and Strict Semantics (2) • Query evaluations starts in lazy mode • Every function call is a reason to switch to strict mode if the sizes of arguments are relatively small • The large input sequence for any physical operation in the strict mode is the subject to switch to lazy mode
Conclusion • Efficient evaluation of structured XPath queries • Local node-level updates • Effective processing of XML data in main memory comparable to general purpose programming language
Thank you for your attention You can find more about Sedna at http://modis.ispras.ru/Development/sedna.htm