Sedna: A Native XML DBMS

Andrey Fomichev, Maxim Grinev, Sergey Kuznetsov
Institute for System Programming of RAS
SOFSEM 2006, 23 January

Agenda
• Sedna overview and goals
• Data organization
• Memory management
• Query evaluation
• Conclusion

Challenges
Fernandez, M.F., Simeon, J.: Growing XQuery. ECOOP 2003
• Extending XQuery with data update facilities
• Growing XQuery into a programming language
A physical layer that supports these aspects is required. The layer is primarily based on
• Data structures
• Memory management

Sedna Overview
• Full-featured database system (external and main memory management, query and update facilities, concurrency, etc.)
• Native XML database
• Based on the XQuery language and the XQuery/XPath data model
• XUpdate language
• Implemented in Scheme and C/C++
• Supported platforms are Windows and Linux

Data Organization
• A descriptive-schema-driven storage strategy is used: nodes of an XML document are clustered according to their position in the descriptive schema
• Direct pointers represent the relationships between nodes of an XML document, such as parent, child, and sibling

Descriptive Schema (Data Guide)
Example document:
  <library>
    <book>
      <title>Foundations of Databases</title>
      <author>Abiteboul</author>
      <author>Hull</author>
      <author>Vianu</author>
    </book>
    . . .
    <book>
      <title>An Introduction to Database Systems</title>
      <author>Date</author>
      <issue>
        <publisher>Addison-Wesley</publisher>
        <year>2004</year>
      </issue>
    </book>
    <paper>
      <title>A Relational Model for Large Shared Data Banks</title>
      <author>Codd</author>
    </paper>
    . . .
    <paper>
      <title>The Complexity of Relational Query Languages</title>
      <author>Codd</author>
    </paper>
  </library>
Descriptive schema of the document:
  library
    book
      title
      author
      issue
        publisher
        year
    paper
      title
      author
Schema path highlighted for the query /child::library/child::book/child::title: library, book, title
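
A descriptive schema can be derived from a document by collecting every distinct label path exactly once. The following is a minimal sketch of that idea, not Sedna's implementation: it uses Python's standard xml.etree.ElementTree, covers element nodes only, and the names SAMPLE and descriptive_schema are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened version of the library document from the slide (assumption: elements only).
SAMPLE = """<library>
  <book><title>Foundations of Databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author></book>
  <book><title>An Introduction to Database Systems</title><author>Date</author>
    <issue><publisher>Addison-Wesley</publisher><year>2004</year></issue></book>
  <paper><title>A Relational Model for Large Shared Data Banks</title>
    <author>Codd</author></paper>
</library>"""


def descriptive_schema(root):
    """Return the set of distinct label paths that occur in the document."""
    paths = set()

    def walk(elem, prefix):
        path = prefix + (elem.tag,)
        paths.add(path)  # each label path is recorded once, however often it occurs
        for child in elem:
            walk(child, path)

    walk(root, ())
    return paths


schema = descriptive_schema(ET.fromstring(SAMPLE))
for path in sorted(schema):
    print("/" + "/".join(path))
```

Although the document contains two book elements and several author elements, each label path appears only once in the result, which is what keeps the schema small relative to the data.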

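Data Structures
[Figure: layout of a node descriptor in a block of title nodes. Fields shown: label, node handle, parent (through the indirection table), left-sibling, right-sibling, prev-in-block, next-in-block, and children pointers organized "by descriptive schema".]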

Structural query efficiency
When we answer structural queries like /child::library/child::book/child::title:
• We read only the blocks that contain the necessary information and do not read other blocks
• Every block that is read contains only nodes that belong to the answer
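
The effect of clustering on structural queries can be illustrated with a small sketch. It assumes a simplified model in which a "block" is just a Python list of nodes keyed by label path; the names cluster_by_schema and answer_child_path are hypothetical, and only child:: steps are handled.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = ET.fromstring(
    "<library>"
    "<book><title>Foundations of Databases</title><author>Abiteboul</author></book>"
    "<book><title>An Introduction to Database Systems</title><author>Date</author></book>"
    "<paper><title>A Relational Model for Large Shared Data Banks</title>"
    "<author>Codd</author></paper>"
    "</library>"
)


def cluster_by_schema(root):
    """Group nodes by their label path; one list stands in for one chain of blocks."""
    blocks = defaultdict(list)

    def walk(elem, prefix):
        path = prefix + (elem.tag,)
        blocks[path].append(elem)  # document order is preserved inside a cluster
        for child in elem:
            walk(child, path)

    walk(root, ())
    return blocks


def answer_child_path(blocks, query):
    """Answer a query made only of child:: steps by reading a single cluster."""
    steps = tuple(step.split("::", 1)[1] for step in query.strip("/").split("/"))
    return blocks.get(steps, [])


BLOCKS = cluster_by_schema(DOC)
titles = answer_child_path(BLOCKS, "/child::library/child::book/child::title")
print([t.text for t in titles])
```

The query touches only the cluster for library/book/title; the clusters holding authors and paper titles are never read.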

Node updates efficiency
• Node descriptors have a fixed size within a block
• Node descriptors are partly ordered
• Immutable numbering scheme
• Indirection table for parents
[Figure: a node with direct pointers to its left sibling, right sibling, and children; the pointer to the parent goes through the indirection table.]
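
Why an indirection table for parents keeps updates local can be sketched as follows. This is a toy model under stated assumptions: addresses are plain integers, the Storage class and its methods are hypothetical, and sibling and child pointers are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NodeDescriptor:
    label: str
    handle: int                   # stable index into the indirection table
    parent_handle: Optional[int]  # parent is referenced by handle, not by address


class Storage:
    """Toy storage: 'memory' maps addresses to descriptors (hypothetical model)."""

    def __init__(self) -> None:
        self.memory: Dict[int, NodeDescriptor] = {}
        self.indirection: List[int] = []  # handle -> current address
        self._next_addr = 0

    def _alloc(self) -> int:
        addr = self._next_addr
        self._next_addr += 1
        return addr

    def create(self, label: str, parent_handle: Optional[int] = None) -> int:
        handle = len(self.indirection)
        addr = self._alloc()
        self.memory[addr] = NodeDescriptor(label, handle, parent_handle)
        self.indirection.append(addr)
        return handle

    def deref(self, handle: int) -> NodeDescriptor:
        return self.memory[self.indirection[handle]]

    def relocate(self, handle: int):
        """Move a descriptor (e.g. on a block split); only one table entry changes."""
        old = self.indirection[handle]
        new = self._alloc()
        self.memory[new] = self.memory.pop(old)
        self.indirection[handle] = new
        return old, new

    def parent_of(self, handle: int) -> Optional[NodeDescriptor]:
        parent = self.deref(handle).parent_handle
        return None if parent is None else self.deref(parent)


store = Storage()
book = store.create("book")
title = store.create("title", book)
author = store.create("author", book)
old_addr, new_addr = store.relocate(book)
print(store.parent_of(title).label, old_addr, new_addr)
```

After the book descriptor moves, neither child descriptor is rewritten; with direct parent pointers, every child would have to be updated.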

Memory Management
• Pointers represent relationships between nodes, and traversing nodes results in intensive pointer dereferencing, so the dereferencing operation must be efficient
• The database address space must be large enough to represent large volumes of data
OS memory management restrictions
• The size of the address space is limited by the 32-bit architecture that prevails nowadays
• We cannot control the page replacement (swapping) procedure

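Layered Address Space (LAS)
[Figure: a transaction process addresses data through the Layered Address Space with pointers of the form (layer, addr). The addr part is an address in the OS virtual process address space, to which buffer memory is mapped with MapViewOfFile (Windows) or mmap (Linux). The buffer manager keeps buffer memory locked with VirtualLock (Windows) or mlock (Linux). The position in external memory (disk) is layer * LAYER_SIZE + addr.]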
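
The address arithmetic of LAS can be sketched in a few lines. This is a simplified simulation, not Sedna's code: no real mmap or MapViewOfFile call is made, the values of LAYER_SIZE and PAGE_SIZE are assumptions, and the names XPtr and LayeredAddressSpace are hypothetical. It only shows that addr is used unchanged as the in-process address, while layer decides which external page must be present there.

```python
from typing import NamedTuple

LAYER_SIZE = 2 ** 30  # assumed size of one layer
PAGE_SIZE = 4096      # assumed page size


class XPtr(NamedTuple):
    layer: int
    addr: int


def to_external_offset(p: XPtr) -> int:
    """Position in external memory: layer * LAYER_SIZE + addr."""
    return p.layer * LAYER_SIZE + p.addr


def from_external_offset(offset: int) -> XPtr:
    layer, addr = divmod(offset, LAYER_SIZE)
    return XPtr(layer, addr)


class LayeredAddressSpace:
    """Simulated process address space with a stand-in for the buffer manager."""

    def __init__(self, disk):
        self.disk = disk    # external offset of a page -> page content
        self.mapped = {}    # in-process page address -> (layer, page content)
        self.faults = 0     # how often the buffer manager had to be called

    def deref(self, p: XPtr) -> int:
        page_addr = p.addr - p.addr % PAGE_SIZE
        entry = self.mapped.get(page_addr)
        if entry is None or entry[0] != p.layer:
            # The page at this address belongs to another layer (or none is mapped):
            # ask the buffer manager to bring in the right one.
            self.faults += 1
            page = self.disk[to_external_offset(XPtr(p.layer, page_addr))]
            entry = (p.layer, page)
            self.mapped[page_addr] = entry
        return entry[1][p.addr - page_addr]


disk = {
    to_external_offset(XPtr(0, 0)): bytearray(b"A" * PAGE_SIZE),
    to_external_offset(XPtr(1, 0)): bytearray(b"B" * PAGE_SIZE),
}
las = LayeredAddressSpace(disk)
print(chr(las.deref(XPtr(0, 5))), chr(las.deref(XPtr(1, 5))), las.faults)
```

In the common case where the right layer is already mapped, a dereference costs one comparison plus an ordinary memory access.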

Sedna Memory Management Benefits
• Emulating a 64-bit virtual address space on the standard 32-bit architecture removes the restrictions on the size of the database
• Pointer dereferencing in LAS is comparable in cost to dereferencing an ordinary pointer in a low-level programming language, because a layer is mapped to the process virtual address space on an equality basis (the addr part is used unchanged as the process address)
• The same pointer representation is used in main and secondary memory, which avoids costly pointer swizzling

Query Evaluation Aspects
• Suspended element constructors
• Different strategies for XPath query evaluation
• Combining lazy and strict semantics

Element constructors
• XML element construction requires a deep copy of the element's content, so the operation is expensive
• Suspended element constructors: the copy is performed on demand, when some operation accesses the content of the constructed element
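
The idea of a suspended constructor can be sketched as a wrapper that holds references to the source nodes and copies them only when an operation needs the constructed element as a real node. The class SuspendedElement and its methods are hypothetical, and the sketch assumes that serializing the result does not require a copy.

```python
import copy
import xml.etree.ElementTree as ET


class SuspendedElement:
    """A constructed element whose deep copy is deferred until it is needed."""

    def __init__(self, tag, content):
        self.tag = tag
        self._content = list(content)  # references to existing nodes, no copy yet
        self._materialized = None

    def materialize(self):
        """Perform the deep copy on first demand and cache the result."""
        if self._materialized is None:
            elem = ET.Element(self.tag)
            elem.extend([copy.deepcopy(node) for node in self._content])
            self._materialized = elem
        return self._materialized

    def serialize(self):
        """Output needs no copy: the source nodes are only read."""
        inner = "".join(ET.tostring(n, encoding="unicode") for n in self._content)
        return "<{0}>{1}</{0}>".format(self.tag, inner)


source = ET.fromstring("<book><title>T</title></book>")
result = SuspendedElement("result", [source[0]])
print(result.serialize())
```

A query that only returns the constructed element to the client never pays for the copy; a query that navigates into it or modifies it triggers materialize() once.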

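Different strategies for XPath query evaluation
• /library/book/issue/year[.=2004]/../..
• /library/book[issue/year=2004]
[Figure: the descriptive schema of the library document (library; book with title, author, and issue; issue with publisher and year; paper), with the book and year schema nodes marked.]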
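
The two formulations return the same books but suggest different evaluation orders. The sketch below evaluates both by hand over a small document, assuming a parent map stands in for stored parent pointers and that comparing the text "2004" is an acceptable simplification of the XPath comparison; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<library>"
    "<book><title>Foundations of Databases</title><author>Abiteboul</author></book>"
    "<book><title>An Introduction to Database Systems</title><author>Date</author>"
    "<issue><publisher>Addison-Wesley</publisher><year>2004</year></issue></book>"
    "<paper><title>A Relational Model for Large Shared Data Banks</title>"
    "<author>Codd</author></paper>"
    "</library>"
)

# ElementTree keeps no parent pointers, so build a child -> parent map.
PARENT = {child: parent for parent in DOC.iter() for child in parent}


def books_top_down(root):
    """/library/book[issue/year=2004]: visit every book, then test the predicate."""
    return [
        book
        for book in root.findall("book")
        if any(year.text == "2004" for year in book.findall("issue/year"))
    ]


def books_bottom_up(root):
    """/library/book/issue/year[.=2004]/../..: start at year nodes, climb twice."""
    result = []
    for year in root.findall("book/issue/year"):
        if year.text == "2004":
            book = PARENT[PARENT[year]]
            if book not in result:  # a book with several matching issues counts once
                result.append(book)
    return result


print([b.find("title").text for b in books_bottom_up(DOC)])
```

With schema-driven clustering, the second strategy can start directly from the year nodes and follow parent pointers, which may touch far fewer nodes when few books have an issue.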

Combining Lazy and Strict Semantics (1)
• Iterative result computation (open; next; close)
• Iterative result computation combined with a functional programming language gives lazy evaluation
• On the other hand, the strict semantics of a language is more efficient than lazy semantics
• So we combine strict and lazy semantics for XQuery

Combining Lazy and Strict Semantics (2)
• Query evaluation starts in lazy mode
• Every function call is a reason to switch to strict mode if the sizes of the arguments are relatively small
• A large input sequence for any physical operation in strict mode is a reason to switch to lazy mode
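
One way to picture the switch between modes is a helper that inspects a lazily produced argument and materializes it only if it turns out to be small. This is an illustrative sketch, not the rule Sedna applies: the threshold value and the name evaluate_argument are hypothetical.

```python
from itertools import chain, islice

THRESHOLD = 4  # hypothetical limit for "relatively small" arguments


def evaluate_argument(lazy_seq, threshold=THRESHOLD):
    """Return ("strict", list) for a small sequence, ("lazy", iterator) otherwise."""
    it = iter(lazy_seq)
    head = list(islice(it, threshold + 1))  # read at most threshold + 1 items
    if len(head) <= threshold:
        return "strict", head               # small: materialize the whole sequence
    return "lazy", chain(head, it)          # large: keep iterating on demand


mode_small, small = evaluate_argument(iter(range(3)))
mode_large, large = evaluate_argument(x * x for x in range(10 ** 9))
print(mode_small, small, mode_large, list(islice(large, 3)))
```

The large sequence is never fully computed: only the first few items are pulled to make the decision, and they are put back in front of the remaining iterator.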

Conclusion
• Efficient evaluation of structural XPath queries
• Local node-level updates
• Efficient processing of XML data in main memory, comparable to processing in a general-purpose programming language

Thank you for your attention
You can find more about Sedna at http://modis.ispras.ru/Development/sedna.htm
