1 / 22

Sedna: A Native XML DBMS

Sedna: A Native XML DBMS. Andrey Fomichev Maxim Grinev Sergey Kuznetsov Institute for System Programming of RAS. SOFSEM 2006 23 January. Agenda. Sedna overview and goals Data organization Memory management Query evaluation Conclusion. Challenges.

oni
Download Presentation

Sedna: A Native XML DBMS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Sedna: A Native XML DBMS Andrey Fomichev Maxim Grinev Sergey Kuznetsov Institute for System Programming of RAS SOFSEM 2006 23 January

  2. Agenda • Sedna overview and goals • Data organization • Memory management • Query evaluation • Conclusion

  3. Challenges Fernandez, M.F., Semeon, J.: Growing XQuery. ECOOP 2003 • Extending XQuery with data update facilities • Growing XQuery to a program language Physical layer for supporting these aspects is required. The layer is primarily based on • Data structures • Memory management

  4. Sedna Overview • Full-featured database system (external and main memory management, query and update facilities, concurrency etc.) • Native XML database • Based on the XQuery language and the XQuery/XPath data model • XUpdate language • Implemented in Scheme and C/C++ • Supported platforms are Windows and Linux

  5. Agenda • Sedna overview and goals • Data organization • Memory management • Query evaluation • Conclusion

  6. Data Organization • Descriptive schema driven storage strategy is used, which consists in clustering nodes of XML document according to their position in descriptive schema • Direct pointers are used to represent relations between nodes of an XML document such as parent, child and sibling relationships

  7. library book title Descriptive Schema (Data Guide) library <library> <book> <title>Foundation on databases</title> <author>Abiteboul</author> <author>Hull</author> <author>Vianu</author> </book> . . . <book> <title>An Introduction to Database Systems</title> <author>Date</author> <issue> <publisher>Addison-Wesley</publisher> <year>2004</year> </issue> </book> <paper> <title>A Relational Model for Large Shared Data Banks</title> <author>Codd</author> <paper> . . . <paper> <title>The Complexity of Relational Query Languages</title> <author>Codd</author> <paper> </library> book paper title author issue title book publisher year /child::library/child::book/child::title

  8. Data Structures title . . . Indirection table parent prev-in-block node handle next-in-block left-sibling label right-sibling children “by descriptive schema”

  9. Structural query efficiency When we answer structural queries like We • Read only blocks containing necessary information and do not read other blocks • Every block, which is being read, does contain only those nodes that are to be in the answer /child::library/child::book/child::title

  10. Node updates efficiency • Node descriptors have fixed size aside the block • Node descriptors are partly ordered • Immutable numbering scheme • Indirection table for parents parent left- sibling right- sibling node indirection table child … child

  11. Agenda • Sedna overview and goals • Data organization • Memory management • Query evaluation • Conclusion

  12. Memory Management • Pointers are used to present relationships between nodes and traversing nodes results in intensive pointer dereferencing, so the dereferencing operation should be effective • Database address space should be big enough to represent large volumes of data OS memory management restrictions • Restriction on the size of address space caused by 32-bit architecture that prevails nowadays • We can’t control the page replacement (swapping) procedure

  13. Layered Address Space (LAS) Transaction process Layered Address Space (layer, addr) addr OS Virtual Process Address Space MapViewOfFile(Windows) mmap (Linux) Buffer Manager Buffer Memory VirtualLock (Windows) mlock (Linux) layer * LAYER_SIZE + addr External Memory (Disk)

  14. Sedna Memory Management Benefits • Emulating 64-bit virtual address space on the standard 32-bit architecture allows removing restrictions on the size of database • Pointer dereferencing in LAS is comparable to dereferencing of ordinary pointer in a low-level programming language because we map the layer to process virtual address space on an equality basis • The same pointer representation in main and secondary memory is used that allows avoiding costly pointer swizzling

  15. Agenda • Sedna overview and goals • Data organization • Memory management • Query evaluation • Conclusion

  16. Query Evaluation Aspects • Suspended element constructors • Different strategies for XPath queries evaluation • Combining Lazy and Strict Semantics

  17. Element constructors • XML element construction requires deep copy of its content (so, the operation is heavy) • Suspended element constructors (the copy is performed on demand when some operation gets into the constructed element)

  18. book year /library/book/issue/ year[.=2004]/../.. Different strategies for XPath queries evaluation /library/book[issue/year=2004] library book paper title author issue title book publisher year

  19. Combining Lazy and Strict Semantics (1) • Iterative result computation (open; next; close) • Iterative result computation with functional programming language give lazy evaluation • On the other hand, strict semantic of a language is more efficient comparing with lazy semantics • So, we combine strict and lazy semantics for XQuery

  20. Combining Lazy and Strict Semantics (2) • Query evaluations starts in lazy mode • Every function call is a reason to switch to strict mode if the sizes of arguments are relatively small • The large input sequence for any physical operation in the strict mode is the subject to switch to lazy mode

  21. Conclusion • Efficient evaluation of structured XPath queries • Local node-level updates • Effective processing of XML data in main memory comparable to general purpose programming language

  22. Thank you for your attention You can find more about Sedna at http://modis.ispras.ru/Development/sedna.htm

More Related