1 / 36

Analysis and Evaluation of a Native XML Database

Analysis and Evaluation of a Native XML Database. Kenneth H. Wenker, Ph.D. May 19, 2003 Dr. C. Edward Chow, Advisor University of Colorado, Colorado Springs. Outline of the Presentation. 1. The nature of this masters project 2. XML and databases 3. NeoCore XML database architecture

chapa
Download Presentation

Analysis and Evaluation of a Native XML Database

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Analysis and Evaluation of a Native XML Database Kenneth H. Wenker, Ph.D. May 19, 2003 Dr. C. Edward Chow, Advisor University of Colorado, Colorado Springs

  2. Outline of the Presentation 1. The nature of this masters project 2. XML and databases 3. NeoCore XML database architecture 4. Testing architecture 5. Testing results 6. Conclusion

  3. PART 1: The nature of this masters project

  4. Goals of the Master Project • To analyze and explain some significant architectural features of the NeoCore XML database, specifically those that have been patented and are therefore public • To test the performance of those features as they have been implemented in v. 2.6 for Windows • To create benchmarks for evaluating the XML database performance

  5. PART 2: XML and databases

  6. What is an XML Database? • information = data + context • THE CHALLENGE: with XML, not only the data, but also the context, can vary. XML is inherently extensible. • A traditional RMDBS is not inherently extensible. • An “XML-enabled” database is an RMDBS which has been modified in an attempt to handle XML extensibility. • XML-enabled databases need to deal with both storing and reconstructing the tag structure. • A “native XML database” is specifically designed to process XML documents efficiently • Tamino: must provide some sort of structure definition • dbXML: weak doing updates and inserts

  7. USES FOR AN XML DATABASE • e-commerce—since we are frequently using XML to transport data, it would be efficient to store it that way, provided we can store it, manipulate it, and retrieve it easily. • When the tag structures are frequently and unpredictably changing, as in, for example, DNA research. • When the tag structures are more important than the data, as in (perhaps) a next-generation search engine.

  8. XML Database Benchmarks • No authoritative group has recognized any benchmark for XML databases • There are five academic benchmarks • The Michigan Benchmark (Univ. of Michigan) • XOO7 (Nat’l Univ. of Singapore) • XBench (Univ. of Waterloo, Canada) • XMach1 (Univ. of Leipzig) • XMark (National Research Institute for Mathematics and Computer Science in the Netherlands) • The Michigan Benchmark is designed to allow developers to tune databases. The others are usable for this project. • None of them provides an adequate test of extensibility, in my opinion. The extensibility they allow is within the bounds of predefined schema.

  9. XOO7 BENCHMARK • Document Structure: <Module> <Manual>Text of Configurable Length</Manual> <ComplexAssembly> <ComplexAssembly> <BaseAssembly> <CompositePart> <Document>Text of Configurable Length</Document> <Connection> <AtomicPart> • Typical Configuration File: NumAssmPerAssm 3NumCompPerAssm 3NumCompPerModule 50NumAssmLevels 5NumAtomicPerComp 200NumConnPerAtomic 3DocumentSize 1000ManualSize 3800

  10. PART 3: NeoCore XML database architecture

  11. CLIENT SERVER Console (runs on client’s browser) Used for administration Used for manual input NeoCore XMS HTTP DB User Application API C++, Java, VB, .COM, or create your own HTTP HTTP CLIENT/SERVER OVERVIEW

  12. Icon Gene-rator Set of Indices Map Store Tag Dictionary Data Dictionary Figure 2-2. Overview of NeoCore Database Architecture OVERVIEW OF NEOCORE STORAGE ARCHITECTURE

  13. THE “ICON” • It is a transform of a string of indefinite length to a n-bit binary number. • Similar to a hash or CRC code. • The width of the icon is typically 64 bits. The first part of the icon, depending on the number of rows in the index, is used to specify which row of an index is to be accessed. The rest is used as a “confirmer” to be explained shortly. • NeoCore uses a patented technique to create the icon: to create a 64-bit icon you would access four 256-row tables and XOR the resultant values. The technique allows processing 32 bits at a time from the input string.

  14. AN EXAMPLE: 1. The XML Document <my-phone-book> <entry> <name>me</name> <nmbr>6198654227</nmbr> </entry> </my-phone-book>

  15. AN EXAMPLE: 2. The Data Dictionary   me  6198654227                                                                                                                                                                                          

  16. AN EXAMPLE: 3. The Tag Dictionary   my-phone-book>entry>name>   entry>name>   name>   my-phone-book>entry>nmbr>   entry>nmbr>   nmbr>                                                                                                                                                                                                                                                                                                                                                                               

  17. AN EXAMPLE: 4. The Map Store

  18. AN EXAMPLE: 5. An Empty Data Core Index 1000 0111 0110 0101 0100 0011 0010 0001 0000

  19. AN EXAMPLE: 6. The Data Core Index, 1 entry 1000 0111 0110 0101 0100 0011 0010 0001 0000

  20. AN EXAMPLE: 7. The Data Core Index, 2 entries 1000 0111 0110 0101 0100 0011 0010 0001 0000

  21. Dupe Index Map Store Data Dictionary Icon Gen-erator Index “Smith” “Smith” Figure 6-2. Pointers in the Duplicate Index for a Specific Data Item A TYPICAL QUERY for $a in document(“myPhoneBook.xml”)//[last=“Smith”] return count($a) “last>Smith”

  22. KEY ARCHITECTURAL INNOVATIONS • Separate Tag Store and Data Store • No-string Map Store • Multi-Table Icon Generator • 1.5 Look-ups Core Indices • Use of Duplicate Indices

  23. PART 4: Testing architecture

  24. TESTING PHILOSOPHY • Interested only in the core database, not in the broader XMS • For this round of testing, did not evaluate the ability to handle huge amounts of data • A good test of the NeoCore architecture requires testing it with several different kinds of XML documents, as shown on the next page.

  25. DOCUMENT TYPES USED

  26. CLIENT SERVER Console NeoCore XMS Docu- ment Gene- ration Query Gene- ration HTTP KornShell Tools DB OS File System NeoCore API Query Tool HTTP Store Tool TESTING ARCHITECTURE Client and server were run on the same Dell 350 platform, using Windows XP Professional, a 3.0 GHz CPU, and 1.50 GB of RAM.

  27. NINE TESTS 1. Disk space used 2. Flattening & restoring 3. Storage speed 4. Query accuracy 5. Query speed 6. Full core-index check 7. Extensibility test 8. Insertion test 9. Non-indexed string searches

  28. PART 5: Testing results

  29. DISK STORAGE USED Original XOO7 Benchmark Mean Results: approx 3.0:1 NeoCore XOO7 Benchmark Results: approx 2.5:1

  30. Storage Time • Mean time to store a “small3.config” XOO7 Document was slightly over 21 seconds. • The XOO7 team reported the time to prepare or convert the XML document so that it could be stored. The mean time for a “small3.config” document was about 5 minutes. They did not report how much time it subsequently took to store the document. • Our testing: Dell 350, Windows XP Professional, 3GHz CPU, 1.5GB RAM. XOO7 original testing: SunOS 5.7 Unix, 333MHz, 256MB RAM

  31. QUERY TIME

  32. FINDINGS--STRENGTHS • Handles extensibility well—no difference between extensibility documents and XOO7 documents • Indexed searches very fast (if enough RAM)—almost 300 times faster than the original XOO7 findings for some queries (although some of that difference is due to difference in platform capabilities) • Disk requirements about the same as needed by the XOO7 team just for their conversion documents • Efficiency remains even when the core indices are full—effective collision management

  33. FINDINGS--WEAKNESSES • In some cases, we do not get out exactly what was put in: • White-Space inaccuracies • Some dropped comments • Some DOCTYPE information is dropped or enclosed in database-created “<prolog>” tags • Duplicate indices can be a bottleneck • Core indices do not scale easily if expansion is needed

  34. PART 6: Conclusion

  35. FUTURE RESEARCH NEEDS • Conduct this testing on a more heavy-duty platform using more robust data sets. • Test future releases—3.0 to have significant changes. • Analyze the patent for new duplicate indices once it is published on the USPTO site. • Publish an article on the (currently unimplemented) non-indexed substring search algorithm. • Do the same testing with NeoCore competitors, if they will support it (Tamino would not). • Publish related articles in IT journals.

  36. Conclusion • On each key performance indicator the NeoCore XMS is as good or better than the initial results reported by the XOO7 team. • It handles the extensibility of XML well—its primary strength. • To realize full performance, run with enough accessible RAM to hold all indices and still have plenty of room for buffers. • The NeoCore database remains a work in progress: significant improvements are being planned for future releases. • Acceptance likely to depend more on the broader management system than on the database at its heart.

More Related