A 1 Cycle-Per-Byte XML Accelerator

Presentation Transcript


  1. A 1 Cycle-Per-Byte XML Accelerator • Zefu Dai, Nick Ni and Jianwen Zhu • Presented by Zefu Dai • University of Toronto

  2. What is XML • Extensible Markup Language • A platform-independent tool for data exchange and representation • Widely used in: • Web services • Database systems • Scientific applications • …

  3. Performance Threat: XML Parsing • 70 minutes to load a 3 GB XML file, 26x slower than loading plain text • >1 s per bank transaction; how many transactions per day? • An average of 175 K instructions to parse 1 KB of XML data (IBM XML4C) • With network speeds reaching tens of Gbps, XML parsing rather than the network becomes the performance bottleneck

  4. Previous work • Cycle Per Byte (CPB) = average number of cycles needed to process each byte of XML data • Multi-core Acceleration • Requires a pre-parsing pass, done sequentially • 30 CPB on a 4-core processor • SIMD Acceleration • Without in-memory tree construction and validation • 6-15 CPB • Hardware Accelerator • Most commercial products do not reveal performance metrics or design details • 10-40 CPB

  5. Our Design • Causes of the parsing slowdown • Text-based data stream • Variable-length string comparison • Poor memory performance due to streaming and memory back-tracing • An XML parsing accelerator implemented in FPGA • Fixed-length string operations • Optimized circuits for string comparison • Stallable pipeline optimized for the common case • Data structures for high-bandwidth on-chip memory • Achieves 1 CPB processing speed and saturates a 1 Gbps Ethernet link while running at 125 MHz
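
The link between clock frequency, CPB and line rate in the last bullet can be checked with a few lines of arithmetic. This is only a sanity check of the stated figures; the helper name `throughput_gbps` is illustrative and does not come from the slides.

```python
def throughput_gbps(clock_hz, cycles_per_byte):
    """Sustained throughput in Gbps for a design that spends
    `cycles_per_byte` clock cycles on every input byte."""
    bytes_per_second = clock_hz / cycles_per_byte
    return bytes_per_second * 8 / 1e9  # 8 bits per byte, 1e9 bits per Gbit


# Figures from the slide: 125 MHz clock, 1 cycle per byte.
print(throughput_gbps(125e6, 1))  # 1.0, i.e. a full 1 Gbps Ethernet link
```

At a fixed clock, throughput scales inversely with CPB, which is why a low CPB lets a modest 125 MHz design keep up with the link.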

  6. Outline • Background • High-level architecture • Design Details • Evaluation

  7. Tasks of an XML Parser • Well-formed Checking • Check whether the document conforms to XML syntax rules • Schema Validation • Check whether the document conforms to the XML semantic rules specified in DTD or Schema files • DOM Construction • Capture the parent-child relationships between elements and attributes and store them in memory in Document Object Model (DOM) format

  8. Well-formed Checking example • Has a unique root element

  9. Well-formed Checking example • Has a unique root element • Elements must be closed and nested properly

  10. Well-formed Checking example • Has a unique root element • Elements must be closed and nested properly • Unique attributes within an element • …
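
The three rules listed above can be tried out with an ordinary software parser. The sketch below uses Python's standard `xml.etree.ElementTree` (a software parser, not the accelerator described in this talk); the function name `is_well_formed` and the sample documents are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if `document` parses as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# One valid document and one violation per rule from the slides.
samples = {
    "well-formed": "<a><b x='1'/></a>",
    "two root elements": "<a/><b/>",
    "improper nesting": "<a><b></a></b>",
    "unclosed element": "<a><b></a>",
    "duplicate attribute": "<a x='1' x='2'/>",
}

for label, doc in samples.items():
    print(f"{label}: {is_well_formed(doc)}")
```

Only the first sample is accepted; each of the others breaks exactly one of the rules on the slide.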

  11. XML Schema Example • Specify permitted child elements/attributes

  12. XML Schema Example • Specify permitted child elements/attributes • Specify type of content

  13. XML Schema Example • Specify permitted child elements/attributes • Specify type of content • Specify occurrence limit • …

  14. DOM Construction • Create an in-memory tree structure for the XML document • Provide application access through tree operations
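
As a software-side illustration of what "access through tree operations" means, the sketch below builds a DOM with Python's standard `xml.dom.minidom` and then reads and modifies it. The sample document mirrors the `<Elem attr='xyz'>` example used later in the deck; the added `child` element is purely illustrative.

```python
from xml.dom import minidom

# Parse a small document into an in-memory DOM tree.
doc = minidom.parseString("<Elem attr='xyz'>content</Elem>")
root = doc.documentElement

# Read access through tree operations.
print(root.tagName)               # Elem
print(root.getAttribute("attr"))  # xyz
print(root.firstChild.data)       # content

# Write access: append a new child element to the root.
child = doc.createElement("child")
root.appendChild(child)
print(root.toxml())               # <Elem attr="xyz">content<child/></Elem>
```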

  15. Outline • Background • High-level architecture • Design Details • Evaluation

  16. Top Level Diagram

  17. Top Level Diagram <Elem attr='xyz'> content </Elem>

  18. Top Level Diagram <Elem attr='xyz'>content</Elem>

  19. Top Level Diagram <Elem attr='xyz'> content </Elem>

  20. Top Level Diagram [diagram labels: Elem, attr, xyz, content]

  21. Top Level Diagram [diagram labels: rule name, rule content, H(Elem), H(attr), Elem, attr, xyz, content]

  22. Top Level Diagram [diagram labels: rule name, rule content, Elem, attr, xyz, content]

  23. Outline • Background • High-level architecture • Design Details • Evaluation

  24. Recurring Idioms (Dwarfs) • Identified 3 recurring computational idioms (referred to as Dwarfs) • One-to-one String Matching • One-to-many String Membership Test • One-to-many String Search • These idioms are one of the major reasons for low performance

  25. Dwarf I: One-to-one String Matching • Tests whether a subject string equals a reference string • Example: correct nesting • The string is variable-length • Not efficient on conventional architectures • Solution: memory stack • Convert variable-length string comparison to fixed-length character comparison
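
One way to picture the memory-stack idea is a flat character memory plus a stack of start offsets, so that a close tag is checked one character at a time as it streams in. The slides do not show the actual circuit, so the sketch below is an assumption about the general approach; the class `TagStack` and its methods are hypothetical names.

```python
class TagStack:
    """Software model of a character-level stack for nesting checks.

    Hypothetical sketch: the slides only name the technique.
    """

    def __init__(self):
        self.chars = []   # flat character memory holding all open tag names
        self.starts = []  # start offset of each open tag name in `chars`

    def open_tag(self, name):
        self.starts.append(len(self.chars))
        self.chars.extend(name)  # one character stored per step

    def close_tag(self, name):
        if not self.starts:
            return False  # close tag with no open element
        start = self.starts.pop()
        matched = len(name) == len(self.chars) - start
        for offset, ch in enumerate(name):
            if not matched:
                break
            # One fixed-width (single character) comparison per input character.
            matched = self.chars[start + offset] == ch
        del self.chars[start:]  # pop the stored name
        return matched


stack = TagStack()
stack.open_tag("a")
stack.open_tag("b")
print(stack.close_tag("b"))  # True: innermost open element is <b>
print(stack.close_tag("a"))  # True: then <a>
```

Each step compares exactly one character against one memory location, regardless of how long the tag names are.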

  26. Dwarf II: One-to-many String Membership Test • Tests whether a subject string equals any member of a set of reference strings • Example: unique attributes within an element • String comparison against all previously seen attributes belonging to the same element • Expensive memory back-tracing • Solution: Bloom Filter • Achieved in one memory lookup

  27. Dwarf III: One-to-many String Search • “Finds” a subject string among a set of reference strings (as opposed to just “testing” for it) • Example: search for the corresponding schema rule • String comparison against all candidates • Non-deterministic lookup time • Solution: Balanced Routing Table scheme • Achieved in one memory lookup

  28. Dwarf II: Bloom Filter • Example: attribute name uniqueness checking • Common case: the attribute name is unique • Filter out obvious cases using a Bloom Filter • Look up a bit array instead of comparing strings • Uncommon case: the attribute name may already exist • Stall the entire design • Do all necessary string comparisons to confirm whether the incoming string already exists • Assumption: low occurrence rate (but high cost)

  29. Solution II: Bloom Filter • For each attribute name: • Generate N independent hash codes • Look up the bit array • Update the bit array

  30. Solution II: Bloom Filter • For each attribute name: • Generate N independent hash codes • Look up the bit array • Update the bit array

  31. Solution II: Bloom Filter • For each attribute name: • Generate N independent hash codes • Look up the bit array • Update the bit array

  32. Solution II: Bloom Filter • For each attribute name: • Generate N independent hash codes • Look up the bit array • Update the bit array • Unique!

  33. Solution II: Bloom Filter • For each attribute name: • Generate N independent hash codes • Look up the bit array • Update the bit array • False Positive!
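
The hash, look-up and update steps, together with the "Unique!" and "False Positive!" outcomes, can be modelled in software roughly as follows. This is a minimal sketch, not the hardware design: the class name `AttributeBloomFilter`, the bit-array size, the use of salted SHA-256 as the N hash codes, and the `seen` list standing in for the stalled slow path are all assumptions.

```python
import hashlib


class AttributeBloomFilter:
    """Software model of attribute-name uniqueness checking (hypothetical)."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits
        self.seen = []  # exact names, consulted only on the slow path

    def _positions(self, name):
        # Derive N hash codes by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{name}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def check_and_add(self, name):
        """Return 'unique', 'false positive' or 'duplicate'."""
        positions = list(self._positions(name))
        result = "unique"
        if all(self.bits[p] for p in positions):
            # Uncommon case: the filter cannot rule the name out, so fall
            # back to real string comparisons (the design stalls here).
            result = "duplicate" if name in self.seen else "false positive"
        if result != "duplicate":
            for p in positions:
                self.bits[p] = True  # update the bit array
            self.seen.append(name)
        return result


bf = AttributeBloomFilter()
print(bf.check_and_add("id"))  # unique
print(bf.check_and_add("id"))  # duplicate

tiny = AttributeBloomFilter(num_bits=1)  # degenerate size forces a collision
print(tiny.check_and_add("id"))          # unique
print(tiny.check_and_add("class"))       # false positive
```

A fresh filter would be used for each element, since uniqueness is only required among attributes of the same element.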

  34. Bloom Filter Implementation • Implement the Bloom Filter algorithm in a pipeline • An attribute name usually has multiple characters • This allows multiple processing cycles for each attribute name

  35. Outline • Background • High-level architecture • Design Details • Evaluation

  36. Experimental Setup • Software XML parser tests • XML Parsing Accelerator testbed

  37. Benchmarks

  38. Test Results • Metric: Raw Throughput (Gbps)

  39. Test Results • Metric: Cycle Per Byte

  40. Scalability Examination • Bloom Filter efficiency • Test the attribute name uniqueness circuit with generated test files • Count the number of false positives
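
For context on what such a false-positive count would be compared against, the standard textbook estimate for a Bloom filter is given below. The slides do not state this formula or the filter parameters; the symbols are assumptions: m is the number of bits in the array, k the number of hash codes per name (N on the earlier slides), and n the number of attribute names already inserted for the current element.

```latex
% Approximate probability that a new, genuinely unique name
% finds all k of its bits already set:
p_{\mathrm{fp}} \approx \left(1 - e^{-kn/m}\right)^{k}
```

Because n is bounded by the number of attributes in a single element, the expected rate stays low as long as m is large relative to kn.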

  41. Implementation Cost • Target Device: Xilinx Virtex-5 XC5VSX50T

  42. Conclusion • FPGA is a valid contender in XML processing • Low clock frequency requirement to achieve high throughput • Scalable to process large XML documents • Moderate hardware cost to achieve high performance • Future work • Full conformance to the XML specification

  43. Questions?