A 1 Cycle-Per-Byte XML Accelerator

Zefu Dai, Nick Ni and Jianwen Zhu • Presented by Zefu Dai • University of Toronto

What is XML • Extensible Markup Language • A platform-independent tool for data exchange and representation • Widely used in: • Web services • Database systems • Scientific applications • …

Performance Threat: XML Parsing • 70 minutes to load a 3 GB XML file, 26x slower than loading plain text • >1 s per bank transaction: how many transactions per day? • On average 175 K instructions to parse 1 KB of XML data (IBM XML4C) • With network speeds reaching tens of Gbps, XML parsing overtakes the network as the performance bottleneck

Previous Work • Cycles Per Byte (CPB) = average number of cycles to process each byte of XML data • Multi-core acceleration • Requires a pre-parsing pass, done sequentially • 30 CPB on a 4-core processor • SIMD acceleration • Without in-memory tree construction and validation • 6-15 CPB • Hardware accelerators • Most commercial products do not reveal performance metrics or design details • 10-40 CPB

Our Design • Causes of the parsing slowdown • Text-based data stream • Variable-length string comparison • Poor memory performance due to streaming and memory back-tracing • An XML parsing accelerator implemented on an FPGA • Fixed-length string operations • Optimized circuits for string comparison • Stallable pipeline optimized for the common case • Data structures for high-bandwidth on-chip memory • Achieves 1 CPB processing speed and saturates a 1 Gbps Ethernet link, running at 125 MHz
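The link between CPB, clock frequency and line rate on this slide is plain arithmetic. A minimal sketch follows; the function name `throughput_gbps` is illustrative and does not come from the presentation.

```python
def throughput_gbps(clock_hz: float, cycles_per_byte: float) -> float:
    """Raw throughput in Gbps for a parser that needs `cycles_per_byte`
    clock cycles, on average, to consume one byte of XML."""
    bytes_per_second = clock_hz / cycles_per_byte
    return bytes_per_second * 8 / 1e9


if __name__ == "__main__":
    # 125 MHz at 1 CPB: one byte per cycle, i.e. 125 MB/s = 1 Gbps.
    print(throughput_gbps(125e6, 1))
    # The same clock at 30 CPB (the multi-core figure quoted earlier)
    # would fall far short of a 1 Gbps link.
    print(throughput_gbps(125e6, 30))
```

At 1 CPB the required clock simply equals the byte rate of the link, which is why a modest 125 MHz FPGA clock is enough to keep up with 1 Gbps Ethernet.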

Outline • Background • High-level architecture • Design Details • Evaluation

Tasks of an XML Parser • Well-formedness Checking • Check whether the document conforms to the XML syntax rules • Schema Validation • Check whether the document conforms to the XML semantic rules specified in DTD or Schema files • DOM Construction • Capture the parent-child relationships between elements and attributes and store them in memory in Document Object Model (DOM) format

Well-formed Checking example • Has a unique root element • Elements must be closed and nested properly • Unique attributes within an element • …
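The three rules listed on this slide can be tried out in software. The sketch below uses Python's standard `xml.etree.ElementTree` parser, which is unrelated to the accelerator; the helper name `is_well_formed` and the sample documents are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    samples = {
        "valid document": "<a><b x='1'/></a>",
        "two root elements": "<a/><b/>",
        "improper nesting": "<a><b></a></b>",
        "duplicate attribute": "<a x='1' x='2'/>",
    }
    for label, text in samples.items():
        print(f"{label}: {is_well_formed(text)}")
```

Each of the three broken samples violates exactly one of the rules above, so only the first document is accepted.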

XML Schema Example • Specify permitted child elements/attributes • Specify type of content • Specify occurrence limit • …

DOM Construction • Create an in-memory tree structure for the XML document • Provide application access through tree operations
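To make "access through tree operations" concrete, here is a small software illustration using Python's standard `xml.dom.minidom`. It is only meant to show what a DOM looks like from the application side and is not the accelerator's in-memory layout; the helper name `describe` is illustrative.

```python
from xml.dom import minidom


def describe(xml_text: str) -> tuple:
    """Parse `xml_text` into a DOM tree and read it back via tree operations."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement          # root element node
    name = root.tagName                 # element name
    attr = root.getAttribute("attr")    # attribute lookup on the node
    text = root.firstChild.data         # first child is a text node here
    return name, attr, text


if __name__ == "__main__":
    print(describe("<Elem attr='xyz'>content</Elem>"))
```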

Top Level Diagram • [Figure: top-level architecture, built up over several slides] • Example input: <Elem attr='xyz'>content</Elem> • Figure labels: Elem, attr, xyz, content, H(Elem), H(attr), rule name, rule content

Recurring Idioms (Dwarfs) • Identified 3 recurring computational idioms (referred to as Dwarfs) • One-to-one String Matching • One-to-many String Membership Test • One-to-many String Search • Together, one of the major causes of low parsing performance

Dwarf I: One-to-one String Matching • Tests whether a subject string equals a reference string • Example: checking correct nesting • The string is variable-length • Inefficient on conventional architectures • Solution: memory stack • Converts variable-length string comparison into fixed-length character comparison
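The slide does not spell out how the memory stack is organised, so the following is only a software sketch under an assumed layout: open tag names are stored in a flat character memory, a second stack records where each name starts, and a closing tag is checked one character per step against the stored name. The class name `TagStack` and its fields are hypothetical.

```python
class TagStack:
    """Nesting check that compares closing tags one character at a time.

    Assumed layout (not taken from the slides): `memory` holds the
    characters of all currently open tag names back to back, and `starts`
    holds the offset at which each open name begins.
    """

    def __init__(self) -> None:
        self.memory = []   # flat character memory
        self.starts = []   # start offset of each open element name

    def open(self, name: str) -> None:
        self.starts.append(len(self.memory))
        self.memory.extend(name)

    def close(self, name: str) -> bool:
        """Return True if `name` matches the most recently opened tag."""
        if not self.starts:
            return False                      # closing tag with nothing open
        start = self.starts.pop()
        stored = self.memory[start:]
        matches = len(stored) == len(name)
        # Fixed-width work per step: one character comparison per input byte.
        for i, ch in enumerate(name):
            if i >= len(stored) or stored[i] != ch:
                matches = False
        del self.memory[start:]               # pop the name off the stack
        return matches


if __name__ == "__main__":
    stack = TagStack()
    stack.open("a")
    stack.open("bc")
    print(stack.close("bc"), stack.close("a"))
```

The point of the sketch is that each incoming byte of the closing tag triggers exactly one comparison, so the cost per byte stays constant regardless of the tag name length.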

Dwarf II: One-to-many String Membership Test • Tests whether a subject string equals any member of a set of reference strings • Example: unique attributes within an element • String comparison against all previously seen attributes of the same element • Expensive memory back-tracing • Solution: Bloom Filter • Achieved in one memory lookup

Dwarf III: One-to-many String Search • “Finds” a subject string among a set of reference strings (as opposed to just “testing” for membership) • Example: search for the corresponding schema rule • String comparison against all candidates • Nondeterministic lookup time • Solution: Balance Routing Table Scheme • Achieved in one memory lookup

Dwarf II: Bloom Filter • Example: attribute name uniqueness checking • Common case: the attribute name is unique • Filter out the obvious cases using a Bloom Filter • Look up a bit array instead of comparing strings • Uncommon case: the attribute name may already exist • Stall the entire design • Do all necessary string comparisons to confirm whether the incoming string already exists • Assumption: this case occurs rarely, so its high cost is acceptable

Solution II: Bloom Filter • For each attribute name: • Generate N independent hash codes • Look up the bit array • Update the bit array • Outcomes shown in the slide build: “Unique!” and “False Positive!”
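The lookup-then-update loop above, together with the stall on the uncommon case, can be sketched in software as follows. The hash construction (salted SHA-256), the bit-array size, and the class name `AttributeUniquenessChecker` are assumptions made for illustration; the slides do not specify the hash functions or sizes used in hardware.

```python
import hashlib


class AttributeUniquenessChecker:
    """Bloom-filter fast path with an exact-comparison slow path."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits
        self.seen = []             # exact names, consulted only on the slow path
        self.slow_path_count = 0   # how often the "stall" would have occurred

    def _positions(self, name: str) -> list:
        # N hash codes derived by salting one hash function (an assumption).
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{name}".encode()).digest()
            positions.append(int.from_bytes(digest[:4], "big") % self.num_bits)
        return positions

    def add(self, name: str) -> bool:
        """Record `name`; return True if it is unique within this element."""
        positions = self._positions(name)
        maybe_seen = all(self.bits[p] for p in positions)
        unique = True
        if maybe_seen:
            # Uncommon case: real duplicate or false positive. Fall back to
            # full string comparison, mirroring the pipeline stall.
            self.slow_path_count += 1
            unique = name not in self.seen
        for p in positions:
            self.bits[p] = True
        if unique:
            self.seen.append(name)
        return unique


if __name__ == "__main__":
    checker = AttributeUniquenessChecker()
    for attr in ["id", "class", "id"]:
        print(attr, checker.add(attr))
```

If any of the N bits is clear, the name is definitely new and no string comparison is needed; only when all N bits are set does the exact check run, which separates true duplicates from false positives.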

Bloom Filter Implementation • Implement the Bloom Filter algorithm as a pipeline • Attribute names usually have multiple characters • This allows multiple processing cycles for each attribute name

Experimental Setup • Software XML parser tests • XML parsing accelerator testbed

Benchmarks

Test Results • Metric: Raw Throughput (Gbps)

Test Results • Metric: Cycles Per Byte

Scalability Examination • Bloom Filter efficiency • Test the attribute name uniqueness circuit with generated test files • Count the number of false positives
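The measured false-positive counts are not reproduced in this transcript. For orientation only, the standard textbook approximation for a Bloom filter with m bits, k hash functions and n inserted items is (1 - e^(-kn/m))^k. The sketch below evaluates it; the parameter values are illustrative and are not the presentation's configuration or results.

```python
import math


def bloom_false_positive_rate(num_bits: int, num_hashes: int, num_items: int) -> float:
    """Textbook approximation of the Bloom filter false-positive rate."""
    fill = 1.0 - math.exp(-num_hashes * num_items / num_bits)
    return fill ** num_hashes


if __name__ == "__main__":
    # Illustrative parameters only: more attributes per element means a
    # fuller bit array and therefore more false positives (more stalls).
    for n in (4, 16, 64):
        print(n, bloom_false_positive_rate(num_bits=1024, num_hashes=3, num_items=n))
```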

Implementation Cost • Target device: Xilinx Virtex-5 XC5VSX50T

Conclusion • The FPGA is a valid contender in XML processing • Low clock frequency required to achieve high throughput • Scalable to large XML documents • Moderate hardware cost for high performance • Future work • Full conformance to the XML specification

Questions?
