Parallel XML Parsing Using Meta-DFAs

Yinfei Pan¹, Ying Zhang¹, Kenneth Chiu¹, Wei Lu² ¹State University of New York (SUNY) Binghamton ²Indiana University

Motivation • XML has gained wide prevalence as a data format for input and output. • Multicore CPUs are becoming widespread. • Processors with 100 cores are planned. • With 100 cores, using only one of them to read input and write output could waste significant capacity.

Parallel XML Parsing • How can XML parsing be parallelized? • Task parallelism. • Pipeline parallelism. • Data parallelism.

Task parallelism • Multiple independent processing steps run concurrently. • For example, the sauce for a dish can be made in parallel with the main part. [Figure: Step 1 runs first, Steps 2A and 2B then run in parallel on two cores, and Step 3 follows, plotted against time.]
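A minimal Python sketch of the task-parallel schedule in the figure, assuming placeholder step functions (`step1`, `step2a`, `step2b`, `step3`, and `make_dish` are hypothetical names): Step 1 runs first, Steps 2A and 2B are submitted to a two-worker pool, and Step 3 waits for both results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical steps standing in for real work.
def step1() -> str:
    return "ingredients"

def step2a(prepared: str) -> str:  # e.g., the sauce
    return f"sauce from {prepared}"

def step2b(prepared: str) -> str:  # e.g., the main part
    return f"main from {prepared}"

def step3(sauce: str, main: str) -> str:
    return f"{main} with {sauce}"

def make_dish() -> str:
    prepared = step1()  # Step 1 must finish before the independent steps start
    with ThreadPoolExecutor(max_workers=2) as pool:
        sauce = pool.submit(step2a, prepared)  # Step 2A
        main = pool.submit(step2b, prepared)   # Step 2B, concurrent with 2A
        # Step 3 depends on both results, so it blocks until each future completes
        return step3(sauce.result(), main.result())

print(make_dish())  # main from ingredients with sauce from ingredients
```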

Pipeline parallelism • Multiple stages operate simultaneously, each on a different data item. • If you are making two cakes but have only one oven, you can mix the batter for the second cake while the first one is in the oven. [Figure: three stages running on three cores over time; at each step Stage 1 handles the newest item while Stages 2 and 3 handle earlier ones (Stage 1: Data C, Stage 2: Data B, Stage 3: Data A; then D, C, B; then E, D, C).]
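A minimal sketch of a three-stage pipeline, assuming one thread per stage connected by `queue.Queue` objects (the helper names `_stage`, `run_pipeline`, and `_DONE` are hypothetical). While a later stage works on item k, an earlier stage can already take item k+1, which is the overlap the figure shows.

```python
import queue
import threading
from typing import Callable, Iterable, List

_DONE = object()  # sentinel marking the end of the stream

def _stage(fn: Callable[[str], str], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """One pipeline stage: apply fn to each item and pass it downstream, in order."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))

def run_pipeline(items: Iterable[str], stages: List[Callable[[str], str]]) -> List[str]:
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=_stage, args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)  # stage 1 can accept a new item while later stages are busy
    queues[0].put(_DONE)
    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results

stages = [lambda c: c + " mixed", lambda c: c + " baked", lambda c: c + " frosted"]
print(run_pipeline(["cake A", "cake B"], stages))
```

Because each stage is a single thread reading a FIFO queue, items leave the pipeline in the order they entered.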

Data parallelism • Divide the data into pieces and process the pieces in parallel. [Figure: Input Chunks 1 to 3 are processed on Cores 1 to 3, producing Output Chunks 1 to 3, which are merged into the final output.]
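A minimal split/process/merge sketch, assuming a chunk-independent task (here, counting characters) so that per-chunk results can simply be combined; `data_parallel` is a hypothetical helper. Parsing does not decompose this cleanly, which is the problem raised next.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def data_parallel(data: Sequence[T], n_chunks: int,
                  process: Callable[[Sequence[T]], R],
                  merge: Callable[[List[R]], R]) -> R:
    """Split data into about n_chunks equal pieces, process them concurrently, merge in order."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, never zero
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        outputs = list(pool.map(process, chunks))  # map preserves chunk order
    return merge(outputs)

# Counting characters is chunk-independent, so per-chunk counts can simply be summed.
print(data_parallel("<a><b/></a>", 3, lambda c: c.count("<"), sum))  # 3
```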

But XML is Inherently Sequential • How can a chunk be parsed without knowing what came before? • The parser doesn't know what state to start in. • One could scan forwards and backwards to guess the state, but this is ad hoc and tricky. • Special characters such as < can appear in comments. • Example: <element attr="value">content</element>

Previous work • We used a fast, sequential preparse scan to build an outline of the document (the skeleton). • The skeleton guides the full parse by first decomposing the XML document into well-formed fragments at well-defined, unambiguous positions. • The XML fragments are parsed separately on each core using Libxml2 APIs. • The results are merged into the final DOM, also with Libxml2 APIs. • The preparse is sequential, however, so Amdahl's law limits scaling to about 4 cores. • So how can we parallelize the preparse?
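A minimal sketch of skeleton-guided fragment parsing, with Python's `xml.etree.ElementTree` swapped in for Libxml2. It assumes the skeleton has already supplied the character spans of the root's top-level children (hard-coded here as a hypothetical skeleton) and that every span is a well-formed element. `parse_with_skeleton` is a hypothetical name; root attributes and whitespace between children are not reproduced in this sketch.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

def parse_with_skeleton(doc: str, root_tag: str,
                        child_spans: List[Tuple[int, int]]) -> ET.Element:
    """Parse each well-formed child fragment separately, then graft them under one root."""
    def parse_fragment(span: Tuple[int, int]) -> ET.Element:
        start, end = span
        return ET.fromstring(doc[start:end])  # each fragment is a complete element
    with ThreadPoolExecutor() as pool:
        subtrees = list(pool.map(parse_fragment, child_spans))  # results stay in document order
    root = ET.Element(root_tag)  # merge step: rebuild the root and attach the subtrees
    root.extend(subtrees)
    return root

doc = '<r><a>x</a><b k="1"/></r>'
# Hypothetical skeleton: offsets of the root's two children, as a preparser might report them.
tree = parse_with_skeleton(doc, "r", [(3, 11), (11, 21)])
print(ET.tostring(tree, encoding="unicode"))
```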

DFA for XML Parsing • First, we model XML parsers as a DFA with states, transitions, and actions on transitions. • The transition function maps the current state and input character to a new state and an action. • As the DFA makes transitions, it encounters and executes a sequence of actions. • We believe this model is general enough that most XML parsers fit it.
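A minimal sketch of the DFA-with-actions model: the transition function returns a new state and an optional action, and the runner records each action with the offset that triggered it. The names `run_dfa` and `tag_delta`, and the two-state tag DFA itself, are hypothetical illustrations rather than the authors' code.

```python
from typing import Callable, List, Optional, Tuple

State = int
# Transition function: (current state, input char) -> (new state, action or None).
Delta = Callable[[State, str], Tuple[State, Optional[str]]]

def run_dfa(delta: Delta, start: State, text: str) -> Tuple[State, List[Tuple[int, str]]]:
    """Execute the DFA, collecting each action together with the offset that triggered it."""
    state = start
    actions: List[Tuple[int, str]] = []
    for pos, ch in enumerate(text):
        state, action = delta(state, ch)
        if action is not None:
            actions.append((pos, action))  # a real parser would execute the action here
    return state, actions

def tag_delta(state: State, ch: str) -> Tuple[State, Optional[str]]:
    """Hypothetical two-state DFA: 0 = outside a tag, 1 = inside a tag."""
    if state == 0 and ch == "<":
        return 1, "OPEN"
    if state == 1 and ch == ">":
        return 0, "CLOSE"
    return state, None

print(run_dfa(tag_delta, 0, "<a>x</a>"))
# (0, [(0, 'OPEN'), (2, 'CLOSE'), (4, 'OPEN'), (7, 'CLOSE')])
```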

Example: The Preparsing DFA • The preparsing DFA has two actions, START and END, which build the skeleton as the DFA executes. [Figure: state diagram of the preparsing DFA, with states 0 to 7, transitions labeled by input characters (a, <, /, !, ', ", >), and START and END actions on some transitions.]
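A minimal sketch of a preparsing DFA whose START and END actions build a skeleton of element spans. The DFA here is a simplified, hypothetical stand-in, not the 8-state DFA in the figure: it has six states, handles double-quoted attribute values and empty-element tags, and does not handle comments, processing instructions, CDATA, or single-quoted values. START fires on the first character of a start-tag name and END on the closing `>` of an end tag or empty-element tag; `char_class`, `DELTA`, and `preparse` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

def char_class(ch: str) -> str:
    """Collapse the alphabet: '<', '>', '/', '"' are significant; everything else is 'a'."""
    return ch if ch in '<>/"' else "a"

# Simplified preparsing DFA (hypothetical, not the figure's DFA):
# 0 content, 1 after '<', 2 end tag, 3 start tag, 4 inside "...", 5 after '/' in a start tag.
# A missing entry means a syntax error.
DELTA: Dict[int, Dict[str, Tuple[int, Optional[str]]]] = {
    0: {"<": (1, None), ">": (0, None), "/": (0, None), '"': (0, None), "a": (0, None)},
    1: {"/": (2, None), "a": (3, "START")},
    2: {"a": (2, None), ">": (0, "END")},
    3: {"a": (3, None), '"': (4, None), "/": (5, None), ">": (0, None)},
    4: {">": (4, None), "/": (4, None), "a": (4, None), '"': (3, None)},
    5: {">": (0, "END")},
}

def preparse(doc: str) -> List[Tuple[int, int, int]]:
    """Run the DFA from state 0; turn START/END actions into (start, end, depth) spans."""
    state, open_tags, spans = 0, [], []
    for pos, ch in enumerate(doc):
        step = DELTA[state].get(char_class(ch))
        if step is None:
            raise ValueError(f"unexpected {ch!r} at offset {pos}")
        state, action = step
        if action == "START":
            open_tags.append(pos - 1)  # the '<' sits just before the first name character
        elif action == "END":
            start = open_tags.pop()
            spans.append((start, pos + 1, len(open_tags)))
    return sorted(spans)

print(preparse('<r><a>x</a><b k="1/2"/></r>'))
# [(0, 27, 0), (3, 11, 1), (11, 23, 1)]
```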

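Example of running the preparsing DFA • Input: <foo>sample</foo> [Figure: the DFA state sequence 0 1 3 3 0 0 0 1 2 2 0 traced across the input, with the START and END actions marked.] • How can this be parallelized?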
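A sketch of a per-character state trace, reusing the simplified, hypothetical DFA (0 content, 1 after `<`, 2 end tag, 3 start tag), whose numbering we chose to resemble the states in the figure's trace. The trace makes the sequential dependency visible: every state depends on the one before it.

```python
from typing import Dict, List, Optional, Tuple

def char_class(ch: str) -> str:
    return ch if ch in '<>/"' else "a"

# Same simplified preparsing DFA as before (hypothetical); None in a trace = dead state.
DELTA: Dict[int, Dict[str, Tuple[int, Optional[str]]]] = {
    0: {"<": (1, None), ">": (0, None), "/": (0, None), '"': (0, None), "a": (0, None)},
    1: {"/": (2, None), "a": (3, "START")},
    2: {"a": (2, None), ">": (0, "END")},
    3: {"a": (3, None), '"': (4, None), "/": (5, None), ">": (0, None)},
    4: {">": (4, None), "/": (4, None), "a": (4, None), '"': (3, None)},
    5: {">": (0, "END")},
}

def trace(text: str, start: int = 0) -> List[Optional[int]]:
    """Return the state before and after every character; None marks the dead (error) state."""
    states: List[Optional[int]] = [start]
    for ch in text:
        cur = states[-1]
        step = None if cur is None else DELTA[cur].get(char_class(ch))
        states.append(None if step is None else step[0])
    return states

print(trace("<foo>sample</foo>"))
# [0, 1, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 0]
```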

Meta-DFA • Goal • Pursue all possible states simultaneously at the beginning of a chunk, when a processor is about to parse it • Achieved by: • Transforming the original DFA into a meta-DFA whose transition function runs multiple instances of the original DFA in parallel via sub-DFAs • For each state q of the original DFA, the meta-DFA includes a complete copy of the DFA as a sub-DFA, which begins execution in state q at the beginning of the chunk • During execution, the meta-DFA transitions from one vector of states to another
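A minimal sketch of meta-DFA execution over one chunk, reusing the simplified, hypothetical preparsing DFA. The meta-state is a vector holding one state per sub-DFA; sub-DFA q starts in state q, and all sub-DFAs advance on the same character. `None` marks a sub-DFA that hit an undefined transition (our convention for a dead sub-DFA).

```python
from typing import Dict, Optional, Tuple

def char_class(ch: str) -> str:
    return ch if ch in '<>/"' else "a"

# Same simplified preparsing DFA as the earlier sketches (hypothetical).
DELTA: Dict[int, Dict[str, Tuple[int, Optional[str]]]] = {
    0: {"<": (1, None), ">": (0, None), "/": (0, None), '"': (0, None), "a": (0, None)},
    1: {"/": (2, None), "a": (3, "START")},
    2: {"a": (2, None), ">": (0, "END")},
    3: {"a": (3, None), '"': (4, None), "/": (5, None), ">": (0, None)},
    4: {">": (4, None), "/": (4, None), "a": (4, None), '"': (3, None)},
    5: {">": (0, "END")},
}

MetaState = Tuple[Optional[int], ...]

def meta_step(vec: MetaState, ch: str) -> MetaState:
    """One meta-DFA transition: advance every sub-DFA on the same input character."""
    c = char_class(ch)
    nxt = []
    for s in vec:
        step = None if s is None else DELTA[s].get(c)
        nxt.append(None if step is None else step[0])  # None = dead sub-DFA
    return tuple(nxt)

def meta_run(chunk: str) -> MetaState:
    vec: MetaState = tuple(range(len(DELTA)))  # sub-DFA q starts in state q
    for ch in chunk:
        vec = meta_step(vec, ch)
    return vec

# A chunk that might begin in content, in a tag, or inside a quoted attribute value.
print(meta_run('x">t'))  # (0, 4, None, 4, 0, None)
```

Position q of the result is the state the chunk ends in if it actually began in state q, so a single pass yields every candidate outcome.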

Constructing the meta-DFA • Steps in constructing the meta-DFA [Figure: an example two-state DFA (states 0 and 1; inputs a, <, >) and the meta-DFA constructed from it, whose states are vectors such as [0, 1], [1, d], [d, 0], and [d, 1].]
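A sketch of the construction as a worklist over the state vectors reachable from [0, 1]. It assumes our reading of the figure's example DFA: `a` loops on both states, `<` moves 0 to 1, `>` moves 1 to 0, every other move is undefined, and `d` denotes a dead sub-DFA. Under that reading the construction finds six meta-states, which appear to match the vectors in the figure. `build_meta_dfa` is a hypothetical name.

```python
from collections import deque
from typing import Dict, Tuple, Union

DEAD = "d"  # a sub-DFA that has hit an undefined transition
Vec = Tuple[Union[int, str], ...]

# Our reading of the figure's example DFA (an assumption).
ALPHABET = ("a", "<", ">")
DELTA: Dict[Tuple[int, str], int] = {(0, "a"): 0, (0, "<"): 1, (1, "a"): 1, (1, ">"): 0}

def build_meta_dfa(num_states: int) -> Tuple[Vec, Dict[Vec, Dict[str, Vec]]]:
    """Product construction restricted to vectors reachable from (0, 1, ..., n-1)."""
    start: Vec = tuple(range(num_states))  # sub-DFA q begins in state q
    table: Dict[Vec, Dict[str, Vec]] = {}
    pending = deque([start])
    while pending:
        vec = pending.popleft()
        if vec in table:
            continue
        table[vec] = {}
        for c in ALPHABET:
            # advance every sub-DFA on the same character; dead stays dead
            nxt = tuple(DEAD if s == DEAD else DELTA.get((s, c), DEAD) for s in vec)
            table[vec][c] = nxt
            if nxt not in table:
                pending.append(nxt)
    return start, table

start, table = build_meta_dfa(2)
print(len(table))  # 6
```

Since the vectors range over a finite set, the worklist always terminates, which is why the meta-DFA is itself an ordinary DFA.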

Output Merging • Since the meta-DFA pursues multiple possibilities simultaneously, a finished chunk has multiple outputs, one for each possible initial state. • The state at the end of the first chunk is known definitively, because the first chunk starts in the DFA's initial state. • This state selects the correct output of the second chunk. • The definitive state at the end of the second chunk is then known. • And so on for each subsequent chunk.
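A minimal end-to-end sketch of chunked meta-DFA preparsing followed by sequential output merging, reusing the simplified, hypothetical preparsing DFA. Each chunk records, for every possible start state, its end state and actions (with global offsets); the merge then walks the chunks in order, letting each known end state select the next chunk's output. `run_chunk` and `parallel_preparse` are hypothetical names, and CPython threads here show the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

Action = Tuple[int, str]  # (global offset, action name)

def char_class(ch: str) -> str:
    return ch if ch in '<>/"' else "a"

# Same simplified preparsing DFA as the earlier sketches (hypothetical).
DELTA: Dict[int, Dict[str, Tuple[int, Optional[str]]]] = {
    0: {"<": (1, None), ">": (0, None), "/": (0, None), '"': (0, None), "a": (0, None)},
    1: {"/": (2, None), "a": (3, "START")},
    2: {"a": (2, None), ">": (0, "END")},
    3: {"a": (3, None), '"': (4, None), "/": (5, None), ">": (0, None)},
    4: {">": (4, None), "/": (4, None), "a": (4, None), '"': (3, None)},
    5: {">": (0, "END")},
}

def run_chunk(job: Tuple[int, str]) -> Dict[int, Tuple[Optional[int], List[Action]]]:
    """Meta-DFA pass over one chunk: for every possible start state q, the end state and actions."""
    offset, chunk = job
    states: List[Optional[int]] = list(DELTA)  # sub-DFA q starts in state q
    actions: Dict[int, List[Action]] = {q: [] for q in DELTA}
    for i, ch in enumerate(chunk):
        c = char_class(ch)
        for q, s in enumerate(states):
            step = None if s is None else DELTA[s].get(c)
            if step is None:
                states[q] = None  # this sub-DFA has failed; it stays dead
                continue
            states[q], action = step
            if action is not None:
                actions[q].append((offset + i, action))
    return {q: (states[q], actions[q]) for q in DELTA}

def parallel_preparse(doc: str, n_chunks: int) -> Tuple[int, List[Action]]:
    size = max(1, -(-len(doc) // n_chunks))  # equal-sized chunks (ceiling division)
    jobs = [(i, doc[i:i + size]) for i in range(0, len(doc), size)]
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(run_chunk, jobs))
    state: Optional[int] = 0  # the first chunk definitely starts in state 0
    merged: List[Action] = []
    for out in outputs:
        state, actions = out[state]  # the known start state selects the correct output
        if state is None:
            raise ValueError("document is not well-formed for this DFA")
        merged.extend(actions)
    return state, merged

print(parallel_preparse('<r><a>x</a><b k="1/2"/></r>', 3))
# (0, [(1, 'START'), (4, 'START'), (10, 'END'), (12, 'START'), (22, 'END'), (26, 'END')])
```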

Performance Evaluation • Machine: • Sun E6500 with 30 UltraSPARC II processors at 400 MHz • Operating system: Solaris 10 • Compiler: g++ 4.0 with the option -O3 • XML library: Libxml2 2.6.16 • Tests: • Each result is the average of ten runs. • The test file, 34 MB in size, is taken from the well-known Protein Data Bank (PDB) project. • All speedups are measured against parsing with stand-alone Libxml2.

The full parsing process is: • First do a parallel preparse using a meta-DFA. This generates an outline of the document known as the skeleton. • Then use techniques based on parallel depth-first tree search to parallelize the full parse. • Subtrees of the document are parsed using unmodified libxml2.

Preparser Speedup • Speedup of the parallel preparser relative to the sequential preparser

Speedup on parallel full parsing • After applying our meta-DFA technique to parallelize the preparsing stage, the parallel full parse now scales.

Analysis

Summary • Data-parallel XML parsing is challenging because the parser does not know in which state to begin a chunk. • One solution is to simply begin the parser in all states simultaneously. • This can be achieved by modeling the parser as a DFA with actions, then transforming the DFA into a meta-DFA (product machine). • The meta-DFA runs multiple instances of the original DFA, one instance for each state of the original DFA. • The number of states in the meta-DFA is finite, so it is also a DFA and can be executed by a single core. • The parallelism of the meta-DFA is logical: a single core advances all of its sub-DFAs in lockstep.

Questions

Applying Multicore CPUs to XML • The chip industry trend is now to design multiple cores rather than faster CPUs. • This calls for truly parallel techniques. • XML is now very common in message-oriented interactions, but parsing it is slow. • Existing multicore solutions: • One core parses one XML message. • Problem: if the application must first fully process one large XML file, the other cores sit idle. • Parallelism within a single XML input: • This is our work: parallel DOM-style XML parsing.

Problem with Previous Work • The preparse stage is sequential, so it scales only to about 4 cores. • In this paper, we: • Parallelize this stage. We need to know only the size of the XML document, which we physically decompose into equal-sized chunks. • To do so, we introduce a DFA-based model of XML parsing that our preparser fits, and we parallelize the preparser by transforming its DFA into a meta-DFA.

Introduction • Performance concerns of XML parsing • XML is self-descriptive and therefore verbose by design. • One way to improve performance: • Use multiple cores for parallel processing.

Parsing a single XML input in parallel • Our approach: • Decompose the XML document into chunks. • Parse the chunks in parallel, one core per chunk. • Merge the chunk results into the final result. • The problem: • A chunk cannot be parsed unambiguously on its own, because the parser's true state at the start of the chunk is unknown until the preceding chunks are parsed. • Our previous work did something effective…
