HARDCODING FINITE AUTOMATA

Ernest Ketcha Ngassam, Prof. Bruce W. Watson, Prof. Derrick G. Kourie. Department of Computer Science, University of Pretoria. Fastar Research Group, http://fastar.cs.up.ac.za

Presentation Transcript


  1. HARDCODING FINITE AUTOMATA
     Ernest Ketcha Ngassam, Prof. Bruce W. Watson, Prof. Derrick G. Kourie
     Department of Computer Science, University of Pretoria
     Fastar Research Group, http://fastar.cs.up.ac.za

  2. Introductory Remarks
     FA definition: (Σ, S, F, δ, s0)
     • Finite set of alphabet symbols (Σ)
     • Finite set of states (S)
     • Finite set of accepting states (F)
     • Transition function (δ)
     • Starting state (s0)
     FA context
     • Chomsky hierarchy
     • Right linear grammar
     [Figure: Chomsky hierarchy diagram; region labels RLL, CFL, CSL, UL]
     Many FA applications
     • Pattern matching in text
     • Text indexing
     • Computational genetics
     • Network intrusion detection
     • Computer and natural virus scanning
     • Natural language translation
     • Spell checking
     • Etc.
     FAs are therefore performance-sensitive

  3. Related work
     • 1986, Pennello in “Very fast LR Parsing”
       • System that produces hardcoded parsers in assembly language
     • 1988, Horspool and Whitney in “Even Faster LR Parsing”
       • Used Pennello’s idea
       • Additional optimization strategies to reduce the code size
       • Some fine tuning
     • 1995, Bhamidipaty and Proebsting in “Very fast YACC-Compatible Parsers (For Very Little Effort)”
       • YACC produces table-driven parsers
       • The method produced directly executable hardcoded parsers in C
     • 2002, Kimmel in “Programming with Regular Expressions in C#”
       • Suggests implementation of regular expressions in Assembler

  4. Conventional FA Implementation
     • Objective: determine whether a string is in the language represented by an FA
     • Key issue: a transition table that embeds
       • Alphabet
       • States
       • Entries
     • Uses a function / "controller" recognize(str, transition): boolean
       • Checks for acceptance symbol by symbol from str
       • Traverses the table transition
       • Returns true or false
     [Figure: transition table with entry (i, chk)]
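
The controller described on this slide can be sketched in C++ as follows. This is a minimal illustration, not the authors' implementation: the `TableFA` layout, the `kNoTransition` sentinel, the 256-column table, and the example automaton built by `makeAbAutomaton` are all assumptions made for the example.

```cpp
#include <string>
#include <vector>

// Assumed sentinel: marks table entries for which delta is undefined.
constexpr int kNoTransition = -1;

// Hypothetical layout: transition[state][symbol] holds the next state.
struct TableFA {
    std::vector<std::vector<int>> transition;  // |S| rows, 256 columns (one per byte value)
    std::vector<bool> accepting;               // accepting[s] is true when s is in F
    int start = 0;                             // s0
};

// Table-driven recognizer: the same loop serves any transition table.
bool recognize(const std::string& str, const TableFA& fa) {
    int state = fa.start;
    for (unsigned char symbol : str) {
        state = fa.transition[state][symbol];      // one table lookup per symbol
        if (state == kNoTransition) return false;  // undefined transition: reject
    }
    return fa.accepting[state];
}

// Illustrative automaton (an assumption): accepts exactly the string "ab".
TableFA makeAbAutomaton() {
    TableFA fa;
    fa.transition.assign(3, std::vector<int>(256, kNoTransition));
    fa.accepting = {false, false, true};
    fa.transition[0]['a'] = 1;
    fa.transition[1]['b'] = 2;
    return fa;
}
```

With this sketch, `recognize("ab", makeAbAutomaton())` returns true and any other input is rejected. All automaton-specific knowledge lives in the data, so swapping in another table changes the language without touching the loop.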

  5. What is a Hardcoded algorithm?
     • No table as data structure
     • Only primitive data types used
     • Data embedded into the algorithm
     • Data are part of the instructions
     • Uses a function recognize(str): boolean
       • Checks for acceptance symbol by symbol from str
       • Returns true or false
     Instructions:
       read(str[0]); goto label_0;
       label_0: action_0; read(str[1]); goto label_1;
       label_1: action_1; read(str[2]); goto label_2;
       …
       label_{n-1}: action_{n-1}; goto decision;
       …
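
As a rough C++ counterpart to the label/goto scheme on this slide, the sketch below hardcodes one specific automaton. The automaton (accepting exactly "ab") and the name `recognizeAb` are assumptions chosen for illustration; the slides do not show the actual generated code.

```cpp
#include <cstddef>
#include <string>

// Hardcoded recognizer for one illustrative automaton (an assumption):
// it accepts exactly the string "ab". No table exists; each state is a
// label and each transition is a comparison followed by a jump.
bool recognizeAb(const std::string& str) {
    std::size_t i = 0;  // index of the next symbol to read
    goto state_0;

state_0:
    if (i == str.size()) return false;   // input ended in a non-accepting state
    if (str[i++] == 'a') goto state_1;   // transition on 'a'
    return false;                        // no transition on any other symbol
state_1:
    if (i == str.size()) return false;
    if (str[i++] == 'b') goto state_2;   // transition on 'b'
    return false;
state_2:
    return i == str.size();              // accepting state, no outgoing transitions
}
```

Here the transition function is part of the instruction stream, so recognizing a different language means generating different code rather than loading a different table.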

  6. Table-driven vs. Hardcoded Algorithms
     • Table-driven depends heavily on data
     • Hardcoded depends heavily on instructions
     • Computationally equivalent: O(len)
     • Need to perform an empirical evaluation!
     [Figure: side-by-side comparison; panel labels Hardcoded, Table-driven]

  7. Preliminary Experiments
     • Based on single symbol recognition
       • Easy to implement
       • Problem domain restricted
     • Various implementation strategies for the hardcoded algorithm
       • High-level language (2 variations)
       • Low-level language (3 variations)
     • Baseline for string recognition
       • Table-driven algorithm reflects work for any transition function
       • Hardcoded algorithm reflects work for a specific transition function
     [Figure: transition array; labels a, e, s0, d]

  8. The Experiment & Data Collection
     • Generate a random transition array
     • Measure clock cycles using
       • The control program for table-driven (C++)
       • Hardcoded program (5 variations)
         • Switch statement (C++)
         • Nested conditionals (C++)
         • Linear search (ASM)
         • Jump table (ASM)
         • Direct jump (ASM)
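
The two high-level variations can be sketched for a single state as follows. The symbols and target states below are invented for illustration (the experiment used a randomly generated transition array), and the function names and `kNoTransition` sentinel are assumptions.

```cpp
// Assumed sentinel: returned when the state has no transition on the symbol.
constexpr int kNoTransition = -1;

// Variation 1: switch statement. The compiler is free to lower this to
// comparisons or to a jump table.
int nextStateSwitch(char symbol) {
    switch (symbol) {
        case 'a': return 3;
        case 'd': return 1;
        case 'e': return 7;
        default:  return kNoTransition;
    }
}

// Variation 2: nested conditionals. The symbols are tested one after
// another, so the cost grows with the position of the matching test.
int nextStateNested(char symbol) {
    if (symbol == 'a') {
        return 3;
    } else {
        if (symbol == 'd') {
            return 1;
        } else {
            if (symbol == 'e') {
                return 7;
            } else {
                return kNoTransition;
            }
        }
    }
}
```

Both functions encode the same single-state transition row; they differ only in how the instructions are arranged, which is the kind of difference the clock-cycle measurements are meant to expose.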

  9. Preliminary Results
     • Only an indication of how to continue with the experiments
     • Hardcoding outperforms table-driven (in the low-level language)
     • Conclusion:
       • Rely on the jump table version for further experiments
       • Use it to explore cache effects

  10. A Simple String Test Experiment
      • Language based on:
        • Accepting symbol (a)
        • Rejecting symbol (b)
      • In each of the n-1 states
        • a: triggers a transition to the next state
        • b: does not trigger a transition
      • Only string accepted: aaa…aaa (n-1 times)
      • Represents the worst-case scenario
      • Not concerned about reducing the FA
      • Use the jump table and table-driven versions
      [Figure: chain of states 1, 2, 3, …, n, each transition labelled a]
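
A table-driven version of this test automaton might look like the sketch below. States are numbered from 0 here rather than from 1 as in the slide figure, and the names `makeChainTable`, `recognizeChain`, and `kNoTransition` are assumptions made for the example; `n` is assumed to be at least 1.

```cpp
#include <string>
#include <vector>

// Assumed sentinel: marks a missing transition.
constexpr int kNoTransition = -1;

// Builds the n-state chain: column 0 holds the transition on 'a',
// column 1 the transition on 'b'. Only 'a' advances to the next state.
std::vector<std::vector<int>> makeChainTable(int n) {
    std::vector<std::vector<int>> table(n, std::vector<int>(2, kNoTransition));
    for (int s = 0; s + 1 < n; ++s) {
        table[s][0] = s + 1;  // 'a' moves from state s to state s + 1
    }
    return table;
}

// Accepts exactly the string of n-1 'a' symbols, where n = table.size().
bool recognizeChain(const std::string& str,
                    const std::vector<std::vector<int>>& table) {
    int state = 0;
    for (char c : str) {
        if (c != 'a' && c != 'b') return false;    // symbol outside the alphabet
        state = table[state][c == 'a' ? 0 : 1];
        if (state == kNoTransition) return false;  // 'b', or 'a' past the last state
    }
    return state == static_cast<int>(table.size()) - 1;  // last state accepts
}
```

For n = 4, only "aaa" is accepted. Every accepted input visits every state once, which is what makes the chain a worst case for both the table and the instruction stream.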

  11. Performance based on a 2-symbol alphabet
      [Figure: performance plot; series: Table-driven (2-symbol alphabet), Hardcode (2-symbol alphabet), Hardcode (single state), Table-driven (single state)]
      • Remark:
        • Caching effect on the hardcoded version
        • L1 cache hits between 10 states and about 110 states
        • L1 cache misses between 160 states and about 360 states
        • L2 cache hits between 460 states and 1700 states
        • L2 cache misses (slow) from 1800 states onward, where main memory is needed

  12. The String Recognition Experiment
      [Figure: states 1, 2, 3, …, n; label: String]

  13. The String Recognition Experiment
      Two ways of implementing a string recognizer:
      [Figure: states 1, 2, 3, …, n; label: String]

  14. The String Recognition Experiment
      Two ways of implementing a string recognizer:
      • Implementation based on direct indexing
      [Figure: states 1, 2, 3, …, n; table entry (i, val(str_k)); label: String]

  15. The String Recognition Experiment
      Two ways of implementing a string recognizer:
      • Implementation based on direct indexing
      • Implementation based on symbol searching
      [Figure: states 1, 2, 3, …, n; table entry (i, pos(str_k)); array of alphabet symbols a, b, c, d, e; label: String]

  16. The String Recognition Experiment
      Two ways of implementing a string recognizer:
      • Implementation based on direct indexing
      • Implementation based on symbol searching
        • Binary search
        • Linear search
        • We used linear search
      [Figure: states 1, 2, 3, …, n; table entry (i, pos(str_k)); array of alphabet symbols a, b, c, d, e; label: String]
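
The difference between the two lookups can be sketched as below. The function names, the 256-column layout for direct indexing, and the `kNoTransition` sentinel are assumptions made for this example; the `val` and `pos` of the slides correspond to the symbol's numeric value and its position in the alphabet array.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed sentinel: marks a missing transition.
constexpr int kNoTransition = -1;

// Direct indexing: the column is val(symbol), the symbol's own numeric
// value, so the table needs one column per possible value (256 here).
int stepDirect(const std::vector<std::vector<int>>& table,
               int state, char symbol) {
    return table[state][static_cast<unsigned char>(symbol)];
}

// pos(symbol): linear search through the array of alphabet symbols.
// Returns alphabet.size() when the symbol is not in the alphabet.
std::size_t pos(const std::string& alphabet, char symbol) {
    std::size_t i = 0;
    while (i < alphabet.size() && alphabet[i] != symbol) ++i;
    return i;
}

// Symbol searching: the column is pos(symbol), so the table needs only
// one column per alphabet symbol, at the price of a search per step.
int stepSearch(const std::vector<std::vector<int>>& table,
               const std::string& alphabet, int state, char symbol) {
    const std::size_t column = pos(alphabet, symbol);
    if (column == alphabet.size()) return kNoTransition;  // unknown symbol
    return table[state][column];
}
```

Direct indexing trades memory for a constant-time lookup, while symbol searching keeps the table compact but adds work proportional to the alphabet size on every symbol read.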

  17. The String Recognition Experiment
      [Figure: states 1, 2, 3, …, n]
      • Language based on:
        • 10-symbol alphabet
        • Number of states between 10 and 4000
        • Randomly generated accepting string of length n-1 (n is the automaton size)
        • Filling density of each automaton set to 41%

  18. The String Recognition Experiment
      [Figure: performance plot; series: Hardcode searching, Table-driven searching, Hardcode direct index, Table-driven direct index]
      • Remarks and findings:
        • Caching effect on the hardcoded version
        • Noise due to the Branch Prediction Buffer
        • Wrong guesses in the Branch History Buffer
        • Hardcoding outperforms table-driven up to a thousand states

  19. Future Work
      • Dynamic Implementation of Finite Automata for Performance (DIFAP) using:
        • Table-driven
        • Linked list
        • Hardcode
        • Fine tuning, constraints, etc.
      • An adaptive method for DIFAP (A-DIFAP)
        • Adapts to the system's/platform's constraints at run-time
      • Programming-language-specific toolkit for DIFAP / A-DIFAP
        • Exploits the programming language's features

  20. Publications
      • Preliminary Experiments on Hardcoding Finite Automata. CIAA 2003
      • Hardcoding Finite State Automata Processing. SAICSIT 2003
      • Hardcoding Finite State Automata Processing. (Submitted to SACJ)
      • On Hardcoding Finite State Automata Processing. Technical Report T/UE 2003.
      • The Effect of Cache Memory on Hardcoded Finite Automata (To be submitted to SP&E)

  21. Questions? Thanks!
