
ReBug: A Regex Debugger

Michel Lambert (mlambert@mit.edu, http://perl.jall.org/rebug/), Massachusetts Institute of Technology. Perl Conference 5, Grande Ballroom B. The basic idea has three parts: instrument the regex (aka debuggerizing), run the regex, and analyze the data returned.


Presentation Transcript


  1. ReBug: A Regex Debugger
     Michel Lambert, mlambert@mit.edu
     Massachusetts Institute of Technology
     http://perl.jall.org/rebug/
     Perl Conference 5, Grande Ballroom B

  2. The Basic Idea
     • Three basic parts:
       • Instrument the regex (aka: debuggerizing)
       • Run the regex
       • Analyze the data returned

  3. A New Feature
     • Perl 5.6.0’s new regex operator: (?{})
     • Perl-in-a-regex
     • Called every time the token is matched
     • To find out how far ‘through’ a regex we are, we can study the order in which the callbacks get called
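As a minimal sketch of the callback idea (the `@trace` array and the labels are illustrative, not part of ReBug), a literal pattern can log every time the engine passes a `(?{ })` block. Inside such a block, `pos()` reports the current position in the target string.

```perl
use strict;
use warnings;

# Illustrative only: @trace and its labels are not ReBug names.
our @trace;

my $text = "aab";

# Each (?{ ... }) block runs whenever the engine passes that point,
# including on attempts that fail later and are retried elsewhere.
if ($text =~ /a(?{ push @trace, "after a, pos=" . pos() })b(?{ push @trace, "after b, pos=" . pos() })/) {
    print "$_\n" for @trace;
}
```

The order of the entries in `@trace` shows how far through the pattern the engine got on each attempt.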

  4. Instrument the Regex
     • Adding the tokens
     • Many tokens needed to see the match
     • /a/ becomes:
       • /(?{callback()})a(?{callback()})/
     • /c*d/ becomes:
       • /(?:c(?{callback()}))*d(?{callback()})/
     • Requires that we parse the regex entirely

  5. Parsing the Regex
     • Regexes have a simple language
     • Linear token stream
     • Insert (?{callback()}) after each token
     • Parenthesized expression is a ‘nested’ token
     • Parse it recursively, tokenizing the subexpression
     • Regex::Tokenizer creates a stream of tokens
     • Regex::Debuggerizer creates instrumented regex
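The parse-and-insert step might look roughly like the sketch below. It is a simplified stand-in, not the actual Regex::Tokenizer or Regex::Debuggerizer code: the names `debuggerize`, `instrument_seq`, `$QUANT`, `$PREFIX`, and `$CLASS` are assumptions, and only literal characters, backslash escapes, simple character classes, quantifiers, alternation, and parenthesized groups are handled.

```perl
use strict;
use warnings;

# Simplified token patterns (assumptions, not ReBug's own definitions).
my $QUANT  = qr/(?:[*+?]|\{\d+(?:,\d*)?\})\??/;   # * + ? {n} {n,} {n,m}, optionally lazy
my $PREFIX = qr/\?(?:[:=!>]|<[=!])/;              # (?: (?= (?! (?> (?<= (?<!
my $CLASS  = qr/\[\^?\]?(?:\\.|[^\]\\])*\]/s;     # simple [...] character class
my $CB     = '(?{callback()})';

# Instrument one sequence of items; pos($$re) is the shared cursor.
sub instrument_seq {
    my ($re) = @_;
    my $out = '';
    while (1) {
        my $token;
        if ($$re =~ /\G\(/gc) {
            # Nested token: keep its prefix, recurse on the body.
            my $prefix = $$re =~ /\G($PREFIX)/gc ? $1 : '';
            my $body   = instrument_seq($re);
            $$re =~ /\G\)/gc or die "unbalanced '(' in regex\n";
            $token = "($prefix$body)";
        }
        elsif ($$re =~ /\G\|/gc) {
            $out .= '|';          # alternation is passed through untouched
            next;
        }
        elsif ($$re =~ /\G(\\.|$CLASS|[^()])/gcs) {
            $token = $1;          # escape, character class, or single char
        }
        else {
            last;                 # end of input or a closing ')'
        }
        if ($$re =~ /\G($QUANT)/gc) {
            $out .= "(?:$token$CB)$1";   # callback fires on every repetition
        }
        else {
            $out .= "$token$CB";
        }
    }
    return $out;
}

sub debuggerize {
    my ($regex) = @_;
    pos($regex) = 0;
    my $out = instrument_seq(\$regex);
    die "unbalanced ')' in regex\n" if pos($regex) < length $regex;
    return $out;
}

our $calls = 0;
sub callback { $calls++ }

my $instrumented = debuggerize('c*d');
print "$instrumented\n";
{
    use re 'eval';   # required: the pattern is assembled at run time
    print "matched with $calls callback calls\n" if "ccd" =~ /$instrumented/;
}
```

Wrapping a quantified token as `(?:token(?{callback()}))quantifier` keeps the callback inside the repetition, so it is reached once per iteration rather than once for the whole run.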

  6. Regex::Tokenizer
     • Regex ‘language’:
       • regex = item*
       • item = token quantifier
       • token = char, char-class, nested token
       • quantifier = * + ? *? +? ?? {3} {3,} {3,5}
       • nested token = (?:a*) (?>b) (abc) (?!d)

  7. Regex::Tokenizer
     • quantifier: [*+?]?\?? | \{ \d+(?:,\d*)? \}
     • nested token prefix: \?(?: [:=!>] | <[=!] ) | (?=[^?])
     • matching parenthesis: lazily-evaluated regexes
       $parens = qr{ \( (?: (?>(?:\\. | [^()] )+ ) | (??{ $parens }) )* \)}x;
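To see the lazily-evaluated `$parens` pattern at work, here is a small sketch that pulls the top-level parenthesized groups out of a regex source string. The sample string and the `@groups` array are illustrative assumptions.

```perl
use strict;
use warnings;

# Declared first so the (??{ ... }) block can refer to the variable.
our $parens;
$parens = qr{
    \(
    (?:
        (?> (?: \\. | [^()] )+ )   # run of escaped or non-paren characters
      |
        (??{ $parens })            # a nested, balanced group
    )*
    \)
}x;

my $source = 'a(b(c|d)*e)f(?:g)';
my @groups = $source =~ /($parens)/g;
print "$_\n" for @groups;
```

The `(??{ $parens })` block is evaluated only when the engine reaches it, which is what lets the pattern refer to itself and match arbitrarily deep nesting.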

  8. Extracting Information
     • Dependent upon the debugger’s feature set
     • Target string information
       • $`, $&, $', $1
       • Querying these variables during the regex match works perfectly fine
     • Regex information
       • Current place in regex, and the current token
       • No easy way, but by encoding data during debuggerizing, we can give the callback additional information about the state of the regex at that point
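One way to encode regex-side information, sketched under assumptions: bake a token index into each inserted block so the callback knows which token was just matched, and pass `pos()` along for the target-string side. The `callback` signature, `@tokens` table, and `@snapshots` array are hypothetical, not ReBug's actual interface.

```perl
use strict;
use warnings;

our @snapshots;

# Hypothetical callback: the token index is baked into the pattern while
# instrumenting; the position comes from pos() inside the code block.
sub callback {
    my ($token_id, $pos) = @_;
    push @snapshots, { token => $token_id, pos => $pos };
}

my @tokens = ('c', 'd');   # token table a debuggerizer could build
my $text   = "ccd";

# Hand-instrumented /c*d/ where every block carries its token's index.
my $re = qr/(?:c(?{ callback(0, pos()) }))*d(?{ callback(1, pos()) })/;

if ($text =~ $re) {
    for my $s (@snapshots) {
        my $done = substr($text, 0, $s->{pos});
        my $rest = substr($text, $s->{pos});
        print "matched '$tokens[ $s->{token} ]': $done|$rest\n";
    }
}
```

Each snapshot pairs a place in the regex (the token index) with a place in the target string (the position), which is enough to drive highlighting of both.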

  9. Additional Features
     • Step/Go Forwards... and Backwards
     • Less state information with a regex machine
     • Can easily record a series of regex snapshots to allow freeform time travel through the match
     • Should be independent of flaws in the regex
     • ‘Infinite loop’ regexes should be debuggable
     • By forking into a regex-matching backend and a responsive Tk frontend, we can use IPC to communicate during the regex match
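The fork-plus-IPC arrangement could be sketched as follows, using plain `pipe` and `fork` rather than the IPC::Meiosis and IPC::Talk modules (whose interfaces are not shown in the slides). The `wait_for_frontend` helper and the line-based message format are assumptions; the backend blocks inside the callback until the frontend asks for the next step.

```perl
use strict;
use warnings;
use IO::Handle;

# Two pipes: requests (frontend -> backend) and replies (backend -> frontend).
pipe(my $req_r, my $req_w) or die "pipe: $!";
pipe(my $rep_r, my $rep_w) or die "pipe: $!";
$req_w->autoflush(1);
$rep_w->autoflush(1);

# Hypothetical backend callback: block until the frontend requests a step.
sub wait_for_frontend {
    my ($pos) = @_;
    my $cmd = <$req_r>;
    exit 0 if !defined $cmd || $cmd eq "quit\n";
    print {$rep_w} "pos=$pos\n";
}

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid == 0) {
    # Backend process: runs the hand-instrumented regex match.
    close $req_w;
    close $rep_r;
    my $matched = "ccd" =~ /(?:c(?{ wait_for_frontend(pos()) }))*d(?{ wait_for_frontend(pos()) })/;
    # Match finished: answer further requests until told to quit.
    while (defined(my $cmd = <$req_r>)) {
        last if $cmd eq "quit\n";
        print {$rep_w} "done matched=", ($matched ? 1 : 0), "\n";
    }
    exit 0;
}

# Frontend process: free to service a GUI between requests.
close $req_r;
close $rep_w;
my @replies;
for my $i (1 .. 4) {
    print {$req_w} "match-next-token\n";
    my $reply = <$rep_r>;
    last unless defined $reply;
    chomp $reply;
    push @replies, $reply;
}
print {$req_w} "quit\n";
close $req_w;
waitpid($pid, 0);
print "$_\n" for @replies;
```

Because the match runs in a separate process, a pattern that never terminates only stalls the backend; the frontend can stop requesting steps and send `quit` at any point.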

  10. How it Works
      • The debugger engine waits for ‘match-next-token’
      • The frontend asks for new state data as needed, and stores retrieved data (real regex matches can’t be rewound)
      • The user uses VCR controls to interface
      • Colored text highlighting displays data
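The frontend's store-and-replay behaviour can be sketched as a small history object. The `History` class and its method names are hypothetical, and a plain array stands in for the backend. Stepping forward fetches a new state only when the stored ones run out; stepping backward never touches the backend.

```perl
use strict;
use warnings;

# Hypothetical frontend-side history; names are illustrative, not ReBug's API.
package History;

sub new {
    my ($class, $fetch_next) = @_;   # code ref: returns next state or undef
    return bless { states => [], cursor => -1, fetch => $fetch_next, done => 0 }, $class;
}

sub current {
    my ($self) = @_;
    return $self->{cursor} >= 0 ? $self->{states}[ $self->{cursor} ] : undef;
}

sub forward {
    my ($self) = @_;
    # Ask the backend only if we are at the newest stored state.
    if ($self->{cursor} + 1 > $#{ $self->{states} } && !$self->{done}) {
        my $state = $self->{fetch}->();
        if (defined $state) { push @{ $self->{states} }, $state }
        else                { $self->{done} = 1 }
    }
    $self->{cursor}++ if $self->{cursor} < $#{ $self->{states} };
    return $self->current;
}

sub back {
    my ($self) = @_;
    $self->{cursor}-- if $self->{cursor} > 0;   # replay from stored states
    return $self->current;
}

package main;

my @queue   = map { +{ step => $_ } } 1 .. 3;   # stand-in for the backend
my $history = History->new(sub { shift @queue });

my @visited = map { $_->{step} }
    $history->forward, $history->forward, $history->back, $history->forward;
print "@visited\n";
```

Since a real match cannot be rewound, everything behind the cursor has to come from the stored snapshots, which is why the frontend keeps every state it has retrieved.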

  11. Demonstration
      • The fun part
      • Files:
        • /rebug.pl – the simple wrapper around the modules
        • IPC::Meiosis – splits program into front/backend
        • IPC::Talk – communication interface
        • Regex::Tokenizer – tokenizes the regex
        • Regex::Debuggerizer – instruments regex with Regex::Tokenizer
        • Regex::Debugger::State – debugger’s state object
        • Regex::Debugger – backend: handles the regex match and state encapsulation
        • Regex::Interface – frontend: provides the Tk interface code and querying logic

  12. Screenshots (Plan B) • To be completed this week…
