Parsing with Boost.Spirit Rob Stewart robert.stewart@sig.com
Overview • Introduction to Boost.Spirit • Parsing with Qi • Parsing ping command output • Problems using Qi
Introduction to Boost.Spirit • Three sub-libraries • Lex: Lexical analysis • Qi: Parsing • Karma: Generating output • DSELs • Clear, readable because targeted to domain • Use within your C++ code • No external tools required
Boost.Spirit.Lex • Tokenizes input • Parses character sequence • Produces tokens • Applies your grammar • Separates tokenization from analysis • Reduces complexity of parser • Not covered in this presentation
Boost.Spirit.Qi • Converts sequence of tokens or characters • Implements a recursive descent parser • Parsing Expression Grammar (PEG) based • Similar to Extended Backus-Naur Form (EBNF) • Not ambiguous • Well-suited to computer languages • Ill-suited to natural languages • Replaces uses of scanf(), regular expressions, and tokenizers • Much more powerful and flexible than common tools
Boost.Spirit.Karma • Produces character sequence from data • Can replace uses of printf(), std::ostream, boost::format(), etc. • Much more powerful and flexible than common output tools • Inverse of Qi • Not covered in this presentation
Parsing Basics • Iterate input sequence • Optionally tokenize • Apply grammar • Indicate a match • Produce side effects • Save text • Convert text to another type • Call a function
Parsers like Function Objects • Arguments: Inherited Attributes • Return value: Synthesized Attribute • State
Parser Concept bool parse(FwdIt, FwdIt, Context, Skipper, Attribute); info what(Context);
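
To make the concept concrete, here is a minimal sketch of a parser-shaped function object in plain standard C++. It is not Qi's code: the name uint_parser is hypothetical, the Context and Skipper parameters are left out for brevity, and what() returns a std::string instead of Qi's info type.

```cpp
#include <string>

// Hypothetical illustration of the parser concept (not Qi's implementation).
// parse() returns true on a match, advances `first` past the consumed input,
// and stores the synthesized attribute in `attr`. On failure `first` and
// `attr` are left unchanged. Overflow is ignored to keep the sketch short.
struct uint_parser
{
    template <class FwdIt>
    bool parse(FwdIt& first, FwdIt last, unsigned& attr) const
    {
        FwdIt it = first;
        unsigned value = 0;
        bool any = false;
        while (it != last && *it >= '0' && *it <= '9')
        {
            value = value * 10 + static_cast<unsigned>(*it - '0');
            ++it;
            any = true;
        }
        if (!any)
            return false;   // no match: iterator untouched
        first = it;         // match: consume the digits
        attr = value;
        return true;
    }

    // Counterpart of what(): describes the parser for diagnostics.
    std::string what() const { return "unsigned integer"; }
};
```

Given the input "42abc", parse() succeeds, sets the attribute to 42, and leaves the iterator pointing at 'a'.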
Kinds of Parsers • Primitive • char_, float_, int_, lit, etc. • Rule • Placeholder for one or more parsers • Reusable • Support recursion • Have a name (empty by default) • Grammar • Encapsulates a set of rules, parsers, and nested grammars • High level abstraction • Offers modularization and composition
Parsers for doubles • To parse one double: boost::spirit::qi::double_ • To parse two whitespace-delimited doubles: double_ >> double_ • Parsing zero or more doubles: *double_ • Parsing a comma-delimited list of doubles: double_ >> *(lit(',') >> double_)
Parsing a Comma-delimited List of doubles double_ >> *(lit(',') >> double_) • double_ matches sign, mantissa, and exponent • >> is the sequence operator: left side followed by right side • * is the Kleene star: zero or more • lit(',') matches a comma, which won’t be added to the synthesized attribute • Equivalent shorthand: double_ % ',' (Qi extends the PEG operators for convenience)
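
The following sketch spells out by hand what double_ >> *(lit(',') >> double_) matches. It uses only the standard library and mirrors the semantics, not Qi's implementation. The function names are hypothetical, and std::strtod is an approximation of double_ (the two do not accept exactly the same spellings).

```cpp
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Tries to read one double starting at `pos`. On success stores the value,
// advances `pos`, and returns true; on failure leaves `pos` and `value`
// unchanged. std::strtod stands in for double_ (an approximation only).
bool match_double(const std::string& s, std::size_t& pos, double& value)
{
    if (pos >= s.size() || std::isspace(static_cast<unsigned char>(s[pos])))
        return false;                   // strtod would skip whitespace; we do not
    const char* begin = s.c_str() + pos;
    char* end = nullptr;
    const double v = std::strtod(begin, &end);
    if (end == begin)
        return false;                   // nothing that looks like a number
    value = v;
    pos += static_cast<std::size_t>(end - begin);
    return true;
}

// Hand-written equivalent of: double_ >> *(lit(',') >> double_)
bool match_double_list(const std::string& s, std::size_t& pos,
                       std::vector<double>& out)
{
    double v = 0.0;
    if (!match_double(s, pos, v))       // the leading double_ is mandatory
        return false;
    out.push_back(v);
    for (;;)                            // Kleene star: zero or more repetitions
    {
        std::size_t probe = pos;
        if (probe >= s.size() || s[probe] != ',')
            break;                      // no comma: the star stops, still a match
        ++probe;                        // the comma is consumed but not stored
        if (!match_double(s, probe, v))
            break;                      // backtrack to before the comma
        out.push_back(v);
        pos = probe;
    }
    return true;
}
```

For the input "1,2," the match succeeds with two values and stops before the trailing comma, which is why a caller that needs a full match must also check how much input was consumed.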
Parsing Functions • boost::spirit::qi::parse() • Parses exactly what’s described by the supplied parser • Provides complete control over where whitespace may occur • Appropriate when parsing token sequences from Lex • boost::spirit::qi::phrase_parse() • Applies a skip parser between parsers comprising the main parser • Simplifies delimiter handling • Can disable for specific parts of the main parser
Using parse()
template <class It>
bool matches(It _first, It _last)
{
    return parse(_first, _last, double_ % ',');
}

Using phrase_parse()
template <class It>
bool matches(It _first, It _last)
{
    return phrase_parse(_first, _last, double_ % ',', space);
}
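
The difference between the two functions can be illustrated without Boost. The sketch below models double_ >> double_ in plain C++: with the skipper switched off it behaves like parse(), and with it switched on it behaves like phrase_parse() with a whitespace skipper applied before each component. All names are hypothetical, and std::strtod only approximates double_.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Advances `pos` past any whitespace; plays the role of the skip parser.
void skip_spaces(const std::string& s, std::size_t& pos)
{
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos])))
        ++pos;
}

// Reads one double at `pos` with no whitespace tolerance of its own.
bool match_double_strict(const std::string& s, std::size_t& pos, double& value)
{
    if (pos >= s.size() || std::isspace(static_cast<unsigned char>(s[pos])))
        return false;
    const char* begin = s.c_str() + pos;
    char* end = nullptr;
    const double v = std::strtod(begin, &end);
    if (end == begin)
        return false;
    value = v;
    pos += static_cast<std::size_t>(end - begin);
    return true;
}

// Models double_ >> double_.
// use_skipper == false: like parse(), whitespace is never skipped.
// use_skipper == true:  like phrase_parse() with a whitespace skipper,
//                       which runs before each component parser.
bool match_two_doubles(const std::string& s, bool use_skipper,
                       double& a, double& b)
{
    std::size_t pos = 0;
    if (use_skipper) skip_spaces(s, pos);
    if (!match_double_strict(s, pos, a)) return false;
    if (use_skipper) skip_spaces(s, pos);
    return match_double_strict(s, pos, b);
}
```

With the input "1.5 2.5", the call without a skipper fails at the space, while the call with a skipper succeeds and yields 1.5 and 2.5.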
Reality Isn’t Quite So Pretty
#include <boost/spirit/include/qi.hpp>
template <class It>
bool matches(It _first, It _last)
{
    using boost::spirit::qi::double_;
    using boost::spirit::qi::lit;
    using boost::spirit::qi::phrase_parse;
    using boost::spirit::ascii::space;
    return phrase_parse(_first, _last, double_ % ',', space);
}

Reality Isn’t Quite So Pretty
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
template <class It>
bool matches(It _first, It _last)
{
    using boost::spirit::ascii::space;
    return qi::phrase_parse(_first, _last, qi::double_ % ',', space);
}
Deconstructing phrase_parse() Calls
template <class It>
bool matches(It _first, It _last)
{
    return phrase_parse(
        _first, _last,       // half-open input range of characters
        double_ % ',',       // the parser to apply
        space)               // the skip parser
        && _first == _last;  // check that the entire input range was consumed
}
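
The final comparison matters because a successful parse only means that some prefix of the input matched. The sketch below shows the same effect in plain C++ with a hypothetical digit parser (it is not Qi code): the prefix check accepts "123abc", while the full-match check rejects it.

```cpp
#include <string>

// Consumes a run of decimal digits and succeeds if at least one was read.
// `first` is advanced past whatever matched, as Qi's parse functions do.
template <class It>
bool parse_digits(It& first, It last)
{
    It it = first;
    while (it != last && *it >= '0' && *it <= '9')
        ++it;
    if (it == first)
        return false;
    first = it;
    return true;
}

// Prefix match: true as soon as the input starts with digits.
bool starts_with_digits(const std::string& s)
{
    std::string::const_iterator first = s.begin();
    return parse_digits(first, s.end());
}

// Full match: additionally require that no input is left over.
bool is_all_digits(const std::string& s)
{
    std::string::const_iterator first = s.begin();
    const std::string::const_iterator last = s.end();
    return parse_digits(first, last) && first == last;
}
```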
ping Command Output
PING www.google.com (74.125.131.147) 56(84) bytes of data.
64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=1 ttl=39 time=24.6 ms
64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=2 ttl=39 time=20.5 ms
64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=3 ttl=39 time=18.9 ms

--- www.google.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 18.984/21.411/24.697/2.410 ms
Creating the ping Parser
template <class It, class Skipper>
class ping::parser : public qi::grammar<It, Skipper>
{
public:
    parser()
    {
        // grammar here
    }
private:
    // rules here
};
Creating the ping Parser
public:
    parser() : parser::base_type(start, "ping parser") { }
private:
    qi::rule<It, Skipper> start;
start Rule
PING www.google.com (74.125.131.147) 56(84) bytes of data.
start = lit("PING") > host > ip_address > +(omit[char_] - '.') > '.' > eol …
• lit("PING"), host, and ip_address match the first three fields of the line
• +(char_ - '.') > '.' matches the rest of the line through the final period
• omit[char_] replaces plain char_ so the skipped characters are not added to the attribute
• eol matches the end of the line
start Rule
PING www.google.com (74.125.131.147) 56(84) bytes of data.
64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=1 ttl=39 time=24.6 ms
64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=2 ttl=39 time=20.5 ms
64 bytes from vc-in-f147.1e100.net (74.125.131.147): icmp_seq=3 ttl=39 time=18.9 ms

--- www.google.com ping statistics ---
start = lit("PING") … >> *(reply > eol) > eol > +(omit[char_] - eol) > eol …
• *(reply > eol) matches zero or more reply lines
• The statistics header line was first matched with +(omit[char_("A-Za-z0-9.-")]), later replaced by +(omit[char_] - eol)
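
As a rough illustration of what the first part of the start rule accepts, here is a hand-written, standard-library-only stand-in for lit("PING") > host > ip_address > +(omit[char_] - '.') > '.'. It is a sketch under assumptions, not the presenter's grammar: the host and ip_address rules are not shown in this part of the slides, so the code assumes that host is a run of non-blank characters and that ip_address is the text between parentheses. The ping_header type and all function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type; the slides do not show which attributes the
// host and ip_address rules expose.
struct ping_header
{
    std::string host;
    std::string ip;
};

// Skips blanks, standing in for the skip parser.
inline void skip_blanks(const std::string& s, std::size_t& pos)
{
    while (pos < s.size() && (s[pos] == ' ' || s[pos] == '\t'))
        ++pos;
}

// Assumed shapes: host = non-blank run, ip_address = "(" text ")".
// `out` is only modified when the whole header matches.
bool parse_ping_header(const std::string& line, ping_header& out)
{
    std::size_t pos = 0;
    skip_blanks(line, pos);
    if (line.compare(pos, 4, "PING") != 0)              // lit("PING")
        return false;
    pos += 4;

    skip_blanks(line, pos);
    const std::size_t host_begin = pos;                 // host (assumed shape)
    while (pos < line.size() && line[pos] != ' ' && line[pos] != '\t'
           && line[pos] != '(')
        ++pos;
    if (pos == host_begin)
        return false;
    const std::string host = line.substr(host_begin, pos - host_begin);

    skip_blanks(line, pos);
    if (pos >= line.size() || line[pos] != '(')         // ip_address (assumed shape)
        return false;
    const std::size_t close = line.find(')', pos);
    if (close == std::string::npos || close == pos + 1)
        return false;
    const std::string ip = line.substr(pos + 1, close - pos - 1);
    pos = close + 1;

    const std::size_t dot = line.find('.', pos);        // +(char_ - '.') > '.'
    if (dot == std::string::npos || dot == pos)
        return false;                                   // needs one or more chars, then '.'

    out.host = host;
    out.ip = ip;
    return true;
}
```

Applied to the first line of the sample output, this yields the host www.google.com and the address 74.125.131.147. Unlike the expectation operator > in the Qi rule, the sketch simply returns false on a mismatch rather than reporting an error.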