Parsing XML: Grammars, PDAs, Lexical Analysis, Recursive Descent
Recipe Book Markup Language (RBML)
• Why markup languages?
  • They give structure to content, which aids in interpreting its semantics, storing it in a database, etc.
• Why XML?
  • Human readable (sort of)
  • Widely accepted and used for data interchange
• Why RBML?
  • Don’t reinvent the wheel: use existing stuff IAAP
  • Simplest of the recipe XML formats I found
Formal Languages
• What is a formal language?
  • A mathematically defined set of strings over a finite alphabet Σ (a subset of Σ*)
• Regular languages
  • Very simple; can be recognized by a finite-state machine (FSM)
  • Still very powerful
• Context-free languages
  • Pretty simple; can be recognized by a pushdown automaton (PDA)
  • Especially useful for programming languages
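To make the FSM/PDA distinction concrete, here is a minimal sketch. The two languages chosen (fractions such as 3/4, and properly nested tags) and the function names `accepts_fraction` and `balanced` are illustrative assumptions, not part of the slides. The first function is a table-driven finite-state machine; the second uses a stack, which is the extra memory a PDA has and an FSM lacks.

```python
DIGITS = "0123456789"

def accepts_fraction(text):
    """FSM for digit+ '/' digit+.

    States: 0 = start, 1 = in numerator, 2 = just saw '/',
    3 = in denominator (the only accepting state).
    """
    transitions = {
        (0, "digit"): 1,
        (1, "digit"): 1,
        (1, "/"): 2,
        (2, "digit"): 3,
        (3, "digit"): 3,
    }
    state = 0
    for ch in text:
        kind = "digit" if ch in DIGITS else ch
        state = transitions.get((state, kind))
        if state is None:  # no transition defined: reject
            return False
    return state == 3

def balanced(tags):
    """Stack-based check that opening and closing tags nest properly."""
    stack = []
    for tag in tags:
        if tag.startswith("</"):
            # A closing tag must match the most recently opened tag.
            if not stack or stack.pop() != tag[2:-1]:
                return False
        else:
            stack.append(tag[1:-1])
    return not stack  # every opened tag must have been closed
```

Arbitrarily deep nesting cannot be tracked with a fixed number of states, which is why XML-style markup calls for a context-free grammar rather than a regular expression.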
Regular Expressions/Languages
• Alphabet, Σ = finite set of symbols
• String, σ = sequence of 0 or more symbols from Σ (an element of Σ*)
• Regular expressions (REs)
  • The empty set, Ø, is an RE and denotes the empty language
  • The empty string, ε, is an RE and denotes {ε}
  • For all a in Σ, a is an RE and denotes {a}
  • If r and s are REs denoting the languages R and S, respectively, then (r+s), (rs), and (r*) are REs that denote R ∪ S, RS, and R*, respectively
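The inductive definition above can be sketched directly as operations on sets of strings. This is only an illustration: the function names are hypothetical, and because R* is usually infinite, `star` takes an assumed `max_len` cutoff and returns only the strings of R* up to that length.

```python
def empty_set():
    """Ø denotes the empty language."""
    return set()

def epsilon():
    """ε denotes the language containing only the empty string."""
    return {""}

def sym(a):
    """A single symbol a denotes {a}."""
    return {a}

def union(R, S):
    """(r+s) denotes R ∪ S."""
    return R | S

def concat(R, S):
    """(rs) denotes RS: every string of R followed by every string of S."""
    return {r + s for r in R for s in S}

def star(R, max_len):
    """(r*) denotes R*; only strings of length <= max_len are generated."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more string from R.
        frontier = {w + r for w in frontier for r in R
                    if len(w + r) <= max_len} - result
        result |= frontier
    return result

# (a+b)* restricted to strings of length at most 2
a_or_b_star = star(union(sym("a"), sym("b")), max_len=2)
# (ab)* restricted to strings of length at most 4
ab_star = star(concat(sym("a"), sym("b")), max_len=4)
```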
Context-Free Languages
• Context-free grammar G = <V, T, P, S>
  • V = variables
  • T = terminals (alphabet characters)
  • P = productions
  • S = start symbol in V
• Productions
  • Replace a variable with a string from (V ∪ T)*
  • Example: E -> E + E | E * E | (E) | id
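As an illustration of how productions rewrite variables, the sketch below applies a chosen sequence of productions from the example grammar to the leftmost variable at each step. The names `PRODUCTIONS` and `leftmost_derivation`, and the representation of a production as a list of symbols, are assumptions made for this example.

```python
# Example grammar from the slide: E -> E + E | E * E | ( E ) | id
PRODUCTIONS = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["id"]],
}

def leftmost_derivation(start, choices):
    """Rewrite the leftmost variable once per entry in `choices`.

    Each entry is the index of the production to apply.
    Returns every sentential form produced, as strings.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Locate the leftmost symbol that is a variable.
        idx = next(i for i, s in enumerate(form) if s in PRODUCTIONS)
        form = form[:idx] + PRODUCTIONS[form[idx]][choice] + form[idx + 1:]
        steps.append(" ".join(form))
    return steps

# E => E + E => id + E => id + E * E => id + id * E => id + id * id
steps = leftmost_derivation("E", [0, 3, 1, 3, 3])

# A different choice sequence reaches the same string.
other = leftmost_derivation("E", [1, 0, 3, 3, 3])
```

Both sequences end in `id + id * id`, so this example grammar is ambiguous: one string has more than one leftmost derivation.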
RBML Grammar
cookbook -> “<cookbook>” title (section | recipe)+ “</cookbook>”
title -> “<title>” pcdata “</title>”
section -> “<section>” title recipe+ “</section>”
recipe -> “<recipe>” title recipeinfo ingredientlist preparation serving notes “</recipe>”
RBML Grammar
recipeinfo -> <recipeinfo> (author | blurb | effort | genre | preptime | source | yield)* </recipeinfo>
ingredientlist -> <ingredientlist> (ingredient)* </ingredientlist>
preparation -> <preparation> (pcdata | equipment | step | hyperlink)* </preparation>
serving -> <serving> (pcdata | hyperlink)* </serving>
notes -> <notes> (pcdata | hyperlink)* </notes>
RBML Grammar
equipment -> <equipment> (pcdata | hyperlink)* </equipment>
step -> <step> (pcdata | equipment | hyperlink)* </step>
ingredient -> <ingredient> (pcdata | quantity | unit | fooditem)* </ingredient>
quantity -> <quantity> (number | number "or" number | number "and" number) </quantity>
number -> integer | fraction | integer " " fraction
fraction -> integer "/" integer
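The number and fraction rules describe a regular language, so one possible way to recognize them is a single regular expression. The sketch below is an assumption about how such values might be evaluated (the slides do not say); `NUMBER_RE` and `parse_number` are hypothetical names, and "integer" is assumed to mean one or more ASCII digits.

```python
import re
from fractions import Fraction

# number   -> integer | fraction | integer " " fraction
# fraction -> integer "/" integer
NUMBER_RE = re.compile(r"(?:(\d+) )?(\d+)/(\d+)|(\d+)", re.ASCII)

def parse_number(text):
    """Return the exact value of an RBML-style number such as '1 1/2'."""
    m = NUMBER_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a number: {text!r}")
    whole, numerator, denominator, integer = m.groups()
    if integer is not None:
        # Plain integer alternative matched.
        return Fraction(int(integer))
    # Fraction, optionally preceded by a whole part and one space.
    # A zero denominator raises ZeroDivisionError from Fraction.
    return Fraction(int(whole or 0)) + Fraction(int(numerator), int(denominator))
```

For example, `parse_number("1 1/2")` yields `Fraction(3, 2)`. Handling the "or" and "and" forms of quantity would take one more layer on top of this.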
Recipe Book Markup Language
unit -> <unit> pcdata </unit>
fooditem -> <fooditem> pcdata </fooditem>
blurb -> <blurb> pcdata </blurb>
effort -> <effort> pcdata </effort>
genre -> <genre> pcdata </genre>
Recipe Book Markup Language
preptime -> <preptime> pcdata </preptime>
source -> <source> (pcdata | hyperlink)* </source>
yield -> <yield> pcdata </yield>
hyperlink -> pcdataurl
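Recursive Descent Parsing
• One procedure per grammar variable; each procedure follows the right-hand side of its production
• Match required (literal) symbols directly
• Call a procedure to match each variable
  • That procedure may itself call similar procedures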
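A minimal sketch of these ideas for a small fragment of the RBML grammar (ingredientlist, ingredient, unit, fooditem). Several things here are assumptions made for illustration: the input is taken to be already split into a list of strings in which tags look like "<unit>" and everything else is pcdata, the quantity alternative of ingredient is left out for brevity, and the `Parser` class and the tuple-based tree are hypothetical.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the current token without consuming it (None at end)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, literal):
        """Match a required (literal) symbol or fail."""
        if self.peek() != literal:
            raise SyntaxError(f"expected {literal!r}, found {self.peek()!r}")
        self.pos += 1

    def pcdata(self):
        """Match one chunk of character data (any non-tag token)."""
        tok = self.peek()
        if tok is None or tok.startswith("<"):
            raise SyntaxError(f"expected text, found {tok!r}")
        self.pos += 1
        return tok

    def simple(self, name):
        """unit, fooditem: <name> pcdata </name>"""
        self.expect(f"<{name}>")
        text = self.pcdata()
        self.expect(f"</{name}>")
        return (name, text)

    def ingredient(self):
        """ingredient -> <ingredient> (pcdata | unit | fooditem)* </ingredient>"""
        self.expect("<ingredient>")
        children = []
        while self.peek() != "</ingredient>":
            tok = self.peek()
            if tok == "<unit>":
                children.append(self.simple("unit"))
            elif tok == "<fooditem>":
                children.append(self.simple("fooditem"))
            else:
                # pcdata() raises on end of input or an unexpected tag.
                children.append(("pcdata", self.pcdata()))
        self.expect("</ingredient>")
        return ("ingredient", children)

    def ingredientlist(self):
        """ingredientlist -> <ingredientlist> (ingredient)* </ingredientlist>"""
        self.expect("<ingredientlist>")
        items = []
        while self.peek() == "<ingredient>":
            items.append(self.ingredient())
        self.expect("</ingredientlist>")
        return ("ingredientlist", items)

tokens = ["<ingredientlist>", "<ingredient>",
          "<unit>", "cup", "</unit>",
          "<fooditem>", "flour", "</fooditem>",
          "</ingredient>", "</ingredientlist>"]
tree = Parser(tokens).ingredientlist()
```

Each method mirrors one production: literal tags go through `expect`, and each variable on the right-hand side becomes a method call, so the call stack does the bookkeeping that a PDA's stack would do.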
Lexical Analysis
• Helps prepare the input for parsing
• Uses regular expressions to
  • Organize the input into multi-symbol chunks (tokens)
  • Give each chunk a meaning for the parser
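The chunking step can be sketched with Python's `re` module. The token categories ("open", "close", "pcdata"), the `TOKEN_RE` pattern, and the `tokenize` function are assumptions for illustration; this sketch handles only plain tags without attributes and does not cover comments, entities, or other XML features.

```python
import re

# One named group per kind of chunk the parser cares about.
TOKEN_RE = re.compile(r"""
    (?P<close></[A-Za-z]+>)    # closing tag such as </unit>
  | (?P<open><[A-Za-z]+>)      # opening tag such as <unit>
  | (?P<pcdata>[^<]+)          # run of character data
""", re.VERBOSE)

def tokenize(text):
    """Split text into (kind, value) pairs for a parser to consume."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = m.end()
        value = m.group()
        if m.lastgroup == "pcdata":
            value = value.strip()
            if not value:
                continue  # skip whitespace between tags
        tokens.append((m.lastgroup, value))
    return tokens

tokens = tokenize("<ingredient> <unit>cup</unit> <fooditem>flour</fooditem> </ingredient>")
```

Each pair carries the chunk's meaning (its kind) alongside its text, so the parser can decide what to do without looking at individual characters.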