
Parsing XML


kimberly

Presentation Transcript


  1. Parsing XML: Grammars, PDAs, Lexical Analysis, Recursive Descent

  2. Recipe Book Markup Language
     • Why markup languages?
       • Give structure to content, which aids in interpreting its semantics, storing it in a database, etc.
     • Why XML?
       • Human readable (sort of)
       • Widely accepted and used for data interchange
     • Why RBML?
       • Don’t reinvent the wheel: use existing work if at all possible
       • Simplest of the recipe XML formats I found

  3. Formal Languages
     • What is a formal language?
       • A mathematically defined subset of the strings over a finite alphabet
     • Regular Languages
       • Very simple, can be recognized by a finite-state machine (FSM)
       • Still very powerful
     • Context-Free Languages
       • Pretty simple, can be recognized by a pushdown automaton (PDA)
       • Especially useful for programming languages

  4. Regular Expressions/Languages
     • Alphabet, Σ = finite set of symbols
     • String, σ = sequence of 0 or more symbols from Σ (an element of Σ*)
     • Regular Expressions
       • The empty set, Ø, is an RE and denotes the empty language
       • The empty string, ε, is an RE and denotes {ε}
       • For all a in Σ, a is an RE and denotes {a}
       • If r and s are REs denoting the languages R and S, respectively, then (r+s), (rs), and (r*) are REs that denote R ∪ S, RS, and R*, respectively
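The three RE operators above can be tried out with Python's `re` module. This is only an illustrative sketch: the alphabet {a, b} and the helper `in_language` are assumptions made for the example, and Python writes union as `r|s` where the slide writes `(r+s)`.

```python
import re

# Assumed example alphabet: {a, b}.
union = re.compile(r"a|b")       # (a+b) on the slide: denotes {a} U {b}
concat = re.compile(r"ab")       # (ab): denotes {"ab"}
star = re.compile(r"(?:ab)*")    # ((ab)*): denotes {"", "ab", "abab", ...}

def in_language(pattern, s):
    """Return True if the whole string s belongs to the language of pattern."""
    return pattern.fullmatch(s) is not None

print(in_language(union, "a"))    # membership in R U S
print(in_language(star, ""))      # the empty string is always in R*
print(in_language(star, "aba"))   # not a whole number of "ab" repetitions
```

`fullmatch` is used rather than `search` because language membership is a statement about the entire string, not about some substring of it.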

  5. Context-Free Languages
     • Context-Free Grammar G = <V, T, P, S>
       • V = variables
       • T = terminals (alphabet characters)
       • P = productions
       • S = start symbol in V
     • Productions
       • Replace a variable with a string from (V ∪ T)*
       • Example: E -> E + E | E * E | (E) | id
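To make "replace a variable with a string from (V ∪ T)*" concrete, here is a minimal sketch that stores the example grammar as a Python dictionary and applies productions one at a time. The names `PRODUCTIONS`, `leftmost_step`, and `derive` are hypothetical and exist only for this illustration.

```python
# Productions P of the slide's example grammar.
# V = {"E"}, T = {"+", "*", "(", ")", "id"}, start symbol S = "E".
PRODUCTIONS = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["id"]],
}

def leftmost_step(form, choice):
    """Replace the leftmost variable in a sentential form with one alternative."""
    for i, symbol in enumerate(form):
        if symbol in PRODUCTIONS:
            return form[:i] + PRODUCTIONS[symbol][choice] + form[i + 1:]
    raise ValueError("no variable left to replace")

def derive(choices, start="E"):
    """Apply a sequence of production choices, always to the leftmost variable."""
    form = [start]
    for choice in choices:
        form = leftmost_step(form, choice)
    return form

# E => E + E => id + E => id + E * E => id + id * E => id + id * id
print(" ".join(derive([0, 3, 1, 3, 3])))  # prints "id + id * id"
```

The same string can also be derived by choosing `E * E` first, which is why this particular example grammar is ambiguous.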

  6. RBML Grammar
     cookbook -> “<cookbook>” title (section | recipe)+ “</cookbook>”
     title -> “<title>” pcdata “</title>”
     section -> “<section>” title recipe+ “</section>”
     recipe -> “<recipe>” title recipeinfo ingredientlist preparation serving notes “</recipe>”

  7. RBML Grammar
     recipeinfo -> <recipeinfo> (author | blurb | effort | genre | preptime | source | yield)* </recipeinfo>
     ingredientlist -> <ingredientlist> (ingredient)* </ingredientlist>
     preparation -> <preparation> (pcdata | equipment | step | hyperlink)* </preparation>
     serving -> <serving> (pcdata | hyperlink)* </serving>
     notes -> <notes> (pcdata | hyperlink)* </notes>

  8. RBML Grammar
     equipment -> <equipment> (pcdata | hyperlink)* </equipment>
     step -> <step> (pcdata | equipment | hyperlink)* </step>
     ingredient -> <ingredient> (pcdata | quantity | unit | fooditem)* </ingredient>
     quantity -> <quantity> (number | number "or" number | number "and" number) </quantity>
     number -> integer | fraction | integer " " fraction
     fraction -> integer "/" integer
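The number and fraction productions are regular, so they can be recognized with a single regular expression. The sketch below assumes an integer is one or more decimal digits (the slides do not define it) and uses a hypothetical helper, `parse_number`, to turn the matched text into an exact value.

```python
import re
from fractions import Fraction

# number   -> integer | fraction | integer " " fraction
# fraction -> integer "/" integer
# Assumption: integer = one or more decimal digits.
NUMBER = re.compile(r"(?:(\d+) )?(\d+)/(\d+)|(\d+)")

def parse_number(text):
    """Convert text matching the number production into a Fraction."""
    m = NUMBER.fullmatch(text)
    if m is None:
        raise ValueError(f"not a number: {text!r}")
    whole, numerator, denominator, integer = m.groups()
    if integer is not None:
        return Fraction(int(integer))
    # A zero denominator such as "1/0" matches the grammar but makes
    # Fraction raise ZeroDivisionError; this sketch does not handle that.
    return Fraction(int(whole or 0)) + Fraction(int(numerator), int(denominator))

for sample in ["2", "3/4", "1 1/2"]:
    print(sample, "->", parse_number(sample))
```

Using `Fraction` keeps recipe quantities such as "1 1/2" exact, which would matter if quantities were later scaled.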

  9. Recipe Book Markup Language
     unit -> <unit> pcdata </unit>
     fooditem -> <fooditem> pcdata </fooditem>
     blurb -> <blurb> pcdata </blurb>
     effort -> <effort> pcdata </effort>
     genre -> <genre> pcdata </genre>

  10. Recipe Book Markup Language
      preptime -> <preptime> pcdata </preptime>
      source -> <source> (pcdata | hyperlink)* </source>
      yield -> <yield> pcdata </yield>
      hyperlink -> pcdataurl

  11. Recursive Descent Parsing
      • Match required (literal) symbols
      • Call a procedure to match each variable
        • That procedure may itself call similar procedures
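As a rough sketch of the idea, the parser below handles only the ingredient production from the RBML grammar. Several simplifications are assumptions made for illustration: the `tokenize` helper is a hypothetical stand-in for a real lexer, tags carry no attributes, and the contents of quantity are treated as plain pcdata rather than parsed as a number.

```python
import re

def tokenize(text):
    """Hypothetical minimal tokenizer: each tag or run of text is one token."""
    return [t for t in re.findall(r"</?\w+>|[^<]+", text) if t.strip()]

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, literal):
        """Match a required (literal) symbol or fail."""
        if self.peek() != literal:
            raise SyntaxError(f"expected {literal!r}, found {self.peek()!r}")
        self.pos += 1

    def pcdata(self):
        token = self.peek()
        if token is None or token.startswith("<"):
            raise SyntaxError(f"expected text, found {token!r}")
        self.pos += 1
        return token.strip()

    def simple(self, name):
        """name -> <name> pcdata </name>  (unit, fooditem; quantity simplified)."""
        self.match(f"<{name}>")
        text = self.pcdata()
        self.match(f"</{name}>")
        return (name, text)

    def ingredient(self):
        """ingredient -> <ingredient> (pcdata | quantity | unit | fooditem)* </ingredient>"""
        self.match("<ingredient>")
        parts = []
        while self.peek() != "</ingredient>":
            token = self.peek()
            if token in ("<quantity>", "<unit>", "<fooditem>"):
                parts.append(self.simple(token[1:-1]))  # call the variable's procedure
            else:
                parts.append(("pcdata", self.pcdata()))
        self.match("</ingredient>")
        return parts

sample = ("<ingredient><quantity>2</quantity> <unit>cups</unit> "
          "<fooditem>flour</fooditem>, sifted</ingredient>")
print(Parser(tokenize(sample)).ingredient())
```

Each grammar variable gets its own method, and the `*` repetition becomes a loop that peeks at the next token to decide which procedure to call. One token of lookahead is enough here because every alternative starts with a distinct tag.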

  12. Lexical Analysis
      • Helps prepare for parsing
      • Uses regular expressions to
        • Organize input into multi-symbol chunks (tokens)
        • Each chunk has a meaning for the parser
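A lexer along these lines might look like the following sketch. The token kinds `OPEN`, `CLOSE`, and `PCDATA`, the `Token` type, and the `lex` function are all hypothetical names chosen for this example, and the sketch assumes tags have no attributes, comments, or entity references.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    kind: str    # "OPEN", "CLOSE", or "PCDATA"
    value: str   # tag name, or the text itself

# One named group per token kind; exactly one of them matches at a time.
TOKEN_RE = re.compile(r"<(?P<OPEN>\w+)>|</(?P<CLOSE>\w+)>|(?P<PCDATA>[^<]+)")

def lex(text):
    """Split RBML-like text into a list of meaningful chunks for a parser."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        value = m.group(kind)
        if kind == "PCDATA" and not value.strip():
            continue  # skip whitespace-only text between tags
        tokens.append(Token(kind, value.strip()))
    return tokens

for token in lex("<title>Soups</title>"):
    print(token)
```

The parser then works with whole tokens such as `Token("OPEN", "title")` rather than individual characters, which keeps the parsing procedures short.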
