Finite Model Theory Lecture 12: Regular Expressions, FO^k
Outline • The paper • FO^k
Background • XML = unranked trees • XML Schema = wants to be a regular language of unranked trees • Several official proposals: • DTD, XSchema - baroque and bad • Several counter proposals • Relax NG • Often arbitrary restrictions are imposed on the RE’s, claiming “efficiency”
The Problems • Containment: E1 ⊆ E2 • Equivalence: E1 = E2 • Intersection: E1 ∩ E2 = ∅ • In class: What is their complexity for all regular expressions?
Complexities • E1 ⊆ E2: PSPACE-complete • E1 = E2: PSPACE-complete • E1 ∩ E2 = ∅: PTIME
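The PTIME bound for intersection can be illustrated with the standard product construction: translate each expression into an NFA of linear size, then search the product automaton for a reachable pair of accepting states. The sketch below assumes the NFAs are already given, without epsilon moves, as triples (initial states, accepting states, transition dictionary); that representation and the three example automata are illustrative choices, not something fixed by the lecture.

```python
from collections import deque

def intersection_is_empty(nfa1, nfa2):
    """Decide L(nfa1) ∩ L(nfa2) = ∅ by searching the product automaton.

    Each NFA is (initial_states, accepting_states, delta), where delta maps
    (state, letter) to a set of successor states. No epsilon moves (assumption).
    """
    init1, acc1, delta1 = nfa1
    init2, acc2, delta2 = nfa2

    def by_state(delta):
        # Re-index transitions as state -> {letter -> successors}.
        out = {}
        for (q, a), targets in delta.items():
            out.setdefault(q, {})[a] = targets
        return out

    out1, out2 = by_state(delta1), by_state(delta2)
    start = {(p, q) for p in init1 for q in init2}
    seen, queue = set(start), deque(start)
    while queue:
        p, q = queue.popleft()
        if p in acc1 and q in acc2:
            return False  # some word is accepted by both automata
        moves1, moves2 = out1.get(p, {}), out2.get(q, {})
        for a in moves1.keys() & moves2.keys():
            for p2 in moves1[a]:
                for q2 in moves2[a]:
                    if (p2, q2) not in seen:
                        seen.add((p2, q2))
                        queue.append((p2, q2))
    return True

# Hand-built example automata (hypothetical): a*b, ab*, and bb*.
A_STAR_B = ({0}, {1}, {(0, "a"): {0}, (0, "b"): {1}})
A_B_STAR = ({0}, {1}, {(0, "a"): {1}, (1, "b"): {1}})
B_PLUS = ({0}, {1}, {(0, "b"): {1}, (1, "b"): {1}})
```

The search visits at most |Q1| · |Q2| state pairs, which is where the polynomial bound comes from. Containment does not get the same treatment because it requires complementing E2, and complementing an NFA can blow up exponentially.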
Restricted REs • Paper claims (and is right) that in practice the DTDs or XSchema use “simple” regular expressions • What is “simple” ? Open to debate, but paper makes the following proposals
Restricted RE’s • Symbol s = a letter or a word; notation: a or w • Possibly followed by ? or *; notation: a?, a*, w?, w* • Factor f = s | s | … | s; notation: s or +s (e.g. (+w*) or (+a*)? or (w*)?) • Possibly followed by ? or * • Simple RE = f.f….f • Notation: RE(f1, f2, …, fk), where f1, …, fk are the kinds of factors allowed
Simple RE Examples • RE(a,a*): • Name.Address.Phone*.Email • RE(a,a?,a*): • Name.Email?.Address?.Email*.Phone?.Email • RE((+a),a*): • Name.(Email | Phone).Address*.Email*
Containment • RE(a?, (+a)*) in PTIME [1] • RE(a, Σ, Σ*) in PTIME [17] • RE(a, a*) or RE(a, a?) coNP-hard • WOW! • Others in the paper
Regular Ranked Tree Languages Background: • Given two ranked-tree automata A1, A2, checking L(A1) ⊆ L(A2) is EXPTIME-complete • Note: if A2 is deterministic, then one can check containment in time O(|A1| · |A2|)
Regular Un-Ranked Tree Languages • DTDs and XML Schemas are unranked tree languages • No big deal: easy to encode unranked trees into ranked trees [show in class] • Still, lots of papers out there that re-invent regular languages for unranked trees
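One common way to do the encoding is the first-child / next-sibling representation, sketched below; the slides leave the construction for class, so this is one possible choice rather than necessarily the one shown there. An unranked tree is assumed to be a pair (label, list of children), and the ranked result is a binary tree (label, first_child, next_sibling), with None playing the role of a nullary "nil" symbol.

```python
def encode(forest):
    """Encode a list of unranked trees (label, children) as one binary tree.

    The left slot holds the encoding of the first tree's children,
    the right slot holds the encoding of its remaining siblings.
    """
    if not forest:
        return None
    (label, children), *siblings = forest
    return (label, encode(children), encode(siblings))

def decode(node):
    """Inverse of encode: rebuild the list of unranked trees."""
    if node is None:
        return []
    label, first_child, next_sibling = node
    return [(label, decode(first_child))] + decode(next_sibling)

# Hypothetical example document.
PERSON = ("person", [("name", []), ("email", []), ("email", [])])
ENCODED = encode([PERSON])
```

Because the encoding is a bijection between forests and binary trees, an automaton on the binary encodings can stand in for an automaton on the unranked originals.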
DTD’s • Given alphabet Σ • A DTD is a set of expressions σ := E, where E is a “regular expression” • Example:
  root := person*
  person := name,project?,email*,(address|contact)
  project := name, project*
• A tree T satisfies the DTD iff it is a derivation tree
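The "derivation tree" condition can be checked node by node: the sequence of child labels of every σ-node must match the expression E in the rule σ := E. The sketch below uses Python's `re` module for that match. The tree representation (label, children), the helper names, and the convention that a label without a rule must be a leaf are all assumptions made for illustration; real DTD syntax has more constructs than the slide's notation.

```python
import re

NAME = re.compile(r"[A-Za-z_]\w*")

def compile_content_model(expr):
    """Turn a rule body such as 'name,project?,email*' into a compiled regex
    over words in which every child label is followed by one space."""
    expr = "".join(expr.split())  # drop whitespace
    pattern = NAME.sub(lambda m: "(?:" + m.group(0) + " )", expr)
    return re.compile(pattern.replace(",", ""))  # ',' is plain concatenation

def satisfies(tree, dtd, root="root"):
    """Check that tree = (label, children) is a derivation tree of dtd."""
    models = {label: compile_content_model(e) for label, e in dtd.items()}

    def check(node):
        label, children = node
        word = "".join(child[0] + " " for child in children)
        model = models.get(label)
        if model is None:
            ok = not children  # assumption: labels without a rule are leaves
        else:
            ok = model.fullmatch(word) is not None
        return ok and all(check(child) for child in children)

    return tree[0] == root and check(tree)

# The DTD from the slide.
DTD = {
    "root": "person*",
    "person": "name,project?,email*,(address|contact)",
    "project": "name, project*",
}
GOOD = ("root", [("person", [("name", []), ("email", []), ("address", [])])])
BAD = ("root", [("person", [("email", []), ("name", [])])])
```

Each rule constrains only the immediate children of a node, which is what makes DTD membership a purely local test.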
DTDs • Strictly weaker than regular tree languages on unranked trees [why ?] • Lots of ways to extend them; most popular in the theory community: specialized DTDs
“Specialized” DTDs • Given two alphabets Σ, Σ’ • A specialized DTD is a set of expressions σ’ := E’, where E’ is a “regular expression”, and a mapping μ : Σ’ → Σ • Example:
  root := (person|project)*
  person := name1,phone
  project := name2,cost
  name1 := firstName, lastName
  name2 := internalName, publicName
Single Type SDTDs • One more restriction: if E’ contains two occurrences of σ ∈ Σ, then they have the same “type” • Formally: no regular expression E’ contains occurrences of two distinct symbols σ1’ ≠ σ2’ s.t. μ(σ1’) = μ(σ2’) • The XML Schema standard has such a requirement
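The single-type condition is a syntactic check on each rule body, as the sketch below illustrates. It assumes a specialized DTD given as a dictionary from types to rule bodies plus a dictionary for the mapping, with any type missing from the mapping standing for itself; these conventions and the function name are hypothetical.

```python
import re

def is_single_type(rules, mu):
    """Return True iff no rule body mentions two distinct types with the same label."""
    for expr in rules.values():
        type_for_label = {}
        for t in re.findall(r"[A-Za-z_]\w*", expr):
            label = mu.get(t, t)  # assumption: unmapped types map to themselves
            if type_for_label.setdefault(label, t) != t:
                return False      # two different types share one label
    return True

# The specialized DTD from the earlier slide; name1 and name2 are both
# assumed to map to the label "name".
SDTD = {
    "root": "(person|project)*",
    "person": "name1,phone",
    "project": "name2,cost",
    "name1": "firstName, lastName",
    "name2": "internalName, publicName",
}
MU = {"name1": "name", "name2": "name"}

# A variant that breaks the restriction: both types occur in one rule body.
NOT_SINGLE_TYPE = {"root": "(name1|name2)*"}
```

Under these assumptions the slide's example passes the check, since name1 and name2 never occur together in the same rule body.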
The Paper The main result is that the following have the same complexities: • Inclusion for a class R of RE’s • Inclusion for a class of DTD’s restricted to R • Inclusion for a class of single-type SDTDs restricted to R BUT not for SDTD’s over R
FO^k • Is FO restricted to only k variables: x1, …, xk • What can we express here? • Try this in FO^3: • There exists a path of length 10 from u to v [in class]
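One possible answer to the exercise, as a sketch: the trick is to requantify a variable once its old value is no longer needed. Here E is assumed to be the edge relation, the three variables are named x, y, z (with x and y playing the roles of u and v), and "path" is read as a walk, so vertices may repeat.

```latex
\begin{align*}
\varphi_1(x,y)     &:= E(x,y) \\
\varphi_{n+1}(x,y) &:= \exists z \,\bigl( E(x,z) \wedge \exists x \,( x = z \wedge \varphi_n(x,y) ) \bigr)
\end{align*}
```

The formula φ_10(x, y) then says that there is a path of length 10 from x to y, and it only ever mentions x, y, and z. Its size grows linearly with the path length, whereas the naive formula would need 9 distinct intermediate variables.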
Why we care: 1 The combined complexity of query evaluation: • Given A, φ, decide whether A ⊨ φ • What is the complexity of: {(A, φ) | A ∈ STRUCT[σ], φ ∈ FO} and {(A, φ) | A ∈ STRUCT[σ], φ ∈ FO^k}?
Why we care: 2 Satisfiability • Given φ, decide if ∃A s.t. A ⊨ φ • Undecidable for FO (Trakhtenbrot) • Decidable for FO^2 (WOW!) • Undecidable for FO^3 (Hmm…)
Why we care: 3 • Extensions of FO: • LFP, IFP, PFP, TC, you-name-it, … • All are expressible in infinitary FO, L_∞ω: • Allowed to take infinite conjunctions/disjunctions: ⋁_{i ∈ I} φ_i or ⋀_{i ∈ I} φ_i [why?] • But L_∞ω is boring… • All are expressible in ⋃_{k ≥ 0} L^k_∞ω = L^ω_∞ω