Research on Deterministic Regular Expressions in DTD/XSD. Feng Xiaoqiang (冯晓强). Advisor: Chen Haiming (陈海明)
DTD/XSD • Deterministic regular expressions • Deterministic expressions in real-world data • Research life in the lab
DTD/XSD Definition 1. A DTD is a pair (d, s) where d is a function that maps Σ-symbols to regular expressions over Σ, and s ∈ Σ is the start symbol. A tree satisfies the DTD if its root is labeled by s and, for every node u with label a, the sequence a_1 … a_n of labels of its children matches the regular expression d(a). The class of tree languages definable by DTDs is referred to as the local tree languages. A simple example is the following:
store → dvd dvd*
dvd → title price
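Definition 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: trees are encoded as (label, children) pairs, each content model d(a) is written as a Python regex over space-terminated child labels, and `title` and `price` are assumed to be leaves (the slide gives no rule for them). The names `DTD_RULES`, `START`, and `satisfies` are hypothetical.

```python
import re

# Assumed encoding of the example DTD: label -> content model, written as a
# regex over the child labels, each label followed by one space.
DTD_RULES = {
    "store": r"dvd (dvd )*",   # store -> dvd dvd*
    "dvd": r"title price ",    # dvd -> title price
    "title": r"",              # assumption: leaf, no children
    "price": r"",              # assumption: leaf, no children
}
START = "store"


def satisfies(tree, rules=DTD_RULES, start=START):
    """Return True if tree = (label, [children]) satisfies the DTD (rules, start)."""

    def check(node):
        label, children = node
        if label not in rules:
            return False
        # The word formed by the labels of the children, left to right.
        word = "".join(child[0] + " " for child in children)
        if re.fullmatch(rules[label], word) is None:
            return False
        return all(check(child) for child in children)

    # The root must carry the start symbol, and every node must match d(label).
    return tree[0] == start and check(tree)
```

For instance, a `store` node with one or more well-formed `dvd` children is accepted, while an empty `store` is rejected because d(store) requires at least one `dvd`.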
DTD/XSD Definition 2. A specialized DTD (SDTD) is a 4-tuple (Σ, Σ′, d, μ), where Σ′ is an alphabet of types, d is a DTD over Σ′, and μ is a mapping from Σ′ to Σ. Note that μ can be applied to a Σ′-tree as a relabeling of the nodes, thus yielding a Σ-tree. A Σ-tree t satisfies the SDTD if t can be written as μ(t′), where t′ satisfies the DTD d. As SDTDs are equivalent to unranked tree automata, the class of tree languages definable by SDTDs is the class of regular tree languages. For ease of exposition, we always take Σ′ = {a^i | 1 ≤ i ≤ k_a, a ∈ Σ, i ∈ ℕ} for some natural numbers k_a and set μ(a^i) = a.
DTD/XSD Definition 3. A single-type SDTD is an SDTD (Σ, Σ′, (d, s), μ) with the property that no regular expression d(a) has occurrences of types of the form b^i and b^j with the same b but different i and j. The class of tree languages definable by single-type SDTDs is the class of single-type tree languages. It lies strictly between the local and the regular tree languages. An example of a single-type grammar is given below:
store → regulars discounts
regulars → (dvd^1)*
discounts → (dvd^2)*
dvd^1 → title price
dvd^2 → title price discount
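The single-type restriction of Definition 3 is a purely syntactic check, which the following sketch illustrates. It assumes a hypothetical encoding in which a type b^i is the pair (b, i) and each rule lists the types occurring in its content model; the operators of the regular expression are irrelevant for this test. The two `dvd` types in the example are an assumed reading of the grammar on the slide, and `is_single_type` is an illustrative name.

```python
def is_single_type(rules):
    """rules maps a type (label, index) to the list of types occurring in its
    content model. Return True if no content model mentions two types with
    the same label but different indices."""
    for occurring in rules.values():
        index_of = {}
        for label, index in occurring:
            # setdefault stores the first index seen for this label and
            # returns the stored one on every later occurrence.
            if index_of.setdefault(label, index) != index:
                return False
    return True


# Assumed encoding of the store example: dvd^1 has no discount, dvd^2 has one.
STORE_RULES = {
    ("store", 1): [("regulars", 1), ("discounts", 1)],
    ("regulars", 1): [("dvd", 1)],
    ("discounts", 1): [("dvd", 2)],
    ("dvd", 1): [("title", 1), ("price", 1)],
    ("dvd", 2): [("title", 1), ("price", 1), ("discount", 1)],
}

# A grammar that mixes both dvd types under one parent is not single-type.
MIXED_RULES = {
    ("store", 1): [("dvd", 1), ("dvd", 2)],
    ("dvd", 1): [("title", 1), ("price", 1)],
    ("dvd", 2): [("title", 1), ("price", 1), ("discount", 1)],
}
```

Under this encoding the store grammar passes the check because the two `dvd` types never occur in the same content model, whereas `MIXED_RULES` fails it.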
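Deterministic regular expressions. Definition: a regular expression E is deterministic if and only if, for all marked words u, v, w and all marked symbols x, y, the following holds: uxv, uyw ∈ L(E′), x ≠ y → x′ ≠ y′. Here E′ is the marked version of E, in which every symbol occurrence carries a distinct subscript, and x′ is x with its subscript removed. Example: the expression a*a is not deterministic. Marked expression: a1*a2. Marked words: a1a2 and a2. Taking u empty, x = a1, and y = a2 gives a1 ≠ a2 but a1′ = a2′, so the definition is not satisfied.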
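One standard way to test this definition mechanically is through the first and follow sets of the marked expression (the Glushkov construction): the expression is deterministic exactly when neither the first set nor any follow set contains two distinct positions carrying the same symbol. The sketch below is an illustration, not the method used in the talk. It assumes expressions are given as nested tuples built with the hypothetical helpers `sym`, `cat`, `alt`, and `star`.

```python
from itertools import count


def sym(a):
    return ("sym", a)


def cat(left, right):
    return ("cat", left, right)


def alt(left, right):
    return ("alt", left, right)


def star(inner):
    return ("star", inner)


def is_deterministic(expr):
    """Return True if expr is a deterministic regular expression."""
    counter = count(1)
    letter = {}   # position (the marking subscript) -> unmarked symbol
    follow = {}   # position -> set of positions that may come directly after it

    def visit(e):
        """Return (nullable, first, last) of e and fill in follow."""
        kind = e[0]
        if kind == "sym":
            p = next(counter)          # mark this occurrence with a fresh index
            letter[p] = e[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind == "star":
            _, first, last = visit(e[1])
            for p in last:             # the body may repeat
                follow[p] |= first
            return True, first, last
        n1, f1, l1 = visit(e[1])
        n2, f2, l2 = visit(e[2])
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        for p in l1:                   # concatenation: left part flows into right
            follow[p] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last

    _, first, _ = visit(expr)

    def clash(positions):
        # Two distinct positions with the same unmarked symbol.
        symbols = [letter[p] for p in positions]
        return len(symbols) != len(set(symbols))

    return not clash(first) and not any(clash(s) for s in follow.values())
```

With this encoding, `cat(star(sym("a")), sym("a"))` stands for a*a and is rejected, because both marked occurrences of a appear in the first set. The equivalent expression aa*, written `cat(sym("a"), star(sym("a")))`, is accepted.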
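Deterministic expressions in real-world data. The W3C requires the regular expressions in DTD/XSD to be deterministic, but developers often overlook this requirement. An experiment can therefore check whether the regular expressions in DTDs/XSDs used in practice conform to the specification. Experiment plan: collect real-world DTDs/XSDs; extract the regular expressions; check whether each regular expression is deterministic; check some other properties…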
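The extraction step of the plan above can be sketched for DTDs as follows. This is a rough illustration under simplifying assumptions: it only scans `<!ELEMENT name model>` declarations and does not handle comments, parameter entities, or conditional sections, which a real crawl would have to deal with. The names `ELEMENT_DECL` and `extract_content_models` are hypothetical.

```python
import re

# Matches "<!ELEMENT name model>"; the model is everything up to the closing ">".
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([^\s>]+)\s+([^>]*?)\s*>")


def extract_content_models(dtd_text):
    """Return {element name: content model} for each element declaration,
    with runs of whitespace in the model collapsed to single spaces."""
    return {
        name: " ".join(model.split())
        for name, model in ELEMENT_DECL.findall(dtd_text)
    }


SAMPLE_DTD = """
<!ELEMENT store (dvd+)>
<!ELEMENT dvd (title,
               price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT price (#PCDATA)>
"""
```

Each extracted content model would then still have to be parsed into a regular expression before any determinism test can be applied to it.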
Research life in the lab • Hard work on research • A healthy body • A happy life
Questions ? Thanks!