Logical Foundation of the Semantic Web • Lecturer: Zhisheng Huang (黄智生), Vrije University Amsterdam, The Netherlands, huang@cs.vu.nl • Teaching assistant: Wei Hu (胡伟), Southeast University, whu@seu.edu.cn
Lecture 4: The Semantic Web and its Logics • Basic ideas of the Semantic Web • RDF/RDFS • The OWL language • OWL-DL and its relation to description logics
Can we do it better? • Semantics-based search • Concept combination specification • Domain-specific search • Approximate search • Search agents
The Semantic Web • Core idea: give information on the Web a well-defined meaning, i.e. semantics. • "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in co-operation." [Berners-Lee et al., 2001]
What the Semantic Web wants to do • Machine-processable: content can be processed automatically by machines. • Machine-understandable: content is machine-understandable if it is bound to some formal description of itself (i.e. metadata).
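Expectations on Web3.0 Taken literally, the name Web3.0 suggests three expected characteristics: • Novelty: it differs from the existing Web1.0 and Web2.0 technologies and offers a new generation of web service models (i.e. why it is not Web1.0 or Web2.0). • Achievability: it can be realized in the existing network environment with reasonable effort and faces no insurmountable technical obstacles (i.e. why it is not Web4.0 or higher). • Urgency: the web services it provides are urgently needed by society today, and introducing its technology can have a major social impact (i.e. why it can only be Web3.0).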
Web1.0, Web2.0, Web3.0 • Web1.0: Web of documents • Web2.0: Web of persons (social web) • Web3.0: Web of data (semantics)
Advantages of Linked Data: starting from an example
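Advantages of linked data: summary • Existing web pages are written for people to read and are hard for machines to process automatically; linked data is easy for machines to process. • A document link allows only one link on a given piece of text, whereas linked data supports multiple links on the same piece of text. • Document links cover only part of the text, whereas linked data guarantees that the full text is linked. • Keyword-based search engines such as Google appear to support full-text search, but they cannot distinguish the different meanings of the same word. This is especially troublesome for names of people and places, which repeat frequently, and polysemy is common in many application domains.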
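A unified conceptual format for linked data • The triple approach: <subject, predicate, object>. Example: <zhishengHuang, isStaffof, VrijeUnivAm> • Can describe web resources. Example: <http://wasp.cs.vu.nl/~huang, isStaffof, http://www.vu.nl> • Provides unique identifiers for meaning • Makes data content independent of its presentation form • Provides basic semantic reasoning capability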
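The triple format on this slide can be sketched in a few lines of Python. This is only an illustration: the `Triple` type and the `match` helper are hypothetical names rather than part of any RDF library, and `None` is assumed to act as a wildcard.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# The two example triples from the slide, kept in one set.
store = {
    Triple("zhishengHuang", "isStaffof", "VrijeUnivAm"),
    Triple("http://wasp.cs.vu.nl/~huang", "isStaffof", "http://www.vu.nl"),
}


def match(store, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None):
    """Return all triples that agree with the given fields (None = wildcard)."""
    return sorted(
        t for t in store
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.object == o)
    )


# Every triple that uses the isStaffof predicate.
staff = match(store, p="isStaffof")
```

Because every statement has the same three-part shape, one generic lookup function serves any vocabulary; this is the sense in which the data becomes independent of a particular application.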
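Why is reasoning support necessary? Example: from the facts that ZhishengHuang is an employee of the Vrije University and that the Vrije University is in Amsterdam, we can derive that ZhishengHuang works in Amsterdam. • Facts: <ZhishengHuang, isStaffof, VrijeUnivAm>, <VrijeUnivAm, inCity, Amsterdam> • Rule: <?x, isStaffof, ?y>, <?y, inCity, ?z> -> <?x, worksin, ?z> • Conclusion: <ZhishengHuang, worksin, Amsterdam>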
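The inference on this slide can be reproduced with a minimal sketch. It assumes the same identifier, `VrijeUnivAm`, is used for the university in both facts so that the join on `?y` succeeds; the function name `apply_works_in_rule` is made up for illustration.

```python
# The two known facts, as (subject, predicate, object) tuples.
facts = {
    ("ZhishengHuang", "isStaffof", "VrijeUnivAm"),
    ("VrijeUnivAm", "inCity", "Amsterdam"),
}


def apply_works_in_rule(facts):
    """Apply <?x,isStaffof,?y>, <?y,inCity,?z> -> <?x,worksin,?z> once."""
    derived = set()
    for (x, p1, y) in facts:
        if p1 != "isStaffof":
            continue
        # Join on ?y: look for an inCity fact about the same organization.
        for (y2, p2, z) in facts:
            if p2 == "inCity" and y2 == y:
                derived.add((x, "worksin", z))
    return derived


inferred = apply_works_in_rule(facts)
print(inferred)
```

If the two facts named the university differently, the join would find nothing, which is one reason unique identifiers matter for reasoning.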
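The main ideas of the Semantic Web in five sentences: Why the Semantic Web? • Every information system needs data. • Data representation should be independent of specific applications and platforms, to maximize reusability. • A unified conceptual data representation keeps the data independent of any specific system (e.g. the Triple/Tuple form). • Data should be able to describe web resources (hence RDF/RDFS or similar languages). • Data should provide basic reasoning support (hence OWL or other knowledge representation languages). • (Note: RDF, RDFS and OWL all use the triple semantic model.)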
Trends • According to a May 2007 report by the US market research firm Gartner: "By 2012, 70% of public Web pages will have some level of semantic markup, 20% will use more extensive Semantic Web-based ontologies."
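Part of the massive body of semantic data. Ontologies and Metadata: Billion Triples dataset • Yahoo data • Southeast University data • University of Maryland • The Open University (UK) • SemWebBase (DERI) • Wikipedia • Geographic names • Publications • English semantic lexicon • Freebase • US government data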
A concrete linked-data example: http://sindice.com/apiv2/search?q=%22zhisheng%20huang%22&format=atom&page=1&qt=term
A concrete linked-data example: http://sindice.com/apiv2/search?q=%22zhisheng%20huang%22&format=atom&page=5&qt=term
More about the Semantic Web • See the tutorials on Saturday, 29 August • 09:00-12:00 Tutorial 1: Introduction to the Semantic Web (Ivan Herman)
HTML Markup
……
<h2>Zhisheng Huang</h2>
<b>Affiliation</b>: Department of Computer Science<br>
Faculty of Sciences<br>
Vrije University Amsterdam<p>
<b>Email</b>: huang @ cs.vu.nl<br>
<b>Phone</b>: 31-20-4447740(office)
……
</html>
XML Annotations
<researcher>
  <name>Zhisheng Huang</name>
  <affiliation>
    <department>Department of Computer Science</department>
    <faculty>Faculty of Sciences</faculty>
    <university>Vrije University Amsterdam</university>
  </affiliation>
  <email>huang @ cs.vu.nl</email>
  <phone id="office">(31)-20-4447740</phone>
  ……
</researcher>
</html>
Data Structures • Structured data: databases • Semi-structured data: HTML, XML, BibTeX • Unstructured data: text
XML representation of a relational database (the AI group)
<group name="AI">
  <member id="001">
    <name>John</name>
    <phone>1234567</phone>
  </member>
  <member id="002">
    <name>Mary</name>
    <phone>7654321</phone>
  </member>
  …..
</group>
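As a rough sketch of how such a mapping might be produced, the following uses Python's standard `xml.etree.ElementTree` to turn table rows into the XML shape shown on this slide and to read them back. The `rows` list and the `rows_to_xml` helper are illustrative assumptions, not part of the lecture.

```python
import xml.etree.ElementTree as ET

# Rows of a hypothetical "member" table for the AI group.
rows = [
    {"id": "001", "name": "John", "phone": "1234567"},
    {"id": "002", "name": "Mary", "phone": "7654321"},
]


def rows_to_xml(group_name, rows):
    """Serialize table rows as <group><member id=...>...</member></group>."""
    group = ET.Element("group", {"name": group_name})
    for row in rows:
        # The primary key becomes an attribute, the other columns child elements.
        member = ET.SubElement(group, "member", {"id": row["id"]})
        ET.SubElement(member, "name").text = row["name"]
        ET.SubElement(member, "phone").text = row["phone"]
    return group


group = rows_to_xml("AI", rows)
xml_text = ET.tostring(group, encoding="unicode")

# Reading the data back: map each member id to its phone number.
phones = {m.get("id"): m.findtext("phone") for m in ET.fromstring(xml_text)}
```

Putting the key in an attribute and the remaining columns in child elements mirrors the slide; other mappings are equally possible.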
Document Type Definition (DTD)
<!DOCTYPE researcher [
  <!ELEMENT researcher (name, affiliation, email, phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ATTLIST phone id CDATA #REQUIRED>
  <!ELEMENT affiliation (department, faculty, university)>
  …
]>
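Data Model [Diagram: entities Researcher, Name, Phone, eMail, Affiliation, Department, Faculty and University; Researcher and Affiliation are joined by a "has" relation with cardinality n to 1.]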
XML Schema • The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD.
Why XML Schemas • XML Schemas are extensible to future additions • XML Schemas are richer and more useful than DTDs • XML Schemas are written in XML • XML Schemas support data types • XML Schemas support namespaces
Name Conflicts • Since element names in XML are not fixed, a name conflict often occurs when two different documents use the same name to describe two different types of elements. • If these two XML documents were combined, there would be an element name conflict, because both documents contain an element with the same name but different content and definition.
XML Namespace • Namespaces are used to solve name conflicts. • Syntax: xmlns:prefix="namespaceURI" • Example: xmlns:xsd="http://www.w3.org/2001/XMLSchema"
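A small sketch of how namespaces separate two elements that share a local name, using Python's standard `xml.etree.ElementTree`. The document, the `http://example.org/...` URIs and the element contents are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define <title>; the namespace URIs are made up.
doc = """
<catalog xmlns:book="http://example.org/book"
         xmlns:person="http://example.org/person">
  <book:title>An Example Book</book:title>
  <person:title>Professor</person:title>
</catalog>
"""

root = ET.fromstring(doc.strip())

# Prefixes used in queries are local shorthands; only the URIs matter.
ns = {"b": "http://example.org/book", "p": "http://example.org/person"}

book_title = root.findtext("b:title", namespaces=ns)
person_title = root.findtext("p:title", namespaces=ns)
```

Note that the query prefixes `b` and `p` differ from the prefixes in the document: a parser compares the namespace URI, not the prefix, so the two `title` elements never collide.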
XML Schema
<xsd:element name="researcher">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="affiliation" type="affil" minOccurs="1" maxOccurs="unbounded"/>
      <xsd:element name="phone" type="xsd:string"/>
      <xsd:element name="email" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:complexType name="affil">
  <xsd:sequence>
    <xsd:element name="department" type="xsd:string"/>
    <xsd:element name="faculty" type="xsd:string"/>
    <xsd:element name="university" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
Resource Description Framework (RDF) • Triple: T(subject, attribute, value) • [Diagram: the resource http://wasp.cs.vu.nl/sekt/dig/dig.pdf has two Creator arcs, one to Zhisheng and one to Cees.] • Metadata is machine-understandable information about web resources or anything that has a URI. It is represented as a set of independent assertions:
<rdf:Description rdf:about="http://wasp.cs.vu.nl/sekt/dig/dig.pdf">
  <dc:creator rdf:resource="http://www.cs.vu.nl/~huang"/>
  <dc:creator rdf:resource="mailto:ctv@cs.vu.nl"/>
</rdf:Description>
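To make the link between the RDF/XML syntax and triples concrete, here is a minimal sketch that reads the two assertions with Python's standard `xml.etree.ElementTree`. The enclosing `rdf:RDF` element and the namespace declarations are added here as assumptions so the fragment parses on its own, and `simple_triples` is a hypothetical helper that covers only this flat pattern; it is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's description, wrapped in rdf:RDF with namespace declarations.
rdf_xml = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://wasp.cs.vu.nl/sekt/dig/dig.pdf">
    <dc:creator rdf:resource="http://www.cs.vu.nl/~huang"/>
    <dc:creator rdf:resource="mailto:ctv@cs.vu.nl"/>
  </rdf:Description>
</rdf:RDF>
"""


def simple_triples(text):
    """Extract (subject, predicate, object) from flat rdf:Description blocks.

    Handles only rdf:about subjects and rdf:resource objects.
    """
    triples = []
    root = ET.fromstring(text.strip())
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; dropping the
            # braces yields the full property URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            triples.append((subject, predicate, prop.get(f"{{{RDF}}}resource")))
    return triples


triples = simple_triples(rdf_xml)
```

Each child element of the description becomes one independent assertion about the same subject, matching the two Creator arcs of the graph.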
RDF: Dublin Core • The Dublin Core is a set of predefined properties for describing documents. • It provides properties for describing network objects, suitable for use by network search engines. • The first Dublin Core properties were defined at the Metadata Workshop in Dublin, Ohio in 1995 and are currently maintained by the Dublin Core Metadata Initiative.
Dublin Core Metadata Initiative • The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. • http://dublincore.org/
Annotating Metadata
<rdf:Description rdf:about=…dc-rdf/">
  <dc:title> Guidance on expressing the Dublin Core within the Resource Description Framework (RDF) </dc:title>
  <dc:creator> Eric Miller </dc:creator>
  <dc:creator> Paul Miller </dc:creator>
  <dc:creator> Dan Brickley </dc:creator>
  <dc:subject> Dublin Core; RDF; XML </dc:subject>
  <dc:publisher> Dublin Core Metadata Initiative </dc:publisher>
  <dc:contributor> Dublin Core Data Model Working Group </dc:contributor>
  <dc:date> 1999-07-01 </dc:date>
  <dc:format> text/html </dc:format>
  <dc:language> en </dc:language>
</rdf:Description>
RDF Schema (RDFS) • RDFS defines vocabulary for RDF • Organizes this vocabulary in a typed hierarchy • Class, subClassOf, type • Property, subPropertyOf • domain, range
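RDFS example [Diagram: PhDStudent and Professor are each subClassOf Person; the property hasSuperVisor has domain PhDStudent and range Professor; Hu,W is of type PhDStudent and Prof. Qu is of type Professor.]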
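The RDFS vocabulary in this example supports simple type inference. The sketch below assumes the diagram is read as follows: PhDStudent and Professor are subclasses of Person, and hasSuperVisor has domain PhDStudent and range Professor. The single data triple linking Hu,W to Prof. Qu and the helper `rdfs_types` are assumptions made for illustration; only the domain, range and subClassOf rules are covered.

```python
# Schema-level statements (assumed reading of the diagram).
schema = {
    ("PhDStudent", "subClassOf", "Person"),
    ("Professor", "subClassOf", "Person"),
    ("hasSuperVisor", "domain", "PhDStudent"),
    ("hasSuperVisor", "range", "Professor"),
}

# One instance-level statement (assumed for the example).
data = {("Hu,W", "hasSuperVisor", "Prof. Qu")}


def rdfs_types(schema, data):
    """Derive (instance, class) pairs via domain, range and subClassOf."""
    types = set()
    for (s, p, o) in data:
        for (prop, kind, cls) in schema:
            if prop == p and kind == "domain":
                types.add((s, cls))   # the subject belongs to the domain class
            elif prop == p and kind == "range":
                types.add((o, cls))   # the object belongs to the range class
    # Propagate types up the class hierarchy until nothing new appears.
    changed = True
    while changed:
        changed = False
        for (inst, cls) in list(types):
            for (sub, kind, sup) in schema:
                if kind == "subClassOf" and sub == cls and (inst, sup) not in types:
                    types.add((inst, sup))
                    changed = True
    return types


types = rdfs_types(schema, data)
```

From one hasSuperVisor statement the sketch derives the types that the diagram states explicitly, and additionally that both individuals are Persons.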
Using A Blank Node • Here the blank node stands for the concept of "John Smith's address".
Blank Node Identifiers • In a triple-based notation, a blank node needs an identifier so that triples can refer to it. • Blank node identifiers have the form _:name, for example:
exstaff:85740 exterms:address _:johnaddress .
_:johnaddress exterms:street "1501 Grant Avenue" .
_:johnaddress exterms:city "Bedford" .
_:johnaddress exterms:state "Massachusetts" .
_:johnaddress exterms:zip "01730" .
• If a node in a graph needs to be referenced from outside the graph, a URIref is required. • Blank nodes break an n-ary relationship (between John and the street, city, etc.) into binary ones.
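The way a blank node ties an n-ary relationship together can be sketched with plain Python tuples. The triples are those from this slide; the helper `structured_value` is a hypothetical name, and prefixed names are kept as plain strings rather than expanded to full URIs.

```python
# The address triples, with _:johnaddress as the blank node identifier.
triples = [
    ("exstaff:85740", "exterms:address", "_:johnaddress"),
    ("_:johnaddress", "exterms:street", "1501 Grant Avenue"),
    ("_:johnaddress", "exterms:city", "Bedford"),
    ("_:johnaddress", "exterms:state", "Massachusetts"),
    ("_:johnaddress", "exterms:zip", "01730"),
]


def structured_value(triples, subject, predicate):
    """Follow subject --predicate--> node and collect that node's properties."""
    node = next(o for (s, p, o) in triples if s == subject and p == predicate)
    return {p: o for (s, p, o) in triples if s == node}


# Reassemble John's address from the binary statements.
address = structured_value(triples, "exstaff:85740", "exterms:address")
```

The blank node identifier is meaningful only inside this set of triples, which is why a node that must be referenced from elsewhere needs a URIref instead.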
4. Other RDF Capabilities • Containers • Collections • Reification • Structured Values