Semantic Web Technologies: A Tutorial. Li Ding, University of Maryland Baltimore County. Joint work with Deborah McGuinness, Tim Finin and Anupam Joshi. Presented at Kodak Research Laboratories, Rochester, New York, 18 July 2006.
The Web has made people smarter
• Surfing: WWW, craigslist
• Search: bag-of-words
• Tagging: del.icio.us
But what about machines?
• Machines still have only a very minimal understanding of text and images.
• Figure labels: tell, register
Motivation: machine-friendly data
• Natural language
• XML: represents structure
• Semantic Web: represents more semantics
  • represents structure
  • enables a common vocabulary
  • associates symbols with a logical interpretation for inference
• Example (natural language): "Li Ding is a person" as seen by a person; "LiDingisasaon" as seen by a machine
• Example (XML): <person>Li Ding</person> as seen by a person; <on>LiDing</on> as seen by a machine
Semantic Web Layers
• Figure: the W3C Semantic Web layer stack, annotated with "Semantic Aspect" and "Web Aspect" (HTTP)
• "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." (Berners-Lee, Hendler & Lassila, Scientific American, 2001)
• Image source: http://en.wikipedia.org/wiki/Image:W3c_semantic_web_stack.jpg
The Semantic Web is simple
• Each URI denotes a concept
• URIs are connected by triples
• Machines read data as a directed RDF graph
• Don't say "colour"; say <http://example.com/2002/std6#col>
• Figure labels: RDF (Resource Description Framework), Relational database
• Source: Tim Berners-Lee, Putting the Web back into Semantic Web, ISWC2005 Keynote
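To make "URIs connected by triples" concrete, here is a minimal Python sketch that stores triples as tuples and indexes them as a directed, labeled graph. The `http://example.com/people#` URIs and the helper names `build_graph` and `objects` are assumptions made for illustration; they do not come from the talk.

```python
from collections import defaultdict

# Hypothetical URIs, used only for illustration.
EX = "http://example.com/people#"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each triple is (subject, predicate, object): one directed, labeled edge.
triples = [
    (EX + "li", RDF_TYPE, FOAF + "Person"),
    (EX + "li", FOAF + "name", '"Li Ding"'),
    (EX + "li", FOAF + "knows", EX + "tim"),
]

def build_graph(triples):
    """Index triples as adjacency lists: subject -> [(predicate, object)]."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph

def objects(graph, subject, predicate):
    """Follow every edge labeled `predicate` that leaves `subject`."""
    return [o for p, o in graph.get(subject, []) if p == predicate]

graph = build_graph(triples)
print(objects(graph, EX + "li", FOAF + "knows"))
```

Because every node and edge label is a URI, two graphs built this way can be merged by concatenating their triple lists.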
Example: RDF graph and syntax
• RDF graph: built from URIs, literals, and blank nodes (BNodes), connected by triples
• Figure: a blank node with http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://xmlns.com/foaf/0.1/Person and http://xmlns.com/foaf/0.1/name "Li Ding" (triples t1 and t2)
• The entire graph means: there exists a person whose name is "Li Ding".
• Data encoded in RDF/XML syntax:
    <?xml version="1.0" encoding="utf-8"?>
    <rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/"
             xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <foaf:Person>
        <foaf:name>Li Ding</foaf:name>
      </foaf:Person>
    </rdf:RDF>
• XML features used: Unicode, namespaces, URIs as tags
• Alternative RDF syntax languages: N3 (Notation 3), N-Triples, Turtle
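As a rough illustration of one alternative syntax, the sketch below writes the same two-triple graph in N-Triples using plain Python. The tagged-tuple term representation, the helper names `term` and `to_ntriples`, and the blank-node label `b0` are assumptions of this sketch, not part of the talk; only basic literal escaping is handled.

```python
def term(value):
    """Render one RDF term, given as a (kind, text) pair, in N-Triples syntax."""
    kind, text = value
    if kind == "uri":
        return "<" + text + ">"
    if kind == "bnode":
        return "_:" + text
    if kind == "literal":
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    raise ValueError("unknown term kind: " + kind)

def to_ntriples(triples):
    """Emit one 'subject predicate object .' line per triple."""
    return "\n".join(" ".join(term(t) for t in triple) + " ." for triple in triples)

person = ("bnode", "b0")  # the label b0 is arbitrary
graph = [
    (person,
     ("uri", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
     ("uri", "http://xmlns.com/foaf/0.1/Person")),
    (person,
     ("uri", "http://xmlns.com/foaf/0.1/name"),
     ("literal", "Li Ding")),
]
print(to_ntriples(graph))
```

Each output line is a complete statement, which is why N-Triples is convenient for line-oriented processing.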
Example: Surfing RDF graphs
• G1: http://cs.umbc.edu/~dingli1/foaf.rdf (instance data). Node http://cs.umbc.edu/~dingli1/foaf.rdf#dingli; figure labels: foaf:name "Li Ding", rdf:type foaf:Person, foaf:knows, foaf:mbox mailto:finin@umbc.edu, rdfs:seeAlso http://cs.umbc.edu/~finin/foaf.rdf
• G2: http://cs.umbc.edu/~finin/foaf.rdf (instance data). Figure labels: foaf:mbox mailto:finin@umbc.edu, foaf:firstName "Tim", foaf:surname "Finin"
• G3: http://xmlns.com/foaf/0.1/ (vocabulary definitions). Figure labels: foaf:Person, wordNet:Agent, rdfs:subClassOf, rdf:type rdfs:Class, foaf:mbox, rdfs:domain, rdf:type rdf:Property
• Surf to a definition: from a term used in G1 to its definition in G3
• Surf to another instance: from G1 to G2
• Prefixes: rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#, rdfs: http://www.w3.org/2000/01/rdf-schema#, foaf: http://xmlns.com/foaf/0.1/
Example: Serving humans and machines
• The original RDF/XML is for machines
• The HTML is generated by applying XSLT to the RDF/XML
Ontology Spectrum
• From simple taxonomies to expressive ontologies: Catalog/ID; Terms/glossary; Thesauri ("narrower term" relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value restriction; Disjointness, inverse, part-of; General logical constraints
• Systems and languages placed on the spectrum: CYC, DB Schema, UMLS, RDF, RDFS, DAML, WordNet, OO, OWL, IEEE SUO
• The figure also marks a "space of interest"
• Source: Originally by Deborah L. McGuinness (KSL, Stanford), modified by Tim Finin
Ontology Languages: RDFS and OWL
• RDFS
  • Set theory: rdfs:Class
  • Relation: rdf:Property, rdfs:domain, rdfs:range
  • Hierarchy: rdfs:subClassOf, rdfs:subPropertyOf
  • Built-in datatypes: xsd:string, xsd:dateTime
• OWL
  • Description Logic
    • Class, Thing, Nothing
    • DatatypeProperty, ObjectProperty, AnnotationProperty, …
  • Class axioms
    • oneOf, disjointWith, unionOf, complementOf, intersectionOf, …
    • Restriction, onProperty, cardinality, hasValue, …
  • Property axioms
    • inverseOf, TransitiveProperty, SymmetricProperty
    • FunctionalProperty, InverseFunctionalProperty
  • Equality: equivalentClass, sameAs, differentFrom, …
  • Ontology annotation: Ontology, imports, versionInfo
Example: Inference using ontologies
• Ontology languages (RDFS, OWL) have formal foundations that allow us to infer additional (implicit) statements
  • RDFS provides basic ones, e.g. sub-class, sub-property, domain
  • OWL adds many more axioms, e.g. inverse-property, equality
• SWRL (Semantic Web Rule Language) enables a general-purpose solution
  • Supports rule representation
  • But also requires inference support beyond RDFS and OWL
• Axioms in the figure: hasBrother rdfs:subPropertyOf hasSibling; hasChild owl:inverseOf hasParent
• SWRL rule: (x hasParent y) ∧ (y hasBrother z) => (x hasUncle z)
• Figure: individuals #Joe, #Louise and #Deborah linked by hasSibling, hasParent, hasBrother, hasChild and hasUncle
• Source: Semantic Web tutorial (AAAI 2005) by Deborah L. McGuinness
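The three kinds of inference on this slide can be sketched as a tiny forward-chaining loop in Python. This is an illustrative sketch, not how an RDFS/OWL/SWRL reasoner is implemented: the rules are hard-coded, and the two starting facts (which individual is whose child or brother) are an assumption, since the slide's figure does not survive in this transcript.

```python
def infer(facts):
    """Apply the slide's three rules repeatedly until no new triple appears."""
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            # rdfs:subPropertyOf: hasBrother is a sub-property of hasSibling
            if p == "hasBrother":
                new.add((s, "hasSibling", o))
            # owl:inverseOf: hasChild and hasParent are inverses of each other
            if p == "hasChild":
                new.add((o, "hasParent", s))
            if p == "hasParent":
                new.add((o, "hasChild", s))
                # SWRL-style rule: (x hasParent y) and (y hasBrother z) => (x hasUncle z)
                for s2, p2, o2 in facts:
                    if s2 == o and p2 == "hasBrother":
                        new.add((s, "hasUncle", o2))
        if new <= facts:      # fixpoint reached: nothing new was derived
            return facts
        facts |= new

# Assumed starting facts (hypothetical; chosen so that every rule fires).
asserted = {
    ("Louise", "hasChild", "Deborah"),
    ("Louise", "hasBrother", "Joe"),
}
closure = infer(asserted)
print(sorted(closure - asserted))
```

The uncle statement only appears on the second pass, after the inverse-property rule has produced the hasParent triple, which is why the loop runs to a fixpoint rather than once.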
More languages and more ontologies
• Languages (require a special inference engine)
  • [Trust/Uncertainty] BayesOWL
  • [Proof] PML (Proof Markup Language)
  • [Query/Data Access] SPARQL Query Language for RDF
  • [Rule] SWRL (Semantic Web Rule Language)
  • [Policy] Rei: A Policy Specification Language
  • [Service] OWL-S by DAML (1.2 preview available)
  • [Service] SAWSDL (Semantic Annotations for WSDL)
  • [Thesauri] SKOS (Simple Knowledge Organization System)
• Ontologies (only need RDFS and/or OWL inference)
  • Upper ontologies: OpenCyc, WordNet, OntoSem, SUO
  • Specialized common ontologies: FOAF, Dublin Core, RSS
  • Domain ontologies: bibtex, biology, and many more
• Reference: Li Ding, Pranam Kolari, Zhongli Ding, and Sasikanth Avancha, "Using Ontologies in the Semantic Web: A Survey", in Ontologies in the Context of Information Systems (book chapter), 2005. http://ebiquity.umbc.edu/paper/html/id/257/
Semantic Web Tools
• Reasoner: Pellet (DL), Racer (DL), FaCT++ (DL), Jena, JTP, F-OWL, Euler, CWM
• Editor: Protégé, Swoop
• Online Registry: DAML Ontology Library, Schema Web
• Triple store: Jena (SPARQL), KAON, Kowari, Sesame, OWLIM, 3store, Instance store, Redland, Tap, RDF store, Yars, IBM IODT, RDFLib, RDF gateway, allegro, Oracle 10
• Search Engine: Swoogle, Semantic Web Search
• Browser: Tabulator, IsaViz, Piggybank, Arago, Horus, Mspace, Magpie
• Mapping Tools: ONION, PROMPT, OntoMapper, Glue, OntoMerge, Ontomorph
• Figure labels (activities around "Managing Ontologies" and "instance"): create, publish, inference, browse, update, extend, integrate
• source1: http://ebiquity.umbc.edu/paper/html/id/257/Using-Ontologies-in-the-Semantic-Web-A-Survey
• source2: http://www.wiwiss.fu-berlin.de/suhl/bizer/toolkits/
Semantic Web data sources
• Text editor: I write RDF/XML manually.
• Semantic Web editors: Protégé, Swoop
• Information extraction (consumer side)
  • NLP (hard), e.g. SemNews
  • Heuristic scraping (regular expressions), e.g. Semagix Freedom
• Wrapped database content (publisher side)
  • Blogs and social network websites, e.g. livejournal.com
  • Academic interests: http://www.mindswap.org/, http://ebiquity.umbc.edu
• Generated by software
  • Creative Commons license embedded in HTML
  • Embedded metadata in JPEG, PDF (XMP)
  • Agent communication messages
  • …
The Scale of the Semantic Web
• Statistics based on Semantic Web data indexed by Swoogle
• Estimated number of documents based on Google queries
Where the data comes from
• "com" has contributed the largest portion of websites (71%) and pure SWDs (39%), because industry has adopted virtual hosting technology as well as ontologies such as RSS and FOAF
• Most SWOs are from "org" (46%, e.g. www.w3.org) and "edu" (14%, e.g. spire.umbc.edu), because of the deep interest in developing ontologies among academic and non-profit organizations
• SWD: Semantic Web document; SWO: Semantic Web ontology; pure SWD: not embedded
• Note: top-level-domain statistics are also used in characterizing the Web (Henzinger and Lawrence 2004)
Source websites of SWDs
• Figure: two measurement periods, Jan 2005 - Aug 2005 and Jan 2005 - Mar 2006
• Invariant found! The number of websites hosting more than m SWDs follows a power-law distribution
• Similar to the Web
• Head: virtual hosting
• Tail: crawling strategy
Size of SWDs
• Embedded SWDs are small
  • 69% have 3 triples
  • 96% have <10 triples
• Pure SWDs
  • 60% have 5 to 1000 triples
  • Special size of RSS: 130
    • 17 triples for channel
    • 7 triples for each of the 15 items
• SWOs
  • Biased by PML
  • Small ones from RDF tests
  • Largest is 1M
• Figure: number of SWDs and number of SWOs versus # of triples
Age of SWD • Measured by the last-modified time of SWD • PSWD: Exponential distribution • SWO: flat tail -- ontology development interests decrease?
How are Semantic Web terms used?
• All usage distributions follow a power-law distribution
• Few SWTs (Semantic Web terms) have been well populated
  • 371 have >100 class instances
  • 1208 have >100 property instances
Swoogle Rank (citation based)
• http://www.w3.org/2000/01/rdf-schema: indegree=432,984, mean(inflow)=0.039
• http://www.w3.org/1999/02/22-rdf-syntax-ns: indegree=1,077,768, mean(inflow)=0.100
• http://purl.org/rss/1.0: indegree=86,959, mean(inflow)=0.069
• http://www.w3.org/2002/07/owl: indegree=270,178, mean(inflow)=0.168
• http://web.resource.org/cc: indegree=57,066, mean(inflow)=0.195
• http://www.w3.org/2001/vcard-rdf/3.0: indegree=155,949, mean(inflow)=0.036
• http://purl.org/dc/elements/1.1: indegree=861,416, mean(inflow)=0.096
• http://www.hackcraft.net/bookrdf/vocab/0_1/: indegree=16,380, mean(inflow)=0.167
• http://purl.org/dc/terms: indegree=54,909, mean(inflow)=0.042
• http://xmlns.com/foaf/0.1/index.rdf: indegree=512,790, mean(inflow)=0.217
• Figure: citation graph among these ten namespaces, with rank labels 1 to 10 and weighted edges (not recoverable from this transcript)
• Computed using Swoogle metadata as of May 2006
TAGA: Travel Agent Game in Agentcities
• Motivation
  • Market dynamics
  • Auction theory (TAC)
  • Semantic web
  • Agent collaboration (FIPA & Agentcities)
• Features
  • Open Market Framework
  • Auction Services
  • OWL message content
  • OWL Ontologies
  • Global Agent Community
• Technologies
  • FIPA (JADE, April Agent Platform)
  • Semantic Web (RDF, OWL)
  • Web (SOAP, WSDL, DAML-S)
  • Internet (Java Web Start)
• Ontologies (http://taga.umbc.edu/ontologies/)
  • travel.owl: travel concepts
  • fipaowl.owl: FIPA content language
  • auction.owl: auction services
  • tagaql.owl: query language
• Uses of OWL: representation and reasoning; protocol description; content language; service descriptions
• FIPA platform infrastructure services, including directory facilitators enhanced to use OWL-S for service discovery
• Figure labels, agents: Market Oversight Agent, Bulletin Board Agent, Auction Service Agent, Customer Agent, Travel Agents, Web Service Agents
• Figure labels, messages: Report Direct Buy Transactions, Report Contract, Report Auction Transactions, Request, CFP, Report Travel Package, Bid, Proposal, Direct Buy
• http://taga.umbc.edu (offline now)
Semantic Content Publishing
• Data is stored in a database
• PHP generates both HTML and OWL
• HTML pages link to the corresponding OWL
• No more web scraping
• Example: http://ebiquity.umbc.edu/person/html/Li/Ding/ (HTML) and http://ebiquity.umbc.edu/person/foaf/Li/Ding/foaf.rdf (FOAF), both generated by PHP from a MySQL database
• http://ebiquity.umbc.edu/ is the ebiquity group website
Rei Policy Language
• Rei is a declarative policy language for describing policies over actions
• Reasons over domain-dependent information
• Currently represented in OWL + logical variables
• Based on deontic concepts
  • Permission, Prohibition, Obligation, Dispensation
• Models speech acts
  • Delegation, Revocation, Request, Cancel
• Meta policies
  • Priority, modality preference
• Policy engineering tools
  • Reasoner, IDE for Rei policies in Eclipse
• http://rei.umbc.edu/
Example: enforcing a privacy policy
• The speaker doesn't want others to know the specific room that he's in, but is willing to let others know he's on campus
• He defines the following privacy policy
  • Share my location with a granularity >= "State"
• The broker answers
  • isLocated(US) => Yes!
  • isLocated(Maryland) => Yes!
  • isLocated(UMBC) => Uncertain
  • isLocated(ITE-RM210) => Uncertain
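The broker's answers can be reproduced with a small Python sketch. It is not Rei syntax and not the broker's actual implementation; the level names, the place hierarchy and the function `is_located` are assumptions chosen to match the four answers on the slide, reading "granularity >= State" as "State level or coarser".

```python
# Granularity levels, coarsest first. The level names are assumptions.
LEVELS = ["Country", "State", "Campus", "Room"]

# Hypothetical place hierarchy: place -> (level, parent place).
PLACES = {
    "US": ("Country", None),
    "Maryland": ("State", "US"),
    "UMBC": ("Campus", "Maryland"),
    "ITE-RM210": ("Room", "UMBC"),
}

def ancestors_and_self(place):
    """Yield the place itself, then each enclosing place up to the root."""
    while place is not None:
        yield place
        place = PLACES[place][1]

def is_located(actual, asked, finest_shareable="State"):
    """Answer isLocated(asked) for a user who is actually at `actual`."""
    level = PLACES[asked][0]
    if LEVELS.index(level) > LEVELS.index(finest_shareable):
        return "Uncertain"  # finer than the policy allows: reveal nothing
    return "Yes" if asked in ancestors_and_self(actual) else "No"

for place in ["US", "Maryland", "UMBC", "ITE-RM210"]:
    print("isLocated(%s) => %s" % (place, is_located("ITE-RM210", place)))
```

Answering "Uncertain" rather than "No" for finer-grained questions matters: a definite "No" would itself leak information about where the user is not.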
Cobra: Context Broker Architecture • Ontology • Agents • Service • Inference • Policy http://cobra.umbc.edu/
Web-scale semantic web data access
• Participants: an agent, a data access service, and the Web; the service indexes RDF data from the Web
• Agent: ask ("person")
• Service: search vocabulary (search URIrefs in the SW vocabulary), then inform ("foaf:Person")
• Agent: compose query, then ask ("?x rdf:type foaf:Person")
• Service: search URLs in the SWD index, then inform (doc URLs)
• Agent: fetch docs, populate the RDF database, query the local RDF database
Swoogle Semantic Web Search Engine
• Harvests Semantic Web data from the Web
• Provides search/navigation services for machines (via REST + RDF/XML)
  • Digest: doc, term, namespace
  • Links
• Also serves human users
• Status
  • Running since summer 2004
  • 1.6M RDF documents, 300M RDF triples, 10K ontologies
• http://swoogle.umbc.edu/
Ontology Dictionary
• From a web of documents to a web of data
• Aggregates definitions from multiple sources
• Inductively learned definitions
• Figure, Onto 1 and Onto 2 (labels): foaf:Person rdf:type owl:Class; foaf:Person rdfs:subClassOf foaf:Agent; foaf:name rdfs:domain foaf:Person; foaf:name rdfs:domain foaf:Agent
• Figure, SWD3 (labels): rdf:type foaf:Person; foaf:name "Tim Finin"; dc:title "Dr."
• Figure, learned from instances (labels): foaf:name wob:hasInstanceDomain foaf:Person; dc:title wob:hasInstanceDomain foaf:Person
• http://swoogle.umbc.edu/2005/modules.php?name=Ontology_Dictionary
Semantic Web Challenges - Winners
• 2003: CS AKTive Space (CAS) is an integrated Semantic Web application which provides a way to explore the UK Computer Science Research domain across multiple dimensions for multiple stakeholders, from funding agencies to individual researchers.
• 2004: Flink itself is also likely to be unique as a crossover between a social experiment and a semantic application.
• 2005: CONFOTO is a browsing and annotation service for conference photos.
• http://challenge.semanticweb.org/
Triple Shop: SPARQL dataset finder
• Example question: Who knows Anupam Joshi? Show me their names, email addresses and pictures
• 1. Compose a SPARQL query without a FROM clause
• 2. Parse the SPARQL query, search Swoogle for related URLs, and compose a dataset
• 3. Run the SPARQL query on the dataset
• http://sparql.cs.umbc.edu/tripleshop2/
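Steps 1 and 2 can be sketched in Python as follows. This is only an illustration of the idea: the SPARQL query is one plausible phrasing of the example question, `FAKE_INDEX` is a made-up in-memory stand-in for a search service (no real Swoogle API is called), the `example.org` document URLs are hypothetical, and the regular expressions handle only simple prefixed names rather than full SPARQL.

```python
import re

# Step 1: a query with no FROM clause (one plausible phrasing of the question).
QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox ?depiction
WHERE {
  ?p foaf:knows ?joshi .
  ?joshi foaf:name "Anupam Joshi" .
  ?p foaf:name ?name ; foaf:mbox ?mbox ; foaf:depiction ?depiction .
}"""

# Stand-in for a search service: term URI -> documents that use the term.
# Both the mapping and the document URLs are made up for this sketch.
FAKE_INDEX = {
    "http://xmlns.com/foaf/0.1/knows": ["http://example.org/a.rdf", "http://example.org/b.rdf"],
    "http://xmlns.com/foaf/0.1/depiction": ["http://example.org/b.rdf"],
}

def query_terms(query):
    """Step 2a: expand the prefixed names in the WHERE clause to full URIs."""
    prefixes = dict(re.findall(r"PREFIX\s+(\w*):\s*<([^>]*)>", query))
    body = query.split("WHERE", 1)[1]
    terms = []
    for prefix, local in re.findall(r"\b(\w+):(\w+)", body):
        uri = prefixes.get(prefix, "") + local
        if prefix in prefixes and uri not in terms:
            terms.append(uri)
    return terms

def compose_dataset(query, index):
    """Step 2b: collect, without duplicates, the documents that use the query's terms."""
    urls = []
    for term in query_terms(query):
        for url in index.get(term, []):
            if url not in urls:
                urls.append(url)
    return urls

def add_from_clauses(query, urls):
    """Insert one FROM clause per document just before WHERE."""
    head, body = query.split("WHERE", 1)
    return head + "".join("FROM <%s>\n" % u for u in urls) + "WHERE" + body

dataset = compose_dataset(QUERY, FAKE_INDEX)
print(add_from_clauses(QUERY, dataset))
```

Step 3, running the rewritten query, would be handed to any SPARQL engine; it is left out here to keep the sketch free of external dependencies.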
Integrating Social Networks
• Data sources
  • FOAF network: "knows" relation; RDF, RDF/XML
  • DBLP: coauthor relation; database, HTML
  • Reputation systems: trust, reputation, trust network (e.g. Google PageRank, Citeseer Rank)
• Computation
  • Entity mapping
  • Tie strength
  • Trust aggregation
• Figure: Golbeck's trust network (nodes include J. Golbeck, L. Ding, J. Hendler, H. Chen, P. Kolari, F. Perich, T. Finin, A. Joshi, Kagal; edges labeled "knows"; roles marked: source, hub, sink, island)
• Figure: DBLP coauthor network (nodes include Y. Peng, L. Ding, A. Sheth, L. Kagal, T. Finin, A. Joshi, M. P. Singh, H. Chen, F. Perich; edges labeled with co-author counts such as 6, 1, 28, 1, 5), linked to the trust network by "sameName"
Inference Web Infrastructure
• [Inference Web] Framework for explaining question answering tasks by abstracting, storing, exchanging, combining, annotating, filtering, segmenting, comparing, and rendering proofs and proof fragments provided by question answerers.
• Proof Markup Language (PML): Trust, Justification, Provenance
• Toolkit (figure labels): IWTrust (trust computation), IW Explainer/Abstractor (end-user friendly visualization), IWBrowser (expert friendly visualization), IWSearch (search engine based publishing), IWBase (provenance registration)
• Languages and engines (figure labels): OWL-S/BPEL with SDS (DAML/SNRC); N3 with CWM (TAMI); KIF with JTP (DAML/NIMD); SPARK-L with SPARK (CALO); Text Analytics with UIMA (NIMD/Exp Agg)
• All connected through the WWW
PML: Proof Markup Language
• Question foo:question1 (what is Tony's Specialty)
• Query foo:query1 (type TonysSpecialty ?x), linked by isQueryFor and hasAnswer
• Justification trace: NodeSet foo:ns1 (hasConclusion …) and NodeSet foo:ns2 (hasConclusion …), each with an InferenceStep linked by isConsequentOf
• InferenceStep properties (figure labels): hasInferenceEngine (InferenceEngine), hasRule (InferenceRule), hasAntecedent (NodeSet), hasVariableMapping (Mapping), hasSourceUsage (SourceUsage), fromQuery, fromAnswer
• Other labels: hasLanguage (Language), hasSource (Source), usageTime, …
• Registered metadata is stored in IWBase
Tracking Provenance via RDF Molecules
• An RDF graph G is decomposed into the graph's RDF molecules
• Figure, graph G (triples t1 to t4; labels): http://www.cs.umbc.edu/~dingli1, foaf:name "Li Ding", foaf:knows, foaf:name "Tim Finin", foaf:mbox mailto:finin@umbc.edu
• Match sub-graphs: Web pages containing one or more molecules are discovered by Swoogle
• Reference: Ding, L.; Finin, T.; Peng, Y.; Pinheiro da Silva, P.; McGuinness, D.L. Tracking RDF Graph Provenance using RDF Molecules. Proceedings of the Fourth International Semantic Web Conference (poster), November 2005. http://www-ksl.stanford.edu/KSL_Abstracts/KSL-05-06.html
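One simple way to picture the decomposition is to keep together all triples that share a blank node and let every other triple stand alone. The Python sketch below does exactly that; it is a simplified illustration and may be coarser than the decompositions in the cited paper. The assignment of subjects and objects in the sample graph, and the blank-node label `_:x`, are assumptions, since the figure's edges are not recoverable from this transcript.

```python
def is_bnode(term):
    """Blank nodes are written with the '_:' prefix in this sketch."""
    return term.startswith("_:")

def decompose(triples):
    """Split a graph into groups of triples connected through blank nodes."""
    molecules = []  # each molecule is a list of triples
    for triple in triples:
        bnodes = {t for t in (triple[0], triple[2]) if is_bnode(t)}
        # Existing molecules that share a blank node with this triple.
        touching = [m for m in molecules
                    if any(b in (s, o) for b in bnodes for s, _, o in m)]
        merged = [triple]
        for m in touching:
            merged.extend(m)
            molecules.remove(m)
        molecules.append(merged)
    return molecules

LI = "http://www.cs.umbc.edu/~dingli1"
# Assumed reading of the slide's graph G; _:x stands for the unnamed person.
graph = [
    (LI, "foaf:name", '"Li Ding"'),
    (LI, "foaf:knows", "_:x"),
    ("_:x", "foaf:name", '"Tim Finin"'),
    ("_:x", "foaf:mbox", "mailto:finin@umbc.edu"),
]
for molecule in decompose(graph):
    print(molecule)
```

Keeping blank-node-connected triples together is what makes each piece meaningful on its own: a triple about `_:x` cannot be matched against another document unless the triples that describe `_:x` travel with it.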
Conclusion
• The Semantic Web
  • Simple but powerful
  • Standardized by W3C: RDF, RDFS, OWL
• Current focuses
  • Query: SPARQL
  • Rules: SWRL, RIF
  • Web services: OWL-S, WSDL-S, SAWSDL
  • Best practice and deployment
• But it cannot do everything
• Open questions
  • Business model, industry adoption?
  • Privacy?
Recommended Readings
• Tutorials
  • Semantic Web Road map (since 1998), Tim Berners-Lee
  • The Semantic Web, Scientific American, May 2001, Tim Berners-Lee, James Hendler and Ora Lassila
  • Ontology Development 101: A Guide to Creating Your First Ontology, 2001, Natalya F. Noy and Deborah L. McGuinness
  • Semantic Web Tutorials, http://www.w3.org/2001/sw/BestPractices/Tutorials
• Starting points
  • W3C Semantic Web activity, http://www.w3.org/2001/sw/
  • W3C Semantic Web Interest Group, http://www.w3.org/2001/sw/interest/
  • W3C Semantic Web News, http://www.w3.org/2001/sw/news
  • Planet RDF (aggregated blogs), http://planetrdf.com/
  • Dave Beckett's Resource Description Framework (RDF) Resource Guide
  • Swoogle Semantic Web Search Engine, http://swoogle.umbc.edu
  • Semantic Web reference card, http://ebiquity.umbc.edu/resource/html/id/94/
• Conferences and journals
  • International Semantic Web Conference (ISWC)
  • European Semantic Web Conference (ESWC)
  • Semantic Technology Conference (SemTech)
  • Journal of Web Semantics
W3C's Ongoing Semantic Web Activity
• RDF Data Access Working Group
  • RDQL … => SPARQL
• Rules Interchange Working Group
  • RuleML => SWRL => RIF
• Best Practices Working Group
  • Vocabulary management, e.g. WordNet
  • Thesauri: SKOS (Simple Knowledge Organization System)
  • Image annotation
  • DOAP (Description of a Project)
  • Many tutorials and demos
• Semantic Annotations for Web Services Description Language Working Group
  • OWL-S and WSDL-S
  • WSDL 2.0