Explore how the Semantic Web can solve complex queries, locate information, handle travel inquiries, analyze human genome experiments, delegate tasks to web agents, and book holidays.
Problems that, in Tim Berners-Lee's view, can be solved by programs on the Internet:
• Complex queries involving background knowledge
  • Find information about “animals that use sonar but are not bats, dolphins or whales”
• Locating information in data repositories
  • Travel enquiries
  • Prices of goods and services
  • Results of human genome experiments
• Delegating complex tasks to web “agents”
  • Book me a holiday next weekend somewhere warm, not too far away, where they speak French or English
“... a goal of the Web was that, if the interaction between person and hypertext could be so intuitive that the machine-readable information space gave an accurate representation of the state of people's thoughts, interactions, and work patterns, then machine analysis could become a very powerful management tool, seeing patterns in our work and facilitating our working together through the typical problems which beset the management of large organizations.”
Today's Web is considered syntactic, but not semantic.
What a computer sees in a web page:
WWW2002 The eleventh international world wide web conference Sheraton waikiki hotel Honolulu, hawaii, USA 7-11 may 2002 1 location 5 days learn interact Registered participants coming from australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire
Limitations of the Web today: machine-to-human, not machine-to-machine.
“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in co-operation.” [Berners-Lee et al., 2001]
Semantic Web layer cake (2000)
• P3P: the specification of the Platform for Privacy Preferences, W3C Recommendation dated 16-05-2002.
• CC/PP stands for Composite Capabilities/Preferences Profiles, a way to specify exactly what a user agent (web browser) is capable of doing. This enables sophisticated content negotiation between web servers and clients, producing optimized XML-based markup for display and use on a wide variety of web user agents.
Elements needed to achieve the functionality described above:
1. A language with an optimal syntax for knowledge representation;
2. An ontology description language;
3. A web service description language;
4. Tools for creating, editing and visualizing Semantic Web documents;
5. A query language for the encoded knowledge;
6. Logical inference over the knowledge base;
7. A semantic search engine.
Elements needed to achieve the functionality described above:
1. A language with an optimal syntax for knowledge representation (RDF);
2. An ontology description language (OWL);
3. A web service description language (WSDL, OWL-S);
4. Tools for creating, editing and visualizing Semantic Web documents (Jena, Haystack, Protégé);
5. A query language for knowledge encoded in RDF (SPARQL);
6. Logical inference over the knowledge base (still under discussion);
7. A semantic search engine (e.g., SHOE).
IRI (Internationalized Resource Identifier) is a generalization of the URI, created to uniquely identify resources on the Semantic Web using any language, not only English.
• Unicode is used to represent text in any language.
• URI (Uniform Resource Identifier) is a compact, unique and universal sequence of characters that identifies a resource on the Internet, such as a document or a website. A resource's URI is often identical to its URL (Uniform Resource Locator), an early form of URI that also specifies how to locate the resource.
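As a minimal sketch of how an IRI relates to a URI, the Python functions below convert an IRI to a URI roughly the way RFC 3987 describes: each non-ASCII character is encoded as UTF-8 and the resulting bytes are percent-encoded, while ASCII characters are left alone. The function names and the example IRI are hypothetical, and real systems often convert non-ASCII host names with IDNA (Punycode) instead.

```python
from urllib.parse import quote, unquote

# Every printable ASCII character except space (0x21-0x7E) is treated as "safe",
# so only non-ASCII characters (and spaces) get percent-encoded. Keeping "%" safe
# also leaves existing %XX escapes untouched.
_PRINTABLE_ASCII = "".join(chr(c) for c in range(0x21, 0x7F))


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI: UTF-8 encode non-ASCII characters, then percent-encode the bytes."""
    return quote(iri, safe=_PRINTABLE_ASCII, encoding="utf-8")


def uri_to_iri(uri: str) -> str:
    """Simplified reverse mapping for display: decode every %XX escape as UTF-8.

    A careful implementation would keep escapes of reserved ASCII characters such as %2F.
    """
    return unquote(uri, encoding="utf-8")


if __name__ == "__main__":
    iri = "http://example.org/orase/Ia\u0219i"  # hypothetical IRI whose path ends in "Iași"
    print(iri_to_uri(iri))  # http://example.org/orase/Ia%C8%99i
```

Only the non-ASCII "ș" (U+0219, UTF-8 bytes C8 99) changes; the rest of the identifier is already a valid URI.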
XML is a markup language for creating structured documents. The Semantic Web adds meaning (semantics) to structured documents. XML Namespaces allow markup vocabularies from several sources to be used together. The Semantic Web combines data from different sources in one document, so it needs to indicate which source each piece of markup comes from.
XML namespaces
A namespace is a vocabulary used to qualify element names uniquely.
Conflict! Both fragments below use an element named year, with different meanings (a publication year and a year of study):
<tutorial ident="03">
  <title>XML, cuceritorul</title>
  <year>2001</year>
</tutorial>
<student>
  <name>Stefan Tanasa</name>
  <year>4</year>
</student>
<?xml version="1.0"?>
<web xmlns:b="urn:infoiasi.ro:busaco-ns">
  <b:tutorial b:ident="03">
    <title>XML, cuceritorul</title>
    <b:year>2001</b:year>
    <b:desc>
      <h2 xmlns="http://www.w3.org/TR/REC-html40">Un <i>tutorial</i> despre XML</h2>
    </b:desc>
  </b:tutorial>
</web>
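To show how namespaces resolve the year conflict, the sketch below parses the namespaced document with Python's standard `xml.etree.ElementTree`. That module reports every element name in `{namespace-URI}local-name` form. The helper function names are hypothetical. The document is the one from the slide, with straight quotes and the missing spaces restored.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<web xmlns:b="urn:infoiasi.ro:busaco-ns">
  <b:tutorial b:ident="03">
    <title>XML, cuceritorul</title>
    <b:year>2001</b:year>
    <b:desc>
      <h2 xmlns="http://www.w3.org/TR/REC-html40">Un <i>tutorial</i> despre XML</h2>
    </b:desc>
  </b:tutorial>
</web>"""

# Prefix used in our queries -> namespace URI (the prefix name itself is arbitrary).
NS = {"b": "urn:infoiasi.ro:busaco-ns"}


def qualified_tags(doc: str) -> list[str]:
    """Element names in document order: {namespace}local-name, or the bare name if unqualified."""
    return [elem.tag for elem in ET.fromstring(doc).iter()]


def tutorial_year(doc: str) -> str:
    """Text of the year element that lives in the busaco namespace."""
    return ET.fromstring(doc).find(".//b:year", NS).text


def tutorial_ident(doc: str) -> str:
    """Namespaced attributes are also keyed by {namespace}local-name."""
    tutorial = ET.fromstring(doc).find("b:tutorial", NS)
    return tutorial.get("{urn:infoiasi.ro:busaco-ns}ident")


if __name__ == "__main__":
    for tag in qualified_tags(DOC):
        print(tag)
```

`title` stays unqualified because no default namespace is declared on its ancestors. By contrast, `i` inherits the HTML default namespace declared on `h2`. A `year` element in the busaco namespace can therefore no longer be confused with an unqualified `year` such as the student's.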
<name>WWW2002 The eleventh international world wide web conference</name>
<location>Sheraton waikiki hotel Honolulu, hawaii, USA</location>
<date>7-11 may 2002</date>
<slogan>1 location 5 days learn interact</slogan>
<participants>Registered participants coming from australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire</participants>
Middle layers contain technologies standardized by the W3C to enable building Semantic Web applications.
• Resource Description Framework (RDF) is a method of representing units of information as so-called triples (subject, predicate, object). This allows information about web resources to be represented as a graph, which is why the Semantic Web is sometimes called the Giant Global Graph.
• RDF Schema (RDFS) provides the basic vocabulary for describing RDF data. RDFS allows the creation of hierarchies of classes and of their properties.
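Here is a minimal Python sketch of the triple model. A graph is just a set of (subject, predicate, object) tuples. Two RDFS entailment rules, rdfs9 and rdfs11 from the RDF Semantics specification, propagate `rdf:type` through an `rdfs:subClassOf` hierarchy. Terms are written as prefixed-name strings for brevity, and the `ex:` resources are hypothetical. Real RDF uses full IRIs and a library such as Jena.

```python
# Simplified: terms are prefixed-name strings such as "ex:Dolphin"; real RDF uses full IRIs.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Return the input triples plus everything two RDFS entailment rules derive.

    rdfs11: (A subClassOf B) and (B subClassOf C) entail (A subClassOf C)
    rdfs9:  (x type A) and (A subClassOf B) entail (x type B)
    The rules are applied repeatedly until no new triple appears (a fixpoint).
    """
    graph = set(triples)
    while True:
        sub = [(s, o) for s, p, o in graph if p == SUBCLASS_OF]
        new = set()
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    new.add((a, SUBCLASS_OF, c))
        for x, p, a in graph:
            if p == RDF_TYPE:
                for a2, b in sub:
                    if a == a2:
                        new.add((x, RDF_TYPE, b))
        if new <= graph:  # nothing new: fixpoint reached
            return graph
        graph |= new


def outgoing(graph, node):
    """Edges leaving `node` when the triple set is viewed as a labelled graph."""
    return sorted((p, o) for s, p, o in graph if s == node)


# Hypothetical example data, echoing the "animals that use sonar" query.
DATA = {
    ("ex:Dolphin", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ("ex:flipper", RDF_TYPE, "ex:Dolphin"),
    ("ex:flipper", "ex:uses", "ex:Sonar"),
}

if __name__ == "__main__":
    for edge in outgoing(rdfs_closure(DATA), "ex:flipper"):
        print(edge)
```

Only one type was stated for `ex:flipper`, but after the closure it is also typed as a mammal and an animal. This is the kind of background knowledge a query over the data can then rely on.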
Graph of the business card of Wikipedia's founder, Jimmy Donal Wales
Ontology is the branch of philosophy that studies the general features of existence (definition from DEX, the explanatory dictionary of the Romanian language).
Wikipedia also defines ontology as a computer science term: “An ontology is a specification of a conceptualization” (Gruber 1993). An ontology is an explicit description of the concepts within a domain and of the relations between them.
Sharing common understanding of the structure of information among people or software agents is one of the more common goals in developing ontologies.
OWL • OWL = Web Ontology Language • Owl’s superior intelligence is known throughout the Hundred Acre Wood, as are his talents for Writing, Spelling, other Educated and Special tasks. • "My spelling is Wobbly. It's good spelling, but it Wobbles, and the letters get in the wrong places."
Three versions of OWL • OWL Lite. • OWL DL (description logic). • OWL Full.
OWL Lite vocabulary
• (In)equality: equivalentClass, equivalentProperty, sameAs, differentFrom, AllDifferent, distinctMembers
• Property characteristics: ObjectProperty, DatatypeProperty, inverseOf, TransitiveProperty, SymmetricProperty
• RDF Schema features: Class (Thing, Nothing), rdfs:subClassOf, rdf:Property, rdfs:subPropertyOf, rdfs:domain, rdfs:range, Individual
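The property characteristics in the OWL Lite vocabulary have simple rule-like consequences. The sketch below is a small forward-chaining fragment, not an OWL reasoner. It shows what `TransitiveProperty`, `SymmetricProperty` and `inverseOf` let a program infer. The `ex:` properties and facts are hypothetical.

```python
# Hypothetical vocabulary: ex:locatedIn (transitive), ex:contains (inverse of ex:locatedIn),
# ex:borders (symmetric). Terms are plain prefixed-name strings for brevity.


def apply_property_axioms(facts, transitive=(), symmetric=(), inverse=()):
    """Saturate `facts` with the consequences of three OWL property characteristics.

    transitive: properties declared owl:TransitiveProperty
    symmetric:  properties declared owl:SymmetricProperty
    inverse:    (p, q) pairs meaning "p owl:inverseOf q"
    """
    graph = set(facts)
    while True:
        new = set()
        for s, p, o in graph:
            if p in symmetric:  # (s p o) => (o p s)
                new.add((o, p, s))
            for p1, p2 in inverse:  # (s p1 o) => (o p2 s), and vice versa
                if p == p1:
                    new.add((o, p2, s))
                if p == p2:
                    new.add((o, p1, s))
            if p in transitive:  # (s p o) and (o p o2) => (s p o2)
                for s2, q, o2 in graph:
                    if q == p and s2 == o:
                        new.add((s, p, o2))
        if new <= graph:  # fixpoint: nothing new was derived
            return graph
        graph |= new


FACTS = {
    ("ex:Iasi", "ex:locatedIn", "ex:Romania"),
    ("ex:Romania", "ex:locatedIn", "ex:Europe"),
    ("ex:Romania", "ex:borders", "ex:Moldova"),
}

CLOSURE = apply_property_axioms(
    FACTS,
    transitive={"ex:locatedIn"},
    symmetric={"ex:borders"},
    inverse=[("ex:contains", "ex:locatedIn")],
)

if __name__ == "__main__":
    for triple in sorted(CLOSURE):
        print(triple)
```

From three stated facts the program derives five more, including `ex:Europe ex:contains ex:Iasi`. That triple needs two steps: transitivity first, then the inverse property.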
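OWL DL vocabulary
• Class axioms: oneOf, dataRange, disjointWith, equivalentClass
• Boolean combinations of class expressions: unionOf, complementOf, intersectionOf
• Arbitrary cardinality: minCardinality, maxCardinality, cardinality
• Filler information: hasValue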
SPARQL
• SPARQL is an RDF query language: it can be used to query any RDF-based data (i.e., including statements involving RDFS and OWL). A query language is necessary to retrieve information for Semantic Web applications.
SPARQL
Example:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
WHERE {
  ?person a foaf:Person .
  ?person foaf:name ?name .
  ?person foaf:mbox ?email .
}
SPARQL
Example:
PREFIX abc: <http://example.com/exampleOntology#>
SELECT ?capital ?country
WHERE {
  ?x abc:cityname ?capital ;
     abc:isCapitalOf ?y .
  ?y abc:countryname ?country ;
     abc:isInContinent abc:Africa .
}
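SPARQL answers such a query by matching the WHERE block against the data and joining the matches on shared variables. The WHERE block here is a basic graph pattern. The sketch below illustrates that idea in plain Python. The `;` shorthand just repeats the previous subject, so the four patterns are written out explicitly. The data is hypothetical and literals are plain strings; this is an illustration, not how a production SPARQL engine is built.

```python
# Triples are (subject, predicate, object) strings; terms starting with "?" are variables.
# abc: stands for <http://example.com/exampleOntology#>; the data below is hypothetical.


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None on a mismatch."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable if new; fail if it is already bound to a different value.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:  # constant term does not match
            return None
    return result


def evaluate_bgp(patterns, graph):
    """All variable bindings that satisfy every triple pattern (a join on shared variables)."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                candidate = match(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


def select(variables, patterns, graph):
    """Project each solution onto `variables`, like a SELECT clause."""
    return [tuple(b[v] for v in variables) for b in evaluate_bgp(patterns, graph)]


# The WHERE block of the query above, with each ";" expanded into a repeated subject.
WHERE = [
    ("?x", "abc:cityname", "?capital"),
    ("?x", "abc:isCapitalOf", "?y"),
    ("?y", "abc:countryname", "?country"),
    ("?y", "abc:isInContinent", "abc:Africa"),
]

GRAPH = {
    ("abc:nairobi", "abc:cityname", "Nairobi"),
    ("abc:nairobi", "abc:isCapitalOf", "abc:kenya"),
    ("abc:kenya", "abc:countryname", "Kenya"),
    ("abc:kenya", "abc:isInContinent", "abc:Africa"),
    ("abc:cairo", "abc:cityname", "Cairo"),
    ("abc:cairo", "abc:isCapitalOf", "abc:egypt"),
    ("abc:egypt", "abc:countryname", "Egypt"),
    ("abc:egypt", "abc:isInContinent", "abc:Africa"),
    ("abc:paris", "abc:cityname", "Paris"),
    ("abc:paris", "abc:isCapitalOf", "abc:france"),
    ("abc:france", "abc:countryname", "France"),
    ("abc:france", "abc:isInContinent", "abc:Europe"),
}

if __name__ == "__main__":
    for row in sorted(select(["?capital", "?country"], WHERE, GRAPH)):
        print(row)
```

Paris matches the first three patterns but is dropped by the last one, because `abc:france` is not in `abc:Africa`.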
SPARQL
SPARQL defines four query forms:
• SELECT query: extracts data from the queried source; the results are presented as a table.
• CONSTRUCT query: extracts information from the queried source and transforms the results into valid RDF.
• ASK query: obtains a simple yes/no answer from the queried source.
• DESCRIBE query: extracts an RDF graph from the queried source.
Each of these queries can contain a WHERE block that specifies the query more precisely.
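ASK and CONSTRUCT reuse the same WHERE-block matching as SELECT. They differ only in what they return: a boolean, or new triples built from a template. The sketch below shows both over hypothetical FOAF-style data. `ex:fullName` is an invented property used only as a CONSTRUCT target. DESCRIBE is left out because the SPARQL specification leaves the exact shape of its result to the query service.

```python
# Minimal WHERE-block matcher; variables start with "?". Data and ex: names are hypothetical.


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None on a mismatch."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def solutions(patterns, graph):
    """All bindings that satisfy every pattern in the WHERE block."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in graph
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings


def ask(patterns, graph):
    """ASK: true if the WHERE block has at least one solution."""
    return bool(solutions(patterns, graph))


def construct(template, patterns, graph):
    """CONSTRUCT: instantiate the template once per solution.

    Template triples that would still contain an unbound variable are skipped.
    """
    result = set()
    for binding in solutions(patterns, graph):
        for t in template:
            if all(term in binding for term in t if term.startswith("?")):
                result.add(tuple(binding.get(term, term) for term in t))
    return result


GRAPH = {
    ("ex:alice", "rdf:type", "foaf:Person"),  # "?person a foaf:Person" means rdf:type
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
}

# WHERE block of the FOAF SELECT example.
PEOPLE_WITH_MAIL = [
    ("?person", "rdf:type", "foaf:Person"),
    ("?person", "foaf:name", "?name"),
    ("?person", "foaf:mbox", "?email"),
]
NAMED_PEOPLE = PEOPLE_WITH_MAIL[:2]
TEMPLATE = [("?person", "ex:fullName", "?name")]

if __name__ == "__main__":
    print(ask(PEOPLE_WITH_MAIL, GRAPH))
    print(sorted(construct(TEMPLATE, NAMED_PEOPLE, GRAPH)))
```

The comprehension in `solutions` uses an assignment expression, so this sketch assumes Python 3.8 or later.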
Unrealized Semantic Web technologies
• The top layers contain technologies that are not yet standardized, or just ideas that must be implemented in order to realize the Semantic Web.
• RIF or SWRL will bring support for rules. This is important, for example, to describe relations that cannot be expressed directly in the description logic used by OWL.
• Cryptography is important to ensure and verify that Semantic Web statements come from a trusted source. This can be achieved with an appropriate digital signature of RDF statements.
• Trust in derived statements will be supported by (a) verifying that the premises come from trusted sources and (b) relying on formal logic when deriving new information.
• The user interface is the final layer; it will enable humans to use Semantic Web applications.
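Rules of the kind RIF and SWRL support can be evaluated by forward chaining: whenever the body of a rule matches the facts, its head is added. A common illustration is hasParent(?x, ?y) ∧ hasBrother(?y, ?z) → hasUncle(?x, ?z). OWL 1 cannot express this property composition directly; OWL 2 later added property chains for this case. The sketch below is plain Python with hypothetical `ex:` facts, not RIF or SWRL syntax. Like SWRL, it assumes every variable in a rule head also occurs in the rule body.

```python
# A rule is (body, head): two lists of triple patterns whose variables start with "?".
# Assumption: every head variable also occurs in the body (as SWRL requires).


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None on a mismatch."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def solutions(body, facts):
    """All bindings under which every body pattern matches some fact."""
    bindings = [{}]
    for pattern in body:
        extended = []
        for binding in bindings:
            for fact in facts:
                candidate = match(pattern, fact, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


def forward_chain(facts, rules):
    """Fire all rules repeatedly until no new fact is derived."""
    facts = set(facts)
    while True:
        new = set()
        for body, head in rules:
            for binding in solutions(body, facts):
                for h in head:
                    new.add(tuple(binding.get(term, term) for term in h))
        if new <= facts:
            return facts
        facts |= new


# hasParent(?x, ?y) AND hasBrother(?y, ?z) -> hasUncle(?x, ?z)
UNCLE_RULE = (
    [("?x", "ex:hasParent", "?y"), ("?y", "ex:hasBrother", "?z")],
    [("?x", "ex:hasUncle", "?z")],
)

FACTS = {
    ("ex:ana", "ex:hasParent", "ex:maria"),
    ("ex:maria", "ex:hasBrother", "ex:ion"),
    ("ex:maria", "ex:hasParent", "ex:elena"),
}

if __name__ == "__main__":
    for fact in sorted(forward_chain(FACTS, [UNCLE_RULE])):
        print(fact)
```

Only one new fact appears. `ex:maria` also has a parent, but no brother of `ex:elena` is known, so the rule body does not match for her.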
AI is one of the disciplines contributing to building the Semantic Web. AI has already given us functional and logic programming methods, ways to understand distributed systems, pattern detection and data mining tools, approaches to inference, ontological engineering and knowledge representation.
Active Working Groups at the W3C:
• Semantic Web Coordination Group
• Rules Interchange Format Working Group
• RDB2RDF Working Group
• RDFa Working Group
• SPARQL Working Group
• Health Care and Life Sciences Interest Group
• Semantic Web Interest Group
Challenges
• Vastness: The World Wide Web contains at least 24 billion pages as of this writing (June 13, 2010). Any automated reasoning system will have to deal with truly huge inputs.
• Vagueness: imprecise concepts such as "young" or "tall". Fuzzy logic is the most common technique for dealing with vagueness.
• Uncertainty: precise concepts with uncertain values. For example, a patient might present a set of symptoms that correspond to several distinct diagnoses, each with a different probability. Probabilistic reasoning techniques are generally employed to address uncertainty.
• Inconsistency: logical contradictions, which will inevitably arise during the development of large ontologies and when ontologies from separate sources are combined. Deductive reasoning fails catastrophically when faced with inconsistency, because "anything follows from a contradiction". Defeasible reasoning and paraconsistent reasoning are two techniques that can be employed to deal with inconsistency.
• Deceit: the producer of the information intentionally misleads its consumer. Cryptography techniques are currently used to alleviate this threat.
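To make the fuzzy-logic point concrete, the sketch below gives "tall" and "young" graded membership degrees between 0 and 1. It then combines them with the classic min/max (Zadeh) operators. The thresholds are hypothetical and chosen only for illustration.

```python
def ramp(x, low, high):
    """Membership degree rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def tall(height_cm):
    # Hypothetical thresholds: not tall at 160 cm or less, fully tall from 190 cm.
    return ramp(height_cm, 160, 190)


def young(age_years):
    # Hypothetical thresholds: fully young up to 25 years, not young from 40.
    return 1.0 - ramp(age_years, 25, 40)


def fuzzy_and(a, b):
    return min(a, b)  # Zadeh conjunction


def fuzzy_or(a, b):
    return max(a, b)  # Zadeh disjunction


def fuzzy_not(a):
    return 1.0 - a  # standard complement


if __name__ == "__main__":
    # A 30-year-old who is 175 cm tall is "young and tall" to degree min(0.67, 0.5).
    print(fuzzy_and(young(30), tall(175)))
```

Unlike a crisp predicate, a person of 175 cm is neither simply tall nor simply not tall; the answer is a degree (0.5 here).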