IoT Semantics • IoT Cloud • Cloud = utility computing • IoT resource sharing • Via IoT services • IoT data sharing • Global interoperability • Issues • Service/Data discovery: how to know where things are • Service/Data representation: to find things, one first needs to specify them; the better the semantic model, the easier the matchmaking • Knowledge extraction • Storage
IoT Semantics • IoT Cloud • Requires semantic models for IoT services and data • Semantic annotation • Semantic description models • Basics: XML, RDF, OWL • Data: SSN, Linked data, … • Services: WSDL, OWL-S, WSMO, … • Supporting technologies • Semantic technologies: RDF/SPARQL, Jena • Hyper/CAT (categorization) • Infrastructure for discovery: ICN, UDDI, semantic reasoning, similarity reasoning, peer-to-peer • Reasoning (matchmaking) by semantics, by similarity, … • Raw sensor data: what is it? Needs semantic annotation
Semantic Annotation • Example of semantic annotation
Semantic Annotation • Issues • Dynamics in semantic descriptions • E.g., temperature, time, accuracy may all change over time • Automation • Semantic annotation can be learned and created automatically • Manual effort is time consuming and tedious • Automated semantic interpretation of sensor data
Web Semantics • XML • Labeled content • RDF • URI • Uniform resource identifier • Linked data • OWL • SPARQL, Jena
Web Semantics • XML • Labeled content • Meaning of XML documents is intuitively clear to humans due to the "semantic" mark-up • But the labels and content have no semantic interpretation for machines • Example (elements: module, title, lecturer, name, weblink, students): <module date="..."> <title>...</title> <lecturer> <name>...</name> <weblink>...</weblink> </lecturer> <students>...</students> </module>
RDF • Resource description framework (RDF) • Provide relationship between concepts • Expressed as triples • <subject, predicate, object> or <object, attribute, value> • “The Old Man and the Sea”, writtenBy, “Ernest Hemingway” • Query the knowledge base • SPARQL, RQL • select … from … where … • E.g., select x, y from {x} writtenBy {y} • Still, same as XML, subjects, objects, and predicates cannot be interpreted by the computer
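The triple model and the select/where style of query can be sketched with a toy in-memory store. This is only an illustration in plain Python, not a real RDF library or SPARQL engine; the `TRIPLES` set, the `match` helper, and the extra facts beyond the Hemingway triple are assumptions made for the example.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
# Hypothetical data and helper names, used only for illustration.
TRIPLES = {
    ("The Old Man and the Sea", "writtenBy", "Ernest Hemingway"),
    ("A Farewell to Arms", "writtenBy", "Ernest Hemingway"),
    ("Ernest Hemingway", "bornIn", "Oak Park"),
}


def match(pattern, triples=TRIPLES):
    """Return all triples matching a (subject, predicate, object) pattern.

    None acts as a wildcard, playing the role of a query variable.
    """
    return sorted(
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


# Rough analogue of: select x, y from {x} writtenBy {y}
written = [(s, o) for s, _, o in match((None, "writtenBy", None))]
```

Note that the store only compares strings: "writtenBy" has no meaning to the program beyond being equal or not equal to another string, which is the limitation the slide points out.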
URI • Everything has a URI • For subjects, objects, as well as predicates • RDF with URI: an example
BASE <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX schema: <http://schema.org/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX wd: <http://www.wikidata.org/entity/>
<bob#me>
    a foaf:Person ;
    foaf:knows <alice#me> ;
    schema:birthDate "1990-07-04"^^xsd:date ;
    foaf:topic_interest wd:Q12418 .
wd:Q12418
    dcterms:title "Mona Lisa" ;
    dcterms:creator <http://dbpedia.org/resource/Leonardo_da_Vinci> .
<http://data.europeana.eu/item/04802/243FA8618938F4117025F17A8B813C5F9AA4D619>
    dcterms:subject wd:Q12418 .
• http://dbpedia.org/resource/Leonardo_da_Vinci (Thing) • http://dbpedia.org/data/Leonardo_da_Vinci (RDF def) • http://dbpedia.org/page/Leonardo_da_Vinci (html page)
Linked Data • Linked data • A method for publishing structured data so that it can be interlinked and carry semantic meaning • Built on HTTP, RDF, and URIs • By the World Wide Web Consortium (W3C), Semantic Web project • The Web and HTML offer linked documents • The links do not have explicit semantics • Linked data converts the web from linked documents to linked structured data whose semantics can be processed by the computer • Linked open data • Open data accessible by the public
Linked Data • From relational database to linked data • A relational database has a flat attribute structure • Further relations between attributes have to be expressed in a complex way • By nested foreign keys • A graph database is the best way to store linked data • Matches the nature of linked data • Faster retrieval (in a relational DB, links need to be foreign keys, which makes retrieval slower)
OWL • Web Ontology Language (OWL) • Supports the specification of ontologies • In RDF, one can specify <mammal, subClassOf, animal>, but subClassOf is a user defined predicate, the same as any other predicate • In OWL, class is a built-in concept with formal definitions • A class can be defined by inheriting the "Thing" class or other classes • Class hierarchy can be built with built-in assertions • SubClassOf, EquivalentClasses, DisjointClasses, ClassAssertion, … • SubClassOf (:mammal :animal), EquivalentClasses (:human :person), ClassAssertion (:woman :Mary) • Can define class property attributes and their types • DataPropertyDomain (:hasAge :Person) • DataPropertyRange (:hasAge xsd:nonNegativeInteger) • Can also define property hierarchy for a class • SubObjectPropertyOf (:hasThroughput :hasQoS) • Much more!
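Because SubClassOf has a fixed, built-in meaning, a reasoner can derive facts that were never stated explicitly. The sketch below imitates that with plain dictionaries; it is not an OWL reasoner, it assumes single inheritance, and the names `SUBCLASS_OF`, `CLASS_ASSERTIONS`, `superclasses`, and `is_instance_of` are hypothetical.

```python
# Hypothetical miniature ontology: child class -> parent class.
SUBCLASS_OF = {"mammal": "animal", "human": "mammal", "woman": "human"}
# Individual -> asserted class, like ClassAssertion(:woman :Mary).
CLASS_ASSERTIONS = {"Mary": "woman"}


def superclasses(cls):
    """Follow SubClassOf links upward and return every ancestor of cls."""
    ancestors = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        ancestors.append(cls)
    return ancestors


def is_instance_of(individual, cls):
    """True if the individual belongs to cls directly or through inheritance."""
    asserted = CLASS_ASSERTIONS[individual]
    return cls == asserted or cls in superclasses(asserted)
```

For instance, `is_instance_of("Mary", "animal")` holds even though only "Mary is a woman" was asserted.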
OWL and SWRL • Logic and logic programming • Propositional logic, description logic, first order logic • RDF is equivalent to propositional logic • OWL supports quantification as in first order logic • allValuesFrom • someValuesFrom • SWRL • Supports rule specification and inference • A Semantic Web Rule Language • Rule: father(?x,?y) ∧ father(?y,?z) ⇒ grandfather(?x,?z) • Restriction examples: <owl:Restriction> <owl:onProperty rdf:resource="#hasParent" /> <owl:allValuesFrom rdf:resource="#Human" /> </owl:Restriction> <owl:Restriction> <owl:onProperty rdf:resource="#hasParent" /> <owl:someValuesFrom rdf:resource="#Physician" /> </owl:Restriction> • Rule in abstract syntax: Implies(Antecedent(father(I-variable(x) I-variable(y)) father(I-variable(y) I-variable(z))) Consequent(grandfather(I-variable(x) I-variable(z))))
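To make the rule concrete, here is a minimal sketch that applies the grandfather rule (antecedent father(x, y) and father(y, z), consequent grandfather(x, z), as in the Implies form) to a set of facts. It is a single forward-chaining step in plain Python, not an SWRL engine; the function name and the sample facts are assumptions.

```python
def infer_grandfathers(father_facts):
    """Apply father(x, y) AND father(y, z) => grandfather(x, z) once.

    father_facts is a set of (father, child) pairs.
    """
    return {
        (x, z)
        for (x, y1) in father_facts
        for (y2, z) in father_facts
        if y1 == y2  # the shared variable ?y joins the two atoms
    }


# Hypothetical facts for illustration.
fathers = {("Adam", "Bob"), ("Bob", "Carl"), ("Bob", "Dave")}
grandfathers = infer_grandfathers(fathers)
```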
Semantic Web Structure • Jena: provides APIs for building knowledge in RDF and OWL, processing SPARQL queries, …
SOA and Web Services • SOA • Evolution: component-based/OO, Sun-RPC/Java-RMI/CORBA, SOA/microservices • Semantic model for service description • WSDL, OWL-S, WSMO • Service invocation • SOAP, REST • Service discovery and composition • UDDI for discovery, peer-to-peer discovery • Planning for functional composition • QoS based service composition
SOA • Software technology evolution • Monolithic → Layered architecture → SOA • Layered architecture is still monolithic, SOA is not • Procedural → Component based and OO → SOA • Service Oriented • Component Oriented • Procedural Oriented • Object Oriented
SOA • Software technology evolution: before SOA, SOA, and now microservices • Microservices: smaller granularity, serverless functions
SOA and Web Services • SOA • Can achieve faster deployment, better reuse, and dynamic adaptivity (compared to traditional architectures) • A coarse grained and loosely coupled approach • Focus on publish, discovery, and invocation • Web service • Is a specific realization of SOA • With a set of XML based standards • SOAP, RESTful • UDDI, WSDL • Figure: the service provider publishes (WSDL) to the service broker/registry; the service requestor discovers (UDDI) through the registry and invokes (SOAP) the provider • Microservices focus on the non-monolithic property and function containerization, not so much on the architecture in the figure
Web Services • Web service and RPC • RPC is more point-to-point integration (no discovery, no composition) • SOA integration is based on the concept of a service bus and supports enterprise integration and beyond • Web services are HTTP based and integrate nicely with the static Web • Figure: three RPC based systems
Web Services Technology Stack • Access • Choreography, Composition, Orchestration (BPEL4WS, WSMO) • QoS: WS-Reliability, WS-Security, WS-Transaction, WS-Coordination, WS-Context • Discovery: WSMX Discovery, UDDI Discovery • Service Description: OWL-S, WSMO, RDF/OWL, WSDL • Messaging: SOAP, REST, … • Networking: HTTP, SMTP, FTP, …, URI
Web Service Semantics • Semantic web service • = Web service technology + Semantic web technology • WSDL • Similar to a conventional specification for a class • One web service may have a set of operations • Each operation and its IO are specified • IO (message) specification includes the IO data type, which can be an ontology definition • E.g., datatype mail (figure nodes: mail, receiver, sender, address, country, state, city, street, number) • Offers primitive service specification • It is the low level basis for SOAP data exchange and service invocation • High level semantics: OWL-S
OWL-S • OWL-S • A high level semantic model for service specification • OWL-S, unlike RDF or OWL, is not a language • Only provides upper ontology to guide service specification
OWL-S • OWL-S • Service profile: IOPE
OWL-S • OWL-S • Process • atomic, composite • recursive
OWL-S • OWL-S • Grounding • Bind to WSDL
WSMO • Web service modeling ontology (WSMO) • An ontology specifying the overall web service model • Also offers a formal description language (WSML) and a service execution environment (WSMX) • The top level model • Goals: objectives that a client wants to achieve by using Web Services • Ontologies: provide definitions of the terminology used by other components • Web Services: semantic description of Web Services • Capability (functional) • Interfaces (usage) • Mediators: mediate between components to handle heterogeneities
WSMO • WSMO service description • Ontology for data model • IOPE • Choreography • How to interact with the service • Similar to grounding (WSDL) in OWL-S • Orchestration • Composite service specification • Similar to “Process” in OWL-S • For service description model, WSMO is similar to OWL-S
Globally Integrated IoT Cloud
IoT Perspective • IoT and edge computing • Gartner • Gartner defines edge computing as solutions that facilitate data processing at or near the source of data generation. For example, in the context of the Internet of Things (IoT), … • https://www.gartner.com/smarterwithgartner/what-edge-computing-means-for-infrastructure-and-operations-leaders/ • IDC prediction • According to a report by the International Data Corporation (IDC), by 2019, 45 percent of IoT-created data will be stored, processed, analyzed and acted upon close to, or at the edge of the network. • https://www.databank.com/2018/08/30/solving-edge-computing-challenges-in-era-of-iot/ • IoT will be coupled with Edge Computing
IoT Perspective • IoT and Edge
IoT Perspective • IoT and big data • Currently 5 quintillion bytes of data are produced every day • By the year 2020, the IoT will comprise more than 30 billion connected devices • It would take a lifetime to manually analyze the data produced by a single sensor on a manufacturing assembly line • Quotes from: https://blogs.cisco.com/datacenter/internet-of-things-iot-data-continues-to-explode-exponentially-who-is-using-that-data-and-how • This is why Harvard Business Review found that • Less than half of structured data is actively used in decision making • Less than 1% of unstructured data is analyzed or used at all • Need IoT device and data management • Need semantic models and associated techniques for IoT data and services
Semantics for IoT Cloud • Simple IoT naming and accesses • URI, ICN naming • URI based accesses: CoAP and MQTT -- IoT’s http • Semantic IoT data • Model: SSN, Fiesta, Data Stream Centric Semantics • Storage: TSDB databases, TSDB and linked data • Discovery: ICN, beyond • Semantic IoT services • Model: Software services versus IoT services • Discovery: Service discovery versus IoT service discovery • Composition: Service composition for IoT services
Simple Naming Schemes • URI • Used by CoAP, Linked Data, IoT@Work • ICN • DONA: data oriented network architecture • Consider each IoT device/data has an owner, use P+L, where P is the public key (hashed) of the owner and L is owner assigned label • NDN: Named data networking • Opaque to the network, user can assign a name at her/his will • Waiting for a naming scheme to come out in the future • CCN: Content centric network • Hierarchical name, like file path, still a single string • These naming schemes do not provide semantics
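A DONA-style P+L name can be sketched as follows. The choice of SHA-256 as the hash, the colon separator, and the function name `dona_name` are assumptions for illustration only; the slide states only that P is the hashed public key of the owner and L is an owner assigned label.

```python
import hashlib


def dona_name(owner_public_key: bytes, label: str) -> str:
    """Build a flat P:L name.

    P is a hash of the owner's public key (SHA-256 assumed here),
    L is a label the owner assigns to the device or data item.
    """
    p = hashlib.sha256(owner_public_key).hexdigest()
    return f"{p}:{label}"


# Hypothetical key bytes and labels.
key = b"-----owner public key bytes-----"
name_a = dona_name(key, "temp-sensor-1")
name_b = dona_name(key, "door-camera")
```

Both names share the same P part, so they can be tied to one owner, but nothing in the name says what kind of data the device produces, which is why such schemes carry no semantics.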
URI Based Protocols • Application layer protocols for IoT • Use URI for data/service accesses • Can use HTTP and RESTful protocols • But they do not consider resource constraints • CoAP and MQTT are equivalent protocols based on URI • Designed to replace HTTP in the low power device domain; can easily be converted to HTTP • Compare HTTP and CoAP/MQTT protocol stacks • Request/response layer handles transmission of requests/replies • Transaction layer handles a single message exchange
CoAP • CoAP and HTTP interaction
CoAP • CoAP and http • Protocol: User can control whether to get confirmation • CON/NON
CoAP • CoAP low power considerations • Does not use TCP, which has a high overhead • TCP's flow control is not appropriate for short-lived transactions • Uses UDP and 6LoWPAN • Uses a 4 byte header, suitable for the small payloads in many IoT applications • Supports asynchronous communication • When a request cannot be handled right away, the server sends back an ACK to confirm receipt and delivers the response later • Suitable for low power IoT devices where delays may cause unnecessary timeouts
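The 4 byte header can be sketched with Python's `struct` module. The field layout (2 bit version, 2 bit type, 4 bit token length, 8 bit code, 16 bit message ID) and the type values for CON/NON/ACK/RST follow the CoAP specification (RFC 7252); the helper name `coap_header` and the sample message ID are assumptions.

```python
import struct

# Message types: confirmable, non-confirmable, acknowledgement, reset.
CON, NON, ACK, RST = 0, 1, 2, 3
GET = 0x01  # request code 0.01


def coap_header(msg_type, code, message_id, token_length=0, version=1):
    """Pack the fixed CoAP header: Ver(2) | T(2) | TKL(4) | Code(8) | Message ID(16)."""
    first_byte = (version << 6) | (msg_type << 4) | token_length
    # "!" selects network (big-endian) byte order.
    return struct.pack("!BBH", first_byte, code, message_id)


# A confirmable GET with a hypothetical message ID.
header = coap_header(CON, GET, message_id=0x1234)
```

Switching `CON` to `NON` changes only two bits of the first byte, which is how the sender chooses whether it wants a confirmation.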
MQTT • Protocol for embedded devices • MQTT • Message Queuing Telemetry Transport, a lightweight messaging protocol for constrained devices • Based on the publisher-subscriber model • Consists of three types of nodes: publishers, broker, subscribers
MQTT • Protocol for embedded devices • MQTT allows users to control transmission assurance • Can set QoS to be 0, 1, or 2 • 0: The broker/client will deliver the message at most once, with no confirmation • 1: The broker/client will deliver the message at least once, with confirmation required • 2: The broker/client will deliver the message exactly once by using a four step handshake protocol
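The publisher/broker/subscriber roles and the difference between QoS 0 and QoS 1 can be illustrated with a toy in-memory broker. This is not the MQTT wire protocol and not a real client library: the `Broker` class, the callback-returns-True convention for an acknowledgement, and the topic name are all assumptions, and QoS 2 is left out.

```python
from collections import defaultdict


class Broker:
    """Toy broker that routes published messages to topic subscribers."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, payload, qos=0, max_retries=3):
        """Deliver payload; a callback returns True to acknowledge it.

        qos 0: send once and do not wait for any acknowledgement.
        qos 1: resend until acknowledged, so duplicates are possible.
        """
        for callback in self.subscribers[topic]:
            attempts = 1 if qos == 0 else max_retries
            for _ in range(attempts):
                if callback(payload):
                    break


received = []
acks = iter([False, True])  # simulate a lost first acknowledgement


def flaky_subscriber(payload):
    received.append(payload)
    return next(acks)


broker = Broker()
broker.subscribe("building/room1/temperature", flaky_subscriber)
broker.publish("building/room1/temperature", "21.5", qos=1)
```

With QoS 1 the subscriber here sees the message twice, which is the "at least once" behaviour; avoiding that duplicate is what the extra handshake steps of QoS 2 are for.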
IoT Data Storage: TSDB • Evolution of storage models in the cloud • Distributed file systems • GFS, HDFS, Ceph, Swift, … • Big Table • NoSQL database: remove the unnecessary and expensive features in relational database to greatly improve the performance • Key-value model, document model, … • GBT, Cassandra, MongoDB, Redis, … • TSDB • Specifically designed for the storage of IoT data stream • Consider the time aspect, consider how to handle continuous data • Linked data for IoT data? • Mainly linking static data, no time aspect, no data stream handling • Can link data stream, combined with TSDB for better semantic power
IoT Data Model: Data Stream Centric • SSN • Focuses on sensors and sensor observations • Why is it not good? How about focusing on data • Or, more specifically, on data streams • Data is difficult to handle in terms of individual data entities • Discovery and analytics are all based on data streams • SensorML • Very similar to SSN • Data stream centric model • TSDB also offers some basic data stream model • Integrate good features from TSDB or SSN • Consider features that are missing from both TSDB and SSN • (Student presentation)
IoT Data Discovery • SensorML based • Centralized, registry based discovery • Not widely studied in terms of discovery • SSN based • Different proposals, but all centralized • Good work in terms of data indexing and partial matching • Should we consider distributed solutions? • IoT systems tend to use an IoT-Edge-Cloud infrastructure • Centralized discovery is not feasible • Needs hierarchical or peer-to-peer solutions • No IoT specific solutions yet • Many existing techniques in traditional data discovery • Can we use them for IoT data discovery?
Traditional Data Discovery • Background • Basic questions • How is a data query specified? By keywords, of course • A single keyword or multiple keywords? Consider both models • What is the underlying infrastructure? • Centralized, hierarchical, peer-to-peer
Traditional Data Discovery • Centralized: inverted indexing • Central servers collect information (e.g., Google is locally distributed) • Process each document to obtain a keyword vector • Union all keywords to obtain a unified keyword vector • 0/1 to indicate whether a document has the specific keyword • Most use counts; some use the area where the keyword appears to adjust the count • Problems: cannot support efficient search and is space inefficient
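The per-document keyword vectors over a unified vocabulary might look like the sketch below. Whitespace tokenization, lowercasing, and the sample documents are simplifying assumptions, and the function name `keyword_vectors` is hypothetical.

```python
from collections import Counter


def keyword_vectors(documents):
    """Return (vocabulary, vectors).

    vocabulary is the sorted union of all keywords; each vector holds one
    document's keyword counts in vocabulary order (0 means "not present").
    """
    counts = {
        doc_id: Counter(text.lower().split())
        for doc_id, text in documents.items()
    }
    vocabulary = sorted(set().union(*counts.values()))
    vectors = {
        doc_id: [c[word] for word in vocabulary]
        for doc_id, c in counts.items()
    }
    return vocabulary, vectors


# Hypothetical documents keyed by document number.
docs = {1: "brutus killed caesar", 2: "caesar caesar calpurnia"}
vocab, vectors = keyword_vectors(docs)
```

Every vector is as long as the whole vocabulary and mostly zeros, and answering a query means scanning every document's vector; this illustrates the space and search inefficiency named in the slide.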
Traditional Data Discovery • Centralized: inverted indexing • Dictionary: all keywords • Use them as the index (sorted in keyword order) • A list of documents is associated with each keyword • Document list is sorted by document number • Document number points to the original (keyword + #occurrence + …) vector to allow ranking of the documents in terms of the keyword • Can handle any number of keywords • Example (dictionary → postings list) • Brutus → 2, 4, 8, 16, 32, 64, 128 • Calpurnia → 1, 2, 3, 5, 8, 13, 21, 34 • Caesar → 13, 16
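A minimal sketch of the dictionary plus postings lists, and of why keeping each postings list sorted by document number pays off: a multi-keyword query becomes a linear merge of sorted lists. The function names are assumptions; the postings used at the end are the ones from the slide's example.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each keyword to a sorted postings list of document numbers."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    # Sort the dictionary by keyword and each postings list by document number.
    return {word: sorted(ids) for word, ids in sorted(index.items())}


def intersect(p1, p2):
    """Merge two sorted postings lists in O(len(p1) + len(p2)) steps."""
    i = j = 0
    common = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            common.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return common


brutus = [2, 4, 8, 16, 32, 64, 128]
calpurnia = [1, 2, 3, 5, 8, 13, 21, 34]
caesar = [13, 16]
both = intersect(brutus, calpurnia)  # documents containing both keywords
```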
Traditional Data Discovery • Hierarchical • Decentralized solution • Bloom filter construction • $M$ $N$-bit vectors with $M$ hash functions, e.g., $M = 3$: vectors $V_1$, $V_2$, $V_3$ + hash functions $h_1$, $h_2$, $h_3$ • For each document, hash each keyword $k$ • If $h_i(k) = j$, set $V_i[j] = 1$, … • For each node, the summary is OR($V_i$ from all documents) • When looking for a key $k$, hash it and check whether the corresponding bits in the three hash vectors are set • If yes, then $k$ is very likely to be available; otherwise, $k$ is not available • Can be false positive, no false negative • Example: $h_1$("abc") = 5, $h_2$("abc") = 2, $h_3$("abc") = 8; $h_1$("cd") = 3, $h_2$("cd") = 1, $h_3$("cd") = 8
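The construction can be sketched as below, with one bit vector per hash function as on the slide. Deriving the hash functions from salted SHA-256, the default sizes, and the class and method names are assumptions made for this illustration.

```python
import hashlib


class BloomFilter:
    """M bit vectors of N bits each, with one hash function per vector."""

    def __init__(self, m=3, n=16):
        self.m, self.n = m, n
        self.vectors = [[0] * n for _ in range(m)]

    def _hash(self, i, key):
        # i-th hash function: salt the key with the vector index (assumed scheme).
        digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.n

    def add(self, key):
        for i in range(self.m):
            self.vectors[i][self._hash(i, key)] = 1

    def might_contain(self, key):
        """False means definitely absent; True means present or a false positive."""
        return all(self.vectors[i][self._hash(i, key)] for i in range(self.m))

    def union(self, other):
        """OR two filters bit by bit (both must use the same m and n)."""
        merged = BloomFilter(self.m, self.n)
        merged.vectors = [
            [a | b for a, b in zip(va, vb)]
            for va, vb in zip(self.vectors, other.vectors)
        ]
        return merged


doc_filter = BloomFilter()
for keyword in ("abc", "cd"):
    doc_filter.add(keyword)
```

A key that was added always tests positive, so there are no false negatives; an absent key can still test positive when other keys happen to have set all of its bits, and a larger `n` makes that less likely.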
Traditional Data Discovery • Hierarchical Bloom filter • Bloom filter exchange • A node sends its Bloom filter summary to its parent • Parent keeps the Bloom filters of all children so that later it knows which children to forward a search query to • Parent summarizes the Bloom filters of all children (OR the vectors) to get one filter and sends the summarized vector to its own parent • Routing can be done on the hierarchy • Can handle any number of keywords, but filter size impacts the error rate
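Routing over the hierarchy can be sketched as follows. For brevity each filter here is a single bitmask with three hash positions rather than separate vectors, and the summaries are computed when the tree is built instead of being sent as messages; those choices, the sizes, and all names are assumptions.

```python
import hashlib

BITS, HASHES = 256, 3  # assumed filter size and number of hash functions


def bloom_bits(key):
    """Return a bitmask with HASHES bits set for key."""
    mask = 0
    for i in range(HASHES):
        digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % BITS)
    return mask


class Node:
    def __init__(self, name, keywords=(), children=()):
        self.name = name
        self.keywords = set(keywords)
        self.children = list(children)
        # Summary = own filter OR-ed with the summaries of all children.
        self.summary = 0
        for keyword in self.keywords:
            self.summary |= bloom_bits(keyword)
        for child in self.children:
            self.summary |= child.summary

    def search(self, key):
        """Return names of nodes that really hold key.

        A subtree is skipped when its summary proves the key is absent;
        a node that passes the filter still checks its actual keywords,
        so false positives cost extra hops but never wrong answers.
        """
        mask = bloom_bits(key)
        if (self.summary & mask) != mask:
            return []
        hits = [self.name] if key in self.keywords else []
        for child in self.children:
            hits += child.search(key)
        return hits


# Hypothetical two-level hierarchy.
edge_a = Node("edge-a", ["temperature", "humidity"])
edge_b = Node("edge-b", ["pressure"])
root = Node("cloud", children=[edge_a, edge_b])
found = root.search("pressure")
```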
Traditional Data Discovery • Peer-to-peer • Distributed hash table (DHT ring approach) • DHT ring with $2^m$ hash nodes • $N$ physical nodes: ($P_1$, $P_2$, …, $P_N$), $2^m \gg N$ • Each physical node • Is mapped to a hash node on the ring • Knows the hash nodes of all other physical nodes (maintained in its routing table); this is not true in some algorithms • For a document D • Hash(D.key) = K, store D at successor(K) • I.e., the first node at or after K on the ring that has a physical node • Figure: example ring with nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56
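The successor rule can be sketched with a sorted list of node positions. The ring size m = 6 is an assumption chosen to fit the node labels in the figure (N1 through N56), SHA-1 is an assumed hash, and the function names are hypothetical. The sketch also assumes every node knows all others, which is the full routing table case from the slide.

```python
import bisect
import hashlib

M = 6  # assumed: the ring has 2**M = 64 hash positions


def ring_hash(key):
    """Map a document key to a position on the ring."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def successor(node_ids, k):
    """Return the first node at or after position k, wrapping around the ring."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, k)
    return ids[i % len(ids)]


# Node positions taken from the figure labels.
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]


def store_location(doc_key):
    """Node responsible for a document: successor(Hash(D.key))."""
    return successor(NODES, ring_hash(doc_key))
```

For example, a key hashing to position 54 is stored at N56, and one hashing to 57 wraps around to N1.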