Integrating End-to-End and Cross-Layer Optimizations for Cyber-Physical Systems Nikil D. Dutt Center for Embedded Computer Systems (CECS) University of California, Irvine dutt@cecs.uci.edu http://www.cecs.uci.edu/~dutt
Where’s UC Irvine (UCI)? • Irvine: About an hour south of Los Angeles • UCI: Established 1965, fastest-growing UC campus • 27,000+ students, 2000+ faculty • Schools: Engineering, ICS, Physical Sciences, Bio, Medical, Law, Social Sciences, Humanities, Arts,… • Notables: 3 Nobel Prize winners, Top 10 Public University in USA [Map labels: Pacific Ocean; UC Irvine]
Dutt Laboratory: Who are we? Team • Ph.D. Students • Luis Bathen, Arup Chakraborty, Aseem Gupta, Gabor Madl, Jayram Moornikara, Ashok Halambi, Kazuyuki Tanimura, Jun Yong Shin • Visiting Faculty • Prof. Kiyoung Choi (SNU, Korea) • Prof. Ing-Jer Huang (National Sun Yat-sen University, Taiwan) • Visiting PhD Students • Ganghee Lee, Manwhee Jo (SNU, Korea) • Chun Hung Lai, Fu-Ching Yang, Liang-Bi Chen (National Sun Yat-sen University, Taiwan)
Dutt Laboratory: What do we do? • Research themes focused on Embedded SoCs, Distributed Embedded Systems, and Cyber-Physical Systems • System-level view of hardware and software • Tools and techniques for exploring Multi-Processor Systems-on-Chip (MPSOCs) • Compiler, simulator, estimation and validation tools • Memory and communication architecture exploration and customization • Power-aware QoS for Distributed Multimedia Platforms • End-to-end and Cross-Layer Optimizations • Power-performance-QoS tradeoffs for multimedia delivery to mobile devices • Timing and Reliability for CPS • Modeling constraints, interactions • End-to-end and Cross-Layer Optimizations
What’s that funny logo at the bottom? UCI Athletics Mascot: Peter the Anteater Inspired by the Johnny Hart comic strip, "B.C." Anteater sculpture on UCI campus
Outline • Emergency Response Technologies: CPS Exemplar • Modeling Time in CPS • Cross-Layer Optimizations: • Performance, Energy, QoS • Reliability and Fault-Tolerance • CPS Middleware • Open Issues
ERT: CPS Application Exemplar • ERT: Emergency Response Technologies • Situational Awareness for First Responders • Real-time response • Emergencies require immediate attention! • Diverse set of sensors • Cameras, motion sensing, temperature, pressure,… • Complex interactions between multiple physical and computing subsystems • Exemplar • CERT: Center for Emergency Response Technologies • Large multi-PI center at UCI • Uses physically instrumented campus as testbed • Situational Awareness Technologies for Fire Emergency Response • Specific instance of ERT in collaboration with fire and govt. agencies
CPS Exemplar: Cyber-Physical Spaces • UCI campus instrumented with sensors • Buildings: cameras, motion sensors, people counters,… • Ring mall: cameras, RFIDs,… • Built through large NSF grants: RESCUE, Responsphere • Applications • Situational Awareness • Emergency/first responder service • On-campus traffic management • Infrastructure energy management • http://www.responsphere.org
Transition to CERT Overview • Link: • http://www.cert.uci.edu • Courtesy Prof. Nalini Venkatasubramanian
CPS Exemplar as Context • Situational Awareness Technologies for Emergency Response • Diverse set of sensors • Cameras, motion sensing, temperature, pressure,… • Real-time response • Emergencies require immediate attention! • Complex interactions between multiple physical and computing subsystems • Simulations and prototypes • Exemplar • CERT: Center for Emergency Response Technologies • Large multi-PI center at UCI • Uses physically instrumented campus as testbed • Situational Awareness Technologies for Fire Emergency Response • Specific instance of ERT in collaboration with fire and govt. agencies • Today’s Lecture: • Use of Streaming Multimedia in CPS environments • Cameras, smartphones, etc. as sensors!
Modeling Time • Time is a continuous domain variable • Digital systems reason about time by discretizing time into “states” • Synchronous and Asynchronous Digital Systems • Well-defined “clock” partitions time into observable windows during which we reason about the digital system • Reasoning about time in distributed systems • Much more difficult! • Rely on notion of sequencing or causality
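The discretization described above can be sketched as follows. This is a minimal illustration, not part of the original slides: the function name `to_tick` and the 1 ms clock period are assumptions chosen for the example. Continuous time (here in integer nanoseconds) is mapped to the index of the clock window that contains it, so two events inside the same window are indistinguishable to the digital system.

```python
def to_tick(t_ns: int, period_ns: int) -> int:
    """Return the index of the clock window that contains time t_ns."""
    if period_ns <= 0:
        raise ValueError("clock period must be positive")
    return t_ns // period_ns


# Hypothetical 1 ms clock: events at 1.2 ms and 1.9 ms fall in the same
# window, so the digital system observes them as belonging to one "state".
PERIOD_NS = 1_000_000
same_window = to_tick(1_200_000, PERIOD_NS) == to_tick(1_900_000, PERIOD_NS)
```

A finer clock separates the two events again, which is one reason the choice of time granularity matters when reasoning end to end.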
Notion of Time for MM Streaming? • Proxy-based streaming • Multiple stream sources • Heterogeneous network of client devices • Each node can be a source and/or destination [Figure labels: media servers; wired network; proxy (transcoding, adaptation, etc.); AP; wireless network; clients (handheld, PC, PDA)]
End-to-End Time Scales • Multiple time scales • From ns – ms – sec
Granularity of Time Scales • Modeling End-to-End Timing
End-to-End and Cross-Layer Optimizations? • Investigate cross-layer interactions through multiple abstraction layers • Tradeoff QoS vs. energy vs. resources [Figure: abstraction layers for a server and clients 1…n. Applications: video player, other tasks. Middleware: network management, transcoding, admission control. Operating system: DVS, scheduler. H/W: CPU, network card, display, cache, memory, register files] FORGE project web site: http://www.ics.uci.edu/~forge
Tradeoff Energy, Performance, QoS across Abstraction Layers • Power-aware Distributed Embedded System framework • Exploit global changes (network congestion, system loads, mobility patterns) • Distribute local information (e.g. device mobility, residual power) for improved global adaptations • Co-ordinate power management strategies at different system levels • Maximize the utility (application QoS, power savings) of a mobile device. [Figure: local cross-layer adaptation on the device (user/application, distributed middleware, operating system, architecture) and global proxy-based adaptation over the network (proxy, directory service)] Project web site: http://www.ics.uci.edu/~forge
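One way to picture "maximize the utility of a mobile device" is the sketch below. It is a hedged illustration only: the `OperatingPoint` type, the `pick_point` function, and all numbers are hypothetical and do not come from the FORGE project. Given the device's residual energy and the time the stream must last, it picks the highest-QoS setting whose power draw fits the budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    qos: float       # normalized application quality, 0..1 (assumed scale)
    power_mw: float  # average device power at this setting


def pick_point(points, residual_mwh: float, required_hours: float) -> OperatingPoint:
    """Choose the best-QoS point that the remaining battery can sustain."""
    budget_mw = residual_mwh / required_hours
    feasible = [p for p in points if p.power_mw <= budget_mw]
    if not feasible:
        # Nothing fits the budget: fall back to the lowest-power setting.
        return min(points, key=lambda p: p.power_mw)
    return max(feasible, key=lambda p: p.qos)


# Hypothetical settings a proxy could offer to one client.
POINTS = [
    OperatingPoint("high", 1.0, 900.0),
    OperatingPoint("medium", 0.7, 600.0),
    OperatingPoint("low", 0.4, 300.0),
]
```

With 1200 mWh left and two hours of playback required, the budget is 600 mW, so the sketch selects the "medium" point. The residual energy is exactly the kind of local information the slide proposes distributing to the proxy.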
Transition to Cross-Layer Optimizations • Link: • http://www.ics.uci.edu/~xtune/ • Minyoung Kim’s PhD thesis work
What is a System? • An entire collection of components • Cyber: Hardware, software • Physical: interacting with the environment • It provides some pre-defined service to the user • System is embedded in an environment • Has • Operators • Users • They may be the same • System Provides • Feedback to the operator • Services to the user
What is a System? (cont.) • Systems are designed to meet some requirements • One such requirement is dependability • Fault-Tolerance is a means to achieve this requirement
Levels of Fault-Tolerance? • Hardware replication to recover from hardware faults • Hardware fault-tolerance • Software programming to recover from program/data corruption, transient faults • Software fault-tolerance • Computer subsystem may provide for correcting non-computer related faults • System fault-tolerance • Sensor correction, network error correction • Acknowledgements, checksums and so on • Fault-tolerance at all levels must be consistent with each other!
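The "checksums" item above can be illustrated with a small sketch. The function names `add_checksum` and `verify` and the 4-byte trailer layout are assumptions made for this example; only `zlib.crc32` is a real standard-library call. The sender appends a CRC-32 to each payload, and the receiver rejects any frame whose checksum does not match, which is where an acknowledgement or retransmission scheme would take over.

```python
import zlib
from typing import Optional


def add_checksum(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify(frame: bytes) -> Optional[bytes]:
    """Return the payload if the checksum matches, otherwise None."""
    if len(frame) < 4:
        return None
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    return payload if zlib.crc32(payload) == received else None
```

A checksum only detects corruption. Recovering from it still needs redundancy elsewhere, for example a retransmission, which is why the levels have to be designed consistently.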
Failures and Faults • Failure • service delivered to the user deviates from compliance with the system specification for a specified period of time • Fault • Reason for failure (causal view) • Alternative views of faults: as failures in other systems that interact with the system under consideration • a subsystem internal to the system under consideration, • a component of the system under consideration, • an external system that interacts with the system under consideration (the environment)
Failures and Faults (cont.) • A Fault is a Failure of • a component of the system, • a subsystem of the system, or • another system which has interacted or is interacting with the considered system • Every fault is a failure from some point of view. • A fault can lead to other faults, or to a failure, or neither • A system with faults may continue to provide its service, that is, not fail. • Such a system is said to be fault tolerant • The observable effect of a fault at the system boundary is called a symptom. The most extreme symptom of a fault is a failure!
An Example • System • Embedded computer system executing a program that controls the temperature of a boiler by calculating the burner’s firing rate • Examples of Faults • If a bit in memory is stuck at one: memory fault • If the memory fault affects the program’s operation such that the boiler temperature rises above normal zone, that is a computer system failure and a fault in the overall boiler system • If there is a gauge showing the temperature of the boiler, and its needle moves into the "yellow" zone (abnormal, but acceptable), that is a symptom of the system fault • if the boiler explodes because of the faulty firing calculation, that is a (catastrophic) system failure.
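The memory fault in this example can be sketched in a few lines. This is an illustration under assumptions: the function name `read_stuck_at_one` and the sample values are hypothetical. A stuck-at-one bit forces one bit of every word read back to 1, so the fault corrupts some stored values and leaves others untouched.

```python
def read_stuck_at_one(stored: int, bit: int) -> int:
    """Value read back from a memory word whose given bit is stuck at 1."""
    return stored | (1 << bit)


# Hypothetical firing-rate values held in the faulty word (bit 6 stuck).
corrupted = read_stuck_at_one(0b0010_0000, 6)   # 32 is read back as 96
unaffected = read_stuck_at_one(0b0110_0000, 6)  # 96 already has bit 6 set
```

The second read shows a fault that is present but produces no wrong value, matching the earlier point that a fault may lead to a failure or to neither.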
Example (cont.) • Causes for Fault in the memory • Chip used might not have been manufactured to specification (manufacturing fault) • The hardware design may have caused too much power to be applied to the chip (system design fault) • The chip’s design may be prone to such faults (chip design fault) • a field engineer may have inadvertently shorted two lines while performing preventive maintenance (maintenance fault) • Transient errors
Fault Tolerance: Redundancy Management • Redundancy is the provision of functional capabilities that would be unnecessary in a fault-free environment • Redundancy is necessary but not sufficient for fault-tolerance • computer system may provide redundant functions or outputs • at least one result is correct in the presence of a fault • if the user must somehow examine the results and select the correct one, then the system is not FT • if the computer system correctly selects the correct redundant result for the user, then the system is FT
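The last two bullets distinguish a system that leaves the choice among redundant results to the user from one that makes the choice itself. A majority voter is one common way to do the latter. The slides do not name a specific mechanism, so the sketch below, including the function name `majority_vote`, is an assumption for illustration.

```python
from collections import Counter


def majority_vote(results):
    """Return the value produced by a strict majority of redundant units."""
    if not results:
        raise ValueError("no results to vote on")
    value, count = Counter(results).most_common(1)[0]
    if 2 * count <= len(results):
        # No strict majority: the fault cannot be masked by voting alone.
        raise RuntimeError("no majority among redundant results")
    return value
```

With three redundant units, `majority_vote([42, 42, 7])` returns 42, masking the single faulty result. If all three disagree, the voter raises an error, which amounts to fault detection without masking.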
Techniques for Redundancy Management • Fault Detection • determining that a fault has occurred. • Fault Diagnosis • determining what caused the fault, or exactly which subsystem or component is faulty • Fault Containment • preventing the propagation of a fault from one point in a system to a point where it can have an effect on the service to the user. • Fault Masking • ensuring that only correct values get passed to the system boundary in spite of a failed component • Fault Compensation • it may be necessary for the system to provide a response to compensate for output of the faulty subsystem. • Fault Repair • the process in which faults are removed from a system
Cross-Layer Reliability in CPS Context • Information within CPS • Lossless • Lossy • Errors can be exploited in Lossy context • Example: Streaming Multimedia • Exploit errors to provide “slack” for another dimension • Exploit errors between abstraction layers • Example: Tradeoff between QoS and Errors
Transition to Cross-Layer Reliability • Kyoungwoo Lee’s ACM MM08 Presentation • Link: • http://www.ics.uci.edu/~kyoungwl/
CPS Middleware? • Middleware is the software between the application programs and the operating system and base networking • Middleware provides a comprehensive set of higher-level distributed computing capabilities and a set of interfaces to access the capabilities of the system.
Distributed Systems Middleware • Enables the modular interconnection of distributed software • abstract over low level mechanisms used to implement resource management services. • Computational Model • Support separation of concerns and reuse of services • Customizable, Composable Middleware Frameworks • Provide for dynamic network and system customizations, dynamic invocation/revocation/installation of services. • Concurrent execution of multiple distributed systems policies. • How is this applied in CPS context?
Transition to SATWARE • SATware: Middleware for Sentient Spaces • Link: • http://www.satware.ics.uci.edu • Courtesy Prof. Nalini Venkatasubramanian and Prof. Sharad Mehrotra
Physical Systems and Software Design • Physical Systems: “Traditional” Engineering View • Build to spec • Spec includes usage scenarios, tolerances, margins • Use basic engineering principles • Laws of physics, thermodynamics, SOM, etc. • Validate design • Simulations and prototypes • Software (CSE) view • Build to spec • Spec includes usage scenarios, tolerances, margins • Use basic CSE principles • Abstractions (formal), modularity, composition, virtualization, estimation,… • Validate design • Formal validation, simulations and prototypes
PS and Software Design: What’s different? • “Traditional” Engineering View of PS • Models tightly integrate physical attributes: motion, pressure, temperature, etc. with some notion of “time” • Engineering hierarchies • Engineering tradeoffs: size, reliability, (real) costs, … • Complex compositions to create “systems” • Software/Embedded Real-Time Systems (ERTS) view • Models time and computation • Abstract away physical detail: are we omitting critical information? • CS hierarchies • Do these match the engineering hierarchies? • Validation • Formal models of interactions with PS?
Do PS and Software models mesh? • Clearly application/domain dependent • PS careabouts vary widely: • Motion, pressure, temperature, etc. • Scale and precision of data • Design goals and constraints • Study PS for domain-specific abstraction hierarchies • Complex (hierarchical) abstractions • How do they mesh/dovetail with SW abstraction hierarchies? • Possible to create a “physically-aware” virtualization layer? • Generic abstractions for primitive interactions of PS with SW • Layered Domain-specific models • Software for composition • Formal reasoning, predictability analysis
Revisit End-to-End ERTS Guarantees • Augment existing ERTS infrastructures • Physical/spatial/temporal interactions • Define domain-specific primitives that succinctly capture physical phenomena • Establish relationships between PS “knobs” and ERTS “knobs” • ERTS: events, data, control • PS: Input-output relationships • Formalize these, if possible • End-to-end data integrity • Move data across • Multiple abstraction domains • sensors/actuators, HW, OS, networking, middleware • Multiple time scales • ns, µs, ms, seconds, minutes, …
Closing Thoughts on Software and CPS • What’s Cyber about Physical? • What’s Physical about Cyber? • Cyber-Physical Software!
Many Open Issues for Cyber-Physical SW • Separation of concerns • Constraint-critical/sensitive (e.g., timing, reliability…) vs. Functional • Partitioning of models to contain analyses • Concurrent/codesign of these artifacts • Move towards “Physically-aware” Virtualization • Generic abstractions for primitive interactions of PS with SW • Layered Domain-specific models • Software abstractions to compose models • Formal models to allow correct composition • Formal reasoning about timing across layers • Disparate time scales provide design “slack” • Example: 1 ms slack at the HW abstraction layer: huge opportunities for optimizations • V+V about timing, physical constraints across layers • Verification • Probabilistic? • Validation • Use cases, certification, …
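To make the "1 ms slack" example concrete, the arithmetic can be sketched as below. The function name `cycles_in_slack` and the 500 MHz clock are assumptions for illustration; the slides give no clock frequency.

```python
def cycles_in_slack(slack_ns: int, clock_hz: int) -> int:
    """Number of whole clock cycles that fit into the given slack."""
    return slack_ns * clock_hz // 1_000_000_000


# 1 ms of slack on an assumed 500 MHz processor.
cycles = cycles_in_slack(1_000_000, 500_000_000)
```

Under that assumption, 1 ms corresponds to 500,000 cycles. A delay that is barely noticeable at the application layer is therefore a large budget at the hardware layer, for example for running at a lower voltage and frequency.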
Acknowledgements • CERT, Responsphere, RESCUE, SATWARE • Profs. Nalini Venkatasubramanian and Sharad Mehrotra • xTune Project • Dr. Minyoung Kim, Prof. Nalini Venkatasubramanian, Dr. Carolyn Talcott (SRI) • Cross-Layer Error Mitigation • Dr. Kyoungwoo Lee, Prof. Nalini Venkatasubramanian, Prof. Aviral Shrivastava (ASU) • Members of the Dutt and Venkatasubramanian Labs
Modeling Time and Causality • Given 2 events: • e1: somebody enters a room • e2: the telephone rings • Consider the following cases: • e2 occurs after e1 • e1 occurs after e2 • Both events are temporally ordered • Is there a causal ordering?
Actions, Events, and Order • An action is a function or task performed by a system • An event is an instance of an action • instances are commonly labeled using time stamps and action values. • An order is a binary relation between two events • Two events are temporally ordered if the respective time instants are not identical • Two events are causally ordered if one event is caused by the other • induced by order on respective actions
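These definitions can be sketched in code. The sketch is illustrative only: the `Event` type, the two predicates, and the `causes` set of (cause action, effect action) pairs are hypothetical names, with `causes` standing in for the order on actions that induces causal order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    action: str          # the action this event is an instance of
    timestamp: float     # time stamp labelling the instance
    value: object = None


def temporally_ordered(a: Event, b: Event) -> bool:
    """Two events are temporally ordered if their time instants differ."""
    return a.timestamp != b.timestamp


def causally_ordered(a: Event, b: Event, causes) -> bool:
    """True if one event's action causes the other's and the cause comes first."""
    first, second = (a, b) if a.timestamp <= b.timestamp else (b, a)
    return (first.action, second.action) in causes


e1 = Event("enter_room", 1.0)
e2 = Event("phone_rings", 2.0)
```

Here `e1` and `e2` are temporally ordered, but with no causal relation declared between entering a room and a phone ringing they are not causally ordered. Timestamps alone cannot answer the causality question.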
Clocks • Clocks: • An abstraction (physical, logical, virtual) that measures time • Impossible for all nodes in a distributed ES to have exactly the same clock time • However • we want the CPS (i.e., a distributed ES) to behave predictably, and synchronize at specific instances of time! • How to do this? Use notions of: • Physical clock • Reference clock • Global clock (virtual time) for synchronization • Logical clocks (for causality)
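The slide lists logical clocks for causality without naming a scheme. A Lamport-style clock is one well-known kind, sketched below as an assumption rather than as the method used in this work. Each node keeps a counter, increments it on local events and sends, and on receive jumps past the timestamp carried by the message.

```python
class LamportClock:
    """Per-node logical clock (Lamport-style counter)."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Advance the clock and return the timestamp to attach to a message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """Merge the sender's timestamp so the receive is ordered after the send."""
        self.time = max(self.time, msg_time) + 1
        return self.time
```

If node A performs one local event and then sends (timestamp 2), node B's receive gets timestamp 3 even though B's own counter was still 0. The nodes need no agreement on physical time, yet every effect carries a larger timestamp than its cause.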