This article discusses the importance of optimizing data centers, IoT, and networks for application performance, latency, and availability. It explores topics such as failure detection, deterministic latency bounding, and integrating application code into network systems. The article also highlights the similarities between data centers and IoT, and addresses technical considerations for the security and access control of IoT devices.
Three topics – 1. Data Centers, 2. IoT, 3. Networks for Dev/Disasters Jon Crowcroft, http://www.cl.cam.ac.uk/~jac22
1. Systems (th)at Scale: Data Centers Jon Crowcroft, http://www.cl.cam.ac.uk/~jac22
Data Centers don’t just go fast • They need to serve applications • Latency, not just throughput • Face users • Web, video, ultrafast traders/gamers • Face analytics… • Availability & failure detectors • With many different paths • Application code within the network • On a NIC, on a host, or in a switch
Industry (see pm) Azure: http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/keynote.pdf Facebook: http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p123.pdf Google: http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p183.pdf
2. Deterministic latency bounding • Learned that what I was teaching was wrong! • I used to say: • Integrated Services too complex • Admission & scheduling hard • Priority queues can’t do it • PGPS computation for latency? • I present the Qjump scheme, which • Uses IntServ (PGPS)-style admission control • Uses priority queues for service levels • http://www.cl.cam.ac.uk/research/srg/netos/qjump/
Data Center Latency Problem • The tail of the distribution, • due to long/bursty flows interfering • Need to separate classes of flow • Low-latency flows are usually short flows (or RPCs) • Bulk transfers aren’t so latency/jitter sensitive
Data Center Qjump Solution • In the data center, not the general Internet! • can exploit knowledge of topology, • the traffic matrix, & • source behaviour • A regular, simpler topology is key • But also a largely “cooperative” world…
Qjump – two pieces • At network config time • Compute a set of (8*) rates based on • Traffic matrix & hops => fan-in (f) • At run time • A flow assigns itself a priority/rate class • and is subject to a (per-hypervisor) rate limit – see the sketch below • * 8 is arbitrary – but often hardware-supported
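A minimal sketch of the config-time computation, in plain Python. The geometric spacing between the bulk level and the guaranteed level, and all names, are illustrative assumptions, not the published Qjump algorithm; the real scheme derives its rates from the topology and traffic matrix as described in the paper.

```python
# Hypothetical sketch of Qjump-style rate assignment (illustrative only).
# Level 7: guaranteed latency -- every sender is throttled so that even if
# all fan_in hosts transmit at once, the bottleneck link is never
# oversubscribed. Level 0: unrestricted bulk traffic. Intermediate levels
# trade rate for weaker latency guarantees.

def qjump_rates(link_rate_gbps: float, fan_in: int, levels: int = 8) -> list[float]:
    """Return a per-sender rate limit (Gbps) for each priority level."""
    guaranteed = link_rate_gbps / fan_in          # worst-case safe rate
    rates = []
    for level in range(levels):
        # Interpolate geometrically between bulk (full rate) and guaranteed.
        frac = level / (levels - 1)
        rates.append(link_rate_gbps * (guaranteed / link_rate_gbps) ** frac)
    return rates

# Example: 10 Gbps links, 48 hosts fanning in to one link.
for lvl, r in enumerate(qjump_rates(10.0, 48)):
    print(f"level {lvl}: limit {r:.3f} Gbps")
```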
Failure Detectors • 2PC & the CAP theorem • Recall CAP (Brewer’s hypothesis) • Consistency, Availability, Partition tolerance • Strong & weak versions! • If we have a deterministic network & node failure detector, it isn’t necessarily so! • What can we use a CAP-able system for? – toy sketch below
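A toy sketch of why the hard latency bound matters here: if the network guarantees delivery within a known bound, a missed heartbeat deadline is proof of failure rather than mere suspicion, which is exactly what 2PC-style protocols need to avoid blocking. The class and its parameters are assumptions for illustration.

```python
import time

class PerfectFailureDetector:
    """Toy detector: under a hard network latency bound, silence == failure.

    Assumes heartbeats are sent every `period` seconds over a channel with
    a guaranteed (Qjump-style) one-way delay of at most `bound` seconds.
    """

    def __init__(self, period: float, bound: float):
        self.deadline = period + bound   # latest moment a live peer's beat arrives
        self.last_beat: dict[str, float] = {}

    def heartbeat(self, node: str) -> None:
        self.last_beat[node] = time.monotonic()

    def has_failed(self, node: str) -> bool:
        # With a deterministic delay bound this is a *perfect* detector:
        # no false suspicions, so 2PC/consensus need not block on ambiguity.
        last = self.last_beat.get(node)
        return last is None or time.monotonic() - last > self.deadline
```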
Consistent, partition-tolerant app? • Software-Defined Network update! • Distributed controllers hold distributed rules • Rules change from time to time • Need to update them consistently • Need the update to work in the presence of partitions • By definition! • So Qjump may let us do this too – schematic below!
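One standard way to make such an update consistent is the two-phase, version-tagged scheme (in the style of Reitblatt et al.’s consistent updates); the sketch below is schematic, with the Switch stand-in and its method names invented for illustration, and it reuses the toy failure detector idea from above.

```python
from dataclasses import dataclass, field

@dataclass
class Switch:
    """Minimal stand-in for a programmable switch (illustrative)."""
    name: str
    is_ingress: bool = False
    rules: dict = field(default_factory=dict)   # version tag -> rule set
    stamp: int = 0                              # version stamped at ingress

    def install(self, rules, version): self.rules[version] = rules
    def remove(self, version): self.rules.pop(version, None)
    def set_stamp(self, version): self.stamp = version

def consistent_update(switches, new_rules, v_old, v_new, detector):
    """Two-phase, version-tagged rule update (schematic)."""
    for sw in switches:                       # phase 1: stage rules for v_new
        sw.install(new_rules, version=v_new)
    # A deterministic (Qjump-backed) failure detector distinguishes a
    # partitioned switch from a merely slow one, so we can roll back safely.
    if any(detector.has_failed(sw.name) for sw in switches):
        for sw in switches:
            sw.remove(version=v_new)
        return False
    for sw in switches:                       # phase 2: flip ingress stamping;
        if sw.is_ingress:                     # in-flight v_old packets still
            sw.set_stamp(v_new)               # match old rules, so no packet
    for sw in switches:                       # ever sees a mixed rule set
        sw.remove(version=v_old)              # garbage-collect the old rules
    return True
```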
3. Application code -> Network • The last piece of the data center working for the application – not just SDN… • Switch and host NICs have a lot of smarts • Network processors, • like GPUs or (net)FPGAs • Can they help applications? • In particular, avoid pathological traffic patterns (e.g. TCP incast)
Application code • E.g. the shuffle phase in map/reduce • Does a bunch of aggregation • (min, max, avg) on a row of results • And is the cause of traffic “implosion” • So do the work in stages, in the switches in the net (like merge sort!) – sketch below • The code is very simple • Cross-compile it onto switch/NIC CPUs
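A sketch of the staged-aggregation idea: each switch stage merges the partial (min, max, sum, count) records flowing through it, so the final link carries one record per subtree rather than one per mapper. The record format and tree layout are assumptions for illustration.

```python
from functools import reduce

# One (min, max, sum, count) record per mapper; switches merge records
# pairwise, so aggregation happens in stages up the tree (like merge sort)
# and the root link carries one record instead of thousands.

def merge(a, b):
    lo_a, hi_a, s_a, n_a = a
    lo_b, hi_b, s_b, n_b = b
    return (min(lo_a, lo_b), max(hi_a, hi_b), s_a + s_b, n_a + n_b)

def switch_aggregate(records):
    """What each switch stage does to the records passing through it."""
    return reduce(merge, records)

# Leaf mappers emit (min, max, sum, count) over their own rows.
mappers = [(3, 9, 30, 5), (1, 7, 20, 4), (2, 8, 40, 6)]
lo, hi, total, n = switch_aggregate(mappers)
# Average is carried as (sum, count) so partials compose correctly;
# an average of averages would not.
print(f"min={lo} max={hi} avg={total / n:.2f}")
```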
Other application examples • There are many… • They arose in Active Networks research • Transcoding • Encryption • Compression • Index/search • Etc., etc.
Need a language to express these • Finite iteration • (not a Turing-complete language) • So design a Python-like language – with strong types! – sketch below • Work in progress in the NaaS project at Imperial and Cambridge…
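A sketch, in ordinary Python, of the “finite iteration only” restriction: all looping goes through a bounded combinator, so every program provably terminates. The combinator and its name are assumptions for illustration; the actual NaaS language design may differ.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def bounded_fold(limit: int, step: Callable[[T, int], T], init: T) -> T:
    """Apply `step` exactly `limit` times -- the only loop the language offers.

    Because `limit` is a compile-time constant and there is no general
    recursion, every program terminates: the language is deliberately not
    Turing-complete, so switch CPUs can bound its run time in advance.
    """
    acc = init
    for i in range(limit):
        acc = step(acc, i)
    return acc

# Example: sum the first 8 header words of a packet (fixed bound, so safe).
packet = list(range(100))
checksum = bounded_fold(8, lambda acc, i: acc + packet[i], 0)
print(checksum)   # 0+1+...+7 = 28
```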
Conclusions/Discussion • The data center is a special case! • It’s important enough to tackle • We can hard-bound latency easily • We can detect failures and therefore solve some nice distributed consensus problems • We can optimise applications’ pathological traffic patterns • Integrate programming of net & hosts • Weird new h/w… • Plenty more to do…
2. IoT is similar! Keynote for IoTDI, Berlin, 4 April 2016 • Jon Crowcroft • jon.crowcroft@cl.cam.ac.uk • Christopher Millard • c.millard@qmul.ac.uk • Ian Walden • i.n.walden@qmul.ac.uk
Similarity to the data center • Many-to-one traffic • Care about latency • for feedback control loops • Potentially many routes • e.g. in a smart city • What’s different?
Clouds of Things: Technical Considerations (Confidentiality, Integrity, Availability) • Secure communications (C, I): Work is advanced and existing techniques can be leveraged. IoT could benefit from lighter-weight schemes, particularly where cryptography is involved. • Access controls for IoT-Cloud (C): Standard mechanisms can be used. IoT adds complexity due to the scale and dynamism of ‘thing’ access. • Identifying sensitive data (C): Largely a non-technical concern, but has an impact on how policies are defined. • Public, private or hybrid? (C, A): Currently blunt partitioning is supported, but emerging research will allow for more flexible deployments that facilitate data sharing. • In-cloud data protection (C): There are strong isolation techniques available and providers employ general access controls. More flexible approaches are needed for inter-application sharing to be possible (see ‘In-cloud data sharing’, below). • In-cloud data sharing (C, A): Inter-application sharing is needed for IoT but currently is not part of the cloud philosophy. • Encryption by ‘things’ (C, I): Encryption techniques are mature, but this approach precludes most computations on protected data and involves complex key management. Ongoing work on homomorphic encryption will assist. Lightweight encryption mechanisms are being developed and will require robust testing and analysis.
Clouds of Things: Technical Considerations (Confidentiality, Integrity, Availability) • Data combination (C): Some techniques exist to prevent user re-identification, but much more work is needed. • Identifying ‘things’ (C): Existing work on identity management can be leveraged for IoT, but more experience at a larger scale is needed to determine suitability and/or limitations. • Identifying the provider (C): The basic issues are mostly architectural or configuration concerns. Some outstanding issues remain when resources are shared or where decisions need to be made at runtime. • Increase in interactions and data load (A): Cloud services manage elasticity well, but resource expansion is not unlimited. Peak IoT loads are unknown, but possibly controlled by economics (payment/ownership). • Logging at large scale (C, I, A): Currently logging is low-level and system-centered. More work is needed on logging and processing tools for applications and users. • Malicious ‘things’—protection of provider (C, I, A): Existing techniques can be deployed. • Malicious ‘things’—protection of others (C, I, A): There are potentially techniques that can assist. Experience is needed of cloud services operating across IoT subsystems.
Clouds of Things: Technical Considerations (Confidentiality, Integrity, Availability) • Certification of cloud service providers (C, I, A): This is currently manual and static, leading to delays when updates are required. Research is needed on automatic certification processes, possibly including hardware-based solutions. • Trustworthiness of cloud services (C, I, A): An emerging field with ongoing research. Experience of practical implementation is needed. • Demonstrating compliance using audit (C, I, A): Currently, the compliance of cloud providers with their contractual obligations is not demonstrated convincingly. Research is needed, and IoT will add additional complexity. • Responsibility for composite services (C, I, A): The legal implications of the use of third-party and other services are unresolved. Such usage is not as yet transparent to tenants and/or clients. More work is needed concerning user- and application-level policy aspects. • Compliance with data location regulations (C, I, A): Currently not enforceable except at coarse granularity. There is research in IFC that can assist, but the concepts are not yet commercially deployed. • Impact of cloud decentralisation (C, I, A): This is an emerging field, where the current focus is on functionality. More attention is needed regarding security.
3. Networking for Development Jon Crowcroft http://www.cl.cam.ac.uk/~jac22 Jon.crowcroft@cl.cam.ac.uk
Disasters happen • Read this book: A Paradise Built in Hell, R. Solnit: • http://www.amazon.co.uk/Paradise-Built-Hell-Extraordinary-Communities/dp/0143118072/ • Lots of nice examples, but the subtitle matters: The Extraordinary Communities That Arise in Disaster • Before first responders get there (>72 hrs), people self-organise! • Then the emergency services arrive, and mess things up badly – can we help?
Yes we can • Lots of useful tech • Twimight, FireChat, Haggle, ThingBox etc. • Not only improve community creation/organisation • Create situational awareness • E.g. maps of dry land, clean water, food/drink etc. • Do (distributed) pub/sub/cloud – toy sketch below • Then first responders start with a clue • Note – this has been done manually • (see Haiti earthquake / Google Maps / Twitter reports of safe buildings / triage data on who to rescue etc.) • So what are the challenges? IoT-like…
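A toy sketch of the pub/sub idea for situational awareness: survivors publish observations under topics, and responders subscribe so they arrive with a picture already formed. The broker here is centralised for brevity; in a disaster setting it would be distributed and delay-tolerant (e.g. over DTN or opportunistic links). All names are invented for illustration.

```python
from collections import defaultdict

class DisasterBus:
    """Toy topic-based pub/sub broker for situational-awareness reports."""

    def __init__(self):
        self.subs = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subs[topic].append(callback)

    def publish(self, topic, report):
        # Fan the report out to everyone who cares about this topic.
        for cb in self.subs[topic]:
            cb(report)

bus = DisasterBus()
bus.subscribe("clean-water", lambda r: print("responder sees:", r))
bus.publish("clean-water", {"where": (52.2, 0.12), "status": "well OK"})
```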
Challenges • The transition from small groups to large ones, where social ties stretch… • …and break • (when society invented money and policemen) • So can we mitigate this? • Digital societies are being tried • Old2new – Usenet, wiki, Liquid, Occupy
New Tech mixes • TVWS + network coding + Named Data • Or DTN and ICN • Or nano-containers…
Summary • Three areas continue to raise challenges • Surprisingly, some overlap (e.g. path diversity, reducing latency) • Some differ (security, power)