Naturally Adaptive ICT Mark Shackleton Pervasive ICT Research Centre BT Research & Venturing mark.shackleton@bt.com
Talk outline • BT Research & Venturing • Why BT is interested in Nature-inspired Computing and Communications • Adaptive and Autonomic Systems • Some autonomic Self-* examples: • “Fly Phones”: self-configuring channel allocation for mobile networks using principles from developmental biology • “Self Service”: an autonomic protocol for dynamic service provision • “Embryo”: a fully decentralised, autonomic service management framework, inspired by morphogenesis • From nature-inspired heuristics to engineering and design principles
BT Research & Venturing • BT’s Research facility • Provides innovation and R&D for BT’s lines of business • About 300 people based at: • Adastral Park, Martlesham Heath, Ipswich, Suffolk • My team: nature-inspired & adaptive solutions
Today’s Problems with IT / ICT • ICT systems are becoming so complicated that they are becoming largely unmanageable • sheer scale of current and envisaged ICT deployments • heterogeneity of the underlying infrastructure that nobody (“no single person”) can understand • unanticipated and unwanted interactions between components • These add up to frequent failures or sub-optimal system-level behaviour, and costly, error-prone system administration • “Nature-inspired Computing” and more specifically “Autonomic Computing” - by analogy to the human autonomic nervous system, which regulates “basic” functions such as blood pressure, heart rate and breathing
“Autonomic Computing” TCO, Resilience, Telephony, SysAdmin->BusGoals
Autonomic Computing • Self-configuring • adaptation to IT system changes, such as new nodes becoming available or going offline • Self-optimising • tuning resources and load balancing • Self-protecting • guard against damage from attacks or failures • Self-healing • recovery from, or work around, failed components
BT’s interest in Nature-inspired and Autonomic Solutions • The Digital Networked Economy will require the support of a highly adaptive underlying ICT infrastructure. • This will be dynamic, heterogeneous and support multiple domains of ownership and control. • It will need to adapt to transient changes in demand as well as longer-term usage trends. • It will embed “autonomic behaviour” that reduces deployment and running costs, whilst enhancing resilience.
Adaptive / Autonomic ICT - decentralised, autonomic, lightweight • Self-managing Peer-to-Peer and Decentralised architectures • Complex systems engineering & nature-inspired solutions • From: complex & costly manual management • To: self-managing “autonomic” ICT solutions • And: resilient provision of services in a complex Pervasive ICT world • Self-healing ICT “immune systems” [Figure label: network]
“Autonomic Computing” …a nature-inspired analogy!
The Autonomic Analogy “..we should not only consider what the autonomic nervous system does but also how it does it… Successful self-management lies in the way [biological systems] achieve this functionality.” From: O. Babaoglu, M. Jelasity, A. Montresor. Grassroots Approach to Self-Management in Large-Scale Distributed Systems. In Proceedings of the EU-NSF Strategic Research Workshop on Unconventional Programming Paradigms, Mont Saint-Michel, France, 15-17 September 2004. Architectures & principles + Specific algorithms
Adaptive Systems versus Elements [Diagram labels: local entity, rules, interactions, global behaviour, tune/control]
Novel design principles... Problem statement: Increasingly complicated (diverse, dynamic and seemingly unpredictable) ICT systems have created a management crisis, with serious reliability issues and a severe loss of confidence from the average user. Traditional solution: Reinforce central control, artificially reduce diversity through enforcement of restrictive usage policies, “fight” emergent system properties by blocking unsupervised (e.g. P2P) interactions. Consequences: Escalating cost of ownership, waste of computing resources, lost opportunities for innovative use of technology. Alternative solution: Reject “complication”, embrace “complexity”. Adopt novel design principles that take advantage of emergent phenomena, learn to rely on statistical rather than deterministic predictability, focus on developing methods to “promote” desirable system properties rather than on “micro-management”.
Some (NI) Design Heuristics (1) • Local rules - wherever possible use local rules and decision making to achieve overall behaviour • Interactions - by combining local decision making with carefully crafted interactions between neighbourhood nodes/entities the desired global behaviour can often be achieved • Positive and negative feedback - biological systems make extensive use of feedback to control processes and achieve robust design of structure and behaviour
Some (NI) Design Heuristics (2) • Decentralised solutions - often a given problem is in essence a decentralised problem - in this case a decentralised solution may be well matched • In addition, nodes in a decentralised system often "bring their own resources" which can help provide a scalable solution • Engineered-in behaviour versus explicit external control - where possible it is preferable to embody some management within the system itself • Policy-based management is still appropriate and possible via tuning parameters and via the system's in-built adaptability
Some examples using these approaches to create adaptive solutions...
Frequency allocation - the problem • Interference between neighboring base stations should be prevented. • The number of available frequencies is limited. • Bandwidth has to be very carefully distributed between adjacent cells.
Cell differentiation for fruit flies • Most cells have the potential to make bristles. • They all start to express the corresponding gene (greyscale = density of the associated transcription factor) • But they gradually specialise until only a few actually develop a bristle. • Clearly a self-organised process.
Underlying mechanism • This process obeys a fully decentralised control mechanism. • It involves a local positive feedback loop… • Coupled with cross-inhibition of neighboring cells (Delta-Notch signalling)
Why is an original solution needed? • This is in fact a very complex problem: • The base stations are not regularly distributed (i.e. they don’t have the same size and/or number of neighbours) • The continuously fluctuating traffic must be taken into account. • A centralised decision process is not particularly well adapted… • But a system allocating frequencies on the basis of local competition between cells is.
The “Flyphones” algorithm • The equivalent of the natural feedback loop is implemented for each available frequency. • Through the “negotiation” process, each base station starts to develop a preference for some frequencies (moving away from the unstable equilibrium)… • And simultaneously inhibits its neighbours from using them.
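The feedback loop described on this slide can be illustrated with a toy model. The sketch below is not the Flyphones algorithm itself: the update rule, the `gain` and `inhibition` parameters and the function names are all assumptions chosen to show how local positive feedback plus cross-inhibition pushes neighbouring stations away from an unstable equilibrium.

```python
def flyphones_step(pref, neighbours, gain=0.1, inhibition=0.2):
    """One synchronous update of per-station, per-frequency preferences in [0, 1].

    Hypothetical rule (an assumption, not taken from the slides):
    a preference reinforces itself (positive feedback) and is suppressed
    by the strongest preference for the same frequency among neighbours.
    """
    new_pref = {}
    for station, levels in pref.items():
        updated = []
        for f, level in enumerate(levels):
            # Cross-inhibition: strongest neighbouring claim on frequency f.
            pressure = max((pref[n][f] for n in neighbours[station]), default=0.0)
            # Local positive feedback minus inhibition from neighbours.
            delta = gain * level * (1.0 - level) - inhibition * pressure * level
            updated.append(min(1.0, max(0.0, level + delta)))
        new_pref[station] = updated
    return new_pref


def run(pref, neighbours, steps=500):
    """Iterate the update rule for a fixed number of steps."""
    for _ in range(steps):
        pref = flyphones_step(pref, neighbours)
    return pref


if __name__ == "__main__":
    neighbours = {"A": ["B"], "B": ["A"]}
    # Two stations start almost undecided between two frequencies.
    start = {"A": [0.51, 0.49], "B": [0.49, 0.51]}
    for station, levels in run(start, neighbours).items():
        print(station, [round(v, 3) for v in levels])
```

With inhibition stronger than gain, the tiny initial bias is amplified until each station claims a different frequency. Traffic demand and the limit on frequencies per station are deliberately left out of this sketch.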
Flyphones • Nature-inspired innovation • Self-organising • Self-healing • Micro Cell Decisions • Macro Result • ‘Autonomic’ Network • Dynamic • Scale-independent • Distributed • Self-organising • Self-healing
58 base stations, 4 from 29 channels: 10^253 solutions • 680 base stations, 6 from 42 channels: 10^4031 solutions
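The size of the first search space can be checked with a short calculation. The sketch assumes that each base station independently picks its channels from the full set, so the count is C(channels, per_station) raised to the number of stations; that counting model and the function name are assumptions, not something the slide states.

```python
import math


def search_space_exponent(stations, channels, per_station):
    """Return floor(log10(N)) for N = C(channels, per_station) ** stations.

    Assumed counting model: every station chooses its channel subset
    independently of the others.
    """
    total = math.comb(channels, per_station) ** stations
    # Python integers are exact, so the digit count gives the exponent.
    return len(str(total)) - 1


if __name__ == "__main__":
    # 58 base stations, each using 4 of 29 channels.
    print(search_space_exponent(58, 29, 4))  # prints 253
```

A space of roughly 10^253 configurations is far too large to enumerate, which is the motivation for a local, self-organising search.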
Applying FlyPhones • Field Network Radio scenario: 600 lines of communication, 250 channels, mobile transceivers • Currently solved centrally, statically (obvious difficulties!) • FlyPhones can solve it too • Contract research to use FlyPhones in dynamic management
Ex#2: SelfService - an autonomic protocol for dynamic service provision Telco network->ICT services; Bgd; NI
“Alone in the world” (~PC model) [Diagram: nodes A, B and C with services x, y and z. Legend: need for service x; installed module x]
Pros and cons • Highly robust to node or network failure. • End user has total control. • Need a lot of onboard power (good for hardware manufacturers!). • Need many copies of every application (good for software manufacturers!). • Need virtually no ICT infrastructure (not good for service providers!). • Amazing waste of resources (“99% idle time” syndrome).
“Thin client” [Diagram: nodes A, B, C and D with services x, y and z. Legend: need for service x; installed module x; client-server relationship]
Pros and cons • Extremely brittle (single point of failure!) • Administrator has total control. • Mixed picture for the hardware industry (need powerful servers, but only low-end PCs) • Mixed picture for the software industry (depends on license management). • Mixed picture for ICT providers (network services are paramount, but risk of bottlenecks and QoS degradation).
Quotes (from IBM’s Autonomic Computing Manifesto) • “An autonomic computing system knows its environment and the context surrounding its activity, and acts accordingly (…) It will tap available resources, even negotiate the use by other systems of its underutilized elements, changing both itself and its environment in the process.” • “An autonomic computing system cannot exist in a hermetic environment (…) Standard ways of system identification, communication and negotiation – perhaps even new classes of system-neutral intermediates or ‘agents’ specifically assigned the role of cyber-diplomats to regulate conflicting resource demands – need to be invented and agreed on.” • N.B. Plus industry trends: SOA, P2P, Grid, Web services
“SelfService” [P2P service provision] [Diagram: nodes A, B and C with services x, y and z. Legend: need for service x; installed module x; service-specific client-server relationship]
Pros and cons • Intermediate robustness (no single point of failure, but problems will tend to be “non-local”). • End user is back in charge (decides what to install). • Interesting model for sharing resources (i.e. P2P utility computing). • A clear step towards pervasive ICT and a great opportunity for service providers, if it works...
What is the challenge? • The difficulty is of course to ensure adequate service coverage, in terms of accessibility, reliability, latency etc. • This has to be achieved without central control or planning, otherwise: • It won’t scale. • We’ll lose many of the benefits (in terms of agility and adaptability).
“SelfService” • Objectives: • To support reliable, fault-tolerant access to a sub-set of services, which are required at local “access points”. • To reduce the need for installation/running of the corresponding software modules on local devices used as access points. • Without having to rely on dedicated servers. • Underlying hypothesis: There are unpredictable but consistent patterns of activity, which can be used to select stable partnerships (e.g. “device X, hosting service S1, is able to provide it to device Y for 80% of business hours”).
Experimental algorithm [Flowchart; box labels in extraction order: START (generate request); Already know provider?; Broadcast request; (Re-)examine request; Download component; Reply received?; Targeted request; Increment delay; Store provider’s address; Reply received?; Delay reached unacceptable limit?; Process or send job; “Forget” provider; EXIT]
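The flowchart's exact branching is not recoverable from the box labels alone, so the following is one plausible reading, offered as a sketch rather than the experimental algorithm itself. The `Requester` class, its `broadcast` and `targeted` callables, the `max_delay` limit and the `"local"` return value are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set

LOCAL = "local"  # hypothetical marker: the job was processed on this device


@dataclass
class Requester:
    # broadcast(service) returns a provider address, or None if nobody replied.
    broadcast: Callable[[str], Optional[str]]
    # targeted(service, provider) returns True if the provider replied.
    targeted: Callable[[str, str], bool]
    max_delay: int = 3  # assumed "unacceptable limit"
    providers: Dict[str, str] = field(default_factory=dict)
    installed: Set[str] = field(default_factory=set)

    def request(self, service: str) -> str:
        """Return the provider that took the job, or LOCAL."""
        if service in self.installed:
            return LOCAL
        delay = 0
        while delay < self.max_delay:
            provider = self.providers.get(service)
            if provider is None:
                # No known provider: broadcast the request.
                provider = self.broadcast(service)
                if provider is None:
                    delay += 1  # no reply: increment delay and re-examine
                    continue
                self.providers[service] = provider  # store provider's address
            if self.targeted(service, provider):
                return provider  # send the job to the provider
            delay += 1
        # Delay limit reached: "forget" the provider, download the component.
        self.providers.pop(service, None)
        self.installed.add(service)
        return LOCAL
```

Every pass through the loop either returns or increments `delay`, so a request always terminates, falling back to local installation when no provider responds in time.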
Locally maintained information • Maintain a “subscription” to each required service component • Can keep a record of QoS attributes, such as speed of response [Diagram: each Subscription holds a Service (ID, QoS value, requests, success), a list of KnownProvider entries (ID, score) and decision rules]
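The locally maintained records might be represented along the following lines. This is a sketch only: the field names follow the diagram labels (Subscription, KnownProvider, ID, score, requests, success), while the scoring rule, the `rate` parameter and the method names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class KnownProvider:
    provider_id: str
    score: float = 0.5  # assumed neutral starting score


@dataclass
class Subscription:
    service_id: str
    requests: int = 0
    successes: int = 0
    providers: Dict[str, KnownProvider] = field(default_factory=dict)

    def record(self, provider_id: str, success: bool, rate: float = 0.2) -> None:
        """Update counters and move the provider's score towards the outcome.

        The exponential moving average is an assumed decision rule.
        """
        self.requests += 1
        self.successes += int(success)
        provider = self.providers.setdefault(provider_id, KnownProvider(provider_id))
        provider.score += rate * ((1.0 if success else 0.0) - provider.score)

    def qos(self) -> float:
        """Fraction of requests for this service that succeeded."""
        return self.successes / self.requests if self.requests else 0.0

    def best_provider(self) -> Optional[str]:
        """ID of the highest-scoring known provider, if any."""
        if not self.providers:
            return None
        return max(self.providers.values(), key=lambda p: p.score).provider_id
```

A device would hold one `Subscription` per required service component and consult `best_provider()` before sending a targeted request.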
Pervasive “SelfService” • Mobile nodes (e.g. PDAs) • Regular, but unpredictable, daily activity cycles [Figure legend: colour code = QoS; size = number of modules installed; grey links = in range; white links = identified opportunities for co-operation]
Biological Morphogenesis as a source of inspiration? • In morphogenesis, individual stem cells simultaneously differentiate (specialise) and move in space (equivalent to rewiring in a network) • In this way cells of the right type occupy the right location in the developing organism • Neighbours influence each other’s choice via a dynamic web of positive and negative feedback • This aspect of the developmental process shares many characteristics with co-operative peer-to-peer (P2P) service provision across networks • deciding which service to host is equivalent to differentiation • selecting providers via “rewiring” is similar to cell migration
Ex#3: Embryo • Embryo is a fully decentralised, autonomic service management framework, inspired by morphogenesis • It is capable of inducing the local installation of components AND modifying the topology of a peer-to-peer interaction overlay network • In doing so, it adapts the overall system so as to meet the needs of the majority of peers • Simulations show that Embryo supports deployment of new applications, adding or retiring of service components and re-scaling (reallocating resources) “on the fly” without explicit management. => Autonomic
Embryo’s key mechanisms • Rewiring (~Cell migration) • nodes send “adverts” offering their available service components, as well as their own needs • these signals are propagated using local “gossiping” • links are made where there is a reciprocal match of offer & need • nodes maintain an awareness of their local context via adverts they have seen “pass them by” • Changing type (~Cell differentiation) • a node is permitted to be of only one type (i.e. host only one service component) since we wish to explore a node’s ability to dynamically reconfigure within its current context • neighbours exhibit implicit mutual inhibition since there is a reduced pressure to offer a service that is already offered by a neighbour
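The two mechanisms can be sketched in a few lines. This is an illustration under stated assumptions, not Embryo's implementation: the `Node` type, the advert format (an offered component plus a set of needs) and the rule for switching type are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    hosts: str              # the single service component this node offers
    needs: FrozenSet[str]   # components its application requires


def reciprocal_links(nodes: List[Node]) -> List[Tuple[str, str]]:
    """Rewiring: link pairs where each node offers something the other needs."""
    return [(a.name, b.name) for a, b in combinations(nodes, 2)
            if a.hosts in b.needs and b.hosts in a.needs]


def choose_type(node: Node, seen_adverts: List[Tuple[str, FrozenSet[str]]]) -> str:
    """Changing type: pick the component with the highest unmet local demand.

    Each advert is (offered component, needed components). An offer already
    seen nearby lowers the pressure to host that component, which models the
    implicit mutual inhibition between neighbours.
    """
    pressure = Counter()
    for offered, needs in seen_adverts:
        for component in needs:
            pressure[component] += 1
        pressure[offered] -= 1
    if not pressure:
        return node.hosts
    # Sorting first makes tie-breaking deterministic.
    best, level = max(sorted(pressure.items()), key=lambda kv: kv[1])
    return best if level > pressure.get(node.hosts, 0) else node.hosts
```

For example, a node hosting component "3" that only overhears adverts offering "1" and needing "2" would switch to hosting "2", while a node whose own component is already in demand stays as it is.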
Embryo: simulation of an autonomic service management framework • Each node acts as an “access point” for one application type (indicated by shape) • Each node hosts and makes available one service component type (indicated by colour & number) • The three strips show which (of 3) application types require which service components
Embryo: some simulation results • The time for the system to converge to a stable state grows only logarithmically with population size (i.e. number of nodes or cells) • This suggests good scalability properties
“Rewiring” ~ Cell migration/adhesion • The number of “corrective actions” (rewiring events) a node must make before the system reaches steady state grows slowly with population size • Note that when there are more service types scalability is better still (downward trend - not shown)
Changing type ~ Differentiation • Bigger population sizes (i.e. more nodes/cells) require fewer differentiation events per node • This suggests good scalability properties
Some Design Heuristics (1) • Local rules - wherever possible use local rules and decision making to achieve overall behaviour • Interactions - by combining local decision making with carefully crafted interactions between neighbourhood nodes/entities the desired global behaviour can often be achieved • Positive and negative feedback - biological systems make extensive use of feedback to control processes and achieve robust design of structure and behaviour • Flyphones uses explicit inhibitory signalling • Embryo exhibits implicit inhibition i.e. if a neighbour offers a service then I don’t need to offer it
Some Design Heuristics (2) • Decentralised solutions - often a given problem is in essence a decentralised problem (cf. Flyphones) - in this case a decentralised solution may be well matched. In addition, nodes in a decentralised system often "bring their own resources" which can help provide a scalable solution • Engineered-in behaviour versus explicit external control - where possible it is preferable to embody some management within the system itself; policy-based management is still appropriate and possible via tuning parameters and via the system's in-built adaptability