DIME Network Architecture (DNA) for a New Generation of Many-core Computing
Parallax - A New Operating System for Scalable, Distributed, and Parallel Computing
SMTPS11
Agenda
• The hardware upheaval and the von Neumann bottleneck
• A possible solution using a parallel DIME™ network computing model with telecom-grade trust
• Parallax – a potentially new Operating System (OS)
• Proof of concept demo
The history of the evolution of current OSs is filled with lessons on wasted billions (does anyone remember Multics or OS/2?), unmet expectations (who would have thought UNIX, the original System V, would vanish), surprise winners (Windows and Linux), and stealthy survivors (Mach in a Mac).
Many-core Servers
• SeaMicro – custom servers – 512 1.66 GHz 64-bit x86 Intel Atom cores in 10 RU; 2,048 CPUs/rack
• Calxeda – highly integrated Server-on-Chip built around a new generation ARM processor – 480 cores
• Silicon Graphics – Altix UV – 2,048 cores and 16 TB memory per Single System Image; scales to 32,768 processor sockets providing up to 262,144 Intel Xeon cores (8 cores per socket)
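The core counts quoted on this slide can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check of the slide's own figures. It assumes that each Atom core counts as one "CPU" in the SeaMicro per-rack figure, which the slide does not state explicitly. All function and constant names are illustrative.

```python
# Figures quoted on the Many-core Servers slide.
SEAMICRO_CORES_PER_SYSTEM = 512
SEAMICRO_RACK_UNITS_PER_SYSTEM = 10
SEAMICRO_CPUS_PER_RACK = 2_048
ALTIX_MAX_SOCKETS = 32_768
ALTIX_CORES_PER_SOCKET = 8


def seamicro_systems_per_rack() -> int:
    """Systems implied by the per-rack CPU count (assumption: one core = one CPU)."""
    return SEAMICRO_CPUS_PER_RACK // SEAMICRO_CORES_PER_SYSTEM


def seamicro_rack_units_used() -> int:
    """Rack units occupied by that many 10 RU systems."""
    return seamicro_systems_per_rack() * SEAMICRO_RACK_UNITS_PER_SYSTEM


def altix_max_cores() -> int:
    """Maximum Altix UV core count: sockets times cores per socket."""
    return ALTIX_MAX_SOCKETS * ALTIX_CORES_PER_SOCKET


if __name__ == "__main__":
    print(seamicro_systems_per_rack())  # 4
    print(seamicro_rack_units_used())   # 40
    print(altix_max_cores())            # 262144
```

Under that assumption the SeaMicro figure corresponds to four 10 RU systems per rack, and the Altix UV product of sockets and cores per socket matches the 262,144 cores quoted.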
Hardware Upheaval and von Neumann Bottleneck
• Up to 46,080 processing cores or 29.8 petabytes of storage per container
• 512-core and 480-core servers running an OS that cannot see beyond tens of cores
• Operating System gap: no Operating System provides application-centric resource management in real time
• Layers of management infrastructure
• Network infrastructure with complex management systems
Current Economics of IT
[Chart: % of TCO over five years; values shown: $61.2M and $31.6M]
Hardware upheaval is not matched by software innovation!
SPC Element Network & von Neumann Bottleneck
• Serial processing: management code (network, storage, virtualization, application, etc.) and application code run in sequence in a Stored Program Computing (SPC) element
• End-to-end distributed transaction response is no longer controlled by the individual node OS in a shared resource environment
Distributed Intelligent Managed Element (DIME) Network
• Parallel FCAPS* management of the Stored Program Computing element using a signaling channel
• Real-time application management (provisioning, monitoring and control), including Start and Stop
• Managed Intelligent Computing Element (MICE): the application, a service component in a distributed workflow
• Distributed application: Service Regulation executable instructions plus Service Package executable instructions (for example, "Hello World")
• Signaling and self-management of node workflow with DIME network management
* Fault, Configuration, Accounting, Performance and Security (Node & Network)
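The separation this slide describes, a computing element managed in parallel over a signaling channel, can be illustrated with a small sketch. It is not the Parallax or DIME implementation. The class `Dime`, its two queues, and the "start"/"stop" command strings are all hypothetical names chosen for illustration. The assumption is that management commands travel on their own channel and are handled by a separate thread from the one that runs the service package.

```python
import queue
import threading


class Dime:
    """Hypothetical managed element: a compute thread plus a manager thread."""

    def __init__(self, name, service_package):
        self.name = name
        self.service_package = service_package  # callable standing in for the service package
        self.signals = queue.Queue()            # signaling channel (management path)
        self.tasks = queue.Queue()              # work items (computing path)
        self.results = []
        self.events = []                        # management commands applied so far
        self._running = threading.Event()       # cleared = stopped, set = started
        threading.Thread(target=self._manage, daemon=True).start()
        threading.Thread(target=self._compute, daemon=True).start()

    def _manage(self):
        # Management loop: applies signals in parallel with the computing path.
        while True:
            command = self.signals.get()
            if command == "start":
                self._running.set()
            elif command == "stop":
                self._running.clear()
            self.events.append(command)
            self.signals.task_done()

    def _compute(self):
        # Computing loop: runs the service package only while started.
        while True:
            task = self.tasks.get()
            self._running.wait()
            self.results.append(self.service_package(task))
            self.tasks.task_done()

    def signal(self, command):
        """Send a management command and wait until the manager has applied it."""
        self.signals.put(command)
        self.signals.join()

    def is_running(self):
        return self._running.is_set()


def demo():
    dime = Dime("dime-1", lambda who: f"Hello World from {who}")
    dime.tasks.put("worker 1")  # queued while the element is still stopped
    dime.signal("start")        # management path enables the computing path
    dime.tasks.join()           # wait for the queued work to finish
    dime.signal("stop")
    return dime


if __name__ == "__main__":
    d = demo()
    print(d.results)  # ['Hello World from worker 1']
    print(d.events)   # ['start', 'stop']
```

The point of the design is that the `stop` signal is handled by the manager thread even while the compute thread is busy or blocked, which mirrors the slide's contrast with serial processing, where management code and application code share one instruction stream.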
DIMEs In A Multi-Core Server
• Service = Service Regulator and Service Package
• Run-time orchestrator on Linux
• DIME sub-network managers (F, C, A, P, S) connected by signaling
Proof of Concept Features
• DIME Instantiation
• Discovery
• Workflow Orchestration
• Scaling
• Dynamic Reconfiguration
• Fault Management
[Diagram: three physical servers (Server 1, Server 2, Server 3), each hosting DIMEs A and B (MICE plus application, with network I/O). Memory legend: Parallax OS (P), Shared Memory (S), Free Memory (F).]
DNA In A Multi-core Server
The proof of concept and the secret sauce: http://youtu.be/IMXxmRSVGoI
• von Neumann, J., "The General and Logical Theory of Automata", in A. H. Taub (Ed.), John von Neumann Collected Works, Vol. 5, p. 259. Chicago: University of Illinois Press (1951).
• Dyson, G. B., "Darwin Among the Machines: The Evolution of Global Intelligence", Helix Books, Addison-Wesley Publishing Company, Inc., Reading, MA, 1997, p. 123.
Service Deployment
• Service Component Developer (Service Creation)
• Service Workflow Creator (Service Delivery)
• Network Service Control Manager (Service Assurance)
• Run-time orchestrator on Linux; DIME sub-network managers (F, C, A, P, S)
[Diagram: Node 1, Node 2 and Node 3, each with Worker 1 and Worker 2, running the "Hello World" service.]
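The three roles on this slide (creation, delivery, assurance) and the node and worker layout can be mimicked in a few lines. This is a toy sketch, not the demo's code. `Worker`, `deliver`, and `assure` are hypothetical names, and the layout of three nodes with two workers each is read off the slide's diagram labels.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Worker:
    node: int
    worker: int
    service: Callable[[], str]

    def run(self) -> str:
        """Execute the deployed service and tag the output with the worker's location."""
        return f"Node {self.node} Worker {self.worker}: {self.service()}"


def hello_world() -> str:
    """Service creation: the component a developer would write."""
    return "Hello World"


def deliver(service: Callable[[], str], nodes: int = 3,
            workers_per_node: int = 2) -> List[Worker]:
    """Service delivery: place the same service on every worker of every node."""
    return [Worker(n, w, service)
            for n in range(1, nodes + 1)
            for w in range(1, workers_per_node + 1)]


def assure(workers: List[Worker], expected: str) -> bool:
    """Service assurance: check that every worker produces the expected output."""
    return all(w.run().endswith(expected) for w in workers)


if __name__ == "__main__":
    deployed = deliver(hello_world)
    for w in deployed:
        print(w.run())
    print(assure(deployed, "Hello World"))  # True
```

Keeping the three steps as separate functions reflects the slide's separation of roles: the developer supplies only `hello_world`, while placement and checking are done by other parties.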
Lessons From Biology
"The basic principle of dealing with malfunctions in nature is to make their effect as unimportant as possible and to apply correctives, if they are necessary at all, at leisure. In our dealings with artificial automata, on the other hand, we require an immediate diagnosis. Therefore, we are trying to arrange the automata in such a manner that errors will become as conspicuous as possible, and intervention and correction follow immediately." --- John von Neumann, "The General and Logical Theory of Automata", John von Neumann Collected Works, edited by A. H. Taub, Volume 5, p. 289 (Hixon Symposium, 1948)
"It's very likely that on the basis of the philosophy that every error has to be caught, explained, and corrected, a system of the complexity of the living organism would not run for a millisecond." --- von Neumann, Theory of Self-Reproducing Automata (1948) at the Hixon Symposium, Pasadena, California
DIME Network Architecture (DNA) for a New Generation of Many-core Computing
• Replication
• Repair
• Recombination
• Reconfiguration
Questions?