
Distributed Processing and Large-Scale System Engineering for AGI

David Hart, Axogenic Pty Ltd / Novamente LLC. 10 slides, 20-minute talk. Artificial General Intelligence Research Institute Workshop, 20 May 2006.


Presentation Transcript


  1. Distributed Processing and Large-Scale System Engineering for AGI
     David Hart, Axogenic Pty Ltd / Novamente LLC
     10 slides, 20-minute talk
     Artificial General Intelligence Research Institute Workshop, 20 May 2006

  2. “After exhaustive analysis Cray Inc. concluded that, although multi-core commodity processors will deliver some improvement, exploiting parallelism through a variety of processor technologies using scalar, vector, multithreading and hardware accelerators (e.g., FPGAs or ClearSpeed co-processors) creates the greatest opportunity for application acceleration.” (Christopher Lazou, HiPerCom)
     “Distributed systems need radically different software than centralized systems do.” (Andrew S. Tanenbaum)
     “Anybody who tells you that distributed algorithms are "simpler" is just so full of sh*t that it's not even funny.” (Linus Torvalds, 2006)

  3. Distributed Computing Benefits & Downsides
     • Low Hardware Cost
       • Low cost of entry, and low cost to grow organically
     • Great Flexibility
       • Heterogeneous infrastructures are relatively easy to integrate (e.g. different instruction sets, network hardware, etc.)
     • Performance & Scaling Issues
       • Limited by the interconnect and by distributed algorithms (i.e. the ability to keep pipelines full on scaled-up hardware; many tools and metrics exist to measure performance on parallel systems)
     • Increased Software Complexity
       • Programming for parallelism is more difficult and produces more complicated code (e.g. debugging is a longer, more difficult process)
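The scaling limit in the "Performance & Scaling Issues" bullet can be illustrated with a small calculation. The sketch below assumes Amdahl's law as the model, which the slides do not name; the function names and the 95% parallel fraction are illustrative assumptions, not figures from the talk.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup under Amdahl's law (an assumed model, not from the slides).

    parallel_fraction: share of the runtime that can run in parallel (0..1).
    workers: number of processors or nodes applied to the parallel share.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_ceiling(parallel_fraction):
    """Upper bound on speedup as the number of workers grows without limit."""
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    # Hypothetical workload: 95% of the runtime parallelizes cleanly.
    for n in (2, 16, 128, 1024):
        print(n, round(amdahl_speedup(0.95, n), 2))
    print("ceiling:", round(speedup_ceiling(0.95), 2))
```

Even in this idealized model, which ignores interconnect latency entirely, the serial share caps the achievable speedup; real distributed systems add communication cost on top of that.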

  4. Parallel Programming and Distributed Processing - Overview
     • Levels of Software Parallelism
       • Implicit/extracted
       • Thread
       • Process
       • Application
     • Levels of Hardware Parallelism
       • Implicit/extracted
       • Multi-processor (SMP or NUMA)
       • Distributed Processing
     • Specialized hardware of various kinds may fit anywhere in this stack
     Side notes:
     • Multiple [in]dependent nodes
     • [Non] shared memory systems (global memory address space)
     • Node Operating Systems & Resource Managers
     • Bandwidth / Latency constraints
     • Simulation-friendly on small single or multi-processor
     For theory, see also: Flynn's taxonomy (SISD, MISD, SIMD, MIMD), Parallel Random Access Machine (PRAM)
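As a minimal sketch of the "Thread" level of software parallelism listed above, the following uses Python's standard `concurrent.futures` module. The `word_count` task and the sample chunks are hypothetical stand-ins for real work; nothing here comes from the talk itself.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """Hypothetical unit of work: count whitespace-separated words."""
    return len(text.split())


def parallel_word_counts(chunks, max_workers=4):
    """Run word_count over each chunk on a pool of threads.

    pool.map returns results in the same order as the inputs,
    regardless of which thread finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(word_count, chunks))


if __name__ == "__main__":
    print(parallel_word_counts(["a b", "c d e", ""]))
```

The standard library's `ProcessPoolExecutor` exposes the same `map` interface, so a sketch like this can be moved from thread-level to process-level parallelism with a small change, at the cost of serializing the work items between processes.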

  5. Tightness of Coupling (overlaid hardware & software examples)
     • Single: shared memory; the OS balances processes across processor/memory groups
       • PC (single or multiprocessor)
       • Server (multiprocessor), example: UNIX Server
       • Supercomputer (traditional), example: Cray X1E
     • Distributed: distributed memory; may be geographically distributed; the OS or distributed control software balances processes across nodes
       • SSI Cluster (single-system-image with distributed-shared-memory), examples: SGI Altix, OpenSSI, openMosix
       • Supercomputer Class HPC (High Performance Cluster), examples: Deep Blue/Gene, ASC Purple, Earth-Simulator, Cray XD1 (OctigaBay) (many new designs moving to the more tightly coupled SSI/DSM model)
       • Beowulf Class Cluster: home-grown, from commodity PCs
       • Grid systems: Sun Grid, Folding@home, SETI@home, etc.

  6. Practical Techniques for Parallel & Distributed Computing
     • SSI
       • Virtual large SMP machine
       • 'fork and forget'
     • Message Passing Interface (MPI)
       • Portable and fast
       • C, C++ and Fortran bindings
     • Parallel Virtual Machine (PVM)
       • C, C++ and Fortran bindings
     • Grid
       • Distributed Resource Management Application API (DRMAA)
     • Custom / Low Level (techniques used internally by SSI, MPI, PVM, etc.)
       • Inter-process Communication (IPC), e.g. via sockets, shared memory or Remote Direct Memory Access (RDMA)
       • Client-server, e.g. via sockets
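The "Client-server, e.g. via sockets" technique can be sketched with Python's standard `socket` module. This is an illustration only: the text protocol (space-separated integers in, their sum out), the function names, and the use of a thread on the loopback interface to stand in for a remote node are all assumptions, and the single `recv` call is only adequate for the small messages used here.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one client, read space-separated integers, reply with their sum."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)  # assumes the whole request fits in one read
        numbers = [int(token) for token in data.decode().split()]
        conn.sendall(str(sum(numbers)).encode())


def remote_sum(numbers):
    """Send numbers to a worker over TCP on localhost and return its reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server_sock.listen(1)
        port = server_sock.getsockname()[1]

        # The worker thread plays the role of the remote node.
        worker = threading.Thread(target=serve_once, args=(server_sock,))
        worker.start()

        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(" ".join(map(str, numbers)).encode())
            reply = client.recv(1024)
        worker.join()
    return int(reply.decode())


if __name__ == "__main__":
    print(remote_sum([1, 2, 3]))
```

Higher-level layers such as MPI or PVM hide this kind of plumbing, including message framing, process startup, and addressing, behind their own bindings.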

  7. Multi-parallel approach with Integrated resource management
     • Techniques must be selected carefully, given the high cost of design and code refactoring
     • Cost/risk assessment, modular systems with independent development teams, and organic growth make mix-and-match of software techniques and hardware platforms inevitable
     • Many other industries will find best-of-breed combinations
     Side notes:
     • The need for dynamic resource management for efficient use of hardware resources applies to all distributed architectures
     • Tools are still maturing
     • Home-grown tools option
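To make the resource management point concrete, here is a minimal sketch of a "home-grown" placement step: a greedy scheduler that assigns each task to the currently least-loaded node. The task names, cost estimates, and node names are hypothetical. A real dynamic resource manager would repeat a step like this as load measurements change, which this one-shot sketch does not attempt.

```python
import heapq


def assign_tasks(task_costs, node_names):
    """Greedy placement: largest task first, always onto the least-loaded node.

    task_costs: dict mapping a task name to an estimated cost (hypothetical units).
    node_names: list of node identifiers.
    Returns (placement, loads): tasks per node and total estimated load per node.
    """
    # Min-heap of (current_load, node_name); the least-loaded node pops first.
    heap = [(0.0, name) for name in node_names]
    heapq.heapify(heap)
    placement = {name: [] for name in node_names}

    by_cost_desc = sorted(task_costs.items(), key=lambda item: item[1], reverse=True)
    for task, cost in by_cost_desc:
        load, name = heapq.heappop(heap)
        placement[name].append(task)
        heapq.heappush(heap, (load + cost, name))

    loads = {name: load for load, name in heap}
    return placement, loads


if __name__ == "__main__":
    tasks = {"a": 4, "b": 3, "c": 2, "d": 1}
    print(assign_tasks(tasks, ["n1", "n2"]))
```

Sorting the largest tasks first is a common heuristic for balancing load; it is not optimal in general, and it ignores interconnect cost and data locality, which matter on the tightly coupled systems described earlier.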

  8. Distributed computing applied to AGI (Novamente-specific select examples)
     • Primary “full cognitive unit” (monolithic AtomSpace) operations (execution in a single large shared-memory computer environment)
       • AttentionAllocation, interactive SchemaExecution, ongoing reasoning, goal/feeling evaluation, etc.
     • Distributed operations
       • Secondary “full cognitive units” (tightly coupled to primary unit)
         • Pre-processing sensory or language input, enacting output (e.g. 'motor control' or virtual/physical system control)
       • Pattern mining units utilizing a copy/subset of the primary AtomSpace (tightly coupled cluster or loosely coupled grid)
         • Map Encapsulation, Long-term importance, procedure mining
       • GP & Logic units: MOSES/BOA, batch-PLN (tightly and loosely coupled)
         • used by pattern mining and other units; co-processor use possible
       • Stand-alone units (loosely coupled)
         • Pre-processing text and other non-realtime input

  9. Select software & hardware resources
     • ClusterKnoppix (LiveCD) utilizing OpenMosix (single-system image Linux)
       http://clusterknoppix.sw.be/ | http://openmosix.sourceforge.net/
     • OpenSSI for KNOPPIX and other Linuxes (single-system image Linux)
       http://openssi.org/
     • ParallelKnoppix (LiveCD) utilizing MPI (openMPI) and/or PVM
       http://idea.uab.es/mcreel/ParallelKnoppix/ | http://www.open-mpi.org/
     • Red Hat Cluster Suite
       http://www.redhat.com/software/rha/cluster/
     • SunGrid | GridEngine (subscription batch job service at $1/CPU/hr)
       http://www.network.com/ | http://gridengine.sunsource.net/
     • FPGA Co-processors: ClearSpeed and DRC
       http://www.clearspeed.com/ | http://www.drccomputer.com/

  10. The Future: Distributed Knowledge Representation & Storage
     • Centralized knowledge storage is used in the AGI systems of today
     • Distributed database technology is immature compared with distributed computing
     • Distributed knowledge storage requires new architectures
       • RDBMS or something new?
     Side notes:
     • HypergraphDB: Borislav Iordanov
     • Datatainer: Google developing portable/interchangeable mini data centers for use at 300 Internet peering points globally
