Explore the timeline and advancements of GJC's array processors from the 1960s to the 1990s presented at SC2000 Masterworks. Learn about the development of revolutionary computing systems focusing on interactive scientific computing. Dive into the internal architectures and computing milestones that have shaped the field.
Evolution of Glen Culler’s Architectures for Interactive Scientific Computing
David Culler
SC2000 Masterworks
Nov. 7, 2000

Plan for this session
• Evolution of GJC's Array Processors
• “Internal architecture that expresses algebra of bilinear forms”
• Video of GJC presentation at ACM Conference on the History of the Personal Computer
GJC Evolution

Timeline
1951 RadLab
1954 Ramo Wooldridge
1959 UCSB
1961 RW-400 Culler-Fried System
1963 UCSB On-Line Computer Classroom
1964 Teleputer
1966 IBM 360 On-Line System
1969 Culler-Harrison Inc
1970 MP32A Sonar Signal Processor
1972 CHI AP120 Array Processor
1974 CHI AP90B
1975 FPS AP120B
1976 UCLA Plasma Simulation System
1979 LPCAP
1981 CHI-5 General Purpose AP
1982 Motorola Single Chip APU
1985 Culler-7 MP AP Unix Computer Server
1986 Personal Supercomputer
1990 Star 910/VP Vector Workstation

MP32A - 1970
• 16-bit fixed-point processor @ 6 MHz
• Multiple operations per microinstruction
• 28-bit instructions
• 2-cycle multiply
• Parallel memories
  • 64-word scratch pad
  • 512-word fixed + 64-word writable instruction memory
  • 64KW instruction & data memory
• SONAR signal processing

AP120: 1972-73
• CHI Serial 1
• DARPA acoustical research center at Moffett Field

AP120 (CHI Serial 2) – 1974
• Constructed to perform signal analysis and speech compression
• Used for real-time digital speech transmission on the ARPANET
  • with SRI, Lincoln Labs, ISI
• Basis for the Floating Point Systems AP120B

Floating Point Systems AP120B (1975)
• 6 MHz (167 ns), 38-bit floating point, 64-bit instructions
• Independent floating Add (2 stage) and Mult (3 stage) – peak 12 MFLOPS
• Memories
  • Two 32-word data pads (DX, DY) – 2 per cycle
  • 2560-word fixed table memory – 1 per cycle, 2 cycle delay
  • 64KW data memory – ½ per cycle, 3 cycle delay
  • 512-word instruction memory
• Two blocks of 32-word accumulators (dx, dy)
• Address indexing & counting (SPAD & ALU)
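
The clock, cycle time, peak rate, and memory rates quoted for the AP120B are related by simple arithmetic. The sketch below reproduces them, assuming the 12 MFLOPS peak counts one add result plus one multiply result per cycle (the slide does not state this explicitly). All variable names are illustrative.

```python
# Figures taken from the AP120B slide; names are illustrative only.
CLOCK_MHZ = 6.0

# Cycle time in nanoseconds: 1 / 6 MHz is roughly 167 ns.
cycle_ns = 1e3 / CLOCK_MHZ

# Assumption: peak rate = one add result + one multiply result per cycle.
FP_RESULTS_PER_CYCLE = 2
peak_mflops = CLOCK_MHZ * FP_RESULTS_PER_CYCLE

# Words delivered per cycle by each memory, as listed on the slide.
words_per_cycle = {
    "data pad": 2.0,
    "table memory": 1.0,
    "data memory": 0.5,
}

# Resulting bandwidth in millions of words per second.
bandwidth_mwords = {name: CLOCK_MHZ * w for name, w in words_per_cycle.items()}

print(f"cycle time: {cycle_ns:.0f} ns")
print(f"peak: {peak_mflops:.0f} MFLOPS")
for name, bw in bandwidth_mwords.items():
    print(f"{name}: {bw:.1f} Mwords/s")
```

Under these assumptions main data memory supplies only 3 Mwords/s, so keeping both arithmetic units busy would depend on reusing operands held in the data pads and table memory.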

UCLA Plasma Simulation Interactive (PSI) System - 1976
• MP32A: Scheduling and Control
• FPS AP120B: most calculation
  • 6 MHz, parallel pipelined Multiply Add
• Four CHI IOPs: data movement
  • Fixed-program microprocessors, 4-way transfer at ¼ MW/s
• Math System Language interactive interpreter
• 2-1/2 D Million Particle Simulation: 6 MFLOPS “out of core”
• 3D Magnetohydrodynamics @ 4 MFLOPS
• Particle-by-particle or grid-by-grid, not vectorization
• 4x IBM 360/91 at 1/160th the cost
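
The headline PSI numbers can be turned into two ratios: performance per dollar relative to the IBM 360/91, and sustained rate as a fraction of the AP120B's quoted 12 MFLOPS peak. This is a back-of-the-envelope sketch using only the slide figures; the variable names are illustrative, and treating the AP120B peak as the system peak is an assumption.

```python
# Figures taken from the slides; names are illustrative only.
SPEEDUP_VS_360_91 = 4        # "4x IBM 360/91"
COST_RATIO_360_91 = 160      # the 360/91 cost 160x as much ("1/160th the cost")

# Performance per dollar relative to the IBM 360/91.
perf_per_dollar_ratio = SPEEDUP_VS_360_91 * COST_RATIO_360_91

# Assumption: the AP120B's quoted peak is the relevant system peak.
AP120B_PEAK_MFLOPS = 12.0
sustained_mflops = {
    "2-1/2 D particle simulation": 6.0,
    "3D MHD": 4.0,
}
fraction_of_peak = {
    name: rate / AP120B_PEAK_MFLOPS for name, rate in sustained_mflops.items()
}

print(f"performance per dollar vs 360/91: {perf_per_dollar_ratio}x")
for name, frac in fraction_of_peak.items():
    print(f"{name}: {frac:.0%} of peak")
```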

LPCAP - 1979
• 12/24-bit fixed point speech processor
• Statistical models of speech
• Linear Predictive Coding
• Very small form factor (large shoe box)
• Used in ground-air comm

CHI-5 General-Purpose AP (1980)
• 16/32/48-bit fixed point speech processor with parallel memories
• Stand-alone or hosted operation
• Very fast macro-micro dispatch
• Program sequencer (80-bit x 3KW)
• Three 16-bit adders (linked to form 32 or 48)
• Parallel storage
  • Four accumulators
  • 16/32-bit main memory + 16 address registers
  • Two 1024x16-bit array memories
  • 32-bit ROM table memory
• Extensive bussing
• Host block transfer, A/D D/A 8 kHz, Serial ports

Motorola APU: 1982
• 3 micron CMOS platinum silicide, 4 MHz, 100 pin
• 16 MHz multiplexed instruction port (78-bit instr)
• 30.5 K transistors, 296x305 mils
• 20 16-bit data buses, 184 control lines
• 16/32-bit fixed or floating point array and signal processor
• Data arithmetic processor
  • 1 Multiply, 3 Add, 4 accum, multiplier storage
• Array memory address controllers
  • 2D 9-point stencil matrix addressing
• External X, Y, R busses
• Control => Micro-nets of array processors
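
As a rough illustration of what 2D 9-point stencil addressing involves, the sketch below generates the nine linear addresses touched around one point of a 2D array. The row-major layout, the 3x3 neighborhood shape, and the helper name `stencil_addresses` are assumptions made for illustration; the slides do not describe how the APU's address controllers actually worked.

```python
def stencil_addresses(base, row, col, row_stride):
    """Return the 9 linear addresses of the 3x3 neighborhood centred on (row, col).

    Hypothetical helper: assumes a row-major array starting at `base`
    with `row_stride` words per row, and an interior point (no edge handling).
    """
    return [
        base + (row + dr) * row_stride + (col + dc)
        for dr in (-1, 0, 1)   # row above, same row, row below
        for dc in (-1, 0, 1)   # column left, same column, column right
    ]


# Example: point (1, 1) of an array with 8 words per row, starting at address 0.
addrs = stencil_addresses(base=0, row=1, col=1, row_stride=8)
print(addrs)
```

Generating such address patterns in dedicated hardware would spare the arithmetic units the index calculations, which is one plausible reason for giving array memory its own address controllers.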

Culler-7 (1985 – PC AT)
• 2-16 MFLOPS Linpack @ 250K$ - 1 M$
• Bipolar TTL
• 1-4 Computer Processors + Kernel Proc
• A, XY, & D machine per processor
• Dual 64-bit data busses
• 96-bit instructions (48 A, 48 XY)
• Memories
  • Kernel memory (2 MB)
  • Global Data memory (5-42 MB, 32-bit VAS)
  • Program memory (256 KB real, 32 MB virtual)
  • Array memory – 4 x 16 KB

Personal Supercomputer (1986)
• ¼ Cray 1S under 100K$ (< 6K$ per MIPS)
• PC-AT / Sun 3 days
• 3-4 MFLOPS DP Linpack (387 does 0.02)
• 200-bit wide instruction
• Multiple levels of parallelism
  • Multiple processors
  • XY and A Machines per processor
  • Multiple operations per instruction in each
• Very high delivered/peak

Star 910/VP (1990)
• 40 MHz SPARC (Cypress chip-set)
• TI 8847 CMOS vector processor
• 80 MFLOPS SP, 160 MFLOPS SP
• Vector DMA, Vector Cache
• 1.3 GB/s
• 320 MB/s shared memory system
• 18 MFLOPS Linpack for 200K$
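
The Linpack and price figures quoted for the Culler-7 and the Star 910/VP allow a rough cost-per-MFLOPS comparison. The sketch below uses only numbers from the slides and assumes the low end of the Culler-7 performance range pairs with the low end of its price range (and likewise for the high end), which the slides do not confirm. Prices are nominal, not inflation-adjusted.

```python
# (Linpack MFLOPS, price in thousands of dollars), as quoted on the slides.
# Assumption: Culler-7 range endpoints pair up low-with-low and high-with-high.
systems = {
    "Culler-7, low end (1985)": (2.0, 250.0),
    "Culler-7, high end (1985)": (16.0, 1000.0),
    "Star 910/VP (1990)": (18.0, 200.0),
}

# Cost in K$ per delivered Linpack MFLOPS.
cost_per_mflops = {
    name: price_k / mflops for name, (mflops, price_k) in systems.items()
}

for name, cost in cost_per_mflops.items():
    print(f"{name}: about {cost:.1f} K$ per Linpack MFLOPS")
```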