Stamatis Vassiliadis Symposium: The Future of Computing
A+A=A
Mateo Valero, Barcelona Supercomputing Center
To Stamatis, my beloved friend
The way we all do research... as seen from HPCA 1999
• Microarchitecture idea
• Applications: SPEC, PerfectClub, TPC-D, NAS, Splash, …
• Compiler: production, public, custom, …
• Simulator: public, custom, …
• Results: how much we get from our idea
The Past Future... as seen from HPCA 1999
[Figure: layers Applications, Algorithms, Compiler, Architecture, Hardware]
Absolutely obsessed with going to the limits of extracting the available ILP on a single core
The Past Future, continued: Advanced ILP Techniques for Superscalar Processors
• Optimized Pipeline
• Cache
• Branch Predictors
• Instruction Collapsing
• Value Prediction
• Reuse
• Assisted/Subordinated Threads
• Trace Cache/Processor
• Control/Data Speculation
• Kilo-instruction Processors
• …
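As one concrete instance of the branch predictors in the list above, here is a minimal Python sketch of a classic bimodal predictor: a table of 2-bit saturating counters indexed by low-order PC bits. The class and function names, the table size, the initial counter value and the 4-byte instruction alignment are illustrative assumptions, not details from the talk.

```python
class BimodalPredictor:
    """Hypothetical bimodal predictor: 2-bit saturating counters indexed by PC.

    Counter values 0-1 predict not-taken, 2-3 predict taken.
    """

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        # Every counter starts at 1 ("weakly not taken"), an assumed choice.
        self.counters = [1] * (1 << index_bits)

    def _index(self, pc: int) -> int:
        # Drop the 2 low bits, assuming 4-byte aligned instructions.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)  # saturate at 3
        else:
            self.counters[i] = max(0, self.counters[i] - 1)  # saturate at 0


def accuracy(predictor: BimodalPredictor, trace: list[tuple[int, bool]]) -> float:
    """Fraction of correct predictions over a trace of (pc, taken) pairs."""
    correct = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(trace)


# A loop branch taken 9 times then falling through, executed 10 times.
loop_trace = [(0x400100, i % 10 != 9) for i in range(100)]
print(f"{accuracy(BimodalPredictor(), loop_trace):.2f}")  # 0.89
```

After warm-up the counter mispredicts only the loop exit once per pass; a 1-bit "last outcome" predictor would also mispredict the first iteration of the next pass, which is the usual argument for 2-bit counters.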
[Figure: m88ksim code structure (FETCH, EXE, TIMING, statistics; routines such as fetch_next, check_issue, kill_time, cmmutime, Real_execution)]
Distant Parallelism: Non-numerical applications
• (In)Dependent threads: e.g. m88ksim
• Application speed-up: 2.65
The “immediate” future: the number of cores doubles every 18 months
“It is better for Intel to get involved in this now so when we get to the point of having 10s and 100s of cores we will have the answers. There is a lot of architecture work to do to release the potential, and we will not bring these products to market until we have good solutions to the programming problem.” (Justin Rattner, Intel CTO)
MareNostrum: “Most beautiful supercomputer” (Fortune magazine, Sept. 2006); #1 in Europe, #5 in the world; 100's of TeraFlops with a general-purpose Linux supercluster of commodity PowerPC-based blade servers
“Now, the grains inside these machines more and more will be multi-core type devices, and so the idea of parallelization won't just be at the individual chip level, even inside that chip we need to explore new techniques like transactional memory that will allow us to get the full benefit of all those transistors and map that into higher and higher performance.” (Bill Gates, Supercomputing 05 keynote)
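The 18-month doubling rule of thumb can be turned into a one-line projection, cores(t) = c0 · 2^(t / 1.5). The sketch below assumes a hypothetical starting point of 2 cores per chip; `projected_cores` is an illustrative name.

```python
def projected_cores(base_cores: int, years: float, doubling_period: float = 1.5) -> float:
    """Cores per chip after `years`, assuming one doubling every `doubling_period` years."""
    return base_cores * 2 ** (years / doubling_period)


# Hypothetical starting point: 2 cores per chip.
for years in (0, 3, 6, 9):
    print(years, round(projected_cores(2, years)))
# 0 2
# 3 8
# 6 32
# 9 128
```

Under that assumption, nine years of doubling already reaches 128 cores, the “10s and 100s of cores” range in the Rattner quote.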
Supercomputers will likely have millions of processing cores
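Why millions of cores turn parallel software into the central problem can be illustrated with Amdahl's law, which is not on the original slide: with a parallel fraction p of the work running on N cores, speedup = 1 / ((1 - p) + p / N). A minimal sketch:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: the serial part (1 - p) is not sped up at all."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# One million cores, with increasingly parallel code.
for p in (0.99, 0.999, 0.9999):
    print(p, round(amdahl_speedup(p, 1_000_000)))
```

Even with 99.99% of the work parallelized, a million cores give a speedup just under 10,000: the serial remainder, not the core count, sets the limit.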
The “far” future (e.g. 2017) and the big question!
How to solve the programming problem? a.k.a. How to program the beast?
• How to enable the power of the hundreds to millions of cores on a system?
• Computer Architects must adapt their thinking. From now on, parallel software requirements will directly drive systems design
• We need a multidisciplinary top-down approach to this, including:
  • Applications
  • Algorithms
  • Debugging
  • Programming models
  • Programming languages
  • Compilers
  • Operating Systems
  • Runtime environment
… as design drivers for future Architectures
The holistic view: A + A = A
How to solve the programming problem? a.k.a. How to program the beast?
• How to enable the power of the hundreds to millions of cores on a system?
• Computer Architects must adapt their thinking. From now on, many-core software requirements will directly drive processor design
• We need a multidisciplinary top-down approach to this, including:
  • Applications
  • Algorithms
  • Debugging
  • Programming models
  • Programming languages
  • Compilers
  • Operating Systems
  • Runtime environment
… as design drivers
Applirithms + Adhesive = Architecture
Far Future: Applications
• What will be the typical applications in 2017?
• Are the Berkeley Dwarfs and Intel's RMS (Recognition, Mining, Synthesis), together or as alternatives, the right path to follow?
• Applications are ephemeral but kernels are forever: the applications may change, the kernels stay the same.
• Will streaming applications require new architectures?
• Are we moving toward special-purpose accelerators for specific applications?
M. Valero. Microsoft Workshop on Multicore, Seattle, June 2007
Far Future: Algorithms
• Bad news (for some folks): “Rethink and rewrite the algorithms”
• For many-cores, the algorithms need to carefully consider:
  • The right level of parallelism
  • Load balancing
  • Communication-computation overlapping
  • Speculation (e.g. in message passing)
Source: Jack Dongarra, Microsoft Workshop on Multicore, Seattle, June 2007
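Load balancing, one of the points above, can be illustrated with a classic greedy heuristic that is not from the talk: longest-processing-time-first (LPT) scheduling, which hands each task, largest first, to the currently least-loaded core. The task costs and function names below are hypothetical.

```python
import heapq


def lpt_schedule(task_costs: list[float], cores: int) -> list[float]:
    """Greedy LPT: assign each task (largest first) to the least-loaded core.

    Returns the final load of every core.
    """
    heap = [(0.0, core) for core in range(cores)]  # (current load, core id)
    loads = [0.0] * cores
    for cost in sorted(task_costs, reverse=True):
        load, core = heapq.heappop(heap)  # least-loaded core
        loads[core] = load + cost
        heapq.heappush(heap, (loads[core], core))
    return loads


def imbalance(loads: list[float]) -> float:
    """Max load over mean load: 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))


tasks = [7, 5, 4, 3, 3, 2, 2, 1, 1]  # hypothetical task costs
loads = lpt_schedule(tasks, cores=3)
print(loads, round(imbalance(loads), 2))  # [10.0, 9.0, 9.0] 1.07
```

For these nine tasks (total cost 28) on three cores, the busiest core carries 10 units against an ideal of about 9.33, so the whole computation waits roughly 7% longer than a perfect split would.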
Top-Down CMP Design: an initial programmer wishlist
• Easy-to-express parallelism
• Transactional Memory (TM): compared to locks, TM provides an easy-to-use mechanism for ensuring mutual exclusion
• Hide all kinds of non-uniformity from the programmer (heterogeneous cores, non-uniform memory access, …)
• Continue using standard tools
  • OpenMP: the industry standard for writing parallel programs on shared memory
  • TM and OpenMP combine ease with familiarity for programming multi-cores
    • BSC-UPC-Microsoft: IWOMP07, MEDEA07
    • Stanford: PACT07
• Dataflow model ideally suited to express parallelism
  • Cell Superscalar = Distant Parallelism + Data Flow + Out of Order Execution
• Supercomputers: MPI + (OpenMP/Cell Superscalar) + TM
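The “Cell Superscalar = Distant Parallelism + Data Flow + Out of Order Execution” idea can be sketched as follows: tasks declare what they read and write, a runtime derives dependencies in program order (read-after-write, write-after-read, write-after-write), much as a superscalar core does for registers, and any task whose predecessors have finished may run. This is a toy Python model under that assumption; `Task`, `build_graph` and `waves` are hypothetical names, not the CellSs API.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    inputs: set[str]
    outputs: set[str]
    deps: set[str] = field(default_factory=set)  # names of predecessor tasks


def build_graph(tasks: list[Task]) -> None:
    """Fill in each task's deps by scanning the tasks in program order."""
    last_writer: dict[str, str] = {}    # datum -> most recent writer
    readers: dict[str, list[str]] = {}  # datum -> readers since that write
    for task in tasks:
        for datum in task.inputs:  # read-after-write (true dependence)
            if datum in last_writer:
                task.deps.add(last_writer[datum])
        for datum in task.outputs:
            if datum in last_writer:  # write-after-write
                task.deps.add(last_writer[datum])
            task.deps.update(readers.get(datum, []))  # write-after-read
        for datum in task.inputs:
            readers.setdefault(datum, []).append(task.name)
        for datum in task.outputs:
            last_writer[datum] = task.name
            readers[datum] = []


def waves(tasks: list[Task]) -> list[list[str]]:
    """Group tasks into waves; tasks within one wave are mutually independent."""
    done: set[str] = set()
    pending = list(tasks)
    result = []
    while pending:
        ready = [t.name for t in pending if t.deps <= done]
        result.append(ready)
        done.update(ready)
        pending = [t for t in pending if t.name not in done]
    return result


tasks = [
    Task("A", inputs={"x"}, outputs={"y"}),
    Task("B", inputs={"x"}, outputs={"z"}),
    Task("C", inputs={"y", "z"}, outputs={"w"}),
    Task("D", inputs=set(), outputs={"x"}),
]
build_graph(tasks)
print(waves(tasks))  # [['A', 'B'], ['C', 'D']]
```

Task D overwrites `x`, which A and B still read, so write-after-read edges hold it back until both finish; renaming `x`, as register renaming does in hardware, would remove those false dependencies and let D start in the first wave.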
[Figure: caches and memories connected by on-die interconnects, linked through an off-die interconnect]
Chip organization in 2017: many-core
• How many cores will the processor of 2017 have?
• Will they be homogeneous or heterogeneous? Arrays of simple in-order cores, fewer complex out-of-order cores, or a mix of the two?
[Image: ConSentry and Internet Security]
• Is Simultaneous Multithreading just for servers?
• Should we push to further optimize classical OoO implementations, or research how to put radical new approaches such as dataflow or asynchronous architectures into practical use?
Microsoft Workshop on Multicore, Seattle, June 2007
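The “arrays of simple cores, fewer complex cores, or a mix” question can be explored with a simple analytical model that is not part of the talk: Hill and Marty's multicore extension of Amdahl's law. A chip has a budget of n base core equivalents (BCEs), a core built from r BCEs is assumed to run sequential code at perf(r) = √r, and f is the parallel fraction. The sketch below encodes those assumptions; the function names are illustrative.

```python
import math


def perf(r: float) -> float:
    """Assumed sequential performance of a core built from r BCEs."""
    return math.sqrt(r)


def symmetric(f: float, n: int, r: int) -> float:
    """n BCEs split into n / r identical cores of r BCEs each."""
    return 1.0 / ((1 - f) / perf(r) + f * r / (perf(r) * n))


def asymmetric(f: float, n: int, r: int) -> float:
    """One big core of r BCEs plus n - r single-BCE cores (all used in parallel phases)."""
    return 1.0 / ((1 - f) / perf(r) + f / (perf(r) + n - r))


f, n = 0.975, 256  # hypothetical parallel fraction and chip budget
for r in (1, 16, 64):
    print(r, round(symmetric(f, n, r), 1), round(asymmetric(f, n, r), 1))
```

With f = 0.975 and n = 256, 256 simple cores reach a speedup of about 35, while one 64-BCE core plus 192 simple cores reaches 125: under these assumptions the heterogeneous mix wins because the big core shortens the serial phase.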
Chip organization in 2017: memory and interconnection network
• How will the latency and bandwidth problems be addressed?
• 3D-integration-aware Computer Architecture: it is a great future idea. Will it always remain a great future idea?
• What is the best many-core interconnect topology?
• How can we evaluate the importance of the interconnection network for applications?
• What obstacles do parallel applications face when I/O doesn't scale well?
Microsoft Workshop on Multicore, Seattle, June 2007
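A first-order way to compare many-core interconnect topologies is hop count. The sketch below builds a 64-node ring, an 8×8 2D mesh and an 8×8 2D torus as adjacency lists and measures the average and maximum (diameter) shortest-path distance with breadth-first search. It deliberately ignores link bandwidth, contention and wire length, which a real evaluation would need; the function names are illustrative.

```python
from collections import deque
from itertools import product


def ring(n: int) -> dict:
    """n nodes, each linked to its two neighbours."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def mesh(k: int, wrap: bool = False) -> dict:
    """k x k 2D mesh; wrap=True adds wrap-around links, giving a 2D torus."""
    nbrs = {}
    for x, y in product(range(k), repeat=2):
        out = []
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if wrap:
                out.append((nx % k, ny % k))
            elif 0 <= nx < k and 0 <= ny < k:
                out.append((nx, ny))
        nbrs[(x, y)] = out
    return nbrs


def hop_stats(graph: dict) -> tuple:
    """(average, maximum) shortest-path hop count over all ordered node pairs."""
    total, pairs, diameter = 0, 0, 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # BFS from src
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for node, d in dist.items():
            if node != src:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return total / pairs, diameter


for name, g in (("ring", ring(64)), ("mesh", mesh(8)), ("torus", mesh(8, wrap=True))):
    avg, diam = hop_stats(g)
    print(f"{name}: avg {avg:.2f} hops, diameter {diam}")
# ring: avg 16.25 hops, diameter 32
# mesh: avg 5.33 hops, diameter 14
# torus: avg 4.06 hops, diameter 8
```

Hop count only captures latency under zero load; the importance of the network for a given application also depends on its communication pattern and volume.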
An overall picture of the Microsoft Many-core project
[Figure: layers Applications; Programming model (imperative, functional); Transactional Memory (STM, HTM); Architecture]
• Programming models for future many-core architectures
• Architectural support for programming models
• OpenMP+TM
• HW acceleration for Haskell
• Many-core architecture
• Power-aware
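The software side of “STM vs. HTM” can be illustrated with a deliberately small optimistic STM in Python: a transaction buffers its writes, remembers the version of every variable it reads, and at commit time validates those versions and either publishes its writes or retries. The names `TVar` and `atomically` are borrowed from Haskell's STM library for familiarity, but this is a hypothetical toy, not that library's design or the project's implementation; a single global lock serializes commits and snapshot reads for simplicity.

```python
import threading


class TVar:
    """A transactional variable: a value plus a version bumped on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Conflict(Exception):
    """Raised when a variable read by the transaction changed before commit."""


_lock = threading.Lock()  # guards snapshots and commits (global-lock simplification)


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed at first read
        self.writes = {}  # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:  # see our own buffered write
            return self.writes[tvar]
        with _lock:  # take value and version as a consistent pair
            value, version = tvar.value, tvar.version
        self.reads.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value  # invisible to other threads until commit

    def commit(self):
        with _lock:
            for tvar, seen in self.reads.items():
                if tvar.version != seen:
                    raise Conflict()
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1


def atomically(body):
    """Run body(tx) until it commits without a conflict; return its result."""
    while True:
        tx = Transaction()
        result = body(tx)
        try:
            tx.commit()
            return result
        except Conflict:
            continue  # another transaction committed first; re-execute


counter = TVar(0)


def increment(tx):
    tx.write(counter, tx.read(counter) + 1)


def worker(n):
    for _ in range(n):
        atomically(increment)


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 4000
```

Four threads each perform 1,000 increments; a commit fails whenever another thread has committed to `counter` since it was read, so no update is lost. An HTM would perform the same read-set/write-set tracking and conflict detection in hardware.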
An overall picture of the IBM MareIncognito project
[Figure: application development and tuning; fine-grain programming models; model and prototype; performance analysis and prediction tools; load balancing; processor and node; interconnect]
• Our 10-100 Petaflop research project for BSC (2010)
• Port/develop applications to reduce time-to-production once installed
• Programming models (MPI, OpenMP+TM, CellSs)
• Tools for application development and to support the previous evaluations
• Evaluate node architecture (heavily multicored)
• Evaluate interconnect options
Supercomputing and e-Science Consolider program: 5 Grand Challenge applications, 22 groups, 119 senior researchers
[Figure: application areas (Life Sciences, Earth Sciences, Astrophysics, Engineering, Material Sciences) and computer-science areas (Compilers and tuning of application kernels; Programming models and performance tuning tools; Architectures and hardware technologies); links marked “Strong interaction” and “Interaction to be created”]
Education for multi-core
[Images: “I … programming multicores”; “Multicore-based pacifier”]