110 likes | 205 Views
Future mass a pps reflect a concurrent world. Exciting applications in future mass computing market represent and model physical world. Traditionally considered “supercomputing apps” or super-apps.
E N D
Future mass apps reflect a concurrent world • Exciting applications in future mass computing market represent and model physical world. • Traditionally considered “supercomputing apps” or super-apps. • Physiological simulation, Molecular dynamics simulation, Video and audio manipulation, Medical imaging, Consumer game and virtual reality products • Attempts to grow current architectures “out” or domain-specific architectures “in” lack success; a more broad approach to cover more domains is promising
MPEG Encoding Parallelism • Independent IPPP sequences • Frames: independent 16x16 pel macroblocks • Localized dependence of P-frame macroblocks on previous frame • Steps of macroblock processing exhibit finer grained parallelism, each block spans function boundaries
Building on HPF Compilation: what’s new? • Applicability to mass software base - requires pointer analysis, control flow analysis, data structure and object analysis, beyond traditional dependence analysis • Domain-specific, application model languages • More intuitive than C for inherently parallel problems • increased productivity, increased portability • Will still likely have C as implementation language • There is room for a new app language or a family of languages • Role for the compiler in model language environments • Model can provide structured semantics for the compiler, beyond what can be derived from analysis of low-level code • Compiler can magnify the usefulness of model information with its low-level analysis
Pointer analysis: sensitivity, stability and safety Fulcra in OpenIMPACT [SAS2004, PASTE2004] and others Improved efficiency increases the scope over which unique, heap-allocated objects can be discovered Improved analysis algorithms provide more accurate call graphs (below) instead of a blurred view (above) for use by program transformation tools
Thoughts from the VLIW/EPIC Experience • Any significant compiler work for a new computing platform takes 10-15 years to mature • 1989-1998 initial academic results from IMPACT • 1995-2005 technology collaboration with Intel/HP • 2000-2005 SPEC 2000, Itanium 1 and 2, open source apps • This was built on significant work from Multiflow, Cydrom, RISC, HPC teams • Real work in compiler development begins when hardware arrives • IMPACT output code performance improved by more than 20% since arrival of Itanium hardware – and much more stable • Most apps brought up with IMPACT after Itanium systems arrived: debugging! • Real performance effects can only be measured on hardware • Early access to hardware for academic compiler teams crucial and must a priority for industry development team. • Quantitative methodology driven by large apps is key • Innovations evaluated in whole system context
Heavyweight loops How the next-generation compiler will do it (1) • To-do list: • Identify acceleration opportunities • Localize memory • Stream data and overlap computation • Acceleration opportunities: • Heavyweight loops identified for acceleration • However, they are isolated in separate functions called through pointers
How the next-generation compiler will do it (2) Initialization code identified Large constant lookup tables identified • To-do list: • Identify acceleration opportunities • Localize memory • Stream data and overlap computation • Localize memory: • Pointer analysis identifies indirect callees • Pointer analysis identifies localizable memory objects • Private tables inside accelerator initialized once, saving traffic
How the next-generation compiler will do it (3) Constant table privatized Summarize output access pattern Summarize input access pattern • To-do list: • Identify acceleration opportunities • Localize memory • Stream data and overlap computation • Streaming and computation overlap: • Memory dataflow summarizes array/pointer access patterns • Opportunities for streaming are automatically identified • Unnecessary memory operations replaced with streaming
How the next-generation compiler will do it (4) • To-do list: • Identify acceleration opportunities • Localize memory • Stream data and overlap computation • Achieve macropipelining of parallelizable accelerators • Upsampling and color conversion can stream to each other • Optimizations can have substantial effect on both efficiencyand performance
Memory dataflow in the pointer world Array of constant pointers • Arrays are not true 3D arrays (unlike in Fortran) • Actual implementation: array of pointers to array of samples • New type of dataflow problem – understanding the semantics of memory structures instead of true arrays Row arrays never overlap