
Other Applications of Dependence



  1. Other Applications of Dependence (Allen and Kennedy, Chapter 12)

  2. Overview • So far, we’ve discussed dependence analysis in Fortran • Dependence analysis can be applied to any language and translation context where arrays and loops are useful • Application to C and C++ • Application to hardware design

  3. Problems of C • C as a “typed assembly language” versus Fortran as a “high-performance language” • C focuses more on ease of use and direct access to hardware operations • Post-increment, pre-increment, register variables • Fortran’s focus is on ease of optimization

  4. Problems of C • In many cases, optimization is not desired: while (!(t=*p)); • An optimizer would move the load of *p outside the loop, breaking the poll • C++ as well as other newer languages focus more on simplified software development, at the expense of optimizability • Use of these languages has expanded into areas where optimization is required
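
  A minimal C sketch of this hazard, assuming p polls a memory-mapped device register (the address below is hypothetical); the volatile qualifier, discussed later in these slides, is what forbids the hoist:

      void wait_ready(void) {
          /* hypothetical memory-mapped device register address */
          volatile int *p = (volatile int *) 0xFFFF0000;
          int t;
          while (!(t = *p))   /* volatile forces a fresh read of *p each
                                 iteration; without it, an optimizer could
                                 hoist the load and spin forever */
              ;
          (void) t;           /* t holds the first nonzero status value */
      }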

  5. Problems of C • Pointers • The memory locations accessed through a pointer are not evident • Aliasing • C does not guarantee that arrays passed into a subroutine do not overlap • Side-effect operators • Operators such as pre- and post-increment encourage a style in which array operations are strength-reduced by the programmer
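
  A small sketch of the aliasing problem, using a hypothetical copy routine; nothing stops the caller from passing overlapping regions:

      /* The compiler must assume dst and src may overlap, e.g. a call
         such as shift(a + 1, a, 99), so every iteration may depend on
         the previous one and the loop cannot be safely vectorized. */
      void shift(int *dst, int *src, int n) {
          for (int i = 0; i < n; i++)
              dst[i] = src[i];
      }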

  6. Problems of C • Loops • Fortran DO loops provide known induction values and restrictions that simplify optimization; C loops provide almost none

  7. Pointers • Two fundamental problems • A pointer variable can point to different memory locations during its use • A memory location can be accessed through more than one pointer variable at any given time, producing aliases for the location • The result is much more difficult and expensive dependence testing

  8. Pointers • Without knowledge of all possible references to an array, compilers must assume dependence • Analyzing the entire program to determine dependences is possible, but still unsatisfactory • This leads to the use of compiler options / pragmas • Safe parameters • All pointer parameters to a function point to independent storage • Safe pointers • All pointer variables (parameter, local, global) point to independent storage
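
  The “safe parameters” assertion corresponds to what the C99 restrict qualifier later standardized; a sketch using the hypothetical shift routine from above:

      /* restrict asserts that dst and src point to independent storage,
         the per-parameter form of the "safe parameters" option; the loop
         is now provably free of cross-iteration dependences. */
      void shift(int *restrict dst, int *restrict src, int n) {
          for (int i = 0; i < n; i++)
              dst[i] = src[i];
      }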

  9. Naming and Structures • In Fortran, a block of storage can be uniquely identified by a single name • In C, consider these constructs, which may all refer to different storage: p; *p; **p; *(p+4); *(&p+4);

  10. Naming and Structures • Troublesome structures, such as unions • Naming problem • What is the name of ‘a.b’? • Unions allow different-sized objects to overlap the same storage • Reduce references to the smallest common unit of storage possible
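
  A sketch of the union naming problem, on a hypothetical overlay type (the layout assumption is noted in the comment):

      /* Different-sized objects overlapping the same storage: writing
         u->word touches all of u->bytes, so dependence testing must name
         both references by their smallest common unit of storage. */
      union overlay {
          int  word;           /* assumes a 4-byte int */
          char bytes[4];
      };

      void touch(union overlay *u) {
          u->word = 0;
          u->bytes[2] = 1;     /* aliases part of u->word */
      }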

  11. Loops • Lack of constraints in C • Jumping into the loop body is permitted • The induction variable (if there is one) can be modified in the body of the loop • The loop increment value may also be changed • The conditions controlling the initiation, increment, and termination of the loop have no constraints on their form

  12. Loops • To rewrite a C loop as a DO loop (see the sketch below): • It must have one induction variable • That variable must be initialized to the same value on all paths into the loop • The variable must have one and only one increment within the loop • The increment must be executed on every iteration • The termination condition must be consistent with the increment • No jumps into the loop body from outside
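
  A sketch of a C loop that satisfies all of these conditions, and its DO-style rewrite (routine names are illustrative):

      /* Before: a legal C loop that happens to satisfy the conditions
         above (one induction variable, initialized on the only path in,
         a single increment executed every iteration, matching test). */
      void clear_while(int *a, int n) {
          int i = 0;
          while (i < n) {
              a[i] = 0;
              i++;
          }
      }

      /* After: the equivalent DO-style counted loop. */
      void clear_for(int *a, int n) {
          for (int i = 0; i < n; i++)
              a[i] = 0;
      }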

  13. Scoping and Statics • Create unique symbols for variables with the same name but different scopes • Static variables • Which procedures have access to the variable can be determined from the scope information • If it contains an address, however, the contents of that address can be modified by any other procedure

  14. Problematic C Dialects • Use of pointers rather than arrays • Use of side-effect operators • Both complicate the work of optimizers and need to be rewritten away • Use of address and dereference operators

  15. Problematic C Dialects • These require enhancements to some transformations • Constant propagation • Treat address operators as constants and propagate them where profitable • Replace a generic pointer inside a dereference with the actual address • Expression simplification and recognition • Need stronger recognition within expressions of which variable is actually the ‘base variable’
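
  A sketch of both enhancements on a hypothetical fragment: the address constant &a[0] is propagated into the dereference, which then simplifies to an array reference whose base variable is a:

      int a[100];
      int *p = &a[0];     /* address operator treated as a constant */

      void before(void) {
          *(p + 4) = 1;   /* generic pointer inside a dereference   */
      }

      void after(void) {
          a[4] = 1;       /* *(&a[0] + 4) simplified to a[4]        */
      }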

  16. Problematic C Dialects • Conversion into array references • It is useful to convert pointers into array references • Induction-variable substitution • Undoes the programmer’s strength reduction of array references • Expanding side-effect operators also requires changes
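
  A sketch of the conversion, using a hypothetical scale routine: the pointer-walking, strength-reduced form first, then the array-reference form that dependence testing prefers:

      /* As a C programmer might write it: the subscript is hidden in a
         side-effect operator on a walking pointer. */
      void scale_ptr(float *a, int n) {
          float *p = a;
          while (p < a + n)
              *p++ *= 2.0f;
      }

      /* After induction-variable substitution and conversion to array
         references, the subscript is explicit and analyzable. */
      void scale_array(float *a, int n) {
          for (int i = 0; i < n; i++)
              a[i] *= 2.0f;
      }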

  17. C Miscellaneous • Volatile variables • Functions containing these variables are best left unoptimized • Setjmp and longjmp • Commonly used for error handling • They store and reload the current state of the computation, which is complex when optimization has been performed and variables are allocated to registers • No optimization

  18. C Miscellaneous • Varargs and stdarg • Variable number of arguments • No optimization

  19. Hardware Design: Overview • Today, most hardware design is language-based • Textual description of hardware in languages similar to those used to develop software • The level of abstraction is moving from low-level, detailed implementation toward high-level behavioral specification • Key factor: compiler technology

  20. Hardware Design: Overview • Four levels of abstraction • Circuit / physical level • Diagrams of electronic components • Logic level • Boolean equations • Register transfer level (RTL) • Control state transitions and data transfers, with timing • Synthesis: conversion from RTL to its implementation • System level • Concentrates on behavior • Behavioral synthesis

  21. Hardware Design • Behavioral synthesis is really a compilation problem • Two fundamental tasks • Verification • Implementation • Simulation of hardware is slow

  22. Hardware Description Languages • Verilog and VHDL • Extensions in Verilog • Multi-valued logic: 0, 1, x, z • x = unknown value, z = high impedance • E.g., division by zero produces the x state • Operations on x produce x -> such operations can’t be executed directly • Reactivity • Changes are propagated automatically • “always” statement -> continuous execution • “@” operator -> blocks execution until one of the operands changes in value

  23. Verilog • Reactivity:

      always @(b or c)
          a = b + c;

  • Objects • A specific area of silicon • A completely separate area on the chip • Connectivity • Continuous passing of information • Input ports and output ports

  24. Verilog • Connectivity:

      module add(a, b, c);
          output a;
          input b, c;
          integer a, b, c;
          always @(b or c)
              a = b + c;
      endmodule

  25. Verilog • Instantiation • Verilog only allows static instantiation:

      integer x, y, z;
      add adder1(x, y, z);

  • Vector operations • Viewing other data structures as vectors of scalars

  26. Verilog • Advantages • No aliasing • Restricted form of subscripts • The entire hardware design is given to the compiler at one time

  27. Verilog • Disadvantages • Non-procedural continuation semantics • Lack of explicit loops • Loops are implicitly represented by always blocks and the scheduler • Size of designs

  28. Optimizing simulation • Philosophy • Increase the level of abstraction • Opt for fewer details • Inlining modules • HDLs have two properties that make module inlining simpler • The whole design is reachable at one time • Recursion is not permitted

  29. Optimizing simulation • Execution ordering • The order in which statements are executed can have a dramatic effect on efficiency • What is fast in hardware (everything in parallel) is not fast in software • Grouping increases performance • Execute blocks in topological order based on the dependence graph of individual array elements • No memory overhead
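
  A minimal sketch of the grouped, statically ordered execution; the block table and its topological ordering are assumed to have been built beforehand from the dependence graph:

      typedef void (*block_fn)(void);

      /* blocks[] is assumed sorted so that every producer precedes its
         consumers; one unconditional sweep then evaluates each block
         exactly once, with no event queue and no memory overhead. */
      void sweep(block_fn blocks[], int nblocks) {
          for (int i = 0; i < nblocks; i++)
              blocks[i]();
      }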

  30. Dynamic versus Static Scheduling • Dynamic scheduling • Dynamically track changes in values and propagate them • Mimics hardware • Overhead of change checks • Static scheduling • Blindly sweeps through all values for all objects regardless of any changes • No need for change checks
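
  A sketch of the two disciplines for a single always-style block computing a = b + c (the names are illustrative):

      int a, b, c;

      /* Dynamic scheduling: mimic hardware by re-evaluating only when
         an input changed; the comparisons are the change-check overhead. */
      void dynamic_step(void) {
          static int b_prev, c_prev;
          if (b != b_prev || c != c_prev) {
              a = b + c;
              b_prev = b;
              c_prev = c;
          }
      }

      /* Static scheduling: evaluate unconditionally in a precomputed
         order; no change checks, at the cost of redundant evaluation. */
      void static_step(void) {
          a = b + c;
      }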

  31. Dynamic versus Static Scheduling • If the circuit is highly active, static scheduling is more suitable • In general, dynamic scheduling guided by static analysis provides the best results

  32. Fusing always blocks • The high cost of change checks motivates fusing always blocks • Fusion must respect dependences, or the output of the design may change

  33. Vectorizing always blocks • Regrouping low-level operations to recover higher-level abstractions • Vectorizing the bit operations
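
  A sketch in C of what re-vectorization recovers, assuming the RTL expanded a word-wide XOR into per-bit operations:

      #include <stdint.h>

      /* Bit-level form: one simulated operation per wire. */
      void xor_bits(const uint8_t a[32], const uint8_t b[32], uint8_t r[32]) {
          for (int i = 0; i < 32; i++)
              r[i] = a[i] ^ b[i];
      }

      /* Re-vectorized form: the 32 bit operations regrouped into a
         single word-wide XOR, recovering the higher-level abstraction. */
      uint32_t xor_word(uint32_t a, uint32_t b) {
          return a ^ b;
      }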

  34. Two state versus four state • Extra overhead in four-state simulation • Few people like hardware that enters unknown states • Two-state logic can be 3-5x faster • Use two-valued logic wherever possible • Determining which parts are executable in two-state logic is difficult • Use interprocedural analysis

  35. Two state versus four state • The test for detecting unknowns is low cost, 2-3 instructions • Check for unknowns, but default quickly to two-state execution
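
  A sketch of this strategy for a bitwise AND, using one plausible two-plane encoding of four-state values (the encoding is an assumption, not the book's exact scheme):

      #include <stdint.h>

      /* Per bit, val/unk planes encode 0=(0,0), 1=(1,0), z=(0,1), x=(1,1),
         so unk == 0 means the whole word is plain two-state data. */
      typedef struct { uint32_t val; uint32_t unk; } fourstate;

      fourstate fs_and(fourstate a, fourstate b) {
          fourstate r;
          if ((a.unk | b.unk) == 0) {   /* the cheap "any unknowns?" test */
              r.val = a.val & b.val;    /* fast two-state path            */
              r.unk = 0;
              return r;
          }
          /* Slow path: a result bit is 0 if either input is definitely 0,
             1 if both are definitely 1, and x otherwise. */
          uint32_t zero = (~a.val & ~a.unk) | (~b.val & ~b.unk);
          uint32_t one  = (a.val & ~a.unk) & (b.val & ~b.unk);
          r.unk = ~zero & ~one;
          r.val = one | r.unk;          /* x encodes as val=1, unk=1 */
          return r;
      }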

  36. Rewriting block conditions • A clocked block that recomputes on every edge:

      always @(posedge clk) begin
          sum   = op1 ^ op2 ^ c_in;
          c_out = (op1 & op2) | (op2 & c_in) | (c_in & op1);
      end

  • is rewritten as a combinational block plus a cheap clocked copy:

      always @(op1 or op2 or c_in) begin
          t_sum   = op1 ^ op2 ^ c_in;
          t_c_out = (op1 & op2) | (op2 & c_in) | (c_in & op1);
      end
      always @(posedge clk) begin
          sum   = t_sum;
          c_out = t_c_out;
      end

  37. Basic Optimizations • Raise level of abstraction • Constant propagation and dead code elimination • Common subexpression elimination

  38. Synthesis Optimization • The goal is to insert the details • Analogous to standard compilation • Harder than standard compilation • Not aimed at a fixed target machine • No single goal: minimize cycle time, area, and power consumption

  39. Basic Framework • Selection outweighs scheduling • Analogous to CISC instruction selection • A body of tree-matching algorithms exists • Needs constraints

  40. Loop Transformations • Original specification:

      for (i = 0; i < 100; i++) {
          t[i] = 0;
          for (j = 0; j < 4; j++)
              t[i] = t[i] + (a[i-j] >> 2);
      }
      for (i = 0; i < 100; i++) {
          o[i] = 0;
          for (j = 0; j < 100; j++)
              o[i] = o[i] + m[i][j] * t[j];
      }

  41. Loop Transformations • After loop distribution:

      for (i = 0; i < 100; i++)
          t[i] = 0;
      for (i = 0; i < 100; i++)
          o[i] = 0;
      for (i = 0; i < 100; i++)
          for (j = 0; j < 4; j++)
              t[i] = t[i] + (a[i-j] >> 2);
      for (i = 0; i < 100; i++)
          for (j = 0; j < 100; j++)
              o[i] = o[i] + m[i][j] * t[j];

  42. Loop Transformations • After interchanging the second nest and fusing the i loops:

      for (i = 0; i < 100; i++)
          o[i] = 0;
      for (i = 0; i < 100; i++) {
          t[i] = 0;
          for (j = 0; j < 4; j++)
              t[i] = t[i] + (a[i-j] >> 2);
          for (j = 0; j < 100; j++)
              o[j] = o[j] + m[j][i] * t[i];
      }

  43. Loop Transformations • After scalar replacement of t and the a[] window:

      for (i = 0; i < 100; i++)
          o[i] = 0;
      a0 = a[0];  a1 = a[-1];  a2 = a[-2];  a3 = a[-3];
      for (i = 0; i < 100; i++) {
          t = (a0 >> 2) + (a1 >> 2) + (a2 >> 2) + (a3 >> 2);
          a3 = a2;  a2 = a1;  a1 = a0;  a0 = a[i+1];
          for (j = 0; j < 100; j++)
              o[j] = o[j] + m[j][i] * t;
      }

  44. Control and Data Flow • Von Neumann architecture • Data movement among memory and registers • Control flow is encapsulated in the program counter and effected with branches • Synthesized hardware • Data movement among functional units • Control flow is which functional unit should be active on which data at which time step

  45. Control and Data Flow • Wires • Immediate transfer • Latches • Values held throughout one clock cycle • Registers • Like static variables in C • Values held across one or more clock cycles • Memories

  46. Memory Reduction • Memory access is slow compared to functional-unit access • Applicable techniques • Loop interchange • Loop fusion • Scalar replacement • Strip mining • Unroll-and-jam • Prefetching

  47. Summary • Dependence analysis is not limited to Fortran • It has other applications, including C/C++ and hardware design • Much of this work is at an early stage of research
