Crosscutting Issues: The Rôle of Compilers • Architects must be aware of current compiler technology • [Diagram labels: Architecture, Compiler]
Modern Compilers • Front End • High-level Optimisations: e.g. procedure inlining, loop transformations • Global Optimiser: register allocation • Code Generator: machine-dependent optimisations
Compiler Technology • Multiple passes complicate matters • E.g. common subexpression elimination must assume that a register will be allocated for the temporary value • E.g. Procedure inlining before size is known • Register allocation is critical • Uses graph colouring techniques • Requires at least 16 registers to be effective
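The graph-colouring idea can be sketched as follows. This is a minimal greedy illustration, not the algorithm any particular compiler uses; the function name `colour_registers`, the example interference graph, and the register counts are all assumptions made for this example. Variables that are live at the same time are joined by an edge and must receive different registers; a variable with no free colour is spilled to memory.

```python
def colour_registers(interference, num_registers):
    """Greedy graph colouring. Returns (assignment, spilled).

    interference maps each variable to the set of variables that are
    live at the same time and therefore cannot share a register.
    """
    assignment = {}
    spilled = []
    # Visit the most constrained variables (highest degree) first;
    # ties are broken alphabetically so the result is deterministic.
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    for var in order:
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(num_registers) if r not in taken]
        if free:
            assignment[var] = free[0]
        else:
            spilled.append(var)  # no register left: keep this value in memory
    return assignment, spilled


# Hypothetical interference graph for four temporaries.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

with_three, spilled_three = colour_registers(graph, 3)
with_two, spilled_two = colour_registers(graph, 2)
```

With three registers every temporary gets a register; with two, one of them has to be spilled. This is why a small register file limits the technique.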
Architectural Issues • How are variables allocated and addressed? • Stack: local variables, scalars • Global data area: global variables, constants, arrays • Heap: dynamic objects, not scalars • How many registers are needed? • Integer: 26 registers • FP: 20 registers
Aiding Compiler Writers • Architectures should: • Be regular (orthogonal instruction set) • Provide primitives, not solutions • Simplify trade-offs among alternatives • Not require run-time interpretation of data known at compile-time (counter-example: VAX CALLS) • Keep it simple!
Compiler Support for Multimedia Instructions • SIMD instructions act on multiple smaller data items in a large “word” • Solutions, not primitives! • Too few registers! • Data types not found in programming languages! • Result: only used by low-level graphics libraries
Multimedia Instructions • These SIMD instructions act like a “mini-vector” architecture • E.g. MMX in 64 bits • 8 × 8-bit vectors • 4 × 16-bit vectors • 2 × 32-bit vectors • SSE: 128 bits • Much more limited than genuine vector processors
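A partitioned add on a 64-bit word can be emulated in software as below. This is only a sketch of the idea: the helper names (`split_lanes`, `join_lanes`, `partitioned_add`) are invented for this example and do not correspond to MMX mnemonics. The point is that carries do not cross lane boundaries, unlike an ordinary 64-bit add.

```python
def split_lanes(word, lane_bits, width=64):
    """Split a width-bit word into lanes, least significant lane first."""
    mask = (1 << lane_bits) - 1
    return [(word >> shift) & mask for shift in range(0, width, lane_bits)]


def join_lanes(lanes, lane_bits):
    """Pack lanes (least significant first) back into one word."""
    mask = (1 << lane_bits) - 1
    word = 0
    for index, lane in enumerate(lanes):
        word |= (lane & mask) << (index * lane_bits)
    return word


def partitioned_add(a, b, lane_bits, width=64):
    """Add lane by lane with wrap-around; no carry leaves a lane."""
    mask = (1 << lane_bits) - 1
    sums = [
        (x + y) & mask
        for x, y in zip(split_lanes(a, lane_bits, width),
                        split_lanes(b, lane_bits, width))
    ]
    return join_lanes(sums, lane_bits)


a = 0x01FF030405060708
b = 0x0101010101010101
simd_sum = partitioned_add(a, b, 8)          # 8 x 8-bit lanes
scalar_sum = (a + b) & ((1 << 64) - 1)       # ordinary 64-bit add
```

In `simd_sum` the lane holding `0xFF` wraps to `0x00` without disturbing its neighbour, whereas in `scalar_sum` the carry propagates into the next byte.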
Putting It All Together: MIPS • 64-bit load/store design • RISC features: • GPR, load-store architecture • Small, simple instruction set • Designed for efficient pipelining (fixed length instructions) • Efficient compiler target
MIPS • 32 64-bit integer registers • R0…R31 • R0 fixed: 0 • 32 64-bit or 32-bit floating point registers • Supports “paired single” operations
MIPS Data Types • Integer: • Bytes, 16-bit halfwords, 32-bit words, 64-bit double words • Operations are all 64-bit • Floating point: • 32-bit and 64-bit
MIPS Addressing Modes • Only immediate and displacement • 16-bit displacements/immediates • Register-indirect: set displacement = 0 • 16-bit absolute: use R0 • Byte addressable with 64-bit addresses • Big-endian or little-endian • Alignment required
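The way a single displacement mode covers register-indirect and absolute addressing can be sketched as follows. The function names and the register-file representation (a plain list of 32 integers) are assumptions for illustration only; the displacement is treated as a sign-extended 16-bit value.

```python
ADDRESS_MASK = (1 << 64) - 1  # 64-bit byte addresses


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed 2's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(regs, base, displacement):
    """Displacement addressing: address = regs[base] + sign-extended displacement."""
    base_value = 0 if base == 0 else regs[base]  # R0 always reads as zero
    return (base_value + sign_extend16(displacement)) & ADDRESS_MASK


def is_aligned(address, size_bytes):
    """An access of size_bytes must start at a multiple of its size."""
    return address % size_bytes == 0


regs = [0] * 32
regs[4] = 0x1000

displaced = effective_address(regs, 4, 8)            # base + 8
register_indirect = effective_address(regs, 4, 0)    # displacement = 0
absolute = effective_address(regs, 0, 0x7FF0)        # base = R0
negative = effective_address(regs, 4, 0xFFFC)        # displacement = -4
```

Note that `is_aligned(displaced, 8)` holds here, while an 8-byte access at `negative` (0x0FFC) would violate the alignment requirement.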
MIPS Instructions • Three instruction formats (field widths in bits): • I-type: opcode (6) | rs (5) | rt (5) | immediate (16) • R-type: opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6) • J-type: opcode (6) | offset (26)
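Packing the three formats into 32-bit words can be sketched as below. The encoder names are invented for this example, and the opcode and funct values used in the sample calls (0/0x20 for an add, 8 for an add-immediate, 2 for a jump) are the classic MIPS encodings as best recalled, so treat them as assumptions rather than as a reference.

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-type: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


def encode_i(opcode, rs, rt, immediate):
    """I-type: opcode(6) rs(5) rt(5) immediate(16); immediate may be negative."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def encode_j(opcode, offset):
    """J-type: opcode(6) offset(26)."""
    return (opcode << 26) | (offset & 0x3FFFFFF)


def decode_common(word):
    """Extract the fields shared by the I-type and R-type formats."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
    }


add_word = encode_r(0, 9, 10, 8, 0, 0x20)   # rd=R8, rs=R9, rt=R10
addi_word = encode_i(8, 9, 8, -1)           # rt=R8, rs=R9, immediate=-1
jump_word = encode_j(2, 0x100)
```

Because every format is 32 bits wide and the opcode, rs and rt fields sit in the same positions, a decoder can read the source registers before it knows which format it is looking at, which helps pipelining.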
MIPS Operations • Load-store • ALU operations • Add, subtract, multiply, divide, and, or, xor, LUI (load upper immediate), shifts • Control transfer • Set conditions • Branch (reg=0, reg≠0, reg1=reg2, reg1≠reg2), jump, jump-and-link (call) • Conditional move • Floating point • Paired single operations • Multiply-add (DSP)
MIPS: Instruction Usage • Integer applications: • Load, add, branch, store, or, compare • FP applications: • Add (int), load (int), load, multiply, add, store • See Figure 2.34
Another View: Trimedia Media Processor • Embedded processor for multimedia applications • E.g. set-top boxes (decoders, etc.) and TVs • Very different architecture • 128 32-bit registers (FP or int) • Partitioned (SIMD) instructions • 2’s complement and saturating arithmetic • VLIW architecture
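The difference between 2's complement (wrap-around) and saturating arithmetic can be shown with a small sketch. The function names and the 16-bit default width are assumptions for this example, not Trimedia operations.

```python
def wrap_add(a, b, bits=16):
    """2's complement add: the result wraps around on overflow."""
    mask = (1 << bits) - 1
    total = (a + b) & mask
    # Reinterpret the top bit as the sign bit.
    return total - (1 << bits) if total & (1 << (bits - 1)) else total


def saturating_add(a, b, bits=16):
    """Saturating add: the result is clamped to the representable range."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, a + b))


wrapped = wrap_add(30000, 10000)          # overflows and becomes negative
clamped = saturating_add(30000, 10000)    # sticks at the maximum value
```

For media data such as audio samples or pixel values, clamping is usually the less harmful behaviour: an overflowing bright pixel stays bright instead of wrapping to dark.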
Trimedia: VLIW Approach • Compiler can group up to five instructions for simultaneous execution • Must be independent • Use NOPs if there are insufficient independent instructions • Large program size • Trimedia uses memory compression • Programs are 2-3 times larger than MIPS (even with compression)!
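The grouping step and its cost in NOPs can be sketched as follows. This is a deliberately simple in-order packer written for illustration; the `Instr` type, the dependence test, and the sample program are assumptions, and a real VLIW compiler would reorder instructions far more aggressively.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    dest: str      # register written ("" if none)
    srcs: tuple    # registers read


SLOTS = 5
NOP = Instr("nop", "", ())


def independent(instr, bundle):
    """True if instr has no register dependence on anything in bundle."""
    for other in bundle:
        if other.dest and (other.dest in instr.srcs or other.dest == instr.dest):
            return False  # reads or overwrites a result produced in this bundle
        if instr.dest and instr.dest in other.srcs:
            return False  # would overwrite a value still being read
    return True


def pack(instrs, slots=SLOTS):
    """Fill bundles in program order; pad every bundle with NOPs."""
    bundles, current = [], []
    for instr in instrs:
        if len(current) == slots or not independent(instr, current):
            bundles.append(current)
            current = []
        current.append(instr)
    if current:
        bundles.append(current)
    return [bundle + [NOP] * (slots - len(bundle)) for bundle in bundles]


program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("mul", "r4", ("r5", "r6")),
    Instr("sub", "r7", ("r1", "r4")),   # needs both results above
    Instr("ld", "r8", ("r9",)),
]
bundles = pack(program)
nop_count = sum(instr is NOP for bundle in bundles for instr in bundle)
```

Here four useful instructions occupy two five-slot bundles, so six of the ten slots hold NOPs. That padding is the source of the large program size noted above.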
Fallacies and Pitfalls • Pitfall: Designing a “high-level” instruction set to support HLL’s • Seldom provide an exact match • Often too general (VAX CALLS)
Fallacies and Pitfalls • Fallacy: There is such a thing as a typical program • Programs vary significantly • Pitfall: Designing an architecture to reduce code size without considering compilers • Compilers have much greater impact on code size • Start with densest compiled code
Fallacies and Pitfalls • Pitfall: Expecting good compiled performance for DSPs • Hand-tuned assembler is faster and more compact • Fallacy: An architecture with flaws cannot be successful • Counter-example: 80x86! • Segments, accumulators, stack-based FP
Fallacies and Pitfalls • Fallacy: You can design a flawless architecture • All designs have trade-offs • VAX: code size was treated as more important than easy decoding • Early RISCs: delayed branches • Address space
2.15. Concluding Remarks • 1960’s: Stack architectures • Matched the compiler technology of the day • 1970’s: CISC era • Tried to support HLL features in hardware • Today: RISC era • Simple, load-store architectures
Concluding Remarks • Trends in the 1990’s: • Move to 64 bits • Conditional instructions • Eliminating branches • Optimisation of cache access (prefetch instructions) • Support for multimedia • Faster floating point
The Future • Trend towards VLIW architectures • Increased use of conditional execution • Blending of general-purpose and DSP architectures • Emulating 80x86 architecture