
Build GCC Cross Compiler for a Specific CPU

Build GCC Cross Compiler for a Specific CPU. Chia-Tsun Wu D92943007 tommy@access.ee.ntu.edu.tw. Outline: Introduction to SoC; Motivation and project goal; Design a CPU; Tools used to design the CPU hardware; CPU Specification; CPU Design flow; Simulation and Results.



  1. Build GCC Cross Compiler for a Specific CPU Chia-Tsun Wu D92943007 tommy@access.ee.ntu.edu.tw

  2. Outline • Introduction to SoC • Motivation and project goal • Design a CPU • Tools used to design the CPU hardware • CPU Specification • CPU Design flow • Simulation and Results

  3. Outline • Build a GCC Cross Compiler • GCC structure • Knowledge to port GCC • Build Flow • Build a GCC Cross Assembler and Cross Linker • Build a GCC Cross Compiler • A simple test program • Summary

  4. Introduction to SoC • SoC: System on a Chip. • Highly integrated; includes: • CPU • System Bus • Peripherals • Co-processor • ………… • Low cost, low area, high performance.

  5. What is SOC? • Portable / reusable IP • Embedded CPU • Embedded Memory • Real World Interfaces (USB, PCI, Ethernet) • Software (both on-chip and off) • Mixed-signal Blocks • Programmable HW (FPGAs) • > 500K gates

  6. SOC Design Flow System Specs.. HW/SW Partitioning Hardware Descript. Software Descript. HW Synth. and Configuration Software Gen. & Parameterization Interface Synthesis Configuration Modules Hardware Components HW/SW Interfaces Software Modules HW/SW Integration and Cosimulation Integrated System System Evaluation Design Coverification System Validation

  7. Motivation and project goal • Motivation: • SoC has been the major trend in recent years • The CPU is one of the key components of an SoC design • The development environment is the most important asset for a CPU • Goal: • Design a simple 32-bit RISC CPU • Build a cross assembler and cross linker for a specific CPU • Build a cross compiler for a specific CPU

  8. Design a CPU • Specification • 32-bit RISC based CPU • General-purpose register architecture • 32-bit (4 Gbyte) addressing • 32-bit fixed instruction length (excluding immediate data) • MSB first • Reset address 0x000ffffc • No pipeline; one instruction cycle takes four clock cycles: • Instruction fetch • Instruction decode and Data fetch • Execution • Write back • No interrupt • No timer

  9. Registers • General purpose register R0~R15 • R13: Accumulator • R14: memory data pointer • R15: stack pointer • Program counter (PC) (0x000ffffc after reset) • Program status (PS) (Sign flag, Zero flag, oVerflow flag, Carry flag)

  10. Instruction formats • General: OP Rn1, Rn2 • OP: 8 bits • n: 4-bit register number (0000: R0, 1111: R15) • Immediate: OP #data, Rn2 • OP: 8 bits • n: 4-bit register number (0000: R0, 1111: R15) • #data: 32-bit data • Branch: OP Addr • OP: 16 bits (low byte=0x00) • Addr: 32-bit branch address

  11. Instruction sets • ADD Rn1,Rn2 Machine code:00000000Rn1Rn2 • Rn2=Rn1+Rn2 • Flag: SZVC • ADDC Rn1,Rn2 Machine code:00000001Rn1Rn2 • Rn2=Rn1+Rn2 (with carry) • Flag: SZVC • SUB Rn1,Rn2 Machine code:00000010Rn1Rn2 • Rn2=Rn2-Rn1 • Flag: SZVC • SUBC Rn1,Rn2 Machine code:00000011Rn1Rn2 • Rn2=Rn2-Rn1 (with carry) • Flag: SZVC

  12. Instruction sets (cont’d) • LDI #data,Rn2 Machine code:000001000000Rn2#Data • Rn2=data • Flag: • MOV Rn1,Rn2 Machine code:00000101Rn1Rn2 • Rn2=Rn1 • Flag: • RET Machine code:0000011000000000 • PC=[SP--] • Flag: • JMP #Addr Machine code:0000011100000000#Addr • PC=[Addr] • Flag:
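
The arithmetic instructions above update the four PS flags (S, Z, V, C). As a rough illustration only, the following C sketch models how an ADD or ADDC could compute its result and flags. The type `ps_flags` and the function `alu_add` are hypothetical names, and treating ADDC as "add plus the incoming carry" is an assumption, since the slides do not spell out the carry semantics.

```c
#include <stdint.h>

/* Hypothetical model of the program status (PS) flags: S, Z, V, C. */
typedef struct {
    unsigned s : 1; /* sign: bit 31 of the result */
    unsigned z : 1; /* zero: result is 0 */
    unsigned v : 1; /* signed overflow */
    unsigned c : 1; /* carry out of bit 31 */
} ps_flags;

/*
 * Rn2 = Rn1 + Rn2 + carry_in.
 * carry_in = 0 models ADD; passing the previous C flag models ADDC
 * (assumed semantics, not confirmed by the slides).
 */
static uint32_t alu_add(uint32_t rn1, uint32_t rn2, unsigned carry_in,
                        ps_flags *ps)
{
    uint64_t wide = (uint64_t)rn1 + rn2 + (carry_in & 1u);
    uint32_t result = (uint32_t)wide;

    ps->s = result >> 31;
    ps->z = (result == 0);
    ps->c = (unsigned)(wide >> 32) & 1u;
    /* Overflow: both operands share a sign and the result's sign differs. */
    ps->v = ((~(rn1 ^ rn2) & (rn1 ^ result)) >> 31) & 1u;
    return result;
}
```

For example, `alu_add(0x7fffffff, 1, 0, &ps)` would wrap into the negative range and set S and V, while `alu_add(0xffffffff, 1, 0, &ps)` would yield zero and set Z and C.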

  13. Tools used • Synopsys Design Compiler • Mentor Graphics ModelSim • Synopsys Apollo • TSMC 0.25um standard cell libraries

  14. Design Flow • CPU Specifications • RTL Coding • Function simulation (with test bench) • Design Compiler (with constraints) • Gate level simulation (with test bench) • Apollo (with constraints) • Post layout simulation (with test bench) • Tape out

  15. Test vectors
      LDI #0x0,R0    00000000000000000000010000000000 00000000000000000000000000000000
      LDI #0x1,R1    00000000000000000000010000000001 00000000000000000000000000000001
      LDI #0x2,R2    00000000000000000000010000000010 00000000000000000000000000000010
      LDI #0x3,R3    00000000000000000000010000000011 00000000000000000000000000000011
      LDI #0x4,R4    00000000000000000000010000000100 00000000000000000000000000000100
      LDI #0x5,R5    00000000000000000000010000000101 00000000000000000000000000000101
      LDI #0x6,R6    00000000000000000000010000000110 00000000000000000000000000000110
      LDI #0x7,R7    00000000000000000000010000000111 00000000000000000000000000000111
      LDI #0x8,R8    00000000000000000000010000001000 00000000000000000000000000001000
      LDI #0x9,R9    00000000000000000000010000001001 00000000000000000000000000001001
      LDI #0xa,R10   00000000000000000000010000001010 00000000000000000000000000001010
      LDI #0xb,R11   00000000000000000000010000001011 00000000000000000000000000001011
      LDI #0xc,R12   00000000000000000000010000001100 00000000000000000000000000001100
      LDI #0xd,R13   00000000000000000000010000001101 00000000000000000000000000001101
      LDI #0xe,R14   00000000000000000000010000001110 00000000000000000000000000001110
      LDI #0xf,R15   00000000000000000000010000001111 00000000000000000000000000001111
      ADD R0,R1      00000000000000000000000000000001
      ADDC R2,R3     00000000000000000000000100100011
      SUB R4,R5      00000000000000000000001001000101
      SUBC R6,R7     00000000000000000000001101100111
      MOV R8,R9      00000000000000000000010110001001
      JMP 0x000000   00000000000000000000011100000000 00000000000000000000000000000000
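
The test vectors above can be reproduced mechanically from the instruction formats. The following C sketch is one possible encoder, not the author's tool: the names `encode_rr` and `encode_ldi` are hypothetical, and the LDI opcode 00000100 is read off the test vectors rather than stated explicitly in the text. Each instruction word is held in the low 16 bits of a 32-bit word, as the vectors show.

```c
#include <stdint.h>

/* Opcodes (upper byte of the 16-bit code); OP_LDI is inferred from the test vectors. */
enum {
    OP_ADD = 0x00, OP_ADDC = 0x01, OP_SUB = 0x02, OP_SUBC = 0x03,
    OP_LDI = 0x04, OP_MOV  = 0x05, OP_RET = 0x06, OP_JMP  = 0x07
};

/* General format "OP Rn1, Rn2": 8-bit opcode, then two 4-bit register numbers. */
static uint32_t encode_rr(unsigned op, unsigned rn1, unsigned rn2)
{
    return ((op & 0xffu) << 8) | ((rn1 & 0xfu) << 4) | (rn2 & 0xfu);
}

/* Immediate format "LDI #data, Rn2": opcode word followed by a 32-bit data word. */
static void encode_ldi(uint32_t data, unsigned rn2, uint32_t out[2])
{
    out[0] = encode_rr(OP_LDI, 0, rn2);
    out[1] = data;
}
```

Under these assumptions, `encode_rr(OP_MOV, 8, 9)` gives 0x589, which matches the `MOV R8,R9` vector ending in 010110001001.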

  16. Simulation result

  17. Synthesis results • TSMC 0.25um: Area 0.35 mm², Clock 400 MHz, Power 1.73 mW • UMC 0.18um: Area 0.19 mm², Clock 600 MHz, Power 1 mW

  18. Build a GCC Cross Compiler • GCC structure • Knowledge to port GCC • Build Flow • Build a GCC Cross Assembler and Cross Linker • Build a GCC Cross Compiler • A simple test program • Summary

  19. GCC Execution

  20. The Structure of Compiler

  21. The Structure of GCC

  22. GCC Code Generation • The backend machine description is pattern-matched against the intermediate format (RTL). • The machine description acts like a template. • The machine description includes • type bit widths, memory alignment • instruction patterns, register classes • peephole optimization rules

  23. GCC Code Generation (cont’d)

  24. Example of RTL • Adds two 4-byte integer (SImode) operands. • First operand is register • Register is also 4-byte integer. • Register number is 8. • Second operand is constant integer. • Value is “123”. • Mode is VOIDmode (not given).
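
The expression described above would presumably be written `(plus:SI (reg:SI 8) (const_int 123))` in RTL notation. To make the tree shape concrete, here is a small C sketch that builds and prints such an expression. The types `xexp`, `xcode`, `xmode` and the function `xexp_print` are hypothetical illustrations and are not GCC's internal `rtx` API.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-ins for RTL codes and machine modes (not GCC's own enums). */
typedef enum { X_REG, X_CONST_INT, X_PLUS } xcode;
typedef enum { M_VOID, M_SI } xmode;

typedef struct xexp {
    xcode code;
    xmode mode;
    long value;                /* register number or constant value */
    const struct xexp *op[2];  /* operands, used by X_PLUS only */
} xexp;

/* Print an expression in RTL-like notation into buf; returns characters written. */
static int xexp_print(const xexp *x, char *buf, size_t n)
{
    /* VOIDmode is "not given", so no mode suffix is printed for it. */
    const char *m = (x->mode == M_SI) ? ":SI" : "";

    switch (x->code) {
    case X_REG:
        return snprintf(buf, n, "(reg%s %ld)", m, x->value);
    case X_CONST_INT:
        return snprintf(buf, n, "(const_int %ld)", x->value);
    case X_PLUS: {
        char a[64], b[64];
        xexp_print(x->op[0], a, sizeof a);
        xexp_print(x->op[1], b, sizeof b);
        return snprintf(buf, n, "(plus%s %s %s)", m, a, b);
    }
    }
    return 0;
}
```

Building a `X_PLUS` node in `M_SI` whose operands are register 8 in `M_SI` and the constant 123 in `M_VOID`, then calling `xexp_print`, should produce the notation shown in the lead-in.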

  25. Templates • Used for three purposes: • Generating RTL from parse tree. • Generating machine insns from RTL. • Specifying parameters about instructions. • Sample Template for RISC machine:

  26. GCC Porting and Retargeting • Porting to new machines/processors • Covered by the “Using and Porting the GCC” book; self-contained. • Done by describing the machine, not how to compile for the machine. • Using GCC as a backend for other languages • Poorly documented. • Few examples. • See the GNAT, GNU Cobol, and Fortran ports. • In both cases, copy from similar ports.

  27. How to port GCC • In directory gcc-xxx/gcc/config/machine/ • machine.h • Contains C macros that define general attributes of the machine. • machine.md • Contains RTL expressions that define the instruction set. • Input to programs that produce .h and .c files. • machine.c • Machine-dependent functions; normally things too large to cleanly put into the above two files.

  28. How to port GCC (cont’d)

  29. gcc/config--Architecture characteristic key • H A hardware implementation does not exist. • M A hardware implementation is not currently being manufactured. • S A Free simulator does not exist. • L Integer registers are narrower than 32 bits. • Q Integer registers are at least 64 bits wide. • N Memory is not byte addressable, and/or bytes are not eight bits. • F Floating point arithmetic is not included in the instruction set • I Architecture does not use IEEE format floating point numbers • C Architecture does not have a single condition code register. • B Architecture has delay slots. • D Architecture has a stack that grows upward. • l Port cannot use ILP32 mode integer arithmetic.

  30. gcc/config--Architecture characteristic key • q Port can use LP64 mode integer arithmetic. • r Port can switch between ILP32 and LP64 at runtime. (Not necessarily supported by all subtargets.) • c Port uses cc0. • p Port does not use define_peephole. • f Port does not define prologue and/or epilogue RTL expanders. • g Port does not define TARGET_ASM_FUNCTION_(PRO|EPI)LOGUE. • m Port does not use define_constants. • b Port does not use '"* ..."' notation for output template code. • d Port uses DFA scheduler descriptions. • h Port contains old scheduler descriptions. • a Port generates multiple inheritance thunks using TARGET_ASM_OUTPUT_MI(_VCALL)_THUNK. • t All insns either produce exactly one assembly instruction, or trigger a define_split. • e <arch>-elf is not a supported target. • s <arch>-elf is the correct target to use with the simulator in /cvs/src.

  31. gcc/config--Architecture characteristic key • Gcc-config.txt

  32. define_peephole • In addition to instruction patterns the `md' file may contain definitions of machine-specific peephole optimizations. • The combiner does not notice certain peephole optimizations when the data flow in the program does not suggest that it should try them. • For example, sometimes two consecutive insns related in purpose can be combined even though the second one does not appear to use a register computed in the first one. A machine-specific peephole optimizer can detect such opportunities.

  33. define_splits • Often you can rewrite a single insn as a list of individual insns, each corresponding to one machine instruction. • The compiler splits the insn if there is reason to believe that it might improve instruction or delay slot scheduling. • Splits are evaluated after the combiner pass and before the scheduling passes. • Splits optimize for speed and instruction length; they are the perfect place to put this intelligence. • Ex: If we are loading a small negative constant we can save space and time by loading the positive value and then sign extending it.
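
The small-negative-constant example can be checked arithmetically. The C sketch below models the two-step sequence under an assumption the slide does not state: that the "positive value" is the constant's low 8 bits loaded zero-extended, followed by a byte sign extension. The names `load_imm8`, `sign_extend8`, `fits_split`, and `load_small_negative` are hypothetical.

```c
#include <stdint.h>

/* Step 1 (assumed short form): load an 8-bit immediate, zero-extended to 32 bits. */
static uint32_t load_imm8(uint8_t imm)
{
    return imm;
}

/* Step 2: sign-extend the low byte of a register to 32 bits. */
static uint32_t sign_extend8(uint32_t reg)
{
    uint32_t low = reg & 0xffu;
    return (low ^ 0x80u) - 0x80u;   /* portable two's-complement extension */
}

/* True when a constant is a candidate for the two-step sequence. */
static int fits_split(int32_t c)
{
    return c >= -128 && c < 0;
}

/* The split sequence: load the low byte as a positive value, then sign-extend. */
static uint32_t load_small_negative(int32_t c)
{
    return sign_extend8(load_imm8((uint8_t)c));
}
```

For -5, the low byte is 0xFB (251 as a positive value), and sign extension restores 0xFFFFFFFB, so two short instructions stand in for one load with a full 32-bit immediate.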

  34. define_expand • On some target machines, some standard pattern names for RTL generation cannot be handled with a single insn, but a sequence of RTL insns can represent them. • For these target machines, you can write a `define_expand' to specify how to generate the sequence of RTL. • A `define_expand' is an RTL expression that looks almost like a `define_insn'; but, unlike the latter, a `define_expand' is used only for RTL generation and it can produce more than one RTL insn. • The combiner pass only • cares about reducing the number of instructions • does not care about instruction lengths or speeds

  35. define_insn • Push and pop: movsi_push movsi_pop • Move: movqi_unsigned_register_load movqi_signed_register_load *movqi_internal movhi movhi_unsigned_register_load movhi_signed_register_load *movhi_internal movsi movsi_internal movdi *movdi_insn movsf *movsf_internal *movsf_constant_store • Signed conversions from a smaller integer to a larger integer: extendqisi2 extendhisi2 zero_extendqisi2 zero_extendhisi2 • Addition: add_to_stack addsi3 addsi_regs addsi_small_int addsi_big_int *addsi_for_reload • Subtraction: subsi3 • Multiplication: mulsidi3 umulsidi3 mulhisi3 umulhisi3 mulsi3 • Negation: negsi2 • Shifts: ashlsi3 ashrsi3 lshrsi3

  36. define_insn (cont’d) • Logical Operations: andsi3 iorsi3 xorsi3 one_cmplsi2 • Comparisons: cmpsi *cmpsi_internal • Branches: beq bne blt ble bgt bge bltu bleu bgtu bgeu *branch_true *branch_false • Calls & Jumps: call call_value jump indirect_jump tablejump • Function Prologues and Epilogues: prologue epilogue return_from_func leave_func enter_func • Miscellaneous: nop blockage

  37. define_insn “addsi_regs”
      (define_insn "addsi_regs"
        [(set (match_operand:SI 0 "register_operand" "=r")
              (plus:SI (match_operand:SI 1 "register_operand" "%0")
                       (match_operand:SI 2 "register_operand" "r")))]
        ""
        "add %2, %0"
      )
      ; (set value x)  chapter 9.15 p110
      ;   value=x
      ; (plus:m x y)
      ;   x+y carried out in mode m

  38. define_insn “addsi_regs” (cont’d)
      ; (match_operand:m n predicate constraint)  chapter 10.4 p131
      ;   if the condition (predicate) is true then return operand n
      ;   n counts from 0
      ;   for each number n, only one match_operand expression
      ;   predicate is the name of a C function; it returns 0 on failure
      ;     general_operand: checks that the operand is a constant, a register, or a memory reference
      ;     register_operand: checks whether the operand is a register
      ;     immediate_operand: checks whether the operand is immediate data
      ;   constraint: describes one kind of operand that is permitted
      ;     r: register
      ;     m: any kind of memory operand
      ;     o: only offsettable memory operand
      ;     V: only non-offsettable memory operand
      ;     <: memory operand with autodecrement addressing
      ;     >: memory operand with autoincrement addressing
      ;     i: immediate integer operand
      ;     0~9: an operand that matches the specified operand number is allowed.

  39. Build a GCC Cross Compiler • Machine Description • Binutils: Configure, Make, Make install • GCC: Configure, Make, Make install • GCC compiler

  40. Build a GCC Cross Assembler and Cross Linker • Binutils: ver 2.14 • ./configure --target=fr30-elf --prefix=dir • make • make install

  41. Build a GCC Cross Compiler • GCC: ver 3.3.1 • ../configure --target=fr30-elf --prefix=dir --enable-languages=c • make • make install

  42. A simple C program to test the cross compiler
      int test(int i,int j,int k)
      {
          int a;
          int b;
          a=49999999;
          b=39999999;
          a+=k;
          b+=j;
          a++;
          b--;
          i += a + b;
          return i;
      }
      • fr30-elf-gcc -S -O2 t.c

  43. A simple C program to test the cross compiler (cont’d)
              .file   "t.c"
              .text
              .p2align 2
              .globl  test
              .type   test, @function
      test:
              mov     r4, r2              ;00000000000000000000010101000010
              ldi:32  #50000000, r4       ;00000000000000000000010000000100 ;10111110101111000010000000
              ldi:32  #39999998, r1       ;00000000000000000000010000000001 ;10011000100101100111111110
              add     r6, r4              ;00000000000000000000000001100100
              add     r5, r1              ;00000000000000000000000001010001
              add     r1, r4              ;00000000000000000000000000010100
              add     r2, r4              ;00000000000000000000000000100100
              ret
              .size   test, .-test
              .ident  "GCC: (GNU) 3.3.1 (cygming special)"
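
The listing shows that -O2 folded `a=49999999; a++` into 50000000 and `b=39999999; b--` into 39999998. The C sketch below compares the original function with a hand-written version of what the assembly appears to compute. The function names are hypothetical (`test_original` mirrors the slide's `test`), and the register mapping in the comments (i in r4, j in r5, k in r6, result in r4) is read off the listing rather than stated in the slides.

```c
/* The function from the slide, unchanged apart from its name. */
int test_original(int i, int j, int k)
{
    int a;
    int b;
    a = 49999999;
    b = 39999999;
    a += k;
    b += j;
    a++;
    b--;
    i += a + b;
    return i;
}

/* A reading of the generated assembly, one statement per instruction group. */
int test_folded(int i, int j, int k)
{
    int saved_i = i;          /* mov r4, r2                        */
    int a = 50000000 + k;     /* ldi:32 #50000000, r4 ; add r6, r4 */
    int b = 39999998 + j;     /* ldi:32 #39999998, r1 ; add r5, r1 */
    a += b;                   /* add r1, r4                        */
    a += saved_i;             /* add r2, r4                        */
    return a;                 /* ret (result assumed in r4)        */
}
```

Both versions should agree for any inputs that do not overflow `int`; with all arguments zero, each returns 89999998.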

  44. A simple C program to test the cross compiler (cont’d)

  45. Summary • Studying RTL is more important than studying the MD. • Build the cross assembler and cross linker before building the cross compiler. • There is little documentation on porting GCC as a cross compiler. • Modifying an existing MD is easier than creating a new one. • “The main goal of GCC was to make a good, fast compiler for machines in the class that the GNU system aims to run on: 32-bit machines that address 8-bit bytes and have several general registers.” -- Richard Stallman. • It seems that designing a new CPU is easier than building a cross compiler for a GIEE student. • http://gcc.gnu.org
