1/60 Retarget Open64 to an Embedded CPU: A practice for automatic approach • SS&SE Group (System Software & Software Engineering) • Department of Computer Science and Technology • Tsinghua University
2/60 Outline • Background and Motivation • Overview of the Current Design • Prototype System • Perspective • Acknowledgment
3/60 Background and Motivation • Why Open64 • Not difficult to retarget manually • Following a short procedure guideline and some guidance, one of our students made a preliminary retarget of Open64 to PowerPC within 6 weeks • The work is still laborious and tedious • Changes are scattered in many places • Wrong results caused by erroneous changes are hard for a new developer to track down
4/60 Background and Motivation • Why Open64 • Good research platform for automatic retargeting and computer architecture • The high-level IR (WHIRL) is machine independent • The generated code is already of high quality right after retargeting • No open-source contribution has yet explored automatic retargeting • It provides a “clean” platform for research on computer architecture and ISA enhancement
5/60 Background and Motivation • Our Current Practice • Objective • Explore a reasonable solution to automatic retargeting for Open64 without changing the current CG framework • Gain experience with a realistic new target CPU (we chose PowerPC) • Seek more research opportunities in automatic retargeting (software engineering, machine description, etc.)
6/60 Background and Motivation • Our Current Practice • Status • A preliminary solution to automatic retargeting (exercised with PowerPC), covered in Overview of the Current Design • A prototype system, covered in Prototype System
7/60 Overview of the Current Design • Principle of Current Design • Keep the basic structure unchanged • Determine the automatable parts incrementally • Make the machine description as abstract as possible
8/60 Overview of the Current Design • Flowchart of Code Generator • From Tutorial on the SGI Pro64 Compiler Infrastructure by Gao et al., PACT 2000
9/60 Overview of the Current Design • Targeting Pro64 to a New Processor • From Tutorial on the SGI Pro64 Compiler Infrastructure by Gao et al., PACT 2000
10/60 Overview of the Current Design • Automatic retargeting approach • Generate target information, including ISA information and some ABI information, automatically from the machine description • Produce expanding code automatically by using the Olive tool (by Steve Tjiang) as the code-generator generator
11/60 Overview of the Current Design • Machine Description • Regular Target Information (ISA, ABI, etc. to generate TARG_INFO) • Tree Patterns for WHIRL Operators (to generate Olive rules) • Others • Information for other retargetable parts • Abstract model for processor properties (to be developed)
12/60 Overview of the Current Design • Design of Prototype System (diagram) • Machine Description → Regular target information → ParserB → C source programs to collect target information • Machine Description → Tree patterns → ParserA → Framework for Olive rules → (complete manually) → Olive rules → Code Generator Generator → C source programs to perform code generation
13/60 Overview of the Current Design • Regular Target Information Description • ISA information • Registers, Operators, Operands, … • ABI information • Calling convention, …
14/60 Overview of the Current Design • Regular Target Information Description • Example {SECTION "architecture" ARCH = "PPC32"; END} {SECTION "registers" …… END} {SECTION "operands" …… END} …… {SECTION "abi_properties" …… END} ……
15/60 Overview of the Current Design • Files Produced from Machine Description • By Regular Target Information (ParserB) • isa_registers.cxx, isa_operands.cxx, isa_subset.cxx, isa_bundle.cxx, isa_decode.cxx, isa_enums.cxx, isa_print.cxx, isa_properties.cxx, isa_pseudo.cxx, isa_hazards.cxx, isa_lits.cxx, isa_pack.cxx, isa.cxx (under ../common/targ_info/isa/) • abi_properties.cxx (under ../common/targ_info/abi/) • proc_properties.cxx, proc.cxx, (PPC specific) ppc_si.cxx (under ../common/targ_info/proc/) /*To do*/
16/60 Overview of the Current Design • Produce expanding code automatically • Olive tool • Code generator generator • A follow-up to Aho, Ganapathi & Tjiang's TWIG [TOPLAS89] • Generates a C source program that performs optimal instruction selection (the program implements a dynamic programming algorithm with a cost function, performing tree pattern matching and graph covering)
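The dynamic programming step can be pictured with a toy labeler. This is only a sketch under stated assumptions: the `Node` type, the three operators and the rule costs are hypothetical and are not Olive's generated code or Open64 data structures. It shows the idea of visiting children first, then trying every matching rule at a node and keeping the cheapest.

```cpp
#include <cassert>
#include <climits>
#include <utility>
#include <vector>

// Hypothetical miniature IR; these names are illustrative, not Open64's.
enum Op { OP_REG, OP_CONST, OP_ADD };

struct Node {
    Op op;
    long value;                  // meaningful for OP_CONST only
    std::vector<Node*> kids;
    int cost_reg;                // cheapest known cost to get this subtree into a register
    const char* rule;            // rule that achieved cost_reg
    Node(Op o, long v = 0, std::vector<Node*> k = std::vector<Node*>())
        : op(o), value(v), kids(std::move(k)), cost_reg(INT_MAX), rule("") {}
};

static bool fits_simm16(long v) { return v >= -32768 && v <= 32767; }

// Bottom-up labeling: label the children, then try each rule whose pattern
// matches at this node and remember the cheapest one.
void label(Node* n) {
    for (Node* k : n->kids) label(k);
    auto consider = [n](int cost, const char* rule) {
        if (cost < n->cost_reg) { n->cost_reg = cost; n->rule = rule; }
    };
    switch (n->op) {
    case OP_REG:
        consider(0, "reg: REG");
        break;
    case OP_CONST:
        // Assumed costs: one op for a 16-bit constant, two otherwise.
        consider(fits_simm16(n->value) ? 1 : 2, "reg: CONST");
        break;
    case OP_ADD: {
        assert(n->kids.size() == 2);
        Node* l = n->kids[0];
        Node* r = n->kids[1];
        // Rule A: reg: ADD(reg, reg) costs 1 plus both operands.
        consider(1 + l->cost_reg + r->cost_reg, "reg: ADD(reg, reg)");
        // Rule B: reg: ADD(reg, simm16) costs 1 plus the left operand only.
        if (r->op == OP_CONST && fits_simm16(r->value))
            consider(1 + l->cost_reg, "reg: ADD(reg, simm16)");
        break;
    }
    }
}
```

After `label(root)`, `root->rule` names the cheapest rule at the root. A full selector would store one cost per nonterminal and run a second pass that emits code for the chosen rules.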
17/60 Overview of the Current Design • Produce expanding code automatically • Grammar for Olive Rules • rule → nonterm tree [cost] action • tree → term ( tree_list ) | term | nonterm • tree_list → tree_list , child | child • child → tree | _ • cost → C-code | C-expr • action → C-code
18/60 Overview of the Current Design • Produce expanding code automatically • Expand WHIRL to TOP • Produce the expander with Olive • Input a VL-WHIRL tree to the expander (Very Low WHIRL, in which some registers are exposed) • The expander produces a TOP instruction sequence that is semantically equivalent to the input WHIRL tree (TOP is a CGIR-level abstraction)
19/60 Overview of the Current Design • Produce expanding code automatically • Expand WHIRL to TOP • Only expressions are expanded in the current design • Why not expand the whole tree? It is a tradeoff among • the benefit • the proportion of the original CG structure that would change • how easy the Olive rules are to write • To be investigated further in the future
20/60 Overview of the Current Design • Produce expanding code automatically • 2-Stage Editing for Olive rules • Stage 1: Write an abstract description of the Olive rules (tree patterns), which produces the framework used in the next stage • Stage 2: Fill in the incomplete Olive rules in the framework description for the specific target
21/60 Overview of the Current Design • 2-Stage Editing for Olive rules • Stage 1 • Ex.1 Abstract description of a special Olive rule #reg : I4ADD(reg, reg) (I4ADD res(0, reg, int32); src(0, reg, int32); src(1, reg, int32); => "add res(0) src(0) src(1)" 1 ) • The trailing 1 is the cost (count of cycles)
22/60 Overview of the Current Design • 2-Stage Editing for Olive rules • Framework Description Produced by ParserA • Ex.1 An Olive rule automatically produced from the special Olive rule above reg : I4ADD(reg, reg) { $cost[0].cost = 1 + $cost[2].cost + $cost[3].cost; } = { $action[2](ops); $action[3](ops); $0->result = Build_TN_Of_Mtype (WN_rtype($0->wn)); Build_OP(TOP_add, $0->result, $2->result, $3->result, ops); }
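The step from an abstract tree pattern to a complete Olive rule is mostly text generation. The sketch below is not ParserA itself: `TreePattern` and `emit_olive_rule` are hypothetical names, and the in-memory form of a parsed pattern is an assumption. It only reproduces the shape of the I4ADD rule shown in Ex.1, where the children are numbered from 2.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of a stage-1 tree pattern (field names are assumptions).
struct TreePattern {
    std::string result_nt;              // e.g. "reg"
    std::string whirl_op;               // e.g. "I4ADD"
    std::vector<std::string> kid_nts;   // e.g. {"reg", "reg"}
    std::string top;                    // e.g. "add", emitted as TOP_add
    int cost;                           // count of cycles
};

// Emits the text of one Olive rule in the layout used on the slides.
std::string emit_olive_rule(const TreePattern& p) {
    assert(!p.kid_nts.empty());
    const std::size_t n = p.kid_nts.size();
    std::ostringstream out;
    out << p.result_nt << " : " << p.whirl_op << "(";
    for (std::size_t i = 0; i < n; ++i)
        out << (i ? ", " : "") << p.kid_nts[i];
    // Cost part: own cost plus the cost of every child (children start at index 2).
    out << ")\n{\n    $cost[0].cost = " << p.cost;
    for (std::size_t i = 0; i < n; ++i)
        out << " + $cost[" << i + 2 << "].cost";
    out << ";\n}\n=\n{\n";
    // Action part: expand the children first, then build the result TN and the OP.
    for (std::size_t i = 0; i < n; ++i)
        out << "    $action[" << i + 2 << "](ops);\n";
    out << "    $0->result = Build_TN_Of_Mtype (WN_rtype($0->wn));\n";
    out << "    Build_OP(TOP_" << p.top << ", $0->result";
    for (std::size_t i = 0; i < n; ++i)
        out << ", $" << i + 2 << "->result";
    out << ", ops);\n}\n";
    return out.str();
}
```

Calling `emit_olive_rule({"reg", "I4ADD", {"reg", "reg"}, "add", 1})` yields the same cost and action text as Ex.1. A general rule with an empty body would need a second code path that emits only the empty braces.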
23/60 Overview of the Current Design • 2-Stage Editing for Olive rules • Stage 1 • Ex.2 Abstract description of a general Olive rule (for PowerPC) # reg : I4F8TRUNC(f8reg) ( => )
24/60 Overview of the Current Design • 2-Stage Editing for Olive rules • Framework Description Produced by ParserA • Ex.2 An Olive rule automatically produced from the general Olive rule above (an incomplete Olive rule) reg : I4F8TRUNC(f8reg) { } = { }
25/60 Overview of the Current Design • 2-Stage Editing for Olive rules • Stage 2 • Complete the incomplete Olive rules
26/60 Overview of the Current Design • Files Produced from Machine Description • By Olive Rules • Update Expand_Expr( ) (under ../be/cg/whirl2ops.cxx) • Replace expand.cxx, exp_loadstore.cxx, exp_divrem.cxx, exp_branch.cxx, etc. (under ../be/cg/ppc32/, where ppc32 is target specific)
27/60 Prototype System • Prototype for Retargeting to PowerPC • Connect the Machine Description • Get regular target information from the machine description and distribute it into the source trees (in proper form) • Expand WHIRL to TOP • The expander is produced automatically by the Olive tool, which takes the target-specific Olive rules as input
28/60 Prototype System • Description for Regular Target Information • ISA and ABI Information • The syntax definition directly reflects the data organization in the source code where this information is processed • To be further improved in the future • Connecting to the Compiler • The parser, produced by YACC, translates this information into C programs, which are then connected to the compiler through the Makefile
29/60 Prototype System • Examples: Target Information Description • ISA and ABI Information {SECTION "architecture" ARCH = "PPC32"; END} {SECTION "isa_list" isa = add, add_i, adds, addl, … END}
30/60 Prototype System • Examples: Target Information Description • ISA and ABI Information {SECTION "operand" #name=size,type,lit_class Literal_Type={ simm16=16,SIGNED,LC_simm16; uimm16=16,UNSIGNED,LC_uimm16; uimm5=5,UNSIGNED,LC_uimm5; } Register_Type={ …… } Enum_Type={ …… } Use_Type={ …… } Instruction_Group={ …… } END}
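Each `Literal_Type` entry gives a bit width and a signedness, which is enough to derive the range of encodable values. The sketch below illustrates that derivation only; `LitClass` and `value_in_class` are hypothetical names and are not the generated `isa_lits` code or Open64's `ISA_LC_Value_In_Class`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of one Literal_Type entry: name=size,type,lit_class.
struct LitClass {
    int  bits;        // field width in bits
    bool is_signed;   // SIGNED or UNSIGNED
};

// Returns true when `value` can be encoded in a field of the given class.
bool value_in_class(std::int64_t value, LitClass lc) {
    assert(lc.bits > 0 && lc.bits < 63);
    if (lc.is_signed) {
        const std::int64_t lo = -(std::int64_t{1} << (lc.bits - 1));
        const std::int64_t hi =  (std::int64_t{1} << (lc.bits - 1)) - 1;
        return value >= lo && value <= hi;
    }
    return value >= 0 && value <= (std::int64_t{1} << lc.bits) - 1;
}

// The three classes listed in the "operand" section above.
const LitClass simm16{16, true};
const LitClass uimm16{16, false};
const LitClass uimm5 {5,  false};
```

With these definitions, `value_in_class(32767, simm16)` holds and `value_in_class(32768, simm16)` does not, which is the kind of test a rule for a 16-bit immediate form relies on.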
31/60 Prototype System • Examples: Target Information Description • ISA and ABI Information {SECTION "registers" # registers definition # isa_register_class definition NAME = "integer", BIT_SIZE = 32, CAN_STORE = true, MULTIPLE_SAVE = false; …… # isa_register_set definition RCLASS = rc_integer, MIN_REGNUM = 0, MAX_REGNUM = 31, …… END}
32/60 Prototype System • Examples: Target Information Description • ISA and ABI Information {SECTION "abi_properties" #ABI properties definition (integer, ABI_PROPERTY) = { {……}; # list of integer registers (REG_LOW_BOUND, REG_UPPER_BOUND) = (0, 31); ALLOCATABLE(0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 31, -1) CALLEE(1, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, -1) CALLER(0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, -1) FUNC_ARG(3, 4, 5, 6, 7, 8, 9, 10, -1) FUNC_VAL(3, 4, -1) STACK_PTR(1, -1) FRAME_PTR(1, -1) GLOBAL_PTR(13, -1)} (float, ABI_PROPERTY) = { … } … END}
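The ABI lists above are terminated by -1. One simple way to consume such a list is to fold it into a 32-bit set with one bit per register. This is a sketch of that idea only; `reg_set` and `in_set` are hypothetical helpers, and the slides do not say how the generated abi_properties.cxx actually stores the lists.

```cpp
#include <cassert>
#include <cstdint>

// Folds a -1-terminated register list into a 32-bit set (bit n stands for register n).
std::uint32_t reg_set(const int* regs) {
    std::uint32_t mask = 0;
    for (; *regs != -1; ++regs) {
        assert(*regs >= 0 && *regs <= 31);   // REG_LOW_BOUND .. REG_UPPER_BOUND
        mask |= std::uint32_t{1} << *regs;
    }
    return mask;
}

bool in_set(std::uint32_t set, int reg) { return ((set >> reg) & 1u) != 0; }

// Three of the lists from the integer ABI_PROPERTY block above.
const int kFuncArg[]  = {3, 4, 5, 6, 7, 8, 9, 10, -1};
const int kFuncVal[]  = {3, 4, -1};
const int kStackPtr[] = {1, -1};
```

For example, `in_set(reg_set(kFuncArg), 3)` is true because r3 carries the first integer argument, while `in_set(reg_set(kStackPtr), 3)` is false.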
33/60 Prototype System • Expand WHIRL to TOP • Interface to Olive • Example Rules Specific to PowerPC
34/60 Prototype System • Interface to Olive • Costs typedef struct COST { int cost;} COST; static COST COST_INFINITY = { MAX_INT16 }; static COST COST_ZERO = { 0 }; #define COST_LESS(x,y) ((x).cost < (y).cost)
35/60 Prototype System • Interfacing to Olive • Trees typedef struct burm_state * STATE; typedef struct olive_node * NODEPTR; typedef struct olive_node * TREE; #define GET_KIDS(r) ((r)->get_kids()) #define OP_LABEL(r) ((r)->op_label()) #define STATE_LABEL(r) ((r)->state_label()) #define SET_STATE(r,s) (r)->set_state(s)
36/60 Prototype System • Interfacing to Olive • Tree Nodes struct olive_node{ OPCODE opcode; OPERATOR opr; TOP top; int num_opnds; WN * wn; WN * parent; INTRINSIC intrn_id; TN * result; TN * opnd_tn[OP_MAX_FIXED_OPNDS]; NODEPTR kids[OP_MAX_FIXED_OPNDS]; STATE state; int opc; olive_node(WN * w, WN * p, TN * res, INTRINSIC iid); virtual ~olive_node() ; void set_state(STATE s) { state = s; } STATE state_label() { return state; } NODEPTR* get_kids() { return kids; } int op_label() { return opc; } void Print() { /* printf("WN\n%s\n", dump_wn(wn));*/ } };
37/60 Prototype System • Example Rules Specific to PowerPC • Classification of PowerPC Operators • Integer (arithmetic/compare/logical/rotate/shift) • Floating-point (arithmetic/multiply-add/rounding and conversion/compare/status and control register/move) • Load/Store (integer/floating-point/integer byte-reverse/integer multiple/string) • Branch (unconditional/conditional/conditional to LR/conditional to CTR) • Misc (system call/trap/condition register logical)
38/60 Prototype System • Example Rules Specific to PowerPC • Load/Store reg : I4I4LDID // Integer load { $cost[0].cost = 3; // Cycles } = { $0->result = Build_TN_Of_Mtype (WN_rtype($0->wn)); Handle_Load($0->wn, $0->result, TOP_lwz, ops); } static TN * Handle_Load(WN * , TN *, TOP, OPS *);
39/60 Prototype System • Example Rules Specific to PowerPC • Load/Store null : I4STID(reg)// integer store { $cost[0].cost = 3 + $cost[2].cost; } = { $action[2](ops); $0->result = $2->result; Handle_Store($0->wn, $0->result, TOP_stw, ops); } static void Handle_Store(WN * , TN *, TOP, OPS *);
40/60 Prototype System • Example Rules Specific to PowerPC • Load/Store f4reg : F4F4LDID // floating-point load { $cost[0].cost = 4; } = { $0->result = Build_TN_Of_Mtype (WN_rtype($0->wn)); Handle_Float_Load($0->wn, $0->result, TOP_lfs, ops); } static TN * Handle_Float_Load(WN * , TN *, TOP, OPS *);
41/60 Prototype System • Example Rules Specific to PowerPC • Load/Store null : F4STID(f4reg) // floating-point store { $cost[0].cost = 4 + $cost[2].cost; } = { $action[2](ops); $0->result = $2->result; Handle_Float_Store($0->wn, $0->result, TOP_stfs, ops); } static void Handle_Float_Store(WN * , TN *, TOP, OPS *);
42/60 Prototype System • Example Rules Specific to PowerPC • Call null : I4CALL { $cost[0].cost = 2; } = { Handle_Call_Site($0->wn, $0->opr); }; static void Handle_Call_Site (WN *, OPERATOR);
43/60 Prototype System • Example Rules Specific to PowerPC • Addition reg : I4ADD(reg, reg) { $cost[0].cost = 1 + $cost[2].cost + $cost[3].cost; } = { $action[2](ops); $action[3](ops); $0->result = Build_TN_Of_Mtype (WN_rtype($0->wn)); Build_OP(TOP_add, $0->result, $2->result, $3->result, ops); }
44/60 Prototype System • Example Rules Specific to PowerPC • Addition of Immediate const : I4INTCONST { $cost[0].cost = 0; } = { $0 = $1; }; reg : I4ADD(reg, const) // small immediate { if (!(ISA_LC_Value_In_Class(WN_const_val($3->wn), LC_simm16))) return 0; $cost[0].cost = 1 + $cost[2].cost; }= { $action[2](ops); $action[3](ops); $0->result = Build_TN_Of_Mtype (WN_rtype($0->wn)); Build_OP(TOP_addi, $0->result, $2->result, Gen_Literal_TN(WN_const_val($3->wn), 4), ops);};
45/60 Prototype System • Example Rules Specific to PowerPC • Addition of Immediate (continued) reg : I4ADD(reg, const) // big immediate { $cost[0].cost = 2 + $cost[2].cost; }= { $action[2](ops); $action[3](ops); $0->result = Build_TN_Of_Mtype (WN_rtype($0->wn)); INT64 val = WN_const_val($3->wn); INT64 lo = (short)(val & 0xffff); /* low half, sign-extended as addi sees it */ INT64 hi = (short)((val - lo) >> 16); /* high half, adjusted for the sign of lo */ Build_OP(TOP_addis, $0->result, $2->result, Gen_Literal_TN(hi, 4), ops); Build_OP(TOP_addi, $0->result, $0->result, Gen_Literal_TN(lo, 4), ops);};
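One detail worth checking in any addis/addi pair is the carry between the two halves: addi sign-extends its 16-bit immediate, so when the low half has its top bit set, the high half must be increased by one. The sketch below works through that arithmetic in isolation. `HiLo`, `split_imm32` and `recombine` are hypothetical names for illustration, not Open64 functions.

```cpp
#include <cassert>
#include <cstdint>

struct HiLo {
    std::int16_t hi;   // immediate for addis (the hardware shifts it left by 16)
    std::int16_t lo;   // immediate for addi (the hardware sign-extends it)
};

// Splits a 32-bit constant so that (hi << 16) + sign_extend(lo) == val modulo 2^32.
HiLo split_imm32(std::int32_t val) {
    int lo = static_cast<int>(static_cast<std::uint32_t>(val) & 0xffffu);
    if (lo >= 0x8000) lo -= 0x10000;            // sign-extend, as addi does
    const std::int64_t rest = static_cast<std::int64_t>(val) - lo;
    assert(rest % 65536 == 0);                  // only the high half remains
    int hi = static_cast<int>(rest / 65536);
    if (hi >= 0x8000) hi -= 0x10000;            // wrap into 16 bits (modulo 2^32)
    return {static_cast<std::int16_t>(hi), static_cast<std::int16_t>(lo)};
}

// What addis followed by addi adds to the source register, modulo 2^32.
std::uint32_t recombine(HiLo p) {
    const std::uint32_t high =
        static_cast<std::uint32_t>(static_cast<std::int32_t>(p.hi)) << 16;
    const std::uint32_t low =
        static_cast<std::uint32_t>(static_cast<std::int32_t>(p.lo));
    return high + low;
}
```

For `val = 0x18000` the split gives `hi = 2` and `lo = -32768`. Taking `val >> 16` directly would give `hi = 1`, and the pair would then add 0x8000 instead of 0x18000.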
46/60 Prototype System • Example Rules Specific to PowerPC • Floating-Point Arithmetic (multiply-add) f4reg : F4MADD(f4reg, f4reg, f4reg) { $cost[0].cost = 5 + $cost[2].cost + $cost[3].cost + $cost[4].cost; } = { $action[2](ops); $action[3](ops); $action[4](ops); $0->result = Build_TN_Of_Mtype (WN_rtype($0->wn)); /* operand order must follow the operand description of TOP_fmadds */ Build_OP(TOP_fmadds, $0->result, $2->result, $3->result, $4->result, ops); }
47/60 Prototype System • Example Rules Specific to PowerPC • Floating-Point Rounding and Conversion reg : I4F8TRUNC(f8reg) { $cost[0].cost = 11 + $cost[2].cost; } = { $action[2](ops); TN* tmp_tn = Build_TN_Of_Mtype(MTYPE_F8); Build_OP(TOP_fctiwz, tmp_tn, $2->result, ops); ST * tmp_sym = CGSPILL_Get_TN_Spill_Location(tmp_tn, CGSPILL_LRA); INT64 ofst = TN_offset(tmp_tn); ST* base_sym; INT64 base_ofst; Base_Symbol_And_Offset_For_Addressing(tmp_sym, 0, &base_sym, &base_ofst); Build_OP(TOP_stfd, tmp_tn, FP_TN, Gen_Literal_TN(base_ofst, 4), ops); $0->result = Build_TN_Of_Mtype (WN_rtype($0->wn)); Build_OP(TOP_lwz, $0->result, FP_TN, Gen_Literal_TN(base_ofst + 4, 4), ops ); }
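The rule above goes through memory because fctiwz leaves the 32-bit integer in the low-order word of a floating-point register, and on a big-endian target that word sits 4 bytes into the 8-byte slot written by stfd. The following model is a sketch of that data flow only; `fctiwz_model`, `stfd_model` and `lwz_model` are hypothetical names. The upper word, which the hardware leaves undefined after fctiwz, is simply zeroed here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Models the 8-byte stack slot written by stfd on a big-endian target.
struct Slot { unsigned char bytes[8]; };

// fctiwz-like conversion: round toward zero and saturate at the int32 limits.
std::int32_t fctiwz_model(double x) {
    if (std::isnan(x))       return INT32_MIN;   // NaN converts to 0x80000000
    if (x >= 2147483648.0)   return INT32_MAX;
    if (x <= -2147483649.0)  return INT32_MIN;
    return static_cast<std::int32_t>(std::trunc(x));
}

// stfd-like store: the integer occupies the low-order word, bytes 4..7.
Slot stfd_model(std::int32_t low_word) {
    Slot s = {};
    const std::uint32_t bits = static_cast<std::uint32_t>(low_word);
    for (int i = 0; i < 4; ++i)
        s.bytes[4 + i] = static_cast<unsigned char>((bits >> (24 - 8 * i)) & 0xffu);
    return s;
}

// lwz-like load of a big-endian 32-bit word at byte offset 0 or 4.
std::uint32_t lwz_model(const Slot& s, int offset) {
    assert(offset == 0 || offset == 4);
    std::uint32_t w = 0;
    for (int i = 0; i < 4; ++i)
        w = (w << 8) | s.bytes[offset + i];
    return w;
}
```

Reading at offset 4, as the rule does with `base_ofst + 4`, returns the converted integer. Reading at offset 0 would return the word the hardware leaves undefined.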
48/60 Prototype System • Example Rules Specific to PowerPC • Conditional Branch reg : I4F4GT(f4reg, f4reg) { $cost[0].cost = 7 + $cost[2].cost + $cost[3].cost ; } = { $action[2](ops); $action[3](ops); $0->result = Build_TN_Of_Mtype (WN_rtype($0->wn)); Handle_Cond_Branch(TOP_bgt, TOP_fcmpu, $0->result, $2->result, $3->result, ops); } static void Handle_Cond_Branch(TOP, TOP, TN *, TN *, TN *, OPS *);
49/60 Prototype System • Example Rules Specific to PowerPC • Conditional Branch static void Expand_Cond (TOP top_branch, TOP top_cmp, TN *dest, TN *src1, TN *src2, OPS *ops) /* Expand_Cond is an auxiliary function shared by compare operators */ /* For example */ reg : I4F4NE(f4reg, f4reg) vs Expand_Cond(TOP_bne, …) reg : I4F4GT(f4reg, f4reg) vs Expand_Cond(TOP_bgt, …) reg : I4F4EQ(f4reg, f4reg) vs Expand_Cond(TOP_beq, …) reg : I4F4GE(f4reg, f4reg) vs Expand_Cond(TOP_bge, …) reg : I4F4LE(f4reg, f4reg) vs Expand_Cond(TOP_ble, …) reg : I4F4LT(f4reg, f4reg) vs Expand_Cond(TOP_blt, …)
50/60 Prototype System • Example Rules Specific to PowerPC • Conditional Move reg : I4I4GT(reg, reg) { $cost[0].cost = 3 + $cost[2].cost + $cost[3].cost; } = { $action[2](ops); $action[3](ops); $0->result = Build_TN_Of_Mtype (WN_rtype($0->wn)); Handle_Cond_Move(OPR_GT, TOP_cmpw, $0->result, $2->result, $3->result, ops); } static void Handle_Cond_Move(OPERATOR, TOP, TN *, TN *, TN *, OPS *);