Squish-DSP Application of a Project Management Tool to manage low-level DSP processor resources. M. Smith, University of Calgary, Canada smithmr @ ucalgary.ca. Series of Talks and Workshops. CACHE-DSP – Talk on a simple process tool to identify cache conflicts in DSP code.
Squish-DSP
Application of a Project Management Tool to manage low-level DSP processor resources
M. Smith, University of Calgary, Canada
smithmr@ucalgary.ca
Series of Talks and Workshops
• CACHE-DSP – Talk on a simple process tool to identify cache conflicts in DSP code.
• SQUISH-DSP – Talk on using a project management tool to automate identification of parallel DSP processor instructions.
• SHARC Ecology 101 – Workshop showing how to systematically write parallel 2106X code.
• SHARC Ecology 201 – Workshop on SQUISH-DSP and CACHE-DSP tools.
Squish-DSP Tool smithmr@ucalgary.ca
Scope of Talk
• Overview of hand optimization of code
• Paradigm shift in microprocessor resource scheduling
• Project Management Tool Application
• Translating ‘microprocessor’ language into a ‘business’ format
• Examples and limitations
• Better optimization from VisualDSP code
• Future directions
Standard “C” code
void Convert(float *temperature, int N)
{
    int count;
    for (count = 0; count < N; count++) {
        *temperature = (*temperature) * 9 / 5 + 32;
        temperature++;
    }
}
2106X-style load/store “C” code
void Convert(register float *temperature, register int N)
{
    register int count;
    register float *pt = temperature;      // Ireg <- Dreg
    register float scratch;
    for (count = 0; count < N; count++) {
        scratch = *pt;
        scratch = scratch * (9.0f / 5.0f); // float constants: the integer expression (9 / 5) evaluates to 1
        scratch = scratch + 32;            // Order of Ops
        *pt = scratch;
        pt++;
    }
}
Check on required register use
#define count scratchR1
#define pt scratchDMpt
#define scratchF2 F2
LCNTR = INPAR2, DO LOOP_END UNTIL LCE;
    scratchF2 = dm(pt, zeroDM);          // Any special requirements here on F2?
// INPAR1 (R4) is dead -- can reuse
#define constantF4 F4                    // Must be float
    constantF4 = 1.8;
    scratchF2 = scratchF2 * constantF4;  // Fn = F(0, 1, 2 or 3) * F(4, 5, 6 or 7)
#define F0_32 F0                         // Must be float
    F0_32 = 32.0;
    scratchF2 = scratchF2 + F0_32;       // Fm = F(8, 9, 10 or 11) + F(12, 13, 14 or 15)
LOOP_END: dm(pt, plus1DM) = scratchF2;
Resource Chart -- Basic code
Unroll the loop -- 5 times here
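The unrolling on this slide can also be pictured at the C level. The sketch below is an illustration only, not the assembly shown in the original chart: the function name convert_unrolled5, the use of array indexing, and the remainder loop for lengths that are not a multiple of five are all assumptions added here.

```c
#include <stddef.h>

/* Hypothetical helper: the temperature conversion with the loop body
   unrolled five times, so that five independent load / multiply / add /
   store chains are visible to whoever schedules the code. */
void convert_unrolled5(float *temperature, size_t n)
{
    const float scale = 9.0f / 5.0f;
    size_t i = 0;

    for (; i + 5 <= n; i += 5) {
        /* Five loads into separate scratch values (separate registers). */
        float s0 = temperature[i + 0];
        float s1 = temperature[i + 1];
        float s2 = temperature[i + 2];
        float s3 = temperature[i + 3];
        float s4 = temperature[i + 4];

        /* Five multiplies, then five adds; no chain depends on another. */
        s0 *= scale; s1 *= scale; s2 *= scale; s3 *= scale; s4 *= scale;
        s0 += 32.0f; s1 += 32.0f; s2 += 32.0f; s3 += 32.0f; s4 += 32.0f;

        /* Five stores back to memory. */
        temperature[i + 0] = s0;
        temperature[i + 1] = s1;
        temperature[i + 2] = s2;
        temperature[i + 3] = s3;
        temperature[i + 4] = s4;
    }

    /* Remainder loop for the last n % 5 elements. */
    for (; i < n; i++)
        temperature[i] = temperature[i] * scale + 32.0f;
}
```

Keeping each chain in its own scratch variable is what exposes the parallelism: operations from different chains have no data dependency, so only resource limits decide how many can share a cycle.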
Parallelism causes Register/Resource Conflicts
[Figure: resource chart with SRC and DEST register columns]
Unroll the loop a bit more
Final code version
Real Life is not made up of ‘short loops’
• Probably using DSP-intelligent compiler as a starting point
• Longer loops -- more tasks to make parallel
• Many different opportunities for task ordering
• Complicated resource management and register dependency issues
• Need a tool to help get the product ‘out the door’
Business Management Tool
• One evening went looking for a ‘tree’ program to manage the scheduling of microprocessor resources.
• In frustration, decided to take the 2106X tasks and put them into Microsoft Project.
• By accident, found that I had developed a very useful microprocessor management tool, especially with the MS Project GUI!
• Question -- how to get it to function in a systematic manner?
MS Project -- 21XXX processor
• Requires a paradigm shift
• Business project concept -- one person can’t be doing two tasks in the same time slot.
• This becomes: one data bus can’t be transferring two data items at the same time.
• Handled by identifying the ‘processor resources’ needed to complete each ‘basic task’.
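The "one person, one task per time slot" rule maps naturally onto a bit mask of processor resources. The following is a minimal sketch under assumptions: the flag names and the function can_share_cycle are hypothetical and are not taken from the Squish-DSP tool itself.

```c
#include <stdbool.h>

/* Hypothetical resource flags, one bit per processor resource
   that a basic task can occupy during a cycle. */
enum {
    RES_MULTIPLIER = 1u << 0,
    RES_ALU        = 1u << 1,
    RES_DMBUS      = 1u << 2,
    RES_PMBUS      = 1u << 3
};

/* Two basic tasks may occupy the same cycle only if no resource
   is claimed by both of them. */
bool can_share_cycle(unsigned res_a, unsigned res_b)
{
    return (res_a & res_b) == 0u;
}
```

For example, can_share_cycle(RES_MULTIPLIER, RES_DMBUS) is true, while two tasks that both need RES_DMBUS cannot be placed in the same time slot.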
MS Project -- 21XXX processor
• Business project concept.
• If you delay building a wall (Task A), then you must delay painting it (Task B).
HOWEVER
• If you build the wall earlier, you could paint it earlier, but you don’t have to.
• Might make more sense to delay Task B so that Task C can be done earlier,
• since doing Task C allows Task D to be completed in parallel with Task B,
• so that the whole project is finished earlier.
Simple Example
1) F6 = dm(I4, M4);
10) F1 = F2 * F4, F8 = F8 + F12, F12 = pm(I12, M12);
16) F5 = F3 * F6, F8 = F8 + F12, F12 = pm(I12, M12);
• Might be able to move Task 1 in parallel with any instruction 2 through 15, BUT not in parallel with 16
• If Task 10 moves earlier, so can Task 16, BUT not before Task 10
• In Task 10, ‘F12=….’ can be made parallel with ‘F6=….’, BUT Task 10 ‘F8=….’ can’t!
SquishDSP -- parser
1) F6 = dm(I4, M4);
10) F1 = F2 * F4, F8 = F8 + F12, F12 = pm(I12, M12);
16) F5 = F3 * F6, F8 = F8 + F12, F12 = pm(I12, M12);
• Task 16 split into 3 atomic tasks
• F12 = pm(I12, M12) -- PMBUS resource, must come after ‘F12=…’ from Task 10, and after ‘F8=…’ in current Task
• F8 = F8 + F12 -- ALU resource, must come after ‘F8=…’ and ‘F12=…’ from Task 10
• F5 = F3 * F6 -- MULTIPLIER resource, must come after ‘F6=…’ from Task 1
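The "must come after" rules listed for the atomic tasks amount to checking which registers each task reads and writes. The sketch below is one possible way to express that check, not the parser's actual code: the AtomicTask struct, the must_follow function, and the encoding of Fn as the integer n are assumptions. It also ignores index-register updates such as the post-modify of I12.

```c
#include <stdbool.h>

#define NO_REG (-1)

/* Hypothetical atomic task: one destination register and up to two
   source registers, with Fn encoded as the integer n. */
typedef struct {
    int dest;
    int src[2];
} AtomicTask;

/* True when task t reads register reg. */
static bool reads(const AtomicTask *t, int reg)
{
    return reg != NO_REG && (t->src[0] == reg || t->src[1] == reg);
}

/* True when `later` may not be moved ahead of `earlier`. */
bool must_follow(const AtomicTask *earlier, const AtomicTask *later)
{
    /* later reads a value that earlier produces */
    bool read_after_write = reads(later, earlier->dest);
    /* later overwrites a register that earlier still has to read */
    bool write_after_read = reads(earlier, later->dest);
    /* both write the same register, so the order of writes matters */
    bool write_after_write =
        earlier->dest != NO_REG && earlier->dest == later->dest;

    return read_after_write || write_after_read || write_after_write;
}
```

With this encoding, ‘F5 = F3 * F6’ from Task 16 must follow ‘F6 = dm(I4, M4)’ from Task 1 because it reads F6, while ‘F1 = F2 * F4’ from Task 10 has no such link to Task 1.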
Preparation for Microsoft Project
• .asm code broken up into sub-tasks, with intra- and inter-instruction dependencies recognized
• Reformatted as Microsoft Project text file
• Rescheduled within Microsoft Project, either automatically or using the GUI
• Reformatted as .asm code with increased parallelism
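The rescheduling step can be approximated by a simple greedy placement. The sketch below is a stand-in under stated assumptions, not Microsoft Project's leveling algorithm: the Task struct, the level_tasks function, the fixed limits, and the rule that a task starts one cycle after its predecessors are all hypothetical choices made for illustration.

```c
#include <stddef.h>

#define MAX_CYCLES 64
#define MAX_PREDS  4

/* Hypothetical atomic task: resources occupied for one cycle plus the
   indices of earlier tasks that must complete before it starts. */
typedef struct {
    unsigned resources;          /* bit mask, one bit per resource */
    size_t   n_preds;
    size_t   preds[MAX_PREDS];   /* each index must be less than the task's own */
} Task;

/* Greedy placement: visit tasks in program order and put each one in the
   earliest cycle that follows all its predecessors and has its resources
   free. Writes the chosen cycle of task i to cycle_of[i]. Returns the
   number of cycles used, or 0 if MAX_CYCLES is exceeded. */
size_t level_tasks(const Task *tasks, size_t n, size_t *cycle_of)
{
    unsigned busy[MAX_CYCLES] = {0};
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        size_t c = 0;

        /* Earliest start: one cycle after the latest predecessor. */
        for (size_t p = 0; p < tasks[i].n_preds; p++) {
            size_t ready = cycle_of[tasks[i].preds[p]] + 1;
            if (ready > c)
                c = ready;
        }

        /* Slide later until no required resource is already taken. */
        while (c < MAX_CYCLES && (busy[c] & tasks[i].resources) != 0u)
            c++;
        if (c == MAX_CYCLES)
            return 0;

        busy[c] |= tasks[i].resources;
        cycle_of[i] = c;
        if (c + 1 > used)
            used = c + 1;
    }
    return used;
}
```

Applied to two iterations of the temperature loop (load on the data bus, multiply, add, store on the data bus), this placement overlaps the second iteration with the first and needs 5 cycles instead of the 8 taken by the sequential code.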
Example GUI screen capture
[Figure labels: instructions broken into ATOMIC TASKS; ATOMIC TASKS showing RESOURCE and DEPENDENCIES; ATOMIC TASKS with RESOURCE CONFLICTS]
Task scheduling after ‘LEVELING’
Initial ‘C’ code
Code from ‘Visual-DSP’
• VisualDSP unrolled loop 3 times
Code from SQUISH-DSP
• 12 VisualDSP cycles squished to 8
Final version of code (loop change)
Final SQUISH
• 12 VisualDSP cycles squished to 6
Advantages and Limitations
• Current version intended to handle the inner critical loop of algorithm
• Not handling ‘Cache’ conflicts
• Not optimized for instructions in delay slots in jumps and conditional jumps
• Not optimized for multiple DAG delays
• e.g. I4 = …. ; DM(I4, M2) = ; I5 =…
• Moving to ‘task profile management’ macros with Primavera PV3 Tool
Conclusion
• SquishDSP is a prototype scheduling tool to identify and reschedule microprocessor resource operations in parallel
• Already useful in current form for ‘inner DSP loops’
• Microsoft Project used for concept work, but Primavera PV3 tool offers more long-term promise
Acknowledgements
• Financial support of Natural Sciences and Engineering Research Council (NSERC) of Canada and University of Calgary
• Financial support from Analog Devices. Dr. Mike Smith is ADI University Professor 2001/2002
• Future financial support from Alberta Provincial Government through Alberta Software Engineering Research Consortium (ASERC)