AAA which application form ?

AAAwhich application form ? R. de Simone INRIA EPI Aoste semi-technical considerations

AdequationAlgorithme-ArchitectureAllocation Application / Architecture mapping comprises: • spatial allocation: distribution, placement (of tasks on resources) • temporal allocation: scheduling original formulation: Yves Sorel (1988) since then rephrased as Platform-based design, Y-Chart approach,… Application Architecture mapping adequation

AdequationAlgorithme-ArchitectureAllocation Application / Architecture mapping comprises: • spatial allocation: distribution, placement (of tasks on resources) • temporal allocation: scheduling architecture trends: • networks of processors (multicore, GPGPU/MIC, many-buzz) • communication bandwiththe issue, on-chip networks…  spatial routing, temporal arbitration, what else ? Application Architecture mapping

AdequationAlgorithme-ArchitectureAllocation Application / Architecture what kind of application description models ? • where concurrency and timing could be extracted, made explicit • Under favorable conditions, could be performed at compile time (static, time-predictable) • Dimensioning should be feasible to optimally fit the architecture (cache sizes, PRET notions,…) • (Existing theories developed in neighboring domains could be combined and adapted GOAL is to refine the application so it reflects the architecture Application Architecture mapping

Draft example: Mapping Proc1 Bus Proc2

Mapping = spatial distribution/routing + temporal scheduling Proc1 Bus Proc2

Applications: the scope MoCCs Affine bounds nested loops explicit distribution, concurrency extraction Process Networks explicit scheduling, time constraints extraction Syn/Poly-chronous

Applications: the scope MoCCs Affine bounds nested loops • Exercice: count the communities (do not forget Model-Driven Engineering for model transformations, and Classical or Real-Time Scheduling) explicit distribution, concurrency extraction Process Networks explicit scheduling, time constraints extraction Syn/Poly-chronous

Motivations for the joint use of these models • Similar restrictions !! • little (in fact no) concern for data values • control largely data-independent (uninterpreted functions) • finite-state control property • role of conflict-freeness as functional determinism • Formal models and mathematical analysis • Issues tacked at compile time but… • Largely developed independently (even if in neighboring groups) • Lack of common vocabulary, or of common framework, or common drive (risks to loose AAA grade ?) …Or is it just me who needs to go back to school ?

Process Networks • Data-flow: computations triggered by data availability • Pure Data-Flow: Marked Graphs, SDF extension • w/ data route switches: Boolean DF, CycloStatic DF, Kahn PNs • Conflict-freeness: all computations amount to same partial order, onlydifferentscheduling/time assignments • ASAP schedule: provides best throughput • correctness issues: safety and liveness • Optimization issues: optimize buffer sizes whilepreservingthroughput Second-levelsemantics: computations according to schedules (activation conditions)  Representation as polychronous descriptions

Explicit timing for Process Networks • Classical scheduling of Marked Graphs, Synchronous Data Flow graphs • Latency-Insensitive Design • Ultimately k-periodic schedules Represented with • N-synchronous formalisms • Signal and affine clock calculus Process Networks Syn/Poly-chronous

Ultimately k-periodicscheduling [00100] (01011) 1,1 2,1 1 f bf (11010) (10110) 1 (10101) 1,1 1 3 1 1 1 ef g (01101) loop pause; emitclk; pause; pause; emitclk; pause; emitclk; pause; endloop (11010) 1,1

Explicit routing • Regular switching patterns in BDF, CSDF, KPNs • Extending scheduling with similar expressivity (ultimately periodic infinite binary words) • KRGs (K-periodically) Routed Graphs • Axiomatics for an equational theory of communication re-wiring • Allows to consider sharing of communications (on common interconnect structures • Interplay of scheduling and routing (arbitration between communications using the same channels) Process Networks Syn/Poly-chronous

Interconnect modeling and optimization • On-Chip Networks • Switching conditions of Select/Marge nodes can configure different communication paths, possibly overlapping in time • Predictable routing schemes (ultimately k-periodic) will match the temporal schedules obtained in classical Process Network scheduling theory C2 Select node C1 C3 Merge node C4

one possible routing configuration 1 0 1 0 C2 0 1 0 1 C1 C3 1 0 1 0 C4 0 1 0 1

another possible configuration 1 1 1 1 1 1 C2 0 0 0 0 0 0 0 C1 C3 0 0 0 0 0 0 C4 1 1 1 1 1 1

Nestedloopswith affine bounds • Parallelcompilation models • static control (regular indices, not while (data_cond) loops) • Iterationspace(multidimensionalarrays, polyhedra) • Source-to-source transformations (rewrite as program, onlywith DOSEQ and DOPAR atvariouslevels) • Improving data locality • Variants • Systems of Uniform Recurrence Equations (SUREs, MMAlpha) • Geometrical description formalisms (Array-OL)

Example + Reduced Data Graphs Affine bounds nested loops 2 1 Process Network loop i = 1 to N loop j = 1 to N a(i,j) := a(i-1,j-1) + a(i, j-1) end end dependence levels 0 1 1 1 DOSEQ j = 1 to N DOPAR i = 1 to N a(i,j) := a(i-1,j-1) + a(i, j-1) end end direction vectors

FromNestedLoops to Process Networks Affine bounds nested loops • Existing efforts: basic ideais to assign one computingnode to eachassignment • COMPAAN, Pico Express • Tiling, chunking • Stefanov, et al, PolyhedralProcess Networks • Multidimensional SDF for Array-OL/Gaspard Process Network

Dream (or nightmare?) • How can one produce Process Network descriptions from nested loops (or better said, SUREs or RDGs) • Of course limited, but very often already polyhedra (for bounds) separated from dependency graphs (for computations) • Needs to split clearly between (sequential) iterations and parallel ones (currently expanded) • Issue of potential (application) vs real (architecture) concurrency/parallelism j 4 a(i,j) a(i-1,j-1) a(i,j-1) 8 i

Dream (or nightmare?) • Greenish nodes provide constants (in parallel at the bottom, sequentially on the left) • Blue nodes each compute 4 values in parallel, shifts the last right across the border Lack of generality certainly when values to be sent across boundaries are not ordered as expected at target j 4 1 1 a(i,j) a[1..4,0]) 0.(1) 0.(1) 3 3 a(i-1,j-1) a(i,j-1) 4 4 a[1,1..4] a[1,5..8] 8 i

Challenges of it • Not entirelyclear (to me at least) how data locality and transfer (if any) isdealtwith in differentworks • Typical: a data block istransferedin someorder (row), and consumed in differentorder (column)

SoC/NoC intelligent routers ? • To match (and beprogrammedfrom) the application, routers • shouldbe able to fork the data, • Shouldonly care about directions (not single target) • Should let ccommunications cross one another if not usingsamelines • Shouldbe able to buffer values (at least in the size of the processor array) P Core P Core P Core P Core P Core P Core P Core P Core P Core P Core P Core P Core

ThAAAnk You A ThAAnk You Questions anybody ?

AAA which application form ?