Hongtao Du Part 2 AICIP Research Dec 1, 2005
Driving Force
• Data-driven
  • How to divide data sets into blocks of different sizes for multiple computing resources
  • How to coordinate data flows along different directions so that the appropriate data reaches the suitable resources at the right time
• Function-driven
  • How to perform different functions of one task on different computing resources at the same time
Data: Flynn's Taxonomy
• Single Instruction stream, Single Data stream (SISD)
• Multiple Instruction stream, Single Data stream (MISD)
• Single Instruction stream, Multiple Data stream (SIMD)
• Multiple Instruction stream, Multiple Data stream (MIMD)
  • Shared memory
  • Distributed memory (message passing, e.g., MPI, PVM)
Data Partitioning Schemes
• Block
• Scatter
• Contiguous point
• Contiguous row
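The four schemes can be sketched as owner maps: for each pixel of an h × w image, which processor owns it. This is a minimal illustration under assumed definitions, not taken from the slides: "block" and "scatter" use a hypothetical pr × pc processor grid (scatter as 2D cyclic assignment), while "contiguous row" and "contiguous point" split rows or the row-major pixel sequence into p contiguous runs.

```python
def contiguous_row(h, w, p):
    """Rows split into p contiguous bands of (nearly) equal height."""
    return [[i * p // h for _ in range(w)] for i in range(h)]

def contiguous_point(h, w, p):
    """Pixels taken in row-major order and split into p contiguous runs."""
    return [[(i * w + j) * p // (h * w) for j in range(w)] for i in range(h)]

def block(h, w, pr, pc):
    """Assumed pr x pc processor grid; each processor owns one rectangle."""
    return [[(i * pr // h) * pc + (j * pc // w) for j in range(w)] for i in range(h)]

def scatter(h, w, pr, pc):
    """Assumed 2D cyclic assignment: pixel (i, j) goes to grid cell (i % pr, j % pc)."""
    return [[(i % pr) * pc + (j % pc) for j in range(w)] for i in range(h)]

if __name__ == "__main__":
    for row in block(4, 4, 2, 2):
        print(row)
```

Block partitions keep neighbouring pixels together (less boundary traffic), while scatter spreads every region over all processors (better load balance when work is uneven across the image).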
Communication Patterns and Costs
• Communication expense is the first concern in data-driven partitioning.
• Successor/Predecessor (S-P) pattern
• North/South/East/West (NSEW) pattern
• The cost of each pattern depends on the message preparation latency, the transmission speed (bytes/s), the number of processors, the number of data items, and the length of each data item to be transmitted.
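The slide's cost formulas did not survive extraction, so the following is only an assumed model built from the listed parameters, not the original expressions. It treats one message as latency + (items × item length) / speed, assumes S-P corresponds to row strips (two neighbours, each exchanging a full row of n items) and NSEW to √p × √p blocks (four neighbours, each exchanging n/√p items), and assumes each processor's messages are sent one after another.

```python
import math

def message_time(latency, speed, n_items, item_len):
    """One message: fixed preparation latency plus transfer time (bytes / (bytes/s))."""
    return latency + n_items * item_len / speed

def sp_time(latency, speed, n, item_len):
    """Assumed S-P pattern: row strips, one full row (n items) sent to the
    successor and one to the predecessor."""
    return 2 * message_time(latency, speed, n, item_len)

def nsew_time(latency, speed, p, n, item_len):
    """Assumed NSEW pattern: sqrt(p) x sqrt(p) blocks, one block edge
    (n / sqrt(p) items) sent to each of the 4 neighbours."""
    side = math.isqrt(p)
    if side * side != p:
        raise ValueError("this sketch assumes p is a perfect square")
    return 4 * message_time(latency, speed, n / side, item_len)

if __name__ == "__main__":
    for lat in (1e-4, 1e-5):
        print(lat, sp_time(lat, 1e8, 4096, 4), nsew_time(lat, 1e8, 16, 4096, 4))
```

Under this model S-P wins when latency dominates (fewer messages) and NSEW wins when data volume dominates (shorter messages), which is why the pattern choice depends on all five parameters.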
Understanding Data-driven
• The arrival of data initiates and synchronizes operations in the system.
• The whole system in execution is modeled as a network linked by data streams.
• Granularity of the algorithm: the size of the data blocks transmitted between processors. The flows of data blocks form data streams.
• Granularity selection: a trade-off between computation and communication
  • Large: reduces the degree of parallelism; increases computation time; little overlap between processors.
  • Small: increases the degree of overlap; increases communication and overhead time.
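The granularity trade-off can be made concrete with a simple pipeline model. This is a sketch under assumed parameters (t_comp per item, t_word per transmitted item, a fixed per-block latency, p processors acting as pipeline stages); none of these names come from the slides. Large blocks leave long pipeline fill time (little overlap), small blocks pay the per-block latency many times.

```python
import math

def pipeline_time(n, p, b, t_comp, t_word, latency):
    """Completion time for n items streamed in blocks of size b through a
    p-stage pipeline (assumed: one stage per processor)."""
    blocks = math.ceil(n / b)
    stage = b * t_comp + latency + b * t_word  # compute one block, then send it
    return (blocks + p - 1) * stage            # fill time + one stage per block

def best_granularity(n, p, t_comp, t_word, latency):
    """Scan power-of-two block sizes and return the fastest one."""
    sizes = [2 ** k for k in range(int(math.log2(n)) + 1)]
    return min(sizes, key=lambda b: pipeline_time(n, p, b, t_comp, t_word, latency))

if __name__ == "__main__":
    for b in (1, 64, 1024):
        print(b, pipeline_time(1024, 4, b, 1, 0.1, 20))
    print("best:", best_granularity(1024, 4, 1, 0.1, 20))
```

With these illustrative numbers, single-item blocks and one giant block are both far slower than a medium block size.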
Data Dependency
• Can reduce or even eliminate the speedup
• Caused by edge pixels shared between different blocks
• Illustrated partitions: Block, Reverse diagonal
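A common way to handle edge-pixel dependency in block partitions is to give each block a one-row "halo" copied from its neighbours. The sketch below assumes row strips and a hypothetical 3-point vertical averaging filter; it shows that strips processed without the halo rows produce wrong values at their edges, while strips with halo rows reproduce the sequential result exactly.

```python
def smooth_rows(img):
    """Sequential reference: each pixel becomes the mean of itself and its
    vertical neighbours (border rows reuse themselves as the missing neighbour)."""
    h = len(img)
    out = []
    for i in range(h):
        up = img[max(i - 1, 0)]
        down = img[min(i + 1, h - 1)]
        out.append([(u + c + d) / 3 for u, c, d in zip(up, img[i], down)])
    return out

def smooth_blocks(img, p, with_halo=True):
    """Split rows into p contiguous strips and filter each strip independently.
    With with_halo=True a strip also receives one boundary row from each neighbour."""
    h = len(img)
    out = []
    for k in range(p):
        lo, hi = k * h // p, (k + 1) * h // p
        top = max(lo - 1, 0) if with_halo else lo
        bottom = min(hi + 1, h) if with_halo else hi
        local = smooth_rows(img[top:bottom])
        out.extend(local[lo - top: lo - top + (hi - lo)])  # keep only owned rows
    return out

if __name__ == "__main__":
    img = [[float(i * 4 + j) for j in range(4)] for i in range(8)]
    print(smooth_blocks(img, 2) == smooth_rows(img))
    print(smooth_blocks(img, 2, with_halo=False) == smooth_rows(img))
```

The halo exchange is exactly the extra communication that eats into the speedup.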
Function
• Partitioning procedure
  • Evaluating the complexity of each individual process in the function and the communication between processes
  • Clustering processes according to objectives
  • Partitioning optimization
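One possible clustering step is a greedy edge-merging heuristic, shown here as an illustration rather than the procedure used in the slides. Processes have computation costs, edges carry communication volumes, and the heaviest-communicating pairs are merged first as long as a cluster stays within an assumed per-resource capacity; the remaining cut communication is what the partition must pay.

```python
def cluster_processes(cost, comm, capacity):
    """Greedy edge merging: visit edges from heaviest to lightest communication
    and merge the two clusters if their combined load fits within capacity."""
    parent = list(range(len(cost)))
    load = list(cost)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), _ in sorted(comm.items(), key=lambda e: -e[1]):
        ra, rb = find(a), find(b)
        if ra != rb and load[ra] + load[rb] <= capacity:
            parent[rb] = ra
            load[ra] += load[rb]

    clusters = {}
    for v in range(len(cost)):
        clusters.setdefault(find(v), []).append(v)
    cut = sum(w for (a, b), w in comm.items() if find(a) != find(b))
    return sorted(clusters.values()), cut

if __name__ == "__main__":
    cost = [4, 3, 2, 5, 1]
    comm = {(0, 1): 10, (1, 2): 2, (2, 3): 8, (3, 4): 1, (0, 4): 3}
    print(cluster_processes(cost, comm, capacity=8))
```

A later "partitioning optimization" pass could then refine this starting point, for example by moving single processes between clusters.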
Space-time-domain Expansion
• Definition: sacrificing processing time to meet the performance requirements.
• Time complexity:
One Dimension Partitioning
• Keeping the processing size to one column at a time.
• Repeatedly feeding in data until the process finishes.
• Increases the time complexity by a factor of n (the number of columns).
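As an illustration (the matrix-vector product is an assumed example workload, not from the slides), a unit sized for a single column of an m × n matrix can compute y = A x by feeding one column per step and accumulating. The number of steps equals n, matching the factor-of-n increase in time.

```python
def matvec_by_columns(a, x):
    """Compute y = A x on a unit that holds only one column (m values) at a time."""
    m, n = len(a), len(a[0])
    y = [0] * m
    steps = 0
    for j in range(n):                      # feed one column per step
        column = [a[i][j] for i in range(m)]
        for i in range(m):                  # m multiply-accumulates (parallel in hardware)
            y[i] += column[i] * x[j]
        steps += 1
    return y, steps

if __name__ == "__main__":
    print(matvec_by_columns([[1, 2, 3], [4, 5, 6]], [1, 0, 2]))
```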
Two Dimension Partitioning
• Fixing the processing size to a two-dimensional subset of the original processing.
• Increasing the time complexity by
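A minimal sketch of fixed-size two-dimensional processing, assuming a hypothetical unit that handles one tile_rows × tile_cols tile per pass over an m × n array. Under this assumption the number of passes is the number of tiles, ceil(m / tile_rows) × ceil(n / tile_cols); summation stands in for the real per-tile processing.

```python
import math

def tiled_passes(m, n, tile_rows, tile_cols):
    """Passes needed when a fixed tile_rows x tile_cols unit covers an m x n problem."""
    return math.ceil(m / tile_rows) * math.ceil(n / tile_cols)

def tiles(m, n, tile_rows, tile_cols):
    """Yield (row_slice, col_slice) for each tile in row-major order."""
    for r in range(0, m, tile_rows):
        for c in range(0, n, tile_cols):
            yield slice(r, min(r + tile_rows, m)), slice(c, min(c + tile_cols, n))

def tiled_sum(a, tile_rows, tile_cols):
    """Example workload: sum a matrix one tile per pass."""
    total, passes = 0, 0
    for rs, cs in tiles(len(a), len(a[0]), tile_rows, tile_cols):
        total += sum(sum(row[cs]) for row in a[rs])
        passes += 1
    return total, passes

if __name__ == "__main__":
    a = [[1] * 5 for _ in range(5)]
    print(tiled_sum(a, 2, 2), tiled_passes(5, 5, 2, 2))
```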
Resource Constraints
• Multi-processor
  • Software implementation
  • Homogeneous system
  • Heterogeneous system
• Hardware/software (HW/SW) co-processing
  • Software and hardware components are co-designed
  • Process scheduling
• VLSI
  • Hardware implementation
  • Communication time is negligible
Multi-processor
• Heterogeneous system
  • Contains computers with different types of parallelism.
  • Communication overheads add extra delays.
  • Communication tasks such as allocating buffers and setting up DMA channels have to be performed by the CPU and cannot be overlapped with the computation.
  • Host/Master: a powerful processor
  • Bottleneck processor: the processor that takes the longest time to perform its assigned task.
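The bottleneck processor can be found by computing each processor's finishing time. The sketch below assumes hypothetical relative speeds (items per unit time) and a per-processor communication overhead that adds to compute time because it cannot overlap with computation. It compares an equal split against a split proportional to speed.

```python
def proportional_split(n, speeds):
    """Split n items in proportion to processor speed, using largest-remainder
    rounding so the shares sum to n."""
    total = sum(speeds)
    exact = [n * s / total for s in speeds]
    shares = [int(e) for e in exact]
    leftover = n - sum(shares)
    order = sorted(range(len(speeds)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares

def bottleneck(shares, speeds, comm_overhead):
    """Return (index, time) of the processor that finishes last. Communication
    overhead is added, not overlapped, since the CPU performs it."""
    times = [shares[i] / speeds[i] + comm_overhead[i] for i in range(len(speeds))]
    i = max(range(len(times)), key=times.__getitem__)
    return i, times[i]

if __name__ == "__main__":
    speeds = [4, 2, 1, 1]
    print(bottleneck([250] * 4, speeds, [0] * 4))          # equal split
    print(bottleneck(proportional_split(1000, speeds), speeds, [10, 5, 5, 5]))
```

With an equal split the slowest machines become the bottleneck; a speed-proportional split evens out compute time, leaving communication overhead to decide which processor finishes last.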
HW/SW Co-processing
• System structure
  • SW: a single general-purpose processor, e.g., Pentium or PowerPC
  • HW: a single hardware coprocessor, e.g., FPGA or ASIC
  • A block of shared memory
• Design view
  • Hardware components: RTL components (adders, multipliers, ALUs, registers)
  • Software component: general-purpose processor
  • Communication: between the software component and the local memory
• 90-10 partitioning
  • The most frequently executed loops generally account for about 90 percent of execution time while consisting of only simple designs (roughly 10 percent of the code).
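A 90-10 style selection can be sketched from a profile: move the most time-consuming loops to the hardware coprocessor until they cover about 90 percent of execution time, then estimate the overall gain with Amdahl's law. The loop names, profile numbers, and hardware speedup below are hypothetical.

```python
def select_hw_loops(profile, target=0.9):
    """Pick loops in decreasing order of execution time until they cover
    at least `target` of total time. Returns (chosen loops, covered fraction)."""
    total = sum(profile.values())
    chosen, covered = [], 0.0
    for name, t in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= target * total:
            break
        chosen.append(name)
        covered += t
    return chosen, covered / total

def coprocessing_speedup(hw_fraction, hw_speedup):
    """Amdahl's law: only the hardware-mapped fraction is accelerated."""
    return 1 / ((1 - hw_fraction) + hw_fraction / hw_speedup)

if __name__ == "__main__":
    profile = {"fir_loop": 60, "fft_loop": 25, "quantize": 6, "io": 5, "setup": 4}
    loops, frac = select_hw_loops(profile)
    print(loops, frac, coprocessing_speedup(frac, hw_speedup=10))
```

The software-only remainder bounds the achievable speedup, which is why covering the hot loops matters more than accelerating them further.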
VLSI
• Constraints
  • Execution time (DSP ASIC)
  • Power consumption
  • Design area
  • Throughput
• Examples
  • Globally asynchronous, locally synchronous on-chip bus (time)
  • 4-way pipelined memory partitioning (throughput)
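One common way memory partitioning raises throughput is low-order interleaving across banks; treating "4-way pipelined memory partitioning" this way is an assumption for illustration. In the sketch, each bank stays busy for an assumed bank_latency cycles after accepting a request, and at most one request issues per cycle. Sequential addresses then rotate through the 4 banks and sustain one access per cycle, whereas a single bank (or a stride that hits one bank) stalls.

```python
def bank_of(addr, banks=4):
    """Low-order interleaving: consecutive addresses go to consecutive banks."""
    return addr % banks

def cycles_for_stream(addresses, banks=4, bank_latency=4):
    """Cycles until all requests complete, issuing at most one request per cycle;
    a bank cannot accept a new request until bank_latency cycles after the last."""
    free_at = [0] * banks     # cycle at which each bank can accept a request
    cycle = 0
    for a in addresses:
        b = bank_of(a, banks)
        cycle = max(cycle, free_at[b])   # stall until the target bank is free
        free_at[b] = cycle + bank_latency
        cycle += 1                       # next request may issue next cycle
    return max(free_at)

if __name__ == "__main__":
    print(cycles_for_stream(range(16)))                 # 4 banks, sequential
    print(cycles_for_stream(range(16), banks=1))        # single bank
    print(cycles_for_stream(range(0, 64, 4)))           # stride 4: bank conflicts
```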
Questions? Thank you!