Basic Linear Algebra Subprograms (BLAS) – 3 levels of operations

Basic Linear Algebra Subprograms (BLAS) – 3 levels of operations. Memory hierarchy efficiently exploited by higher-level BLAS. Fourier Transform.


Presentation Transcript


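  1. Basic Linear Algebra Subprograms (BLAS) – 3 levels of operations • Level 1: vector-vector operations • Level 2: matrix-vector operations • Level 3: matrix-matrix operations • The memory hierarchy is exploited more efficiently by the higher-level BLAS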
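
To make the three levels concrete, here is a minimal sketch in plain Python. The function names mirror the standard BLAS routine names (axpy, gemv, gemm), but these are illustrative re-implementations, not calls into a BLAS library. The helper `flops_per_word` is hypothetical and uses rough operation and memory-traffic counts for square problems of size n; it is only meant to suggest why Level 3 can reuse data held in cache.

```python
# Plain-Python stand-ins for one routine from each BLAS level.
# These are illustrative re-implementations, not bindings to a BLAS library.

def axpy(alpha, x, y):
    """Level 1 (vector-vector): returns alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def gemv(A, x):
    """Level 2 (matrix-vector): returns A x for a row-major list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def gemm(A, B):
    """Level 3 (matrix-matrix): returns A B."""
    cols = list(zip(*B))  # columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def flops_per_word(level, n):
    """Rough ratio of arithmetic to memory traffic (assumed counts).

    Level 1: about 2n flops on 3n words.
    Level 2: about 2n^2 flops on n^2 + 3n words.
    Level 3: about 2n^3 flops on 4n^2 words.
    """
    flops = {1: 2 * n, 2: 2 * n * n, 3: 2 * n ** 3}[level]
    words = {1: 3 * n, 2: n * n + 3 * n, 3: 4 * n * n}[level]
    return flops / words
```

Under these assumed counts the ratio stays below 2 for Levels 1 and 2 regardless of n, while for Level 3 it grows as n/2, so each word fetched from memory can be reused many times.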

  2. Fourier Transform • The Fourier transform is widely used for designing filters. You can design systems that reject high-frequency noise and retain only the low-frequency components. This is natural to describe in the frequency domain. • Important properties of the Fourier transform are: • 1. Linearity and time shifts • 2. Differentiation • 3. Convolution
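
The filtering idea can be sketched with a naive discrete Fourier transform in plain Python. This is an illustration under stated assumptions, not a practical filter design: the O(n^2) `dft`/`idft` helpers, the `low_pass` function, the cutoff of 5 bins, and the two test frequencies (2 and 20 cycles per window) are all chosen here for the example.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT (includes the 1/n normalisation)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def low_pass(x, cutoff):
    """Zero every frequency bin above `cutoff`, then transform back."""
    n = len(x)
    X = dft(x)
    # Bins k and n-k represent the same frequency for a real signal,
    # so both are kept or dropped together.
    kept = [X[k] if min(k, n - k) <= cutoff else 0 for k in range(n)]
    return [v.real for v in idft(kept)]

n = 64
slow = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]          # wanted signal
noise = [0.5 * math.sin(2 * math.pi * 20 * t / n) for t in range(n)]  # high-frequency noise
noisy = [s + v for s, v in zip(slow, noise)]
filtered = low_pass(noisy, cutoff=5)  # recovers `slow` up to rounding error
```

Because both components fall exactly on DFT bins, removing the bins above the cutoff removes the noise completely; real signals would leak across bins and need a smoother filter.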

  3. A Simple Model for Parallel Processing • Parallel Random Access Machine (PRAM) model • a number of processors that can all access • a large shared memory • all processors are synchronized • all processors run the same program • each processor has a unique id, pid, and • may be instructed to do different things depending on its pid

  4. Interconnection Networks • Uses of interconnection networks • Connect processors to shared memory • Connect processors to each other • Interconnection media types • Shared medium • Switched medium • Different interconnection networks define different parallel machines. • The properties of the interconnection network influence the type of algorithm used on a given machine, because they affect how data is routed.

  5. Switch Network Topologies • View switched network as a graph • Vertices = processors or switches • Edges = communication paths • Two kinds of topologies • Direct • Indirect

  6. Terminology for Evaluating Switch Topologies • We need to evaluate 4 characteristics of a network to understand how effectively it supports efficient parallel algorithms on a machine with that network. • These are • The diameter • The bisection width • The edges per node • The constant edge length • We’ll define these and see how they affect algorithm choice. • Then we will investigate several different topologies and see how these characteristics are evaluated.

  7. Terminology for Evaluating Switch Topologies • Diameter – Largest distance between two switch nodes. • A low diameter is desirable. • It puts a lower bound on the complexity of parallel algorithms that require communication between arbitrary pairs of nodes.
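
As a sketch of how the diameter can be evaluated, the following plain-Python code runs a breadth-first search from every node of a small connected graph. The builders `linear`, `mesh`, and `hypercube` are hypothetical helpers written for this example; they model three of the topologies named in this presentation as direct networks in which every node is a switch.

```python
from collections import deque

def eccentricity(adj, start):
    """Largest hop count from `start` to any node (graph assumed connected)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    """Largest distance between any two nodes."""
    return max(eccentricity(adj, u) for u in adj)

def linear(n):
    """n nodes in a line."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def mesh(side):
    """side x side 2-D mesh without wraparound links."""
    adj = {}
    for r in range(side):
        for c in range(side):
            nbrs = [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
            adj[(r, c)] = [(a, b) for a, b in nbrs
                           if 0 <= a < side and 0 <= b < side]
    return adj

def hypercube(d):
    """2**d nodes; neighbours differ in exactly one bit of the node number."""
    return {u: [u ^ (1 << b) for b in range(d)] for u in range(1 << d)}
```

For 16 nodes this gives a diameter of 15 for the linear network, 6 for the 4 x 4 mesh, and 4 for the 4-dimensional hypercube, which illustrates why the hypercube suits algorithms with arbitrary pairwise communication.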

  8. Terminology for Evaluating Switch Topologies • Bisection width – The minimum number of edges between switch nodes that must be removed in order to divide the network into two halves (within 1 node, if the number of processors is odd). • High bisection width is desirable. • In algorithms requiring large amounts of data movement, the size of the data set divided by the bisection width puts a lower bound on the complexity of an algorithm. • Actually proving what the bisection width of a network is can be quite difficult.
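
For very small networks the definition can be checked by exhaustive search. The sketch below tries every way of splitting the nodes into two halves and counts the edges crossing the split. The function name `bisection_width` and the edge-list representation are choices made for this example, and the search is exponential in the number of nodes, so it only illustrates the definition.

```python
from itertools import combinations

def bisection_width(nodes, edges):
    """Minimum number of edges cut by any split into two (near-)equal halves.

    Brute force over all subsets of size len(nodes) // 2; usable only for
    small graphs. With an odd node count the halves differ by one node.
    """
    nodes = list(nodes)
    half = len(nodes) // 2
    best = None
    for group in combinations(nodes, half):
        side = set(group)
        # An edge is cut when its endpoints land in different halves.
        cut = sum(1 for u, v in edges if (u in side) != (v in side))
        if best is None or cut < best:
            best = cut
    return best

# 8-node linear network: edges between consecutive nodes.
line_edges = [(i, i + 1) for i in range(7)]

# 3-dimensional hypercube: edges between node numbers differing in one bit.
cube_edges = [(u, u ^ (1 << b)) for u in range(8) for b in range(3)
              if u < u ^ (1 << b)]
```

With these inputs, `bisection_width(range(8), line_edges)` is 1 and `bisection_width(range(8), cube_edges)` is 4, so the same amount of data crossing between halves would take roughly four times as many steps on the line.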

  9. Evaluating Switch Topologies • Many have been proposed and analyzed. We will consider several well known ones: • 2-D mesh • linear network • binary tree • hypertree • butterfly • hypercube • shuffle-exchange • Those in yellow have been used in commercial parallel computers.

  10. PRAM [Parallel Random Access Machine] (Introduced by Fortune and Wyllie, 1978) • A PRAM is composed of: • P processors, each with its own unmodifiable program. • A single shared memory composed of a sequence of words, each capable of containing an arbitrary integer. • A read-only input tape. • A write-only output tape. • The PRAM model is a synchronous, MIMD, shared address space parallel computer.

  11. PRAM model of computation • p processors, each with local memory • Synchronous operation • Shared memory reads and writes • Each processor has a unique id in the range 1 to p

  12. Characteristics • At each unit of time, a processor is either active or idle (depending on its id) • All processors execute the same program • At each time step, all processors execute the same instruction on different data (“data-parallel”) • Focuses on concurrency only
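
These characteristics can be mimicked with a small sequential simulator. The sketch below is an assumption-laden illustration, not a standard API: `pram_step` and `broadcast` are hypothetical names, processor ids run from 0 to p-1 rather than 1 to p for easier indexing, and the simulator rejects concurrent writes to the same cell (an exclusive-write assumption). The example program copies one value to p memory cells in about log2(p) synchronous steps, with each processor active or idle depending on its id.

```python
def pram_step(memory, num_procs, instruction):
    """Run one synchronous step of a simulated PRAM.

    Every processor sees the memory as it was before the step; all writes
    are applied together afterwards. `instruction(pid, snapshot)` returns
    (address, value) for an active processor or None for an idle one.
    """
    snapshot = list(memory)
    writes = {}
    for pid in range(num_procs):  # sequential loop standing in for parallel hardware
        result = instruction(pid, snapshot)
        if result is None:
            continue  # this processor is idle in this step
        addr, value = result
        if addr in writes:
            raise ValueError("two processors wrote the same cell in one step")
        writes[addr] = value
    for addr, value in writes.items():
        memory[addr] = value

def broadcast(value, num_procs):
    """Copy `value` from cell 0 to cells 0..num_procs-1 by repeated doubling."""
    memory = [None] * num_procs
    memory[0] = value
    steps = 0
    stride = 1
    while stride < num_procs:
        def instruction(pid, mem, stride=stride):
            target = pid + stride
            # Only processors that already hold the value are active.
            if pid < stride and target < num_procs:
                return target, mem[pid]
            return None
        pram_step(memory, num_procs, instruction)
        stride *= 2
        steps += 1
    return memory, steps
```

For example, `broadcast(7, 8)` fills all eight cells in 3 steps: 1, 2, and then 4 processors are active, and the rest stay idle.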

  13. Why study PRAM algorithms? • Well-developed body of literature on design and analysis of such algorithms • Baseline model of concurrency • Explicit model • Specify operations at each step • Scheduling of operations on processors • Robust design paradigm

  14. Designing PRAM algorithms • Balanced trees • Pointer jumping • Euler tours • Divide and conquer • Symmetry breaking • . . .
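
As one illustration of the techniques listed, pointer jumping can be sketched on the classic list-ranking problem: each node of a linked list learns its distance to the tail. The function name `list_rank` and the convention that the tail points to itself are assumptions of this example; the list comprehensions stand in for one synchronous PRAM step with one processor per node.

```python
def list_rank(successor):
    """Distance from every node to the tail of a linked list.

    successor[i] is the node after i; the tail points to itself.
    Returns (ranks, rounds), where rounds counts the synchronous steps.
    """
    n = len(successor)
    nxt = list(successor)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    rounds = 0
    # Stop once every pointer has reached the tail.
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # Both updates read only the old arrays, as in one synchronous step.
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
        rounds += 1
    return rank, rounds
```

For the 8-node list 0 → 1 → ... → 7, `list_rank([1, 2, 3, 4, 5, 6, 7, 7])` returns the ranks 7 down to 0 after 3 rounds, because each round doubles the distance every pointer spans.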

  15. Balanced trees • Key idea: Build a balanced binary tree on the input data, then sweep the tree up and down • The “tree” is often not a data structure but a control structure
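
A common instance of the sweep-up, sweep-down pattern is the work-efficient prefix-sum (scan) algorithm, sketched here in plain Python. The choice of prefix sums as the example, the function name `prefix_sums`, and the restriction to power-of-two input lengths are assumptions made for this illustration. Note that no tree is ever built: the tree exists only in the index arithmetic, which is the sense in which it is a control structure. Each inner loop touches disjoint cells, so on a PRAM its iterations could run in one parallel step.

```python
def prefix_sums(values):
    """Exclusive prefix sums via an up-sweep and a down-sweep.

    Returns (sums, total), where sums[i] = values[0] + ... + values[i-1].
    The input length must be a power of two (assumption of this sketch).
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)

    # Up-sweep: each internal tree node receives the sum of its subtree.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            a[i] += a[i - stride]
        stride *= 2

    total = a[n - 1]
    a[n - 1] = 0  # the root starts the down-sweep with the identity

    # Down-sweep: each node passes its value to the left child and
    # its value plus the left subtree's sum to the right child.
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left = a[i - stride]
            a[i - stride] = a[i]
            a[i] += left
        stride //= 2
    return a, total
```

For the input `[3, 1, 7, 0, 4, 1, 6, 3]` this returns `[0, 3, 4, 11, 11, 15, 16, 22]` with total 25, using log2(8) = 3 levels in each sweep.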
