
Brook+



Presentation Transcript


  1. Brook+ Matthew Caylor

  2. Introduction • AMD Stream Computing is a first step in harnessing the tremendous processing power of the GPU (stream processor) for high-performance, data-parallel computing in a wide range of business, scientific, and consumer applications. • AMD’s Stream Computing platform lets organizations and individuals integrate accelerated computing into existing IT infrastructure, enabling improved decision making, accelerated workflows, and reduced time-to-discovery.

  3. Introduction • Brook+ is a special-purpose language designed to operate on top of AMD CAL. • Brook is an extension of standard ANSI C. • It is designed to incorporate the ideas of data-parallel computing and arithmetic intensity into a familiar, efficient language. • Its general computational model, referred to as streaming, provides two main benefits over conventional languages: • Data parallelism: allows the programmer to specify how to perform the same operations in parallel on different data. • Arithmetic intensity: encourages programmers to specify operations on data that minimize global communication and maximize localized computation.

  4. Introduction • AMD, along with the open source community, is working to mask the GPU's graphics programming heritage. • The open source Brook compiler plus AMD enhancements are geared directly at non-graphics stream computing. • CAL provides high-level language access to the various parts of the GPU as needed.

  5. CAL • AMD Compute Abstraction Layer (CAL) is a device-driver library that provides a forward-compatible interface to AMD’s stream processors (devices). • CAL allows software developers to interact with the processing cores at the lowest level, if needed, for optimized performance, while maintaining forward compatibility.

  6. CAL • CAL provides the following main functions: • Device-specific code generation • Device management • Resource management • Kernel loading and execution • Multi-device support • Interoperability with 3D graphics APIs

  7. CAL • The CAL SDK includes a small set of ‘C’ routines and data types that allow higher level software tools to directly interact with and control hardware memory buffers (device-level streams) and GPU programs (device-level kernels). • The CAL runtime accepts kernels written in AMD IL and generates optimized code for the target architecture.

  8. CAL • A typical CAL program has two parts: • The program, which runs on the host CPU (written in C or C++). • The kernel, which runs on the stream processor (written in CAL IL, for example). • A CAL system comprises one or more stream processors connected to one or more CPUs by a high-speed bus. • The CPU runs CAL and controls the stream processor by sending commands through the CAL API. • The stream processor runs the kernel specified by the application. The stream processor device driver program (CAL) runs on the host CPU.

  9. Streams • Streams provide connectivity between processing stages. • A stream is a reference to an N-dimensional array of identically-typed primitive elements. • It has more restricted access semantics than do conventional arrays. • Restrictions permit optimization of both storage requirements and computation locality, providing higher performance for those algorithms that this model can accommodate.

  10. Streams • The syntax for specifying a stream is similar to other C variable or type declarations. • Angle brackets mark the type or variable as a stream and delineate the stream dimensions. • For example: • float a<>; A one-dimensional stream of unspecified length. • float b<10>; A one-dimensional stream of length 10. • float c<10,10>; A two-dimensional stream of size ten by ten. • float d<,10>; A two-dimensional stream of unspecified length by width 10.

  11. Stream Operators • Streams are accessed through stream operators. • A stream operator looks like a function call. • When a stream is read, the data is copied twice: • first, from host (CPU) memory to PCIe memory, • then from PCIe memory to local (stream processor) memory. • The commands are: • streamRead(destinationStream, sourceArray) • streamWrite(sourceStream, destinationArray) • The stream and the array must match in number of dimensions, size, and element type; otherwise, the behavior is undefined.
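The two-hop copy described in slide 11 can be imitated on the CPU. The sketch below is plain C, not Brook+ and not the Brook+ runtime: `Stream1D`, `stream_read_sim`, `stream_write_sim`, and `STREAM_MAX` are hypothetical names invented for illustration, and the `staging` and `local` buffers merely stand in for PCIe memory and stream processor memory. It also assumes that writing a stream back retraces the same two hops in reverse, which the slides do not state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STREAM_MAX 256  /* hypothetical capacity for this sketch */

/* Hypothetical stand-in for a one-dimensional float stream. */
typedef struct {
    size_t length;               /* number of elements in the stream */
    float  staging[STREAM_MAX];  /* stands in for PCIe memory */
    float  local[STREAM_MAX];    /* stands in for stream processor memory */
} Stream1D;

/* Imitates streamRead(destinationStream, sourceArray):
   host array -> staging buffer -> local buffer. */
void stream_read_sim(Stream1D *dst, const float *src, size_t n)
{
    assert(n == dst->length && n <= STREAM_MAX);          /* sizes must match */
    memcpy(dst->staging, src, n * sizeof(float));         /* first copy */
    memcpy(dst->local, dst->staging, n * sizeof(float));  /* second copy */
}

/* Imitates streamWrite(sourceStream, destinationArray):
   local buffer -> staging buffer -> host array (assumed reverse path). */
void stream_write_sim(Stream1D *src, float *dst, size_t n)
{
    assert(n == src->length && n <= STREAM_MAX);          /* sizes must match */
    memcpy(src->staging, src->local, n * sizeof(float));
    memcpy(dst, src->staging, n * sizeof(float));
}
```

The assertions mirror the slide's matching rule: where Brook+ leaves a size mismatch undefined, this sketch stops immediately.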

  12. Kernels • Kernels are where the work of stream processing takes place. • There are two types of kernels: • Basic • Reduction • A basic kernel looks like the following: kernel void mad(float a<>, float b<>, float c, out float d<>) { d = a * b + c; } • A conventional C counterpart has the signature: void mad_slow(float a[], float b[], float c, float d[])
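To make the data-parallel meaning of the `mad` kernel concrete, here is a minimal CPU sketch in plain C. It reuses the name `mad_slow` from the slide, but the body and the explicit length parameter `n` are assumptions: the slide shows only the signature, and a Brook+ kernel takes its element count from the streams rather than from an argument.

```c
#include <assert.h>
#include <stddef.h>

/* CPU counterpart of the mad kernel: d[i] = a[i] * b[i] + c for every element.
   The length n is an assumption added for this sketch. */
void mad_slow(const float a[], const float b[], float c, float d[], size_t n)
{
    assert(a != NULL && b != NULL && d != NULL);
    for (size_t i = 0; i < n; ++i) {
        d[i] = a[i] * b[i] + c;  /* same expression as the kernel body, once per element */
    }
}
```

Each loop iteration is independent of the others, which is what lets a stream processor run the kernel body on many elements at once.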

  13. Kernels • Reductions are kernels that decrease the dimensionality of a stream by folding along one axis using an associative and commutative binary operation. • Because the operation is associative and commutative, the result is independent of evaluation order. • An example of a reduction kernel is as follows: reduce void sum(float a<>, reduce float b) { b += a; }
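The order-independence requirement in slide 13 can be checked on the CPU. The sketch below is plain C with hypothetical function names: it folds the same array once left to right and once as a pairwise tree, the kind of regrouping a parallel device is free to choose. Floating-point addition is only approximately associative in general, so the comparison is exact here only because small integer-valued floats are used.

```c
#include <assert.h>
#include <stddef.h>

/* Left-to-right fold: b += a for each element in index order. */
float sum_sequential(const float *a, size_t n)
{
    assert(a != NULL);
    float b = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        b += a[i];
    }
    return b;
}

/* Pairwise (tree) fold: sums the two halves independently, then combines them. */
float sum_pairwise(const float *a, size_t n)
{
    assert(a != NULL);
    if (n == 0) return 0.0f;
    if (n == 1) return a[0];
    size_t half = n / 2;
    return sum_pairwise(a, half) + sum_pairwise(a + half, n - half);
}
```

Both functions return the same total for such inputs even though they group the additions differently.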

  14. Kernels • The stream a is folded and reduced to b. • The output is a single value. • In a reduction kernel, a reduction variable is: • specified as part of the kernel declaration, • operated on using any C assignment operator that satisfies the associativity and commutativity requirements. • Another example of a reduction kernel is as follows: reduce void max_reduce(double a<>, reduce double b) { if (a > b) b = a; }
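A CPU sketch of the `max_reduce` kernel, in plain C, shows how the reduction variable collapses a whole stream into one value. The function name `max_reduce_cpu`, the explicit length, and the choice to seed `b` with the first element are assumptions; the slides do not say how the reduction variable is initialized.

```c
#include <assert.h>
#include <stddef.h>

/* Folds the array a into a single maximum, mirroring "if (a > b) b = a;". */
double max_reduce_cpu(const double *a, size_t n)
{
    assert(a != NULL && n > 0);  /* an empty input has no maximum */
    double b = a[0];             /* assumption: seed with the first element */
    for (size_t i = 1; i < n; ++i) {
        if (a[i] > b) {
            b = a[i];
        }
    }
    return b;
}
```

Taking a maximum is associative and commutative, so visiting the elements in any other order would give the same result.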

  15. Kernels • A partial reduction is possible if the target stream has the same number of dimensions as the source stream. • The size of each source dimension must be an integer multiple of the corresponding dimension of the target. • For example, float s<100,100> and float t<100,50> can be partially reduced by sum(s,t), where the size of s is reduced to match t. • Conversely, float s<100,100> and float t<100,200> are handled by sum(s,t) expanding s to match the size of t.
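One way to picture the partial reduction in slide 15 is as a block-wise sum. The plain C sketch below assumes that each target element receives the sum of one contiguous block of source elements, with the block shape given by the ratio of the sizes; that mapping and the name `partial_sum` are assumptions, since the slides state only the size rule.

```c
#include <assert.h>
#include <stddef.h>

/* Sums each (fr x fc) block of src into one element of dst.
   src is src_rows x src_cols and dst is dst_rows x dst_cols, both row-major. */
void partial_sum(const float *src, size_t src_rows, size_t src_cols,
                 float *dst, size_t dst_rows, size_t dst_cols)
{
    assert(src != NULL && dst != NULL);
    assert(dst_rows > 0 && dst_cols > 0);
    /* Each source dimension must be an integer multiple of the target's. */
    assert(src_rows % dst_rows == 0 && src_cols % dst_cols == 0);

    size_t fr = src_rows / dst_rows;  /* block height */
    size_t fc = src_cols / dst_cols;  /* block width */

    for (size_t i = 0; i < dst_rows; ++i) {
        for (size_t j = 0; j < dst_cols; ++j) {
            float acc = 0.0f;
            for (size_t r = 0; r < fr; ++r) {
                for (size_t c = 0; c < fc; ++c) {
                    acc += src[(i * fr + r) * src_cols + (j * fc + c)];
                }
            }
            dst[i * dst_cols + j] = acc;
        }
    }
}
```

With s<100,100> and t<100,50> the block is 1 by 2, so each target element would be the sum of two horizontally adjacent source elements.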

  16. Kernels • Kernels can call other functions defined in the same .br file or in any file it includes. • However, there are restrictions: • A top-level kernel must have a return type of void to be callable from host code. • Subkernels can return data of any non-stream type. • A subkernel can also be bound to streams propagated from its parent kernel. • Subkernels are logically expanded inline, so recursion is not permitted. • Kernels cannot call stream operators.

  17. Kernels • Kernels can use both stream and non-stream parameters as inputs. • Generally, only streams can be used as outputs. • Within a kernel definition, the following restrictions apply: • The goto, volatile, and static keywords are prohibited. • Pointers are not supported. • Recursion is not allowed. • Any pointers passed into Brook+ code are required not to alias each other.

  18. Conclusion • Brook+ is almost as readable and writable as C and C++. • It is an environment that a C or C++ programmer can work in. • Programmers do not have to worry about the CAL layer that Brook+ is built on. • Some abstract concepts about how kernels work make it slightly harder to write a function for the GPU core than for the CPU, but that is the trade-off of this method of computation. • Since Brook+ is still very new, it is not as reliable as C or C++. • Support is not yet readily available, but a small community is growing around using ATI graphics cards for desktop supercomputing.

  19. Conclusion • Brook+ is simply a language that acts as an interface between the programmer and the graphics device.

  20. Spitfire

  21. Spitfire
