Video/Graphics of Modern Desktop Board & its Linux programming Dr A Sahu Dept of Comp Sc & Engg. IIT Guwahati
Outline • Intel 945 motherboard architecture • GMCH • ICH7 (8254, 8259, 8237) • PCI and PCI Express • Video RAM, built-in GPU • DirectX, OpenGL, OpenCL • Advanced GPUs from ATI and AMD • Introduction to Nvidia CUDA programming
Intel 945 Express Chipset (block diagram)
• Intel Pentium D processor, with support for a Media Expansion Card
• 82945 GMCH/MCH (North Bridge): DDR2 memory, Intel GMA 950 integrated graphics, PCI Express x16 graphics
• 82801GR ICH7 (I/O Controller Hub 7, South Bridge): 4 Serial ATA ports, Intel HD Audio, integrated Matrix Storage Technology, 8 high-speed USB ports, 6 PCI Express x1 slots, 6 PCI slots, Intel PRO 100/1000 LAN, Intel Active Management Technology, BIOS support
82945: GMCH/MCH • Graphics and Memory Controller Hub • Graphics Interface (GI): integrated graphics plus PCI Express for graphics-card support • Host Interface (HI): connects to the processor and supports Hyper-Threading (HT), interrupt delivery, a 12-deep in-order queue, etc. • System Memory Interface (SMI): connects to dual-channel DDR2 • Direct Media Interface (DMI): connects to the ICH7
82801: ICH7 • I/O Controller Hub, version 7 (South Bridge) • Enhanced DMA controller, interrupt controller and timer: • Two cascaded 8259 PICs (Programmable Interrupt Controllers) • One 82C54 PIT (Programmable Interval Timer) • One 8237 DMA controller • Low Pin Count (LPC) interface • PCI and PCI Express (Peripheral Component Interconnect) • AC'97 and HD Audio codec interfaces • Serial Peripheral Interface (SPI) support • Firmware (BIOS) support • ACPI, SATA, USB
Introduction • Peripheral: HD monitor • Interface (intermediate hardware): Nvidia GPU card • Interface (intermediate software/program): Nvidia GPU driver • (Diagram: Pentium D processor and 82945 GMCH/MCH North Bridge with DDR2 memory, Intel GMA 950 graphics and a PCI Express x16 graphics slot.)
Migration from character to graphics/video display • Character display (80x25 characters of 5x7 pixels = 400x175 pixels) • CRT monitor (400x600, 640x480, 800x600) • LCD monitor (1024x768, 1280x1024, …) • Graphics are visually more appealing • Display lines, circles, rectangles, curves, polygons • Draw characters using these primitives • TrueType fonts
Multiplexed 1024x768 pixel display • A column counter (0 … 1023) and a row counter (0 … 767), stepped by a clock faster than 1024 x 768 x 50 Hz, scan the frame buffer • Each of the 1024x768 LCD pixels takes 8x3 = 24 bits (R, G, B) • The screen is refreshed 50 times a second
Frame Buffer (24-bit pixels) • Each pixel on the screen corresponds to a 24-bit pixel value stored in the frame buffer • (Figure: graphical representation of 24-bit color.)
Graphics Cards • GPU: a specialized processor that accelerates 2D and 3D graphics-primitive operations • Performs lots of floating-point operations • Accelerated primitives: line, circle, polygon, mesh, projection, sphere, etc.
Graphics System
• CPU side: 3D application → 3D API commands → 3D API (OpenGL, DirectX/Direct3D)
• CPU-GPU boundary: the API sends a GPU command & data stream to the GPU
• GPU: pretransformed vertices → programmable vertex processor → transformed vertices → primitive assembly (driven by the vertex index stream) → assembled polygons, lines & points → rasterization & interpolation → rasterized pretransformed fragments → programmable fragment processor → transformed fragments → raster operations (pixel location stream) → pixel updates → frame buffer
Graphics System (simplified) • Vertices (x, y, z) come from the memory system • Vertex shader: vertex processing • Pixel shader: pixel processing, using texture memory • Frame buffer: pixel R, G, B
Access to video memory • We create a Linux device-driver that gives applications access to the graphics frame-buffer • The frame buffer is accessed through the PCI Express slot • We assume a graphics card is installed in your system
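The role of a device-driver • A device-driver is a software module that controls a hardware device in response to OS kernel requests, which are often relayed from an application • (Diagram: in user space, the user application calls the standard “runtime” libraries, which issue a syscall into the operating system kernel; in kernel space, the kernel calls the device-driver module, which accesses the hardware device through i/o ports (in/out) and device memory; each call returns back up the chain, ending with sysret to user space.)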
Raster Display Technology The graphics screen is a two-dimensional array of picture elements (‘pixels’) These pixels are redrawn sequentially, left-to-right, by rows from top to bottom Each pixel’s color is an individually programmable mix of red, green, and blue
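Special “dual-ported” memory • The VRAM can be accessed both by the CPU and by the circuitry that refreshes the CRT, while ordinary RAM serves only the CPU • (Diagram example: 16 MB of VRAM alongside 2048 MB of RAM.)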
How much VRAM is needed? • This depends on • the total number of pixels • the number of bits-per-pixel • The total number of pixels • Determined by the screen’s width and height • 1280-by-960= 1,228,800 pixels • The number of bits-per-pixel (“color depth”) is a programmable parameter (varies from 1 to 32) • Certain types of applications also need to use extra VRAM • for multiple displays, or for “special effects” like computer game animations
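How ‘truecolor’ works • Each pixel is a 32-bit longword: alpha in bits 31-24, red in bits 23-16, green in bits 15-8, blue in bits 7-0 • Alpha holds the pixel’s opacity; with pre-multiplied alpha, the red, green and blue values have already been scaled by it • The intensity of each color-component within a pixel is an 8-bit value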
x86 uses “little-endian” order • “Truecolor” graphics-modes use 4 bytes per picture-element • In VRAM, byte offsets 0, 1, 2, 3 hold the B, G, R, A components of the first pixel on the video screen; offsets 4-7 hold the next pixel, and so on
Some operating system issues • Linux is a “protected-mode” operating system • I/O devices normally are not directly accessible • Linux on x86 platforms uses “virtual memory” • Privileged software must “map” the VRAM • A device-driver module is needed: ‘vram.c’ • We can compile it using: $ mmake vram • Device-node: # mknod /dev/vram c 98 0 • Make it ‘writable’: # chmod a+w /dev/vram
Our ‘vram.c’ module • It’s a character-mode Linux device-driver • It implements four device-file ‘methods’: • ‘read()’: lets a program read from video memory • ‘write()’: lets a program write to video memory • ‘llseek()’: lets a program ‘move’ the file’s pointer • ‘mmap()’: lets a program ‘map’ vram to user-space • It also implements a pseudo-file that lets users view the RADEON X300 graphics controller’s PCI Configuration Space parameter-values: $ cat /proc/vram
What is PCI? • It’s an acronym for “Peripheral Component Interconnect” and refers to a collection of industry standards for devices used in PCs • An Intel-sponsored initiative (from 1992-9) having several ambitious goals: • Reduce diversity inherent in legacy PC devices • Improve speed and efficiency of data-transfers • Eliminate (or reduce) platform dependencies • Simplify adding/removing peripheral adapters • Lower PC’s total consumption of electrical power
PCI Configuration Space • A parameter-storage area (a set of registers) for each PCI device-function • 64 doublewords (256 bytes) in total: • PCI Configuration Space Header (16 doublewords, fixed format) • PCI Configuration Space Body (48 doublewords, variable format)
Example: Header Type 0 (16 doublewords; each entry lists its fields from bit 31 down to bit 0)
• Dword 0: Device ID | Vendor ID
• Dword 1: Status Register | Command Register
• Dword 2: Class Code (Class/SubClass/ProgIF) | Revision ID
• Dword 3: BIST | Header Type | Latency Timer | Cache Line Size
• Dwords 4-9: Base Address 0 through Base Address 5
• Dword 10: CardBus CIS Pointer
• Dword 11: Subsystem Device ID | Subsystem Vendor ID
• Dword 12: Expansion ROM Base Address
• Dword 13: reserved | Capabilities Pointer
• Dword 14: reserved
• Dword 15: Maximum Latency | Minimum Grant | Interrupt Pin | Interrupt Line
Examples of VENDOR-IDs • 0x8086 – Intel Corporation • 0x1022 – Advanced Micro Devices, Inc • 0x1002 – ATI Technologies, Inc (my office machine) • 0x10EC – RealTek, Incorporated • 0x10DE – Nvidia Corporation • 0x10B7 – 3Com Corporation • 0x101C – Western Digital, Inc • 0x1014 – IBM Corporation • 0x0E11 – Compaq Corporation • 0x1057 – Motorola Corporation • 0x106B – Apple Computers, Inc • 0x5333 – S3, Incorporated
Examples of DEVICE-IDs • 0x5347: ATI RAGE128 SG • 0x4C58: ATI RADEON LX • 0x5950: ATI RS480 • 0x436E: ATI IXP300 SATA • 0x438C: ATI IXP600 IDE • 0x5B60: ATI RadeonHD 3200 Graphics See this Linux header-file for lots more examples: </usr/src/linux/include/linux/pci_ids.h>
Defined PCI Class Codes • 0x00: Legacy Device (i.e., built before class-codes were defined) • 0x01: Mass Storage controller • 0x02: Network controller • 0x03: Display controller • 0x04: Multimedia device • 0x05: Memory Controller • 0x06: Bridge device • 0x07: Simple Communications controller • 0x08: Base System peripherals • 0x09: Input device • 0x0A: Docking stations • 0x0B: Processors • 0x0C: Serial Bus controllers • 0x0D: Wireless controllers • 0x0E: Intelligent I/O controllers • 0x0F: Encryption/Decryption controllers • 0x10: Satellite Communications controllers • 0x11: Data Acquisition and Signal Processing controllers
Example of Sub-Class Codes • Class Code 0x01: Mass Storage controller • 0x00: SCSI controller • 0x01: IDE controller • 0x02: Floppy Disk controller • 0x03: IPI controller • 0x04: RAID controller • 0x80: Other Mass Storage controller
Example of Sub-Class Codes • Class Code 0x02: Network controller • 0x00: Ethernet controller • 0x01: Token Ring controller • 0x02: FDDI controller • 0x03: ATM controller • 0x04: ISDN controller • 0x80: Other Network controller
Example of Sub-Class codes • Class Code 0x03: Display Controller • 0x00: VGA-compatible controller • 0x01: XGA controller • 0x02: 3D controller • 0x80: Other display controller
Hardware details may differ • Graphics controllers use vendor-specific mechanisms to perform similar operations • There’s a common core of compatibility with IBM’s VGA (Video Graphics Array), introduced in 1987 • But since IBM lost its market dominance, each manufacturer has added enhancements that employ incompatible programming interfaces • You need the vendor’s manual! (Download it from the vendor’s site)
The ‘frame-buffer’ • Today’s PCI graphics systems all provide a dedicated amount of display memory to control the screen-image’s pixel-coloring • But how much memory will vary with price • And its location within the CPU’s physical address-space can’t be predicted because it depends upon what other PCI devices are installed (and mapped) during startup
The ‘base address’ fields • The PCI Configuration Header has several so-called Base Address fields, and vendors use one of these to hold the frame-buffer’s starting address and to indicate how much vram the video controller can actually use • The Linux kernel provides driver-writers with some convenient functions for getting the location and size of the frame-buffer
ATI Radeon uses Base Address 0 • Our ‘vram.c’ module’s initialization routine employs these kernel helper-functions:
#include <linux/pci.h>

struct pci_dev *devp;   // will point to the kernel's data-structure for the device

// get a pointer to the PCI device's Linux data-structure
devp = pci_get_device( VENDOR_ID, DEVICE_ID, NULL );
if ( !devp ) return -ENODEV;   // device is not present

// get starting address and length for memory-resource 0
vram_base = pci_resource_start( devp, 0 );
vram_size = pci_resource_len( devp, 0 );

// pci_get_device() takes a reference on the device; release it with pci_dev_put( devp ) on module exit
Reading from ‘vram’ • You can use our ‘fileview’ utility to see the current contents of the video frame-buffer: $ fileview /dev/vram • Our ‘vram.c’ driver’s ‘read()’ method is invoked when an application-program attempts to ‘read’ from the ‘/dev/vram’ device-file • The driver implements the read-method using ‘ioremap()’ (and ‘iounmap()’) to temporarily map one 4-KB page of physical vram into the kernel’s virtual address-space
I/O ‘memcpy()’ functions • Linux provides a ‘platform-independent’ way to copy between an i/o-device’s memory and a kernel buffer (the driver then uses copy_to_user() or copy_from_user() to exchange that buffer with the application): • A ‘read’ copies from vram to the buffer: memcpy_fromio( buf, vaddr, len ); • A ‘write’ copies to vram from the buffer: memcpy_toio( vaddr, buf, len );
‘mmap()’ • This is a standard UNIX system-call that lets an application ‘map’ a file into its virtual address-space, where it can then treat the file as if it were an ordinary array • See the man-page: $ man mmap • This same system-call can also work on a device-file if that device’s driver provided ‘mmap()’ among its file-operations
The user-role • In the application-program, six arguments get passed to the ‘mmap()’ library-function:
void *mmap( void *baseaddress, size_t memorysize, int accessattributes, int flags, int filehandle, off_t offset );
• It returns the virtual address where the region was mapped (or MAP_FAILED on error)
The driver-role • In the kernel, those six arguments will get validated and processed, then the driver’s ‘mmap()’ callback-function will be invoked to supply missing information and perform further sanity-checks and do appropriate page-mapping actions: int mmap( struct file *file, struct vm_area_struct *vma );
Our driver’s code
int mmap( struct file *file, struct vm_area_struct *vma )
{
    // extract the parameters we will need from the ‘vm_area_struct’
    unsigned long region_length = vma->vm_end - vma->vm_start;
    unsigned long region_origin = vma->vm_pgoff * PAGE_SIZE;
    unsigned long physical_addr = vram_base + region_origin;
    unsigned long user_virtaddr = vma->vm_start;

    // sanity check: mapped region cannot extend past end of vram
    if ( region_origin + region_length > vram_size ) return -EINVAL;

    // tell the kernel not to try ‘swapping out’ this region to the disk
    // (VM_RESERVED exists only before Linux 3.7; later kernels use VM_DONTEXPAND | VM_DONTDUMP)
    vma->vm_flags |= VM_RESERVED;

    // mark the region as memory-mapped i/o, which also excludes it from core dumps
    vma->vm_flags |= VM_IO;
Driver’s code continued
    // invoke a helper-function that will set up the page-table entries
    if ( remap_pfn_range( vma, user_virtaddr, physical_addr >> PAGE_SHIFT,
                          region_length, vma->vm_page_prot ) ) return -EAGAIN;

    return 0;   // SUCCESS
}
Demo: ‘rotation.cpp’ • This application-program demonstrates the use of our ‘vram.c’ device-driver’s ‘read()’, ‘write()’ and ‘llseek()’ methods (i.e., device-file operations) • It rotates the color-components in every displayed ‘truecolor’ pixel: R → G, G → B, B → R • After three rotations the screen looks normal again