
CS 414 Multimedia Systems Design Lecture 5 Digital Video Representation


Presentation Transcript


    1. CS 414 – Multimedia Systems Design. Lecture 5 – Digital Video Representation. Klara Nahrstedt, Spring 2008.

    2. Administrative. Group directories will be established by Friday, 1/25. MP1 will be out on 1/25.

    3. Color and Visual System. Color refers to how we perceive a narrow band of electromagnetic energy; perception depends on the source, the object, and the observer. The visual system transforms light energy into the sensory experience of sight.

    4. Human Visual System. The eyes, optic nerve, and parts of the brain; together they transform electromagnetic energy into sight.

    5. Human Visual System. Image formation: cornea, sclera, pupil, iris, lens, retina, fovea. Transduction: retina, rods, and cones. Processing: optic nerve, brain.

    6. Retina and Fovea. The retina has photosensitive receptors at the back of the eye. The fovea is a small, dense region of receptors containing only cones (no rods); it gives visual acuity. Outside the fovea there are fewer receptors overall, with a larger proportion of rods.

    7. Transduction (Retina). Transforms light into neural impulses: receptors signal bipolar cells, bipolar cells signal ganglion cells, and the axons of the ganglion cells form the optic nerve.

    8. Rods vs. Cones. Rods: contain photo-pigment; respond to low energy; enhance sensitivity; concentrated in the retina but outside the fovea; one type, sensitive to grayscale changes. Cones: contain photo-pigment; respond to high energy; enhance perception; concentrated in the fovea, sparse elsewhere in the retina; three types, sensitive to different wavelengths.

    9. Tri-stimulus Theory. There are 3 types of cones (6 to 7 million of them): red (64%), green (32%), blue (2%). Each type is most responsive to a narrow band of wavelengths; red and green cones absorb the most energy, blue the least. Light stimulates each set of cones differently, and the ratios produce the sensation of color.

    10. Visual System Facts. We can distinguish hundreds of thousands of colors, and we are more sensitive to brightness than to color. We can distinguish about 128 fully saturated hues, but are less sensitive to hue changes in less saturated colors. We can distinguish about 23 levels of saturation for a fixed hue and lightness. The eye is 10 times less sensitive to blue than to red or green, since it absorbs less energy in the blue range.

    11. Color Perception. Hue: distinguishes named colors (e.g., red, green, blue); the dominant wavelength of the light. Saturation: how far a color is from a gray of equal intensity. Brightness (lightness): perceived intensity.

    12. Visual Perception: Resolution and Brightness. Spatial resolution depends on image size, viewing distance, and brightness. Perception of brightness is stronger than perception of color, and the primary colors are perceived with different relative brightness: green : red : blue = 59% : 30% : 11%. B/W vs. color.

    13. Visual Perception: Temporal Resolution. Effects are caused by the inertia of the human eye: a sequence of about 16 frames/second is perceived as continuous. Special effect: flicker.

    14. Temporal Resolution: Flicker. Flicker is perceived if the frame rate or refresh rate of the screen is too low (< 50 Hz), especially in large bright areas. A higher refresh rate requires a higher scanning frequency and higher bandwidth, as the sketch below illustrates.
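
As a back-of-the-envelope illustration (my own Python sketch with illustrative NTSC numbers, not from the slides), the line-scanning frequency grows directly with the refresh rate:

# Sketch: how refresh rate drives line-scanning frequency (illustrative).
lines_per_frame = 525          # NTSC total scan lines per frame
frame_rate = 29.97             # NTSC frames per second

line_frequency = lines_per_frame * frame_rate   # lines traced per second
print(f"line frequency: {line_frequency:,.0f} Hz")        # ~15,734 Hz

# Doubling the refresh rate doubles the scanning frequency; each line
# must then be traced in half the time, which widens the required bandwidth.
print(f"at double the refresh rate: {2 * line_frequency:,.0f} Hz")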

    15. Influences on Visual Perception: viewing distance; display ratio (width/height, 4/3 for conventional TV); number of details still visible; intensity (luminance).

    16. Television History. In 1927, Hoover made a speech in Washington while viewers in NY could see and hear him. AT&T Bell Labs had the first "television": 18 fps, a 2 x 3 inch screen, 2500 pixels.

    17. Television Concepts. Production (capture): a 2D array of light energy is converted to electrical signals; the signals must adhere to known, structured formats. Representation and transmission: popular formats include NTSC, PAL, SECAM. Re-construction: CRT technology and raster scanning; display issues (refresh rates, temporal resolution); relies on principles of the human visual system.

    18. Video Representations. Composite: NTSC – 6 MHz (4.2 MHz video), 29.97 fps; PAL – 6-8 MHz (4.2-6 MHz video), 25 fps. Component: maintains separate signals for color. Color spaces: RGB, YUV, YCbCr, YIQ.

    19. Color Coding: YUV. The PAL video standard, based on the CIE model. Y is luminance; U and V are chrominance. YUV from RGB: Y = 0.299R + 0.587G + 0.114B; U = 0.492 (B − Y); V = 0.877 (R − Y).
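
A minimal Python sketch of the slide's formulas, assuming RGB components normalized to the range 0..1:

def rgb_to_yuv(r, g, b):
    """YUV from normalized RGB, using the formulas on the slide."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    u = 0.492 * (b - y)                     # blue-difference chrominance
    v = 0.877 * (r - y)                     # red-difference chrominance
    return y, u, v

# Pure white carries full luminance and zero chrominance:
print(rgb_to_yuv(1.0, 1.0, 1.0))   # (1.0, 0.0, 0.0)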

    20. YCbCr. A scaled and shifted version of YUV that maps the chrominance values into the range 0..1: Y = 0.299R + 0.587G + 0.114B; Cb = ((B − Y)/2) + 0.5; Cr = ((R − Y)/1.6) + 0.5.
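
The same sketch extended to the lecture's YCbCr variant. Note that broadcast standards such as ITU-R BT.601 use different scale factors and integer offsets; the code below follows the slide's formulas only:

def rgb_to_ycbcr(r, g, b):
    """The lecture's YCbCr variant: chrominance scaled/shifted into 0..1."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 2.0 + 0.5    # blue-difference chroma, centered at 0.5
    cr = (r - y) / 1.6 + 0.5    # red-difference chroma, centered at 0.5
    return y, cb, cr

print(rgb_to_ycbcr(1.0, 0.0, 0.0))   # pure red: Cr well above 0.5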

    21. YCC Example

    22. YIQ. The NTSC standard; similar to YUV, but with the chrominance plane rotated 33 degrees. YIQ from RGB: Y = 0.299R + 0.587G + 0.114B; I = 0.74 (R − Y) − 0.27 (B − Y); Q = 0.48 (R − Y) + 0.41 (B − Y).
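
And the YIQ form, again as a sketch assuming normalized RGB inputs:

def rgb_to_yiq(r, g, b):
    """YIQ from normalized RGB via the luminance-difference form above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.74 * (r - y) - 0.27 * (b - y)   # "in-phase" chrominance axis
    q = 0.48 * (r - y) + 0.41 * (b - y)   # "quadrature" chrominance axis
    return y, i, q

print(rgb_to_yiq(0.0, 1.0, 0.0))   # pure green: negative I and Q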

    23. YIQ 4:2:2

    24. YIQ 4:1:1
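
The two slides above illustrate 4:2:2 and 4:1:1 chroma subsampling. A small Python sketch makes the ratios concrete; averaging each pixel group is my simplifying assumption here, and real systems may filter differently:

def subsample_row(chroma_row, factor):
    """Keep one chroma sample per `factor` pixels by averaging each group.
    factor=2 models 4:2:2 (half the chroma samples per line);
    factor=4 models 4:1:1 (a quarter of the chroma samples per line)."""
    return [sum(chroma_row[i:i + factor]) / len(chroma_row[i:i + factor])
            for i in range(0, len(chroma_row), factor)]

row = [0.8, 0.6, 0.4, 0.2, 0.9, 0.7, 0.5, 0.3]   # one scan line of I (or Q)
print(subsample_row(row, 2))   # 4:2:2 -> 4 chroma samples for 8 pixels
print(subsample_row(row, 4))   # 4:1:1 -> 2 chroma samples for 8 pixels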

    25. NTSC Video. 525 scan lines per frame; 29.97 fps. Each frame lasts 33.37 msec (1 second / 29.97 frames), and each scan line lasts 63.6 usec (33.37 msec / 525). The aspect ratio of 4/3 gives about 700 horizontal pixels. 20 lines are reserved for control information at the beginning of each field, so only 485 lines carry visible data.
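
The slide's timing arithmetic, spelled out step by step in Python:

# NTSC timing arithmetic from the slide.
frame_rate = 29.97                          # frames per second
frame_time_ms = 1000.0 / frame_rate         # -> 33.37 ms per frame
line_time_us = frame_time_ms * 1000 / 525   # -> 63.6 us per scan line
visible_lines = 525 - 2 * 20                # 20 control lines in each of 2 fields

print(f"{frame_time_ms:.2f} ms/frame, {line_time_us:.1f} us/line, "
      f"{visible_lines} visible lines")     # 33.37 ms, 63.6 us, 485 lines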

    26. NTSC Video. Interlaced scan lines divide each frame into 2 fields of 262.5 lines each; phosphors in early TVs did not maintain luminance long enough (which caused flicker). Interlaced scanning can also cause visual artifacts in high-motion scenes.

    27. HDTV. A Digital Television Broadcast (DTB) system with twice as many lines and columns as traditional TV. Resolution: 1920x1080 (1080p) – standard HDTV. Frame rate options: 50 or 60 frames per second.

    28. Pixel Aspect Ratio. Pixel aspect ratio is used in the context of computer graphics to describe the layout of pixels in a digitized image. Most digital imaging systems use a square grid of pixels; that is, they sample an image at the same resolution horizontally and vertically. But some devices do not (most notably some common standard-definition formats in digital television and DVD-Video), so a digital image scanned at a vertical resolution twice that of its horizontal resolution (i.e., the pixels are twice as close together vertically as horizontally) might be described as being sampled at a 2:1 pixel aspect ratio, regardless of the size or shape of the image as a whole. Increasing the pixel aspect ratio of an image makes its use of pixels less efficient, and the resulting image will have lower perceived detail than an image with an equal number of pixels arranged at equal horizontal and vertical resolution. Beyond about a 2:1 pixel aspect ratio, further increases in the already-sharper direction have no visible effect, no matter how many more pixels are added. Hence an NTSC picture (480i) with 1000 lines of horizontal resolution is possible, but would look no sharper than a DVD. The exception is situations where pixels are used for a purpose other than resolution - for example, a printer that uses dithering to simulate gray shades from black-or-white pixels, or analog videotape that loses high frequencies when dubbed.
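
To make the relationship concrete, a short sketch; the 720x480 frame and the 10:11 pixel aspect ratio are illustrative DVD-style values of my choosing, not taken from the slide:

from fractions import Fraction

def display_aspect(width_px, height_px, pixel_aspect):
    """Display aspect ratio = storage aspect ratio x pixel aspect ratio."""
    return Fraction(width_px, height_px) * pixel_aspect

# A 720x480 frame of non-square 10:11 pixels displays at ~1.36,
# close to conventional TV's 4:3 (~1.33).
print(display_aspect(720, 480, Fraction(10, 11)))   # 15/11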

    29. [figure slide, no text]

    30. HDTV. Interlaced and/or progressive formats: conventional TVs use interlaced formats; computer displays (LCDs) use progressive scanning. Streams are MPEG-2 compressed; in Europe (Germany), MPEG-4 compressed streams are used.

    31. Aspect Ratio and Refresh Rate. Aspect ratio: conventional TV is 4:3 (1.33); HDTV is 16:9 (1.78); cinema uses 1.85:1 or 2.35:1. Frame rate: NTSC is 60 Hz interlaced (actually 59.94 Hz); PAL/SECAM is 50 Hz interlaced; cinema is 24 Hz non-interlaced.

    32. SMPTE Time Codes. The Society of Motion Picture and Television Engineers defines time codes for video: HH:MM:SS:FF. For NTSC, SMPTE uses a 30 fps drop-frame code: frames are incremented as if the rate were 30 fps, when it is really 29.97, and rules are defined to remove the difference error. Let's think about the error: we lose 0.03 frames every second (30 − 29.97), which is 108 frames every hour (0.03 × 3600 sec/hour).

SMPTE timecode is a set of cooperating standards to label individual frames of video or film with a timecode, defined by the Society of Motion Picture and Television Engineers in the SMPTE 12M specification. Timecodes are added to film, video, or audio material, and have also been adapted to synchronize music. They provide a time reference for editing, synchronization, and identification; timecode is a form of media metadata. The invention of timecode made modern videotape editing possible, and led eventually to the creation of non-linear editing systems. SMPTE (pronounced "sim-tee") timecode contains binary-coded decimal hour:minute:second:frame identification and 32 bits for use by users. There are also drop-frame and colour-framing flags, and three extra 'binary group flag' bits used for defining the use of the user bits. The formats of other SMPTE timecodes are derived from that of the longitudinal timecode.

Timecode can have any of a number of frame rates; common ones are 24 frame/s (film), 25 frame/s (PAL colour television), 29.97 frame/s (30 × 1000/1001, NTSC color television), and 30 frame/s (American black-and-white television, virtually obsolete). In general, SMPTE timecode frame-rate information is implicit, known from the rate of arrival of the timecode from the medium, or from other metadata encoded in the medium. The interpretation of several bits, including the "colour framing" and "drop frame" bits, depends on the underlying data rate; in particular, the drop-frame bit is only valid for a nominal frame rate of 30 frame/s. More complex timecodes, such as vertical interval timecode, can also include extra information in a variety of encodings. SMPTE time code is a digital signal whose ones and zeroes assign a number to every frame of video, representing hours, minutes, seconds, frames, and some additional user-specified information such as tape number. For instance, the time code number 01:12:59:16 represents a picture 1 hour, 12 minutes, 59 seconds, and 16 frames into the tape.

    33. Rules to Compensate. Every minute, "drop" two frame numbers; after 60 minutes this drops 120 frames, but that is 12 too many. To compensate, every ten minutes (0, 10, 20, ..., 50) the two frames are not dropped, saving 12 frames every 60 minutes. The result is a code that is easier to work with and amenable to computation (see the sketch below).

Drop-frame timecode dates to a compromise invented when color NTSC video was introduced. The NTSC re-designers wanted to retain compatibility with existing monochrome TVs. However, the 3.58 MHz (actually 315/88 MHz = 3.57954545 MHz) color subcarrier would absorb common-phase noise from the harmonics of the line-scan frequency. Rather than adjusting the audio or chroma subcarriers, they adjusted everything else, including the frame rate, which was set to 30 × 1000/1001 Hz. This meant that an "hour of timecode" at a nominal frame rate of 30 frame/s was longer than an hour of wall-clock time by 3.59 seconds, leading to an error of almost a minute and a half over a day, which caused people to make unnecessary mistakes in the studio. To correct this, drop-frame SMPTE timecode - needing to drop 1 frame number every thousand frames - drops frame numbers 0 and 1 of the first second of every minute, and includes them when the number of minutes is divisible by ten. This achieves an "easy-to-track" drop rate of 18 frame numbers each ten minutes (18,000 frames @ 30 fps) and almost perfectly compensates for the difference in rate, leaving a residual timing error of roughly 86.4 milliseconds per day, an error of only 1.0 ppm. Note that only timecode frame numbers are dropped; video frames continue in sequence. In other words, drop-frame TC drops two frame numbers every minute, except every tenth minute, achieving 29.97 fps. Drop-frame timecode is used only in systems running at a frame rate of 30 × 1000/1001 Hz.
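
Here is a sketch of the dropping rule in code; this is my own implementation of the rule described above, not anything from the lecture:

def frames_to_dropframe(frame):
    """Map a frame count at 29.97 fps to HH:MM:SS;FF drop-frame timecode.
    Frame numbers 0 and 1 are skipped each minute, except every 10th minute."""
    fps = 30                                   # nominal counting rate
    per_10min = 10 * 60 * fps - 9 * 2          # 17982 real frames per 10 min
    per_min = 60 * fps - 2                     # 1798 real frames per minute

    blocks, rem = divmod(frame, per_10min)
    extra = 0 if rem < 60 * fps else 1 + (rem - 60 * fps) // per_min
    frame += 2 * (9 * blocks + extra)          # re-insert the skipped numbers

    ff = frame % fps
    ss = (frame // fps) % 60
    mm = (frame // (fps * 60)) % 60
    hh = frame // (fps * 3600)
    return f"{hh:02d}:{mm:02d}:{ss:02d};{ff:02d}"

print(frames_to_dropframe(1800))   # -> 00:01:00;02 (numbers ;00/;01 dropped)
print(frames_to_dropframe(17982))  # -> 00:10:00;00 (10th minute: no drop)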

    34. Take-Home Exercise. Given a SMPTE time stamp, convert it back to the original frame number, e.g., 00:01:00:10.
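
One possible way to check an answer, again as an unofficial sketch rather than the lecture's solution:

def dropframe_to_frames(hh, mm, ss, ff):
    """Drop-frame timecode back to the original frame number."""
    fps = 30
    minutes = 60 * hh + mm
    # Two frame numbers are skipped per elapsed minute, except each 10th minute.
    dropped = 2 * (minutes - minutes // 10)
    return (3600 * hh + 60 * mm + ss) * fps + ff - dropped

print(dropframe_to_frames(0, 1, 0, 10))   # the slide's example -> 1808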

    35. Summary. Digitization of video signals: composite coding and component coding. Digital Television (DTV) and DVB (Digital Video Broadcast): satellite connections and CATV networks are best suited for DTV; DVB-S for satellites (also DVB-S2); DVB-C for CATV.
