Video Surveillance: Legally Blind?

This article explores the image quality needed for face identification in video surveillance and the challenges posed by human face recognition. It also discusses the impact of recording to video tape and using image compression on image quality.

Presentation Transcript


  1. Video Surveillance: Legally Blind? Peter Kovesi, Centre for Exploration Targeting, The University of Western Australia

  2. Questions
     • What image quality do we need for identification?
     • How do you measure image quality?
     • What is the image quality from a video camera?
     • What is the effect on image quality when you:
       • record to video tape?
       • use image compression?

  3. Humans are very bad at recognizing unfamiliar faces
     • Kemp, Towell and Pike (1997) tested the value of having photos on credit cards. When a user presented a card bearing a photograph of someone else who had some resemblance to the user, they were challenged less than 40% of the time.
     • Bruce et al. (1999, 2001) tested the ability of people to match good-quality CCTV images of unfamiliar faces under a variety of scenarios. Correct recognition rates are typically only 70-80%.

  4. Bruce et al. (1999). Is this person in the array? If they are present, match the person. [Figure: good-quality photograph of target; array of 10 good-quality CCTV images]

  5. Bruce et al. (1999). Is this person in the array? If they are present, match the person. [Figure: good-quality photograph of target; array of 10 good-quality CCTV images]

  6. Bruce et al. (1999). Is this person in the array? If they are present, match the person. [Figure: good-quality photograph of target; array of 10 good-quality CCTV images]
     • When the target was present in the array, 12% picked the wrong person and 18% said the target was not present (overall only 70% correct).
     • When the target was not present in the array, 70% still matched the target to someone in the array.

  7. • Face recognition performance by humans is poor.
     • Face recognition performance by machine is becoming quite good, but only if the images are of good quality.
     • Surveillance video rarely provides good quality images.

  8. • Face recognition performance by humans is poor.
     • Face recognition performance by machine is becoming quite good, but only if the images are of good quality.
     • Surveillance video rarely provides good quality images.
     What image quality is needed for face identification?

  9. Image quality is defined by many attributes
     • Minimum feature size that can be resolved
     • Noise level
     • Quality of luminance reproduction
     • Quality of colour reproduction

  10. Human Face Recognition. In humans it has been found that face recognition is tuned to a set of spatial frequencies ranging from about 20 cycles per face width down to about 5 cycles per face width. Maximum sensitivity is centred around 8 to 13 cycles/face width. To recognize with confidence you need to be able to resolve down to 20 cycles/face width. (Hayes, Morrone and Burr 1986) (Costen, Parker and Craw 1996) (Nasanen 1999) [Figure: example images at 20 cycles, 10 cycles and 5 cycles]

  11. Human Face Recognition. In humans it has been found that face recognition is tuned to a set of spatial frequencies ranging from about 20 cycles per face width down to about 5 cycles per face width. Maximum sensitivity is centred around 8 to 13 cycles/face width. To recognize with confidence you need to be able to resolve down to 20 cycles/face width. (Hayes, Morrone and Burr 1986) (Costen, Parker and Craw 1996) (Nasanen 1999) [Figure: example images at 20 cycles, 10 cycles and 5 cycles, annotated: face width ~160mm; 20 cycles corresponds to 8mm; 10 cycles corresponds to 16mm]
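
The 8mm and 16mm annotations on slide 11 follow from dividing the face width by the number of cycles. A minimal sketch of that arithmetic, assuming the slide's approximate 160mm face width (the function name is illustrative, not from the presentation):

```python
FACE_WIDTH_MM = 160.0  # approximate face width quoted on slide 11


def cycle_width_mm(cycles_per_face_width, face_width_mm=FACE_WIDTH_MM):
    """Width in mm of one full cycle (one light band plus one dark band)."""
    return face_width_mm / cycles_per_face_width


for cycles in (20, 10, 5):
    width = cycle_width_mm(cycles)
    print(f"{cycles} cycles/face width -> {width:.0f}mm per cycle")
```

Under this assumption, resolving 20 cycles/face width means resolving detail that repeats every 8mm on the face.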

  12. 1951 USAF Chart: groupings of 6 pairs of bars. Each successive set is half the size of the previous.

  13. 1951 USAF Chart: groupings of 6 pairs of bars. Each successive set is half the size of the previous. [Figure annotations: 8mm, 16mm]

  14. Eye charts also provide a simple way of measuring the minimum feature size that can be resolved.

  15. 20/20 vision, or in metric, 6/6 vision.
      Snellen fraction = (distance at which you can read the line on the chart) / (distance at which you should be able to read the line)
      Minimum Angle of Resolution
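
The link between the Snellen fraction and the Minimum Angle of Resolution (MAR) can be sketched as follows. This assumes the standard optometric convention, which the slide does not spell out, that 6/6 vision corresponds to a MAR of 1 arcminute and that logMAR is the base-10 logarithm of the MAR. The function names are illustrative.

```python
import math


def snellen_to_mar(test_distance, line_distance):
    """MAR in arcminutes for the Snellen fraction test_distance/line_distance.

    Assumes the usual convention that 6/6 (or 20/20) equals 1 arcminute.
    """
    return line_distance / test_distance


def snellen_to_logmar(test_distance, line_distance):
    """logMAR score: log10 of the MAR in arcminutes."""
    return math.log10(snellen_to_mar(test_distance, line_distance))


for numerator, denominator in ((6, 6), (6, 12), (6, 24), (6, 60)):
    mar = snellen_to_mar(numerator, denominator)
    logmar = snellen_to_logmar(numerator, denominator)
    print(f"{numerator}/{denominator}: MAR = {mar:g} arcmin, logMAR = {logmar:.2f}")
```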

  16. The logMAR chart (Ian Bailey and Jan Lovie)

  17. Snellen fraction / reference    Letter height
      6/60 (legally blind)            88mm
      Number plate letters            80mm
      6/48                            72mm
      Average eye spacing             65mm
                                      58mm
                                      44mm
      6/24                            36mm
      6/12                            18mm
      6/6                             9mm
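
The letter heights on slide 17 can be roughly reproduced from geometry. The sketch below assumes the standard chart design, in which a 6/6 letter subtends 5 arcminutes at the viewing distance, and a 6m viewing distance. Neither assumption is stated on the slide, and the slide's figures appear to be rounded, so the results agree only approximately.

```python
import math

ARCMIN_PER_66_LETTER = 5.0  # assumed: a 6/6 letter subtends 5 arcminutes


def letter_height_mm(line_distance_m, viewing_distance_m=6.0):
    """Height of a letter on the 6/line_distance_m line, viewed from 6m.

    The letter subtends 5 arcminutes scaled by the MAR (line_distance_m / 6).
    """
    mar = line_distance_m / 6.0
    angle_rad = math.radians(ARCMIN_PER_66_LETTER * mar / 60.0)
    return 2.0 * viewing_distance_m * 1000.0 * math.tan(angle_rad / 2.0)


for denominator in (6, 12, 24, 48, 60):
    print(f"6/{denominator}: {letter_height_mm(denominator):.1f}mm")
```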

  18. Tests conducted with a Pulnix TM6CN 1/2” CCD camera positioned 6m from the target.
      C-mount lenses: 4mm, 6mm, 8.5mm, 12.5mm, 16mm.
      Images were digitized directly from the camera using a Data Translation 3155 frame grabber.
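
As a rough guide to what each lens delivers, the number of pixels spanning a face can be estimated with a simple pinhole-camera model. This is a sketch under stated assumptions: a 6.4mm sensor width (the nominal width of a 1/2” format sensor), 768 digitized pixels per line (hypothetical, the slides do not give the frame grabber resolution), and the ~160mm face width from slide 11.

```python
SENSOR_WIDTH_MM = 6.4      # assumed nominal width of a 1/2" format sensor
PIXELS_PER_LINE = 768      # assumed digitized line width, not from the slides
FACE_WIDTH_MM = 160.0      # approximate face width from slide 11
TARGET_DISTANCE_MM = 6000.0  # camera positioned 6m from the target


def pixels_across_face(focal_length_mm):
    """Estimate horizontal pixels across a face using a pinhole model."""
    # Width of the scene covered by the sensor at the target distance.
    field_width_mm = SENSOR_WIDTH_MM * TARGET_DISTANCE_MM / focal_length_mm
    return FACE_WIDTH_MM / field_width_mm * PIXELS_PER_LINE


for focal_length in (4.0, 6.0, 8.5, 12.5, 16.0):
    pixels = pixels_across_face(focal_length)
    print(f"{focal_length}mm lens: about {pixels:.0f} pixels across the face")
```

With these assumed values the 12.5mm lens gives about 40 pixels across the face, which is consistent with the figure quoted on slides 28 and 32.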

  19. 4mm lens

  20. 6mm lens

  21. 8.5mm lens

  22. 12.5mm lens

  23. 16mm lens

  24. Expect to lose quality when images are recorded to video. [Figure: camera image digitized directly, compared with camera image recorded to video, then played back and digitized (look at the USAF chart). Cropped images taken with 12.5mm lens.]

  25. Compression is problematic. Test targets survive compression well, but faces do not. JPEG images were compressed using Photoshop; image ‘quality’ can range from 0 to 12. [Figure: original PNG image (190kB); JPEG image quality 0 (14kB); JPEG image quality 4 (24kB)]

  26. Faces do not survive compression well. [Figure: original; JPEG (24kB); JPEG (14kB)]

  27. What Does Compression Do? JPEG and MPEG
      • Image is divided into 8x8 blocks.
      • Discrete Cosine Transform is applied to each block.
      • The transform coefficients are quantized; many will be rounded to zero.
      • When reconstructed, the amplitude and phase of the spatial frequencies within each 8x8 block will be altered.
      [Figure: the 64 basis functions of an 8x8 Discrete Cosine Transform]
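
The steps on slide 27 can be illustrated on a single 8x8 block. This is a simplified sketch in plain Python, not the JPEG or MPEG codec itself: it uses one uniform quantization step for every coefficient (real codecs use a frequency-dependent quantization table), and the test block and step sizes are made up for illustration.

```python
import math

N = 8  # block size used by JPEG and MPEG


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


C = dct_matrix()
CT = transpose(C)


def dct2(block):
    """2D DCT of a block: C * block * C^T."""
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    """Inverse 2D DCT: C^T * coeffs * C."""
    return matmul(matmul(CT, coeffs), C)


def quantize(coeffs, step):
    """Round every coefficient to the nearest multiple of step."""
    return [[step * round(c / step) for c in row] for row in coeffs]


def count_zeros(coeffs):
    return sum(1 for row in coeffs for c in row if c == 0)


# Hypothetical block: a horizontal brightness ramp plus fine checker detail.
block = [[128 + 12 * x + (20 if (x + y) % 2 else -20) for x in range(N)]
         for y in range(N)]

for step in (4, 60):
    quantized = quantize(dct2(block), step)
    restored = idct2(quantized)
    max_error = max(abs(restored[y][x] - block[y][x])
                    for y in range(N) for x in range(N))
    print(f"step {step}: {count_zeros(quantized)} of 64 coefficients are zero, "
          f"max pixel error {max_error:.1f}")
```

A coarser step zeroes more coefficients and so discards more of the spatial-frequency content within the block, which is the effect the last bullet describes.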

  28. 12.5mm lens at 6m, no compression. [Figure annotation: ~40 pixels across the face]

  29. 12.5mm lens at 6m, 18:1 compression

  30. 12.5mm lens at 6m, 18:1 compression

  31. 12.5mm lens at 6m, 31:1 compression

  32. 12.5mm lens at 6m, 31:1 compression
      • 40 pixels across face = 5 DCT blocks
      • Spatial frequencies from 5 cycles/face width upwards are all corrupted
      • This is exactly the range that is most important for face recognition!
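
The arithmetic behind slide 32 can be sketched as below. The 8-pixel block size and the 40-pixel face width come from the slides. The sampling limit of two pixels per cycle is a standard result added here for context, and the function names are illustrative.

```python
DCT_BLOCK_SIZE = 8  # pixels per side of a JPEG/MPEG DCT block


def dct_blocks_across_face(face_width_px):
    """Number of 8-pixel DCT blocks spanning the face."""
    return face_width_px / DCT_BLOCK_SIZE


def max_cycles_per_face_width(face_width_px):
    """Highest representable frequency: one cycle needs at least two pixels."""
    return face_width_px / 2


face_px = 40
blocks = dct_blocks_across_face(face_px)
limit = max_cycles_per_face_width(face_px)
print(f"{face_px} pixels across the face = {blocks:g} DCT blocks")
print(f"Block structure repeats {blocks:g} times per face width")
print(f"Sampling limit: {limit:g} cycles/face width")
```

Each block is compressed independently, so detail finer than one block width, meaning everything above about 5 cycles/face width, is subject to quantization error. At 40 pixels the image cannot carry more than 20 cycles/face width in any case, so the whole 5 to 20 cycle band used for recognition falls inside the affected range.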

  33. A Real Surveillance Camera Installation…

  34. 4.8 m

  35. Image quality is defined by many attributes
      • Minimum feature size that can be resolved
      • Noise level
      • Quality of luminance reproduction
      • Quality of colour reproduction

  36. Luminance and colour cues are at least as important as shape cues. People perform about equally well using just shape information or just pigmentation cues (Russell et al. 2007). [Figure: original laser-scanned faces; same pigmentation, varying shape; same shape, varying pigmentation]

  37. Image compression typically quantizes colour information very heavily… [Figure: hue values shown as greyscale; 16x16 macro-blocks]

  38. Conclusions
      • Surveillance video, as it is currently used, is almost useless for identification.
      • Face recognition in low resolution images is badly affected by compression artifacts.
      • Image quality standards are needed for surveillance camera installations.

  39. [Figure: Tarja Halonen, President of Finland; Conan O’Brien, US talk show host]
