Session 703 Book to Computer: Scanning Basics. Gaeir Dietrich, Director, High Tech Center Training Unit of the California Community Colleges
Overview • Scanning and scanners • Understanding scanning terminology • Scanning workflow CTEBVI Conference
Scanning • Scanning takes a picture. • The better the picture, the less editing later on. • Similar technology to the copy machine, but it outputs to a digital file, not paper.
Stand Alone vs. Multi-use • Stand alone scanners… • Provide more control over scans • Result in better scans • Multi-use machines are copiers first, scanners second. • Final products require more editing during production • But it is still better than a flatbed scanner
Scanners • When buying a scanner think about these issues: • Duplex (two sides at once) • Automatic feed (pages per minute) • Color (for color dropout) • We like Canon, followed by Fujitsu. • Canon DR-C125, DR-3010C
No Money? • A $400 20-page-per-minute scanner is a far better deal than four $100 flatbed scanners • If you can only afford a flatbed, look for one with automatic document feed (ADF)
Scanning Outputs • Color scanning usually creates a JPEG. • JPEGs are single pages only! • Black and white scanning creates a TIFF. • TIFFs can be multiple pages.
What is a TIFF? • TIFF files are graphics, i.e., pictures of text. • Tagged Image File Format (TIFF) • Robust, stable standard file type • No version issues • Any program that can open multipage graphics can open a TIFF • Good archival graphical format
But I scan to… • If you get anything other than a TIFF or JPEG, you have used software to convert. • If you scan to PDF, you have used software to transform your file. • Scanning hardware does not create PDFs. • Conversion runs the risk of losing data and increasing editing time.
Converting TIFFs • TIFFs can be converted to other formats, including other graphic formats like PDF. • To get to the text you must run a TIFF file through an optical character recognition (OCR) program.
Scanning Is the First Step • Settings for your scan will be determined by the end-format you want to create • For text, you will scan then run OCR • Optical Character Recognition • See session 901 on Sunday
Scanning Terms • Duplex vs. simplex • Skew/deskew • Margin control • DPI (Resolution) • Mode • Brightness • Contrast • Threshold • RGB color • Color dropout
Duplex vs. Simplex • Double-sided vs. single-sided • Duplex = two sides at a time (one pass) • Simplex = one side at a time • Flatbed scanners are simplex scanners • Look for true duplex (one pass) • Not two passes with the program interleaving the scans
Skew • Skew is slant • i.e., the page is not straight • Snug the feed guides! • Use deskew settings. • The computer can correct for some skew, but with too much the text cannot be recognized
Margin Control • Scanner determines page size • Avoids large black areas around the edge of the page • On better machines, also removes need for measuring • Better scanners will also have margin adjustment • Note that usually *all* edges are adjusted the same amount.
DPI (Dots per Inch) • “Dots” in scanning are really pixels • Little squares like on graph paper • Imagine drawing by filling in squares on graph paper…the more squares, the smoother the lines • Higher DPI = better resolution • However, more is not always better!
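To put numbers on the graph-paper picture, the sketch below counts the "squares" on one page at several resolutions. The US Letter page size (8.5 x 11 inches) and the function name `pixel_dimensions` are assumptions chosen for illustration.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return (width, height) in pixels for a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed page size: US Letter, 8.5 x 11 inches.
for dpi in (150, 300, 600):
    w, h = pixel_dimensions(8.5, 11, dpi)
    print(f"{dpi} DPI: {w} x {h} pixels = {w * h:,} pixels")
# 150 DPI: 1275 x 1650 pixels = 2,103,750 pixels
# 300 DPI: 2550 x 3300 pixels = 8,415,000 pixels
# 600 DPI: 5100 x 6600 pixels = 33,660,000 pixels
```

Doubling the DPI quadruples the number of pixels, which is one reason more is not always better.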
DPI Comparison
Resolution (DPI) • Standard for text is 300 DPI • Small text may require 400 DPI • Thin paper may require 150-200 DPI • Really large text may require 200 DPI • InftyReader for math requires 600 DPI
Mode • Black & white • Looks like line art • Only choices for pixels are black or white • Grayscale • Looks like black & white photo • Also called “halftone” • Color • Comes in different “bits” • The more bits, the more color information
Black and White • Image scanned in B/W, file size 474 KB
Black and White ED • Image scanned in B&W ED (Canon DR-5080C), file size 474 KB
Grayscale • Image scanned in Grayscale, file size 3,731 KB
Choosing the Mode • Black and white • Best for text; smallest file size • Black and white ED (error diffusion) • Better for graphics; slightly larger files • Usually best to avoid grayscale • Large files that do not OCR as well • Color • Sometimes necessary; large files
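The file-size differences between modes follow from how many bits each pixel stores. The sketch below estimates uncompressed sizes only. The page size (8.5 x 11 inches at 300 DPI) and the bit depths of 1, 8, and 24 bits are assumptions, and real files will differ because of compression.

```python
# Assumed bit depths: 1 bit for black and white, 8 for grayscale, 24 for color.
BITS_PER_PIXEL = {"black and white": 1, "grayscale": 8, "color": 24}


def uncompressed_kb(width_in: float, height_in: float, dpi: int, mode: str) -> float:
    """Estimate the uncompressed image size in KB (1 KB = 1024 bytes)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BITS_PER_PIXEL[mode] / 8 / 1024


for mode in BITS_PER_PIXEL:
    print(f"{mode}: {uncompressed_kb(8.5, 11, 300, mode):,.0f} KB")
```

The sample scans shown earlier (474 KB in black and white, 3,731 KB in grayscale) differ by a factor of about 7.9, close to the 8:1 ratio these bit depths predict.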
Which Mode to Choose? • It depends on how important the graphics are! • Is it for a student who has some usable vision and needs enlargement? • Grayscale or color may be needed • Is it to create braille? • Black and white will usually give the best OCR results.
Brightness • Overall darkness or lightness of page • Balance • Not too dark, not too light • Scale 1-255 • Lower numbers decrease brightness • Down into darkness • Higher numbers increase brightness • Up to the light
Brightness Example • It’s just like turning on lights over an entire room.
Adjusting Brightness • Default is 128 • Too dark • Letter shapes run together • Too light • Letter shapes are thin or broken • Newsprint-type papers often need increased brightness
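One simple way to picture the brightness slider is as a fixed amount added to every pixel. The sketch below uses that additive model with the default of 128. It is an assumption for illustration, and scanner drivers may compute brightness differently.

```python
def adjust_brightness(pixels: list[int], setting: int, default: int = 128) -> list[int]:
    """Shift every gray value (0 = black, 255 = white) by the slider offset.

    Assumed additive model: results are clamped to the 0-255 range.
    """
    offset = setting - default
    return [min(255, max(0, p + offset)) for p in pixels]


row = [0, 60, 128, 200, 255]
print(adjust_brightness(row, 128))  # [0, 60, 128, 200, 255] (default, unchanged)
print(adjust_brightness(row, 168))  # [40, 100, 168, 240, 255] (lighter overall)
print(adjust_brightness(row, 88))   # [0, 20, 88, 160, 215] (darker overall)
```

Like room lights, the change applies to the whole page at once: dark strokes and light paper all move in the same direction.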
Brightness Guidelines • Check the appearance of the scan • If characters are thick and touching (running together) > increase brightness • If characters are thin and broken (lines thin/missing areas) > reduce brightness
Sample Scans • Too bright • Just right • Too dark
Contrast • Difference between light and dark on page • Scale is 1-13 • Higher number increases contrast • Darks darker, lights lighter • Lower number decreases contrast • Darks get lighter, lights get darker • Becomes more uniform
Contrast Example
Adjusting Contrast • Default is 7 • Low contrast • Entire page is either “muddy” looking or washed-out looking • High contrast • Extremes of light and dark • May lose midrange detail • Newsprint-type paper often needs increased brightness
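Contrast can be pictured as stretching or squeezing gray values around a middle gray. In the sketch below, the mapping from the 1-13 slider to a stretch factor (`setting / default`) is a hypothetical choice made for illustration and not any scanner's documented behavior.

```python
def adjust_contrast(pixels: list[int], setting: int, default: int = 7,
                    midpoint: int = 128) -> list[int]:
    """Stretch (or squeeze) gray values away from (or toward) mid-gray.

    The factor setting / default is an assumed mapping of the 1-13 slider.
    """
    factor = setting / default
    return [min(255, max(0, round(midpoint + (p - midpoint) * factor))) for p in pixels]


row = [28, 78, 128, 178, 228]
print(adjust_contrast(row, 7))   # default: values unchanged
print(adjust_contrast(row, 13))  # higher: darks darker, lights lighter
print(adjust_contrast(row, 1))   # lower: everything drifts toward mid-gray
```

At the high end, the darkest and lightest values are clipped to pure black and pure white, which is where midrange detail can be lost.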
Threshold • In black and white mode • Sometimes just see brightness (contrast settings disappear) • Sets where gray will be seen • Increased threshold adds more white • More grays seen as white • Decreased threshold adds more black • More grays seen as black
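The on/off decision for grays can be sketched as a single cutoff. The code below follows the direction described on this slide, where a higher setting turns more grays white. The formula `256 - setting` is an assumption made to match that direction, and other software may number the slider the opposite way.

```python
def to_black_and_white(pixels: list[int], setting: int = 128) -> list[int]:
    """Convert gray values (0-255) to pure black (0) or pure white (255).

    Assumed convention: raising the setting lowers the cutoff, so more
    grays are seen as white.
    """
    cutoff = 256 - setting  # gray values at or above the cutoff become white
    return [255 if p >= cutoff else 0 for p in pixels]


grays = [30, 90, 120, 140, 200]
print(to_black_and_white(grays, 128))  # [0, 0, 0, 255, 255]
print(to_black_and_white(grays, 170))  # [0, 255, 255, 255, 255] (more white)
print(to_black_and_white(grays, 100))  # [0, 0, 0, 0, 255] (more black)
```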
Despeckle • “Erases” speckles • Helps with small stray black dots • Works really well when having to scan a photocopy or newsprint • Beware of going too far and erasing periods and umlauts
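A minimal sketch of the idea, assuming the simplest possible rule: a black pixel with no black neighbors counts as a speck and is erased. Real despeckle filters are more sophisticated and usually let you choose the speck size. The grid and the function name `despeckle` are illustrative.

```python
def despeckle(image: list[list[int]]) -> list[list[int]]:
    """Erase isolated black pixels in a black-and-white image.

    Here 1 means black ink and 0 means white paper. A black pixel is
    treated as a speck when none of its 8 neighbors is black.
    """
    rows, cols = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            has_black_neighbor = any(
                image[nr][nc] == 1
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
                if (nr, nc) != (r, c)
            )
            if not has_black_neighbor:
                cleaned[r][c] = 0
    return cleaned


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # lone speck
    [0, 0, 0, 1, 1],  # part of a letter stroke
    [0, 0, 0, 1, 1],
]
for row in despeckle(page):
    print(row)
```

The lone speck disappears and the stroke survives. A period or umlaut dot that is only one pixel wide would be erased the same way, which is the risk noted above.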
Gamma…it’s complicated… • Adjusts the middle tones • Usually more useful for scanning graphics than text • Can be altered to bring out more detail in shadows in photos • Usually only on high-end hardware • Try everything else first!
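For the curious, a common textbook form of gamma correction is sketched below. Whether a given scanner driver uses exactly this formula is an assumption. The point is that pure black and pure white stay put while the tones in between move.

```python
def apply_gamma(pixels: list[int], gamma: float) -> list[int]:
    """Apply a standard power-law gamma curve to gray values (0-255).

    With gamma above 1, midtones get lighter; below 1, they get darker.
    """
    return [round(255 * (p / 255) ** (1 / gamma)) for p in pixels]


tones = [0, 64, 128, 192, 255]
print(apply_gamma(tones, 1.0))  # [0, 64, 128, 192, 255] (no change)
print(apply_gamma(tones, 2.2))  # endpoints fixed, shadows and midtones lifted
```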
Settings Summary • Brightness = overall tone • Contrast = difference in highs and lows • Gamma = adjustment in midtones • Threshold = on or off switch for grays • Grays seen as white or black • May appear as just the “brightness” bar
RGB Color • RGB = Red, Green, Blue • RGB color system is used by TVs, computers, and scanners!
“Additive” Color System
Color Scanners • Many color scanners for documents allow “color dropout” • The scanner “ignores” a particular color • “Erases” the color • Red, blue, or green
Color Dropout • Drop out colored markings • Orange highlighter (drop out red) • Blue pen (drop out blue and despeckle) • Yellowish pages • Drop out red (improves contrast) • Tinted backgrounds • Watch out for dropping out text • Be aware of color with white text on it
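A simplified way to see why dropout works: assume the scanner looks at the page through only the dropped color's channel. Anything that is bright in that channel looks like white paper and vanishes. The RGB values below are illustrative assumptions, not measurements, and actual scanners implement dropout in their own way.

```python
CHANNEL = {"red": 0, "green": 1, "blue": 2}


def drop_out(pixel_rgb: tuple[int, int, int], color: str) -> int:
    """Return the gray value (0 = black, 255 = white) seen when `color` is dropped.

    Simplified model: only the dropped color's channel is kept.
    """
    return pixel_rgb[CHANNEL[color]]


# Assumed, idealized colors for illustration.
samples = {
    "white paper": (255, 255, 255),
    "black text": (0, 0, 0),
    "orange highlighter": (255, 165, 0),
    "blue pen": (0, 0, 255),
    "red text": (255, 0, 0),
}

for name, rgb in samples.items():
    print(f"{name}: red dropout -> {drop_out(rgb, 'red')}, "
          f"blue dropout -> {drop_out(rgb, 'blue')}")
```

With red dropped, the orange highlighter reads 255 (same as paper) while black text stays at 0. Red text also reads 255, which is the "watch out for dropping out text" case.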
Scanned Page with Orange Highlighter
Same Page with Red Drop-out
Scanning Workflow • Remove spine from book • Separate any pages still glued together • Choose a few representative pages for a test scan
Procedure Continued • Scan representative pages to TIFF • Check image on screen for possible adjustments • Run OCR on sample pages • Error rate should be no higher than one per page • Higher errors mean you need to adjust the scanner settings
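The test-scan check can be written down as simple arithmetic. In the sketch below, the error counts are made-up numbers for three hypothetical sample pages, and `needs_adjustment` is an illustrative name.

```python
def needs_adjustment(errors_per_page: list[int], max_rate: float = 1.0) -> bool:
    """Return True when the average OCR error count exceeds one per page."""
    rate = sum(errors_per_page) / len(errors_per_page)
    return rate > max_rate


print(needs_adjustment([0, 1, 1]))  # False: about 0.67 errors per page
print(needs_adjustment([3, 0, 4]))  # True: about 2.33 errors per page
```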
Ready to Scan • With the settings determined, scan the entire book • Now that you have a good picture, your OCR and editing should go quickly!
Advanced Ideas • Be aware of individual pages that may need additional adjustment • A few pages may need to be scanned separately • A few pages may need color • Reassemble in your OCR program • While checking test pages, also create OCR templates as appropriate
Suggestion on Organizing File Structure • Label chapters (or chapter folders): 01 Chapter, 02 Chapter • Label front matter to place it first: 00 Front Matter • Label back matter just with its name: Back Matter • This file structure will create a logical order.
Example
Timesaver: Create a Template Folder • The template folder can be copied and pasted; all the inside folders are copied, as well! • Putting the zero in front makes the folder easy to find.
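If you prefer to script the copy-and-paste step, a minimal sketch follows. The folder name "0 Template", the book title, and both function names are hypothetical. The chapter, front matter, and back matter labels follow the naming suggested earlier.

```python
import shutil
import tempfile
from pathlib import Path


def make_template(root: Path, chapters: int) -> Path:
    """Create a template folder holding front matter, chapter, and back matter folders."""
    template = root / "0 Template"  # hypothetical name; the leading zero sorts it first
    names = (["00 Front Matter"]
             + [f"{n:02d} Chapter" for n in range(1, chapters + 1)]
             + ["Back Matter"])
    for name in names:
        (template / name).mkdir(parents=True, exist_ok=True)
    return template


def new_book(template: Path, title: str) -> Path:
    """Copy the template, including all inside folders, to start a new book."""
    return Path(shutil.copytree(template, template.parent / title))


# Demonstrated in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    template = make_template(Path(tmp), chapters=3)
    book = new_book(template, "Algebra 1")
    print(sorted(p.name for p in book.iterdir()))
    # ['00 Front Matter', '01 Chapter', '02 Chapter', '03 Chapter', 'Back Matter']
```

Sorted by name, the folders fall into reading order: front matter, then chapters, then back matter.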
Miscellaneous Tips • Chopping books • Guillotine • X-Acto knife to remove spine, then check with FedEx Office (Kinko’s) about cutting the pages • Spines and flatbeds • If you have to scan a book with a thick spine on a flatbed, get a large dark piece of cloth and cover the scanner; this prevents the darkened area along the spine