Key terms and concepts: introducing first principles
Overview • digital imaging • resolution and bit depth • file types • colour management • metadata • digital libraries • digital preservation
Reminder: what can be delivered digitally • Born digital content • Paper • Text content • Bound volumes or manuscripts • Photographs – prints, slides, transparencies • Microfilm, microfiche and aperture cards • Video and audio • Maps, drawings and large paper formats • Original art works, textiles etc. • Physical 3-dimensional objects or views
Digitization is? • Digitization is the process of converting analogue originals to computer-readable form • Digital imaging and scanning are mechanisms for capturing a digital picture • A digital image is sampled and mapped as a grid of squares known as picture elements (pixels) • Digital files use a binary format with a series of ‘1’ and ‘0’, ‘on’ and ‘off’, to represent data • like a light switch ... • Tech joke: there are 10 types of people, those who understand binary and those who don’t
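The pixel grid and the binary encoding described above can be sketched in a few lines. This is only an illustration: the 5 x 5 grid, the function names `row_to_int` and `render`, and the choice of 1 for black are assumptions made for the example, not part of any file format.

```python
# A tiny 5 x 5 bitonal "image": 1 = black pixel, 0 = white pixel.
# The grid itself is made up for illustration.
GRID = [
    "01110",
    "10001",
    "11111",
    "10001",
    "10001",
]


def row_to_int(row: str) -> int:
    """Interpret one row of pixels as a binary number."""
    return int(row, 2)


def render(grid: list[str]) -> str:
    """Draw the grid as text, '#' for on-pixels and '.' for off-pixels."""
    return "\n".join(row.replace("1", "#").replace("0", ".") for row in grid)


row_values = [row_to_int(row) for row in GRID]

if __name__ == "__main__":
    print(render(GRID))
    print(row_values)
    # The "tech joke": the binary string "10" is the number two.
    print(int("10", 2))
```

Each row of on/off pixels is just a binary number, which is all a bitonal image file stores (plus some header information).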
Digitization processes: Scanning • Capturing lines of pixels moving across the object • Used in flatbed scanners, slide scanners and scanning back cameras for instance
Digitization processes: Digital Photography • Digital photography or direct digital capture: captures all the pixels in a single matrix • Used in digital cameras and book scanners for instance
Digital imaging = digital pictures • All we get is a digital picture, not digital text
Digital text • Digital text requires additional processes: • Optical Character Recognition (OCR) • Rekeying • Mark-up: XML, SGML
Pixels redux • Pixels are picture elements • They are usually square • They are the smallest component of the digital image • By combining pixels in different orientations and density we get shapes and content • By changing the tonal values of pixels we get colour • i.e. resolution and bit depth
Resolution • describes the density of spatial detail • is usually expressed as dots-per-inch (dpi) or pixels-per-inch (ppi) • these terms are often used interchangeably, although dpi usually refers to printed images and ppi to screen images • remember the spatial detail is in relation to the original item imaged
Resolution • It is often more useful to use absolute terms for resolution • actual pixel dimensions are given • 2490 x 3510 for example • Approximately equals the pixel dimensions of an A4 sheet of paper scanned at 300 dpi (about 2480 x 3508) • But also roughly equals (within about 5%) the dimensions of: • A5 page at 425 dpi • A3 page at 200 dpi • A2 page at 150 dpi • 8.7 Megapixel digital camera image of a landscape @ 96 dpi
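The arithmetic behind these equivalences can be sketched as follows. The paper sizes are the ISO 216 dimensions in millimetres; the function name `pixel_dimensions` and the `PAPER_MM` table are assumptions introduced for this example.

```python
MM_PER_INCH = 25.4

# ISO 216 paper sizes in millimetres (width, height).
PAPER_MM = {
    "A5": (148, 210),
    "A4": (210, 297),
    "A3": (297, 420),
    "A2": (420, 594),
}


def pixel_dimensions(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of an original of the given size scanned at `dpi`."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


if __name__ == "__main__":
    for size, dpi in [("A5", 425), ("A4", 300), ("A3", 200), ("A2", 150)]:
        w, h = pixel_dimensions(*PAPER_MM[size], dpi)
        print(f"{size} at {dpi} dpi: {w} x {h} pixels, {w * h / 1e6:.1f} megapixels")
```

The same pixel count can therefore describe very different originals, which is why the dpi figure only means something in relation to the size of the item that was imaged.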
Bit depth • Defines the colour space for each image and pixel • this is the number of bits (binary digits) used to define each pixel's tonal value • Black and white (bitonal) = 1 bit per pixel • Greyscale = 8-bit (256 shades of grey) • RGB Colour = 24-bit (16.7 million colour tones)
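The figures quoted above follow from the fact that n bits can encode 2 to the power n distinct values. A minimal sketch, with `tonal_values` as a name chosen for this example:

```python
def tonal_values(bits_per_pixel: int) -> int:
    """Number of distinct tones a pixel can take at a given bit depth."""
    return 2 ** bits_per_pixel


if __name__ == "__main__":
    for label, bits in [("bitonal", 1), ("greyscale", 8), ("RGB colour", 24)]:
        print(f"{label}: {bits}-bit = {tonal_values(bits):,} tonal values")
```

For 24-bit RGB the 24 bits are normally split into three 8-bit channels, so each of red, green and blue has 256 levels.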
Some rules of thumb • Resolution: • capture the smallest significant detail • the smaller the original the higher the resolution • double the resolution - quadruple the filesize • Bit depth: • 1 bit = Black and white • 8 bit = greyscale (x8 filesize) • 24 bit = full RGB colour (x24 filesize) • CMYK: avoid using for scanning or storage • Select the right colour space for your original
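These file-size rules of thumb can be checked with a short calculation. The sketch below assumes an uncompressed image with no header or metadata overhead; `uncompressed_bytes` is a name invented for the example.

```python
def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> float:
    """Raw image size in bytes, ignoring headers, metadata and compression."""
    return width_px * height_px * bits_per_pixel / 8


if __name__ == "__main__":
    # An A4 page scanned at 300 dpi is about 2480 x 3508 pixels.
    w, h = 2480, 3508
    for bits in (1, 8, 24):
        size_mb = uncompressed_bytes(w, h, bits) / 1_000_000
        print(f"{bits:>2}-bit at 300 dpi: {size_mb:.1f} MB")

    # Doubling the resolution doubles both width and height.
    ratio = uncompressed_bytes(2 * w, 2 * h, 24) / uncompressed_bytes(w, h, 24)
    print(f"600 dpi vs 300 dpi: x{ratio:.0f} file size")
```

Real files will differ somewhat because of headers, embedded metadata and any compression applied.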
Digitization Basics: Tutorials • Cornell Digital Imaging Tutorial www.library.cornell.edu/preservation/tutorial/contents.html
Digital files • Can use compression to reduce file sizes. There are two main types: • Lossy • there is irrecoverable loss of data with inevitable worsening of quality, but can achieve considerable size reductions • JPEG • Lossless • no loss of data, but not such great size reductions • LZW, ITU-T T.6 (formerly CCITT Group 4)
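The difference between the two types can be illustrated with Python's standard library. This is a rough sketch under stated assumptions: it uses DEFLATE (via `zlib`) as a stand-in for LZW or ITU-T T.6, which are different algorithms, and it imitates lossy compression by crudely discarding the low bits of each pixel, which is not how JPEG works. The sample data is made up.

```python
import zlib

# Hypothetical sample: four 8-bit greyscale scan lines, each a smooth gradient.
pixels = bytes(range(256)) * 4

# Lossless: compress, decompress, and get back exactly the original bytes.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)

# Lossy (crude stand-in, not JPEG): throw away the 4 low bits of every pixel.
# The discarded detail cannot be recovered, but the result compresses better.
quantised = bytes(p & 0xF0 for p in pixels)
lossy_packed = zlib.compress(quantised)

if __name__ == "__main__":
    print("original bytes:         ", len(pixels))
    print("lossless compressed:    ", len(packed))
    print("lossy then compressed:  ", len(lossy_packed))
    print("lossless round trip ok: ", restored == pixels)
    print("lossy data == original: ", quantised == pixels)
```

The trade-off is the one the slide describes: the lossless route preserves every pixel value, while the lossy route buys a smaller file at the cost of detail that is gone for good.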
Some common file formats • There are many, many file formats • The commonest you will meet are probably: • TIFF • GIF • JPEG • PDF
TIFF: Tagged Image File Format • De facto standard • Needs plug-in or external application for web display although some browsers now accept it • Can be tagged with basic metadata • Can be used for files up to a bit depth of 64 • The format of choice for long-term archiving
JPEG: • Joint Photographic Experts Group/JFIF (JPEG File Interchange Format) • De facto standard for web display • Native to web browsers (i.e. no plug-ins needed) • Has free-text comment field for metadata • Can be used for files up to 24 bit • Commonly used for web display images • JPEG2000: enables zooming and more metadata
GIF: Graphics Interchange Format • De facto standard for web display • Native to web browsers (i.e. no plug-ins needed) • Has free-text comment field for metadata • Can be used for files up to 8 bit • Commonly used for web display images • Likely to be replaced by PNG (Portable Network Graphics)
PDF: Portable Document Format • Proprietary (Adobe) format, but now a de facto standard for document delivery • Needs plug-in or external application for web display • Can be used for files up to 64 bit • Used for printing and viewing multipage documents • Comes in 3 versions: • Image only • Image and text • Full text
Colour Management: What is it? • Colour is device dependent and looks different when: • printed on different printers • viewed on different monitors • printed on a printer and viewed on a monitor • viewed in a light booth and under office lighting • Colour Management Systems (CMS) maintain the consistent and accurate "appearance" of a colour on different devices (e.g. scanners, monitors, printers, etc.) throughout an imaging workflow
[Figure: "Colour" workflow. Labels: Original; RGB Scanner; App (displays scanner RGBs); RGB Display; Driver (sends RGBs or CMYKs to printer); CMYK Printer]
Colour Management: components • Use a consistent colour space • Apply an independent colour profile • International Color Consortium www.color.org • Monitor calibration • Colour targets • GretagMacbeth www.gretagmacbeth.com
Metadata • What is metadata? • What is metadata for?
What is metadata? • Tony Gill – ARTstor • Metadata refers to structured descriptions, stored as computer data, that attempt to describe the essential properties of other discrete computer data objects. • Big picture definition: • the sum total of what can be said about any information object at any level of aggregation
What is metadata for? • The World Wide Web Consortium says metadata is: • to provide a means to discover that the data set exists and how it might be obtained or accessed • to document the content, quality, and features of a data set, indicating its fitness for use • Therefore we need to think: • content, context and structure
What characterises a digital library? • A digital library is a managed collection of digital objects • The digital objects are created or collected according to principles of collection development • The digital objects are made available in a cohesive manner, supported by services necessary to allow users to retrieve and exploit the resources just as they would any other library materials • The digital objects are treated as long-term stable resources and appropriate processes are applied to them to ensure their quality and survivability
What is collection development? • American Library Association's definition: "A term which encompasses a number of activities related to the development and determination of the collection, including the determination and coordination of selection policy, assessment of needs of users and potential users, collection evaluation, identification of collection needs, selection of materials, planning for resource sharing, collection maintenance, and weeding." (ALA Glossary of Library & Information Science)
Digital Preservation: digital lifecycle approach • ‘The major implications for lifecycle management of digital resources, whatever their form or function, is the need to actively manage the resource at each stage of its lifecycle and to recognise the interdependencies between each stage and commence preservation activities as early as practicable. This represents a major difference with traditional preservation, where management is largely passive until detailed conservation work is required, typically many years after creation and rarely, if ever, involving the creator. There is an active and interlinked lifecycle to digital resources which has prompted many to promote the term 'continuum' to distinguish it from the more traditional and linear flow of the lifecycle for traditional analogue materials.’ • Preservation Management of Digital Materials: A Handbook - Neil Beagrie & Maggie Jones www.jisc.ac.uk/dner/preservation/dpc/