INF 389F—ORGANIZATION OF RECORDS INFORMATION Professor Fran Miksa November 18, 2003 Data, Metadata, Metadata Formats, and Databases
Data & Databases • Data are strings of characters that record assertions, etc., about something. • Data in computers are strings of codes representing characters. • Databases are computer programs that allow us to manipulate data in computers. • IE (information entity) data are data pertaining to IEs, including, especially, attribute data. • IE databases are databases whose data pertain to IEs as objects.
How Does Data Become Machine-Readable (i.e., Computerized)? • Basic question: we want a computer to write data into its memory, but how does it do that? • Substitution codes: a basic approach to representing data in computers.
Computer Switches as Codes • 1 switch, with 2 positions (on/off): how many different signals are possible? $2^{1} = 2$ possible signals • 2 switches, always used together: $2^{2} = 4$ possible signals, because there are 4 possible combinations of switch settings • 3 switches, always used together: $2^{3} = 8$ possible signals • 4 switches: $2^{4} = 16$ possible signals • 5 switches: $2^{5} = 32$ possible signals • 6 switches: $2^{6} = 64$ possible signals • 7 switches: $2^{7} = 128$ possible signals • 8 switches: $2^{8} = 256$ possible signals • Of course, in each case, the meanings of the switch combinations have to be agreed upon.
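The doubling pattern on this slide can be checked with a short Python sketch. The function names `signal_count` and `all_settings` are illustrative only; the sketch enumerates every on/off setting of n switches, writing "on" as 1 and "off" as 0.

```python
def signal_count(switches: int) -> int:
    """Each added switch doubles the number of possible on/off combinations."""
    return 2 ** switches


def all_settings(switches: int) -> list[str]:
    """List every combination of settings as a string of 0s (off) and 1s (on)."""
    return [format(i, f"0{switches}b") for i in range(signal_count(switches))]


for n in range(1, 9):
    print(f"{n} switch(es): {signal_count(n)} possible signals")

print(all_settings(2))  # ['00', '01', '10', '11']
```

Which meaning each combination carries is not in the code at all; as the slide says, that has to be agreed upon separately.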
Where are Codes Placed? • First, each switch or spot is called a ‘bit’ (BInary digiT). • Second, each group of bits that together encodes one character in a given character set is called a ‘byte’ (today almost always 8 bits). • Bit codes can be stored (“set”) as a series of “switches” on a computer “chip.” • Codes can be represented as magnetized or not magnetized positions (i.e., spots, locations) on a magnetic surface such as a disk. • The bits of each byte are kept together as a unit.
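A minimal Python sketch of one character stored as the eight bits of a single byte, assuming the ASCII encoding that Python applies with `str.encode("ascii")`. The helper names `char_to_bits` and `bits_to_char` are hypothetical.

```python
def char_to_bits(ch: str) -> str:
    """Return the 8 bits (one byte) that encode a single ASCII character."""
    (byte,) = ch.encode("ascii")  # an ASCII character occupies exactly one byte
    return format(byte, "08b")


def bits_to_char(bits: str) -> str:
    """Read 8 bits back as one byte and decode it as an ASCII character."""
    return bytes([int(bits, 2)]).decode("ascii")


print(char_to_bits("A"))          # 01000001
print(bits_to_char("01000001"))   # A
```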
Coding for Colors in Graphics • Colors are encoded the same way, though the number of bits used for each code may vary: for example, 8-bit, 16-bit, or 24-bit color codes. • With 8-bit color codes, each coded point has 256 possible bit combinations with which to represent all the colors, or all the shades in a “grayscale.” • 16-bit color codes have 65,536 bit combinations (in groups of 16 bits), and 24-bit color codes have 16,777,216 bit combinations (in groups of 24 bits).
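The color counts above follow the same powers-of-two rule. The Python sketch below also builds one 24-bit color code, assuming the common layout of 8 bits each for red, green, and blue; that layout is an assumption the slide does not state, and `pack_rgb` is a hypothetical helper.

```python
def color_count(bits_per_point: int) -> int:
    """Number of distinct colors (or gray shades) a code of this many bits can name."""
    return 2 ** bits_per_point


def pack_rgb(red: int, green: int, blue: int) -> int:
    """Pack three 8-bit channel values into one 24-bit code (assumed RGB layout)."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each channel must fit in 8 bits")
    return (red << 16) | (green << 8) | blue


for depth in (8, 16, 24):
    print(f"{depth}-bit color: {color_count(depth):,} combinations")

print(format(pack_rgb(255, 128, 0), "024b"))  # 111111111000000000000000
```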
Bits used in Graphics • Pixel = a location in a grid of locations superimposed on a graphic image. • At 300 pixels to the inch in each dimension, an 8-inch by 5-inch picture yields 2,400 pixels (locations, spots, dots, etc.) down and 1,500 pixels across, or 3,600,000 pixels in total, each of which is coded for a color in an 8-bit, 16-bit, 24-bit, or other code. • Files that format these pixels in particular ways are known by names such as TIFF, JPEG, and GIF.
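The slide's arithmetic, extended to storage, as a Python sketch. The byte figures are rough estimates under a stated assumption: they count only the color codes themselves and ignore the headers and compression that TIFF, JPEG, GIF, and similar formats add or apply. The function names are hypothetical.

```python
def pixel_grid(width_in: float, height_in: float, ppi: int) -> tuple[int, int, int]:
    """Return (pixels across, pixels down, total pixels) for a scanned image."""
    across = round(width_in * ppi)
    down = round(height_in * ppi)
    return across, down, across * down


def raw_size_bytes(total_pixels: int, bits_per_pixel: int) -> int:
    """Uncompressed size of the color codes alone (8 bits per byte)."""
    return total_pixels * bits_per_pixel // 8


across, down, total = pixel_grid(width_in=5, height_in=8, ppi=300)
print(across, down, total)  # 1500 2400 3600000
for depth in (8, 16, 24):
    print(f"{depth}-bit: {raw_size_bytes(total, depth):,} bytes before formatting")
```

At 24 bits per pixel the picture needs 10,800,000 bytes of color codes, which is one reason file formats often compress the pixel data.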
Text & Control Characters as Codes • Lower case letters (26 total) • Upper case letters (26 total) • Numerals (10 total) • Special signs . , ; : “ ” ? / < > [ ] { } \ | - _ = + ` ~ @ # $ % ^ & * ( ) (31 total) [93 characters to this point] • Blank space & other special symbols • Special control codes for computer operation • Foreign language special signs
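Counting the characters listed on this slide shows how many bits a code needs. In this Python sketch the `special` string reproduces the slide's 31 signs (including its curly quotation marks); the count is then compared with powers of two.

```python
import math
import string

lower = string.ascii_lowercase              # 26 letters
upper = string.ascii_uppercase              # 26 letters
numerals = string.digits                    # 10 numerals
special = '.,;:“”?/<>[]{}\\|-_=+`~@#$%^&*()'  # the slide's 31 special signs

total = len(lower) + len(upper) + len(numerals) + len(special)
bits_needed = math.ceil(math.log2(total))  # smallest n with 2 ** n >= total

print(total, bits_needed)  # 93 7
```

Six bits (64 codes) are too few for 93 characters, so at least seven bits (128 codes) are needed. That matches the size of 7-bit ASCII, which reserves 33 of its 128 codes for control characters.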
Character Codes • ASCII, EBCDIC • See “A Brief History of Character Codes” • <http://tronweb.super-nova.co.jp/characcodehist.html>
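ASCII and EBCDIC assign different codes to the same characters. This Python sketch uses the standard `cp037` codec, which implements one EBCDIC code page (US/Canada); treating that code page as representative of EBCDIC in general is an assumption, and `codes` is a hypothetical helper.

```python
def codes(ch: str) -> tuple[int, int]:
    """Return (ASCII code, EBCDIC code page 037 code) for one character."""
    return ch.encode("ascii")[0], ch.encode("cp037")[0]


for ch in ("A", "a", "0", " "):
    ascii_code, ebcdic_code = codes(ch)
    print(f"{ch!r}: ASCII {ascii_code:#04x}, EBCDIC {ebcdic_code:#04x}")
```

The same letter "A" is 0x41 in ASCII but 0xC1 in EBCDIC, so data written under one code must be translated before a system using the other can read it.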
Databases • Flat File Databases • Relational Databases • Data Modeling • Entity-relationship data models • Object oriented data models
Flat File Database [Figure from the geekgirls reading “Databases from Scratch III”]
Relatable Tables within the Database [Figure from the geekgirls reading “Databases from Scratch III”]
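A small Python sketch using the standard `sqlite3` module to contrast the two designs. The titles, publishers, and table names are hypothetical and are not taken from the geekgirls reading; the point is only that the flat file repeats publisher details in every row, while related tables store them once and reassemble the same view with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Flat file: a single table; publisher details are repeated in each row.
cur.execute("CREATE TABLE flat_items (title TEXT, publisher TEXT, publisher_city TEXT)")
cur.executemany(
    "INSERT INTO flat_items VALUES (?, ?, ?)",
    [
        ("Book One", "Acme Press", "Austin"),
        ("Book Two", "Acme Press", "Austin"),
        ("Book Three", "Beta House", "Boston"),
    ],
)

# Related tables: each publisher is stored once and linked to items by a key.
cur.execute("CREATE TABLE publishers (pub_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute(
    "CREATE TABLE items (item_id INTEGER PRIMARY KEY, title TEXT, "
    "pub_id INTEGER REFERENCES publishers(pub_id))"
)
cur.executemany(
    "INSERT INTO publishers VALUES (?, ?, ?)",
    [(1, "Acme Press", "Austin"), (2, "Beta House", "Boston")],
)
cur.executemany(
    "INSERT INTO items (title, pub_id) VALUES (?, ?)",
    [("Book One", 1), ("Book Two", 1), ("Book Three", 2)],
)

# A join rebuilds the view that the flat file stores directly.
rows = cur.execute(
    "SELECT items.title, publishers.name, publishers.city "
    "FROM items JOIN publishers ON items.pub_id = publishers.pub_id "
    "ORDER BY items.item_id"
).fetchall()
print(rows)
```

In the related version, changing a publisher's city means updating one row in `publishers`; in the flat file, every row for that publisher must be changed.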
What Kinds of IE data might be useful? • Names (Persons, Corporate bodies) • Titles • Dates, Publishers, Places • Other physical details of packaging • Statements of editions, issues, etc. • Topics, genre, audiences, uses • Relationships
Two Forms of Data • Data that represent IE attributes and are simply recorded in some sequence • A subset of the foregoing: data used specifically for searching (called access points, index terms, etc.)
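The distinction between recorded attribute data and access points can be sketched in Python as an inverted index built over only some fields. The records, the choice of `title` and `creator` as access points, and the function `build_index` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical IE records: every field is recorded attribute data.
records = [
    {"id": 1, "title": "Organizing Information", "creator": "Smith, Jane", "pages": "312 p."},
    {"id": 2, "title": "Information Retrieval Basics", "creator": "Lee, Min", "pages": "180 p."},
]

# Only these fields are designated as access points (used for searching).
ACCESS_POINTS = ("title", "creator")


def build_index(recs):
    """Map each lower-cased word in an access-point field to the ids of matching records."""
    index = defaultdict(set)
    for rec in recs:
        for field in ACCESS_POINTS:
            for word in rec[field].lower().replace(",", " ").split():
                index[word].add(rec["id"])
    return index


index = build_index(records)
print(sorted(index["information"]))  # [1, 2]
print("312" in index)                # False: pages are recorded but not indexed
```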
Metadata & Metadata Formats • Metadata consists of strings of data within computers that record the attributes of informational objects (IEs). • Metadata formats are organized arrangements of categories of metadata
Original Use of the Term Metadata • Object = students; Data = attributes of students; Metadata = data about data. [Diagram: D = Data; M = Metadata]
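The original sense of "data about data" can be sketched in Python: the data describe students, while the metadata describe the data fields themselves. All names, values, and the `check_record` helper are hypothetical.

```python
# Data: attribute values about the objects (students). Values are hypothetical.
students = [
    {"name": "Ana Ruiz", "year": 2, "major": "Information Studies"},
    {"name": "Ben Cho", "year": 3, "major": "History"},
]

# Metadata: data about the data above; it describes fields, not any one student.
metadata = {
    "name": {"type": "text", "description": "Student's full name"},
    "year": {"type": "integer", "description": "Year of study"},
    "major": {"type": "text", "description": "Declared major"},
}


def check_record(record: dict) -> bool:
    """Use the metadata to check that a record has the expected fields and types."""
    expected = {"text": str, "integer": int}
    return record.keys() == metadata.keys() and all(
        isinstance(record[field], expected[spec["type"]])
        for field, spec in metadata.items()
    )


print(all(check_record(s) for s in students))  # True
```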
Use of the Term Metadata in Information Organization • When the object became an IE, it represented data in and of itself. • Therefore, what would the phrase “Metadata = data about data” mean? • Metadata came to mean all data inside the computer about
Metadata Formats The purpose of metadata formats is to “code” metadata in terms of categories. • The Categories have a wide variety of uses (e.g., content categories, computer instructions, formatting of content as text, etc.) • Some codes are used within databases only and are not generally seen by the information user (e.g., the codes in the MARC format) • Some codes are attached to metadata and text through “markup” in HTML or XML (though they are not usually seen by a user in a browser unless a special switch is clicked).
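A Python sketch of category codes that live inside a database but are replaced by readable labels for the user. The tags 100, 245, 260, and 650 are real MARC 21 field tags; the record values and helper names are hypothetical, and real MARC records also carry indicators and subfield codes that this simplification omits.

```python
# A few MARC 21 field tags and the metadata categories they code.
MARC_TAGS = {
    "100": "Main entry, personal name",
    "245": "Title statement",
    "260": "Publication information (imprint)",
    "650": "Subject, topical term",
}

# Hypothetical record as a database might hold it: (tag, value) pairs only.
# Indicators and subfield codes of real MARC records are omitted here.
record = [
    ("100", "Doe, Jane"),
    ("245", "Organizing information"),
    ("260", "Austin : Example Press, 2003"),
    ("650", "Information organization"),
]


def display(rec):
    """Replace each internal tag with a label an information user can read."""
    return [f"{MARC_TAGS.get(tag, 'Tag ' + tag)}: {value}" for tag, value in rec]


def values_for(rec, tag):
    """Pull out every value recorded under one category (tag)."""
    return [value for t, value in rec if t == tag]


for line in display(record):
    print(line)
print(values_for(record, "245"))  # ['Organizing information']
```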
Markup Languages • A text-processing language which embeds commands into the text that is to be processed. These commands then instruct a display device or a printer to carry out some formatting. From “Markup language,” A Dictionary of the Internet. Darrel Ince. Oxford University Press, 2001. Oxford Reference Online. Oxford University Press. 23 September 2003 <http://www.oxfordreference.com/views/ENTRY.html?subview=Main&entry=t12.002053>
From “A Gentle Introduction to SGML,” http://etext.virginia.edu/bin/tei-tocs?div=DIV1&id=SG • Historically, the word markup has been used to describe annotation or other marks within a text intended to instruct a compositor or typist how a particular passage should be printed or laid out. • Generalizing from that sense, we define markup, or (synonymously) encoding, as any means of making explicit an interpretation of a text. • By markup language we mean a set of markup conventions used together for encoding texts.
From “A Gentle Introduction to SGML” (cont’d) • A markup language must specify • what markup is allowed, • what markup is required, • how markup is to be distinguished from text, and • what the markup means. SGML provides the means for doing the first three; documentation such as these Guidelines is required for the last.
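A Python sketch, using the standard `xml.etree.ElementTree` module, of two of the four requirements: distinguishing markup from text, and checking that required markup is present. The element names and the "required" rule are hypothetical, and ElementTree is a non-validating parser, so it does not enforce which markup is allowed the way an SGML or XML DTD would.

```python
import xml.etree.ElementTree as ET

# Element names are hypothetical; they do not come from any published DTD.
marked_up = "<poem><title>Spring</title><line>The <emph>first</emph> green.</line></poem>"

# Hypothetical rule standing in for "what markup is required".
REQUIRED_ELEMENTS = {"title"}

root = ET.fromstring(marked_up)  # the parser separates tags (inside < >) from text

tags = [el.tag for el in root.iter()]  # the markup, in document order
plain_text = "".join(root.itertext())  # the text with the markup stripped out
missing = REQUIRED_ELEMENTS - set(tags)

print(tags)        # ['poem', 'title', 'line', 'emph']
print(plain_text)  # SpringThe first green.
print(missing)     # set()
```

What the markup means (for instance, that `emph` signals emphasis) appears nowhere in the code, which echoes the quotation's point that documentation is needed for the last requirement.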
Specific Markup “Languages” • SGML--Standard Generalized Markup Language • For texts • DTD--Document Type Definition • Header • HTML--Hypertext Markup Language • An application of SGML for marking up text for browsers in a platform-independent way
Specific Markup Languages (cont’d) • XML--Extensible Markup Language • Based on SGML, but adds the capacity to define or otherwise insert special categories. • XHTML--Extensible Hypertext Markup Language • Other markup languages, e.g., for every special purpose imaginable: Geography ML, Chemical ML, Gene Expression ML (GEML), Wireless ML, Rule ML (for XML), Theological ML, Bean ML (for JavaBean), etc.
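XML lets a record use element names that stand for metadata categories. This Python sketch builds such a record with the standard `xml.etree.ElementTree` module, using the real Dublin Core element namespace (`http://purl.org/dc/elements/1.1/`); the wrapper element `record`, the function `dc_record`, and the field values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC)          # serialize that namespace with the prefix "dc"


def dc_record(fields: dict[str, str]) -> str:
    """Build an XML record whose element names are Dublin Core categories."""
    root = ET.Element("record")  # hypothetical wrapper element, not a DC term
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dc_record({"title": "Organizing information", "creator": "Doe, Jane", "date": "2003"})
print(xml_text)

# A program can later retrieve a value by its category name.
parsed = ET.fromstring(xml_text)
print(parsed.find("dc:title", {"dc": DC}).text)  # Organizing information
```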
Why is a Knowledge of Markup Languages Important for Information Organization? • MLs contain document description capabilities. • MLs contain categories that can be used in databases. • At some point, an information organizer must use a markup language to display information organization data.
Metadata Category Codes • No metadata category codes will be useful unless they are consciously deployed in a computer program. • Metadata codes become especially useful for information organization when they are deployed in an IE organization system—i.e., in an IE database.
IE Databases • An IE database is an organized structure of metadata that is used for organizing and retrieving IEs in computers. • Organizing and retrieving IEs by means of a database is possible because the database allows us to manipulate the metadata in terms of the categories represented by the metadata.
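A minimal Python sketch, with the standard `sqlite3` module, of retrieving and grouping IEs by metadata category. The table `ie_metadata`, its columns, the sample rows, and the functions `find_by` and `count_by_subject` are hypothetical; each column plays the role of one metadata category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ie_metadata (title TEXT, creator TEXT, subject TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO ie_metadata VALUES (?, ?, ?, ?)",
    [
        ("Cataloging basics", "Doe, Jane", "Cataloging", 1998),
        ("Subject access today", "Roe, Sam", "Subject access", 2001),
        ("Advanced cataloging", "Doe, Jane", "Cataloging", 2002),
    ],
)

SEARCHABLE = {"title", "creator", "subject"}  # categories a user may search by


def find_by(category: str, value: str) -> list[tuple[str, int]]:
    """Return (title, year) for IEs whose metadata in `category` equals `value`, newest first."""
    if category not in SEARCHABLE:  # column names cannot be passed as SQL parameters
        raise ValueError(f"not a searchable category: {category}")
    sql = f"SELECT title, year FROM ie_metadata WHERE {category} = ? ORDER BY year DESC"
    return conn.execute(sql, (value,)).fetchall()


def count_by_subject() -> list[tuple[str, int]]:
    """Group IEs by the subject category and count each group."""
    return conn.execute(
        "SELECT subject, COUNT(*) FROM ie_metadata GROUP BY subject ORDER BY subject"
    ).fetchall()


print(find_by("creator", "Doe, Jane"))  # [('Advanced cataloging', 2002), ('Cataloging basics', 1998)]
print(count_by_subject())               # [('Cataloging', 2), ('Subject access', 1)]
```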
A General Maxim • A professional information entity organizer must understand the place of data, metadata, metadata formats, and databases in his or her work: • their general roles • the particular details of the specific systems used.