I502 Information Management Lecture 1 January 13, 2004
Outline • Growth in information • Basic information units and baselines • Dimensions of information management
Information Growth From 1.8 million to 26 million
Information Growth • In 1951 there were 10,000 journals; now there are about 140,000 • Estimate: printed/conventional information doubles every eight years • How much new information per person? According to the Population Reference Bureau, the world population is 6.3 billion; almost 800 MB of recorded information is produced per person each year. It would take about 30 feet of books to store the equivalent of 800 MB of information on paper * Source: Lyman, Peter and Hal R. Varian, "How Much Information", 2003. Retrieved from http://www.sims.berkeley.edu/how-much-info-2003 on Jan. 10, 2004.
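The per-person figure can be cross-checked against a yearly world total with a little arithmetic. The sketch below uses only the numbers quoted on this slide and assumes decimal units (1 MB = 10^6 bytes); the constant and function names are illustrative, not from the lecture.

```python
# Figures quoted on the slide (Lyman and Varian, 2003); decimal units assumed.
WORLD_POPULATION = 6.3e9     # people
BYTES_PER_PERSON = 800e6     # about 800 MB of new information per person per year
EXABYTE = 1e18               # bytes in 1 EB

# Total new information per year, in exabytes.
total_exabytes = WORLD_POPULATION * BYTES_PER_PERSON / EXABYTE


def doubling_growth(initial, years, doubling_period=8):
    """Amount after `years` if the quantity doubles every `doubling_period` years."""
    return initial * 2 ** (years / doubling_period)


print(f"New information per year: about {total_exabytes:.2f} EB")
print(f"Growth factor after 24 years: {doubling_growth(1, 24):.0f}x")
```

The product comes to roughly 5 EB per year, and doubling every eight years amounts to an eightfold increase over 24 years.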
Information Growth • In 1999 there were 800 million web pages; now there are at least 3 billion pages (as of this morning!) • Total volume of web content: • Surface web: 167 TB • Deep web: 91,850 TB * Source: Lyman, Peter and Hal R. Varian, "How Much Information", 2003. Retrieved from http://www.sims.berkeley.edu/how-much-info-2003 on Jan. 10, 2004.
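The page counts imply a growth rate, and the two volume figures imply a ratio. The sketch below works both out from the slide's numbers; treating the interval as four years (1999 to 2003) is an assumption, and the variable names are illustrative.

```python
import math

pages_1999 = 800e6   # web pages in 1999 (from the slide)
pages_now = 3e9      # "at least 3 billion" pages (from the slide)
years = 4            # assumption: count the interval as 1999 to 2003

# Compound annual growth rate implied by the two page counts.
annual_growth = (pages_now / pages_1999) ** (1 / years) - 1

# Doubling time, in years, implied by that growth rate.
doubling_time = math.log(2) / math.log(1 + annual_growth)

# Deep-web volume relative to surface-web volume (both in TB, from the slide).
deep_to_surface = 91850 / 167

print(f"Implied annual growth: {annual_growth:.0%}")
print(f"Implied doubling time: {doubling_time:.1f} years")
print(f"Deep web is about {deep_to_surface:.0f} times the surface web")
```

Under that assumption the web was roughly doubling every two years, far faster than the eight-year doubling estimated for printed information.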
A Major Factor Behind Growth • Shifts in major economies in the world • From Agro-Industrial -> Service • Information-driven businesses such as banking, entertainment, computing, publishing dominate
Information Processing -> Competitive Advantage • In an information-based economy, the capability to store, organize, process, and learn from information is critical to surviving in the marketplace
Info growth in relation to Info Technology (timeline) • 1950, 1960: Random access files introduced; disk drives available • 1969: ARPANET commissioned by DoD, 4 nodes • 1970: Relational model introduced • 1973: Intel 8080 microprocessor, entire CPU on a chip; cost then $400, now $1 or less • 1984: DNS introduced; hosts now 1,000; symbolics.com registration ('85) • 1991: WWW developed by Tim Berners-Lee (TBL); NSFNET backbone now T3; traffic 1 trillion bytes/month • 1993: Feb., Mosaic introduced by NCSA; by Oct., 500 servers; Nov., Mosaic for Mac and Wintel • 1995: Netscape goes public; Java launched • 1998: Internet hosts reach 30 million; WWW sites reach 2,200,000; 320 million web pages; XML becomes a W3C standard • 1999: 4.3 million web servers; 800 million web pages • 2003: 3 billion web pages indexed by Google
Influence of Info Tech on Growth of Info • Computers make production, manipulation and distribution of data easier … leading to more info • With popularity of computers, data is becoming digital …& there is more of it …
Basic Units of Information • Digital Units of Data • 0 or 1 = a single bit • 8 bits = 1 byte • 1000 bytes = 1 kilobyte (1 KB) = 10^3 bytes • 1000 KB = 1 megabyte (1 MB) = 10^6 bytes • 1000 MB = 1 gigabyte (1 GB) = 10^9 bytes • 1000 GB = 1 terabyte (1 TB) = 10^12 bytes • 1000 TB = 1 petabyte (1 PB) = 10^15 bytes • 1000 PB = 1 exabyte (1 EB) = 10^18 bytes
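The unit ladder above can be turned into a small converter. This is a minimal sketch using the slide's decimal convention (each unit is 1000 times the previous one); the names `UNITS`, `to_bytes`, and `human_readable` are illustrative.

```python
# Decimal units as on the slide: each step up is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def to_bytes(amount, unit):
    """Convert an amount in the given unit to bytes, e.g. to_bytes(650, 'MB')."""
    return amount * 1000 ** UNITS.index(unit)


def human_readable(num_bytes):
    """Format a byte count with the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at EB if nothing larger exists.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


print(human_readable(650_000_000))   # a CD-ROM
print(human_readable(to_bytes(17, "GB")))
```

Note that some tools use binary units instead (1 KB = 1024 bytes), which gives slightly different figures at every step.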
Baselines
Unit        Amount (bytes)               Example
Byte        1                            one character
Kilobyte    30,000                       image of a book page
            500,000                      a typical novel in text format
Megabyte    1,400,000                    a 3.5 inch disk
            10,000,000                   a Mozart symphony, compressed
            20,000,000                   a digitized scanned book
            650,000,000                  a CD-ROM disk
Gigabytes   10,000,000,000               a digitized movie, compressed
            17,000,000,000               a DVD disk
Terabytes   10,000,000,000,000           Library of Congress (LC) print collection
Petabytes   2,000,000,000,000,000        Content of all US research libraries
Exabytes    5,000,000,000,000,000,000    New information produced in 2002 (92% on magnetic media – hard disk)
Growth Numbers to Ponder • In 2001, Walmart's data warehouse (DW) was roughly half the size of the world's largest library (11 TB)
Growth Numbers to Ponder • A high-resolution astronomical camera can generate data equal to about half the size of Walmart's DW in about eight hours!
Possible Consequences of Digitization • Convergence in industry, motivated by multi-purposing of content (diagram labels: Broadcast, Mass Media, Publishing, Computing, Communication; by Nicholas Negroponte, MIT)
Possible Consequences of Growth • Data Glut • Data ≠ knowledge • Data ≠ decision • Interesting book -> Data Smog: Surviving the Information Glut by David Shenk
Possible Consequences of Growth • Computers are double-edged swords: they help produce more data but, if used properly, can also help manage and transform data
Transformation of Data • Data -> Knowledge • Requires a two-pronged approach • IM Macro level - Broad understanding of info management technologies • IM Micro level - Deep understanding of data modeling, organization, retrieval, and analysis
Dimensions of IM Macro Level • Data and collection building • Architecture • Networked access • Users and social impact
Different Types of Data • Need to store and serve text, operational/transactional data, statistics, image, audio, video • Many primary formats, e.g., ASCII, proprietary formats, PostScript, LaTeX, GIF, JPEG, AIFF, QuickTime ... • Many secondary formats, e.g., PKZIP, uuencode, tar, UNIX compress ...
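The distinction between primary and secondary formats can be illustrated by wrapping a primary-format file in a secondary one. The sketch below packs ASCII text into a gzip-compressed tar archive in memory and unpacks it again; gzip stands in for UNIX compress here, and the file name and sample text are made up for the example.

```python
import io
import tarfile

# Primary format: plain ASCII text (sample content invented for illustration).
text = ("Information management lecture notes. " * 200).encode("ascii")

# Secondary format: wrap the text in a tar archive and compress it with gzip.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    info = tarfile.TarInfo(name="notes.txt")
    info.size = len(text)
    archive.addfile(info, io.BytesIO(text))
packed = buffer.getvalue()

# Unpacking recovers the primary-format bytes unchanged.
with tarfile.open(fileobj=io.BytesIO(packed), mode="r:gz") as archive:
    restored = archive.extractfile("notes.txt").read()

print(f"original: {len(text)} bytes, packed: {len(packed)} bytes")
print("round trip intact:", restored == text)
```

A system that stores and serves data therefore has to recognize both layers: the wrapper it receives and the content format inside it.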
Aggregating Data • Databases = structured data • Digital Libraries = both structured and unstructured data • Data Warehouses = extracted, filtered, classified, integrated, and summarized data • Primary data must be accurate; DW data must be "curated"
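The difference between an operational database and a data warehouse can be sketched with SQLite. In this illustrative example the `sales` table plays the role of primary transactional data, and `sales_summary` is a warehouse-style table built by extracting, filtering, and summarizing it; the schema and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Operational (primary) data: one row per transaction.
conn.execute(
    "CREATE TABLE sales (store TEXT, item TEXT, amount REAL, valid INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("north", "book", 12.0, 1),
        ("north", "cd", 15.0, 1),
        ("south", "book", 10.0, 1),
        ("south", "book", -1.0, 0),  # flagged as a bad record
    ],
)

# Warehouse-style step: extract, filter out bad rows, and summarize per store.
conn.execute(
    """
    CREATE TABLE sales_summary AS
    SELECT store, COUNT(*) AS n_sales, SUM(amount) AS revenue
    FROM sales
    WHERE valid = 1
    GROUP BY store
    """
)

summary = conn.execute(
    "SELECT store, n_sales, revenue FROM sales_summary ORDER BY store"
).fetchall()
for row in summary:
    print(row)
```

The filtering step is a small instance of curation: the summary is only as trustworthy as the rules that decide which primary records it includes.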
Info Architecture • The structure or organization of information can directly influence interaction • IM systems are designed with close attention to navigation, search, and means for access to information • The user interface (UI) is designed with specific attention to user needs and their background • UI provides immediate feedback about organization and supports multiple means of accessing information
Networked Access • Resources may be at different locations • Distributed access is supported, meaning users can get at data from any network-accessible device • Data is also generally available at any time; this implies disintermediation, the absence of a human intermediary between user and data
Embedded in WWW (diagram; labels: Browser, Local Files, Internet, Web Server, Backend systems, with Script shown at the browser, the web server, and the backend systems)
Users and social impact • Must remember: technology change is revolutionary, but human change is evolutionary; seek balance • Paper is highly portable and culturally supported • Not everything can be digitized • Sensitivity to human and social issues is needed (HCI and legal issues can be critical)
IM Micro level • Data modeling • Data normalization and relational model (discrete data) • Handling full-text data and non-text data • WWW based IMS • User Interface design
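As a preview of data normalization and the relational model, the sketch below stores each fact once and links tables by key. The `customer` and `purchase` tables and their rows are hypothetical, chosen only to show the idea.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- Each customer's details are stored once.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    -- Purchases refer to customers by key instead of repeating their details.
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        item        TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ada', 'Bloomington'), (2, 'Lin', 'Indianapolis');
    INSERT INTO purchase VALUES (1, 1, 'notebook'), (2, 1, 'pen'), (3, 2, 'atlas');
    """
)

# Because the city is stored once, a change of address updates a single row.
db.execute("UPDATE customer SET city = 'Chicago' WHERE customer_id = 1")

# A join reassembles the combined view on demand.
rows = db.execute(
    """
    SELECT c.name, c.city, p.item
    FROM purchase AS p
    JOIN customer AS c ON c.customer_id = p.customer_id
    ORDER BY p.purchase_id
    """
).fetchall()
for row in rows:
    print(row)
```

Had the city been repeated on every purchase row, the same update would have had to touch several rows, with the risk of leaving them inconsistent.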