Managing Data for Maximum Utility: From Tables and Spreadsheets to Relational Databases and XML
Caryn Anderson, Simmons College
Boston, MA, 22 April 2006
• What are Data?
  • Data, Information, Knowledge, Intelligence
  • Data personality
  • Data Evolution Continuum
• What matters when managing Data?
  • People
  • Preservation
• What tools can you use?
  • Tables
  • Spreadsheets
  • Relational Databases
  • XML
• Where can you learn more?
  • Tools
  • Information Visualization
What are Data?
Data, Information, Knowledge & Intelligence: what are the distinctions?
What are Data?
• Data, Information, Knowledge & Intelligence
  “Dictionaries define
  • data as factual information (measurements or statistics) used as a basis for reasoning, discussion, or calculation;
  • information as the communication or reception of knowledge or intelligence;
  • knowledge as the condition of knowing something gained through experience or the condition of apprehending truth or fact through reasoning, and
  • intelligence as the ability to understand and to apply knowledge.”
  (Bouthillier & Shearer, 2002)
What are Data?
• Data, Information, Knowledge & Intelligence
  “Knowledge differs from information in that it is predictive and can be used to guide action while information merely is data in context. For example, if the raw data is –10 degrees, then information would be it is –10 degrees outside, and the knowledge would be that –10 degrees is cold and one must dress warmly. In other words, knowledge is closer to action while information could be seen as documentation of any of pieces of knowledge.”
  (Bouthillier & Shearer, 2002)

Reference
Bouthillier, F., & Shearer, K. (2002, October). Understanding knowledge management and information management: The need for an empirical perspective. Information Research, 8(1). Retrieved April 17, 2006, from http://informationr.net/ir/8-1/paper141.html
What are Data?
• Data Personality
  • type: text, numbers, digital objects, …
  • context: financial, scientific, personnel, health, inventory, operational, scheduling, reference, assessment, …
  • confidentiality: by choice, by law, …
  • volume: small, large, potential for growth, …
What are Data?
Data Evolution Continuum
• Generation: types, source info, confidentiality, clean or dirty
• Storage: space, formats, metadata, deposit/input, updates
• Interaction: sharing, context, analysis, meaning, more data
• Presentation: summative, predictive, dynamic
What are Data?
Data Evolution Continuum
• Generation → Storage → Interaction → Presentation
• Data: measurements, statistics, facts
• Information: add context, meaning
• Knowledge: summarize, predict, help decisions, spur action
• Often an iterative process, frequently requiring more data and/or more analysis
What matters when managing Data?
• Poor data management decisions are frequently the result of considering only:
  • the types of data involved, and
  • the tools that the person responsible is familiar with.
• “When the only tool you have is a hammer, you tend to treat everything as if it were a nail.” (Abraham Maslow)
What matters when managing Data?
• The two most important factors:
  • People: data should be easy for everyone who encounters them to engage with and to understand.
  • Preservation: security, privacy and integrity risks increase the more the data are handled.
What matters when managing Data?
• People
  • Who will enter, interact with, or view the data?
    • affiliation
    • level of personal investment
    • intellectual competence
    • technical competence
  • What will they do with it?
    • record
    • analyze
    • make decisions
  • How will they access it?
    • information retrieval
    • information visualization
What matters when managing Data?
• Preservation
  • What are the data personalities?
    • type
    • context
    • confidentiality
    • volume
  • Where do they hang out?
    • home
    • work/play
    • wandering
  • When is there the greatest risk?
    • risk areas
    • legal obligations
    • risk agents
What tools can you use?
• Tables
  • Tables 101
  • Pros
  • Cons
  • Examples
    • OVDLT Literature Review
    • MLIP Cohort Meeting Schedule
    • Leadership Model
What tools can you use?
• Tables
  • Pros
    • easy to learn and use in Word documents
    • reasonable control of presentation
    • some sorting allowed
    • good for one-time-only, non-duplicating, non-interacting data
  • Cons
    • duplication of data
    • no calculations
    • data cannot easily be re-used
    • only simple alignment
    • other…
What tools can you use?
• Tables
  • Examples
    • OVDLT (Open Video Digital Library Toolkit) Literature Review
    • MLIP (Managerial Leadership in the Information Professions) Cohort Meeting Schedule
    • MLIP Leadership Model
What tools can you use?
• Spreadsheets
  • Spreadsheets 101
  • Pros
  • Cons
  • Examples
    • NEASIS&T Registration
    • Simmons ERM
    • JMA reports
What tools can you use?
• Spreadsheets
  • Pros
    • best for numbers and currency
    • sophisticated sorting and calculating
    • visualization of information (charts)
    • can feed a mail merge with Word documents
  • Cons
    • data must be duplicated to represent multi-faceted relationships
    • selecting portions of the data is complicated
    • re-using data requires complex, error-prone, manual formulation
    • formulas are contained within discrete cells, so there is no calculation on the fly
    • other…
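The mail-merge point above can be illustrated with a rough sketch of what a merge does: each spreadsheet row fills in the placeholders of one copy of a letter. This assumes the sheet has been exported to CSV; the column names (`first_name`, `event`, `fee`), the sample rows, and the `mail_merge` function are hypothetical, and a word processor's mail merge works through its own interface rather than through code like this.

```python
import csv
import io
from string import Template

# Hypothetical registration sheet, as it might look after exporting a spreadsheet to CSV.
SHEET = """first_name,last_name,event,fee
Ada,Lovelace,Spring Workshop,25.00
Grace,Hopper,Spring Workshop,25.00
"""

# The hypothetical "main document": each $placeholder names a spreadsheet column ($$ is a literal $).
LETTER = Template("Dear $first_name,\nThank you for registering for the $event. "
                  "Your fee of $$$fee has been received.\n")

def mail_merge(sheet_text: str, template: Template) -> list[str]:
    """Produce one filled-in letter per spreadsheet row (extra columns are ignored)."""
    rows = csv.DictReader(io.StringIO(sheet_text))
    return [template.substitute(row) for row in rows]

if __name__ == "__main__":
    for letter in mail_merge(SHEET, LETTER):
        print(letter)
```

The letter text is written once and the data live in one place, which is exactly why a spreadsheet is a convenient merge source.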
What tools can you use?
• Spreadsheets
  • Examples
    • NEASIS&T (New England chapter of the American Society for Information Science & Technology) Event Registration
    • Simmons ERM (Electronic Resource Management)
    • JMA (John More Association) Reports
What tools can you use?
• Relational Databases
  • Databases 101
  • Pros
  • Cons
  • Examples
    • I2S Network (Access)
    • ERUS (and ERMI guidelines)
    • Open Video
    • Backpackit
    • del.icio.us
    • blogs
What tools can you use?
• Database Development
  • Clarify your thinking about data and user scenarios
  • Visualize the relationships between the entities
  • Specify the details of the entities and relationships
  • Build the database according to specifications
  • Test the database against user scenarios
  • Adjust the structure, editing or reporting functions
What tools can you use?
• Clarify
  • Introduction / History
  • Collection Description
  • Users
  • User Activities (Needs)
  • User Personas with specific use scenarios
What tools can you use?
• Visualize
  • Entities
    • attributes
  • Relationships
    • one-to-one
    • one-to-many
    • many-to-many
    • recursive
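To see how each relationship type typically ends up in a relational database, here is a small schema sketch using Python's built-in `sqlite3` module and an in-memory database. The tables (`organization`, `person`, `badge`, `event`, `registration`), their rows, and the helper functions are hypothetical examples chosen for illustration; they are not taken from the examples in this talk.

```python
import sqlite3

# Hypothetical schema showing one of each relationship type.
SCHEMA = """
CREATE TABLE organization (          -- one organization ...
    org_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
);
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    org_id    INTEGER REFERENCES organization(org_id),  -- ... has many people (one-to-many)
    mentor_id INTEGER REFERENCES person(person_id)      -- a person mentors other people (recursive)
);
CREATE TABLE badge (                 -- each person has at most one badge (one-to-one):
    person_id    INTEGER PRIMARY KEY REFERENCES person(person_id),  -- the key is also the foreign key
    badge_number TEXT UNIQUE NOT NULL
);
CREATE TABLE event (
    event_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL
);
CREATE TABLE registration (          -- people attend many events, events have many people (many-to-many)
    person_id INTEGER REFERENCES person(person_id),
    event_id  INTEGER REFERENCES event(event_id),
    PRIMARY KEY (person_id, event_id)
);
"""

def build(conn):
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO organization VALUES (?, ?)", [(1, "Library A"), (2, "Library B")])
    conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)",
                     [(1, "Ana", 1, None), (2, "Ben", 1, 1), (3, "Cai", 2, 1)])
    conn.executemany("INSERT INTO badge VALUES (?, ?)", [(1, "B-001"), (2, "B-002")])
    conn.executemany("INSERT INTO event VALUES (?, ?)", [(1, "Workshop"), (2, "Lecture")])
    conn.executemany("INSERT INTO registration VALUES (?, ?)", [(1, 1), (2, 1), (2, 2), (3, 2)])
    conn.commit()

def attendees(conn, title):
    """Many-to-many: follow the registration table from an event to its people."""
    rows = conn.execute("""
        SELECT p.name FROM person p
        JOIN registration r ON r.person_id = p.person_id
        JOIN event e ON e.event_id = r.event_id
        WHERE e.title = ? ORDER BY p.name""", (title,))
    return [name for (name,) in rows]

def mentees(conn, mentor_name):
    """Recursive: join the person table to itself."""
    rows = conn.execute("""
        SELECT p.name FROM person p JOIN person m ON p.mentor_id = m.person_id
        WHERE m.name = ? ORDER BY p.name""", (mentor_name,))
    return [name for (name,) in rows]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    print(attendees(conn, "Workshop"))
    print(mentees(conn, "Ana"))
```

The many-to-many case needs its own linking table (`registration`); that table is what lets one fact be recorded once instead of being duplicated, as it would be in a spreadsheet.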
What tools can you use?
• Specify
  • Data Dictionary
  • Relational Schema
  • Attributes
  • Relationships
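A data dictionary is often kept as a document, but it can also drive the build and later serve as a check on it. The minimal sketch below assumes a hypothetical `event` entity described as (column, type, constraint, description) entries. It generates the `CREATE TABLE` statement from that dictionary and compares the result against what SQLite reports through `PRAGMA table_info`, which is SQLite-specific; other database systems expose the same information differently.

```python
import sqlite3

# A hypothetical data dictionary for one entity: (column, SQL type, constraint, description).
DATA_DICTIONARY = {
    "event": [
        ("event_id", "INTEGER", "PRIMARY KEY", "Unique identifier assigned by the system"),
        ("title", "TEXT", "NOT NULL", "Event name as it appears in announcements"),
        ("event_date", "TEXT", "NOT NULL", "ISO 8601 date, e.g. 2006-04-22"),
        ("fee", "REAL", "", "Registration fee in US dollars; empty if free"),
    ],
}

def create_statements(dictionary):
    """Generate one CREATE TABLE statement per entity in the data dictionary."""
    for table, columns in dictionary.items():
        cols = ",\n    ".join(f"{name} {sqltype} {constraint}".rstrip()
                              for name, sqltype, constraint, _ in columns)
        yield f"CREATE TABLE {table} (\n    {cols}\n);"

def check_against_database(conn, dictionary):
    """Compare the built tables with the specification; return a list of mismatches."""
    problems = []
    for table, columns in dictionary.items():
        # Each table_info row is (cid, name, declared type, notnull, default, pk).
        # Table names come from our own dictionary, not from user input.
        actual = [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]
        expected = [(name, sqltype) for name, sqltype, _, _ in columns]
        if actual != expected:
            problems.append(f"{table}: expected {expected}, found {actual}")
    return problems

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for statement in create_statements(DATA_DICTIONARY):
        print(statement)
        conn.execute(statement)
    print(check_against_database(conn, DATA_DICTIONARY) or "database matches the data dictionary")
```

The descriptions are never used by the database itself; they are there for the people who enter and interpret the data.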
What tools can you use?
• Build
  • Select a data management tool
  • Build according to specs
• Test
  • Test the database against user scenarios
• Adjust
  • Adjust structures, interfaces and/or reporting functions
What tools can you use?
• Relational Databases
  • Pros
    • handle complex, multi-faceted relationships
    • analysis is easily customized
    • easy selection of sub-groups of data
    • scalable to large amounts of data
    • accessible from web interfaces (though with some work)
  • Cons
    • steeper learning curve than tables and spreadsheets
    • difficult to see all data at a glance
    • if not web-accessible, partners must have the same application
    • other…
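The “easy selection of sub-groups” and “analysis easily customized” pros mostly come down to SQL. A sub-group is a `WHERE` clause and a summary is a `GROUP BY`, instead of a manually copied range of cells. Here is a minimal sketch with Python's `sqlite3`; the `registration` table, its columns, and its rows are hypothetical.

```python
import sqlite3

def load(conn):
    # Hypothetical registration data; member is 1 for members, 0 otherwise.
    conn.execute("CREATE TABLE registration (name TEXT, affiliation TEXT, member INTEGER, fee REAL)")
    conn.executemany("INSERT INTO registration VALUES (?, ?, ?, ?)", [
        ("Ana", "Simmons", 1, 20.0),
        ("Ben", "Simmons", 0, 35.0),
        ("Cai", "Harvard", 1, 20.0),
        ("Dee", "MIT", 0, 35.0),
    ])

def members_only(conn):
    # A sub-group: only the rows that meet a condition.
    return [n for (n,) in conn.execute(
        "SELECT name FROM registration WHERE member = 1 ORDER BY name")]

def fees_by_affiliation(conn):
    # A customized analysis: change the GROUP BY column to summarize a different way.
    return conn.execute("""
        SELECT affiliation, COUNT(*), SUM(fee)
        FROM registration GROUP BY affiliation ORDER BY affiliation""").fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn)
    print(members_only(conn))
    print(fees_by_affiliation(conn))
```

Neither query changes the stored data, so any number of such views can be run against the same table.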
What tools can you use?
• Relational Databases
  • Examples
    • I2S (Integration and Information Sciences) Network (Access)
    • ERUS (Electronic Resource Usage Statistics)
    • Open Video
    • Backpackit
    • del.icio.us
    • Blogs
What tools can you use?
• XML
  • XML 101
  • Pros
  • Cons
  • Examples
    • OAI-PMH
    • EAD / MODS
    • ICISC RSS
    • SUSHI
    • Bloglines
What tools can you use?
The Display of the Document

My First XML
  Chapter 1: Introduction to XML
    What is HTML?
    What is XML?
  Chapter 2: XML Syntax
    Elements must have a closing tag
    Elements must be properly nested

original slide content courtesy of Shaoping Moss
What tools can you use?
HTML Elements/Tags
An HTML document describes the book:

    …
    <h1>My First XML</h1>
    <h2>Introduction to XML</h2>
    <p>What is HTML?</p>
    <p>What is XML?</p>
    <h2>XML Syntax</h2>
    <p>Elements must have a closing tag.</p>
    <p>Elements must be properly nested.</p>
    …

HTML elements/tags:
• are defined by the HTML standard
• are always the same
• can be used in any order

original slide content courtesy of Shaoping Moss
What tools can you use?
XML Elements/Tags
An XML document describes the book:

    …
    <book>
      <title>My First XML</title>
      <chapter>Introduction to XML
        <para>What is HTML?</para>
        <para>What is XML?</para>
      </chapter>
      <chapter>XML Syntax
        <para>Elements must have a closing tag.</para>
        <para>Elements must be properly nested.</para>
      </chapter>
    </book>
    …

XML elements/tags:
• are defined by users or groups (in a DTD or Schema)
• differ from one DTD/Schema to another
• are hierarchical (a tree structure)

original slide content courtesy of Shaoping Moss
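Because XML elements form a tree, a program can walk the structure and know what each piece of text *is* (a title, a chapter, a paragraph), which the HTML version does not record. Here is a minimal sketch using Python's standard `xml.etree.ElementTree` with the book example from the slide. The `outline` function is a hypothetical helper written for illustration.

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
  <title>My First XML</title>
  <chapter>Introduction to XML
    <para>What is HTML?</para>
    <para>What is XML?</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag.</para>
    <para>Elements must be properly nested.</para>
  </chapter>
</book>"""

def outline(xml_text):
    """Return (depth, tag, text) for every element, in document order."""
    root = ET.fromstring(xml_text)
    result = []

    def visit(element, depth):
        # .text holds only the text before the first child, e.g. the chapter heading.
        result.append((depth, element.tag, (element.text or "").strip()))
        for child in element:
            visit(child, depth + 1)

    visit(root, 0)
    return result

if __name__ == "__main__":
    for depth, tag, text in outline(BOOK):
        print("  " * depth + f"<{tag}> {text}")
    # The tree also supports targeted questions, e.g. every paragraph of chapter 2:
    second = ET.fromstring(BOOK).findall("chapter")[1]
    print([p.text for p in second.findall("para")])
```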
What tools can you use?
XML is flexible and extensible
An XML document describes the book for a different user group:

    …
    <manuscript>
      <name>My First XML</name>
      <part>Introduction to XML
        <section>What is HTML?</section>
        <section>What is XML?</section>
      </part>
      <part>XML Syntax
        <section>Element Rules</section>
        <para>Elements must have a closing tag.</para>
        <para>Elements must be properly nested.</para>
      </part>
    </manuscript>
    …

• “manuscript” is used instead of “book”
• the vocabulary is extended to accommodate greater detail: “part”, “section” AND “paragraph”

original slide content courtesy of Shaoping Moss
What tools can you use?
Differences between HTML and XML
XML is not a replacement for HTML; the two were designed with different goals:
• XML was designed to describe data, focusing on what the data is.
• HTML was designed to display data, focusing on how the data looks.
HTML structure and tags are very loose, while XML structure and tags are strict:
• XML documents must be well-formed.
• XML elements must be properly nested.
• All XML elements must be closed.
• Tag names are case-sensitive, so an element's start and end tags must match exactly in case.

original slide content courtesy of Shaoping Moss
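You can see these strictness rules directly with any XML parser, because it refuses input that is not well-formed. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`. This parser checks well-formedness only, not validity against a DTD or Schema. The sample strings and the `is_well_formed` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: one well-formed document and three that break the rules above.
SAMPLES = {
    "well-formed": "<book><title>My First XML</title></book>",
    "missing closing tag": "<book><title>My First XML</book>",
    "improper nesting": "<book><title><para>text</title></para></book>",
    "case mismatch": "<book><Title>My First XML</title></book>",
}

def is_well_formed(xml_text):
    """True if the parser accepts the text, False if it raises a ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(f"{label}: {'accepted' if is_well_formed(text) else 'rejected'}")
```

A web browser, by contrast, will usually render the equivalent mistakes in HTML without complaint.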
What tools can you use?
Differences: HTML vs. XML
Content
• HTML: held in generic containers (<h1>, <p>, etc.)
• XML: held in specific containers that describe what the data is (<book>, <chapter>, etc.)
Format
• HTML: in the default format of the content tag, or as defined by a Cascading Style Sheet (internal or external)
• XML: XSLT files define the format of each section (e.g., font, color, size); there can be multiple XSLTs for the same XML
Selection & Organization
• HTML: all content is always included (there is no easy way to select or suppress content; you must manually change the document), and content is displayed only in the order written (to change the order, you must manually change the document)
• XML: XSLT selects content and determines its order of display; there can be multiple XSLTs for the same XML (one to produce just a list of book titles, one to display the full text, one for citations, etc.)

original slide content courtesy of Shaoping Moss
What tools can you use?
Differences: HTML vs. XML, an analogy
• HTML is like an address list in a plain Word document.
  What you get: one document of your list of contacts, with all the information that you have for each person, in the order you typed it.
• XML is like an address list in a database or mail-merge data file.
  What you get:
  • Friends & Family with full addresses for holiday cards
  • an e-mail list of just professional contacts for announcing a new product
  • special formatting of the whole list for better display on a PDA
  • etc., etc., etc., all from the SAME XML document

original slide content courtesy of Shaoping Moss
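The address-list analogy can be made concrete. In the XML approach, each view would be an XSLT stylesheet. Python's standard library has no XSLT processor, so the sketch below substitutes plain functions over `xml.etree.ElementTree` to play the role of the stylesheets. The `<contacts>` vocabulary (`group`, `name`, `email`, `address`), the sample entries, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# One hypothetical data source; the group attribute says what kind of contact each one is.
CONTACTS = """<contacts>
  <contact group="family"><name>Pat Lee</name><email>pat@example.org</email>
    <address>12 Elm St, Boston, MA</address></contact>
  <contact group="professional"><name>Sam Ortiz</name><email>sam@example.com</email>
    <address>1 Main St, Cambridge, MA</address></contact>
  <contact group="friend"><name>Ali Khan</name><email>ali@example.net</email>
    <address>9 Oak Ave, Salem, MA</address></contact>
</contacts>"""

def holiday_card_labels(root):
    """View 1 (select, order, format): friends and family with full addresses, sorted by name."""
    chosen = [c for c in root.findall("contact") if c.get("group") in ("family", "friend")]
    chosen.sort(key=lambda c: c.findtext("name"))
    return [f"{c.findtext('name')}\n{c.findtext('address')}" for c in chosen]

def announcement_list(root):
    """View 2: only the e-mail addresses of professional contacts."""
    return [c.findtext("email") for c in root.findall("contact[@group='professional']")]

if __name__ == "__main__":
    root = ET.fromstring(CONTACTS)    # one data source ...
    print(holiday_card_labels(root))  # ... many differently selected, ordered and formatted views
    print(announcement_list(root))
```

Adding a third view (for example, a compact list for a PDA) would mean writing one more function, or in XML terms one more stylesheet, without touching the data.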
What tools can you use?
• How to build an XML file family
  • Establish the Document Type Definition (DTD) or Schema.
  • Write a well-formed XML document that holds your data in the containers established by your DTD/Schema.
  • Validate your XML document to make sure it conforms to your DTD/Schema.
  • Build as many different XSL documents as you need to select data from your XML file, organize it the way you want it to appear, and format it so it looks the way you want.
  • Now you can link your XML file to whatever XSL you want, to get the kind of display you want at any given time.

original slide content courtesy of Shaoping Moss
What tools can you use?
The XML family unit of files and languages
• XML (file type: .xml): where the data is held
• XSL (file type: .xsl): instructions for using the XML data and displaying it
  • uses XSLT to select data from the .xml file and format it
  • uses XPath to access certain spots in the .xml file
  • uses XSL-FO (XSL Formatting Objects) for specifying formatting semantics
• DTD or Schema (file types: .dtd, or .xsd for XML Schemas, which are themselves XML documents): the organizational chart for the data, used for validation during creation
How a web page is produced (e.g., http://www.mysite.org/myfile.xml):
1. The browser calls the .xml file.
2. The .xml file calls the .xsl file for display instructions.
3. The .xsl looks in the .xml for content.
4. The .xml returns content to the .xsl.
5. The content is displayed in the browser, using HTML for formatting.
What tools can you use?
The DTD or Schema
The DTD establishes the hierarchy of elements/tags. A comma-separated list means the child elements must appear in exactly that order, and a + after an element name means the element must appear at least once and may repeat as many times as you want.

    <!ELEMENT booklist (book+)>
    <!ELEMENT book (booktitle,author+,country,publisher,price,year)>
    <!ELEMENT booktitle (#PCDATA)>
    <!ELEMENT author (#PCDATA)>
    <!ELEMENT country (#PCDATA)>
    <!ELEMENT publisher (#PCDATA)>
    <!ELEMENT price (#PCDATA)>
    <!ELEMENT year (#PCDATA)>

original slide content courtesy of Shaoping Moss
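A DTD content model such as `(booktitle,author+,country,publisher,price,year)` behaves like a pattern over the sequence of child element names, which is one way to see what the `+` means. Python's standard XML tools do not validate against a DTD, so this sketch hand-translates that one content model into a regular expression. It only illustrates the kind of check a validating parser performs (it ignores `#PCDATA` and every other declaration) and is not a substitute for one. `BOOK_MODEL`, `book_conforms`, and the sample data are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (booktitle,author+,country,publisher,price,year), hand-translated:
# each element name is followed by one space, and "+" means "one or more".
BOOK_MODEL = re.compile(r"booktitle (author )+country publisher price year ")

def book_conforms(book):
    """Check one <book> element's child sequence against the content model."""
    sequence = "".join(child.tag + " " for child in book)
    return BOOK_MODEL.fullmatch(sequence) is not None

SAMPLE = """<booklist>
  <book><booktitle>A</booktitle><author>X</author><author>Y</author>
        <country>USA</country><publisher>P</publisher><price>1</price><year>2000</year></book>
  <book><booktitle>B</booktitle>
        <country>USA</country><publisher>P</publisher><price>1</price><year>2000</year></book>
</booklist>"""

if __name__ == "__main__":
    for book in ET.fromstring(SAMPLE).findall("book"):
        print(book.findtext("booktitle"), book_conforms(book))
    # A True   (two authors are fine)
    # B False  (author+ requires at least one author)
```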
What tools can you use?
The XML Document

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE booklist SYSTEM "dtdforbooklist.dtd">
    <?xml-stylesheet type="text/xsl" href="xslforbooklist.xsl"?>
    <booklist>
      <book>
        <booktitle>HTML and XHTML: the Definitive Guide</booktitle>
        <author>Chuck Musciano</author>
        <author>Bill Kennedy</author>
        <country>USA</country>
        <publisher>O'Reilly</publisher>
        <price>19.95</price>
        <year>2000</year>
      </book>
      <book>
        <booktitle>XHTML 1.0 Language Sourcebook</booktitle>
        <author>Ian S. Graham</author>
        <country>USA</country>
        <publisher>John Wiley and Sons</publisher>
        <price>30.00</price>
        <year>2000</year>
      </book>
    </booklist>

• The DOCTYPE declaration names the root element (booklist) and says which DTD is being used.
• The xml-stylesheet instruction says which XSL is being used.

original slide content courtesy of Shaoping Moss
What tools can you use?
Validate your XML file
Upload your XML file to this validator: http://www.stg.brown.edu/service/xmlvalid/
You will need to do one of two things. You can place your DTD on a web server so that the validator can find it, and put the right URL in the DOCTYPE declaration at the top of your XML file. Or you can put the DTD declarations inside your XML file, at the top, within square brackets in the DOCTYPE declaration. The validation service has an FAQ, but if you are getting stuck, it might be a good time for some remedial XML at W3Schools: http://www.w3schools.com/
What tools can you use?
The XSL Document

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:template match="/">
        <html>
          <body>
            <h1>My Book Collection</h1>
            <table border="1">
              <tr bgcolor="#9acd32">
                <th>Title</th>
                <th>Author</th>
                <th>Publisher</th>
                <th>Country</th>
                <th>Price</th>
              </tr>
              <xsl:for-each select="booklist/book">
                <xsl:sort select="publisher"/>
                <xsl:if test="year>1995">
                  <tr>
                    <td><xsl:value-of select="booktitle"/></td>
                    <!-- list every author: in XSLT 1.0, value-of select="author" outputs only the first -->
                    <td>
                      <xsl:for-each select="author">
                        <xsl:value-of select="."/>
                        <xsl:if test="position() != last()">, </xsl:if>
                      </xsl:for-each>
                    </td>
                    <td><xsl:value-of select="publisher"/></td>
                    <td><xsl:value-of select="country"/></td>
                    <td><xsl:value-of select="price"/></td>
                  </tr>
                </xsl:if>
              </xsl:for-each>
            </table>
          </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>

• “xsl:template” is XSLT for “use the template below”. “match” is XPath for “link to” or “start with”, and “/” means the root of the document (the node that contains the root element, “booklist” in this case).
• The markup inside the template is basic HTML.
• “xsl:for-each” with the “select” instruction is XSLT for “select from each of the books in the booklist”.
• “xsl:sort” with the “select” instruction is XSLT for “sort by publisher”.
• “xsl:if” with the “test” instruction is XSLT for “only those books where the year is later than 1995”.
• “xsl:value-of” with the “select” instruction is XSLT for “use the data from this element”.
• You must close your XSLT commands, and you must close the HTML tags of your template.
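For readers who think in code, the stylesheet's logic (select each book, sort by publisher, keep those later than 1995, and emit one table row per book) can be mirrored in a few lines of Python. This is only a sketch using the standard library's `xml.etree.ElementTree`, which does not run XSLT. It is not how a browser applies the `.xsl` file, and the `book_rows` function and its `after_year` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

BOOKLIST = """<booklist>
  <book><booktitle>HTML and XHTML: the Definitive Guide</booktitle>
    <author>Chuck Musciano</author><author>Bill Kennedy</author>
    <country>USA</country><publisher>O'Reilly</publisher><price>19.95</price><year>2000</year></book>
  <book><booktitle>XHTML 1.0 Language Sourcebook</booktitle>
    <author>Ian S. Graham</author>
    <country>USA</country><publisher>John Wiley and Sons</publisher><price>30.00</price><year>2000</year></book>
</booklist>"""

def book_rows(xml_text, after_year=1995):
    """Mirror of the XSL: for-each booklist/book, sort by publisher, if year > after_year."""
    root = ET.fromstring(xml_text)  # root is the <booklist> element
    books = sorted(root.findall("book"), key=lambda b: b.findtext("publisher"))
    rows = []
    for b in books:
        if int(b.findtext("year")) > after_year:
            authors = ", ".join(a.text for a in b.findall("author"))  # every author, comma-separated
            cells = [b.findtext("booktitle"), authors, b.findtext("publisher"),
                     b.findtext("country"), b.findtext("price")]
            # escape() keeps characters such as & and < from breaking the generated HTML
            rows.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>")
    return rows

if __name__ == "__main__":
    print("\n".join(book_rows(BOOKLIST)))
```

Changing `after_year` or the sort key gives a different view of the same data, just as a second stylesheet would.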