Information life-cycle and visualization and check-in for project definitions. Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 9/10, April 13, 2010. Contents. Review of last class, reading Information life-cycle Information visualization Checking in for project definitions
Contents • Review of last class, reading • Information life-cycle • Information visualization • Checking in for project definitions • Discussion of reading • Next class
Definitions • Life-cycle elements • Acquisition: Process of recording or generating a concrete artefact from the concept (see transduction) • Curation: The activity of managing the use of data from its point of creation to ensure it is available for discovery and re-use in the future (http://www.dcc.ac.uk/FAQs/data-curator) • Preservation: Process of retaining usability of data in some source form for intended and unintended use • Stewardship: Process of maintaining integrity across acquisition, curation and preservation
Definitions ctd. • Management: Process of arranging for discovery, access and use of data, information and all related elements. Also oversees or effects control of processes for acquisition, curation, preservation and stewardship. Involves fiscal and intellectual responsibility.
The nature of the challenge • To architect information systems today • You may play many roles • You may not get all the metadata or information you need even if you get the data • You will need skills that you were not taught • To work with end-users today • You may have lots of technical experience • You will need new skills in addressing the changing use of data and information • One ‘size’ does not fit all
Acquisition • Learn / read what you can about the developer of the means of acquisition • Documents may not be easy to find • Remember bias!!! • Document things as you go • Have a checklist (the Management list) and review it often
Curation (partial) • Consider the organization and presentation of the data • Document what has been (and has not been) done • Consider and address the provenance to date; you are now THE next person in the chain • Be as technology-neutral as possible • Look to add metainformation
Preservation • Usually refers to the full life cycle • Archiving is a component • Stewardship is the act of preservation • Intent is that ‘you can open it any time in the future’ and that ‘it will be there’ • This involves steps that may not be conventionally thought of • Think 10, 20, 50, 200 years…. looking historically gives some guide to future considerations
Remember • The life cycle applies within and before and after your use case… • So, let’s look in a little more detail
How the information is created • Systemic • Environmental • Trial-and-error (or ad-hoc)
How the information is delivered • One-to-many presentation: • White paper • Web site FAQ • Web site informational • Web site directed (link sent with e-mail, and so on) to a specific Web site • Application-based delivery via managed expert system • One-to-one presentation: • Word of mouth • Ad-hoc communication
How the information is managed • Complexity of the information • Complexity of the creation process • Complexity of the management system • Financial impact of IP/IC creation
Type of information created • Tacit (created and stored informally): • Human memory • Local hard drive of the computer • Expert system (moving tacit information into a formalized structure) • Explicit (created and stored formally): • Network share • Network Web site/intranet • Informal knowledge-management system • Document-management system • Formal KM system
Value of the source • Age of the information • Proximity of the information to the consumer • Source of the information, and previous interactions with that specific source
Mostly Technical Issues • Data Preservation • Bit-level integrity • Data readability • Documentation • Metadata • Semantics • Persistent Identifiers • Virtual Data Products • Lineage Persistence • Required ancillary data • Applicable standards
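Bit-level integrity is commonly checked by recording a checksum when a file enters the archive and recomputing it later. The sketch below is one minimal way to do that, assuming SHA-256 as the digest; the function names `sha256_of_file` and `verify_fixity` are hypothetical and are not taken from the lecture.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_digest):
    """Return True if the file still matches the digest recorded at ingest."""
    return sha256_of_file(path) == recorded_digest
```

A matching digest only shows that the bits are unchanged; readability, documentation and semantics still have to be preserved separately.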
Mostly Non-Technical Issues • Policy (constrained by money…) • Front end of the lifecycle • Long-term planning, data formats, documentation... • Governance and policy • Legal requirements • Archive to archive transitions • Money (intertwined with policy) • Cost-benefit trades • Long-term needs of programs • User input • Identifying likely users • Levels of service • Funding source and mechanism
Life cycle is a complex issue • Must be managed • Documented • As part of the use case, but also outside it
Information Visualization • Questions to keep in mind • What is the improvement in the understanding as compared to the situation without visualization? • Which visualization techniques are suitable for one's data/ information?
Why visualization? • Reducing amount of data, quantization • Patterns • Features • Events • Trends • Irregularities • Exit points for analysis • Leading to presentation of data • Recall – cognitive science and the mental representation??!!??
Types of visualization • Color coding (including false color) • Classification of techniques is based on • Dimensionality • Information being sought, i.e. purpose • Line plots • Contours • Surface rendering techniques • Volume rendering techniques • Animation techniques • Non-realistic, including ‘cartoon/ artist’ style
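Color coding can be illustrated with a very small false-color mapping. The sketch below assumes a linear blue-to-red ramp and clamps out-of-range values; the function name `false_color` and the choice of ramp are illustrative assumptions, not a recommendation from the lecture.

```python
def false_color(value, vmin, vmax):
    """Map a scalar onto a blue-to-red ramp, returned as an (R, G, B) tuple."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Normalize to [0, 1] and clamp values outside the data range.
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))
    # Low values are blue, high values are red.
    return (round(255 * t), 0, round(255 * (1 - t)))


# Example: color a short row of hypothetical temperature readings.
row = [false_color(v, 0.0, 40.0) for v in (0.0, 10.0, 40.0, 55.0)]
```

A real color table would usually be chosen with perception and color-blindness in mind; the point here is only that the colors encode data values rather than natural appearance.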
Image (aka Raster) file formats • CGM, the Computer Graphics Metafile, has been an ISO standard since 1987. It can encompass both graphical and image data. • PostScript, or more specifically the Encapsulated PostScript Format (EPSF), is a page description language with sophisticated text facilities. For graphics, as compared to CGM, it tends to be expensive in terms of storage.
Image file formats • TIFF, the Tagged Image File Format, encompasses a range of different formats, originally designed for interchange between electronic publishing packages. • GIF, the Graphics Interchange Format, is quite widespread and can encode a number of separate images of different sizes and colors. • PNG, the Portable Network Graphics format
Image file formats • RGB, the Red Green Blue format of Silicon Graphics, is used by most visualization software packages as the internal image format. The format consists of a header containing the dimensions of the image, followed by the actual image data. • The image data is stored as a 2D array of tuples. Each tuple is a vector with 3 components: R, G, and B. The RGB components determine the color of every pixel (picture element) in the image.
Image file formats • PPM, the Portable Pixmap Format (24 bits per pixel), PGM, the Portable Greyscale Format (8 bits per pixel), and PBM, the Portable Bitmap Format (1 bit per pixel), are pixel-based formats distributed with the X-Window system (version 11.4).
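The "header with dimensions, then a 2D array of RGB tuples" layout is easy to see in the plain-text (P3) variant of PPM. The sketch below writes and reads such a file; the helper names `write_ppm` and `read_ppm` are hypothetical, the reader ignores comment lines, and the plain-text variant is chosen for readability (the 24 bits per pixel figure applies to the binary variant).

```python
def write_ppm(path, pixels, max_value=255):
    """Write a 2D list of (R, G, B) tuples as a plain-text (P3) PPM file."""
    height = len(pixels)
    width = len(pixels[0])
    with open(path, "w", encoding="ascii") as handle:
        # Header: magic number, image dimensions, maximum channel value.
        handle.write(f"P3\n{width} {height}\n{max_value}\n")
        # Body: one line per image row, three integers per pixel.
        for row in pixels:
            handle.write(" ".join(f"{r} {g} {b}" for r, g, b in row) + "\n")


def read_ppm(path):
    """Read back a P3 file written by write_ppm (no comment handling)."""
    with open(path, encoding="ascii") as handle:
        tokens = handle.read().split()
    if tokens[0] != "P3":
        raise ValueError("not a plain-text PPM file")
    width, height = int(tokens[1]), int(tokens[2])
    values = [int(token) for token in tokens[4:]]
    pixels = []
    for y in range(height):
        row = []
        for x in range(width):
            start = 3 * (y * width + x)
            row.append(tuple(values[start:start + 3]))
        pixels.append(row)
    return pixels
```

Note that the header carries only the dimensions and maximum value; anything about origin or purpose has to travel in comments or in separate metadata.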
Image file formats • XBM, the X BitMap format, is the X-Window one-bit image file format, which has been standardized by the MIT X Consortium. • A major constraint on the use of images is the large data volume which has to be dealt with. • Large sets of image data can have severe implications for storage, memory, and transmission costs. • Therefore, compression techniques are very important. • There are two categories, based on whether or not it is possible to reconstruct the initial picture after compression.
Compression (any format) • Lossless compression methods are methods for which the original, uncompressed data can be recovered exactly. Examples of this category are run-length encoding (RLE) and the Lempel-Ziv-Welch (LZW) algorithm. • Lossy methods: in contrast to lossless compression, the original data cannot be recovered exactly after a lossy compression of the data. An example of this category is the Color Cell Compression method. • Lossy compression techniques can reach reduction rates of 0.9, whereas lossless compression techniques normally have a maximum reduction rate of 0.5.
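Run-length encoding is simple enough to sketch in a few lines, and it shows what "recovered exactly" means in practice. The code below is a minimal illustration, not a production codec. It also assumes that "reduction rate" means the fraction of the original size that is removed (1 minus compressed size over original size); that reading, and the two-bytes-per-run storage cost used in the example, are assumptions rather than definitions from the lecture.

```python
def rle_encode(data):
    """Encode a bytes object as a list of (count, value) runs."""
    runs = []
    for value in data:
        if runs and runs[-1][1] == value:
            # Same value as the previous byte: extend the current run.
            runs[-1] = (runs[-1][0] + 1, value)
        else:
            runs.append((1, value))
    return runs


def rle_decode(runs):
    """Expand (count, value) runs back into the original bytes."""
    return bytes(value for count, value in runs for _ in range(count))


def reduction_rate(original_size, compressed_size):
    """Fraction of the original size removed (assumed definition)."""
    return 1 - compressed_size / original_size


# One scan line of a hypothetical 1-byte-per-pixel image.
scan_line = bytes([0] * 12 + [255] * 4)
runs = rle_encode(scan_line)
# Assume each run is stored as one count byte plus one value byte.
rate = reduction_rate(len(scan_line), 2 * len(runs))
```

The round trip `rle_decode(rle_encode(x)) == x` holds for any input, which is the defining property of a lossless method; how much is saved depends entirely on how long the runs in the data are.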
Vector formats • PostScript • PDF • SVG • ‘Shape files’ • CGM (also) • …
Animation formats • MPEG • AVI • QuickTime (QT) • WMV • Animated GIF
Remember - metadata • Many of these formats already contain metadata or fields for metadata, use them!
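As one example of using the fields a format already provides, SVG defines `title` and `desc` elements for human-readable description. The sketch below builds a small SVG that carries its own description, using only the Python standard library; the function name `build_described_svg`, the drawn circle, and the example text are hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def build_described_svg(title, description):
    """Return SVG markup whose title and desc elements describe the graphic."""
    # Serialize SVG as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", SVG_NS)
    svg = ET.Element(
        f"{{{SVG_NS}}}svg",
        {"width": "100", "height": "100", "viewBox": "0 0 100 100"},
    )
    # Descriptive fields travel inside the file, next to the graphic itself.
    ET.SubElement(svg, f"{{{SVG_NS}}}title").text = title
    ET.SubElement(svg, f"{{{SVG_NS}}}desc").text = description
    ET.SubElement(
        svg,
        f"{{{SVG_NS}}}circle",
        {"cx": "50", "cy": "50", "r": "40", "fill": "steelblue"},
    )
    return ET.tostring(svg, encoding="unicode")


markup = build_described_svg(
    "Example station marker",
    "Placeholder graphic; origin and purpose are recorded here.",
)
```

Because the description lives in the file, it survives copying and renaming, and it can be read by software as well as by people.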
Tools • Conversion • Imtools • GraphicConverter • Gnu convert • Many more • Combination/Visualization • IDV • Gnuplot • http://disc.sci.gsfc.nasa.gov/giovanni
New modes • http://www.actoncopenhagen.decc.gov.uk/content/en/embeds/flash/4-degrees-large-map-final • http://www.smashingmagazine.com/2007/08/02/data-visualization-modern-approaches/ • Many modes: • http://www.siggraph.org/education/materials/HyperVis/domik/folien.html
Managing visualization products • The importance of a ‘self-describing’ product • Visualization products are not just consumed by people • How many images, graphics files do you have on your computer for which the origin, purpose, use is still known? • How are these logically organized?
Discovery of visualizations • When represented as images: • Image-based or free-text search? • Referred to in publications (articles, books, web pages) • Vector graphics: • PostScript or PDF • SVG • Others? • What makes this easy or hard or impossible?
Discussion • About life-cycle in general? • Visualization?
Reading for this week • Is retrospective
Check in for Project Assignment • Analysis of existing information system content and architecture, critique, redesign and prototype redeployment
What is next • Week 11 – Information and Workflow Management • Week 12 – Information Discovery, Information Integration