Lecture 04: Data Storage

Lecture 04: Data Storage. September 16, 2010. COMP 150-12: Topics in Visual Analytics

Lecture Outline • Data Storage and Retrieval • Define interactivity • Memory • Data Representations and Structures • Storage vs. speed • Examples • Computer graphics: data structure vs. hardware rendering • RDBMS: 3rd normal form

Assumption about Bottlenecks • In most visual analysis tools, the size of the data usually causes the most delay. • This can occur in: • Data processing • Data retrieval • Data transformation • Etc. • Rendering can also be the bottleneck, but that is often still related to the amount of data that needs to be rendered.

Speed of Data Transfer (figure labels) • 12.8 GB/s • 16 GB/s • Ethernet 100Base-T: 100 Mb/s = 0.0125 GB/s • SQL queries: ~1000 /s • SATA: 0.5 GB/s • Hard drive: 0.06 GB/s
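To get a feel for these rates, the sketch below turns a data size into a rough transfer time. It is only back-of-the-envelope arithmetic: the function and constant names are made up for illustration, and latency, protocol overhead, and caching are ignored.

```cpp
// Rates quoted on the slide, in GB/s (names are hypothetical).
constexpr double kEthernet100BaseT = 0.0125;  // 100 Mb/s
constexpr double kSata = 0.5;
constexpr double kHardDrive = 0.06;

// Seconds needed to move `gigabytes` of data over a link that sustains
// `gb_per_second`. Assumes the link is the only bottleneck.
constexpr double transfer_seconds(double gigabytes, double gb_per_second) {
    return gigabytes / gb_per_second;
}
```

Under these assumptions, 1 GB takes about 80 seconds over 100Base-T Ethernet, about 17 seconds from the hard drive, and 2 seconds over SATA.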

Ideal Retrieval Time • Powers of “Ten” • <= 0.1 second • < 1 second • < 10 seconds • < 1 minute • < 10 minutes • < 1 hour • < 10 hours • < 1 day • > 1 day Sources: Jakob Nielsen’s Alertbox, www.useit.com/alertbox; Card, Robertson, and Mackinlay (1991). The information visualizer: An information workspace. ACM CHI '91 Conf.

Ideal Retrieval Time Atkinson-Shiffrin memory model Image courtesy: http://www.dynamicflight.com/avcfibook/learning_process/

Ideal Retrieval Time • Correlates to “sensory memory”, which lasts for several tenths of a second. • Also known as “the perceptual processing time constant”. • Early movies were shot at about 16 frames per second (fps); sound film later standardized on 24 fps. • Retrieval + rendering time of 0.1 second = 10 fps (compare that with most 3D video games). • A visual trace is generally retained in sensory memory for about 0.25 second. • Implications for image comparison?

Ideal Retrieval Time • Sensory memory starts to decay. • Also known as “the immediate response time constant”. • A person can make an unprepared response to some stimulus within about a second. • Beyond that, the listener or the speaker makes a backchannel response to indicate continued interest. • Limit for an animation sequence. • Much longer than that, and the user gets bored.

Ideal Retrieval Time • Sensory memory begins to transfer into short-term memory (STM). • If a sentence is spoken, sensory memory holds the sound and STM holds the words. • Also known as “the unit task time constant”: roughly the time needed to complete a simple task, for example picking up the mouse, moving to the menu, finding the right element, and clicking. • Approximate limit for users keeping their attention on the task. • Re-orientation is sometimes necessary. • Only acceptable during natural breaks in the user’s work.

Ideal Retrieval Time • Most interactive systems need to stay in this range. • Beyond 10 seconds the user’s mind starts to wander and does not retain enough information in STM. • The flow of thought can be broken after 10 seconds. • On the web, a user will often leave the site.

Ideal Retrieval Time • Starts to push the limit of STM, or working memory. • Retrieval times in this range require a progress bar. • Certain automated computations take minutes to complete. • In this range, users are sometimes still willing to wait for the results. • The cost in time really needs to be justified in the design process!

Ideal Retrieval Time • Transitions from STM to long-term memory (LTM). • It is not clear which memories are transferred from STM to LTM. • LTM is subjective and related to mental models. • It is highly unreliable to expect the user to “pick up where they left off”. • Need to start considering computational “tricks” like pre-computation, caching, pre-fetching, etc.
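The “powers of ten” bands can be turned into a small helper for labeling measured response times. This is a minimal sketch: the enum and function names are invented here, and the handling of exact boundary values (for example exactly 1 second) is an assumption, since the slides do not specify it.

```cpp
// Response-time bands from the "powers of ten" list, named by upper bound.
// All names are hypothetical.
enum class Band {
    TenthOfSecond, OneSecond, TenSeconds, OneMinute, TenMinutes,
    OneHour, TenHours, OneDay, OverOneDay
};

// Maps a measured response time (in seconds) to its band.
constexpr Band latency_band(double seconds) {
    if (seconds <= 0.1)     return Band::TenthOfSecond;
    if (seconds < 1.0)      return Band::OneSecond;
    if (seconds < 10.0)     return Band::TenSeconds;
    if (seconds < 60.0)     return Band::OneMinute;
    if (seconds < 600.0)    return Band::TenMinutes;
    if (seconds < 3600.0)   return Band::OneHour;
    if (seconds < 36000.0)  return Band::TenHours;
    if (seconds < 86400.0)  return Band::OneDay;
    return Band::OverOneDay;
}

// One reading of the slides: an interactive system should answer in
// under 10 seconds, before the user's attention drifts.
constexpr bool is_interactive(double seconds) {
    return seconds < 10.0;
}
```

A tool could log `latency_band(elapsed)` for each query and flag anything where `is_interactive` is false as a candidate for a progress bar, caching, or pre-computation.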

Questions?

Data Representations and Structures • How do we accomplish fast retrieval time? • Better data structure • Memory / storage vs. speed • Better system functionalities

Better Data Structure • Consider the Data Cube (OLAP) vs. SQL example from before • Same amount of data transfer • Representation and structure allow faster analysis of the same data from different perspectives

Other Types of Data Structures • An overview of some existing data structure taxonomies: • 1996: Shneiderman • 1999: Card, Mackinlay, Shneiderman • 2004: Ware • 2005: Thomas, Cook • 2010: Ward, Grinstein, Keim

Other Types of Data Structures • Ben Shneiderman (1996): • 1-D: documents, source code, sequential lists • 2-D: maps, floor plans, grids • 3-D: physical objects • N-D: multi-attribute data • Temporal: time-varying data • Tree/Hierarchy: file systems, org charts • Network: arbitrary relationships between objects, social networks Shneiderman. “The eyes have it: A task by data type taxonomy for information visualization”, IEEE Symposium on Visual Languages, 1996

Other Types of Data Structures • Stu Card, Jock Mackinlay, Ben Shneiderman (1999): • Data Table: (see last lecture) • Spatial (Scientific) • Geographic • Documents • Time • Database • Hierarchies • Networks • World Wide Web Card, Mackinlay, Shneiderman. “Readings in Information Visualization: Using Vision To Think”, Morgan Kaufmann, 1999

Other Types of Data Structures • Colin Ware (2004): • Entities: objects of interest, such as people, hurricanes, a school of fish • Relationships: structures that relate entities, such as “part-of” (a wheel is part of a car), structural and physical (components that make up a house), conceptual (store and customers), causal (one event causes another), or temporal (time lapse) Ware. “Information Visualization: Perception for Design”, Morgan Kaufmann, 2004

Other Types of Data Structures • Jim Thomas and Kris Cook (2005): • Numeric Data: quantitative results from sensors • Language Data: human language • Image and Video: • Structural Characteristics • Loosely vs. highly structured: free text and image vs. transactions and RDBMS • Geospatial Characteristics • Temporal Characteristics Thomas and Cook. “Illuminating the Path”, IEEE, 2005

Other Types of Data Structures • Matt Ward, Georges Grinstein, Daniel Keim (2010): • Scalars, vectors, and tensors: 1-n dimensions • Geometry and Grids: requiring coordinates • Temporal: time stamps • Topology: how data records are connected • MRI: density (scalar) + 3D grid • CFD: 3D grid + temporal + 3D vectors + topology • Financial: temporal + n-D tensor • CAD: 3D grid + topology • Remote Sensing: 2 or 3D grid + temporal + connectivity • Census: 2D grid + temporal + n-D tensor • Social Network: n-D tensor + connectivity + (temporal) + (2D grid) Ward, Grinstein, Keim. “Interactive Data Visualization”, AK Peters, 2010

How To Model Your Problem?

How To Model Your Problem? • As a network / topology What operators can we do on this?

How To Model Your Problem? • As a table What operators can we do on this?

How To Model Your Problem? • As a set of 2D vectors What operators can we do on this?

How To Model Your Problem? • As a geometry What operators can we do on this?

How To Model Your Problem? • As an image What operators can we do on this?

How To Model Your Problem? • No single right way… it is heavily task-dependent. • The key is finding a problem isomorph that maps a particular problem onto an existing, efficient data structure (which is not always obvious) • Example: Google
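As a small illustration of how the choice of model changes which operators are cheap, the sketch below stores the same relationships once as a table of rows and once as a network (adjacency list). The type and function names are hypothetical, and the “who links to whom” reading of the data is only an assumed example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Table view: one row per relationship (source node, target node).
using EdgeTable = std::vector<std::pair<std::size_t, std::size_t>>;
// Network view: for each node, the list of nodes it points to.
using AdjacencyList = std::vector<std::vector<std::size_t>>;

// Re-models the table as a network. Costs one pass over the rows.
AdjacencyList to_network(const EdgeTable& rows, std::size_t node_count) {
    AdjacencyList graph(node_count);
    for (const auto& row : rows) {
        assert(row.first < node_count && row.second < node_count);
        graph[row.first].push_back(row.second);
    }
    return graph;
}

// On the network, a node's out-degree is a single size lookup.
std::size_t out_degree_network(const AdjacencyList& graph, std::size_t node) {
    return graph[node].size();
}

// On the table, the same question requires scanning every row.
std::size_t out_degree_table(const EdgeTable& rows, std::size_t node) {
    std::size_t count = 0;
    for (const auto& row : rows) {
        if (row.first == node) ++count;
    }
    return count;
}
```

Both functions return the same answer; the difference is the cost per query, which is why the right model depends on the operators the task needs most often.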

Questions?

Can I Have a Flexible Structure? • An age-old question – why don’t we have one structure that stores all these structures? • Answer: too expensive to store!

An Example: Triangle Strip (figure: a triangle strip with labeled vertices) Image Source: Wikipedia. “Triangle Strip”

A General Structure (sizes assume 4-byte floats and 4-byte pointers)
class Vertex {
    float position[3];        // 3 floats = 12 bytes
    Edge* list_of_edges[n];   // n = 4 pointers = 16 bytes
    Face* list_of_faces[m];   // m = 3 pointers = 12 bytes
};                            // Total: 40 bytes * 6 vertices = 240 bytes
class Edge {
    Vertex* vertices[2];      // 2 pointers = 8 bytes
    Face* faces[2];           // 2 pointers = 8 bytes
};                            // Total: 16 bytes * 9 edges = 144 bytes
class Face {
    Vertex* vertices[3];      // 3 pointers = 12 bytes
    Edge* edges[3];           // 3 pointers = 12 bytes
};                            // Total: 24 bytes * 4 faces = 96 bytes
Grand Total: 240 + 144 + 96 = 480 bytes
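The byte counts for the general structure can be checked mechanically. The sketch below reproduces the slide's arithmetic under its implied assumptions: 4-byte floats, 4-byte (32-bit) pointers, and 4 edge pointers plus 3 face pointers per vertex. All names are hypothetical, and on a 64-bit machine the pointer-based totals would be larger.

```cpp
#include <cstddef>

// Sizes assumed by the slide, not taken from sizeof on the current machine.
constexpr std::size_t kFloatBytes = 4;
constexpr std::size_t kPtrBytes = 4;

// One vertex: a 3-float position plus its edge and face pointer lists.
constexpr std::size_t vertex_bytes(std::size_t edges_per_vertex,
                                   std::size_t faces_per_vertex) {
    return 3 * kFloatBytes + (edges_per_vertex + faces_per_vertex) * kPtrBytes;
}

constexpr std::size_t kEdgeBytes = 2 * kPtrBytes + 2 * kPtrBytes;  // 2 vertices + 2 faces
constexpr std::size_t kFaceBytes = 3 * kPtrBytes + 3 * kPtrBytes;  // 3 vertices + 3 edges

// Total for a mesh with the given element counts, using the slide's
// per-vertex list lengths (4 edges, 3 faces).
constexpr std::size_t general_mesh_bytes(std::size_t vertices,
                                         std::size_t edges,
                                         std::size_t faces) {
    return vertices * vertex_bytes(4, 3) + edges * kEdgeBytes + faces * kFaceBytes;
}

// The 6-vertex strip has 9 edges and 4 triangles.
static_assert(general_mesh_bytes(6, 9, 4) == 480, "matches the slide's grand total");
```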

A Task-Specific Structure
• 6 vertices * 3 floats each = 6 * 3 * 4 bytes = 72 bytes
glBegin(GL_TRIANGLE_STRIP);
    glVertex3f(A.x, A.y, A.z);  // vertex 1
    glVertex3f(B.x, B.y, B.z);  // vertex 2
    glVertex3f(C.x, C.y, C.z);  // vertex 3
    glVertex3f(D.x, D.y, D.z);  // vertex 4
    glVertex3f(E.x, E.y, E.z);  // vertex 5
    glVertex3f(F.x, F.y, F.z);  // vertex 6
glEnd();
• Difference: 480 bytes / 72 bytes ≈ 6.67 times smaller
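The strip is compact because connectivity is implied by vertex order rather than stored. As a rough sketch of that implicit rule (it does not call OpenGL, and the names are hypothetical), the function below expands a strip of n vertices into its n - 2 triangles, swapping the first two indices of every second triangle so the winding direction stays consistent.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Triangle = std::array<std::size_t, 3>;

// Expands strip order (v0, v1, v2, ...) into explicit index triples.
// Triangle i uses vertices i, i+1, i+2; odd-numbered triangles swap the
// first two indices so every triangle keeps the same winding.
std::vector<Triangle> expand_strip(std::size_t vertex_count) {
    std::vector<Triangle> triangles;
    for (std::size_t i = 0; i + 2 < vertex_count; ++i) {
        if (i % 2 == 0) {
            triangles.push_back(Triangle{i, i + 1, i + 2});
        } else {
            triangles.push_back(Triangle{i + 1, i, i + 2});
        }
    }
    return triangles;
}

// Bytes to store the strip itself: three 4-byte floats per vertex,
// following the slide's size assumption.
constexpr std::size_t strip_bytes(std::size_t vertex_count) {
    return vertex_count * 3 * 4;
}
```

For the 6-vertex example this yields 4 triangles from 72 bytes of vertex data. The price is flexibility: questions such as “which faces share this edge?” now have to be recomputed instead of looked up.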

Questions?

Memory / Storage vs. Speed • Theoretical problem in computer science • Typically, faster speed means more memory and storage • For example, sorting:

Image Source: Wikipedia. “Sorting Algorithm”
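One concrete instance of this trade-off in C++ is `std::stable_sort`. It guarantees that elements with equal keys keep their original relative order, and it is specified to run in O(N log N) comparisons when it can obtain extra memory and O(N log² N) otherwise. The record type and function name below are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record: sorted by `key`, with `label` along for the ride.
struct Record {
    int key;
    std::string label;
};

// Sorts by key only. Records with equal keys stay in their input order;
// the library may allocate a temporary buffer to achieve this.
void sort_by_key_stable(std::vector<Record>& records) {
    auto by_key = [](const Record& a, const Record& b) { return a.key < b.key; };
    std::stable_sort(records.begin(), records.end(), by_key);
    assert(std::is_sorted(records.begin(), records.end(), by_key));
}
```

By contrast, `std::sort` makes no stability guarantee, so code that depends on the order of equal keys should not rely on it.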

Memory / Storage vs. Speed • Notice that with additional use of memory, the algorithm is either faster, or has additional properties that might be desirable (such as stability) • For your assignment 1, notice the same thing: • Fastest retrieval is to duplicate all elements from parent to child, but memory consumption is non-trivial • More memory efficient algorithms would require (recursively) looking to the parent node for information
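The two strategies for a parent/child hierarchy can be sketched side by side. This is not the assignment's required design, only an assumed minimal node type with one inherited attribute (`color` and every other name here is hypothetical).

```cpp
#include <cassert>
#include <string>

// A node either defines the attribute itself or inherits it from an ancestor.
struct Node {
    const Node* parent = nullptr;
    bool has_own_color = false;
    std::string own_color;     // meaningful only if has_own_color is true
    std::string cached_color;  // duplicated copy used by the fast path
};

// Memory-lean lookup: walk up until some ancestor defines the attribute.
// Costs O(depth) per read but stores nothing extra per node.
const std::string& inherited_color(const Node& node) {
    const Node* n = &node;
    while (!n->has_own_color) {
        assert(n->parent != nullptr && "the root must define the attribute");
        n = n->parent;
    }
    return n->own_color;
}

// Fast lookup: copy the resolved value into the node. Later reads of
// cached_color are O(1), but every node now stores its own string, and
// the copy goes stale if an ancestor changes afterwards.
void cache_color(Node& node) {
    node.cached_color = inherited_color(node);
}
```

If the root's value changes after `cache_color` has run, `inherited_color` sees the new value immediately while `cached_color` keeps the old one until every affected node is refreshed.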

Problems with Duplicating Data • Imagine an update to the root node that needs to be propagated. • Others? • Example: in databases, maintaining 3rd normal form

2NF vs. 3NF Image Source: Wikipedia. “Third Normal Form”
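The same duplication problem shows up in table design. The sketch below uses an assumed tournament-winners example (it may differ from the one in the figure): in the flat layout the winner's birth year depends on the winner rather than on the (tournament, year) key, so it is repeated on every row; in the 3NF layout it moves to its own table. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// 2NF but not 3NF: winner_birth_year is repeated for every win.
struct WinnerRowFlat {
    std::string tournament;
    int year;
    std::string winner;
    int winner_birth_year;
};

// 3NF: facts about the winner live in a separate table keyed by winner.
struct WinnerRow {
    std::string tournament;
    int year;
    std::string winner;
};
using PlayerTable = std::map<std::string, int>;  // winner -> birth year

// Flat layout: a correction must touch every matching row.
int fix_birth_year_flat(std::vector<WinnerRowFlat>& rows,
                        const std::string& winner, int birth_year) {
    int touched = 0;
    for (auto& row : rows) {
        if (row.winner == winner) {
            row.winner_birth_year = birth_year;
            ++touched;
        }
    }
    return touched;
}

// 3NF layout: a correction is a single write...
void fix_birth_year_3nf(PlayerTable& players,
                        const std::string& winner, int birth_year) {
    players[winner] = birth_year;
}

// ...but every read pays for an extra lookup (the equivalent of a join).
int birth_year_of(const PlayerTable& players, const WinnerRow& row) {
    auto it = players.find(row.winner);
    assert(it != players.end());
    return it->second;
}
```

The flat table answers “how old was the winner?” from one row, while the normalized tables cannot become inconsistent when a value is corrected. That is the storage vs. speed trade-off again, in database form.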

Comparison • What are the advantages of 2nd normal form? • What are the advantages of 3rd normal form? • Can we go further? • Should we go further?

Questions?
