Lecture 05: Data Retrieval
September 21, 2010
COMP 150-12 Topics in Visual Analytics
Lecture Outline
• Data Retrieval
• Methods for increasing retrieval speed:
  • Pre-computation
  • Pre-fetching and Caching
  • Levels of Detail (LOD)
  • Hardware support
• Data transform (pre-processing):
  • Subsample
  • Aggregate
  • Simplification (dimension reduction)
  • Appropriate representation (finding the underlying mathematical representation)
Speed of Data Transfer
• Ethernet 100Base-T: 100 Mb/s = 0.0125 GB/s
• SQL queries: ~1,000 per second
• SATA: 0.5 GB/s
• Hard drive: 0.06 GB/s
Size of Data Transfer
• L2/L3 Cache: 0.008 MB
• Main Memory: 8 GB
• GPU: 2 GB
• SQL Database: 512,000 GB
• Hard drive: 2,000 GB
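To put these numbers in perspective, here is a quick back-of-the-envelope calculation (a Python sketch using only the figures from the two slides above) of how long a full transfer would take over each link:

```python
# Rough transfer-time estimates from the bandwidth and size figures above.
# Sizes are in GB, bandwidths in GB/s.
datasets = {"main memory (8 GB)": 8, "hard drive (2,000 GB)": 2000}
links = {
    "SATA (0.5 GB/s)": 0.5,
    "hard drive read (0.06 GB/s)": 0.06,
    "100Base-T Ethernet (0.0125 GB/s)": 0.0125,
}

for dname, size in datasets.items():
    for lname, bw in links.items():
        print(f"{dname} over {lname}: {size / bw:,.0f} s")
```

Even the in-memory dataset takes more than two minutes to stream off the disk, and moving the full on-disk dataset over 100Base-T Ethernet is on the order of 160,000 seconds (almost two days), which is why retrieval speed dominates interactive visualization.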
Pre-computation
• Problem statement: storage is plentiful, but computation time and user patience are not. What can we do?
• Assuming no storage constraint, can we do all computations a priori?
• Is the number of possible states in a visualization system finite?
But Wait…!!
• Are all the possible states of a visualization system actually traversable?
Reconsider the van Wijk Model
(Diagram of the van Wijk model, highlighting the specification S’ and the user interface.)
An Example:
• How many variables does the program need in order to generate the visualization to the right?
• Are all of those variables adjustable by the user?
An Example: S’
• Does the size of S change from the left image to the right image?
• What about S’?
What is S?
• Do all camera positions and perspectives need to be pre-computed?
• For such “continuous” variables, what pre-computations can be done?
Implications for Pre-Computation
• Depending on the system and the interface, many visualizations have a finite number of states.
• Typically, the variables associated with the interface are nominal or ordinal.
• Quantitative variables need to be converted: Q -> O or Q -> N.
• When retrieval time rises above a certain threshold, pre-computation should be considered.
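As a concrete illustration of the Q -> O conversion, here is a minimal sketch, assuming a hypothetical `render` function that stands in for the expensive visualization computation: the continuous parameter is binned into a fixed number of ordinal levels, and the result for every level is computed a priori.

```python
# Sketch: convert a quantitative parameter (Q) into ordinal bins (O)
# so that every visualization state can be pre-computed and stored.

def render(threshold):          # hypothetical expensive computation
    return f"image rendered at threshold={threshold:.2f}"

NUM_BINS = 16
LO, HI = 0.0, 1.0

def to_bin(value):
    """Map a continuous value in [LO, HI] to one of NUM_BINS ordinal levels."""
    i = int((value - LO) / (HI - LO) * NUM_BINS)
    return min(max(i, 0), NUM_BINS - 1)

# Pre-compute every reachable state ahead of time (one result per bin center).
precomputed = {b: render(LO + (b + 0.5) * (HI - LO) / NUM_BINS)
               for b in range(NUM_BINS)}

def lookup(value):
    """At run time, snap the user-supplied value to its bin and return the stored result."""
    return precomputed[to_bin(value)]

print(lookup(0.37))   # served instantly from the pre-computed table
```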
Pre-fetching
• Even if all pre-computations are done, the results still need to reside on disk (they are too large to keep in memory).
• When visualizing large datasets, fetching data on user request can still take noticeable time.
• The strategy: guess what data the user will want to see next, and bring that data from disk into memory beforehand.
Pre-fetching (illustration)
1. The full dataset is too large to fit in main memory.
2. A piece that does fit in main memory is loaded instead.
3. If the pre-fetcher guessed correctly, the next view is already in memory: “This program’s great!”
4. If it guessed wrong, the user waits on a disk fetch: “This program sucks!”
Pre-fetching
• Used in:
  • Web browsers: link pre-fetch
  • Operating systems: page pre-fetch
  • CPU design: instruction pre-fetch
  • Computer graphics: polygon pre-fetch
• In computer graphics, pre-fetching is often related to “Out-Of-Core” operations (e.g., Out-Of-Core Rendering).
Pre-fetching
• Pre-fetching is used in nearly all computer games with open worlds.
Example Movie
• “Spatial Frame”
• Rob Jensen, Pixar
• http://www.youtube.com/watch?v=n27NLuc44Lk
Video courtesy of Rob Jensen, Pixar
Predict User’s Behaviors
• Consider the user’s possible interactions.
• In most computer graphics applications, the user can:
  • Move (read: translation): forward/backward, up/down, left/right
  • Turn (read: rotation): pitch, yaw, roll
• Six degrees of freedom, sometimes with even more constraints.
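One simple way to exploit those degrees of freedom is dead reckoning: assume the camera keeps its current velocity and pre-fetch the region around the predicted position. A sketch under that assumption, where `load_region` is a hypothetical stand-in for the actual disk fetch:

```python
# Sketch: predict where the camera will be a moment from now and
# pre-fetch the data around that point. `load_region` is hypothetical.
import numpy as np

def load_region(center, radius):
    print(f"pre-fetching data within {radius} of {np.round(center, 2)}")

def predict_and_prefetch(position, velocity, lookahead=1.0, radius=5.0):
    """Linear prediction: assume the user keeps moving the same way."""
    predicted = np.asarray(position) + lookahead * np.asarray(velocity)
    load_region(predicted, radius)

predict_and_prefetch(position=[0.0, 0.0, 10.0], velocity=[2.0, 0.0, -1.0])
```

Rotation-only moves keep the camera position fixed, so a practical predictor would also widen the pre-fetch radius to cover what a turn could bring into view.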
Pre-fetching in Graphics
(Image courtesy of SGI.com)
Predict Based on Data Use
• In Assignment 1:
  • The user’s interactions with the data are constrained by the data structure.
  • The user can:
    • Go up to the parent
    • Go down to one of the children
    • Maybe go to a sibling?
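Because the data structure constrains navigation, the set of nodes worth pre-fetching is small and known in advance. A sketch under that assumption, with a hypothetical `fetch` standing in for the disk read:

```python
# Sketch: when the user visits a node of a hierarchical dataset, the only
# reachable next views are its parent, its children, and (maybe) its siblings,
# so those are the nodes worth pre-fetching.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    parent: "Node" = None
    children: list = field(default_factory=list)

def fetch(node_id):
    print(f"pre-fetching node {node_id} from disk")   # hypothetical disk read

def prefetch_neighbors(node, include_siblings=True):
    """Pre-fetch the children, parent, and optionally siblings of `node`."""
    candidates = list(node.children)
    if node.parent is not None:
        candidates.append(node.parent)
        if include_siblings:
            candidates += [c for c in node.parent.children if c is not node]
    for n in candidates:
        fetch(n.node_id)

root = Node(0)
root.children = [Node(i, parent=root) for i in (1, 2, 3)]
prefetch_neighbors(root.children[0])   # fetches the parent (0) and siblings (2, 3)
```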
Pre-fetching Mechanisms
• Almost always a multi-threaded architecture.
• In its simplest form, there are two threads:
  • Data thread: fetches data
  • UI thread: renders and responds to user interactions
Pre-fetching Mechanisms
• Data consistency is key:
  • Maintain strict read/write locks on the data.
  • This does not work well with dynamic data.
• What happens when the data fetching is too slow?
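A minimal sketch of that two-thread architecture, with a lock guarding the shared cache; `read_from_disk` is a hypothetical stand-in for the slow I/O:

```python
# Sketch: the simplest two-thread pre-fetching architecture.
# A data thread fills a shared cache in the background while the
# UI thread keeps responding to the user. A lock guards the cache.
import threading, queue, time

cache = {}
cache_lock = threading.Lock()
requests = queue.Queue()          # keys the data thread should fetch next

def read_from_disk(key):          # hypothetical slow I/O
    time.sleep(0.1)
    return f"data for {key}"

def data_thread():
    while True:
        key = requests.get()
        if key is None:           # sentinel: shut down
            break
        value = read_from_disk(key)
        with cache_lock:          # strict write lock on the shared cache
            cache[key] = value

def ui_get(key):
    """Called by the UI thread: never blocks on disk, only briefly on the lock."""
    with cache_lock:
        return cache.get(key)     # None means "not fetched yet"

worker = threading.Thread(target=data_thread, daemon=True)
worker.start()
requests.put("chunk_42")          # pre-fetch something we expect to need
time.sleep(0.2)
print(ui_get("chunk_42"))         # already in memory by the time we ask
requests.put(None)
```

If the fetch is too slow, `ui_get` simply returns nothing and the UI must either show a placeholder or fall back to a blocking read, which is exactly the failure case the slide asks about.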
Caching
• Caching is directly related to pre-fetching:
  • Pre-fetching is the mechanism for grabbing data.
  • Caching is the strategy for storing data that might be used in the future.
• Pre-fetching is great, but what should be kept around?
• The cache-miss rate is a metric for judging the quality of a caching strategy.
Caching (illustration)
• Prediction: “I think he’s going for ear, head, and leg!” Ear, Head, and Leg are pre-fetched into main memory.
• View 1: Head. Check! Already in memory.
• View 2: Ear. Check! Already in memory.
• Change of plan: the user decides to look at the tail.
• Tail is not in memory. Where do we put it, and which cached piece do we give up?
Buffer vs. Cache
• Pre-fetching without caching can be thought of as “buffering.”
• The main challenge in designing a caching strategy is determining whether a piece of data could be reused in the future.
• Another metric for a caching strategy is the size of the cache vs. the amount of data that has to be re-fetched.
Examples of Caching Algorithms
• Belady’s Algorithm (the clairvoyant algorithm):
  • Method: replace the cached item that will not be used for the longest time in the future.
  • Problem: requires being able to predict the future.
  • Serves as the (optimal) benchmark.
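A toy version of Belady's policy, workable here only because the entire future access sequence is handed to the function up front, which is exactly why it can only serve as a benchmark:

```python
# Toy Belady (clairvoyant) eviction: evict the cached item whose next
# use lies farthest in the future. Requires knowing the full access sequence.

def belady_misses(accesses, cache_size):
    cache, misses = set(), 0
    for i, item in enumerate(accesses):
        if item in cache:
            continue                      # hit: nothing to do
        misses += 1
        if len(cache) >= cache_size:
            # For each cached item, find how soon it is needed again.
            def next_use(c):
                future = accesses[i + 1:]
                return future.index(c) if c in future else float("inf")
            cache.remove(max(cache, key=next_use))   # evict the latest-needed one
        cache.add(item)
    return misses

print(belady_misses(["head", "ear", "head", "leg", "tail", "head"], 2))  # 4 misses
```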
Examples of Caching Algorithms
• Least Recently Used (LRU):
  • Maintains a queue ordered by recency of use.
  • Whenever a data item is used, move it to the front.
  • Evict from the back.
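A compact LRU sketch, using Python's OrderedDict as the recency-ordered queue; the Ear/Head/Leg/Tail example from the following slides is replayed at the bottom:

```python
# Sketch: a tiny LRU cache. An OrderedDict plays the role of the recency queue:
# touched items move to one end, and evictions come from the other end.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                   # cache miss
        self.items.move_to_end(key)       # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used

cache = LRUCache(3)
for part in ["ear", "head", "leg"]:
    cache.put(part, f"view of {part}")    # pre-fetched pieces
cache.get("head")                         # View 1: Head
cache.get("ear")                          # View 2: Ear
cache.put("tail", "view of tail")         # evicts "leg", the least recently used
print(list(cache.items))                  # ['head', 'ear', 'tail']
```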
Caching with LRU (illustration)
• Initial priority order in main memory: Ear, Head, Leg.
• View 1: Head. Check! Head moves to the front: Head, Ear, Leg.
• View 2: Ear. Check! Ear moves to the front: Ear, Head, Leg.
• Change of plan: the user decides to look at the tail.
• Which memory do we erase? Ah, the least recently used! Leg, at the back of the queue, is evicted and Tail is cached in its place.
Examples of Caching Algorithms
• Many others:
  • Least Frequently Used (LFU)
  • Most Recently Used (MRU)
  • …
• None of them is perfect: predicting a user’s behavior is hard!
Levels of Detail
• Two types of LOD:
  • Continuous
    • Requires a “mathematical” definition of the model
    • Pros: no need for pre-computation
    • Cons: the math model can be hard to derive, and the computation might not scale
  • Discrete
    • Requires pre-computation of all possible LODs; results are usually stored in a hierarchical tree structure
    • Pros: fast fetching time
    • Cons: storage requirements can be prohibitive; might cause “popping”
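A sketch of the discrete case: the levels (hypothetical names and Google-Maps-style zoom thresholds, chosen for illustration) are assumed to have been pre-computed offline, so the only run-time work is picking one.

```python
# Sketch: discrete LOD selection. The levels are assumed to have been
# pre-computed offline (coarsest first); at run time we only pick one,
# so fetching is fast but transitions between levels can "pop".

levels = [
    {"max_zoom": 1,  "tile": "world_overview"},   # coarsest
    {"max_zoom": 4,  "tile": "country_level"},
    {"max_zoom": 10, "tile": "city_level"},
    {"max_zoom": 18, "tile": "street_level"},     # finest
]

def select_lod(zoom):
    """Return the coarsest pre-computed level that satisfies the requested zoom."""
    for level in levels:
        if zoom <= level["max_zoom"]:
            return level["tile"]
    return levels[-1]["tile"]

print(select_lod(3))    # country_level
print(select_lod(12))   # street_level
```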
Discrete Levels of Detail Example
• Google Maps