Lecture 05: Data Retrieval
September 21, 2010
COMP 150-12 Topics in Visual Analytics
Lecture Outline
• Data Retrieval
• Methods for increasing retrieval speed:
  • Pre-computation
  • Pre-fetching and Caching
  • Levels of Detail (LOD)
  • Hardware support
• Data transform (pre-processing):
  • Subsample
  • Aggregate
  • Simplification (dimension reduction)
  • Appropriate representation (finding the underlying mathematical representation)
Speed of Data Transfer
• Ethernet 100Base-T: 100 Mb/s = 0.0125 GB/s
• SQL queries: ~1,000 per second
• SATA: 0.5 GB/s
• Hard drive: 0.06 GB/s
Size of Data Transfer
• L2/L3 Cache: 0.008 MB
• Main Memory: 8 GB
• GPU: 2 GB
• SQL Database: 512,000 GB
• Hard drive: 2,000 GB
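To put these numbers in perspective, here is a quick back-of-the-envelope calculation (a Python sketch using only the figures from the two slides above) of how long a full transfer would take over each link:

```python
# Rough transfer-time estimates from the bandwidth and size figures above.
# Sizes are in GB, bandwidths in GB/s.
datasets = {"main memory (8 GB)": 8, "hard drive (2,000 GB)": 2000}
links = {
    "SATA (0.5 GB/s)": 0.5,
    "hard drive read (0.06 GB/s)": 0.06,
    "100Base-T Ethernet (0.0125 GB/s)": 0.0125,
}

for dname, size in datasets.items():
    for lname, bw in links.items():
        print(f"{dname} over {lname}: {size / bw:,.0f} s")
```

Even the in-memory dataset takes more than two minutes to stream off the disk, and moving the full on-disk dataset over 100Base-T Ethernet is on the order of 160,000 seconds (almost two days), which is why retrieval speed dominates interactive visualization.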
Pre-computation
• Problem statement: storage is plentiful, but computation time and user patience are not. What can we do?
• Assuming no storage constraint, can we do all computations a priori?
• Is the number of possible states in a visualization system finite?
But Wait…!!
• Are all the possible states of a visualization system actually traversable?
Reconsider the van Wijk Model
(Diagram of the van Wijk model, highlighting the specification S’ and the user interface.)
An Example:
• How many variables does the program need in order to generate the visualization to the right?
• Are all of those variables adjustable by the user?
An Example: S’
• Does the size of S change from the left image to the right image?
• What about S’?
What is S?
• Do all camera positions and perspectives need to be pre-computed?
• For such “continuous” variables, what pre-computations can be done?
Implications for Pre-Computation
• Depending on the system and the interface, many visualizations have a finite number of states.
• Typically, the variables associated with the interface are nominal or ordinal.
• Quantitative variables need to be converted: Q -> O or Q -> N.
• When retrieval time rises above a certain threshold, pre-computation should be considered.
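As a concrete illustration of the Q -> O conversion, here is a minimal sketch, assuming a hypothetical `render` function that stands in for the expensive visualization computation: the continuous parameter is binned into a fixed number of ordinal levels, and the result for every level is computed a priori.

```python
# Sketch: convert a quantitative parameter (Q) into ordinal bins (O)
# so that every visualization state can be pre-computed and stored.

def render(threshold):          # hypothetical expensive computation
    return f"image rendered at threshold={threshold:.2f}"

NUM_BINS = 16
LO, HI = 0.0, 1.0

def to_bin(value):
    """Map a continuous value in [LO, HI] to one of NUM_BINS ordinal levels."""
    i = int((value - LO) / (HI - LO) * NUM_BINS)
    return min(max(i, 0), NUM_BINS - 1)

# Pre-compute every reachable state ahead of time (one result per bin center).
precomputed = {b: render(LO + (b + 0.5) * (HI - LO) / NUM_BINS)
               for b in range(NUM_BINS)}

def lookup(value):
    """At run time, snap the user-supplied value to its bin and return the stored result."""
    return precomputed[to_bin(value)]

print(lookup(0.37))   # served instantly from the pre-computed table
```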
Pre-fetching
• Even if all pre-computations are done, the results still need to reside on disk (they are too large to keep in memory).
• When visualizing large datasets, fetching data on user request can still take noticeable time.
• The strategy: guess what data the user will want to see next, and bring that data from disk into memory beforehand.
Pre-fetching (illustration)
1. The full dataset is too large to fit in main memory.
2. A piece that does fit in main memory is loaded instead.
3. If the pre-fetcher guessed correctly, the next view is already in memory: “This program’s great!”
4. If it guessed wrong, the user waits on a disk fetch: “This program sucks!”
Pre-fetching
• Used in:
  • Web browsers: link pre-fetch
  • Operating systems: page pre-fetch
  • CPU design: instruction pre-fetch
  • Computer graphics: polygon pre-fetch
• In computer graphics, pre-fetching is often related to “Out-Of-Core” operations (e.g., Out-Of-Core Rendering).
Pre-fetching
• Pre-fetching is used in nearly all computer games with open worlds.
Example Movie
• “Spatial Frame”
• Rob Jensen, Pixar
• http://www.youtube.com/watch?v=n27NLuc44Lk
Video courtesy of Rob Jensen, Pixar
Predict User’s Behaviors
• Consider the user’s possible interactions.
• In most computer graphics applications, the user can:
  • Move (read: translation): forward/backward, up/down, left/right
  • Turn (read: rotation): pitch, yaw, roll
• Six degrees of freedom, sometimes with even more constraints.
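One simple way to exploit those degrees of freedom is dead reckoning: assume the camera keeps its current velocity and pre-fetch the region around the predicted position. A sketch under that assumption, where `load_region` is a hypothetical stand-in for the actual disk fetch:

```python
# Sketch: predict where the camera will be a moment from now and
# pre-fetch the data around that point. `load_region` is hypothetical.
import numpy as np

def load_region(center, radius):
    print(f"pre-fetching data within {radius} of {np.round(center, 2)}")

def predict_and_prefetch(position, velocity, lookahead=1.0, radius=5.0):
    """Linear prediction: assume the user keeps moving the same way."""
    predicted = np.asarray(position) + lookahead * np.asarray(velocity)
    load_region(predicted, radius)

predict_and_prefetch(position=[0.0, 0.0, 10.0], velocity=[2.0, 0.0, -1.0])
```

Rotation-only moves keep the camera position fixed, so a practical predictor would also widen the pre-fetch radius to cover what a turn could bring into view.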
Pre-fetching in Graphics
(Image courtesy of SGI.com)
Predict Based on Data Use
• In Assignment 1:
  • The user’s interactions with the data are constrained by the data structure.
  • The user can:
    • Go up to the parent
    • Go down to one of the children
    • Maybe go to a sibling?
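Because the data structure constrains navigation, the set of nodes worth pre-fetching is small and known in advance. A sketch under that assumption, with a hypothetical `fetch` standing in for the disk read:

```python
# Sketch: when the user visits a node of a hierarchical dataset, the only
# reachable next views are its parent, its children, and (maybe) its siblings,
# so those are the nodes worth pre-fetching.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    parent: "Node" = None
    children: list = field(default_factory=list)

def fetch(node_id):
    print(f"pre-fetching node {node_id} from disk")   # hypothetical disk read

def prefetch_neighbors(node, include_siblings=True):
    """Pre-fetch the children, parent, and optionally siblings of `node`."""
    candidates = list(node.children)
    if node.parent is not None:
        candidates.append(node.parent)
        if include_siblings:
            candidates += [c for c in node.parent.children if c is not node]
    for n in candidates:
        fetch(n.node_id)

root = Node(0)
root.children = [Node(i, parent=root) for i in (1, 2, 3)]
prefetch_neighbors(root.children[0])   # fetches the parent (0) and siblings (2, 3)
```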
Pre-fetching Mechanisms
• Almost always a multi-threaded architecture.
• In its simplest form, there are two threads:
  • Data thread: fetches data
  • UI thread: renders and responds to user interactions
Pre-fetching Mechanisms
• Data consistency is key:
  • Maintain strict read/write locks on the data.
  • This does not work well with dynamic data.
• What happens when the data fetching is too slow?
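A minimal sketch of that two-thread architecture, with a lock guarding the shared cache; `read_from_disk` is a hypothetical stand-in for the slow I/O:

```python
# Sketch: the simplest two-thread pre-fetching architecture.
# A data thread fills a shared cache in the background while the
# UI thread keeps responding to the user. A lock guards the cache.
import threading, queue, time

cache = {}
cache_lock = threading.Lock()
requests = queue.Queue()          # keys the data thread should fetch next

def read_from_disk(key):          # hypothetical slow I/O
    time.sleep(0.1)
    return f"data for {key}"

def data_thread():
    while True:
        key = requests.get()
        if key is None:           # sentinel: shut down
            break
        value = read_from_disk(key)
        with cache_lock:          # strict write lock on the shared cache
            cache[key] = value

def ui_get(key):
    """Called by the UI thread: never blocks on disk, only briefly on the lock."""
    with cache_lock:
        return cache.get(key)     # None means "not fetched yet"

worker = threading.Thread(target=data_thread, daemon=True)
worker.start()
requests.put("chunk_42")          # pre-fetch something we expect to need
time.sleep(0.2)
print(ui_get("chunk_42"))         # already in memory by the time we ask
requests.put(None)
```

If the fetch is too slow, `ui_get` simply returns nothing and the UI must either show a placeholder or fall back to a blocking read, which is exactly the failure case the slide asks about.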
Caching
• Caching is directly related to pre-fetching:
  • Pre-fetching is the mechanism for grabbing data.
  • Caching is the strategy for storing data that might be used in the future.
• Pre-fetching is great, but what should be kept around?
• The cache-miss rate is a metric for judging the quality of a caching strategy.
Caching (illustration)
• Prediction: “I think he’s going for ear, head, and leg!” Ear, Head, and Leg are pre-fetched into main memory.
• View 1: Head. Check! Already in memory.
• View 2: Ear. Check! Already in memory.
• Change of plan: the user decides to look at the tail.
• Tail is not in memory. Where do we put it, and which cached piece do we give up?
Buffer vs. Cache
• Pre-fetching without caching can be thought of as “buffering.”
• The main challenge in designing a caching strategy is determining whether a piece of data could be reused in the future.
• Another metric for a caching strategy is the size of the cache vs. the amount of data that has to be re-fetched.
Examples of Caching Algorithms
• Belady’s Algorithm (the clairvoyant algorithm):
  • Method: replace the cached item that will not be used for the longest time in the future.
  • Problem: requires being able to predict the future.
  • Serves as the (optimal) benchmark.
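A toy version of Belady's policy, workable here only because the entire future access sequence is handed to the function up front, which is exactly why it can only serve as a benchmark:

```python
# Toy Belady (clairvoyant) eviction: evict the cached item whose next
# use lies farthest in the future. Requires knowing the full access sequence.

def belady_misses(accesses, cache_size):
    cache, misses = set(), 0
    for i, item in enumerate(accesses):
        if item in cache:
            continue                      # hit: nothing to do
        misses += 1
        if len(cache) >= cache_size:
            # For each cached item, find how soon it is needed again.
            def next_use(c):
                future = accesses[i + 1:]
                return future.index(c) if c in future else float("inf")
            cache.remove(max(cache, key=next_use))   # evict the latest-needed one
        cache.add(item)
    return misses

print(belady_misses(["head", "ear", "head", "leg", "tail", "head"], 2))  # 4 misses
```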
Examples of Caching Algorithms
• Least Recently Used (LRU):
  • Maintains a queue ordered by recency of use.
  • Whenever a data item is used, move it to the front.
  • Evict from the back.
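A compact LRU sketch, using Python's OrderedDict as the recency-ordered queue; the Ear/Head/Leg/Tail example from the following slides is replayed at the bottom:

```python
# Sketch: a tiny LRU cache. An OrderedDict plays the role of the recency queue:
# touched items move to one end, and evictions come from the other end.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                   # cache miss
        self.items.move_to_end(key)       # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used

cache = LRUCache(3)
for part in ["ear", "head", "leg"]:
    cache.put(part, f"view of {part}")    # pre-fetched pieces
cache.get("head")                         # View 1: Head
cache.get("ear")                          # View 2: Ear
cache.put("tail", "view of tail")         # evicts "leg", the least recently used
print(list(cache.items))                  # ['head', 'ear', 'tail']
```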
Caching with LRU (illustration)
• Initial priority order in main memory: Ear, Head, Leg.
• View 1: Head. Check! Head moves to the front: Head, Ear, Leg.
• View 2: Ear. Check! Ear moves to the front: Ear, Head, Leg.
• Change of plan: the user decides to look at the tail.
• Which memory do we erase? Ah, the least recently used! Leg, at the back of the queue, is evicted and Tail is cached in its place.
Examples of Caching Algorithms
• Many others:
  • Least Frequently Used (LFU)
  • Most Recently Used (MRU)
  • …
• None of them is perfect: predicting a user’s behavior is hard!
Levels of Detail
• Two types of LOD:
  • Continuous
    • Requires a “mathematical” definition of the model
    • Pros: no need for pre-computation
    • Cons: the math model can be hard to derive, and the computation might not scale
  • Discrete
    • Requires pre-computation of all possible LODs; results are usually stored in a hierarchical tree structure
    • Pros: fast fetching time
    • Cons: storage requirements can be prohibitive; might cause “popping”
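A sketch of the discrete case: the levels (hypothetical names and Google-Maps-style zoom thresholds, chosen for illustration) are assumed to have been pre-computed offline, so the only run-time work is picking one.

```python
# Sketch: discrete LOD selection. The levels are assumed to have been
# pre-computed offline (coarsest first); at run time we only pick one,
# so fetching is fast but transitions between levels can "pop".

levels = [
    {"max_zoom": 1,  "tile": "world_overview"},   # coarsest
    {"max_zoom": 4,  "tile": "country_level"},
    {"max_zoom": 10, "tile": "city_level"},
    {"max_zoom": 18, "tile": "street_level"},     # finest
]

def select_lod(zoom):
    """Return the coarsest pre-computed level that satisfies the requested zoom."""
    for level in levels:
        if zoom <= level["max_zoom"]:
            return level["tile"]
    return levels[-1]["tile"]

print(select_lod(3))    # country_level
print(select_lod(12))   # street_level
```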
Discrete Levels of Detail Example
• Google Maps