Data Organization
Problem • Huge amounts of information • How do I find • Information that I know I want • Information related to what I want • How do I understand • Particular pieces of information • The whole collection of information
Limitations • Screen space • Network bandwidth • Bandwidth - how much information can be transmitted per second • Human attention
Kinds of things to organize • Menu items • MS Word - about 150 menu items • Text • Pages in a book - 500 • Documents on the WWW - gazillions • Images • All of the pictures created in a commercial advertising company
Kinds of things to organize • Sounds • Sound tracks to all TV and Radio news broadcasts • Video • A complete collection of classic movies • Structured information (records) • People • Cars • Students • Electronic appliance parts
A question of scale • 10 things • 100 things - menu • 1,000 things - files on your computer • 10,000 things - students at a university • 1,000,000 things - books in a library • gazillion things - WWW pages
Three ways to find things • Lists • arrays • Trees • organize into categories • Search • describe what you want and have the computer find it
Finding things in lists • How long will it take to find “Ron Dallin” in the Provo/Orem phone book? • How long will it take to find “764-0588” in the Provo/Orem phone book?
Binary search - for “Goodrich” • Lower = 0 • Upper = 10 • Guess = (0+10)/2 = 5
Binary search - for “Goodrich” • Lower = 0 • Upper = 5 • Guess = (0+5)/2 = 2 (rounded down)
Binary search - for “Goodrich” • Lower = 2 • Upper = 5 • Guess = (2+5)/2 = 3 (rounded down)
Binary search - for “Goodrich” • Lower = 3 • Upper = 5 • Guess = (3+5)/2 = 4
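The four steps above can be sketched in Python. This is a minimal illustration, not the deck's own code: the slides move a bound onto the guess itself, while this version moves one position past the guess so the loop is guaranteed to stop. The first five names come from the slides; the remaining six are hypothetical filler chosen to make an 11-entry list.

```python
def binary_search(sorted_items, target):
    """Return (position, guesses); position is None if target is absent."""
    lower, upper = 0, len(sorted_items) - 1
    guesses = []
    while lower <= upper:
        guess = (lower + upper) // 2  # midpoint, rounded down
        guesses.append(guess)
        if sorted_items[guess] == target:
            return guess, guesses
        if sorted_items[guess] < target:
            lower = guess + 1  # target is above the guess
        else:
            upper = guess - 1  # target is below the guess
    return None, guesses


# Anderson through Goodrich appear in the slides; the rest are hypothetical.
names = [
    "Anderson", "Bilinski", "Clark", "Garcia", "Goodrich", "Hansen",
    "Jensen", "Larsen", "Olsen", "Smith", "Young",
]

position, guesses = binary_search(names, "Goodrich")
print(position, guesses)  # 4 [5, 2, 3, 4]
```

With this sample list the sequence of guesses (5, 2, 3, 4) is the same as in the slides, even though the bounds are tightened slightly differently.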
Binary search • If there are 64 things in a list, how many times can you divide that list in half? • 32, 16, 8, 4, 2, 1 • 6 times
Binary search • If there are 1024 things in a list, how many times can you divide that list in half? • 512, 256, 128, 64, 32, 16, 8, 4, 2, 1 • 10 times
Binary search • If the size of the list doubles, how many more steps are required in a binary search? • 1 more step
Binary search • If there are N items in a list, then binary search takes • about log2(N) steps
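The halving argument can be checked with a few lines of Python. This is only a sketch; the function name `halvings` is made up for illustration.

```python
import math


def halvings(n):
    """Count how many times n can be halved (rounding down) before reaching 1."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps


for size in (64, 1024, 2048):
    # For powers of two, the count equals log2(size) exactly.
    print(size, halvings(size), math.log2(size))
```

Doubling the list from 1024 to 2048 items raises the count from 10 to 11, which is the "one more step" from the slide.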
Binary search • Estimating log2(N) • Count the number of digits and multiply by 2.5 • 1,000 • 4*2.5 = 10 steps • 1,000,000 • 7*2.5 = 17-18 steps • 1,000,000,000 • 10*2.5 = 25 steps • A rough estimate only: the exact values are about 10, 20, and 30 steps
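To see how close the digit-counting rule gets, it can be compared with the exact logarithm. This is a small sketch; `rule_of_thumb` and `exact_steps` are names invented here, and rounding the exact value up to a whole step is an assumption about how steps are counted.

```python
import math


def rule_of_thumb(n):
    """The slide's estimate: number of decimal digits times 2.5."""
    return len(str(n)) * 2.5


def exact_steps(n):
    """log2(n), rounded up to a whole number of steps."""
    return math.ceil(math.log2(n))


for n in (1_000, 400_000, 1_000_000, 1_000_000_000):
    print(n, rule_of_thumb(n), exact_steps(n))
```

The estimate matches at 1,000 and falls a few steps short for larger lists, which is still close enough for judging scale.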
Provo/Orem phone book • How long to find “Ron Dallin”? • 400,000 in Utah County • log2(400,000) is approximately 6*2.5 = 15 steps
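How to find a phone number • The list is not sorted by number, so check each entry in turn • 920-3231 (first entry) • 1 step • 130-2313 (last entry) • 11 steps • Average? • About 5 steps • Average for N entries? • About N/2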
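Searching an unsorted list one entry at a time can be sketched as follows. The first and last numbers are the ones named on the slide; the order and most of the entries in between are hypothetical, chosen only to fill an 11-entry list.

```python
def linear_search_steps(items, target):
    """Return how many entries are examined before target is found, or None."""
    for steps, item in enumerate(items, start=1):
        if item == target:
            return steps
    return None


# Listed in name order, so the phone numbers are effectively unsorted.
# 920-3231 (first) and 130-2313 (last) follow the slide; the arrangement
# of the others is a hypothetical example.
phones = [
    "920-3231", "232-0312", "823-1242", "123-3123", "489-0077", "238-1234",
    "955-6120", "374-5512", "877-2093", "851-4410", "130-2313",
]

print(linear_search_steps(phones, "920-3231"))  # 1
print(linear_search_steps(phones, "130-2313"))  # 11

average = sum(linear_search_steps(phones, p) for p in phones) / len(phones)
print(average)  # 6.0
```

Averaged over every entry, the cost is (N+1)/2 steps, which is roughly the N/2 quoted on the slide.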
Provo/Orem phone book • How many steps to find a phone number? • 400,000/2 = 200,000 steps on average • How can we improve this?
Sort the phone book by phone number • What if I want to search on both name and number?
Using an Index • Last name index, in sorted order: Anderson, Bilinski, Clark, Garcia • Phone number index, in sorted order: 123-3123, 130-2313, 232-0312, 238-1234
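One way to picture an index is as a list of record positions, ordered by one field, that leaves the records themselves where they are. The sketch below assumes this representation; the pairing of names with numbers is hypothetical, and names beyond the slides' first five are filler.

```python
# Records stored in arbitrary order as (last name, phone number).
# Which number belongs to which name is a hypothetical pairing.
records = [
    ("Hansen", "238-1234"), ("Anderson", "920-3231"), ("Young", "130-2313"),
    ("Clark", "823-1242"), ("Garcia", "123-3123"), ("Smith", "851-4410"),
    ("Bilinski", "232-0312"), ("Goodrich", "489-0077"),
]

LAST_NAME, PHONE = 0, 1


def build_index(records, field):
    """Return record positions ordered by the chosen field."""
    return sorted(range(len(records)), key=lambda pos: records[pos][field])


by_name = build_index(records, LAST_NAME)
by_phone = build_index(records, PHONE)

print([records[pos][LAST_NAME] for pos in by_name[:4]])
print([records[pos][PHONE] for pos in by_phone[:4]])
```

Both indexes point into the same `records` list, so the data is stored once and each index is only a different ordering of positions.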
Search the last name index for Goodrich • Lower = 0 • Upper = 10 • Guess = 5 • Goodrich is below the guess
Search the last name index for Goodrich • Lower = 0 • Upper = 5 • Guess = 2 • Goodrich is above the guess
Search the last name index for Goodrich • Lower = 2 • Upper = 5 • Guess = 3 • Goodrich is above the guess
Search the last name index for Goodrich • Lower = 3 • Upper = 5 • Guess = 4 • MATCH
Search the phone number index for 823-1242 • Lower = 0 • Upper = 10 • Guess = 5 • 823-1242 is above the guess
Search the phone number index for 823-1242 • Lower = 5 • Upper = 10 • Guess = 7 • 823-1242 is below the guess
Search the phone number index for 823-1242 • Lower = 5 • Upper = 7 • Guess = 6 • MATCH
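Looking a record up through either index might be sketched like this, using Python's standard `bisect` module for the binary search. The record layout, the `build_index` and `lookup` helpers, and the name-to-number pairings are all assumptions made for illustration.

```python
from bisect import bisect_left

# Hypothetical pairings of the names and numbers that appear in the slides.
records = [
    {"last": "Anderson", "phone": "920-3231"},
    {"last": "Bilinski", "phone": "232-0312"},
    {"last": "Clark", "phone": "823-1242"},
    {"last": "Garcia", "phone": "123-3123"},
    {"last": "Goodrich", "phone": "130-2313"},
]


def build_index(records, field):
    """Return sorted (key, position) pairs for one field."""
    return sorted((rec[field], pos) for pos, rec in enumerate(records))


def lookup(index, records, key):
    """Binary-search the index for key and return the matching record, or None."""
    i = bisect_left(index, (key,))  # (key,) sorts just before any (key, pos)
    if i < len(index) and index[i][0] == key:
        return records[index[i][1]]
    return None


by_name = build_index(records, "last")
by_phone = build_index(records, "phone")

print(lookup(by_name, records, "Goodrich"))
print(lookup(by_phone, records, "823-1242"))
```

Searching by another field, such as first name or city, would only need one more `build_index` call over the same records.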
Using an Index • What about first name or city? • Add another index
Data Organization • What are we organizing for? • Scale • 10 - 1,000 - 1,000,000 - 1,000,000,000 • Lists • Unsorted (about N/2 steps on average) • Sorted (about log2(N) steps) • count the digits and multiply by 2.5 • To access in many ways • Use many indices into the same data