Data Organization


Presentation Transcript


  1. Data Organization

  2. Problem • Huge amounts of information • How do I find • Information that I know I want • Information related to what I want • How do I understand • Particular pieces of information • The whole collection of information

  3. Limitations • Screen space • Network bandwidth • Bandwidth - how much information can be transmitted per second • Human attention

  4. Kinds of things to organize • Menu items • MS Word - about 150 menu items • Text • Pages in a book - 500 • Documents on the WWW - gazillions • Images • All of the pictures created in a commercial advertising company

  5. Kinds of things to organize • Sounds • Sound tracks to all TV and Radio news broadcasts • Video • A complete collection of classic movies • Structured information (records) • People • Cars • Students • Electronic appliance parts

  6. A question of scale • 10 things • 100 things - menu • 1,000 things - files on your computer • 10,000 things - students at a university • 1,000,000 things - books in a library • gazillion things - WWW pages

  7. Three ways to find things • Lists • arrays • Trees • organize into categories • Search • describe what you want and have the computer find it

  8. Finding things in lists • How long will it take to find “Ron Dallin” in the Provo/Orem phone book? • How long will it take to find “764-0588” in the Provo/Orem phone book?

  9. Binary search - for “Goodrich”

  10. Binary search - for “Goodrich” Lower = 0 Upper = 10 Guess = (0+10)/2 = 5

  11. Binary search - for “Goodrich” Lower = 0 Upper = 5 Guess = (0+5)/2 = 2

  12. Binary search - for “Goodrich” Lower = 2 Upper = 5 Guess = (2+5)/2 = 3

  13. Binary search - for “Goodrich” Lower = 3 Upper = 5 Guess = (3+5)/2 = 4
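
The walk-through on slides 10 to 13 can be sketched in Python. The list of names below is illustrative only: the slide images are not part of this transcript, so the entries are assumptions chosen to give the same sequence of guesses (5, 2, 3, 4). Unlike the slides, which reuse the guess as the new bound, this sketch moves one position past the guess so that the loop also stops when the name is not in the list.

```python
def binary_search(items, target):
    """Search a sorted list; return (index or None, trace of (lower, upper, guess))."""
    lower, upper = 0, len(items) - 1
    trace = []
    while lower <= upper:
        guess = (lower + upper) // 2
        trace.append((lower, upper, guess))
        if items[guess] == target:
            return guess, trace
        if items[guess] < target:
            lower = guess + 1   # target sorts after the guess
        else:
            upper = guess - 1   # target sorts before the guess
    return None, trace


# Hypothetical sorted phone-book names (11 entries, positions 0..10).
names = ["Anderson", "Bilinski", "Clark", "Garcia", "Goodrich", "Hansen",
         "Jensen", "Larsen", "Nielsen", "Olsen", "Young"]

index, trace = binary_search(names, "Goodrich")
for lower, upper, guess in trace:
    print(f"Lower = {lower} Upper = {upper} Guess = {guess}")
```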

  14. Binary search • If there are 64 things in a list, how many times can you divide that list in half? • 32, 16, 8, 4, 2, 1 • 6 times

  15. Binary search • If there are 1024 things in a list, how many times can you divide that list in half? • 512, 256, 128, 64, 32, 16, 8, 4, 2, 1 • 10 times

  16. Binary search • If the size of the list doubles, how many more steps are required in a binary search? 1
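
The halving counts on slides 14 to 16 can be checked with a few lines of Python. The function name `halvings` is made up for this sketch.

```python
def halvings(n):
    """Count how many times n can be halved (integer division) before reaching 1."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps


print(halvings(64))     # 6
print(halvings(1024))   # 10
print(halvings(2048) - halvings(1024))  # doubling the list adds 1 step
```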

  17. Binary search • If there are N items in a list then binary search takes • log2(N) steps

  18. Binary search • Estimating log2(N) • Count the number of digits and multiply by 2.5 • 1000 • 4*2.5 = 10 steps • 1,000,000 • 7*2.5 = 17-18 steps • 1,000,000,000 • 10*2.5 = 25 steps
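
As a rough check on the digit-counting rule, the sketch below compares it with the exact value of log2(N) rounded up. The function names are assumptions made for this example. Each decimal digit is worth about log2(10) ≈ 3.32 halvings, so the 2.5 multiplier is a convenient mental shortcut that tends to come in low as N grows.

```python
import math


def estimate_steps(n):
    """Slide 18's rule of thumb: number of decimal digits times 2.5."""
    return len(str(n)) * 2.5


def exact_steps(n):
    """log2(n), rounded up to a whole number of steps."""
    return math.ceil(math.log2(n))


for n in (1000, 1_000_000, 1_000_000_000):
    print(n, estimate_steps(n), exact_steps(n))
```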

  19. Provo/Orem phone book • How long to find “Ron Dallin” • 400,000 in Utah county • Log2(400,000) approx 6*2.5 = 15 steps

  20. How to find a phone number • 920-3231 • 1 step • 130-2313 • 11 steps • Average? • 5 steps • Average N? • N/2

  21. Provo/Orem phone book • How many steps to find a phone number? • 400,000/2 = 200,000 average • How can we improve this?
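
Searching for a phone number in a book sorted by name means checking entries one at a time. A minimal sketch follows, assuming an 11-entry list; only some of the numbers appear in the slides, and the rest are invented padding. Averaged over every entry, the exact cost is (N+1)/2 steps, which the slides round to N/2.

```python
def linear_search(items, target):
    """Return how many entries were checked before finding target, or None."""
    for steps, item in enumerate(items, start=1):
        if item == target:
            return steps
    return None


def average_steps(items):
    """Average number of checks when each entry is searched for once."""
    return sum(linear_search(items, x) for x in items) / len(items)


# Phone numbers in name order, so unsorted by number.
# Entries marked "made up" do not come from the slides.
phones = ["920-3231", "238-1234", "555-0101",   # third entry made up
          "123-3123", "555-0102", "823-1242",   # fifth entry made up
          "555-0103", "232-0312", "555-0104",   # seventh and ninth made up
          "555-0105", "130-2313"]               # tenth entry made up

print(linear_search(phones, "920-3231"))  # first entry
print(linear_search(phones, "130-2313"))  # last entry
print(average_steps(phones))
```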

  22. Sort the phone book by phone number • What if I want to search on both name and number?

  23. Using an Index • Columns: Phone number, Last Name

  24. Using an Index • Last-name index so far: Anderson

  25. Using an Index • Last-name index so far: Anderson, Bilinski

  26. Using an Index • Last-name index so far: Anderson, Bilinski, Clark

  27. Using an Index • Last-name index so far: Anderson, Bilinski, Clark, Garcia

  28. Using an Index • Phone-number index so far: 123-3123

  29. Using an Index • Phone-number index so far: 123-3123, 130-2313

  30. Using an Index • Phone-number index so far: 123-3123, 130-2313, 232-0312

  31. Using an Index • Phone-number index so far: 123-3123, 130-2313, 232-0312, 238-1234

  32. Last Name Search for Goodrich Lower = 0 Upper = 10 Guess = 5 lower

  33. Last Name Search for Goodrich Lower = 0 Upper = 5 Guess = 2 above

  34. Last Name Search for Goodrich Lower = 2 Upper = 5 Guess = 3 above

  35. Last Name Search for Goodrich Lower = 3 Upper = 5 Guess = 4 above

  36. Phone number Search for 823-1242 Lower = 0 Upper = 10 Guess = 5 above

  37. Phone number Search for 823-1242 Lower = 5 Upper = 10 Guess = 7 below

  38. Phone number Search for 823-1242 Lower = 5 Upper = 7 Guess = 6 MATCH

  39. Using an Index • Columns: Phone number, Last Name • What about first name or city? • Another index
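
One way to picture several indices over the same data: keep the records in a single list and build one sorted list of (key, position) pairs per field. This is a minimal sketch, not the slides' own implementation. The pairing of names with numbers is invented (the slides never show which name has which number), and `build_index` and `lookup` are hypothetical helper names.

```python
from bisect import bisect_left

NAME, PHONE = 0, 1  # field positions within each record

# Hypothetical records; stored once, in no particular order.
records = [
    ("Goodrich", "823-1242"),
    ("Anderson", "238-1234"),
    ("Clark", "123-3123"),
    ("Garcia", "130-2313"),
    ("Bilinski", "232-0312"),
]


def build_index(records, field):
    """Sorted list of (key, position) pairs pointing back into records."""
    return sorted((rec[field], pos) for pos, rec in enumerate(records))


def lookup(index, records, key):
    """Binary-search the index for key; return the matching record or None."""
    i = bisect_left(index, (key,))  # (key,) sorts just before any (key, pos)
    if i < len(index) and index[i][0] == key:
        return records[index[i][1]]
    return None


name_index = build_index(records, NAME)
phone_index = build_index(records, PHONE)

print(lookup(name_index, records, "Goodrich"))
print(lookup(phone_index, records, "823-1242"))
```

Adding a first-name or city field would mean one more call to `build_index`; the records themselves are not copied or re-sorted.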

  40. Data Organization • What are we organizing for? • Scale • 10 - 1,000 - 1,000,000 - 1,000,000,000 • Lists • Unsorted (N/2) • Sorted (Log2(N)) • count the digits and multiply by 2.5 • To access in many ways • Use many indices into the same data