Linear Search Efficiency Assessment


Presentation Transcript


  1. Linear Search Efficiency Assessment P. Pete Chong Gonzaga University Spokane, WA 99258-0009 chong@gonzaga.edu

  2. Why Search Efficiency? • Information systems help users obtain the “right” information for better decision making, which requires searching • Better organization reduces the time needed to find this “right” info, which requires sorting • Are the savings in search worth the cost of the sort?

  3. Search Cost • If we assume random access, the average cost of a linear search is (n+1)/2 comparisons. That is, for 1000 records, the average search cost is approximately 1000/2 = 500. • For large n, we may use n/2 to simplify the calculation
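  A minimal Python sketch (not from the original slides) that checks this average empirically; the function name and trial count are illustrative:

      import random

      def average_random_search_cost(n, trials=100_000):
          """Average comparisons to find a uniformly random record by
          scanning a list of n records from the front."""
          total = 0
          for _ in range(trials):
              target = random.randrange(n)   # position of the requested record
              total += target + 1            # comparisons = 1-based position
          return total / trials

      # For n = 1000 this prints a value close to (n + 1) / 2 = 500.5.
      print(average_random_search_cost(1000))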

  4. Search Cost • In reality, the usage pattern is not random • For payroll, for example, every record is accessed once and only once; in this case sorting has no effect on search efficiency • Most of the time the distribution follows the 80/20 Rule (the Pareto Principle)

  5. The Pareto Principle

  6. Author productivity data (i = group index, ni = papers per author in group i, f(ni) = number of authors in the group, xi = cumulative author proportion, qi = cumulative paper proportion):

      i    ni  f(ni)  ni·f(ni)  cum. authors  cum. papers     xi      qi
     26   242      1       242             1           242  0.003   0.137
     25   114      1       114             2           356  0.005   0.202
     24   102      1       102             3           458  0.008   0.260
     23    95      1        95             4           553  0.011   0.314
     22    58      1        58             5           611  0.014   0.347
     21    49      1        49             6           660  0.016   0.374
     20    34      1        34             7           694  0.019   0.394
     19    22      2        44             9           738  0.024   0.419
     18    21      2        42            11           780  0.030   0.442
     17    20      2        40            13           820  0.035   0.465
     16    18      1        18            14           838  0.038   0.475
     15    16      4        64            18           902  0.049   0.512
     14    15      2        30            20           932  0.054   0.529
     13    14      1        14            21           946  0.057   0.537
     12    12      2        24            23           970  0.062   0.550
     11    11      5        55            28          1025  0.076   0.581
     10    10      3        30            31          1055  0.084   0.598
      9     9      4        36            35          1091  0.095   0.619
      8     8      8        64            43          1155  0.116   0.655
      7     7      8        56            51          1211  0.138   0.687
      6     6      6        36            57          1247  0.154   0.707
      5     5     10        50            67          1297  0.181   0.736
      4     4     17        68            84          1365  0.227   0.774
      3     3     29        87           113          1452  0.305   0.824
      2     2     54       108           167          1560  0.451   0.885
      1     1    203       203           370          1763  1.000   1.000

     Total number of groups: 26. Average number of publications per author (m = R/T): 4.7649.

  7. A Typical Pareto Curve

  8. Formulate the Pareto Curve Chen et al. (1994) define f(ni) = the number of authors with ni papers, T = Σ f(ni) = the total number of authors, R = Σ ni·f(ni) = the total number of papers, and m = R/T = the average number of published papers per author

  9. Formulate the Pareto Curve For each index level i, let xi be the cumulative fraction of the total number of authors and qi be the cumulative fraction of the total papers published; then xi = (Σj≥i f(nj)) / T and qi = (Σj≥i nj·f(nj)) / R, cumulating from the most productive group down, as in the table on slide 6.
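  A sketch (variable names are illustrative) that recomputes T, R, m, and the xi and qi columns from the (ni, f(ni)) pairs in the slide 6 table:

      # (ni, f(ni)) pairs from the table on slide 6, sorted by ni descending.
      pairs = [(242, 1), (114, 1), (102, 1), (95, 1), (58, 1), (49, 1),
               (34, 1), (22, 2), (21, 2), (20, 2), (18, 1), (16, 4),
               (15, 2), (14, 1), (12, 2), (11, 5), (10, 3), (9, 4),
               (8, 8), (7, 8), (6, 6), (5, 10), (4, 17), (3, 29),
               (2, 54), (1, 203)]

      T = sum(f for _, f in pairs)        # total authors: 370
      R = sum(n * f for n, f in pairs)    # total papers: 1763
      m = R / T                           # papers per author: 4.7649

      cum_authors = cum_papers = 0
      for n_i, f_i in pairs:
          cum_authors += f_i
          cum_papers += n_i * f_i
          x_i = cum_authors / T           # cumulative author proportion
          q_i = cum_papers / R            # cumulative paper proportion
          print(f"ni={n_i:3d}  xi={x_i:.3f}  qi={q_i:.3f}")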

  10. Formulate the Pareto Curve Plugging these values into the slope (qi − qi+1)/(xi − xi+1), and noting that qi − qi+1 = ni·f(ni)/R while xi − xi+1 = f(ni)/T, Chen et al. derive the slope formula: si = ni·T/R = ni/m. When ni = 1, si = 1/m = T/R; let us call this particular slope a.

  11. Revisit the Pareto Curve a = 370/1763 = 0.21

  12. The Significance • We now have a quick way to quantify different usage concentrations • Simulation shows that in most situations a moderate sample size is sufficient to assess the usage concentration • The inverse of the average usage (a = T/R) is easy to calculate

  13. Search Cost Calculation • The search cost for a randomly distributed list is n/2; thus, for 1000 records, the search cost is 500. • For a list with an 80/20 distribution, the search cost is (200/2)(80%) + [(200+1000)/2](20%) = 200, a saving of 60%
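  The same arithmetic as a Python sketch; pareto_search_cost is an illustrative name, with the 80/20 split passed in as a and b:

      def pareto_search_cost(n, a=0.8, b=0.2):
          """Expected linear-search cost when fraction a of the accesses
          hit the first fraction b of a usage-sorted list of n records."""
          hot = (b * n) / 2           # average position in the hot region
          cold = (b * n + n) / 2      # average position in the remaining records
          return a * hot + b * cold

      print(pareto_search_cost(1000))   # 200.0, a 60% saving over n/2 = 500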

  14. Search Cost Calculation Let the first number in the 80/20 rule be a and the second number be b. Since these two numbers are actually percentages, we have a + b = 1. Thus, the expected search cost for a list of n records is the weighted average: (bn/2)(a) + [(bn+n)/2](b) = (bn/2)(a + b + 1) = (bn/2)(2) = bn
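  A quick numeric check (illustrative, not from the slides) that the weighted average collapses to bn whenever a + b = 1:

      n = 1000
      for a in (0.8, 0.7, 0.9):
          b = 1 - a
          cost = (b * n / 2) * a + ((b * n + n) / 2) * b
          assert abs(cost - b * n) < 1e-9   # the identity (bn/2)(a+b+1) = bn
      print("weighted average equals b*n for every tested split")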

  15. Search Cost Calculation • Thus, b indicates the cost of search as a percentage of the records in the list, and bn represents an upper bound on the expected number of comparisons. • For a list fully sorted by usage with an 80/20 distribution, Knuth (1973) has shown that the average search cost C(n) is only 0.122n.
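  A simulation sketch that reproduces a figure of this order, assuming a self-similar 80/20 access rule with P(position ≤ i) = (i/n)^θ and θ = log 0.80 / log 0.20 ≈ 0.1386; this distributional choice is an assumption, not taken from the slides:

      import math
      import random

      def simulate_sorted_cost(n, trials=200_000):
          """Average search position under a self-similar 80/20 rule,
          sampled by inverting the CDF F(i) = (i/n)**theta."""
          theta = math.log(0.80) / math.log(0.20)   # about 0.1386
          total = 0
          for _ in range(trials):
              u = random.random()
              total += math.ceil(n * u ** (1 / theta))  # inverse-CDF sampling
          return total / trials

      # For n = 1000 this prints roughly 122, i.e. C(n) is about 0.122n.
      print(simulate_sorted_cost(1000))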

  16. Search Cost Simulation

  17. Search Cost Simulation

  18. Search Cost Estimate Regression analyses yield: b = 0.15 + 0.359a, for 0.2 < a < 1.0; b = 0.034 + 0.984a, for 0 < a < 0.2; and C(n) = 0.02 + 0.49a.
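  The regression estimates as a Python sketch; the function names are illustrative, and both b and C(n) are read as fractions of the list length n:

      def estimate_b(a):
          """Upper-bound search-cost fraction b, from the regressions above."""
          if a < 0.2:
              return 0.034 + 0.984 * a
          return 0.15 + 0.359 * a        # for 0.2 < a < 1.0

      def estimate_C(a):
          """Search-cost fraction C(n)/n for a list fully sorted by usage."""
          return 0.02 + 0.49 * a

      a = 370 / 1763                     # about 0.21, from the author data
      print(estimate_b(a), estimate_C(a))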

  19. Conclusion • The true search cost lies between the estimates of b and C(n) • We may use C(n) ≈ 0.5a as a way to quickly estimate the search cost of a fully sorted list. • That is, take a moderate sample of usage; the search cost will be about half the inverse of the average usage, times the total number of records.

  20. “Far-fetched” (?) Applications • Define and assess the degree of monopoly? What is the effect of monopoly? Note the gap between b and C(n) (the ideal). • The Gini index?
