
Collections and Algorithms 2

Collections and Algorithms 2: selecting, hashing, trees, and alternatives to the JCF. Triangle JUG presentation, April 21, 2003, by Stuart Halloway and Jim Scarborough. Presenter: Stuart Halloway, Chief Technical Officer at DevelopMentor (www.develop.com), which provides training for software developers.


Presentation Transcript


  1. Collections and Algorithms 2 • Selecting, hashing, trees, and alternatives to the JCF • Triangle JUG presentation • April 21, 2003 • Stuart Halloway and Jim Scarborough

  2. Presenter: Stuart Halloway • Chief Technical Officer at DevelopMentor (www.develop.com) • provides training for software developers • Author • Component Development for the Java Platform • Moderator of the Advanced Java mailing list • discuss.develop.com

  3. Presenter: Jim Scarborough • Java consultant for Science Applications International Corporation • Developing simulations, GUIs, and various calculations for air and water quality efforts. • NCSU Masters Student in Computer Science

  4. Overview • JCF collections and algorithms (continued) • selection • hashing • trees • Choosing the right structures for your programs • beyond "Big O" • profiling • alternatives to the JCF

  5. Selection • Minimum / Maximum • The nth element

  6. Minimum or Maximum
     public int minimum(int[] a) {
       int min = a[0];
       for (int i = 1; i < a.length; i++) {
         if (a[i] < min) min = a[i];
       }
       return min;
     }

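  7. Minimum and Maximum in 3/2 n
     public int[] minAndMax(int[] a) {
       int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
       // Compare elements in pairs: 3 comparisons for every 2 elements
       for (int i = 0; i < a.length / 2; i++) {
         int low, high;
         if (a[i*2] < a[i*2+1]) { low = a[i*2]; high = a[i*2+1]; }
         else { low = a[i*2+1]; high = a[i*2]; }
         if (low < min) min = low;
         if (high > max) max = high;
       }
       // Check the last element if the length is odd
       if (a.length % 2 == 1) {
         int last = a[a.length - 1];
         if (last < min) min = last;
         if (last > max) max = last;
       }
       return new int[] { min, max };
     }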

  8. nth element – O(n) expected, O(n^2) worst case
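
The body of this slide did not survive in the transcript. As a minimal sketch, assuming the slide refers to randomized quickselect (linear expected time, quadratic worst case), selection of the nth smallest element might look like this. The class name `QuickSelect` and the method `select` are illustrative, not taken from the slides.

```java
import java.util.Random;

// Illustrative sketch of randomized quickselect; names are not from the slides.
final class QuickSelect {
    private static final Random RANDOM = new Random();

    /** Returns the n-th smallest element (n = 0 is the minimum). Works on a copy. */
    static int select(int[] input, int n) {
        if (n < 0 || n >= input.length) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        int[] a = input.clone();
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (n == p) return a[p];
            if (n < p) hi = p - 1; else lo = p + 1;   // keep only the side holding n
        }
        return a[lo];
    }

    // Lomuto partition around a randomly chosen pivot; returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        swap(a, lo + RANDOM.nextInt(hi - lo + 1), hi);
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Each pass discards the side of the partition that cannot contain position n, which is why the expected work is linear rather than the n log n of a full sort.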

  9. nth element – O(n) worst case

  10. Hashing • Purposes? • Data structures?

  11. Collections with Hashing • [Figure: class diagram showing the interfaces Map and SortedMap and the classes AbstractMap, HashMap, IdentityHashMap, WeakHashMap, LinkedHashMap, TreeMap, Dictionary, Hashtable, Properties, PrinterStateReasons, and HashSet]

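  12. Writing your own hashCode()
      public class SimpleDate {
        public int year, month, day;
        public int hashCode() {
          // Parentheses are required: + binds tighter than <<
          return (year << 9) + (month << 5) + day;
        }
        public boolean equals(Object o) {
          // Comparing hashes is valid only because the hash is unique
          // for any date with month < 16 and day < 32
          return (o != null)
              && o.getClass().equals(getClass())
              && hashCode() == ((SimpleDate) o).hashCode();
        }
      }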

  13. SimpleDate Hash Bits
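
The figure for this slide is missing from the transcript. Assuming it showed the layout implied by the shifts of 9 and 5 in `SimpleDate.hashCode()` (day in bits 0 to 4, month in bits 5 to 8, year from bit 9 upward), a small sketch of packing and unpacking those fields follows. `DateBits` and its methods are hypothetical helper names.

```java
// Illustrative sketch of the bit layout implied by shifts of 9 and 5.
final class DateBits {
    /** Packs a date: day in bits 0-4, month in bits 5-8, year from bit 9 upward. */
    static int pack(int year, int month, int day) {
        return (year << 9) | (month << 5) | day;
    }

    /** Recovers the year field (bits 9 and up). */
    static int year(int packed) { return packed >>> 9; }

    /** Recovers the month field (4 bits, so values up to 15). */
    static int month(int packed) { return (packed >>> 5) & 0xF; }

    /** Recovers the day field (5 bits, so values up to 31). */
    static int day(int packed) { return packed & 0x1F; }
}
```

The parentheses matter: in Java `+` binds tighter than `<<`, so an unparenthesized `year<<9 + month<<5 + day` parses as `(year << (9 + month)) << (5 + day)`.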

  14. Division Method h(k) = k mod m • k represents your data as an integer • m works best when it is prime and worst when it is a power of 2 • m is also the number of slots in the table
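
A minimal sketch of the division method, assuming `int` keys. `DivisionHash` and `slot` are illustrative names, and `Math.floorMod` is used (rather than `%`) so that negative keys still land in a valid slot.

```java
// Illustrative sketch of the division method, h(k) = k mod m.
final class DivisionHash {
    private final int m; // number of slots in the table

    DivisionHash(int m) {
        if (m <= 0) throw new IllegalArgumentException("m must be positive");
        this.m = m;
    }

    /** Maps any int key to a slot in [0, m); floorMod keeps negative keys in range. */
    int slot(int k) {
        return Math.floorMod(k, m);
    }
}
```

With m = 16 the slot is just the low four bits of the key, so keys 16, 32, and 48 all collide in slot 0. With m = 13 the same keys land in slots 3, 6, and 9.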

  15. Multiplication Method • A = (5^0.5 - 1)/2 = 0.6180339887… • m = whatever you want! (2^32 is tidy)

  16. Integer Multiplication Method
      static final long S = (long) (((Math.sqrt(5) - 1) / 2) * 0x100000000L);
      public int hash(int k) {
        // Keep the low 32 bits of k*S; the mask needs the L suffix to be a long
        return (int) ((k * S) & 0xFFFFFFFFL);
      }
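
The 32-bit result still has to be reduced to a table index. One common approach, sketched here as an assumption rather than something the slides state, is to size the table at 2^p slots and keep the top p bits of the hash. `MultiplicativeHash` and `slot` are illustrative names.

```java
// Illustrative sketch: turning the 32-bit multiplicative hash into a table index.
final class MultiplicativeHash {
    // floor(A * 2^32) with A = (sqrt(5) - 1) / 2, as on the slide.
    static final long S = (long) (((Math.sqrt(5) - 1) / 2) * 0x100000000L);

    /** Low 32 bits of k * S. */
    static int hash(int k) {
        return (int) ((k * S) & 0xFFFFFFFFL);
    }

    /** Slot in a table of 2^p slots (1 <= p <= 31): the top p bits of hash(k). */
    static int slot(int k, int p) {
        if (p < 1 || p > 31) throw new IllegalArgumentException("p must be in 1..31");
        return hash(k) >>> (32 - p);   // unsigned shift keeps the result non-negative
    }
}
```

The top bits are used because they depend on every bit of the key, whereas the low bits of the product depend only on the low bits of the key.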

  17. Collisions • Chaining • Open Addressing • Linear Probing • h(k,i) = (h'(k) + i) mod m • Quadratic • h(k,i) = (h'(k) + c1·i + c2·i^2) mod m • Double Hashing • h(k,i) = (h1(k) + i·h2(k)) mod m
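
The three open-addressing formulas can be sketched directly as functions of the base hash value and the probe number i. The class `Probing` and its method names are illustrative; the base hashes are passed in as plain integers so the example stays independent of any particular hash function.

```java
// Illustrative sketch of the three probe-sequence formulas for open addressing.
final class Probing {
    /** Linear probing: h(k,i) = (h'(k) + i) mod m. */
    static int linear(int h, int i, int m) {
        return (int) Math.floorMod((long) h + i, (long) m);
    }

    /** Quadratic probing: h(k,i) = (h'(k) + c1*i + c2*i^2) mod m. */
    static int quadratic(int h, int i, int c1, int c2, int m) {
        long offset = (long) c1 * i + (long) c2 * i * i;
        return (int) Math.floorMod((long) h + offset, (long) m);
    }

    /** Double hashing: h(k,i) = (h1(k) + i*h2(k)) mod m. */
    static int doubleHash(int h1, int h2, int i, int m) {
        return (int) Math.floorMod((long) h1 + (long) i * h2, (long) m);
    }
}
```

For double hashing to visit every slot, h2(k) must be nonzero and relatively prime to m, which is one reason a prime table size is convenient.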

  18. *NIX Password Authentication • crypt(3) • Modified version of DES encryption • http://www.itl.nist.gov/fipspubs/fip46-2.htm • The password is the shared secret, a 56 bit key (7 bits for each of 8 characters). • Something standard (usually all zeroes) is encrypted repeatedly using the password. • Salt perturbs the algorithm in one of 4096 ways. • When you type the password, if its encrypted version matches the stored gobbledygook, you are granted access.

  19. Speed • Hashing gives O(1) performance • Universal hash functions foil diabolical inputs • h_{a,b}(k) = ((a·k + b) mod p) mod m • Not instantaneous!
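
A minimal sketch of one member of the universal family h_{a,b}, assuming non-negative keys smaller than the prime p = 2^31 - 1 and coefficients a and b drawn at random when the table is created. `UniversalHash` and `slot` are illustrative names.

```java
import java.util.Random;

// Illustrative sketch of h_{a,b}(k) = ((a*k + b) mod p) mod m.
final class UniversalHash {
    private static final long P = 2147483647L; // the prime 2^31 - 1; keys assumed in [0, P)
    private final long a; // chosen at random from [1, P-1]
    private final long b; // chosen at random from [0, P-1]
    private final int m;  // number of slots

    UniversalHash(int m, Random random) {
        if (m <= 0) throw new IllegalArgumentException("m must be positive");
        this.m = m;
        this.a = 1 + (long) random.nextInt((int) (P - 1));
        this.b = random.nextInt((int) P);
    }

    /** Slot for key k; a*k stays below 2^62, so the long arithmetic cannot overflow. */
    int slot(int k) {
        if (k < 0 || k >= P) throw new IllegalArgumentException("key must be in [0, p)");
        return (int) (((a * k + b) % P) % m);
    }
}
```

Because a and b are picked at run time, nobody who chooses the keys in advance can know which of them will collide.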

  20. Review • Selection • Better than n-1 comparisons for min & max • O(n) worst case for selecting any particular element • Hashing • Hashing is simple when you can make it fit in an int. • Division method has its limitations. • Multiplication rules! • There are several schemes for resolving collisions.

  21. Trees • A tree is an undirected, connected acyclic graph • Certain trees perform well for ordered sets • binary search tree • red/black tree • B-tree • [Figure: example tree with nodes 12, 7, 14, 3]

  22. Binary search trees • A binary search tree begins at a root node • Each node can have a maximum of two children • Left child <= its parent • Right child >= its parent • [Figure: example binary search tree with nodes 12, 7, 14, 3]

  23. Binary search tree performance

  24. Traversal • Inorder traversal returns values in sorted order • report left subtree • report parent • report right subtree • Traversal implicitly defines successor and predecessor • [Figure: the example tree annotated with inorder visit order: 3 (1st), 7 (2nd), 12 (3rd), 14 (4th)]
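
The three traversal steps map directly onto a recursive method. This sketch assumes the example tree has 12 at the root, 7 and 14 as its children, and 3 as the left child of 7; `InorderDemo`, `Node`, and `exampleTree` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of inorder traversal on a small binary search tree.
final class InorderDemo {
    static final class Node {
        final int value;
        final Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    /** Appends the subtree's values to out: left subtree, then parent, then right subtree. */
    static void inorder(Node node, List<Integer> out) {
        if (node == null) return;
        inorder(node.left, out);
        out.add(node.value);
        inorder(node.right, out);
    }

    /** Assumed shape of the slide's example: 12 at the root, 7 and 14 below, 3 under 7. */
    static Node exampleTree() {
        Node three = new Node(3, null, null);
        return new Node(12, new Node(7, three, null), new Node(14, null, null));
    }

    public static void main(String[] args) {
        List<Integer> out = new ArrayList<>();
        inorder(exampleTree(), out);
        System.out.println(out);
    }
}
```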

  25. Insert and select • Walk down the tree, turning left for <, right for > • number of steps limited by height of the tree • height of a balanced tree is log n • [Figure: inserting 8 into the example tree: 8 < 12, so go left; 8 > 7, so go right; 8 becomes the right child of 7]
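
The walk described above can be sketched as an iterative insert and lookup on an unbalanced binary search tree. `SimpleBst` is an illustrative class, and ignoring duplicate keys is a simplifying assumption of this sketch.

```java
// Illustrative sketch of insert and lookup in an unbalanced binary search tree.
final class SimpleBst {
    private static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    /** Walks down from the root, left for smaller keys and right for larger; duplicates are ignored. */
    boolean insert(int key) {
        if (root == null) { root = new Node(key); return true; }
        Node current = root;
        while (true) {
            if (key < current.key) {
                if (current.left == null) { current.left = new Node(key); return true; }
                current = current.left;
            } else if (key > current.key) {
                if (current.right == null) { current.right = new Node(key); return true; }
                current = current.right;
            } else {
                return false;
            }
        }
    }

    /** Same walk as insert, stopping when the key is found or the path runs out. */
    boolean contains(int key) {
        Node current = root;
        while (current != null && current.key != key) {
            current = key < current.key ? current.left : current.right;
        }
        return current != null;
    }

    /** Number of nodes on the longest path from the root down to a leaf. */
    int height() { return height(root); }

    private static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }
}
```

Inserting 12, 7, 14, 3, 8 in that order gives a tree of height 3, while inserting 5, 4, 3, 2, 1 gives a chain of height 5, so the same code degrades to a linked list on sorted input.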

  26. Deletion • Deletion has three cases • no children (trivial) • one child (trivial) • two children (install successor) • [Figure: example tree with nodes 10, 7, 14, 3, 12, 17, 11]
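
A recursive sketch of the three deletion cases. For the two-children case it copies the successor's key into the node and then deletes the successor from the right subtree, which is one of several ways to "install" the successor. `BstDelete` and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of binary search tree deletion (no balancing).
final class BstDelete {
    static final class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    /** Inserts key into the subtree rooted at n and returns the subtree's root. */
    static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        return n;
    }

    /** Deletes key from the subtree rooted at n and returns the subtree's new root. */
    static Node delete(Node n, int key) {
        if (n == null) return null;
        if (key < n.key) {
            n.left = delete(n.left, key);
        } else if (key > n.key) {
            n.right = delete(n.right, key);
        } else {
            if (n.left == null) return n.right;   // no children, or right child only
            if (n.right == null) return n.left;   // left child only
            Node successor = n.right;             // two children: leftmost node on the right
            while (successor.left != null) successor = successor.left;
            n.key = successor.key;
            n.right = delete(n.right, successor.key);
        }
        return n;
    }

    /** Returns the keys in sorted order. */
    static List<Integer> inorder(Node n) {
        List<Integer> out = new ArrayList<>();
        collect(n, out);
        return out;
    }

    private static void collect(Node n, List<Integer> out) {
        if (n == null) return;
        collect(n.left, out);
        out.add(n.key);
        collect(n.right, out);
    }
}
```

If the keys 10, 7, 14, 3, 12, 17, 11 are inserted in that order, deleting 10 promotes its successor 11 to the root.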

  27. Unbalanced trees • Binary search tree's desirable properties require balance • tree height should be log n • Some input sequences may lead to a height of n • [Figure: unbalanced tree, a chain of nodes 5, 4, 3, 2, 1]

  28. Use red/black trees for balance • The red/black properties: • 1. each node colored black or red • 2. leaves (nil) are black • 3. red nodes can have only black children • 4. every path from node to leaf contains same # black nodes • Balancing algorithm works as follows • new nodes start red • updates that unbalance the tree break rule 3 • series of transformations re-establish rule 3 • Best taught by example

  29. Red/black example step 1 • [Figure: node 8] • First node becomes root. (Root recolored to black at the end of each step.)

  30. Red/black example step 2 • [Figure: nodes 8, 7] • Second node added. Tree still balanced.

  31. Red/black example step 3a • [Figure: nodes 8, 7, 2] • Node '2' breaks red/black properties. Tree is no longer balanced.

  32. Red/black example step 3b • [Figure: nodes 8, 7, 2, with the uncle position marked "uncle?"] • Uncle does not match parent color, and • ancestor is same direction for two generations: • Recolor parent generation and grandparent. • Rotate parent in direction of ancestor.

  33. Red/black example step 3c • [Figure: nodes 7, 8, 2] • Uncle does not match parent, and • ancestor is same direction for two generations: • Recolor parent generation and grandparent. • Rotate parent in direction of ancestor.

  34. Red/black example step 4a • [Figure: nodes 7, 8, 2, 6] • Unlucky again. Node '6' breaks red/black property #3.

  35. Red/black example step 4b • [Figure: nodes 7, 8, 2, 6] • Uncle matches parent • Recolor parent generation and grandparent. (Recolor root to black when done.)

  36. Red/black example step 5a • [Figure: nodes 7, 8, 2, 6, 5] • Bad luck again (somebody crafted these inputs!) • Node '5' breaks red/black property #3.

  37. Red/black example step 5b • [Figure: nodes 7, 8, 2, 5, 6] • Uncle does not match parent, and • ancestor is opposite direction for two generations: • Rotate parent childward • Recolor self and (new) parent • Rotate self parentward

  38. Red/black example step 5c • [Figure: nodes 7, 8, 2, 5, 6] • Uncle does not match parent, and • ancestor is opposite direction for two generations: • Rotate parent childward • Recolor self and (new) parent • Rotate self parentward

  39. Red/black example step 5d • [Figure: nodes 7, 8, 5, 6, 2] • Uncle does not match parent, and • ancestor is opposite direction for two generations: • Rotate parent childward • Recolor self and (new) parent • Rotate self parentward
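
The rotations used throughout the example are small pointer rearrangements that preserve the search-tree ordering. This sketch shows only the rotations, with node colors and parent links left out for brevity; `Rotations` and `Node` are illustrative names.

```java
// Illustrative sketch of tree rotations; coloring and parent pointers are omitted.
final class Rotations {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    /** Right rotation: y's left child x becomes the subtree root, and y becomes x's right child. */
    static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right;   // x's right subtree sits between x and y in sort order
        x.right = y;
        return x;
    }

    /** Left rotation: x's right child y becomes the subtree root, and x becomes y's left child. */
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;   // y's left subtree sits between x and y in sort order
        y.left = x;
        return y;
    }
}
```

In step 3 of the example, the chain 8, 7, 2 (each the left child of the one before) becomes a tree with 7 at the root and 2 and 8 as its children after a right rotation at 8.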

  40. Why red/black works • Fixup keeps tree balanced • path from root to every leaf has same # of black nodes, say b • path has at most (b-1) red nodes • no path from root to leaf more than double any other path • Fixup process doesn't impose excessive performance penalty • best case: no fixup necessary • worst case: percolate up the tree, i.e. log n

  41. Trees in the JCF • Java Collections Framework provides two tree implementations • TreeMap • TreeSet (degenerate case of TreeMap)

  42. TreeMap key methods

  43. TreeSet key methods
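
The method tables for these two slides are missing from the transcript. As a stand-in, this sketch exercises a few `SortedMap` and `SortedSet` methods that `TreeMap` and `TreeSet` implement; they are not necessarily the ones the slides listed, and the generic syntax is more recent than the 2003 presentation. `SortedCollectionsDemo` is an illustrative name.

```java
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch of range and endpoint queries on TreeMap and TreeSet.
final class SortedCollectionsDemo {
    static SortedMap<Integer, String> exampleMap() {
        SortedMap<Integer, String> map = new TreeMap<>();
        map.put(12, "twelve");
        map.put(7, "seven");
        map.put(14, "fourteen");
        map.put(3, "three");
        return map;
    }

    static SortedSet<Integer> exampleSet() {
        return new TreeSet<>(exampleMap().keySet());
    }

    public static void main(String[] args) {
        SortedMap<Integer, String> map = exampleMap();
        System.out.println(map.firstKey());      // smallest key
        System.out.println(map.lastKey());       // largest key
        System.out.println(map.headMap(12));     // entries with keys strictly below 12
        System.out.println(map.tailMap(12));     // entries with keys 12 and above

        SortedSet<Integer> set = exampleSet();
        System.out.println(set.first());         // smallest element
        System.out.println(set.subSet(7, 14));   // from 7 inclusive to 14 exclusive
    }
}
```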

  44. Tree API features and gotchas • Ordering • default to using natural order • change sort order by passing Comparator to constructor • Thread safety • TreeSet and TreeMap are not thread safe • collections do "best effort" fail fast if concurrency detected • Collections class provides wrappers • unmodifiableSortedSet, unmodifiableSortedMap • synchronizedSortedSet, synchronizedSortedMap
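
A short sketch of the ordering and wrapper points above, using only `java.util` classes. `TreeOrderingDemo` and its method names are illustrative, and `Comparator.reverseOrder()` is a convenience that is newer than the presentation.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch of custom ordering and the Collections wrappers.
final class TreeOrderingDemo {
    /** A TreeSet sorted in descending order by a Comparator passed to the constructor. */
    static SortedSet<Integer> descendingSet() {
        SortedSet<Integer> set = new TreeSet<Integer>(Comparator.reverseOrder());
        Collections.addAll(set, 12, 7, 14, 3);
        return set;
    }

    /** A TreeMap wrapped so that each individual method call is synchronized. */
    static SortedMap<String, Integer> synchronizedMap() {
        return Collections.synchronizedSortedMap(new TreeMap<String, Integer>());
    }

    /** A read-only view; mutating calls throw UnsupportedOperationException. */
    static SortedSet<Integer> readOnlyView(SortedSet<Integer> set) {
        return Collections.unmodifiableSortedSet(set);
    }
}
```

The synchronized wrappers guard single calls only, so iterating over one still requires synchronizing on the wrapper for the duration of the loop.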

  45. Beyond "Big O" • Algorithm selection is important, but not all-important • Many factors confound easy analysis of real Java code • memory footprint and paging • garbage collection • JIT and adaptive compilation • processor caches • layers of abstraction • Don't trust your judgment • profile • unit profile

  46. Java profiling • Numerous command line options built in to JDK • java -Xrunhprof • java -prof • java -verbose:gc • Tools may process command line output • Tools may use JVMPI to hook into running VMs • Java Virtual Machine Profiling Interface • Extensive reference (open source and commercial) at • http://www.javaperformancetuning.com/

  47. Java unit profiling • JUnitPerf extends JUnit to do performance testing • test decorators time the execution of a JUnit test • can fail test if time exceeds certain bound
      long maxElapsedTime = 1000;
      Test testCase = new ExampleTestCase("testOneSecondResponse");
      Test timedTest = new TimedTest(testCase, maxElapsedTime, false);

  48. Limitations of the JCF • Three common complaints about the JCF • support for primitive types is clunky • no strong typing • special purpose / niche algorithms absent

  49. Open source alternatives to the JCF • PCJ (Primitive Collections for Java) • Søren Bak, http://pcj.sourceforge.net/ • GNU Trove • Eric D. Friedman, http://trove4j.sourceforge.net/ • Colt (libraries for Scientific and Technical Computing) • Wolfgang Hoschek, http://hoschek.home.cern.ch/hoschek/colt/index.htm • tclib • Dennis Sosnoski, http://www.sosnoski.com/opensrc/tclib/

  50. Summary • Trees useful for maintaining sorted order and fast access • Tree must stay balanced to be efficient • Java uses red/black trees • Performance advice • know your algorithms • don't just trust your knowledge: PROFILE! • be aware of JCF limitations and alternatives
