Collections and Algorithms 2. Selecting, hashing, trees, and alternatives to the JCF Triangle JUG presentation April 21, 2003 Stuart Halloway and Jim Scarborough. Presenter: Stuart Halloway. Chief Technical Officer at DevelopMentor ( www.develop.com ) provides training for software developers
Collections and Algorithms 2: Selecting, hashing, trees, and alternatives to the JCF • Triangle JUG presentation • April 21, 2003 • Stuart Halloway and Jim Scarborough
Presenter: Stuart Halloway • Chief Technical Officer at DevelopMentor (www.develop.com) • provides training for software developers • Author • Component Development for the Java Platform • Moderator of the Advanced Java mailing list • discuss.develop.com
Presenter: Jim Scarborough • Java consultant for Science Applications International Corporation • Developing simulations, GUIs, and various calculations for air and water quality efforts. • NCSU Masters Student in Computer Science
Overview • JCF collections and algorithms (continued) • selection • hashing • trees • Choosing the right structures for your programs • beyond "Big O" • profiling • alternatives to the JCF
Selection • Minimum / Maximum • The nth element
Minimum or Maximum
public int minimum(int[] a) {
    int min = a[0];
    for (int i = 1; i < a.length; i++) {
        if (a[i] < min) min = a[i];
    }
    return min;
}
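Minimum and Maximum in 3/2 n
public int[] minAndMax(int[] a) {
    // seed both values so the comparisons below are defined
    int min = a[0], max = a[0];
    // walk the array in pairs: 3 comparisons per pair instead of 4
    for (int i = 0; i < a.length / 2; i++) {
        int low, high;
        if (a[i*2] < a[i*2+1]) {
            low = a[i*2]; high = a[i*2+1];
        } else {
            low = a[i*2+1]; high = a[i*2];
        }
        if (low < min) min = low;
        if (high > max) max = high;
    }
    // an odd-length array leaves one unpaired element at the end
    if (a.length % 2 == 1) {
        int last = a[a.length - 1];
        if (last < min) min = last;
        if (last > max) max = last;
    }
    return new int[] { min, max };
}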
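The Selection slide also lists the nth element. The following is a minimal sketch of one way to find it, assuming randomized quickselect; the class and method names are hypothetical and not from the talk. Quickselect runs in expected O(n) time. A worst-case O(n) bound needs a more careful pivot rule (for example median of medians), which is not shown here.

```java
import java.util.Random;

/** Sketch: randomized quickselect for the nth smallest element (0-based). */
class SelectSketch {
    private static final Random RANDOM = new Random();

    /** Returns the element that would sit at index n if a were sorted. Reorders a. */
    static int select(int[] a, int n) {
        if (n < 0 || n >= a.length) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (n == p) return a[p];
            if (n < p) hi = p - 1; else lo = p + 1;   // keep only the side holding n
        }
        return a[lo];
    }

    /** Partitions a[lo..hi] around a random pivot; returns the pivot's final index. */
    private static int partition(int[] a, int lo, int hi) {
        swap(a, lo + RANDOM.nextInt(hi - lo + 1), hi);
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Because select reorders its argument, callers that need the original order can pass a copy, e.g. `SelectSketch.select(data.clone(), data.length / 2)` for a median.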
Hashing • Purposes? • Data structures?
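Collections with Hashing [class diagram] • Map <<Interface>>: implemented by AbstractMap and Hashtable • AbstractMap: extended by HashMap, IdentityHashMap, WeakHashMap, TreeMap • HashMap: extended by LinkedHashMap, PrinterStateReasons • SortedMap <<Interface>>: extends Map, implemented by TreeMap • Dictionary: extended by Hashtable • Hashtable: extended by Properties • HashSet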
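Writing your own hashCode()
public class SimpleDate {
    public int year, month, day;

    public int hashCode() {
        // parentheses are required: + binds tighter than <<
        return (year << 9) + (month << 5) + day;
    }

    public boolean equals(Object o) {
        // comparing hash codes works only because hashCode() is unique
        // for every valid date (day < 32, month < 16)
        return (o != null) &&
            o.getClass().equals(getClass()) &&
            hashCode() == ((SimpleDate) o).hashCode();
    }
}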
Division Method • h(k) = k mod m • k represents your data as an integer • m works best when it is prime and worst when it is a power of 2 • m is also the number of slots in the table
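A small sketch of the division method and of why a power-of-2 table size can hurt, assuming integer keys; `DivisionHash` and `distinctSlots` are hypothetical names used only for illustration. With m a power of 2, only the low bits of k reach the slot number, so keys that differ only in their high bits all collide.

```java
/** Sketch: division-method hashing, h(k) = k mod m. */
class DivisionHash {
    /** Maps k to a slot in [0, m). Math.floorMod keeps negative keys in range. */
    static int hash(int k, int m) {
        return Math.floorMod(k, m);
    }

    /** Counts how many different slots a set of keys lands in for table size m. */
    static int distinctSlots(int[] keys, int m) {
        boolean[] used = new boolean[m];
        int count = 0;
        for (int k : keys) {
            int slot = hash(k, m);
            if (!used[slot]) {
                used[slot] = true;
                count++;
            }
        }
        return count;
    }
}
```

For the sixteen keys 0, 16, 32, ..., 240, a table of size 16 puts everything in one slot, while the prime size 17 spreads them over sixteen slots.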
Multiplication Method • h(k) = floor(m · (k·A mod 1)) • A = (5^0.5 - 1)/2 = 0.6180339887… • m = whatever you want! (2^32 is tidy)
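Integer Multiplication Method
// S = floor(A * 2^32), with A = (sqrt(5) - 1) / 2
static final long S = (long) (((Math.sqrt(5) - 1) / 2) * 0x100000000L);

public int hash(int k) {
    // keep the low 32 bits of k * S; the mask needs the L suffix,
    // because the int literal 0xFFFFFFFF is -1 and would mask nothing
    return (int) ((k * S) & 0xFFFFFFFFL);
}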
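The integer multiplication method produces a full 32-bit value. For a table with m = 2^p slots, one common approach is to keep the top p bits of that value. The sketch below assumes this variant; the class name `MultiplicativeIndex` and the parameter p are illustrative and do not come from the slides.

```java
/** Sketch: multiplication-method table index for m = 2^p slots. */
class MultiplicativeIndex {
    /** floor(A * 2^32) for A = (sqrt(5) - 1) / 2, stored as a 32-bit pattern. */
    static final int S = (int) (long) (((Math.sqrt(5) - 1) / 2) * 0x100000000L);

    /** Returns a slot in [0, 2^p) for 1 <= p <= 31. */
    static int index(int k, int p) {
        if (p < 1 || p > 31) {
            throw new IllegalArgumentException("p must be in 1..31: " + p);
        }
        // int multiplication wraps mod 2^32, which is exactly (k * A mod 1) * 2^32;
        // the unsigned shift then keeps the p most significant bits
        return (k * S) >>> (32 - p);
    }
}
```

The cast goes through long first because converting the double directly to int would saturate at Integer.MAX_VALUE instead of keeping the bit pattern.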
Collisions • Chaining • Open Addressing • Linear Probing • h(k,i) = (h’(k) + i) mod m • Quadratic • h(k,i) = (h’(k) + c_1 i + c_2 i^2) mod m • Double Hashing • h(k,i) = (h_1(k) + i h_2(k)) mod m
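The three open-addressing probe formulas can be written directly as functions of the probe number i. This is a sketch only: `ProbeSketch` is a hypothetical name, and the `insert` helper assumes linear probing over an `Integer[]` table with `null` marking an empty slot.

```java
/** Sketch: probe sequences for open addressing. */
class ProbeSketch {
    /** Linear probing: h(k,i) = (h'(k) + i) mod m. */
    static int linear(int h, int i, int m) {
        return Math.floorMod(h + i, m);
    }

    /** Quadratic probing: h(k,i) = (h'(k) + c1*i + c2*i^2) mod m. */
    static int quadratic(int h, int i, int c1, int c2, int m) {
        return Math.floorMod(h + c1 * i + c2 * i * i, m);
    }

    /** Double hashing: h(k,i) = (h1(k) + i*h2(k)) mod m. */
    static int doubleHash(int h1, int h2, int i, int m) {
        return Math.floorMod(h1 + i * h2, m);
    }

    /** Inserts key with linear probing; returns its slot, or -1 if the table is full. */
    static int insert(Integer[] table, int key) {
        int m = table.length;
        int h = Math.floorMod(key, m);          // h'(k): division method
        for (int i = 0; i < m; i++) {
            int slot = linear(h, i, m);
            if (table[slot] == null || table[slot] == key) {
                table[slot] = key;
                return slot;
            }
        }
        return -1;
    }
}
```

In a table of 7 slots, the keys 10, 17 and 24 all hash to slot 3, so linear probing places them in slots 3, 4 and 5.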
*NIX Password Authentication • crypt(3) • Modified version of DES encryption • http://www.itl.nist.gov/fipspubs/fip46-2.htm • The password is the shared secret, a 56 bit key (7 bits for each of 8 characters). • Something standard (usually all zeroes) is encrypted repeatedly using the password. • Salt perturbs the algorithm in one of 4096 ways. • When you type the password, if its encrypted version matches the stored gobbledygook, you are granted access.
Speed • Hashing gives O(1) expected performance • Universal hash functions foil Diabolic • h_{a,b}(k) = ((ak + b) mod p) mod m, with p a prime larger than every key and a, b chosen at random • Not instantaneous!
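A minimal sketch of the universal family on the Speed slide, assuming non-negative int keys and, for the random factory, the prime p = 2^31 - 1. The class name, the factory method and the choice of p are assumptions made for illustration.

```java
import java.util.Random;

/** Sketch: universal hash h_{a,b}(k) = ((a*k + b) mod p) mod m. */
class UniversalHash {
    /** 2^31 - 1, a prime; keys are assumed to lie in [0, DEFAULT_P). */
    static final long DEFAULT_P = 2147483647L;

    private final long a, b, p;
    private final int m;

    UniversalHash(long a, long b, long p, int m) {
        if (a < 1 || a >= p || b < 0 || b >= p || m < 1) {
            throw new IllegalArgumentException("need 1 <= a < p, 0 <= b < p, m >= 1");
        }
        this.a = a; this.b = b; this.p = p; this.m = m;
    }

    /** Picks a and b at random, which is what defeats an adversarial key set. */
    static UniversalHash random(int m, Random rnd) {
        long a = 1 + (long) rnd.nextInt((int) DEFAULT_P - 1);   // 1 .. p-1
        long b = rnd.nextInt((int) DEFAULT_P);                  // 0 .. p-1
        return new UniversalHash(a, b, DEFAULT_P, m);
    }

    int hash(int k) {
        if (k < 0) throw new IllegalArgumentException("negative key: " + k);
        // a and k are both below 2^31, so a * k + b fits in a long
        return (int) (((a * (k % p) + b) % p) % m);
    }
}
```

With a = 3, b = 4, p = 17 and m = 6, the key 8 maps to ((24 + 4) mod 17) mod 6 = 5.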
Review • Selection • Min & max together in about 3n/2 comparisons, better than the 2(n-1) of two separate scans • O(n) worst case for selecting any particular element • Hashing • Hashing is simple when you can make it fit in an int. • Division method has its limitations. • Multiplication rules! • There are several schemes for resolving collisions.
Trees • A tree is an undirected, connected acyclic graph • Certain trees perform well for ordered sets • binary search tree • red/black tree • B-tree [Figure: example tree with nodes 12, 7, 14, 3]
Binary search trees • A binary search tree begins at a root node • Each node can have a maximum of two children • Left child <= its parent • Right child >= its parent [Figure: example tree with nodes 12, 7, 14, 3]
Traversal • Inorder traversal returns values in sorted order • report left subtree • report parent • report right subtree • Traversal implicitly defines successor and predecessor [Figure: example tree with nodes 12, 7, 14, 3, each numbered in visit order]
Insert and select • Walk down the tree, turning left for <, right for > • number of steps limited by height of the tree • height of a balanced tree is log n [Figure: inserting 8 into the tree 12, 7, 14, 3; 8 < 12 so go left, 8 > 7 so go right]
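The walk described above, together with inorder traversal, can be sketched as follows. This is an unbalanced binary search tree over int keys that ignores duplicates; `BstSketch` and its methods are hypothetical names, not part of the JCF.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: unbalanced binary search tree with insert, lookup and inorder traversal. */
class BstSketch {
    private static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    /** Walks down from the root, turning left for smaller keys and right for larger. */
    void insert(int key) {
        if (root == null) { root = new Node(key); return; }
        Node cur = root;
        while (true) {
            if (key < cur.key) {
                if (cur.left == null) { cur.left = new Node(key); return; }
                cur = cur.left;
            } else if (key > cur.key) {
                if (cur.right == null) { cur.right = new Node(key); return; }
                cur = cur.right;
            } else {
                return;                       // key already present
            }
        }
    }

    /** Same walk as insert; the number of steps is bounded by the tree height. */
    boolean contains(int key) {
        Node cur = root;
        while (cur != null && cur.key != key) {
            cur = key < cur.key ? cur.left : cur.right;
        }
        return cur != null;
    }

    /** Left subtree, then the node, then right subtree: keys come out sorted. */
    List<Integer> inorder() {
        List<Integer> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }

    private static void inorder(Node n, List<Integer> out) {
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.key);
        inorder(n.right, out);
    }
}
```

Inserting 12, 7, 14, 3 and then 8 follows the path from the slide: 8 goes left of 12 and right of 7.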
Deletion • Deletion has three cases • no children (trivial) • one child (trivial) • two children (install successor) [Figure: example tree with nodes 10, 7, 14, 3, 12, 17, 11]
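The three deletion cases can be sketched recursively. This is one possible formulation, assuming a plain unbalanced tree without parent pointers; `BstDelete` and its helpers are hypothetical names. For the two-children case it copies the successor's key into the node and then removes the successor from the right subtree.

```java
/** Sketch: deletion from an unbalanced binary search tree. */
class BstDelete {
    static final class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        return n;
    }

    /** Returns the root of the subtree after removing key (if present). */
    static Node delete(Node n, int key) {
        if (n == null) return null;
        if (key < n.key) { n.left = delete(n.left, key); return n; }
        if (key > n.key) { n.right = delete(n.right, key); return n; }
        // cases 1 and 2: at most one child, so splice the node out
        if (n.left == null) return n.right;
        if (n.right == null) return n.left;
        // case 3: two children, install the successor (leftmost of right subtree)
        Node succ = n.right;
        while (succ.left != null) succ = succ.left;
        n.key = succ.key;
        n.right = delete(n.right, succ.key);
        return n;
    }

    /** Keys in sorted order, each followed by a space. */
    static String inorder(Node n) {
        return n == null ? "" : inorder(n.left) + n.key + " " + inorder(n.right);
    }
}
```

In the slide's tree (built by inserting 10, 7, 14, 3, 12, 17, 11), deleting the root 10 installs its successor 11 at the root.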
Unbalanced trees • Binary search tree's desirable properties require balance • tree height should be log n • Some input sequences (for example, already sorted input) may lead to height n [Figure: unbalanced tree built from 5, 4, 3, 2, 1]
Use red/black trees for balance • The red/black properties: • 1. each node colored black or red • 2. leaves (nil) are black • 3. red nodes can have only black children • 4. every path from node to leaf contains same # black nodes • Balancing algorithm works as follows • new nodes start red • updates that unbalance the tree break rule 3 • series of transformations re-establish rule 3 • Best taught by example
Red/black example step 1 • First node becomes root. (Root recolored to black at the end of each step.) [Figure: tree with node 8]
Red/black example step 2 • Second node added. Tree still balanced. [Figure: tree with nodes 8, 7]
Red/black example step 3a • Node '2' breaks red/black properties. Tree is no longer balanced. [Figure: tree with nodes 8, 7, 2]
Red/black example step 3b • Uncle does not match parent color, and • ancestor is same direction for two generations: • Recolor parent generation and grandparent. • Rotate parent in direction of ancestor. [Figure: tree with nodes 8, 7, 2; the empty uncle position is marked "uncle?"]
Red/black example step 3c • Uncle does not match parent, and • ancestor is same direction for two generations: • Recolor parent generation and grandparent. • Rotate parent in direction of ancestor. [Figure: tree with nodes 7, 8, 2]
Red/black example step 4a • Unlucky again. Node '6' breaks red/black property #3. [Figure: tree with nodes 7, 8, 2, 6]
Red/black example step 4b • Uncle matches parent: • Recolor parent generation and grandparent. (Recolor root to black when done.) [Figure: tree with nodes 7, 8, 2, 6]
Red/black example step 5a • Bad luck again (somebody crafted these inputs!) Node '5' breaks red/black property #3. [Figure: tree with nodes 7, 8, 2, 6, 5]
Red/black example step 5b • Uncle does not match parent, and • ancestor is opposite direction for two generations: • Rotate parent childward • Recolor self and (new) parent • Rotate self parentward [Figure: tree with nodes 7, 8, 2, 5, 6]
Red/black example step 5c • Uncle does not match parent, and • ancestor is opposite direction for two generations: • Rotate parent childward • Recolor self and (new) parent • Rotate self parentward [Figure: tree with nodes 7, 8, 2, 5, 6]
Red/black example step 5d • Uncle does not match parent, and • ancestor is opposite direction for two generations: • Rotate parent childward • Recolor self and (new) parent • Rotate self parentward [Figure: tree with nodes 7, 8, 5, 6, 2]
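The rotations used in the example can be sketched on a bare node type. This is an illustration under assumptions, not the TreeMap implementation: there are no parent pointers, `null` stands for a black nil leaf, and `fixLeftLeft` covers only the step 3 situation (red child and red grandchild both on the left, black or missing uncle). All names are hypothetical.

```java
/** Sketch: rotations and the step 3 fixup on a minimal red/black node. */
class RotationSketch {
    static final class Node {
        final int key;
        boolean red;
        Node left, right;
        Node(int key, boolean red) { this.key = key; this.red = red; }
    }

    /** Rotates right: the left child moves up, the old subtree root moves down right. */
    static Node rotateRight(Node top) {
        Node child = top.left;
        top.left = child.right;     // child's right subtree is re-hung under top
        child.right = top;
        return child;
    }

    /** Mirror image of rotateRight. */
    static Node rotateLeft(Node top) {
        Node child = top.right;
        top.right = child.left;
        child.left = top;
        return child;
    }

    /** Step 3: recolor parent and grandparent, then rotate the parent up. */
    static Node fixLeftLeft(Node grandparent) {
        grandparent.left.red = false;   // parent becomes black
        grandparent.red = true;         // grandparent becomes red
        return rotateRight(grandparent);
    }
}
```

Applied to the chain 8, 7, 2 from step 3a, `fixLeftLeft` returns a subtree rooted at a black 7 with red children 2 and 8, matching step 3c. Both rotations preserve the inorder sequence, which is why they can restore balance without breaking the search-tree ordering.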
Why red/black works • Fixup keeps tree balanced • path from root to every leaf has same # of black nodes, say b • path has at most (b-1) red nodes • no path from root to leaf more than double any other path • Fixup process doesn't impose excessive performance penalty • best case: no fixup necessary • worst case: percolate up the tree, i.e. log n
Trees in the JCF • Java Collections Framework provides two tree impls • TreeMap • TreeSet (degenerate case of TreeMap)
Tree API features and gotchas • Ordering • default to using natural order • change sort order by passing Comparator to constructor • Thread safety • TreeSet and TreeMap are not thread safe • iterators are fail-fast on a "best effort" basis: they throw ConcurrentModificationException if a concurrent modification is detected • Collections class provides wrappers • unmodifiableSortedSet, unmodifiableSortedMap • synchronizedSortedSet, synchronizedSortedMap
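A short sketch of the ordering and wrapper features listed above, written for a current JDK (generics and the diamond operator postdate this 2003 talk). `TreeApiSketch` and its helper methods are hypothetical; the JCF calls themselves (`TreeSet`, `Comparator.reverseOrder`, `Collections.synchronizedSortedSet`, `Collections.unmodifiableSortedSet`) are standard.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

/** Sketch: TreeSet ordering and the Collections wrappers. */
class TreeApiSketch {
    /** Natural (ascending) order: the default when no Comparator is given. */
    static SortedSet<String> ascending(String... items) {
        SortedSet<String> set = new TreeSet<>();
        Collections.addAll(set, items);
        return set;
    }

    /** Descending order, chosen by passing a Comparator to the constructor. */
    static SortedSet<String> descending(String... items) {
        SortedSet<String> set = new TreeSet<>(Comparator.<String>reverseOrder());
        Collections.addAll(set, items);
        return set;
    }

    /** Wrapper whose individual method calls are synchronized. */
    static SortedSet<String> threadSafe(SortedSet<String> set) {
        return Collections.synchronizedSortedSet(set);
    }

    /** Read-only view: mutators throw UnsupportedOperationException. */
    static SortedSet<String> readOnly(SortedSet<String> set) {
        return Collections.unmodifiableSortedSet(set);
    }
}
```

Note that iterating over a synchronized wrapper still requires the caller to synchronize on the wrapper for the duration of the loop, as the `Collections` documentation describes.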
Beyond "Big O" • Algorithm selection is important, but not all-important • Many factors confound easy analysis of real Java code • memory footprint and paging • garbage collection • JIT and adaptive compilation • processor caches • layers of abstraction • Don't trust your judgment • profile • unit profile
Java profiling • Numerous command line options built into the JDK • java -Xrunhprof • java -prof • java -verbose:gc • Tools may process command line output • Tools may use JVMPI to hook into running VMs • Java Virtual Machine Profiling Interface • Extensive reference (open source and commercial) at • http://www.javaperformancetuning.com/
Java unit profiling • JUnitPerf extends JUnit to do performance testing • test decorators time the execution of a JUnit test • can fail test if time exceeds certain bound
long maxElapsedTime = 1000;
Test testCase = new ExampleTestCase("testOneSecondResponse");
Test timedTest = new TimedTest(testCase, maxElapsedTime, false);
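The idea behind a timed test can be sketched in plain Java without any library. This is not the JUnitPerf API: `TimedRun`, `elapsedMillis` and `runsWithin` are hypothetical names, and a single wall-clock measurement like this is noisy (JIT warm-up and garbage collection both affect it), so it is only a rough bound check.

```java
/** Sketch: time a task and compare the result against an upper bound. */
class TimedRun {
    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** True if the task finished within maxElapsedMillis. */
    static boolean runsWithin(Runnable task, long maxElapsedMillis) {
        return elapsedMillis(task) <= maxElapsedMillis;
    }
}
```

A unit test could then assert `TimedRun.runsWithin(() -> service.respond(), 1000)`, where `service.respond()` stands in for whatever operation is under test.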
Limitations of the JCF • Three common complaints about the JCF • support for primitive types is clunky • no strong typing • special purpose / niche algorithms absent
Open source alternatives to the JCF • PCJ (Primitive Collections for Java) • Søren Bak, http://pcj.sourceforge.net/ • GNU Trove • Eric D. Friedman, http://trove4j.sourceforge.net/ • Colt (libraries for Scientific and Technical Computing) • Wolfgang Hoschek, http://hoschek.home.cern.ch/hoschek/colt/index.htm • tclib • Dennis Sosnoski, http://www.sosnoski.com/opensrc/tclib/
Summary • Trees are useful for maintaining sorted order and fast access • Tree must stay balanced to be efficient • Java uses red/black trees • Performance advice • know your algorithms • don't just trust your knowledge: PROFILE! • be aware of JCF limitations and alternatives