Data Structures, Algorithms and Design Patterns in real projects
Faraz Ahmed, Director of Research and Development, Nakisa Inc. (nakisa.com)
Nakisa Inc. is a leading Org and Talent Management software company, providing the world's largest organizations with the ability to visualize and maintain accurate HCM data, confidently execute organization design, devise harmonized succession and career plans, and engage a highly productive workforce.
• 600+ enterprise customers
• 4 million subscribers
• 24 industries
• 125 countries
Enterprise Software and Challenges
Problems … for real. Solutions … for real!!!
• Large XML
• Behavior/Customizations
Large XML problem
• 6 GB of XML
• 1,363,150 pages (approx.)
• No, we never printed the raw XML on pages to check whether we were right!!!
• Java's answer: StAX (Streaming API for XML)
• Design patterns: Visitor, Events
• Data structures: Maps
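A minimal sketch of how StAX, a visitor and a Map could fit together for a file too large to load at once. The `XmlEventVisitor` interface, the `ElementCounter` visitor and the sample element names `org`, `unit` and `position` are hypothetical and do not come from the slides; only `XMLInputFactory` and `XMLStreamReader` are real StAX API.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxWalker {
    // Pulls parser events one at a time with StAX and hands each to the visitor,
    // so the whole document never has to be held in memory.
    static void walk(Reader input, XmlEventVisitor visitor) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(input);
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        visitor.startElement(reader.getLocalName());
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        visitor.endElement(reader.getLocalName());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not stream the XML input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<org><unit><position/><position/></unit><unit/></org>";
        ElementCounter counter = new ElementCounter();
        walk(new StringReader(xml), counter);
        System.out.println(counter.counts); // {org=1, position=2, unit=2}
    }
}

// Hypothetical visitor interface (not part of StAX): one callback per event type.
interface XmlEventVisitor {
    void startElement(String localName);
    void endElement(String localName);
}

// Example visitor that keeps only a small Map of element-name counts.
class ElementCounter implements XmlEventVisitor {
    final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String localName) {
        counts.merge(localName, 1, Integer::sum);
    }

    @Override
    public void endElement(String localName) {
        // Counting needs no work on end tags.
    }
}
```

Memory use here depends on the visitor's Map, not on the file size, which is the point of streaming a multi-gigabyte document.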
Too many objects, few representations
• Organization Units
• Direct Report
• Dotted Line Report
• Position
  • Vacant
  • Manager Position
  • Normal Position
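One way to read "too many objects, few representations", and this is our interpretation rather than something the slides state, is the Flyweight pattern: thousands of objects share a handful of immutable representations looked up in a Map. The `Kind` enum mirrors the object types listed above; `Representation` and its `style` field are hypothetical.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

class RepresentationCache {
    // Object kinds taken from the slide; the enum itself is a hypothetical model.
    enum Kind {
        ORGANIZATION_UNIT, DIRECT_REPORT, DOTTED_LINE_REPORT,
        VACANT_POSITION, MANAGER_POSITION, NORMAL_POSITION
    }

    // Hypothetical shared, immutable representation (for example, a box style).
    static final class Representation {
        final String style;

        Representation(String style) {
            this.style = style;
        }
    }

    private final Map<Kind, Representation> cache = new EnumMap<>(Kind.class);
    int created = 0;

    // Returns the single shared Representation for a kind, creating it on first use.
    Representation forKind(Kind kind) {
        return cache.computeIfAbsent(kind, k -> {
            created++;
            return new Representation(k.name().toLowerCase(Locale.ROOT));
        });
    }

    public static void main(String[] args) {
        RepresentationCache reps = new RepresentationCache();
        // Many position objects, but only three distinct position representations.
        for (int i = 0; i < 10_000; i++) {
            Kind kind = i % 50 == 0 ? Kind.MANAGER_POSITION
                    : i % 7 == 0 ? Kind.VACANT_POSITION
                    : Kind.NORMAL_POSITION;
            reps.forKind(kind);
        }
        System.out.println(reps.created); // 3
    }
}
```

`EnumMap` is used because the keys are a small fixed enum; a `HashMap` would work the same way for arbitrary keys.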
Use of Maps
They are like 'cheat codes' when used correctly.
Languages
Map<String, Map<String, String>> languages = …;
String msg = languages.get("main.welcome").get("en");
/********* OR *********/
Map<String, String> languages = …;
String msg = languages.get("9082821291EN"); // Why?
welcome, באַגריסן, ยินดีต้อนรับ, accueil, Selamat Datang, powitanie, خوش آمدید, સ્વાગત, vítejte, i mirëpritur, bienvenida, आपका स्वागत है
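The slide leaves "Why?" open. One plausible answer, which is our reading rather than the author's stated one: a flat map keyed by a composite (message, language) key needs a single hash lookup and no inner map per message. The sketch below puts both layouts side by side; the `"|"` separator and the method names are hypothetical, and the slide's own key format (`9082821291EN`) is not explained, so it is not reproduced.

```java
import java.util.HashMap;
import java.util.Map;

class MessageLookup {
    // Layout 1: message key -> (language -> text). Two hash lookups per message.
    static String nested(Map<String, Map<String, String>> languages, String key, String lang) {
        Map<String, String> byLanguage = languages.get(key);
        return byLanguage == null ? null : byLanguage.get(lang);
    }

    // Layout 2: one flat map keyed by a composite (message, language) key.
    // The "|" separator is a hypothetical choice and must never occur inside a message key.
    static String compositeKey(String key, String lang) {
        return key + "|" + lang;
    }

    static String flat(Map<String, String> languages, String key, String lang) {
        return languages.get(compositeKey(key, lang));
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> nestedMap = new HashMap<>();
        nestedMap.computeIfAbsent("main.welcome", k -> new HashMap<>()).put("en", "welcome");
        nestedMap.computeIfAbsent("main.welcome", k -> new HashMap<>()).put("es", "bienvenida");

        Map<String, String> flatMap = new HashMap<>();
        flatMap.put(compositeKey("main.welcome", "en"), "welcome");
        flatMap.put(compositeKey("main.welcome", "es"), "bienvenida");

        System.out.println(nested(nestedMap, "main.welcome", "es")); // bienvenida
        System.out.println(flat(flatMap, "main.welcome", "es"));     // bienvenida
    }
}
```

The trade-off is that the flat layout builds a key string on every lookup and can no longer list "all languages for one message" without scanning, which the nested layout gives for free.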
XML Diff problem
<data secure="true" enable="true">
  <field name="firstName"></field>
  <field name="lastName"></field>
</data>
<data enable="true" secure="true">
  <field name="lastName"></field>
  <field name="firstName"></field>
</data>
No logical change: only the attribute order and the sibling order differ.
• Sorted Set of Nodes
  • Level Order representation of the tree
• Tree
  • Nodes
    • Data is a Sorted Map
    • Children is a Sorted Set of Nodes (sorted on Data!)
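A minimal sketch of order-insensitive comparison in the spirit of this slide: attributes go into a sorted map and children are sorted before comparing. It swaps the slide's level-order sorted set for a recursive canonical string, keeps children in a sorted List so identical siblings are not collapsed, ignores text content, and does not escape special characters; all of these are our assumptions, and `XmlCanonicalDiff` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlCanonicalDiff {
    // Order-independent string for one element: attributes sorted by name in a TreeMap,
    // child elements canonicalized recursively and then sorted.
    static String canonical(Element element) {
        Map<String, String> attributes = new TreeMap<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            attributes.put(attr.getNodeName(), attr.getNodeValue());
        }
        List<String> children = new ArrayList<>();
        NodeList nodes = element.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node child = nodes.item(i);
            // Only element children are compared; text nodes are ignored in this sketch.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                children.add(canonical((Element) child));
            }
        }
        // A sorted List rather than a Set, so identical sibling elements are not collapsed.
        Collections.sort(children);
        return element.getTagName() + attributes + children;
    }

    static String canonical(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return canonical(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // True when both documents match after sorting attributes and sibling elements.
    static boolean logicallyEqual(String a, String b) {
        return canonical(a).equals(canonical(b));
    }

    public static void main(String[] args) {
        String before = "<data secure=\"true\" enable=\"true\">"
                + "<field name=\"firstName\"></field><field name=\"lastName\"></field></data>";
        String after = "<data enable=\"true\" secure=\"true\">"
                + "<field name=\"lastName\"></field><field name=\"firstName\"></field></data>";
        System.out.println(logicallyEqual(before, after)); // true
    }
}
```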
XML Diff problem
• Sorted Set
  • Level Order representation of the tree
• Some very quick, incomplete analysis:
  • Nodes: m nodes take O(m log m) to sort
  • Attributes: n attributes take O(n log n) to sort
  • Node Sets: merging i sorted node sets of lengths p1, p2, …, pi takes O((p1 + p2 + … + pi) log i) with a heap-based merge, which is linear in the total length when i is a small constant
  • Comparing two sorted sets of lengths m and n takes O(min(m, n)) element comparisons
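The two set operations in this analysis can be sketched as follows, assuming string elements for simplicity. The merge is a standard k-way merge with a min-heap, costing O((p1 + … + pi) log i) for i inputs; the comparison walks both sorted collections in lockstep and stops at the first mismatch, so it performs at most min(m, n) element comparisons. `SortedOps`, `mergeSorted` and `firstDifference` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeSet;

class SortedOps {
    // k-way merge of already-sorted inputs using a min-heap of their current heads.
    // Each element is pushed and popped once: O(P log k) for P elements in k inputs.
    static List<String> mergeSorted(List<? extends Iterable<String>> sortedInputs) {
        PriorityQueue<Head> heap = new PriorityQueue<>();
        for (Iterable<String> input : sortedInputs) {
            Iterator<String> it = input.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            merged.add(smallest.value);
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return merged;
    }

    // Current head of one input plus the iterator that yields the rest of it.
    private static final class Head implements Comparable<Head> {
        final String value;
        final Iterator<String> rest;

        Head(String value, Iterator<String> rest) {
            this.value = value;
            this.rest = rest;
        }

        @Override
        public int compareTo(Head other) {
            return value.compareTo(other.value);
        }
    }

    // Returns the index of the first mismatch, or -1 if both collections are equal.
    static int firstDifference(Iterable<String> a, Iterable<String> b) {
        Iterator<String> ia = a.iterator();
        Iterator<String> ib = b.iterator();
        int index = 0;
        while (ia.hasNext() && ib.hasNext()) {
            if (!ia.next().equals(ib.next())) {
                return index;
            }
            index++;
        }
        // If one side still has elements, they differ right after the shorter one ends.
        return (ia.hasNext() || ib.hasNext()) ? index : -1;
    }

    public static void main(String[] args) {
        List<String> merged = mergeSorted(List.of(
                new TreeSet<>(List.of("data", "field")),
                new TreeSet<>(List.of("attr", "node"))));
        System.out.println(merged); // [attr, data, field, node]
        System.out.println(firstDifference(
                new TreeSet<>(List.of("a", "b")), new TreeSet<>(List.of("a", "c")))); // 1
    }
}
```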
Lessons
• Testing is the best way to be sure.
• This led to a very interesting software development paradigm: Test Driven Development (Google it!)
• This doesn't mean that we are now allowed to jump into coding without requirements and design.
• Knowledge is of utmost importance!
What is MapReduce? (http://en.wikipedia.org/wiki/MapReduce)
• Prepare the Map() input – the "MapReduce system" designates Map processors, assigns the K1 input key value each processor would work on, and provides that processor with all the input data associated with that key value.
• Run the user-provided Map() code – Map() is run exactly once for each K1 key value, generating output organized by key values K2.
• "Shuffle" the Map output to the Reduce processors – the MapReduce system designates Reduce processors, assigns the K2 key value each processor would work on, and provides that processor with all the Map-generated data associated with that key value.
• Run the user-provided Reduce() code – Reduce() is run exactly once for each K2 key value produced by the Map step.
• Produce the final output – the MapReduce system collects all the Reduce output and sorts it by K2 to produce the final outcome.
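The five steps above can be simulated in a single JVM to show the data flow, with no distribution, partitioning or fault tolerance. This is a teaching sketch, not Hadoop's API: `MiniMapReduce.run` and the example inverted-index job are hypothetical. Map runs once per K1 key, the shuffle groups values by K2 in a sorted `TreeMap`, and Reduce runs once per K2 key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

class MiniMapReduce {
    static <K1, V1, K2 extends Comparable<K2>, V2, R> Map<K2, R> run(
            Map<K1, V1> input,
            BiFunction<K1, V1, List<Map.Entry<K2, V2>>> mapper,
            BiFunction<K2, List<V2>, R> reducer) {
        // Map: call the user code once per K1 key, collecting (K2, V2) pairs.
        List<Map.Entry<K2, V2>> intermediate = new ArrayList<>();
        input.forEach((k1, v1) -> intermediate.addAll(mapper.apply(k1, v1)));

        // Shuffle: gather all values that share a K2 key; TreeMap keeps K2 sorted.
        Map<K2, List<V2>> groups = new TreeMap<>();
        for (Map.Entry<K2, V2> pair : intermediate) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }

        // Reduce: call the user code once per K2 key; output stays sorted by K2.
        Map<K2, R> output = new TreeMap<>();
        groups.forEach((k2, values) -> output.put(k2, reducer.apply(k2, values)));
        return output;
    }

    // Hypothetical example job: an inverted index from each word to the line numbers containing it.
    static Map<String, List<Integer>> invertedIndex(Map<Integer, String> lines) {
        BiFunction<Integer, String, List<Map.Entry<String, Integer>>> mapper = (lineNo, text) -> {
            List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
            for (String word : text.trim().split("\\s+")) {
                pairs.add(Map.entry(word, lineNo));
            }
            return pairs;
        };
        BiFunction<String, List<Integer>, List<Integer>> reducer = (word, lineNos) -> {
            List<Integer> sorted = new ArrayList<>(lineNos);
            Collections.sort(sorted);
            return sorted;
        };
        return run(lines, mapper, reducer);
    }

    public static void main(String[] args) {
        Map<Integer, String> lines = new TreeMap<>(Map.of(1, "apple orange", 2, "orange plum"));
        System.out.println(invertedIndex(lines)); // {apple=[1], orange=[1, 2], plum=[2]}
    }
}
```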
Word Count
[Figure: word count data flow through the stages Input Files, Input to Mapper Instance, Key Value Mapping, Sort/Shuffle and Output, for the input lines "Apple Orange Mango", "Orange Grapes Plum", "Apple Plum Mango" and "Apple Apple Plum".]
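The word count flow can be approximated on one machine with Java streams: `flatMap` plays the role of the mappers (each word stands for a `(word, 1)` pair), and `Collectors.groupingBy` with `Collectors.counting()` performs the shuffle and reduce in one step, with a `TreeMap` supplying the sorted output. This is an in-memory analogue, not a distributed job; the sample lines are the input lines legible in the figure, and `WordCount` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordCount {
    // Map: split every line into words. Shuffle + reduce: group equal words and
    // count each group; TreeMap::new keeps the result sorted by word.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Sample lines taken from the input shown in the figure.
        List<String> lines = List.of(
                "Apple Orange Mango",
                "Orange Grapes Plum",
                "Apple Plum Mango",
                "Apple Apple Plum");
        System.out.println(count(lines));
    }
}
```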