SciMark – All Compilers Summary. Testing done on Millennium (550 MHz, Katmai), Titanium version 1.910. Except for Java testing, data collected 11/2/01; Java data collected 1/23/02.
SciMark – Selected Compilers, Small Dataset (same test setup and dates as the previous slide)
SciMark – Selected Compilers, Large Dataset (same test setup and dates as the previous slide)
SciMark – Titanium Version Comparisons (small and large datasets). All data collected on mm62 (550 MHz, Katmai) on 1/23/02.
PIER – Application Details
• Network/database discrete event simulator
  • A query engine (relational join and group-by) on top of a distributed hash table
  • Simulates end-to-end network communication (latency, bandwidth divided among flows, etc.)
  • Application written in Java for compatibility with other Berkeley database research projects
• Software engineering
  • Over 200 class files, heavy use of inheritance, polymorphism, etc.
  • About 25,000 lines of code (and not too many comments yet)
  • Layered, so it can easily be ported to a real, working implementation
• Some parts of the simulation are faked for performance reasons: tuples are kept small (<100 bytes) but are simulated as >1 Kb
• Primarily an object-moving program with some processing (string manipulation, basic math, etc.)
• All objects are kept in memory; disk I/O is minimal (result logging only) and is not timed in the following slides
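The slides do not show PIER's code. As a rough illustration of what the core of a discrete event simulator with this kind of network model does, here is a minimal Java sketch. The class and method names (`MiniDes`, `schedule`, `transferDelay`) and the even split of link bandwidth among active flows are assumptions for illustration, not PIER's actual design.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/**
 * Hypothetical minimal discrete event simulator (not PIER's code).
 * Events run in timestamp order; ties run in scheduling (FIFO) order.
 */
public class MiniDes {
    static final class Event {
        final double time;     // simulated time at which the action fires
        final long seq;        // tie-breaker for events with equal timestamps
        final Runnable action; // work to perform when the event fires
        Event(double time, long seq, Runnable action) {
            this.time = time; this.seq = seq; this.action = action;
        }
    }

    private final PriorityQueue<Event> queue = new PriorityQueue<>(
        Comparator.comparingDouble((Event e) -> e.time).thenComparingLong(e -> e.seq));
    private double now = 0.0;
    private long nextSeq = 0;

    double now() { return now; }

    // Schedule an action to run 'delay' simulated seconds from the current time.
    void schedule(double delay, Runnable action) {
        queue.add(new Event(now + delay, nextSeq++, action));
    }

    // Pop events in time order, advancing the simulated clock to each one.
    void run() {
        while (!queue.isEmpty()) {
            Event e = queue.poll();
            now = e.time;
            e.action.run();
        }
    }

    // Assumed network model: latency plus transfer time at the link bandwidth
    // divided evenly among the currently active flows.
    static double transferDelay(double latencySec, double sizeBytes,
                                double linkBytesPerSec, int activeFlows) {
        double perFlow = linkBytesPerSec / Math.max(1, activeFlows);
        return latencySec + sizeBytes / perFlow;
    }

    public static void main(String[] args) {
        MiniDes sim = new MiniDes();
        // A 1000-byte simulated tuple on a 1,000,000 B/s link shared by 4 flows, 10 ms latency.
        double d = transferDelay(0.010, 1000, 1_000_000, 4);
        sim.schedule(d, () -> System.out.printf("tuple delivered at t=%.4f s%n", sim.now()));
        sim.run(); // prints "tuple delivered at t=0.0140 s"
    }
}
```

A priority queue keyed on simulated time is the usual choice here because events are generated out of order (a message sent later on a faster path can arrive earlier), and the sequence number keeps equal-time events deterministic.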
PIER – Language Summary
[Chart not recoverable; surviving bar labels: 83.3% (0.8% faster than Java), 84.0% (5.1% faster than Java), 63.4% faster, 62.7%, 77.7%, 83.1%]
• Small simulation
  • 64 simulated nodes
  • 5,000 tuples per table
Testing done on Millennium (600 MHz, 2 GB RAM); data collected 1/28/02.
PIER – Memory Footprint. Memory usage and runtime grow exponentially with the primary simulation parameters (test parameters same as the previous slide).
PIER – Parallel Attempts
• Parallel attempt with Titanium failed miserably
  • Negative speedup (our best version almost matched sequential execution)
  • Simulated nodes were divided among processes; the best version used out-of-order execution to improve performance, while earlier versions used small time steps to keep all processes synchronized.
• Problems we encountered
  • Lots of small remote accesses (when using 8 processes on 2 hosts, the MPI performance counters rolled over at least once)
  • All of these small accesses were due to moving our objects, which contain sub-objects, which contain more sub-objects, and so on.
  • Globally, processes were load balanced, but within time steps they were not; various allocations of simulated nodes to processes were attempted.
  • The application is more memory-intensive than compute-bound.
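The gap between global balance and per-step balance can be made concrete with a small calculation. The sketch below uses hypothetical names and made-up numbers (not measurements from PIER) and assumes every time step ends with a barrier, as in the synchronized time-step versions described above, so each step costs as much as its slowest process.

```java
/**
 * Hypothetical illustration (not PIER data): work[s][p] is the work done by
 * process p during time step s. With a barrier after every step, each step
 * lasts as long as its slowest process.
 */
public class StepImbalance {
    // Total work per process over the whole run (the "global" view of balance).
    static double[] totals(double[][] work) {
        double[] t = new double[work[0].length];
        for (double[] step : work)
            for (int p = 0; p < step.length; p++) t[p] += step[p];
        return t;
    }

    // Elapsed time with a barrier after each step: sum of per-step maxima.
    static double barrierTime(double[][] work) {
        double sum = 0;
        for (double[] step : work) {
            double max = 0;
            for (double w : step) max = Math.max(max, w);
            sum += max;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two processes, two steps: each process does 10 units in total,
        // but the heavy work lands on a different process in each step.
        double[][] work = { {9, 1}, {1, 9} };
        double[] t = totals(work);
        System.out.println("per-process totals: " + t[0] + ", " + t[1]);   // 10.0, 10.0
        System.out.println("ideal (max total): " + Math.max(t[0], t[1]));   // 10.0
        System.out.println("with per-step barriers: " + barrierTime(work)); // 18.0
    }
}
```

Both processes look perfectly balanced over the whole run, yet per-step barriers nearly double the elapsed time, which is the pattern the slide describes.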
Parallel Execution Time Breakdown
[Chart not recoverable. Legend: Post Communication (communication imbalance); Communication; Pre Communication (execution imbalance); Execution. Bar labels in extracted order: Region 10ms Async 300ms Heap 300ms List 300ms Vect]
Titanium Wish List
• Titanium features that would be nice for our application (yes, you can laugh at them):
  • Serialization to move objects along with their encapsulated objects
  • Better memory management (regions just were not enough)
  • Global garbage collection
  • Directed memory deletion (i.e., delete object x)
  • Performance counters/profiling
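For comparison, standard Java covers the first wish-list item through `java.io.Serializable`, which flattens an object and everything it references into one byte stream. Whether Titanium 1.910 could use this across processes is not stated here. The sketch below is plain Java with hypothetical `Tuple`/`Attr` classes standing in for an object with sub-objects; it shows how such a graph becomes one buffer that could be shipped as a single bulk transfer instead of many small remote accesses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class GraphSerialization {
    // Hypothetical sub-object type.
    static class Attr implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String value;
        Attr(String name, String value) { this.name = name; this.value = value; }
    }

    // Hypothetical tuple holding a list of sub-objects.
    static class Tuple implements Serializable {
        private static final long serialVersionUID = 1L;
        final int key;
        final List<Attr> attrs = new ArrayList<>();
        Tuple(int key) { this.key = key; }
    }

    // Flatten the whole object graph reachable from obj into one byte array.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Rebuild an equivalent (deep-copied) object graph on the receiving side.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Tuple t = new Tuple(42);
        t.attrs.add(new Attr("src", "node-7"));
        t.attrs.add(new Attr("note", "hello"));
        byte[] msg = toBytes(t);                 // one buffer for the whole graph
        Tuple copy = (Tuple) fromBytes(msg);
        System.out.println(copy.key + " " + copy.attrs.size() + " " + copy.attrs.get(0).value);
        // prints "42 2 node-7"
    }
}
```

The trade-off is CPU time spent encoding and decoding on each side, which is usually far cheaper than one network round trip per sub-object.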