380 likes | 391 Views
Explore the measurement of time intervals and cycles in Linux environments for computer performance evaluation and programming. Learn about interval timers, cycle counters, and practical implementation in C language.
E N D
Lecture 4Timing Linux Processes & Threads CSCE 713 Advanced Computer Architecture • Topics • … • Readings January 19, 2012
Overview • Last Time • PosixPthreads: create, join, exit, mutexes • /class/csce713-006 Code and Data • Readings for today • http://csapp.cs.cmu.edu/public/1e/public/ch9-preview.pdf • New • Website alive and kicking; dropbox too! • From Last time: Gauss-Seidel Method, Barriers, Threads Assignment • Next time performance evaluation, barriers and MPI intro
Threads programming Assignment • Matrix addition (embarassingly parallel) • Versions • Sequential • Sequential with blocking factor • Sequential Read without conversions • Multi threaded passing number of threads as command line argument (args.c code should be distributed as an example) • Plot of several runs • Next time
Time in the Computer World • Clock cycle time = ? • 1GHz processor has 10-9 = 1 ns clock cycle time Computer Systems: A Programmers Perspective, Bryant and O’Hallaron
Times in the Unix World Command • Real Time • User Time • System Time • Waiting Time – time waiting on I/O operations and while other processes execute • Clock Time • Which time is most important? • Latency vs throughput
The Time Command • TIME(1) TIME(1) • NAME • time - run programs and summarize system resource usage • Example of a sleepy program • real 3m10.006s • user 0m0.004s • sys 0m0.000s
MIPS, MFLOPS, • Benchmarks • SPEC2006 – relevance to this course
Multitasking and Interval Interrupts • Recall Multitasking from OS class: • multiple processes ready to go • scheduler decides who goes next • each process given small time slice (1-10ms) • The process end either because of an I/O operation, page-fault, even cache-miss, or it uses up time slice • System has an “interval timer” to interupt at the end of the time slice
CSAPP Fig 9.2 • . Computer Systems: A Programmers Perspective, Bryant and O’Hallaron
Measuring Time by Interval Counting • . Computer Systems: A Programmers Perspective, Bryant and O’Hallaron
Practice Problem 9.4 • On a system with a timer interval of 10 ms, some segment of process A is recorded as requiring 70 ms, combining both system and user time. What are the minimum and maximum actual times used by thissegment?
Programmer Access to Interval Timers • #include <sys/times.h> • structtms • clock_ttms_utime; /* user time * / • clock_ttms_s time; /* system time * / • clock_ttms_cutime; /* user time of reaped children */ • clock_ttms_cstime; /* system time of reaped children */ • } ; • clock_t times(structtms *buf); • Returns: number of clock ticks elapsed since system started Computer Systems: A Programmers Perspective, Bryant and O’Hallaron
Cycle Counters on the IA32 • Cycle counters increment every clock cycle • 32 bit counter with 1GHz clock wraps after 232/109 ~ 4.3 seconds • 64 bit counter takes a lot longer 264/109 ~ 570 years • IA32 counter accessed with the “rdtsc” • (read time stamp counter) instruction • It places high order 32 bits in %edx and low 32 in %eax • It would be nice if we had a c interface • void access counter(unsigned *hi, unsigned *lo);
Including Assembly Code in C • void access_counter(unsigned *hi, unsigned *lo) • { • asm("rdtsc; movl %%edx,%0; movl %%eax,%1" /* Read cycle counter */ • : "=r" (*hi), "=r" (*lo) /* and move results to */ • : /* No input */ /* the two outputs */ • : "%edx", "%eax"); • }
Closer Look at Extended ASM void access_counter (unsigned *hi, unsigned *lo) { /* Get cycle counter */ asm("rdtsc; movl %%edx,%0; movl %%eax,%1" : "=r" (*hi), "=r" (*lo) : /* No input */ : "%edx", "%eax"); } asm(“Instruction String" : Output List : Input List : Clobbers List); } • Instruction String • Series of assembly commands • Separated by “;” or “\n” • Use “%%” where normally would use “%”
Closer Look at Extended ASM void access_counter (unsigned *hi, unsigned *lo) { /* Get cycle counter */ asm("rdtsc; movl %%edx,%0; movl %%eax,%1" : "=r" (*hi), "=r" (*lo) : /* No input */ : "%edx", "%eax"); } asm(“Instruction String" : Output List : Input List : Clobbers List); } • Output List • Expressions indicating destinations for values %0, %1, …, %j • Enclosed in parentheses • Must be lvalue • Value that can appear on LHS of assignment • Tag "=r" indicates that symbolic value (%0, etc.), should be replaced by register
Closer Look at Extended ASM void access_counter (unsigned *hi, unsigned *lo) { /* Get cycle counter */ asm("rdtsc; movl %%edx,%0; movl %%eax,%1" : "=r" (*hi), "=r" (*lo) : /* No input */ : "%edx", "%eax"); } asm(“Instruction String" : Output List : Input List : Clobbers List); } • Input List • Series of expressions indicating sources for values %j+1, %j+2, … • Enclosed in parentheses • Any expression returning value • Tag "r" indicates that symbolic value (%0, etc.) will come from register
Closer Look at Extended ASM void access_counter (unsigned *hi, unsigned *lo) { /* Get cycle counter */ asm("rdtsc; movl %%edx,%0; movl %%eax,%1" : "=r" (*hi), "=r" (*lo) : /* No input */ : "%edx", "%eax"); } asm(“Instruction String" : Output List : Input List : Clobbers List); } • Clobbers List • List of register names that get altered by assembly instruction • Compiler will make sure doesn’t store something in one of these registers that must be preserved across asm • Value set before & used after
Using access_counter • #include "clock.h" • void start counter(); • double get counter();
The K-Best Measurement Scheme • Measurements of process time in a loaded system are always overestimates • Context switching, cache operations, branch predictions • Interference from other processes • Even unloaded I/O operations will vary because of locations of disk heads etc. • Heisenberg uncertainty principle • Heisenbugs – bugs influenced/created by monitoring code
The K-Best Measurement Scheme • Warm up the instruction cache • Record the K fastest times • If these agree with some tolerance say .1% then the fastest of these represents the true execution time.
Measuring Using Time of DayCode/CSAPP/perf/tod. c • #include <stdlib.h> • #include <stdio.h> • #include <time.h> • #include <sys/time.h> • #include <unistd.h> • static structtimevaltstart; • /* Record current time */ • void start_timer() • { • gettimeofday(&tstart, NULL); • }
/* Get number of seconds since last call to start_timer */ • double get_timer() • { • structtimevaltfinish; • long sec, usec; • gettimeofday(&tfinish, NULL); • sec = tfinish.tv_sec - tstart.tv_sec; • usec = tfinish.tv_usec - tstart.tv_usec; • return sec + 1e-6*usec; • }
/* Determine how many "seconds" are in tod counter */ • static void callibrate_tod() • { • double quick, slow; • start_timer(); • quick = get_timer(); • start_timer(); • sleep(1); • slow = get_timer(); • printf("%.2f - %f = %.2f seconds/sleep second\n", • slow, quick, slow-quick); • }
static void run_timer() • { • inti = 0; • double t1, t2, t3, t4; • start_timer(); • t1 = get_timer(); • do { • t2 = get_timer(); • } while (t2 == t1); • do { • t3 = get_timer(); • i++; • } while (t3 == t2); • printf("Time = %f usecs, in %d iterations, %f usec/iteration\n“, 1e6 *(t3-t2), i, 1e6 *(t3-t2)/i);
start_timer(); • i = 0; • do { • t4 = get_timer(); • i++; • } while (t4 < 1.0); • printf("%d iterations in %f secs = %f usec/iteration\n", • i, t4, 1e6*t4/i); • }
static void check_time() { • long e; • int s, m, h, d, y; • start_timer(); • e = tstart.tv_sec; • s = e % 60; • e = e / 60; • m = e % 60; • e = e / 60; • h = e % 24; • e = e / 24; • d = e % 365; • e = e / 365; • y = e; • printf("This clock started %d years, %d days, %d hours, %d minutes, %d seconds ago\n“, y, d, h, m, s); • }
int main(intarg, char *argv[]) • { • callibrate_tod(); • run_timer(); • check_time(); • return 0; • }
Gprof – profiling Linux programs • Links • https://computing.llnl.gov/tutorials/performance_tools/ • http://www.cs.utah.edu/dept/old/texinfo/as/gprof_toc.html • http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html • http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29 • http://en.wikipedia.org/wiki/List_of_performance_analysis_tools • http://docs.freebsd.org/44doc/psd/18.gprof/paper.pdf • http://linuxgazette.net/100/vinayak.html • http://ececmpsysweb.groups.et.byu.net/cmpsys.2004.winter/citizenship/Bryan_Wheeler/Profiling_Tutorial.html
Man gprof • GPROF(1) GNU GPROF(1) • NAME gprof- display call graph profile data • SYNOPSIS • gprof [ -[abcDhilLrsTvwxyz] ] [ -[ACeEfFJnNOpPqQZ][name] ] • [ -I dirs ] [ -d[num] ] [ -k from/to ] • …] • [ --[no-]exec-counts[=name] ] • … • [ --debug[=level] ] [ --function-ordering ] • … • [ --file-info ] [ --help ] [ --line ] [ --min-count=n ] • … • [ --static-call-graph ] [ --sum ] [ --table-length=len ] • … • [ image-file ] [ profile-file ... ]
Gprof – GNU profiler • http://www.unix-tutorials.com/go.php?id=321 -ArnoutEngelen • $ time event2dot //this took more than 3 minutes on this input: • real 3m36.316s user 0m55.590s sys 0m1.070s • $ g++ -pg dotgen.cpp readfile.cpp main.cpp graph.cpp config.cpp -o event2dot • $ gprof event2dot | less'. • gprof now shows us the following functions are important: • % cumulative self self total • time seconds seconds calls s/call s/call name • 43.32 46.03 46.03 339952989 0.00 0.00 CompareNodes(…) • 25.06 72.66 26.63 55000 0.00 0.00 getNode(…) • 16.80 90.51 17.85 339433374 0.00 0.00 CompareEdges(…) • 12.70 104.01 13.50 51987 0.00 0.00 addAnnotatedEdge(… ) • 1.98 106.11 2.10 51987 0.00 0.00 addEdge(…)
Valgrind • Valgrind is a memory mismanagement detector. It shows you memory leaks, deallocation errors, cache profiling • Memcheck can detect: • Use of uninitialised memory • Reading/writing memory after it has been free'd • Reading/writing off the end of malloc'd blocks • Reading/writing inappropriate areas on the stack • Memory leaks -- where pointers to malloc'd blocks are lost forever • Mismatched use of malloc/new/new; vs free/delete/delete; • Overlapping src and dst pointers in memcpy() and related functions • Some misuses of the POSIX pthreads API http://cs.ecs.baylor.edu/~donahoo/tools/valgrind/
General Parallel Computing Wiki-Links • http://en.wikipedia.org/wiki/Category:Parallel_computing • http://en.wikipedia.org/wiki/Intel_Parallel_Studio • http://www.itservices.hku.hk/sp2/workshop/html/ibmhwsw/ibmhwsw.html • http://www.itservices.hku.hk/sp2/workshop/html/ibmhwsw/ibmhwsw.html#power
Libraries • http://static.msi.umn.edu/tutorial/scicomp/sp/math-numerical-lib-SP/num_lib.html