
A race-cure case study


amir-hart


Presentation Transcript


  1. A race-cure case study A look at how some standard software tools can illuminate what is happening inside Linux

  2. Our recent ‘race’ example • Our ‘cmosram.c’ device-driver included a ‘race condition’ in its ‘read()’ and ‘write()’ functions, since accessing any CMOS memory-location is a two-step operation, and thus is a ‘critical section’ in our code: outb( reg_id, 0x70 ); datum = inb( 0x71 ); • Once the first step in this sequence is taken, the second step needs to follow without any other access to these two i/o-ports in between

  3. No interventions! • To guarantee the integrity of each access to CMOS memory, we must prohibit every possibility that another control-thread may intervene and access that same i/o-port • The main ways in which an intervention by another ‘thread’ might happen are: • The current CPU could get ‘interrupted’; or • Another CPU could access the same i/o-port
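To make the danger concrete, here is a minimal user-space sketch of the two-step access, assuming a simulated chip with one shared index register. The structure and function names are hypothetical stand-ins for the real i/o-ports 0x70 and 0x71, and the ‘intervention’ is scripted rather than produced by a real interrupt or second CPU.

```c
#include <stdint.h>

/* Simulated CMOS chip: one shared index register plus 128 data bytes.
   All names here are hypothetical stand-ins for ports 0x70 and 0x71. */
struct cmos_sim {
    uint8_t index;        /* last value written to the "index port" */
    uint8_t cell[128];    /* simulated CMOS memory contents */
};

/* Step 1: select a location (models outb( reg_id, 0x70 )). */
void sim_select(struct cmos_sim *c, uint8_t reg_id)
{
    c->index = reg_id & 0x7F;
}

/* Step 2: fetch the selected byte (models inb( 0x71 )). */
uint8_t sim_fetch(const struct cmos_sim *c)
{
    return c->cell[c->index];
}

/* Thread A wants location reg_a, but thread B's select of reg_b
   slips in between A's two steps, so A is handed the wrong byte. */
uint8_t read_with_intervention(struct cmos_sim *c,
                               uint8_t reg_a, uint8_t reg_b)
{
    sim_select(c, reg_a);     /* A: step 1 */
    sim_select(c, reg_b);     /* B intervenes with its own step 1 */
    return sim_fetch(c);      /* A: step 2 returns B's location */
}
```

With location 0 holding 0x11 and location 4 holding 0x44, a call to read_with_intervention( &c, 0, 4 ) returns 0x44 even though thread A asked for location 0.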

  4. Linux’s solution • Linux provides a function that an LKM can call which is designed to ensure ‘exclusive access’ to a CMOS memory-location: datum = rtc_cmos_read( reg_id ); • By using this function, a programmer does not have to expend time and mental effort analyzing the race-condition and devising a suitable ‘cure’ for it

  5. But how does it work? • As computer science students, we are not satisfied with just using convenient ‘black-box’ solutions which we don’t understand • Such purported ‘solutions’ may not always accomplish everything that they claim – if they perform correctly today, they still may fail in some way in the future (if hardware changes); we don’t want to be helpless!

  6. Is ‘open source’ enough? • In theory we could try to track down the actual behavior of the ‘rtc_cmos_read()’ function, by reading Linux’s source-code • But is that really a practical approach? • In some cases the answer might be ‘yes’, but in other situations it might be ‘no’! • Life is short, and the kernel source-files are very numerous – with many layers

  7. ‘LXR’ can help • The Linux Cross-Reference tool offers a way to automate searching kernel source • This tool is online (see our website’s link under ‘Resources’) and it is hosted on a server in Norway: http://lxr.linux.no/ • Here you just click on “Browse the Code”

  8. From: <arch/i386/kernel/time.c>
        unsigned char rtc_cmos_read(unsigned char addr)
        {
                unsigned char val;
                lock_cmos_prefix( addr );
                outb_p( addr, RTC_PORT(0) );
                val = inb_p( RTC_PORT(1) );
                lock_cmos_suffix( addr );
                return val;
        }
        EXPORT_SYMBOL( rtc_cmos_read );

  9. Another approach… • There is an alternative to searching kernel source files -- which may well be faster • We can use some standard command-line tools, including ‘objdump’ and ‘grep’ • In this approach, we look at the compiled kernel’s object-file, named ‘vmlinux’, found normally in the ‘/usr/src/linux’ subdirectory • Using ‘objdump’ that file can be parsed!

  10. ‘objdump’ can disassemble • Change the current working directory: $ cd /usr/src/linux • Then, to disassemble the ‘vmlinux’ kernel file we can use this command: $ objdump -d vmlinux • But the amount of output will be huge, so it’s hard to find the part we’re interested in

  11. ‘grep’ can do filtering • If we want to see the ‘rtc_cmos_read’ code we could use ‘grep’ to eliminate irrelevant parts of the disassembly-output: $ objdump -d vmlinux | grep rtc_cmos_read • But we still see too many lines of output (because the ‘rtc_cmos_read()’ function gets called at many places in the kernel)

  12. ‘System.map’ • We can use a special textfile, located in the ‘/boot’ directory, which tells us where each ‘exported’ kernel-symbol will reside at run-time in the virtual address-space • You can use ‘cat’ to look at this textfile: $ cat /boot/System.map • And you can use ‘grep’ to find only the symbol you care about: $ cat /boot/System.map | grep rtc_cmos_read

  13. Example on our machines
        $ cat /boot/System.map-2.6.22.5cslabs | grep rtc_cmos_read
        c0105574 T rtc_cmos_read
        c029b8a8 r __ksymtab_rtc_cmos_read
        c02a0bff r __kstrtab_rtc_cmos_read
      Note that the usual ‘symbolic link’ is missing from the ‘/boot’ directory on our class and lab machines, so you have to type a longer name. With superuser privileges this could be fixed from within ‘/boot’ using the ‘ln’ command (the ‘-s’ option makes the link symbolic):
        root# ln -s System.map-2.6.22.5cslabs System.map
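Each ‘System.map’ line has three fields: a hexadecimal address, a one-letter symbol type, and the symbol’s name. The sketch below shows one way to split such a line in C and to test for an exact name, which avoids the extra ‘__ksymtab_’ and ‘__kstrtab_’ matches that a plain ‘grep’ reports. The structure and function names are hypothetical, chosen only for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one System.map line: address, type, name. */
struct ksym {
    unsigned long addr;
    char type;
    char name[128];
};

/* Parse a line such as "c0105574 T rtc_cmos_read".
   Returns 1 on success, 0 if the line lacks the three fields. */
int parse_ksym(const char *line, struct ksym *out)
{
    return sscanf(line, "%lx %c %127s",
                  &out->addr, &out->type, out->name) == 3;
}

/* Exact-name comparison, unlike grep's substring match. */
int ksym_is(const struct ksym *s, const char *name)
{
    return strcmp(s->name, name) == 0;
}
```

Feeding the three lines shown above through parse_ksym() and ksym_is( &s, "rtc_cmos_read" ) would keep only the first one, whose address is the function’s entry-point.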

  14. Now we know where to look… • From the ‘System.map’ we learn where in the kernel our ‘rtc_cmos_read()’ function will reside • We can ‘extract’ that function’s code, for study purposes, using these steps: • Save the complete ‘vmlinux’ disassembly • Use ‘grep’ to find the function’s starting-address • Use ‘vi’ to delete earlier and later instructions

  15. Step 1: saving the ‘vmlinux’ disassembly
        $ objdump -d /usr/src/linux/vmlinux > ~/vmlinux.asm
      Step 2: finding our function’s entry-point
        $ cat ~/vmlinux.asm | grep -n c0105574

  16. What we discover
      Find the line that shows this virtual address (with its trailing colon), and tell us which line-number it’s on:
        $ cat vmlinux.asm | grep -n c0105574:
        6812:c0105574: 53 push %ebx
      Here ‘6812’ is the line-number, and the rest of the output is the matching line itself

  17. Use a text-editor • Remove all the lines in your ‘vmlinux.asm’ textfile whose line-numbers precede 6812 • Scroll down, to find where your function ends (i.e., find its return-instruction ‘ret’): c01055b7: c3 ret • Delete all the lines that follow the ‘return’

  18. The complete function
        c0105574 <rtc_cmos_read>:
        c0105574: 53                     push %ebx
        c0105575: 9c                     pushf
        c0105576: 5b                     pop %ebx
        c0105577: fa                     cli
        c0105578: 64 8b 15 08 20 30 c0   mov %fs:0xc0302008,%edx
        c010557f: 0f b6 c8               movzbl %al,%ecx
        c0105582: 42                     inc %edx
        c0105583: c1 e2 08               shl $0x8,%edx
        c0105586: 09 ca                  or %ecx,%edx
        c0105588: a1 3c 99 30 c0         mov 0xc030993c,%eax
        c010558d: 85 c0                  test %eax,%eax
        c010558f: 75 f7                  jne c0105588 <rtc_cmos_read+0x14>
        c0105591: f0 0f b1 15 3c 99 30   lock cmpxchg %edx,0xc030993c
        c0105598: c0
        c0105599: 85 c0                  test %eax,%eax
        c010559b: 75 eb                  jne c0105588 <rtc_cmos_read+0x14>
        c010559d: 88 c8                  mov %cl,%al
        c010559f: e6 70                  out %al,$0x70
        c01055a1: e6 80                  out %al,$0x80
        c01055a3: e4 71                  in $0x71,%al
        c01055a5: e6 80                  out %al,$0x80
        c01055a7: c7 05 3c 99 30 c0 00   movl $0x0,0xc030993c
        c01055ae: 00 00 00
        c01055b1: 53                     push %ebx
        c01055b2: 9d                     popf
        c01055b3: 0f b6 c0               movzbl %al,%eax
        c01055b6: 5b                     pop %ebx
        c01055b7: c3                     ret

  19. Some ‘magic’ numbers • There are some hexadecimal constants in this code-disassembly which we probably will not understand without more research • This memory-address: 0xc030993c • This i/o-port address: 0x80 • This memory-address: %fs:0xc0302008 • There’s also a jump-target, but we do have some help in deciphering what it means: jne c0105588 <rtc_cmos_read+0x14>
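The jump-target annotation can be checked by hand in two ways. The symbol form ‘rtc_cmos_read+0x14’ is just the function’s entry-point plus an offset, and the machine-code form ‘75 f7’ is a two-byte short jump whose second byte is a signed displacement measured from the instruction that follows the jump. The small C sketch below works both out; the function names are made up for this illustration.

```c
#include <stdint.h>

/* Target of a two-byte short jump (opcode byte + rel8) at insn_addr:
   the sign-extended displacement is added to the address of the
   instruction that follows the jump. */
uint32_t short_jump_target(uint32_t insn_addr, uint8_t rel8)
{
    uint32_t next = insn_addr + 2;               /* jump is 2 bytes long */
    int32_t disp = (rel8 & 0x80) ? (int32_t)rel8 - 256 : (int32_t)rel8;
    return next + (uint32_t)disp;                /* wraps modulo 2^32 */
}

/* Address named by a "symbol+offset" annotation. */
uint32_t symbol_plus_offset(uint32_t symbol_addr, uint32_t offset)
{
    return symbol_addr + offset;
}
```

For the jump at c010558f the displacement byte f7 is -9, and c0105591 - 9 gives c0105588. The same address results from c0105574 + 0x14, so both notations agree.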

  20. The ‘cmpxchg’ instruction • The ‘cmpxchg’ instruction performs these CPU actions in a single operation: cmpxchg source, destination • The destination-operand is compared with the accumulator-register’s value, and the eflags-bits are adjusted to reflect this comparison’s result • If ZF is set (the two values were equal), the value of the source-operand is copied to the destination-operand; otherwise, the destination-operand is copied to the accumulator-register • A ‘lock’ prefix stops any other CPU’s bus-access while the instruction executes
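The rule above can be restated as a few lines of ordinary C. This is only a software model of the instruction’s effect on its operands, written for illustration: it is not atomic, and the function name is hypothetical.

```c
#include <stdint.h>

/* Software model of "cmpxchg src, dest" (NOT atomic; illustration only).
   acc stands for the accumulator-register (EAX).
   Returns the resulting zero-flag: 1 if dest matched the accumulator. */
int cmpxchg_model(uint32_t *dest, uint32_t *acc, uint32_t src)
{
    if (*dest == *acc) {
        *dest = src;     /* ZF set: source copied to destination */
        return 1;
    }
    *acc = *dest;        /* ZF clear: destination copied to accumulator */
    return 0;
}
```

With the destination and accumulator both 0, the call stores the source value and reports ZF=1. A second call with the accumulator reset to 0 leaves the destination alone and loads its current value into the accumulator instead.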

  21. ‘spinlock’
      Before the code’s ‘critical section’ we have this:
        c0105588: a1 3c 99 30 c0         mov 0xc030993c,%eax
        c010558d: 85 c0                  test %eax,%eax
        c010558f: 75 f7                  jne c0105588 <rtc_cmos_read+0x14>
        c0105591: f0 0f b1 15 3c 99 30   lock cmpxchg %edx,0xc030993c
        c0105598: c0
        c0105599: 85 c0                  test %eax,%eax
        c010559b: 75 eb                  jne c0105588 <rtc_cmos_read+0x14>
      Then we have the function’s ‘critical section’ of code:
        c010559d: 88 c8                  mov %cl,%al
        c010559f: e6 70                  out %al,$0x70
        c01055a1: e6 80                  out %al,$0x80
        c01055a3: e4 71                  in $0x71,%al
        c01055a5: e6 80                  out %al,$0x80
      I/O-port 0x80 has an ‘undefined’ system function used for time-delay
      And then after the code’s ‘critical section’ we have this:
        c01055a7: c7 05 3c 99 30 c0 00   movl $0x0,0xc030993c
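The same acquire-and-release pattern can be sketched in portable user-space C using the C11 <stdatomic.h> facilities in place of the ‘lock cmpxchg’ instruction. This is an illustration under stated assumptions, not the kernel’s own source: the function names are invented, the lock is an ordinary variable rather than the kernel’s, and the interrupt-disabling done by ‘cli’ is not modeled. The value stored into the lock follows the arithmetic seen in the disassembly (inc, shl $8, or), namely ((cpu + 1) << 8) | reg.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Lock value computed by the disassembly: ((cpu + 1) << 8) | reg.
   It is never zero, so zero can mean "unlocked". */
unsigned lock_value(unsigned cpu, uint8_t reg)
{
    return ((cpu + 1) << 8) | reg;
}

/* One acquisition attempt: succeeds only if the lock word is 0. */
int try_acquire(atomic_uint *lock, unsigned value)
{
    unsigned expected = 0;
    if (atomic_load(lock) != 0)       /* cheap test first (mov + test) */
        return 0;
    /* models "lock cmpxchg": store value only if lock is still 0 */
    return atomic_compare_exchange_strong(lock, &expected, value);
}

/* Keep retrying until the lock is ours (the two jne loops). */
void acquire(atomic_uint *lock, unsigned value)
{
    while (!try_acquire(lock, value))
        ;                             /* spin */
}

/* Models "movl $0x0, lock" after the critical section. */
void release(atomic_uint *lock)
{
    atomic_store(lock, 0);
}
```

The plain read before the compare-and-exchange mirrors the first ‘mov’ and ‘test’ pair: a waiting CPU spins on an ordinary load and attempts the more expensive locked instruction only when the lock appears to be free.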

  22. The ‘System.map’ again • The ‘System.map’ shows what the other mysterious memory-addresses mean: • We see that memory-address c030993c has the label ‘cmos_lock’ (supporting our previous conclusion about a ‘spinlock’); also we get a ‘clue’ about 0xc0302008
        $ cat /boot/System.map-2.6.22.5cslabs | grep c030993c
        c030993c B cmos_lock
        $ cat /boot/System.map-2.6.22.5cslabs | grep c0302008
        c0302008 D per_cpu__cpu_number

  23. What is ‘per_cpu’ data? • With SMP systems there is often a need for each CPU to have its own version of some program-variable’s value • One example: each CPU needs a unique identification-number (used in scheduling tasks for ‘load-balancing’ and respecting ‘processor-affinity’, and keeping track of which CPU now owns a particular ‘lock’) • That’s what ‘per_cpu__cpu_number’ is

  24. Role of segmentation • Linux has a clever way of allowing CPUs to access their ‘per_cpu’ variables using the same name for different locations • This can be arranged by exploiting the CPU’s memory-segmentation architecture • The FS segment-register is used by the kernel to reference identically-named, but differently positioned, storage-locations
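The arithmetic behind this trick is simple: for a segment-relative reference such as %fs:0xc0302008, the CPU adds the segment’s base-address to the offset, and in 32-bit mode the sum is truncated to 32 bits. The sketch below models only that addition. The function name and the base-addresses used in the example are hypothetical, not values read from a real machine.

```c
#include <stdint.h>

/* Linear address = segment base + offset, truncated to 32 bits.
   Unsigned arithmetic in C wraps modulo 2^32 in the same way. */
uint32_t linear_address(uint32_t seg_base, uint32_t offset)
{
    return seg_base + offset;
}
```

If one CPU’s FS segment had base 0x00010000 and another’s had base 0x00020000, the same offset 0xc0302008 would resolve to 0xc0312008 on the first and 0xc0322008 on the second, so identical instructions would touch different storage-locations.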

  25. Each CPU has its own GDT • The Operating System sets up a Global Descriptor Table for each CPU; it’s an array of memory-segment descriptors, each laid out as follows:
        bits 63..32:  segment-base[ 31..24 ] | G | D | segment-limit[ 19..16 ] | segment access rights | segment-base[ 23..16 ]
        bits 31..0:   segment-base[ 15..0 ] | segment-limit[ 15..0 ]
      ‘segment-base’ tells where the memory-area begins, ‘segment-limit’ tells how far the memory-area extends, and ‘access rights’ specifies how the memory-area will be used by the CPU (e.g., user or kernel)
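Because the base and limit fields are scattered across the quadword, reassembling them takes a few shifts and masks. The helper functions below are a sketch with invented names; they assume the descriptor has already been obtained as a 64-bit value, for example by copying one of the hex quadwords that a GDT-display tool prints.

```c
#include <stdint.h>

/* Reassemble the 32-bit base from bits 16..39 and bits 56..63. */
uint32_t desc_base(uint64_t d)
{
    return (uint32_t)(((d >> 16) & 0xFFFFFF) | (((d >> 56) & 0xFF) << 24));
}

/* Reassemble the 20-bit limit from bits 0..15 and bits 48..51. */
uint32_t desc_limit(uint64_t d)
{
    return (uint32_t)((d & 0xFFFF) | (((d >> 48) & 0xF) << 16));
}

/* Access-rights byte occupies bits 40..47. */
unsigned desc_access(uint64_t d)
{
    return (unsigned)((d >> 40) & 0xFF);
}

/* Granularity flag G is bit 55 (1 = limit counted in 4KB pages). */
int desc_granularity(uint64_t d)
{
    return (int)((d >> 55) & 1);
}
```

As a check, the familiar ‘flat’ code-segment descriptor 0x00CF9A000000FFFF decodes to base 0x00000000, limit 0xFFFFF, access-rights 0x9A, and G=1.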

  26. In-class exercise #1 • Install our ‘dram.c’ device-driver, so you can run our ‘showgdt.cpp’ application • You will see a CPU’s memory-descriptors (displayed as quadwords in hex format) • You will probably see a slightly different table when you run ‘showgdt’ again – if Linux schedules it on a different CPU

  27. What’s in register FS? • You can use our ‘newinfo.cpp’ utility to quickly create an LKM that displays the values in the CPU’s segment-registers:
        // using 'global variables' simplifies the inline assembly language
        short _cs, _ds, _es, _fs, _gs, _ss;   // global variables
        int my_get_info( )
        {
                int len;
                asm(" mov %cs, _cs \n mov %ds, _ds ");
                len = sprintf( buf, "CS=%04X DS=%04X \n", _cs, _ds );
                return len;
        }

  28. In-class exercise #2 • Use the value in the FS segment-register to look up that segment’s ‘base-address’ (different base-address on different CPU) • Convert the ‘virtual’ base-address to its corresponding ‘physical’ base-address • Use our ‘fileview’ utility to look at what’s stored in physical memory at those spots • Check the location: %fs:0xc0302008
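For the first step of this exercise it helps to recall how a segment-selector is put together: bits 15..3 hold the index of the descriptor within its table, bit 2 says which table (0 for the GDT, 1 for an LDT), and bits 1..0 hold the requested privilege-level. A minimal sketch of that decoding follows; the function names are invented, and the selector values in the note afterwards are merely examples, not values read from the class machines.

```c
#include <stdint.h>

/* Descriptor index: selector bits 15..3. */
unsigned selector_index(uint16_t sel)  { return sel >> 3; }

/* Table indicator: bit 2 (0 = GDT, 1 = LDT). */
unsigned selector_ti(uint16_t sel)     { return (sel >> 2) & 1; }

/* Requested privilege-level: bits 1..0. */
unsigned selector_rpl(uint16_t sel)    { return sel & 3; }

/* Byte offset of the 8-byte descriptor within its table. */
unsigned selector_offset(uint16_t sel) { return (sel >> 3) * 8; }
```

For example, a selector value of 0x00D8 would name entry 27 of the GDT with privilege-level 0, while 0x007B would name entry 15 with privilege-level 3. The quadword at that entry then supplies the base-address.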

  29. ‘virtual-to-physical’ • If a kernel virtual address is not in the ‘high’ area (i.e., if it’s below 0xF8000000), then it is easy to calculate its physical address by doing a simple subtraction
        4GB
                      High Memory Area
        0xF8000000
                      kernel space (1GB)
        0xC0000000
                      user space (3GB)
      Subtract 0xC0000000 from the virtual address to get the physical address, but NOT for addresses in the HMA part of the virtual address-space
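The subtraction rule can be written as a short C function that also refuses addresses outside the directly mapped range. This is a sketch under the assumptions stated on the slide (kernel space starting at 0xC0000000 and the high area starting at 0xF8000000); the function and macro names are hypothetical.

```c
#include <stdint.h>

#define KERNEL_BASE 0xC0000000u   /* start of kernel space (from the slide) */
#define HIGH_AREA   0xF8000000u   /* start of the High Memory Area */

/* Returns 1 and stores the physical address if vaddr lies in the
   directly mapped kernel range; returns 0 for user-space addresses
   and for addresses in the high area, where the rule does not apply. */
int virt_to_phys_simple(uint32_t vaddr, uint32_t *paddr)
{
    if (vaddr < KERNEL_BASE || vaddr >= HIGH_AREA)
        return 0;
    *paddr = vaddr - KERNEL_BASE;
    return 1;
}
```

Applied to the ‘cmos_lock’ address 0xc030993c, the function yields physical address 0x0030993c.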
