Prelude to Multiprocessing

Detecting CPU and system-board capabilities with CPUID and the MP Configuration Table

CPUID
• Recent Intel processors provide a ‘cpuid’ instruction (opcode 0x0F, 0xA2) to assist software in detecting a CPU’s capabilities
• If it’s implemented, this instruction can be executed in any of the processor modes, and at any of its four privilege levels
• But this ‘cpuid’ instruction might not be implemented (e.g., 8086, 80286, 80386)

Intel x86 EFLAGS register
  Bits 31-16:  31-22 = 0 (reserved), 21 = ID, 20 = VIP, 19 = VIF, 18 = AC, 17 = VM, 16 = RF
  Bits 15-0:   15 = 0, 14 = NT, 13-12 = IOPL, 11 = OF, 10 = DF, 9 = IF, 8 = TF, 7 = SF, 6 = ZF, 5 = 0, 4 = AF, 3 = 0, 2 = PF, 1 = 1, 0 = CF
• Software can ‘toggle’ the ID-bit (bit #21) in the 32-bit EFLAGS register if the processor is capable of executing the ‘cpuid’ instruction

But what if there’s no EFLAGS?
• The early Intel processors (8086, 80286) did not implement any 32-bit registers
• The FLAGS register was only 16 bits wide
• So there was no ID-bit that software could try to ‘toggle’ (to see if ‘cpuid’ existed)
• How can software be sure that the 32-bit EFLAGS register exists within the CPU?

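Detecting 32-bit processors
• There’s a subtle difference in the way the logical shift/rotate instructions work when register CL contains the ‘shift-factor’
• On the 32-bit processors (e.g., 80386+) the value in CL is truncated to 5 bits, so a shift-factor of 32 behaves like 0; the 8086/8088 use the full value in CL
• Caution: Intel’s documentation states that this truncation began with the 80286, so the distinction separates the 8086/8088 from the later CPUs; ruling out an 80286 needs a further check (e.g., whether bits 12-15 of FLAGS can be set)
• Software can exploit this distinction as a first step in telling if EFLAGS is implemented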

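Detecting EFLAGS
# Here’s a test for the presence of EFLAGS
# (an 80286 also truncates the shift-factor, so it would need a further check)
        mov     $-1, %ax        # a nonzero value
        mov     $32, %cl        # shift-factor of 32
        shl     %cl, %ax        # do logical shift
        or      %ax, %ax        # test result in AX
        jnz     is32bit         # AX unchanged: shift-factor was truncated to 0
        jmp     is16bit         # AX zeroed: 8086/8088, so EFLAGS absent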

Testing for ID-bit ‘toggle’
# Here’s a test for the presence of the CPUID instruction
        pushfl                  # copy EFLAGS contents
        pop     %eax            # to accumulator register
        mov     %eax, %edx      # save a duplicate image
        btc     $21, %eax       # toggle the ID-bit (bit 21)
        push    %eax            # copy revised contents
        popfl                   # back into EFLAGS
        pushfl                  # copy EFLAGS contents
        pop     %eax            # back into accumulator
        xor     %edx, %eax      # do XOR with prior value
        bt      $21, %eax       # did ID-bit get toggled?
        jc      y_cpuid         # yes, can execute ‘cpuid’
        jmp     n_cpuid         # else ‘cpuid’ unimplemented

How does CPUID work?
• Step 1: Load value 0 into register EAX
• Step 2: Execute ‘cpuid’ instruction
• Step 3: Verify ‘GenuineIntel’ character-string in registers (EBX,EDX,ECX)
• Step 4: Find maximum CPUID input-value in the EAX register
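Step 3 can be sketched in C. This is only an illustration, not code from the course: the function names are hypothetical, and the three register values are assumed to have been captured already from a ‘cpuid’ executed with EAX=0. Each register holds four characters, lowest byte first.

```c
#include <stdint.h>
#include <string.h>

/* Assemble the 12-character vendor string from the values that CPUID
 * input-value 0 leaves in EBX, EDX, ECX (in that order).
 * 'out' must have room for 13 bytes (12 characters plus the null). */
void cpuid_vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    uint32_t regs[3] = { ebx, edx, ecx };
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)        /* lowest byte is the first character */
            out[4 * r + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
    out[12] = '\0';
}

/* Returns 1 if the three registers spell "GenuineIntel", else 0. */
int is_genuine_intel(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    char vendor[13];
    cpuid_vendor_string(ebx, edx, ecx, vendor);
    return strcmp(vendor, "GenuineIntel") == 0;
}
```

On an Intel processor the three registers would hold 0x756e6547 ("Genu"), 0x49656e69 ("ineI") and 0x6c65746e ("ntel").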

Version and Features
• Load 1 into EAX and execute CPUID
• Processor model and stepping information is returned in register EAX
  Bits 27-20   Extended Family ID
  Bits 19-16   Extended Model ID
  Bits 13-12   Type
  Bits 11-8    Family ID
  Bits 7-4     Model
  Bits 3-0     Stepping ID
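The bit-fields in EAX can be pulled apart with shifts and masks. The sketch below is a hypothetical helper (the struct and function names are not from the course materials); it assumes the EAX value from CPUID input-value 1 has already been saved, and that the Extended Family ID field begins at bit 27 as in Intel's manuals.

```c
#include <stdint.h>

/* Raw fields of the version information returned in EAX by CPUID(1). */
struct cpu_signature {
    unsigned stepping;    /* bits 3-0   */
    unsigned model;       /* bits 7-4   */
    unsigned family;      /* bits 11-8  */
    unsigned type;        /* bits 13-12 */
    unsigned ext_model;   /* bits 19-16 */
    unsigned ext_family;  /* bits 27-20 */
};

struct cpu_signature decode_signature(uint32_t eax)
{
    struct cpu_signature s;
    s.stepping   =  eax        & 0xF;
    s.model      = (eax >> 4)  & 0xF;
    s.family     = (eax >> 8)  & 0xF;
    s.type       = (eax >> 12) & 0x3;
    s.ext_model  = (eax >> 16) & 0xF;
    s.ext_family = (eax >> 20) & 0xFF;
    return s;
}
```

For example, an EAX value of 0x000306A9 decodes to family 6, model 0xA, stepping 9, with Extended Model ID 3.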

Some Feature Flags in EDX
  Bit 28   HTT  = HyperThreading Technology (1=yes, 0=no)
  Bit 13   PGE  = Page Global Entries (1=yes, 0=no)
  Bit 9    APIC = Advanced Programmable Interrupt Controller on-chip (1=yes, 0=no)
  Bit 3    PSE  = Page-Size Extensions (1=yes, 0=no)
  Bit 2    DE   = Debugging Extensions (1=yes, 0=no)
  Bit 1    VME  = Virtual-8086 Mode Enhancements (1=yes, 0=no)
  Bit 0    FPU  = Floating-Point Unit on-chip (1=yes, 0=no)

Some Feature Flags in ECX
  Bit 5    VMX  = Virtual Machine Extensions (1=yes, 0=no)
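Testing one of these flags is a single shift-and-mask. A minimal sketch follows, assuming the EDX and ECX values from CPUID input-value 1 were saved beforehand; the constant and function names are illustrative only.

```c
#include <stdint.h>

/* Bit positions of the feature flags listed on these slides. */
enum { EDX_FPU = 0, EDX_VME = 1, EDX_DE = 2, EDX_PSE = 3,
       EDX_APIC = 9, EDX_PGE = 13, EDX_HTT = 28 };
enum { ECX_VMX = 5 };

/* Returns 1 if the given bit is set in the saved register value, else 0. */
int feature_present(uint32_t reg, unsigned bit)
{
    return (int)((reg >> bit) & 1u);
}
```

A caller would write, for instance, `feature_present(edx, EDX_HTT)` to see whether HyperThreading is reported.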

Multiprocessor Specification
• It’s an industry standard, allowing OS software to use multiple processors in a uniform way
• OS software searches in three regions of the physical address-space below 1-megabyte for a “paragraph-aligned” data-structure of length 16 bytes called the MP Floating Pointer Structure:
  • Search in lowest KB of Extended BIOS Data Area
  • Search in topmost KB of conventional 640K RAM
  • Search in the 128KB ROM-BIOS (0xE0000-0xFFFFF)
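The search itself is a walk over 16-byte boundaries. The sketch below scans a buffer that is assumed to already hold a copy of one of the three regions; the function name is hypothetical. Two details come from the MP Specification 1.4 rather than from these slides: the structure begins with the signature "_MP_", and its bytes sum to zero (modulo 256).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scan 'len' bytes at 'mem' on paragraph (16-byte) boundaries for an
 * MP Floating Pointer Structure. Returns its byte offset, or -1. */
long find_mp_floating_pointer(const uint8_t *mem, size_t len)
{
    for (size_t off = 0; off + 16 <= len; off += 16) {
        if (memcmp(mem + off, "_MP_", 4) != 0)
            continue;                       /* no signature here */
        uint8_t sum = 0;
        for (int i = 0; i < 16; i++)        /* checksum over all 16 bytes */
            sum = (uint8_t)(sum + mem[off + i]);
        if (sum == 0)
            return (long)off;
    }
    return -1;
}
```

In a real utility the buffer would be filled from physical memory (for example, through a driver such as ‘dram.c’), once for each of the three regions.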

MP Floating Pointer Structure
• This structure may contain an ID-number for one of a small number of standard SMP system architectures, or may contain the memory address for a more extensive MP Configuration Table having entries that specify a “customized” system architecture
• The machines in our classroom employ the latter of these two options

An example record
• The MP Configuration Table will contain a record for each logical processor
  Offset   Size   Field
  0        1      Entry Type (=0)
  1        1      Local-APIC ID
  2        1      Local-APIC version
  3        1      CPU Flags: BP (bit 1), EN (bit 0)
  4        4      CPU signature (stepping, model, family)
  8        4      Feature Flags
  12       4      reserved (=0)
  16       4      reserved (=0)
• BP = Bootstrap Processor (1=yes, 0=no), EN = Enabled (1=yes, 0=no)
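A processor record can be decoded field by field. The following is a sketch under stated assumptions: the struct and function names are hypothetical, the 20-byte record is assumed to have been copied out of the MP Configuration Table already, and the byte offsets follow the MP Specification 1.4 layout for entry type 0.

```c
#include <stdint.h>

struct mp_processor_entry {
    uint8_t  lapic_id;
    uint8_t  lapic_version;
    int      enabled;         /* EN, bit 0 of CPU Flags */
    int      bootstrap;       /* BP, bit 1 of CPU Flags */
    uint32_t cpu_signature;
    uint32_t feature_flags;
};

/* Read a 32-bit little-endian value from four consecutive bytes. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one 20-byte record. Returns 0 on success,
 * or -1 if the record is not a processor entry (type 0). */
int parse_processor_entry(const uint8_t rec[20], struct mp_processor_entry *out)
{
    if (rec[0] != 0)
        return -1;
    out->lapic_id      = rec[1];
    out->lapic_version = rec[2];
    out->enabled       = rec[3] & 1;
    out->bootstrap     = (rec[3] >> 1) & 1;
    out->cpu_signature = read_le32(rec + 4);
    out->feature_flags = read_le32(rec + 8);
    return 0;
}
```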

Our ‘mpinfo.cpp’ utility
• We created a Linux utility that will display the system-information contained in the MP Configuration Table (in hex format)
• You can refer to the ‘MP Specification 1.4’ document (online) to interpret this display
• This utility needs a device-driver ‘dram.c’ to be pre-installed (in order that it be able to directly access the system’s memory)

A processor’s Local-APIC
• The purpose of each processor’s APIC is to allow the CPUs in a multiprocessor system to send messages to one another and to manage the delivery of the interrupt-requests from the various peripheral devices to one (or more) of the CPUs in a dynamically programmable way
• Each processor’s Local-APIC has a variety of registers, all ‘memory mapped’ to paragraph-aligned addresses within the 4KB page at physical-address 0xFEE00000

Local-APIC’s register-space
[Diagram: the 4GB physical address-space, with RAM beginning at 0x00000000 and the Local-APIC’s registers at 0xFEE00000]

Analogies with the PIC
• Among the registers in a Local-APIC are these (which had analogues in the older 8259 PIC’s design):
  • IRR: Interrupt Request Register (256 bits)
  • ISR: In-Service Register (256 bits)
  • TMR: Trigger-Mode Register (256 bits)
• For each of these, its 256 bits are divided among eight 32-bit register addresses
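Finding the bit for a given interrupt-ID in one of these 256-bit registers is simple arithmetic: the ID divided by 32 selects one of the eight 32-bit registers, and the remainder selects the bit. The sketch below uses hypothetical names; the starting offsets (ISR at 0x100, TMR at 0x180, IRR at 0x200, with consecutive registers 0x10 apart) are taken from Intel's manuals, not from these slides.

```c
#include <stdint.h>

#define LAPIC_BASE 0xFEE00000u
#define LAPIC_ISR  0x100u   /* first In-Service register       */
#define LAPIC_TMR  0x180u   /* first Trigger-Mode register     */
#define LAPIC_IRR  0x200u   /* first Interrupt Request register */

/* Physical address of the 32-bit register holding the bit for 'vector'. */
uint32_t lapic_bitmap_reg(uint32_t first_offset, unsigned vector)
{
    return LAPIC_BASE + first_offset + (vector / 32) * 0x10;
}

/* Position of the bit for 'vector' within that 32-bit register. */
unsigned lapic_bitmap_bit(unsigned vector)
{
    return vector % 32;
}
```

For interrupt-ID 0x41 the In-Service bit is bit 1 of the register at 0xFEE00120.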

New way to do ‘EOI’
• Instead of using a special End-Of-Interrupt command-byte, the Local-APIC contains a dedicated ‘write-only’ register (named the EOI Register) which an Interrupt Handler writes to when it is ready to signal an EOI
# issuing EOI to the Local-APIC
        mov     $0xFEE00000, %ebx       # address of the cpu’s Local-APIC
        movl    $0, %fs:0xB0(%ebx)      # write any value into EOI register
# Here we assume segment-register FS holds the selector for a segment-descriptor
# for a ‘writable’ 4GB-size expand-up data-segment whose base-address equals 0

Each CPU has its own timer!
• Four of the Local-APIC registers are used to implement a programmable timer
• It can privately deliver a periodic interrupt (or one-shot interrupt) just to its own CPU
  • 0xFEE00320: Timer Vector register
  • 0xFEE00380: Initial Count register
  • 0xFEE00390: Current Count register
  • 0xFEE003E0: Divider Configuration register

Timer’s Local Vector Table
Register address: 0xFEE00320
  Bit 17      MODE: 0=one-shot, 1=periodic
  Bit 16      MASK: 0=unmasked, 1=masked
  Bit 12      BUSY: 0=not busy, 1=busy
  Bits 7-0    Interrupt ID-number
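The value to write into this register can be composed from its fields. A minimal sketch, with hypothetical names; the BUSY bit is reported by the hardware, so it is left out of the value being written.

```c
#include <stdint.h>

#define LVT_TIMER_PERIODIC (1u << 17)   /* MODE bit */
#define LVT_TIMER_MASKED   (1u << 16)   /* MASK bit */

/* Build a value for the timer's Local Vector Table register. */
uint32_t lvt_timer_value(uint8_t vector, int periodic, int masked)
{
    uint32_t value = vector;            /* bits 7-0: interrupt ID-number */
    if (periodic)
        value |= LVT_TIMER_PERIODIC;
    if (masked)
        value |= LVT_TIMER_MASKED;
    return value;
}
```

For instance, an unmasked periodic timer using interrupt-ID 0x20 gives the value 0x00020020.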

Timer’s ‘Divide-Configuration’
Register address: 0xFEE003E0
  Bits 31-4 and bit 2: reserved (=0)
  Divider-Value field (bits 3, 1, and 0):
    000 = divide by 2
    001 = divide by 4
    010 = divide by 8
    011 = divide by 16
    100 = divide by 32
    101 = divide by 64
    110 = divide by 128
    111 = divide by 1
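Because the three-bit field is split around bit 2, its top bit must be moved up to bit 3 when forming the register value. The sketch below (the function name is an assumption) maps a desired divisor to the value to write.

```c
/* Return the Divide-Configuration register value for 'divisor'
 * (1, 2, 4, ..., 128), or -1 if the divisor is not supported. */
int divide_config_value(unsigned divisor)
{
    /* index = Divider-Value field, as listed on the slide */
    static const unsigned divisors[8] = { 2, 4, 8, 16, 32, 64, 128, 1 };
    for (unsigned field = 0; field < 8; field++) {
        if (divisors[field] == divisor)
            /* field bit 2 goes to register bit 3; bits 1-0 stay in place */
            return (int)(((field & 4) << 1) | (field & 3));
    }
    return -1;
}
```

So divide-by-16 is written as 0x3, divide-by-32 as 0x8, and divide-by-1 as 0xB.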

Initial and Current Counts
• 0xFEE00380: Initial Count Register (read/write)
• 0xFEE00390: Current Count Register (read-only)
• When the timer is programmed for ‘periodic’ mode, the Current Count is loaded from the Initial Count register and counts down at the bus-clock rate divided by the Divide-Configuration value; it generates an interrupt when it reaches zero, and is then automatically reloaded from the Initial Count register
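The resulting interrupt rate in ‘periodic’ mode can be estimated from these quantities. This is only an approximation under stated assumptions: the bus-clock frequency is not given on these slides and must be supplied by the caller, and the function name is hypothetical.

```c
#include <stdint.h>

/* Approximate number of timer interrupts per second in 'periodic' mode.
 * 'bus_hz' is the bus-clock frequency (an assumed input), 'divisor' is
 * the divide-by value (1, 2, ..., 128). Returns 0 for invalid inputs. */
uint32_t apic_timer_hz(uint32_t bus_hz, unsigned divisor, uint32_t initial_count)
{
    if (divisor == 0 || initial_count == 0)
        return 0;
    return bus_hz / divisor / initial_count;
}
```

With an assumed 100 MHz bus clock, divide-by-16 and an Initial Count of 62500 give about 100 interrupts per second; switching to divide-by-32 halves that rate.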

Using the timer’s interrupts
• Set up your desired Initial Count value
• Select your desired Divide Configuration
• Set up the APIC-timer’s LVT register with your desired interrupt-ID number and counting mode (‘periodic’ or ‘one-shot’), and clear the LVT register’s ‘Mask’ bit to initiate the automatic countdown operation
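These three steps can be sketched as register writes in C, in the same order as listed. This is not the course's ‘apictick.s’ code: the names are hypothetical, and the caller is assumed to supply a pointer through which the Local-APIC's 4KB register page (physical address 0xFEE00000) is accessible.

```c
#include <stdint.h>

#define LAPIC_LVT_TIMER      0x320u
#define LAPIC_INITIAL_COUNT  0x380u
#define LAPIC_DIVIDE_CONFIG  0x3E0u

/* Write one 32-bit Local-APIC register, given its byte offset. */
static void lapic_write(volatile uint32_t *lapic, uint32_t offset, uint32_t value)
{
    lapic[offset / 4] = value;
}

/* Program the timer: 'lapic' points at the mapped register page,
 * 'divide_config' is the raw Divide-Configuration register value. */
void apic_timer_start(volatile uint32_t *lapic, uint32_t initial_count,
                      uint32_t divide_config, uint8_t vector, int periodic)
{
    lapic_write(lapic, LAPIC_INITIAL_COUNT, initial_count);
    lapic_write(lapic, LAPIC_DIVIDE_CONFIG, divide_config);
    /* MODE in bit 17, MASK (bit 16) left clear so interrupts are delivered */
    lapic_write(lapic, LAPIC_LVT_TIMER, (periodic ? (1u << 17) : 0u) | vector);
}
```

Passing the register page as a pointer keeps the routine independent of how the page was mapped, and lets it be tried out against an ordinary array.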

In-class exercise #1
• Run the ‘cpuid.cpp’ Linux application (on our course website) to see if the CPUs in our classroom implement HyperThreading (i.e., multiple logical processors in a CPU)
• Then run the ‘mpinfo.cpp’ application, to see if the MP Base Configuration Table has entries for more than one processor
• If both results hold true, then we can write our own multiprocessing software in H235!

In-class exercise #2
• Run the ‘apictick.s’ demo (on our CS 630 website) to observe the APIC’s ‘periodic’ interrupt-handler drawing ‘T’s onscreen
• It executes for ten milliseconds (the 8254 is used here to create that timed delay)
• Try reprogramming the APIC’s Divider Configuration register, to cut the interrupt frequency in half (or perhaps to double it)
