
Lecture 10: Hyper-Threading


ofira

Presentation Transcript


  1. Lecture 10: Hyper-Threading

  2. Intel's Hyper-Threading Technology Overview
     • Intel's Hyper-Threading Technology brings the concept of simultaneous multi-threading to the Intel Architecture.
     • Hyper-Threading Technology makes a single physical processor appear as two logical processors.
     • The physical execution resources are shared, and the architecture state is duplicated for the two logical processors.
     • From a software or architecture perspective, this means operating systems and user programs can schedule processes or threads to logical processors as they would on multiple physical processors.
     • From a microarchitecture perspective, this means that instructions from both logical processors persist and execute simultaneously on shared execution resources.

  3. Thread Level - Amdahl’s Law
     • Maximum efficiency
     • The parallel fraction limits scalability
     • Key: parallelize everything significant
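The scalability limit named on this slide can be made concrete with a small calculation. The sketch below is illustrative only: the function names `amdahl_speedup` and `amdahl_limit` are hypothetical and do not come from the lecture. It computes the standard Amdahl's law speedup 1 / ((1 - p) + p / n) for a parallel fraction p on n processors.

```c
#include <assert.h>

/* Amdahl's law: speedup on n processors when a fraction p of the work
   can run in parallel and the remaining (1 - p) stays serial. */
double amdahl_speedup(double p, int n)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(n >= 1);
    return 1.0 / ((1.0 - p) + p / (double)n);
}

/* Upper bound on the speedup as n grows without limit: 1 / (1 - p).
   Requires p < 1, since a fully parallel program has no finite bound. */
double amdahl_limit(double p)
{
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

For example, with p = 0.9 and two processors the formula gives about 1.82, and no processor count can push the speedup past 10, which is why the slide stresses parallelizing everything significant.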

  4. Brief Introduction to Threads - Multiprocessing
     • Run threads using multiple processors
     • Multithreading + Multiple Processors = Improved Performance
     [Figure labels: CPU 1; CPU 2; CPU 3]

  5. Brief Introduction to Threads - Functional Parallelism
     • Apply different operations to different data elements
     [Figure labels: Open DB’s; Concurrent Tasks; Address Book; InBox; Calendar]

  6. Brief Introduction to Threads - Data Parallelism
     • Apply the same operation to different data elements
     [Figure labels: Open File; Edit; Spell Check]
     function SpellCheck {
         loop (word = 1, words_in_file)
             compare_to_dictionary (word);
     }
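The SpellCheck pseudocode can be turned into a data-parallel sketch by giving each thread its own slice of the word list. This is a minimal illustration under several assumptions: it uses POSIX threads for portability rather than the Win32 API covered in the lecture, the dictionary contents are made up, and `NUM_THREADS`, `struct slice`, `check_slice`, and `spell_check` are hypothetical names. Only `compare_to_dictionary` echoes the slide.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define NUM_THREADS 2  /* assumption: one worker per logical processor */

/* Hypothetical stand-in dictionary; a real one would be far larger. */
static const char *const DICTIONARY[] = { "cache", "processor", "thread" };
#define DICT_SIZE (sizeof DICTIONARY / sizeof DICTIONARY[0])

/* Returns 1 if the word is in the dictionary, 0 otherwise. */
static int compare_to_dictionary(const char *word)
{
    for (size_t i = 0; i < DICT_SIZE; i++)
        if (strcmp(word, DICTIONARY[i]) == 0)
            return 1;
    return 0;
}

/* One contiguous slice of the word list, plus that slice's result. */
struct slice {
    const char *const *words;
    size_t begin, end;   /* half-open range [begin, end) */
    size_t misspelled;   /* written only by the owning thread */
};

/* Thread body: the same operation applied to a different data range. */
static void *check_slice(void *arg)
{
    struct slice *s = arg;
    s->misspelled = 0;
    for (size_t i = s->begin; i < s->end; i++)
        if (!compare_to_dictionary(s->words[i]))
            s->misspelled++;
    return NULL;
}

/* Splits the words across NUM_THREADS workers and sums their counts. */
size_t spell_check(const char *const *words, size_t count)
{
    pthread_t tid[NUM_THREADS];
    int started[NUM_THREADS];
    struct slice part[NUM_THREADS];
    size_t total = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        part[t].words = words;
        part[t].begin = count * (size_t)t / NUM_THREADS;
        part[t].end = count * (size_t)(t + 1) / NUM_THREADS;
        started[t] = pthread_create(&tid[t], NULL, check_slice, &part[t]) == 0;
        if (!started[t])
            check_slice(&part[t]);  /* fall back to checking inline */
    }
    for (int t = 0; t < NUM_THREADS; t++) {
        if (started[t])
            pthread_join(tid[t], NULL);
        total += part[t].misspelled;
    }
    return total;
}
```

Each worker writes only to its own `struct slice`, so no locking is needed; the per-slice counts are combined after the joins.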

  7. Brief Introduction to Threads - Thread Libraries: Win32* API
     • C language interfaces
     • Threads exist within a single process
     • Good for asynchronous concurrency
     • All threads are peers
     • No explicit parent-child model
     • Exception: main() thread
     • Creating Win32* threads:
       HANDLE CreateThread(
           LPSECURITY_ATTRIBUTES  ThreadAttributes,
           SIZE_T                 StackSize,
           LPTHREAD_START_ROUTINE StartAddress,
           LPVOID                 Parameter,
           DWORD                  CreationFlags,
           LPDWORD                ThreadId );
     • Functions are explicitly mapped to threads
     • The thread handle is a synchronization object
     *Other names and brands may be claimed as the property of others

  8. Pentium 4 Block diagram

  9. Execution pipeline
     • A high-level view of the microarchitecture pipeline.
     • Buffering queues separate the major pipeline logic blocks.
     • The buffering queues are either partitioned or duplicated to ensure independent forward progress through each logic block.

  10. Fetch and deliver
      • Alternate between logical processors
      • Execution trace cache / Microcode ROM
      • Fetch and decode instructions
      • Register rename and allocation
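The "alternate between logical processors" policy can be pictured as a tiny arbiter. The sketch below is a toy software model, not the hardware logic: `next_fetch_lp` and `NUM_LP` are hypothetical names, and the rule that a ready logical processor takes every slot while the other one is stalled is an assumption of this sketch rather than something stated on the slide.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_LP 2  /* two logical processors per physical processor */

/* Chooses which logical processor fetches in the next cycle.
   last:  the logical processor that fetched most recently (0 or 1)
   ready: ready[i] is true if logical processor i can fetch now
   Returns the chosen logical processor, or -1 if neither can fetch. */
int next_fetch_lp(int last, const bool ready[NUM_LP])
{
    assert(last >= 0 && last < NUM_LP);
    int other = 1 - last;

    if (ready[other])
        return other;  /* normal case: strict alternation */
    if (ready[last])
        return last;   /* assumption: the other is stalled, so keep going */
    return -1;         /* both stalled: nothing to fetch this cycle */
}
```

The same shape of arbiter would apply wherever the lecture says the two logical processors alternate, such as instruction retirement.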

  11. Execution
      • The out-of-order execution engine consists of the allocation, register renaming, scheduling, and execution functions
      • Logical processors execute simultaneously
      • Logical processors compete for schedulers and ALUs
      • Schedulers map independent instructions to available execution resources

  12. The memory subsystem
      • The memory subsystem includes:
        • The DTLB, which translates virtual addresses to physical addresses. It has 64 fully associative entries; each entry can map either a 4 KB or a 4 MB page.
          • Although the DTLB is shared between the two logical processors, each entry includes a logical processor ID tag.
          • Each logical processor also has a reservation register to ensure fairness and forward progress in processing DTLB misses.
        • The low-latency Level 1 (L1) data cache
        • The Level 2 (L2) unified cache
        • The Level 3 (L3) unified cache (available only on the Intel® Xeon™ processor MP)
      • Access to the memory subsystem is also largely oblivious to logical processors.
      • The schedulers send load or store uops without regard to logical processors, and the memory subsystem handles them as they come.
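To see what the logical processor ID tag buys, here is a toy software model of a shared, fully associative DTLB. Everything in it is an illustrative assumption: the struct layout, the 32-bit addresses, and the names `dtlb_entry`, `dtlb_fill`, and `dtlb_lookup` are hypothetical, and replacement policy and the reservation registers are left out. The point it shows is that an entry installed by one logical processor never produces a hit for the other, even though both share the same 64 entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DTLB_ENTRIES 64  /* from the slide: 64 fully associative entries */

struct dtlb_entry {
    bool     valid;
    uint8_t  lp_id;       /* logical processor ID tag: 0 or 1 */
    bool     large_page;  /* false: 4 KB page, true: 4 MB page */
    uint32_t vpn;         /* virtual page number */
    uint32_t pfn;         /* physical frame number */
};

struct dtlb {
    struct dtlb_entry e[DTLB_ENTRIES];
};

/* Number of page-offset bits: 12 for 4 KB pages, 22 for 4 MB pages. */
static unsigned page_shift(bool large_page)
{
    return large_page ? 22u : 12u;
}

/* Installs a translation in the given slot (no replacement policy here). */
void dtlb_fill(struct dtlb *t, int slot, uint8_t lp_id, bool large_page,
               uint32_t vaddr, uint32_t paddr)
{
    assert(slot >= 0 && slot < DTLB_ENTRIES);
    unsigned shift = page_shift(large_page);
    t->e[slot] = (struct dtlb_entry){ true, lp_id, large_page,
                                      vaddr >> shift, paddr >> shift };
}

/* Fully associative lookup: every entry is a candidate, but a hit needs
   a valid entry whose tag matches the requesting logical processor. */
bool dtlb_lookup(const struct dtlb *t, uint8_t lp_id, uint32_t vaddr,
                 uint32_t *paddr)
{
    for (int i = 0; i < DTLB_ENTRIES; i++) {
        const struct dtlb_entry *e = &t->e[i];
        unsigned shift = page_shift(e->large_page);
        if (e->valid && e->lp_id == lp_id && e->vpn == (vaddr >> shift)) {
            uint32_t offset_mask = (UINT32_C(1) << shift) - 1;
            *paddr = (e->pfn << shift) | (vaddr & offset_mask);
            return true;
        }
    }
    return false;  /* miss: a real processor would start a page walk */
}
```

In this model the tag check is what lets the two logical processors share one structure without duplicating it.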

  13. Instruction retirement
      • Alternate between logical processors
      • Commit state in program order

  14. Hyper-Threading implementation
      • Two logical processors for very small additional die area
      • Alternate between logical processors
        • Fetch and deliver
        • Reorder and retire
      • Competitive sharing between logical processors
        • Rapid execution engine
        • Caches

  15. OS support
      • From the OS point of view, HT is just like multiprocessing
      • Needs BIOS support for initialization
      • HLT instruction (for idle)
        • The HLT instruction stops instruction execution and places the processor in the HALT state. An enabled interrupt, NMI, or reset resumes execution, which continues at the instruction following HLT.
        • There are two modes of operation, referred to as single-task (ST) and multi-task (MT).
        • On a processor with Hyper-Threading Technology, executing HLT transitions the processor from MT-mode to ST0- or ST1-mode, depending on which logical processor executed the HLT.
      • Spin loops:
        • Use the new PAUSE instruction
        • For long waits, use an OS call
        • Wait on an object
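The spin-loop advice can be sketched as a bounded spin-wait. This is a minimal illustration under stated assumptions: `struct spin_lock`, `cpu_relax`, `SPIN_LIMIT`, and the function names are hypothetical, the spin budget is arbitrary, and the lock itself uses C11 atomics rather than anything from the lecture. On x86 compilers the `_mm_pause()` intrinsic from `<immintrin.h>` emits the PAUSE instruction; on other targets the sketch falls back to a no-op.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define cpu_relax() _mm_pause()  /* emits the PAUSE instruction */
#else
#define cpu_relax() ((void)0)    /* assumption: no-op on non-x86 targets */
#endif

#define SPIN_LIMIT 4000  /* hypothetical spin budget before giving up */

struct spin_lock {
    atomic_flag held;
};

/* Single attempt: returns true if the lock was free and is now ours. */
bool spin_try_lock(struct spin_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spins with PAUSE for a bounded number of attempts. Returns false if
   the budget runs out, so the caller can block in the OS instead. */
bool spin_lock_bounded(struct spin_lock *l)
{
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (spin_try_lock(l))
            return true;
        cpu_relax();  /* yield shared execution resources while waiting */
    }
    return false;
}

void spin_unlock(struct spin_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

When `spin_lock_bounded` returns false, the caller would fall back to a blocking OS wait (on Win32, for example, `WaitForSingleObject` on an event or mutex handle), which matches the slide's guidance for long waits.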
