Computer Architecture, Chapter 1: Computer Abstractions and Technology. Yu-Lun Kuo (郭育倫), Department of Computer Science and Information Engineering, Tunghai University, Taichung, Taiwan R.O.C. sscc6991@gmail.com http://www.csie.ntu.edu.tw/~d95037/
This book • http://www.elsevierdirect.com/product.jsp?isbn=9780123744937
Related Courses • Parallel & Advanced Computer Architecture: parallel architectures, hardware-software interactions • System Optimization • Computer Organization: how to build it, implementation details • Computer Architecture: why, analysis, evaluation • Hardware-Software Co-design: how to make embedded systems better • Software: OS, programming languages, system programming • Embedded Systems Software: RTOS, toolchains, I/O & device drivers, compilers • Special Topics on Computer Performance Optimization: performance tools, performance skills, compiler optimization tricks
Computer Architecture and Organization • Architecture refers to the attributes visible to the programmer • Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques • e.g., Is there a multiply instruction? • Organization refers to how those features are implemented • Control signals, interfaces, memory technology • e.g., Is there a hardware multiply unit, or is multiplication done by repeated addition?
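The multiply example can be made concrete with a small sketch. Both functions below compute the same architecturally visible operation: an integer multiply. The first assumes a hardware multiply unit, modeled here by Python's `*` operator. The second assumes there is no multiplier and uses repeated addition. The function names are hypothetical. A program gets the same result either way, which is what the architecture guarantees. Only the speed and hardware cost differ, and those are matters of organization.

```python
# Hypothetical helper names; an illustrative sketch, not a real hardware model.

def multiply_with_hw_unit(a: int, b: int) -> int:
    """Organization A: a dedicated multiplier (modeled by Python's * operator)."""
    return a * b


def multiply_by_repeated_addition(a: int, b: int) -> int:
    """Organization B: no multiplier, so add `a` to a running total |b| times."""
    if b < 0:
        # Reduce a negative multiplier to the positive case: a * b == -(a * -b)
        return -multiply_by_repeated_addition(a, -b)
    total = 0
    for _ in range(b):  # b iterations: the cost grows with the multiplier's value
        total += a
    return total


# Same architecturally visible result, different organization underneath.
assert multiply_with_hw_unit(7, 6) == multiply_by_repeated_addition(7, 6) == 42
```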
Computer Architecture and Organization • All members of the Intel x86 family share the same basic architecture • All members of the IBM System/370 family share the same basic architecture • This gives code compatibility • At least backward compatibility • Organization differs between different versions
Classes of Computing Applications (1/2) • Desktop computers • Emphasize delivering good performance to a single user at low cost • Price-performance, graphics performance • Intel, AMD, Apple, Microsoft, Linux • Servers • Usually accessed only via a network • Provide greater expandability of both computing and input/output capacity • Availability, scalability, throughput • IBM, HP-Compaq, Sun, Intel, Microsoft, Linux
Classes of Computing Applications (2/2) • Supercomputers • Consist of hundreds to thousands of processors • Usually gigabytes to terabytes of memory • Terabytes to petabytes of storage • Cost millions to hundreds of millions of dollars • Embedded computers • A computer inside another device • Includes the microprocessors found in washing machines, cars, cell phones, video games, PDAs, and digital TVs
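Where is the Market? [Chart axis: millions of computers] Figure 1.1 The number of processors of different kinds sold from 1998 to 2002. These counts were obtained somewhat differently, so some caution is required when interpreting the results. For example, the totals for desktops and servers count complete computer systems. Because some of these systems contain multiple processors, the number of processors sold is somewhat higher, but probably by only 10 to 20% in total. Servers may average more than one processor per system, but they amount to only about 3% of desktop sales, and desktops are predominantly single-processor systems. The totals for embedded computers actually count processors. Many embedded processors are not even visible, and in some cases a single device contains multiple processors.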
Instruction Set Architecture (ISA) • ISA: An abstract interface between the hardware and the lowest-level software of a machine that encompasses all the information necessary to write a machine language program that will run correctly, including instructions, registers, memory access, I/O, and so on. “... the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.” – Amdahl, Blaauw, and Brooks, 1964
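[Chart axis: millions of processors] Figure 1.2 Processors sold between 1998 and 2002, by instruction set architecture. The “Other” category refers to application-specific or customized processors. In the case of ARM, about 80% of sales go into cell phones, where an ARM core is combined with application-specific logic on a single chip.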
Hierarchical Layers • System software • Sits between the hardware and the application software • Includes operating systems, compilers, and assemblers
Compilers & Assemblers • Compilers • Translate a program written in a high-level language, such as C or Java, into instructions that the hardware can execute • Assemblers • Translate a symbolic version of an instruction into its binary version • Assembly language • A symbolic representation of machine instructions
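To illustrate what an assembler does, the sketch below translates one symbolic MIPS R-type instruction into its 32-bit binary form. It uses the R-format fields op, rs, rt, rd, shamt, and funct. It handles only a handful of instructions and registers. The function name and lookup tables are illustrative assumptions, not a complete assembler.

```python
# Hypothetical mini-assembler: a sketch covering only a few MIPS R-type instructions.

# MIPS register numbers (subset): $t0-$t3 are 8-11, $s0-$s3 are 16-19.
REGISTERS = {
    "$zero": 0,
    "$t0": 8, "$t1": 9, "$t2": 10, "$t3": 11,
    "$s0": 16, "$s1": 17, "$s2": 18, "$s3": 19,
}

# funct field for some R-type instructions (the opcode field is 0 for all of them).
FUNCT = {"add": 0x20, "sub": 0x22, "and": 0x24, "or": 0x25, "slt": 0x2A}


def assemble_rtype(line: str) -> int:
    """Translate e.g. 'add $t0, $s1, $s2' (rd, rs, rt) into a 32-bit machine word."""
    mnemonic, operands = line.split(maxsplit=1)
    rd, rs, rt = (REGISTERS[r.strip()] for r in operands.split(","))
    opcode, shamt = 0, 0
    # Field layout: op(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | FUNCT[mnemonic]


print(f"{assemble_rtype('add $t0, $s1, $s2'):032b}")  # prints 00000010001100100100000000100000
```

The printed word is 0x02324020 in hexadecimal. A real assembler would also handle labels, immediates, and the I- and J-type formats.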
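[Figure: high-level language program (in C) → compiler → assembly language program (for MIPS) → assembler → binary machine language program (for MIPS)] Figure 1.4 A C program compiled into assembly language and then assembled into binary machine language. Although the translation from high-level language to binary machine language is shown in two steps, some compilers skip the intermediate step and produce binary machine language directly. These languages and this program are examined in more detail in Chapter 2.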
What is “Computer Architecture”? [Figure: levels of abstraction, from top to bottom: Applications; Operating System; Compiler; Firmware; Instruction Set Architecture (ISA); Instr. Set Proc. and I/O system; Datapath & Control; Digital Design; Circuit Design; Layout & fab; Semiconductor Materials] • Coordination of many levels of abstraction • Under a rapidly changing set of forces • Design, Measurement, and Evaluation
Registers vs. Memory • In MIPS, the operands of arithmetic instructions must be registers • Only 32 registers are provided • The compiler associates variables with registers • What about programs with lots of variables?
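One common answer to the last question is that the compiler keeps as many variables as it can in registers. It places the rest in memory and moves them in and out with loads and stores, which is called spilling. The sketch below is deliberately naive. It assumes every register is free for variables and assigns registers in order of appearance. Real compilers reserve some registers and choose which variables to keep in registers based on how often they are used. The `r<i>` register names and the `mem[sp+offset]` notation are hypothetical.

```python
from typing import Dict, List

# Hypothetical, deliberately naive allocator: an illustration of spilling, not a real algorithm.


def assign_locations(variables: List[str], num_registers: int = 32) -> Dict[str, str]:
    """Map the first `num_registers` variables to registers; spill the rest to stack slots."""
    locations: Dict[str, str] = {}
    for i, name in enumerate(variables):
        if i < num_registers:
            locations[name] = f"r{i}"                # hypothetical register name
        else:
            offset = 4 * (i - num_registers)          # one 32-bit word per spilled variable
            locations[name] = f"mem[sp+{offset}]"     # hypothetical memory-slot notation
    return locations


print(assign_locations(["a", "b", "c", "d"], num_registers=2))
```

With only two registers, `a` and `b` land in `r0` and `r1`. `c` and `d` are spilled to `mem[sp+0]` and `mem[sp+4]`.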
Impacts of Advancing Technology • Processor • Logic capacity: increases about 30% per year • Performance: 2x every 1.5 years • ClockCycle = 1/ClockRate • 500 MHz ClockRate = 2 nsec ClockCycle • 1 GHz ClockRate = 1 nsec ClockCycle • 4 GHz ClockRate = 250 psec ClockCycle
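The relation ClockCycle = 1/ClockRate can be checked with two tiny helpers. Their names are hypothetical. The loop converts the three clock rates from the slide into cycle times in nanoseconds.

```python
# Hypothetical helper names; a minimal sketch of ClockCycle = 1 / ClockRate.


def clock_cycle_time(clock_rate_hz: float) -> float:
    """ClockCycle = 1 / ClockRate, in seconds per cycle."""
    return 1.0 / clock_rate_hz


def clock_rate(cycle_time_s: float) -> float:
    """ClockRate = 1 / ClockCycle, in cycles per second (Hz)."""
    return 1.0 / cycle_time_s


# The three examples from the slide, converted to nanoseconds.
for rate_hz in (500e6, 1e9, 4e9):
    print(f"{rate_hz / 1e6:.0f} MHz -> {clock_cycle_time(rate_hz) * 1e9:.2f} ns")
```

This prints 2.00 ns, 1.00 ns, and 0.25 ns. The last value is the 250 psec cycle quoted for a 4 GHz clock.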
Impacts of Advancing Technology • Memory • DRAM capacity: 4x every 3 years, now 2x every 2 years • memory speed: 1.5x every 10 years • cost per bit: decreases about 25% per year • Disk • capacity: increases about 60% per year
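Rules such as "4x every 3 years" and "2x every 2 years" are easier to compare once converted to a compound annual rate, using rate = factor^(1/years) - 1. The sketch below uses two hypothetical helper names and assumes smooth exponential growth.

```python
# Hypothetical helper names; assumes smooth compound (exponential) growth.


def annual_rate(factor: float, years: float) -> float:
    """Turn 'grows by `factor` every `years` years' into a compound annual rate."""
    return factor ** (1.0 / years) - 1.0


def total_change(rate_per_year: float, years: float) -> float:
    """Overall multiplier after `years` years at a compound annual rate (negative = decline)."""
    return (1.0 + rate_per_year) ** years


print(f"DRAM, 4x every 3 years:   {annual_rate(4, 3):.0%} per year")
print(f"DRAM, 2x every 2 years:   {annual_rate(2, 2):.0%} per year")
print(f"CPU,  2x every 1.5 years: {annual_rate(2, 1.5):.0%} per year")
print(f"Cost/bit after 3 years at -25%/year: {total_change(-0.25, 3):.2f}x")
```

The output is 59%, 41%, and 59% per year, and 0.42x for cost per bit. Note that "4x every 3 years" and "2x every 1.5 years" are the same rate.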
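Figure 1.6 A desktop computer. The liquid crystal display (LCD) screen is the primary output device, and the keyboard and mouse are the primary input devices. The case contains the processor and additional I/O devices. This is a Dell Optiplex GX260 system.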
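[Figure labels: DVD drive, power supply, fan with shroud, motherboard, hard drive, Zip drive] Figure 1.8 Inside the personal computer of Figure 1.6 on page 15. This packaging is sometimes called a clamshell because of the way it opens, with hinges on one side. To see what is inside, start at the top left. The metal box in the top-left corner is the power supply, and below it is a fan with its shroud. Below and to the right of the fan is a printed circuit (PC) board, called the motherboard in a computer, which contains most of the computer's electronic components. Figure 1.10 is a close-up of such a board. The processor is the large raised rectangle to the right of the fan. On the right-hand side are the bays that hold the drives: the DVD drive at the top, the Zip drive in the middle, and the hard disk at the bottom.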
Example Machine Organization • Workstation design target • 25% of cost on processor • 25% of cost on memory (minimum memory size) • Rest on I/O devices, power supplies, box • [Figure: Computer = CPU (Control, Datapath) + Memory + Devices (Input, Output)]
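[Figure labels: Compiler, Interface, Computer, Input, Control, Datapath, Evaluating performance, Output, Processor, Memory] Figure 1.5 The organization of a computer, showing the five classic components. The processor fetches instructions and data from memory. Input devices write data into memory, and output devices read data from memory. Control sends the signals that determine the operations of the datapath, memory, input, and output.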
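The roles in Figure 1.5 can be sketched as a toy program. Everything about this machine is hypothetical: its five opcodes, the single accumulator, and the choice of address 16 as the input area. It only illustrates the roles the caption describes. Instructions and data both live in memory. Input writes into memory, and output reads from it. A control step decodes each instruction to decide what the datapath does.

```python
from typing import List, Tuple

# Hypothetical toy machine: illustrates the five classic components, not a real ISA.
Instruction = Tuple[str, int]  # (opcode, memory address)
INPUT_BASE = 16                # hypothetical: where the input device deposits data


def run(program: List[Instruction], inputs: List[int], mem_size: int = 32) -> List[int]:
    memory: list = [0] * mem_size
    memory[: len(program)] = program        # Memory holds the instructions ...
    for i, value in enumerate(inputs):      # ... and Input writes data into memory
        memory[INPUT_BASE + i] = value

    pc, acc = 0, 0                          # Datapath state: program counter, accumulator
    output: List[int] = []
    while True:
        opcode, addr = memory[pc]           # Processor fetches the next instruction
        pc += 1
        # Control: decode the opcode and signal the datapath, memory, or output
        if opcode == "LOAD":
            acc = memory[addr]
        elif opcode == "ADD":
            acc += memory[addr]
        elif opcode == "STORE":
            memory[addr] = acc
        elif opcode == "OUT":
            output.append(memory[addr])     # Output reads data from memory
        elif opcode == "HALT":
            return output
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


# Add the two input values and send the sum to the output device.
prog = [("LOAD", 16), ("ADD", 17), ("STORE", 18), ("OUT", 18), ("HALT", 0)]
print(run(prog, [2, 3]))  # prints [5]
```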
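[Figure labels: Control (four blocks), I/O interface, Other interface logic, Instruction cache, Data cache, Enhanced floating point and multimedia unit, Second-level cache and memory interface, Advanced pipelining and multithreading support] Figure 1.9 Inside the processor chip used on the board shown in Figure 1.8. The left-hand side is a photomicrograph of the Pentium 4 processor chip, and the right-hand side shows the major blocks in the processor.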
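[Figure labels: Processor, Memory, Disk and USB interfaces, Processor interface, Graphics card, I/O devices, Bus slots] Figure 1.10 Close-up of a PC motherboard. This board uses the Intel Pentium 4 processor, located in the upper-left corner of the board. The processor is covered by a set of metal fins resembling a radiator. This is the heat sink, which helps the chip dissipate heat. The main memory consists of one or more small boards that plug vertically into the motherboard near the middle. The DRAM chips are mounted on these boards, called dual inline memory modules (DIMMs), which then plug into connectors. Much of the rest of the board consists of connectors for external I/O devices: audio/MIDI and parallel/serial ports at the right edge, two Peripheral Component Interconnect (PCI) card slots near the bottom, and an advanced technology attachment (ATA) connector for the hard disk.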
Safe Place for Data • Memory • Primary memory (main memory) • Volatile: loses its contents when it loses power • Secondary memory • Nonvolatile memory • Magnetic disk (hard disk) • Floppy disks • Optical disks • CDs, DVDs, HD DVDs, Blu-ray Discs (BD) • Flash-based removable memory
Transistor counts of PC processors (approximate) • 1972 – 4004 - 2000 transistors • 1974 – 8080 - 7000 transistors • 1978 – 8086 - 50,000 transistors • 1982 – 286 - 200,000 transistors • 1985 – 386 - 500,000 transistors • 1987 – 486 - 1 million transistors • 1992 – Pentium - 5 million transistors • 1995 – Pentium II - 7 million transistors • 1999 – Pentium III - 10 million transistors
Moore’s Law • In 1965, Gordon Moore predicted that the number of transistors that can be integrated on a die would double every 18 to 24 months (i.e., grow exponentially with time). • Amazingly visionary: the million-transistors-per-chip barrier was crossed in the 1980s. • 2300 transistors, 1 MHz clock (Intel 4004) - 1971 • 16 million transistors (UltraSPARC III) • 42 million transistors, 2 GHz clock (Intel Xeon) - 2001 • 55 million transistors, 3 GHz, 130 nm technology, 250 mm² die (Intel Pentium 4) - 2004 • 140 million transistors (HP PA-8500)
Moore’s Law • “Cramming More Components onto Integrated Circuits” • Gordon Moore, Electronics, 1965 • The number of transistors on a cost-effective integrated circuit doubles every 18 months
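The doubling rule can be turned into a simple projection. The helper below is hypothetical and assumes a clean exponential from one starting point. It uses the Intel 4004 figures quoted in the Moore's Law slide (2300 transistors, 1971).

```python
# Hypothetical helper; assumes a clean exponential from a single starting point.


def projected_transistors(base_count: float, base_year: float, year: float,
                          doubling_months: float = 24.0) -> float:
    """Transistor count after doubling every `doubling_months` months since `base_year`."""
    doublings = (year - base_year) * 12.0 / doubling_months
    return base_count * 2.0 ** doublings


for months in (24, 18):
    n = projected_transistors(2300, 1971, 2001, doubling_months=months)
    print(f"doubling every {months} months -> {n:,.0f} transistors in 2001")
```

The 24-month period projects 75,366,400 transistors for 2001. The 18-month period projects 2,411,724,800. The Moore's Law slide lists 42 million for the 2001 Intel Xeon. Over three decades, the assumed doubling period changes the answer by more than an order of magnitude.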
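[Figure: silicon ingot → slicer → blank wafers → 20 to 40 processing steps → patterned wafers → wafer tester → tested wafer → dicer → tested dies → bond die to package → packaged dies → part tester → tested packaged dies → ship to customers] Figure 1.14 The chip manufacturing process. After being sliced from the silicon ingot, blank wafers are put through 20 to 40 processing steps to create patterned wafers (see Figure 1.15 on page 28). The patterned wafers are then tested with a wafer tester, and a map of the good parts is made. Next, the wafers are diced into dies (see Figure 1.9 on page 19). In this figure, one wafer produced 20 dies, of which 17 passed testing (X marks a bad die). The yield of good dies in this case is 17/20, or 85%. The good dies are then bonded into packages and tested one more time before being sold to customers. In this example, one packaged part failed the final test.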
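The arithmetic in the Figure 1.14 caption can be written as a small helper. Its name and signature are hypothetical. It reproduces the 85% die yield and counts how many parts remain after the final packaged-part test.

```python
# Hypothetical helper; reproduces the arithmetic from the Figure 1.14 caption.


def wafer_test_summary(total_dies: int, good_dies: int, failed_after_packaging: int):
    """Return (die yield, number of parts shipped) for one wafer."""
    die_yield = good_dies / total_dies               # fraction of dies that passed wafer test
    shipped = good_dies - failed_after_packaging     # good dies minus final-test failures
    return die_yield, shipped


yield_, shipped = wafer_test_summary(total_dies=20, good_dies=17, failed_after_packaging=1)
print(f"yield = {yield_:.0%}, shipped = {shipped} of 20 dies")  # yield = 85%, shipped = 16 of 20 dies
```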
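Figure 1.15 An 8-inch (200 mm) wafer containing Intel Pentium 4 processors. At 100% yield, the wafer holds 165 Pentium 4 dies; Figure 1.9 on page 19 is a photomicrograph of one of them. Each die has an area of 250 mm² and contains 55 million transistors. The die uses a 0.18-micron process, meaning that the smallest transistors are about 0.18 microns in size. In practice, they are usually somewhat smaller than the nominal feature size, which refers to transistor size as drawn rather than as finally manufactured. The Pentium 4 is also made in a more advanced 0.13-micron process. The several dozen partially formed dies around the edge of the wafer are useless. They are fabricated anyway because doing so makes it easier to create the masks used to pattern the wafer.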
Figure 1.16 An Intel Pentium 4 (3.06 GHz) chip mounted on its heat sink, which must dissipate the 82 watts of heat the chip produces.
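[Table columns: Year, Technology used in computers, Relative performance/unit cost] • Vacuum tube: 1 • Transistor: 35 • Integrated circuit: 900 • Very large-scale integrated circuit: 2,400,000 • 2005, ultra large-scale integrated circuit: 6,200,000,000 • Figure 1.12 Relative performance per unit cost of technologies used in computers over time. Source: Computer Museum, Boston, with 2005 extrapolated by the authors.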
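[Chart axis: Performance] Figure 1.17 Growth in workstation performance, 1978 to 2003. Performance is shown as a multiple of the VAX-11/780, a commonly used yardstick. Performance improved by a factor of 1.5 to 1.6 per year. The performance numbers are based on SPECint (see Chapter 2), adjusted over time to account for changes in the benchmark programs. For the x/y listed after each processor name, x is the model number and y is the clock speed in MHz.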
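Expressing performance as "N times a VAX-11/780" amounts to a ratio for each program: the reference machine's execution time divided by the measured machine's time. SPEC's convention is to summarize such per-program ratios with a geometric mean. The sketch below follows that convention. The function name and the timings in the example are made up for illustration and do not reproduce the figure's data.

```python
import math
from typing import Sequence

# Hypothetical helper; the example timings are invented, not taken from Figure 1.17.


def relative_performance(reference_times: Sequence[float], times: Sequence[float]) -> float:
    """Geometric mean of per-program speedups (reference time / measured time)."""
    if not times or len(reference_times) != len(times):
        raise ValueError("need two non-empty sequences of equal length")
    log_sum = sum(math.log(ref / t) for ref, t in zip(reference_times, times))
    return math.exp(log_sum / len(times))


# Hypothetical times (seconds) for three programs on the reference machine and a newer one.
print(f"{relative_performance([100.0, 200.0, 400.0], [10.0, 10.0, 10.0]):.1f}x")
```

The per-program speedups here are 10, 20, and 40, so the summary is 20.0x. A geometric mean keeps the summary independent of which machine is chosen as the reference.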
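[Chart axes: capacity in Kbits vs. year of introduction] Figure 1.13 Growth in DRAM chip capacity over time. The y-axis is measured in Kbits, where K = 1024. For twenty years, the DRAM industry quadrupled capacity almost every three years, equivalent to about 60% per year. This “four times every three years” estimate became known as the DRAM growth rule. In recent years the growth rate has slowed, coming closer to doubling every two years, or quadrupling every four years.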
Disks: Archaic (Nostalgic) vs. Modern (Newfangled) • CDC Wren I, 1983 • 3600 RPM • 0.03 GBytes capacity • Tracks/Inch: 800 • Bits/Inch: 9550 • Three 5.25” platters • Bandwidth: 0.6 MBytes/sec • Latency: 48.3 ms • Cache: none • Seagate 373453, 2003 • 15000 RPM (4X) • 73.4 GBytes (2500X) • Tracks/Inch: 64000 (80X) • Bits/Inch: 533,000 (60X) • Four 2.5” platters (in 3.5” form factor) • Bandwidth: 86 MBytes/sec (140X) • Latency: 5.7 ms (8X) • Cache: 8 MBytes
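Part of a disk's latency comes directly from its rotation speed. On average, the requested sector is half a revolution away. The hypothetical helper below computes that average rotational delay for the two drives above. The slide's latency figures (48.3 ms and 5.7 ms) are larger, presumably because they also include seek time. Only the rotational part scales directly with RPM.

```python
# Hypothetical helper; computes only the rotational component of disk latency.


def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational delay: on average the target sector is half a revolution away."""
    seconds_per_revolution = 60.0 / rpm
    return 0.5 * seconds_per_revolution * 1000.0


for name, rpm in (("CDC Wren I (1983)", 3600), ("Seagate 373453 (2003)", 15000)):
    print(f"{name}: {avg_rotational_latency_ms(rpm):.1f} ms")
```

This prints 8.3 ms and 2.0 ms, about a 4x improvement, matching the 4X noted for RPM.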
Performance Milestones • Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x latency, 143x bandwidth) • Latency Lags Bandwidth (for last ~20 years) • (latency = simple operation w/o contention; BW = best-case)
Memory: Archaic (Nostalgic) vs. Modern (Newfangled) • 1980 DRAM (asynchronous) • 0.06 Mbits/chip • 64,000 xtors, 35 mm² • 16-bit data bus per module, 16 pins/chip • 13 Mbytes/sec • Latency: 225 ns • (no block transfer) • 2000 Double Data Rate Synchronous (clocked) DRAM • 256.00 Mbits/chip (4000X) • 256,000,000 xtors, 204 mm² • 64-bit data bus per DIMM, 66 pins/chip (4X) • 1600 Mbytes/sec (120X) • Latency: 52 ns (4X) • Block transfers (page mode)
Performance Milestones • Memory Module: 16bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x) • Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x) • Latency Lags Bandwidth (last ~20 years) • (latency = simple operation w/o contention; BW = best-case)
LANs: Archaic (Nostalgic) vs. Modern (Newfangled) • Ethernet 802.3 • Year of Standard: 1978 • 10 Mbits/s link speed • Latency: 3000 μsec • Shared media • Coaxial cable • Ethernet 802.3ae • Year of Standard: 2003 • 10,000 Mbits/s (1000X) link speed • Latency: 190 μsec (15X) • Switched media • Category 5 copper wire • [Figure: coaxial cable cross-section (plastic covering, braided outer conductor, insulator, copper core); twisted pair: copper, 1 mm thick, twisted to avoid antenna effect; “Cat 5” is 4 twisted pairs in a bundle]
Performance Milestones • Ethernet: 10Mb, 100Mb, 1000Mb, 10000 Mb/s (16x, 1000x) • Memory Module: 16bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x) • Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x) • Latency Lags Bandwidth (last ~20 years) • (latency = simple operation w/o contention; BW = best-case)
CPUs: Archaic (Nostalgic) vs. Modern (Newfangled) • 1982 Intel 80286 • 12.5 MHz • 2 MIPS (peak) • Latency 320 ns • 134,000 xtors, 47 mm² • 16-bit data bus, 68 pins • Microcode interpreter, separate FPU chip • (no caches) • 2001 Intel Pentium 4 • 1500 MHz (120X) • 4500 MIPS (peak) (2250X) • Latency 15 ns (20X) • 42,000,000 xtors, 217 mm² • 64-bit data bus, 423 pins • 3-way superscalar, dynamic translation to RISC, superpipelined (22 stages), out-of-order execution • On-chip 8 KB data cache, 96 KB instruction trace cache, 256 KB L2 cache
Performance Milestones • Processor: ‘286, ‘386, ‘486, Pentium, Pentium Pro, Pentium 4 (21x, 2250x) • Ethernet: 10Mb, 100Mb, 1000Mb, 10000 Mb/s (16x, 1000x) • Memory Module: 16bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x) • Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x) • CPU high, memory low (“Memory Wall”) • Latency Lags Bandwidth (last ~20 years)
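The paired factors in these milestone lists, such as (8x, 143x) for disks, can be recomputed from the "Archaic vs. Modern" slides. Divide the old latency by the new one, and the new bandwidth by the old one. The sketch below does this for all four technologies; the dictionary layout and function name are illustrative choices. In every row the bandwidth gain is far larger than the latency gain, which is what "latency lags bandwidth" means.

```python
# (old, new) pairs copied from the "Archaic vs. Modern" slides; units are as on the slides,
# and the resulting improvement factors are unitless. Layout and names are illustrative.
# Latency: smaller is better. Bandwidth (MIPS for the processor): larger is better.
MILESTONES = {
    "Processor": {"latency": (320, 15), "bandwidth": (2, 4500)},
    "Ethernet": {"latency": (3000, 190), "bandwidth": (10, 10_000)},
    "Memory": {"latency": (225, 52), "bandwidth": (13, 1600)},
    "Disk": {"latency": (48.3, 5.7), "bandwidth": (0.6, 86)},
}


def improvement(entry):
    """Return (latency improvement, bandwidth improvement) as 'x times better' factors."""
    old_lat, new_lat = entry["latency"]
    old_bw, new_bw = entry["bandwidth"]
    return old_lat / new_lat, new_bw / old_bw


for name, entry in MILESTONES.items():
    lat_gain, bw_gain = improvement(entry)
    print(f"{name:10} latency {lat_gain:5.1f}x   bandwidth {bw_gain:7.1f}x")
```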
Computing Devices Then… EDSAC, University of Cambridge, UK, 1949
Computing Devices Now • Sensor nets • Cameras • Games • Set-top boxes • Media players • Laptops • Servers • Robots • Routers • Smart phones • Automobiles • Supercomputers • CS152-Spring’08