
Computer Architecture Research Overview Focus on: Transactional Memory Rajeev Balasubramonian



  1. Computer Architecture Research Overview Focus on: Transactional Memory Rajeev Balasubramonian School of Computing, University of Utah http://www.cs.utah.edu/~rajeev

  2. What is Computer Architecture? • To a large extent, computer architecture determines: • the number of instructions used to execute a program • the time each instruction takes to execute • the idle cycles when no work gets done • the number of instructions that can execute in parallel

  3. The Best Chip in 2004 [Figure: chip layout with a P4-like core and a 2MB L2 cache; year labels 2004 and 2010]

  4. The Advent of Multi-Core Chips [Figure: multi-core chip; labels: Core, Cache bank] • In the past, performance magically increased by 50% every year • In the future, this improvement will be only ~20% every year • … unless … the application is multi-threaded!
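Compounding makes the gap concrete. The growth rates below are the slide's; the five-year horizon is only an illustrative choice:

```latex
\underbrace{1.5^{5}}_{\text{50\%/year}} \approx 7.6\times
\qquad\text{vs.}\qquad
\underbrace{1.2^{5}}_{\text{20\%/year}} \approx 2.5\times
```

Over the same span, single-threaded code would get roughly a third of the improvement it used to.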

  5. Upcoming Architecture Challenges • Improving single core performance • Functionalities in multi-core chips • Simplifying the programmer’s task • Efficient interconnects and on-chip communication • Power and temperature-efficient designs • Designs tolerant of errors For publications, see http://www.cs.utah.edu/~rajeev/research.html

  6. Multi-Threaded Applications • Parallel or multi-threaded applications are difficult to write: they need lots of coordination and data exchange between threads (referred to as synchronization) • Example: Banking Database. Alice & Bob's joint account holds $1000; at ATM 1, Alice deposits $100; at ATM 2, Bob deposits $100

  7. Multi-Threaded Applications • Banking Database: Alice & Bob's joint account holds $1000; ATM 1 (Alice) deposits $100 while ATM 2 (Bob) deposits $100 • ATM 1: read balance ($1000); ATM 2: read balance ($1000) • ATM 1: update balance ($1100); ATM 2: update balance ($1100) • ATM 1: write balance ($1100); ATM 2: write balance ($1100) • The final balance is $1100 instead of $1200: one deposit is lost
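The lost update can be reproduced deterministically by replaying the slide's interleaving step by step, without real threads. The function `replay_interleaving` and the `(atm, step)` schedule format are hypothetical, chosen only for this sketch.

```python
def replay_interleaving(start, deposits, schedule):
    """Replay read / update / write steps on a shared balance in a fixed order.

    schedule is a list of (atm, step) pairs, step in {"read", "update", "write"}.
    """
    balance = start
    local = {}  # each ATM's private copy of the balance
    for atm, step in schedule:
        if step == "read":
            local[atm] = balance
        elif step == "update":
            local[atm] += deposits[atm]
        elif step == "write":
            balance = local[atm]
    return balance


deposits = {"ATM1": 100, "ATM2": 100}

# The interleaving on the slide: both ATMs read before either writes.
racy = [("ATM1", "read"), ("ATM2", "read"),
        ("ATM1", "update"), ("ATM2", "update"),
        ("ATM1", "write"), ("ATM2", "write")]

# One ATM finishes completely before the other starts.
serial = [("ATM1", "read"), ("ATM1", "update"), ("ATM1", "write"),
          ("ATM2", "read"), ("ATM2", "update"), ("ATM2", "write")]

print(replay_interleaving(1000, deposits, racy))    # 1100: ATM 1's write is overwritten
print(replay_interleaving(1000, deposits, serial))  # 1200
```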

  8. Synchronization with Locks • Each snippet executes atomically, as if it is the only process in the system • Bank: lock(L1); read balance; calculate interest; update balance; unlock(L1); • ATM-withdraw: lock(L1); read balance; decrement; update balance; unlock(L1); • ATM-deposit: lock(L1); read balance; increment; update balance; unlock(L1);
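The ATM-deposit snippet maps directly onto a Python `threading.Lock`. The minimal sketch below assumes a single global balance guarded by one lock `L1`, as on the slide. With the lock held around the whole read-increment-write sequence, the two deposits can no longer interleave.

```python
import threading

balance = 1000
L1 = threading.Lock()  # one lock guarding the shared balance


def atm_deposit(amount):
    global balance
    with L1:                 # lock(L1)
        current = balance    # read balance
        current += amount    # increment
        balance = current    # update balance
                             # unlock(L1) happens when the with-block exits


alice = threading.Thread(target=atm_deposit, args=(100,))
bob = threading.Thread(target=atm_deposit, args=(100,))
alice.start()
bob.start()
alice.join()
bob.join()
print(balance)  # 1200
```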

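  9. Problems with Locks • Deadlocks! • Thread 1: lock(L1); lock(L2); … unlock(L2); unlock(L1); • Thread 2: lock(L2); lock(L1); … unlock(L1); unlock(L2); • If thread 1 acquires L1 and thread 2 acquires L2, each then waits forever for the lock the other holds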
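The sketch below forces the deadlocking interleaving so that it is repeatable. Two barriers make each thread hold its first lock before requesting the second. A timeout on the second `acquire` lets the program observe the circular wait instead of hanging. It then shows the standard remedy of acquiring locks in one fixed global order. The ascending node order in Algorithm 1 later in this deck relies on the same idea. All names are illustrative.

```python
import threading

L1 = threading.Lock()
L2 = threading.Lock()

# Barriers force the unlucky interleaving so the demo is repeatable.
hold_first = threading.Barrier(2)
tried_second = threading.Barrier(2)
got_second = {}


def worker(name, first, second):
    with first:                            # lock(first)
        hold_first.wait()                  # both threads now hold one lock each
        # A plain acquire() would block forever here; the timeout exposes it.
        ok = second.acquire(timeout=0.2)
        got_second[name] = ok
        tried_second.wait()                # nobody releases until both have tried
        if ok:
            second.release()


t1 = threading.Thread(target=worker, args=("thread 1", L1, L2))  # lock(L1); lock(L2)
t2 = threading.Thread(target=worker, args=("thread 2", L2, L1))  # lock(L2); lock(L1)
t1.start()
t2.start()
t1.join()
t2.join()
print(got_second)  # both False: each waited on the lock the other held

# Remedy: every thread takes L1 before L2, so no cycle of waiting can form.
done = []


def ordered_worker(name):
    with L1:
        with L2:
            done.append(name)


threads = [threading.Thread(target=ordered_worker, args=(n,))
           for n in ("thread 1", "thread 2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(done))  # ['thread 1', 'thread 2']
```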

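  10. Problems with Locks • Performance inefficiencies! • lock(L1); if (condt1) traverse linked list until you find the entry; if (condt2) sell the ticket; unlock(L1); • The lock is held for the entire code section, so other threads are serialized even in executions where the conditions are false and nothing shared is updated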

  11. Transactions • New paradigm to simplify programming • Instead of lock-unlock, use transaction begin-end • Can yield better performance and eliminates deadlocks • Programmers can freely encapsulate code sections within transactions without worrying about the impact on performance and correctness • Programmers specify the code sections they'd like to see execute atomically; the hardware takes care of the rest (it provides the illusion of atomicity)

  12. Transactions • Transactional semantics: • when a transaction executes, it is as if the rest of the system is suspended and the transaction runs in isolation • the reads and writes of a transaction happen as if they were all a single atomic operation • if the above conditions are not met, the transaction fails to commit (it aborts) and tries again • Example: transaction begin; read shared variables; arithmetic; write shared variables; transaction end
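As a rough software analogy for these semantics, not how the hardware implements them, a transaction can record the version of the variable it read. It commits only if no one else has committed since then; otherwise it aborts and re-executes. `SharedVar` and `Transaction` are hypothetical classes for this sketch. The Alice/Bob deposits from slide 7 are replayed so that Bob's stale transaction aborts once and then succeeds.

```python
class SharedVar:
    """A shared variable plus a version number bumped on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    """Hypothetical software stand-in for a transaction on one shared variable."""

    def __init__(self, var):
        self.var = var
        self.begin()

    def begin(self):
        # transaction begin: snapshot the value and the version it came from
        self.read_version = self.var.version
        self.read_value = self.var.value
        self.new_value = self.read_value

    def commit(self):
        # transaction end: succeed only if no one committed since we began.
        # (In this single-threaded sketch the check-then-write is trivially
        # atomic; a real system must guarantee that.)
        if self.var.version != self.read_version:
            return False  # abort: our read is stale
        self.var.value = self.new_value
        self.var.version += 1
        return True


account = SharedVar(1000)
alice = Transaction(account)
bob = Transaction(account)  # both begin before either commits
alice.new_value = alice.read_value + 100
bob.new_value = bob.read_value + 100

alice_ok = alice.commit()   # Alice commits first
aborts = 0
while not bob.commit():     # Bob aborts and tries again
    aborts += 1
    bob.begin()
    bob.new_value = bob.read_value + 100

print(alice_ok, account.value, aborts)  # True 1200 1
```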

  13. Applications • A transaction executes speculatively in the hope that there will be no conflicts • Can replace a lock-unlock pair with a transaction begin-end • the lock is blocking, the transaction is not • programmers can conservatively introduce transactions without worsening performance • Lock version: lock(lock1); read A; operations; write A; unlock(lock1); • Transaction version: transaction begin; read A; operations; write A; transaction end

  14. Example 1 • Lock version: lock(lock1); counter = counter + 1; unlock(lock1); • Transaction version: transaction begin; counter = counter + 1; transaction end • No apparent advantage to using transactions (apart from fault resiliency)

  15. Example 2 • Producer-consumer relationships: producers place tasks at the tail of a work-queue and consumers pull tasks out of the head • Enqueue: transaction begin; if (tail == NULL) update head and tail; else update tail; transaction end • Dequeue: transaction begin; if (head->next == NULL) update head and tail; else update head; transaction end • With a lock, enqueue and dequeue cannot proceed in parallel, since either may update head or tail; with transactions, enqueue and dequeue can proceed in parallel and are aborted only if the queue is nearly empty
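The "aborted only if the queue is nearly empty" claim can be checked by writing out the read and write sets of each operation. This sketch assumes a singly linked queue with nodes named `n0` (head) through `n{k-1}` (tail). It also assumes that "update tail" includes linking the new node after the old tail; the slide abstracts that step away. Field names are hypothetical. Two transactions are taken to conflict when either one writes something the other reads or writes.

```python
def enqueue_sets(length):
    """Read/write sets of enqueue on a queue of `length` nodes n0..n{length-1}."""
    reads = {"tail"}
    if length == 0:                              # if (tail == NULL)
        writes = {"head", "tail"}                #   update head and tail
    else:                                        # else update tail
        writes = {"tail", f"n{length - 1}.next"}  # (link after the old tail)
    return reads, writes


def dequeue_sets(length):
    """Read/write sets of dequeue; assumes the queue is non-empty."""
    reads = {"head", "n0.next"}                  # reads head->next
    if length == 1:                              # if (head->next == NULL)
        writes = {"head", "tail"}                #   update head and tail
    else:
        writes = {"head"}                        #   update head
    return reads, writes


def conflicts(length):
    (r1, w1), (r2, w2) = enqueue_sets(length), dequeue_sets(length)
    return bool(w1 & (r2 | w2)) or bool(w2 & (r1 | w1))


for n in (1, 2, 10):
    print(n, conflicts(n))  # prints: 1 True / 2 False / 10 False
```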

  16. Detecting Conflicts: Basic Implementation • When a transaction does a write, do not update memory; save the new value in the cache and keep track of all modified lines (if the transaction is aborted, invalidate these lines) • Also keep track of all the cache lines read by the transaction • When another transaction commits, compare its write set with your own read set; a match causes an abort • At transaction end, express the intent to commit and broadcast the write set
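The bookkeeping above can be modelled in a few lines of Python: speculative writes go to a per-transaction buffer, reads are recorded, and a commit "broadcasts" its write set to every other in-flight transaction. Any transaction whose read set overlaps is aborted and its buffered lines are discarded. The class `HTMTransaction` and the shared `active` list are hypothetical stand-ins for hardware state. This is a sequential model, not a concurrent implementation.

```python
class HTMTransaction:
    """Hypothetical model: buffered writes, a tracked read set, and a
    write-set broadcast at commit that aborts overlapping readers."""

    def __init__(self, memory, active):
        self.memory = memory      # committed state: address -> value
        self.active = active      # in-flight transactions (broadcast targets)
        self.read_set = set()
        self.write_buffer = {}    # speculative values, invisible to others
        self.aborted = False
        active.append(self)

    def read(self, addr):
        self.read_set.add(addr)
        # A transaction sees its own speculative writes first.
        return self.write_buffer.get(addr, self.memory[addr])

    def write(self, addr, value):
        self.write_buffer[addr] = value  # memory itself is not updated yet

    def commit(self):
        self.active.remove(self)
        if self.aborted:
            return False                 # caller must re-execute the transaction
        write_set = set(self.write_buffer)
        for other in self.active:        # the "broadcast" of the write set
            if other.read_set & write_set:
                other.aborted = True
                other.write_buffer.clear()  # invalidate its speculative lines
        self.memory.update(self.write_buffer)
        return True


memory = {"balance": 1000}
active = []

alice = HTMTransaction(memory, active)
bob = HTMTransaction(memory, active)
alice.write("balance", alice.read("balance") + 100)
bob.write("balance", bob.read("balance") + 100)

alice_ok = alice.commit()  # True; its broadcast aborts Bob, who read "balance"
bob_ok = bob.commit()      # False

retry = HTMTransaction(memory, active)  # Bob re-executes from the start
retry.write("balance", retry.read("balance") + 100)
retry_ok = retry.commit()

print(alice_ok, bob_ok, retry_ok, memory["balance"])  # True False True 1200
```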

  17. Key Problem • At the end of the transaction, the transaction's writes are broadcast; the commit does not happen until everyone that needs to see the writes has seen them • Broadcasts are not scalable! In a multi-core with 64 processors, 63 other transactions may have to wait while one transaction is busy broadcasting its writes • We need efficient algorithms to handle a commit, and clever design of on-chip networks to improve speed/power

  18. Algorithm 1 – Sequential • Distribute memory into N nodes; each transaction keeps track of the nodes that are read and written [Figure: processors P1..PN running transactions T1..TN; memory nodes M1..MN] • If two transactions touch different nodes, they can commit in parallel • If two transactions happen to touch the same node, they must be aware of each other in case one has to abort • Algorithm designed by Seth Pugsley, a junior in the CS program. See tech report at http://www.cs.utah.edu/~rajeev/pubs/tr-07-016.pdf
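To make "the nodes that are read and written" concrete, here is a sketch that maps addresses to nodes and builds a transaction's commit set. The mapping is an assumption: 64-byte cache lines interleaved across 8 nodes, numbered 1..8 to match M1..MN. The slides do not specify how addresses are distributed. The sample addresses were picked so that the commit sets reproduce the examples on slide 19.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes
N_NODES = 8     # assumed number of memory nodes M1..M8


def home_node(addr):
    """Hypothetical mapping: consecutive cache lines go to consecutive nodes."""
    return (addr // LINE_SIZE) % N_NODES + 1


def commit_set(read_addrs, write_addrs):
    """Nodes a transaction must coordinate with at commit time, ascending."""
    return sorted({home_node(a) for a in read_addrs} |
                  {home_node(a) for a in write_addrs})


def can_commit_in_parallel(cs1, cs2):
    return not set(cs1) & set(cs2)


t1 = commit_set(read_addrs=[0, 192], write_addrs=[384])    # [1, 4, 7]
t2 = commit_set(read_addrs=[128], write_addrs=[256, 448])  # [3, 5, 8]
t3 = commit_set(read_addrs=[128, 192], write_addrs=[448])  # [3, 4, 8]
print(can_commit_in_parallel(t1, t2))  # True: disjoint nodes
print(can_commit_in_parallel(t1, t3))  # False: both touch node 4
```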

  19. Algorithm 1 – Sequential • Each transaction attempts to occupy the nodes in its commit set in ascending order; a node can be occupied by only one transaction • A transaction must wait if another transaction has occupied the node; once all its nodes are occupied, it can proceed with the commit [Figure: processors P1..PN running transactions T1..TN; memory nodes M1..MN] • Example 1: T1: nodes 1, 4, 7; T2: nodes 3, 4, 8 (both need node 4, so whichever reaches it second must wait) • Example 2: T1: nodes 1, 4, 7; T2: nodes 3, 5, 8 (disjoint, so both can commit in parallel)

  20. Algorithm 1 – Sequential • There cannot be hardware deadlocks: since nodes are occupied in increasing order, a transaction is always waiting for a transaction that is further ahead, so a cycle of dependences cannot form • If transactions usually do not conflict for nodes, multiple transactions can commit in parallel • Disadvantages: nodes must be occupied sequentially, and conflicts lead to long delays
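A cycle-by-cycle simulation shows both the parallelism and the delay on the two examples from slide 19. The timing model is a hypothetical simplification. Each cycle, every waiting transaction tries to occupy the next node of its commit set in ascending order. A transaction that holds all its nodes commits on the following cycle and frees them at the end of that cycle. Ties within a cycle are resolved in a fixed order by transaction name.

```python
def simulate_sequential(commit_sets):
    """Sketch of the sequential commit algorithm; returns {tx: commit cycle}."""
    order = {t: sorted(nodes) for t, nodes in commit_sets.items()}
    progress = {t: 0 for t in commit_sets}  # nodes occupied so far, per tx
    owner = {}                              # node -> occupying transaction
    committed = {}
    cycle = 0
    while len(committed) < len(commit_sets):
        cycle += 1
        finished = []
        for t in sorted(commit_sets):       # fixed arbitration order
            if t in committed:
                continue
            if progress[t] == len(order[t]):  # all nodes occupied: commit
                committed[t] = cycle
                finished.append(t)
                continue
            node = order[t][progress[t]]
            if node not in owner:           # free: occupy it and move on
                owner[node] = t
                progress[t] += 1
            # otherwise wait for the current owner to commit
        for t in finished:                  # release nodes after committing
            for node in order[t]:
                del owner[node]
    return committed


print(simulate_sequential({"T1": {1, 4, 7}, "T2": {3, 5, 8}}))  # {'T1': 4, 'T2': 4}
print(simulate_sequential({"T1": {1, 4, 7}, "T2": {3, 4, 8}}))  # {'T1': 4, 'T2': 7}
```

In Example 2 both transactions commit in the same cycle. In Example 1, T2 stalls on node 4 until T1 commits and only then resumes climbing its commit set, which illustrates the "long delays" disadvantage.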

  21. Algorithm 2 – Speculative • Attempt to occupy every node in the commit set in parallel; if any node is already occupied, revert to the sequential algorithm (otherwise deadlocks can result) • Should typically perform no worse than the sequential algorithm
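The speculative step can be sketched as a single function that requests every node at once. The slide does not say what happens to nodes that were grabbed in a failed attempt. This sketch assumes they are released before falling back to the ascending-order algorithm, since holding a partial, out-of-order set is what could deadlock. `try_parallel_occupy` is a hypothetical name.

```python
def try_parallel_occupy(owner, tx, nodes):
    """Request every node in the commit set at once (illustrative sketch).

    owner maps node -> occupying transaction. Returns True if tx now holds
    its whole commit set; otherwise releases whatever this attempt grabbed
    and returns False, so the caller can fall back to the sequential scheme.
    """
    grabbed = []
    for node in nodes:  # modelled as simultaneous requests
        if node in owner:
            continue
        owner[node] = tx
        grabbed.append(node)
    if len(grabbed) == len(nodes):
        return True
    for node in grabbed:  # assumed: undo the partial, out-of-order grab
        del owner[node]
    return False


owner = {}
print(try_parallel_occupy(owner, "T1", [1, 4, 7]))  # True: T1 holds 1, 4, 7
print(try_parallel_occupy(owner, "T2", [3, 4, 8]))  # False: node 4 is taken
print(owner)                                        # {1: 'T1', 4: 'T1', 7: 'T1'}
```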

  22. Algorithm 3 – Momentum • Attempt to occupy nodes in parallel; every request carries a momentum value indicating how many nodes the transaction has already occupied • If a transaction finds that a node is already occupied, it can attempt to steal occupancy if it has a higher momentum • The system is deadlock- and livelock-free (the transaction with the highest momentum at any time has a path to completion)
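A single occupancy request under the momentum rule might look like the sketch below. Two details are assumptions not stated on the slide. Equal momenta are broken in favour of the lower transaction id, so exactly one side can win and a unique highest-momentum transaction always exists. A transaction that loses a node also loses one unit of momentum. `request_node` is a hypothetical name, and transaction ids are plain integers.

```python
def request_node(owner, momentum, tx, node):
    """One occupancy request under the momentum scheme (illustrative sketch).

    owner: node -> transaction id; momentum: transaction id -> nodes occupied.
    A request for an occupied node succeeds if the requester has higher
    momentum, with ties going to the lower transaction id (assumed).
    """
    current = owner.get(node)
    if current is None:
        owner[node] = tx
        momentum[tx] += 1
        return True
    if (momentum[tx], -tx) > (momentum[current], -current):
        owner[node] = tx        # steal occupancy
        momentum[tx] += 1
        momentum[current] -= 1  # assumed: the victim must re-acquire this node
        return True
    return False


owner = {1: 1, 3: 1, 5: 2}  # T1 holds nodes 1 and 3; T2 holds node 5
momentum = {1: 2, 2: 1}
print(request_node(owner, momentum, 1, 5))  # True: T1 (momentum 2) steals from T2 (1)
print(request_node(owner, momentum, 2, 1))  # False: T2 now has momentum 0
print(owner, momentum)                      # {1: 1, 3: 1, 5: 1} {1: 3, 2: 0}
```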

  23. Interconnects as a Bottleneck • In the past, on-chip data transmission on wires cost almost nothing • Interconnect speed and power have been improving, but not at the same rate as transistor speeds • Hence, relative to computation, communication is much more expensive • In the near future, it will take 100 cycles to travel across the chip • 50% of chip power can be attributed to interconnects

  24. On-Going Explorations • For the various on-chip communications just described, what is the optimal on-chip network? • What topology works best? What router microarchitecture is most efficient in terms of performance and power? • What wires work best? This depends on the criticality of the specific data transfer…

  25. To Learn More… • CS/EE 3810: Computer Organization • CS/EE 6810: Computer Architecture • CS/EE 7810: Advanced Computer Architecture • CS/EE 7820: Parallel Computer Architecture • CS 7937 / 7940: Architecture Reading Seminar

