Angstrom Without The Angst
Stanislav Rost | Hari Balakrishnan | Sam Madden
Kilocore Meeting Group, February 21, 2008
Stanislav Rost | 2/28/2008 | 1K-Core
Multicores are Networks
• Threads write to/read from shared memory space
• Most CPUs directly operate on data in the fastest cache
• Accessing a non-resident memory block is “communication”
• Cache coherence mechanisms copy/move the data
[Figure: Tilera TILE64 (scaled down) and AMD “Barcelona”]
High-Level Problem
• Long-“distance” communication is slow
  • Bandwidth ↓, latency ↑ with distance
  • Propagation delay (wire latency)
  • Overhead and delays of communication protocols
  • Contention for shared caches, buses
  • Dipping deeper into slower memory layers
• How do we make software fast without increasing complexity?
  • Provide high-level abstractions for building programs with good locality
Our Proposal
• Proximity-based processing data structures
  • Multi-producer/consumer set: “obtain closest block”
  • Microservice: “send this block to closest thread performing function X”
• Traffic management by thread migration
  • Measure inter-thread communication
  • Migrate threads closer to
    • Other threads they communicate with
    • Off-chip resources they use most
  • Avoid overload on the individual cores, interconnects
• Example data structures: queues, key/value store
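The “obtain closest block” operation can be illustrated with a minimal, single-threaded sketch. It is not the authors' implementation. It assumes cores sit on an 8×8 mesh numbered row by row and that hop count stands in for communication cost. The names `struct block`, `mesh_hops`, and `obtain_closest_block` are hypothetical, and a real multi-producer/consumer set would also need synchronization, which is omitted here.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: cores form a MESH_W x MESH_H grid, numbered row by row,
   and hop count (Manhattan distance) approximates communication cost. */
enum { MESH_W = 8, MESH_H = 8 };

/* Hypothetical descriptor for one block held in the set. */
struct block {
    int home_core;   /* core whose cache is assumed to hold the block */
    int available;   /* nonzero while the block is still in the set */
};

/* Hop count between two cores on the mesh. */
int mesh_hops(int a, int b)
{
    int dx = abs(a % MESH_W - b % MESH_W);
    int dy = abs(a / MESH_W - b / MESH_W);
    return dx + dy;
}

/* "Obtain closest block": remove the available block nearest to
   requester_core and return its index, or -1 if none is available.
   Not thread-safe; locking is left out of this sketch. */
int obtain_closest_block(struct block *set, int n, int requester_core)
{
    assert(requester_core >= 0 && requester_core < MESH_W * MESH_H);
    int best = -1;
    int best_hops = 0;
    for (int i = 0; i < n; i++) {
        if (!set[i].available)
            continue;
        int h = mesh_hops(set[i].home_core, requester_core);
        if (best < 0 || h < best_hops) {
            best = i;
            best_hops = h;
        }
    }
    if (best >= 0)
        set[best].available = 0;   /* consumed: no longer in the set */
    return best;
}
```

A consumer on core 10 would call `obtain_closest_block(set, n, 10)` and process the returned block. The linear scan keeps the sketch short; a per-region free list would avoid touching remote descriptors.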
Contributions
• Expose hardware specifics
  • Core geography, cache sizes
• Make location explicit
  • Data and thread localization
• Hide complexity via abstractions
  • Data structures help ensure locality, capture thread-to-thread communication
• Optimize thread locality
  • Manage processor affinity to migrate communicating threads closer together and closer to resources
[Labels on slide: Compass, CrossCore]
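One way to picture the thread-locality step is as a placement search: given measured traffic to peers, pick the core that minimizes traffic-weighted distance while skipping overloaded cores. The sketch below is an illustration under assumptions, not the method from the talk. It assumes a 4×4 grid with hop-count distance, and `struct peer_traffic`, `grid_hops`, `placement_cost`, and `choose_core` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: a small GRID_W x GRID_H mesh, cores numbered row by row. */
enum { GRID_W = 4, GRID_H = 4, NCORES = GRID_W * GRID_H };

/* Hypothetical record of measured traffic between a thread and one peer. */
struct peer_traffic {
    int peer_core;         /* core the peer thread currently runs on */
    unsigned long bytes;   /* bytes exchanged in the last interval */
};

/* Hop count between two cores on the grid. */
int grid_hops(int a, int b)
{
    return abs(a % GRID_W - b % GRID_W) + abs(a / GRID_W - b / GRID_W);
}

/* Traffic-weighted distance the thread would see if it ran on `core`. */
unsigned long placement_cost(int core, const struct peer_traffic *peers, int n)
{
    unsigned long cost = 0;
    for (int i = 0; i < n; i++)
        cost += peers[i].bytes * (unsigned long)grid_hops(core, peers[i].peer_core);
    return cost;
}

/* Return the lowest-cost core whose load is below max_load, or -1 if
   every core is overloaded. load[c] counts threads already on core c.
   Ties go to the lowest-numbered core. */
int choose_core(const struct peer_traffic *peers, int n,
                const int *load, int max_load)
{
    int best = -1;
    unsigned long best_cost = 0;
    for (int c = 0; c < NCORES; c++) {
        if (load[c] >= max_load)
            continue;   /* avoid overloading an individual core */
        unsigned long cost = placement_cost(c, peers, n);
        if (best < 0 || cost < best_cost) {
            best = c;
            best_cost = cost;
        }
    }
    return best;
}
```

On Linux, the chosen core could then be applied through processor affinity, for example with `sched_setaffinity`. The slides do not say which mechanism the authors use.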
Angstrom Wishlist
[Items ordered from “Less Crazy” to “More Crazy”]
• Localization support
  • core_id curcore()
  • loc_id where(void *word)
  • time_t distance(core_id, loc_id)
• Expose core heterogeneity
  • Connected to memory controller? I/O device? Has vector arithmetic? I want to know!
• Cache coherence control
  • “Pin” memory blocks, turn replacement on/off
  • “Hand off” ownership of blocks to other cores
  • Software resolution to coherence conflicts, per block
• On-chip network
  • Cores measure own, cross traffic
    • Congestion
    • Breakdown by flow
  • Software routing
    • Control the routing tables in a “user network”
  • Multicast support
    • Broadcast with hopcount
    • Bounded broadcast
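The three localization calls on the wishlist have no implementation in the slides, so the following is only a software mock showing how a program might use them. Everything beyond the three signatures is an assumption: a 4×4 mesh, cache lines interleaved across cores by address, a fixed cost of 2 cycles per hop, and the variable `sim_current_core` standing in for hardware state.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef int core_id;   /* assumed representation: core index */
typedef int loc_id;    /* assumed representation: index of the home core */

/* All constants below are illustrative, not taken from any real chip. */
enum { SIM_W = 4, SIM_H = 4, SIM_CORES = SIM_W * SIM_H,
       SIM_LINE_BYTES = 64, SIM_HOP_CYCLES = 2 };

/* Mock of hardware state: the core the caller is "running" on. */
core_id sim_current_core = 0;

/* Which core am I on? */
core_id curcore(void)
{
    return sim_current_core;
}

/* Where does this word live? Assumption: cache lines are interleaved
   across cores by address, so home = line number mod core count. */
loc_id where(void *word)
{
    uintptr_t line = (uintptr_t)word / SIM_LINE_BYTES;
    return (loc_id)(line % SIM_CORES);
}

/* Estimated cost, in cycles, of reaching `to` from core `from`:
   mesh hop count times an assumed per-hop latency. */
time_t distance(core_id from, loc_id to)
{
    assert(from >= 0 && from < SIM_CORES && to >= 0 && to < SIM_CORES);
    int dx = abs(from % SIM_W - to % SIM_W);
    int dy = abs(from / SIM_W - to / SIM_W);
    return (time_t)((dx + dy) * SIM_HOP_CYCLES);
}
```

With these in place, a locality-aware data structure could compare `distance(curcore(), where(p))` across candidate pointers `p` and work on the cheapest one first.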