MILESTONE 2
Project: Request prioritization in on-chip networks: prioritizing between prefetches and demands from multiple applications
Akshay Katti, Keshav Sai Nanduri, Aakanksha Pudipeddi
Progress made:
• Exploration of the BLESS simulator
• Implementation of a linear stream prefetcher
• Issuing prefetch requests from the processor (in the simulator)
• Base-level design and coding of the prioritization scheme
Things to be done (in brief):
• Detailed analysis of memory bandwidth usage with regard to prefetching
• Including application awareness in the prioritization mechanism
• Implementation of the designed algorithm in the simulator (using the currently implemented code as the base)
Exploration of the BLESS simulator
• The relevant modules of the simulator (Memory, Proc, Controller, Net and Common) are analyzed in detail.
• This analysis provided a thorough understanding of the workflow of the memory request mechanism implemented in the simulator.
• It also identified the places (across several files) that must be handled or modified to wire in a prefetch mechanism, namely request.cs, cmpcache.cs, mem.cs, nodes.cs, mainmemory.cs and others.
• Parameters in the configuration file (such as network size and the memory scheduling priority mechanism) are varied to check the behaviour of the system under various conditions.

Implementation of Linear Stream Prefetcher
• A basic stream prefetcher that issues prefetch requests for linearly increasing memory addresses is implemented. Since the primary goal was only to generate prefetch requests, the prefetched data is kept in a separate buffer (the Prefetch Buffer) instead of being mapped into the L2 cache.
• When a prefetch request (of size N) for a particular bank arrives, all N bytes are stored in the prefetch buffer.
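The separate Prefetch Buffer can be pictured as a small block-organized store that sits beside the L2 rather than inside it. The sketch below is a minimal illustration under stated assumptions, not the project's C# code: the class name `PrefetchBuffer`, the 64-byte block size, the capacity in blocks and the FIFO eviction policy are all hypothetical choices made here for clarity.

```python
from collections import OrderedDict
from typing import Optional


class PrefetchBuffer:
    """Hypothetical prefetch (stream) buffer kept beside the L2, not inside it."""

    def __init__(self, capacity_blocks: int, block_size: int = 64):
        self.capacity = capacity_blocks   # assumed buffer size, in blocks
        self.block_size = block_size      # assumed block size, in bytes
        self.blocks = OrderedDict()       # block address -> data, oldest first

    def block_addr(self, addr: int) -> int:
        # Align a byte address down to the start of its block.
        return addr - (addr % self.block_size)

    def fill(self, addr: int, data: bytes) -> None:
        """Store one arriving block; evict the oldest block when the buffer is full."""
        key = self.block_addr(addr)
        if key in self.blocks:
            self.blocks.move_to_end(key)      # refresh an existing entry
        elif len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)   # FIFO eviction of the oldest block
        self.blocks[key] = data

    def fill_stream(self, start: int, data: bytes) -> None:
        """Store an N-byte prefetch response block by block (start assumed block-aligned)."""
        for off in range(0, len(data), self.block_size):
            self.fill(start + off, data[off:off + self.block_size])

    def lookup(self, addr: int) -> Optional[bytes]:
        """Return the buffered block holding addr, or None on a buffer miss."""
        return self.blocks.get(self.block_addr(addr))
```

Keeping the buffer separate means a prefetch can never evict a demand-fetched L2 line, which isolates the prefetcher while it is being tested; the price is an extra structure to search on an L2 miss.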
Implementing Prefetch Requests (in the simulator)
• A prefetch request is generated when a particular address encounters an L2 miss. Once the L2 miss occurs, a series of 'N' words starting from the current request address is requested from main memory.
• The check for an L1/L2 miss is performed in the Cmpcache.cs file. The prefetch request is hardcoded at that location (for initial testing purposes).
• The 'N' requested words are stored in a separate Prefetch Buffer (Stream Buffer). As a first-level exploration, the memory words are kept in this separate buffer; further analysis and detailed design are needed to map these words into the existing L2 caches, keeping in mind the memory bandwidth consumed when both prefetches and demands use the same bank.
• For now, prefetching is tested only for a single, fixed memory bank, following a bottom-up approach.
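The miss-triggered hook described above can be sketched as a lookup path that checks L1, then L2, and on an L2 miss issues the demand plus N linear prefetches, but only for the one hardcoded test bank. This is a hypothetical model, not the simulator's Cmpcache.cs logic: the word size, the 64-byte bank interleaving, the bank count, `TEST_BANK`, the `Node` class and the prefetch-buffer check are assumptions, and it treats the N prefetched words as the ones following the demanded word.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Assumed parameters; the slides do not give the simulator's actual values.
WORD_SIZE = 8      # bytes per word
BLOCK_SIZE = 64    # bank interleaving granularity, in bytes
NUM_BANKS = 4      # number of memory banks
TEST_BANK = 0      # the single hardcoded bank used for initial testing


def bank_of(addr: int) -> int:
    # Hypothetical block-interleaved address-to-bank mapping.
    return (addr // BLOCK_SIZE) % NUM_BANKS


@dataclass
class Node:
    l1: Set[int] = field(default_factory=set)
    l2: Set[int] = field(default_factory=set)
    pbuf: Set[int] = field(default_factory=set)  # addresses held in the prefetch buffer
    issued: List[Tuple[str, int]] = field(default_factory=list)  # requests sent to memory

    def access(self, addr: int, n_words: int) -> str:
        """Look up L1, L2 and the prefetch buffer; on a full miss, issue demand + N prefetches."""
        if addr in self.l1:
            return "L1 hit"
        if addr in self.l2:
            return "L2 hit"
        if addr in self.pbuf:  # assumed: buffer is checked before going to memory
            return "prefetch buffer hit"
        self.issued.append(("demand", addr))
        # Hardcoded trigger: only misses to the test bank start a prefetch stream.
        if bank_of(addr) == TEST_BANK:
            for i in range(1, n_words + 1):
                self.issued.append(("prefetch", addr + i * WORD_SIZE))
        return "L2 miss"
```

For example, `Node().access(0, 3)` records one demand for address 0 and prefetches for 8, 16 and 24, while a miss to address 64 (bank 1 under this mapping) records only the demand.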
Base-Level Design and Coding of the Prioritization Mechanism
• A basic algorithm that prioritizes between the generated prefetches and demands is implemented. The algorithm is based on the design proposed in the project proposal.
• However, for now the algorithm does not consider application-specific parameters (i.e., it is not yet application aware). It only counts the number of prefetch requests and demand requests made by a particular node and, based on the proposed mechanism, prioritizes between the two.

Future Work
• As explained, a separate Prefetch Buffer is currently used to store the prefetched data. The L2 cache is to be used for this instead, and insertion and eviction of the other blocks present in the cache must be handled properly, taking into account memory bandwidth, prefetch accuracy and so on.
• Inclusion of an application-aware mechanism in the implemented design. For this, the application-aware parameters are to be checked in the existing simulator, and the relevant parameters then included.
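The counting-based prioritization in the base-level design can be sketched with per-node prefetch and demand counters. The exact rule from the project proposal is not reproduced in these slides, so the ordering below (demands before prefetches, then prefetches from the node with the lower prefetch share, then the oldest request) is a hypothetical stand-in; `Request`, `Prioritizer` and `prefetch_share` are illustrative names only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    node: int          # source node of the request
    is_prefetch: bool  # True for a prefetch, False for a demand
    arrival: int       # cycle at which the request was queued (for age ordering)


class Prioritizer:
    """Per-node prefetch/demand counters with a hypothetical ordering rule."""

    def __init__(self) -> None:
        self.demands: Dict[int, int] = defaultdict(int)
        self.prefetches: Dict[int, int] = defaultdict(int)

    def record(self, req: Request) -> None:
        # Count every request issued by each node, split by type.
        if req.is_prefetch:
            self.prefetches[req.node] += 1
        else:
            self.demands[req.node] += 1

    def prefetch_share(self, node: int) -> float:
        # Fraction of a node's requests that were prefetches (0.0 if none yet).
        total = self.demands[node] + self.prefetches[node]
        return self.prefetches[node] / total if total else 0.0

    def key(self, req: Request):
        # Smaller tuple = higher priority: demands first, then prefetches from
        # nodes with a lower prefetch share, then the oldest request.
        share = self.prefetch_share(req.node) if req.is_prefetch else 0.0
        return (req.is_prefetch, share, req.arrival)

    def pick(self, candidates: List[Request]) -> Request:
        return min(candidates, key=self.key)
```

Ranking prefetches by each node's prefetch share is one simple way to keep a prefetch-heavy node from crowding out other nodes' prefetches; an application-aware version could replace this share with per-application metrics.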
• The prefetch is currently hardcoded for a single bank, for feasibility analysis. This is to be scaled to all available banks, so that accurate prefetches are generated for applications running on different nodes and requesting memory from different banks.
• Performance analysis of the designed priority mechanism.
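Scaling beyond one hardcoded bank mainly means sending each prefetched word to whichever bank owns it. A minimal sketch, assuming a block-interleaved address mapping (64-byte interleaving, 8-byte words) that may differ from the simulator's actual mapping; `bank_of` and `route_prefetches` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List


def bank_of(addr: int, num_banks: int, interleave: int = 64) -> int:
    # Hypothetical block-interleaved mapping: consecutive blocks rotate across banks.
    return (addr // interleave) % num_banks


def route_prefetches(miss_addr: int, n_words: int, num_banks: int,
                     word_size: int = 8, interleave: int = 64) -> Dict[int, List[int]]:
    """Group the N linear prefetch addresses after miss_addr by their home bank."""
    per_bank: Dict[int, List[int]] = defaultdict(list)
    for i in range(1, n_words + 1):
        addr = miss_addr + i * word_size
        per_bank[bank_of(addr, num_banks, interleave)].append(addr)
    return dict(per_bank)
```

With four banks, a miss at address 48 with N = 4 yields `{0: [56], 1: [64, 72, 80]}`: the stream crosses a block boundary and spills into the next bank, so each per-bank group would go out as a separate request to its own bank.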