Ongoing Computer Engineering Research Projects at the Lucian Blaga University of Sibiu
Prof. Lucian VINTAN, PhD, Director, Advanced Computer Architecture & Processing Systems Research Lab - http://acaps.ulbsibiu.ro/research.php
The Research Team
• Prof. Lucian VINTAN, PhD – Research Chair
• Assoc. Prof. Adrian FLOREA, PhD
• Senior Lecturer Daniel MORARIU, PhD
• Senior Lecturer Ion MIRONESCU, PhD
• Lecturer Arpad GELLERT, PhD
• Radu CRETULESCU, PhD student
• Horia CALBOREAN, PhD student
• Ciprian RADU, PhD student
Computing Hardware
• 14 Intel compute nodes (dual-processor HS21 blades with quad-core Intel Xeon processors)
• 2 Cell compute nodes (dual-processor QS22 blades with IBM PowerXCell 8i processors)
Our Current Research Topics
• Anticipatory Techniques in Advanced Processor Architectures
• An Automatic Design Space Exploration Framework for Multicore Architecture Optimizations
• Optimizing Application Mapping Algorithms for NoCs through a Unified Framework
• Optimal Computer Architecture for CFD Calculation
• Adaptive Meta-classifiers for Text Documents
Anticipatory Techniques in Advanced Processor Architectures
Prof. Lucian VINTAN, PhD
Assoc. Prof. Adrian FLOREA, PhD
Lecturer Arpad GELLERT, PhD
Fetch Bottleneck
• The fetch rate is limited by the size of the basic blocks (7-8 instructions on average in SPEC 2000);
Solutions
• Trace cache and multiple (M-1) branch predictors;
• Branch prediction increases ILP by predicting branch directions and targets and by speculatively processing multiple basic blocks in parallel;
• As the instruction issue width and the pipeline depth keep growing, accurate branch prediction becomes even more essential.
Some Challenges
• Identifying and solving difficult-to-predict branches (unbiased branches);
• Helping the computer architect better understand branch predictability and decide whether the predictor should be improved for difficult-to-predict branches.
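To make "predicting branch directions" concrete, here is a minimal sketch of a textbook bimodal predictor: a table of 2-bit saturating counters indexed by the low bits of the branch PC. It is an illustration only, not one of the predictors studied in the lab; the table size and the PC values are hypothetical.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by the low bits of the branch PC."""

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        # Counters start at 1 ("weakly not taken"); values 2 and 3 predict taken.
        self.counters = [1] * (1 << index_bits)

    def predict(self, pc: int) -> bool:
        return self.counters[pc & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Fraction of (pc, taken) pairs predicted correctly, training after each branch."""
    hits = 0
    for pc, taken in trace:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(trace)


# A loop-closing branch: taken 9 times, then not taken once, repeated 10 times.
loop_trace = [(0x400, i % 10 != 9) for i in range(100)]
# A branch whose outcome simply alternates between taken and not taken.
alt_trace = [(0x404, i % 2 == 0) for i in range(100)]
print(accuracy(BimodalPredictor(), loop_trace))  # 0.89
print(accuracy(BimodalPredictor(), alt_trace))   # 0.0
```

The strongly biased loop branch is predicted well, while the alternating branch defeats a per-branch counter completely. History-based predictors handle such regular patterns, but branches that stay unbiased within a given context remain hard, as the following slides discuss.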
Difficult-to-Predict Unbiased Branches
• A branch that is difficult to predict in a certain dynamic context is:
  • unbiased: in that context it is taken and not taken with comparable frequency;
  • "highly shuffled": its outcomes alternate irregularly rather than in long runs.
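Both properties can be measured per branch context from its outcome trace. The sketch below assumes that polarization is the share of the dominant outcome and measures shuffling as the number of outcome transitions normalised by 2·min(T, NT), where T and NT count taken and not-taken outcomes. The thresholds 0.95 and 0.4 are illustrative assumptions, not values taken from these slides.

```python
def polarization(outcomes):
    """Share of the dominant outcome: 1.0 = always the same direction, 0.5 = balanced."""
    taken = sum(outcomes)
    return max(taken, len(outcomes) - taken) / len(outcomes)


def shuffle_index(outcomes):
    """Outcome transitions divided by 2*min(T, NT); close to 1 means frequent switching."""
    taken = sum(outcomes)
    limit = 2 * min(taken, len(outcomes) - taken)
    if limit == 0:
        return 0.0  # the branch always goes the same way
    transitions = sum(a != b for a, b in zip(outcomes, outcomes[1:]))
    return transitions / limit


def is_difficult(outcomes, p_threshold=0.95, d_threshold=0.4):
    """Hypothetical criterion: unbiased AND highly shuffled."""
    return polarization(outcomes) < p_threshold and shuffle_index(outcomes) > d_threshold


biased = [1] * 97 + [0] * 3              # almost always taken
blocked = [1] * 50 + [0] * 50            # unbiased, but in two long runs
shuffled = [1, 0, 0, 1, 1, 0, 1, 0] * 10  # unbiased and frequently switching
for name, trace in [("biased", biased), ("blocked", blocked), ("shuffled", shuffled)]:
    print(name, polarization(trace), round(shuffle_index(trace), 3), is_difficult(trace))
```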
Predicting Unbiased Branches
• State-of-the-art branch predictors are unable to predict unbiased branches accurately;
The problem:
• Finding new relevant information that could reduce their entropy, instead of developing new predictors;
Challenge:
• Adequately representing unbiased branches in the feature space!
• Accurately predicting unbiased branches is still an open problem!
Random Degree Metrics
Based on:
• Hidden Markov Models (HMM), a strong method for evaluating the predictability of the sequences generated by unbiased branches;
• the discrete entropy of the sequences generated by unbiased branches;
• the compression rate (Gzip, Huffman) of the sequences generated by unbiased branches.
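Two of these metrics are easy to sketch for a branch's outcome sequence: the Shannon entropy of the outcome distribution, and the compression rate obtained with DEFLATE (the algorithm used inside gzip, which combines LZ77 with Huffman coding), available in Python's `zlib` module. The HMM-based metric is not shown. The sequences are hypothetical.

```python
import math
import random
import zlib
from collections import Counter


def discrete_entropy(outcomes):
    """Shannon entropy, in bits per outcome, of the taken/not-taken distribution."""
    n = len(outcomes)
    return -sum((c / n) * math.log2(c / n) for c in Counter(outcomes).values())


def compression_rate(outcomes):
    """Compressed size divided by original size, using DEFLATE via zlib."""
    raw = bytes(outcomes)  # one byte per outcome: 0 = not taken, 1 = taken
    return len(zlib.compress(raw, 9)) / len(raw)


periodic = [1, 0] * 500
rng = random.Random(0)
random_like = [rng.randint(0, 1) for _ in range(1000)]
print(discrete_entropy(periodic), compression_rate(periodic))
print(discrete_entropy(random_like), compression_rate(random_like))
```

Both sequences have (close to) 1 bit of entropy, because order-0 entropy ignores the order of outcomes; the compression rate does capture order, so the periodic sequence compresses far better than the random-looking one. This is one reason for combining several "random degree" metrics.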
Issue Bottleneck (Data Flow)
Conventional processing models are limited in their processing speed by the program's dynamic critical path (Amdahl);
2 Solutions
• Dynamic Instruction Reuse (DIR) is a non-speculative technique.
• Value Prediction (VP) is a speculative technique.
Common issue
• Value locality
Challenges
• Selective instruction reuse (MUL & DIV)
• Selective load value prediction ("critical loads")
• Exploiting selective instruction reuse and value prediction in a superscalar / Simultaneous Multithreaded (SMT) architecture to anticipate long-latency instructions
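A minimal sketch of selective dynamic instruction reuse, assuming a direct-mapped Reuse Buffer indexed by PC whose entries store the operand values and the result. Only MUL and DIV are eligible, mirroring the "selective" idea. Because a stored result is reused only when the current operands exactly match, there is nothing to roll back, which is what makes DIR non-speculative. The class, function, and operation names are hypothetical.

```python
from dataclasses import dataclass, field

REUSABLE_OPS = {"mul", "div"}  # selective reuse: only long-latency operations


def compute(op, a, b):
    """Functional-unit model for the few operations used here."""
    if op == "add":
        return a + b
    if op == "mul":
        return a * b
    if op == "div":
        return a // b
    raise ValueError(f"unsupported op {op!r}")


@dataclass
class ReuseBuffer:
    """Direct-mapped Reuse Buffer: each slot remembers (pc, operands) -> result."""
    entries: int = 1024
    table: dict = field(default_factory=dict)
    hits: int = 0

    def execute(self, pc, op, a, b):
        if op not in REUSABLE_OPS:
            return compute(op, a, b)  # not eligible: always uses the functional unit
        slot = pc % self.entries
        entry = self.table.get(slot)
        if entry is not None and entry[0] == (pc, a, b):
            self.hits += 1  # same instruction, same operands: reuse the stored result
            return entry[1]
        result = compute(op, a, b)
        self.table[slot] = ((pc, a, b), result)  # allocate or replace the slot
        return result


rb = ReuseBuffer()
for _ in range(10):
    rb.execute(0x100, "mul", 6, 7)  # a MUL inside a loop with invariant operands
print(rb.hits)  # 9: only the first execution needed the multiplier
```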
Exploiting Selective Instruction Reuse and Value Prediction in a Superscalar Architecture
• Selective instruction reuse (MUL & DIV)
• Selective load value prediction (critical loads)
Selective Instruction Reuse and Value Prediction in Simultaneous Multithreaded Architectures
[Figure: SMT architecture (M-Sim) enhanced with per-thread Reuse Buffer (RB) and Load Value Prediction Table (LVPT) structures]
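The per-thread LVPT can be sketched as a last-value predictor that keeps one table per hardware thread, so threads never see each other's values. The 2-bit confidence counter, the threshold, and the table size are assumptions for illustration. Because VP is speculative, a wrong predicted value forces the dependent instructions to be squashed and re-executed; that recovery is not modelled here.

```python
class LoadValuePredictor:
    """Last-value predictor with one Load Value Prediction Table (LVPT) per thread."""

    def __init__(self, threads, entries=1024, threshold=2):
        self.entries = entries
        self.threshold = threshold  # confidence needed before predicting
        # Per thread: slot -> [load PC (tag), last value, 2-bit confidence counter]
        self.tables = [{} for _ in range(threads)]

    def predict(self, tid, pc):
        """Return a predicted value, or None if the load must wait for memory."""
        entry = self.tables[tid].get(pc % self.entries)
        if entry is not None and entry[0] == pc and entry[2] >= self.threshold:
            return entry[1]
        return None

    def update(self, tid, pc, actual):
        """Train with the value the load actually returned."""
        slot = pc % self.entries
        entry = self.tables[tid].get(slot)
        if entry is None or entry[0] != pc:
            self.tables[tid][slot] = [pc, actual, 0]
        elif entry[1] == actual:
            entry[2] = min(3, entry[2] + 1)
        else:
            entry[1], entry[2] = actual, 0  # new value: replace it and reset confidence


lvp = LoadValuePredictor(threads=2)
for _ in range(3):
    lvp.update(0, 0x200, 42)  # thread 0 keeps loading the same value
print(lvp.predict(0, 0x200), lvp.predict(1, 0x200))  # 42 None
```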
Exploiting Selective Instruction Reuse and Value Prediction in a Superscalar Architecture
[Figure: The M-SIM simulator. A SPEC benchmark and a hardware configuration drive a cycle-level performance simulator, which produces a performance estimate and hardware access counts; power models turn the access counts into a power estimate.]
Exploiting Selective Instruction Reuse and Value Prediction in a Superscalar Architecture
[Figure: relative IPC speedup and relative energy-delay product gain with a 1024-entry Reuse Buffer, the Trivial Operation Detector, and the Load Value Predictor]
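Assuming the usual definitions (relative IPC speedup = (IPC_new - IPC_base) / IPC_base; energy-delay product EDP = energy × execution time, with relative gain = (EDP_base - EDP_new) / EDP_base), the two plotted quantities can be computed as below. The input numbers are made up for illustration and are not results from the figure.

```python
def relative_ipc_speedup(ipc_base, ipc_new):
    """Fractional IPC improvement over the baseline (0.25 means 25% more IPC)."""
    return (ipc_new - ipc_base) / ipc_base


def relative_edp_gain(energy_base, time_base, energy_new, time_new):
    """Fractional reduction of the energy-delay product (positive means better)."""
    edp_base = energy_base * time_base
    edp_new = energy_new * time_new
    return (edp_base - edp_new) / edp_base


# Hypothetical example: IPC rises from 1.6 to 2.0, so for the same instruction count
# the run time drops from 2.0 s to 1.6 s, while energy grows from 10.0 J to 10.5 J.
print(relative_ipc_speedup(1.6, 2.0))              # about 0.25
print(relative_edp_gain(10.0, 2.0, 10.5, 1.6))     # about 0.16
```

The EDP gain rewards a speedup only if it is not paid for with a proportionally larger energy increase, which is why it is reported alongside IPC.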
Conclusions and Further Work
• Indexing the SLVP table with the memory address instead of the instruction address (PC);
• Exploiting N-value locality instead of 1-value locality;
• Generating thermal maps for the optimal superscalar and SMT configurations (and, if necessary, developing a run-time thermal manager);
• Understanding and exploiting the benefits of instruction reuse and value prediction in multicore architectures.
Anticipatory Multicore Architectures
• Anticipatory multicores would significantly reduce the pressure on the performance and energy of the interconnection network;
• There are subtle, not yet well-understood relationships between value prediction, multithreading, and cache coherence/consistency mechanisms;
• Value prediction can cause data consistency errors, which require consistency-violation detection and recovery;
• The cause of the inconsistency: VP might execute some dependent instructions out of order;
• Dynamic instruction reuse in a multicore system raises Reuse Buffer coherence problems, related to cache coherence mechanisms;
• Details at http://webspace.ulbsibiu.ro/lucian.vintan/html/#11
An Automatic Design Space Exploration Framework for Multicore Architecture Optimizations
Horia CALBOREAN, PhD student
Prof. Lucian VINTAN, PhD
Multiobjective Optimization
• As the number of (heterogeneous) cores per processor grows, systems become more and more complex;
• More configurations have to be simulated (an NP-hard problem);
• The time needed to simulate all configurations is prohibitive;
• Performance evaluation has become a multiobjective evaluation.
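In a multiobjective evaluation, configurations are compared by Pareto dominance rather than by a single number. A minimal sketch, assuming every objective is minimised; the (CPI, energy) pairs are hypothetical simulator outputs.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (CPI, energy in J) results for five simulated configurations.
results = [(1.2, 5.0), (1.0, 6.0), (1.5, 4.0), (1.3, 5.5), (1.6, 4.5)]
print(pareto_front(results))  # [(1.2, 5.0), (1.0, 6.0), (1.5, 4.0)]
```

Multiobjective search algorithms try to approximate this front while simulating only a small fraction of the design space.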
Solutions
• Reducing simulation time:
  • parallel & distributed simulation
  • sampling simulation
• Reducing the number of simulations:
  • intelligent multiobjective algorithms
Proposed Framework
• We developed FADSE (Framework for Automatic Design Space Exploration)
• Compatible with most existing simulators
• Portable: implemented in Java
• Includes many well-known multiobjective algorithms
• Can run simulators as well as well-known test problems
Existing Tools
• Bound to a specific simulator (Magellan)
• Lack portability: bound to a specific operating system (M3Explorer, Magellan)
• Explore only small parts of the system (e.g., only the cache: ArchExplorer)
FADSE – application architecture
Features
• Parallel simulation (client-server model)
• Ability to introduce constraints through an XML interface
• Easily configurable through XML files:
  • change the DSE algorithm,
  • specify the input parameters and their possible values,
  • specify the desired output metrics, etc.
Our Target
• Evaluate the existing algorithms on different simulators
• Find out which one performs best
• Improve the algorithms by mapping them onto the specific problem of design space exploration
Conclusions
• We have developed a framework that performs automatic design space exploration
• Extensible and portable
• Many multiobjective algorithms implemented (through the use of jMetal)
• Reduces exploration time through parallel and distributed execution of simulators
Optimizing Application Mapping Algorithms for NoCs through a Unified Framework
Ciprian RADU, PhD student
Prof. Lucian VINTAN, PhD
Outline
• Introduction
  • The application mapping problem for NoCs
  • The relation between application mapping and routing
• Evaluating application mapping algorithms for Networks-on-Chip
  • The framework design
  • The ns-3 NoC simulator
• Automatic Design Space Exploration for Networks-on-Chip
  • The framework
The Application Mapping Problem for NoCs
Application Mapping & Routing
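The mapping problem is often formulated as placing the tasks of an application's communication graph onto NoC tiles so as to minimise the sum of traffic volume × hop count, where the hop count depends on the routing algorithm. A minimal sketch, assuming a 2D mesh with deterministic XY routing; the task names and volumes are hypothetical, and the exhaustive search is only for illustration, since practical mapping algorithms rely on heuristics.

```python
from itertools import permutations


def xy_route(src, dst):
    """Tiles visited by dimension-ordered XY routing: first along X, then along Y."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def mapping_cost(mapping, traffic):
    """Sum over communicating task pairs of volume * hops on the XY route."""
    return sum(vol * (len(xy_route(mapping[a], mapping[b])) - 1)
               for (a, b), vol in traffic.items())


def best_mapping(tasks, tiles, traffic):
    """Exhaustive search over all placements; only feasible for tiny task graphs."""
    best = min(permutations(tiles, len(tasks)),
               key=lambda perm: mapping_cost(dict(zip(tasks, perm)), traffic))
    return dict(zip(tasks, best))


tiles = [(0, 0), (1, 0), (0, 1), (1, 1)]  # a 2x2 mesh
traffic = {("A", "B"): 100, ("B", "C"): 100, ("C", "D"): 100, ("A", "D"): 5}
mapping = best_mapping(["A", "B", "C", "D"], tiles, traffic)
print(mapping, mapping_cost(mapping, traffic))  # every pair ends up 1 hop apart: cost 305
```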
Evaluating Application Mapping Algorithms for Networks-on-Chip
• Existing application mapping algorithms are currently evaluated on specific NoCs
  • e.g., NoCs with a 2D mesh topology
• Existing comparisons between the algorithms are not made on the same NoC architecture
• We propose a unified framework for evaluating and optimizing application mapping algorithms on different NoC designs
The Framework Design
• 3 major components:
  • a module that contains the implementation of different application mapping algorithms;
  • a network traffic generator;
  • a Network-on-Chip simulator.
The Framework Design Flow
The ns-3 NoC Simulator
• Based on ns-3, an event-driven simulator for Internet systems
• Aims for a good accuracy/speed trade-off
• Flexible and scalable
• Current parameters:
  • packet size, packet injection rate, packet injection probability;
  • buffer size;
  • network size;
  • switching mechanism (SAF, VCT, wormhole);
  • routing protocol (XY, YX, SLB, SO);
  • network topology (2D mesh, Irvine mesh);
  • traffic patterns (bit-complement, bit-reverse, matrix transpose, uniform random).
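The synthetic traffic patterns listed above are commonly defined on the binary address of the source node. A sketch of the usual textbook definitions for a network of 2^bits nodes; whether ns-3 NoC implements them exactly this way is an assumption.

```python
import random


def bit_complement(src, bits):
    """Destination is the bitwise complement of the source address."""
    return ~src & ((1 << bits) - 1)


def bit_reverse(src, bits):
    """Destination is the source address with its bit order reversed."""
    return int(format(src, f"0{bits}b")[::-1], 2)


def matrix_transpose(src, bits):
    """Swap the upper and lower halves of the address (bits must be even)."""
    half = bits // 2
    low, high = src & ((1 << half) - 1), src >> half
    return (low << half) | high


def uniform_random(src, bits, rng):
    """Any node, chosen uniformly at random (the source is ignored)."""
    return rng.randrange(1 << bits)


# 16 nodes (4 address bits), e.g. a 4x4 mesh with id = 4 * y + x,
# so matrix_transpose maps tile (x, y) to tile (y, x).
for name, pattern in [("complement", bit_complement), ("reverse", bit_reverse),
                      ("transpose", matrix_transpose)]:
    print(name, [pattern(s, 4) for s in range(4)])
print("uniform", uniform_random(1, 4, random.Random(0)))
```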
Automatic Design Space Exploration for Networks-on-Chip
• Motivation
  • There is no NoC suitable for all kinds of workloads
  • The number of possible NoC architectures is exponential
  • Exhaustive DSE is no longer feasible
• Automatic DSE uses a heuristic-driven exploration of the design space
  • Disadvantage: the solutions are only near-optimal
  • Advantage: speed
The Framework
[Figure: the Design Space Exploration module configures the Network-on-Chip simulator, and the simulation results are fed back to the DSE module]
• Components:
  • DSE module
  • NoC simulator
• The DSE module determines the parameters of the NoC architecture
  • It uses algorithms from Artificial Intelligence
• The NoC simulator (ns-3 NoC) is automatically configured to simulate the network architecture determined by the DSE module
• The simulation results (network performance) help the DSE module generate a better NoC architecture
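The configure/simulate/feedback loop can be sketched with plain hill climbing, a deliberately simple stand-in for the Artificial Intelligence algorithms mentioned above. The design space is loosely based on the ns-3 NoC parameters, and `simulate` is a hypothetical stand-in returning a made-up latency instead of running the real simulator.

```python
import random

# Hypothetical design space, loosely based on the ns-3 NoC parameters.
SPACE = {
    "buffer_size": [2, 4, 8, 16],
    "routing": ["XY", "YX", "SLB", "SO"],
    "switching": ["SAF", "VCT", "Wormhole"],
}


def simulate(cfg):
    """Stand-in for a simulator run: returns a made-up average packet latency."""
    latency = {"SAF": 40, "VCT": 25, "Wormhole": 20}[cfg["switching"]]
    latency += {"XY": 0, "YX": 1, "SLB": 3, "SO": 2}[cfg["routing"]]
    # Made-up trade-off: small buffers cause stalls, large ones add delay.
    latency += 32 / cfg["buffer_size"] + 0.5 * cfg["buffer_size"]
    return latency


def explore(iterations=500, seed=0):
    """Hill climbing: mutate one parameter, keep the change if latency does not grow."""
    rng = random.Random(seed)
    current = {name: rng.choice(values) for name, values in SPACE.items()}
    best = simulate(current)
    for _ in range(iterations):
        candidate = dict(current)
        name = rng.choice(list(SPACE))
        candidate[name] = rng.choice(SPACE[name])  # the DSE module proposes a new NoC
        latency = simulate(candidate)  # simulation results are fed back
        if latency <= best:
            current, best = candidate, latency
    return current, best


print(explore())
```

Because this toy cost function is separable, one-parameter moves suffice; real NoC design spaces interact in more complex ways, which is why population-based multiobjective algorithms are typically preferred.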
Optimal Computer Architecture for CFD Calculation
Senior Lecturer Ion Dan MIRONESCU, PhD
Prof. Lucian VINTAN, PhD
Practical Application
• Modelling and simulation of multiscale, multicomponent, multiphase flow in complex geometries (ongoing projects) for:
  • optimisation of sugar crystallisation
  • prediction of the flow properties of polymer-based disperse systems (starch and starch fractions, microbial polysaccharides)
HPC/CFD
Goals
• Speeding up this application on the given architecture
• Finding the optimal manycore architecture (e.g., NoC) for CFD applications
Method: Lattice Boltzmann (Chirila, 2010)
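The lab's flow models are far richer (multiphase, complex geometries), but the collide-and-stream structure of the Lattice Boltzmann method can be shown on the simplest scheme: D1Q2 for pure diffusion, with BGK relaxation time `tau`. This is a hedged illustration of the method family, not the model from Chirila (2010); the lattice size and parameters are hypothetical.

```python
def lbm_diffusion_d1q2(rho0, tau=1.0, steps=100):
    """D1Q2 lattice Boltzmann for pure diffusion on a periodic 1D lattice (BGK collision)."""
    f_right = [r / 2 for r in rho0]  # populations moving +1 site per step
    f_left = [r / 2 for r in rho0]   # populations moving -1 site per step
    for _ in range(steps):
        # Collision (purely local): relax each population toward f_eq = rho / 2.
        rho = [a + b for a, b in zip(f_right, f_left)]
        f_right = [f - (f - r / 2) / tau for f, r in zip(f_right, rho)]
        f_left = [f - (f - r / 2) / tau for f, r in zip(f_left, rho)]
        # Streaming: shift populations to the neighbouring sites (periodic boundaries).
        f_right = f_right[-1:] + f_right[:-1]
        f_left = f_left[1:] + f_left[:1]
    return [a + b for a, b in zip(f_right, f_left)]


rho0 = [0.0] * 64
rho0[32] = 1.0  # a unit of concentration at the centre of the lattice
rho = lbm_diffusion_d1q2(rho0, tau=0.8, steps=100)
print(round(sum(rho), 12), round(max(rho), 4))  # mass is conserved, the peak spreads out
```

Each site only needs its own populations and those of its immediate neighbours, which is the locality behind the "easy parallelisation" advantage listed next.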
Method Advantages
• easy discretization of complex geometries
• easy incorporation of "multi" models
• easy parallelisation
• easy coupling to models at other scales (Molecular Dynamics)
Computational Model
[Figure: a grid of COMPUTE blocks, each updating its local values, with ghost data exchanged (EXCHANGE) between neighbouring blocks]
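The compute/exchange pattern can be sketched in one dimension, assuming a simple explicit diffusion stencil as a stand-in for the LB update. Each block is padded with one ghost cell per side; after the ghost values are exchanged, every block computes independently, and the result matches a single-domain computation exactly. The function names are hypothetical.

```python
def stencil(left, centre, right, alpha=0.25):
    """Explicit diffusion update of one site from its two neighbours."""
    return centre + alpha * (left - 2 * centre + right)


def step_global(u):
    """Reference: one step on the whole periodic domain."""
    n = len(u)
    return [stencil(u[i - 1], u[i], u[(i + 1) % n]) for i in range(n)]


def step_blocked(blocks):
    """The same step on a domain split into blocks, each with one ghost cell per side."""
    k = len(blocks)
    # EXCHANGE: copy the neighbours' boundary values into each block's ghost cells.
    padded = [[blocks[b - 1][-1]] + blocks[b] + [blocks[(b + 1) % k][0]] for b in range(k)]
    # COMPUTE: each block updates its local values independently (parallelisable).
    return [[stencil(p[i - 1], p[i], p[i + 1]) for i in range(1, len(p) - 1)]
            for p in padded]


u = [float(i % 5) for i in range(12)]
blocks = [u[0:4], u[4:8], u[8:12]]
for _ in range(3):
    u = step_global(u)
    blocks = step_blocked(blocks)
print([x for b in blocks for x in b] == u)  # True
```

In a real run the blocks live on different cores or blades, so the exchange becomes shared-memory copies or messages; the block size sets the compute-to-communication ratio.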
General-Purpose Manycore Platform
What can be used and what must be accounted for:
• ILP (superscalar, out-of-order execution, branch prediction)
• Task- and thread-level parallelism (multicore/multiprocessor)
• Mixed programming model (shared memory within a blade, message passing between blades)
• Cache system
Special-Purpose Manycore Platform
What can be used and what must be accounted for:
• SIMD
• Task- and thread-level parallelism (hardware multithreading, multicore/multiprocessor)
• Message passing
• Local store model: full user control
Charm++
• Provides a high-level abstraction of a parallel program
• Cooperating message-driven objects called chares
• Support for load balancing, fault tolerance, and automatic checkpointing
• Support for all architectures through a specific low-level layer
• NAMD (molecular dynamics) is implemented in Charm++
Charm++ LB Implementation
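Charm++ ships several load-balancing strategies, one of which is greedy (GreedyLB). The greedy idea is sketched below in plain Python: assign chares, heaviest first, to the currently least-loaded processor. This is not the Charm++ API, and the chare loads are hypothetical.

```python
import heapq


def greedy_balance(loads, processors):
    """Assign objects, heaviest first, to the currently least-loaded processor."""
    heap = [(0.0, p) for p in range(processors)]  # (current load, processor id)
    assignment = {}
    for obj, load in sorted(loads.items(), key=lambda kv: kv[1], reverse=True):
        total, p = heapq.heappop(heap)
        assignment[obj] = p
        heapq.heappush(heap, (total + load, p))
    return assignment


# Hypothetical measured loads (e.g. ms per iteration) of six chares.
loads = {"c0": 5, "c1": 4, "c2": 3, "c3": 3, "c4": 2, "c5": 1}
assignment = greedy_balance(loads, 2)
print(assignment)  # both processors end up with a load of 9
```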
DSE
Search for the optimal values of:
• sites per block
• blocks (chares) per core, per thread, per blade
• communication patterns
Adaptive Meta-classifiers for Text Documents
Prof. Lucian VINTAN, PhD
Daniel MORARIU, PhD
Radu CRETULESCU, PhD student
Introduction
• We investigated a new adaptive meta-classifier for classifying text documents, aimed at increasing classification accuracy.
• In the first processing phase (pre-classification), the meta-classifier uses a non-adaptive selector.
• In the second phase (classification), we use a feed-forward neural network trained with the back-propagation learning method.
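As a hedged illustration of a non-adaptive combination rule, the sketch below uses majority voting across base classifiers. It is an assumption chosen for simplicity and is not necessarily the selector used in M-BP; the adaptive neural-network phase is not shown. The classifier names and labels are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Label proposed by most base classifiers; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def meta_classify(base_predictions):
    """base_predictions[k][d] is classifier k's label for document d."""
    return [majority_vote(votes) for votes in zip(*base_predictions)]


# Hypothetical labels from three base classifiers on four documents.
svm_linear = ["sport", "economy", "politics", "sport"]
svm_poly = ["sport", "politics", "politics", "economy"]
bayes = ["economy", "economy", "sport", "economy"]
print(meta_classify([svm_linear, svm_poly, bayes]))
# ['sport', 'economy', 'politics', 'economy']
```

A fixed rule like this never learns which base classifier to trust for which kind of document; that limitation is what an adaptive meta-classifier addresses.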
The Architecture of the Adaptive Meta-classifier M-BP