CDT PROJECTS 2013-14
John Keane, Software Systems Group
jak@cs.man.ac.uk
1. Data Analytics / Big Data
2. Parallel & Distributed Systems
3. Decision Support Systems
HAPPY TO DISCUSS
Big Data Analytics (IBM funded), with Nenadic
CHALLENGE
• Investigate:
  • Applications: characteristics and predictability
  • Data analytic / machine learning algorithms (relatively simple so far)
  • Software: Map-Reduce, Hadoop
  • Hardware: various platforms
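To illustrate the Map-Reduce programming model named above, here is a minimal single-process Python sketch that counts item occurrences across transaction records. It only mimics the map, shuffle and reduce phases in memory; it is not Hadoop's API, and the function names and sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (item, 1) for every item in one comma-separated record.
    for item in record.split(","):
        item = item.strip()
        if item:
            yield item, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group intermediate values by key (the framework does this in Hadoop).
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values emitted for one key.
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


if __name__ == "__main__":
    transactions = ["nappies, beer", "nappies, milk", "beer"]
    print(map_reduce(transactions))  # {'nappies': 2, 'beer': 2, 'milk': 1}
```

Because each map call and each reduce call depends only on its own input, a framework can run them on different machines; the shuffle step is where data moves between them.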
Bio-medical data analytics, with Nenadic, Zeng, Stivaros (Consultant, RMCH)
• Adverse drug event detection (EU funded)
  • Bayesian/fuzzy association rule algorithms
  CHALLENGE
  • Compare/contrast accuracy of prediction
• Clinical Outcome Mining (Christie Hospital)
  • Data/text-based clinical records: better diagnosis and prediction
  CHALLENGE
  • Illness staging; multi-modal data; changes over time
• Decision Support for Radiology (NIHR funded)
  • Decision aid to support better description of scans
  CHALLENGE
  • Usability; integration with existing tools; links to the literature
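The project uses Bayesian and fuzzy association rules; as a simpler point of reference, the sketch below computes the classic crisp measures (support, confidence, lift) for a rule such as {drug} -> {adverse event}. It is not the project's algorithm, and the drug and event names and records are hypothetical.

```python
from typing import FrozenSet, Sequence, Tuple


def support(records: Sequence[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    # Fraction of records that contain every item of `itemset`.
    return sum(1 for r in records if itemset <= r) / len(records)


def rule_measures(
    records: Sequence[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Tuple[float, float, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    ante = support(records, antecedent)
    cons = support(records, consequent)
    if ante == 0 or cons == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    both = support(records, antecedent | consequent)
    confidence = both / ante
    # Lift > 1: the event co-occurs with the drug more often than its
    # overall frequency alone would suggest.
    lift = confidence / cons
    return both, confidence, lift


if __name__ == "__main__":
    # Hypothetical patient records: drugs taken and events reported.
    records = [
        frozenset({"drug_A", "rash"}),
        frozenset({"drug_A", "drug_B", "rash"}),
        frozenset({"drug_A"}),
        frozenset({"drug_B"}),
        frozenset({"rash"}),
    ]
    s, c, lift = rule_measures(records, frozenset({"drug_A"}), frozenset({"rash"}))
    print(f"support={s:.2f} confidence={c:.2f} lift={lift:.2f}")
    # support=0.40 confidence=0.67 lift=1.11
```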
Itemset Mining Algorithms, e.g. {baby nappies} -> {beer}
• Colossal itemsets:
  - Very high-dimensional datasets
  - Run-time increases exponentially as average row length increases
• Minimal unique itemsets (MUIs) and SUDA (Special Uniques Detection Algorithm):
  - "Risky" records are those most likely to be linked, e.g. 16 years old + widow
  - Records of most concern have many small MUIs
  - SUDA software is used by the ONS (UK) and licensed by the Singaporean government
  - The algorithm is used by the UN/World Bank International Household Survey
CHALLENGES:
• Data structure to represent itemsets during the search process
• Search-space pruning
• Algorithm: bottom-up; top-down; hybrid
• Parallelism
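The notion of a minimal unique itemset can be made concrete with a naive bottom-up search: an itemset (a set of column/value pairs) is unique if exactly one record contains it, and minimal if no proper subset is unique. The sketch below enumerates candidates by size and prunes supersets of MUIs already found. It is a brute-force illustration of the definition, not SUDA itself; the function name, `max_size` parameter and sample microdata are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Sequence, Tuple

Item = Tuple[int, Hashable]  # (column index, value)


def minimal_unique_itemsets(
    rows: Sequence[Sequence[Hashable]], max_size: int
) -> Dict[int, List[FrozenSet[Item]]]:
    """For each row, list its minimal unique itemsets of size <= max_size."""
    item_rows = [frozenset(enumerate(row)) for row in rows]
    result: Dict[int, List[FrozenSet[Item]]] = {i: [] for i in range(len(rows))}
    for i, items in enumerate(item_rows):
        ordered = sorted(items, key=lambda item: item[0])  # stable column order
        # Bottom-up: smaller itemsets first, so any unique subset is found first.
        for size in range(1, max_size + 1):
            for combo in combinations(ordered, size):
                candidate = frozenset(combo)
                # Prune: a superset of a known MUI is unique but not minimal.
                if any(mui <= candidate for mui in result[i]):
                    continue
                count = sum(1 for other in item_rows if candidate <= other)
                if count == 1:
                    result[i].append(candidate)
    return result


if __name__ == "__main__":
    # Hypothetical microdata: (age, marital status, region).
    rows = [
        (16, "widowed", "North"),
        (16, "single", "North"),
        (40, "widowed", "North"),
        (40, "single", "South"),
    ]
    for i, muis in minimal_unique_itemsets(rows, max_size=2).items():
        print(i, [sorted(m, key=lambda item: item[0]) for m in muis])
```

Here row 0 has exactly one MUI, {age 16, widowed}: neither value is unique on its own, but the combination is. The exhaustive enumeration grows combinatorially with row length, which is why the data structure, pruning and search-direction challenges above matter.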
Eco-service composition (EU funded), with Mehandjiev, MBS
• Aims to determine the conditions for achieving eco-friendly, resilient and optimal service compositions on a distributed cloud infrastructure
• Two service optimisation approaches deployed:
  1. Global: analyses end-to-end interaction between services
  2. Local: computes a local optimisation by creating dynamic service chains between service provider and consumer
CHALLENGE
• Energy-efficient load balancing and scheduling
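The difference between global and local optimisation can be seen on a toy chain of two abstract services, each with candidate providers. The sketch assumes a hypothetical energy cost per provider plus a cost for moving data between sites; "global" searches every end-to-end chain, while "local" greedily picks each next provider given the previous one. The costs, site names and greedy rule are illustrative assumptions, not the project's actual optimisation approaches.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical energy cost (arbitrary units) of each candidate provider,
# one dict per abstract service in the chain.
COSTS: List[Dict[str, float]] = [
    {"dc_eu": 4.0, "dc_us": 3.0},  # service 1
    {"dc_eu": 2.0, "dc_us": 6.0},  # service 2
]


def transfer(prev: Optional[str], nxt: str) -> float:
    # Hypothetical cost of moving data between providers; free within one site.
    if prev is None or prev == nxt:
        return 0.0
    return 3.0


def chain_cost(chain: Sequence[str]) -> float:
    processing = sum(COSTS[i][p] for i, p in enumerate(chain))
    transfers = sum(transfer(a, b) for a, b in zip(chain, chain[1:]))
    return processing + transfers


def global_composition() -> Tuple[str, ...]:
    # Global: evaluate every end-to-end chain and keep the cheapest.
    return min(product(*(list(c) for c in COSTS)), key=chain_cost)


def local_composition() -> Tuple[str, ...]:
    # Local: pick, step by step, the provider cheapest given the previous choice.
    chain: List[str] = []
    for options in COSTS:
        prev = chain[-1] if chain else None
        chain.append(min(options, key=lambda p: options[p] + transfer(prev, p)))
    return tuple(chain)


if __name__ == "__main__":
    for name, chain in [("global", global_composition()), ("local", local_composition())]:
        print(name, chain, chain_cost(chain))
    # global ('dc_eu', 'dc_eu') 6.0
    # local ('dc_us', 'dc_eu') 8.0
```

The greedy chain is cheaper to compute but misses the cheaper end-to-end option; the exhaustive search finds it but scales exponentially with chain length.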
HPC + Finance (EU funded, UK Government)
• High Frequency Trading
  • Flash crashes (dramatic, sudden drops in share price): describe and predict
  • Working paper: High Frequency Trading and Mini Flash Crashes, http://arxiv.org/abs/1211.6667
• HPCFinance
  • New models of risk analysis (diverse data integration)
  • Role of HPC in finance and comparison of technologies
  • Trade-offs between accuracy, speed and cost: Cloud, GPGPUs, FPGA (Maxeler box)
CHALLENGES: data engineering; analytics; algorithms; high performance
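As a starting point for describing sudden price drops, the sketch below flags any fall of at least `drop_pct` from the highest price in the preceding `window` ticks, then extends each event to the trough. The threshold, window and tick series are hypothetical, and this simple rule is an assumption, not the mini flash crash definition used in the working paper cited above.

```python
from typing import List, Sequence, Tuple


def detect_sudden_drops(
    prices: Sequence[float], window: int, drop_pct: float
) -> List[Tuple[int, int]]:
    """Return (peak, trough) tick indices of falls of at least drop_pct
    (a fraction, e.g. 0.01 for 1%) from a peak at most `window` ticks back."""
    if window < 1:
        raise ValueError("window must be at least 1 tick")
    events: List[Tuple[int, int]] = []
    j = 1
    while j < len(prices):
        # Highest price in the look-back window (first occurrence on ties).
        start = max(range(max(0, j - window), j), key=lambda k: prices[k])
        peak = prices[start]
        if peak > 0 and (peak - prices[j]) / peak >= drop_pct:
            # Extend the event while the price does not rise again.
            while j + 1 < len(prices) and prices[j + 1] <= prices[j]:
                j += 1
            events.append((start, j))
        j += 1
    return events


if __name__ == "__main__":
    ticks = [100.0, 100.2, 100.1, 99.0, 98.5, 98.4, 100.0, 100.1]
    print(detect_sudden_drops(ticks, window=5, drop_pct=0.01))  # [(1, 5)]
```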
Preference Elicitation from Pairwise Comparison, with Mikhailov, MBS; Siraj, COMSATS IIT, Pakistan
• Decision making is complex in the presence of uncertainty and insufficient knowledge.
• Aim: estimate preferences using pairwise comparison (PC). PC is used when decision makers are unable to assign scores to the available options directly; the judgements provided may be inconsistent.
• Work has proposed consistency measures and prioritisation measures for cases where revision is not allowed.
• The PriEsT tool now includes sensitivity analysis to identify the best solution.
CHALLENGES
• Evolutionary approach to multi-criteria DSS
• Work on the preference elicitation model and tool
• Group decision making
• Bridge PriEsT and R (a popular data mining tool) via XMCDA
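To make "estimate preferences from pairwise comparisons" and "consistency measure" concrete, the sketch below uses two standard textbook techniques: the row geometric mean method to derive priority weights from a reciprocal comparison matrix, and Saaty's consistency index CI = (lambda_max - n) / (n - 1), with lambda_max estimated from the derived weights. These are generic choices, not necessarily the prioritisation or consistency measures proposed in this work or implemented in PriEsT; the judgement matrix is hypothetical.

```python
import math
from typing import List, Sequence


def geometric_mean_weights(matrix: Sequence[Sequence[float]]) -> List[float]:
    # Row geometric mean method: w_i proportional to (prod_j a_ij) ** (1/n).
    n = len(matrix)
    means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(means)
    return [m / total for m in means]


def consistency_index(matrix: Sequence[Sequence[float]], weights: Sequence[float]) -> float:
    # Saaty's CI = (lambda_max - n) / (n - 1), with lambda_max estimated as
    # the mean of (A w)_i / w_i; CI is 0 for a perfectly consistent matrix.
    n = len(matrix)
    if n < 2:
        return 0.0
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    return (lambda_max - n) / (n - 1)


if __name__ == "__main__":
    # a_ij = how strongly option i is preferred to option j (a_ji = 1 / a_ij).
    # Inconsistent: a_12 * a_23 = 6, but a_13 = 4.
    judgements = [
        [1.0, 2.0, 4.0],
        [1 / 2, 1.0, 3.0],
        [1 / 4, 1 / 3, 1.0],
    ]
    w = geometric_mean_weights(judgements)
    print([round(x, 3) for x in w], round(consistency_index(judgements, w), 4))
```

For a perfectly consistent matrix (a_ij = w_i / w_j) the method recovers the weights exactly and CI is 0; any inconsistency makes CI strictly positive.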