HPC over Cloud East-West Neo Medicinal u-Lifecare Research Center Workshop January 2014 Presented By: Muhammad Bilal Amin Cloud Computing Team, Ubiquitous Computing Lab Kyung Hee University, Global Campus, Korea
Agenda • High Performance Computing over Cloud • Motivation for HPC over Cloud (HPCoC) • Related work • HPCoC Architecture • HPCoC Contribution • SPHeRe • Motivation for SPHeRe • Implementation Details • Evaluation • SPHeRe’s Contributions & Achievements • Conclusion
UCLab Cloud Infrastructure [Figure: physical machines running hypervisors (VMware ESXi, Xen) or a native Windows 7 x64 OS host the virtual machines. Hadoop VMs run a Java Runtime on Linux guests; other VMs run Windows 7 guests. The deployment provides 4 + 16 = 20 virtual nodes.]
HPCoC Contributions & Uniqueness • A unified Java-based high-performance platform for Grande applications (data- and computation-intensive). • Cloud-enable Java-based HPC messaging and distribution middleware, e.g., MPJ-Core, providing MPI-like messaging with fault tolerance incorporated from Hadoop. • Implement parallel computation-intensive and data-intensive processing on unshared data in MapReduce through in-map/in-reduce parallelism. • Green HPC: virtualized resources are a big step for HPC toward green, energy-efficient computing. • Release the solution under an open-source license for the academic community.
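One possible reading of "in-map parallelism" is that a single map call spreads the work on its own input split across several threads. The sketch below illustrates that idea in plain Java. It does not use the Hadoop API, and the class name `InMapParallelismSketch`, the method `map`, and the word-count workload are all hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative only: a stand-in for a map function, not Hadoop's Mapper. */
final class InMapParallelismSketch {

    /**
     * Hypothetical map function: takes one input split (a list of lines)
     * and counts tokens. The lines of the split are processed by several
     * threads of the common fork/join pool via a parallel stream.
     */
    static Map<String, Long> map(List<String> split) {
        return split.parallelStream()
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(token -> !token.isEmpty())
                // A concurrent collector lets all threads write into one map.
                .collect(Collectors.groupingByConcurrent(token -> token, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(map(Arrays.asList("heart lung heart", "lung liver")));
    }
}
```

Because each split is unshared, the threads inside one map call never contend with other map calls; only the per-split result map is shared among them.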
A Performance Initiative towards Large-scale Bio-medical Ontology Matching by Implementing Thread Level Parallelism (TLP) over Multicore Platforms
Motivation for SPHeRe • Effective ontology matching is a computationally intensive operation (in both processing power and memory), requiring matching algorithms with quadratic complexity to be executed over candidate ontologies • Gross et al., “On Matching Large Life Science Ontologies in Parallel”, Lecture Notes in Computer Science (LNCS), 2010 • Delays in matching results make ontology matching ill-equipped for semi-real-time, semantic web-based systems • Stoilos et al., “A string metric for ontology alignment”, ISWC’05, Heidelberg, Germany, 2005 • The core techniques for achieving better performance relate either to the optimization of matching algorithms or to the fragmentation of ontologies for matching algorithms. Utilization of parallel and distributed platforms has largely been missing • P. Shvaiko and J. Euzenat, “Ontology matching: State of the art and future challenges”, IEEE Transactions on Knowledge and Data Engineering, January 2013 • Commodity hardware is capable of parallelism, i.e., multi-core processors over a distributed platform (Cloud) • Amin et al., “High Performance Java Sockets (HPJS) for scientific Health Clouds”, 13th IEEE HealthCom, Beijing, 2012 • Cloud is affordable (utility-based pricing) and available (ubiquitous) • Armbrust et al., “A view of Cloud Computing”, Communications of the ACM, April 2010 • “Research Opportunity: Ontology matching over parallel and distributed commodity hardware”
Implementation Challenges 1. “End-to-End Parallelism” • 1. Resolution: • A methodology that exploits parallelism from loading through delivery
Implementation Challenges 2. “Memory Strain” • Related information that is not required at a given moment floods memory • Parsing and loading for inference vs. parsing and loading for matching • Java heap blow-up (a 2 GB heap is not enough) • Unable to iterate over the properties of FMA and NCI • Cloud instances have limited memory per instance • 2. Resolution: • Load only what is needed (smaller memory footprint during execution)
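The "load what we need" resolution can be sketched as follows. This is a minimal illustration under stated assumptions, not SPHeRe's actual implementation: the types `FullConcept` and `LabelEntry`, their fields, and the choice of a label-only subset are hypothetical. The sketch extracts just the fields a label-based matcher would read and caches them in serialized form, so a later run can load the small subset in a single step.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class SubsetLoadingSketch {

    /** Hypothetical full concept, as a general-purpose parser might hold it. */
    static final class FullConcept {
        final String uri;
        final String label;
        final List<String> axioms; // inference-related data a label matcher never reads

        FullConcept(String uri, String label, List<String> axioms) {
            this.uri = uri;
            this.label = label;
            this.axioms = axioms;
        }
    }

    /** Hypothetical lightweight entry: only what a label-based matcher needs. */
    static final class LabelEntry implements Serializable {
        private static final long serialVersionUID = 1L;
        final String uri;
        final String label;

        LabelEntry(String uri, String label) {
            this.uri = uri;
            this.label = label;
        }
    }

    /** Keeps URI and label, drops everything else. */
    static ArrayList<LabelEntry> extractLabelSubset(List<FullConcept> ontology) {
        ArrayList<LabelEntry> subset = new ArrayList<>(ontology.size());
        for (FullConcept concept : ontology) {
            subset.add(new LabelEntry(concept.uri, concept.label));
        }
        return subset;
    }

    /** Serializes the subset; the bytes could be written to a cache file. */
    static byte[] serialize(ArrayList<LabelEntry> subset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(subset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Single-step load of a previously cached subset. */
    @SuppressWarnings("unchecked")
    static ArrayList<LabelEntry> deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ArrayList<LabelEntry>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The design choice here is that the expensive parse happens once; every matcher afterwards reads a subset shaped for its own needs instead of the full ontology.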
Implementation Challenges 3. “Accuracy Preservation” • 3. Resolution: • Decoupling of matching algorithms from distribution, so that the matching logic itself is unchanged by parallel execution
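A minimal sketch of this decoupling, assuming a hypothetical `Matcher` interface and a trivial exact-label matcher (neither is taken from SPHeRe): the matcher knows nothing about threads, and the distribution layer only decides which comparisons run where. Because the same matcher code runs in both paths, the parallel result can be checked against the sequential one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class DecoupledMatchingSketch {

    /** A matching algorithm: no knowledge of threads or distribution. */
    interface Matcher {
        double similarity(String source, String target);
    }

    /** Hypothetical trivial matcher: case-insensitive label equality. */
    static final Matcher EXACT_LABEL = (s, t) -> s.equalsIgnoreCase(t) ? 1.0 : 0.0;

    /** Sequential baseline: every source label against every target label. */
    static List<String> matchSequential(Matcher matcher, List<String> sources,
                                        List<String> targets, double threshold) {
        List<String> mappings = new ArrayList<>();
        for (String s : sources) {
            for (String t : targets) {
                if (matcher.similarity(s, t) >= threshold) {
                    mappings.add(s + " = " + t);
                }
            }
        }
        return mappings;
    }

    /** Distribution layer: one task per source label, matcher left untouched. */
    static List<String> matchParallel(Matcher matcher, List<String> sources,
                                      List<String> targets, double threshold, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<List<String>>> futures = new ArrayList<>();
            for (String s : sources) {
                List<String> single = Collections.singletonList(s);
                futures.add(CompletableFuture.supplyAsync(
                        () -> matchSequential(matcher, single, targets, threshold), pool));
            }
            List<String> mappings = new ArrayList<>();
            for (CompletableFuture<List<String>> future : futures) {
                mappings.addAll(future.join()); // joined in submission order
            }
            return mappings;
        } finally {
            pool.shutdown();
        }
    }
}
```

Swapping in a different `Matcher` requires no change to `matchParallel`, and changing the thread count requires no change to any matcher.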
Implementation Challenges 4. “Thread Safety” • Ontology data is shared among multiple threads (synchronized access leads to sequential execution) • The available OWL frameworks are not thread-safe • Result guarantee • 4. Resolution: • A thread-safe ontology model, shared among multithreaded execution
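One common way to obtain a thread-safe shared model without synchronized access is to make it immutable. The sketch below assumes that approach; the source does not say how SPHeRe's model achieves thread safety, and the `OntologyModel` class with its URI-to-label map is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ImmutableOntologySketch {

    /** Hypothetical read-only ontology model: URI -> label, frozen at construction. */
    static final class OntologyModel {
        private final Map<String, String> labelsByUri;

        OntologyModel(Map<String, String> labelsByUri) {
            // Defensive copy, then an unmodifiable view: nothing can mutate it later.
            this.labelsByUri = Collections.unmodifiableMap(new LinkedHashMap<>(labelsByUri));
        }

        String labelOf(String uri) {
            return labelsByUri.get(uri);
        }

        int size() {
            return labelsByUri.size();
        }
    }

    /** Reads the shared model from several threads at once, with no locks. */
    static int countKnownUris(OntologyModel model, List<String> uris, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Integer>> futures = new ArrayList<>();
            for (String uri : uris) {
                futures.add(CompletableFuture.supplyAsync(
                        () -> model.labelOf(uri) != null ? 1 : 0, pool));
            }
            int found = 0;
            for (CompletableFuture<Integer> future : futures) {
                found += future.join();
            }
            return found;
        } finally {
            pool.shutdown();
        }
    }
}
```

Since the model never changes after construction, concurrent reads need no `synchronized` blocks, which avoids the sequential execution mentioned above.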
Implementation Challenges 5. “Scalability with Optimal Resource Utilization” • Exploit the available computational resources evenly for concurrency (effective load balancing) • Implementation of the right parallelism technique (partitioning) • Better reduction rate • 5. Resolution: • Effective distribution of matching requests over the available computational resources
Matcher Distribution • The matching request received by the system is subdivided from macro (matching request) to micro (matching task) level
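The macro-to-micro subdivision can be sketched as an even partitioning of the source concepts into one slice per worker. This is an assumption about the partitioning scheme, not a description of SPHeRe's distributor; the `MatchingTask` type and the `subdivide` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MatcherDistributionSketch {

    /** Hypothetical micro-level unit: the slice [from, to) of the source concepts. */
    static final class MatchingTask {
        final int from;
        final int to;

        MatchingTask(int from, int to) {
            this.from = from;
            this.to = to;
        }

        int size() {
            return to - from;
        }
    }

    /**
     * Splits a matching request over sourceSize concepts into at most
     * `workers` tasks whose sizes differ by no more than one concept.
     */
    static List<MatchingTask> subdivide(int sourceSize, int workers) {
        if (sourceSize < 0 || workers < 1) {
            throw new IllegalArgumentException("sourceSize must be >= 0 and workers >= 1");
        }
        List<MatchingTask> tasks = new ArrayList<>();
        int base = sourceSize / workers;  // minimum slice size
        int extra = sourceSize % workers; // the first `extra` slices get one more
        int from = 0;
        for (int w = 0; w < workers && from < sourceSize; w++) {
            int size = base + (w < extra ? 1 : 0);
            tasks.add(new MatchingTask(from, from + size));
            from += size;
        }
        return tasks;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        for (MatchingTask task : subdivide(1000, cores)) {
            System.out.println("task [" + task.from + ", " + task.to + ")");
        }
    }
}
```

Each task would then be handed to one thread or one cloud instance; keeping slice sizes within one concept of each other is a simple form of the load balancing named in the scalability challenge.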
Mappings Aggregation • Responsible for accumulating the matched results, creating a corresponding Bridge Ontology (Mapping), and delivering it
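The accumulation step can be sketched as merging the per-task result lists into one duplicate-free, deterministically ordered collection. The names below are hypothetical, and the plain-text output is only a stand-in: a real bridge ontology would be written in an ontology format such as OWL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class MappingAggregationSketch {

    /** Merges per-task results, dropping duplicates and sorting the mappings. */
    static List<String> aggregate(List<List<String>> partialResults) {
        TreeSet<String> merged = new TreeSet<>();
        for (List<String> part : partialResults) {
            merged.addAll(part);
        }
        return new ArrayList<>(merged);
    }

    /** Placeholder rendering, one mapping per line (not an actual OWL file). */
    static String toBridgeText(List<String> mappings) {
        return String.join("\n", mappings);
    }
}
```

Sorting makes the delivered mapping independent of the order in which parallel tasks happen to finish.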
SPHeRe Performance Evaluation Large Scale Biomedical Ontology Matching tool over High Performance Computing
Ontology Loading Time: 3x faster loading
Total Memory Footprint: 8x more memory-efficient
Scalability (Reduction Score): outperforms by 40%
Performance Evaluation: ~4x to 8x more performance-efficient
Uniqueness / Contributions Exploitation of Parallel Commodity Hardware for Matching • Implements data-parallelism-based distribution of candidate ontology subsets over the multicore hardware of a multicore platform, and provides a collection of mappings among the ontologies as a bridge ontology file End-to-End Performance Initiative (from loading till delivery) • Creates subsets of ontologies depending on the needs of the matching algorithms and caches them in serialized formats, providing single-step ontology loading for matching algorithms in parallel Smaller Memory Footprint • Each subset is lightweight due to matcher-based and redundancy-free creation, providing a smaller memory footprint and contributing to overall system performance Better Scalability • Utilizes computational resources efficiently with the help of its matching task distribution
Achievements • OAEI 2013 evaluation at ISWC 2013 (A-rated conference) • SPHeRe was presented and evaluated over the large-scale biomedical track • It was remarked as the first ontology matching system that utilizes distributed Cloud resources • Our first release of this year ranked among the top-15 systems of 2013 (globally) • Microsoft Research Asia Award 2013-2014 • Research funding awarded by Microsoft Research Asia for SPHeRe over the Microsoft Azure platform • Microsoft Azure4Research Award 2014-2015 • SPHeRe for large-scale biomedical ontology matching over the Microsoft Azure platform
Publications • Conferences • Wajahat Ali Khan, Muhammad Bilal Amin, Asad Masood Khattak, Maqbool Hussain, and Sungyoung Lee, “System for Parallel Heterogeneity Resolution (SPHeRe) results for OAEI 2013”, 12th Int. Semantic Web Conference (ISWC), 21-25 October 2013, Sydney, Australia. • Ammar Ahmad Awan, Muhammad Bilal Amin, Shujaat Hussain, Aamir Shafi and Sungyoung Lee, “An MPI-IO Compliant Java based Parallel I/O library”, 13th IEEE CCGrid, Delft, Netherlands, May 2013. • Ammar Ahmad Awan, Muhammad Shoaib Ayub, Aamir Shafi and Sungyoung Lee, “Towards Efficient Support for Parallel I/O in Java HPC”, 13th PDCAT, Beijing, 2012. • Muhammad Bilal Amin, Wajahat Ali Khan, Shujaat Hussain and Sungyoung Lee, “High Performance Java Sockets (HPJS) for healthcare cloud systems”, 13th HealthCom 2012, Beijing, Oct 2012. • Muhammad Bilal Amin, Wajahat Ali Khan, Ammar Ahmad Awan and Sungyoung Lee, “Intercloud Message Exchange Middleware”, 7th ICUIMC 2012, Kuala Lumpur, Malaysia, Feb 2012.
Publications • Journals • Muhammad Bilal Amin, Wajahat Ali Khan and Sungyoung Lee, “SPHeRe: A performance initiative towards ontology matching by implementing parallelism over cloud platforms”, Journal of Supercomputing (SCI, IF 0.9), 2013 • Wajahat Ali Khan, Maqbool Hussain, Muhammad Afzal, Muhammad Bilal Amin, Muhammad Aamir Saleem, and Sungyoung Lee, “Personalized-Detailed Clinical Model for Data Interoperability among Clinical Standards”, Telemedicine and e-Health (SCI, IF 1.416), 2013 • Muhammad Bilal Amin, Wajahat Ali Khan and Sungyoung Lee, “Enabling Data Parallelism for Large-scale Bio-medical Ontology Matching over Multicore Platforms”, Journal of Applied Intelligence (SCI, IF 1.8) (under review), 2014
Conclusion • HPC over cloud is a very cost-effective solution that offers the capabilities of expensive clusters or grids • To fully exploit it, effort is required to implement platforms and applications for computation- and data-intensive problems. • Applications like SPHeRe can be built to resolve compute- and data-intensive problems over multicore platforms where performance is needed. • Commodity hardware requires fewer man-hours for maintenance and consumes far less energy, which makes it an excellent candidate for “Green Computing”.