A.R.M.S. Active Resource Management Services Presentation One
Outline • Introductions • Societal Issue Examined • Michael Rajs
Outline • Group Members and Roles • Introduce Mentor • Societal Issue • History • Case Study • Problem Statement • Computer Components Identified • Major Functional Component Diagram • Current Process Flow • Solution Statement • Objectives • Improved Process Flow • Competition Identified • Benefits of Solution • Problems with Solution • Conclusion • References
Group Members and Roles • Michael Rajs (Group Manager) • Adam Willis (Research Specialist) • Sybil Acotanza (Visualization Engineer) • Scott Pardue (Team Leader) • Jordan Heinrichs (Marketing Analyst) • David Crook (Documentation Specialist)
Yaohang Li • Is an Associate Professor in the Department of Computer Science at Old Dominion University. • His research interests are in Computational Biology, Markov Chain Monte Carlo (MCMC) methods and Parallel Distributed Grid Computing.
What is the societal issue being faced? How do researchers handle the massive amounts of data they are collecting?
Historical Background Adam Willis
Collection of Data • 1890: Census recorded with an electric machine [1] • 1935: Social Security Act [2] • 1974: Privacy Act [3] • 1989: World Wide Web [4] • 1997: Big Data [5] • 2011: IBM’s Watson [6] • Now: “Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone.” [7]
Examples of Big Data • Large Hadron Collider [8]: 150 million sensors report 40 million times per second • Facebook [9], per day: 2.5 billion content items shared; 2.7 billion “Likes”; 300 million photos uploaded • Walmart [8]: 1 million customer transactions per hour; 2.5 petabytes of data stored
Big Data Analysis Hardware • Cluster Computing [10] • A cluster consists of many nodes (computers). • Big data can be generated and analyzed more quickly by spreading the workload among the nodes.
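As a rough illustration of why spreading work across nodes shortens total run time, the sketch below assigns jobs to the least-loaded node and compares the finishing time on one node with the finishing time on four. The job durations and the greedy longest-first assignment rule are assumptions made for illustration; they are not taken from the Dinosolve cluster.

```python
import heapq


def makespan(job_seconds, node_count):
    """Return the time at which the last node finishes its work.

    Greedy rule (an illustrative assumption): take jobs longest-first and
    give each one to the node that currently has the least work.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    loads = [0.0] * node_count           # min-heap of per-node total work
    heapq.heapify(loads)
    for seconds in sorted(job_seconds, reverse=True):
        lightest = heapq.heappop(loads)  # node with the least work so far
        heapq.heappush(loads, lightest + seconds)
    return max(loads)


# Hypothetical job durations in seconds.
jobs = [30, 20, 20, 10, 10, 10, 5, 5]
single_node = makespan(jobs, 1)   # all work runs back to back
four_nodes = makespan(jobs, 4)    # work is spread over four nodes
```

With these made-up durations the total work is unchanged, but the four-node run cannot finish sooner than its longest single job, which is one reason real speedups are usually smaller than the node count.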
Managing the Cluster • Distributed Resource Management Systems (D-RMS) • Job management subsystem • Physical resource management subsystem • Scheduling and queuing subsystem
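To make the three subsystems listed above more concrete, here is a minimal toy model, assuming a first-in, first-out queue and a single "slots" resource per node. The class and method names are hypothetical, and real systems such as SGE are far more elaborate; this is only a sketch of how the pieces relate.

```python
from collections import deque


class TinyResourceManager:
    """Toy model of the three D-RMS subsystems (not SGE itself)."""

    def __init__(self, node_slots):
        self.free_slots = dict(node_slots)  # physical resource management
        self.queue = deque()                # scheduling and queuing
        self.jobs = {}                      # job management: id -> record

    def submit(self, job_id, slots_needed):
        self.jobs[job_id] = {"slots": slots_needed, "state": "queued", "node": None}
        self.queue.append(job_id)

    def schedule(self):
        """Start queued jobs in FIFO order; stop at the first that does not fit."""
        started = []
        while self.queue:
            job_id = self.queue[0]
            needed = self.jobs[job_id]["slots"]
            node = next(
                (n for n, free in self.free_slots.items() if free >= needed), None
            )
            if node is None:
                break                       # head of the queue must wait
            self.queue.popleft()
            self.free_slots[node] -= needed
            self.jobs[job_id].update(state="running", node=node)
            started.append(job_id)
        return started

    def finish(self, job_id):
        record = self.jobs[job_id]
        if record["state"] != "running":
            raise ValueError(f"{job_id} is not running")
        self.free_slots[record["node"]] += record["slots"]  # release resources
        record["state"] = "done"


manager = TinyResourceManager({"node1": 4, "node2": 2})
manager.submit("a", 3)
manager.submit("b", 2)
manager.submit("c", 4)
first_round = manager.schedule()   # "a" and "b" fit; "c" must wait
manager.finish("a")
second_round = manager.schedule()  # node1 is free again, so "c" starts
```

Strict FIFO is the simplest policy to reason about; production schedulers typically add priorities and backfilling so that small jobs do not wait behind a large one.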
Case Study Sybil Acotanza
Dinosolve Case Study • Bioinformatics • Disulfide bond prediction program (Cronk, 2012)
Dinosolve Users • Who will use it? • Drug and antibody design • Bio-energy development • Genetic mapping [11] • Why will they use it? • 2% accuracy improvement [12]
Dinosolve Web Site (Li & Yaseen, http://hpcr.cs.odu.edu/dinosolve/)
Dinosolve Possible Problems • Hard resources for computation • CPU cycles • Memory • Disk space • Network bandwidth • Server crashes
Problem Statement • Components of Hardware and Software • Current Process Flow • Scott Pardue
What is the problem? Processing big data sets is computationally expensive, and as the volume of queries grows, system performance will progressively degrade until the system fails.
What are the components of our current system? The current system uses the following software and hardware.
Software • Unix operating system installed on the Dinosolve cluster • Dinosolve algorithm • Sun Grid Engine, installed on the cluster, which will serve as our Distributed Resource Management System (D-RMS) • MySQL (database software) • Web-based user interface (website)
Hardware • MySQL database server • A computer cluster to run the Dinosolve algorithm • Web server for our web-based user interface
Solution Statement • Objectives • Improved Process Flow • Competition Identified • Jordan Heinrichs
How will we correct the problem? We aim to configure a distributed resource management system (D-RMS), in this case Sun Grid Engine (SGE), to handle resource allocation on the Dinosolve cluster.
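One way resource allocation is expressed to SGE is through the limits requested when a job is submitted with `qsub`. The sketch below only assembles such a command line and does not run it. The script name, job name, and limit values are hypothetical, and whether `h_vmem`, `h_rt`, or a given parallel environment can be requested depends on how the cluster administrator has configured SGE.

```python
import shlex


def build_qsub_command(script, job_name, memory="2G", runtime="01:00:00",
                       pe_name=None, slots=1):
    """Assemble an SGE `qsub` argument list without running it."""
    command = [
        "qsub",
        "-N", job_name,   # job name shown by qstat
        "-cwd",           # run the job from the submission directory
        "-l", f"h_vmem={memory},h_rt={runtime}",  # hard memory and run-time limits
    ]
    if pe_name is not None:
        # Parallel environment names are site-specific (assumption: one exists).
        command += ["-pe", pe_name, str(slots)]
    command.append(script)
    return command


# Hypothetical script and job names, for illustration only.
command = build_qsub_command("run_prediction.sh", "dinosolve_job")
command_line = shlex.join(command)
```

Building the argument list separately from executing it makes the requested limits easy to log and review before any job reaches the cluster.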
Objectives • Interpret and visualize current usage statistics • Configure, utilize, and optimize SGE • Provide an aesthetically pleasing and professional user interface
Competing Distributed Resource Management Systems • Sun Grid Engine (SGE) • Portable Batch System (PBS) • Load Sharing Facility (LSF)
Competing Resource Management Systems Reference 31
Competing Protein Prediction Servers References 19, 20, and 21
Benefits of Solution • Problems with Solution • Conclusion • David Crook
What benefits will come from attaining our goals? • Efficient utilization of available resources • Increased throughput of the cluster • An intuitive and professional user interface • Rise in popularity due to excellent accuracy, efficiency, and professional design
Problems with solution • Improper synchronization of cluster resources can lead to a deadlock in the system • Race conditions between the HPCR cluster and the MySQL database
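A common way to avoid one kind of race between cluster workers and a job database is to claim a job with a single conditional update, so that only one worker can succeed. The sketch below assumes a hypothetical `jobs` table and uses Python's built-in `sqlite3` as a stand-in for MySQL; it illustrates the idea and is not the project's actual schema or code.

```python
import sqlite3


def claim_next_job(conn, worker):
    """Atomically claim one queued job for `worker`; return its id or None."""
    while True:
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # nothing left to claim
        # The status check in WHERE makes the claim safe: if another worker
        # took this row after our SELECT, the UPDATE matches zero rows.
        cursor = conn.execute(
            "UPDATE jobs SET status = 'running', worker = ? "
            "WHERE id = ? AND status = 'queued'",
            (worker, row[0]),
        )
        conn.commit()
        if cursor.rowcount == 1:
            return row[0]
        # Lost the race for this row; loop and try the next queued job.


# Hypothetical schema and data, held in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, worker TEXT)")
conn.executemany("INSERT INTO jobs (status) VALUES (?)", [("queued",), ("queued",)])
conn.commit()

first = claim_next_job(conn, "node1")
second = claim_next_job(conn, "node2")
third = claim_next_job(conn, "node3")   # no queued jobs remain
```

This addresses races over job ownership only; avoiding deadlock between cluster resources is a separate concern, usually handled by acquiring shared resources in a fixed order.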
Conclusion With an updated user interface and a correctly configured Sun Grid Engine, we hope to establish a reputable disulfide bonding prediction server.
References for history
1. http://www.columbia.edu/cu/computinghistory/hh/index.html
2. http://query.nytimes.com/gst/abstract.html?res=F50C11FE385D13728DDDAE0A94DA415B868FF1D3
3. http://www.census.gov/history/pdf/kraus-natdatacenter.pdf
4. http://www.bbc.co.uk/history/historic_figures/berners_lee_tim.shtml
5. http://dl.acm.org/citation.cfm?id=266989.267068&coll=DL&dl=GUIDE
6. http://www.nytimes.com/2012/08/12/business/how-big-data-became-so-big-unboxed.html?_r=1
7. http://www-01.ibm.com/software/data/bigdata/
8. http://en.wikipedia.org/wiki/Big_data
9. http://techcrunch.com/2012/08/22/how-big-is-facebooks-data-2-5-billion-pieces-of-content-and-500-terabytes-ingested-every-day/
10. http://en.wikipedia.org/wiki/Computer_cluster
References for case study
11. Li, Y. (2010, September 1). CAREER: Novel Sampling Approaches for Protein Modeling Applications [Abstract]. National Science Foundation Award Abstract #1066471.
12. Li, Y., & Yaseen, A. (2012). Enhancing Protein Disulfide Bonding Prediction Accuracy with Context-based Features. Biotechnology and Bioinformatics Symposium.
13. Bioinformatics. (2011). In Merriam-Webster.com. Retrieved February 15, 2013, from http://www.merriam-webster.com/dictionary/bioinformatics
14. Cronk, J. D. (2012). Disulfide Bond. Retrieved February 15, 2013, from Biochemistry Dictionary: http://guweb2.gonzaga.edu/faculty/cronk/biochem/D-index.cfm?definition=disulfide_bond
15. Yan, Y., & Chapman, B. (2008). Comparative Study of Distributed Resource Management Systems: SGE, LSF, PBS Pro, and LoadLeveler. Technical Report, CiteSeerX.
16. Li, Y., & Yaseen, A. (2012). Dinosolve. Retrieved from http://hpcr.cs.odu.edu/dinosolve/
References for competition
17. Arvind Krishna, “Why Big Data? Why Now?”, IBM, 2011. URL: http://almaden.ibm.com/colloquium/resources/Why%20Big%20Data%20Krishna.PDF
18. Yonghong Yan, Barbara M. Chapman, Comparative Study of Distributed Resource Management Systems - SGE, LSF, PBS Pro, and LoadLeveler, Department of Computer Science, University of Houston, May 2005 (pdf)
19. Dr. Li’s site: http://hpcr.cs.odu.edu/dinosolve/
20. Scratch Predictor: http://scratch.proteomics.ics.uci.edu/
21. DiANNA server: http://clavius.bc.edu/~clotelab/DiANNA/
Portable Batch System (PBS)
22. http://resources.altair.com/pbs/documentation/support/PBSProUserGuide12-2.pdf
23. http://www.pbsworks.com/SupportDocuments.aspx?AspxAutoDetectCookieSupport=1
24. http://resources.altair.com/pbs/documentation/support/PBSProRefGuide12-2.pdf
25. http://resources.altair.com/pbs/documentation/support/PBSProAdminGuide12-2.pdf
26. http://www.pbsworks.com/(S(tykrsyqbemmlf3o5zwrmjrgf))/images/solutions-en-US/PBS-Pro_Datasheet-USA_WEB.pdf
27. http://agendafisica.files.wordpress.com/2011/05/pbs.pdf
Moab HPC Suite
28. http://www.adaptivecomputing.com/publication/420/wppa_open/
IBM Platform LSF
29. http://public.dhe.ibm.com/common/ssi/ecm/en/dcd12354usen/DCD12354USEN.PDF
Apache Hadoop with Zookeeper
30. http://zookeeper.apache.org/doc/current/zookeeperOver.html
31. http://www.cloud-net.org/~swsellis/tech/solaris/performance/doc/blueprints/0102/jobsys.pdf