Astronomy databases • Helen Xiang, University of Portsmouth • Using the NGS database resources • Large amounts of data present difficulties with both storage and access for users. Helen used the Oracle databases hosted on the NGS to store large amounts of data from the Sloan Digital Sky Survey (SDSS). • Succeeded in transferring almost 2 TB of SDSS data to the NGS Oracle database in Manchester. A separate Microsoft SQL Server database at Portsmouth holds another 2 TB of similar data. Joint queries spanning the two databases have been run successfully (see the sketch below).
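Joint queries of this kind are typically issued from the Oracle side through a database link. The sketch below is illustrative only: the connection string, link name (sdss_portsmouth) and table and column names are hypothetical, as the project's actual schema is not documented here.

```python
# Minimal sketch of a joint query spanning the NGS Oracle database and a
# remote database reached through a (hypothetical) database link.
import cx_Oracle  # Oracle client library for Python

conn = cx_Oracle.connect("user/password@ngs-oracle-manchester")  # hypothetical DSN
cur = conn.cursor()

# "specobj@sdss_portsmouth" refers to a table on the remote Portsmouth
# database via an Oracle database link; all names here are illustrative.
cur.execute("""
    SELECT p.objid, p.ra, s.redshift
    FROM   photoobj p
    JOIN   specobj@sdss_portsmouth s ON p.objid = s.objid
    WHERE  p.ra BETWEEN :ra_min AND :ra_max
""", ra_min=150.0, ra_max=151.0)

for row in cur.fetchmany(10):
    print(row)
```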
Real-time visualization of blood flow through the brain • Steven Manos, Marco Mazzeo, Peter Coveney, University College London (UCL) • The GENIUS project (Grid Enabled Neurosurgical Imaging Using Simulation) studies blood flow around the brain prior to surgery, greatly increasing the chances of surgical success. • MRI scans provide the data to reconstruct the neurovascular structure, and a lattice-Boltzmann fluid flow solver is then used to simulate the flow of blood. • GENIUS makes use of MPI-g, a grid-enabled implementation of MPI, and the HARC (Highly Available Robust Co-scheduler) system, which allows multiple machines to be reserved at a specified time.
Development of a chemical properties database • Keiron Taylor, University of Southampton • Developing methods of handling large amounts of data, as chemical data now needs multiple annotations and metadata associated with it to make any real sense to the user. • The need for this information to be stored and easily available increases the difficulty of maintaining databases, so a 'semantic web' approach is being taken. • Created a Resource Description Framework (RDF) triple store for chemical data (see the sketch below), and is looking at using Oracle 10.2 and 11g databases to improve the speed of querying. • The aim is to dynamically combine data from both triplestores without needing to copy the entire databases. RDF is flexible, so it can cope with continuously changing data and annotations, unlike traditional relational databases, and an RDF triplestore allows more complex queries to be run on the data.
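For readers unfamiliar with triplestores, here is a minimal sketch of the idea using the rdflib Python library; the namespace, property names and compound data are invented for illustration and are not the project's actual schema.

```python
# Minimal sketch of an RDF triple store for chemical data using rdflib.
from rdflib import Graph, Literal, Namespace, URIRef

CHEM = Namespace("http://example.org/chem/")  # illustrative namespace
g = Graph()

# Each fact is a (subject, predicate, object) triple; new annotation
# types can be added at any time without changing a fixed schema.
aspirin = URIRef(CHEM["aspirin"])
g.add((aspirin, CHEM.formula, Literal("C9H8O4")))
g.add((aspirin, CHEM.molecularWeight, Literal(180.16)))
g.add((aspirin, CHEM.synonym, Literal("acetylsalicylic acid")))

# SPARQL query over the store.
results = g.query("""
    PREFIX chem: <http://example.org/chem/>
    SELECT ?compound ?formula
    WHERE { ?compound chem:formula ?formula . }
""")
for compound, formula in results:
    print(compound, formula)
```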
Modelling criminal patterns using the NGS • Nick Malleson, University of Leeds • Built an application to predict the effects of new environmental developments or policies (e.g. housing developments or improved transport networks) on crime rates. • Uses agent-based modelling, in which an agent is an independent component of a system that interacts with other agents and its environment; large systems of agents can be created to mimic real scenarios (see the sketch below). • The simulation model is written in Java using the Repast Simphony agent-based modelling toolkit. Multiple compute nodes were used to run separate models simultaneously, and the large amounts of data created are stored in an NGS Oracle database. • The model is computationally expensive, with each run taking days to complete on a desktop PC. Using the NGS enabled hundreds of identical simulations to run simultaneously on different nodes, giving hundreds of results in only a few days. • "Without the use of NGS resources the project would not have had the computational power it required to generate reliable, robust results."
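The project's model itself is written in Java with Repast Simphony; the short Python sketch below only illustrates the general agent-based pattern of independent agents interacting with a shared environment over discrete time steps. All names and behaviours are invented.

```python
# Minimal agent-based model sketch: agents step through time and
# interact with a shared environment (illustrative, not the project code).
import random

class Burglar:
    """An agent with a home location that picks nearby targets."""
    def __init__(self, home, awareness=5):
        self.home = home            # (x, y) grid coordinate
        self.awareness = awareness  # how far the agent ranges

    def step(self, environment):
        # Choose a target within the agent's awareness space.
        x = self.home[0] + random.randint(-self.awareness, self.awareness)
        y = self.home[1] + random.randint(-self.awareness, self.awareness)
        if environment.is_residential(x, y):
            environment.record_burglary(x, y)

class Environment:
    def __init__(self, size):
        self.size = size
        self.crimes = []

    def is_residential(self, x, y):
        return 0 <= x < self.size and 0 <= y < self.size

    def record_burglary(self, x, y):
        self.crimes.append((x, y))

# One simulation run: many independent agents stepping through time.
env = Environment(size=100)
agents = [Burglar(home=(random.randrange(100), random.randrange(100)))
          for _ in range(500)]
for tick in range(1000):
    for agent in agents:
        agent.step(env)
print(f"{len(env.crimes)} simulated burglaries recorded")
```

Because each run is independent, hundreds of such models can be farmed out to separate compute nodes, which is exactly how the NGS was used here.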
High-throughput virtual drug screening using the NGS • Narcis Fernandes-Fuentes, University of Leeds • Carried out simulations of over 300,000 potential compounds to see how well they can bind to a highly relevant therapeutic target, aiming to find promising binders to take forward for experimental verification. • Simulations were carried out using the Autodock and Autogrid applications. The large number of jobs was spread across several NGS sites, with the SRB service used to store the large collection of input files and to provide a standard method of accessing them, independent of where a particular simulation was running. • This allowed a very large number of simulations (almost 1 million CPU hours used) to take place in a few months; the work would have taken over a year to complete on local resources.
The effects of defibrillation on the heart • Thushka Maharaj, University of Oxford • Studying the effects of applying an electrical shock to both healthy and diseased hearts to understand exactly how defibrillation works. • Simulates how the application of electric shocks and differing tissue properties affect the behaviour of a normal, healthy heart. • Uses parallel code with around a million nodes (computationally intensive) to obtain 20 ms of animation in 20 minutes using 32 CPUs on the NGS. • "I didn't even know the NGS existed before starting my doctorate, but I don't think we could have run these simulations without it. The benefits of services such as the Storage Resource Broker (SRB) are immense - it's fantastic to be able to share data with colleagues all over the world so easily."
Geodemographic modelling • Andy Turner, University of Leeds • Aims to develop national demographic simulations to help answer specific questions and analyse scenarios, e.g. demand for services such as transport schemes. • The basis of the demographic modelling is 2001 census data. • Various datasets from this are used to generate individual- and household-level datasets for 2001. Data is then projected at individual and household level for small areas on a yearly basis up to 2031. • All the modelling code has been written in Java, and parts of it have been parallelised using the MPJ Express software to take advantage of NGS resources. • NGS staff have helped the project migrate its code from its own 32-node cluster to the larger NGS clusters.
Ion channel simulations • Philip Fowler, University of Oxford • Using classical molecular dynamics to study membrane proteins, in particular the ion channels that allow ions such as potassium to diffuse in and out of cells. Ion channel research is extremely important as ion channels are intimately involved in the functioning of the brain and heart. • Well-established molecular dynamics packages, e.g. NAMD and GROMACS, are used to produce the simulations. • The techniques used are well known but take a long time to run: a batch of 34 parallel simulations took 34-68 days, but by using the NGS the same 34 parallel simulations can now be run in under 14 days. • With such a drastic reduction in time, researchers can run more complicated simulation scenarios, e.g. running two mutations, checking results by repeating simulations, and running simulations backwards.
Membrane permeation • Brian Cheney, University of Southampton • Researching what physical and chemical features make a molecule a good or bad permeant, and developing ways to quantify and estimate a molecule's permeability. • This is economically important, as millions of pounds are lost throughout the drug discovery process through the pursuit of lead compounds that turn out to have unsuitable permeability. • To estimate small-molecule permeation, all-atom classical molecular dynamics (MD) simulations of a drug permeating through a membrane were used. • The simulations require many millions of time-steps and would take several years on a modern desktop computer for each drug studied. • Using grid-enabled versions of the MD software package CHARMM on the NGS cut the simulation time from years per drug to about 2 weeks, allowing drug research to move forward much faster than would otherwise have been possible.
mRNA analysis using the NGS • Paul Wilkinson, University of Exeter • Analyzing the transcriptomes of insects that are major crop pests. • The entire transcriptome of an insect pest is sequenced using the latest 'next generation' DNA sequencing technologies. Assembling this sequence data generates in excess of 40,000 sequences, which are then annotated. • Sequencing transcriptomes can help in understanding how insects develop resistance to pesticides. • The homology searches are performed on the NGS using a parallel implementation of NCBI BLAST which distributes the work through database fragmentation, query segmentation, intelligent scheduling and parallel I/O (see the query-segmentation sketch below). • Plans to use mpiBLAST, which should improve NCBI BLAST performance by several orders of magnitude while scaling to hundreds of processors. • "The NGS staff have been a great help to us in the implementation of our high-throughput BLASTs. Making use of the NGS resources has significantly reduced the time it takes to process our data."
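Query segmentation is conceptually simple: the set of query sequences is split into chunks that run as independent BLAST jobs. A minimal sketch with invented file names and chunk size, not a reproduction of the specific parallel BLAST implementation used:

```python
# Minimal sketch of query segmentation for parallel BLAST: split a large
# FASTA file of assembled sequences into chunks, one per compute job.

def split_fasta(path, sequences_per_chunk=1000):
    """Yield lists of FASTA records, each list sized for one BLAST job."""
    chunk, record = [], []
    with open(path) as fasta:
        for line in fasta:
            if line.startswith(">") and record:
                chunk.append("".join(record))
                record = []
                if len(chunk) == sequences_per_chunk:
                    yield chunk
                    chunk = []
            record.append(line)
    if record:
        chunk.append("".join(record))
    if chunk:
        yield chunk

for i, chunk in enumerate(split_fasta("transcriptome.fasta")):
    with open(f"chunk_{i:04d}.fasta", "w") as out:
        out.writelines(chunk)
    # Each chunk_*.fasta is then submitted as an independent BLAST job
    # against the reference database.
```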
Quantitative Genetic Analyses on the NGS • Jean-Alain Grunchec, University of Edinburgh • QTL Express was developed 8 years ago as a web site allowing geneticists to run Quantitative Trait Loci (QTL) analyses. It distributed calculations across a set of 6 local dedicated computers, but computational demands often exceeded the capacity of the local cluster. • GridQTL provides web services that extend QTL Express's capabilities and allow scientists to run the newer QTL mapping methods on the NGS. The Globus Toolkit is used to submit jobs to several NGS sites, giving users an automated, robust, fast, dynamic, transparent and free service thanks to the combined use of the SWARM meta-scheduler, an AJAX-driven GridSphere interface and the NGS resources. • Some analyses that would have taken 30 hours on a single-core computer can be done in 12 minutes on the NGS, allowing some users to run many such analyses per day. Epistasis analyses also run faster: an analysis that previously took 26 days can now be done in 15 hours.
Molecule formation from ultracold gases • Tom Hanna, University of Oxford • Studying the dynamics of ultracold gases, in particular molecule formation at very low temperatures, less than 1 µK above absolute zero. • Wrote programs to solve an equation for the dynamics of ultracold gases and applied them to the formation of molecules from cold gases through the variation of a magnetic field. • Large computations are required to model this process for a realistic set of parameters. • Calculations used to take 2 months on a single computer, but with the NGS, using up to 100 processors at a time, they now take less than two days. • "The NGS has made a huge difference to my research. It's shown the method works. It's one of those problems you couldn't think about doing without access to a cluster!"
Understanding electrical defibrillation of the heart • Blanca Rodriguez, University of Oxford • Using computer simulations to study the mechanisms of defibrillation in the heart, in particular how heart tissue reacts to the application of an electrical shock. Part of the Integrative Biology project. • Defibrillation is widely used in hospitals, but the mechanisms behind it are still not fully understood. To understand exactly how defibrillation works, researchers are simulating the application of electric shocks to both healthy and diseased hearts. • Many sequential simulations are run as parameter sweeps over variables such as shock strength and timing of application (see the sketch below); obtaining 250 ms of animated data takes 28 hours of processing time for each parameter setting. • Using the NGS, they have run hundreds of sequential simulations on many CPUs, something which would not have been possible without NGS resources.
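The parameter-sweep pattern itself is straightforward: one independent job per parameter combination. The sketch below is purely illustrative; the solver name, parameter values and the qsub submission command are assumptions, not details of the Oxford group's setup.

```python
# Minimal sketch of a parameter sweep: one independent sequential job
# per (shock strength, timing) combination.
import itertools
import subprocess

shock_strengths = [2.0, 4.0, 6.0, 8.0]   # illustrative values
shock_timings = [10, 20, 30, 40]          # ms, illustrative values

for strength, timing in itertools.product(shock_strengths, shock_timings):
    job = f"sweep_s{strength}_t{timing}"
    with open(f"{job}.sh", "w") as script:
        script.write("#!/bin/sh\n")
        # Hypothetical solver invocation; each run produces 250 ms of
        # data and takes ~28 hours of CPU time.
        script.write(f"./heart_solver --shock {strength} "
                     f"--timing {timing} --out {job}.dat\n")
    # Submit each script to the batch system (command is illustrative).
    subprocess.run(["qsub", f"{job}.sh"], check=True)
```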
Simulating the universe on the NGS • Cristiano Sabiu, University of Portsmouth • Studied the distribution of galaxies in the universe. • The work involved running many large-scale N-body simulations of dark matter, from which a multiverse of almost 2000 mock universes was produced. The mock universes were then compared to the actual galaxy distribution as observed by the Sloan Digital Sky Survey (SDSS). • Simulations were run using the freely available GADGET2 code, which can be run on essentially all supercomputer systems presently in use, including clusters of workstations and individual PCs. • Completed a series of cosmological simulations using a 128-CPU configuration on the NGS RAL facility, successfully running 20 full-scale simulations requiring ~100,000 CPU hours over a 12-month period. • "GADGET2 was installed for me on the NGS and optimised for their system. Without the NGS my PhD project could have taken 10 years!"
Computational modelling of ion channel biophysics • Ranjit Vijayan, University of Oxford • Studying the fundamental principles of neurotransmitter receptor molecules. Understanding the behaviour of these molecules is critical in designing therapies for a range of neurological disorders, e.g. Alzheimer's disease, Parkinson's disease and epilepsy. • Makes extensive use of the NGS clusters to perform rigorous molecular dynamics simulations and free energy calculations to look at the energetics of physiological processes such as ligand and ion binding. • Using a range of computational techniques on the NGS, Ranjit identified the precise location of modulatory ion binding sites, which had puzzled experimentalists for years, in a class of neurotransmitter receptors called ionotropic glutamate receptors. • "The NGS has been an immense help to me. I can honestly say that without the NGS I would probably not have been able to complete my PhD on time."
Simulating carbon nanotubes on the NGS • Rebeca Garcia Fandino, University of Oxford • Carbon nanotubes (CNTs) have promising applications, e.g. in the fields of nanotechnology, nanoelectronics and composite materials, but their poor solubility limits their application. • Simulated the addition of functional groups to CNTs and how this changes their dynamical behaviour and transport properties. • Simulations were performed using Gromacs and Amber. • The work required simulations at 100 consecutive points, leading to 1600 calculations using 2 processors each, which translates to more than 300,000 hours for the simulation of only 1 ns at each of the 100 points explored in each system. • "The large amount of calculations required constitute a very obvious reason for the necessity of HPC capabilities for carrying out this project. The NGS can provide these facilities, and we are very satisfied from the benefit we are obtaining from these resources to make this project real."
Using the NGS to model the climate impact of aircraft emissions • Laura Wilcox, University of Reading • Quantifying the climate impact of water vapour emissions from aircraft by modelling the four-dimensional motion of emissions after their release into the atmosphere. • The outcome of the research will be a revised estimate of the climate impact of water vapour emissions, allowing more informed policy and operational choices. • Uses code written in Fortran and compiled when needed; it reads large amounts of pre-prepared data while running and therefore needs large amounts of disk space and memory. • Identified NGS sites with enough capacity to run the code and built copies of the program at each site; when the jobs completed, the results were copied back. • "Without the NGS, it would take 2 years to model the motion of water vapour emissions during one winter. To complete all the model runs that we hope to perform on the NGS over the next few months would take 10 years on our department computers."
Exploiting solar power using the NGS • Marco Califano, University of Leeds • Identifying the best materials, sizes and shapes of nanocrystals that could lead to high-efficiency energy conversion in next-generation solar cells. • Uses two different (non-open-source) parallel codes at one NGS site: one has fairly large memory requirements and can be used on up to 16 CPUs; the other scales well with the number of processors up to about 128 CPUs. • Running his parallel codes on the NGS allows a larger number of nodes and more memory to be used than on local resources, producing results faster and enabling calculations that would be impossible on a small desktop machine. • "I would like to praise the great support I received from NGS staff that have helped me to overcome all the problems related to porting, compiling and running my codes on the NGS."
Quantum Mechanics modelling using the NGS • Stewart Reed, University of Leeds • Aiming to develop new methods of performing accurate computer simulations of 'tunnelling' molecules; tunnelling is important for a large range of atomic-scale processes. • As the work involves developing new methodologies, he runs his own specially written simulation programs on the NGS. • Makes use of the high-quality compilers and libraries on the NGS, in particular the Intel Fortran compiler, LAPACK, ScaLAPACK and ideally MPI-2 libraries. The code is written so that it can be compiled as either a serial or a parallel code. • Quantum mechanical simulations are inherently expensive in terms of the computer resources required to perform them. • "The NGS provides excellent computing resources with which to perform these calculations. The computational capacity available from the NGS allows larger systems to be studied more accurately than are possible with standard workstations."
Scalable Road Traffic Monitoring using Grid Computing • Aengus McCulloch, Newcastle University • There has been an explosion in the volume of data relating to real-world events, e.g. from systems monitoring weather or road traffic. The data is geographically referenced and most useful in the short term. • Processing the large volume of data generated by geographical sensors in a short time period presents an enormous computational challenge. • The research focused on processing near real-time geographic information delivered through Sensor Web Enablement (SWE) interfaces. • Used traffic monitoring and routing as a real-world case study: GPS tracks from a fleet of vehicles were matched to the road network (see the map-matching sketch below), using the GridSAM job submission service to run open-ended compute jobs. • Retrieved GPS tracks in near real-time from a SWE service, combined them with base map data of the road network and published the matched positions to another SWE service. The system has been successfully tested with up to 250 vehicles. • "The NGS has made endless CPU hours available to me through a standards based interface; it has been an invaluable resource for my research."
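At its simplest, map matching snaps each GPS fix to the nearest road segment. The sketch below shows only that geometric core, with invented segment ids and coordinates; the real system works against a full road network and SWE services.

```python
# Minimal sketch of point-to-road map matching: snap each GPS fix to the
# nearest road segment (illustrative data, planar coordinates).
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b (all (x, y) tuples)."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def match_to_road(gps_fix, road_segments):
    """Return the id of the road segment nearest to a GPS fix."""
    return min(road_segments,
               key=lambda seg_id: point_segment_distance(gps_fix, *road_segments[seg_id]))

# Illustrative base map: segment id -> (start, end) coordinates.
roads = {"A167": ((0, 0), (0, 10)), "B1318": ((0, 10), (8, 14))}
print(match_to_road((1.0, 6.0), roads))  # -> "A167"
```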
Modelling the effect of the peptide sequence on the binding affinity to carbon nanotubes • Susana De Tomasio, University of Warwick • Carbon nanotubes (CNTs) have promising applications in the fields of nanotechnology and medicine, but suffer from extreme hydrophobicity and a lack of dispersion. • The ability of biomolecules such as peptides to bind non-covalently to CNTs raises new expectations for their selective dispersion and manipulation, but the nature of the peptide-CNT interaction is not yet fully understood. • The purpose of the research was to better understand how the order of the residues in a peptide affects its interactions with a CNT. • Performed modelling simulations using TINKER, which was installed at the Leeds NGS site. • Found that the binding affinity to CNTs is strongly dependent on the order of the residues in the peptide sequence. The results represent a first step in identifying design rules for peptide-CNT interfaces. • "The biggest advantage of using NGS is that we can submit several jobs simultaneously."
Systematic modelling of ionic liquids • Edward Ballard, University of Wales, Bangor • Ionic liquids have many potential uses in future technology, e.g. in lithium-ion batteries, solar thermal energy and as replacements for volatile organic solvents. • Looked at the fundamental ways in which the properties of ionic liquid solutions can change a reaction's outcome. • How ions move around a reagent in solution plays an important role in dictating how fast the reaction goes; from this it is possible to work out how the reaction can be made faster and more efficient. • Molecular dynamics simulations were performed using DL_POLY2 on NGS resources, running on 8-16 processors. The NGS UI-WMS broker was used to stage the calculations. • "Using the NGS was a huge benefit to my research. Being able to run the simulations on a large number of processors significantly reduced the time for them to complete and I benefited substantially from the ability to run multiple jobs. My results have already attracted interest from other research groups."
Optimisation of imaging during radionuclide therapy using simulations • Maria Holstensson, Institute of Cancer Research • Research on the imaging of cancer treatment involving a drug labelled with the radionuclide iodine-131. • Iodine-131 emits beta particles and gamma rays that can be imaged with a medical gamma camera; the images can be used to estimate the radiation dose to the tumour. Improving gamma camera imaging will improve the accuracy of radiation dose calculations. • Two problems degrade image quality: scatter, and photon penetration of the gamma camera collimator. Scatter correction is currently applied to the images. • Developed a computer model of the gamma camera used to image the patients, built with the Monte Carlo program GATE, and used it to investigate phenomena that are not possible to measure directly, e.g. details of scattering processes within the patient and the camera. • Visual and quantitative comparisons between experiment and simulation have shown excellent agreement. • PI Dr Glenn Flux said: "The availability and support of the NGS enabled us to obtain more accurate and detailed results than would otherwise have been possible. The results from this study are likely to have a significant impact on patient care in the years to come."
Finding new messages hidden in the genetic code • Charlie Laughton, University of Nottingham • Studied the flexibility and folding of DNA in cells; there is currently no clear understanding of how this works. • Understanding the flexibility of DNA can help us understand how cells use these properties to switch genes on and off. • Used molecular dynamics (MD) simulations of many different fragments of DNA with the AMBER package installed on the NGS. The programs to prepare the DNA fragments for simulation and to analyse the results were written by the research team. • MD simulations need a large amount of computer power: studying one of these fragments on a typical workstation would have taken about two weeks of CPU time. Using the NGS this was reduced to about 24 hours, meaning the research was completed in about three months instead of two years. • "Without the compute power and high-throughput provided by the NGS, we would not have been able to deliver our part of the project in a timely manner. At a more personal level, it led to one of the most highly cited publications I have ever had."
Using the NGS to Help Determine the Suitability of Thoria for a Next Generation Nuclear Fuel • Paul Martin, University of Huddersfield • Investigating the impact of defects in nuclear fuel rods caused by high-energy fission products and initial neutron bombardment. • There is increased interest in the use of thorium dioxide (thoria) for nuclear fuel rods: thorium is abundant, all of it can be usefully burnt, it produces far less radiotoxic waste than other nuclear fuels and it does not result in plutonium production. • Used the METADISE and PARAPOCS programs for surface calculations to investigate two factors: the stability of thoria with the doping levels of uranium usually found in fuel rods, and the segregation of uranium ions to the stable {111} surface of thoria, both over a range of simulated temperatures. • The software is easily ported to almost any architecture, so the jobs are particularly suited to task farming across the NGS. The METADISE jobs were farmed out to run many at a time instead of one after another, so sixty-four 5-minute calculations ran in 5 minutes rather than 6 hours (see the task-farming sketch below). • "The NGS provides us with easy access to a wide range of compute resources that otherwise would not be available to us, even with considerable investment at a local level, enabling us to get our research done quickly without fuss."
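Task farming runs many short, independent jobs concurrently, so the whole farm finishes in roughly the time of the slowest single job rather than the sum of all of them. A minimal sketch using Python's standard library; the metadise command name and input file names are illustrative stand-ins for the actual runs.

```python
# Minimal sketch of task farming: run many short, independent
# calculations concurrently instead of one after another.
from concurrent.futures import ProcessPoolExecutor
import subprocess

def run_calculation(input_file):
    """Run one ~5-minute surface calculation (command is illustrative)."""
    result = subprocess.run(["metadise", input_file],
                            capture_output=True, text=True)
    return input_file, result.returncode

inputs = [f"thoria_surface_{i:02d}.in" for i in range(64)]

# With 64 workers, sixty-four 5-minute calculations complete in about
# 5 minutes of wall-clock time.
with ProcessPoolExecutor(max_workers=64) as pool:
    for name, code in pool.map(run_calculation, inputs):
        print(f"{name}: {'ok' if code == 0 else 'failed'}")
```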
Using the NGS to run a computer tournament on social learning strategies • Luke Rendell, University of St Andrews • Many organisms can learn for themselves, but they also acquire information from others. It is unknown what kinds of strategies for combining these information sources natural selection will favour. • Ran a computer-based tournament in which entrants were invited to devise a strategy to guide how agents learn and prosper in a simulated evolutionary environment. • Entries consisted of a learning strategy defining how an individual agent would make decisions about when and how to learn, based on the information it currently had. Strategies then competed in evolutionary computer simulations to see which would win out in a virtual world. • The simulation model was coded in MATLAB and then ported to Octave, which was made available on a number of NGS clusters. The tournament attracted over 100 entries, requiring over 100,000 individual simulations and taking over 60,000 CPU hours. • Luke explained: "We are incredibly grateful to the NGS for the helpful and flexible way that resources were made available to us. Without it, our research would simply not have happened. We are looking forward to working with the NGS again to run follow-up tournaments."
Accelerating the Processing of Large Corpora: Using Grid Computing Technologies for Lemmatizing the 176-Million-Word Arabic Internet Corpus • Majdi Sawalha, University of Leeds • Added the lemma and root to each word in the 176-million-word Arabic Internet Corpus. • Used the SALMA Tagger (Sawalha Atwell Leeds Morphological Analyses Tagger) to add this information, but it is relatively slow, at about 7 words per second. • Used the NGS to lemmatize the Arabic Internet Corpus by dividing it into half-million-word files and writing a program that generates scripts to run the lemmatizer on each file in parallel (see the sketch below). • The output files were combined into one lemmatized Arabic Web Corpus, and 10 random samples of 100 words each were selected to evaluate the accuracy of the lemmatizer: the average root accuracy was about 81.2% and the average lemma accuracy was 80.8%. • "Roughly, an estimated execution time for lemmatizing the full Arabic Internet Corpus was 300 days, but by using the NGS a massive reduction in execution time was gained: instead it only took 5 days. Not only did it make the processed Arabic Internet Corpus available to other translation studies and Arabic and Middle Eastern studies researchers at the University of Leeds but also to other world-wide institutions."
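The chunk-and-script approach described above can be sketched as follows; the file names, the exact chunk boundary handling and the salma_tagger command are illustrative assumptions, not the project's actual scripts.

```python
# Minimal sketch: split the corpus into half-million-word files and
# generate one run script per file so the lemmatizer can process them
# all in parallel.
CHUNK_WORDS = 500_000

def split_corpus(path):
    """Write the corpus out in half-million-word chunks; return the names."""
    names, buffer, count, part = [], [], 0, 0
    with open(path, encoding="utf-8") as corpus:
        for line in corpus:
            buffer.append(line)
            count += len(line.split())
            if count >= CHUNK_WORDS:
                name = f"corpus_part_{part:03d}.txt"
                with open(name, "w", encoding="utf-8") as out:
                    out.writelines(buffer)
                names.append(name)
                buffer, count, part = [], 0, part + 1
    if buffer:  # trailing partial chunk
        name = f"corpus_part_{part:03d}.txt"
        with open(name, "w", encoding="utf-8") as out:
            out.writelines(buffer)
        names.append(name)
    return names

# One batch script per chunk, each submitted as an independent job.
for name in split_corpus("arabic_internet_corpus.txt"):
    with open(name.replace(".txt", ".sh"), "w") as script:
        script.write("#!/bin/sh\n")
        script.write(f"salma_tagger {name} > {name}.lemmatized\n")
```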
Computer Simulations of Biological Molecules at the Atomic Level • Sarah Harris, University of Leeds • Studying the fragmentation of amyloid fibrils, which are associated with a number of degenerative diseases including Alzheimer's, Parkinson's and type II diabetes. • It has been shown that if the fibrils are broken up into short fragments through stirring, their toxicity increases, but there is no information about breakage events at the atomic level. • Used atomistic modelling to apply sufficient force to an amyloid fibril in silico that it fragments. The calculations show that fibrils are more fragile if they contain structural defects at the molecular level, as these defects act as weak points that are prone to failure. • Used AMBER, NAMD and GROMACS, which scale well up to ~500 processors depending on the size of the system; most of the team's calculations, however, use between 16 and 64 cores. • "The NGS has provided us with additional computational resources which have enabled us to perform more accurate calculations than would otherwise have been possible."