Query Processing in Molecular Simulation Databases
REU 2009 ~ Amarilys Méndez Solís, Mentor: Yicheng Tu
Department of Computer Science & Engineering

N – size of the original, large data set
n – size of the new, smaller data set

Abstract:
Today, computing technology is a great advantage for scientific work, and one such line of work is the simulation of particles. The project "Computing Distance Histograms Efficiently in Scientific Databases" made the transition from manual analysis of these simulations to an automated one on a computer, a very difficult process that was solved in an innovative and more efficient way with a new algorithm. The algorithm's job is to calculate the distance between particles and trace these distances in a histogram. The one missing detail is the speed of these calculations: before our work, the code ran slower than desired. To address this, we propose a new algorithm that shortens the processing time. Time is very valuable to us because the data sets from these simulations are very large and take a long time to analyze. The new algorithm gave very good results, and we can see that processing time decreased once it was implemented.

Background:
One of the biggest inspirations for the project was the convenience that technology brings to this kind of work. The work requires data sets obtained from particle simulations of a single large system; particle simulation is important because it models a large system as a collection of classical entities, namely particles.

Fig. 1 Examples of simulations used as data sources: astronomical, and biological/chemical (molecular) simulation.

These data sets can contain millions to billions of particles, and they require configuration data and statistical calculations to serve their purpose, which is to find the distances between particles. The configuration stores important information about the particles, and the statistical calculation uses the Spatial Distance Histogram (SDH), which traces a histogram of the distances between all pairs of particles in the simulated system. Data this large cannot be calculated by human means; it requires an advanced algorithm running on a computer. That is why the project created an algorithm called Density Map (DM). DM calculates the data accurately and then builds the histogram; it consists of complex data structures (quad-trees) and other machinery needed to process the data sets and organize the data in memory. DM was extremely useful because it computes accurately even on incredibly large data sets, but because of the data set size and the huge number of calculations, its execution time is very slow.
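To make the SDH concrete, the following is a minimal C sketch of the brute-force computation that motivates all of this work: visit every pair of particles, compute their distance, and increment the matching histogram bucket. The names here (particle, sdh_brute_force, bucket_width, and so on) are hypothetical illustrations, not the project's actual code, which uses the far more sophisticated Density Map approach.

    #include <math.h>

    typedef struct { double x, y, z; } particle;

    /* Count the distance between every pair of particles into a
       histogram of num_buckets buckets, each bucket_width wide. */
    void sdh_brute_force(const particle *p, long N, double bucket_width,
                         long long *hist, int num_buckets)
    {
        for (long i = 0; i < N; i++) {
            for (long j = i + 1; j < N; j++) {
                double dx = p[i].x - p[j].x;
                double dy = p[i].y - p[j].y;
                double dz = p[i].z - p[j].z;
                double d  = sqrt(dx * dx + dy * dy + dz * dz);
                int b = (int)(d / bucket_width);
                if (b >= num_buckets)
                    b = num_buckets - 1;   /* clamp out-of-range distances */
                hist[b]++;
            }
        }
    }

The doubly nested loop performs N(N-1)/2 distance calculations; for N = 100,000 that is 4,999,950,000 pairs, exactly the T value reported in the original output under Results below, which is why run time grows so quickly with data set size.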
Objectives:
• Create an algorithm in C that shortens the current run time.
• Compare and analyze the difference in performance and accuracy between the old and new algorithms, and determine whether shortening the run time is worth losing a bit of accuracy or detail.

Algorithm:
• The algorithm is a small piece of code inside the original Density Map code. It uses a function that takes the original data array of size N and shortens it to a smaller one of size n.
• The code makes the data set smaller by using a random function to produce a true/false flag; this flag decides whether the datum at each position will be used or not.
• At the end, the code reassigns a new size to the original array and stores all the surviving values there. After this, the code performs the analysis on the smaller data set.

Fig. 3 How the memory looks: the original array uses N memory blocks, and n memory blocks are chosen at random for the new array.

Pseudocode (a C sketch of this procedure appears at the end of the poster):
• Start with the original array, arrayA, which is assigned to store the values.
• Copy each element of arrayA into tempArray, then free the memory allocated to arrayA.
• For each element of tempArray, flip a coin at random.
• If the value is true (1), take the value from tempArray and assign it to the newly allocated arrayA.

Results:
The result was what we expected: the run was faster than the original, an improvement of about 85%. We lost some accuracy in the bucket counts, but this is affordable given the gain in time.

Fig. 2 From a large data set of size N, some random particles are taken to form a new, smaller data set of size n.

Original output:
username@ubuntu:$ ./pdh 0 100000 6500.0 40000.0 0.0 0 50.0 50.0
00: 837311350 | 1708321490 | 1602561651 | 772114165 | 79094586 |
05: 546758 | 0
T:4999950000 339.189511

New output:
username@ubuntu:$ ./pdhr 0 100000 6500.0 40000.0 0.0 0 50.0 50.0
CutArray = 49825
00: 1241190576 | 0 | 0 | 0 | 49824 |
05: 0 | 0
T:1241240400 56.094437

Fig. 4 Output of the results (bucket counts and run times). The new algorithm reduced the run time by approximately 85% compared with the original one; the only drawback is that the bucket counts were more accurate with the original code than with the new one.

Conclusion:
The idea behind the code was very good because we reached our goal: the code now runs about 85% faster than the original. The code still needs improvement, to add more accuracy and to run even faster than it does now, but above all the results are excellent for our goal.

Acknowledgements:
To the University of South Florida for giving me the opportunity to participate in the REU, and most especially to Dr. Yicheng Tu, Dr. Miguel Labrador, and Prof. Daladier Jabba. Thanks also to Universidad Metropolitana for helping me reach this goal.
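As noted in the Algorithm section, here is a minimal C sketch of that pseudocode: the coin-flip cut that shrinks arrayA from N elements to roughly N/2 survivors. The arrayA and tempArray names follow the pseudocode; everything else (shrink_array, the double element type) is a hypothetical illustration rather than the project's actual implementation, and error checks are omitted for brevity.

    #include <stdlib.h>
    #include <string.h>

    /* Shrink *arrayA (currently holding N values) to a random subset
       and return the new size n (about N/2 for a fair coin flip). */
    long shrink_array(double **arrayA, long N)
    {
        /* Copy every element of arrayA into tempArray, then free arrayA. */
        double *tempArray = malloc(N * sizeof *tempArray);
        memcpy(tempArray, *arrayA, N * sizeof *tempArray);
        free(*arrayA);

        /* Allocate a fresh arrayA and keep an element only when the
           randomly flipped "coin" comes up 1 (true). */
        *arrayA = malloc(N * sizeof **arrayA);
        long n = 0;
        for (long i = 0; i < N; i++)
            if (rand() % 2)
                (*arrayA)[n++] = tempArray[i];

        free(tempArray);
        return n;    /* corresponds to "CutArray" in the output of Fig. 4 */
    }

In the Fig. 4 run, a cut of this kind reduced N = 100,000 particles to n = 49,825 (the CutArray value), roughly the expected half, and the run time fell from about 339 seconds to 56 seconds.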