
Big Data Infrastructure for Scientific Computing

Mathijs Kattenberg – mathijs.kattenberg@surfsara.nl


Presentation Transcript


  1. Big Data Infrastructure for Scientific Computing Mathijs Kattenberg – mathijs.kattenberg@surfsara.nl

  2. Big Data Landscape
     Large Hadron Collider:
     • Uses: Grid
     • Volume: ~15 PB per year (~4 PB @ SURFsara)
     • Type of data: structured

  3. Big Data Landscape
     Next Generation Sequencing (GoNL):
     • Uses: Grid, Cloud, Cluster
     • Volume: ~100 GB to 300 TB
     • Type of data: various formats and noise

  4. Big Data Landscape
     Information retrieval and NLP:
     • Uses: Hadoop, Cloud
     • Volume: ~70 TB
     • Type of data: text, unstructured
     http://bit.ly/173ddfz

  5. Effectiveness of Data
     Where having and exploiting data leads to insights:
     • Brainscanr
     • Healthmap

  6. (Open) Data Sources
     • Lots of open data:
        • Open data Nederland
        • CitySDK
        • Community of Amsterdam
        • Rijkswaterstaat
        • Twitter
        • Facebook
        • Google
     • Different formats:
        • Excel files
        • JSON
        • Web services
     • Different quality:
        • Noise
        • Missing values
        • Availability
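     The quality issues listed on this slide (noise, missing values) can be illustrated with a minimal Python sketch. The records, the field names `station` and `level`, and the helper `clean_levels` are hypothetical and do not come from any of the data sources named above; the sketch only shows one possible way to separate usable readings from unusable ones in a JSON feed.

```python
import json
from statistics import mean

# Hypothetical sample: four readings, one missing (null) and one noisy ("n/a").
RAW = (
    '[{"station": "A", "level": 1.2},'
    ' {"station": "B", "level": null},'
    ' {"station": "C", "level": "n/a"},'
    ' {"station": "D", "level": 0.8}]'
)

def clean_levels(raw_json):
    """Return (valid, rejected): numeric readings and the records that were unusable."""
    valid, rejected = [], []
    for record in json.loads(raw_json):
        value = record.get("level")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            valid.append(float(value))
        else:
            rejected.append(record)
    return valid, rejected

valid, rejected = clean_levels(RAW)
print(len(valid), "usable readings,", len(rejected), "rejected")
print("mean level:", mean(valid))
```

     Keeping the rejected records, rather than silently dropping them, makes it possible to report how much of a source was unusable.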

  7. Computing Big Data
     Complexity:
     • Data:
        • Noise, missing data
        • Formats
        • Access
     • Distributed computing:
        • Failures
        • Parallel programming
     Solutions:
     • Data: deal with it
     • Distributed computing:
        • Super/Cluster computer
        • Grid
        • Hadoop
     Capacity:
     • CPU cores
     • Hard drive space
     • Network bandwidth
     Solutions:
     • Scale up: get faster tools
     • Scale out: work with more tools
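     The "scale out" idea on this slide can be sketched as a small map/reduce word count in Python. This is an illustration under stated assumptions, not how Hadoop or any SURFsara system is implemented: the function names (`map_chunk`, `reduce_counts`, `word_count`) and the sample input are hypothetical, and a thread pool stands in for what would be separate machines on a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_chunk(lines):
    """Map step: count words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def word_count(lines, workers=4):
    """Split the input into one chunk per worker, then map and reduce."""
    workers = max(1, workers)
    # Round-robin split: worker i gets lines i, i + workers, i + 2 * workers, ...
    chunks = [lines[i::workers] for i in range(workers)]
    # Threads are a stand-in here; on a cluster each chunk would go to another node.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)

# Hypothetical input, for illustration only.
sample = ["big data big compute", "scale out not up", "big clusters scale"]
totals = word_count(sample, workers=2)
print(totals.most_common(3))
```

     Because each chunk is processed without reference to the others, adding workers does not change the result, only how the work is divided.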

  8. Computing Big Data

  9. Computing Big Data

  10. What SURFsara Offers
      SURFsara provides:
      • Infrastructure: supercomputer, clusters, grid, cloud, Hadoop
      • Support: development, parallelization, consultancy
      • R&D: piloting new technologies
      • Hosting datasets for common use

  11. Mathijs Kattenberg mathijs.kattenberg@surfsara.nl

  12. www.sendsteps.com
      Prepare to react; keep your phone ready!
      Internet:
      1. Go to sendc.com
      2. Log in with Session
      3. Type WS4 <space> your answer
      TXT:
      1. Text to +316 4250 0030
      2. Type Session <space> WS4 <space> your answer
      Posting messages is anonymous. No additional charge per message.

  13. What kind of technologies would you consider using in order to deal with technical Big Data challenges?
