Big Data Infrastructure for Scientific Computing
Mathijs Kattenberg – mathijs.kattenberg@surfsara.nl
Big Data Landscape
Large Hadron Collider:
• Uses: Grid
• Volume: ~15 PB per year (~4 PB @ SURFsara)
• Type of data: structured

Big Data Landscape
Next Generation Sequencing (GoNL):
• Uses: Grid, Cloud, Cluster
• Volume: ~100 GB to 300 TB
• Type of data: various formats and noise

Big Data Landscape
Information retrieval and NLP:
• Uses: Hadoop, Cloud
• Volume: ~70 TB
• Type of data: text, unstructured
http://bit.ly/173ddfz
Effectiveness of Data
Where having and exploiting data leads to insights:
• Brainscanr
• Healthmap
(Open) Data Sources
• Lots of open data:
  • Open data Nederland
  • CitySDK
  • Community of Amsterdam
  • Rijkswaterstaat
  • Twitter
  • Facebook
  • Google
• Different formats:
  • Excel files
  • JSON
  • Webservices
• Different quality:
  • Noise
  • Missing values
  • Availability
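The quality issues listed above (noise, missing values) usually have to be handled before any analysis. The following is a minimal sketch in Python, not taken from the presentation: the sample records, the field name `level_cm`, and the accepted range of 0 to 1000 are all hypothetical, chosen only to show one way of separating usable readings from missing or implausible ones in JSON data.

```python
import json
from statistics import median

# Hypothetical sample: readings as they might arrive from an open-data web service.
RAW = """[
  {"station": "A", "level_cm": 112},
  {"station": "B", "level_cm": null},
  {"station": "C", "level_cm": "n/a"},
  {"station": "D", "level_cm": 9999},
  {"station": "E", "level_cm": 108}
]"""


def clean(records, lo=0, hi=1000):
    """Split records into (valid, rejected).

    A record is valid when its reading is numeric and lies within [lo, hi];
    missing values, text placeholders and out-of-range noise are rejected.
    The range limits are an assumption for this illustration.
    """
    valid, rejected = [], []
    for rec in records:
        value = rec.get("level_cm")
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and lo <= value <= hi:
            valid.append(rec)
        else:
            rejected.append(rec)
    return valid, rejected


records = json.loads(RAW)
valid, rejected = clean(records)
print(len(valid), "valid,", len(rejected), "rejected")
print("median level:", median(r["level_cm"] for r in valid))
```

Keeping the rejected records in a separate list, rather than silently dropping them, makes it possible to report how much of a source was unusable.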
Computing Big Data
Complexity:
• Data:
  • Noise, missing data
  • Formats
  • Access
• Distributed computing:
  • Failures
  • Parallel programming
Solutions:
• Data: deal with it
• Distributed computing:
  • Super/Cluster computer
  • Grid
  • Hadoop
Capacity:
• CPU cores
• Hard drive space
• Network bandwidth
Solutions:
• Scale up: get faster tools
• Scale out: work with more tools
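The "scale out" idea can be illustrated with the map/reduce pattern that Hadoop popularized: split the input, process the pieces independently, then merge the partial results. The sketch below is an assumption-laden illustration in plain Python, not Hadoop code and not from the presentation. A local thread pool stands in for the cluster nodes, and the function names (`map_chunk`, `merge`, `word_count`) are hypothetical. Python threads do not provide real CPU parallelism here; the point is only the structure of the computation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one chunk of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_count(lines, workers=4):
    """Count words by splitting the input into chunks, one per worker."""
    size = max(1, -(-len(lines) // workers))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # The pool is a local stand-in for the nodes of a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge, partials, Counter())


sample = ["big data big compute", "scale out not up", "big clusters scale"]
totals = word_count(sample, workers=2)
print(totals.most_common(2))
```

Because each chunk is counted independently, adding workers (or machines) needs no change to `map_chunk` or `merge`. A real distributed framework additionally has to handle the failures listed under "Distributed computing", for example by rerunning a chunk when a node dies.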
What SURFsara Offers
SURFsara provides:
• Infrastructure: supercomputer, clusters, grid, cloud, Hadoop
• Support: development, parallelization, consultancy
• R&D: piloting new technologies
• Hosting datasets for common use
Mathijs Kattenberg mathijs.kattenberg@surfsara.nl
What kind of technologies would you consider using in order to deal with technical Big Data challenges?