
Technology and Infrastructure Support for Large Scale Information

Technology and Infrastructure Support for Large Scale Information. Marcio Faerman The Brazilian National Education and Research Network - RNP marcio@rnp.br www.rnp.br. Generating Large Data Collections. Large Data Volumes can be generated much faster than they can be analyzed


Presentation Transcript


  1. Technology and Infrastructure Support for Large Scale Information Marcio Faerman The Brazilian National Education and Research Network - RNP marcio@rnp.br www.rnp.br

  2. Generating Large Data Collections • Large data volumes can be generated much faster than they can be analyzed • Instrument Observations • Particle Accelerators (CERN LHC) • Telescopes, Satellites • Sensor Networks • Virtual Observatories • Large Model Simulations • High resolution, Very complex • Scientific Experiments • Medical imaging (fMRI): ~1 GByte per measurement (day) • Bioinformatics queries: 500 GByte per database • Satellite world imagery: ~5 TByte per year • Current particle physics: 1 PByte per year • LHC physics (2007): 10-30 PByte per year • LSST astronomy (2012): 5 PByte per year
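To put the yearly volumes on slide 2 in networking terms, the following minimal sketch converts each figure into the average sustained rate needed to move it. It assumes decimal units (1 PByte = 10^15 bytes) and data produced evenly over a 365-day year; neither assumption comes from the slides, and the function name is illustrative.

```python
# Average sustained rate implied by a yearly data volume.
# Assumptions (not from the slides): decimal units (1 PByte = 10**15 bytes)
# and data produced evenly over a 365-day year.

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds


def sustained_rate_mbit_s(bytes_per_year: float) -> float:
    """Return the average rate in Mbit/s needed to move bytes_per_year."""
    return bytes_per_year * 8 / SECONDS_PER_YEAR / 1e6


# Yearly volumes quoted on slide 2, expressed in bytes.
yearly_volumes = {
    "Satellite world imagery (5 TByte/year)": 5e12,
    "Current particle physics (1 PByte/year)": 1e15,
    "LHC physics, lower bound (10 PByte/year)": 10e15,
    "LHC physics, upper bound (30 PByte/year)": 30e15,
    "LSST astronomy (5 PByte/year)": 5e15,
}

for label, volume in yearly_volumes.items():
    print(f"{label}: {sustained_rate_mbit_s(volume):,.1f} Mbit/s")
```

Under these assumptions, 1 PByte per year corresponds to roughly 254 Mbit/s sustained around the clock, and 30 PByte per year to roughly 7.6 Gbit/s.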

  3. Challenges Managing Large Volume Data • Scalability • What works for small datasets does not necessarily work for large collections • Data Integrity • At terabyte scale, failures and data corruption are very likely to occur • Is data provenance reliable? • Efficiency • Data should be accessed at a rate which keeps work feasible • More data: need for more speed • Distributed Access • Data can be at a remote (and possibly unknown) location • Infrastructure Management • Heterogeneous • Distributed • Prone to failures • Very Complex
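One common way to detect the corruption mentioned under Data Integrity is to store a cryptographic checksum with each file and recompute it after every transfer or restore. The sketch below uses SHA-256 from the Python standard library and reads the file in chunks so that large files do not have to fit in memory. The slides do not name a checksum scheme; the choice of SHA-256 and the helper names are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected_hex: str) -> bool:
    """Compare a freshly computed digest with the one recorded at ingestion."""
    return sha256_of_file(path) == expected_hex
```

A typical use would be to record `sha256_of_file(path)` when a file is ingested and call `is_intact(path, recorded_digest)` after each copy between sites.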

  4. Challenges: Getting to Know your Data • Extract knowledge from raw data files • Data product derivation • Visualization • Relationships • Patterns • New derived quantities • Cross-institutional and cross-disciplinary collaborations • What-if experiments • Your data with our model? • Dataset Access • Multiple formats • Each sensor, simulation has its own storage format • Federated collections • Discovery by content

  5. Technological Response • Integration of compute, communication, storage and instrument resources into a powerful infrastructure: Information Grids • Very powerful infrastructure • Economy of scale • Serves a broad range of customers • biologists, physicists, government, industry • Infrastructure is heterogeneous, distributed, very complex • Middleware and data-oriented tools act as facilitators to tackle data management complexities

  6. Open Access and Preservation Functionalities • Federated Digital Libraries • Integration of distributed repositories • Access control: you decide who can see your data • Organize the data in collections • Describe your data: Metadata • Data Grids • Access to efficient parallel I/O systems • Hierarchical Systems • Disk caches, tapes • Often Distributed • Analysis, Data Mining • Visualization • Workflow-based systems • Transaction-based data ingestion • Data provenance, Data fingerprinting • What-if virtual lab • End User Oriented Portals • "I deal with the data in the way that makes sense to me"
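The ideas of organizing data in collections and describing it with metadata can be illustrated with a small in-memory sketch. It is not the data model of any tool named in this talk; the class names, fields, storage URLs and sample entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    name: str
    location: str  # hypothetical storage URL, possibly at a remote site
    metadata: dict = field(default_factory=dict)


@dataclass
class Collection:
    name: str
    objects: list = field(default_factory=list)

    def add(self, obj: DataObject) -> None:
        self.objects.append(obj)

    def find(self, **criteria) -> list:
        """Return the objects whose metadata matches every given key/value."""
        return [
            obj for obj in self.objects
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


# Hypothetical entries, for illustration only.
scans = Collection("brain-scans")
scans.add(DataObject("scan-001", "site-a://store/scan-001",
                     {"instrument": "fMRI", "year": 2007}))
scans.add(DataObject("scan-002", "site-b://store/scan-002",
                     {"instrument": "fMRI", "year": 2006}))
scans.add(DataObject("sky-001", "site-c://store/sky-001",
                     {"instrument": "telescope", "year": 2007}))

matches = scans.find(instrument="fMRI", year=2007)
print([obj.name for obj in matches])
```

The point of the design is that users search by what the data describes rather than by where a file lives, so the physical location can change without affecting queries.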

  7. Middleware and Tools • Data Management • Storage Resource Broker (SRB) • Globus Data Management • L-Store • IBP • Storage Resource Manager (SRM) • Data Representation Libraries • HDF5 • NetCDF • Portals • OGCE • JSR 168

  8. Today's Reality • Exceptional achievements by early adopters • Integration between domain scientists (data users and producers) is still a challenge • Need much more cross-disciplinary interaction • Emphasis on scale and performance • Failures are still taboo • Frustration factor should be addressed in partnership with users • Focus on failure recovery and quality of service is getting more attention

  9. Grid Initiatives around the World e-Infrastructure Workshop, NUDI/USP, São Paulo, 07.05.2007

  10. UNAM, OurGrid, EELA, SPRACE, SINAPAD, HEPGrid, Ringrid, CL Grid, UCRAV

  11. Networking in Latin America: CUDI-MX, REACCIUN-VE, RAAP-PE, RNP-BR, REUNA-CL

  12. Brazilian National Research and Education Network - RNP • In November 2005 the RNP networking infrastructure was entirely renovated. It consists of: • A multi-gigabit core connecting 10 capitals at 2.5 and 10 Gbps • Connections at 34 Mbps to 11 capitals • Connections of up to 16 Mbps to 6 capitals
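The practical gap between these link classes shows up when moving a large dataset. The sketch below estimates the ideal transfer time of a 1 TByte dataset at each link speed listed on slide 12. It assumes decimal units, a fully dedicated link and no protocol overhead, so real transfers would take longer; the 1 TByte size and the function name are chosen purely for illustration.

```python
# Ideal transfer time of a dataset over the link speeds listed on slide 12.
# Assumptions (not from the slides): decimal units, a fully dedicated link,
# no protocol overhead, and an illustrative dataset size of 1 TByte.


def transfer_time_hours(size_bytes: float, rate_bit_s: float) -> float:
    """Return the time in hours to move size_bytes at rate_bit_s."""
    return size_bytes * 8 / rate_bit_s / 3600


DATASET_BYTES = 1e12  # 1 TByte

link_rates_bit_s = {
    "Core link, 10 Gbps": 10e9,
    "Core link, 2.5 Gbps": 2.5e9,
    "Capital link, 34 Mbps": 34e6,
    "Capital link, 16 Mbps": 16e6,
}

for label, rate in link_rates_bit_s.items():
    print(f"{label}: {transfer_time_hours(DATASET_BYTES, rate):.2f} hours")
```

Under these assumptions the transfer takes about 13 minutes at 10 Gbps, about 65 hours at 34 Mbps, and nearly 6 days at 16 Mbps.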

  13. Community Metropolitan Networks • It is not enough to bring high-speed connectivity to each city; it must also reach the university campus and research lab. • The metropolitan network is the solution • Infrastructure sharing to support: • Interconnection of the campuses of each partner institution • Access to the RNP national network backbone • This sharing substantially reduces deployment costs • Preferably, the infrastructure will be owned by the partners themselves (reducing operating costs) • Pilot: The Metrobel project in the city of Belém do Pará in the Amazon region. Infrastructure for e-Science

  14. Metrobel – Belém Metropolitan Network

  15. Redecomep Project (2005-2007) • Following Metrobel, the Brazilian Ministry of Science and Technology is supporting the Community Networks for Education and Research (Redecomep) Project with R$ 39.7 million (~US$ 19.0 million) through Finep (December 2004) • Goals: • Extend the metropolitan optical network to 26 other cities with RNP points of presence • Promote integration within each metropolitan area • High-speed access to the RNP point of presence

  16. Next steps • Integration between network, data repositories, compute, storage resources and applications • Identify who needs better connectivity • Developing Brazilian cyberinfrastructure • Funding for infrastructure resources is generally uncoordinated • Funding agencies and partners need a broad view of application requirements and cyberinfrastructure integration • RNP is coordinating with scientific communities and infrastructure providers • e-Science/Infrastructure initiative in Brazil

  17. JRU-Brazil: 22 members in EELA-2

  18. Developing Together • Information infrastructure is being redefined in Brazil and Latin America • Now is the time to have as much cross-disciplinary interaction as possible to define needs, partnerships and investments • Please contact us. Thank you!
