
Processors, Pipelines, and Protocols for Advanced Modeling Networks

This presentation discusses the complexity and volume of data collection enabled by modern technology and the need for software that helps scientists reduce data volume. It explores two types of modeling efforts, discovery and production, and the barriers faced in production modeling. It also covers high-performance computing, frameworks, automation, and language design.




Presentation Transcript


  1. Processors, Pipelines, and Protocols for Advanced Modeling Networks. Joseph Coughlan¹, James Fisher¹, Eric Bjorkstedt²; ¹National Aeronautics and Space Administration, ²National Oceanic and Atmospheric Administration, Fisheries

  2. Mission Complexity and Performance

  3. Data Volume and Dimensionality • Scientists are overwhelmed by the sheer volume of data collection enabled by modern technology. • A sensor web will further increase the potential for data collection; therefore, software is essential to aid the scientist and reduce data volume. Any algorithms used to aid the scientist and reduce data volume must originate in the discovery process before they can be implemented in data production or moved on-board.

  4. Two Types of Model Efforts Modeling can be roughly categorized as discovery or production. Discovery modeling is typically PI-led, often funded by grants, and oriented toward discovery and scientific inquiry into fundamental mechanisms within a discipline. Production modeling is community-led, interdisciplinary, and driven by the need to generate standard products, such as forecasts and measures of the Earth's productivity.

  5. Knowledge Discovery in Databases Based on a Data Flow Pipeline • Abstract components, instantiated as needed • Abstract data flow • The exact form of the pipeline is itself a research question. Example pipeline, where each box contains several distributed components: Data → Conditioning → Data Mining → Pattern Integration / Knowledge Discovery → Pattern/Knowledge Display → User. (The diagram's legend distinguishes dataflow from user control.)
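A minimal sketch of the pipeline idea follows. The Stage and Pipeline classes and the stage functions are illustrative assumptions for this transcript, not the authors' implementation; the point is only that abstract components can be instantiated as needed and chained by data flow.

    # Sketch of an abstract data-flow pipeline (assumed classes, not the authors' code).
    from typing import Callable, Iterable, List

    class Stage:
        """A pipeline component: a named transformation applied to the data stream."""
        def __init__(self, name: str, func: Callable[[List[float]], List[float]]):
            self.name = name
            self.func = func

        def run(self, data: List[float]) -> List[float]:
            return self.func(data)

    class Pipeline:
        """Chains stages; the exact sequence can be reconfigured per experiment."""
        def __init__(self, stages: Iterable[Stage]):
            self.stages = list(stages)

        def run(self, data: List[float]) -> List[float]:
            for stage in self.stages:
                data = stage.run(data)
            return data

    kdd = Pipeline([
        Stage("conditioning", lambda d: [x for x in d if x >= 0]),  # drop bad values
        Stage("data mining", lambda d: sorted(d)[-3:]),             # keep the strongest patterns
        Stage("knowledge display", lambda d: d),                    # hand results to the user
    ])
    print(kdd.run([4.0, -1.0, 7.5, 2.2, 9.1, 3.3]))  # -> [4.0, 7.5, 9.1]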

  6. Wine Quality = F(Sea Surface Temperature)

  7. Production modeling • As successful discovery activities mature, they can provide products to assess impacts on the system of study. • Often called “high-end” modeling, it requires larger teams, budgets and computing resources, and often undertakes the unconstrained modeling of long-term scenarios.

  8. Production modeling barriers • Barriers include processing limitations, the difficulty of integrating new scientific models, competition with the commercial sector for scarce talent, costly computing resources, and the implementation of this software on ever-evolving, distributed computer architectures.

  9. High Performance Computing • High-performance computing is fundamentally changing as a consequence of Moore's Law, and this change has a short-term, negative impact on production modeling in Earth science. • Parallel computers are now built with commodity processors and components. While the benchmarks of these single-image systems are impressive, scientists suffer from a significant usability gap and are not achieving the performance they need.

  10. Frameworks A framework is a software architecture that consists of several large components and the bonds between them. The goals of a framework are to: • Foster reusability among software components and portability among high-end computing architectures, • Reduce the time required to modify research application codes, • Structure systems for better management of evolving codes, and • Enable software exchange between major centers of research. http://research.hq.nasa.gov/code_y/nra/current/CAN-00-OES-01/index.html
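As a rough sketch of the framework idea (an illustration for this transcript, not the NASA framework referenced above), components that agree on a small, stable interface can be exchanged between centers and coupled without modifying one another's internals. The ModelComponent interface and the toy ocean/atmosphere components below are assumptions.

    # Sketch of a component framework interface (assumed names, not a specific NASA framework).
    from typing import Dict, Protocol

    class ModelComponent(Protocol):
        """The 'bond' between components: a small, stable interface every model implements."""
        def initialize(self, config: Dict[str, float]) -> None: ...
        def step(self, state: Dict[str, float]) -> Dict[str, float]: ...

    class OceanModel:
        def initialize(self, config: Dict[str, float]) -> None:
            self.mixing = config.get("mixing", 0.1)

        def step(self, state: Dict[str, float]) -> Dict[str, float]:
            state["sst"] += self.mixing          # toy warming term
            return state

    class AtmosphereModel:
        def initialize(self, config: Dict[str, float]) -> None:
            self.cooling = config.get("cooling", 0.05)

        def step(self, state: Dict[str, float]) -> Dict[str, float]:
            state["sst"] -= self.cooling         # toy feedback term
            return state

    def run_coupled(components, steps: int = 3) -> Dict[str, float]:
        """The driver only knows the interface, so components can be swapped freely."""
        state = {"sst": 15.0}
        for c in components:
            c.initialize({})
        for _ in range(steps):
            for c in components:
                state = c.step(state)
        return state

    print(run_coupled([OceanModel(), AtmosphereModel()]))  # {'sst': 15.15...}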

  11. Automatic Programming • Automation must help implement and integrate models by abstracting the hardware, automatically finding parallelism in code, and reducing the labor needed to write, link, and modify efficient codes.

  12. Automation • Define algorithms from data (discovery modeling). • Define problem solutions from problem descriptions.

  13. Automation • The benefit of automation is that it scales beyond the ability of humans to remember, recognize, and apply all optimizations to code. • Intelligent compilers and concise languages can reduce the labor and hardware expertise required for high-end computers, just as today's optimizing compilers surpassed the hand-optimizing skills of programmers of the early 1980s.

  14. Language Design – Recent History • Algorithms Imply Data Products: modifications to the procedural paradigm to accommodate reuse and concurrent execution. The hard part of programming is program understanding, that is, making the implied data product appear in the mind of the programmer. • Data Products Imply Algorithms: we need an abstraction that subsumes most of the technical needs, one where most control structures are implied and which can imply iterative and parallel structures, because…

  15. Because – Iteration/Recursion: • Complexity: The hard part of programming is making the implied data product appear in the programmer's mind when reading or writing the explicit control structures that produce or process the data product. • Since complex data products are iteratively or recursively defined, much of the complexity in writing programs centers on the development of nested iterative or recursive program structures. This is supported by the literature. • Software engineers have long realized that the construction of loops is complex and costly [Mills]. • Bishop noted that "Since Pratt's paper on the design of loop control structures was published more than a decade ago, there has been continued interest in the need to provide better language features for iteration." [Bishop]

  16. Because – Parallelism: • The affordability and availability of parallel computing facilities have increased dramatically in recent years. • The development of parallel programs, however, remains a difficult, time-consuming, and expensive process. It has been estimated [Pancake] that the development of parallel code costs, on average, $800 per line of code. • Even the migration of existing serial code to parallel execution, a problem of critical interest in many enterprises, may cost anywhere from $50 to $500 per line of code.
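To make the contrast between explicit and implied control structures concrete, here is a small illustration in ordinary Python (an addition for this transcript, not SequenceL and not from the talk): the first version spells out the loop, while the second states the data product and lets the language imply the iteration.

    # "Algorithms imply data products" vs. "data products imply algorithms" (illustrative only).
    measurements = [3.2, 4.8, 1.1, 5.6, 2.9]
    mean = sum(measurements) / len(measurements)

    # Procedural style: the iteration and accumulator are explicit, and the reader must
    # reconstruct the implied data product (the list of anomalies) in their head.
    anomalies_explicit = []
    for m in measurements:
        anomalies_explicit.append(m - mean)

    # Declarative style: the data product is stated directly; the iteration is implied.
    anomalies_implied = [m - mean for m in measurements]

    assert anomalies_explicit == anomalies_implied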

  17. Dynamic Scheduling: Quicksort • The diagram traces quicksort on [5,2,3,14,1,7,4,6]: the list is partitioned around a pivot into less-than (L) and greater-than (G) sublists, the sublists are partitioned again, and the sorted pieces are reassembled into [1,2,3,4,5,6,7,14]. • The number of processors in use grows and shrinks dynamically with the available parallelism at each level: 1 at the root, 2 and then 4 as the sublists split, and back to 1 for the final assembly.
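The same scheduling idea can be sketched in code. This is an illustrative Python sketch under assumed design choices (partition until there is roughly one piece per processor, sort the pieces concurrently, then reassemble), not the scheduler from the talk.

    # Sketch of dynamically scheduled parallel quicksort (assumed design, not the authors' code).
    from concurrent.futures import ProcessPoolExecutor

    def partition(xs):
        """Split xs around its first element into (less, pivot, greater-or-equal)."""
        pivot, rest = xs[0], xs[1:]
        return [x for x in rest if x < pivot], pivot, [x for x in rest if x >= pivot]

    def quicksort(xs):
        if len(xs) <= 1:
            return xs
        less, pivot, greater = partition(xs)
        return quicksort(less) + [pivot] + quicksort(greater)

    def parallel_quicksort(xs, processors=4):
        # Keep splitting the largest piece until there is roughly one piece per processor.
        pieces = [xs]
        while len(pieces) < processors and any(len(p) > 1 for p in pieces):
            pieces.sort(key=len)
            less, pivot, greater = partition(pieces.pop())
            pieces.extend([less, [pivot], greater])
        pieces = [p for p in pieces if p]
        # Sort the pieces concurrently, then reassemble them in key order
        # (assumes distinct keys, as in the slide's example).
        with ProcessPoolExecutor(max_workers=processors) as pool:
            sorted_pieces = list(pool.map(quicksort, pieces))
        sorted_pieces.sort(key=lambda p: p[0])
        return [x for piece in sorted_pieces for x in piece]

    if __name__ == "__main__":
        print(parallel_quicksort([5, 2, 3, 14, 1, 7, 4, 6]))  # [1, 2, 3, 4, 5, 6, 7, 14]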

  18. Automation Summary A high-level language (SequenceL) can provide an abstraction suitable for automatically generating iterative and parallel program structures. The language is based upon a simple execution strategy that follows an iterative consume, simplify, and produce process, halting when no further simplification is possible.
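The execution strategy can be illustrated with a toy evaluator. The sketch below is ordinary Python, not SequenceL, and the rewrite rules are assumptions; it only shows the consume, simplify, and produce loop halting at a fixed point.

    # Toy consume-simplify-produce loop (a sketch in Python, not SequenceL).
    def simplify(term):
        """One simplification pass: reduce ('add', a, b) and ('mul', a, b) when both arguments are numbers."""
        if isinstance(term, tuple):
            op, a, b = term
            a, b = simplify(a), simplify(b)
            if isinstance(a, (int, float)) and isinstance(b, (int, float)):
                return a + b if op == "add" else a * b
            return (op, a, b)
        return term

    def evaluate(term):
        while True:
            reduced = simplify(term)   # consume the term and simplify it
            if reduced == term:        # nothing changed: no further simplification is possible
                return term
            term = reduced             # produce the simplified term for the next pass

    # (2 + 3) * (1 + 4)  ->  25
    print(evaluate(("mul", ("add", 2, 3), ("add", 1, 4))))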

  19. What Is a Computational Grid? Grid refers to an infrastructure that enables the integrated, collaborative use of high-end computers, networks, databases, and scientific instruments owned and managed by multiple organizations. Grid applications often involve large amounts of data and/or computing, typically require secure resource sharing across organizational boundaries, and are thus not easily handled by today's Internet and Web infrastructures. • For more information, see http://www.mkp.com/grids/.

  20. Distributed High Performance Modeling Network • This is a model for the future. • Nodes and interconnects will evolve. • Picture the year 2020, with carbon-nano compute nodes doing protein-folding calculations, results stored in quantum devices and accessible to the world over exa-bit wireless networks! • Nodes shown in the diagram: desktop/your office, supercomputers, experimental facilities, high-speed networks, collaborative environments, databases, mass storage.

  21. Protocols • A successful protocol interface must be simple, well defined, and stable. • Data transmission protocols (TCP) and streaming protocols, such as those for streaming media, are maturing to support increasing bandwidth and stream sizes. • Emerging protocols such as CORBA support real-time instrumentation, and Java/RMI supports data-intensive applications.
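As a minimal illustration of a simple, well-defined transfer interface over TCP (an assumed framing scheme for this transcript, not a protocol from the original network), the sketch below prefixes each payload with an 8-byte length header so the receiver always knows exactly how many bytes to expect.

    # Sketch of a length-prefixed transfer protocol over TCP (assumed framing, illustrative only).
    import socket
    import struct
    import threading

    def send_block(conn, payload: bytes) -> None:
        conn.sendall(struct.pack(">Q", len(payload)) + payload)   # 8-byte length header, then data

    def recv_exact(conn, n: int) -> bytes:
        data = b""
        while len(data) < n:
            chunk = conn.recv(n - len(data))
            if not chunk:
                raise ConnectionError("peer closed the connection early")
            data += chunk
        return data

    def recv_block(conn) -> bytes:
        (length,) = struct.unpack(">Q", recv_exact(conn, 8))
        return recv_exact(conn, length)

    def server(port, ready):
        with socket.socket() as srv:
            srv.bind(("127.0.0.1", port))
            srv.listen(1)
            ready.set()
            conn, _ = srv.accept()
            with conn:
                print("received:", recv_block(conn).decode())

    if __name__ == "__main__":
        ready = threading.Event()
        t = threading.Thread(target=server, args=(5050, ready))
        t.start()
        ready.wait()
        with socket.create_connection(("127.0.0.1", 5050)) as cli:
            send_block(cli, b"sea surface temperature grid, day 001")
        t.join()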

  22. New Protocols • New protocols should support distributed parallel processing over clusters, scalable multicast, and intergroup data sharing. • In the future, flow control, congestion control, and latency problems must be resolved.

  23. Mission Complexity and Performance
