This workshop aims to explore the role of cyberinfrastructure in policy informatics, focusing on the management and understanding of large datasets in an interconnected world. Topics include distributed systems, web services, grids, and more.
Computational Infrastructure for Policy Informatics • Policy Informatics in an Interdependent World Workshop, Washington DC, September 13 2007 • Geoffrey Fox, Computer Science, Informatics, Physics, Pervasive Technology Laboratories, Indiana University, Bloomington IN 47401 • http://grids.ucs.indiana.edu/ptliupages/presentations/ • gcf@indiana.edu • http://www.infomall.org
e-moreorlessanything • ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ (from its inventor John Taylor, Director General of Research Councils UK, Office of Science and Technology) • e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research • Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world • This generalizes to e-moreorlessanything, including presumably e-Policyinformatics • A deluge of data of unprecedented and inevitable size must be managed and understood • People (see Web 2.0), computers, data and instruments must be linked • On-demand assignment of experts, computers, networks and storage resources must be supported
Role of Cyberinfrastructure • Cyberinfrastructure is infrastructure that supports distributed science (e-Science): data, people, computers • It exploits Internet technology (Web 2.0), adding (via Grid technology) management, security, supercomputers etc. • It has two aspects: parallel, with low latency (microseconds) between nodes, and distributed, with highish latency (milliseconds) between nodes • Parallel is needed to get high performance on individual large simulations, data analysis etc.; one must decompose the problem • The distributed aspect integrates already distinct components and is especially natural for data • Cyberinfrastructure is in general a distributed collection of parallel systems • Cyberinfrastructure is made of services (originally Web services) that are “just” programs or data sources packaged for distributed access
Structure of Cyberinfrastructure • Distributed software systems are being “revolutionized” by developments from e-commerce, e-Science and the consumer Internet. There is rapid progress in technology families termed “Web services”, “Grids” and “Web 2.0” • The emerging distributed system picture is of distributed services with advertised interfaces but opaque implementations communicating by streams of messages over a variety of protocols • Complete systems are built by combining either services or predefined/pre-existing collections of services together to achieve new capabilities • As well as Internet/Communication revolutions (distributed systems), multicore chips will likely be hugely important (parallel systems) • Industry not academia is leading innovation in these technologies
Policy Informatics Infrastructure • The Party Line approach is clear – one creates a Cyberinfrastructure consisting of distributed services accessed by portals/gadgets/gateways/RSS feeds • Services include: • “original data” • Transformations or filters implementing DIKW (Data Information Knowledge Wisdom) pipeline • Final “Decision Support” step converting wisdom into action • Generic services such as security, profiles etc. • Some filters could correspond to large simulations • Infrastructure will be set up as a System of Systems (Grids of Grids) • Services and/or Grids just accept some form of DIKW and produce another form of DIKW • “Original data” has no explicit input; just output
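The DIKW pipeline described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Message` type, the filter names (`summarize`, `detect_trend`, `assess`, `decide`) and the sample readings are all hypothetical and do not come from the talk. Each filter accepts one form of DIKW and produces another, and the "original data" service has no explicit input.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Message:
    level: str       # "data", "information", "knowledge", "wisdom" or "decision"
    payload: object


Filter = Callable[[Message], Message]


def original_data() -> Message:
    """An "original data" service: no explicit input, only output."""
    # Hypothetical readings, e.g. weekly counts for some indicator.
    return Message("data", [12, 15, 11, 40, 42])


def summarize(msg: Message) -> Message:
    """Data -> Information: reduce raw readings to summary values."""
    values = msg.payload
    return Message("information",
                   {"mean": sum(values) / len(values), "latest": values[-1]})


def detect_trend(msg: Message) -> Message:
    """Information -> Knowledge: is the latest value above the mean?"""
    info = msg.payload
    return Message("knowledge", {"rising": info["latest"] > info["mean"]})


def assess(msg: Message) -> Message:
    """Knowledge -> Wisdom: turn the trend into a recommendation."""
    return Message("wisdom", "intervene" if msg.payload["rising"] else "monitor")


def decide(msg: Message) -> Message:
    """Final "Decision Support" step: convert wisdom into an action."""
    return Message("decision", f"action: {msg.payload}")


def run_pipeline(source: Callable[[], Message], filters: List[Filter]) -> Message:
    """Pass the source's output through each filter in turn."""
    msg = source()
    for f in filters:
        msg = f(msg)
    return msg


result = run_pipeline(original_data, [summarize, detect_trend, assess, decide])
print(result.level, "|", result.payload)
```

In a real deployment each function would be a separate network service; here they are local callables so the data flow is easy to follow.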
[Figure: Grid of Grids architecture. Sensor Services (SS) and a Database feed Filter Services (FS), Other Services (OS) and MetaData (MD) linked by Inter-Service Messages; the stages run Raw Data → Data → Information → Knowledge → Wisdom → Decisions, with a Portal and connections to Another Grid and Another Service.]
Information Management/Processing • Diagram describes e-Science, Military Command and Control and perhaps Policy Informatics • (SOAP or just RSS) messages transport information expressed in a semantically rich fashion between sources and services that enhance and transform information so that the complete system provides the Data → Information → Knowledge → Wisdom transformation • Semantic Web technologies like RDF and OWL might help us achieve rich expressivity but they might be too complicated • We are meant to build application-specific information management/transformation systems for each domain • Each domain has specific services/standards (for APIs and information) and will use generic services (like R for data mining) and standards (RDF, WSDL) • What is PIML, a Policy Informatics Markup Language? • Standards made before consensus or not observant of technology progress are dubious (cf. HLA in simulation or many Grid standards)
Too much Computing? • Historically one has tried to increase computing capabilities by • Optimizing performance of codes • Exploiting all possible CPUs such as graphics co-processors and “idle cycles” • Making central computers available such as NSF/DoE/DoD supercomputer networks • The next crisis in the technology area will be the opposite problem: commodity chips will be 32-128 way parallel in 5 years' time and we currently have no idea how to use them, especially on clients • Only 2 releases of standard software (e.g. Office) in this time span • Gaming and generalized decision support (data mining) are two obvious ways of using these cycles • Intel RMS analysis • Note even cell phones will be multicore • “Too much data” is matched to “Too much computing” but the implications are rather different
RMS: Recognition, Mining, Synthesis • Recognition (“What is …?”): model • Mining (“Is it …?”): find a model instance • Synthesis (“What if …?”): create a model instance • Today: model-less recognition; real-time streaming and transactions on static, structured datasets; very limited realism • Tomorrow: model-based multimodal recognition; real-time analytics on dynamic, unstructured, multimodal datasets; photo-realism and physics-based animation
Recognition, Mining, Synthesis • Recognition: What is a tumor? • Mining: Is there a tumor here? • Synthesis: What if the tumor progresses? • It is all about dealing efficiently with complex multimodal datasets • Images courtesy: http://splweb.bwh.harvard.edu:8000/pages/images_movies.html
What should we do? • There will be high quality parallel data mining algorithms • Speech recognition, text and multimedia search and browsers • New generation of desktop aides • What are the synergies between “Personal aides in an information rich world” (future of PC?) and Policy Informatics? • What filters (data mining) does policy informatics need? • As computing is free, focus on identifying the information/knowledge/wisdom needed (there is probably too much data but not so much wisdom in the DIKW pipeline) • We should use supercomputer/computer services but information services are more important and less “controversial” • Identify standards for data and data-mining APIs • Set up distributed Policy Informatics Services • Use Web 2.0 (as it makes things easier) not current Grids (which make things harder) • Build a “Programmable Policy Informatics Web” • Emphasize simplicity • Is “Secrecy” important and in fact viable? • Should we care just about “original data” or also about the whole DIKW pipeline?
Web 2.0 Mashups and APIs • http://www.programmableweb.com/apis listed (Sept 12 2007) 2312 Mashups and 511 Web 2.0 APIs, with Google Maps the most often used in Mashups • Mashups are called workflow in the Grid arena
The List of Web 2.0 API’s • Each site has API and its features • Divided into broad categories • Only a few used a lot (49 API’s used in 10 or more mashups) • RSS feed of new APIs • Amazon S3 growing in popularity
Grid Service Philosophy I • Services receive data in SOAP messages, manipulate it and produce transformed data as further messages • Knowledge is created from information by services • Information is created from data by services • Semantic Grid comes from building metadata rich systems of services • Meta-data is carried in SOAP messages • The Grid enhances Web services with semantically rich system and application specific management • One must exploit and work around the different approaches to meta-data (state) and their manipulation in Web Services
Grid Service Philosophy II • There is a horde of support services supplying security, collaboration, database access, user interfaces • The support services are associated with either the system or the application, where the former are WS-* and GS-*, which implicitly or explicitly define many support services • There are generalized filter services, which are applications that accept messages and produce new messages with some data derived from that in the input • Simulations (including PDEs and reactive systems), data mining, transformations, agents, reasoning and decision-making tools are all termed filters here • Agent systems are a special case of Grids • Peer-to-peer systems can be built as a Grid with particular discovery and messaging strategies
Grid Service Philosophy III • Filters can be a workflow which means they are “just collections of other simpler services” • Grids are distributed systems that accept distributed messages and produce distributed result messages • A service or a workflow is a special case of a Grid • A collection of services on a multi-core chip is a Grid • Sensors or Instruments are “managed” by services; they may accept non SOAP control messages and produce data as messages (that are not usually SOAP)
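The idea that a workflow is "just a collection of other simpler services", and is itself usable as a service, can be sketched as follows. This is a minimal illustration, not the talk's implementation: the `Service` and `Workflow` classes, the dictionary message format and the `trace` field are assumptions made for the example.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]


class Service:
    """A filter: accepts a message and produces a new message."""

    def __init__(self, name: str, transform: Callable[[Message], Message]):
        self.name = name
        self._transform = transform

    def handle(self, message: Message) -> Message:
        out = dict(self._transform(message))
        # Record which services touched the message (illustrative metadata).
        out["trace"] = list(message.get("trace", [])) + [self.name]
        return out


class Workflow(Service):
    """A workflow has the same interface as a service, so workflows can nest."""

    def __init__(self, name: str, steps: List[Service]):
        super().__init__(name, lambda m: m)
        self.steps = steps

    def handle(self, message: Message) -> Message:
        for step in self.steps:
            message = step.handle(message)
        return message


clean = Service("clean", lambda m: {**m, "values": [v for v in m["values"] if v is not None]})
average = Service("average", lambda m: {**m, "mean": sum(m["values"]) / len(m["values"])})
flag = Service("flag", lambda m: {**m, "alert": m["mean"] > 10})

inner = Workflow("summarize", [clean, average])   # a workflow of services
outer = Workflow("grid", [inner, flag])           # a workflow containing a workflow

result = outer.handle({"values": [8, None, 16]})
print(result)
```

Because `Workflow` exposes the same `handle` method as `Service`, a caller cannot tell whether it is talking to a single service or to a composition, which is the sense in which a service or workflow is a special case of a Grid.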
Virtual Observatory Astronomy Grid: Integrate Experiments • [Figure panels: Radio, Far-Infrared, Visible, Dust Map, Visible + X-ray, Galaxy Density Map]
Service or Web service Approach • One uses GML, CML etc. to define the data in a system and one uses services to capture “methods” or “programs” • In eScience, important services fall in three classes • Simulations • Data access, storage, federation, discovery • Filters for data mining and manipulation • Services use something like WSDL (Web Services Description Language) to define interoperable interfaces (see OPAL talk!) • WSDL establishes a “contract” independent of implementation between two services or a service and a client • Services should be loosely coupled, which normally means they are coarse grain • Services will be composed (linked together) by mashups (typically scripts) or workflow (often XML – BPEL) • Software Engineering and Interoperability/Standards are closely related
Philosophy of Web Service Grids • Much of Distributed Computing was built by natural extensions of computing models developed for sequential machines • This leads to the distributed object (DO) model represented by Java and CORBA • RPC (Remote Procedure Call) or RMI (Remote Method Invocation) for Java • Key people think this is not a good idea as it scales badly and ties distributed entities together too tightly • Distributed Objects Replaced by Services • Note CORBA was considered too complicated in both organization and proposed infrastructure • and Java was considered as “tightly coupled to Sun” • So there were other reasons to discard • Thus replace distributed objects by services connected by “one-way” messages and not by request-response messages
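The contrast between request-response calls and services connected by "one-way" messages can be illustrated with in-process queues standing in for the network. This is a sketch under assumptions: the `service` function, the queue names and the `None` shutdown sentinel are invented for the example, and a real system would use a messaging layer rather than threads.

```python
import queue
import threading


def service(inbox: queue.Queue, outbox: queue.Queue, transform) -> None:
    """Consume one-way messages from inbox and emit new messages to outbox."""
    while True:
        msg = inbox.get()
        if msg is None:            # sentinel: stop and tell the next service to stop
            outbox.put(None)
            break
        outbox.put(transform(msg))


a_in, b_in, results = queue.Queue(), queue.Queue(), queue.Queue()
workers = [
    threading.Thread(target=service, args=(a_in, b_in, lambda x: x * 2)),
    threading.Thread(target=service, args=(b_in, results, lambda x: f"value={x}")),
]
for w in workers:
    w.start()

# The sender posts messages and moves on; it never blocks waiting for a reply.
for x in (1, 2, 3):
    a_in.put(x)
a_in.put(None)

for w in workers:
    w.join()

received = []
while True:
    item = results.get()
    if item is None:
        break
    received.append(item)
print(received)
```

Neither service knows who produced its input or who will consume its output; they share only the message format, which is the loose coupling the slide argues for.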
Web services • Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles. • Web Services interact by exchanging messages in SOAP format • The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.
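To make "exchanging messages in SOAP format" concrete, the sketch below builds and reads a minimal SOAP 1.1 envelope using only the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the application namespace `http://example.org/policy`, the `GetIndicator` operation and its `region` and `year` fields are hypothetical and stand in for whatever a WSDL contract would actually define.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/policy"                   # hypothetical application namespace


def build_request(region: str, year: int) -> bytes:
    """Serialize a hypothetical GetIndicator request as a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}GetIndicator")
    ET.SubElement(request, f"{{{APP_NS}}}region").text = region
    ET.SubElement(request, f"{{{APP_NS}}}year").text = str(year)
    return ET.tostring(envelope, encoding="utf-8")


def read_request(raw: bytes) -> dict:
    """Parse the envelope on the receiving side and extract the parameters."""
    root = ET.fromstring(raw)
    request = root.find(f"{{{SOAP_NS}}}Body/{{{APP_NS}}}GetIndicator")
    return {
        "region": request.findtext(f"{{{APP_NS}}}region"),
        "year": int(request.findtext(f"{{{APP_NS}}}year")),
    }


raw = build_request("IN", 2007)
print(raw.decode("utf-8"))
print(read_request(raw))
```

In practice a SOAP toolkit would generate this marshalling code from the WSDL; writing it by hand here only shows what travels on the wire.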
A typical Web Service • In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI messages, CGI Web invocations, or totally compiled away (inlining) • The simplest implementations involve XML messages (SOAP) and programs written in net-friendly languages like Java and Python • [Figure labels: Portal Service; Security; Catalog; Payment (Credit Card); Warehouse (Shipping control); Web Services; WSDL interfaces]
The Grid and Web Service Institutional Hierarchy • 4: Application or Community of Interest (CoI) Specific Services such as “Map Services”, “Run BLAST” or “Simulate a Missile” (XBML, XTCE, VOTABLE, CML, CellML) • 3: Generally Useful Services and Features (OGSA and other GGF, W3C) such as “Collaborate”, “Access a Database” or “Submit a Job” (OGSA GS-* and some WS-* from GGF/W3C/…; XGSP for collaboration) • 2: System Services and Features (WS-* from OASIS/W3C/Industry): handlers like WS-RM, Security, UDDI Registry • 1: Container and Run Time (Hosting) Environment (Apache Axis, .NET etc.) • Must set standards to get interoperability
Two-level Programming I • The Web Service (Grid) paradigm implicitly assumes a two-level Programming Model • We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies • C++, Java or Fortran Monte Carlo module • Data streaming from a sensor or satellite • Specialized (JDBC) database access • Such services accept and produce data from users, files and databases • The Grid is built by coordinating such services, assuming we have solved the problem of programming the service • [Figure labels: Service, Data]
Two-level Programming II • The Grid is discussing the composition of distributed services, with the runtime interfaces to the Grid as opposed to UNIX pipes/data streams • Familiar from use of UNIX Shell, Perl or Python scripts to produce real applications from core programs • Such interpretative environments are the single processor analog of Grid Programming • Some projects like GrADS from Rice University are looking at integration between service and composition levels but the dominant effort looks at each level separately • [Figure: Service1, Service2, Service3 and Service4 linked together]
Grid Workflow Data Assimilation in Earth Science • Grid services triggered by abnormal events and controlled by workflow process real-time data from radar and high resolution simulations for tornado forecasts • [Figure: typical graphical interface to service composition]
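The pattern of services "triggered by abnormal events and controlled by workflow" might be sketched as below. Everything here is a stand-in: the threshold, the readings, the `forecast_workflow` steps and the risk formula are hypothetical placeholders for real radar feeds and high resolution simulations.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def detect_abnormal(readings: Iterable[float], threshold: float) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for each reading that exceeds the threshold."""
    for i, value in enumerate(readings):
        if value > threshold:
            yield i, value


def forecast_workflow(index: int, value: float) -> Dict[str, float]:
    """Hypothetical workflow: assimilate the observation, then 'simulate' a risk."""
    assimilated = {"index": index, "reflectivity": value}
    # Stand-in for a high resolution simulation: scale the reading into [0, 1].
    risk = min(1.0, value / 100.0)
    return {**assimilated, "risk": risk}


def monitor(readings: Iterable[float], threshold: float = 60.0) -> List[Dict[str, float]]:
    """Run the workflow only for the readings flagged as abnormal."""
    return [forecast_workflow(i, v) for i, v in detect_abnormal(readings, threshold)]


alerts = monitor([20.0, 35.0, 75.0, 40.0, 90.0])
for alert in alerts:
    print(alert)
```

The expensive step runs only when the detector fires, which is the point of event-triggered workflow: compute resources are assigned on demand rather than continuously.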