
Survey of Emerging IT Trends and Technologies

Survey of Emerging IT Trends and Technologies. Chaitan Baru, Monday, 10th Aug. Outline: trends in data sharing, and discovery/search; trends in service-oriented architectures; trends in computing and data infrastructure; the road ahead. Geoinformatics use cases.


Presentation Transcript


  1. Survey of Emerging IT Trends and Technologies Chaitan Baru Monday, 10th Aug

  2. OUTLINE • Trends in data sharing • And, Discovery/Search • Trends in service-oriented architectures • Trends in computing and data infrastructure • The road ahead

  3. Geoinformatics Use Cases • “…a user has access from a terminal to vast stores of data of almost any kind, with the easy ability to visualize, analyze and model those data.” • “For a given region (i.e. lat/long extent, plus depth), return a 3D structural model with accompanying geophysical parameters and geologic information, at a specified resolution”

  4. Implied IT Requirements • Search and discovery of resources • Integration of heterogeneous 3D / 4D Earth Science data • Integration of data with tools • Analysis and Visualization • Ability to feed data to tools, and analyze & visualize model outputs • (data-centric view…)

  5. Search and Discovery • Searching “structured data”, i.e. metadata catalogs • [Diagram: Search; Structured metadata catalogs]

  6. Search and Discovery • Searching “unstructured data”, i.e. the Web • Structured databases are a major component of the “Deep Web” • [Diagram: Search; The Web]

  7. Combined Search and Discovery • [Diagram: Search; Structured metadata catalogs; The Web]

  8. Advanced Search • Proposed: • Geoscience Knowledge System, GeoKnowSys • Built using Yahoo Build Your Own Search (BOSS) service • E.g. See wolframalpha.com

  9. Advanced Search: PaleoLit • Research project at Dept of CS, CMU • Dr. Judith Gelernter and Prof. Jaime Carbonell • Use ontologies to match search requests to related publications • Demo…

  10. Informatics Issues: The Informatics Progression • Courtesy: Prof. Peter Fox, RPI, CSIG’08

  11. The Computer Science / Domain Science continuum • Computer Science Topics: e.g. Database Systems, Semistructured data • IT Standards: e.g. ODBC, XML • Geoinformatics Standards: e.g. Ontologies, GeoSciML • Domain Standards: e.g. domain vocabularies, definitions (Geologic Time, rock description, …) • Domain Science Topics: e.g. geology

  12. The data interoperability onion • Layers: Social Networks, Semantics, Syntax, Systems • System Interop • Approaches: e.g., ODBC, JDBC, Java, Web services, … • Purview of: Computer Science • Syntactic • Approaches: Schema standards • Purview of: Standards organizations, domain science repositories, data archives • Semantic • Approaches: Controlled vocabularies, thesauri, domain ontologies • Purview of: Domain scientists • Social Networks • Approaches: recommendation systems • Purview of: social networking software (CS and domain science, data driven)

  13. Software interoperability onion • Layers: Social Networks, Semantics, Syntax, Systems • System Interop • Approaches: e.g., REST, Web services • Syntactic • Approaches: e.g., SOAP, WSDL • Semantic • Approaches: Controlled vocabularies, thesauri, domain ontologies • Purview of: Domain scientists • Social Networks • Approaches: recommendation systems • Purview of: social networking software • Service orchestration via workflow systems

  14. Geologic Map Integration

  15. Data Mediation • Dealing with heterogeneities in (distributed) data sources • Data may be in different “administrative domains” • Manage authentication • Data schemas may be different among sources • Terminologies may be different among sources, and between sources and the user • Software infrastructure (“stack”) may be different • Solve the problem with “middleware” • Layers of software between the original application and the end user • Mediator • Middleware that bridges across heterogeneities without requiring sources to change
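
  As a rough illustration of the mediator idea (bridging schema differences without changing the sources), the sketch below renames each source's fields into one shared view at query time. The source names `survey_a` and `survey_b`, the shared field names, and the pairing of columns are assumptions made for illustration, not part of the talk; the column names echo those listed on the WMS/WFS slide.

```python
# Hypothetical per-source mappings: source column name -> shared field name.
FIELD_MAPS = {
    "survey_a": {"UNIT_NAME": "unit", "ROCK_TYPE": "rock_type", "ERA": "age"},
    "survey_b": {"FORMATION": "unit", "LITH": "rock_type", "PERIOD": "age"},
}


def mediate(source, record):
    """Rename one source record into the shared view; unmapped fields are dropped."""
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def query_all(sources, predicate):
    """Apply one predicate, written against the shared view, to every source."""
    results = []
    for name, records in sources.items():
        for record in records:
            unified = mediate(name, record)
            if predicate(unified):
                results.append({"source": name, **unified})
    return results
```

  The sources keep their own schemas; only the mapping table in the middleware has to know about them.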

  16. A Data Integration Example: Geologic Maps • Heterogeneities: • Operating system • File storage • Database schemas • Data Semantics • [Map: MT, WY, ID, NV, UT, AZ, CO, NM] • [Diagram labels: DB2; SRB; GML; Shapefile (ESRI); PostGIS; Oracle; Windows; Linux; iMac]

  17. Adopting WMS/WFS: Can provide Syntactic Integration • Advantages: • Integrated presentation • Uniform syntactical structure • Uniform spatial definition • Problems: • Each resource may use a different schema • Difficult to build a uniform query interface for multiple resources • [Map: MT, WY, ID, NV, UT, AZ, CO, NM, each served via WMS] • [Schema fields shown: FORMATION; UNIT_NAME; ROCK_TYPE; ERA; SYSTEM; SERIES; LITH; ROCK_TYPE; PERIOD]
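
  To make the "uniform syntactical structure" point concrete, here is a minimal sketch of building a WMS 1.1.1 GetMap request URL. The same parameter set works against any compliant server, which is what gives syntactic integration. The endpoint and layer name in the usage note are placeholders, not services mentioned in the talk.

```python
from urllib.parse import urlencode


def getmap_url(endpoint, layer, bbox, width=512, height=512):
    """Build a WMS 1.1.1 GetMap URL; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "SRS": "EPSG:4326",  # geographic lat/long coordinates
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return endpoint + "?" + urlencode(params)
```

  For example, `getmap_url("https://example.org/wms", "geology", (-114.0, 37.0, -109.0, 42.0))` requests a map image for a hypothetical `geology` layer. The request syntax is uniform, but the attribute schema behind each layer can still differ, which is the problem the slide names.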

  18. GeoSciML: Can Provide Schema Integration • Advantages: • Integrated schema • Partial integrated semantics • Problem: • Each resource may use different vocabulary and semantic model • [Map: MT, WY, ID, NV, UT, AZ, CO, NM, each served via GeoSciML] • [Diagram labels: British Rock Classification; Multi-hierarchical Rock Classification]

  19. Semantic Mediation with GeoSciML • Mappings may also be needed between the data and the application ontology • E.g., say, mapping 240 mya to Mesozoic • [Diagram labels: British Rock Classification; Multi-hierarchical Rock Classification; NM; CO; Semantic Mapping; GeoSciML; Application Ontology]
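
  The data-to-ontology mapping mentioned here (a numeric age such as 240 mya mapped to the term Mesozoic) can be sketched as a range lookup. The era boundaries below are rounded, approximate values assumed for illustration; a real system would take them from a maintained geologic time scale.

```python
# Approximate era boundaries in millions of years ago (assumed, rounded values).
# Each entry is (era name, younger bound, older bound).
ERAS = [
    ("Cenozoic", 0.0, 66.0),
    ("Mesozoic", 66.0, 252.0),
    ("Paleozoic", 252.0, 541.0),
]


def era_for_age(mya):
    """Return the era whose range contains the age, or None if it is not covered."""
    for name, younger, older in ERAS:
        if younger <= mya < older:
            return name
    return None


def age_range(era):
    """Return the (younger, older) bounds for an era name, or None if unknown."""
    for name, younger, older in ERAS:
        if name == era:
            return (younger, older)
    return None
```

  With this table, `era_for_age(240)` yields `"Mesozoic"`, and `age_range("Paleozoic")` lets a query phrased with an era name be rewritten as a numeric range for sources that store ages as numbers.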

  20. Query Rewriting: Example: A Rock Classification Ontology • [Diagram: ontology branches Genesis; Fabric; Composition; Texture]

  21. Query: Concept Expansion • Concept expansion: what else to look for when user asks for ‘Mafic’ • [Diagram: Composition branch of the ontology]
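
  Concept expansion can be sketched as collecting a concept together with everything below it in the ontology, so that a query for ‘Mafic’ also matches records labelled with more specific terms. The small hierarchy below is a hypothetical fragment for illustration, not the ontology shown in the talk.

```python
# Hypothetical fragment of a rock composition hierarchy: concept -> narrower concepts.
CHILDREN = {
    "Composition": ["Felsic", "Mafic"],
    "Felsic": ["Granite", "Rhyolite"],
    "Mafic": ["Basalt", "Gabbro"],
}


def expand(concept):
    """Return the concept followed by all of its descendants, depth first."""
    terms = [concept]
    for child in CHILDREN.get(concept, []):
        terms.extend(expand(child))
    return terms
```

  A query rewriter would then replace a test on the single term ‘Mafic’ with a test against the whole list returned by `expand("Mafic")`.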

  22. Query: Concept Generalization • Generalization: finding data that are ‘like’ X and Y • [Diagram: Composition branch of the ontology]
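
  One way to read generalization is as finding the most specific concept that covers both X and Y, then searching under it. The sketch below does this with a lowest-common-ancestor walk over a hypothetical parent table; both the table and this interpretation are assumptions, not the method presented in the talk.

```python
# Hypothetical fragment of a rock composition hierarchy: concept -> broader concept.
PARENT = {
    "Basalt": "Mafic",
    "Gabbro": "Mafic",
    "Granite": "Felsic",
    "Rhyolite": "Felsic",
    "Mafic": "Composition",
    "Felsic": "Composition",
}


def ancestors(concept):
    """Return the concept followed by each broader concept up to the root."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def generalize(x, y):
    """Return the most specific concept that covers both x and y, or None."""
    covering_x = set(ancestors(x))
    for concept in ancestors(y):
        if concept in covering_x:
            return concept
    return None
```

  Data ‘like’ Basalt and Gabbro would then be everything classified under `generalize("Basalt", "Gabbro")`.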

  23. Ontology-based Geologic Map Integration: Implemented in GEON • Show formations where AGE = ‘Paleozoic’ (without age ontology) • Show formations where AGE = ‘Paleozoic’ (with age ontology) • Domain knowledge and knowledge representation: Geologic Age ontology • +/- a few hundred million years • [Map: Nevada]

  24. ODAL, SOQL, and Data Integration Carts™ • ODAL: Ontological Database Annotation Language • Create a partial model of ontologies from database • [Diagram: GUI; generate; ODAL processor] • Example annotation: <odal:NamedIndividuals odal:id="RockSample" odal:database="VTDatabase"> <odal:Class odal:resource="http://geon.vt.edu#RockSample" /> <odal:Table>Samples</odal:Table> <odal:Table>RockTexture</odal:Table> <odal:Table>RockGeoChemistry</odal:Table> <odal:Table>ModalData</odal:Table> <odal:Table>MineralChemistry</odal:Table> <odal:Table>Images</odal:Table> <odal:Column>ssID</odal:Column> </odal:NamedIndividuals> • The values in the column ssID of the tables Samples, RockTexture, RockGeoChemistry, ModalData, MineralChemistry and Images represent instances of RockSample

  25. SOQL: Simple Ontology Query Language • Query single or many resources • via ontologies (i.e., high level logical views) • independent of physical representation (i.e. schemas) • [Diagram: RockSample with properties location (Location: lat, long) and hasSiO2 (ValueWithUnit: value, unit); types float, string] • [Diagram: GUI; generate; SOQL processor] • Example query: SELECT X.location.* FROM RockSample X WHERE X.location.lat > 60 AND X.location.long > 100 AND X.hasSiO2.value < 30 AND X.hasSiO2.unit = ‘weightPercetage’

  26. Issues in sharing data: Primary vs secondary (derived) • [Diagram: Collect Data; Process and Visualize; Share Results; Share data; Share intermediate results]

  27. Sources of Data • Distributed data collections • By individual PIs • “Informal” sharing, e.g. via social network • “Formal” sharing, e.g. via submission to community data archives / databases • Centralized data collections • E.g. via a large project (standardized protocols) • By agencies (internal protocols) • Metadata to the rescue • Data description standards • Process description standards (workflows) • State Surveys and USGS are major sources

  28. Major Interoperability Efforts • OneGeology.org • International initiative of geological surveys to create dynamic geological map data available via the web. • US Geoscience Information Network (US GIN) • Led by Lee Allison, AZGS

  29. Federating Metadata Catalogs • Local vs Community “View” • Individual data providers may choose to “export” a community view • Direct access to the source may still provide more “rich” access to data • Federated Catalogs • The Geosciences Information Network, GIN approach • Adopt standards for catalog content (ISO) and implementation (CSW)

  30. Interoperation between GEON and GEO Grid • Implement CSW interfaces • Collaboration with the NSF PRAGMA project (Pacific Rim Applications and Grid Middleware Assembly) • [Diagram labels: REQUEST; RESPONSE; CSW; CSW Composite Service; GEON Catalog; Geogrid Catalog; ADN; Catalog Service Web; Catalog Service Web Adapter; Storage; SRB; WMS URL; WMS Server; 600 scenes/day]

  31. Integration & Visualization of 3D/4D data • “For a given region (i.e. lat/long extent, plus depth), return a 3D structural model with accompanying physical parameters of density, seismic velocities, geochemistry, and geologic ages, using a cell size of 10km” • Derived 3D volumetric model • Multiple isosurfaces with different transparencies • Slices through the volume • Variable gridding: data typically has lower resolution at greater depths • 2D surface data: Topography (“2.5D”), satellite imagery, street maps, geologic maps, fault lines, and other derived features etc. • Bore hole or well data and point observations.

  32. OpenEarth Framework Goals Geoscience Integration: • Data types - topography, imagery, bore hole samples, velocity models from seismic tomography, gravity measurements, simulation results… • Data coordinate spaces and dimensionality - 2D and 3D spatial representations and 4D that covers the range of geologic processes (EQ cycle to deep time).

  33. OpenEarth Framework Goals Structural Integration: • Data formats – shapefiles, NetCDF, GeoTIFF, and other formal and de facto standards. • Data models - 2D and 3D geometry to semantically richer models of features and relationships between those features. • Data delivery methods & Storage Schemes - local files to database queries, web services (WMS, WFS) and services for new data types (large tomographic volumes, etc.).

  34. OEF Philosophy • OEF focused on integrating data spanning the geosciences. • Open software architecture and corresponding software that can properly access, manipulate and visualize the integrated data. • Open source to provide the necessary flexibility for academic research and to provide a flexible test bed for new data models and visualization ideas.

  35. OEF Architecture

  36. OEF Architecture • Data Integration Services: • Designed to support rapid visualization of integrated datasets • operations to grid data, resample it at multiple resolutions and subdivide data to better support progressive changes to the display as the user pans and zooms
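
  Resampling at multiple resolutions is commonly done by building a pyramid of progressively coarser grids, so that a zoomed-out view can be served from a small grid and refined as the user zooms in. The sketch below is a generic illustration using 2x2 block averaging on a plain list-of-lists grid; it is an assumption about how such a service might work, not the OEF implementation.

```python
def can_halve(grid):
    """True if both grid dimensions are even and at least 2."""
    rows, cols = len(grid), len(grid[0])
    return rows >= 2 and cols >= 2 and rows % 2 == 0 and cols % 2 == 0


def downsample(grid):
    """Halve the resolution by averaging each 2x2 block of cells."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def build_pyramid(grid):
    """Return [full resolution, half, quarter, ...] until the grid cannot be halved."""
    levels = [grid]
    while can_halve(levels[-1]):
        levels.append(downsample(levels[-1]))
    return levels
```

  A viewer would request the coarsest level first and fetch finer levels only for the region currently on screen.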

  37. OEF Architecture • Visualization Tools: • Run on the user's computer, dynamically query spatial and temporal data from the OEF services • Use 3D graphics hardware for fast display • Open architecture supports multiple visualization tools authored throughout the community (e.g. GEON IDV) • New viz capabilities developed as necessary

  38. OEF Visualization

  39. The software services stack • Example: GEON • Pushing down the service interface • [Diagram: Compute nodes; Disk Storage]

  40. Software as a Service: At different levels of software • Software as a Service: SaaS • E.g., Google Apps, Salesforce.com, SAP, … • Infrastructure as a Service, IaaS • E.g., Amazon EC2, … • Platform as a Service, PaaS • [Diagram: SaaS; PaaS; IaaS; Compute nodes; Disk Storage]

  41. The evolving computational architecture • Mainframe computers (institutional computing) • Minicomputers (departmental computing) • Workstations (laboratory computing) • Laptops (personal computing) • …back to the future..??

  42. Cloud Computing: A meeting of trends • Price/performance of computing platforms • Models for system management (autonomic computing) • Capabilities of networking and distributed systems • Cost of Ownership • Data Volumes

  43. Cloud Computing Origins • Cloud computing: Many definitions • Here’s one: Use of remote data centers to manage scalable, reliable, on-demand access to applications • Origins • Goes back to the need by Web search engines to inexpensively process all the pages on the Web • Done by creating a grid of datacenters and processing data in parallel across them • Development of a parallel data programming environment by Google: MapReduce • Data + cloud computing • what about remote centers for scalable, reliable, on-demand access to data?
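
  The MapReduce programming model mentioned here can be illustrated with a single-process word count: a map step emits key/value pairs, a shuffle step groups values by key, and a reduce step combines each group. This is a toy sketch of the model only; it is not Google's implementation and does none of the distribution or fault handling a real framework provides.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the values for one key; here, sum the counts."""
    return sum(values)


def map_reduce(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return {key: reduce_phase(key, values) for key, values in shuffle(pairs).items()}
```

  In a real deployment the map and reduce calls run in parallel on many machines, because each call depends only on its own input.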

  44. Cloud Computing • A different pricing model • No upfront cost of acquisition. Rent don’t buy. • Can access 1000’s of processors / disks • Scalability • “Elastic computing” • A different model for dealing with system failures • Retry, loose consistency, …
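
  The "retry" failure model can be sketched as a small wrapper that repeats an operation a bounded number of times with a growing delay between attempts. The function name, the defaults, and the doubling delay are assumptions for illustration, not a specific cloud provider's API.

```python
import time


def with_retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are exhausted.

    The delay doubles after each failure (base_delay, 2*base_delay, ...).
    The sleep function is injectable so callers and tests can avoid real waits.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * (2 ** attempt))
```

  This treats failure as a normal, transient event handled by the caller, rather than something the infrastructure promises to prevent.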

  45. Cloud computing for data • Data as a service: what is the abstraction for storage? • Table, Blob, Queue • …?? • Describing characteristics of the data • Metadata about storage to specify policies to be applied • Security, reliability, performance, etc • Scaling to meet application needs • Large configurations • Dealing with virtualization • New failure models • Retry, loose consistency

  46. Storage as a Service • Amazon S3: An example • Charges for Storage, Data Transfer, and Requests (e.g. PUT, COPY, POST, LIST, GET) • Issues • Bandwidth to storage • Quality of Service • Storage Elasticity • Privacy / security • Standardization efforts • Storage Networking Industry Association (SNIA) Technical Working Group (TWG) on Cloud Storage has just started • Important Issues • Metadata for storage • Scaling up to large dataset sizes
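
  The three charge components named here (storage, data transfer, requests) suggest a simple additive cost model. The sketch below takes every price as a parameter; the figures in the usage note are made-up placeholders, not Amazon's actual rates.

```python
def monthly_cost(stored_gb, transfer_gb, requests,
                 price_per_gb_month, price_per_gb_transfer, price_per_1000_requests):
    """Estimate a monthly bill as storage + data transfer + request charges."""
    storage = stored_gb * price_per_gb_month
    transfer = transfer_gb * price_per_gb_transfer
    request_charge = (requests / 1000.0) * price_per_1000_requests
    return storage + transfer + request_charge
```

  With placeholder prices of 0.10 per GB-month, 0.15 per GB transferred, and 0.01 per 1,000 requests, storing 1,000 GB, transferring 100 GB, and issuing one million requests comes to 100 + 15 + 10 = 125 per month. A model like this makes it easy to see which component dominates for a given access pattern.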

  47. The two sides of Cloud Computing • Large distributed infrastructure • “Everything is in the cloud” • Interesting as a proposition for the IT operations of an enterprise • Cloud companies would like to reach deep into enterprise IT • “Our business is not the entrenched data centers in current large organizations, but the new companies…” • Large-scale infrastructure in the Datacenter • Seeding the cloud • Shared-nothing parallelism • Data on the cheap…a la Google

  48. The NSF Cluster Exploratory (CluE) Program • Google-IBM-NSF Cluster • Well over a thousand processors • When fully built out, will comprise approximately 1,600 processors • Terabytes of memory • Hundreds of terabytes of storage • Open source software • Linux and Apache Hadoop • IBM Tivoli • System management, monitoring and dynamic resource provisioning • A platform for “apples-to-apples” comparisons • Can reserve time on nodes for exclusive access

  49. Our CluE Project • Project (PI: Baru; co-PI: Krishnan) • Performance Evaluation of On-Demand Provisioning Strategies for Data Intensive Applications • Investigate hybrid software model • Database system / Hadoop system • Some parts of the application require features provided by a DBMS • Transactional capability, full SQL support • Other parts of the application can exploit Hadoop model • Very large data sets • Data parallel processing • Loose consistency models • Price / performance is an issue • Including energy costs

  50. San Andreas Fault LiDAR Dataset: Data Access Patterns • B4 Dataset
