Streams – DataStage Integration InfoSphere Streams Version 3.0

Mike Koranda, Release Architect. Agenda: what is InfoSphere Information Server and DataStage; integration use cases; architecture of the integration solution; tooling.


Presentation Transcript


  1. Streams – DataStage Integration, InfoSphere Streams Version 3.0. Mike Koranda, Release Architect

  2. Agenda
     • What is InfoSphere Information Server and DataStage?
     • Integration use cases
     • Architecture of the integration solution
     • Tooling

  3. Information Integration Vision
     Address information integration in the context of a broad and changing environment. Simplify & accelerate: design once and leverage anywhere.
     • Transform Enterprise Business Processes & Applications with Trusted Information
     • Deliver Trusted Information for Data Warehousing and Business Analytics
     • Build and Manage a Single View
     • Secure Enterprise Data & Ensure Compliance
     • Integrate & Govern Big Data
     • Consolidate and Retire Applications
     • Make Enterprise Applications more Efficient

  4. IBM Comprehensive Vision (diagram)
     • Traditional Approach (structured, analytical, logical): Data Warehouse; structured, repeatable, linear. Traditional sources: transaction data, internal app data, mainframe data, OLTP system data, ERP data.
     • New Approach (creative, holistic thought, intuition): Hadoop, Streams; unstructured, exploratory, iterative. New sources: web logs, social data, text & images, sensor data, RFID.
     • Information Integration & Governance connects the two approaches.

  5. IBM InfoSphere DataStage: Industry Leading Data Integration for the Enterprise
     Simple to design, powerful to deploy. Rich capabilities spanning six critical dimensions:
     • Developer Productivity: rich user interface features that simplify the design process and metadata management requirements
     • Transformation Components: extensive set of pre-built objects that act on data to satisfy both simple & complex data integration tasks
     • Connectivity Objects: native access to common industry databases and applications, exploiting key features of each
     • Runtime Scalability & Flexibility: performant engine providing unlimited scalability through all objects tasks in both batch and real-time
     • Operational Management: simple management of the operational environment, lending analytics for understanding and investigation
     • Enterprise Class Administration: intuitive and robust features for installation, maintenance, and configuration

  6. Use Cases – Parallel real-time analytics

  7. Use Cases – Streams feeding DataStage

  8. Use Cases – Data Enrichment

  9. Runtime Integration: High Level View (diagram)
     • A Streams job, using the DSSource / DSSink operator, connects over TCP/IP to a DataStage job that uses the Streams Connector
     • DSSource / DSSink are composite operators that wrap the existing TCPSource/TCPSink operators

  10. Streams Application (SPL)

      use com.ibm.streams.etl.datastage.adapters::*;

      composite SendStrings {
        type
          RecordSchema = rstring a, ustring b;
        graph
          // Beacon generates 100 test tuples after a 1-second initial delay.
          stream<RecordSchema> Data = Beacon() {
            param
              iterations : 100u;
              initDelay  : 1.0;
            output
              Data : a = "This is single byte chars"r, b = "This is unicode"u;
          }
          // DSSink sends the tuples to DataStage; name is the connection name.
          () as Sink = DSSink(Data) {
            param
              name : "SendStrings";
          }
        config
          applicationScope : "MyDataStage";
      }

      • When the job starts, the DSSink/DSSource operator registers its name with the SWS nameserver
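The registration step above, together with the connector's later lookup by name and application scope, can be pictured with a small sketch. This is only an in-memory illustration in Python, not the SWS nameserver or its REST interface: the `NameServer` class, `resolve_with_retry`, the host name, and the port are all hypothetical.

```python
import time


class NameServer:
    """In-memory stand-in for a name service keyed by (scope, name). Hypothetical."""

    def __init__(self):
        self._endpoints = {}

    def register(self, scope, name, host, port):
        # The Streams side would do this when the job starts.
        self._endpoints[(scope, name)] = (host, port)

    def lookup(self, scope, name):
        # Returns None when nothing is registered under this key.
        return self._endpoints.get((scope, name))


def resolve_with_retry(server, scope, name, attempts=3, delay=0.0):
    """Poll the name service until the endpoint appears or attempts run out."""
    for attempt in range(attempts):
        endpoint = server.lookup(scope, name)
        if endpoint is not None:
            return endpoint
        if attempt < attempts - 1:
            time.sleep(delay)
    raise LookupError(f"no endpoint registered for {scope}/{name}")


# Example: the Streams job registers, then the DataStage side resolves it.
ns = NameServer()
ns.register("MyDataStage", "SendStrings", "streams-host", 5000)
endpoint = resolve_with_retry(ns, "MyDataStage", "SendStrings")
```

Keying on the pair (application scope, name) is what lets the two jobs start in either order: whichever side comes up second simply retries the lookup until the other has registered.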

  11. DataStage Job: the user adds a Streams Connector and configures its properties and columns

  12. DataStage Streams Runtime Connector
      • Uses nameserver lookup to establish connection (“name” + “application scope”) via HTTPS/REST
      • Uses TCPSource/TCPSink binary format
      • Has initial handshaking to verify the metadata
      • Supports runtime column propagation (RCP)
      • Connection retry (both initial & in process)
      • Supports all Streams types
      • Collection types (List, Set, Map) are represented as a single XML column
      • Nested tuples are flattened
      • Schema reconciliation options (unmatched columns, RCP, etc.)
      • Wave to punctuation mapping on input and output
      • Null value mapping
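To make the type mapping concrete, here is a minimal Python sketch of flattening a nested tuple into columns and serializing a collection into a single XML string. The XML element names, the `_` separator in column names, and the `MapValue` marker class are assumptions made for illustration; the slides do not specify the connector's actual layout or naming.

```python
from xml.etree import ElementTree as ET


class MapValue(dict):
    """Marks a dict as a Streams Map rather than a nested tuple (illustrative)."""


def collection_to_xml(value):
    """Serialize a list, set, or MapValue into one XML string (assumed layout)."""
    if isinstance(value, MapValue):
        root = ET.Element("map")
        for key, val in value.items():
            entry = ET.SubElement(root, "entry", {"key": str(key)})
            entry.text = str(val)
    else:
        is_set = isinstance(value, (set, frozenset))
        root = ET.Element("set" if is_set else "list")
        # Sort sets so the output is deterministic.
        for item in (sorted(value) if is_set else value):
            ET.SubElement(root, "item").text = str(item)
    return ET.tostring(root, encoding="unicode")


def flatten(record, prefix=""):
    """Turn a nested tuple (modeled as a plain dict) into one flat dict of columns."""
    columns = {}
    for name, value in record.items():
        column = prefix + name
        if isinstance(value, (MapValue, list, set, frozenset)):
            # A collection becomes a single XML-valued column.
            columns[column] = collection_to_xml(value)
        elif isinstance(value, dict):
            # A nested tuple contributes one column per inner attribute.
            columns.update(flatten(value, prefix=column + "_"))
        else:
            # Scalars pass through unchanged; null mapping is not modeled here.
            columns[column] = value
    return columns


row = flatten({"id": 7, "pos": {"lat": 1.5, "lon": 2.5}, "tags": ["a", "b"]})
```

Here `row` has four columns: `id`, `pos_lat`, `pos_lon`, and `tags`, the last holding the whole list as one XML string.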

  13. Tooling Scenarios
      • User creates both DataStage job and Streams application from scratch
        – Create DataStage job in IBM InfoSphere DataStage and QualityStage Designer
        – Create Streams application in Streams Studio
      • User wishes to add Streams analysis to existing DataStage jobs
        – From Streams Studio, create Streams application from DataStage metadata
      • User wishes to add DataStage processing to existing Streams application
        – From Streams Studio, create Endpoint Definition File and import into DataStage

  14. Streams to DataStage Import
      • On the Streams side, the user runs the ‘generate-ds-endpoint-defs’ command to generate an ‘Endpoint Definition File’ (EDF) from one or more ADL files
      • The user transfers the file to the DataStage domain or client machine
      • The user runs the new Streams importer in IMAM to import the EDF into the StreamsEndPoint model
      • The job designer selects the endpoint metadata from the stage; the connection name and columns are populated accordingly
      Diagram: ADL files → Streams command line or Studio menu → EDF → FTP → IMAM → Xmeta

  15. Stage Editor

  16. Stage Editor

  17. DataStage to Streams Import
      • On the Streams side, the user runs the ‘generate-ds-spl-code’ command to generate a template application from a DataStage job definition
      • The command uses a Java API that uses REST to query DataStage jobs in the repository
      • The tool provides commands to identify jobs that use the Streams Connector, and to extract the connection name and column information
      • The template job includes a DSSink or DSSource operator with tuples defined according to the DataStage link definition
      Diagram: Streams command line or Studio menu → Java API → REST API (HTTP) → Xmeta; output is SPL

  18. DataStage to Streams Import

  19. Availability
      • The Streams Connector is available in InfoSphere Information Server 9.1
      • The Streams components are available in InfoSphere Streams Version 3.0, in the IBM InfoSphere DataStage Integration Toolkit
