Got Big Data?
Presentation Topics • IRI, The CoSort Company • Fast Extract (FACT) for Oracle • CoSort – Big Data Manipulation • RowGen - Safe Test Data • Conclusion
The CoSort Company Innovative Routines International (IRI), Inc. • 29 years of self-funded, sustained growth • Headquarters in Melbourne, Florida • 30 international offices • Core products: • FACT: Fast Extract for Oracle, reorgs, and ETL • CoSort: Big data transformations, conversion, reporting, protection, DB load and ETL tool acceleration, legacy sort migrations • RowGen: Safe, realistic test data
Fast Extract for Oracle • √ Unloads large tables in parallel to flat files • √ Leverages SQL SELECT syntax • √ Writes CoSort metadata for transforms, reports, etc. • √ Creates SQL*Loader metadata
FACT vs. Oracle Unload • 7X faster • Software: HP-UX B.11.11, Oracle 9.2, 50-byte VARCHAR • Hardware: HP9000 L200044, 4 PA 8500 CPUs @ 440 MHz, 8 GB RAM
FACT Business Benefits • √ Faster unloads increase data availability for business intelligence • √ Speeds data migrations for faster CRM, ERP, and SCM roll-outs • √ Delays hardware and software upgrade expenditures • √ Helps businesses meet SLAs and other commitments • √ Faster reorgs and ETL free people and computing resources for higher-value operations, improving enterprise agility
Detailed Schematic • Parallel file manipulation engine for simultaneous, high-volume data: • Transformation • Conversion • Protection • Reporting • Applications: • DW data integration and staging • DB and ETL tool acceleration • Batch and delta reporting • Data privacy and governance • File compares and mapping • Legacy sort and data migrations
Data Transformations • Select/Filter • Sort/Merge • Match/Join • Aggregate • Cross-Calculate • Re-Map/Reformat • Scrub/Cleanse • Substrings • Table Lookup • Type-Convert • PCR Expressions • User Functions • CoSort v9 can run all these functions in the same job script and I/O pass • Benchmark: CoSort v9 for Linux on Dell PowerEdge 2950, 2 CPUs; 1000-line query vs. 46 ~15-line SortCL scripts
Sorting Speeds Oracle Loads • Problem: • Unsorted inserts into indexes: • Require more internal work (less efficient block splits) • Require more temporary space • Run at half the sorted sustained rate • Solution: • 1. Unload tables to flat files via FACT • 2. Sort file on longest index field via CoSort • 3. Load with SQL*Loader where DIRECT=TRUE • 4. Create indexes during load with SORTED INDEXES • After loading, use CREATE INDEX with NOSORT • Do this all in one pass! Details in FACT/Oracle ETL Whitepaper
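Step 2 of the solution (pre-sorting the extract on the index key) can be illustrated with a minimal sketch. Python's in-memory sort stands in for CoSort here, and this is not SortCL syntax; the function name `presort_for_load`, the comma delimiter, and the sample IDs are assumptions made for illustration only.

```python
import csv
import io


def presort_for_load(flat_file_text, key_column, delimiter=","):
    """Return the rows of a delimited flat file sorted on one key column.

    Hypothetical helper: a stand-in for the external sort step, so that a
    later bulk load sees rows already in index-key order.
    """
    reader = csv.reader(io.StringIO(flat_file_text), delimiter=delimiter)
    rows = [row for row in reader if row]        # skip blank lines
    rows.sort(key=lambda row: row[key_column])   # lexicographic sort on the key
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical three-row extract, unsorted on the first (key) column.
unsorted_extract = "1003,Kerner\n1001,Dinan\n1002,Lawhon\n"
sorted_extract = presort_for_load(unsorted_extract, key_column=0)
print(sorted_extract, end="")
```

The sorted output would then be handed to the loader. A real extract of the sizes discussed in this deck would not fit an in-memory sort, which is the case the external sort step addresses.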
Conversion • File Formats: • ACUCOBOL Vision • CLF/ELF Web Logs • CSV • LDIF • L/R/V Sequential • MF I-SAM • MF Variables • Text • Variable Block • XML • CoSort also transforms and converts > 100 data types • Platform: SunFire 4800, 8 x 1200 MHz CPUs, 16 GB RAM, 64-bit Solaris 9
Field Protection • Field-Level Functions: • AES-256 Encrypt/Decrypt • Filter/Redact • Anonymize/Mask • De/Re-Identify • Pseudonymize • XML Audit (Compliance) Logs • Safe Test Data
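Two of the field-level functions listed above, masking and pseudonymization, can be sketched in a few lines. This is an illustrative approximation, not CoSort's implementation: the keyed hash (HMAC-SHA256) used to pick a replacement name is a swapped-in technique, and the helper names `mask` and `pseudonymize`, the key, and the sample values are all hypothetical.

```python
import hashlib
import hmac


def mask(value, visible=4, fill="*"):
    """Replace all but the last `visible` characters with a fill character."""
    if visible <= 0:
        return fill * len(value)
    return fill * max(len(value) - visible, 0) + value[-visible:]


def pseudonymize(value, secret_key, names):
    """Map a value to a stable replacement name chosen by a keyed hash.

    The same input and key always yield the same pseudonym, so joins on
    the protected field still line up across files.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(names)
    return names[index]


# Hypothetical inputs for illustration only.
replacement_names = ["Moses Dinan", "Jonathan Lawhon", "Cathrine McDougal"]
key = b"demo-key-not-for-production"

masked_id = mask("123-45-6789")
alias = pseudonymize("Real Customer Name", key, replacement_names)
print(masked_id, alias)
```

Masking is irreversible by design, while the pseudonym stays consistent for a given key. Reversible protection such as AES-256 encryption would need a cryptography library and is left out of this sketch.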
Field Protection Unique Benefits • Only CoSort delivers all these security advantages together: • √ Choice of protection methods, libraries or keys • √ Precise RBACs • √ Source and platform portability • √ Integration with Data Transformation & Reporting • √ Speed • √ XML Audit Logs
Batch Reporting • Custom Layouts: • Condition Logic • Embedded HTML • Field Padding • Field Remapping • Variables • User Exits • Clickstream Analytics • CDI/Segmentation • Change Data Capture • Hand-offs to BI Tools • iDashboards Option
Transform, Convert, Protect, Report - Together
Client SS# Symbol Symbol Shares LastTrade Shares*LT Ln.
CEE 12.55 1
CEO 16.44 2
Moses Dinan HKT4rcaFaJrFWuvjHepZtw== CEQ CEQ 2000 27.47 54940.00 3
Jonathan Lawhon 2zQtfMY2KoyLyFnXuKZeSw== CEW CEW 825 47.25 38981.25 4
Cathrine McDougal cZxTLE3gGW0V98pgYvTJ7Q== CEX CEX 9000 2.61 23490.00 5
Eugenio Killen VOmswdpk3OuJ08eTxaC1jQ== CFU 855 6
Estelle Culbert v6EvcxAdThRyminIj0VLDg== CIH CIH 1500 52.81 79215.00 7
CMG 4.84 8
Valentine Ormond a+WOaP8znyuC3mkgw9Q9RA== COV 3250 9
CQJ 50.86 10
CSU 42.40 11
Penny Worthley 9aATT49TjxlLP7P8ncCZXg== ECG 5000 12
Isaiah Nordin lRQ92+/HuEHXraIABcso1A== FMU 1000 13
Rosalee Torre Nh14RLmiVG2Sfa6k1JM6qA== HFU 950 14
IJZ 25.05 15
IJ] 24.71 16
Rey Gaffney CFEhSs5L6cv1IYz3L9416g== KEG 400 17
Virgil Kerner T5BtEtioca/UJmvp4aUlgg== KMI 2100 18
Tonya Dove a9nvad/P0DnQACLsFlWAvQ== MEO 50 19
Adrianna Brand 8rN6FT/s0ijmWldemST9mw== OGN 1000 20
Clarissa Dicus yLo9RIHDT3Wg2w2x/4XfLw== RVY 3333 21
Chuck Britton gn+nfQHsR1m2Y73PvkVPhA== VQW VQW 8500 24.05 204425.00 22
Martin Baynes 7S92fb+kyrMJeYgRtquCeA== UFU 9000 23
Lakesha Croy Lna+zcnwXRTyHmbXX4EaXw== UQH 3500 24
Kenton Medlin 52ouvtttaeDKV1fg5RPr0A== UYQ 90 25
U\C 25.00 26
Bobbie Watson 2Ng7KIGL1Nm69gzeSr8uww== WHO 950 27
WQW 103.0 28
Suzanna Koster 33GbsTFaldxviCEtcTli9g== WUI 775 29
Gretchen Delima RCHBP7u0yHsNEatXUtky+Q== YQK 4300 30
Petra Kivi u3iFMokehLXjFPgWe75YnQ== ZOW 25000 31
--------------- $401,051.25
Callouts: Pseudonym • Encrypted • De-Identified • Calc • Sort • Re-Map • Sequence • Customer Data Join NYSE Data • Aggregate
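The join, calculation, sequencing, and aggregation in this report can be approximated with a short sketch. It uses a subset of the symbols, share counts, and last-trade prices from the report above. Plain Python stands in for a SortCL job, and the dictionary layout and the function name `join_and_value` are assumptions made for illustration.

```python
from decimal import Decimal

# Customer holdings (symbol -> shares) and price data (symbol -> last trade),
# taken from a subset of the report rows.
holdings = {"CEQ": 2000, "CEW": 825, "CEX": 9000, "CFU": 855,
            "CIH": 1500, "VQW": 8500}
last_trade = {"CEE": "12.55", "CEQ": "27.47", "CEW": "47.25",
              "CEX": "2.61", "CIH": "52.81", "VQW": "24.05"}


def join_and_value(holdings, last_trade):
    """Full outer join on symbol, with a calculated value and a running total."""
    rows = []
    total = Decimal("0")
    all_symbols = sorted(set(holdings) | set(last_trade))   # sort on the join key
    for seq, symbol in enumerate(all_symbols, start=1):     # sequence number
        shares = holdings.get(symbol)
        price = last_trade.get(symbol)
        value = None
        if shares is not None and price is not None:        # matched on both sides
            value = shares * Decimal(price)                 # Shares * LastTrade
            total += value
        rows.append((seq, symbol, shares, price, value))
    return rows, total


rows, total = join_and_value(holdings, last_trade)
for row in rows:
    print(row)
print(f"${total:,.2f}")
```

Unmatched symbols keep their row with empty fields on the missing side, as in the report, and only matched rows contribute to the total. `Decimal` is used instead of floats so the currency arithmetic stays exact.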
Delta Reporting: Change Data Capture • Problems: • Non-scalability for large transaction volumes • Single-source acquisition and refresh • Complex 3GL and SQL code • Slow roll-ups and cube builds • Solutions: • 1. Capture database and legacy file changes off-line • 2. Compare with CoSort’s parallel sort/join engines • Identify inserts, deletes, and updates simultaneously • Roll up, transform, segment, and report together • Populate multiple cubes and protect sensitive data • Enable fast “what if” testing
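The compare step, which identifies inserts, deletes, and updates together, can be sketched as a single merge pass over two key-sorted snapshots. This is a generic sort/merge comparison written for illustration, not CoSort's engine; the function name `compare_snapshots` and the sample snapshots are hypothetical.

```python
def compare_snapshots(old_rows, new_rows):
    """Merge two key-sorted lists of (key, value) pairs and classify changes.

    Returns (inserts, deletes, updates). Both inputs must already be sorted
    on the key, which is what makes a single pass sufficient.
    """
    inserts, deletes, updates = [], [], []
    i = j = 0
    while i < len(old_rows) and j < len(new_rows):
        old_key, old_val = old_rows[i]
        new_key, new_val = new_rows[j]
        if old_key == new_key:
            if old_val != new_val:
                updates.append((old_key, old_val, new_val))
            i += 1
            j += 1
        elif old_key < new_key:
            deletes.append(old_rows[i])    # key vanished from the new snapshot
            i += 1
        else:
            inserts.append(new_rows[j])    # key appears only in the new snapshot
            j += 1
    deletes.extend(old_rows[i:])           # leftovers in old are deletes
    inserts.extend(new_rows[j:])           # leftovers in new are inserts
    return inserts, deletes, updates


# Hypothetical snapshots keyed by symbol, already sorted.
old = [("CEQ", 2000), ("CEW", 825), ("CEX", 9000)]
new = [("CEQ", 2500), ("CEX", 9000), ("CFU", 855)]
inserts, deletes, updates = compare_snapshots(old, new)
print(inserts, deletes, updates)
```

Because each snapshot is read once in key order, the cost grows linearly with the input size after sorting, which is why the capture is done off-line and handed to a sort/join step.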
Dashboard Reporting: iDashboards Option • Translate raw data into visual BI • Unify and direct departments around business goals • Integrate CoSort and other outputs with DB sources • Import/export Excel data and preferences • Applications: • Balanced Scorecard • Supply Chain Management • Process and Quality Control • Sales & Marketing Intelligence • Facility Performance • Project Management • Market Research and Analysis • Enterprise Resource Planning • Financial Intelligence • Executive Reporting • IT Systems Monitoring • SLA Monitoring
Other Applications: Legacy Sort Migrations • ACUCOBOL-GT • Clerity/UniKix • Micro Focus COBOL • MVS JCL • SAG Natural • SAS PROC • SS Unix • Unix /bin/sort • VAX VMS • VSE JCL • Benchmark: ~4 GB, 4-key sort of variable-length records • Hardware: IBM p5 570 with 2 CPUs running 64-bit AIX 5.3 • MF COBOL 4 (Workbench) sorted the input file in ~50 minutes • CoSort v9 sorted the same file in 3 minutes, 12 seconds
Other Applications: ETL Tool Accelerations • Exclusive sort plug ‘n’ plays for PowerCenter and DataStage • External CoSort transform and load jobs can use .xml and .dsx file layouts, thanks to MIMB • Direct calls from ETI, Kalido, OWB, SAS, TeraStream, and other DW applications
FACT-CoSort vs. In-Database Transform • 7X faster • Software: HP-UX B.11.23, Oracle 9i • Hardware: ia64 hp server rx5670, 4 x 1 GHz Itanium2 CPUs, 32 GB RAM
CoSort Business Benefits • √ Parallel transforms increase data availability for business intelligence • √ Task consolidation reduces runtimes and software expenses, and delays hardware upgrades • √ Built-in field security reduces risk of litigation, fines, brand damage • √ Intuitive syntax reduces development time and training costs • √ Seamless accelerations optimize packaged application ROI • √ Flexible, perpetual-use licensing models
Need Safe Test Data? RowGen: • √ Creates multiple targets for loads, development, and outsourcing • √ Reads your data models and preserves referential integrity • √ Supports virtually all data types, file sizes, value ranges, and conditions • √ Leverages CoSort selection, transformation, and pre-load sorting • √ Includes CoSort formatting functions for custom file/report outputs • √ Re-uses the metadata in your applications
Your Data Models, Your Metadata • Build Test DBs for • Oracle • Microsoft SQL Server • DB2 • Sybase • Teradata • Packaged Apps
RowGen Business Benefits • √ Cuts development and testing time/costs • √ Realistic volume and range testing = better quality control • √ Higher quality products increase customer satisfaction/loyalty and decrease support costs • √ Complies with data privacy policies and eliminates production data risk
Conclusion • IRI, The CoSort Company • Fast Extract (FACT) for Oracle • CoSort – Big Data Manipulation • RowGen - Safe Test Data