Scalable Architecture for Tax File Processing System


Presentation Transcript


  1. Scalable Architecture for Tax File Processing System Ambily K K

  2. Contents

  3. Introduction • A Professional Services Organization processes the tax files of various organizations. The volumes involved: • 30,000 customers • 36,000 business rules • 10,500 standard business and tax codes • 1,000 files to be processed per day • 250 MB average file size • The Tax File Processing System receives tax files from different entities, each of which has its own tax code system (BSI, Vertex, Coins, custom codes, etc.); these codes need to be mapped to standard codes to complete file processing. • Tax File Processing System characteristics: • The number of files is very high • File size varies from 25 KB to 5 GB • File formats vary from client to client • File content format is dynamic • The standardization process involves a large number of rules • Must be scalable to new client needs
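The code-mapping step described above can be pictured with a minimal C# sketch. The TaxCodeMapper type, the sample codes, and the in-memory dictionary are all hypothetical; in the real system the map would be loaded from the Data Map entities rather than seeded inline.

```csharp
// Minimal sketch (hypothetical types and code values): mapping client-specific
// tax codes (BSI, Vertex, Coins, custom) to the standard code set.
using System;
using System.Collections.Generic;

public record CodeKey(string SourceSystem, string SourceCode);

public class TaxCodeMapper
{
    // Records give value equality, so CodeKey works directly as a dictionary key.
    // Seeded inline for illustration; production would load the Data Map entities.
    private readonly Dictionary<CodeKey, string> _map = new()
    {
        [new CodeKey("BSI",    "B-1040")] = "STD-1040",
        [new CodeKey("Vertex", "VX-99")]  = "STD-0099",
    };

    public string ToStandardCode(string sourceSystem, string sourceCode) =>
        _map.TryGetValue(new CodeKey(sourceSystem, sourceCode), out var std)
            ? std
            : throw new KeyNotFoundException(
                  $"No standard mapping for {sourceSystem}/{sourceCode}");
}
```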

  4. Proof of Concept Goal Optimize and experiment with various possibilities for improving the performance of file processing • Improve the current performance levels by introducing the necessary extensions and levers into the current design • Explore possible alternatives for redesigning the implementation • Establish the facts and evidence needed to conclude and finalize an approach that is scalable and extensible • Target: 1,000 files per day to be processed

  5. Proof of Concept (Cont.) POC Scope • Compare SSIS vs. the .NET solution for reading and processing a file (time consumed, memory, and CPU) • Optimize both solutions so that the comparison is optimized code vs. optimized code • Compare using a large test file (between 1 GB and 5 GB) • Compare with multiple files run in parallel • Explore a different approach to reading the file, chunking, to reduce time, memory, and CPU
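The three metrics the POC compares (elapsed time, memory, and CPU) can be captured from .NET roughly as follows; FileBenchmark and the processFile delegate are hypothetical stand-ins for whichever routine is under test.

```csharp
// Benchmark harness sketch: measure elapsed time, CPU time, and peak working
// set for a single file run.
using System;
using System.Diagnostics;

public static class FileBenchmark
{
    public static void Measure(string path, Action<string> processFile)
    {
        var proc = Process.GetCurrentProcess();
        var cpuBefore = proc.TotalProcessorTime;
        var sw = Stopwatch.StartNew();

        processFile(path);              // the SSIS or .NET routine under test

        sw.Stop();
        proc.Refresh();
        Console.WriteLine($"Elapsed : {sw.Elapsed}");
        Console.WriteLine($"CPU time: {proc.TotalProcessorTime - cpuBefore}");
        Console.WriteLine($"Peak WS : {proc.PeakWorkingSet64 / (1024 * 1024)} MB");
    }
}
```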

  6. High Level Architecture Building Blocks Variability is a baseline: the system is designed to manage variability once and keep the rest of the technology consistent and stable. It supports XML, H2H, and ERP files. [Diagram: receiving pipeline with Smart File Scheduler, Smart File Chunk, File Chunk Validation, Business Rule Engine, Normalized File, Filter Cache, and Set Stage File building blocks, fed by XML and H2H inputs]

  7. SSIS – Design Thoughts • Decouple the file chunk process from the existing design • Parallelize the file chunk operations to split files into smaller, manageable sets • File Chunk Criteria • Chunk only larger files (500 MB or more) • Chunk into logical sets, leveraging XSD templates • Chunking logic for ASCII files can be based on a horizontal header/detail break-up • Best Practices for File Chunk • Visit head and tail nodes for chunking • Use LINQ queries • Avoid nested loops [Diagram: Smart File Chunk fanning out to parallel Chunk Tasks over logical sets]
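A minimal sketch of the horizontal header/detail chunking idea, assuming a flat ASCII format with a single header row followed by detail rows; the AsciiChunker name and linesPerChunk parameter are illustrative only. Each chunk file repeats the header so the parallel chunk tasks can process it independently.

```csharp
// Split a large ASCII file into chunks of linesPerChunk detail lines,
// prepending the header row to each chunk so chunks are self-contained.
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class AsciiChunker
{
    public static IEnumerable<string> Chunk(string path, int linesPerChunk)
    {
        using var reader = new StreamReader(path);
        string header = reader.ReadLine();          // assumed single header row
        var buffer = new List<string>(linesPerChunk);
        int chunkNo = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            buffer.Add(line);
            if (buffer.Count == linesPerChunk)
                yield return Flush(path, ++chunkNo, header, buffer);
        }
        if (buffer.Count > 0)                       // trailing partial chunk
            yield return Flush(path, ++chunkNo, header, buffer);
    }

    private static string Flush(string path, int n, string header, List<string> buf)
    {
        string chunkPath = $"{path}.chunk{n}";
        File.WriteAllLines(chunkPath, new[] { header }.Concat(buf));
        buf.Clear();
        return chunkPath;
    }
}
```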

  8. Database – Design Thoughts This is one of the critical components for building proactive intelligence into the system about the lifecycle of the file being processed. Database Design • Core subsystem entities to be identified and generalized, e.g., Data Map, File DeComp, and Rule Management entities • Non-core subsystem entities to be identified, along with the key attributes that help build intelligence and analytics, e.g., File Priority, Reprocess Flag, and Smart Scheduling entities for dynamic processing Optimization Code Practices • Eliminate redundant visits to entities • Perform block operations • Replace cursors with Common Table Expressions • Avoid WHILE loops • Increase reusability through the use of functions
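The "replace cursors with Common Table Expressions" practice can be sketched as a single set-based statement issued from .NET; the FileQueue table, its columns, and the priority-bump logic are hypothetical examples, not the deck's actual schema.

```csharp
// One set-based CTE statement replaces a row-by-row cursor loop.
using Microsoft.Data.SqlClient;

public static class FileQueueMaintenance
{
    const string Sql = @"
        ;WITH Ranked AS (
            SELECT FileId, Priority,
                   ROW_NUMBER() OVER (ORDER BY Priority DESC, ReceivedAt) AS rn
            FROM dbo.FileQueue
            WHERE Status = 'PENDING'
        )
        UPDATE Ranked SET Priority = Priority + 1 WHERE rn <= 100;";

    public static int BumpTopPending(string connectionString)
    {
        using var conn = new SqlConnection(connectionString);
        using var cmd = new SqlCommand(Sql, conn);
        conn.Open();
        return cmd.ExecuteNonQuery();   // one block operation, no WHILE loop
    }
}
```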

  9. SSIS vs. .NET – Benchmarks Both solutions are optimized to the best possible extent, considering all limitations and constraints

  10. SSIS vs. .NET – Multiple Files in Parallel A parallel run on SSIS (decoupled mode) shows a huge improvement over the business-layer result when multiple files are processed
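On the .NET side, a multiple-files-in-parallel run can be expressed with bounded parallelism so memory and CPU stay predictable; the inbox folder, file pattern, degree of parallelism, and processFile delegate are all assumptions for illustration.

```csharp
// Process every file in a folder in parallel, capped at four workers.
using System;
using System.IO;
using System.Threading.Tasks;

public static class ParallelRunner
{
    public static void ProcessAll(string inbox, Action<string> processFile)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
        Parallel.ForEach(Directory.EnumerateFiles(inbox, "*.xml"), options,
                         processFile);
    }
}
```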

  11. SSIS – Progressive Improvement Baseline file size tracked: 500 MB

  12. Business Layer – Progressive Improvement Baseline file size used: 500 MB

  13. Architecture and Design Decisions SSIS Design Decisions • To manage large files, a file-chunking approach was chosen • To improve file processing performance, file chunking was decoupled • To gain better performance, parallelism was implemented in the SSIS packages • A new database was designed for parallel loading of data into cache tables for large sets • Coding best practices were followed to gain high performance in SQL Server Business Layer Decisions • End-to-end processing of data in memory • Removed the temporary tables • Processing the data and loading it into cache tables • Implemented parallelism while loading data into cache tables
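From the .NET side, the "parallelism while loading data into cache tables" decision could look roughly like the following; the dbo.CacheStage table name, the batch size, and the per-chunk DataTable source are illustrative assumptions.

```csharp
// Bulk-copy each chunk's rows into the cache table on its own connection,
// running the chunks in parallel.
using System.Collections.Generic;
using System.Data;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public static class CacheTableLoader
{
    public static void LoadChunks(IEnumerable<DataTable> chunks, string connStr)
    {
        Parallel.ForEach(chunks, chunk =>
        {
            using var conn = new SqlConnection(connStr);
            conn.Open();
            using var bulk = new SqlBulkCopy(conn)
            {
                DestinationTableName = "dbo.CacheStage",
                BatchSize = 10_000
            };
            bulk.WriteToServer(chunk);
        });
    }
}
```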

  14. Key Observations and Advice Key Observations – Business Layer • Memory footprint was high in the business-layer implementation, largely because of ReadXml behavior • Scalability of the business layer is not encouraging for large files, as the memory footprint grows quite large Key Observations – SSIS • The key architecture and design choices made in SSIS show credible performance • The current SSIS capability shows good scalability, meeting the client's file processing objectives Throughput Calculations (SSIS) • BASELINE: 1,000 files/day with an average file size of 250 MB requires the ability to process 250 GB/day • Sequential throughput: 500 MB in 115 seconds equates to about 375 GB/day, exceeding the 1,000 files/day demand by roughly 50% • Multiple files in parallel: 2.5 GB in 325 seconds equates to about 664 GB/day, exceeding the 1,000 files/day demand by roughly 165%
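A quick arithmetic check of the throughput figures above, using the slide's own numbers:

```csharp
// Verify the slide's GB/day figures: rate (GB/s) x 86,400 seconds per day.
using System;

class ThroughputCheck
{
    static double GbPerDay(double gb, double seconds) => gb / seconds * 86_400;

    static void Main()
    {
        Console.WriteLine(GbPerDay(0.5, 115));   // ~375.7 GB/day (sequential)
        Console.WriteLine(GbPerDay(2.5, 325));   // ~664.6 GB/day (parallel)
        // Demand: 1,000 files/day x 250 MB = 250 GB/day, so these runs exceed
        // it by ~50% and ~166% respectively, matching the slide.
    }
}
```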

  15. Final Impression and Recommendation • Achieved a 250% performance improvement over baseline results, allowing the business to process files of varied sizes from 250 MB to 5 GB • Improved the file processing rate from 250 files/day to 700 files/day • Introducing further architecture and design extensions, from both the application and infrastructure angles, will help cross the 1,000 files/day mark for the full range of tax file operations. Based on the current POC results and implementation choices, and considering the various design and resource constraints, the SSIS design method is recommended as a reliable and scalable implementation for the Tax File Processing System.
