A Multithreading C# Data Synchronization Program and Its Realization
Course: ECE 1747H Parallel Programming
Professor: Christiana Amza
Student / Presenter: Bin Li
Dec. 12, 2006 @ University of Toronto
Agenda
• Background
• Problem & Solution
• Parallel Implementation
• Performance Measuring
• Other Approaches
• Future Work
• Q & A
Background (Company & Project)
• “Retail Value Canada Inc.”
• Markham-based specialty retailer
• 384 stores in Canada & USA, 30K types of items
• Head office-side information maintained in Windows
• Store-side information maintained in Unix
• Data synchronization is needed
• Data types: product code, status, cost, price, promo, deal, subsidy, vendor, warehouse, etc. (by item, by store)
• Current application: iSync (developed in 2000 in Visual C# 1.0)
Problem & Solution
• The scheduled data synchronization (iSync process) starts at 10 pm and must end by 12 am
• iSync extracts data from Windows, transforms it into an ASCII file (.dat), and sends it to Unix
• After mass data modifications, iSync takes 4-5 hours to run, which exceeds the 2-hour scheduled window
• The latest changes in head office (e.g. prices) cannot reach the stores before opening hour on the next business day
Problem & Solution (cont’d)
• The delay in store-side information causes inaccurate sales information in retail stores
• Bottleneck: iSync (it uses only 10% of the CPU on a 4-CPU database server)
Problem & Solution (cont’d)
• Sequential program
  • iSync generates the .dat file one store at a time, which is slow
• Parallel solution
  • Implement qSync to replace iSync (using Microsoft C# multithreading)
  • Generate the .dat files in parallel, by store group
Parallel Implementation
• Development Environment
  • Design/UML Tool: Microsoft Visio 2003
  • Development Tool: Microsoft Visual Studio .NET 2003
  • Programming Language: Visual C# 2.0 (multithreading similar to POSIX threads on Linux)
• Parallelization Steps
  • Store Data Segmentation
  • Parallel Data Processing
  • Result Data Consolidation
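The three parallelization steps listed above can be sketched roughly as follows. This is a minimal illustration, not the qSync implementation: the class `StoreGroupSync`, the round-robin grouping, and the stand-in `ExtractStoreLine` method are all assumptions made for the example, and the real program would run database queries where the sketch builds a string.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch of segmentation, parallel processing, and consolidation.
public static class StoreGroupSync
{
    // Step 1 (segmentation): deal the store IDs round-robin into groupCount groups.
    public static List<List<int>> Segment(IList<int> storeIds, int groupCount)
    {
        if (groupCount <= 0) throw new ArgumentOutOfRangeException("groupCount");
        var groups = new List<List<int>>();
        for (int g = 0; g < groupCount; g++) groups.Add(new List<int>());
        for (int i = 0; i < storeIds.Count; i++) groups[i % groupCount].Add(storeIds[i]);
        return groups;
    }

    // Assumed stand-in for the real per-store extraction and transformation.
    public static string ExtractStoreLine(int storeId)
    {
        return "STORE|" + storeId;
    }

    public static List<string> Run(IList<int> storeIds, int groupCount)
    {
        List<List<int>> groups = Segment(storeIds, groupCount);
        var partial = new List<string>[groupCount];
        var threads = new List<Thread>();

        // Step 2 (parallel processing): one thread per store group.
        // Each thread writes only to its own slot, so no lock is needed here.
        for (int g = 0; g < groupCount; g++)
        {
            int slot = g; // copy the loop variable so each thread captures its own value
            var worker = new Thread(() =>
            {
                var lines = new List<string>();
                foreach (int id in groups[slot]) lines.Add(ExtractStoreLine(id));
                partial[slot] = lines;
            });
            worker.Name = "group-" + slot;
            threads.Add(worker);
            worker.Start();
        }
        foreach (Thread worker in threads) worker.Join();

        // Step 3 (consolidation): merge the per-group results in group order.
        var result = new List<string>();
        foreach (List<string> lines in partial) result.AddRange(lines);
        return result;
    }
}
```

Calling `StoreGroupSync.Run(new int[] { 1, 2, 3, 4, 5 }, 2)` processes stores 1, 3, 5 on one thread and 2, 4 on another, then merges the two partial results. Giving each thread a private output slot keeps the parallel step free of locks, in contrast with the locking alternatives discussed under Other Approaches.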
Sample Code
    using System.Threading;

    private int getNbrOfInstance()
    {
        //...
        // Query that looks up the 'HISSPNbrOfInst' value in the rules table
        string sqlStmt = "select cast(RBSValue as int) from RulesBasedSystem " +
                         "where RBSTxt = 'HISSPNbrOfInst' and RBSScopeKey = 'Retail Value'";
        //...
    }

    HISSPCLPSyncComponent clpComponent = new HISSPCLPSyncComponent();
    ThreadStart threadDelegate = null;
    Thread threadObj = null;
Sample Code (cont’d)
    // Create and start one thread per row of dtb
    for (int i = 0; i < dtb.Rows.Count; i++)
    {
        //...
        threadDelegate = new ThreadStart(clpComponent.ExtCLPPrice);
        threadObj = new Thread(threadDelegate);
        threadObj.Name = Convert.ToString(i);
        threadList.Add(threadObj);
        // Start the thread
        threadObj.Start();
    }

    // Join the threads: wait until every worker has finished
    for (int i = 0; i < dtb.Rows.Count; i++)
    {
        threadObj = (Thread) threadList[i];
        threadObj.Join();
    }

    // Approach #3: consolidate one record at a time under a lock
    while (j > 0)
    {
        lock (this)
        {
            consolidateCLPPrice(baseStoreId[j], itemBaseId[j],
                                marketZoneId[j], itemPackId[j]);
        }
        j--;
    }
Performance Measuring
• Testing Environment
  • Database Server
    • Intel Xeon CPU 2.40 GHz, 4 CPUs, 3 GB RAM
    • Windows 2000 w/SP4, MS SQL Server 2000
    • Subset of real production data
  • Web/Application Server
    • Intel Pentium 4, 3.40 GHz, 2 CPUs (HT), 2 GB RAM
    • Windows XP w/SP2, IIS 5.1
• Performance Counters
  • CPU % Usage
  • Execution Time
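As a rough illustration of the two counters, the sketch below times a piece of work and derives an average CPU usage figure for the current process. It is an assumption-laden example: the slides do not say how the counters were collected, the class `RunMeasurement` is hypothetical, and measuring the calling process is not the same as measuring CPU usage on a separate database server.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: wall-clock time plus average CPU usage of this process.
public static class RunMeasurement
{
    public struct Result
    {
        public double ElapsedSeconds;
        public double CpuPercent;
    }

    // CPU % = CPU time consumed / (wall-clock time * number of CPUs) * 100.
    public static double CpuPercent(TimeSpan cpuTime, TimeSpan wallTime, int cpuCount)
    {
        if (wallTime.Ticks <= 0 || cpuCount <= 0) return 0.0;
        return 100.0 * cpuTime.Ticks / ((double)wallTime.Ticks * cpuCount);
    }

    public static Result Measure(Action work)
    {
        Process self = Process.GetCurrentProcess();
        TimeSpan cpuBefore = self.TotalProcessorTime;
        Stopwatch watch = Stopwatch.StartNew();

        work();

        watch.Stop();
        self.Refresh(); // re-read the process counters after the work has run
        TimeSpan cpuUsed = self.TotalProcessorTime - cpuBefore;

        Result r;
        r.ElapsedSeconds = watch.Elapsed.TotalSeconds;
        r.CpuPercent = CpuPercent(cpuUsed, watch.Elapsed, Environment.ProcessorCount);
        return r;
    }
}
```

With this formula, 2 seconds of CPU time over 1 second of wall-clock time on 4 CPUs corresponds to 50% usage, so a single busy thread on a 4-CPU machine tops out near 25%.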
Performance Comparison (cont’d)
• CPU % Usage
• Execution Time
Other Approaches (Approach #2)
• “Locking Temp Files”
Other Approaches (Approach #2 cont’d)
• “Locking Temp Files”
  • All threads write to a single .dat file
  • A lock guards each file append
  • Result: as slow as the sequential version
  • Explanation: different threads cannot write to the same disk file simultaneously; the file has to be closed and re-opened between writers (unlike shared memory)
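A minimal sketch of the “Locking Temp Files” idea, assuming every append opens, writes, and closes the shared file while holding a lock. The class `LockedFileAppend` and the line format are hypothetical; only the overall pattern (one file, one lock, many threads) comes from the slide.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical sketch of Approach #2: all threads append to one file under a lock.
public static class LockedFileAppend
{
    private static readonly object FileLock = new object();

    public static void AppendLine(string path, string line)
    {
        lock (FileLock)
        {
            // AppendAllText opens, writes, and closes the file on every call,
            // so the threads effectively take turns at the disk.
            File.AppendAllText(path, line + Environment.NewLine);
        }
    }

    // Starts threadCount threads that each append linesPerThread lines,
    // then returns the number of lines found in the file.
    public static int Run(string path, int threadCount, int linesPerThread)
    {
        var threads = new List<Thread>();
        for (int t = 0; t < threadCount; t++)
        {
            int id = t; // per-thread copy of the loop variable
            var worker = new Thread(() =>
            {
                for (int k = 0; k < linesPerThread; k++)
                {
                    AppendLine(path, id + "|" + k);
                }
            });
            threads.Add(worker);
            worker.Start();
        }
        foreach (Thread worker in threads) worker.Join();
        return File.ReadAllLines(path).Length;
    }
}
```

Because the lock covers the whole open/write/close cycle, the file writes are serialized no matter how many threads are started, which is consistent with the result reported on the slide.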
Other Approaches (Approach #3)
• “Locking Temp Tables”
Other Approaches (Approach #3 cont’d)
• “Locking Temp Tables”
  • All threads share a single temporary database table
  • A lock guards each record insert
  • Result: much better than sequential, but not as good as the Main Approach
  • Explanation: the database server has enough memory; the lock adds only a slight delay
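The “Locking Temp Tables” pattern can be sketched with an in-memory `System.Data.DataTable` standing in for the temporary database table. That substitution is an assumption made so the example is self-contained; the original program inserts into a SQL Server table. The class `SharedTempTable` and the table name are hypothetical, while the four column names are borrowed from the arguments of `consolidateCLPPrice` in the sample code.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Threading;

// Hypothetical sketch of Approach #3: all threads insert into one shared table under a lock.
public class SharedTempTable
{
    private readonly object tableLock = new object();
    private readonly DataTable table = new DataTable("TempCLPPrice"); // assumed name

    public SharedTempTable()
    {
        table.Columns.Add("baseStoreId", typeof(int));
        table.Columns.Add("itemBaseId", typeof(int));
        table.Columns.Add("marketZoneId", typeof(int));
        table.Columns.Add("itemPackId", typeof(int));
    }

    // DataTable is not safe for concurrent writes, so every insert takes the lock.
    public void Insert(int baseStoreId, int itemBaseId, int marketZoneId, int itemPackId)
    {
        lock (tableLock)
        {
            table.Rows.Add(baseStoreId, itemBaseId, marketZoneId, itemPackId);
        }
    }

    public int RowCount
    {
        get { lock (tableLock) { return table.Rows.Count; } }
    }

    // Starts threadCount threads that each insert rowsPerThread rows,
    // then returns the total number of rows in the shared table.
    public static int Run(int threadCount, int rowsPerThread)
    {
        var shared = new SharedTempTable();
        var threads = new List<Thread>();
        for (int t = 0; t < threadCount; t++)
        {
            int storeId = t; // per-thread copy of the loop variable
            var worker = new Thread(() =>
            {
                for (int k = 0; k < rowsPerThread; k++)
                {
                    shared.Insert(storeId, k, 0, 0);
                }
            });
            threads.Add(worker);
            worker.Start();
        }
        foreach (Thread worker in threads) worker.Join();
        return shared.RowCount;
    }
}
```

Here the lock is held only for a short in-memory insert, so threads spend most of their time outside the critical section; this mirrors the slide's explanation that the lock adds only a slight delay.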
Future Work
• Database Parallelism
  • Upgrade SQL Server 2000 to 2005
  • Migrate the C# data synchronization code to database stored procedures and optimize the SQL queries
  • Change the temporary table(s) to a permanent schema
  • Use SQL Server Integration Services (SSIS 2005) for parallel data load & transformation
  • Generate the .dat file from the permanent table (which contains the final data to be synchronized)
Q & A Thanks!
Additional Slide for Q&A (C#)
• C# (pronounced “C Sharp”)
  • Microsoft .NET Framework-compliant language
  • Simple, modern, object-oriented programming language derived from C and C++
  • Aims to combine the high productivity of Visual Basic with the raw power of C++
• C# vs Java
  • Similar but not identical language specifications
  • Compilation: C# to Microsoft Intermediate Language (MSIL), Java to Java bytecode
  • Running: C# in the Common Language Runtime (CLR), Java in the Java Virtual Machine (JVM)