Sorting by the Numbers: Sorting Part Four
Question • Suppose you are given the task of writing an application to sort a big data file. What do you need to know to pick a good solution? • File Size = 1 GB • Record Size = 250 Bytes • Available Memory = ¼ GB
How many Runs? How big is each Run? • Total Records to Process • 1 billion bytes in the file • 250 bytes for each record • = 4 million records in the file • Run Size • 1 GB file • ¼ GB memory • = 4 Runs of 1 million records each
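The run arithmetic above can be sketched in a few lines. This is only an illustration: the function name `plan_runs` is hypothetical, 1 GB is taken as 1 billion bytes as on the slide, and all available memory is assumed to hold records (no allowance for program overhead).

```python
# Hypothetical helper; the name and signature are illustrative, not from the slides.
def plan_runs(file_bytes, record_bytes, memory_bytes):
    """Return (total_records, records_per_run, number_of_runs)."""
    total_records = file_bytes // record_bytes
    # Assumption: every byte of available memory can hold record data.
    records_per_run = memory_bytes // record_bytes
    # Ceiling division: a partial final run still counts as a run.
    number_of_runs = -(-total_records // records_per_run)
    return total_records, records_per_run, number_of_runs

GB = 10**9  # the slides treat 1 GB as 1 billion bytes
total, per_run, runs = plan_runs(1 * GB, 250, GB // 4)
print(total, per_run, runs)  # 4000000 1000000 4
```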
Time to Create the Runs • Sorting One Run • Using either Quicksort or an Ordered Binary Tree • N log2 N comparisons, where N = 1 million and log2 N ≈ 20 • 1 million × 20 • = approximately 20 million comparisons of internal memory locations • Sorting Four Runs • 4 × 20 million = 80 million internal memory comparisons
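The N log2 N estimate can be checked numerically. The sketch below is a rough model only: it treats N log2 N as the comparison count, which is an average-case approximation for Quicksort rather than an exact figure, and the function name is made up for illustration.

```python
import math

def sort_comparisons(n):
    """Approximate comparisons to sort n records, using n * log2(n)."""
    return n * math.log2(n)

run_size = 1_000_000
runs = 4
per_run = sort_comparisons(run_size)

print(round(math.log2(run_size), 2))   # 19.93, which the slide rounds to 20
print(round(per_run / 1e6, 1))         # 19.9 (million comparisons per run)
print(round(runs * per_run / 1e6, 1))  # 79.7 (million comparisons for four runs)
```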
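Refresher on Merging Files • Interleaved keys • File One: 1 3 5 7 9 • File Two: 2 4 6 8 10 • Keys already in order • File One: 1 2 3 4 5 • File Two: 6 7 8 9 10 • So merging 2 files of N random records each requires about 2N compares • And merging 2 files whose runs were built from an already sorted file requires only N compares, because once File One is exhausted the rest of File Two is copied without comparing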
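A small two-way merge that counts key comparisons makes the two cases concrete. This is a sketch on in-memory lists rather than files, and `merge_count` is an illustrative name. For interleaved inputs the exact count is 2N − 1, which the slide rounds to 2N.

```python
def merge_count(a, b):
    """Merge two sorted lists; return (merged_list, key_comparisons)."""
    merged, compares = [], 0
    i = j = 0
    while i < len(a) and j < len(b):
        compares += 1
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # One input is exhausted; the remainder is copied without comparisons.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged, compares

print(merge_count([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])[1])  # 9  (2N - 1 with N = 5)
print(merge_count([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])[1])  # 5  (N with N = 5)
```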
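Merging the Four Files • Option 1: merge one run at a time • R1 + R2 → T1: 2 million compares • T1 + R3 → T2: 3 million compares • T2 + R4 → sorted file: 4 million compares • 9 million compares in total • Option 2: merge in pairs • R1 + R2 → T1: 2 million compares • R3 + R4 → T2: 2 million compares • T1 + T2 → sorted file: 4 million compares • 8 million compares in total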
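The two merge orders for the four runs (one run at a time into a growing temp file, or in pairs) can be compared with a short cost sketch. It uses the same approximation as the slides, one compare per record written, and the function names are illustrative only.

```python
def merge_cost(a, b):
    # Slide approximation: about one compare per record in the merged output.
    return a + b

def pairwise(runs):
    """Merge runs two at a time, level by level; return total compares."""
    total = 0
    while len(runs) > 1:
        next_level = []
        for k in range(0, len(runs) - 1, 2):
            total += merge_cost(runs[k], runs[k + 1])
            next_level.append(runs[k] + runs[k + 1])
        if len(runs) % 2:  # an odd run out is carried to the next level
            next_level.append(runs[-1])
        runs = next_level
    return total

def one_at_a_time(runs):
    """Merge each run into a single growing temp file; return total compares."""
    total, merged_size = 0, runs[0]
    for size in runs[1:]:
        total += merge_cost(merged_size, size)
        merged_size += size
    return total

four_runs = [1_000_000] * 4
print(one_at_a_time(four_runs))  # 9000000
print(pairwise(four_runs))       # 8000000
```

Under this model the pairwise order is cheaper because no record passes through more than two merges.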
Total Processing Time • Time to Create the 4 Runs • 80 million comparisons • Time to Merge the 4 Runs • 8 million comparisons • Assuming a File Read takes just 100 times longer than a Memory Read • Total Time = 80 million + (8 million × 100) = 880 million time units • Note: we have omitted the time to read the runs into memory and to write the runs to temp files
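The total is a weighted sum: in-memory comparisons count as one time unit each, and merge comparisons are weighted by the assumed file-to-memory cost ratio of 100. A minimal sketch, with a hypothetical function name:

```python
def total_time(sort_compares, merge_compares, file_cost=100):
    """Time units: memory compares cost 1, file-based merge compares cost file_cost."""
    return sort_compares + merge_compares * file_cost

print(total_time(80_000_000, 8_000_000))  # 880000000
```

Changing `file_cost` shows how strongly the result depends on the assumed ratio, which is the one figure in this estimate that is a guess rather than a calculation.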
Second Example • 2 Runs of 2 Million Records each (which needs ½ GB of memory) • Internal Sorting • N log2 N = 2 million × 21 = 42 million compares • 84 million to create both runs • File Merging • 4 million compares • Total Time • 84 million + (4 million × 100) = 484 million time units
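To compare run configurations side by side, the whole estimate can be wrapped in one function. This is a sketch under stated assumptions: equal-sized runs, a run count that is a power of two, pairwise merging, and the 100× file cost. Because it uses the exact logarithm instead of a rounded one, its totals need not match the slide figures exactly.

```python
import math

def estimate(total_records, runs, file_cost=100):
    """Estimated time units to sort `runs` equal runs, then merge them in pairs.

    Assumes `runs` is a power of two, so there are log2(runs) merge passes,
    each costing about one compare per record.
    """
    n = total_records // runs
    sort_compares = runs * n * math.log2(n)
    merge_compares = total_records * math.log2(runs)
    return sort_compares + merge_compares * file_cost

for runs in (4, 2):
    print(runs, round(estimate(4_000_000, runs) / 1e6))
# 4 880
# 2 484
```

Fewer, larger runs cost slightly more to sort in memory but save an entire merge pass over the file, which dominates under this cost model.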
Next in this course • So how much time does it take to access the disk?