Advanced Database Systems and Applications COMP 6521

Professor: Dr. Gosta Grahne
Lab Instructor: Ashkan Azarnik
Group 15: Aditya Dewal, Mohammad Iftekharul Hoque, Saleh Ahmed

PROJECT 1
• Develop a program that sorts numbers in ascending order using Two-Phase Multiway Merge Sort (2PMMS) under a limit of 5 MB of virtual memory.
• External sorting is required when the data being sorted does not fit into the main memory of a computing device and must instead reside in slower external memory (usually a hard drive).

Our approach to solving the problem
• External sorting typically uses a sort-merge technique.
• In the sorting phase, chunks of data small enough to fit in main memory are read, sorted in ascending order using quicksort, and written out to temporary files.
• In the merge phase, the sorted temporary files are combined into a single larger file with a multiway merge.
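The two phases described above can be sketched in Java as follows. This is a minimal illustration, not the group's program: the class and method names are hypothetical, the input is assumed to hold one integer per line, `chunkSize` stands in for the memory budget, and `Arrays.sort` stands in for the quicksort step.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a two-phase multiway merge sort (assumes one integer per line).
final class ExternalSortSketch {
    // Phase 1: read chunks of at most chunkSize numbers, sort each, write one run file per chunk.
    static List<Path> createSortedRuns(Path input, int chunkSize) throws IOException {
        List<Path> runs = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input)) {
            int[] chunk = new int[chunkSize];
            int n = 0;
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue;
                chunk[n++] = Integer.parseInt(line);
                if (n == chunkSize) { runs.add(writeRun(chunk, n)); n = 0; }
            }
            if (n > 0) runs.add(writeRun(chunk, n));
        }
        return runs;
    }

    private static Path writeRun(int[] chunk, int n) throws IOException {
        Arrays.sort(chunk, 0, n); // in-memory sort of the first n entries
        Path run = Files.createTempFile("run", ".txt");
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(run))) {
            for (int i = 0; i < n; i++) out.println(chunk[i]);
        }
        return run;
    }

    // Phase 2: k-way merge; the min-heap holds the current head {value, runIndex} of each run.
    static void mergeRuns(List<Path> runs, Path output) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(output))) {
            for (int i = 0; i < runs.size(); i++) {
                BufferedReader r = Files.newBufferedReader(runs.get(i));
                readers.add(r);
                String line = r.readLine();
                if (line != null) heap.add(new int[] {Integer.parseInt(line), i});
            }
            while (!heap.isEmpty()) {
                int[] head = heap.poll();
                out.println(head[0]);
                // Refill the heap from the run the smallest value came from.
                String line = readers.get(head[1]).readLine();
                if (line != null) heap.add(new int[] {Integer.parseInt(line), head[1]});
            }
        } finally {
            for (BufferedReader r : readers) r.close();
            for (Path run : runs) Files.deleteIfExists(run);
        }
    }

    static void sort(Path input, Path output, int chunkSize) throws IOException {
        mergeRuns(createSortedRuns(input, chunkSize), output);
    }
}
```

Calling `ExternalSortSketch.sort(input, output, 160000)` would sort in chunks of 160,000 numbers; during the merge only one value per run is held in memory, plus the reader buffers.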

Challenges: Which algorithm to choose?
• Quicksort is one of the fastest and simplest sorting algorithms because its inner loop can be implemented efficiently on most architectures.
• Its average case is efficient compared to other sorting algorithms.
• The average-case complexity of quicksort is O(n log n).
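For reference, a compact quicksort on an `int` array might look like the sketch below. It is an assumption-laden illustration rather than the code used in the project: the class name is hypothetical, and the Hoare partition scheme with a middle pivot is one common choice among several.

```java
// Hypothetical quicksort sketch using Hoare partitioning with a middle pivot.
final class QuickSortSketch {
    static void quickSort(int[] a) {
        quickSort(a, 0, a.length - 1);
    }

    private static void quickSort(int[] a, int lo, int hi) {
        while (lo < hi) {
            int p = partition(a, lo, hi);
            // Recurse into the smaller side and loop on the larger one
            // so the recursion depth stays logarithmic.
            if (p - lo < hi - p) {
                quickSort(a, lo, p);
                lo = p + 1;
            } else {
                quickSort(a, p + 1, hi);
                hi = p;
            }
        }
    }

    // Splits a[lo..hi] so that every element of a[lo..j] is <= every element of a[j+1..hi].
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo - 1;
        int j = hi + 1;
        while (true) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) return j;
            int t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }
}
```

The inner loops are just two index scans and a swap, which is the property the slide refers to when it mentions an efficient inner loop.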

List of Data Structures
• Primitive Types: Boolean, Integer, Long
• Abstract Types: Array, String
• Arrays (Linear Data Structure): Integer Array, Boolean Array, Long Array
• I/O: Scanner, PrintWriter

Buffer Size Experiments

Conclusion
• Our buffer size experiments showed that a buffer of 160,000 numbers, occupying 2.5 MB of memory, gave the best execution time.

Results from Demo
• The program took 3 minutes to run during the demo.
• The long running time was caused by the way our program read its input and wrote its output.

PROJECT 2
Mining Frequent Itemsets from Secondary Memory
Build an application that computes the frequent itemsets of all sizes (pairs, triples, quadruples, etc.) from a set of transactions, based on an input support threshold percentage.

Algorithms Considered
Apriori: Horizontal Data Layout
Eclat: Vertical Data Layout

Algorithms Considered
Apriori: Breadth-First Traversal
Eclat: Depth-First Traversal

ECLAT
Better Execution Time: Execution time is better than Apriori.
Memory Efficient: Requires less memory than Apriori when the itemsets are few in number.
Depth-First Search: Explores each branch of the search space fully before backtracking.

ECLAT Algorithm
For each item, store the list of ids of the transactions (tids) that contain it: its TID-list.

ECLAT Algorithm
Determine the support of any k-itemset by intersecting the tid-lists of two of its (k-1)-subsets.
Three traversal approaches: top-down, bottom-up, hybrid.
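A small Java sketch of the vertical layout and the intersection step follows. The names are hypothetical, transaction ids are assumed to be 0-based positions in the input list, and sorted `List<Integer>` tid-lists are used here only for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: vertical (tid-list) layout and support counting by intersection.
final class TidListSketch {
    // Builds item -> ascending list of ids of the transactions containing that item.
    static Map<String, List<Integer>> buildTidLists(List<List<String>> transactions) {
        Map<String, List<Integer>> tidLists = new TreeMap<>();
        for (int tid = 0; tid < transactions.size(); tid++) {
            // TreeSet removes duplicate items within one transaction.
            for (String item : new TreeSet<>(transactions.get(tid))) {
                tidLists.computeIfAbsent(item, k -> new ArrayList<>()).add(tid);
            }
        }
        return tidLists;
    }

    // Intersects two ascending tid-lists with a two-pointer scan.
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i);
            int y = b.get(j);
            if (x == y) {
                out.add(x);
                i++;
                j++;
            } else if (x < y) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }
}
```

With this layout, the support of {a, b} is the size of `intersect(tidLists.get("a"), tidLists.get("b"))`, and the support of {a, b, c} is the size of the intersection of the tid-lists of {a, b} and {a, c}.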

ECLAT Algorithm

ECLAT Implementation
List of Data Structures
Primitive Types: Boolean, Integer, Double
Abstract Types: Map, Set, List, Array, String
Arrays (Linear Data Structures): Hash Map (Hash Table), Hash Set (Hash Map), Array List (Dynamic Array), Bit Set (Bit Array), String Array
Trees: Search Tree

ECLAT Implementation
Our implementation represents each set of transactions as a bit set. It intersects rows to determine the support of itemsets. The search follows a depth-first traversal of a prefix tree, as shown in Figure 1.
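The description above can be illustrated with `java.util.BitSet`. The sketch below is a generic in-memory Eclat under stated assumptions, not the group's implementation: the class and method names are hypothetical, bit t of an item's row is set when transaction t contains the item, and `minSupport` is an absolute count.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of depth-first Eclat over bit-set rows.
final class EclatSketch {
    // Returns every itemset whose support is at least minSupport, mapped to its support.
    static Map<List<String>, Integer> mine(List<List<String>> transactions, int minSupport) {
        // Vertical layout: one bit-set row per item.
        Map<String, BitSet> rows = new TreeMap<>();
        for (int t = 0; t < transactions.size(); t++) {
            for (String item : transactions.get(t)) {
                rows.computeIfAbsent(item, k -> new BitSet()).set(t);
            }
        }
        List<String> items = new ArrayList<>();
        List<BitSet> tids = new ArrayList<>();
        for (Map.Entry<String, BitSet> e : rows.entrySet()) {
            if (e.getValue().cardinality() >= minSupport) {
                items.add(e.getKey());
                tids.add(e.getValue());
            }
        }
        Map<List<String>, Integer> result = new LinkedHashMap<>();
        extend(new ArrayList<>(), items, tids, minSupport, result);
        return result;
    }

    // tids.get(i) is the row of (prefix + items.get(i)); each call descends one
    // level of the prefix tree before moving on to the next sibling.
    private static void extend(List<String> prefix, List<String> items, List<BitSet> tids,
                               int minSupport, Map<List<String>, Integer> result) {
        for (int i = 0; i < items.size(); i++) {
            List<String> itemset = new ArrayList<>(prefix);
            itemset.add(items.get(i));
            result.put(itemset, tids.get(i).cardinality());

            List<String> nextItems = new ArrayList<>();
            List<BitSet> nextTids = new ArrayList<>();
            for (int j = i + 1; j < items.size(); j++) {
                BitSet both = (BitSet) tids.get(i).clone();
                both.and(tids.get(j)); // row intersection gives the support of the extension
                if (both.cardinality() >= minSupport) {
                    nextItems.add(items.get(j));
                    nextTids.add(both);
                }
            }
            if (!nextItems.isEmpty()) {
                extend(itemset, nextItems, nextTids, minSupport, result);
            }
        }
    }
}
```

Because every extension is obtained by intersecting two rows that share the same prefix, the search never rescans the original transactions after the rows are built.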

ECLAT Implementation
Divide and Conquer Phase: Divide the file into N partitions. If an item is frequent in one partition, we do not check it again.
Merge Phase: An item may be infrequent in every partition and still be frequent globally; such an item shows up when the partitions are merged. In the merge phase we run the algorithm again on the infrequent items.

ECLAT Implementation
Example: File size = 10000, Threshold = 2%. An item is frequent if it occurs >= 200 times.
We obtain intermediate results by checking all the partitions.
In the merge phase we process the infrequent items of each partition, then merge the results to produce the final output list of frequent items.
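One way to read the example above is sketched below for single items only. It rests on assumptions that the slides do not state explicitly: the global count (200 here) is used as the threshold inside every partition, per-partition item counts are already available as maps, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the divide and merge phases for single items.
final class PartitionMergeSketch {
    // Converts a percentage threshold into an absolute occurrence count.
    static int minSupportCount(int totalTransactions, double thresholdPercent) {
        return (int) Math.ceil(totalTransactions * thresholdPercent / 100.0);
    }

    // partitionCounts.get(p) maps each item to its number of occurrences in partition p.
    static Set<String> frequentItems(List<Map<String, Integer>> partitionCounts, int minCount) {
        Set<String> frequent = new TreeSet<>();
        // Divide phase: an item that reaches minCount inside one partition is
        // frequent globally and needs no further checking.
        for (Map<String, Integer> counts : partitionCounts) {
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (e.getValue() >= minCount) frequent.add(e.getKey());
            }
        }
        // Merge phase: sum the counts of the remaining items across partitions.
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> counts : partitionCounts) {
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (!frequent.contains(e.getKey())) {
                    totals.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
        }
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            if (e.getValue() >= minCount) frequent.add(e.getKey());
        }
        return frequent;
    }
}
```

Here `minSupportCount(10000, 2.0)` gives the 200 occurrences used in the example; an item counted 120 times in one partition and 90 times in another is only recognised as frequent in the merge phase.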

Eclat Execution Time
Execution time of Eclat for small and medium datasets:

Eclat vs. Apriori
We compared the execution times of Apriori and Eclat on the small and medium datasets and found the following:

Benefits of Divide and Conquer
• The program can process large files.
• It gives better performance.

Results from Demo
Execution time was 35 seconds.

REFERENCES
Project 1
Hector Garcia-Molina, Jeff Ullman, and Jennifer Widom, Database Systems: The Complete Book
http://en.wikipedia.org/wiki/Quicksort
Project 2
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=846291&userType=inst
http://www.ece.northwestern.edu/~yingliu/papers/para_arm_cluster.pdf
http://ceur-ws.org/Vol-90/borgelt.pdf
http://www.isca.in/COM_IT_SCI/Archive/v1i1/2.ISCA-RJCITS-2013-001.pdf
http://www.intsci.ac.cn/shizz/fimi.pdf
