Advance Database Systems and Applications COMP 6521

Professor: Dr. Gosta Grahne
Lab Instructor: Ashkan Azarnik
Group 5: Deyvid William Romeo Honvo Venkatesh S R

Contents
• Project 1: External Sorting Algorithm, 2PMMS Implementation
• Project 2: Mining Frequent Itemsets from Secondary Memory
  Part 1: Problem Analysis & Algorithm Consideration
  Part 2: Algorithm Description & Design Principles

Project 1: External Sorting Algorithm, 2PMMS Implementation

Problem Statement

PROJECT 1
• Develop a program that sorts numbers in ascending order using Two-Phase Multiway Merge Sort (2PMMS) with a limit of 5 MB of virtual memory.
• External sorting is required when the data being sorted does not fit into the main memory of a computing device and must instead reside in slower external memory (usually a hard drive).

Two-Phase Multiway Merge-Sort (2PMMS) Solution
[Diagram: Unsorted File → Phase 1 → Sorted Runs → Phase 2 → Sorted File]

Approach to the problem
• In the 1st phase, chunks of data that fit in main memory are read, sorted using the built-in sort function from the Arrays class (Java), and written out to temporary files.
• In the 2nd phase (merging), the sorted temporary files are combined with a multiway merge into a single larger file.
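The merging phase described above can be sketched as a k-way merge driven by a priority queue. This is a minimal illustration, not the group's actual code: the class name RunMerger, the one-integer-per-line file format, and the use of Java 8 lambdas are all assumptions.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of phase 2: merge sorted run files into one sorted file.
class RunMerger {
    /** Current smallest unread value of one run, plus the reader it came from. */
    private static final class Head {
        final int value;
        final BufferedReader reader;
        Head(int value, BufferedReader reader) { this.value = value; this.reader = reader; }
    }

    /** Assumes every run holds one integer per line, already in ascending order. */
    static void merge(List<Path> runs, Path output) throws IOException {
        PriorityQueue<Head> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.value, b.value));
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            // Seed the heap with the first value of every run.
            for (Path run : runs) {
                push(heap, Files.newBufferedReader(run, StandardCharsets.UTF_8));
            }
            // Repeatedly emit the global minimum and refill from the same run.
            while (!heap.isEmpty()) {
                Head smallest = heap.poll();
                out.write(Integer.toString(smallest.value));
                out.newLine();
                push(heap, smallest.reader);
            }
        }
    }

    /** Reads the next value of a run into the heap, or closes the run when exhausted. */
    private static void push(PriorityQueue<Head> heap, BufferedReader reader) throws IOException {
        String line = reader.readLine();
        if (line == null) {
            reader.close();
            return;
        }
        heap.add(new Head(Integer.parseInt(line.trim()), reader));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java RunMerger sorted.txt run-0.txt run-1.txt ...
        List<Path> runs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) runs.add(Paths.get(args[i]));
        if (!runs.isEmpty()) merge(runs, Paths.get(args[0]));
    }
}
```

Only one value per run is held in the heap at a time, so memory use depends on the number of runs (plus the reader buffers), not on the size of the file.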

Challenges Faced
Which algorithm to choose?
• After a few tests, we decided to use the built-in sort function from Java, which implements a tuned quicksort algorithm.
• This algorithm offers n*log(n) performance on many data sets that cause other quicksorts to degrade to quadratic performance.
• Efficient average case compared to other sort algorithms.
• A buffer of size 750,000 was used for the 1st phase.
• newBufferedReader from Java 7 was used to read files.
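The choices listed above (built-in sort, fixed-size buffer, newBufferedReader) could fit together roughly as follows for the 1st phase. This is a sketch under assumptions: the class name RunGenerator, the run file names, and the one-integer-per-line input format are hypothetical, and the buffer size is assumed to count integers (750,000 int values occupy about 3 MB, which fits under the 5 MB limit).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of phase 1: cut the input into sorted runs.
class RunGenerator {
    /** Reads up to bufferSize integers at a time, sorts them, and writes each chunk as a run. */
    static List<Path> createRuns(Path input, Path tempDir, int bufferSize) throws IOException {
        List<Path> runs = new ArrayList<>();
        int[] buffer = new int[bufferSize];
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            int count = 0;
            String line;
            while ((line = in.readLine()) != null) {
                buffer[count++] = Integer.parseInt(line.trim());
                if (count == bufferSize) {           // buffer full: flush it as a sorted run
                    runs.add(writeRun(buffer, count, tempDir, runs.size()));
                    count = 0;
                }
            }
            if (count > 0) {                         // last, partially filled buffer
                runs.add(writeRun(buffer, count, tempDir, runs.size()));
            }
        }
        return runs;
    }

    private static Path writeRun(int[] buffer, int count, Path tempDir, int index) throws IOException {
        Arrays.sort(buffer, 0, count);               // built-in sort on the filled prefix only
        Path run = tempDir.resolve("run-" + index + ".txt");
        try (BufferedWriter out = Files.newBufferedWriter(run, StandardCharsets.UTF_8)) {
            for (int i = 0; i < count; i++) {
                out.write(Integer.toString(buffer[i]));
                out.newLine();
            }
        }
        return run;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java RunGenerator input.txt tempDir
        if (args.length < 2) return;
        List<Path> runs = createRuns(Paths.get(args[0]), Paths.get(args[1]), 750_000);
        System.out.println(runs.size() + " sorted runs written");
    }
}
```

The same int array is reused for every chunk, so the memory footprint stays constant however large the input file is.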

List of Data Structures
• Primitive Types: Boolean, Integer, Long
• Abstract Types: Array, String
• Arrays (Linear Data Structure): Integer Array, Boolean Array, Long Array
• I/O: newBufferedReader

Project 2: Mining Frequent Itemsets from Secondary Memory
Develop an application that computes the frequent itemsets of all sizes (pairs, triples, quadruples, etc.) from a set of transactions, based on an input support threshold percentage.

Algorithms Considered
FP-Growth vs Eclat
• Eclat uses a purely vertical representation, whereas FP-growth combines both vertical and horizontal representations in its FP-tree structure.
• FP-growth takes a lot of memory and is more difficult to implement than Eclat.

ECLAT
• Better execution time
• Memory efficient
• Basic algorithm
• Very good for dense datasets
• Requires less memory than FP-growth
• Map of Bitsets
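The "Map of Bitsets" mentioned above is the vertical representation Eclat relies on: each item maps to the set of transactions that contain it, and the support of an itemset is the size of the intersection of those sets. A minimal sketch, assuming transactions are given as string arrays (the class and method names are hypothetical):

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a vertical (item -> transaction ids) index.
class VerticalIndex {
    /** Bit t of an item's BitSet is set when transaction number t contains that item. */
    static Map<String, BitSet> build(String[][] transactions) {
        Map<String, BitSet> tidsets = new HashMap<>();
        for (int t = 0; t < transactions.length; t++) {
            for (String item : transactions[t]) {
                BitSet bits = tidsets.get(item);
                if (bits == null) {
                    bits = new BitSet(transactions.length);
                    tidsets.put(item, bits);
                }
                bits.set(t);
            }
        }
        return tidsets;
    }

    /** Support count of the pair {a, b}; both items are assumed to be present in the map. */
    static int support(Map<String, BitSet> tidsets, String a, String b) {
        BitSet both = (BitSet) tidsets.get(a).clone(); // clone so the stored set is not modified
        both.and(tidsets.get(b));                      // intersection of the two transaction sets
        return both.cardinality();
    }

    public static void main(String[] args) {
        String[][] transactions = { {"a", "b"}, {"a", "c"}, {"a", "b", "c"} };
        Map<String, BitSet> tidsets = build(transactions);
        System.out.println("support(a, b) = " + support(tidsets, "a", "b"));
    }
}
```

Each transaction costs one bit per item, which is why this layout tends to work well when the data is dense.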

ECLAT Implementation: List of Data Structures
• Primitive Types: Boolean, Integer, Double
• Abstract Types: Hash Map, String
• Arrays: Array List (Dynamic), Bit Set (Bit Array), String Array

ECLAT Implementation
1. Scan the original file and find the frequent items.
2. Generate n partitions (files) that contain groups of frequent items.
3. Read every file, register items/transactions, then find the frequent itemsets and write them to the output file.
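Step 1 above could look like the following single pass over the transactions. This is a sketch under assumptions: one transaction per line with whitespace-separated items, an item counted at most once per transaction, and the threshold rounded up to a whole number of transactions. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the first scan: count items and keep the frequent ones.
class FrequentItemScan {
    /** Returns item -> count for items found in at least supportPercent % of the transactions. */
    static Map<String, Integer> frequentItems(BufferedReader in, double supportPercent)
            throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        int transactions = 0;
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;            // skip blank lines
            transactions++;
            // The HashSet removes duplicates so an item counts once per transaction.
            for (String item : new HashSet<>(Arrays.asList(line.split("\\s+")))) {
                Integer old = counts.get(item);
                counts.put(item, old == null ? 1 : old + 1);
            }
        }
        int minCount = (int) Math.ceil(supportPercent / 100.0 * transactions);
        Map<String, Integer> frequent = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) frequent.put(e.getKey(), e.getValue());
        }
        return frequent;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("a b c\na b\na c\nd\n"));
        System.out.println(frequentItems(in, 50.0));
    }
}
```

Items that fail the threshold here can be dropped from every later step, since no frequent itemset can contain an infrequent item.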

ECLAT Implementation
• Divide and conquer approach
• Algorithm based on the concepts of Diskmine and projection described in the professor's paper “Mining Frequent Itemsets from Secondary Memory”
• The large database is decomposed into a number of small databases to be processed
• Each database contains a percentage of the frequent items and all greater items in the same transaction
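One way to read the decomposition described above is sketched below. It is an interpretation, not the method from the paper or the group's code: the frequent items are given a total order (the rank map), consecutive ranks are split into equally sized groups, and each partition receives, for every transaction, the suffix starting at the first item of its group. All names are hypothetical, and the partitions are kept in memory here rather than written to files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of projecting a database onto groups of frequent items.
class Projection {
    /**
     * rank maps each frequent item to its position in the chosen order (0 = smallest).
     * Partition g owns the items whose rank falls in [g * size, (g + 1) * size).
     */
    static List<List<List<String>>> project(List<List<String>> transactions,
                                            Map<String, Integer> rank, int groups) {
        int size = (rank.size() + groups - 1) / groups;   // items per group, rounded up
        List<List<List<String>>> partitions = new ArrayList<>();
        for (int g = 0; g < groups; g++) partitions.add(new ArrayList<List<String>>());

        for (List<String> transaction : transactions) {
            // Keep only frequent items and sort them by rank.
            List<String> kept = new ArrayList<>();
            for (String item : transaction) {
                if (rank.containsKey(item)) kept.add(item);
            }
            kept.sort((x, y) -> Integer.compare(rank.get(x), rank.get(y)));

            // At the first item of each group, emit that item plus all greater items.
            int lastGroup = -1;
            for (int i = 0; i < kept.size(); i++) {
                int g = rank.get(kept.get(i)) / size;
                if (g != lastGroup) {
                    partitions.get(g).add(new ArrayList<>(kept.subList(i, kept.size())));
                    lastGroup = g;
                }
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        Map<String, Integer> rank = new HashMap<>();
        rank.put("a", 0); rank.put("b", 1); rank.put("c", 2); rank.put("d", 3);
        List<List<String>> transactions = new ArrayList<>();
        transactions.add(java.util.Arrays.asList("c", "a", "x", "d"));
        transactions.add(java.util.Arrays.asList("b"));
        System.out.println(project(transactions, rank, 2));
    }
}
```

Under this reading, every itemset whose smallest item belongs to a group can be counted from that group's partition alone, which is what lets the partitions be mined independently.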

ECLAT Implementation Improved
1. Scan the original file and find the frequent items.
2. Generate n partitions that contain groups of frequent items, based on their frequency.
3. Read every file, register items/transactions, then find the frequent itemsets and write them to the output file.
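Step 3 is where the itemsets are actually mined. A minimal depth-first Eclat sketch over a map of bitsets follows; it assumes one partition has already been loaded into memory as item -> transaction-id BitSet, and the class name, output format, and alphabetical item order are illustrative choices rather than the group's implementation.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of depth-first Eclat on a vertical (BitSet) layout.
class EclatMiner {
    /** Returns lines of the form "items : support" for every itemset with support >= minCount. */
    static List<String> mine(Map<String, BitSet> tidsets, int minCount) {
        List<String> items = new ArrayList<>(new TreeSet<>(tidsets.keySet())); // fixed item order
        List<String> out = new ArrayList<>();
        extend("", null, items, 0, tidsets, minCount, out);
        return out;
    }

    /** Tries to extend the current prefix with every item that comes after it in the order. */
    private static void extend(String prefix, BitSet prefixTids, List<String> items, int start,
                               Map<String, BitSet> tidsets, int minCount, List<String> out) {
        for (int i = start; i < items.size(); i++) {
            BitSet tids = (BitSet) tidsets.get(items.get(i)).clone();
            if (prefixTids != null) tids.and(prefixTids);   // transactions containing prefix + item
            int support = tids.cardinality();
            if (support < minCount) continue;               // infrequent: no superset can be frequent
            String itemset = prefix.isEmpty() ? items.get(i) : prefix + " " + items.get(i);
            out.add(itemset + " : " + support);
            extend(itemset, tids, items, i + 1, tidsets, minCount, out);
        }
    }

    public static void main(String[] args) {
        Map<String, BitSet> tidsets = new HashMap<>();
        BitSet a = new BitSet(); a.set(0, 3);               // a occurs in transactions 0, 1, 2
        BitSet b = new BitSet(); b.set(0); b.set(2);
        BitSet c = new BitSet(); c.set(1); c.set(2);
        tidsets.put("a", a); tidsets.put("b", b); tidsets.put("c", c);
        for (String line : mine(tidsets, 2)) System.out.println(line);
    }
}
```

The recursion only ever intersects bitsets, so no further pass over the transaction file is needed once a partition is loaded.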

Thanks! Merci!
