
1431227-3 File Organization and Processing “Advanced Data Structures” “Algorithms”


mccormickb

Presentation Transcript


  1. 1431227-3 File Organization and Processing “Advanced Data Structures” “Algorithms”

  2. Books and Materials
     • REQUIRED TEXTBOOK: Introduction to Algorithms by T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, Second Edition, MIT Press, 2001. ISBN 0-262-03293-7
     • RECOMMENDED MATERIALS:
       - Data Structures and Algorithms in Java, Robert Lafore, Sams Publishing. Online copy (I will provide)
       - Data Structures and Algorithms in Java, 2nd Edition, Michael T. Goodrich and Roberto Tamassia, John Wiley and Sons Inc., 2007. ISBN: 81-265-1226-1
     • I will try to provide everything in the lecture slides.

  3. Tentative Course Outline • Chapter 6 Trees • Chapter 7 Priority Queues • Chapter 8 Dictionaries • Chapter 9 Search Trees • Chapter 11 Text Processing • Chapter 12 Graphs

  4. Grading Scheme • Attendance 10% • Homework 20% • Midterm 20% • Quizzes 10% • Final exam 40% • Total 100%

  5. OVERVIEW OF ADS AND FILE STRUCTURE AND PROCESSING

  6. Topics Already Covered in ADS Course:
     • Asymptotic complexity
       - Big O, small o, big omega, small omega
     • Linear data structures
       - Arrays, linked lists, stack ADT, queue ADT, a notion of dynamic arrays
     • Sorting
       - Insertion sort, merge sort (randomized), quick sort
     • Sorted sequence
       - Dictionary ADT and important operations, trees, binary search trees, AVL trees, in-order traversal
     • Hash tables
       - Hashing concepts, open hashing, closed hashing, probing, rehashing, implementing dictionary operations using hash tables
     • Priority queue
       - Binary heaps, implementation, dictionary operations in a priority queue, heapsort

  7. Refreshing ADS Basic Concepts • Data Structure? • Algorithms? • File Structure?

  8. Basic Concepts: Data Structures
     • A data structure is the organization of data in a computer’s memory or in a disk file.
       - Examples: arrays, stacks, linked lists
     • Algorithms are the procedures a software program uses to manipulate the data in these structures.
       - Example: printing address labels
       - Array to store the addresses: the data structure
       - For loop giving sequential access to the array: the algorithm
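The address-label example can be sketched in Java as follows. This is only an illustration: the class name `AddressLabels`, the method `formatLabels`, and the sample addresses are assumptions, since the slide names only the array and the for loop.

```java
import java.util.ArrayList;
import java.util.List;

class AddressLabels {
    // Algorithm: visit every slot of the array in order (sequential access).
    static List<String> formatLabels(String[] addresses) {
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < addresses.length; i++) {
            labels.add("Label " + (i + 1) + ": " + addresses[i]);
        }
        return labels;
    }

    public static void main(String[] args) {
        // Data structure: an array holding the addresses (hypothetical sample data).
        String[] addresses = { "12 Oak Street", "7 Hill Road", "99 Lake Avenue" };
        for (String label : formatLabels(addresses)) {
            System.out.println(label);
        }
    }
}
```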

  9. Characteristics of Data Structures
     • Array
       - Advantages: quick insertion, very fast access if index is known
       - Disadvantages: slow search, slow deletion, fixed size
     • Ordered Array
       - Advantages: quicker search than unsorted array
       - Disadvantages: slow insertion and deletion, fixed size
     • Stack
       - Advantages: provides LIFO (last-in, first-out) access
       - Disadvantages: slow access to other items

  10. Characteristics of Data Structures
     • Queue
       - Advantages: provides FIFO (first-in, first-out) access
       - Disadvantages: slow access to other items
     • Linked List
       - Advantages: quick insertion, quick deletion
       - Disadvantages: slow search
     • Binary Tree
       - Advantages: quick search, insertion, deletion (if tree remains balanced)
       - Disadvantages: deletion algorithm is complex
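The LIFO and FIFO access orders listed for the stack and the queue can be seen in a short sketch. It uses `java.util.ArrayDeque` as a stand-in for both structures; the class and method names (`LifoFifoDemo`, `stackOrder`, `queueOrder`) are made up for this illustration and are not from the slides.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Queue;

class LifoFifoDemo {
    // Push every item, then pop them all: the last item pushed comes out first.
    static String stackOrder(int[] items) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int item : items) {
            stack.push(item);
        }
        StringBuilder sb = new StringBuilder();
        while (!stack.isEmpty()) {
            sb.append(stack.pop()).append(' ');
        }
        return sb.toString().trim();
    }

    // Add every item, then remove them all: the first item added comes out first.
    static String queueOrder(int[] items) {
        Queue<Integer> queue = new ArrayDeque<>();
        for (int item : items) {
            queue.add(item);
        }
        StringBuilder sb = new StringBuilder();
        while (!queue.isEmpty()) {
            sb.append(queue.remove()).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int[] items = { 1, 2, 3 };
        System.out.println("Stack (LIFO): " + stackOrder(items));
        System.out.println("Queue (FIFO): " + queueOrder(items));
    }
}
```

In both cases only one end of the structure is cheap to reach, which is the "slow access to other items" disadvantage noted above.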

  11. Overview of Algorithms
     • Basic operations:
       - Insert a new data item
       - Search for a specified item
       - Delete a specified item
     • Definitions:
       - Database: all the data that will be dealt with in a particular situation; when stored on disk, it is a file
       - Records: the units into which a database is divided; they provide a format for storing information
       - Fields: records are usually divided into several fields; each field holds a particular kind of data
     • In Java, records are usually represented by objects of an appropriate class.
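A minimal sketch of these definitions in Java, assuming a hypothetical `Person` record with three fields and an in-memory list standing in for the database. All names here are illustrative and do not come from the course material.

```java
import java.util.ArrayList;
import java.util.List;

class RecordDemo {
    // One record; each field holds a particular kind of data.
    static class Person {
        final String lastName;
        final String firstName;
        final int age;

        Person(String lastName, String firstName, int age) {
            this.lastName = lastName;
            this.firstName = firstName;
            this.age = age;
        }
    }

    // The "database": all the records for this situation, kept in memory here.
    private final List<Person> records = new ArrayList<>();

    // Insert a new data item.
    void insert(Person p) {
        records.add(p);
    }

    // Search for a specified item; returns null if no record matches.
    Person search(String lastName) {
        for (Person p : records) {
            if (p.lastName.equals(lastName)) {
                return p;
            }
        }
        return null;
    }

    // Delete a specified item; returns true if at least one record was removed.
    boolean delete(String lastName) {
        return records.removeIf(p -> p.lastName.equals(lastName));
    }

    int size() {
        return records.size();
    }

    public static void main(String[] args) {
        RecordDemo db = new RecordDemo();
        db.insert(new Person("Smith", "Ann", 30));
        db.insert(new Person("Jones", "Bob", 41));
        System.out.println(db.search("Jones").firstName);
        db.delete("Smith");
        System.out.println(db.size());
    }
}
```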

  12. Data Structure vs. File Structure
     • Both involve:
       - Representation of data, plus
       - Operations for accessing data
     • Difference:
       - Data structures deal with data in main memory.
       - File structures deal with data on a secondary storage device (a file).
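To make the difference concrete, here is a small sketch that keeps the same integers once in an array (main memory) and once as fixed-length records in a file (secondary storage), read back with `java.io.RandomAccessFile`. The class name `MemoryVsFile`, its methods, and the 4-byte record layout are assumptions chosen for the illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class MemoryVsFile {
    static final int RECORD_SIZE = Integer.BYTES; // each record is one 4-byte int

    // Write all values to the file as consecutive fixed-length records.
    static void writeAll(File file, int[] values) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0); // discard any previous contents
            for (int v : values) {
                raf.writeInt(v);
            }
        }
    }

    // Read the record at the given index by seeking to its byte offset.
    static int readAt(File file, int index) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek((long) index * RECORD_SIZE);
            return raf.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        int[] inMemory = { 10, 20, 30, 40 };          // data structure in main memory
        File file = File.createTempFile("records", ".dat");
        file.deleteOnExit();
        writeAll(file, inMemory);                      // same data on secondary storage

        System.out.println(inMemory[2]);               // direct array access
        System.out.println(readAt(file, 2));           // seek and read from the file
    }
}
```

Both accesses return the same value, but the second one goes through the storage device and the operating system, which is why file structures are designed to minimize the number of such accesses.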

  13. Computer Architecture
     • CPU: registers, cache
     • Main memory (RAM): fast, small, expensive, volatile
     • Secondary storage (disk, tape, DVD-R): slow, large, cheap, stable

  14. Memory Hierarchy
     • On systems with 32-bit addressing, only 2^32 bytes (4 GiB) can be directly referenced in main memory.
     • The number of data objects may exceed this number!
     • Data must be maintained across program executions. This requires storage devices that retain information when the computer is restarted.
       - We call such storage nonvolatile.
       - Primary storage is usually volatile, whereas secondary and tertiary storage are nonvolatile.

  15. How Fast?
     • Typical times for getting info
       - Main memory: ~120 nanoseconds = 120 × 10^-9 s
       - Magnetic disks: ~30 milliseconds = 30 × 10^-3 s
     • The disk is therefore about 250,000 times slower than main memory.
     • An analogy keeping the same time proportion as above
       - Looking at the index of a book: 20 seconds, versus
       - Going to the library: about 58 days (20 s × 250,000 = 5,000,000 s)

  16. Comparison
     • Main Memory
       - Fast (since electronic)
       - Small (since expensive)
       - Volatile (information is lost when a power failure occurs)
     • Secondary Storage
       - Slow (since electronic and mechanical)
       - Large (since cheap)
       - Stable, persistent (information is preserved longer)

  17. Goals of this course
     • Cover advanced data structures and algorithms, building the student’s knowledge in the areas of:
       - search tree structures (Red-Black Trees, B-Trees, Splay Trees),
       - advanced heap structures (Fibonacci Heaps),
       - graphs and graph algorithms (depth-first search, breadth-first search, minimum spanning trees, shortest path, maximum flow, matching), and
       - geometric algorithms (intersection of line segments, convex hull).

  18. Goals of this course (Cont’d)
     • The successful student, having learned these concepts, will be able to analyze algorithms for different data structures and file structures.
     • The objective of the Data Structures and Algorithms course was to teach ways of efficiently organizing and manipulating data in main memory. In this course you will learn equivalent techniques for organizing and manipulating data in secondary storage.

  19. Algorithm Analysis: Big O Notation
     • Analogy: automobiles
       - Classified as large, medium, economy (compacts, subcompacts, midsize)
       - The class gives a quick idea of the size; the actual dimensions are not needed
     • It is useful to have a similar shorthand for how efficient a computer algorithm is.
     • In CS, this rough measure is called Big O notation.
     • "Algorithm A is twice as fast as algorithm B" is not a meaningful statement.
       - Why? The proportion can change radically as the number of items changes.
       - We need a measure that is related to the number of items.
     • BIG O IS THE SOLUTION!

  20. Big O Notation
     • Insertion in an unsorted array
       - The time does not depend on the number of items N in the array, no matter how big N is
       - The item is placed in the next available position: a[nElems] = item; nElems++;
       - Constant time: T = K, which is O(1)
       - Real situation: the time depends on the speed of the microprocessor, how efficiently the compiler generated the program code, and other factors
       - The constant K accounts for all such factors
     • Linear search
       - The cost is the number of comparisons that must be made to find the item
       - Average time: half of the total number of items, T = K*N/2
       - Folding the factor 1/2 into the constant: k = K/2, so T = k*N
       - The time is proportional to the size of the array: O(N)
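The two cases can be checked with a small experiment. The sketch below is an illustration under assumptions: the class `ArrayOps` and its methods are invented names, and it counts comparisons rather than measuring time, so the constant K does not appear.

```java
class ArrayOps {
    private final long[] a;
    private int nElems = 0;

    ArrayOps(int capacity) {
        a = new long[capacity];
    }

    // O(1): one assignment and one increment, whatever nElems is.
    // Assumes the array is not full.
    void insert(long value) {
        a[nElems] = value;
        nElems++;
    }

    // Linear search: returns how many comparisons were made before the key
    // was found, or nElems if the key is absent.
    int comparisonsToFind(long key) {
        int comparisons = 0;
        for (int i = 0; i < nElems; i++) {
            comparisons++;
            if (a[i] == key) {
                break;
            }
        }
        return comparisons;
    }

    // Average number of comparisons when each of the n stored keys is searched once.
    static double averageComparisons(int n) {
        ArrayOps arr = new ArrayOps(n);
        for (int i = 0; i < n; i++) {
            arr.insert(i);
        }
        long total = 0;
        for (int key = 0; key < n; key++) {
            total += arr.comparisonsToFind(key);
        }
        return (double) total / n;
    }

    public static void main(String[] args) {
        for (int n : new int[] { 100, 1000, 10000 }) {
            System.out.println("N = " + n + ", average comparisons = " + averageComparisons(n));
        }
    }
}
```

The average comes out as (N+1)/2, that is, roughly N/2: it grows in proportion to N, while each insertion stays a single step.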

  21. Running Times in Big O Notation [table not preserved in the transcript] • Why Not Use Arrays for Everything?

  22. Sorting • Examples: • students by grade, customers by zip code, home sales by price, cities in order of increasing population, countries by GNP, stars by magnitude, and so on. • Sorting data may also be a preliminary step to searching it. Binary search, which can be applied only to sorted data, is much faster than a linear search. • Because sorting is so important and potentially so time-consuming, it has been the subject of extensive research in computer science, and some very sophisticated methods have been developed.
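As a sketch of sorting as a preliminary step to searching, the code below sorts an array with `java.util.Arrays.sort` and then looks up a value with a hand-written binary search. The class name `SortThenSearch` and the sample prices are assumptions made for the illustration.

```java
import java.util.Arrays;

class SortThenSearch {
    // Binary search on a sorted array: halves the remaining range at each step.
    // Returns the index of key, or -1 if key is not present.
    static int binarySearch(int[] sorted, int key) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] == key) {
                return mid;
            } else if (sorted[mid] < key) {
                lo = mid + 1;   // key can only be in the upper half
            } else {
                hi = mid - 1;   // key can only be in the lower half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] prices = { 42, 7, 19, 3, 88 };   // hypothetical unsorted data
        Arrays.sort(prices);                    // preliminary step: sort
        System.out.println(Arrays.toString(prices));
        System.out.println(binarySearch(prices, 19));
    }
}
```

Each pass of the loop discards half of the remaining range, so the number of comparisons grows with log N rather than with N.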

  23. How would you do Sorting?
     • Basic steps, executed over and over:
       - Compare two items.
       - Swap two items or copy one item.
       - Move one position right.
     • Problem:
       - Imagine that a kids-league baseball team is lined up on the field. The regulation nine players, plus an extra, have shown up for practice. You want to arrange the players in order of increasing height (with the shortest player on the left) for the team picture. How would you go about this sorting process?

  24. How would you do Sorting? [Figure: the lineup of players, unsorted and sorted]

  25. Bubble Sort: Beginning of the 1st Pass
     • Notoriously slow, but conceptually the simplest solution.
     • You start at the left end of the line and compare the two kids in positions 0 and 1. If the one on the left (in 0) is taller, you swap them. If the one on the right is taller, you don't do anything.
     • Then you move over one position and compare the kids in positions 1 and 2. Again, if the one on the left is taller, you swap them.

  26. Bubble Sort: End of 1st Pass
     • After the 1st pass, the tallest kid is on the right.
     • The biggest items bubble up to the top end of the array as the algorithm progresses.

  27. Bubble Sort: Observation
     • After this first pass through all the data, you've made N-1 comparisons and somewhere between 0 and N-1 swaps, depending on the initial arrangement of the players. The item at the end of the array is sorted and won't be moved again.
     • Now you go back and start another pass from the left end of the line. Again you go toward the right, comparing and swapping when appropriate. However, this time you can stop one player short of the end of the line, at position N-2, because you know the last position, at N-1, already contains the tallest player.
     • This rule could be stated as: when you reach the first sorted player, start over at the left end of the line.
     • You continue this process until all the players are in order.
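The passes described above can be sketched in Java as follows. This is one possible implementation, not necessarily the one intended by the course: the class name `BubbleSortDemo`, the sample heights, and the comparison counter are additions for illustration.

```java
import java.util.Arrays;

class BubbleSortDemo {
    // Sorts the array in place in increasing order and returns the number
    // of comparisons made.
    static int bubbleSort(int[] a) {
        int comparisons = 0;
        // 'out' is the last unsorted position; everything to its right is already sorted.
        for (int out = a.length - 1; out > 0; out--) {
            // One pass: compare neighbours from the left end up to 'out'.
            for (int in = 0; in < out; in++) {
                comparisons++;
                if (a[in] > a[in + 1]) {
                    // The left item is taller, so swap the pair.
                    int temp = a[in];
                    a[in] = a[in + 1];
                    a[in + 1] = temp;
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // Heights of the ten players in centimetres (hypothetical sample data).
        int[] heights = { 150, 142, 160, 138, 155, 147, 152, 140, 158, 145 };
        int comparisons = bubbleSort(heights);
        System.out.println(Arrays.toString(heights));
        System.out.println("Comparisons: " + comparisons);
    }
}
```

The outer loop shrinks the unsorted region by one position per pass, which is the "stop one player short" rule; for ten players the inner loop runs 9 + 8 + ... + 1 times.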

  28. Efficiency of the Bubble Sort
     • Example: 10 items
       - Comparisons: 9 + 8 + 7 + ... + 1 = 45
     • General formula: (N-1) + (N-2) + (N-3) + ... + 1 = N*(N-1)/2 ≈ N^2/2, which is O(N^2)
     • Practice Problems
       - Define the following terms: data structure, algorithm, file structure, Big O notation.
       - Why are arrays not used for everything?
       - What are the major steps in sorting?
       - Compare the run times of different algorithms in terms of Big O notation.
       - Implement bubble sort using Java.
