Chapter 7: Data Structures • Earlier in the semester we introduced data types • Data types: built-in types for the language • primitive types: integer, real, character, boolean, string (in some languages) • more exotic types: arrays, records, pointers • Here, we introduce data structures • Data structures: more elaborate data types usually composed out of primitive types using arrays, records and pointers • lists, queues, stacks, trees, graphs, objects • leads to ADTs
Arrays • While primitive types allow us to store one item, an array allows us to store many items of the same type (homogeneous storage) • we define the type to be stored and the array size • the size is denoted by indices • some languages have a fixed lower end (1 or 0) • other languages allow the user to define the lower and upper ends of the indices • to access an array element, we must specify an index, as in A[i] or A[5] • We use arrays to store lists • Arrays offer quick access to any element, known as random access • they are easy to work with, and an entire array can be passed as one parameter • we can also use arrays to represent other structures such as queues, stacks and trees • One drawback of the array is its fixed size • most languages require that the upper limit on the array size be specified at compile time (or at run time) • too small an upper limit and the data will not fit • too large an upper limit and you waste memory
Language Implementation of Arrays • The language must have a mechanism for determining the storage location of an array item given the array index • Arrays are stored in consecutive memory locations no matter what the “shape” of the array is • This conversion mechanism is the mapping function • 1-dimensional array: location = offset + (i - 1) * unit_size • i is the index (starting at 1), offset is the address of the first element, and unit_size is the size in bytes of each element • 2-d array: location = offset + [(i - 1) * c + (j - 1)] * unit_size • i, j are the row and column indices and c is the number of columns in the array • The equation for the 2-d array assumes row-major order; column-major order is also possible • FORTRAN is the best-known language that uses column-major order • see figures 7.1 and 7.2 p. 323
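The mapping functions above can be sketched in C as follows. This is only an illustration, not how any particular compiler does it: the function names are made up for this example, indices start at 1 as on the slide, and the column-major version (which the slide does not spell out) assumes r is the number of rows.

```c
#include <assert.h>
#include <stddef.h>

/* 1-d mapping: address of element i (1-based) of an array starting at offset. */
size_t address_1d(size_t offset, size_t i, size_t unit_size)
{
    assert(i >= 1);
    return offset + (i - 1) * unit_size;
}

/* 2-d mapping, row-major: c is the number of columns. */
size_t address_2d_row_major(size_t offset, size_t i, size_t j,
                            size_t c, size_t unit_size)
{
    assert(i >= 1 && j >= 1 && j <= c);
    return offset + ((i - 1) * c + (j - 1)) * unit_size;
}

/* 2-d mapping, column-major (assumed form): r is the number of rows. */
size_t address_2d_col_major(size_t offset, size_t i, size_t j,
                            size_t r, size_t unit_size)
{
    assert(i >= 1 && i <= r && j >= 1);
    return offset + ((j - 1) * r + (i - 1)) * unit_size;
}
```

For example, with 4-byte elements stored from address 1000, element 5 of a 1-d array sits at 1000 + 4 * 4 = 1016.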
Lists • Lists represent ordered or unordered information that can be accessed randomly • We can implement this using an array • We can also implement this using individual elements connected together by pointers • A pointer is a variable which stores a memory location • An advantage of a list made with pointers is that it is dynamic: it can grow or shrink, thus using the exact amount of memory required • [Figure: a record storing 2 items, a datum and a pointer to another datum]
List Implementations • Contiguous list: using an array • Problems: • fixed size (not dynamic) • if the list is ordered (sorted) then • adding into the list requires shifting elements down • deleting from the list requires shifting elements up • see figure 7.4 p. 328 for a contiguous list of strings • Linked list: using pointers • adding and deleting are simplified but • unlike the array, there is no random access, we have to follow the chain of pointers when searching for an item whether the list is sorted or not
More on Pointers • Pointers are studied in detail in 2380 • here we will only briefly talk about them • Languages require instructions to • create a data structure that is accessed by a pointer • destroy a structure accessed by a pointer • access the block of memory through the pointer • Languages will also have a special value for a pointer, called NIL, used when the pointer is currently not pointing at anything • A list also requires a head pointer (points to the first item)
Adding and Deleting in Lists • Array-based adding and deleting: • To add at position i, we shift the elements from i to n down 1 position each, starting from the last one: for (j = n; j >= i; j--) a[j+1] = a[j]; a[i] = new_item; n = n+1; • To delete the item at position i, we shift the elements from i+1 to n up 1 position each and subtract 1 from n: for (j = i+1; j <= n; j++) a[j-1] = a[j]; n = n-1; • Pointer-based adding and deleting: • First search for the correct position to do the adding or deleting • To add: • New item’s pointer must point to what the previous item’s pointer currently points to • Previous item’s pointer must then point to the new item • See figure 7.7 p. 330 • To delete: • Previous item’s pointer must point to the item after the item to be deleted • Delete the item (return it to the heap) • See figure 7.6 p. 330
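The pointer-based steps can be sketched in C roughly as follows. The names struct node, list_add and list_delete are invented for this example, the list is assumed to be kept sorted in ascending order, and NULL plays the role of NIL. Instead of a separate "previous item" variable, the sketch keeps a pointer to the link that has to change, which also covers the case where the head pointer itself must be updated.

```c
#include <stdlib.h>

struct node {
    int datum;
    struct node *next;   /* NULL (NIL) at the end of the list */
};

/* Add value to a sorted list. Returns 0 on success, -1 if the heap is exhausted. */
int list_add(struct node **head, int value)
{
    struct node *item = malloc(sizeof *item);
    if (item == NULL)
        return -1;
    item->datum = value;

    /* Search: link ends up being the pointer that must point at the new item. */
    struct node **link = head;
    while (*link != NULL && (*link)->datum < value)
        link = &(*link)->next;

    item->next = *link;  /* new item points to what the previous pointer pointed to */
    *link = item;        /* previous pointer now points to the new item */
    return 0;
}

/* Delete the first item equal to value. Returns 1 if an item was deleted, else 0. */
int list_delete(struct node **head, int value)
{
    struct node **link = head;
    while (*link != NULL && (*link)->datum != value)
        link = &(*link)->next;
    if (*link == NULL)
        return 0;

    struct node *doomed = *link;
    *link = doomed->next;  /* previous pointer skips over the deleted item */
    free(doomed);          /* return the item to the heap */
    return 1;
}
```

Note that neither operation shifts any elements; the cost is in the search for the right position.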
Conceptual Lists • A list must support certain operations: sorting, searching, accessing the ith item, adding, deleting, printing, destroying • Once a list is created with these operations, we can use the list without worrying about its implementation; this is the concept behind the ADT • Notice however that a list implemented as an array has the drawback of static size, while the list implemented using pointers has the drawback of no random access • We use computational complexity analysis to determine which implementation is better for a given situation • Arrays offer O(1) access, O(log n) searching with binary search (if the array is sorted), and O(n) adding and deleting • Linked lists offer O(n) access, O(n) searching, and O(n) adding and deleting (changing the pointers is O(1), but finding the position is O(n))
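As an illustration of the O(log n) search that a sorted array allows, here is a minimal binary search sketch in C. The function name is made up for this example, and it uses C's 0-based indexing rather than the 1-based indexing of the slides. It relies on random access (a[mid]), which is exactly what a linked list cannot offer.

```c
#include <stddef.h>

/* Search sorted array a[0..n-1] for key.
   Returns the index of key, or -1 if it is not present. */
long binary_search(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;              /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] == key)
            return (long)mid;
        if (a[mid] < key)
            lo = mid + 1;               /* key can only be in the upper half */
        else
            hi = mid;                   /* key can only be in the lower half */
    }
    return -1;
}
```

Each pass through the loop halves the range still to be searched, so at most about log2(n) + 1 elements are examined.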
Problem with pointers • There are several problems associated with pointers and programmers often have problems using pointers until they are used to them • dangling pointers • destroying an item but leaving the pointer around • lost objects • changing a pointer to point at something else, the initial object is lost • using pointer arithmetic • in C, it was common to use integers as pointers and use arithmetic to adjust pointers, but this is not a good idea! • running out of heap memory
Stacks and Queues • In a list, you can access any of the elements • In a stack, you can only access the top element • In a queue, you can only access the front (head) and rear (tail) elements • you insert at the tail and remove from the head • Interior elements are inaccessible for queues and stacks • [Figure: a stack with its Top marked, and a queue with its Head and Tail marked]
Stacks • Similar to a list, but all accesses are made at one location called the top • Add items only at the top • Remove items only at the top • The stack is a LIFO (last in, first out) structure • Always remove the most recently placed item • Oldest item will be the last removed • Like a stack of trays in a cafeteria • Operations: • create, destroy • push: add an item to the top • pop: remove an item from the top • empty and full • Stacks are mostly used for backtracking purposes: you want to get back to the point you were at before trying the latest move • Uses/Applications • used in mazes or game playing to “undo” a move or decision • the OS uses a run-time stack to keep track of where you are in your program • when a procedure or function is called, the location of the call is pushed onto the stack; returning is easy, just pop the location of the procedure call off the run-time stack • See figure 7.9 p. 333
Stack Implementation 1 • Array-based • array of N items and an integer, the stack pointer, called Top • Top indicates the index of the top of the stack • pushing: increment Top, insert new item there • popping: remove item at Top, decrement Top • full: if Top == N • empty: if Top == 0 • Notice that the top of the stack moves up and down, while the bottom of the stack remains fixed at position 1 in the array • because the array is fixed size, our stack is limited in size, and so, while this implementation is easy, it may not be a good one
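A minimal C sketch of the array-based stack might look like this. All names (struct astack, astack_push and so on) and the capacity N = 4 are assumptions made for the example. To keep the slide's 1-based convention (bottom at position 1, empty when Top == 0), slot 0 of the C array is simply left unused.

```c
#include <stdbool.h>

#define N 4                      /* capacity, chosen arbitrarily for this sketch */

struct astack {
    int items[N + 1];            /* slot 0 unused so positions run from 1 to N */
    int top;                     /* index of the top item; 0 when empty */
};

void astack_create(struct astack *s)      { s->top = 0; }
bool astack_empty(const struct astack *s) { return s->top == 0; }
bool astack_full(const struct astack *s)  { return s->top == N; }

/* Push: increment top, insert the new item there. Fails if the stack is full. */
bool astack_push(struct astack *s, int item)
{
    if (astack_full(s))
        return false;
    s->items[++s->top] = item;
    return true;
}

/* Pop: remove the item at top, decrement top. Fails if the stack is empty. */
bool astack_pop(struct astack *s, int *item)
{
    if (astack_empty(s))
        return false;
    *item = s->items[s->top--];
    return true;
}
```

The fixed N is the limitation mentioned above: the fifth push onto this stack fails even if plenty of memory is free.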
Stack Implementation 2 • Pointer-based • Use record structures for each stack element, with an info field and a next pointer • Need a pointer, Top, to point to the current top of the stack, initially NIL • pushing: create new item, pointed to by temp, set temp’s next pointer to top, set top to temp • popping: set temp to top, set top to temp’s next pointer, delete temp • empty: if top == NIL • full: assume it is never full • The pointer-based implementation is not fixed size and is only full if you run out of heap memory • Unlike the list implemented using pointers, the stack is not inefficient, because we never try to randomly access into the stack
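The push and pop steps for the pointer-based stack can be sketched in C as below. The names struct cell, lstack_push and lstack_pop are invented for illustration, the info field is assumed to be an int, and NULL stands in for NIL. The one place the sketch departs from "assume it is never full" is that push reports failure if malloc cannot supply memory.

```c
#include <stdbool.h>
#include <stdlib.h>

struct cell {
    int info;
    struct cell *next;
};

/* Push: create a new item (temp), point it at the old top, make it the new top. */
bool lstack_push(struct cell **top, int info)
{
    struct cell *temp = malloc(sizeof *temp);
    if (temp == NULL)
        return false;            /* out of heap memory */
    temp->info = info;
    temp->next = *top;
    *top = temp;
    return true;
}

/* Pop: set temp to top, move top to temp's next item, then delete temp. */
bool lstack_pop(struct cell **top, int *info)
{
    if (*top == NULL)
        return false;            /* empty stack */
    struct cell *temp = *top;
    *top = temp->next;
    *info = temp->info;
    free(temp);
    return true;
}
```

A stack starts out as a single pointer set to NULL, for example struct cell *top = NULL; and every operation touches only the first cell.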
Queues • FIFO structure (first in, first out) • Used to represent lists that are processed in order, such as a list of jobs waiting at a printer • Queues are used throughout the operating system • Unlike the stack, the queue offers access at two ends: • at the front end, called the head, we remove items • at the rear end, called the tail, we add items • Since we access the queue at two points, we need two means of accessing it; we call these the head (or front) pointer and the tail (or rear) pointer • Operations: • Enqueue: add item at the tail • Dequeue: remove item at the head • Create, Destroy, Empty, Full
Queue Implementation 1 • Array-based: • Use an array of N items and two integers, tail and head, that indicate the indices of the two ends of the queue • Enqueue: Add item at q[tail] and increment tail • Dequeue: Remove item at q[head] and increment head • Empty: if tail == head; Full: if tail has moved past position N • Note that the queue may seem full even though there are empty spaces at the front of the array whenever head <> 1! • So, we use a circular queue instead: • when tail (or head) moves past position N, we reset it to 1 • this queue is full if advancing tail would make it equal to head, that is, if tail == head - 1 (or tail == N when head == 1), so one slot is always left unused • see fig 7.15 p. 340
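Here is a small C sketch of a circular queue along the lines described above. The names and the array size QSIZE = 4 are assumptions for the example, and it uses C's 0-based indices with the % operator to do the wrap-around, so the tests look slightly different from the 1-based version on the slide. As on the slide, one slot is deliberately left unused so that "empty" (tail == head) and "full" can be told apart.

```c
#include <stdbool.h>

#define QSIZE 4                  /* array size; holds at most QSIZE - 1 items */

struct cqueue {
    int q[QSIZE];
    int head;                    /* index of the next item to remove */
    int tail;                    /* index of the next free slot */
};

void cq_create(struct cqueue *c)      { c->head = c->tail = 0; }
bool cq_empty(const struct cqueue *c) { return c->tail == c->head; }

/* Full when advancing tail (with wrap-around) would make it equal to head. */
bool cq_full(const struct cqueue *c)  { return (c->tail + 1) % QSIZE == c->head; }

bool cq_enqueue(struct cqueue *c, int item)
{
    if (cq_full(c))
        return false;
    c->q[c->tail] = item;
    c->tail = (c->tail + 1) % QSIZE;   /* wrap back to the start of the array */
    return true;
}

bool cq_dequeue(struct cqueue *c, int *item)
{
    if (cq_empty(c))
        return false;
    *item = c->q[c->head];
    c->head = (c->head + 1) % QSIZE;
    return true;
}
```

After three enqueues and one dequeue, a fourth enqueue succeeds by reusing the slot at the front of the array, which is exactly the space a non-circular queue would waste.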
Queue Implementation 2 • Use a linked list accessed through two pointers, head and tail • The initial, empty queue has head and tail = NIL • Enqueue: create a new item pointed to by temp, set tail’s next pointer to point to temp and set tail to point at temp • note: if it’s the first item in the list, then we instead set both head and tail to point at it • Dequeue: set temp to head, set head to point to the item after head, delete temp and return it to the heap • note: if that was the last item, then we also set tail to NIL • Empty: if head and tail are NIL • Full: assume it is never full
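A possible C rendering of the linked queue is sketched below. The struct and function names are made up for this example, info is assumed to be an int, and NULL stands for NIL. The two special cases (first item added, last item removed) are the places where both pointers have to be updated.

```c
#include <stdbool.h>
#include <stdlib.h>

struct qnode {
    int info;
    struct qnode *next;
};

struct lqueue {
    struct qnode *head;          /* remove here */
    struct qnode *tail;          /* add here */
};

void lq_create(struct lqueue *q)      { q->head = q->tail = NULL; }
bool lq_empty(const struct lqueue *q) { return q->head == NULL; }

bool lq_enqueue(struct lqueue *q, int info)
{
    struct qnode *temp = malloc(sizeof *temp);
    if (temp == NULL)
        return false;            /* out of heap memory */
    temp->info = info;
    temp->next = NULL;
    if (q->tail == NULL)
        q->head = temp;          /* first item: head must point at it too */
    else
        q->tail->next = temp;
    q->tail = temp;
    return true;
}

bool lq_dequeue(struct lqueue *q, int *info)
{
    if (q->head == NULL)
        return false;            /* empty queue */
    struct qnode *temp = q->head;
    q->head = temp->next;
    if (q->head == NULL)
        q->tail = NULL;          /* last item removed: queue is empty again */
    *info = temp->info;
    free(temp);                  /* return the item to the heap */
    return true;
}
```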
Trees • A tree is a structure in which items can have more than one successor • A parent is a node that has successors; the successors are the parent’s children • The root node is the first node, which has no parent • Leaf nodes are at the other end of the tree: nodes with no children • Siblings (or twins) are nodes that share the same parent node • A subtree is a structure within a tree that runs from a given node down to the leaf nodes below it, with the given node being a subroot • The depth of the tree is the number of levels or, in some definitions, the number of edges from the root node to the furthest leaf node (one less than the number of levels) • A general tree is a tree in which nodes can have any number of children • A binary tree is a tree in which nodes have up to 2 children
Why use binary trees? • While general trees can be used to represent such information as classification hierarchies, organizational charts (see fig 7.16 p. 342) and family trees… • We use binary trees as a means of ordered storage (a binary search tree) • Each node has a Left and Right child such that • The left child (and all nodes in the left child’s subtree) are less than the given node • The right child (and all nodes in the right child’s subtree) are greater than the given node • Unlike a sorted list in an array, adding and deleting do not require shifting elements
Representing a Tree • As with lists, we can represent trees using arrays and pointers • Array based: root node goes in array index 1, its left child in 2, its right child in 3, etc • Node i has a left child at index 2*i and a right child at index 2*i+1 • See figure 7.19 p. 345 • Unfortunately, if the tree is sparse, then a lot of the array locations will be empty (see fig 7.20 p. 346) • Pointer based: each node has three items, an information field, a left pointer and a right pointer (see figure 7.18 p. 344)
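The index arithmetic for the array-based representation is short enough to write out in C. The function names are made up for this sketch, and indices start at 1 as on the slide. The parent formula is not on the slide; it simply follows from reversing the two child formulas with integer division.

```c
#include <assert.h>

/* Array-based binary tree with the root at index 1. */
int left_child(int i)  { return 2 * i; }
int right_child(int i) { return 2 * i + 1; }

/* Parent of node i (not defined for the root, i == 1). */
int parent_of(int i)
{
    assert(i > 1);
    return i / 2;        /* integer division undoes both 2*i and 2*i+1 */
}
```

For instance, the node at index 5 has its children at indices 10 and 11, and either child gets back to index 5 by halving its own index.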
Example Binary Tree • [Figure: a binary tree, nodes listed level by level: 22 | 13, 41 | 6, 21, 33, 48 | 2, 7, 16, 30, 40, 44, 51] • Where would you add 20? What about 23?
Pointer-based Binary Tree • Each node in the tree has a left and right pointer and an info field (here storing the node’s name) • To move from one node to another, have a TEMP pointer which is updated to point at TEMP’s LEFT or RIGHT child • [Figure: a Root node with a Left Child and a Right Child, each of which has its own Left and Right Child, and a TEMP pointer]
Trees and Recursion • If we use the pointer-based implementation, then how do we go from a node to its parent? • We do not usually have parent pointers in our tree • So there is no easy way to get back up the tree • If we use recursion to implement tree algorithms, then we have a LIFO structure (the run-time stack) that allows us to easily get back to a given node’s parent, because the parent was the most recently visited node prior to the given node • We implement traversal, add, delete and print recursively • some of these algorithms are given in figures 7.24 and 7.26 • To search the tree, we do not need recursion as we are only going down the tree, not back up • The binary search tree algorithm is given in figure 7.22 p. 348 • We will skip the details of the tree here; you will study them in detail in both 2380 and 3333
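To make the contrast concrete, here is a C sketch with a recursive add, a recursive in-order traversal, and a non-recursive search. It is not the textbook's code from the figures: the names struct tnode, tree_add, tree_search and tree_inorder are invented, info is assumed to be an int, and duplicate values are simply ignored.

```c
#include <stddef.h>
#include <stdlib.h>

struct tnode {
    int info;
    struct tnode *left;
    struct tnode *right;
};

/* Recursive add: returns the root of the subtree after info has been added.
   Each return hands control back to the parent's call, which re-links its child. */
struct tnode *tree_add(struct tnode *root, int info)
{
    if (root == NULL) {
        struct tnode *n = malloc(sizeof *n);
        if (n != NULL) {
            n->info = info;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (info < root->info)
        root->left = tree_add(root->left, info);
    else if (info > root->info)
        root->right = tree_add(root->right, info);
    return root;
}

/* Search only moves down the tree, so a loop with a TEMP pointer is enough. */
const struct tnode *tree_search(const struct tnode *root, int key)
{
    const struct tnode *temp = root;
    while (temp != NULL && temp->info != key)
        temp = (key < temp->info) ? temp->left : temp->right;
    return temp;
}

/* In-order traversal: left subtree, node, right subtree.
   Copies the values into out in ascending order; returns the new count. */
size_t tree_inorder(const struct tnode *root, int *out, size_t count)
{
    if (root == NULL)
        return count;
    count = tree_inorder(root->left, out, count);
    out[count++] = root->info;
    return tree_inorder(root->right, out, count);
}
```

In tree_inorder, finishing the left subtree returns to the node's own call before the right subtree is visited, which is the "getting back up the tree" that the run-time stack provides for free.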
Customized Data Types • You saw in 1380 that C/C++ allows a programmer to define his/her own data types • This allows for customized structures including records, which can be used to define linked list elements: • Ex: struct foo { char info[7]; struct foo *next; }; • [Figure: a record with an Info field and a Next field]
Abstract Data Types • One form of user-defined type is the ADT • The programmer defines the data structure and also the operations that will be used on the data structure • The data structure and its operations are encapsulated into one set of definitions • In Pascal this would be a Unit, in C++ it is a class, in Ada it is a package (see fig 7.27 for an Ada package for a Stack) • By encapsulating the structure and operations together, the details are not available for a programmer to misuse • We will explore ADTs in detail in 2380 and 3333, but for now, let’s consider why we might use them: • we could use other people’s code (the idea of reusability or off-the-shelf components) • we can let someone else use our code, knowing that they will not be able to misuse it • it promotes the idea of modularity more than procedures by themselves: now we have process modularity and data modularity
Information Hiding • In order to make sure that other programmers cannot depend on how the data structure is implemented, the details are encapsulated with the processes and hidden from view • Different languages achieve information hiding in different ways • In C++ and Java, it is through private sections in the class or struct definitions • In Ada, it is also through a private section • see figure 7.28 p. 357 • In standard Pascal, it is not possible (one of the reasons that Pascal is not used much in “real world programming”)
Example of Using ADTs • [Figure: your code declares adt a; and accesses a through procedure calls; the Interface (procedure names and parameters) sits between your code and the adt definitions (structure and procedures)] • The Interface provides the means by which program code can access the ADT; in C++ and Java, the Interface is public and the ADT code is private (see p. 359)
Pointers in Machine Language • Notice that in this chapter, we used pointers to reference the next item in the list • A pointer variable stores a memory location rather than a value • In our machine language (from chapter 2), we had two load operations • Load register with value from given memory location • Load register with given value • Unfortunately, we do not have a load that will load register with value stored in the memory location pointed at by the given memory location
Addressing Modes • This idea of loading a value that is stored at a memory location that we reference through another memory location is known as indirect addressing • We now have three loads: • Load Direct • Load Immediate • Load Indirect • We will similarly need 2 saves, Save Direct and Save Indirect • We can enhance our machine language with two additional instructions • Op code D: DRXY, load register R with the value stored at the memory location whose address is stored at memory location XY • Op code E: ERXY, store the value in register R at the memory location whose address is stored at memory location XY • Addressing modes are covered in detail in 2333
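The difference between the three loads and two saves can be shown with a tiny C simulation. This is only a sketch: the machine is assumed here to have 16 one-byte registers and 256 one-byte memory cells, and the struct and function names are invented for the example.

```c
#include <stdint.h>

/* Assumed machine model: 16 registers and 256 memory cells, one byte each. */
struct machine {
    uint8_t reg[16];
    uint8_t mem[256];
};

/* Load Direct: register gets the value stored at memory location xy. */
void load_direct(struct machine *m, int r, uint8_t xy)    { m->reg[r] = m->mem[xy]; }

/* Load Immediate: register gets the value xy itself. */
void load_immediate(struct machine *m, int r, uint8_t xy) { m->reg[r] = xy; }

/* Load Indirect (op code D): memory location xy holds an address;
   register gets the value stored at that address. */
void load_indirect(struct machine *m, int r, uint8_t xy)  { m->reg[r] = m->mem[m->mem[xy]]; }

/* Save Direct: store the register's value at memory location xy. */
void save_direct(struct machine *m, int r, uint8_t xy)    { m->mem[xy] = m->reg[r]; }

/* Save Indirect (op code E): store the register's value at the address
   held in memory location xy. */
void save_indirect(struct machine *m, int r, uint8_t xy)  { m->mem[m->mem[xy]] = m->reg[r]; }
```

If cell 0x40 holds 0x80 and cell 0x80 holds 0x2A, then a direct load of 0x40 yields 0x80, an immediate load yields 0x40, and an indirect load yields 0x2A. Cell 0x40 is acting as a pointer, just like the next field of a linked list element.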