Data Structures Fundamental Data Storage
Data Structures • For sizeable programs, one problem that can quickly arise is that of data storage. • What is the most efficient or effective way to organize and utilize information within a program? • Quick answer – it depends on the task.
Data Structures • For some tasks, it is helpful (at minimum) and possibly necessary to have sorted data. • For other tasks, it is not necessary to note where any given piece of data is stored within a storage data structure.
Data Structures • Note: while we have seen these in passing and as examples earlier in the course, we will now examine these a little more closely.
Arrays • Possibly the most basic non-trivial data storage structure is that of the array. • We’ve already seen the notion of a “vector” that dynamically resizes.
Beyond Arrays • Note that the main structure being implemented by an array is effectively that of an ordered list. • Just like with an array, each element being stored has a specific location, which implies an ordering.
Beyond Arrays • In Java, there is an ArrayList class in the java.util.* package. • This class internally uses an array and resizes it when necessary as new items are added to the conceptual underlying list. • This resizing is handled internally and automatically by the class.
Beyond Arrays • In C++, there is a vector class as part of the std namespace. • Likewise, this class internally uses an array and resizes it when necessary as new items are added to the conceptual underlying list. • This resizing is also handled internally and automatically by the class.
Beyond Arrays • However, arrays are not the only way to model a list. • Another such model is that of the linked list. (See the graphic below.)
Linked Lists • The linked list stores each data element separately and individually, allocating space for new elements as they are added to the list.
Linked Lists • Adding data to the end of a linked list is trivial, as it (usually) also is for an array.
Linked Lists • Adding data in the middle of the list, or at its beginning, is (relatively) very time-consuming for an array. • For a linked list, however, it is often a much simpler operation.
Adding Elements • Remember that for an array, elements are in fixed locations. • Inserting an element into the middle of an array requires moving all elements at and after the point of insertion, e.g., insert 7 at index 3.
Adding Elements [Figure: the array before the insertion, after the elements at and after index 3 have been shifted one position to the right, and after 7 has been placed at index 3.]
Adding Elements • For a linked list, however, each element’s storage space is distinct and separate from the others. • New storage may be placed directly in the middle of the chain.
Linked Lists • Naturally, there is the question of what these “links of the chain” actually are, or more properly, how to represent them.
Linked Lists • In their most basic and simple form… template <typename T> class Node { public: T value; Node<T>* next; };
Linked Lists template <typename T> class Node { public: T value; Node<T>* next; }; [Figure: a node drawn as two boxes, labeled value and next.]
Linked Lists Remember – the next member is a pointer, so the class Node<T> doesn’t actually contain another Node<T> – just the address of the next one in line.
Linked Lists The end of the “linked list chain” is denoted by a null reference in the last node. The “ground” symbol at the end denotes this.
Lists • Note that we now have two different ways of storing data, each of which has its own pros and cons. • Arrays • Good for adding items to the end of lists and for random access to items within the list. • Bad for cases with many additions and removals at various places within the list.
Lists • Note that we now have two different ways of storing data, each of which has its own pros and cons. • Linked Lists • Better for adding and removing items at random locations within the list. • Bad at randomly accessing items from the list. • Note that to use a random item within the list, we must traverse the chain to find it.
Lists • Note that both of these objects fulfill the same end goal – to represent a group of objects with some implied ordering upon them. • While they meet this goal differently, their primary purpose is identical.
Templates • Templates are integral to generic programming in C++ • Template is like a blueprint • Blueprint is used to instantiate function when it is actually used in code • “Actual” types are substituted in for the “formal” types of the template
Why Templates? What is the difference between the following two functions? int compare(const string &v1, const string &v2) { if (v1 < v2) return -1; if (v2 < v1) return 1; return 0; } int compare(const double &v1, const double &v2) { if (v1 < v2) return -1; if (v2 < v1) return 1; return 0; } Only the types!
Why Templates? What if we could write the function once for any type and have the compiler just use the right types? template <typename T> int compare(const T &v1, const T &v2) { if (v1 < v2) return -1; if (v2 < v1) return 1; return 0; } Requires type T to have < operator
Exercise 1 • Implement the generic compare function • Implement a main() that compares two doubles, two ints, two chars, and two strings using the compare fcn. • Compile and see that it is good!
What is Going On? • Compiler sees structure when template is defined, blueprint when generic function is coded (in header) • When call to function is seen, compiler substitutes types used in invocation into blueprint and generates required code • Can’t catch many errors until invocation is seen
Abstracting Beyond Lists • We have this notion of a “list” structure, which maps its stored objects to indices. • What if we don’t actually need to have a lookup position for our stored objects? • But wait! How could we possibly iterate over the objects in a for loop?
The Iterator • Many programming languages provide objects called iterators for enumerating objects contained within data structures • C++ and Java are no exceptions • C++’s versions are defined in the <iterator> header file • (see 3.4 – 3.5)
The Iterator • This iterator may be used to get each contained object in order, one at a time, in a controllable manner. • It’s especially designed to work well with for loops.
The Iterator • Example code: vector<int> numbers; /* omitted code initializing numbers */ vector<int>::iterator iter; for (iter = numbers.begin(); iter != numbers.end(); iter++) { cout << *iter << ' '; }
The Iterator • In C++, iterators are designed to look like and act something like pointers. • The * and -> operators are overloaded to give pointer-like semantics, allowing users of the iterator object to “dereference” the object currently “referenced” by the iterator.
The Iterator • In C++, iterators are designed to look like and act something like pointers. • Note the use of operator ++ to increment the iterator to the next item • This is another way we can interact with pointers; it’s useful for iterating across an array while using pointer semantics… but keep a copy of the original around!
The Iterator vector<int> numbers; /* omitted code initializing numbers */ vector<int>::iterator iter; for (iter = numbers.begin(); iter != numbers.end(); iter++) { cout << *iter << ' '; }
The Iterator • C++11 and later standards also provide an alternate version of the for-loop (the range-based for) which is designed to work with iterable structures and iterators • Looks like “foreach” in other languages vector<Person> structure; for (Person &p : structure) { /* Code. */ }
The Iterator • Both the std::vector and std::list classes of C++ implement iterators. • begin() returns an iterator to the list’s first element • end() is a special iterator “just after” the final element of the list, useful for checking when we’re done with iteration • Use != to check for termination
Exercise 2 • Include <iterator> header • Use iterator to walk through an array you define and print out its contents • Compile and run • See that it is good
Abstracting Beyond Lists • There are many, many other techniques for storing data than the model of a list. • Such other data structures have different techniques for accessing stored data. • You have seen one in your lab exercises
Other Data Structures • Let’s move on from this idea of a “list” structure. • In particular, note how lists map their stored objects to indices (or can map an index to the stored object) • What if we don’t actually need to have a lookup position for our stored objects? • In particular, does it really need to be an integer?
Other Data Structures • There are many, many other techniques for storing data than the model of a list. • Such other data structures have different techniques for accessing and handling stored data. • These “different techniques” are often designed with a focus on different usage patterns.
Other Data Structures • A first example: arrays index their contained objects by integers. • Should integers be the only thing by which we can index an item within a collection-oriented data structure? • Think up some examples with neighbors (e.g., cake, red, A113, blue, …, apple, bear, 42)
Maps • The interface built on this idea within Java is the Map. • TreeMap and HashMap are the two prominent implementations. • The value is the object being stored within the map. • The key is the data element used as an index into the map for that value (i.e., how you “look up” the value) • A key here is like a key in a database, and is sometimes called a “tag” in associative memory
Maps • The classes built on this idea within C++ are map and unordered_map. • Sidenote – these are also not polymorphically related. • map stores items in order of keys • unordered_map does not require keys to have an order relation at all!
Maps • How would such a map work? • We could just use matching arrays for the keys and values. • However, this wouldn’t be the most efficient idea – better techniques are known.
Hash Maps • Hash maps work by converting the key to a unique integer, where possible, through a hashing function. • C++: hash maps are represented by unordered_map. • The selection of such a function is not a simple operation. • As such, unordered_map lets you supply your own hashing function (as a template parameter and, optionally, a constructor argument; std::hash is the default), mapping each key to a nearly-unique integer.
Hash Maps • This “hash code” is then mapped into an array for storage. • Problem: the “hash code” can easily be larger than the storage array’s size. • Solution: modular arithmetic. Divide by the array’s size and use the remainder.
Hash Maps • New input: (“Football”, “Will”) • hash(“Football”) = -2070369658 • -2070369658 mod 7 = 0