Data Structures and Analysis (COMP 410)
David Stotts, Computer Science Department, UNC Chapel Hill
Real Problem: Type ahead
• Like on Google search, phone typing…
• You type a few chars and the program fills in a list of possible choices for you, based on the prefix you have typed
• Keep typing more chars, and the choices narrow and change
Design a data structure that will let you do this
Describe the time complexity of using it…
• searching it as typing is done,
• generating alternatives, etc.
Take some time. Discuss an approach with your neighbor. In 5-10 mins we will discuss ideas as a class.
Basic idea
Let’s not use a node to store a whole word. Use a child link to represent a char typed. The path from the root then spells the word.
[Figure: a trie hanging from <root> with first-level links t, n, a; word labels shown include tar, tea, to, a, an, as, new]
Basic idea…
This tree encodes (stores) these words: tar, tan, tea, to, ton, toe, a, an, ant, as, net, nest, new, no
[Figure: the trie for these words; the links out of <root> are t, a, n, and each word labels the node at the end of the path that spells it]
This has a name: Trie
Pronounced “try” or “tree”, both ways. Or “trie tree”: tree-tree, try-tree.
Comes from “reTRIEval”
Used for prefix-based retrieval of strings formed over an alphabet
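Representation
How many children at each node? As many as there are chars you can type. Let’s say 26 for this example.
class Node {
    String val = null;             // text spelled by the path to this node
    Node[] child = new Node[26];   // one link per letter, all null at first
    boolean isWord = false;        // true when val is a complete stored word
}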
Representation
[Figure: one empty node drawn as a box with val empty, isWord false, and a child array with slots 0 through 25]
Representation
[Figure: four nodes, each with a child array indexed 0 to 25: a node with empty val (isWord false), “a” (isWord true), “b” (isWord false), “be” (isWord true)]
Representation
[Figure: the same four nodes linked as a trie: <root> has links a and b; the a link leads to “a”, the b link leads to “b”, and the e link from “b” leads to “be”]
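The node layout and the a / b / be picture above can be sketched in Java roughly as follows. This is a minimal sketch, not the course's reference code: the class names `TrieNode` and `Trie`, the `insert` method, and the restriction to lowercase a..z are assumptions made for illustration.

```java
// Sketch of the slide's node layout. Assumes words use lowercase a..z only.
class TrieNode {
    String val = null;                    // text spelled by the path to this node
    TrieNode[] child = new TrieNode[26];  // one slot per letter, all null at first
    boolean isWord = false;               // true if val is a stored word
}

class Trie {
    final TrieNode root = new TrieNode();

    // Walk one child link per character, creating missing nodes on the way.
    void insert(String word) {
        TrieNode cur = root;
        for (int i = 0; i < word.length(); i++) {
            int slot = word.charAt(i) - 'a';   // 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
            if (cur.child[slot] == null) {
                cur.child[slot] = new TrieNode();
                cur.child[slot].val = word.substring(0, i + 1);
            }
            cur = cur.child[slot];
        }
        cur.isWord = true;                     // the last node marks a complete word
    }
}
```

Inserting "a" and then "be" into an empty `Trie` creates three nodes below the root: "a" (isWord true) in slot 0, "b" (isWord false) in slot 1, and "be" (isWord true) in slot 4 of the "b" node.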
Analysis
Big Oh time complexity is always expressed in terms of some problem size.
Here the problem size is not the number of words encoded in the tree, like we say for BST.
Rather we choose M, the length of a word being inserted or searched for.
Analysis
The worst case time needed to find a word of length M is… O(M)
This is true if the tree contains 10 words or 10 million words.
The length of the longest path in the tree is the length of the longest word stored in the tree.
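To make the O(M) claim concrete, here is a hedged sketch of lookup and type-ahead on a 26-way trie. The names `PrefixTrie`, `walk`, `contains`, and `completions` are hypothetical, and the code assumes lowercase a..z input. Finding a word follows one child link per character, so it takes at most M steps however many words are stored; generating alternatives then visits only the subtree below the typed prefix.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: assumes every key is made of lowercase a..z.
class PrefixTrie {
    static class Node {
        Node[] child = new Node[26];
        boolean isWord = false;
    }

    final Node root = new Node();

    void insert(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            int slot = word.charAt(i) - 'a';
            if (cur.child[slot] == null) cur.child[slot] = new Node();
            cur = cur.child[slot];
        }
        cur.isWord = true;
    }

    // Follow one link per character: at most M steps for a key of length M.
    Node walk(String key) {
        Node cur = root;
        for (int i = 0; i < key.length() && cur != null; i++) {
            cur = cur.child[key.charAt(i) - 'a'];
        }
        return cur;   // null if the path falls off the trie
    }

    boolean contains(String word) {
        Node n = walk(word);
        return n != null && n.isWord;
    }

    // Type-ahead: every stored word that starts with the prefix, in a..z order.
    List<String> completions(String prefix) {
        List<String> out = new ArrayList<>();
        Node start = walk(prefix);
        if (start != null) collect(start, new StringBuilder(prefix), out);
        return out;
    }

    private void collect(Node n, StringBuilder path, List<String> out) {
        if (n.isWord) out.add(path.toString());
        for (int i = 0; i < 26; i++) {
            if (n.child[i] != null) {
                path.append((char) ('a' + i));
                collect(n.child[i], path, out);
                path.setLength(path.length() - 1);   // undo the append before the next letter
            }
        }
    }
}
```

With the fourteen example words from the earlier slide inserted, `completions("to")` returns to, toe, ton, and `contains("te")` is false because "te" is only a prefix, not a stored word.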
Analysis
If a word of length M can be made from N different characters (like 26 in the alphabet), then each node has up to N children at each of M levels, so the number of possible nodes in the data structure is on the order of N^M.
A trie to store words 20 characters long in an alphabet of 52 chars (upper and lower) has on the order of 52^20 possible nodes.
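The node count can be checked by adding up the levels: the root is 1 node, the next level holds at most N nodes, the one after at most N^2, and level k at most N^k, so the deepest level dominates with N^M. The sketch below does that sum; `TrieBounds` and `maxNodes` are illustrative names, not something from the slides.

```java
import java.math.BigInteger;

class TrieBounds {
    // Upper bound on node count: 1 + N + N^2 + ... + N^M.
    static BigInteger maxNodes(int alphabetSize, int wordLength) {
        BigInteger n = BigInteger.valueOf(alphabetSize);
        BigInteger total = BigInteger.ZERO;
        BigInteger level = BigInteger.ONE;      // N^0: the root
        for (int k = 0; k <= wordLength; k++) {
            total = total.add(level);           // add the nodes possible at depth k
            level = level.multiply(n);          // next depth has N times as many
        }
        return total;
    }
}
```

For a tiny case, `TrieBounds.maxNodes(2, 3)` is 1 + 2 + 4 + 8 = 15. This bound counts every possible node, including ones that no stored word would reach.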
Analysis
Note that if we store 26 character words and limit ourselves to lower case we get 26^26 possible nodes…
This is worse than 26!
26 * 26 * 26 * … * 26 is worse than 26 * 25 * 24 * … * 2 * 1
Analysis
How bad is N!? Let’s compare. Let N = 20.
2^N is 2^20, which is about a million
N! is 20!, which is about 2.432902e+18
2,432,902,000,000,000,000
2,432,902,000,000 * a million
2.4 trillion millions
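These comparisons are easy to reproduce with exact integer arithmetic. The helper below is a small sketch; the names `GrowthCheck`, `factorial`, and `power` are made up for illustration.

```java
import java.math.BigInteger;

class GrowthCheck {
    // n! = 2 * 3 * ... * n, computed exactly.
    static BigInteger factorial(int n) {
        BigInteger f = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            f = f.multiply(BigInteger.valueOf(i));
        }
        return f;
    }

    // base^exp, computed exactly.
    static BigInteger power(int base, int exp) {
        return BigInteger.valueOf(base).pow(exp);
    }
}
```

`GrowthCheck.power(2, 20)` is 1,048,576 and `GrowthCheck.factorial(20)` is 2,432,902,008,176,640,000, which matches the rounded figure on the slide. Comparing `power(26, 26)` with `factorial(26)` confirms that 26^26 is the larger of the two.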
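So what?
A trie made to hold 20 character words… made from 20 lower case characters
Worst case find operation is O(20), that is, O(M) for word length M = 20
Worst case space, if every possible node exists, is 20^20, which is even more than 20!, so at least O(N!) for N = 20
So:
• it’s very fast to use
• a fully populated trie of this size is impossible (very impractical) to build in time and space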