420 likes | 517 Views
Data Structures: Trees and Grammars. Readings: Sections 6.1, 7.1-7.4 (more from Ch. 6 later). Goals for this Unit. Continue focus on data structures and algorithms Understand concepts of reference-based data structures (e.g. linked lists, binary trees) Some implementation for binary trees
E N D
Data Structures:Trees and Grammars Readings:Sections 6.1, 7.1-7.4 (more from Ch. 6 later)
Goals for this Unit • Continue focus on data structures and algorithms • Understand concepts of reference-based data structures (e.g. linked lists, binary trees) • Some implementation for binary trees • Understand usefulness of trees and hierarchies as useful data models • Recursion used to define data organization
Taxonomy of Data Structures • From the text: • Data type: collection of values and operations • Compare to Abstract Data Type! • Simple data types vs. composite data types • Book: Data structures are composite data types • Definition: a collection of elements that are some combination of primitive and other composite data types
Book’s Classification of Data Structures • Four groupings: • Linear Data Structures • Hierarchical • Graph • Sets and Tables • When defining these, note an element has: • one or more information fields • relationships with other elements
Note on Our Book and Our Course • Our book’s strategy • In Ch. 6, discuss principles of Lists • Give an interface, then implement from scratch • In Ch. 7, discuss principles of Trees • Later, in Ch. 9, see what Java gives us • Our course’s strategy • We did Ch. 9 first. Saw List interfaces and operations • Then, Ch. 8 on maps and sets • Now, trees with some implementation too
Trees Represent… • Concept of a tree verycommon and important. • Tree terminology: • Nodes have one parent • A node’s children • Leaf nodes: no children • Root node: top or start; no parent • Data structures that store trees • Execution or processing that can be expressed as a tree • E.g. method calls as a program runs • Searching a maze or puzzle
Trees are Important • Trees are important for cognition and computation • computer science • language processing (human or computer) • parse trees • knowledge representation (or modeling of the “real world”) • E.g. family trees; the Linnaean taxonomy (kingdom, phylum, …, species); etc.
C:\ MyMail CS120 CS216 school pers lab1 lab2 lab3 list.h list.cpp calc.cpp Another Tree Example: File System • What about file links (Unix) or shortcuts (Windows)?
Another Tree Example: XML and HTML documents How is this a tree? What are the leaves? <HTML> <HEAD>…</HEAD> <BODY> <H1>My Page</H1> <P> Blah <PRE>blah blah</PRE> End </P> </BODY> </HTML>
Tree Data Structures • Why this now? • Very useful in coding • TreeMap in Java Collections Framework • Example of recursive data structures • Methods are recursive algorithms
Tree Definitions and Terms • First, general trees vs. binary trees • Each node in a binary tree has at most two children • General tree definition: • Set of nodes T (possibly empty?) with a distinguished node, the root • All other nodes form a set of disjoint subtrees Ti • each a tree in its own right • each connected to the root with an edge • Note the recursive definition • Each node is the root of a subtree
r T2 T3 T1 Picture of Tree Definition • And all subtrees are recursively defined as: • a node with… • subtrees attached to it
Tree Terminology • A node’s parent • A node’s children • Binary tree: left child and right child • Sibling nodes • Descendants, ancestors • A node’s degree (how many children) • Leaf nodes or terminal nodes • Internal or non-terminal nodes
Recursive Data Structure Recursive Data Structure: a data structure that contains a pointer or reference to an instance of itself: public class TreeNode<T> { T nodeItem; TreeNode<T> left, right; TreeNode<T> parent; … } • Recursion is a natural way to express many algorithms. • For recursive data-structures, recursive algorithms are a natural choice
General Trees • Representing general trees is a bit harder • Each node has a list of child nodes • Turns out that: • Binary trees are simpler and still quite useful • From now on, let’s focus on binary-trees only
ADT Tree • Remember definition on an ADT? • Model of information: we just covered that • Operations? See pages 405-406 in textbook • Many are similar to ADT List or any data structure • The “CRUD” operations: create, replace, update, delete • Important about this list of operations • some are in terms of one specified node, e.g. hasParent() • others are “tree-wide”, e.g. size(), traversal
Classes for Binary Trees (pp. 416-431) • class LinkedBinaryTree (p. 425, bottom) • reference to root BinaryTreeNode • methods: tree-level operations • class BinaryTreeNode (p. 416) • data: an object (of some type) • left: references root of left-subtree (or null) • right: references root of right-subtree (or null) • parent: references this node’s parent node • Could this be null? When should it be? • methods: node-level operations
Two-class Strategy for Recursive Data Structures • Common design: use two classes for a Tree or List • “Top” class • has reference to “first” node • other things that apply to the whole data-structure object (e.g. the tree-object) • both methods and fields • Node class • Recursive definitions are here as references to other node objects • Also data (of course) • Methods defined in this class are recursive
Binary Tree and Node Class • LinkedBinaryTree class has: • reference to root node • reference to a current node, a cursor • non-recursive methods like:boolean find(tgt) // see if tgt is in the whole tree • Node class has: • data, references to left and right subtrees • recursive versions of methods like find:boolean find(tgt) // is tgt here or in my subtrees? • Note: BinaryTree.find() just calls Node.find() on the root node! • Other methods work this way too
Why Does This Matter Now? • This illustrates (again) important design ideas • The tree itself is what we’re interested in • There are tree-level operations on it (“ADT level” operations) • The implementation is a recursive data structure • There are recursive methods inside the lower-level classes that are closely related (same name!) to the ADT-level operation • Principles? abstraction (hiding details), delegation (helper classes, methods)
ADT Tree Operations: “Navigation” • Positioning: • toRoot(), toParent(), toLeftChild(), toRightChild(), find(Object o) • Checking: • hasParent(), hasLeftChild(), etc. • equals(Object tree2) • Book calls this a “deep compare” • Do two distinct objects have the same structure and contents?
ADT Tree Operations: Mutators • Mutators: • insertRight(Object o), insertLeft(Object o) • create a new node containing new data • make this new node be the child of the current node • Important: We use these to build trees! • prune() • delete the subtree rooted by the current node
Next: Implementation • Next (in the book) • How to implement Java classes for binary trees • Class for node, another class for BinTree • Interface for both, then two implementations (array and reference) • But for us: • We’ll skip some of this, particularly the array version • We’ll only look at reference-base implementation • After that: concept of a binary search tree
Understanding Implementations • Let’s review some of the methods on pp. 416-431 • (Done in class, looking at code in book.) • Some topics discussed: • Node class. Parent reference or not? • Are two trees equal? • Traversal strategies: Section 7.3.2 in book • visit() method and callback (also 7.3.2)
Binary Search Trees • We often need collections that store items • Maybe a long series of inserts or deletions • We want fast lookup, and often we want to access in sorted order • Lists: O(n) lookup • Could sort them for O(lg n) lookup • Cost to sort is O(n lg n) and we might need to re-sort often as we insert, remove items • Solution: search tree
Binary Search Trees • Associated with each node is a key value that can be compared. • Binary search tree property: • every node in the left subtree has key whose value is less than the value of the root’s key value, and • every node in the right subtree has key whose value is greater than the value of the root’s key value.
Example 5 4 8 1 7 11 3 BINARY SEARCH TREE
Counterexample 8 5 11 2 6 10 18 7 4 15 20 NOT A BINARY SEARCH TREE 21
Find and Insert in BST • Find: look for where it should be • If not there, that’s where you insert
Recursion and Tree Operations • Recursive code for tree operations is simple, natural, elegant • Example: pseudo-code for Node.find() boolean find(Comparable tgt) { Node next = null; if (this.data matches tgt) return true else if (tgt’s data < this.data) next = this.leftChild else // tgt’s data > this.data next = this.rightChild // next points to left or right subtree if (next == null ) return false // no subtree else return next.find(tgt) // search on }
Order in BSTs • How could we traverse a BST so that the nodes are visited in sorted order? • Does one of our traversal strategies work? • A very useful property about BSTs • Consider Java’s TreeSet and TreeMap • A search tree (not a BST, but be one of its better “cousins”) • In CS2150: AVL trees, Red-Black trees • Guarantee: search times are O(lg n)
Deleting from a BST • Removing a node requires • Moving its left and right subtrees • Not hard if one not there • But if both there? • Answer: not too tough, but wait for CS2150 to see! • In CS2110, we’ll not worry about this
Next: Grammars, Trees, Recursion • Languages are governed by a set of rules called a grammar • Is a statement legal ? • Generate or derive a new legal statement • Natural language grammars • Language processing by computers • But, grammars used a lot in computing • Grammar for a programming language • Grammar for valid inputs, messages, data, etc.
Backus-Naur Form • http://en.wikipedia.org/wiki/Backus-Naur_form • BNF is a widely-used notation for describing the grammar or formal syntax of programming languages or data • BNF specifics a grammar as a set of derivation rules of this form: <symbol> ::= <expression with symbols> • Look at website and example there (also on next slide) • How are trees involved here? Is it recursive?
BNF for Postal Address • <postal-address> ::= <name-part> <street-address> <zip-part> • <personal-part> ::= <first-name> | <initial> "." • <name-part> ::= <personal-part> <last-name> [<jr-part>] <EOL> | <personal-part> <name-part> • <street-address> ::= [<apt>] <house-num> <street-name> <EOL> • <zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL> Example: Ann Marie G. Jones 123 Main St. Hooville, VA 22901 Where’s the recursion?
Grammars in Language • Rule-based grammars describe • how legal statements can be produced • how to tell if a statement is legal • Study textbook, pp. 389-391, to see rule-based grammar for simple Java-like arithmetic expressions • four rules for expressions, terms, factors, and letter • Study how a (possibly) legal statement is parsed to generate a parse tree
Computing Parse-Tree Example • Expression: a * b + c
Grammar Terms and Concepts • First, this is what’s called a context-free grammar • For CS2110, let’s not worry about what this means! (But in CS2102, you learn this.) • A CFG has • a set of variables (AKA non-terminals) • a set of terminal symbols • a set of productions • a starting symbol
Previous Parse Tree • Terminal symbols: • <operator> could be: + * • <letter> could be: a b c • Production: <factor> <letter> | <number>
Natural Language Parse Tree • Statement: The man bit the dog
How Can We Use Grammars? • Parsing • Is a given statement a valid statement in the language? (Is the statement recognized by the grammar?) • Note this is what the Java compiler does as a first step toward creating an executable form of your program. (Find errors, or build executable.) • Production • Generate a legal statement for this grammar • Demo: generate random statements! • See link on website next to slides
Demo’s Poem-grammar data file { <start> The <object> <verb> tonight } { <object> waves big yellow flowers slugs } { <verb> sigh <adverb> portend like <object> die <adverb> } { <adverb> warily grumpily } • Note: no recursive productions in this example!