A grammar for arithmetic expressions involving the four basic operators and parenthesized expressions. Parenthesized expressions have the highest precedence. * and / have the same level of precedence, just under parenthesized expressions; these operators are left-associative. + and - have the same level of precedence, just under the * and / operators; these operators are also left-associative. These requirements determine the grammar.
The productions for this grammar are:

    <E> → <E> + <T> | <E> - <T> | <T>
    <T> → <T> * <F> | <T> / <F> | <F>
    <F> → ( <E> ) | <float>
    <float> → any float number

This grammar is relatively easy to understand and to apply to expressions by hand. There is a slight problem: it sometimes takes some thought to select the correct production to apply. As we will see shortly, this grammar is difficult to implement, but there is a solution to this problem.
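Example: (a + b) * c

            <E>
             |
            <T>
          /  |  \
       <T>   *   <F>
        |         |
     ( <E> )      c
     /  |  \
  <E>   +   <T>
   |         |
   a         b

(Here a, b, and c stand for float numbers, and chains of single-symbol productions such as <T> → <F> are abbreviated.)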
Example: a + b * c

           <E>
         /  |  \
      <E>   +   <T>
       |      /  |  \
      <T>  <T>   *   <F>
       |    |         |
      <F>  <F>        c
       |    |
       a    b
From a programming point of view, this grammar for expressions has some serious problems. Consider the productions

    <E> → <E> + <T> | <E> - <T> | <T>

First of all, there is no way to choose which of these three productions to apply based on the first token in an expression. A person can look at the entire expression and usually make a good choice at each step. A program, however, doesn’t have that ability. A trial-and-error approach that backtracks when a dead end is reached can be programmed, but it is inefficient and not very satisfactory. And there is a more serious problem:
The implementation of the function to process the first non-terminal, the start symbol, would look something like the following:

    E ( )
    {
        E ( );
        . . . etc . . .
    }

So E( ) calls E( ), which immediately calls E( ), which immediately calls E( ), and so on. There is no base case for this recursive function, and the unbounded chain of function calls overruns the run-time stack, causing a stack overflow error.
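The runaway recursion can be reproduced in a few lines. This is a minimal sketch in Python rather than the slides' pseudocode; the names parse_E and demo are hypothetical, and Python reports the exhausted stack as a RecursionError instead of crashing.

```python
def parse_E(tokens, pos):
    # <E> -> <E> + <T>: the first step is to parse an <E> starting at the
    # same position, so no token is consumed before the recursive call.
    return parse_E(tokens, pos)


def demo():
    try:
        parse_E([1.0, "+", 2.0], 0)
    except RecursionError:
        # Python's guard against unbounded recursion (a stack overflow in C/C++).
        return "stack exhausted"
    return "finished"
```

Calling demo() never reaches the "finished" branch, because parse_E makes no progress through the token list before calling itself.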
Two grammars are equivalent if they generate exactly the same language. A grammar equivalent to

    <E> → <E> + <T> | <E> - <T> | <T>
    <T> → <T> * <F> | <T> / <F> | <F>
    <F> → ( <E> ) | <float>
    <float> → any float number

is

    <E>  → <T> <E1>
    <E1> → + <T> <E1> | - <T> <E1> | ε
    <T>  → <F> <T1>
    <T1> → * <F> <T1> | / <F> <T1> | ε
    <F>  → ( <E> ) | any float number

where ε denotes the empty string.
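The slides do not say how the second grammar was obtained. It matches the standard textbook rewrite for removing immediate left recursion, which can be stated in general form as follows (the symbols A, A', α and β are generic placeholders, not names from the slides):

```latex
\begin{align*}
\text{Left-recursive:}\quad
  A  &\to A\,\alpha_1 \mid A\,\alpha_2 \mid \beta \\
\text{Rewritten:}\quad
  A  &\to \beta\,A' \\
  A' &\to \alpha_1\,A' \mid \alpha_2\,A' \mid \varepsilon
\end{align*}
```

For <E>, take β = <T>, α1 = + <T> and α2 = - <T>, with <E1> playing the role of A'. The same substitution with <F>, * <F> and / <F> gives <T> and <T1>.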
It’s easy to implement this grammar to recognize valid arithmetic expressions. An expression is read in as a string of characters. A function must be written to scan the string and build the next token. A token is a floating point number, an operator (+, -, *, /), or a left or right parenthesis: (, ). These are the terminal symbols in this grammar.
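One possible shape for the scanning function is sketched below in Python. The name tokenize, the regular expression, and the use of None as an end-of-input marker are assumptions made for illustration; the slides do not specify any of them. Numbers are unsigned here because the grammar has no unary minus.

```python
import re

# Group 1: an unsigned number such as 3, 3.14 or .5
# Group 2: one of the operator or parenthesis terminals
TOKEN_RE = re.compile(r"\s*(?:(\d+(?:\.\d*)?|\.\d+)|([-+*/()]))")


def tokenize(text):
    """Return the tokens of text as floats and one-character strings."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        number, symbol = m.groups()
        tokens.append(float(number) if number is not None else symbol)
        pos = m.end()
    tokens.append(None)  # assumed end-of-input marker
    return tokens
```

For example, tokenize("(1.5 + 2) * 3") yields the list ['(', 1.5, '+', 2.0, ')', '*', 3.0, None]. Building the whole list up front is a simplification; a "get next token" function that scans on demand would work equally well.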
The functions corresponding to the non-terminals are:

    <E> → <T> <E1>

    E ( )
    {
        T ( )
        E1 ( )
    }

    <T> → <F> <T1>

    T ( )
    {
        F ( )
        T1 ( )
    }
    <E1> → + <T> <E1> | - <T> <E1> | ε

    E1 ( )
    {
        if token is + or -
        {
            get next token
            T ( )
            E1 ( )
        }
    }

    <T1> → * <F> <T1> | / <F> <T1> | ε

    T1 ( )
    {
        if token is * or /
        {
            get next token
            F ( )
            T1 ( )
        }
    }
    <F> → ( <E> ) | any float number

    F ( )
    {
        if token is a number
            get next token
        else if token is ‘(’
        {
            get next token
            E ( )
            if token is ‘)’
                get next token
            else
                error ( )
        }
        else
            error ( )
    }
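The five outlines above can be turned into a small recognizer. The following is a sketch in Python under several assumptions: the class name Recognizer, the helper is_valid, the choice to return bool instead of calling error ( ), and the pre-tokenized input list (floats and one-character strings) are all illustrative and not taken from the slides.

```python
class Recognizer:
    def __init__(self, tokens):
        self.tokens = list(tokens) + [None]  # None marks end of input
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def advance(self):  # "get next token"
        self.pos += 1

    def E(self):  # <E> -> <T> <E1>
        return self.T() and self.E1()

    def T(self):  # <T> -> <F> <T1>
        return self.F() and self.T1()

    def E1(self):  # <E1> -> + <T> <E1> | - <T> <E1> | empty
        if self.token in ("+", "-"):
            self.advance()
            return self.T() and self.E1()
        return True  # empty alternative: consume nothing

    def T1(self):  # <T1> -> * <F> <T1> | / <F> <T1> | empty
        if self.token in ("*", "/"):
            self.advance()
            return self.F() and self.T1()
        return True

    def F(self):  # <F> -> ( <E> ) | any float number
        if isinstance(self.token, float):
            self.advance()
            return True
        if self.token == "(":
            self.advance()
            if self.E() and self.token == ")":
                self.advance()
                return True
        return False


def is_valid(tokens):
    r = Recognizer(tokens)
    # The whole input must be consumed, not just a valid prefix.
    return r.E() and r.token is None
```

With this sketch, is_valid([1.0, "+", 2.0, "*", 3.0]) is True, while is_valid([1.0, "+"]) and is_valid(["(", 1.0]) are False. Each function decides what to do by looking only at the current token, which is exactly what the left-recursive grammar did not allow.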
If we simply want to recognize a valid arithmetic expression, each of these functions will have a return type of bool. Also, the parameters for these functions and the corresponding arguments in the function calls have not been listed. However, we don’t just want to recognize valid arithmetic expressions; we want to evaluate them. To do this, actions have to be inserted into the code that was just outlined.
Two stacks are needed while scanning the string (expression):

An opStk holds the operators as they are encountered. When it is time for an operator to be applied, it is popped from the stack. When the process is complete, this stack should be empty.

A valStk holds the operands as they are encountered. When it is time for an operator to be applied, two operands are popped from the valStk to perform the operation. The result is pushed back on the valStk. When the process is complete, the value of the expression should be the only entry on the valStk.
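The "apply an operator" step described above can be sketched as follows. This is an illustrative Python version that uses plain lists as the two stacks; the names process_operation and OPS are assumptions. One detail worth noting is the operand order: the value popped first was pushed last, so it is the right-hand operand.

```python
import operator

# Map each operator token to the function that performs it.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def process_operation(op_stk, val_stk):
    op = op_stk.pop()
    right = val_stk.pop()  # pushed most recently, so the right operand
    left = val_stk.pop()
    val_stk.append(OPS[op](left, right))
```

For example, with op_stk = ["-"] and val_stk = [8.0, 3.0], one call leaves op_stk empty and val_stk equal to [5.0]. Swapping left and right would go unnoticed for + and * but would give wrong answers for - and /.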
A calling program will do the following:

    input an expression as a string
    get the first token
    call E ( )
    display the value of the expression or a suitable error message

The previous outlines for the functions E ( ) and T ( ) remain unchanged, since they only manage the functions that actually do the work. The remaining functions become:
    E1 ( )
    {
        if token is + or -
        {
            opStk.push (token)
            get next token
            T ( )
            process the operation
            E1 ( )
        }
    }

    T1 ( )
    {
        if token is * or /
        {
            opStk.push (token)
            get next token
            F ( )
            process the operation
            T1 ( )
        }
    }
    F ( )
    {
        if token is a number
        {
            valStk.push (token)
            get next token
        }
        else if token is ‘(’
        {
            get next token
            E ( )
            if token is ‘)’
                get next token
            else
                error ( )
        }
        else
            error ( )
    }
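Putting the pieces together, the outlines above can be assembled into a small evaluator. What follows is a sketch in Python, not the slides' own implementation: the names Evaluator, ParseError and evaluate, the regular-expression tokenizer, and the use of exceptions for error ( ) are all assumptions.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}
TOKEN_RE = re.compile(r"\s*(?:(\d+(?:\.\d*)?|\.\d+)|([-+*/()]))")


class ParseError(Exception):
    pass


class Evaluator:
    def __init__(self, text):
        self.tokens = self._tokenize(text)
        self.pos = 0
        self.op_stk = []   # opStk in the slides
        self.val_stk = []  # valStk in the slides

    @staticmethod
    def _tokenize(text):
        tokens, pos, text = [], 0, text.rstrip()
        while pos < len(text):
            m = TOKEN_RE.match(text, pos)
            if m is None:
                raise ParseError(f"unexpected character at position {pos}")
            number, symbol = m.groups()
            tokens.append(float(number) if number is not None else symbol)
            pos = m.end()
        return tokens + [None]  # None marks end of input

    @property
    def token(self):
        return self.tokens[self.pos]

    def advance(self):  # "get next token"
        self.pos += 1

    def process_operation(self):
        op = self.op_stk.pop()
        right = self.val_stk.pop()
        left = self.val_stk.pop()
        self.val_stk.append(OPS[op](left, right))

    def E(self):
        self.T()
        self.E1()

    def T(self):
        self.F()
        self.T1()

    def E1(self):
        if self.token in ("+", "-"):
            self.op_stk.append(self.token)
            self.advance()
            self.T()
            self.process_operation()  # apply before looking for more + or -
            self.E1()

    def T1(self):
        if self.token in ("*", "/"):
            self.op_stk.append(self.token)
            self.advance()
            self.F()
            self.process_operation()
            self.T1()

    def F(self):
        if isinstance(self.token, float):
            self.val_stk.append(self.token)
            self.advance()
        elif self.token == "(":
            self.advance()
            self.E()
            if self.token != ")":
                raise ParseError("expected ')'")
            self.advance()
        else:
            raise ParseError(f"unexpected token {self.token!r}")


def evaluate(text):
    ev = Evaluator(text)
    ev.E()
    if ev.token is not None:
        raise ParseError(f"unexpected token {ev.token!r}")
    return ev.val_stk.pop()
```

In this sketch, evaluate("1 + 2 * 3") returns 7.0 and evaluate("(1 + 2) * 3") returns 9.0. Because each operation is processed before the recursive call to E1 or T1, the operators stay left-associative even though the rewritten grammar is right-recursive: evaluate("8 - 3 - 2") returns 3.0, not 7.0.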