480 likes | 516 Views
Scheme in Python. Overview. We’ll look at how to implement a simple scheme interpreter in Python This is based on the Scheme in Scheme interpreter we studied before We’ll look at pyscheme 1.6, which was implemented by Danny Yoo as an undergraduate at Berkeley
E N D
Overview • We’ll look at how to implement a simple scheme interpreter in Python • This is based on the Scheme in Scheme interpreter we studied before • We’ll look at pyscheme 1.6, which was implemented by Danny Yoo as an undergraduate at Berkeley • Since Python doesn’t optimize for tail recursion, he uses trampolining, which we’ll introduce
What we need to do • A representation for Scheme’s native data structures • Pairs (aka, cons cells), symbols, strings, numbers,Booleans • A reader that converts a stream of characters into a stream of s-expressions • We’ll introduce an intervening step reading characters and converting to tokens • Implement various built-ins • e.g., cons, car, +, …
What we won’t need to do We can rely on Python for a number of very useful things • Representing numbers and strings • Garbage collection • Low level I/O
atoms • Atoms include strings, number, and symbols • We’ll use Python’s native representation for string and numbers • Symbols in Scheme are interned – there is a unique object for each symbol read • This is how they differ from strings,which are not interned • Note: some Lisp implemen-tations intern small integers
Symbols # A global dictionary that contains all known symbols __INTERNED_SYMBOLS = {} class __Symbol(str): """A symbol is just a special kind of string""" def __eq__(self, other): return self is other def symbol(s): """"Returns symbol given string, creating new ones if needed””” global __interned_symbols if s not in __INTERNED_SYMBOLS: __INTERNED_SYMBOLS[s] = __Symbol(s) return __INTERNED_SYMBOLS[s] # Here are definitions of symbols that we should know scheme_false = symbol("#f") scheme_true = symbol("#t") __empty_symbol = Symbol("") def isSymbol(s): return type(s) == type(__empty_symbol)
GCing Unused Symbols • If the only reference to a symbol is from theglobal list of interned symbols, it can be garbage collected • We’ll use Python’s weakref’s for this • A weak reference is a reference that doesn’t protect an object from garbage collection • Objects referenced only by weakreferences are considered unreachable(or "weakly reachable") and may becollected at any time
using weakrefs import weakref from UserString import UserString as __UserStr … __INTERNED_SYMBOLS = weakref.WeakValueDictionary({}) … class __Symbol(__UserStr): … if s not in __INTERNED_SYMBOLS: # make a temp strong reference newSymbol = __Symbol(s) __INTERNED_SYMBOLS[s] = newSymbol return __INTERNED_SYMBOLS[s]
Representing pairs • The core of scheme only has one kind of data structure – lists– and it is made up out of pairs • What Python types should we use? • A user defined class, Pair • Lists • Tuples • Dictionary • Closures
Aside: pairs as closures • Functions are very powerful • We can use them to represent cons cells or pairs • We don’t want to do this in practice • But it shows the power of programming with functions
(define (mycons theCar theCdr) ;; mycons returns a closure that takes a 2-arg function and applies ;; it to the two remembered vlue's, i.e., the pair's car and cdr. (lambda (f) (f theCar theCdr))) (define (mycar cell) ;; mycar takes a pair closure and feeds it a 2-arg function that ;; just returns the first arg (cell (lambda (theCar theCdr) theCar))) (define (mycdr cell) ;; mycdr takes a pair closure and feeds it a 2-arg function that ;; just returns the first arg (cell (lambda (theCar theCdr) theCdr))) (define myempty ;; the empty list is just a function that always returns true. (lambda (f) #t)) (define (mynull? cell) ;; a pair is not the empty list (eq? cell myempty))
example > (define p1 (mycons 1 (mycons 2 myempty))) > p1 #<procedure:.../scheme/pairs.ss:1:31> > (mycar p1) 1 > (mycdr p1) #<procedure:.../scheme/pairs.ss:1:31> > (mycar (mycdr p1)) 2 > (mycdr (mycdr p1)) #<procedure:myempty>
Representing pairs • We’ll define a subclass of list to represent a pair Class Pair(list) : pass • The cons functions creates a new cons cell with a given car and cdr def cons(car, cdr):return Pair([car, cdr]) • Defining built-in functions for pairs will be easy def car(p): return p[0] def cdr(p): return p[1] def cadr(p): return car(cdr(p)) def set_car(p,x): p[0] = x
Lexical Analyzer • Consume a string of characters, identify tokens, throw away comments and whitespace, and return a list of remaining tokens • Each token will be a (<type>, <token>) tuple like (‘number’, ‘3.145’) or (‘comment’, ‘;; foo’) • Recognize tokens using regular expressions • We won’t worry about efficiency
Token regular expressions PATTERNS = [ ('whitespace', re.compile(r'(\s+)')), ('comment', re.compile(r'(;[^\n]*)')), ('(', re.compile(r'(\()')), (')', re.compile(r'(\))')), ('dot', re.compile(r'(\.\s)')), ('number', re.compile(r'([+\-]?(?:\d+\.\d+|\d+\.|\.\d+|\d+))')), ('symbol', re.compile(r'([a-zA-Z\+\=\?\!\@\#\$\%\^\&\*\-\/\.\>\<][\w\+\=\?\!\@\#\$\%\^\&\*\-\/\.\>\<]*)')), ('string', re.compile(r'"(([^\"]|\\")*)"')), ('\'', re.compile(r'(\')')), ('`', re.compile(r'(`)')), (',', re.compile(r'(,)')) ]
Lex Examples >>> from lex import * >>> tokenize("") [(None, None)] >>> tokenize(" 1 2. .3 1.3 -4") [('number', '1'), ('number', '2.'), ('number', '.3'), ('number', '1.3'), ('number', '-4'), (None, None)] >>> tokenize('foo 12.3foo +') [('symbol', 'foo'), ('number', '12.3'), ('symbol', 'foo'), ('symbol', '+'), (None, None)] >>> tokenize('(foo (bar ()))') [('(', '('), ('symbol', 'foo'), ('(', '('), ('symbol', 'bar'), ('(', '('), (')', ')'), (')', ')'), (')', ')'), (None, None)]
Raw string notation >>> s = ‘\nfoo\n’ >>> s '\nfoo\n' >>> print s foo >>> s = r'\nfoo\n' >>> s '\\nfoo\\n' >>> print s \nfoo\n
tokenize() def tokenize(s): toks = [] found = True while s and found: found = False for type, regex in PATTERNS: match_obj = regex.match(s) if match_obj: if type not in ('whitespace', 'comment'): toks.append((type, match_obj.group(1))) s = s[match_obj.span()[1] :] found = True break if not found: print "\nNo match'", s, ”’ – tokenize” toks.append(EOF_TOKEN) return tokens
tokenize() examples >>> from lex import * >>> tokenize('(a 1.0)') [('(', '('), ('symbol', 'a'), ('number', '1.0'), (')', ')'), (None, None)] >>> tokenize('(define (add1 x)(+ x 1))') [('(', '('), ('symbol', 'define'), ('(', '('), ('symbol', 'add1'), ('symbol', 'x'), (')', ')'), ('(', '('), ('symbol', '+'), ('symbol', 'x'), ('number', '1'), (')', ')'), (')', ')'), (None, None)]
parse • Consume a sequence of tokens and produce a sequence of s-expressions • Use a recursive descent parser • We’ll handle just a few special cases, namely quote and backquote and dotted pairs
Peeking and eating def peek(tokens): """Take a quick glance at the first token in our tokens list.""” if len(tokens) == 0: raise ParserError, "While peeking: ran out of tokens.” return tokens[0]
Peeking and eating def eat(tokens, desired_type): """If the type of the next token is desired_type, pop it from the list and return it, else return False””” if len(tokens) == 0: raise ParserError, 'No tokens left, seeking ' + desired_type return tokens.pop(0) if tokens[0][0] == desired_type else False
Peeking and eating def eat_safe(tokens, tokenType): """Digest the first token in our tokens list, making sure that we're biting on the right tokenType of thing.""” if len(tokens) == 0: raise ParserError, "While trying to eat %s: ran out of tokens." % tokenType ) if tokens[0][0] != tokenType: raise ParserError, "Seeking %s got %s" % (tokenType, tokens[0]) return tokens.pop(0)
parse def parseExpression(tokens): if eat(tokens, '\''): return cons(symbol('quote'), cons(parseExpression(tokens), NIL)) if eat(tokens, '`'): return cons(symbol('quasiquote'), cons(parseExpression(tokens), NIL)) elif eat(tokens, ','): return cons(symbol('unquote'), cons(parseExpression(tokens), NIL)) elif eat(tokens, '('): return parse_list_members(tokens) elif peek(tokens)[0] in ('number’,'symbol’,'string'): return parse_atom(tokens) else: raise ParserError, ”Parsing: no alternatives"
parse_list_members() def parse_list_members(tokens): if eat(tokens, 'dot'): final = parseExpression(tokens) eat_safe(tokens, ')') return final if peek(tokens)[0] in ('\'’,'`’,',’,'(’, 'number’,'symbol’,'string'): return cons(parseExpression(tokens), parse_list_members(tokens)) if eat(tokens, ')'): return NIL raise ParserError, "Can't finish list” + tokens
Recursive descent parsing • Remember one problem with recursive descent parsing is that the grammar has to be right recursive • Another potential problem is recursing too deeply and exceeding the limit on the stack • But maybe we can use tail recursion, which an interpreter or compiler can recognize and execute as iteration? • Not in Python
Python doesn’t optimize tail recursion def fact0(n): # iterative facorial result = 1 while n>1: result *= n n -= 1 return result def fact1(n): # simple recursive factorial return 1 if n==1 else n*fact2(n - 1) def fact2(n, result=1): # tail recursive factorial return result if n==1 else fact2(n-1, n*result)
Try this http://www.csee.umbc.edu/331/fall08/0101/code/python/pyscheme-1.7/src/fact.py
Default limit is 999 fact2(1000) and fact3(1000) both die >>> fact2(1000) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "fact.py", line 17, in fact2 return result if n==1 else fact2(n-1, n*result) File "fact.py", line 17, in fact2 … File "fact.py", line 17, in fact2 return result if n==1 else fact2(n-1, n*result) RuntimeError: maximum recursion depth exceeded
How to solve this? • You can set the maximum recursion depth higher >>> import sys >>> sys.getrecursionlimit() 1000 >>> sys.setrecursionlimit(10000) >>> fact2(1100) 53437084880926377034242155 ... 00000000L • But this is not a general solution • And Guido is on the record as not wanting to optimize tail recursion • http://www.artima.com/forums/flat.jsp?forum=106&thread=147358
Trampoline Style • A trampoline is a loop that iteratively invokes thunk-returning functions A thunk is just a a piece of code to perform a delayed computation (e.g., a closure) • A single trampoline can express all control transfers of a program • Converting a program to trampolined style is trampolining This is kind of continuation passing style of programming • Trampolined functions can do tail recursive function calls in stack-oriented languages
Trampolining is one answer • A way to program using CPS, Continuation Passing Style • CPS is a style of programming where control is passed explicitly as continuations • Trampolining is a simple way toeliminate recursion • We’ll use a simple kind of trampolining • Instead of making a recursive call, aprocedure can bounce back up to itscaller with a continuation, which canbe called to proceed with the computation
Pogo from pogo import pogo, land, bounce def fact3(n): # factorial in a trampolined style return pogo(fact_tramp(n)) def fact_tramp(n, result=1): return land(result) if n==1 else bounce(fact_tramp, n-1, n*result)
Variable length argument lists >>> def foo(*args): print "Number of arguments:", len(args) print "Arguments are: ", args >>> foo(1,2,3,'d',5) Number of arguments: 5 Arguments are: (1, 2, 3, 'd', 5) >>> def bar(arg1, *rest): print …
pogo.py def bounce(function, *args): """Returns new trampolined value that continues bouncing""" return ('bounce', function, args) def land(value): """Returns new trampolined value that lands off trampoline""" return ('land', value)
It works >>> sys.setrecursionlimit(10) >>> fact3(100) 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000L >>> fact3(1000) 4023872600770937735...00000000000000L
pogo.py def pogo(bouncer): try: while True: if bouncer[0] == 'land’: return bouncer[1] elif bouncer[0] == 'bounce': bouncer = bouncer[1](*bouncer[2]) else: traceback.print_exc() raise TypeError, "not a bouncer” except TypeError: traceback.print_exc() raise TypeError, "not a bouncer”
See pyscheme1.6 • Pyscheme1.6 is written in trampoline style • Which was done by hand, as opposed to using an automatic trampoliner • And which I’ve been undoing by hand
def eval(exp, env): return pogo.pogo(teval(exp, env, pogo.land)) def teval(exp, env, cont): if expressions.isIf(exp): return evalIf(exp, env, cont) … def evalIf(exp, env, cont): def c(predicate_val): if isTrue(predicate_val): return teval(ifConsequent(exp), env, cont) else: return teval(ifAlternative(exp), env, cont) return teval(expressions.ifPredicate(exp), env, c)
eval def eval(exp, env): if exp.isSelfEvaluating(exp): return exp if exp.isVariable(exp): return env.lookupVariableValue(exp, env) if exp.isQuoted(exp): return evalQuoted(exp, env) if exp.isAssignment(exp): return evalAssignment(exp, env) if exp.isDefinition(exp): return evalDefinition(exp, env) if exp.isIf(exp): return evalIf(exp, env) if exp.isLambda(exp): return exp.makeProcedure(exp.lambdaParameters(exp), exp.lambdaBody(exp), env) if exp.isBegin(exp): return evalSequence(exp.beginActions(exp), env) if exp.isApplication(exp): return evalApplication(exp, env) raise SchemeError, "Unknown expr, eval " + str(exp)
apply def apply(procedure, arguments, env): if exp.isPrimitiveProcedure(procedure): return applyPrimProc(procedure, arguments, env) if exp.isCompoundProcedure(procedure): newEnv = env.extendEnvironment( exp.procedureParameters(procedure), arguments, exp.procedureEnvironment(procedure)) return evalSequence(exp.procedureBody(procedure), newEnv) raise SchemeError, "Unknown proc - apply " + str(procedure)
Environments • An environment will be a list of frames • Each frame will be a Python dictionary with the variable names as keys and their values as values
env THE_EMPTY_ENVIRONMENT = [] def enclosingEnvironment(env): return env[1:] def firstFrame(env): return env[0] def extendEnvironment(var_pairs, val_pairs, base): new_frame = {} vars = toPythonList(var_pairs) vals = toPythonList(val_pairs) if len(vars) != len vals: raise SchemeError, "Mismatched vals and vars" for (var, val) in zip(vars, vals): new_frame[var] = val return new_frame + base_env
Lookup a Variable Value def lookupVariableValue(var, env): while True: if env == THE_EMPTY_ENVIRONMENT: raise SchemeError,"Unbound var “+var frame = firstFrame(env) if frame.has_key(var): return frame[var] env = enclosingEnvironment(env)
Define/Set a Variable def defineVariable(var, val, env): firstFrame(env)[var] = val def setVariableValue(var, val, env): while True: if env == THE_EMPTY_ENVIRONMENT: raise SchemeError, "Unbound variable -- SET! " + var top = firstFrame(env) if top.has_key(var): top[var] = val return env = enclosingEnvironment(env)
Builtins • We’ll define a Python function to handle each of the primitive Scheme functions • Many List functions take any number of args: • (+ 1 2) => 3 • (+ 1 2 3 4 5) => 15 • (+ ) => 0 • We can takuse Python’s (new) syntax for functions that take any number or args, e.g.: • If the last parameter in a function’s parameter list is preceded by a *, it’s bound to a list of the remaining args • def add (*args): sum(args)
Builtins def allNumbers(numbers): for n in numbers: if type(n) not in (types.IntType, types.LongType, types.FloatType): return 0 return 1 def schemeAdd(*numbers): if not allNumbers(numbers): raise SchemeError, "prim + - non-numeric arg” return sum(numbers)
Setting up the initial environment def setupEnvironment(): PRIME_PROCEDURES = [ ["car", pair.car], ["cdr", pair.cdr], ["+", schemeAdd], ... ] init_env = env.extendEnvironment( pair.NIL, pair.NIL, env.THE_EMPTY_ENVIRONMENT) for name, proc in PRIME_PROCEDURES: p = cons(symbol("primitive"), cons(proc, NIL)) defineVariable(symbol(name), p, env) defineVariable(symbol("#t"),symbol("#t"), init_env) defineVariable(symbol("#f"), symbol("#f"), init_env) return initial_environment