Prolog for Linguists Symbolic Systems 139P/239P

John Dowding
Week 4, October 29, 2001
jdowding@stanford.edu

Office Hours
• We have reserved 4 workstations in the Unix Cluster in Meyer Library, fables 1-4
• No office hours 4:30-5:30 on Thursday this week
• Friday 3:30-4:30, after the NLP Reading Group, this week
• If neither time works for you, contact me and we can make other arrangements

Course Schedule
• Oct. 8
• Oct. 15
• Oct. 22
• Oct. 29
• Nov. 5 (double up)
• Nov. 12
• Nov. 26 (double up)
• Dec. 3
No class on Nov. 19.

Homework
• By now, we have covered most of Chapters 3 and 4 of Clocksin and Mellish. Read them, and let me know if you have any questions.
• Homework to be handed in by noon on the 29th.

subterm/2
    % subterm(+SubTerm, +Term)
    subterm(Term1, Term2):-
        Term1 == Term2, !.
    subterm(SubTerm, Term):-
        compound(Term),
        functor(Term, _Functor, Arity),
        subterm_helper(Arity, SubTerm, Term).

subterm_helper/3
    % subterm_helper(+Index, +SubTerm, +Term)
    subterm_helper(Index, SubTerm, Term):-
        Index > 0,
        arg(Index, Term, Arg),
        subterm(SubTerm, Arg), !.
    subterm_helper(Index, SubTerm, Term):-
        Index > 0,
        NextIndex is Index - 1,
        subterm_helper(NextIndex, SubTerm, Term).

occurs_in/2
    % occurs_in(-Var, +Term)
    occurs_in(Var, Term):-
        var(Var), Var == Term, !.
    occurs_in(Var, Term):-
        compound(Term),
        functor(Term, _Functor, Arity),
        occurs_in_helper(Arity, Var, Term).

occurs_in_helper/3
    % occurs_in_helper(+Index, -Var, +Term)
    occurs_in_helper(Index, Var, Term):-
        Index > 0,
        arg(Index, Term, Arg),
        occurs_in(Var, Arg).
    occurs_in_helper(Index, Var, Term):-
        Index > 0,
        NextIndex is Index - 1,
        occurs_in_helper(NextIndex, Var, Term).

(Or) occurs_in/2
    % occurs_in(-Var, +Term)
    occurs_in(Var, Term):-
        var(Var),
        subterm(Var, Term).

replace_all/4
    % replace_all(+Element, +Term, +NewElement, -ResultTerm)
    replace_all(Element, Term, NewElement, NewElement):-
        Element == Term, !.
    replace_all(_Element, Term, _NewElement, Term):-
        atomic(Term).
    replace_all(_Element, Term, _NewElement, Term):-
        var(Term).
    replace_all(Element, Term, NewElement, ResultTerm):-
        compound(Term),
        functor(Term, Functor, Arity),
        functor(ResultTerm, Functor, Arity),
        replace_all_helper(Arity, Element, Term, NewElement, ResultTerm).

replace_all_helper/5
    % replace_all_helper(+Index, +Element, +Term, +NewElement, ?ResultTerm)
    % Fills in arguments Index, Index-1, ..., 1 of ResultTerm.
    replace_all_helper(0, _Element, _Term, _NewElement, _ResultTerm):- !.
    replace_all_helper(Index, Element, Term, NewElement, ResultTerm):-
        arg(Index, Term, Arg),
        arg(Index, ResultTerm, ResultArg),
        replace_all(Element, Arg, NewElement, ResultArg),
        NextIndex is Index - 1,
        replace_all_helper(NextIndex, Element, Term, NewElement, ResultTerm).

flatten/2
    % flatten(+List, -ListOfAtoms)
    flatten([], []) :- !.
    flatten(Atomic, [Atomic]):-
        atomic(Atomic).
    flatten([Head|Tail], ListOfAtoms):-
        flatten(Head, ListOfAtoms1),
        flatten(Tail, ListOfAtoms2),
        append(ListOfAtoms1, ListOfAtoms2, ListOfAtoms).

flatten_dl/2
    % flatten_dl(+List, -ListOfAtoms)
    flatten_dl(List, ListOfAtoms):-
        flatten_dl_helper(List, (ListOfAtoms-[])).
    % flatten_dl_helper(+List, ?DifferenceList)
    flatten_dl_helper([], (Empty-Empty)):- !.
    flatten_dl_helper(Atomic, ([Atomic|Back]-Back)):-
        atomic(Atomic).
    flatten_dl_helper([Head|Tail], (Front-Back)):-
        flatten_dl_helper(Head, (Front-NextBack)),
        flatten_dl_helper(Tail, (NextBack-Back)).
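flatten_dl/2 works because a difference list Front-Back leaves its end open, so two of them can be joined by unification alone, with no call to append/3. A minimal sketch of that idea; append_dl/3 and demo_append_dl/1 are hypothetical names used only for illustration, not built-ins:

```prolog
% append_dl(?DiffList1, ?DiffList2, ?Result)
% The open end (Back1) of the first difference list is unified with the
% front of the second, so concatenation takes constant time.
append_dl(Front1-Back1, Back1-Back2, Front1-Back2).

% demo_append_dl(-List): joins [a,b|T1]-T1 and [c|T2]-T2; passing List-[]
% as the result closes the open end, giving an ordinary list.
demo_append_dl(List) :-
    append_dl([a, b|T1]-T1, [c|T2]-T2, List-[]).
```

Unlike append/3, which walks down its whole first argument, append_dl/3 does a constant amount of work; this is why flatten_dl/2 avoids the repeated copying that flatten/2 does at every level.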

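Could have written subterm/2 as:
    % subterm(+SubTerm, +Term)
    subterm(SubTerm, Term):-
        replace_all(SubTerm, Term, _AnyThing, NewTerm),
        \+ Term == NewTerm.
• But this would be slower: replace_all/4 always traverses and copies the whole Term, while subterm/2 stops as soon as it finds a match.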

Accumulators
• Build up partial results in an extra argument, and return them at the end
Without an accumulator:
    list_length([], 0).
    list_length([_Head|Tail], Result):-
        list_length(Tail, N),
        Result is N + 1.
With an accumulator:
    list_length(List, Result) :-
        list_length_helper(List, 0, Result).
    list_length_helper([], Result, Result).
    list_length_helper([_Head|Tail], Partial, Result):-
        NextPartial is Partial + 1,
        list_length_helper(Tail, NextPartial, Result).
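The same pattern gives a linear-time list reversal, whereas the naive version that calls append/3 once per element takes quadratic time. A sketch, using the hypothetical names list_reverse/2 and list_reverse_helper/3:

```prolog
% list_reverse(+List, -Reversed)
% Each element is pushed onto the accumulator, which starts out empty and
% always holds the reversal of the elements seen so far.
list_reverse(List, Reversed) :-
    list_reverse_helper(List, [], Reversed).

% list_reverse_helper(+List, +Partial, -Result)
list_reverse_helper([], Result, Result).
list_reverse_helper([Head|Tail], Partial, Result) :-
    list_reverse_helper(Tail, [Head|Partial], Result).
```

As with list_length_helper/3, the final answer is simply the accumulator once the input list is empty.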

flatten/2 with an accumulator
    % flatten_acc(+ListOfLists, -ListOfAtoms)
    flatten_acc(List, ListOfAtoms):-
        flatten_acc_helper(List, [], ListOfAtoms).
    % flatten_acc_helper(+ListOfLists, +PartialResult, -FinalResult)
    flatten_acc_helper([], PartialResult, PartialResult):- !.
    flatten_acc_helper(Atomic, Partial, [Atomic|Partial]):-
        atomic(Atomic), !.
    flatten_acc_helper([Head|Tail], PartialResult, FinalResult):-
        % Flatten Tail first, so that the atoms of Head end up in front of them.
        flatten_acc_helper(Tail, PartialResult, NextResult),
        flatten_acc_helper(Head, NextResult, FinalResult).

Difference Lists
• Use two logical variables that point to different portions of the same list.
• Compare stacks with queues:

Queues
• Queue represented as a pair of lists (Front-Back)
• Back is always a variable
    % empty_queue(?Queue): true if the queue is empty
    empty_queue(Queue-Queue).
    % add_to_queue(+Element, +Queue, -NewQueue)
    add_to_queue(Element, (Front-[Element|Back]), (Front-Back)).
    % remove_from_queue(+Queue, -Element, -NewQueue)
    remove_from_queue(([Element|Front]-Back), Element, (Front-Back)).
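To make the stack/queue comparison concrete: a stack can be an ordinary list that grows and shrinks at the front, while the queue above adds at the open back and removes at the front. The sketch assumes hypothetical push/3, pop/3 and stack_queue_demo/2 predicates alongside the three queue predicates from the slide:

```prolog
% Stack: an ordinary list; push and pop both work at the front (LIFO).
% push(+Element, +Stack, -NewStack)
push(Element, Stack, [Element|Stack]).
% pop(+Stack, -Element, -NewStack)
pop([Element|Stack], Element, Stack).

% Queue: the difference-list representation from the slide (FIFO).
empty_queue(Queue-Queue).
add_to_queue(Element, (Front-[Element|Back]), (Front-Back)).
remove_from_queue(([Element|Front]-Back), Element, (Front-Back)).

% stack_queue_demo(-StackItem, -QueueItem): put a, b, c into each
% structure in that order, then take one item out of each.
stack_queue_demo(StackItem, QueueItem) :-
    push(a, [], S1), push(b, S1, S2), push(c, S2, S3),
    pop(S3, StackItem, _),
    empty_queue(Q0),
    add_to_queue(a, Q0, Q1),
    add_to_queue(b, Q1, Q2),
    add_to_queue(c, Q2, Q3),
    remove_from_queue(Q3, QueueItem, _).
```

Calling stack_queue_demo(S, Q) gives S = c (last in, first out) and Q = a (first in, first out). Both structures take constant time per operation; the queue manages this only because its Back variable gives direct access to the end of the list.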

Generate-and-Test
• Popular (and sometimes efficient) way to write a program:
    Goal :-
        Generator,    % generates candidate solutions
        Tester.       % verifies correct answers
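As a small illustration (not from the original slides), the program below finds Pythagorean triples. int_between/3 is a hypothetical generator that enumerates integers on backtracking (many systems provide a similar between/3, sometimes in a library), and the final arithmetic comparison is the tester:

```prolog
% int_between(+Low, +High, -X): on backtracking, X takes each integer
% from Low up to High.
int_between(Low, High, Low) :-
    Low =< High.
int_between(Low, High, X) :-
    Low < High,
    Next is Low + 1,
    int_between(Next, High, X).

% pythagorean(+Max, -A, -B, -C): A*A + B*B = C*C with A =< B =< C =< Max.
pythagorean(Max, A, B, C) :-
    int_between(1, Max, A),      % generator
    int_between(A, Max, B),      % generator (B >= A avoids duplicates)
    int_between(B, Max, C),      % generator
    A*A + B*B =:= C*C.           % tester
```

For example, findall(A-B-C, pythagorean(15, A, B, C), L) gives L = [3-4-5, 5-12-13, 6-8-10, 9-12-15]. The tester can only reject candidates, so the cost grows with the number of candidates generated, here on the order of Max cubed.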

One More Generate-and-Test Example
• N-Queens Problem: place N queens on an N by N chessboard so that no two queens attack each other
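The slide's own code is not included in this text. The following is a generate-and-test sketch, assuming one queen per row and representing a board as a list of column numbers; all predicate names are illustrative. The generator produces permutations of 1..N, which already rules out two queens in the same row or column, so the tester checks only the diagonals:

```prolog
% queens(+N, -Queens): the I-th element of Queens is the column of the
% queen in row I.
queens(N, Queens) :-
    numbers_between(1, N, Columns),
    permutation_of(Columns, Queens),   % generate
    safe(Queens).                      % test

% numbers_between(+Low, +High, -List): List = [Low, Low+1, ..., High]
numbers_between(Low, High, []) :-
    Low > High.
numbers_between(Low, High, [Low|Rest]) :-
    Low =< High,
    Next is Low + 1,
    numbers_between(Next, High, Rest).

% permutation_of(+List, -Perm): enumerates permutations on backtracking
permutation_of([], []).
permutation_of(List, [X|Perm]) :-
    take_out(X, List, Rest),
    permutation_of(Rest, Perm).

% take_out(?X, +List, -Rest): Rest is List with one occurrence of X removed
take_out(X, [X|Rest], Rest).
take_out(X, [Y|Tail], [Y|Rest]) :-
    take_out(X, Tail, Rest).

% safe(+Queens): no queen attacks a queen in a later row diagonally
safe([]).
safe([Queen|Others]) :-
    no_diagonal_attack(Queen, Others, 1),
    safe(Others).

% no_diagonal_attack(+Queen, +Others, +Distance): Distance is the number
% of rows between Queen and the first element of Others.
no_diagonal_attack(_Queen, [], _Distance).
no_diagonal_attack(Queen, [Other|Others], Distance) :-
    Queen - Other =\= Distance,
    Other - Queen =\= Distance,
    NextDistance is Distance + 1,
    no_diagonal_attack(Queen, Others, NextDistance).
```

queens(4, Q) first gives Q = [2, 4, 1, 3]. The approach is simple but may try all N! permutations, so it is only practical for small N.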

Unification
• Two terms unify iff there is a set of substitutions of variables with terms that makes the terms identical
• True unification disallows cyclic terms:
  • X = f(X) ought to fail, because there is no finite term that can be substituted for X to make the two terms identical.
  • This check is called the occurs check.
• Prolog unification does not enforce the occurs check, and may create cyclic terms
• The occurs check is expensive:
  • Without it, unification is O(n), where n is the size of the smaller of the two terms
  • With it, unification is O(n+m), where n and m are the sizes of the two terms
  • In Prolog, it is quite typical to unify a variable with a larger term, so the difference matters
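ISO Prolog also provides a built-in for sound unification, unify_with_occurs_check/2, while =/2 leaves the check out for speed. A small sketch; occurs_check_demo/0 is a hypothetical name:

```prolog
% occurs_check_demo: shows unify_with_occurs_check/2 (ISO built-in).
occurs_check_demo :-
    % X and f(X) do not unify once the occurs check is applied.
    \+ unify_with_occurs_check(X, f(X)),
    % Ordinary bindings still work: Y is bound to f(Z).
    unify_with_occurs_check(Y, f(Z)),
    Y == f(Z).
```

What X = f(X) does without the check depends on the system (some create a cyclic term), so such a binding should not be printed or traversed carelessly.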

unify_woc/2 (with occurs check)
    % unify_woc(?Term1, ?Term2)
    unify_woc(Term1, Term2):-
        Term1 == Term2, !.      % identical terms (including X with X) unify trivially
    unify_woc(Var1, Term2):-
        var(Var1), !,
        \+ occurs_in(Var1, Term2),
        Var1 = Term2.
    unify_woc(Term1, Var2):-
        var(Var2), !,
        \+ occurs_in(Var2, Term1),
        Var2 = Term1.
    unify_woc(Atomic1, Atomic2):-
        atomic(Atomic1), atomic(Atomic2), !,
        Atomic1 == Atomic2.
    unify_woc(Term1, Term2):-
        compound(Term1), compound(Term2),
        functor(Term1, Functor, Arity),
        functor(Term2, Functor, Arity),
        unify_woc_helper(Arity, Term1, Term2).

unify_woc_helper/3
    % unify_woc_helper(+Index, +Term1, +Term2)
    unify_woc_helper(0, _Term1, _Term2):- !.
    unify_woc_helper(Index, Term1, Term2):-
        arg(Index, Term1, Arg1),
        arg(Index, Term2, Arg2),
        unify_woc(Arg1, Arg2),
        NextIndex is Index - 1,
        unify_woc_helper(NextIndex, Term1, Term2).

More about cut!
• It is common to distinguish between red cuts and green cuts
  • Red cuts change the solutions of a predicate
  • Green cuts do not change the solutions, but affect the efficiency
• Most of the cuts we have used so far are red cuts, for example the cut in delete_all/3:
    % delete_all(+Element, +List, -NewList)
    delete_all(_Element, [], []).
    delete_all(Element, [Element|List], NewList) :-
        !,
        delete_all(Element, List, NewList).
    delete_all(Element, [Head|List], [Head|NewList]) :-
        delete_all(Element, List, NewList).
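One way to see that the cut in delete_all/3 is red is to remove it and compare the solutions. In the sketch below, delete_all_no_cut/3 is a hypothetical copy of the slide's predicate with the cut deleted:

```prolog
% delete_all/3 as on the slide (with the red cut).
delete_all(_Element, [], []).
delete_all(Element, [Element|List], NewList) :-
    !,
    delete_all(Element, List, NewList).
delete_all(Element, [Head|List], [Head|NewList]) :-
    delete_all(Element, List, NewList).

% Same clauses without the cut: on backtracking, the third clause also
% applies when Head is the Element being deleted, so wrong answers appear.
delete_all_no_cut(_Element, [], []).
delete_all_no_cut(Element, [Element|List], NewList) :-
    delete_all_no_cut(Element, List, NewList).
delete_all_no_cut(Element, [Head|List], [Head|NewList]) :-
    delete_all_no_cut(Element, List, NewList).
```

With the cut, findall(L, delete_all(a, [a,b,a], L), S) gives S = [[b]]; without it, the same query gives [[b], [b,a], [a,b], [a,b,a]]. Because the set of solutions changes, the cut is red.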

Green cuts
• Green cuts can be used to avoid unproductive backtracking
    % identical(?Term1, ?Term2)
    identical(Var1, Var2):-
        var(Var1), var(Var2), !,
        Var1 == Var2.
    identical(Atomic1, Atomic2):-
        atomic(Atomic1), atomic(Atomic2), !,
        Atomic1 == Atomic2.
    identical(Term1, Term2):-
        compound(Term1), compound(Term2),
        functor(Term1, Functor, Arity),
        functor(Term2, Functor, Arity),
        identical_helper(Arity, Term1, Term2).
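identical_helper/3 is not shown on the slide. A self-contained sketch follows; the helper is an assumed definition modelled on unify_woc_helper/3, calling identical/2 on each pair of arguments instead of unify_woc/2:

```prolog
% identical(?Term1, ?Term2), as on the slide.
identical(Var1, Var2) :-
    var(Var1), var(Var2), !,
    Var1 == Var2.
identical(Atomic1, Atomic2) :-
    atomic(Atomic1), atomic(Atomic2), !,
    Atomic1 == Atomic2.
identical(Term1, Term2) :-
    compound(Term1), compound(Term2),
    functor(Term1, Functor, Arity),
    functor(Term2, Functor, Arity),
    identical_helper(Arity, Term1, Term2).

% identical_helper(+Index, +Term1, +Term2)
% Assumed definition: compares arguments Index, Index-1, ..., 1.
identical_helper(0, _Term1, _Term2) :- !.
identical_helper(Index, Term1, Term2) :-
    arg(Index, Term1, Arg1),
    arg(Index, Term2, Arg2),
    identical(Arg1, Arg2),
    NextIndex is Index - 1,
    identical_helper(NextIndex, Term1, Term2).
```

The three cuts in identical/2 are green: the type tests in the clause bodies are mutually exclusive, so removing the cuts would only leave choice points that are sure to fail.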

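Technique: moving unifications after the cut
    % parent(+Person, -NumParents)
    parent(adam, 0):- !.
    parent(eve, 0):- !.
    parent(_EverybodyElse, 2).
• The goal parent(eve, 2) succeeds, which is wrong: the head of the second clause does not match, so its cut is never reached, and the third clause applies.
• Fix: unify the output argument after the cut.
    % parent(+Person, ?NumParents)
    parent(adam, NumParents):- !, NumParents = 0.
    parent(eve, NumParents):- !, NumParents = 0.
    parent(_EverybodyElse, 2).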
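The same pitfall appears whenever an output argument is unified in the head of a clause that ends in a cut. A hypothetical max/3 example, showing the buggy and the fixed versions side by side:

```prolog
% max_bad(+X, +Y, ?Max): the output is unified in the head, before the
% cut, so a wrong Max makes the head fail and the second clause answers.
max_bad(X, Y, X) :- X >= Y, !.
max_bad(_X, Y, Y).

% max_good(+X, +Y, ?Max): the output unification is moved after the cut.
max_good(X, Y, Max) :- X >= Y, !, Max = X.
max_good(_X, Y, Y).
```

max_bad(5, 2, 2) wrongly succeeds, while max_good(5, 2, 2) correctly fails; with an unbound third argument both return Max = 5.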

Last Call Optimization
• Generalization of Tail-Recursion Optimization
• Turns recursion into iteration by reusing the stack frame
• When about to execute the last Goal in a clause:
  • If there are no more choice points for the predicate,
  • And no choice points left by earlier Goals in the clause,
  • Then the current stack frame can be reused for the last Goal
    delete_all(_Element, [], []).
    delete_all(Element, [Element|List], NewList) :-
        !,
        delete_all(Element, List, NewList).
    delete_all(Element, [Head|List], [Head|NewList]) :-
        delete_all(Element, List, NewList).
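A minimal loop that meets these conditions; count_down/1 is a hypothetical example. In a system with last call optimization it should run in constant stack space however large N is. By contrast, the non-accumulator list_length/2 from the Accumulators slide cannot reuse its frame, because Result is N + 1 still has to run after the recursive call returns.

```prolog
% count_down(+N): counts down from N to 0.
% When the second clause runs it is the last clause, so no alternatives
% remain for the predicate, and N > 0 and is/2 leave no choice points:
% the recursive call is a last call whose stack frame can be reused.
count_down(0) :- !.
count_down(N) :-
    N > 0,
    N1 is N - 1,
    count_down(N1).
```

The cut in the first clause is green: for N = 0 the second clause would fail its N > 0 test anyway.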

Advice on cuts
• Dangerous, easy to misuse
• Rules of thumb:
  • Use sparingly
  • Use with as narrow a scope as possible
  • Know which choice points you are removing
  • Green cuts may be unnecessary: sometimes the compiler can figure it out (for example, first-argument indexing can avoid creating the choice point in the first place)

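Input/Output of Terms
• Input and Output in Prolog take place on Streams
• By default, input comes from the keyboard, and output goes to the screen.
• Three special streams:
  • user_input
  • user_output
  • user_error
• read(-Term): reads the next term (which must end with a period) from the current input
• write(+Term): writes Term to the current output
• nl: writes a newline to the current output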

Example: Input/Output
• repeat/0 is a built-in predicate that always succeeds, and succeeds again every time it is backtracked into
    % classify_term: classify terms typed by the user until end of file
    classify_term :-
        repeat,
        write('What term should I classify? '), nl,
        read(Term),
        process_term(Term),
        Term == end_of_file, !.

I/O Example (cont.)
    process_term(Atomic):-
        atomic(Atomic), !,
        write(Atomic), write(' is atomic.'), nl.
    process_term(Variable):-
        var(Variable), !,
        write(Variable), write(' is a variable.'), nl.
    process_term(Term):-
        compound(Term),
        write(Term), write(' is a compound term.'), nl.

Streams
• You can create streams with open/3: open(+FileName, +Mode, -Stream)
• Mode is one of read, write, or append.
• When you have finished reading from or writing to a Stream, close it with close(+Stream)
• There are Stream versions of the other Input/Output predicates:
  • read(+Stream, -Term)
  • write(+Stream, +Term)
  • nl(+Stream)
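A sketch that writes a list of terms to a file and reads them back, using only open/3, close/1 and the stream versions above. The predicate names are hypothetical. One assumption worth flagging: it uses writeq/2 (ISO) rather than write/2, so that atoms such as 'Hello world' keep their quotes and can be read back by read/2.

```prolog
% write_terms(+FileName, +Terms): write each term followed by " ." and a
% newline, so that read/2 can read it back. The space before the period
% keeps it from merging with a symbol character at the end of a term.
write_terms(FileName, Terms) :-
    open(FileName, write, Stream),
    write_each(Stream, Terms),
    close(Stream).

write_each(_Stream, []).
write_each(Stream, [Term|Terms]) :-
    writeq(Stream, Term),
    write(Stream, ' .'),
    nl(Stream),
    write_each(Stream, Terms).

% read_terms(+FileName, -Terms): read terms until read/2 returns end_of_file.
read_terms(FileName, Terms) :-
    open(FileName, read, Stream),
    read(Stream, First),
    read_rest(Stream, First, Terms),
    close(Stream).

% Test with ==, not head unification, so that a variable read from the
% file is not mistaken for end_of_file.
read_rest(_Stream, Term, []) :-
    Term == end_of_file, !.
read_rest(Stream, Term, [Term|Terms]) :-
    read(Stream, Next),
    read_rest(Stream, Next, Terms).
```

For example, write_terms('stream_demo.txt', [foo(bar), 'Hello world', [1,2,3]]) followed by read_terms('stream_demo.txt', Ts) gives back the same three terms.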

Characters and character I/O
• Prolog represents characters in two ways:
  • Single character atoms: 'a', 'b', 'c'
  • Character codes: numbers that represent the character in some character encoding scheme (like ASCII)
• By default, the character encoding scheme is ASCII, but others are possible for handling international character sets.
• Input and Output predicates for characters follow a naming convention:
  • If the predicate deals with single character atoms, its name ends in _char.
  • If the predicate deals with character codes, its name ends in _code.
• Characters are character codes in traditional "Edinburgh" Prolog; single character atoms were introduced in the ISO Prolog Standard.

Special Syntax I
• Prolog has a special syntax for typing character codes:
• 0'a is an expression that denotes the character code representing the character a in the current character encoding scheme.

Special Syntax II
• A sequence of characters enclosed in double quote marks is shorthand for a list containing those character codes.
• "abc" = [97, 98, 99]
• It is possible to change this default behavior to one that uses single character atoms instead of character codes, but we won't do that here.
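A short sketch using both notations: 0'c in clause heads, and a double-quoted string as a code list. The directive sets the ISO double_quotes flag to codes explicitly, since some current systems default to a different representation; that setting, and the predicate names, are assumptions of this example.

```prolog
% Make "..." denote a list of character codes, as described above.
:- set_prolog_flag(double_quotes, codes).

% vowel_code(?Code): Code is the code of a lowercase vowel, written with
% the 0'c syntax rather than as a bare number.
vowel_code(0'a).
vowel_code(0'e).
vowel_code(0'i).
vowel_code(0'o).
vowel_code(0'u).

% count_vowels(+Codes, -Count)
count_vowels([], 0).
count_vowels([Code|Codes], Count) :-
    count_vowels(Codes, Count0),
    (   vowel_code(Code)
    ->  Count is Count0 + 1
    ;   Count = Count0
    ).

% "linguistics" is read as the code list [108, 105, 110, ...].
linguistics_vowels(Count) :-
    count_vowels("linguistics", Count).
```

linguistics_vowels(N) gives N = 4, and 0'a =:= 97 holds under ASCII.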

Built-in Predicates
• atom_codes(Atom, CharacterCodes)
  • Converts an Atom to its corresponding list of character codes,
  • Or converts a list of CharacterCodes to an Atom.
  • (atom_chars/2 does the same with single character atoms.)
• put_code(Code) and put_code(Stream, Code)
  • Write the character represented by Code
• get_code(Code) and get_code(Stream, Code)
  • Read a character, and return its corresponding Code
• Checking the status of a Stream:
  • at_end_of_file(Stream)
  • at_end_of_line(Stream)
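Combining get_code/2 with atom_codes/2 gives a way to read one line of text as an atom, which is the kind of input a tokenizer works on. A sketch with hypothetical names; it relies on get_code/2 returning -1 at the end of the stream (ISO behaviour) and on 10 being the newline code (ASCII):

```prolog
% read_line_codes(+Stream, -Codes): read character codes up to, but not
% including, the next newline or the end of the stream.
read_line_codes(Stream, Codes) :-
    get_code(Stream, Code),
    read_line_rest(Code, Stream, Codes).

read_line_rest(-1, _Stream, []) :- !.     % end of stream
read_line_rest(10, _Stream, []) :- !.     % newline
read_line_rest(Code, Stream, [Code|Codes]) :-
    get_code(Stream, Next),
    read_line_rest(Next, Stream, Codes).

% read_line_atom(+Stream, -Atom): the same line, converted to an atom.
read_line_atom(Stream, Atom) :-
    read_line_codes(Stream, Codes),
    atom_codes(Atom, Codes).
```

Reading a file that contains "hello world", a newline, and then "second" with two calls to read_line_atom/2 gives 'hello world' and second.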

Tokenizer
• A token is a sequence of characters that constitutes a single unit
• What counts as a token varies:
  • A token for a programming language may be different from a token for, say, English.
• We will start to write a tokenizer for English, and build on it in later classes

Tokenizer for English
• Most tokens are runs of consecutive alphabetic characters, separated by white space
• Except for some characters that always form a single token on their own: . ' ! ? -
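A first-pass sketch of these rules, working on a list of character codes (as produced by atom_codes/2, or by reading with get_code/2). The predicate names are illustrative. Two assumptions go beyond the slide: only ASCII letters count as alphabetic, and any character that is neither a letter nor white space (not just . ' ! ? -) becomes a one-character token.

```prolog
% tokenize(+Codes, -Tokens): split a list of character codes into atoms.
tokenize([], []).
tokenize([Code|Codes], Tokens) :-
    white_space(Code), !,                 % separators produce no token
    tokenize(Codes, Tokens).
tokenize([Code|Codes], [Token|Tokens]) :-
    letter(Code), !,                      % start of a word
    take_letters(Codes, Letters, Rest),
    atom_codes(Token, [Code|Letters]),
    tokenize(Rest, Tokens).
tokenize([Code|Codes], [Token|Tokens]) :-
    atom_codes(Token, [Code]),            % single-character token
    tokenize(Codes, Tokens).

% take_letters(+Codes, -Letters, -Rest): split off the leading run of letters.
take_letters([Code|Codes], [Code|Letters], Rest) :-
    letter(Code), !,
    take_letters(Codes, Letters, Rest).
take_letters(Rest, [], Rest).

% letter(+Code): ASCII letters only (no accented letters).
letter(Code) :- Code >= 0'a, Code =< 0'z.
letter(Code) :- Code >= 0'A, Code =< 0'Z.

% white_space(+Code): space, tab, line feed, carriage return (ASCII).
white_space(32).
white_space(9).
white_space(10).
white_space(13).
```

For example, atom_codes('Don''t stop!', Cs), tokenize(Cs, Ts) gives Ts = ['Don', '''', t, stop, '!']: the contraction is split into three tokens, which is what the homework asks you to improve.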

Homework
• Read the section on Input/Output in the SICStus Prolog manual
  • This material corresponds to Ch. 5 in Clocksin and Mellish, but the Prolog manual is more up to date and consistent with the ISO Prolog Standard
• Improve the tokenizer by adding support for contractions:
  • can't, won't, haven't, etc.
  • would've, should've
  • I'll, she'll, he'll
  • he's, she's (contracted is, contracted has, and the possessive)
• Don't hand this in, but hold on to it: you'll need it later.
