Beyond Lists: Other Data Structures

Beyond Lists: Other Data Structures • Lisp would still be a pretty decent programming language if it only contained atoms and lists • But we can go far beyond the list with some of the other CL data structures: • Sequences, which include lists and • Vectors (1-D arrays) • Strings (really vectors of characters) • Bit-vectors • Arrays (only the 1-D ones are sequences) • Characters • Association Lists • Property Lists • Hash Tables • Structures (structs) • Objects/classes (we cover this separately also, later in the semester)

Characters • We start with characters because we need them to understand strings • Do not confuse characters with symbols whose names are one character long • Characters are denoted as #\character where character is the single character (such as #\a or #\A) • A character that has no single printed form is denoted by name • #\space #\newline #\tab • other names, such as #\control-a or #\bel (for the bell), depend on the implementation • Note that case is immaterial when specifying a character by name but matters when specifying a single letter • So #\SPACE = #\space = #\Space, but #\a != #\A • Characters are ordered alphabetically and numerically, but not necessarily by ASCII values • A < Z; a < z; 0 < 9 always hold; in ordinary ASCII we also have 0 < A and A < a, but CL does not promise this, it is implementation dependent • However, on PCs the characters do follow the ASCII table: 0 < 9 < A < Z < a < z

Character Functions • char-code char – returns char's code • the ASCII value on PCs • Predicate functions: • alpha-char-p, upper-case-p, lower-case-p • both-case-p • t if the char is in one case and there is a corresponding character in the other case, so this is true for letters that have both cases • digit-char-p (returns the digit's numeric value rather than t), alphanumericp • Comparison functions: • char<, char>, char<=, char>=, char/=, char= • Note that these can accept multiple arguments and compare them in order, so (char< #\a #\b #\c) is t whereas (char< #\3 #\5 #\4) is nil • char-equal, char-not-equal, char-lessp, char-greaterp, char-not-greaterp, char-not-lessp – case-insensitive versions of the previous group of functions • (char= #\a #\A) is nil whereas (char-equal #\a #\A) is t

Character Conversion Functions • char-upcase, char-downcase – return the character in the other case if the character is a letter, otherwise the character is unchanged • character – coerces the given argument (a character, a one-character string, or a symbol with a one-character name) into a character, otherwise signals an error • digit-char – converts the given digit value into the corresponding digit character • (character 'a) → #\A (the reader upcases the symbol's name) • (character 'ab) → error • (character 5) → error • (digit-char 5) → #\5 • (digit-char 12) → nil • (digit-char 'a) → error • char-name – returns the name of a character as a string, or nil if the character has no name • (char-name #\space) → "Space" (names of other characters, such as the bell, vary by implementation) • name-char – returns the character with a given name • (name-char "newline") → #\Newline
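The character predicates, comparisons, and conversions above can be tried together in a short sketch. The helper name describe-char is hypothetical, introduced only for illustration; everything it calls is a standard CL function.

```lisp
;; describe-char is a hypothetical helper, not part of CL.
(defun describe-char (ch)
  "Return a list: CH upcased, T if CH is a letter, and CH's digit value or NIL."
  (list (char-upcase ch)
        (and (alpha-char-p ch) t)   ; normalize the generalized boolean to T/NIL
        (digit-char-p ch)))         ; digit-char-p returns the numeric weight

(describe-char #\a)    ; => (#\A T NIL)
(describe-char #\7)    ; => (#\7 NIL 7)

(char= #\a #\A)        ; => NIL  (case-sensitive)
(char-equal #\a #\A)   ; true    (case-insensitive)
(digit-char 5)         ; => #\5
(digit-char 12)        ; => NIL  (12 is not a digit in radix 10)
```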

Hash Tables • Hash tables, sometimes also called dictionaries, are data structures that store values along with identification keys for easy access • supply the key, the structure returns the data • often, these use a hash function (covered in CSC 364) • In CL, the hash table data structure is available for this task • create one with make-hash-table, as in (setf a (make-hash-table)) • there are four variants of hash tables based on whether keys are matched using eq, eql (the default), equal or equalp, selected with the :test keyword • The two basic operations for a hash table are to insert a new item and to access an item given its key – both are accomplished using gethash • (gethash key hashtable) → returns the item referenced by key, or nil • (setf (gethash 'george a) 'smith) → places the datum 'smith in the table indexed by the key 'george • (gethash 'george a) → smith • You can delete an entry from a hash table using remhash • (remhash key hashtable)
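A minimal sketch of the insert, access, and delete operations described above. The variable names *surnames* and *ages* are hypothetical; the second table uses :test #'equal because string keys are not reliably matched by eql.

```lisp
;; Symbol keys: the default EQL test is sufficient.
(defvar *surnames* (make-hash-table))
(setf (gethash 'george *surnames*) 'smith)   ; insert
(gethash 'george *surnames*)                 ; => SMITH, T
(remhash 'george *surnames*)                 ; delete; returns true if an entry existed
(gethash 'george *surnames*)                 ; => NIL, NIL

;; String keys: use an EQUAL table so that equal-looking strings match.
(defvar *ages* (make-hash-table :test #'equal))
(setf (gethash "george" *ages*) 53)
(gethash (copy-seq "george") *ages*)         ; => 53, T
```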

Hash Table Sizes • Hash tables are often implemented as very large arrays • for instance, a hash table that stores 10 items might be stored in an array of 100 elements • hash tables can therefore be very wasteful of memory space • in CL, to get around this problem, hash tables can grow automatically, so you can specify an original size and a growth size when you create the hash table • (make-hash-table :size x :rehash-size y :rehash-threshold z) – the hash table starts with room for about x entries; y says how much to grow when needed (an integer means add that many entries, a float greater than 1 means multiply the size by that factor); z, a real number between 0 and 1, says how full the table may get before it grows • (make-hash-table :size 100 :rehash-size 90 :rehash-threshold 0.8) – when the table is about 80% full, it grows by about 90 entries • these values are hints to the implementation, and they default to implementation-specific values if you do not specify them yourself

Successful and Failed Accesses • The gethash function actually returns two values • the value matching the key • and whether the access was successful or not (t or nil) • if the key is not in the hash table, both values are nil; however, you can alter the first value by providing a default return value • (gethash key hashtable defaultvalue) as in • (gethash 'x a 'error) → returns the symbol error if 'x is not found in a • If you use setf to place a new entry into the hash table, and an entry already exists for that key, the entry is replaced by the new one and no error is indicated • therefore, you must be careful when inserting elements into a hash table • you could test to see if an entry exists first, using the second value so that a stored nil still counts as an entry • (if (nth-value 1 (gethash key hashtable)) 'error (setf (gethash key hashtable) value))
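The check-before-insert idea can be sketched with multiple-value-bind, which captures both values that gethash returns. The function name add-if-absent and the variable *inventory* are hypothetical, chosen for this example.

```lisp
;; add-if-absent is a hypothetical helper, not a CL built-in.
(defun add-if-absent (key value table)
  "Store VALUE under KEY only if KEY has no entry yet.
Return T if the value was stored, NIL if an entry already existed."
  (multiple-value-bind (old found-p) (gethash key table)
    (declare (ignore old))
    (if found-p
        nil
        (progn (setf (gethash key table) value)
               t))))

(defvar *inventory* (make-hash-table))
(add-if-absent 'widgets nil *inventory*)  ; => T   (stores NIL as the value)
(add-if-absent 'widgets 12 *inventory*)   ; => NIL (entry exists, even though its value is NIL)
(gethash 'widgets *inventory*)            ; => NIL, T
```

Testing only the first value of gethash would have treated the stored nil as "no entry" and overwritten it.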

Other Hash Table Functions • maphash – a mapping function that applies the given function to every entry of the hash table • each entry in the hash table is actually a pair, the key and the datum, so the function that you apply must accept two arguments • Example: (defun print-hash-items (x y) (print (list x y))) • (maphash #'print-hash-items a) → prints all entries in the hash table a • clrhash – clears all entries in the hash table and returns the table, now empty • hash-table-count – returns the number of entries currently in the hash table (0 if empty) • because hash tables are opaque with respect to how they work, an alternative approach is to implement an association list, which we look at in a little while
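As a small illustration of maphash, the sketch below collects every key and datum into a list of pairs. The names hash-table-pairs and *capitals* are hypothetical; note that the order in which maphash visits entries is not specified.

```lisp
;; hash-table-pairs is a hypothetical helper, not a CL built-in.
(defun hash-table-pairs (table)
  "Return a list of (key . datum) pairs for every entry in TABLE."
  (let ((pairs '()))
    (maphash (lambda (key datum)          ; maphash passes two arguments
               (push (cons key datum) pairs))
             table)
    pairs))

(defvar *capitals* (make-hash-table))
(setf (gethash 'ohio *capitals*) 'columbus)
(setf (gethash 'kentucky *capitals*) 'frankfort)

(hash-table-count *capitals*)   ; => 2
(hash-table-pairs *capitals*)   ; two pairs, in an unspecified order
```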

Sequences • Sequence is a generic type in Common Lisp that has several subtypes • Lists • Vectors (1-D arrays), which include • Strings (vectors of characters) • Bit-vectors (vectors of bits) • Multidimensional arrays are not sequences, so the sequence functions do not apply to them • Association and Property Lists are lists and therefore sequences (but not all sequence functions operate as you might expect on these types of lists) • While each of these has specific functions, there are also functions that can be applied to any sequence • We start by looking at sequence functions, keeping in mind that these are applicable to any of the above types of sequences • For many of these, the more specific function for that particular type of sequence may be more efficient • for instance, using the sequence function elt to access the ith element can be less efficient than using the specific access function such as nth for lists and aref for vectors • you are free to use whichever you prefer

Sequence Functions • elt – return the requested element of the sequence • (elt seq i) returns the ith value in seq where the first element is at i = 0 (same as nth for lists, but note that nth takes the index first) • length – return the size of the sequence • for lists, this is the number of top-level elements, for strings it is the number of characters, and for other vectors it is the number of cells (or the fill pointer, if the vector has one), whether or not you have stored anything in them • position – return the location in the sequence of the first occurrence of the given item • (position item seq) – like elt, the first item is at position 0; if the item does not appear, returns nil • remove – remove all occurrences of the given item from the seq • (remove item seq) • delete is the destructive version of remove • substitute – replace all occurrences of a given item with a new item in the sequence • (substitute new old seq) • nsubstitute is a destructive version of substitute

More Sequence Functions • count – number of occurrences of an item in the seq • (count item seq) • reverse, nreverse – same as with lists • find – finds and returns an item in the sequence, or nil if it is absent • (find item seq) – this may not be useful since you already know the value of item; there are other versions of find that we will find more useful • remove-duplicates, delete-duplicates – remove any duplicated items in the given sequence, keeping the last occurrence of each (delete is the destructive version) • (remove-duplicates '(1 2 3 4 2 3 5 2 3 6)) → (1 4 5 2 3 6) • subseq – return the subsequence starting at the index given and ending at the end of the sequence or at the location before an (optional) ending index • (subseq '(1 2 3 4 5) 2) → (3 4 5) • (subseq '(1 2 3 4 5) 2 4) → (3 4)

And A Few More • copy-seq • return a copy of a sequence (not just a pointer) • copies are true when tested with equalp but not eq since the copy does not occupy the same memory • make-sequence – create a sequence of a given type and size, optionally with a given initial value • (make-sequence 'vector 10 :initial-element 0) • concatenate – combines two or more sequences; it must be supplied a return type even if that type is the same as the sequences' type • (concatenate 'vector "abc" '(3 4 5)) → #(#\a #\b #\c 3 4 5) • search – find the first occurrence of a subsequence in the sequence • (search sub seq) such as (search "abc" "abacabcdabac") → 4 • mismatch – compares two sequences element by element and returns the index of the first position where they differ, nil if the two sequences match exactly
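The point that one set of functions works across lists, vectors, and strings can be seen in a few calls. This is only a sampler using the standard functions named on these slides; the inputs are made up for illustration.

```lisp
;; The same sequence functions applied to strings, vectors, and lists.
(position #\l "hello")                    ; => 2
(position 3 #(1 2 3 4))                   ; => 2
(remove 2 '(1 2 3 2 4))                   ; => (1 3 4)
(substitute #\x #\l "hello")              ; => "hexxo"
(count 2 #(1 2 3 2))                      ; => 2
(subseq "hello world" 6)                  ; => "world"
(search "abc" "abacabcdabac")             ; => 4
(mismatch "abcd" "abxd")                  ; => 2
(concatenate 'string "abc" '(#\d #\e))    ; => "abcde"
(remove-duplicates '(1 2 3 4 2 3 5 2 3 6)) ; => (1 4 5 2 3 6)
```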

Functions That Apply Functions • We will hold off on covering these in detail until later in the semester when we look at how to apply functions • Each of these takes a function and one or more sequences, and applies the function to the sequence(s), returning either a new sequence or a single value • map – a mapping function that returns a new sequence of a specified type after the given function has been applied to each element of the sequence • (map 'vector #'- '(1 2 3 4)) → #(-1 -2 -3 -4) • (map 'list #'char-upcase "hello") → (#\H #\E #\L #\L #\O) • merge – merge two sorted sequences into a new sequence of a specified type, using the given ordering predicate • (merge 'vector '(1 3 5) '(2 6) #'<) → #(1 2 3 5 6) • notice that in map and merge, the new type can differ from the original type • reduce – combines the elements of the sequence, two at a time, into a single item • (reduce #'+ '(1 2 3 4)) → 10 • sort, some, every, notany, notevery – to be covered later

Using Keywords • Most sequence functions permit optional keyword parameters • :start – provide a starting index • remember, sequences start at index 0 • :end – provide an ending index • this is the index of the element after the last one you want involved, that is, :end is an excluded element • :from-end – if set to t, then the sequence function works from the end of the sequence to the front • :start2 and :end2 – where to start and end in a second sequence if the function calls for two sequences (such as in search, where the first sequence's bounds are named :start1 and :end1) • :test – provide a function to test, used in some of the functions; we won't bother exploring it yet • :key – another function that can be applied to each sequence element • consider a list of lists as in '((a b) (c d) (a c) (d e) (a d)) where we want to count how many of the second items are equal to d • (count 'd seq :key #'cadr) → 2 (two of the cadrs are 'd)
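A few calls showing how the keywords above change what a sequence function does. The variable *pairs* holds the list of lists from the slide; the string examples are made up for illustration.

```lisp
(defvar *pairs* '((a b) (c d) (a c) (d e) (a d)))

(count 'd *pairs* :key #'cadr)          ; => 2  (compare against each second item)
(count 'a *pairs* :key #'car)           ; => 3  (compare against each first item)

;; "banana" has #\a at indices 1, 3, and 5.
(position #\a "banana" :start 2)        ; => 3  (search begins at index 2)
(position #\a "banana" :from-end t)     ; => 5  (search from the right)
(remove #\a "banana" :start 2 :end 4)   ; => "banna"  (only indices 2 and 3 considered)

;; :test replaces the default EQL comparison.
(find "PINE" '("oak" "pine") :test #'string-equal)  ; => "pine"
```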

Arrays • Arrays have attributes that go beyond the other sequence types (only one-dimensional arrays, i.e. vectors, are themselves sequences), such as being • able to store multiple types of elements • multi-dimensional • flexible in size if desired • this attribute might lead to less efficient array operations • To create an array, use make-array • it accepts an integer specifying the size of the array • or a list of integers that represents the size of each array dimension • optionally you can include • :initial-element followed by a value to initialize all array elements • :element-type followed by a type to restrict the array to elements of the given type – this can make the array more efficient to use • (setf a (make-array 10 :initial-element 0)) • (setf b (make-array '(10 20 50))) ;; 3-d array • (setf c (make-array 100 :element-type 'ratio))

Arrays As Lists • Vectors are printed in CL as #(items) as in • #(1 2 3 4 5) • so if you do (setf a (make-array 4)) then a is typically #(nil nil nil nil) or #(0 0 0 0) (the initial contents are implementation-dependent unless you supply :initial-element) • We can write a vector literally, like a list that starts with a #, so for instance, we can create and initialize an array: • (setf a #(1 2 3 4 5)) • using :initial-element only permits you to initialize to the same value • There are several ways to create multidimensional arrays: • (setf a (make-array '(4 5))) – a is a 2-D array • (setf a (make-array 4 :initial-element (make-array 5 :initial-element nil))) – a vector whose 4 elements all refer to the same 5-element vector, so changing one "row" changes them all • (setf a #2A((nil nil nil nil nil) (nil nil nil nil nil) (nil nil nil nil nil) (nil nil nil nil nil))) – also a true 2-D array • (setf a #(#(nil nil nil nil nil) #(nil nil nil nil nil) #(nil nil nil nil nil) #(nil nil nil nil nil))) • in the second and fourth cases, the array is a vector of vectors, accessed as (aref (aref a i) j) rather than (aref a i j) • we can use this last approach to create jagged arrays

Accessing Array Elements • To access an element of the array, use aref • or elt (for vectors only) • recall that arrays start at element 0 • (aref x 0) → 0th element or x[0] • (aref y 0 1 6) → element y[0][1][6] • We can make an array share a section of another array's storage using :displaced-to • (setf a (make-array '(4 3))) • (setf b (make-array 8 :displaced-to a :displaced-index-offset 2)) • b covers elements 2 through 9 of a in row-major order, so (aref a 0 2) = (aref b 0) and (aref a 2 1) = (aref b 5) • aref is setf-able – that is, you assign values into an array using (setf (aref array subscript(s)) value) • (setf (aref a 2 2) 10) – for the 2D array above
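The displaced-array arithmetic above can be checked with a small sketch. The names *grid* and *window* are hypothetical; the sizes and offset match the slide's example (a 4x3 array and an 8-element vector displaced by 2).

```lisp
(defvar *grid* (make-array '(4 3) :initial-element 0))
(setf (aref *grid* 0 2) 'x)     ; row-major index 0*3 + 2 = 2
(setf (aref *grid* 2 1) 'y)     ; row-major index 2*3 + 1 = 7

;; *window* shares storage with *grid*, starting at row-major index 2.
(defvar *window*
  (make-array 8 :displaced-to *grid* :displaced-index-offset 2))

(aref *window* 0)               ; => X  (grid index 2)
(aref *window* 5)               ; => Y  (grid index 7)

;; Writing through the window changes the underlying grid.
(setf (aref *window* 1) 'z)     ; grid index 3, i.e. row 1, column 0
(aref *grid* 1 0)               ; => Z
```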

Array Functions • array-rank returns the number of dimensions for an array • array-dimension, when given an array and a dimension number, returns the size of that dimension • (setf a #3a(((…) (…) (…)) ((…) (…) (…)))) • (array-rank a) → 3 (3-dimensional) • (array-dimension a 1) → 3 (dimension 1 has 3 elements) • array-dimensions returns a list of the number of elements in each dimension • (array-dimensions a) → (2 3 x) – where x depends on how many elements make up each … • array-total-size – the total number of elements in the array (which is calculated as a product of the dimensions from array-dimensions) • Note that these describe true (rectangular) arrays; for a jagged vector of vectors they report only the outer vector • array-in-bounds-p – given an array and subscripts, returns t if the subscripts are within legal bounds, nil otherwise • good for error checking prior to an aref operation
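A quick sketch of the array inquiry functions on a concrete 3-D array. The variable name *cube* and its dimensions are made up for this example.

```lisp
(defvar *cube* (make-array '(2 3 4) :initial-element 0))

(array-rank *cube*)               ; => 3
(array-dimension *cube* 1)        ; => 3
(array-dimensions *cube*)         ; => (2 3 4)
(array-total-size *cube*)         ; => 24  (2 * 3 * 4)
(array-in-bounds-p *cube* 1 2 3)  ; true: the largest legal subscripts
(array-in-bounds-p *cube* 2 0 0)  ; false: first subscript must be 0 or 1
```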

Adjustable Arrays • One unique aspect of Common Lisp arrays is that you can make them adjustable • If you declare an array to be of some size XxY and later find that you need to expand it to be (X+m)x(Y+n), you can do that if the array is adjustable • notice that this is not like in Java where you create a new array, copy the old into the new, and then reassign your array pointer variable • you can similarly do array resizing in CL like you do in Java, but this can be inefficient – takes time to copy the old array over • Instead, if you specify your array as :adjustable t when you use make-array, then the array is adjustable • This means that you can arbitrarily change the size of the array without having to do array copying • Adjustable arrays might be less efficiently accessed than ordinary arrays especially if you are adjusting them often

How To Adjust An Array • Use the function adjust-array • (adjust-array array new-dimensions) • new-dimensions will be a single integer for a one-dimensional array, or a list of integers for multi-dimensional arrays • new-dimensions must have the same rank as the array had originally (you can change the sizes, but not the number of dimensions) • the changed size can be smaller or larger than the original array • if smaller, obviously some elements are chopped out of the array • if larger, the original elements will remain where they were in the array (although not necessarily where they were in memory) • adjust-array returns an array of the new size • if the array was created with :adjustable t, that same array is changed and returned; otherwise a new array may be returned, so to be safe we do (setf array (adjust-array array …))

Example • (setf a #2a((1 2 3) (4 5 6) (7 8 9) (10 11 12))) • (array-rank a) → 2 • (array-dimensions a) → (4 3) • (setf a (adjust-array a '(5 3))) → a is now reset to be #2a((1 2 3) (4 5 6) (7 8 9) (10 11 12) (nil nil nil)) • (setf a (adjust-array a '(5 4))) → a is now reset to be #2a((1 2 3 nil) (4 5 6 nil) (7 8 9 nil) (10 11 12 nil) (nil nil nil nil)) • (setf a (adjust-array a '(4 4))) → a is now reset to be #2a((1 2 3 nil) (4 5 6 nil) (7 8 9 nil) (10 11 12 nil)) • (setf a (adjust-array a 4)) → error, a is unaffected • (setf a (adjust-array a '(4 4 2))) → error, a is unaffected • the new cells are shown as nil here; their actual contents are implementation-dependent unless you pass :initial-element to adjust-array
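A runnable variant of the example above, as a sketch: here the array is created with :adjustable t and the new cells are filled explicitly with :initial-element so the result does not depend on the implementation. The variable name *table* is hypothetical.

```lisp
(defvar *table*
  (make-array '(2 3) :adjustable t
                     :initial-contents '((1 2 3) (4 5 6))))

;; Grow from 2x3 to 3x4; existing elements keep their subscripts,
;; new cells are filled with 0.
(setf *table* (adjust-array *table* '(3 4) :initial-element 0))

(array-dimensions *table*)   ; => (3 4)
(aref *table* 1 2)           ; => 6  (unchanged)
(aref *table* 0 3)           ; => 0  (new column)
(aref *table* 2 0)           ; => 0  (new row)
```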

Strings • Strings are vectors of characters • You could create a string using (make-array size :element-type 'character) where size is the size of the string • Alternatively, you could create a string using "…" as you do in Java, as in (setf a "hello") • You can also put individual characters in a vector such as #(#\a #\b #\c), but this is a general vector that happens to hold characters, not a string, so stringp is nil for it and the string functions will not accept it; use (coerce #(#\a #\b #\c) 'string) to get "abc" • note that #("abc") is different again, this would be a vector of strings where the vector currently only has 1 string • Another alternative is to use (make-string size) which has an optional :initial-element • The "…" and make-string mechanisms provide the most efficient strings, and the "…" mechanism is probably the easiest

String Comparison Functions • string= – compares two strings to see if their corresponding characters are equal (i.e., like equal instead of eq) and will always be false if the two strings are of different lengths • string-equal – same except that it ignores case (like equalp instead of equal) • string<, string>, string<=, string>=, string/= – again, case-sensitive; when true, these return the index of the first mismatch rather than t • string-lessp, string-greaterp, string-not-greaterp, string-not-lessp – case-insensitive • all of these permit :start1, :end1, :start2, :end2 optional keyword parameters • (string< s1 s2 :start1 5 :end1 9 :start2 2 :end2 6)

String Manipulation Functions • string-trim – given a sequence of characters and a string, it returns the string with all of the characters in the sequence trimmed off of the beginning and the ending of the string • (string-trim '(#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9) "853 Pine Street Apt. 34") → " Pine Street Apt. " • (string-trim '(#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9) "853 Pine Street Apt. 34A") → " Pine Street Apt. 34A" • there are also functions to trim only the left or right side (string-left-trim, string-right-trim) • string-upcase, string-downcase, string-capitalize – return the string with all letters in uppercase, lowercase, or with the first letter of each word in uppercase; all other characters are unaffected • nstring-upcase, nstring-downcase, nstring-capitalize are destructive versions • The case functions permit :start and :end parameters • string – returns a string unaffected if the parameter is a string, otherwise converts the argument (a symbol or a character) to a string • (string #\a) → "a", (string 'abc) → "ABC", (string 392) → error
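The comparison and manipulation functions can be exercised together in a short sketch. All functions used are standard CL; the example strings are made up, apart from the address taken from the slide. The bag of characters for trimming is given as a string here, which is equivalent to the list of digit characters.

```lisp
;; Comparison
(string= "hello" "Hello")                 ; => NIL  (case-sensitive)
(string-equal "hello" "Hello")            ; true    (case-insensitive)
(string< "apple" "apply")                 ; => 4    (index of first mismatch)
(string= "xxhello" "hello" :start1 2)     ; true    (skip the first two characters)

;; Trimming
(string-trim "0123456789" "853 Pine Street Apt. 34")       ; => " Pine Street Apt. "
(string-left-trim "0123456789" "853 Pine Street Apt. 34")  ; => " Pine Street Apt. 34"
(string-right-trim "0123456789" "853 Pine Street Apt. 34") ; => "853 Pine Street Apt. "

;; Case conversion
(string-upcase "hello")                   ; => "HELLO"
(string-capitalize "pine street")         ; => "Pine Street"
(string-upcase "hello" :start 1 :end 3)   ; => "hELlo"
```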

Association Lists • Recall that a cons cell whose cdr does not point to a list but instead to an item creates a dotted pair • The association list (a-list or assoc-list) uses this to store a key/value pair in the form (key . value) • the car of any cons cell is the key, the cdr is the value • A list of dotted pairs is an association list • We could create the a-list by hand, or use some of the association list built-in functions • The association list gives us the ability to see what’s going on inside of the dictionary-style data structure, but if the association list becomes lengthy, searching it leads to poorer performance than using the hash table • So, for small sets of data, or for sets of data where you have no idea how big your hash table should be, you can use the association list • Or, for data where you want to do more than what is offered by the hash table, you can use the association list • Otherwise, it is best to use the hash table for both efficiency and simplicity

Adding to an A-List • To add a new dotted pair to an a-list, use acons (similar to cons) • (acons key datum alist) • this is equivalent to (cons (cons key datum) alist) • note that if datum is itself a list, the entry no longer prints as a dotted pair, although assoc still works on it • To create pairs, you can use pairlis • (pairlis lis1 lis2) → this creates a group of dotted pairs with corresponding elements from both lists • (pairlis '(a b) '(1 2)) → ((b . 2) (a . 1)) (the order of the new pairs can vary by implementation) • the two lists must be of the same length • you can optionally supply an a-list as a third argument, then the new pairs are consed onto the original list • (setf alist (pairlis newkeys newdata alist)) → adds to alist the new pairs • Note: a-lists should mimic the usage of hash tables, but there is nothing to prevent you from adding a duplicate key to an a-list!

Accessing Into an A-List • The access command is assoc followed by the key of the item desired, and the alist • (assoc 'fred students) → returns the dotted pair whose car is 'fred (note: this returns the entire cons cell, not just the cdr), or nil if there is none • The function is basically doing this (the real assoc compares with eql by default and accepts a :test argument): (defun my-assoc (a lis) (cond ((null lis) nil) ((eql a (caar lis)) (car lis)) (t (my-assoc a (cdr lis))))) • If you know the item and want the key, use rassoc • (rassoc 'smith students) → the dotted pair whose cdr is 'smith • This function is the same as the above function except that we have (eql a (cdar lis)) – notice, this is not cadr! • You can change entries using rplaca and rplacd with assoc or rassoc • (rplaca (assoc key lis) newkey) – rplaca to replace the key • (rplacd (assoc key lis) newdatum) – rplacd to replace the datum • (rplaca (rassoc datum lis) newkey) – rplaca to replace the key • (rplacd (rassoc datum lis) newdatum) – rplacd to replace the datum
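A small sketch that builds an a-list and then looks up and changes entries. The variable name *students* and its contents are made up; setf of cdr is used for the update, which has the same effect as rplacd.

```lisp
;; Build the a-list with acons so it is fresh (safe to modify).
(defvar *students* (acons 'fred 'smith (acons 'mary 'jones '())))
;; *students* => ((FRED . SMITH) (MARY . JONES))

(assoc 'fred *students*)         ; => (FRED . SMITH)  (the whole pair)
(cdr (assoc 'mary *students*))   ; => JONES           (just the datum)
(rassoc 'jones *students*)       ; => (MARY . JONES)  (lookup by datum)
(assoc 'bob *students*)          ; => NIL             (no such key)

;; Replace Fred's datum in place; equivalent to rplacd.
(setf (cdr (assoc 'fred *students*)) 'brown)
(cdr (assoc 'fred *students*))   ; => BROWN

;; String keys need a :test other than the default EQL.
(assoc "fred" '(("fred" . 1)) :test #'string=)   ; => ("fred" . 1)
```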

Property Lists • A hold-over from early Lisp is the property list • The property list is a group of properties (or attributes) attached to a symbol (instead of a variable), and is used to describe that symbolic entity • the p-list, like an a-list, stores pairs of items • unlike the a-list, no key can be duplicated in the p-list • and there is no variable that points to the list, only a symbol • P-list operations are usually destructive • most often, a-list operations are non-destructive and require that you use setf to alter the a-list • The p-list does not use dotted pairs, but instead pairs of values in a list so that the first item is the key and the second is the property of the key • therefore, p-lists will always have an even number of elements • get is used to access the p-list, as in (get 'symbol key), and returns key's associated property (notice that we use the symbol, not a variable, to access it) • Example: the property list of the symbol Zappa contains (name Frank job musician status dead) • (get 'Zappa 'job) returns musician • (get 'Zappa 'salary) returns nil • (get 'Zappa 'salary 'unknown) returns unknown

P-list Functions • Aside from get, we use setf to add to a p-list • (setf (get 'zappa 'name) 'Frank) • (setf (get 'zappa 'job) 'musician) • (setf (get 'zappa 'status) 'dead) • notice unlike an a-list, we cannot just create the list as (setf zappa '(name Frank job musician status dead)) because this would treat zappa as a variable pointing to a list, not a p-list • remprop – removes a property pair from the p-list • (remprop 'zappa 'status) – remember, this is destructive, so you don't have to reassign anything to the list that results • symbol-plist – returns the plist itself, rather than the symbol, for us to use in other functions • it also returns the list in a somewhat readable format • (symbol-plist 'zappa) would return something like the following (the exact contents and order are implementation-specific) • (PKG::SYMBOL-NAME-STRING "ZAPPA" STATUS DEAD JOB MUSICIAN NAME FRANK)

Using Plists • Since you are attaching a group of properties to a symbol, you cannot directly access the list • In the previous example, the symbol was Zappa • now Zappa is not a variable, so you can't get access to the p-list like you would with arrays, a-lists or others • To get ahold of the p-list, use symbol-plist as we saw on the previous slide • Operations getf and remf are like get and remprop but operate on the p-lists themselves • (getf plist 'key) = (get 'name 'key) where (symbol-plist 'name) → plist • Finally, get-properties searches a p-list for any of a list of requested keys and returns three values: the first requested key that it finds, that key's property, and the rest of the p-list starting at that key • (get-properties (symbol-plist 'zappa) '(job status)) → for the p-list (name Frank job musician status dead) this returns job, musician and (job musician status dead) • call it again on the cddr of the third value to find the next match
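The symbol-based functions (get, remprop) and the list-based ones (getf, remf, get-properties) can be compared side by side. This is a sketch: the variable *options* and its keys are hypothetical, and the zappa properties follow the slides.

```lisp
;; Properties attached to the symbol ZAPPA.
(setf (get 'zappa 'name) 'frank)
(setf (get 'zappa 'job) 'musician)
(get 'zappa 'job)                      ; => MUSICIAN
(get 'zappa 'salary 'unknown)          ; => UNKNOWN  (default for a missing key)
(getf (symbol-plist 'zappa) 'name)     ; => FRANK    (same lookup, on the list itself)

;; A p-list held in an ordinary variable uses getf and remf.
(defvar *options* (list :width 80 :height 24))
(getf *options* :width)                ; => 80
(setf (getf *options* :depth) 8)       ; add a new key
(remf *options* :height)               ; remove a key (destructive)
(getf *options* :height)               ; => NIL

;; get-properties: first requested key found, its value, and the tail from there.
(get-properties *options* '(:height :width))   ; first two values => :WIDTH, 80
```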

Structures • Structures allow you to create non-homogeneous data structures • In C, these are called structs • In Java, there is no equivalent, although objects are like this • Unlike C and Java however, structures in CL have a lot of short-cut options and accessing functions which are automatically generated for you • To define a structure: • (defstruct name slots) • Slots are just names that you want to call each item/member of the structure • (defstruct person name sex age occupation) • You can give any/every slot a default value by wrapping the name and value in ( )'s, and after the default value you can supply keyword options such as • :type – limit the slot to containing a specific type of datum • :read-only – restrict a slot to be a constant set to the initialization value • for instance (age 0 :type integer) or (name nil :read-only t)

Generated Functions • Once you have defined your structure, you can generate instances of the structure and access the slots • make-name generates an instance of a structure called name • If you named it person, then (make-person) creates the instance • You can specify initial values in your function call by using :slot value • (make-person :name 'Frank :age 53) just returns the struct • (setf a (make-person :name 'Frank :age 53)) stores the struct in a variable • Slots without default values, or without initialized values, will typically be nil • the make-name function is automatically generated when you do defstruct • Another function is of the form structname-slotname, which accesses that particular slot (also generated when you do defstruct) • (person-name a) → returns the value of a's name slot • Slots are setf-able • (setf (person-age a) 50) • Structures also have type-checking predicate functions generated • (person-p a) → t; alternatively you can do (typep a 'person)
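A minimal sketch of defstruct and the functions it generates. The slot defaults and the :type option on age are choices made for this example (they are not fixed by the slides), and *p* is a hypothetical variable name.

```lisp
(defstruct person
  name
  (sex 'm)                  ; default value
  (age 0 :type integer)     ; default value plus a slot option
  (occupation 'unknown))

;; defstruct generated make-person, person-p, copy-person,
;; and one accessor per slot.
(defvar *p* (make-person :name 'frank :age 53))

(person-name *p*)           ; => FRANK
(person-occupation *p*)     ; => UNKNOWN  (default)
(setf (person-age *p*) 50)  ; accessors are setf-able
(person-age *p*)            ; => 50
(person-p *p*)              ; => T
(person-p 42)               ; => NIL
(typep *p* 'person)         ; => T
```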

Other Structure Comments • As with arrays, you can write a structure literally using the form #s(type values) as in: • (setf b #s(person :name jim :age 44 :sex m)) • the values inside #s( ) are not evaluated, so they are not quoted • fields not listed in such a statement default to their default values or nil • You can also have other functions generated for you in your defstruct statement • define a constructor function • define a copier function (to make a copy of a structure) • define a predicate function • the predicate function is automatically created for you, but this permits you to change the name of the function • To do any of these, you wrap the structure name and these commands in a layer of ( )s; we will see details on the next slides • Finally, you can specify :include struct-type which builds upon struct-type, providing you a form of inheritance • Again, the syntax differs, see the next slide

Example • Let's flesh out our person example • (defstruct person name (sex 'm) age (occupation 'unknown)) • (setf p1 (make-person :name 'Bob :age 20)) • (setf p2 (make-person :name 'Sue :sex 'f :age 33 :occupation 'professor)) • (print (list "enter occupation for" (person-name p1))) • (setf (person-occupation p1) (read)) • (defstruct (doctor (:include person)) medical-school (done-interning t)) • notice here that the name of this structure and its "parent" are placed inside of parens, and then we list the additional slots • (setf p3 (make-doctor :name 'Fred :age 49 :occupation 'doctor :medical-school 'OSU)) • we can similarly define a :print-function to specify how the structure should be printed out, and :conc-name if we want to alter the accessor names so they do not have to include the structure's name • for instance change (person-name p1) to be (p-n p1) • details for both of these are given in http://www.psg.com/~dlamkins/sl/chapter06.html

More on Structure "Inheritance" • Notice in the previous example we had to specify the occupation in p3 to give it an initial value • we would normally prefer that all doctors have a default occupation of doctor, but simply listing (occupation 'doctor) among doctor's own slots is not allowed because it duplicates a slot inherited from person • the :include option does let you give an inherited slot a new default by listing it after the parent's name, as in (:include person (occupation 'doctor)) • beyond that, the form of inheritance that we see for structures is much less controllable than in OOP • we similarly cannot use any form of multiple inheritance for structures • we can get around these limitations by using classes, something we will study later in the semester • If a variable points to a structure that inherits from another, which structure do you use to specify the slots? • Either • So p1's slots are accessed by (person-slot p1) but p3's slots can be accessed either by (person-slot p3) or (doctor-slot p3) • with the exception of those slots only defined by the doctor structure, which can only be accessed as doctor-slot • (person-p p1), (person-p p3) and (doctor-p p3) return t and (doctor-p p1) is nil as we might expect
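The following sketch illustrates :include, including the form that supplies a new default for an inherited slot. To keep the block self-contained it uses the hypothetical names employee and physician as stand-ins for person and doctor; the variables *e* and *d* are likewise made up.

```lisp
(defstruct employee
  name
  (occupation 'unknown))

;; A slot description after the parent's name overrides the inherited default.
(defstruct (physician (:include employee (occupation 'doctor)))
  medical-school)

(defvar *e* (make-employee :name 'bob))
(defvar *d* (make-physician :name 'fred :medical-school 'osu))

(employee-occupation *e*)       ; => UNKNOWN
(employee-occupation *d*)       ; => DOCTOR  (parent accessor works on the child)
(physician-occupation *d*)      ; => DOCTOR  (child accessor for an inherited slot)
(physician-medical-school *d*)  ; => OSU     (only the child accessor exists)
(employee-p *d*)                ; => T
(physician-p *e*)               ; => NIL
```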

Using the Constructor • The :constructor option allows you to specify the name of a constructor along with the parameters that the constructor expects • You can only use this option if you have placed the structure's name inside of parens • (defstruct (foobar (:constructor construct-foobar (…))) …) rather than (defstruct foobar (:constructor…)) • The parameters listed should be the same names as the slots • You can use &key or &optional as desired • So, we change our person as follows: • (defstruct (person (:constructor make-person) (:constructor construct-person (name &optional age sex occupation))) name age (sex 'm) (occupation 'unknown)) • the extra (:constructor make-person) keeps the usual keyword constructor, which is otherwise not generated once you name a constructor of your own • Now we can do (make-person) or (make-person :name…) or we can do (construct-person 'fred), etc. • This prevents us from having to initialize slots by using the clunky :slot-name value format
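A sketch of a structure with both a keyword constructor and a positional one. The structure name contact and its constructor names are hypothetical, used so this block does not clash with earlier definitions of person; the slots mirror the slide's example.

```lisp
(defstruct (contact
            (:constructor make-contact)              ; usual keyword constructor
            (:constructor construct-contact          ; positional constructor
                (name &optional age sex occupation)))
  name
  age
  (sex 'm)
  (occupation 'unknown))

;; Positional: omitted &optional arguments fall back to the slot defaults.
(defvar *c1* (construct-contact 'fred 49))
(contact-name *c1*)         ; => FRED
(contact-age *c1*)          ; => 49
(contact-sex *c1*)          ; => M
(contact-occupation *c1*)   ; => UNKNOWN

;; Keyword: still available because it was requested explicitly.
(defvar *c2* (make-contact :name 'sue :sex 'f))
(contact-sex *c2*)          ; => F
```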
