Chapter 10 Data Collections
Python Lists • Classes allow us to group heterogeneous kinds of data (integers, strings, Booleans, floats) with methods, and we can decide exactly which bits of data we want to group together. • But sometimes we have numerous bits of related data: a list of test scores, names, GPAs, etc. We need to collect all these associated bits of data into one object.
Python Lists
• Python lists are simply ordered sequences of items. We've already seen some of these, for example when we split a string:
>>> string.split("This is an historic day")
['This', 'is', 'an', 'historic', 'day']
• While mathematicians use subscripts to denote values in a sequence, as in $X = x_1, x_2, x_3, \ldots, x_n$, programmers use index numbers. We've already seen this too.
Math Python
• If we were to sum all the test scores, in mathematics we would write:
$$\sum_{i=0}^{n-1} \mathit{score}_i$$
which just means: start at the $\mathit{score}_0$ value and add each subsequent value until you reach $\mathit{score}_{n-1}$, whatever number n is.
Math Python
• In Python, that formula would be executed as:
sum = 0
for i in range(n):
    sum = sum + score[i]
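The index-based loop can be compared with two other ways of computing the same total. This is a minimal sketch written for Python 3 (the slides use Python 2); the score values are made up for illustration, and the name `total` is used instead of `sum` so the built-in `sum()` stays available.

```python
# Hypothetical test scores, used only for illustration.
score = [66, 77, 88, 75, 80, 91, 89]
n = len(score)

# Index-based loop, mirroring the summation formula.
total = 0
for i in range(n):
    total = total + score[i]

# Iterating over the items directly gives the same result.
total_direct = 0
for s in score:
    total_direct += s

# The built-in sum() does the same work in one call.
total_builtin = sum(score)

print(total, total_direct, total_builtin)
```

All three approaches visit every element exactly once; the index version is simply the closest match to the mathematical notation.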
Arrays • In Python, we call a sequence of items a list; in other programming languages, like C++ or Java, the comparable structure is called an array. • But arrays have two features that Python lists do not share: > Arrays hold only one data type. Arrays are thus homogeneous. > Arrays are of a fixed size. That is, the programmer must specify how many elements are in the array when it is created.
Python lists • By contrast, Python lists are: > Dynamic. Lists can shrink or expand as needed. The number of elements in the list does not have to be specified when the list is created. > Heterogeneous. Python lists can hold a variety of data types: integers, floats, Booleans, strings, etc. • Python lists are mutable sequences of arbitrary objects.
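Both properties can be seen in a few lines. This is a small Python 3 sketch; the values in `items` are arbitrary and chosen only to show mixed types.

```python
# A list may mix types; these values are made up for illustration.
items = [42, 3.14, True, "spam"]

# Lists grow and shrink as needed; no size is declared up front.
items.append("eggs")      # grow by one element
removed = items.pop(0)    # shrink by removing the first element

print(items)
print(removed, len(items))
```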
List operations • Since strings and lists are both sequences, the same sequence operations we used for strings (concatenation, repetition, indexing, slicing, len(), and membership checks) work for lists in general.
Membership check
• A membership check tests whether a value you seek is or is not included in a particular list. For example, you might have a list of test scores and want to know whether 80 is one of those scores. So:
>>> scores = [66, 77, 88, 75, 80, 91, 89]
>>> 80 in scores
True
• This applies to a string as well:
>>> myS = "Barack Obama"
>>> 'bama' in myS
True
Lists and strings
• Lists are mutable: their elements can be changed in place, and they can expand or shrink dynamically as needed.
• Strings are not mutable.
• Ergo, you can assign a new value to any position in a list, and the new value can be of any data type:
>>> scores = [66, 77, 88, 75, 80, 91, 89]
>>> scores[3] = 99
>>> scores
[66, 77, 88, 99, 80, 91, 89]
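The contrast between the two types shows up as soon as you try item assignment on each. This is a short Python 3 sketch; the flag `string_changed` and the replacement value "absent" are illustrative names and values, not part of the slides.

```python
scores = [66, 77, 88, 75, 80, 91, 89]
scores[3] = "absent"   # allowed: any object can replace a list element

name = "Barack Obama"
try:
    name[0] = "b"      # not allowed: strings cannot be changed in place
    string_changed = True
except TypeError:
    string_changed = False

# To "change" a string, build a new one instead.
new_name = "b" + name[1:]

print(scores)
print(string_changed, new_name)
```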
Building a list
• You can use repetition to build a list, but only if you want the same value in every slot:
>>> scores = [78] * 15
This will fill 15 slots with the value 78.
• You can also use the append method to fill a list on the fly.
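The two ways of building a list might look like this side by side. This is a Python 3 sketch; the `squares` list is a made-up example of values that differ from slot to slot, which repetition cannot produce.

```python
# Repetition: fifteen slots, all holding the same value.
scores = [78] * 15

# append(): start empty and add one value at a time.
squares = []
for n in range(5):
    squares.append(n * n)

print(len(scores), scores[0], scores[-1])
print(squares)
```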
Workshop
• Start with a numeric list: 92, 2, 59, 5, 77, 9, 31, 2, 55, 0, 4
• Perform the following operations on that list:
• Append the number 16
• Sort the list from highest to lowest (hint: pass the comparison function lambda x, y: y-x to sort())
• Reverse the list you just sorted
• Find the index value of the first occurrence of 2
• Insert the value 55 into index slot 6
• Count the occurrences of 55
• Remove the first occurrence of 9 in the list
• Delete the eighth element and determine the deleted value
• Delete elements 3-6
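As a warm-up, the same list methods can be tried on a shorter list. This sketch is written for Python 3 and deliberately uses a different, made-up list so the workshop itself is left to do. One assumption to note: Python 3's sort() no longer accepts a comparison function like the one in the hint, so the sketch uses `reverse=True` instead.

```python
nums = [3, 1, 4, 1, 5]

nums.append(9)               # add to the end
nums.sort(reverse=True)      # highest to lowest (Python 3 style)
nums.reverse()               # flip the order in place
first_one = nums.index(1)    # position of the first 1
nums.insert(2, 7)            # put 7 at index 2, shifting the rest right
ones = nums.count(1)         # how many times 1 appears
nums.remove(1)               # delete the first 1 only
popped = nums.pop(3)         # delete index 3 and return its value
del nums[1:3]                # delete a slice (indexes 1 and 2)

print(nums, first_one, ones, popped)
```

Note that pop() is the one deletion method that hands back the removed value, which is what the "determine the deleted value" step needs.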
Outline of stats.py • 1. getNumbers() • 2. mean(nums) • 3. stdDev(nums, xbar) • 4. median(nums) • 5. main()
from math import sqrt

def getNumbers():
    nums = []     # start with an empty list
    # sentinel loop to get numbers
    xStr = raw_input("Enter a number (<Enter> to quit) >> ")
    while xStr != "":
        x = eval(xStr)
        nums.append(x)   # add this value to the list
        xStr = raw_input("Enter a number (<Enter> to quit) >> ")
    return nums
def mean(nums):
    sum = 0.0
    for num in nums:
        sum = sum + num
    return sum / len(nums)
def stdDev(nums, xbar=None):
    if xbar == None:
        xbar = mean(nums)
    sumDevSq = 0.0
    for num in nums:
        dev = num - xbar
        sumDevSq = sumDevSq + dev * dev
    return sqrt(sumDevSq/(len(nums)-1))
def median(nums):
    nums.sort()
    size = len(nums)
    midPos = size / 2
    if size % 2 == 0:
        median = (nums[midPos] + nums[midPos-1]) / 2.0
    else:
        median = nums[midPos]
    return median
def main():
    print 'This program computes mean, median and standard deviation.'
    data = getNumbers()
    xbar = mean(data)
    std = stdDev(data, xbar)
    med = median(data)
    print '\nThe mean is', xbar
    print 'The standard deviation is', std
    print 'The median is', med

if __name__ == '__main__': main()

• The line if __name__ == '__main__': exists so that the file can be used either as a reusable module or as a standalone program.
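For readers on Python 3, the three statistics functions might be ported roughly as follows. This is a sketch, not the original stats.py: the snake_case names, the use of sorted() so that the caller's list is not reordered, and the sample data are all choices made here for illustration. The one change Python 3 forces is `//` for the middle position, because `/` now produces a float, which cannot be used as an index.

```python
from math import sqrt

def mean(nums):
    return sum(nums) / len(nums)

def std_dev(nums, xbar=None):
    # Sample standard deviation (divides by n - 1), as in stdDev above.
    if xbar is None:
        xbar = mean(nums)
    sum_dev_sq = sum((x - xbar) ** 2 for x in nums)
    return sqrt(sum_dev_sq / (len(nums) - 1))

def median(nums):
    ordered = sorted(nums)       # sorted copy; the caller's list is untouched
    size = len(ordered)
    mid = size // 2              # integer division is // in Python 3
    if size % 2 == 0:
        return (ordered[mid] + ordered[mid - 1]) / 2.0
    return ordered[mid]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample values
print(mean(data), std_dev(data), median(data))
```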
Lists of objects • Lists can be used to store collections of records, such as student GPA data or customer records. • Such lists can be easily sorted. A student GPA list might be sorted by highest GPA; a customer record list might be sorted by largest balance due, frequency of orders, etc.
Sorting for GPA • Pseudocode algorithm: • Get the name of input file from user • Read student records into a list • Sort the list by GPA • Get the name of output file from user • Write the sorted records to the file
Outline of gpasort.py • 1. import Student class from gpa.py • 2. import makeStudent function from gpa.py • 3. readStudents(filename) • 4. writeStudents(students, filename) • 5. cmpGPA(s1, s2) • 6. main()
from gpa import Student, makeStudent
• Imports the Student class, with its data fields and methods, as defined in gpa.py:
class Student:
    def __init__(self, name, hours, qpoints):
        self.name = name
        self.hours = float(hours)
        self.qpoints = float(qpoints)

    def getName(self):
        return self.name

    def getHours(self):
        return self.hours

    def getQPoints(self):
        return self.qpoints

    def gpa(self):
        return self.qpoints/self.hours
from gpa import Student, makeStudent
• makeStudent takes a line from the input file, splits the tab-separated data into its three fields, and uses them to build a Student object:
def makeStudent(infoStr):
    # string.split requires "import string" at the top of gpa.py
    name, hours, qpoints = string.split(infoStr, "\t")
    return Student(name, hours, qpoints)
• This function creates an initially empty list and then appends one student record to it for each line of the input file, building a list of records:
def readStudents(filename):
    infile = open(filename, 'r')
    students = []
    for line in infile:
        students.append(makeStudent(line))
    infile.close()
    return students
• This function takes a list of student records and writes each record, with its fields separated by tabs, on its own line of a new output file; the user supplies the filename:
def writeStudents(students, filename):
    outfile = open(filename, 'w')
    for s in students:
        outfile.write("%s\t%f\t%f\n" % (s.getName(), s.getHours(), s.getQPoints()))
    outfile.close()
• This function compares the GPA values of two student records and returns -1 if the first < second, 0 if first == second, and 1 if the first > second. Notice that the values compared are the return values of the gpa() method of the Student class. This function is then used to sort the whole list of student records:
def cmpGPA(s1, s2):
    return cmp(s1.gpa(), s2.gpa())
What is the cmp() function?
• cmp() is a built-in function in Python 2 (no libraries need to be imported)
• cmp() returns -1 if the first value is less than the second, 0 if they are equal, and 1 if the first value is greater than the second
• In the gpasort.py code, cmpGPA() is passed as an argument to the sort() method:
data.sort(cmpGPA)
Notice that there are no parentheses after cmpGPA. The function is passed, not invoked; sort() invokes it whenever it needs to compare two records.
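Python 3 removed both cmp() and the comparison-function argument to sort(), so the same idea is usually expressed with a key function. The following is a hedged sketch of how that might look: the trimmed `Student` class, the function name `cmp_gpa`, and the three records are stand-ins invented here, not part of gpasort.py. `functools.cmp_to_key` is the standard-library adapter for reusing an old-style comparison function.

```python
from functools import cmp_to_key

class Student:
    # Trimmed-down stand-in for the Student class in gpa.py.
    def __init__(self, name, hours, qpoints):
        self.name = name
        self.hours = float(hours)
        self.qpoints = float(qpoints)

    def gpa(self):
        return self.qpoints / self.hours

def cmp_gpa(s1, s2):
    # Same contract as cmp(): negative, zero, or positive.
    return (s1.gpa() > s2.gpa()) - (s1.gpa() < s2.gpa())

# Made-up records for illustration.
data = [Student("Ann", 100, 350), Student("Bob", 80, 200), Student("Cy", 60, 240)]

by_key = sorted(data, key=Student.gpa)            # key function, passed without ()
by_cmp = sorted(data, key=cmp_to_key(cmp_gpa))    # old-style comparison, adapted

print([s.name for s in by_key])
print([s.name for s in by_cmp])
```

In both calls the function is handed over by name, without parentheses, exactly as cmpGPA is in the slide; sorted() decides when to call it.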
def main():
    print "This program sorts student grade information by GPA"
    filename = raw_input("Enter the name of the data file: ")
    data = readStudents(filename)
    data.sort(cmpGPA)
    filename = raw_input("Enter a name for the output file: ")
    writeStudents(data, filename)
    print "The data has been written to", filename

if __name__ == '__main__':
    main()
Dictionaries • In large record collections, we may need different ways of looking up information. If you have a video store account, employees might look up your record by zip code, telephone number, or last name. • A lookup value (zip, phone, name) is called a key; the key together with the record it is associated with is called a key-value pair. Keys are a handy way of getting to records.
Mapping
• Mapping refers to associating (through syntax) a key with a particular record or another value.
• If I wanted to map student names to test scores, I could write:
>>> scores = {"Stan": "79", "Sara": "92", "Beaufort": "88"}
>>> scores["Sara"]
'92'
• The first item in each pair is the key, the second the value associated with that key. Lookup goes from key to value only, so
>>> scores['92']
won't work: it raises a KeyError.
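The one-way nature of the mapping can be checked directly. This Python 3 sketch reuses the scores dictionary from the slide; the names `found` and `names_with_92` are introduced here only for illustration.

```python
scores = {"Stan": "79", "Sara": "92", "Beaufort": "88"}

sara = scores["Sara"]          # key -> value works

try:
    scores["92"]               # "92" is a value, not a key
    found = True
except KeyError:
    found = False

# Going from value back to key means searching the pairs.
names_with_92 = [name for name, score in scores.items() if score == "92"]

print(sara, found, names_with_92)
```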
Dictionary operations
• <dictionary>.has_key(<key>) returns True or False depending on whether the key is in the dictionary or not. Keys are case-sensitive:
>>> scores.has_key("Stan")
True
>>> scores.has_key("stan")
False
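has_key() exists only in Python 2; it was removed in Python 3, where the `in` operator does the same job. This is a minimal Python 3 sketch using the same scores dictionary, with variable names invented for illustration.

```python
scores = {"Stan": "79", "Sara": "92", "Beaufort": "88"}

has_stan = "Stan" in scores        # exact key match
has_lower = "stan" in scores       # keys are case-sensitive
missing = "Barack" not in scores   # negated form

print(has_stan, has_lower, missing)
```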
• <dictionary>.keys() returns a list of the keys:
>>> scores.keys()
['Stan', 'Sara', 'Beaufort']
• <dictionary>.values() returns a list of the values:
>>> scores.values()
['79', '92', '88']
• <dictionary>.items() returns a list of the key-value pairs:
>>> scores.items()
[('Stan', '79'), ('Sara', '92'), ('Beaufort', '88')]
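In Python 3 these three methods return view objects rather than lists, so a sketch for that version wraps them in list() when an actual list is needed. The variable names below are illustrative only.

```python
scores = {"Stan": "79", "Sara": "92", "Beaufort": "88"}

keys = list(scores.keys())       # view converted to a real list
values = list(scores.values())
pairs = list(scores.items())     # list of (key, value) tuples

# items() pairs unpack neatly in a loop.
for name, score in scores.items():
    print(name, score)
```

Looping over items() is the usual way to walk through a dictionary when both the key and the value are needed.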
• <dictionary>.get(<key>, <default>) returns the value associated with a key if it exists; otherwise, it returns the default:
>>> scores.get("Sara", "sucker")
'92'
>>> scores.get("Barack", "sucker")
'sucker'
• >>> del scores["Beaufort"] deletes that entire pair
• >>> scores.clear() deletes all entries in the dictionary but does not erase the dictionary itself; scores is left as an empty dictionary
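The three operations can be run in sequence to watch the dictionary shrink. This is a Python 3 sketch; the default string "no score" and the variable names are chosen here for illustration.

```python
scores = {"Stan": "79", "Sara": "92", "Beaufort": "88"}

sara = scores.get("Sara", "no score")       # key exists: its value
barack = scores.get("Barack", "no score")   # key missing: the default

del scores["Beaufort"]                      # removes that key-value pair
after_del = len(scores)

scores.clear()                              # empties the dictionary in place
print(sara, barack, after_del, scores)
```

Unlike indexing with square brackets, get() never raises a KeyError, which makes it a convenient choice when a missing key is an expected case.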