CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za
The String Data Type
• Used for operating on textual information
• Think of a string as a sequence of characters
• To create string literals, enclose them in single, double, or triple quotes as follows:
    a = "Hello World"
    b = 'Python is groovy'
    c = """Computer says 'Noooo'"""
Comments and docstrings
• It is common practice for the first statement of a function to be a documentation string describing its usage. For example:
    def hello():
        """Hello World function"""
        print("Hello")
        print("I love CSC1015F")
• This is called a "docstring" and can be printed thus:
    print(hello.__doc__)
Comments and docstrings • Try printing the doc string for functions you have been using, e.g.: print(input.__doc__) print(eval.__doc__)
Checkpoint Str1: Strings and loops
What does the following function do?
    def oneAtATime(word):
        for c in word:
            print("give us a '",c,"' ... ",c,"!", sep='')
        print("What do you have? -",word)
Checkpoint Str1a: Strings and loops
What does this function do?
    def str1a(word):
        for i in word:
            if i in "aeiou":
                continue
            print(i,end='')
Some BUILT IN String functions/methods
    s.capitalize()              Capitalizes the first character.
    s.count(sub)                Counts the number of occurrences of sub in s.
    s.isalnum()                 Checks whether all characters are alphanumeric.
    s.isalpha()                 Checks whether all characters are alphabetic.
    s.isdigit()                 Checks whether all characters are digits.
    s.islower()                 Checks whether all characters are lowercase.
    s.isspace()                 Checks whether all characters are whitespace.
Some BUILT IN String functions/methods
    s.istitle()                 Checks whether the string is a title-cased string (first letter of each word capitalized).
    s.isupper()                 Checks whether all characters are uppercase.
    s.join(t)                   Joins the strings in sequence t with s as a separator.
    s.lower()                   Converts to lowercase.
    s.lstrip([chrs])            Removes leading whitespace or characters supplied in chrs.
    s.upper()                   Converts a string to uppercase.
Some BUILT IN String functions/methods
    s.replace(oldsub,newsub)    Replaces all occurrences of oldsub in s with newsub.
    s.find(sub)                 Finds the first occurrence of sub in s (returns its index, or -1 if sub is not found).
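A quick way to get a feel for these methods is to try a few on a sample string. The following is only an illustrative sketch; the variable names are made up for this example. Note that none of the methods change s itself: each one returns a new value.

```python
s = "python is groovy"

capitalized = s.capitalize()           # first character in uppercase
o_count = s.count("o")                 # how many times "o" occurs
shouted = s.upper()                    # all uppercase
swapped = s.replace("groovy", "fun")   # new string with the substring replaced
position = s.find("is")                # index of the first "is", or -1 if absent
joined = "-".join(["a", "b", "c"])     # "-" is the separator between the items

print(capitalized)
print(o_count)
print(shouted)
print(swapped)
print(position)
print(joined)
print("2024".isdigit(), s.islower(), s.istitle())
```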
BUILT IN String functions/methods Try printing the doc string for str functions: print(str.isdigit.__doc__)
The String Data Type
• As a string is a sequence of characters, we can access individual characters
• This is called indexing
• Form: <string>[<expr>]
• Indexing starts at 0, so the last character in a string of n characters has index n-1
String functions: len
• len tells you how many characters there are in a string:
    len("Jabberwocky")
    len("Twas brillig and the slithy toves did gyre and gimble in the wabe")
Checkpoint Str2: Indexing examples
What does this function do?
    def str2(word):
        for i in range(0,len(word),2):
            print(word[i],end='')
More Indexing examples - indexing from the end
What is the output of these lines?
    greet = "Hello Bob"
    greet[-1]
    greet[-2]
    greet[-3]
Checkpoint Str3
What does this function print?
    def str3(word):
        for i in range(len(word)-1,-1,-1):
            print(word[i],end='')
Chopping strings into pieces: slicing
• The previous examples can be done much more simply with slicing
• Slicing indexes a range: <string>[<start>:<end>] returns a substring, starting at position <start> and running up to, but not including, position <end>.
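As a small illustration of the "up to, but not including" rule, here is a sketch using a sample word (the variable names are invented for this example). Leaving out the start or the end means "from the beginning" or "to the end" respectively.

```python
word = "Jabberwocky"

first_three = word[0:3]   # characters at positions 0, 1 and 2
middle = word[3:7]        # positions 3 to 6; position 7 is not included
front = word[:6]          # start omitted: begins at position 0
back = word[6:]           # end omitted: runs to the end of the string

print(first_three, middle, front, back)
# A slice [i:j] with i <= j <= len(word) contains j - i characters.
print(len(word[2:5]))
```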
Examples - slicing
What is the output of these lines?
    greet = "Hello Bob"
    greet[0:3]
    greet[5:9]
    greet[:5]
    greet[5:]
    greet[:]
Checkpoint Str4: Strings and loops
What does the following function do?
    def sTree(word):
        for i in range(len(word)):
            print(word[0:i+1])
Checkpoint Str5: Strings and loops. What does the following code output? def sTree2(word): step=len(word)//3 for i in range(step,step*3+1,step): for j in range(i): print(word[0:j+1]) print("**\n**\n") sTree2("strawberries")
More info on slicing • The slicing operator may be given an optional stride, s[i:j:stride], that causes the slice to skip elements. • Then, i is the starting index; j is the ending index; and the produced subsequence is the elements s[i], s[i+stride], s[i+2*stride], and so forth until index j is reached (which is not included). • The stride may also be negative. • If the starting index is omitted, it is set to the beginning of the sequence if stride is positive or the end of the sequence if stride is negative. • If the ending index j is omitted, it is set to the end of the sequence if stride is positive or the beginning of the sequence if stride is negative.
More on slicing
• Here are some examples with strides:
    a = "Jabberwocky"
    b = a[::2]      # b = 'Jbewcy'
    c = a[::-2]     # c = 'ycwebJ'
    d = a[0:5:2]    # d = 'Jbe'
    e = a[5:0:-2]   # e = 'rba'
    f = a[:5:1]     # f = 'Jabbe'
    g = a[:5:-1]    # g = 'ykcow'
    h = a[5::1]     # h = 'rwocky'
    i = a[5::-1]    # i = 'rebbaJ'
    j = a[5:0:-1]   # j = 'rebba'
Checkpoint Str6: strides
What is the output of these lines?
    greet = "Hello Bob"
    greet[8:5:-1]
Checkpoint Str7: Slicing with strides
How would you do this function in one line with no loops?
    def str2(word):
        for i in range(0,len(word),2):
            print(word[i],end='')
Checkpoint Str8
What does this code display?
    #checkpointStr8.py
    def crunch(s):
        m=len(s)//2
        print(s[0],s[m],s[-1],sep='+')
    crunch("omelette")
    crunch("bug")
Example: filters
• Pirate, Elmer Fudd, Swedish Chef
• These produce parodies of English speech
• How would you write one in Python?
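One possible starting point is a chain of s.replace() calls. The sketch below is only an illustration: the function name pirate and its tiny word list are invented here, and a real filter would need many more rules.

```python
def pirate(text):
    """Return text with a few words swapped for pirate-speak.

    The substitution list is a made-up sample, not a complete filter.
    """
    substitutions = [("Hello", "Ahoy"), ("my", "me"),
                     ("friend", "matey"), ("you", "ye")]
    for old, new in substitutions:
        # replace() swaps every occurrence of old, wherever it appears
        text = text.replace(old, new)
    return text

print(pirate("Hello my friend, how are you?"))
```

Because replace() works on substrings, this simple version would also change "my" inside a longer word such as "army"; handling whole words only is a natural next refinement.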
Example: Genetic Algorithms (GAs)
• GAs attempt to mimic the process of natural evolution in a population of individuals
• They use the principles of selection and evolution to produce several solutions to a given problem
• They borrow biologically-derived techniques such as inheritance, mutation, natural selection, and recombination
• A GA is a computer simulation in which a population of abstract representations (called chromosomes) of candidate solutions (called individuals) to an optimization problem evolves toward better solutions
• Over time, those genetic changes which enhance the viability of an organism tend to predominate
Bioinformatics Example: Crossover (recombination) Evolution works at the chromosome level through the reproductive process • portions of the genetic information of each parent are combined to generate the chromosomes of the offspring • this is called crossover
Crossover Methods Single-Point Crossover • randomly-located cut is made at the pth bit of each parent and crossover occurs • produces 2 different offspring
Gene splicing example (for genetic algorithms) • We can now do a cross-over! • Crossover3.py
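Single-point crossover maps directly onto slicing. The sketch below is not the Crossover3.py file mentioned on the slide; it is an independent illustration that assumes both parents are strings of equal length (at least 2), and the function name and the optional p parameter are invented for this example.

```python
import random

def crossover(parent1, parent2, p=None):
    """Single-point crossover of two equal-length strings.

    Cuts both parents at position p and swaps the tails.
    If p is not given, a random cut point is chosen.
    """
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if p is None:
        # pick a cut strictly inside the string so both pieces are non-empty
        p = random.randint(1, len(parent1) - 1)
    child1 = parent1[:p] + parent2[p:]
    child2 = parent2[:p] + parent1[p:]
    return child1, child2

print(crossover("11111111", "00000000", 3))
print(crossover("AAAAAAAA", "BBBBBBBB"))   # random cut point
```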
Example: palindrome program palindrome |ˈpalɪndrəʊm|noun a word, phrase, or sequence that reads the same backward as forward, e.g., madam or nurses run In Python, write a program to check whether a word is a palindrome. • You don’t need to use loops…
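One way to do this without loops is to compare the text with its reversed slice. This is a minimal sketch; the function name is_palindrome is made up here, and ignoring spaces and letter case is an assumption added so that phrases such as "nurses run" are accepted.

```python
def is_palindrome(text):
    """Return True if text reads the same backward as forward."""
    # Assumption: spaces and upper/lower case should not matter
    letters = text.replace(" ", "").lower()
    # A stride of -1 walks the string from the end to the start
    return letters == letters[::-1]

print(is_palindrome("madam"))
print(is_palindrome("nurses run"))
print(is_palindrome("python"))
```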
String representation and message encoding
• On the computer hardware, strings are also represented as zeros and ones.
• Computers represent characters as numeric codes, with a unique code for each character.
• An entire string is stored by translating each character to its equivalent code and then storing the whole thing as a sequence of binary numbers in computer memory
• There used to be a number of different codes for storing characters, which caused serious headaches!
ASCII (American Standard Code for Information Interchange)
• An important character encoding standard
• Its codes represent the characters found on a typical (American) computer keyboard, as well as some special control codes used for sending and receiving information
• A-Z uses values in range 65-90
• a-z uses values in range 97-122
• In use for a long time: developed for teletypes
• American-centric
• Extended ASCII codes have been developed
Unicode
• A character set that includes all the ASCII characters plus many more exotic characters
• http://www.unicode.org
• Python supports the Unicode standard
• ord returns the numeric code of a character
• chr returns the character corresponding to a code
[Figure: Unicode codes for Cuneiform]
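The two functions are inverses of each other, which the following short sketch tries to show (the variable names are only for illustration). It also checks the ASCII ranges quoted earlier for A-Z and a-z.

```python
print(ord("A"), ord("Z"))   # codes at the ends of the uppercase range
print(ord("a"), ord("z"))   # codes at the ends of the lowercase range

# Translate a message to a list of numeric codes and back again
message = "Hello"
codes = [ord(ch) for ch in message]
decoded = "".join(chr(code) for code in codes)

print(codes)
print(decoded)
```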
Characters in memory • Smallest addressable piece of memory is usually 8 bits, or a byte • how many characters can be represented by a byte? • 256 different values (2^8) • is this enough?
Characters in memory • Smallest addressable piece of memory is usually 8 bits, or a byte • 256 different values is enough for ASCII (only a 7 bit code) • but not enough for UNICODE, with 100 000+ possible characters • UNICODE uses different schemes for packing UNICODE characters into sequences of bytes • UTF-8 most common • uses a single byte for ASCII • up to 4 bytes for more exotic characters
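The variable-length nature of UTF-8 can be observed with the str.encode method, which returns the bytes used to store a string. This is an illustrative sketch; the sample characters were chosen for this example and are not from the slides.

```python
samples = [
    "A",            # plain ASCII letter
    "\u00e9",       # e with acute accent
    "\u20ac",       # euro sign
    chr(0x12000),   # a Cuneiform sign, outside the first 65536 code points
]

byte_counts = []
for ch in samples:
    encoded = ch.encode("utf-8")    # bytes object holding the UTF-8 form
    byte_counts.append(len(encoded))
    print(hex(ord(ch)), len(encoded), "byte(s)")
```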
Comparing strings
• Conditions may compare numbers or strings
• When strings are compared, the order is lexicographic
• Strings are put into order based on the Unicode values of their characters
• e.g. "Bbbb" < "bbbb" and "B" < "a" are both True
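Because every uppercase letter has a smaller code than every lowercase letter, the ordering can be surprising. A small sketch (the word list is made up for illustration):

```python
print("Bbbb" < "bbbb")     # 'B' (66) comes before 'b' (98)
print("B" < "a")           # any uppercase letter sorts before any lowercase one
print("Zebra" < "apple")   # so "Zebra" comes before "apple"

words = ["pear", "Zebra", "apple"]
by_code = sorted(words)                    # plain Unicode order
ignoring_case = sorted(words, key=str.lower)   # compare lowercase copies instead

print(by_code)
print(ignoring_case)
```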
The min function… min(iterable[, key=func]) -> value min(a, b, c, ...[, key=func]) -> value With a single iterable argument, return its smallest item. With two or more arguments, return the smallest argument.
Checkpoint
What do these statements evaluate as?
    min("hello")
    min("983456")
    min("Peanut")
Example 2: DNA Reverse Complement Algorithm
• A DNA molecule consists of two strands of nucleotides. Each nucleotide is one of the four molecules adenine, guanine, thymine, or cytosine.
• Adenine always pairs with thymine and guanine always pairs with cytosine.
• A pair of matched nucleotides is called a base pair
• Task: write a Python program to calculate the reverse complement of any DNA strand
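One possible approach, sketched below, is to swap each base for its partner and then reverse the result with a negative stride. It assumes the standard pairing (A with T, G with C) and a strand written with the letters A, C, G, T; the function name reverse_complement is invented for this example.

```python
def reverse_complement(strand):
    """Return the reverse complement of a DNA strand given as a string."""
    bases = "ACGT"
    partners = "TGCA"   # partners[i] pairs with bases[i]
    complement = ""
    for base in strand.upper():
        # index() raises ValueError for any letter that is not A, C, G or T
        complement += partners[bases.index(base)]
    return complement[::-1]   # reverse with a stride of -1

print(reverse_complement("AAAACCCGGT"))
```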
Scrabble letter scores • Different languages should have different scores for the letters • how do you work this out? • what is the algorithm?
Related Example: Calculating character (base) frequency • DNA has the alphabet ACGT • BaseFrequency.py
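The following sketch is not the BaseFrequency.py file named on the slide; it is an independent illustration of one way to count bases using s.count(). The function name and the sample sequence are made up.

```python
def base_frequency(dna):
    """Return a list of (base, count) pairs for the bases A, C, G and T."""
    dna = dna.upper()
    counts = []
    for base in "ACGT":
        counts.append((base, dna.count(base)))
    return counts

sequence = "AACGTTGGCA"   # made-up sample sequence
for base, count in base_frequency(sequence):
    percent = 100 * count / len(sequence)
    print(base, count, percent)
```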
Why would you want to do this?
• You can calculate the melting temperature of DNA from the base pair percentage in a DNA molecule
• References:
• Breslauer et al. Proc. Natl. Acad. Sci. USA 83, 3746-3750
• Baldino et al. Methods in Enzymol. 168, 761-777
Input/Output as string manipulation
• eval evaluates a string as a Python expression.
• Very general and can be used to turn strings into nearly any other Python data type
• The "Swiss army knife" of string conversion
    eval("3+4")
• Can also use Python numeric type conversion functions:
    int("4")
    float("4")
• But the string must be a numeric literal of the appropriate form, or you will get an error
• Can also convert numbers to strings with the str function
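The conversions above can be tried side by side. This is a small sketch with made-up variable names; it also shows the error raised when a string is not a valid literal for the requested type. Since eval runs whatever expression it is given, it is best used only on input you trust.

```python
total = eval("3+4")    # evaluates the expression: an int
whole = int("4")       # string to int
real = float("4")      # string to float
text = str(3.14)       # number to string

print(total, whole, real, text)

failed = False
try:
    int("4.5")         # "4.5" is not an integer literal
except ValueError as err:
    failed = True
    print("Could not convert:", err)
```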
String formatting with format
• The built-in s.format() method is used to perform string formatting.
• The {} are slots that show where the values will go.
• You can "name" the values, or access them by their position (counting from zero).
    >>> a = "Your name is {0} and your age is {age}"
    >>> a.format("Mike", age=40)
    'Your name is Mike and your age is 40'
Example 4: Better output for Calculating character (base) frequency • BaseFrequency2.py
More on format
• You can add an optional format specifier to each placeholder using a colon (:) to specify column widths, decimal places, and alignment.
• The general format is [[fill]align][sign][0][width][.precision][type], where each part enclosed in [] is optional.
• The width specifier gives the minimum field width to use
• The align specifier is one of '<', '>', or '^' for left, right, and centered alignment within the field.
• An optional fill character fill is used to pad the space
More on format
For example:
    name = "Elwood"
    r = "{0:<10}".format(name)    # r = 'Elwood    '
    r = "{0:>10}".format(name)    # r = '    Elwood'
    r = "{0:^10}".format(name)    # r = '  Elwood  '
    r = "{0:=^10}".format(name)   # r = '==Elwood=='
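The same specifiers apply to numbers, where the precision and type parts become useful. The sketch below is illustrative only (the values and variable names are made up); it shows the kind of fixed-width numeric output that a frequency table might use.

```python
price = "{0:>8.2f}".format(3.14159)    # width 8, 2 decimal places, right-aligned
padded = "{0:05d}".format(42)          # width 5, padded with leading zeros
signed = "{0:+d}".format(7)            # always show the sign
row = "{base}: {pct:6.1f}%".format(base="G", pct=25.0)   # named slots with a spec

print(repr(price))
print(repr(padded))
print(repr(signed))
print(repr(row))
```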