CSC1015F – Chapter 5, Strings and Input

  1. CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za

  2. The String Data Type Used for operating on textual information • Think of a string as a sequence of characters To create string literals, enclose them in single, double, or triple quotes as follows: • a = "Hello World" • b = 'Python is groovy' • c = """Computer says 'Noooo'"""

  3. Comments and docstrings • It is common practice for the first statement of a function to be a documentation string describing its usage. For example: def hello(): """Hello World function""" print("Hello") print("I love CSC1015F") This is called a "docstring" and can be printed thus: print(hello.__doc__)
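Laid out with its indentation, the docstring example can be run as follows. This is a minimal sketch that reuses the `hello` function from the slide; nothing else is assumed.

```python
def hello():
    """Hello World function"""
    print("Hello")
    print("I love CSC1015F")

# The docstring is stored on the function object and can be read back.
print(hello.__doc__)
```

The docstring must be the first statement in the function body for Python to store it in `__doc__`.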

  4. Comments and docstrings • Try printing the doc string for functions you have been using, e.g.: print(input.__doc__) print(eval.__doc__)

  5. Checkpoint Str1: Strings and loops. What does the following function do? def oneAtATime(word): for c in word: print("give us a '",c,"' ... ",c,"!", sep='') print("What do you have? -",word)

  6. Checkpoint Str1a: Indexing examples What does this function do? def str1a(word): for i in word: if i in "aeiou": continue print(i,end='')

  7. Some BUILT IN String functions/methods • s.capitalize(): Capitalizes the first character. • s.count(sub): Counts the number of occurrences of sub in s. • s.isalnum(): Checks whether all characters are alphanumeric. • s.isalpha(): Checks whether all characters are alphabetic. • s.isdigit(): Checks whether all characters are digits. • s.islower(): Checks whether all characters are lowercase. • s.isspace(): Checks whether all characters are whitespace.

  8. Some BUILT IN String functions/methods • s.istitle(): Checks whether the string is a title-cased string (first letter of each word capitalized). • s.isupper(): Checks whether all characters are uppercase. • s.join(t): Joins the strings in sequence t with s as a separator. • s.lower(): Converts to lowercase. • s.lstrip([chrs]): Removes leading whitespace or characters supplied in chrs. • s.upper(): Converts a string to uppercase.

  9. Some BUILT IN String functions/methods • s.replace(oldsub,newsub): Replaces all occurrences of oldsub in s with newsub. • s.find(sub): Finds the index of the first occurrence of sub in s.
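A short sketch of a few of the methods listed above in action. The sample string and the variable names are chosen only for illustration.

```python
s = "hello world"

capitalized = s.capitalize()         # first character upper-cased
o_count = s.count("o")               # number of times "o" occurs
first_o = s.find("o")                # index of the first "o"
shouted = s.upper()
swapped = s.replace("l", "L")        # every "l" replaced
joined = "-".join(["a", "b", "c"])   # "-" is the separator
trimmed = "   padded".lstrip()       # leading whitespace removed

print(capitalized, o_count, first_o, shouted, swapped, joined, trimmed)
print("2023".isdigit(), "abc1".isalpha(), "Hello World".istitle())
```

None of these methods changes `s` itself: strings are immutable, so each call returns a new value.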

  10. BUILT IN String functions/methods Try printing the doc string for str functions: print(str.isdigit.__doc__)

  11. The String Data Type As a string is a sequence of characters, we can access individual characters • called indexing • form: <string>[<expr>] • Indexing starts at 0, so the last character in a string of n characters has index n-1

  12. String functions: len • len tells you how many characters there are in a string: len("Jabberwocky") len("Twas brillig and the slithy toves did gyre and gimble in the wabe")
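Indexing and `len` fit together as in the following sketch. The string "Python" and the variable names are illustrative choices, not part of the slides.

```python
word = "Python"

n = len(word)          # number of characters in the string
first = word[0]        # indexing starts at 0
last = word[n - 1]     # so the last character sits at index n-1

print(n, first, last)

# An index of n or more is out of range and raises IndexError.
try:
    word[n]
except IndexError as err:
    print("IndexError:", err)
```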

  13. Checkpoint Str2: Indexing examples What does this function do? def str2(word): for i in range(0,len(word),2): print(word[i],end='')

  14. More Indexing examples - indexing from the end What is the output of these lines? greet = "Hello Bob" greet[-1] greet[-2] greet[-3]

  15. Checkpoint Str3 What is the output of these lines? def str3(word): for i in range(len(word)-1,-1,-1): print(word[i],end='')

  16. Chopping strings into pieces: slicing The previous examples can be done much more simply: slicing indexes a range – returns a substring, starting at the first position and running up to, but not including, the last position.
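A small sketch of the "up to, but not including" rule. The string and the cut position 6 are arbitrary illustrative choices.

```python
text = "Jabberwocky"

head = text[0:6]    # characters at indices 0..5
tail = text[6:]     # from index 6 to the end
whole = text[:]     # both bounds omitted: a copy of the whole string

print(head, tail, whole)

# Because the end position is excluded, the two halves rejoin exactly.
print(head + tail == text)
```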

  17. Examples - slicing What is the output of these lines? greet = "Hello Bob" greet[0:3] greet[5:9] greet[:5] greet[5:] greet[:]

  18. Checkpoint Str4: Strings and loops. What does the following function do? def sTree(word): for i in range(len(word)): print(word[0:i+1])

  19. Checkpoint Str5: Strings and loops. What does the following code output? def sTree2(word): step=len(word)//3 for i in range(step,step*3+1,step): for j in range(i): print(word[0:j+1]) print("**\n**\n") sTree2("strawberries")

  20. More info on slicing • The slicing operator may be given an optional stride, s[i:j:stride], that causes the slice to skip elements. • Then, i is the starting index; j is the ending index; and the produced subsequence is the elements s[i], s[i+stride], s[i+2*stride], and so forth until index j is reached (which is not included). • The stride may also be negative. • If the starting index is omitted, it is set to the beginning of the sequence if stride is positive or the end of the sequence if stride is negative. • If the ending index j is omitted, it is set to the end of the sequence if stride is positive or the beginning of the sequence if stride is negative.
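One way to read the stride rule is as a loop over `range(i, j, stride)`. The helper below is a hypothetical illustration (its name is not from the slides) and assumes all three values are given explicitly as in-range, non-negative indices.

```python
def slice_by_loop(s, i, j, stride):
    """Rebuild s[i:j:stride] by visiting s[i], s[i+stride], ... and stopping before j."""
    result = ""
    for k in range(i, j, stride):
        result = result + s[k]
    return result

a = "Jabberwocky"
print(slice_by_loop(a, 0, 5, 2), a[0:5:2])      # same substring both ways
print(slice_by_loop(a, 5, 0, -2), a[5:0:-2])    # a negative stride walks backwards
```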

  21. More on slicing • Here are some examples with strides: a = "Jabberwocky" • b = a[::2] # b = 'Jbewcy' • c = a[::-2] # c = 'ycwebJ' • d = a[0:5:2] # d = 'Jbe' • e = a[5:0:-2] # e = 'rba' • f = a[:5:1] # f = 'Jabbe' • g = a[:5:-1] # g = 'ykcow' • h = a[5::1] # h = 'rwocky' • i = a[5::-1] # i = 'rebbaJ' • j = a[5:0:-1] # j = 'rebba'

  22. Checkpoint Str6: strides What is the output of these lines? greet = "Hello Bob" greet[8:5:-1]

  23. Checkpoint Str7: Slicing with strides How would you do this function in one line with no loops? def str2(word): for i in range(0,len(word),2): print(word[i],end='')

  24. Checkpoint Str8: • What does this code display? #checkpointStr8.py def crunch(s): m=len(s)//2 print(s[0],s[m],s[-1],sep='+') crunch("omelette") crunch("bug")

  25. Example: filters • Pirate, Elmer Fudd, Swedish Chef • produce parodies of English speech • How would you write one in Python?
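One possible starting point is a chain of `replace` calls. The sketch below is a toy "pirate" filter; the function name and the word substitutions are invented for illustration and are not taken from the course material.

```python
def pirate(text):
    """Toy filter: swap a few English words for pirate-sounding ones."""
    text = text.replace("Hello", "Ahoy")
    text = text.replace("my", "me")
    text = text.replace("friend", "matey")
    text = text.replace("you", "ye")
    return text

print(pirate("Hello my friend, how are you?"))
```

Because `replace` matches substrings rather than whole words, this approach would also change the "my" inside "mystery"; a more careful filter would need to work word by word.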

  26. Example: Genetic Algorithms (GAs) • GAs attempt to mimic the process of natural evolution in a population of individuals • use the principles of selection and evolution to produce several solutions to a given problem • biologically derived techniques such as inheritance, mutation, natural selection, and recombination • a computer simulation in which a population of abstract representations (called chromosomes) of candidate solutions (called individuals) to an optimization problem evolves toward better solutions • over time, those genetic changes which enhance the viability of an organism tend to predominate

  27. Bioinformatics Example: Crossover (recombination) Evolution works at the chromosome level through the reproductive process • portions of the genetic information of each parent are combined to generate the chromosomes of the offspring • this is called crossover

  28. Crossover Methods Single-Point Crossover • randomly-located cut is made at the pth bit of each parent and crossover occurs • produces 2 different offspring
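With chromosomes stored as strings of bits, single-point crossover is two slices and two concatenations. The following is a sketch under that assumption; the function name and its optional `p` parameter are hypothetical, and this is not the course's Crossover3.py.

```python
import random

def crossover(parent1, parent2, p=None):
    """Single-point crossover of two equal-length strings (length 2 or more).

    p is the cut position; if it is omitted, a random cut is chosen.
    """
    if p is None:
        p = random.randint(1, len(parent1) - 1)
    child1 = parent1[:p] + parent2[p:]   # head of parent 1, tail of parent 2
    child2 = parent2[:p] + parent1[p:]   # head of parent 2, tail of parent 1
    return child1, child2

print(crossover("11111111", "00000000", 3))
```

Passing `p` explicitly makes the result repeatable, which is convenient when checking the function by hand.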

  29. Gene splicing example (for genetic algorithms) • We can now do a cross-over! • Crossover3.py

  30. Example: palindrome program palindrome |ˈpalɪndrəʊm|noun a word, phrase, or sequence that reads the same backward as forward, e.g., madam or nurses run In Python, write a program to check whether a word is a palindrome. • You don’t need to use loops…
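One loop-free approach compares the text with its reverse, obtained with a slice of stride -1. In this sketch the function name is illustrative, and ignoring spaces and letter case is an added assumption so that phrases such as "nurses run" are accepted.

```python
def is_palindrome(text):
    """Return True if text reads the same backward as forward."""
    cleaned = text.replace(" ", "").lower()   # assumption: ignore spaces and case
    return cleaned == cleaned[::-1]

print(is_palindrome("madam"))
print(is_palindrome("nurses run"))
print(is_palindrome("python"))
```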

  31. String representation and message encoding • On the computer hardware, strings are also represented as zeros and ones. • Computers represent characters as numeric codes, a unique code for each character. • an entire string is stored by translating each character to its equivalent code and then storing the whole thing as a sequence of binary numbers in computer memory • There used to be a number of different codes for storing characters • which caused serious headaches!

  32. ASCII (American Standard Code for Information Interchange) • An important character encoding standard • uses the numbers 0-127 to represent the characters found on a typical (American) computer keyboard, as well as some special control codes used for sending and receiving information • A-Z uses values in range 65-90 • a-z uses values in range 97-122 • in use for a long time: developed for teletypes • American-centric • Extended ASCII codes have been developed

  33. Unicode • A character set that includes all the ASCII characters plus many more exotic characters • http://www.unicode.org • Python supports the Unicode standard • ord • returns numeric code of a character • chr • returns character corresponding to a code (Figure: Unicode codes for Cuneiform)
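A brief sketch of `ord` and `chr` as inverse operations; the variable names are illustrative.

```python
code_A = ord("A")             # numeric code of "A"
letter = chr(97)              # character with code 97
next_letter = chr(ord("a") + 1)

print(code_A, letter, next_letter)

# Translate every character of a string into its code.
for ch in "Bob":
    print(ch, ord(ch))
```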

  34. Characters in memory • Smallest addressable piece of memory is usually 8 bits, or a byte • how many characters can be represented by a byte?

  35. Characters in memory • Smallest addressable piece of memory is usually 8 bits, or a byte • how many characters can be represented by a byte? • 256 different values (2^8) • is this enough?

  36. Characters in memory • Smallest addressable piece of memory is usually 8 bits, or a byte • 256 different values is enough for ASCII (only a 7 bit code) • but not enough for UNICODE, with 100 000+ possible characters • UNICODE uses different schemes for packing UNICODE characters into sequences of bytes • UTF-8 most common • uses a single byte for ASCII • up to 4 bytes for more exotic characters
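The variable width of UTF-8 can be observed with `str.encode`. In this sketch the helper name and the four sample characters (an ASCII letter, an accented letter, the euro sign, and a Cuneiform sign) are illustrative choices.

```python
def utf8_length(ch):
    """Return how many bytes UTF-8 needs for the single character ch."""
    return len(ch.encode("utf-8"))

samples = ["A", "\u00e9", "\u20ac", chr(0x12000)]
for ch in samples:
    # Print the code point in hex alongside the encoded size in bytes.
    print(hex(ord(ch)), utf8_length(ch))
```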

  37. Comparing strings • conditions may compare numbers or strings • when strings are compared, the order is lexicographic • strings are put into order based on their Unicode values • e.g. "Bbbb" < "bbbb" and "B" < "a"
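The comparisons can be checked directly, together with the codes that explain them. The word list passed to `sorted` is an illustrative addition.

```python
print(ord("B"), ord("a"), ord("b"))   # uppercase letters have smaller codes

print("Bbbb" < "bbbb")     # decided by the first characters: "B" vs "b"
print("B" < "a")
print("apple" < "apply")   # first difference is "e" vs "y"

# Sorting uses the same ordering, so capitalized words come first.
words = sorted(["banana", "Cherry", "apple"])
print(words)
```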

  38. The min function… min(iterable[, key=func]) -> value min(a, b, c, ...[, key=func]) -> value With a single iterable argument, return its smallest item. With two or more arguments, return the smallest argument.

  39. Checkpoint: What do these statements evaluate as? min("hello") min("983456") min("Peanut")

  40. Example 2: DNA Reverse Complement Algorithm • A DNA molecule consists of two strands of nucleotides. Each nucleotide is one of the four molecules adenine, guanine, thymine, or cytosine. • Adenine always pairs with thymine, and guanine always pairs with cytosine. • A pair of matched nucleotides is called a base pair • Task: write a Python program to calculate the reverse complement of any DNA strand
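One possible solution is sketched below. It assumes the standard pairing (A with T, G with C), a strand written in uppercase letters A, C, G, T, and that "reverse complement" means the complement of each base read in reverse order. The function name is illustrative.

```python
def reverse_complement(strand):
    """Return the reverse complement of a DNA strand given as a string."""
    complement = ""
    for base in strand:
        if base == "A":
            complement += "T"
        elif base == "T":
            complement += "A"
        elif base == "G":
            complement += "C"
        elif base == "C":
            complement += "G"
        else:
            complement += base   # assumption: leave unknown symbols unchanged
    return complement[::-1]      # reverse with a slice of stride -1

print(reverse_complement("ATGC"))
```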

  41. Scrabble letter scores • Different languages should have different scores for the letters • how do you work this out? • what is the algorithm?

  42. Related Example: Calculating character (base) frequency • DNA has the alphabet ACGT • BaseFrequency.py
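A base frequency count can be built from `count` and `len`. This is a sketch only, not the course's BaseFrequency.py; the function name and the sample strand are made up for illustration.

```python
def base_percentage(dna, base):
    """Return the percentage of positions in dna occupied by base."""
    return 100 * dna.count(base) / len(dna)

dna = "AACGTTGA"   # made-up sample strand
for base in "ACGT":
    print(base, dna.count(base), base_percentage(dna, base))
```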

  43. Why would you want to do this? • You can calculate the melting temperature of DNA from the base pair percentage in a DNA strand • References: • Breslauer et al., Proc. Natl. Acad. Sci. USA 83, 3746-3750 • Baldino et al., Methods in Enzymol. 168, 761-777

  44. Input/Output as string manipulation • eval evaluates a string as a Python expression. • Very general and can be used to turn strings into nearly any other Python data type • The "Swiss army knife" of string conversion • eval("3+4") • Can also use Python numeric type conversion functions: • int("4") • float("4") • But the string must be a numeric literal of the appropriate form, or you will get an error • Can also convert numbers to strings with the str function
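The conversions above can be tried out as follows; the variable names are illustrative.

```python
total = eval("3+4")    # evaluated as a Python expression
whole = int("4")
real = float("4")
text = str(3.14)       # number back to string

print(total, whole, real, text)

# A string that is not a valid integer literal raises ValueError.
try:
    int("4.5")
except ValueError as err:
    print("Conversion failed:", err)
```

Because `eval` runs whatever expression it is given, `int` and `float` are the safer choice when the text comes from a user.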

  45. String formatting with format The built-in s.format() method is used to perform string formatting. • The {} are slots that show where the values will go. • You can "name" the values, or access them by their position (counting from zero). >>> a = "Your name is {0} and your age is {age}" >>> a.format("Mike", age=40) 'Your name is Mike and your age is 40'

  46. Example 4: Better output for Calculating character (base) frequency • BaseFrequency2.py

  47. More on format You can add an optional format specifier to each placeholder using a colon (:) to specify column widths, decimal places, and alignment. The general format is: [[fill]align][sign][0][width][.precision][type] where each part enclosed in [] is optional. • The width specifier specifies the minimum field width to use • the align specifier is one of '<', '>', or '^' for left, right, and centered alignment within the field. • An optional fill character fill is used to pad the space

  48. More on format For example: name = "Elwood" • r = "{0:<10}".format(name) # r = 'Elwood    ' • r = "{0:>10}".format(name) # r = '    Elwood' • r = "{0:^10}".format(name) # r = '  Elwood  ' • r = "{0:=^10}".format(name) # r = '==Elwood=='

  49. format: type specifier indicates the type of data.
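A few common type specifiers are sketched below, combined with width and precision. The codes `d`, `f`, `e`, `x`, `b`, and `%` are standard Python format types; the values and variable names are illustrative.

```python
as_int = "{0:d}".format(42)            # decimal integer
as_fixed = "{0:8.3f}".format(3.14159)  # fixed point, width 8, 3 decimal places
as_exp = "{0:e}".format(1234.5)        # scientific notation
as_hex = "{0:x}".format(255)           # hexadecimal
as_bin = "{0:b}".format(5)             # binary
as_pct = "{0:.1%}".format(0.25)        # percentage with 1 decimal place

for value in [as_int, as_fixed, as_exp, as_hex, as_bin, as_pct]:
    print(repr(value))   # repr shows any padding spaces
```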
