Chapter 5 Strings CSC1310 Fall 2009
Strings
• A string is an ordered collection of characters that stores and represents text-based information.
• Strings in Python are immutable (i.e., they cannot be changed in place) sequences (i.e., they have a left-to-right order).
Single and Double Quotes
• Single and double quotes are interchangeable.
>>> 'Python', "Python"
• Empty literal: '' or "".
• Either quote style lets you embed a quote character of the other type inside a string:
>>> "knight's", 'knight"s'
• Python automatically concatenates adjacent string literals:
>>> "Title" ' of' " the book"
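The quoting rules on this slide can be checked with a short sketch. It uses the print() form so that it runs under both Python 2 and Python 3 (the slides themselves use the Python 2 print statement); the variable names are illustrative only.

```python
# Single and double quotes produce equal strings.
single = 'Python'
double = "Python"
same = (single == double)

# A quote of the other type can be embedded without escaping.
possessive = "knight's"
quoted = 'knight"s'

# Adjacent string literals are joined automatically; no + is needed.
title = "Title" ' of' " the book"

print(same)       # True
print(title)      # Title of the book
print(len(''))    # 0 (the empty literal)
```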
Escape Sequences
• An escape sequence encodes a special byte that cannot easily be typed on a keyboard.
• A backslash (\) followed by one or more characters is replaced by a single character in the resulting string.
• \n - Newline
• \t - Horizontal tab
>>> s = 'a\nb\tc' # 5 characters!
>>> s
'a\nb\tc'
>>> print s
a
b       c
Escape Sequences
• \\ - Backslash
• \' - Single quote
• \" - Double quote
• \a - Bell
• \b - Backspace
• \r - Carriage return
• \xhh - Character with hex value hh
• \0 - Null (binary zero byte)
>>> print 'a\0m\0c' # 5 characters!
a m c
(The null bytes are non-printing; how they are displayed depends on the terminal.)
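A small sketch confirming that each escape sequence counts as a single character. It is written to run under both Python 2 and Python 3; the variable names are illustrative only.

```python
s = 'a\nb\tc'
length_s = len(s)             # 5: 'a', newline, 'b', tab, 'c'

nulls = 'a\0m\0c'
length_nulls = len(nulls)     # 5: the two null bytes are counted too

hex_a = '\x41'                # hex value 41 is the letter 'A'
backslash = '\\'              # one backslash character

print(repr(s))                # 'a\nb\tc'
print(length_s)               # 5
print(length_nulls)           # 5
print(hex_a)                  # A
print(len(backslash))         # 1
```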
Raw Strings
>>> print 'C:\temp\new.txt' # \t and \n are read as tab and newline
>>> print 'C:\\temp\\new.txt' # doubled backslashes give the intended path
• A raw string suppresses escape-sequence processing.
• Format: r"text" or r'text' (R"text" or R'text')
>>> print r'C:\temp\new.txt'
• Raw strings are useful for directory paths and text pattern matching.
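The three spellings of the path can be compared directly. This is a minimal sketch that runs under both Python 2 and Python 3; the variable names are illustrative only.

```python
plain = 'C:\temp\new.txt'        # \t becomes a tab, \n becomes a newline
escaped = 'C:\\temp\\new.txt'    # each \\ is one literal backslash
raw = r'C:\temp\new.txt'         # raw string: backslashes are kept as typed

same_path = (escaped == raw)

print(same_path)      # True
print(len(raw))       # 15
print(len(plain))     # 13, because \t and \n each collapsed to one character
print(raw)            # C:\temp\new.txt
```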
Triple-Quoted Strings or Block Strings
• A block string is a convenient literal format for coding multiline text data (error messages, HTML or XML code).
• Format: """text""" or '''text'''
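A minimal sketch of a block string; the message text is made up for illustration. Line breaks typed inside the triple quotes become newline characters in the string.

```python
message = """Error: file not found.
Check the path and try again."""

# The line break typed above is stored as '\n'.
lines = message.split('\n')
line_count = len(lines)

# Single and double quotes can appear inside without escaping.
html = '''<a href="index.html">Bob's page</a>'''

print(line_count)     # 2
print(lines[0])       # Error: file not found.
print(html)
```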
Unicode Strings
• Unicode ("wide" character) strings support non-Latin characters that require more than one byte in memory.
• Format: u"text" or u'text' (U"text" or U'text')
• An expression that mixes Unicode and normal strings has a Unicode string as its result.
>>> 'fall' + u'08'
u'fall08'
>>> str(u'fall08'), unicode('fall08')
('fall08', u'fall08')
Basic operations: len(), +, *, in
• len(str) returns the length of a string str.
>>> len('abc')
• str1 + str2 (concatenation) creates a new string by joining operands str1 and str2.
>>> 'abc' + 'def', len('abc' + 'def')
• str * i (repetition) creates a new string made of i copies of str.
>>> print '-' * 80
• str1 in str2 (membership) returns True if str1 is a substring of str2; otherwise, it returns False. The test is case-sensitive.
>>> day = 'Monday 8th Sept 2008'
>>> 'sep' in day
>>> 'th Sep' in day
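The results of the expressions on this slide can be sketched as follows. The code runs under both Python 2 and Python 3; the variable names are illustrative only.

```python
day = 'Monday 8th Sept 2008'

length = len('abc')            # 3
joined = 'abc' + 'def'         # 'abcdef'
rule = '-' * 80                # a line of 80 dashes

has_lower = 'sep' in day       # False: membership is case-sensitive
has_th = 'th Sep' in day       # True: found inside "8th Sept"

print(length)                  # 3
print(joined)                  # abcdef
print(len(rule))               # 80
print(has_lower)               # False
print(has_th)                  # True
```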
Indexing
• Each character in the string can be accessed by its position (offset), called its index.
>>> S = 'STRINGINPYTHON'
>>> S[14], S[-15] # both offsets are out of range for 14 characters: IndexError
• A negative offset counts backward from the end (offset -x is the xth character from the end).
>>> S[0], S[10], S[13], S[-5], S[-14]
('S', 'T', 'N', 'Y', 'S')
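A short sketch of positive and negative offsets on the same string, including the out-of-range case. It runs under both Python 2 and Python 3; the variable names are illustrative only.

```python
S = 'STRINGINPYTHON'
n = len(S)                      # 14 characters, valid offsets 0..13

first, last = S[0], S[-1]       # 'S' and 'N'

# A negative offset -x refers to the same character as offset len(S) - x.
same_char = S[-5] == S[n - 5]   # both are 'Y'

# Offsets outside the range -14..13 raise IndexError.
try:
    S[14]
    out_of_range = False
except IndexError:
    out_of_range = True

print(first + last)             # SN
print(same_char)                # True
print(out_of_range)             # True
```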
Slicing
• Slicing allows us to extract an entire section (substring) in a single step.
• str1[offset1:offset2] returns the substring of str1 that starts at offset1 (inclusive) and ends at offset2 (exclusive).
>>> S[1:3] # extract items at offsets 1 and 2
>>> S[1:] # all items past the first
>>> S[:3] # extract items at offsets 0, 1, 2
>>> S[:-1] # fetch all but the last item
>>> S[-1:] # extract the last item
>>> S[:] # a copy of the string
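The slices above, evaluated on the string from the indexing slide. This sketch assumes S = 'STRINGINPYTHON' and runs under both Python 2 and Python 3.

```python
S = 'STRINGINPYTHON'

middle = S[1:3]        # 'TR': offsets 1 and 2, offset 3 is excluded
rest = S[1:]           # everything past the first character
head = S[:3]           # 'STR': offsets 0, 1, 2
all_but_last = S[:-1]  # 'STRINGINPYTHO'
tail = S[-1:]          # 'N'
whole = S[:]           # equal to S

# Unlike indexing, slicing tolerates out-of-range offsets.
clipped = S[10:100]    # 'THON'

print(middle)          # TR
print(head + tail)     # STRN
print(whole == S)      # True
print(clipped)         # THON
```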
In Python 2.3 and Later
• A third index gives the stride (step).
>>> S = '0123456789'
>>> S[1:10:2]
'13579'
• To reverse a string, use a step of -1.
>>> "hello"[::-1]
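A brief sketch of stride slicing, including a negative step. It runs under both Python 2 and Python 3; the variable names are illustrative only.

```python
digits = '0123456789'

odds = digits[1:10:2]        # every second character from offset 1
evens = digits[::2]          # every second character from offset 0
backward = 'hello'[::-1]     # step -1 walks from the end to the start
down = digits[8:2:-2]        # offsets 8, 6, 4 (offset 2 is excluded)

print(odds)                  # 13579
print(evens)                 # 02468
print(backward)              # olleh
print(down)                  # 864
```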
String Conversion
• You cannot add a number and a string together.
• int(str1) converts string str1 into an integer.
• float(str1) converts string str1 into a floating-point number.
• Older techniques: the functions string.atoi(str1) and string.atof(str1).
>>> int("42") + 1, float("42") + 1
• str(i) converts the number i to a string (backquotes, `i`, also work).
>>> "fall0" + str(8)
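A minimal sketch of the conversions, plus the error raised when a string and a number are mixed without converting. It runs under both Python 2 and Python 3; the variable names are illustrative only.

```python
total = int("42") + 1        # 43
ratio = float("42") + 1      # 43.0
label = "fall0" + str(8)     # 'fall08'

# Mixing the two types directly is an error; Python does not guess.
try:
    "fall0" + 8
    mixed_failed = False
except TypeError:
    mixed_failed = True

# int() rejects text that is not a valid integer literal.
try:
    int("4.2")
    bad_literal = False
except ValueError:
    bad_literal = True

print(total)                 # 43
print(ratio)                 # 43.0
print(label)                 # fall08
print(mixed_failed)          # True
print(bad_literal)           # True
```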
Changing Strings
• You cannot change a string in place by assigning a value to an index (S[0] = 'X' raises an error).
• To modify a string, create a new one with concatenation and slicing.
>>> s = 'spam'
>>> s = s + " again" # same as s += " again"
>>> s
>>> s = s[:3] + " is here" + s[-1:]
>>> s
• Alternatively, format a string.
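The steps on this slide, traced in a short sketch that also shows the failed in-place assignment. It runs under both Python 2 and Python 3; the variable names are illustrative only.

```python
s = 'spam'

# Item assignment is not supported because strings are immutable.
try:
    s[0] = 'X'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build new strings instead and rebind the name.
s = s + ' again'                  # 'spam again'
step1 = s
s = s[:3] + ' is here' + s[-1:]   # 'spa' + ' is here' + 'n'
step2 = s

# Replacing the first character by slicing and concatenation.
changed = 'X' + 'spam'[1:]        # 'Xpam'

print(assignment_failed)          # True
print(step1)                      # spam again
print(step2)                      # spa is heren
print(changed)                    # Xpam
```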
Formatting Strings
• "format string" % (objects to insert)
>>> s = "Sales tax"
>>> s = "%s is %d percent!" % (s, 8)
• %s string
• %d decimal integer
• %i integer (%u - unsigned integer)
• %o octal integer
• %x hex integer (%X - uppercase hex integer)
• %e floating-point exponent (%E - uppercase)
• %f floating-point decimal (%F - uppercase)
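A short sketch of several of the format codes listed above. It runs under both Python 2 and Python 3; the sample values are chosen only for illustration.

```python
s = 'Sales tax'
sentence = '%s is %d percent!' % (s, 8)

octal = '%o' % 8               # '10'
hex_lower = '%x' % 255         # 'ff'
hex_upper = '%X' % 255         # 'FF'
exponent = '%e' % 1234.5       # '1.234500e+03'
decimal = '%f' % 1.5           # '1.500000' (six decimal places by default)

print(sentence)                # Sales tax is 8 percent!
print(octal)                   # 10
print(hex_lower)               # ff
print(hex_upper)               # FF
print(exponent)                # 1.234500e+03
print(decimal)                 # 1.500000
```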
String Methods (p. 91, Table 5-4)
• str1.replace(str2, str3) returns a new string in which each occurrence of substring str2 in str1 is replaced with str3.
>>> 'string in python'.replace('in', 'XXXX')
• str1.find(str2) returns the offset where substring str2 first appears in str1, or -1 if it is not found.
>>> where = 'string in python'.find('in')
>>> 'string in python'[:where]
• An optional third argument to replace limits the number of replacements.
>>> 'in1in2in3in4in5'.replace('in', 'XX', 3)
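The replace and find calls from this slide, with their results. This sketch runs under both Python 2 and Python 3; the variable names are illustrative only.

```python
text = 'string in python'

replaced = text.replace('in', 'XXXX')    # both occurrences are replaced
where = text.find('in')                  # offset of the first occurrence
prefix = text[:where]                    # everything before that offset
missing = text.find('java')              # -1 when the substring is absent

# The third argument caps the number of replacements.
limited = 'in1in2in3in4in5'.replace('in', 'XX', 3)

print(replaced)      # strXXXXg XXXX python
print(where)         # 3
print(prefix)        # str
print(missing)       # -1
print(limited)       # XX1XX2XX3in4in5
print(text)          # string in python (the original is unchanged)
```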
String Methods
• str1.upper(), str1.lower(), str1.swapcase()
• str1.count(substr, start, end)
• str1.endswith(suffix, start, end), str1.startswith(prefix, start, end)
• str1.index(substr, start, end)
• str1.isalnum(), str1.isalpha(), str1.isdigit(), str1.islower(), str1.isspace(), str1.isupper()
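A sketch exercising the methods listed on this slide. The sample strings are made up for illustration, and the code runs under both Python 2 and Python 3.

```python
word = 'banana'

upper = 'Spam'.upper()                 # 'SPAM'
lower = 'Spam'.lower()                 # 'spam'
swapped = 'Spam'.swapcase()            # 'sPAM'

count_all = word.count('an')           # 2 occurrences
count_from_2 = word.count('an', 2)     # 1: the search starts at offset 2

starts = word.startswith('ban')        # True
ends = word.endswith('na')             # True

position = word.index('na')            # 2, like find()

# Unlike find(), index() raises ValueError when nothing is found.
try:
    word.index('xyz')
    index_failed = False
except ValueError:
    index_failed = True

checks = ('abc123'.isalnum(), 'abc'.isalpha(), '123'.isdigit(),
          'abc'.islower(), ' \t'.isspace(), 'ABC'.isupper())

print(swapped)                         # sPAM
print(count_all)                       # 2
print(position)                        # 2
print(index_failed)                    # True
print(all(checks))                     # True
```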
String Module
• maketrans()/translate()
>>> import string
>>> convert = string.maketrans(" _-", "_-+")
>>> text = "It is a two_part - one_part"
>>> text.translate(convert)
'It_is_a_two-part_+_one-part'
String Module
• Constants
• digits '0123456789'
• octdigits '01234567'
• hexdigits '0123456789abcdefABCDEF'
• lowercase 'abcdefghijklmnopqrstuvwxyz'
• uppercase 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
• letters lowercase+uppercase
• whitespace ' \t\n\r\v\f'
>>> import string
>>> x = raw_input()
>>> x in string.digits
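One caveat about the last example: x in string.digits is a substring test, so it behaves as a digit check only for single characters. The sketch below shows the difference; all_digits is a hypothetical helper introduced here for illustration, not part of the string module. It runs under both Python 2 and Python 3.

```python
import string

def all_digits(text):
    """Hypothetical helper: True if text is non-empty and every character is a digit."""
    return len(text) > 0 and all(ch in string.digits for ch in text)

single = '7' in string.digits          # True: one digit character
run = '12' in string.digits            # True: '12' is a substring of '0123456789'
not_run = '13' in string.digits        # False: '13' is not a substring
empty = '' in string.digits            # True: the empty string is in every string

print(single)                          # True
print(run)                             # True
print(not_run)                         # False
print(empty)                           # True
print(all_digits('13'))                # True
print(all_digits(''))                  # False
print(all_digits('1a'))                # False
```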