Decisions in Python

Decisions in Python Comparing Strings – ASCII History

Comparing strings • Comparing numbers is easy, how do you compare strings? • Remember that everything inside the computer is a number – yes, everything! • So every character on the keyboard (and more) are represented as a number inside • Does that mean you have to memorize a lot of numbers to compare strings? No!

Comparing strings • If the operator is ==, then the strings have to be exactly identical – same spacing, same case, everything has to be the same to give a True (!= gives a False then) • If the operator is one of the other relational operators, how do you tell if one string is less than another? greater than? By using their ASCII codes!

ASCII • American Standard Code for Information Interchange • Early computer manufacturers each had their own code for characters • Users didn’t care as long as it worked, until they wanted to share / trade / sell information with someone who had a different brand of computer

ASCII • Some small companies made a living translating data from one company’s code to another • The computing community decided they needed a standard code for characters for all computers • Having one helps the Internet send messages between dissimilar computers • Other codes were considered but ASCII was the winner in the mid 1960’s • ASCII has 256 codes, each character takes up 1 byte

ASCII • ASCII is designed so that the alphabetic characters are in numeric order also. So ‘A’ < ‘B’ < ‘C’ < ‘D’ < … < ‘Z’ • Other characters: the lower case alphabetic characters are also in numeric order ‘a’ < ‘b’ < ‘c’ < ‘d’ < … < ‘z’ • The lower case letters come after the upper case letters in ASCII – an arbitrary decision

ASCII • What about digits? ‘0’, ‘1’, ‘2’, ‘3’, …? • They have codes also. ASCII is designed so those digits are also in numeric order ‘0’ < ‘1’ < ‘2’ < ‘3’ < … < ‘9’ • The codes for the digits are less than the upper case letters – an arbitrary decision • Just one more! ‘ ‘ = 1 space, is the lowest printable character, lower than all the other characters discussed

ASCII • Summary of the order you should know • First comes the space character • Then the digits ‘0’ ‘1’ ‘2’ ‘3’ ‘4’ ‘5’ ‘6’ ‘7’ ‘8’ ‘9’ • Then the upper case ‘A’, ‘B’, ‘C’, ‘D’, … ‘Z’ • Then the lower case ‘a’, ‘b’, ‘c’, ‘d’, …, ‘z’ • There are other characters in the code – control codes, punctuation, etc. you do NOT have to know any of those.

Unicode • For a while ASCII’s 256 codes was sufficient. It contained the English (Roman) alphabet, digits and some pnc. • What about other countries / languages? • Eventually it was decided that a code that could represent more languages’ symbols had to be created. • Unicode came about in the 90’s – first 256 characters are the ASCII code • Each character takes up 2 bytes (or more!) so Unicode has at least 65 thousand character codes

What does Unicode mean to me? • It takes up more room than a plain ASCII text file, which means more room on your HD, more time uploading and downloading • As a programmer, you may run into “wide characters” or “fat characters” which will require different libraries / methods to handle • If you get documents in foreign languages, you will be better able to get them translated correctly (Unicode does not do translation!)

Decisions in Python