Programming for Linguists An Introduction to Python
Contact • Claudia Peersman • claudia.peersman@ua.ac.be • Lange Winkelstraat 40, room L202 (2nd floor)
Literature • “Think Python. How to Think Like a Computer Scientist” by Allen B. Downey, freely available at: http://greenteapress.com/thinkpython/thinkpython.html • “Natural Language Processing with Python. Analyzing Text with the Natural Language Toolkit” by Steven Bird, Ewan Klein, and Edward Loper, freely available at: http://www.nltk.org/book
The Python programming language Part 1 • Formal vs. natural languages • The way of the program • Programming for linguists? • What is a program? • Debugging • Your first program
Formal vs. natural languages • Natural Languages: spoken languages, e.g. English, Dutch, French… • not designed by people • evolved naturally • Formal Languages: designed by people for specific applications, e.g.: • in mathematics: notation which denotes relationships among numbers and symbols • in chemistry: represent the chemical structure of molecules
Many features in common: • tokens, structure, syntax and semantics • A lot of differences:
Some Examples • 5 + 5 = 10 • H2O • 5 + 5 = 1$0 ??? • Zz ??? • Illegal tokens $ and Zz • 5 +: 5 = 10 ??? • Legal tokens, but illegal structure +:
The way of the program • Programming = the art of problem solving: • formulate problems • think creatively about possible solutions • express a solution clearly and accurately • trial and error
Low-level vs. high-level languages • Low-level languages = “machine languages”: the only languages a computer can execute directly • High-level languages like Python, Perl, Java, and C++ need to be translated into a low-level language before they can be executed. This is done by: • compilers • interpreters
An interpreter: • processes the program a little at a time • alternates between reading lines and performing computations • A compiler: • translates the high-level language completely first • once a program is compiled, it can be executed repeatedly without further translation
Programming for linguists? • aim: handle large linguistic corpora • automatic frequency counts • distribution of linguistic features across different categories, corpora • look up context • existing tools are limited, cost money
About Python… • high-level language • open source • executed by an interpreter in two ways: • interactive mode • script mode
interactive mode: • open the interpreter • >>> prompt = ready to begin • type a command • interpreter prints the results
>>> 1 + 1
2
script mode: • open a new window in the interpreter • type a number of commands • save the program as a python script: e.g. test.py • the program is executed whenever you tell the interpreter to run it • the results are printed when the script is run
Which mode to use? • interactive mode: • good for testing small parts of the program before you go on • does not save the program! • script mode: • put together all small parts of code in a sequence of instructions for the computer to execute • save your program • use it again in the future
What is a program? • a sequence of instructions that specifies how to perform a computation • for linguists: the computation can also be e.g. looking up the context of words in a text, calculating average word lengths, sentence lengths, …
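As a rough illustration of such a computation, the sketch below calculates the average word length of a short text. The sample sentence and the variable names are invented for this example, and print is written with parentheses so that the code runs both in the Python 2 used on these slides and in Python 3.

```python
# Hypothetical sample text; any short string works.
text = "the cat sat on the mat"

# Split the text into words at the spaces.
words = text.split()

# Add up the number of letters in all words.
total_letters = sum(len(word) for word in words)

# Divide by the number of words; float() avoids integer division in Python 2.
average_length = total_letters / float(len(words))

print(average_length)
```

On a text this short the result can be checked by hand: 17 letters divided by 6 words.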
Some basic instructions • input: data you type, a text you load • output: display data on the screen, send data to a file • math: perform basic mathematical operations like addition, subtraction, multiplication and division (in Python: +, -, *, /)
conditional execution: check for certain conditions and execute the appropriate instructions • repetition: perform some action repeatedly, usually with some variation Programming = breaking a large, complex task into smaller and smaller subtasks until the subtasks are simple enough to be performed with one of these basic instructions
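The basic instructions can be seen working together in a very small program. This is only a sketch: the sentence, the length threshold of 5, and the variable names are assumptions made for the example.

```python
# Input: a hypothetical piece of data to work on.
sentence = "Colorless green ideas sleep furiously"

long_words = 0

# Repetition: look at every word in turn.
for word in sentence.split():
    # Conditional execution: only count words longer than 5 letters.
    if len(word) > 5:
        # Math: add one to the running count.
        long_words = long_words + 1

# Output: show the result on the screen.
print(long_words)
```

The large task (count the long words in a sentence) has been broken into subtasks that each use one basic instruction.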
Bugs = programming errors • Debugging = process of tracking down programming errors • Three kinds of bugs: • syntax errors • runtime errors • semantic errors
Syntax errors • refer to the structure of the program and the rules about that structure • if there is even a single syntax error in your code: • Python will display an error message • the execution of your program will quit immediately
An example • parentheses: • (1 + 2) : correct syntax • 2) : syntax error • Syntax errors are very common in the beginning. The more you practice and gain experience, the fewer mistakes you will make and the faster you will find them.
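One way to watch Python accept or reject these two examples without stopping the program is to ask it to parse them with the built-in compile() function. The helper name has_valid_syntax is made up for this sketch, and it uses try/except, which goes beyond what has been introduced so far.

```python
def has_valid_syntax(source):
    """Return True if Python can parse the expression, False otherwise."""
    try:
        # compile() only checks the structure; it does not run the code.
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

balanced = has_valid_syntax("(1 + 2)")   # both parentheses present
unbalanced = has_valid_syntax("2)")      # closing parenthesis without an opening one

print(balanced)
print(unbalanced)
```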
Runtime errors • also called exceptions • do not appear until after the program has started to run • Python will display an error message • For example: you give the instruction to open a file, but you have typed in the wrong file name or wrong directory
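The file example can be sketched as follows. The file name is hypothetical and is assumed not to exist in the current directory. The try/except construction is used here only to catch the error so that it can be shown. Without it, Python would display the error message and stop.

```python
# Hypothetical file name, assumed NOT to exist in the current directory.
filename = "no_such_corpus_file.txt"

try:
    corpus = open(filename)
    corpus.close()
    outcome = "opened"
except IOError as error:
    # The syntax was fine; the problem only shows up while the program runs.
    outcome = "runtime error"
    print(error)

print(outcome)
```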
Semantic errors • The program will run perfectly, but it will not produce the results you wanted: the meaning of the program (semantics) is wrong • Tricky errors, because: • Python will not display an error message !! • you need to work backward looking at the output of the program and try to figure out what it is doing exactly
An example • Python function read( ) vs. readline( ) vs. readlines( )
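The slide only names the three methods, so the following is one possible reading of the example: all three run without an error message, but they return different things, and picking the wrong one silently changes what is counted. The sketch uses io.StringIO as a stand-in for a real file, and the two-line sample text is invented.

```python
import io

# Hypothetical two-line text standing in for the contents of a file.
sample = u"first line\nsecond line\n"

whole_text = io.StringIO(sample).read()       # one string with the entire text
first_line = io.StringIO(sample).readline()   # one string with the first line only
all_lines = io.StringIO(sample).readlines()   # a list with one string per line

print(len(whole_text))   # number of characters in the whole text
print(len(first_line))   # number of characters in the first line
print(len(all_lines))    # number of lines
```

If the goal was to count lines, only the last call gives the intended answer. The other two run perfectly and print a number, which is what makes a semantic error hard to spot.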
Debugging is as important as programming itself: • not only learn how to write a program • learn to write a program that works • learn to write a program that does what you want it to do • Always try out small pieces of code before you go on with writing your program • Try out your code on short pieces of text, so that you can verify your results manually
Your first program • open IDLE (desktop) • The first program is usually called “Hello, world!” • In Python:
>>> print "Hello, world!"
or
>>> print 'Hello, world!'
Mind the quotation marks!
This is the print statement • The quotation marks mark the beginning and the end of the text to be displayed • The quotation marks do not appear in the result
Why we teach Python: e.g. in Java: public class Hello { public static void main( String[] args ) { System.out.println( "Hello, World!" ); }}
Make some mistakes • What happens if you: • leave out one of the quotation marks • replace “ by ‘ or vice versa in one case • spell “print” wrong • double the quotation marks • double the quotation marks, but change the order
By making mistakes on purpose you will: • learn which details are important in writing program code • learn to debug more efficiently, because you get to know what the error messages mean
Try it yourselves • We will make time to try out new things as we proceed • Programming is a new way of thinking for linguists • If there is a problem or you have a question, do not hesitate to mention it immediately
Values and Types • values = basic elements of a program • e.g. print “Hello, world!” • each value has a type: • integer • string • float
Integer: whole numbers (no decimal point) • e.g. 105 • String: a sequence of characters enclosed in quotation marks • e.g. "Hello, World!" • Float: numbers with a decimal point • e.g. 10.5 • The interpreter can tell you the type of a value:
>>> type(105)
<type 'int'>
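The three example values above can be checked in a script as well. The variable names are chosen for this sketch only. Note that Python 3 prints `<class 'int'>` where the Python 2 interpreter on these slides prints `<type 'int'>`.

```python
# Ask Python for the type of each example value from the slide.
integer_type = type(105)
string_type = type("Hello, World!")
float_type = type(10.5)

print(integer_type)
print(string_type)
print(float_type)
```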
Try to find out what the type is of the following values: • “Hello!” • 3.1415 • Dag Jan • “123” • “123.456”
Try this:
>>> print 123,456
• Floats are always written with a dot, never a comma • To which kind of error could this lead? • runtime error • syntax error • semantic error
Variables • A name that refers to a value • An assignment statement creates new variables and assigns values to them • You can choose the name yourself • e.g.
>>> text = "Everything except 'Hello, world!'"
>>> age = 26
>>> pi = 3.1415
The variables now carry the values we assigned to them:
>>> print text
>>> print age
>>> print pi
• The interpreter can again tell you the type:
>>> type(text)
Variable names: • can be arbitrarily long • can contain letters, digits and underscores • have to begin with a letter (or an underscore), never with a digit • can contain uppercase letters • are case sensitive ! • If you use an illegal character in your name, you will get a syntax error message: • e.g. my name, live@
You cannot choose a name that is a keyword in Python: and, as, assert, break, class, continue, def, del, elif, else, except, exec, finally, for, from, global, if, import, in, is, lambda, not, or, pass, print, raise, return, try, while, with, yield • Tip: try to choose names which describe what the variable is used for
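Python can also check this for you: the standard keyword module knows which names are reserved in the version you are running. The list on the slide matches Python 2. In Python 3, for instance, print and exec are no longer keywords. The candidate names below are made up for the sketch.

```python
import keyword

# Hypothetical names we might want to use for variables.
candidates = ["age", "text", "class", "while"]

reserved = []
for name in candidates:
    # iskeyword() returns True if the name is reserved by Python.
    if keyword.iskeyword(name):
        reserved.append(name)

print(reserved)
```

The full list for your own Python version is available as keyword.kwlist.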
Statements • Units of code that the Python interpreter can execute • So far we have seen the print statement and the assignment statement • A program usually contains a series of statements that are executed in an order predetermined by the programmer
e.g.
>>> age1 = 20
>>> age2 = 40
>>> print age2
40
>>> average_age = (age1 + age2)/2
>>> print average_age
30
You always have to assign a value to a variable before you can work with it • Variables have to be spelled in the same way throughout the program • If you assign a new value to an existing variable, the old value is replaced • e.g.
>>> age = 20
>>> age = age + 20
>>> print age
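The first two points can be tried out with a deliberate spelling mistake. In this sketch the variable is assigned as age but then used as Age. Because names are case sensitive, Python treats Age as a variable that has never been given a value. The try/except construction is only there to catch the error so that the message can be inspected.

```python
age = 20

try:
    # "Age" is spelled differently from "age", so Python does not know it.
    print(Age)
except NameError as error:
    message = str(error)
    print(message)
```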
Operators and Operands • Operators = special symbols that represent computations • e.g. +, -, *, /, ** • Operands = the values the operator is applied to • e.g. 2 + 2 • Try 2/3
When both operands are integers, the result is again an integer: the fractional part is dropped • If you want a floating-point result, you have to make one of the operands a floating-point number:
>>> 2/3.0
0.66666666666666663
• You can also give a command at the beginning of your script: from __future__ import division
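The difference between the two kinds of division can be sketched as follows. The `//` operator is used here for the integer division because it drops the fractional part in every Python version, whereas plain `2/3` only does so in Python 2 (without the `__future__` import). The variable names are chosen for this example.

```python
# Integer (floor) division: the fractional part is dropped.
floor_result = 2 // 3

# Making one operand a float gives a floating-point result.
float_result = 2 / 3.0

# Converting an integer with float() has the same effect.
converted_result = float(2) / 3

print(floor_result)
print(float_result)
print(converted_result)
```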
Expressions • A combination of values, variables, and operators • Try:
>>> x = 5
>>> x + 1
• Now make a script of it (File > New window) and run it (Run > Run module)
In a script an expression all by itself does not print a result !!! • How can you modify the script so that it does produce a result ?
Order of Operations • The order of evaluation depends on the rules of precedence • For mathematical operators, Python follows mathematical conventions: • Parentheses • Exponentiation • Multiplication and division • Addition and subtraction
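A few small calculations show these rules at work. The numbers and variable names are arbitrary choices for this sketch.

```python
# Multiplication is done before addition.
without_parentheses = 2 + 3 * 4      # 2 + 12 = 14

# Parentheses are evaluated first and override the default order.
with_parentheses = (2 + 3) * 4       # 5 * 4 = 20

# Exponentiation is done before multiplication.
power_first = 2 * 3 ** 2             # 2 * 9 = 18
product_first = (2 * 3) ** 2         # 6 ** 2 = 36

print(without_parentheses)
print(with_parentheses)
print(power_first)
print(product_first)
```

When in doubt, adding parentheses makes the intended order explicit.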
String Operations • In general: no mathematical operations on strings • e.g. "hello" / "hi"
TypeError: unsupported operand type(s) for /: 'str' and 'str'
• Except: the + and * operators
Try: • "hello" + "hi" • "hello" * 2 • string + string = concatenation • string * int = repetition
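The same operations can be sketched in a script with other strings, so that the exercise above is left for you to try. The strings and variable names are invented, and the try/except construction is only used to catch the TypeError so that the script keeps running.

```python
# Concatenation: string + string glues the two strings together.
joined = "good" + "bye"

# Repetition: string * int repeats the string that many times.
repeated = "na" * 3

print(joined)
print(repeated)

# Other mathematical operators do not work on strings.
try:
    "good" / "bye"
    division_worked = True
except TypeError as error:
    division_worked = False
    print(error)
```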