LING 408/508: Computational Techniques for Linguists Lecture 1 8/20/2012
Course web page • Go to: • http://www.u.arizona.edu/~echan3/508.html • Not using D2L in this course
Outline • Fill out survey • Course introduction • Syllabus • Some advice • Schedule office hours • Python
Survey • Please fill out the survey and hand it in by the next class. • You don’t need to list every single course you’ve ever taken.
Outline • Fill out survey • Course introduction • Syllabus • Some advice • Schedule office hours • Python
“Computational Techniques for Linguists” • Learn computer programming • Use the Python language • (Relatively) easy to learn • Write programs quickly • Good for working with text • Commonly used in computational linguistics
Linguistic applications • Use corpora • Corpus: large electronic database of language N.B.: plural of corpus is “corpora”, not “corpuses” • Examples: • Brown Corpus, 1 million words of mixed English texts • CELEX, dictionary of English words and their pronunciations • CHILDES: transcriptions of child / caretaker speech • Penn-Helsinki Parsed Corpus of Middle English • Investigate frequencies of words and constructions
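As a rough illustration of investigating word frequencies, here is a minimal sketch in Python. The sample sentence and the function name word_frequencies are made up for this example; a real study would read text from a corpus file instead.

```python
from collections import Counter

# Hypothetical sample text standing in for a real corpus file.
sample_text = "the cat sat on the mat and the dog sat too"

def word_frequencies(text):
    """Count how often each whitespace-separated word occurs in text."""
    words = text.lower().split()
    return Counter(words)

frequencies = word_frequencies(sample_text)

# Show the two most frequent words together with their counts.
for word, count in frequencies.most_common(2):
    print(word, count)
```

Splitting on whitespace is the simplest possible way to find words; it ignores punctuation, which a more careful program would need to handle.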
Computational Linguistics / Natural Language Processing • Given a text: • Morphological analysis • Part of speech tagging • Parsing • Semantic analysis • etc. • Taught in LING 538 Computational Linguistics and LING 539 Statistical Natural Language Processing • Need to know how to program in order to build these kinds of systems
Computer programming = algorithmic thinking • Algorithm: precise, step-by-step series of instructions to accomplish a task • Joke: Why did the freshman die in the shower? • Because he followed the instructions on the shampoo bottle: lather, rinse, repeat • The shampoo bottle’s instructions were not a proper algorithm • Should say: lather, rinse, repeat until hair is clean
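The corrected shampoo instructions can be sketched as a loop with a stopping condition. This is only an illustration: the function wash_hair and the idea of a numeric dirt_level are invented for the example.

```python
def wash_hair(dirt_level):
    """Lather and rinse until dirt_level reaches zero; return the number of washes."""
    washes = 0
    while dirt_level > 0:   # "repeat until hair is clean"
        dirt_level -= 1     # assume each lather-and-rinse removes one unit of dirt
        washes += 1
    return washes

print(wash_hair(3))
```

Without the condition on the while line, the loop would have no way to stop, which is exactly the problem with "lather, rinse, repeat".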
Who is this class for? • Primarily intended for HLT students • Master’s program in Human Language Technology, Department of Linguistics • (graduate) Master’s, and Accelerated Master’s • No assumption of previous experience in programming • Need to get up to speed very quickly in programming skills • Be competitive with students who have taken multiple undergraduate courses in programming
In the news • http://www.nytimes.com/2011/08/20/technology/finding-fake-reviews-online.html • Problem: fake product reviews for web sites • Researchers created a data set of 400 positive but fake reviews, and 400 genuine reviews • Human judges couldn’t tell them apart • Team developed computer program that got 90% correct • Companies offered jobs to these students = $$$
Prerequisites • Some knowledge of elementary linguistics • Concepts like: constituent, grammar, morpheme • Prior experience in programming is not assumed • But previous coursework in technical topics such as mathematics, logic, etc. probably means that programming will come (relatively) easy to you
Outline • Fill out survey • Course introduction • Syllabus • Some advice • Schedule office hours • Python
This class is graduate-level • 400/500-level: higher expectations • Although this is an introductory course, we will progress through the material quickly • 100/200-level intro programming courses are offered in C SC and ISTA departments • Assignments are time-consuming • Characteristic of all beginning programming courses • Strongly consider withdrawing if you fall behind early on • Not possible to catch up • Not possible to catch up • Not possible to catch up
But grading for undergrads is easier • Short assignments: 50% • These will be straightforward • Do all of these and you’ll be well on your way to a good course grade • Weekly assignments: 50% • These will be substantially harder • Less work: fewer problems for undergrads
How to get maximum points on assignments • Answer all questions on assignments. • I will be looking for answers to anything that is being asked or requested of you
Learning how to program • Learning to program requires working on programming problems • Cannot learn how to program merely by: • Reading a book • Listening to lectures • Reading code • Learning programming involves: • Repeated attempts to correct one’s mistakes • Repeatedly refining code until you have a clear solution • Initial solutions are often obscure and too long • In every program there’s a beautiful solution struggling to break through
Cooking analogy • You want to bake a wedding cake. • You have an idea of what you want it to look like. • You have an idea of what ingredients and cooking tools you will need. • You don’t have a cookbook. • Need to develop a recipe = procedure = algorithm. • Algorithm: precise, step-by-step series of instructions to accomplish a task • Initial attempts at producing recipes may need to be refined.
Ask for help • It is common for novice programmers to get stuck • You don’t know why your program doesn’t work • You spend hours trying different things, but you don’t know why it still doesn’t work • Save time: get help • Work with your classmates • Go to office hours • Have discussion with instructor and other students • Send e-mail to the instructor
Outline • Fill out survey • Course introduction • Syllabus • Some advice • Schedule office hours • Python
Schedule office hours • I will create a Doodle poll and e-mail the link to you • No guarantees that I can select times that fit everyone’s schedule
Outline • Fill out survey • Course introduction • Syllabus • Some advice • Schedule office hours • Python
Python language • Developed by Guido van Rossum • Released in early 1990s
Why Python • Datatypes such as strings, lists, and hash tables are built in to the language as primitives • Not verbose, not too concise • Easy to read • Gentle learning curve • Widely used in Natural Language Processing community • Examples of NLP in Python vs. other languages http://nltk.googlecode.com/svn/trunk/doc/howto/nlp-python.html
“Hello World” in Python and Java • Python: print('hello world') • Java: public class HelloWorld { public static void main(String[] args) { System.out.println("Hello World!"); } }
Installation • Download from www.python.org • Current versions are 2.7 and 3.2 • I will cover Python 3.2 • Similarities/differences • For novice programmers, the two versions are largely similar, but certain language constructs are different (and incompatible) • Example: • Python 2: print "Hello World!" • Python 3: print("Hello World!") • Once you become proficient in Python, it is not hard to switch to a different version • Much existing code is written in Python 2
Set Python environment variable • Create a directory mypythoncode for your Python code • e.g. C:\Users\Arizona\Desktop\508\mypythoncode\ • Set environment variable so Python knows where to find your code • Windows Vista: • Right-click on My Computer • Choose "Advanced system properties" • Add a new User variable called PYTHONPATH • Set the value of the variable to the full path of the mypythoncode directory
Set Python environment variable • OS X terminal, Unix/Linux, etc.: • csh • edit .cshrc • setenv PYTHONPATH /home/me/mypythoncode • bash • edit .bashrc or .bash_profile • export PYTHONPATH=/home/me/mypythoncode
Running Python from the command line • Mac Terminal, Unix/Linux, etc. • Which command you type in to run python 3 depends on the system that you are using
IDLE, a Python Graphical User Interface (GUI) • Consists of Python shell (for running commands) and text editor • Available for Mac and Windows within the standard Python installation
Other installations and editors • There are other ways to run Python. • Obtain a different implementation of Python and/or IDE (integrated development environment) • PyDev + Eclipse IDE • ActiveState Python • etc. • Doesn’t matter what you use • Demonstrations in earliest lectures will be in IDLE only
Using Python interactively • Whether run from command line, or through IDLE
    >>> a = 3
    >>> a
    3
    >>> print('hello')
    hello
    >>>
Test setting of Python environment variable >>> import sys >>> sys.path ['C:\\Python25\\Lib\\idlelib', 'C:\\Users\\Arizona\\Desktop\\508\\mypythoncode', 'C:\\Windows\\system32\\python25.zip', 'C:\\Python25\\DLLs', 'C:\\Python25\\lib', 'C:\\Python25\\lib\\plat-win', 'C:\\Python25\\lib\\lib-tk', 'C:\\Python25', 'C:\\Python25\\lib\\site-packages']
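Instead of reading through the sys.path list by eye, the same check can be done in code. The following is a sketch only; the helper name is_on_search_path is invented for this example, and it assumes the PYTHONPATH variable was set as described earlier.

```python
import os
import sys

def is_on_search_path(directory, search_path=None):
    """Return True if directory is one of the entries in search_path."""
    if search_path is None:
        search_path = sys.path
    target = os.path.normcase(os.path.abspath(directory))
    # Empty entries (which stand for the current directory) are skipped here.
    return any(os.path.normcase(os.path.abspath(entry)) == target
               for entry in search_path if entry)

# PYTHONPATH may list several directories separated by os.pathsep.
for directory in os.environ.get("PYTHONPATH", "").split(os.pathsep):
    if directory:
        print(directory, "on sys.path:", is_on_search_path(directory))
```

If nothing is printed, PYTHONPATH is not set in the environment that started Python; a newly set variable usually takes effect only after Python or IDLE is restarted.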
Python script files • Create a file called myfile.py
    # myfile.py
    # these are comments
    a = 3
    a
    print('hello')
• If using IDLE in Windows, be sure to add a .py extension to the file • Do not save as a text file, otherwise it becomes myfile.py.txt
Run script file • Command line: [mint][~]> python myfile.py • Within IDLE: • Go to window for myfile.py • Press F5, or click on Run > Run Module in menu bar • Will execute in Python shell • Output:
    >>> =========== RESTART =================
    >>> hello
• Note that the value of a is not shown, unlike when the code was typed directly in the shell
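To see a value when running a script file, print it explicitly. A minimal sketch, reusing the variable a from myfile.py:

```python
a = 3
a           # a bare expression: evaluated, but a script displays nothing for it
print(a)    # print() displays the value whether run as a script or in the shell
```

Only the interactive shell echoes the value of a bare expression, so scripts need print() for any output you want to see.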
Order of execution in script files • Code executes from top to bottom • Incorrect: at print statement, b hasn’t been defined
    print(b)
    b = 7
• Correct: define the variable first
    b = 7
    print(b)
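The incorrect ordering stops the program with a NameError. The sketch below catches that error so that both orderings can be shown in a single run; the try/except statement is used here only for demonstration and is not something the slide introduces.

```python
try:
    print(b)                  # b is used before it has been defined
except NameError as error:
    message = str(error)
    print("Error:", message)  # Python reports that the name b is not defined

b = 7                         # define the variable first...
print(b)                      # ...then using it works
```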
Using IDLE (in Windows) • F5 execute Python script file • control-C cancel execution • control-D quit IDLE • alt-p previous in command history • alt-n next in command history • alt-3 comment a block of code • alt-4 uncomment a block of code
Comments • Comments are ignored by the Python interpreter but are useful for describing the purpose of a section of code
    a = 1 # everything after hash mark is a comment
    b = 2
    c = 3
    # statement below does not execute because
    # it is within a comment
    # d = 4
Comment a block of code • Select a block of code • Press alt-3, now it is commented • Select and press alt-4 to uncomment
Beware of Windows… • When you repeatedly execute code and cancel execution (with control-C), sometimes the processes continue anyway, and after a while IDLE won’t let you run your code • Solution: • press ctrl-alt-delete • start Task Manager • select lowest pythonw.exe processes • click on End Process • sometimes you might have to restart IDLE
Until next class… • Download and install Python • Send me e-mail if you have any problems • Buy an intro book (if desired) • Recommended reading for this week: • Zelle chapters 1, 2, 3, 6 • (But my lecture content will not be based on Zelle)