Explore the world of Python programming with this tutorial. Learn data types, operators, NLP basics, and more. Set up an IDE, use variables, strings, lists, dictionaries, and sets in Python. Discover the power of Python for programming tasks.
Natural Language Processing: the technology that makes computers listen and talk as human beings do.
Tutorial I: Natural Language Processing With Python
Xu Ruifeng
Harbin Institute of Technology, Shenzhen Graduate School
Contents
• Welcome to Python world
• How to run Python programs
• Data Types and Operators
• Basic Syntax
• NLP with Python
Welcome to Python world
About Python
• It is named after Monty Python.
• Python is an elegant and robust programming language that delivers both the power and general applicability of traditional compiled languages and the ease of use of simpler scripting and interpreted languages.
• It is commonly defined as an object-oriented scripting language, a definition that blends support for OOP with an overall orientation toward scripting roles.
Welcome to Python world
• Python is often compared to Perl and Ruby. Python is engineering, not art. Some of its key distinguishing features include:
• Easy to read: very clear, readable syntax
• Easy to learn and use: simple structure and a clearly defined syntax
• Portable: runs everywhere (Windows, Linux, Mac, ...)
• Extensible: extensions and modules are easily written in C or C++ (or Java for Jython, or .NET languages for IronPython)
How to run Python programs • Downloading and Installing Python The most obvious place to get all Python-related software is at the main Web site at http://python.org.
How to run Python programs
• Three main ways
• The simplest way is to start the interpreter interactively and enter one line of Python at a time for execution.
• Another way is to run a script written in Python.
• The third way is to use an IDE (for example Eclipse + PyDev): http://www.cnblogs.com/sevenyuan/archive/2009/12/10/1620939.html
Data Types and Operators
• Arithmetic operators: + - * / % **
• Comparison operators: < <= > >= == != <> (<> is an older spelling of != that exists only in Python 2)
• Logical operators: and or not
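The operators listed above can be tried out in a few lines. This is a minimal sketch written in Python 3 syntax (the slides use Python 2); the variable names are chosen only for illustration.

```python
# Python 3 syntax. In Python 2, 7 / 2 would give 3 (integer division).
a, b = 7, 2

quotient = a / b         # division
remainder = a % b        # modulo
power = a ** b           # exponentiation

in_range = 0 < a <= 10   # comparison operators can be chained
both = (a > b) and not (a == b)   # logical operators combine conditions

print(quotient, remainder, power, in_range, both)
```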
Data Types and Operators
• Program Output, the print Statement, and "Hello World!"
• Program Input and the raw_input() Built-in Function
num = raw_input('Now enter a number: ')
print 'Doubling your number: %d' % (int(num) * 2)
• Comments
• # a one-line comment
• ''' a multi-line block, written as a triple-quoted string '''
Data Types and Operators
• Variables and Assignment
Python is dynamically typed, meaning that no pre-declaration of a variable or its type is necessary. The type (and value) are initialized on assignment. Assignments are performed using the equal sign.
counter = 0
miles = 1000.0
name = 'Bob'
kilometers = 1.609 * miles
print '%f miles is the same as %f km' % (miles, kilometers)
Multiple assignment:
x = y = z = 1
a, b, c = 1, 2, 'a string'
Data Types and Operators
• Numbers
• Strings
• Lists
• Tuples
• Dictionaries
• Sets
Data Types and Operators
• Numbers
Python long integers should not be confused with the C long type. If you are familiar with Java, a Python long is similar to a number of the BigInteger class: its size is limited only by available memory.
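A short sketch of what arbitrary-size integers mean in practice. It is written in Python 3 syntax, where the separate long type of Python 2 has been merged into int; the variable name is illustrative.

```python
# Python 3 syntax: int has unlimited precision (Python 2 called this type long).
big = 2 ** 100

print(big)                # far larger than a 64-bit C long can hold
print(big.bit_length())   # number of bits needed to represent the value
```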
Data Types and Operators
• Strings
• Subsets of strings can be taken using the index ( [ ] ) and slice ( [ : ] ) operators. Indexes start at 0 at the beginning of the string and count backward from -1 at the end.
• The plus ( + ) sign is the string concatenation operator, and the asterisk ( * ) is the repetition operator.
# strings
pystr = 'Python'
iscool = 'is cool'
print pystr[0]               # P
print iscool[1:2]            # s
print pystr + ' ' + iscool   # Python is cool
print pystr * 2              # PythonPython
More examples: StringExamples.py
Data Types and Operators
• Lists and Tuples
They are similar to arrays, except that lists and tuples can store different types of objects.
• Differences: a list is written L = [...] and a tuple T = (...). A list's elements and size can be changed, but a tuple's cannot.
• Subsets can be taken with the index ( [ ] ) and slice ( [ : ] ) operators.
# lists [] and tuples ()
aList = [1, 2, 3, 4]
print aList
print aList[0]
print aList[2:]
print aList[:3]
aTuple = ('robots', 77, 94, 'try')
print aTuple
print aTuple[:3]
print aTuple[2:]
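The difference in mutability between lists and tuples can be seen directly. This is a minimal sketch in Python 3 syntax (the slides use Python 2); the names numbers, point and changed are illustrative.

```python
# A list can be modified in place.
numbers = [1, 2, 3, 4]
numbers[0] = 10      # replace an element
numbers.append(5)    # grow the list

# A tuple cannot: item assignment raises TypeError.
point = ('robots', 77, 94, 'try')
try:
    point[0] = 'humans'
    changed = True
except TypeError:
    changed = False

print(numbers, changed)
```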
Data Types and Operators
• Dictionaries (Dic.py)
• Dictionaries (or "dicts" for short) are Python's mapping type and work like associative arrays or hashes.
• D = {key: value}
• Key: usually a number or a string. Value: any Python object.
# create dict
aDict = {'host': 'earth'}
# add to dict
aDict['port'] = 80
print aDict
print aDict.keys()
print aDict['host']
# printing the key-value pairs requires a loop
for key in aDict:
    print key, aDict[key]
Data Types and Operators • Set: A set is an unordered collection with no duplicate elements (set.py)
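The file set.py is not reproduced in the slides, so the following is only a sketch of typical set usage, in Python 3 syntax. All names are illustrative.

```python
# Building a set from a list drops the duplicate elements.
words = ['the', 'cat', 'the', 'dog']
unique = set(words)

# Sets support the usual mathematical operations.
a = {1, 2, 3}
b = {3, 4}
union = a | b          # elements in either set
intersection = a & b   # elements in both sets
difference = a - b     # elements in a but not in b

print(len(unique), union, intersection, difference)
```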
Basic Syntax
• if
if expression1:
    if_suite
elif expression2:
    elif_suite
else:
    else_suite
• while Loop
# while loop
counter = 0
while counter < 3:
    print 'loop # %d' % (counter)
    counter += 1
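The if / elif / else template above can be made concrete as follows. This is a small sketch in Python 3 syntax; the function describe is hypothetical and not part of the original slides.

```python
def describe(n):
    """Classify a number using if / elif / else."""
    if n < 0:
        return 'negative'
    elif n == 0:
        return 'zero'
    else:
        return 'positive'

labels = [describe(n) for n in (-5, 0, 7)]
print(labels)
```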
Basic Syntax
• for Loop and the range() Built-in Function
# for loop and the range() built-in function
foo = 'abc'
for c in foo:
    print c
for i in range(len(foo)):
    print foo[i], '(%d)' % i
The range() function is often used with len() to index into a string. Here, we display both the elements and their corresponding index values.
• range(start, end, step=1), for example range(2, 19, 3)
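To see what range(2, 19, 3) produces, the values can be collected into a list. The sketch below uses Python 3 syntax, where range() is lazy and list() is needed to show its contents (in Python 2 it already returns a list). The use of enumerate() is an added alternative to range(len(...)), not something the slides mention.

```python
# Start at 2, stop before 19, step by 3.
steps = list(range(2, 19, 3))
print(steps)

# enumerate() yields (index, element) pairs without indexing by hand.
foo = 'abc'
pairs = list(enumerate(foo))
for i, c in pairs:
    print(c, '(%d)' % i)
```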
Basic Syntax
• List Comprehensions
Use a for loop to put together an entire list on a single line.
# list comprehension: use a for loop to collect all the values in a list
squared = [x ** 2 for x in range(4)]
for i in squared:
    print i
# "not x % 2" keeps only the even values of x
sqdEvens = [x ** 2 for x in range(8) if not x % 2]
for i in sqdEvens:
    print i
Basic Syntax
• Functions
def addMe2Me(x):
    'apply + operation to argument'
    return (x + x)
print addMe2Me(4.25)          # 8.5
print addMe2Me('Python')      # PythonPython
print addMe2Me([-1, 'abc'])   # [-1, 'abc', -1, 'abc']
Basic Syntax
• Files and the open() and file() Built-in Functions
• File Built-in Methods
• read() / readline() / readlines()
• write() / writelines()
Note:
(1) The readlines() method does not return a string like the other two input methods. Instead, it reads all (remaining) lines and returns them as a list of strings.
(2) Line separators are preserved.
Basic Syntax
• File access modes for open()
• r: open for read
• w: open for write
• a: open for append
• +: added to a mode for read-write access
If you are a C programmer, these are the same file open modes used by the C library function fopen().
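The file methods and open modes above can be exercised together. This is a minimal sketch in Python 3 syntax; the scratch file demo.txt in a temporary directory is an assumption made so the example does not touch real files.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w') as f:    # 'w': create the file or truncate it
    f.write('first line\n')
    f.writelines(['second line\n', 'third line\n'])

with open(path, 'a') as f:    # 'a': append to the existing content
    f.write('fourth line\n')

with open(path, 'r') as f:    # 'r': read
    first = f.readline()      # one line, line separator preserved
    rest = f.readlines()      # all remaining lines, as a list of strings

print(repr(first))
print(rest)
```

Note that writelines() does not add line separators itself, which is why each string in the list ends with '\n'.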
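Basic Syntax
• Traverse
import os
# recursively visit all files under the E:\Kugou directory
# (os.path.walk exists only in Python 2; Python 3 provides os.walk instead)
def show(arg, dirname, filenames):
    print 'dirname:' + dirname
    for f in filenames:
        if os.path.isfile(os.path.join(dirname, f)):
            print '-----' + f
os.path.walk(r'E:\Kugou', show, None)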
Basic Syntax
• Class
class FooClass(object):
    version = 0.1
    def __init__(self, nm='John Doe'):
        self.name = nm
        print 'Created an instance for', nm
    def showname(self):
        print 'Your name is', self.name
        print 'My name is', self.__class__.__name__
    def showver(self):
        print self.version
    def addMe2Me(self, x):
        return x + x
fool = FooClass()
fool.showname()
fool.showver()
print fool.addMe2Me(5)
print fool.addMe2Me('xyz')
NLP with Python
• Why choose Python?
Python is a simple yet powerful programming language with excellent functionality for processing linguistic data. For example, FIND-ing.py is a five-line Python program that processes testing.txt and prints all the words ending in ing.
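FIND-ing.py itself is not shown in the slides, so the following is only a guess at what such a program might look like, in Python 3 syntax. An in-memory io.StringIO object stands in for testing.txt, whose contents are unknown; with a real file, open('testing.txt') would take its place.

```python
import io

# Stand-in for testing.txt (an assumption; the real file is not shown).
sample = io.StringIO(u"She was singing and walking\nin the spring morning\n")

found = []
for line in sample:                # iterate over the lines of the "file"
    for word in line.split():      # split each line on whitespace
        if word.endswith('ing'):   # keep the words ending in "ing"
            found.append(word)

print(found)
```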
NLP with Python
• Regular Expressions (re.py)
Regular expressions (REs) provide an infrastructure for advanced text pattern matching, extraction, and search-and-replace functionality. They enable a single RE pattern to match multiple strings.
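The three uses named above (matching, extraction, search-and-replace) map onto functions of the standard re module. The slides' own re.py is not shown, so this is an independent sketch in Python 3 syntax with an invented sample sentence.

```python
import re

text = 'Released 2009-12-10, updated 2010-01-05'
date_pattern = r'(\d{4})-(\d{2})-(\d{2})'

# Matching: find the first occurrence and read its groups.
match = re.search(date_pattern, text)
first_year = match.group(1)

# Extraction: one pattern matches every date in the text.
dates = re.findall(r'\d{4}-\d{2}-\d{2}', text)

# Search-and-replace: reorder each date as day/month/year.
swapped = re.sub(date_pattern, r'\3/\2/\1', text)

print(first_year, dates, swapped)
```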
NLP with Python
• Word Frequency (WordFrequency.py)
# word-frequency count for a single text
wordlist = open('data.txt').read().split()
wordfreq = [wordlist.count(p) for p in wordlist]
dictionary = dict(zip(wordlist, wordfreq))
aux = [(dictionary[key], key) for key in dictionary]
aux.sort()
aux.reverse()
for a in aux:
    print a
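The slide's approach calls count() once per word, which rescans the whole list each time. As an alternative (not the method used in the slides), the standard library's collections.Counter tallies all words in a single pass. The sketch uses Python 3 syntax and an inline sample string in place of data.txt.

```python
from collections import Counter

# Sample text standing in for the contents of data.txt (an assumption).
text = 'the cat saw the dog and the dog saw the cat'

counts = Counter(text.split())   # maps each word to its frequency
top = counts.most_common(1)      # most frequent word with its count

for word, freq in counts.most_common():
    print(freq, word)
```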
NLP with Python
• Web Downloads (WebDownloadHTML.py)
import urllib
url = "http://www.baidu.com"
path = r".\web.html"
urllib.urlretrieve(url, path)
NLP with Python
• Web Downloads (WebDownload.py)
# -*- coding: UTF-8 -*-
# download an image from the web into a local folder
import os, urllib
# local path for the downloaded file: "E:\img\1.jpg"
path = r'E:\img'
file_name = r'1.jpg'
dest_dir = os.path.join(path, file_name)
# URL of the image
#url="http://pic3.nipic.com/20090518/2662644_083611033_2.jpg"
url = "http://ww4.sinaimg.cn/large/99e79587tw1dx42v7j6bqj.jpg"
# download function: downLoadPicFromURL(local path, web URL)
def downLoadPicFromURL(dest_dir, URL):
    try:
        # use the URL parameter rather than the global url
        urllib.urlretrieve(URL, dest_dir)
    except Exception:
        print '\tError retrieving the URL:', URL
# run
downLoadPicFromURL(dest_dir, url)
NLP with Python
• Word Segmentation
# segs has one character per position in text;
# a '1' means a word ends after the character at that position
def segment(text, segs):
    words = []
    last = 0
    for i in range(len(segs)):
        if segs[i] == '1':
            words.append(text[last:i+1])
            last = i + 1
    words.append(text[last:])
    return words
text = "doyouseethekittyseethedoggydoyoulikethekittylikethedoggy"
seg1 = "0000000000000001000000000010000000000000000100000000000"
seg2 = "0100100100100001001001000010100100010010000100010010000"
print segment(text, seg1)
print segment(text, seg2)
NLP with Python • Regular Expressions for Tokenizing and Tagging Text
NLP with Python
• Regular Expressions for Tokenizing Text (Tokenizing.py, mmseg)
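Tokenizing.py is not reproduced in the slides, so the sketch below only illustrates the general idea of tokenizing English text with a regular expression, in Python 3 syntax. The tokenize function and its pattern are assumptions made for illustration; they do not use mmseg, which targets Chinese word segmentation.

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation marks.

    The pattern keeps an internal apostrophe inside a word (as in
    "don't") and treats any other non-space symbol as its own token.
    """
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

tokens = tokenize("Don't panic, it's only 3 o'clock!")
print(tokens)
```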
NLP with Python • Reading Tagged Corpora
NLP with Python • Named Entity Recognition
Advanced Topics • Network Programming • Internet Client Programming • Multithreaded Programming • GUI Programming • Web Programming • Database Programming • Extending Python
Thank You!