IX Language and Computer
Contents • 10.0 Introduction • 10.1 Computer-assisted language learning • 10.2 Machine translation • 10.3 Corpus linguistics • 10.4 Computer-mediated communication
10.0 Introduction: Computational linguistics • A branch of applied linguistics dealing with the computer processing of human language (Johnson & Johnson 1999) • 1. The analysis of language data so as to establish the order in which learners acquire various grammatical rules or the frequency of occurrence of some particular item • 2. The electronic production of artificial speech and the automatic recognition of human speech • 3. Research on automatic translation between natural languages • 4. Text processing and communication between people and computers
10.1.1 CAI / CAL vs. CALL • CAI: computer-assisted instruction (计算机辅助教学), the use of a computer in a teaching program. • 1. A teaching program which is presented by a computer in a sequence. The student responds, and the computer indicates whether the response is correct or not. • 2. The use of a computer to monitor students' progress and to offer directions to students.
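The CAI cycle described above (items presented in sequence, a response from the student, a correct-or-not verdict, and progress monitoring) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `run_drill` and the past-tense drill items are hypothetical and do not come from any particular CAI package.

```python
def run_drill(items, responses):
    """Present items in sequence and mark each response correct or not.

    items: list of (prompt, expected_answer) pairs (hypothetical lesson data).
    responses: the student's answers, in the same order as the items.
    Returns the per-item results and the number of correct answers.
    """
    results = []
    for (prompt, expected), response in zip(items, responses):
        # Compare case-insensitively and ignore surrounding whitespace.
        is_correct = response.strip().lower() == expected.strip().lower()
        results.append((prompt, response, is_correct))
    score = sum(1 for _, _, ok in results if ok)
    return results, score


# Hypothetical three-item drill on English past-tense forms.
ITEMS = [("go", "went"), ("eat", "ate"), ("walk", "walked")]
results, score = run_drill(ITEMS, ["went", "eated", "Walked"])

for prompt, response, ok in results:
    print(f"{prompt}: {response} -> {'correct' if ok else 'incorrect'}")
print(f"Progress: {score}/{len(ITEMS)}")
```

The score returned at the end stands in for the monitoring function in point 2; a real program would store it per student.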
CAL: computer-assisted learning (计算机辅助学习), emphasizing the use of a computer in both teaching and learning in order to help learners achieve educational objectives through their own reasoning and practice, a reflection of the newly advocated autonomous learning. • 1. Leading students through a learning task step by step, checking comprehension and offering further practice and materials. • 2. Interaction through the exploration of a subject or problem.
CALL: computer-assisted language learning (计算机辅助语言学习) • It refers to the use of a computer in the teaching or learning of a second or foreign language. • 1. Activities which parallel learning through other media but which use the facilities of the computer. • 2. Activities which are extensions or adaptations of print-based or classroom-based activities. • 3. Activities which are unique to CALL.
10.1.2 Phases of CALL development • 1. Large mainframe machines in institutions, accessed through a terminal; traditional grammatical explanation and audio-lingualism • 2. Small computers with tapes or floppy disks: portable, eclectic, pragmatic and student-oriented • 3. Cognitive problem-solving techniques and interaction among students in a group: the computer as a trigger • 4. Word processing enables students to compose and carry out their own writing; speech and moving video become available
10.1.3 Technology • Customizing, template, and authoring programs --Teachers use these programs to design their own lessons to fit their own purposes. • Computer networks --Local area networks: more interaction between teachers and students • Compact disk technology • Digitized sound • USB (universal serial bus)
10.2 Machine translation • The use of machines to translate texts from one natural language to another. • Unassisted MT, which takes pieces of text and translates them into output for immediate use with no human involvement. • Assisted MT, where a human translator cleans up after, and sometimes before, translation in order to get better-quality results. • Philosophical and religious concerns • Political concerns • Economic concerns
10.2.1 History of Development • 1. The independent work by MT researchers --Early 1950s. Limitations: hardware, memory, slow access to storage, programming languages, assistance from linguistics. --Crude dictionary-based approach, statistical methods. Low quality, thus human involvement. --Both pre-editing and post-editing were required.
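To see why a crude dictionary-based approach gives low-quality output that needs post-editing, consider the word-for-word sketch below. It is an illustration only, not a reconstruction of any historical system: the French-English `LEXICON` and the function `translate_word_for_word` are hypothetical.

```python
# Hypothetical toy French-to-English dictionary; not taken from a real system.
LEXICON = {
    "le": "the", "la": "the", "chat": "cat", "noir": "black",
    "mange": "eats", "souris": "mouse",
}


def translate_word_for_word(sentence, lexicon):
    """Replace each source word by its dictionary entry, keeping word order.

    Unknown words are passed through unchanged and flagged with '?', so a
    human post-editor can find them.
    """
    output = []
    for word in sentence.lower().split():
        output.append(lexicon.get(word, word + "?"))
    return " ".join(output)


print(translate_word_for_word("Le chat noir mange la souris", LEXICON))
```

Because the source word order is kept, the adjective stays after the noun ("cat black"), and any word missing from the dictionary comes through untranslated. Both problems are left for a human editor to repair.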
2. Towards good-quality output • Improved hardware, the first programming languages, developments in syntactic analysis • Around 1960, good quality was thought to be achievable. • Assumption: the goal of MT must be the development of fully automatic systems producing high-quality translations; the use of human assistance was regarded as an interim arrangement, and post-editing would be needed less and less. • Research emphasized the search for theories and methods for the achievement of “perfect” translation. • Bar-Hillel: critical of Fully Automatic High Quality Translation, proposed “man-machine symbiosis”
The development of translation tools • Since the 1970s, development continued in three main strands: • 1. Computer-based tools for translators --1960s, real-time interactive computer environments; 1970s, word processing; 1980s, microcomputers with networking and large storage capacity --Dictionaries and terminological databanks, multilingual word processing, management of glossaries and terminology resources, input and output communication • 2. Operational MT systems involving human assistance in various ways • 3. “Pure” theoretical research towards the improvement of MT methods
10.2.2 Research methods • 1. The linguistic approach --A test-bed for any kind of linguistic theory which attempts to account for language or grammatical rules • 2. The transfer approach • 3. The interlingual approach --An interlingua between any two languages • 4. The knowledge-based approach --Linguistic knowledge independent of context: semantic features --Linguistic knowledge that relates to context: pragmatic knowledge --Common-sense / real-world knowledge (non-linguistic knowledge)
10.2.3 MT quality: • still poor
10.2.4 MT and the Internet: • --An accelerating growth of real-time on-line translation on the Internet itself. --The Internet has a further profound impact on MT: stand-alone PCs are replaced by network computers. --Fewer “pure” MT systems, but many more computer-based tools and applications in which automatic translation is just one component.
10.2.5 Speech translation: • small-domain natural language applications.
10.2.6 MT and human translation • They can and will co-exist in relative harmony. • MT: large-scale or rapid translation, repetitive documents, lower cost, cases where the quality of the output is less important • Human translators: non-repetitive, linguistically sophisticated texts; one-off texts in specific, highly specialized technical subjects; one-to-one interchange of information; spoken language translation
10.3.1 Definition • Corpus (plural corpora): a collection of linguistic data, compiled as written texts or as a transcription of recorded speech. The main purpose of a corpus is to verify a hypothesis about language, for example, to determine how the usage of a particular sound, word, or syntactic construction varies. • Corpus linguistics deals with the principles and practices of using corpora in language study. A computer corpus is a large body of machine-readable texts. • --Crystal, David. 1992: 85. An Encyclopedic Dictionary of Language and Languages
Another definition • CORPUS (plural corpora) (1) A collection of texts, especially if complete and self-contained: the corpus of Anglo-Saxon verse. (2) Plural also corpuses. In linguistics and lexicography, a body of texts, utterances or other specimens considered more or less representative of a language, and usually stored as an electronic database. • Corpus linguistics studies data in any such corpus.
10.3.2 Criticism and revival of corpus linguistics • Chomsky: empiricism vs. rationalism --Invalidated the corpus as a source of evidence in linguistic enquiry --The description of rules in a language --Emphasis on competence rather than performance --Practicability --Ungrammatical sentences vs. new sentences
Revival of corpus linguistics • Quirk (191) Survey of English Usage (SEU) • Jan Svartvik (1975) London-Lund corpus (SEU and the Brown corpus) • Jan Svartvik: computerized the SEU
10.3.3 Concordance (共现检索) • Definition: a way of sorting data, for example alphabetically, by the words occurring in the immediate context of a given word. --Search for a particular word and retrieve all the examples of it. --This is the tool most often used in corpus linguistics to examine corpora. • Usage: comparing different usages of the same word. --Analyzing word frequencies --Finding and analyzing phrases and idioms --Creating indexes and word lists
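A concordance of this kind is often shown in keyword-in-context (KWIC) form. The sketch below is a minimal illustration rather than a real concordancer: the function `concordance`, its `width` parameter, and the sample text are all hypothetical, and the context window simply ignores sentence boundaries.

```python
import re


def concordance(text, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence.

    `width` is the number of words kept on each side. Hits are sorted
    alphabetically by the words that follow the keyword.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    hits.sort(key=lambda hit: hit[2])
    return hits


# Hypothetical three-sentence sample standing in for a corpus.
SAMPLE = ("She will run the meeting. He went for a run in the park. "
          "They run a small shop.")

for left, kw, right in concordance(SAMPLE, "run"):
    print(f"{left:>20} [{kw}] {right}")
```

Aligning the keyword in a column, as the `print` call does, makes it easy to compare different usages of the same word, here "run" as a verb and as a noun.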
10.3.4 Text encoding and annotation • Annotated corpora are corpora which have been enhanced with various types of linguistic information. --The implicit linguistic information has been made explicit through the process of concrete annotation. --Example: Claire_NP1 collects_VVZ shoes_NN2.
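Once the annotation is explicit, a program can use it directly. The sketch below splits the `word_TAG` format of the example into pairs and then selects words by tag. The function name `parse_tagged` is hypothetical, and the reading of the tags (NP1 singular proper noun, VVZ -s form of a lexical verb, NN2 plural common noun) assumes a CLAWS-style tagset, which the slide does not name.

```python
def parse_tagged(text):
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs.

    Tokens without an underscore are kept with a tag of None.
    """
    pairs = []
    for token in text.split():
        if "_" in token:
            word, tag = token.rsplit("_", 1)
        else:
            word, tag = token, None
        pairs.append((word, tag))
    return pairs


pairs = parse_tagged("Claire_NP1 collects_VVZ shoes_NN2")
print(pairs)

# Select common nouns, assuming their tags begin with 'NN'.
nouns = [word for word, tag in pairs if tag and tag.startswith("NN")]
print(nouns)
```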
Leech (1993): seven maxims in annotation of corpora • 1. Possible to remove • 2. Possible to extract the annotation by itself from the text • 3. Guidelines for the end-user • 4. How/who carried out the annotation • 5. Not infallible but potentially useful tool • 6. Based on agreed and theory-neutral principles • 7. No a priori standard
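Maxims 1 and 2 are easy to check mechanically when tags are attached as `word_TAG`. The sketch below is one possible illustration under that assumption: the pattern `TAG` and the two function names are hypothetical, and other annotation schemes (for example XML markup) would need a different pattern.

```python
import re

# Assumes tags are upper-case letters and digits attached with an underscore.
TAG = re.compile(r"_([A-Z0-9]+)\b")


def remove_annotation(tagged):
    """Maxim 1: strip the tags and recover the raw text."""
    return TAG.sub("", tagged)


def extract_annotation(tagged):
    """Maxim 2: pull out the annotation by itself, in text order."""
    return TAG.findall(tagged)


tagged = "Claire_NP1 collects_VVZ shoes_NN2."
print(remove_annotation(tagged))
print(extract_annotation(tagged))
```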
10.3.5 Roles of corpus data • Speech research --A wide selection of variables: gender, age, class, etc.; generalization --Variation within a spoken language --A sample of naturalistic speech --Large-scale quantitative study • Lexical studies --Dictionaries --Definitions --Word combinations, co-occurring words
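For the lexical-studies point about word combinations, a first approximation is to count how often two words occur next to each other. The sketch below does only that; the function `cooccurring_pairs` and the sample sentence are hypothetical, and real collocation studies normally add a statistical association measure on top of raw counts.

```python
import re
from collections import Counter


def cooccurring_pairs(text):
    """Count adjacent word pairs (bigrams) in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical sample; a real study would read a corpus file instead.
SAMPLE = "strong tea and strong coffee but powerful computers and strong tea"
counts = cooccurring_pairs(SAMPLE)

print(counts[("strong", "tea")])
print(counts[("powerful", "tea")])
```

A pair that never occurs has a count of zero, which is how a corpus can show that "strong tea" is a usual combination in the sample while "powerful tea" is not.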
Semantics --An objective approach to the study of semantics: semantic distinctions are context-related, and corpora make it possible to examine the context --Fuzziness and absoluteness: gradable • Sociolinguistics: natural quantitative data • Psycholinguistics:
10.4 Computer-mediated communication (计算机介入的信息交流) • Its study is distinguished by a focus on language and language use in computer-networked environments, and by the use of methods of discourse analysis to address that focus. • It takes a variety of forms whose linguistic properties vary depending on the kind of messaging system used and the social and cultural context embedding particular instances of use. • Mail and news
PowerPoint: an application which enables users to create slide shows on their computer screens. It is presentation-authoring software for creating graphical presentations with or without audio. • PowerPoint as a tool can be used to write outlines or create presentation visuals on the slides. • PowerPoint as a text has been broadly understood as the product created visually, graphically, acoustically, or audio-visually. • PowerPoint as a genre refers to a recurring type of activity, just like a letter, a note, etc.
Blog: mid-1990s • A weblog, or blog for short, is defined by Dan Gillmor as “an online journal comprised of links and postings in reverse chronological order, meaning the most recent posting appears at the top of the page”. • Features of blogs: 1. Post-centric 2. Arranged in reverse chronological order 3. Serial and cumulative, open-ended 4. Brief and independent narratives, some fictional, some frame of the narratives 5. Great variety in quality, content and ambition 6. Free access 7. Style is personal and informal 8. Genuine human passion
Chatroom --A chat room is an online forum where people can chat online. • Emoticons (表情符号) or smileys (笑眯眯): :) :-) :( :< :-> :c --Less punctuation, and abbreviations and acronyms: U (you), 4 (for), r (are), brb (be right back) --Short sentences, informal expressions
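Anyone analysing chat-room language by computer usually has to normalize these abbreviations first. The sketch below is a simple illustration of that step: the lookup tables and the function `expand_message` are hypothetical and contain only the forms listed on the slide.

```python
# Hypothetical lookup tables built only from the forms listed on the slide.
ABBREVIATIONS = {"u": "you", "4": "for", "r": "are", "brb": "be right back"}
EMOTICONS = {":)", ":-)", ":(", ":<", ":->", ":c"}


def expand_message(message):
    """Expand known chat abbreviations and leave emoticons untouched."""
    words = []
    for token in message.split():
        if token in EMOTICONS:
            words.append(token)
        else:
            # Look up case-insensitively; unknown tokens pass through as-is.
            words.append(ABBREVIATIONS.get(token.lower(), token))
    return " ".join(words)


print(expand_message("r U there :) brb"))
```

A plain lookup like this is deliberately naive: it would also rewrite a genuine numeral "4" as "for", so real normalization needs to take context into account.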
Summary • CAI-CAL-CALL • MT • Corpus linguistics • Concordance • CMC