Stefania Spina's project integrates a Dictionary of Italian Collocations into a Virtual Learning Environment (VLE) to support students' lexical competence. The project addresses the complexity of Multi-Word Units (MWUs) and the motivation to improve learners' fluency in second language acquisition (SLA). The dictionary, based on a reference corpus, offers a learner-oriented tool with statistical methodologies for learning common Italian collocations. The extraction and compilation of collocations are detailed, emphasizing the integration of the database within the VLE for language learning. The project includes automatic recognition of collocations, linguistic information display, and collocation competence assessment within the learning environment.
The Dictionary of Italian Collocations: Design and Integration in an Online Learning Environment. Stefania Spina, University for Foreigners Perugia, Italy
The Dictionary of Italian Collocations • Part of the APRIL project (“Personalised web environment for language learning”) • NLP resources as a support for the lexical competence of students of Italian within a Virtual Learning Environment (VLE). LREC 2010 - Stefania Spina - The Dictionary of Italian Collocations
Presentation outline • background and motivation • reference corpus • methodology • dictionary compilation • integration within VLE
Background • Complexity of multi-word units (MWUs): • different syntactic and semantic profiles • prototypical features: • semantic (non-)compositionality • (non-)substitutability of components by semantically similar words • (non-)insertion of external items • continuum rather than definite categories
Motivation: collocations in SLA • improve learners' fluency • examples from Italian learner corpora • preoccupata per l’esame vado a prendere una doccia (Vietnam) • target form: fare la doccia “take a shower” • ho dimenticato la macchina di fotografia (China) • target form: macchina fotografica “camera” • non-native speakers and L2 vocabulary: first single words, then more extended chunks • tendency to overuse the creative combination of isolated words • Sinclair’s open choice principle
DICI • collocations require specific pedagogical attention • Dictionary of Italian Collocations (DICI) • it is corpus-based; • it is a learner-oriented tool: a list of the most common Italian collocations, classified on a frequency basis; • it is also based on statistical methodologies (dispersion across the different textual genres represented in the corpus).
Reference corpus • Perugia corpus: POS-tagged, lemmatized
Extraction based on POS sequences • Analysis of existing list of collocations: • 150 different POS sequences • 10 most productive (75%)
Experimental methodology: 4 steps • extraction of candidate collocations from corpus; • filtering of the candidate collocations: frequency; • filtering of the candidate collocations: dispersion; • filtering of the candidate collocations: manual • 6 POS sequences • 12-million-word sample • 4 corpus sections
Collocation extraction + frequency • IMS Corpus Workbench • removing all the candidates with frequency = 1 • 41643 collocations
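The extraction and frequency steps above can be sketched in plain Python. This is only an illustration, not the IMS Corpus Workbench queries used in the project: the tag names, the two POS patterns, and the sample sentence are assumptions, since the slides do not list the six sequences that were actually queried.

```python
from collections import Counter

# Hypothetical POS patterns; the slides do not list the six sequences used.
PATTERNS = [
    ("VERB", "DET", "NOUN"),  # e.g. fare la doccia
    ("NOUN", "ADJ"),          # e.g. macchina fotografica
]

def extract_candidates(tagged, patterns=PATTERNS):
    """Count lemma n-grams whose POS tags match one of the patterns.

    `tagged` is a list of (lemma, pos) pairs from a tagged, lemmatized text.
    """
    counts = Counter()
    for pattern in patterns:
        n = len(pattern)
        for i in range(len(tagged) - n + 1):
            window = tagged[i:i + n]
            if tuple(pos for _, pos in window) == pattern:
                counts[tuple(lemma for lemma, _ in window)] += 1
    return counts

def drop_hapaxes(counts):
    """Frequency filter: remove candidates with frequency = 1."""
    return {cand: freq for cand, freq in counts.items() if freq > 1}

# Invented toy input (lemma, POS), for illustration only.
sample = [
    ("fare", "VERB"), ("il", "DET"), ("doccia", "NOUN"),
    ("e", "CONJ"),
    ("fare", "VERB"), ("il", "DET"), ("doccia", "NOUN"),
    ("con", "PREP"),
    ("il", "DET"), ("macchina", "NOUN"), ("fotografico", "ADJ"),
]
candidates = drop_hapaxes(extract_candidates(sample))
```

Working on lemmas rather than word forms means that inflected variants of the same collocation are counted together, which is presumably why the reference corpus is lemmatized as well as POS-tagged.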
Dispersion • Examples: • aggrottare la fronte “to frown” (fiction) • vincere le elezioni “to win the elections” (press) • dare una definizione “to give a definition” (academic prose)
Dispersion • Juilland’s D value (Juilland & Chang-Rodriguez, 1964)
Dispersion + frequency • D value: combined with frequency = usage • U = F × D • Usage value ≥ 2: 2047 candidate collocations • Manual selection. Final result: • list of 1553 word combinations = dictionary entries
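As a rough illustration of the dispersion filter, Juilland's D is commonly computed as D = 1 − V / √(n − 1), where V is the coefficient of variation (standard deviation divided by mean) of a candidate's frequencies in the n corpus sections. The sketch below assumes equally sized sections and raw frequencies for F; the slides do not say whether frequencies were normalized, and the function names are illustrative.

```python
import math

def juilland_d(subfreqs):
    """Juilland's D for frequencies observed in n equally sized corpus sections."""
    n = len(subfreqs)
    mean = sum(subfreqs) / n
    if n < 2 or mean == 0:
        return 0.0
    # Population standard deviation of the per-section frequencies.
    sd = math.sqrt(sum((f - mean) ** 2 for f in subfreqs) / n)
    return 1 - (sd / mean) / math.sqrt(n - 1)

def usage(subfreqs):
    """Usage coefficient U = F * D, with F the total frequency (assumed raw)."""
    return sum(subfreqs) * juilland_d(subfreqs)

# A candidate spread evenly over 4 sections versus one confined to a single section.
even = [5, 5, 5, 5]
skewed = [20, 0, 0, 0]
keep_even = usage(even) >= 2      # threshold from the slide: U >= 2
keep_skewed = usage(skewed) >= 2
```

With the same total frequency, the evenly distributed candidate keeps its full weight (D = 1), while the one found in a single section gets D close to 0 and falls below the threshold. This is the behaviour that lets the filter discard genre-bound combinations.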
Collocations list
Compilation of the Dictionary • Lexical database enriched with two kinds of data: • visible to the learner (client output) • definition, examples, part-of-speech, syntactic context of occurrence of collocations • to be processed by other applications (server) • internal syntactic configuration for automatic recognition
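One way to picture the two kinds of data in an entry is a record with learner-facing fields and server-only fields. The field names, the `max_gap` parameter, and the example sentence below are hypothetical; the slides describe the categories of information but not the actual database schema.

```python
from dataclasses import dataclass, asdict

@dataclass
class CollocationEntry:
    # Visible to the learner (client output).
    collocation: str
    definition: str
    examples: list
    pos: str
    syntactic_context: str
    # Processed by other applications (server side); names are assumptions.
    lemma_pattern: tuple = ()
    max_gap: int = 0

CLIENT_FIELDS = ("collocation", "definition", "examples", "pos", "syntactic_context")

def client_view(entry):
    """Return only the fields meant to be shown to the learner."""
    data = asdict(entry)
    return {name: data[name] for name in CLIENT_FIELDS}

entry = CollocationEntry(
    collocation="fare la doccia",
    definition="take a shower",
    examples=["Faccio la doccia ogni mattina."],  # invented example sentence
    pos="VERB DET NOUN",
    syntactic_context="verb + direct object",     # illustrative value
    lemma_pattern=("fare", "doccia"),
    max_gap=3,
)
view = client_view(entry)
```

Keeping the recognition pattern in the same record as the learner-facing data means the highlighter and the dictionary lookup can share one database, which matches the "DB + tagger" pairing mentioned for the server.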
DB integration in the VLE • Virtual Learning Environment: • web application specifically devoted to language learning • LELE (Linguistically-Enhanced Learning Environment) • provide language learners with additional NLP resources, in order to improve their linguistic competence • receptive and productive learning activities concerning the recognition and the active use of collocations
LELE Features • to automatically recognize and highlight multi-word units in written Italian texts; • to show additional linguistic information about the selected collocations; • to generate collocation tests for collocational competence assessment of second language learners. • …
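The recognition, highlighting, and test-generation features could work along the following lines. This is a minimal sketch that assumes the text has already been lemmatized by the tagger and that each collocation is stored as a sequence of lemmas with a maximum number of intervening tokens; all function names and the `max_gap` parameter are hypothetical and do not describe the actual LELE implementation.

```python
def find_collocation(tokens, pattern, max_gap=0):
    """Return indices of tokens matching `pattern` lemmas in order, or None.

    `tokens` is a list of (word_form, lemma) pairs; up to `max_gap` extra
    tokens may intervene between consecutive components.
    """
    for start in range(len(tokens)):
        if tokens[start][1] != pattern[0]:
            continue
        idxs, pos = [start], start
        for lemma in pattern[1:]:
            stop = min(pos + 2 + max_gap, len(tokens))
            nxt = next((j for j in range(pos + 1, stop) if tokens[j][1] == lemma), None)
            if nxt is None:
                break
            idxs.append(nxt)
            pos = nxt
        else:
            return idxs
    return None

def highlight(tokens, idxs, mark="*"):
    """Wrap the matched word forms in a marker for display."""
    hits = set(idxs)
    return " ".join(f"{mark}{w}{mark}" if i in hits else w
                    for i, (w, _) in enumerate(tokens))

def gap_fill(tokens, idxs, slot=0):
    """Blank out one component of the collocation to build a test item."""
    target = idxs[slot]
    question = " ".join("____" if i == target else w
                        for i, (w, _) in enumerate(tokens))
    return question, tokens[target][0]

# Invented sentence, pre-lemmatized: (word form, lemma).
sentence = [("Ho", "avere"), ("fatto", "fare"), ("una", "uno"),
            ("lunga", "lungo"), ("doccia", "doccia")]
match = find_collocation(sentence, ("fare", "doccia"), max_gap=2)
shown = highlight(sentence, match)
question, answer = gap_fill(sentence, match)
```

Allowing a gap lets the matcher find collocations that admit inserted material, such as fare una lunga doccia, while `max_gap=0` would suit fixed units that admit no insertion.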
LELE scheme (diagram labels: VLE, DB + tagger, browser, server, client)
Conclusions • Next steps: • apply the same methodology to the whole corpus, for all 10 selected POS sequences • test of the LELE system with students: starting January 2011 • Further research • refine statistical measures • assign collocations to different levels of competence • other tools (productive tasks)
Stefania Spina, E-learning and Language Technologies, University for Foreigners Perugia, Italy. stefania.spina@unistrapg.it http://april.unistrapg.it
References • Juilland, A. & Chang-Rodriguez, E. (1964). Frequency Dictionary of Spanish Words. The Hague: Mouton & Co. • Meunier, F. & Granger, S. (2008). Phraseology in foreign language learning and teaching. Amsterdam: John Benjamins. • Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam: John Benjamins. • Pazos Bretaña, M. & Pamies Bertrán, A. (2008). Combined statistical and grammatical criteria. In S. Granger & F. Meunier (Eds.), Phraseology. An interdisciplinary perspective. Amsterdam: John Benjamins, pp. 391-406.
Background: prototypical features • semantic (non-)compositionality: tagliare la corda “run away”; aprire la porta “open the door” • (non-)substitutability: {fare|porre|rivolgere|formulare} una domanda “ask a question”; camera oscura “dark room”, *stanza oscura • (non-)insertion of external items: fare una lunga, calda, riposante doccia “take a long, hot, restful shower”; sistema *molto operativo “operating system”