Contextualising concordances for corpusCALL

  1. Contextualising concordances for corpusCALL
     Hans Paulussen & Piet Desmet
     K.U.Leuven / KULAK ALT Research Center on CALL
     EUROCALL, September 2006, Universidad de Granada

  2. Overview
     • Corpora for CALL: samples
     • Types of sample rendition
     • XML: new opportunities

  3. Corpora for CALL /1
     • Corpora for learning activities
       • before: preparation of exercises
       • during: corpus material as part of the learning activity
       • after: corpus material for feedback

  4. Corpora for CALL /2
     • Corpora as reference material
       • learner dictionaries
       • learner grammars

  5. Corpora during learning activities
     • corpus is part of learning activity
       • Mariana (Vordingborg Gymnasium, Denmark)
       • http://www.vordingbg-gym.dk/km/ict4lt/
     • corpus supports learning activity
       • NEDERLEX (FUNDP Namur)
       • http://obelix.droit.fundp.ac.be/droit1/index.php

  7. REBECA
     • Ressources Electroniques Bilingues Extraites de Corpus Alignés (bilingual electronic resources extracted from aligned corpora)
     • Parallel corpus: 5,000,000 Dutch, 5,000,000 French
     • automatic corpus selection
     • sentence alignment

  9. REBECA (diagram labels): aligned bilingual corpus; filtered KWIC files; course texts; lexicon; resource links

  10. http://corpora.informatik.uni-leipzig.de/

  11. Drinking glasses /1
      • At the winery, Evxinograd's director, Ivan Penkov, is pouring out glasses of his 20-year-old brandy. (source: Wall Street Journal 1991)
      • Drinking glasses are plastic and are weighted on the bottom so that they are tough to knock over. (source: Wall Street Journal 1991)
      • Noting that Winston Churchill regularly drank several glasses of whiskey, brandy, champagne, and at least one high-ball during the working day, the Economist observed on March 4, "he could never have been trusted to run the Pentagon." (source: Wall Street Journal 1989)
      • The two drain their glasses in one gulp. (source: Wall Street Journal 1991)

  12. Drinking glasses /2
      • Yet, consumerism lived, even if it didn't fill too many champagne glasses. (source: Wall Street Journal 1988)
      • After Investcorp took over, Tiffany played down its $10 wine glasses to concentrate on the high-priced diamonds and gold jewelry that had made it famous. (source: Wall Street Journal 1990)
      • At a black-tie benefit hosted by ARA Services Inc. a few weeks ago, Chairman Joseph Neubauer and members of his management team exuded confidence as they moved from one dinner table to the next, shaking hands, patting backs and clinking glasses. (source: Wall Street Journal 1988)

  13. Spectacles
      • Mr. Brown wears tinted aviator glasses, combat boots, and a Soldier of Fortune cap on his closely shaved head. (source: Wall Street Journal 1991)
      • She must wear prism glasses to correct double vision caused by the accident. (source: Wall Street Journal 1991)
      • In meetings, he often can be seen chewing on the end of his reading glasses; sometimes, he speaks so softly that he can't be heard. (source: Wall Street Journal 1990)
      • His horn-rimmed glasses and rakish beret were irresistibly photogenic. (source: Wall Street Journal 1987)

  14. Problematic glasses
      • And its replaceable filters are good for only about 100 glasses. (source: Wall Street Journal 1991)
      • I still have to find his glasses and keys for him. (source: Wall Street Journal 1988)
      • The police confiscated her watch and glasses. (source: Wall Street Journal 1989)
      • He plans to hand out 100 glasses when he performs in Washington, D.C., in December at the Kennedy Center's Mozart Festival. (source: Wall Street Journal 1991)
      • He often hands out glasses to his audience and has them play chords. (source: Wall Street Journal 1991)
      • The glasses were my idea. (source: Wall Street Journal 1988)

  15. Rendering authentic text samples
      • Linked samples
      • Extracted samples
      • Embedded samples

  16. Linked samples
      • The sample is linked to the original document (e.g. a PDF document)
      • Original context & layout
      • Full context
      • Problem: sample skimming

  17. Extracted samples
      • The example is extracted from the original document (e.g. KWIC concordance)
      • Sample shown in immediate context
      • Layout: not authentic
      • Context: limited
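
A minimal sketch of what an extracted KWIC (keyword in context) sample looks like in practice. The function name `kwic` and the `width` parameter are illustrative assumptions, not part of the presentation; the two sentences are taken from the "glasses" slides above.

```python
import re

def kwic(text, keyword, width=30):
    """Return one concordance line per whole-word hit of `keyword` in `text`.

    Each line shows at most `width` characters of left and right context,
    which is exactly why the context of an extracted sample is limited.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so that all keywords line up in one column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

sample = ("The two drain their glasses in one gulp. "
          "She must wear prism glasses to correct double vision caused by the accident.")

for line in kwic(sample, "glasses", width=25):
    print(line)
```

The keyword column is aligned, but the original layout and everything beyond the fixed window are lost, which matches the two drawbacks listed on the slide.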

  18. Embedded samples
      • The example is embedded in the original document
      • Sample shown in full context
      • Layout: authentic
      • Problem: recreating and indexing the document
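
One way to picture an embedded sample is to keep the whole text and only mark the hit inside it. The sketch below does this for plain text; the function name `embed_hit` and the CSS class `hit` are hypothetical, and a real document would also need its layout recreated, which is the problem the slide points to.

```python
import re
from html import escape

def embed_hit(document_text, keyword, css_class="hit"):
    """Return the full text, escaped for XHTML, with every whole-word hit
    of `keyword` wrapped in a <span> so a stylesheet can highlight it."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    parts = []
    last = 0
    for m in pattern.finditer(document_text):
        # Text before the hit is kept in full: no context is cut away.
        parts.append(escape(document_text[last:m.start()]))
        parts.append(f'<span class="{css_class}">{escape(m.group(0))}</span>')
        last = m.end()
    parts.append(escape(document_text[last:]))
    return "".join(parts)

text = "The police confiscated her watch and glasses. The glasses were my idea."
marked = embed_hit(text, "glasses")
print(marked)
```

Unlike a KWIC line, the output still contains every sentence of the source text; only the markup around the hits is new.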

  19. XML -> XHTML
      • XML: Extensible Markup Language
      • Stylesheets:
        • CSS: Cascading Style Sheets
        • XSLT: Extensible Stylesheet Language Transformations
      • XHTML
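
The XML to XHTML step can be sketched as follows. The slides name XSLT for this transformation; the code below is a Python stand-in using only the standard library, not the authors' stylesheet. The element names come from the poem example later in the deck; the function name `poem_to_xhtml`, the CSS class names, and the reading of `<r/>` as an indentation marker are assumptions.

```python
import xml.etree.ElementTree as ET

POEM = """<poème>
  <préambule>
    <titre>Chanson d'automne</titre>
    <auteur>Paul Verlaine</auteur>
  </préambule>
  <corps>
    <stance>
      <ligne>Les sanglots longs</ligne>
      <ligne>Des violons</ligne>
      <ligne><r/>De l'automne</ligne>
    </stance>
  </corps>
</poème>"""

def poem_to_xhtml(xml_text):
    """Map the poem markup onto a small XHTML tree and serialise it."""
    src = ET.fromstring(xml_text)
    html = ET.Element("html", {"xmlns": "http://www.w3.org/1999/xhtml"})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = src.findtext("préambule/titre")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = src.findtext("préambule/titre")
    ET.SubElement(body, "p", {"class": "auteur"}).text = src.findtext("préambule/auteur")
    for stance in src.iter("stance"):
        div = ET.SubElement(body, "div", {"class": "stance"})
        for ligne in stance.findall("ligne"):
            p = ET.SubElement(div, "p")
            # Assumption: an empty <r/> marks a line to be indented via CSS.
            if ligne.find("r") is not None:
                p.set("class", "indent")
            p.text = "".join(ligne.itertext())
    return ET.tostring(html, encoding="unicode")

print(poem_to_xhtml(POEM))
```

The content stays in one XML source, and the presentation is decided at rendering time, here through the class names a CSS stylesheet could target.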

  20. Web reinvents standardisation
      • SGML: Standard Generalized Markup Language (1968; ISO in 1986)
      • HTML: Hypertext Markup Language (1993)
      • XML: Extensible Markup Language (1998)
      • XHTML: Extensible HTML

  25. <?xml version="1.0" encoding="ISO-8859-1"?>
      <!DOCTYPE poème SYSTEM "poemfr.dtd">
      <poème>
        <préambule>
          <titre>Chanson d'automne</titre>
          <recueil>Poèmes saturniens</recueil>
          <date>1866</date>
          <auteur>Paul Verlaine</auteur>
        </préambule>
        <corps>
          <stance>
            <ligne>Les sanglots longs</ligne>
            <ligne>Des violons</ligne>
            <ligne><r/>De l'automne</ligne>
            <ligne>Blessent mon coeur</ligne>
            <ligne>D'une langueur</ligne>
            <ligne><r/>Monotone.</ligne>
          </stance>

  26. poemfr.dtd
      <?xml version="1.0" encoding="ISO-8859-1"?>
      <!-- poemfr.dtd : DTD pour poésie M. Goossens -->
      <!ELEMENT poème (préambule, corps)>
      <!ELEMENT préambule (titre, recueil?, date?, auteur)>
      <!ELEMENT titre (#PCDATA)>
      <!ELEMENT recueil (#PCDATA)>
      <!ELEMENT date (#PCDATA)>
      <!ELEMENT auteur (#PCDATA)>
      <!ELEMENT corps (stance|ligne)+>
      <!ELEMENT stance (ligne)+>
      <!ELEMENT ligne (#PCDATA|r)*>
      <!ELEMENT r EMPTY>

  27. xpath
      $ xpath -e '//*/stance[contains(., "langueur")]' Verlaine1.xml
      Found 1 nodes in Verlaine1.xml:
      -- NODE --
      <stance>
        <ligne>Les sanglots longs</ligne>
        <ligne>Des violons</ligne>
        <ligne><r />De l'automne</ligne>
        <ligne>Blessent mon coeur</ligne>
        <ligne>D'une langueur</ligne>
        <ligne><r />Monotone.</ligne>
      </stance>
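
The same lookup can be sketched in Python. The standard library's `xml.etree.ElementTree` supports only a subset of XPath and has no `contains()`, so the test is done on the concatenated text of each `<stance>`, which is what `contains(., "langueur")` checks. The helper name `stanzas_containing` is hypothetical, and the second stanza of the poem is added here only to show that it is not matched; it does not appear on the slides.

```python
import xml.etree.ElementTree as ET

STANZAS = """<corps>
  <stance>
    <ligne>Les sanglots longs</ligne>
    <ligne>Des violons</ligne>
    <ligne><r/>De l'automne</ligne>
    <ligne>Blessent mon coeur</ligne>
    <ligne>D'une langueur</ligne>
    <ligne><r/>Monotone.</ligne>
  </stance>
  <stance>
    <ligne>Tout suffocant</ligne>
    <ligne>Et blême, quand</ligne>
    <ligne><r/>Sonne l'heure,</ligne>
  </stance>
</corps>"""

def stanzas_containing(root, word):
    """Return every <stance> element whose full text contains `word`."""
    return [
        stance
        for stance in root.iter("stance")
        if word in "".join(stance.itertext())
    ]

root = ET.fromstring(STANZAS)
for stance in stanzas_containing(root, "langueur"):
    # The whole stanza is returned, not just the matching line.
    print(ET.tostring(stance, encoding="unicode"))
```

Returning the enclosing `<stance>` rather than the single matching `<ligne>` is what gives the learner the hit together with its structural context.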

  34. Conclusion
      • Recreating an authentic document containing indexed samples is feasible
      • At what cost?
      • Full control of production cycle
      • Text and images?
      • Optimisation of on-the-fly rendition
