
Ishida & Matsubara Laboratory – Ari Hautasaari

Computer-Mediated Multilingual Communication. Case: Pangaea. Ishida & Matsubara Laboratory – Ari Hautasaari. Contents: target system introduction, quantitative analysis of the statistics, research problems, solution discussion, future research. @ COCON Karasuma, 27.3.2009.

Presentation Transcript


  1. Computer-Mediated Multilingual Communication. Case: Pangaea. Ishida & Matsubara Laboratory – Ari Hautasaari. Contents: target system introduction, quantitative analysis of the statistics, research problems, solution discussion, future research. @ COCON Karasuma, 27.3.2009

  2. Target System Introduction

  3. Target System Introduction: Pangaea Multilingual BBS
     • The target system for the case study is the Pangaea organization's multilingual bulletin board system (BBS) for intra-organization communication and CSCW.
     • The organization works in 5 countries: Japan, Korea, Austria, Kenya and, most recently, Malaysia.
     • Pangaea has approximately 240 volunteers around the world, 130 in Japan and 110 in other countries.
     • Pangaea has introduced its multilingual and multicultural services to over 3000 children all over the world.
     “Pangaea is the non-profit organization headquartered in Tokyo, Japan. The people who participate in a project of pangaea by the various shapes from each country are called “pangaean”. This site is the place with which pangaean in all over the world can communicate seamlessly.”

  4. Target System Introduction: Pangaea Multilingual BBS
     Multilingual BBS architecture: a basic BBS design in which all messages are stored under BBS Topics and BBS Messages.
     • The BBS is accessible only to users with an assigned user ID and password.
     • Users choose the interface language and the default message language when they log in to the system.
     • Four languages are available: Japanese, Korean, English and German. More languages may be added in the future.
     • The interface language is independent of the message language, so German-speaking users can use the German interface and post messages in English.
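
The architecture on this slide can be sketched as a small data model. This is a minimal sketch under stated assumptions, not the Pangaea implementation: the class names (`User`, `Topic`, `Message`), the `post` helper and the ISO 639-1 language codes are hypothetical. What it illustrates is that the interface language and the message language are separate fields, so a user with a German interface can still post in English.

```python
from dataclasses import dataclass, field

# Hypothetical codes for the four languages on the slide (ISO 639-1 assumed).
SUPPORTED_LANGUAGES = {"ja", "ko", "en", "de"}


@dataclass
class User:
    user_id: int
    interface_language: str        # language of menus, labels and buttons
    default_message_language: str  # language the user writes and reads messages in

    def __post_init__(self) -> None:
        for lang in (self.interface_language, self.default_message_language):
            if lang not in SUPPORTED_LANGUAGES:
                raise ValueError(f"unsupported language: {lang}")


@dataclass
class Message:
    author_id: int
    source_language: str
    text: str


@dataclass
class Topic:
    title: str
    messages: list[Message] = field(default_factory=list)


def post(topic: Topic, user: User, text: str) -> Message:
    """Add a message tagged with the user's message language (not the UI language)."""
    message = Message(user.user_id, user.default_message_language, text)
    topic.messages.append(message)
    return message


if __name__ == "__main__":
    # A German-interface user who posts in English, as in the slide's example.
    report = Topic("Activity report")
    author = User(1, interface_language="de", default_message_language="en")
    print(post(report, author, "The workshop went well.").source_language)  # en
```

Keeping the two language settings as independent fields is what lets the BBS decouple the UI from the posting language; the validation in `__post_init__` simply mirrors the fixed set of four languages.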

  5. Target System Introduction: Pangaea Multilingual BBS
     Flowchart for posting messages:
     • Topic starters post their messages in their mother tongue.
     • The plain message is translated into the 3 other target languages (in effect into all four languages, since the “translation” into the source language is the original text).
     • Users read the messages and post answers in their native or preferred language, depending on which language they have chosen in the BBS.
     • Users can correct machine-translated messages by hand through the BBS.
     Messages by source language:
       Language | Posts
       Japanese | 408
       Korean   | 32
       English  | 25
       German   | 0
       Total    | 465
     • Japanese is the main source language.
     • German is supported but was never used as a source language.
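
The posting flow can be sketched as a fan-out: the source text is kept as-is for its own language and machine-translated into the other three. This is an illustrative sketch only; `fan_out`, `correct` and the `translate` callable are hypothetical stand-ins, since the slide does not name the MT service or its interface.

```python
from typing import Callable

LANGUAGES = ("ja", "ko", "en", "de")

# Hypothetical MT interface: translate(text, source_language, target_language) -> text.
Translator = Callable[[str, str, str], str]


def fan_out(text: str, source: str, translate: Translator) -> dict[str, str]:
    """Return one version of a new message per supported language.

    The source-language version is the original text; only the other three
    languages are machine-translated.
    """
    if source not in LANGUAGES:
        raise ValueError(f"unsupported source language: {source}")
    return {
        target: text if target == source else translate(text, source, target)
        for target in LANGUAGES
    }


def correct(versions: dict[str, str], language: str, corrected: str) -> None:
    """Replace one machine-translated version with a user's hand correction."""
    if language not in versions:
        raise KeyError(language)
    versions[language] = corrected


if __name__ == "__main__":
    def stub_translate(text: str, src: str, dst: str) -> str:
        # Placeholder standing in for a real MT service.
        return f"<{src}->{dst}> {text}"

    versions = fan_out("活動報告です。", "ja", stub_translate)
    correct(versions, "en", "This is an activity report.")
    for lang, body in versions.items():
        print(lang, body)
```

In this sketch `correct` overwrites the machine translation in place, which loses the original MT output; the database changes proposed under Solution Discussion aim to keep both.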

  6. Quantitative Analysis of the Statistics

  7. Quantitative Analysis of the Statistics
     The quantitative data on the Pangaea BBS was collected from the late-2008 version of the service and extracted from the contents of the Pangaea SQL server. Because of privacy issues and decentralized storage of some data (login information, personal information, etc.), some statistics were not available for this study. Since some essential variables are not stored in the database, some data (translation corrections) had to be extracted and examined by hand.
     • Topics include system messages and user instructions as well as intra-organizational communication in the form of reports.
     • Topics are divided by activity site and activity.
     • The number of actual users was not available, so topic starters are used as the number of active users.
     Basic statistics:
       Categories     | 16
       Topics         | 339
       Messages       | 465
       Posts to topic | 126
       Answer rate    | 37%
       Topic starters | 61
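
The basic statistics are internally consistent under one reading of the table, which the sketch below checks: topics plus replies equal the total number of messages, and the answer rate is replies per topic (126 / 339 ≈ 0.37, the same figure slide 12 reports as 0.37). Defining "answer rate" as replies divided by topics is an assumption inferred from these numbers; the slide does not state the formula.

```python
# Figures from the basic-statistics table (late-2008 snapshot).
TOPICS = 339
REPLIES = 126        # "posts to topic"
MESSAGES = 465
TOPIC_STARTERS = 61

# Assumption: every message either opens a topic or replies to one.
# The table's figures are consistent with this reading.
assert TOPICS + REPLIES == MESSAGES

answer_rate = REPLIES / TOPICS              # assumed definition: replies per topic
topics_per_starter = TOPICS / TOPIC_STARTERS

print(f"answer rate: {answer_rate:.0%}")                       # answer rate: 37%
print(f"mean topics per starter: {topics_per_starter:.1f}")    # mean topics per starter: 5.6
```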

  8. Quantitative Analysis of the Statistics: Topics started per user
     • The number of topics started per user follows a positively (right-)skewed distribution: most users start only a few topics, while a few users start very many.
     • On the horizontal axis, the number of users increases to the right.
     • On the vertical axis, the number of topics started increases upward.
     The number of topics greatly exceeds the number of reply messages in the system (339 vs. 126). Topics in this case represent individual reports, messages and announcements.
     [Figure: Topics distribution (users vs. topics started)]

  9. Quantitative Analysis of the Statistics: Topics started per user
     • Average (mean) number of topics started per user: 5.6
     • Median number of topics started per user: 3
     • Mode of topics started per user: 1
     • Because the number of topics started per user is positively skewed, the mean is pulled up by a few heavy posters, so we use the median as the average number of topics per user.
     • The most common case is a user who starts 1 topic (the mode).
     • Share of all topics started by the Top-5 posters: 48%
     • Share of the Top-5 posters in the user base: 8%
     [Figure: Topics distribution (users vs. topics started)]

  10. Quantitative Analysis of the Statistics: Topics per user
     Users | Topics started
     18    | 1
     10    | 2
     11    | 3
     3     | 4
     5     | 5
     3     | 6
     3     | 7
     1     | 8
     0     | 9
     1     | 10
     1     | 11
     1     | 17
     1     | 19
     1     | 22
     1     | 37
     1     | 68
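
The summary statistics on slides 9 and 10 can be recomputed from this frequency table. A sketch using Python's standard `statistics` module; the dictionary is a transcription of the table. It also shows why the median is the better "average" here: with mean > median > mode, the distribution has a long right tail driven by a few heavy posters.

```python
from statistics import mean, median, multimode

# Topics started -> number of users who started that many topics
# (transcribed from the slide's frequency table).
FREQUENCIES = {
    1: 18, 2: 10, 3: 11, 4: 3, 5: 5, 6: 3, 7: 3, 8: 1, 9: 0,
    10: 1, 11: 1, 17: 1, 19: 1, 22: 1, 37: 1, 68: 1,
}

# Expand the table into one value per user.
per_user = [topics for topics, users in FREQUENCIES.items() for _ in range(users)]

n_users = len(per_user)                  # 61 topic starters
total_topics = sum(per_user)             # 339 topics
top5 = sorted(per_user, reverse=True)[:5]

print(f"mean {mean(per_user):.1f}, median {median(per_user)}, mode {multimode(per_user)}")
print(f"Top-5 share of topics: {sum(top5) / total_topics:.0%}")   # 48%
print(f"Top-5 share of users: {5 / n_users:.0%}")                 # 8%
# mean (5.6) > median (3) > mode (1): a long right tail, i.e. positive skew.
```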

  11. Target System Introduction: Pangaea Multilingual BBS
     • Total MT correction rate by users (Japanese): 14%
     • Total correctors: 6
     • Share of the user base: ~10%
     • Total topics started by correctors: 28
     • Share of total topics: 8%
     • Average (mean) number of topics started by correctors: 4.67
     Translation corrections were collected by hand by comparing the update tag with the creation tag in the database and by comparing a new MT of the source text with the stored one.
     Translation corrections by users:
       User ID | Entries corrected | Topics started
       13      | 3                 | 7
       27      | 1                 | 5
       56      | 1                 | 5
       62      | 1                 | 5
       79      | 1                 | 2
       80      | 1                 | 4
       Total   | 8                 | 28
     Top-5 topic posters:
       User ID | Topics started
       7       | 68
       6       | 37
       8       | 22
       21      | 19
       9       | 17
       Total   | 163
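
The corrector figures can be cross-checked against the two tables. The user IDs and counts below are transcribed from the slide. One assumption is labeled in the code: the slide does not say what the 14% correction rate is measured against, but it matches 8 corrected entries out of the 57 messages machine-translated into Japanese from English (25) and Korean (32), the only direction in which corrections were found.

```python
# Correction table: user ID -> (entries corrected, topics started).
CORRECTORS = {13: (3, 7), 27: (1, 5), 56: (1, 5), 62: (1, 5), 79: (1, 2), 80: (1, 4)}
# Top-5 posters: user ID -> topics started.
TOP5_POSTERS = {7: 68, 6: 37, 8: 22, 21: 19, 9: 17}

TOPIC_STARTERS = 61
TOTAL_TOPICS = 339
# ASSUMPTION: the 14% rate is measured against messages machine-translated
# into Japanese (English 25 + Korean 32 source messages).
MT_INTO_JAPANESE = 25 + 32

entries = sum(e for e, _ in CORRECTORS.values())   # 8
topics = sum(t for _, t in CORRECTORS.values())    # 28

print(f"correctors' share of user base: {len(CORRECTORS) / TOPIC_STARTERS:.0%}")  # 10%
print(f"correctors' share of topics: {topics / TOTAL_TOPICS:.0%}")                # 8%
print(f"mean topics per corrector: {topics / len(CORRECTORS):.2f}")               # 4.67
print(f"correction rate (assumed base): {entries / MT_INTO_JAPANESE:.0%}")        # 14%
print("correctors among Top-5:", set(CORRECTORS) & set(TOP5_POSTERS))             # set()
```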

  12. Target System Introduction: Pangaea Multilingual BBS
     • The poster demographic is highly skewed: almost half of all topics are posted by only 5 users.
     • The answer rate to topics is quite low → there is little interaction between people through the BBS.
     • Most users post only one topic → an average user has no incentive to use the BBS as a main communication medium.
     • The translation correction rate in Japanese is low, and no translation corrections were found in the other supported languages.
     • Translation corrections were made by 6 people; 3 of the 8 corrected entries were made by one person. None of the correctors were among the Top-5 posters.
     • Only messages translated from English or Korean into Japanese were modified by users.
     • Even though the organization's members are split roughly evenly between Japan and the rest of the world (130 vs. 110), a clear majority of messages are in Japanese.
       Posting rate for a Japanese user: 3.14
       Posting rate for other users: 0.52
     • The answer rate (replies per topic) in a control forum with approximately the same number of users is 5.7, whereas the answer rate in Pangaea is 0.37.
     • The Top-5 posters in the control forum account for 19.11% of all posts, whereas in Pangaea the Top-5 posters account for 48%.
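
The posting rates on this slide appear to follow from the membership figures on slide 3 and the source-language counts on slide 5. The sketch below makes that reading explicit. Treating every Japanese-language message as posted by a member in Japan, and every other message as posted by a member elsewhere, is an assumption; the reproduced values (3.14 and 0.52) are consistent with it but do not prove it.

```python
# Messages by source language (slide 5) and membership figures (slide 3).
MESSAGES_BY_LANGUAGE = {"Japanese": 408, "Korean": 32, "English": 25, "German": 0}
MEMBERS_IN_JAPAN = 130
MEMBERS_ELSEWHERE = 110

total = sum(MESSAGES_BY_LANGUAGE.values())   # 465
japanese = MESSAGES_BY_LANGUAGE["Japanese"]
other = total - japanese                     # 57

# ASSUMPTION: Japanese-language messages come from members in Japan,
# all other messages from members elsewhere.
rate_japan = japanese / MEMBERS_IN_JAPAN
rate_other = other / MEMBERS_ELSEWHERE

print(f"Japanese share of messages: {japanese / total:.0%}")   # 88%
print(f"posting rate, Japan: {rate_japan:.2f}")                # 3.14
print(f"posting rate, elsewhere: {rate_other:.2f}")            # 0.52
```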

  13. Research Problems

  14. Research Problems
     Social dynamics and language:
     The Pangaea BBS tries to tackle the problem of language in the formation of social capital (communication, common ground, social networks). There are a few setbacks in the system regarding MT:
     • There is no method to verify that a translated message was understood → users assume that the translation is understood → by default, there is no reason to correct the translation.
     • Context sharing in a multilingual environment: interaction patterns and awareness of relations.
       • How does MT affect how users perceive the context?
     • Grounding incrementally: providing evidence that a message is understood.
       • How can users indicate that a message is understood?
     • How do users see the communication medium? Are they communicating with an MT system or with a human? What effect does it have when users are aware that the text is machine-translated?

  15. Research Problems
     Social dynamics and language:
     • MT quality is mediocre at best.
     • Users have no incentive to correct machine-translated sentences → bad translations are accepted with high frequency.
     • No entries use German as the source language → English is most likely used even in German-speaking environments.
     • How is users' lexical entrainment affected by the MT quality? Does the environment affect the language used?
     • Do people adjust the language they use according to their expectations of the system's performance?

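  16. Research Problems
     Machine-translated sentence correction examples (machine translation → user correction; English glosses in parentheses):
     • Correcting a bad translation:
       しかし、ゆっくりと、彼らはじめに活動をし始めた、そして、終わりまでには。。。 (However, slowly, they at first began the activity, and by the end...)
       → しかし、アクティビティがはじまると、彼らは徐々にアクティビティをまじめに取り組みはじめて。。。 (However, once the activity started, they gradually began to take the activity seriously...)
     • Changing a term:
       常に一生懸命に。。。 (always with all their might...)
       → 常に熱心に。。。 (always eagerly...)
     • Changing the wording:
       彼らが1ヵ月活動に来なくて […] していたからであるだろう。 (It would probably be because they did not come to the activities for a month [...].)
       → 彼らが先月が休みのため […] していたためである。 (It is because they had last month off [...].)
     • There is a great need for translation correction, but since the MT quality is not up to par, the workload is heavy.
     • There is no incentive to perform cumbersome translation correction.
     • If the meaning is understood, why bother to correct the translation?
     • If the MT quality is better in English, why use German?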

  17. Solution Discussion

  18. Solution Discussion
     Development of the user interface:
     • Encourage users to communicate through the system.
     • Provide a user-friendly UI for translation correction (incremental correction, collaborative translation view).
     • Create an incentive for translation correction through the UI (display the user name of the corrector).
     • Add a method to indicate that a received message was understood.
     • Make the machine translation as invisible as possible to the users.
     Development of the database: add new variables/tables to the database to
     • indicate whether the translated text was modified by a user, and by which user;
     • store the messages of all users in relation to the topic (not just the last poster);
     • store the original machine-translated messages and the modified messages separately.
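
The proposed database changes can be sketched as a schema that keeps every poster per topic and stores the machine translation and the user's correction side by side. This is a hypothetical schema: table and column names are invented for illustration (the slides do not describe the actual Pangaea tables), and it uses Python's built-in `sqlite3` rather than the SQL server the data was extracted from.

```python
import sqlite3

# Hypothetical schema covering the three proposed database changes.
SCHEMA = """
CREATE TABLE messages (
    message_id      INTEGER PRIMARY KEY,
    topic_id        INTEGER NOT NULL,
    author_id       INTEGER NOT NULL,   -- every poster is kept, not just the last one
    source_language TEXT    NOT NULL,
    body            TEXT    NOT NULL
);
CREATE TABLE translations (
    message_id      INTEGER NOT NULL REFERENCES messages(message_id),
    target_language TEXT    NOT NULL,
    machine_text    TEXT    NOT NULL,   -- original MT output, never overwritten
    corrected_text  TEXT,               -- NULL until a user corrects it
    corrected_by    INTEGER,            -- user ID of the corrector
    PRIMARY KEY (message_id, target_language)
);
"""


def record_correction(conn: sqlite3.Connection, message_id: int, language: str,
                      text: str, user_id: int) -> None:
    """Store a hand correction next to, not over, the machine translation."""
    conn.execute(
        "UPDATE translations SET corrected_text = ?, corrected_by = ? "
        "WHERE message_id = ? AND target_language = ?",
        (text, user_id, message_id, language),
    )


def displayed_text(conn: sqlite3.Connection, message_id: int, language: str):
    """Return (text to show, corrector ID or None), preferring the correction."""
    return conn.execute(
        "SELECT COALESCE(corrected_text, machine_text), corrected_by "
        "FROM translations WHERE message_id = ? AND target_language = ?",
        (message_id, language),
    ).fetchone()


# Hypothetical example data.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO messages VALUES (1, 10, 1, 'en', 'original message')")
conn.execute("INSERT INTO translations VALUES (1, 'ja', 'machine translation', NULL, NULL)")
record_correction(conn, 1, "ja", "hand-corrected translation", 2)
print(displayed_text(conn, 1, "ja"))  # ('hand-corrected translation', 2)
```

Returning the corrector's ID alongside the text is one way to support the proposed incentive of showing who corrected a translation, and keeping `machine_text` untouched would make corrections countable directly instead of by comparing tags and re-running MT.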

  19. Future Research

  20. Future Research
     • Controlled experiments in collaboration with the target organization:
       • User questionnaires and interviews on user preferences and experiences.
       • Translation correction in a controlled environment.
       • Participant observation of the interaction between the system and the users.
     • Develop a set of variables for the log data.
     • Develop a log-data viewer for the system.
     • Develop ideas for improving the system and the UI:
       • A more user-friendly design
       • Incentives to perform tasks with the system
       • Incentives to use the system for actual communication
     • Categorize translation corrections when more data is available.
     • With more data on the users and the system, analyze the effects of machine translation on social capital building in CSCW and on intra-organizational social networks.
