Accessing an Information System by Chatting Bayan Abu Shawar and Eric Atwell bshawar@comp.leeds.ac.uk, eric@comp.leeds.ac.uk School of Computing University of Leeds
Presentation Outline • Introduction. • Chatbot and corpus definitions. • ALICE chatbot system. • What has been done so far. • System architecture of the Qur’an chatbot. • Results and Evaluation.
Introduction • Methods of accessing an information system: • Information Retrieval (IR): retrieves a relevant subset of documents from a large set. • Information Extraction (IE): extracts specific pieces of data from documents to fill a list of slots in predefined templates. • We present another way to access an information system: chatting with a chatbot.
Definitions • A chatbot is a computer program designed to simulate human conversation. • The user chats with the bot using textual or spoken natural language. • The chatbot must have access to knowledge (e.g., a set of input/output rules) so that it can accept input, match it against the rules, and generate replies in the conversation. • We developed a machine learning approach that automatically generates chatting rules from machine-readable text (corpora) and converts them to the ALICE chatbot format.
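The rule-matching idea in the definition above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the ALICE engine: the rules, the replies, and the function names `normalise`, `pattern_to_regex` and `reply` are all hypothetical, and the only behaviour borrowed from AIML is that "*" stands for any run of words.

```python
import re

# Hypothetical input/output rules. "*" matches any run of words, as in AIML.
RULES = {
    "HELLO": "Hi there!",
    "MY NAME IS *": "Nice to meet you.",
    "*": "I do not have an answer for that.",
}


def normalise(text):
    """Upper-case the input and replace punctuation with spaces."""
    return " ".join(re.sub(r"[^\w\s]", " ", text).upper().split())


def pattern_to_regex(pattern):
    """Turn a pattern with '*' wildcards into an anchored regular expression."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + ".+".join(parts) + "$")


def reply(text, rules=RULES):
    """Return the reply of the longest matching pattern, or None if nothing matches."""
    sentence = normalise(text)
    # Assumption of this sketch: longer patterns are treated as more specific.
    for pattern in sorted(rules, key=len, reverse=True):
        if pattern_to_regex(pattern).match(sentence):
            return rules[pattern]
    return None


print(reply("My name is Bayan."))
```

Trying the longest pattern first is a simplification chosen for this sketch; a real AIML interpreter uses its own matching order.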
ALICE System • ALICE: the Artificial Linguistic Internet Computer Entity, a software robot that you can chat with using natural language. • ALICE is composed of two parts: • The chatbot engine. • The language model. • The ALICE language model is stored in AIML files. • AIML: the Artificial Intelligence Markup Language.
The AIML Format
<aiml version="1.0">
  <topic name="the topic">
    <category>
      <pattern>PATTERN</pattern>
      <template>Template</template>
    </category>
    ..
  </topic>
</aiml>
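As a rough illustration of how such a file can be read, the sketch below loads the pattern/template pairs from a small AIML string using Python's standard `xml.etree.ElementTree` module. The sample categories and the helper name `load_categories` are invented for this example; this is not how the ALICE engine itself loads AIML.

```python
import xml.etree.ElementTree as ET

# Hypothetical AIML content following the format shown above.
AIML_TEXT = """<aiml version="1.0">
  <topic name="greetings">
    <category>
      <pattern>HELLO</pattern>
      <template>Hi there!</template>
    </category>
    <category>
      <pattern>GOOD MORNING</pattern>
      <template>Good morning to you.</template>
    </category>
  </topic>
</aiml>"""


def load_categories(aiml_text):
    """Return a dict that maps each <pattern> text to its <template> text."""
    root = ET.fromstring(aiml_text)
    rules = {}
    for category in root.iter("category"):
        pattern = category.findtext("pattern", default="").strip()
        # Only plain-text templates are handled; nested AIML tags are ignored here.
        template = category.findtext("template", default="").strip()
        rules[pattern] = template
    return rules


print(load_categories(AIML_TEXT))
```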
Implementing a Java Program • The primary goal of chatbots is to mimic real human conversations. We developed a Java program that reads ‘real’ human dialogues and generates conversational rules for the ALICE chatbot. • The program reads a dialogue corpus. • It converts the dialogue transcript to AIML format. • The output AIML is used to retrain ALICE.
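The conversion step can be sketched as follows. This is a simplified Python illustration, not the authors' Java program: it assumes that each utterance becomes a pattern and the utterance that follows it becomes the template, and the names `to_pattern` and `turns_to_aiml` as well as the sample turns are hypothetical.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def to_pattern(utterance):
    """Upper-case the utterance and drop punctuation to form a pattern string."""
    return " ".join(re.sub(r"[^\w\s]", " ", utterance).upper().split())


def turns_to_aiml(turns, topic="dialogue"):
    """Assumed pairing: turn i is the pattern, turn i+1 is the template."""
    lines = ['<aiml version="1.0">', f"<topic name={quoteattr(topic)}>"]
    for prompt, response in zip(turns, turns[1:]):
        lines.append("<category>")
        lines.append(f"<pattern>{escape(to_pattern(prompt))}</pattern>")
        lines.append(f"<template>{escape(response)}</template>")
        lines.append("</category>")
    lines += ["</topic>", "</aiml>"]
    return "\n".join(lines)


# Invented three-turn dialogue used only to exercise the function.
print(turns_to_aiml(["Hello", "Hi, how are you?", "Fine, thanks"]))
```

A transcript of n turns yields n - 1 categories under this pairing; speaker labels, overlaps and other corpus annotations would have to be filtered out before this step.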
The Aim of the Automatic Process • Saving the time and effort of encoding the knowledge manually. • Generating different versions of the chatbot that are not restricted to a specific language and/or domain. • Creating versions that simulate ‘real’ human conversation. • Machine Learning Approach • Using the most significant word approach, based on the assumption that people usually respond according to the most significant word in an utterance. • A frequency list was obtained from each corpus and then used to find the least frequent word, which is taken as the most significant one.
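The most significant word idea can be sketched in a few lines of Python. The sketch assumes that "most significant" means "lowest frequency in the corpus frequency list"; the tokeniser, the function names and the three-sentence corpus are all invented for illustration and do not come from the corpora used in this work.

```python
import re
from collections import Counter


def tokenise(text):
    """Split text into lower-case word tokens."""
    return re.findall(r"\w+", text.lower())


def build_frequency_list(utterances):
    """Count how often each word occurs across the whole corpus."""
    counts = Counter()
    for utterance in utterances:
        counts.update(tokenise(utterance))
    return counts


def most_significant_word(utterance, frequencies):
    """Return the least frequent word of the utterance (first one wins ties)."""
    words = tokenise(utterance)
    if not words:
        return None
    return min(words, key=lambda word: frequencies[word])


# Hypothetical toy corpus.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat saw the dog",
]
frequencies = build_frequency_list(corpus)
print(most_significant_word("the cat sat on the mat", frequencies))  # mat
```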
The Dialogue Corpora Used so Far • Minnesota: French dialogue corpus. • Spoken Afrikaans: Afrikaans dialogue corpus. • British National Corpus (BNC): Spoken transcripts.
The Holy Book of Islam (Qur’an) • The Qur’an is written in classical Arabic. • The Qur’an consists of 114 sooras (chapters), which are grouped into 30 parts. • Each soora consists of sequential verses (sections).
The Original English Text Format of the Qur’an
Sample:
THE DAYBREAK, DAWN, CHAPTER NO. 113
With the Name of Allah, the Merciful Benefactor, The Merciful Redeemer
113.1 Say: I seek refuge with the Lord of the Dawn
113.2 From the mischief of created things;
113.3 From the mischief of Darkness as it overspreads;
113.4 From the mischief of those who practise secret arts;
113.5 And from the mischief of the envious one as he practises envy.
Using the Qur’an as a Trainable Corpus • We selected the Qur’an to investigate: • Whether we can access an information source via chatting. • How to convert a written text to the AIML format. • How to adapt ALICE to learn from a text that is not a dialogue transcript. • How to adapt the ALICE interpreter to recognise Arabic characters.
The Qur’an chatbot • For this chatbot we used a parallel English/Arabic corpus of the Qur’an. • Input: a statement, question or verse in English. • Output: verse(s) extracted from the Qur’an in both English and Arabic. • Problems raised: • How to divide a non-conversational text into utterance-like chunks? • How to enable the ALICE interpreter to recognise Arabic characters?
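One possible way to cut the verse-numbered text into utterance-like chunks is sketched below in Python. It assumes the "chapter.verse text" line layout of the English sample and pairs each verse with the verse that follows it in the same soora, which is consistent with the sample chat where entering a verse returns the next one. The names `parse_verses` and `verse_pairs` are hypothetical, and the actual restructuring used in this work may differ.

```python
import re

# Lines are assumed to look like "113.2 From the mischief of created things;".
VERSE_LINE = re.compile(r"^(\d+)\.(\d+)\s+(.*)$")

SAMPLE = """113.1 Say: I seek refuge with the Lord of the Dawn
113.2 From the mischief of created things;
113.3 From the mischief of Darkness as it overspreads;"""


def parse_verses(text):
    """Return (chapter, verse, text) tuples for every verse-numbered line."""
    verses = []
    for line in text.splitlines():
        match = VERSE_LINE.match(line.strip())
        if match:
            chapter, verse, body = match.groups()
            verses.append((int(chapter), int(verse), body.strip()))
    return verses


def verse_pairs(verses):
    """Pair each verse (input) with the next verse of the same chapter (reply)."""
    pairs = []
    for current, following in zip(verses, verses[1:]):
        if current[0] == following[0]:  # do not pair across chapter boundaries
            reply = f"{following[2]} ({following[0]}.{following[1]})"
            pairs.append((current[2], reply))
    return pairs


for prompt, reply in verse_pairs(parse_verses(SAMPLE)):
    print(prompt, "->", reply)
```

Each resulting pair could then be written out as one AIML category, with the Arabic text of the reply verse appended to the template.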
System Architecture
[Diagram. Labels: Dialogue corpora; Text pre-processing phase (Reading files, Preparation, Filtration, Reiteration); Restructuring phase (Investigation of pattern equalities, String vector, Investigation of template equalities); AIML file.]
Results • Two versions of ALICE have been generated: • Qur’an0-13: http://www.pandorabots.com/pandora/talk?botid=832e8a088e348e83 • Qur’an14-114: http://www.pandorabots.com/pandora/talk?botid=b60ec43cbe3481fe
Sample of Chatting Dialogue
Human: Is it right that our protector is Allah?
Qur'an14-114: {Muhammad (Muhammad)} 11 That is because Allah is the Protector of those who believe, but those who reject Allah have no protector. ذَلِكَ بِأَنَّ اللَّهَ مَوْلَى الَّذِينَ آمَنُوا وَأَنَّ الْكَافِرِينَ لَا مَوْلَى لَهُمْ (سورة محمد)11
{The Morning Star (At-Tariq)} 4 There is no soul but has a protector over it. إِن كُلُّ نَفْسٍ لَّمَّا عَلَيْهَا حَافِظٌ (سورة الطارق)4
Human: Allah, the eternal, Absolute
Qur'an14-114: {Sincerity (Al-Iklas)} 3 He begetteth not, nor is He begotten; لَمْ يَلِدْ وَلَمْ يُولَدْ (سورة الإخلاص)3
Human: He begetteth not, nor is He begotten
Qur'an14-114: {Sincerity (Al-Iklas)} 4 And there is none like unto Him. وَلَمْ يَكُن لَّهُ كُفُوًا أَحَدٌ (سورة الإخلاص)4
Evaluation • Problems of evaluation: • It is not easy to evaluate this kind of general information access. • The source text is not organised around specific questions. • We cannot count the number of hits, so precision and recall scores cannot be computed.
Evaluation (cont.) • We evaluated the system in terms of: • Achieving the proposed objectives. • Finding possible uses for the system. • User satisfaction.
Evaluation (cont.) • We achieved our goals, which focused on using a text that is not conversational in nature and on using the Arabic language. • The feedback from users was as follows: • Some users found the tool unsatisfactory because it does not provide answers to their questions. • Others found it interesting as a way to: • Learn more about the Qur’an. • Find out which soora a certain verse comes from.
Conclusions • We presented a novel way of accessing information from an online source by having an informal chat. • The system may be used as a search tool for verses that contain the same words but have different connotations. • It can be used to find the name of the soora that a certain verse comes from. • Students could use it as a new method for reciting the Qur’an.