6th International Conference on Education and New Learning Technologies, Barcelona, 7th - 9th of July 2014. The use of an intelligent forum crawler for data retrieval from e-learning portals.
6th International Conference on Education and New Learning Technologies
Barcelona, 7th - 9th of July 2014
The use of an intelligent forum crawler for data retrieval from e-learning portals
Miloš Pavković and Jelica Protić, University of Belgrade, School of Electrical Engineering, Belgrade, Serbia
Introduction
• There is a large number of forums covering different topics
• Students often use forums during their studies
• A large amount of relevant information is scattered across different forums inside one university domain
• Forums are based on different technologies
Issues
• The same topic can appear across different forums inside one university domain
• Official school forums vs. independent department forums
• The same documents can be uploaded as post attachments to several different web forums
• Similar courses exist at different schools
Solution – Specialized crawler
• A specialized forum crawler
• Aggregates crawled data from multiple forums of a single university domain
• Stores the data in a database
• Forum modules use this database to help students
Forum structure
• Always defined by the implicit paths the forum presents
• Example: a) forum, b) thread, c) attachments inside a post
Crawler algorithm
• FCbRE – Forum Crawler based on Regular Expressions
• Automated system
• Identifies the DOM structure and basic forum elements with regular expressions
• Identifies forum implicit paths using regex. Example: >>index\.php\?showforum\==\digit+!>+>\P=!<+
• Extracts post content and stores it in the database
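The path-identification step can be sketched in Python as follows. This is a minimal illustration, not the authors' FCbRE expressions: only the `index.php?showforum=` path comes from the slide, while the `showtopic` pattern, the anchor regex and the function name `extract_links` are assumptions made for the example.

```python
import re

# URL patterns for the implicit paths. Only "showforum" appears on the slide;
# "showtopic" is an assumed thread path used for illustration.
PATH_PATTERNS = {
    "forum": re.compile(r"index\.php\?showforum=(\d+)"),
    "thread": re.compile(r"index\.php\?showtopic=(\d+)"),
}

# Simplified anchor matcher: captures the href value and the link text.
HREF = re.compile(r'<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>',
                  re.IGNORECASE | re.DOTALL)


def extract_links(html):
    """Return (kind, id, title) for each anchor whose href matches a known path."""
    found = []
    for href, text in HREF.findall(html):
        for kind, pattern in PATH_PATTERNS.items():
            match = pattern.search(href)
            if match:
                # Drop nested tags such as <b> from the link text.
                title = re.sub(r"<[^>]+>", "", text).strip()
                found.append((kind, int(match.group(1)), title))
                break
    return found
```

Links that match no pattern (navigation, profile pages and so on) are simply skipped, which is what keeps such a crawler on the forum's implicit paths.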
Crawler database
• Essential in the FCbRE model
• Forum threads and posts are stored separately
• Similarity tables contain unique pairs of identifiers of forums, threads and attachments
Database diagram (tables and their fields):
• Web Forum: - site id, - site name, - site link
• Forums: + site id, - forum id, - forum name, - forum link
• Threads: + forum id, - thread id, - thread name, - thread link
• Posts: + thread id, - post id, - post info
• Attach: + post id, - attach id, - attach name, - attach link
• T – Simil.: + thread id (1), + thread id (2)
• A – Simil.: + attach id (1), + attach id (2)
• F – Simil.: + forum id (1), + forum id (2)
• F/T – Simil.: + forum id, + thread id
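A schema along these lines could be sketched with SQLite as shown below. The table and column names follow the diagram, but the column types, the reading of "+" fields as foreign keys, the ordered-pair constraint and the helper names `create_database` and `add_similar_threads` are all assumptions. Only the thread similarity table is shown; the others would presumably follow the same shape.

```python
import sqlite3

SCHEMA = """
CREATE TABLE web_forum (
    site_id   INTEGER PRIMARY KEY,
    site_name TEXT,
    site_link TEXT
);
CREATE TABLE forums (
    forum_id   INTEGER PRIMARY KEY,
    site_id    INTEGER REFERENCES web_forum(site_id),
    forum_name TEXT,
    forum_link TEXT
);
CREATE TABLE threads (
    thread_id   INTEGER PRIMARY KEY,
    forum_id    INTEGER REFERENCES forums(forum_id),
    thread_name TEXT,
    thread_link TEXT
);
CREATE TABLE posts (
    post_id   INTEGER PRIMARY KEY,
    thread_id INTEGER REFERENCES threads(thread_id),
    post_info TEXT
);
CREATE TABLE attachments (
    attach_id   INTEGER PRIMARY KEY,
    post_id     INTEGER REFERENCES posts(post_id),
    attach_name TEXT,
    attach_link TEXT
);
-- Each unordered pair is stored once, with the smaller id first.
CREATE TABLE thread_similarity (
    thread_id_1 INTEGER NOT NULL REFERENCES threads(thread_id),
    thread_id_2 INTEGER NOT NULL REFERENCES threads(thread_id),
    PRIMARY KEY (thread_id_1, thread_id_2),
    CHECK (thread_id_1 < thread_id_2)
);
"""


def create_database(path=":memory:"):
    """Open a SQLite database and create the sketch schema in it."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_similar_threads(conn, a, b):
    """Record that threads a and b are similar, ignoring repeats of the pair."""
    if a == b:
        return
    low, high = sorted((a, b))
    conn.execute(
        "INSERT OR IGNORE INTO thread_similarity VALUES (?, ?)", (low, high)
    )
```

Sorting the two ids before inserting means that (3, 5) and (5, 3) map to the same row, which is one simple way to keep the pairs unique.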
Finding similarities
• Determines similarities between forum, thread or document names
• Comparing the words alone is not enough, because of:
  • grammatical errors
  • singular/plural forms
  • different wording with the same semantic meaning
• Existing search engines can be used to distinguish semantics
• FCbRE uses low-level semantic difference
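The slide does not spell out how FCbRE computes its low-level semantic difference, so the following Python sketch is only one plausible way to handle two of the listed problems, spelling errors and singular/plural forms. The crude plural stripping, the use of `difflib.SequenceMatcher` and the function names are assumptions, not the authors' method.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase, drop punctuation, and strip a trailing plural 's' from longer words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]


def similarity(a, b):
    """Return a score in [0, 1]; 1.0 means the normalized titles are identical."""
    left = " ".join(normalize(a))
    right = " ".join(normalize(b))
    return SequenceMatcher(None, left, right).ratio()
```

With this sketch, "Operating Systems" and "operating system" normalize to the same string, and a single mistyped letter lowers the score only slightly. Titles that use different words for the same meaning are not caught by this approach.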
Module plugins
• Two module plugins:
  • FCbRE-S (FCbRE Search plugin)
  • FCbRE-DP (FCbRE Duplicate Prevention plugin)
• Both are used for experimental purposes
• Written for vBulletin technology
• Can be adapted for any other forum technology
FCbRE-S (FCbRE Search plugin)
• Designed for standard forum searches
• Forwards the requested query to the FCbRE database for similarity comparison
• All similarities are shown in addition to the standard search results
FCbRE-DP (Duplicate Prevention plugin)
• Implemented in the section where users can create a topic or forum
• Monitors the field for the name of the new thread or forum
• Notifies the user when a similar thread or forum already exists
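The check behind such a notification might look like the Python sketch below. It is not the vBulletin plugin itself: the function name, the 0.8 cutoff and the in-memory list of existing titles are assumptions standing in for a lookup against the FCbRE database.

```python
from difflib import get_close_matches


def find_possible_duplicates(new_title, existing_titles, cutoff=0.8, limit=5):
    """Return existing titles that closely match the title being typed."""
    # Compare case-insensitively, but report titles in their original form.
    by_lowered = {title.lower(): title for title in existing_titles}
    hits = get_close_matches(
        new_title.lower(), list(by_lowered), n=limit, cutoff=cutoff
    )
    return [by_lowered[hit] for hit in hits]
```

A plugin could call this whenever the title field changes and show the returned titles as a warning before the user submits the new thread or forum.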
Results
• 9 web forums from the University of Belgrade, gathered manually
• This group is a mixture from different sources
• The percentage of similar items is lowest for forums and highest for documents
• The true percentage of "useful" duplicates should be interpreted with caution
Conclusion
• The proposed solution aggregates information from related forums
• It has the potential to reduce duplication of forums, topics and posts
• The use of plugins would result in higher forum content quality
Thank you!
Feel free to contact us with any questions:
milos_pavkovic@yahoo.com
jeca@etf.bg.ac.rs