by Santi Caballé, Claudi Paniagua, Fatos Xhafa, and Thanasis Daradoumis

Second International Workshop on Grid Computing and its Application to Data Analysis (GADA'05)
Agia Napa, Cyprus, November 1-2, 2005
A Grid-aware Implementation for Providing Effective Feedback to On-line Learning Groups
by Santi Caballé, Claudi Paniagua, Fatos Xhafa, and Thanasis Daradoumis
Open University of Catalonia, Barcelona, Spain

Index
• Computer-Supported Collaborative Learning (CSCL) is a paradigm for research in educational technology that focuses on the use of Information and Communications Technology (ICT) as a mediation tool within collaborative methods of learning. (B. Wasson, 1998)
• In CSCL environments, the analysis of the information related to the collaborative group activity is crucial for understanding collaboration and group processes. (P. Dillenbourg, 1999)
• Introduction: the process of embedding information and knowledge into CSCL applications.
• Approach: need for structuring and processing large amounts of group activity information.
• Problem: lack of computational resources.
• Solution: a Grid-aware approach based on the Master-Worker paradigm.
• An application: a Grid-based prototype to process group activity log files.
• Processing results: empirical analysis.
• Conclusions and future work.

Introduction (I): The process of embedding information and knowledge into CSCL applications
The whole picture
Four stages in event management:
• Classification, processing, analysis and presentation.

Introduction (II): The process of embedding information and knowledge into CSCL applications
Stage I: Classification
• Collection of information.
• Extraction of actions.
• Identification of events.
• Categorization according to
  • Task performance
  • Group functioning
  • Scaffolding
• Storage as system log files.
Classification in synchronous environments is very similar.

Introduction (III): The process of embedding information and knowledge into CSCL applications
Stage II: Processing
• Obtain event information from large log files.
• Process log files according to desired criteria, e.g.
  • time
  • workspace
• Store processing results in a suitable database.
Processing of events needs great computational power.

Introduction (IV): The process of embedding information and knowledge into CSCL applications
Stage III: Analysis
• Need for extracting complex knowledge from the database.
• Define query criteria.
• Send criteria and data to an external statistics package.
• Obtain useful statistical results from the analysis.
External analysis makes it possible to use the best existing statistical packages.

Introduction (V): The process of embedding information and knowledge into CSCL applications
Stage IV: Presentation
• Predefine an XML coding to represent ad hoc statistical measurements.
• Structure statistical results into XML output.
• Convert XML into the desired presentation format.
• Present results to users.
Users constantly receive knowledge in the form of appropriate feedback that influences their motivation and emotional state.

Approach (I)
Motivation
• Support for real on-line environments with a large number of geographically distributed students and tutors.
• A high degree of user-user and user-system interaction generates a large amount of event information.
• Constant provision of complex knowledge to group participants.
• Need to supply efficient and useful feedback for improving the motivation, emotional state, and problem-solving abilities of groups in on-line collaborative learning.

Approach (II)
Context at Open University of Catalonia
• Group activity at Open University of Catalonia involves hundreds of students and dozens of tutors in several on-line courses.
• The complexity of the learning practices entails intensive collaboration activity.
• BSCW is used as a groupware system to capture group activity interaction in log files.
• BSCW provides neither log file processing nor statistical analysis capabilities.
• BSCW generates a single huge log file every day and does not classify or structure the data in any way.

Statement of the problem
Lack of computational resources
• Need to process a huge amount of event information gathered in single log files.
• Essential to have the processing results of group activity constantly available in real time.
• Event information in log files should be partitioned into multiple log files according to particular needs.
• Event information must be constantly processed in an efficient manner during the processing stage.
• Lack of sufficient computational resources is the main obstacle to the constant processing of multiple log files in real time.

Solution (I)
Redefining the processing stage
• Obtain event information from large log files.
• Structure the information according to particular needs.
• Create log files of different degrees of granularity.
• Process all log files at the same time.
• Store results in the database.
The processing of all log files needs to be parallelized.
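The step of creating log files of different granularity can be pictured with a minimal sketch. It assumes a hypothetical "date workspace user action" line layout (the real BSCW log format is not shown in these slides) and groups events by workspace; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: splits one large log into smaller per-workspace logs.
// The "date workspace user action" line layout is an assumption, not the BSCW format.
class LogPartitioner {

    // Groups log lines by their second whitespace-separated field (the assumed workspace).
    static Map<String, List<String>> byWorkspace(List<String> lines) {
        Map<String, List<String>> parts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // skip blank or malformed lines
            }
            parts.computeIfAbsent(fields[1], k -> new ArrayList<>()).add(line);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "2005-03-01 groupA ana create-doc",
            "2005-03-01 groupB joan read-doc",
            "2005-03-02 groupA pere edit-doc");
        byWorkspace(log).forEach((workspace, events) ->
            System.out.println(workspace + ": " + events.size() + " events"));
    }
}
```

Each resulting partition could then be processed independently, which is what makes the parallel treatment possible.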

Solution (II)
A Grid-based solution
• Grid technology provides broad access to massive information and computational resources.
• In this context, the Grid computing paradigm
  • overcomes the lack of computational resources to process a large amount of event information.
  • allows processing of the log files by taking advantage of the parallelism inherent in the distributed nature of the Grid.
  • provides load balancing in the processing of log files of different granularity.
• A Grid-based approach for processing log files: the Master-Worker paradigm on the Planetlab platform.

Solution (III)
Master-Worker paradigm
• Distinguishes two types of processors:
  • master: performs the control and coordination tasks.
  • workers: perform most of the computational work.
• Advantages:
  • flexibility: workers can be implemented in different ways.
  • scalability: workers can be easily added.
  • separation of concerns: master does coordination and workers do specific tasks.
• Target: parallel applications with weak synchronization and reasonably large grain size.
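As a rough local illustration of the paradigm, the sketch below uses a thread pool in place of Grid nodes (an assumption made purely for the example). The master only hands out chunks and adds up results, while the workers do the counting; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Minimal local Master-Worker sketch: threads stand in for Grid nodes (an assumption).
class MasterWorkerSketch {

    // Worker side: the computational work; here it just counts non-empty lines in a chunk.
    static int work(List<String> chunk) {
        int n = 0;
        for (String line : chunk) {
            if (!line.isEmpty()) {
                n++;
            }
        }
        return n;
    }

    // Master side: submits one task per chunk to the pool and combines the results.
    static int runMaster(List<List<String>> chunks, int nWorkers) {
        ExecutorService pool = Executors.newFixedThreadPool(nWorkers);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (List<String> chunk : chunks) {
                Callable<Integer> job = () -> work(chunk);
                pending.add(pool.submit(job));
            }
            int total = 0;
            for (Future<Integer> f : pending) {
                total += f.get(); // weak synchronization: the master only waits for results
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a worker failed or was interrupted", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> chunks =
            List.of(List.of("e1", "e2"), List.of("e3", ""), List.of("e4"));
        System.out.println("events processed: " + runMaster(chunks, 2));
    }
}
```

Adding workers here only means changing the pool size, which mirrors the scalability advantage listed above.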

Solution (IV)
Architecture
Figure: the architecture of an application for processing log files.

Solution (V)
Implementation (I)
• The workers receive and carry out the following task (MWTask):
    address of the location of the log file;
    name of the log file;
    size of the log file;
    address of the location where the processing routine is found;
    URL of the database where the processed information will be stored;
• The master processor (MWDriver) is programmed as follows:
    while (true) do
      check for new log files generated from the Collaborative Learning Application Server;
      update the list of <log file description> for the new incoming log files;
      for each new log file generate a task;
      submit the newly generated tasks;
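One way to read the task description and a single pass of the master loop is sketched below. The field names, the `tasksForNewLogs` method and the `example.org` addresses are all hypothetical placeholders; only the five task items and the "one task per new log file" rule come from the slide.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical rendering of the task description and one pass of the master loop.
// Field and method names are illustrative; they are not taken from the prototype.
class MWDriverSketch {

    // The five items a worker receives with each task.
    static final class MWTask {
        final String logLocation;     // address where the log file is stored
        final String logName;         // name of the log file
        final long logSizeBytes;      // size of the log file
        final String routineLocation; // address of the processing routine
        final String databaseUrl;     // where the processed information is stored

        MWTask(String logLocation, String logName, long logSizeBytes,
               String routineLocation, String databaseUrl) {
            this.logLocation = logLocation;
            this.logName = logName;
            this.logSizeBytes = logSizeBytes;
            this.routineLocation = routineLocation;
            this.databaseUrl = databaseUrl;
        }
    }

    private final Set<String> seen = new HashSet<>(); // log files already turned into tasks

    // One iteration of the "while (true)" loop: a task for each log file not seen before.
    List<MWTask> tasksForNewLogs(Map<String, Long> logSizesByName) {
        List<MWTask> tasks = new ArrayList<>();
        for (Map.Entry<String, Long> log : logSizesByName.entrySet()) {
            if (seen.add(log.getKey())) { // add() returns true only for a new name
                tasks.add(new MWTask("logs.example.org", log.getKey(), log.getValue(),
                        "routines.example.org", "db.example.org")); // placeholder addresses
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        MWDriverSketch driver = new MWDriverSketch();
        System.out.println(driver.tasksForNewLogs(Map.of("monday.log", 2048L)).size());
        System.out.println(driver.tasksForNewLogs(
                Map.of("monday.log", 2048L, "tuesday.log", 512L)).size());
    }
}
```

Submitting the returned tasks to workers is left out of the sketch; it would follow whatever dispatch mechanism the platform offers.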

Solution (VI)
Implementation (II)
• The worker processor (MWWorker) is programmed as follows:
    receive the task;
    retrieve the specified log file from the location given in the task description;
    run the processing routine on the log file;
    send the master the task's report (execution time, …) on completion;
    send the database the processing results;
• Efficiency issues:
  • weak synchronization between master and workers lets the application run without loss of performance.
  • log files of different granularity allow an efficient load balance among workers and minimize data transmission.
  • the number of workers can be adapted dynamically when a new resource appears.

A Grid prototype (I)
An application for processing log files
• EventExtractor: an ad hoc application for extracting event information from BSCW
  • converts event information into well-formatted data.
  • stores the extraction results in a database.
  • needs a lot of time to process sequentially.
• MW model: appropriate in this context given that
  • log files of different granularity are processed.
  • workers are not synchronized with each other.
  • the communication load between master and workers is low.
• Planetlab platform: using a real Grid environment
  • by installing the Globus Toolkit 3 Grid service container,
  • and deploying the prototype on Planetlab.

A Grid prototype (II)
Master-Worker algorithm (I): overview
• A minimal Grid implementation made up of:
  • the worker, a Grid service that does the main work in the following steps:
    • wraps the EventExtractor routine,
    • publishes an interface that the master calls in order to dispatch a task,
    • is passed a string representation of the events to be processed, and
    • returns a data structure containing performance information.
    After completing the task, the worker is put back into a queue of idle workers.
  • the master, which first obtains the event log file to be processed, the task size to be dispatched to workers, and the available workers and the number of them to use, which it puts in an idle queue. It then enters the following loop:
    • reads a specific number of events from an event log file,
    • calls an idle worker and sends it the events to be processed.
    The master exits the loop when all events in the current log file have been read and all dispatched tasks have completed.
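The master loop described above can be approximated locally as follows. This is a sketch under stated assumptions: a plain Java interface stands in for the worker Grid service, one thread per in-flight task stands in for the remote call, and the names `IdleQueueMaster`, `Worker` and `run` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Local sketch of the master loop; a Java interface stands in for the worker Grid service.
class IdleQueueMaster {

    // Hypothetical worker contract: processes a batch and reports how many events it handled.
    interface Worker {
        long processEvents(List<String> events);
    }

    static long run(List<String> events, int taskSize, List<Worker> workers) {
        if (taskSize <= 0) {
            throw new IllegalArgumentException("taskSize must be positive");
        }
        // All workers start out idle.
        BlockingQueue<Worker> idle = new ArrayBlockingQueue<>(workers.size(), false, workers);
        AtomicLong processed = new AtomicLong();
        List<Thread> inFlight = new ArrayList<>();
        try {
            for (int start = 0; start < events.size(); start += taskSize) {
                // Read the next taskSize events (fewer for the last task).
                List<String> task = new ArrayList<>(
                        events.subList(start, Math.min(start + taskSize, events.size())));
                Worker worker = idle.take(); // waits until some worker is idle
                Thread t = new Thread(() -> {
                    try {
                        processed.addAndGet(worker.processEvents(task));
                    } finally {
                        idle.add(worker); // the worker goes back to the idle queue
                    }
                });
                inFlight.add(t);
                t.start();
            }
            for (Thread t : inFlight) {
                t.join(); // exit only when every dispatched task has completed
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("master interrupted", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        Worker counter = batch -> batch.size();
        List<String> events = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            events.add("event-" + i);
        }
        System.out.println(run(events, 5, List.of(counter, counter)));
    }
}
```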

A Grid prototype (III)
Master-Worker algorithm (II): the Master
• The Master implements the EventExtractorMaster interface with a single operation to call the worker's processEvents operation
  • returns performance statistics about the execution.
• The EventExtractorMasterImp class aggregates an instance of EventExtractorMasterDispatcher to dispatch all tasks to available workers.

A Grid prototype (IV)
Master-Worker algorithm (III): the Task
    private void _dispatchEventsToWorker(String events, long nEvents,
            double workerDBInsertTime, EventExtractorMasterStatsBean masterStats)
            throws Exception {
        // Take the next available worker from the queue.
        EventExtractorWorker worker = m_queue.getNextWorker();
        this.beforeDispatch(worker);
        // Synchronous call: returns only once the worker has processed the events.
        EventExtractorWorkerStatsBean workerStats =
                worker.processEvents(events, workerDBInsertTime);
        this.afterDispatch(worker);
        this.decrementPendingDispatchs();
    }
This operation synchronously sends a sequence of events (a single task) to an available worker.

A Grid prototype (V)
Master-Worker algorithm (IV): the Task
• Two strategies to dispatch tasks to workers
  • by blocking whenever the queue of idle workers is empty.
  • by implementing the queue of idle workers with the round-robin scheme.
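For the blocking strategy, a standard `java.util.concurrent.BlockingQueue` would be a natural fit, since its `take()` call waits until an element is available. The round-robin strategy can be sketched as below; whether the prototype's queue works exactly this way is an assumption, and only the `getNextWorker` name is borrowed from the dispatch code shown earlier.

```java
import java.util.List;

// Round-robin sketch: workers are handed out in a fixed cyclic order,
// without waiting for a particular worker to become idle.
class RoundRobinQueue<W> {
    private final List<W> workers;
    private int next = 0;

    RoundRobinQueue(List<W> workers) {
        if (workers.isEmpty()) {
            throw new IllegalArgumentException("need at least one worker");
        }
        this.workers = List.copyOf(workers);
    }

    // Returns workers cyclically: first, second, ..., last, then first again.
    synchronized W getNextWorker() {
        W worker = workers.get(next);
        next = (next + 1) % workers.size();
        return worker;
    }

    public static void main(String[] args) {
        RoundRobinQueue<String> queue = new RoundRobinQueue<>(List.of("w0", "w1", "w2"));
        for (int i = 0; i < 5; i++) {
            System.out.println(queue.getNextWorker());
        }
    }
}
```

Round-robin never makes the master wait, at the cost of possibly queueing a task on a worker that is still busy; the blocking variant makes the opposite trade.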

A Grid prototype (VI)
Master-Worker algorithm (V): the Worker
• The worker Grid service implements the EventExtractorWorker interface, which has a single operation: processEvents(String events, double dbInsertTimeInMs);
• The implementation parses the events passed in order to extract the required information
• processEvents returns a data structure with performance information about the task executed (elapsed time, number of events and bytes processed).
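A stand-in for the worker operation might look like the sketch below. It assumes one event per line and UTF-8 text, neither of which is stated in the slides, and it leaves `dbInsertTimeInMs` unused because the slides do not say how the prototype uses it. `WorkerSketch` and `WorkerStats` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the worker operation; not the prototype's implementation.
class WorkerSketch {

    // Performance information returned for each task.
    static final class WorkerStats {
        final long elapsedNanos; // elapsed processing time
        final long nEvents;      // number of events processed
        final long nBytes;       // number of bytes processed

        WorkerStats(long elapsedNanos, long nEvents, long nBytes) {
            this.elapsedNanos = elapsedNanos;
            this.nEvents = nEvents;
            this.nBytes = nBytes;
        }
    }

    // Assumes one event per line. dbInsertTimeInMs is accepted to match the
    // signature on the slide but is not used in this sketch.
    static WorkerStats processEvents(String events, double dbInsertTimeInMs) {
        long start = System.nanoTime();
        long n = 0;
        for (String line : events.split("\n")) {
            if (!line.trim().isEmpty()) {
                n++; // real event parsing would happen here
            }
        }
        long bytes = events.getBytes(StandardCharsets.UTF_8).length;
        return new WorkerStats(System.nanoTime() - start, n, bytes);
    }

    public static void main(String[] args) {
        WorkerStats stats = processEvents("login ana\nread doc1\nlogout ana", 0.0);
        System.out.println(stats.nEvents + " events, " + stats.nBytes + " bytes");
    }
}
```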

A Grid prototype (VII)
Test battery
• An ad hoc test battery was designed, made up of:
  • an exhaustive collection of log files
    • from the spring term of a course with 140 students arranged in 5-member groups and 2 tutors.
  • a selected sample of a few log files
    • as a representative stratum of file size and event complexity.
• The whole test battery was processed by the EventExtractor on single-processor nodes of Planetlab
  • involving usual configurations.
  • with different workloads.
  • repeating the execution several times.

Experimental results (I)
Sequential approach
Comparison scale for 8 representative log files
Results of over 100 log files processed
• Sequential processing shows that the processing time is linear in the size of the log file processed.

Experimental results (II)
Parallel approach (I)
• The parallel processing results were obtained by
  • running tests for different task sizes and numbers of workers
  • observing efficiency and speed-up for each set of workers
Observed speed-up and efficiency for 5-event tasks and different numbers of workers

Experimental results (III)
Parallel approach (II)
• Reasonable speed-up is achieved in every test
  • however, parallel efficiency tends to decrease with the number of workers.
Observed speed-up with increasing number of workers
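The slides do not state the formulas behind these measurements. Assuming the usual definitions, with $T_1$ the sequential processing time and $T_p$ the time with $p$ workers, speed-up and efficiency would be:

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p \, T_p}
```

With purely illustrative numbers (not taken from the experiments), $T_1 = 120$ s and $T_8 = 20$ s give $S_8 = 6$ and $E_8 = 0.75$. Under these definitions, efficiency falls whenever speed-up grows more slowly than the number of workers, which matches the trend reported above.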

Experimental results (IV)
Analysis of the results
• Apart from very small task sizes, the speed-up observed showed the feasibility of the parallelization.
  • small task sizes were affected by the transmission time.
• The more workers used in our tests, the further the achieved speed-up was from the maximum
  • trade-off between number of workers and task size.
• Results were a little biased due to the homogeneous behaviour observed in Planetlab
  • they should be adjusted to the dynamic workload of a real Grid.
• Results depend on the low complexity of BSCW's log files
  • event complexity is the key to taking advantage of the Grid.

Conclusions and future work
• Efficient embedding of information and knowledge into group activity is a crucial factor for the success of on-line collaborative learning.
• Strong need for computational resources to process large amounts of group activity log data.
• A Grid-aware application based on the Master-Worker paradigm processes log files of group activity in an efficient yet simple manner.
• According to the results, the benefits of the Grid increase with the volume and complexity of the event log files to be processed.
• We plan to improve our prototype in terms of master-worker communication, fault tolerance and dynamic discovery of idle workers.

Thank you! Questions?
