WP6: Software Platform and Tools. Lead: UDE. Partners: UMA, CICE, FrontiersIn. Month 1 - Month 30. Overview: bundles all activities related to the provision of a software platform hosting tools and services for data mining, crawling and social network analysis.
WP6: Software Platform and Tools
Lead: UDE
Partners: UMA, CICE, FrontiersIn
Month 1 - Month 30
Overview
• Bundles all activities related to the provision of a software platform hosting tools and services for data mining, crawling and social network analysis (SNA)
• Relies on existing tools, either free and open-source software or tools owned by the partners
• First part: definition of the crawling, data mining and storage strategy
• Second part: data transformation for SNA, definition of network-based role models and evaluation of these models
Specific objectives
• Selection and evaluation of mining strategies
• Specification of crawling approach and integration of crawlers
• Specification and configuration of a software platform
• Preparation / transformation of data for SNA
• Specification and modelling of roles and constellations (SNA)
• Data analyses and evaluation
• Model revision and software adaptation
T6.1 Crawler and mining strategy
• Specify requirements for crawling and data mining based on the focused data sources and social models
• Flexible with respect to crawling strategies, so that it can also be adapted to the needs of other work packages (esp. the case studies)
• Integrated and controlled by a framework which handles the storage of retrieved web objects and the notification of newly found relevant data and of changes in the data sources
Responsible: UMA (2PM)
Contributors: UDE (1PM), CICE (1PM)
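The storage and notification behaviour named in T6.1 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's framework: the class `WebObjectStore`, its methods and the status labels are hypothetical, storage is a plain in-memory dictionary, and a change is detected by comparing SHA-256 digests of the retrieved content.

```python
import hashlib


class WebObjectStore:
    """Hypothetical in-memory store that keeps one content digest per URL."""

    def __init__(self):
        self._digests = {}    # url -> SHA-256 hex digest of the last stored content
        self._listeners = []  # callbacks invoked on new or changed objects

    def subscribe(self, callback):
        """Register a callback taking (url, status)."""
        self._listeners.append(callback)

    def put(self, url, content):
        """Store a retrieved web object and notify listeners if it is new or changed."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        previous = self._digests.get(url)
        if previous == digest:
            return "unchanged"  # nothing to store, nobody is notified
        self._digests[url] = digest
        status = "new" if previous is None else "changed"
        for callback in self._listeners:
            callback(url, status)
        return status
```

A crawler would call `put` for every page it retrieves; only the "new" and "changed" cases reach the subscribers, which keeps notifications limited to actual changes in the data sources.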
T6.2 Semantic evaluation and filtering
• Categorize and filter data retrieved from the various data sources
• Relies on techniques adopted from the field of knowledge discovery in databases (KDD)
• Encompasses the pre-processing of the given data: statistical sampling, cleaning, and transformation of the data into adequate representations for the subsequent algorithms
Responsible: UDE (2PM)
Contributors: UMA (1PM)
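The three pre-processing steps listed for T6.2 (sampling, cleaning, transformation) can be illustrated with a small sketch. Everything here is an assumption made for the example: the function names are hypothetical, cleaning is reduced to tag stripping and lowercasing, and the target representation is a simple bag of words, which may differ from what the actual filters require.

```python
import random
import re
from collections import Counter


def clean(text):
    """Remove markup-like tags, collapse whitespace and lowercase the text."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def to_bag_of_words(text):
    """Transform cleaned text into a token -> frequency mapping."""
    return Counter(re.findall(r"[a-z0-9]+", text))


def preprocess(documents, sample_size, seed=0):
    """Clean all documents, drop empty ones, draw a random sample, transform it."""
    cleaned = [clean(doc) for doc in documents]
    cleaned = [doc for doc in cleaned if doc]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(cleaned, min(sample_size, len(cleaned)))
    return [to_bag_of_words(doc) for doc in sample]
```

The resulting frequency mappings are one possible input for a later categorization or filtering step.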
T6.3 Framework for storage, notification and triggering
• Re-trigger the crawler in response to changes in the data corpus over time
• Re-triggering based on a "when appropriate" strategy
• Recognition of specific events such as new conference announcements or the availability of proceedings
• Notify users about new and relevant findings
Responsible: UDE (2PM)
Contributors: UMA (2PM), CICE (2PM)
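The slides do not define the "when appropriate" strategy. One common way to approximate such a policy, shown here purely as an assumption, is an adaptive revisit interval: a source that changed since the last crawl is visited sooner, an unchanged one later. The function name, the halving and doubling factors and the bounds are all hypothetical.

```python
def next_interval(current_hours, changed, min_hours=1.0, max_hours=168.0):
    """Return the next revisit interval in hours for one data source.

    Halve the interval if the source changed since the last crawl,
    double it otherwise, and keep the result within [min_hours, max_hours].
    """
    proposed = current_hours / 2 if changed else current_hours * 2
    return max(min_hours, min(max_hours, proposed))
```

Event recognition (for example a new conference announcement) could additionally force an immediate re-crawl regardless of the computed interval.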
T6.4 Data transformation and structural modeling for SNA
• Define a common data format for sharing within the consortium, based on the identification of relevant communities and their "traces" (communication, co-publications etc.) and on the general conceptual model (WP 2)
• Define and specify typical roles and constellations (e.g. broker) based on SNA techniques (e.g. blockmodeling)
• Continuous verification of social indicators
Responsible: UDE (2PM)
Contributors: UMA (2PM)
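To make the step from "traces" to a network role more concrete, the sketch below builds a co-authorship graph from co-publication data and computes a very simple broker indicator. It is not blockmodeling and not the project's role model: the function names are hypothetical, and the indicator (the number of pairs of a person's co-authors who never published together) is only one rough proxy for a brokerage position.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_graph(publications):
    """Build an undirected adjacency map from a list of author lists."""
    adjacency = defaultdict(set)
    for authors in publications:
        for a, b in combinations(sorted(set(authors)), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def brokerage_count(adjacency, node):
    """Count pairs of neighbours of `node` that are not directly connected."""
    neighbours = sorted(adjacency.get(node, ()))
    return sum(1 for a, b in combinations(neighbours, 2) if b not in adjacency[a])
```

With invented data such as `[["Ana", "Ben"], ["Ben", "Cem"], ["Cem", "Dee"], ["Ana", "Ben", "Eli"]]`, the author "Ben" links co-authors who have not published with each other and therefore receives the highest count.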
T6.5 Software platform
• Configure an integrated software platform for crawling/data mining and SNA based on the initial specifications
• Input relates to the transformation from relevant data sources (specified in T6.4)
• Output is concerned with visualisation and reporting
• Revise and adapt the platform according to emerging issues and needs (esp. considering the case studies)
• Uses freely available (open) software and software owned by the partners (mainly UDE)
Responsible: UDE (7PM)
Contributors: UMA (4PM), CICE (1PM)
T6.6 Data analysis and evaluation
• Test the platform with standard cases based on the specifications of WP 4 (Measurements and Social Indicators)
• Early phase: test the functioning of the platform and its components (from T6.5) and the adequacy of the semantic filters (T6.2) and structural definitions (T6.4)
• Later stage: evaluate actual performance and community developments in association with the case studies and with WP 4
Responsible: UDE (3PM)
Contributors: FrontiersIn (2PM), UMA (1PM), CICE (1PM)
Deliverables and Milestones
Deliverables
• 6.1 Mining strategy and requirements specification for the software platform (RP: UDE, RV: UMA, C: all in / M5)
• 6.2 First version of structural definitions (RP: UDE, RV: UMA, C: all in / M10)
• 6.3 Configuration, test of the platform and first evaluation report (RP: UDE, RV: CICE, C: all in / M22)
• 6.4 Final report and system (RP: UDE, RV: CICE, C: all in / M30)
Milestones
• MS2, SISOB System first prototype, month 15
• MS3, SISOB Final System, month 30
Tools
• Open Source Crawler
• DMD (Data-Multiplexer-Demultiplexer)
• WOS2Pajek, Pajek, and UCINET
• CFinder
Challenges
• Data model adequate to different data sources
• Data model supporting multilevel analysis, in line with the multivocality in the project
• Merging different types of data
• Cleaning data
  • e.g. researchers having several e-mail addresses
  • e.g. researchers writing their names in different ways
• How to get data from Web 2.0 platforms like Mendeley
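The name-variant problem listed under data cleaning can be illustrated with a minimal sketch. It rests on a deliberately crude assumption: two spellings refer to the same researcher if they share the surname and the first initial after accents are removed. The function names are hypothetical, and the heuristic will wrongly merge different people such as "John Smith" and "Jane Smith", so real cleaning would need further evidence (e-mail addresses, affiliations, co-authors).

```python
import unicodedata


def name_key(name):
    """Map a name spelling to a (surname, first initial) key, ignoring accents and case."""
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    if "," in ascii_name:
        # "Surname, Given" order
        surname, given = (part.strip() for part in ascii_name.split(",", 1))
    else:
        # "Given Surname" order: the last token is taken as the surname
        parts = ascii_name.split()
        if not parts:
            return ("", "")
        surname, given = parts[-1], " ".join(parts[:-1])
    return (surname.lower(), given[:1].lower())


def group_variants(names):
    """Group name spellings that share the same key."""
    groups = {}
    for name in names:
        groups.setdefault(name_key(name), []).append(name)
    return groups
```

For instance, "José García", "Garcia, J." and "J. Garcia" all map to the same key and end up in one group, which can then be reviewed manually before the records are merged.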