Researcher Portal. Group 26: WANG Jingwei, JIANG Yu. Introduction. Data Sources. Data Collection: keyword search with Scrapy and PHP. Keywords: Algorithm, Databases, Data mining, Computer, Computing, Internet, Network, Recognition, Software, System. Schema Mapping.
Researcher Portal
Group 26: WANG Jingwei, JIANG Yu
Data Collection
• Keyword search
  • Tools: Scrapy, PHP
• Keywords
  • Algorithm
  • Databases
  • Data mining
  • Computer
  • Computing
  • Internet
  • Network
  • Recognition
  • Software
  • System
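As a rough illustration of the keyword search step, the sketch below turns the keyword list into one search request URL per keyword. The endpoint `https://example.org/search` and the query parameter `q` are placeholders, not the sites the group actually crawled; in a Scrapy spider, URLs like these could serve as the `start_urls`.

```python
from urllib.parse import urlencode

# Keywords taken from the slide.
KEYWORDS = [
    "Algorithm", "Databases", "Data mining", "Computer", "Computing",
    "Internet", "Network", "Recognition", "Software", "System",
]

# Hypothetical search endpoint and parameter name, for illustration only.
SEARCH_URL = "https://example.org/search"


def build_search_urls(keywords, base_url=SEARCH_URL):
    """Return one URL-encoded search request per keyword."""
    return [f"{base_url}?{urlencode({'q': keyword})}" for keyword in keywords]


if __name__ == "__main__":
    for url in build_search_urls(KEYWORDS):
        print(url)
```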
Schema Mapping
• Delete unnecessary attributes
  • Repeated: e.g. journal & alternate journal
  • Useless: e.g. rec-number, db-id, Reference count, PDF link, Patent Citation Count…
• Redefine data
  • String → Int: Year, Time cited, Volume, Pages
• Replace inconsistent names
  • Attributes with different names in different datasets can have the same meaning
  • Start page, End page -> Pages
  • Journal / Book title -> Publication
  • …
• Split data: Author, Paper
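The mapping steps above can be sketched as a single function applied to each raw record. This is a minimal sketch, not the group's code: the dictionary keys are guesses at how the attributes appear in the exports, and treating "Pages" as a page count derived from the start and end page is an assumption made to reconcile the "String → Int" and "Start page, End page -> Pages" bullets.

```python
# Attribute names are assumed; the real exports may spell them differently.
DROP = {
    "alternate journal", "rec-number", "db-id",
    "Reference count", "PDF link", "Patent Citation Count",
}
RENAME = {"Journal": "Publication", "Book title": "Publication"}
INT_FIELDS = ("Year", "Time cited", "Volume")


def to_int(value):
    """Return value as an int, or None when it is missing or not numeric."""
    try:
        return int(str(value).strip())
    except (TypeError, ValueError):
        return None


def map_record(raw):
    """Drop, rename, and retype the attributes of one raw record."""
    record = {}
    for key, value in raw.items():
        if key in DROP:
            continue  # delete unnecessary attributes
        record[RENAME.get(key, key)] = value  # replace inconsistent names

    # Assumption: "Pages" is the page count computed from start and end page.
    start = to_int(record.pop("Start page", None))
    end = to_int(record.pop("End page", None))
    if start is not None and end is not None and end >= start:
        record["Pages"] = end - start + 1

    for field in INT_FIELDS:  # String -> Int
        if field in record:
            record[field] = to_int(record[field])
    return record
```

Values that cannot be parsed as integers become `None` rather than raising, so one malformed record does not stop a batch conversion.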
Result
• IEEE
  • Author count: 4436
  • Paper count: 8103
• SCI
  • Author count: 5141
  • Paper count: 5002
• DBLP
  • Author count: 8489
    • Attributes: Author, Title
  • Paper count: 11010
    • Attributes: Id, Score, Title, Author, Venue, Volume, Number, Pages, Year, Type, URL, Publisher, URL7
Evaluation
• Explanation
  • M1: Method 1 - Take the most recent value
  • M2: Method 2 - Take the most frequently occurring value
• Reduction Ratio =
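The two methods named under Evaluation can be illustrated with a small sketch. It assumes each conflicting value comes with the year it was observed, which the slides do not state; the function names are hypothetical.

```python
from collections import Counter


def most_recent(observations):
    """M1: given (year, value) pairs, return the value with the latest year."""
    return max(observations, key=lambda obs: obs[0])[1]


def most_frequent(values):
    """M2: return the value that occurs most often."""
    return Counter(values).most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up affiliation values for one author, for illustration only.
    affiliation = [(2009, "Univ. A"), (2013, "Univ. B"), (2011, "Univ. A")]
    print(most_recent(affiliation))
    print(most_frequent([value for _, value in affiliation]))
```

The two methods can disagree on the same input: here the latest observation is "Univ. B" while the most common value is "Univ. A", so comparing their results is what makes an evaluation meaningful.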
Measure
• Explanation
  • M1: Method 1 - Assume one data source is accurate
  • M2: Method 2 - Take the most recent value
• Reduction Ratio =
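Method 1 under Measure, which trusts a single data source, might look like the sketch below. Treating DBLP as the trusted source and falling back to the first available value from another source are both assumptions; the slides do not say which source was trusted or how missing values were handled.

```python
def from_trusted_source(values_by_source, trusted="DBLP"):
    """Return the trusted source's value, else the first non-missing value."""
    value = values_by_source.get(trusted)
    if value is not None:
        return value
    # Fallback (assumed behavior): use any other source that has a value.
    for other_value in values_by_source.values():
        if other_value is not None:
            return other_value
    return None


if __name__ == "__main__":
    # Made-up publication years for one paper across the three sources.
    print(from_trusted_source({"IEEE": 2011, "SCI": 2012, "DBLP": 2012}))
```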