2nd "NameGame" APE-INV workshop Mr. JOTL: A User Friendly Matching Software Stéphane Lhuillery, Julio Raffo & Fernando Lladós
Outline
• Background
• Objectives & Rationale
• Results
• User Friendly Software
• Concept
• Alpha test
• Further steps
Background
• Automatic patent retrieval is becoming unavoidable given the size of data sets.
• A growing literature looks at this NameGame:
  • On firms’ names: Derwent, 2002; Magerman et al., 2006; Hall, 2006; Thoma et al., 2007.
  • On inventors’ names: Trajtenberg et al., 2006; Hoisl, 2006; Lissoni et al., 2006; Mariani et al., 2007; Raffo & Lhuillery, 2009; etc.
• Our ESF project outcomes:
  • New matching best practices
  • APE-INV database
Objectives of the NameGame
Maximizing true positives means:
• Minimizing false negatives (= higher recall)
• Minimizing false positives (= higher precision)
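The two objectives above can be made concrete with a small calculation. The following is a minimal sketch, assuming matches are represented as pairs of record identifiers and that a hand-checked set of true pairs is available; the function name and the sample identifiers are hypothetical and are not taken from Mr. JOTL.

```python
def precision_recall(predicted_pairs, true_pairs):
    """Return (precision, recall) for a set of predicted matches.

    precision = TP / (TP + FP): share of predicted matches that are correct.
    recall    = TP / (TP + FN): share of true matches that were found.
    """
    predicted = set(predicted_pairs)
    truth = set(true_pairs)
    tp = len(predicted & truth)   # true positives: predicted and correct
    fp = len(predicted - truth)   # false positives: predicted but wrong
    fn = len(truth - predicted)   # false negatives: true matches that were missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return precision, recall


# Hypothetical benchmark: four true matches, three predicted ones.
true_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
predicted_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}

precision, recall = precision_recall(predicted_pairs, true_pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three predicted pairs are correct and two of the four true pairs are found, which illustrates why the two error types have to be tracked separately.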
Rationale: a three-step game
Examples of matching (EPFL)
Examples of filtering (EPFL)
What have we learned so far?
• General
  • Matching algorithms are not perfect, but they improve the results considerably.
• Cleaning step
  • The origin of the data substantially changes the data preparation process.
• Matching step
  • There is a hierarchy among algorithms, although it is specific to each particular case.
• Filtering step
  • The availability of supplementary data enhances or constrains the disambiguation process.
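The cleaning, matching and filtering steps can be sketched as a small pipeline. This is an illustration under stated assumptions, not the algorithms used in Mr. JOTL: the similarity measure is Python's standard `difflib.SequenceMatcher`, chosen only because it needs no extra library; the 0.85 threshold, the `city` field used as supplementary data, and all sample records are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher


def clean(name):
    """Step 1 (cleaning): strip accents, lowercase, drop punctuation, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents.lower())
    return " ".join(kept.split())


def match(records_a, records_b, threshold=0.85):
    """Step 2 (matching): keep candidate pairs whose cleaned names are similar enough."""
    pairs = []
    for a in records_a:
        for b in records_b:
            score = SequenceMatcher(None, clean(a["name"]), clean(b["name"])).ratio()
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


def filter_pairs(pairs):
    """Step 3 (filtering): use supplementary data (here, city) to drop likely false positives."""
    return [(a, b, score) for a, b, score in pairs if a["city"] == b["city"]]


# Hypothetical records from two sources.
patents = [{"id": "p1", "name": "Müller, Jean-Pierre", "city": "Lausanne"}]
survey = [
    {"id": "s1", "name": "MULLER JEAN PIERRE", "city": "Lausanne"},
    {"id": "s2", "name": "Muller, Jean Pierre", "city": "Geneva"},
    {"id": "s3", "name": "Dupont, Marie", "city": "Lausanne"},
]

candidates = match(patents, survey)
kept = filter_pairs(candidates)
print([b["id"] for _, b, _ in candidates], [b["id"] for _, b, _ in kept])
```

Here the cleaning step makes the three spellings of the same name identical, the matching step proposes both `s1` and `s2`, and the filtering step can only discard `s2` because a city field happens to be available, which is the dependence on supplementary data noted above.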
Why create user-friendly software?
(Diagram labels: ISI Thomson, Survey, PATSTAT / APE-INV Database, PATVAL, SCOPUS, EU FWProgram)
Concept behind Mr. JOTL
• Intuitive for beginner users
• Flexible on inputs and their preparation
• Fair variety of standard matching processes
• Adaptable disambiguation filters
• But soundly customizable for advanced users
• Conceived and coded to be expanded in the future by multiple developers
From concept to reality
• (OK, for the moment just an alpha!)
Disambiguation SSM
Let’s test it!
Technical notes
• OS supported (so far):
  • Windows XP, Vista, 7 (Server & x64)
• Coded in C#
  • Pros:
    • Free development environment
    • Low cost of entry
    • Large developer community
  • Cons:
    • Proprietary language and libraries
    • Less efficient memory management
• Libraries needed: Scintilla (open source lexer and syntax highlighter)
• Customizable code:
  • C# & VBA
• Suggested environment for future development:
  • Visual Studio (Express version is free to use)
  • Mono on Linux
Further developments
• Fully coding the existing algorithms.
• Testing performance against large data sets (> 1 million records).
• Pre-setting standard routines (as XML).
• Drafting documentation (+ video).
• Proof-testing with first-time users (at EPFL).
Openness and its governance
• How to share it?
  • GitHub?
  • Forums
• How to develop a dynamic sharing community?
Thank you!