
The EP-INV-PatStat db and preliminary results



Presentation Transcript


  1. The EP-INV-PatStat db and preliminary results • Andrea Maurino • DISCo - Dip. di Informatica, Sistemistica e Comunicazione, Università di Milano Bicocca, viale Sarca 336/14, 20124, Milano (Italy)

  2. Index • APE-INV project • EP-INV-PatStat • Feedback Web application • Preliminary results • Ongoing works ••• ITIS Lab ••• http://www.itis.disco.unimib.it

  3. A preliminary truth • The world is dirty! • and • Real-world data are dirty! • A mandatory preliminary task before carrying out any analysis or statistics is: • Clean your data

  4. Disambiguation of academic inventors: ESF-APE-INV • www.academicpatenting.eu • Project chair: Francesco Lissoni (uniBocconi) • Technical Manager: Andrea Maurino (uniMiB) • Project steps: • Reclassification of all patents by inventor (INV) • Matching between inventors and academic scientists (APE) • Expected result: a freely available database of "Academic Patenting in Europe"

  5. EP-INV-PatStat • Tables: PATSTAT_PUBL_NR • PATSTAT_APPL_ID • INVENTORS_INFO • DISAMBIGUATION

  6. Which part of PatStat is affected by disambiguation? • Users should not use these tables: substitute tables with disambiguated inventors and inventor information are provided by the APE-INV project • Source: PatStat documentation

  7. INVENTORS_INFO • INVENTORS_INFO table • CODINV2 • NAME-SURNAME • COUNTRY / GCOUNTRY • STATE • REGION / GREGION • COUNTY / GCOUNTY • CITY / GCITY • STREET / GSTREET • ZIP / GZIP • LONGITUDE • LATITUDE • GACCURACY • Fields preceded by the letter G (e.g. GCITY) are the result of a Google-based standardization algorithm; all the other fields (e.g. CITY) are cleaned PatStat addresses • We report Google information only when GACCURACY is greater than or equal to 6 (i.e. the address is available at street level).
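The GACCURACY rule on this slide can be illustrated with a minimal Python sketch. The field names CITY, GCITY and GACCURACY and the threshold 6 come from the slide; the sample rows, the dictionary layout and the helper name `preferred_city` are assumptions made only for illustration.

```python
from typing import Optional

# Threshold stated on the slide: Google data is reported only at street level or better.
GACCURACY_STREET_LEVEL = 6


def preferred_city(row: dict) -> Optional[str]:
    """Return GCITY when the Google result is accurate enough, else the cleaned CITY.

    Hypothetical helper: the row is assumed to be a dict keyed by column name.
    """
    accuracy = row.get("GACCURACY")
    if accuracy is not None and accuracy >= GACCURACY_STREET_LEVEL and row.get("GCITY"):
        return row["GCITY"]
    return row.get("CITY")


# Invented sample rows, not taken from the real INVENTORS_INFO table.
rows = [
    {"CODINV2": "INV001", "CITY": "MILAN", "GCITY": "Milano", "GACCURACY": 8},
    {"CODINV2": "INV002", "CITY": "TURIN", "GCITY": None, "GACCURACY": 4},
]
cities = [preferred_city(r) for r in rows]
```

A fallback of this kind keeps the cleaned PatStat value usable for rows where no Google-standardized value is reported.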

  8. From APE-INV to PatStat: PATSTAT_PUBL_NR and PATSTAT_APPL_ID • To connect the DISAMBIGUATION and INVENTORS_INFO tables with the PatStat dataset, the repository includes two further tables: • PATSTAT_PUBL_NR • Links each inventor (as identified by the CODINV2 code in the APE-INV dataset) to her granted patents (PUBLN_NR). • PATSTAT_APPL_ID • Identifies the APPLN_ID corresponding to each PUBLN_NR (NB: in the specific case of EP patents there is a one-to-one correspondence between APPLN_ID and PUBLN_NR). • The table also reports the PatStat edition the APPLN_ID refers to.
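The linkage described on this slide (CODINV2 to PUBLN_NR to APPLN_ID) can be sketched as a two-table join. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the column name `PATSTAT_EDITION`, the sample values and the helper `appln_ids_for` are assumptions, since the slides do not give the exact table layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed layouts; only CODINV2, PUBLN_NR and APPLN_ID are named on the slide.
    CREATE TABLE PATSTAT_PUBL_NR (CODINV2 TEXT, PUBLN_NR TEXT);
    CREATE TABLE PATSTAT_APPL_ID (PUBLN_NR TEXT, APPLN_ID INTEGER, PATSTAT_EDITION TEXT);

    -- Invented sample rows for illustration only.
    INSERT INTO PATSTAT_PUBL_NR VALUES
        ('INV001', 'EP1000001'), ('INV001', 'EP1000002'), ('INV002', 'EP1000003');
    INSERT INTO PATSTAT_APPL_ID VALUES
        ('EP1000001', 501, 'edition-A'),
        ('EP1000002', 502, 'edition-A'),
        ('EP1000003', 503, 'edition-A');
    """
)


def appln_ids_for(connection: sqlite3.Connection, codinv2: str) -> list:
    """Return the APPLN_IDs of the patents linked to one CODINV2 (hypothetical helper)."""
    rows = connection.execute(
        """
        SELECT a.APPLN_ID
        FROM PATSTAT_PUBL_NR AS p
        JOIN PATSTAT_APPL_ID AS a ON a.PUBLN_NR = p.PUBLN_NR
        WHERE p.CODINV2 = ?
        ORDER BY a.APPLN_ID
        """,
        (codinv2,),
    ).fetchall()
    return [r[0] for r in rows]


inv001_applications = appln_ids_for(conn, "INV001")
```

Once an APPLN_ID is known, it can be used as the key into the regular PatStat tables of the matching edition.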

  9. DISAMBIGUATION.txt • DISAMBIGUATION table • CODINV2: a stable key generated within the APE-INV project. It uniquely identifies each distinct combination of inventor and address • CODINV: a code associated with each CODINV2 after the disambiguation procedure has been applied. If two or more distinct CODINV2s are found to be the same person, they are assigned the same CODINV
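The relation between CODINV2 and CODINV can be shown with a short Python sketch that groups inventor-address combinations by disambiguated person. The pair layout, the sample codes and the helper name `group_by_person` are assumptions; the actual format of DISAMBIGUATION.txt is not specified on the slide.

```python
from collections import defaultdict

# Invented (CODINV2, CODINV) pairs: INV001 and INV002 were judged to be the same person.
disambiguation = [
    ("INV001", "P1"),
    ("INV002", "P1"),
    ("INV003", "P2"),
]


def group_by_person(pairs):
    """Map each CODINV to the list of CODINV2 keys assigned to it (hypothetical helper)."""
    groups = defaultdict(list)
    for codinv2, codinv in pairs:
        groups[codinv].append(codinv2)
    return dict(groups)


persons = group_by_person(disambiguation)
```

Counting patents per CODINV rather than per CODINV2 then gives one total per disambiguated inventor instead of one per inventor-address combination.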

  10. Feedback web application

  11. Why share data • Instead of looking for one golden algorithm, APE-INV proposes data dissemination and the recording of users' feedback • Two kinds of users: • Take the data and run (dissemination only): they use the data in their studies uncritically. No benefit for the project, and risky for them (data are disambiguated according to the state of the art of disambiguation techniques, but we can always do better). • Critical users (dissemination + feedback): they use the data, usually sub-samples of the whole dataset, and have the possibility to improve data quality through: • Hand-checked data and survey work on smaller samples • Algorithms that better fit sub-sample specificities (e.g. country, firm, technological field) • Data sources external to PatStat that help the disambiguation effort

  12. How does data dissemination work? • Access http://www.ape-inv.disco.unimib.it/ with id and password • Choose the country or countries of the inventors you need (e.g. "My research is on Italian inventors") • Get the EP-INV dataset and the CONTROVERSY.txt file • Query results are in txt format.

  13. Some results

  14. Number of academic patents, 1996-2006

  15. Ownership distribution of academic patents, lower bound estimates

  16. Ownership distribution of academic patents, upper bound estimates

  17. Ongoing works

  18. Temporal record linkage • "Panta rei" (Heraclitus): everything flows, everything is constantly changing. • Databases may keep track of these never-ending changes • Examples • People change names • Xin Dong → Xin Luna Dong • People change jobs • Halevy moves from Univ. of Wa. to Google • Nations change • Yugoslavia → Serbia-Montenegro → Serbia → Kosovo • Based on the paper: P. Li, X. L. Dong, A. Maurino, D. Srivastava, "Linking Temporal Records", VLDB 2011
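The intuition behind temporal record linkage is that a disagreement between two records matters less when the records are far apart in time, because the value may legitimately have changed. The Python sketch below is a toy illustration of that idea only: the exponential decay, the `decay_rate` value, the penalty of 0.5 and the record fields are all assumptions, and it is not the formulation used in the cited paper.

```python
import math


def decayed_penalty(base_penalty: float, years_apart: float, decay_rate: float = 0.2) -> float:
    """Shrink a disagreement penalty as the time gap grows (assumed exponential decay)."""
    return base_penalty * math.exp(-decay_rate * years_apart)


def similarity(rec_a: dict, rec_b: dict, decay_rate: float = 0.2) -> float:
    """Toy score: name agreement earns 1.0; an affiliation mismatch costs a decayed penalty."""
    gap = abs(rec_a["year"] - rec_b["year"])
    score = 1.0 if rec_a["name"] == rec_b["name"] else 0.0
    if rec_a["affiliation"] != rec_b["affiliation"]:
        score -= decayed_penalty(0.5, gap, decay_rate)
    return score


# Invented records: same name, different affiliation, observed at different times.
base = {"name": "A. Example", "affiliation": "University X", "year": 2000}
same_year = {"name": "A. Example", "affiliation": "Company Y", "year": 2000}
ten_years_later = {"name": "A. Example", "affiliation": "Company Y", "year": 2010}

sim_near = similarity(base, same_year)
sim_far = similarity(base, ten_years_later)
```

With these assumptions, the affiliation mismatch counts fully against two records from the same year but only weakly against records ten years apart, so a job change over time does not prevent the records from being linked.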

  19. An example

  20. Experimental evaluation • Effectiveness test: • Data set: patent data, 1,871 records, 359 entities, spanning 1978-2003 • Comparison: three existing algorithms, w./o. decayed similarity

  21. Thanks! Questions?
