The NBER Patent Data Project: Past data uses and future plans. Prof. Bronwyn H. Hall, University of California at Berkeley, University of Maastricht, NBER, and IFS London
Outline • Currently available NBER patent data from the USPTO • Uses of these data • The new PDP (Patent Data Project) at NBER • What do we add • Where do we stand now • Discussion EPIP Bocconi Workshop
NBER Patent Citations Data File • ~3 million U.S. patents granted between January 1963 and December 1999 (now updated to 2002) • Patent number, application and grant dates • Name of first inventor; name and type of assignee • Country and state of first inventor • Main US patent class; number of claims; main IPC class from 1976 • Number of citations, forward and backward; generality and originality measures based on citations • All citations made to these patents between 1975 and 1999 (over 16 million) • Match of patenting organizations to Compustat (the data set of all firms traded in the U.S. stock market) • Available at • www.nber.org/patents • emlab.berkeley.edu/users/bhhall/bhdata.html (2002 update)
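The generality and originality measures listed above can be illustrated with a short sketch. It assumes the usual Herfindahl-based definition (one minus the sum of squared shares of citations across technology classes); the function names and the class labels are hypothetical and not taken from the data file itself.

```python
from collections import Counter

def generality(citing_classes):
    """One minus the Herfindahl concentration of the classes of citing patents.

    citing_classes: one technology-class label per forward citation
    (hypothetical input layout, assumed for this sketch).
    """
    n = len(citing_classes)
    if n == 0:
        return None  # undefined when the patent has no forward citations
    counts = Counter(citing_classes)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def originality(cited_classes):
    """Same formula, applied to the classes of the patents cited (backward)."""
    return generality(cited_classes)

# Citations spread evenly over two classes give 0.5; a single class gives 0.0.
print(generality(["A", "A", "B", "B"]))
print(generality(["A", "A", "A"]))
```

A higher value means citations are spread across more technology classes. With few citations the measure is biased downward, so any real use would need a small-sample correction that this sketch leaves out.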
Use of NBER patent data • >100 significant research projects (at least one quarter outside the US) • ~100 published papers • ~50 doctoral dissertations in accounting (1), agric econ (1), econ (22), finance (3), history (1), info tech (1), law (2), management (15), public policy (1), unknown (3)
Research areas • Positive • Individual inventors – migration and co-invention, spillovers • Organizations, networks, and innovation • Geography of innovation • Knowledge spillovers, local and international • Citations as a value indicator • Normative • The patent explosion and its implications for firms, the patent office, and social welfare • Patent policy – legal and administrative; the examination process • Patent and patent litigation strategy • University and laboratory patenting
The PDP project at NBER • A new project to update and extend the publicly available USPTO data • Principal investigators: • Iain Cockburn (BU), Bronwyn Hall (UCB), Walter (Woody) Powell (Stanford), Manuel Trajtenberg (Tel Aviv) • Senior investigators: • Ajay Agrawal (Toronto), James Bessen (BU), Stuart Graham (Georgia Tech), Megan MacGarvie (BU)
Database design principles • Accessibility • Provide public tools (XML-based) to allow others to extract data • Modularity (as in the OECD/EPO effort) • Linking out – an open-source-like environment so that others can link their data to the patent data • Provide attribution and citation so contributors are recognized • Annotation • By users – e.g., error correction, identification of software or gene patents, etc.
Tasks and objectives • Update existing data to 2007 • Clean and standardize • Compute normalization coefficients to correct for truncation, differences across fields in citation practice • Additional data (see next slide) • Link outs to • Patstat • Litigation data • Assignee name data • Geo-coded data
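The slides do not say how the normalization coefficients are computed. One simple option, shown here only as a sketch under that assumption, is to divide each patent's citation count by the mean count of its cohort (same grant year and technology field), so that recent, truncated cohorts and low-citation fields are put on a common scale. The record layout (`id`, `year`, `field`, `cites`) is hypothetical.

```python
from collections import defaultdict

def cohort_means(patents):
    """Mean citation count for each (grant year, field) cohort."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of cites, patent count]
    for p in patents:
        key = (p["year"], p["field"])
        totals[key][0] += p["cites"]
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}

def normalized_cites(patents):
    """Citations relative to the cohort mean; None if the cohort mean is zero."""
    means = cohort_means(patents)
    result = {}
    for p in patents:
        mean = means[(p["year"], p["field"])]
        result[p["id"]] = p["cites"] / mean if mean > 0 else None
    return result

# Made-up records for illustration only.
sample = [
    {"id": 1, "year": 1990, "field": "chem", "cites": 10},
    {"id": 2, "year": 1990, "field": "chem", "cites": 30},
    {"id": 3, "year": 1998, "field": "chem", "cites": 2},
    {"id": 4, "year": 1998, "field": "chem", "cites": 2},
]
print(normalized_cites(sample))
```

In this made-up sample a 1998 patent with 2 citations scores 1.0, the same as an average patent in any other cohort, even though its raw count is far below the 1990 counts.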
Additional data • Detailed tech class info – full set of USPC and IPC codes • Priority information • Foreign application data • Continuation and divisional relationships • Multiple assignee information • Inventors • names for tracking migration, co-invention, etc. • detailed location info for all inventors • Source of citations (applicants vs. examiners) • Attorney and patent agent names • Reexamination requests and outcomes
Currently: cleaning raw data • ASSG - assignees (Cockburn, Agrawal) • CLAS - classification (MacGarvie, Hall) • FREF - foreign references (Cockburn) • GOVT - government interest (Graham) • INVT - inventors (Cockburn, Agrawal) • LREP - legal representatives (not used for now) • OREF - other references (not used for now) • PATN - basic patent info (Bessen) • PRIR - priority information (Graham) • PCTA - PCT information (Cockburn) • REIS - reissue information (Graham) • RLAP - related application info (Graham) • UREF - US references (Cockburn, Trajtenberg)
Geographical data (inventor and assignee address) • data from USPTO (to be cleaned) • City (8000 Munchen?) • State (CA vs CA) • Country • data from non-USPTO sources • Regions (SMSA, Canadian provinces, European NUTS/LAU regions) • Latitude-longitude coordinates • Need to normalize geographical names to match source (e.g., Vienna vs. Wien)
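The city-cleaning problem above (a postal code fused into the city field, local versus English names) can be sketched as follows. This is a minimal illustration, not the project's procedure: the alias table is a tiny hypothetical sample, and a real one would have to come from the external gazetteer being matched against.

```python
import re

# Hypothetical alias table: case-folded raw spelling -> canonical name.
CITY_ALIASES = {
    "wien": "Vienna",
    "vienna": "Vienna",
    "munchen": "Munich",
    "muenchen": "Munich",
    "munich": "Munich",
}

def clean_city(raw):
    """Drop a leading numeric postal code, then map known aliases."""
    name = re.sub(r"^\s*\d+\s+", "", raw).strip()
    return CITY_ALIASES.get(name.casefold(), name)

print(clean_city("8000 Munchen"))
print(clean_city("Wien"))
print(clean_city("Boston"))
```

Names missing from the table are returned unchanged (minus any postal code), so unmatched cities can be listed afterwards and the table extended.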
Assignee names • K U Leuven project for Eurostat OECD/WIPO/EPO consortium • Derwent World Patents Index • 117 pages of 4 char codes for std names • A large number of rules (Russian inst, Japanese cos, etc.) • Rules for nonstandard co codes • IFS project – more later • One question: what to do about the extended character set? • München vs Munich vs Muenchen vs Munchen • Derwent uses ue
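The extended-character question above comes down to choosing a folding rule. The sketch below contrasts two options, assuming Unicode input: transliterating German umlauts to two letters (the "ue" convention the slide attributes to Derwent) and simply stripping diacritics. The function names are hypothetical.

```python
import unicodedata

# Two-letter transliteration of German special characters.
GERMAN_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})

def fold_transliterate(text):
    """München -> Muenchen (the 'ue' convention)."""
    return text.translate(GERMAN_MAP)

def fold_strip(text):
    """München -> Munchen (decompose, then drop combining marks)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_transliterate("München"))
print(fold_strip("München"))
```

The two rules produce different keys for the same name, so a matching procedure has to apply one of them consistently, or generate both variants. Neither rule links "München" to "Munich"; exonyms still need a lookup table.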