SEWP Research Conference October 19, 2005. Creating a Longitudinal Research Worker-Establishment Matched Dataset from Patent Data: Description and Application to Understanding International Knowledge Flows.
SEWP Research Conference, October 19, 2005 • Creating a Longitudinal Research Worker-Establishment Matched Dataset from Patent Data: Description and Application to Understanding International Knowledge Flows • Jinyoung Kim (SUNY-Buffalo), Sangjoon John Lee (Alfred University), Gerald Marschke (SUNY-Albany)
Issues • Construction of a longitudinal research worker-establishment matched panel dataset • Knowledge flow across national borders
Idea • Policy implications on immigration, labor market, and education arena • productivity of scientific researchers • transmittal mechanism of knowledge • Technology spillover appears to be geographically limited • Firms access externally-located technology partly through hiring of and collaboration with researchers from the outside.
We examined: • Trends in U.S. firms’ access to the researchers overseas and those with foreign research experience in the late 1980s through the 1990s • Role of research personnel as a pathway for the diffusion of ideas from foreign countries to U.S. innovators • The firm-level determinants of accessing innovations developed overseas.
Main findings: • In recent years, U.S. innovators have increasingly accessed researchers residing in foreign countries. • The fraction of U.S. residents with foreign research experience in U.S. firms appears to be falling. • U.S. pharmaceutical and semiconductor firms are increasingly going to foreign countries to employ such researchers. • Retaining researchers with overseas research experience seems to facilitate access to innovations developed overseas. • In the semiconductor industry, smaller firms and older firms are more likely to make use of the output of non-U.S. R&D. • In the pharmaceutical industry, younger firms are more likely to make use of the output of non-U.S. R&D.
Outline • Literature Review • Data Construction Process • Empirical findings • Conclusions
Literature • Various mechanisms for technology and knowledge transfer across institutional boundaries. • Informal Contact • Agrawal, Cockburn, and McHale (2003), Von Hippel (1988) • Spillovers • Henderson, Jaffe, and Trajtenberg (REStat 1998), Jaffe (AER 1989), Zucker, Darby, and Brewer (AER 1998), Audretsch and Feldman (AER 1996), Mowery and Ziedonis (NBER 2001).
Transmission of Tacit Knowledge • Feldman (1994) • Collaboration and Hiring • Cohen, Nelson, and Walsh (Mgt Science 2002), Almeida and Kogut (Mgt Science 1999), Zucker, Darby, and Armstrong (NBER 2001), Adams, Black, Clemmons, and Stephan (NBER 2004)
Data 1. Patent Bibliographic data (Patents BIB) • U.S. utility patents issued between January 1975 and February 2002. • Patent ID number, patent application and granting, patent assignee, and geographic information (country, state, city, address) on all inventors involved. • The number of patents during this period is 2,493,610 and the number of inventor records is 5,105,754.
2. ProQuest Digital Dissertations Abstracts • Author, title of dissertation, degree conferring institution, date of degree, academic field, and type of degree • From over 1,000 North American graduate schools and European universities. • For those who earned degrees in all natural science and engineering fields between 1945 and 2003 • 1,068,551 degree holders.
3. The Compact D/SEC • 12,000 publicly traded firms • At least $5 million in assets and at least 500 shareholders • Information obtained from Annual Reports, 10-K and 20-F filings, and Proxy Statements for those companies. • We selected pharmaceutical and semiconductor firms in the Compact D/SEC data by their primary SIC. • We selected only the years 1989 through 1997 because of the patent grant lag.
4. Standard & Poor’s Annual Guide to Stocks – Directory of Obsolete Securities • histories of firm ownership changes due to mergers and acquisitions, bankruptcy, dissolution, and name changes, updated through December 2002. 5. NBER Patent-Citations • collected by Hall, Jaffe and Trajtenberg (2001) • all citations made and received by patents granted between 1975 and 1999. (16,522,438 citation records) 6. Thomas Register • Firm founding year
3 Steps in Data Construction • Identifying the same inventor among ‘same/similar’ names (Patent BIB) • Identifying the Ownership Structure of Subsidiaries (Compact D/SEC, S&P) • Combining Patent-Inventor Data with Firm Data and Patent Citation Data (Diagram: sources combined are Patent BIB, ProQuest, Compact D/SEC, S&P, Thomas Register, and Citation data)
Step 1: Identifying the Same Inventor • Inventor name variants • Adam Smith vs. Adam Smith? • Adam E. Smith vs. Adam Smith? • Adam Smyth vs. Adam Smith?
The size of the data (1975-2002) • 2,493,610 patents • 5,105,754 inventor names • Name of the inventor (last, first, middle, surname modifier) • Street address, zip • City, state, country • Over 16 million patent citations (A. Jaffe)
How to identify? • Pair each name with every other name and compare: N(N-1)/2 unique pairs = (5,105,754 x 5,105,753) / 2 ≈ 13 trillion pairs • Trajtenberg (2004)
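The pair count on this slide can be checked with a few lines of Python. The sketch below also includes a hypothetical `blocked_pairs` helper showing how many comparisons remain if only records sharing a key (for example a name code) are paired; that blocking step is an illustration and is not described on the slide.

```python
from collections import Counter


def unique_pairs(n: int) -> int:
    """Number of unordered pairs among n records: n(n-1)/2."""
    return n * (n - 1) // 2


N_INVENTOR_RECORDS = 5_105_754  # inventor records, from the slide

total = unique_pairs(N_INVENTOR_RECORDS)
print(f"{total:,} pairs (about {total / 1e12:.0f} trillion)")


def blocked_pairs(keys) -> int:
    """Pairs left to compare if only records with the same key are paired.

    Hypothetical helper: `keys` holds one blocking key per record.
    """
    return sum(unique_pairs(count) for count in Counter(keys).values())
```

With five records whose keys are `["a", "a", "b", "b", "b"]`, blocking leaves 1 + 3 = 4 comparisons instead of 10.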
How to Identify? a. The pair is a ‘Match’ if • Last names (SOUNDEX coded) and first names in the pair are the same, and • at least one of the categories below is the same • Full Address: same street address + city + country • Self Citation: the same name is found in the citing patent • Shared Partner(s): the two names in the pair share the same partner c.f. Strong Criteria (Trajtenberg 2004)
SOUNDEX Coding Method • Codes a last name by the way it sounds rather than the way it is spelled. • Expands the list of similar last names to overcome the potential for inconsistent translations of foreign names into English. • Examples: PETTIT (P330000), Chang (C520000), Chiang (C520000) • Letters are given numerical values from 1 to 6: 1 for B, F, P, V; 2 for C, G, J, K, Q, S, X, Z; 3 for D, T; 4 for L; 5 for M, N; 6 for R; 0 for punctuation, H, W, Y
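A minimal sketch of the coding scheme above, using the letter table from the slide. Several details are assumptions taken from the common American Soundex convention rather than from the slide: the first letter is kept, vowels and Y are dropped but separate repeated codes, H and W are skipped without separating them, and the code is padded with zeros to seven characters to match the slide's examples.

```python
# Letter-to-digit table from the slide.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str, length: int = 7) -> str:
    """Return a zero-padded Soundex-style code (first letter + digits)."""
    letters = [c for c in name.upper() if c.isalpha()]  # drop punctuation
    if not letters:
        return ""
    out = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # assumption: H and W do not separate repeated codes
        code = CODES.get(ch, "")
        if code and code != prev:
            out += code
        prev = code  # vowels and Y reset prev, so a later repeat is kept
    return (out + "0" * length)[:length]


for example in ("PETTIT", "Chang", "Chiang", "Smith", "Smyth"):
    print(example, soundex(example))
```

Under these assumptions the function reproduces the slide's codes for PETTIT, Chang, and Chiang, and it gives Smith and Smyth the same code, which is the kind of spelling variant the method is meant to absorb.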
b. The pair is a ‘Match’ if • Full last names (not SOUNDEX coded) and first names in the pair are the same, and • at least one of the categories below is the same • Zip Code • Full Middle Name c.f. Medium Criteria (Trajtenberg 2004) c. The pair is a ‘Mismatch’ if middle name initials are different.
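Rules a, b, and c can be sketched as a single pairwise test. The `Record` fields, the precomputed `last_soundex` code, the `self_citation` flag, and the order in which the rules are applied (rule c checked first) are all assumptions made for illustration; the slides do not specify the record layout or how the rules interact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical inventor record; field names are illustrative."""
    last: str
    first: str
    middle: str = ""
    last_soundex: str = ""             # precomputed SOUNDEX code of last name
    address: str = ""                  # street address + city + country
    zip_code: str = ""
    partners: frozenset = frozenset()  # names of co-inventors


def initials_conflict(a: Record, b: Record) -> bool:
    """Rule c: both middle names present and their initials differ."""
    return bool(a.middle and b.middle
                and a.middle[0].upper() != b.middle[0].upper())


def is_match(a: Record, b: Record, self_citation: bool = False) -> bool:
    if initials_conflict(a, b):
        return False
    if a.first.lower() != b.first.lower():
        return False
    # Rule a: same SOUNDEX-coded last name plus one supporting category.
    if a.last_soundex and a.last_soundex == b.last_soundex:
        same_address = bool(a.address) and a.address == b.address
        if same_address or self_citation or (a.partners & b.partners):
            return True
    # Rule b: same full last name plus zip code or full middle name.
    if a.last.lower() == b.last.lower():
        same_zip = bool(a.zip_code) and a.zip_code == b.zip_code
        # Assumption: a "full" middle name is longer than a bare initial.
        same_middle = len(a.middle) > 1 and a.middle.lower() == b.middle.lower()
        if same_zip or same_middle:
            return True
    return False
```

For example, `Record("Smith", "Adam", "E", "S530000", zip_code="14260")` and `Record("Smith", "Adam", "", "S530000", zip_code="14260")` match under rule b, while two records whose middle names start with different letters never match.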
Impose Transitivity • If A is matched to B and B is matched to C, then A is matched to C.
An Example • Match: 1:2, 1:5, 1:6, 2:3, 2:4, 2:5, 2:6, 5:6, 3:6 • ID 5 is identified to be the same inventor through transitivity
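Imposing transitivity amounts to taking connected components over the matched pairs. A minimal sketch using a union-find structure follows; the function name and the sample pairs in the usage note are hypothetical and are not the pairs from the slide's example table.

```python
def group_inventors(ids, matched_pairs):
    """Group record IDs into inventors by closing matches under transitivity."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller ID as the representative of the merged group.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling `group_inventors([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])` puts records 1, 2, and 3 in one group even though 1 and 3 were never matched directly, and records 4 and 5 in another.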
126 mismatches found after imposing transitivity • 3 categories of Mismatches i) from data error ‘Laszlo Andra Szporny’ vs. ‘Laszlo Eszter Szporny’ ii) Inventor with 2 Middle names iii) same Last and First names appear in the same patent
Matching Results • 2.3 million unique inventors (45%) out of 5.1 million names • c.f. Trajtenberg (2004): 1.6 million distinct inventors (37%) out of 4.3 million names. (Our patent database is larger because it includes additional years, 2000-2002.) • Differences from Trajtenberg (2004): • Trajtenberg uses the same assignee as a matching criterion, which can bias measured mobility among inventors, and assigns scores for each matching criterion. • Instead we apply the criterion that two inventors are not treated as a match if their middle name initials differ. • The SOUNDEX coding system sometimes specifies names so loosely that apparently different last names are considered a match.
Add Dissertation Abstract Information to Inventor data • Match degree holders in the Dissertation Abstract data with the Inventor data. • The Dissertation Abstract data contain the full name of each author in a single string. • Convert the last, first, and middle names in the inventor data to a single aggregated name string. • 64,507 (3 percent) Ph.D. or equivalent degree holders out of 2.3 million uniquely identified inventors
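The name-string step can be sketched as follows. The "Last, First Middle" layout of the dissertation author string, the dictionary field names, and the normalization rules (lowercasing, stripping punctuation) are assumptions for illustration; the slides only say that inventor name parts are aggregated into one string and compared with the author string.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def inventor_key(last: str, first: str, middle: str = "") -> str:
    """Aggregate inventor name parts into one comparable string."""
    return normalize(f"{last} {first} {middle}")


def author_key(full_name: str) -> str:
    """Key for a dissertation author, assumed to be 'Last, First Middle'."""
    return normalize(full_name)


def link_degrees(inventors, dissertations):
    """Map each inventor id to the dissertation records sharing its name key."""
    by_key = {}
    for d in dissertations:
        by_key.setdefault(author_key(d["author"]), []).append(d)
    return {
        inv["id"]: by_key.get(
            inventor_key(inv["last"], inv["first"], inv.get("middle", "")), [])
        for inv in inventors
    }
```

An exact string match like this is strict: "Smith, Adam E." links to an inventor recorded as Adam E. Smith but not to one recorded without a middle initial.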
Step 2: Ownership Structure of Subsidiaries • Necessary when combining firm-level information with the patent data file • Patent Assignee: either a parent firm or one of its subsidiaries. • A firm identifier does not exist. • Frequent changes in firm ownership and corporate names - Between 1989 and 1997, 152 firms were merged, 15 firms were acquired, and 145 firms changed their names • We use the ownership structure of subsidiaries, M&A, and name change history to • Relate each assignee to a firm • Identify the firm for which each inventor is innovating
1. Select firms in two industries from the Compact D/SEC • Primary SIC 2834 (pharmaceutical preparation) or Primary SIC 3674 (semiconductor and related devices) 2. Use S&P data • to determine whether the change of an inventor’s firm is due to firm-level M&A and/or corporate name changes. 3. List of subsidiaries in the Compact D/SEC throughout the period 1989-1997 • not always complete • if a firm was ever listed as a subsidiary, we treat it as a subsidiary throughout 1989-1997 4. Combine with firms’ founding years
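The "once a subsidiary, a subsidiary throughout 1989-1997" rule can be sketched as pooling the yearly subsidiary lists into one lookup. The input layout, the uppercase name normalization, and the tie-breaking rule (the earliest listing wins when a subsidiary appears under two parents) are assumptions for illustration; in the source, ownership changes are resolved with the S&P histories.

```python
def subsidiary_to_parent(yearly_lists):
    """Pool yearly subsidiary lists into one subsidiary -> parent mapping.

    yearly_lists: {year: {parent_name: [subsidiary_name, ...]}}
    A subsidiary listed in any year is treated as a subsidiary of that
    parent for the whole period.
    """
    mapping = {}
    for year in sorted(yearly_lists):
        for parent, subsidiaries in yearly_lists[year].items():
            for sub in subsidiaries:
                mapping.setdefault(sub.strip().upper(), parent.strip().upper())
    return mapping


def firm_for_assignee(assignee, mapping, parents):
    """Return the parent firm for a patent assignee, or None if unknown."""
    key = assignee.strip().upper()
    if key in parents:
        return key
    return mapping.get(key)
```

With the hypothetical input `{1990: {"Acme Pharma": ["Acme Labs"]}, 1995: {"Acme Pharma": []}}`, an assignee named "Acme Labs" on a 1995 patent still resolves to ACME PHARMA even though the 1995 list omits it.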
Step 3: Combining Inventor data with firm data and Patent Citation data • Combine inventor file with firm-level data • Patent-inventor-firm matched data • Link to Hall, Jaffe, and Trajtenberg citation data (2001) • 16,522,438 citations for all granted patents applied from 1975 through 1999.
Descriptive Statistics 1975 - 2002 • 2,493,610 patents • 2.05 inventors per patent • 2,299,579 unique inventors
Descriptive Statistics * 3 percent (64,507) of unique inventors are Ph.D. or equivalent degree holders
Number of Patents Granted by Year of Application * Grant lag - 97% of patents are granted within the first 4 years of the application date (Hall, Griliches, and Hausman 1986)
International Knowledge Flow • Trends in U.S. firms’ access to researchers with overseas research experience • Role of research personnel as a pathway for the diffusion of ideas from foreign countries to U.S. innovators • The firm-level determinants of accessing innovations developed overseas.
Inventors with Foreign Experience in US Domestic Patents † Resided in foreign countries in the previous 10 years
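The footnote's definition of foreign experience can be sketched as a flag computed from an inventor's earlier patent records. The record layout (year, country of residence), the "US" country code, and the exact window boundaries (the 10 years strictly before the current patent's year) are assumptions; the slide states only that the inventor resided abroad in the previous 10 years.

```python
def has_foreign_experience(history, year, window=10, home="US"):
    """True if any earlier record within `window` years shows a non-home residence.

    history: iterable of (year, country) pairs from the inventor's patents.
    """
    return any(
        country != home and year - window <= past_year < year
        for past_year, country in history
    )
```

For an inventor with records `[(1985, "JP"), (1992, "US")]`, the flag is true for a 1993 patent and false for a 1996 patent, because by 1996 the 1985 record falls outside the 10-year window.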
Determinants of Citation to Foreign-Assigned Patents Dependent variable = logit transform of CITE_FRGN Note: Rows show the estimated coefficient and the t statistic for each regressor. The result for a constant term is suppressed. The t statistic is based on the Huber-White sandwich estimator of variance.
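For reference, a logit transform maps a share p in (0, 1) to log(p / (1 - p)). The sketch below assumes CITE_FRGN is such a share; the clipping of values at exactly 0 or 1 with a small `eps` is an assumption added so the function is defined everywhere, and the slides do not say how boundary values were handled.

```python
import math


def logit(p: float, eps: float = 1e-6) -> float:
    """Logit transform log(p / (1 - p)), clipping p into [eps, 1 - eps]."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))
```

A share of 0.5 maps to 0, shares above 0.5 map to positive values, and shares below 0.5 map to negative values, so the transformed variable is unbounded and suited to a linear regression.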
Conclusion • An increase in the extent that U.S. innovators access researchers with foreign R&D experience in recent years • An increase in U.S. firms’ employment of foreign-residing researchers • The fraction of research-active U.S. residents with foreign research experience appears to be falling • Possibly to capture the geographically dispersed knowledge spillovers. • Having researchers with research experience abroad seems to facilitate access to foreign-produced knowledge. • In the semiconductor industry, smaller firms and older firms are more likely to make use of the output of non-U.S. R&D. • In the pharmaceutical industry, younger firms are more likely to make use of the output of non-U.S. R&D.
Future Extension • The consequences of the mobility of R&D personnel for firm R&D. • The impact of the arrival of a researcher with a particular set of R&D experiences on the character and quantity of R&D done by a firm • The importance of inter-firm mobility for technological diffusion. • How firms organize the R&D enterprise, the extent of collaboration among geographically dispersed scientists, and the extent of interaction among scientists with different backgrounds.