Andrei Mogoutov, Alberto Cambrosio, Peter Keating & Philippe Mustar

6th Biennial International Triple Helix Conference on University-Industry-Government Links, Singapore, May 16-18, 2007

Biomedical innovation at the laboratory, clinical and commercial interface: Mapping research grants, publications and patents in the field of microarrays

Main goals of this paper:
• To analyze biomedical innovation by triangulating three sources of information: publications, patents and research projects (see Verganti et al.)
• In particular, to develop a methodology for linking publication, patent and project databases by using emergent (rather than pre-established) categories
• Methods:
  • Heterogeneous network analysis (ReseauLu X2)
  • Text mining (SPSS LexiQuest Mine)

Case study: Microarrays
• A DNA microarray (also known as a biochip, DNA chip, gene array, etc.) is a collection of microscopic DNA spots, commonly representing single genes, arrayed on a solid surface by covalent attachment to chemically suitable matrices
• Whereas earlier molecular genetic approaches examined single genes, a microarray experiment analyzes hundreds or thousands of genes simultaneously
• Microarrays have become a key technology of the (post)genomic era
• Compound annual growth rate of the microarray market, 1999-2004: 63%
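A compound rate multiplies rather than adds across years, so the 63% figure implies far more than a tripling of the market. A minimal sketch of the arithmetic, assuming 1999-2004 spans five annual compounding steps (the slide gives no absolute market size, so only the implied growth multiple is computed):

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth multiple implied by a compound annual growth rate."""
    return (1.0 + cagr) ** years


def cagr_from_values(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1.0 / years) - 1.0


# Assumption: 1999-2004 counted as five compounding intervals.
multiple = growth_multiple(0.63, 5)
print(f"{multiple:.1f}x")  # 11.5x
```

Under that assumption, the 2004 market would be roughly 11.5 times its 1999 size; `cagr_from_values` inverts the calculation when start and end values are known.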

Databases
• Publications:
  • PubMed: robust keyword system; biomedical
  • Web of Science: addresses and citations; general S&T
  • [PubMed/WoS intersection]
• Research projects:
  • CRISP: NIH-funded projects; biomedical
  • [NSF]
• Patents:
  • Derwent Innovations Index
  • [USPTO] / [EPO]
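The bracketed PubMed/WoS intersection implies matching records across the two publication databases, but the slides do not say how. A minimal sketch, assuming each record is a dict with hypothetical `title` and `year` fields and that a normalized title plus year is an acceptable match key (a shared identifier such as a DOI, where both databases carry it, would be more reliable):

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase, turn punctuation runs into spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def intersect_records(pubmed: list[dict], wos: list[dict]) -> list[tuple[dict, dict]]:
    """Pair PubMed and WoS records that share a normalized title and year."""
    wos_index = {(normalize_title(r["title"]), r["year"]): r for r in wos}
    pairs = []
    for rec in pubmed:
        key = (normalize_title(rec["title"]), rec["year"])
        if key in wos_index:
            pairs.append((rec, wos_index[key]))
    return pairs


# Toy records (hypothetical, not data from the study).
pubmed = [{"title": "Gene expression profiling with DNA microarrays.", "year": 2001},
          {"title": "A study of something else", "year": 2002}]
wos = [{"title": "GENE EXPRESSION PROFILING WITH DNA MICROARRAYS", "year": 2001}]
print(len(intersect_records(pubmed, wos)))  # 1
```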

1. Characterizing the field of microarrays

Publications (PubMed)

Publications (Web of Science)

Publications: most cited authors

Mapping: ReseauLu X2

Co-authorship network (most cited authors)
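In a co-authorship network, two authors are linked whenever they sign a paper together, and the link weight counts how often they do so. A minimal sketch of that construction, assuming each paper arrives as a list of already-disambiguated author names and that the map is restricted to a supplied set of most-cited authors (the names below are placeholders):

```python
from collections import Counter
from itertools import combinations


def coauthorship_edges(papers, keep=None):
    """Count co-signed papers for each unordered author pair.

    papers: iterable of author-name lists, one per paper.
    keep:   optional set of authors to retain (e.g. the most cited).
    """
    edges = Counter()
    for authors in papers:
        names = sorted({a for a in authors if keep is None or a in keep})
        for pair in combinations(names, 2):
            edges[pair] += 1
    return edges


papers = [["Author A", "Author B", "Author C"],
          ["Author B", "Author A"],
          ["Author C", "Author D"]]
edges = coauthorship_edges(papers, keep={"Author A", "Author B", "Author C"})
print(edges[("Author A", "Author B")])  # 2
```

Substituting institution names for author names would give a collaborative institutional network built the same way.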

Collaborative institutional network

Institutional network (4 nearest nodes); node types: university, hospital, biotech company, regulatory agency

Journal inter-citation network (5 nearest nodes); highlighted: cancer cluster
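The "nearest nodes" annotations indicate that each map shows only the strongest links around each node. ReseauLu's exact filtering rule is not described here; a minimal sketch, assuming an edge survives when it ranks among the k heaviest links of at least one of its endpoints (journal names are placeholders):

```python
from collections import defaultdict


def k_nearest_filter(edges: dict[tuple[str, str], float], k: int) -> set[tuple[str, str]]:
    """Keep each edge that ranks among the k strongest links of at least one endpoint."""
    neighbours = defaultdict(list)
    for (a, b), weight in edges.items():
        neighbours[a].append((weight, b))
        neighbours[b].append((weight, a))
    kept = set()
    for node, links in neighbours.items():
        # Strongest first; ties broken alphabetically for reproducibility.
        links.sort(key=lambda link: (-link[0], link[1]))
        for _, other in links[:k]:
            kept.add(tuple(sorted((node, other))))
    return kept


journals = {("J1", "J2"): 5, ("J1", "J3"): 1, ("J1", "J4"): 3, ("J3", "J4"): 2}
print(sorted(k_nearest_filter(journals, k=1)))
# [('J1', 'J2'), ('J1', 'J4'), ('J3', 'J4')]
```

Here the weak J1-J3 link disappears because it is not the top link of either journal.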

Patents (Derwent)

CRISP Projects

2. Database bridges
• 2a. Via authors and pre-established (institutional) categories

CRISP projects by Institute

Link via authors, categories by Institute: CRISP projects vs. publications

Link via authors, categories by Institute: CRISP projects vs. citations

Link via authors, categories by Institute: CRISP projects vs. patents
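Linking via authors connects a project to a publication (or patent) when the project's investigator also appears among its authors (or inventors); results are then tallied by the NIH Institute funding the project. A minimal sketch, assuming hypothetical `pi`, `institute`, `id` and `authors` fields and a deliberately crude name normalization. Real data would need author disambiguation, since homonyms inflate and spelling variants deflate the counts:

```python
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Crude normalization: lowercase, treat periods as spaces, collapse spaces."""
    return " ".join(name.lower().replace(".", " ").split())


def linked_documents_by_institute(projects, documents):
    """Count distinct documents linked to each institute's projects via shared names."""
    by_author = defaultdict(set)
    for doc in documents:
        for author in doc["authors"]:
            by_author[normalize_name(author)].add(doc["id"])
    linked = defaultdict(set)
    for project in projects:
        linked[project["institute"]] |= by_author.get(normalize_name(project["pi"]), set())
    return {institute: len(ids) for institute, ids in sorted(linked.items())}


# Toy records (hypothetical).
projects = [{"pi": "Smith J", "institute": "NCI"},
            {"pi": "Doe A", "institute": "NCI"},
            {"pi": "Lee K.", "institute": "NHGRI"}]
publications = [{"id": 1, "authors": ["Smith J", "Doe A"]},
                {"id": 2, "authors": ["Lee K"]},
                {"id": 3, "authors": ["Smith J."]}]
print(linked_documents_by_institute(projects, publications))  # {'NCI': 2, 'NHGRI': 1}
```

Passing patents with inventor lists in the `authors` field would give the projects-vs-patents comparison in the same way.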

2. Database bridges
• 2b. Via content (emergent categories)

Text mining: SPSS LexiQuest Mine and Text Mining Builder (panels: dictionary interface; concept extraction)

Methodology for generating emergent categories
• The chosen database is text-mined with NLP software to extract the relevant concepts (composite terms and uniterms); in the present case, WoS was chosen over CRISP because it covers both biomedical and non-biomedical domains
• The most relevant (specific) concepts are selected with a chi-square (χ²) filter
• A co-occurrence map of the selected concepts is built (nearest nodes), and clusters corresponding to sub-domains are identified with a modified fuzzy K-means clustering algorithm
• The list of concepts defining each sub-domain is then used to analyze the other databases
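The slides do not specify what the chi-square filter compares. One common reading, assumed here, scores each concept with a 2×2 Pearson χ² test of its document frequency in the target corpus against a hypothetical background corpus, keeping over-represented concepts above the df = 1 critical value of 3.84 (p = 0.05):

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the table [[a, b], [c, d]] (no continuity correction).

    a: target documents containing the concept    b: target documents without it
    c: background documents containing it         d: background documents without it
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_specific_concepts(target_df, n_target, background_df, n_background, threshold=3.84):
    """Keep concepts significantly over-represented in the target corpus."""
    selected = []
    for concept, a in target_df.items():
        c = background_df.get(concept, 0)
        score = chi_square_2x2(a, n_target - a, c, n_background - c)
        if score >= threshold and a / n_target > c / n_background:
            selected.append(concept)
    return sorted(selected)


target = {"probe design": 40, "cell": 50}      # toy document frequencies
background = {"probe design": 5, "cell": 500}
print(select_specific_concepts(target, 100, background, 1000))  # ['probe design']
```

"cell" is equally frequent in both corpora (50%), so it scores zero and is dropped as non-specific.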

Emergent sub-domains
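Emergent sub-domains come from clustering the concept co-occurrence map. The authors' modifications to fuzzy K-means are not given; the sketch below is plain fuzzy c-means (Bezdek's formulation, fuzzifier m = 2), assuming each concept is represented by a numeric vector such as its row of co-occurrence counts. Unlike hard k-means, every concept receives a membership degree in each cluster, so a concept can straddle two sub-domains:

```python
import math


def _dist(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def _memberships(points, centers, m):
    """Membership of each point in each cluster; each row sums to 1."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        d = [_dist(p, v) for v in centers]
        if 0.0 in d:
            # Point coincides with a centre: split membership among coinciding centres.
            hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
            rows.append([h / sum(hits) for h in hits])
        else:
            rows.append([1.0 / sum((dk / dj) ** exponent for dj in d) for dk in d])
    return rows


def fuzzy_c_means(points, c, m=2.0, tol=1e-6, max_iter=300):
    """Plain fuzzy c-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < c:
        farthest = max(points, key=lambda p: min(_dist(p, v) for v in centers))
        centers.append(list(farthest))
    dim = len(points[0])
    for _ in range(max_iter):
        u = _memberships(points, centers, m)
        new_centers = []
        for k in range(c):
            w = [row[k] ** m for row in u]
            total = sum(w)
            new_centers.append([sum(wi * p[j] for wi, p in zip(w, points)) / total
                                for j in range(dim)])
        shift = max(_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, _memberships(points, centers, m)


# Toy concept vectors (hypothetical co-occurrence profiles).
concepts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, u = fuzzy_c_means(concepts, c=2)
labels = [max(range(2), key=lambda k: row[k]) for row in u]
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Taking the highest-membership cluster, as `labels` does, recovers a hard partition; keeping the full membership rows preserves the overlap between sub-domains.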

Publications (WoS) by sub-domains

CRISP projects by sub-domains

Patents by sub-domains

Share (%) of each sub-domain in projects, patents and publications
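Comparing sub-domain shares requires applying each sub-domain's concept list to every database. A minimal sketch, assuming each record has already been reduced to a set of extracted concepts, that a record goes to the sub-domain with which it shares the most concepts (ties go to the alphabetically first sub-domain; records with no match stay unassigned), and that percentages are taken over all records. The concept lists below are hypothetical:

```python
from collections import Counter


def subdomain_shares(records, subdomains):
    """Percentage of records assigned to each sub-domain.

    records:    list of concept sets, one per publication, project or patent.
    subdomains: mapping sub-domain name -> set of defining concepts.
    """
    counts = Counter()
    for concepts in records:
        best, best_hits = None, 0
        for name in sorted(subdomains):
            hits = len(concepts & subdomains[name])
            if hits > best_hits:
                best, best_hits = name, hits
        if best is not None:
            counts[best] += 1
    if not records:
        return {}
    return {name: 100.0 * counts[name] / len(records) for name in sorted(subdomains)}


subdomains = {"SNPs": {"snp", "genotyping"}, "Bioinformatics": {"clustering", "software"}}
patents = [{"snp", "array"}, {"software", "clustering", "snp"}, {"genotyping"}, {"cell"}]
print(subdomain_shares(patents, subdomains))
# {'Bioinformatics': 25.0, 'SNPs': 50.0}
```

Running the same function on the publication, project and patent collections gives side-by-side percentages; shares need not sum to 100 because unassigned records stay in the denominator.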

SNPs

Bioinformatics

Acknowledgments
• Research for this paper was supported by grants from:
  • CIHR
  • FQRSC
  • SSHRC
