Detection of different types of bibliometric performance at the individual level in the Life Sciences: methodological outline
Rodrigo Costas & Ed Noyons
CWTS – Leiden University, the Netherlands
Outline
• Introduction
• Main objective
• Methodological development
• Some results
• Conclusions and further research
Introduction
• Individual scholars: central to science but difficult to measure (evaluate)
• Warnings of misuse of bibliometrics at the individual level
  • Glänzel & Wouters (2013), The dos and don’ts in individual level bibliometrics
• Problems of bibliometrics at the individual level:
  • Difficult data collection
  • Importance of multidimensionality and contextualization
  • Lack of reliability of indicators
• Main considerations:
  • Don’t use only single indicators (multidimensionalize!)
  • Don’t use them alone (contextualize! peer review!)
  • Don’t consider only raw scores (cluster! allow ties! No ranks!)
What can we do with bibliometrics at the individual level?
• To describe bibliometrically the activity of individual scholars
  • Who (and how many) is active in a field or in a topic?
  • How do people collaborate or organize in groups? Who could be interesting partners for collaboration on a topic?
  • Mobility?
• To inform types of bibliometric performance
  • What type of performance do individuals exhibit bibliometrically?
  • Top producers, selective researchers, hubs, etc.
Main objective
• Bibliometrically…
  • To identify scholars active in the Life Sciences all over the world
  • To model different types of scientific performance based on bibliometric indicators
• … and they must be Dutch or Belgian
Delineation of the LS core (worldwide)
• Based on the meso-fields of the paper-based (publication-level) CWTS classification (Waltman & van Eck, 2013)
• Input from experts (Crucell):
  • 373 ‘meso-fields’ selected by the experts as the ‘core’ of LS
  • 8,139,922 publications (41% of the whole database!)
• Period covered by the LS core: 1993-2012
CWTS author disambiguation algorithm (Caron & van Eck, 2013)
• Applied to the whole database (1980-2012)
• Main characteristics
  • Based on:
    • Co-authorship, references, addresses, journals, etc.
    • Rules
    • Other refinements
  • Conservative approach
• Preliminary results: 95% precision and 90% recall
• Total ‘unique’ authors identified: 34,697,674
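The slide reports precision and recall but not how they were measured. One common way to evaluate author disambiguation is pairwise: every pair of publications placed under the same author counts as a prediction and is checked against a hand-labelled gold standard. The sketch below assumes that pairwise definition; the function names and the toy data are hypothetical and are not taken from the CWTS algorithm.

```python
from itertools import combinations


def same_author_pairs(assignment):
    """Return the set of publication pairs that share an author label.

    `assignment` maps a publication id to an author-cluster label.
    """
    by_author = {}
    for pub, author in assignment.items():
        by_author.setdefault(author, []).append(pub)
    pairs = set()
    for pubs in by_author.values():
        pairs.update(frozenset(pair) for pair in combinations(sorted(pubs), 2))
    return pairs


def pairwise_precision_recall(predicted, gold):
    """Pairwise precision and recall of `predicted` clusters against `gold`."""
    pred_pairs = same_author_pairs(predicted)
    gold_pairs = same_author_pairs(gold)
    correct = len(pred_pairs & gold_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 1.0
    recall = correct / len(gold_pairs) if gold_pairs else 1.0
    return precision, recall


# Toy data (hypothetical): the gold standard has one author with three papers
# and one with two; the prediction splits the first author into two clusters.
gold = {"p1": "A", "p2": "A", "p3": "A", "p4": "B", "p5": "B"}
predicted = {"p1": "x", "p2": "x", "p3": "y", "p4": "z", "p5": "z"}
precision, recall = pairwise_precision_recall(predicted, gold)
```

In this toy case the prediction never merges two different people (precision 1.0) but misses half of the true pairs (recall 0.5). A conservative approach, which prefers splitting an author over merging two, tends to show the same pattern of precision above recall.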
Selection of LS researchers (worldwide)
• 10,008,311 unique disambiguated authors!
  • 66% of them have only 1 publication
  • 14% have 5 or more publications (1,388,080 authors)
• Collection of their ‘full oeuvres’ (including the rest of their publications outside the LS ‘core’), period 1980-2011
• Final selection of researchers with:
  • >50% of their output in the LS core, focusing on the period 1993-2011
• Final set of researchers: 1,309,458
• This will be our “context”!
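The selection steps above can be sketched as a two-stage filter. This is a minimal illustration, assuming the "5 or more publications" threshold is applied to publications in the LS core and the ">50%" share is computed over the full oeuvre; the record type, field names and toy authors are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AuthorOeuvre:
    author_id: str
    pubs_total: int    # full oeuvre in the selection period
    pubs_in_core: int  # publications inside the LS core


def select_ls_researchers(authors, min_core_pubs=5, min_core_share=0.5):
    """Keep authors with enough core publications and a majority of output in the core."""
    selected = []
    for author in authors:
        if author.pubs_in_core < min_core_pubs:
            continue  # too few publications in the LS core
        if author.pubs_in_core / author.pubs_total > min_core_share:
            selected.append(author)
    return selected


# Toy data (hypothetical authors, not from the study).
authors = [
    AuthorOeuvre("a1", pubs_total=12, pubs_in_core=9),  # kept: 75% in core
    AuthorOeuvre("a2", pubs_total=20, pubs_in_core=8),  # dropped: 40% in core
    AuthorOeuvre("a3", pubs_total=4, pubs_in_core=4),   # dropped: fewer than 5
    AuthorOeuvre("a4", pubs_total=10, pubs_in_core=5),  # dropped: exactly 50%
]
selected = select_ls_researchers(authors)
selected_ids = [author.author_id for author in selected]
```

Note the strict inequality: an author with exactly half of the output in the core is not selected, matching the ">50%" wording of the slide.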
Identification of Dutch/Belgian authors
• ‘Certain linkages’ of authors with NL/BE
  • E-mail (.nl, .be); only 1 country (NL or BE); corresponding author; WoS direct link author/address; 1st author – 1st address
• Strong linkages (>10%)
• Calculation of the MCAD and MPRAD
Modeling performance: basic approach I
• Defining types of performance:
  • 3 ‘performance dimensions’ (multidimensional approach):
    • P: total number of publications
    • PP top 10%: proportion of publications in the top 10% most cited
    • MNJS: mean normalized journal score
  • Calculated for all the LS authors worldwide (1,309,458): percentiles 25 and 50 (classificatory approach)
• Time
  • Full period (1980-2011)
  • Cohort of ‘scientific age’: 2000-2011
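As a rough illustration of the three dimensions, the sketch below computes P, PP top 10% and MNJS for one author and then positions a score within the worldwide context. It assumes each publication record already carries a top-10% flag and a normalized journal score; the class, the function names and the toy values are hypothetical, and the slides do not specify how the percentile classes were computed.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Publication:
    in_top10: bool        # assumed pre-computed: among the top 10% most cited
    journal_score: float  # assumed pre-computed normalized journal score


def indicators(pubs):
    """Return (P, PP top 10%, MNJS) for one author's list of publications."""
    p = len(pubs)
    if p == 0:
        return 0, 0.0, 0.0
    pp_top10 = sum(pub.in_top10 for pub in pubs) / p
    mnjs = sum(pub.journal_score for pub in pubs) / p
    return p, pp_top10, mnjs


def share_below(value, context_values):
    """Fraction of context authors scoring strictly lower; tied authors share a position."""
    ordered = sorted(context_values)
    return bisect_left(ordered, value) / len(ordered)


# Toy data (hypothetical).
pubs = [
    Publication(True, 1.5),
    Publication(False, 0.5),
    Publication(False, 1.0),
    Publication(True, 2.0),
]
p, pp_top10, mnjs = indicators(pubs)
position = share_below(p, [1, 2, 4, 4, 8])
```

Because `share_below` counts only strictly lower scores, authors with equal values land in the same class, in line with the "allow ties, no ranks" recommendation from the introduction.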
Modeling performance (suggestions)
[Figure: the indicators P, PP top 10% and MNJS on a scale from lowest to highest, with the labelled types ‘Top producers’, ‘Top toppers’ and ‘High impact’ (‘High potential’)]
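The exact rules behind the type labels are not recoverable from this slide. The sketch below shows one possible reading, stated as an assumption: a ‘Top producer’ is in the highest class for P, a ‘Top topper’ is in the highest class for all three indicators, and a ‘High impact’ author is in the highest class for PP top 10% and MNJS but not for P. The function name, the `cut` dictionary and the threshold values are hypothetical.

```python
def performance_types(p, pp_top10, mnjs, cut):
    """Assign illustrative type labels to one author.

    `cut` maps an indicator name to the threshold of its highest class.
    The rules below are assumptions, not the authors' definitions.
    """
    high_p = p >= cut["P"]
    high_pp = pp_top10 >= cut["PP_top10"]
    high_mnjs = mnjs >= cut["MNJS"]

    labels = []
    if high_p:
        labels.append("Top producer")
    if high_p and high_pp and high_mnjs:
        labels.append("Top topper")
    if not high_p and high_pp and high_mnjs:
        labels.append("High impact")
    return labels


# Hypothetical thresholds, for illustration only.
cut = {"P": 20, "PP_top10": 0.2, "MNJS": 1.5}
prolific_and_cited = performance_types(30, 0.30, 2.0, cut)
prolific_only = performance_types(30, 0.05, 1.0, cut)
selective = performance_types(8, 0.30, 2.0, cut)
```

Returning a list rather than a single label lets the types overlap, so an author can be a ‘Top producer’ and a ‘Top topper’ at once.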
Results
• Presence of types of performance worldwide:
  1) ‘Top toppers full period’: 58,073 (4%)
  2) ‘Top producers full period’ (they are all included in 1): 327,375 (25%)
  3) ‘High impact full period’: 91,111 (7%)
  4) ‘Top toppers cohort’: 24,963 (4%)
  5) ‘Top producers cohort’: 153,593 (25%)
  6) ‘High potential cohort’: 25,213 (4%)
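The full-period percentages can be reproduced from the counts and the size of the context (1,309,458 researchers). A short check, assuming that context is the denominator; the size of the 2000-2011 cohort is not stated in the slides, so the cohort rows are left out.

```python
CONTEXT_SIZE = 1_309_458  # final set of LS researchers reported earlier

full_period_counts = {
    "Top toppers": 58_073,
    "Top producers": 327_375,
    "High impact": 91_111,
}

# Share of the worldwide context, rounded to whole percentages.
shares = {
    name: round(100 * count / CONTEXT_SIZE)
    for name, count in full_period_counts.items()
}
```

This gives 4%, 25% and 7%, matching the figures reported for the full period.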
Conclusions
• Advantages of this approach:
  • Robust field delineation
  • Broad scale of the analysis at the individual level (international analysis)
• Individual level analysis:
  • Multidimensional approach
  • Contextual analysis at the international level
  • Lower importance of raw scores, in favour of a classificatory approach
• Expansion of the analytical possibilities of bibliometric performance: bottom-up approaches
Challenges
• Data quality (author name disambiguation, author-address linkages, etc.)
• Only bibliometric performance as covered in the Web of Science!
• Only scientific production is considered; other activities (teaching, managing, etc.) are not
• Conceptual problems and further developments:
  • Thresholds (percentiles): bootstrapping?
  • Age of scholars not known, personal situation, etc.: analysis by cohorts? Gender?
  • Limitations of citations: altmetrics? Acknowledgements?
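On the threshold question, one way bootstrapping could be used is to resample the context authors with replacement, recompute the percentile cut-off each time, and look at how much it moves. The sketch below is only an illustration of that idea, not a method proposed in the slides; the function names, the number of resamples and the toy publication counts are hypothetical.

```python
import random


def percentile_cut(values, top_share):
    """Smallest value that still belongs to the top `top_share` of `values`."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * top_share))
    return ordered[k - 1]


def bootstrap_cut(values, top_share, n_resamples=500, seed=0):
    """Return (low, high), an approximate 95% range of the resampled cut-off."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    cuts = sorted(
        percentile_cut(rng.choices(values, k=len(values)), top_share)
        for _ in range(n_resamples)
    )
    low = cuts[int(0.025 * n_resamples)]
    high = cuts[int(0.975 * n_resamples) - 1]
    return low, high


# Toy publication counts (hypothetical), with the usual skewed shape.
counts = [1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 8, 9, 12, 15, 20, 40]
point_cut = percentile_cut(counts, 0.25)
low, high = bootstrap_cut(counts, 0.25)
```

A wide range between `low` and `high` would suggest that authors close to the cut-off could change class for reasons that have little to do with their performance.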