Central R&D assessment indicators: Scientometric and Webometric Methods Peter Ingwersen Royal School of LIS - 2010 Denmark – pi@iva.dk http://www.iva.dk/pi Oslo University College, Norway
Agenda • Scientific Communication: • Classic & present models • Scientometrics: • Publication Analyses • Publication Point evaluation (’Norwegian Model’) • Citation Analyses • Crown Indicators (research profile weighting) • Hirsch Index (h-Index) • Webometrics • Web Impact Factors; Issue tracking - mining • Concluding remarks 2011
Scientific communication 1 – Classic Model (prior to Web / open access)
[Flow diagram:] Research idea & activities → unpublished technical/research reports (non-peer-reviewed, informal communication) → conference papers (peer reviewed) and journal articles (peer reviewed) → archive, library index, domain databases and citation databases, over time; peers involved throughout. 2011
Scientific communication 1 – Present Model (incl. Web / open access)
[Flow diagram:] Research idea & activities → technical/research reports, working papers and unpublished public material (non-peer-reviewed) in institutional repositories and open-access journals → conference papers (peer reviewed) and journal articles (peer reviewed) in full-text domain databases → indexed by Web of Science, Scopus, Google (Scholar) and academic web search engines, over time; peers involved throughout.
Scientific communication 2 – What ’is’ scientific information?
[Diagram:] Confidence in an information source ranges from student output to authoritative sources, and from fully to partly searchable on the open Web: Open Access (peer-reviewed journals; institutional repositories with duplicates/versions), Restricted Access (peer-reviewed journal articles; peer-reviewed research monographs), working papers, research reports, blogs, conference papers, posters and abstracts (peer reviewed), collaboratory round tables, and teaching material. What counts as a qualified knowledge source is domain dependent.
Examples of Publication analysis
• Ranking the most productive
• Countries in a field
• Journals in a field
• Institutions or universities; departments or groups
• (Exponential, Bradford-like distributions)
• Counting scientific publications in
• Academic fields / disciplines
• Countries, regions, universities, departments
• Counting numbers of articles OVER TIME
• Time series
Productivity Growth
Publication Growth – all fields, 1981-2006 (1981-85 = index 1): China = 14,114 p.; EU = 736,616 p.; USA = 887,039 p.; India = 65,250 (98,598) publ.
Publication success ‘points’
• As done in Norway:
• Articles in journals considered among the 20 % best journals in a field: 3 points
• Articles in other (peer-reviewed) journals: 1 point
• Conference papers (peer reviewed): 0.7 points
• Monographs (international publisher): 8 points
• Monographs (other publishers): 5 points
• Fractional counts; points used for funding distribution
• Covers all research areas, incl. the humanities, for all document types
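The point scheme above can be sketched in a few lines. This is a minimal illustration assuming the point values listed on this slide; the function and dictionary names are invented and are not part of the official Norwegian implementation:

```python
# Illustrative sketch of the Norwegian publication-point scheme.
# Point values are those given on the slide; all names are invented.

POINTS = {
    "journal_top20": 3.0,    # article in a top-20 % journal of the field
    "journal_other": 1.0,    # article in another peer-reviewed journal
    "conference": 0.7,       # peer-reviewed conference paper
    "monograph_intl": 8.0,   # monograph, international publisher
    "monograph_other": 5.0,  # monograph, other publisher
}

def publication_points(doc_type, n_authors, n_local_authors=1):
    """Fractionalized points credited to one institution for one publication."""
    return POINTS[doc_type] * n_local_authors / n_authors

# A top-20 % journal article with 4 authors, 2 at the local institution:
publication_points("journal_top20", n_authors=4, n_local_authors=2)  # -> 1.5
```

Fractionalization by author share is one common counting choice; the actual funding model fixes its own fractioning rules.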
One cannot use the publication points for DIRECT comparison
• Between universities or countries
• Or applied to individual researchers
• Recent detailed article on the system:
• Schneider, J.W. (2009). An outline of the bibliometric indicator used for performance-based funding of research institutions in Norway. European Political Science, 8(3), 364-378.
However: Publication Point Indicators established!
• Elleby, A., & Ingwersen, P. (2010). Publication point indicators: A comparative case study of two publication point systems and citation impact in an interdisciplinary context. Journal of Informetrics, 4, 512-523. doi:10.1016/j.joi.2010.06.001
Publication Point Indicators 2
• Comparing the vector of ideal cumulated PP (= expected success gain) with the actually obtained PP for the same publication types provides a ratio that can be normalized: the nPPI, the normalized Publication Point Index
• Comparisons between institutions can then be made at specific ranges of publication-vector values through their nPPI.
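As a hedged sketch of the comparison described above: cumulate the ideal and the obtained publication points over the same ordered publications, then take their ratio at each rank. The vectors and the function name below are invented for illustration:

```python
from itertools import accumulate

# Sketch: normalized Publication Point Index (nPPI) as the ratio of
# cumulated obtained PP to cumulated ideal (expected) PP.

def nppi(obtained_pp, ideal_pp):
    """nPPI at each rank of the publication vector."""
    cum_obtained = list(accumulate(obtained_pp))
    cum_ideal = list(accumulate(ideal_pp))
    return [o / i for o, i in zip(cum_obtained, cum_ideal)]

# Three publications that could each ideally have earned 3 points:
nppi([1.0, 0.7, 3.0], [3.0, 3.0, 3.0])  # -> [0.33, 0.28, 0.52] (rounded)
```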
Cumulated Publication Point Indicator – the DIIS example (n=70)
Citation Analyses
• Diachronic (forward in time) … or synchronous (back in time – like the ISI JIF)
• Observing how older research is received by current research (ISI + Scopus: always peer-reviewed sources)
• Citation indicators:
• Time series (as for publications)
• Citation impact (Crown Indicators)
• Citedness
‘Crown indicators’
• Normalized impact indicators for one unit (center/university/country) in relation to the research field globally:
• JCI: Journal Crown Indicator
• FCI: Field Crown Indicator – both provide an index number
Journal Crown Indicator
• The ratio between:
• the real number of citations received by all journal articles of a unit from a given year, and
• the diachronic citation impact of the same journals used by the unit, covering the same period (= the expected impact).
• ONE WOULD NEVER APPLY THE ISI JIF! It only signifies the AVERAGE (international) impact of an article, and is computed in a synchronous way.
Journal Impact Factor - ISI
• Synchronous method:
• For 2010 (analysis done Febr-April 2011):
• 1) all citations given in 2010 to journal X for articles + notes + letters in journal X,
• 2) published in the previous two years: 2008-2009
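The synchronous computation above reduces to a simple ratio; a sketch with invented figures:

```python
# Synchronous ISI JIF for year Y: citations given in Y to a journal's
# articles, notes and letters from Y-1 and Y-2, divided by the number
# of such citable items. The figures below are invented.

def journal_impact_factor(cites_in_year_to_prev2, citable_items_prev2):
    return cites_in_year_to_prev2 / citable_items_prev2

# 450 citations in 2010 to the 300 items journal X published 2008-2009:
journal_impact_factor(450, 300)  # -> 1.5
```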
Field Crown Indicator - FCI
• Normalisation must be weighted in relation to the observed unit’s publication profile:
• Like a ’shadow’ unit (country)
• An example of this weighting for India:
Research profile as weight for impact calculation (as ‘shadow country’)
Research profile (China) as weight for impact calculation (as ‘shadow country’)
Example of research profile with FCI index score
Summary: Different indicators – one given period
• (Σc/Σp) / (ΣC/ΣP) – globally normalized impact:
• Fine to use for a single field!
• If averaged over all subject areas it is quick and dirty: all areas get the same weight! – thus:
• Σc / Σ(C/P_area × p_area) = FCI: the standard Field Crown Indicator for the ’profile’ of subject areas of a local unit (country/university) – the unit’s profile is applied to the global baselines, like a kind of ’shadow unit’. Computed as a ratio of sums of citations over (weighted) publications.
• (If computed as a sum of ratios divided by the number of fields, all fields count equally.)
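A minimal sketch of the FCI formula above, with invented field names and baseline values: the unit's citation total is divided by the sum of global citations-per-paper baselines weighted by the unit's own paper counts per field (the 'shadow unit'):

```python
# Sketch: Field Crown Indicator = sum(c) / sum over fields of
# (global C/P in the field x the unit's p in the field).
# Field names and numbers are invented for illustration.

def field_crown_indicator(unit_citations, unit_papers_by_field, world_cpp_by_field):
    expected = sum(world_cpp_by_field[field] * papers
                   for field, papers in unit_papers_by_field.items())
    return unit_citations / expected

# A unit with 600 citations, publishing 100 physics and 50 chemistry
# papers, against world baselines of 4.0 and 2.0 citations per paper:
field_crown_indicator(600, {"physics": 100, "chemistry": 50},
                      {"physics": 4.0, "chemistry": 2.0})  # -> 600/500 = 1.2
```

An FCI above 1 means the unit is cited more than expected given its own mix of fields; averaging per-field ratios instead would weight a small field as heavily as a large one.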
Ageing of journals or articles
• Cited half-life – diachronic:
• Accumulate citations forward in time by year:
Year:       1990 91 92 93 94  95  96  97  98  99  00  01  02
Citations:     2 12 20 25 30  17  12  10   0   3   1   0   0
Cumulated:     2 14 34 59 89 106 118 128 128 131 132 132 132
• Half-life = 132/2 = 66 citations, reached after ca. 4.2 years
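The half-life computation above can be sketched directly: cumulate citations year by year and find where half the total (66 of 132) is reached, interpolating within the crossing year.

```python
# Sketch of the diachronic cited half-life, using the example series
# from the slide (publication year 1990, citations counted per year).

def cited_half_life(citations_per_year):
    half = sum(citations_per_year) / 2
    cumulated = 0
    for years_elapsed, cites in enumerate(citations_per_year):
        if cites and cumulated + cites >= half:
            # linear interpolation within the crossing year
            return years_elapsed + (half - cumulated) / cites
        cumulated += cites
    return float(len(citations_per_year))

cited_half_life([2, 12, 20, 25, 30, 17, 12, 10, 0, 3, 1, 0, 0])  # -> ca. 4.2
```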
Ageing of journals or articles – 2
Hirsch Index (2005)
• A composite index of publications and citations for a unit (person, group, dept. …) in a novel way:
• h is the number of articles that have each received a number of citations larger than or equal to h.
• A person’s h-index of 13 implies that, among all his/her publications, 13 have each obtained at least 13 citations.
• The index is dependent on research field and age of researcher. It can be normalized in many ways.
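The definition above translates directly into code: sort citation counts in descending order and find the largest rank h at which the h-th paper still has at least h citations. The citation lists are invented examples:

```python
# Minimal sketch of the Hirsch index: the largest h such that h of the
# unit's publications have each received at least h citations.

def h_index(citations):
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

h_index([10, 8, 5, 4, 3])     # -> 4 (four papers with >= 4 citations each)
h_index([25, 8, 5, 3, 3, 2])  # -> 3 (one huge hit does not raise h)
```

Note the second example: a single highly cited paper barely moves h, which is exactly why the index rewards sustained output rather than one-off hits.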
Criticism of Citation Analyses
• Formal influences not cited
• Biased citing
• Informal influences not cited
• Self-citing – may indeed improve external citations!
• Different types of citations
• Variations in citation rate related to type of publication, nationality, time period, and size and type of speciality – normalization?
• Technical limitations of citation indexes and domain databases
• Multiple authorship – fractional counting / article-level counting
Reply: van Raan (1998)
• Different biases equalize each other
• If researchers simply do most of their referencing ”in a reasonable way”, and if a sufficient number of citations are counted, reliable patterns can be observed.
• It is very unlikely that all researchers exhibit the same biases (e.g. that all researchers consciously cite research that does not pertain to their field)
Google Scholar
• Does not apply PageRank for ranking, but citations
• Contains conference papers and journal articles (??)
• Workable for Computer Science and Engineering (and Information Science)
• Requires a lot of clean-up!
• Apply http://www.harzing.com/pop.htm (Publish or Perish) for better analysis on top of GS
• Google Scholar may provide the h-index for persons
The infor-/biblio-/sciento-/cyber-/webo-metrics fields (L. Björneborn & P. Ingwersen, 2003)
[Diagram of overlapping fields:] informetrics, bibliometrics, scientometrics, cybermetrics, webometrics
Link terminology – basic concepts (L. Björneborn & P. Ingwersen, 2003)
[Example graph with nodes A–H:]
• B has an outlink to C; outlinking: ~ reference
• B has an inlink from A; inlinked: ~ citation
• B has a selflink; selflinking: ~ self-citation
• A has no inlinks; non-linked: ~ non-cited
• E and F are reciprocally linked
• A is transitively linked with H via B–D; H is reachable from A by a directed link path
• A has a transversal link to G: a short cut
• C and D are co-linked from B, i.e. have co-inlinks or shared inlinks: ~ co-citation
• B and E are co-linking to D, i.e. have co-outlinks or shared outlinks: ~ bibliographic coupling
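The vocabulary above can be made concrete with a toy adjacency map; the pages and links below are invented, loosely echoing the slide's example graph:

```python
# Toy directed link graph: page -> set of pages it outlinks to.
links = {
    "A": {"B"},
    "B": {"B", "C", "D"},   # B selflinks and outlinks to C and D
    "E": {"D", "F"},
    "F": {"E"},             # E and F are reciprocally linked
}

def inlinks(graph, node):
    """Pages linking to `node` (~ citations), excluding selflinks."""
    return {src for src, outs in graph.items() if node in outs and src != node}

def co_inlinked(graph, a, b):
    """Pages linking to both a and b (~ co-citation / shared inlinks)."""
    return inlinks(graph, a) & inlinks(graph, b)

co_inlinked(links, "C", "D")  # -> {"B"}: C and D share an inlink from B
```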
Search engine analyses
• See e.g. Judith Bar-Ilan’s excellent longitudinal analyses
• Mike Thelwall et al. in several case studies
• Scientific material on the Web:
• Lawrence & Giles (1999): approx. 6 % of Web sites contain scientific or educational contents
• Increasingly, the Web is a web of uncertainty
• Allen et al. (1999) – biology topics from 500 Web sites assessed for quality:
• 46 % of sites were ”informative” – but:
• 10-35 % inaccurate; 20-35 % misleading
• 48 % unreferenced
The Web Impact Factor (Ingwersen, 1998)
• Intuitively (naively?) believed to be similar to the Journal Impact Factor
• Demonstrates recognition by other web sites – or simply impact – not necessarily quality
• Central issue: are web sites similar to journals, and web pages similar to articles?
• Are inlinks similar to citations – or simply road signs?
• What is really calculated?
• DEFINE WHAT YOU ARE CALCULATING: site or page IF
The only valid webometric tool: Yahoo! Search Site Explorer …
• If one enters (old, valid) commands like Link:URL, Domain:topdomain (edu, dk) or Site:URL, one is transferred to http://siteexplorer.search.yahoo.com/new/
• Or find it directly via that URL
• The same facilities are available in click-mode, starting from a given URL:
• Finding ‘all’ web pages in a site
• Finding ‘all’ inlinks to that site / those pages
• Also without selflinks! – this implies …
… to calculate Web Impact Factors
• But one should be prudent in interpretation.
• Note that external inlinks are the best indicator of recognition
• Note how many sub-domains (and pages) are included in the click analysis.
• Results can be downloaded
Possible types of Web-IF:
• E-journal Web-IF
• Calculated by inlinks
• Calculated as the traditional JIF (citations)
• Scientific web site IF (by link analyses)
• National – regional (some URL problems)
• Institutions – single sites
• Other entities, e.g. domains
• Best denominator: no. of staff – or simply use external inlinks
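Whatever the unit, the Web-IF itself is a ratio of inlinks to a size measure; a hedged sketch with invented figures, using pages as the denominator (staff counts would work the same way):

```python
# Sketch of a site-level Web Impact Factor: external inlinks to the
# site divided by a size measure (pages in the site, or no. of staff).
# All figures are invented.

def web_impact_factor(external_inlinks, site_size):
    return external_inlinks / site_size

# 1,200 external inlinks to a site of 400 pages:
web_impact_factor(1200, 400)  # -> 3.0
```

As the slide warns, what the ratio *means* depends entirely on defining whether sites, sub-domains or pages are being counted.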
Web links like citations?
• Kleinberg (1998), relating link weights to citations and to Google’s PageRank:
• Hubs ~ review articles: have many outlinks (references) to:
• Authority pages ~ influential (highly cited) documents: have many inlinks from hubs!
• Typical Web index pages = homepages with self-inlinks = tables of contents
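Kleinberg's hub/authority idea can be sketched as a few lines of mutual reinforcement (the HITS scheme): authority scores accumulate the hub scores of inlinking pages, hub scores accumulate the authority scores of outlinked pages, iterated with normalization. The toy graph is invented:

```python
# Sketch of Kleinberg's (1998) hubs-and-authorities iteration.

def hits(graph, iterations=20):
    nodes = set(graph) | {t for outs in graph.values() for t in outs}
    hub = {n: 1.0 for n in nodes}
    auth = dict(hub)
    for _ in range(iterations):
        # authorities: sum of hub scores of pages that inlink them
        auth = {n: sum(hub[s] for s in graph if n in graph[s]) for n in nodes}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # hubs: sum of authority scores of pages they outlink to
        hub = {n: sum(auth[t] for t in graph.get(n, ())) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth

# Two 'review-like' hub pages both pointing at the same two documents:
hub, auth = hits({"hub1": {"doc1", "doc2"}, "hub2": {"doc1", "doc2"}})
# doc1/doc2 come out as authorities, hub1/hub2 as hubs.
```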
Reasons for outlinking …
• Outlinks serve mainly functional purposes
• Navigation – interest spaces …
• Pointing to authority in certain domains? (Latour: rhetorical reasons for references/links)
• Normative reasons for linking? (Merton)
• Do we have negative links?
• We do have non-linking (commercial sites)
Some additional reasons for providing links
• In part analogous to providing references (recognition)
• And, among others:
• emphasising one’s own position and relationships (professional, collaboration, self-presentation etc.)
• sharing knowledge, experience, associations …
• acknowledging support, sponsorship, assistance
• providing information for various purposes (commercial, scientific, educational, entertainment)
• drawing attention to questions of individual or common interest, and to information provided by others (the navigational purpose)
Other differences between references, citations & links
• The time issue:
• Ageing of sources is different on the Web:
• Birth, maturity & obsolescence happen faster
• Decline & death of sources occur too – but
• Marriages – divorce – re-marriage – death & resurrection … and similarly liberal phenomena are found on the Web! (Wolfgang Glänzel)
Issue tracking – Web mining
• Adequate sampling requires knowledge of the structure and properties of the population – the Web space to be sampled
• Issue tracking of known properties / issues may help
• Web mining the unknown is more difficult, due to:
• the dynamic, distributed & diverse nature of the Web
• the variety of actors and minimum of standards
• the lack of quality control of contents
• Web archeology – the study of the past Web
Nielsen BlogPulse
• Observes blogs worldwide, providing:
• Trend search – development of terms/concepts over time – user selection!
• Featured trends – predefined categories
• Conversation tracker – blog conversations
• BlogPulse profiles – blog profiles
• Look into: http://www.blogpulse.com/tools.html
Concluding remarks: Future
• With open access we can foresee a nightmare in tracking qualified and authoritative scientific publications outside the citation indexes, because of:
• Lack of bibliographic control (what is the original – vs. parallel and spin-off versions & crap?) over many institutional repositories – mixed on the web with all other document types, incl. blogs (Web 2.0) – Google Scholar (?) … Google Books (?)
Concluding remarks
• One should be somewhat cautious about Web-IF applications without careful sampling via robots, due to their incomprehensiveness and uncertainty about what they actually signify
• One might also investigate further the behavioural aspects of providing and receiving links, to understand what the impact might mean and how/why links are made
• Understand the Web space structure better
• Design workable robots, downloading & local analyses
References
• Allen, E.S., Burke, J.M., Welch, M.E., Rieseberg, L.H. (1999). How reliable is science information on the Web? Nature, 402, 722.
• Björneborn, L., Ingwersen, P. (2004). Towards a basic framework for webometrics. Journal of the American Society for Information Science and Technology, 55(14), 1216-1227.
• Brin, S., Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7), 107-117.
• Elleby, A., Ingwersen, P. (2010). Publication point indicators: A comparative case study of two publication point systems and citation impact in an interdisciplinary context. Journal of Informetrics, 4, 512-523.
• Hirsch, J.E. (2005). An index to quantify an individual’s scientific research output. PNAS, 102, 16569-16572.