
Bibliometric [scientometric, webometric, informetric …] searching

Data used for assessing impact of scholarly output. tefkos@rutgers.edu; http://comminfo.rutgers.edu/~tefko/


Presentation Transcript


  1. Bibliometric [scientometric, webometric, informetric …] searching: data used for assessing impact of scholarly output. Tefko Saracevic, tefkos@rutgers.edu; http://comminfo.rutgers.edu/~tefko/

  2. Central idea
     • Use of quantitative methods (statistics) to study & characterize recorded communication ('literature') of all kinds
     • In order to:
       • describe research output with various indicators & distributions
       • use in evaluating scholarly scientific performance
     • New tools have significantly increased & changed the role of searching & searchers

  3. ToC
     • Goals, definitions
     • Reasons, applications – why?
     • Data sources for bibliometric analyses
     • Methods & measures – how?
     • A sample of examples
     • Implications for searching. Caveats

  4. Section 1. Goals, definitions: bibliometrics, scientometrics, webometrics …

  5. Metric studies
     • Applied in many fields: sociometrics, econometrics, biometrics …
       • they deal with statistical properties, relations, & principles of a variety of entities in their domain
     • Metric studies in information science follow these by concentrating on statistical properties & the discovery of associated relations & principles of information objects, structures, & processes

  6. Goals of metric studies
     • To characterize statistically the entities under study
       • more ambitiously, to discover regularities & relations in their distributions & dynamics in order to observe predictive regularities & formulate laws
       • describe numerically, predict, apply
     • Same in information science
       • portray statistically the entities under study:
         • literature, documents, … all kinds of information objects & processes as related to science, institutions, the Web …
         • but also people: authors
         • more recently: also scholarly productivity

  7. Definitions
     • biblio: derived from "biblion", Greek word for book
     • metrics: derived from "metrikos", Greek word for measurement
     • Bibliometrics
       • "...the application of mathematical and statistical methods to books and other media of communication." Alan Pritchard (1969)
       • "… the quantitative treatment of the properties of recorded discourse and behavior pertaining to it." Robert Fairthorne (1969)

  8. Definitions … more, but with differing contexts
     • Scientometrics: bibliometric & other metric studies specifically concentrating on science
     • Informetrics: study of the quantitative aspects of information in any form (the broadest term)
     • Webometrics: quantitative analysis of web-related phenomena
     • Cybermetrics: quantitative aspects of information resources on the whole Internet
     • E-metrics: measures of electronic resources, particularly in libraries
     For simplicity, we will use bibliometrics here to cover all of these

  9. Section 2. Base, reasons, use: Why? What? What for?

  10. Based on what entities have & what can be COUNTED
     • In documents (as entities):
       • authors
       • their institutions, countries
       • sources, e.g. journals
       • references: who & what is cited
       • age of references
       • & anything else that is countable
     • In Web entities
       • identifying relationships between Web objects
       • link structures
         • out-links
         • in-links
         • self-links
       • nodes, central nodes
       • in a way analogous to citations
     And derivation of structures based on any of these
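
The Web-entity counts listed on this slide can be illustrated with a minimal Python sketch. The page names and link list are hypothetical, and treating a self-link as a link whose source and target are the same page is an assumption made here for illustration (some studies define it at the site level instead).

```python
from collections import Counter

# Hypothetical link list: (source page, target page).
links = [
    ("a.example/home", "b.example/paper"),
    ("a.example/home", "c.example/lab"),
    ("b.example/paper", "c.example/lab"),
    ("c.example/lab", "c.example/lab"),  # page linking to itself
]

def link_counts(links):
    """Count in-links, out-links and self-links per page."""
    in_links, out_links, self_links = Counter(), Counter(), Counter()
    for source, target in links:
        if source == target:
            self_links[source] += 1
        else:
            out_links[source] += 1
            in_links[target] += 1
    return in_links, out_links, self_links

in_links, out_links, self_links = link_counts(links)
print(in_links.most_common(1))  # the page with the most in-links
```

In-link counts play a role analogous to citation counts: a page with many in-links is a candidate "central node".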

  11. A lot is based on citations
     • Citation analysis:
       • analysis of data derived from references cited in footnotes or bibliographies of scholarly publications
     • Used to be just counts
     • Now it also leads to examination & mapping of the intellectual impact of scholars, projects, institutions, journals, disciplines, and nations
     • Becoming increasingly popular & widely used, with important implications for searching

  12. Reasons for bibliometric studies
     • Understanding of patterns
       • discovery of regularities, behavior
       • "order out of documentary chaos" [Bradford, 1948]
     • Analysis of structures & dynamics
       • discovery of connections, relations, networks
       • search for regularities & possible predictions
     • Discovery of impacts, effects
       • relation between entities & amounts of their various uses
       • providing support for making decisions, policies

  13. Major branches of bibliometrics
     Relational:
     • Older: patterns, structures, relations, mappings
       • where bibliometrics started
     • Data on what was observed
       • e.g. no. of articles/citations by/to an author; no. of journals with articles relevant to a topic; no. of articles/citations in/to a journal …
     • Used for description, mapping of relations & prediction
     Evaluative:
     • Newer: impacts, effects
       • where bibliometrics became a big deal in many arenas
     • Data from what was observed, but looking for measures of impact, prominence, ranking, bang
       • discovers who's up & how much up
     • Used for decisions, policies

  14. Seeking … (Thelwall, 2008)
     Relational:
     • Relational bibliometrics seeks to illuminate relationships within research, such as the cognitive structure of research fields, the emergence of new research fronts, or national and international co-authorship patterns
     Evaluative:
     • Evaluative bibliometrics seeks to assess the impact of scholarly work, usually to compare the relative scientific contributions of two or more individuals or groups

  15. Major approaches
     Empirical:
     • Collection & study of data
       • establishment of measures
       • statistical & graphic analyses
     • We will pursue some of these here
       • concentrate on empirical
     Theoretical:
     • Building of generalized models, theories
       • often mathematical, abstract
       • becoming highly specialized
     • We will NOT pursue this here
       • but you should be aware that there are a lot of theoretical efforts

  16. Users
     Relational:
     • Mostly scholars
       • mostly research oriented
     • But also librarians for decisions
       • e.g. on collections, purchase, weeding
     Evaluative (new audience):
     • Library managers
     • Analysts
     • University administrators (deans, provosts)
     • Directors of institutional research
     • National governments & ministries
     • Grant & funding agencies

  17. Used in a variety of functions & areas
     • In collection development: identifying the most useful materials by analyzing circulation records, journal / e-journal usage statistics, etc.
     • In information retrieval: identifying top-ranked documents & authors: those most highly cited, most highly co-cited, most popular, etc.
     • In the sociology of knowledge: identifying structural and temporal relationships between documents, authors, research areas, universities, etc.
     • In policy making: justifying, managing or prioritizing support for a course of action in a number of areas, e.g. science policy, institutional policy

  18. Use of evaluative bibliometrics
     • Academic, research & government institutions use it for:
       • promotion and tenure, hiring, salary raises
       • decisions on support of departments, disciplines
       • grant decisions; research policy making
       • visualization of scholarly networks, identifying key contributions & contributors
       • monitoring scholarly developments
       • determining journal citation impact
     • Resource allocation:
       • identifying authors most worthy of support
       • research areas most worthy of funding
       • journals most worthy of support or purchase; etc.

  19. Major bibliometric factors for evaluation of academic performance
     For individuals:
     • Number of publications in peer-reviewed journals
     • The impact factor of those journals
     • The h-index
     For institutions:
     • Total no. of publications
     • Total no. of citations
     • Various ratios: per faculty, project …

  20. Impact indicators and studies
     • Several governments mandate citation analysis to
       • assess quality of research and institutions
       • inform decisions on support
       • determine support for journals
       • rank institutions, programs, departments, projects
     • Many institutions practice it regularly

  21. Section 3. Data sources for bibliometric analyses: where does stuff for analysis come from?

  22. Main sources for bibliometric analyses
     • Bibliographies, indexes
       • once popular, not any more
       • once done manually, hence limited
     • Documents in databases
       • computerization enabled wide collection of data & development of new methods
     • Science statistics
     • And then there are citations
       • as they became automated, use of bibliometrics exploded
     • Web & Internet
       • mining connections & other networked aspects
       • but also applying some older methods to new data

  23. Institute for Scientific Information (ISI, now Thomson Reuters)
     • ISI launched in 1962 by Eugene Garfield
       • started by publishing Science Citation Index (SCI) & later Social Sciences Citation Index (SSCI) and Arts & Humanities Citation Index (A&HCI) [all still in Dialog]
       • these morphed into Web of Science (WoS)
     • All cover only an ISI-selected set of journals
       • thus all citation results & studies are based on that set of journals, not the universe of journals and books, although the citations themselves are to whatever is cited
       • the same is true of any database: Scopus, Google Scholar, etc.

  24. Impact of ISI citation databases
     • Major source for bibliometric analysis
     • Revolutionized use of citations
       • e.g. easy citation counts, tracing, establishment of connections … became possible
     • Provided data for new types of analysis
       • e.g. mapping of fields, identifying research fronts
     • Laid the base for evaluative bibliometrics
     • Instigated new types of searching
       • above & beyond subject searching

  25. Expansion of citation data sources
     • Since the early 2000s citation data have been offered by a number of databases other than Web of Science, most notably
       • Scopus
       • Google Scholar
       • and a host of others
     • This dramatically expanded the availability of data & types of analyses
       • a number of innovations were introduced
       • use of such data also expanded
     • Challenge to WoS databases

  26. Connections
     • Data from relational bibliometrics are used for sorting, ranking, mapping … in evaluative bibliometrics
     • Raw data obtained from relational analyses are then "milked" in many ways
       • often combined with other data
       • e.g. ranked citation counts combined with financial data, enrollment data …

  27. Section 4. Methods & measures – how?

  28. Overview
     • A few older bibliometric laws & methods:
       • Lotka's law: deals with distribution of authors in a field
       • Bradford's law: deals with distribution of articles relevant to a subject across the journals where they appear
     • From citations:
       • citation age (or obsolescence)
       • co-citation
       • clustering & co-citation maps
       • bibliographic coupling
       • journal impact factor
       • self-citation (auto-citation)
       • & many more

  29. Lotka's law (1926) – papers & authors
     Alfred Lotka (1880-1949, American mathematician, chemist and statistician)
     Formal:
     • The number of authors who have published $n$ papers in a given field is roughly $1/n^2$ times the number of authors who have published only one paper
     English:
     • A large proportion of the total literature in a field is authored by a small proportion of the total number of authors; the number of authors falls off regularly as output rises, and the majority of authors produce just one paper
     • e.g. if 100 authors each wrote one article over a specific period, we would also expect about 25 authors with 2 articles ($100/2^2 = 25$), 11 with 3 articles ($100/3^2 \approx 11$), 6 with 4 articles ($100/4^2 \approx 6$), etc.
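
The inverse-square relation on this slide can be checked with a short Python sketch. The function name and the `exponent` parameter are illustrative assumptions; the slide gives only the exponent 2, and empirical studies often fit other values.

```python
def lotka_expected_authors(single_paper_authors, max_papers, exponent=2.0):
    """Expected number of authors with n papers, for n = 1..max_papers.

    Applies the inverse-power form of Lotka's law:
    authors(n) ~ authors(1) / n**exponent, rounded to whole authors.
    """
    return {
        n: round(single_paper_authors / n ** exponent)
        for n in range(1, max_papers + 1)
    }

# The slide's example: 100 authors with exactly one article each.
expected = lotka_expected_authors(100, 4)
print(expected)  # {1: 100, 2: 25, 3: 11, 4: 6}
```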

  30. Bradford's law (1934) – papers & journals
     Samuel C. Bradford (1878-1948, British mathematician and librarian)
     Formal:
     • If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as $1 : n : n^2 : n^3$ …
     • $n$ is called the Bradford multiplier
     English:
     • Basically states that most articles on a subject are produced by a few journals (called the nucleus), and the rest are spread over many separate sources whose numbers increase in a regular, exponential way
     • Like Lotka's law, it generally follows the law of diminishing returns

  31. Bradford's law: how did he do it?
     • He grouped periodicals with articles relevant to a subject (from a bibliography) into 3 zones in order of decreasing yield
       • from journals with the largest no. of articles to those with the smallest; at the end are journals with one article each on the subject
     • Each zone had the SAME number of articles but a different no. of journals
     • The number of journals in each zone increases exponentially
       • e.g. if there are 5 journals in the first zone that produced 12 relevant articles, there may be 10 journals in the second zone for the next 12 articles & 20 for the next 12; the Bradford multiplier ($n$) found here is 10/5 = 2
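
The zoning procedure described on this slide can be sketched in Python as follows. This is a simplified illustration, not Bradford's original procedure: the greedy cut-off rule is an assumption, and the per-journal article counts are hypothetical, chosen so that each of the three zones holds 60 articles.

```python
def bradford_zones(article_counts, zones=3):
    """Split journals, ranked by decreasing yield, into zones that each
    hold roughly the same number of articles."""
    ranked = sorted(article_counts, reverse=True)
    target = sum(ranked) / zones          # articles per zone
    result, current, cumulative = [], [], 0
    for count in ranked:
        current.append(count)
        cumulative += count
        # Close a zone once the cumulative article count reaches its share;
        # the last zone takes whatever journals remain.
        if len(result) < zones - 1 and cumulative >= target * (len(result) + 1):
            result.append(current)
            current = []
    result.append(current)
    return result

# Hypothetical yields: number of relevant articles per journal.
counts = [30, 30, 15, 15, 15, 15, 8, 8, 8, 8, 7, 7, 7, 7]
zone_list = bradford_zones(counts)
journals_per_zone = [len(z) for z in zone_list]
multipliers = [b / a for a, b in zip(journals_per_zone, journals_per_zone[1:])]
print(journals_per_zone, multipliers)  # [2, 4, 8] [2.0, 2.0]
```

With real data the zones rarely come out exactly equal, so the multiplier is usually reported as an approximate average across zones.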

  32. Cited half-life
     Formal:
     • Definition: the number of years, counting back from the current year, that account for 50% of the total citations the journal received in the current year
     English:
     • How far back in time one must go to account for one half of the citations a journal receives in a given year
     • e.g. if in 2008 the journal XYZ has a cited half-life of 7.0, it means that articles published in XYZ from 2002 to 2008 (inclusive) account for 50% of all citations made in 2008 (from any source) to articles in that journal
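
A whole-year version of the cited half-life can be sketched in Python. The citation counts are hypothetical, and the function returns an integer number of years; citation reports typically interpolate to one decimal place, which this sketch does not attempt.

```python
def cited_half_life(citations_by_pub_year, current_year):
    """Smallest number of years, counting back from current_year (inclusive),
    whose articles account for at least half of the citations received."""
    total = sum(citations_by_pub_year.values())
    if total == 0:
        return None
    running = 0
    oldest = min(citations_by_pub_year)
    for age, year in enumerate(range(current_year, oldest - 1, -1), start=1):
        running += citations_by_pub_year.get(year, 0)
        if running * 2 >= total:
            return age
    return None

# Hypothetical data: citations made in 2008 to journal XYZ,
# grouped by the publication year of the cited article.
cites_2008 = {
    2008: 10, 2007: 30, 2006: 40, 2005: 30, 2004: 20, 2003: 10,
    2002: 10, 2001: 50, 2000: 40, 1999: 30, 1998: 20,
}
half_life = cited_half_life(cites_2008, 2008)
print(half_life)  # 7, i.e. 2002 to 2008 inclusive
```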

  33. Citing half-life
     Formal:
     • Definition: the median age of all articles cited in the journal during a given year
     English:
     • A measure of how current (or how old) the references cited in a journal are
     • e.g. if in 2008 the citing half-life for journal XYZ was 9.0, it means that 50% of the articles cited (references) in XYZ were published between 2000 and 2008 (inclusive)
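
The median-age definition can be illustrated with a few lines of Python. The reference years are hypothetical, and counting the current year as age 1 is an assumption made so that the result matches the slide's inclusive range (2000 to 2008 gives 9).

```python
from statistics import median

def citing_half_life(reference_years, current_year):
    """Median age of the references cited in a journal during current_year.

    Ages are counted inclusively, so a reference published in
    current_year has age 1.
    """
    ages = [current_year - year + 1 for year in reference_years]
    return median(ages)

# Hypothetical publication years of references cited in XYZ during 2008.
refs = [2008, 2006, 2004, 2002, 2000, 1998, 1995, 1990, 1985]
print(citing_half_life(refs, 2008))  # 9
```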

  34. Co-citation: a popular similarity measure between two entities
     Formal:
     • The frequency with which two items of earlier literature are cited together by the later literature:
       1. frequency with which two documents are cited together, or
       2. frequency with which two authors are cited together, irrespective of which documents
     English:
     • In the second sense: how often are two authors cited together?
     • If authors A and B are both cited by C, they may be said to be related to one another, even though they don't directly reference each other
     • If A and B are both cited by many other articles, they have a stronger relationship; the more items they are cited by together, the stronger their relationship is
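
Counting co-citations is a matter of counting pairs that appear together in the same reference list. A minimal Python sketch, with hypothetical citing documents and cited items (the items could equally be authors or documents):

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how often each pair of cited items appears together
    in the reference list of the same citing document."""
    counts = Counter()
    for cited in reference_lists.values():
        # Deduplicate and sort so each pair is counted once per citing doc.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts

# Hypothetical data: citing document -> items it cites.
reference_lists = {
    "C1": ["A", "B"],
    "C2": ["A", "B", "D"],
    "C3": ["B", "D"],
}
cocited = cocitation_counts(reference_lists)
print(cocited[("A", "B")])  # 2: A and B are cited together by C1 and C2
```

The resulting pair counts form the similarity matrix that author co-citation maps are built from.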

  35. Use of co-citation
     • Co-citation is often used as a measure of similarity
       • if authors or documents are co-cited they are likely to be similar in some way
     • This means that if collections of documents are arranged according to their co-citation counts, this should produce a pattern reflecting cognitive scientific relationships
     • Author co-citation analysis (ACA) is a technique that measures the similarity of pairs of authors through the frequency with which their work is co-cited
     • These are then arranged in maps showing the structure of a field, domain, area of research …

  36. Map of author co-citation analysis of information science (Zhao & Strotmann, 2008)

  37. Bibliographic coupling
     Formal:
     • Links two items that reference the same items, so that if A and B both reference C, they may be said to be related, even though they don't directly reference each other. The more items they both reference in common, the stronger their relationship is
     • It is backward chaining, while co-citation is forward chaining
     English:
     • Occurs when two works reference a common third work in their bibliographies
     • e.g. if in one article Saracevic cites Kantor, P. & in another article Belkin cites Kantor, P., but neither Saracevic nor Belkin cites the other in those articles, then Saracevic & Belkin are bibliographically coupled because they both cite Kantor
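
Bibliographic coupling strength is the number of references two citing works share. A minimal Python sketch; the article labels and reference lists are hypothetical and only loosely modeled on the slide's example:

```python
from itertools import combinations

def coupling_strength(reference_lists):
    """For each pair of citing documents, count the references they share.
    Pairs with no shared reference are omitted."""
    strengths = {}
    for doc_a, doc_b in combinations(sorted(reference_lists), 2):
        shared = set(reference_lists[doc_a]) & set(reference_lists[doc_b])
        if shared:
            strengths[(doc_a, doc_b)] = len(shared)
    return strengths

# Hypothetical data: citing article -> works it references.
reference_lists = {
    "article_1": ["Kantor", "ref_x"],
    "article_2": ["Kantor", "ref_y"],
    "article_3": ["ref_z"],
}
coupled = coupling_strength(reference_lists)
print(coupled)  # {('article_1', 'article_2'): 1}
```

Unlike co-citation counts, this value is fixed once both articles are published, because it depends only on their own reference lists.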

  38. Journal Impact Factor in Journal Citation Reports (JCR)
     Formal:
     • The average number of times articles from the journal published in the past two years have been cited in the JCR year
     • The number of citations published in year X to articles in the journal published in years X − 1 and X − 2, divided by the number of articles published in the journal in years X − 1 and X − 2
     English:
     • Measures how often articles in a specific journal have been cited
     • e.g. a Journal Impact Factor of 2.5 for journal XYZ means that, on average, the articles published in XYZ one or two years ago have been cited two and a half times
     • See: How to use Journal Citation Reports
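
The two-year ratio in the formal definition can be written as a short Python function. The counts are hypothetical, picked to reproduce the slide's value of 2.5, and the function follows the slide's wording (articles published) rather than any particular database's rules about which items count.

```python
def journal_impact_factor(citations_by_pub_year, articles_by_pub_year, year):
    """Citations made in `year` to articles from the two preceding years,
    divided by the number of articles published in those two years."""
    window = (year - 1, year - 2)
    cites = sum(citations_by_pub_year.get(y, 0) for y in window)
    items = sum(articles_by_pub_year.get(y, 0) for y in window)
    if items == 0:
        raise ValueError("no articles published in the two-year window")
    return cites / items

# Hypothetical data for journal XYZ in 2008.
citations_2008 = {2007: 130, 2006: 120}  # citations made in 2008, by cited year
articles = {2007: 50, 2006: 50}          # articles XYZ published each year
print(journal_impact_factor(citations_2008, articles, 2008))  # 2.5
```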

  39. h-index – Hirsch (2005)
     Formal:
     • For a scientist, it is the largest number h such that s/he has at least h publications cited at least h times each, while the other publications have no more than h citations each
     • It is more than a straight citation count because it takes into account BOTH the number of publications one has AND the number of citations one has received
     English:
     • The number of papers a scientist has published that have each received at least that same number of citations
     • e.g. I published (as listed in Scopus) 74 articles, 31 of which were considered for the h-index (by their criteria); of these, 15 were cited at least 15 times and the others were cited less, so my h-index is 15
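
The definition translates directly into a few lines of Python. The citation counts are hypothetical; the last example only mirrors the shape of the slide's case (31 papers considered, 15 of them cited at least 15 times), not the actual Scopus figures.

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))       # 4
print(h_index([25, 8, 5, 3, 3]))       # 3: one highly cited paper adds little
print(h_index([15] * 15 + [2] * 16))   # 15
```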

  40. h-index differences
     • There are differences in typical h values in different fields, determined in part by
       • the average number of references in a paper in the field
       • the average number of papers produced by each scientist in the field
       • the size (number of scientists) of the field
     • Thus, comparison of h-indexes of scientists in different fields may not be valid
       • keep it to the same field!
       • e.g. h-indexes in biological sciences tend to be higher than in physics

  41. Citation frequency: citations are skewed. Research front
     • A few articles are cited a lot, others less, and many very little or not at all
       • 80-20 distribution: 20% of articles may account for 80% of the citations
       • from 1900 to 2005, about one half of one percent of cited papers were cited over 200 times; out of about 38 million source items, about half were not cited at all (Garfield, 2005)
     • This led to the identification of a "research front"
       • a cluster of highly cited papers in a domain
       • also showing links among the highly cited papers in the form of maps
       • indicating which papers are frequently cited together, i.e. co-cited
     • For searchers: identifying current & evolving research fronts in a domain
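
The 80-20 pattern can be checked against any list of citation counts. A minimal Python sketch with hypothetical counts constructed to show the pattern; real distributions vary by field and period.

```python
def top_share(citation_counts, top_fraction=0.2):
    """Share of all citations received by the most-cited
    `top_fraction` of articles."""
    ranked = sorted(citation_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    top_n = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / total

# Hypothetical citation counts for ten articles.
counts = [50, 30, 5, 5, 4, 3, 2, 1, 0, 0]
print(top_share(counts))  # 0.8: the top 2 of 10 articles hold 80 of 100 citations
```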

  42. Aggregate article & citation statistics
     • Derived from citation databases
       • combined statistics for a variety of entities
     • "Milked" in a great many, even ingenious, ways
       • e.g. a major component in ranking of universities (shown later)
     • The number of citations to all articles in a
       • journal (base for Journal Impact Factor)
     • or all articles or citations received by an
       • author
       • research group
       • institution
       • country

  43. Section 5. A sample of examples

  44. Scopus citation tracking for an author

  45. Scopus journal analyzer: three journals selected for comparison; could be further analyzed by tabs or listed in a table

  46. Web of Science citation report for an author

  47. Web of Science Journal Citation Report for three journals

  48. Histogram for JASIST using Garfield's HistCite
     • LCS = Local Citation Score: count of how much cited in JASIST
     • GCS = Global Citation Score: count of how much cited in all journals in WoS
     • LCR = Local Cited References: how many references from JASIST
     • NCR = Number of Cited References: how many references in the paper

  49. WoS: Essential Science Indicators

  50. WoS: InCites
