1 / 18

Content Mining a short introduction to practices and policies

Content Mining a short introduction to practices and policies. Summary of a study for the Publishing Research Consortium into Journal Article Mining, By Eefke Smit, Maurits van der Graaf (2011) Full study available on PRC website. Let ’ s start with a potential user (1). Use-case-1:

Download Presentation

Content Mining a short introduction to practices and policies

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Content Mininga short introduction to practices and policies Summary of a study for the Publishing Research Consortium into Journal Article Mining, By Eefke Smit, Maurits van der Graaf (2011) Full study available on PRC website

  2. Let’s start with a potential user (1) Use-case-1: keeping up-to-date Prof. Joost Verhaagen PhD, Netherlands Institute for Neuroscience, Amsterdam • Since 1982: • 90,000 journal articles on neuroregeneration (e.g. spinal cord injury) • New articles: • on average 22 journal articles per day on neuroregeneration

  3. Let’s start with a potential user (2) Use-case-2: Information needed as result of laboratory experiments Prof. Joost Verhaagen PhD, Netherlands Institute for Neuroscience, Amsterdam • Which molecules do play a role in this process? • Typical outcome of an experiment: hundreds of molecules show enhanced activity • Next step: how to filter out the relevant molecules? • ‘You would like to have for each of these molecules a meta-analysis about what is already known about these molecules in other processes’

  4. The essence of TDM is: So much information to analyze: Can a machine do this for him ?

  5. What TDM looks like: Textmining tool for semantic search byPubTator, seehttp://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/tutorial/index.html

  6. Typical text mining consists of • Processing large corpora of text in an automated way • To identify entities, instances, actions, relationships and patterns and do further analysis (e.g. on assertions or sentiments) • Examples: genes, proteins, gene-disease patterns, compound properties, chemical structures, side effects of drugs • Text mining output typically consists of a new metadata layer for the information: • Article clusters and categorisations, indexes • Topical maps, to show the occurrence of topics and their inter-relationships • Databases with facts, patterns, relationships, statements, assertions, properties found in the articles, • Visualisations like graphs, mappings, plot-graphs and topical maps

  7. Study commisioned in 2011 bythe Publishing Research Consortium • Authors: • Eefke Smit, Intn’l Assoc of STM Publishers • Maurits van der Graaf, Pleiade Management & Consultancy • Two parts: • Qualitative study: • 29 interviews with experts in academia, research, libraries, vendors and publishers • Quantitative study • Survey among publishers (members Crossref & STM) • 190 responses • Full report on PRC website www.publishingresearch.net • Article in the 1st issue of 2012 of Learned Publishing

  8. Optimists and Pessimists on TDM Optimistsabout TDM: • Vast digital corpus available and growing • More and more application areas (business, legal, social, etc) • Tools improvingfast • Manualworkreduced • Public domain or domain precision • Processing power less of a problem, analytical tools better, visualisationadds to analysis Skepticsabout TDM: • Has alwaysover-promised • Only in specializedfields • Tools stillcomplicated • Manualcurationnecessary • High investments • Domain dependent • No commondictionary • Overambition in the promise of knowledgediscovery

  9. Publishers are optimistic:Opinions/ expectations for Content Mining in the next 3 years

  10. Publishers are optimistic, continued:Opinions/ expectations for Content Mining on scholarly content in the next 3 years

  11. …but publishers do not yet get many mining requests from 3rd parties:

  12. Publishers are liberal in allowing mining:How case-by-case requests are treated

  13. …and plan more mining themselves:for retrieval and navigation

  14. Cross-sector solutions to facilitate Content Mining better Suggestions made by experts during the interviews: • Standardization of Content Formats • One Content Mining platform • Commonly agreed access and permission terms • One window for mining permissions • Collaboration with national libraries (ad 3: most interviewed experts do NOT see Open Access as a related issue; access terms also relate to datafile delivery or mining on the platform itself)

  15. Survey results for the 5 suggestions for cross-sector solutions

  16. Standardisation best prefered,of content formats and of API’s Top 3 for all Respondents: • Standardisation of Formats • One Mining Platform • Agreed Permission Terms Top 3 for Experts only: • Standardisation of Formats • Agreed Permission Terms • One Mining Platform Experts believe less in one platform and support standardisation even stronger, not just for content, also for APIs:

  17. Progress since 2011 on collective solutions facilitating TDM: • Commonly agreed permission terms: • STM standard clause for TDM on subscribed content for non commercial TDM • STM standard clause for pharma-TDM via Pharma-Documentention-Ring (PDR) • One window for mining permissions across publishers: • PLS/ CLA prototype, sepcially serving small and medium sized publishers • Obtain minable content in one place and provide standardized APIs and content formats: • CCC content mining platform • Crossref project-Prospect: single point web access for multiple publishers

  18. Questions ? Eefke Smit Director Standards and Technology International Association of STM Publishers smit@stm-assoc.org

More Related