Current work on CitEc

This project focuses on improving data harvesting and parsing methods for citation pattern analysis. It involves extracting full-text documents, converting them to ASCII format, and parsing reference sections to study citation patterns effectively. The goal is to enhance user services and logging/registration based on data insights.


Presentation Transcript


  1. Current work on CitEc José Manuel Barrueco Cruz http://www.uv.es/~barrueco Thomas Krichel http://openlib.org/home/krichel

  2. Data • Papers from RePEc dataset • 31139 Working Papers • 15145 Journal Articles • all of them available online, not all are free • More than 90% of them are in PDF or PostScript formats

  3. Harvesting • Perl script that: • Reads the RePEc data • Downloads the full text of the documents • Converts them to ASCII (using pstotext) • Tries to find a reference section
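
The last harvesting step can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the authors' script: the heading pattern and the function name `find_reference_section` are assumptions made for illustration, since the slides do not say which headings the script looks for.

```python
import re
from typing import Optional

# Hypothetical heading pattern (an assumption, not taken from the slides):
# a line consisting only of a reference-style heading, optionally numbered.
HEADING = re.compile(
    r"^\s*(?:\d+\.?\s*)?(references|bibliography|literature cited)\s*:?\s*$",
    re.IGNORECASE,
)

def find_reference_section(text: str) -> Optional[str]:
    """Return the text after the last reference-style heading, or None."""
    lines = text.splitlines()
    # Search from the end, since a reference list usually closes a paper.
    for i in range(len(lines) - 1, -1, -1):
        if HEADING.match(lines[i]):
            body = "\n".join(lines[i + 1:]).strip()
            return body or None
    return None
```

The input would be the ASCII output of pstotext for one document. A `None` result corresponds to the case where a document is converted but no reference section can be found.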

  4. Test on 1000 documents • 13% are not found at the URL specified • 3% are not in PDF or PS • 15% give errors in the pstotext conversion • 9% are converted but a reference section cannot be found • 60% were successfully converted
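
As a quick arithmetic check, the shares on this slide can be turned into implied document counts. This is only a sketch: the slide gives percentages, so the counts below assume those percentages are exact, and the dictionary keys and the helper `to_counts` are names chosen here for illustration.

```python
# Shares reported on the slide, in percent of the 1000-document test.
OUTCOMES = {
    "not found at URL": 13,
    "not PDF or PS": 3,
    "pstotext error": 15,
    "no reference section found": 9,
    "successfully converted": 60,
}
TOTAL_DOCS = 1000

def to_counts(shares, total):
    """Convert percentage shares into implied document counts."""
    return {name: total * pct // 100 for name, pct in shares.items()}

counts = to_counts(OUTCOMES, TOTAL_DOCS)
# Everything except the last category is a failure of some kind.
failed = TOTAL_DOCS - counts["successfully converted"]
```

The five shares add up to 100%, so the categories cover the whole test set, and the four failure categories together account for 40% of it.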

  5. Parsing problems of CiteSeer • Publication date. When a reference contains more than one year it is discarded • Source of publication, i.e. the working paper series or journal title, is not parsed by CiteSeer. We will need to add code with a list of all journals and working paper series.
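
Both problems can be illustrated with a small sketch. It does not reproduce CiteSeer's parser or the code the authors plan to add: the year pattern, the function names, and the sample source list are assumptions for illustration only.

```python
import re
from typing import List, Optional

# Four-digit years from 1900 to 2099 (an assumed range).
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

def candidate_years(reference: str) -> List[str]:
    """Return every year-like token found in a reference string."""
    return YEAR.findall(reference)

def find_source(reference: str, known_sources: List[str]) -> Optional[str]:
    """Return the longest known journal or series name found in the reference."""
    lowered = reference.lower()
    matches = [s for s in known_sources if s.lower() in lowered]
    # Prefer the longest match so a full title wins over a shorter substring.
    return max(matches, key=len) if matches else None

# Illustrative list; a real one would cover all journals and series.
SOURCES = ["Economic Review", "American Economic Review", "NBER Working Papers"]
ref = "Smith, J. (1999). Growth after 1945. American Economic Review, 89(3), 1-20."
years = candidate_years(ref)
source = find_source(ref, SOURCES)
```

Here the reference yields two candidate years because one appears in the title, which is the situation the first bullet describes. The source lookup is a plain substring match against a supplied list, which is the simplest form the planned list of journals and working paper series could take.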

  6. To do • Study of citation patterns • Use of data in user services • Use of data in logging and registration services

  7. Thank you for your attention. Contact José Manuel Barrueco Cruz for more information