SELL - May 2009

cormac

Presentation Transcript


    1. SELL - May 2009 Collecting and Analyzing Usage Statistics at INIST-CNRS
       Good morning. Today, I'm going to present the project that INIST-CNRS is currently conducting on the usage statistics of our portals. But first, I would like to introduce myself. If you need more details after this presentation, you can contact Magali Colin, the usage statistics project manager.

    2. CNRS: 11 600 researchers, 1 200 research units.
       Then, I would like to say a few words about CNRS, our parent organization. CNRS stands for Centre National de la Recherche Scientifique, or National Center for Scientific Research. It is a French government-funded research organization, under the administrative authority of France's Ministry in charge of Higher Education and Research. CNRS is the largest basic research organization in Europe and carries out research in all fields of knowledge through its six research departments. CNRS has over one thousand two hundred (1 200) research and service units located throughout France, and ninety per cent of these units are joint laboratories with universities. There are eleven thousand six hundred (11 600) researchers and fourteen thousand four hundred (14 400) engineers, technicians and administrative staff.

    3. To collect and analyze the results and findings of worldwide research in Science, Technology, Medicine, Humanities and Social Sciences. To widely disseminate these results and findings through various activities, using the traditional and new technological means available today.
       INIST-CNRS, the Institute for Scientific and Technical Information of CNRS, is a laboratory created in 1989. It is located in Eastern France, in a suburb of the city of Nancy, and its buildings were designed by the world-famous architect Jean Nouvel. The Institute's primary missions are to collect and analyze the production of worldwide scientific research in all fields of knowledge and to disseminate its results and findings to the widest audience.

    4. Negotiation on behalf of CNRS for the acquisition of electronic resources. Creation and design of interdisciplinary and subject-oriented portals. Document delivery services: leading supplier in France (350 000 copies delivered in 2007). Production of multilingual, multidisciplinary bibliographic databases: FRANCIS, PASCAL, ISD. Research partnerships within the ICT community. Partnership in Open Access initiatives.
       ICT = Information and Communication Technology. This slide lists the activities of INIST-CNRS. Portal creation and acquisition negotiations are the activities that concern today's presentation.

    5. As early as 2000, INIST took advantage of the increasing availability of electronic resources to offer its users subject-oriented portals accessible on the Web. These portals give access to information useful to researchers. Some information might be freely accessible, such as news of interest to specific communities, while some resources, such as journals and databases, might be accessible only to authorized users.
       Life Sciences: BiblioVIE, http://bibliovie.inist.fr
       Humanities and Social Sciences: BiblioSHS, http://biblioshs.inist.fr
       Chemical Sciences: TitaneSciences, http://titanesciences.inist.fr
       Engineering Sciences: BiblioST2I, http://bibliost2i.inist.fr
       Earth Sciences: BiblioPl@nets, http://biblioplanets.inist.fr
       And also, managed by INIST for the French National Institute for Health and Medical Research (INSERM): BiblioInserm, http://biblioinserm.inist.f

    6. This slide gives you some figures and activities involved in portal creation and management. INIST-CNRS is in charge of the technical implementation and management of the CNRS portals. It provides publishers with a single IP address per portal and manages the passwords and authorized user logins of each CNRS unit. INIST-CNRS is also in charge, on behalf of CNRS units, of negotiating with publishers and producers for access to the electronic resources contained in the various portals.

    7. I would like to show you a sample screen of BiblioVie, the portal dedicated to the CNRS Life Sciences community. It is a good example of what the other portals look like. On the top left, you can find information on the various resources in BiblioVie. In the middle, you have the news section. Right below is where you can access the resources. For example, you can look for journals or databases. This Life Sciences portal provides restricted access to over three thousand six hundred (3 600) journals. The workgroup on usage statistics focuses primarily on the usage of restricted-access journals.

    8. Know how electronic resources are used and quantify their usage. Optimize acquisitions, match user expectations and negotiate with publishers. Streamline expenditures and support budget requests with hard data.
       Why is the evaluation of the usage of electronic resources so important? Usage statistics are of strategic importance for library acquisitions. For cost-effective acquisitions, detailed usage statistics are needed to find a better match between user expectations and library budgets. Publishers are also aware of the importance of the usage of their electronic resources. They produce their own statistics and, overall, they are willing to share them.
       So, usage evaluation of resources is needed to: Gain a better understanding of resource usage and quantify this usage by portal and by scientific community, to provide data on which to base budget allotments among the various scientific departments during national negotiations at CNRS level. Optimize acquisitions and back up editorial decisions. Gain a better understanding of user expectations. Provide hard data to back up discussions during negotiations. Check whether publishers' price increases are legitimate. Supply hard data to streamline expenditures and back up our own internal budget requests to our parent organization.

    9. Monthly download of publishers' usage data (from the publishers' sites, after identification). Drawbacks of this type of statistics: incomplete; not homogeneous; not always available; no links to users (1 IP/portal).
       However: Not all publishers publish usage statistics, so the data is incomplete. Not all publishers are COUNTER compliant, so there is a lack of homogeneity and also a lack of documentation on the content and the nature of the statistics they supply. Publishers do not all publish their statistics at the same time, and some of them might publish them quite late. The EZproxy server does not allow publishers to identify connections beyond the IP addresses and the portals. This missing link between resource usage (the articles accessed) and resource users (the laboratories accessing these articles) is one of the reasons that prompted us to develop internal statistics.

    10. January 2006: creation of a workgroup on statistics at INIST-CNRS, bringing together INIST-CNRS users and INIST-CNRS computer specialists.
        Eight people are members of the workgroup. Three or four of them dedicate three quarters of their time to the project, especially during intensive testing phases. The objectives of the group are: To compile the information needed to meet the needs we have just mentioned, in an autonomous, rapid, homogeneous and comprehensive manner. And to formalize the data obtained in statistical usage reports and other tables and charts, to be used as tools for decision making.

    11. Homogeneous and detailed data. On every publisher. Quickly available. With links to user laboratories.
        We decided to create a system to collect statistics based on connection log counts. Since the institute was the technical operator for the Life Sciences portal, the data we needed was easily available. By producing our own statistics, we are able to obtain data that is more homogeneous and more exhaustive, and we can establish a link between resource usage and resource users.

    12. On this slide, you can follow the process we use to retrieve our statistics. Users must identify themselves on their dedicated portal. The proxy, which identifies them, records every transaction in log files and transmits the page requests (one page request = one transaction) to the publisher sites. The log files are sorted by portal and by date and saved for later analysis. Data is then extracted from these files and fed into the statistics database. So, with user identification, we can know at any time who is at the origin of a transaction.
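
As an illustration of the extraction step described on this slide, here is a minimal Python sketch that turns one proxy log line into a structured transaction. The log layout (IP address, login, timestamp, request, status, size) is an assumption made for the example; the presentation does not give the actual format, and the names `LOG_PATTERN`, `Transaction` and `parse_line` are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Hypothetical log layout (assumption): ip login [timestamp] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<login>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


class Transaction(NamedTuple):
    login: str           # identifies the laboratory behind the request
    timestamp: datetime  # every log entry is dated
    url: str             # page requested on the publisher site
    status: int          # HTTP status returned through the proxy


def parse_line(line: str) -> Optional[Transaction]:
    """Return a Transaction, or None when the line does not match the layout."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return Transaction(
        match.group("login"), timestamp, match.group("url"), int(match.group("status"))
    )


# Made-up sample line, for illustration only.
sample = (
    '192.0.2.10 lab042 [15/Jan/2007:10:32:05 +0100] '
    '"GET http://publisher.example/article/123.pdf HTTP/1.1" 200 51234'
)
transaction = parse_line(sample)
```

Because the login is kept with each transaction, every request can later be traced back to a laboratory, which is the link the publishers' own statistics cannot provide.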

    13. User identification. Dated transactions. Special processing. Controlled results. Fault detection.
        Because the statistics are produced internally, we can make the most of the data extracted from the transaction records. We can identify the users through their logins and passwords. We can pinpoint when a transaction occurred, since each log entry is dated. We can perform special processing, for example to follow the COUNTER recommendations: repeated accesses to the same document within the same time window are considered duplicates and are deleted. The time window is 10 seconds for text documents and 30 seconds for PDF documents. We are able to check the results. Manual tests conducted on different platforms enabled us to find out whether the actual user queries correspond to the figures obtained from the database. This means that our teams queried the resources and noted everything they did. Then they compared their notes with the results derived from the database, using login and query date filters. Finally, we can manage faults, for example by detecting transaction peaks for some logins, highlighting abnormal situations, and cutting off access.
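
The duplicate rule described on this slide can be sketched as follows in Python. This is only an illustration under stated assumptions: a duplicate is taken to be a repeated request by the same login for the same URL and format within the window, the first request of a burst is the one kept, and the function name `drop_double_clicks` and the tuple layout are hypothetical.

```python
from datetime import datetime, timedelta

# Windows quoted in the presentation: 10 s for text (HTML), 30 s for PDF.
WINDOWS = {"html": timedelta(seconds=10), "pdf": timedelta(seconds=30)}


def drop_double_clicks(transactions):
    """Filter out repeated requests made within the window.

    transactions: iterable of (login, url, fmt, timestamp) tuples.
    Assumption: the first request of a burst is kept, later ones are dropped.
    """
    last_seen = {}
    kept = []
    for login, url, fmt, ts in sorted(transactions, key=lambda t: t[3]):
        key = (login, url, fmt)
        previous = last_seen.get(key)
        if previous is None or ts - previous > WINDOWS[fmt]:
            kept.append((login, url, fmt, ts))
        # Always remember the latest request so a chain of rapid clicks
        # collapses into a single counted access.
        last_seen[key] = ts
    return kept


t0 = datetime(2007, 1, 15, 10, 0, 0)
rows = [
    ("lab1", "a.pdf", "pdf", t0),
    ("lab1", "a.pdf", "pdf", t0 + timedelta(seconds=20)),   # duplicate (within 30 s)
    ("lab1", "a.pdf", "pdf", t0 + timedelta(seconds=70)),   # counted again
    ("lab1", "b.html", "html", t0),
    ("lab1", "b.html", "html", t0 + timedelta(seconds=10)), # duplicate (within 10 s)
    ("lab1", "b.html", "html", t0 + timedelta(seconds=21)), # counted again
    ("lab2", "a.pdf", "pdf", t0 + timedelta(seconds=5)),    # other login, counted
]
filtered = drop_double_clicks(rows)
```

Comparing each request with the previous one for the same key, rather than with the first of the burst, is a design choice of this sketch; the COUNTER Code of Practice should be consulted for the exact rule.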

    14. Let's move on to the technical aspects of our prototype. Once the log files have been established, they are saved in a data repository. To do so, we implemented a Data Center-type MySQL database to centralize all the data from all other portals. As you can see on the slide, everything revolves around a single repository. The objective is to measure resource usage and to relate it back to data from different origins.
        In the database, we can centralize: The bibliographic data extracted from the Millennium integrated library system of Innovative Interfaces. The pricing data on journals, also extracted from Millennium. The impact factor of the journals, obtained from the Journal Citation Reports (JCR). The descriptive data on the laboratories, such as location, number of researchers and research field. The transactions, saved on a monthly basis.
        Some tables come from the extractions, some come from various sources, and some had to be created from scratch to bring in information that existed only in a scattered and unstructured form. The query and journal data are used to optimize acquisitions and back up editorial decisions. The journal and pricing data are used to provide hard data to back up our budget requests. The pricing and laboratory data are used to allocate budgets among the various CNRS scientific departments. The CNRS laboratory and access data are used to find out user expectations in order to better answer user needs.
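
To make the idea of a single repository more concrete, here is a small Python sketch of what such a schema might look like. It uses an in-memory SQLite database as a stand-in for MySQL so that it can be run anywhere; the table names, column names and sample rows are all invented for the example and are not taken from the actual INIST-CNRS database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL repository
conn.executescript(
    """
    -- Hypothetical tables: journals (bibliographic, pricing, impact factor),
    -- laboratories (descriptive data) and monthly transaction counts.
    CREATE TABLE journal (
        issn TEXT PRIMARY KEY, title TEXT, publisher TEXT,
        price_eur REAL, impact_factor REAL
    );
    CREATE TABLE laboratory (
        login TEXT PRIMARY KEY, region TEXT, department TEXT, researchers INTEGER
    );
    CREATE TABLE monthly_access (
        month TEXT,
        login TEXT REFERENCES laboratory(login),
        issn TEXT REFERENCES journal(issn),
        accesses INTEGER
    );
    """
)

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO journal VALUES (?, ?, ?, ?, ?)",
    [("0000-0001", "Journal A", "Publisher X", 1200.0, 3.2),
     ("0000-0002", "Journal B", "Publisher Y", 800.0, 1.1)],
)
conn.executemany(
    "INSERT INTO laboratory VALUES (?, ?, ?, ?)",
    [("lab1", "Ile-de-France", "Life Sciences", 40),
     ("lab2", "Lorraine", "Chemistry", 25)],
)
conn.executemany(
    "INSERT INTO monthly_access VALUES (?, ?, ?, ?)",
    [("2007-01", "lab1", "0000-0001", 120),
     ("2007-01", "lab2", "0000-0001", 30),
     ("2007-02", "lab1", "0000-0002", 10)],
)

# Relating usage back to the laboratories: accesses per scientific department.
by_department = conn.execute(
    """
    SELECT l.department, SUM(m.accesses)
    FROM monthly_access AS m
    JOIN laboratory AS l ON l.login = m.login
    GROUP BY l.department
    ORDER BY l.department
    """
).fetchall()
```

Keeping the login as the join key between transactions and laboratories is what allows usage to be broken down by department, region or research field.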

    15. We decided to work on a prototype based on the Life Sciences portal: it contains the largest number of journals, publishers and platforms (such as ScienceDirect, IOP and HighWire). Over three thousand six hundred (3 600) subscription-based titles, published by some 40 publishers, are currently available on this portal. I must mention that, at present, we are only processing level one of the COUNTER statistical reports, which records the successful accesses to full-text articles by month and by journal. Only paid resources were studied, and we plan to extend this study to databases and to the available tools and services.
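
A report of successful full-text accesses by month and by journal, as described on this slide, could be computed along these lines. This is a sketch under assumptions: transactions are taken to be already deduplicated and mapped to a journal, HTTP status 200 is treated as "successful", and the function name `monthly_report` is hypothetical.

```python
from collections import Counter
from datetime import datetime


def monthly_report(transactions):
    """Count successful full-text accesses per (journal, month).

    transactions: iterable of (journal, timestamp, status) tuples.
    Assumption: a status of 200 means the full text was delivered.
    """
    report = Counter()
    for journal, timestamp, status in transactions:
        if status == 200:
            report[(journal, timestamp.strftime("%Y-%m"))] += 1
    return report


# Made-up sample data, for illustration only.
sample = [
    ("Journal A", datetime(2007, 1, 3), 200),
    ("Journal A", datetime(2007, 1, 20), 200),
    ("Journal A", datetime(2007, 2, 1), 200),
    ("Journal B", datetime(2007, 1, 9), 404),  # failed request, not counted
    ("Journal B", datetime(2007, 1, 9), 200),
]
report = monthly_report(sample)
```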

    16. This slide shows some samples of the graphs and figures obtained from the usage statistics database. On the left, we have a map of France showing, by region, the distribution of the laboratories that account for eighty per cent of the queries on restricted-access journals. This distribution is based on the Life Sciences portal data for the first half of 2007. As you can see, the Ile de France region, or greater Paris area, accounts for thirty per cent of all queries, while the regions around the Rhone valley and the Mediterranean each account for 14%. These results are not surprising, since those regions have the highest concentration of CNRS research units. The graph on the right shows the distribution of queries between the 20 most frequently accessed titles and the other titles. The top twenty titles represent less than one per cent of all titles available on the portal, yet they account for 34% of all accesses.
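
The concentration figure on the right-hand graph (a small share of titles accounting for a large share of accesses) can be computed with a few lines of Python. The function name `top_share` and the sample counts are invented for the example.

```python
def top_share(accesses_by_title, n=20):
    """Return (share of titles, share of accesses) for the n most accessed titles."""
    counts = sorted(accesses_by_title.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0, 0.0
    top_n = counts[:n]
    return len(top_n) / len(counts), sum(top_n) / total


# Made-up counts, for illustration only.
sample = {"Journal A": 50, "Journal B": 30, "Journal C": 10, "Journal D": 10}
title_share, access_share = top_share(sample, n=1)
```

With the portal's real data and `n=20`, the two returned values would correspond to the "less than one per cent of titles" and "34% of accesses" figures quoted on the slide.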

    17. This slide shows some of the indicators we can produce. We can also produce data by portal, such as: The number of accesses to full-text documents in PDF or HTML format for a given period. The number of journals. The number of publishers. The number of platforms. The number of publishers and journals that are seldom accessed. The number of authorized CNRS laboratories. The number of researchers affiliated with these laboratories. We cross-reference and compare some indicators, such as the number of accesses and the average cost per access. We can also group indicators by laboratory and by scientific department, by journal and publisher, or by scientific field, in order to analyze the data along different axes.
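
Two of the indicators mentioned on this slide, the average cost per access and the list of seldom-accessed journals, can be sketched as follows. The threshold of 10 accesses, the function name `cost_per_access` and the sample prices are assumptions made for the example; the presentation does not say how "seldom accessed" is defined.

```python
def cost_per_access(prices, accesses, seldom_threshold=10):
    """Combine subscription prices with access counts, per journal.

    prices: {title: subscription price}; accesses: {title: access count}.
    seldom_threshold is a hypothetical cut-off for "seldom accessed".
    """
    report = {}
    for title, price in prices.items():
        count = accesses.get(title, 0)
        report[title] = {
            "accesses": count,
            # No cost per access can be computed for a journal never accessed.
            "cost_per_access": price / count if count else None,
            "seldom_accessed": count < seldom_threshold,
        }
    return report


# Made-up prices and counts, for illustration only.
prices = {"Journal A": 1200.0, "Journal B": 800.0, "Journal C": 500.0}
accesses = {"Journal A": 600, "Journal B": 5}
indicators = cost_per_access(prices, accesses)
```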

    18. Possibility of different accesses to resources. Identification not possible for some platforms. Possibility of different interpretations. Identification at laboratory level only. Free-of-charge queries are not counted. Access to publishers' statistics.
        In some CNRS laboratories, users can have other logins to access resources by means other than the INIST-CNRS portals. This might be the case in CNRS laboratories working with universities or industry. We have no control over this situation. It is not possible to analyze usage data on some platforms. Not all platforms count usage in the same way. For example, in the case of a PDF document, the whole document might be counted as one article, or each page of the document might be counted as one article. We still have to integrate publisher statistics into the database.

    19. To conclude, today we can study the usage of the paid-access resources available on the Life Sciences and the Humanities and Social Sciences portals since January 2007. We have 3 immediate objectives: To keep feeding the portail_stat database by adding data from other portals. To update the access reports on paid journals by integrating changes over time, starting with the Life Sciences, Humanities and Social Sciences, and Engineering Sciences portals and then moving on to the other INIST-CNRS portals. To enhance each report with qualitative data by comparing the number of accesses to a journal with the impact factor of that journal.

    20. Now, if you have any questions, I will try to answer them.
