T2D + data identification, curation & duration. Maxine Tedesco ACCOLEDS: December 2-4, 2009. Table to Data (T2D) Project.
T2D + data identification, curation & duration
Maxine Tedesco
ACCOLEDS: December 2-4, 2009
Table to Data (T2D) Project
Approved March/08 at the COPPUL directors’ meeting as a collaborative project seeking to implement a system for linking articles & data in open access journals published at COPPUL institutions.
T2D activities to date
• May/08: Brainstorming at IASSIST conference
• July/08: Drupal Wiki established & “Outline of Activities” disseminated to project members
• Fall/08: Maxine undertook a Literature Search (building on work done by Jim Jacobs, Feb/08)
• December/08: Maxine reported at ACCOLEDS and renewed effort to involve project members
• Spring/09: Maxine investigated related project topics in connection with Study Leave research
Additionally, Chuck liaised/advocated for the project throughout the timeline, & consultation with OA publishers was undertaken by some project members.
T2D project stages
• Investigating
  • Literature Searches re: background, tools, etc.
• Recruiting
  • Open access publishers amenable to a pilot project
  • Researchers willing to deposit data
• Marking
  • Develop a set of descriptive tags for table content
  • Identify which parts of a data file “should” be linked and/or archived
• Tooling (i.e., tools for markup, searching & display)
• Evaluating/Reporting (i.e., HOW the project results contribute to research, teaching & learning)
So … What Is In It For Us? This seemed like a reasonable question to investigate further as “background information” for the research.
Taking into account researchers’ disciplinary differences, tables/figures are increasingly:
• used as a more effective summary of the article’s content than subject headings or other descriptors
• used as a quick means of identifying types of data, methodologies &/or results
• used to assess article relevance before reading the entire article
• less effective if completely extracted from the surrounding explanatory text and/or complementary tables/figures
DISAGGREGATION
Disaggregation of article components such as tables/figures facilitates searching at a greater level of granularity in order to:
• Improve search precision (the share of retrieved items that are relevant) & recall (the share of relevant tables/figures that are retrieved, including those a traditional search would miss)
• Facilitate the REAGGREGATION of a journal article’s components into new forms/formats
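To make the precision and recall terms concrete, here is a minimal sketch in Python. The table identifiers and both result sets are made up for illustration (they are not T2D project data), and `precision_recall` is a hypothetical helper, not part of any product mentioned in these slides.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four tables are truly relevant to the query.
relevant_tables = ["t1", "t2", "t3", "t4"]

# Assumed article-level search: finds two relevant tables plus two irrelevant ones.
article_level = precision_recall(["t1", "t2", "t7", "t8"], relevant_tables)

# Assumed table-level search: finds three relevant tables and nothing else.
table_level = precision_recall(["t1", "t2", "t3"], relevant_tables)

print("article-level:", article_level)
print("table-level:", table_level)
```

In this invented scenario the table-level search scores higher on both measures, which is the kind of gain disaggregation is hoped to deliver; real results would need to be measured on actual journals.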
REAGGREGATION?
Researchers wish to easily incorporate tabular information:
• into new documents (to support original research)
• into multimedia documents (to support presentations - classrooms or conferences)
• into other contexts (utilize data in pre-existing tables rather than generate new time-consuming and/or expensive datasets)
• into a comparison of similar information (to check one’s own work against other work)
So … What Can Make It Easier To Retrieve Relevant Tables/Figures? The research was decidedly sparse in this area or not quite as “on-topic” as one would have hoped.
Overview of Literature Review
The research mostly dealt with such topics as:
• Making T&F (tables/figures) more accessible to the visually impaired.
• Improved graphical presentation of T&F.
• Poor quality of T&F replication in electronic versions of documents.
• Improved dissemination of statistical information.
• Full-text does not necessarily mean the inclusion of T&F.
Format-Specific Databases
• TableBase (Gale; 1997+)
  • table title, table text, and descriptor fields are searchable
  • text that accompanies the table is not searchable or retrievable from the product
  • tables are directly downloadable to Excel
• Statistical Universe (Lexis-Nexis PowerTables; 2000+)
  • users search by “criteria”
  • links to full-text documents in the CIS/LEXIS-NEXIS digital archive & on WWW sites
  • download a PDF file or an Excel spreadsheet
SEARCH RESULTS from TableBase
TYPICAL RECORD in TableBase
Databases with “Deep Indexing” features
• Illustrata (ProQuest/CSA; 2006+)
  • assigns 7-8 index terms per image (these are searchable but not the table text itself)
  • thumbnail images for quick preview
  • links to full-text and other components within the product
• Selected ProQuest Databases (Oct. 1, 2009+)
  • deep indexing of images added along with traditional abstracting & indexing of text (at no additional cost)
Products That Index Table CONTENT
• TableSeer (search engine; 2006+)
  • automatically identifies tables in digital documents and extracts the contents of the table cells
  • stores the contents in a queryable database table, extracts table metadata, and uses a novel ranking function to find tables relevant to user queries
• BioText Search Engine (freely available web-based application; 2007+)
  • searches over 300 open access journals
  • ability to search for words within a table
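As a rough illustration of what “indexing table content” can mean, the sketch below builds a tiny inverted index over table captions and cell text, then ranks tables by how often the query words occur. This is an assumption-laden toy: the `TableIndex` class, the table IDs, and the sample tables are all hypothetical, and the simple term-count ranking is a stand-in, not TableSeer’s or BioText’s actual method.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TableIndex:
    """Toy inverted index: term -> {table_id: occurrence count}."""

    def __init__(self):
        self.postings = defaultdict(Counter)
        self.captions = {}

    def add_table(self, table_id, caption, rows):
        """Index the caption and every cell of one table."""
        self.captions[table_id] = caption
        for term in tokenize(caption):
            self.postings[term][table_id] += 1
        for row in rows:
            for cell in row:
                for term in tokenize(str(cell)):
                    self.postings[term][table_id] += 1

    def search(self, query):
        """Return table IDs ranked by total count of query terms (ties by ID)."""
        scores = Counter()
        for term in tokenize(query):
            for table_id, count in self.postings.get(term, {}).items():
                scores[table_id] += count
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [table_id for table_id, _ in ranked]


# Hypothetical tables from two imaginary articles.
index = TableIndex()
index.add_table(
    "a1-t1",
    "Cholesterol by education level",
    [["Education", "Mean LDL"], ["High school", 3.4], ["University", 3.1]],
)
index.add_table(
    "a2-t3",
    "Sample demographics",
    [["Age group", "Education"], ["18-29", "University"]],
)

print(index.search("education"))
print(index.search("ldl"))
```

Because the words inside the cells are indexed, a query such as “ldl” can find a table even when the word never appears in the caption or the article’s subject headings.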
TableSeer is part of ChemxSeer http://chemxseer.ist.psu.edu/
BioText Search in Articles For: “hypercholesterolemia” & “Education”
So … What Does This All Mean for the T2D Project? Not exactly sure, but perhaps, in seeing this trend in the Abstracting & Indexing industry, we might investigate developing a “SocioText” type of product to index open access journals such as the Canadian Journal of Sociology??
So … What Else Needs To Be “Put On The Table”? What if the table information is insufficient and I want to look at the entire dataset? Where is the entire dataset? Who owns the entire dataset? When will it become available for me to use? How can I get my hands on it?
Identific/cur/dur-ATION!
• Personal Websites
• Institutional Repositories
• Subject-specific Repositories such as:
  • Dryad - http://datadryad.org/repo
  • ExLab - http://exlab.bus.ucf.edu
AND THEN PERHAPS, there’s still:
• Desk Drawers (aka: LOST)
So . . . What Do We Do Now? Hopefully I’ve been able to provide some context and/or “food for thought” and, well . . . stay tuned for updates!