
Improving the ETD Landscape. ETD 2014: 17th Int’l Symposium on ETDs, Leicester, England. Edward A. Fox, Executive Director, NDLTD, www.ndltd.org, fox@vt.edu, http://fox.cs.vt.edu/talks/2014. Virginia Tech, Blacksburg, VA 24061 USA.


Presentation Transcript


  1. Improving the ETD Landscape. ETD 2014: 17th Int’l Symposium on ETDs, Leicester, England. Edward A. Fox, Executive Director, NDLTD, www.ndltd.org, fox@vt.edu, http://fox.cs.vt.edu/talks/2014. Virginia Tech, Blacksburg, VA 24061 USA

  2. Outline • Acknowledgments • Why, what, who, how • Improving, quality • Related technical contributions • DLs and DL curriculum

  3. Acknowledgments • Family, mentors, teachers, students • Dissertations: Sung Hee Park, Venkat Srinivasan, Seungwon Yang • NSF: IIS-0535057, 0916733, 1319578 • All those working with ETDs • NDLTD, including its Members, Board, Committees, and Working Groups

  4. Why, What, Who? • Why? • enhance graduate education • expand global research collaboration • What? • help students communicate more effectively • get ETDs for all TDs: next goal 5 million • help make ETDs open, accessible, preserved • Who? • levels: students, faculty, staff, (grad) administrators • professions: CS, IT, LIS, librarians, archivists

  5. How? • Authoring systems, tools, methods • Data and auxiliary information management aids • Metadata creation software and techniques • Submission, approval, refinement workflows • Local access and information management • Sharing, disseminating, discovering • OAI, data providers, harvesting • Regional/national, global institutions • Services: access, preservation, adding value • Add back files
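The "OAI, data providers, harvesting" item refers to the OAI-PMH protocol, in which a harvester sends HTTP requests such as `ListRecords` to a data provider and parses the XML it returns. A minimal sketch follows, assuming a hypothetical base URL and a hand-written sample response; it makes no network call and is not tied to any particular ETD repository.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        records.append((ident, title))
    return records


# Hand-written sample response (hypothetical identifier and title).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.edu:etd-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.edu/oai")  # hypothetical endpoint
records = parse_records(SAMPLE)
```

A real harvester would also follow `resumptionToken` elements to page through large result sets; that step is omitted here.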

  6. Improving – 1 of 2 • Context: Quality frameworks, references on quality • Guidelines and documentation for all of this • Works • XML + PDF + raw/original representations • Multimedia, software, simulations, websites, dynamic content • Data, auxiliary information, references/bibliographies • Reproducibility • Metadata • Completeness: subject classification, faculty by role • Authority info

  7. Improving – 2 of 2 • Local services • Training, assistance • IR, archives, archival consortia • Global services • Browse, faceted search, full-text search • Recommend, CLIR, CBIR, summaries, topics • Linked data, hyperlinks, citation linking • Alerts, notifications, RSS feeds, filtering

  8. Information Life Cycle (adapted) [Figure: information life cycle diagram. Labels: Creation, Social Context, Retention / Mining, Utilization; Active, Semi-Active, Inactive; Authoring, Modifying, Classifying, Tagging, Recommending, Indexing, Using, Citing, Downloading, Storing, Retrieving, Discovering, Filtering, Distributing, Networking, Searching.] Source: Borgman et al. 1996, http://is.gseis.ucla.edu/research/dig_libraries/

  9. Quality and the Information Life Cycle

  10. Quality Dimensions

  11. Digital Library Service Taxonomy

  12. Improve related movements • Make related efforts work for graduate researchers, ETDs, and university ETD activities: • Open access, institutional repositories • Sharing references and citations: Zotero, … • Sharing data, datasets, workflows; reproducible science: reproducibleresearch.net, … • Building author profiles: ORCID, ISNI, … • Digital libraries and DL education (DL2014)

  13. Related technical contributions • Broadly: new/better systems, user/usage studies, added services, improved practices • Automatically assign topics or categories to ETDs or to portions (e.g., chapters) to aid browsing and (faceted) searching • Build a union reference collection: by aiding authors (e.g., Hiberlink) and/or by automatic ETD text mining • Enhanced information retrieval: cross language IR, content based IR (image/video/music) …

  14. Topic determination • Given a document, extract or generate a generalized description of its topics • Statistical approaches, e.g., LDA • Knowledge-based approaches, e.g., Xpantrac • Take a webpage or document • Use portions of it to build queries to a knowledge source (Web, Wikipedia, or an ETD collection) • Combine, analyze, and summarize the results • Seungwon Yang, "Automatic Identification of Topic Tags from Texts Based on Expansion-Extraction Approach", Jan. 2014, Ph.D. dissertation, http://hdl.handle.net/10919/25111
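The knowledge-based steps on this slide (build queries from portions of a document, send them to a knowledge source, then combine and summarize the results) can be sketched as below. This is an illustration only, not the Xpantrac implementation: the knowledge source is a tiny in-memory list of passages, retrieval is simple term overlap, and all function names and the stopword list are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for", "on",
             "with", "are", "by", "them"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_queries(document, size=3):
    """Use the first few content words of each sentence as one query."""
    sentences = [s for s in re.split(r"[.!?]+", document) if s.strip()]
    queries = [tokenize(s)[:size] for s in sentences]
    return [q for q in queries if q]


def search(knowledge_source, query):
    """Toy retrieval: passages that share at least one term with the query."""
    terms = set(query)
    return [p for p in knowledge_source if terms & set(tokenize(p))]


def topic_tags(document, knowledge_source, k=3):
    """Expand via retrieval, then extract the k most frequent terms as tags."""
    counts = Counter()
    for query in build_queries(document):
        for passage in search(knowledge_source, query):
            # Count each term once per retrieved passage.
            counts.update(set(tokenize(passage)))
    return [term for term, _ in counts.most_common(k)]


KNOWLEDGE = [
    "Digital libraries store electronic theses",
    "Libraries preserve theses and dissertations",
    "Soccer is a popular sport",
]
DOC = "Electronic theses are growing. Libraries manage them."
tags = topic_tags(DOC, KNOWLEDGE, k=2)  # "libraries" and "theses", in either order
```

Note that the tags come from the retrieved passages rather than from the document alone, which is what lets this style of approach produce a more general description of a topic than the document's own wording.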

  15. ETD Classification: Venkat Srinivasan • Enhance metadata by adding subject categories • Hierarchical classification of ETDs (and chapters thereof) using Library of Congress categories • Training data • OCLC’s WorldCat: records from 1M books have good labels but little metadata; labels on ETDs not usable • Results coming from queries each designed to describe a category • Need to balance negative and positive examples throughout the LoC taxonomy

  16. ETD Classification: Algorithm Pipeline [Figure: pipeline diagram. Training: the category label for each node of the category tree is used as a Google query; the top 50 webpages for each node form the training document sets, which after cleanup (stemming, stopword removal, etc.) become the training sets for Naïve Bayes classifiers. Classification: ETD metadata from the ETD collection is used for level-wise categorization, so that each ETD is categorized into a node of the category tree. The categorized ETDs are available for browsing through a web interface.]
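Level-wise categorization with Naïve Bayes classifiers can be sketched as follows: at each node of the category tree, a classifier chooses among that node's children, and the document descends until it reaches a leaf. This is a minimal sketch under stated assumptions, not the dissertation's code: the category names and training tokens are invented, the web-retrieval and cleanup steps are replaced by hard-coded token lists, and every category is assumed to have at least one training document.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over bags of words, with add-one smoothing."""

    def __init__(self, labeled_docs):
        # labeled_docs maps each label to a list of token lists.
        self.word_counts = {c: Counter(w for d in docs for w in d)
                            for c, docs in labeled_docs.items()}
        self.doc_counts = {c: len(docs) for c, docs in labeled_docs.items()}
        self.vocab = {w for wc in self.word_counts.values() for w in wc}

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c, wc in self.word_counts.items():
            denom = sum(wc.values()) + len(self.vocab)
            score = math.log(self.doc_counts[c] / total_docs)  # class prior
            score += sum(math.log((wc[w] + 1) / denom)
                         for w in tokens if w in self.vocab)
            if score > best_score:
                best, best_score = c, score
        return best


def subtree_docs(tree, training, node):
    """Training documents for a node plus all of its descendants."""
    docs = list(training.get(node, []))
    for child in tree.get(node, []):
        docs += subtree_docs(tree, training, child)
    return docs


def classify(tree, training, tokens, root="ROOT"):
    """Descend the tree one level at a time until a leaf is reached."""
    node = root
    while tree.get(node):
        clf = NaiveBayes({c: subtree_docs(tree, training, c) for c in tree[node]})
        node = clf.predict(tokens)
    return node


# Hypothetical two-level category tree and toy training sets.
TREE = {"ROOT": ["Science", "Arts"], "Science": ["Computer science", "Physics"]}
TRAINING = {
    "Computer science": [["algorithm", "software", "data"], ["software", "network"]],
    "Physics": [["quantum", "energy", "particle"]],
    "Arts": [["painting", "music", "sculpture"]],
}
label = classify(TREE, TRAINING, ["software", "algorithm"])  # "Computer science"
```

Training one classifier per internal node keeps each decision small, at the cost that an error near the root cannot be corrected further down the tree.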

  17. Reference Extraction and Databasing • How can we implement a metadata schema for bibliographic information? • What machine learning methods are effective for extracting reference sections, including footnotes and chapter references? • Sung Hee Park, "Discipline-Independent Text Information Extraction from Heterogeneous Styled References Using Knowledge from the Web", June 2013, VT CS Ph.D. dissertation

  18. Dataflow of Reference Section Extraction [Figure: dataflow diagram. Labels: Training data, Feature Extraction, Learning; ETD in PDF, Pdf2txt, Feature Extraction, Reference Section Extraction, Tagged data.]
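The feature extraction and reference section extraction stages named on this slide can be illustrated with a small sketch. It assumes the PDF has already been converted to text lines, and it swaps the learned model for a hand-set threshold on a few line-level features; the features, the threshold, and the function names are all assumptions made for illustration, not the method from the dissertation.

```python
import re


def line_features(line):
    """Binary features that tend to be true for bibliography entries."""
    s = line.strip()
    return {
        "starts_with_marker": bool(re.match(r"^(\[\d+\]|\d+\.)\s", s)),
        "has_year": bool(re.search(r"\b(19|20)\d{2}\b", s)),
        "has_page_range": bool(re.search(r"\b\d+\s*[-\u2013]\s*\d+\b", s)),
        "has_initial": bool(re.search(r"\b[A-Z]\.", s)),
    }


def is_reference_line(line, threshold=2):
    """Stand-in for a trained classifier: count how many features fire."""
    return sum(line_features(line).values()) >= threshold


def extract_reference_section(lines, threshold=2):
    """Return the longest run of consecutive lines scored as references."""
    best, current = [], []
    for line in lines:
        if is_reference_line(line, threshold):
            current.append(line)
        else:
            if len(current) > len(best):
                best = current
            current = []
    return max(best, current, key=len)


TEXT_LINES = [
    "Chapter 5 Conclusions",
    "We showed that the approach works.",
    "References",
    "[1] E. A. Fox. Digital libraries. 2012.",
    "[2] C. L. Borgman. Social aspects of digital libraries. 1996.",
]
section = extract_reference_section(TEXT_LINES)  # the two bracketed entries
```

In a learned version, the same feature dictionaries would be computed for tagged training lines and passed to a classifier, which would replace the fixed threshold.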

  19. ETD References: System Architecture [Figure: architecture diagram. Components: ETD Repository; Extracting Reference Sections; Metadata with References; Web App (e.g., ETD-db, https://github.com/VTUL/etddb2); Searching, Browsing, Manipulating; Users; Union ETD References (?).]

  20. Discovery, Search Engines, Info. Retrieval (to be extended for images, etc.) [Figure: a query Q is submitted to search; documents D are ranked against it, and the best matches (Q with D) are selected as results.] Quality of many systems is low, with recall and precision at only around .5, as opposed to the ideal of precision 1 at recall 1.
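Precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A minimal sketch of both measures, using invented document identifiers chosen to reproduce the ".5" case mentioned on this slide:

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query from two sets of doc ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents retrieved, four relevant, two in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5", "d6"])
# p == 0.5 and r == 0.5
```

A perfect system would return exactly the relevant set, giving `(1.0, 1.0)`.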

  21. Search Module Detail (features can be about text, images, …) [Figure: query Q and document D1 are each converted to feature vectors; a similarity function computes S = Sim(Q, D1).] • In CBIR (Content Based Image Retrieval), search is based on visual content of images • Color • Shape • Texture …
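The similarity computation S = Sim(Q, D1) can be sketched with cosine similarity over feature vectors. For the CBIR case, the sketch below uses a coarse color histogram as the feature vector. Both choices are assumptions made for illustration: the slide does not name a specific similarity function or color feature, and images are represented here as plain lists of (r, g, b) tuples rather than loaded from files.

```python
import math


def cosine_similarity(q, d):
    """Sim(Q, D): cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0


def color_histogram(pixels, bins=2):
    """Normalized histogram over bins**3 color cells for (r, g, b) pixels."""
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1
    n = len(pixels) or 1
    return [h / n for h in hist]


def rank(query_vec, doc_vecs):
    """Return document names ordered from most to least similar."""
    return sorted(doc_vecs,
                  key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
                  reverse=True)


# Toy "images": a reddish query, one red and one blue collection image.
query = color_histogram([(250, 10, 10), (240, 20, 5)])
collection = {
    "red_image": color_histogram([(255, 0, 0), (255, 0, 0)]),
    "blue_image": color_histogram([(0, 0, 255)]),
}
ranking = rank(query, collection)  # ["red_image", "blue_image"]
```

The same `cosine_similarity` and `rank` functions apply unchanged to text, where the vectors would hold term weights instead of color counts.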

  22. DL Definitions: Informal 5S. DLs are complex systems that • help satisfy info needs of users (societies) • provide info services (scenarios) • organize info in usable ways (structures) • present info in usable ways (spaces) • communicate info with users (streams) • Use this as: checklist, design guidelines, basis for formal description, specification for software implementation; e.g., Spaces help with GIS and VR

  23. Digital Library Books • Edward A. Fox and Jonathan P. Leidig, eds. Digital Library Applications: CBIR, Education, Social Networks, eScience/Simulation, and GIS. Morgan & Claypool Publishers, 2014, 175 p., http://dx.doi.org/10.2200/S00565ED1V01Y201401ICR032 • Edward A. Fox and Ricardo da Silva Torres, eds. Digital Library Technologies: Complex Objects, Annotation, Ontologies, Classification, Extraction, and Security. Morgan & Claypool, 2014, 205 p., http://dx.doi.org/10.2200/S00566ED1V01Y201401ICR033 • Rao Shen, Marcos Andre Goncalves, and Edward A. Fox. Key Issues Regarding Digital Libraries: Evaluation and Integration. Morgan & Claypool, 2013, 110 p., http://dx.doi.org/10.2200/S00474ED1V01Y201301ICR026 • Edward A. Fox, Marcos Andre Goncalves, and Rao Shen. Theoretical Foundations for Digital Libraries: The 5S (Societies, Scenarios, Spaces, Structures, Streams) Approach. Morgan & Claypool, 2012, 180 p., http://dx.doi.org/10.2200/S00434ED1V01Y201207ICR022, supplementary website https://sites.google.com/a/morganclaypool.com/dlibrary/

  24. DL Curriculum Project • NSF awards to VT and UNC-CH: CS and LIS • Project server: http://curric.dlib.vt.edu/ • Wikiversity: http://en.wikiversity.org/wiki/Curriculum_on_Digital_Libraries • Table 1: Core DL Curriculum • Table 2: Information Retrieval Packages • Table 3: LucidWorks Big Data Software • Table 4: Multimedia Software

  25. DL Curriculum Module Template
      1. Module name
      2. Scope
      3. Learning objectives
      4. 5S characteristics of the module (streams, structures, spaces, scenarios, society)
      5. Level of effort required (in-class and out-of-class time required for students)
      6. Relationships with other modules (flow between modules)
      7. Prerequisite knowledge/skills required (what the students need to know prior to beginning the module; completion optional; complete only if prerequisite knowledge/skills are not included in other modules)
      8. Introductory remedial instruction (the body of knowledge to be taught for the prerequisite knowledge/skills required; completion optional)
      9. Body of knowledge (theory + practice; an outline that could be used as the basis for class lectures)
      10. Resources (required readings for students; additional suggested readings for instructor and students)
      11. Exercises / Learning activities
      12. Evaluation of learning objective achievement (graded exercises or assignments)
      13. Glossary
      14. Additional useful links
      15. Contributors (authors of module, reviewers of module)

  26. DL Curriculum Framework

  27. DL Curriculum Modules - examples • Module 1-b: History of digital libraries and library automation • Module 2-c: File Formats, Transformation, and Migration • Module 3-b: Digitization • Module 4-b: Metadata • Module 5-a: Architecture overviews • …

  28. Summary Scene

  29. Conclusion: Improving together • Who will help? • What can we do? • What knowledge and education is needed? • What connections, integrations, collaborations can help with ETDs? • Please comment and share! – Ed Fox (fox@vt.edu, http://fox.cs.vt.edu/talks/2014)
