Research ExperT

Presentation Transcript


  1. Research ExperT • Paul Varcholik, Joshua Thompson • EEL 6883 – Software Engineering II • Spring 2009

  2. Background • Academic research: literature reviews, conferences, journals • Material collected from the Internet (e.g., Google Scholar) • How do researchers organize the papers they find? Hard copies, on-disk directory structures

  3. Background (cont.) • Needs: storage and quick retrieval of research papers, collaboration with colleagues, user-provided reviews, annotated references • Existing tools: 2collab.com, Mendeley, Zotero, Papers (Mac-only) • Wikipedia comparison of these tools

  4. High-Level Architecture • 5 assemblies: 1 Common, 1 Data Layer, 1 Unit Test, and 2 UI (1 Web, 1 Windows Forms (WinForms))
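The slide gives only the assembly breakdown. As a rough illustration of that layering (shared Common types, a Data Layer that owns storage, and two thin UIs calling into it), here is a minimal Python sketch. Every name in it (`Paper`, `PaperRepository`, `web_search_page`, `desktop_search_rows`) is hypothetical, and an in-memory dictionary stands in for the real database and its stored procedures.

```python
import html
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# "Common": domain types shared by every other assembly (hypothetical Paper type).
@dataclass
class Paper:
    paper_id: int
    title: str
    authors: List[str] = field(default_factory=list)


# "Data Layer": the only code that touches storage. The real system used a
# database with stored procedures; a dictionary stands in for it here.
class PaperRepository:
    def __init__(self) -> None:
        self._papers: Dict[int, Paper] = {}
        self._next_id = 1

    def add(self, title: str, authors: List[str]) -> Paper:
        paper = Paper(self._next_id, title, list(authors))
        self._papers[paper.paper_id] = paper
        self._next_id += 1
        return paper

    def find_by_title(self, text: str) -> List[Paper]:
        needle = text.lower()
        return [p for p in self._papers.values() if needle in p.title.lower()]


# Two thin UIs (web and desktop) that call the same data layer.
def web_search_page(repo: PaperRepository, query: str) -> str:
    items = "".join(f"<li>{html.escape(p.title)}</li>" for p in repo.find_by_title(query))
    return f"<ul>{items}</ul>"


def desktop_search_rows(repo: PaperRepository, query: str) -> List[Tuple[int, str, str]]:
    # Rows for a grid control: (id, title, comma-separated authors).
    return [(p.paper_id, p.title, ", ".join(p.authors)) for p in repo.find_by_title(query)]
```

Because both front ends depend only on the repository, a second UI can be added without changing the data layer, which is consistent with the second iteration reusing the first iteration's database and data layer.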

  5. First Iteration • Requirements gathering, initial design, and implementation • Web-based system • Foundation set; key features available • Large scope forced some features to be pulled back • UI lacked polish

  6. Second Iteration • Windows Forms (WinForms) UI • Same code base (database and data layer), with some extensions • Attempts at automatic extraction of metadata

  7. Iteration Metrics Comparison

     | Metric | First Iteration | Second Iteration |
     |---|---|---|
     | Files | 180 | (not stated) |
     | ELOC | ~4,500 | ~9,650 |
     | Classes and enumerations | 57 | 92 |
     | Database tables | 15 | 16 |
     | Stored procedures | 88 | 100 |
     | Unit tests | 87 | 96 |

  8. UI Comparison (Web vs. Windows)

  9. Unit Testing

  10. Discussion (cont.) • Low complexity

  11. Discussion (cont.) • High maintainability: the score can be read as a percentage grade, where values closer to 100 are better. *The formula for average complexity is logarithmic, so the numbers do not add up like sums.
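The slide does not name the tool that produced these scores. Assuming it was the Visual Studio code-metrics maintainability index (reported on a 0 to 100 scale), the formula is MI = max(0, (171 - 5.2 ln V - 0.23 G - 16.2 ln L) x 100 / 171), where V is Halstead volume, G is cyclomatic complexity, and L is lines of code. The sketch below evaluates it and shows why logarithmic terms do not behave like sums.

```python
import math


def maintainability_index(halstead_volume: float, cyclomatic_complexity: float,
                          lines_of_code: float) -> float:
    """Maintainability index rescaled to 0..100 (Visual Studio-style formula, assumed)."""
    raw = (171
           - 5.2 * math.log(halstead_volume)    # natural log of Halstead volume
           - 0.23 * cyclomatic_complexity       # linear in cyclomatic complexity
           - 16.2 * math.log(lines_of_code))    # natural log of lines of code
    return max(0.0, raw * 100 / 171)


# Doubling a method's length lowers the score by a fixed amount (16.2 * ln 2,
# rescaled), not by half, because lines of code enter through a logarithm.
short = maintainability_index(100, 2, 10)
longer = maintainability_index(100, 2, 20)
print(round(short, 1), round(longer, 1))
```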

  12. PDF Parsing • Metadata to extract: issue heading, title, authors, abstract, keywords

  13. PDF Parsing (cont.) • Uses the PDFBox library for reading and manipulating PDFs • Three methods for parsing PDFs: automatic, XML-based, and user-driven image-based

  14. PDF Parsing (cont.) • Automatic parsing • Uses heuristics to determine metadata: font sizes, relative positioning, specific tokens • Pros: no user input required; can provide reasonable guesses • Cons: makes assumptions; not 100% reliable; difficulties with text extraction
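The three heuristics named on this slide can be illustrated in a few lines. This is a minimal Python sketch, not the project's PDFBox-based code: it assumes page 1 has already been extracted into lines carrying text, font size, and vertical position (`TextLine` and `guess_metadata` are hypothetical names), and its hard-coded guesses, such as "the largest font is the title", are exactly the kind of assumptions listed under Cons.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TextLine:
    text: str
    font_size: float
    y: float  # distance from the top of page 1; smaller means higher on the page


# Section tokens, optionally followed by ":", ".", "-" or an em dash (U+2014).
SECTION_TOKEN = re.compile(r"^\s*(abstract|keywords)\b[\s:.\-\u2014]*(.*)$", re.IGNORECASE)
STOP_TOKEN = re.compile(r"^\s*(?:\d+\.?\s*|[IVX]+\.\s*)?introduction\b", re.IGNORECASE)


def guess_metadata(lines: List[TextLine]) -> Dict[str, Optional[str]]:
    meta: Dict[str, Optional[str]] = dict.fromkeys(
        ["issue_heading", "title", "authors", "abstract", "keywords"])
    ordered = sorted(lines, key=lambda line: line.y)
    if not ordered:
        return meta

    # Font size: assume the title is set in the largest font on the page.
    largest = max(line.font_size for line in ordered)
    title_lines = [line for line in ordered if line.font_size == largest]
    meta["title"] = " ".join(line.text for line in title_lines)

    # Relative position: text above the title is the issue heading,
    # and the first line below the title is taken to be the author list.
    above = [line for line in ordered if line.y < title_lines[0].y]
    below = [line for line in ordered if line.y > title_lines[-1].y]
    if above:
        meta["issue_heading"] = above[0].text
    if below:
        meta["authors"] = below[0].text

    # Specific tokens: "Abstract" and "Keywords" open sections; "Introduction" ends them.
    collected: Dict[str, List[str]] = {"abstract": [], "keywords": []}
    current: Optional[str] = None
    for line in below:
        if STOP_TOKEN.match(line.text):
            current = None
            continue
        match = SECTION_TOKEN.match(line.text)
        if match:
            current = match.group(1).lower()
            if match.group(2):
                collected[current].append(match.group(2))
        elif current is not None:
            collected[current].append(line.text)
    for key, parts in collected.items():
        if parts:
            meta[key] = " ".join(parts)
    return meta
```

A paper whose author line happened to use the largest font would be misread, which illustrates why automatic parsing "does not always work".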

  15. PDF Parsing (cont.)

  16. PDF Parsing (cont.) • XML-based parsing • The format of each publication source is specified: order of metadata, relative font sizes, token delimiters • Pros: more effective than automatic parsing; no direct user input required • Cons: requires a manually written specification for each publication source
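A publication-source template of this kind might look like the XML embedded below. The element and attribute names (`source`, `field`, `font`, `token`, `delimiter`) are invented for this sketch, since the slides do not show the project's actual template format. The parser assumes page 1 is already available as (text, font size) pairs in reading order.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple, Union

# Hypothetical template for one publication source: fields listed in page order.
TEMPLATE = """
<source name="Hypothetical Journal">
  <field name="issue_heading"/>
  <field name="title" font="largest"/>
  <field name="authors" delimiter=","/>
  <field name="abstract" token="Abstract"/>
  <field name="keywords" token="Keywords" delimiter=";"/>
</source>
"""

Line = Tuple[str, float]  # (text, font size), in reading order on page 1


def parse_with_template(template_xml: str, lines: List[Line]) -> Dict[str, Union[str, List[str]]]:
    fields = ET.fromstring(template_xml.strip()).findall("field")
    tokens = [f.get("token") for f in fields if f.get("token")]
    largest = max(size for _, size in lines)

    # Split the page into blocks: a new block starts when the font size
    # changes or a line begins with one of the template's tokens.
    blocks: List[Tuple[float, List[str]]] = []
    for text, size in lines:
        starts_token = any(text.startswith(token) for token in tokens)
        if blocks and blocks[-1][0] == size and not starts_token:
            blocks[-1][1].append(text)
        else:
            blocks.append((size, [text]))

    # The template's field order maps one-to-one onto the blocks.
    result: Dict[str, Union[str, List[str]]] = {}
    for field, (size, texts) in zip(fields, blocks):
        name = field.get("name")
        text = " ".join(texts)
        if field.get("font") == "largest" and size != largest:
            raise ValueError(f"field {name!r} is not in the largest font")
        token = field.get("token")
        if token and text.startswith(token):
            text = text[len(token):].lstrip(" :.-")  # drop the delimiter after the token
        delimiter = field.get("delimiter")
        result[name] = [part.strip() for part in text.split(delimiter)] if delimiter else text
    return result
```

One limitation of this sketch: a block continues until the font size changes or a template token appears, so body text set in the same size as the last field would be absorbed into it. A real template would likely need an explicit end marker per field.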

  17. PDF Parsing (cont.) • User-driven image-based parsing • Displays page 1 as an image • User draws rectangles around the metadata • Automatic parsing provides the initial guess • User can review and modify the results • Pros: combines automatic and user-driven methods • Cons: requires user input
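The rectangle-selection step can be sketched as follows, assuming page 1 has been rendered to an image and each word's bounding box is known in the same coordinates as the user's rectangle. `Rect`, `Fragment`, `text_in_rect`, and `review` are hypothetical names, and the centre-point test is just one simple way to decide whether a word falls inside a selection.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rect:
    x: float  # left edge
    y: float  # top edge (image coordinates grow downward)
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.width and self.y <= py <= self.y + self.height


@dataclass(frozen=True)
class Fragment:
    text: str
    box: Rect  # bounding box of one word on the rendered page image


def text_in_rect(fragments: List[Fragment], selection: Rect) -> str:
    # A word is selected when the centre of its box lies inside the user's rectangle.
    hits = [f for f in fragments
            if selection.contains(f.box.x + f.box.width / 2, f.box.y + f.box.height / 2)]
    # Reading order: top to bottom, then left to right.
    hits.sort(key=lambda f: (f.box.y, f.box.x))
    return " ".join(f.text for f in hits)


def review(initial_guess: Dict[str, str], fragments: List[Fragment],
           selections: Dict[str, Rect]) -> Dict[str, str]:
    # Start from the automatic parser's guess; any field the user boxed overrides it.
    result = dict(initial_guess)
    for field_name, rect in selections.items():
        result[field_name] = text_in_rect(fragments, rect)
    return result
```

Fields the user does not box keep their automatic guess, so the user only has to correct what the heuristics got wrong.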

  18. PDF Parsing

  19. Demonstration

  20. Discussion • Interesting uses of .NET Reflection: object registry • Difficulties of PDF parsing • Approaches to resolving these difficulties: publication source templates, user input, cut-and-paste
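The slides do not explain how the object registry used .NET Reflection. One common pattern is to discover every implementation of a base type at run time (in .NET, by enumerating `Assembly.GetTypes()` and instantiating matches with `Activator.CreateInstance`) and look them up by name. The Python sketch below is an analogue of that pattern using `__subclasses__()`; the parser classes are hypothetical, chosen only to echo the PDF parsing strategies.

```python
from typing import Dict, List


class MetadataParser:
    """Hypothetical base type; each PDF parsing strategy subclasses it."""
    name = "base"

    def parse(self, first_page_text: str) -> Dict[str, str]:
        raise NotImplementedError


class AutomaticParser(MetadataParser):
    name = "automatic"

    def parse(self, first_page_text: str) -> Dict[str, str]:
        # Placeholder heuristic: treat the first line as the title.
        lines = first_page_text.splitlines()
        return {"title": lines[0]} if lines else {}


class XmlTemplateParser(MetadataParser):
    name = "xml"

    def parse(self, first_page_text: str) -> Dict[str, str]:
        return {}  # template-driven parsing omitted from this sketch


def all_subclasses(base: type) -> List[type]:
    """Walk the class hierarchy recursively, like scanning an assembly for derived types."""
    found: List[type] = []
    for sub in base.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses(sub))
    return found


class ObjectRegistry:
    def __init__(self, base: type) -> None:
        # Register implementations discovered by introspection, not a hand-kept list.
        self._types: Dict[str, type] = {cls.name: cls for cls in all_subclasses(base)}

    def names(self) -> List[str]:
        return sorted(self._types)

    def create(self, name: str) -> MetadataParser:
        return self._types[name]()
```

With this design, adding a parsing strategy only requires defining a new subclass; the registry picks it up without any registration code.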

  21. Future Work • Integrated metadata parsing • Group/user/repository access roles • Author ranking • Advanced searching • Annotated references • Additional document types (e.g., MS Word) • More UI polish • Server selection • Review attachment improvements • Administration features

  22. Questions? • Research Expert • Paul Varcholik, Joshua Thompson • EEL 6883 – Software Engineering II • Spring 2009
