
VIVO Implementation Workshop

VIVO 2nd Annual Conference, Washington, D.C. Leslie McIntosh, Valrie Davis, Nicholas Rejack, Elly Cramer.



  1. VIVO Implementation Workshop VIVO 2nd Annual Conference Washington, D.C. • Leslie McIntosh • Valrie Davis • Nicholas Rejack • Elly Cramer

  2. Agenda • 8:30 – 9:00 Introductions & Implementation Strategy • 9:00 – 9:30 Implementation, system, and personnel requirements • 9:30 – 10:30 Data: identification, issues and managing data • 10:30 – 10:45 Break • 10:45 – 11:30 Ontology: data mapping details • 11:30 – 12:00 VIVO Customizations

  3. Introductions • Who are we? • What has been our role in VIVO?

  4. Introductions • Who are you? • Where are you from? • What is your interest in VIVO? • Where are you in the VIVO implementation process?

  5. Implementation Strategy

  6. Preparing for Implementation • Who will have information in VIVO? • Is opt-in/opt-out a concern? • What sources will be utilized? • What data will be stored? • How is emergency data removal accomplished?

  7. Preparing for Implementation • What group(s) should manage VIVO? • What group(s) provides support to users? • How often should data be refreshed? • Who will receive greater editing privileges? • Will we utilize a single sign-on?

  8. First Activities • Identify and contact local sources (HR, Sponsored Programs, Registrar) • Evaluate the Ontology • How do your local data sources map to VIVO? • What local extensions do you need? • Model your organization • Do you have an ingestible organization file, or do you need to manually curate?

  9. Data Sanitization Your data is only as good as your source • Analyze data issues (frequency, visibility, cause) • Prioritize sanitization work • Sanitize: • Script out data issues en masse (merge duplicates, etc.) • Curate by hand (misspellings, merge duplicates)
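The analyze, prioritize, and script steps on the Data Sanitization slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' tooling: the records, the field names (`id`, `name`, `dept`), and the rule of matching on a normalized name are all assumptions. In real data two different people can share a name, so a unique identifier is a safer merge key.

```python
from collections import Counter

# Hypothetical source records; the field names are illustrative only.
records = [
    {"id": "1", "name": "Smith, Jane", "dept": "Chemistry"},
    {"id": "2", "name": "smith,  jane ", "dept": "Chemistry"},
    {"id": "3", "name": "Doe, John", "dept": ""},
    {"id": "4", "name": "Doe, John", "dept": "Physics"},
    {"id": "5", "name": "Lee, Ann", "dept": "Chemsitry"},
]


def normalize(name):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def count_issues(rows):
    """Count how often each kind of issue occurs, to help prioritize work."""
    issues = Counter()
    seen = set()
    for row in rows:
        key = normalize(row["name"])
        if key in seen:
            issues["duplicate name"] += 1
        seen.add(key)
        if not row["dept"]:
            issues["missing dept"] += 1
    return issues


def merge_duplicates(rows):
    """Keep one record per normalized name, filling blanks from later copies."""
    merged = {}
    for row in rows:
        key = normalize(row["name"])
        if key not in merged:
            merged[key] = dict(row)
        else:
            for field, value in row.items():
                if not merged[key][field]:
                    merged[key][field] = value
    return list(merged.values())


issues = count_issues(records)
cleaned = merge_duplicates(records)
print(issues.most_common())  # most frequent issue first
print(len(cleaned))
```

The misspelled department ("Chemsitry") is deliberately left alone: that kind of one-off error is a candidate for curation by hand rather than for a script.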

  10. Get Data Out

  11. Ongoing Maintenance • Upgrade to new versions • Evaluate system architecture as the number of triples grows • Adhere to a consistent refresh schedule • Identify potential data sources or controlled vocabularies • Adjust ontology and data ingest, as needed • Sanitize data • Assist with repurposing efforts • Manage expectations

  12. Managing Expectations

  13. What is your plan? • Divide and conquer • By data sources? • By division/department?

  14. VIVO Implementation, System, and Personnel Requirements

  15. The Team

  16. Team Composition • Titles: Project Manager/Team Lead; System Administrator; Data Scientist/Curator; Training/Outreach Liaison • Roles: • Facilitate coordination, communication, and completion of activities • Install, update, and monitor VIVO software • Design, develop, implement, and manage data

  17. Team Composition People Person Data Guru Computer Whiz

  18. Team Dynamics People Person Data Guru Computer Whiz

  19. The Location

  20. What entity should operate VIVO? Where at your institution is there… • Access to institutional data? • Authority over institutional data? • IT capabilities (e.g. personnel, servers, IT security)? • Time?

  21. Potential Places for VIVO • Informatics Division/Department • Library • Information Technology • Provost Office • College Dean Office • Research Office

  22. Resources

  23. Resource Type • People: Who will work on the project? What is the optimal team size? • Computing: Do you have a system for software install, maintenance, and security? • Data: Where are the data sources? • Time: What time and effort will be devoted to the project?

  24. Time • Estimate the time you think it will take for each task • Double it • Phases/Releases …and manage expectations.

  25. Questions & Activities • What does this look like at your institution/agency? • What resources do you have? • What is the structure of your institution? • Where will you house VIVO?

  26. Data: identification, issues, and managing data Implementation Workshop VIVO 2nd Annual Conference Washington, D.C. Valrie Davis, UF; Chris Westling, Cornell; Eliza Chan, Weill Cornell; Ryan Cobine, IU

  27. To be covered • Data Stewardship and VIVO • Data Lifecycle • Data Sources & Data Providers • Data Quality and frequent issues • Your VIVO toolbox

  28. Data Stewardship • Design, develop, implement, and manage data • Ensure data quality, integrity, breadth, timeliness and privacy • Manage and monitor/audit the data quality • Part of a team • Perform data quality checks • Support the goal of data re-use

  29. This isn’t just ANY data!

  30. Your data lifecycle • Identify the available data • Prioritize (cost, benefit, difficulty) • Approach • Map to the ontology, create extensions if needed • Write your XSLT and perform your data “run” • Examine (SPARQL, explore, etc.) • Adjust • Identify blockers and post-run sanitization requirements

  31. Approaching Data Providers

  32. Approaching data providers • Who should be approached? • By whom? • At what point? • What concerns will they have? • What do you need to know? • What are their delivery options? • Include negotiation time in your project plan

  33. What we might hear • “This data may be public but we don’t think it should be exposed” • “I’ll have a talk with the Privacy Officer” • “Will the unique identifier be made public?” • “When will you refresh the data?” • “Are you providing an opt-in/opt-out option?” • “Why should we share our data?”

  34. Preparation • Right Team | Right People • Understand the value of VIVO • Consult with your privacy or security officer • Acquaint yourself with their data • Acquaint yourself with the ontology • Consider “emergency data removal”

  35. Post Meeting • Communicate: • Refresh cycle changes • Source sanitization needs • New features (Shibboleth) • New visualizations

  36. Data Quality

  37. Questions to Improve Data Quality • Does your data need pre-sanitization? • Post-ingest sanitization? • In what order should you load data? (e.g., journal titles before publications?) • What data will be ingested? What will be entered manually? • How will you classify your data? • Source as source (not editable in VIVO) • VIVO as source (editable) • Courtesy ingest (editable)
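The load-order question (journal titles before publications) amounts to a dependency ordering, which can be sketched with Python's standard `graphlib` module. The data set names and their dependencies below are hypothetical; each institution would fill in its own.

```python
from graphlib import TopologicalSorter

# Hypothetical data sets, each mapped to the data sets that must load first.
depends_on = {
    "organizations": set(),
    "people": {"organizations"},
    "journals": set(),
    "publications": {"people", "journals"},
    "grants": {"people", "organizations"},
}

# static_order() yields every data set after all of its prerequisites.
load_order = list(TopologicalSorter(depends_on).static_order())
print(load_order)
```

Several valid orders usually exist; the only guarantee is that each data set appears after everything it depends on. A circular dependency raises `graphlib.CycleError`, which is a useful early warning that the load plan needs rethinking.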

  38. Frequent Data Quality Issues • Not up-to-date • Redundant data (e.g., publications) • Misspellings or variations of spelling (NIH, Nat I. Health, Nat Inst. Health) • Accumulation of old data • Incomplete data
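Spelling variations like the NIH examples on this slide are often handled with a hand-curated alias table. The sketch below assumes such a table exists (the `CANONICAL` entries are illustrative, not an official list) and separates names it can resolve from names that still need a human to look at them.

```python
import re

# Hypothetical alias table, curated by hand; keys are normalized spellings.
CANONICAL = {
    "nih": "National Institutes of Health",
    "nat i health": "National Institutes of Health",
    "nat inst health": "National Institutes of Health",
    "national institutes of health": "National Institutes of Health",
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def canonicalize(names):
    """Return (resolved, unresolved): mapped names and names needing review."""
    resolved, unresolved = {}, []
    for name in names:
        key = normalize(name)
        if key in CANONICAL:
            resolved[name] = CANONICAL[key]
        else:
            unresolved.append(name)
    return resolved, unresolved


resolved, unresolved = canonicalize(
    ["NIH", "Nat I. Health", "Nat Inst. Health", "Natl Inst Helth"]
)
print(resolved)
print(unresolved)  # left for hand curation
```

Exact lookup after normalization is conservative by design: an unknown variant is flagged rather than guessed at, which keeps a script from silently merging two different organizations.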

  39. Data Anecdotes • Users listed cell phone numbers as work numbers • Trucks listed as organizations • Employee start dates listed as the date of a departmental reorganization • Employees associated with a dept code that is NOT their department • “CWID” and “Lump Sum” as names of people

  40. Questions

  41. What are your greatest data concerns or challenges at this point?

  42. Who monitors the accuracy of the data? Who will provide good “customer support” when needed?

  43. Your VIVO Toolbox • Harvester • VIVO Spreadsheet Tool • SPARQL

  44. Harvester • Schedule a harvest to run automatically • Requires some manual intervention (reindexing) • Each run is set up slightly differently

  45. “a power tool for working with messy data” • Manipulate datasets

  46. SPARQL • Write queries to upload, extract, count, identify • Pre- and post-run queries • Help identify data issues you can’t spot through exploration (why is that query saying we have double the people who exist?) • Similar to SQL queries
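As a sketch of the "double the people" check, the Python below pairs a SPARQL SELECT with a small post-processing step. `foaf:Person` and `rdfs:label` are standard vocabulary terms, but how a given VIVO instance labels its people is an assumption here, and the result rows and URIs are made up for illustration. The query is only held as a string; running it against an endpoint is left out.

```python
from collections import defaultdict

# A SPARQL SELECT listing every person with a label (assumed data shape).
PEOPLE_QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?label
WHERE {
  ?person a foaf:Person ;
          rdfs:label ?label .
}
"""


def find_shared_labels(rows):
    """Group (uri, label) rows and return labels used by more than one URI."""
    uris_by_label = defaultdict(set)
    for uri, label in rows:
        key = " ".join(label.lower().split())
        uris_by_label[key].add(uri)
    return {
        label: sorted(uris)
        for label, uris in uris_by_label.items()
        if len(uris) > 1
    }


# Hypothetical rows, shaped like the results PEOPLE_QUERY would return.
rows = [
    ("http://vivo.example.edu/individual/n1", "Smith, Jane"),
    ("http://vivo.example.edu/individual/n2", "Smith, Jane"),
    ("http://vivo.example.edu/individual/n3", "Doe, John"),
]

suspects = find_shared_labels(rows)
print(suspects)
```

A label shared by two URIs is only a suspect, not proof of a duplicate; it narrows down which records deserve a closer look before and after each data run.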

  47. Spreadsheet Tool: new feature • A tool for uploading data into VIVO • Works with People, Grants, and Publications • Easy enough for anyone to use

  48. Credits • http://dragonartz.wordpress.com/2009/04/10/light-rays-with-sparkles-background-vector/ • http://www.rotruckinsurance.com/category/truck-insurance/ • http://www.robinupton.com/software/ • http://wonwill.en.made-in-china.com/offer/PokQsIuxaTrZ/Sell-Combine-Harvester-Grain-Harvester-2080A-.html

  49. Ontology: Data Mapping Details Implementation Workshop VIVO 2nd Annual Conference Washington, D.C. Nicholas Rejack, UF
