Reinventing Science Librarianship. Education for New Roles Catherine Blake cablake@email.unc.edu http://www.ils.unc.edu/~cablake University of North Carolina @ Chapel Hill. Source: The DCC Curation Lifecycle Model. Creation. Jupiter has moons Galileo, Sidereus Nuncius, 1610
Reinventing Science Librarianship Education for New Roles Catherine Blake cablake@email.unc.edu http://www.ils.unc.edu/~cablake University of North Carolina @ Chapel Hill
Creation • Jupiter has moons • Galileo, Sidereus Nuncius, 1610 • Relative sizes of the Earth, Sun and Moon • Aristarchus, 3rd century BC • This image: 10th century AD Source: Wikipedia
Creation • Little Dipper microarray processors • Biology/pharmacology • The first beam in the Large Hadron Collider at CERN was successfully steered around the full 27 kilometers of the world’s most powerful particle accelerator Source: http://www.scigene.com/products/little_dipper.html http://mediaarchive.cern.ch/MediaArchive/Photo/Public/2008/0809002/0809002_01/0809002_01-A5-at-72-dpi.jpg
Acquisition & Collection • Data acquired directly from scientists • Heterogeneous formats • multi-media • annotations on a spreadsheet • Varying quality • experimental settings • Student vs verified data
Identification & Cataloging • Collectively identifying resources • Group think • Social bookmarking • Participatory cataloging • e.g. UNC photographs
Storage & Preservation • Storage • 92% on magnetic media • About 5 exabytes of new information produced on print, film, magnetic, and optical storage media in 2002 • Preservation • Heterogeneous • Changing hardware • Changing software Image Source: http://www.cray.com/products/index.html http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/
Barriers to access removed • Environment • New source of information providers (Scientists, Granting agencies) • NIH Mandated access • Consequences • No single point of access • Different levels of access required • HIPAA compliance • Maintaining cultural norms
Use and Reuse • Data and Text Mining • Use data collected for a different purpose • e.g. a side-effect of one drug becomes the purpose of another • Information Synthesis • Combine speculative information • Literature Based Discovery • Uncover transitive connections from text
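The "transitive connections" idea can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: it assumes term pairs that co-occur in the literature have already been extracted, and the function name `transitive_candidates` and the toy pairs are hypothetical.

```python
from collections import defaultdict


def transitive_candidates(pairs, start):
    """Return terms linked to `start` only through an intermediate term.

    `pairs` is an iterable of (term_a, term_b) co-occurrences (assumed
    to have been extracted from text beforehand). The result maps each
    indirectly connected term to the sorted list of bridging terms.
    """
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    direct = neighbours[start]
    candidates = defaultdict(set)
    for bridge in direct:
        for target in neighbours[bridge]:
            # Keep only terms that never co-occur with `start` directly.
            if target != start and target not in direct:
                candidates[target].add(bridge)
    return {target: sorted(bridges) for target, bridges in candidates.items()}


# Toy co-occurrence data, for illustration only.
toy_pairs = [
    ("fish oil", "blood viscosity"),
    ("blood viscosity", "Raynaud's syndrome"),
    ("fish oil", "platelet aggregation"),
    ("platelet aggregation", "Raynaud's syndrome"),
]
print(transitive_candidates(toy_pairs, "fish oil"))
```

Each candidate is only a hypothesis: the bridging terms show where a reader would look in the literature to judge whether the indirect link is meaningful.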
Data Oriented Roles • Data Consultant • Share best practice regarding how to organize & share data • Data Distributor • Scientists control the data, distributor makes the data available to others • Data Manager • Manager organizes and keeps the data
New Roles • Data Service Provider • Data conversion and pre-processing • Data and Text Analyst • Scientist provides the data, analyst applies visualization, data and text mining tools • Embedded Roles (Data Scientist) • Information workflow
Data Oriented Roles • Information organization • Conceptual Modeling • Create and understand • ER diagrams • UML diagrams • Concept maps
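Reference Model for an Open Archival Information System • [Diagram: an Information Object is a Data Object interpreted using 1+ Representation Information; Representation Information is itself interpreted using Representation Information; a Data Object is either a Physical Object or a Digital Object; a Digital Object consists of 1+ Bit Sequences] Source: nost.gsfc.nasa.gov/isoas/presentations/oais_tutorial_200005.ppt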
Data Oriented Roles • Conceptual relational models • Good database design • Normalization • Methods to enforce • data quality • referential integrity • Ongoing maintenance
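One way to make "methods to enforce data quality and referential integrity" concrete is a small Python sketch using the standard-library `sqlite3` module. The `experiment` and `measurement` tables, their columns, and the helper names are hypothetical; the point is only that a foreign key and a CHECK constraint let the database reject bad rows on its own.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two normalized tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE experiment ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE measurement ("
        " id INTEGER PRIMARY KEY,"
        " experiment_id INTEGER NOT NULL REFERENCES experiment(id),"
        " value REAL NOT NULL CHECK (value >= 0))"
    )
    return conn


def try_insert(conn, sql, params):
    """Run one INSERT; return False if a constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_demo_db()
try_insert(conn, "INSERT INTO experiment (id, title) VALUES (?, ?)", (1, "Pilot run"))
# Accepted: the referenced experiment exists and the value is non-negative.
print(try_insert(conn, "INSERT INTO measurement (experiment_id, value) VALUES (?, ?)", (1, 2.5)))
# Rejected: experiment 99 does not exist (referential integrity).
print(try_insert(conn, "INSERT INTO measurement (experiment_id, value) VALUES (?, ?)", (99, 2.5)))
# Rejected: negative value violates the CHECK constraint (data quality).
print(try_insert(conn, "INSERT INTO measurement (experiment_id, value) VALUES (?, ?)", (1, -1.0)))
```

Keeping experiments and measurements in separate tables is the normalization step; the constraints then guard the link between them during ongoing maintenance.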
New Roles • Text Mining: A case study • All text is not created equal • Things that get in the way • Page breaks • Figures • Tables • Special characters • Implications for preservation
Machine readable form ></TABLE ><P >Scientists engage in the discovery process more than any other user population, yet their day-to-day activities are often elusive. … The development of accurate models often requires that a scientist resolve conflicting evidence.</P ><P >One activity that consumes much of a scientists' time is <I >synthesis</I >, <IMG SRC="/giflibrary/12/ldquo.gif" BORDER="0">the dialectic combination of thesis and antithesis into a higher stage of truth<IMG SRC="/giflibrary/12/rdquo.gif" BORDER="0"> (<I >Merriam-Webster's Collegiate Dictionary</I >, [<A HREF="#BIB24" >2004</A >]). This dictionary definition reflects the alternative viewpoints that often occur when multiple empirical studies explore the same phenomena. The synthesis activity results in an overall finding - a higher stage of truth - which scientists achieve by …
First phase pre-processing ></TABLE> <P>Scientists engage in the discovery process more than any other user population, yet their day-to-day activities are often elusive. … The development of accurate models often requires that a scientist resolve conflicting evidence.</P> <P>One activity that consumes much of a scientists' time is <I>synthesis</I>, <IMG SRC="/giflibrary/12/ldquo.gif" BORDER="0">the dialectic combination of thesis and antithesis into a higher stage of truth<IMG SRC="/giflibrary/12/rdquo.gif" BORDER="0"> (<I>Merriam-Webster's Collegiate Dictionary</I>, [<A HREF="#BIB24">2004</A>]). This dictionary definition reflects the alternative viewpoints that often occur when multiple empirical studies explore the same phenomena. The synthesis activity results in an overall finding - a higher stage of truth - which scientists achieve by … • OLD: <IMG SRC="/giflibrary/12/ldquo.gif" BORDER="0"> NEW: “ • OLD: <IMG SRC="/giflibrary/12/rdquo.gif" BORDER="0"> NEW: ” • OLD: (<I>Merriam-Webster's Collegiate Dictionary</I>, [<A HREF="#BIB24">2004</A>]) NEW: _BIB_24
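The OLD/NEW substitutions on this slide could be sketched in Python roughly as follows. This is an assumption-laden illustration rather than the pipeline actually used: the regular expressions and the function name `first_phase` are hypothetical, and they cover only the quote images and the parenthesized bibliography link seen in this excerpt.

```python
import re

# Quote images from the excerpt, mapped to the characters they stand for.
QUOTE_IMAGES = {"ldquo": "\u201c", "rdquo": "\u201d"}

# Matches <IMG SRC="/giflibrary/12/ldquo.gif" BORDER="0"> and the rdquo variant.
IMG_QUOTE = re.compile(
    r'<IMG\s+SRC="/giflibrary/12/(ldquo|rdquo)\.gif"\s+BORDER="0"\s*>',
    re.IGNORECASE,
)

# Matches a parenthesized citation ending in a link to a #BIBnn anchor.
CITATION = re.compile(
    r'\([^()]*?\[<A\s+HREF="#BIB(\d+)"\s*>[^<]*</A\s*>\]\)',
    re.IGNORECASE,
)


def first_phase(html):
    """Replace quote images with real quotes and citations with _BIB_nn tokens."""
    text = IMG_QUOTE.sub(lambda m: QUOTE_IMAGES[m.group(1).lower()], html)
    text = CITATION.sub(lambda m: "_BIB_" + m.group(1), text)
    return text


sample = (
    '<IMG SRC="/giflibrary/12/ldquo.gif" BORDER="0">a higher stage of truth'
    '<IMG SRC="/giflibrary/12/rdquo.gif" BORDER="0"> '
    "(<I>Merriam-Webster's Collegiate Dictionary</I>, "
    '[<A HREF="#BIB24">2004</A>]).'
)
print(first_phase(sample))  # “a higher stage of truth” _BIB_24.
```

The patterns tolerate the stray spaces before `>` that appear in the machine readable form, which is the kind of custom handling each publisher's markup tends to need.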
Second phase pre-processing • Add Identifiers • Break paragraphs into sentences • Add document, section, paragraph, sentence IDs • Replacements • symbols, references • Output: • Identifiers|One activity that consumes much of a scientists' time is synthesis “the dialectic combination of thesis and antithesis into a higher stage of truth” _BIB_24. • Identifiers|This dictionary definition reflects the alternative viewpoints that often occur when multiple empirical studies explore the same phenomena.
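A minimal Python sketch of this second phase, under stated assumptions: the slide does not give the identifier format, so the dotted `document.section.paragraph.sentence` scheme, the function name `second_phase`, and the naive sentence splitter are all hypothetical.

```python
import re

# Naive boundary: sentence-ending punctuation, whitespace, then a capital letter.
# Real text (abbreviations, initials) needs a more careful splitter.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def second_phase(doc_id, section_id, paragraphs):
    """Return one 'identifiers|sentence' line per sentence."""
    lines = []
    for p_num, paragraph in enumerate(paragraphs, start=1):
        sentences = SENTENCE_BOUNDARY.split(paragraph.strip())
        for s_num, sentence in enumerate(sentences, start=1):
            if sentence:
                ids = f"{doc_id}.{section_id}.{p_num}.{s_num}"
                lines.append(f"{ids}|{sentence}")
    return lines


paragraphs = [
    "One activity is synthesis \u201cthe dialectic combination\u201d _BIB_24. "
    "This dictionary definition reflects the alternative viewpoints."
]
for line in second_phase("doc1", "sec2", paragraphs):
    print(line)
```

Because references were already collapsed to tokens such as `_BIB_24` in the first phase, the period after the token is a safe place to split.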
Text Analytics • Clustering • Categorization • Association Rules • IBM Intelligent Miner for Text (Clustering) • SAS Text Miner (Association Rules)
Visualization NCI-funded research 1995-2001
Embedded Roles • Workflow • Deep understanding • Data formats • Access norms • Reward structures • Custom pre-processing
Closing Remarks • Not everyone will have every skill • Existing skills that will remain critical • Strong ties to faculty • Strong negotiating skills • Knowledge of standards and resources • The roles exist; it's not clear where they will live within an institution • The ability to think like someone within a discipline