Computational Resources for Teaching Bioinformatics. Jodi Schwarz, Department of Biology, Vassar College; Marc Smith, Department of Computer Science, Vassar College; Cristian Opazo, Academic Computing Services, CIS, Vassar College. Teaching Big Science at Small Colleges: a Genomics Collaboration Workshop, 2007
Biology as an information science • Biology is increasingly high-throughput and interdisciplinary • Why? The concurrent development of • technologies for high-throughput studies • computational approaches/power • concepts of systems and informatics • Result: The structure of biological research is changing • Collaboration is essential • Computation is essential • How do we, as biology faculty, train ourselves and our students? • work with people trained in different fields • develop a quantitative and high-throughput perspective • remain focused on the biology
Genomics/Bioinformatics in Vassar Biology. Biol 106: Introductory Biology: Students are given two mutant strains of C. elegans and are told that each contains a mutation in a different gene (unc-54 or unc-119) • Microscopy to characterize phenotype in mutant and wild type (wt) • Fluorescence microscopy to localize gene expression in wt • NCBI Mapviewer to find gene sequence and literature • BLAST to identify potential homologs • SequenceExtractor tool to predict length of wt PCR product • PCR amplification of the genes from mutant and wild type • Interpretation of data: • identify which gene contains the mutation in each strain • discuss how the mutation might confer the observed phenotype. GFP and gel images: Kate Susman
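The PCR product-length step can be illustrated with a minimal sketch. This is not how the SequenceExtractor tool works internally; it only shows the underlying idea, assuming exact primer matches on a single template strand. The function names and the toy sequences are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def predicted_product_length(template: str, forward_primer: str, reverse_primer: str):
    """Predict the PCR product length, or return None if a primer does not match.

    Assumes both primers match the template exactly. The reverse primer anneals
    to the opposite strand, so its reverse complement is searched for on the
    given strand, downstream of the forward primer.
    """
    template = template.upper()
    fwd = forward_primer.upper()
    rev_site = reverse_complement(reverse_primer)

    start = template.find(fwd)
    if start == -1:
        return None
    site = template.find(rev_site, start + len(fwd))
    if site == -1:
        return None
    # Product runs from the first base of the forward primer
    # to the last base of the reverse primer binding site.
    return site + len(rev_site) - start


# Toy example (made-up sequences, not the unc-54 or unc-119 genes)
toy_template = "GGATCCATGAAACCCGGGTTTTAGCTAGC"
print(predicted_product_length(toy_template, "ATGAAA", "CTAAAA"))  # 18
```

Comparing this predicted wild-type length with the band size on the gel is what lets students spot a strain whose product is larger or smaller than expected.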
Development of upper-division courses • HHMI-supported bioinformatics faculty position • Hired a biologist who • uses bioinformatic tools • has worked with computational biologists • is not a programmer • My goals for students • Learn, explore, and be excited by biology • Get beyond the point-and-click mentality of bioinformatic tools • Understand computer science approaches: what is an algorithm? what are scoring matrices? • Assess the quality of the output • Find and evaluate bioinformatic tools • Inquiry-based learning: conduct original research
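To make the "what are scoring matrices?" question concrete, here is a minimal sketch. The matrix values are invented for illustration (they are not BLOSUM, PAM, or any published matrix): identical bases score +2, transitions (purine to purine, or pyrimidine to pyrimidine) score -1, and transversions score -2. All names are hypothetical.

```python
PURINES = {"A", "G"}


def toy_score(a: str, b: str) -> int:
    """Score one aligned pair of bases under the made-up scheme described above."""
    if a == b:
        return 2
    same_class = (a in PURINES) == (b in PURINES)
    return -1 if same_class else -2


# A scoring matrix is just a lookup table from a pair of residues to a score.
TOY_MATRIX = {(a, b): toy_score(a, b) for a in "ACGT" for b in "ACGT"}


def score_ungapped(seq1: str, seq2: str, matrix=TOY_MATRIX) -> int:
    """Sum the matrix scores over each column of an ungapped alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment requires sequences of equal length")
    return sum(matrix[(a, b)] for a, b in zip(seq1.upper(), seq2.upper()))


print(score_ungapped("ACGT", "ACGT"))  # 8: four identities
print(score_ungapped("ACGT", "GCGT"))  # 5: one transition (A/G)
print(score_ungapped("ACGT", "CCGT"))  # 4: one transversion (A/C)
```

The point for students is that the ranking of hits depends on these numbers: change the matrix and the "best" alignment can change.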
Example 1: 300-level Molecular Biology. Goal: learn molecular biology using an inquiry-based genomics approach. In-class time: two hours per week (26 hours total). System: Aiptasia pallida, a developing model system for studying coral symbiosis • Animal host and microbial symbiont • Genomic resources: hot-off-the-press ESTs, assembled into about 1000 contigs. Specific goals: Learn and explore biology • Become familiar with the significance of the symbiosis • Study the biology of symbiosis from a molecular and genomics perspective. Cultivate molecular biology skills • Bench skills • Interpretation and assessment of sequencing reads (trimming vector sequence, etc.) • Identification of functional regions of mRNAs from EST sequence. Cultivate bioinformatic skills • Genomic level: large-scale annotation of the EST dataset • Annotation of genes: symbiosis or metazoan evolution. Concurrent project at UC-Merced JGI Genomics course
Molecular Biology research project: 1. Microscopy 2. QC of the cDNA library • plate, pick clones, plasmid isolation, restriction digest • determine average insert size • determine the redundancy of the library 3. Large-scale bioinformatic analysis • blastall against Swiss-Prot • fraction with BLAST hits • most highly expressed genes • KEGG annotations • higher-order biological processes 4. Characterize a target gene • which gene to choose and why? • which analyses to perform? • Pfam to identify conserved domains • conservation: ClustalW alignments • evolutionary: phylogenetic trees • structure prediction • how to interpret the results? • how to prepare figures for manuscript?
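The "fraction with BLAST hits" step can be sketched as follows. This assumes the search was saved in the 12-column tab-separated tabular format (blastall -m 8, or -outfmt 6 in BLAST+), where column 1 is the query ID and column 11 is the e-value. The e-value cutoff, function names, and example records are all hypothetical choices for illustration.

```python
def queries_with_hits(blast_lines, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit at or below max_evalue."""
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            hits.add(query_id)
    return hits


def fraction_with_hits(all_query_ids, blast_lines, max_evalue=1e-5):
    """Fraction of all queries (e.g. contigs) that have a significant hit."""
    ids = set(all_query_ids)
    if not ids:
        return 0.0
    return len(queries_with_hits(blast_lines, max_evalue) & ids) / len(ids)


# Made-up records: query, subject, %identity, length, mismatches, gap opens,
# qstart, qend, sstart, send, evalue, bitscore
example = [
    "contig1\tsp|X1\t85.0\t120\t18\t0\t1\t120\t5\t124\t2e-30\t130",
    "contig1\tsp|X2\t60.0\t100\t40\t0\t1\t100\t9\t108\t1e-08\t55",
    "contig2\tsp|X3\t30.0\t40\t28\t0\t3\t42\t7\t46\t0.5\t25",
]
print(fraction_with_hits(["contig1", "contig2", "contig3", "contig4"], example))  # 0.25
```

In a real run the lines would come from the BLAST output file and the IDs from the contig FASTA file; contigs with no hits at all never appear in the tabular output, which is why the full ID list is passed in separately.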
Problems with lab component: • not enough time for both wet lab and bioinformatics components • students shortchanged on the depth of knowledge • too many diverse platforms • JGI computers • via remote server (web access) • PC-only and Mac-only applications • no central location for storage of databases and results. Successes: • students learned molecular biology from multiple perspectives • students learned how to apply pre-existing bioinformatic tools to study biological questions • students could identify an interesting question and pursue it • some focused on symbiosis • some focused on evolution • some focused on structure • students grappled with research • set the stage for the next group of students to do functional studies. Next step: bring into the 200-level for more lab time to do functional work
Example 2: Bioinformatics No wet lab component Greater level of student-driven questions 1. Explore genome-level biological systems and questions • structural genomics (genome sequencing, annotation, architecture) • evolutionary genomics (how do genomes evolve?) • environmental genomics (metagenomics – what lives out there?) • biomedical genomics (use of microarrays and SNP analysis in studying disease) 2. Learn approaches and tools in more detail through series of workshops and small assignments 3. Given the questions/approaches, design and conduct an original research project • individual meetings to design projects
Bioinformatics research project. Stages (with student challenges): • Develop a question (challenge: know the biology); examples: evolution of drug resistance in malaria; early animal evolution; uncovering “novel” homologs of a particular gene in diverse organisms • Identify a source of sequence data (challenge: what sequences are appropriate?); examples: ApiDB protein sequences; assembled EST reads from a genome center; nr, WormBase, FlyBase, and other organism-specific databases • Tools for analysis (challenges: how to manipulate sequence data? how to find and understand tools? what does this output mean? what other analyses should I do?) • Write/present manuscript (challenges: what do my results suggest? how do I present the results?)
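As one small answer to "how to manipulate sequence data?", the sketch below reads FASTA-formatted text into a dictionary so that sequences can be counted, measured, or filtered. It is a minimal illustration, not a replacement for an established library; the function name and toy records are hypothetical, and it assumes each record ID (the first word after ">") is unique.

```python
def parse_fasta(lines):
    """Parse FASTA lines into a dict mapping record ID to sequence.

    The record ID is the first whitespace-delimited word after '>'.
    Sequence lines are upper-cased and joined; blank lines are ignored.
    """
    records = {}
    current_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].split()[0]
            chunks = []
        elif current_id is not None:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


# Toy input; with a real file, pass the open file object instead of a list
toy = [">seq1 toy record", "ACGT", "acgt", "", ">seq2", "GG"]
records = parse_fasta(toy)
print({name: len(seq) for name, seq in records.items()})  # {'seq1': 8, 'seq2': 2}
```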
Assessment and Direction What were the limitations? • Logistical: No single location/computer system for: • diversity of tools and operating systems • storage of datasets/results • flexibility: point and click to command line • Expertise: my CS limitations as a biologist Where do we need to go next? • Computational platform for simple to sophisticated • Instructional collaboration between biologists and computer scientists Spring 2008: BIOL/CMPU 353: Bioinformatics
Bioinformatics Course: Co-taught by Bio/CS Depts. • Bio majors register under BIOL prefix • CS majors register under CMPU prefix • Different pre-reqs for Bio/CS students • Students work in pairs: • one from each major; they must work together • learn to speak each other’s language • First half of course builds fundamentals • Second half of course: students engage in a bioinformatics research project
First half of course • Introduce CS topics to Biology students • Introduce Bio topics to CS students • Computational labs (teams of two) • CS students become more familiar with biology problem domains (e.g. sequence alignment, …) • Bio students become familiar with information modeling, algorithm design • CS students explain problem-solving process; Bio students explain biological processes
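Sequence alignment is a natural shared lab problem for a Bio/CS pair. The sketch below computes the optimal global alignment score with the standard Needleman-Wunsch dynamic programming recurrence. The scoring values (match +1, mismatch -1, linear gap penalty -2) are arbitrary assumptions for illustration, and the function name is hypothetical.

```python
def global_alignment_score(seq1: str, seq2: str, match=1, mismatch=-1, gap=-2) -> int:
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty).

    Only the previous row of the dynamic programming table is kept,
    since each cell depends on its left, upper, and upper-left neighbors.
    """
    # Row 0: aligning the empty prefix of seq1 against prefixes of seq2
    prev = [j * gap for j in range(len(seq2) + 1)]
    for i in range(1, len(seq1) + 1):
        curr = [i * gap]  # column 0: prefix of seq1 against the empty prefix
        for j in range(1, len(seq2) + 1):
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + pair,  # align the two residues
                prev[j] + gap,       # gap in seq2
                curr[j - 1] + gap,   # gap in seq1
            ))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("ACGT", "AGT"))         # 1: three matches, one gap
```

A useful pair exercise: the biology student predicts which alignment should win and why, and the CS student traces the table to check whether the chosen scores agree.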
Biology Majors • Computational fundamentals • Data abstraction • Control structures • Algorithm design and problem-solving • Goals • Participate in algorithm design • Read/understand code • Use/compose existing computational tools • not to turn biologists into programmers
Computer Science Majors • Biological fundamentals • evolution • molecular biology: structures and processes • informatics: e.g., sequence alignment • Goals • understand statement of biological problems (e.g., “predict the open reading frame of this sequence”) • translate biological structures into data structures • work with biologists to design algorithms • not to turn computer scientists into biologists
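The example problem statement, “predict the open reading frame of this sequence”, can be turned into a first-pass algorithm as sketched below. It uses a deliberately simple definition that is an assumption of this sketch: an ORF starts at the first ATG in a reading frame and ends at the next in-frame stop codon (standard genetic code), and the longest one across all six frames is reported. Real gene prediction is more involved; names here are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def orfs_in_strand(seq: str):
    """Return all ATG-to-stop ORFs (stop codon included) in the three forward frames."""
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                found.append(seq[start:i + 3])
                start = None
    return found


def longest_orf(seq: str) -> str:
    """Longest ORF over both strands, or an empty string if none is found."""
    candidates = orfs_in_strand(seq) + orfs_in_strand(reverse_complement(seq))
    return max(candidates, key=len, default="")


print(longest_orf("CCATGAAATTTGGGTAACC"))  # ATGAAATTTGGGTAA
```

This is also a small case of translating a biological structure into a data structure: the sequence is a string, a reading frame is an offset, and a codon is a three-character slice.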
Both Bio/CS majors • Experimental computer science • Research subject to the scientific method • Designing, implementing, and conducting computational experiments (nondeterminism) • Devising heuristics for computationally infeasible problems • Bioinformatics is interdisciplinary research • Biologists are not computer scientists • Computer scientists are not biologists • Both need each other
Second half of course • work in teams of two • select a research problem • literature search on related work • questions posed / open • techniques applied to finding answers • devise computational experiments • obtain datasets, implement algorithms, … • employ scientific method • present results
bioinf.cs.vassar.edu • Resource for different levels of students • via browser (BioTeam’s iNquiry tool suite) • via secure remote login (ssh / command line tools) • Turnkey Linux-based computing cluster by Rocketcalc, LLC • Delivered May, set up over the summer • Specs: 1 chassis, 4 nodes, 16 CPUs, 1GB switch, 16GB RAM, 3TB disk capacity • Cost (HW+SW): ~$35K • Support infrastructure is essential!
Supporting a cross-constituency academic endeavor • Cross-discipline, inter-departmental endeavors require a higher, wider level of technical support and organization • Needs range from the strictly technical (acquisition, deployment, testing and maintenance of computing hardware and software) to the more academic (curricular development, training of faculty and students, assessment) • Consequence: a dedicated, knowledgeable support team of full-time professionals has to be considered from the early stages of any bioinformatics project in higher education
An ideal model (diagram components) • Project management • Funding & sponsorship • Research and teaching faculty • Systems administration • Inter-institutional collaboration • Faculty and student training
Current support structure at Vassar • Existence of an Academic Computing unit (ACS) within the college’s main IT division (Computing and Information Services, CIS), whose main goal is to provide support and expertise on faculty projects (curricular and research) that include an important technology component • The ACS sciences consultant provides computing expertise, conducting workshops and training sessions on the specific software tools, and acts as liaison between the various departments (academic and administrative) involved in the overall effort • Back-end hardware support is provided (in this preliminary phase) by a systems administrator in the Computer Science department. In the future, greater institutional involvement is expected (at the CIS level)
A main goal: establishing collaborations “Contemporary life sciences is increasingly an interdisciplinary effort, as evidenced by the emergence of academic research and educational centers in which faculty teams from across the natural and physical sciences are brought together to create synergistic investigative and scholarly groups (…)” “(…) A critical consideration in the design and management of these centers, from their architecture to their participating faculty and staff, is identification of means to foster collaboration and information exchange” Source: "Bioinformatics: New Technology Models for Research, Education, and Service," Gary Allen, Executive Director, University of Missouri Bioinformatics Consortium. The EDUCAUSE Center for Applied Research (ECAR) Research Bulletin, Vol. 2004, Issue 8, April 13, 2004. http://connect.educause.edu/library/abstract/BioinformaticsNewTec/40090
An example: Biomedical Informatics Research Network (BIRN), http://www.nbirn.net/ • An initiative sponsored by the National Institutes of Health and the National Center for Research Resources that fosters large-scale biomedical science collaborations • The BIRN is a geographically distributed virtual community of shared resources (hardware, software applications and databases) within which biomedical scientists and clinical researchers make discoveries by enhancing communication and collaboration across research disciplines • The BIRN uses emerging cyberinfrastructure (high-speed networks, distributed high-performance computing, and data integration capabilities) to support a consortial effort among 12 universities and 16 research groups engaged in investigation of human neurological disease and associated animal models
BIRN: a wide collaboration Data derived from individual subgroups are being used to drive the definition, construction and daily use of a federated data system, collected and stored across geographically separated sites but presented as a unified data archive that can be securely accessed across institutional boundaries
Final Considerations • What life sciences initiatives currently exist on campus? • To what degree does the institution consider academic endeavors in genomics / bioinformatics a strategic priority? • Which elements of the existing IT infrastructure on campus are well positioned to support our initiatives? Which elements must be improved?