Center for Integrated Fungal Research. Fungal Genomics Laboratory. Industrial applications. glutamic acid citric acid amylases proteases lipases. Bioterrorism. Biologically interesting and genetically tractable. Insight into eukaryotic gene regulation and development.
Center for Integrated Fungal Research Fungal Genomics Laboratory
Industrial applications • glutamic acid • citric acid • amylases • proteases • lipases
Biologically interesting and genetically tractable Insight into eukaryotic gene regulation and development
Framework of rice blast genome
Deep (25X), large-insert (130 kb), single-enzyme (HindIII) BAC library from rice-infecting strain 70-15: 9,216 clones
A. BAC-end sequence provides “Sequence Tag Connectors” (STC): ~500 bp sequence every 3-4 kb across genome
B. BAC fingerprints used to create contigs
C. BAC contigs anchored to genetic map
[Figure: BACs 1-6 with forward (1f-6f) and reverse (1r-6r) end sequences, anchored by RFLP markers 1-3]
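The library figures above can be cross-checked with a little arithmetic. The sketch below is not from the slides: it derives the genome size implied by 9,216 clones of 130 kb at 25X, then the average spacing between BAC-end sequences. The function names and the 75% end-sequencing success rate are illustrative assumptions.

```python
# Back-of-the-envelope check of the BAC library figures quoted on the slide.
CLONES = 9_216      # from the slide
INSERT_KB = 130     # average insert size, from the slide
COVERAGE = 25       # stated library depth, from the slide


def implied_genome_mb(clones, insert_kb, coverage):
    """Genome size (Mb) implied by clone count, insert size and depth."""
    return clones * insert_kb / 1000 / coverage


def stc_spacing_kb(genome_mb, clones, end_success=1.0):
    """Average distance (kb) between BAC-end sequences.

    Each clone has two ends; end_success is the fraction of ends that
    yield usable sequence (a hypothetical parameter, not from the slide).
    """
    usable_ends = clones * 2 * end_success
    return genome_mb * 1000 / usable_ends


genome_mb = implied_genome_mb(CLONES, INSERT_KB, COVERAGE)
print(f"implied genome size: {genome_mb:.1f} Mb")
for success in (1.0, 0.75):  # 0.75 is an assumed success rate
    spacing = stc_spacing_kb(genome_mb, CLONES, success)
    print(f"end success {success:.0%}: one STC every {spacing:.2f} kb")
```

Under these assumptions the spacing is about 2.6 kb if every end read succeeds and about 3.5 kb at 75% success, which is in line with the 3-4 kb quoted on the slide.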
USDA-IFAFS project, Oct 2000: “Gene discovery in the rice blast fungus: ESTs and sequence of chromosome 7”
1. Generate ~5X draft sequence of chromosome 7 (4.2 Mb).
2. Generate 35,000 ESTs and create a set of ~5,000 ESTs representing unique genes.
3. Provide basic sequence analysis and integration of data into physical map of chromosome 7.
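To give a feel for what a ~5X draft of a 4.2 Mb chromosome involves, the sketch below estimates the number of reads required and the fraction of bases expected to be covered at least once. It assumes 500 bp reads (an illustrative value, not a project figure) and uses the idealized Lander-Waterman/Poisson model, which ignores cloning bias and repeats.

```python
import math


def reads_needed(target_bp, coverage, read_len_bp):
    """Number of reads required to reach a given average depth."""
    return math.ceil(target_bp * coverage / read_len_bp)


def expected_fraction_covered(coverage):
    """Poisson estimate: P(a base is hit by at least one read) = 1 - e^-c."""
    return 1 - math.exp(-coverage)


CHR7_BP = 4_200_000   # chromosome 7 size, from the slide
DEPTH = 5             # ~5X draft, from the slide
READ_LEN_BP = 500     # ASSUMPTION: typical read length, not from the slide

print("reads needed:", reads_needed(CHR7_BP, DEPTH, READ_LEN_BP))
print(f"expected fraction covered: {expected_fraction_covered(DEPTH):.4f}")
```

Under this model a 5X draft leaves well under 1% of bases unsampled, though gaps between contigs remain.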
NSF-IFAFS project, Oct 2001: whole genome sequence, host-pathogen function analysis • Generate ~7X draft sequence of M. grisea • Generate 50,000 knockouts • Analyze host-pathogen interaction • Provide basic sequence analysis
Consequences of Scaling • Moore’s law has allowed labs to keep ahead of data • Sequence data is now outpacing processing capability • Bioinformatics processing will be a real problem
Computational platforms • Modern biology requires robust computational platforms • Computer technology implementation is expensive (from a biologist's viewpoint) • Computer technology development is even more expensive (you want how much?!) • This detracts from research for small labs
On the brink • Significant investment in off-the-shelf components and cross-training people • Moderate-sized genomes • 20 to 50 megabases • Takes 2 weeks for initial analyses • Homology searches take days
Local blast (www.fungalgenomics.ncsu.edu)
Federated database Select a chromosome Link to genetic information (blue) Link to marker data and other data at http://ascus.cit.cornell.edu/blastdb/
Rice blast / N. crassa synteny
[Figure: M. grisea BAC 6J18 (111 kb) compared with N. crassa contigs 1.515 (185 kb), 1.13, 1.513 and 1.841; distance labels range from 0.5 kb to 20 kb]
97 out of 179 unique ESTs from chromosome 7 gave a significant (E < 10^-5) tBlastX match to the N. crassa genome shotgun assembly
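A count like "97 out of 179" comes from filtering search hits by E-value and counting the distinct queries that survive. The sketch below shows one way to do that, assuming the hits were exported in BLAST's 12-column tab-delimited layout (E-value in column 11). The sample rows and sequence IDs are invented for illustration and are not project data.

```python
import csv
import io

E_CUTOFF = 1e-5  # significance threshold quoted on the slide


def queries_with_hit(tabular_text, cutoff=E_CUTOFF):
    """Return the set of query IDs with at least one hit below the cutoff.

    Expects tab-delimited rows whose first column is the query ID and
    whose 11th column is the E-value.
    """
    significant = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, evalue = row[0], float(row[10])
        if evalue < cutoff:
            significant.add(query_id)
    return significant


def make_row(query, subject, evalue):
    """Build one 12-column row; only query, subject and E-value matter here."""
    return "\t".join([query, subject, "80.0", "100", "20", "0",
                      "1", "300", "1", "300", evalue, "50.0"])


# Hypothetical hits: est_001 passes, est_002 and est_003 do not.
SAMPLE = "\n".join([
    make_row("est_001", "contig_A", "2e-30"),
    make_row("est_001", "contig_B", "1e-3"),
    make_row("est_002", "contig_C", "0.5"),
    make_row("est_003", "contig_A", "1e-5"),  # equal to cutoff, not below it
])

matched = queries_with_hit(SAMPLE)
print(sorted(matched), f"{len(matched)}/3 queries matched")
print(f"slide figure: {97 / 179:.0%} of unique ESTs matched")
```

Counting distinct queries rather than rows matters because a single EST often hits several contigs.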
CIFR BioInformatics
[Diagram: CIFR bioinformatics architecture. Layers: Foundation BioInformatics, Advanced BioInformatics, Higher Order BioInformatics, Research BioInformatics, NC BioInformatics Super computing Grid. Labeled components: Public Http Exposure, Rube Sequence Pipeline (sequence data to biological results), GRL, High Throughput WebBlaster, AlkaEST, data mining, mask, Phred, Phrap, consed, Http Blast Report, Blast Report db, Artemis, curation, relational data model, data loading, genome browser, BioPerl interface, curation work area, OO genomic analysis, extraction, submissions, Genbank load, homology, PBS/LSF grid access, repeat analysis, gene prediction, EST analysis, synteny, cluster analysis, pathway analysis, in-silico mutation, cellular models. Legend: Developed at CIFR; Ongoing work at CIFR; Open source and others.]
And over the . . . edge • Our whole genome arrives Spring 2002 • Everyone wants immediate results • Host (Rice) genome size far greater than the pathogen • Comparative genomics likely to require N way analyses • And then there’s proteomics ….
Research Biology • NCSU GRL • Romulus • Remus • Excellent foundation work • ~6 years to sequence M. grisea
Industrial Scale Biology • High Throughput Sequence Centers (Whitehead) • ~4 days to sequence M. grisea
Research Bioinformatics • CIFR FGL • Mycelial mat • Excellent foundation work • Est. 4 years to analyze M. grisea
Industrial Scale Bioinformatics • North Carolina BioGrid • Hopefully 4 hours to analyze M. grisea
Islands of Capability • There are not enough resources for every lab to re-implement technologies • Individual centers specialize according to their research focus • Grid ties together disparate systems • Share knowledge and capabilities • Standards based for interoperability
Future directions: 5 years* • Organized distributed research - “Virtual Centers” • Bioinformatics • Tool development • Gene prediction algorithms for filamentous fungi • Gene indexing • “Distributed Annotation Systems (DAS)” • Develop better search features (“Queries”) • Integrate sequenced and annotated BAC clones • Integrate ESTs and expression profiles etc. • Functional Genomics • Comparative studies - saprophyte vs pathogen etc. • Coordinate IRBGC and PGI etc. • Complete nucleotide sequence, full-length ESTs • Knock out/silence all genes • Transcriptional profiling in various backgrounds (path mutants) • Construct protein-protein linkage maps (signaling pathways) * The biologist's view
Future directions: 5 years* • Collaborative knowledge sharing • New data mining approaches • New ways of visualizing the information • In-silico experimentation • Gene knockouts • Regulatory modification • Pathway models • Cellular models * The bioinformatician's view
Finding solutions to practical problems • Seeking answers requires asking questions • Takes 1-2 weeks per question • BioGrid may give near real-time response • BioGrid will bridge the islands of capability • Focus resources back on our work • Consequently, we are going to further accelerate the rate of discovery