Why genomics? • Genomics represents a complete change in the way we are able to think about the life sciences. • Genomics enables rapid and efficient discovery of important genes related to commodity quality and improvement. • Genomics approaches provide the ability to look at complex traits and pathways.
What is genomics? Genomic approaches include: • Structural: DNA sequence (genomic, cDNA); molecular mapping (AFLP, microsatellite, RFLP, etc.); genotyping and fingerprinting • Functional: gene expression analysis (RNA, proteins); gene function analysis (knockouts, mutations, biochemical assays); gene interactions • Bioinformatics: compilation and analysis of collected data
What is genomics? • Genomic science is the industrialization of molecular biology to address complex biological questions. • It is the integration of biology, engineering, and statistics to solve the sequence of a complex genome and then mine the sequence data to obtain biological insights. • Although DNA sequence is central to genomics, it is simply the starting point for large-scale genome analysis.
The Power of Genomics
Organism | Genome Size | # Genes | Year
ΦX174 | 5,400 bp | 10 | 1977
Tobacco mosaic virus | 6,300 bp | 4 | 1982
Smallpox virus | 185,000 bp | 200 | 1993
Escherichia coli | 4,600,000 bp | 4,390 | 1997
Saccharomyces cerevisiae | 12,100,000 bp | 6,000 | 1996
Caenorhabditis elegans | 100,000,000 bp | 20,000 | 1998
Homo sapiens | 3,000,000,000 bp | 23,000 | 2001
Arabidopsis thaliana | 125,000,000 bp | 25,000 | 2000
Oryza sativa (Rice) | 466,000,000 bp | 32-50,000 | 2002
Nicotiana tabacum | 4,500,000,000 bp | 36,263 |
Triticum aestivum | 16,000,000,000 bp | 40-80,000 |
World Meat Production: Developing World Wants More Protein [Figure: actual and projected world meat production. Source: World Agricultural Outlook, The Food and Agricultural Policy Research Institute, Iowa State University]
Therapeutic Proteins: $300 MM Fermentation Facility or 1 Acre in Iowa
What improvements should be targeted in crop improvement research?
Improvements Needed • Even with modern breeding technologies and agrochemicals, significant yield loss still occurs. • Average yields are merely 21.6% of record yields. • What causes the 78.4% loss in yield? J.S. Boyer, “Plant Productivity and Environment,” Science, Vol. 218, October 29, 1982, pp. 443-448.
Improvements Needed • 69.1% of record yields are lost to environmental stress, while a mere 9.3% are lost to diseases, insects, and weeds combined. J.S. Boyer, “Plant Productivity and Environment,” Science, Vol. 218, October 29, 1982, pp. 443-448.
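The percentages quoted from Boyer (1982) can be checked with a few lines of arithmetic. The sketch below only restates the slide figures; the variable names are illustrative and the derived share of loss due to environmental stress is a simple ratio, not a number taken from the paper.

```python
# Figures quoted on the slides from Boyer (1982), as percent of record yield.
average_yield = 21.6
loss_environmental = 69.1  # environmental (abiotic) stress
loss_biotic = 9.3          # diseases, insects, and weeds combined

# The total loss is whatever separates average yield from record yield.
total_loss = round(100.0 - average_yield, 1)

# The two loss categories should add up to the same total.
accounted = round(loss_environmental + loss_biotic, 1)

# Derived value (not on the slides): share of the total loss that is environmental.
environmental_share = round(100.0 * loss_environmental / total_loss, 1)

print(f"total loss: {total_loss}%  accounted for: {accounted}%")
print(f"environmental share of the loss: {environmental_share}%")
```

The two categories account for the full 78.4% gap, and environmental stress makes up most of it.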
Improvements Needed • Drought, cold and excess water account for the largest insurance indemnities and the largest area of affected land in the United States.
Biotech Sales • Most existing biotech products address losses due to weeds and insects. • Drought, cold and flooding have been ignored. • Yet annual biotech revenues for corn and soy exceeded $2.0 BN in 2001. * All dollar figures are for 1999
Weeds and Primitive Crops Have Advantageous Traits • Ancient farmers selected for major, “visibly” desirable traits through hand selection over thousands of years (domestication) • Modern breeding improved these traits in high input environments • Primitive crops and weeds retain “invisible” traits that confer hardiness under disadvantageous conditions [Figure labels: “Wimpy” domesticated tomato; “Hardy” wild tomatoes]
How to Improve Crops… • Genomics allows the identification of the genes and gene networks responsible for hardiness… • paving the way for the reintroduction of hardiness into crops or other target species.
Why sequence plant genomes? • Basis for the world’s food supply, which fails to keep up with demand. • Basic model species for understanding general biology. • Complete genome sequences will enable us to improve and enhance desirable traits in cultivated plants and to limit expression of undesirable traits.
Detailed questions that may be addressed through a genomics approach include: • Metabolism/secondary product biosynthetic pathways • Stress response, including pests, pathogens, abiotic and water/nutritional stress • Growth/development, including protein/oil content, flowering, maturity
What are the advantages of obtaining a complete inventory of plant coding sequences? • Gene discovery/novel sequences • Promoters/control of gene expression • Microarray analysis/expression profiling • Biochemical pathways • Intellectual property
Plant Genomes Can Be Larger Than The Human Genome [Figure: relative genome sizes. Key: Moss, Rice, Sorghum, Arabidopsis, Clover, Human, Tomato, Soy, Canola, Potato, Lolium, Corn, Tobacco, Wheat]
A Small Portion of the Genome Comprises Genes [Figure: plant genome composition, junk vs. genes, showing repetitive “junk” DNA and valuable genespace for Arabidopsis, Human, Moss, Rice, Tomato, Soy, Canola, Potato, Grass, Corn, Wheat]
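One rough way to see that genome size does not track gene content is to compute genes per megabase from the genome sizes and gene counts listed in the table earlier in these slides. This is a minimal sketch: the function name is illustrative, and only organisms with a single gene-count figure on the slide are included.

```python
# Genome size (bp) and gene count, as listed on "The Power of Genomics" slide.
genomes = {
    "Escherichia coli": (4_600_000, 4_390),
    "Arabidopsis thaliana": (125_000_000, 25_000),
    "Homo sapiens": (3_000_000_000, 23_000),
    "Nicotiana tabacum": (4_500_000_000, 36_263),
}

def genes_per_mb(size_bp, n_genes):
    """Gene density in genes per million base pairs."""
    return n_genes / (size_bp / 1_000_000)

density = {name: genes_per_mb(*values) for name, values in genomes.items()}

# Print from most to least gene-dense.
for name, value in sorted(density.items(), key=lambda item: -item[1]):
    print(f"{name:22s} {value:8.1f} genes/Mb")
```

Tobacco has a genome larger than the human genome, yet both come out far less gene-dense than Arabidopsis, which is consistent with most of a large plant genome being repetitive DNA.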
ESTs: Taking Advantage of the Cell to Sequence the Genes • Central Dogma: DNA → RNA → Protein • EST: Expressed Sequence Tag
The Problem: ESTs Miss Rarely Expressed Genes • MADS box gene AGL8 expression is found in the carpel wall and inflorescence meristem… [Figure panels: guard cell specific expression; root columella specific expression]
How did ESTs do? An incomplete and confusing picture
Organism | # Genes | # ESTs | EST matches
C. elegans | 19,099 | 109,000 | ~40%
A. thaliana | 25,498 | >113,000 | ~60%
H. sapiens | 31,778 | >3,000,000 | ~60%
Plant Genome Sequencing Project • Mapping: fingerprint analysis identifies BAC and PAC clones on the chromosome. • Library Core: 1.5 Kb M13 insert; 3 Kb plasmid insert. • Production: 9x coverage; random reads with both dye-labeled primer and terminator. • Prefinishing: reads called using Phred and Asp; assembled with Phrap; contigs mapped using PCR array and RP sequencing. • Finishing: sequence edited and gaps closed using primer walking and dye terminator sequencing. • Clone Analysis: quality check by PCOP programs; assembly check using digest data; gene homology search using BLAST programs.
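The 9x coverage figure in the pipeline can be related to a read count with the usual coverage relation c = N·L/G, and the Poisson approximation gives the fraction of bases expected to remain unsequenced as e^(-c). The sketch below applies this to a single 150 kb BAC. The 500 bp read length is an assumption for illustration; the slides do not state one.

```python
import math

def reads_needed(target_bp, coverage, read_length_bp):
    """Number of random reads N such that N * L / G reaches the desired coverage."""
    return math.ceil(coverage * target_bp / read_length_bp)

def expected_uncovered_fraction(coverage):
    """Poisson approximation: probability that a given base is hit by no read."""
    return math.exp(-coverage)

BAC_BP = 150_000       # typical BAC insert size, from the slides
COVERAGE = 9           # 9x coverage, from the slides
READ_LENGTH_BP = 500   # assumption: not given on the slides

n_reads = reads_needed(BAC_BP, COVERAGE, READ_LENGTH_BP)
uncovered_bp = expected_uncovered_fraction(COVERAGE) * BAC_BP

print(f"reads needed: {n_reads}")
print(f"expected unsequenced bases: {uncovered_bp:.1f}")
```

Even at 9x, a handful of bases are expected to be missed by random reads, which is one reason the pipeline includes directed gap closure in finishing.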
A whole-genome clone-based map • Eases selection of clones for sequencing • Critical to accurate sequence assembly and alignment • Allows identification of repeat regions (non-coding) • Based on BAC ends, ESTs and fragment analysis • A critical component of the methyl filtration strategy
Create a BAC Library • Bacterial Artificial Chromosome • Key reagent for any major sequencing project • A typical BAC contains 150,000 bp of DNA • BAC library replicated and spotted onto filters
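How many BAC clones a library needs can be estimated with the Clarke-Carbon formula, N = ln(1 - P) / ln(1 - f), where f is the insert size divided by the genome size and P is the desired probability that any given sequence is present. The sketch below uses the 150,000 bp insert from this slide and the rice genome size from the earlier table; the 99% target and the function names are assumptions for illustration.

```python
import math

def clones_needed(genome_bp, insert_bp, probability):
    """Clarke-Carbon estimate of library size for a given representation probability."""
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))

def fold_coverage(n_clones, insert_bp, genome_bp):
    """Genome equivalents contained in the library."""
    return n_clones * insert_bp / genome_bp

RICE_GENOME_BP = 466_000_000  # from the genome size table in these slides
BAC_INSERT_BP = 150_000       # typical BAC insert, from this slide
TARGET_PROBABILITY = 0.99     # assumption: chosen for illustration

n = clones_needed(RICE_GENOME_BP, BAC_INSERT_BP, TARGET_PROBABILITY)
coverage = fold_coverage(n, BAC_INSERT_BP, RICE_GENOME_BP)

print(f"clones needed: {n}")
print(f"fold coverage: {coverage:.2f}x")
```

A 99% chance of representing any given locus works out to roughly 4.6 genome equivalents, because ln(1 - 0.99) is about -4.6.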
A BAC Library [Figure: gel of a sampling of NotI-digested BAC clones; size markers at 150 kb, 100 kb and 50 kb; 7.4 kb vector band]
Physical Mapping • Get a set of large clones (BAC, ~150 Kbp) • Map them on the genome • Sequence a minimum tiling subset [Figure: a BAC clone map aligned to the genome]
Mapping clones to a genome Step 1: BAC clones are assigned to a chromosomal position by anchoring to ESTs and genetic markers.
Mapping clones to a genome Step 2: Clones are cut with restriction enzymes, and contigs are assembled by identifying similarities in the fragment patterns, also known as fingerprinting. [Figure: overlapping clones labeled A to G]
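Fingerprint comparison can be sketched as counting restriction fragments of similar size shared between two clones. The example below is a simplified shared-band count, not the statistical score used by production fingerprinting software; the fragment sizes, the 50 bp tolerance and the two-band threshold are all hypothetical values chosen for illustration.

```python
from itertools import combinations

def shared_bands(frags_a, frags_b, tolerance=50):
    """Count fragments in A that match a not-yet-used fragment in B within tolerance (bp)."""
    remaining = sorted(frags_b)
    shared = 0
    for size in sorted(frags_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance:
                shared += 1
                del remaining[i]  # each band in B can be matched only once
                break
    return shared

def overlapping_pairs(fingerprints, min_shared=2):
    """Return (clone, clone, shared band count) for pairs that likely overlap."""
    pairs = []
    for a, b in combinations(sorted(fingerprints), 2):
        n = shared_bands(fingerprints[a], fingerprints[b])
        if n >= min_shared:
            pairs.append((a, b, n))
    return pairs

# Hypothetical fragment sizes (bp) for four clones.
fingerprints = {
    "A": [1200, 3400, 5600, 8100, 9500],
    "B": [3410, 5590, 8100, 12000, 15000],
    "C": [12010, 15020, 2100, 4300, 7700],
    "D": [600, 900, 18000, 22000, 25000],
}

pairs = overlapping_pairs(fingerprints)
print(pairs)
```

Here A overlaps B and B overlaps C, so the three clones would be placed in one contig in the order A, B, C, while D stays unplaced.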
BAC Fingerprinting
Enzyme | Top strand (5'→3') | Bottom strand (3'→5')
XbaI | TCTAGA | AGATCT
XhoI | CTCGAG | GAGCTC
EcoRI | GAATTC | CTTAAG
BamHI | GGATCC | CCTAGG
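A fingerprint is the list of fragment sizes produced when a clone is cut at every occurrence of an enzyme's recognition site. The sketch below performs such a digest in silico on a linear sequence, using the standard recognition sites for the four enzymes on this slide (each cuts after the first base of its site on the top strand). The toy sequence is made up for illustration.

```python
# Recognition site (top strand, 5'->3') and cut offset within the site.
SITES = {
    "XbaI": ("TCTAGA", 1),
    "XhoI": ("CTCGAG", 1),
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
}

def cut_positions(seq, site, offset):
    """Return every cut coordinate for a recognition site in a linear sequence."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions

def digest(seq, enzyme):
    """Return fragment lengths, in order, after a complete digest with one enzyme."""
    site, offset = SITES[enzyme]
    cuts = [0] + cut_positions(seq.upper(), site, offset) + [len(seq)]
    return [end - begin for begin, end in zip(cuts, cuts[1:])]

# Hypothetical 36 bp sequence with two EcoRI sites and one BamHI site.
toy = "AAAA" + "GAATTC" + "CCCCCC" + "GGATCC" + "TTTTTT" + "GAATTC" + "AA"

print("EcoRI:", digest(toy, "EcoRI"))
print("BamHI:", digest(toy, "BamHI"))
print("XhoI: ", digest(toy, "XhoI"))
```

An enzyme with no site in the sequence returns a single fragment equal to the full length, and the sorted fragment sizes for a real clone would form its fingerprint.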
Mapping clones to a genome Step 3: Minimally overlapping clones are selected to create a tiling path for sequencing. The sequence from the chosen set of clones will represent the genomic segment to which they were mapped.
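Choosing minimally overlapping clones is an interval-covering problem: once clones have map coordinates, a greedy pass that always extends coverage as far as possible yields a smallest covering set. The sketch below assumes each clone is already placed as (name, start, end); the coordinates are hypothetical and the function name is illustrative.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy interval cover: return the names of a smallest set of clones spanning the region.

    clones: iterable of (name, start, end) in map coordinates.
    Raises ValueError if the clones leave a gap in the region.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones starting inside the covered stretch, keep the one reaching farthest.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path

# Hypothetical clone placements (kb) on a 540 kb segment.
clones = [
    ("A", 0, 150), ("B", 40, 190), ("C", 120, 270), ("D", 140, 290),
    ("E", 260, 400), ("F", 280, 430), ("G", 390, 540),
]

path = minimal_tiling_path(clones, 0, 540)
print(path)
```

Four of the seven clones are enough to span the segment in this example; the other three are redundant for sequencing purposes.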
Identifying Gene Rich BACs • Gene rich BAC clones are identified by hybridizing with ESTs or methyl clones. • These and the surrounding BACs are sequenced. [Figure: a methyl-threshed probe and an EST probe each identify a BAC on the genome]
What are the advantages of obtaining a complete inventory of plant coding sequence? • Gene discovery/novel sequences • Promoters/control of gene expression • Microarray analysis/expression profiling • Biochemical pathways • Intellectual property
Detailed questions that may be addressed through a genomics approach include: • Metabolism/secondary product biosynthetic pathways • Stress response, including pests, pathogens, abiotic and water/nutritional stress • Growth/development, including suckering, flowering, maturity
Arabidopsis contains 41 described gene families • 14-3-3 family • ABC superfamily • ABC transporters • AAAP family • Antiporters • Aquaporins • Calcineurin-like B calcium sensors • CHO esterase • CBL-int. S-T Pkases • CW biosynthesis • Chor./Mit. • Cyt P450 • Cyt. B5 • Cytoskeleton • Euk. init. Factors • Expansins • Glycoside hydrolase • Glycosyltransferase • Hsfs • Inorg. solute co-trans. • Ion channels • Kinesins • Lipid metabolism • Major intrinsic protein • Miscellaneous • MYB • Myosins • NADPH P450 reductases • Nodulin-like • Org. solute co-transporters • Phospholipase D • Polysaccharide lyase • Pollen coat proteome • Primary pumps (ATPases) • Receptor kinase-like • SNARE interacting prot. • Other SNARES • Syntaxins • Trehalose biosynthesis • WRKY transcription factors • Xyloglucan fucosyltransferase
The tomato and potato genomes are very similar • Potato and tomato are highly syntenic • Same chromosome number: 12 • Diploid genome size: ~900 Mb • Colinear chromosomes, only 5 inversions • Share >99% of their genes • Can produce viable, though sterile, interspecific F1s
Tomato-pepper co-linearity is dispersed [Figure: chromosome segment labels. Chr. 1: 3, 3, 4, 3; Chr. 2: 3, 10, 12, 12; Chr. 3: 2]
1-2% of plant genomes are composed of R gene clusters • Arabidopsis: ~200 genes • Rice: ~1000 genes [Figure label: R gene for RKN]