Some Jolly Fun with Barley ESTs
David Marshall & All the Folks in Computational Biology
BLAST for Recognition of Undesirable Clones
Summary of 84 Barley Libraries (ver. 0.90)

Category                        #       %
High quality sequences    282,720
E. coli genome                507    0.18
Lambda genome                  39    0.01
rRNA                        6,075    2.15
Chloroplast                 2,664    0.94
Mitochondrion                 204    0.07
Fungal cDNA                   366    0.13
Repetitive Elements           289    0.10
Low complexity              1,194    0.42
Odd vector                     37    0.01
Both polyA & polyT             28    0.01
Total Good                271,317    96.0
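Each percentage is a category count divided by the 282,720 high-quality sequences, and Total Good is what remains once every flagged category is subtracted. The counts add up exactly, which suggests each rejected sequence was counted in only one category. A small Python sketch that re-derives the table from its counts (the variable and function names are illustrative only):

```python
# Counts of high-quality barley ESTs flagged by BLAST screening,
# copied from the summary of 84 libraries (ver. 0.90).
HIGH_QUALITY = 282_720

flagged = {
    "E. coli genome": 507,
    "Lambda genome": 39,
    "rRNA": 6_075,
    "Chloroplast": 2_664,
    "Mitochondrion": 204,
    "Fungal cDNA": 366,
    "Repetitive Elements": 289,
    "Low complexity": 1_194,
    "Odd vector": 37,
    "Both polyA & polyT": 28,
}


def percent(count: int, total: int = HIGH_QUALITY) -> float:
    """Share of the high-quality set, as a percentage."""
    return 100.0 * count / total


# Assumes the categories are mutually exclusive (each sequence counted once).
total_good = HIGH_QUALITY - sum(flagged.values())

for name, count in flagged.items():
    print(f"{name:<22}{count:>8,}{percent(count):>7.2f}")
print(f"{'Total Good':<22}{total_good:>8,}{percent(total_good):>7.1f}")
```

Printing the rows reproduces the percentages on the slide at the precision shown.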
Unigenes in ESTs in Current Assembly
Ideally: one “unigene” per gene in the genome; we expect ~50,000, based on rice.
Maximum unigene count in ESTs: the sum of the number of contigs and singletons following assembly:
    Contigs       24,208
    Singletons    24,899
    Total         49,107
Minimum unigene count in ESTs: the sum of the number of contigs and singletons that have good 3’ ends:
    Contigs       14,589
    Singletons     7,219
    Total         21,880
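The two bounds are simple sums, and dividing each by the ~50,000 genes expected from rice gives a rough sense of how much of the gene set the assembly could represent. Counting only contigs and singletons with good 3’ ends presumably guards against one gene contributing several non-overlapping contigs. A Python sketch of the arithmetic using the slide's counts; note that the listed minimum-bound contigs and singletons sum to 21,808, not the 21,880 shown, so one of those three figures was probably mistyped.

```python
# Bounds on the number of barley unigenes represented in the EST assembly.
EXPECTED_GENES = 50_000  # rough gene count, extrapolated from rice

# Upper bound: every contig and every singleton is a distinct gene.
max_contigs, max_singletons = 24_208, 24_899
# Lower bound: only contigs/singletons with a good 3' end are counted.
min_contigs, min_singletons = 14_589, 7_219

max_unigenes = max_contigs + max_singletons
min_unigenes = min_contigs + min_singletons

for label, n in (("maximum", max_unigenes), ("minimum", min_unigenes)):
    share = 100 * n / EXPECTED_GENES
    print(f"{label}: {n:,} unigenes (~{share:.0f}% of ~{EXPECTED_GENES:,} genes)")
```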
The Immediate Objective
[Figure: Microarray Chip; Gene Expression Data. Source: http://www.affymetrix.com/]
[Figure: Barley 2H (Steptoe x Morex) and Rice R4 gene map of caleosins. Barley Hvcal1 and Hvcal2 are linked by EST alignment to rice Oscal1 and Oscal2 on BAC OSJB0004 (< 8 kb >); map positions shown: 77 cM, 0 cM and 78.2 cM.]
TIGR Rice Caleosin Gene Models
[Figure: gene models OSCal01 (R4), OSCal02 (R4) and OSCal03 (R3)]
Homology of Wheat G3 Deletion line mapped ESTs to Rice Chromosomes
General Conclusions
• EST sequence
    • May lack polyA
    • Reading frame may be ambiguous
    • Exon/intron boundaries may not be obvious
• We don’t have all barley genes despite >330,000 ESTs (probably between 33% and 50%).
• Value of comparative studies with rice
    • BUT poor annotation (actually appalling)
    • Rice genomic sequencing is work in progress
    • Comparative route is OK but can’t be the only game in town. Several examples of genes not being there!!!
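Raw EST reads that may or may not carry a poly(A) tail, and whose reading frame is unknown, are commonly handled by trimming any tail and then scanning all six frames. A minimal Python sketch of that idea, not the pipeline used for these libraries: the 8-base poly(A) cutoff is a hypothetical threshold, and because ESTs are often partial it reports the longest stop-codon-free stretch rather than requiring an ATG start. It does nothing for exon/intron boundaries, which need alignment to genomic sequence.

```python
import re

# Complement table for uppercase DNA (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_polya(seq: str, min_run: int = 8) -> str:
    """Strip a trailing run of >= min_run A's (min_run is a hypothetical cutoff)."""
    match = re.search(rf"A{{{min_run},}}$", seq)
    return seq[: match.start()] if match else seq


def longest_open_stretch(seq: str) -> tuple[int, int, int]:
    """Longest run of codons containing no stop codon, over all six frames.

    Returns (codons, strand, frame) with strand +1 or -1 and frame 0-2.
    No start codon is required, because ESTs are often partial.
    Ties keep the first frame examined (forward strand, frame 0 first).
    """
    best = (0, +1, 0)
    for strand, s in ((+1, seq), (-1, reverse_complement(seq))):
        for frame in range(3):
            run = 0
            for i in range(frame, len(s) - 2, 3):
                if s[i : i + 3] in STOP_CODONS:
                    run = 0  # a stop codon closes the current open stretch
                else:
                    run += 1
                    if run > best[0]:
                        best = (run, strand, frame)
    return best


est = trim_polya("CCGATGAAATTTGGGCCC" + "A" * 20)
print(est, longest_open_stretch(est))
```

Even on a short read the longest open stretch can fall on the reverse strand, which is one reason the frame of a raw EST is genuinely ambiguous.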
Major Issues
• Data validation
    • Errors in public database sequence
    • Errors in annotation
    • ‘Chinese whispers’ – anchoring annotation in biochemistry
• Comparative Data
    • Rice > wheat > maize – but also Arabidopsis
    • When is homology actually orthology?
• Partial data sets
    • % match is only part of the story
    • Need for domain/feature information – mammalian/bacterial bias
    • Everything is work in progress?
• Where are the data sources?
    • dbEST
    • nr nucleotide database at NCBI
    • Gramene at CSHL
    • TIGR
    • GrainGenes/wEST at USDA, Albany
    • CUGI > AGI
    • Iowa State/USDA
    • Harvest/Foxpro
    • ContEST at SCRI
    • The horse’s mouth
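One common, far from definitive, heuristic for "when is homology actually orthology?" is the reciprocal best hit (RBH): a barley EST and a rice gene are proposed as putative orthologs only if each is the other's top-scoring match. The sketch below also drops alignments that cover only a small fraction of the query, since a high % match over a short stretch is only part of the story. The `Hit` record, the 0.5 coverage cutoff and the identifiers are hypothetical toy values; real input would be parsed from BLAST tabular output. RBH can still mislead when gene-family members are missing from one genome or annotation is partial, so its pairs are candidates, not conclusions.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One simplified BLAST hit (hypothetical record layout)."""
    query: str
    subject: str
    bitscore: float
    query_coverage: float  # fraction of the query covered by the alignment


def best_hits(hits, min_coverage=0.5):
    """Top-scoring subject for each query, ignoring low-coverage alignments."""
    best = {}
    for h in hits:
        if h.query_coverage < min_coverage:
            continue  # a strong match over a short stretch is not enough
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return {query: h.subject for query, h in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, min_coverage=0.5):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b, min_coverage)
    reverse = best_hits(b_vs_a, min_coverage)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy, made-up hits: barley ESTs searched against rice, and the reverse search.
barley_vs_rice = [
    Hit("est1", "riceA", 250.0, 0.9),
    Hit("est1", "riceB", 180.0, 0.9),
    Hit("est2", "riceB", 300.0, 0.2),  # high score, but covers 20% of est2
    Hit("est2", "riceC", 120.0, 0.8),
]
rice_vs_barley = [
    Hit("riceA", "est1", 245.0, 0.7),
    Hit("riceC", "est3", 200.0, 0.8),
    Hit("riceC", "est2", 110.0, 0.8),
]

print(reciprocal_best_hits(barley_vs_rice, rice_vs_barley))  # [('est1', 'riceA')]
```

Here est2's best rice hit is riceC, but riceC's best barley hit is est3, so that pair is not reported.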
Phenotype <-> Sequence
• Sd1 – green revolution gene in rice. Mutation in gibberellin 20-oxidase (plant hormone production pathway). It is one member of a small gene family; other members have subtly different patterns of expression and are able to partially compensate for the mutation.
• Rht1 – green revolution gene in wheat. Mutation in the receptor response pathway. Copies in all 3 wheat genomes.
• Barley – commercially significant dwarfs arise from both of these and from several other pathway or response genes.
Acknowledgements
• Robbie Waugh
• Peter Hedley
• David Caldwell
• Luke Ramsay
• Hui Liu
• Linda Cardle
• Paul Shaw
• Arnise Druker
• Doreen Ware
• Dave Mathews
• Tim Close
• Olin Anderson