For Bioinformatics, start with Genomics:
• READING genome sequences: carry out dideoxy sequencing
• ASSEMBLY of the sequence: connect sequences to make whole chromosomes
• ANNOTATION of the sequence: find the genes!
The Human Genome E. coli Genome
Reading: Shotgun DNA sequencing of the whole genome (WGS)
[Figure: DNA target sample → SHEAR → LIGATE & CLONE into vector → SEQUENCE from primer → reads]
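The shearing step on this slide can be mimicked in a few lines. This is a minimal sketch, not a model of the real protocol: it assumes one strand, error-free reads of a fixed length, and uniformly random start positions, and it skips cloning entirely. The function name `shotgun_reads` and the toy genome are made up for illustration.

```python
import random

def shotgun_reads(genome, n_reads, read_len, seed=0):
    """Sample n_reads random substrings of length read_len from genome.

    Simplifying assumptions (not from the slides): single strand,
    no sequencing errors, uniform random fragment starts.
    """
    if read_len > len(genome):
        raise ValueError("read_len must not exceed the genome length")
    rng = random.Random(seed)  # fixed seed so the toy run is repeatable
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, len(genome) - read_len)  # inclusive bounds
        reads.append(genome[start:start + read_len])
    return reads

# Hypothetical toy "genome"; real reads are hundreds of bp long.
toy_genome = "ATGGCGTACGTTAGCCTAGGATCCGATTACGGCTAAGCTT"
toy_reads = shotgun_reads(toy_genome, n_reads=5, read_len=12)
for read in toy_reads:
    print(read)
```

Because the start positions are random, the reads overlap unpredictably, and that overlap is what the assembly step relies on.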
Assembly: The challenge of eukaryotic genomes
• E. coli genome: 4 million bp
• The human genome: 3 billion bp
• 50% of the genome is repeat sequences!
Assembly of the sequence of each chromosome from end to end
[Lecture break: END, Jan 14; begin]
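Connecting reads comes down to finding where the end of one read matches the start of another. The sketch below shows that single step for two reads. It assumes exact matches with no sequencing errors, and the function names and the `min_overlap` threshold are illustrative choices, not taken from any particular assembler.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least min_overlap bases exists.
    Assumes min_overlap >= 1 and exact (error-free) matching.
    """
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0

def merge_reads(left, right, min_overlap=3):
    """Join two reads on their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]

# Toy reads sharing the 4-base overlap "GTAC".
contig = merge_reads("ATGGCGTAC", "GTACGTTAG")
print(contig)
```

Repeat sequences are a problem for exactly this step: when the same stretch occurs in many places, a read ending in a repeat overlaps equally well with reads from every copy, so the correct join is ambiguous.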
Annotation: Genomics:
• READING genome sequences: robotically do dideoxy-dye data collection
• ASSEMBLY of the sequence: whole genome shotgun OR ordered clones
• ANNOTATION of the sequence: find the genes!
Annotation (10/1/5): Genomics:
• READING genome sequences
• ASSEMBLY of the sequence
• ANNOTATION of the sequence: find the genes!
  - ab initio
  - by evidence
Annotation: For bacterial genomes, ab initio is adequate
• ab initio: “from the beginning” (Hebrew יש מאין, “something from nothing”), from first principles…
• ORFs (open reading frames) make up MOST of a prokaryotic genome
Annotation: ab initio – finding ORFs
• 85-88% of the nucleotides are associated with coding sequence in the bacterial genomes that have been completely sequenced.
• Example: in Escherichia coli there are 4288 genes that have an average of 950 bp of coding sequence and are separated by an average of just 118 bp.
So first, to find genes in prokaryotic DNA, search for ORFs!
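A bare-bones ORF search looks roughly like the sketch below. It assumes ATG is the only start codon and TAA/TAG/TGA are the stop codons, scans three frames on each strand, and reports only the first ATG after each stop. Real gene finders do considerably more. The function names and the `min_codons` parameter are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=60):
    """Return (strand, start, end, orf) tuples for ATG...stop ORFs.

    start/end are 0-based, end-exclusive positions on the strand that was
    scanned (for "-", positions refer to the reverse complement).
    min_codons counts the start codon but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, dna in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for pos in range(frame, len(dna) - 2, 3):
                codon = dna[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos
                elif start is not None and codon in STOP_CODONS:
                    if (pos - start) // 3 >= min_codons:
                        orfs.append((strand, start, pos + 3, dna[start:pos + 3]))
                    start = None
    return orfs

# Toy sequence with one short ORF; min_codons is lowered so it is reported.
toy_seq = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
toy_orfs = find_orfs(toy_seq, min_codons=3)
print(toy_orfs)
```

As a check on the E. coli figures: 950 bp of coding sequence per 950 + 118 = 1068 bp gives a coding fraction of about 0.89, close to the quoted 85-88%.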
Annotation: ab initio – beyond ORFs
• Prokaryotes have short, simple promoters that are easy to recognize.
• Transcriptional terminators often consist of short inverted repeats followed by a run of Ts.
• Therefore, programs that find prokaryotic genes search for:
  - ORFs 60 or more codons long, and codon usage
  - Promoters at the 5' end
  - Terminators at the 3' end
  - Homology to known genes from other prokaryotes
  - Shine-Dalgarno sequences
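The terminator signal described above (a short inverted repeat followed by a run of Ts) can be searched for directly. The sketch below is deliberately naive: it requires a perfect inverted repeat of fixed arm length, and the values for `stem_len`, `max_loop`, and `min_t_run` are arbitrary illustrative defaults rather than biologically tuned thresholds.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_terminator_candidates(seq, stem_len=5, max_loop=6, min_t_run=4):
    """Find perfect inverted repeats immediately followed by a run of Ts.

    Returns (stem_start, t_run_start) pairs, 0-based. The loop between the
    two arms may be 3..max_loop bases long. All thresholds are assumptions.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem_len):
        left_arm = seq[i:i + stem_len]
        wanted_right_arm = reverse_complement(left_arm)
        for loop in range(3, max_loop + 1):
            j = i + stem_len + loop      # start of the right arm
            right_end = j + stem_len     # first base after the right arm
            if right_end + min_t_run > len(seq):
                break                    # no room left for the T run
            has_stem = seq[j:right_end] == wanted_right_arm
            has_t_run = seq[right_end:right_end + min_t_run] == "T" * min_t_run
            if has_stem and has_t_run:
                hits.append((i, right_end))
    return hits

# Toy sequence: arm GCCGC, loop AAAA, arm GCGGC, then six Ts.
toy_terminator = "CC" + "GCCGC" + "AAAA" + "GCGGC" + "TTTTTT" + "GA"
toy_hits = find_terminator_candidates(toy_terminator)
print(toy_hits)
```

The same scanning pattern would apply to the other signals in the list (promoters, Shine-Dalgarno sequences), each with its own motif and expected position relative to the ORF.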
Annotation: ab initio – automated prokaryotic gene finder examples
• Glimmer: Interpolated Markov Model method
• GrailII: Neural Network method
(See BioInfo text – Fig 8.8)
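To give a feel for the Markov-model idea, here is a much simpler stand-in: a fixed-order Markov chain trained on example sequences and used to score a query. This is not Glimmer's algorithm, since an interpolated model blends several context lengths and this sketch uses only one. The training strings, function names, and the pseudocount are all made up for illustration.

```python
import math
from collections import defaultdict

def train_markov(sequences, order=2, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) from training sequences.

    Returns a function prob(context, base). Smoothing assumes a 4-letter
    alphabet (A, C, G, T); unseen contexts fall back to 1/4 per base.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1

    def prob(context, base):
        seen = counts.get(context, {})
        total = sum(seen.values()) + 4 * pseudocount
        return (seen.get(base, 0.0) + pseudocount) / total

    return prob

def log_likelihood(seq, prob, order=2):
    """Sum of log P(base | context) over every position after the first `order`."""
    return sum(math.log(prob(seq[i - order:i], seq[i]))
               for i in range(order, len(seq)))

# Hypothetical toy training sets standing in for coding and non-coding DNA.
coding_model = train_markov(["ATGGCTGCTGCTGCT" * 3])
noncoding_model = train_markov(["ATATATATATATAT" * 3])

query = "GCTGCTGCT"
log_odds = log_likelihood(query, coding_model) - log_likelihood(query, noncoding_model)
print(log_odds > 0)  # positive log-odds favours the "coding" model
```

A gene finder built on this idea would score each candidate ORF under both models and keep the ones that look more like known coding sequence.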
Annotation: results
Annotation: Multicellular eukaryotes Done too 10/1/5
Annotation: 2 ways to annotate eukaryotic genomes:
• ab initio gene finders: work on basic biological principles
  - Open reading frames
  - Codon usage
  - Consensus splice sites
  - Met start codons …..
• Genes based on previous knowledge: EVIDENCE
  - cDNA sequence of the gene’s message
  - cDNA of a closely related gene’s message sequence
  - Protein sequence of the known gene: the same gene’s, the same gene’s from another species, a related gene’s protein…….
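One of the ab initio signals, consensus splice sites, is easy to illustrate. The sketch below assumes the common GT...AG rule (introns start with GT and end with AG), which the slide does not spell out, and it ignores everything else a real gene finder would weigh, such as branch points, reading frame, and site strength. The length limits are arbitrary illustrative values.

```python
def candidate_introns(seq, min_len=20, max_len=1000):
    """List (start, end) spans that begin with GT and end with AG.

    Positions are 0-based and end-exclusive. Assumes the GT...AG rule only;
    min_len and max_len are illustrative, not biological constants.
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptor_ends = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptor_ends
            if min_len <= a - d <= max_len]

def splice_out(seq, intron):
    """Remove one intron span and return the joined exons."""
    start, end = intron
    return seq[:start] + seq[end:]

# Toy gene: exon "ATGGCC", a 24-base intron, exon "GCCTAA".
toy_gene = "ATGGCC" + "GT" + "C" * 20 + "AG" + "GCCTAA"
toy_introns = candidate_introns(toy_gene)
print(toy_introns)
print(splice_out(toy_gene, toy_introns[0]))
```

On real genomic DNA, GT and AG dinucleotides are so frequent that this produces a flood of candidates, which is why evidence such as cDNA sequence is so valuable for eukaryotes.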
• Start and stop site predictions
• Unique identifiers
• Splice site predictions
• Homology-based exon predictions
• Computational exon predictions
• Tracking information
• Consensus gene structure (both strands)
Automatically generated annotation
A zebrafish hit shows a gene-model protein encoded by a 6-exon gene. This gene structure (intron/exon) is seen in other species, as is the protein size. The proteins, if they correspond to MSP in S. gal., are likely to be heavily glycosylated. At least some have a signal peptide.
The zebrafish hit can be viewed down to nucleotide resolution
Genomics:
• READING genome sequences: carry out dideoxy sequencing, 700 bp per read, MAX
• ASSEMBLY of the sequence: connect sequences to make whole chromosomes
• ANNOTATION of the sequence
Genomics:
• READING genome sequences: carry out dideoxy sequencing
• ASSEMBLY of the sequence: connect sequences to make whole chromosomes
• ANNOTATION of the sequence: find the genes!
Annotation: cDNAs & ESTs (Expressed Sequence Tags)
[Figure: RNA target sample → cDNA library → SEQUENCE from primer → end reads (mates)]
Each cDNA provides sequence from the two ends: two ESTs
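The "two ESTs per cDNA" idea can be shown in a few lines. This sketch assumes the 3' read is taken from the opposite strand, so it comes out as the reverse complement of the clone's last bases, and that reads are error-free. The function name `end_reads` and the toy cDNA are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def end_reads(cdna, read_len):
    """Return the pair of end reads (mates) for one cDNA clone.

    The 5' EST is the first read_len bases; the 3' EST is read inward from
    the other end on the opposite strand (an assumption of this sketch).
    """
    five_prime_est = cdna[:read_len]
    three_prime_est = reverse_complement(cdna[-read_len:])
    return five_prime_est, three_prime_est

# Toy cDNA; real end reads would be hundreds of bases long.
toy_cdna = "ATGGCCGCCTAAGGTTCC"
five_est, three_est = end_reads(toy_cdna, read_len=5)
print(five_est, three_est)
```

When the cDNA is longer than two read lengths, the two ESTs do not overlap, but knowing they come from the same clone still ties both ends to one transcript.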
Who Gets Sequenced?
• Models
• Pathogens
• Agriculturals
Protein Structure Database See Swiss-pdb viewer