Assembling a shotgun sequenced BAC clone from Anopheles funestus genome by Irene Kasumba, Faruck Morcos, and Jeffrey Spies Bioinformatics Computing University of Notre Dame
Goal of Project
Gene annotation of a BAC clone from the newly sequenced An. funestus genome.
Genetic engineering/recombinant DNA technology: methods developed to study genes in detail
GENE CLONING: isolating a gene and producing many identical copies of it so that it can be studied in detail.
CLONE GENE INTO A VECTOR
Vector
• A vehicle to transport DNA into a host cell (bacteria) and replicate it there.
• E.g. plasmids, which occur naturally as circular DNA in bacteria, and bacteriophages
• Vectors have:
  • An origin of replication
  • An antibiotic resistance gene
  • A selectable marker
Cloning and Transformation
BAC Clone Assembly
• Original DNA: 150 kb BAC clone (1 contig), too big to be sequenced in a single read
• Break the BAC into random fragments (8-10x coverage)
BAC Clone Assembly
• Fragments of different sizes (2-3 kb) are subcloned into a vector
• Recombinant vector DNA is isolated from bacteria, then 600 bp from each end is sequenced
• A total of about 1760 clones were sequenced from the BAC clone
Slightly modified from Neil Lobo ppt
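As a rough check on the coverage figure, fold coverage can be estimated as (number of reads × read length) / target length. The sketch below is an illustration only; the function name is hypothetical, and it shows two readings of the numbers on the slides (1760 reads, or 1760 clones each read from both ends), before any vector clipping or quality trimming.

```python
def fold_coverage(num_reads, read_length_bp, target_length_bp):
    """Raw fold coverage: total sequenced bases divided by target length."""
    return num_reads * read_length_bp / target_length_bp


BAC_LENGTH_BP = 150_000   # 150 kb BAC clone (from the slides)
READ_LENGTH_BP = 600      # bases read from each end (from the slides)

# Reading 1: the FASTA file holds 1760 reads in total.
print(f"{fold_coverage(1760, READ_LENGTH_BP, BAC_LENGTH_BP):.2f}x")      # 7.04x

# Reading 2: 1760 clones, each sequenced from both ends (assumption).
print(f"{fold_coverage(2 * 1760, READ_LENGTH_BP, BAC_LENGTH_BP):.2f}x")  # 14.08x
```

Usable coverage after trimming would be lower than the raw figure in either case.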
Sequence using plasmid-specific primers
• Forward primer
• Reverse primer
• Plasmid vector: pHos2
Slightly modified from Neil Lobo ppt
1. Clip vector sequence from fragments
• Obtained a FASTA file with 1760 sequences
• Clip the vector sequence using PHRAP or local alignment
Slightly modified from Neil Lobo ppt
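The local-alignment option can be sketched with a small Smith-Waterman implementation that finds the best-matching stretch between a read and the vector and masks it with X. This is a minimal illustration, not the procedure used in the project: the scoring values, the `min_score` threshold, and the short vector string are assumptions (the string is made up and is not the real pHos2 sequence).

```python
def best_local_alignment(read, vector, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman: return (score, start, end) of the best local
    alignment, where read[start:end] is the aligned part of the read."""
    rows, cols = len(read) + 1, len(vector) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if read[i - 1] == vector[j - 1] else mismatch
            s = max(0, score[i - 1][j - 1] + sub,
                    score[i - 1][j] + gap, score[i][j - 1] + gap)
            score[i][j] = s
            if s > best:
                best, best_pos = s, (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    end = i
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if read[i - 1] == vector[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, i, end


def mask_vector(read, vector, min_score=12):
    """Replace the vector-like part of a read with X if it scores well enough."""
    score, start, end = best_local_alignment(read, vector)
    if score < min_score:
        return read
    return read[:start] + "X" * (end - start) + read[end:]


VECTOR = "ACGTTGCAAGCTTGCATGCC"    # made-up vector sequence (assumption)
INSERT = "TTTTTTTTTTGGGGGGGGGG"    # made-up insert sequence (assumption)
read = VECTOR[7:] + INSERT         # read that starts inside the vector
print(mask_vector(read, VECTOR))
```

Masking rather than deleting keeps read coordinates unchanged, which is convenient when the masked reads are passed on to an assembler.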
2. Assemble sequence fragments
Tool used: PHRAP
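To illustrate the idea behind assembly (merging reads that overlap), here is a toy greedy overlap assembler. It is not PHRAP's algorithm: it ignores base qualities, sequencing errors, and reverse-complement reads, and the function names and example reads are made up for this sketch.

```python
def overlap_len(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b
    (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_len(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


toy_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]   # made-up reads
print(greedy_assemble(toy_reads))
```

Reads that share no sufficiently long overlap stay as separate contigs, which is why an assembly of real data can yield several contigs rather than one.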
3. BLAST assembled sequences
• Purpose:
  • Select the actual An. funestus sequences
• How:
  • BLAST all assembled sequences against nr and eliminate non-mosquito sequences (i.e. human, vector, bacteria, etc.)
• Which is An. funestus? Possibly a sequence with no known BLAST hit; probably the longest contig, because of the 8 to 10x coverage
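The selection rule above (drop contigs whose top hit is a contaminant, then prefer the longest remaining contig) can be sketched as follows. The contig records, lengths, hit descriptions, and keyword list are all made up for illustration; in practice the top hits would come from the BLAST reports.

```python
# Keywords that flag a non-mosquito top hit (assumed list, extend as needed).
CONTAMINANT_KEYWORDS = ("homo sapiens", "escherichia coli", "cloning vector")


def is_contaminant(top_hit):
    """True if the top BLAST hit description mentions a contaminant.
    A contig with no hit at all (None) is kept as a candidate."""
    if top_hit is None:
        return False
    description = top_hit.lower()
    return any(keyword in description for keyword in CONTAMINANT_KEYWORDS)


def select_candidate(contigs):
    """Drop contaminant contigs, then return the longest one left (or None)."""
    kept = [c for c in contigs if not is_contaminant(c["top_hit"])]
    return max(kept, key=lambda c: c["length"]) if kept else None


# Hypothetical contigs, for illustration only.
example_contigs = [
    {"id": "contig_a", "length": 4200, "top_hit": "Cloning vector, complete sequence"},
    {"id": "contig_b", "length": 1500, "top_hit": "Escherichia coli K-12 genome"},
    {"id": "contig_c", "length": 90000, "top_hit": None},
    {"id": "contig_d", "length": 3100, "top_hit": "Anopheles gambiae predicted protein"},
]
print(select_candidate(example_contigs)["id"])
```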
4. Gene prediction
• GENSCAN
  • http://genes.mit.edu/GENSCAN.html
  • Change “Print options” to “Predicted CDS and peptides”
• Fgenesh
  • http://www.softberry.com/berry.phtml
  • Select human, Drosophila and An. gambiae
• GeneID
  • http://www1.imim.es/geneid.html
  • Select human and Drosophila
GENSCAN
Fgenesh
GeneID
GeneID
## source-version: geneid v 1.2 -- geneid@imim.es
# Sequence AF1B_consensus_seq10_ctg3 - Length = 92604 bps
# Optimal Gene Structure. 15 genes. Score = 66.16
# Gene 1 (Reverse). 1 exons. 78 aa. Score = 0.58
AF1B_consensus_seq10_ctg3 geneid_v1.2 Single 1308 1541 0.58 - 0 AF1B_consensus_seq10_ctg3_1
# Gene 2 (Forward). 3 exons. 162 aa. Score = 0.96
AF1B_consensus_seq10_ctg3 geneid_v1.2 First 2471 2684 -2.23 + 0 AF1B_consensus_seq10_ctg3_2
AF1B_consensus_seq10_ctg3 geneid_v1.2 Internal 4590 4803 3.53 + 2 AF1B_consensus_seq10_ctg3_2
AF1B_consensus_seq10_ctg3 geneid_v1.2 Terminal 9949 10006 -0.33 + 1 AF1B_consensus_seq10_ctg3_2
# Gene 3 (Forward). 3 exons. 297 aa. Score = 5.97
AF1B_consensus_seq10_ctg3 geneid_v1.2 First 11182 11564 4.65 + 0 AF1B_consensus_seq10_ctg3_3
AF1B_consensus_seq10_ctg3 geneid_v1.2 Internal 15006 15360 0.25 + 1 AF1B_consensus_seq10_ctg3_3
AF1B_consensus_seq10_ctg3 geneid_v1.2 Terminal 15421 15573 1.08 + 0 AF1B_consensus_seq10_ctg3_3
# Gene 4 (Reverse). 5 exons. 314 aa. Score = 5.72
AF1B_consensus_seq10_ctg3 geneid_v1.2 Terminal 22289 22526 3.12 - 1 AF1B_consensus_seq10_ctg3_4
AF1B_consensus_seq10_ctg3 geneid_v1.2 Internal 23735 23882 -0.43 - 2 AF1B_consensus_seq10_ctg3_4
AF1B_consensus_seq10_ctg3 geneid_v1.2 Internal 31511 31568 1.38 - 0 AF1B_consensus_seq10_ctg3_4
AF1B_consensus_seq10_ctg3 geneid_v1.2 Internal 37306 37576 2.40 - 1 AF1B_consensus_seq10_ctg3_4
AF1B_consensus_seq10_ctg3 geneid_v1.2 First 39378 39604 -0.74 - 0 AF1B_consensus_seq10_ctg3_4
# Gene 5 (Forward). 2 exons. 133 aa. Score = 2.22
AF1B_consensus_seq10_ctg3 geneid_v1.2 First 40892 41118 1.49 + 0 AF1B_consensus_seq10_ctg3_5
.
# Gene 15 (Reverse). 1 exons. 42 aa. Score = 0.47
AF1B_consensus_seq10_ctg3 geneid_v1.2 Terminal 91952 92077 0.47 - 0 AF1B_consensus_seq10_ctg3_15
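GeneID output like the listing above can be read into a simple structure for later comparison with other predictors. The sketch below assumes whitespace-separated columns in the order sequence, source, feature, start, end, score, strand, frame, group, as they appear on the slide; the function names are hypothetical. As a sanity check it compares the summed exon length of each gene with three times the reported protein length, using the first three genes from the slide.

```python
import re

GENE_HEADER = re.compile(
    r"# Gene (\d+) \((Forward|Reverse)\)\. (\d+) exons\. (\d+) aa\. Score = (-?[\d.]+)"
)


def parse_geneid(text):
    """Return {gene_number: {"strand", "aa", "exons": [(feature, start, end)]}}."""
    genes, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        header = GENE_HEADER.match(line)
        if header:
            current = int(header.group(1))
            genes[current] = {"strand": header.group(2),
                              "aa": int(header.group(4)), "exons": []}
        elif line and not line.startswith("#") and current is not None:
            fields = line.split()
            if len(fields) >= 9:  # skip anything that is not a feature line
                genes[current]["exons"].append(
                    (fields[2], int(fields[3]), int(fields[4])))
    return genes


def coding_length(gene):
    """Total exon length in bp (coordinates are inclusive)."""
    return sum(end - start + 1 for _, start, end in gene["exons"])


SAMPLE = """\
# Gene 1 (Reverse). 1 exons. 78 aa. Score = 0.58
AF1B_consensus_seq10_ctg3 geneid_v1.2 Single 1308 1541 0.58 - 0 AF1B_consensus_seq10_ctg3_1
# Gene 2 (Forward). 3 exons. 162 aa. Score = 0.96
AF1B_consensus_seq10_ctg3 geneid_v1.2 First 2471 2684 -2.23 + 0 AF1B_consensus_seq10_ctg3_2
AF1B_consensus_seq10_ctg3 geneid_v1.2 Internal 4590 4803 3.53 + 2 AF1B_consensus_seq10_ctg3_2
AF1B_consensus_seq10_ctg3 geneid_v1.2 Terminal 9949 10006 -0.33 + 1 AF1B_consensus_seq10_ctg3_2
# Gene 3 (Forward). 3 exons. 297 aa. Score = 5.97
AF1B_consensus_seq10_ctg3 geneid_v1.2 First 11182 11564 4.65 + 0 AF1B_consensus_seq10_ctg3_3
AF1B_consensus_seq10_ctg3 geneid_v1.2 Internal 15006 15360 0.25 + 1 AF1B_consensus_seq10_ctg3_3
AF1B_consensus_seq10_ctg3 geneid_v1.2 Terminal 15421 15573 1.08 + 0 AF1B_consensus_seq10_ctg3_3
"""

genes = parse_geneid(SAMPLE)
for number, gene in genes.items():
    print(number, gene["strand"], coding_length(gene), 3 * gene["aa"])
```

For these three genes the summed exon length matches three times the reported amino-acid count, which is a quick way to confirm that the coordinates were parsed correctly.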
5. Visualize overlap and select best predictions
• Use WormBase to visualize the overlap between predictions made by the different gene prediction programs: http://wormbase.org/db/seq/frend
• Parser: http://www.nd.edu/~jspies/bio/
WormBase Visualization
6. Select “best” predictions
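One way to make "best" concrete is to count how many programs predict an exon overlapping a given region. The sketch below is an assumption about how such a comparison could be scripted, not the criterion used in the project. The GeneID intervals are the Gene 3 exons from the slide; the GENSCAN and Fgenesh intervals are made up for illustration.

```python
def overlaps(a, b):
    """True if two inclusive (start, end) intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def supporting_programs(exon, predictions):
    """Names of programs with at least one exon overlapping the given exon."""
    return sorted(name for name, exons in predictions.items()
                  if any(overlaps(exon, other) for other in exons))


geneid_gene3 = [(11182, 11564), (15006, 15360), (15421, 15573)]  # from the slide

# Hypothetical predictions from the other two programs.
other_predictions = {
    "GENSCAN": [(11150, 11564), (15006, 15400)],
    "Fgenesh": [(11182, 11500), (20000, 20100)],
}

for exon in geneid_gene3:
    print(exon, supporting_programs(exon, other_predictions))
```

Exons supported by several programs would be stronger candidates for the follow-up BLAST step than exons predicted by one program alone.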
7. BLAST predictions
• Use Ensembl and NCBI
• BLAST proteins
  • nr, Drosophila, An. gambiae
• Use conservative scoring matrices (BLOSUM90) for within-species Ensembl BLAST searches
Gene Identity Determination
Determine the identity/putative function of predicted genes in order to annotate possible genes in An. funestus.
Predicted Gene 12 Ensembl
Ensembl (Dr)
Ensembl Chromosome View (Dr)
Ensembl (Ag)
Ensembl Chromosome View (Ag)
BLAST Conserved Domains
Unknown, but predicted gene
gnl|CDD|16610 pfam00078, RVT, Reverse transcriptase (RNA-dependent DNA polymerase).
3D Structure of RVT
BLAST Hits
Sequence and description                                               Score (bits)   E value
gi|51950578|gb|AAA70222.2| putative ORF2 [Drosophila melanogaste       263            6e-68
gi|6635955|gb|AAF20019.1| pol-like protein [Aedes aegypti]             261            1e-67
gi|11323019|emb|CAC16871.1| pol [Drosophila melanogaster]              251            2e-64
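Hit summary lines like these can be parsed and filtered by E value with a few lines of code. The sketch below assumes each line ends with the bit score and the E value, as on the slide; the function name and the 1e-10 cutoff are illustrative choices, not values from the project.

```python
def parse_hit(line):
    """Split a BLAST summary line into id, title, bit score and E value."""
    description, score, evalue = line.rsplit(None, 2)
    seq_id, _, title = description.partition(" ")
    if evalue.startswith("e"):   # BLAST sometimes prints e.g. "e-100"
        evalue = "1" + evalue
    return {"id": seq_id, "title": title,
            "score": float(score), "evalue": float(evalue)}


HIT_LINES = [
    "gi|51950578|gb|AAA70222.2| putative ORF2 [Drosophila melanogaste 263 6e-68",
    "gi|6635955|gb|AAF20019.1| pol-like protein [Aedes aegypti] 261 1e-67",
    "gi|11323019|emb|CAC16871.1| pol [Drosophila melanogaster] 251 2e-64",
]

hits = [parse_hit(line) for line in HIT_LINES]
significant = [h for h in hits if h["evalue"] <= 1e-10]   # assumed cutoff
for hit in significant:
    print(hit["id"], hit["score"], hit["evalue"])
```

All three hits on the slide pass this cutoff, consistent with the reverse transcriptase domain found in the conserved-domain search.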
Conclusions
• Bioinformatics tools are important for predicting and annotating genes in a newly sequenced genome (e.g. An. funestus)
• It is imperative to perform gene prediction using several programs, because this provides more credible biological insight
Thanks to Neil Lobo.