Learn to identify unknown genes using genome browsers. Explore gene annotation, sequence analysis, and comparative genomics. Discover tools to analyze gene models and predicted sequences. Practice with real-life examples to enhance your bioinformatics skills.
Bioinformatics Workshop 2
Identifying Unknown Genes …
• Open a web browser and type in the URL:
• informatics.gurdon.cam.ac.uk/online/workshops
• Bookmark this page
• Click on the link to the file:
• useful-websites.html
• Bookmark this page too
• It also contains links to the example sequence files used in the workshop, and the presentations themselves
Genome Browsers
Now that most model organisms have had their genomes sequenced, we can get a lot more information about how a gene works than we could by just doing a BLAST search against the protein databases. Even if ‘your’ favourite genome is still just in ‘scaffolds’ and not yet assembled into chromosomes, we can still add a lot of value.
The main task carried out on a genome before it is released to the user community is annotation. In practice this means adding gene models based on known expressed sequences, both from the same organism and from fairly closely related ones, and possibly also purely predicted gene models based on sequence composition analysis and ‘features’ such as start and stop codons and splice sites. Known mapping markers, SNPs and so on are then added as well.
With ~3,000,000,000 nucleotides in the genome sequence (human), this presents a considerable challenge to display on a web browser page, which is of course the preferred option. Most genome browsers (software designed to display genome-based data in a web browser) have taken roughly the same approach, which we’ll take a quick look at…
[Figure: Gene model. Labels: gene model, genome, aligned cDNA, aligned ESTs]
[Figure: Schematic Genome Browser. Mus musculus, chromosome 12. Coordinate ruler from 24000 to 27000; navigate and zoom (+/-) controls; tracks: your sequence, genes, ESTs]
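To make the idea of ‘features’ such as start and stop codons a little more concrete, here is a minimal Python sketch that scans the forward strand of a DNA sequence for open reading frames. It is only an illustration: the function name `find_orfs`, the `min_codons` threshold and the toy sequence are made up for this example, and real gene predictors also use splice sites, sequence composition and expressed-sequence evidence.

```python
# Illustrative sketch only: names, threshold and sequence are assumptions.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end) pairs for forward-strand ORFs.

    Coordinates are 0-based with an exclusive end, and each ORF
    runs from an ATG to the first in-frame stop codon (inclusive).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG after this stop
    return sorted(orfs)

# Toy sequence: CC + ATG AAA TTT TAG + CC
print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```

A scan like this knows nothing about introns, which is one reason why aligned cDNAs and ESTs are such valuable evidence for the gene models shown in a genome browser.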
Exercises
1. Find the web site for the Santa Cruz Genome Browser (sometimes called the Golden Path), and investigate the three genes for which you have the full-length cDNA sequence, or the protein sequence, in the file example-sequences.html
>TNeu084i05
• How many exons does the gene appear to have?
• Has it been mapped already?
• Are there any likely upstream regulatory elements (look for conservation across species)?
• Are there other genes nearby?
>TGas122d03
• Is this a unique gene, or a member of a gene family?
• What can we learn from the comparison with human genes?
• Are there any differences between the gene model predicted from your cDNA and the existing predictions?
>hsp70-5
• This one starts with the protein sequence. How might this be better?
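When a cDNA is aligned to the genome, each aligned block is a candidate exon and the large gaps between blocks are candidate introns. As a rough sketch of how one might count exons from such an alignment, the Python below merges blocks separated by only a few bases (more likely small alignment gaps than introns) and counts the rest. The function name `count_exons`, the `min_intron` cutoff of 30 bases and the coordinates are all hypothetical, chosen only to echo the 24000 to 27000 ruler in the schematic browser.

```python
# Illustrative sketch only: names, cutoff and coordinates are assumptions.
def count_exons(blocks, min_intron=30):
    """Count exons implied by aligned cDNA blocks.

    blocks is a list of (genome_start, genome_end) pairs. Blocks
    separated by fewer than min_intron bases are treated as one exon.
    """
    if not blocks:
        return 0
    blocks = sorted(blocks)
    exons = 1
    prev_end = blocks[0][1]
    for start, end in blocks[1:]:
        if start - prev_end >= min_intron:
            exons += 1  # gap is long enough to be called an intron
        prev_end = max(prev_end, end)
    return exons

# Hypothetical alignment: the 5-base gap is merged, the two long gaps are introns
blocks = [(24000, 24150), (24155, 24300), (25100, 25250), (26800, 27000)]
print(count_exons(blocks))  # 3
```

The browser's own display is still the thing to trust for the exercise; the sketch only shows why the exon count can depend on how small gaps in the alignment are treated.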
Exercises
2. Now go to the two other main genome browsers, Ensembl and NCBI. Find the Xenopus genome, and see if you get the same sort of functionality from them. Use the same two sequences.
• Are there different features?
• Are they easier or harder to use?