Unraveling Genes: Genome Browsers & Unknown Gene Identification

Bioinformatics Workshop 2
Identifying Unknown Genes …
• Open a web browser and type in the URL: informatics.gurdon.cam.ac.uk/online/workshops
• Bookmark this page
• Click on the link to the file: useful-websites.html
• Bookmark this page too
• It also contains links to the example sequence files used in the workshop, and the presentations themselves

Genome Browsers

Now that most model organisms have had their genomes sequenced, we can learn a lot more about how a gene works than by just doing a BLAST search against the protein databases. Even if ‘your’ favourite genome is still just in ‘scaffolds’ and not yet assembled into chromosomes, the genome sequence still adds a lot of value. The main task carried out on a genome before it is released to the user community is annotation. In practice this means adding gene models, based on known expressed sequences from the same organism and from fairly closely related ones, and possibly also purely predicted models based on sequence composition analysis and ‘features’ such as start and stop codons and splice sites. Known mapping markers, SNPs and so on are then added as well. With ~3,000,000,000 nucleotides in the genome sequence (human), this presents a considerable challenge to display on a web browser page, which is of course the preferred option. Most genome browsers (software designed to display genome-based data in a web browser) have taken roughly the same approach, which we’ll take a quick look at…
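To get a feel for the display problem, here is a rough back-of-the-envelope sketch in Python. The 1,000-pixel page width and the 3,000-base window are assumptions chosen for illustration, not properties of any particular browser.

```python
GENOME_SIZE = 3_000_000_000  # approximate human genome size quoted in the text
PAGE_WIDTH_PX = 1_000        # assumed width of the drawing area, in pixels


def bases_per_pixel(window_bp, width_px=PAGE_WIDTH_PX):
    """Return how many bases each horizontal pixel has to represent."""
    return window_bp / width_px


# Whole genome on one page: each pixel would stand for millions of bases.
whole_genome = bases_per_pixel(GENOME_SIZE)

# A gene-sized window (assumed 3,000 bases): a few bases per pixel.
gene_window = bases_per_pixel(3_000)

print(f"whole genome: {whole_genome:,.0f} bases per pixel")
print(f"3 kb window:  {gene_window:,.0f} bases per pixel")
```

Under these assumptions a whole-genome view cannot show individual exons, which is one way to see why browsers show a small window of the genome and let you navigate and zoom.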

[Figure: gene model drawn on the genome, with aligned cDNA and aligned ESTs]

Schematic Genome Browser

[Figure: Mus musculus, chromosome 12. Genome coordinate axis from 24000 to 27000, with navigate and zoom (+/-) controls. Tracks: your sequence, genes, ESTs]
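The track layout in the schematic can be thought of as a set of named lists of intervals, with the browser drawing only the features that overlap the current window. The following is a minimal sketch of that idea, not how any real browser is implemented; the `Feature` class, the feature names and all coordinates are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    start: int  # inclusive genome coordinate
    end: int    # exclusive genome coordinate


def visible(features, win_start, win_end):
    """Return the features overlapping the half-open window [win_start, win_end)."""
    return [f for f in features if f.start < win_end and f.end > win_start]


# Hypothetical tracks; names and coordinates are made up.
tracks = {
    "Your sequence": [Feature("query_cDNA", 24200, 26400)],
    "Genes": [Feature("geneA", 24100, 26500), Feature("geneB", 30000, 32000)],
    "ESTs": [Feature("est1", 23500, 24300), Feature("est2", 27000, 27600)],
}

window = (24000, 27000)  # the coordinate range shown in the schematic
for track_name, features in tracks.items():
    names = [f.name for f in visible(features, *window)]
    print(f"{track_name}: {names}")
```

Navigating corresponds to shifting the window, and zooming to widening or narrowing it; the same overlap test then decides what each track shows.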

How to Use UCSC Browser

Exercises 1.

Find the web site for the Santa Cruz Genome Browser (sometimes called the Golden Path), and investigate the three genes for which you have the full-length cDNA sequence, or the protein sequence, in the file example-sequences.html

>TNeu084i05
• How many exons does the gene appear to have?
• Has it been mapped already?
• Are there any likely upstream regulatory elements (look for conservation across species)?
• Are there other genes nearby?

>TGas122d03
• Is this a relatively unique gene, or a member of a gene family?
• What can we learn from the comparison with human genes?
• Are there any differences between the gene model predicted from your cDNA, and the existing predictions?

>hsp70-5
• This one starts with the protein sequence. How might this be better?
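When a cDNA is aligned to the genome, it typically comes back as a series of aligned blocks separated by gaps, and the larger gaps are candidate introns. As a rough aid to the "how many exons?" question, the sketch below merges blocks separated by only a small gap and counts what remains. The `merge_blocks` helper, the 20-base threshold and the coordinates are all assumptions for illustration; they are not taken from the UCSC browser or from the workshop sequences.

```python
def merge_blocks(blocks, max_gap=20):
    """Merge (start, end) alignment blocks separated by max_gap bases or fewer.

    Small gaps are treated as alignment noise rather than introns.
    The threshold is an arbitrary assumption.
    """
    exons = []
    for start, end in sorted(blocks):
        if exons and start - exons[-1][1] <= max_gap:
            # Close enough to the previous block: extend it.
            exons[-1] = (exons[-1][0], max(exons[-1][1], end))
        else:
            exons.append((start, end))
    return exons


# Hypothetical genomic coordinates of the blocks from one cDNA alignment.
blocks = [(24100, 24250), (24255, 24400), (25000, 25180), (26300, 26500)]

exons = merge_blocks(blocks)
print(f"{len(exons)} candidate exons: {exons}")
```

The count is only a starting point: it is still worth checking each gap against the splice sites and the existing gene predictions shown in the browser.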

Exercise 1. Results >TNeu084i05

Exercises 2. Now go to the two other main genome browsers, Ensembl and NCBI. Find the Xenopus genome, and see if you get the same sort of functionality from them. Use the same two sequences. Are there different features? Are they easier or harder to use?
