UCSC Genome Browser Tutorial. http://genome.ucsc.edu/ http://genome-test.cse.ucsc.edu/ The UCSC Toolset & Portal to the Human Genome. Genome Browser Table Browser. “I was blind and now I can see”. UCSC Genome Browser. [version9a]. http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml.
UCSC Genome Browser Tutorial http://genome.ucsc.edu/ http://genome-test.cse.ucsc.edu/ The UCSC Toolset & Portal to the Human Genome • Genome Browser • Table Browser “I was blind and now I can see” http://cs273a.stanford.edu
UCSC Genome Browser [version9a] http://www.openhelix.com/downloads/ucsc/ucsc_home.shtml http://cs273a.stanford.edu
The UCSC Homepage: http://genome.ucsc.edu • Navigate from the menus • General information • Specific information: new features, current status, etc.
The Genome Browser Gateway start page choices, December 2006 • Make your Gateway choices: • Select Clade • Select species: search 1 species at a time • Assembly: the official backbone DNA sequence. Practically speaking, there is no such thing as a genome; there is only a genome assembly. Assemblies update frequently. Think moving target...
Everything in Genomics is a Moving Target • The genomes • Their annotations • The Portals • Our understanding of Biology Conclusion: write code that can be run... and rerun and rerun and rerun and rerun
The Genome Browser Gateway start page choices, December 2006 • Make your Gateway choices: • Select Clade • Select species: search 1 species at a time • Assembly: the official backbone DNA sequence • Position: location in the genome to examine • Image width: how many pixels in display window; 5000 max • Configure: make fonts bigger + other choices
The Genome Browser Gateway start page, basic search • Use this Gateway to search by: • Gene names, symbols • Chromosome number: chr7, or region: chr11:1038475-1075482 • Keywords: kinase, receptor • IDs: NP, NM, OMIM, and more… • See the lower part of the page for help with format: helpful search examples and suggestions appear below the text/ID search box
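The chromosome and region formats listed above are simple enough to validate in a script before sending them to the browser. The following is a minimal sketch, not part of the original slides: the name `parse_position` is hypothetical, it handles only the two forms shown on this slide (`chr7` and `chr11:1038475-1075482`), and tolerating commas inside the numbers is an added convenience.

```python
import re
from typing import NamedTuple, Optional


class Position(NamedTuple):
    chrom: str
    start: Optional[int]  # None when only a chromosome was given
    end: Optional[int]


# Matches "chrN" or "chrN:start-end"; digits may contain commas.
_POSITION_RE = re.compile(r"^(chr\w+)(?::([\d,]+)-([\d,]+))?$")


def parse_position(text: str) -> Position:
    """Parse 'chr7' or 'chr11:1038475-1075482' into its parts."""
    match = _POSITION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chromosome or region: {text!r}")
    chrom, start, end = match.groups()
    if start is None:
        return Position(chrom, None, None)
    start_i = int(start.replace(",", ""))
    end_i = int(end.replace(",", ""))
    if start_i > end_i:
        raise ValueError(f"start is after end in {text!r}")
    return Position(chrom, start_i, end_i)


print(parse_position("chr7"))
print(parse_position("chr11:1038475-1075482"))
```

Keywords such as `kinase` and IDs such as NM accessions are rejected by this parser on purpose; those searches are resolved by the Gateway itself.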
The Genome Browser Gateway sample search for Human TP53 • Sample search: human, March 2006 assembly, tp53 • Select from results list • ID search may go right to a viewer page, if unique
Overview of the whole Genome Browser page (mature release) • Genome viewer section • Groups of data: • Mapping and Sequencing Tracks • Genes and Gene Prediction Tracks • mRNA and EST Tracks • Expression and Regulation • Comparative Genomics • ENCODE Tracks • Variation and Repeats
Different species, different tracks, same software • Species may have different data tracks • Layout, software, functions the same
Sample Genome Viewer image, TP53 region • Tracks shown: base position, STS markers, Known genes, RefSeq genes, GenBank seqs, 17 species compared, single species compared, SNPs, repeats
Visual Cues on the Genome Browser • Tick marks: a single location (STS, SNP) • Gene items: exons, 5' UTR and 3' UTR, joined by introns; arrowheads on the intron line (<<< or >>>) show the direction of transcription • Track colors may have meaning. For example, in the Known Gene track: • Black = there is a corresponding PDB entry • Dark blue = there is a corresponding NCBI Reviewed seq • Light blue = there is a corresponding NCBI Provisional seq • For some tracks, the height of a bar indicates increased likelihood of an evolutionary relationship (conservation track)
Options for Changing Images: Upper Section • Change your view or location with controls at the top: walk left or right, zoom in, zoom out, specify a position • Use “base” to get right down to the nucleotides • Configure: to change font, window size, more… • Click to zoom 3x and re-center
Annotation Track display options • Some tracks are ON by default, others are OFF • Menu links lead to info about the tracks (content, methods) and/or filters • You change the track view with pulldown menus • After making changes, REFRESH to enforce the change
Annotation Track options, defined • Dense: all items collapsed into a single line • Squish: each item = separate line, but 50% height + packed • Pack: each item separate, but efficiently stacked (full height) • Full: each item on separate line • Hide: removes a track from view
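The same display modes can be set in a link, which makes a view reproducible. A minimal sketch, assuming the `hgTracks` URL parameters `db`, `position`, and `<track>=<mode>`; the helper name `browser_url` is hypothetical, and the track names `knownGene` and `refGene` are assumptions that vary by assembly (`hg18` is the March 2006 human assembly).

```python
from urllib.parse import urlencode

# The five display modes defined on this slide.
VISIBILITY_MODES = {"hide", "dense", "squish", "pack", "full"}
BASE_URL = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(db, position, tracks):
    """Build a Genome Browser link that sets a display mode per track.

    `tracks` maps a track name to one of VISIBILITY_MODES.
    """
    for track, mode in tracks.items():
        if mode not in VISIBILITY_MODES:
            raise ValueError(f"unknown display mode for {track}: {mode!r}")
    params = {"db": db, "position": position, **tracks}
    return f"{BASE_URL}?{urlencode(params)}"


# Track names below are assumptions; check the track list for your assembly.
print(browser_url("hg18", "chr11:1038475-1075482",
                  {"knownGene": "pack", "refGene": "dense"}))
```

Because the browser remembers earlier settings, a link like this changes only the tracks it names and leaves every other track as it was.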
Reset, Hide, Configure or Refresh to change settings • You control the views • Use pulldown menus • Configure options page • Refresh: enforce any changes (hide, full, squish…) • Reset: back to defaults; start from scratch
Annotation Track options, if altered… Important point: the browser remembers! To clear your “cart” or parameters, click default tracks OR • Session information (the position you were examining) • Track choices (squish, pack, full, etc.) • Filter parameters (if you changed the colors of any items, or the subset to be displayed) • …are all saved on your computer. When you come back in a couple of days to use it again, these will still be set. You may, or may not, intend this.
Click Any Viewer Object for Details • Example: click your mouse anywhere on the TP53 line • A new web page opens with many details and links to more data about TP53
Click annotation track item for details pages • informative description • other resource links • links to sequences • microarray data • mRNA secondary structure • protein domains/structure • homologs in other species • Gene Ontology™ descriptions • mRNA descriptions • pathways • Not all genes have this much detail. Different annotation tracks carry different data.
Get DNA, with Extended Case/Color Options • Use the DNA link at the top • Plain or Extended options • Change colors, fonts, etc.
Get Sequence from Details Pages • Click a track item (the line) • Go to the Sequence section of its details page
Accessing the BLAT tool • Rapid searches by INDEXING the entire genome • Works best with high similarity matches • See documentation and publication for details • Kent, WJ. Genome Res. 2002. 12:656 BLAT = BLAST-like Alignment Tool
BLAT tool overview: www.openhelix.com/sampleseqs.html • Make choices • Paste one or more sequences • DNA limit: 25000 bases • Protein limit: 10000 aa • 25 total sequences • Or upload • Submit
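A batch can be checked against these limits before it is pasted into the form. This is a sketch under stated assumptions: the slide does not say whether the base and amino-acid limits apply per sequence or per submission, so the code treats them as per-sequence limits; `read_fasta` and `check_blat_batch` are hypothetical helper names.

```python
# Limits as listed on the slide; treated as per-sequence limits (an assumption).
MAX_DNA_BASES = 25000
MAX_PROTEIN_AA = 10000
MAX_SEQUENCES = 25


def read_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_blat_batch(records, is_protein=False):
    """Return a list of problems; an empty list means the batch fits."""
    limit = MAX_PROTEIN_AA if is_protein else MAX_DNA_BASES
    unit = "aa" if is_protein else "bases"
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences, limit is {MAX_SEQUENCES}")
    for name, seq in records:
        if len(seq) > limit:
            problems.append(f"{name}: {len(seq)} {unit}, limit is {limit}")
    return problems


batch = read_fasta(">query1\nACGTACGT\nACGT\n>query2\nGGCC\n")
print(check_blat_batch(batch))
```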
BLAT results, with links • Results with demo sequences, settings default; sort = Query, Score • Score is a count of matches: higher number, better match • Click browser to go to Genome Browser image location (next slide) • Click details to see the alignment to genomic sequence (2nd slide)
BLAT results, browser link • From the BLAT results, click browser • A new line with Your Sequence from BLAT Search appears! • Base position = full and zoomed in enough to see amino acids • Watch out for reading frame! Click - - - > to flip frame
BLAT results, alignment details • Your query • Genomic match, color cues • Side-by-side alignment: yours, genomic
Understand BLAT’s Limitations • BLAT was designed to rapidly align sequence from one genome back to itself (e.g., EST/cDNA data) • It can and does miss clear hits at times • BLAT actually allows for a single mismatch, but it also removes k-mers with excessive counts for efficiency • Not suitable for cross-species mapping
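A toy illustration of why removing over-represented k-mers can cause missed hits. This is a simplified sketch, not BLAT's implementation: it indexes non-overlapping k-mers of a tiny "genome", drops k-mers that occur more than `max_count` times, and looks up every overlapping k-mer of the query. The values of `k` and `max_count` and both function names are illustrative assumptions, and the mismatch tolerance and alignment extension steps are left out.

```python
from collections import defaultdict


def build_index(genome, k, max_count):
    """Index non-overlapping k-mers, dropping over-represented ones."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    # k-mers seen too often are removed for efficiency.
    return {kmer: hits for kmer, hits in index.items()
            if len(hits) <= max_count}


def seed_hits(index, query, k):
    """Return (query_pos, genome_pos) pairs for exact k-mer seed matches."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, gpos))
    return hits


genome = "ACGTACGTTTGACCAT"
index = build_index(genome, k=4, max_count=1)

# This query is found through the surviving k-mer "TTGA".
print(seed_hits(index, "GTTTGACC", k=4))
# "ACGT" occurs twice in the index, so it was dropped: a clear hit is missed.
print(seed_hits(index, "ACGT", k=4))
```

The second query is present in the genome twice, yet returns no seeds, which is the trade-off the slide describes.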
Bibliography: • http://genome.ucsc.edu/goldenPath/pubs.html • The UCSC Genome Browser Database: update 2008, update 2007, and earlier. • UCSC Genome Browser Tutorial • UCSC Genome Browser: Deep support for molecular biomedical research • The UCSC Known Genes, 2006. • The UCSC Gene Sorter, 2007. • Piloting the Zebrafish Genome Browser, 2006.
Genome Browser Database • Visualize • Search & download • Underlying database (MySQL) • Primary table: positions, names, etc. • Auxiliary table: related data
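The primary/auxiliary split can be pictured as two tables joined on an item name. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere; the table names `primary_track` and `auxiliary_info`, their columns, and all rows are made up for illustration and are not the actual UCSC schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE primary_track (
    name TEXT, chrom TEXT, chromStart INTEGER, chromEnd INTEGER);
CREATE TABLE auxiliary_info (name TEXT, description TEXT);
-- Toy rows, not real annotations.
INSERT INTO primary_track VALUES
    ('geneA', 'chr4', 100, 500),
    ('geneB', 'chr4', 900, 1500),
    ('geneC', 'chr7', 50, 250);
INSERT INTO auxiliary_info VALUES
    ('geneA', 'toy kinase'),
    ('geneB', 'toy receptor');
""")


def items_in_region(conn, chrom, start, end):
    """Positions come from the primary table, details from the auxiliary one."""
    query = """
        SELECT p.name, p.chromStart, p.chromEnd, a.description
        FROM primary_track AS p
        LEFT JOIN auxiliary_info AS a ON a.name = p.name
        WHERE p.chrom = ? AND p.chromStart < ? AND p.chromEnd > ?
        ORDER BY p.chromStart
    """
    return conn.execute(query, (chrom, end, start)).fetchall()


print(items_in_region(conn, "chr4", 0, 1000))
```

The `LEFT JOIN` keeps items that have a position but no auxiliary record, which matches the earlier point that not every item carries the same amount of detail.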
The Table Browser • Open browser: http://genome.ucsc.edu/
Table Browser: Choose Genome • Task: In the Human genome (hg16), search for simple repeats on a chromosome 4 location with copy number more than 10 and download the sequence.
Table Browser: Choose Table to Search • Choose Data Table • Task: In the Human genome (hg16), search for simple repeats on a chromosome 4 location with copy number more than 10 and download the sequence.
Table Browser: Describe Table Describe table
Table Browser: Choose Region to Search • Task: In the Human genome (hg16), search for simple repeats on a chromosome 4 location with copy number more than 10 and download the sequence.
Table Browser: Upload Locations to Search Paste Upload
Table Browser: Filter to Refine Search • Create Filter • Submit Filter • Task: In the Human genome (hg16), search for simple repeats on a chromosome 4 location with copy number more than 10 and download the sequence.
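The filter step in this task amounts to two conditions: the repeat lies in the chosen chromosome 4 region, and its copy number exceeds 10. A minimal sketch of the same logic on downloaded rows, assuming field names modeled on the simple repeats table (`chrom`, `chromStart`, `chromEnd`, `name`, `copyNum`); the rows, the region, and the helper name `filter_repeats` are invented for illustration.

```python
from typing import NamedTuple


class SimpleRepeat(NamedTuple):
    chrom: str
    chromStart: int
    chromEnd: int
    name: str
    copyNum: float


def filter_repeats(rows, chrom, start, end, min_copy_num):
    """Keep repeats overlapping chrom:start-end with copyNum above the cutoff."""
    return [
        row for row in rows
        if row.chrom == chrom
        and row.chromStart < end
        and row.chromEnd > start
        and row.copyNum > min_copy_num
    ]


# Toy rows, not real annotations.
rows = [
    SimpleRepeat("chr4", 1000, 1060, "repeat1", 30.0),
    SimpleRepeat("chr4", 2000, 2020, "repeat2", 4.5),
    SimpleRepeat("chr4", 9000, 9100, "repeat3", 25.0),
    SimpleRepeat("chr7", 1000, 1040, "repeat4", 20.0),
]

for row in filter_repeats(rows, "chr4", 500, 5000, min_copy_num=10):
    print(row)
```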
Table Browser: Output Data • Task: In the Human genome (hg16), search for simple repeats on a chromosome 4 location with copy number more than 10 and download the sequence.
Table Browser: Output Formats Text Fields Output formats
Table Browser: Fasta Sequence Output Sequence
Table Browser: Custom Track Output Custom Track
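For orientation, a custom track is plain text: a `track` line carrying display attributes, followed by one line per item. The sketch below writes a minimal four-column BED-style track; the helper name `custom_track` and the sample item are hypothetical, and only the `name`, `description`, and `visibility` attributes are used.

```python
def custom_track(name, description, items, visibility="pack"):
    """Return custom-track text: a track line plus tab-separated BED rows.

    `items` is an iterable of (chrom, start, end, label) tuples.
    """
    lines = [
        f'track name="{name}" description="{description}" '
        f"visibility={visibility}"
    ]
    for chrom, start, end, label in items:
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


# The item below is a toy example, not a real annotation.
text = custom_track("repeats", "Filtered simple repeats",
                    [("chr4", 1000, 1060, "repeat1")])
print(text, end="")
```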
Table Browser: Hyperlinks Output Hyperlinks
Table Browser: Obtaining Output • Entering an output file name creates a file on your desktop; leaving the name blank shows the output in the browser (exception: custom track) • Data Summary
Table Browser: Output configuration Sequence Format Get Sequence
Table Browser: Intersecting Data • Task: Find simple repeats (copy number > 10) within known genes and download the sequence. • Intersect • 2nd Table • Any Overlap • Submit
Table Browser: Intersecting Data Narrows Search • Summary: filtered simple repeats • Summary: filtered simple repeats, intersected (overlapping) w/ known genes
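The "any overlap" intersection can be sketched as an interval test: a repeat is kept if it shares at least one base with any gene on the same chromosome. This is an illustration of the idea rather than the Table Browser's own code; intervals are treated as half-open (`start` inclusive, `end` exclusive), the function names are hypothetical, and the rows are invented.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def intersect_any_overlap(repeats, genes):
    """Keep each repeat that overlaps at least one gene on its chromosome.

    Both inputs are lists of (chrom, start, end, name) tuples.
    """
    kept = []
    for chrom, start, end, name in repeats:
        if any(g_chrom == chrom and overlaps(start, end, g_start, g_end)
               for g_chrom, g_start, g_end, _ in genes):
            kept.append((chrom, start, end, name))
    return kept


# Toy rows, not real annotations.
repeats = [("chr4", 1000, 1060, "repeat1"),
           ("chr4", 9000, 9100, "repeat3")]
genes = [("chr4", 800, 1200, "geneA"),
         ("chr7", 9000, 9500, "geneC")]

print(intersect_any_overlap(repeats, genes))
```

The nested scan compares every repeat with every gene, which is fine for a region-sized download; a genome-wide intersection would call for sorted inputs or an interval index.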