Rice Sequence and Map Analysis Leonid Teytelman


Presentation Transcript


  1. Rice Sequence and Map Analysis Leonid Teytelman

  2. Rice Genome Annotation
       • Sequence Alignments
       • Automation
     Comparative Maps
       • Genetic Marker Correspondences
       • FPC Map
       • FPC I-Map
     EnsEMBL Pipeline
       • Automated Annotation
       • Compute Farms

  3. Rice Genome Annotation

  4. Aligned Data Sets:
     • Rice Coding Sequences
       • Rice Complete CDSs
       • Rice TIGR GIs
       • Rice BGI EST Clusters
       • Rice dbEST ESTs
       • Rice BGI ESTs
     • Non-Rice Coding Sequences
       • Maize Unigene Clusters
       • Maize TIGR GIs
       • Maize dbEST ESTs
       • Barley dbEST ESTs
       • Wheat dbEST ESTs
       • Sorghum dbEST ESTs
     • Rice CUGI BAC ends
     • Rice JRGP/Cornell RFLP Markers
     • Rice Cornell SSRs

  5. Alignment Tools:
     • BLAT: search & alignment
     • pslReps: filtering of low-quality matches
     • e-PCR: matches based on near-identity to the PCR primers, and correct order
     [Diagram labels: Target, Queries]

  7. Alignment Methods:
     • Rice Coding Sequences:
       • BLAT search & alignment
       • pslReps filtering of repetitive matches
       • Accept based on percent of EST length matched
     • Non-Rice Coding Sequences:
       • BLAT search & alignment
       • pslReps filtering of repetitive matches
       • Accept based on hit length and hit frequency
     • Rice BAC ends:
       • BLAT search & alignment
       • Accept based on gap length, percent of BAC end length matched, percent identity, and hit frequency
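A minimal sketch of the "accept based on percent of length matched" step, assuming BLAT results in the standard 21-column tab-separated PSL format. The slides do not give cutoffs, so the `min_coverage` and `min_identity` values below are hypothetical placeholders, and the example record is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PslHit:
    """The few PSL columns needed for a coverage/identity filter."""
    matches: int
    mismatches: int
    rep_matches: int
    q_name: str
    q_size: int
    t_name: str

    @property
    def aligned(self) -> int:
        # Bases of the query that fall inside aligned blocks.
        return self.matches + self.mismatches + self.rep_matches

    @property
    def coverage(self) -> float:
        # Fraction of the query (e.g. EST) length that was matched.
        return self.aligned / self.q_size if self.q_size else 0.0

    @property
    def identity(self) -> float:
        # Fraction of aligned bases that are identical.
        return (self.matches + self.rep_matches) / self.aligned if self.aligned else 0.0


def parse_psl_line(line: str) -> PslHit:
    """Parse one headerless PSL record (columns 1-3, 10, 11 and 14)."""
    f = line.rstrip("\n").split("\t")
    return PslHit(int(f[0]), int(f[1]), int(f[2]), f[9], int(f[10]), f[13])


def accept(hit: PslHit, min_coverage: float = 0.8, min_identity: float = 0.95) -> bool:
    # Thresholds are assumptions for illustration, not values from the slides.
    return hit.coverage >= min_coverage and hit.identity >= min_identity


# Made-up record: 460 of 500 query bases aligned, 450 of them identical.
EXAMPLE_LINE = "\t".join([
    "450", "10", "0", "0", "1", "5", "2", "300", "+",
    "EST001", "500", "0", "465",
    "BAC_A", "150000", "1000", "1760",
    "3", "200,160,100,", "0,200,365,", "1000,1300,1660,",
])

if __name__ == "__main__":
    hit = parse_psl_line(EXAMPLE_LINE)
    print(hit.q_name, round(hit.coverage, 3), round(hit.identity, 3), accept(hit))
```

The other criteria on the slide (hit length, hit frequency, gap length) could be added as further predicates over the same parsed record.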

  8. Alignment Methods:
     • Rice Markers:
       • BLAT search & alignment
       • Accept based on percent of marker length matched and the gap length in case of genomic markers.
       • Utilize genetic map information; accept those whose genetic & physical chromosome assignment is concordant.
     • Rice SSRs:
       • e-PCR with default parameters, allowing 0 mismatches in the primers
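The concordance rule for markers can be sketched as a simple filter. This is an illustration only: the marker names and the data layout (a list of hits plus a dictionary of genetic-map assignments) are assumptions, not the talk's actual data structures.

```python
def concordant_hits(hits, genetic_chromosome):
    """Keep hits whose physical chromosome matches the genetic-map chromosome.

    hits: iterable of (marker_name, physical_chromosome) pairs.
    genetic_chromosome: dict mapping marker_name -> chromosome on the genetic map.
    Markers absent from the genetic map are dropped, since concordance
    cannot be checked for them.
    """
    return [
        (marker, chrom)
        for marker, chrom in hits
        if marker in genetic_chromosome and genetic_chromosome[marker] == chrom
    ]


if __name__ == "__main__":
    # Hypothetical marker names, for illustration only.
    genetic = {"marker_A": 1, "marker_B": 3}
    hits = [("marker_A", 1), ("marker_A", 7), ("marker_B", 3), ("marker_C", 2)]
    print(concordant_hits(hits, genetic))
```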

  9. February 2002 BAC/PAC Dataset
     Total BACs/PACs: 1,847
     Total bp: 250,879,896 (approximately 251 Mb)
     Phase 1: 78
     Phase 2: 1,238
     Phase 3: 531
     Annotated Phase 3: 330
     Annotated Genes: 8,034

  10. Alignment Totals

  11. Automating Alignments:
      • For each group of data sets, there is a script to automatically:
        • Run pslReps
        • Load results into the database
        • Discard low-quality matches
        • Update documentation
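A rough sketch of what one such per-data-set script might look like. It is not the talk's actual script: `sqlite3` stands in for whatever database was used, the table layout and the `min_coverage` cutoff are hypothetical, and the `-minCover` option should be checked against the usage message of the locally installed pslReps.

```python
import sqlite3


def pslreps_command(in_psl, out_psl, out_psr, min_cover=None):
    """Build the pslReps command line: pslReps [options] in.psl out.psl out.psr."""
    cmd = ["pslReps"]
    if min_cover is not None:
        cmd.append(f"-minCover={min_cover}")
    return cmd + [in_psl, out_psl, out_psr]


def discard_low_quality(rows, min_coverage=0.8):
    """rows: (query, target, coverage) tuples; the cutoff is a placeholder."""
    return [r for r in rows if r[2] >= min_coverage]


def load_hits(conn, dataset, rows):
    """Insert the accepted hits and return how many rows the data set now has."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alignment "
        "(dataset TEXT, query TEXT, target TEXT, coverage REAL)"
    )
    conn.executemany(
        "INSERT INTO alignment VALUES (?, ?, ?, ?)",
        [(dataset, q, t, c) for q, t, c in rows],
    )
    conn.commit()
    cur = conn.execute("SELECT COUNT(*) FROM alignment WHERE dataset = ?", (dataset,))
    return cur.fetchone()[0]


if __name__ == "__main__":
    print(pslreps_command("rice_est.psl", "rice_est.filtered.psl", "rice_est.psr", 0.5))
    conn = sqlite3.connect(":memory:")
    rows = [("EST001", "BAC_A", 0.92), ("EST002", "BAC_B", 0.40)]
    print(load_hits(conn, "rice_est", discard_low_quality(rows)))
```

In a real driver the command list would be passed to `subprocess.run(cmd, check=True)` before parsing the filtered output.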

  12. Comparative Maps

  13. Map Correspondences
      Same marker in multiple mapping studies:
      • Name identity
      • Curated evidence
      • Sequence-based correspondences for JRGP and Cornell markers:
        • BLAT search & alignment
        • Utilize genetic mapping information, accepting matches on the same chromosome and less than 30 cM apart.
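The sequence-based acceptance rule (same chromosome, less than 30 cM apart) can be written as a small predicate. The 30 cM limit is from the slide; the tuple layout for a marker position is an assumption made for this sketch.

```python
MAX_CM_APART = 30.0  # limit stated on the slide


def is_correspondence(pos_a, pos_b, max_cm=MAX_CM_APART):
    """Decide whether two sequence-matched markers count as corresponding.

    pos_a, pos_b: (chromosome, position_in_cM) of each marker on its own map.
    """
    chrom_a, cm_a = pos_a
    chrom_b, cm_b = pos_b
    return chrom_a == chrom_b and abs(cm_a - cm_b) < max_cm


if __name__ == "__main__":
    print(is_correspondence((1, 42.0), (1, 55.5)))   # same chromosome, 13.5 cM apart
    print(is_correspondence((1, 42.0), (1, 90.0)))   # same chromosome, 48 cM apart
    print(is_correspondence((1, 42.0), (2, 42.0)))   # different chromosomes
```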

  14. [Figure legend: curator, same name, sequence-based]

  15. [Figure legend: same name, curator]

  16. FPC data from CUGI, synchronized with the latest release.

  17. Discordant

  18. Cornell/JRGP markers mapped to sequenced clones were assigned positions on the FPC contigs.

  19. Total: 2,272 4,417

  20. EnsEMBL Pipeline in a Nutshell

  21. EnsEMBL Pipeline Overview
      • System for automated genome annotation
      • Executes and keeps track of computational jobs
      • Analysis job execution is serial, allowing stage dependencies
      • Jobs are user-defined
      • Can take advantage of a compute farm
      [Diagram labels: RepeatMasker, Genscan, Blast, GenomeBuilder, Hmmer, RepeatMasker, BLAT, GeneWise, Hmmer]

  22. Organization
      • Utilizes and expands on the EnsEMBL-core modules and database schema
      • Database stores:
        • analysis program names and parameters
        • analysis results
        • rules for job dependencies
        • progress status for each job
      • Perl modules:
        • access the database
        • execute specified analysis programs
        • parse the analysis results and load them into the database
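The idea of stored dependency rules plus a per-job status can be illustrated with a toy scheduler. This is not the EnsEMBL pipeline's Perl code or schema: the dictionaries stand in for database tables, and the dependency chain in the example is an assumption for illustration (the slides list the analysis names but not their rules).

```python
def run_pipeline(rules, runners):
    """Run analyses serially, each only after its prerequisites are done.

    rules: dict mapping analysis name -> list of prerequisite analysis names
           (every prerequisite must itself be a key of rules).
    runners: dict mapping analysis name -> zero-argument callable.
    Returns (execution_order, status). Analyses caught in a dependency
    cycle never become runnable and stay "pending".
    """
    status = {name: "pending" for name in rules}
    order = []
    progressed = True
    while progressed:
        progressed = False
        for name, deps in rules.items():
            if status[name] == "pending" and all(status[d] == "done" for d in deps):
                runners[name]()
                status[name] = "done"
                order.append(name)
                progressed = True
    return order, status


if __name__ == "__main__":
    # Hypothetical rule set: Genscan waits for RepeatMasker, Blast for Genscan.
    rules = {"Blast": ["Genscan"], "Genscan": ["RepeatMasker"], "RepeatMasker": []}
    runners = {name: (lambda n=name: print("running", n)) for name in rules}
    print(run_pipeline(rules, runners))
```

Keeping the rules as data rather than code mirrors the slide's point that jobs and their dependencies are user-defined.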

  23. Cluster Utilization
      • How to split up tasks?
        • Contig-by-contig approach
      • How to execute jobs on slave nodes?
        • Load management and scheduling (LSF, PBS, etc.)
      • Management of management:
        • Automatic job submission
        • Error/completion checking
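A sketch of the contig-by-contig approach with LSF, assuming one `bsub` submission per contig. The wrapper script name `run_analysis.pl`, its `--analysis` and `--contig` options, the queue name and the contig names are all hypothetical; only the `bsub` options `-q`, `-J`, `-o` and `-e` are standard LSF. The commands are built but not executed here.

```python
def bsub_commands(contigs, analysis, queue="normal", script="run_analysis.pl"):
    """Build one LSF submission command per contig."""
    commands = []
    for contig in contigs:
        job = f"{analysis}.{contig}"
        commands.append([
            "bsub", "-q", queue, "-J", job,
            "-o", f"{job}.out", "-e", f"{job}.err",
            script, "--analysis", analysis, "--contig", contig,
        ])
    return commands


def unfinished(contigs, completed):
    """Completion check: contigs whose jobs still need to be (re)submitted."""
    done = set(completed)
    return [c for c in contigs if c not in done]


if __name__ == "__main__":
    contigs = ["contig_1", "contig_2", "contig_3"]
    for cmd in bsub_commands(contigs, "RepeatMasker"):
        print(" ".join(cmd))
    print(unfinished(contigs, completed=["contig_1", "contig_3"]))
```

Resubmitting whatever `unfinished` returns is one simple way to cover both the automatic-submission and the error/completion-checking bullets.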
