Cotton Marker Database (CMD) for Genetic and Genome Research
www.cottonmarker.org
Anna Blenda, Pengfei Xuan, David Camak, Feng Luo, Don Jones (presenting author)
ICGI-2010, Canberra, Australia
CMD Objectives
• Collect and integrate all public cotton molecular markers (SSRs and SNPs) as a cotton community resource.
• Accelerate utilization of molecular markers in cotton breeding.
• Provide data retrieval and search tools.
• Provide stand-alone data mining tools.
• Facilitate collaboration domestically and internationally.
• The CMD database resource is available for deposit and data mining of cotton genome sequencing data.
CMD New Features
1. Primer Redundancy
2. Traits
3. Updated CMap Viewer with QTLs
4. New System Platform: powerful computing
5. Future/Work in Progress
1. Primer Redundancy
View section: SSR Projects SNP Projects SSRs Markers Homology SSR Primers Primer Redundancy Panel Publications Maps
Importance of the Primer Redundancy Check:
• Initial step in analyzing redundancy in the CMD cotton SSR collection.
• Avoids generating redundant markers.
• The financial component is critical: money is spent on non-redundant SSR markers only.
• Direct effect on the efficiency of molecular breeding research.
Primer Redundancy Summary Page:
• 18,002 primer sequences analyzed;
• 2,570 (14.2%) redundant primer sequences;
• Types of primer sequence match:
  - forward-forward;
  - reverse-reverse;
  - forward-reverse;
  - reverse-forward.
Threshold value for primer sequence match: 81%
• The threshold (a sequence match of 81% or higher) was chosen based on threshold value analyses covering 70% to 100% match;
• below an 81% match, primer redundancy increases dramatically.
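As a rough illustration of the check described above, the sketch below compares every pair of markers in the four orientations and reports pairs at or above the 81% threshold. The slides do not say how the percent match was computed, so `percent_match` (ungapped, position-by-position identity relative to the longer primer) is an assumption, and the marker names and primer sequences are invented placeholders rather than CMD data.

```python
from itertools import combinations

THRESHOLD = 81.0  # percent match cutoff reported on the slide


def percent_match(a: str, b: str) -> float:
    """Ungapped, position-by-position identity as a percentage of the longer primer.

    Assumption: the slides do not state the actual matching method.
    """
    a, b = a.upper(), b.upper()
    if not a or not b:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / max(len(a), len(b))


def redundant_pairs(markers: dict, threshold: float = THRESHOLD) -> list:
    """Compare every pair of markers in the four orientations.

    markers maps a marker name to a (forward, reverse) primer tuple.
    Returns (marker1, marker2, match_type, percent) for each hit.
    """
    hits = []
    for (n1, (f1, r1)), (n2, (f2, r2)) in combinations(markers.items(), 2):
        comparisons = {
            "forward-forward": (f1, f2),
            "reverse-reverse": (r1, r2),
            "forward-reverse": (f1, r2),
            "reverse-forward": (r1, f2),
        }
        for match_type, (p, q) in comparisons.items():
            pct = percent_match(p, q)
            if pct >= threshold:
                hits.append((n1, n2, match_type, round(pct, 1)))
    return hits


# Hypothetical primers, invented for illustration only.
demo = {
    "SSR_A": ("ACGTACGTACGTACGTACGT", "TTGGCCAATTGGCCAATTGG"),
    "SSR_B": ("ACGTACGTACGTACGTACGA", "GGGGGCCCCCAAAAATTTTT"),
    "SSR_C": ("CCCCCCCCCCAAAAAAAAAA", "TGTGTGTGTGACACACACAC"),
}

for hit in redundant_pairs(demo):
    print(hit)
```

Re-running `redundant_pairs` with thresholds between 70 and 100 would reproduce the kind of threshold analysis mentioned on the slide, although a production check would more likely use a proper alignment tool than this simple identity count.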
Downloads Page
CMD SSR Primer Redundancy results are available from the Downloads page in Excel format.
Search by Primer Redundancy
2. Traits in Cotton Linked to the Genetically Mapped SSRs
David Camak, undergraduate student (Erskine College)
Example of published traits and QTLs associated with traits
Spreadsheet with Annotated Trait Data
Column labels: Trait Symbol; Marker Interval for QTL; Trait Name; QTL/gene Name; R2 Value; Trait-linked SSR; QTL/gene?; Cross; QTL Start & Stop Positions; SSR Genetic Position; Publication Reference; Linkage Group; QTL Span; Marker Type; Trait Description
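One way to picture a row of this spreadsheet is as a small record type. The sketch below is only an illustration: the field names mirror the column labels on the slide, but the types, the units, and the idea of deriving the QTL span from the start and stop positions are assumptions, and the example values are placeholders, not CMD data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraitAnnotation:
    """One spreadsheet row; field names follow the slide's column labels.

    Types and units are assumptions (the slide does not state them).
    """
    trait_symbol: str
    trait_name: str
    trait_description: str
    qtl_or_gene: str                 # the "QTL/gene?" column
    qtl_gene_name: str
    trait_linked_ssr: str
    marker_type: str
    marker_interval_for_qtl: str
    cross: str
    linkage_group: str
    publication_reference: str
    r2_value: Optional[float] = None
    ssr_genetic_position: Optional[float] = None
    qtl_start: Optional[float] = None
    qtl_stop: Optional[float] = None

    @property
    def qtl_span(self) -> Optional[float]:
        """Assumed here to be stop minus start; the spreadsheet has its own column."""
        if self.qtl_start is None or self.qtl_stop is None:
            return None
        return self.qtl_stop - self.qtl_start


# Placeholder values, invented for illustration only.
row = TraitAnnotation(
    trait_symbol="XX",
    trait_name="Example trait",
    trait_description="Placeholder description",
    qtl_or_gene="QTL",
    qtl_gene_name="qExample-1",
    trait_linked_ssr="SSR_A",
    marker_type="SSR",
    marker_interval_for_qtl="SSR_A-SSR_B",
    cross="Parent1 x Parent2",
    linkage_group="LG1",
    publication_reference="Placeholder reference",
    qtl_start=17.5,
    qtl_stop=30.0,
)
print(row.trait_linked_ssr, row.qtl_span)
```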
Results
• Twenty-nine agriculturally important traits were analyzed overall.
• The total number of SSR markers associated with those traits was 142.
• The total number of crosses/genetic maps analyzed was 15.
Agriculturally Important Traits Annotated
Boll Size; Boll Weight; Bolls per Plant; Color Components; Yellowness; Fiber Span Length (2.5%, 50%); Fiber Elongation; Fiber Fineness; Fiber Maturity; Fiber Perimeter; Fiber Strength; Fiber Micronaire; Lint Index; Lint Percentage; Lint Yield; Number of Seed/Boll; Reflectance; Seed Cotton Yield; Seed Index; Seed Weight; Spiny Bollworm Resistance; Short Fiber Index; Wall Thickness; Weight Fitness; Uniformity Index; Genic Male Sterility
Number of Cotton SSRs Associated/Linked with the Analyzed Agriculturally Important Cotton Traits*
*These numbers are continually updated as molecular research and breeding uncover more trait-linked SSRs.
View Traits
Go to the CMD homepage at www.cottonmarker.org and click on Traits.
Results
Search; SSRs Listed by Traits; Traits by Published Symbol. Click on any trait.
SSR Linked with Selected Trait
List of SSRs; Choose Trait Based on SSR. Click.
Trait Data From Spreadsheet
All relevant data from the spreadsheet: Trait Information; QTL and Marker Information; Positions for Genetic Mapping. Click on (1) or (2).
(1) Specific Molecular Marker Source Page
Molecular Marker (SSR); Forward and Reverse Primer Sequences; Other Useful Information Related to Specific SSR
(2) Trait Search Page
Simple search for agriculturally important traits in cotton. Search feature available on any page, including the homepage.
3. Updated CMap Viewer with QTLs
• 26 cotton genetic maps are available to view and compare in CMap Viewer;
• QTL information was added.
4. New Virtualization/HPC System Platform
• CMD was moved to virtual machines for high-performance computing (HPC);
• jobs submitted by users are transferred to the Clemson Palmetto HPC;
• a very powerful computing resource (more than 5000 computing nodes);
• daily remote backup.
Figure (architecture diagram): four guest virtual machines, each with a CentOS 5.4 guest OS: VM0, VM1 (cmdweb), VM2 (databases), VM3 (gmod). Software labels: ssh; Device Manager & Control s/w; CMD web page; CMap; cgi-bin; CMBioTools; MySQL; PostgreSQL; Unmodified User Software. Lower layers: Back-End and Front-End Device Drivers; Xen Virtual Machine Monitor (Control IF, Safe HW IF, Event Channel, Virtual CPU, Virtual MMU); Hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE). External labels: Palmetto HPC; Remote Daily Backup snapshot; Warriors HPC Cluster.
Future
• 100 SSRs from Siva Kumpatla (Dow Agro): a collaborative project with Texas A&M; SSRs mapped on the TM-1 x 3-79 map;
• 200 SSRs from Ramesh Kantety;
• Updating of the mapped SSR data is in progress;
• More SNP data is coming;
• Annotation of traits/genes that are mapped and linked to SSRs/SNPs is in progress.
Future (cont.)
3 pipelines were designed (Pengfei Xuan):
1. Eukaryotic Automated Structural Annotation Pipeline
2. Transposable elements de novo
3. Transposable elements annotation
1). Eukaryotic Automated Structural Annotation Pipeline (work in progress)
• Aimed at identifying the vast majority of genes;
• raw sequences are run through a series of programs and scripts (“pipeline”) in an automated way;
• generates a basic working gene set as a starting point for further work.
Input: Genome Sequence
Phase 1, Preliminary gene finding: Repeat Masker; EST Database (PASA); Database Comparisons; Gene Finders; Consensus prediction
Phase 2, Manual check: Manually build gene models (200 genes); Use gene models as training set; Gene Finder
Phase 3, Finalize best annotation: Consensus prediction; EST-based refinement (PASA)
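The "consensus prediction" step can be pictured with a toy example. The sketch below is not the pipeline's actual method (the slides do not describe it); it swaps in a plain per-position vote, keeping regions predicted by at least `min_support` gene finders. The function name, the half-open interval convention, and the coordinates are all assumptions made for illustration.

```python
from collections import Counter


def consensus_regions(predictions, min_support=2):
    """Return merged (start, end) intervals covered by at least min_support finders.

    predictions: one list per gene finder, each a list of half-open
    (start, end) intervals on the same sequence. Toy per-position vote.
    """
    coverage = Counter()
    for finder in predictions:
        covered = set()
        for start, end in finder:
            covered.update(range(start, end))
        coverage.update(covered)  # each finder votes at most once per position

    supported = sorted(p for p, n in coverage.items() if n >= min_support)

    # Merge consecutive supported positions back into intervals.
    regions = []
    for pos in supported:
        if regions and pos == regions[-1][1]:
            regions[-1][1] = pos + 1
        else:
            regions.append([pos, pos + 1])
    return [tuple(r) for r in regions]


# Hypothetical predictions from three gene finders.
finders = [
    [(10, 50)],
    [(20, 60)],
    [(40, 45), (100, 120)],
]
print(consensus_regions(finders, min_support=2))
```

Counting position by position is far too slow for a real genome; it is used here only because it makes the voting idea easy to follow.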
2). Transposable Elements Computational Identification (work in progress)
This pipeline searches the genome sequences for TEs and creates a library file of TEs for the genome of interest.
3). Transposable Elements Annotation (work in progress)
This pipeline mines a genome using the library of TEs from the TEdenovo pipeline. Identified TEs are filtered and annotated.
CMD TEAM
Anna Blenda, PI: Research Assistant Professor, Genetics and Biochemistry, Clemson University
Feng Luo, collaborator: Assistant Professor, School of Computing, Clemson University
Pengfei Xuan: M.S. student, Computer Science, Clemson University
David Camak, former member: currently M.S. student, Biology, SELU
Acknowledgements
Cotton Incorporated