De novo assembly from clinical sample

Anna Shcherbina, Bioinformatics Challenge Day, 01/10/2013. This work is sponsored by the Defense Threat Reduction Agency under Air Force Contract #FA8721-05-C-0002.

Presentation Transcript


  1. De novo assembly from clinical sample
  Opinions, interpretations, recommendations and conclusions are those of the authors and are not necessarily endorsed by the United States Government.
  • Distribution Statement A: Approved for public release; distribution is unlimited.

  2. What is Metagenomics?
  • Metagenome: genetic material recovered directly from a clinical/environmental sample.
  • Metagenomics tries to answer three questions about these samples:
    • Diversity and abundance of community members (“Who is there?”)
    • Metabolic potential of the community and its members (“What are they doing?”)
    • Ecological relations between members of the community (“Why are they there?”)

  3. Your Challenge
  Given a clinical sample,
  • Produce a high-quality de novo assembly.
  • Use this assembly to identify pathogens present in the sample.
  • Determine pathogen significance: which of the numerous organisms in the metagenomic sample are responsible for disease?
    • Exact species/strain may be absent in reference databases.
    • High-confidence decisions can be difficult.
    • Some pathogens are opportunistic.
    • Database contamination adds noise to the analysis.

  4. Example:
  • Give a clinically relevant example of how metagenomic samples are useful.

  5. Part 1: De Novo Assembly
  • Sequence assembly refers to aligning and merging fragments of a much longer DNA sequence in order to reconstruct the original sequence.
  • This is necessary because DNA sequencing technology cannot read a whole genome in one pass. Instead, it reads short pieces of between 20 and 1000 bases, depending on the technology used.
  [Figure: panels labeled "WANT" and "HAVE"]
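
The following minimal Python sketch makes the merge-by-overlap idea concrete using greedy overlap assembly. It is a toy and does not reproduce Newbler or any other assembler named in this talk. It assumes error-free reads taken from one strand of a single linear sequence. The function names `overlap` and `greedy_assemble` and the 3-base minimum overlap are illustrative choices.

```python
# Toy greedy overlap assembler: a teaching sketch, NOT the algorithm used by
# Newbler or other production assemblers. Assumes error-free reads from one
# strand of a single linear sequence; repeats can make greedy merging choose
# the wrong overlaps.


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if no such overlap of at least `min_len` bases exists."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the largest overlap.

    Returns the remaining fragments (contigs) once no pair overlaps.
    """
    # Remove duplicate reads and reads contained entirely in another read.
    frags = sorted(set(reads))
    frags = [r for r in frags if not any(r != o and r in o for o in frags)]
    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: each fragment is a separate contig
        # Append the non-overlapping tail of the right fragment to the left one.
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "GACCTGCCGG", "TGCCGGAATAC"]  # hypothetical reads
    print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

The sketch receives three hypothetical reads. It merges them through their two 6-base overlaps and returns the single contig `ATTAGACCTGCCGGAATAC`. Greedy merging is easy to follow, but it can join the wrong fragments when the genome contains repeats. This is one reason real assemblers rely on more elaborate graph-based strategies.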

  6. NGS Assembly is Platform-Dependent
  Source: Nature Reviews Genetics 11, 31-46 (January 2010), doi:10.1038/nrg2626

  7. Popular Assemblers

  8. Example: Newbler Assembler
  • Input: FASTA reads
  • Output: contigs
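
Input reads and output contigs are both commonly stored as FASTA text. The following Python sketch assumes plain, uncompressed FASTA. It reads records and summarizes contig lengths with N50, a widely used assembly-quality statistic that the slides do not mention. The function names and the demo contigs are hypothetical.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA text.

    The identifier is the first whitespace-separated token after '>';
    multi-line sequences are concatenated.
    """
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L contain at least half
    of all assembled bases (0 for an empty assembly)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical contigs, made up for illustration.
    demo = io.StringIO(
        ">contig1 len=12\nACGTACGTACGT\n"
        ">contig2\nACGTAC\nGTAC\n"
        ">contig3\nACG\n"
    )
    lengths = [len(seq) for _, seq in read_fasta(demo)]
    print(lengths, "N50 =", n50(lengths))  # [12, 10, 3] N50 = 10
```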

  9. Part 2: Using De Novo Assembly for Pathogen Identification
  • Basic Local Alignment Search Tool (BLAST)
    • Compares assembled contigs (queries) with a library or database of sequences.
    • Identifies library sequences that resemble the query sequence above a certain threshold.
    • Uses heuristics that emphasize speed over sensitivity.
  • Uses of BLAST
    • Identifying species
    • Locating domains
    • Establishing phylogeny
    • DNA mapping
    • Comparison
  • Flavors of BLAST
    • Nucleotide-nucleotide (blastn), protein-protein (blastp), position-specific iterative (blastpgp, i.e., PSI-BLAST), MegaBLAST (higher speed)
  • Publicly available databases to search: ftp://ftp.ncbi.nlm.nih.gov/blast/db/
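
BLAST results are easiest to process in bulk as tabular output. BLAST+ programs write one hit per line when run with `-outfmt 6`, using the 12 default columns listed in `FIELDS` below. This Python sketch keeps, for each contig, the best-scoring hit under an E-value cutoff. The 1e-5 cutoff, the `Hit` class, and the demo rows are hypothetical choices for illustration, not a recommended threshold.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    query: str      # contig identifier (qseqid)
    subject: str    # database sequence identifier (sseqid)
    pident: float   # percent identity
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> list[Hit]:
    """Parse -outfmt 6 lines, skipping blank lines and '#' comment lines."""
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(FIELDS, row))
        hits.append(Hit(rec["qseqid"], rec["sseqid"], float(rec["pident"]),
                        float(rec["evalue"]), float(rec["bitscore"])))
    return hits


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-5) -> dict[str, Hit]:
    """For each query, keep the highest-bit-score hit with E <= max_evalue."""
    best: dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


if __name__ == "__main__":
    # Made-up rows for illustration only (not real BLAST results).
    demo = io.StringIO(
        "contig1\tsubjectA\t93.1\t102\t4\t3\t1\t101\t500\t599\t3e-33\t147\n"
        "contig1\tsubjectB\t88.0\t90\t10\t1\t5\t94\t1000\t1089\t1e-10\t60.2\n"
        "contig2\tsubjectC\t75.0\t40\t10\t0\t1\t40\t50\t89\t0.5\t20.1\n"
    )
    for query, hit in best_hits(parse_tabular(demo)).items():
        print(query, hit.subject, hit.evalue)  # contig1 subjectA 3e-33
```

In the demo, `contig2` is dropped because its only hit has E = 0.5. Keeping a single best hit is the simplest possible assignment rule. Weighing several near-equal hits is often more informative when a strain is missing from the database.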

  10. BLAST example: Metagenomic Sample Queried Against Bacteria.gdna

  Query= SCUMS_READ_Arctic2638735 /sample_id=SCUMS_SMPL_Arctic
  /sequencing_technology=454_Pyro /length=101

  Length=101
                                                                        Score        E
  Sequences producing significant alignments:                        (Bits)    Value

  CP000356.1 /organism="Sphingopyxis alaskensis RB2256" /molecule...    147    3e-33

  > CP000356.1 /organism="Sphingopyxis alaskensis RB2256" /molecule="DNA"
  /date="05-MAR-2010" /taxonomy="Bacteria; Proteobacteria; Alphaproteobacteria;
  Sphingomonadales; Sphingomonadaceae; Sphingopyxis." /strain="RB2256"
  /def="Sphingopyxis alaskensis RB2256, complete genome."
  Length=3345170

   Score = 147 bits (79),  Expect = 3e-33
   Identities = 95/102 (93%), Gaps = 3/102 (3%)
   Strand=Plus/Plus

  Query  1       ACCAACGGCGGCGGTGCGCCACCTTGGCAAGGCGCTGGCGCGCGAATGGGCGCGGCGCGG  60
                 ||||| ||||||||||||||| || |||||||||||||||||||||||||||||||||||
  Sbjct  296267  ACCAA-GGCGGCGGTGCGCCATCTGGGCAAGGCGCTGGCGCGCGAATGGGCGCGGCGCGG  296325

  Query  61      GATCA-GCGTCAATGTGATCCAGCCCGGCTATTTCGAATCCG  101
                 ||| | |||||||||| |||||||| ||||||||||||||||
  Sbjct  296326  GAT-ATGCGTCAATGTCATCCAGCCGGGCTATTTCGAATCCG  296366

  Lambda      K        H
     1.33    0.621     1.12

  Gapped
  Lambda      K        H
     1.28    0.460     0.850

  Effective search space used: 547000946085
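
The summary numbers in this example can be checked by hand. The Python sketch below first recounts identities and gaps from the two aligned blocks. It then applies the textbook Karlin-Altschul relations: bit score S' = (λS - ln K) / ln 2, and E = m·n·2^(-S'), where m·n is the effective search space. The inputs are the raw score (79), the gapped Lambda and K (1.28, 0.460), and the effective search space reported in the output. This is a consistency check under those formulas, not a reimplementation of BLAST's scoring. The function names are hypothetical.

```python
import math


def alignment_stats(query: str, subject: str) -> tuple[int, int, int]:
    """Return (identities, gaps, alignment_length) for two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    identities = sum(q == s and q != "-" for q, s in zip(query, subject))
    gaps = sum(q == "-" or s == "-" for q, s in zip(query, subject))
    return identities, gaps, len(query)


def bit_score(raw: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def expect(bits: float, search_space: float) -> float:
    """E-value E = m*n * 2**(-S'), with m*n the effective search space."""
    return search_space * 2.0 ** (-bits)


if __name__ == "__main__":
    # Aligned rows copied from the BLAST output ('-' marks a gap).
    rows = [
        ("ACCAACGGCGGCGGTGCGCCACCTTGGCAAGGCGCTGGCGCGCGAATGGGCGCGGCGCGG",
         "ACCAA-GGCGGCGGTGCGCCATCTGGGCAAGGCGCTGGCGCGCGAATGGGCGCGGCGCGG"),
        ("GATCA-GCGTCAATGTGATCCAGCCCGGCTATTTCGAATCCG",
         "GAT-ATGCGTCAATGTCATCCAGCCGGGCTATTTCGAATCCG"),
    ]
    ident = gaps = length = 0
    for q, s in rows:
        i, g, n = alignment_stats(q, s)
        ident, gaps, length = ident + i, gaps + g, length + n
    # Identities = 95/102 (93%), Gaps = 3/102 (3%)
    print(f"Identities = {ident}/{length} ({100 * ident / length:.0f}%), "
          f"Gaps = {gaps}/{length} ({100 * gaps / length:.0f}%)")
    bits = bit_score(79, lam=1.28, k=0.460)  # gapped Lambda and K
    # 147 bits, Expect = 3e-33
    print(f"{bits:.0f} bits, Expect = {expect(bits, 547000946085):.0e}")
```

For this hit, the recomputed values agree with the reported ones: 95/102 identities, 3/102 gaps, 147 bits, and an E-value of about 3e-33.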

  11. Part 3: Determining Pathogen Significance
  • Characterize pathogen count in the sample.
  • Characterize pathogen virulence.
  • Databases of interest:
    • PATRIC (Pathosystems Resource Integration Center)
    • VFDB (Virulence Factors of Pathogenic Bacteria)
    • GeneDB
    • ViPR (Virus Pathogen Resource)
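
One simple starting point for "pathogen count" is to tally how many reads or contigs were assigned to each organism, for example through their best BLAST hits. The Python sketch below does this. It can optionally weight each contig, for instance by the number of reads it contains. The function name, organism labels, and weighting scheme are hypothetical. Abundance alone does not show that an organism causes disease, so virulence still has to be assessed separately, for example with the databases listed above.

```python
from collections import Counter
from typing import Optional


def relative_abundance(assignments: dict[str, str],
                       weights: Optional[dict[str, float]] = None) -> dict[str, float]:
    """Fraction of the sample attributed to each organism.

    assignments: sequence id -> organism name (e.g. from best BLAST hits).
    weights: optional per-sequence weight, e.g. the number of reads in each
             contig; if omitted, every sequence counts once.
    """
    totals: Counter = Counter()
    for seq_id, organism in assignments.items():
        totals[organism] += 1.0 if weights is None else weights[seq_id]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    # most_common() lists organisms from most to least abundant.
    return {org: w / grand_total for org, w in totals.most_common()}


if __name__ == "__main__":
    # Hypothetical assignments for illustration.
    reads = {"r1": "Organism A", "r2": "Organism A",
             "r3": "Organism B", "r4": "Organism A"}
    print(relative_abundance(reads))  # {'Organism A': 0.75, 'Organism B': 0.25}
```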