BioInformatics (2)

fern

Presentation Transcript


  1. BioInformatics (2)

  2. Physical Mapping - I
     • Low resolution
        • Megabase-scale
     • High resolution
        • Kilobase-scale or better
     • Methods for low resolution mapping
        • Somatic cell hybrids (human and mouse or hamster)
           • Fast chromosomal localisation of genes
           • Subchromosomal mapping possible
        • Fluorescence in situ hybridisation (FISH)
        • Chromosome painting
        • Fractionation of chromosomes by flow cytometry
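
Chromosomal localisation with somatic cell hybrids rests on concordance: a marker belongs to the human chromosome whose presence or absence across the hybrid panel matches the marker's own presence or absence. A minimal sketch, assuming a small hypothetical panel and hypothetical PCR results (the function names are illustrative, not from any mapping package):

```python
# Toy concordance analysis for a somatic cell hybrid panel.
# The panel contents and marker results below are hypothetical.

def concordance(panel: dict[str, set[str]], marker: dict[str, bool]) -> dict[str, float]:
    """For each human chromosome, return the fraction of hybrid lines in which
    the marker's presence/absence agrees with the chromosome's presence/absence."""
    chromosomes = set().union(*panel.values())
    scores = {}
    for chrom in sorted(chromosomes):
        agree = sum((chrom in retained) == marker[line] for line, retained in panel.items())
        scores[chrom] = agree / len(panel)
    return scores


def assign_chromosome(panel: dict[str, set[str]], marker: dict[str, bool]) -> str:
    """Assign the marker to the chromosome with the highest concordance."""
    scores = concordance(panel, marker)
    return max(scores, key=scores.get)


# Each hybrid line retains a different subset of human chromosomes.
panel = {
    "H1": {"1", "4", "7"},
    "H2": {"4", "11"},
    "H3": {"1", "11", "X"},
    "H4": {"7", "X"},
}
# PCR result for the gene of interest in each line (True = human product detected).
marker = {"H1": True, "H2": True, "H3": False, "H4": False}

print(assign_chromosome(panel, marker))  # prints 4
```

Only chromosome 4 is perfectly concordant here; real panels are larger so that every chromosome has a distinct retention pattern.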

  3. Physical Mapping - II
     • Methods for high resolution mapping
        • Long-range restriction mapping
           • Pulsed-field gel electrophoresis (PFGE)
        • Assembly of clone contigs
     • The double digest problem
        • Ordering the fragments of a double digest with two restriction enzymes
     • Sequence Tagged Sites (STSs)
        • Sequence fragments in the genome described uniquely by a pair of PCR primers
        • Usually 200-300 bases
        • Very useful as ‘landmarks’ on the physical map
        • Can be assigned to individual clones by PCR screening
        • Basis for the assembly of STS-content physical maps
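
The double digest problem can be made concrete: given the fragment sizes from enzyme A alone, enzyme B alone, and A and B together, find fragment orders for A and B whose combined cut sites reproduce the double-digest sizes. A minimal brute-force sketch, assuming exact fragment sizes with no measurement error; it tries every permutation, so it is only feasible for a handful of fragments, and the fragment sizes are hypothetical:

```python
from collections import Counter
from itertools import permutations


def cut_sites(fragments):
    """Turn an ordered tuple of fragment lengths into the set of interior cut positions."""
    sites, pos = set(), 0
    for length in fragments[:-1]:
        pos += length
        sites.add(pos)
    return sites


def fragment_lengths(sites, total):
    """Turn a set of cut positions back into the ordered fragment lengths."""
    bounds = [0, *sorted(sites), total]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def solve_double_digest(a, b, ab):
    """Brute-force search for fragment orders of digests A and B whose combined
    cut sites reproduce the fragment sizes of the A+B double digest."""
    total = sum(a)
    if total != sum(b) or total != sum(ab):
        raise ValueError("all three digests must cover the same DNA length")
    target = Counter(ab)
    solutions = []
    for order_a in set(permutations(a)):  # set() skips duplicate orders of equal-length fragments
        sites_a = cut_sites(order_a)
        for order_b in set(permutations(b)):
            combined = sites_a | cut_sites(order_b)
            if Counter(fragment_lengths(combined, total)) == target:
                solutions.append((order_a, order_b))
    return sorted(solutions)


# Hypothetical 10 kb fragment: enzyme A cuts at 2 and 7 kb, enzyme B at 4 kb.
print(solve_double_digest(a=[2, 5, 3], b=[4, 6], ab=[2, 2, 3, 3]))
# prints [((2, 5, 3), (4, 6)), ((3, 5, 2), (6, 4))]
```

The second solution is the mirror image of the first; every valid map can be read in either orientation, and real data can also admit genuinely different maps.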

  4. Physical Mapping - III
     • Map units (human genome)
        • 1 cM ≈ 1 Mb
        • 1 cR ≈ 30 kb (the exact figure depends on the radiation dose used to make the panel)
        • 1 centiRay = 1% chance of a radiation-induced break between 2 markers
     • Major information resources
        • Stanford Human Genome Center (RH maps)
           • http://www-shgc.stanford.edu
        • Whitehead/MIT Genome Center (STS content maps)
           • http://www-genome.wi.mit.edu/
        • Centre d’Etude du Polymorphisme Humain - CEPH (YAC maps)
           • http://www.cephb.fr/bio/ceph-genethon-map.html
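
The map units above can be turned into rough conversions. A minimal sketch, assuming the slide's average ratios (1 cM ≈ 1 Mb, 1 cR ≈ 30 kb) and the usual radiation hybrid mapping function D = -ln(1 - θ) Rays, which treats breaks as random (Poisson) events; for small θ this reduces to "1 cR per 1% break chance". The function names are hypothetical:

```python
import math

# Approximate human-genome averages taken from the slide; real ratios vary
# along chromosomes (cM/Mb) and between RH panels (kb/cR).
KB_PER_CM = 1000.0  # 1 cM ~ 1 Mb
KB_PER_CR = 30.0    # 1 cR ~ 30 kb


def breakage_to_cR(theta: float) -> float:
    """Convert a breakage probability theta between two markers into centiRays
    using the Poisson mapping function D = -ln(1 - theta) Rays."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    return -100.0 * math.log(1.0 - theta)


def cR_to_kb(cR: float) -> float:
    """Rough physical distance for an RH distance, using the slide's average."""
    return cR * KB_PER_CR


def cM_to_kb(cM: float) -> float:
    """Rough physical distance for a genetic distance, using the slide's average."""
    return cM * KB_PER_CM


print(round(breakage_to_cR(0.01), 3))  # ~1 cR for a 1% break chance
print(cR_to_kb(10))                     # 300.0
print(cM_to_kb(2.5))                    # 2500.0
```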

  5. Physical Mapping - IV
     • Conclusions: the value of physical mapping
        • Confirmation of the chromosomal location of clones and genes
        • Correction of genetic map errors
        • Correlation with the genetic map reveals ‘hot’ and ‘cold’ regions of recombinational activity on chromosomes
        • Provides useful information about duplicated regions
        • High resolution mapping provides the framework needed for high quality sequencing of large genomic regions
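
Correlating the two maps amounts to dividing genetic length by physical length for each marker interval and comparing the result with the genome-wide average of about 1 cM/Mb. A minimal sketch with hypothetical intervals; the 2-fold cut-off for calling an interval ‘hot’ or ‘cold’ is an arbitrary assumption:

```python
# Hypothetical marker intervals: (interval, genetic length in cM, physical length in Mb).
intervals = [("A-B", 3.0, 1.0), ("B-C", 0.4, 2.0), ("C-D", 1.2, 1.0)]


def classify_recombination(intervals, genome_avg=1.0, fold=2.0):
    """Label each interval 'hot', 'cold' or 'average' by comparing its cM/Mb ratio
    with the genome-wide average (~1 cM/Mb in human). The 2-fold cut-off is arbitrary."""
    labelled = []
    for name, cM, Mb in intervals:
        rate = cM / Mb
        if rate >= genome_avg * fold:
            label = "hot"
        elif rate <= genome_avg / fold:
            label = "cold"
        else:
            label = "average"
        labelled.append((name, round(rate, 2), label))
    return labelled


for row in classify_recombination(intervals):
    print(row)
# ('A-B', 3.0, 'hot')
# ('B-C', 0.2, 'cold')
# ('C-D', 1.2, 'average')
```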

  6. System for Assembling Markers (SAM)

  7. DNA Sequencing
     • Ordered clone library
        • Sequencing of overlapping clones whose order has been determined by restriction analysis
        • Advantage: easy ordering of the resulting sequence reads
        • Disadvantage: detailed mapping is time-consuming
     • Shotgun sequencing
        • Partial digestion of DNA with a 4-cutter enzyme
        • Sequencing of randomly overlapping clones
        • Computer-aided assembly of reads
        • Advantage: speed
        • Disadvantages
           • High data redundancy due to random sequencing
           • Not suitable for large genomes (>300 Mb)
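
The redundancy cost of shotgun sequencing can be estimated with the classic Lander-Waterman approximations: with N reads of length L over a target of length G, coverage is c = NL/G, the expected unsequenced fraction is e^-c and the expected number of contigs is N·e^-c. A minimal sketch, assuming reads start uniformly at random and every overlap is detected; the insert and read sizes are hypothetical:

```python
import math


def shotgun_estimates(target_bp: int, read_len: int, n_reads: int):
    """Idealised Lander-Waterman estimates for random shotgun sequencing.
    Assumes reads start uniformly at random and that any overlap is detected
    (no minimum-overlap requirement)."""
    coverage = n_reads * read_len / target_bp   # redundancy c = N * L / G
    unsequenced = math.exp(-coverage)           # expected fraction of bases never read
    contigs = n_reads * math.exp(-coverage)     # expected number of contigs
    return coverage, unsequenced, contigs


# Hypothetical 40 kb cosmid insert, 500-base reads, 400 reads
c, frac, n = shotgun_estimates(40_000, 500, 400)
print(c, round(frac, 4), round(n, 1))   # 5.0 0.0067 2.7
```

Even at five-fold redundancy, about 0.7% of the insert is expected to remain unread, which is why random sequencing is followed by directed finishing.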

  8. Assembly of Sequence Contigs
     • The problem: semi-automated assembly of a contiguous DNA sequence from overlapping gel readings
     • Steps
        • Base identification
        • Trimming of ends
        • Vector clipping
        • Assembly of fragments
     • Major software packages
        • Sequencher™ from GeneCodes Inc., Ann Arbor, Michigan
           • Platforms: PowerMac, Windows NT
           • Contigs of up to 70 kb
        • The Staden package by Staden et al., MRC, Cambridge
        • PHRED/PHRAP by Green et al., University of Washington, Seattle
           • Platforms: Unix
           • Megabase-range contigs
           • Mutation detection capabilities
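
The "assembly of fragments" step can be illustrated with a deliberately simplified greedy merge that repeatedly joins the two contigs with the longest suffix-prefix overlap. This is a textbook toy, not how Sequencher, the Staden package or PHRAP work: it assumes error-free reads from one strand that have already been trimmed and vector-clipped, and the function names are hypothetical:

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest suffix-prefix overlap.
    Toy model: error-free reads from one strand, no quality values."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # drop reads that lie entirely inside another read
    contigs = [r for r in contigs if not any(r != other and r in other for other in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)] + [merged]


reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]  # hypothetical overlapping reads
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

Greedy merging breaks down on repeats and sequencing errors, which is where quality-aware assemblers earn their keep.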

  9. Quality Control of Sequence Data (Source: US DOE Joint Genome Institute)
     • Goals
        • Complete sequence continuity across a target region (both within and between clones)
        • No more than one gap in 200 kb
        • Size of all gaps no larger than 1% of the size of the total region
        • ‘Allowable gaps’ include
           • regions unclonable/unstable in conventional cloning vectors
           • repetitive regions
           • regions with significant secondary structure or abnormally high GC content
        • Gap size measured by PCR or restriction digest analysis
     • Accuracy of finished sequence: 1 error in 10,000 bases
        • At least 95% double-strand coverage
     • Assembly verification
        • a minimum of three independent restriction digests
        • reassembly with an independent algorithm
        • re-sequencing of random clones
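
The numeric goals above can be checked mechanically. An error rate of 1 in 10,000 corresponds to a Phred-scaled quality of Q = -10·log10(10^-4) = 40. For the gap criteria the wording is ambiguous, so this minimal sketch assumes that the gap count may not exceed region length / 200 kb and that the summed gap length may not exceed 1% of the region; the function names are hypothetical:

```python
import math


def phred_quality(error_rate: float) -> float:
    """Phred-scaled quality Q = -10 * log10(P_error)."""
    return -10.0 * math.log10(error_rate)


def meets_gap_standard(region_bp: int, gap_sizes_bp: list[int]) -> bool:
    """Check the two gap criteria from the slide. Interpretation (an assumption):
    the number of gaps may not exceed region_bp / 200 kb, and the summed gap
    length may not exceed 1% of the region."""
    too_many_gaps = len(gap_sizes_bp) > region_bp / 200_000
    too_much_gap = sum(gap_sizes_bp) > 0.01 * region_bp
    return not (too_many_gaps or too_much_gap)


print(round(phred_quality(1 / 10_000)))             # 40: the "1 error in 10,000 bases" target
print(meets_gap_standard(1_000_000, [2000, 3000]))  # True
print(meets_gap_standard(1_000_000, [8000, 4000]))  # False: 12 kb of gaps > 10 kb
```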

  10. Submission and Annotation of Sequence Data (Source: US DOE Joint Genome Institute)
     • Minimum size of a submission to the public databases is the size of the starting clone
        • 95% of the sequence represented on both strands
        • all ambiguities resolved or annotated
        • missing data at the end of a clone allowed if sequence overlap with the adjacent clone in the tiling path is detected
     • Level of annotation
        • all sequences annotated in a largely automated fashion
        • identification of putative or known genes, repetitive elements, EST matches and any other useful “miscellaneous features”
        • computationally derived predictions must be indicated as such
     • Immediate release of finished annotated sequence
     • Global assembly of meta-contigs from previously submitted data will be performed periodically
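
The "95% of the sequence represented on both strands" requirement can be computed from read placements on the finished consensus: a base counts only if at least one forward read and at least one reverse read cover it. A minimal sketch, assuming reads are already placed as (start, end, strand) with 0-based half-open coordinates; the placements and function name are hypothetical:

```python
def double_strand_fraction(length: int, reads: list[tuple[int, int, str]]) -> float:
    """Fraction of consensus positions covered by at least one read on each strand.
    Reads are (start, end, strand) with 0-based, half-open coordinates and strand '+' or '-'."""
    covered = {"+": [False] * length, "-": [False] * length}
    for start, end, strand in reads:
        if strand not in covered:
            raise ValueError(f"unknown strand {strand!r}")
        track = covered[strand]
        for pos in range(max(0, start), min(length, end)):
            track[pos] = True
    both = sum(f and r for f, r in zip(covered["+"], covered["-"]))
    return both / length


# Hypothetical read placements on a 100-base consensus
reads = [(0, 60, "+"), (50, 100, "+"), (0, 97, "-")]
fraction = double_strand_fraction(100, reads)
print(fraction, fraction >= 0.95)  # 0.97 True
```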

  11. International Strategy Meeting on Human Genome Sequencing, Bermuda, 25th-28th February 1996, sponsored by the Wellcome Trust
     • Summary of agreed principles
        • Primary genomic sequence should be in the public domain
        • Primary genomic sequence should be rapidly released
           • Assemblies of greater than 1 kb should be automatically released on a daily basis
           • Finished annotated sequence should be immediately submitted to the public databases
     • Coordination
        • Large-scale sequencing centres should inform HUGO of their intention to sequence particular regions of the human genome

  12. Annotating the Human Genome Sequence
     • Identification of coding regions
        • Exon/intron prediction
        • High throughput comparison of genomic sequence to protein information
           • Full-length protein sequences
           • Databases of protein domains
     • How automated is automated annotation in reality?
        • Advantages
           • High speed
           • Good for tRNA genes and repetitive regions
           • Good for high-scoring matches in databases, but...
        • Disadvantages
           • Error propagation can be detrimental
           • Domain ‘recycling’ in evolution causes misinterpretation, e.g. transcription factors that resemble peptidases
           • Very computer-intensive task!
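
The most basic form of automated coding-region identification is an open reading frame (ORF) scan. The sketch below is a much simpler stand-in for real exon/intron prediction: it assumes an unspliced sequence, scans only the three forward frames, and reports ATG-to-stop stretches of at least a minimum number of codons. The sequence and function name are hypothetical:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.
    Coordinates are 0-based and half-open; end includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ATG
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC"))  # [(2, 17, 2)]
```

Real annotation pipelines combine splice-site models, codon statistics and database matches, which is where the error propagation and domain-recycling problems listed above arise.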
