Bioinformatics Lectures at Rice. Lecture 2: High throughput technologies in genomics By Li Zhang. Microarrays. Biology: The biological problems Technology: Microarray mechanism; experimental procedures Statistical methods: data analysis, checking quality, exploration, discovery.
Microarray technology • Microarray technology measures the copy number of molecules in a mixture on a small slide. • Thousands or millions of different kinds of molecules can be measured simultaneously, creating large volumes of data per biological sample. • The molecules can be DNA, RNA or protein.
Major types of microarrays • Two color short oligo arrays http://www.youtube.com/watch?v=VNsThMNjKhM&feature=related • Single color short oligo arrays Synthesized by photolithography: http://www.youtube.com/watch?v=ui4BOtwJEXs&feature=related (Eric Lander) • Bead arrays
The experimental procedure to produce microarray data Affymetrix gene expression analysis. Sample preparation protocol: • RNA isolation • cDNA synthesis • cRNA synthesis • Hybridization • Amplification • Scan http://www.digizyme.com/competition/examples/genechip.html
Targets of microarray measurements • mRNA gene expression • SNP genotyping • DNA copy number (aneuploidy, chromosomal aberration, LOH) • DNA methylation • ChIP-chip (protein-DNA binding sites) • Nucleosome binding sites
Some key aspects of microarray technology • Parallel. The technology is designed to measure a large number of different molecules. • Almost comprehensive. It works for some or most of the molecules, but not for all, which results in some missing data. • Noise and bias. The signals can be affected by unwanted sources, e.g., cross-hybridization, which creates biases. Contamination may also have an asymmetric distribution. • Nonlinear response. Saturation causes nonlinear behavior. • Evolving annotation. The identity of the molecules may change, reflecting new knowledge over time. • No units. The numbers are often on a relative scale, which means the data have not been calibrated.
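The "nonlinear response" and "no units" points can be illustrated with a small sketch. It assumes a Langmuir-type saturation curve for probe intensity, which is one common modeling choice and not something the slides specify. The function names and the parameters `max_signal` and `half_saturation` are hypothetical.

```python
import math


def langmuir_signal(concentration, max_signal=1000.0, half_saturation=50.0):
    """Hypothetical saturating probe response (Langmuir-type curve).

    The signal grows almost linearly at low concentration and flattens
    toward max_signal at high concentration. Parameter values are
    illustrative assumptions, not measured constants.
    """
    return max_signal * concentration / (half_saturation + concentration)


def log2_ratio(sample_signal, reference_signal):
    """Relative (unitless) measure: log2 of sample over reference signal."""
    return math.log2(sample_signal / reference_signal)


if __name__ == "__main__":
    # A true 2-fold change in concentration, observed at increasing abundance.
    for reference in (1.0, 10.0, 100.0, 1000.0):
        observed = log2_ratio(
            langmuir_signal(2.0 * reference), langmuir_signal(reference)
        )
        print(f"reference conc {reference:>7.1f}: observed log2 ratio {observed:.3f}")
```

Under these assumptions the observed log2 ratio for a true 2-fold change shrinks from close to 1 at low abundance toward 0 near saturation, which is why intensities on a relative scale cannot be read as calibrated concentrations.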
Sequence by synthesis on an array • Illumina/SOLiD/454 Life Sciences http://www.youtube.com/watch?v=g0vGrNjpyA8 (1.5 hr video, from a meeting in 2010) Illumina’s animation (http://www.youtube.com/watch?v=l99aKKHcxC4&feature=related) (3 min) SOLiD’s animation http://www.youtube.com/watch?v=nlvyF8bFDwM Complete Genomics (nanoball sequencing).
Some key aspects of next generation sequencing technology • Compared with microarrays, NGS has less noise, no cross-hybridization, and no saturation. • Bias remains a problem. Some sequences simply cannot be dealt with properly. These include high-GC sequences, repeats, etc. • Mapping to the genome can be challenging, but paired-end reads help a lot. • Biases partly come from PCR amplification, whose efficiency differs depending on the sequence.
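One simple way to look for the sequence-dependent bias mentioned above is to compare read depth across windows grouped by GC content. The sketch below is a minimal illustration, not a published pipeline. The function names, the `(sequence, read_count)` input format, and the number of bins are assumptions made for the example.

```python
def gc_fraction(sequence):
    """Fraction of called bases (A, C, G, T) that are G or C."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        return 0.0  # no callable bases, e.g. a run of N
    return sum(base in "GC" for base in called) / len(called)


def mean_coverage_by_gc_bin(windows, n_bins=5):
    """Average read count per GC bin.

    windows: iterable of (sequence, read_count) pairs, one per genomic window.
    Returns a list of length n_bins; bins with no windows hold None.
    """
    totals = [0.0] * n_bins
    counts = [0] * n_bins
    for sequence, read_count in windows:
        # Clamp so that a GC fraction of exactly 1.0 falls in the last bin.
        index = min(int(gc_fraction(sequence) * n_bins), n_bins - 1)
        totals[index] += read_count
        counts[index] += 1
    return [total / count if count else None for total, count in zip(totals, counts)]


if __name__ == "__main__":
    toy_windows = [("AAAA", 10), ("AATT", 20), ("GGCC", 4)]  # made-up numbers
    print(mean_coverage_by_gc_bin(toy_windows, n_bins=2))
```

A strong trend in mean depth across the bins would suggest GC-dependent bias. The toy input here is invented purely to show the call.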
3rd generation sequencing • Single molecule, with no PCR amplification. • No fluorescent dyes, hence lower reagent cost. • Longer sequences. • Remaining problem: erratic base calling. Ion Torrent (http://www.youtube.com/watch?v=yVf2295JqUg) Pacific Biosciences (http://www.youtube.com/watch?v=v8p4ph2MAvI) Nanopores (http://www.youtube.com/watch?v=8kPfQNzR4FI&feature=results_main&playnext=1&list=PL0AC36A831CCB8690)
Challenges ahead • Complexity of human diseases • Heterogeneity • Biological samples are fragile, subject to degradation and contamination. • Biases, batch effects, standards.
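To make "batch effects" concrete, the sketch below removes a per-batch offset from log-scale measurements of a single gene by centering each batch on the overall mean. This is plain mean-centering, chosen here for brevity. It is not the method used in the lecture, and it assumes that biological groups are balanced across batches, otherwise real signal is removed along with the batch offset. The function name `center_by_batch` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Shift each batch so its mean equals the overall mean.

    values: log-scale measurements for one gene, one per sample.
    batches: batch label for each sample, same length as values.
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")
    grouped = defaultdict(list)
    for value, batch in zip(values, batches):
        grouped[batch].append(value)
    batch_means = {batch: mean(vals) for batch, vals in grouped.items()}
    grand_mean = mean(values)
    return [
        value - batch_means[batch] + grand_mean
        for value, batch in zip(values, batches)
    ]


if __name__ == "__main__":
    # Made-up example: batch "b" reads about 4 units higher than batch "a".
    print(center_by_batch([1.0, 3.0, 5.0, 7.0], ["a", "a", "b", "b"]))
```

After centering, both batches share the same mean while the differences between samples within a batch are preserved.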