Analysis of the Positively Selected and Non-Positively Selected Non-Protein Coding Sequences of Chromosome 16 Kyle Tretina with a team led by Dr. Pattle P. Pun in collaboration with Mr. Ross Leung of CUHK
Introduction: Story of Evolutionary History • Story: organismal complexity increases as evolution proceeds: Bacteria < Fish < Primate < Human
WHY? • “But little Mouse, you are not alone, / In proving foresight may be in vain: / The best laid schemes of mice and men / Go often askew, / And leave us nothing but grief and pain, / For promised joy!” –Robert Burns (1785)
Genetics • Central Dogma: DNA → RNA → Protein • Complexity ~ Number of Genes? • Humans ~30,000 • Flies ~14,000
Complexity (K) ~ Gene Number (N)? • Relationship? • proportional: $K \sim N$ • polynomial: $K \sim N^a$ • exponential: $K \sim a^N$ • factorial: $K \sim N!$ • Jean-Michel Claverie: ON/OFF states • $2^{30{,}000} / 2^{14{,}000} = 2^{16{,}000} \approx 3 \times 10^{4816}$
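The slide's number follows if each of N genes is treated as a binary ON/OFF switch, as the model attributed to Claverie suggests. That gives 2^N expression states. The quick Python check below uses the slide's approximate counts of ~30,000 human genes and ~14,000 fly genes. It works in log space because the ratio is far too large for a float:

```python
import math

# Gene counts as quoted on the slides (approximate).
HUMAN_GENES = 30_000
FLY_GENES = 14_000


def log10_state_ratio(n_a: int, n_b: int) -> float:
    """log10 of 2**n_a / 2**n_b under the ON/OFF model, i.e. (n_a - n_b) * log10(2)."""
    return (n_a - n_b) * math.log10(2)


def to_scientific(log10_value: float) -> tuple[float, int]:
    """Split a base-10 logarithm into (mantissa, exponent) so value = m * 10**e."""
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return mantissa, exponent


mantissa, exponent = to_scientific(log10_state_ratio(HUMAN_GENES, FLY_GENES))
print(f"2^{HUMAN_GENES - FLY_GENES} ≈ {mantissa:.1f} x 10^{exponent}")
# prints: 2^16000 ≈ 3.0 x 10^4816
```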
Goal • Determine the role of non-coding DNA in gene regulation by looking at the functions of non-coding SNPs that are positively selected or non-positively selected on chromosome 16
Definitions • SNP: single nucleotide polymorphism • Variable between populations • Importance likely due to the stability of the variation • Selection: the phenomenon whereby organisms best adapted to their environment tend to survive and produce progeny • Gene-selection algorithm and neutral selection theory (wrench)
Methods Overview • HapMap Database: selection data → list of Chr16 SNPs • UCSC Genome Database Mirror: SNP flanking sequences • TRANSFAC: related transcription factor data for each SNP flanking sequence • PReMod: confirm results
HapMap Phase I Data • HapMap Project: an international effort to identify and catalog genetic similarities and differences among human beings (haplotype maps); its data also include: • Selection data → list of Chr16 SNPs • ~25,000 non-positively selected • ~5,000 positively selected
UCSC Genome Browser • Genome.UCSC.edu: a website providing reference genome sequences and tools for visual and computational analysis • Methods: • Enter each rsID (SNP identifier) from the list • Note the intersecting sequences • Copy and paste the sequences
UCSC Genome Browser Mirror • Efficiency • Manual: ~70 seq/hr over 1.5 yrs gathered ~1/3 of the sequences • Mirror: ~2 hrs • Online instructions available, but the data structure is complicated • Henry Ford: 1.1 million lines of source code • Many thanks to Dr. Hayward (Wheaton College CS faculty)
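With a local mirror, pulling a flanking sequence becomes a lookup into the chromosome sequence. It no longer requires a copy-and-paste for each rsID. The minimal Python sketch below makes several assumptions:

- The chr16 sequence is available as a single-record FASTA file.
- SNP positions are 1-based. UCSC's own tables store 0-based starts, so add 1 to those first.
- The function names and the default flank of 25 bases are hypothetical.

```python
import io
from typing import TextIO


def read_fasta_sequence(handle: TextIO) -> str:
    """Concatenate the sequence lines of a single-record FASTA file.

    Header lines ('>') are skipped and bases are upper-cased, so UCSC's
    soft-masked (lower-case) repeat regions look like any other sequence.
    """
    parts = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        parts.append(line.upper())
    return "".join(parts)


def flanking_sequence(chrom_seq: str, pos: int, flank: int = 25) -> str:
    """Return the base at 1-based `pos` in brackets with `flank` bases per side.

    Windows are clipped at the chromosome ends rather than padded.
    """
    if not 1 <= pos <= len(chrom_seq):
        raise ValueError(f"position {pos} is outside 1..{len(chrom_seq)}")
    i = pos - 1  # 1-based coordinate -> 0-based string index
    left = chrom_seq[max(0, i - flank):i]
    right = chrom_seq[i + 1:i + 1 + flank]
    return f"{left}[{chrom_seq[i]}]{right}"


if __name__ == "__main__":
    # Tiny stand-in for a real chr16 FASTA file from the mirror.
    demo = io.StringIO(">chr16\nacgtACGT\nTTGGCCAA\n")
    seq = read_fasta_sequence(demo)
    print(flanking_sequence(seq, 5, flank=3))  # prints: CGT[A]CGT
```

Holding all of chr16 in one string is simple and fits comfortably in memory. It also makes each lookup a constant-time slice.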
Sequences Collected • Graph 1. The distribution of the positively selected SNPs used in the study across human chromosome 16 • Graph 2. The distribution of the non-positively selected SNPs used in the study across human chromosome 16
TRANSFAC • TRANSFAC: a relational database, available via the web as six flat files, containing data on transcription factors, their DNA-binding sites, and target genes • Automation at CUHK
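The slides do not say how the CUHK automation matched flanking sequences against TRANSFAC. One common way to use a TRANSFAC-style count matrix is a log-odds position weight matrix scan. The Python sketch below shows that generic technique, not necessarily what the CUHK scripts did. These settings are assumptions, not project parameters:

- a made-up 4-base matrix
- a uniform background
- a pseudocount of 0.5
- an arbitrary score threshold

```python
import math

# Hypothetical count matrix: one dict of base counts per motif position.
# A real TRANSFAC matrix would come from the licensed flat files instead.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
]


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Turn per-position counts into log2(P(base) / background) scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Yield (0-based start, site, score) for forward-strand windows >= threshold."""
    width = len(matrix)
    seq = sequence.upper()
    for start in range(len(seq) - width + 1):
        site = seq[start:start + width]
        if any(base not in "ACGT" for base in site):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[base] for col, base in zip(matrix, site))
        if score >= threshold:
            yield start, site, score


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS)
    for start, site, score in scan("ttACGTtt", pwm, threshold=4.0):
        print(start, site, round(score, 2))  # prints: 2 ACGT 6.64
```

A fuller version would also scan the reverse complement, since binding sites can sit on either strand.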
PReMod • PReMod: a new database of genome-wide cis-regulatory module (CRM) predictions for the human and mouse genomes • Enter the ranges of the SNP sequences • Look for the same patterns as in the TRANSFAC results
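Checking SNPs against PReMod amounts to asking which predicted CRM intervals contain each SNP position. The small Python sketch below performs that interval lookup. It assumes the CRMs have already been loaded as chr16 intervals in 1-based inclusive coordinates. The `Module` type and the example coordinates are hypothetical:

```python
import bisect
from typing import NamedTuple


class Module(NamedTuple):
    """A predicted cis-regulatory module; coordinates are 1-based, inclusive."""
    start: int
    end: int
    name: str


def build_index(modules):
    """Sort modules by start coordinate and keep the starts for binary search."""
    ordered = sorted(modules, key=lambda m: m.start)
    return ordered, [m.start for m in ordered]


def modules_containing(pos, index):
    """Return every module whose [start, end] range contains `pos`.

    Binary search drops modules that start after `pos`; the remaining
    candidates are then checked linearly against their end coordinate.
    """
    ordered, starts = index
    cutoff = bisect.bisect_right(starts, pos)
    return [m for m in ordered[:cutoff] if m.end >= pos]


if __name__ == "__main__":
    # Hypothetical CRM coordinates, for illustration only.
    index = build_index([
        Module(100, 200, "CRM_a"),
        Module(150, 300, "CRM_b"),
        Module(400, 450, "CRM_c"),
    ])
    for snp_pos in (160, 350, 400):
        hits = [m.name for m in modules_containing(snp_pos, index)]
        print(snp_pos, hits)
    # prints:
    # 160 ['CRM_a', 'CRM_b']
    # 350 []
    # 400 ['CRM_c']
```

For much larger interval sets, an interval tree or a sorted sweep over both lists would avoid the linear candidate check. The simple version keeps the logic easy to verify.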
Analysis • MySQL tables • Programmed scripts: • Word patterns, e.g., keywords and recurring identifiers • Unique entries • Progress statistics • Overlap between non-positively selected (N+) and positively selected (+) SNPs
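The overlap and unique-entry counts can be expressed as a few queries plus set operations. The sketch below uses Python's built-in `sqlite3` as a stand-in for the project's MySQL tables. The following are all hypothetical:

- the `snp_tf` table and its columns
- the selection labels
- the factor names

```python
import sqlite3
from collections import Counter


def load(rows):
    """Build an in-memory table of (rsid, selection, factor) rows.

    sqlite3 stands in for the project's MySQL tables; the schema is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE snp_tf (rsid TEXT, selection TEXT, factor TEXT)")
    conn.executemany("INSERT INTO snp_tf VALUES (?, ?, ?)", rows)
    return conn


def factors_for(conn, selection):
    """Distinct transcription factors linked to SNPs in one selection group."""
    cur = conn.execute(
        "SELECT DISTINCT factor FROM snp_tf WHERE selection = ?", (selection,)
    )
    return {row[0] for row in cur}


def factor_frequencies(conn):
    """How many times each factor recurs across all SNP flanking sequences."""
    return Counter(row[0] for row in conn.execute("SELECT factor FROM snp_tf"))


if __name__ == "__main__":
    # Illustrative rows only; not project data.
    conn = load([
        ("rs0001", "positive", "TF_A"),
        ("rs0002", "positive", "TF_B"),
        ("rs0003", "non-positive", "TF_A"),
        ("rs0004", "non-positive", "TF_C"),
        ("rs0004", "non-positive", "TF_A"),
    ])
    pos = factors_for(conn, "positive")
    non = factors_for(conn, "non-positive")
    print("shared:", sorted(pos & non))            # shared: ['TF_A']
    print("positive only:", sorted(pos - non))     # positive only: ['TF_B']
    print(factor_frequencies(conn).most_common(1))  # [('TF_A', 3)]
```

The same `SELECT DISTINCT ... WHERE selection = ?` queries should run largely unchanged against MySQL, though MySQL drivers typically use `%s` placeholders instead of `?`.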
Results Table 1. A summary of the manual SNP flanking sequence gathering from the UCSC Genome Browser
Conclusions • Not all data are in yet • Possible implications: • Central dogma of biology: information flow • Quantifying natural selection at the genetic level • Views of human complexity • Lesson learned: the value of bioinformatics • High-volume data require computational, not manual, analysis
Acknowledgements • Many thanks to Dr. Pun, for letting me get involved in this project, for his vision and mentorship. • Special thanks to Dr. Hayward, for putting in extra hours unpaid so that a student can follow his dreams of graduate school. • Thanks to our collaborators at the Chinese University of Hong Kong – Dr. Tsui and Mr. Leung – for accessing the TRANSFAC database for us, and for being flexible to the demands of our project. • The most thanks to God, for blessing me with the opportunity to work hard and learn. I pray that I might always be able to do these two things earnestly and voraciously.