
The first Korean Genome Sequence analysis using Bioinformatics

Jong Bhak, 2009-11-20, jongbhak@yahoo.com, Theragen Inc.


Presentation Transcript


  1. The first Korean Genome Sequence analysis using Bioinformatics • Jong Bhak • 2009-11-20 • jongbhak@yahoo.com • Theragen Inc.

  2. Acknowledgement • Gacheon Med. School, LCDI (Lee Gil Ya Cancer and Diabetes Institute) • Dr. Kim Seong-jin, Dr. Ahn Sung-min • KISTI • Dr. Jeong Min-jung • Theragen Inc.

  3. 3 GB Human Genome • 6,000 km (Seoul → Moscow: 6,600 km) • SF → NY (4,100 km) • London → Boston (5,300 km)
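The 6,000 km comparison can be reproduced with a short calculation. The sketch below assumes the genome is printed as text with one character per base and a character width of about 2 mm; that width is an assumption chosen to match the slide's figure, not a value given in the presentation.

```python
# Length of the human genome if printed as text, one character per base.
GENOME_BASES = 3_000_000_000   # about 3 billion bases, as on the slide
MM_PER_BASE = 2.0              # assumed printed character width (not from the slide)


def printed_length_km(bases: int, mm_per_base: float) -> float:
    """Return the printed length in kilometres."""
    return bases * mm_per_base / 1_000_000  # 1 km = 1,000,000 mm


length_km = printed_length_km(GENOME_BASES, MM_PER_BASE)
print(f"{length_km:,.0f} km")
```

With a different assumed character width the total scales linearly, so the city-to-city comparisons are only an order-of-magnitude illustration.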

  4. Current Status & Prediction • The genome era arrived in 2007 to 2008 • Bioinformatics became “industrial” in 2008 • The BioRevolution has started and will have revolutionized bioresearch, medicine and healthcare, industry, and information technology by 2016

  5. 8 Complete Genomes in 2009 • NCBI Reference genome, Caucasian • Craig Venter, Caucasian (publicly available) • James Watson, Caucasian (publicly available) • Nigerian (anonymous), African • HapMap sample → Illumina (publicly available) • YH, Chinese (publicly available) • Kim Seong Jin, Korean (publicly available) • AK1, Korean (data not available by Oct. 2009)

  6. DNA sequencing • First genome sequencing: 1977, Sanger method • Phi X 174 • Mitochondrial genome (1981) • 1998: Theoretically it takes one day to sequence a human genome (Church Lab, Harvard) • Polony based (Church, Knome, Inc.) • 454 • Solexa • Now: Over 2 GB per experiment • Jong Bhak, under openfree BioLicense

  7. Cost • NCBI reference genome: 3,000 million USD • Craig Venter: 100 million USD • James Watson: 1 million USD • YH Chinese: 0.5 million USD • Nigerian African: 0.25 million USD (Illumina) • Kim Sung Jin: 0.25 million USD • Complete Genomics: 0.005 million USD • 2010: 0.001 million USD • 2012: $100 USD?
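To make the scale of the cost drop easier to see, the listed figures can be expressed as fold reductions relative to the NCBI reference genome. This is a minimal sketch that uses only the dollar amounts on the slide; the dictionary and variable names are illustrative, and the projected 2010 and 2012 figures are left out because they are forecasts.

```python
# Per-genome sequencing cost in USD, as listed on the slide.
costs_usd = {
    "NCBI reference genome": 3_000_000_000,
    "Craig Venter": 100_000_000,
    "James Watson": 1_000_000,
    "YH Chinese": 500_000,
    "Nigerian African": 250_000,
    "Kim Sung Jin": 250_000,
    "Complete Genomics": 5_000,
}

baseline = costs_usd["NCBI reference genome"]

# Fold reduction = how many times cheaper than the reference genome.
fold_reduction = {name: baseline / cost for name, cost in costs_usd.items()}

for name, fold in fold_reduction.items():
    print(f"{name:<24}{fold:>12,.0f}x cheaper than the reference genome")
```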

  8. Genome sequencing process

  9. Genomics era? • Full genome sequencing • Full genomics • Individual sequencing cost could be $1,000 or $100 by the year 2013 • However,

  10. Genomics era? • However, useful genomics per person can still cost $10,000 or more • $1,000 genomics → Personal Genome → $0 Genomics • An openfree genomics project

  11. Ome versus Omics graph • [Figure: cost (y-axis, from $3,000,000,000 down to $0, with $50,000 per person marked) against year (x-axis, 2003 to 2016), showing the Ome and Omics balance point]

  12. The most important aspect of Genomics: Variomics • Personal genome comparison is now possible • Personal comparative genomics • Variomics → Genomics

  13. A large pool of variation information • Provides the map for global human genome project(s), which saves money • Supports association studies on all ethnic groups and individuals for phenotyping • Extracts disease association information • Maps everyone's distance to each other (human diversity) • Provides the public with an openfree personal variome analysis package

  14. The Korean Genome

  15. The first Korean Genome (SJK) • First analyzed by Gacheon medical school LCDI and KOBIC, KRIBB in 2008 (joint effort among LCDI, KOBIC, and the National Center for Standard Reference Data) • First annotated and made public on 4 Dec. 2008 (through web and ftp) • To be used as the first National Reference Genome • SNPs, CNVs, and indels were analyzed • Automated phenotypic association study was done • Non-synonymous SNP analysis • Phylogenetic study of mtDNA, Y chromosome, and autosomes showed the Korean relationship to Chinese and Japanese • First intra-Asian genome comparison (Chinese and Korean) • Analyzed at 7.8x, 17.3x, 23.5x, and 28x coverage • By Jan.: sequenced and analyzed at 23.5-fold • Openfreely available from: http://koreagenome.org

  16. The Karyogram of the donor DNA No obvious chromosomal abnormalities!

  17. Korean Full Genome Statistics • Table 1. Summary of data production and mapping to NCBI reference genome • Unmapped reads: 5.97% (Korean-specific or low-quality sequences) • 165,466 km of rice grains • Earth circumference: 40,075.16 km • 4.12 times the Earth's circumference using rice grains

  18. Variation and Variomics

  19. Genetic variants • 0.5% difference • SJK genetic variations • SNPs: 3,439,107 • Indels: 342,965 • Structural variants: 4,298 (2,920 deletions, 415 inversions, 963 insertions)

  20. Experimental evaluation of SJK SNP calls using two genotyping chips • (a) HOM ref.: homozygous genotype for the reference allele • (b) HOM var.: homozygous genotype different from the reference allele • (c) HET ref.: heterozygous genotype with one reference allele • (d) SNP genotypes that were not identical between the two chips were removed (1,903 of the 300,139 markers common to the two chips)
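An evaluation of this kind amounts to comparing sequencing-derived genotypes with chip genotypes at shared markers, broken down by genotype class. The following is a minimal sketch of that comparison; the marker IDs, the two-letter genotype encoding, the extra "HET var" class, and the function names are all hypothetical, and the data are toy values rather than the SJK results.

```python
from collections import Counter


def classify(genotype: str, ref: str) -> str:
    """Classify a diploid genotype such as 'AG' relative to the reference allele."""
    a, b = genotype
    if a == b:
        return "HOM ref" if a == ref else "HOM var"
    # Heterozygous: either one allele is the reference, or neither is.
    return "HET ref" if ref in (a, b) else "HET var"


def concordance(chip: dict, seq: dict, ref: dict) -> dict:
    """Per-class fraction of chip genotypes reproduced by the sequencing calls."""
    total, agree = Counter(), Counter()
    for marker, chip_gt in chip.items():
        if marker not in seq:
            continue  # marker not covered by sequencing; skip it
        cls = classify(chip_gt, ref[marker])
        total[cls] += 1
        # Compare as unordered allele pairs so that 'AG' equals 'GA'.
        if sorted(chip_gt) == sorted(seq[marker]):
            agree[cls] += 1
    return {cls: agree[cls] / total[cls] for cls in total}


# Toy data with hypothetical marker IDs.
ref = {"m1": "A", "m2": "C", "m3": "G", "m4": "T"}
chip = {"m1": "AA", "m2": "CT", "m3": "AA", "m4": "TC"}
seq = {"m1": "AA", "m2": "TC", "m3": "AA", "m4": "TT"}

result = concordance(chip, seq, ref)
print(result)
```

Reporting concordance per class matters because heterozygous sites are typically the hardest to call from sequencing reads, so a single pooled figure can hide where the errors fall.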

  21. Classification and number of intra-genic SNPs • Not represented in dbSNP

  22. Comparison of individual SNPs • (Note to Dr. Park: for the box text, choosing just one of each pair should be enough.) • SJK shared 56% with Yoruba / Korean vs African: 56% • SJK shared 60% with Chinese / Korean vs Chinese: 60% • SJK shared 50% with Venter • SJK shared 53% with Watson • Korean vs Caucasians: 52%
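A sharing percentage like these can be computed as a set intersection over variant calls. The sketch below assumes each SNP is keyed by chromosome, position, and variant allele, and that the denominator is the query genome's own SNP count; the slide does not state the exact definition, so both choices are assumptions, and the data are toy values.

```python
def shared_fraction(query: set, other: set) -> float:
    """Fraction of SNPs in `query` that are also present in `other`."""
    if not query:
        raise ValueError("query SNP set is empty")
    return len(query & other) / len(query)


# Toy SNP sets keyed by (chromosome, position, variant allele).
genome_a = {("chr1", 100, "A"), ("chr1", 200, "G"), ("chr2", 50, "T"), ("chr3", 10, "C")}
genome_b = {("chr1", 100, "A"), ("chr2", 50, "T"), ("chr5", 5, "G")}

fraction = shared_fraction(genome_a, genome_b)
print(f"{fraction:.0%} of genome A's SNPs are shared with genome B")
```

Because the denominator is the query set, the measure is not symmetric: the fraction of genome B's SNPs shared with genome A is generally a different number.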

  23. Korean Genome Variation Browser • Tracks shown for the “NOC2L” gene: SJK's SNPs, HapMap, Watson's SNPs, YH's SNPs, Venter's SNPs • http://koreagenome.org/cgi-bin/gbrowse/kgenome/

  24. SJK’s genetic lineage • Autosomal phylogenetic tree (SJK) • Chromosome Y haplogroup lineage • mtDNA ethno-geographic lineage

  25. What global populations share the most in common with SJK? (34 ethnic groups) • Ethnic group demonstrating system developed by KOBIC

  26. Size distribution and classification of short indels found in SJK • Using MAQ, we identified 342,965 short indels • Of these, only 247 (0.1%) were validated, 113,287 (33.0%) were non-validated, and 229,431 (66.9%) were not found in dbSNP
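The three indel categories can be checked against the stated total and percentages with a few lines of arithmetic. This sketch uses only the counts on the slide; the category labels are shortened for the code, and the variable names are illustrative.

```python
TOTAL_INDELS = 342_965  # short indels identified with MAQ, as on the slide

# Counts per category, as on the slide.
categories = {
    "validated": 247,
    "non-validated": 113_287,
    "not in dbSNP": 229_431,
}

# The three categories should account for every indel.
counts_sum = sum(categories.values())

# Percentage of the total, rounded to one decimal place as on the slide.
percent = {
    name: round(100 * count / TOTAL_INDELS, 1)
    for name, count in categories.items()
}

print(counts_sum == TOTAL_INDELS)
for name, pct in percent.items():
    print(f"{name:<14}{pct:>6.1f}%")
```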

  27. Indels in SJK genic regions

  28. Validation of indels in coding gene by PCR & Sanger sequencing We selected nine coding-region indels and validated them (with 100% success) by using PCR (Polymerase Chain Reaction) amplification and Sanger dideoxy sequencing

  29. Comparison of individual indels • Comparison of the SJK indels (< 4 bp) that overlap with those of the YH, HuRef (Venter), Watson, and NA18507 (Yoruba) genomes • This discrepancy seems to result from the method used rather than from ethnic similarity between SJK and NA18507 (paired-end sequencing was used for both). This may partially explain why HuRef and Watson, which are Caucasian like the NCBI reference, have lower levels (86.2% and 87.8%) of common indels with SJK.

  30. Homo- and heterozygous deletions in KOREF genome (A) Homozygous 2.3 kb genomic deletion and (B) Heterozygous 5 kb genomic deletion.

  31. Detection and identification of structural variants • We found structural variants by using paired-end reads • 2,920 deletions (100 bp to 100 kb) • 415 inversions (100 bp to 100 kb) • 963 insertions (175 bp to 250 bp) • We found deletion SVs in 21 coding genes → all heterozygous deletions
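Paired-end detection of structural variants generally relies on read pairs whose mapped distance or orientation disagrees with what the sequencing library should produce. The sketch below illustrates that general idea only: the expected insert size, the tolerance, the field names, and the classification rules are illustrative assumptions, not the pipeline or thresholds used for SJK.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom: str
    left_pos: int      # mapped start of the upstream mate on the reference
    right_pos: int     # mapped start of the downstream mate on the reference
    left_strand: str   # '+' or '-'
    right_strand: str  # '+' or '-'


def classify_pair(pair: ReadPair, expected_insert: int = 200, tolerance: int = 50) -> str:
    """Label one read pair by comparing it with the expected library geometry."""
    span = pair.right_pos - pair.left_pos
    # Properly paired mates map to opposite strands; same-strand mates
    # suggest that one end lies inside an inverted segment.
    if pair.left_strand == pair.right_strand:
        return "inversion candidate"
    # Mates farther apart on the reference than the library allows:
    # the sample is probably missing sequence between them.
    if span > expected_insert + tolerance:
        return "deletion candidate"
    # Mates closer together than expected: the sample probably carries
    # extra sequence that the reference lacks.
    if span < expected_insert - tolerance:
        return "insertion candidate"
    return "concordant"


pairs = [
    ReadPair("chr1", 1000, 1200, "+", "-"),
    ReadPair("chr1", 1000, 3500, "+", "-"),
    ReadPair("chr1", 1000, 1050, "+", "-"),
    ReadPair("chr1", 1000, 1200, "+", "+"),
]
labels = [classify_pair(p) for p in pairs]
print(labels)
```

In practice a single discordant pair is weak evidence, so callers usually require several pairs supporting the same breakpoints before reporting a variant. The detectable insertion size is also bounded by the library insert size, which is consistent with the narrow insertion range reported above.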

  32. SJK specific structural variants (deletion): 331 (11.3%)

  33. Repeat composition in SJK deletion variants Long Interspersed Nuclear Elements (LINE) Short Interspersed Nuclear Elements (SINE)

  34. Genomics & Bioinformatics in Theragen • Genomics and bioinformatics company • Marker discovery • Drug target & drug screening • Personalized, preventive, predictive medicine • Genomics experiment team + bioinformatics team • Research institute: Gwanggyo Techno Valley, Advanced Institutes of Convergence Technology (2nd floor), near the Dongsuwon IC

  35. Genome Information Data Center • Much experience in handling biodata • World-class bioinformatics and database handling • International network • Experience in maintaining large clusters of CPUs and storage
