This article discusses data issues in GIS-PET transcriptomics analysis, including library construction and quality control. It also covers PET extraction and mapping for genome analysis.
U54 Transcriptomics Analysis by GIS-PET (Data Issues Discussion)
Atif Shahab, Xiaoan Ruan, and Yijun Ruan (Genome Institute of Singapore)
March 13-14, 2008, Barcelona, Spain
GIS-PET: Library Construction
GIS-PET = Gene-Identification-Signature Paired-End diTag
flcDNA library construction steps (starting from PolyA+ mRNA):
• RT to make 1st-strand flcDNA (-)
• Biotinylation at end
• Hydrolytic digest of bound RNA
• Release 1st-strand flcDNA (-)
• Synthesize 2nd-strand flcDNA with N/B/M adaptor sequence
• Digest with GsuI to remove poly(A) tail
• Add 3' adaptor M/B
• Digest with NotI
• Clone flcDNA into pGIS vector
Result: pGIS-flcDNA library (>10E+7 CFU)
[Diagram labels: Biotin; GsuI; +5me-dCTP; (A)16-(T)16; NNNN-(T)16; flsscDNA; fldscDNA with N/B/M and M/B/S adaptors]
Prostate long PolyA+ flcDNA library QC:
[Gel image: size markers at 4 kb and 1 kb]
• flcDNA library colony-PCR QC:
  • 22 colonies showed PCR products of variable size
  • Inserts averaged ~2 kb
• DNA sequencing QC:
  • Randomly picked colonies (96-well) were sequenced by capillary sequencing
  • 5' and 3' end sequences were obtained and mapped against the UCSC human genome browser
  • Of 67 clones with good-quality sequence, 62 aligned to the boundaries of the matched gene models in the browser (Hg18, March 2006)
  • This gives a full-length cDNA ratio of 92% for the colonies examined
Library construction cont.: Single-PET & diPET libraries
Starting from the pGIS-flcDNA library (>10E+7 CFU):
• Digest with MmeI to create PET (20 bp retained at each end of the fldscDNA clone)
• Re-ligation & transformation
Result: pGIS Single-PET library (>10E+8 CFU)
• Restriction cut (S, B) to release PET
• diPETting to create diPET
diPET: two PETs (18 bp and 16 bp tags each) joined by the spacer sequence GTCGGATCCGAC
diPET sequencing structure:
• diPET: two PETs (18 bp and 16 bp tags each) flanking the spacer sequence GTCGGATCCGAC
• GSFLX-454: diPET flanked by 454-adaptor-A and 454-adaptor-B
• Solexa: diPET flanked by Solexa-A and Solexa-B
[Diagram labels: the two PETs in a diPET are marked Gene A and Gene B, each with its 5' and 3' tag]
Library Summary
• No. of diPETs: 619060
• No. of PETs: 955581
• No. of unique PETs: 676570
• No. of non-mappable: 314600
• No. of mappable: 361970
• No. of unique mappings: 288230
• No. of mappings (2-10): 62747
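The counts above can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the reported numbers. The variable names are illustrative, and reading the leftover mappable PETs as those with more than 10 mappings is an assumption that the slide does not state.

```python
# Counts copied from the Library Summary slide.
pets = 955581
unique_pets = 676570
non_mappable = 314600
mappable = 361970
unique_mappings = 288230
multi_2_to_10 = 62747

# Unique PETs split exactly into mappable and non-mappable.
assert mappable + non_mappable == unique_pets

# Share of PETs left after removing redundancy.
unique_frac = unique_pets / pets

# Share of unique PETs that map, and share of those mapping to one location.
mappable_frac = mappable / unique_pets
single_hit_frac = unique_mappings / mappable

# Mappable PETs not covered by the two listed categories.
# Assumption: these are PETs with more than 10 mappings.
remainder = mappable - unique_mappings - multi_2_to_10

print(f"unique after redundancy removal: {unique_frac:.1%}")
print(f"mappable among unique PETs: {mappable_frac:.1%}")
print(f"single-location among mappable: {single_hit_frac:.1%}")
print(f"mappable PETs outside both listed categories: {remainder}")
```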
Sequence Data
• diPET
  • (16bp3')(18bp5')[12bp linker](18bp5')(16bp3')
• FASTA output
>DHP001_454S_FullAnalysisOFF_000061_1518_1923 length=80 uaccno=E6NCSJP01DZLED
CACTATGTACAAAACGGTCCGCGCGGCGCAGTCGTCGGATCCGACGGGGAGCGGGCGGCGGCGTAGCACAGCTGGCTGAG
>DHP001_454S_FullAnalysisOFF_000093_1619_2463 length=80 uaccno=E6NCSJP01D8G0X
GGTTTGCTAATTGCTGACTCCAGAGTTGTATCCGTCGGATCCGACGGCAGGTTCTCTTACATCATTCCCTGTCTTAAACG
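To make the diPET layout concrete, here is a minimal sketch that splits a read at the 12 bp spacer into its two PETs. It assumes an exact spacer match and ignores sequencing errors and read orientation; the function name `split_dipet` is hypothetical and is not taken from the GIS pipeline.

```python
SPACER = "GTCGGATCCGAC"  # 12 bp spacer sequence shown on the slides

def split_dipet(read, spacer=SPACER):
    """Return (left_pet, right_pet), or None if the spacer is not found."""
    i = read.find(spacer)
    if i < 0:
        return None
    return read[:i], read[i + len(spacer):]

# The two example reads from the FASTA output, keyed by accession number.
reads = {
    "E6NCSJP01DZLED": "CACTATGTACAAAACGGTCCGCGCGGCGCAGTCGTCGGATCCGAC"
                      "GGGGAGCGGGCGGCGGCGTAGCACAGCTGGCTGAG",
    "E6NCSJP01D8G0X": "GGTTTGCTAATTGCTGACTCCAGAGTTGTATCCGTCGGATCCGAC"
                      "GGCAGGTTCTCTTACATCATTCCCTGTCTTAAACG",
}

# Length of each flank for every read that contains the spacer.
pet_lengths = {}
for name, seq in reads.items():
    parts = split_dipet(seq)
    if parts is not None:
        left, right = parts
        pet_lengths[name] = (len(left), len(right))
        print(name, left, right, len(left), len(right))
```

In both example reads the spacer starts at offset 33, so the flanks come out as 33 and 35 bases rather than the nominal 34 each. A production extractor would likely need to tolerate such length variation and inexact spacer matches.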
PET extraction and mapping
• PET
  • (16bp3')(18bp5')
• Remove redundancy
• Extract 5' and 3' ends
• Map the ends to the genome
>DHP001fr-U_12893_COUNT:2
AGAAACCAAGTTCCCCGGCATGCAGAGATCAGAG
>DHP001fr-U_12894_COUNT:1
AGAAACCAAGTTCCCCGGCATGCAGAGATCAGG
1,19(19) 20,33(14) - chr14:60185955-60182507 3449
>DHP001fr-U_12880_COUNT:2
AGAAACAGCCAAAGGGGAACAACAGCTGAAGCAC
1,18(18) 18,35(18) - chr15:70286174-70278424 7751
1,16(16) 18,31(14) + chr6:5918738-5919858 1121
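The mapping lines above can be read mechanically. The sketch below parses them under an assumed layout inferred only from these three examples: two tag fields of the form start,end(length), a strand, a chrom:pos-pos locus, and a span. The field meanings, the regular expression, and the name `parse_mapping` are assumptions, not a documented format.

```python
import re

# Assumed layout, inferred from the example lines:
# "<s>,<e>(<len>) <s>,<e>(<len>) <strand> <chrom>:<pos>-<pos> <span>"
LINE = re.compile(
    r"(\d+),(\d+)\((\d+)\)\s+(\d+),(\d+)\((\d+)\)\s+([+-])\s+"
    r"(\w+):(\d+)-(\d+)\s+(\d+)"
)

def parse_mapping(line):
    """Parse one mapping line into a dict; raise ValueError if it does not fit."""
    m = LINE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognized mapping line: {line!r}")
    g = m.groups()
    a, b = int(g[8]), int(g[9])
    return {
        "tag1": (int(g[0]), int(g[1]), int(g[2])),
        "tag2": (int(g[3]), int(g[4]), int(g[5])),
        "strand": g[6],
        "chrom": g[7],
        "start": min(a, b),  # smaller genomic coordinate
        "end": max(a, b),    # larger genomic coordinate
        "span": int(g[10]),
    }

examples = [
    "1,19(19) 20,33(14) - chr14:60185955-60182507 3449",
    "1,18(18) 18,35(18) - chr15:70286174-70278424 7751",
    "1,16(16) 18,31(14) + chr6:5918738-5919858 1121",
]

hits = [parse_mapping(line) for line in examples]
for hit in hits:
    # In all three examples the span is the inclusive distance between the coordinates.
    assert hit["end"] - hit["start"] + 1 == hit["span"]
    print(hit["chrom"], hit["start"], hit["end"], hit["strand"], hit["span"])
```

Note that on the minus-strand lines the larger coordinate is listed first, which is why the sketch normalizes to start and end before checking the span.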
Data Publishing
• T2G for GIS
  • UCSC based system
• ENCODE/UCSC
  • BED file format
• ENCODE U54?
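For the BED option, a PET mapping would have to be converted to BED's 0-based, half-open coordinates. The sketch below shows one way to do that for a single mapping, assuming the mapped coordinates are 1-based and inclusive (consistent with the spans on the mapping slide, though not stated there). The function name `to_bed`, the choice of six columns, and the score of 0 are illustrative assumptions.

```python
def to_bed(chrom, pos_a, pos_b, strand, name, score=0):
    """Format one mapping as a 6-column BED line.

    Assumes pos_a and pos_b are 1-based inclusive coordinates in either order.
    BED uses a 0-based start and an exclusive end.
    """
    start = min(pos_a, pos_b)
    end = max(pos_a, pos_b)
    fields = [chrom, str(start - 1), str(end), name, str(score), strand]
    return "\t".join(fields)

# Example: the chr14 minus-strand mapping, labeled with the PET id listed next to it.
bed_line = to_bed("chr14", 60185955, 60182507, "-", "DHP001fr-U_12894")
print(bed_line)
```

With this conversion, chromEnd minus chromStart equals the 3449 bp span reported for that mapping.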
Cell-Free Approach for GIS-PET Library Construction (cont.)
• fldscDNA circularization
• EcoP15I cut to release PET
• Add sequence adaptor
• High throughput sequencing
[Diagram labels: dscDNA; fldscDNA and PET flanked by E sites]
Cell-Free Approach for GIS-PET Library Construction
Starting from PolyA+ mRNA:
• RT to make 1st-strand flcDNA (-)
• Biotinylation at end
• Hydrolytic digest of bound RNA
• Release 1st-strand flcDNA (-)
• Synthesize 2nd-strand flcDNA with adaptor sequence
• Digest with GsuI to remove poly(A) tail
Result: fldscDNA flanked by E sites
[Diagram labels: Biotin; GsuI; +5me-dCTP; (A)-(T)16; NNNN-(T)16]