Modeling of Spliceosome 김동민 이경준 임종윤
Gene Finding • Transcription is a multi-step process • Not: one long sequence handled in a single pass (X) • Rather: several steps, much as many enzymes act in turn (O) • Signals: promoter, 3'-processing, splice site, coding exon
Splicing Site • GT-AG : 99.24%, GC-AG : 0.69%, AT-AC : 0.05% (Burset et al., 2000) • Site recognition (Chiara et al., 1996) • 25-base upstream of GT splice • GT, AG splice site • branchpoint sequence
Problem Description • Input: a binary-encoded sequence of a given window size, centered on a GT or AG site • Output: whether that site is an exon-intron splice site • Modeling of spliceosome
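The candidate-site step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `candidate_windows` and the `flank` parameter are hypothetical, and the default of 40 flanking bases is taken from the Training Data slide.

```python
def candidate_windows(seq, dinucleotide, flank=40):
    """Yield (position, window) for every occurrence of `dinucleotide`
    that has `flank` bases available on both sides.

    `flank=40` follows the Training Data slide; the function itself is
    a hypothetical helper written for illustration.
    """
    seq = seq.upper()
    start = seq.find(dinucleotide)
    while start != -1:
        left = start - flank
        right = start + len(dinucleotide) + flank
        # Skip sites too close to either end of the sequence.
        if left >= 0 and right <= len(seq):
            yield start, seq[left:right]
        # Continue searching one base further so overlapping hits are found.
        start = seq.find(dinucleotide, start + 1)


# Usage: every GT is a donor candidate, every AG an acceptor candidate.
example = "A" * 40 + "GT" + "C" * 40
donor_candidates = list(candidate_windows(example, "GT"))
```

Each window is then handed to the classifier, which decides whether the central GT or AG is a real splice site.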
Training Data • UCSC data • 40 bases on each side of GT and AG
            Correct   False
Donor       1149      3813
Acceptor    1143      6021
Parameter Values • input nodes: 328 • hidden nodes: 70 • output nodes: 1 • learning rate: 10 • slope parameter: 0.02 (sigmoid activation function)
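The layer sizes and slope parameter above can be put together in a forward pass like the one below. This is a sketch under assumptions: the slides do not say how the slope enters the sigmoid, so the common form f(x) = 1 / (1 + exp(-a·x)) is assumed; the bias terms, the random weight range, and all function names are hypothetical. Training with the listed learning rate is not shown.

```python
import math
import random


def sigmoid(x, slope=0.02):
    # Assumed form: the slope parameter scales the input of the logistic function.
    return 1.0 / (1.0 + math.exp(-slope * x))


def init_layer(n_in, n_out, rng):
    # One weight row per output unit; the last element of each row is a bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def layer_forward(weights, inputs, slope):
    outputs = []
    for row in weights:
        # zip stops at len(inputs), so the trailing bias is added separately.
        total = row[-1] + sum(w * x for w, x in zip(row, inputs))
        outputs.append(sigmoid(total, slope))
    return outputs


def predict(hidden_w, output_w, inputs, slope=0.02):
    hidden = layer_forward(hidden_w, inputs, slope)
    return layer_forward(output_w, hidden, slope)[0]


rng = random.Random(0)                 # fixed seed, for repeatability only
hidden_w = init_layer(328, 70, rng)    # 328 inputs -> 70 hidden nodes
output_w = init_layer(70, 1, rng)      # 70 hidden nodes -> 1 output node
score = predict(hidden_w, output_w, [0] * 328)
```

The single output can be read as a score between 0 and 1 for "true splice site"; the decision threshold is not given in the slides.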
Prediction Rate • Donor site: 96.33% • Acceptor site: 95.25%
HMM architecture [Figure: HMM state diagram with states labeled S, E, m_i, d_i, i_i]
HMM architecture (2) • The number of states • The number of distinct observation symbols per state • The state transition probability distribution • The observation symbol probability distribution in each state • The initial state distribution
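The five elements listed above map directly onto a small data structure. The sketch below is illustrative only: the class and field names are hypothetical, the two-state toy model and its probabilities are made up, and it is not the architecture used in the slides. The forward algorithm is included to show how the five elements combine to score an observation sequence.

```python
from dataclasses import dataclass


@dataclass
class HMM:
    n_states: int      # number of states
    n_symbols: int     # number of distinct observation symbols per state
    transition: list   # transition[i][j] = P(next state j | current state i)
    emission: list     # emission[i][k] = P(symbol k | state i)
    initial: list      # initial[i] = P(first state is i)


def forward_probability(model, observations):
    """Return P(observations | model) using the forward algorithm.

    `observations` is a non-empty list of symbol indices.
    """
    states = range(model.n_states)
    # Initialisation: start in state i and emit the first symbol.
    alpha = [model.initial[i] * model.emission[i][observations[0]]
             for i in states]
    # Induction: move to state j, then emit the next symbol.
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * model.transition[i][j] for i in states)
            * model.emission[j][symbol]
            for j in states
        ]
    return sum(alpha)


# Hypothetical toy model over the symbols A, C, G, T (indices 0..3).
toy = HMM(
    n_states=2,
    n_symbols=4,
    transition=[[0.9, 0.1], [0.2, 0.8]],
    emission=[[0.25, 0.25, 0.25, 0.25], [0.1, 0.4, 0.4, 0.1]],
    initial=[0.5, 0.5],
)
p_gt = forward_probability(toy, [2, 3])   # probability of observing "GT"
```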