Position-dependent motif characterization using non-negative matrix factorization (NMF). In collaboration with: Thomas Blumenthal, University of Colorado; David Kulp, University of Massachusetts. Joel H Graber, Lucie N. Hutchins, Erik McCarthy, Sean Murphy, Priyam Singh.
Position-dependent motif characterization using non-negative matrix factorization (NMF)

In collaboration with:
Thomas Blumenthal, University of Colorado
David Kulp, University of Massachusetts

Joel H Graber
Lucie N. Hutchins, Erik McCarthy, Sean Murphy, Priyam Singh
The Jackson Laboratory

Funding sources
Current: NIH GM 072706, NIH HD037102
Previous: NIH RR 16463 (INBRE-Maine), NSF 2010 Project DBI 0331497
Motifs are often constrained in positioning

Figure labels: PWC matrix (M sequence words x N positions, entries are counts); functional site.

Example sequences:
AUGCACAUAGAGGCAAUUGUGUAUCAAUAUUAAAAAUAAAGUAAAACUUA AAGCAUGUGUAGACCGUGUG
AUGAAUCCUUGUAUAAGCAACUGCCAAUGAAAUCGGGCUCGCUGUGGUCA UCCGUGAGUGCUUAUCAUUC
UGGUAAUACCGUGGUCUAUUUAUACAAAUAUUAAAAGUGCUGUUUAUAGA GCCUGUGUCAUGUGGCAACU
UCCUGUGUCAUGACCUCAGGAAAUAAAUUUCCUUGACUUUAUAAAAGCCA AAACGUUUGCCCUCUUCCUU
GGAAUUUGAAAUUACUCCAAUUUAAAAUAAAUUACUGGACUGUGGAAAUA ACAUGUAGAAUUGCAGUUUU
ACACUGUAACAGUUGCUUCUGCCUACCUUAUAAAUAAAGAAUCACUAAGA AAAAGAGUUCUCAGGUCUCC
CUGAGCUCAGACUGAGGGGAAACGGAGGCAAAUAAAGCUGAGUUUUGAGA ACUCGGUGGCCUGUGUUCCU
AGCCUGUACUCACCCCUUCCCUUAAUAAUAAUAAAACAACAACUUUGUGA AUUUGAGUUUUCCUUAGAGC
UCAACAGAUCAUAUUCAGUGUCUUGAAUAAAUUGCUCUAUUUUGAUAUUA GAGAACAUAGUGACUGUGUU
UGGUACGAUUAUUUUUUUUAACUAAAAUGAGAUAAAAUUCUAUAUUCUUA UGUGUGUGUGGUUUUUGAUG
GGUGAAACUGUCUCAAUUUGAAUAAAUAUUUUUAUUGCAAUUCUGAACCA AUUUUAAAAGAAAAGAUACA
AAUGUCCUUCCAAAUAGAGCCUUUUUAUUAAUAAAGGGCCUUGUACUUCA CUUGGAACAAAGGACGUUUC
AUUUCAUUGUGUUAAAUGUAUACUUGUAAAUAAAAUAGCUGCAAACCUUA AGCCUUUGAGCUACUUGGUG
UAUCUCACUCGGUAUUACGUGCUCUGCAAUAGAAGUUGGUGUGAACAUUC CCAGGUGACAUGCAGUGUUA
CCACCACCCCUCCAUCAGUAAGCCACUAAUAAAGUGCAUCUAUGCAGCCA CAGGUCUGUCUGCCUCUUUU
GGCUGGGCACCUUAAAAGAGAAGUCAAUAAACUGGGCUACACAGUACUUA AAACGCUGAACUGGCUAAGA
UGUGUAUUUAUGAAUAUUAAUGAAUAAAAACUGCUUGGAUGGUUUACCUA ACUACUGCAUGAGGUUUUUU
UCCUUUCUUUUCUCUCCACUCAAUAAAUACUUUAAAGCACAUUUGGAAUA AAGGAAGAGACUUUUAAGUG
GUGCUUAAUGAUAAGGUUUUGACUUGUUAAAUUAAACCAUUUGGAAUAUA UUGUGUGUUUGUAGUAGUCA
GUGCCUUUGUUUGUAAACCAAAAAGUAAUAAAUGAAUCCCUAUAUUUCUA UUAUAGCAUCUAUUGUAUUU
AAUAUAGUAUUUUAUUUAAGAAAAUAAACUUUGCAGUUUUUGCAUUGUGA AUUCUCUCUCUUCCCGCCCA
CUGCCAUGAAAAAUGUUGUUUAUGGAAUAAAAAAAAUGUAACUGCCUUUA AAUUUCCUGGUGGCUGUGUU
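To make the PWC matrix concrete, here is a minimal sketch of how such a table could be counted from aligned sequences: one row per sequence word (k-mer), one column per start position, each entry the number of sequences carrying that word at that position. The function name `pwc_matrix`, the word length `k=4`, and the three toy sequences are assumptions for illustration and are not taken from the slides.

```python
from itertools import product

def pwc_matrix(sequences, k=4, alphabet="ACGU"):
    """Count how often each k-mer starts at each position of aligned sequences."""
    words = ["".join(p) for p in product(alphabet, repeat=k)]  # all M = 4**k words
    row = {w: i for i, w in enumerate(words)}
    n_pos = min(len(s) for s in sequences) - k + 1             # N start positions
    counts = [[0] * n_pos for _ in words]                      # M x N table of zeros
    for s in sequences:
        for j in range(n_pos):
            w = s[j:j + k]
            if w in row:                                       # skip words with other symbols
                counts[row[w]][j] += 1
    return words, counts

# Hypothetical toy input: three aligned sequences sharing AAUAAA at one offset.
seqs = ["GCAAUAAAGU", "UCAAUAAACC", "AGAAUAAAUG"]
words, counts = pwc_matrix(seqs, k=4)

# A positionally constrained word shows up as a single sharp peak in its row.
print("AAUA", counts[words.index("AAUA")])
```

A word tied to a functional site concentrates its counts in a few columns, while an unconstrained word spreads them across the row. That contrast is the signal the decomposition is meant to pick up.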
NMF decomposes the PWC matrix into characteristic patterns (motifs)

Counts (M x N) ≈ Bases (M x r) x Weights (r x N)

W_ik = weight of the ith word in the kth motif (content)
H_kj = abundance of the kth motif at the jth position (positioning)
r = number of basis functions (patterns)
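The decomposition above can be sketched with the standard multiplicative-update rules for the squared-error objective. The slides do not say which NMF algorithm or software was used, so this is an assumed stand-in. The function `nmf`, its parameters, and the toy matrices are all hypothetical.

```python
import numpy as np

def nmf(V, r, n_iter=300, seed=0, eps=1e-9):
    """Approximate V (M x N) by W (M x r) @ H (r x N) with non-negative entries."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W = rng.random((M, r)) + eps          # word content of each motif
    H = rng.random((r, N)) + eps          # positional abundance of each motif
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative at every step.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy counts: 4 words, 6 positions, built from 2 known patterns.
W_true = np.array([[3, 0], [2, 0], [0, 4], [0, 1]], dtype=float)
H_true = np.array([[1, 2, 1, 0, 0, 0], [0, 0, 0, 1, 3, 1]], dtype=float)
V = W_true @ H_true

W, H = nmf(V, r=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_err)
```

In the notation of the slide, column k of `W` lists which words make up motif k, and row k of `H` shows where along the sequence that motif is abundant. The factors are only determined up to scaling and ordering of the r patterns.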
RSS provides a robust estimate for the optimal number of vectors (r)

Figure panels: Test matrix 1; Test matrix 2; Human polyA sequences; Artificial sequences
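One way to read this slide is to fit the factorization for a range of r and track the residual sum of squares (RSS) of the reconstruction. The slide does not state how the optimal r is read off the curve. A common heuristic, assumed here, is to look for the value of r beyond which RSS stops dropping sharply. The helper `nmf_rss`, the update rule, and the synthetic matrix with three planted patterns are illustrative assumptions.

```python
import numpy as np

def nmf_rss(V, r, n_iter=500, seed=0, eps=1e-9):
    """Fit a rank-r NMF by multiplicative updates and return the RSS of V - W @ H."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W = rng.random((M, r)) + eps
    H = rng.random((r, N)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return float(np.sum((V - W @ H) ** 2))

# Hypothetical test matrix: 9 words x 12 positions with 3 planted patterns plus noise.
rng = np.random.default_rng(1)
W0 = np.kron(np.eye(3), np.ones((3, 1)))   # each pattern uses its own 3 words
H0 = np.kron(np.eye(3), np.ones((1, 4)))   # each pattern occupies its own 4 positions
V = 10.0 * W0 @ H0 + 0.1 * rng.random((9, 12))

# RSS as a function of the number of basis vectors r.
rss = {r: nmf_rss(V, r) for r in range(1, 7)}
for r, value in rss.items():
    print(r, round(value, 3))
```

On a matrix like this one, the curve would be expected to fall steeply up to the planted number of patterns and change little afterwards. Real count data will usually give a softer bend.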
NMF can characterize complex control sequences

Figure panels: Mouse 3’-processing sequences; Human transcription start sites