1 / 16

An Efficient Approach to Extracting Approximate Repeating Patterns in Music Databases

An Efficient Approach to Extracting Approximate Repeating Patterns in Music Databases. Ning-Han Liu, Yi-Hung Wu, Arbee L.P. Chen. Proceeding of International Conference on Database Systems for Advanced Applications (DASFAA 05). Speaker: Pei-Min Chou Date:2006/4/21. Introduction.

uri
Download Presentation

An Efficient Approach to Extracting Approximate Repeating Patterns in Music Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. An Efficient Approach to Extracting Approximate Repeating Patterns in Music Databases Ning-Han Liu, Yi-Hung Wu, Arbee L.P. Chen Proceeding of International Conference on Database Systems for Advanced Applications (DASFAA 05) Speaker: Pei-Min Chou Date:2006/4/21

  2. Introduction • Problem finding all the repeating patterns • exact repeating pattern • Suffix tree • String-join • Similarity pattern • costs a lot of time

  3. Pitch String: 67,64,64, 65,62,62, 60,62,64,65, 67,67,67 Interval String: -3, 0, +1, -3, 0, -2, +2,+2,+1, +2, 0, 0 Definition • Pitch string: P=(p1,p2,…pm), ordered list, |P|=m • Interval string: D=(d1,d2,…dm-1), di=pi+1-pi • Interval segment: S[i:j]=(di,di+1,…,dj) • Constraints: Min_len , Max_len • Ex: String:(a,b,c,d) Min_len=2, Max_len=3 Segment (min_len): (a,b),(b.c),(c,d) Segment (max_len):(a,b,c),(b.c.d)

  4. 3 Def. Similar • Edit distance: edit(P,Q)=distance of transform segment P into segment Q • Distance threshold:δP=|P|*γ • |P|:segment length, γ:distance threshold ratio, γ<1 • Similar segment: if edit(P,Q)≤ δP • Q length is at least=|P|- δP • Ex:γ:50% • P=(1,2,3,1,1,2,3,1),Q=(1,3,3,1,2,3,1),edit(P,Q)=2 • δP=8*0.5=4 • edit(P,Q)=2 <4Q is similar to P

  5. Def. Histogram vector • Histogram vector: • D={a1,a2,…an} • HV(S) = <h1S,h2S,….,hnS>, hkS: count of ak • Ex: • Segment S=(1,2,3,1,1,2,3,1) • D={1,2,3} • HV(S)=<4,2,2> • |HV(S)|=8 4 3 2 1 0 1 2 3

  6. Def. Histogram distance • Histogram distance • HD(S1,S2)=max(ins(HV(S1), HV(S2)), ins(HV(S2),HV(S1))) • Ex1: • S1=(1,2,3,1,1,2,3,1) • S2=(1,3,3,1,2,3,1) • S1=S2={1,2,3} • HV(S1)=<4,2,2> ins(HV(S1), HV(S2))=1 • HV(S2)=<3,1,3> ins(HV(S2), HV(S1))=2 • Ex2: • S1=(1,2,3,1,1,2,3,1,3,4,5,3,4,5) • S2=(1,3,1,2,3,1,4,5,2,4,4) • edit(S1,S2)=5, HD(S1,S2)=4 ->HD(S1,S2)=2

  7. Def. ARP • Extension P (Ext(P)) : • pivot P and all similar segments S • Each of segments satisfy overlapping threshold • Ex: P=(a,b,c,d) and min_len=2,max_len=3,δP=1 • segments=((a,b,d)(a,b,c)(b,c,d)(a,b,c,b)(a,b,c,d)) • Support: • The number of segments in an extension= |Ext(P)| • Approximate Repeating Pattern(APR) • |Ext(P)|≥min_sup

  8. Def. Overlap • Overlapping degree: • I[a:b] and J[c:d], a≤c≤b and I similar to J • Overlapping degree=(b-c+1) • Ex:12122212 • Overlapping threshold: • OIJ=min(|I|,|J|)*ρ • ρ:overlapping threshold ratio,0≤ρ≤1 a c b d Degree=(7-4+1)=4

  9. Extraction procedure • Index construction • Candidate generation • Range query formulation • MBR retrieval • Estimation for max number of similar segments • Candidate pruning after HD computations • Candidate pruning after HD computations • ARP extraction

  10. S2 S5,S6 S7 S1,S3 S4 I=(0,0)(2,2) MBR(R1) R1 R3 RM pair={(1:5,2)} R2 Child-p2 I=(1,0)(2,1) I=(0,2)(1,2) R2 R3 RM pair={(1:2,2),(3:5,2)} RM pair={(1:4,2)} S1,S3,S4,S7 S2,S5,S6 Index construction • Construct parametric R*-tree • P(1,2,2,1,1) DP={1,2} • Segment len_2:S1(1,2),S2(2,2),S3(2,1),S4(1,1) • Segment len_3:S5(1,2,2),S6(2,2,1),S7(2,1,1) • Minimal bounding rectangle (MBR) • RM pair :(R,M), R: union range ,M: minimal length Level1 Level2 Segment level Dimention2 Child-p1 Dimention1 S1(1,2)->HV(S1)=<1,1> S2(2,2)->HV(S2)=<0,2>

  11. S2 S5,S6 S7 S1,S3 R1 R3 S4 R2 Candidate generation • γ=0.5 , ρ=0.5 • Range query formulation • S7=P=(2,1,1) , Dp={1,2} , HV(P)=<2,1> • δP=3*0.5=1.5 • Range query=(HV(P), δP)=(<2,1>,1) • MBR retrieval • R1 and R2 included

  12. Estimation for max number of similar segments • MLx=max(|P|-δP,Mx) • Mx :Each RM pair(Rx,Mx) in MBR • Numx:(n-L)/(L-m)+1,n=b-a+1,m=ρ*MLx • NumR: sum max num of MBR • Ex(Cont.) P=(2,1,1) • RM pair in R2=(1:2,2),(3:5,2) • ML1=max(3-1,2)=2, ML2=2 ,m=0.5*2=1 • num1=(2-2)/(2-1)+1=1,num2=1 • numR=1+1=2

  13. m m m m m m m m Numx explain Numx:(n-L)/(L-m)+1 ,n=b-a+1,m=ρ*MLx =((b-a+1-L)/(L-m)) +1 L-m L a b n

  14. Candidate pruning before & after HD computation • Before HD computation • Processed at level above segment level • If numR<min_sup ->prune and go next query • Else (numR≥min_sup)->recursive lower level • After HD computation • Processed at segment level • Compute the Hdistance between P and segment in MBRs. • All segment satisfy δPpermutated and compute max_num • If max_num<min_sup ->prune • Else(max_num≥min_sup) ->candidate ARP and segments

  15. ARP extraction • Given a candidate ARP and its candidate segments • compute the edit distance • If candidate ARP and segments distance> threshold ->remove • Generate all the extention of candidate ARP by consider overlapping threshold • If support of extension < min_sup ->prune • Else Find a ARP

  16. Experiment 1614 12 10 8 6 4 2 0 14 12 10 8 6 4 2 0 20 18 16 14 12 10 8 6 4 Min_len (interval number) 10 12 14 16 18 20 22 24 26 28 30 Max_len (interval number) 181614 12 10 8 6 4 2 0 20 15 10 5 0 0 5 10 15 20 25 30 35 40 45 50 distance threshold(%) 10 9 8 7 6 5 4 3 2 min_sup

More Related