160 likes | 284 Views
An Efficient Approach to Extracting Approximate Repeating Patterns in Music Databases. Ning-Han Liu, Yi-Hung Wu, Arbee L.P. Chen. Proceeding of International Conference on Database Systems for Advanced Applications (DASFAA 05). Speaker: Pei-Min Chou Date:2006/4/21. Introduction.
E N D
An Efficient Approach to Extracting Approximate Repeating Patterns in Music Databases Ning-Han Liu, Yi-Hung Wu, Arbee L.P. Chen Proceeding of International Conference on Database Systems for Advanced Applications (DASFAA 05) Speaker: Pei-Min Chou Date:2006/4/21
Introduction • Problem finding all the repeating patterns • exact repeating pattern • Suffix tree • String-join • Similarity pattern • costs a lot of time
Pitch String: 67,64,64, 65,62,62, 60,62,64,65, 67,67,67 Interval String: -3, 0, +1, -3, 0, -2, +2,+2,+1, +2, 0, 0 Definition • Pitch string: P=(p1,p2,…pm), ordered list, |P|=m • Interval string: D=(d1,d2,…dm-1), di=pi+1-pi • Interval segment: S[i:j]=(di,di+1,…,dj) • Constraints: Min_len , Max_len • Ex: String:(a,b,c,d) Min_len=2, Max_len=3 Segment (min_len): (a,b),(b.c),(c,d) Segment (max_len):(a,b,c),(b.c.d)
3 Def. Similar • Edit distance: edit(P,Q)=distance of transform segment P into segment Q • Distance threshold:δP=|P|*γ • |P|:segment length, γ:distance threshold ratio, γ<1 • Similar segment: if edit(P,Q)≤ δP • Q length is at least=|P|- δP • Ex:γ:50% • P=(1,2,3,1,1,2,3,1),Q=(1,3,3,1,2,3,1),edit(P,Q)=2 • δP=8*0.5=4 • edit(P,Q)=2 <4Q is similar to P
Def. Histogram vector • Histogram vector: • D={a1,a2,…an} • HV(S) = <h1S,h2S,….,hnS>, hkS: count of ak • Ex: • Segment S=(1,2,3,1,1,2,3,1) • D={1,2,3} • HV(S)=<4,2,2> • |HV(S)|=8 4 3 2 1 0 1 2 3
Def. Histogram distance • Histogram distance • HD(S1,S2)=max(ins(HV(S1), HV(S2)), ins(HV(S2),HV(S1))) • Ex1: • S1=(1,2,3,1,1,2,3,1) • S2=(1,3,3,1,2,3,1) • S1=S2={1,2,3} • HV(S1)=<4,2,2> ins(HV(S1), HV(S2))=1 • HV(S2)=<3,1,3> ins(HV(S2), HV(S1))=2 • Ex2: • S1=(1,2,3,1,1,2,3,1,3,4,5,3,4,5) • S2=(1,3,1,2,3,1,4,5,2,4,4) • edit(S1,S2)=5, HD(S1,S2)=4 ->HD(S1,S2)=2
Def. ARP • Extension P (Ext(P)) : • pivot P and all similar segments S • Each of segments satisfy overlapping threshold • Ex: P=(a,b,c,d) and min_len=2,max_len=3,δP=1 • segments=((a,b,d)(a,b,c)(b,c,d)(a,b,c,b)(a,b,c,d)) • Support: • The number of segments in an extension= |Ext(P)| • Approximate Repeating Pattern(APR) • |Ext(P)|≥min_sup
Def. Overlap • Overlapping degree: • I[a:b] and J[c:d], a≤c≤b and I similar to J • Overlapping degree=(b-c+1) • Ex:12122212 • Overlapping threshold: • OIJ=min(|I|,|J|)*ρ • ρ:overlapping threshold ratio,0≤ρ≤1 a c b d Degree=(7-4+1)=4
Extraction procedure • Index construction • Candidate generation • Range query formulation • MBR retrieval • Estimation for max number of similar segments • Candidate pruning after HD computations • Candidate pruning after HD computations • ARP extraction
S2 S5,S6 S7 S1,S3 S4 I=(0,0)(2,2) MBR(R1) R1 R3 RM pair={(1:5,2)} R2 Child-p2 I=(1,0)(2,1) I=(0,2)(1,2) R2 R3 RM pair={(1:2,2),(3:5,2)} RM pair={(1:4,2)} S1,S3,S4,S7 S2,S5,S6 Index construction • Construct parametric R*-tree • P(1,2,2,1,1) DP={1,2} • Segment len_2:S1(1,2),S2(2,2),S3(2,1),S4(1,1) • Segment len_3:S5(1,2,2),S6(2,2,1),S7(2,1,1) • Minimal bounding rectangle (MBR) • RM pair :(R,M), R: union range ,M: minimal length Level1 Level2 Segment level Dimention2 Child-p1 Dimention1 S1(1,2)->HV(S1)=<1,1> S2(2,2)->HV(S2)=<0,2>
S2 S5,S6 S7 S1,S3 R1 R3 S4 R2 Candidate generation • γ=0.5 , ρ=0.5 • Range query formulation • S7=P=(2,1,1) , Dp={1,2} , HV(P)=<2,1> • δP=3*0.5=1.5 • Range query=(HV(P), δP)=(<2,1>,1) • MBR retrieval • R1 and R2 included
Estimation for max number of similar segments • MLx=max(|P|-δP,Mx) • Mx :Each RM pair(Rx,Mx) in MBR • Numx:(n-L)/(L-m)+1,n=b-a+1,m=ρ*MLx • NumR: sum max num of MBR • Ex(Cont.) P=(2,1,1) • RM pair in R2=(1:2,2),(3:5,2) • ML1=max(3-1,2)=2, ML2=2 ,m=0.5*2=1 • num1=(2-2)/(2-1)+1=1,num2=1 • numR=1+1=2
m m m m m m m m Numx explain Numx:(n-L)/(L-m)+1 ,n=b-a+1,m=ρ*MLx =((b-a+1-L)/(L-m)) +1 L-m L a b n
Candidate pruning before & after HD computation • Before HD computation • Processed at level above segment level • If numR<min_sup ->prune and go next query • Else (numR≥min_sup)->recursive lower level • After HD computation • Processed at segment level • Compute the Hdistance between P and segment in MBRs. • All segment satisfy δPpermutated and compute max_num • If max_num<min_sup ->prune • Else(max_num≥min_sup) ->candidate ARP and segments
ARP extraction • Given a candidate ARP and its candidate segments • compute the edit distance • If candidate ARP and segments distance> threshold ->remove • Generate all the extention of candidate ARP by consider overlapping threshold • If support of extension < min_sup ->prune • Else Find a ARP
Experiment 1614 12 10 8 6 4 2 0 14 12 10 8 6 4 2 0 20 18 16 14 12 10 8 6 4 Min_len (interval number) 10 12 14 16 18 20 22 24 26 28 30 Max_len (interval number) 181614 12 10 8 6 4 2 0 20 15 10 5 0 0 5 10 15 20 25 30 35 40 45 50 distance threshold(%) 10 9 8 7 6 5 4 3 2 min_sup