LANGUAGE AND INTELLIGENCE
UNIVERSITY OF PISA
DEPARTMENT OF COMPUTER SCIENCE
Automatic Transcription of Piano Music
Sara Corfini
INTRODUCTION
• Transcribing recordings of piano music into a MIDI representation
• MIDI provides a compact representation of musical data
• Score-following for computer-human interactive performance
• A “signal-to-score” problem
• A hidden Markov model approach to piano music transcription
• A “state of nature” can be realized through a wide range of data configurations
• Probabilistic data representation
• Automatically learning this probabilistic relationship is more flexible than optimizing a particular model by hand
• Rules describing the musical structure are more accurately represented as tendencies
THE MODEL
• The acoustic signal is segmented into a sequence of frames (“snapshots” of sound)
• For each frame a feature vector is computed, giving a sequence y1,…,yN
• Goal: assign a label to each frame describing its content
• A generative probabilistic framework (a hidden Markov model)
  • outputs the observed sequence of feature vectors y1,…,yN
  • the hidden variables are the labels
• A hidden Markov model is composed of two processes, X = X1,…,XN and Y = Y1,…,YN (see the sketch below)
• X is the hidden (or label) process and describes the way a sequence of frame labels can evolve (a Markov chain)
• We do not observe the X process directly, but rather the feature vector data
• The likelihood of a given feature vector depends only on the corresponding label
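A minimal sketch of the factorization these assumptions imply: the joint probability of a labeling and the observed feature vectors splits into the Markov chain over labels and a per-frame output term. The names log_init, log_trans and log_emit are illustrative containers, not code from the paper.

```python
def joint_log_likelihood(labels, feats, log_init, log_trans, log_emit):
    """log P(X = labels, Y = feats) under the HMM factorization
    P(x1) * prod_n p(x_n | x_{n-1}) * prod_n p(y_n | x_n).

    log_init[x]      : log P(X1 = x)
    log_trans[x][x'] : log p(x' | x)   (the transition probability matrix)
    log_emit(y, x)   : log p(y | x)    (the output distribution, a callable)
    """
    ll = log_init[labels[0]] + log_emit(feats[0], labels[0])
    for n in range(1, len(labels)):
        ll += log_trans[labels[n - 1]][labels[n]]   # hidden label process
        ll += log_emit(feats[n], labels[n])         # observable feature process
    return ll
```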
THE LABEL PROCESS
• GOAL: assign a label to each frame, where each label ∈ L
• Components of the label
  • the pitch configuration (chord)
  • the “attack”, “sustain”, “rest” portions of a chord
• We define a random process (a Markov chain) X1,…,XN that takes values in the label set L (see the sketch below)
• The probability of the process occupying a certain state (label) in a given frame depends only on the preceding state (label):
  P(Xn = xn | X1n-1 = x1n-1) = p(xn | xn-1)
  where p(x’|x) is the transition probability matrix and X1n = (X1,…,Xn)
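The Markov property above can be illustrated with a toy chain: each new label is drawn using only the previous label and a transition matrix. The four labels and the probabilities below are invented for illustration; the real label set encodes full pitch configurations and is vastly larger.

```python
import numpy as np

# Toy label set and transition matrix (illustrative values only).
labels = ["silence", "attack", "sustain", "rest"]
trans = np.array([
    [0.90, 0.10, 0.00, 0.00],   # silence -> ...
    [0.00, 0.30, 0.70, 0.00],   # attack  -> ...
    [0.05, 0.05, 0.80, 0.10],   # sustain -> ...
    [0.10, 0.20, 0.00, 0.70],   # rest    -> ...
])

def sample_label_sequence(n_frames, rng=np.random.default_rng(0)):
    """Sample X1,...,Xn: each step depends only on the preceding state."""
    seq = [0]                                   # start in silence
    for _ in range(n_frames - 1):
        seq.append(rng.choice(len(labels), p=trans[seq[-1]]))
    return [labels[i] for i in seq]

print(sample_label_sequence(10))
```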
THE LABEL PROCESS
• Markov model for a single chord
• Markov model for the recognition problem
  • the final state of each chord model is connected to the initial state of every chord model
  • a silence model is constructed for the recorded space before and after the performance
THE OBSERVABLE PROCESS
• Rather than observing the label process x1,…,xN, we observe feature vector data y1,…,yN (probabilistically related to the labels)
• Assumption of the HMM: each visited state Xn produces a feature vector Yn from a distribution that is characteristic of that state
• Hence, given Xn, Yn is conditionally independent of all other frame labels and all other feature vectors
THE OBSERVABLE PROCESS
• We compute a vector of features for each frame: y = (y1,…,yK)
• The components of this vector are conditionally independent given the state
• The states are tied: different states share the same feature distributions
  p(yk | x) = p(yk | Tk(x))
  where the tying function Tk(x) is constructed by hand
• Hence we have (see the sketch below)
  p(y | x) = ∏k p(yk | Tk(x))
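A minimal sketch of tied output distributions under this conditional-independence assumption. The Gaussian form of each shared distribution is purely illustrative (the paper represents these distributions with decision trees, described later), and the tying and params containers are hypothetical.

```python
import math

# tying[k] maps a label x to its tying class Tk(x); params[(k, class)]
# holds the (mean, variance) shared by every label in that class.

def log_emission(y, x, tying, params):
    """log p(y | x) = sum_k log p(y_k | Tk(x)), one term per feature."""
    total = 0.0
    for k, y_k in enumerate(y):
        mean, var = params[(k, tying[k][x])]
        total += -0.5 * (math.log(2 * math.pi * var) + (y_k - mean) ** 2 / var)
    return total
```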
THE OBSERVABLE PROCESS • Tk(x) can be clarified by describing the computed features • y1 measures the total energy in the signal (to distinguish between the times when the pianist plays and when there is silence) • T1(x) = 0 for the silence and rest states • T1(x) = 1 for the remaining states • Two probabilistic distributions: • p(yk|T1(x)=0) • p(yk|T1(x)=1) • Partition of the label set generated by • T1(x) : {x ∈ L : T1(x)=0}, {x ∈L : T1(x)=1} AUTOMATIC TRANSCRIPTION OF PIANO MUSIC - SARA CORFINI
THE OBSERVABLE PROCESS
• y2 measures the local burstiness of the signal (to distinguish between note “attacks” and steady-state behaviour)
• y2 is itself a vector: it collects several measures of burstiness (a hypothetical example is sketched below)
• For this feature, the states can be partitioned into three groups
  • T2(x) = 0: states at the beginning of each note (high burstiness)
  • T2(x) = 1: states corresponding to steady-state behaviour (relatively low burstiness)
  • T2(x) = 2: silence states
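The paper does not specify which burstiness measures are used, so the following is only a hypothetical example of the kind of quantity y2 could contain: how unevenly the frame's energy is spread over short sub-windows, which is high at an attack and low in steady state.

```python
import numpy as np

def burstiness_features(frame, n_sub=8):
    """A hypothetical burstiness vector: split the frame into n_sub
    sub-windows and measure how concentrated the energy is."""
    sub = np.array_split(np.asarray(frame, dtype=float), n_sub)
    energies = np.array([np.sum(s ** 2) for s in sub])
    total = energies.sum() + 1e-12
    ratio_max = energies.max() / total                 # ~1/n_sub (flat) ... ~1 (bursty)
    std_over_mean = energies.std() / (energies.mean() + 1e-12)
    return np.array([ratio_max, std_over_mean])
```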
THE OBSERVABLE PROCESS
• y3,…,yK concern the problem of distinguishing between the many possible pitch configurations
• Each feature of y3,…,yK is computed from a small frequency interval (window) of the Fourier-transformed frame data
• For each window we compute (see the sketch below)
  • the empirical mean location of the harmonic (when there is a single harmonic in the window)
  • the empirical variance, to distinguish probabilistically between the case of a single harmonic in the window (low variance) and all other cases (high variance)
• The states can be partitioned as
  • Tk(x) = 0: states in which no notes contain energy in the window
  • Tk(x) = 1: states having several harmonics in the window
  • Tk(x) = t: states having a single harmonic at approximately the same frequency in the window
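A minimal sketch of one such per-window feature. The use of |FFT|^2 as the weighting and the explicit window boundaries f_lo, f_hi are illustrative assumptions; the paper does not give these details.

```python
import numpy as np

def harmonic_window_feature(frame, sr, f_lo, f_hi):
    """Empirical mean location and variance of the spectral energy
    inside the frequency window [f_lo, f_hi) of one frame."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    mask = (freqs >= f_lo) & (freqs < f_hi)
    w, f = spectrum[mask], freqs[mask]
    total = w.sum() + 1e-12
    mean_loc = (w * f).sum() / total                  # empirical mean location
    var = (w * (f - mean_loc) ** 2).sum() / total     # low if one clean harmonic
    return mean_loc, var
```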
TRAINING THE MODEL
• Thanks to the HMM formulation, the probability distributions can be trained in an unsupervised fashion
• An iterative procedure (the Baum-Welch algorithm) allows the model to be trained automatically from signal-score pairs
• When the score is known, we can build a model for the hidden process
• The algorithm
  • starts from a neutral starting place (we begin with uniformly distributed output distributions)
  • iterates the process of finding a probabilistic correspondence between model states and data frames (see the sketch below)
  • then retrains the probability distributions using this correspondence
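The "probabilistic correspondence" step is the forward-backward pass of Baum-Welch, which yields, for every frame, a posterior distribution over states. A minimal sketch over a toy-sized state space is shown below (the re-estimation of the decision-tree output distributions from these posteriors is not shown); it assumes dense NumPy arrays and SciPy's logsumexp.

```python
import numpy as np
from scipy.special import logsumexp

def state_posteriors(log_init, log_trans, log_emit):
    """Forward-backward pass: gamma[n, x] = P(Xn = x | y1..yN).

    log_init : (S,)   log P(X1 = x)
    log_trans: (S, S) log p(x' | x)
    log_emit : (N, S) log p(y_n | x) for each frame n and state x
    (Toy sizes only; the real state space is on the order of 10^8.)
    """
    N, S = log_emit.shape
    alpha = np.empty((N, S))
    beta = np.zeros((N, S))
    alpha[0] = log_init + log_emit[0]
    for n in range(1, N):                       # forward recursion
        alpha[n] = log_emit[n] + logsumexp(alpha[n - 1][:, None] + log_trans, axis=0)
    for n in range(N - 2, -1, -1):              # backward recursion
        beta[n] = logsumexp(log_trans + log_emit[n + 1] + beta[n + 1], axis=1)
    gamma = alpha + beta
    return np.exp(gamma - logsumexp(gamma, axis=1, keepdims=True))
```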
TRAINING THE MODEL
• The output distributions on feature vectors are represented through decision trees
• For each distribution p(yk | Tk(x)) we form a binary tree
• Each non-terminal node corresponds to a question yk,v < c (where yk,v is the vth component of feature k)
• An observation yk is associated with a node by dropping the observation down the tree (evaluating the root question first)
• The process continues until it arrives at a terminal node, denoted by Qk(yk) (see the sketch below)
• As the training procedure evolves, the trees are re-estimated at each iteration to produce more informative probability distributions
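A minimal sketch of the tree lookup Qk(yk), assuming a simple node representation (the tree-growing and re-estimation procedure is not shown).

```python
class Node:
    def __init__(self, v=None, c=None, left=None, right=None, leaf_id=None):
        self.v, self.c = v, c                # question: y[v] < c ?
        self.left, self.right = left, right
        self.leaf_id = leaf_id               # set only on terminal nodes

def drop_down(tree, y):
    """Return the terminal node Qk(yk) reached by the observation y."""
    node = tree
    while node.leaf_id is None:              # evaluate questions until a leaf
        node = node.left if y[node.v] < node.c else node.right
    return node.leaf_id

# Toy two-question tree over a 2-dimensional feature yk = (yk,0, yk,1)
tree = Node(v=0, c=0.5,
            left=Node(leaf_id="Q1"),
            right=Node(v=1, c=2.0,
                       left=Node(leaf_id="Q2"),
                       right=Node(leaf_id="Q3")))
print(drop_down(tree, [0.7, 1.3]))           # -> "Q2"
```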
RECOGNITION
• The traditional HMM approach to recognition seeks the most likely labeling of frames, given the data, through dynamic programming
• This corresponds to finding the best path through the state graph, where the reward for going from state xn-1 to state xn in the nth iteration is given by
  log p(xn | xn-1) + log p(yn | xn)
• The Viterbi algorithm constructs the optimal paths of length n from the optimal paths of length n-1 (see the sketch below)
• The computational complexity grows with the square of the state-space size, which is completely intractable in this case
• The state space is on the order of 10^8 (even under restrictive assumptions on the possible collection of pitches and the number of notes in a chord)
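A minimal Viterbi sketch over a toy-sized state space. The quadratic cost of the inner step (score[:, None] + log_trans) is exactly what becomes intractable for the full state space.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most likely frame labeling; arrays as in the forward-backward sketch."""
    N, S = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((N, S), dtype=int)
    for n in range(1, N):
        cand = score[:, None] + log_trans      # best path so far + arc reward
        back[n] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[n]
    path = [int(score.argmax())]
    for n in range(N - 1, 0, -1):              # trace back the optimal path
        path.append(int(back[n][path[-1]]))
    return path[::-1]
```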
RECOGNITION
• We use the data model constructed in the training phase to produce a condensed version of the state graph
• For each frame n we perform a greedy search that seeks a plausible collection of states x ∈ L for that frame
• This is accomplished by searching for states x giving large values of p(yn|x). The search is performed by
  • finding the most likely 1-note hypotheses
  • then considering 2-note hypotheses built from them, and so on (see the sketch below)
• Each frame n is thus associated with a collection of plausible states An
• The An sets of neighboring frames are blended into sets Bn
• The graph is constructed by restricting the full graph to the Bn sets
• Disadvantage: if the true state at frame n is not captured by Bn, it cannot be recovered during recognition
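A minimal sketch of this pruning step, under simplified assumptions: log_data_fit(notes, y) stands in for the data model's log p(yn|x) of a chord hypothesis, the beam widths are illustrative, and the blending rule (union over neighboring frames) is an assumption about the exact definition of Bn.

```python
def frame_hypotheses(y, pitch_range, log_data_fit, keep=10, max_notes=4):
    """Build the plausible-state set A_n for one frame greedily:
    best 1-note hypotheses first, then 2-note extensions of those, ..."""
    hyps = [frozenset()]                     # start from the empty (rest) hypothesis
    best = set(hyps)
    for _ in range(max_notes):
        extended = {h | {p} for h in hyps for p in pitch_range if p not in h}
        hyps = sorted(extended, key=lambda h: log_data_fit(h, y), reverse=True)[:keep]
        best.update(hyps)
    return best                              # A_n

def blend(A, n):
    """B_n as the union of the hypothesis sets of neighboring frames
    (assumed blending rule)."""
    return A[max(0, n - 1)] | A[n] | A[min(len(A) - 1, n + 1)]
```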
EXPERIMENTS
• The hidden Markov model has been trained using data taken from various Mozart piano sonatas
• The results concern a performance of the Sonata 18, K. 570
• Objective measure of performance: edit distance (see the sketch below)
• Recognition error rates:
  • note error rate 39% (184 substitutions, 241 deletions, 108 insertions)
• If two adjacent recognized chords have a pitch in common, it is assumed that the note is not rearticulated
• Inability to distinguish between chord homonyms
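Edit distance here is the standard Levenshtein distance between the recognized note sequence and the reference score: the minimum number of substitutions, deletions and insertions needed to turn one into the other. A minimal sketch:

```python
def edit_distance(recognized, reference):
    """Minimum number of substitutions, deletions and insertions."""
    m, n = len(recognized), len(reference)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if recognized[i - 1] == reference[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[m][n]

print(edit_distance(["C", "E", "G"], ["C", "D", "G"]))  # -> 1
```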
CONCLUSION
• The recognition results leave room for improvement
• They may nonetheless be useful in a number of Music Information Retrieval applications tolerant of errorful representations
• The current system works with no knowledge of the plausibility of various sequences of chords
  • a probabilistic model of the likelihood of chord sequences would address this
• The current system makes almost no effort to model the acoustic characteristics of the highly informative note onsets
  • a more sophisticated “attack” model would help in recognizing the many repeated notes which the system currently misses
REFERENCES
• Christopher Raphael. Automatic Transcription of Piano Music. In Proceedings of the 3rd Annual International Symposium on Music Information Retrieval (ISMIR), Michael Fingerhut (Ed.), pp. 15-19, IRCAM - Centre Pompidou, Paris, France, October 2002.