Real-time Human-Computer Interaction with Supervised Learning Algorithms for Music Composition and Performance
Rebecca Fiebrink
Perry Cook, Advisor
Pre-FPO, 6/14/2010
function [x flag hist dt] = pagerank(A,optionsu)
[m n] = size(A);
if (m ~= n)
    error('pagerank:invalidParameter', 'the matrix A must be square');
end;
options = struct('tol', 1e-7, 'maxiter', 500, 'v', ones(n,1)./n, ...
    'c', 0.85, 'verbose', 0, 'alg', 'arnoldi', ...
    'linsys_solver', @(f,v,tol,its) bicgstab(f,v,tol,its), ...
    'arnoldi_k', 8, 'approx_bp', 1e-3, 'approx_boundary', inf, ...
    'approx_subiter', 5);
if (nargin > 1)
    options = merge_structs(optionsu, options);
end;
if (size(options.v) ~= size(A,1))
    error('pagerank:invalidParameter', ...
        'the vector v must have the same size as A');
end;
if (~issparse(A))
    A = sparse(A);
end;
% normalize the matrix
P = normout(A);
switch (options.alg)
    case 'dense'
        [x flag hist dt] = pagerank_dense(P, options);
    case 'linsys'
        [x flag hist dt] = pagerank_linsys(P, options);
    case 'gs'
        [x flag hist dt] = pagerank_gs(P, options);
    case 'power'
        [x flag hist dt] = pagerank_power(P, options);
    case 'arnoldi'
        [x flag hist dt] = pagerank_arnoldi(P, options);
    case 'approx'
        [x flag hist dt] = pagerank_approx(P, options);
    case 'eval'
        [x flag hist dt] = pagerank_eval(P, options);
    otherwise
        error('pagerank:invalidParameter', ...
            'invalid computation mode specified.');
end;
[Diagram: a user interacting with a computer, annotated with the questions: Effective? Efficient? Satisfying?]
[Diagram: the same questions, now with machine learning algorithms as the object of the interaction]
Outline • Overview of computer music and machine learning • The Wekinator: A new interface for using machine learning algorithms • Live demo + video • Completed studies • Findings • Further work for FPO and beyond • Wrap-up
Interactive computer music: sensed action → interpretation → response (music, visuals, etc.), with the computer in the loop.
Computer as instrument: sensed action → interpretation → sound generation.
Computer as instrument: a human with a control interface produces the sensed action; interpretation is a mapping to sound generation inside the computer.
Computer as collaborator: a microphone and/or sensors produce the sensed action; the computer models its meaning to drive sound generation.
A composed system: sensed action → mapping/model/interpretation → response.
Supervised learning, training phase: training data (inputs paired with outputs) → algorithm → model.
Supervised learning, running phase: a new input → model → output. In the example, training inputs are labeled “C Major”, “F minor”, and “G7”; at run time a new input produces “F minor”.
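As a concrete illustration of the two phases above, here is a minimal sketch using the Weka Java API (which the Wekinator builds on). Only the chord labels come from the slide; the two-number feature vectors, their values, the choice of J48, the Weka 3.7 API version, and the example() helper are assumptions made for the example.

import java.util.ArrayList;
import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

public class ChordExample {
    public static void main(String[] args) throws Exception {
        // Inputs: a 2-value feature vector; outputs: chord labels.
        ArrayList<String> labels = new ArrayList<>();
        labels.add("C Major"); labels.add("F minor"); labels.add("G7");
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("feature1"));
        attrs.add(new Attribute("feature2"));
        attrs.add(new Attribute("chord", labels));

        Instances train = new Instances("chords", attrs, 0);
        train.setClassIndex(2);  // the chord label is what we predict

        // Training phase: pair each input feature vector with its output label.
        train.add(example(train, 0.10, 0.80, "C Major"));
        train.add(example(train, 0.70, 0.20, "F minor"));
        train.add(example(train, 0.40, 0.55, "G7"));

        Classifier model = new J48();   // the algorithm
        model.buildClassifier(train);   // training data -> model

        // Running phase: feed a new input to the trained model.
        Instance query = example(train, 0.68, 0.22, "C Major"); // label ignored
        double predicted = model.classifyInstance(query);
        System.out.println(train.classAttribute().value((int) predicted));
    }

    // Builds one labeled example attached to the dataset's attribute layout.
    private static Instance example(Instances header, double f1, double f2, String label) {
        Instance inst = new DenseInstance(3);
        inst.setDataset(header);
        inst.setValue(0, f1);
        inst.setValue(1, f2);
        inst.setValue(2, label);
        return inst;
    }
}

Run with weka.jar on the classpath; the printed label is the model’s prediction for the new input.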
Supervised learning is useful • Models capture complex relationships from the data and generalize to new inputs. (accurate) • Supervised learning circumvents the need to explicitly define mapping functions or models. (efficient) So why isn’t it used more often?
A lack of usable tools for making music
• Weka (Witten and Frank, 2005): many standard algorithms, applicable to any dataset, graphical interface + API, >10,000 citations (Google Scholar). General-purpose ✓, but does not run on real-time signals ✗ and lacks appropriate user interface and interaction support for this work ✗.
• Existing computer music tools: built by engineer-musicians for specific applications. Run on real-time signals ✓, but not general-purpose ✗.
• ???: no existing tool offers all three of (1) general purpose (many algorithms & applications), (2) running on real-time signals, and (3) appropriate user interface and interaction support.
Outline • Overview of computer music and machine learning • The Wekinator: A new interface for using machine learning algorithms • Live demo + video • Completed studies • Findings • Further work for FPO and beyond • Wrap-up
The Wekinator • A general-purpose, real-time tool with appropriate interfaces for using and constructing supervised learning systems. • Built on Weka APIs • Downloadable at http://code.google.com/p/wekinator/ (Fiebrink, Cook, and Trueman 2009; Fiebrink, Trueman, and Cook 2009; Fiebrink et al. 2010)
A tool for running models in real time
feature extractor(s) → a stream of feature vectors over time (e.g., .01, .59, .03, …) → model(s) → a stream of parameter vectors over time (e.g., 5, .01, 22.7, …) → parameterizable process
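A minimal sketch of the shape of that run-time loop, with hypothetical stand-ins throughout: a random-number "feature extractor", a hand-written function in place of a trained model, and printing in place of a real parameterizable synthesis process. Only the loop structure (features in, parameters out, at every time step) reflects the slide.

import java.util.Random;
import java.util.function.DoubleUnaryOperator;

public class RealTimeLoop {
    public static void main(String[] args) throws Exception {
        Random rng = new Random(0);
        // Stand-in for a trained model mapping one feature to one parameter.
        DoubleUnaryOperator model = x -> 20.0 + 10.0 * x;

        for (int step = 0; step < 10; step++) {          // a slow "control rate"
            double feature = rng.nextDouble();            // feature extractor(s)
            double param = model.applyAsDouble(feature);  // model(s)
            System.out.printf("t=%d param=%.2f%n", step, param); // the process
            Thread.sleep(100);                            // next time step
        }
    }
}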
A tool for real-time, interactive design Wekinator supports user interaction with all stages of the model creation process.
Under the hood
Learning algorithms:
• Classification: AdaBoost.M1, J48 decision tree, support vector machine, k-nearest neighbor
• Regression: MultilayerPerceptron
Inputs such as joystick_x, joystick_y, webcam_1, … are extracted as Feature1, Feature2, Feature3, …, FeatureN. Each of Model1, Model2, …, ModelM receives all features and produces one output, Parameter1, Parameter2, …, ParameterM: a continuous value such as volume or pitch (e.g., 3.3098) or a discrete class (e.g., Class24).
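The fan-out from N features to M independent models can be sketched with the Weka API. Here one regression model (a MultilayerPerceptron, as on the slide) maps the full feature vector to a single continuous parameter; the feature names come from the slide, but the training values, the single-model simplification, and the Weka 3.7 API version are illustrative assumptions, not Wekinator's exact internals.

import java.util.ArrayList;
import weka.classifiers.Classifier;
import weka.classifiers.functions.MultilayerPerceptron; // regression
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

public class FeatureToParamDemo {
    public static void main(String[] args) throws Exception {
        // One dataset per model: the same N features, one output attribute.
        Instances volumeData = numericDataset();   // numeric class => regression
        Classifier volumeModel = new MultilayerPerceptron();
        volumeModel.buildClassifier(volumeData);

        // Run: one incoming feature vector drives the model.
        Instance in = new DenseInstance(1.0, new double[]{0.2, 0.7, 0.5, 0});
        in.setDataset(volumeData);
        System.out.println("volume = " + volumeModel.classifyInstance(in));
    }

    // Toy training set: 3 input features plus one numeric output ("volume").
    private static Instances numericDataset() {
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("joystick_x"));
        attrs.add(new Attribute("joystick_y"));
        attrs.add(new Attribute("webcam_1"));
        attrs.add(new Attribute("volume"));        // the parameter to predict
        Instances data = new Instances("volume_map", attrs, 0);
        data.setClassIndex(3);
        double[][] rows = {{0.1, 0.9, 0.4, 0.0}, {0.8, 0.2, 0.6, 1.0},
                           {0.5, 0.5, 0.5, 0.5}};
        for (double[] r : rows) data.add(new DenseInstance(1.0, r));
        return data;
    }
}

With M parameters, the Wekinator's structure corresponds to repeating this per parameter: M models, each trained on the same features with its own output column.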
Tailored but not limited to music
The Wekinator:
• Built-in feature extractors for music & gesture
• ChucK API for feature extractors and synthesis classes
• Open Sound Control (UDP) control messages connect it to other feature extraction modules and to other modules for sound synthesis, animation, …?
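For the Open Sound Control link, here is a sketch of sending a feature vector as an OSC message over UDP using only the JDK. The address pattern "/features" and port 6448 are illustrative assumptions, not documented defaults of this version of the Wekinator; the byte layout follows the OSC 1.0 specification.

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class OscSend {
    public static void main(String[] args) throws Exception {
        float[] features = {0.01f, 0.59f, 0.03f};
        byte[] packet = oscMessage("/features", features);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(packet, packet.length,
                    InetAddress.getLoopbackAddress(), 6448));
        }
    }

    // OSC encoding: null-terminated strings padded to 4-byte boundaries, a
    // type-tag string of 'f's, then big-endian 32-bit floats.
    static byte[] oscMessage(String address, float[] floats) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writePaddedString(out, address);
        StringBuilder tags = new StringBuilder(",");
        for (int i = 0; i < floats.length; i++) tags.append('f');
        writePaddedString(out, tags.toString());
        for (float f : floats) out.writeFloat(f); // DataOutputStream is big-endian
        return buf.toByteArray();
    }

    static void writePaddedString(DataOutputStream out, String s) throws Exception {
        byte[] bytes = s.getBytes("US-ASCII");
        out.write(bytes);
        int pad = 4 - (bytes.length % 4);  // always at least one null terminator
        for (int i = 0; i < pad; i++) out.write(0);
    }
}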
Outline • Overview of computer music and machine learning • The Wekinator: A new interface for using machine learning algorithms • Live demo + video • Completed studies • Findings • Further work for FPO and beyond • Wrap-up
Recap: what’s new?
• General-purpose and runs on real-time signals
• A single interface for building and running models
• Comprehensive support for interactions appropriate to computer music tasks
Outline • Overview of computer music and machine learning • The Wekinator: A new interface for using machine learning algorithms • Live demo + video • Completed studies • Findings • Further work for FPO and beyond • Wrap-up
Study 1: Participatory design process with 7 composers • Fall semester 2009 • 10 weeks, 3 hours / week • Group discussion, experimentation, and evaluation • Iterative design • Final questionnaire (Fiebrink et al., 2010)
Study 2: Teaching interactive systems building in PLOrk (the Princeton Laptop Orchestra)
• COS/MUS 314, Spring 2010
• Focus on building interactive music systems
• Wekinator midterm assignment: master the process of building continuous and discrete gestural control systems, and use one in a performance
• Logging + questionnaire
• Final projects
Study 3: Bow gesture recognition
• Winter 2010
• Worked with a composer/cellist to build a gesture recognizer for a commercial sensor bow
• Classify standard bowing gestures, e.g., up/down, legato/marcato/spiccato (Fiebrink, Schedel, and Threw, 2010)
• Outcomes: classifiers, improved software, written notes on the engineering process
Study 4: Composition/composer case studies • Completed: Winter 2010 to present • CMMV (Dan Trueman, faculty) • Martlet (v 1.0) (Michelle Nagai, graduate student) • G (Raymond Weitekamp, undergraduate) • Blinky; nets0 (Rebecca Fiebrink) • Interviews completed with Michelle and Raymond
Outline • Overview of computer music and machine learning • The Wekinator: A new interface for using machine learning algorithms • Live demo + video • Completed studies • Findings • Further work for FPO and beyond • Wrap-up
Findings to date
• Interacting with supervised learning
• Training the user
• Supervised learning in a creative context
• Usability summary
Interactively training • Primary means of control: iteratively edit the dataset, retrain, and re-evaluate • A straightforward way of affecting the model • Add data to make a model more complex • Add or delete data to correct errors
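A sketch of that edit-retrain-re-evaluate loop against a single persistent Weka dataset. The choice of k-nearest neighbor and the delete-then-retrain correction policy are illustrative assumptions; the point is that every dataset edit is immediately followed by retraining and running.

import weka.classifiers.Classifier;
import weka.classifiers.lazy.IBk;
import weka.core.Instance;
import weka.core.Instances;

public class IterativeTraining {
    private final Instances data;              // the editable training set
    private final Classifier model = new IBk(1); // k-nearest neighbor, k = 1

    public IterativeTraining(Instances emptyDataset) {
        this.data = emptyDataset;              // class index already set
    }

    // Add data to make the model more complex, then retrain immediately.
    public void addExampleAndRetrain(Instance example) throws Exception {
        data.add(example);
        model.buildClassifier(data);           // fast for small datasets
    }

    // Correct an error by deleting the offending example and retraining.
    public void deleteExampleAndRetrain(int index) throws Exception {
        data.delete(index);
        model.buildClassifier(data);
    }

    // Re-evaluate by running the retrained model on new inputs.
    public double run(Instance input) throws Exception {
        return model.classifyInstance(input);
    }
}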
Exercising control via the dataset
N = 21; students retrained an average of 4.64 times per task (std. dev. 4.91)
The interface to the training data is important
• Real-time example recording and a single interface improve efficiency
• Supports embodiment and higher-level thinking
• Several composers used play-along learning as the dominant method (see the sketch after this list)
• Supports different granularities of control
• K-Bow: visual label editing interface
• The spreadsheet editor is still used
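A sketch of play-along recording under stated assumptions: the "score" of target parameter values and the random stand-in for a feature extractor are invented. The point is that each recorded example pairs the current features with whatever parameter value the system is playing at that moment, so the performer trains by demonstrating along with the sound.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class PlayAlongRecorder {
    public static void main(String[] args) throws Exception {
        Random rng = new Random(0);
        double[] score = {0.0, 0.5, 1.0};      // target parameter values to play
        List<double[]> examples = new ArrayList<>();

        for (double target : score) {           // the system plays each value...
            for (int i = 0; i < 20; i++) {      // ...while the user performs along
                double[] features = {rng.nextDouble(), rng.nextDouble()}; // stand-in sensor
                // Each example: the features paired with the sounding parameter.
                examples.add(new double[]{features[0], features[1], target});
                Thread.sleep(10);               // recording rate
            }
        }
        System.out.println(examples.size() + " play-along examples recorded");
    }
}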
Interactive evaluation • Evaluation of models is also an interactive process in Wekinator
“Traditional” evaluation (e.g., Weka): the available data is split into a training set and an evaluation set; the model is trained on the training set, then evaluated on the held-out evaluation set.
Evaluation in Wekinator: the model is trained on the training set and evaluated directly, with no held-out set.
Interactive evaluation
• Running models is the primary mode of evaluation
• In the PLOrk study:
• Models were run & used 5.3 times per task (std. dev. 5.3), with an average of 4.0 of the 19 minutes per task spent running them
• Cross-validation was computed 1.4 times per task (std. dev. 2.6)
• Traditional metrics are also useful:
• Comparing different classifiers quickly (K-Bow)
• Validation (of the user’s model-building ability)
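The two evaluation styles can be contrasted in code. This sketch uses the standard Weka Evaluation API for the traditional cross-validated metric, and plain train-then-run for the interactive style; the choice of J48 and 10 folds is illustrative.

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;

public class TwoEvaluations {
    // Traditional metric: 10-fold cross-validation accuracy on the training set.
    static double crossValAccuracy(Instances train) throws Exception {
        Evaluation eval = new Evaluation(train);
        eval.crossValidateModel(new J48(), train, 10, new Random(1));
        return eval.pctCorrect();   // percent correctly classified
    }

    // Interactive evaluation: train, then judge the model by running it.
    static double runOnce(Instances train, Instance newInput) throws Exception {
        Classifier model = new J48();
        model.buildClassifier(train);
        return model.classifyInstance(newInput); // the user listens and decides
    }
}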
When is this interaction feasible?
• When it is appropriate and possible for a human to provide and/or modify the data
• When the user has knowledge of (and possibly control over) the future input space
• When the training process is fast
• Training time in PLOrk: median 0.80 seconds; 71% of trainings took under 5 seconds
• Number of training examples in the final PLOrk round: mean 692, std. dev. 610
Related approaches to interactive learning • Building models of the user • Standard in speech recognition systems • Use human experts to improve a model of other phenomena • Vision: Fails and Olsen, 2003 • Document classification: Baker, Bhandari, and Thotakura, 2009 • Web images: Amershi 2010 • Novel in music, novel for a general-purpose tool
Findings to date
• Interacting with supervised learning
• Training the user
• Supervised learning in a creative context
• Usability summary
Interaction is two-way: the user controls the machine learning algorithms, and running & evaluating the models provides feedback to the user.
Training the user to provide better training examples • Minimize noise and choose easily differentiable classes
PLOrk students learned: “In collecting data, it is crucial, especially in Motion Sensor, that the positions recorded are exaggerated (i.e. tilt all the way, as opposed to only halfway.) Usually this will do the trick…” “I tried to use very clear examples of contrast in [input features]... If the examples I recorded had values that were not as satisfactory, I deleted them and rerecorded… until the model understood the difference…”