SEMANTIC HIFI Browsing, listening, interacting, sharing on future HIFI systems

Music Technology Group, Universitat Pompeu Fabra (UPF), Barcelona

WP5. Performance Workpackage

Interaction & Performance
“…it becomes possible for more people to make more satisfying music, more enjoyably and easily, regardless of physical coordination or theoretical study, of keyboard skills or fluency with notation. This doesn’t imply a dilution of musical quality. On the contrary, it frees us to go further and raises the base-level at which music making begins.” (Laurie Spiegel)
“Let’s develop virtual instruments that do not just play back music for people, but become increasingly adept at making new and engaging music with people, at all levels of technical proficiency.” (Robert Rowe)

Interaction
Has to be:
• natural & intuitive
• easy
And yet…
• allow expression
• enjoyable
• rewarding


Input devices
• Feel natural
• Maximize bandwidth
• Profit from users’ knowledge
We propose the use of
• Mouth: microphone + small video camera
• Hands & arm: remote control used as a baton

Mouth control information will be reinforced by the two simultaneous input modes (sound + image)

Mouth
• Mouth interaction will not only allow karaoke
• The system will be able to detect at least 4 different mouth input modes:
  • Singing (karaoke)
  • Scat (instrumental solos)
  • Beat boxing (drums)
  • Silent mouth movements (filters & timbre changes)
• Voice transformations include
  • Voice Excitation based Transformations (pitch change, hoarseness, whisper…)
  • Vocal Tract based Transformations (timbre…)


Music Context
• The results of each of these interaction modes will depend on the music being played
• Use of metadata will provide increasing information
• Example: Scatting on different musical styles

Music Context
• This would correspond to a simplified context
• More information can be obtained:
  • From the type of voiced sound (voice analysis, not mere pitch-to-MIDI, should exploit all timbre information)
  • From additional metadata

Additional Metadata*
Time-stamped information:
Music
• Composition parts (A, B, chorus…)
• Harmonic & rhythmic details
• Score
• Program changes
• …
Audio Analysis
• …
*Format and contents to be defined in WP1.2
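
Since the metadata format is still to be defined in WP1.2, the following is only a minimal sketch of how time-stamped information could be stored and queried during playback. The names `TimedEvent`, `MetadataTrack`, and the `kind` labels ("part", "chord") are hypothetical and are not part of any agreed format.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedEvent:
    time_s: float  # time stamp, in seconds from the start of the track
    kind: str      # hypothetical category label, e.g. "part" or "chord"
    value: str     # hypothetical payload, e.g. "chorus" or "Dm7"


class MetadataTrack:
    """Holds time-stamped events and answers 'what is active at time t?'."""

    def __init__(self, events):
        # Keep events ordered by time so lookups can use binary search.
        self._events = sorted(events, key=lambda e: e.time_s)

    def active(self, kind, time_s):
        """Return the latest value of `kind` at or before time_s, or None."""
        of_kind = [e for e in self._events if e.kind == kind]
        times = [e.time_s for e in of_kind]
        i = bisect_right(times, time_s)
        return of_kind[i - 1].value if i else None


# Illustrative data only; not taken from a real track.
track = MetadataTrack([
    TimedEvent(0.0, "part", "A"),
    TimedEvent(32.0, "part", "B"),
    TimedEvent(64.0, "part", "chorus"),
    TimedEvent(0.0, "chord", "Cmaj7"),
    TimedEvent(4.0, "chord", "Dm7"),
])

current_part = track.active("part", 40.0)
current_chord = track.active("chord", 3.9)
```

An interaction mode (for instance scatting) could query the current part and chord in this way to adapt its output to the music being played.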

Editable Metadata
• Advanced users will be able to edit and enrich the metadata (in non-real time), adding value to their contribution

Hand Movements
Will provide complementary information
• e.g. crash cymbal on beat boxing
Alternate functions
• e.g. baton conducting
• tempo changes
• dynamic changes
• groove & swing modification
• ……
• ……
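
As one illustration of the tempo-change function, a tempo estimate could be derived from the times at which baton beats are detected. This is a sketch under assumptions: the beat detection itself is taken as given, and `tempo_bpm` is a hypothetical helper, not a component of the proposed system. The median interval is used so that a single late or early beat does not distort the estimate.

```python
from statistics import median


def tempo_bpm(beat_times_s):
    """Estimate tempo in beats per minute from detected beat times (seconds).

    Returns None when fewer than two beats are available.
    """
    if len(beat_times_s) < 2:
        return None
    # Inter-beat intervals between consecutive detected beats.
    intervals = [b - a for a, b in zip(beat_times_s, beat_times_s[1:])]
    if any(i <= 0 for i in intervals):
        raise ValueError("beat times must be strictly increasing")
    return 60.0 / median(intervals)


# Illustrative beat times: one beat every half second.
steady = tempo_bpm([0.0, 0.5, 1.0, 1.5])
```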

Hand & Body tracking
• A camera fixed to the system could be used
• For better tracking resolution (spatial & temporal) an additional device seems necessary
• We propose to use the same remote control, possibly fitted with accelerometers (and wireless communication with the system)

Score Following
IRCAM: Instrument Score follower (for automatic performer accompaniment)
To be defined:
• Options
  • MIDI (or synthetic) accompaniment
  • Time-stretched prerecorded audio
• Data formats
  • data resulting from the audio analysis (UPF), sent to the score follower module (IRCAM) (voice2MIDI?)
  • position data from the score follower to the time-stretching module
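
The data formats are still to be defined, so the following only sketches one way position data might drive the time-stretching of prerecorded audio. It assumes, hypothetically, that the score follower reports pairs of (performance time, position in the recording), both in seconds; `playback_rate` and its clamping limits are illustrative choices, not IRCAM's or UPF's interface.

```python
def playback_rate(prev, curr, min_rate=0.5, max_rate=2.0):
    """Return the playback rate implied by two successive position reports.

    Each report is a (performance_time_s, recording_position_s) pair.
    A rate of 1.0 means the accompaniment plays at its original speed.
    """
    (t0, s0), (t1, s1) = prev, curr
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("reports must be ordered in performance time")
    rate = (s1 - s0) / dt
    # Clamp to a range the time-stretching module can handle (assumed limits).
    return max(min_rate, min(max_rate, rate))


# The performer covered 3 s of the recording in 2 s of real time.
rate = playback_rate((0.0, 0.0), (2.0, 3.0))
```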

Performing on a simple keyboard
In this part, Sony CSL will implement style and performance rules in a simple keyboard able to follow and continue the user’s playing according to simple style constraints.

Deliverables

MTG Participants
• Xavier Serra, local manager
• Sergi Jordà, technical manager
• Alex Loscos, voice processing
• Martin Kaltenbrunner, interfaces
• 1 additional programmer
