Android application for pictures/videos voice tagging

Technion - Israel Institute of Technology
Computer Science Department
Industrial Project (234313)
Students: Yevgeni Sabin, Vladimir Rudenko
Supervisors: Nadav Golbandi, Oren Somekh

Motivation
• Picture and video sharing over the Internet is very popular today.
• Users want to tag their pictures for classification and retrieval purposes.
• Many of those pictures are taken with mobile devices such as smartphones.
• Today, in order to tag a picture, the user has to type the name/tag on the phone's keyboard.
• The goal of our project is to simplify the process of taking a picture, tagging it, and uploading it to the Internet by making it a "one-click operation".

Objectives
• Make an Android smartphone able to record voice tags and add them to a picture.
• Voice is added to the JPEG file seamlessly, so that the file can still be handled by standard JPEG tools (e.g., galleries).
• Make an Android smartphone able to manage voice tags by adding, editing, or deleting them in a picture browser.

Objectives (continued)
• Make an Android smartphone able to upload voice-tagged pictures to an external web server.
• Currently we use Flickr as the picture hosting server, through the Flickr API, which lets the user work with an existing and popular web service.
• Ensure a secure connection to the web service.
• After uploading the voice-tagged picture, the application will be able to receive feedback from the server that includes the extracted text tags.

Methodology
• To achieve these objectives, two standalone applications were developed:
  • TuCo Camera: a camera application that allows voice tagging and uploading of pictures in addition to the standard operations.
  • TuCo Gallery: a gallery application that allows voice tagging and uploading of pictures in addition to the standard operations.
• Both applications were developed from scratch.
• Separate development lets the user use only one of the applications together with a third-party application (e.g., TuCo Gallery + the standard camera).

Image and audio encapsulation
• The voice tagging application records up to 15 seconds of voice and inserts the voice data directly into the JPEG file without affecting the image data.
• The audio file is split into chunks of up to 64 KB. Each chunk is stored in one "application block" (a JPEG APPn marker segment). We use APP3 to APP13 (they are available according to the JPEG specification).
• Audio is stored in PCM 16 kHz/16-bit format.
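The chunking scheme above can be sketched in Java as follows. This is a minimal illustration, not the project's actual code: the class name `VoiceTagEmbedder` and its methods are hypothetical, the sketch assumes the input JPEG has no APP3 to APP13 segments of its own, and it places the voice segments after any leading APP0 to APP2 segments (a placement choice of this sketch, not something the slides specify). One JPEG fact it relies on: the 16-bit segment length field counts its own two bytes, so a single segment carries at most 65,533 payload bytes.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper: stores voice bytes in JPEG APP3..APP13 segments.
final class VoiceTagEmbedder {
    static final int MAX_PAYLOAD = 65533; // 65535 minus the 2-byte length field
    static final int FIRST_MARKER = 0xE3; // APP3
    static final int LAST_MARKER = 0xED;  // APP13

    private static int segmentLength(byte[] jpeg, int pos) {
        // Big-endian length stored right after the 2-byte marker.
        return ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
    }

    /** Returns a copy of the JPEG with the voice data added as APP3..APP13 segments. */
    static byte[] embed(byte[] jpeg, byte[] voice) {
        if (jpeg.length < 2 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("missing JPEG SOI marker");
        }
        int segments = (voice.length + MAX_PAYLOAD - 1) / MAX_PAYLOAD;
        if (segments > LAST_MARKER - FIRST_MARKER + 1) {
            throw new IllegalArgumentException("voice data does not fit in APP3..APP13");
        }
        // Skip SOI and any leading APP0..APP2 segments (assumed placement).
        int pos = 2;
        while (pos + 4 <= jpeg.length && (jpeg[pos] & 0xFF) == 0xFF) {
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker < 0xE0 || marker > 0xE2) break;
            pos += 2 + segmentLength(jpeg, pos);
        }
        if (pos > jpeg.length) throw new IllegalArgumentException("truncated JPEG");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(jpeg, 0, pos);
        for (int i = 0; i < segments; i++) {
            int offset = i * MAX_PAYLOAD;
            int len = Math.min(MAX_PAYLOAD, voice.length - offset);
            out.write(0xFF);
            out.write(FIRST_MARKER + i);   // APP3, APP4, ...
            out.write((len + 2) >> 8);     // segment length, high byte
            out.write((len + 2) & 0xFF);   // segment length, low byte
            out.write(voice, offset, len);
        }
        out.write(jpeg, pos, jpeg.length - pos); // rest of the image, untouched
        return out.toByteArray();
    }

    /** Concatenates the payloads of all APP3..APP13 segments found before SOS. */
    static byte[] extract(byte[] jpeg) {
        ByteArrayOutputStream voice = new ByteArrayOutputStream();
        int pos = 2;
        while (pos + 4 <= jpeg.length && (jpeg[pos] & 0xFF) == 0xFF) {
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xDA) break; // SOS: compressed image data follows
            int segLen = segmentLength(jpeg, pos);
            if (marker >= FIRST_MARKER && marker <= LAST_MARKER) {
                voice.write(jpeg, pos + 4, segLen - 2);
            }
            pos += 2 + segLen;
        }
        return voice.toByteArray();
    }

    public static void main(String[] args) {
        // Tiny stand-in for a JPEG: SOI, APP0, SOS header, two data bytes, EOI.
        byte[] jpeg = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0, 0, 4, 0x4A, 0x46,
                       (byte) 0xFF, (byte) 0xDA, 0, 2, 0x11, 0x22, (byte) 0xFF, (byte) 0xD9};
        byte[] voice = new byte[150000];
        byte[] tagged = embed(jpeg, voice);
        System.out.println(Arrays.equals(extract(tagged), voice));
    }
}
```

Because decoders skip APPn segments they do not recognise, a file produced this way should still open in ordinary viewers, which is the "seamless" property the objectives call for.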

Image and audio encapsulation
• Voice data layout:
  • Header (128 bytes): includes various information such as voice block size, upload status, and text tags.
  • WAV header (44 bytes): includes the voice parameters in WAV format.
  • PCM raw data (up to ~600 KB): the raw voice data.
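The layout above can be illustrated with a short Java sketch that assembles one voice block. Several things here are assumptions rather than facts from the slides: the class name `VoiceBlock` is hypothetical, the audio is taken to be mono (the slides give only the sample rate and bit depth), and the internal field layout of the 128-byte header is not described, so it is passed in as an opaque byte array. The 44-byte header is the standard canonical RIFF/WAVE header for uncompressed PCM.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds [128-byte header][44-byte WAV header][PCM data].
final class VoiceBlock {
    static final int TAG_HEADER_SIZE = 128;
    static final int WAV_HEADER_SIZE = 44;
    static final int SAMPLE_RATE = 16000;   // 16 kHz, from the slides
    static final int BITS_PER_SAMPLE = 16;  // 16 bit, from the slides
    static final int CHANNELS = 1;          // assumption: mono

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    /** Canonical 44-byte RIFF/WAVE header for uncompressed PCM (little-endian). */
    static byte[] wavHeader(int pcmLength) {
        int blockAlign = CHANNELS * BITS_PER_SAMPLE / 8;
        ByteBuffer b = ByteBuffer.allocate(WAV_HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        b.put(ascii("RIFF")).putInt(36 + pcmLength).put(ascii("WAVE"));
        b.put(ascii("fmt ")).putInt(16);          // size of the fmt chunk
        b.putShort((short) 1);                    // audio format 1 = PCM
        b.putShort((short) CHANNELS);
        b.putInt(SAMPLE_RATE);
        b.putInt(SAMPLE_RATE * blockAlign);       // bytes per second
        b.putShort((short) blockAlign);
        b.putShort((short) BITS_PER_SAMPLE);
        b.put(ascii("data")).putInt(pcmLength);
        return b.array();
    }

    /** Concatenates the three parts of the voice data layout. */
    static byte[] build(byte[] tagHeader, byte[] pcm) {
        if (tagHeader.length != TAG_HEADER_SIZE) {
            throw new IllegalArgumentException("header must be exactly 128 bytes");
        }
        ByteBuffer b = ByteBuffer.allocate(TAG_HEADER_SIZE + WAV_HEADER_SIZE + pcm.length);
        b.put(tagHeader).put(wavHeader(pcm.length)).put(pcm);
        return b.array();
    }

    /** Raw PCM size in bytes for a recording of the given duration. */
    static int pcmBytes(int seconds) {
        return seconds * SAMPLE_RATE * CHANNELS * BITS_PER_SAMPLE / 8;
    }

    public static void main(String[] args) {
        byte[] block = build(new byte[TAG_HEADER_SIZE], new byte[pcmBytes(15)]);
        System.out.println("block size for a 15 s recording: " + block.length + " bytes");
    }
}
```

Under the mono assumption, a full 15-second recording takes 15 × 16,000 × 2 = 480,000 bytes of PCM data, which is consistent with the stated upper bound of about 600 KB.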

System architecture
[Architecture diagram; component descriptions recovered from the slide:]
• Insert/extract voice from picture
• Upload picture to server
• Play/record audio
• Show all pictures in gallery
• Show single picture full screen (appears twice in the diagram)
• Show camera view

Future development
• Add voice encoding to decrease the voice data size
• Concurrent upload of multiple pictures
• Integration with other photo web services (such as Picasa and Panoramio)
• GUI and UI improvements
• Porting to other mobile platforms (such as iPhone and Windows Mobile)
