Geo-driven photo tag recommendation
Computer Science Department Industrial Project (234313)
Presented by: Michal Nir, Saar Gross
Supervisors: Nadav Golbandi, Oren Somekh
Introduction
This project extends a previous project that consists of a client application (Android) and a server application (running on Tomcat).
• The user takes a photo with a smartphone and records an audio clip linked to that photo.
• Tags are extracted from the audio using speech-to-text, and the photo is uploaded to Flickr together with its tags.
• The speech-to-text engine (Sphinx) works best with small dictionaries.
• In this project, we aim to supply Sphinx with a custom dictionary created for each photo (or group of photos) from the photo's geo-location.
• Using the geo-location, we can extract relevant tags from Flickr and use them to build the custom dictionary.
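To make the custom-dictionary idea concrete, here is a minimal sketch, assuming Sphinx reads a CMU-style pronunciation dictionary (one `word PHONE PHONE ...` entry per line, with alternate pronunciations written as `word(2)`). It keeps only the entries of a large base dictionary whose words appear among the tags collected for the photo's location. The function name `build_custom_dictionary` is hypothetical; this is not the project's actual code.

```python
def build_custom_dictionary(full_dict_lines, tags):
    """Keep only the entries of a CMU-style pronunciation dictionary whose
    word appears in the tag vocabulary (case-insensitive).

    Alternate pronunciations such as 'read(2)' are kept when their base
    word matches. Entries keep the order of the base dictionary."""
    vocabulary = {tag.lower() for tag in tags}
    custom = []
    for line in full_dict_lines:
        line = line.strip()
        if not line:
            continue
        word = line.split(maxsplit=1)[0]
        base_word = word.split("(", 1)[0].lower()  # 'read(2)' -> 'read'
        if base_word in vocabulary:
            custom.append(line)
    return custom
```

Tags missing from the base dictionary are simply dropped here; covering them would need a grapheme-to-phoneme step, which this sketch does not attempt.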
Project Goals
• Implement a new module, running on the server application, that creates custom dictionaries for the Sphinx voice-to-text engine.
• Optimize the dictionary-creation algorithm to achieve the best possible results at an acceptable performance cost.
Methodology
The server generates tag recommendations in one of two ways:
• Uploading an image (or multiple images) that contains a geo-location, with an audio file attached, triggers the server to create a custom dictionary for the Sphinx voice-to-text engine.
• The client may request tag recommendations by sending only the image's geo-location.
The server can also be instructed not to use the image's geo-location when compiling the recommendation list (for privacy reasons); in that case, only the user's “private tags” are used.
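The request handling above can be summarized as a small dispatch function. This is a hypothetical sketch: the `TagRequest` fields, the source names, and the destination labels are illustrative, and treating a request without any geo-location like a privacy request is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TagRequest:
    # Hypothetical request model; field names are illustrative only.
    location: Optional[Tuple[float, float]]  # (lat, lon), or None if absent
    is_upload: bool       # True: upload with audio; False: recommendations only
    use_geo: bool = True  # False when the user opts out for privacy


def plan_request(req: TagRequest):
    """Return (tag sources to query, where the merged list is sent)."""
    if req.use_geo and req.location is not None:
        sources = ["nearby_photo_tags", "places_api", "private_tags"]
    else:
        # Privacy mode (or no geo-location): only the user's private tags.
        sources = ["private_tags"]
    destination = "sphinx_dictionary" if req.is_upload else "android_client"
    return sources, destination
```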
The server supports uploading multiple images:
• When multiple images are uploaded, they are clustered into groups based on location (using a simple, deterministic algorithm).
• The server compiles a recommendation list for each group.
• Every image with an attached audio file is processed by Sphinx using its group's custom dictionary.
• All images are uploaded to Flickr with their identified tags and user-supplied tags.
• Requesting recommendations only (for a group of images) works essentially the same way, except that recommendations are returned only for the largest group of images.
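The slides do not say which clustering algorithm is used, only that it is simple and deterministic. As an assumption, the sketch below uses greedy "leader" clustering with a haversine distance: each image joins the first group whose first image lies within a radius (1 km here, an illustrative default), otherwise it starts a new group. All function names are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def cluster_by_location(points, radius_km=1.0):
    """Greedy leader clustering: each point joins the first cluster whose
    first point (its leader) is within radius_km, otherwise it starts a new
    cluster. For a fixed input order the result is deterministic.
    Returns clusters as lists of indices into `points`."""
    clusters = []
    for i, p in enumerate(points):
        for cluster in clusters:
            if haversine_km(points[cluster[0]], p) <= radius_km:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def largest_cluster(clusters):
    # max() returns the first of several equally large clusters.
    return max(clusters, key=len)
```

Comparing only against each group's first image keeps the result independent of later images, which is what makes the outcome reproducible for a given upload order.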
Method of compiling a recommendation list for an image (or group of images):
• Input: a group of images.
• Three tag sources, implemented as independent threads that all run in parallel:
  • Public tags (based on geo-location): ranking the tags found in images near the given geo-location.
  • Public tags (based on geo-location): querying Flickr's Places API.
  • Private tags (NOT using geo-location): ranking the user's previously used tags.
• Merging results: the source lists are merged into one list (the merging parameters are configurable).
• Output: the merged list is sent to the Android client (when asking for tag recommendations only) or to Sphinx (when uploading images to Flickr).
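A minimal sketch of the parallel-sources-then-merge flow, assuming each source is a callable returning a ranked tag list and that the configurable merge parameters are per-source quotas (how many new tags to take from each source, in a fixed order). The quota-based merge and all names here are assumptions, not the project's documented merge rule.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def compile_recommendations(sources, quotas, timeout_s=5.0):
    """Run every tag source in its own thread and merge the results.

    sources: dict mapping a source name to a zero-argument callable that
             returns a ranked list of tags (best first).
    quotas:  dict mapping a source name to the maximum number of new tags
             taken from that source; its key order is the merge order.
    Sources that fail or do not finish within timeout_s are skipped."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {name: pool.submit(fn) for name, fn in sources.items()}
    wait(list(futures.values()), timeout=timeout_s)
    # Do not block on slow sources: their threads may keep running in the
    # background, but their results are ignored.
    pool.shutdown(wait=False, cancel_futures=True)

    merged, seen = [], set()
    for name, quota in quotas.items():
        fut = futures.get(name)
        if fut is None or not fut.done() or fut.cancelled():
            continue
        if fut.exception() is not None:
            continue  # a failing source is skipped, not fatal
        taken = 0
        for tag in fut.result():
            if taken >= quota:
                break
            key = tag.lower()
            if key not in seen:  # case-insensitive de-duplication
                seen.add(key)
                merged.append(tag)
                taken += 1
    return merged
```

For example, with sources returning `["Vatican", "Rome", "church"]`, `["Rome", "Italy"]` and `["travel"]` and quotas of 2, 2 and 1, the merged list is `["Vatican", "Rome", "Italy", "travel"]`: the duplicate "Rome" from the second source does not use up its quota.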
Achievements
Server side:
• Tag recommendations are compiled for an image or group of images and can be presented to the user (recommendation only) or used by Sphinx for voice-to-text.
• Performance is generally good:
  • Compiling a recommendation list usually takes no more than a few seconds.
  • In any case, a time limit is enforced.
  • Most interaction with Flickr is fully multi-threaded to avoid bottlenecks.
  • Compiled recommendation lists are cached by time and location to further improve performance.
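Caching "based on time and location" could look like the sketch below. Assumptions: locations are bucketed by rounding latitude and longitude to two decimals (0.01° of latitude is about 1.1 km), and entries expire after a fixed time-to-live. The class name, bucketing scheme, and defaults are hypothetical.

```python
import time


class RecommendationCache:
    """Caches recommendation lists per location cell for ttl_s seconds.

    Locations are bucketed by rounding lat/lon to `precision` decimals;
    `clock` is injectable so expiry can be tested without waiting."""

    def __init__(self, ttl_s=3600.0, precision=2, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.precision = precision
        self.clock = clock
        self._entries = {}  # (lat_cell, lon_cell) -> (stored_at, tags)

    def _key(self, lat, lon):
        return (round(lat, self.precision), round(lon, self.precision))

    def get(self, lat, lon):
        key = self._key(lat, lon)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, tags = entry
        if self.clock() - stored_at > self.ttl_s:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return list(tags)

    def put(self, lat, lon, tags):
        self._entries[self._key(lat, lon)] = (self.clock(), list(tags))
```

Two nearby requests falling into the same cell share one cached list, which is the point of caching by location; the time limit keeps the list from going stale as new photos are uploaded.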
Server properties file:
• Virtually all server parameters are read from an external properties (settings) file.
• This makes tuning the server an easy and intuitive task.
• The server uses two different sets of settings:
  • Settings used when uploading images to Flickr.
  • Settings used when asking for tag recommendations only.
• Separate settings give more flexibility when changing the server's configuration.
• Example from imageupload.properties:
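The original example from imageupload.properties is not reproduced here. As an illustration only, the sketch below parses a simple subset of the Java `.properties` format and maps each mode to its own settings file. Apart from the name imageupload.properties, the second file name and every key are hypothetical.

```python
def parse_properties(text):
    """Parse simple Java-style .properties content into a dict.

    Handles 'key=value' and 'key: value' lines and '#' / '!' comment lines.
    Simplified subset: no line continuations, escapes, or whitespace-only
    separators."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # The first '=' or ':' separates the key from the value.
        positions = [pos for pos in (line.find("="), line.find(":")) if pos != -1]
        if positions:
            sep = min(positions)
            key, value = line[:sep].strip(), line[sep + 1:].strip()
        else:
            key, value = line, ""
        props[key] = value
    return props


# Hypothetical: one settings file per mode. Only imageupload.properties is
# named in the project; the second file name is illustrative.
SETTINGS_FILES = {
    "upload": "imageupload.properties",
    "recommend": "tagrecommendation.properties",
}

# Illustrative keys only, not the project's real settings.
EXAMPLE = """
# illustrative keys only
merge.nearbyTags.count = 10
merge.placesTags.count = 5
merge.privateTags.count = 5
timeLimit.seconds: 5
"""
```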
Client side:
• Merged the Camera and Gallery applications into one.
• Added a new Tag Editor (tags can now be added to, edited on, and removed from images).
• Added support for working with multiple images and getting tag recommendations.
• Many bug fixes and GUI improvements:
  • New Image Properties dialog.
  • Updated menus and icons.
  • Improved gallery performance and design.
Testing
To evaluate the algorithm's performance, we use the following procedure:
• Find a user who uploaded many tagged images (with a reasonable time difference between them) at a popular location (e.g., a San Francisco bridge or the Las Vegas Strip).
• Perform a cross-validation analysis:
  • Choose a subset of the user's images.
  • Send the images to the server and receive tag recommendations for them.
  • Evaluate the accuracy (precision and recall) of the recommendations against the 2 left-out images.
  • Repeat.
• We expect accuracy to be affected by many factors, including:
  • The number of tags merged into the final recommendation list from each source.
  • The dictionary size.
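A sketch of the leave-two-out evaluation, assuming set-based precision and recall over tags and that every pair of images is held out once (the slides only say "Repeat", so exhausting all pairs is an assumption). The `recommend` callable is a hypothetical stand-in for the request to the server.

```python
from itertools import combinations


def precision_recall(recommended, actual):
    """Set-based precision and recall of recommended tags vs. true tags
    (case-insensitive). Empty inputs give 0.0 instead of dividing by zero."""
    rec = {t.lower() for t in recommended}
    act = {t.lower() for t in actual}
    hits = len(rec & act)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(act) if act else 0.0
    return precision, recall


def leave_two_out(images, recommend):
    """images: one tag list per image of the chosen user.
    recommend: callable that receives the remaining images' tag lists and
    returns a recommendation list (stands in for the server request).
    Every pair of images is held out once; returns mean (precision, recall)."""
    if len(images) < 2:
        raise ValueError("need at least two images")
    precisions, recalls = [], []
    for i, j in combinations(range(len(images)), 2):
        remaining = [tags for k, tags in enumerate(images) if k not in (i, j)]
        held_out = list(images[i]) + list(images[j])
        p, r = precision_recall(recommend(remaining), held_out)
        precisions.append(p)
        recalls.append(r)
    n = len(precisions)
    return sum(precisions) / n, sum(recalls) / n
```

With 10 images per group this gives 45 held-out pairs. Precision rewards short, accurate lists, while recall rewards longer ones, which mirrors the dictionary-size trade-off listed above.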
We wrote TagRecTestFramework:
• Completely automated.
• Behaves like a “normal” client (the server thinks it is talking to an Android client).
• For each given location, it:
  • Finds a user with enough tagged images (configurable) in the area, with a small time difference between images (also configurable).
  • Performs cross-validation on the grouped images.
Test configuration for Piazza San Pietro (Vatican City), coordinates (41.902309, 12.457341):
• 10 images in each group, with a minimum of 20 tags per image.
• Search radius: 1 km; maximum time difference between images: 1 day.
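How the test framework might select a qualifying group of images with the parameters above (10 images, at least 20 tags each, 1 km radius, 1 day). This is a sketch under assumptions: the maximum time difference is interpreted here as the gap between consecutive photos (it could equally mean the total time span), and the function and tuple layout are hypothetical.

```python
import math
from datetime import timedelta


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def find_image_group(photos, center, radius_km=1.0, min_tags=20,
                     max_gap=timedelta(days=1), group_size=10):
    """photos: one user's photos as (taken: datetime, (lat, lon), tags).
    Keeps photos within radius_km of center that have at least min_tags
    tags, sorts them by capture time, and returns the first run of
    group_size photos in which consecutive photos are at most max_gap
    apart. Returns None if no such run exists."""
    eligible = sorted(
        (p for p in photos
         if len(p[2]) >= min_tags and haversine_km(p[1], center) <= radius_km),
        key=lambda p: p[0],
    )
    run = []
    for photo in eligible:
        if run and photo[0] - run[-1][0] > max_gap:
            run = []  # gap too large: start a new run
        run.append(photo)
        if len(run) == group_size:
            return run
    return None
```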
Conclusions
The algorithm's accuracy is highly image- and user-dependent:
• We found that most images on Flickr are either untagged or tagged with irrelevant tags.
• Most images on Flickr are not geotagged:
  • Flickr has ~5 billion photos.
  • Only ~170 million are geotagged (~3% of all photos).
• The quality of the results could be improved by tuning the server's settings:
  • Giving more weight to private or public tags affects accuracy.
  • Compiling a larger recommendation list (and thus a larger dictionary for Sphinx) improves recall but may hurt Sphinx's performance (Sphinx works best with small dictionaries).