Social Search Engine Using trusted metadata to improve the relevance of search results
The motivation factor • The essential idea of the SpaK social search engine is that people make decisions based primarily on the advice of a few people whom they trust. The average person has a set of experts whom they consult in designated areas: the computer expert, the car expert, the fashion expert, the financial expert. If the opinions of these experts can be collected, they are incredibly useful: it is this metadata (data about other data) that gives the most intelligent filtering and sorting of the information on the internet. • "We're the search engine, but you're the fuel."
What is a search engine? • A search engine is a program designed to help find information stored on a computer system, such as the World Wide Web, a corporate or proprietary network, or a personal computer. • The search engine lets the user ask for content meeting specific criteria and retrieves a list of references that match those criteria. • Search engines use regularly updated indexes to operate quickly and efficiently. • A program called a ‘crawler’ collects the web pages for the index by ‘crawling’ through the links on each page it visits.
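To make the role of the index concrete, here is a minimal Python sketch of an inverted index (word → set of pages). It assumes the crawler has already fetched the pages into a dict mapping a URL to its text; the names build_index and search are hypothetical, and real engines add tokenization, ranking and incremental updates on top of this.

```python
from collections import defaultdict


def build_index(pages):
    """pages: dict mapping URL -> page text (assumed already fetched by a crawler).
    Returns an inverted index mapping each word to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep only pages that have this word too
    return results


pages = {
    "a.example": "thailand beach photos",
    "b.example": "thailand visa rules",
    "c.example": "beach volleyball",
}
index = build_index(pages)
print(search(index, "thailand beach"))  # {'a.example'}
```

Looking a word up in the index avoids scanning every page at query time, which is why the index has to be rebuilt or updated as the crawler finds new pages.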
Problems with existing search engines • PageRank system: PageRank assumes that each incoming link is a valid vote for a website. Some links are not genuine votes at all: spammers post links in guestbooks and blog comments, which makes link-based ranking less reliable.
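The weakness can be seen in a textbook power-iteration version of PageRank, sketched below in Python. The damping factor 0.85 is the commonly cited default, the link graph is a hypothetical example, and this is not how any production search engine is implemented. Every incoming link passes on rank regardless of who created it, so three throwaway spam pages lift the page they all point to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link counts as an equal vote for its target.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: three spam pages (e.g. guestbook entries) all link to "shop",
# while "honest" has no incoming links at all.
graph = {
    "news": ["blog"],
    "blog": ["news"],
    "spam1": ["shop"],
    "spam2": ["shop"],
    "spam3": ["shop"],
    "shop": [],
    "honest": [],
}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

"shop" ends up ranked above "honest" purely because of the spam links; the algorithm has no notion of whether the linking pages are trustworthy.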
Problems with existing search engines • Word frequency: In this method, relevance is decided by the number of times a query word appears in a web page. Word frequency can be inflated artificially by inserting repeated or unrelated keywords into a page's ‘meta’ tags.
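A minimal Python sketch of why pure word counting is easy to game. The scorer below is a hypothetical, deliberately naive one that counts query words anywhere in the raw page, including inside a ‘meta’ keywords tag; the two pages are made-up examples.

```python
import re


def term_frequency_score(page_html, query):
    """Naive relevance: count how often each query word occurs anywhere in the
    raw page, including text inside <meta> tags (which is what keyword stuffing exploits)."""
    tokens = re.findall(r"[a-z0-9]+", page_html.lower())
    return sum(tokens.count(term) for term in query.lower().split())


honest = "<p>Travel guide to Thailand: beaches, temples and food.</p>"
stuffed = ('<meta name="keywords" content="thailand thailand thailand thailand">'
           "<p>Buy cheap pills here.</p>")

print(term_frequency_score(honest, "Thailand"))   # 1
print(term_frequency_score(stuffed, "Thailand"))  # 4
```

The stuffed page, which has nothing to do with Thailand, scores four times higher than the genuine travel guide.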
An alternative: Social Search • Existing search engines (examples: Yahoo, MSN, Google, AltaVista) + a social community (examples: engineers, doctors, students, actors, Indians, Americans) = social search engines
About social search engines • Social search engines are a class of search engines that use social networks to organize, prioritize or filter search results. • They use ‘metadata’ to judge the relevance of web pages to a user. • ‘Metadata’ is defined as ‘data about data’. • In this case, the metadata refers to the feedback given by the community about the web pages.
Continued… • It is really about people indexing the information we find on the web, instead of the computational formulae that guide the traditional sites. • Since relevance is based on trust, the users of such a social search engine are much better protected from spam and phishing sites.
How it works: an example • A user searches for "Thailand", and the search engine picks out the page containing photos of a friend's Thailand vacation.
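The registration process • A new user clicks SIGN UP and is taken to the registration page. • The user is assigned a unique UID and username. • A new entry is made in the members table. • If registration succeeds (YES), the user proceeds to Login; if it fails (NO), the user must re-register.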
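A minimal Python sketch of the registration step, assuming a members table with a unique UID and a unique username. The project itself used MySQL and PHP; Python's built-in sqlite3 module stands in here, and the table and column names are hypothetical.

```python
import sqlite3
import uuid


def create_members_table(conn):
    # Hypothetical schema: the original MySQL table layout is not given.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS members ("
        " uid TEXT PRIMARY KEY,"
        " username TEXT UNIQUE NOT NULL)"
    )


def register(conn, username):
    """Insert a new member. Returns the new UID, or None if the username is
    already taken (the NO branch: the user has to re-register)."""
    uid = uuid.uuid4().hex  # unique identifier for the new member
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO members (uid, username) VALUES (?, ?)", (uid, username)
            )
    except sqlite3.IntegrityError:
        return None
    return uid


conn = sqlite3.connect(":memory:")
create_members_table(conn)
print(register(conn, "alice") is not None)  # True: proceed to login
print(register(conn, "alice"))              # None: re-register
```

Letting the UNIQUE constraint reject duplicates keeps the check and the insert in one atomic step, instead of checking first and inserting afterwards.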
After successful log in… • The profile page offers: 1. View Community 2. View Buddies 3. Plain search 4. Search History 5. Categories
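Community • View Community lists the other registered members. • When the user adds a member as a friend, the buddy table in the database is accessed to check whether that member is already a buddy. • NO: the user rates the friend on a scale of 1 to 5, and the message "ADDED AS FRIEND" is displayed. • YES: the message "ALREADY ADDED AS FRIEND" is displayed.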
Buddies… • View Buddies lists the user's buddies. • Clicking to delete a buddy deletes the corresponding database entry, and the message "DATABASE ENTRY HAS BEEN DELETED" is displayed.
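The add and delete flows for buddies can be sketched in Python as follows. As before, sqlite3 stands in for the project's MySQL database, and the buddies table layout (one row per user/buddy pair with a 1 to 5 rating) is a hypothetical schema; the messages are the ones shown on the slides.

```python
import sqlite3


def create_buddy_table(conn):
    # Hypothetical schema: one row per (user, buddy) pair with a 1-5 rating.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS buddies ("
        " user_uid TEXT NOT NULL,"
        " buddy_uid TEXT NOT NULL,"
        " rating INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5),"
        " PRIMARY KEY (user_uid, buddy_uid))"
    )


def add_buddy(conn, user_uid, buddy_uid, rating):
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    # Access the buddy table to see whether this pair already exists.
    exists = conn.execute(
        "SELECT 1 FROM buddies WHERE user_uid = ? AND buddy_uid = ?",
        (user_uid, buddy_uid),
    ).fetchone()
    if exists:
        return "ALREADY ADDED AS FRIEND"
    with conn:
        conn.execute(
            "INSERT INTO buddies (user_uid, buddy_uid, rating) VALUES (?, ?, ?)",
            (user_uid, buddy_uid, rating),
        )
    return "ADDED AS FRIEND"


def delete_buddy(conn, user_uid, buddy_uid):
    with conn:
        conn.execute(
            "DELETE FROM buddies WHERE user_uid = ? AND buddy_uid = ?",
            (user_uid, buddy_uid),
        )
    return "DATABASE ENTRY HAS BEEN DELETED"


conn = sqlite3.connect(":memory:")
create_buddy_table(conn)
print(add_buddy(conn, "alice", "bob", 4))  # ADDED AS FRIEND
print(add_buddy(conn, "alice", "bob", 5))  # ALREADY ADDED AS FRIEND
print(delete_buddy(conn, "alice", "bob"))  # DATABASE ENTRY HAS BEEN DELETED
```

The stored rating is what later feeds the aggregate score used to re-order search results.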
Search history… • Personal search history • Search on a specified topic • Buddy search history
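Search • The user's query is stemmed using Porter's stemmer algorithm. • Stored queries from the user and his buddies that share these stems are selected. • Each selected query is scored with the similarity function KN(p, Q) / max[kn(p), kn(Q)]. • User rating, clicks and similarity are aggregated, and a result array is extracted using community feedback. • The default result array is obtained using the search API. • The default results are re-ordered and the re-ordered results are displayed.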
The Similarity Function • For each stemmed word, we select a similar stem from the database and extract the queries associated with that stem. • Each extracted query is scored with the similarity function KN(p, Q) / max[kn(p), kn(Q)], • where KN(p, Q) is the number of words common to the extracted query (p) and the user query (Q), • kn(p) is the number of words in the extracted query, • and kn(Q) is the number of words in the user query.
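The similarity function translates directly into a few lines of Python. This sketch assumes both queries are already stemmed and counts distinct words, which keeps the result between 0 and 1; the function name is hypothetical.

```python
def similarity(extracted_query, user_query):
    """KN(p, Q) / max[kn(p), kn(Q)], counting distinct (already stemmed) words."""
    p = set(extracted_query.lower().split())
    q = set(user_query.lower().split())
    if not p or not q:
        return 0.0  # avoid division by zero for empty queries
    return len(p & q) / max(len(p), len(q))


print(similarity("thailand beach holiday", "thailand beach"))  # 0.6666666666666666
print(similarity("thailand visa", "thailand beach"))           # 0.5
```

Dividing by the longer of the two queries penalizes a stored query that contains the user's words but also many others.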
Continued… • The output of the similarity function is a real number between 0 and 1 inclusive (0 when the two queries share no words, 1 when they contain the same words). The similarity value for a user query is stored in the database against the corresponding extracted query. • We calculate an aggregate value from the product similarity × clicks × rating and order the links in descending order of this value. • The array of links obtained from community feedback is compared with the default search results of the API, and the default results are then rearranged based on the metadata received from the community.
Continued… • The results are thus arranged in decreasing order for each user, based on previous searches made by his buddies for the same query or a related one.
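A minimal Python sketch of the scoring and re-ordering step. It assumes community feedback arrives as (url, similarity, clicks, rating) tuples, that scores for the same URL are summed, and that links not present in the API's default results are dropped; all three are illustrative assumptions, and the function names are hypothetical.

```python
def community_order(feedback):
    """feedback: (url, similarity, clicks, rating) tuples from the buddies' past
    searches. Scores similarity * clicks * rating, summed per URL, best first."""
    scores = {}
    for url, sim, clicks, rating in feedback:
        scores[url] = scores.get(url, 0.0) + sim * clicks * rating
    return sorted(scores, key=scores.get, reverse=True)


def rerank(default_results, feedback):
    """Move links the community rated to the front of the API's default results,
    in community order; the remaining default results keep their API order."""
    available = set(default_results)
    preferred = [url for url in community_order(feedback) if url in available]
    preferred_set = set(preferred)
    rest = [url for url in default_results if url not in preferred_set]
    return preferred + rest


default = ["a.example", "b.example", "c.example", "d.example"]
feedback = [
    ("c.example", 1.0, 3, 5),   # score 15.0
    ("b.example", 0.5, 2, 4),   # score 4.0
    ("x.example", 1.0, 10, 5),  # not in the default results, so it is ignored
]
print(rerank(default, feedback))
# ['c.example', 'b.example', 'a.example', 'd.example']
```

Keeping only URLs that the API returned means the community metadata re-orders the default results rather than replacing them.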
Stemming... its importance • Stemming is the process of reducing a word to its stem, usually by stripping its suffix. • Stemming is important for our project because words with a common stem usually have similar meanings, for example predict, prediction and predicted. • Keywords in the search query are grouped according to their stems.
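Grouping keywords by stem can be sketched in Python as below. To keep the example short, naive_stem is a hypothetical toy suffix stripper with a fixed suffix list, not the Porter algorithm described on the next slide; it only illustrates how words sharing a stem end up in one group.

```python
from collections import defaultdict

# Toy suffix list (an assumption); longer suffixes are tried first.
SUFFIXES = ("ions", "ion", "ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least 3 letters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(keywords):
    groups = defaultdict(list)
    for kw in keywords:
        groups[naive_stem(kw)].append(kw)
    return dict(groups)


print(group_by_stem(["predict", "prediction", "predicted", "Thailand"]))
# {'predict': ['predict', 'prediction', 'predicted'], 'thailand': ['Thailand']}
```

A fixed suffix list like this over-strips and under-strips many English words, which is the problem Porter's measure-based rules address.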
Description of Porter Stemmer • A consonant in a word is a letter other than A, E, I, O or U, and other than Y preceded by a consonant. If a letter is not a consonant, it is a vowel. A consonant will be denoted by c and a vowel by v. A list ccc... of length greater than 0 will be denoted by C, and a list vvv... of length greater than 0 will be denoted by V. Any word, or part of a word, therefore has one of the four forms: • CVCV ... C, CVCV ... V, VCVC ... C, VCVC ... V • These may all be represented by the single form [C]VCVC ... [V], where the square brackets denote optional presence of their contents. • Using (VC){m} to denote VC repeated m times, this may again be written as [C](VC){m}[V]. m is called the measure of any word or word part when represented in this form. The case m = 0 covers the null word.
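The measure m defined above can be computed in a few lines of Python. This sketch covers only the consonant/vowel classification and the measure, not Porter's suffix-stripping rules; the function names are hypothetical.

```python
def is_consonant(word, i):
    """Porter's definition: not A, E, I, O, U, and not a Y preceded by a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # Y is a consonant at the start of a word or after a vowel.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(word):
    """Return m in the form [C](VC){m}[V]."""
    word = word.lower()
    pattern = "".join("c" if is_consonant(word, i) else "v" for i in range(len(word)))
    # Collapse runs: a consonant run becomes one C, a vowel run one V.
    collapsed = []
    for ch in pattern:
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    # Each V followed by a C is one VC pair.
    return "".join(collapsed).count("vc")


print([measure(w) for w in ["tree", "trouble", "troubles"]])  # [0, 1, 2]
```

Porter's rules use this value as a guard, for example removing a suffix only when the remaining stem has m > 0, so that very short words are left intact.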
Technologies used • Linux: the operating system • Apache: the web server • MySQL: the RDBMS • PHP: Hypertext Preprocessor • CGI: Common Gateway Interface • CSS: Cascading Style Sheets
The End. Thank you. Please try it out!