Deepin Search • Wenxu Li & Ziming Zhai
Motivation • Google gives you the best results for everyone, but maybe not the best results for you. • Besides keyword match, you may also care about site speed, site quality, or category membership. • It would be great if users could create their own ranking methods.
Our Approach • Retrieve the first 24 results from Google • Send requests to Amazon Alexa Services to get detailed information about each result URL • Use our formula to calculate a score for each criterion for each URL • Let the user change the weight of each criterion • Re-rank the results based on the final scores
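Because the per-criterion scores for the 24 results only need to be fetched and computed once, moving a weight slider only has to re-sort them. A minimal Python sketch of that re-ranking step, assuming each result already carries scores normalized to [0, 1]; the `Result` class, the criterion key names, and the `.example` URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    url: str
    google_rank: int  # 1-based position in the original Google list
    # Per-criterion scores in [0, 1]; the key names are hypothetical.
    scores: dict = field(default_factory=dict)


def final_score(result, weights):
    """Weighted sum of criterion scores; a criterion with no weight contributes 0."""
    return sum(w * result.scores.get(name, 0.0) for name, w in weights.items())


def rerank(results, weights):
    """Highest final score first. Python's sort is stable (also with
    reverse=True), so pre-sorting by Google rank keeps Google's order on ties."""
    by_google = sorted(results, key=lambda r: r.google_rank)
    return sorted(by_google, key=lambda r: final_score(r, weights), reverse=True)


results = [
    Result("a.example", 1, {"speed": 0.2, "quality": 0.9, "google": 1.0}),
    Result("b.example", 2, {"speed": 1.0, "quality": 0.4, "google": 0.5}),
    Result("c.example", 3, {"speed": 0.6, "quality": 0.6, "google": 0.0}),
]

# Pushing the "speed" slider all the way up reorders the list without refetching.
print([r.url for r in rerank(results, {"speed": 1.0})])
# ['b.example', 'c.example', 'a.example']
```

Pre-sorting by Google position is one way to let ties fall back to Google's original order; it is a design choice of this sketch, not something the slides specify.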
Main Functions • Customized ranking: users set the weights of five criteria with slider bars: • Speed • Quality • Popularity • Date Created • Keyword Match • Detailed information for each URL: users can view each URL's general description, three months of traffic information, and related sites • Categorized display: results can be grouped by category
Amazon Alexa Services • URL Info: Related Links, Categories, LinksInCount; Rank, RankByCountry, RankByCity; UsageStats, Speed, Keyword, SiteData; ContactInfo, AdultContent, Language, OwnedDomains • SitesLinkingIn • CategoryListings: returns a list of the sites within a DMOZ category • Traffic History (since 06-01-2007): Rank, Reach, PageView
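Calculate Ranking Scores • Formula: $S = U_s S_{\mathrm{speed}} + U_t S_{\mathrm{time}} + U_q S_{\mathrm{quality}} + U_p S_{\mathrm{popularity}} + U_d S_{\mathrm{google}}$, where each $U$ is a user-set weight • Normalization: $S_x = \frac{x - \min}{\max - \min}$ for each raw value $x$ • Popularity score: $\mathrm{Reach} \times \mathrm{PageView}$ • Quality score: $0.5\,S_{\mathrm{LinksInCount}} + 0.5\,S_{\mathrm{PageView}}$ • Dummy variable $S_{\mathrm{google}}$ (keeps the Google ranking)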
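A sketch of how the criterion scores on this slide could be computed for one result set, in Python. The raw field names (`load_ms`, `links_in`, `reach`, `page_views`), the inversion of the speed score, and the linear encoding of Google position as the dummy score are assumptions; the date-created score is left out because the slide does not say whether newer or older sites should score higher.

```python
def min_max(values):
    """Min-max normalization to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def criterion_scores(raw):
    """raw: one dict per result, in Google order, with hypothetical keys
    'load_ms', 'links_in', 'reach', and 'page_views'."""
    n = len(raw)
    # Speed: a shorter load time is better, so the normalized value is inverted (assumption).
    speed = [1.0 - s for s in min_max([r["load_ms"] for r in raw])]
    # Popularity = Reach * PageView, normalized across the result set.
    popularity = min_max([r["reach"] * r["page_views"] for r in raw])
    # Quality = 0.5 * S_LinksInCount + 0.5 * S_PageView.
    links = min_max([r["links_in"] for r in raw])
    views = min_max([r["page_views"] for r in raw])
    quality = [0.5 * a + 0.5 * b for a, b in zip(links, views)]
    # Google dummy: first result scores 1, last scores 0 (assumed encoding).
    google = [1.0 - i / (n - 1) if n > 1 else 1.0 for i in range(n)]
    # S_time (date created) is omitted: the direction of "better" is not stated.
    return [
        {"speed": s, "quality": q, "popularity": p, "google": g}
        for s, q, p, g in zip(speed, quality, popularity, google)
    ]


def total(scores, weights):
    """S = Us*S_speed + Ut*S_time + Uq*S_quality + Up*S_popularity + Ud*S_google."""
    return sum(weights.get(name, 0.0) * value for name, value in scores.items())


raw = [
    {"load_ms": 800, "links_in": 1000, "reach": 50, "page_views": 2.0},
    {"load_ms": 400, "links_in": 200, "reach": 10, "page_views": 4.0},
    {"load_ms": 1200, "links_in": 800, "reach": 30, "page_views": 3.0},
]
scores = criterion_scores(raw)
print([round(total(s, {"speed": 1.0, "quality": 1.0}), 3) for s in scores])
# [1.0, 1.5, 0.625]
```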
Implementation • jQuery + PHP + MySQL • AJAX + JSON + XML • Hosted on GoDaddy • Amazon Alexa cost: $1.60 so far ($0.15 per 1,000 requests) • A hash table (inverted index) indexes URLs • A trie organizes URLs by category
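One possible shape for the two structures on this slide, sketched in Python: a trie keyed by category path segments (DMOZ-style paths such as `Top/Computers/Internet`), so that every URL under a category and its subcategories can be collected by walking one subtree, plus a hash table mapping each URL back to its categories. Reading the "inverted index" as URL-to-category is an assumption, and `CategoryTrie`, `index_result`, and the sample data are hypothetical.

```python
from collections import defaultdict


class CategoryTrie:
    """Trie over category path segments such as "Top/Computers/Internet".
    Each node keeps the URLs filed directly under that category."""

    def __init__(self):
        self.children = {}
        self.urls = []

    def insert(self, path, url):
        node = self
        for part in path.split("/"):
            if part not in node.children:
                node.children[part] = CategoryTrie()
            node = node.children[part]
        node.urls.append(url)

    def urls_under(self, path):
        """All URLs in the category at `path` and in its subcategories."""
        node = self
        for part in path.split("/"):
            node = node.children.get(part)
            if node is None:
                return []
        out, stack = [], [node]
        while stack:  # iterative depth-first walk of the subtree
            cur = stack.pop()
            out.extend(cur.urls)
            stack.extend(cur.children.values())
        return out


def index_result(trie, url_to_categories, url, category_paths):
    """Hypothetical helper: file one result under each of its categories and
    record the reverse mapping (URL -> categories) in a hash table."""
    for path in category_paths:
        trie.insert(path, url)
        url_to_categories[url].append(path)


trie = CategoryTrie()
url_to_categories = defaultdict(list)
index_result(trie, url_to_categories, "search.example", ["Top/Computers/Internet/Searching"])
index_result(trie, url_to_categories, "standards.example", ["Top/Computers/Internet"])
index_result(trie, url_to_categories, "movies.example", ["Top/Arts/Movies"])

print(sorted(trie.urls_under("Top/Computers")))
# ['search.example', 'standards.example']
```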
Performance • Each query is processed entirely on the fly: • 5*3 connections to Google • 24*5 connections to Amazon Alexa • GoDaddy limits the number of connections • In practice, a query makes more than 200 connection requests • AJAX splits the big task into 6 tasks, each handling one kind of information • Retrieved information is stored in the database and updated regularly, which saves money
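Caching is what keeps costs down: an uncached query needs about 24*5 = 120 Alexa requests, roughly $0.018 at $0.15 per 1,000 requests. The Python sketch below stores each URL's information with a timestamp and only calls the service again once the entry is older than `max_age`. SQLite stands in for MySQL so the example runs on its own (in MySQL the upsert would be written with `INSERT ... ON DUPLICATE KEY UPDATE`); `AlexaCache`, `fake_fetch`, and the one-week default refresh interval are hypothetical.

```python
import json
import sqlite3
import time


class AlexaCache:
    """Stores per-URL info from a paid remote service and refetches only
    entries older than max_age seconds. SQLite stands in for MySQL here."""

    def __init__(self, fetch, max_age=7 * 24 * 3600, db=":memory:"):
        self.fetch = fetch  # hypothetical: url -> dict of site info
        self.max_age = max_age
        self.conn = sqlite3.connect(db)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS site_info ("
            "url TEXT PRIMARY KEY, data TEXT NOT NULL, fetched_at REAL NOT NULL)"
        )

    def get(self, url, now=None):
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT data, fetched_at FROM site_info WHERE url = ?", (url,)
        ).fetchone()
        if row is not None and now - row[1] < self.max_age:
            return json.loads(row[0])  # fresh hit: no paid request
        data = self.fetch(url)  # miss or stale entry: one paid request
        self.conn.execute(
            "INSERT OR REPLACE INTO site_info (url, data, fetched_at) VALUES (?, ?, ?)",
            (url, json.dumps(data), now),
        )
        self.conn.commit()
        return data


calls = []


def fake_fetch(url):
    """Stand-in for a paid Alexa request; records each call."""
    calls.append(url)
    return {"url": url, "fetch_no": len(calls)}


cache = AlexaCache(fake_fetch, max_age=100)
cache.get("a.example", now=0)    # miss: fetched
cache.get("a.example", now=50)   # fresh: served from the database
cache.get("a.example", now=150)  # stale: fetched again
print(len(calls))  # 2
```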
Demo • http://www.zhaiziming.com/deepin