SMS-Based Web Search for Low-end Mobile Devices • Eric Brewer, University of California • Lakshmi Subramanian, New York University • Jay Chen, New York University • XinMiao Wu, 2011-05-11
Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion
Explanation • SMS • Short Message Service • Limited to 140 bytes per message • SMS-Based Web Search • Not via XHTML/WAP • Uses only the SMS service
Conventional SMS-Based Web Search • [Figure: User, SMS Server, and Search Engine. (1) The user sends a short message to the SMS server. (2) The SMS server invokes the search engine. (3) The search engine responds with the top N search responses. (4) The SMS server returns them to the user as short messages.]
What the authors address • [Figure: User, SMS Server, and Search Engine (SMSFind), steps 1 to 5. The user sends a short message to the SMS server, which invokes the search engine. From the top N search responses, the main content is extracted as a snippet of at most 140 bytes and returned to the user in a single short message.]
Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion
Why is this meaningful? • The growth of the mobile phone market has motivated the design of new forms of mobile information services • With the growth of Twitter and other social messaging networks, Short Message Service (SMS) based applications and services have become popular • Mobile devices in developing regions are still simple, low-cost devices with limited processing and communication capabilities • Voice and SMS will likely remain the primary communication channels
Why SMS-Based Search? • For any SMS-based web service, efficient SMS-based search is an essential building block. • Existing services: • Vertical (Google SMS and Yahoo! oneSearch) • Long tail (ChaCha, Just Dial): these need a human to answer • None of the existing automated SMS search services is a complete solution for search queries across arbitrary topics. They rely on pre-defined topics, such as “define” or “movies” (e.g. Google SMS: “define boils”)
Difficulties of SMS-Based Search • 140 bytes per message • Search response time (10 seconds to several minutes) • Small form factor and low bandwidth (even with XHTML/WAP) • Long tail phenomenon • Users rarely have the luxury of refining queries or browsing many results (vs. desktop search) • Queries are often ambiguous • Problem: How does a mobile user efficiently search the Web using one round of interaction where the search response is restricted to one SMS message? • SMSFind
Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion
Related Work • Two surveys • First: a new mobile search model is needed for low-end mobile devices. • Second: SMS is expected to continue its growth because it is popular, cheap, reliable and private. • Two kinds of SMS search • Vertical: Google, Yahoo!, and Microsoft • Long tail: ChaCha and Just Dial • Automatic Text Summarization • The goal is different
Related Work • The problem that SMSFind seeks to address is similar to: • Question answering systems (developed for the Text Retrieval Conference) • But distinct in: • Unstructured search-style queries (simple natural language style) • SMSFind is a snippet extraction and snippet ranking algorithm • The collection of documents being searched over
Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion
SMSFind Search Problem • Characterized as follows: given <query, hint> and the top N search response pages, extract a text snippet as an appropriate search response to the query. • Questions to note: • What is a snippet? • What is the hint?
Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion
Disambiguating the query • A common technique: • Use additional contextual information about where the search is being conducted from. • Here we use an explicit hint. • Consider the query: <“Barack Obama wife”, “wife”>.
<“Barack Obama wife”, “wife”> • Most search result pages will contain: • “Michelle” or “Michelle Obama” or “Michelle Robinson” or “Michelle Lavaughn Robinson” within the neighborhood of the word “wife” in the text of the page. • SMSFind will search the neighborhood of the word “wife” in every result page and look for commonly occurring n-grams. • 1<=n<=5. For example, “Michelle Obama” is a 2−gram.
n-grams and snippets • Both represent contiguous sequences of words in a document • An n-gram is extremely short in length (1-5 words) • A text snippet is a sequence of words that can fit in a single SMS message • n-grams are used as an intermediate unit • Snippets are used for the final ranking
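A minimal sketch of how candidate snippets that fit in one SMS could be enumerated. The 140-byte limit comes from the slides; measuring size as UTF-8 bytes and the function names `fits_in_sms` and `candidate_snippets` are assumptions made for illustration, not the authors' code.

```python
SMS_LIMIT = 140  # bytes per message, as stated in the slides


def fits_in_sms(text, limit=SMS_LIMIT):
    """Assumption: size is measured as UTF-8 bytes (real SMS encodings differ)."""
    return len(text.encode("utf-8")) <= limit


def candidate_snippets(words, limit=SMS_LIMIT):
    """For each start position, yield the longest run of whole words that fits."""
    for start in range(len(words)):
        end = start
        size = 0
        while end < len(words):
            # Each word after the first also costs one separating space.
            extra = len(words[end].encode("utf-8")) + (1 if end > start else 0)
            if size + extra > limit:
                break
            size += extra
            end += 1
        if end > start:  # skip a start word that alone exceeds the limit
            yield " ".join(words[start:end])


words = ["word"] * 100
snippets = list(candidate_snippets(words))
```

Each yielded snippet is a contiguous word sequence, so it can later be ranked by the n-grams it contains.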
SMSFind Algorithm • Consider a search query (Q,H) • Q is the search query containing the hint term(s) H. • Let P1, . . . PN represent the textual content of the top N search response pages to Q. • Three steps: Neighborhood Extraction; N-gram Ranking; Snippet Ranking
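The neighborhood extraction step, combined with the n-gram counting described for the “wife” example, might be sketched as follows. The window radius, the whitespace tokenization, and all function names are hypothetical choices for illustration. The slides do not specify them.

```python
from collections import Counter


def neighborhood(words, hint, radius):
    """Return one word window per occurrence of the hint (radius is an assumption)."""
    return [
        words[max(0, i - radius): i + radius + 1]
        for i, w in enumerate(words)
        if w == hint
    ]


def ngrams(words, max_n=5):
    """Yield every contiguous n-gram, 1 <= n <= max_n, as a tuple of words."""
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])


def count_neighborhood_ngrams(pages, hint, radius=5, max_n=5):
    """Count n-grams near the hint across the text of all result pages."""
    hint = hint.lower()
    counts = Counter()
    for text in pages:
        words = text.lower().split()  # naive tokenization, punctuation is kept
        for window in neighborhood(words, hint, radius):
            counts.update(ngrams(window, max_n))
    return counts


# Made-up page texts standing in for P1..PN
pages = [
    "barack obama and his wife michelle obama live in washington",
    "michelle obama is the wife of barack obama",
]
counts = count_neighborhood_ngrams(pages, "wife")
# counts[("michelle", "obama")] is 2: once per page
```

Overlapping windows would count the same n-gram more than once. A fuller version would probably merge windows first.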
Basic rationale of n-gram ranking algorithm • Any n-gram which satisfies the following three properties is potentially related to the appropriate response: • 1. the n-gram appears very frequently around the hint. • 2. the n-gram appears very close to the hint. • 3. the n-gram is not a commonly used popular term or phrase. • As an example, the n-gram “Michelle Obama”.
Three Metrics • Frequency - The number of times the n-gram occurs across all snippets. • Mean rank - The sum of the PageRanks of every page in which the n-gram occurs, divided by the n-gram’s raw frequency. • Minimum distance - The minimum distance from the n-gram to the hint.
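The three metrics can be computed roughly as in the sketch below. It assumes each page arrives as a `(rank, words)` pair with a caller-supplied rank value, counts occurrences over the whole page text rather than only the neighborhoods, and measures distance in words. These are simplifications for illustration, and the helper names are hypothetical.

```python
def find_all(words, gram):
    """Start indices of every occurrence of the n-gram (a tuple of words)."""
    n = len(gram)
    return [i for i in range(len(words) - n + 1) if tuple(words[i:i + n]) == gram]


def word_gap(start, n, hint_pos):
    """Distance in words between the hint and the nearest edge of the n-gram."""
    if hint_pos < start:
        return start - hint_pos
    if hint_pos >= start + n:
        return hint_pos - (start + n - 1)
    return 0  # the hint lies inside the n-gram


def ngram_metrics(pages, gram, hint):
    """Return (frequency, mean_rank, min_distance) for one n-gram."""
    frequency, rank_sum, min_distance = 0, 0.0, None
    for rank, words in pages:
        starts = find_all(words, gram)
        if not starts:
            continue
        frequency += len(starts)
        rank_sum += rank  # each page containing the n-gram contributes its rank once
        for h in (i for i, w in enumerate(words) if w == hint):
            for s in starts:
                d = word_gap(s, len(gram), h)
                if min_distance is None or d < min_distance:
                    min_distance = d
    mean_rank = rank_sum / frequency if frequency else None
    return frequency, mean_rank, min_distance


# Made-up pages with ranks 1 and 2
pages = [
    (1, "barack obama and his wife michelle obama".split()),
    (2, "michelle obama is the wife of barack obama".split()),
]
result = ngram_metrics(pages, ("michelle", "obama"), "wife")
# result == (2, 1.5, 1)
```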
Should return the response “rainn wilson” • Here, freq(s), meanrank(s) and mindist(s) are normalized scores of an n-gram s
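How the normalized scores are combined is not recoverable from this slide. The sketch below is one plausible combination, not the authors' formula: min-max normalization across candidates followed by an equal-weight sum. It assumes a lower mean rank and a smaller distance are better. The input numbers are invented for illustration.

```python
def normalize(values, higher_is_better=True):
    """Min-max scale to [0, 1]; flip the scale when lower raw values are better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - x for x in scaled]


def rank_ngrams(raw):
    """raw maps n-gram -> (frequency, mean_rank, min_distance).

    Assumption: the final score is an unweighted sum of the three normalized
    metrics; the real weighting is not given in the slides.
    """
    grams = list(raw)
    freq = normalize([raw[g][0] for g in grams])
    rank = normalize([raw[g][1] for g in grams], higher_is_better=False)
    dist = normalize([raw[g][2] for g in grams], higher_is_better=False)
    scores = {g: f + r + d for g, f, r, d in zip(grams, freq, rank, dist)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented raw metrics for three candidate n-grams
raw = {
    "rainn wilson": (9, 1.2, 1),
    "the office": (12, 1.5, 4),
    "dwight schrute": (7, 2.0, 2),
}
ranking = rank_ngrams(raw)
```

With these invented inputs the top-ranked n-gram is “rainn wilson”, because it is close to the hint and appears on well-ranked pages even though it is not the most frequent candidate.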
Hint Extraction from the Query • 45% of the queries began with the word “what”. • Over 80% of the queries are in standard forms (e.g. “what is”, “what was”, “what are”, “what do”, “what does”). • The “what is X” pattern. • For example, the hint of “what is a quote by ernest hemingway” is “quote” (“a” is a stop word).
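A minimal sketch of hint extraction for the “what is X” pattern. It assumes the hint is the first word after the question prefix that is not a stop word, which matches the slide's example. The stop-word list is a tiny illustrative one, and `extract_hint` is a hypothetical name.

```python
# Question prefixes listed in the slides
QUESTION_PREFIXES = ("what is", "what was", "what are", "what do", "what does")

# Assumption: a very small stop-word list, for illustration only
STOP_WORDS = {"a", "an", "the", "of", "by", "in", "on", "for", "to"}


def extract_hint(query):
    """Return the first non-stop word after a known prefix, or None if no match."""
    q = query.lower().strip()
    for prefix in QUESTION_PREFIXES:
        if q.startswith(prefix + " "):
            for word in q[len(prefix):].split():
                if word not in STOP_WORDS:
                    return word
            return None
    return None  # query is not in one of the standard forms


hint = extract_hint("what is a quote by ernest hemingway")
# hint == "quote"
```

Queries outside the standard forms return `None` here. A real system would need a fallback for them.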
Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion
IMPLEMENTATION • 600 lines of Python code • 1.8 GHz Duo Core Intel PC • 2 GB of RAM • 2 Mbps broadband • A front-end • Set up an SMS short code with a local telco in Kenya
Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion
EVALUATION • What is the query set? • What are the correct answers? • How is an answer judged correct or not? • What percentage of the queries are verticals? • Can the hint always be extracted correctly?
Result • SMSFind returns correct answers for 57.3% of the queries. • Google SMS returns correct answers for only 9.5% of these queries.
What is more interesting? • What if the vertical queries are removed? • What if only the highest-ranked n-grams are returned rather than the entire snippet? • Are n-grams necessary, or would ranking snippets alone perform just as well? • How important is the hint term?
Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion
Difficult Types of Queries • Really ambiguous • Explanations • Enumerations • Analysis • Time sensitive • SMSFind cannot handle these kinds of queries yet.
Outline • What the authors address • Introduction • Related Work • SMSFind Problems • SMSFind Search Algorithm • Implementation • Evaluation • Discussion • Conclusion
CONCLUSION • We have presented SMSFind, an automated SMS-based search response system. • SMSFind can work across arbitrary topics. • We find that a combination of simple Information Retrieval algorithms with existing search engines can provide reasonably accurate search responses for SMS queries. • SMSFind is able to answer 57.3% of the queries in our test set.