Search engine and services

Search engine and services. Course: Location Aware Machine Intelligence. Presented by: Celestine Mkama Kalendero, 25.02.2014. Outline: search engine results ranking based on location; review of Personalized Mobile Search Engine; extraction of address data from unstructured text.

Presentation Transcript


  1. Search engine and services • Course: Location Aware Machine Intelligence • Presented by: Celestine Mkama Kalendero • 25.02.2014

  2. Outline • Search Engine results ranking based on location • Review of Personalized Mobile Search Engine • Extraction of Address Data from Unstructured Text

  3. Search Engine Results Ranking based on Location • Carolyn Watters and Ghada Amoudi • Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada • E-mail: watters@cs.dal.ca • Publication year: 2003

  4. Result Ranking in Search Engines (as of 2002) Search engines build their indexes based on: • Keyword occurrence • Frequency of query negotiation Pros: + Robust, fast Cons: • Users must sort through pages of results when queries relate to physical distance and location • 44% of users are frustrated by search engines (Realname, 2000)

  5. Geosearcher • Location-based ranking system • Translates the search reference point into coordinates (long, lat) • Ranks search results in ascending order of distance • Figure: Geosearcher architecture

  6. Geosearcher architecture: Query • Presented by end-system users, e.g. skiing resort District of Columbia (Query: skiing resort; Reference point: District of Columbia) • Sample random URLs available (used for evaluation)

  7. Geosearcher architecture: Geocoding • Process of assigning latitude and longitude coordinates to the host of each site • Preliminary work (performed by the researchers): • Determine location • Create lookup table

  8. Geosearcher architecture: Geocoding a) Determining location from host URLs: DNS, country codes, Whois database - Map the location into coordinates, e.g. use the Getty Thesaurus (GS) + Containing state and area code for US, Canada + Other countries b) Lookup table - Country codes with coordinates www.about.com www.dartmouth.camathresource.com

  10. Example: Location Information • Whois database • Getty Thesaurus

  11. Geosearcher architecture: Geocoding, the process a) Check for coordinates in the host table b) If not found, send the domain to Whois - Returns the country code (CC) and area code on a match - If the CC is ca or us and an area code is present, look up the state or province name in the lookup table c) If there is no match, strip the domain down by one level (e.g. data.about.com to about.com) d) Unmatched names are checked in IPtoLL (host to lat/long conversion) - IPtoLL uses the administrative contact e) Store the results in the host table
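
The fallback chain on this slide can be sketched roughly as follows. This is a minimal illustration, not the Geosearcher implementation: `whois_lookup` and `ip_to_ll` are hypothetical callables standing in for the Whois query (including the country/area-code table lookup) and the IPtoLL service, and each is assumed to return a `(lat, lon)` pair or `None`.

```python
from typing import Callable, Dict, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)


def parent_domain(host: str) -> Optional[str]:
    """Strip one leading label (data.about.com -> about.com).

    Returns None once only two labels are left, so the loop below stops.
    """
    labels = host.split(".")
    if len(labels) <= 2:
        return None
    return ".".join(labels[1:])


def geocode_host(
    host: str,
    host_table: Dict[str, Coord],
    whois_lookup: Callable[[str], Optional[Coord]],  # hypothetical stand-in
    ip_to_ll: Callable[[str], Optional[Coord]],      # hypothetical stand-in
) -> Optional[Coord]:
    coord = None
    current: Optional[str] = host
    while current is not None and coord is None:
        coord = host_table.get(current)       # a) host table first
        if coord is None:
            coord = whois_lookup(current)     # b) then Whois
        if coord is None:
            current = parent_domain(current)  # c) strip one level and retry
    if coord is None:
        coord = ip_to_ll(host)                # d) last resort
    if coord is not None:
        host_table[host] = coord              # e) cache the result
    return coord
```

Caching under the original host name means a later query for the same host is answered by the first table lookup without touching the external services.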

  13. Distance and Ranking • The distance from the reference location is computed for each URL in the host table • Calculated using the haversine distance • Stored in the session host table • Results are ranked by distance (insertion sort)
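
As a rough sketch of this step, the haversine distance and an insertion-sort ranking might look like the code below. The Earth radius constant, the function names, and the `(url, (lat, lon))` input shape are assumptions made for illustration; the slides do not give these details.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_by_distance(results, reference):
    """Insertion sort of (url, (lat, lon)) pairs by ascending distance.

    Returns a list of (url, distance_km) tuples, nearest first.
    """
    ranked = []
    for url, (lat, lon) in results:
        d = haversine_km(reference[0], reference[1], lat, lon)
        i = len(ranked)
        # Walk left past every entry that is farther away than d.
        while i > 0 and ranked[i - 1][1] > d:
            i -= 1
        ranked.insert(i, (url, d))
    return ranked
```

Insertion sort suits this setting because results can be placed one at a time as their coordinates are resolved, and result lists are short.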

  14. Results • Unranked result: AltaVista • Using Geosearcher

  15. Results, contd. Validation of accuracy • Examined 100 results manually for location information • 90 websites assigned correctly • 78% of 83 URLs were accurately identified

  16. Results, contd. Algorithm effectiveness • Tested with 10 sets of 100 URLs using the Yahoo random link generator

  17. Personalized Mobile Search Engine Using Location and Content Concepts • Namrata G Kharate, ME-Computer-II, MCOERC, Nasik, India • Prof. S. A. Bhavsar, Assistant Prof., Computer Dept., MCOERC, Nasik, India • Publication: November 2013

  18. Search on Mobile Devices • Search queries on mobile devices: shorter, ambiguous • Search results: less accurate Solution • A system is needed that captures user preferences to return a personalized result ranking • Personalized Mobile Search Engine (PMSE)

  19. PMSE: System Architecture • RSVM: Ranking Support Vector Machine

  21. PMSE Client: • Receives user requests • Stores click-through data (location, content) • Submits requests to the server • Displays results • Profiles preferences in an ontology-based user profile Server: • Forwards requests to a commercial search engine • RSVM training • Search result reranking
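
To make the click-through and reranking steps more concrete, here is a small sketch under stated assumptions. Turning clicks into preference pairs ("a clicked result is preferred over unclicked results ranked above it") is a heuristic commonly used with ranking SVMs; the slides do not say that PMSE uses exactly this rule. The reranker assumes training has already produced a linear weight vector, and the feature vectors are hypothetical.

```python
def preference_pairs(ranked_ids, clicked):
    """Return (preferred, less_preferred) pairs from one result list.

    A clicked result is taken to be preferred over every unclicked
    result that was shown above it.
    """
    pairs = []
    for pos, doc in enumerate(ranked_ids):
        if doc in clicked:
            pairs.extend(
                (doc, above) for above in ranked_ids[:pos] if above not in clicked
            )
    return pairs


def rerank(features, weights):
    """Order result ids by descending linear score w . x.

    `features` maps a result id to its feature vector (for example,
    hypothetical content and location preference scores).
    """
    def score(doc):
        return sum(w * x for w, x in zip(weights, features[doc]))

    return sorted(features, key=score, reverse=True)
```

The pairs are the kind of training signal a ranking SVM consumes; the trained weights are then applied to the results returned by the commercial search engine.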

  22. Extraction of Address Data from Unstructured Text using Free Knowledge Resources • Sebastian Schmidt, schmidt@kom.tudarmstadt.De • Simon Manschitz, manschitz@stud.tudarmstadt.de • Ralf Steinmetz, steinmetz@kom.tudarmstadt.de • Christoph Rensing, rensing@kom.tudarmstadt.de • Multimedia Communications Lab, Technische Universität Darmstadt, Germany • Publication: November 2013

  23. Extraction of Address Data • Is of interest in various domains • Location-based services • Address repository, automatically created - Automatic harvesting of web addresses is not possible Solution • Identify business address data with a hybrid approach • Combine patterns and gazetteers

  24. Address Structure: Germany • Company name: no special pattern • Street: varies, e.g. Bürgermeister-Jung, Bgm.-Jung • Street number: digit sequence, e.g. 45a, 45-47 • Postal code: exactly 5 digits, reserved • Cities: Frankfurt, Ffm, Frankfurt/Main

  25. Address Data Identification Workflow

  26. Address Data Identification Preprocessing • Strip HTML markup, e.g. using the Beautiful Soup library • Cleaning: removing non-Unicode characters and white space between numbers • Line splitting and tokenizing using the Apache OpenNLP toolkit • Part-of-speech tagging using TreeTagger
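
The first three preprocessing steps can be approximated with the Python standard library alone. Note the substitutions: this sketch uses `html.parser` instead of Beautiful Soup and plain whitespace splitting instead of Apache OpenNLP, and it omits the cleaning rules and part-of-speech tagging. The set of block-level tags is an assumption chosen for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content and inserts newlines at block-level tags."""

    BLOCK_TAGS = {"br", "p", "div", "li", "tr", "h1", "h2", "h3"}  # assumed set

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_token_lines(html):
    """Strip markup, split into lines, and whitespace-tokenize each line.

    Empty lines are dropped; the result is a list of token lists.
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    lines = (line.split() for line in text.splitlines())
    return [tokens for tokens in lines if tokens]
```

Keeping line boundaries matters here because the later steps rely on the typical layout of an address, with one component per line.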

  28. Address Data Identification 1. Postal Codes • Token regular expression [0-9]{5} 2. Cities • Generated list based on OpenStreetMap accessed via Overpass-API (28,087 entries) • Known city found in the list • Preceded directly by postal code
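
The two rules above can be combined in a few lines. This is only a sketch: `KNOWN_CITIES` is a tiny hypothetical stand-in for the 28,087-entry list generated from OpenStreetMap, and the function name is invented for illustration.

```python
import re

POSTAL_CODE = re.compile(r"[0-9]{5}")  # pattern from the slide

# Hypothetical stand-in for the OpenStreetMap-derived city gazetteer.
KNOWN_CITIES = {"Darmstadt", "Frankfurt", "Ffm"}


def find_postal_city(tokens):
    """Return (postal_code, city) pairs where a known city directly follows
    a token that is exactly five digits."""
    hits = []
    for prev, tok in zip(tokens, tokens[1:]):
        if POSTAL_CODE.fullmatch(prev) and tok in KNOWN_CITIES:
            hits.append((prev, tok))
    return hits
```

Using `fullmatch` on the token, rather than searching inside it, keeps six-digit numbers and other longer digit runs from being accepted as postal codes.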

  29. Address Data Identification 3. Street Numbers • Use the regular expression ([0-9]{1,3})([a-zA-Z][0-9]?)?(([+|-])([0-9]{1,3})([a-zA-Z][0-9]?)?)? 4. Street Names • Generated list based on OpenStreetMap accessed via Overpass-API (300,000 entries) • Use street name endings, e.g. str
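
A quick way to see what the street-number pattern accepts is to apply it to whole tokens. The regular expression below is copied from the slide; the street-name check is a simplified assumption that only looks at endings, where "str" comes from the slide and "straße" and "strasse" are added for illustration. The gazetteer lookup is not shown.

```python
import re

# Pattern from the slide: 1-3 digits, optional letter suffix, optional range part.
STREET_NUMBER = re.compile(
    r"([0-9]{1,3})([a-zA-Z][0-9]?)?(([+|-])([0-9]{1,3})([a-zA-Z][0-9]?)?)?"
)

# "str" is from the slide; the other endings are assumptions for this sketch.
STREET_ENDINGS = ("str", "straße", "strasse")


def is_street_number(token):
    """True if the whole token matches the street-number pattern."""
    return STREET_NUMBER.fullmatch(token) is not None


def looks_like_street_name(token):
    """True if the token, ignoring case and trailing dots, has a street ending."""
    return token.lower().rstrip(".").endswith(STREET_ENDINGS)
```

Because the pattern allows at most three leading digits, a five-digit postal code such as 64283 is not mistaken for a street number when the whole token must match.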

  30. Address Data Identification 5. Company Name • Search for identical terms (Wikipedia): 29 terms, e.g. GmbH (private), AG (public) • Exploit the standard address structure
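
One simple reading of the term-based rule is sketched below: scan a tokenized line for a legal-form term and treat everything up to and including it as the company name. `LEGAL_FORMS` is a hypothetical subset of the 29 terms (only GmbH and AG appear on the slide), and the use of address structure, such as checking the neighbouring lines, is left out.

```python
# Hypothetical subset of the 29 legal-form terms; only GmbH and AG are from the slide.
LEGAL_FORMS = {"GmbH", "AG", "KG", "OHG", "UG"}


def company_name(tokens):
    """Return the tokens up to and including the first legal-form term,
    joined by spaces, or None if the line contains no such term."""
    for i, tok in enumerate(tokens):
        if tok.strip(",.") in LEGAL_FORMS:
            return " ".join(tokens[: i + 1])
    return None
```

For example, `company_name(["Muster", "Software", "GmbH"])` yields `"Muster Software GmbH"`, while a street line such as `["Hauptstr.", "45a"]` yields `None`.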

  31. Evaluation & Methodology • Sites with a legal note (1,576 websites) • Fraction of full addresses identified correctly: Rcorrect Address: 0.946, Rcompany: 0.82

  32. Conclusion Search engine ranking • Evaluation: the algorithm was accurate and effective • Efficiency: impacted by reliance on external databases Recommendation • Have a database of special resources to increase efficiency • Adaptation to other languages for address extraction

  33. Thank You! (Q&A)