310 likes | 325 Views
Geo/Spatial Search with MySQL. Alexander Rubin Senior Consultant, MySQL AB. Why Geo Search?. Stores: find locations new you Social networks: find friends close to you Online maps: find points of interest near your position Online newspapers/yellow pages: find show times next to you home.
E N D
Geo/Spatial Search with MySQL Alexander Rubin Senior Consultant, MySQL AB
Why Geo Search? • Stores: find locations new you • Social networks: find friends close to you • Online maps: find points of interest near your position • Online newspapers/yellow pages: find show times next to you home.
Common Tasks • Task: Find 10 nearby hotels and sort by distance • What do we have: • Given point on Earth: Latitude, Longitude • Hotels table: • Question: How to calculate distance between us and hotel?
Distance between 2 pointsThe Haversine Formula For two points on a sphere (of radius R) with latitudes φ1 and φ2, latitude separation Δφ = φ1 − φ2, and longitude separation Δλ the distance d between the two points:
The Haversine Formula in MySQL R = earth’s radius Δlat = lat2− lat1; Δlong = long2− long1a = sin²(Δlat/2) + cos(lat1) * cos(lat2) * sin²(Δlong/2)c = 2*atan2(√a, √(1−a)); d = R*c 3956 * 2 * ASIN ( SQRT ( POWER(SIN((orig.lat - dest.lat)*pi()/180 / 2), 2) + COS(orig.lat * pi()/180) * COS(dest.lat * pi()/180) * POWER(SIN((orig.lon - dest.lon) * pi()/180 / 2), 2) ) ) as distance angles need to be in radians
Lat can be negative! MySQL Query: Find Nearby Hotels set @orig_lat=122.4058; set @orig_lon=37.7907; set @dist=10; SELECT *,3956 * 2 * ASIN(SQRT( POWER(SIN((@orig_lat - abs(dest.lat)) * pi()/180 / 2), 2) + COS(@orig_lat * pi()/180 ) * COS(abs(dest.lat) * pi()/180) * POWER(SIN((@orig_lon – dest.lon) * pi()/180 / 2), 2) ))as distance FROM hotels dest having distance < @dist ORDER BY distance limit 10;
Find Nearby Hotels: Results +----------------+--------+-------+--------+ | hotel_name | lat | lon | dist | +----------------+--------+-------+--------+ | Hotel Astori.. | 122.41 | 37.79 | 0.0054 | | Juliana Hote.. | 122.41 | 37.79 | 0.0069 | | Orchard Gard.. | 122.41 | 37.79 | 0.0345 | | Orchard Gard.. | 122.41 | 37.79 | 0.0345 | ... +----------------+--------+-------+--------+ 10 rows in set (4.10 sec) • 4 seconds - very slow for web query!
MySQL Explain query Mysql> Explain … select_type: SIMPLE table: dest type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 1787219 Extra: Using filesort 1 row in set (0.00 sec)
10 Miles How to speed up the query • We only need hotels in 10 miles radius • no need to scan the whole table
How to calculate needed coordinates • 1° of latitude ~= 69 miles • 1° of longitude ~= cos(latitude)*69 • To calculate lon and lat for the rectangle: set lon1 = mylon-dist/abs(cos(radians(mylat))*69); set lon2 = mylon+dist/abs(cos(radians(mylat))*69); set lat1 = mylat-(dist/69); set lat2 = mylat+(dist/69);
Modify the query SELECT destination.*, 3956 * 2 * ASIN(SQRT( POWER(SIN((orig.lat - dest.lat) * pi()/180 / 2), 2) + COS(orig.lat * pi()/180) * COS(dest.lat * pi()/180) * POWER(SIN((orig.lon -dest.lon) * pi()/180 / 2), 2) )) as distance FROM users destination, users origin WHERE origin.id=userid and destination.longitude between lon1 and lon2 and destination.latitude between lat1 and lat2
Stored procedure CREATE PROCEDURE geodist (IN userid int, IN dist int) BEGIN declare mylon double; declare mylat double; declare lon1 float; declare lon2 float; declare lat1 float; declare lat2 float; -- get the original lon and lat for the userid select longitude, latitude into mylon, mylat from users5 where id=userid limit 1; -- calculate lon and lat for the rectangle: set lon1 = mylon-dist/abs(cos(radians(mylat))*69); set lon2 = mylon+dist/abs(cos(radians(mylat))*69); set lat1 = mylat-(dist/69); set lat2 = mylat+(dist/69);
Stored Procedure, Contd -- run the query: SELECT destination.*, 3956 * 2 * ASIN(SQRT( POWER(SIN((orig.lat - dest.lat) * pi()/180 / 2), 2) + COS(orig.lat * pi()/180) * COS(dest.lat * pi()/180) * POWER(SIN((orig.lon -dest.lon) * pi()/180 / 2), 2) )) as distance FROM users destination, users origin WHERE origin.id=userid and destination.longitude between lon1 and lon2 and destination.latitude between lat1 and lat2 having distance < dist ORDER BY Distance limit 10; END $$
Speed comparison • Test data: US and Canada zip code table, 800K records • Original query (full table scan): • 8 seconds • Optimized query (stored procedure): • 0.06 to 1.2 seconds (depending upon the number of POIs/records in the given radius)
Stored Procedure: Explain Plan Mysql>CALL geodist(946842, 10)\G table: origin type: const key: PRIMARY key_len: 4 ref: const rows: 1, Extra: Using filesort table: destination type: range key: latitude key_len: 18 ref: NULL rows: 25877, Extra: Using where
Geo Search with Sphinx • Sphinx search (www.sphinxsearch.com) since 0.9.8 can perform geo distance searches • It is possible to setup an "anchor point" in the api code and then use the "geodist" function and specify the radius. • Sphinx Search returns in 0.55 seconds for test data regardless of the radius and zip $ php test.php -i zipdist -s @geodist,asc Query '' retrieved 1000 matches in 0.552 sec.
Different Type of Coordinates • Decimal Degrees (what we used) • 37.3248 LAT, 121.9163 LON • Degrees-minutes-second (used in most GPSes) • 37°19′29″N LAT, 121°54′59″E LON • Most GPSes can be configured to use Decimal Degrees • Other
Converting between coordinates • Degrees-Minutes-Seconds to Decimal Degrees: • degrees + (minutes/60) + (seconds/3600) CREATE FUNCTION `convert_from_dms` (degrees INT, minutes int, seconds int) RETURNS double DETERMINISTIC BEGIN RETURN degrees + (minutes/60) + (seconds/3600); END $$ mysql>select convert_from_dms (46, 20, 10) as DMS\G dms: 46.33611111
Geo Search with Full Text search • Sometimes we need BOTH geo search and full text search • Example 1: find 10 nearest POIs, with “school” in the name • Example 2: find nearest streets, name contains “OAK” • Create FullText index and index on LAT, LON • Alter table geonames add fulltext key (name); • MySQL will choose which index to use
Geo Search with Full Text search: example • Grab POI data from www.geonames.org, upload it to MySQL, add full text index Mysql> SELECT destination.*, 3956 * 2 * ASIN(SQRT(POWER(SIN((orig.lat - dest.lat) * pi()/180 / 2), 2) + COS(orig.lat * pi()/180) * COS(dest.lat * pi()/180) * POWER(SIN((orig.lon -dest.lon) * pi()/180 / 2), 2) )) as distance FROM geonames destination WHERE match(name) against (‘OAK’ in boolean mode) having distance < dist ORDER BY Distance limit 10;
Geo Search with Full Text search: Explain mysql> explain SELECT destination.*, 3956 * 2 * ASIN(SQRT(POWER(SIN(… table: destination type: fulltext possible_keys: name_fulltext key: name_fulltext key_len: 0 ref: rows: 1 Extra: Using where; Using filesort
DEMO DEMO: Find POI near us • Use GPS • All POIs near GPS point • Match keyword
Using MySQL Spatial Extension CREATE TABLE `zipcode_spatial` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `zipcode` char(7) NOT NULL, … `lon` int(11) DEFAULT NULL, `lat` int(11) DEFAULT NULL, `loc` point NOT NULL, PRIMARY KEY (`id`), KEY `zipcode` (`zipcode`), SPATIAL KEY `loc` (`loc`) ) ENGINE=MyISAM;
Zipcode with Spatial Extension mysql> select zipcode, lat, lon, AsText(loc) from zipcode_spatial where city_name = 'Santa Clara' and state ='CA' limit 1\G ****** 1. row******** zipcode: 95050 lat: 373519 lon: 1219520 AsText(loc): POINT(1219520 373519)
Spatial Search: Distance Spatial Extension: no built-in distance function CREATE FUNCTION `distance` (a POINT, b POINT) RETURNS double DETERMINISTIC BEGIN RETURN round(glength(linestringfromwkb (linestring(asbinary(a), asbinary(b))))); END $$ (forge.mysql.com/tools/tool.php?id=41)
Spatial Search Example SELECT DISTINCT dest.zipcode, distance(orig.loc, dest.loc) as sdistance FROM zipcode_spatial orig, zipcode_spatial dest WHERE orig.zipcode = '27712' having sdistance < 10 ORDER BY sdistance limit 10;