Making the Sky Searchable: Automatically Organizing the World’s Astronomical Data. Sam Roweis, Dustin Lang & Keir Mierle University of Toronto David Hogg & Michael Blanton New York University. Organize All Astronomical Data.
Making the Sky Searchable: Automatically Organizing the World’s Astronomical Data. Sam Roweis, Dustin Lang & Keir Mierle, University of Toronto. David Hogg & Michael Blanton, New York University. roweis@cs.toronto.edu
Organize All Astronomical Data • Vision: Take every astronomical image ever taken in the history of the world and unify them into a single accurately annotated and easily searchable database. • We want to include all modern professional telescope surveys plus all amateur photos, satellite images, historical plate archives… • How could this ever be possible?
Core Technology: Starfield Search • You show me a picture of the night sky. • I tell you where on the sky it came from.
Rules of the game • We start with a catalogue of stars in the sky, and from it build an index which is used to assist us in locating (‘solving’) new test images. • We can spend as much time as we want building the index, but solving should be fast. • Challenges: 1) The sky is big. 2) Both catalogues and pictures are noisy.
Distractors and Dropouts • Bad news: Query images may contain some extra stars that are not in your index catalogue, and some catalogue stars may be missing from the image. • These “distractors” & “dropouts” mean that naïve matching techniques will not work.
Example (a million times easier) • Find this “field” on this “sky”.
Robust Matching Required • We need to do some sort of robust matching of the test image to any proposed location on the sky. • Intuitively, we need to ask: “Is there an alignment of the test image and the catalogue so that surprisingly many catalogue stars lie (almost*) exactly on top of an observed star?” [*Depends on the noise levels in the catalogue & image.]
Search, then Robustly Verify • For each potential match our search algorithm finds, we do a robust verification test to see if we really have the correct alignment on the sky. • This test counts how many catalogue objects match an object in the image and computes the log-odds of getting that many “hits” if the query image were dropped on a random patch of sky.
Solving the search problem • Now we know how to robustly check if a match is correct. But we still have to solve a huge search problem. • Which match locations should we try to verify? • Exhaustive search? Too expensive! The Sky is Big™
(Inverted) Index of Features • To solve this problem, we employ the classic idea of an “inverted index”. • We define a set of “features” for any particular view of the sky (image). • Then we make an (inverted) index, telling us which views on the sky exhibit certain (combinations of) feature values. • When we see a new test image, we compute which features are present, and use our inverted index to look up which possible views from the catalogue also have those feature values.
Matching a test image • When we see a new test image, we compute which features are present, and use our inverted index to look up which possible views from the catalogue also have those feature values. • Each feature generates a candidate list in this way, and by intersecting the lists we can zero in on the true matching view. • The features in our inverted index act as “hash codes” for locations on the sky.
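The inverted-index lookup and list intersection described above can be sketched in a few lines. This toy version assumes discrete, exactly matching feature values; the view names and feature labels are made up for illustration.

```python
from collections import defaultdict

def build_inverted_index(views):
    """views maps a view id to the set of feature values seen in that view.
    The result maps each feature value to the set of views exhibiting it."""
    index = defaultdict(set)
    for view_id, features in views.items():
        for feature in features:
            index[feature].add(view_id)
    return index

def candidate_views(index, query_features):
    """Intersect the candidate lists generated by each query feature."""
    lists = [index.get(feature, set()) for feature in query_features]
    if not lists:
        return set()
    return set.intersection(*lists)

# Hypothetical catalogue views and their (discrete) features.
views = {
    "north": {"a", "b", "c"},
    "south": {"b", "c", "d"},
    "east": {"a", "d"},
}
index = build_inverted_index(views)
matches = candidate_views(index, ["a", "b"])
```

A strict intersection like this is only safe when every query feature is genuine: a single feature that is absent from the index empties the result, which is one reason exact matching is too brittle for noisy star fields.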
Robust Features for Geometric Hashing • In simple search domains like text, the inverted index idea can be applied directly. • However, in our star matching task, the features we choose must be invariant to scale, rotation and translation. • They must also be robust to small positional noise. • Finally, there is the additional problem of distractor & dropout stars. • The features we use are the relative positions of nearby quadruples of stars.
Quads as Robust Features • We encode the relative positions of nearby quadruples of stars (ABCD) using a coordinate system defined by the most widely separated pair (AB). • Within this coordinate system, the positions of the remaining two stars form a 4-dimensional code for the shape of the quad. • Swapping AB or CD does not change the shape but it does “reflect” the code, so there is some degeneracy. [Figure: a quad of stars labelled A, B, C, D. Continuous, vector-valued hash codes!]
Quads as Robust Features • This geometric hash code is invariant to scale, translation and rotation. It is continuous! • It also has the property that if stars are uniformly distributed in space, codes are uniformly distributed in 4D. • We compute codes for most nearby quadruples of stars, but not all; we require C & D to lie inside the circle that has AB as its diameter. [Figure: a quad of stars labelled A, B, C, D. Continuous, vector-valued hash codes!]
“Solving” a new test image • Identify objects (stars + galaxies) in the image bitmap and create a list of their 2D positions. • Cycle through all possible valid* quads (brightest first) and compute their corresponding codes. • Look up the codes in the code KD-tree to find matches within some tolerance; this stage incurs some false positive and false negative matches. • Each code match returns a candidate position & rotation on the sky. As soon as 2 quads agree on a candidate, we proceed to verify that candidate against all objects in the image.
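The lookup-and-agree loop above can be sketched as follows, with two loud simplifications. A linear scan stands in for the KD-tree range query, and each candidate is reduced to a discrete label such as `"field_17"` rather than a continuous position and rotation, so "agreement" becomes an exact label match. All names and numbers are hypothetical.

```python
import math
from collections import Counter

def matches_within(index_codes, query_code, tol):
    """Linear scan standing in for a KD-tree range query:
    return the location of every indexed code within tol of query_code."""
    return [loc for code, loc in index_codes
            if math.dist(code, query_code) <= tol]

def first_agreed_candidate(index_codes, query_codes, tol, votes_needed=2):
    """Walk the query codes in order (assumed brightest quads first) and
    return the first location proposed by votes_needed different quads."""
    votes = Counter()
    for query_code in query_codes:
        # set() so that one quad contributes at most one vote per location.
        for loc in set(matches_within(index_codes, query_code, tol)):
            votes[loc] += 1
            if votes[loc] >= votes_needed:
                return loc
    return None

# Toy index of (4-D code, location label) pairs.
index_codes = [
    ((0.25, 0.50, 0.75, 0.25), "field_17"),
    ((0.10, 0.90, 0.60, 0.40), "field_17"),
    ((0.26, 0.50, 0.75, 0.25), "field_42"),   # near-duplicate code elsewhere
    ((0.80, 0.20, 0.30, 0.30), "field_99"),
]
query_codes = [(0.251, 0.50, 0.75, 0.25), (0.10, 0.90, 0.61, 0.40)]
candidate = first_agreed_candidate(index_codes, query_codes, tol=0.02)
```

The first query code matches both `field_17` and `field_42` within tolerance, which illustrates the false positives the slide mentions; only the second quad's vote singles out one candidate, which would then go on to verification.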
“Solving” is Easy as 1-2-3 • 1) Get your image. • 2) Upload it to us. • 3) Get back your exact location + lots of other data!
Astronomy Picture of the Day
Preliminary Scaleup Results: SDSS • The Sloan Digital Sky Survey (SDSS) is an all-sky, multi-band survey which includes targeted spectroscopy of interesting objects. • The telescope is located at Apache Point Observatory. • Fields are 14×9 arcmin, corresponding to 2048×1361 pixels.
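The field dimensions quoted above imply a pixel scale of roughly 0.4 arcsec per pixel. A short worked calculation, using only the numbers on the slide (the helper name is hypothetical):

```python
def pixel_scale_arcsec(extent_arcmin, pixels):
    """Angular size of one pixel in arcseconds (1 arcmin = 60 arcsec)."""
    return extent_arcmin * 60.0 / pixels

scale_long = pixel_scale_arcsec(14, 2048)   # long axis of the field
scale_short = pixel_scale_arcsec(9, 1361)   # short axis of the field
```

The two axes give slightly different values (about 0.41 and 0.40 arcsec per pixel), which presumably reflects rounding in the quoted 14×9 arcmin field size rather than non-square pixels.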
Preliminary Scaleup Results: SDSS • 336,554 fields (science grade+) • 0 false positives • 99.84% solved (530 unsolved) • 99.27% solved using only the 60 brightest objects • Pixel scale assumed known (for speedup of solving only). • Magnitudes used only to decide search order.
Speed/Memory/Disk • Indexing takes ~12 hours, uses ~2 GB of memory and ~100 GB of disk. • Solving a test image almost always takes <<1 sec (not including object detection). • Results on all of SDSS: all the work is in the hardest few % of fields.
astrometry.net is open source! • We have released all our code. Download it from astrometry.net if you want to try the system out yourself. • We are putting the engine on the web. Email alpha@astrometry.net if you want to be an alpha tester for the web service. • Our internal trac pages are public. Check out trac.astrometry.net if you want to see all the gory details.