To the Problem of Organizing Heterogeneous Information

Olga Zhelenkova (1,2), Vladimir Vitkovskij (1,2). (1) SAO RAS (Nizhnij Arkhyz), (2) ITMO University (Saint-Petersburg).

Presentation Transcript


  1. To the Problem of Organizing Heterogeneous Information
     Olga Zhelenkova (1,2), Vladimir Vitkovskij (1,2)
     (1) SAO RAS (Nizhnij Arkhyz), (2) ITMO University (Saint-Petersburg)
     Big Data Across Disciplines: In Search of Symbiosis. 3-5 November, 2014. Groningen, Netherlands

  2. The science use case: a multi-band study of a sample of radio sources (I)
     A series of blind surveys of a 20′ sky strip centered on δ1981 = +04°57′ ± 20′ (SS433) was carried out with the RATAN-600 radio telescope in 1980-1999 at 3.9 GHz.
     (1) The RC (RATAN COLD) catalogue was obtained from observations of the deep survey COLD in 1980 (a, b). The steep-spectrum RC sample has been studied since the early 1990s (c, d).
     (2) The refined RC (RCR) catalogue was obtained from the blind survey observations of 1980-1999 (e). 562 RCR radio sources lie in the range α2000 = [07h, 17h] (~100 deg²), which intersects the SDSS and FIRST surveys; the catalogue is 90% complete at S3.9GHz > 15 mJy (S1.4GHz > 28 mJy) for αmean ~ 0.52 (Sν ∝ ν^-α). The sources are almost completely identified (96%), 260 of them for the first time (f).
     a - Parijskij et al., 1992A&AS...96..583P; b - Parijskij et al., 1993A&AS...98..391P; c - Goss et al., 1992AZh....69..673G; d - Parijskij et al., 2010ARep...54..675P; e - Soboleva et al., 2010AstBu..65...42S; f - Zhelenkova et al., 2013AstBu..68...26Z.
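
The slide quotes a mean spectral index and flux limits at two frequencies under the convention Sν ∝ ν^-α. A minimal sketch of the two-point spectral index, α = -ln(S1/S2) / ln(ν1/ν2), and of power-law extrapolation between frequencies follows; the function names are illustrative, the values in the usage lines are synthetic (not taken from the RCR catalogue), and the cited papers may derive α differently.

```python
import math


def spectral_index(s1: float, nu1: float, s2: float, nu2: float) -> float:
    """Two-point spectral index alpha for the convention S_nu ∝ nu**(-alpha).

    Flux densities s1, s2 must share units (e.g. mJy); so must nu1, nu2 (e.g. GHz).
    """
    return -math.log(s1 / s2) / math.log(nu1 / nu2)


def extrapolate_flux(s_ref: float, nu_ref: float, nu: float, alpha: float) -> float:
    """Flux density at frequency nu from a power law anchored at (nu_ref, s_ref)."""
    return s_ref * (nu / nu_ref) ** (-alpha)


# Usage with synthetic values: a source with alpha = 0.7 and 10 mJy at 3.9 GHz.
s_39 = 10.0
s_14 = extrapolate_flux(s_39, 3.9, 1.4, 0.7)
print(round(spectral_index(s_14, 1.4, s_39, 3.9), 3))  # 0.7
```

Because the flux limit at one frequency is translated to the other through the same power law, the quoted limit depends directly on the adopted α.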

  3. The science use case: a multi-band study of a sample of radio sources (II)
     • collect all freely available data for the optical identification and investigation of the RCR sample;
     • collect, visualize and statistically analyze the data with VO tools: ALADIN (1), TOPCAT (2), VizieR (3), NED (4), ds9 (5), CasJobs (6), SkyView (7);
     • organize the collected data (PostgreSQL + web interface) for further study (8).
     (1) Bonnarel et al., 2000A&AS..143...33B; (2) Taylor, 2005ASPC...347..29; (3) Ochsenbein et al., 2000A&AS..143...23O; (4) Mazzarella et al., 2007ASPC..376..153M; (5) Joye & Mandel, 2003ASPC..295..489J; (6) O’Mullane et al., 2005cs........2072O; (7) McGlynn, 2007ASPC..382...43M; (8) http://www.sao.ru/fetch/cgi-bin/SkyObj/rcrn.cgi
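
The slide mentions a PostgreSQL database with a web interface but not its schema. As a hypothetical sketch, one way to hold heterogeneous catalogue parameters is a source table plus a long "entity-attribute-value" table, so that adding a survey adds rows rather than columns. Python's built-in sqlite3 stands in for PostgreSQL here; all table and column names are assumptions.

```python
import sqlite3

# Hypothetical schema: one row per source, one row per (source, catalogue, parameter).
SCHEMA = """
CREATE TABLE source (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE,
    ra_deg  REAL NOT NULL,
    dec_deg REAL NOT NULL
);
CREATE TABLE measurement (
    source_id INTEGER NOT NULL REFERENCES source(id),
    catalogue TEXT NOT NULL,      -- e.g. 'NVSS', 'SDSS DR10'
    parameter TEXT NOT NULL,      -- e.g. 'S1.4GHz'
    value     REAL,
    unit      TEXT,
    PRIMARY KEY (source_id, catalogue, parameter)
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_measurement(conn, source_id, catalogue, parameter, value, unit=None):
    # INSERT OR REPLACE keeps only the latest value when a catalogue release is updated.
    conn.execute(
        "INSERT OR REPLACE INTO measurement VALUES (?, ?, ?, ?, ?)",
        (source_id, catalogue, parameter, value, unit),
    )


def parameters_of(conn, source_id):
    """All stored parameters of one source, grouped by catalogue."""
    rows = conn.execute(
        "SELECT catalogue, parameter, value, unit FROM measurement "
        "WHERE source_id = ? ORDER BY catalogue, parameter",
        (source_id,),
    )
    result = {}
    for catalogue, parameter, value, unit in rows:
        result.setdefault(catalogue, {})[parameter] = (value, unit)
    return result


# Usage with placeholder values (not real measurements):
conn = connect()
conn.execute("INSERT INTO source VALUES (1, 'SRC-1', 150.0, 5.0)")
add_measurement(conn, 1, "NVSS", "S1.4GHz", 30.0, "mJy")
add_measurement(conn, 1, "WISE", "W1", 15.2, "mag")
print(parameters_of(conn, 1))
```

In PostgreSQL the same upsert would be written as `INSERT ... ON CONFLICT (source_id, catalogue, parameter) DO UPDATE SET ...`. The long table trades query convenience for flexibility; one wide table per catalogue is the usual alternative.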

  4. The science use case: a multi-band study of a sample of radio sources (III)

  5. The science use case: a multi-band study of a sample of radio sources (IV)

  6. The science use case: a multi-band study of a sample of radio sources (V)

  7. The science use case: problems of handling many parameters and images
     • 1st stage: VLSS, NVSS, FIRST, GB6 and DSS (USNO-B1, GSC 2.3), SDSS DR1, 2MASS, and also NED: 9 catalogues (~110 parameters) and images from 7 digital surveys (12 maps, contour overlays);
     • 2nd stage: added UKIDSS LAS and used a newer SDSS release: 10 catalogues (~130 parameters) and images from 8 digital surveys (16 maps, contour overlays);
     • 3rd stage: added WISE and used newer SDSS and UKIDSS LAS releases: 11 catalogues (~150 parameters) and images from 9 digital surveys (18 maps, contour overlays);
     • 4th stage: added Planck and used SDSS DR10 and UKIDSS LAS DR9: 12 catalogues (>150 parameters) and images from 10 digital surveys (28 maps, contour overlays).
     Results: the RCR sources are almost completely identified (96%), ~45% of them for the first time.
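
Each stage adds catalogues whose entries have to be matched to the radio positions. The slides do not say how the matching was done; as a minimal sketch, a nearest-neighbour positional cross-match within a fixed radius can be written with the haversine formula for angular separation. The radius and the brute-force loop are assumptions chosen for illustration.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions in degrees.

    Uses the haversine formula, which stays accurate for arcsecond-scale separations.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def crossmatch(sources, catalogue, radius_arcsec):
    """Nearest catalogue entry within radius_arcsec for each source.

    sources, catalogue: dicts mapping an identifier to (ra_deg, dec_deg).
    Returns {source_id: (catalogue_id, separation_arcsec)} for matched sources only.
    Brute force O(N*M): adequate for a few hundred sources, not for full surveys.
    """
    matches = {}
    for sid, (ra, dec) in sources.items():
        best = None
        for cid, (cra, cdec) in catalogue.items():
            sep = angular_separation(ra, dec, cra, cdec) * 3600.0
            if sep <= radius_arcsec and (best is None or sep < best[1]):
                best = (cid, sep)
        if best is not None:
            matches[sid] = best
    return matches


# Usage with made-up positions:
radio = {"R1": (150.0, 5.0), "R2": (160.0, 4.5)}
optical = {"O1": (150.0 + 1.0 / 3600, 5.0), "O2": (150.0, 5.0 + 10.0 / 3600)}
print(crossmatch(radio, optical, radius_arcsec=3.0))
```

Large catalogues need a spatial index instead of the double loop, and a real identification would also weigh positional errors and source morphology.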

  8. The science use case: what we need
     • easy access to data (request and download): ++
     • visualization of different types of data: +
     • keeping the collected data up to date: ?
     • easy manipulation of the collected data: ?
     • interchange and publication of new knowledge about objects: ?
     • storing different data and knowledge about an object together: ?
     Thanks to the efforts of the International Virtual Observatory Alliance, we now have excellent tools providing web services for data access and visualization, such as ALADIN, SAOImage DS9, TOPCAT, VizieR, NED and so on. The other problems, however, need further work.

  9. Available projects: keeping the collected data up to date
     VO Data Keeping-up Agent (VOdka) is a web application supporting users’ data [O. Laurino & S. Smareglia, ASP 442, 571 (2011)]. It:
     • lets users be notified asynchronously when new data become available;
     • gives users a quick look at what data relevant to their research interests can be found in the Virtual Observatory;
     • makes users’ queries and results persistent.
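
VOdka's own implementation is not described on the slide. As a hypothetical sketch of the idea of persistent queries with notification, a stored query can remember the row identifiers it returned before, and re-running it then reveals new data as a set difference. `run_query` stands in for any service call (for example a VizieR request) and is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class PersistentQuery:
    """A saved query plus the identifiers of rows returned by earlier runs."""
    text: str                                # textual form of the query, kept for re-running
    seen: set = field(default_factory=set)   # row identifiers already reported

    def refresh(self, run_query: Callable[[str], Iterable[str]]) -> set:
        """Re-run the query and return the identifiers not seen before."""
        current = set(run_query(self.text))
        new = current - self.seen
        self.seen |= current
        return new


def notify(query: PersistentQuery, new_ids: set) -> None:
    # Placeholder for a real asynchronous channel (e-mail, message queue, ...).
    if new_ids:
        print(f"{len(new_ids)} new row(s) for {query.text!r}: {sorted(new_ids)}")


# Usage with a fake service whose result set grows between two runs:
releases = [["a", "b"], ["a", "b", "c"]]


def fake_service(text: str) -> list:
    return releases.pop(0)


q = PersistentQuery("hypothetical query text")
notify(q, q.refresh(fake_service))   # first run: 'a' and 'b' are new
notify(q, q.refresh(fake_service))   # second run: only 'c' is new
```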

  10. Available projects: interchange and publication of new knowledge about objects through annotations
     • AstroDAS (Bose et al. 2006IPAW..1445..154B) annotates astronomy catalogues so that astronomers can share their assertions about matching celestial objects.
     • AstroDAbis (Gray N. et al., arXiv:1111.6116, http://astrodabis.roe.ac.uk) provides a stand-off annotation service for astronomical catalogue entries; it implicitly creates URI names for every object in the catalogues.
     • SKUA (Semantic Knowledge Underpinning Astronomy; N. Gray & T. Linde, ASP, 2009, https://code.google.com/p/skua/) is a web application providing a semantic infrastructure for astronomy based on the organisation of annotation services.
     • ADSASS (ADS All-Sky Survey; Pepe A. et al., arXiv:1111.3983) is an ongoing effort aimed at turning the NASA Astrophysics Data System (ADS) into a data resource based on the ideas of geographic information systems.
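
Stand-off annotation means that annotations live outside the catalogue and point at its entries by identifier. The sketch below is hypothetical and does not reproduce the AstroDAbis or SKUA interfaces: it mints a URI for each catalogue row under an assumed base (`http://example.org/catalogue`) and keeps annotations in a separate store keyed by those URIs, so the catalogue itself is never modified.

```python
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import quote

BASE = "http://example.org/catalogue"  # hypothetical namespace for object URIs


def object_uri(catalogue: str, row_id: str) -> str:
    """Mint a stable URI for one catalogue entry (the naming scheme is an assumption)."""
    return f"{BASE}/{quote(catalogue, safe='')}/{quote(row_id, safe='')}"


@dataclass(frozen=True)
class Annotation:
    author: str
    predicate: str   # e.g. 'sameAs' to assert that two entries are one object
    value: str       # free text or another object's URI


class AnnotationStore:
    """Annotations kept apart from the catalogues and looked up by object URI."""

    def __init__(self):
        self._by_uri = defaultdict(list)

    def annotate(self, uri: str, annotation: Annotation) -> None:
        self._by_uri[uri].append(annotation)

    def annotations_for(self, uri: str) -> list:
        return list(self._by_uri.get(uri, []))


# Usage: assert that two hypothetical entries from different catalogues are one object.
store = AnnotationStore()
radio = object_uri("RCR", "source-1")
optical = object_uri("SDSS DR10", "object-42")
store.annotate(radio, Annotation("observer1", "sameAs", optical))
print(store.annotations_for(radio))
```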

  11. Available formats for storing different data about an object together: FITS
     FITS is a simple, easily understood, self-describing format that holds its information in metadata and data blocks. Metadata are captured as key-value pairs in headers, and a header may or may not be followed by a data block. The first header is called the “primary” header, and subsequent headers are known as “extensions”. The standard defines rules for developing new data structures as extensions (Pence et al., A&A 524, A42 (2010)).
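
To make the key-value structure concrete, here is a minimal standard-library sketch of how a FITS header is laid out: fixed 80-character cards (keyword in columns 1-8, "= " in columns 9-10, numbers and logicals right-justified to column 30, strings opening with a quote in column 11), an END card, and blank padding to a multiple of 2880 bytes. It handles only simple values without comments; long strings and CONTINUE cards are out of scope, and in practice a library such as astropy.io.fits would be used.

```python
BLOCK = 2880  # FITS files are organized in 2880-byte blocks
CARD = 80     # each header card (keyword record) is 80 ASCII characters


def format_card(keyword: str, value) -> str:
    """Format one fixed-format 'KEYWORD = value' card without a comment field."""
    key = keyword.upper().ljust(8)[:8]           # columns 1-8
    if isinstance(value, bool):                  # test bool first: bool is an int subclass
        val = ("T" if value else "F").rjust(20)  # logical value in column 30
    elif isinstance(value, (int, float)):
        val = repr(value).upper().rjust(20)      # numbers right-justified to column 30
    else:
        text = str(value).replace("'", "''")     # quotes inside strings are doubled
        val = "'" + text.ljust(8) + "'"          # string starts in column 11, >= 8 chars
    card = key + "= " + val
    if len(card) > CARD:
        raise ValueError(f"value too long for one card: {keyword}")
    return card.ljust(CARD)


def build_header(cards) -> bytes:
    """Concatenate cards, append END and pad with blanks to whole 2880-byte blocks."""
    text = "".join(format_card(k, v) for k, v in cards) + "END".ljust(CARD)
    n_blocks = -(-len(text) // BLOCK)            # ceiling division
    return text.ljust(n_blocks * BLOCK).encode("ascii")


# A minimal primary header (no data array) announcing that extensions may follow.
header = build_header([
    ("SIMPLE", True),
    ("BITPIX", 8),
    ("NAXIS", 0),
    ("EXTEND", True),
    ("OBJECT", "SS433"),
])
for i in range(0, 6 * CARD, CARD):
    print(header[i:i + CARD].decode().rstrip())
```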

  12. Available formats for storing different data about an object together: VOTable
     VOTable is designed as a flexible storage and exchange format for tabular data. Its interoperability is encouraged through the use of XML. VOTable has built-in features for big data and Grid computing: it allows metadata and data to be stored separately, with the remote data linked (VOTable Format Definition V.1.093, http://cdsweb.u-strasbg.fr/doc/VOTable/1.092/votable.htx).
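
A minimal sketch of the two storage modes described above, built with Python's xml.etree.ElementTree: the table metadata (FIELD elements) are always written inline, while the data are either embedded as TABLEDATA or left remote and referenced through a STREAM href. It targets the VOTable 1.3 namespace rather than the draft version cited on the slide; the field names, values and URL are placeholders.

```python
import xml.etree.ElementTree as ET

VOT_NS = "http://www.ivoa.net/xml/VOTable/v1.3"


def build_votable(name, fields, rows=None, stream_href=None) -> str:
    """Return a minimal VOTable document as a string.

    fields: list of (name, datatype, unit) describing the columns (the metadata).
    rows: values written inline as TABLEDATA, used when stream_href is None.
    stream_href: URL of remotely stored binary data; only the metadata stay local.
    """
    root = ET.Element("VOTABLE", {"version": "1.3", "xmlns": VOT_NS})
    table = ET.SubElement(ET.SubElement(root, "RESOURCE"), "TABLE", {"name": name})
    for fname, datatype, unit in fields:
        attrs = {"name": fname, "datatype": datatype}
        if datatype == "char":
            attrs["arraysize"] = "*"   # variable-length string column
        if unit:
            attrs["unit"] = unit
        ET.SubElement(table, "FIELD", attrs)
    data = ET.SubElement(table, "DATA")
    if stream_href is not None:
        # Data kept elsewhere: the document only links to them.
        ET.SubElement(ET.SubElement(data, "BINARY"), "STREAM", {"href": stream_href})
    else:
        tabledata = ET.SubElement(data, "TABLEDATA")
        for row in rows or []:
            tr = ET.SubElement(tabledata, "TR")
            for value in row:
                ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Usage with placeholder columns and values (not real measurements):
fields = [("id", "char", None), ("RAJ2000", "double", "deg"),
          ("DEJ2000", "double", "deg"), ("S1.4GHz", "double", "mJy")]
inline = build_votable("sample", fields, rows=[("src1", 150.0, 5.0, 30.0)])
remote = build_votable("sample", fields, stream_href="http://example.org/sample.bin")
print(inline)
```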

  13. Summary
     Astronomy is very good at sharing data freely, but poorer at sharing knowledge. The fundamental problem remains: data and knowledge are stored in different places. Archives contain only basic observational data, whereas all the astrophysical interpretation of those data is contained in journal papers. The next step, which may make discovery and research more effective, is to keep together all the data collected about the object(s) of a researcher’s interest, along with annotations and textual representations of the queries (so that update requests can be repeated).
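
As a hypothetical illustration of this proposal (not an existing tool), an object record could bundle the collected data, free-text annotations and the text of every query, so that the queries can later be repeated to refresh the data. `fetch` stands for whatever service call executes a query and is an assumption, as are all class and field names.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Plane:
    """One piece of collected data (image or table) plus the query that produced it."""
    label: str
    kind: str      # 'image' or 'table'
    query: str     # textual representation of the request, kept for repeating it
    data: Any = None


@dataclass
class ObjectRecord:
    """Data, annotations and queries about one object kept together (hypothetical)."""
    name: str
    planes: list = field(default_factory=list)
    annotations: list = field(default_factory=list)

    def add(self, plane: Plane) -> None:
        self.planes.append(plane)

    def annotate(self, text: str) -> None:
        self.annotations.append(text)

    def refresh(self, fetch: Callable[[str], Any]) -> None:
        """Repeat every stored query and replace each plane's data with the new result."""
        for plane in self.planes:
            plane.data = fetch(plane.query)


# Usage with a fake fetch function that just echoes the query:
record = ObjectRecord("object-1")
record.add(Plane("radio map", "image", "hypothetical image request"))
record.add(Plane("photometry", "table", "hypothetical table request"))
record.annotate("free-text remark")
record.refresh(lambda q: f"result of {q}")
print([p.data for p in record.planes])
```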

  14. ALADIN stack as a new FITS extension (or a VOTable or HDF5 variant)
     The internal format of ALADIN is called a stack. It is a flat, XML-like file that represents all the information collected about an object (images and tables) as planes, together with their descriptions and the results of the requests. This data format has proved convenient for working with the heterogeneous information collected to study the objects of a researcher’s interest. The structure of the ALADIN stack can be represented as a new FITS extension.
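
The slide proposes a FITS extension for the ALADIN stack but gives no layout. As one hypothetical mapping (neither ALADIN's format nor a registered FITS convention), each plane could become one extension: XTENSION is 'IMAGE' or 'BINTABLE' by plane type, EXTNAME is the plane label, and the request text is split over invented keywords QUERY1, QUERY2, ... because a fixed-format FITS string value holds at most 68 characters. Only header keywords are produced; data arrays are omitted.

```python
MAX_STRING = 68  # longest string value that fits in one fixed-format FITS card


def split_query(query: str, width: int = MAX_STRING) -> list:
    """Cut a long request into pieces short enough for single FITS cards.

    Note: FITS treats trailing blanks in strings as insignificant, so a real
    encoder would have to protect pieces that end in spaces.
    """
    return [query[i:i + width] for i in range(0, len(query), width)] or [""]


def plane_to_extension_keywords(label: str, kind: str, query: str) -> list:
    """Header keywords for one stack plane as a FITS extension (hypothetical mapping)."""
    xtension = {"image": "IMAGE", "table": "BINTABLE"}[kind]
    keywords = [("XTENSION", xtension), ("EXTNAME", label)]
    # QUERYn are invented keywords; a real convention would have to be agreed upon.
    for n, piece in enumerate(split_query(query), start=1):
        keywords.append((f"QUERY{n}", piece))
    return keywords


def stack_to_extensions(planes) -> list:
    """planes: iterable of (label, kind, query) in stack order -> one keyword list per HDU."""
    return [plane_to_extension_keywords(*plane) for plane in planes]


def rejoin_query(keywords) -> str:
    """Recover the original request text from the QUERYn keywords."""
    pieces = sorted((int(k[5:]), v) for k, v in keywords if k.startswith("QUERY"))
    return "".join(v for _, v in pieces)


# Usage with placeholder planes:
stack = [
    ("FIRST", "image", "hypothetical image request " * 4),
    ("NVSS", "table", "hypothetical cone search"),
]
for hdu in stack_to_extensions(stack):
    print(hdu[:2], "query cards:", sum(k.startswith("QUERY") for k, _ in hdu))
```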

  15. Thank you for your attention! Work supported by the Russian Foundation for Basic Research, grants 12-07-00503-a, 14-07-00361-a.