Big data now playing ..... at the sandbox

Big data now playing ..... at the sandbox. John.Dunne@cso.ie 17th October 2014 IAOS, Vietnam. Overview: • Context • How CSO got interested in big data • The sandbox • Learning from other industries • Learning from the past • The sandbox – looking to the future

Presentation Transcript


  1. Big data now playing ..... at the sandbox John.Dunne@cso.ie 17th October 2014 IAOS, Vietnam

  2. Overview • Context • How CSO got interested in big data • The sandbox • Learning from other industries • Learning from the past • The sandbox – looking to the future • Concluding comments Keywords – big data, modernisation, sandbox

  3. Big data – working definition Data that is difficult to collect, store or process within the conventional systems of statistical organizations. Their volume, velocity, structure or variety requires the adoption of new statistical software, processing techniques and/or IT infrastructure to enable cost-effective insights to be made.

  4. Do more with less Mindset - Opportunities exist with secondary data sources

  5. Legal environment • Data Protection • Freedom of Information • Official Statistics Key: 3 legislative pillars

  6. Modernisation and big data • 2011: Conference of European Statisticians endorses modernisation strategy • 2012: Big data on modernisation agenda • 2013: ESSC Scheveningen memorandum on big data and official statistics • 2013: International big data team gets going • 2014: Big data on UNSC agenda • 2014: The sandbox goes live at MSIS Dublin

  7. 2013 CSO Project - To determine household composition using smart metering data Origin of data : Consumer Behaviour Trials in 2009 and 2010 • Over 5000 households in pilot • 3 months baseline data (reading every 30 mins) • Pre-trial survey using CATI http://www.unece.org/stats/documents/2013.09.coll.html

  8. Project with pilot data brought challenges • Pilot: 7 million data points per month • ICHEC helped out • Go live: 2160 million data points per month • "Joe, we need a bigger computer" https://www.ichec.ie/
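The volumes on slides 7 and 8 can be sanity-checked with a little arithmetic. The sketch below is not from the presentation: it assumes a 30-day month and exactly one reading per meter every 30 minutes, and the function names are hypothetical. The implied number of meters at go-live is derived from the quoted figure, not stated in the slides.

```python
# Back-of-envelope check of the data volumes quoted on slides 7 and 8.
# Assumptions (not in the slides): a 30-day month, no missing readings.
READINGS_PER_DAY = 24 * 2   # one reading every 30 minutes
DAYS_PER_MONTH = 30         # assumed month length


def data_points_per_month(meters: int) -> int:
    """Monthly readings produced by a given number of meters."""
    return meters * READINGS_PER_DAY * DAYS_PER_MONTH


def implied_meters(points_per_month: int) -> float:
    """Number of meters that would produce the given monthly volume."""
    return points_per_month / (READINGS_PER_DAY * DAYS_PER_MONTH)


# Pilot: "over 5000 households" gives roughly the quoted 7 million points.
pilot_points = data_points_per_month(5000)

# Go live: 2160 million points per month, as quoted on slide 8.
go_live_points = 2160 * 10**6
go_live_meters = implied_meters(go_live_points)

# How much larger the go-live volume is than the pilot volume.
scale_up = go_live_points / pilot_points

print(pilot_points, go_live_meters, scale_up)
```

Under these assumptions the pilot figure matches the "7 million" on the slide, and the go-live volume is a few hundred times larger, which is consistent with the need for a bigger computer.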

  9. The sandbox The hardware on which the sandbox system is based is a High Performance Computing cluster called Stoney. The cluster has been hosted at the National University of Ireland, Galway since April 2009 and is composed of 60 compute nodes, each of which has two 2.8GHz Intel (Nehalem EP) Xeon X5560 quad-core processors, 48GB of RAM and a 1TB local disk. Each node is connected to two networks: an InfiniBand network for accessing the shared Lustre filesystem and for high performance communications, and a Gigabit Ethernet network for management tasks. In addition, a 20TB shared filesystem is available to all nodes. ICHEC will dedicate 20 compute nodes to enable a Hadoop cluster with 160 cores, almost 1TB of RAM and 20TB of HDFS distributed storage.
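The Hadoop cluster totals quoted on slide 9 follow directly from the per-node specification. The short sketch below reproduces them; the constant names are illustrative, and it assumes decimal-style rounding (1TB taken as 1000GB) only for the "almost 1TB" comparison. The slide does not state the HDFS replication setting, so the 20TB figure is treated here as raw capacity.

```python
# Per-node figures as given on slide 9.
NODES_FOR_HADOOP = 20
SOCKETS_PER_NODE = 2        # two Xeon X5560 processors per node
CORES_PER_SOCKET = 4        # quad-core
RAM_GB_PER_NODE = 48
LOCAL_DISK_TB_PER_NODE = 1

total_cores = NODES_FOR_HADOOP * SOCKETS_PER_NODE * CORES_PER_SOCKET
total_ram_gb = NODES_FOR_HADOOP * RAM_GB_PER_NODE
raw_hdfs_tb = NODES_FOR_HADOOP * LOCAL_DISK_TB_PER_NODE

# "Almost 1TB of RAM": 960GB, assuming 1TB is read as 1000GB.
ram_fraction_of_1tb = total_ram_gb / 1000

# Note: HDFS replicates blocks (3 copies by default), so usable space
# would be lower than the raw figure if replication is left enabled.
# The replication setting used in the sandbox is not given in the slides.

print(total_cores, total_ram_gb, raw_hdfs_tb, ram_fraction_of_1tb)
```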

  10. The sandbox provides an environment to • test feasibility of remote access and processing • test whether existing standards/models/methods can be applied to big data • evaluate the usefulness of big data software tools • learn by doing with respect to potential uses, advantages and disadvantages of big data • facilitate further collaboration in the international community

  11. The toys (data sources) • twitter data • mobile phone data • satellite imagery / aerial photography • price data/ job vacancy data via scraping • scanner data/price data sourced via large vendors • data from road traffic sensors • smart meter data on electricity/gas consumption

  12. Some of the players To play, contact Steven.Vale@unece.org

  13. Learning from other industries - technical partners can have a role to play Exchange of data for billing purposes: • Irish Mobile Network Operators (MNOs) • Data Clearing Houses • ROW Mobile Network Operators

  14. Learning from the past - think about the bigger picture Nordbotten, Thygesen and the statistical archive concept

  15. Learning from the past - do not underestimate privacy concerns http://www.census.gov/history/pdf/kraus-natdatacenter.pdf http://blog.modernmechanix.com/the-national-data-center-and-personal-privacy/ The National Data Center and Personal Privacy, by Arthur R. Miller

  16. The sandbox - looking to the future • Centres for Research and Development ? • Centres of Excellence ? • Partner organisations for collecting, processing or storing data of a less sensitive or non-sensitive nature ??? • Significant partner organisations enabling the collection, processing or storage of data of a sensitive nature ?????

  17. Concluding remarks • Think about the bigger picture / broader system • Keep an open mind to the possibility of new partners • Be open and transparent • Don’t underestimate privacy concerns • Continue to collaborate and share
