Production of grid-based statistics in Statistics Estonia


Presentation Transcript


  1. Production of grid-based statistics in Statistics Estonia
      Kreet Masik, Leading GIS specialist

  2. Agenda
      • Grid-based statistics before the 2011 Census
      • Grid-based statistics after the 2011 Census

  3. Grid-based statistics before
      • 3 different resolutions: 500x500 m, 1x1 km, 5x5 km;
      • only one projection: L-EST97 (EPSG:3301);
      • statistical data is joined with spatial data based on the grid ID;
      • the most frequently published variable was total population;
      • variables were published separately;
      • the most common method is counting.

  4. Grid-based statistics before
      • The grid is based on the Estonian Base Map grid;
      • every grid cell has a row number and a column number;
      • grid code = row number * 10 000 + column number;
      • grid cells that intersect with borders are cut, and their values are not recalculated according to the actual cell size;
      • grid cells that intersect with borders are not cut.
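
The grid-code rule above can be sketched in a few lines. This is a minimal illustration only: the slides do not give the origin of the Estonian Base Map grid, so `ORIGIN_E`, `ORIGIN_N` and the northward row numbering are hypothetical assumptions, as are the helper names `row_col`, `grid_code` and `decode`.

```python
# Hypothetical parameters: the slides do not state the origin of the Estonian Base Map grid.
CELL_SIZE_M = 1000             # 1x1 km resolution
ORIGIN_E = 300_000.0           # assumed grid origin, easting (m, L-EST97)
ORIGIN_N = 6_300_000.0         # assumed grid origin, northing (m, L-EST97)


def row_col(easting: float, northing: float, cell: float = CELL_SIZE_M) -> tuple[int, int]:
    """Row and column of the cell containing a point; rows are assumed to count northwards."""
    row = int((northing - ORIGIN_N) // cell)
    col = int((easting - ORIGIN_E) // cell)
    return row, col


def grid_code(row: int, col: int) -> int:
    """Grid code = row number * 10 000 + column number, as on the slide."""
    if not 0 <= col < 10_000:
        raise ValueError("column must be 0..9999, otherwise codes would collide")
    return row * 10_000 + col


def decode(code: int) -> tuple[int, int]:
    """Inverse of grid_code: recover (row, column) from the code."""
    return divmod(code, 10_000)


row, col = row_col(542_500.0, 6_589_300.0)
print(row, col, grid_code(row, col))  # 289 242 2890242
```

Because the column occupies the last four digits, the code stays decodable with `divmod` as long as no grid has more than 10 000 columns.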

  5. Grid-based statistics before: confidentiality
      • If the value of a grid cell is smaller than 3, the value is replaced with 99999;
      • the total value per grid cell was not published (for example, in the age-group grid map, values smaller than 3 were replaced with 99999 and the total number of persons per grid cell was not published).
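
A minimal sketch of this masking rule follows. It assumes, consistent with slide 10, that only counts of 1 or 2 are masked and that empty cells (0) are published as 0. The table layout (grid code mapped to category counts) and the function names are hypothetical.

```python
SUPPRESSED = 99999  # marker value used in the published grid tables


def suppress(value: int) -> int:
    """Mask counts of 1 or 2; empty cells (0) and counts of 3 or more are kept."""
    return SUPPRESSED if 0 < value < 3 else value


def publish_grid_table(table: dict[int, dict[str, int]]) -> dict[int, dict[str, int]]:
    """table maps grid code -> {category: count}.

    Returns the same structure with small counts masked. No per-cell total is
    added, because the total minus the visible categories would reveal the sum
    of the masked ones.
    """
    return {code: {cat: suppress(n) for cat, n in cats.items()}
            for code, cats in table.items()}


ages = {2890242: {"0-15": 2, "16-64": 10, "65+": 0}}
print(publish_grid_table(ages))
# {2890242: {'0-15': 99999, '16-64': 10, '65+': 0}}
```

Withholding the total is what keeps the masked value hidden: had the cell total of 12 been published, 12 - 10 - 0 = 2 would give the masked age group back.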

  6. Grid-based statistics before: confidentiality

  7. Grid-based statistics before: confidentiality

  8. Grid-based statistics before: confidentiality

  9. Grid-based statistics in the future
      • 4 different resolutions: 250x250 m (only in the biggest cities), 500x500 m, 1x1 km, 5x5 km;
      • at least two projections: L-EST97 (EPSG:3301) and ETRS-LAEA (EPSG:3035);
      • grid cells that intersect with borders are cut, and their values can be calculated according to the actual cell size (based on centroids);
      • statistical data is joined with spatial data based on building ID -> makes it possible to aggregate to any grid or region;
      • more variables;
      • the most common method is counting.
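
Joining on building ID means the same records can be counted into any cell size. The sketch below shows this under stated assumptions: building coordinates are already in one projected CRS, the grid origin defaults to (0, 0), and the names `aggregate_to_grid`, `persons` and `xy` are hypothetical. The sketch does not handle the border-cut cells.

```python
from collections import Counter


def aggregate_to_grid(persons_by_building, building_xy, cell_size, origin=(0.0, 0.0)):
    """Count persons per grid cell.

    persons_by_building: {building_id: number of persons}
    building_xy:         {building_id: (easting, northing)} in one projected CRS
    Returns Counter {(row, col): persons}.
    """
    e0, n0 = origin
    totals = Counter()
    for building_id, persons in persons_by_building.items():
        e, n = building_xy[building_id]  # join statistics to geometry on building ID
        key = (int((n - n0) // cell_size), int((e - e0) // cell_size))
        totals[key] += persons
    return totals


persons = {"b1": 3, "b2": 4, "b3": 1}
xy = {"b1": (1200.0, 2300.0), "b2": (1800.0, 2900.0), "b3": (2100.0, 2300.0)}
print(aggregate_to_grid(persons, xy, 1000))  # 1x1 km cells
print(aggregate_to_grid(persons, xy, 5000))  # 5x5 km cells from the same records
```

Only `cell_size` (and, for a second projection, the coordinates) changes between resolutions. The statistical records themselves are never re-keyed.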

  10. How to solve confidentiality issues?
      • Replace all values that are 1 or 2 with 0 -> not suitable for Estonia, because these grid cells cover about 1/5 of the whole territory, and disseminating some variables would become pointless. Based on the 2000 Census results, the total population in the 1x1 km grid would be about 12 900 smaller.
      • Replace all values that are 1 or 2 with 3 -> the total number of persons will increase, data in different tables will be inconsistent, and there will be contradictions within a single table. Based on the 2000 Census results, the total population in the 1x1 km grid would grow by 6 300 people.
      • Replace all values that are 1 or 2 with 99999 -> if information from different datasets can be combined, the data will no longer be confidential.
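
The effect of the three replacement options on a summed total can be illustrated on a toy list of cell counts. The numbers are invented, not census data, and the helper names are hypothetical. With the 99999 marker, a user cannot sum the cells directly but can still bound the true total, because each marker stands for 1 or 2.

```python
SUPPRESSED = 99999


def replace_small(counts, replacement):
    """Replace every count of 1 or 2 with `replacement` (0, 3 or the 99999 marker)."""
    return [replacement if v in (1, 2) else v for v in counts]


def total_bounds(published):
    """Bounds on the true total when 1s and 2s are published as 99999."""
    known = sum(v for v in published if v != SUPPRESSED)
    masked = sum(1 for v in published if v == SUPPRESSED)
    return known + masked, known + 2 * masked


counts = [0, 1, 2, 5, 12, 2]  # toy cell counts; true total is 22
print(sum(replace_small(counts, 0)))                     # 17, total shrinks
print(sum(replace_small(counts, 3)))                     # 26, total grows
print(total_bounds(replace_small(counts, SUPPRESSED)))   # (20, 23)
```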

  11. How to solve confidentiality issues?
      • Suppose we choose a random grid cell where the total number of dwellings with an area of 90-99 m² is 99999, which means that there are only 1 or 2 such dwellings in that cell.
      • If we compare this information with other variables where the value is replaced with 99999, we can identify that:
         • the dwelling:
            • has more than 4 rooms,
            • is in a one-family house, and
            • was built in 1991-1995;
         • a person lives in the dwelling who:
            • has undefined citizenship,
            • is divorced,
            • works in the field of education,
            • is economically active, employed, an employee with a stable contract, and
            • belongs to the age group 34-50.

  12. How to solve confidentiality issues?

  13. How to solve confidentiality issues?
      • Suppose we choose a random grid cell where the number of dwellings with an area of 40-49 m² and the number with an area of 80-89 m² are both 99999, which means that the cell contains 1 dwelling with an area of 40-49 m² and another dwelling with an area of 80-89 m².
      • If we compare this information with other variables where the value is replaced with 99999, we can identify that:
         • the dwellings:
            • have 3 rooms,
            • are in a one-family house,
            • one dwelling was built before 1919,
            • the other dwelling was built in 1946-1960;
         • 5 persons live in the dwellings:
            • their citizenship is Estonian,
            • one person is single and 4 are legally married,
            • 3 persons are economically active, employed, employees with a stable contract,
            • the persons belong to the following age groups: 16-20, 41-50, 51-64 and 65+.
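
The inference in slides 11 and 13 can be sketched as a simple check. This assumes that empty categories are published as 0 and that each variable is a separate table for the same cell. The variable and category names in `cell` are invented for illustration. When the only non-empty category of a variable is masked, every unit in the cell belongs to that category, so the attribute leaks even though the count is hidden.

```python
SUPPRESSED = 99999


def revealed_profile(tables):
    """tables: {variable: {category: published value}} for ONE grid cell.

    If, for a variable, the only non-empty category is masked (99999), every
    unit counted in that cell must fall into that category, so the attribute
    is disclosed even though the count itself is hidden.
    """
    profile = {}
    for variable, cats in tables.items():
        non_empty = [cat for cat, v in cats.items() if v != 0]
        if len(non_empty) == 1 and cats[non_empty[0]] == SUPPRESSED:
            profile[variable] = non_empty[0]
    return profile


cell = {  # hypothetical published tables for one grid cell
    "floor_area": {"40-49 m2": 0, "90-99 m2": SUPPRESSED, "100+ m2": 0},
    "rooms": {"1-3": 0, "4": 0, "5+": SUPPRESSED},
    "building_type": {"one-family house": SUPPRESSED, "apartment building": 0},
    "built": {"before 1919": 0, "1991-1995": SUPPRESSED},
}
print(revealed_profile(cell))
```

Each additional published variable adds one more attribute to the profile, which is why publishing more variables requires changing the disclosure rules.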

  14. How to solve confidentiality issues?

  15. How to solve confidentiality issues?
      • It is possible to identify the person/building only if you are local.
      • Is the aforementioned information confidential? The Estonian Statistical Law states: "Data that allow a statistical unit to be identified directly or indirectly are confidential."
      • If we want to publish more variables, we have to change the disclosure rules.

  16. How to solve confidentiality issues?
      • On the map, we can label these grid cells as >5 (>3) or "confidential", as in INSPIRE -> this is also suitable when only one person lives in the cell.
      • Publish total variables per grid cell (total number of persons, total number of dwellings, etc.), but do not publish more detailed variables (age, marital status, family nuclei, area of dwelling, time of construction, etc.) if their values are 1 or 2. If the total values per grid cell are larger than a certain threshold (for example 9), small values of detailed variables will not be problematic. More uninhabited grid cells!
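
The threshold variant of the second proposal might look like the sketch below. Following the slide, the cell total is always published, and detailed counts of 1 or 2 are withheld only when the total is not above the threshold (9 is the slide's example). Representing a withheld value as `None` and the name `publish_cell` are hypothetical choices.

```python
THRESHOLD = 9  # example threshold mentioned on the slide


def publish_cell(total, details, threshold=THRESHOLD):
    """Always publish the cell total; withhold (None) detailed counts of 1 or 2
    unless the total is larger than the threshold."""
    if total > threshold:
        return total, dict(details)
    return total, {k: (None if v in (1, 2) else v) for k, v in details.items()}


print(publish_cell(25, {"single": 2, "married": 23}))  # (25, {'single': 2, 'married': 23})
print(publish_cell(5, {"single": 2, "married": 3}))    # (5, {'single': None, 'married': 3})
```

In the second call, the withheld value can still be recovered as 5 - 3 = 2 from the published total. That is the same subtraction risk that the earlier rule on slide 5 avoided by not publishing totals, so at least two categories per variable would need to be withheld.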

  17. How to solve confidentiality issues?
      Population topics
      • Sex: all 3 variables are 99999 -> 1 873 grid cells (8.6%); at least one of the variables is 99999 -> 10 046 grid cells (46.2%)

  18. How to solve confidentiality issues?
      • Age: at least one of the variables is 99999 -> 19 798 grid cells (91.1%)

  19. How to solve confidentiality issues?

  20. How to solve confidentiality issues?
      Housing topics
      • Number of rooms: at least one of the variables is 99999 -> 18 927 grid cells (88.1%)

  21. How to solve confidentiality issues?
      • Type of dwelling: at least one of the variables is 99999 -> 12 823 grid cells (59.7%)
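
The shares reported for the population and housing topics can be computed as in the sketch below. The toy table assumes that the three sex variables are males, females and total, and the denominator is simply the set of grid cells passed in. Both of these assumptions, and the function name, are hypothetical.

```python
SUPPRESSED = 99999


def suppression_shares(grid_tables):
    """grid_tables: {grid_code: {variable: published value}} for one topic.

    Returns (cells with all variables masked, % of cells,
             cells with at least one variable masked, % of cells).
    """
    n = len(grid_tables)
    n_all = sum(all(v == SUPPRESSED for v in t.values()) for t in grid_tables.values())
    n_any = sum(any(v == SUPPRESSED for v in t.values()) for t in grid_tables.values())
    return n_all, 100 * n_all / n, n_any, 100 * n_any / n


sex = {  # toy data, not census results
    1: {"male": SUPPRESSED, "female": SUPPRESSED, "total": SUPPRESSED},
    2: {"male": SUPPRESSED, "female": 5, "total": 6},
    3: {"male": 10, "female": 12, "total": 22},
    4: {"male": 4, "female": 3, "total": 7},
}
print(suppression_shares(sex))
```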

  22. How to solve confidentiality issues?
      • How should we deal with sensitive personal data (nationality and religious belief)? Is it enough to publish general groups such as Estonians, non-Estonians and unknown, or should we publish that in one grid cell, 2 of 300 persons are Scotsmen?
      • Aggregate values into larger groups (for example, 10-year instead of 5-year groups) -> not acceptable for users.
      • Aggregate grid cells -> the legend will no longer be correct, and the grid cells are less useful for spatial analysis.

  23. Plans for the future
      • Analysis with different projections: what can occur when we provide the same data in different projections?
      • Analyze more variables
      • Analyze more resolutions

  24. Thank you!
