Kraków, Poland, 26-29 June 2018
Communication as a quality factor in multi-source spatial data integration process
Włodzimierz Okrasa, Cardinal Stefan Wyszynski University in Warsaw and Statistics Poland
Goal / rationale for …
• (i) to deal with analytical limitations caused by the lack of appropriate data needed for exploring cross-level relationships, such as between individual and local community wellbeing, and
• (ii) to demonstrate empirically the potential of the spatial integration approach by constructing an analytical multilevel multi-source database containing different types of units, individual and group (NUTS5 / local community, gmina), along with
• (iii) emphasizing the role played by communication* within statistical research, contributing at the same time to a methodological framework of 'spatially integrated statistical research', resulting in
• (iv) better quality of the final product of the whole statistical process, from data to evidence-based decision/policy (an 'outcome' unavailable otherwise): 'better communication for better quality'.
*) "… communication is not just an appendix of the core business focused on data production, but a key function that can determine the success or the failure of an official data provider" (Giovannini, 2008, p. 10)
Problem / questions:
• What is the role of communication in the process of statistical research, and what are its important components?
• How does it affect the quality of both the process and the product (statistical products)?
• Which aspects / attributes of quality are particularly exposed to (or dependent on) communication [assuming high quality of the 'communication' itself]?
[Diagram label: quality of statistical product]
Problem / questions - cont.
II. Communication and quality in the context of working with spatial data
• constructing multi-source geo-referenced databases
► communication and quality in aggregation / of aggregated (matched) databases
► (since spatial aggregation may lead to a multi-level / 'nested' data structure, e.g., households in communities): how does spatial integration of data contribute to improving the overall quality of the research outcome?
Exemplification: the relationship between community and individual well-being / spatial data analysis
Quality of 'statistical product' / public (official) statistics = f{quality of 'endogenous' (internal) factors / research process & quality of 'exogenous' (external) factors / data from other sources}
[Diagram: 'Internal' quality (survey research) and 'External' quality (other / administrative data sources) both feed into Quality of statistical product]
Communication towards ensuring quality of 'statistical product'
• Communication in the survey data production process
  • factual information (household survey, establishment survey: different types of respondents)
  • the response process model: comprehension, judgement, communication (filtering the response and providing an answer) (e.g., Biemer et al., 2004, p. 230)
• Communication in non-survey and administrative data use for statistical purposes
• Communication in multi-source database construction
• Communication in statistical processes (SP) involving spatial data (SD): stakeholders in SP/SD
Chain of communication in the statistical process: from data to decision (policy) making - alternative paradigms
Two-way re-paradigmatization / a new meta-paradigm?
(a) data 'first' (as a policy input)
• Data-based policy/decision making
• Data-driven (evidence-based policy)*
• Data/evidence-informed
(b) users 'first' (needs for…)
• Policy-oriented
• Policy-based data/information production
• Policy-driven**
*) "Evidence-influenced politics is suggested as a more informative metaphor, descriptively and prescriptively, than evidence-based policy" (Prewitt, 2012, p. 4).
**) "Evidence-based policy or policy-based evidence?" (Sanderson, 2011)
Communication in multi-source database construction: influence of a particular data set's quality on the overall quality of the multi-source database, by pattern of matching
• horizontal pattern / complete overlapping: A ↔ B ↔ C
• vertical (hierarchical) pattern / nested data structure
• cross-sectional / mixed pattern (partly horizontal and partly vertical)
[Diagram: Source A (Quality A), Source B (Quality B), …, Source K (Quality K) combined into the matched database A, B, C]
Scientific (statistical) communication entails several elements (e.g., Maggino and Trapani, 2010; Nymand-Andersen, 2017; …)
Elements composing communication:
(i) content / information ('message') to be communicated / transmitted to a 'receiver';
(ii) communicator / transmitter ('statistician');
(iii) receiver / audience (addressees: experts / researchers, policymakers, users of statistical data/products, etc.);
(iv) communication channel: "a medium through which a message is directed to and exchanged with its intended audience"; multiple-channel communication;
(v) code (transmitter's / receiver's code): the way statistics are reported and presented; the tools used to transmit statistics;
(vi) context; (vii) feedback; (viii) noise.
Communication and quality - symmetrical issues:
• Quality of communication
  Ensuring high-quality (error-free) transmission of information: (i) between / across phases or stages of the statistical process, meant as data production and analysis; (ii) among the stakeholders in the statistical process
• Communication of quality ('reflected in the TQM principle of participation by all')
  Communication with users and other stakeholders
Communication and quality: (1) institutional and (2) methodological aspects
1. Institutions of public statistics seek an appropriate strategy of effective communication, both externally ('target groups': general public, politicians, research/academia, media, voluntary sector, international bodies) and internally
• e.g., the "Communication strategy for Statistics Norway 2014-2017" stresses the credibility of the institution as the key element affected, positively or negatively, by communication, which is meant as a precondition (a must) of the quality of the statistical product:
• "If the communication weakens the credibility of our work, the high quality of the statistics and research will be of no help" (p. 3)
• Example of principles of effective communication (p. 4): transparency, accessibility, understandability (form of statistical information), independence (no authorities or interest groups preferred).
Assessing statistical communication
• Conceptual framework (Maggino and Trapani, op. cit.)
1. The dimensions to evaluate: focus on the transmitter's code, specified in terms of (i) outline, (ii) tools, and (iii) clothes.
2. The criteria of evaluation: (A) suitability / consistency, (B) correctness, and (C) clarity of the code.
3. The components of the transmission process: (i) the receiver/audience (and its receiving code), (ii) the available channel, (iii) the available context and setting, and, in some way, (iv) the content of the message.
2. Communication in statistical research*: the process underpinning activities from data collection to decision making, which integrates institutional and methodological elements affecting the quality of the final outcome (evidence-based decisions/policy)
[Diagram: chain data, information, knowledge, decision, with communication running along the whole chain. Labels recoverable from the figure: quantification and measurement; demand for models/methods of converting information into problem-solving knowledge; programme/policy-driven expectations on the information base / users' needs.]
*) "Numbers, figures and patterns are first of all of the communication strategies …" (Porter, 1995), and as such are 'socially constructed' (Schield, 2013)
The logic of methods and the logic of actions: a two-dimensional communication route in the SP
• Communication according to the logic of method: retro-implication / backward implication, i.e., conditional acceptability of an alternative 'methodological choice' in the SP and reduction of quality-related risk factors in matching/assembling
  - step by step (a, b, …, r): selection from among alternative available methods, techniques and procedures
  - $S_i = \{s_a \times s_b \times \dots \times s_r\}$, where $S_i$ is an element of the set of all 'methodologically feasible' strategies (containing combinations of elements of the sub-strategies $s_a, \dots, s_r$)
• Methodological quality of research / evaluation research as a message ('communicate') in making a decision, either (i) 'cognitive' (e.g., about hypotheses) or (ii) 'practical' (change in 'goal state', e.g., Ackoff, 1958): reliability, validity, generalizability, cognitive value of research / 'epistemic utility', 'pragmatic value of information' (Marschak 1974, Szaniawski 1974, Okrasa 1978).
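The strategy set on this slide can be read as a Cartesian product of the options available at each step. The following is a minimal sketch under that reading; the step names and the options listed for each step are hypothetical and serve only to make the enumeration concrete.

```python
from itertools import product

# Hypothetical steps of a statistical process and the alternative
# methods available at each step (illustrative names, not from the slides).
steps = {
    "sampling": ["simple_random", "stratified"],
    "collection": ["CAPI", "CATI", "web"],
    "matching": ["exact_by_area_code", "statistical_matching"],
}


def feasible_strategies(step_options):
    """Enumerate every strategy S_i as one choice per step (s_a x s_b x ... x s_r)."""
    names = list(step_options)
    combos = product(*(step_options[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


strategies = feasible_strategies(steps)
print(len(strategies), "candidate strategies")
```

In practice the 'methodologically feasible' subset would be obtained by filtering this list with acceptability conditions, which the slide refers to as backward implication; that filter is not modelled here.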
Communication in the SP according to the logic of method - cont.
Communication in the SP - strategies of action (operational approach): communication between 'two communities', researchers and policy makers (Prewitt et al., 2012)
• Translation: a supply-side solution to the use problem, applicable to questions of using science in policy (e.g., social science, medical science).
• Brokering: involves filtering, synthesizing, summarizing, and disseminating research findings in user-friendly packages. In contrast to translation strategies, brokering involves two-way communication (op. cit., p. 45).
• Interaction model: covers a family of ideas directed to systemic changes in the means and opportunities for relationships between researchers and policy makers, with emphasis on use-inspired research, increased visibility, and insight into what the use of science means in practice.
An issue of particular importance in this context is how data collectors/producers communicate with data providers: the long-lasting dilemma between confidentiality and accessibility of official statistics (Duncan et al., 1993, 2011; Okrasa 1994, 2008).
Communication among the SP stakeholders - cont.
Communication outside the SP / macro-context - EU policy:
• "Strengthening the communication function of the national and international statistical systems" and "… to modernize the statistics function by adding the 'communication function' as an integrated part of the statistical" (Nymand-Andersen, 2017, p. 2).
• The lack of a proactive communication strategy may have contributed to reputational loss and reduced trust in statistics, in its profession and in those statistical agents engaged in developing methodology and producing reliable and comparable national and international statistics, as a public good for sound and sustainable decision making.
The statistics communication function (Nymand-Andersen, op. cit.)
• The statistics communication function relates predominantly to five main components:
  • what - extracting statistics intelligence in context from the pool of disseminated/available statistics;
  • who - applying tailored market segmentation;
  • how - using various communication concepts and forms;
  • which - applying multiple communication channels;
  • why - describing and applying statistics knowledge.
• Preparing a statistics communication strategy: to build a user-centred communication strategy, seeking to understand users' needs, the barriers they face and their working processes; and … to establish a new two-way statistics communication strategy and thereby involve users in the statistics community.
The basis of communication in the SP is quantification: convention and measurement
• Quantification: the process of transforming data into numerical form by measuring characteristics of a set of units or items on a scale, in the form of numbers, letters or verbally (e.g., Federer, 1991)
• Quantification is more than measurement (Desrosières, 2001): it requires conventions on which the measurement is based
• Quantification (a sociological phenomenon, e.g., Espeland and Stevens, 2008; Alonso and Starr, 1987; Federer, 1991): production of information and communication using numbers (a precondition for measurement is classification).
Classification (pre-measurement): a system of communication codes; in the context of spatial analysis: integration of data from different sources
Matrix of quality-related risk factors and responsibility: communication elements in the SP (e.g., survey) - integrating two dimensions
$Qc_{ij} \overset{\mathrm{df}}{=} f(qc_m, qc_a)$
$qc_m \overset{\mathrm{df}}{=} g(V(r))$, where $V(r)$ is the value of research
$qc_a \overset{\mathrm{df}}{=} g'(R)$, where $R$ is risk / reduction of …
*) Some DSOs tend to view wide-scale communication about statistical confidentiality as a risk in itself. Their concern is that such communication would decrease response rates. But good communication about risks and benefits could lead to greater buy-in by respondents and also reduced negative consequences if a disclosure event takes place (Duncan et al., 2011).
Assumption: integration of data from different sources requires effective communication and contributes to the quality of the final product / 'outcome'
• Spatial integration of data: geo-referenced data (coordinates X, Y) and statistical matching techniques (e.g., D'Orazio et al., 2006) make it possible to solve several methodological problems, for instance:
  • integration of objective and subjective measures of wellbeing, such as individual and community wellbeing: a dual system of indicators (beyond GDP)
  • multidisciplinary integration of scientific knowledge: spatially integrated social research (Goodchild et al., 2004), a framework for integration of social and behavioral disciplines (Stimson, 2014; Okrasa, 2012)
Spatial integration in social research:
• key role of geographical concepts and spatial data: place, locality, distance/proximity, distribution, neighborhood, region;
• exploration and analysis of spatial patterns of social phenomena.
Spatial integration and communication of statistical knowledge (e.g., community wellbeing)
• Strategies of spatial integration, by motive:
  • data-driven, due to the rapidly growing amount of available geo-referenced data;
  • method-driven, or analytical integration, e.g., using multilevel modelling;
  • programme-driven or policy-driven, following some local or regional development project, or ex ante evaluation of the validity of a system of geographic targeting of public resources, etc.;
  • problem-driven, when the spatial perspective provides an appropriate direction for identifying and addressing a policy issue such as spatial concentration of poverty, e.g., integration of survey and administrative or census data for poverty mapping, for better use of poverty-reduction resources, etc.
Spatial integration / exemplification
Spatial data quality (including GIS-type data) (Veregin 1999, Rasdorf 2000, van Oort 2006)
• Lineage: describes the source of the derived data, derivation methods, and all transformations employed in producing the final data
• Spatial accuracy
  • positional accuracy: compares spatial data to an independent and more accurate source
  • attribute accuracy: may include deductive estimates or may be based on independent samples from polygon overlay
• Logical consistency: determines the faithfulness of data structures embedded in a transfer file
• Completeness: a data set is complete with regard to aspects such as the minimum area employed in polygon construction, gaps in either the data element set or attribute values, etc.
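Positional accuracy, as described above, comes down to comparing coordinates against an independent and more accurate source. One common summary of that comparison is the root-mean-square error over paired points. The sketch below assumes planar (x, y) coordinates in the same units and made-up sample points; it is an illustration, not a procedure taken from the cited sources.

```python
import math


def positional_rmse(measured, reference):
    """Root-mean-square horizontal error between paired (x, y) points."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("need two equally long, non-empty point lists")
    squared_errors = [
        (mx - rx) ** 2 + (my - ry) ** 2
        for (mx, my), (rx, ry) in zip(measured, reference)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical check points: data set under test vs. a more accurate source.
measured = [(0.0, 0.0), (10.0, 10.0)]
reference = [(3.0, 4.0), (10.0, 10.0)]
rmse = positional_rmse(measured, reference)
```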
[Exemplification] Spatial integration: 'place' / locality (NTS5 = gmina) as an integrator, a merging element.
Individual wellbeing and quality of the local community (neighborhood), or community wellbeing
Which components of communication contribute significantly to the quality of the statistical product, by enabling analysis of the relationship between individual (subjective) and community wellbeing through construction of a multilevel multi-source database? Sources:
• Local Data Bank (LDB)
• Time Use Survey data (TUS / GUS 2013)
• Social Diagnosis (SD)
Statistical matching problem
A. The nonresponse (missing observations) problem and identification of the missing-data generating mechanism: standard procedures (omitted as not specific to the data integration context, e.g., Särndal and Lundström, 2005); see D'Orazio et al., op. cit.
B. The problem of missing joint information on random variables X, Y, Z with density $f(x, y, z)$:
- units in data set A (sample survey A) have Z missing: $(x_a^A, y_a^A) = (x_{a1}^A, \dots, x_{aP}^A, y_{a1}^A, \dots, y_{aQ}^A)$, $a = 1, \dots, n_A$
- units in data set B (sample survey B) have Y missing: $(x_b^B, z_b^B) = (x_{b1}^B, \dots, x_{bP}^B, z_{b1}^B, \dots, z_{bQ}^B)$, $b = 1, \dots, n_B$
The combined data set $A \cup B$ is treated as a single data set with joint distribution $f(x, y, z)$ [the strong version requires that the data sets come from the same period of research].
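To make problem B concrete, the sketch below uses a nearest-neighbour (distance) hot deck. Each record in file A receives the Z value of the file B record whose common variables X are closest. This is one simple nonparametric approach and not necessarily the one used in this presentation. The records are made up. Matching on X alone implicitly assumes that Y and Z are conditionally independent given X.

```python
def nearest_neighbour_match(recipients, donors):
    """Fuse file A (x, y) with file B (x, z) by the closest common variables x.

    recipients: list of (x, y) with x a tuple of numbers
    donors:     list of (x, z) with x a tuple of the same length
    Returns a list of (x, y, z) where z is borrowed from the nearest donor.
    """
    fused = []
    for x_a, y_a in recipients:
        # Squared Euclidean distance on the shared variables X.
        _, z_b = min(
            donors,
            key=lambda donor: sum((p - q) ** 2 for p, q in zip(x_a, donor[0])),
        )
        fused.append((x_a, y_a, z_b))
    return fused


# Hypothetical files: A observes (X, Y), B observes (X, Z).
file_a = [((1.0, 0.0), "y1"), ((5.0, 2.0), "y2")]
file_b = [((0.9, 0.1), "z1"), ((5.2, 1.8), "z2"), ((9.0, 9.0), "z3")]
fused = nearest_neighbour_match(file_a, file_b)
```

The fused file is synthetic: it contains no jointly observed (Y, Z) pairs, so any analysis of the Y-Z relationship inherits the matching assumptions.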
Statistical matching problem - cont. (D'Orazio et al., 2006, p. 5)
[Figure: data layout of files A and B; the blocks not observed in each file (Z in A, Y in B) are marked 'Unobserved variables']
Empirical illustration. Multi-source database: TUS and LDB - units identified and merged according to TERYT code, e.g., gminas/communes (NTS5 units) with ≥20 TUS respondents
Multi-source database - cont.: SD and LDB. Communes/gminas with ≥20 respondents in TUS
Multi-source multilevel analytical database: LDB & SD & TUS (gmina - households - individuals); integrator: gmina (TERYT code, coordinates X, Y)
Communes/gminas: (a) ≥20 TUS respondents; (b) ≥10 TUS respondents
Level 1 (L1): 23 285 persons
Level 2 (L2): (a) 386 gminas (NUTS5 units) with 20+ TUS respondents; (b) 1036 gminas with 10+ TUS respondents
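The construction described above (persons nested within gminas, linked by a shared area code, keeping only gminas with enough respondents) can be sketched as follows. The field names, the code values and the indicator values are hypothetical; only the merge key (a TERYT-style code) and the respondent threshold come from the slides.

```python
from collections import defaultdict


def build_multilevel(persons, communes, min_respondents=20):
    """Nest level-1 persons within level-2 communes, keyed by a shared area code.

    Communes with fewer than min_respondents persons, or with no
    commune-level record, are left out of the analytical database.
    """
    by_code = defaultdict(list)
    for person in persons:
        by_code[person["teryt"]].append(person)

    nested = {}
    for code, members in by_code.items():
        if code in communes and len(members) >= min_respondents:
            nested[code] = {"commune": communes[code], "persons": members}
    return nested


# Made-up survey respondents and commune-level indicators.
persons = [
    {"id": 1, "teryt": "0000011"},
    {"id": 2, "teryt": "0000011"},
    {"id": 3, "teryt": "0000022"},
]
communes = {"0000011": {"mild": 0.4}, "0000022": {"mild": -0.1}}

# A threshold of 2 is used only so that the toy data yields a result.
database = build_multilevel(persons, communes, min_respondents=2)
```

Lowering the threshold from 20 to 10 enlarges the set of level-2 units, which corresponds to moving from variant (a) to variant (b) on the slide.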
DATA and MEASURES: Local Deprivation and Subjective Well-Being (SW-B)
• Multi-source database:
  • commune/gmina level data: Regional / Local Data Base (CSO, public file 2004, 2008, 2010, 2012, 2014 and 2016); NUTS5/LAU2 (N = 2 478)
• Measuring area deprivation at the commune level
  • Multidimensional Index of Local Deprivation (MILD): 'confirmatory' factor analysis / PCA (single-factor selection). Eleven (pre-selected) domains of deprivation, each characterized by a number of original items: ecology, finance, economy, infrastructure, municipal utilities, culture, housing, social assistance, labour market, education, health [65 items]
• Comparing results based on two spatial data sets, 386 and 1036 communes/gminas: what are the differences about (in 'space')?
Individual (Subjective) Wellbeing: TUS data-based measures
• Social indicators approach: attempts to exploit TUS data (Th. Juster and others, e.g., F. Andrews, 1980s):
  • survey research (day reconstruction techniques, e.g., day- and week-recall data, TUS 2013)
  • psychometric measures
• Econometric research and combined econometric/psychometric approaches: Krueger, Kahneman et al. (2008) - an indicator of emotion (negative/positive affect) associated with activities / 'time in an unpleasant state', the U-index:
  $U_i = \sum_j I_{ij} h_{ij} / \sum_j h_{ij}$ (in TUS 2013: $I = -1, 0, +1$)
  and, for a group of N persons in the population,
  $U = \frac{1}{N} \sum_i \left( \sum_j I_{ij} h_{ij} / \sum_j h_{ij} \right)$
  (also used in research on (subjective) poverty)
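The U-index formulas above amount to a duration-weighted mean of the affect indicator per person, averaged over persons. The sketch below assumes each episode is an (indicator, duration) pair with the indicator coded -1, 0 or +1 as on the slide. The diaries are invented.

```python
def u_index(episodes):
    """Person-level U_i: duration-weighted mean of the affect indicator I_ij.

    episodes: list of (indicator, duration) pairs for one person's diary.
    """
    total_time = sum(duration for _, duration in episodes)
    if total_time <= 0:
        raise ValueError("total duration must be positive")
    return sum(ind * duration for ind, duration in episodes) / total_time


def group_u_index(diaries):
    """Group-level U: simple mean of the person-level U_i over N persons."""
    if not diaries:
        raise ValueError("need at least one diary")
    return sum(u_index(episodes) for episodes in diaries) / len(diaries)


# Hypothetical diaries: (indicator, minutes) per activity episode.
person_1 = [(-1, 60), (0, 120), (1, 60)]   # (-60 + 0 + 60) / 240 = 0.0
person_2 = [(-1, 30), (1, 90)]             # (-30 + 90) / 120 = 0.5
group_value = group_u_index([person_1, person_2])
```

Averaging person-level ratios, rather than pooling all minutes, gives every respondent equal weight regardless of how much diary time was recorded, which matches the second formula on the slide.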
Spatial association between local deprivation (MILD) and subsidies per person accrued to the commune in: (a) 386 gminas with 20+ respondents; (b) 1036 gminas with 10+ respondents.
[Figure: cluster maps. (a) Moran's I = 0.03; (b) Moran's I = 0.17]
Different patterns of spatial clusters in (a) and (b), respectively (not significant in (a)), and hence different conclusions and recommendations
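For readers unfamiliar with the statistic reported on these slides, the following is a minimal sketch of the global (univariate) Moran's I with a user-supplied spatial weights matrix. The slides may rely on a bivariate variant and on software-specific weights. The binary contiguity weights and the four-unit example below are assumptions made for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weights matrix w_ij.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / weight_sum) * (cross / variance_sum)


# Four hypothetical areas in a row; neighbours share a border (binary weights).
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = morans_i([1.0, 2.0, 3.0, 4.0], w)          # similar values cluster
checkerboard = morans_i([1.0, -1.0, 1.0, -1.0], w)  # neighbours differ
```

Positive values indicate that similar values tend to be neighbours, while negative values indicate alternation. Judging significance, as the slide does for variant (a), requires a permutation or analytical test that is not shown here.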
Scatter plot and cluster map of: (a) local deprivation of all gminas (MILD-2014), and (b) average level of individual wellbeing / U-index (1036 gminas with at least 10 TUS respondents)
[Figure panels: (a) Local deprivation / MILD-2014 (Moran's I = 0.20); (b) U-index for all activities, in 1036 gminas (Moran's I = 0.10)]
Sources: Okrasa (2017); own elaboration.
Approximation of the 'life satisfaction equation' (e.g., Clark, 2018) using TUS 2013 data (U-index) and LDB (BDL) data (MILD 2014).
A. Subjective wellbeing of residents by U-index and deprivation in the domain of the local labour market, Moran's I = 0.22.
B. Subjective wellbeing of residents by U-index and deprivation in the domain of local social welfare, Moran's I = -0.08.
[Figure panels A and B]
[Figure: Individual wellbeing / U-index and local deprivation in the domain of local social welfare, Masovian (Moran's I = 0.46)]
[Figure: Individual wellbeing / U-index for commuting, i.e., associated with commuting (work and other), and gmina's (local) deprivation in the domain of the local labour market, Masovian (Moran's I = 0.44)]
Source: Okrasa (2017)
Discussion and Summary
• The analysis confirms the assumption about the role of communication in research, including its requirements and tools (metadata, geo-coding, compatibility of measures and information, etc.), in improving the overall quality of the final research outcome:
  • first, by allowing for spatial integration of data from different sources with information on different types of units (community, households, individuals) and for constructing a multi-source multilevel analytical database, and
  • second, by enabling analysis of cross-level variables such as, on the one hand, local community deprivation (MILD / Multidimensional Index of Local Deprivation) and, on the other hand, individual income or subjective wellbeing.
Conclusion. The empirical analysis has shown that it is possible to go beyond the limitations that emerge when data from particular datasets are used separately, and that integrating data generated in different research processes pays off in terms of the quality of the final outcome, if not also the quality of each of the research processes involved. This holds especially if it became a norm in research practice to expect that such a methodological strategy, implied by the spatially integrated research framework, be explicitly considered as an option in every research project under way. Accordingly, a system of needed information and of the ways of transmitting it, constituting the communication process underpinning all activities, from planning data collection to using evidence-based knowledge, would have to be clearly determined, in both its methodological and institutional aspects, and put into operation.
References
Ackoff R. L., 1959. From data to wisdom. Journal of Applied Systems Analysis, 15.
Biemer P. P., Groves R. M., Lyberg L. E., Mathiowetz N. A., Sudman S., 2004. Measurement Errors in Surveys. J. Wiley & Sons, Hoboken, New Jersey.
Burns T. W., O'Connor D. J., Stocklmayer S. M., 2003. Science Communication: A Contemporary Definition. Public Understanding of Science, 12, 183.
Citro C. F., Hanushek E. A. (Eds.), 1991. Improving Information for Social Policy Decisions. The Use of Microsimulation Modeling. Vol. 1. CNSTAT. National Academies Press, Washington, D.C.
D'Orazio M., Di Zio M., Scanu M., 2006. Statistical Matching. Theory and Practice. Wiley.
Desrosières A., 2001. How Real Are Statistics? Four Possible Attitudes. Social Research, Vol. 68, No. 2.
Duncan G. T., Jabine T. B., de Wolfe V. A. (Eds.), 1993. Private Lives and Public Policies. Confidentiality and Accessibility of Government Statistics. Committee on National Statistics. National Academy Press, Washington, D.C.
Espeland W., Stevens M., 2008. A Sociology of Quantification. Arch. European Sociology, XLIX, 3.
Federer W. T., 1991. Statistics and Society: Data Collection and Interpretation, 2nd ed.
Giovannini E., 2008. The role of communication in transforming statistics into knowledge. OECD, paper presented at the conference "Innovative Approaches to Turning Statistics into Knowledge", Stockholm, 26-27 May.
Maggino F., Trapani M. Presenting and communicating statistics: principles, components, and their quality assessment. A proposal.
Nymand-Andersen P., 2017. Preparing a statistics communication strategy. Conference of European Statisticians, Workshop on Statistical Data Dissemination and Communication.
Marschak J., 1974. Information, Decision, and Scientist. In: Cherry C. (Ed.), Pragmatic Aspects of Human Communication. Reidel, Dordrecht.
Okrasa W., 2017. Community Wellbeing, Community Cohesion and Individual Wellbeing - towards a multilevel spatially integrated approach. In: W. Okrasa (Ed.), Quality of Life and Spatial Cohesion: Interaction of Development and Wellbeing in the Local Context. The Cardinal Stefan Wyszynski University Scientific Press, Warsaw.
Okrasa W., 2014. Towards a data-and-policy spatially integrated system in the local context: From evidence-based to policy-driven statistical knowledge. Paper presented at the Conference of European Statistics Stakeholders: Methodologists, Producers and Users of European Statistics, Sapienza University, Rome, November 25-26.
Okrasa W., 2012. Spatially Integrated Social Research and Official Statistics: Methodological Remarks and Empirical Results on Local Development. Comparative Economic Research, Vol. 15, Issue 4, pp. 191-206.
Okrasa W., 1994. Private Lives, Public Policies: Report of the Panel on Confidentiality and Data Access and its Relevance for Designing Information Systems in Central and Eastern Europe. ITEMS - Social Science Research Council, Vol. 48, No. 1.
Okrasa W., Rozkrut D., 2018. The Time Use Data-based Measures of the Wellbeing Effect of Community Development. Paper presented at the Federal Committee on Statistical Methodology Research and Policy Conference, March 7-9, 2018, Washington, DC.
Prewitt K., Schwandt T. A., Straf M. L. (Eds.), 2012. Using Science as Evidence in Public Policy. Committee on the Use of Social Science Knowledge in Public Policy; Center for Education; Division of Behavioral and Social Sciences and Education; National Research Council.
Rasdorf W., 2000. Spatial Data Quality. https://repository.lib.ncsu.edu/bitstream/handle/1840.2/ (10.04.2018).
Sanderson I., 2011. Evidence-based policy or policy-based evidence? Reflections on Scottish experience. The Policy Press. ISSN 1744 2648, pp. 59-76. https://books.google.pl/books?id=EH5LXJEKLFYC&pg (April 2018).
Santos A. S., Gracs Medeiros N., Santos G. R., Filho L. J. Use of Geostatistics on Absolute Positional Accuracy Assessment of Geospatial Data. BCG - Bulletin of Geodetic Sciences, on-line version, ISSN 1982-2170. http://dx.doi.org/10.1590/S1982-21702017000300027.
Särndal C.-E., Lundström S., 2005. Estimation in Surveys with Nonresponse. Wiley.
Shi W., Fisher M., Goodchild M. F., 2002. Spatial Data Quality.
Stimson R. J., 2014. A spatially integrated approach to social science research. In: Stimson R. (Ed.), Handbook of Research Methods and Applications in Spatially Integrated Social Science. Edward Elgar, Northampton, MA, USA.
Szaniawski K., 1974. Two concepts of information. D. Reidel, Dordrecht.
van Oort P. A. J., 2006. Spatial data quality: from description to application. Optima, Rotterdam.
Veregin H., 1999. Data quality parameters. In: P. A. Longley, M. F. Goodchild, D. J. Maguire, and D. W. Rhind (Eds.), Geographical Information Systems (pp. 177-189). New York: John Wiley and Sons.