GeoPKDD Geographic Privacy-aware Knowledge Discovery and Delivery Kick-off meeting Pisa, March 14, 2005
Agenda • 11:30 –12:00 Introduction • 12:00 – 13:00 Pisa • 13:00 – 14:00 Lunch • 14:00 – 15:00 Venezia • 15:00 – 16:00 Cosenza • 16:00 – 18:00 Discussion, Planning
GeoPKDD – general project idea Main Innovations • extracting user-consumable forms of knowledge from large amounts of raw geographic data referenced in space and in time. • knowledge discovery and analysis methods for trajectories of moving objects, which change their position in time, and possibly also their shape or other significant features • devising privacy-preserving methods for data mining from sources that typically contain personal sensitive data.
GeoPKDD – specific goals • new models for moving objects, and data warehouse methods to store their trajectories, • new knowledge discovery and analysis methods for moving objects and trajectories, • new techniques to make such methods privacy-preserving, • new techniques to extend such methods to distributed data coming in continuous streams; • new techniques for reasoning on spatio-temporal knowledge and on background knowledge.
GeoPKDD – applications • Geographic information coming from mobile devices is expected to enable novel classes of applications. • In these applications, privacy is a concern. • In particular, how can mobile trajectories be stored and analyzed without infringing personal privacy rights and expectations?
Possible Application Scenario • Data source: log data from mobile phones, tracking the movements of users across cells • Entering a cell – e.g. (UserID, time, IDcell, in) • Exiting a cell – e.g. (UserID, time, IDcell, out) • Movements inside the cell? e.g. (UserID, time, X, Y, IDcell) • Trajectory reconstruction • Knowledge extraction techniques – emphasis on privacy • Description of models – local vs. global
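The trajectory reconstruction step can be sketched as follows, assuming only the in/out records above, integer timestamps and a user identifier present in every record. `CellEvent`, `CellVisit` and `reconstruct_trajectories` are hypothetical names chosen for illustration, and movements inside the cell are not handled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CellEvent:
    user_id: str
    time: int       # assumed integer timestamp
    cell_id: str
    direction: str  # "in" or "out"


@dataclass(frozen=True)
class CellVisit:
    cell_id: str
    t_in: int
    t_out: Optional[int]  # None if no matching "out" event was logged


def reconstruct_trajectories(events):
    """Group events by user and pair "in"/"out" events of the same cell into visits."""
    by_user = defaultdict(list)
    for ev in events:
        by_user[ev.user_id].append(ev)

    trajectories = {}
    for user, evs in by_user.items():
        evs.sort(key=lambda e: e.time)  # stable sort keeps log order for equal times
        open_visits = {}  # cell_id -> entry time of a visit not yet closed
        visits = []
        for ev in evs:
            if ev.direction == "in":
                open_visits[ev.cell_id] = ev.time
            elif ev.direction == "out" and ev.cell_id in open_visits:
                visits.append(CellVisit(ev.cell_id, open_visits.pop(ev.cell_id), ev.time))
        # Visits still open at the end of the log have an unknown exit time.
        visits.extend(CellVisit(c, t, None) for c, t in open_visits.items())
        trajectories[user] = sorted(visits, key=lambda v: v.t_in)
    return trajectories
```

Ordering the visits by entry time yields, for each user, a sequence of time-stamped cells.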
Possible Application Scenarios Three possible scenarios to exploit the extracted knowledge: • Towards the system: adaptive band allocation to cells • Towards the society: dynamic traffic monitoring and management for sustainable mobility, urban planning ... • Towards the individual: personalization of location-based services, car traffic reports, traffic information and predictions
Reconstructing trajectories – Scenario 1 [Figure: time-stamped positions t1–t13] • The log entries carry no ID, so they become anonymous time-stamped events. • We can extract aggregated information on traffic flow, but not individual trajectories.
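Without IDs, only aggregates such as per-cell entry counts can be computed. A minimal sketch, assuming anonymous `(time, cell_id, direction)` tuples and fixed-length time slots; `traffic_flow` and `slot_length` are illustrative names.

```python
from collections import Counter


def traffic_flow(events, slot_length):
    """Count cell entries per (cell, time slot) from anonymous log events.

    events: iterable of (time, cell_id, direction) tuples with no user ID.
    slot_length: width of a time slot, in the same unit as time.
    """
    flow = Counter()
    for time, cell_id, direction in events:
        if direction == "in":
            flow[(cell_id, time // slot_length)] += 1
    return flow
```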
Reconstructing trajectories – Scenario 2 [Figure: time-stamped positions t1–t13] • The log entries carry (encrypted) IDs, so they can be grouped by ID to obtain sequences of time-stamped cells. • We can extract individual trajectories with the spatial granularity of a cell: the positions of t5 and t8 can be distinguished, but not those of t5 and t13.
Reconstructing trajectories – Scenario 3 [Figure: time-stamped positions t1–t13] • The log entries carry IDs and an (approximate) position within the cell. • We can extract individual trajectories with a finer spatial granularity: now the positions of t5 and t13 can also be distinguished.
Which patterns on “trajectories”: Clustering • Group together similar trajectories • For each group, produce a summary (a cell)
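One simple way to group similar trajectories, given as an illustrative stand-in rather than the clustering method the project will develop: trajectories are treated as sequences of cell IDs, compared with the Jaccard distance between their sets of visited cells (which ignores order), and grouped by greedy one-pass "leader" clustering. As an assumption, the summary of a group is the set of cells visited by more than half of its members. All function names and the threshold are hypothetical.

```python
from collections import Counter


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| over the sets of visited cells."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def leader_clustering(trajectories, threshold):
    """Greedy one pass: join the first cluster whose leader is within threshold."""
    clusters = []  # list of (leader, members)
    for t in trajectories:
        for leader, members in clusters:
            if jaccard_distance(leader, t) <= threshold:
                members.append(t)
                break
        else:
            clusters.append((t, [t]))
    return [members for _, members in clusters]


def summarize(members):
    """Summary of a group: cells visited by more than half of its trajectories."""
    counts = Counter(cell for t in members for cell in set(t))
    return {cell for cell, c in counts.items() if 2 * c > len(members)}
```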
Which patterns on “trajectories”: Frequent patterns • Discover frequently followed (sub)paths
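A minimal sketch of frequent (sub)path discovery, assuming trajectories are sequences of cell IDs and a (sub)path is a contiguous run of at least two cells; the support of a subpath is the number of trajectories that contain it at least once. `frequent_subpaths`, `min_support` and `max_len` are illustrative names, and richer pattern languages (e.g. with gaps or transition times) are not covered.

```python
from collections import Counter


def frequent_subpaths(trajectories, min_support, max_len=3):
    """Contiguous subpaths (length 2..max_len) contained in >= min_support trajectories."""
    support = Counter()
    for t in trajectories:
        seen = set()
        for n in range(2, max_len + 1):
            for i in range(len(t) - n + 1):
                seen.add(tuple(t[i:i + n]))
        support.update(seen)  # each trajectory counts at most once per subpath
    return {p: s for p, s in support.items() if s >= min_support}
```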
Which patterns on “trajectories”: Classification [Figure with percentages 20%, 5%, 7%, 60%, 8% and a “?”] • Extract behaviour rules from history • Use them to predict the behaviour of future users
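As a simple stand-in for the classification step (a first-order transition model, not a rule-based classifier), the sketch below learns from historical cell sequences how often each cell is followed by each other cell, and predicts a new user's most likely next cell together with its relative frequency. It assumes only the current cell matters; `learn_transitions` and `predict_next` are hypothetical names.

```python
from collections import Counter, defaultdict


def learn_transitions(trajectories):
    """For each cell, count which cell was visited next in the historical trajectories."""
    nxt = defaultdict(Counter)
    for t in trajectories:
        for a, b in zip(t, t[1:]):
            nxt[a][b] += 1
    return nxt


def predict_next(nxt, cell):
    """Return (most likely next cell, relative frequency), or None if cell has no history."""
    counts = nxt.get(cell)  # .get does not create an entry in the defaultdict
    if not counts:
        return None
    best, c = counts.most_common(1)[0]
    return best, c / sum(counts.values())
```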
Privacy in GeoPKDD • ... is a technical issue, besides an ethical, social and legal one, in the specific context of spatio-temporal data mining (ST-DM) • How to formalize privacy constraints over ST data and ST patterns? • E.g., a cardinality threshold on clusters of individual trajectories • How to transform data to meet privacy constraints? • How to design DM algorithms that, by construction, only yield patterns that meet the privacy constraints?
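The cardinality-threshold example can be made concrete with a sketch that publishes a pattern for a cluster only when it groups at least `k` trajectories, and then releases only an aggregate summary and a count, never the trajectories themselves. It assumes one trajectory per individual (otherwise distinct users should be counted) and, as a hypothetical choice, uses the cells shared by all members as the summary; `ReleasedPattern`, `release_clusters` and `k` are illustrative names. A threshold of this kind is one possible privacy constraint, not a complete guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleasedPattern:
    summary: frozenset  # cells shared by every trajectory in the cluster
    support: int        # number of trajectories (assumed: individuals) in the cluster


def release_clusters(clusters, k):
    """Publish one aggregate pattern per cluster with at least k members; suppress the rest."""
    if k < 1:
        raise ValueError("k must be at least 1")
    released = []
    for members in clusters:
        if len(members) >= k:
            shared = frozenset.intersection(*(frozenset(t) for t in members))
            released.append(ReleasedPattern(shared, len(members)))
    return released
```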
GeoPKDD Spatio-temporal patterns
Why emphasis on privacy? • More, better, and new data being gathered, more likely to be sensitive • Increased vulnerability from correlation • Data becoming more accessible • Increased opportunity for misuse • Need to restrict access to data (patterns) to prevent misuse • On the other hand, added data bring new opportunities • Public utility, new markets/paradigms, new services • Need to maintain privacy without giving up opportunities
GeoPKDD technologies • Spatio-temporal models for moving objects • Trajectory warehouses • Spatio-temporal data mining methods and data mining query languages • Privacy-preserving data mining • Distributed and stream data mining • Spatio-temporal reasoning
GeoPKDD workpackages • (WP1) Privacy-aware trajectory warehouse • (WP2) Privacy-aware spatio-temporal data mining methods • (WP3) Geographic knowledge interpretation and delivery • (WP4) Harmonization, integration and applications
WP1: Privacy-aware trajectory warehouse • Tasks: • a trajectory model able to represent moving objects, and to support multiple representations, multiple granularities both in space and in time, and uncertainty; • a trajectory data warehouse and associated OLAP mechanisms, able to deal with multi-dimensional trajectory data; • support for continuous data streams.
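Support for multiple granularities can be illustrated by coarsening raw `(t, x, y)` samples to a chosen time step and spatial resolution. This is a minimal sketch assuming planar coordinates and a square grid as a stand-in for real (irregular) network cells; `coarsen`, `cell_size` and `time_bin` are hypothetical names, and uncertainty is not modelled.

```python
import math


def coarsen(points, cell_size, time_bin):
    """Map (t, x, y) samples to (time bin, grid cell) at the chosen granularity,
    dropping consecutive duplicates so the result remains a sequence of moves."""
    if cell_size <= 0 or time_bin <= 0:
        raise ValueError("granularities must be positive")
    out = []
    for t, x, y in points:
        key = (math.floor(t / time_bin),
               (math.floor(x / cell_size), math.floor(y / cell_size)))
        if not out or out[-1] != key:
            out.append(key)
    return out
```

The same raw trajectory can thus be stored once and viewed at several resolutions, with coarser settings merging nearby samples.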
WP2: Privacy-aware spatio-temporal data mining • Task: algorithms for spatio-temporal data mining, specifically meant to extract spatio-temporal patterns from trajectories of moving objects, equipped with: • methods for provably and measurably protecting privacy in the extracted patterns; • mechanisms to express constraints and queries into a data mining query language, in which the data mining tasks can be formulated; • distributed and streaming versions.
WP3: Geographic knowledge interpretation and delivery • Task: interpretation of the extracted spatio-temporal patterns, by means of ST reasoning mechanisms • Issues • uncertainty • georeferenced visualization methods for trajectories and spatio-temporal patterns
WP4: Harmonization, Integration and Applications • Tasks: • Harmonization with national privacy regulations and authorities – privacy observatory • Integration of the achieved results into a coherent framework to support the GeoPKDD process • Demonstrators for some selected applications: for public authorities, network operators and/or marketing operators, e.g., in sustainable mobility, network optimization, geomarketing.
Deliverables of Phase 1 (months 1–5) • WP1: Privacy-aware trajectory warehouse • [TR1.1] Alignment report and preliminary specification of requirements. • WP2: Privacy-aware spatio-temporal data mining • [TR1.2] Alignment report on ST data mining techniques. • [TR1.3] Alignment report on privacy-preserving data mining techniques. • [TR1.4] Alignment report on distributed data mining. • WP3: Geographic knowledge interpretation and delivery • [TR1.5] Alignment report on ST reasoning techniques. • WP4: Harmonization, Integration and Applications • [TR1.6] Report on characterization of GeoPKDD applications and preliminary feasibility study. • [A1.7] Set-up of the Privacy Regulation Observatory.
Deliverables of Phase 2 (months 6–17) • WP1: Privacy-aware trajectory warehouse • [TR2.1] TR on design of the trajectory warehouse. • [P2.2] Prototype of the trajectory warehouse. • WP2: Privacy-aware spatio-temporal data mining • [TR2.3] TR on new techniques for ST and trajectory Data Mining. • [TR2.4] TR on new privacy-preserving ST Data Mining. • [TR2.5] TR on distributed data mining. • [P2.6] Prototype(s) of privacy-aware ST data mining methods. • WP3: Geographic knowledge interpretation and delivery • [TR2.7] TR on ST reasoning techniques and DMQL for geographic knowledge interpretation and delivery. • [P2.8] Prototype(s) of the ST reasoning formalism and DMQL. • WP4: Harmonization, Integration and Applications • [TR2.9] Requirements of the application demonstrator(s).
Deliverables of Phase 3 (months 18–24) • WP4: Harmonization, Integration and Applications • [TR3.1] TR on the design of a system prototype allowing the application of privacy-preserving data mining tools to spatio-temporal and trajectory data. • [P3.2] Prototype implementing the system described in the technical report [TR3.1]. • [P3.3] Prototype extending the system prototype [P3.2] to work on a distributed system. • [TR3.4] TR on the description of the prototypes developed and the results of the experimentation. • [TR3.5] Final report on harmonization actions and mutual impact between privacy regulations and project results.
Pisa: objectives • spatial and spatio-temporal privacy-preserving data mining, with particular focus on • clustering, • constraint-based frequent pattern mining, • spatial classification; • spatio-temporal logical formalisms to reason on extracted patterns and background knowledge.
Venezia (+ Milano): objectives • trajectory model and privacy-preserving data warehouse, within a streamed and distributed context • methods to mine sequential and non sequential frequent patterns from trajectories, within a streamed and distributed context • postprocessing and interpretation of the extracted spatio-temporal patterns
Cosenza: objectives • Trajectory mining • Clustering • Privacy-preserving data mining • Probabilistic approach • Distributed data mining
Transversal activities • experiments, • application demonstrators, • harmonization with privacy regulations and authorities, • dissemination of results.