CENTRE: Cellular Network’s Positioning Data Generator • Fosca Giannotti, KDD-Lab • Andrea Mazzoni, KDD-Lab • Puntoni Simone, KDD-Lab • Chiara Renso, KDD-Lab
Why generate data? • Real data is hard to find • Due to the reticence of ICT companies • …and for legal and privacy reasons • We need ad-hoc datasets • To improve algorithm development • To have a tool for the validation and testing phases
CENTRE: • CEllular Network Trajectory Reconstruction Environment: • An environment for generating positioning data (LOGs), aimed at mobile technology • Developed as a tool of the GeoPKDD project
GeoPKDD: Geographic Privacy-Aware Knowledge Discovery & Delivery
The Idea • To generate positional mobile data (LOGs) by simulating the events that derive from: • The trajectories of hypothetical mobile network users travelling across a territory • The detection of these movements by a synthetic, ad-hoc GSM coverage (the set of BTSs) • We can then analyze the set of LOGs and reconstruct the trajectories of the mobile network users
Motivation • With this model we aim for: • More rigorous and realistic semantics for the generated data • The possibility of comparing the synthetic trajectories with the reconstructed ones • The chance to validate the results of mining and knowledge discovery algorithms against the synthetic trajectories
What CENTRE does… • First of all, we generate a sequence of spatio-temporal points representing a trajectory. We can customize: • Starting point • Velocity • Agility • Direction • Groups of behavior • Infrastructures, etc. • Then we overlay a set of antennas, represented by the circles of their coverage areas
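The first step above (producing a sequence of spatio-temporal points) can be sketched as follows. This is only an illustration, not the CENTRE engine: the function name `generate_trajectory`, its parameters, and the reading of "agility" as the maximum change of heading per step are all assumptions made for this example.

```python
import math
import random
from typing import List, Tuple

# One spatio-temporal point: (time, x, y). Units are left abstract here.
Point = Tuple[float, float, float]


def generate_trajectory(
    start: Tuple[float, float],
    speed: float,
    heading: float,
    agility: float,
    steps: int,
    dt: float = 1.0,
    seed: int = 0,
) -> List[Point]:
    """Hypothetical generator: a walk whose heading drifts at every step.

    `agility` is ASSUMED to mean the maximum heading change (radians)
    per step; the slides do not define the parameter precisely.
    """
    rng = random.Random(seed)
    x, y = start
    t = 0.0
    points: List[Point] = [(t, x, y)]
    for _ in range(steps):
        # Perturb the direction, then advance at constant speed.
        heading += rng.uniform(-agility, agility)
        x += speed * dt * math.cos(heading)
        y += speed * dt * math.sin(heading)
        t += dt
        points.append((t, x, y))
    return points
```

With `agility=0.0` the object moves in a straight line along `heading`; larger values give more erratic paths.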
LOG extraction • A LOG is represented by a tuple: (Obj_ID, BTS_ID, TimeStamp, d) • Where: • Obj_ID is the identifier of the observed object • BTS_ID is the identifier of the antenna that made the detection • TimeStamp is the time of the detection • d is an estimate of the distance from the object to the center of the BTS • Result of extraction: • LOG at time tt2 (P2) • {Cell1, BTS1, tt2, d12} • LOG at time tt3 (P3) • {Cell1, BTS1, tt3, d13}, • {Cell1, BTS2, tt3, d23}, • {Cell1, BTS3, tt3, d33} • LOG at time tt4 (P4) • {Cell1, BTS2, tt4, d24}
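A minimal sketch of how such tuples could be extracted, assuming each antenna is a circle `(center_x, center_y, radius)` and that `d` is simply the Euclidean distance to the antenna center. The names `Log` and `extract_logs` and the antenna representation are hypothetical, chosen for this example only.

```python
import math
from typing import Dict, List, NamedTuple, Tuple


class Log(NamedTuple):
    obj_id: str       # identifier of the observed object
    bts_id: str       # identifier of the detecting antenna
    timestamp: float  # time of the detection
    d: float          # distance from the object to the antenna center


def extract_logs(
    obj_id: str,
    trajectory: List[Tuple[float, float, float]],      # (t, x, y) points
    antennas: Dict[str, Tuple[float, float, float]],   # id -> (cx, cy, radius)
) -> List[Log]:
    """Emit one LOG for every (point, antenna) pair where the point is covered."""
    logs: List[Log] = []
    for t, x, y in trajectory:
        for bts_id, (cx, cy, radius) in antennas.items():
            dist = math.hypot(x - cx, y - cy)
            if dist <= radius:  # the point lies inside the coverage circle
                logs.append(Log(obj_id, bts_id, t, dist))
    return logs
```

A point covered by several antennas yields several LOGs with the same timestamp, and a point outside every coverage circle yields none.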
Trajectory reconstruction • Once the LOGs are produced and stored, we set the synthetic trajectories aside and try to reconstruct them using only: • The LOG collection • The synthetic coverage
Information types • Reconstruction is performed by considering all the LOGs produced at a single time instant for a single trajectory • The number of LOGs with the same time and the same device identifier (id_cell) is the number of simultaneous detections • (Figure labels: 3 LOGs, 1 LOG, 2 LOGs)
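Grouping LOGs by device and time instant could look like the sketch below. The function name `group_logs` is hypothetical, and the numeric distances in the usage note are placeholders, not values from the presentation.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# A LOG as the tuple (obj_id, bts_id, timestamp, d).
LogTuple = Tuple[str, str, Hashable, float]


def group_logs(
    logs: Iterable[LogTuple],
) -> Dict[Tuple[str, Hashable], List[Tuple[str, float]]]:
    """Collect, for each (device, time) pair, the antennas that detected it."""
    groups: Dict[Tuple[str, Hashable], List[Tuple[str, float]]] = defaultdict(list)
    for obj_id, bts_id, timestamp, d in logs:
        groups[(obj_id, timestamp)].append((bts_id, d))
    return dict(groups)
```

For example, `group_logs([("Cell1", "BTS1", "tt3", 1.0), ("Cell1", "BTS2", "tt3", 2.0)])` gives a single group for `("Cell1", "tt3")` holding two detections; the length of each group is the number of simultaneous detections.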
Reconstruction method • When we have: • Only one detection: the point may be anywhere inside the area covered by the antenna, so we take the antenna center as the point position • Two or more detections: the point can only be inside the intersection of the coverage areas, so we take the centroid of this intersection as the point position
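The two rules above can be sketched as follows. The slides do not say how the centroid of the intersection is computed; this example ASSUMES a simple grid approximation, and the name `reconstruct_point`, the `grid` parameter, and the antenna representation `(cx, cy, radius)` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Circle = Tuple[float, float, float]  # (center_x, center_y, radius)


def reconstruct_point(
    detections: List[str],
    antennas: Dict[str, Circle],
    grid: int = 200,
) -> Optional[Tuple[float, float]]:
    """Estimate a position from the antennas that detected an object at one instant."""
    circles = [antennas[bts_id] for bts_id in detections]
    if len(circles) == 1:
        # Single detection: fall back to the antenna center.
        cx, cy, _ = circles[0]
        return (cx, cy)
    # The intersection of the circles lies inside the intersection
    # of their bounding boxes, so only that box needs to be sampled.
    x_min = max(cx - r for cx, _, r in circles)
    x_max = min(cx + r for cx, _, r in circles)
    y_min = max(cy - r for _, cy, r in circles)
    y_max = min(cy + r for _, cy, r in circles)
    if x_min > x_max or y_min > y_max:
        return None  # the coverage areas do not overlap
    sum_x = sum_y = 0.0
    count = 0
    for i in range(grid):
        px = x_min + (i + 0.5) * (x_max - x_min) / grid
        for j in range(grid):
            py = y_min + (j + 0.5) * (y_max - y_min) / grid
            # Keep the sample only if it lies inside every circle.
            if all((px - cx) ** 2 + (py - cy) ** 2 <= r * r for cx, cy, r in circles):
                sum_x += px
                sum_y += py
                count += 1
    if count == 0:
        return None
    # Average of the samples inside the intersection approximates its centroid.
    return (sum_x / count, sum_y / count)
```

A finer `grid` gives a closer approximation at a quadratic cost; an exact geometric computation of the intersection region would be the alternative.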
What we are working on now… • Making new extensions to the main generation engine • In order to test and validate spatial KD algorithms more efficiently and accurately • Changing the old code (which was derived from the GSTD code) • Introducing improvements to the class structure • Introducing new data characterizations, especially for spatial and temporal aspects
Multiple generation engines • The idea is to develop an extension to the main engine every time we need new features to test and validate KD algorithms • And to use, each time, the implementation of the synthetic trajectory production engine that best suits the type of data we need to obtain
Density based clustering • We have seen that, to get the best results with this algorithm, it is useful to have a simple method to: • create clusters and • identify the relation between objects and clusters
Attraction engine • For this particular type of algorithm we are developing a new engine extension that uses an attraction-like mechanism • Each object chooses and tries to reach its next attraction area • When it reaches its destination area, it chooses another one, and so on…
Cluster construction • A cluster is formed by a set of objects that are forced to pass through a sequence of areas.
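One way to picture this is that every object of a cluster shares the same ordered list of areas but picks its own point inside each one. The sketch below assumes rectangular areas `(x_min, y_min, x_max, y_max)`; the function name `cluster_waypoints` and this representation are hypothetical and not taken from the CENTRE code.

```python
import random
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def cluster_waypoints(
    object_ids: List[str],
    area_sequence: List[Rect],
    seed: int = 0,
) -> Dict[str, List[Tuple[float, float]]]:
    """Give every object of a cluster one waypoint inside each shared area."""
    rng = random.Random(seed)
    plan: Dict[str, List[Tuple[float, float]]] = {}
    for obj_id in object_ids:
        # Same areas in the same order for all objects, different points.
        plan[obj_id] = [
            (rng.uniform(x0, x1), rng.uniform(y0, y1))
            for (x0, y0, x1, y1) in area_sequence
        ]
    return plan
```

Because the area sequence is stored with the cluster, the relation between each object and its cluster is known by construction, which is what a clustering result can later be checked against.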
…a simple example • In this scenario we see one object that repeatedly chooses a region in completely random order • Having chosen a region and a point in it, the object tries to reach this point • …and so on
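The scenario above can be sketched as a small simulation loop. This is an illustration under assumptions, not the engine itself: regions are taken to be rectangles `(x_min, y_min, x_max, y_max)`, the object moves in a straight line at constant speed, and the name `simulate_attraction` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def simulate_attraction(
    start: Tuple[float, float],
    regions: List[Rect],
    speed: float,
    steps: int,
    dt: float = 1.0,
    seed: int = 0,
) -> List[Tuple[float, float, float]]:
    """Move one object toward randomly chosen points in randomly chosen regions."""
    rng = random.Random(seed)

    def pick_target() -> Tuple[float, float]:
        # Choose a region at random, then a random point inside it.
        x0, y0, x1, y1 = rng.choice(regions)
        return (rng.uniform(x0, x1), rng.uniform(y0, y1))

    x, y = start
    target = pick_target()
    path = [(0.0, x, y)]
    for k in range(1, steps + 1):
        dx, dy = target[0] - x, target[1] - y
        dist = math.hypot(dx, dy)
        step = speed * dt
        if dist <= step:
            # Destination reached: stop on it and choose the next one.
            x, y = target
            target = pick_target()
        else:
            # Advance one step along the straight line to the target.
            x += step * dx / dist
            y += step * dy / dist
        path.append((k * dt, x, y))
    return path
```

The returned list of `(t, x, y)` points has the same shape as any other generated trajectory, so it could feed the same LOG extraction step.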
Other improvements • Formalization of some concepts (at code level): • Spatio-temporal data • Spatio-temporal object • Trajectory • And real units of measure for data values: • Positions are expressed in meters • Velocities are expressed in meters per second • Times are expressed in seconds
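As an illustration of what such a formalization might look like, the sketch below defines a point and a trajectory with the units stated above. The class names, fields, and the `average_speed` helper are hypothetical; the actual CENTRE class structure is not shown in the slides.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SpatioTemporalPoint:
    t: float  # time, in seconds
    x: float  # position, in meters
    y: float  # position, in meters


@dataclass
class Trajectory:
    object_id: str
    points: List[SpatioTemporalPoint] = field(default_factory=list)

    def average_speed(self) -> float:
        """Path length divided by elapsed time, in meters per second."""
        if len(self.points) < 2:
            return 0.0
        length = sum(
            math.hypot(b.x - a.x, b.y - a.y)
            for a, b in zip(self.points, self.points[1:])
        )
        duration = self.points[-1].t - self.points[0].t
        return length / duration if duration > 0 else 0.0
```

Fixing the units in one place means a derived quantity such as `average_speed` comes out directly in meters per second, with no per-dataset conversion.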
Conclusions • Work is currently in progress, and we hope to test a density-based algorithm on this new generation engine as soon as possible • At the same time we are also working on an engine for testing temporal and sequential frequent pattern algorithms • And also on making the generator easier to use, by simplifying the number and form of its parameters, adding a graphical interface, etc.