100 likes | 191 Views
A Clustering method for network visualization and monitoring KDD4Service, san diego, 2011. Perikles Rammos Ericsson oss research Yangcheng huang Ericsson pm systems. Why clustering?.
E N D
A Clustering method for network visualization and monitoringKDD4Service, san diego, 2011 Perikles Rammos Ericsson oss research Yangcheng huang Ericsson pm systems
Why clustering? • Problem: Telecommunications networks produce too much data to be easily digested and manipulated by a human user. • Solution: Reduce the number of objects shown on-screen without greatly reducing the conveyed information and usability. • Prerequisites: • Large datasets • Real-time processing • Dynamic nature of the network • Instant interactivity (e.g. levels of zoom) • Some trade-offs in visualization accuracy are tolerated. 50 billion !!
ε = 1 MinPoints=5 ε = 3 MinPoints = 5 ε = 1 MinPoints=20 ε = 3 MinPoints=20 Clustering algorithms • K-means • Fast algorithm • Need for prior knowledge of… • Number • Initial location …of cluster centroids • Sensitivity to initial choices • Only convex-shaped clusters are detectable • DBSCAN • Concave-shape detection • Can handle noise • May not respond well to datasets with varying density Figures from Tan,Steinbach,Kumar, Introduction to Data Mining(online slides) http://www-users.cs.umn.edu/~kumar/dmbook
Density histogram • A grid of Cells is overlaid on the area of interest. • Resolutions Xres,Yres are automatically calculated. • For every point {x,y} in the dataset • Physical coordinates {x,y} are converted to grid coordinates {i,j} • Point is assigned at Cell {i,j}. (population increased by 1) • “Topographical” features will have a decisive role. • O(n) time • Density Histogram replaces the Dataset in further calculations, greatly boosting performance, since it contains much less elements to iterate on.
Cell hierarchies • Each Cell is compared with it’s neighboring Cells. • Looking for a Denser (more populated) Cell… • If such a Cell is found, the central Cell will connect with the larger Cell, becoming it’s ‘Child’. • If no such Cell is found, this Cell is labeled as a ‘Maximum’. • A tree-like hierarchy of cells is thus formed. • Density Histogram is now comprised of several distinct trees. • A Maximum is associated with each tree, occupying it’s root. • Each of those trees will be a Cluster! • Cells are self-organized into hierarchies. in an expanding neighborhood. Clusters emerge naturally.
trees – maxima - clusters • Parameter: Maximum Allowed Neighborhood • area of “tolerance” • self-adjustment optimizes clustering • Small Clusters are aggregated in hierarchies. 11x11 3x3
Performance results • Qualitative • Clusters of arbitrary shape can be detected. • Cell hierarchy can make visualization more intuitive. • Resistance to sparse noise, by rejecting isolated maxima. • Quantitative • Linear time O(n) • Comparison with k-means • maxima as initial centroids • greatly outperforms k-means
Conclusions & future work • Suitable for real-time visualization of large-scale telecommunication networks due to: • Qualitative properties • Quantitative properties • Future plans: • Higher dimensions : 3D, 4D • Dynamic Clustering • Always room for improvement • Author’s contacts perikles.rammos@ericsson.com yangcheng.huang@ericsson.com
Questions? • Experimental Setup • Datasets: Random bounding boxes out of a synthesized, realistic, widely varied dataset, containing 273,000 points. • 3000 random bounding boxes for O(n) testing • 500 random bounding boxes for k-means comparison • Hardware: Laptop PC, 2x1.83Ghz CPU, 4 GB RAM • Software: Windows Vista SP2, Visual Studio 2008 C# • K-means • maxima as initial centroids : Very compatible with k-means, since a large percentage of the points will be very close to the initial centroids • Euclidean distance • Iterations: avg=8.72, sd=5.93 • Clusters: avg=27.3, sd=18.2 • Maximum Allowed Neighborhood : 11x11 • Why isn’t a Large Neighborhood applied directly? • Performance: When a Denser cell is found nearby, no need to check further away. • Better Resolution: Detection of Irregular Shapes & Distributions • Xres,Yres Resolutions calculation • Preserving aspect ratio of bounding box • Ceiling at 100x100 cells