VoIP Data, IIIT Allahabad • Margaret H. Dunham, Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas 75275, USA • mhd@lyle.smu.edu • Support provided by Fulbright Grant and IIIT Allahabad
VoIP Data Outline • VoIP overview • CDR • CDR Example using EMM
VoIP Overview http://www.voipmechanic.com/what-is-voip.htm
VoIP Advantages • Travel • Cost reduction • Additional Features: Voice messages, call forwarding, logs, caller ID, … • Integration of business tools • Common network infrastructure
VoIP Disadvantages • Need reliable broadband internet connection • Voice quality
Telephone-VoIP Steps • Analog Telephone Adapter (ATA) converts the analog phone call to a digital signal. • The digital signal is sent over the internet as data packets. • At the receiving end it is converted back to an analog signal.
VoIP Codec • Software on server or ATA that converts voice signal into digital data. • COmpressor – DECompressor • COder – DECoder • Sample (8000, 24000, 32000 times per second) • Sort • Compress • Packetize
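The codec steps above can be illustrated with a minimal Python sketch. It assumes an 8000 Hz sampling rate (one of the rates listed), a 440 Hz test tone standing in for the voice signal, a simplified mu-law-style companding to 8 bits as the compression step, and 20 ms packets. The function names, the packet duration, and the quantization are illustrative assumptions, not the behavior of any particular codec, and the "Sort" step is not modeled.

```python
import math

SAMPLE_RATE = 8000   # samples per second (one of the rates listed above)
PACKET_MS = 20       # assumed packet duration in milliseconds
MU = 255             # mu-law companding constant

def sample_tone(freq_hz, duration_ms, rate=SAMPLE_RATE):
    """Sample a sine tone (a stand-in for the analog voice signal) in [-1, 1]."""
    n = rate * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

def compress(sample, mu=MU):
    """Simplified mu-law companding of one sample, quantized to 0..255."""
    magnitude = math.log1p(mu * abs(sample)) / math.log1p(mu)
    signed = math.copysign(magnitude, sample)
    return int(round((signed + 1) / 2 * 255))

def packetize(codes, rate=SAMPLE_RATE, packet_ms=PACKET_MS):
    """Split the encoded samples into fixed-duration packets of bytes."""
    per_packet = rate * packet_ms // 1000
    return [bytes(codes[i:i + per_packet]) for i in range(0, len(codes), per_packet)]

samples = sample_tone(440, 100)          # 100 ms of audio
codes = [compress(s) for s in samples]   # one byte per sample
packets = packetize(codes)               # 20 ms of audio per packet
print(len(samples), len(packets), len(packets[0]))
```

At 8000 samples per second and one byte per sample, each 20 ms packet in this sketch carries 160 bytes of payload before any protocol headers are added.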
Protocols • SIP (Session Initiation Protocol) • Signaling to set up and tear down sessions. • SDP (Session Description Protocol) • Describes the call • RTP (Real-time Transport Protocol) • Exchange data/voice packets • Media transport to transmit packets
SIP • Setup • Connect • Disconnect • Syntax similar to HTTP • Bind to IP address using SIP registration • URLs for address format: mhd@lyle.smu.edu • Independent of application or data types • Uses RTP and SDP
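To illustrate the point that SIP syntax resembles HTTP, the following Python sketch parses a hand-written REGISTER request into its request line and headers. The message is an assumption made up for illustration (the example.com addresses, tag, branch, and Call-ID values are not from the slides), and `parse_sip_request` is a hypothetical helper, not part of any SIP library. It ignores message bodies and keeps only the last value of a repeated header.

```python
def parse_sip_request(text):
    """Parse the request line and headers of a SIP request (no body handling)."""
    head, _, _body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "uri": uri, "version": version, "headers": headers}

# Hand-written example message; addresses and identifier values are made up.
message = (
    "REGISTER sip:example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP client.example.com:5060;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "To: <sip:alice@example.com>\r\n"
    "From: <sip:alice@example.com>;tag=456248\r\n"
    "Call-ID: 843817637684230@client.example.com\r\n"
    "CSeq: 1 REGISTER\r\n"
    "Contact: <sip:alice@192.0.2.4>\r\n"
    "Expires: 3600\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

request = parse_sip_request(message)
print(request["method"], request["uri"], request["headers"]["contact"])
```

The Contact header is what binds the user's address to a reachable IP address during registration, which corresponds to the "Bind to IP address using SIP registration" bullet.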
SIP Overview http://www.voipmechanic.com/sip-basics.htm
VoIP Data • Any of this digital data could be saved and analyzed. • Typically only statistical/summary information about the calls is saved. • These Call Detail Records (CDRs) are used for billing and analysis.
Call Detail Record • Log of VoIP usage • May be by account • Typical attributes: • Source • Destination • Duration of call • Amount billed • Total usage time in billing period • Remaining time in billing period • Total charge in billing period • The format of the CDR varies among VoIP providers or programs. Some programs allow CDRs to be configured by the user.
CDR Generation [3] • Usually created through a special Authentication, Authorization, and Accounting (AAA) server. • May also be created by logging capabilities at a gateway or router using syslog server software. • Normally in simple CSV format. • Normally uses UDP, so the underlying data packets are not sequenced and may be lost. (Redundancy of servers can help.) • Timestamps between routers can be synchronized using the Network Time Protocol (NTP). • A CDR is generated for both the forward and return leg of a call. • http://www.cisco.com/en/US/tech/tk1077/technologies_tech_note09186a0080094e72.shtml
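Since CDRs are normally plain CSV, reading them takes only a few lines. The sketch below is a minimal Python example under stated assumptions: the column names (`source`, `destination`, `start_time`, `duration_s`), the timestamp format, and the three sample rows are hypothetical, because the actual layout varies among providers and programs.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical CDR export; real column names and formats depend on the provider.
RAW = """source,destination,start_time,duration_s
2145550101,2145550199,2003-09-22 12:17:32,65
2145550101,9725550123,2003-09-22 12:20:05,0
2145550102,2145550101,2003-09-22 12:31:40,310
"""

def load_cdrs(text):
    """Parse CSV text into a list of typed CDR dictionaries."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "source": row["source"],
            "destination": row["destination"],
            "start": datetime.strptime(row["start_time"], "%Y-%m-%d %H:%M:%S"),
            "duration_s": int(row["duration_s"]),
        })
    return records

def usage_by_source(records):
    """Total call duration in seconds per source, as a billing-style summary."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["source"]] += rec["duration_s"]
    return dict(totals)

cdrs = load_cdrs(RAW)
print(usage_by_source(cdrs))
```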
Example: CISCO CDR Data • VoIP traffic in their Richardson, Texas facility from Mon Sep 22 12:17:32 2003 to Mon Nov 17 11:29:11 2003. • Over 1.5 million call trials were logged • 272,646 connected calls • 66 attributes including source, destination, starting time, duration, routing/switching, device, etc • Application: Anomaly Detection (Classification) • Goal: Find unusual call patterns based on type and time of call • Technique: New data structure, New classification algorithm, New visualization technique • Sample of raw csv data: http://lyle.smu.edu/~mhd/iiit/start.csv
CISCO Preprocessing • Remove all attributes other than source, destination, starting time, and duration from the logs. • Count the connected calls and discard unconnected calls. • The total number of connected calls was 272,646. • 5 phone classes: internal, local, national, international, unknown. • 25 link classes (source class + destination class) • Data is aggregated into 15 minute time intervals. • The total number of time points is 5422 and the total number of attributes is 26. • Add two attributes, namely, type of day (workday or weekend) and time of the day, to the processed data. This step gives a spatio-temporal cube in the model space. • http://www.engr.smu.edu/~mhd/7331f08/CISCOEMM.xls
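The preprocessing steps above can be sketched in Python as follows. This is an illustration under assumptions, not the procedure actually applied to the Cisco data: the `phone_class` rule is hypothetical (the slides do not say how numbers were classified), each call is assumed to carry a `connected` flag, and intervals with no connected calls are simply not emitted.

```python
from collections import Counter
from datetime import datetime
from itertools import product

PHONE_CLASSES = ["internal", "local", "national", "international", "unknown"]
# 5 source classes x 5 destination classes = 25 link classes
LINK_CLASSES = [f"{s}->{d}" for s, d in product(PHONE_CLASSES, PHONE_CLASSES)]

def phone_class(number):
    """Hypothetical classification rule; the real rule is not given in the slides."""
    if len(number) == 4 and number.isdigit():   # assume 4-digit extensions are internal
        return "internal"
    if number.startswith("011"):
        return "international"
    if len(number) == 7:
        return "local"
    if len(number) in (10, 11):
        return "national"
    return "unknown"

def interval_start(ts, minutes=15):
    """Round a timestamp down to the start of its 15-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)

def aggregate(calls):
    """calls: iterable of (source, destination, start, connected) tuples."""
    counts = {}
    for src, dst, start, connected in calls:
        if not connected:                        # discard unconnected calls
            continue
        link = f"{phone_class(src)}->{phone_class(dst)}"
        counts.setdefault(interval_start(start), Counter())[link] += 1
    rows = []
    for start in sorted(counts):
        row = {link: counts[start][link] for link in LINK_CLASSES}
        row["day_type"] = "weekend" if start.weekday() >= 5 else "workday"
        row["time_of_day"] = start.strftime("%H:%M")
        rows.append((start, row))
    return rows

calls = [
    ("1234", "5551234", datetime(2003, 9, 22, 12, 17, 32), True),
    ("1234", "01144207946", datetime(2003, 9, 22, 12, 29, 59), True),
    ("1234", "2145550199", datetime(2003, 9, 22, 12, 31, 0), False),
    ("5551234", "1234", datetime(2003, 9, 27, 9, 0, 0), True),
]
rows = aggregate(calls)
for start, row in rows:
    print(start, row["day_type"], {k: v for k, v in row.items() if v and k in LINK_CLASSES})
```

Each output row is one time point: a vector of per-link-class call counts plus the two added attributes, which is the kind of record the later modeling slides treat as a state vector.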
CISCO Data Visualization http://www.lyle.smu.edu/~mhd/7331f11/CiscoEMM.png
Spatiotemporal Stream Data • Records may arrive at a rapid rate • High volume (possibly infinite) of continuous data • Concept drifts: Data distribution changes on the fly • Data does not necessarily fit any distribution pattern • Multidimensional • Temporal • Spatial • Data are collected in discrete time intervals • Data are in structured format, <a1, a2, …> • Data hold an approximation of the Markov property
Spatiotemporal Environment • Events arriving in a stream • At any time, t, we can view the state of the problem as represented by a vector of n numeric values: Vt = <S1t, S2t, ..., Snt>
Data Stream Modeling • Single pass: Each record is examined at most once • Bounded storage: Limited memory for storing synopsis • Real-time: Per-record processing time must be low • Summarization (synopsis) of data • Use data NOT SAMPLE • Temporal and Spatial • Dynamic • Continuous (infinite stream) • Learn • Forget • Sublinear growth rate - Clustering
MM • A first-order Markov chain is a finite or countably infinite sequence of events {E1, E2, …} over discrete time points, where Pij = P(Ej | Ei), and at any time the future behavior of the process is based solely on the current state. • A Markov Model (MM) is a graph with m vertices or states, S, and directed arcs, A, such that: • S = {N1, N2, …, Nm}, and • A = {Lij | i ∈ {1, 2, …, m}, j ∈ {1, 2, …, m}}, and • each arc Lij = <Ni, Nj> is labeled with a transition probability Pij = P(Nj | Ni).
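As a small illustration of the definition above, the transition probabilities Pij = P(Nj | Ni) can be estimated from an observed state sequence by counting consecutive pairs. The sequence in this Python sketch is made up for the example, and `transition_probabilities` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

def transition_probabilities(states):
    """Estimate P(Nj | Ni) from consecutive pairs in an observed state sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {
        src: {dst: n / sum(outgoing.values()) for dst, n in outgoing.items()}
        for src, outgoing in counts.items()
    }

# Made-up sequence of visited states.
sequence = ["N1", "N2", "N1", "N3", "N1", "N2"]
P = transition_probabilities(sequence)
for src in sorted(P):
    print(src, P[src])
```

Here N1 is left three times, twice toward N2 and once toward N3, so the estimates are P(N2 | N1) = 2/3 and P(N3 | N1) = 1/3, and the outgoing probabilities of each state sum to 1.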
Extensible Markov Model (EMM) • Time Varying Discrete First Order Markov Model • Nodes are clusters of real world states. • Learning continues during application phase. • Learning: • Transition probabilities between nodes • Node labels (centroid/medoid of cluster) • Nodes are added and removed as data arrives
EMM Creation • [Figure: the EMM graph growing step by step as input vectors arrive, with nodes N1, N2, N3 and transition probabilities labeling the arcs] • Input vectors, in order of arrival: <18,10,3,3,1,0,0> <17,10,2,3,1,0,0> <16,9,2,3,1,0,0> <14,8,2,3,1,0,0> <14,8,2,3,0,0,0> <18,10,3,3,1,1,0>
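The way an EMM grows as vectors arrive can be sketched in Python. This is a simplified illustration, not the published EMM algorithm: it assumes Euclidean distance, a fixed distance threshold of 2.0, and a running-mean centroid as the node label, and it omits node removal (forgetting). The class and method names are hypothetical. Because the threshold is an assumption, the resulting graph need not match the figure on the slide.

```python
import math
from collections import defaultdict

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

class EMM:
    """Minimal EMM sketch: nearest-centroid clustering plus transition counts."""

    def __init__(self, threshold):
        self.threshold = threshold            # assumed distance threshold
        self.centroids = []                   # node label (running mean) per node
        self.node_counts = []                 # number of vectors assigned per node
        self.transitions = defaultdict(int)   # (from_node, to_node) -> count
        self.current = None                   # index of the current node

    def _nearest(self, vector):
        best, best_dist = None, math.inf
        for i, centroid in enumerate(self.centroids):
            d = euclidean(vector, centroid)
            if d < best_dist:
                best, best_dist = i, d
        return best, best_dist

    def learn(self, vector):
        """Assign the vector to a node (creating one if needed) and update counts."""
        node, dist = self._nearest(vector)
        if node is None or dist > self.threshold:
            self.centroids.append(list(vector))
            self.node_counts.append(0)
            node = len(self.centroids) - 1
        n = self.node_counts[node]
        self.centroids[node] = [(c * n + x) / (n + 1)
                                for c, x in zip(self.centroids[node], vector)]
        self.node_counts[node] = n + 1
        if self.current is not None:
            self.transitions[(self.current, node)] += 1
        self.current = node
        return node

    def transition_probability(self, src, dst):
        total = sum(c for (s, _), c in self.transitions.items() if s == src)
        return self.transitions.get((src, dst), 0) / total if total else 0.0

# The six input vectors from the slide, in order of arrival.
stream = [
    (18, 10, 3, 3, 1, 0, 0), (17, 10, 2, 3, 1, 0, 0), (16, 9, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 1, 0, 0), (14, 8, 2, 3, 0, 0, 0), (18, 10, 3, 3, 1, 1, 0),
]
emm = EMM(threshold=2.0)
path = [emm.learn(v) for v in stream]
print(path, emm.node_counts, dict(emm.transitions))
```

Nodes are indexed from 0 in this sketch, so index 0 plays the role of N1. A smaller threshold would create more nodes for the same input, which is the main tuning decision in this kind of incremental clustering.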
EMMRare • The EMMRare algorithm indicates whether the current input event is rare. Using a threshold occurrence percentage, the input event is determined to be rare if either of the following occurs: • The frequency of the node at time t+1 is below this threshold • The updated transition probability of the Markov chain transition from the node at time t to the node at time t+1 is below the threshold
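The two rarity conditions above can be written as a short check. This Python sketch is an interpretation under assumptions: node frequency is taken to be the node's share of all observations, the transition probability is the count of the transition divided by all transitions leaving the earlier node, and the counts and the 5% threshold are made up for illustration. `is_rare` is a hypothetical helper, not the published EMMRare code.

```python
def is_rare(node_counts, transition_counts, prev_node, next_node, threshold):
    """Return True if the move prev_node -> next_node looks rare.

    node_counts: {node: count}, already updated with the current event.
    transition_counts: {(src, dst): count}, already updated likewise.
    """
    total = sum(node_counts.values())
    node_frequency = node_counts.get(next_node, 0) / total if total else 0.0
    outgoing = sum(c for (src, _), c in transition_counts.items() if src == prev_node)
    if outgoing:
        transition_prob = transition_counts.get((prev_node, next_node), 0) / outgoing
    else:
        transition_prob = 0.0
    return node_frequency < threshold or transition_prob < threshold

# Made-up counts for three nodes.
node_counts = {"N1": 90, "N2": 9, "N3": 1}
transition_counts = {
    ("N1", "N1"): 85, ("N1", "N2"): 4, ("N1", "N3"): 1,
    ("N2", "N1"): 5, ("N2", "N2"): 4, ("N3", "N1"): 1,
}
for prev_node, next_node in [("N1", "N1"), ("N1", "N3"), ("N1", "N2"), ("N2", "N1")]:
    print(prev_node, next_node, is_rare(node_counts, transition_counts, prev_node, next_node, 0.05))
```

In this example the move N1 to N3 is flagged because N3 itself is rarely visited, while N1 to N2 is flagged only by the second condition: N2 is common enough overall, but it is seldom reached from N1.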
References • VoIP Mechanic, “What is VoIP?, a tutorial,” http://www.voipmechanic.com/what-is-voip.htm . • Yu Meng, Margaret Dunham, Marco Marchetti, and Jie Huang, “Rare Event Detection in a Spatiotemporal Environment,” Proceedings of the IEEE Conference on Granular Computing, May 2006, pp 629-634. • Cisco, “CDR Logging Configuration with Syslog Servers and Cisco IOS Gateways,” Document ID: 14068, February 24, 2006, http://www.cisco.com/en/US/tech/tk1077/technologies_tech_note09186a0080094e72.shtml . • Cisco, “Voice Over IP – Per Call Bandwidth Consumption,” Document ID: 7934, February 2, 2008, http://www.cisco.com/en/US/tech/tk652/tk698/technologies_tech_note09186a0080094ae2.shtml . • “VoIPThink,” http://www.en.voipforo.com , Accessed February 1, 2012. • Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE ICDM Conference, November 2004, pp 371-374. • Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.) (Extended version submitted to Journal of Computers.) • Yu Meng and Margaret H. Dunham, “Mining Developing Trends of Dynamic Spatiotemporal Data Streams,” Journal of Computers, Vol 1, No 3, June 2006, pp 43-50.