
VoIP Data IIIT Allahabad

VoIP Data, IIIT Allahabad. Margaret H. Dunham, Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas 75275, USA, mhd@lyle.smu.edu. Support provided by Fulbright Grant and IIIT Allahabad. Outline: VoIP overview; CDR; CDR example using EMM.



  1. VoIP Data IIIT Allahabad Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist University Dallas, Texas 75275, USA mhd@lyle.smu.edu Support provided by Fulbright Grant and IIIT Allahabad

  2. VoIP Data Outline • VoIP overview • CDR • CDR Example using EMM

  3. VoIP Overview http://www.voipmechanic.com/what-is-voip.htm

  4. VoIP Advantages • Travel • Cost reduction • Additional Features: Voice messages, call forwarding, logs, caller ID, … • Integration of business tools • Common network infrastructure

  5. VoIP Disadvantages • Need reliable broadband internet connection • Voice quality

  6. Telephone-VoIP Steps • An Analog Telephone Adapter (ATA) converts the analog phone call to a digital signal. • The digital signal is sent over the internet as data packets. • At the receiving end, the packets are converted back to an analog signal.

  7. VoIP Codec • Software on server or ATA that converts voice signal into digital data. • COmpressor – DECompressor • COder – DECoder • Sample (8000, 24000, 32000 times per second) • Sort • Compress • Packetize
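
The sample, compress, and packetize steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the 8000 Hz rate is the lowest one listed on the slide, the 20 ms packet length is an assumed value, the compression is the continuous mu-law formula (a simplification of G.711-style companding, not the exact codec table), and the "Sort" step is left out because the slides do not describe it. All function names are hypothetical.

```python
import math

SAMPLE_RATE = 8000   # samples per second (lowest rate listed on the slide)
FRAME_MS = 20        # assumed packet duration; not stated in the slides
MU = 255             # mu-law parameter used in G.711-style companding


def sample_tone(freq_hz: float, duration_ms: int) -> list[float]:
    """Sample a sine tone (a stand-in for a voice signal) into values in [-1, 1]."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def mu_law_compress(x: float) -> int:
    """Compand one sample with the continuous mu-law formula, then quantize to 8 bits."""
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1) / 2 * 255))  # map [-1, 1] onto 0..255


def packetize(codes: list[int]) -> list[bytes]:
    """Split the 8-bit code stream into fixed-duration payloads."""
    per_frame = SAMPLE_RATE * FRAME_MS // 1000
    return [bytes(codes[i:i + per_frame]) for i in range(0, len(codes), per_frame)]


samples = sample_tone(440.0, 100)               # 100 ms of audio
codes = [mu_law_compress(s) for s in samples]   # one byte per sample
packets = packetize(codes)                      # payloads of 160 bytes each
```

At 8000 samples per second and one byte per sample, each assumed 20 ms payload carries 160 bytes before any protocol headers are added.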

  8. Protocols • SIP (Session Initiation Protocol): signaling to set up and tear down sessions. • SDP (Session Description Protocol): describes the call. • RTP (Real-time Transport Protocol): exchanges data/voice packets; the media transport used to transmit packets.

  9. SIP • Setup • Connect • Disconnect • Syntax similar to HTTP • Bind to IP address using SIP registration • URLs for address format: mhd@lyle.smu.edu • Independent of application or data types • Uses RTP and SDP
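
To make the "syntax similar to HTTP" point concrete, here is a minimal sketch that builds and parses a text SIP request. The header set is a reduced illustration, the caller address, host name, branch, and tag values are made up, and `build_invite` and `parse_request` are hypothetical helpers rather than part of any SIP library. A real INVITE would normally carry an SDP body describing the call.

```python
def build_invite(caller: str, callee: str, call_id: str) -> str:
    """Assemble a minimal, illustrative SIP INVITE with no SDP body."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",                       # request line, like HTTP
        "Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK-demo",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag=demo",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_request(message: str) -> tuple[str, str, dict[str, str]]:
    """Split a SIP request into method, request URI, and a header dictionary."""
    head = message.split("\r\n\r\n", 1)[0]
    start_line, *header_lines = head.split("\r\n")
    method, uri, _version = start_line.split(" ")
    headers = {}
    for line in header_lines:
        name, value = line.split(":", 1)   # split on the first colon only
        headers[name.strip()] = value.strip()
    return method, uri, headers


message = build_invite("alice@example.com", "mhd@lyle.smu.edu", "demo-call-1")
method, uri, headers = parse_request(message)
```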

  10. SIP Overview http://www.voipmechanic.com/sip-basics.htm

  11. VoIP Data Packet [4]

  12. VoIP Data • Any of this digital data could be saved and analyzed. • Typically only statistical/summary information about the calls is saved. • These Call Detail Records (CDR) are used for billing and analysis.

  13. Call Detail Record • Log of VoIP usage • May be by account • Typical attributes: • Source • Destination • Duration of call • Amount billed • Total usage time in billing period • Remaining time in billing period • Total charge in billing period • The format of the CDR varies among VoIP providers or programs. Some programs allow CDRs to be configured by the user.
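
One possible way to model the typical attributes above is sketched below. Since CDR formats vary among providers, every field name here is an assumption, as is the idea of a fixed per-period time allowance used to derive "remaining time"; amounts are kept in cents purely to avoid floating-point rounding.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallDetailRecord:
    """Hypothetical CDR layout; real formats differ by provider."""
    account: str
    source: str
    destination: str
    duration_s: int     # duration of call, in seconds
    billed_cents: int   # amount billed for this call, in cents


def billing_summary(records, allowance_s):
    """Return {account: (total usage, remaining time, total charge)} for one period."""
    usage = defaultdict(int)
    charge = defaultdict(int)
    for r in records:
        usage[r.account] += r.duration_s
        charge[r.account] += r.billed_cents
    return {a: (usage[a], max(allowance_s - usage[a], 0), charge[a]) for a in usage}


records = [
    CallDetailRecord("acct-1", "2145550101", "2145550199", 120, 30),
    CallDetailRecord("acct-1", "2145550101", "9725550123", 300, 75),
    CallDetailRecord("acct-2", "2145550177", "2145550101", 60, 15),
]
summary = billing_summary(records, allowance_s=600)
```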

  14. CDR Generation [3] • Usually created through a special Authentication, Authorization, and Accounting (AAA) server. • May also be created by the logging capabilities of a gateway or router, using syslog server software. • Normally in simple CSV format. • Normally sent over UDP, so the underlying data packets are not sequenced and may be lost. (Redundant servers can help.) • Timestamps between routers can be synchronized using the Network Time Protocol (NTP). • A CDR is generated for both the forward and return leg of a call. • http://www.cisco.com/en/US/tech/tk1077/technologies_tech_note09186a0080094e72.shtml
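
Because CDRs typically arrive as CSV and records can be lost in transit, a consumer might pair up the forward and return legs and flag calls where one leg is missing. The sketch below assumes a made-up column layout (`call_id`, `leg`, `source`, `destination`, `duration_s`); it is not the Cisco format, which has many more attributes.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV layout and sample rows, for illustration only.
RAW = """call_id,leg,source,destination,duration_s
c1,forward,2145550101,2145550199,62
c1,return,2145550199,2145550101,62
c2,forward,2145550101,9725550123,0
"""


def load_legs(text: str) -> dict:
    """Group CDR rows by call, keyed by leg name."""
    calls = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        row["duration_s"] = int(row["duration_s"])
        calls[row["call_id"]][row["leg"]] = row
    return dict(calls)


calls = load_legs(RAW)
# Calls whose forward or return record never arrived (for example, lost over UDP).
incomplete = sorted(cid for cid, legs in calls.items()
                    if set(legs) != {"forward", "return"})
```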

  15. Example: CISCO CDR Data • VoIP traffic in their Richardson, Texas facility from Mon Sep 22 12:17:32 2003 to Mon Nov 17 11:29:11 2003. • Over 1.5 million call trials were logged • 272,646 connected calls • 66 attributes including source, destination, starting time, duration, routing/switching, device, etc • Application: Anomaly Detection (Classification) • Goal: Find unusual call patterns based on type and time of call • Technique: New data structure, New classification algorithm, New visualization technique • Sample of raw csv data: http://lyle.smu.edu/~mhd/iiit/start.csv

  16. CISCO Preprocessing • Remove all attributes other than source, destination, starting time, and duration from the logs. • Count the connected calls and discard unconnected calls. • The total number of connected calls was 272,646. • 5 phone classes: internal, local, national, international, unknown. • 25 link classes (each pairing of a source class with a destination class, 5 × 5). • Data is aggregated into 15-minute time intervals. • The total number of time points is 5422 and the total number of attributes is 26. • Add two attributes, namely type of day (workday or weekend) and time of day, to the processed data. This step gives a spatio-temporal cube in the model space. • http://www.engr.smu.edu/~mhd/7331f08/CISCOEMM.xls
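
The preprocessing steps above might look roughly like this in code. The rules in `phone_class` are invented for the example, since the slides do not say how numbers were assigned to the five classes, and the sample calls are made up; only the 15-minute bucketing, the class pairing, and the workday/weekend attribute come from the slide.

```python
from collections import Counter
from datetime import datetime

PHONE_CLASSES = ["internal", "local", "national", "international", "unknown"]


def phone_class(number: str) -> str:
    """Hypothetical classification rules; the real ones are not given in the slides."""
    if not number.isdigit():
        return "unknown"
    if number.startswith("011"):
        return "international"
    if len(number) == 4:
        return "internal"
    if len(number) == 7:
        return "local"
    if len(number) == 10:
        return "national"
    return "unknown"


def interval_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 15-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def day_type(ts: datetime) -> str:
    return "weekend" if ts.weekday() >= 5 else "workday"


def aggregate(calls) -> dict:
    """Count connected calls per link class within each 15-minute interval."""
    buckets = {}
    for start, source, destination in calls:
        link = (phone_class(source), phone_class(destination))
        buckets.setdefault(interval_start(start), Counter())[link] += 1
    return buckets


calls = [
    (datetime(2003, 9, 22, 12, 17, 32), "1234", "5551234"),
    (datetime(2003, 9, 22, 12, 29, 59), "1234", "5551234"),
    (datetime(2003, 9, 22, 12, 30, 0), "1234", "01144123456"),
]
buckets = aggregate(calls)
```

Each interval then becomes one record: up to 25 link-class counts plus the type-of-day and time-of-day attributes.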

  17. CISCO Data Visualization http://www.lyle.smu.edu/~mhd/7331f11/CiscoEMM.png

  18. Spatiotemporal Stream Data • Records may arrive at a rapid rate • High volume (possibly infinite) of continuous data • Concept drifts: Data distribution changes on the fly • Data does not necessarily fit any distribution pattern • Multidimensional • Temporal • Spatial • Data are collected in discrete time intervals • Data are in structured format, <a1, a2, …> • Data hold an approximation of the Markov property

  19. Spatiotemporal Environment • Events arriving in a stream • At any time, t, we can view the state of the problem as represented by a vector of n numeric values: Vt = <S1t, S2t, ..., Snt> [Figure: time axis]

  20. Data Stream Modeling • Single pass: Each record is examined at most once • Bounded storage: Limited memory for storing synopsis • Real-time: Per-record processing time must be low • Summarization (synopsis) of data • Use data NOT SAMPLE • Temporal and Spatial • Dynamic • Continuous (infinite stream) • Learn • Forget • Sublinear growth rate - Clustering

  21. MM A first-order Markov chain is a finite or countably infinite sequence of events {E1, E2, …} over discrete time points, where Pij = P(Ej | Ei), and at any time the future behavior of the process is based solely on the current state. A Markov Model (MM) is a graph with m vertices or states, S, and directed arcs, A, such that: • S = {N1, N2, …, Nm}, and • A = {Lij | i ∈ {1, 2, …, m}, j ∈ {1, 2, …, m}}, and each arc Lij = <Ni, Nj> is labeled with a transition probability Pij = P(Nj | Ni).
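
As a small illustration of the transition probabilities Pij = P(Nj | Ni), the sketch below estimates them from an observed state sequence by counting consecutive pairs. The sequence is made up, and `transition_probabilities` is a hypothetical helper name; exact fractions are used so the values read like the arc labels in a Markov model diagram.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def transition_probabilities(sequence):
    """Estimate P(next | current) from consecutive pairs in a state sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        i: {j: Fraction(c, sum(row.values())) for j, c in row.items()}
        for i, row in counts.items()
    }


# Made-up sequence of visited states.
P = transition_probabilities(["N1", "N2", "N1", "N3", "N1", "N2"])
# P["N1"] holds the outgoing arcs of N1: N2 with 2/3 and N3 with 1/3.
```

Each row of outgoing probabilities sums to 1, which is a quick sanity check on any estimated model.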

  22. Extensible Markov Model (EMM) • Time Varying Discrete First Order Markov Model • Nodes are clusters of real world states. • Learning continues during application phase. • Learning: • Transition probabilities between nodes • Node labels (centroid/medoid of cluster) • Nodes are added and removed as data arrives

  23. EMM Creation [Figure: an EMM built up step by step from nodes N1, N2, and N3, with arcs labeled by transition probabilities such as 1/1, 1/2, 1/3, and 2/3] • Input vectors: <18,10,3,3,1,0,0> <17,10,2,3,1,0,0> <16,9,2,3,1,0,0> <14,8,2,3,1,0,0> <14,8,2,3,0,0,0> <18,10,3,3,1,1,0>
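
A minimal sketch of how an EMM could be grown from the input vectors on this slide is given below. It is not the published EMM algorithm: the Euclidean distance, the threshold of 1.5, the running-centroid update, and the assumption that the vectors arrive in the order listed are all choices made for this example, and node removal (forgetting) is omitted. The class and method names are hypothetical.

```python
import math
from collections import Counter


class EMM:
    """Illustrative EMM: nearest-centroid clustering plus transition counts."""

    def __init__(self, threshold: float):
        self.threshold = threshold      # assumed distance cutoff for a new node
        self.centroids = []             # node labels (cluster centroids)
        self.node_counts = []           # number of vectors absorbed by each node
        self.transitions = Counter()    # (from_node, to_node) -> count
        self.current = None             # node of the previous input

    def _nearest(self, vector):
        best, best_dist = None, math.inf
        for idx, centroid in enumerate(self.centroids):
            d = math.dist(centroid, vector)
            if d < best_dist:
                best, best_dist = idx, d
        return best, best_dist

    def learn(self, vector) -> int:
        """Assign the vector to a node (creating one if needed) and record the arc."""
        node, dist = self._nearest(vector)
        if node is None or dist > self.threshold:
            self.centroids.append([float(x) for x in vector])
            self.node_counts.append(1)
            node = len(self.centroids) - 1
        else:
            n = self.node_counts[node]
            self.centroids[node] = [(c * n + x) / (n + 1)
                                    for c, x in zip(self.centroids[node], vector)]
            self.node_counts[node] = n + 1
        if self.current is not None:
            self.transitions[(self.current, node)] += 1
        self.current = node
        return node

    def transition_probability(self, i: int, j: int) -> float:
        out = sum(c for (a, _), c in self.transitions.items() if a == i)
        return self.transitions[(i, j)] / out if out else 0.0


vectors = [
    (18, 10, 3, 3, 1, 0, 0), (17, 10, 2, 3, 1, 0, 0), (16, 9, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 1, 0, 0), (14, 8, 2, 3, 0, 0, 0), (18, 10, 3, 3, 1, 1, 0),
]
emm = EMM(threshold=1.5)
path = [emm.learn(v) for v in vectors]   # node index visited at each time step
```

With these assumed settings the six vectors produce three nodes (indices 0 to 2); a different threshold or distance measure would give a different model.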

  24. EMMRare • EMMRare algorithm indicates if the current input event is rare. Using a threshold occurrence percentage, the input event is determined to be rare if either of the following occurs: • The frequency of the node at time t+1 is below this threshold • The updated transition probability of the MC transition from node at time t to the node at t+1 is below the threshold
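
The two rarity conditions can be written as a small check over node and transition counts. This is a sketch, not the authors' implementation: it assumes the counts have already been updated with the current event, takes "frequency of the node" to mean the node's share of all observations, and uses made-up counts and a made-up threshold of 5%.

```python
def is_rare(node_counts, transition_counts, prev_node, next_node, threshold):
    """Return True if either rarity condition holds for the move prev_node -> next_node."""
    total = sum(node_counts.values())
    node_freq = node_counts.get(next_node, 0) / total if total else 0.0

    outgoing = sum(c for (a, _), c in transition_counts.items() if a == prev_node)
    count = transition_counts.get((prev_node, next_node), 0)
    trans_prob = count / outgoing if outgoing else 0.0

    return node_freq < threshold or trans_prob < threshold


# Made-up counts after 100 observed events.
node_counts = {"N1": 90, "N2": 9, "N3": 1}
transition_counts = {("N1", "N1"): 80, ("N1", "N2"): 9,
                     ("N2", "N1"): 9, ("N1", "N3"): 1}

rare_jump = is_rare(node_counts, transition_counts, "N1", "N3", threshold=0.05)
common_stay = is_rare(node_counts, transition_counts, "N1", "N1", threshold=0.05)
```

Here the move into N3 is flagged because both its node frequency and its transition probability fall below the threshold, while staying in N1 is not.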

  25. Sublinear Growth Rate

  26. Rare Event in Cisco Data

  27. References [1] VoIP Mechanic, “What is VoIP?, a tutorial,” http://www.voipmechanic.com/what-is-voip.htm . [2] Yu Meng, Margaret Dunham, Marco Marchetti, and Jie Huang, “Rare Event Detection in a Spatiotemporal Environment,” Proceedings of the IEEE Conference on Granular Computing, May 2006, pp 629-634. [3] Cisco, “CDR Logging Configuration with Syslog Servers and Cisco IOS Gateways,” Document ID: 14068, February 24, 2006, http://www.cisco.com/en/US/tech/tk1077/technologies_tech_note09186a0080094e72.shtml . [4] Cisco, “Voice Over IP – Per Call Bandwidth Consumption,” Document ID: 7934, February 2, 2008, http://www.cisco.com/en/US/tech/tk652/tk698/technologies_tech_note09186a0080094ae2.shtml . [5] “VoIPThink,” http://www.en.voipforo.com , accessed February 1, 2012. [6] Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE ICDM Conference, November 2004, pp 371-374. [7] Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.) (Extended version submitted to Journal of Computers.) [8] Yu Meng and Margaret H. Dunham, “Mining Developing Trends of Dynamic Spatiotemporal Data Streams,” Journal of Computers, Vol 1, No 3, June 2006, pp 43-50.
