New insights into telephone call dynamics
Analysis of call record data from the BT Home Online study
David K Hunter, School of CSEE
Ben Anderson, Department of Sociology
Alexei Vernitski, Department of Mathematical Sciences
Transactional data in sociology
• Transactional data:
  • Generated by everyday life
  • Automatically captured as part of 'business as usual'
  • N = millions
  • Billions of data points
• Literature commentary:
  • Surveillance, Computer Science
  • Social Science
  • Savage & Burrows, 2007
  • doi:10.1177/0038038507080443
  • 101 citations (Google Scholar)
  • http://www.youtube.com/watch?v=ARLARDwLJhw
A 21st Century Sociology?
• Re-assessing old questions
  • Networks, place, space and social relationships (capital)
  • Consumption, leisure and class?
  • Public performance of self?
• Imagining new questions?
  • Software & social stratification?
• ? New empirical resources
Data
• We have data for 400 households, collected by BT between 1998 and 2001
• For each household, we have records of their incoming and outgoing calls:
  • The caller’s and the callee’s IDs (anonymised telephone numbers)
  • The time the call was made
  • The length of the call
  • Some other data (tariffs, ISP calls, etc.)
• We also have demographic data for many of the households, although we have not used this yet
Data
• Interestingly, our data is not network data
• We are looking at isolated fragments of the network of telephone connections
• These are called “ego networks”
Timing and interrelationship of calls
• This area is a useful niche for developing research
  • At the interface between teletraffic theory and social network analysis
  • More appropriate for our ego network data
• Existing software would not help much
• The entire dataset (400 ego networks) is read into RAM
• The storage format in RAM is tailored to our dataset and to the general analysis of call dynamics
• A library of C functions is being developed with general applicability to this kind of analysis
  • That is, call dynamics in general: the timing, length, interrelationship and correlation of calls
  • Could be integrated into Stata or R
Grapevine calls and batch calls
• Grapevine calls are made in response to a telephone call that has been received
  • Made to pass on information or to get more information
• Batch calls are a collection of calls made at one sitting
  • Often done intentionally, to make arrangements with several people, or to pass on news
  • Making a single call can prompt more calls to be made, even if they were not originally intended
  • Other reasons: taking advantage of the cheap rate, boredom, loneliness
Call groups
• The software is presently configured to discard any calls which:
  • Overlap in time with the previous one
  • Are to an Internet Service Provider (ISP)
  • Have zero cost
  • Are shorter than 5 seconds
  • Are to or from telephone numbers shorter than 8 digits
  • Are between two different panel households
  • Are between two numbers of the same panel household
  • Are to the same number – loopback within the same household
• 1,274,916 of the original 1,590,092 calls remain
• The software then identifies “call groups”: two or more calls where each new call begins less than 120 seconds after the previous one ends
• Grapevine calls occur when the first call in a group is incoming, but the remainder are outgoing
• Batch calls occur when all the calls in a group are outgoing
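The grouping and classification rules above can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' C software: the `Call` record, the function names and the "other" label are assumptions, and only two of the listed filters (overlap and minimum duration) are modelled because the remaining ones need fields that are not represented here.

```python
from typing import List, NamedTuple

GAP_SECONDS = 120   # grouping threshold stated on the slide
MIN_DURATION = 5    # calls shorter than this are discarded

class Call(NamedTuple):      # hypothetical record layout
    start: float             # seconds since an arbitrary epoch
    duration: float          # seconds
    outgoing: bool           # True if the household made the call

def group_calls(calls: List[Call], gap: float = GAP_SECONDS) -> List[List[Call]]:
    """Split one household's calls into call groups."""
    groups: List[List[Call]] = []
    for call in sorted(calls, key=lambda c: c.start):
        if call.duration < MIN_DURATION:
            continue                      # too short: discarded
        if groups:
            prev = groups[-1][-1]
            prev_end = prev.start + prev.duration
            if call.start < prev_end:
                continue                  # overlaps the previous kept call: discarded
            if call.start - prev_end < gap:
                groups[-1].append(call)   # begins less than `gap` s after the previous call ended
                continue
        groups.append([call])
    return groups

def classify(group: List[Call]) -> str:
    """Label a group as 'batch', 'grapevine' or (assumed label) 'other'."""
    directions = ["O" if c.outgoing else "I" for c in group]
    if all(d == "O" for d in directions):
        return "batch"
    if directions[0] == "I" and all(d == "O" for d in directions[1:]):
        return "grapevine"
    return "other"
```

Under this sketch a single incoming call is labelled "grapevine" and a single outgoing call "batch", which appears to be the convention used when groups of one call are counted.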
Classification of call groups
• 1,274,916 calls held in RAM – 887,019 single calls
• 1,045,027 call groups – 158,008 groups of two or more calls
• 887,019 groups of 1 call – 416,718 grapevine and 470,301 batch (for a single call, these are simply the incoming and the outgoing calls respectively)
• 116,107 groups of 2 calls – 19,937 grapevine and 68,792 batch
• 27,079 groups of 3 calls – 3,470 grapevine and 15,913 batch
• 8,699 groups of 4 calls – 819 grapevine and 5,167 batch
• 3,153 groups of 5 calls – 242 grapevine and 1,858 batch
• 1,301 groups of 6 calls – 96 grapevine and 787 batch
• 653 groups of 7 calls – 35 grapevine and 418 batch
• 389 groups of 8 calls – 19 grapevine and 241 batch
• 211 groups of 9 calls – 8 grapevine and 137 batch
• 112 groups of 10 calls – 6 grapevine and 63 batch
• … and so on …
• Group of 301 calls – repeated calls to 0845 756 000, an unlisted ISP number
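The counts above can be turned into shares with a few lines of arithmetic. The sketch below is only an illustration: it copies the first five rows from the slide, and the function name and the "other" column (groups that are neither grapevine nor batch) are assumptions introduced here.

```python
# (group size, number of groups, grapevine, batch), copied from the slide
rows = [
    (1, 887_019, 416_718, 470_301),
    (2, 116_107, 19_937, 68_792),
    (3, 27_079, 3_470, 15_913),
    (4, 8_699, 819, 5_167),
    (5, 3_153, 242, 1_858),
]

def breakdown(rows):
    """Return {size: (grapevine share, batch share, count of other groups)}."""
    result = {}
    for size, total, grapevine, batch in rows:
        other = total - grapevine - batch   # groups that fit neither definition
        result[size] = (grapevine / total, batch / total, other)
    return result

shares = breakdown(rows)
```

With these figures the single-call groups split exactly into the two categories, while 27,378 of the two-call groups fall into neither.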
Markov chain
• Each call group is identified by a string of one or more ‘O’s or ‘I’s followed by a ‘G’
  • An incoming call followed by two outgoing calls = “IOOG”
• Each state (other than the null state) is identified by a string of one or more characters which is called the identifier
  • Each character is either ‘I’ or ‘O’
• The null state “” is entered when the subscriber is idle for more than 120 seconds
  • It has two possible outgoing transitions – into state “I” or state “O”
• Every other state (represented by a string S) has three possible outgoing transitions:
  • To the null state
  • To state S+“I”
  • To state S+“O”
• Call group of “IOOG”:
  • The Markov chain starts off in the null state, “”
  • After the first call arrives, it goes into state “I”
  • When the second call arrives, it goes into state “IO”
  • When the third call arrives, it goes into state “IOO”
  • When more than 120 seconds elapse without another call arriving, it goes back into the null state, “”
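The walk through the states can be sketched as below. This is a minimal illustration rather than the authors' implementation: the tuple layout and the function name are assumptions, and a gap of exactly 120 seconds is treated as idle, which the slides leave open.

```python
GAP_SECONDS = 120
NULL_STATE = ""

def state_path(calls, gap=GAP_SECONDS):
    """Return the sequence of states visited for one household.

    `calls` is a time-ordered list of (start, end, direction) tuples,
    with times in seconds and direction either "I" or "O".
    """
    state = NULL_STATE
    visited = []
    prev_end = None
    for start, end, direction in calls:
        if prev_end is not None and start - prev_end >= gap:
            state = NULL_STATE        # idle gap: back to the null state
            visited.append(state)
        state += direction            # transition S -> S + "I" or S + "O"
        visited.append(state)
        prev_end = end
    if state != NULL_STATE:
        visited.append(NULL_STATE)    # the last group also ends with a gap
    return visited

path = state_path([(0, 60, "I"), (100, 130, "O"), (200, 230, "O"), (1000, 1010, "O")])
# path == ["I", "IO", "IOO", "", "O", ""]
```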
Markov chain states and transitions
• 1058 states and 1,444 distinct transitions in the Markov chain
• state '' (1,045,027 calls): freq out = 585519, in = 459508, gap = 0
• state 'O' (585,519 calls): freq out = 98358, in = 16860, gap = 470301
• state 'I' (459,508 calls): freq out = 26326, in = 16464, gap = 416718
• state 'OO' (98,358 calls): freq out = 26259, in = 3307, gap = 68792
• state 'OI' (16,860 calls): freq out = 2371, in = 915, gap = 13574
• state 'IO' (26,326 calls): freq out = 5055, in = 1334, gap = 19937
• state 'II' (16,464 calls): freq out = 1321, in = 1339, gap = 13804
• state 'OOO' (26,259 calls): freq out = 9449, in = 897, gap = 15913
• state 'OOI' (3,307 calls): freq out = 618, in = 223, gap = 2466
• state 'OIO' (2,371 calls): freq out = 600, in = 192, gap = 1579
• state 'OII' (915 calls, total 57571.96 sec): freq out = 136, in = 88, gap = 691
• state 'IOO' (5,055 calls): freq out = 1342, in = 243, gap = 3470
• state 'IOI' (1,334 calls): freq out = 220, in = 110, gap = 1004
• state 'IIO' (1,321 calls): freq out = 276, in = 101, gap = 944
• state 'III' (1,339 calls): freq out = 105, in = 222, gap = 1012
• and so on…
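The frequencies above can be normalised into empirical transition probabilities. The following sketch uses a handful of the counts from the slide; the dictionary layout and the function name are assumptions made for illustration.

```python
# state -> (next call outgoing, next call incoming, gap), copied from the slide
counts = {
    "":   (585519, 459508, 0),
    "O":  (98358, 16860, 470301),
    "I":  (26326, 16464, 416718),
    "OO": (26259, 3307, 68792),
    "IO": (5055, 1334, 19937),
}

def transition_probabilities(out, inc, gap):
    """Normalise the three outgoing transition counts of one state."""
    total = out + inc + gap
    return {"O": out / total, "I": inc / total, "gap": gap / total}

probs = {state: transition_probabilities(*c) for state, c in counts.items()}
```

With these counts, an outgoing first call is followed by another outgoing call in roughly 17% of cases, against about 6% after an incoming first call.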
Conditional transition frequencies
• 1058 states ending '' (2319943 calls): freq out = 771231, in = 503685, gap = 1045027
• …
• 573 states ending 'O' (771231 calls): freq out = 153845, in = 23929, gap = 593457
• 484 states ending 'I' (503685 calls): freq out = 31867, in = 20248, gap = 451570
• 1 state ending 'O' and of length 1 (585519 calls): freq out = 98358, in = 16860, gap = 470301
• 1 state ending 'I' and of length 1 (459508 calls): freq out = 26326, in = 16464, gap = 416718
• 2 states ending 'O' and of length 2 (124684 calls): freq out = 31314, in = 4641, gap = 88729
• 2 states ending 'I' and of length 2 (33324 calls): freq out = 3692, in = 2254, gap = 27378
• …
• 472 states ending 'OO' (153845 calls): freq out = 48990, in = 5274, gap = 99581
• 108 states ending 'OI' (23929 calls): freq out = 3821, in = 1437, gap = 18671
• 100 states ending 'IO' (31867 calls): freq out = 6497, in = 1795, gap = 23575
• 375 states ending 'II' (20248 calls): freq out = 1720, in = 2347, gap = 16181
• 1 state ending 'OO' and of length 2 (98358 calls): freq out = 26259, in = 3307, gap = 68792
• 1 state ending 'OI' and of length 2 (16860 calls): freq out = 2371, in = 915, gap = 13574
• 1 state ending 'IO' and of length 2 (26326 calls): freq out = 5055, in = 1334, gap = 19937
• 1 state ending 'II' and of length 2 (16464 calls): freq out = 1321, in = 1339, gap = 13804
• …
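These conditional figures look like per-state counts summed over every state whose identifier ends with a given suffix, optionally restricted to one identifier length. A minimal sketch of that aggregation, assuming a hypothetical `aggregate` helper and using only the four length-2 states, might look like this:

```python
# state -> (out, in, gap) for the four states of length 2, copied from the slides
counts = {
    "OO": (26259, 3307, 68792),
    "OI": (2371, 915, 13574),
    "IO": (5055, 1334, 19937),
    "II": (1321, 1339, 13804),
}

def aggregate(counts, suffix, length=None):
    """Sum (out, in, gap) over states ending with `suffix`.

    Returns (number of matching states, out, in, gap). If `length` is
    given, only identifiers of exactly that length are included.
    """
    n_states = out = inc = gap = 0
    for state, (o, i, g) in counts.items():
        if not state.endswith(suffix):
            continue
        if length is not None and len(state) != length:
            continue
        n_states += 1
        out, inc, gap = out + o, inc + i, gap + g
    return n_states, out, inc, gap

ending_o = aggregate(counts, "O", length=2)   # (2, 31314, 4641, 88729)
ending_i = aggregate(counts, "I", length=2)   # (2, 3692, 2254, 27378)
```

Both results agree with the "of length 2" rows listed above, which supports this reading of the table.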
Current status and future directions
• The work thus far is a proof-of-principle investigation of what is possible
  • It has only scratched the surface of what can be done with the dataset
  • Specific ideas for further work follow
  • In particular, demographic data can also be considered
• The software is presently a standalone C program
  • However, it could be developed into functions for R or Stata
  • The functionalities in the current software can be combined and developed further
  • Sophisticated analysis of call dynamics will be possible
Ideas for future topics
• Link the analysis to the demographic data
  • Are people living alone more likely to make batch calls or grapevine calls?
  • How do age and gender of the household inhabitants affect the dynamics of calling patterns?
• Determine whether it’s possible to devise a useful method to estimate the number of people in a house, or even age, gender etc.
  • May be able to detect home businesses or teenagers
  • It might not be feasible, but it’s worth investigating
  • We could test the output from our method against our existing demographic data (hypothesis testing)
Further future topics
• On 22 occasions, a subscriber made and received 200 or more calls in a single day
  • This could be investigated in more detail, for example by day of week and time of day
  • Explanations could be sought for this behaviour
  • Calls to an ISP, or cold calling
  • Demographic data would indicate if particular types of households exhibit this behaviour
• Develop a more sophisticated Markov chain model which considers whether the same phone number occurs more than once in a call group
• Study the dynamics of call timings and durations over a long period between two specific numbers
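Finding such heavy-use days amounts to counting calls per household per calendar day. The sketch below is one possible approach, not the authors' code: the function name and the input layout are assumptions, incoming and outgoing calls are counted together, and the example data is synthetic.

```python
from collections import Counter
from datetime import datetime

def heavy_days(calls, threshold=200):
    """Return {(household_id, date): n_calls} for days with at least `threshold` calls.

    `calls` is an iterable of (household_id, start_datetime) pairs that
    covers both incoming and outgoing calls.
    """
    per_day = Counter((household, start.date()) for household, start in calls)
    return {key: n for key, n in per_day.items() if n >= threshold}

# Synthetic example: household 1 has 200 calls on one day, household 2 has 199
example = [(1, datetime(1999, 3, 1, i // 60, i % 60)) for i in range(200)]
example += [(2, datetime(1999, 3, 1, i // 60, i % 60)) for i in range(199)]
busy = heavy_days(example)   # only household 1 reaches the threshold
```

The day of week for each flagged date is then available through `date.weekday()`, which would support the day-of-week breakdown suggested above.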
One-person households
• People living alone are a special case
  • Particularly when the subscriber is physically isolated from friends
• It’s virtually certain who is making and receiving each call
  • A possible exception is visitors
  • Use of mobile phone records would also solve this anonymity problem
• The effect of age and gender on calling patterns would be tied to specific individuals
• Would probably have to be aware of sample bias
  • Relatively small fraction of customers
  • Even smaller proportion of calls between two one-person households
• For example, could compare calling frequencies and durations between genders
  • Compare with findings by Friebel (Greece and Italy), and by Smoreda (France)
  • Friebel found that women make fewer but longer mobile calls on average
Effect of life rhythms
• Corroborate and extend existing results from Lacohee and Anderson
  • This existing study is based on self-report data (time-use diaries)
• Effect of occupation on calling times
  • Some occupations require shift working
• Effect of having children in the household on call distribution in the evening
  • Households with children use the phone less from 17:00 to 20:00 than those without children
  • The reverse is true from 21:00 to 23:00
• Generate a more extensive set of results from the dataset and consider the influence of other demographic factors
Bibliography
• Zbigniew Smoreda, Christian Licoppe, “Gender-Specific Use of the Domestic Telephone”, Social Psychology Quarterly, vol 63, no 3, 2000, pp238-252.
• Hazel Lacohee, Ben Anderson, “Interacting with the Telephone”, International Journal of Human-Computer Studies, vol 54, no 5, May 2001, pp665-699.
• Guido Friebel, Paul Seabright, “Do Women Have Longer Conversations? Telephone Evidence of Gendered Communication Strategies”, Journal of Economic Psychology, vol 32, 2011, pp348-356.