
Khurshid Ahmad, Department of Computing, University of Surrey

The quality of social interaction: Towards an automatic analysis of sentiments in informative and persuasive texts. Khurshid Ahmad, Department of Computing, University of Surrey; Department of Computer Science, Trinity College, Dublin, Ireland


Presentation Transcript


  1. The quality of social interaction: Towards an automatic analysis of sentiments in informative and persuasive texts. Khurshid Ahmad, Department of Computing, University of Surrey; Department of Computer Science, Trinity College, Dublin, Ireland. Workshop on Information Management and e-Science, Lancaster e-Science Centre, Lancaster University, 5th October 2005

  2. Motivation Newly emergent subjects and e-Science: Behavioural Economics; Investor Psychology; Social Studies of Finance; Economic Sociology. ‘The number of items of quantitative and qualitative information available to a well-equipped actor is, in effect, infinite, yet the capacity of any agencement [humans, machines, algorithms, location, ...] to apprehend and to interpret that data is finite’ (Hardie and MacKenzie 2005). ‘The economies of calculation’ (MacKenzie 2003, 2004, 2005)

  3. Motivation Newly emergent subjects and e-Science: “I remember ’29 very well,” Steinbeck writes (2002: 17), “We had it made…I remember the drugged and happy faces of people who built paper fortunes in stocks they couldn’t possibly have paid for…Their eyes had the look you see around the roulette table.” Then, however, “came panic, and panic changed to dull shock…People remembered their little bank balances, the only certainties in a treacherous world. They rushed to draw the money out. There were fights and riots and lines of policemen. Some banks failed; rumors began to fly”

  4. Motivation • Of all the contested boundaries that define the discipline of sociology, none is more crucial than the divide between sociology and economics […] Talcott Parsons, for all [his] synthesizing ambitions, solidified the divide. “Basically,” […] “Parsons made a pact ... you, economists, study value; we, the sociologists, will study values.” • If the financial markets are the core of many high-modern economies, so at their core is arbitrage: the exploitation of discrepancies in the prices of identical or similar assets. MacKenzie, Donald. 2000b. “Long-Term Capital Management: a Sociological Essay.” In Herbert Kalthoff, Richard Rottenburg and Hans-Jürgen Wagener (Eds.), Ökonomie und Gesellschaft. Marburg: Metropolis. pp 277-287.

  5. Motivation • Social studies of finance repopulates abstracted financial markets with human • traders and speculators, who have particular and complex relations to what they understand to be the market; • inventors of market models and formulas, that prove to be contested and fallible interpretations of economic reality rather than unproblematic representations; • designers of technology and risk assessment models, which have normative choices and criteria at their hearts; and • journalists who do not just write impassive financial news, but play important roles in marketing financial products and creating space for speculation in everyday life. de Goede, Marieke (2005). "Resocialising and Repoliticising Financial Markets: Contours of Social Studies of Finance". Economic Sociology. Vol. 6, No. 3, July 2005.

  6. Motivation Newly emergent subjects and e-Science: Criminology: Crime Perception, Detection and Prevention; Anthropology: Ethnic and Cultural Identity. ‘The number of items of quantitative and qualitative information available to a well-equipped actor is, in effect, infinite, yet the capacity of any agencement [humans, machines, algorithms, location, ...] to apprehend and to interpret that data is finite’ (Hardie and MacKenzie 2005)

  7. Motivation: Bounded Rationality • Herbert Simon • Mechanisms of Bounded Rationality – rationality is bounded when it falls short of omniscience – largely due to failures of knowing all of the alternatives, uncertainty about relevant exogenous events, and inability to calculate consequences (pp 356) • Human behaviour, even rational human behaviour, is not to be accounted for by a handful of invariants (pp 367)

  8. Motivation: Sentiment Analysis? • In the 1960’s and 1970’s “The unpredictability of inflation was a primary cause of business cycles”. • Friedman: “the level of inflation was not a problem; it was the uncertainty about future costs and prices that would prevent entrepreneurs from investing and lead to a recession” (Milton Friedman 1977). • Friedman’s conjecture “could only be plausible if the uncertainty were changing over time so this was my goal. Econometricians call this heteroskedasticity.” (Robert Engle 2003) Friedman, M. (1977). "Nobel Lecture: Inflation and Unemployment". Journal of Political Economy, 85, 451-472. Engle, Robert (2003). Risk and Volatility: Econometric Models and Financial Practice. Nobel Lecture, December 8, 2003.

  9. Motivation: Sentiment Analysis? • Two strands of literature imply asymmetry in the response of exchange rates to news. • First Strand: bad news in “good times” should have an unusually large impact • Second Strand: “bad news should have unusually large effects” • Robert Engle shared the 2003 Nobel Prize in Economic Sciences for formulating the impact of ‘news’ on economic and financial variables. ‘News’ was code for the ‘announcement of key economic indices by various agencies’. Torben G. Andersen, Tim Bollerslev, Francis X. Diebold & Clara Vega (2002). Micro Effects of Macro Announcements: Real-Time Price Discovery in Foreign Exchange. Working Paper 8959. Cambridge, MA: National Bureau of Economic Research. http://www.nber.org/papers/w8959

  10. Motivation: Bounded Rationality • Daniel Kahneman • Maps of Bounded Rationality – Two generic modes of cognitive function: an intuitive mode, where judgements and decisions are made automatically and rapidly, and a controlled mode which is deliberate and slower (pp 449) • Kahneman and Tversky found that intuitive judgements occupy a position […] between automatic operation of perception and the deliberate operations of reasoning (e.g. discrepancy between statistical judgement and statistical knowledge) (pp 450) • Highly accessible features will influence decisions, while features of low accessibility will be largely ignored (pp 459) • Abrupt transition from risk aversion to risk seeking could not be plausibly explained by a utility function for wealth (pp 461)

  11. Motivation: Bounded Rationality [Figure: Japanese yen/US dollar exchange rate (decreasing solid line); US consumer price index (increasing solid line); Japanese consumer price index (increasing dashed line), 1970:1 − 2003:5, monthly observations] Why is it that the Japanese consumer price index follows the same trend as the US CPI?

  12. Motivation: I wrote therefore I existed; I may write and change the world ++ Language and text are constitutive (and not merely representational) -- ‘society is not reducible to language and linguistic analysis (Hodgson 2000:62). -- Discourses are broader than language, being constituted not just in texts, but also in definite institutional and organizational practices’ (Jackson 2004). ++ But text is all we have after the event, the interview, the survey, the news, the review – a trace of the sentiment.

  13. The quality of social interaction, or the world according to Khurshid Ahmad. Any analysis of the interaction between the members of a well-defined social group, where each is engaged in optimising return on his or her economic and social investment, should involve an analysis of the 'sentiments' of the group members

  14. The quality of social interaction, or the world according to Khurshid Ahmad. The sentiment is expressed in the news and views that emanate for and on behalf of the members in free natural language writing and speech excerpts. The quantifiable aspects of the exchange of objects abstract (power) and concrete (money, goods, and services) have to be assessed in the context of how the news and views may impact on the exchange.

  15. The quality of social interaction, or the world according to other folk. More importantly the sentiment may be expressed through action: (a) panic buying and selling of financial instruments by the investors and traders, and (b) the sometimes complacent attitude of the regulators, are good examples of economic, social and political action by individuals and groups. Simon, H.A. (1978). “Rational Decision-Making in Business Organizations”. Nobel Lectures, Economics 1969-1980, (Editor) Assar Lindbeck, World Scientific Publishing Co.: Singapore, 1992. http://www.nobel.se/economics/laureates/1978/simon-lecture.html. Kahneman, D. (2002). “Maps of Bounded Rationality: A perspective on Intuitive Judgement and Choice”, Les Prix Nobel 2002. (Editor) Professor Tore Frangsmyr. http://www.nobel.se/economics/laureates/2002/kahneman-lecture.html. MacKenzie, Donald. (2000). ‘Fear in the Markets’. London Review of Books. Vol 22 (No. 8).

  16. The quality of social interaction, or the world according to other folk. Actions motivated by panic can equally well be seen in mass hysteria related to national/ethnic identity that, in turn, can motivate concerns related to security and safety (Jackson 2004). Jackson, Richard (2004). ‘The Social Construction of Internal War’. In (Ed.) Richard Jackson. (Re)Constructing Cultures of Violence and Peace. Rodopi: Amsterdam/New York.

  17. e-Science and social interaction? • The UK e-Science programme is moving towards successful completion. • Major contributions have been made to UK science and technology: • Bioinformatics, psychiatry, chemistry and engineering (Discovery Net and myGrid) • New ways of doing chemistry (CombeChem) • Visualisation of complex systems (RealityGrid); • Novel design (GEODISE); • Safer aircraft (DAME)

  18. e-Science and social interaction? • Crime, conflict, and economy are deeply interrelated and highly interactive. • However, data and methods in each area are in a mono-disciplinary silo, referred to by some as data tombs, where access to others requires significant mediation. • Data required in each case includes quantitative data, textual data, and historical data.

  19. e-Science and social interaction? • Social sciences and the so-called hard sciences increasingly use complementary methodologies, and a century or more of discussion of methodology, statistical methods and structural models is witness to this. • E-Science offers the potential for convergence of scientific methods through provision of a common underlying structure, or "grid", of computational methods, data-base technologies and conceptual models.

  20. e-Science and social interaction? • Social scientists often want to develop evidence-based substantive theory. They want to know “what determines what”, e.g. long term unemployment and social exclusion • And social scientists want to explore the consequences of policy changes on individual behaviour, e.g. encouragement to stay on at school on educational attainment, truancy, and social exclusion • Social science data sets may be small (<10GB, with some exceptions) but they are complex (Imitation is the sincerest form of flattery – Rob)

  21. e-Science and social interaction?

  22. The Surrey Society Grid Demonstrator • Was developed under the aegis of the ESRC e-Social Science Programme (FINGRID). • Demonstrated how Grid technologies could support novel research activities in financial economics that involve • the rapid processing of large volumes of time-varying qualitative and quantitative data (Monte Carlo simulation, wavelet analysis, fuzzy logic and neural network based simulations) • fusing/visualising of such qualitative and quantitative data (qualitative data: news, e-mails; quantitative data: non-stationary and heteroskedastic data collated at different frequencies and in different units).

  23. The Society Grid Demonstrator • Globus Toolkit 3.0 (based on Open Grid Services Architecture (OGSA)) • Java CoG Kit (Java Commodity Grid) for resource management and system integration • Languages for Development: • Java for the implementation of the application • Reuters SSL Developer’s Kit (Java) for the connection with the Reuters streaming data • Other Technologies: • XML (NewsML) for the news information • JMatLink (adapted to the Linux environment) for communication with the Matlab environment • CGI for communication of the Java Applet with the server side

  24. The Society Grid Demonstrator • Live financial data: news, historical time series data and tick data provided by Reuters (Reuters SSL SDK). • Time series analysis: a FORTRAN bootstrap algorithm, and the MATLAB toolkit for Wavelet Analysis (via JMatLink) • News/Sentiment analysis: System Quirk components for terminology extraction, ontology learning and local grammar analysis. • Visualisation and fusion: System Quirk components for corpus visualisation, financial charting, and data fusion.

  25. Design and Performance of the Society Grid [Figure: time in ms (log scale) against number of CPUs]

  26. The new (e-) Social Sciences? • Social sciences deal with collectives, or agencements comprising human beings, technical devices, algorithms, workplaces and so on (Callon 1998), such that the number of items of quantitative and qualitative information available to a well-equipped economic actor, or agencement, ‘is, in effect, infinite, yet the capacity of any agencement to apprehend and to interpret that data is finite’ (Hardie and MacKenzie 2005) Callon, Michel. (1998). The Laws of the Markets. Oxford: Blackwell. Hardie, Iain & MacKenzie, Donald. (July 2005). An Economy of Calculation: Agencement and Distributed Cognition in a Hedge Fund (http://www.sps.ed.ac.uk/staff/An%20Economy%20of%20Calculation.pdf)

  27. The new (e-) Social Sciences? • The number of data items available to an agencement in a market place – financial instruments, commodity markets, e-Bay (?) – is potentially infinite, but at any given time only a fraction of that data can be processed. The market place is a fickle place and the information derived from historical data can be so quickly outdated that there is a need ‘in any agencement for a selective, socially distributed, technologically-mediated “economy of calculation”’. • “The economies of calculation and the agencements that underpin them stretch beyond individual firms: the sifting of information often takes place in networks of interacting participants. The features of processes involved – for instance, where agency lies, the types of information that are deemed relevant or irrelevant, how that information is processed – are consequential. They affect, for example, the possibility of a ‘global’ market and help shape how ‘markets’ and ‘politics’ interact.” (Hardie & MacKenzie 2005). Hardie, Iain & MacKenzie, Donald. (July 2005). An Economy of Calculation: Agencement and Distributed Cognition in a Hedge Fund (available from D.MacKenzie@ed.ac.uk)

  28. The new (e-) Social Sciences? • Sentiments and the sociology of financial markets • MacKenzie has focused on how a mathematical-economics theory is used to create a new instrument – especially arbitrage (MacKenzie 2003) and options markets (MacKenzie and Millo 2003, MacKenzie 2004) – and then the theory is used to explain and monitor the workings of the instrument. • MacKenzie, Knorr-Cetina and others are studying the rise of electronic markets – where people in distant geographical locations can be ‘interactionally present’. MacKenzie, Donald. (2003). ‘Long-Term Capital Management and the sociology of arbitrage’. Economy and Society Vol. 32 (No. 3). pp 349-380.

  29. The new (e-) Social Sciences? Sentiments and the sociology of financial markets • MacKenzie used interviewing techniques to understand the collapse of a large arbitrage firm (Long-Term Capital Management, LTCM), a firm that pioneered trading of financial instruments that sought to profit from price discrepancies; the 24/7 watch on price discrepancies requires a distributed computational infrastructure. • MacKenzie (2003) has looked at the change in the value of the instruments and has conducted just under 70 interviews with partners and employees of the failed firm, including a Nobel Laureate who was a partner, and with other experts, together with documents that were found to have precipitated or hastened the demise of LTCM. The sentiment about LTCM as expressed in the interviews, and in some of the key documents, formed the basis of an analysis of a set of time series and the computation of key parameters of the time series. MacKenzie, Donald. (2003). ‘Long-Term Capital Management and the sociology of arbitrage’. Economy and Society Vol. 32 (No. 3). pp 349-380.

  30. The new (e-) Social Sciences? Sentiments and the sociology of financial markets • MacKenzie found that he was working with a community of people who had organized themselves and knew each other. There was evidence that imitation by others of the business model and practices adopted by the firm played a major role in the demise of the firm. Most importantly for us, MacKenzie cites the existence of a fax sent by one of the principals of the firm that asked investors to make more investment as problems had started to arise: this fax was posted on the Internet within five minutes of its dispatch and contributed to the demise of the firm. The sentiments expressed by the principal were misconstrued by the recipients and, despite the fairly sound reasons expressed in the fax, albeit in a febrile atmosphere, the bounded rationality of the recipients came into play. MacKenzie, Donald. (2003). ‘Long-Term Capital Management and the sociology of arbitrage’. Economy and Society Vol. 32 (No. 3). pp 349-380.

  31. The new (e-) Social Sciences? • Sentiments and the sociology of financial markets • Knorr-Cetina and Bruegger (2002) have looked at the emergence of electronic markets and focused on the virtual societies being formed in the financial markets through the infrastructure that supports electronic trading. • The trading room operative is in a disembodied world dealing with an on-screen reality that ‘lacks an off-screen counterpart’ – a form of representation (‘appresentation’) of markets. The operative is connected to others through electronic mail, news and data feeds (this is not explicitly dealt with in Knorr-Cetina and Bruegger), and has access to a computing system that can process very complex data in a timely and efficient manner. • This virtual world has fast throughput of data and processed information and the rapidity of the interaction perhaps compensates for the disembodied nature of the electronic trading markets. Knorr-Cetina, Karin & Bruegger, Urs. (2002). ‘Global Microstructures: The Virtual Societies of Financial Markets’. American Journal of Sociology. Volume 107, pp 909-950.

  32. The new (e-) Social Sciences? There is a constant stream of news and e-mails in a dealing room. Some directly from news agencies (*) and some annotated items based on the news Hardie, Iain & MacKenzie, Donald. (July 2005). An Economy of Calculation: Agencement and Distributed Cognition in a Hedge Fund (available from D.MacKenzie@ed.ac.uk)

  33. The new (e-) Social Sciences? There is a constant stream of news and e-mails in a dealing room. Some directly from news agencies (*) and some annotated items based on the news Hardie, Iain & MacKenzie, Donald. (July 2005). An Economy of Calculation: Agencement and Distributed Cognition in a Hedge Fund (available from D.MacKenzie@ed.ac.uk)

  34. The new (e-) Social Sciences? Hardie, Iain & MacKenzie, Donald. (July 2005). An Economy of Calculation: Agencement and Distributed Cognition in a Hedge Fund (available from D.MacKenzie@ed.ac.uk)

  35. The new (e-) Social Sciences? But whilst the trader is not ‘reading’ the news off the live news wire streams – Reuters, Bloomberg, BBC, CNN – somebody else is eyeballing the news for the content (Brazilian economics, Chilean politics) and the sentiment (bonds so hot that they were on fire!) Hardie, Iain & MacKenzie, Donald. (July 2005). An Economy of Calculation: Agencement and Distributed Cognition in a Hedge Fund (available from D.MacKenzie@ed.ac.uk)

  36. The classical Social Sciences: Eyeballing the text! • The key requirement in contemporary social sciences is to complement the analysis of a range of data sets, demographic, economic and political, with data related to the person (Kahneman 2002, Simon 1972), or lived experience (Sacks 1992, Silverman 2004) Sacks, H. (1992). Lectures on Conversation. Oxford: Blackwell Publishers (Ed. Gail Jefferson). Silverman, David. (2004). ‘Who cares about experience?’. In (Ed.) David Silverman. Qualitative Research. London: Sage Publications. pp 342-367.

  37. The classical Social Sciences: Eyeballing the text!

  38. The classical Social Sciences: Eyeballing the text! • What is missing in the qualitative analysis packages? • The texts have to be eye-balled – most phrases, clauses and paragraphs have to be coded/annotated by hand, an impossible task when the volume of text all around us is exploding; • There is a need for a domain-specific thesaurus (conceptually-organised terminology or ‘ontology’) for each new domain: • Identify ontological commitments; • Find terms, and the broader/narrower equivalents; synonyms and antonyms; • Maintain terminology databases • Texts that are conceptually similar within a domain have to be clustered using unsupervised learning algorithms
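
The last requirement on this slide, clustering conceptually similar texts without supervision, can be sketched roughly as follows. This is a minimal illustration only, not the method used in the presentation: the bag-of-words vectors, the cosine similarity measure, the single-pass "leader" clustering and the 0.3 threshold are all assumptions chosen for brevity, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words vector: lower-cased alphabetic tokens mapped to counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(texts, threshold=0.3):
    """Single-pass (leader) clustering, an assumed stand-in for any
    unsupervised algorithm: each text joins the first cluster whose
    leader is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (leader_vector, [text indices])
    for index, text in enumerate(texts):
        vec = term_vector(text)
        for leader, members in clusters:
            if cosine(vec, leader) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((vec, [index]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    # Hypothetical example texts, not taken from the presentation.
    sample = [
        "shares rose on strong bank profits",
        "bank shares rose sharply",
        "police report crime figures",
        "crime figures fall police say",
    ]
    print(cluster(sample))
```

In practice the raw word counts would be replaced by counts of the domain terminology the slide calls for, so that similarity reflects shared concepts rather than shared function words.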

  39. The new (e-) Social Sciences? Towards an automatic analysis • What is missing in the qualitative analysis packages?

  40. The new (e-) Social Sciences? Towards an automatic analysis • One key result of close social interaction is the emergence of a sub-set of the natural language of a given community that is idiosyncratic to the desires, aspirations, goals and prejudices of the community, reflecting the idiosyncratic nature of the ontological commitment of the community; • The subset has its own lexicogrammar and is called the language for special purposes of a given specialism • Lexicogrammar: Vocabulary (terminology) + Local Grammar

  41. The new (e-) Social Sciences? Towards an automatic analysis. July 2005 Reuters Financial News Service: news items disambiguated using an automatically extracted terminology and an automatically constructed local grammar that only recognises changes in financial instruments

  42. The new (e-) Social Sciences? Towards an automatic analysis Changes in ‘semantic orientation’ for a news input, for July 2005 for all shares in the FTSE.

  43. The new (e-) Social Sciences? Towards an automatic analysis • There is no obvious technique in social science research methods that can improve the researcher's productivity in collecting and analysing large volumes of speech and text. • Social scientists survey, and occasionally interview, interesting individuals in various social groups – analyse the survey form and quantify. • So what about the data collected in the field? Data is buried in tombs never to be taken out again. • Most text, if coded at all, is hand-coded by the social science researcher and then the proxy of the interpretation of the codes is presented as objective analysis.

  44. The new (e-) Social Sciences? Towards an automatic analysis • We present a method for systematically identifying sentiment bearing phrases in large volumes of streaming texts – a local grammar comprising templates to extract the phrases with a minimal number of false positives. • The sentiments are aligned with quantitative (time-varying) information and results co-integrated and tested for Granger causality • The grammar itself is constructed automatically from a corpus of domain specific texts
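
As a rough illustration of what a local grammar template for sentiment-bearing phrases might look like, the sketch below uses one regular expression that recognises only statements about a change in a named instrument and maps the movement verb to a polarity. Everything in it is an assumption made for illustration: the verb lists, the pattern, the function name and the example sentence are hypothetical and hand-written, whereas the slide describes a grammar constructed automatically from a corpus.

```python
import re

# Hypothetical movement vocabulary; the presentation's grammar is learned
# from a corpus of domain-specific texts rather than listed by hand.
UP_WORDS = ("rose", "gained", "climbed", "jumped")
DOWN_WORDS = ("fell", "dropped", "slipped", "plunged")

# One template: <Instrument> [shares|stock] <movement> [by] <amount> <unit>
TEMPLATE = re.compile(
    r"(?P<instrument>[A-Z][\w&.-]*(?:\s+[A-Z][\w&.-]*)*)\s+"
    r"(?:shares\s+|stock\s+)?"
    r"(?P<movement>" + "|".join(UP_WORDS + DOWN_WORDS) + r")\s+"
    r"(?:by\s+)?"
    r"(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>%|percent|points)"
)


def extract_sentiment(sentence):
    """Return (instrument, polarity, amount, unit) for each template match.

    Polarity is +1 for an upward movement and -1 for a downward one.
    Sentences that do not fit the template yield nothing, which keeps
    false positives low at the cost of missed phrases.
    """
    results = []
    for match in TEMPLATE.finditer(sentence):
        polarity = 1 if match.group("movement") in UP_WORDS else -1
        results.append(
            (
                match.group("instrument"),
                polarity,
                float(match.group("amount")),
                match.group("unit"),
            )
        )
    return results


if __name__ == "__main__":
    print(extract_sentiment("BP rose 3 percent while Barclays dropped by 1.2%"))
```

The signed, time-stamped values produced this way form a sentiment series that could then be aligned with price or volume series for the co-integration and Granger causality tests mentioned on the slide; those tests are not shown here.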

  45. Conclusions and Future Work • The methods developed in the Society Grids project can be used • to investigate a person’s perception of his or her own well-being, at different times and in different places, and in its various facets: social, political and economic. • This perception can be the same as, or at variance with, for example, crime statistics, economic indicators, and the achievements or failures of (other) ethnic/racial categories. • These methods can be extended to new areas like • the reassurance gap in policing • totalising war discourse that leads to ethnic/racial conflicts

  46. Towards an automatic analysis of sentiments? We rely on reviews and opinion polls of various kinds: • Film & TV reviews; Book reviews; Resort reviews • Bank reviews; Automobile reviews; White goods reviews; • Consumer surveys; ‘write your own’ reviews; • Newspaper editorials; Editors’ choice.

  47. Towards an automatic analysis of sentiments? • We rely on the sentiment of the reviewers, editors, investment experts, and …… • We do know the cost of durables, shares, holidays. • A reasonable price is rejected if the reviews are poor; an exorbitant price is acceptable if the reviews are good; • Bad reviews stick in the mind for longer than good reviews.

  48. Towards an automatic analysis of sentiments? • We rely on the sentiment of the more vociferous in the society sometimes • The vociferous may call black white, and white black; • The vociferous may repudiate facts and purvey fiction.

  49. Towards an automatic analysis of sentiments? A new bank has just been launched: Punter Smith has passed his judgement on the bank. Which of the two columns tells us that he likes the new outfit? Turney, Peter D. (2002). “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews”. In Proc of the 40th Ann. Meeting of the Ass. for Comp. Linguistics (ACL). Philadelphia, July 2002, pp. 417-424. (Available at http://acl.ldc.upenn.edu/P/P02/P02-1053.pdf).

  50. Towards an automatic analysis of sentiments? How can a machine detect the positive/negative sentiment from texts? We eyeball the collocation of words like excellent & poor in a text corpus. The pointwise mutual information (PMI) is computed between word1 & word2: $\mathrm{PMI}(word_1, word_2) = \log_2 \frac{p(word_1 \,\&\, word_2)}{p(word_1)\, p(word_2)}$. The semantic orientation (SO) of a phrase is given as: $\mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{``excellent''}) - \mathrm{PMI}(phrase, \text{``poor''})$. Turney, Peter D. (2002). “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews”. In Proc of the 40th Ann. Meeting of the Ass. for Comp. Linguistics (ACL). Philadelphia, July 2002, pp. 417-424. (Available at http://acl.ldc.upenn.edu/P/P02/P02-1053.pdf).
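
The computation this slide refers to can be sketched from raw co-occurrence counts. The sketch follows the definitions in Turney (2002): PMI is the base-2 log of the joint probability over the product of the individual probabilities, and semantic orientation is a phrase's PMI with "excellent" minus its PMI with "poor". The function names, the argument layout and the example counts are assumptions made for illustration and do not come from the presentation.

```python
import math


def pmi(count_joint, count_a, count_b, total):
    """Pointwise mutual information (base 2) estimated from counts.

    count_joint: contexts in which both items occur together
    count_a, count_b: contexts in which each item occurs
    total: total number of contexts
    All counts must be non-zero; real corpora need smoothing first.
    """
    p_joint = count_joint / total
    p_a = count_a / total
    p_b = count_b / total
    return math.log2(p_joint / (p_a * p_b))


def semantic_orientation(near_excellent, near_poor, hits_phrase,
                         hits_excellent, hits_poor, total):
    """SO(phrase) = PMI(phrase, 'excellent') - PMI(phrase, 'poor').

    A positive value suggests the phrase keeps company with 'excellent'
    more than with 'poor'; a negative value suggests the opposite.
    """
    return (
        pmi(near_excellent, hits_phrase, hits_excellent, total)
        - pmi(near_poor, hits_phrase, hits_poor, total)
    )


if __name__ == "__main__":
    # Hypothetical counts for a single phrase in a review corpus.
    score = semantic_orientation(
        near_excellent=8, near_poor=2, hits_phrase=100,
        hits_excellent=200, hits_poor=200, total=10000,
    )
    print("positive" if score > 0 else "negative", score)
```

Because the phrase count and the corpus size cancel in the difference, the orientation depends only on the ratio of the two co-occurrence counts and the ratio of the two reference-word counts, which is why it can be estimated from hit counts alone.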
