Matching in Information Systems ISD3 Lecture 11
Contents • Matching exercises • Integrity and Fidelity • Fidelity as a matching problem – between the world and its representation in the system • Stateful-stateless interaction • Co-evolution of user-machine fitness
Fuzzy Matching in the Telephone Directory • UWE telephone directory • The only fuzzy matching is partial matching on the initial string • ‘wall’ finds ‘wallace’, ‘wallis’, ‘walls’, … • Easy to do in SQL • … where surname like 'reqsurname%' • Substring matching anywhere is slower (a leading wildcard generally prevents use of an index on surname) • … where surname like '%reqsurname%'
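The two LIKE patterns above can be tried out with a small sketch. It assumes an in-memory SQLite table named person (the table name, the helper find_surnames and the sample rows are illustrative, not taken from the UWE directory). Note that SQLite's LIKE is case-insensitive for ASCII by default; other databases may differ.

```python
import sqlite3

def find_surnames(conn, reqsurname, anywhere=False):
    """Return surnames matching reqsurname as a prefix, or anywhere if requested."""
    # Prefix match: 'wall%'. Substring match: '%wall%'.
    pattern = f"%{reqsurname}%" if anywhere else f"{reqsurname}%"
    rows = conn.execute(
        "SELECT surname FROM person WHERE surname LIKE ? ORDER BY surname",
        (pattern,),
    ).fetchall()
    return [row[0] for row in rows]

# Hypothetical sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (surname TEXT, firstname TEXT, extno TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("wallace", "a", "1001"),
        ("wallis", "b", "1002"),
        ("walls", "c", "1003"),
        ("cornwall", "d", "1004"),
        ("smith", "e", "1005"),
    ],
)

prefix_hits = find_surnames(conn, "wall")
substring_hits = find_surnames(conn, "wall", anywhere=True)
```

Here the prefix search misses ‘cornwall’, which only the substring search finds. Passing the pattern as a bound parameter also avoids building SQL from user input.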
Telephone Schema • Person: Surname : str, Firstname : str, ExtNo : str • Facilities (‘help desk’, ‘reception’ etc.) forced to fit Person schema • Lack of inclusion in schema creates searching problems: • Helpdesk • Help desk • CSM help desk • No support for categories of facility to control vocabulary • A Naming and Classification problem • Need for generalisation (diagram entities: Person, Contact, Facility)
Distance (fitness) function • Distance(P1, P2) = Distance(P1, P2-Pref) + Distance(P2, P1-Pref) • Individual differences: • agediff = (P1.age < P2-Pref.min or P1.age > P2-Pref.max) ? 1000 : abs(1 - P1.age / ((P2-Pref.min + P2-Pref.max) / 2)) • gendiff = (P1.gen == P2-Pref.gen) ? 0 : 1000 • s1diff = abs(P1.s1 - P2-Pref.s1) • s2diff = abs(P1.s2 - P2-Pref.s2) • Combined weighted differences • Euclidean distance • sqrt(wtage*agediff^2 + wtgen*gendiff^2 + wts1*s1diff^2 + wts2*s2diff^2 + …) • Problems • Age is a ratio scale (40 is twice as old as 20) • Preference scales are not – rating a scenario a 6 does not imply it is twice as good as a rating of 3 – Preference scales are Ordinal • Age and Gen are go/no-go – simulated by a very high value for a mismatch
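A minimal Python sketch of this distance function follows. It assumes that a gender mismatch or an out-of-range age costs 1000 and a match costs 0, as the go/no-go bullet describes, and that the age difference is measured as the relative deviation from the midpoint of the preferred range. The class and field names (Person, Pref, min_age, max_age) and the default weights of 1.0 are illustrative choices, not part of the original exercise.

```python
import math
from dataclasses import dataclass

MISMATCH = 1000.0  # very high value simulating a go/no-go failure

@dataclass
class Pref:
    min_age: float
    max_age: float
    gen: str
    s1: float  # preferred rating for scenario 1
    s2: float  # preferred rating for scenario 2

@dataclass
class Person:
    age: float
    gen: str
    s1: float
    s2: float
    pref: Pref

def one_way_distance(p, pref, wtage=1.0, wtgen=1.0, wts1=1.0, wts2=1.0):
    """Distance from person p to someone else's preferences."""
    if p.age < pref.min_age or p.age > pref.max_age:
        agediff = MISMATCH
    else:
        mid = (pref.min_age + pref.max_age) / 2
        agediff = abs(1 - p.age / mid)
    gendiff = 0.0 if p.gen == pref.gen else MISMATCH
    s1diff = abs(p.s1 - pref.s1)
    s2diff = abs(p.s2 - pref.s2)
    # Weighted Euclidean combination of the individual differences.
    return math.sqrt(
        wtage * agediff**2 + wtgen * gendiff**2
        + wts1 * s1diff**2 + wts2 * s2diff**2
    )

def distance(p1, p2):
    """Symmetric distance: each person is scored against the other's preferences."""
    return one_way_distance(p1, p2.pref) + one_way_distance(p2, p1.pref)
```

Because the ordinal preference ratings are subtracted as if they were interval data, the sketch inherits the scale problem noted in the slide; it illustrates the formula rather than resolving that issue.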
Integrity • Data in a database should agree with the rules in the schema • Checks on values • Referential integrity • Primary key • A weak schema allows erroneous data • E.g. Invalid manager relationships in the Emp-Dept example • Need for extended Business rules in middle tier of application
Fidelity • HiFi “exactitude in reproduction” • A database as an image of its Domain of Discourse (Real World) • Loss of fidelity when: • Two records in database but only one person in the RW • Address data does not correspond to an existing address in the RW • Address in database does not correspond to the current address of its owner • But fidelity only has to be ‘good enough’ for its purpose • Veracity means roughly the same – ‘truthful’
Data Quality • Poor data quality results from loss of integrity and lack of fidelity. • “Current data quality problems cost US businesses more than $600 billion per year” (report by the Data Warehousing Institute, 2002) • Gartner Research estimates that through 2005 more than 50% of business intelligence and CRM deployments will suffer limited acceptance if not outright failure due to lack of attention to data quality issues. • Direct costs of poor quality information estimated at between 10% and 20% of revenue
Information systems / computer systems • Computer system quality depends only on ensuring the system doesn’t fall over when presented with bad data • Information Systems quality depends on ensuring the system delivers information of high quality • Information System includes procedures and guidance to users to meet this need.
Problem analysis • Analyse chain of cause and effect of poor quality • Systems approach: • Information system: • Data flow model analysed for points where errors can be injected • Organisation: • Attitudes and ethos
Data Flow in the Information System • Information source • Information gathering • Information collation • Information storage • Information retrieval
Data source problems • Data has only a limited lifetime of fidelity since world is in constant flux • Length of lifetime depends on • Volatility of the data source – address for young out-of-work person or address of retired person • Need to re-validate data on a cycle dependent on the lifetime
Data capture • Data gathering procedures are a major source of error. • Integrity and Fidelity can be in conflict • If telephone number is mandatory, an operator in a hurry will enter any old number to get the record accepted • Data quality depends on training and guidance given to operators
Collation • Matching of new applicants with existing applicants is poor so duplicates generated. • Postcodes accepted even if not matching Post Office database
Storage • Database integrity failures or loss of backup data, or reload with duplicates (auto number primary key)
Improvement Process • Based on learning cycle • Shewhart cycle – Plan – Do – Check – Act • Deming cycle • Six Sigma – Define – Measure – Analyse – Improve – Control • Kolb learning cycle – act – reflect – theorise – plan
Improvement / Learning Cycle • Measure and observe the current process • Analyse / develop theory of causes of problem • Plan changes based on the theory • Put plan into effect • Measure / observe the resultant improvement …
Stateless / Stateful Interaction • Stateless • Person interacts with machine • Machine response depends only on the request (and the state of data sources..) • Each interaction is independent of previous interactions with the same person • Machine has no memory of previous interactions • Person presumably does have memory of previous interactions! • Stateful • Machine has memory of previous interactions • Response to a request depends not only on the current request but also on previous interactions • Support for ‘long-running transactions’ such as placing an order, booking a holiday, buying the best house insurance
Example stateless/stateful interactions • Person- organisation • I enter my local supermarket • I enter my local pub • Person – organisation • I make a purchase from my local supermarket with a loyalty card • I go to my local pub for a drink • Person – website • I click on a link to the UWE website • I click on a link to a site and I’m prompted to accept a cookie
Stateful interaction • Advantages • Interaction is not one sided – I remember how the system has behaved, it remembers something about me and how I’ve behaved • Interaction is more like talking to another person • Machine can make better decisions about a suitable response • State can be a problem too • Stateful behaviour can be hard to understand. • Bad memories - ‘let’s just start all over again’ • Modal dialogue problem • Application puts up a modal dialogue box which must be responded to before anything else happens. • Dialogue box gets hidden behind other windows.
The evolving person-machine system • [Diagram: User and Machine, with labels ‘machine’ and ‘user’]
Machine-side state mechanisms • A state mechanism has to deal with • What to store about the interaction • How much information about the user to retain • Issues : explicit/ implicit, transaction log, data protection act • How to store the state for the duration of the interaction • Length of interaction ranges from a site visit to ‘forever’ • Issues : what to store, security, reliability, access by other applications • Matching a user to a stored state – the ‘identity’ problem • How is a user identified • Issues : can id be spoofed, is id secure, can identity be mistaken..
Storing the state • Hidden fields in form • Server can send data to the user in a hidden field, which will then be returned when the user resubmits • Session variable • Server can store data keyed by a session variable – session id can be sent back in hidden field • Cookies • Server sends the user a cookie to store data which is sent back when the user next visits the site • Database • State is stored in a database keyed by some user characteristic
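The session-variable option can be sketched as follows: the server keeps the state itself and only a random session id travels to the client, here inside a hidden form field. The names SessionStore and hidden_field are hypothetical, and the in-memory dictionary stands in for whatever storage a real server would use.

```python
import secrets

class SessionStore:
    """Server-side state keyed by a random session id (in-memory sketch)."""

    def __init__(self):
        self._sessions = {}

    def create(self):
        # An unguessable id makes it harder to spoof another user's session.
        session_id = secrets.token_hex(16)
        self._sessions[session_id] = {}
        return session_id

    def get(self, session_id):
        # Returns the state dict, or None if the id is unknown.
        return self._sessions.get(session_id)

def hidden_field(session_id):
    """HTML hidden field that returns the session id on the next submit."""
    return f'<input type="hidden" name="session_id" value="{session_id}">'
```

Only the id is exposed to the client, so the stored data cannot be edited by the user, unlike data placed directly in a hidden field or cookie.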
Identifying the user • IP address of client machine • Session id • User id – login id, National Insurance number, passport number … • Cahoot internet bank problem last week • Address • Mobile phone number • Biometric data – finger print, iris pattern..
What to keep • State must grow and change as the system learns more about you. • State of interaction includes: • Current attributes of user : name, company.. • History of every interaction allows unanticipated questions to be asked – cf data mining • Derived / deduced attributes – total expenditure, most recent address • For data protection reasons, must not retain any more information than necessary?? • State can be defined using an ER model even if not stored in a database
Explicit / Implicit distinction • Explicit • Facts held as data in the database • the person’s name and address • Implicit • The implicit assumptions about the user which are built into the system: • The user’s language, ethnicity, location, capabilities • Implicit -> Explicit • Surfacing assumptions • Representing assumptions explicitly • multi-language responses
User’s model of the machine • Users need to develop their model of the machine to be able to use it effectively • Part of the machine’s task is to help the user develop an appropriate model of itself. • Users have an implicit model of the machine – preconceptions about how to use it. • What does a person’s model of the machine look like and how does it develop?
Strategies to help the user • Reduce the need for the user to have an extensive machine model • Provide guidance • Design the interaction to work in the way a user would naturally expect: • Donald Norman’s idea of affordance • The door handle example • Use natural language • Follow / establish standards
SMS Currency converter • Exercise last year to design an SMS currency converter. • More difficult interaction design than a web page converter: • No list of currencies to select from • Message length limits explanations • More interesting • Input is limited natural language • User is mobile
Currency converter – stateful interaction • Stateful interaction • Request: Cur 100 GBP USD • Machine stores from and to codes as state, identified by originating mobile number • Request: Cur 200 • Machine identifies the request as originating from the same user, no from or to code supplied, so defaults to stored values • Request: Cur 100 GBP EUR • From and to codes set, so update state
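The three requests above can be traced with a short sketch. It assumes a strict "Cur amount [from to]" format, state held in a dictionary keyed by mobile number, and a table of made-up exchange rates expressed as the GBP value of one unit; the class name, the rates and the reply wording are all illustrative.

```python
# Made-up illustrative rates: value of one unit in GBP.
RATES_IN_GBP = {"GBP": 1.0, "USD": 0.5, "EUR": 0.8}

HELP = "Format: Cur <amount> [<from> <to>]"

class CurrencyConverter:
    def __init__(self, rates):
        self.rates = rates
        self.state = {}  # mobile number -> (from_code, to_code)

    def handle(self, mobile, message):
        parts = message.split()
        if len(parts) not in (2, 4) or parts[0].lower() != "cur":
            return HELP
        try:
            amount = float(parts[1])
        except ValueError:
            return HELP
        if len(parts) == 4:
            codes = (parts[2].upper(), parts[3].upper())
            if codes[0] not in self.rates or codes[1] not in self.rates:
                return "Unknown currency code"
            # Codes supplied, so update the state for this mobile number.
            self.state[mobile] = codes
        elif mobile not in self.state:
            return "No currencies stored yet. " + HELP
        # Codes come from the request just stored, or from earlier state.
        from_code, to_code = self.state[mobile]
        result = amount * self.rates[from_code] / self.rates[to_code]
        return f"{amount:g} {from_code} = {result:.2f} {to_code}"
```

A second request of "Cur 200" from the same number reuses the stored GBP and USD codes, while the same message from a different number gets the help reply because no state exists for it.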
Currency Converter – message format • Natural interaction • Allow multiple and surrounding spaces • Allow all sensible ordering of codes • 100 gbp usd • Gbp 100 usd • Gbp usd (assume 1 unit) • Allow noise words • Convert 100 usd into eur • Allow synonyms • Convert 100 pounds into euros (assume GBP) • Allow mistypes? • 100 GPB ERU
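A tolerant parser for these message formats might look like the sketch below. The noise-word list, the synonym table (including the assumption that ‘dollars’ means USD) and the three supported codes are illustrative. Mistyped codes such as GPB are simply rejected here; correcting them would need a further matching step.

```python
import re

# Illustrative vocabularies; a real service would need fuller lists.
NOISE = {"convert", "cur", "into", "to", "in"}
SYNONYMS = {
    "pound": "GBP", "pounds": "GBP",
    "euro": "EUR", "euros": "EUR",
    "dollar": "USD", "dollars": "USD",  # assumption: dollars means USD
}
CODES = {"GBP", "USD", "EUR"}

def parse_request(message):
    """Return (amount, from_code, to_code), or None if not understood."""
    amount = None
    codes = []
    # split() with no argument ignores multiple and surrounding spaces.
    for token in message.lower().split():
        if token in NOISE:
            continue
        if re.fullmatch(r"\d+(\.\d+)?", token):
            if amount is not None:
                return None  # two amounts: ambiguous
            amount = float(token)
            continue
        code = SYNONYMS.get(token, token.upper())
        if code not in CODES:
            return None
        codes.append(code)
    if len(codes) != 2:
        return None
    # No amount given: assume 1 unit.
    return (1.0 if amount is None else amount, codes[0], codes[1])
```

Because the amount is recognised by its shape rather than its position, all the orderings listed above parse to the same result, with the first code taken as the source currency.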
Currency converter - help • Helpful feedback • If request not understood, give helpful response • Format of request • Codes for common currencies • Reference to source of codes • Support country to currency code query (perhaps by another service to get basic country data?) • Should help be stateful – not the same response each time, but one which depends on what has already been sent (but how long ago?)