Demand-oriented Supply of Digital Content. Tutorial at RCDL 2009 Kurt Sandkuhl 17 September 2009. Overview . Background information overload Information Logistics Information Demand Modelling User profiles Situation based Context based Evaluating Content
Demand-oriented Supply of Digital Content. Tutorial at RCDL 2009. Kurt Sandkuhl, 17 September 2009
Overview
Background
• Information overload
• Information Logistics
Information Demand Modelling
• User profiles
• Situation based
• Context based
Evaluating Content
Matching Content and Demand
Information Overload
• The phenomenon of “information overload” has been observed and studied for many decades
• Increasing amount of information: 5000 new books every day, total amount of information doubles every 5 years, 550 billion web pages (lifetime: 50-100 days), etc.
• We are dependent on information: information is an important production factor
Information Overload – Survey in the U.S.
• Source: Perspectives on Information Retrieval. Delphi Group, 2002
[Chart: estimated time business professionals spend searching for information]
[Chart: perceived impediments for locating information]
Survey in Region Småland (Sweden)
• 10 pilot interviews; questionnaires with 35 questions
• 412 companies; response rate 39.8%; 71% SME
[Chart: time needed daily to find the right information; categories: < 10 minutes, 10-30 minutes, 30-60 minutes, 60-120 minutes, > 120 minutes]
[Chart: perceived shortcomings in Intranet/DMS; categories: not product oriented, not process oriented, no subscription function, too many hits, irrelevant hits]
Information Logistics The main objective of Information Logistics is improved information supply and information flow. This is based on needs and demands with respect to the content, the time of delivery, the location, the presentation and the quality of information. The scope can be a single person, a target group, a machine/facility or any size of networked organisation. The research field Information Logistics explores, develops and implements concepts, methods, technologies and solutions for the above mentioned purpose.
Information Logistics: Demand, Content, Distribution
• Demand: information demand w.r.t. content, time, place, quality
• Content: find & select the right information content; aggregate & present
• Distribution: provide information timely in the best way
What determines our Information Demand? • Topic of interest • Task we are working on • Skills and education background • Role and position • Time relative to an event, or in absolute terms • Location • Social environment • ... and probably more
IR Perspective: Different Manifestations of Relevance
• System or algorithmic relevance: relation between a query and the information objects in a system retrieved by a given algorithm
• Topical or subject relevance: relation between the subject or topic expressed in a query and the topics or subjects covered by the retrieved information objects
• Cognitive relevance or pertinence: relation between the cognitive information need of a user and the information objects retrieved
• Situational relevance or utility: relation between the situation, task or problem at hand and the information objects retrieved
• Motivational or affective relevance: relation between the intents, goals, and motivations of a user and the information objects retrieved
Information Demand
• Information Demand (Lundqvist et al. 2004)
  • Information Demand is the constantly changing need for current, accurate, and integrated information to support (business) activities, whenever and wherever it is needed.
• Information Demand Modeling aims at capturing and formalizing all information relevant for demand-oriented information supply
• Selected approaches for Demand Modeling
  • User profile based
  • Situation based
  • Context based
User Profiles • Some Examples • Roaming profiles in MS-Windows • Cookies • Profiles in mobile phones • W3C’s Composite Capability / Preference Profiles • Definition: • A structured set of data representing preferences of and information about a user in a machine understandable way • Typical characteristics • User provides input for the profiles • Capture the different perspectives of information demand • Limits in covering dependencies between perspectives
Example: Weather Information on demand (2)
• Example: Precise weather warning based on user profiles and radar prognosis
[Diagram labels: Storm, Forecast, Logistic, User, Information flow]
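Situation-based Message Supply (1)
[Diagram: incoming messages and the user's situations on two timelines (t); situations: bus ride to airport, lunch at airport, flight]
Situation-based Message Supply (2)
[Diagram: the same two timelines (t) with an Electronic Assistant added; situations: bus ride to airport, lunch at airport, flight]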
Finding the right situation for information supply
• Objective: motivational relevance
[Diagram: relations among Message, Topic and Situation, numbered 1 to 4; labels: "is classified by", Topical Relevance, Utility (3), Acceptance (4)]
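The idea of holding a message until a suitable situation can be sketched as follows. This is only an illustration under stated assumptions, not the assistant's actual logic: the classes `Situation` and `Message`, the function `schedule`, the time units, and the rule "deliver in the earliest situation that accepts the message's topic" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    name: str
    start: int                  # start time, arbitrary units (assumption)
    end: int                    # end time, exclusive
    accepted_topics: frozenset  # topics the user accepts in this situation


@dataclass(frozen=True)
class Message:
    text: str
    topic: str    # the topic the message is classified by
    arrival: int  # arrival time, same units as the situations


def schedule(messages, situations):
    """Map each message text to (situation name, delivery time) or None.

    Assumed rule: deliver in the earliest situation that is still running
    at or after the arrival time and accepts the message's topic.
    """
    ordered = sorted(situations, key=lambda s: s.start)
    plan = {}
    for msg in messages:
        plan[msg.text] = None  # stays None if no situation fits
        for sit in ordered:
            if sit.end > msg.arrival and msg.topic in sit.accepted_topics:
                plan[msg.text] = (sit.name, max(sit.start, msg.arrival))
                break
    return plan


situations = [
    Situation("bus ride to airport", 0, 30, frozenset({"travel"})),
    Situation("lunch at airport", 30, 60, frozenset({"travel", "private"})),
    Situation("flight", 60, 180, frozenset({"work"})),
]
messages = [
    Message("gate changed", "travel", 10),
    Message("quarterly report", "work", 20),
    Message("call home", "private", 70),
]
plan = schedule(messages, situations)
```

Here the work-related message that arrives during the bus ride is held back until the flight, while the private message arriving after lunch finds no accepting situation and stays undelivered.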
Context • One of many definitions (Dey, 2001) • Context is any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves. • A system is context-aware if it uses context to provide relevant information and/or services to the user, where relevancy depends on the user’s task.
Context-based Information Demand
[Diagram labels: Enterprise Model (Process Model, Organisational Structure, Resources); Information Demand Context (Role, Activities, Selected Resources); Information Demand; Social Network (Contacts, Memberships); Situation (Time, Topic)]
[Diagram labels: Docs, Task, Task, Role, Tools, Information Demand Context, Input]
Information Demand Patterns
• An information demand pattern addresses a recurring information demand problem that arises for specific roles and work situations in an enterprise, and presents a solution to it.
• An information demand pattern consists of
  • The organisational context where the pattern is useful
  • The problems of a role that the pattern addresses
  • The solution that resolves the problem:
    • Information demand of the role
    • Quality criteria for the different parts of the information demand
    • A timeline indicating important points in time
  • The effects that play in forming a solution
• Pattern representation: textual description and visual model
Example: Material Specification Responsible (ctd.)
[Timeline diagram; time axis markers: several months, 8 weeks, 4 weeks; labels: Policy, law, regulation changes; Customer Complaints; Test results; Customer Change Requests; Production Complaints; Supplier Changes; Effectiveness of Changes]
How to decide whether information is relevant? • Different aspects for deciding on relevance • Metadata • Structure • Content • Source / location • Recommendations • etc.
Typical Approach from IR: Topic Extraction
• Topics describe the content and are used as features of a text
• Processing steps: Language Identification, Word stem, Stopwords, Compound terms, Synonym, Translation, Entity Class
• Word stem, Stopwords: words occur in various forms, some words are irrelevant
• Compound terms: "Information Logistics" is one term
• Synonym, Translation: "car", "auto", "motorcar", "bil" mean the same
• Entity Class: location (city, island), personal name, company name, date, active ingredient
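A few of these steps (compound terms, stopwords, synonyms, word stems) can be illustrated with a minimal sketch. The word lists and the suffix-stripping `stem` function below are toy assumptions; a real pipeline would use language-specific resources and a proper stemmer.

```python
import re
from collections import Counter

# Toy resources (assumptions); real lists are much larger.
STOPWORDS = {"the", "a", "an", "of", "is", "are", "and", "in", "for", "to"}
COMPOUNDS = {("information", "logistics"): "information_logistics"}
SYNONYMS = {"auto": "car", "motorcar": "car", "bil": "car"}


def stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_topics(text: str) -> Counter:
    """Return normalized topic terms with their frequencies."""
    tokens = re.findall(r"[a-zåäö]+", text.lower())

    # Step 1: join known compound terms into a single token.
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in COMPOUNDS:
            merged.append(COMPOUNDS[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1

    # Steps 2-4: drop stopwords, stem single words, map synonyms.
    topics = Counter()
    for tok in merged:
        if tok in STOPWORDS:
            continue
        if "_" not in tok:  # leave joined compounds untouched
            tok = stem(tok)
        topics[SYNONYMS.get(tok, tok)] += 1
    return topics


topics = extract_topics("Information Logistics is the supply of cars and autos")
```

In this example "cars" and "autos" both end up counted under the single topic "car", and "Information Logistics" is kept as one term.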
Content Categorization by using concept paths
• Aim: define additional meta-data based on topic extraction
• Categorization approach:
  • Taxonomic structure used for categorization (used with an established ontology or taxonomy)
  • Concept sets: mapping from the set of artefacts to the power set of the concept set occurring in the taxonomy
  • Concept paths through the taxonomy (taxonomic paths): mapping from the set of artefacts to the power set of TP (TP is the set of all taxonomic paths)
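The two mappings can be sketched in a few lines, assuming the taxonomy is stored as child-to-parent links. The names `TAXONOMY`, `taxonomic_path`, `concept_paths` and the artefact file names are illustrative assumptions; the concept names are borrowed from the tutorial's examples.

```python
# Child -> parent links of a small taxonomy (assumed layout).
TAXONOMY = {
    "Component": "Product",
    "Engine": "Component",
    "T5": "Engine",
    "Meeting": "Events",
    "Board Meeting": "Meeting",
}


def taxonomic_path(concept: str) -> tuple:
    """Return the path from the taxonomy root down to the concept."""
    path = [concept]
    while path[-1] in TAXONOMY:
        path.append(TAXONOMY[path[-1]])
    return tuple(reversed(path))


def concept_paths(concepts) -> set:
    """Map a concept set to the corresponding set of taxonomic paths."""
    return {taxonomic_path(c) for c in concepts}


# Concept sets: artefact -> subset of the taxonomy's concepts
# (hypothetical artefacts and annotations).
concept_sets = {
    "report_t5.txt": {"T5"},
    "minutes.txt": {"Board Meeting", "Engine"},
}

# Concept paths: artefact -> subset of TP, the set of all taxonomic paths.
path_sets = {doc: concept_paths(cs) for doc, cs in concept_sets.items()}
```

Storing whole paths rather than bare concepts keeps the ancestors of each concept available as meta-data, so two artefacts annotated with different engines still share the prefix Product → Component → Engine.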
Taxonomic Path Sets
• Product → Component → Engine
• Events → Meeting → Board Meeting
• Location → ISST → Berlin → R6.30
[Taxonomy diagram labels: Business, Location, Product, ING, Component, Engine, Electric, Light, T5, M6, Front Light; relation: subsume]
Semantic Similarity
• Document 1, "Report on the product T5": The engine T5 will start to be produced in late October 2004. Until then the engine will be extensively tested.
• Document 2, "Update on the product T5": The tests of the engine T5 failed several times. The engine will be redesigned until October 2004.
• Both documents are annotated with the taxonomic path Product → Component → Engine → T5 …
[Diagram: numbered concept nodes for each document, combined with + and compared with ≈]
How to use demand models for Content Supply? Content meta-data(expressed as taxonomic paths) Information demand model (expressed in enterprise modeling language) Comparing concept sets Matching between structures (e.g. ontologies)
Approaches for comparing concept sets
• Not only for taxonomies or ontologies
• Requires the existence of an approach to calculate the similarity of each pair of individuals in the given sets
• Methods propose ways to aggregate the individual similarities to a single value
• Optimal matching
  • All objects in every set are coupled (have a distance) to at most one object in the other set
  • A matching is maximal if no more couplings can be added
  • A matching is optimal if it minimizes the total distance
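For small sets, an optimal matching can be found by simply trying every maximal coupling. The sketch below does that; the function name `optimal_matching` and the numeric example are assumptions for illustration, and the brute-force search is only practical for a handful of elements (the Hungarian algorithm solves the same assignment problem in polynomial time).

```python
from itertools import permutations


def optimal_matching(a, b, dist):
    """Return (pairs, total distance) of a minimum-distance maximal matching.

    Every element is coupled to at most one element of the other set.
    All elements of the smaller set get a partner, so no coupling can be added.
    """
    a, b = list(a), list(b)
    if len(a) <= len(b):
        candidates = (list(zip(a, perm)) for perm in permutations(b, len(a)))
    else:
        candidates = (list(zip(perm, b)) for perm in permutations(a, len(b)))
    best = min(candidates, key=lambda pairs: sum(dist(x, y) for x, y in pairs))
    return best, sum(dist(x, y) for x, y in best)


# Toy example: numbers stand in for concepts, absolute difference for distance.
pairs, total = optimal_matching([1, 10], [9, 2, 30], lambda x, y: abs(x - y))
```

In the toy example, 1 is coupled with 2 and 10 with 9, while 30 stays unmatched because the first set has no element left.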
Distance Calculation between Concepts in Ontologies
• String Similarity: edit-distance-like functions (e.g. Levenshtein distance, Monge-Elkan distance, Jaro-Winkler distance) and token-based distance functions (e.g. Jaccard similarity, TF-IDF or cosine similarity, Jensen-Shannon distance).
• Synonyms: a dictionary or thesaurus, like WordNet, can help to improve the similarity measure.
• Structure Similarity: usually based on the is-a or part-of hierarchy of the ontology graph. For example, if the super classes and sub classes of two classes are the same, we may say these two classes are the same.
• Based on instances: FCA-Merge is a method for comparing ontologies that have a set of shared instances or a shared set of documents annotated with concepts from the source ontologies. It uses Formal Concept Analysis to produce a lattice of concepts which relates concepts from the source ontologies.
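As a small illustration of the string-based measures, the sketch below implements one edit-distance function (Levenshtein) and one token-based function (Jaccard). The function names and the whitespace tokenization are choices made for this example only.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the whitespace-separated token sets of two labels."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty labels are treated as identical
    return len(ta & tb) / len(ta | tb)
```

For example, `levenshtein("kitten", "sitting")` is 3, and `jaccard("Front Light", "Electric Light")` is 1/3 because the labels share one of three distinct tokens.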
Some approaches for comparing concept sets
• Rada, Mili, Bicknell, Blettner (distance)
  • Length of the shortest path between two concepts
  • Only based on taxonomy
• Resnik (similarity)
  • The more information two concepts share, the more similar they are
  • Similarity based on information content for a concept’s next neighbor in the taxonomy
  • Uses a text corpus in order to determine information content
• Sussna (distance)
  • Includes all possible couplings (not only the taxonomy)
  • Every relation type has a certain weight interval
  • Distance is calculated by summing up the distance between neighboring concepts on the shortest path between the concepts
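The shortest-path idea behind the Rada et al. distance can be sketched with a breadth-first search over is-a links. The taxonomy below is a hypothetical one, loosely modelled on the product example, and every edge counts as 1; Sussna's relation-type weights and Resnik's corpus-based information content are not modelled here.

```python
from collections import deque

# Hypothetical is-a links as (child, parent) pairs.
IS_A = [
    ("Component", "Product"),
    ("Engine", "Component"),
    ("Electric", "Component"),
    ("T5", "Engine"),
    ("M6", "Engine"),
    ("Front Light", "Electric"),
]


def build_graph(edges):
    """Turn (child, parent) pairs into an undirected adjacency map."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def path_distance(graph, start, goal):
    """Number of edges on the shortest path, or None if no path exists."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


graph = build_graph(IS_A)
```

With this taxonomy, the two engines T5 and M6 are 2 edges apart, while T5 and Front Light are 4 edges apart because their paths only meet at Component.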
Jönköping Approach • Combination of Sussna’s distance formula and distance between sets based on an optimal mapping • Not too complex algorithm: • Calculate distance between all possible pairs of concepts in the two sets (Sussna’s distance formula) • Identify the optimal mapping, a limited coupling, by selecting concept pairs of the results from previous step • Identify distance between sets of concepts by using the result of the previous step as mapping in the below formula
Information Logistics Challenge: Not any information to any person any time, but only the right information for a demand Change the perspective on information supply: • Start from demand, actively support users • Important for decision and problem solving processes Demand modelling, taxonomic paths as content meta-data and ontology matching can help to achieve demand-oriented information supply
Any challenges left? Source: Thomas Norrby Creative consultant www.vioni.se
Ongoing Research in Jönköping
• Efficient development of ontologies
  • Methodology for ontology development
  • Automatic Ontology Development based on Patterns
• Integration of heterogeneous information sources
  • On-demand ontology integration based on P2P approaches
• Ontology Evolution
• Information Demand Modelling
  • Methodology for modeling information demand
  • Information Demand Patterns
• Ontology Matching based on Context
[Diagram labels: Content, Demand]
Thank you for your attention! • Questions? • More information about Information Engineering: • www.hj.se/cenit • infoeng.hj.se • www.informationslogistik.se • www.informationslogistik.org • www.isst.fraunhofer.de