
Automating the Committee Meeting: Intelligent Integration of Information From Diverse Sources



Presentation Transcript


  1. Automating the Committee Meeting: Intelligent Integration of Information From Diverse Sources
  Pedrito Maynard-Zhang
  Department of Computer Science & Systems Analysis

  2. Information Integration
  Information integration is ubiquitous:
  • Committee meetings
  • Research papers
  • Information retrieval on the web
  • Assessing intelligence on the battlefield
  • …

  3. Information Integration

  4. Outline
  • Introduction
  • Automating Information Integration
    • Database Integration
    • Model Integration
    • Conflict Resolution and Meta-Information
  • Integrating Learned Probabilistic Information
  • Conclusion and Current Work

  5. Multi-Disciplinary Research
  • Databases (e.g., Halevy’s group at U. of Washington)
  • Artificial Intelligence (e.g., Stanford’s Knowledge Systems Laboratory)
  • Business (e.g., MIT-Sloan’s Aggregators Group)
  • Decision Analysis (e.g., Clemen & Winkler’s work at Duke)

  6. Database Integration
  [Figure: a bioinformatics query is answered through a mediation layer over source databases (Entrez, LocusLink, OMIM, GeneClinics) holding proteins, nucleotide sequences, and genes.]

  7. Database Integration
  • Application: Querying distributed databases
  • Examples:
    • Bioinformatics
    • Corporate data management
    • Question-answer systems on the web
    • Detecting bioterrorism

  8. Model Integration
  [Figure: a “super model” built from a mathematical model, an expert system (“if cancer then operate”), and a probabilistic model.]

  9. Model Integration
  • Applications: Diagnosis and prediction
  • Examples:
    • Medical diagnosis
    • NASA spacecraft design and diagnosis
    • Expert system integration
    • Combining commonsense knowledge bases

  10. Challenges
  • Efficient query processing and optimization
  • Parsing XML
  • Defining expressive yet tractable mediator languages
  • Handling heterogeneous source languages
  • Wrapper technology development

  11. Challenges
  • Resolving ontological differences
    • e.g., realizing that the field “Name” for one source stores the same information as “First Name” and “Last Name” for another
  • Detecting conflicts
  • Resolving conflicts
    • Resolution is done manually in practice
    • We can automate more!

  12. Uninformed Integration
  [Figure: asked “What’s the weather like?”, three sources answer “raining”, “raining”, and “sunny”.]

  13. Intelligent Integration
  [Figure: the same question and answers, but now the three sources are identified as a practical joker, a meteorologist, and one’s own eyes.]

  14. Types of Meta-Information
  • Credibility, experience, political clout
  • Areas of expertise
  • How the source acquired its information:
    • Source’s sources
    • Processes the source used to accumulate information
  • Structure of the data representation

  15. Outline
  • Introduction
  • Automating Information Integration
  • Integrating Learned Probabilistic Information
    • Medical Scenario
    • Semantic Framework
    • LinOP-Based Aggregation
    • Aggregating Bayesian Networks
    • Experimental Validation
  • Conclusion and Current Work

  16. Medical Expert System Scenario
  [Figure: an expert system built from three doctors with 20, 10, and 3 years of experience.]

  17. Source Meta-Information
  • Doctors learned probabilistic models from patient data using some known standard learning algorithm.
  • We know the relative amount of experience the doctors have had (i.e., years of practice).

  18. Popular Aggregation Approaches
  • Intuition approach: Take simple weighted averages, etc. → unexpected behavior
  • Axiomatic approach: Find an aggregation algorithm satisfying certain “obvious” properties → impossibility results
  • Problem: Neither is semantically grounded

  19. Aggregation Semantics
  [Figure: M samples generated from the true distribution p are split among the sources; each source runs a learning algorithm to produce its distribution pi, and an aggregation algorithm combines p1, p2, …, pL into the aggregate distribution p̂. Running the same learning algorithm on the combined data set (not available in practice) would yield the optimal distribution p*.]

  20. Linear Opinion Pool (LinOP)
  • LinOP: Weighted sum of joint distributions.
  • Precisely, for joint distributions pi and joint variable instantiation w, LinOP(p1, p2, …, pL)(w) = Σi αi pi(w).
  • The weights αi reflect relative experience.
  • Satisfies unanimity, non-dictatorship, and marginalization.
  • Doesn’t preserve shared independences.
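To make the pooling operation concrete, here is a minimal sketch (not from the talk) of LinOP over discrete joint distributions, with an illustrative check of the marginalization property; the distributions and weights below are made up.

```python
# A minimal LinOP sketch: joint distributions are dicts mapping joint
# instantiations (tuples) to probabilities; weights are normalized
# relative-experience weights (an assumption for this example).

def linop(dists, weights):
    """Weighted sum of joint distributions: LinOP(p1..pL)(w) = sum_i a_i p_i(w)."""
    pooled = {}
    for p, a in zip(dists, weights):
        for w, prob in p.items():
            pooled[w] = pooled.get(w, 0.0) + a * prob
    return pooled

# Two sources' joint distributions over a pair of binary variables (X, Y).
p1 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.2}
p2 = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.4}

pooled = linop([p1, p2], [0.75, 0.25])  # source 1 has 3x the experience

# Marginalization property: marginalizing the pool over Y (keeping X = 1)
# equals pooling the sources' marginals for X = 1.
marg_of_pool = sum(v for w, v in pooled.items() if w[0] == 1)
pool_of_margs = 0.75 * (0.3 + 0.2) + 0.25 * (0.2 + 0.4)
assert abs(marg_of_pool - pool_of_margs) < 1e-12
```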

  21. LinOP and Joint Learning
  If
  • sources learn joint distributions using maximum likelihood or MAP learning, and
  • the same learning framework would be used on the combined data set to learn p*,
  then p* ≈ LinOP(p1, p2, …, pL).
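For maximum-likelihood learning of a joint distribution (where the learned joint is just the empirical frequency table), this claim can be checked directly: pooling with weights proportional to the sources' data set sizes reproduces the combined-data estimate exactly. The data sets below are hypothetical.

```python
from collections import Counter

# Hypothetical data sets for two sources: observations of a binary variable pair.
data1 = [(0, 0)] * 6 + [(1, 1)] * 4      # M1 = 10 samples
data2 = [(0, 0)] * 10 + [(1, 0)] * 20    # M2 = 30 samples

def ml_joint(data):
    """Maximum-likelihood joint distribution = empirical frequencies."""
    counts = Counter(data)
    n = len(data)
    return {w: c / n for w, c in counts.items()}

p1, p2 = ml_joint(data1), ml_joint(data2)
p_star = ml_joint(data1 + data2)          # learned from the combined data set

# LinOP with weights proportional to the data set sizes reproduces p*.
w1, w2 = 10 / 40, 30 / 40
pooled = {w: w1 * p1.get(w, 0.0) + w2 * p2.get(w, 0.0)
          for w in set(p1) | set(p2)}
assert all(abs(pooled[w] - p_star[w]) < 1e-12 for w in p_star)
```

For MAP learning the correspondence is approximate rather than exact, hence the "≈" on the slide.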

  22. Bayesian Network (BN)
  • Summary: Compact, graphical representation of a probability distribution.
  • Definition: Directed acyclic graph (DAG) over nodes (random variables); each node has a local conditional probability distribution (CPD) associated with it.
  • Exploits causal structure in the domain.

  23. Alarm BN
  [Figure: BN with edges Burglary → Alarm ← Earthquake, Alarm → JohnCalls, Alarm → MaryCalls.]
  • P(Burglary) = .001, P(Earthquake) = .002
  • P(Alarm | B, E): .95 if both, .94 if burglary only, .29 if earthquake only, .001 if neither
  • P(JohnCalls | Alarm): .90 if alarm, .05 otherwise
  • P(MaryCalls | Alarm): .70 if alarm, .01 otherwise
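The factorization this slide's tables define, P(B, E, A, J, M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A), can be sketched in a few lines. The CPD numbers are the slide's; the enumeration query at the end is a standard illustration, not part of the talk.

```python
from itertools import product

# CPDs from the slide (the classic alarm network).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}                       # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                       # P(M=true | A)

def joint(b, e, a, j, m):
    """The BN factorization: product of the local CPDs."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# Inference by enumeration: P(Burglary=true | JohnCalls=true, MaryCalls=true).
num = sum(joint(True, e, a, True, True) for e, a in product([True, False], repeat=2))
den = sum(joint(b, e, a, True, True) for b, e, a in product([True, False], repeat=3))
print(round(num / den, 3))  # → 0.284
```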

  24. BN Advantages
  • Compact representation; the graph encodes conditional independences.
  • Elicitation easy in practice.
  • Inference efficient in practice.
  • Can be learned from data.
  • Deployed successfully: medical diagnosis, Microsoft Office, NASA Mission Control, and more.

  25. BN Learning
  • Idea: Select the BN most likely to have generated the data.
  • Standard algorithm:
    • Search over structures by adding, deleting, and reversing edges.
    • Parameterize and score structures using statistics from the data.
    • Penalize complex structures.
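As an illustration of the score-and-penalize step, here is a minimal BIC-style structure scorer over binary variables. BIC is one common instantiation of "penalize complex structures"; the talk does not name a specific score, so treat this as a sketch under that assumption.

```python
import math
from collections import Counter

def bic_score(data, structure):
    """BIC-style score: ML log-likelihood minus (log n / 2) * #parameters.

    data: list of dicts {variable: value}; structure: {variable: parent tuple}.
    Variables are assumed binary (one free parameter per parent configuration).
    """
    n = len(data)
    score = 0.0
    for var, parents in structure.items():
        counts = Counter((tuple(d[p] for p in parents), d[var]) for d in data)
        parent_counts = Counter(tuple(d[p] for p in parents) for d in data)
        # ML log-likelihood contribution of this family.
        for (pa, v), c in counts.items():
            score += c * math.log(c / parent_counts[pa])
        # Complexity penalty grows with the number of parent configurations.
        score -= 0.5 * math.log(n) * len(parent_counts)
    return score

# Example: data in which Y always copies X; the edge X -> Y should win.
data = [{"X": x, "Y": x} for x in (0, 1)] * 50
with_edge = bic_score(data, {"X": (), "Y": ("X",)})
no_edge = bic_score(data, {"X": (), "Y": ()})
assert with_edge > no_edge
```

A structure search would wrap this scorer in a loop over add/delete/reverse edge moves, keeping the best-scoring neighbor.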

  26. Aggregating BNs
  • Each source i learns BN pi.
  • p* is the BN we would learn from the combined data set.
  • We want to approximate p* as closely as possible by aggregating p1, …, pL.
  • Source information: estimates of the sources’ relative experience and of the total amount of data seen (M).

  27. AGGR: BN Aggregation Algorithm
  • Idea: Use the BN learning algorithm.
  • Problem: We don’t have the data.
  • Key observation: We can use LinOP to approximate the statistics needed for the parameterization and scoring steps!
  • Also, we can use LinOP properties to make the algorithm reasonably efficient.
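The key observation can be sketched as follows: where the learning algorithm would count events in the data, one can substitute expected counts M · LinOP(p1, …, pL)(w). The distributions, weights, and M below are illustrative, not from the talk.

```python
# Sketch: approximate the sufficient statistics (event counts) that BN
# learning needs, using the sources' pooled distributions instead of data.

def pooled_counts(dists, weights, M):
    """Expected counts N[w] = M * LinOP(dists)(w) for every observed event w."""
    events = set().union(*dists)
    return {w: M * sum(a * p.get(w, 0.0) for p, a in zip(dists, weights))
            for w in events}

# Two sources' learned joints over binary (X, Y); source 1 is 3x as experienced.
p1 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.2}
p2 = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.4}

# With an estimated M = 1000 total samples:
counts = pooled_counts([p1, p2], [0.75, 0.25], 1000)
assert abs(sum(counts.values()) - 1000) < 1e-9
```

These pseudo-counts can then feed the parameterization and scoring steps of a standard structure learner, exactly where real data counts would go.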

  28. Asia BN
  [Figure: BN over Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Bronchitis, Abnormality in Chest, X-Ray, and Dyspnea.]

  29. Experimental Setup
  • Generate data for the sources from the well-known Asia BN, which relates smoking, visiting Asia, and lung cancer.
  • Compare our algorithm AGGR against the optimal algorithm OPT, which has access to the combined data set.
  • Accuracy measure: KL divergence from the generating distribution.
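The accuracy measure is the standard KL divergence D(p ‖ q) from the generating distribution p to a learned or aggregated distribution q; a minimal sketch with made-up distributions:

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_w p(w) log(p(w)/q(w)); lower is better, 0 iff q = p
    (assumes q(w) > 0 wherever p(w) > 0)."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)

p = {0: 0.5, 1: 0.5}      # generating distribution (illustrative)
q = {0: 0.75, 1: 0.25}    # a learned approximation

assert kl_divergence(p, p) == 0.0
assert kl_divergence(p, q) > 0.0
```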

  30. Experiments
  • Sensitivity to M
    • The size of the combined data set M varies.
    • AGGR’s estimate of M is accurate.
  • Sensitivity to Estimate of M
    • The size of the combined data set M is fixed.
    • AGGR’s estimate of M varies.

  31. Sensitivity to M
  [Chart: results of the sensitivity-to-M experiment.]

  32. Sensitivity to Estimate of M
  [Chart: results with the combined data set size fixed at M = 10k.]

  33. Subpopulations
  • Each source’s data may come from a different subpopulation P(D|Si), where D is the data.
  • We want to learn P(D).
  • P(D) = LinOP(P(D|S1), P(D|S2), …, P(D|SL)) with the sources’ weights based on P(Si).
  • We can apply the same algorithm.
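A tiny illustration of the subpopulation mixture (the numbers are invented, not from the experiments): with weights P(Si), the LinOP pool is exactly the law of total probability recovering P(D).

```python
# Illustrative subpopulation priors and per-subpopulation distributions.
P_S = [0.6, 0.4]    # P(S1), P(S2): e.g., fraction of patients seen in each city
P_D_given_S = [
    {"smoker": 0.30, "nonsmoker": 0.70},   # what source 1's subpopulation looks like
    {"smoker": 0.10, "nonsmoker": 0.90},   # what source 2's subpopulation looks like
]

# P(D) = sum_i P(Si) * P(D|Si) -- LinOP with weights P(Si).
P_D = {d: sum(ps * pd[d] for ps, pd in zip(P_S, P_D_given_S))
       for d in P_D_given_S[0]}
print(round(P_D["smoker"], 2))  # 0.6*0.30 + 0.4*0.10 → 0.22
```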

  34. Subpopulations Experiments
  • In the Asia network domain, one doctor practices in San Francisco, another in Cincinnati.
  • The subpopulations have different priors for smoking and for having visited Asia, so the doctors’ beliefs are biased.
  • The aggregate distribution comes much closer to the original distribution.

  35. Asia BN
  [Figure: the Asia BN extended with a Doctor node alongside Smoking, Visit to Asia, Tuberculosis, Lung Cancer, Bronchitis, Abnormality in Chest, X-Ray, and Dyspnea.]

  36. Subpopulations
  [Chart: results of the subpopulations experiment.]

  37. Contributions
  • A semantic framework for aggregating learned probabilistic models.
  • A LinOP-based algorithm for aggregating learned BNs.
  • Experiments showing the algorithm behaves well.

  38. Outline
  • Introduction
  • Automating Information Integration
  • Integrating Learned Probabilistic Information
  • Conclusion and Current Work

  39. Conclusion
  • Conflict resolution is key in automated information integration.
  • This is a difficult task in general.
  • However, information about sources is often readily available.
  • Principled use of this information can greatly enhance the ability to resolve conflicts intelligently.

  40. Current Work
  • Allow dependence between sources’ data sets in the probabilistic aggregation work.
  • Apply the semantic framework to aggregation in other learning paradigms.
  • Explore applications of the algorithms to database integration, RoboCup, stock market prediction, etc.
  • Make committee meetings obsolete!

  41. Multi-Agent Research Zone
  • Research interests:
    • Information integration
    • Multi-agent machine learning
    • RoboCup soccer simulation league testbed
  • Masters students:
    • Jian Xu: Information integration in medical informatics
    • Linxin Gan: Ensemble learning in stock market prediction

  42. CSA Graduate Program
  • Masters in Computer Science
  • Research areas include:
    • machine learning, KRR, and MAS
    • information retrieval, databases, and NLP
    • networking and virtual environments
    • simulation and evolutionary computation
    • software engineering and formal methods
  http://unixgen.muohio.edu/~maynarp/
  maynarp@muohio.edu
