
Qualitative Description of Complex Objects Enrique H. Ruspini Artificial Intelligence Center



Presentation Transcript


  1. Qualitative Description of Complex Objects Enrique H. Ruspini Artificial Intelligence Center

  2. Linked Data Objects (1) • Objects: O1: Person, O2: Person, O3: Name, O4: Payment, O5: Person, O6: Fin_Instrument, O7: Institution • Relations: R1: Friend, R2: Name_of, R3: Received, R4: Paid, R5: Owner_of, R6: Drawn_at

  3. Linked Data Objects (2)

  4. Patterns of Interest

  5. Relations of Interest

  6. Approximate Matching

  7. Similarity in Linked Structures • Similarity between Objects: • Mexico City is similar to Denver (Altitude) • France is similar to Germany (Economy) • Checks are similar to Money Orders (Financial Instruments) • Similarity between Relations: • Cash Payments are similar to Money Transfers (Financial Transactions) • Sequence (G1, Pos1234) is similar to Sequence (G2, Pos3421) (Genomics) • The 1929 Market Crash was similar to the 1987 Market Crash (Economics)

  8. Link Discovery • Find interesting linked structures: • Equal or similar to a predefined pattern • Must satisfy (extended) equivalence relations between template and instantiated pattern • Similar objects • Similar relationships between objects • Example: “Find transactions matching money-laundering patterns” • Find interesting relations between structures: • Example: “The actors/roles in Situation A12 are similar to those in Situation B34”

  9. Biomolecules as seen by a Computer
     ...
     HEADER GENE REGULATING PROTEIN 26-JUL-90 3CRO 3CRO 2
     COMPND 434 CRO PROTEIN COMPLEX WITH 20 BASE PAIR PIECE OF /DNA$ 3CRO 3
     COMPND 2 CONTAINING OPERATOR /OR1$ 3CRO 4
     SOURCE PHAGE 434 3CRO 5
     AUTHOR A.MONDRAGON,S.C.HARRISON 3CRO 6
     REVDAT 1 15-OCT-91 3CRO 0 3CRO 7
     ...
     ATOM 5 O5* A A 1 -16.851 -5.543 74.981 1.00 55.62 3CRO 148
     ATOM 6 C5* A A 1 -18.254 -5.683 75.238 1.00 51.97 3CRO 149
     ATOM 7 C4* A A 1 -18.600 -7.125 75.571 1.00 37.32 3CRO 150
     ATOM 8 O4* A A 1 -19.740 -7.166 76.456 1.00 26.97 3CRO 151
     ATOM 9 C3* A A 1 -18.978 -8.004 74.382 1.00 34.63 3CRO 152
     ATOM 10 O3* A A 1 -18.314 -9.224 74.465 1.00 30.96 3CRO 153
     ATOM 11 C2* A A 1 -20.466 -8.236 74.564 1.00 54.40 3CRO 154
     ATOM 12 C1* A A 1 -20.537 -8.253 76.076 1.00 31.85 3CRO 155
     ATOM 13 N9 A A 1 -21.868 -7.978 76.551 1.00 18.79 3CRO 156
     ATOM 14 C8 A A 1 -22.501 -6.770 76.700 1.00 20.51 3CRO 157
     ATOM 15 N7 A A 1 -23.737 -6.871 77.141 1.00 6.86 3CRO 158
     ATOM 16 C5 A A 1 -23.910 -8.231 77.267 1.00 2.00 3CRO 159
     ATOM 17 C6 A A 1 -24.991 -8.982 77.706 1.00 4.41 3CRO 160
     ...

  10. ... and by a Biologist

  11. Structure and Detail

  12. A graph of a computational object ... Source: http://www.marketguide.com/MGI/PRODUCTS/chart.htm?symb=NKE, Accessed 2/2/98

  13. ... and its interpretation in investing terms • Trends: Overall Substantial Decline; Rapid Decline in 1Q following Sharp Reversal; Relatively Stable Period afterwards; Moderate Decline in 4Q; ... • Temporal Patterns: Major Upward Spike in late Spring; 4Q Decline followed by Short Panic Reversal and Partial Recovery; Pronounced Double Top Reversal in Summer; Descending Triangle Reversal in 3Q; ...

  14. OBJECTIVES 1. DISCOVER: Interesting patterns within an object, at various levels of detail. EXAMPLE: “Charged Arm,” “Concave Pocket” 2. RELATE: Discovered patterns according to relevant, interesting relationships. EXAMPLE: “Spike follows Recovery” 3. DESCRIBE: Discovered Structures and Patterns (Qualitative Descriptions). EXAMPLE: <<Spike, Midterm>> 4. ANNOTATE: Objects with textual descriptions based on discovered patterns. EXAMPLE: “The arm protrudes midway from the ..” 5. MINE: Variable and Object Relationships in a Collection on the basis of Qualitative Descriptions (Qualitative DM). EXAMPLE: “Panic Reversals follow Short Periods of High Level Buying”

  15. Technical Approach 1. DISCOVER: Based on solution of constrained optimization problems (best fitting) by soft-computing techniques (Fuzzy Clustering) 2. RELATE: Relations are found by optimal fitting of relationships from a catalog of interesting relations to discovered structures, and by summarization of the results of the discovery step 3. DESCRIBE: Hierarchical structures organized by level of granularity/inclusion 4. ANNOTATE: Text generation employing Natural Language methods developed at the AIC 5. MINE: Based on Generalized Association Rule Discovery (e.g., ANFIS), Possibilistic Network Learning, and Fuzzy Clustering

  16. The Notion of Similarity • Basic primitive concept • Fundamental cognitive capability • Commonly expressed by (numerical) measures quantifying “resemblance”: $S : X \times X \to [0,1]$ • $S(x, y) = 1$ means that x and y are very similar • $S(x, y) = 0$ means that x and y are very different • Resemblance is always measured from some perspective • Generalized Reflexive, Symmetric, Transitive Relation (Fuzzy Equivalence Relation)
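As an illustration of the properties listed on this slide, the following minimal sketch defines one possible similarity measure on real numbers and checks reflexivity, symmetry, and transitivity on a finite sample. The linear similarity function, the `scale` parameter, and the choice of the Łukasiewicz t-norm for transitivity are assumptions made for the example; the slide does not say which t-norm is intended.

```python
def similarity(x, y, scale=1.0):
    """Hypothetical similarity in [0, 1]: 1 for identical values, falling linearly with distance."""
    return max(0.0, 1.0 - abs(x - y) / scale)


def is_fuzzy_equivalence(points, s, tol=1e-9):
    """Check reflexivity, symmetry, and Lukasiewicz transitivity of s on a finite sample."""
    for x in points:
        if abs(s(x, x) - 1.0) > tol:          # reflexivity: S(x, x) = 1
            return False
    for x in points:
        for y in points:
            if abs(s(x, y) - s(y, x)) > tol:  # symmetry: S(x, y) = S(y, x)
                return False
            for z in points:
                # transitivity (assumed t-norm): S(x, z) >= max(0, S(x, y) + S(y, z) - 1)
                if s(x, z) + tol < max(0.0, s(x, y) + s(y, z) - 1.0):
                    return False
    return True


def crisp_near(x, y):
    """A 'within one unit' relation; it is reflexive and symmetric but not transitive."""
    return 1.0 if abs(x - y) <= 1.0 else 0.0
```

The crisp "within one unit" relation fails the check because 0 is near 1 and 1 is near 2, yet 0 is not near 2.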

  17. The Clustering Problem

  18. Fuzzy Clustering (Ruspini, 1969) • Map each point into a vector representing degrees of membership to a fuzzy partition • $C : X \to [0,1]^c$, $x \mapsto (C_1(x), C_2(x), \ldots, C_c(x))$ • For all x, $C_1(x) + C_2(x) + \cdots + C_c(x) = 1$.
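The fuzzy-partition constraint can be illustrated with a small sketch that maps a point to a membership vector summing to 1. The inverse-distance rule and the one-dimensional cluster centers used here are assumptions chosen for brevity; they are not taken from the 1969 formulation.

```python
def memberships(x, centers):
    """Return degrees of membership of x in each cluster; the degrees sum to 1.

    Uses a hypothetical inverse-distance rule on 1-D data.
    """
    distances = [abs(x - c) for c in centers]
    if 0.0 in distances:
        # x coincides with one or more centers: share full membership among them
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]
```

A point halfway between two centers receives membership 0.5 in each, while a point sitting on a center belongs fully to that cluster.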

  19. Similarity-based Clustering • Basic idea: Map Sample Space into “Classification Space” • Mapping should be optimal in some sense • Mapping should define a partition of the sample space • - Similar sample points should receive similar classifications (Ruspini)

  20. Object Clustering • Data is generally expressed as a collection of vectors in $\mathbb{R}^n$ • Categorical Data may be considered but requires special handling • Relies on various (sometimes hidden or implicit) measures of distance/similarity between objects

  21. Prototype-based Clustering (Bezdek)

  22. Comments on c-means Clustering • In most cases, the distance D is simply the Euclidean Distance in $\mathbb{R}^n$ • Crisp c-means: assign X to the class of its closest prototype • Fuzzy, Possibilistic c-means: degree of fuzzification depends on m • Possibilistic c-means: The weights in the definition of the Objective Functional J penalize clusters that are too small • Solution is determined by Alternating Optimization: • Find clusters for fixed prototypes • Find prototypes for fixed clusters • Alternative solution algorithms are important from the viewpoint of Data Mining (e.g., Sequential Iteration) • Related to ISODATA
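The alternating-optimization loop described on this slide can be sketched for fuzzy c-means on one-dimensional data. This is a minimal illustration using the standard fuzzy c-means update formulas with Euclidean distance; the function name, the fixed iteration count, and the 1-D restriction are assumptions for the example.

```python
def fuzzy_c_means(xs, prototypes, m=2.0, iterations=50):
    """Alternate between membership updates and prototype updates (1-D, Euclidean distance)."""
    protos = list(prototypes)
    power = 2.0 / (m - 1.0)  # the fuzzifier m controls how soft the memberships are
    memberships = []
    for _ in range(iterations):
        # Step 1: find clusters (memberships) for fixed prototypes
        memberships = []
        for x in xs:
            d = [abs(x - v) for v in protos]
            if min(d) == 0.0:
                zero = d.index(0.0)
                row = [1.0 if i == zero else 0.0 for i in range(len(protos))]
            else:
                row = [1.0 / sum((di / dj) ** power for dj in d) for di in d]
            memberships.append(row)
        # Step 2: find prototypes for fixed clusters (weighted means)
        protos = [
            sum(memberships[k][i] ** m * xs[k] for k in range(len(xs)))
            / sum(memberships[k][i] ** m for k in range(len(xs)))
            for i in range(len(protos))
        ]
    return protos, memberships


data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
final_prototypes, final_memberships = fuzzy_c_means(data, [1.0, 4.0])
```

On two well-separated groups the prototypes settle near the group means, and each membership row still sums to 1.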

  23. Possibilistic Clustering • Ruspini, 1977; Krishnapuram and Keller, 1993 • Does not require “probabilistic” constraint • Based on idea of model fitness and utilization of additional constraints

  24. Generalized Prototypes • The notion of prototype may be changed to include more general structures • Linear varieties • Shells • Elliptotypes (adaptive cluster dimensionality)

  25. Subtractive Clustering Method • Objective: Describe a time series in terms of significant events or epochs • Approach: • Stepwise iterative determination of interesting structures • “Good clusters” rather than “good clusterings” • Clusters may overlap • Multiple, domain-specific, models of significant structures • Nonlinear constrained-optimization techniques • Optimal Fitness subject to Size and Extent Constraints • Clusters are local constrained optima of the fitness function • Minimization of functional describing quality of fitness of line over a fuzzy (trapezoidal) interval subject to size constraints • Penalty-function Approach • GA Tournament Domination Selection Algorithm
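The idea of fitting a line over a fuzzy (trapezoidal) interval under a size constraint can be sketched as a weighted least-squares fit whose weights are trapezoidal memberships, with a penalty added when the interval is too small. The exact functional used in the presentation is not given, so the weighted mean squared error, the `penalized_cost` form, and all parameter names below are assumptions.

```python
def trapezoid(t, a, b, c, d):
    """Membership of t in a trapezoidal fuzzy interval with support [a, d] and core [b, c]."""
    if b <= t <= c:
        return 1.0
    if t <= a or t >= d:
        return 0.0
    if t < b:
        return (t - a) / (b - a)
    return (d - t) / (d - c)


def weighted_line_fit(ts, ys, ws):
    """Weighted least-squares line; returns (slope, intercept, weighted MSE, fuzzy size)."""
    size = sum(ws)
    t_mean = sum(w * t for w, t in zip(ws, ts)) / size
    y_mean = sum(w * y for w, y in zip(ws, ys)) / size
    sxx = sum(w * (t - t_mean) ** 2 for w, t in zip(ws, ts))
    sxy = sum(w * (t - t_mean) * (y - y_mean) for w, t, y in zip(ws, ts, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    error = sum(w * (y - (slope * t + intercept)) ** 2 for w, t, y in zip(ws, ts, ys)) / size
    return slope, intercept, error, size


def penalized_cost(error, size, min_size, penalty=10.0):
    """Hypothetical penalty-function form: fit error plus a charge for undersized intervals."""
    return error + penalty * max(0.0, min_size - size)


times = [float(t) for t in range(10)]
values = [2.0 * t + 1.0 for t in times]
weights = [trapezoid(t, 1.0, 3.0, 6.0, 8.0) for t in times]
slope, intercept, error, size = weighted_line_fit(times, values, weights)
```

An optimizer such as the genetic algorithm mentioned on the slide would then search over the interval parameters `(a, b, c, d)` to find local minima of the penalized cost.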

  26. APPROACH (diagram labels) • Models • Object • Feature Identification Algorithm • Relations of Interest • Summarization Algorithm • Qualitative Description

  27. Localized, Non-dominated, Solutions • All local maxima • Multiple objectives (Effective Frontier)

  28. Genetic Algorithm for Feature Identification • Multiobjective Optimization (Quality of Fit, Extent) • Multiple Models of Interesting Features • Supports Complex Model Definitions (MV Logic/Approximate Matching) • NLP Genetic Algorithm (extension of GA by Horn, Nafpliotis, and Goldberg): • Localized: Solutions are not dominated by their neighbors (i.e., a generalization of the notion of local maxima) • Niched: The algorithm clusters candidate solutions to isolate each generalized “peak” in the multimodal distribution • Pareto: Multiple objectives define a notion of dominance based on separate consideration of all objectives • Tournament Selection: Comparison between randomly chosen population pairs to simulate selective pressure • Sharing: Procedure to promote diversity in solution space • Exhaustive: Seeks to find all solutions in the “localized” effective frontier • Special Genetic Operators: T-norm/conorm-based crossover operators, FP-based mutation operator
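The “Pareto” and “Localized” notions above can be illustrated with a short sketch. Candidates are intervals carrying a quality score, the two objectives are quality of fit and extent, and two candidates count as neighbors when their intervals overlap. The tuple layout `(start, end, quality)` and the overlap-based neighborhood are assumptions for this example, not the presentation's definitions.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def objectives(candidate):
    """Objectives of an interval candidate (start, end, quality): (quality of fit, extent)."""
    start, end, quality = candidate
    return (quality, end - start)


def overlap(c1, c2):
    """Assumed neighborhood relation: two candidates are neighbors if their intervals overlap."""
    return c1[0] < c2[1] and c2[0] < c1[1]


def localized_non_dominated(candidates):
    """Keep candidates that no neighbor dominates (a generalization of local maxima)."""
    return [
        c for c in candidates
        if not any(
            o is not c and overlap(c, o) and dominates(objectives(o), objectives(c))
            for o in candidates
        )
    ]


pool = [(0, 10, 0.9), (2, 8, 0.8), (20, 25, 0.3), (0, 12, 0.95)]
front = localized_non_dominated(pool)
```

The low-quality interval `(20, 25, 0.3)` survives because nothing in its own neighborhood dominates it, even though it is dominated globally.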

  29. NLP GENETIC ALGORITHM (flowchart labels) • Initialization • Old Population • Genetic Selection: Random Selection (Candidate-1, Candidate-2, Comparison Set); Dominant Comparison; Winner? (yes / no); Sharing • Genetic Operations: Crossover; Mutation • New Population

  30. Example of Model Definition (Downtrends) • For all peaks in epoch, $\mathrm{peak}(i) \gtrsim \mathrm{peak}(i+1)$, and • For all valleys in epoch, $\mathrm{valley}(i) \gtrsim \mathrm{valley}(i+1)$
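A minimal sketch of this model definition follows. It reads the comparison in the slide as “each peak (valley) is no lower than the next, up to a tolerance”; that reading, the `tol` parameter, and the function name are assumptions, since the comparison symbol did not survive in the transcript.

```python
def is_downtrend(peaks, valleys, tol=0.0):
    """True if successive peaks and successive valleys are non-increasing, up to tol."""
    def non_increasing(seq):
        return all(a + tol >= b for a, b in zip(seq, seq[1:]))
    return non_increasing(peaks) and non_increasing(valleys)
```

With `tol=0.0` the test is crisp; a positive tolerance lets small counter-moves pass, which is a crude stand-in for approximate matching.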

  31. Examples of Feature Extraction (Uptrends)

  32. Example of Feature Extraction (Downtrends)

  33. Example of Feature Extraction (Triangular and Rectangular Patterns)

  34. Example of Feature Extraction (Head & Shoulders)

  35. Visualizing the Effective Frontier (figure panels) • (i,I) Diagram - All Intervals • (i,I) Diagram - Final Intervals • S-Q Diagram - All Intervals • S-Q Diagram - Final Intervals

  36. Heuristic Summarization Algorithm • Merging of suboptimal intersecting results (unnecessary if enough NLP generations produced) • Eliminate “approximately” dominated solutions (i.e., deletion of neighbors of lower quality) • Hierarchical organization by approximate inclusion • Removal of conflicts in multimodel applications (where NLP has been run separately for each model)
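The “hierarchical organization by approximate inclusion” step can be sketched as follows: each interval is attached to the smallest longer interval that contains at least a given fraction of it. The inclusion measure, the 0.9 threshold, and the `(start, end)` tuple layout are assumptions for illustration.

```python
def inclusion_degree(inner, outer):
    """Fraction of the inner interval's length that lies inside the outer interval."""
    lo = max(inner[0], outer[0])
    hi = min(inner[1], outer[1])
    return max(0.0, hi - lo) / (inner[1] - inner[0])


def build_hierarchy(intervals, threshold=0.9):
    """Map each interval to its smallest approximate container, or None if it is a root."""
    parent = {}
    for iv in intervals:
        length = iv[1] - iv[0]
        containers = [
            o for o in intervals
            if o != iv and (o[1] - o[0]) > length and inclusion_degree(iv, o) >= threshold
        ]
        parent[iv] = min(containers, key=lambda o: o[1] - o[0]) if containers else None
    return parent


tree = build_hierarchy([(0, 100), (10, 40), (12, 30), (60, 90), (95, 110)])
```

Here `(12, 30)` nests under `(10, 40)`, which nests under `(0, 100)`, while `(95, 110)` stays a root because only a third of it lies inside `(0, 100)`.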

  37. Example of Output of GA Algorithm

  38. Pruning and Summarization

  39. Hierarchical Organization by Inclusion

  40. Hierarchical Organization of Summarized GA Output (Uptrends, Downtrends, H&S)

  41. Biological Sequence Description • Description of Repetitive Elements • Genome of Trypanosoma cruzi • Short Interspersed Repetitive Element (SIRE) • GOAL: Identify all interesting alignments
      Pattern: TTTTATT
      ---------TTTTTATTTTT----TTATT-----
      TTTAAAATATTTTTATTTTTAAAATTATTTTATT

      ----TT---TTT---A-TT-    ---------TATT
      ACGTTTCGGTTTCCACCTTG    TGCTTAAATTAT-
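As a simplified illustration of locating approximate occurrences of a pattern such as TTTTATT, the sketch below slides a window over a sequence and counts mismatches. This is a plain Hamming-distance scan without gaps, which is an assumption made for brevity; the alignments shown on the slide include gaps and were produced by the presentation's own method, which is not reproduced here. The test sequence is made up.

```python
def approximate_matches(sequence, pattern, max_mismatches=1):
    """Return (position, window, mismatches) for every window within max_mismatches of pattern."""
    hits = []
    n = len(pattern)
    for i in range(len(sequence) - n + 1):
        window = sequence[i:i + n]
        mismatches = sum(1 for a, b in zip(window, pattern) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


exact_hits = approximate_matches("ACGTTTTATTGG", "TTTTATT", max_mismatches=0)
```

Raising `max_mismatches` widens the set of reported windows, trading precision for recall.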

  42. DNA Feature Classification

  43. Gene Expression From I. Zwir (with permission)

  44. Linked Data Objects • Objects: O1: Person, O2: Person, O3: Name, O4: Payment, O5: Person, O6: Fin_Instrument, O7: Institution • Relations: R1: Friend, R2: Name_of, R3: Received, R4: Paid, R5: Owner_of, R6: Drawn_at

  45. Patterns and Data • Patterns and Data are expressed through logical formulas that: • Define Patterns • Specify known information as partially-specified database entries • Patterns: • Data: p(UID345), q(UID345, UIF348), r(UID345, UID675, 500), ...

  46. Data and Pattern Structure (diagram labels) • PATTERN • Similarity Functions: Sp, Sq, Sr, Sv • DATA

  47. Relational Data as Triples • N-ary relations as set of triples • Triples:
      (Type, PMY001, Payment)
      (Date, PMY001, 1 Oct 1888)
      (Type, PER234, Person)
      (Name, PER234, “David Copperfield”)
      (Paid-by, PMY001, PER234)
      . . .
      (Transaction-type, PMY003, Stock-Transfer)
      . . .
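The decomposition of an n-ary record into triples can be sketched as below, using the attribute names and identifiers shown on this slide. The `to_triples` helper and the dictionary record layout are assumptions for illustration.

```python
def to_triples(object_id, record):
    """Split one n-ary record into (attribute, object_id, value) triples."""
    return [(attribute, object_id, value) for attribute, value in record.items()]


payment = {"Type": "Payment", "Date": "1 Oct 1888", "Paid-by": "PER234"}
person = {"Type": "Person", "Name": "David Copperfield"}

triples = to_triples("PMY001", payment) + to_triples("PER234", person)
```

Each attribute of the original record becomes one triple, so a record with k attributes yields k triples sharing the same object identifier.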

  48. Representing Data as Triples (Precise DB)

  49. Conceptual Approach • Data is represented as a set of triples • Logical conjunction of corresponding logical clauses

  50. Conceptual Approach • Data is represented as a set of triples • Logical conjunction of corresponding logical clauses • Pattern is a logical expression that may or may not be satisfied by the Data
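A minimal sketch of checking whether a pattern is satisfied by data follows. The data is a set of triples read as a conjunction of facts, and the pattern is a conjunction of triple clauses that may contain variables. Writing variables as strings starting with `?` and restricting patterns to conjunctions are assumptions of this example; the sketch performs exact matching only, without the similarity-based relaxation discussed elsewhere in the presentation.

```python
def is_var(term):
    """Variables are written as strings starting with '?' (a convention assumed here)."""
    return isinstance(term, str) and term.startswith("?")


def unify(clause, triple, binding):
    """Extend binding so that clause equals triple, or return None if impossible."""
    new = dict(binding)
    for p, t in zip(clause, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def solutions(pattern, data, binding=None):
    """Return every variable binding under which all pattern clauses occur in data."""
    binding = {} if binding is None else binding
    if not pattern:
        return [binding]
    first, rest = pattern[0], pattern[1:]
    found = []
    for triple in data:
        extended = unify(first, triple, binding)
        if extended is not None:
            found.extend(solutions(rest, data, extended))
    return found


DATA = [
    ("Type", "PMY001", "Payment"),
    ("Date", "PMY001", "1 Oct 1888"),
    ("Type", "PER234", "Person"),
    ("Name", "PER234", "David Copperfield"),
    ("Paid-by", "PMY001", "PER234"),
]
PATTERN = [("Type", "?p", "Payment"), ("Paid-by", "?p", "?who"), ("Name", "?who", "?name")]
matches = solutions(PATTERN, DATA)
```

The pattern is satisfied when `solutions` returns at least one binding; an empty list means the data does not satisfy it.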
