MetaData, Objects, Relations: Similarities and Differences and Cognitive Aspects of Categorization. SIMS 202, Lecture 10 Fall, 1997 Prof. Marti Hearst. Why are we learning about metadata database design object oriented systems? How are these related to one another?
MetaData, Objects, Relations: Similarities and Differences
and
Cognitive Aspects of Categorization
SIMS 202, Lecture 10
Fall, 1997
Prof. Marti Hearst
Today: Four Related Questions
• Why are we learning about metadata, database design, and object-oriented systems?
• How are these related to one another?
• How are these different from one another?
• Why is it hard to define/design these things?
  • What cognitive science is
  • What cogsci tells us about categorization
UCB SIMS 202
Why are we learning about metadata, database design and OO systems?
• Information organization
• These are all ways to handle complexity, by imposing structure and order on messy data
• Each is useful in a different way
How is the Relational Model Related to the Object Oriented Model?
• Let’s start with a re-description of objects.
• Objects are instantiated classes
• Classes have attributes
• An attribute is the TYPE of information (kind of like a data type in a programming language)
• Attributes have VALUES that fit their TYPE
  • attribute TYPE: integer, VALUE: 9
  • attribute TYPE: suit, VALUE: club, heart, spade, diamond
  • attribute TYPE: name, VALUE: Juanita, Dekai, Laura
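A minimal sketch of this vocabulary in Python. The class name `Card` and its field names are illustrative assumptions, not part of the lecture; only the example types and values come from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class Suit(Enum):
    # The attribute TYPE "suit" admits exactly these VALUES.
    CLUB = "club"
    HEART = "heart"
    SPADE = "spade"
    DIAMOND = "diamond"


@dataclass
class Card:
    # Each attribute has a TYPE; an instance supplies VALUES that fit it.
    rank: int    # attribute TYPE: integer
    suit: Suit   # attribute TYPE: suit
    owner: str   # attribute TYPE: name


# An object is an instantiated class.
card = Card(rank=9, suit=Suit.CLUB, owner="Juanita")
print(card)
```

The class fixes which attributes exist and what type each one has; the object fills in one value per attribute.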
Attributes vs. Classes
• How do we make this distinction?
• Say we are clothing manufacturers.
  • Fur is a class
  • Animal is an attribute
• Say we are naturalists.
  • Animal is a class
  • Fur is an attribute
Garment Makers vs. Naturalists
Class Fur
  Animal: fox, rabbit, sable
  Color: red, black, white
  Texture: silky, thick, coarse
  Garment_type: coat, stole, hat
Class Animal
  Outer_Covering: fur, skin, scales
  Number_of_limbs: 4, 6, 8
  Circulatory_System: cold_blooded, hot_blooded
This example showed that one user’s classes are another user’s attributes. Attributes vs. Classes UCB SIMS 202
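The two viewpoints can be sketched side by side in Python. This is only an illustration: the snake_case field names are adapted from the slide, and the two sample instances are made up.

```python
from dataclasses import dataclass


@dataclass
class Fur:
    # Garment maker's view: Fur is the class, the animal is just an attribute.
    animal: str        # fox, rabbit, sable
    color: str         # red, black, white
    texture: str       # silky, thick, coarse
    garment_type: str  # coat, stole, hat


@dataclass
class Animal:
    # Naturalist's view: Animal is the class, fur is just an attribute value.
    outer_covering: str      # fur, skin, scales
    number_of_limbs: int     # 4, 6, 8
    circulatory_system: str  # cold_blooded, hot_blooded


stole = Fur(animal="fox", color="red", texture="silky", garment_type="stole")
fox = Animal(outer_covering="fur", number_of_limbs=4,
             circulatory_system="hot_blooded")
```

The same two words, "fur" and "fox", appear in both models, once as a class and once as an attribute value.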
Let’s Revisit Relations
• Table contains rows of data
• The data has attribute types
• Can perform certain operations:
  • select (pick out rows)
  • project (pick out columns)
  • join (match up 2 or more tables’ data)
  • add (add a new row)
  • delete (delete a row)
  • update (change a value within a row)
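The six operations can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table names, column names, and rows are invented for illustration, and SQL is only one concrete way to express the relational operations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE fur (name TEXT PRIMARY KEY, animal TEXT, color TEXT)")
cur.execute("CREATE TABLE garment (garment_type TEXT, fur_name TEXT)")

# add: insert new rows
cur.executemany("INSERT INTO fur VALUES (?, ?, ?)",
                [("f1", "fox", "red"),
                 ("f2", "rabbit", "white"),
                 ("f3", "sable", "black")])
cur.executemany("INSERT INTO garment VALUES (?, ?)",
                [("coat", "f1"), ("hat", "f2")])

# select: pick out rows
red_rows = cur.execute("SELECT * FROM fur WHERE color = 'red'").fetchall()

# project: pick out columns
animals = cur.execute("SELECT animal FROM fur ORDER BY name").fetchall()

# join: match up the two tables' data
joined = cur.execute(
    "SELECT garment.garment_type, fur.animal FROM garment "
    "JOIN fur ON garment.fur_name = fur.name "
    "ORDER BY garment.garment_type").fetchall()

# update: change a value within a row
cur.execute("UPDATE fur SET color = 'brown' WHERE name = 'f3'")

# delete: remove a row
cur.execute("DELETE FROM fur WHERE name = 'f2'")

remaining = cur.execute("SELECT name, color FROM fur ORDER BY name").fetchall()
print(red_rows, animals, joined, remaining)
```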
Relations vs. Objects
ER Diagram:
  Entity = Class
  Attribute = Attribute
  Relation ~ Method
Relational Table:
  Table ~ Class
  Row ~ Instantiated Object of Class
  Column = Attribute TYPE
  Value in (row,column) = Attribute VALUE
  Name ~ Primary Key
Relations vs. Objects
• There are no Class-specific Methods in the Relational Model
  • There are general-purpose methods on all data: update (change), select, delete, add, join, project
  • The Relation in the ER diagram indicates how to set up the tables so they can be easily joined
• There is no unique Object Id (Address) in the Relational Model
  • Can only access an “instantiated object” by combinations of its “attribute values”
  • Normalization can cause the object representation to be spread out across several tables
• No encapsulated data in the Relational Model
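The object-identity point can be illustrated in Python. This is a sketch, not a database: the `Fur` class is hypothetical, and the relation is modeled as a plain set of tuples, which matches the formal relational model (SQL tables can hold duplicate rows unless a key forbids it).

```python
from dataclasses import dataclass


@dataclass
class Fur:
    animal: str
    color: str


a = Fur("fox", "red")
b = Fur("fox", "red")

# Objects: equal attribute values, but two distinct identities (addresses).
same_values = (a == b)
same_object = (a is b)

# Relation: a row is nothing more than its attribute values,
# so two "identical" rows collapse into one.
relation = {("fox", "red"), ("fox", "red")}

print(same_values, same_object, len(relation))
```

In the object world the two furs can be told apart even though every attribute matches; in the relation they cannot, which is why a primary key has to be built out of attribute values.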
Garment Maker vs. Garment Maker
Class Fur
  Animal: fox, rabbit, sable
  Color: red, black, white
  Texture: silky, thick, coarse
  Garment_type: coat, stole, hat
Class Garment
  Material: fur, cotton, wool
  Color: red, black, brown, white, blue
  Garment_Type: coat, stole, hat
Problem: match color to material
Nesting Attributes and Classes
Class Garment
  Material:
    Class Fur
      Animal: fox, rabbit, sable
      Color: red, black, white
      Texture: silky, thick, coarse
    Class Cotton
      Color: red, blue, white, brown, black
      Thread_Count: 100, 200
  Garment_type: stole, coat, hat, t-shirt
• Attributes often must be nested
• Alternative: two subclasses of Garment
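A sketch of the nested version in Python, assuming the reading in which `Material` holds either a `Fur` or a `Cotton` object; the snake_case field names and the two sample garments are illustrative.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Fur:
    animal: str
    color: str
    texture: str


@dataclass
class Cotton:
    color: str
    thread_count: int


@dataclass
class Garment:
    # The material attribute is itself an object with its own attributes,
    # so each color is tied to the material it belongs to.
    material: Union[Fur, Cotton]
    garment_type: str


coat = Garment(material=Fur("sable", "black", "silky"), garment_type="coat")
shirt = Garment(material=Cotton("blue", 200), garment_type="t-shirt")
print(coat.material.color, shirt.material.thread_count)
```

Because color now lives inside the material, a blue fur can only be created by deliberately constructing one; the flat `Garment` class had no way to express which colors go with which material.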
Normalization and Nesting
• In the Relational Model, Normalization “flattens out” the Nesting
• Why? Normalization makes certain kinds of access more efficient and makes updates less likely to introduce inconsistencies
• Why isn’t this confusing in the OO model?
• Key: Relational and OO are used in different situations
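A small Python sketch of what "flattening out" means. The nested records, the two resulting tables, and the integer ids are all assumptions made for illustration.

```python
# Nested, object-style records: the material is repeated inside each garment.
nested = [
    {"garment_type": "coat",
     "material": {"kind": "fur", "animal": "sable", "color": "black"}},
    {"garment_type": "hat",
     "material": {"kind": "fur", "animal": "sable", "color": "black"}},
]

materials = {}  # material row -> material id (the "material" table)
garments = []   # (garment_type, material_id) rows (the "garment" table)

for g in nested:
    m = g["material"]
    row = (m["kind"], m["animal"], m["color"])
    # Store each distinct material once and refer to it by id.
    material_id = materials.setdefault(row, len(materials) + 1)
    garments.append((g["garment_type"], material_id))

print(materials)
print(garments)
```

After flattening, the sable fur is described in exactly one row, so changing its color is a single update; the price is that one garment is now spread over two tables and has to be joined back together.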
Relations vs. Objects
• Objects:
  • Nomads, doing their own thing, rugged individualists, used one at a time
  • Example: program running on a printer
• Relations:
  • Packed into apartments: lots and lots of items all lined up in one place for easy comparison
  • Queries: Find all X that have Y and are > Z
  • Example: Everyone’s phone bill in the U.S.
Can you have a table of objects? Can you have an object that has a table? Relations vs. Objects UCB SIMS 202
Metadata vs. Objects
• MetaData like the Dublin Core is simple
  • Much like the name and attribute parts of a Class
  • No methods
• MetaData like AACR2 is messier
  • A bunch of rules about how to deal with the exceptions
  • Law deals quite a bit with exceptions
  • Computer Science tries as hard as possible to abstract away or ignore exceptions
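To see how close simple metadata is to "attributes without methods", here is a sketch of a Dublin Core style record as a plain Python dictionary. The element names are the fifteen of the Dublin Core Metadata Element Set; the record describes this lecture, and the checking step is an illustrative assumption rather than part of any standard tool.

```python
# The fifteen Dublin Core elements, used here as allowed attribute names.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# A record is just attribute names and values: no methods, no exceptions.
record = {
    "title": "MetaData, Objects, Relations: Similarities and Differences "
             "and Cognitive Aspects of Categorization",
    "creator": "Marti Hearst",
    "date": "1997",
}

# Any key outside the element set would show up here.
unknown = set(record) - DUBLIN_CORE_ELEMENTS
print(unknown)
```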
The computer science tradition is good at abstracting away details. The computer science tradition is not good at describing detail and convoluted exceptions. The library tradition can teach us something useful about how to describe complex data. Think about how these bibliographic examples can be applied to other domains (maybe a test question!!!) Why are we learning about this old library stuff? UCB SIMS 202
Metadata vs. Relational Model
• Relational model makes use of Metadata
  • The description of the database is often called a Schema
  • The Schema is a kind of Metadata description
• Main differences:
  • Exceptions not handled well in the relational model either
  • Relational model focus is on the system design
  • Metadata focus is on the description of the data, independent of a computer system
These are all variations on Categorization Categorization is an important topic in: Philosophy Language/Linguistics Psychology How does the human mind do categorization? Fresh Topic: Why is this Stuff Hard? UCB SIMS 202
“A sentence is not a verbal snapshot or movie of an event. In framing an utterance, you have to abstract away from everything you know, or can picture, about a situation, and present a schematic version which conveys the essentials. In terms of grammatical marking, there is not enough time in the speech situation for any language to allow for the marking of everything which could possibly be significant to the message.” Dan Slobin, in Language Acquisition: The state of the art, 1982 What’s In a Sentence? UCB SIMS 202
Approximating Meaning
• Defining attributes
  • A weak approximation to meanings and concepts
• Defining methods
  • A weak approximation to how these meanings interact and change
• Necessary and Sufficient Conditions
  • Example: A prime number is an integer greater than 1 that is divisible only by itself and 1.
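A category defined by necessary and sufficient conditions is one whose membership test is a yes/no function. A minimal Python sketch for the prime example, using the standard convention that primes are greater than 1; the function name is illustrative.

```python
def is_prime(n: int) -> bool:
    """Return True exactly when n satisfies the defining conditions."""
    if n < 2:
        return False
    d = 2
    # Trial division: a composite n has a divisor d with d * d <= n.
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


primes = [n for n in range(20) if is_prime(n)]
print(primes)  # [2, 3, 5, 7, 11, 13, 17, 19]
```

The function returns only True or False, so it has no way to say that one number is a better example of a prime than another.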
Family Resemblance: Members of a category may be related to one another without all members having any properties in common that define the category. Centrality: Some members of a category may be “better examples” of that category than others. Properties of Categorization UCB SIMS 202
Centrality
• A category: Prime Numbers
• Definition: An integer greater than 1 that is divisible only by itself and 1
• Examples: 2, 3, 5, 7, 11, 13, 17, …
• A very clear-cut category. Or is it?
• Can one number be “more prime” than another?
• CENTRALITY: some members of a category may be “better examples” than others
Definition of Game
• Famous example by Wittgenstein
• Classic categories: clear boundaries defined by common properties
• Counterexample: Game
  • No common properties shared by all games: card games, ball games, Olympic games, children’s games
  • Not every game involves competition (ring-around-the-rosie), skill (dice games), or luck (chess)
  • No fixed boundary; can be extended to new games (video games)
• Alternative: Concepts related by Family Resemblances
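Family resemblance can be sketched with sets in Python. The feature lists below are assumptions chosen to mirror the slide's examples; they are not a claim about how games should really be described.

```python
# Hypothetical feature sets for four games.
games = {
    "chess": {"competition", "skill"},
    "dice games": {"competition", "luck"},
    "solitaire": {"luck", "amusement"},
    "ring-around-the-rosie": {"amusement", "group play"},
}

# Classic category: some property would be shared by ALL members.
common = set.intersection(*games.values())

# Family resemblance: every member overlaps with at least one other member.
every_game_linked = all(
    any(games[a] & games[b] for b in games if b != a)
    for a in games
)

print(common, every_game_linked)
```

No single feature survives the intersection, yet each game shares something with a neighbor, which is the chain-like structure that family resemblance describes.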
Perceived degree of category membership has to do with which features define the category. Members usually do not have ALL the necessary features, but have some subset. Those members that have more of the central features are seen as more central members. People have conceptions of typical members. Characteristic Features UCB SIMS 202
Basic-level Categories: Categories are organized into a hierarchy from the most general to the most specific, but the level that is most cognitively basic is “in the middle” of the hierarchy Basic-level Primacy: Basic-level categories are functionally primary with respect to factors including ease of cognitive processing (learning, reasoning, recognition, etc). Properties of Categorization UCB SIMS 202
Levels of Abstraction
• Brown 1958, 1965; Berlin et al. 1972, 1973
• Folk biology:
  1. unique beginner: plant, animal
  2. life form: tree, bush, flower
  3. generic name: pine, oak, maple, elm
  4. specific name: Ponderosa pine, white pine
  5. varietal name: western Ponderosa pine
• No overlap between levels
• Level 3 is basic
• Level 3 corresponds to genus
Characteristics of Basic-level Categories
• Language
  • People name things more readily at basic level
  • Name learned earliest in childhood
  • Languages have simpler names at basic level
  • Sounds like the “real name”
  • Name used more frequently
    • Strange to call a dime a coin, a metal object
  • Names used in neutral context
    • There’s a dog on the porch.
    • There’s a terrier on the porch.
Characteristics of Basic-level Categories
• Concepts
  • Things perceived more holistically at basic level (rather than by parts)
  • No difference in how people interact with the concept between basic and more specific levels
  • Things are remembered more readily at basic level
  • Folk biology categories correspond accurately to scientific biological categories only at the basic level
Superordinate and Subordinate Levels
SUPERORDINATE   animal    furniture
BASIC LEVEL     dog       chair
SUBORDINATE     terrier   rocker
• Children take longer to learn superordinate
• Superordinate not associated with mental images or motor actions
Typicality and Characteristic Features
• Some categories have clear boundaries, but have graded membership
• What is a good example of a bird?
• Examples from language:
  • A robin is a bird.
  • A chicken is a bird.
  • A bat is a bird.
• Takes longer for people to say the second is true and the third is false
• Features characterize the category
  • How many typical features does the object possess?
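One very rough way to model graded membership is to count how many typical features an object has. The sketch below is an assumption-laden toy: the feature list for "bird" and the features assigned to each animal are invented, and real typicality judgments are not this simple.

```python
# Hypothetical characteristic features of the category "bird".
BIRD_FEATURES = {"has feathers", "has beak", "flies", "sings", "lays eggs"}

candidates = {
    "robin": {"has feathers", "has beak", "flies", "sings", "lays eggs"},
    "chicken": {"has feathers", "has beak", "lays eggs"},
    "bat": {"flies"},
}


def typicality(features: set) -> float:
    """Fraction of the category's characteristic features that are present."""
    return len(features & BIRD_FEATURES) / len(BIRD_FEATURES)


scores = {name: typicality(f) for name, f in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['robin', 'chicken', 'bat']
```

The score gives a ranking rather than a yes/no answer, with the robin as the most central member, the chicken further out, and the bat sharing only one feature despite not being a bird at all.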
Characteristic Features
• Is a cat on a mat a cat?
• Is a dead cat a cat?
• Is a photo of a cat a cat?
• Is a cat with three legs a cat?
• Is a cat that barks a cat?
• Is a cat with a dog’s brain a cat?
• Is a cat with every cell replaced by a dog’s cells a cat?
Polysemy
• Most words have more than one sense
  • that dog has floppy ears
  • good ear for jazz
  • three ears of corn
• Homonymy: same word, different meaning
• Polysemy: different senses of same word
Category Structure and Polysemy
• Category membership is determined by shared subsets of features
• Different senses of a word reflect differences in which attributes are shared
• This is reflected in language by polysemy
  • related meaning, but slightly different
• Example: bank
  • the building, the institution, the notion of where money is stored
Metonymy
• Use one aspect of something to stand for the whole
• The building stands for the institution of the bank.
• Newscast: “The White House released new figures today.”
• Waitperson: “The ham sandwich spilled his drink.”
Synonymy
• Different ways of expressing related concepts
• Examples: cat, feline, Siamese cat
• Overlaps with basic, subordinate level
• Synonyms are almost never “true”
  • used in different contexts
  • have different implications
  • This is a point of contention.
Thesauri
• Polysemy: same word, different senses of meaning
  • slightly different concepts expressed similarly
• Synonyms: different words, related senses of meanings
  • different ways to express similar concepts
• Thesauri help draw all these together
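A toy thesaurus can be sketched as two Python dictionaries, one per direction. The entries reuse examples from earlier slides; the choice of "cat" as the preferred term and the sense labels are illustrative assumptions.

```python
# One word, several senses (word -> list of senses).
senses = {
    "ear": ["organ of hearing", "musical sensitivity", "spike of corn"],
}

# Several words, one concept (word -> preferred term).
preferred_term = {
    "cat": "cat",
    "feline": "cat",
    "Siamese cat": "cat",
}


def normalize(term: str) -> str:
    """Map a term to its preferred form; unknown terms pass through unchanged."""
    return preferred_term.get(term, term)


print(normalize("feline"), len(senses["ear"]))
```

Treating "Siamese cat" as a synonym of "cat" throws away the subordinate-level distinction, which is one reason synonyms are almost never "true".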
Summary
• Processes of categorization underlie many of the issues having to do with information organization
• Categorization is messier than our computer systems would like
• Human categories have graded membership, consisting of family resemblances.
  • Family resemblance is expressed in part by which subset of features is shared
  • It is also determined by underlying understandings of the world that do not get represented in most systems