Thoughts on Social Tagging. Marti Hearst UC Berkeley. Taxonomy Bootcamp ’07 Keynote Talk. Outline. What are Tags? Organizing Tags for Navigation Facets and faceted navigation How to (semi)automatically create facet hierarchies What’s up with Tag Clouds?. Social Tagging.
Thoughts on Social Tagging Marti Hearst UC Berkeley Taxonomy Bootcamp ’07 Keynote Talk
Outline • What are Tags? • Organizing Tags for Navigation • Facets and faceted navigation • How to (semi)automatically create facet hierarchies • What’s up with Tag Clouds?
Social Tagging • Metadata assignment without all the bother • Spontaneous, easy, and tends towards single terms • Usually used in the context of social media
The Tagging Opportunity • At last! Content-oriented metadata in the large! • Attempts at metadata standardization always end up with something like the Dublin Core • author, date, publisher, … • I’ve always thought the action was in the subject metadata, and have focused on how to navigate collections given such data.
The Tagging Opportunity • Tags are inherently faceted! • It is assumed that multiple labels will be assigned to each item • Rather than placing them into a folder • Rather than placing them into a hierarchy • Concepts are assigned from many different content categories • Helps alleviate the metadata wars: • Allows for both splitters and lumpers • Is this a bird or a robin? • Doesn’t matter, you can do both! • Allows for differing organizational views • Does NASCAR go under sports or entertainment? • Doesn’t matter, you can do both!
Tagging Problems • Tags aren’t organized • Tags don’t attempt exhaustive coverage • Different tags for the same meanings • Morphological variants (airplane, airplanes) • Lexical variants (sf, sanfrancisco, san francisco) • Synonyms (boat, ship) • See how this author attempts to compensate:
Tagging Problems / Opportunities • Some tags are fleeting in meaning or too personal • toread, todo • Tags are not “professional” • (I personally don’t think this matters) • Great example from Trant: • “Anecdotal evidence also shows that ‘professional’ cataloguers find the basic description of visual elements surprisingly difficult: a curator exhibited significant discomfort during this description task. When asked what was wrong, he blurted out ‘everything I know isn’t in the picture’.” • “Investigating social tagging and folksonomy in the art museum with steve.museum”, J. Trant, B. Wyman, WWW 2006 Collaborative Tagging Workshop
What about Browsing? • I think tags need some organization • Currently most tags are used as a direct index into items • Click on tag, see items assigned to it, end of story • Co-occurring tags are not shown • Grouping into small hierarchies is not usually done • del.icio.us now has bundles, but navigation isn’t good • IBM’s dogear and RawSugar come the closest • I think the solution is to organize tags into faceted hierarchies and do browsing in the standard way
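The point about co-occurring tags can be illustrated with a minimal sketch. The items and tags below are hypothetical, and counting tag pairs per item is only one simple way to surface related tags; it is not a description of how del.icio.us, dogear, or RawSugar work.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagged items, made up for this illustration.
items = {
    "photo1": {"boat", "san francisco", "bay"},
    "photo2": {"boat", "bay"},
    "photo3": {"robin", "bird"},
}

def cooccurrence(items):
    """Count how often each unordered pair of tags is assigned to the same item."""
    counts = Counter()
    for tags in items.values():
        for a, b in combinations(sorted(tags), 2):
            counts[(a, b)] += 1
    return counts

def related_tags(tag, counts):
    """List the tags that co-occur with `tag`, most frequent first."""
    related = Counter()
    for (a, b), n in counts.items():
        if a == tag:
            related[b] += n
        elif b == tag:
            related[a] += n
    return related.most_common()

counts = cooccurrence(items)
```

An interface could show `related_tags("boat", counts)` next to the items for "boat", so that clicking a tag is no longer the end of the story.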
The Idea of Facets • Facets are a way of labeling data • A kind of Metadata (data about data) • Can be thought of as properties of items • Facets vs. Categories • Items are placed INTO a category system • Multiple facet labels are ASSIGNED TO items
The Idea of Facets • Create INDEPENDENT categories (facets) • Each facet has labels (sometimes arranged in a hierarchy) • Assign labels from the facets to every item • Example: recipe collection • Ingredient: Chicken, Bell Pepper • Cooking Method: Stir-fry, Curry • Course: Main Course • Cuisine: Thai
The Idea of Facets • Break out all the important concepts into their own facets • Sometimes the facets are hierarchical • Assign labels to items from any level of the hierarchy • Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze • Desserts: Cakes, Cookies, Dairy (Ice Cream, Sorbet, Flan) • Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple
Using Facets • Now there are multiple ways to get to each item • Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze • Desserts: Cakes, Cookies, Dairy (Ice Cream, Sherbet, Flan) • Fruits: Cherries, Berries (Blueberries, Strawberries), Bananas, Pineapple • One item: Fruit > Pineapple; Dessert > Cake; Preparation > Bake • Another item: Dessert > Dairy > Sherbet; Fruit > Berries > Strawberries; Preparation > Freeze
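A small sketch of how the two labeled items on this slide could be reached along several routes. The label paths are taken from the slide; the recipe names and the function names are hypothetical, and representing a label as a tuple path is just one convenient choice.

```python
# Each item is ASSIGNED labels from several independent facets.
# A label is a path from the facet root down to any level of the hierarchy.
# The recipe names are hypothetical; the slide shows only the label paths.
recipes = {
    "pineapple upside-down cake": {
        ("Fruit", "Pineapple"),
        ("Dessert", "Cake"),
        ("Preparation", "Bake"),
    },
    "strawberry sherbet": {
        ("Dessert", "Dairy", "Sherbet"),
        ("Fruit", "Berries", "Strawberries"),
        ("Preparation", "Freeze"),
    },
}

def matches(labels, query):
    """True if some label equals the query path or lies beneath it."""
    return any(label[:len(query)] == query for label in labels)

def browse(collection, *queries):
    """Return the items that satisfy every selected facet path."""
    return sorted(
        name
        for name, labels in collection.items()
        if all(matches(labels, q) for q in queries)
    )
```

Here `browse(recipes, ("Fruit", "Berries"))` and `browse(recipes, ("Preparation", "Freeze"))` both lead to the sherbet, while `browse(recipes, ("Fruit",))` returns both items.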
Advantages of Faceted Navigation • Systematically integrates search results: • reflect the structure of the info architecture • retain the context of previous interactions • Gives users control and flexibility • Over order of metadata use • Over when to navigate vs. when to search • Allows integration with advanced methods • Collaborative filtering, predicting users’ preferences
Advantages of Faceted Navigation • Can’t end up with empty result sets • (except with keyword search) • Helps avoid feelings of being lost. • Easier to explore the collection. • Helps users infer what kinds of things are in the collection. • Evokes a feeling of “browsing the shelves” • Is preferred over standard search for collection browsing in usability studies. • (Interface must be designed properly)
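One way to see why faceted navigation avoids empty result sets: the interface only offers refinements that occur in the current results. The sketch below assumes a tiny hypothetical collection and hypothetical function names; it is an illustration of the idea, not the Flamenco implementation.

```python
from collections import Counter

# Hypothetical collection: each item carries (facet, label) pairs.
collection = {
    "item_a": {("Cuisine", "Thai"), ("Course", "Main Course")},
    "item_b": {("Cuisine", "Thai"), ("Course", "Dessert")},
    "item_c": {("Cuisine", "Italian"), ("Course", "Main Course")},
}

def select(collection, chosen):
    """Items that carry every chosen label."""
    return {name: labels for name, labels in collection.items() if chosen <= labels}

def refinements(collection, chosen):
    """Labels that can still be added, with the number of items each would leave.

    Only labels present in the current result set are counted, so following
    any offered refinement leaves at least one item.
    """
    current = select(collection, chosen)
    counts = Counter(
        label
        for labels in current.values()
        for label in labels
        if label not in chosen
    )
    return dict(counts)
```

After choosing `{("Cuisine", "Thai")}`, the offered refinements are the two Course labels with one item each; `("Cuisine", "Italian")` is not offered because it would return nothing.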
Incorporating Tags into Library Catalogs • I think this is where semi-automated techniques for tag conversion will be most helpful. • Some libraries are already going this route: • Michigan State University Library: • http://discover.lib.msu.edu/iii/encore/app • Scottsdale Public Library • http://libcat.scottsdaleaz.gov/ • (search within the Encore box)
One attempt: RawSugar • A company/website that organizes bookmark tags into facet hierarchies • Current demo is sparse
(Stoica & Hearst, HLT-NAACL ’07) CastaNet: Creating Facet Hierarchies from Text
Example: Biology Journal Titles • Castanet Output (shown in Flamenco)
Castanet Algorithm • Leverage the structure of WordNet • Pipeline: Documents → Select terms → Get hypernym paths (from WordNet) → Build tree → Compress tree → Divide into facets
1. Select Terms • Select well distributed terms from collection • Example terms: red, blue
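The slide does not say how "well distributed" is measured. A minimal sketch, assuming it means a term's document frequency falls inside a band (not too rare, not in nearly every document); the thresholds, the whitespace tokenizer, and the sample documents are all hypothetical.

```python
from collections import Counter

def select_terms(documents, min_df=2, max_df_ratio=0.8):
    """Keep terms whose document frequency lies within [min_df, max_df_ratio * N].

    Both thresholds are illustrative assumptions, not values from the talk.
    """
    df = Counter()
    for doc in documents:
        # Count each term once per document.
        df.update(set(doc.lower().split()))
    limit = max_df_ratio * len(documents)
    return sorted(term for term, n in df.items() if min_df <= n <= limit)

# Hypothetical mini-collection.
docs = [
    "the red car",
    "the blue car",
    "the red bike",
    "the blue sky",
    "the green tree",
]
```

With these documents, `select_terms(docs)` keeps "blue", "car", and "red": "the" appears everywhere and is dropped, and the remaining words appear only once.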
2. Get Hypernym Path (from WordNet) • red: abstraction > property > visual property > color > chromatic color > red, redness • blue: abstraction > property > visual property > color > chromatic color > blue, blueness
3. Build Tree • Merge the hypernym paths of “red” and “blue” into one tree: abstraction > property > visual property > color > chromatic color > { red, redness > red; blue, blueness > blue }
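Steps 2 and 3 can be sketched as merging root-to-synset paths into a nested dictionary. The two paths are copied from the slides rather than fetched from WordNet, and `build_tree` is a hypothetical name; this is a sketch of the tree-merging idea, not the authors' code.

```python
# Hypernym paths as shown on the slides, hardcoded here instead of
# being looked up in WordNet.
SHARED = ["abstraction", "property", "visual property", "color", "chromatic color"]
paths = {
    "red": SHARED + ["red, redness"],
    "blue": SHARED + ["blue, blueness"],
}

def build_tree(paths):
    """Merge the paths into one tree of nested dicts.

    Paths that share a prefix share the corresponding nodes; each term is
    attached as a leaf beneath the synset that ends its path.
    """
    tree = {}
    for term, path in paths.items():
        node = tree
        for synset in path:
            node = node.setdefault(synset, {})
        node.setdefault(term, {})
    return tree

tree = build_tree(paths)
```

The two paths share everything down to "chromatic color", so the resulting tree branches only at that node.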
4. Compress Tree • Before: color > chromatic color > { red, redness > red; blue, blueness > blue; green, greenness > green } • After: color > chromatic color > { red; blue; green }
4. Compress Tree (cont.) • Before: color > chromatic color > { red; blue; green } • After: color > { red; blue; green }
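The slides show the compression only as before/after pictures. The sketch below encodes one reading of them as two rules: (1) a synset node whose only child is a leaf term is replaced by that term, and (2) a child whose name contains its parent's name as a word is removed and its children are promoted. Both rules and both function names are assumptions made for illustration; the actual algorithm may use different or additional criteria.

```python
def collapse_single_leaf(tree):
    """Rule 1 (assumed): replace a node that has exactly one leaf child by that leaf."""
    out = {}
    for name, kids in tree.items():
        kids = collapse_single_leaf(kids)
        if len(kids) == 1 and not next(iter(kids.values())):
            out.update(kids)
        else:
            out[name] = kids
    return out

def drop_redundant_children(tree, parent=None):
    """Rule 2 (assumed): remove a child whose name repeats the parent's name,
    promoting its children to the parent."""
    out = {}
    for name, kids in tree.items():
        kids = drop_redundant_children(kids, parent=name)
        if parent is not None and parent in name.split():
            out.update(kids)
        else:
            out[name] = kids
    return out

# The "before" tree from the first Compress Tree slide.
before = {
    "color": {
        "chromatic color": {
            "red, redness": {"red": {}},
            "blue, blueness": {"blue": {}},
            "green, greenness": {"green": {}},
        }
    }
}

step1 = collapse_single_leaf(before)
step2 = drop_redundant_children(step1)
```

`step1` corresponds to the first slide's result (the synset nodes are gone) and `step2` to the continuation slide ("chromatic color" is gone, leaving the three terms directly under "color").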
5. Divide into Facets
Disambiguation • Ambiguity in: • Word senses • Paths up the hypernym tree • 2 paths for same word: • Sense 1 for word “tuna”: organism, being => plant, flora => vascular plant => succulent => cactus => tuna • Sense 2 for word “tuna”: organism, being => fish => food fish => tuna • 2 paths for same sense (sense 2 for word “tuna”): • organism, being => fish => food fish => tuna • organism, being => fish => bony fish => spiny-finned fish => percoid fish => tuna
How to Select the Right Senses and Paths? • First: build core tree • (1) Create paths for words with only one sense • (2) Use Domains • WordNet has 212 Domains • medicine, mathematics, biology, chemistry, linguistics, soccer, etc. • Automatically scan the collection to see which domains apply • The user selects which of the suggested domains to use or may add own • Paths for terms that match the selected domains are added to the core tree • Then: add remaining terms to the core tree.
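The slide does not spell out how the remaining ambiguous terms are attached to the core tree. One plausible heuristic, assumed here purely for illustration, is to prefer the candidate path that shares the longest prefix with the core tree and to break ties in favor of the shorter path. The "tuna" paths come from the Disambiguation slide; the core path through "salmon" and all function names are hypothetical.

```python
# Candidate (sense, path) pairs for "tuna", from the Disambiguation slide.
TUNA_CANDIDATES = [
    ("sense 1", ["organism, being", "plant, flora", "vascular plant",
                 "succulent", "cactus", "tuna"]),
    ("sense 2", ["organism, being", "fish", "food fish", "tuna"]),
    ("sense 2", ["organism, being", "fish", "bony fish", "spiny-finned fish",
                 "percoid fish", "tuna"]),
]

def prefixes(path):
    """All root-anchored prefixes of a path, as tuples."""
    return {tuple(path[:i]) for i in range(1, len(path) + 1)}

# Hypothetical core tree, built from one unambiguous term and stored as a
# set of path prefixes.
core = prefixes(["organism, being", "fish", "food fish", "salmon"])

def shared_depth(path, core):
    """Number of leading nodes of `path` that already exist in the core tree."""
    depth = 0
    for i in range(1, len(path) + 1):
        if tuple(path[:i]) not in core:
            break
        depth = i
    return depth

def choose_path(candidates, core):
    """Assumed heuristic: deepest overlap with the core tree, then shortest path."""
    return max(candidates, key=lambda c: (shared_depth(c[1], core), -len(c[1])))

best_sense, best_path = choose_path(TUNA_CANDIDATES, core)
```

With a core tree that already contains a food-fish branch, the fish sense of "tuna" is chosen, along its shorter path.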
Castanet Evaluation Method • Information architects assessed the category systems • For each of 2 systems’ output: • Examined and commented on top-level • Examined and commented on two sub-levels • Also compared to a baseline system • Then comment on overall properties • Meaningful? • Systematic? • Likely to use in your work?
CastaNet Evaluation Results • Results on recipes collection for “Would you use this system in your work?” • # “Yes in some cases” or “yes, definitely”: • Castanet: 29/34 • LDA: 0/18 • Subsumption: 6/16 • Baseline: 25/34 • Average response to questions about quality (4 = “strongly agree”)
Will Castanet Work on Tags? • Class project by Simon King and Jeff Towle, 2004 • 1650 captions captured from mobile phones • “Blocks with Grandpa”, “Weezer”, “A veterans day tour of berkeley in front of south hall.”, “Bad photo”, “Kitchen”, “Jgj” • Wanted to organize them. • Use the CastaNet WordNet-based facet-hierarchy creation algorithm • by Stoica & Hearst, to appear at HLT-NAACL ’07 • Had to first remove proper names
Example Photos & Captions (King & Towle) very scary x-mas tree Hp presentation chasing a cat in the dark My cat
instrumentality, (112) vehicle (26) car (9) bike (8) vessel, watercraft (4) mayflower (2) ferry (1) gig (1) truck (3) airplane (2) device (20) machine (7) computer (4) laptop (1) sander (1) game (8) auction (1) skittles (1) diversion, recreation (6) athletic game (4) baseball (1) basketball (1) football (1) soccer (1) playing (2) frolic (1) container (16) vessel (7) bottle (5) water_bottle (2) jug (1) pill_bottle (1) bath (2) bowl (1) can (2) backpack (1) bumper (1) empty (1) salt_shaker (1) furniture, piece of furniture, article of furniture (12) seat (8) bench (2) chair (2) couch (2) lounge (1) bed (4) desk (1)
Next Steps • We need more analysis of: • What characterizes tags? • What makes for useful tags? • This will support automatic tag assignment and organization • Other Research Questions: • How can the interface encourage consistency, coherence, and coverage? • How to get tag expertise? • Right now, in many cases it is least-common-denominator
What’s up with Tag Clouds? What does a typical tag cloud look like?
Definition Tag Cloud: A visual representation of social tags, organized into paragraph-style layout, usually in alphabetical order, where the relative size and weight of the font for each tag corresponds to the relative frequency of its use.
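The definition can be made concrete with a short sketch that orders tags alphabetically and maps each tag's frequency to a font size. Linear scaling and the pixel bounds are assumptions chosen for simplicity (logarithmic scaling is another possible choice), and the tag counts are made up.

```python
def tag_cloud(freqs, min_px=12, max_px=36):
    """Return (tag, font size in px) pairs in alphabetical order.

    The size grows linearly with frequency between min_px and max_px;
    both bounds are illustrative defaults. Expects a non-empty dict.
    """
    lo, hi = min(freqs.values()), max(freqs.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    cloud = []
    for tag in sorted(freqs, key=str.lower):
        size = min_px + (freqs[tag] - lo) * (max_px - min_px) / span
        cloud.append((tag, round(size)))
    return cloud

# Hypothetical tag frequencies.
freqs = {"web": 50, "design": 20, "ajax": 10, "blog": 30}
cloud = tag_cloud(freqs)
```

The least used tag gets the minimum size, the most used tag the maximum, and the paragraph-style layout would then render the pairs in the returned order.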