Boolean, bibliometrics, and beyond, Part 1. LIS 670. Donna Bair-Mundy
Our roadmap • Boolean • Boolean exercises • Fuzzy sets • Bibliometrics
Boolean algebra • Developed by George Boole, an English mathematician, circa 1850 • Set theory • Boolean logic is binary • Widely used in electronic design • Widely used in information retrieval systems
Two ways of defining a set • Enumeration (listing the elements): A = {1, 2, 3, 4, 5} • Specification of a distinguishing property that all elements of the set have in common: B = {x | x is a prime number}
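Both ways of defining a set map onto Python: enumeration is a set literal, and specification is a set comprehension that keeps every candidate having the property. This is a minimal sketch; because the set of all primes is infinite, it assumes a bounded candidate range (numbers below 30), and `is_prime` is a hypothetical helper written for the example.

```python
# Enumeration: list every element explicitly.
A = {1, 2, 3, 4, 5}

def is_prime(n: int) -> bool:
    """Return True if n is a prime number (hypothetical helper for this example)."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

# Specification: keep every x that has the distinguishing property.
# The set of all primes is infinite, so candidates are limited to 1..29 here.
B = {x for x in range(1, 30) if is_prime(x)}

print(sorted(A))  # [1, 2, 3, 4, 5]
print(sorted(B))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```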
Set operators (1) Given sets A = {1, 2, 3, 4, 5} B = {1, 3, 5, 7} C = {6, 7, 8} Union - produces a set containing every element that belongs to either operand set (or to both): A ∪ C = {1, 2, 3, 4, 5, 6, 7, 8}
Set operators (2) Given sets A = {1, 2, 3, 4, 5} B = {1, 3, 5, 7} C = {6, 7, 8} Intersection - produces a set containing the members of the first set that also occur in the second set: A ∩ B = {1, 3, 5}
Set operators (3) Given set A = {1, 2, 3, 4, 5} Complement - produces a set containing all members of the universal set that are not members of the operand set. If D, the universal set, is the set of all positive integers, then the complement of A = {6, 7, 8, …}
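Python's built-in `set` type supports these operators directly: `|` for union, `&` for intersection, and `-` for difference. A minimal sketch using the slide's sets; since a program cannot store every positive integer, it assumes a small finite stand-in universe `D = {1, ..., 12}`, so the complement shown here stops at 12 instead of continuing indefinitely.

```python
A = {1, 2, 3, 4, 5}
B = {1, 3, 5, 7}
C = {6, 7, 8}

union = A | C          # A ∪ C
intersection = A & B   # A ∩ B

# Assumed finite universe standing in for "all positive integers".
D = set(range(1, 13))
complement_A = D - A   # complement of A, relative to D

print(sorted(union))         # [1, 2, 3, 4, 5, 6, 7, 8]
print(sorted(intersection))  # [1, 3, 5]
print(sorted(complement_A))  # [6, 7, 8, 9, 10, 11, 12]
```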
Boolean operators Words and symbols to denote set operators • OR: union; set symbol ∪; algebraic symbol + • AND: intersection; set symbol ∩; algebraic symbol * • NOT: complement; set symbol -; algebraic symbol -
Algebraic operations on sets Given sets A = {1, 2, 3, 4, 5} B = {1, 3, 5, 7} C = {6, 7, 8}: A * (B + C) = A * B + A * C • A AND (B OR C) = {1, 2, 3, 4, 5} AND {1, 3, 5, 6, 7, 8} = {1, 3, 5} • (A AND B) OR (A AND C) = {1, 3, 5} OR ∅ = {1, 3, 5}, where ∅ is the empty set
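The worked example is an instance of the distributive law, AND distributing over OR. A short sketch that recomputes both sides with Python sets, using the slide's A, B, and C; an empty Python set prints as `set()`.

```python
A = {1, 2, 3, 4, 5}
B = {1, 3, 5, 7}
C = {6, 7, 8}

left = A & (B | C)          # A AND (B OR C)
right = (A & B) | (A & C)   # (A AND B) OR (A AND C)

print(sorted(B | C))        # [1, 3, 5, 6, 7, 8]
print(A & C)                # set()  (the empty set)
print(sorted(left), sorted(right))  # [1, 3, 5] [1, 3, 5]
```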
Venn diagrams • John Venn • Charles Dodgson [Figure: three overlapping circles labeled Set 1, Set 2, and Set 3]
Venn diagram - OR [Figure: overlapping circles labeled Poodles and Retrievers] Poodles OR Retrievers yields all documents about poodles, retrievers, or both
Venn diagram - AND [Figure: overlapping circles labeled Poodles and Retrievers] Poodles AND Retrievers yields all documents that deal with both poodles and retrievers
Venn diagram - NOT [Figure: overlapping circles labeled Poodles and Retrievers] Poodles NOT Retrievers yields documents about poodles but not about retrievers
Venn diagram - Exclusive OR [Figure: overlapping circles labeled Poodles and Retrievers] Poodles XOR Retrievers yields all documents that deal with either poodles or retrievers but not both
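Each of the four Venn diagrams corresponds to one Python set operator. In this sketch the document IDs in `poodles` and `retrievers` are invented for illustration; it also checks that XOR is the OR region with the AND overlap removed.

```python
# Hypothetical document IDs indexed under each subject.
poodles = {1, 2, 3, 4}
retrievers = {3, 4, 5, 6}

either_or_both = poodles | retrievers   # Poodles OR Retrievers
both = poodles & retrievers             # Poodles AND Retrievers
poodles_only = poodles - retrievers     # Poodles NOT Retrievers
one_not_both = poodles ^ retrievers     # Poodles XOR Retrievers

# XOR equals (A OR B) NOT (A AND B).
assert one_not_both == either_or_both - both

print(sorted(either_or_both))  # [1, 2, 3, 4, 5, 6]
print(sorted(both))            # [3, 4]
print(sorted(poodles_only))    # [1, 2]
print(sorted(one_not_both))    # [1, 2, 5, 6]
```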
Rules of precedence • Complementation (NOT) • Intersection (AND) • Union (OR) dogs OR cats AND fleas will be read as dogs OR (cats AND fleas)
Specifying order of performance: use parentheses, as in (dogs OR cats) AND fleas
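Python's set operators happen to follow the same precedence order: `-` (used here for NOT) binds more tightly than `&` (AND), which binds more tightly than `|` (OR). A small sketch, reusing the document numbers from the inverted-index postings on the advantages slide, shows how parentheses change the result.

```python
# Document numbers taken from the inverted-index postings.
dogs = {2, 5, 6, 15}
cats = {1, 3, 7, 9, 13}
fleas = {6, 7, 9, 17}

# & binds more tightly than |, so this is evaluated as dogs | (cats & fleas).
default_order = dogs | cats & fleas
# Parentheses force the union to be computed first.
forced_order = (dogs | cats) & fleas

print(sorted(default_order))  # [2, 5, 6, 7, 9, 15]
print(sorted(forced_order))   # [6, 7, 9]
```

Real retrieval systems do not all use the same precedence rules, so explicit parentheses are the safer habit.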
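Boolean arithmetic Find A AND B (A * B) • In Set A: Yes; in Set B: Yes; retrieve? Yes (1 * 1 = 1) • In Set A: Yes; in Set B: No; retrieve? No (1 * 0 = 0) • In Set A: No; in Set B: Yes; retrieve? No (0 * 1 = 0) • In Set A: No; in Set B: No; retrieve? No (0 * 0 = 0)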
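With 1 for "in the set" and 0 for "not in the set", ordinary multiplication reproduces the AND column above. As an extension beyond the slide (which tabulates only AND), the sketch also shows OR as the algebraic +, capped at 1 because in Boolean arithmetic 1 + 1 = 1.

```python
from itertools import product

# Each row: (in A, in B, A AND B as a product, A OR B as a sum capped at 1).
rows = [(a, b, a * b, min(a + b, 1)) for a, b in product((1, 0), repeat=2)]

for a, b, a_and_b, a_or_b in rows:
    print(a, b, a_and_b, a_or_b)
# 1 1 1 1
# 1 0 0 1
# 0 1 0 1
# 0 0 0 0
```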
Boolean searching: advantages • Ideally suited for inverted file indexes: each index entry and its set of pointers constitutes a set (Cats 1, 3, 7, 9, 13; Dogs 2, 5, 6, 15; Fleas 6, 7, 9, 17; Gnus 19, 27; Guppies 4, 14, 18; Hamsters 22, 25, 31) • Allows the user to broaden (using OR) or narrow (using AND, NOT) searches
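Because every index entry is already a set of document numbers, a Boolean query reduces to set operations on the postings. A minimal sketch using the slide's index; `postings` is a hypothetical helper that returns an empty set for terms not in the index.

```python
# The index from the slide: each term points to the documents that contain it.
index = {
    "cats": {1, 3, 7, 9, 13},
    "dogs": {2, 5, 6, 15},
    "fleas": {6, 7, 9, 17},
    "gnus": {19, 27},
    "guppies": {4, 14, 18},
    "hamsters": {22, 25, 31},
}

def postings(term):
    """Return the set of document numbers for a term (empty if not indexed)."""
    return index.get(term.lower(), set())

broader = postings("cats") | postings("dogs")     # cats OR dogs
narrower = postings("cats") & postings("fleas")   # cats AND fleas
excluded = postings("cats") - postings("fleas")   # cats NOT fleas
combined = (postings("cats") | postings("dogs")) & postings("fleas")

print(len(postings("cats")), len(broader), len(narrower), len(excluded))  # 5 9 2 3
print(sorted(combined))  # [6, 7, 9]
```

OR grows the result from 5 to 9 documents, while AND and NOT shrink it to 2 and 3.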
The Scenario – part I You are the librarian at the Happy Broccoli School of Culinary Arts. Chef Kweezee is planning the menus for this week’s demonstrations. He comes to the reference desk and asks you to search the recipe database for him.
The Scenario – part II The search command for this database is FIND followed by keywords. The system accommodates Boolean operators and allows parentheses.
The Scenario – part III To impress the chef, who stays to watch you search, you formulate a single search statement for each menu.
Sample Boolean exercise Menu: Mexican Cuisine • Enchilada • Refried beans Sample record: Cuisine: Mexican; Title: Enchilada; Ingredients: Corn tortillas, tomato sauce, chili peppers, beans, onions, garlic, cilantro… Search statement: FIND mexican AND (enchilada OR (refried AND beans))
FIND mexican AND (enchilada OR (refried AND beans)) [Figure: Venn diagram of the sets mexican, enchilada, refried, and beans]
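To see how the sample statement behaves, here is a minimal sketch that indexes a few recipe records by word and evaluates the statement as set operations. Record 1 follows the slide's sample record (without its trailing ellipsis); records 2 to 4 are invented for illustration, and `find` is a hypothetical stand-in for the database's FIND command.

```python
import re

# Record 1 follows the slide; records 2-4 are hypothetical.
records = {
    1: "Cuisine: Mexican Title: Enchilada Ingredients: Corn tortillas, tomato sauce, "
       "chili peppers, beans, onions, garlic, cilantro",
    2: "Cuisine: Mexican Title: Refried beans Ingredients: Pinto beans, onions, garlic",
    3: "Cuisine: Mexican Title: Tostada Ingredients: Corn tortillas, black beans, lettuce",
    4: "Cuisine: Italian Title: Baked garlic Ingredients: Garlic, olive oil",
}

# Build an inverted index: lowercase word -> set of record numbers.
index = {}
for rec_id, text in records.items():
    for word in re.findall(r"[a-z]+", text.lower()):
        index.setdefault(word, set()).add(rec_id)

def find(word):
    """Return the record numbers indexed under a word."""
    return index.get(word, set())

# FIND mexican AND (enchilada OR (refried AND beans))
hits = find("mexican") & (find("enchilada") | (find("refried") & find("beans")))
print(sorted(hits))  # [1, 2]
```

Record 3 contains "beans" but not "refried", so the inner AND excludes it; record 4 is not Mexican.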
Exercise 1 Menu: Mexican Cuisine • Mexican casserole • Tostada FIND
Exercise 2 Menu: Italian Cuisine • Pasta with grilled artichoke hearts • Baked garlic FIND
Exercise 3 Menu: Greek Cuisine • Vegetarian moussaka • Greek salad featuring kalamata olives FIND
Exercise 4 Menu: Chinese Cuisine • Hot and sour soup • Fried eggplant • Tofu and broccoli dish FIND
Exercise 5 Menu: Indian Cuisine • Eggplant curry • Samosa • Raita • Tamarind sauce FIND
Boolean searching: disadvantages (1) • Counterintuitive: in everyday speech "and" suggests more, but AND retrieves fewer items • Two-valued logic: items either meet the criteria or they do not • Good for computers, but does not reflect how users judge relevance
Boolean searching: disadvantages (2) Research topic: Digital music libraries Documents: • Current research on digital music libraries • Introduction to digital libraries • Information architecture in the digital environment • Libraries of ancient Babylonia
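One way to see the problem, assuming each title is indexed by its words and the topic is searched as the words digital, music, and libraries: AND keeps only the exact match and drops the partly relevant introduction to digital libraries, while OR retrieves all four titles and treats the Babylonian libraries as just as retrieved as the best match. Neither result expresses degrees of relevance.

```python
import re

titles = [
    "Current research on digital music libraries",
    "Introduction to digital libraries",
    "Information architecture in the digital environment",
    "Libraries of ancient Babylonia",
]
query = {"digital", "music", "libraries"}

def words(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

# digital AND music AND libraries: a title must contain every query word.
and_hits = [t for t in titles if query <= words(t)]
# digital OR music OR libraries: one query word is enough.
or_hits = [t for t in titles if query & words(t)]

print(and_hits)      # ['Current research on digital music libraries']
print(len(or_hits))  # 4
```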
Binary versus fuzzy sets • Test each record against the query, where $R_i$ is any record and $Q$ is the user query • Binary set (yes or no: retrieved or not): $S(R_i \times Q) \in \{0, 1\}$. The retrieval set for query $Q$ is all records $R_i$ such that $S(R_i \times Q) = 1$ • Fuzzy set (brackets indicate a range): $S(R_i \times Q) \in [0, 1]$. Here $S$ expresses not whether $R_i$ is in the set but the strength of its association with the set.
Fuzzy set [Figure: a scale running from 1 (highly relevant) to 0 (non-relevant)]
FIND Agni Vedic fire ritual [Figure: documents placed on a scale from 1 (highly relevant) to 0 (non-relevant)] • Analysis of the Agni Vedic fire ritual (1) • Characteristics of Agni (0.25) • Structural analysis of a Vedic fire ritual (0.75) • Analysis of a fire ritual of India (0.5)
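The scores on the Agni slide are consistent with one simple rule: the fraction of the four query words that appear in each title. That rule is an assumption (the slide does not state how the scores were computed), but it reproduces every value shown. The sketch contrasts this fuzzy membership, $S \in [0, 1]$, with two-valued membership, $S \in \{0, 1\}$, which here means Boolean AND of all four words.

```python
import re

query = ["agni", "vedic", "fire", "ritual"]
titles = [
    "Analysis of the Agni Vedic fire ritual",
    "Characteristics of Agni",
    "Structural analysis of a Vedic fire ritual",
    "Analysis of a fire ritual of India",
]

def fuzzy_score(title):
    """Degree of membership in [0, 1]: fraction of query words found in the title (assumed rule)."""
    present = set(re.findall(r"[a-z]+", title.lower()))
    return sum(1 for q in query if q in present) / len(query)

def binary_score(title):
    """Membership in {0, 1}: 1 only if every query word is present (Boolean AND)."""
    return 1 if fuzzy_score(title) == 1 else 0

for t in titles:
    print(f"{fuzzy_score(t):.2f} {binary_score(t)} {t}")
```

Only the first title survives the binary test, while the fuzzy scores keep the other three in a graded order.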
Implementing fuzzy sets (1) User enters list of words FIRE, RITUAL, SACRIFICE Retrieval system examines each record or document in the database Computes score by number of query words that appear in the document System presents ordered list of documents, along with their scores
Implementing fuzzy sets (2) • FIRE, RITUAL, SACRIFICE Rank and title: • 100% The fire sacrifice ritual of early Vedic period India • 66% Fire and sacrifice in proto-Indo-European society • 33% How to build a fire the Girl Scout way
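The scoring described above (count how many of the query words appear in each document, then present the documents in ranked order) can be sketched as follows. Matching on titles only and truncating the percentage to an integer are assumptions chosen because they reproduce the 100%, 66%, and 33% shown on the slide.

```python
import re

query = ["fire", "ritual", "sacrifice"]
titles = [
    "How to build a fire the Girl Scout way",
    "Fire and sacrifice in proto-Indo-European society",
    "The fire sacrifice ritual of early Vedic period India",
]

def matched(title):
    """Count how many distinct query words appear in the title."""
    present = set(re.findall(r"[a-z]+", title.lower()))
    return sum(1 for q in query if q in present)

# Rank by number of matched query words, highest first.
ranked = sorted(titles, key=matched, reverse=True)
for t in ranked:
    # Integer percentage, truncated (2 of 3 words -> 66%).
    print(f"{100 * matched(t) // len(query)}% {t}")
```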
Implementing fuzzy sets (3) User enters list of words FIRE, RITUAL, SACRIFICE Retrieval system examines each document or record in the database, computing score for that item by adding 1 for each time any of the words on the user's list appears in the document or record System presents ordered list of documents, along with their scores
Implementing fuzzy sets (4) • FIRE, RITUAL, SACRIFICE Rank and document: • The fire sacrifice ritual of early Vedic period India. The fire sacrifice ritual is one of many sacrifice rituals observed as being performed… • Fire and sacrifice in proto-Indo-European society. Discussion of the role of fire… • How to build a fire the Girl Scout way. Demonstrates fire building…
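The second method adds 1 for every occurrence of any query word, so repeated words raise the score. This sketch applies it to the excerpts visible on the slide (the documents themselves are truncated, so the numbers are not the slide's actual scores) and assumes exact word matching with no stemming, which means "rituals" is not counted as "ritual".

```python
import re
from collections import Counter

query = {"fire", "ritual", "sacrifice"}

# Excerpts as shown on the slide, without the trailing ellipses.
documents = [
    "How to build a fire the Girl Scout way. Demonstrates fire building",
    "Fire and sacrifice in proto-Indo-European society. Discussion of the role of fire",
    "The fire sacrifice ritual of early Vedic period India. The fire sacrifice ritual "
    "is one of many sacrifice rituals observed as being performed",
]

def occurrence_score(text):
    """Add 1 for each occurrence of any query word (exact match, no stemming)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return sum(counts[q] for q in query)

ranked = sorted(documents, key=occurrence_score, reverse=True)
for doc in ranked:
    print(occurrence_score(doc), doc[:45])
```

On these excerpts the Vedic document scores 7, the proto-Indo-European one 3, and the Girl Scout one 2, giving the same order as the slide.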
Fuzzy sets in Voyager (1-5) • Agni, vedic [Screenshots of this search in Voyager; images not preserved]
Field-weighting terms • Terms weighted by the fields in which they occur • Title (weight 5): Winning-induced euphoria in tiddlywinks players • Descriptors (weight 5): Euphoria; Tiddlywinks • Abstract (weight 2): The authors studied the brain waves of 175 tiddlywinks winners and found euphoria induced by winning lasted an average of 3 hours. • Text (weight 1): Researchers have long held that tiddlywinks, unlike other sports, do not induce a significant affective…
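The slide gives the field weights but not the scoring formula. One plausible rule, assumed here, is to multiply each occurrence of a term by the weight of the field it occurs in and sum the results. The sketch applies that rule to the sample record (the full text is truncated on the slide, so only the visible part is used); `FIELD_WEIGHTS` and `field_weighted_score` are hypothetical names.

```python
import re

# Weights from the slide: title 5, descriptors 5, abstract 2, full text 1.
FIELD_WEIGHTS = {"title": 5, "descriptors": 5, "abstract": 2, "text": 1}

record = {
    "title": "Winning-induced euphoria in tiddlywinks players",
    "descriptors": "Euphoria; Tiddlywinks",
    "abstract": ("The authors studied the brain waves of 175 tiddlywinks winners and found "
                 "euphoria induced by winning lasted an average of 3 hours."),
    "text": ("Researchers have long held that tiddlywinks, unlike other sports, "
             "do not induce a significant affective"),
}

def field_weighted_score(record, term):
    """Sum over fields of (occurrences of term in the field) x (field weight); assumed rule."""
    term = term.lower()
    total = 0
    for field, content in record.items():
        words = re.findall(r"[a-z]+", content.lower())
        total += words.count(term) * FIELD_WEIGHTS[field]
    return total

print(field_weighted_score(record, "euphoria"))     # 12
print(field_weighted_score(record, "tiddlywinks"))  # 13
```

A hit in the title or descriptors counts five times as much as a hit in the body text, reflecting the assumption that those fields describe what the document is mainly about.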
User-weighting terms • Terms weighted by the user at the time of search [Figure legend: * = weighted term]
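The slide does not say how a user marks weights or how much a mark counts. This sketch assumes a hypothetical convention suggested by the legend: a trailing `*` doubles a term's weight, unmarked terms weigh 1, and a document's score is the sum of the weights of the query terms it contains. It reuses the titles from the Agni example.

```python
import re

def parse_query(query):
    """Hypothetical convention: a trailing * doubles a term's weight; other terms weigh 1."""
    weights = {}
    for raw in query.split(","):
        term = raw.strip().lower()
        if term.endswith("*"):
            weights[term.rstrip("*")] = 2
        else:
            weights[term] = 1
    return weights

def user_weighted_score(title, weights):
    """Sum the weights of the query terms that appear in the title."""
    present = set(re.findall(r"[a-z]+", title.lower()))
    return sum(w for term, w in weights.items() if term in present)

titles = [
    "Analysis of the Agni Vedic fire ritual",
    "Characteristics of Agni",
    "Structural analysis of a Vedic fire ritual",
    "Analysis of a fire ritual of India",
]
weights = parse_query("agni*, vedic")
ranked = sorted(titles, key=lambda t: user_weighted_score(t, weights), reverse=True)
for t in ranked:
    print(user_weighted_score(t, weights), t)
```

Weighting "agni" lifts "Characteristics of Agni" above the Vedic ritual study, which it would not outrank if both terms counted equally.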