information retrieval mon feb 08 2016 data… & information organization
SPSS Workshop in Odum…
• Monday, February 29
• 2:00 – 3:30 pm
• Davis Library, Room 219 (same lab room)
• introduction to SPSS; teaches how to work with data saved in SPSS format
• no registration required
Anyone need an “SPSS Cheat Sheet”?
info organization activity
• in a small group, examine the cards that identify various “documents” in a collection
• on the table organize the document surrogates into some sort of schema – grouping by category (like items with like)
• choose your own organization scheme and hierarchy
• if desired, write on the blank cards to create new or uber categories
• be ready to share your organization method with the class
STRUCTURED vs unstructured data
easy to envision structured data in terms of “tables”
Employee   Manager   Salary
Smith      Jones     68000
Chang      Smith     65000
Ivy        Smith     50000
Typically allows numerical range and exact match (for text) queries, e.g., Salary < 60000 AND Manager = Smith.
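The example query on this slide can be tried out directly. The following is a minimal sketch in Python, assuming the table is held as a list of dictionaries; the `query` helper and the lowercase field names are illustrative choices, not part of the original slides.

```python
# Rows mirror the Employee / Manager / Salary table on the slide.
employees = [
    {"employee": "Smith", "manager": "Jones", "salary": 68000},
    {"employee": "Chang", "manager": "Smith", "salary": 65000},
    {"employee": "Ivy", "manager": "Smith", "salary": 50000},
]


def query(rows, max_salary, manager):
    """Return employee names where salary < max_salary AND manager matches exactly.

    `query` is a hypothetical helper: a numeric range test combined with
    an exact text match, as in the slide's example.
    """
    return [
        row["employee"]
        for row in rows
        if row["salary"] < max_salary and row["manager"] == manager
    ]


# Salary < 60000 AND Manager = Smith
matches = query(employees, 60000, "Smith")
print(matches)
```

Because every record has the same named fields, both conditions can be checked field by field; this is what makes range and exact-match queries straightforward on structured data.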
tables in an MS Access relational database – defines each entity in a social networking site
Data entry form in an MS Access relational database – create each record
structured vs UNSTRUCTURED data
• typically refers to free text
• email is a good example of unstructured data. it's indexed by date, time, sender, recipient, and subject, but the body of an email remains unstructured
• other examples of unstructured data include books, documents, medical records, and social media posts
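The email example can be made concrete with a short sketch using Python's standard `email` module. The message text and addresses below are invented for illustration; only the split between indexed header fields and a free-text body comes from the slide.

```python
from email import message_from_string

# A made-up message: the headers are structured fields, the body is free text.
raw = """From: alice@example.com
To: bob@example.com
Subject: Workshop room
Date: Mon, 08 Feb 2016 09:00:00 -0500

Hi Bob, the workshop moved to the same lab room as last time.
"""

msg = message_from_string(raw)

# Structured part: named fields that can be indexed and matched exactly.
structured = {name: msg[name] for name in ("From", "To", "Subject", "Date")}

# Unstructured part: the body is a single block of free text.
body = msg.get_payload()

print(structured)
print(body)
```

The header fields support the same kind of exact-match queries as a table, while the body would need text search to be retrievable.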
[Diagram: retrieval model. Labels: Document collection (corpus); Query; Representation function (×2); Index; Matching function; CATEGORIES; SUBJECT HEADINGS; Results]
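The components labelled on this slide (a representation function applied to both documents and query, an index, and a matching function) can be sketched in a few lines of Python. This is one simple reading of the diagram, assuming a bag-of-words representation and an "all query terms must appear" matching rule; the function names and the three-document corpus are hypothetical.

```python
from collections import defaultdict


def represent(text):
    """Representation function: lowercase the text and split it into a set of terms."""
    return set(text.lower().split())


def build_index(corpus):
    """Index: map each term to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in represent(text):
            index[term].add(doc_id)
    return index


def match(index, query):
    """Matching function: return ids of documents containing every query term."""
    terms = represent(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# A tiny made-up document collection (corpus).
corpus = {
    1: "structured data in tables",
    2: "unstructured data in email",
    3: "metadata describes data",
}

index = build_index(corpus)
results = sorted(match(index, "data in"))
print(results)
```

Real systems use richer representations (stemming, weighting, assigned subject headings), but the flow is the same: represent, index, match, return results.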
KWIC: Key Word In Context
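A KWIC display shows each occurrence of a keyword together with a few words on either side. The sketch below is a minimal illustration, assuming whitespace tokenization and a fixed window of neighbouring words; the `kwic` function, its `width` parameter, and the bracket notation are illustrative choices rather than a standard format.

```python
def kwic(text, keyword, width=2):
    """Return one line per occurrence of keyword, with `width` words of context per side."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # Compare case-insensitively, ignoring trailing punctuation on the word.
        if word.lower().strip(".,;:!?") == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines


text = "We organize to enable retrieval. Retrieval effort depends on how we organize."
lines = kwic(text, "retrieval")
for line in lines:
    print(line)
```

Each output line keeps the keyword in brackets with its surrounding words, so a reader can judge the sense in which the term is used without opening the full document.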
What is Metadata?
• Classic definition: data about data
• Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. (NISO)
• 3 primary “types”:
  • Descriptive
  • Structural
  • Administrative (rights management, preservation)
How do we organize a collection of “documents” so that users can find what they need?
From Glushko reading…
• what three types/forms of categorization does Glushko discuss in the Categorization in the Wild piece?
• give a real-world example of a categorization system and briefly describe the purpose behind it (i.e., what problem is it trying to address?)
From Glushko reading…
• Cultural categorization
  • Embodied in culture and language
  • Acquired implicitly through development via parent-child interactions, language, and experience
  • Formal education can build on this, but non-formal cultural system can often dominate
  • Traditional perspective for thinking and research about categorization
From Glushko reading…
• Individual categorization
  • A system developed by an individual for organizing a personal domain to aid memory, retrieval, or usage
  • Can serve social goals to convey information, develop a community, manage reputation
  • Have exploded with the advent of social computing, especially in applications based on “tagging”
  • An individual’s system of tags in web applications is sometimes called a “folksonomy”
From Glushko reading…
• Institutional categorization
  • Systems created to serve institutional goals and facilitate sharing of information and increase interoperability
  • Helps to streamline interactions and transactions so that consistency, fairness and higher yields can result.
Let’s look at a database of magazine & journal articles…to see how information is organized – with particular attention to value-added SUBJECT TERMS/HEADINGS (categorization)
…Academic Search Premier
>> UNC Libraries Homepage: http://www.lib.unc.edu/ >> E-Research by Discipline >> Frequently Used >> Academic Search Premier [off-campus log in with onyen/password]
Handout Activity #2
info organization & search
• We organize to enable retrieval
• The more effort put into organizing information, the more effectively it can be retrieved
• The more effort we put into retrieving information, the less it needs to be organized first
• We need to think in terms of investment, allocation of costs and benefits between the organizer and retriever
• The allocation differs according to the relationship between them; who does the work and who gets the benefit?
final notes…
• Homework #2: Database report
• sign up for a database – or talk with me about a suggestion
• next Wednesday – 5-min reports in class
• Wednesday: “Information Retrieval” intro with Dr. Jaime Arguello (required reading prep)
• Wednesday: Data to Story Project – speed date/pitch