Who Uses the Online Tobacco Industry Documents?
Martha Michel (1,2), M.S., Ph.D.
Lisa Bero (1,2), Ph.D.
(1) Graduate Group in Biological and Medical Informatics, UCSF
(2) Center for Tobacco Control Research and Education, UCSF
What are the Tobacco Industry Documents?
• As a result of the Master Settlement Agreement, millions of internal tobacco industry documents were released onto the Internet (legacy.library.ucsf.edu)
• The documents include memos, scientific reports, faxes, emails, budgets, etc.
• They contain information about scientific research, manufacturing, marketing, advertising, and sales of cigarettes, and more
Document Collections
• Legacy document depository at UCSF
  • 5 million documents
  • About 32 million pages and growing
  • 1.5 terabytes
• Guildford document depository
  • 8 million British American Tobacco documents
  • About 32-40 million pages
  • UCSF has 13,000 documents that have been manually indexed
• Industry websites: PM, Lorillard, B&W, RJR
• Other collections: Tobacco Documents Online, CDC tobacco industry documents
Difficulties of searching the documents
• No OCR available for searching the full text
• Variations in spelling and problems when names suddenly change
• Duplicates
• Vast quantities of information
• No or varied indexing
• Unknown recall and low precision
Malone RE, Balbach ED. Tobacco industry documents: treasure trove or quagmire? Tobacco Control 2000;9(3):334-8.
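The "unknown recall and low precision" difficulty refers to two standard retrieval measures. Precision is the share of retrieved documents that are relevant, and recall is the share of all relevant documents that the search retrieved. The sketch below is only an illustration, not something from the study: the function name and document IDs are hypothetical. For the real collections the full set of relevant documents is not known, which is why recall cannot be computed.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: IDs the search returned
    relevant:  IDs that are actually relevant (assumed known here,
               which is NOT the case for the real document collections)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were found

    # Guard against empty sets to avoid division by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 5 relevant overall, 2 in common.
retrieved_ids = ["doc1", "doc2", "doc3", "doc4"]
relevant_ids = ["doc2", "doc4", "doc7", "doc8", "doc9"]

precision, recall = precision_recall(retrieved_ids, relevant_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case half of the results are relevant, while most of the relevant documents are missed. Duplicate copies in the result list would not change either value here, because the IDs are compared as sets.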
Prior Studies of Who Searches
• Different types of groups used the paper document depositories (e.g., lawyers, government officials, researchers, tobacco control advocates, and people in health-related fields).
• We still don’t know who uses the electronic documents or why they search.
• We are currently conducting an online survey on the UCSF Legacy website to examine how the existing websites are used and what barriers users face when searching them.
Aim 1: Conduct Online Survey
• Purpose of survey:
  • Who uses the documents (demographics)
  • Purposes for which documents are used
  • Barriers to searching the documents
  • Suggestions for improving the archives
Methods
• Developed and designed survey using Web Surveyor
• Conducted pilot test of survey (N=14)
• Launched surveys in November 2002
• 2 surveys: one on TCA, one on Legacy
  • Tobacco Control Archives (n=50)
  • Legacy Tobacco Control Documents Collection (n=22)
Text under “other”
• I would like more structure on how to work the music sight. (4)
• some stuff
• its okay
• telling about schools
• direct assistance
• links to student lead orgainization programs against tobacco
• more categorization
• I found your site useful. I had to fill out a worksheet from Health class and move along wite the website, but there a few things I could not find. Maybe it was the worksheet not the site, but overall your site helped me out. Thank you
• love it
• Why do I have do this survey its slowing me down?
• This site is cool I think
• The site is wonderful and very usefull. I want to congratulate the authors for the wonderful job.
• I´d like more if smuggling was in better view.
Text from Other response
• full text; manipulation of saved sets: more bookbag features
• ability to search within retrieved set
• more 'popular documents' type of links.
• OCR
• It would be great to have a quick way to search only for ads like Philip Morris offers from its advertising archives search engine
• don't know
• first visit, I don't know yet
• fix the bookbag problem
• quick search box right at top of home page bypassing other pages. Display list of other bates in a set from which the one comes.
• Nested searching; Slightly larger font
• Full Boolean search capacity; more than six search term limit; feedback on when user errors are syntactical (as PM gives); not having to toggle back and forth between long and short displays; master ID numbers in the display; OCR capacity--not only would it be fantastic to be able to search the text of the document, it would be invaluable to be able to cut and paste text from the documents into a word processor.
• A better search engine....you seem to have more documents than Tobacco Documents Online, but when I use the same search terms your search engine tells me it doesn't find any....when TDO finds over 100!
Text from Other response
• print them out; download to pc.
• If there are useful documents in a search, I print out the list, then download and print out the useful documents, numbering them with the number on the list. I file them chronologically by theme.
• by date; by topic and correspondence
• First by theme (subject) and then by organization/corporation, and/or date.
• first visit, I don't know yet
• I e-mail them to my eudora account and search it when I want a citation.
• I look for a doc at your site, then go to pmdocs.com or the like, type the bates, and pull the description. Then I type the first bates from master file, and this way I get the set of documents with context. Sometime, the same document is in a few different sets! Then I get back to you to download it, or to cross reference with other collections (say, TI).
• I'm not sure I understand the difference between this question and the previous one. I use Endnote, if that's the question.
• I wish I had a consistent way. Can you conduct a seminar showing suggestions?
• Prefer to collect paper documents and arrange them in files that mirror the files they originally came from and/or dates and or events within a date range
Aim 2: Add the British American Tobacco documents to the Flamenco interface
The Tobacco Flamenco
• 13,000 British American Tobacco documents have been “Flamencoized”
• More documents will be indexed as they arrive from Guildford, England, and from the industry websites