Ubiquity needs authors: Understanding and Supporting Knowledge Contributors in Document / Metadata Servers
Bettina Berendt
Humboldt University Berlin, Institute of Information Systems
www.wiwi.hu-berlin.de/~berendt
Motivation: ubiquity – access and contribution
• “Ubiquity” is our vision of a Web for everyone, everywhere
• This involves not only that everybody, everywhere can access the Web, but also that they can (& hopefully do) contribute to it
• “Learning” is an important application domain: education and knowledge
  • should be available to everyone
  • require active participation (= contribution of one's knowledge) – cf. constructivism / constructionism *, communities of practice, ...
• To create ubiquity, we need recommendations and tools for the design for ubiquitous access & contribution
  • at the macro level (cf. Ricardo Baeza's talk at this workshop)
  • at the micro level of the individual author (this talk)
*) for a discussion, see http://learning.media.mit.edu/content/publications/EA.Piaget%20_%20Papert.pdf
This talk is ...
... an overview of a project concerning 2. & 3., with an outlook on addressing 1.
Main sources:
• Berendt, B. (2005). Understanding and Supporting Volunteer Contributors: The Case of Metadata and Document Servers. In Knowledge Collection from Volunteer Contributors. Papers from the AAAI 2005 Symposium (http://teach-computers.org/kcvc05.html)
• presentation & discussion at the Web Mining Workshop in Barcelona, 02/2005 – see also last slide
• discussion at the Challenges in Web Mining Workshop in Madrid, 06/2005
Knowledge contributions: data and metadata
<BIBLIOGRAPHY><FLOAT><PAGENUMBER>136</PAGENUMBER></FLOAT>
<HEAD>Literaturverzeichnis</HEAD>
...
<CITATION WORKTYPE="journal" PUBLISHED="PUBLISHED">
<CUT ID="bib-45-">[2] </CUT><WORKAUTHOR>Albrecht, T. F.; Bott, K.; Meier, T.; Schulze, A.; Koch, M.; Cundiff, S. T.; Feldmann, J.; Stolz, W.; Thomas, P.; Koch, S. W.; Göbel; E. O.</WORKAUTHOR>
<ARTICLETITLE>Disorder mediated biexcitonic beats in semiconductor quantum wells</ARTICLETITLE>,
<WORKTITLE>Phys. Rev. B</WORKTITLE>, <PUBDATE>1996</PUBDATE>, <NUMBER>54</NUMBER>, <PAGES>4436</PAGES>,
</CITATION>
...
http://edoc.hu-berlin.de/diml/dtd/xdiml.dtd
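To illustrate why such markup counts as a knowledge contribution, the following minimal sketch reads one CITATION element and turns it into a flat record that a search index could use. It is an assumption-laden illustration, not part of the EDOC tool chain: the function name `citation_to_record` and the record keys are hypothetical, and the author list is shortened for brevity.

```python
import xml.etree.ElementTree as ET

# One CITATION element modeled on the slide's example.
# The author list is shortened here for brevity (assumption, not the full data).
CITATION_XML = """<CITATION WORKTYPE="journal" PUBLISHED="PUBLISHED">
<CUT ID="bib-45-">[2] </CUT><WORKAUTHOR>Albrecht, T. F.; Bott, K.; Meier, T.</WORKAUTHOR>
<ARTICLETITLE>Disorder mediated biexcitonic beats in semiconductor quantum wells</ARTICLETITLE>,
<WORKTITLE>Phys. Rev. B</WORKTITLE>, <PUBDATE>1996</PUBDATE>,
<NUMBER>54</NUMBER>, <PAGES>4436</PAGES>, </CITATION>"""


def citation_to_record(xml_text):
    """Turn one CITATION element into a flat dict of metadata fields.

    Hypothetical helper: keys are the lower-cased tag names.
    """
    citation = ET.fromstring(xml_text)
    record = {"worktype": citation.get("WORKTYPE")}
    for child in citation:
        value = (child.text or "").strip()
        if child.tag == "CUT":
            # CUT holds the printed reference label, e.g. "[2]".
            record["label"] = value
        else:
            record[child.tag.lower()] = value
    return record


record = citation_to_record(CITATION_XML)
print(record["worktitle"], record["pubdate"], record["pages"])
```

Once the fields are separated like this, queries such as "all journal citations from 1996" become simple filters, which is what plain formatted text cannot offer.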
Once these data and metadata are there, ...
• ... powerful searches are supported in distributed archives of electronic theses and dissertations (ETDs):
  • generally using OAI metadata harvesting
• Examples:
  • www.ndltd.org: currently 154 members / repositories (including EDOC)
  • http://www.cybertesis.net: currently 27 members / repositories
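As a rough sketch of what OAI metadata harvesting looks like on the wire, the snippet below builds a `ListRecords` request in the style of OAI-PMH, the Open Archives Initiative protocol for metadata harvesting. The base URL is a placeholder and the helper name `list_records_url` is hypothetical; the slides do not say which endpoints or metadata formats the listed archives use, so `oai_dc` (the Dublin Core format the protocol requires every repository to support) is only an assumed default.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument: it is sent
        # alone (with the verb) to fetch the next chunk of a long result list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Placeholder endpoint, not a real repository address.
url = list_records_url("https://repository.example.org/oai")
print(url)
```

A harvester would fetch this URL, parse the returned XML records, and repeat with the resumption token until the repository reports no further chunks.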
Background & Questions
• authors who publish on EDOC are volunteer contributors
• main document types: PhD theses, “Habilitation” theses
• advantages:
  • cost-free publication + high-quality archiving
  • long-term readability guarantee (50 years)
  • authenticity & integrity
  • semantic searchability
• but: the share of PhD theses published on EDOC has remained at ~20% (13% if the medical faculty is included)
Why do so many authors not contribute?
How can authors / contributors be supported better?
What kinds of knowledge can (not) be gathered from volunteer contributors?
• Large-scale questionnaire study of authors' attitudes, experiences, and wishes
• Samples:
  • All non-medical faculties (study 1, 2003), medical faculty (study 2, 2004)
  • Questionnaire sent to the whole target group: 1180 and 1290 people, respectively, working on their PhD or postdoctoral (usually Habilitation) qualification
  • Response rates: 13.7% and 12%, respectively
  • Respondent characteristics mirror the target population (gender, disciplines, qualification sought)
Knowns and unknowns
EDOC authors (contributors):
• Publication on EDOC is a fast & easy way to satisfy the German university publication requirements.
• I have learned about EDOC (too) late.
• The formatting requirements are difficult.
EDOC non-/not-yet-authors: don't publish online because they
• don't wish to?
• do not feel capable / perceive barriers?
• are unaware of the possibility?
Method
Date: Tue, 11 Mar 2003
From: Yunfan Li
To: the edoc survey mailing list
Subject: Digital Dissertation Questionnaire for HU Doctoral Students and Doctors

Dear doctoral student, dear doctor,

Would you please take about 5 minutes to complete the HU Digital Dissertation Questionnaire. The goal of this investigation is to find out how the Digital publishing opportunity is known and used by HU doctoral students and doctors. With your help, we aim to continue to improve the service of the Document and Publication Server (http://edoc.hu-berlin.de). ...
First study (similar results in second study)
Do authors not wish to publish online?
NO, BUT ...
Do authors perceive barriers to contribution? If so, which one(s)?
• Authors need to format their dissertation using the “dissertation template” to enable DiML markup.
• Most authors feel adequately supported.
• ~20% can't use and/or don't need the training course (cf. ~20% LaTeX users).
• But the use of the dissertation template was the most often described barrier.
• In the 2nd study, we asked about template use in education:
  • 76%: I have never used a template.
  • 94%: Templates have never been discussed in courses I took.
METADATA PROVISION
First study (similar results in second study)
AND: Few concrete steps taken towards a digital dissertation publication so far
ACTIONS HAPPEN TOO LATE!
First study (similar results in second study)
Are authors unaware of the digital dissertation publishing opportunity?
• 28.3% wished they had learned earlier about EDOC.
YES! ... INFORMATION ARRIVES TOO LATE!
Summary: A marketing and service challenge
• Lack of information is a major problem that impedes contribution.
• Metadata contribution is perceived as a problem.
• Main recommendation: Inform doctoral students when they start their dissertation project
  • Professors, examination offices, ... as informers & promoters
  • educational challenge: practice structured writing!
  • Doctoral seminars etc., within or across scientific fields
  • Stress the advantages and new opportunities of digital publication!
  • information and motivation (experience of competence)
• 2nd recommendation: offer more service for formatting
Can useful knowledge collection be deployed as a side effect of users' activities, e.g. searching or using the Web?
= (1 option) Does the site turn readers into contributors?
• Data from Web server log
  • sample: 10,992 sessions (210,655 requests) from one week in 2003 (near the end of the first survey)
• Methods: semantic enrichment, association rule and sequence mining (tools: WEKA, WUM); clustering and classification (not shown in poster)
Data preparation: Semantic enrichment
[Figure: concept hierarchy rooted at TOP. Concepts shown include HOME, AUTHOR, SEARCH, DOC, and OTHER, with more specific concepts such as HINWEISE, OAI, FULLTEXT, META, RESULT, DISS, PROJECT, ADVICE, LIST, MASTER, DNB, ABSTRACT, TEMPLATE, ACCESS, README, KEYWORD, CONFERENCE, FAQ, LATEX, PUBLIC READ, DIML, STUDY, and CMS.]
regexpr.txt: mapping from URLs to concepts
HOME edoc\.hu-berlin\.de\/$
AUTHOR-START \/e_autoren_en\/$
DISS-ABSTRACT \/abstract\.php3\/habilitationen\/
AUTHOR-ADVICE \/e_autoren\/hinweise\.php\?nav=.*
AUTHOR-ADVICE \/e_rzm\/hinweise\.php.*
...
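The regexpr.txt excerpt suggests a simple lookup from requested URLs to concepts. The sketch below shows one way such a mapping could be applied to log entries. The five patterns are copied from the slide, but the first-match-wins order, the `UNMAPPED` fallback, and the function name `url_to_concept` are assumptions made for illustration, not details of the project's actual preprocessing.

```python
import re

# (concept, pattern) pairs copied from the regexpr.txt excerpt on the slide.
# The file is elided there ("..."), so this list is incomplete by design.
CONCEPT_PATTERNS = [
    ("HOME", r"edoc\.hu-berlin\.de\/$"),
    ("AUTHOR-START", r"\/e_autoren_en\/$"),
    ("DISS-ABSTRACT", r"\/abstract\.php3\/habilitationen\/"),
    ("AUTHOR-ADVICE", r"\/e_autoren\/hinweise\.php\?nav=.*"),
    ("AUTHOR-ADVICE", r"\/e_rzm\/hinweise\.php.*"),
]
COMPILED = [(concept, re.compile(pattern)) for concept, pattern in CONCEPT_PATTERNS]


def url_to_concept(url, default="UNMAPPED"):
    """Return the concept of the first pattern found in the URL.

    Assumptions: patterns are tried in file order and the first hit wins;
    URLs that match nothing get the hypothetical label "UNMAPPED".
    """
    for concept, pattern in COMPILED:
        if pattern.search(url):
            return concept
    return default


session = [
    "http://edoc.hu-berlin.de/",
    "http://edoc.hu-berlin.de/e_autoren_en/",
    "http://edoc.hu-berlin.de/e_autoren/hinweise.php?nav=3",
]
print([url_to_concept(url) for url in session])
```

Replacing each URL in a session by its concept yields the concept sequences on which the association rule and sequence mining steps operate.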
Content co-occurrence
• 6. DOC-DISS-ABSTRACT=1 4169 ==> DOC-DISS-ACCESS=1 3107 conf:(0.75) supp:(0.28)
  Reading an abstract makes it likely that you also access the dissertation.
  lift(DISS-ABSTRACT → DISS-ACCESS) = 0.75 / (4866/10992) = 1.69
• 2. DOC-DISS-ABSTRACT=1 externalReferrer=search-engine 3227 ==> DOC-DISS-ACCESS=1 2613 conf:(0.81)
  Even stronger than rule 6 above (usually: people arriving from Google).
• 11. externalReferrer=search-engine 7166 ==> DOC-DISS-ACCESS=1 3851 conf:(0.54)
  Still some lift, but not as much.
• 8. DOC-DISS-ACCESS=1 4866 ==> DOC-DISS-ABSTRACT=1 3107 conf:(0.64)
  Over 30% of people who access a dissertation have not read an abstract!
• 16. externalReferrer=other-external-referrer 1343 ==> OTHER-PROJECT=1 571 conf:(0.43)
  Lift = 0.43 / (2198/10992) = 2.15
• 17. externalReferrer=other-external-referrer 1343 ==> OTHER-OTHER=1 554 conf:(0.41)
  Lift = 0.41 / (2311/10992) = 1.95
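The rule statistics can be recomputed from the session counts shown in the rules. The sketch below does this for rule 6 using the standard definitions of support, confidence, and lift; the function name `rule_metrics` is hypothetical. Note that the slide's 1.69 divides the rounded confidence 0.75 by the consequent's share of sessions, whereas the unrounded counts give about 1.68.

```python
def rule_metrics(n_antecedent, n_both, n_consequent, n_sessions):
    """Support, confidence, and lift of a rule A ==> B from session counts.

    n_antecedent: sessions containing A
    n_both:       sessions containing A and B
    n_consequent: sessions containing B
    n_sessions:   all sessions in the sample
    """
    support = n_both / n_sessions
    confidence = n_both / n_antecedent
    lift = confidence / (n_consequent / n_sessions)
    return support, confidence, lift


N_SESSIONS = 10992  # sample size stated for the Web server log

# Rule 6: DOC-DISS-ABSTRACT=1 (4169) ==> DOC-DISS-ACCESS=1 (3107 joint).
# The consequent count 4866 is taken from rule 8.
support, confidence, lift = rule_metrics(4169, 3107, 4866, N_SESSIONS)
print(f"supp={support:.2f} conf={confidence:.2f} lift={lift:.2f}")
# prints: supp=0.28 conf=0.75 lift=1.68
```

A lift above 1 means the two concepts co-occur in sessions more often than they would if abstract views and full-text accesses were independent.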
Popular entry points and first steps
“Readers” go straight to dissertations and stay there.
Paths taken to the dissertation template
“Authors” stay in author content.
Navigation around “other” content
Pattern 1: Of the visitors starting at a Public reading paper, 50% will stay within this subject area. The same observation can be made regarding Other-Other and Other-Project.
[Patterns 2 and 3: shown as figures only]
When people use the internal search engine, they then …
The first two patterns indicate that users do not find their target or a clue leading them towards it. They do not continue with the search function but go back to the home page and then return to a search page. Only the last pattern indicates a successful search.
Summary: A broken cycle of knowledge?
• Readers and contributors are distinct groups; readers are not led to accessing contributor content (at least not in a single session).
• Only a few people use the internal search engine, and they do not experience structured search as an effective or efficient search option.
  • Supporting evidence from a separate questionnaire study
• The use of external search engines makes access to dissertation full-texts more likely.
What programs and interfaces can support authors and turn them into contributors?
... and how can further data be gained for understanding and supporting the authoring process?
• An Intelligent Authoring Tool for creating semantics
• Prototype: focus on bibliography markup
  • core & most error-prone part of template use in EDOC
  • bibliographic errors haunt science: Cardona & Marx, 2004
• MS Word macro + information extraction (TTT; Grover, Matheson, Mikheev, & Moens, 2000), distributed system
Motivation: An error resulting from the use of MS Word + template (example)
<BIBLIOGRAPHY><FLOAT><PAGENUMBER>136</PAGENUMBER></FLOAT>
<HEAD>Literaturverzeichnis</HEAD>
<CITATION WORKTYPE="journal" PUBLISHED="PUBLISHED">
<CUT ID="bib-15-">[1] </CUT><WORKAUTHOR>Agarwal, R.; Krueger, B. P.; Scholes, G. D.; Yang, M.; Yom, J.; Mets, L.; Fleming, G. R.</WORKAUTHOR>U<ARTICLETITLE>ltrafast energy transfer in LHC-II revealed by three-pulse photon echo peak shift measurements</ARTICLETITLE>,
<WORKTITLE>J. Phys. Chem. B</WORKTITLE>, <PUBDATE>2000</PUBDATE>, <NUMBER>104</NUMBER>, <PAGES>2908</PAGES>,
</CITATION>
...
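In the example, the "U" of "Ultrafast" has slipped outside the ARTICLETITLE element. Errors of this kind can be spotted mechanically, as the following sketch shows. It is a simple heuristic written for illustration, not the information extraction approach of the authoring tool: it flags any letters or digits that sit between two fields of a citation, where only punctuation and whitespace are expected. The function name `stray_text` is hypothetical and the citation is shortened.

```python
import re
import xml.etree.ElementTree as ET

# Shortened version of the slide's faulty citation: the "U" of "Ultrafast"
# sits between </WORKAUTHOR> and <ARTICLETITLE> instead of inside the title.
BROKEN = (
    '<CITATION WORKTYPE="journal" PUBLISHED="PUBLISHED">'
    "<WORKAUTHOR>Agarwal, R.; Fleming, G. R.</WORKAUTHOR>U"
    "<ARTICLETITLE>ltrafast energy transfer in LHC-II</ARTICLETITLE>, "
    "<WORKTITLE>J. Phys. Chem. B</WORKTITLE>, <PUBDATE>2000</PUBDATE>, "
    "</CITATION>"
)


def stray_text(citation_xml):
    """List (tag, text) pairs where letters or digits follow a closing tag.

    Heuristic assumption: between the fields of a citation only punctuation
    and whitespace are legitimate, so any word character is suspicious.
    """
    problems = []
    for child in ET.fromstring(citation_xml):
        tail = child.tail or ""  # text between this element and the next
        if re.search(r"\w", tail):
            problems.append((child.tag, tail.strip()))
    return problems


print(stray_text(BROKEN))  # [('WORKAUTHOR', 'U')]
```

A check like this only detects the symptom; deciding that the stray "U" belongs to the following title still requires the kind of content analysis the tool delegates to information extraction.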
System architecture & interface
• Client: MS Word with VBA macro
• Server: information extraction
• Data exchange:
  1. Text file (SFTP)
  2. SSH command
  3. XML file (SFTP)
• Output: corrected, XML-annotated, and formatted
Summary, conclusions, & outlook: (1) concerning contribution
• To encourage volunteer contributions: consider computational & motivational & institutional aspects!
• Extend Intelligent Authoring Tool:
  • extend bibliography functionality (citation styles, ...)
  • as Web service
  • integration of further text and link mining
  • laboratory user studies for first evaluation of tool
  • usage mining for continuous evaluation of tool
• We believe that our setting, incentive structures, and thus also our findings transfer to other KCVC (knowledge collection from volunteer contributors) efforts.
Summary, conclusions, & outlook: (2) concerning ubiquity
• In a series of studies of worldwide search in an information site, we have found influences of users'
  • language
  • culture
  • domain expertise
  on access to Web knowledge and derived recommendations for site design.
• These variables describe aspects of the ubiquity of persons who use knowledge sources.
Plan: extend this to also investigate the impact of these variables on contribution.
Main sources: Kralisch & Berendt (2004a,b); Kralisch, Eisend, & Berendt (2005) – see www.wiwi.hu-berlin.de/~berendt; overview in Barcelona presentation: www.wiwi.hu-berlin.de/~berendt/Talks/berendt_2005_02_28_webversion.pdf