New Frontiers in Auto-translation: The HAH* Solution
An ISyracuseHigh Joint Initiative
Helen Szigeti, ISI
Abby Goodrum, Syracuse University
Helen Atkins, Highwire Press
* HAH: Helen, Abby, and another Helen
Issue: Citedness • Why aren’t JASIST authors more highly cited than they are?
Problem: Incomprehensibility • No one can understand articles in JASIST • Hence, no one cites JASIST • JASIST authors do not receive large amounts of grant money, lucrative speaking engagements, a smooth path to tenure, or invitations for guest appearances on Oprah
Evidence of the problem • 1999 JASIS article by HB Babs, HMS Trix, and A Bala: “The synthesis of specialty narratives from co-citation clusters. Part 1: Utilization of a real-time self organizing approach to term co-occurrence and word frequency analysis through collaborative filtering of multidimensional databases.”
Hypothesis: Comprehension is time-consuming • By the time a reader reaches the end of a JASIST article with a full understanding of the ideas and issues presented, s/he has forgotten why s/he was reading the article in the first place
Goal: Reduce the time needed to understand a JASIST article
Solution: HAH Trans-JASIST Device℠ • Automatically parses out pseudo-scholarly info-babble, leaving only root concepts, stop words, and thinly veiled polysyllabic expletives.* • “Corporate” Version (2.0; in beta) can also reverse-translate a simple executive memorandum into a quality scholarly paper suitable for publication in any information science journal.
* Note: ISyracuseHigh is currently working on a related parser that will capitalize on these expletives to generate a new method of relevance ranking.
Elements of the Solution: Part 1 • HAH Redundancy Reducer (HAR-HAR) - Addresses the occupational tendency of information scientists to utilize the same data set to publish multiple papers - The HAR-HAR takes a work, or a corpus of work, by a single author and reduces it to a single paragraph (or, in some cases, a single phrase)
Elements of the Solution: Part 2 • HAH Seuss-O-Mapper (HAH-SOMMore) - Our research uncovered a fundamental linguistic key* that underlies all scholarly communication/publication patterns worldwide - The HAH-SOMMore applies concept-mapping algorithms to the output of the HAR-HAR redundancy reducer to generate a comprehensible, natural-language alternative to the original text.
* From the seminal work by Dr. Seuss entitled One Fish, Two Fish, Red Fish, Blue Fish.
Demonstrations of the System • Academic paper to natural language • Corporate memo to academic paper
Academic paper to natural language • “The synthesis of specialty narratives from co-citation clusters. Part 1: Utilization of a real-time self organizing approach to term co-occurrence and word frequency analysis through collaborative filtering of multidimensional databases.” (Babs, Trix, and Bala) • After reduction: synthesis self-ego to group visual word and free ISI science data through from grant of no-tenure wine damn damn damn • After mapping to natural language...
Academic paper to natural language • “A pretty picture we drew by putting ISI data (which we got for free) into visualization software to show that medicine can be considered a sub-category of life sciences (who’da thunk?): We would have done more but we blew our grant money on Merlot and DVDs.”
Corporate memo to academic paper • “Subject: Unauthorized use of telephone, fax, and email for personal reasons.” • After reverse translation: “Policy analysis for topical consensus on the roles, rights, and responsibilities of individuals toward digital materials and communication protocols within the corporate learning organization: Optimization of transactional analysis to benchmark performance measures in a networked environment.”
Results • Although our translation engine has a 93% success rate, it does not solve the problem initially identified by the research team • Original hypothesis: If readers could understand JASIST articles within a shorter time period, then citations to these articles would increase • Actual outcome: Once fully comprehended in a reasonable time frame, JASIST articles are even less frequently cited, because no worthwhile data, methodologies, or conclusions are discernible
The HAH Axiom: Comprehension works against citedness.
[Figure: chart of comprehension (%) versus citedness (#)]
Conclusion • Do not try to be clear -- just keep doing what you’re doing.
Thank you! ISyracuseHigh contact information:
Helen Szigeti, helen.szigeti@isinet.com
Abby Goodrum, aagoodru@syracuse.edu
Helen Atkins, something@highwire.org?