
New Frontiers in Auto-translation: The HAH* Solution

New Frontiers in Auto-translation: The HAH* Solution. An ISyracuseHigh Joint Initiative. Helen Szigeti, ISI; Abby Goodrum, Syracuse University; Helen Atkins, Highwire Press. * HAH: Helen, Abby, and another Helen. Issue: Citedness. Why aren’t JASIST authors more highly cited than they are?


Presentation Transcript


  1. New Frontiers in Auto-translation: The HAH* Solution. An ISyracuseHigh Joint Initiative. Helen Szigeti, ISI; Abby Goodrum, Syracuse University; Helen Atkins, Highwire Press. * HAH: Helen, Abby, and another Helen

  2. Issue: Citedness • Why aren’t JASIST authors more highly cited than they are?

  3. Problem: Incomprehensibility • No one can understand articles in JASIST • Hence, no one cites JASIST • JASIST authors do not receive large amounts of grant money, lucrative speaking engagements, a smooth path to tenure, or invitations for guest appearances on Oprah

  4. Evidence of the problem • 1999 JASIS article by HB Babs, HMS Trix, and A Bala: “The synthesis of specialty narratives from co-citation clusters. Part 1: Utilization of a real-time self organizing approach to term co-occurrence and word frequency analysis through collaborative filtering of multidimensional databases.”

  5. Hypothesis: Comprehension is time-consuming • By the time a reader reaches the end of a JASIST article with a full understanding of the ideas and issues presented, s/he has forgotten why s/he was reading the article in the first place • Goal: Reduce the time needed to understand a JASIST article

  6. Solution: HAH Trans-JASIST Device℠ • Automatically parses out pseudo-scholarly info-babble, leaving only root concepts, stop words, and thinly veiled polysyllabic expletives.* • “Corporate” Version (2.0; in beta) can also reverse-translate from a simple executive memorandum to a quality scholarly paper suitable for publication in any information science journal. * Note: ISyracuseHigh is currently working on a related parser that will be capable of capitalizing on these expletives as a means of generating a new method of relevance ranking

  7. Elements of the Solution: part 1 • HAH Redundancy Reducer (HAR-HAR) - Occupational tendency for information scientists to utilize the same data set to publish multiple papers - The HAR-HAR takes a work or a corpus of work by a single author and reduces it to a single paragraph (or in some cases, a single phrase)

  8. Elements of the Solution: part 2 • HAH Suess-O-Mapper (HAH-SOMMore) - Our research uncovered a fundamental linguistic key* that underlies all scholarly communication/publication patterns worldwide - The HAH-SOMMore uses concept mapping algorithms against the output from the HAR-HAR redundancy reducer to generate a comprehensible, natural language alternative to the original text. * From the seminal work by Dr. Seuss entitled One Fish, Two Fish, Red Fish, Blue Fish.

  9. Demonstrations of the System • Academic paper to natural language • Corporate memo to academic paper

  10. Academic paper to natural language • “The synthesis of specialty narratives from co-citation clusters. Part 1: Utilization of a real-time self organizing approach to term co-occurrence and word frequency analysis through collaborative filtering of multi-dimensional databases.” (Babs, Trix, and Bala) • After reduction: synthesis self-ego to group visual word and free ISI science data through from grant of no-tenure wine damn damn damn • After mapping to natural language...

  11. Academic paper to natural language • “A pretty picture we drew by putting ISI data (which we got for free) into visualization software to show that medicine can be considered a sub-category of life sciences (who’da thunk?): We would have done more but we blew our grant money on Merlot and DVDs.”

  12. Corporate memo to academic paper • “Subject: Unauthorized use of telephone, fax, and email for personal reasons.” • After reverse translation: “Policy analysis for topical consensus on the roles, rights, and responsibilities of individuals toward digital materials and communication protocols within the corporate learning organization: Optimization of transactional analysis to benchmark performance measures in a networked environment.”

  13. Results • Although our translation engine has a 93% success rate, it does not solve the problem initially identified by the research team • Original hypothesis: If readers could understand JASIST articles within a shorter time period, then citations to these articles would increase • Actual outcome: Once fully comprehended in a reasonable time frame, JASIST articles are even less frequently cited because no worthwhile data, methodologies, or conclusions are discernible

  14. The HAH Axiom: Comprehension works against citedness. [Chart; axis labels: Comprehension (%), Citedness (#)]

  15. Conclusion • Do not try to be clear -- just keep doing what you’re doing.

  16. Thank you! ISyracuseHigh contact information: Helen Szigeti, helen.szigeti@isinet.com; Abby Goodrum, aagoodru@syracuse.edu; Helen Atkins, something@highwire.org?
