
sCooL : A System for Academic Institution Name Normalization

Ferosh Jacob, Faizan Javed, Meng Zhao, and Matt McNair, Classification R&D, CareerBuilder.


Presentation Transcript


  1. sCooL: A System for Academic Institution Name Normalization. Ferosh Jacob, Faizan Javed, Meng Zhao, and Matt McNair, Classification R&D, CareerBuilder

  2. Presentation overview
     • About sCooL
        • What is entity normalization?
        • Why is academic entity normalization important?
        • What are the challenges of academic entity normalization?
     • Inside sCooL
        • A high-level overview of the core components
        • Atlas: the mapping manager
     • Evaluating sCooL
        • Comparing sCooL with the existing implementation
        • Independent evaluation of sCooL
     • Concluding remarks
        • Demo
        • Questions?

  3. About sCooL: Academic entity normalization facts
     • 7,021 post-secondary Title IV institutions in 2010-11*
     • 200 million unique visitors at CareerBuilder (CB) U.S.
     • 12 million unique academic institution entries in the CB résumé database
     * Source: http://nces.ed.gov/fastfacts/display.asp?id=84

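  4. About sCooL: Academic entity normalization definition
     [Figure: several surface forms grouped under a single entity. Entity normalization maps each surface form to its canonical entity.]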
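To make the surface form to entity idea concrete, here is a minimal sketch of exact lookup after light normalization (accents stripped, lowercased, punctuation and extra spaces removed). The key scheme, the table contents, and the names `normalize_key`, `SURFACE_TO_ENTITY`, and `normalize` are hypothetical illustrations, not sCooL's implementation.

```python
import re
import unicodedata
from typing import Optional


def normalize_key(surface_form: str) -> str:
    """Reduce a surface form to a lookup key (a hypothetical scheme)."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", surface_form)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase, turn punctuation into spaces, collapse runs of whitespace.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


# Hypothetical mapping table: normalized surface form -> canonical entity.
SURFACE_TO_ENTITY = {
    normalize_key(form): "University of Salford"
    for form in ("University of Salford", "Salford University", "Salford Uni.")
}


def normalize(surface_form: str) -> Optional[str]:
    """Return the canonical entity for a surface form, or None if unknown."""
    return SURFACE_TO_ENTITY.get(normalize_key(surface_form))


print(normalize("  salford   UNIVERSITY "))  # University of Salford
print(normalize("Salford Unevarsity"))       # None: the typo is not in the table
```

Exact lookup absorbs differences in case, spacing, and punctuation, but not misspellings, which is why string similarity comes up as a challenge on the following slides.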

  5. About sCooL: Why academic entity normalization?
     • Improved searching
     • Labor market dynamics insights

  6. About sCooL: Academic entity normalization challenges
     • Entity: Salford City College, Merchants Quay, Salford Quays, United Kingdom
     • Entity: University of Salford, Salford, Lancashire, United Kingdom
     • Entity: Salford College, 68 Grenfell Street, Adelaide, Australia
     How will you identify the most accurate normalization from a given surface form?

  7. About sCooL: Academic entity normalization challenges (continued)
     • String similarity algorithms, e.g. edit distance
        • Salford university -> Salford Unevarsity (edit distance 2): a spelling error
        • St. Loye’s College -> St. Luke’s College (edit distance 2): two different academic institutions
     How will you distinguish spelling or typing errors from cases where two similar names refer to different institutions?
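The two examples above can be checked with the standard Levenshtein edit distance. This is a minimal sketch; the slide does not say which edit distance variant sCooL uses, and the case-insensitive comparison is an assumption (with case kept, "university" vs "Unevarsity" would score 3).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


# Case-insensitive comparison (an assumption; the slide does not say).
typo = edit_distance("salford university", "salford unevarsity")
different = edit_distance("st. loye's college", "st. luke's college")
print(typo, different)  # 2 2
```

Both pairs get the same score, so a single distance threshold cannot separate the typo from the two distinct institutions.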

  8. About sCooL: Academic entity normalization challenges (continued)
     • Legacy names (mergers)
        • University of Central England in Birmingham is a former name of Birmingham City University.
        • In January 2009, Salford College merged with Eccles College and Pendleton College to form Salford City College.
        • In October 2004, the Victoria University of Manchester merged with the University of Manchester Institute of Science and Technology to form The University of Manchester.
     • Popular names and acronyms
        • Ole Miss is a popular name for The University of Mississippi.
        • MIT is an acronym for Massachusetts Institute of Technology. However, GIT is not an acronym for Georgia Institute of Technology; Georgia Tech and Ga Tech are its popular names.
     How will you create and maintain the surface form-entity mappings?
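A small sketch of why these mappings need curation: a naive initials-based acronym generator produces "MIT" correctly but also the non-existent "GIT", while legacy and popular names can only come from a maintained alias table. `naive_acronym`, `STOP_WORDS`, `ALIASES`, and `resolve_alias` are hypothetical names; the table holds only the examples from this slide.

```python
STOP_WORDS = {"of", "the", "in", "at", "for", "and"}


def naive_acronym(name: str) -> str:
    """Initials of the non-stop-words in a name (a hypothetical heuristic)."""
    return "".join(w[0].upper() for w in name.split() if w.lower() not in STOP_WORDS)


# Hypothetical curated alias table built from the examples on this slide.
ALIASES = {
    "university of central england in birmingham": "Birmingham City University",
    "eccles college": "Salford City College",
    "pendleton college": "Salford City College",
    # "salford college" is left out: it also names an institution in Adelaide.
    "victoria university of manchester": "The University of Manchester",
    "university of manchester institute of science and technology": "The University of Manchester",
    "ole miss": "The University of Mississippi",
    "mit": "Massachusetts Institute of Technology",
    "georgia tech": "Georgia Institute of Technology",
    "ga tech": "Georgia Institute of Technology",
}


def resolve_alias(surface_form: str):
    """Look up a legacy name, popular name, or acronym; None if unknown."""
    return ALIASES.get(" ".join(surface_form.lower().split()))


print(naive_acronym("Massachusetts Institute of Technology"))  # MIT
print(naive_acronym("Georgia Institute of Technology"))        # GIT
print(resolve_alias("GIT"))      # None: generated, but not a real alias
print(resolve_alias("Ga Tech"))  # Georgia Institute of Technology
```

The omitted "salford college" entry shows the limit of a flat table: the legacy UK name and the Adelaide college on slide 6 share one surface form.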

  9. About sCooL: Academic entity normalization challenges (continued)
     How can we remove K-12 schools and noise?
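The slide does not describe how sCooL filters these entries. One plausible approach, sketched here purely as an assumption, is a keyword filter for K-12 terms plus a pattern for empty or placeholder entries; the patterns and the `keep_entry` name are hypothetical.

```python
import re

# Hypothetical patterns for illustration; not sCooL's actual rules.
K12_PATTERN = re.compile(
    r"\b(high school|middle school|junior high|elementary|grade school|k-12)\b",
    re.IGNORECASE,
)
# Entries that are empty, punctuation only, or placeholders such as "N/A".
NOISE_PATTERN = re.compile(r"^\W*(n/?a|none|null|unknown)?\W*$", re.IGNORECASE)


def keep_entry(surface_form: str) -> bool:
    """Return False for likely noise or K-12 schools, True otherwise."""
    if NOISE_PATTERN.match(surface_form):
        return False
    return K12_PATTERN.search(surface_form) is None


entries = ["University of Salford", "Lincoln High School", "N/A", "  ", "Salford City College"]
print([e for e in entries if keep_entry(e)])
# ['University of Salford', 'Salford City College']
```

Keyword rules are cheap but brittle: they miss K-12 schools whose names contain none of the keywords, so a real system would likely combine them with a reference list of post-secondary institutions.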

  10. About sCooL: Challenges summary
     • How will you identify the most accurate normalization from a given surface form?
     • How will you distinguish spelling or typing errors from cases where two similar names refer to different institutions?
     • How will you create and maintain the surface form-entity mappings?
     • How can we remove K-12 schools and noise?

  11. Inside sCooL: A high-level overview of the system

  12. Inside sCooL: Atlas, sCooL’s mapping manager
     [Figure: sCooL and Atlas]

  13. Inside sCooL: Refining Lucene results
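The slide names the step but not its details. As a minimal sketch under stated assumptions: treat the candidates a Lucene query returns as a plain list, re-score them with token-set Jaccard similarity, and drop those below a threshold. The similarity measure, the 0.5 threshold, and the function names are hypothetical choices, not sCooL's documented method.

```python
import re


def tokens(text: str) -> set:
    """Lowercased alphanumeric tokens of a name."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity |A & B| / |A | B| (0.0 for two empty names)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def refine(surface_form, candidates, threshold=0.5):
    """Re-rank search candidates by similarity and drop weak matches."""
    scored = [(jaccard(surface_form, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(c, round(s, 2)) for s, c in scored if s >= threshold]


# Stand-in for candidates a Lucene query might return (hypothetical).
candidates = ["Salford City College", "University of Salford", "Salford College"]
print(refine("Salford College", candidates))
# [('Salford College', 1.0), ('Salford City College', 0.67)]
print(refine("Univ. of Salford", candidates))
# [('University of Salford', 0.5)]
```

For "Salford College" the Adelaide entity wins on string similarity alone, even though the surface form may be the legacy UK college that became Salford City College; this is the kind of case a curated mapping manager such as Atlas has to settle.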

  14. Evaluation: Comparing sCooL with the existing implementation

  15. Evaluation: Comparing sCooL with the existing implementation

  16. Evaluation: Comparing sCooL with the existing implementation

  17. Evaluation: Independent evaluation of sCooL

  18. sCooL: Demo
     • Atlas: http://ec2-54-193-1-73.us-west-1.compute.amazonaws.com/Atlas/

  19. sCooL: Questions

  20. sCooL: Appendix: Lucene search results for “University of Milan”

  21. sCooL: Appendix: String similarity algorithms

  22. Evaluation: Comparing sCooL with the existing implementation
     Balancing accuracy and coverage
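The accuracy/coverage trade-off can be illustrated with a confidence threshold: only mappings at or above it are accepted, so raising it tends to increase accuracy on the accepted entries while reducing the fraction of entries mapped. The definitions below and the toy data are assumptions invented for illustration; they are not results from this evaluation.

```python
def accuracy_and_coverage(predictions, threshold):
    """predictions: list of (confidence, is_correct) pairs.

    coverage = share of entries mapped at this threshold
    accuracy = share of mapped entries that are correct
    """
    accepted = [ok for conf, ok in predictions if conf >= threshold]
    coverage = len(accepted) / len(predictions)
    accuracy = sum(accepted) / len(accepted) if accepted else 0.0
    return accuracy, coverage


# Invented toy data for illustration only (not results from the talk).
toy = [(0.95, True), (0.9, True), (0.8, True), (0.7, False),
       (0.6, True), (0.4, False), (0.3, False), (0.2, True)]
for t in (0.0, 0.5, 0.85):
    acc, cov = accuracy_and_coverage(toy, t)
    print(f"threshold={t:.2f} accuracy={acc:.3f} coverage={cov:.3f}")
# threshold=0.00 accuracy=0.625 coverage=1.000
# threshold=0.50 accuracy=0.800 coverage=0.625
# threshold=0.85 accuracy=1.000 coverage=0.250
```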

  23. About sCooL: Related work
     • Cucerzan, S. (Microsoft Research): large-scale named entity disambiguation based on Wikipedia data, 2007
     • Jijkoun, V. et al. (Univ. of Amsterdam): named entity normalization (NEN) in user-generated content, 2008
     • Liu, X. et al. (Microsoft Research, China): joint inference of named entity recognition (NER) and NEN for tweets, 2012
     • Magdy, W. et al. (IBM, Egypt): NEN for Arabic names, 2007
     • Jonnalagadda, S. et al. (Lnx Research, CA): NEMO, an NER and NEN system for PubMed author affiliations, 2011
     • Cohen, A. (OHSU): gene/protein NEN using automatically generated libraries, 2005
