Data Science for Higher Ed

Data Science for Higher Ed. Gloria Lau Manager, Data Science @ LinkedIn. LinkedIn data. For students*. *prospective students, current students and recent graduates. WHY? We have career outcome data to derive better insights about higher education. Common questions from user studies.


Presentation Transcript


  1. Data Science for Higher Ed • Gloria Lau • Manager, Data Science @ LinkedIn

  2. LinkedIn data. For students*. *prospective students, current students and recent graduates

  3. WHY? We have career outcome data to derive better insights about higher education

  4. Common questions from user studies • Prospective students: I want to be a pediatrician. Where should I go to school? I don’t know what I want but I am an A student. So? • Current students: Show me the internship / job opportunities. Should I double / change major? • Recent graduates: Show me the job opportunities. Should I consider further education?

  5. The Answer for the type A’s • Show me the career outcome data per school / field of study / degree

  6. The Answer for the exploratory kind • Show me the career outcome data in a form that allows for serendipitous discoveries → build me some data products to help me draw insights from aggregate data → build me some data products that are delightful

  7. OK! Let’s start building some data products for students! • Type A’s and non-type A’s, we have answers for you

  8. Invest in Plumbing

  9. Before your faucets

  10. Data Science for Higher Ed: A case study • From plumbing to fixture. • From standardization to delightful data products.

  11. Standardization • Standardization is about understanding our data, and building the foundational layer that maps <school_name> to <school_id> so that we can build data products on top • Entity resolution • Recognizable entities • Typeahead

  12. Entity Resolution • User types in University of California, Berkeley → easy • User types in UCB → hard / ambiguous

  13. Entity Resolution • Name feature: fuzzy match, edit distance, prefix match, etc • Profile feature: email, groups, etc • Network feature: connections, invitations, etc
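
The name features on the slide above can be sketched as follows. This is a minimal illustration, not LinkedIn's implementation: the function names and the returned dictionary layout are assumptions, and the profile and network features are left out because the slides do not describe how they are computed.

```python
from difflib import SequenceMatcher


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def name_features(query: str, candidate: str) -> dict:
    """Hypothetical name-feature vector for one (typed string, school name) pair."""
    q, c = query.casefold().strip(), candidate.casefold().strip()
    return {
        "fuzzy_ratio": SequenceMatcher(None, q, c).ratio(),  # fuzzy match in [0, 1]
        "edit_distance": edit_distance(q, c),
        "prefix_match": c.startswith(q),
    }


if __name__ == "__main__":
    canonical = "University of California, Berkeley"
    for typed in ("University of Cal", "UCB"):
        print(typed, name_features(typed, canonical))
```

A short alias such as UCB scores poorly on all three name features, which is consistent with the slides treating it as the hard case that needs aliases or non-name signals.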

  14. Recognizable entities • User types in University of California, Berkeley → easy • User types in UCB → hard / ambiguous / alias not understood • User types in 東京大学 → harder / canonical name not understood

  15. Recognizable entities • You don’t know what you don’t know • Your standardization is only as good as your recognized dataset • LinkedIn data is very global

  16. Recognizable entities • IPEDS for US school data • Crowdsourcing for non-US school + government data • internal and external with schema spec’ed out • Alias – bootstrap from member data

  17. Typeahead • Plug the hole from the front(-end) as soon as you can • Invest in a good typeahead early on so that you don’t even need to standardize • Helps standardization rate tremendously • Make sure you have aliases and localized strings in your typeahead
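
A typeahead that covers aliases and localized strings could look roughly like the sketch below. The alias table, the school IDs, and the `typeahead` function are all hypothetical; a production system would likely use a dedicated search index rather than a sorted list.

```python
from bisect import bisect_left

# Hypothetical alias table: display string -> school_id (IDs are made up).
ALIASES = {
    "University of California, Berkeley": 1,
    "UC Berkeley": 1,
    "UCB": 1,
    "University of Tokyo": 2,
    "東京大学": 2,
}

# Sorted (casefolded key, display string, school_id) tuples for binary search.
INDEX = sorted((name.casefold(), name, sid) for name, sid in ALIASES.items())
KEYS = [key for key, _, _ in INDEX]


def typeahead(prefix: str, limit: int = 5) -> list[tuple[str, int]]:
    """Return up to `limit` (display string, school_id) pairs matching the prefix."""
    p = prefix.casefold()
    if not p:
        return []
    results = []
    # All keys sharing the prefix are contiguous, starting at the insertion point.
    for key, name, sid in INDEX[bisect_left(KEYS, p):]:
        if not key.startswith(p):
            break
        results.append((name, sid))
        if len(results) == limit:
            break
    return results


if __name__ == "__main__":
    print(typeahead("uc"))
    print(typeahead("東京"))
```

Because every suggestion already carries a school ID, a member who picks one never enters free text, which is the sense in which a good typeahead reduces the need for standardization afterwards.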

  18. Plumbing? Checked. • On to building delightful* data products • *The level of delightfulness is directly correlated to how good your standardization layer is.

  19. Similar Schools • Serendipitous discoveries. Sideways browse. • Based on career outcome data + some more.

  20. Similar Schools

  21. Similar schools • Aggregate profile per school based on alumni data • Industry, job title, job function, company, skills, etc • Feature engineering and balancing • Dot-product of 2 aggregate profiles = school similarity
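
The aggregate-profile and dot-product steps above can be sketched as follows. The field names, the sample alumni records, and the L2 normalization used as a stand-in for "balancing" are assumptions; the slides do not say how features are weighted.

```python
import math
from collections import Counter


def aggregate_profile(alumni: list[dict]) -> dict[str, float]:
    """Count each field=value pair across alumni, then L2-normalize (assumed)."""
    counts = Counter()
    for person in alumni:
        for field, value in person.items():
            counts[f"{field}={value}"] += 1
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {k: v / norm for k, v in counts.items()} if norm else {}


def similarity(p: dict[str, float], q: dict[str, float]) -> float:
    """Dot product of two sparse aggregate profiles."""
    if len(p) > len(q):
        p, q = q, p  # iterate over the smaller profile
    return sum(weight * q.get(key, 0.0) for key, weight in p.items())


if __name__ == "__main__":
    # Made-up alumni records for two hypothetical schools.
    school_a = aggregate_profile([
        {"industry": "software", "title": "engineer"},
        {"industry": "software", "title": "manager"},
    ])
    school_b = aggregate_profile([{"industry": "software", "title": "engineer"}])
    print(similarity(school_a, school_b))
```

With normalized profiles the dot product is a cosine similarity, so a school with many alumni does not dominate simply because its raw counts are larger.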

  22. Similar schools – issues • Observation #1: similarity identified between tiny specialized schools and big research institutions • Observation #2: similarity identified between non-US specialized schools and big US research institutions

  23. What’s wrong? • Degree bucketization

  24. Similar schools - issues • Observation: no data • New community colleges and non-US schools have very sparse data • Solution: attribute-based similarity • From IPEDS and crowdsourced data • Kyoritsu Women's University
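
One way to picture attribute-based similarity for schools with too few alumni is a simple matching score over school-level attributes. This is only a sketch: the attribute names and values below are invented for illustration, and the slides do not specify which attributes or which similarity measure were used.

```python
def attribute_similarity(a: dict[str, str], b: dict[str, str]) -> float:
    """Fraction of attributes present on both schools whose values agree."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[k] == b[k] for k in shared) / len(shared)


if __name__ == "__main__":
    # Hypothetical attribute records, e.g. drawn from IPEDS-style or crowdsourced data.
    school_x = {"control": "private", "kind": "women's university", "level": "4-year"}
    school_y = {"control": "private", "kind": "women's university", "level": "2-year"}
    print(attribute_similarity(school_x, school_y))
```

Unlike the outcome-based profile, this score needs no alumni at all, so it can serve as a fallback when career data for a school is sparse.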

  25. Notable Alumni • Aspirations. Connecting the dots.

  26. Notable Alumni • Who’s notable? • Wikipedia match • School standardization • Name mapping • Success stories

  27. Who’s notable – Wikipedia stories

  28. Wikipedia stories • Lightweight school standardization • ✓ Name feature ✕ profile feature ✕ network feature • Name mapping • Even when you are notable, your name isn’t unique • Crowdsourcing for evaluation • Profile from LinkedIn vs profile from Wikipedia

  29. Crowdsourcing for evaluation

  30. Are we done? Do we have notable alumni for all schools? • Same issue as with similar schools: data sparseness

  31. Who’s notable - Success stories • Many schools don’t have notable alumni section in Wikipedia • Success stories based on LinkedIn data • Features of success • CXO’s at Fortune companies • Generalizes to high seniority at top companies • But what does it mean to be • A top company • Senior • An alum • They all depend on…

  32. Standardization • Degree standardization - alumni • Company standardization • IBM vs International Brotherhood of Magicians • Title & seniority standardization • founder of the glorialau franchise vs founder of LinkedIn • VP in financial sector vs VP in software engineering industry

  33. Evaluation – I know it when I see it

  34. INSIGHTS: unique & standardized data to describe schools. similar schools. notable alumni. to drive STUDENT DECISIONS
