Human Migration of Open-Source Contributors
Kick-off Presentation
Erik Kouters
Graduation supervisor: A. Serebrenik
Graduation tutor: B. Vasilescu
What have I done so far?
• Geographical Movement of Mailing List Participants
• Seminar SET
• Capita Selecta SET
• Who’s who in GNOME: using LSA to merge software repository identities
• ICSM 2012 ERA Track
/ Software Engineering & Technology
What are the main topics?
• Human migration of open-source contributors
• Identity matching
• Case study: GNOME
Why is human migration of open-source contributors interesting?
• A passionate contributor would visit a conference.
• Don't program on Fridays!
• Contributors who appear to be weekend commuters are less likely to introduce bugs on Fridays.
• Translators who live outside the country of the target language are expected to deliver lower-quality translations.
What’s so interesting about this human migration of open-source contributors?
• What (geographical) patterns does the migration of open-source contributors follow?
• Which patterns (source → destination) are most popular?
• Commute
• Conferences
• What are the factors that influence this migration?
• Which factors are most influential?
How am I planning to trace these migrations?
• Extract emails from mailing list archive
• Resolve emails to location
• Email A is sent from locationA at timestampA
• Email B is sent from locationB at timestampB
• <locationA, timestampA> + <locationB, timestampB> = migration!
• But what if the contributor uses multiple email addresses?
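The tracing idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Sighting` and `Migration` records and the `find_migrations` function are hypothetical names, locations are assumed to be already resolved to plain strings, and each sender address is treated as one person (the multiple-address problem is not handled here).

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class Sighting(NamedTuple):
    """One email, already resolved to a location (hypothetical record)."""
    sender: str
    location: str
    timestamp: datetime


class Migration(NamedTuple):
    """A change of location between two consecutive emails of one sender."""
    sender: str
    origin: str
    destination: str
    departed_after: datetime
    arrived_before: datetime


def find_migrations(sightings):
    """Return one Migration per change of location, per sender."""
    by_sender = defaultdict(list)
    for sighting in sightings:
        by_sender[sighting.sender].append(sighting)

    migrations = []
    for sender, items in by_sender.items():
        # Order each sender's emails in time, then compare neighbours.
        items.sort(key=lambda s: s.timestamp)
        for prev, curr in zip(items, items[1:]):
            if prev.location != curr.location:
                migrations.append(
                    Migration(sender, prev.location, curr.location,
                              prev.timestamp, curr.timestamp)
                )
    return migrations
```

The move is only known to have happened somewhere between the two timestamps, which is why the sketch stores both instead of a single migration date.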
What exactly is Identity Matching?
• Identifying which aliases belong to the same individual
• Common in the form <name, emailAddress>
• <“George Stefanakis”, george.stefanakis@domainA>
• <“Stephanakis, George”, g.stephanakis@domainB>
• Needs some similarity measure (e.g. edit distance)
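As a rough illustration of the edit-distance idea, the sketch below compares the two name variants from the slide. The normalisation step (lowercasing and sorting the name parts) and the function names are assumptions made for this example, not the method used in the project.

```python
import re


def normalise(name):
    """Lowercase, drop punctuation, and sort name parts (assumed step)."""
    return " ".join(sorted(re.findall(r"[a-z]+", name.lower())))


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def name_similarity(a, b):
    """Edit distance scaled to [0, 1]; 1.0 means identical after normalising."""
    na, nb = normalise(a), normalise(b)
    longest = max(len(na), len(nb))
    return 1.0 if longest == 0 else 1.0 - levenshtein(na, nb) / longest


distance = levenshtein(normalise("George Stefanakis"),
                       normalise("Stephanakis, George"))
# distance is 2: "f" is replaced by "p" and an "h" is inserted
```

Whether a similarity of about 0.89 is high enough to merge the two aliases depends on a threshold that would have to be tuned on real data.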
How am I going to match these identities?
What will I be doing to improve the identity matching?
• Increase confidence when merging email addresses
• Look at fellow recipients (mailing list)
• Look at coauthors (source code repository)
• Use multiple similarity measures
• Currently Levenshtein and cosine similarity
• Compare performance with others (e.g. Jaccard, Jaro-Winkler, Dice’s coefficient, etc.)
• Improve implementation
• Currently slow
• Data set limited to system’s memory
• Release the tool as open-source (e.g. GitHub)
• Compare to current implementations
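To make the comparison of similarity measures concrete, here is a minimal sketch of three of the measures named above. It assumes character bigrams as the unit of comparison, which is a choice made for this example only; the cosine similarity in the project may well be computed on different vectors (for instance LSA vectors).

```python
from collections import Counter
from math import sqrt


def bigrams(text):
    """Character bigrams of the lowercased text (assumed representation)."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def jaccard(a, b):
    """Shared bigrams divided by all distinct bigrams."""
    sa, sb = set(bigrams(a)), set(bigrams(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def dice(a, b):
    """Dice's coefficient: twice the overlap over the summed set sizes."""
    sa, sb = set(bigrams(a)), set(bigrams(b))
    if not sa and not sb:
        return 1.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def cosine(a, b):
    """Cosine of the angle between the two bigram count vectors."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    dot = sum(ca[g] * cb[g] for g in ca.keys() & cb.keys())
    norm = (sqrt(sum(v * v for v in ca.values()))
            * sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0
```

Running all three on the same alias pairs, next to an edit-distance score, gives a simple basis for comparing how each measure ranks likely matches.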
So, what will I be doing?
• Improve the identity matching algorithm’s performance
• Run the algorithm on the data from the mailing list archive
• Send out a questionnaire to verify the results
• While waiting for the questionnaire responses, improve the algorithm with more advanced techniques
• Once sufficient responses to the questionnaire have been received, analyse the data and look for patterns
A questionnaire? What about privacy?
• Only the individual can access the data
• Participation by entering their email address
• Unique URL (hash) mailed to the email address
• Data will not be made public
• Research published based on the data will be anonymised
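One possible way to derive the unique URL is a keyed hash of the email address, sketched below. Everything here is an assumption for illustration: the slide only says "hash", so the use of HMAC-SHA256, the function names, and the `survey.example.org` base URL are hypothetical.

```python
import hashlib
import hmac


def questionnaire_token(email, secret_key):
    """Keyed hash of the normalised address; unguessable without the key."""
    normalised = email.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()


def questionnaire_url(email, secret_key,
                      base_url="https://survey.example.org/q/"):
    """Personal link to be mailed to the address (base URL is a placeholder)."""
    return base_url + questionnaire_token(email, secret_key)


def token_matches(email, token, secret_key):
    """Check a token from a visited URL, using a constant-time comparison."""
    expected = questionnaire_token(email, secret_key)
    return hmac.compare_digest(expected, token)
```

Because the link is mailed only to the address it was derived from, only someone with access to that mailbox can open the corresponding data, which matches the first bullet on this slide.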
How do I confirm the identity matching?
How do I confirm the migrations?
Looks promising…
And what am I hoping to achieve?
• A more advanced and better-performing identity matching algorithm than currently exists
• A versatile, open-source tool
• Insight into the patterns by which skilled workers (open-source contributors) migrate, and why
• Work during holiday → Hobbyist
• Visits conferences → High activity in project
• More publications!
Thank you! Questions?