Example research presentations for CS 562 (10-15 minutes each)
1. Multi-Objective Software Effort Estimation

Topic:
• Managers must estimate effort for a project
• Existing techniques focus on giving point estimates (e.g., 230 hours)
• Paper presents a technique for generating estimates with uncertainty (e.g., 230 ± 30 hours)
• Internally, the technique relies on machine learning for multi-objective optimization, trading off two objectives: estimated effort error and uncertainty
Claim 1: The algorithm outperforms state-of-the-art options.
Evidence:
• Tested on 5 datasets, each of which gave several project attributes for each of many projects
  • Example: the "China" dataset had 499 rows (projects) and 6 columns (the target, Effort, plus 5 other project attributes for use in predicting effort)
• Performed 3-fold cross-validation
• Used the Wilcoxon signed-rank test to check their algorithm's performance against 5 competing techniques (see the sketch after this slide)
• Most statistical tests were significant at p < 0.001
• Authors properly account for repeated tests with a Bonferroni correction
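A minimal sketch of this style of analysis (not the authors' code; the error values below are made up): compare paired per-project errors from two estimators with a Wilcoxon signed-rank test, then Bonferroni-correct the significance threshold for the 5 comparisons.

```python
# Minimal sketch, not the authors' code: paired per-project errors for two
# estimators are compared with a Wilcoxon signed-rank test, and the
# significance threshold is Bonferroni-corrected for 5 comparisons.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
errors_ours = rng.gamma(2.0, 10.0, size=499)                # hypothetical errors
errors_competitor = errors_ours + rng.normal(5.0, 3.0, 499)  # competitor is worse

stat, p = wilcoxon(errors_ours, errors_competitor)

alpha = 0.05 / 5                                  # Bonferroni-corrected threshold
print(f"p = {p:.2e}; significant after correction: {p < alpha}")
```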
Claim 2: The algorithm outperforms competitors with a large effect size.
Evidence:
• During evaluation, authors computed the Vargha-Delaney Â12 effect size (sketched below)
• It was large in almost every test (> 0.9 in 23 out of 25 tests)
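Â12 is simple enough to show inline: it estimates the probability that a result drawn from one sample beats a result drawn from the other (0.5 means no effect; values near 1 are large). A minimal sketch:

```python
# Vargha-Delaney A12: probability that a value from xs beats a value from ys,
# counting ties as half a win. 0.5 = no effect; >= 0.71 is commonly "large".
def a12(xs, ys):
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in xs for y in ys)
    return wins / (len(xs) * len(ys))

print(a12([5, 6, 7], [1, 2, 3]))  # 1.0: the first sample always wins
```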
Claim 3: The algorithm's estimation error lies within bounds acceptable for industrial practice.
Evidence:
• Errors of most estimates were within 30-40% of true effort (an illustration follows below)
• A prior survey of managers suggested this level of accuracy is a good enough range for practice
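One way to read "within 30-40% of true effort" is as the magnitude of relative error (MRE), a common measure in effort estimation; the paper may define it differently, so this is just an illustration:

```python
# Illustration only: magnitude of relative error (MRE), assuming this is the
# sense in which estimates fall "within 30-40% of true effort".
def mre(actual_hours, predicted_hours):
    return abs(actual_hours - predicted_hours) / actual_hours

print(mre(200, 260))  # 0.30: estimating 260 hours for a 200-hour project
```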
Ideas for software project managers:
• Start collecting a project database
  • Someday, such a database will be useful for predicting effort
  • It doesn't take a fancy database to yield actionable results
2. Developer Onboarding in GitHub: The Role of Prior Social Links and Language Experience

Topic:
• Developers come and go from projects
• Where can they be most productive?
• Is it possible for a project leader to anticipate how productive a developer will be?
• Especially hard in open source, where prior relationships might not exist
Claims:
• Developers tend to join projects where they already have social connections.
• Developers are most productive immediately after joining if the project involves languages they've already used.
• If a developer has both prior social contacts and prior experience, long-term productivity tends to be higher.
Study:
• Used a GitHub dataset to identify developers active ≥5 years with ≥500 commits to ≥10 projects
• From those developers, identified related projects, filtered down to 58k whose commits could be retrieved and analyzed without significant error
• Definitions:
  • Divided the time period into a series of windows, to define what "prior" meant
  • Defined the level of prior relationship between two developers as Σ_p 1/N_p, summed over the projects p where they worked in common, where N_p is the team size of project p (see the sketch after this slide)
  • Defined language experience in terms of the number of prior file changes, grouped by filename extension
  • Defined productivity in terms of number of commits; defined a baseline in terms of control variables, and looked for "above baseline" productivity
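The relationship metric is concrete enough to sketch directly from its definition; the project names below are hypothetical:

```python
# Minimal sketch of the prior-relationship metric: tie strength between two
# developers is the sum over shared projects p of 1/N_p, where N_p is the
# team size of project p (shared work on small teams counts for more).
def connection_strength(projects_a: set, projects_b: set, team_size: dict) -> float:
    shared = projects_a & projects_b
    return sum(1.0 / team_size[p] for p in shared)

team_size = {"libfoo": 4, "bigapp": 50}  # hypothetical projects
print(connection_strength({"libfoo", "bigapp"},
                          {"libfoo", "bigapp"}, team_size))  # 0.25 + 0.02 = 0.27
```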
Evidence for claims:
• Claim 1:
  • Over 90% of developers joined projects where they had prior connections
  • Significant at p < 0.05 under a statistical test using the hypergeometric distribution (sketched below)
• Claim 2:
  • 10% more likely to have "above baseline" productivity in the 1st period after joining, given prior experience with the language
  • Significant at p < 0.0000001 under a negative binomial model
• Claim 3:
  • 43% more likely to have "above baseline" productivity summed over all periods, given prior experience and prior contacts
  • Significant at p < 0.05 under a negative binomial model
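A minimal sketch of a hypergeometric test of Claim 1, with made-up numbers (the paper's exact setup differs): if K of N candidate projects involve prior contacts and a developer joins n projects, how surprising is it that k of those joins landed on contact projects?

```python
# Illustrative hypergeometric test (numbers are made up, not the paper's):
# population of N candidate projects, K with prior contacts; a developer
# joins n projects, k of which have contacts. How unlikely is k by chance?
from scipy.stats import hypergeom

N, K = 1000, 50       # candidate projects; those involving prior contacts
n, k = 10, 9          # projects joined; joins that had prior contacts

p = hypergeom.sf(k - 1, N, K, n)   # P(X >= k) under random joining
print(f"p = {p:.2e}")
```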
Ideas for software project managers:
• If you're running a project, you can increase productivity by:
  • Finding team members with relevant experience
  • Finding team members with prior social contacts to your team
  • Best of all, both of the above
• And if you have to take somebody lacking both, watch out!
3. Disseminating Architectural Knowledge on Open-Source Projects

Topic:
• Often necessary to communicate software architecture
  • E.g., to maintainers, to external partners
• How do experts communicate about software architecture?
• This paper discusses a study of how software architectures were described in a book with one chapter per software system
Study:
• 18 chapter authors participated
• Multiple methods of data collection: questionnaires, interviews, and an opportunity for each interviewee to review the paper
• Analyzed respondents' statements (open coding)
  • Developed codes related to the motivating research questions, performed axial coding, identified themes
• Initial questions focused on how authors approach the topic, and what factors influence what content authors choose to include
Claim 1: Participants targeted a general programming audience.
Evidence:
• 13 of 18 participants explicitly said so
• Also backed up by specific quotes
Claim 2: Authors communicated architectures via code fragments and diagrams.
Evidence:
• 16 of 18 authors included code fragments in their chapters
  • The main reason for not doing so was the concern that code might hinder comprehension for some readers
  • Note: consistent with Claim 1, a desire to be readable by a large audience
• All 18 authors included diagrams in their chapters
  • But only 1 actually used proper UML
  • Note: again consistent with Claim 1; readers need not know UML
Claim 3: Architecture discussions included a lot about evolution and community.
Evidence:
• Most chapters included statements about how the architecture had changed (evolved) over time
• Most also emphasized aspects of the architecture that had been important to the community creating it
• Backed up by quotes from interviewees about what they felt was important to include
Ideas for software project managers:
• The paper makes it clear that projects can often benefit from documentation of the software architecture
• Managers should ensure staff include:
  • Discussion of the architecture
    • Including significant evolution
    • Including community influences
  • Diagrams as necessary (OK to deviate from "proper" UML)
  • Code samples as necessary
4. Why Do Developers Use Trivial Packages? An Empirical Case Study on npm

Topic:
• A "trivial package" is a component that a developer could have coded themselves
• So why include it as a dependency? It creates a risk of breaking
• For that matter, why even bother to publish it?
Claim 1: Trivial packages are common.
Evidence:
• Authors analyzed >230k packages from the npm repository
• Identified a sample of packages with 4-250 lines of code
• Surveyed 12 programmers (including 10 students) about which sampled packages were trivial
• Developed a criterion covering ~80% of what respondents considered trivial (see the sketch after this slide):
  • "Trivial package": ≤35 lines of code and cyclomatic complexity ≤10
• Went back to analyze all packages
  • Found 17% of npm packages met the trivial-package criterion
  • Including 11% of the 1000 "most depended-upon" packages!
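The criterion itself is mechanical; here is a minimal sketch with the metric extraction stubbed out (measuring LOC and cyclomatic complexity for real JavaScript would need a parser):

```python
# Minimal sketch of the paper's trivial-package criterion. The inputs are
# assumed to be precomputed; measuring LOC and McCabe cyclomatic complexity
# for real JavaScript would require a parser, which is out of scope here.
def is_trivial(lines_of_code: int, cyclomatic_complexity: int) -> bool:
    return lines_of_code <= 35 and cyclomatic_complexity <= 10

print(is_trivial(12, 2))    # True: a left-pad-style helper
print(is_trivial(30, 15))   # False: short but branch-heavy
```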
Claim 2: Developers do not consider using trivial packages to be harmful.
Evidence:
• Surveyed developers:
  • Identified 600 who used many trivial packages
  • And another 600 who used relatively few
  • Of these 1200, 88 completed the survey
• 58% of respondents did not consider using trivial packages harmful
• 55% said trivial packages are usually well implemented and tested
• 48% said reusing trivial packages saves time
Claim 3: Trivial packages often lack unit tests and can carry many additional dependencies.
Evidence:
• Analyzed the dataset of npm packages (both trivial and non-trivial)
• Relied on the "npms" tool to assess the unit testing of each package
  • Found that 45% of trivial packages have no unit tests
• Further, used npm package metadata to compute dependencies (see the sketch after this slide)
  • Found 44% of trivial packages have ≥1 dependency
  • Found 12% of trivial packages have ≥20 dependencies
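A minimal sketch of the dependency count, assuming it is read from each package's package.json manifest (the paper's exact tooling may differ):

```python
# Minimal sketch: count a package's direct dependencies from its package.json
# manifest. Assumes this is how dependency counts were derived; the paper's
# exact tooling may differ.
import json

def direct_dependency_count(package_json_path: str) -> int:
    with open(package_json_path) as f:
        manifest = json.load(f)
    return len(manifest.get("dependencies", {}))

# e.g., direct_dependency_count("node_modules/left-pad/package.json")
```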
Ideas for software managers:
• Keep an eye on whether your team is depending on trivial packages
• Unless you're really saving time that you need to save, such dependencies might be more trouble than they're worth
• In particular, don't assume that a package is well tested just because it is widely used!
5. Classifying Developers into Core and Peripheral: An Empirical Study on Count and Network Metrics

Topic:
• Researchers have investigated the different roles of open-source contributors
  • Helpful for understanding the structure of communities
  • And for understanding how people progress through communities
  • And someday for understanding how to help people progress
• Historically, researchers classified people based on their activity: commit count, lines of code committed, number of emails sent
New approach:
• Authors propose classifying based on a social network
  • An undirected graph, one node per person
  • Edges connect those who communicated in the same email thread
  • And/or edges connect those who edit functionally connected code
• For each person, compute social network metrics: degree centrality, eigenvector centrality, hierarchy, role stability, core/periphery status (the first two are sketched after this slide)
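A minimal sketch (with a made-up graph) of the two most familiar metrics from that list, using networkx:

```python
# Minimal sketch on a made-up graph: nodes are developers; an edge means they
# shared an email thread or edited functionally connected code.
import networkx as nx

g = nx.Graph()
g.add_edges_from([("ann", "bob"), ("ann", "cal"), ("ann", "dee"),
                  ("bob", "cal"), ("dee", "eve")])

print(nx.degree_centrality(g))       # fraction of others each person connects to
print(nx.eigenvector_centrality(g))  # weights ties to well-connected people
```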
Claim 1: Network-based metrics are as mutually consistent as count-based metrics.
Evidence:
• Analyzed logs for 10 open-source projects
• Computed count-based metrics for each person
• Computed Cohen's kappa (a measure of agreement; sketched below), which generally was:
  • Around 0.75 for commit count vs. lines of code committed (substantial agreement)
  • Around 0.35 for each of those metrics vs. number of emails sent (fair agreement)
• Repeated with network-based metrics for each person
  • Found Cohen's kappa generally in the same ranges
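A minimal sketch of the agreement measure, with made-up labels: classify the same developers two ways and compute Cohen's kappa between the classifications.

```python
# Minimal sketch with made-up labels: agreement between two classifications
# of the same five developers. kappa = 1 means perfect agreement; 0 means
# agreement no better than chance.
from sklearn.metrics import cohen_kappa_score

by_commit_count = ["core", "core", "peripheral", "peripheral", "peripheral"]
by_loc          = ["core", "core", "core",       "peripheral", "peripheral"]

print(cohen_kappa_score(by_commit_count, by_loc))  # ~0.62: substantial agreement
```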
Claim 2: Network-based metrics more closely match the "true" role than count-based metrics.
Evidence:
• Surveyed project participants (166 respondents across 9 projects)
• Identified the person most frequently named as having the highest role
• Computed Cohen's kappa between this "ground truth" and the indications of who's most important according to network-based and count-based metrics
• Count-based metrics had an agreement of 0.355 to 0.421 with "truth"
• Network-based metrics had an agreement of 0.404 to 0.497 with "truth"
• All of these were significant at p < 0.001
Claim 3: The network-based model reveals interesting dynamics.
Evidence:
• Focused on the QEMU project
• Computed network-based metrics for each of several time periods
• Classified people according to the network metrics
• Computed transition probabilities (a Markov model) between categories (see the sketch after this slide)
• Note that count-based metrics don't reveal these particular dynamics
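A minimal sketch, with made-up role histories, of estimating Markov transition probabilities from per-period classifications:

```python
# Minimal sketch with made-up role histories: estimate transition
# probabilities between roles from each developer's per-period labels.
from collections import Counter, defaultdict

histories = [["peripheral", "peripheral", "core", "core"],
             ["peripheral", "core", "peripheral", "peripheral"]]

counts = defaultdict(Counter)
for roles in histories:
    for prev, nxt in zip(roles, roles[1:]):
        counts[prev][nxt] += 1

for prev, nxts in counts.items():
    total = sum(nxts.values())
    for nxt, c in nxts.items():
        print(f"P({nxt} | {prev}) = {c / total:.2f}")
```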
Ideas for software project managers:
• Busyness isn't as informative as relationships are
  • Relationships among developers may be a truer reflection of roles than counts of contribution types
• Don't evaluate the importance of team members solely based on commits
  • Also evaluate based on how they relate to others
• Make time to understand how your team members relate to one another
  • … and how to foster meaningful interactions that promote success