Example research presentations for CS 562 (10-15 minutes each)
1. Multi-Objective Software Effort Estimation

Topic:
• Managers must estimate effort for a project
• Existing techniques focus on giving point estimates (e.g., 230 hours)
• Paper presents a technique for generating estimates with uncertainty (e.g., 230 ± 30 hours)
• Internally, the technique relies on machine learning for multi-objective optimization, trading off two objectives: estimated effort error and uncertainty
Claim 1: The algorithm outperforms state-of-the-art options.
Evidence:
• Tested on 5 datasets, each of which gave several project attributes for each of many projects
  • Example: the "China" dataset had 499 rows (projects) and 6 columns (the target, Effort, plus 5 other project attributes for use in predicting effort)
• Performed 3-fold cross-validation
• Used the Wilcoxon signed-rank test to check their algorithm's performance against 5 competing techniques (see the sketch after this slide)
• Most statistical tests were significant at p < 0.001
• Authors properly account for repeated tests with a Bonferroni correction
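A minimal sketch of this style of analysis (not the authors' code; the error values below are made up): compare paired per-project errors from two estimators with a Wilcoxon signed-rank test, then Bonferroni-correct the significance threshold for the 5 comparisons.

```python
# Minimal sketch, not the authors' code: paired per-project errors for two
# estimators are compared with a Wilcoxon signed-rank test, and the
# significance threshold is Bonferroni-corrected for 5 comparisons.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
errors_ours = rng.gamma(2.0, 10.0, size=499)                # hypothetical errors
errors_competitor = errors_ours + rng.normal(5.0, 3.0, 499)  # competitor is worse

stat, p = wilcoxon(errors_ours, errors_competitor)

alpha = 0.05 / 5                                  # Bonferroni-corrected threshold
print(f"p = {p:.2e}; significant after correction: {p < alpha}")
```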
Claim 2: The algorithm outperforms competitors with a large effect size.
Evidence:
• During evaluation, authors computed the Vargha-Delaney Â12 effect size (sketched below)
• It was large in almost every test (> 0.9 in 23 out of 25 tests)
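Â12 is simple enough to show inline: it estimates the probability that a result drawn from one sample beats a result drawn from the other (0.5 means no effect; values near 1 are large). A minimal sketch:

```python
# Vargha-Delaney A12: probability that a value from xs beats a value from ys,
# counting ties as half a win. 0.5 = no effect; >= 0.71 is commonly "large".
def a12(xs, ys):
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in xs for y in ys)
    return wins / (len(xs) * len(ys))

print(a12([5, 6, 7], [1, 2, 3]))  # 1.0: the first sample always wins
```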
Claim 3: The algorithm's estimation error lies within bounds acceptable for industrial practice.
Evidence:
• Errors of most estimates were within 30-40% of true effort (an illustration follows below)
• A prior survey of managers suggested this level of accuracy is a good enough range for practice
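One way to read "within 30-40% of true effort" is as the magnitude of relative error (MRE), a common measure in effort estimation; the paper may define it differently, so this is just an illustration:

```python
# Illustration only: magnitude of relative error (MRE), assuming this is the
# sense in which estimates fall "within 30-40% of true effort".
def mre(actual_hours, predicted_hours):
    return abs(actual_hours - predicted_hours) / actual_hours

print(mre(200, 260))  # 0.30: estimating 260 hours for a 200-hour project
```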
Ideas for software project managers:
• Start collecting a project database
  • Someday, such a database will be useful for predicting effort
  • It doesn't take a fancy database to yield actionable results
2. Developer Onboarding in GitHub: The Role of Prior Social Links and Language Experience

Topic:
• Developers come and go from projects
• Where can they be most productive?
• Is it possible for a project leader to anticipate how productive a developer will be?
• Especially hard in open source, where prior relationships might not exist
Claims:
• Developers tend to join projects where they already have social connections.
• Developers are most productive immediately after joining if the project involves languages they've already used.
• If a developer has both prior social contacts and prior experience, long-term productivity tends to be higher.
Study:
• Used a GitHub dataset to identify developers active ≥5 years with ≥500 commits to ≥10 projects
• From those developers, identified related projects, filtered down to 58k whose commits could be retrieved and analyzed without significant error
• Definitions:
  • Divided the time period into a series of windows, to define what "prior" meant
  • Defined the level of prior relationship between two developers as Σ_p 1/N_p, summed over the projects p where they worked in common, where N_p is the team size of project p (see the sketch after this slide)
  • Defined language experience in terms of the number of prior file changes, grouped by filename extension
  • Defined productivity in terms of number of commits; defined a baseline in terms of control variables, and looked for "above baseline" productivity
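The relationship metric is concrete enough to sketch directly from its definition; the project names below are hypothetical:

```python
# Minimal sketch of the prior-relationship metric: tie strength between two
# developers is the sum over shared projects p of 1/N_p, where N_p is the
# team size of project p (shared work on small teams counts for more).
def connection_strength(projects_a: set, projects_b: set, team_size: dict) -> float:
    shared = projects_a & projects_b
    return sum(1.0 / team_size[p] for p in shared)

team_size = {"libfoo": 4, "bigapp": 50}  # hypothetical projects
print(connection_strength({"libfoo", "bigapp"},
                          {"libfoo", "bigapp"}, team_size))  # 0.25 + 0.02 = 0.27
```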
Evidence for claims:
• Claim 1:
  • Over 90% of developers joined projects where they had prior connections
  • Significant at p < 0.05 under a statistical test using the hypergeometric distribution (sketched below)
• Claim 2:
  • 10% more likely to have "above baseline" productivity in the 1st period after joining, given prior experience with the language
  • Significant at p < 0.0000001 under a negative binomial model
• Claim 3:
  • 43% more likely to have "above baseline" productivity summed over all periods, given prior experience and prior contacts
  • Significant at p < 0.05 under a negative binomial model
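A minimal sketch of a hypergeometric test of Claim 1, with made-up numbers (the paper's exact setup differs): if K of N candidate projects involve prior contacts and a developer joins n projects, how surprising is it that k of those joins landed on contact projects?

```python
# Illustrative hypergeometric test (numbers are made up, not the paper's):
# population of N candidate projects, K with prior contacts; a developer
# joins n projects, k of which have contacts. How unlikely is k by chance?
from scipy.stats import hypergeom

N, K = 1000, 50       # candidate projects; those involving prior contacts
n, k = 10, 9          # projects joined; joins that had prior contacts

p = hypergeom.sf(k - 1, N, K, n)   # P(X >= k) under random joining
print(f"p = {p:.2e}")
```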
Ideas for software project managers:
• If you're running a project, you can increase productivity by:
  • Finding team members with relevant experience
  • Finding team members with prior social contacts to your team
  • Best of all, both of the above
• And if you have to take somebody lacking both, watch out!
3. Disseminating Architectural Knowledge on Open-Source Projects

Topic:
• Often necessary to communicate software architecture
  • E.g., to maintainers, to external partners
• How do experts communicate about software architecture?
• This paper discusses a study of how software architectures were described in a book with one chapter per software system
Study:
• 18 chapter authors participated
• Multiple methods of data collection: questionnaires, interviews, and an opportunity for each interviewee to review the paper
• Analyzed respondents' statements (open coding)
  • Developed codes related to the motivating research questions, performed axial coding, identified themes
• Initial questions focused on how authors approach the topic, and what factors influence what content authors choose to include
Claim 1: Participants targeted a general programming audience.
Evidence:
• 13 of 18 participants explicitly said so
• Also backed up by specific quotes
Claim 2: Authors communicated architectures via code fragments and diagrams.
Evidence:
• 16 of 18 authors included code fragments in their chapters
  • The main reason for not doing so was the concern that code might hinder comprehension for some readers
  • Note: consistent with Claim 1, a desire to be readable by a large audience
• All 18 authors included diagrams in their chapters
  • But only 1 actually used proper UML
  • Note: again consistent with Claim 1; readers need not know UML
Claim 3: Architecture discussions included a lot about evolution and community.
Evidence:
• Most chapters included statements about how the architecture had changed (evolved) over time
• Most also emphasized aspects of the architecture that had been important to the community creating it
• Backed up by quotes from interviewees about what they felt was important to include
Ideas for software project managers:
• The paper makes it clear that projects can often benefit from documentation of the software architecture
• Managers should ensure staff include:
  • Discussion of the architecture
    • Including significant evolution
    • Including community influences
  • Diagrams as necessary (OK to deviate from "proper" UML)
  • Code samples as necessary
4. Why Do Developers Use Trivial Packages? An Empirical Case Study on npm

Topic:
• A "trivial package" is a component that a developer could have coded themselves
• So why include it as a dependency? It creates a risk of breaking
• For that matter, why even bother to publish it?
Claim 1: Trivial packages are common.
Evidence:
• Authors analyzed >230k packages from the npm repository
• Identified a sample of packages with 4-250 lines of code
• Surveyed 12 programmers (including 10 students) about which sampled packages were trivial
• Developed a criterion covering ~80% of what respondents considered trivial (see the sketch after this slide):
  • "Trivial package": ≤35 lines of code and cyclomatic complexity ≤10
• Went back to analyze all packages
  • Found 17% of npm packages met the trivial-package criterion
  • Including 11% of the 1000 "most depended-upon" packages!
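The criterion itself is mechanical; here is a minimal sketch with the metric extraction stubbed out (measuring LOC and cyclomatic complexity for real JavaScript would need a parser):

```python
# Minimal sketch of the paper's trivial-package criterion. The inputs are
# assumed to be precomputed; measuring LOC and McCabe cyclomatic complexity
# for real JavaScript would require a parser, which is out of scope here.
def is_trivial(lines_of_code: int, cyclomatic_complexity: int) -> bool:
    return lines_of_code <= 35 and cyclomatic_complexity <= 10

print(is_trivial(12, 2))    # True: a left-pad-style helper
print(is_trivial(30, 15))   # False: short but branch-heavy
```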
Claim 2: Developers do not consider using trivial packages to be harmful.
Evidence:
• Surveyed developers:
  • Identified 600 who used many trivial packages
  • And another 600 who used relatively few
  • Of these 1200, 88 completed the survey
• 58% of respondents did not consider using trivial packages harmful
• 55% said trivial packages are usually well implemented and tested
• 48% said reusing trivial packages saves time
Claim 3: Trivial packages often lack unit tests and can carry many additional dependencies.
Evidence:
• Analyzed the dataset of npm packages (both trivial and non-trivial)
• Relied on the "npms" tool to assess the unit testing of each package
  • Found that 45% of trivial packages have no unit tests
• Further, used npm package metadata to compute dependencies (see the sketch after this slide)
  • Found 44% of trivial packages have ≥1 dependency
  • Found 12% of trivial packages have ≥20 dependencies
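A minimal sketch of the dependency count, assuming it is read from each package's package.json manifest (the paper's exact tooling may differ):

```python
# Minimal sketch: count a package's direct dependencies from its package.json
# manifest. Assumes this is how dependency counts were derived; the paper's
# exact tooling may differ.
import json

def direct_dependency_count(package_json_path: str) -> int:
    with open(package_json_path) as f:
        manifest = json.load(f)
    return len(manifest.get("dependencies", {}))

# e.g., direct_dependency_count("node_modules/left-pad/package.json")
```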
Ideas for software managers:
• Keep an eye on whether your team is depending on trivial packages
• Unless you're really saving time that you need to save, such dependencies might be more trouble than they're worth
• In particular, don't assume that a package is well tested just because it is widely used!
5. Classifying Developers into Core and Peripheral: An Empirical Study on Count and Network Metrics

Topic:
• Researchers have investigated the different roles of open-source contributors
  • Helpful for understanding the structure of communities
  • And for understanding how people progress through communities
  • And someday for understanding how to help people progress
• Historically, researchers classified people based on their activity: commit count, lines of code committed, number of emails sent
New approach:
• Authors propose classifying based on a social network
  • An undirected graph, one node per person
  • Edges connect those who communicated in the same email thread
  • And/or edges connect those who edit functionally connected code
• For each person, compute social network metrics: degree centrality, eigenvector centrality, hierarchy, role stability, core/periphery status (the first two are sketched after this slide)
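A minimal sketch (with a made-up graph) of the two most familiar metrics from that list, using networkx:

```python
# Minimal sketch on a made-up graph: nodes are developers; an edge means they
# shared an email thread or edited functionally connected code.
import networkx as nx

g = nx.Graph()
g.add_edges_from([("ann", "bob"), ("ann", "cal"), ("ann", "dee"),
                  ("bob", "cal"), ("dee", "eve")])

print(nx.degree_centrality(g))       # fraction of others each person connects to
print(nx.eigenvector_centrality(g))  # weights ties to well-connected people
```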
Claim 1: Network-based metrics are as mutually consistent as count-based metrics.
Evidence:
• Analyzed logs for 10 open-source projects
• Computed count-based metrics for each person
• Computed Cohen's kappa (a measure of agreement; sketched below), which generally was:
  • Around 0.75 for commit count vs. lines of code committed (substantial agreement)
  • Around 0.35 for each of those metrics vs. number of emails sent (fair agreement)
• Repeated with network-based metrics for each person
  • Found Cohen's kappa generally in the same ranges
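A minimal sketch of the agreement measure, with made-up labels: classify the same developers two ways and compute Cohen's kappa between the classifications.

```python
# Minimal sketch with made-up labels: agreement between two classifications
# of the same five developers. kappa = 1 means perfect agreement; 0 means
# agreement no better than chance.
from sklearn.metrics import cohen_kappa_score

by_commit_count = ["core", "core", "peripheral", "peripheral", "peripheral"]
by_loc          = ["core", "core", "core",       "peripheral", "peripheral"]

print(cohen_kappa_score(by_commit_count, by_loc))  # ~0.62: substantial agreement
```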
Claim 2: Network-based metrics more closely match the "true" role than count-based metrics.
Evidence:
• Surveyed project participants (166 respondents across 9 projects)
• Identified the person most frequently named as having the highest role
• Computed Cohen's kappa between this "ground truth" and the indications of who's most important according to network-based and count-based metrics
• Count-based metrics had an agreement of 0.355 to 0.421 with "truth"
• Network-based metrics had an agreement of 0.404 to 0.497 with "truth"
• All of these were significant at p < 0.001
Claim 3: The network-based model reveals interesting dynamics.
Evidence:
• Focused on the QEMU project
• Computed network-based metrics for each of several time periods
• Classified people according to the network metrics
• Computed transition probabilities (a Markov model) between categories (see the sketch after this slide)
• Note that count-based metrics don't reveal these particular dynamics
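A minimal sketch, with made-up role histories, of estimating Markov transition probabilities from per-period classifications:

```python
# Minimal sketch with made-up role histories: estimate transition
# probabilities between roles from each developer's per-period labels.
from collections import Counter, defaultdict

histories = [["peripheral", "peripheral", "core", "core"],
             ["peripheral", "core", "peripheral", "peripheral"]]

counts = defaultdict(Counter)
for roles in histories:
    for prev, nxt in zip(roles, roles[1:]):
        counts[prev][nxt] += 1

for prev, nxts in counts.items():
    total = sum(nxts.values())
    for nxt, c in nxts.items():
        print(f"P({nxt} | {prev}) = {c / total:.2f}")
```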
Ideas for software project managers:
• Busyness isn't as informative as relationships are
  • Relationships among developers may be a truer reflection of roles than counts of contribution types
• Don't evaluate the importance of team members solely based on commits
  • Also evaluate based on how they relate to others
• Make time to understand how your team members relate to one another
  • … and how to foster meaningful interactions that promote success