Improving the OFED Development Process
Panelists
• Betsy Zeller, QLogic (Moderator)
• Tziporet Koren, Mellanox
• Christoph Raisch, IBM
• Dave Sommers, NetEffect
• John Jolly, Novell
• Many people, via email, and …
• You, out there in the audience!
Improving the OFED Development Process
• Focus for this session is on OFED on Linux, rather than on other Unix variants or Windows
• Focus is on process, rather than on ways to improve the features of a release
• Presentation is based on input from panelists, OFED phone sessions, emails to the EWG list, and private emails
• Goals of session:
  • Make sure major issues are identified
  • Identify solutions where possible
  • Plan for follow-up
Agenda
• Original OFED goals from 2006
• How are we doing on meeting all these goals?
• Process improvements since 2006
• What issues have been raised?
  • Includes question of kernel.org/OFED!
• Handling support – bug fixes and point releases
• Open the floor to issues
• Smoothing the way for distro integration
• Next steps
Original OFED Goals (March, 2006)
• Move away from each vendor providing their own unique snapshot of software.
• Enable vendors to create a common and supportable enterprise-grade distribution of OpenFabrics SW: something they can stand behind and support 24x7.
Note: Expectation was that eventually distros (RH and SUSE) would ramp up to provide support for enterprise customers.
How are we doing?
• 4 major OFED releases over the last two years, installed at many sites.
• Vendors are no longer pulling from top-of-trunk of SVN trees and delivering random, disparate SW releases.
• RH integrated OFED 1.3 into RHEL 5 U2
• Novell SLES 10 SP2 to ship with OFED 1.3
Process improvements since ’06
• All developers now use git
• Bi-weekly or weekly teleconferences, with detailed minutes
• Feature list and schedule are discussed in advance, posted on OF website, and updated regularly
• Processes are documented on website
• Nightly builds allow vendors to test their changes before more public releases
• Process is evolving for doing point releases
• But …
Some issues have been raised
• Release process is flexible, but:
  • Some major features have been integrated long after feature freeze, and even after the RC process has started.
  • There’s a delicate balance between “holding to the rules” and still meeting vendors’ needs for critical features and bug fixes.
  • OFED release dates tend to slide a little, not unlike many SW release dates!
• Should OFED patches have to go through an additional review process, especially after RC?
Issues (continued)
Issue: How do you get a fast bug fix turnaround (e.g. 2-3 days)?
Proposed Solution:
• For a kernel issue, add a patch to the existing release using the OFED patch script
• For a kernel or userland issue, send email to the list requesting a sub-minor release (e.g. 1.3.1.2). For a vendor-specific problem, this can be turned around fast. The fix gets rolled into the next release cycle, where it will be tested by all vendors.
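The sub-minor numbering mentioned above (e.g. 1.3.1.2) can be ordered and related to its parent release with plain tuple comparison. The following is a minimal sketch, not an OFED tool: the helper names `parse_version` and `is_sub_minor_of` are hypothetical, and the release list is illustrative (only 1.3.1.2 appears in the slides).

```python
def parse_version(text):
    """Split a dotted version string such as '1.3.1.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_sub_minor_of(candidate, base):
    """True if candidate extends base with extra components (1.3.1.2 extends 1.3.1)."""
    c, b = parse_version(candidate), parse_version(base)
    return len(c) > len(b) and c[:len(b)] == b


# Illustrative release names; only 1.3.1.2 is taken from the slides.
releases = ["1.3.1", "1.3", "1.3.1.2", "1.3.1.1"]
ordered = sorted(releases, key=parse_version)
print(ordered)  # ['1.3', '1.3.1', '1.3.1.1', '1.3.1.2']
```

Comparing integer tuples rather than raw strings keeps a release like 1.3.10 after 1.3.9, which a plain string sort would get wrong.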
Issue – Balance with kernel.org
Issue: Currently, there is kernel SW in OFED which didn’t come in through kernel.org
• SDP – no plan to get it in kernel.org – is this a problem?
• In other components, there are fixes/changes which “missed the kernel.org train”.
Proposed Solution: Topic for discussion
Issues (continued)
Issue: What’s the process if a vendor delivers new HW off-cycle, and needs SW?
• Has been a major issue for other vendors when a newer kernel version was required
Proposed Solution: Topic for discussion
Issues (continued)
Issue: Interoperability events use GA OFED SW. If a vendor fails to pass testing, they are off the Integrators List for a minimum of six months.
Proposed Solution: Run initial interoperability testing with a release candidate (RC). Do final “real” testing with the GA version, which should have no surprises. Vendors can choose to be present at one or both of these events.
Issue: “Final” versions of tests/test plans are not available until the event, which makes it difficult to prepare in advance.
Proposed Solution: Make “final” test plans available at least two weeks before the event.
Issues (continued)
Issue: There’s occasionally a difference between what’s in OFED and what’s in the latest package from the maintainer (e.g., changes in the verbs API to support XRC didn’t go into Roland’s released verbs package).
Proposed Solution: Topic for discussion
Other Issues
• Are there other issues about the Linux OFED development process you’d like to raise?
Proposed Solutions
• Clearly publish (and reiterate) Feature Freeze dates from early on in the release. As part of this, clearly differentiate between “feature” and “bug fix”. (No, it’s not a bug that your feature is missing!)
• Review new feature proposals, to understand implications for others
• Clear discussion/negotiation of implied API changes related to new features.
• Showcase process documents on OpenFabrics website, so they can all be easily found and accessed
• As a community, either accept that some vendors will miss the opportunity to submit a change they care about, or accept that release dates will slip.
• Run initial interoperability testing with a release candidate (RC). Do final “real” testing with the GA version, which should have no surprises. Vendors can choose to be present at one or both of these events.
Novell – Build Constraints
• Packages that work well with 'quilt'
  • open source patch management utility
  • http://www.suse.de/~agruen/quilt.pdf
• Use OpenSUSE Build Service
• Consistent backport implementation
  • ofa-kernel uses kernel versions
  • ib-bonding uses specific distributions
  • #include_next can break distro use
• Packaging
Issues – Packaging
• Original plan for OFED was that it would go away when distros were ramped up to deliver and support OpenFabrics SW.
• What would have to be true before the distros can handle everything?
• Is there a better solution than backport patches?
• Backport patches are required so users don’t have to compile kernels
• SUSE and RH can’t directly use OFED backport patches because “include_next” is not transparent.
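For readers unfamiliar with `#include_next`: it is a GCC preprocessor extension that resumes the header search in the directories that come after the one where the current file was found. The sketch below simulates that lookup in Python to show why a wrapper header depends on include-path ordering. The directory names and header set are hypothetical, and the slides do not say which failure mode the distros actually hit, so the last case is only one plausible illustration.

```python
def resolve(header, search_path, tree, after=None):
    """Return the first directory in search_path that contains header.

    If `after` is given, only directories later in the path than `after`
    are searched, which mimics how #include_next continues the lookup.
    """
    start = search_path.index(after) + 1 if after is not None else 0
    for directory in search_path[start:]:
        if header in tree.get(directory, ()):
            return directory
    return None


# Hypothetical layout: a backport directory placed ahead of the kernel headers.
search_path = ["backport/include", "kernel/include"]
tree = {
    "backport/include": {"linux/workqueue.h"},
    "kernel/include": {"linux/workqueue.h", "linux/list.h"},
}

# A normal #include finds the backport wrapper first.
first = resolve("linux/workqueue.h", search_path, tree)
# The wrapper's #include_next then reaches the kernel's own header.
wrapped = resolve("linux/workqueue.h", search_path, tree, after=first)
print(first, "->", wrapped)  # backport/include -> kernel/include

# If a build uses only one header directory, there is nothing left to search.
single = resolve("linux/workqueue.h", ["kernel/include"], tree, after="kernel/include")
print(single)  # None
```

The point of the sketch is that the wrapper only works when the build keeps a separate backport directory ahead of the kernel headers on the include path.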
Packaging Proposal
• Kernel distribution
  • Aggregate kernel patches/modules in one package
• Userspace distribution
  • Tarballs + sample RPM spec files for releases/RCs
  • Use git (for daily builds) + pull script
• Solution needs to meet the needs of distributors, vendors, and those who want to roll their own
• Comments?
Point Releases
• Release frequency: every two to three months
  • Can be more frequent if a critical bug is found
• Change guidelines:
  • Use the same kernel base as the major OFED release
  • No API changes (both in kernel and in user libs)
  • Core and ULPs (including MPI): critical and high priority bug fixes only
  • Low level driver changes: responsibility of the HW vendor
  • Add backports to support a new OS (e.g. SLES10 SP2, FC8, etc.)
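The change guidelines above can be read as a small decision rule. The following is a rough sketch under stated assumptions: the `Change` record, its field names, and the component and priority labels are hypothetical and are not part of any OFED tooling.

```python
from dataclasses import dataclass


@dataclass
class Change:
    component: str     # hypothetical labels: "core", "ulp", "driver", "backport"
    priority: str      # hypothetical labels: "critical", "high", "medium", "low"
    changes_api: bool  # True if a kernel or user-library API changes


def allowed_in_point_release(change):
    """Apply the point-release change guidelines to one proposed change."""
    if change.changes_api:
        return False  # no API changes, in kernel or in user libs
    if change.component in ("core", "ulp"):
        # Core and ULPs take critical and high priority bug fixes only.
        return change.priority in ("critical", "high")
    if change.component in ("driver", "backport"):
        # Driver changes are the HW vendor's call; backports add OS support.
        return True
    return False  # unknown component: reject by default


print(allowed_in_point_release(Change("ulp", "low", False)))  # False
```

Rejecting unknown components by default is a design choice of this sketch, not something the slides specify.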
Point Releases (cont’d)
• Release verification:
  • All vendors should run at least a basic QA/verification cycle
  • Full QA by any vendor who changes their low level driver
• Release process:
  • Release manager will publish the release target date 4 weeks prior to the release
  • Patches will be sent against the major release git repositories
  • A release will be built and tested by all companies in the usual way
Next Steps
• Collect a clear statement of the issues and proposed solutions.
• Send these out to the EWG mailing list, as pending decisions from Sonoma Workshop. Deal with any issues which are raised.
• Summarize feedback.
• Execute!