Programming Languages and Systems at Berkeley and Beyond: Past, Present, and Future. Kathy Yelick
The Questions • Programming Languages and Systems (PL&S): • aka Languages: • this is too narrow (some of us don’t do much “language” research) • aka Software: • this is too broad (what doesn’t involve software?) • Who are we? • What do we do?
The Culture of PL&S • The middle management of EECS • Blamed for • slow execution time • buggy software • low programmer productivity • languages that are too big, restrictive, ugly, etc. • Need to have control over • hardware complexity • programmer quality • consumers (features over robustness)
The Big Motivators • Ease of Programming • Hardware costs -> 0 • Software costs -> infinity • Correctness • Increasing reliance on software increases cost of software errors (medical, financial, etc.) • Performance • Increasing machine complexity • New languages and applications • Enabling Java; network packet filters
History of Programming Language Research [Figure: timeline spanning the 70s, 80s, 90s, and 2K. Topics shown: General Purpose Language Design; Domain-Specific Language Design; Parsing Theory; Type Systems Theory; Flop Optimization; Memory Optimizations; Data and Control Analysis; Type-Based Analysis; Garbage Collection; Threads; Program Verification; Program Checking Tools]
Topics • Programming Language and Systems Research • Language Design • Compilers & Tools • Libraries & Runtime Systems • Software Engineering • Berkeley Projects: Current and Future • BANE • Titanium • Proof Carrying Code • Future Emphasis: Reliability
Language Design • Economics of programming languages • Programmer training is the dominant cost • implies languages are rarely replaced • Languages are adopted to fill a void • not because of language quality • Is there anything left for PL designers? • Niche languages: • Everyone does language design, but doing it well is hard • Understanding languages: • E.g., Titanium’s type system is sound, Split-C’s is not • Language design at Berkeley: • Lisp (Fateman), Ada (Hilfinger), Tioga (*), Titanium (*)
Compilers and Tools • Economics of compilers • Large industrial teams build commercial compilers • How can academia compete? • Focus on new algorithms and future problems • Need software infrastructure for experiments • from others (SUIF, gcc) or our own (Titanium, BANE) • Compilers and Runtime Systems at Berkeley • Historical and continuing strength • Code generation, profiling (Graham), software pipelining (Aiken) • Analysis and optimization of parallel code (Yelick) • Automatic (compile-time) memory management (Aiken) • Environments (Graham, Fateman)
Libraries • Open problems in complex platforms/applications • Scientific libraries (overlaps with SciComp group) • Parallel and distributed machines • Economics of Libraries • Market and competition are less intense • Can’t afford to hand-code for each machine • Berkeley strength: • Load balancing (Graham, Yelick, and many others) • Data structures (Yelick), matrices (Demmel, Kahan, Yelick), Meshes (Shewchuk) • High precision (Demmel, Fateman, Kahan, Shewchuk) • Symbolic (Fateman, Kahan) • New: tools to automate library construction
Software Engineering • Economics of Software Engineering • Robust software is expensive • Old approaches: • Formal: Verification, specification • Informal: Software process, patterns • What Berkeley is doing: • Automatic analysis of large programs (Aiken) • Software fault isolation (Graham) • Proof Carrying Code (Necula) • Model checking (Henzinger, Brayton, S-V) • Experience (lots of large software construction projects) • What’s missing? • “Core” Software Engineering
Projects: Titanium • Problem: portable scientific computing • The Approach • Domain-specific language and compiler: • Old applications: astrophysics, combustion • New applications in Bioengineering • modeling the cell to cure cancer (Arkin) • modeling bio-MEMS devices for treatment (Liepmann) • Language design • Dialect of Java with in-house compiler (to C) • Support for fast, safe multidimensional arrays • Types for distributed data, regions • Optimizations • Communication, memory, arrays, synchronization
Projects: BANE • Problem: removing bugs from large programs • The Approach • automatic analysis • discover small facts about big programs • Target: 1,000,000 line systems • Examples: • Find relay races in RLL (relay ladder logic) programs • RLL used in >50% of factories, at Disneyland, etc. • Prove C programs are Y2K ready • CVS 1.10 is OK, CVS 1.9 is not • Detect buffer overruns in security-critical code
Projects: Proof Carrying Code • The Problem: • How can I trust code from another language, person, machine? • The Approach: • programs carry a proof of what they promise • Semantic analog of digital signatures • Properties often from program analysis (e.g., types) • Passed through compilation by validating translations • client’s cheap trusted verifier checks the proof • Applications • Very fast network packet filters • “Native code” in ML that is safe • Mobile code security
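The check-a-proof-cheaply idea can be illustrated with a toy sketch. This is not the PCC system described on the slide: it assumes a made-up stack machine, and the "proof" is just a list of claimed stack depths, one per instruction. The opcode table and the names `successors` and `verify` are hypothetical. The point is only that the trusted side does one linear pass over the claims, with no fixpoint iteration over loops.

```python
# Toy sketch only; the instruction set and all names are hypothetical.
# (pops, pushes) for each opcode of a tiny stack machine.
EFFECTS = {"push": (0, 1), "pop": (1, 0), "add": (2, 1),
           "jz": (1, 0), "jmp": (0, 0), "halt": (0, 0)}


def successors(program, i):
    """Indices that control may reach directly after instruction i."""
    op = program[i]
    if op[0] == "halt":
        return []
    if op[0] == "jmp":
        return [op[1]]
    if op[0] == "jz":
        return [op[1], i + 1]
    return [i + 1]


def verify(program, certificate):
    """Trusted checker: one linear pass over the claimed stack depths.

    Accepts only if the claims are consistent and no reachable
    instruction pops from an empty stack.
    """
    n = len(program)
    if n == 0 or len(certificate) != n or certificate[0] != 0:
        return False
    for i, op in enumerate(program):
        if op[0] not in EFFECTS:
            return False
        if op[0] in ("jz", "jmp") and len(op) != 2:
            return False
        pops, pushes = EFFECTS[op[0]]
        if certificate[i] < pops:  # would underflow the stack
            return False
        depth_after = certificate[i] - pops + pushes
        # Every successor must claim exactly the depth left behind here.
        for s in successors(program, i):
            if not (0 <= s < n) or certificate[s] != depth_after:
                return False
    return True


# A small loop: push, branch-if-zero to halt, otherwise jump back to the start.
loop = [("push",), ("jz", 3), ("jmp", 0), ("halt",)]
honest_certificate = [0, 1, 0, 0]
tampered_certificate = [0, 1, 1, 0]
```

Here `verify(loop, honest_certificate)` accepts, while the tampered certificate and a program that pops an empty stack are both rejected. The producer may use any analysis it likes to find the depths; the consumer only has to trust the short checker.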
Reliable Computing (Future) • Problem: build more reliable systems • Approaches: • Build from reliable components • Better languages for system design (H*) • Better environments for particular domains (F,G) • Build semantic models of system behavior (A,H,N) • Build reliable systems from unreliable components by spending cheap hardware resources (H,K,P,Y) • Introspection of network, disks, processor, software • Use statistical models to determine normal/abnormal • Fault tolerant, self-scrubbing data structures • Redundant computation: catch transient errors
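The last bullet, redundant computation, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a design from the talk: it assumes the computation is deterministic, its results are hashable, and a transient fault corrupts only a minority of runs. The names `run_redundantly` and `make_flaky_sum` are hypothetical.

```python
from collections import Counter


def run_redundantly(compute, *args, copies=3):
    """Run `compute` several times and return the strict-majority result.

    Raises RuntimeError when no result wins more than half of the runs.
    """
    results = [compute(*args) for _ in range(copies)]
    value, votes = Counter(results).most_common(1)[0]
    if votes * 2 <= copies:
        raise RuntimeError("no majority among redundant runs")
    return value


def make_flaky_sum(bad_call):
    """Hypothetical fault injector: call number `bad_call` returns a wrong sum."""
    calls = 0

    def flaky_sum(xs):
        nonlocal calls
        calls += 1
        total = sum(xs)
        return total + 1 if calls == bad_call else total

    return flaky_sum


# The second of three runs is corrupted; the vote still yields the true sum.
voted = run_redundantly(make_flaky_sum(2), (1, 2, 3))
```

The trade is the one the slide names: extra (cheap) cycles are spent so that a single corrupted run is outvoted rather than silently propagated.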
Summary of PL&S at Cal • Good coverage in core language and compiler work • People move with opportunities • Traditional boundaries becoming blurred • Strength in analysis • Semantics with practical applications • Strength in collaborative work • Systems: Culler, Kubiatowicz, Patterson • Scientific computing: inside and outside department • Areas that are not well represented • Core Software Engineering • Logic
Faculty • Alex Aiken • Richard Fateman • Susan Graham • Mike Harrison • Tom Henzinger • Paul Hilfinger • George Necula • Kathy Yelick
Long Term • Language research can be loooong term • e.g., garbage collection [Figure labels: Mobile Ambients; Partial Evaluation; Monads; Continuations; Pi Calculus; Regions; Software Fault Isolation; Type Inference; Set-Based Analysis; Proof Carrying Code]
Executive Summary • Anything related to programming • How do we know it does what we think it does? • A mix of • theory • systems • human factors
Language Design: History • 70s & 80s: • Design better general purpose languages • pure functional, object-oriented, logic… • Lisp (Fateman), Ada (Hilfinger) • 90s & 2Ks: • Domain-specific languages • Tioga (Stonebraker, Hellerstein, Aiken) • Titanium (Graham, Yelick, Hilfinger, Aiken) • Understanding semantics: type soundness, etc. • Titanium’s pointer types are sound (Split-C’s are not) • Good language design is hard • Almost everyone does it
Language Technology without Languages • Increasing connections to other areas of CS • transfer of PL ideas to non-language tools • avoids language adoption problems • foundational ideas are portable • High-performance thread systems • based on CPS conversion • Low-overhead virtual machines • based on software fault isolation • More to come . . .
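To make the thread-system bullet concrete, here is a small sketch of cooperative threads written in continuation-passing style (CPS). It is an illustration only, not the systems the slide refers to: each thread step returns "the rest of the computation" as a thunk, and a round-robin scheduler decides which thunk runs next. The names `count_up`, `run`, and `done` are hypothetical.

```python
from collections import deque


def count_up(name, n, log, k):
    """A CPS-style loop: each step returns its continuation as a thunk."""
    def step(i):
        if i == n:
            return k  # finished: hand control to the final continuation
        log.append((name, i))
        return lambda: step(i + 1)  # the rest of this thread's work
    return lambda: step(0)


def run(threads):
    """Round-robin scheduler: run one step of each ready thread in turn."""
    ready = deque(threads)
    while ready:
        thunk = ready.popleft()
        rest = thunk()
        if rest is not None:  # None means the thread has nothing left to do
            ready.append(rest)


def done():
    """Final continuation: returns None so the scheduler drops the thread."""
    return None


log = []
run([count_up("a", 2, log, done), count_up("b", 2, log, done)])
```

After `run` returns, `log` holds the interleaved steps of the two threads. Because a suspended thread is just a closure, switching threads costs a function call and a queue operation, with no separate stack per thread.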
Interests and Collaborations [Diagram labels: Compilers; Software Engineering; Semantics; Systems; Programming Language Design; Logic]