CSEP505: Programming Languages Lecture 10: OOP; Memory Mgmt; Wrap-Up

Dan Grossman, Winter 2009

Last time
• Key novelty / semantic difference of OOP is dynamic dispatch
  • Defined by self mapping to “whole current object”
  • The method’s “receiver”
• Investigating the “extensibility problem” with canonical example:
  • Abstract class Exp with subclasses IntExp, AddExp, …
  • Exp has methods for interp, typecheck, toInt, …
CSE P505 Winter 2009 Dan Grossman

The Grid
[Figure: grid of classes against functions; surviving labels: “1 new class”, “1 new function”]

Back to MultExp
• Even in OOP, MultExp is easy to add, but you’ll copy the typecheck method of AddExp
• Or maybe AddExp extends MultExp, but it’s a kludge
• Or maybe refactor into BinaryExp with subclasses AddExp and MultExp
  • So much for not changing existing code
  • Fairly heavyweight approach to a helper function

Remaining OO plan
• Meaning of type-safety for OO
• Why are subtyping and subclassing worth keeping as separate concepts?
• Multiple inheritance; multiple interfaces
• Static overloading
• Multimethods
• Revenge of bounded polymorphism

Typechecking
We were sloppy: talked about types without “what are we preventing”
• In pure OO, stuck if we need to interpret v.m(v1,…,vn) and v has no m method (taking n args)
  • “No such method” error
• Also if ambiguous: multiple methods with same name and there is no “best choice”
  • “No best match” error
  • Will arise with static overloading and multimethods

Subtyping vs. subclassing
• Often convenient confusion: C is a subtype of D if and only if C is a subclass of D
• But self is covariant; this is the key type-system difference
• But more subtypes are sound
  • If A has every field and method that B has (at appropriate types), then A can be subsumed to B
  • Interfaces help, but require explicit annotation
• And fewer subtypes could allow more code reuse…

Non-subtyping example
Pt2 ≤ Pt1 is unsound here:
  class Pt1 extends Object {
    int x;
    int get_x() { x }
    bool compare(Pt1 p) { p.get_x() == self.get_x() }
  }
  class Pt2 extends Pt1 {
    int y;
    int get_y() { y }
    bool compare(Pt2 p) { // override
      p.get_x() == self.get_x() && p.get_y() == self.get_y()
    }
  }
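The unsoundness can be observed at run time. The following is a minimal Python sketch rather than the slide's own language; `same_point` and `demo` are hypothetical names added for illustration. Python has no static types, so the "stuck" state shows up as an AttributeError, which is the "no such method" error a sound type system is meant to rule out.

```python
class Pt1:
    def __init__(self, x):
        self.x = x

    def get_x(self):
        return self.x

    def compare(self, p):
        # Written against the Pt1 interface: only get_x is used.
        return p.get_x() == self.get_x()


class Pt2(Pt1):
    def __init__(self, x, y):
        super().__init__(x)
        self.y = y

    def get_y(self):
        return self.y

    def compare(self, p):
        # Override that assumes p is a Pt2: it also calls get_y.
        return p.get_x() == self.get_x() and p.get_y() == self.get_y()


def same_point(a, b):
    # Client code that only assumes both arguments are Pt1s.
    return a.compare(b)


def demo():
    try:
        same_point(Pt2(1, 2), Pt1(1))   # uses a Pt2 where a Pt1 is expected
    except AttributeError as err:
        return f"stuck: {err}"
    return "ok"
```

Calling `same_point(Pt1(1), Pt2(1, 5))` is fine, because Pt1's compare only needs get_x; the failure needs a Pt2 receiver and a plain Pt1 argument.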

What happened
• Could inherit code without being a subtype
• Cannot always do this
  • What if get_x called self.compare with a Pt1?
Possible solutions:
• Re-typecheck get_x in subclass
• Use a really fancy type system
• Don’t override compare
Moral: Not suggesting “subclassing not subtyping” is useful, but the concepts of inheritance and subtyping are orthogonal

Multiple inheritance
Why not allow C extends C1,…,Cn {…}
• and C≤C1, …, C≤Cn
What everyone agrees on: C++ has it, Java doesn’t
We’ll just consider some problems it introduces and how (multiple) interfaces avoid some of them
Problem sources:
• Class hierarchy is a dag, not a tree
• Type hierarchy is a dag, not a tree

Diamonds
• If C extends C1, C2 and C1, C2 have a common (transitive) superclass D, we have a diamond
  • Always have one with multiple inheritance and a topmost class (Object)
• If D has a field f, does C have one field or two?
  • C++ answer: yes (one with virtual inheritance, two without)
• If D has a method m, C1 and C2 will have a clash
  • Also possible without a diamond
• If subsumption is coercive (changing method-lookup), then how we subsume from C to D affects run-time behavior (incoherent)
Diamonds are common, largely due to types like Object with methods like equals

Method-name clash
What if C extends C1, C2 which both define m?
Possibilities:
• Reject declaration of C
  • Too restrictive with diamonds
• Require C overrides m
  • Possibly with directed resends
• “Left-side” (C1) wins
  • Question: does cast to C2 change what m means?
• C gets both methods (implies incoherent subtyping)
• Other?
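Python is one language that takes the “left-side wins” option: it linearizes the class dag (the method resolution order) and uses the first definition found. The sketch below shows that choice and the “override with a directed resend” alternative; the class names follow the slide, while `COverride` and `linearization` are hypothetical helpers. Python has no static casts, so the slide's question about casting to C2 does not arise here.

```python
class D:
    def m(self):
        return "D.m"


class C1(D):
    def m(self):
        return "C1.m"


class C2(D):
    def m(self):
        return "C2.m"


class C(C1, C2):          # diamond: both parents define m
    pass


class COverride(C1, C2):
    def m(self):
        # C overrides m itself and resends explicitly to C2's version.
        return "COverride via " + C2.m(self)


def linearization(cls):
    # The order in which Python searches classes for a method.
    return [k.__name__ for k in cls.__mro__]
```

Here `C().m()` picks C1's method because C1 is listed first, and `linearization(C)` shows that D appears only once, after both C1 and C2.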

Implementation issues
• Multiple-inheritance semantics often muddied by wanting “efficient member lookup”
  • If “efficient” is compile-time offset from self pointer, then multiple inheritance means subsumption must “bump the pointer”
  • Roughly why C++ has different sorts of casts
• Preaching: Better to think
  • semantically first: how should subsumption affect the behavior of method-lookup
  • implementationally second: what can I optimize based on the class/type hierarchy

Digression: casts
A “cast” can mean too many different things (cf. C++):
Language level:
• Upcast: no run-time effect
• Downcast: failure or no run-time effect
• Conversion: key question is round-tripping
• “Reinterpret bits”: not well-defined
Implementation level:
• Upcast: usually no run-time effect, but see multiple inheritance
• Downcast: check the tag, maybe fail, but see multiple inheritance
• Conversion: same as at language level
• “Reinterpret bits”: no effect (by definition)

Least supertypes
[Related to homework 4 challenge problem]
For e1 ? e2 : e3
• e2 and e3 need the same type
• But that just means a common supertype
• But which one? (The least one)
  • But multiple inheritance means it may not exist!
Common solution:
• Reject without explicit cast on e2 and/or e3
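To see why a least supertype may not exist, here is a small Python sketch over a made-up hierarchy (the names `SUPERS`, `I1`, `I2`, `A`, `B` are assumptions for illustration, not from the slides). It computes the minimal common supertypes of two types; a least supertype exists only when that set has exactly one element.

```python
# Hypothetical hierarchy: each type maps to its direct supertypes.
SUPERS = {
    "Object": [],
    "I1": ["Object"],
    "I2": ["Object"],
    "A": ["I1", "I2"],
    "B": ["I1", "I2"],
}


def ancestors(t):
    """All supertypes of t, including t itself (reflexive, transitive)."""
    seen = {t}
    work = [t]
    while work:
        for s in SUPERS[work.pop()]:
            if s not in seen:
                seen.add(s)
                work.append(s)
    return seen


def minimal_common_supertypes(t1, t2):
    common = ancestors(t1) & ancestors(t2)
    # Keep c only if no other common supertype is a strict subtype of c.
    return sorted(c for c in common
                  if not any(d != c and c in ancestors(d) for d in common))
```

With this hierarchy, A and B share I1, I2, and Object, and neither I1 nor I2 is below the other, so a conditional with branches of types A and B has no single best type without an explicit cast.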

Multiple inheritance summary
(1) Diamond issues (coherence issues, shared (?) fields)
(2) Method clashes (what does inheriting m mean)
(3) Implementation issues (slower method lookup)
(4) Least supertypes (may not exist)
Multiple interfaces have issues (3) and (4)
• Again, an interface is just a named type
• Provides no implementation (method or field definition)

Static overloading
• So far: Assume every method name unique
  • Same name in subclass meant override
• Many OO languages allow same name, different argument types:
    A f(B b) {…}
    C f(D d, E e) {…}
    F f(G g, H h) {…}
• Changes method-lookup definition for e.m(e1,…,en)
  • Old: method-lookup a (meta)function of the class of the object e evaluates to (at run-time)
  • New: method-lookup a (meta)function of the class of the object e evaluates to (at run-time) and the types of e1,…,en (at compile-time)

Ambiguity
Because of subtyping, multiple methods can match!
“Best match” rules are complicated. One rough idea:
• Fewer subsumptions is better match
• If tied, subsume to immediate supertypes & recur
Ambiguities remain (no best match):
• A f(B) or C f(B) (usually disallowed)
• A f(B) or A f(C) and f(e) where e has a subtype of B and C but B and C are incomparable
• A f(B,C) or A f(C,B) and f(e1,e2) where e1 and e2 have type B and B≤C

Multimethods
Static overloading mostly saves keystrokes
• Shorter method names
• Name-mangling on par with syntactic sugar
• But sometimes can comment out a method and program still type-checks with different run-time behavior due to different compile-time method resolution
Multiple (dynamic) dispatch (a.k.a. multimethods) much more interesting:
Method lookup for e.m(e1,…,en) is a (meta)function of the classes of the objects e and e1,…,en evaluate to (at run-time)
A natural generalization: “receiver” no longer special
So may as well write m(e1,…,en) instead of e1.m(e2,…,en)

Multimethods example
  class A { int f; }
  class B extends A { int g; }
  bool compare(A x, A y) { x.f==y.f }
  bool compare(B x, B y) { x.f==y.f && x.g==y.g }
  bool f(A x, A y, A z) { compare(x,y) && compare(y,z) }
• compare(x,y) calls first version unless both arguments are Bs
  • Could add “one of each” methods if you want different behavior
• f has fairly surprising behavior
  • But still more useful than with static overloading?
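Python has no built-in multimethods, so the following is a hand-rolled sketch of multiple dispatch for the example above. The `dispatch` function, the `COMPARE` table, and the `AmbiguousCall` exception are hypothetical names. Here, "best match" means a signature that is pointwise a subclass of every other applicable signature, which is one simple rule among several a real language might use.

```python
class A:
    def __init__(self, f):
        self.f = f


class B(A):
    def __init__(self, f, g):
        super().__init__(f)
        self.g = g


class AmbiguousCall(Exception):
    pass


def dispatch(methods, *args):
    """Run the most specific (signature, body) for the run-time classes of args."""
    applicable = [(sig, body) for sig, body in methods
                  if len(sig) == len(args)
                  and all(isinstance(a, p) for a, p in zip(args, sig))]
    if not applicable:
        raise TypeError("no such method")
    for sig, body in applicable:
        # sig is best if it is pointwise below every applicable signature.
        if all(all(issubclass(p, q) for p, q in zip(sig, other))
               for other, _ in applicable):
            return body(*args)
    raise AmbiguousCall("no best match")


COMPARE = [
    ((A, A), lambda x, y: x.f == y.f),
    ((B, B), lambda x, y: x.f == y.f and x.g == y.g),
]


def compare(x, y):
    return dispatch(COMPARE, x, y)


def f(x, y, z):
    return compare(x, y) and compare(y, z)
```

The surprising behavior of f is visible with `f(B(1, 2), A(1), B(1, 3))`: both inner calls fall back to the (A, A) version, so the two Bs are never compared on g. A table containing only (A, B) and (B, A) signatures would make a call on two Bs raise the run-time “no best match” error.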

Pragmatics; UW
Not clear where multimethods should be defined
• No longer “belong to a class” because receiver not special
Multimethods are “more OO” because dynamic-dispatch is the essence of OO
Multimethods are “less OO” because without distinguished receiver the “analogy to physical objects” is reduced
A couple papers:
• Millstein got a UW PhD around multimethods for Java
• UW a long-time multimethods leader
• Nice summary and “where really used”: Noble, OOPSLA08

Revenge of ambiguity
• Like static overloading, multimethods have “no best match” problems
• Unlike static overloading, the problem does not arise until run-time!
Possible solutions:
• Run-time exception
• Always define a best-match (e.g., Dylan)
• A conservative type system

Still want generics
OO subtyping no replacement for parametric polymorphism
So have both
Example:
  /* 3 type constructors (e.g., Int Set a type) */
  interface ’a Comparable { Int f(’a,’a); }
  interface ’a Predicate { Bool f(’a); }
  class ’a Set {
    …
    constructor(’a Comparable x) {…}
    unit add(’a x) {…}
    ’a Set functional_add(’a x) {…}
    ’a find(’a Predicate x) {…}
  }

Worse ambiguity
“Interesting” interaction with overloading or multimethods
  class B {
    Int f(Int C x) {1}
    Int f(String C x) {2}
    Int g(’a x) { self.f(x) }
  }
Whether match is found depends on instantiation of ’a
Cannot resolve static overloading at compile-time without code duplication
At run-time, need run-time type information
• Including instantiation of type constructors
• Or restrict overloading enough to avoid it

Wanting bounds
As expected, with subtyping and generics, want bounded polymorphism
Example:
  interface Printable { unit print(); }
  class (’a ≤ Printable) Logger {
    ’a item;
    ’a get() { item.print(); item }
  }
w/o polymorphism, get would return a Printable (not useful)
w/o the bound, get could not send print to item
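The same shape can be written with Python's `typing` module, where `TypeVar(..., bound=...)` plays the role of the bound on ’a. This is a sketch under assumptions: `Invoice` and the `log` list are hypothetical, and `print` returns a string here (instead of unit) so the effect is observable. Python does not enforce the bound at run time; a static checker such as mypy is what would use it.

```python
from typing import Generic, Protocol, TypeVar


class Printable(Protocol):
    def print(self) -> str: ...


T = TypeVar("T", bound=Printable)   # plays the role of ('a ≤ Printable)


class Logger(Generic[T]):
    def __init__(self, item: T) -> None:
        self.item = item
        self.log = []

    def get(self) -> T:
        # The bound is what lets a static checker accept item.print().
        self.log.append(self.item.print())
        return self.item            # still a T, not merely a Printable


class Invoice:
    def __init__(self, total: int) -> None:
        self.total = total

    def print(self) -> str:
        return f"invoice {self.total}"
```

Because get returns T, a caller holding a `Logger[Invoice]` can still read `total` from the result, which would be lost if get were typed as returning Printable.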

Fancy example
With forethought, can use bounds to avoid some subtyping limitations
(Example lifted from Abadi/Cardelli text; I would have never thought of this)
  /* Herbivore1 ≤ Omnivore1 unsound */
  interface Omnivore1 { unit eat(Food); }
  interface Herbivore1 { unit eat(Veg); }
  /* T Herbivore2 ≤ T Omnivore2 sound for any T */
  interface (’a≤Food) Omnivore2 { unit eat(’a); }
  interface (’a≤Veg) Herbivore2 { unit eat(’a); }
  /* subtyping lets us pass herbivores to feed but only if food is a Veg */
  unit feed(’a food, ’a Omnivore2 animal) { animal.eat(food); }

You have grading to do…
I am going to distribute course evaluation forms so you may rate the quality of this course. Your participation is voluntary, and you may omit specific items if you wish. To ensure confidentiality, do not write your name on the forms. There is a possibility your handwriting on the yellow written comment sheet will be recognizable; however, I will not see the results of this evaluation until after the quarter is over and you have received your grades. Please be sure to use a No. 2 PENCIL ONLY on the scannable form.
I have chosen _______ to distribute and collect the forms. When you are finished, he/she will collect the forms, put them into an envelope and mail them to the Office of Educational Assessment. If there are no questions, I will leave the room and not return until all the questionnaires have been finished and collected. Thank you for your participation.

From the beginning
Problem:
• Why do we need memory management?
  • Same reason for any finite reusable resource
• What does safety mean? (What is guaranteed?)
• What is drag?
Solutions:
• How does tracing garbage collection (GC) work?
• What other ways for safe memory management?
  • Unique pointers
  • (Automatic) reference-counting
  • Regions

Why reuse?
• Values/objects/code take up space
• Using too much space slows down programs
  • Eventually they stop (memory exhaustion)
• Optimal space: reclaim immediately after last use
  • Earlier is incorrect (dangling-pointer dereference)
  • Drag is time between last use and reclamation
• But:
  • Last-use undecidable
  • Batched reclamation can trade space for time

The view from C/C++
• Stack objects reclaimed at end of block/function
• Heap objects reclaimed with call to free/delete
• Drag can still exist
• Dangling pointers fine; dereferencing them unsafe
• “Double-free” also unsafe
• Unreclaimed objects that become unreachable will:
  • Never be used
  • Never be reclaimed
  • So drag until termination (“space leak”)

Reachability
Reachability soundly approximates “may be used again”
Inductive definition (transitive “points to”):
• Global variables reachable
• Unreclaimed stack objects reachable
  • Liveness analysis can do a bit better
• Objects pointed to by reachable objects are reachable
C: Avoid leaks by freeing before unreachable
Garbage-collected language: Make things unreachable
Reachability is an approximation that works well in practice

Reachability and leaks
• GC’d languages reclaim unreachable objects
  • So by some definitions “leaks are impossible”
  • Like by some definitions deadlock with atomic is impossible
• But “infinite drag times” are possible
  • Example: large unused data structure in a global
• Programming for space in GC’d languages
  • Usually ignore the issue
  • Set pointers to null when done with them
    • Error-prone!
  • Use weak pointers where appropriate
    • Provided as a language feature, dereference can fail
• Compiler-writer should also consider if optimizations are “safe for space”
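As one concrete instance of weak pointers, Python's standard `weakref` module provides references that do not keep their target reachable and whose dereference can fail by returning None. The `Cache` class and `demo` function below are hypothetical. The prompt reclamation shown relies on CPython's reference counting; the `gc.collect()` call is there for other implementations.

```python
import gc
import weakref


class Cache:
    """Hypothetical large structure we do not want a reference to keep alive."""
    def __init__(self, payload):
        self.payload = payload


def demo():
    strong = Cache([0] * 1000)
    weak = weakref.ref(strong)     # does not keep the object reachable
    before = weak() is strong      # dereference succeeds while a strong ref exists
    strong = None                  # drop the only strong reference
    gc.collect()                   # not needed in CPython, harmless elsewhere
    after = weak()                 # dereference now fails: returns None
    return before, after
```

Code that uses the weak reference has to check the result of `weak()` on every dereference, which is the sense in which the slide says the dereference can fail.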

Where are we
Problem:
• Why do we need memory management?
  • Same reason for any finite reusable resource
• What does safety mean?
• What is drag?
Solutions:
• How does garbage collection (GC) work?
• What other ways for safe memory management?
  • Unique pointers
  • (Automatic) reference-counting
  • Regions

Reachability, cont’d
Algorithm sketch to find all reachable objects:
• Start at roots (globals and stack objects)
• Follow all pointers, but do not go around cycles
Problems:
• Find all pointers in pointed-to object
  • How big is the object?
  • What fields are integers?
• Avoid cycles (solution depends on GC technique)
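The algorithm sketch can be written as a worklist traversal. In this Python sketch the heap is modeled as a dictionary from object ids to the ids they point to, which sidesteps the “find all pointers” problem entirely; `HEAP` and its contents are made up for illustration.

```python
def reachable(roots, heap):
    """heap maps an object id to the list of object ids it points to."""
    marked = set()
    work = list(roots)
    while work:
        obj = work.pop()
        if obj in marked:        # already visited: this is what stops cycles
            continue
        marked.add(obj)
        work.extend(heap[obj])
    return marked


HEAP = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a"],      # cycle a -> b -> c -> a
    "d": ["a"],      # points into live data, but nothing points to it
    "e": ["e"],      # unreachable self-cycle
}
```

With "a" as the only root, the result is a, b, and c. Note that d is garbage even though it points to live data, and e is garbage even though something (itself) points to it.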

Finding sizes
Garbage collector must know an object’s size
• free/delete need to know too!
Solutions:
• A header word (e.g., before object) with the size
  • Class pointer can “serve double-duty”
• Size segregation and a global table of “page to size”
Bottom line:
• Allocator and/or compiler must collaborate with GC

Finding pointers
Does the GC know which fields/roots are pointers?
• Yes: accurate GC
• No: conservative GC
Theory: With conservative GC, “one unlucky int” could keep huge amount of data
Practice: Conservative GC tends to work
Accurate GC techniques:
• Class-pointer can “serve triple-duty”
• Low-order bit tricks (e.g., Caml ints are 31 bits)
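The low-order bit trick can be sketched as follows, assuming 32-bit words and word-aligned pointers (so every real address has low bit 0). The function names are hypothetical and the encoding is the usual “shift left and set the low bit” scheme, which is why only 31 bits remain for the integer.

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1


def tag_int(n):
    """Encode a 31-bit signed integer as a word whose low bit is 1."""
    assert -(1 << 30) <= n < (1 << 30)
    return ((n << 1) | 1) & MASK


def untag_int(word):
    value = word >> 1
    # Sign-extend from 31 bits.
    return value - (1 << 31) if value >= (1 << 30) else value


def is_pointer(word):
    """Aligned addresses have low bit 0, so they never look like tagged ints."""
    return (word & 1) == 0
```

A collector scanning a field only needs `is_pointer` to decide whether to follow it, with no per-class layout information.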

Conservative GC for C
Yes, you can (conservatively) GC a C program
• The Boehm-Demers-Weiser conservative collector
2 of many interesting details:
• Use collector’s malloc (so GC knows the size)
• Possible b/c C bans code most people think is legal:
    void f() {
      int * p = malloc(100*sizeof(int));
      int * q = p + 1000; // not allowed
      q[-950] = 17;
      int * r = p + 100;  /* allowed */
      r[-50] = 17;
    }
Compile-time flag to “add a byte or keep 2 objects”

Semispace copying collection
• Divide memory into 2 equal-size contiguous pieces
• Allocate objects into one space until full
  • Easy and fast: “bump an allocation-pointer”
• Now have a full from-space & an empty to-space
• Copy reachable objects into end of to-space
• Set allocation-pointer just past them in to-space
• Restart the program (semispaces have reversed roles)

Wait a minute
Skimmed over key details
• We moved objects; must update all pointers to them
• Must avoid cycles
• The GC can run without much extra space (good)
How:
• “Cheney queue” just two pointers in to-space
  • Objects to scan (update pointers and maybe add pointed-to objects to queue)
• Cycle avoidance: forwarding-pointers in from-space
• Easy to tell what space is pointed-to
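A small simulation of the copying step with a Cheney queue, in Python. The heap model is an assumption made for illustration: a space is a list of objects, an object is a list of fields, and a pointer field is a tuple `("ptr", i)` naming index i in from-space. The `forward` dictionary stands in for forwarding pointers that a real collector would write into from-space itself.

```python
def collect(from_space, roots):
    """Copy everything reachable from roots into a fresh to-space.

    Returns (to_space, new_roots), with all pointers updated.
    """
    to_space = []
    forward = {}                     # from-space index -> to-space index

    def copy(i):
        if i not in forward:         # first visit: move the object
            forward[i] = len(to_space)
            to_space.append(list(from_space[i]))
        return forward[i]            # later visits (and cycles) reuse the new address

    new_roots = [copy(r) for r in roots]
    scan = 0                         # queue = objects between scan and len(to_space)
    while scan < len(to_space):
        obj = to_space[scan]
        for k, field in enumerate(obj):
            if isinstance(field, tuple):              # pointer: copy target, update field
                obj[k] = ("ptr", copy(field[1]))
        scan += 1
    return to_space, new_roots
```

The queue needs no storage of its own: `scan` and the end of `to_space` are the “two pointers in to-space”, and copied-but-unscanned objects sit between them.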

Mark-sweep collection
• Allocate objects until you have almost no room left
• Mark all reachable objects (bit in header word)
  • Avoid cycles by checking bit
• Sweep through memory
  • If object unmarked, reclaim it
  • If object marked, unmark it
No 2x space and no moving objects, but…
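The two phases can be sketched in Python over a toy heap. The representation is assumed for illustration: a dictionary from object ids to a record with a `marked` flag (standing in for the header bit) and a list of pointed-to ids. The explicit `stack` is the extra space the mark phase needs.

```python
def mark_sweep(heap, roots):
    """heap: dict id -> {"marked": bool, "fields": [ids]}. Returns the freed ids."""
    stack = list(roots)              # explicit mark stack
    while stack:
        obj = heap[stack.pop()]
        if obj["marked"]:            # bit already set: stops cycles
            continue
        obj["marked"] = True
        stack.extend(obj["fields"])

    freed = []
    for ident in list(heap):         # sweep visits every object, live or not
        if heap[ident]["marked"]:
            heap[ident]["marked"] = False   # reset for the next collection
        else:
            del heap[ident]                 # reclaim in place; nothing moves
            freed.append(ident)
    return sorted(freed)
```

Unlike the copying collector, the sweep's cost is proportional to the whole heap rather than to the live data, and survivors keep their addresses.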

Wait another minute
• In practice, if more than 2/3 of objects or so are reachable, you spend lots of time in GC
• Allocation is complicated
  • Must find enough space for the new object
  • Fragmentation can hurt performance
  • Or exhaust memory before copying GC does
• No “Cheney” queue, so GC needs an explicit stack or low-level cleverness to run in little space

Generational
Copying and mark-sweep from about 1960
Generational GC a key mid-80s optimization because
• Most objects die young
• Most old objects never get mutated to point to young
How:
• Allocate in a nursery
  • Empty nursery has no pointers into it!
  • Fill nursery like in copying collection
• Also track mutations to record pointers into nursery
  • Yet another reason to avoid mutation (slower)
• To collect nursery, ignore rest of heap except recorded pointers
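A toy model of the nursery plus recorded pointers, in Python. Everything here is an assumed simplification: objects are ids mapped to lists of pointed-to ids, `write` is a hypothetical write barrier that all pointer stores go through, and survivors are promoted by moving dictionary entries rather than copying memory.

```python
class Heap:
    def __init__(self):
        self.nursery = {}         # id -> list of pointed-to ids
        self.old = {}
        self.remembered = set()   # old objects mutated to point into the nursery

    def write(self, src, targets):
        """Write barrier: allocate or mutate src, recording old-to-young pointers."""
        space = self.old if src in self.old else self.nursery
        space[src] = list(targets)
        if src in self.old and any(t in self.nursery for t in targets):
            self.remembered.add(src)

    def collect_nursery(self, roots):
        """Promote nursery survivors without traversing the old generation."""
        work = [r for r in roots if r in self.nursery]
        for src in self.remembered:
            work += [t for t in self.old[src] if t in self.nursery]
        live = set()
        while work:
            obj = work.pop()
            if obj not in live:
                live.add(obj)
                work += [t for t in self.nursery[obj] if t in self.nursery]
        for obj in live:
            self.old[obj] = self.nursery[obj]
        self.nursery.clear()
        self.remembered.clear()
        return live
```

The extra bookkeeping in `write` is the cost of mutation the slide mentions: every pointer store pays for the check so that a nursery collection can skip the old generation.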

Some more terms
Just sketched the basics of copying and mark-sweep
And the orthogonal issue of generations
Some other terms worth knowing:
• Incremental GC: do a little bit on each allocation
  • Avoid large pause times
• Concurrent GC (collector thread in parallel with the program)
• Parallel GC (multiple collector threads)
• Lots of other important tricks:
  • lazy-sweeping, large-object spaces, …

GC Summary
Great survey paper: Paul R. Wilson. Uniprocessor Garbage Collection Techniques. International Workshop on Memory Management, 1992
• Programmer must know about reachability, that objects may move, that mutation may cost, etc.
• GC implementor must try to do well without knowing the application’s memory behavior
  • But done by memory-system experts!
  • One-size-fits-most

Where are we
Problem:
• Why do we need memory management?
  • Same reason for any finite reusable resource
• What does safety mean?
• What is drag?
Solutions:
• How does garbage collection (GC) work?
• What other ways for safe memory management?
  • Unique pointers
  • (Automatic) reference-counting
  • Regions

Now forget GC
Idioms that avoid dangling-pointer dereferences
• And languages and/or types to enforce them!
• A language can have more than one
• More work than GC, but safer than unchecked malloc/free
Worth knowing just for the idioms

Unique pointers
• If p is the only pointer to o, then free(p) can’t lead to dangling-pointer dereferences provided *p is not used afterwards
• Unique pointers allow only trees (no dags or cycles)
• Maintaining uniqueness invariant
  • Dynamic: destructive reads
    • p=q and free(q) set q to null
  • Static: linear type systems and/or flow analysis
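The dynamic (destructive-read) discipline can be sketched with a small wrapper class in Python. `UniquePtr`, `take`, `deref`, and `free` are hypothetical names; the sketch models only the nulling behavior, since Python itself is garbage-collected, and nothing stops a caller from wrapping the same object twice.

```python
class UniquePtr:
    """A nullable owning reference; reads that transfer ownership null the source."""
    def __init__(self, obj=None):
        self._obj = obj

    def take(self):
        """Destructive read (p = q): hand over the object and null out this pointer."""
        obj, self._obj = self._obj, None
        return UniquePtr(obj)

    def deref(self):
        if self._obj is None:
            raise ValueError("null unique pointer")
        return self._obj

    def free(self):
        """Release the object; the pointer is null afterwards."""
        self._obj = None
```

After `p = q.take()`, any later `q.deref()` fails with a checked error instead of touching an object that p may already have freed, which is the safety property the dynamic approach buys.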
