Abstract Data Type

C and Data Structures. Baojian Hua (bjhua@ustc.edu.cn)

Data Types • A data type consists of: • A collection of data elements (a type) • A set of operations on these data elements • Data types in languages: • predefined: • any language defines a group of predefined data types • C e.g.: int, char, float, double, … • user-defined: • allow programmers to define their own (new) data types • C e.g.: struct, union, …

Data Type Examples • Predefined: • type: int • elements: …, -2, -1, 0, 1, 2, … • operations: +, -, *, /, %, … • User-defined: • type: complex • elements: 1+3i, -5+8i, … • operations: new, add, sub, distance, …

Concrete Data Types (CDT) • A concrete data type: • both the concrete representation and its operations are available • Almost all C predefined types are CDTs • For instance, “int” is a 32-bit double-word on many platforms, with operations +, -, … • Knowing the representation makes dirty hacks possible • See demo…
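The demo mentioned above is not part of these notes. As a stand-in, here is a minimal sketch of two hacks that only work because the representation of “int” is visible. The function names Int_is_little_endian and Int_is_odd are hypothetical, and the second one assumes a two's-complement “int”.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if the least significant byte of an
   int is stored at the lowest address, 0 otherwise. It peeks at the
   raw bytes of the object, which only a concrete type allows. */
int Int_is_little_endian (void)
{
  int one = 1;
  assert (sizeof (int) > 1);   /* the test is meaningless for 1-byte ints */
  return *(unsigned char *) &one == 1;
}

/* Hypothetical helper: tests the lowest bit instead of using n % 2.
   For negative n this assumes a two's-complement representation. */
int Int_is_odd (int n)
{
  return (n & 1) != 0;
}
```

Both functions depend on how “int” is laid out in memory, which is the kind of knowledge an ADT deliberately withholds from the client.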

Abstract Data Types (ADT) • An abstract data type: • separates data type declaration from representation • separates function declaration (prototypes) from implementation (definitions) • A language must provide some mechanism to support ADTs • interfaces in Java • signatures in ML • (roughly) header files & typedef in C

Case Study • Suppose we want to design a new data type to represent a complex number c: • a data type “complex” • elements: 3+4i, -5-8i, … • operations: • new, add, sub, distance, … • How to represent this data type in C (CDT, ADT or …)?

Complex Number
// Recall the definition of a complex number c:
//   c = x + yi, where x, y \in R, and i = sqrt(-1);
// Some typical operations:
complex Complex_new (double x, double y);
complex Complex_add (complex c1, complex c2);
complex Complex_sub (complex c1, complex c2);
complex Complex_mult (complex c1, complex c2);
complex Complex_divide (complex c1, complex c2);
// Next, we discuss two variants of the representation:
// CDT and ADT.

CDT of Complex: Interface (Types)
// In file "complex.h":
#ifndef COMPLEX_H
#define COMPLEX_H
struct Complex_t
{
  double x;
  double y;
};
typedef struct Complex_t Complex_t;
Complex_t Complex_new (double x, double y);
// other function prototypes are similar …
#endif

Client Code
// With this interface, we can write client code
// that manipulates complex numbers. File "main.c":
#include "complex.h"
int main ()
{
  Complex_t c1, c2, c3;
  c1 = Complex_new (3.0, 4.0);
  c2 = Complex_new (7.0, 6.0);
  c3 = Complex_add (c1, c2);
  Complex_output (c3);
  return 0;
}
Do we know c1, c2, c3’s concrete representation? How?

CDT Complex: Implementation
// In a file "complex.c":
#include "complex.h"
Complex_t Complex_new (double x, double y)
{
  Complex_t c = {.x = x, .y = y};
  return c;
}
// other functions are similar. See Lab1

Problem #1
int main ()
{
  Complex_t c;
  c = Complex_new (3.0, 4.0);
  // Want to do this: c = c + (5+6i);
  // Oops, this is legal and bypasses the interface:
  c.x += 5;
  c.y += 6;
  return 0;
}

Problem #2
#ifndef COMPLEX_H
#define COMPLEX_H
struct Complex_t
{
  // Change to a fancier representation? This breaks “main”…
  double a[2];
};
typedef struct Complex_t Complex_t;
Complex_t Complex_new (double x, double y);
// other function prototypes are similar …
#endif

Problems with CDT? • Operations are transparent (hidden from the client). • user code has no idea of the algorithm • Good! • Data representation dependence • Problem #1: client code can access the data directly • bypassing the interface • safe? • Problem #2: makes code rigid • easy to change or evolve?

ADT of Complex: Interface (Types)
// In file "complex.h":
#ifndef COMPLEX_H
#define COMPLEX_H
// note that the body of “struct Complex_t” is not given
typedef struct Complex_t *Complex_t;
Complex_t Complex_new (double x, double y);
// other function prototypes are similar …
#endif

Client Code
// With this interface, we can write client code
// that manipulates complex numbers. File "main.c":
#include "complex.h"
int main ()
{
  Complex_t c1, c2, c3;
  c1 = Complex_new (3.0, 4.0);
  c2 = Complex_new (7.0, 6.0);
  c3 = Complex_add (c1, c2);
  Complex_output (c3);
  return 0;
}
Can we still know c1, c2, c3’s concrete representation? Why?

ADT Complex: Implementation #1 (Types)
// In a file "complex.c":
#include "complex.h"
// We may choose to define the complex type as:
struct Complex_t
{
  double x;
  double y;
};
// which is hidden in the implementation.

ADT Complex: Implementation Continued
// In a file "complex.c":
#include <stdlib.h>   // for malloc
#include "complex.h"
Complex_t Complex_new (double x, double y)
{
  Complex_t c;
  c = malloc (sizeof (*c));
  c->x = x;
  c->y = y;
  return c;
}
// other functions are similar. See Lab1

ADT Summary • Yes, that’s ADT! • Algorithm is hidden • Data representation is hidden • client code can NOT access it • thus, client code independent of the impl’ • Interface and implementation • Do Lab1
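To make the summary concrete, here is a minimal single-file sketch of the ADT version of “complex”. It follows the declarations on the earlier slides; the accessors Complex_real and Complex_imag are hypothetical additions, included only so that a client can observe a result without touching the fields. In a real project the typedef and prototypes would live in "complex.h" and the struct body in "complex.c".

```c
#include <assert.h>
#include <stdlib.h>

/* What the client sees: an opaque pointer type. */
typedef struct Complex_t *Complex_t;

/* What only the implementation sees. */
struct Complex_t
{
  double x;
  double y;
};

Complex_t Complex_new (double x, double y)
{
  Complex_t c = malloc (sizeof (*c));
  assert (c != NULL);          /* sketch only: abort on allocation failure */
  c->x = x;
  c->y = y;
  return c;
}

Complex_t Complex_add (Complex_t c1, Complex_t c2)
{
  return Complex_new (c1->x + c2->x, c1->y + c2->y);
}

/* Hypothetical accessors, not part of the original interface. */
double Complex_real (Complex_t c) { return c->x; }
double Complex_imag (Complex_t c) { return c->y; }
```

A client that includes only the header cannot write c->x, because the compiler has never seen the struct body there. Switching the representation to double a[2] would then require changes in "complex.c" alone.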

Polymorphism • To explain polymorphism, we start with a new data type “tuple” • A tuple is of the form: (x, y) • x ∈ A, y ∈ B (aka: A*B) • A, B may be unknown in advance and may be different • E.g.: • A=int, B=int: • (2, 3), (4, 6), (9, 7), … • A=char *, B=double: • (“Bob”, 145.8), (“Alice”, 90.5), …

Polymorphism • From the data type point of view, two types: • A, B • operations: • new (x, y); // create a new tuple with x and y • equals (t1, t2); // equality testing • first (t); // get the first element of t • second (t); // get the second element of t • … • How to represent this type in computers (using C)?

Monomorphic Version • We start by studying a monomorphic tuple type called “intTuple”: • both the first and second components are of “int” type • (2, 3), (8, 9), … • The intTuple ADT: • type: intTuple • elements: (2, 3), (8, 9), … • Operations: • tuple new (int x, int y); • int first (tuple t); • int second (tuple t); • int equals (tuple t1, tuple t2); • …

“IntTuple” CDT
// in a file "int-tuple.h"
#ifndef INT_TUPLE_H
#define INT_TUPLE_H
struct IntTuple_t
{
  int x;
  int y;
};
typedef struct IntTuple_t IntTuple_t;
IntTuple_t IntTuple_new (int n1, int n2);
int IntTuple_first (IntTuple_t t);
…
#endif

Or the “IntTuple” ADT
// in a file "int-tuple.h"
#ifndef INT_TUPLE_H
#define INT_TUPLE_H
typedef struct IntTuple_t *IntTuple_t;
IntTuple_t IntTuple_new (int n1, int n2);
int IntTuple_first (IntTuple_t t);
int IntTuple_equals (IntTuple_t t1, IntTuple_t t2);
…
#endif
// We only discuss “IntTuple_equals ()”. All other
// functions are left to you.

Equality Testing
// in a file "int-tuple.c"
#include "int-tuple.h"
struct IntTuple_t
{
  int x;
  int y;
};
int IntTuple_equals (IntTuple_t t1, IntTuple_t t2)
{
  return ((t1->x == t2->x) && (t1->y == t2->y));
}
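The remaining IntTuple operations are left as an exercise on the slides. One possible way to fill them in is sketched below as a single file, assuming the ADT (pointer) version of the interface; IntTuple_second does not appear in the header shown above and is assumed by analogy with IntTuple_first.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct IntTuple_t *IntTuple_t;

struct IntTuple_t
{
  int x;
  int y;
};

IntTuple_t IntTuple_new (int n1, int n2)
{
  IntTuple_t t = malloc (sizeof (*t));
  assert (t != NULL);          /* sketch only: abort on allocation failure */
  t->x = n1;
  t->y = n2;
  return t;
}

int IntTuple_first (IntTuple_t t) { return t->x; }

/* Assumed by analogy with IntTuple_first. */
int IntTuple_second (IntTuple_t t) { return t->y; }

int IntTuple_equals (IntTuple_t t1, IntTuple_t t2)
{
  return (t1->x == t2->x) && (t1->y == t2->y);
}
```

Every line here mentions “int” explicitly, so a tuple of (int, double) would need a near-identical second copy of the whole module.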

Problems? • It’s ok if we only design “IntTuple” • But what if we need to design these tuples: • (int, double), (int, char *), (double, double), … • The same code is duplicated everywhere, with no good way to maintain and evolve it • A nightmare for programmers • Remember: never duplicate code!

Polymorphism • Now, we consider a polymorphic tuple type called “tuple”: • “poly”: may take various forms • Every element of the type “tuple” may be of different types • (2, 3.14), (“8”, ‘a’), (‘\0’, 99), … • The “tuple” ADT: • type: tuple • elements: (2, 3.14), (“8”, ‘a’), (‘\0’, 99), …

The Tuple ADT • What about operations? • tuple new (??? x, ??? y); • ??? first (tuple t); • ??? second (tuple t); • int equals (tuple t1, tuple t2); • …

Polymorphic Type • To resolve this, C provides a special polymorphic type “void *” • “void *” is a pointer which can point to “any” concrete type (i.e., it’s compatible with any object pointer type) • very poly… • long history of practice, initially “char *” • cannot be used directly, needs an ugly cast • similar to constructs in other languages, such as “Object” in Java
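The point that “void *” accepts any object pointer but cannot be used without a cast can be sketched as follows. The helpers Poly_to_int and Poly_to_string are hypothetical names introduced only for this illustration; the typedef “poly” is the one used on the later slides.

```c
#include <assert.h>
#include <stddef.h>

typedef void *poly;

/* Hypothetical helper: the caller promises that p really points to an
   int. Nothing checks this promise, at compile time or at run time. */
int Poly_to_int (poly p)
{
  assert (p != NULL);
  return *(int *) p;           /* the cast is required: *p alone is illegal */
}

/* Hypothetical helper: the caller promises that p points to a string. */
const char *Poly_to_string (poly p)
{
  assert (p != NULL);
  return (const char *) p;
}
```

Storing needs no cast (for example, poly p = &i; for an int i), while reading the value back does. Passing a pointer to a double into Poly_to_int would compile without complaint and silently misbehave.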

The Tuple ADT • What about operations? • tuple newTuple (void *x, void *y); • void *first (tuple t); • void *second (tuple t); • int equals (tuple t1, tuple t2); • …

“tuple” Interface
// in a file "tuple.h"
#ifndef TUPLE_H
#define TUPLE_H
typedef void *poly;
typedef struct Tuple_t *Tuple_t;
Tuple_t Tuple_new (poly x, poly y);
poly Tuple_first (Tuple_t t);
poly Tuple_second (Tuple_t t);
int Tuple_equals (Tuple_t t1, Tuple_t t2);
#endif /* TUPLE_H */

Client Code
// file "main.c"
#include "tuple.h"
int main ()
{
  int i = 8;
  Tuple_t t1 = Tuple_new (&i, "hello");
  return 0;
}

“tuple” ADT Implementation
// in a file "tuple.c"
#include <stdlib.h>
#include "tuple.h"
struct Tuple_t
{
  poly x;
  poly y;
};
Tuple_t Tuple_new (poly x, poly y)
{
  Tuple_t t = malloc (sizeof (*t));
  t->x = x;
  t->y = y;
  return t;
}

“tuple” ADT Implementation
// in a file "tuple.c"
#include <stdlib.h>
#include "tuple.h"
struct Tuple_t
{
  poly x;
  poly y;
};
poly Tuple_first (Tuple_t t)
{
  return t->x;
}

Client Code
#include "complex.h" // ADT version
#include "tuple.h"
int main ()
{
  int i = 8;
  Tuple_t t1 = Tuple_new (&i, "hello");
  // type cast
  int *p = (int *)Tuple_first (t1);
  return 0;
}

Equality Testing
struct Tuple_t
{
  poly x;
  poly y;
};
// The #1 try:
int Tuple_equals (Tuple_t t1, Tuple_t t2)
{
  return ((t1->x == t2->x) && (t1->y == t2->y));
  // Wrong!! This compares addresses, not the values pointed to.
}

Equality Testing
struct Tuple_t
{
  poly x;
  poly y;
};
// The #2 try:
int Tuple_equals (Tuple_t t1, Tuple_t t2)
{
  return (*(t1->x) == *(t2->x) && *(t1->y) == *(t2->y));
  // Problem?
}
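The first two tries fail for different reasons. Try #1 compiles but compares addresses, so two distinct variables holding the same value look unequal. Try #2 does not compile at all, because a “void *” cannot be dereferenced without first being cast to a concrete pointer type. The sketch below shows the address versus value distinction; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void *poly;

/* What try #1 effectively computes: are the two pointers identical? */
int Poly_same_address (poly a, poly b)
{
  return a == b;
}

/* What we actually want for ints. This only works because the function
   knows the concrete type, which Tuple_equals does not. */
int Poly_same_int (poly a, poly b)
{
  assert (a != NULL && b != NULL);
  return *(int *) a == *(int *) b;
}
```

For int i = 8, j = 8; the call Poly_same_address (&i, &j) yields 0 while Poly_same_int (&i, &j) yields 1. The tuple module cannot choose the right cast on its own, which motivates the next tries.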

Equality Testing
struct Tuple_t
{
  poly x;
  poly y;
};
// The #3 try:
int Tuple_equals (Tuple_t t1, Tuple_t t2)
{
  return (equalsXXX (t1->x, t2->x) && equalsYYY (t1->y, t2->y));
  // but what are “equalsXXX” and “equalsYYY”?
}

Function as Arguments
// So in the body of the “equals” function, instead
// of guessing the types of t->x and t->y, we
// require the callers of “equals” to supply the
// necessary equality testing functions.
// The #4 try:
typedef int (*tf)(poly, poly);
int Tuple_equals (Tuple_t t1, Tuple_t t2, tf eqx, tf eqy)
{
  return (eqx (t1->x, t2->x) && eqy (t1->y, t2->y));
}

Change to “tuple” Interface
// in file "tuple.h"
#ifndef TUPLE_H
#define TUPLE_H
typedef void *poly;
typedef int (*tf)(poly, poly);
typedef struct Tuple_t *Tuple_t;
Tuple_t Tuple_new (poly x, poly y);
poly Tuple_first (Tuple_t t);
poly Tuple_second (Tuple_t t);
int Tuple_equals (Tuple_t t1, Tuple_t t2, tf eqx, tf eqy);
#endif /* TUPLE_H */

Client Code
// in file "main.c"
#include "tuple.h"
// Int_equals is a caller-supplied equality test on ints
// (its definition is not shown on this slide).
int main ()
{
  int i=8, j=8, k=7, m=7;
  Tuple_t t1 = Tuple_new (&i, &k);
  Tuple_t t2 = Tuple_new (&j, &m);
  Tuple_equals (t1, t2, Int_equals, Int_equals);
  return 0;
}
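The client code above relies on an Int_equals function that the slides do not define. A minimal sketch of what it might look like, together with the pieces of the tuple module it plugs into, is given below as one self-contained file. The body of Int_equals is an assumption; the other definitions follow the slides.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *poly;
typedef int (*tf) (poly, poly);
typedef struct Tuple_t *Tuple_t;

struct Tuple_t
{
  poly x;
  poly y;
};

Tuple_t Tuple_new (poly x, poly y)
{
  Tuple_t t = malloc (sizeof (*t));
  assert (t != NULL);          /* sketch only: abort on allocation failure */
  t->x = x;
  t->y = y;
  return t;
}

/* The tuple module never inspects x or y itself; it delegates to the
   equality functions supplied by the caller. */
int Tuple_equals (Tuple_t t1, Tuple_t t2, tf eqx, tf eqy)
{
  return eqx (t1->x, t2->x) && eqy (t1->y, t2->y);
}

/* Assumed definition: the caller knows both components are ints. */
int Int_equals (poly a, poly b)
{
  return *(int *) a == *(int *) b;
}
```

The type knowledge now lives with the caller, the only party that has it. The price is that every call must pass the right function for each component, and nothing checks that it does.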

Moral • void * serves as the polymorphic type in C • masks all pointer types (think of the Object type in Java) • Pros: • code reuse: write once, use in arbitrary contexts • we’ll see more examples later in this course • Cons: • Polymorphism doesn’t come for free • boxed data: data is heap-allocated (to cope with void *) • no static or runtime type checking (at least in C) • clumsy code • extra function pointer arguments

Function-Carrying Data • Why can we NOT make direct use of data of type “void *”, for example when it is passed as a function argument? • Better idea: • Let the data carry the functions themselves, instead of passing function pointers separately • such data are called objects

Function Pointer in Data
int Tuple_equals (Tuple_t t1, Tuple_t t2)
{
  // note that if t1->x and t1->y carry their own
  // equality testing functions, then the code
  // could just be written as:
  return (t1->x->equals (t1->x, t2->x) && t1->y->equals (t1->y, t2->y));
}
(Diagram: t1 with fields x and y, each pointing to a data element that starts with its own “equals” function pointer.)

Function Pointer in Data
// To cope with this, we should modify other
// modules. For instance, the “complex” ADT:
struct Complex_t
{
  int (*equals) (poly, poly);
  double a[2];
};
Complex_t Complex_new (double x, double y)
{
  Complex_t c = malloc (sizeof (*c));
  c->equals = Complex_equals;
  …;
  return c;
}

Function Call
int Tuple_equals (Tuple_t t1, Tuple_t t2)
{
  return (t1->x->equals (t1->x, t2->x) && t1->y->equals (t1->y, t2->y));
}
(Diagram: t1 and t2, each with fields x and y pointing to complex objects that hold “equals”, a[0] and a[1].)

Client Code
// in file "main.c"
#include "complex.h"
#include "tuple.h"
int main ()
{
  Complex_t c1 = Complex_new (1.0, 2.0);
  Complex_t c2 = Complex_new (1.0, 2.0);
  Tuple_t t1 = Tuple_new (c1, c2);
  Tuple_t t2 = Tuple_new (c1, c2);
  Tuple_equals (t1, t2); // dirty simple! :-P
  return 0;
}
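As written on the slides, t1->x->equals would not compile, because t1->x has type “void *”. One way to make the idea work in standard C is sketched below: every boxed value starts with a common header struct that holds the function pointer, and Tuple_equals converts the “void *” to a pointer to that header. The struct Object_t and its use as the first member are assumptions made for this sketch and do not appear in the original slides.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *poly;

/* Hypothetical common header: every boxed value must start with it. */
struct Object_t
{
  int (*equals) (poly, poly);
};

typedef struct Complex_t *Complex_t;
struct Complex_t
{
  struct Object_t header;      /* must be the first member */
  double a[2];
};

int Complex_equals (poly p, poly q)
{
  Complex_t c1 = p, c2 = q;
  return c1->a[0] == c2->a[0] && c1->a[1] == c2->a[1];
}

Complex_t Complex_new (double x, double y)
{
  Complex_t c = malloc (sizeof (*c));
  assert (c != NULL);          /* sketch only: abort on allocation failure */
  c->header.equals = Complex_equals;
  c->a[0] = x;
  c->a[1] = y;
  return c;
}

typedef struct Tuple_t *Tuple_t;
struct Tuple_t
{
  poly x;
  poly y;
};

Tuple_t Tuple_new (poly x, poly y)
{
  Tuple_t t = malloc (sizeof (*t));
  assert (t != NULL);
  t->x = x;
  t->y = y;
  return t;
}

/* No extra function arguments: each component brings its own equals. */
int Tuple_equals (Tuple_t t1, Tuple_t t2)
{
  struct Object_t *ox = t1->x; /* valid: the header sits at offset 0 */
  struct Object_t *oy = t1->y;
  return ox->equals (t1->x, t2->x) && oy->equals (t1->y, t2->y);
}
```

The design trades one convention for another: callers no longer pass function pointers, but every type stored in a tuple must now begin with the header. A plain int * stored in such a tuple would crash in Tuple_equals.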

Object • A data element with function pointers is the simplest form of object • object = virtual functions + private data • With such facilities, we can in principle model object-oriented programming • In fact, early C++ compilers compiled to C • That’s partly why I don’t love object-oriented languages

Summary • Abstract data types enable modular programming • clear separation between interface and implementation • interface and implementation should be designed and evolved together • Polymorphism enables code reuse • Object = data + function pointers
