Managing Program Complexity Through Modularization in UNIX Systems

UNIX! Landon Cox September 3, 2012

Dealing with complexity
• How do you reduce the complexity of large programs?
  • Break functionality into modules
  • Goal is to “decouple” unrelated functions
  • Narrow the set of interactions between modules
  • Hope to make the whole system easier to reason about
• How do we specify interactions between code modules?
  • Procedure calls (or objects = data + procedure calls)
  • e.g., int foo (char *buf)
• Procedure calls reduce complexity by
  • Limiting how modules can interact with one another
  • Hiding implementation details
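To make the idea concrete, here is a minimal C sketch of a module with a narrow interface, in the spirit of `int foo (char *buf)`. The module and all its names (`count_words`, `words_seen`, `total_words`, `is_word_start`) are hypothetical: callers can use only the two public functions, while `static` keeps the running total and the helper private to the file, so no other module can modify them directly.

```c
#include <ctype.h>

/* Hidden state: `static` gives internal linkage, so other
   modules cannot read or modify it directly. */
static int total_words = 0;

/* Hidden helper: also invisible outside this file. */
static int is_word_start (const char *p, const char *buf)
{
    return !isspace ((unsigned char) *p) &&
           (p == buf || isspace ((unsigned char) p[-1]));
}

/* Public interface: count the words in buf and add them
   to the module's running total. */
int count_words (const char *buf)
{
    int n = 0;
    for (const char *p = buf; *p != '\0'; p++)
        if (is_word_start (p, buf))
            n++;
    total_words += n;
    return n;
}

/* Public, read-only view of the hidden total. */
int words_seen (void)
{
    return total_words;
}
```

A caller that wants the total has to go through `words_seen ()`; the counting strategy can change later without touching any caller.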

Dealing with complexity
(Both versions assume <iostream>, <cmath>, using namespace std, and globals double input, output.)
Monolithic version:
    int main () {
        cout << "input: ";
        cin >> input;
        output = sqrt (input);
        output = pow (output, 3);
        cout << output << endl;
    }
Modular version:
    void getInput () { cout << "input: "; cin >> input; }
    void computeResult () { output = sqrt (input); output = pow (output, 3); }
    void printOutput () { cout << output << endl; }
    int main () { getInput (); computeResult (); printOutput (); }

    int P (int a) {…}
    void C (int x) { int y = P (x); }
How do C and P share information?
• Via a shared, in-memory stack

What info is stored on the stack?
• C’s saved registers, call arguments, and return address (RA)
• P’s local variables

Review of the stack
• Each stack frame contains a function’s
  • Local variables
  • Parameters
  • Return address
  • Saved values of the calling function’s registers
• The stack enables recursion
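One way to see that the stack enables recursion: in the C sketch below (the names `descend`, `frames_are_distinct`, `local_addr`, and `MAX_DEPTH` are illustrative), every active call records the address of its own local variable. Because each call gets its own frame, the addresses all differ while the calls are nested. Whether addresses grow up or down is platform-specific, so the sketch checks only that they are distinct.

```c
#include <stdint.h>

#define MAX_DEPTH 8

/* Address of each active call's local variable, indexed by depth. */
static uintptr_t local_addr[MAX_DEPTH];

/* Recurse until depth reaches max. Returns 0 + 1 + ... + (max - 1). */
int descend (int depth, int max)
{
    volatile int slot = depth;                  /* lives in this call's frame */
    if (depth < MAX_DEPTH)
        local_addr[depth] = (uintptr_t) &slot;  /* record where it was placed */
    if (depth + 1 < max)
        return slot + descend (depth + 1, max); /* frame stays live during the call */
    return slot;
}

/* Returns 1 if the nested calls' locals all had distinct addresses. */
int frames_are_distinct (int max)
{
    if (max > MAX_DEPTH)
        max = MAX_DEPTH;
    descend (0, max);
    for (int i = 0; i < max; i++)
        for (int j = i + 1; j < max; j++)
            if (local_addr[i] == local_addr[j])
                return 0;
    return 1;
}
```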

    void C () { A (0); }
    void B () { C (); }
    void A (int tmp) { if (tmp) B (); }
    int main () { A (1); return 0; }
[Figure: code, memory, and stack, addresses from 0xfffffff down to 0x0. Frames, oldest first: main (const1=1, const2=0); A (tmp=1, RA=0x804838c); B (RA=0x8048361); C (const=0, RA=0x8048354); A (tmp=0, RA=0x8048347), with SP at the newest frame.]

    void A (int bnd) { if (bnd) A (bnd - 1); }
    int main () { A (3); return 0; }
[Figure: code, memory, and stack, addresses from 0xfffffff down to 0x0. Frames, oldest first: main (const1=3, const2=0); A (bnd=3, RA=0x804838c); A (bnd=2, RA=0x8048361); A (bnd=1, RA=0x8048361); A (bnd=0, RA=0x8048361), with SP at the newest frame.]
How can recursion go wrong?
• Can overflow the stack …
  • Keep adding frame after frame

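    #include <string.h>
    void cap (char *b) {
        for (int i = 0; b[i] != '\0'; i++)
            b[i] += 32;                /* adds 32 to every byte */
    }
    int main (int argc, char *argv[]) {
        char wrd[4];
        strcpy (wrd, argv[1]);         /* unchecked copy: a long argv[1] overflows wrd */
        cap (wrd);
        return 0;
    }
[Figure: code, memory, and stack, addresses from 0xfffffff down to 0x0. main’s frame holds const2=0 and wrd[0]..wrd[3] (wrd at 0x00234); cap’s frame holds b=0x00234 and RA=0x804838c.]
What can go wrong?
• Can overflow the wrd variable …
  • Overwrite a return address (RA) saved on the stack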
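A common defense is to bound every copy by the size of the destination. This C sketch (the helper name `copy_bounded` is hypothetical) uses `snprintf`, which never writes more than the given number of bytes and always null-terminates when the size is nonzero; an over-long input is truncated instead of spilling past wrd into the rest of the stack.

```c
#include <stdio.h>

/* Copy src into dst, which holds dst_size bytes, without ever writing
   past the end of dst. Returns 1 if all of src fit, 0 if it was
   truncated (or on an encoding error). */
int copy_bounded (char *dst, size_t dst_size, const char *src)
{
    /* snprintf writes at most dst_size bytes, including the terminating
       '\0', and returns the length the complete string would need. */
    int needed = snprintf (dst, dst_size, "%s", src);
    return needed >= 0 && (size_t) needed < dst_size;
}
```

The unchecked strcpy into wrd would become a call like `copy_bounded (wrd, sizeof wrd, input)`, where `input` stands for the untrusted string.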

Can think of this as a contract
• P agrees to return
• P agrees to resume C where it left off
• P agrees to restore the stack pointer
• P agrees to leave the rest of the stack alone

Is the call contract enforced?
• At a low level, NO!
• P can violate all terms of the contract
• Sources of violations: attacks + bugs

Enforcing the contract is feasible
• Interaction is purely mechanical
• Programmer’s intention is clear
• No semantic gap to cross

How does Java enforce the call contract?
• Language restricts expressiveness
  • Programmers can’t access the stack
• Special “invoke” instruction expresses intent
• JVM is trusted to transfer control between C and P

Awesome, so why not run only Java programs?
• Lower-level languages are faster
  • (trusted JVM interposes on every instruction)
• Restricts programmer’s choice
  • (maybe, I hate programming in Java)

Another approach to enforced modularity
• Put C and P in separate processes
• Code is fast when processes are not interacting
• Trust the kernel to handle control transfers
• Kernel ensures transitions are correct

Key question: what should the interface be?
• Put C and P in separate processes
• Want a general interface for inter-process communication (IPC)
• Should be simple and powerful (i.e., elegant)

UNIX philosophy
• OS by programmers for programmers
  • Support high-level languages (C and scripting)
  • Make interactivity a first-order concern (via shell)
  • Allow rapid prototyping
• How should you program for a UNIX system?
  • Write programs with limited features
  • Do one thing and do it well
  • Support easy composition of programs
  • Make data easy to understand
  • Store data in plaintext (not binary formats)
  • Communicate via text streams
Thompson and Ritchie, Turing Award ’83

UNIX philosophy
[Diagram: Process C and Process P above the Kernel, joined by “?”]
What is the core abstraction?
• Communication via files

UNIX philosophy
[Diagram: Process C and Process P above the Kernel, joined by a File]
What is the interface?
• Open: get a file reference (descriptor)
• Read/Write: get/put data
• Close: stop communicating
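A minimal C sketch of the four calls in sequence, assuming a POSIX system: the hypothetical helper `round_trip` opens a file, writes a message, closes it, then reopens it and reads the message back. For a small regular file a single `read` normally returns everything; code that handles pipes or large files should loop until `read` returns 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write msg to the file at path, then read it back into buf.
   Returns the number of bytes read, or -1 on error. */
ssize_t round_trip (const char *path, const char *msg, char *buf, size_t bufsize)
{
    if (bufsize == 0)
        return -1;

    /* Open: get a descriptor (create or truncate, owner read/write). */
    int fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    ssize_t written = write (fd, msg, strlen (msg));  /* Write: put data */
    close (fd);                                        /* Close: done */
    if (written < 0)
        return -1;

    fd = open (path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t got = read (fd, buf, bufsize - 1);         /* Read: get data */
    close (fd);
    if (got >= 0)
        buf[got] = '\0';                               /* make it a C string */
    return got;
}
```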

UNIX philosophy
Why is this safer than procedure calls?
• Interface is narrower
  • Access file in a few well-defined ways
• Kernel ensures things run smoothly

UNIX philosophy
How do we transfer control to the kernel?
• Special system call instruction!
• CPU pauses the process, runs the kernel
• Kind of like Java’s invoke instruction

UNIX philosophy
Key insight:
• Interface can be used for lots of things
  • Persistent storage (i.e., “real” files)
  • Devices, temporary channels (i.e., pipes)
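Because files, devices, and pipes all sit behind the same descriptor interface, one function can talk to any of them. In this C sketch, `put_line` and `same_interface_demo` are illustrative names, and `/dev/null` is assumed to exist, as it does on typical UNIX systems; the same `write` call sends data to a device and into a pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Send a line to whatever fd refers to: a regular file, a device,
   or one end of a pipe. The code does not need to know which. */
ssize_t put_line (int fd, const char *line)
{
    return write (fd, line, strlen (line));
}

/* Use put_line on a device and on a pipe. Returns 1 if both work. */
int same_interface_demo (void)
{
    char buf[32];
    int p[2];
    int ok;

    /* Device: /dev/null accepts and discards whatever is written. */
    int dev = open ("/dev/null", O_WRONLY);
    if (dev < 0)
        return 0;
    ok = put_line (dev, "discarded\n") == 10;
    close (dev);

    /* Temporary channel: bytes written to p[1] come out of p[0]. */
    if (pipe (p) < 0)
        return 0;
    ok = ok && put_line (p[1], "through a pipe\n") == 15;
    ok = ok && read (p[0], buf, sizeof buf) == 15;
    close (p[0]);
    close (p[1]);
    return ok;
}
```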

UNIX philosophy
Two questions
• How do processes start running?
• How do we control access to files?

Course administration
• Heap manager project
  • Due a week from Friday
  • Sorry, but I can’t help you …
  • Questions for Vamsi?
• Piazza
  • Should have received account info
  • Email Jeff if not
• Other questions?

UNIX philosophy
Two questions
• How do processes start running?

UNIX philosophy
Maybe P is already running?
• Could just rely on the kernel to start processes

UNIX philosophy
What might we call such a process?
• Basically what a server is
• A process C wants to talk to that someone else launched

UNIX philosophy
Not all processes should be servers
• Want to launch processes on demand
• C needs primitives to create P

UNIX Shell
[Diagram: the Shell above the Kernel]
• Program that runs other programs
• Interactive (accepts user commands)
• Essentially just a line interpreter
• Allows easy composition of programs

UNIX shell
• How does a UNIX process interact with a user?
  • Via standard in (fd 0) and standard out (fd 1)
  • These are the default input and output for a program
  • Establishes well-known data entry and exit points for a program
• How do UNIX processes communicate with each other?
  • Mostly via pipes
  • Pipes allow programs to be chained together
  • Shell and OS can connect one process’s stdout to another’s stdin
• Why do we need pipes when we have files?
  • Pipes create unnamed temporary buffers between processes
  • Communication between programs is often ephemeral
  • OS knows to garbage collect a pipe’s resources on exit
  • Consistent with the UNIX philosophy of simplifying programmers’ lives
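The shell’s plumbing for `a | b` can be sketched in C with `pipe`, `fork`, and `dup2`. In this minimal example (the function name `capture_child_output` and the message are hypothetical), the child’s fd 1 is replaced by the pipe’s write end, so it writes to standard output as usual while the parent reads the bytes from the other end. A real shell would `exec` a program in the child at that point and connect a second child’s fd 0 to the read end.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child whose standard output (fd 1) is the write end of a pipe,
   and collect what it prints. Returns bytes read, or -1 on error. */
ssize_t capture_child_output (char *buf, size_t bufsize)
{
    int fds[2];                   /* fds[0]: read end, fds[1]: write end */
    if (bufsize == 0 || pipe (fds) < 0)
        return -1;

    pid_t pid = fork ();
    if (pid < 0) {
        close (fds[0]);
        close (fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: make the pipe its stdout, as a shell does for the left
           side of `a | b`, then write to fd 1 as any program would. */
        if (dup2 (fds[1], STDOUT_FILENO) < 0)
            _exit (1);
        close (fds[0]);
        close (fds[1]);
        const char msg[] = "hello from the child\n";
        if (write (STDOUT_FILENO, msg, sizeof msg - 1) < 0)
            _exit (1);
        _exit (0);
    }

    /* Parent: drop its copy of the write end so read() sees end-of-file
       once the child exits, then drain the read end. */
    close (fds[1]);
    size_t total = 0;
    ssize_t n;
    while (total < bufsize - 1 &&
           (n = read (fds[0], buf + total, bufsize - 1 - total)) > 0)
        total += (size_t) n;
    buf[total] = '\0';
    close (fds[0]);
    waitpid (pid, NULL, 0);
    return (ssize_t) total;
}
```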

UNIX shell
• Pipes simplify naming
  • Program always receives input on fd 0
  • Program always emits output on fd 1
  • Program doesn’t care what is on the other end of an fd
  • Shell/OS handle input/output connections
• How do pipes simplify synchronization?
  • Pipe is accessed via the read system call
  • Read can block in the kernel until data is ready
  • Or can poll, checking to see if read returns enough data
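By default, `read` on an empty pipe blocks in the kernel until data arrives. The polling alternative can be sketched by marking the descriptor non-blocking with `fcntl` and `O_NONBLOCK`: `read` on an empty pipe then returns -1 with `errno` set to `EAGAIN` (or `EWOULDBLOCK`) instead of waiting. `poll_read` is a hypothetical helper name; restoring the original flags afterwards is a design choice so other users of the descriptor keep blocking behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Read from fd without blocking. Returns the byte count if data was
   ready, 0 at end-of-file (no writers left), or -1 with errno set to
   EAGAIN/EWOULDBLOCK if the pipe is empty right now, so the caller can
   do other work and poll again later. */
ssize_t poll_read (int fd, char *buf, size_t len)
{
    int flags = fcntl (fd, F_GETFL);
    if (flags < 0)
        return -1;
    if (fcntl (fd, F_SETFL, flags | O_NONBLOCK) < 0)  /* non-blocking mode */
        return -1;

    ssize_t n = read (fd, buf, len);
    int saved_errno = errno;
    fcntl (fd, F_SETFL, flags);          /* restore the original mode */
    errno = saved_errno;
    return n;
}
```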

How the kernel starts a process
• Allocates a process control block (bookkeeping data structure)
• Reads program code from disk
• Stores program code in memory (could be demand-loaded too)
• Initializes machine registers for the new process
• Initializes address-translation data for the new address space
  • E.g., page table and PTBR (page-table base register)
  • Virtual addresses of the code segment point to the correct physical locations
• Sets processor mode bit to “user”
• Jumps to start of program
Need hardware support

Creating processes
• Through what commands does UNIX create processes?
  • Fork: create a child process that is a copy of the parent
  • Exec: initialize the address space with a new program
• What’s the problem with creating an exact copy process?
  • Child needs to do something different than parent
  • i.e., child needs to know that it is the child
• How does the child know it is the child?
  • Pass in return point
  • Parent returns from fork call, child jumps into other region of code
  • Fork works slightly differently now

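Fork
• Child can’t be an exact copy
• Is distinguished by one variable (the return value of fork)
    pid_t pid = fork ();       /* needs <unistd.h> and <sys/wait.h> */
    if (pid == 0) {
        /* child: fork returned 0, so run a new program, e.g., ls /tmp */
        execlp ("ls", "ls", "/tmp", (char *) NULL);
        _exit (127);           /* reached only if exec failed */
    } else if (pid > 0) {
        /* parent: fork returned the child's pid; carry on (here, wait) */
        waitpid (pid, NULL, 0);
    }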

Creating processes
• Why make a complete copy of the parent?
  • Sometimes you want a copy of the parent
  • Separating fork/exec provides flexibility
  • Allows child to inherit some kernel state
    • E.g., open files, stdin, stdout
    • Very useful for the shell
• How do we efficiently copy an address space?
  • Use “copy on write”
  • Make a copy of the page table, set pages to read-only
  • Only make physical copies of pages on a write fault

Copy on write
[Diagram: Parent memory, Child memory, Physical memory]
• What happens if the parent writes to a page?

Copy on write
[Diagram: Parent memory, Child memory, Physical memory]
• Have to create a copy of the pre-write page for the child.
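Copy on write is invisible to programs; what they can observe is the semantics it has to preserve: after `fork`, parent and child have logically separate memory. This C sketch (`fork_copies_memory` and `counter` are illustrative names) has the child write to a global and report what it sees through its exit status, while the parent checks that its own copy is unchanged.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 1;   /* ordinary global data, inherited by the child */

/* Returns 1 if the child saw its own write while the parent's copy
   stayed unchanged, 0 if not, -1 on error. */
int fork_copies_memory (void)
{
    pid_t pid = fork ();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        counter = 2;      /* on a copy-on-write kernel, this write triggers the page copy */
        _exit (counter);  /* report the child's view via the exit status */
    }

    int status;
    if (waitpid (pid, &status, 0) < 0 || !WIFEXITED (status))
        return -1;
    return WEXITSTATUS (status) == 2 && counter == 1;
}
```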

Alternative approach
• Windows CreateProcess
  • Combines the work of fork and exec
• UNIX’s approach
  • Supports arbitrary sharing between parent and child
• Windows’ approach
  • Supports sharing of the most common data via parameters

Shells (bash, explorer, finder)
• Shells are normal programs
  • Though they look like part of the OS
• How would you write one?
    while (1) {
        print prompt ("crocus% ")
        ask for input (cin)             // e.g., "ls /tmp"
        first word of input is command  // e.g., ls
        fork a copy of the current process (shell)
        if (child) {
            redirect output to a file if requested (or a pipe)
            exec new program (e.g., with argument "/tmp")
        } else {
            wait for child to finish
            or run child in background and ask for another command
        }
    }
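The loop body above (minus the prompt) can be sketched in C. `run_command` and `MAX_ARGS` are hypothetical names; the sketch splits the line on whitespace with `strtok`, forks, optionally points the child’s fd 1 at a file with `open` and `dup2`, runs the command with `execvp` (which searches `PATH`), and waits with `waitpid`. Pipes, background jobs, and quoting are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ARGS 16

/* One pass of the shell loop: split line into words, fork, redirect the
   child's stdout to outfile if it is non-NULL, exec, and wait.
   Returns the child's exit status, or -1 on error. Modifies line. */
int run_command (char *line, const char *outfile)
{
    char *argv[MAX_ARGS + 1];
    int argc = 0;
    for (char *tok = strtok (line, " \t\n"); tok != NULL && argc < MAX_ARGS;
         tok = strtok (NULL, " \t\n"))
        argv[argc++] = tok;              /* first word is the command */
    argv[argc] = NULL;
    if (argc == 0)
        return -1;

    pid_t pid = fork ();                 /* copy of the current process */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (outfile != NULL) {           /* redirect output if requested */
            int fd = open (outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd < 0 || dup2 (fd, STDOUT_FILENO) < 0)
                _exit (126);
            close (fd);
        }
        execvp (argv[0], argv);          /* replace the child's program */
        _exit (127);                     /* reached only if exec failed */
    }

    int status;                          /* parent: wait for the child */
    if (waitpid (pid, &status, 0) < 0 || !WIFEXITED (status))
        return -1;
    return WEXITSTATUS (status);
}
```

Because redirection happens in the child after `fork` and before `exec`, the program being run never knows its output goes to a file; this is the flexibility that separating fork from exec buys.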

Shell demo

Access control
• Where is the most trusted code located?
  • In the operating system kernel
• What are the primary responsibilities of a UNIX kernel?
  • Managing the file system
  • Launching/scheduling processes
  • Managing memory
• How do processes invoke the kernel?
  • Via system calls
  • Hardware shepherds the transition from user process to kernel
  • Processor knows when it is running kernel code
  • Represents this through protection rings or a mode bit

Access control
• How does the kernel know if a system call is allowed?
  • Looks at the user id (uid) of the process making the call
  • Looks at resources accessed by the call (e.g., file or pipe)
  • Checks the access-control policy associated with the resource
  • Decides if the policy allows the uid to access the resources
• How is a uid normally assigned to a process?
  • On fork, the child inherits the parent’s uid
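Inheritance of the uid across `fork` can be checked directly. In this C sketch (`child_inherits_uid` is an illustrative name), the child compares its real and effective uids, from `getuid` and `geteuid`, with the values the parent recorded before forking, and reports the result through its exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a forked child has the same real and effective uid as
   its parent, 0 if not, -1 on error. */
int child_inherits_uid (void)
{
    uid_t parent_uid = getuid ();
    uid_t parent_euid = geteuid ();

    pid_t pid = fork ();
    if (pid < 0)
        return -1;
    if (pid == 0)
        /* Exit status 0 means "same uids as my parent". */
        _exit (getuid () == parent_uid && geteuid () == parent_euid ? 0 : 1);

    int status;
    if (waitpid (pid, &status, 0) < 0 || !WIFEXITED (status))
        return -1;
    return WEXITSTATUS (status) == 0;
}
```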

MOO accounting problem
• Multi-player game called Moo
• Want to maintain a high score in a file
• Should players be able to update their score?
  • Yes
• Do we trust users to write the file directly?
  • No, they could lie about their score
[Diagram: Game client (uid x) writes “x’s score = 10” and Game client (uid y) writes “y’s score = 11” directly to the High score file]

MOO accounting problem
• Multi-player game called Moo
• Want to maintain a high score in a file
• Could have a trusted process update scores
• Is this good enough?
[Diagram: Game clients (uid x, uid y) send “x’s score = 10” and “y’s score = 11” to a Game server, which records “x:10 y:11” in the High score file]

MOO accounting problem
• Multi-player game called Moo
• Want to maintain a high score in a file
• Could have a trusted process update scores
• Is this good enough?
  • Can’t be sure that the reported score is genuine
  • Need to ensure the score was computed correctly
[Diagram: Game client (uid x) sends “x’s score = 100”; the Game server records “x:100 y:11” in the High score file]

Access control
• Insight: sometimes simple inheritance of uids is insufficient
  • Tasks involving management of “user id” state
    • Logging in (login)
    • Changing passwords (passwd)
• Why isn’t this code just inside the kernel?
  • This functionality doesn’t really require interaction with hardware
  • Would like to keep the kernel as small as possible
• How are “trusted” user-space processes identified?
  • Run as super user or root (uid 0)
  • Like a software kernel mode
  • If a process runs under uid 0, then it has more privileges

Access control
• Why does login need to run as root?
  • Needs to check username/password correctness
  • Needs to fork/exec a process under another uid
• Why does passwd need to run as root?
  • Needs to modify the password database (file)
  • Database is shared by all users
• What makes passwd particularly tricky?
  • Easy to allow a process to shed privileges (e.g., login)
  • passwd requires an escalation of privileges
• How does UNIX handle this?
  • Executable files can have their setuid bit set
  • If the setuid bit is set, on exec the process’s effective uid becomes the uid of the executable file’s owner
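A program can check whether an executable has the setuid bit by looking at `st_mode` from `stat`. In this C sketch, `has_setuid_bit` and `uids_differ` are hypothetical helper names. Strictly, exec of a setuid file changes the process’s effective uid (what `geteuid` returns) to the file owner’s, while the real uid (`getuid`) stays that of the invoking user; that is how passwd can write the shared password database while still knowing who ran it.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if the file at path has its setuid bit set, 0 if not,
   or -1 if it cannot be stat'ed. */
int has_setuid_bit (const char *path)
{
    struct stat st;
    if (stat (path, &st) < 0)
        return -1;
    return (st.st_mode & S_ISUID) != 0;
}

/* Returns 1 if the real uid (who ran the program) differs from the
   effective uid (whose privileges it holds), as it does in a program
   exec'ed from a setuid file owned by another user. */
int uids_differ (void)
{
    return getuid () != geteuid ();
}
```

On many Linux systems `has_setuid_bit ("/usr/bin/passwd")` returns 1, though the path and permissions can differ between installations.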
