CSE 7348 - class 12 • Stevens uses three general forms or patterns of interaction to demonstrate IPC: • File server: a client-server application in which the client sends the server a pathname and the server returns the contents of that file to the client. • Producer-consumer: one or more threads/processes place data into a shared buffer while one or more threads/processes operate on the data in that buffer. • Sequence number increment: the various threads/processes increment a shared sequence number (sometimes kept in a shared file, sometimes in shared memory). • One of the key issues of IPC is its ‘name space’: how are the IPC objects identified by their various consumers? • Some have no names (pipes, mutexes), some have names in the file system, some have Posix IPC names, and some have System V keys / identifiers.
CSE 7348 - class 12 • The three types of Posix IPC: • Message queues • Semaphores • Shared memory • These types use Posix IPC names for identification. The name used is very similar to a pathname and must conform to the rules of pathnames ( < PATH_MAX bytes, terminate with a null byte). • If the name begins with a / then different calls to these functions all reference the same queue. If not a / then the effect is implementation dependent. • If there are additional slashes then the interpretation is implementation dependent.
CSE 7348 - class 12 • Names are a bit of a problem: on implementations that create the name in the file system, a name beginning with a slash requires write permission in the target directory. • The name /tmp.123 makes Digital Unix try to create a file in the root directory; since normally only root can write there, the call fails. Using /tmp/test.123 will work on Digital Unix but fail on Solaris. • Stevens suggests using a #define to get the name scoped as a non-pathname reference. • “a standard so general that it is no standard at all”. • “Don’t waste time with this standard and its murky implementation; skip Ch 2 and Ch 3.”
CSE 7348 - CLASS 12 • Message Passing: In the context of a form of IPC. • Pipes are the original form of Unix IPC (1973). • Gives some insight into the fact that the Unix of Kernighan was not the Unix that we are all accustomed to. • Pipes are not named; therefore they can only be used by related processes. • But aren’t all Unix processes related (to init, pid 1)? • A pipe must be used between processes which have a common ancestor that created the pipe. • A pipe is created by the pipe function. • A pipe provides a one way flow of data.
CSE 7348 - CLASS 12 • int pipe ( int fd[2] ); // value-result argument • Two file descriptors are returned; fd[0] is open for reading, and fd[1] is open for writing. • Half-duplex in the sense that data flows in ONE direction only: written at fd[1], read at fd[0]. • Typical use scenario for a pipe is NOT a single process creating a pipe, but rather a parent creating a pipe and then forking a child. • After the fork the parent closes the read end of the pipe and the child closes the write end of the same pipe. • Thus creating a one way flow of data from parent to child.
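The pipe-then-fork pattern above can be sketched as follows. This is a minimal illustration, not the textbook's code; the function name pipe_parent_to_child is hypothetical, and the child reports the byte count through its exit status only to keep the example short (so it is limited to 0..255).

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX declarations */
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the parent writes msg into a pipe, the child reads it
 * and exits with the number of bytes it saw. Returns that count, or -1. */
int pipe_parent_to_child(const char *msg)
{
    int fd[2];
    if (pipe(fd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {                     /* child: reader */
        char buf[64];
        int total = 0;
        ssize_t n;
        close(fd[1]);                   /* child closes the write end */
        while ((n = read(fd[0], buf, sizeof buf)) > 0)
            total += (int)n;
        close(fd[0]);
        _exit(total & 0xff);
    }

    close(fd[0]);                       /* parent closes the read end */
    size_t len = strlen(msg);
    int ok = write(fd[1], msg, len) == (ssize_t)len;
    close(fd[1]);                       /* child now sees EOF */

    int status;
    if (waitpid(pid, &status, 0) < 0 || !ok || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Closing the unused ends matters: if the parent kept fd[1] open and never closed it, the child's read loop would never see end-of-file.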
CSE 7348 - CLASS 12 • When entering a shell command such as ps -aef | grep drusso • The shell creates two processes with a pipe between them. • One process runs the ps, the other runs the grep. • ps writes, grep reads. [Figure: process ps writes its stdout into a pipe held in the kernel; process grep reads that pipe as its stdin.]
CSE 7348 - CLASS 12 • To create a full-duplex channel, two pipes must be created. • Create pipeA, pipeB. (fdA[0], fdA[1], fdB[0], fdB[1] ). • Fork • parent closes read of A (fdA[0]), write of B (fdB[1]) • child closes write of A (fdA[1]), read of B (fdB[0]) Pipe (fdA); Pipe (fdB); if ( (childpid = Fork() ) == 0) { Close(fdA[1]); Close(fdB[0]); /* child: reads fdA[0], writes fdB[1] */ exit(0); } Close(fdA[0]); Close(fdB[1]); /* parent: writes fdA[1], reads fdB[0] */
CSE 7348 - CLASS 12 • NB: each pipe, while logically connected to the process, actually runs through the kernel. • Therefore every byte xmitted crosses the kernel / process barrier twice (written TO pipe, read FROM pipe). • In the sample code (page 46 - 48), the main creates two pipes and forks a child. • The server runs in the child • The client is a function of the parent. • A pathname is sent from the client to the server (first pipe) • The server uses that pathname to send the contents of that file to the client using the second pipe.
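A compact sketch of the two-pipe file server just described, assuming the request is a single pathname that fits in one write. It is written for these notes rather than copied from the book; fetch_via_child and the pipe names to_server / to_client are hypothetical.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX declarations */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAXLINE 4096

/* Server (runs in the child): read a pathname, send back the file contents. */
static void server(int readfd, int writefd)
{
    char buf[MAXLINE];
    ssize_t n = read(readfd, buf, sizeof buf - 1);
    if (n <= 0)
        return;
    buf[n] = '\0';

    int fd = open(buf, O_RDONLY);
    if (fd < 0) {
        const char *err = "can't open file\n";
        if (write(writefd, err, strlen(err)) < 0)
            return;
        return;
    }
    while ((n = read(fd, buf, sizeof buf)) > 0)
        if (write(writefd, buf, (size_t)n) != n)
            break;
    close(fd);
}

/* Client (runs in the parent): send path, collect the reply into out.
 * Returns the number of bytes received, or -1 on error. */
ssize_t fetch_via_child(const char *path, char *out, size_t outsz)
{
    int to_server[2], to_client[2];
    if (outsz == 0 || pipe(to_server) < 0)
        return -1;
    if (pipe(to_client) < 0) {
        close(to_server[0]);
        close(to_server[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                      /* child = server */
        close(to_server[1]);
        close(to_client[0]);
        server(to_server[0], to_client[1]);
        _exit(EXIT_SUCCESS);
    }

    close(to_server[0]);                 /* parent = client */
    close(to_client[1]);
    ssize_t sent = write(to_server[1], path, strlen(path));
    close(to_server[1]);

    size_t total = 0;
    ssize_t n;
    while (sent >= 0 && total < outsz - 1 &&
           (n = read(to_client[0], out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(to_client[0]);
    waitpid(pid, NULL, 0);               /* reap the child (no zombie left) */
    return sent < 0 ? -1 : (ssize_t)total;
}
```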
CSE 7348 - CLASS 12 • Termination: the server (child) terminates first. It becomes a zombie (parent does NOT catch SIGCHLD). • Parent’s client function returns; parent then calls waitpid to fetch the termination status of the zombie. • Interesting passage on server (child); if ( ( fd = open (buff, O_RDONLY) ) < 0) { error stuff; } else { while ( (n = Read (fd, buff, MAXLINE) ) > 0 ) Write (writefd, buff, n); Close (fd); }
CSE 7348 - CLASS 12 • Full-Duplex pipes • Can be realized via the construction of two half-duplex pipes. • Which implies FOUR descriptors, two per pipe, with one pipe for each direction. • If a single half-duplex pipe is used for both directions then it’s possible that a read would only get back what the same process had just written (assuming a single buffer behind both descriptors). • Solaris 2.6 supports a full-duplex pipe. Digital Unix (a bad idea) does not.
CSE 7348 - CLASS 12 • popen and pclose functions: supported by standard I/O library. FILE *popen (const char *command, const char *type); returns a file pointer on success, NULL on error. • command is a shell command line (processed by sh so the PATH is used to locate the command). • The effect is to create a pipe between the calling process and the specified command. • The return is a standard FILE ptr used for input or output depending on the type.
CSE 7348 - CLASS 12 • Example of using popen: fp = Popen(command, "r"); while (Fgets(buff, MAXLINE, fp) != NULL) Fputs(buff, stdout); Pclose(fp); • The problem with this approach to message passing is that the error messages are not very informative: only the command’s standard output comes back through the pipe, so an error from the command (here cat) goes to standard error instead. • Again, pipes don’t have names (scoping and visibility problems) and they can only be shared between related processes.
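A small runnable variant of the popen loop, using the plain library calls rather than the capitalized wrapper functions. The helper name first_line_of is hypothetical; it returns whatever pclose reports for the command.

```c
#define _XOPEN_SOURCE 700   /* popen/pclose are POSIX, not ISO C */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: run command through the shell, copy the first line of
 * its standard output into out (newline stripped), return pclose's status. */
int first_line_of(const char *command, char *out, size_t outsz)
{
    if (outsz == 0)
        return -1;

    FILE *fp = popen(command, "r");     /* "r": we read the command's stdout */
    if (fp == NULL)
        return -1;

    if (fgets(out, (int)outsz, fp) == NULL)
        out[0] = '\0';
    out[strcspn(out, "\n")] = '\0';

    /* Drain the remaining output so the command is not cut off mid-write. */
    char scratch[256];
    while (fgets(scratch, sizeof scratch, fp) != NULL)
        ;

    return pclose(fp);                  /* waits for the command to finish */
}
```

Anything the command prints on standard error bypasses the pipe entirely, which is the limitation noted above.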
CSE 7348 - CLASS 12 • FIFO: In Unix also called a ‘named pipe’. • To create a FIFO; int mkfifo ( const char *pathname, mode_t mode ); • returns 0 if OK, -1 on error. • pathname is a normal Unix pathname and is the name of the FIFO (don’t forget null termination). mode specifies the file permission bits: • S_IRUSR // user read S_IWUSR // user write • S_IRGRP // group read S_IWGRP // group write • S_IROTH // other read S_IWOTH // other write
CSE 7348 - CLASS 12 • If the named FIFO already exists use open to access. • After a FIFO is created it must be opened for reading or writing (open, or fopen). • A FIFO can only be opened either read-only or write-only, never read-write. • Therefore a FIFO is half-duplex. • A write to a FIFO appends data; a read returns what is at the beginning of the FIFO. if ( ( mkfifo (FIFO1, FILE_MODE) < 0) && (errno != EEXIST) ) error stuff; // child readfd = Open(FIFO1, O_RDONLY, 0); // at close Unlink(FIFO1);
CSE 7348 - CLASS 12 • A FIFO’s name is removed from the system only by calling unlink. A pipe is removed on its last close. • NB: the open of a FIFO for a read blocks if no process has the FIFO open for writing. • A deadlock, not to be confused with a dreadlock, can be caused if two processes each open a FIFO for reading before either has opened one for writing: each blocks in open waiting for a writer that never arrives.
CSE 7348 - CLASS 12 • An unrelated client and server; a brief story. • Uses a .h to provide names for the two FIFO’s. All users must know this name. • Server: if ( (mkfifo(FIFO1, FILE_MODE) < 0) && (errno != EEXIST) ) error_stuff; readfd = Open(FIFO1, O_RDONLY, 0); Client: writefd = Open(FIFO1, O_WRONLY, 0);
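A minimal sketch of the mkfifo / open / unlink sequence. FIFOs are meant for unrelated processes; fork is used here only so the example fits in one program. The function name fifo_roundtrip and the FILE_MODE value are assumptions for illustration.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX declarations */
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define FILE_MODE (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH)   /* assumed: rw-r--r-- */

/* Hypothetical helper: a child writes msg into the FIFO at path, the parent
 * reads it back. Returns the number of bytes read, or -1 on error. */
ssize_t fifo_roundtrip(const char *path, const char *msg, char *out, size_t outsz)
{
    if (outsz == 0)
        return -1;
    if (mkfifo(path, FILE_MODE) < 0 && errno != EEXIST)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                          /* writer side */
        int wfd = open(path, O_WRONLY);      /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        ssize_t w = write(wfd, msg, strlen(msg));
        close(wfd);
        _exit(w < 0 ? 1 : 0);
    }

    int rfd = open(path, O_RDONLY);          /* blocks until the writer opens */
    if (rfd < 0) {
        kill(pid, SIGKILL);                  /* do not leave the child stuck in open */
        waitpid(pid, NULL, 0);
        unlink(path);
        return -1;
    }

    size_t total = 0;
    ssize_t n;
    while (total < outsz - 1 &&
           (n = read(rfd, out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';

    close(rfd);
    unlink(path);                            /* the name persists until unlinked */
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```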
CSE 7348 - CLASS 12 • Important note in regards FIFO’s and the kernel. • The kernel keeps a reference count in regards both pipes and FIFO’s. • Either the client or server can call unlink without difficulty. • The unlink removes the pathname from the system, but this does NOT affect open descriptors that already had opened the pathname. • Some other items in regards pipes and FIFO’s. • How to set the pipe / FIFO descriptor nonblocking: • Use the O_NONBLOCK flag when specifying an open per writefd = Open (FIFO1, O_WRONLY | O_NONBLOCK, 0); nb: understand the use of the | in the above
CSE 7348 - CLASS 12 • More ways to set a pipe / FIFO nonblocking. • If the fd is already ‘open’, fcntl can be used to enable the O_NONBLOCK flag. • This technique must be used with a pipe since open is not called for a pipe (no way to use O_NONBLOCK in the call to pipe). • Which leads to one of my favorite subjects, the setting of flags in a control word (assembly language programmer that I am). • When using fcntl first get the current file status flags using the F_GETFL command: if ( ( flags = fcntl (fd , F_GETFL, 0 ) ) < 0 ) error_out;
CSE 7348 - CLASS 12 • After getting the flags with F_GETFL, we need to bitwise OR in the O_NONBLOCK flag. flags |= O_NONBLOCK; • Then store the file status flags with the F_SETFL command: if (fcntl (fd, F_SETFL, flags) < 0) error_out; • Beware of code which merely sets the desired flag and so clears every other file status flag (amateur mistake - rookie): if (fcntl (fd, F_SETFL, O_NONBLOCK) < 0) WHICH SHOWS TO GO THAT REAL CODERS WRITE ASSEMBLY LANGUAGE - not effete high order languages.
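The get / OR / set sequence fits in one small function. The sketch below also shows the visible effect on a pipe: a read from an empty nonblocking pipe fails with EAGAIN instead of blocking. Both function names are hypothetical.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX declarations */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Turn on O_NONBLOCK while preserving the other file status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    flags |= O_NONBLOCK;
    return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}

/* Demonstration: returns the errno produced by reading an empty pipe whose
 * read end is nonblocking, 0 if the read unexpectedly succeeded, -1 on setup
 * failure. */
int empty_pipe_read_errno(void)
{
    int fd[2];
    char c;
    int err = -1;

    if (pipe(fd) < 0)
        return -1;
    if (set_nonblocking(fd[0]) == 0)
        err = (read(fd[0], &c, 1) < 0) ? errno : 0;
    close(fd[0]);
    close(fd[1]);
    return err;
}
```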
CSE 7348 - CLASS 12 • A few additional ‘gotchas’ in regard to reading/writing pipes and FIFOs. • Request more data than is available and only the data that is ready is returned (set up reads to handle fewer bytes than expected). • If the number of bytes to write is <= PIPE_BUF, the write is guaranteed to be atomic (data not intermixed with another writer’s data). • However if you exceed PIPE_BUF then there are no guarantees (race condition). • The setting of the O_NONBLOCK flag has no effect on the atomicity of writes to a pipe or FIFO; atomicity depends only on whether the request is <= PIPE_BUF. • When a pipe or FIFO is set nonblocking and the request is <= PIPE_BUF, all the requested bytes are transferred if the buffer can accommodate them. • If there is not enough room in the pipe/FIFO, then the write returns immediately with an error of EAGAIN (no data is accepted, since the write must be atomic).
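The all-or-nothing behaviour of small nonblocking writes can be observed directly. The sketch below fills a pipe in chunks of _POSIX_PIPE_BUF bytes (the minimum value POSIX allows for PIPE_BUF, so every chunk is within the atomic limit) until write reports EAGAIN. The function name is hypothetical and the total it returns depends on the system's pipe capacity.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX declarations */
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

/* Returns how many bytes a fresh pipe accepted before a nonblocking write
 * failed with EAGAIN, or -1 on any other error. */
long fill_pipe_until_eagain(void)
{
    int fd[2];
    if (pipe(fd) < 0)
        return -1;

    long total = -1;
    int flags = fcntl(fd[1], F_GETFL, 0);
    if (flags >= 0 && fcntl(fd[1], F_SETFL, flags | O_NONBLOCK) >= 0) {
        char chunk[_POSIX_PIPE_BUF];
        memset(chunk, 'x', sizeof chunk);
        total = 0;
        for (;;) {
            ssize_t n = write(fd[1], chunk, sizeof chunk);
            if (n < 0) {
                if (errno != EAGAIN)    /* EAGAIN just means "pipe is full" */
                    total = -1;
                break;
            }
            total += n;                 /* a chunk is written whole or not at all */
        }
    }
    close(fd[0]);
    close(fd[1]);
    return total;
}
```

Because no chunk is ever partially accepted, the total is always a whole multiple of the chunk size.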
CSE 7348 - CLASS 12 • If a write is attempted to a pipe/FIFO that no process has open for reading, the SIGPIPE signal is generated. • If the signal is neither caught nor ignored, the default action terminates the process. • If the signal is caught or ignored, the write returns an error of EPIPE. • SIGPIPE is a synchronous signal; which means it can be attributed to the thread which attempted the write. • Easiest way to handle it is to ignore the signal and accept the EPIPE error. • This is generally tricky stuff and should be handled with extreme care. Not for the beginner.
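A sketch of the "ignore SIGPIPE and accept EPIPE" approach. The helper name is hypothetical; it temporarily ignores SIGPIPE, writes to a pipe whose read end has been closed, and restores the previous disposition afterwards.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX declarations */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Returns the errno from writing to a pipe that has no reader (EPIPE is
 * expected), 0 if the write unexpectedly succeeded, -1 on setup failure. */
int write_to_readerless_pipe(void)
{
    int fd[2];
    struct sigaction ignore_sa, old_sa;

    if (pipe(fd) < 0)
        return -1;

    memset(&ignore_sa, 0, sizeof ignore_sa);
    ignore_sa.sa_handler = SIG_IGN;          /* otherwise the default action kills us */
    sigemptyset(&ignore_sa.sa_mask);
    if (sigaction(SIGPIPE, &ignore_sa, &old_sa) < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    close(fd[0]);                            /* no reader is left */
    int err = (write(fd[1], "x", 1) < 0) ? errno : 0;
    close(fd[1]);

    sigaction(SIGPIPE, &old_sa, NULL);       /* restore the caller's disposition */
    return err;
}
```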
CSE 7348 - CLASS 12 • The purpose of a FIFO (as opposed to a pipe) is its use by a long-running process (daemon) that is unrelated to its clients. • The daemon creates a FIFO and opens the FIFO for reading. • Some time later the client opens the FIFO for writing. • This does not address the situation where the server wants to send something back to the client. • Each client creates its own FIFO when it starts (the pathname contains its process ID). • Each client writes its request to the server’s well-known FIFO (the request contains the client process ID along with the pathname of the file that the client wants the server to open).
CSE 7348 - CLASS 12 • Figure 4.22 on page 61 of Volume 2 depicts a one server, multiple client model. • Interesting trick in regards the server’s FIFO: • Open this FIFO twice (once read-only, once write-only). • The readfifo fd is used to read each client request that arrives at the FIFO (dummyfd is never used). dummyfd = Open(SERV_FIFO, O_WRONLY, 0); • The FIFO is opened for writing so that when the last client terminates, the server’s read does not return 0 (EOF) just because no client currently exists. • Instead the server will block in the call to read, waiting for the next client request. • This effectively reduces the number of calls to open for the server’s well known FIFO.
CSE 7348 - CLASS 12 • The sample client/server on page 65 reveals why the atomicity of writes to pipes/FIFOs is important. • Assume two clients sending requests to the server. • Each client issues one write function call. • Each line is <= PIPE_BUF (typically 1024 to 5120 bytes). • Therefore the data in the FIFO will be the requests: • 1234 /etc/inet/ntp.conf • 9876 /etc/passwd (or the inverse). • The two requests will never appear intermixed in the FIFO.
CSE 7348 - CLASS 12 • The server in the example is an iterative server. • Iterates through the client requests, completing each client’s request before proceeding to the next client. • The alternative is the concurrent server, typically one child per client (fork creates a new child each time a request arrives). • Denial of service attacks: A client could tie up the server by sending a request line but never opening its FIFO for reading. • Handled by placing a timeout on certain operations.
CSE 7348 - CLASS 13 • Other forms of IPC. Process Tracing. • Unix provides a primitive form of IPC for tracing processes and debugging. • Process tracing consists of synchronizing the debugger process with the process to be traced and controlling the execution of the traced process. Pseudocode: if ( ( pid = fork() ) == 0) { ptrace (0,0,0,0); // the child is the traced proc exec ("process_to_be_traced_name_here"); } for (;;) { wait ( (int *) 0); read (input for tracing instructions); ptrace (cmd, pid, ...); }
CSE 7348 - CLASS 13 • In the example code: • the debugger spawns a child process, which invokes the ptrace system call. • As a result the kernel sets a trace bit in the child’s process table entry. • The child now execs the program being traced. • Example: a user debugging the program a.out, the child would exec a.out. • The kernel executes the exec call as usual, but notes that the trace bit is set and sends the child a ‘trap’ signal. • The kernel checks for signals when returning from the exec system call. • The kernel finds the trap signal it just sent itself. • The kernel now executes code for process tracing as a special case for handling signals. • Noting that the trace bit is set in its process table entry, the child awakens the parent from its sleep in the wait, enters a special trace state similar to the sleep state and does a context switch.
CSE 7348 - CLASS 13 • In the previous example the parent (debugger) process would have entered a user-level loop ‘waiting’ to be awakened by the traced process. • When the traced process awakens the debugger, the debugger returns from wait, reads user input commands and converts them to a series of ptrace calls to control the child (traced) process. • The ptrace syntax is: ptrace (cmd, pid, addr, data); where cmd specifies reading, writing, resuming execution, etc. pid is the process id of the traced process. addr is the virtual address to be read or written in the child process data is an integer value to be written.
CSE 7348 - CLASS 13 • The kernel uses a global trace data structure to transfer data between the two processes. (this data structure is locked to prevent other tracing processes from overwriting it). • cmd, addr, & data are written to this structure. • The kernel then awakens the child process and places it in the ready-to-run state. • The kernel then sleeps until the child responds. • The child awakens in kernel mode, performs the appropriate trace command, writes its reply to the trace data structure and then awakens the debugger. • Child may then either reenter trace state, or return from handling signals and resume execution. • When the debugger resumes execution, the kernel saves the return value, unlocks the trace data struct and returns to the user.
CSE 7348 - CLASS 13 • This discussion leads to requiring information in regards the Unix memory model and how processes exist in same. • A Unix process consists of three logical components: the text, the data and the stack. Any ‘shared memory’ is considered part of the data section. • The ‘text’ section contains the machine code. • Addresses in the text section include text addresses (for branch or subroutine calls) • data addresses for access to global data variables. • Stack addresses for access to variables local to a subroutine. • These addresses are NOT physical addresses. • Instead within the Unix model the compiler will generate addresses for a virtual address space.
CSE 7348 - CLASS 13 • The Unix platform memory management unit translates the virtual addresses to physical memory addresses. • In SVR the virtual address space is divided into regions which are logical constructs. • A region is a contiguous area of virtual address space. • This facilitates sharing and protection. • Typically text, data and stack form separate regions of a process. • This is heavily exploited by reentrant code; several processes may execute the same program. Therefore they share one copy of the text region. • Similarly several processes can access a shared-memory region.
CSE 7348 - CLASS 13 • In SVR the kernel maintains a region table. • The region table contains the mapping information to determine where each region is located in physical memory. • Each process contains a private per process region table (pregion). • The pregions are part of the process table entry. Each pregion entry points to a region table entry and records the starting virtual address of the region in that process. • The pregion entry for each process also contains a permission field that indicates the type of access allowed. • The region and pregion are very similar to the file table and inode structure in the file system. • Several processes can share parts of their address space by pointing their own private pregion entries at the same region.
CSE 7348 - CLASS 13 • A graphic for regions and pregions: [Figure: the pregion tables of Process X and Process Y, each listing text, data and stack entries with their virtual addresses, pointing into the region table (regions A, B and C).] The processes share region B at virtual address 8k and 4k respectively. If process X reads memory location 8k and process Y reads memory location 4k they read the identical memory location in region B.
CSE 7348 - CLASS 13 • Regions are independent of the memory management policy of the OS. (mm policy being those practices taken to ensure the processes share memory fairly). • Regions are also independent of whether segmented or paged memory is used. • Memory models, segmented, paged, demand paged are not specific to a Unix system implementation. • The kernel, regardless of the memory model employed, does manage the assignment of memory space to running processes.
CSE 7348 - CLASS 13 • Common properties of the SVR4 IPC package. • Messages are used by processes to send formatted data streams to arbitrary processes. • Shared memory allows processes to share parts of the virtual address space. • Semaphores allow processes to synchronize execution. • Implemented as a unit, each of these mechanisms: • contains a table whose entries describe all instances of the mechanism. • Each entry in this table contains a numeric key which is a user supplied name. • Each mechanism provides a ‘get’ system call to create a new entry or retrieve an extant one. The parameters to these calls include a key and various flags.
CSE 7348 - CLASS 13 • Common properties of SVR4 IPC (continued). • For each IPC mechanism, the kernel uses the following formula to find the index into the table of data structures from the descriptor (the value returned by the ‘get’ call, not the key): index = descriptor % (number of entries in the table). • Each IPC entry has a permissions structure that includes the user ID and group ID of the process that created the entry as well as a set of rwx permissions for user, group and other. • Each entry contains status information such as the process ID of the last process to update the entry (send / receive message, update shared memory, et cetera). • Each service provides a control call to query the status of an entry, to set status information, or to remove the entry from the table. • The above can only be accessed if the querying process has permission.
CSE 7348 - CLASS 13 • Messages: • Four system calls for messages: • msgget returns (creates if necessary) a message descriptor that designates a message queue for use in other system calls. msgqid = msgget(key, flag); • where msgqid is the returned descriptor, key is a user supplied integer, and flag is used to set options. • When msgget is called the kernel searches for an existing message queue with the given key; it returns that queue’s descriptor if one is found and creates a new queue only if none exists. • msgctl provides optional services to set and return parameters associated with a message descriptor (also remove descriptors). • msgsnd sends a message. • msgrcv receives a message.
CSE 7348 - CLASS 13 • SVR4 messages (continued): • syntax msgsnd (msgqid, msg, count, flag); • msgqid is the message descriptor, msg is a pointer to a struct consisting of a user-chosen integer type and a character array. • count gives the size of the data array of msg. • flag specifies the kernel response if the buffer space is exceeded. • The kernel checks that the sending process has write permission for the designated message descriptor. • The kernel also checks that the message length does not exceed the system limit. • Checks to see that the message queue does not contain too many bytes and that the message type is a positive integer.
CSE 7348 - CLASS 13 • Messages (continued) • If all tests are successful, the kernel allocates space for the message and copies the data from user space. • The kernel attaches a message header and places it on the end of the linked list for the message queue. • The message header has the type, size and a pointer to the message data as well as time stamps and the process ID of the sender. • The kernel then awakens any processes that were asleep waiting for messages to arrive on the queue. [Figure: queue headers pointing to linked lists of message headers, which point into the data area.]
CSE 7348 - CLASS 13 • Messages (continued) • receive syntax: count = msgrcv (id, msg, maxcount, type, flag); • id is the message descriptor • msg is the address of the user structure to contain the message. • maxcount is the size of the data array in msg. • type specifies the message type the user wants to read • flag specifies what to do if no messages are available. • count is the number of bytes returned to the user. • A process can receive a message of a certain type by setting the type parameter appropriately. • If it is a positive integer then the kernel returns the first message of that type. • If it is a negative integer, then the kernel finds the lowest type of all messages on the queue which is <= the absolute value of type and returns the first message of that type. • Example: if a queue contains types 5, 3, 2 and the request is for -3 then the message of type 2 is returned.
CSE 7348 - CLASS 13 • Messages (continued) • Messages are formatted as type-data pairs whereas file data is a byte stream. • The type prefix allows processes to select messages of a particular type. This is not a feature of a file data stream. • Processes can therefore extract messages of a certain type from the message queue in the order that they arrive (kernel will maintain the order). • Messages provide a more efficient way to transfer data between processes than a hacked together system which employs the file system. • A process can query the status of a message descriptor, set its status and remove a descriptor with; msgctl ( id, cmd, mstatbuf) // id is the message descriptor, cmd is the command and mstatbuf is the address of a user data struct containing control parameters and/or the results of the query.
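The type-selection rule can be tried out with a private queue. This is a sketch assuming a system where the System V message calls are available; struct demo_msg and both function names are made up for the example. Note that on current systems msgget takes permission bits in the flag, and IPC_PRIVATE asks for a fresh queue without choosing a key.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX/XSI declarations */
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

/* User-defined message layout: a long type followed by the data array. */
struct demo_msg {
    long mtype;
    char mtext[32];
};

static int send_text(int qid, long type, const char *text)
{
    struct demo_msg m;
    m.mtype = type;                          /* must be a positive integer */
    strncpy(m.mtext, text, sizeof m.mtext - 1);
    m.mtext[sizeof m.mtext - 1] = '\0';
    return msgsnd(qid, &m, strlen(m.mtext) + 1, 0);
}

/* Queue messages of types 5, 3, 2 (in that order), then ask msgrcv for
 * type -limit. Returns the type of the message received, or -1 if no
 * message qualified or a call failed. */
long lowest_type_at_most(long limit)
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid < 0)
        return -1;

    long got = -1;
    if (send_text(qid, 5, "five") == 0 &&
        send_text(qid, 3, "three") == 0 &&
        send_text(qid, 2, "two") == 0) {
        struct demo_msg m;
        /* IPC_NOWAIT: fail instead of sleeping when nothing qualifies. */
        if (msgrcv(qid, &m, sizeof m.mtext, -limit, IPC_NOWAIT) >= 0)
            got = m.mtype;
    }

    msgctl(qid, IPC_RMID, NULL);             /* remove the queue */
    return got;
}
```

With the queue holding types 5, 3 and 2, a request for -3 yields the type 2 message, while a request for -1 finds nothing because no queued type is <= 1.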
CSE 7348 - CLASS 13 • SVR4 Shared Memory: • Basic concept is that processes can communicate directly by sharing parts of their virtual address space and then reading and writing to this area. • The system calls are very similar to those of the messaging system. • shmget creates a new region of shared memory, or returns an existing one. shmid = shmget (key, size, flag); where size is the number of bytes in the region, key is a user supplied integer as in msgget, and flag carries the creation options (such as IPC_CREAT) and permission modes.
CSE 7348 - CLASS 13 • SVR4 shared memory (continued) • If the kernel, on an shmget, finds no extant entry, and the IPC_CREAT flag is set, the kernel will allocate a region data structure. • The kernel will save the permission modes, size and a pointer to the region table entry in the shared memory table. • A flag will be set in the shared memory table to indicate that no memory is associated with this region. • The kernel allocates memory (page tables etc) only when a process attaches the region to its address space. • There is also a flag set that indicates the region should NOT be freed when the last attached process exits. • This means that data in a shared memory area remains intact even when NO process includes it as part of its virtual address space.
CSE 7348 - CLASS 13 • SVR4 shared memory IPC (continued) • A process attaches a shared memory region to its address space with the shmat system call. • Syntax of shmat: virtaddr = shmat (id, addr, flags); • id is the descriptor returned by shmget. • addr is the virtual address where the user wants to attach the shared memory. • flags specify whether the region is read-only, and whether the kernel should round off the user-specified address. • The return value, virtaddr, is the virtual address where the kernel actually attached the region (which may be different than that requested).
CSE 7348 - CLASS 13 • SVR4 shared memory (continued). • If in a shmat a zero is specified as the addr argument then the kernel chooses a convenient address (recommended for portability). • The shared memory must NOT overlap other regions in the process virtual address space (overlays???). • Location of shared memory is very sensitive; typically stacks grow ‘upward’ therefore a good spot is immediately before the start of the stack region. • A process detaches a shared memory region from its virtual address space with the call: shmdt(addr); • addr is the virtual address returned by the shmat call. • The address is used, rather than the descriptor, so that a process can distinguish between several instances of a shared memory region that might be attached to its address space.
CSE 7348 - CLASS 13 • SVR4 shared memory. • A given shared memory region can be attached twice to the same process using different virtual addresses. • Write data to one and read from another. • Another technique is for a process to wait until the first location in shared memory contains a nonzero value and then read the shared memory. • The shmctl system call is used to query status and set parameters for the shared memory region. • Syntax: shmctl (id, cmd, shmstatbuf); • id identifies the shared memory table entry. • cmd specifies the type of operation. • shmstatbuf is the address of a user data struct that contains the status info of the shared memory table entry.
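The "attach twice, write through one address, read through the other" idea can be sketched as below, assuming the System V shared memory calls are available. The function name and the 4096-byte size are arbitrary choices for the example, and the kernel picks both attach addresses (addr argument of zero).

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX/XSI declarations */
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/* Attach one shared memory region at two addresses in the same process.
 * Returns 1 if data written through the first mapping is visible through
 * the second, 0 if not, -1 on error. */
int shm_double_attach(void)
{
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    char *a = shmat(id, NULL, 0);            /* NULL: let the kernel choose */
    char *b = shmat(id, NULL, 0);
    int result = -1;

    if (a != (char *)-1 && b != (char *)-1) {
        strcpy(a, "hello");                  /* write via the first mapping */
        result = (a != b && strcmp(b, "hello") == 0) ? 1 : 0;
    }

    if (a != (char *)-1)
        shmdt(a);                            /* detach by address, not by id */
    if (b != (char *)-1)
        shmdt(b);
    shmctl(id, IPC_RMID, NULL);              /* otherwise the region outlives us */
    return result;
}
```

The explicit IPC_RMID reflects the point made earlier: the region is not freed merely because the last attached process detaches or exits.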
CSE 7348 - CLASS 13 • SVR4 Semaphores. • The purpose of the semaphore system is to allow the processes to synchronize execution by performing a series of atomic operations on a set of semaphores. • Historically, before Dijkstra and the infamous ‘68 ACM conference, a process would create a lock file with the creat system call. The creat fails if the file already exists, therefore the process would assume that another process had already locked the resource. • Unfortunately the process doesn’t know when to try again; similarly lock files can get left behind when the system crashes or is rebooted.
CSE 7348 - CLASS 13 • SVR4 Semaphores (continued) • Dijkstra published the Dekker algorithm that describes an implementation of semaphores (‘68). • This paper specified two atomic operations: P and V. • The P operation decrements the value of a semaphore if its value is > 0; otherwise the process waits. • The V operation increments its value. • Because the operations are atomic, at most one P or V operation can succeed on a semaphore at a given time. • The semaphore calls in SVR4 are a generalization of Dijkstra’s P and V operations; • In SVR4 the increment and decrement can be by values greater than one.
CSE 7348 - CLASS 13 • SVR4 semaphores: • The kernel does all operations atomically; no other processes adjust the semaphore values until all operations are done. • If the kernel cannot do ALL the operations it does NOT do ANY. The requesting process will simply sleep until all can be done. • In SVR4 a semaphore consists of: • The value of the semaphore • The process ID of the last process to manipulate the semaphore. • The number of processes waiting for the semaphore value to increase. • The number of processes waiting for the semaphore value to equal 0. • The semget system call is used to create and gain access to a set of semaphores. • semctl provides various control operations. • semop manipulates the values of the semaphores.
CSE 7348 - CLASS 13 • SVR4 semaphores (continued) • semget syntax: id = semget ( key, count, flag); • key, flag and id are similar to those used for messages and shared memory. • The kernel allocates an entry that points to an array of semaphore structures with ‘count’ elements. • Two structures are created: the semaphore table entry and the semaphore array. The table entry holds the pointer to the semaphore array, the time of the last semop call, and the time of the last semctl invocation. • semop syntax: oldval = semop(id, oplist, count); • id is the descriptor returned by semget; oplist is a pointer to an array of semaphore operations, and count is the size of the array. • oldval is the value of the last semaphore operated on in the set before the operation was done.
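A sketch of P and V on a one-element semaphore set, assuming the System V semaphore calls are available. Two caveats: on current POSIX systems semop returns 0 on success and -1 on error rather than an old value, and the fourth semctl argument is a union the caller declares. The union is named demo_semun here (a made-up name) to avoid clashing with systems that already define union semun.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX/XSI declarations */
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Caller-declared argument type for semctl (hypothetical name). */
union demo_semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Initialise a semaphore to 1, do P, verify that a second nonblocking P
 * would block (EAGAIN), do V, and return the final value (1 expected).
 * Returns -1 if any step misbehaves. */
int semaphore_demo(void)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    union demo_semun arg;
    arg.val = 1;

    /* P: decrement by one; IPC_NOWAIT makes it fail rather than sleep. */
    struct sembuf p = { .sem_num = 0, .sem_op = -1, .sem_flg = IPC_NOWAIT };
    /* V: increment by one. */
    struct sembuf v = { .sem_num = 0, .sem_op = 1, .sem_flg = 0 };

    int result = -1;
    if (semctl(id, 0, SETVAL, arg) == 0 &&
        semop(id, &p, 1) == 0 &&                      /* 1 -> 0 */
        semop(id, &p, 1) == -1 && errno == EAGAIN &&  /* value is 0: would block */
        semop(id, &v, 1) == 0)                        /* 0 -> 1 */
        result = semctl(id, 0, GETVAL);

    semctl(id, 0, IPC_RMID);                          /* remove the set */
    return result;
}
```

Passing an array of several sembuf entries to one semop call is how the all-or-nothing behaviour described earlier is requested.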