Other filesystem system calls

• pipe
• dup
• mount
• umount
• link
• unlink
• system
• popen
Operating System Design

Unnamed and named pipes
[Figure: a pipe drawn as a buffer with slots 0 to 9, a write pointer and a read pointer.]
• Pipes and FIFOs (also known as named pipes) provide a unidirectional interprocess communication channel.
• The difference between them is the way in which they are created and opened.
• I/O on pipes and FIFOs has exactly the same, well-known bounded-buffer producer-consumer semantics.

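Unnamed pipe
[Figure: process tree with processes A, B, C, D and E. Process A calls pipe(); the figure marks the processes that share the pipe and those that cannot share it.]
• An unnamed pipe can be used only by the process that created it and by the descendants that inherit its file descriptors.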

Unnamed pipe
pipe(fdptr);
• fdptr is a pointer to an array of two ints that will be filled with the two file descriptors used to read (fdptr[0]) and write (fdptr[1]) the unnamed pipe.
• An inode is assigned to the new pipe.
• Two entries are allocated in the user file descriptor table and two in the file table.
• The inode reference count indicates how many times the pipe has been opened, for reading and for writing (initially 2).
• The kernel stores the read and write reference counts in each entry of the file table.
• The inode also holds the offsets for the next read and the next write (they cannot be modified by means of lseek).
• Storing the offsets in the inode rather than in the file table allows more than one process to share the pipe for both reading and writing, because every process updates the same offsets.

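read and write
    #include <unistd.h>

    char string[] = "hello";

    int main(void)
    {
        char buf[1024];
        char *cp1, *cp2;
        int fds[2];

        /* copy "hello" into buf (the terminating '\0' is not copied) */
        cp1 = string;
        cp2 = buf;
        while (*cp1)
            *cp2++ = *cp1++;

        if (pipe(fds) == -1)
            return 1;
        /* the same process writes to and reads from the pipe, forever */
        for (;;) {
            write(fds[1], buf, 6);
            read(fds[0], buf, 6);
        }
    }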

Pipe open
• A process that opens a named pipe (FIFO) for reading is suspended until another process opens it for writing (and vice versa).
• A FIFO can also be opened with the flags
  • O_NONBLOCK or O_NDELAY
  • O_ASYNC
• Setting the O_ASYNC flag for the read end of a pipe causes a signal (SIGIO by default) to be generated when new input becomes available on the pipe.
• Non-blocking I/O can also be enabled after the open, by using the fcntl F_SETFL operation to set the O_NONBLOCK open file status flag.

Named pipe
• Example of an open that blocks the calling process until another process opens the other end of the named pipe:
  • npipe-r.c
  • npipe-w.c

Pipe write (n <= PIPE_BUF)
• Writes of up to PIPE_BUF bytes (4096 bytes on Linux) are atomic.
• O_NONBLOCK disabled
  • All n bytes are written atomically.
  • write may block if there is no room for n bytes to be written immediately.
• O_NONBLOCK enabled
  • If there is room to write n bytes to the pipe, write succeeds immediately, writing all n bytes.
  • Otherwise it fails, with errno set to EAGAIN.

Pipe write (n > PIPE_BUF)
• Writes of more than PIPE_BUF bytes may be non-atomic.
• O_NONBLOCK disabled
  • The write is non-atomic: the data given to write may be interleaved with writes by other processes.
  • The write blocks until n bytes have been written.
• O_NONBLOCK enabled
  • If the pipe is full, write fails, with errno set to EAGAIN.
  • Otherwise, a "partial write" of up to n bytes may occur, and these bytes may be interleaved with writes by other processes.

Pipe close
• If all file descriptors referring to the write end of a pipe have been closed, an attempt to read from the pipe returns 0 (end-of-file).
• If all file descriptors referring to the read end of a pipe have been closed, a write causes a SIGPIPE signal to be generated for the calling process.
• If the calling process is ignoring this signal, write fails with the error EPIPE.
• An application that uses pipe and fork should close unnecessary duplicate file descriptors, to ensure that end-of-file and SIGPIPE/EPIPE are delivered when appropriate.

Other examples of pipe use
• Client-server with named pipe
  • pipe1.c
  • client_fifo.c
  • server-fifo.c
• Use of pipe, dup, exec, getenv
  • pipe2.c

Other examples of pipe use
• Client-server using unnamed pipes (pipe1.c, server1.c, client1.c)
[Figure: PARENT and CHILD connected by two pipes, PIPE1 and PIPE2, whose ends are labelled 1 and 0.]

dup and dup2
newfd = dup(fd);
• Copies the pointer stored in entry fd into the first free entry of the user file descriptor table and returns the index of that entry.
newfd = dup2(fd1, fd2);
• Copies the pointer stored in entry fd1 into entry fd2 of the user file descriptor table; if fd2 was open, it is closed first.

Comparison between open and dup
[Figure: user file descriptor table (entries 0 to 6), file table and inode table, with the reference count (C=...) of each entry; the inodes shown are /etc/passwd and local.]

Other examples of pipe use
• Use of pipe, dup, exec, getenv
  • pipe2.c

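dup example
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int i, j;
        char buf1[512], buf2[512];

        i = open("/etc/passwd", O_RDONLY);
        j = dup(i);                     /* i and j share one file table entry, hence one offset */
        read(i, buf1, sizeof(buf1));    /* first 512 bytes */
        read(j, buf2, sizeof(buf2));    /* next 512 bytes: the offset is shared */
        close(i);
        read(j, buf2, sizeof(buf2));    /* still valid: j keeps the file open */
        return 0;
    }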

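Output redirection
    fd = open("file_output", O_CREAT | O_WRONLY, 0644);  /* O_CREAT requires a mode argument */
    close(1);                      /* free descriptor 1 (standard output) */
    dup(fd);                       /* the lowest free descriptor is 1: stdout now refers to file_output */
    close(fd);
    write(1, buf, sizeof(buf));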

mount
mount(dev pathname, dir pathname, options);
• dev pathname is
  • the name of the device special file corresponding to the disk partition formatted with a file system, or
  • a directory name
• dir pathname is the directory (mount point), in the current directory tree, where the file system will be mounted.
• options indicates the mode of mounting (e.g. read-only).

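mount
[Figure: the root file system (/ with bin, etc and usr, containing cc, date, sh, getty and passwd) and the file system on /dev/dsk1 (/ with bin, include and src, containing stdio.h, awk, banner, yacc and uts), which is mounted on a directory of the root file system.]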

Mount table
[Figure: relationship between the inode table, the mount table and a buffer.]

mount
    procedure mount
    input:  file name of a block special file
            directory name of the mount point
            options (read-only)
    output: none
    {
        if (not superuser)
            return(error);
        get the inode of the block special file (namei);
        make legality checks;
        get the inode for the "mounted on" directory name (namei);
        if (not a directory or reference count > 1) {
            release the inodes (procedure iput);
            return(error);
        }

mount (continued)
        find a free entry in the mount table;
        open the block device;
        get a free buffer from the buffer cache (getblk);
        read the superblock into the buffer;
        initialize the superblock fields;
        get the root inode of the new file system (iget) and store it in the mount table;
        mark the inode of the "mounted on" directory as a mount point;
        release the inode of the special file (iput);
        unlock the in-core inode of the mount point directory;
    }

umount
umount(special file name);
• Before unmounting a file system, the kernel checks that none of its files is still in use (open): it searches the inode table for inodes whose device field equals the device of the file system being unmounted.

Virtual File System

link
[Figure: directory tree rooted at / showing usr, src, include, uts and sys, with the files realfile.h, inode.h and testfile.h and the two links created by the calls below.]
link(source name, target name);
    link("/usr/src/uts/sys", "/usr/include/sys");
    link("/usr/include/realfile.h", "/usr/src/uts/sys/testfile.h");

unlink
unlink(pathname);
• Deletes a name from the file system.
• If that name was the last link to a file and no process has the file open, the file is deleted (reference count and link count = 0).
• If the name was the last link to a file, but a process still has the file open (reference count > 0), the file remains in existence until the last file descriptor referring to it is closed.
• If the name referred to a symbolic link, the link is removed.
• If the name referred to a socket, FIFO or device, the name is removed, but processes that have the object open may continue to use it.

unlink
unlink(pathname);
• The kernel releases the file blocks in this order:
  • Direct blocks
  • Direct blocks pointed to by indirect blocks
  • Indirect blocks
• Sets to 0 the block entries in the inode
• Sets to 0 the file size
• Updates the disk copy of the inode

unlink
    get the inode of the file that must be removed (iget);
    update the parent directory;
    set to 0 the status field of the inode of the erased file;
    release the inode of the parent directory (iput);
    decrement the file link count;
    release the file inode (iput);    /* iput tests the link count: if it is zero, it calls free and ifree */

unlink and close
• A process can unlink a file while it, or another process, still has the file open.
• Any process that has the file open can still access it: since open incremented the reference count of the file's inode, the kernel does not remove the data blocks and the inode, it just decrements the link count.
• When the last close is executed, the reference count becomes 0 and close calls free and ifree to release the data blocks and the inode.

Example with unlink, stat and fstat
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int fd;
        ssize_t n;
        char buf[1024];
        struct stat statbuf;

        if (argc != 2)
            exit(-1);
        if ((fd = open(argv[1], O_RDONLY)) == -1)
            exit(-1);
        unlink(argv[1]);                       /* unlink of the file just opened */
        if (stat(argv[1], &statbuf) == -1)     /* stat by name */
            printf("stat %s fails\n", argv[1]);
        else
            printf("stat %s succeeds !!!\n", argv[1]);
        if (fstat(fd, &statbuf) == -1)         /* stat through fd */
            printf("fstat %s fails\n", argv[1]);
        else
            printf("fstat %s succeeds\n", argv[1]);
        fflush(stdout);
        while ((n = read(fd, buf, sizeof(buf))) > 0)
            write(1, buf, n);                  /* print up to 1 KB at a time */
        return 0;
    }

Advisory / Mandatory locking
• Advisory locking (flock in BSD, fcntl in POSIX)
  • read and write are protected only by an access protocol that the cooperating processes agree to follow
• Mandatory locking (non-POSIX)
  • kernel-managed locking
• Behaviour similar to the Readers & Writers problem
  • shared or read lock
    • excludes other writes
  • exclusive or write lock
    • excludes other reads and writes

File locking
• A set of processes open a file that stores a sequence number.
• The processes
  • read the sequence number
  • print their process identifier followed by the sequence number
  • increment and store the sequence number

File locking example
    /* Concurrent processes updating the same file */
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/file.h>

    #define SEQFILE "seqno"
    #define MAXBUFF 100

    /* err_sys() is an error-reporting helper that is not shown on the slides */
    int main(void)
    {
        int fd, i, n, pid, seqno;
        char buff[MAXBUFF + 1];

        pid = getpid();
        if ((fd = open(SEQFILE, O_RDWR)) < 0)
            err_sys("can't open %s", SEQFILE);

File locking example (continued)
        for (i = 0; i < 10; i++) {
            my_lock(fd);                          /* lock the file */
            lseek(fd, 0L, SEEK_SET);              /* rewind before the read */
            if ((n = read(fd, buff, MAXBUFF)) <= 0)
                err_sys("read error");
            buff[n] = '\0';                       /* terminate for sscanf */
            if ((n = sscanf(buff, "%d\n", &seqno)) != 1)
                err_sys("sscanf error");
            printf("pid = %d, seq# = %d\n", pid, seqno);
            seqno++;                              /* increment the sequence number */
            sprintf(buff, "%03d\n", seqno);
            n = strlen(buff);
            lseek(fd, 0L, SEEK_SET);              /* rewind before the write */
            if (write(fd, buff, n) != n)
                err_sys("write error");
            my_unlock(fd);                        /* unlock the file */
        }
    }

No locking errors
    void my_lock(int fd)   { return; }
    void my_unlock(int fd) { return; }
Output:
    pid = 692, seq# = 0
    pid = 692, seq# = 1
    pid = 693, seq# = 0
    pid = 692, seq# = 2
    pid = 692, seq# = 3
    pid = 693, seq# = 1
    pid = 692, seq# = 4
    pid = 692, seq# = 5
    pid = 693, seq# = 2
    pid = 692, seq# = 6
    pid = 692, seq# = 7
    pid = 693, seq# = 2
    pid = 692, seq# = 6
    pid = 692, seq# = 7
    pid = 693, seq# = 3
    pid = 693, seq# = 4
    pid = 693, seq# = 5
    pid = 693, seq# = 6
    pid = 693, seq# = 7
    pid = 693, seq# = 8

BSD file locking operations
    LOCK_SH    read (shared lock)
    LOCK_EX    write (exclusive lock)
    LOCK_UN    unlock
    LOCK_NB    non-blocking

BSD 4.3 solution (flock)
    /* BSD 4.3 */
    #include <sys/file.h>
    #include <unistd.h>

    void my_lock(int fd)
    {
        lseek(fd, 0L, SEEK_SET);
        if (flock(fd, LOCK_EX) == -1)
            err_sys("can't LOCK_EX");
    }

    void my_unlock(int fd)
    {
        if (flock(fd, LOCK_UN) == -1)      /* flock takes exactly two arguments */
            err_sys("can't LOCK_UN");
    }
Output:
    pid = 1165, seq# = 0
    pid = 1165, seq# = 1
    pid = 1165, seq# = 2
    pid = 1165, seq# = 3
    pid = 1165, seq# = 4
    pid = 1164, seq# = 5
    pid = 1164, seq# = 6
    pid = 1165, seq# = 7
    pid = 1164, seq# = 8
    pid = 1165, seq# = 9
    pid = 1164, seq# = 10
    pid = 1165, seq# = 11
    pid = 1164, seq# = 12
    pid = 1165, seq# = 13
    pid = 1164, seq# = 14
    pid = 1165, seq# = 15
    pid = 1164, seq# = 16
    pid = 1164, seq# = 17
    pid = 1164, seq# = 18
    pid = 1164, seq# = 19

Advisory locking
int fcntl(int fd, int cmd, struct flock *lock);
• F_GETLK, F_SETLK and F_SETLKW are used to acquire, release, and test for the existence of record locks.
    struct flock {
        short l_type;    /* Type of lock: F_RDLCK, F_WRLCK, F_UNLCK */
        short l_whence;  /* SEEK_SET, SEEK_CUR, SEEK_END */
        off_t l_start;   /* Starting offset for lock */
        off_t l_len;     /* Number of bytes to lock */
        pid_t l_pid;     /* PID of process blocking the lock (F_GETLK only) */
    };
• Bytes past the end of the file may be locked, but not bytes before the start of the file.
• Specifying 0 for l_len has a special meaning: lock all bytes from the location specified by l_whence and l_start through to the end of the file, no matter how large the file grows.

Record locking: F_SETLK, F_SETLKW
• F_SETLK and F_SETLKW acquire a lock (when l_type is F_RDLCK or F_WRLCK) or release a lock (when l_type is F_UNLCK).
• F_SETLK: if a conflicting lock is held by another process, the call returns -1 and sets errno to EACCES or EAGAIN.
• F_SETLKW: if a conflicting lock is held by another process, the call waits until the lock can be acquired.

Record locking: F_GETLK
• On input to this call, lock describes a lock we would like to place on the file.
• If the lock could be placed, fcntl does not actually place it, but returns F_UNLCK in the l_type field of lock and leaves the other fields of the structure unchanged.
• If one or more incompatible locks would prevent this lock from being placed, fcntl returns details about one of these locks in the l_type, l_whence, l_start and l_len fields of lock, and sets l_pid to the PID of the process holding that lock.

Record locking
• Record locks are automatically released when the process terminates or when it closes any file descriptor referring to a file on which locks are held.
• A process can therefore lose its locks on a file when, for example, a library function it calls decides to open, read and close the same file.
• Record locks are not inherited by a child created via fork, but are preserved across an execve.
• Because of the buffering performed by the stdio library, avoid using record locking together with stdio functions; use read and write instead.

POSIX file and record locking
#include <unistd.h>
int lockf(int fd, int cmd, off_t len);
• On Linux, lockf is just an interface on top of fcntl locking.

Mandatory locking (non-POSIX)
• Mandatory locks are enforced for all processes.
• If a process tries to perform an access (e.g. read or write) on a file region that holds an incompatible mandatory lock, the result depends on whether the O_NONBLOCK flag is enabled for its open file description.
  • If the O_NONBLOCK flag is not enabled, the system call blocks until the lock is removed or converted to a mode that is compatible with the access.
  • If the O_NONBLOCK flag is enabled, the system call fails with the error EAGAIN or EWOULDBLOCK.
• To make use of mandatory locks, mandatory locking must be enabled both on the file system that contains the file to be locked and on the file itself.
  • Mandatory locking is enabled on a file system using the "-o mand" option of mount, or the MS_MANDLOCK flag of the mount system call.
  • Mandatory locking is enabled on a file by disabling group execute permission on the file and enabling the set-group-ID permission bit (octal 02000).

Record locking examples
    flock [-h] [-s start] [-l len] [-w|-r] filename
      -h         print this help
      -s start   region starting byte
      -l len     region length (0 means the whole file)
      -w         write lock
      -r         read lock
      -b         block when locking is impossible
      -f         enable BSD semantics

Record locking examples
• flock -r flock.c
• flock -r flock.c
• flock -w flock.c
• flock -w flock.c
• flock -w -s0 -l10 flock.c
• flock -r -s0 -l10 flock.c
• flock -w -s5 -l15 flock.c
• flock -w -s11 -l15 flock.c
• flock -r -s10 -l20 flock.c

Blocking record locking
• flock -r -b -s0 -l10 flock.c
• flock -w -s0 -l10 flock.c
• Warning: BSD and POSIX file locking structures are independent
  • flock -r -b -s0 -l10 flock.c
  • flock -f -w flock.c (BSD)

Use of link for locking
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define LOCKFILE "seqno.lock"

    void my_lock(int fd)
    {
        int tempfd;
        char tempfile[30];

        sprintf(tempfile, "LCK%d", getpid());      /* per-process temporary name */
        if ((tempfd = creat(tempfile, 0444)) < 0)
            err_sys("can't creat temp file");
        close(tempfd);
        /* link fails with EEXIST while another process holds the lock */
        while (link(tempfile, LOCKFILE) < 0) {
            if (errno != EEXIST)
                err_sys("Link error");
            sleep(1);
        }
        if (unlink(tempfile) < 0)
            err_sys("Unlink error for temp file");
    }

    void my_unlock(int fd)
    {
        if (unlink(LOCKFILE) < 0)                  /* removing the lock file releases the lock */
            err_sys("Unlink error for LOCKFILE");
    }

tmpfile and mkstemp
#include <stdio.h>
FILE *tmpfile(void);
• Opens a unique temporary file in binary read/write (w+b) mode.
• The file is automatically deleted when it is closed or when the program terminates.
#include <stdlib.h>
int mkstemp(char *template);
• Generates a unique temporary file name from template. The last six characters of template must be XXXXXX; they are replaced with a string that makes the file name unique.
• The file is then created and opened for reading and writing, with permissions 0600; mkstemp returns its file descriptor.
• template must be declared as a character array, not as a string constant, because mkstemp modifies it.
• The file is opened with the O_EXCL flag of open, which guarantees that the caller is the process that creates the file.

mkstemp
    #include <stdlib.h>

    char template[] = "/tmp/fileXXXXXX";
    int fd;

    fd = mkstemp(template);

system
    /* creates a directory */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAXLINE 1024

    int main(void)
    {
        char line[MAXLINE], command[MAXLINE + 10];

        if (fgets(line, MAXLINE, stdin) == NULL)
            err_sys("filename read error");
        line[strcspn(line, "\n")] = '\0';      /* drop the newline kept by fgets */
        sprintf(command, "mkdir %s", line);    /* the name is passed to the shell unquoted */
        if (system(command) != 0)
            err_sys("system error");
        exit(0);
    }
