Unix Programming Environment Part 3-3 File Systems in Unix
Prepared by Xu Zhenya ( xzy@buaa.edu.cn )
Agenda
• 1. Hierarchical Architecture & Tree-structured Name Space
  - File, Directory, Hard Link, Mountable File System
• 2. Access Control ( Permission )
• 3. Implementation ( layout )
• 4. File, Directory and inode
• 5. Special Files
  - Symbolic Link
  - Device files, pipe & socket, etc.
• 6. System Data Files and Information
• 7. Commands Summary
• Textbook: Chapter 2, Chapter 1 ( 1.2 & 1.3 )
Tree-structured Name Space
• Path components are separated by a slash ( "/" ):
  - /usr/bin
• Physical Disk, Logical Disk, File System
File ( 1 ) - Types
• Everything visible to the user in a UNIX system ( every system resource ) can be represented as a file in the file system, including processes and network connections.
• UNIX stores information in byte-oriented files.
• You can view the different types of files with ls -l:
  $ ls -ld /home /dev/null /etc/passwd
  $ ls -l /dev/hda1
  $ ls -l /etc/X11/X
Types of normal files
• All files under UNIX just contain bytes of data.
• A magic number, however, identifies the type of a file: a particular byte ( or two or three ) at a known position is tested against known values. Magic numbers, tests and file types are described in the magic file, /usr/share/magic ( Linux ).
• You can determine the file type using the file command.
• Examples:
  $ file Image43.gif disks.jpg todo two.html
  $ file /bin/ls
  $ file /arm/system/rootfs/gdbserver
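The magic-number test can be imitated in a few lines of shell. This is only a sketch of the idea, not how file(1) is implemented: the helper name sniff_type and the four signatures it knows are chosen for illustration, whereas the real magic file holds far more tests.

```shell
# sniff_type FILE: guess a file's type from its first four bytes.
# Hypothetical helper for illustration; file(1) consults the magic file instead.
sniff_type() {
    # Dump the first 4 bytes as one contiguous hex string, e.g. 7f454c46.
    sig=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        7f454c46) echo "ELF executable or object" ;;
        47494638) echo "GIF image" ;;
        ffd8ff*)  echo "JPEG image" ;;
        2321*)    echo "script (starts with #!)" ;;
        *)        echo "unknown" ;;
    esac
}

# Try it on two small files created on the spot.
tmp=$(mktemp -d)
printf 'GIF89a' > "$tmp/pic"
printf '#!/bin/sh\necho hi\n' > "$tmp/run"
sniff_type "$tmp/pic"
sniff_type "$tmp/run"
```

The file name plays no part in the decision: "pic" has no .gif extension and is still recognised from its leading bytes.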
File Attributes
• The information UNIX stores about a file ( the file's metadata ) includes:
  - location of the data: the device and also pointers to the data blocks
  - filename: up to 255 characters; avoid * ? $ ' " \ - ; "/" cannot be used
  - user and group who own the file: stored as UID and GID
  - who can do what with the file ( permissions ): explained on the coming slides
  - how big the file is: size in bytes. The maximum depends on the file system and its block size; for Linux ext2 it is about 2 TB ( terabytes ) with 4 KB blocks.
  - when it was last modified, last accessed, or had its inode changed
  - how many links point to it: also discussed below
• All of these except the filename are stored in the i-node; the name is kept in the directory entry that refers to the i-node.
• Attributes can be viewed using:
  - ls -l
  - stat: a system call ( and a command ) to fetch a file's attributes
• Example: $ ls -ld / /dev
  - What type of files are / and /dev?
  - How come /dev is bigger than /?
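A short sketch of reading some of these attributes from the shell. The helper show_attrs is hypothetical, and the field positions assume the usual long-listing layout of ls -ldi (inode, mode, links, owner, group, size, date, name).

```shell
# show_attrs FILE: print a few i-node attributes parsed from "ls -ldi".
# Assumed field order: inode mode links owner group size month day time name
show_attrs() {
    ls -ldi "$1" | awk '{ print "inode=" $1, "mode=" $2, "links=" $3, "owner=" $4, "size=" $6 }'
}

tmp=$(mktemp -d)
printf 'hello' > "$tmp/greeting"   # 5 bytes of data
show_attrs "$tmp/greeting"         # a regular file
show_attrs "$tmp"                  # a directory: compare its size and link count
```

Note that the name is passed in by the caller and is not among the values read back from the i-node.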
File Systems in Modern Unix Systems ( figure: layers of the implementation, top to bottom )
• System calls: open, read, mount, sync
• VFS: dcache, file, inode, super block
• Concrete file system ( XXX-FS ): XXX_open(), XXX_read_inode(), XXX_lookup(), XXX_read(), XXX_fsync(), XXX_delete_inode(), XXX_mkdir(), XXX_unlink(), XXX_create(), XXX_write_super()
• BUFFER CACHE: getblk(), bread(), wait_on_buffer()
File Systems in Modern Unix Systems ( figure: data and meta-data paths )
• File system data access: write, read, mmap
• File system meta-data access: super block, bitmap, block group
• Caches: directory entry ( name ) cache, inode cache, buffer cache, page cache
• Below the caches: I/O
File Protection
• UNIX must protect files and restrict access to certain files. Examples:
  - /etc/passwd: writable only by the root user
  - the high score file for a game
• UNIX achieves this by
  - specifying three valid file operations: read, write and execute
  - identifying users and user groups
    /etc/passwd, /etc/group, /etc/shadow
    the super user: root
    process, user & user group: associated at login
  - dividing users into three categories
    user - the person who owns the file
    group - the group that owns the file
    other - everybody else
  - allowing the owner to specify the valid operations for each file
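File Operations and Permission (1)
• Notes: read, write and execute operate slightly differently on files and directories.
  - On a file: read its contents, modify its contents, run it as a program.
  - On a directory: list its entries, create or delete entries, traverse it ( cd into it or reach files below it ).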
setuid/setgid and sticky bit
• Look at the "/etc/passwd" or "/etc/shadow" file:
  -r-------- 1 root root 718 Jul 24 19:32 /etc/shadow
• setuid / setgid bit
  - Whether a process can write to a file depends on its effective uid/gid ( not the real uid/gid ).
  - When a program file with the setuid or setgid bit set is executed, the effective uid or gid of the process becomes that of the file's owner or group.
  - Used when you want to provide restricted access => a flaw in such a program may become a security problem.
  - /etc/shadow can be modified by any user, but only through the setuid "passwd" command:
    -r-sr-xr-x 1 root bin 15613 Apr 28 1998 /usr/bin/passwd
• "Sticky" bit
  - A very long time ago, when Unix ran on machines with 64K of memory, the bit kept a program's text in the swap area.
  - Some UNIX systems now use the bit to change the behavior of directories.
  - For example, set the mode of /tmp so that any user can create or delete her own files but cannot delete other users' files:
    drwxrwxrwt 11 root root 24576 Oct 13 17:34 /tmp
  - For /tmp, which root owns, the sticky bit must be set as root.
  - On older systems setting the SGID bit resulted in the same behavior.
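The s and t letters in the listings above can be picked out mechanically. A minimal sketch, assuming the standard ten-character mode field printed by ls -ld; the helper name special_bits is made up for this example.

```shell
# special_bits PATH: name the special mode bits visible in "ls -ld" output.
# Position 4 of the mode field shows setuid, position 7 setgid, position 10 sticky.
special_bits() {
    mode=$(ls -ld "$1" | cut -c1-10)
    out=""
    case "$mode" in ???[sS]*)       out="$out setuid" ;; esac
    case "$mode" in ??????[sS]*)    out="$out setgid" ;; esac
    case "$mode" in ?????????[tT]*) out="$out sticky" ;; esac
    echo "${out# }"
}

tmp=$(mktemp -d)
mkdir "$tmp/shared"
chmod 1777 "$tmp/shared"      # world-writable plus sticky, like /tmp
touch "$tmp/prog"
chmod 4755 "$tmp/prog"        # setuid: runs with the owner's effective uid
special_bits "$tmp/shared"
special_bits "$tmp/prog"
```

The leading octal digit carries the special bits: 4 for setuid, 2 for setgid, 1 for sticky.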
Numeric and Symbolic Permissions
• UNIX actually stores permissions as numbers, but humans generally don't handle numbers well. Commands such as ls and stat therefore display them in symbolic form.
Permissions (2)
• Conversion from symbolic to numeric
  - split the symbols into the three user categories
    user - rwx
    group - r--
    other - r-x
  - replace the symbols with their numeric equivalents and add
    user - rwx = 4 + 2 + 1 = 7
    group - r-- = 4
    other - r-x = 4 + 1 = 5
  - bring them together to form the numeric permissions
    rwxr--r-x = 745
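The three steps above can be sketched as a small shell function. The name sym_to_octal is hypothetical, and the sketch handles only the nine plain rwx characters, not the s or t letters.

```shell
# sym_to_octal PERMS: convert nine symbolic characters (e.g. rwxr--r-x)
# into the three-digit numeric form by adding r=4, w=2, x=1 per triple.
sym_to_octal() {
    perms=$1
    result=""
    for start in 1 4 7; do
        # Step 1: split off the triple for user, group or other.
        triple=$(printf '%s' "$perms" | cut -c"$start"-$((start + 2)))
        # Step 2: replace each symbol with its value and add.
        digit=0
        case "$triple" in r??) digit=$((digit + 4)) ;; esac
        case "$triple" in ?w?) digit=$((digit + 2)) ;; esac
        case "$triple" in ??x) digit=$((digit + 1)) ;; esac
        # Step 3: bring the digits together.
        result="$result$digit"
    done
    echo "$result"
}

sym_to_octal rwxr--r-x   # 745
sym_to_octal rw-r--r--   # 644
```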
Permissions (3)
• Changing file permissions and ownership
  - chmod: change the permissions of a file. Only the owner of the file and root can use it.
  - chgrp: change the group owner of a file. You can only change it to a group you belong to.
  - chown: change the user owner of a file. On most systems only root can use this.
chmod command
• chmod [-Rfv] operation files
  - -R : optional; tells the command to descend directories recursively and change permissions as it goes.
  - operation : specifies how to change the permissions of the file. Can use symbolic or numeric permissions.
  - files : the list of files whose permissions are to be changed.
• Types of operation
  - Numeric: permissions specified in numeric form
    chmod 770 my.file
  - Symbolic: chmod [-Rfv] [augo] [+-=] [rwxst] file [file ...]
    who : which category of user to change
      u - user, g - group, o - others, a - all
    op : how to change the permissions
      + : add, - : subtract, = : set
    permission : symbolic permissions
• Examples
  - chmod u+rwx temp.dat
  - chmod go-rwx temp.dat
  - chmod -R a-rwx /etc   ( syntax example only: running it would make /etc unusable )
  - chmod 770 temp.dat
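The examples above can be tried safely on a scratch file. A minimal sketch: the helper perm_string and the temporary directory are assumptions of this example, and temp.dat is the name used on the slide.

```shell
# perm_string FILE: the ten-character type/permission field from "ls -ld".
perm_string() { ls -ld "$1" | cut -c1-10; }

tmp=$(mktemp -d)
f="$tmp/temp.dat"
touch "$f"
chmod 644 "$f"            # known starting point: -rw-r--r--
chmod u+rwx "$f"          # "+" adds: the owner now has rwx
chmod go-rwx "$f"         # "-" subtracts: group and others lose everything
perm_string "$f"          # -rwx------
chmod 770 "$f"            # numeric form: rwx for owner and group
perm_string "$f"          # -rwxrwx---
chmod g=r,o= "$f"         # "=" sets exactly the listed bits
perm_string "$f"          # -rwxr-----
```

Symbolic operations change only the bits they name, while the numeric form always replaces all nine bits at once.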
umask command
• The mode of a newly created file is the mode requested by the program, bitwise ANDed with the bitwise complement of the umask. Bits that are set in the umask correspond to permissions that are not assigned to newly created files.
• Meaning of each umask digit:
  - 0 all permissions granted
  - 1 no execute
  - 2 no write
  - 3 no write or execute
  - 4 no read
  - 5 no read or execute
  - 6 no read or write
  - 7 no permissions at all
• The most common umask values are:
  - 002 others: no write
  - 022 group + others: no write
  - 027 group: no write, others: none
  - 077 group + others: none
• Example: 0666 & ~022 = 0644 = rw-r--r--, i.e. open() called with mode 0666 under umask 022, the default setting
• The requested mode is passed to:
  extern int open (__const char *__file, int __oflag, ...);
  extern int creat (__const char *__file, __mode_t __mode);
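The AND-with-complement rule is easy to check with shell arithmetic. A minimal sketch: apply_umask is a hypothetical helper, and the second half relies on touch requesting mode 0666 when it creates a file.

```shell
# apply_umask MODE MASK: the mode a new file receives when a program
# requests MODE (octal) under umask MASK (octal), i.e. MODE & ~MASK.
apply_umask() {
    printf '%03o\n' $(( 0$1 & ~0$2 & 0777 ))
}

apply_umask 666 022   # 644
apply_umask 777 027   # 750
apply_umask 666 077   # 600

# The same effect observed on a real file. The subshell keeps the umask
# change from leaking into the rest of the session.
tmp=$(mktemp -d)
( umask 027; touch "$tmp/new" )
ls -l "$tmp/new" | cut -c1-10     # -rw-r-----
```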
Hard Link
• UNIX allows you to give one file more than one name. A hard link is a new filename which points to an existing i-node.
• A directory is simply a file which contains a list of <filename, inode> pairs.
• Create one using the "ln" command.
• Notes:
  - Hard links only work
    when the files are on the same device/partition
    on a file system which supports links
  - Any operation performed on the data through the link is performed on the original file ( the same data blocks ).
  - Do not create a loop in the file system ( for example by hard-linking directories ) => such a loop is difficult to repair.
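A small experiment makes the <filename, inode> picture concrete. A sketch run in a scratch directory; the helpers inode_of and links_of are hypothetical wrappers around ls.

```shell
inode_of() { ls -di "$1" | awk '{ print $1 }'; }   # i-node number
links_of() { ls -ld "$1" | awk '{ print $2 }'; }   # link count

tmp=$(mktemp -d)
echo "original data" > "$tmp/a"
ln "$tmp/a" "$tmp/b"                  # a second name for the same i-node

inode_of "$tmp/a"; inode_of "$tmp/b"  # the two numbers are identical
links_of "$tmp/a"                     # 2

echo "appended via b" >> "$tmp/b"     # a write through either name reaches the same data
rm "$tmp/a"                           # removes one name only; the data survives
cat "$tmp/b"
links_of "$tmp/b"                     # 1
```

The data blocks are released only when the link count drops to zero, which is why rm on one name leaves the other intact.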
Soft/Symbolic Links
• Differences from hard links include:
  - created using the -s option of the ln command
    $ ln -s the_file symbolic_link
  - can cross partitions and devices
  - does not link by inode; the link stores a path name
  - can be created even if the target file does not exist ( a dangling link )
  - the link's own file permissions are not used
• Notes:
  - chmod operations performed on a hard link are reflected on both the hard link and the file it is linked to ( they share one i-node ).
  - chmod operations on a soft link are applied to the original file, not to the soft link; the soft link always shows full permissions ( lrwxrwxrwx ).
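The points above can be observed directly. A sketch in a scratch directory, using the file names from the slide; readlink is assumed to be available, as it is on Linux and the BSDs.

```shell
tmp=$(mktemp -d)
ln -s the_file "$tmp/symbolic_link"      # the target does not exist yet: a dangling link
[ -L "$tmp/symbolic_link" ] && echo "is a symbolic link"
[ -e "$tmp/symbolic_link" ] || echo "target is missing"

echo "data" > "$tmp/the_file"            # create the target; the link now resolves
cat "$tmp/symbolic_link"
readlink "$tmp/symbolic_link"            # prints the stored path: the_file

chmod 600 "$tmp/symbolic_link"           # chmod follows the link ...
ls -l "$tmp/the_file" | cut -c1-10       # ... so the target becomes -rw-------
ls -l "$tmp/symbolic_link" | cut -c1     # l
```

Because the link stores a relative path, it is resolved relative to the directory that contains the link, not the current directory.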
find command (1)
• Format:
  - find [ path-list ] [ expression ]
  - path-list : optional. A list of files/directories in which find should search. find is recursive.
  - expression : optional. Specifies what is being searched for and what to do with it.
• A find expression can contain the following components
  - Options : modify find's operation
  - Tests : specify what we are looking for
    -name "pattern", -user "name", -size "1234", etc.
  - Actions : specify what to do with what we find
  - Operators : used to combine expressions
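A minimal sketch of tests, operators and actions working together, run against a scratch tree so that nothing real is touched. The file names and the helper find_sources are invented for the example.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/src"
touch "$tmp/src/main.c" "$tmp/src/util.h" "$tmp/notes.txt" "$tmp/old.bak"

# Test only: names ending in .c (the default action is -print).
find "$tmp" -name '*.c'

# Operators: escaped parentheses group, -o means OR, adjacent tests are ANDed.
find_sources() {
    find "$1" -type f \( -name '*.c' -o -name '*.h' \) -print | sort
}
find_sources "$tmp"

# Action: run a command on each match ({} is replaced by the path).
find "$tmp" -name '*.bak' -exec rm {} \;
```

The patterns are quoted so that find, not the shell, expands them.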
find command (3)
• Examples:
  $ find . -user upe
  $ find / -name \*.html
    the * must be quoted so that the shell doesn't interpret it.
  $ find /home -size +2500k -mtime -7
    files larger than 2500 kilobytes whose data has been modified in the last seven days.
  $ find . -type f -exec grep hello \{\} \;
    search all the files under the current directory for the word hello.
  $ find / -name \*.bak -ok rm \{\} \;
    ask the user whether it is ok to remove each file found.
• Convert all .c & .h files from the DOS format into the UNIX format ( LF = octal 012, CR = octal 015 ):
  $ find . -name "*.[ch]" -print -exec sh -c 'tr -d "\015" < "$1" > "$1.tmp" && mv -f "$1.tmp" "$1"' sh {} \;
  or use the dos2unix utility:
  $ dos2unix
find command (4)
• Notes:
  - [ for Win32 Cygwin ] The syntax of directories is a little different, because Windows has drive letters and Unix does not. In Cygwin Bash the drive C: is expressed as /cygdrive/c. To find all "tmp" files from directory C:/temp down:
    $ mount
    $ find /cygdrive/c/temp/ -name "*tmp*" -print
Device Files (1)
• Device - generic name for a system component that the OS has to "talk" to.
  - Physical devices - e.g. hard disks, serial devices, CDROMs, sound cards.
  - Logical devices - e.g. virtual terminals, memory, kernel, network ports, loopback disk devices ( use a regular file as a block device ).
• Device files - allow programs to interact with devices via the OS kernel.
  - /dev: the location of device files
  - not real files: they do not contain data
  - entry points into the kernel or device drivers
• Device drivers
  - are coded routines used for interacting with devices
  - a "go-between" for the low-level hardware and the kernel/user interface
  - compiled into the kernel or dynamically loaded into memory
Device Files (2)
• Major and minor device numbers
  - used by the kernel to communicate with devices
  - the kernel maintains a list of its available device drivers: the device switch table
  - the major number tells the kernel which device driver to use
  - the minor number determines which physical device
  $ ls -al /dev/hda /dev/hdb
  brw-rw---- 1 root disk 3, 0 Apr 28 1995 /dev/hda
  brw-rw---- 1 root disk 3, 64 Apr 28 1995 /dev/hdb
  - major number 3 controls both hda & hdb
  - hda - minor device number of 0
  - hdb - minor device number of 64
• Creating device files:
  - mknod /dev/ttyS9 c 4 240
• See how to implement a simple device driver.
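The same fields can be read for any device node without special privileges. A sketch that parses ls -ld output; the helper dev_info is hypothetical, and the numbers printed for /dev/null differ between systems.

```shell
# dev_info PATH: report whether PATH is a block or character device and,
# if so, the major and minor numbers that "ls -ld" shows in place of a size.
dev_info() {
    ls -ld "$1" | awk '
        /^[bc]/ { kind = (substr($0, 1, 1) == "b") ? "block" : "character"
                  sub(",", "", $5)      # the major number is printed as "3,"
                  print kind, "major=" $5, "minor=" $6; next }
                { print "not a device" }'
}

dev_info /dev/null    # a character device on every UNIX system
dev_info /tmp         # a directory (or a link to one), so not a device
```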
Device Files (3)
• Demo using "tty" devices on Linux / Solaris
  # ls -l `tty`
  # chmod a+w `tty`
  $ echo 'hi, root!' > `tty`
    ( here another user writes to the device path that tty printed in root's session )
• Notes:
  - Pseudo terminals - non-physical terminals
  - master pseudo-terminals & slave pseudo-terminals
File Systems Operations
• Disks, partitions and file systems
  fdisk /dev/sda
  cat /proc/filesystems   # on Linux
• mkfs - make a file system: creates the i-nodes and data blocks
• mount - attach a file system to part of the directory hierarchy
  $ mount -t msdos /dev/hda1 /mnt/dos
  $ umount /dev/fd0
  - see also /etc/fstab, mount
• fsck - check the integrity of a file system
  - at boot time
  - manual checking
• Misc:
  - Mtools: utilities that provide a convenient way of accessing DOS-formatted floppies without having to mount and unmount file systems.
Example ( 1 )
  # create a file whose size is 8 MB
  dd if=/dev/zero of=/tmp/file bs=1M count=8
  # dump a hard disk's MBR: dd if=/dev/hda …
  losetup /dev/loop0 /tmp/file
  # losetup -e des /dev/loop0 /file
  mkfs -t ext2 /dev/loop0
  mount -t ext2 /dev/loop0 /mnt
  # a quick method: mount -o loop /tmp/file /mnt
  umount /dev/loop0
  losetup -d /dev/loop0
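Only the first step of this example runs without root privileges; losetup, mkfs on a loop device and mount need root and a Linux loop driver. A sketch of that first step, using a scratch path instead of /tmp/file and the portable bs=1024 spelling ( bs=1M is a GNU shorthand ):

```shell
tmp=$(mktemp -d)
img="$tmp/disk.img"

# 8192 blocks of 1024 bytes = 8 MiB of zeros: the backing store for a loop device.
dd if=/dev/zero of="$img" bs=1024 count=8192 2>/dev/null

wc -c < "$img"        # 8388608 bytes
```

The resulting file is what losetup would attach to /dev/loop0 in the commands above.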
Example ( 2 ) • See how to implement a simple file system.
Important Files and Directories
• /etc/passwd, /etc/shadow, /etc/group
• /var/log/*
• /etc/rc[1-6].*: /etc/inittab
Summary
• Files under UNIX
  - are just a sequence of bytes
  - come in a number of types
  - magic numbers can be used to determine the type of a normal file
• UNIX stores information about files, including
  - filenames
  - file size ( the upper limit depends on the file system )
  - file ownership etc.
• Files are protected by a combination of
  - users and groups: "su"
  - file operations
  - file permissions
• Links ( both hard and soft ) allow a file to have more than one name
• The find command is used to
  - search for files
  - which match a certain condition
  - and perform operations on them
Commands & Utils
• File commands
  - cat: print the content of file(s)
  - cd: change directory
  - chmod: change modes ( files and directories )
  - cp: copy files
  - diff: report differences between files; see also patch
  - ls: list files ( -r & -R )
  - mv: move files ( can also be thought of as rename )
  - od: dump the content of a file ( "od -h -c" for a hex dump )
  - pwd: print working directory
  - rm: remove files
  - tee: copy the data passing through a pipeline into files
  - touch: modify a file's time stamp
  - tar: group files into a tar archive, usually together with gzip or bzip2
  - bzip2: compress a single file with bzip2
  - gzip: compress a single file with gzip
  - compress: old Unix .Z compression
• Disk commands
  - du: disk usage
  - df: disk free, sometimes available as bdf(1)
• Reference Material: <External Filters, Programs and Commands in Unix>, Mendel Cooper
Appendix
• 1. How does the kernel implement file systems?
  - Architecture: directory cache ( namei ), vnode, inode
  - I/O device management: /dev, major & minor numbers
  - Layout of the file system: boot block, super block, inode, directory, data block
  - Hard link, soft link
  - Link: reference count
  - The link count of a directory is >= 2 ( its name in the parent plus its own "." entry ).
• 2. Modeling concepts in file systems using UML
  - User ( name, uid, … ) ( e-uid, r-uid ), group ( supplementary group ), file ( regular files, device files, pipe, socket, etc. ), vnode/inode, directory, file system, access control: relationship & pattern
  - Mapping the model into the implementation, including the layout of the file systems
• 3. Other issues:
  - Tree-structured name space management:
    Hierarchical: domain name system, the name service in CORBA, RMI in Java, Directory Service ( LDAP, … ), etc.
    Objects' names & their references: the name registry in many technologies
  - File: polymorphic & interface => abstraction
  - Link/reference count: inode, smart pointers in C++
  - Layout of the file system: the object model of Java ( GC => reference count )