Advanced Char Driver Operations Sarah Diesburg CIS 4930
Resources • LDD Chapter 6 • Red font in slides where up-to-date code diverges from book • LDD module source code for 3.2.x • http://ww2.cs.fsu.edu/~diesburg/courses/dd/code.html
Resources • LXR – Cross-referenced Linux • Go to http://lxr.linux.no/ • Click on Linux 2.6.11 and later • Select your kernel version from drop-down menu
Topics • Managing ioctl command numbers • Blocking/unblocking a process • Seeking on a device • Access control
ioctl • For operations beyond simple data transfers • Eject the media • Report error information • Change hardware settings • Self destruct • Alternatives • Embedded commands in the data stream • Driver-specific file systems
ioctl • User-level interface int ioctl(int fd, unsigned long request, ...); • ... • Normally means a variable number of arguments • Problematic for the system call interface • In this context, it is meant to pass a single optional argument • Traditionally a char *argp • Just a way to bypass the type checking • For more information, look at the man page
ioctl • Driver-level interface long (*unlocked_ioctl) (struct file *filp, unsigned int cmd, unsigned long arg); • cmd is passed from the user unchanged • arg can be an integer or a pointer • Compiler does not type check • ioctl has changed from the LDD3 era • Modified to remove the big kernel lock (BKL) • http://lwn.net/Articles/119652/
Choosing the ioctl Commands • Need a numbering scheme to avoid mistakes • E.g., issuing a command to the wrong device (changing the baud rate of an audio device) • Check include/linux/ioctl.h and directory Documentation/ioctl/
Choosing the ioctl Commands • A command number uses four bitfields • Defined in <linux/ioctl.h> • <direction, type, number, size> • direction: direction of data transfer • _IOC_NONE • _IOC_READ • _IOC_WRITE • _IOC_READ | _IOC_WRITE
Choosing the ioctl Commands • type (ioctl device type) • 8-bit (_IOC_TYPEBITS) magic number • Associated with the device • number • 8-bit (_IOC_NRBITS) sequential number • Unique within device • size: size of user data involved • The width is either 13 or 14 bits (_IOC_SIZEBITS)
Choosing the ioctl Commands • Useful macros to create ioctl command numbers • _IO(type, nr) • _IOR(type, nr, datatype) • _IOW(type, nr, datatype) • _IOWR(type, nr, datatype) size = sizeof(datatype)
Choosing the ioctl Commands • Useful macros to decode ioctl command numbers • _IOC_DIR(nr) • _IOC_TYPE(nr) • _IOC_NR(nr) • _IOC_SIZE(nr)
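The encode and decode macros can be tried from user space, since <linux/ioctl.h> is also exported to applications. The following is a minimal sketch, assuming a Linux system with the kernel headers installed; DEMO_IOC_MAGIC, DEMO_IOCSVALUE, struct demo_fields, and demo_decode are hypothetical names made up for illustration and are not part of scull.

```c
#include <linux/ioctl.h>   /* _IOW and the _IOC_DIR/_IOC_TYPE/_IOC_NR/_IOC_SIZE macros */

/* Hypothetical magic number and command, for illustration only */
#define DEMO_IOC_MAGIC 'k'
#define DEMO_IOCSVALUE _IOW(DEMO_IOC_MAGIC, 1, int)

/* The four bitfields packed into one ioctl command number */
struct demo_fields {
    unsigned int dir;   /* _IOC_NONE, _IOC_READ, _IOC_WRITE, or both */
    unsigned int type;  /* magic number */
    unsigned int nr;    /* sequential number within the device */
    unsigned int size;  /* size of the user data involved */
};

/* Split a command number back into its bitfields */
struct demo_fields demo_decode(unsigned int cmd)
{
    struct demo_fields f;

    f.dir  = _IOC_DIR(cmd);
    f.type = _IOC_TYPE(cmd);
    f.nr   = _IOC_NR(cmd);
    f.size = _IOC_SIZE(cmd);
    return f;
}
```

Calling demo_decode(DEMO_IOCSVALUE) should give back the direction, magic number, sequence number, and sizeof(int) that were passed to _IOW.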
Choosing the ioctl Commands • The scull example /* Use 'k' as magic number */ #define SCULL_IOC_MAGIC 'k' /* Please use a different 8-bit number in your code */ #define SCULL_IOCRESET _IO(SCULL_IOC_MAGIC, 0)
Choosing the ioctl Commands • The scull example /* * S means "Set" through a ptr, * T means "Tell" directly with the argument value * G means "Get": reply by setting through a pointer * Q means "Query": response is on the return value * X means "eXchange": switch G and S atomically * H means "sHift": switch T and Q atomically */ #define SCULL_IOCSQUANTUM _IOW(SCULL_IOC_MAGIC, 1, int) #define SCULL_IOCSQSET _IOW(SCULL_IOC_MAGIC, 2, int) #define SCULL_IOCTQUANTUM _IO(SCULL_IOC_MAGIC, 3) #define SCULL_IOCTQSET _IO(SCULL_IOC_MAGIC, 4) #define SCULL_IOCGQUANTUM _IOR(SCULL_IOC_MAGIC, 5, int) Set new value and return the old value
Choosing the ioctl Commands • The scull example #define SCULL_IOCGQSET _IOR(SCULL_IOC_MAGIC, 6, int) #define SCULL_IOCQQUANTUM _IO(SCULL_IOC_MAGIC, 7) #define SCULL_IOCQQSET _IO(SCULL_IOC_MAGIC, 8) #define SCULL_IOCXQUANTUM _IOWR(SCULL_IOC_MAGIC, 9, int) #define SCULL_IOCXQSET _IOWR(SCULL_IOC_MAGIC,10, int) #define SCULL_IOCHQUANTUM _IO(SCULL_IOC_MAGIC, 11) #define SCULL_IOCHQSET _IO(SCULL_IOC_MAGIC, 12) #define SCULL_IOC_MAXNR 14
The Return Value • When the command number is not supported • Return -EINVAL • Or -ENOTTY (according to the POSIX standard)
The Predefined Commands • Handled by the kernel first • Will not be passed down to device drivers • Three groups • For any file (regular, device, FIFO, socket) • Magic number: 'T' • For regular files only • Specific to the file system type
Using the ioctl Argument • If it is an integer, just use it directly • If it is a pointer • Need to check for valid user address int access_ok(int type, const void *addr, unsigned long size); • type: either VERIFY_READ or VERIFY_WRITE • Returns 1 for success, 0 for failure • On failure, the driver returns -EFAULT to the caller • Defined in <linux/uaccess.h> • Mostly called by memory-access routines
Using the ioctl Argument • The scull example long scull_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) { int err = 0, tmp; int retval = 0; /* check the magic number and whether the command is defined */ if (_IOC_TYPE(cmd) != SCULL_IOC_MAGIC) { return -ENOTTY; } if (_IOC_NR(cmd) > SCULL_IOC_MAXNR) { return -ENOTTY; } …
Using the ioctl Argument • The scull example … /* the concept of "read" and "write" is reversed here */ if (_IOC_DIR(cmd) & _IOC_READ) { err = !access_ok(VERIFY_WRITE, (void __user *) arg, _IOC_SIZE(cmd)); } else if (_IOC_DIR(cmd) & _IOC_WRITE) { err = !access_ok(VERIFY_READ, (void __user *) arg, _IOC_SIZE(cmd)); } if (err) return -EFAULT; …
Using the ioctl Argument • Data transfer functions optimized for most used data sizes (1, 2, 4, and 8 bytes) • If the size mismatches • Cryptic compiler error message: • Conversion to non-scalar type requested • Use copy_to_user and copy_from_user • #include <linux/uaccess.h> • put_user(datum, ptr) • Writes to a user-space address • Calls access_ok() • Returns 0 on success, -EFAULT on error
Using the ioctl Argument • __put_user(datum, ptr) • Does not check access_ok() • Can still fail if the user-space memory is not writable • get_user(local, ptr) • Reads from a user-space address • Calls access_ok() • Stores the retrieved value in local • Returns 0 on success, -EFAULT on error • __get_user(local, ptr) • Does not check access_ok() • Can still fail if the user-space memory is not readable
Capabilities and Restricted Operations • Limit certain ioctl operations to privileged users • See <linux/capability.h> for the full set of capabilities • To check a certain capability call int capable(int capability); • In the scull example if (!capable(CAP_SYS_ADMIN)) { return -EPERM; } A catch-all capability for many system administration operations
The Implementation of the ioctl Commands • A giant switch statement … switch(cmd) { case SCULL_IOCRESET: scull_quantum = SCULL_QUANTUM; scull_qset = SCULL_QSET; break; case SCULL_IOCSQUANTUM: /* Set: arg points to the value */ if (!capable(CAP_SYS_ADMIN)) { return -EPERM; } retval = __get_user(scull_quantum, (int __user *)arg); break; …
The Implementation of the ioctl Commands … case SCULL_IOCTQUANTUM: /* Tell: arg is the value */ if (!capable(CAP_SYS_ADMIN)) { return -EPERM; } scull_quantum = arg; break; case SCULL_IOCGQUANTUM: /* Get: arg is pointer to result */ retval = __put_user(scull_quantum, (int __user *) arg); break; case SCULL_IOCQQUANTUM: /* Query: return it (> 0) */ return scull_quantum; …
The Implementation of the ioctl Commands … case SCULL_IOCXQUANTUM: /* eXchange: use arg as pointer */ if (!capable(CAP_SYS_ADMIN)) { return -EPERM; } tmp = scull_quantum; retval = __get_user(scull_quantum, (int __user *) arg); if (retval == 0) { retval = __put_user(tmp, (int __user *) arg); } break; …
The Implementation of the ioctl Commands … case SCULL_IOCHQUANTUM: /* sHift: like Tell + Query */ if (!capable(CAP_SYS_ADMIN)) { return -EPERM; } tmp = scull_quantum; scull_quantum = arg; return tmp; default: /* redundant, as cmd was checked against MAXNR */ return -ENOTTY; } /* switch */ return retval; } /* scull_ioctl */
The Implementation of the ioctl Commands • Six ways to pass and receive arguments from the user space • Need to know command number int quantum; ioctl(fd,SCULL_IOCSQUANTUM, &quantum); /* Set by pointer */ ioctl(fd,SCULL_IOCTQUANTUM, quantum); /* Set by value */ ioctl(fd,SCULL_IOCGQUANTUM, &quantum); /* Get by pointer */ quantum = ioctl(fd,SCULL_IOCQQUANTUM); /* Get by return value */ ioctl(fd,SCULL_IOCXQUANTUM, &quantum); /* Exchange by pointer */ /* Exchange by value */ quantum = ioctl(fd,SCULL_IOCHQUANTUM, quantum);
Device Control Without ioctl • Writing control sequences into the data stream itself • Example: console escape sequences • Advantages: • No need to implement ioctl methods • Disadvantages: • Need to make sure that escape sequences do not appear in the normal data stream (e.g., cat a binary file) • Need to parse the data stream
Blocking I/O • Needed when no data is available for reads • When the device is not ready to accept data • Output buffer is full
Introduction to Sleeping • A process is removed from the scheduler’s run queue • Certain rules • Never sleep when running in an atomic context • Multiple steps must be performed without concurrent accesses • Not while holding a spinlock, seqlock, or RCU lock • Not while interrupts are disabled
Introduction to Sleeping • Okay to sleep while holding a semaphore • Other threads waiting for the semaphore will also sleep • Need to keep it short • Make sure that it is not blocking the process that will wake it up • After waking up • Make no assumptions about the state of the system • The resource one is waiting for might be gone again • Must check the wait condition again
Introduction to Sleeping • Wait queue: contains a list of processes waiting for a specific event • #include <linux/wait.h> • To initialize statically, call DECLARE_WAIT_QUEUE_HEAD(my_queue); • To initialize dynamically, call wait_queue_head_t my_queue; init_waitqueue_head(&my_queue);
Simple Sleeping • Call variants of wait_event macros • wait_event(queue, condition) • queue = wait queue head • Passed by value • Waits until the boolean condition becomes true • Puts into an uninterruptible sleep • Usually is not what you want • wait_event_interruptible(queue, condition) • Can be interrupted by signals • Returns nonzero if sleep was interrupted • Your driver should return -ERESTARTSYS
Simple Sleeping • wait_event_timeout(queue, condition, timeout) • Wait for a limited time (in jiffies) • Returns 0 if the timeout expires, regardless of how the condition evaluates • wait_event_interruptible_timeout(queue, condition, timeout)
Simple Sleeping • To wake up, call variants of wake_up functions void wake_up(wait_queue_head_t *queue); • Wakes up all processes waiting on the queue void wake_up_interruptible(wait_queue_head_t *queue); • Wakes up processes that perform an interruptible sleep
Simple Sleeping • Example module: sleepy static DECLARE_WAIT_QUEUE_HEAD(wq); static int flag = 0; ssize_t sleepy_read(struct file *filp, char __user *buf, size_t count, loff_t *pos) { printk(KERN_DEBUG "process %i (%s) going to sleep\n", current->pid, current->comm); wait_event_interruptible(wq, flag != 0); flag = 0; printk(KERN_DEBUG "awoken %i (%s)\n", current->pid, current->comm); return 0; /* EOF */ } Multiple threads can wake up at this point
Simple Sleeping • Example module: sleepy ssize_t sleepy_write(struct file *filp, const char __user *buf, size_t count, loff_t *pos) { printk(KERN_DEBUG "process %i (%s) awakening the readers...\n", current->pid, current->comm); flag = 1; wake_up_interruptible(&wq); return count; /* succeed, to avoid retrial */ }
Blocking and Nonblocking Operations • By default, operations block • If no data is available for reads • If no space is available for writes • Non-blocking I/O is indicated by the O_NONBLOCK flag in filp->f_flags • Defined in <linux/fcntl.h> • Only open, read, and write calls are affected • Return -EAGAIN immediately instead of blocking • Applications need to distinguish non-blocking returns vs. EOFs
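From the application side, the difference between a non-blocking return and EOF can be seen with an ordinary pipe. The sketch below is a user-space illustration only, not driver code; struct demo_result and demo_read_empty_pipe are hypothetical names. It reads from an empty pipe with O_NONBLOCK set, either with the write end still open (no data yet) or already closed (EOF).

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Outcome of one read() call: its return value and errno (0 if no error) */
struct demo_result {
    ssize_t nread;
    int err;
};

/*
 * Read one byte from an empty, non-blocking pipe.
 * If close_writer is nonzero, the write end is closed first,
 * so the reader sees EOF instead of "no data yet".
 * Setup failures are reported as nread == -1 with the failing errno.
 */
struct demo_result demo_read_empty_pipe(int close_writer)
{
    struct demo_result r = { -1, 0 };
    int fds[2];
    int flags;
    char byte;

    if (pipe(fds) != 0) {
        r.err = errno;
        return r;
    }

    /* Switch the read end to non-blocking mode */
    flags = fcntl(fds[0], F_GETFL);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        r.err = errno;
        close(fds[0]);
        close(fds[1]);
        return r;
    }

    if (close_writer)
        close(fds[1]);

    r.nread = read(fds[0], &byte, 1);
    if (r.nread == -1)
        r.err = errno;

    close(fds[0]);
    if (!close_writer)
        close(fds[1]);
    return r;
}
```

With the writer still open, read() fails with EAGAIN; with the writer closed, read() returns 0, which is how the application tells "try again later" apart from end of file.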
A Blocking I/O Example • scullpipe • A read process • Blocks when no data is available • Wakes a blocking write when buffer space becomes available • A write process • Blocks when no buffer space is available • Wakes a blocking read process when data arrives
A Blocking I/O Example • scullpipe data structure struct scull_pipe { wait_queue_head_t inq, outq; /* read and write queues */ char *buffer, *end; /* begin of buf, end of buf */ int buffersize; /* used in pointer arithmetic */ char *rp, *wp; /* where to read, where to write */ int nreaders, nwriters; /* number of openings for r/w */ struct fasync_struct *async_queue; /* asynchronous readers */ struct semaphore sem; /* mutual exclusion semaphore */ struct cdev cdev; /* Char device structure */ };
A Blocking I/O Example static ssize_t scull_p_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos) { struct scull_pipe *dev = filp->private_data; if (down_interruptible(&dev->sem)) return -ERESTARTSYS; while (dev->rp == dev->wp) { /* nothing to read */ up(&dev->sem); /* release the lock */ if (filp->f_flags & O_NONBLOCK) return -EAGAIN; if (wait_event_interruptible(dev->inq, (dev->rp != dev->wp))) return -ERESTARTSYS; if (down_interruptible(&dev->sem)) return -ERESTARTSYS; }
A Blocking I/O Example if (dev->wp > dev->rp) count = min(count, (size_t)(dev->wp - dev->rp)); else /* the write pointer has wrapped */ count = min(count, (size_t)(dev->end - dev->rp)); if (copy_to_user(buf, dev->rp, count)) { up (&dev->sem); return -EFAULT; } dev->rp += count; if (dev->rp == dev->end) dev->rp = dev->buffer; /* wrapped */ up (&dev->sem); /* finally, awake any writers and return */ wake_up_interruptible(&dev->outq); return count; }
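The count clamping and pointer wrap in scull_p_read can be checked in isolation. This is a user-space model under stated assumptions: plain offsets stand in for the rp, wp, and end pointers, and demo_readable_now and demo_advance are hypothetical helpers, not functions from the scull source.

```c
#include <stddef.h>

/*
 * How many bytes a single read can copy in one contiguous chunk.
 * rp and wp are offsets into a buffer of buffersize bytes;
 * rp == wp means the buffer is empty.
 */
size_t demo_readable_now(size_t rp, size_t wp, size_t buffersize, size_t count)
{
    size_t avail;

    if (rp == wp)
        return 0;                 /* nothing to read */
    if (wp > rp)
        avail = wp - rp;          /* data lies between rp and wp */
    else
        avail = buffersize - rp;  /* write offset wrapped: read up to the end */

    return count < avail ? count : avail;
}

/* Advance the read offset, wrapping to 0 when it reaches the end */
size_t demo_advance(size_t rp, size_t n, size_t buffersize)
{
    rp += n;
    if (rp == buffersize)
        rp = 0;
    return rp;
}
```

When the write offset has wrapped, one read returns only the bytes up to the end of the buffer; the caller gets the rest with a second read after the read offset wraps to the start.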
Advanced Sleeping • Uses low-level functions to effect a sleep • How a process sleeps 1. Allocate and initialize a wait_queue_t structure DEFINE_WAIT(my_wait); • Or wait_queue_t my_wait; init_wait(&my_wait); Queue element
Advanced Sleeping 2. Add to the proper wait queue and mark a process as being asleep • TASK_RUNNING → TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE • Call void prepare_to_wait(wait_queue_head_t *queue, wait_queue_t *wait, int state);
Advanced Sleeping 3. Give up the processor • Double check the sleeping condition before going to sleep • The wakeup thread might have changed the condition between steps 1 and 2 if (/* sleeping condition */) { schedule(); /* yield the CPU */ }
Advanced Sleeping 4. Return from sleep Remove the process from the wait queue if schedule() was not called void finish_wait(wait_queue_head_t *queue, wait_queue_t *wait);
Advanced Sleeping • scullpipe write method /* How much space is free? */ static int spacefree(struct scull_pipe *dev) { if (dev->rp == dev->wp) return dev->buffersize - 1; return ((dev->rp + dev->buffersize - dev->wp) % dev->buffersize) - 1; }
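The arithmetic in spacefree is easier to follow with concrete numbers. The following user-space sketch repeats the same formula with integer offsets in place of the rp and wp pointers; demo_spacefree is a hypothetical stand-in for illustration, not the driver function itself.

```c
/*
 * Free space in a circular buffer that keeps one slot unused,
 * so that rp == wp always means "empty" and never "full".
 * rp and wp are offsets in the range 0 .. buffersize - 1.
 */
int demo_spacefree(int rp, int wp, int buffersize)
{
    if (rp == wp)
        return buffersize - 1;   /* empty: everything but the spare slot */

    /* distance from wp forward to rp, minus the spare slot */
    return ((rp + buffersize - wp) % buffersize) - 1;
}
```

For an 8-byte buffer, an empty buffer reports 7 free bytes, and a buffer with rp at 0 and wp at 7 reports 0: the writer must stop one slot short of the read offset.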