Advanced Char Driver Operations Linux Kernel Programming CIS 4930/COP 5641
Topics • Managing ioctl command numbers • Putting a thread to sleep • Seeking on a device • Access control
ioctl • input/output control • system call • For operations beyond simple data transfers • Eject the media • Report error information • Change hardware settings • Self destruct • Alternatives • Embedded commands in the data stream • Driver-specific file systems
ioctl • User-level interface (application view) int ioctl(int fd, int request, ...); • The ... does not indicate a variable number of arguments • That would be problematic for the system call interface • In this context, it is meant to pass a single optional argument • Traditionally a char *argp • Just a way to bypass type checking • For more information, see the ioctl(2) man page
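To make the application view concrete, here is a minimal user-space sketch that passes the single optional argument as a pointer. It assumes Linux and uses FIONREAD, a standard command that reports how many bytes are waiting to be read; the helper name pending_bytes_in_pipe() is hypothetical.

```c
#include <stdio.h>       /* perror() */
#include <sys/ioctl.h>   /* ioctl(), FIONREAD */
#include <unistd.h>      /* pipe(), write(), close(), ssize_t */

/* Hypothetical helper: write len bytes into a fresh pipe, then ask the
 * kernel (via ioctl) how many bytes are waiting on the read end.
 * Returns the byte count, or -1 on error. */
int pending_bytes_in_pipe(const char *msg, size_t len)
{
    int fds[2];
    int pending = -1;

    if (pipe(fds) != 0)
        return -1;

    if (write(fds[1], msg, len) == (ssize_t)len) {
        /* The third argument is a pointer: the kernel stores the count there. */
        if (ioctl(fds[0], FIONREAD, &pending) != 0) {
            perror("ioctl(FIONREAD)");
            pending = -1;
        }
    }

    close(fds[0]);
    close(fds[1]);
    return pending;
}
```

For example, pending_bytes_in_pipe("hello", 5) should report 5 bytes pending.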
ioctl • Driver-level interface long (*unlocked_ioctl) (struct file *filp, unsigned int cmd, unsigned long arg); • cmd is passed from the user unchanged • arg can be an integer or a pointer • The compiler does not type check it • ioctl() has changed since the LDD3 era • Modified to remove the big kernel lock (BKL) • http://lwn.net/Articles/119652/
Choosing the ioctl Commands • We want a numbering scheme that avoids mistakes • E.g., issuing a command to the wrong device (changing the baud rate of an audio device) • Command numbers should be unique across the system • Check the ioctl.h files in the source tree and the Documentation/ioctl/ directory (Documentation/userspace-api/ioctl/ in newer kernels)
Choosing the ioctl Commands • A command number uses four bitfields • Defined in include/uapi/asm-generic/ioctl.h (for most architectures) • < direction, type, number, size> • direction: direction of data transfer, from the application's point of view • _IOC_NONE • _IOC_READ • _IOC_WRITE • _IOC_READ | _IOC_WRITE
Choosing the ioctl Commands • < direction, type, number, size> • type (ioctl device type) • 8-bit (_IOC_TYPEBITS) magic number • Associated with the device • number • 8-bit (_IOC_NRBITS) sequential number • Unique within device
Choosing the ioctl Commands • < direction, type, number, size> • size: size of user data involved • _IOC_SIZEBITS • Usually 14 bits, but can be overridden by the architecture • Computed with sizeof from the datatype argument, e.g., sizeof(int) for
#define SCULL_IOCSQUANTUM _IOW(SCULL_IOC_MAGIC, 1, int)
• Inside the kernel, the datatype argument is also sanity-checked:
/* provoke compile error for invalid uses of size argument */
extern unsigned int __invalid_size_argument_for_IOC;
#define _IOC_TYPECHECK(t) \
    ((sizeof(t) == sizeof(t[1]) && \
      sizeof(t) < (1 << _IOC_SIZEBITS)) ? \
      sizeof(t) : __invalid_size_argument_for_IOC)
/* See http://lwn.net/Articles/48354/ */
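To see how the four bitfields fit together, the following user-space sketch re-implements the packing by hand. It assumes the asm-generic layout (8-bit number in bits 0-7, 8-bit type in bits 8-15, 14-bit size in bits 16-29, 2-bit direction in bits 30-31, with write = 1 and read = 2); some architectures use a different layout, and every DEMO_ name is hypothetical, not a kernel macro.

```c
#include <stdint.h>

/* Hypothetical re-implementation of the asm-generic ioctl layout. */
#define DEMO_NRBITS    8
#define DEMO_TYPEBITS  8
#define DEMO_SIZEBITS  14
#define DEMO_DIRBITS   2

#define DEMO_NRSHIFT   0
#define DEMO_TYPESHIFT (DEMO_NRSHIFT + DEMO_NRBITS)     /* 8  */
#define DEMO_SIZESHIFT (DEMO_TYPESHIFT + DEMO_TYPEBITS) /* 16 */
#define DEMO_DIRSHIFT  (DEMO_SIZESHIFT + DEMO_SIZEBITS) /* 30 */

#define DEMO_NONE  0u
#define DEMO_WRITE 1u   /* user space writes, kernel reads */
#define DEMO_READ  2u   /* user space reads, kernel writes */

/* Pack <direction, type, number, size> into one 32-bit command number. */
static inline uint32_t demo_ioc(uint32_t dir, uint32_t type, uint32_t nr, uint32_t size)
{
    return (dir << DEMO_DIRSHIFT) | (type << DEMO_TYPESHIFT) |
           (nr << DEMO_NRSHIFT) | (size << DEMO_SIZESHIFT);
}

/* Extract one field with a shift and a mask. */
static inline uint32_t demo_field(uint32_t cmd, unsigned shift, unsigned bits)
{
    return (cmd >> shift) & ((1u << bits) - 1u);
}
```

With a 4-byte int, demo_ioc(DEMO_WRITE, 'k', 1, 4) yields 0x40046b01, which should be the same value _IOW('k', 1, int) produces on x86, an architecture that uses the generic layout.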
Choosing the ioctl Commands • Useful macros to create ioctl command numbers • _IO(type, nr): arg, if used, is an unsigned long (integer) • _IOR(type, nr, datatype) • _IOW(type, nr, datatype) • _IOWR(type, nr, datatype) • For _IOR, _IOW, and _IOWR, arg is a pointer • _IO*_BAD: kept for backward compatibility with old definitions that passed a number (of bytes) rather than a datatype • http://lkml.iu.edu//hypermail/linux/kernel/0310.1/0019.html
Choosing the ioctl Commands • Useful macros to decode ioctl command numbers • _IOC_DIR(nr) • _IOC_TYPE(nr) • _IOC_NR(nr) • _IOC_SIZE(nr)
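The creation and decoding macros are also visible to user space through <linux/ioctl.h>. A minimal sketch, assuming a Linux build host with the kernel UAPI headers installed; the DEMO_IOC* command names and struct demo_cmd_info are hypothetical, and the magic number 'k' simply mirrors the scull example.

```c
#include <linux/ioctl.h>   /* _IO, _IOR, _IOW, _IOWR, _IOC_DIR/TYPE/NR/SIZE */

#define DEMO_IOC_MAGIC  'k'
#define DEMO_IOCRESET   _IO(DEMO_IOC_MAGIC, 0)
#define DEMO_IOCGVALUE  _IOR(DEMO_IOC_MAGIC, 1, int)
#define DEMO_IOCSVALUE  _IOW(DEMO_IOC_MAGIC, 2, int)
#define DEMO_IOCXVALUE  _IOWR(DEMO_IOC_MAGIC, 3, int)

/* Decoded view of a command number. */
struct demo_cmd_info {
    unsigned int dir;   /* _IOC_NONE, _IOC_READ, _IOC_WRITE, or both */
    unsigned int type;  /* magic number */
    unsigned int nr;    /* sequential number */
    unsigned int size;  /* size of the user data in bytes */
};

static struct demo_cmd_info demo_decode(unsigned int cmd)
{
    struct demo_cmd_info info = {
        .dir  = _IOC_DIR(cmd),
        .type = _IOC_TYPE(cmd),
        .nr   = _IOC_NR(cmd),
        .size = _IOC_SIZE(cmd),
    };
    return info;
}
```

Because the test compares against the same macros, it does not depend on which architecture's bit layout is in use.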
Choosing the ioctl Commands • The scull example
/* Use 'k' as magic number (type) field */
#define SCULL_IOC_MAGIC 'k'
/* Please use a different 8-bit number in your code */
#define SCULL_IOCRESET _IO(SCULL_IOC_MAGIC, 0)
Choosing the ioctl Commands • The scull example
/*
 * S means "Set" through a ptr,
 * T means "Tell" directly with the argument value
 * G means "Get": reply by setting through a pointer
 * Q means "Query": response is on the return value
 * X means "eXchange": switch G and S atomically
 * H means "sHift": switch T and Q atomically
 */
#define SCULL_IOCSQUANTUM _IOW(SCULL_IOC_MAGIC, 1, int)
#define SCULL_IOCSQSET    _IOW(SCULL_IOC_MAGIC, 2, int)
#define SCULL_IOCTQUANTUM _IO(SCULL_IOC_MAGIC, 3)
#define SCULL_IOCTQSET    _IO(SCULL_IOC_MAGIC, 4)
#define SCULL_IOCGQUANTUM _IOR(SCULL_IOC_MAGIC, 5, int)
• X and H: set the new value and return the old value
Choosing the ioctl Commands • The scull example
#define SCULL_IOCGQSET    _IOR(SCULL_IOC_MAGIC, 6, int)
#define SCULL_IOCQQUANTUM _IO(SCULL_IOC_MAGIC, 7)
#define SCULL_IOCQQSET    _IO(SCULL_IOC_MAGIC, 8)
#define SCULL_IOCXQUANTUM _IOWR(SCULL_IOC_MAGIC, 9, int)
#define SCULL_IOCXQSET    _IOWR(SCULL_IOC_MAGIC, 10, int)
#define SCULL_IOCHQUANTUM _IO(SCULL_IOC_MAGIC, 11)
#define SCULL_IOCHQSET    _IO(SCULL_IOC_MAGIC, 12)
...
#define SCULL_IOC_MAXNR 14
The Return Value • When the command number is not supported • Return -ENOTTY (according to the POSIX standard) • Some drivers return -EINVAL instead, which conflicts with the POSIX standard
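From the application side, an unsupported command shows up as a -1 return with errno set to ENOTTY. A minimal sketch, assuming a Linux kernel: it sends a made-up command (magic 'k', number 0, borrowed from the scull example) to a pipe, whose ioctl handler does not recognize it. The helper name is hypothetical.

```c
#include <errno.h>
#include <linux/ioctl.h>   /* _IO */
#include <sys/ioctl.h>     /* ioctl() */
#include <unistd.h>        /* pipe(), close() */

/* Hypothetical helper: issue a command the pipe code does not know and
 * return the resulting errno (0 if the call unexpectedly succeeded). */
int errno_for_unknown_ioctl(void)
{
    int fds[2];
    int result = 0;

    if (pipe(fds) != 0)
        return errno;

    /* _IO('k', 0): no data transfer, magic 'k', number 0 */
    if (ioctl(fds[0], _IO('k', 0)) == -1)
        result = errno;   /* capture errno before close() can change it */

    close(fds[0]);
    close(fds[1]);
    return result;
}
```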
The Predefined Commands • Recognized and handled by the kernel first • Will not be passed down to device drivers • Three groups • For any file (regular, device, FIFO, socket) • Magic number: 'T' • For regular files only • Specific to the file system type • E.g., see ext2_ioctl()
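FIOCLEX and FIONCLEX belong to the first group: they work on any file descriptor and are handled before any driver method runs. A minimal sketch, assuming Linux, that applies them to a pipe and checks the effect with fcntl(F_GETFD); the helper name is hypothetical.

```c
#include <fcntl.h>       /* fcntl(), F_GETFD, FD_CLOEXEC */
#include <sys/ioctl.h>   /* ioctl(), FIOCLEX, FIONCLEX */
#include <unistd.h>      /* pipe(), close() */

/* Hypothetical helper: set and then clear close-on-exec on a pipe using the
 * predefined FIOCLEX/FIONCLEX commands. Stores the FD_CLOEXEC bit observed
 * after each call in *after_set and *after_clear.
 * Returns 0 on success, -1 on error. */
int cloexec_roundtrip(int *after_set, int *after_clear)
{
    int fds[2];
    int rc = -1;

    if (pipe(fds) != 0)
        return -1;

    if (ioctl(fds[0], FIOCLEX) == 0) {          /* set close-on-exec */
        *after_set = fcntl(fds[0], F_GETFD) & FD_CLOEXEC;
        if (ioctl(fds[0], FIONCLEX) == 0) {     /* clear close-on-exec */
            *after_clear = fcntl(fds[0], F_GETFD) & FD_CLOEXEC;
            rc = 0;
        }
    }

    close(fds[0]);
    close(fds[1]);
    return rc;
}
```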
Using the ioctl Argument • If it is an integer, just use it directly • If it is a pointer • Need to check for a valid user address int access_ok(int type, const void *addr, unsigned long size); • type: either VERIFY_READ or VERIFY_WRITE • Since Linux 5.0, the type argument has been removed: access_ok(addr, size) • Returns 1 for success, 0 for failure • On failure, the driver returns -EFAULT to the caller • Defined in <linux/uaccess.h> • Mostly called internally by the memory-access routines (copy_to_user(), copy_from_user(), get_user(), put_user())
Using the ioctl Argument • The scull example
long scull_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
    int err = 0, tmp;
    int retval = 0;

    /* check the magic number and whether the command is defined */
    if (_IOC_TYPE(cmd) != SCULL_IOC_MAGIC) {
        return -ENOTTY;
    }
    if (_IOC_NR(cmd) > SCULL_IOC_MAXNR) {
        return -ENOTTY;
    }
    …
Using the ioctl Argument • The scull example
    …
    /*
     * The direction bits are user-oriented, while access_ok() is
     * kernel-oriented, so the concept of "read" and "write" is reversed here
     */
    if (_IOC_DIR(cmd) & _IOC_READ) {
        err = !access_ok(VERIFY_WRITE, (void __user *) arg, _IOC_SIZE(cmd));
    } else if (_IOC_DIR(cmd) & _IOC_WRITE) {
        err = !access_ok(VERIFY_READ, (void __user *) arg, _IOC_SIZE(cmd));
    }
    if (err)
        return -EFAULT;
    …
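The validation steps above can be exercised outside the kernel. The sketch below is a user-space model, not driver code: it reuses a few scull command numbers through <linux/ioctl.h>, and instead of calling access_ok() it reports which kind of access the driver would have to verify. The enum demo_access and demo_check_cmd() names are hypothetical.

```c
#include <errno.h>
#include <linux/ioctl.h>

#define SCULL_IOC_MAGIC   'k'
#define SCULL_IOCRESET    _IO(SCULL_IOC_MAGIC, 0)
#define SCULL_IOCSQUANTUM _IOW(SCULL_IOC_MAGIC, 1, int)
#define SCULL_IOCGQUANTUM _IOR(SCULL_IOC_MAGIC, 5, int)
#define SCULL_IOCXQUANTUM _IOWR(SCULL_IOC_MAGIC, 9, int)
#define SCULL_IOC_MAXNR   14

/* What the driver would need to verify about the user pointer. */
enum demo_access {
    DEMO_NO_CHECK,      /* _IOC_NONE: no pointer argument */
    DEMO_KERNEL_WRITES, /* _IOC_READ set: user reads, so kernel writes (VERIFY_WRITE) */
    DEMO_KERNEL_READS,  /* only _IOC_WRITE: user writes, so kernel reads (VERIFY_READ) */
};

/* Returns 0 and fills *access for a valid scull command, -ENOTTY otherwise. */
int demo_check_cmd(unsigned int cmd, enum demo_access *access)
{
    if (_IOC_TYPE(cmd) != SCULL_IOC_MAGIC || _IOC_NR(cmd) > SCULL_IOC_MAXNR)
        return -ENOTTY;

    if (_IOC_DIR(cmd) & _IOC_READ)
        *access = DEMO_KERNEL_WRITES;  /* the read/write case lands here too, as in scull */
    else if (_IOC_DIR(cmd) & _IOC_WRITE)
        *access = DEMO_KERNEL_READS;
    else
        *access = DEMO_NO_CHECK;
    return 0;
}
```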
Capabilities and Restricted Operations • Limit certain ioctl operations to privileged users • See <linux/capability.h> for the full set of capabilities • To check a certain capability, call int capable(int capability); • In the scull example
    if (!capable(CAP_SYS_ADMIN)) {
        return -EPERM;
    }
• CAP_SYS_ADMIN is a catch-all capability for many system administration operations • http://lwn.net/Articles/486306/
The Implementation of the ioctl Commands • A giant switch statement
    …
    switch (cmd) {
    case SCULL_IOCRESET:
        scull_quantum = SCULL_QUANTUM;
        scull_qset = SCULL_QSET;
        break;
    case SCULL_IOCSQUANTUM: /* Set: arg points to the value */
        if (!capable(CAP_SYS_ADMIN)) {
            return -EPERM;
        }
        retval = __get_user(scull_quantum, (int __user *)arg);
        break;
    …
The Implementation of the ioctl Commands • Six ways to pass and receive arguments from user space • The caller needs to know the command number
int quantum;
ioctl(fd, SCULL_IOCSQUANTUM, &quantum);          /* Set by pointer */
ioctl(fd, SCULL_IOCTQUANTUM, quantum);           /* Set by value */
ioctl(fd, SCULL_IOCGQUANTUM, &quantum);          /* Get by pointer */
quantum = ioctl(fd, SCULL_IOCQQUANTUM);          /* Get by return value */
ioctl(fd, SCULL_IOCXQUANTUM, &quantum);          /* Exchange by pointer */
quantum = ioctl(fd, SCULL_IOCHQUANTUM, quantum); /* Exchange by value */
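The semantics of the six conventions can be checked with a small user-space model. This is a sketch under stated assumptions, not the scull driver: a plain function stands in for the ioctl method, the pointer case simply casts arg back to int *, and there is no user-space copy, capability check, or locking. The CMD_* codes, model_quantum (initial value chosen arbitrarily), and demo_quantum_ioctl() are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical codes standing in for SCULL_IOC{S,T,G,Q,X,H}QUANTUM. */
enum { CMD_S, CMD_T, CMD_G, CMD_Q, CMD_X, CMD_H };

static int model_quantum = 4000;  /* the value the "driver" manages */

/* Model of the driver-side switch: arg is either a value or a pointer. */
long demo_quantum_ioctl(unsigned int cmd, unsigned long arg)
{
    int *uptr = (int *)(uintptr_t)arg;  /* only dereferenced for S, G, X */
    int old;

    switch (cmd) {
    case CMD_S: model_quantum = *uptr;     return 0;  /* Set via pointer */
    case CMD_T: model_quantum = (int)arg;  return 0;  /* Tell by value */
    case CMD_G: *uptr = model_quantum;     return 0;  /* Get via pointer */
    case CMD_Q: return model_quantum;                 /* Query: return value */
    case CMD_X:                                       /* eXchange via pointer */
        old = model_quantum;
        model_quantum = *uptr;
        *uptr = old;
        return 0;
    case CMD_H:                                       /* sHift by value */
        old = model_quantum;
        model_quantum = (int)arg;
        return old;
    default:
        return -ENOTTY;
    }
}
```

The Q and H forms only work for values that cannot be mistaken for a negative error code, since a negative return from the driver is reported to the application as a failed call.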
Pros/Cons of ioctl • Cons • An unregulated means of adding new system calls • The resulting API is not reviewed • Different for each device • 32/64-bit compatibility problems • No way to enumerate the supported commands • Pros • Can read and write data in a single call • Ref • http://lwn.net/Articles/191653/
Device Control Without ioctl • Writing control sequences into the data stream itself • Example: console escape sequences • Advantages: • No need to implement ioctl methods • Disadvantages: • Need to make sure that escape sequences do not appear in the normal data stream (e.g., cat a binary file) • Need to parse the data stream
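To illustrate the parsing cost, here is a sketch of an in-band control protocol. The protocol is entirely hypothetical: an ESC byte (0x1b) followed by 'q' and one byte sets a parameter, and ESC ESC stands for one literal ESC byte, which is how ordinary data containing the escape byte is kept from being misread as a command.

```c
#include <stddef.h>

#define ESC 0x1b

/* Hypothetical in-band control parser.
 * Copies ordinary data from in[] to out[], applying control sequences:
 *   ESC 'q' <n>  -> set *param to n (nothing copied to out)
 *   ESC ESC      -> one literal ESC byte in out
 * Parsing stops at an unknown or truncated escape sequence.
 * out must have room for len bytes. Returns the number of data bytes written. */
size_t demo_parse_stream(const unsigned char *in, size_t len,
                         unsigned char *out, int *param)
{
    size_t i = 0, n = 0;

    while (i < len) {
        if (in[i] != ESC) {
            out[n++] = in[i++];           /* ordinary data byte */
        } else if (i + 1 < len && in[i + 1] == ESC) {
            out[n++] = ESC;               /* escaped literal ESC */
            i += 2;
        } else if (i + 2 < len && in[i + 1] == 'q') {
            *param = in[i + 2];           /* control command: set parameter */
            i += 3;
        } else {
            break;                        /* unknown or truncated sequence */
        }
    }
    return n;
}
```

Every byte of data now has to be inspected, and a writer must escape any ESC bytes in its payload; this is the overhead the slide refers to.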
Device Control Without ioctl • sysfs • Can be used to enumerate all exported components • Can be accessed with standard Unix shell commands • Netlink • Getting/setting socket options • debugfs • Probably not a good choice, since its purpose is debugging • relay interface • https://www.kernel.org/doc/Documentation/filesystems/relay.txt
Sleeping • Suspend thread waiting for some condition • Example usage: Blocking I/O • Data is not immediately available for reads • When the device is not ready to accept data • Output buffer is full
Introduction to Sleeping • A sleeping process is removed from the scheduler's run queue • Certain rules • Never sleep when running in an atomic context • i.e., when multiple steps must be performed without concurrent accesses • Not while holding a spinlock, seqlock, or RCU lock • Not while interrupts are disabled
Introduction to Sleeping • After waking up • Make no assumptions about the state of the system • The resource one is waiting for might be gone again • Must check the wait condition again
Introduction to Sleeping • Wait queue: contains a list of processes waiting for a specific event • #include <linux/wait.h> • To initialize statically, call DECLARE_WAIT_QUEUE_HEAD(my_queue); • To initialize dynamically, call wait_queue_head_t my_queue; init_waitqueue_head(&my_queue);
Simple Sleeping • Call variants of the wait_event macros • wait_event(queue, condition) • queue = wait queue head • Passed by value • Waits until the boolean condition becomes true • Puts the process into an uninterruptible sleep • Usually not what you want • wait_event_interruptible(queue, condition) • Can be interrupted by signals • Returns nonzero if the sleep was interrupted • Your driver should then return -ERESTARTSYS
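Simple Sleeping • wait_event_timeout(queue, condition, timeout) • Waits for a limited time (in jiffies) • Returns 0 if the timeout elapsed with the condition still false • Otherwise returns a positive value: the remaining jiffies, or 1 if the condition became true just as the timeout elapsed • wait_event_interruptible_timeout(queue, condition, timeout) • Same, but also returns -ERESTARTSYS if interrupted by a signal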
Simple Sleeping • To wake up, call variants of wake_up functions void wake_up(wait_queue_head_t *queue); • Wakes up all processes waiting on the queue void wake_up_interruptible(wait_queue_head_t *queue); • Wakes up processes that perform an interruptible sleep
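Kernel wait queues have no direct user-space equivalent, but a POSIX condition variable is a rough analog that follows the same discipline: the sleeper re-tests the condition in a loop, because being woken does not guarantee the condition holds. This swaps in a user-space technique and is not the kernel API; DEMO_WAIT_EVENT, demo_wake_up(), and the other demo_ names are hypothetical. It assumes a POSIX system with pthreads (older toolchains need -pthread when linking).

```c
#include <pthread.h>

static pthread_mutex_t demo_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  demo_queue = PTHREAD_COND_INITIALIZER;

/* Rough analog of wait_event(queue, condition): sleep until condition is
 * true, re-checking it after every wakeup. Caller must not hold demo_lock. */
#define DEMO_WAIT_EVENT(condition)                          \
    do {                                                    \
        pthread_mutex_lock(&demo_lock);                     \
        while (!(condition))                                \
            pthread_cond_wait(&demo_queue, &demo_lock);     \
        pthread_mutex_unlock(&demo_lock);                   \
    } while (0)

/* Rough analog of wake_up(): change the state, then wake every sleeper. */
static void demo_wake_up(int *flag, int value)
{
    pthread_mutex_lock(&demo_lock);
    *flag = value;
    pthread_cond_broadcast(&demo_queue);
    pthread_mutex_unlock(&demo_lock);
}

static int demo_flag;

/* Thread body that sleeps until demo_flag becomes nonzero. */
static void *demo_sleeper(void *arg)
{
    (void)arg;
    DEMO_WAIT_EVENT(demo_flag != 0);
    return NULL;
}
```

Because the condition is tested under the mutex, a wake-up that happens before the sleeper starts waiting is not lost; the kernel's wait_event achieves the equivalent with its own queue handling rather than a caller-visible lock.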
Simple Sleeping • Example module: sleepy
static DECLARE_WAIT_QUEUE_HEAD(wq);
static int flag = 0;

ssize_t sleepy_read(struct file *filp, char __user *buf, size_t count, loff_t *pos)
{
    printk(KERN_DEBUG "process %i (%s) going to sleep\n", current->pid, current->comm);
    wait_event_interruptible(wq, flag != 0);
    flag = 0;
    printk(KERN_DEBUG "awoken %i (%s)\n", current->pid, current->comm);
    return 0; /* EOF */
}
• Race: multiple threads can wake up after wait_event_interruptible() returns and before flag is reset, so more than one reader may see flag != 0
Simple Sleeping • Example module: sleepy
ssize_t sleepy_write(struct file *filp, const char __user *buf, size_t count, loff_t *pos)
{
    printk(KERN_DEBUG "process %i (%s) awakening the readers...\n", current->pid, current->comm);
    flag = 1;
    wake_up_interruptible(&wq);
    return count; /* succeed, to avoid retrial */
}
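One way to close the race in sleepy_read() is to make "test the flag and reset it" a single atomic step, so only one reader can consume each write. In the driver this might be written as wait_event_interruptible(wq, atomic_cmpxchg(&flag, 1, 0) == 1) with flag declared as an atomic_t; that variant is an assumption, not the original module's code. The sketch below models only the consume step, in user space with C11 atomics; try_consume_flag() is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically change *flag from 1 to 0. Returns true only for the single
 * caller that performed the change; every other caller sees false and,
 * in the driver, would keep sleeping. */
static bool try_consume_flag(atomic_int *flag)
{
    int expected = 1;
    return atomic_compare_exchange_strong(flag, &expected, 0);
}
```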
Blocking and Nonblocking Operations • By default, operations block • If no data is available for reads • If no space is available for writes • Non-blocking I/O is indicated by the O_NONBLOCK flag in filp->f_flags • Defined in <linux/fcntl.h> • Only open, read, and write calls are affected • Returns -EAGAIN immediately instead of blocking • Applications need to distinguish non-blocking returns from EOFs
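The difference between a non-blocking "no data yet" and a real end of file is easy to see with a pipe. A minimal user-space sketch, assuming Linux/POSIX; the helper name is hypothetical. Some systems report EWOULDBLOCK, which on Linux has the same value as EAGAIN.

```c
#include <errno.h>
#include <fcntl.h>    /* fcntl(), F_GETFL, F_SETFL, O_NONBLOCK */
#include <unistd.h>   /* pipe(), read(), close(), ssize_t */

/* Hypothetical helper: read from an empty non-blocking pipe twice, once while
 * the write end is open and once after it is closed. Stores the errno of the
 * first read in *first_errno and the return value of the second read in
 * *second_ret. Returns 0 on success, -1 on setup error. */
int demo_nonblock_vs_eof(int *first_errno, ssize_t *second_ret)
{
    int fds[2];
    char c;

    if (pipe(fds) != 0)
        return -1;
    if (fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Writer still open, no data: -1 with EAGAIN instead of blocking. */
    *first_errno = (read(fds[0], &c, 1) == -1) ? errno : 0;

    /* No writers left: read returns 0, i.e., end of file. */
    close(fds[1]);
    *second_ret = read(fds[0], &c, 1);

    close(fds[0]);
    return 0;
}
```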
A Blocking I/O Example • scullpipe • A read process • Blocks when no data is available • Wakes a blocking write when buffer space becomes available • A write process • Blocks when no buffer space is available • Wakes a blocking read process when data arrives
A Blocking I/O Example • scullpipe data structure
struct scull_pipe {
    wait_queue_head_t inq, outq;       /* read and write queues */
    char *buffer, *end;                /* begin of buf, end of buf */
    int buffersize;                    /* used in pointer arithmetic */
    char *rp, *wp;                     /* where to read, where to write */
    int nreaders, nwriters;            /* number of openings for r/w */
    struct fasync_struct *async_queue; /* asynchronous readers */
    struct mutex mutex;                /* mutual exclusion */
    struct cdev cdev;                  /* Char device structure */
};
A Blocking I/O Example
static ssize_t scull_p_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos)
{
    struct scull_pipe *dev = filp->private_data;

    if (mutex_lock_interruptible(&dev->mutex))
        return -ERESTARTSYS;

    while (dev->rp == dev->wp) { /* nothing to read */
        mutex_unlock(&dev->mutex); /* release the lock */
        if (filp->f_flags & O_NONBLOCK)
            return -EAGAIN;
        if (wait_event_interruptible(dev->inq, (dev->rp != dev->wp)))
            return -ERESTARTSYS;
        if (mutex_lock_interruptible(&dev->mutex))
            return -ERESTARTSYS;
    }
A Blocking I/O Example
    if (dev->wp > dev->rp)
        count = min(count, (size_t)(dev->wp - dev->rp));
    else /* the write pointer has wrapped */
        count = min(count, (size_t)(dev->end - dev->rp));
    if (copy_to_user(buf, dev->rp, count)) {
        mutex_unlock(&dev->mutex); /* release the lock before failing */
        return -EFAULT;
    }
    dev->rp += count;
    if (dev->rp == dev->end)
        dev->rp = dev->buffer; /* wrapped */
    mutex_unlock(&dev->mutex);

    /* finally, awake any writers and return */
    wake_up_interruptible(&dev->outq);
    return count;
}
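The pointer arithmetic in scull_p_read() can be checked in isolation. The sketch below is a single-threaded user-space model of the circular buffer only: it keeps buffer, end, rp, and wp as in struct scull_pipe, but has no mutex, wait queues, or copy_to_user(). struct demo_ring and the demo_ring_* functions are hypothetical, and the write side is written by analogy because the write method is not shown above. Like the read path, it treats rp == wp as empty, so one slot always stays unused.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_RING_SIZE 8

struct demo_ring {
    char buffer[DEMO_RING_SIZE];
    char *end;     /* one past the last byte of buffer */
    char *rp, *wp; /* where to read, where to write */
};

static void demo_ring_init(struct demo_ring *r)
{
    r->end = r->buffer + DEMO_RING_SIZE;
    r->rp = r->wp = r->buffer;
}

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Free space, keeping one slot unused so that rp == wp always means empty. */
static size_t demo_ring_space(const struct demo_ring *r)
{
    size_t ri = (size_t)(r->rp - r->buffer);
    size_t wi = (size_t)(r->wp - r->buffer);

    if (ri == wi)
        return DEMO_RING_SIZE - 1;
    return ((ri + DEMO_RING_SIZE - wi) % DEMO_RING_SIZE) - 1;
}

/* Same count clamping as scull_p_read(); returns bytes read (0 if empty,
 * where the driver would sleep or return -EAGAIN instead). */
static size_t demo_ring_read(struct demo_ring *r, char *out, size_t count)
{
    if (r->rp == r->wp)
        return 0;
    if (r->wp > r->rp)
        count = min_size(count, (size_t)(r->wp - r->rp));
    else                                   /* the write pointer has wrapped */
        count = min_size(count, (size_t)(r->end - r->rp));
    memcpy(out, r->rp, count);
    r->rp += count;
    if (r->rp == r->end)
        r->rp = r->buffer;                 /* wrapped */
    return count;
}

/* Write counterpart: fill up to the end of the buffer, or up to rp - 1. */
static size_t demo_ring_write(struct demo_ring *r, const char *in, size_t count)
{
    count = min_size(count, demo_ring_space(r));
    if (r->wp >= r->rp)
        count = min_size(count, (size_t)(r->end - r->wp));
    else
        count = min_size(count, (size_t)(r->rp - r->wp - 1));
    memcpy(r->wp, in, count);
    r->wp += count;
    if (r->wp == r->end)
        r->wp = r->buffer;                 /* wrapped */
    return count;
}
```

Like the driver, each call may transfer fewer bytes than requested when the data wraps around the end of the buffer; callers simply loop, as applications do with short reads.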
The llseek Implementation • Implements the lseek and llseek system calls • Modifies filp->f_pos
loff_t scull_llseek(struct file *filp, loff_t off, int whence)
{
    struct scull_dev *dev = filp->private_data;
    loff_t newpos;

    switch (whence) {
    case 0: /* SEEK_SET */
        newpos = off;
        break;
    case 1: /* SEEK_CUR, relative to the current position */
        newpos = filp->f_pos + off;
        break;
The llseek Implementation
    case 2: /* SEEK_END, relative to the end of the file */
        newpos = dev->size + off;
        break;
    default: /* can't happen */
        return -EINVAL;
    }
    if (newpos < 0)
        return -EINVAL;
    filp->f_pos = newpos;
    return newpos;
}
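The new-position arithmetic can be pulled out into a pure function and checked in user space. A sketch, assuming the standard SEEK_SET/SEEK_CUR/SEEK_END constants (0, 1, and 2 on Linux) and using a hypothetical name; a real llseek method would also update filp->f_pos. Unlike the code above, this version also rejects a result that would overflow.

```c
#include <errno.h>
#include <limits.h>   /* LLONG_MAX, LLONG_MIN */
#include <stdio.h>    /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical helper: compute the position scull_llseek() would set.
 * Returns the new position, or -EINVAL for a bad whence, a negative
 * result, or (an addition not in the original) an overflowing result. */
long long demo_llseek_pos(long long cur, long long size, long long off, int whence)
{
    long long base;

    switch (whence) {
    case SEEK_SET: base = 0;    break;
    case SEEK_CUR: base = cur;  break;
    case SEEK_END: base = size; break;
    default:       return -EINVAL;
    }
    if ((off > 0 && base > LLONG_MAX - off) || (off < 0 && base < LLONG_MIN - off))
        return -EINVAL;
    if (base + off < 0)
        return -EINVAL;
    return base + off;
}
```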
The llseek Implementation • Seeking may not make sense for serial ports and keyboard inputs • Inform the kernel by calling nonseekable_open in the open method int nonseekable_open(struct inode *inode, struct file *filp); • Replace the llseek method in your file_operations structure with no_llseek (defined in <linux/fs.h>)