Kernel Development
CSC585 Class Project
Dawn Nelson
December 2009
Compare timing and jitter between a realtime module and a non-realtime module
• Are the results of using a realtime module worth the effort of installing RTAI?
• What is the timing difference between realtime and non-realtime kernel modules for computation?
• What is the jitter difference between realtime and non-realtime kernel modules for computation?
• What is the jitter difference between realtime and non-realtime kernel modules for overall process time, with and without MPI?
• What types of tasks are improved by using RTAI?
What is the timing difference between realtime and non-realtime kernel modules?
What is the jitter difference between realtime and non-realtime kernel modules for overall process time?
Source Code Written
• Kernel module implementing a character device whose read/write call is the signal to perform the kernel task.
• Kernel module implementing RTAI, with a FIFO and a semaphore as the signal to perform the kernel task.
• Programs to use the kernel modules (a sketch of one such test program follows this list).
• MPI programs to use the kernel modules.
• Scripts to build and load both modules.
• Scripts to run the programs and save the results.
• Scripts to initiate MPI on all nodes (because mpdboot does not work reliably for 8 nodes).
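As a concrete illustration of the "programs to use the kernel modules" item, here is a minimal sketch (not the actual project code) of a user-space test program for the character-device module. It assumes a device node /dev/mmmodule has been created for major number 60 and that, as in the read handler shown on the next slide, the elapsed time in nanoseconds comes back as read()'s return value.

/* Hypothetical user-space driver for the char-device module.
 * Assumes: insmod mmmodule.ko; mknod /dev/mmmodule c 60 0 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[1];
    int fd = open("/dev/mmmodule", O_RDONLY);
    if (fd < 0) {
        perror("open /dev/mmmodule");
        return 1;
    }
    /* The module's read handler runs the 50,000-iteration matrix
     * multiply and returns the elapsed time in nanoseconds. */
    int elapsed_ns = read(fd, buf, 1);
    printf("kernel matrix multiply took %d ns\n", elapsed_ns);
    close(fd);
    return 0;
}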
Character Device Driver – Read function

// read
ssize_t mmmodule_mmmdo(struct file *filp, char *buf, size_t count, loff_t *f_pos)
{
    int a[20][20], b[20][20], c[20][20];
    int i, j, k, extraloop, t2;
    RTIME t0, t1;

    t0 = rt_get_cpu_time_ns();
    // 50000 iterations for a good measurement
    for (extraloop = 0; extraloop < 50000; extraloop++) {
        // Matrix calculation block
        for (k = 0; k < 20; k++)
            for (i = 0; i < 20; i++) {
                c[i][k] = 0;
                for (j = 0; j < 20; j++)
                    c[i][k] = c[i][k] + a[i][j] * b[j][k];
            }
    }
    t1 = rt_get_cpu_time_ns();
    t2 = (int)(t1 - t0);
    // Changing reading position as best suits
    //copy_to_user(buf, mmmodule_buffer, 1);
    return t2;
}
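The handler above returns the elapsed nanoseconds through read()'s return value and leaves copy_to_user() commented out. As a sketch only, the end of the function could instead hand the timing back through the user buffer; this assumes the caller passes at least sizeof(int) bytes and that <linux/uaccess.h> is included:

    t2 = (int)(t1 - t0);
    /* Copy the measured time into the caller's buffer instead of
     * smuggling it through the return value. */
    if (copy_to_user(buf, &t2, sizeof(t2)))
        return -EFAULT;
    return sizeof(t2);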
Character Device Driver - setup

// memory character device driver to do matrix
// multiply upon a call to it
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>   // printk()
#include <linux/slab.h>     // kmalloc()
#include <linux/fs.h>       // everything
#include <linux/errno.h>    // error codes
#include <linux/types.h>    // size_t
#include <linux/proc_fs.h>
#include <linux/fcntl.h>    // O_ACCMODE
#include <asm/system.h>     // cli(), *_flags
#include <rtai_sched.h>

MODULE_LICENSE("GPL");

// Declaration of mmmodule.c functions
int mmmodule_open(struct inode *inode, struct file *filp);
int mmmodule_release(struct inode *inode, struct file *filp);
ssize_t mmmodule_mmmdo(struct file *filp, char *buf, size_t count, loff_t *f_pos);
void mmmodule_exit(void);
int mmmodule_init(void);

/* Structure that declares the usual file */
/* access functions */
struct file_operations mmmodule_fops = {
    read:    mmmodule_mmmdo,
    //write: mmmodule_write,
    open:    mmmodule_open,
    release: mmmodule_release
};

// Declaration of the init and exit functions
module_init(mmmodule_init);
module_exit(mmmodule_exit);

// Global variables of the driver
int mmmodule_major = 60;    // Major number
char *mmmodule_buffer;      // Buffer to store data

int mmmodule_init(void)
{
    int result;

    // Registering device
    result = register_chrdev(mmmodule_major, "mmmodule", &mmmodule_fops);
    if (result < 0) {
        printk("mmmodule: cannot get major number %d\n", mmmodule_major);
        return result;
    }

    // Allocating mmmodule for the buffer
    mmmodule_buffer = kmalloc(1, GFP_KERNEL);
    if (!mmmodule_buffer) {
        result = -ENOMEM;
        goto fail;
    }
    memset(mmmodule_buffer, 0, 1);

    printk("Inserting mmmodule module\n");
    return 0;

fail:
    mmmodule_exit();
    return result;
}
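The slide declares mmmodule_open(), mmmodule_release() and mmmodule_exit() but does not show their bodies. A minimal sketch of what they presumably look like, following the usual pattern for a simple character device (the real project code may differ):

int mmmodule_open(struct inode *inode, struct file *filp)
{
    return 0;                   /* nothing to do on open */
}

int mmmodule_release(struct inode *inode, struct file *filp)
{
    return 0;                   /* nothing to do on close */
}

void mmmodule_exit(void)
{
    /* Release the major number and the 1-byte buffer. */
    unregister_chrdev(mmmodule_major, "mmmodule");
    if (mmmodule_buffer)
        kfree(mmmodule_buffer);
    printk("Removing mmmodule module\n");
}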
RealTime Module - Read

static int myfifo_handler(unsigned int fifo)
{
    rt_sem_signal(&myfifo_sem);
    return 0;
}

static void Myfifo_Read(long t)
{
    int i = 0, j = 0, k = 0, xj = 0;
    int a[20][20], b[20][20], c[20][20];
    char ch = 'd';
    RTIME t0, t1;

    while (1) {
        //rt_printk("new_shm: sem_waiting\n");
        rt_sem_wait(&myfifo_sem);
        rtf_get(Myfifo, &ch, 1);
        //rt_printk("got a char off the fifo... time to do matrix mult\n");
        t0 = rt_get_cpu_time_ns();
        //rt_printk("t0= %ld \n", t0);
        for (xj = 0; xj < 50000; xj++) {
            for (k = 0; k < 20; k++)
                for (i = 0; i < 20; i++) {
                    c[i][k] = 0;
                    for (j = 0; j < 20; j++)
                        c[i][k] = c[i][k] + a[i][j] * b[j][k];
                }
        }
        t1 = rt_get_cpu_time_ns();
        shm->t2 = t1 - t0;    // = (int *)t2;
    }
}
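Myfifo_Read() and the init_module() on the next slide reference several globals whose declarations are not on the slides. The following is only a guess at those declarations; the names SHMNAM and SHMSIZ, the fifo number, SEM_TYPE, and the layout of mtime are all assumptions:

/* Assumed declarations (not shown on the slides). */
#include <rtai_sched.h>
#include <rtai_sem.h>
#include <rtai_fifos.h>
#include <rtai_shm.h>

#define Myfifo   0                     /* fifo index -> /dev/rtf0 in user space */
#define SHMNAM   "MMSHM"               /* nam2num() accepts up to 6 characters */
#define SEM_TYPE CNT_SEM               /* actual semaphore type unknown */

typedef struct { RTIME t2; } mtime;    /* holds the measured time in ns */
#define SHMSIZ   sizeof(mtime)

static mtime *shm;                     /* RTAI shared memory, set up in init_module() */
static SEM myfifo_sem, sync;           /* fifo-handler semaphore and a sync semaphore */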
RealTime Module - setup

static RT_TASK read;
#define TICK_PERIOD 100000LL    /* 0.1 msec (1 tick) */

int init_module(void)
{
    // shared memory section
    rt_printk("shm_rt.ko initialized: tick period = %ld\n", TICK_PERIOD);
    shm = (mtime *)rtai_kmalloc(nam2num(SHMNAM), SHMSIZ);
    if (shm == NULL)
        return -ENOMEM;
    memset(shm, 0, SHMSIZ);

    rtf_create(Myfifo, 1000);
    rtf_create_handler(Myfifo, myfifo_handler);

    rt_sem_init(&sync, 0);
    rt_typed_sem_init(&myfifo_sem, 0, SEM_TYPE);

    rt_task_init(&read, Myfifo_Read, 0, 2000, 0, 0, 0);
    start_rt_timer((int)nano2count(TICK_PERIOD));
    rt_task_resume(&read);
    return 0;
}
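For completeness, a hypothetical user-space counterpart for the RTAI module: it pushes one byte into the fifo to wake the real-time task, then reads the measured time out of the shared memory. The fifo device name, shared-memory name and mtime layout follow the assumptions sketched above, and the RTAI user-space headers (rtai_shm.h) are assumed to be installed.

/* Hypothetical user-space program for the RTAI module. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <rtai_shm.h>            /* user-space rtai_malloc()/rtai_free() */

#define SHMNAM "MMSHM"           /* must match the name used in the module */

typedef struct { long long t2; } mtime;

int main(void)
{
    char ch = 'd';
    int fd = open("/dev/rtf0", O_WRONLY);   /* fifo 0 assumed */
    if (fd < 0) { perror("open /dev/rtf0"); return 1; }

    /* Attach to the shared memory region created by the module. */
    mtime *shm = rtai_malloc(nam2num(SHMNAM), sizeof(mtime));
    if (!shm) { fprintf(stderr, "cannot attach shared memory\n"); return 1; }

    write(fd, &ch, 1);           /* fifo handler signals the semaphore */
    sleep(1);                    /* crude wait for the RT task to finish */
    printf("rt matrix multiply took %lld ns\n", shm->t2);

    rtai_free(nam2num(SHMNAM), shm);
    close(fd);
    return 0;
}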
Conclusions
• There are cases where RTAI improves timing and jitter: mostly longer-running tasks, widely distributed tasks, and deterministic tasks.
• Accessing shared memory created using RTAI sadly slows the module to a crawl. My previous rt-module was giving results of 140 milliseconds per 5,000 matrix multiplies; the new version gives results of 100 nanoseconds for 50,000 matrix multiplies. I can try physical memory mapping to see if performance is improved.
• I don't think modules were meant to be used for large amounts of data, because of the slow transfer between user and kernel space via copy_to_user, shared memory, and copy_from_user.
• For MPI, the main advantage of using RTAI is that the nodes all finish at nearly the same time.
Lessons learned
• A kernel crash writes core dumps on all open windows.
• A small tick period locks up the whole machine and is unrecoverable.
• FIFOs and semaphores work nicely and do not create race conditions.
• Character device drivers work nicely but take a little more maintenance to set up and program.
• These are my first modules ever written, including the rt one for the conference.
• A profiler would be very useful for comparing performance instead of graphs and text.
• I will soon be writing an RT module to read a synchro device every 12 milliseconds to try out the determinism of RTAI.
• 1000 nanoseconds = 1 microsecond
• 1000 microseconds = 1 millisecond
• 1 millisecond = 1,000,000 nanoseconds
Future work
• There are very few RTAI examples or code samples available (findable by Google, anyway).
• The matrix multiply, even at 50 thousand iterations, is not CPU-intensive enough to prove or disprove the advantages of RTAI.
• Need to ask the physicists for some of their algorithms to crunch through the system; at the conference, it was the physicists who showed interest in RTAI.
• Plan to use RTAI for its intended purpose of being deterministic.
• Write up these results for a paper.