Kernel Development
CSC585 Class Project
Dawn Nelson
December 2009
Compare timing and jitter between a realtime module and a non-realtime module
• Are the results of using a realtime module worth the effort of installing RTAI?
• What is the timing difference between realtime and non-realtime kernel modules for computation?
• What is the jitter difference between realtime and non-realtime kernel modules for computation?
• What is the jitter difference between realtime and non-realtime kernel modules for overall process time, with and without MPI?
• What types of tasks are improved by using RTAI?
What is the timing difference between realtime and non-realtime kernel modules?
What is the jitter difference between realtime and non-realtime kernel modules for overall process time?
Source Code Written
• Kernel module implementing a character device whose read/write call is the signal to perform the kernel task.
• Kernel module implementing RTAI, with a FIFO and a semaphore as the signal to perform the kernel task.
• Programs to use the kernel modules (a sketch of one such test program follows this list).
• MPI programs to use the kernel modules.
• Scripts to build and load both modules.
• Scripts to run the programs and save the results.
• Scripts to initiate MPI on all nodes (because mpdboot does not work reliably for 8 nodes).
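As a concrete illustration of the "programs to use the kernel modules" item, here is a minimal sketch (not the actual project code) of a user-space test program for the character-device module. It assumes a device node /dev/mmmodule has been created for major number 60 and that, as in the read handler shown on the next slide, the elapsed time in nanoseconds comes back as read()'s return value.

/* Hypothetical user-space driver for the char-device module.
 * Assumes: insmod mmmodule.ko; mknod /dev/mmmodule c 60 0 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[1];
    int fd = open("/dev/mmmodule", O_RDONLY);
    if (fd < 0) {
        perror("open /dev/mmmodule");
        return 1;
    }
    /* The module's read handler runs the 50,000-iteration matrix
     * multiply and returns the elapsed time in nanoseconds. */
    int elapsed_ns = read(fd, buf, 1);
    printf("kernel matrix multiply took %d ns\n", elapsed_ns);
    close(fd);
    return 0;
}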
Character Device Driver – Read function

// read
ssize_t mmmodule_mmmdo(struct file *filp, char *buf, size_t count, loff_t *f_pos)
{
    int a[20][20], b[20][20], c[20][20];
    int i, j, k, extraloop, t2;
    RTIME t0, t1;

    t0 = rt_get_cpu_time_ns();
    // 50000 iterations for a good measurement
    for (extraloop = 0; extraloop < 50000; extraloop++) {
        // Matrix calculation block
        for (k = 0; k < 20; k++)
            for (i = 0; i < 20; i++) {
                c[i][k] = 0;
                for (j = 0; j < 20; j++)
                    c[i][k] = c[i][k] + a[i][j] * b[j][k];
            }
    }
    t1 = rt_get_cpu_time_ns();
    t2 = (int)(t1 - t0);
    // Changing reading position as best suits
    //copy_to_user(buf, mmmodule_buffer, 1);
    return t2;
}
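The handler above returns the elapsed nanoseconds through read()'s return value and leaves copy_to_user() commented out. As a sketch only, the end of the function could instead hand the timing back through the user buffer; this assumes the caller passes at least sizeof(int) bytes and that <linux/uaccess.h> is included:

    t2 = (int)(t1 - t0);
    /* Copy the measured time into the caller's buffer instead of
     * smuggling it through the return value. */
    if (copy_to_user(buf, &t2, sizeof(t2)))
        return -EFAULT;
    return sizeof(t2);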
Character Device Driver - setup

// memory character device driver to do matrix
// multiply upon a call to it
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>   // printk()
#include <linux/slab.h>     // kmalloc()
#include <linux/fs.h>       // everything
#include <linux/errno.h>    // error codes
#include <linux/types.h>    // size_t
#include <linux/proc_fs.h>
#include <linux/fcntl.h>    // O_ACCMODE
#include <asm/system.h>     // cli(), *_flags
#include <rtai_sched.h>

MODULE_LICENSE("GPL");

// Declaration of mmmodule.c functions
int mmmodule_open(struct inode *inode, struct file *filp);
int mmmodule_release(struct inode *inode, struct file *filp);
ssize_t mmmodule_mmmdo(struct file *filp, char *buf, size_t count, loff_t *f_pos);
void mmmodule_exit(void);
int mmmodule_init(void);

/* Structure that declares the usual file */
/* access functions */
struct file_operations mmmodule_fops = {
    read:    mmmodule_mmmdo,
    //write: mmmodule_write,
    open:    mmmodule_open,
    release: mmmodule_release
};

// Declaration of the init and exit functions
module_init(mmmodule_init);
module_exit(mmmodule_exit);

// Global variables of the driver
int mmmodule_major = 60;    // Major number
char *mmmodule_buffer;      // Buffer to store data

int mmmodule_init(void)
{
    int result;

    // Registering device
    result = register_chrdev(mmmodule_major, "mmmodule", &mmmodule_fops);
    if (result < 0) {
        printk("mmmodule: cannot get major number %d\n", mmmodule_major);
        return result;
    }

    // Allocating mmmodule for the buffer
    mmmodule_buffer = kmalloc(1, GFP_KERNEL);
    if (!mmmodule_buffer) {
        result = -ENOMEM;
        goto fail;
    }
    memset(mmmodule_buffer, 0, 1);

    printk("Inserting mmmodule module\n");
    return 0;

fail:
    mmmodule_exit();
    return result;
}
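The slide declares mmmodule_open(), mmmodule_release() and mmmodule_exit() but does not show their bodies. A minimal sketch of what they presumably look like, following the usual pattern for a simple character device (the real project code may differ):

int mmmodule_open(struct inode *inode, struct file *filp)
{
    return 0;                   /* nothing to do on open */
}

int mmmodule_release(struct inode *inode, struct file *filp)
{
    return 0;                   /* nothing to do on close */
}

void mmmodule_exit(void)
{
    /* Release the major number and the 1-byte buffer. */
    unregister_chrdev(mmmodule_major, "mmmodule");
    if (mmmodule_buffer)
        kfree(mmmodule_buffer);
    printk("Removing mmmodule module\n");
}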
RealTime Module - Read

static int myfifo_handler(unsigned int fifo)
{
    rt_sem_signal(&myfifo_sem);
    return 0;
}

static void Myfifo_Read(long t)
{
    int i = 0, j = 0, k = 0, xj = 0;
    int a[20][20], b[20][20], c[20][20];
    char ch = 'd';
    RTIME t0, t1;

    while (1) {
        //rt_printk("new_shm: sem_waiting\n");
        rt_sem_wait(&myfifo_sem);
        rtf_get(Myfifo, &ch, 1);
        //rt_printk("got a char off the fifo... time to do matrix mult\n");
        t0 = rt_get_cpu_time_ns();
        //rt_printk("t0= %ld \n", t0);
        for (xj = 0; xj < 50000; xj++) {
            for (k = 0; k < 20; k++)
                for (i = 0; i < 20; i++) {
                    c[i][k] = 0;
                    for (j = 0; j < 20; j++)
                        c[i][k] = c[i][k] + a[i][j] * b[j][k];
                }
        }
        t1 = rt_get_cpu_time_ns();
        shm->t2 = t1 - t0;    // = (int *)t2;
    }
}
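Myfifo_Read() and the init_module() on the next slide reference several globals whose declarations are not on the slides. The following is only a guess at those declarations; the names SHMNAM and SHMSIZ, the fifo number, SEM_TYPE, and the layout of mtime are all assumptions:

/* Assumed declarations (not shown on the slides). */
#include <rtai_sched.h>
#include <rtai_sem.h>
#include <rtai_fifos.h>
#include <rtai_shm.h>

#define Myfifo   0                     /* fifo index -> /dev/rtf0 in user space */
#define SHMNAM   "MMSHM"               /* nam2num() accepts up to 6 characters */
#define SEM_TYPE CNT_SEM               /* actual semaphore type unknown */

typedef struct { RTIME t2; } mtime;    /* holds the measured time in ns */
#define SHMSIZ   sizeof(mtime)

static mtime *shm;                     /* RTAI shared memory, set up in init_module() */
static SEM myfifo_sem, sync;           /* fifo-handler semaphore and a sync semaphore */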
RealTime Module - setup

static RT_TASK read;
#define TICK_PERIOD 100000LL    /* 0.1 msec (1 tick) */

int init_module(void)
{
    // shared memory section
    rt_printk("shm_rt.ko initialized: tick period = %ld\n", TICK_PERIOD);
    shm = (mtime *)rtai_kmalloc(nam2num(SHMNAM), SHMSIZ);
    if (shm == NULL)
        return -ENOMEM;
    memset(shm, 0, SHMSIZ);

    rtf_create(Myfifo, 1000);
    rtf_create_handler(Myfifo, myfifo_handler);

    rt_sem_init(&sync, 0);
    rt_typed_sem_init(&myfifo_sem, 0, SEM_TYPE);

    rt_task_init(&read, Myfifo_Read, 0, 2000, 0, 0, 0);
    start_rt_timer((int)nano2count(TICK_PERIOD));
    rt_task_resume(&read);
    return 0;
}
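For completeness, a hypothetical user-space counterpart for the RTAI module: it pushes one byte into the fifo to wake the real-time task, then reads the measured time out of the shared memory. The fifo device name, shared-memory name and mtime layout follow the assumptions sketched above, and the RTAI user-space headers (rtai_shm.h) are assumed to be installed.

/* Hypothetical user-space program for the RTAI module. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <rtai_shm.h>            /* user-space rtai_malloc()/rtai_free() */

#define SHMNAM "MMSHM"           /* must match the name used in the module */

typedef struct { long long t2; } mtime;

int main(void)
{
    char ch = 'd';
    int fd = open("/dev/rtf0", O_WRONLY);   /* fifo 0 assumed */
    if (fd < 0) { perror("open /dev/rtf0"); return 1; }

    /* Attach to the shared memory region created by the module. */
    mtime *shm = rtai_malloc(nam2num(SHMNAM), sizeof(mtime));
    if (!shm) { fprintf(stderr, "cannot attach shared memory\n"); return 1; }

    write(fd, &ch, 1);           /* fifo handler signals the semaphore */
    sleep(1);                    /* crude wait for the RT task to finish */
    printf("rt matrix multiply took %lld ns\n", shm->t2);

    rtai_free(nam2num(SHMNAM), shm);
    close(fd);
    return 0;
}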
Conclusions
• There are cases where RTAI improves timing and jitter: mostly longer-running tasks, widely distributed tasks, and deterministic tasks.
• Accessing shared memory created using RTAI sadly slows the module to a crawl. My previous rt-module was giving results of 140 milliseconds per 5,000 matrix multiplies; the new version gives results of 100 nanoseconds for 50,000 matrix multiplies. I can try physical memory mapping to see if performance is improved.
• I don't think modules were meant to be used for large amounts of data, because of the slow transfer between user and kernel space via copy_to_user, shared memory, and copy_from_user.
• For MPI, the main advantage of using RTAI is that the nodes all finish at nearly the same time.
Lessons learned
• A kernel crash writes core dumps on all open windows.
• A small tick period locks up the whole machine and is unrecoverable.
• FIFOs and semaphores work nicely and do not create race conditions.
• Character device drivers work nicely but take a little more maintenance to set up and program.
• These are my first modules ever written, including the rt one for the conference.
• A profiler would be very useful for comparing performance instead of graphs and text.
• I will soon be writing an RT module to read a synchro device every 12 milliseconds to try out the determinism of RTAI.
• 1000 nanoseconds = 1 microsecond
• 1000 microseconds = 1 millisecond
• 1 millisecond = 1,000,000 nanoseconds
Future work
• There are very few RTAI examples or code samples available (findable by Google, anyway).
• The matrix multiply, even at 50 thousand iterations, is not CPU-intensive enough to prove or disprove the advantages of RTAI.
• Need to ask the physicists for some of their algorithms to crunch through the system; at the conference, it was the physicists who showed interest in RTAI.
• Plan to use RTAI for its intended purpose of being deterministic.
• Write up these results for a paper.