Solaris Scheduling. Bongio Jeremy Wenjin Hu. Overview. Table Driven Loadable class module Thread-level scheduling The Solaris kernel may be seen as a bundle of kernel threads A kernel thread is the entity that is scheduled by the kernel
Solaris Scheduling Bongio Jeremy Wenjin Hu
Overview • Table Driven • Loadable class module • Thread-level scheduling • The Solaris kernel may be seen as a bundle of kernel threads • A kernel thread is the entity that is scheduled by the kernel • If no lightweight process is attached, it is also known as a system thread • Kernel preemptable
Class and priority
• Interrupt: global priority 100-109 (160-169 when the RT class is loaded); user priority 0-9; not really a scheduling class
• Real Time (RT): global priority 100-159; user priority 0-59
• SYS: global priority 60-99; user priority 0-39
• TimeShare (TS): global priority 0-59; user priority 0-59
• Interactive (IA): shares the TS dispatch table
Priority Classes • Global Priority Scheme and Scheduling Classes (1)
Class and priority (cont'd)
• Two-level priority
  • Systemwide-relative priority (global priority): NOT tunable (global range)
  • Class-relative priority (class priority): tuned by the kernel/dispatcher (adjustment range)
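The two-level scheme can be illustrated with a minimal sketch that places a class-relative priority into the global range, using the ranges from the "Class and priority" slide. The names `sched_class_t`, `class_range`, and `global_priority` are hypothetical, and the plain addition is a simplification: the real kernel goes through each class's dispatch table.

```c
#include <assert.h>

/* Scheduling classes named on the slides (enum is hypothetical). */
typedef enum { CLASS_TS, CLASS_IA, CLASS_SYS, CLASS_RT } sched_class_t;

/* Base of each class's global range and number of class-relative levels,
 * taken from the "Class and priority" slide. */
static const struct { int base; int levels; } class_range[] = {
    [CLASS_TS]  = {   0, 60 },
    [CLASS_IA]  = {   0, 60 },  /* IA shares the TS range and table */
    [CLASS_SYS] = {  60, 40 },
    [CLASS_RT]  = { 100, 60 },
};

/* Hypothetical helper: global priority for a class-relative priority,
 * or -1 if the class-relative priority is out of range. */
int global_priority(sched_class_t cls, int class_pri)
{
    if (class_pri < 0 || class_pri >= class_range[cls].levels)
        return -1;
    return class_range[cls].base + class_pri;
}
```

For example, RT priority 59 lands at global priority 159, while TS priority 59 lands at global priority 59, below every SYS and RT thread.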
/uts/common/sys/class.h
typedef struct sclass {
    char *cl_name;              /* class name */
    /* class specific initialization function */
    pri_t (*cl_init)(id_t, int, classfuncs_t **);
    /* scheduling-class-dependent (class_ops & thread_ops) */
    /* thread can enter the class */
    classfuncs_t *cl_funcs;     /* pointer to classfuncs structure */
    /* kernel lock for synchronized access to the class structure */
    krwlock_t *cl_lock;         /* class structure read/write lock */
    int cl_count;               /* # of threads trying to load class */
} sclass_t;
SYS
• Critical resource
• Preemptible
• NOT time sliced
• Priorities defined in a simple array
• NOT loadable
• In the TS framework
• TS/IA threads may be temporarily raised to a SYS priority
/uts/common/disp/ts_dptbl.c
#define TSGPUP0 0    /* Global priority for TS user priority 0 */
#define TSGPKP0 60   /* Global priority for TS kernel priority 0 */

/*
 * array of global priorities used by ts procs sleeping or
 * running in kernel mode after sleep
 */

pri_t config_ts_kmdpris[] = {
    TSGPKP0,    TSGPKP0+1,  TSGPKP0+2,  TSGPKP0+3,
    TSGPKP0+4,  TSGPKP0+5,  TSGPKP0+6,  TSGPKP0+7,
    TSGPKP0+8,  TSGPKP0+9,  TSGPKP0+10, TSGPKP0+11,
    TSGPKP0+12, TSGPKP0+13, TSGPKP0+14, TSGPKP0+15,
    TSGPKP0+16, TSGPKP0+17, TSGPKP0+18, TSGPKP0+19,
    TSGPKP0+20, TSGPKP0+21, TSGPKP0+22, TSGPKP0+23,
    TSGPKP0+24, TSGPKP0+25, TSGPKP0+26, TSGPKP0+27,
    TSGPKP0+28, TSGPKP0+29, TSGPKP0+30, TSGPKP0+31,
    TSGPKP0+32, TSGPKP0+33, TSGPKP0+34, TSGPKP0+35,
    TSGPKP0+36, TSGPKP0+37, TSGPKP0+38, TSGPKP0+39
};
Realtime
• Can preempt SYS threads
• Memory locking
• Runs at a fixed priority, which can be configured (the kernel does not change it)
/src/uts/common/sys/rt.h
typedef struct rtdpent {
    pri_t rt_globpri;   /* global (class independent) priority */
    int   rt_quantum;   /* default quantum associated with this level */
} rtdpent_t;
/uts/common/disp/rt_dptbl.c
#define RTGPPRIO0 100   /* Global priority for RT priority 0 */

rtdpent_t config_rt_dptbl[] = {

/*  prilevel        Time quantum */

    RTGPPRIO0,      100,
    RTGPPRIO0+1,    100,
    RTGPPRIO0+2,    100,
    ...
    RTGPPRIO0+18,   80,
    ...
    RTGPPRIO0+23,   60,
    ...
    RTGPPRIO0+28,   60,
    ...
    RTGPPRIO0+33,   40,
    ...
    RTGPPRIO0+38,   40,
    ...
    RTGPPRIO0+43,   20,
    ...
    RTGPPRIO0+48,   20,
    ...
    RTGPPRIO0+58,   10,
    RTGPPRIO0+59,   10
};
Dispatcher Table
• Contains default values for priority and priority readjustment
• The user priority indexes the table to give the global priority
• The same table entry gives the time quantum for that level
• Indicates how to adjust the priority
• TS example follows
/src/uts/common/sys/ts.h
typedef struct tsdpent {
    pri_t ts_globpri;   /* global (class independent) priority */
    int   ts_quantum;   /* time quantum given to procs at this level */
    /* favors IA or CPU-bound */
    /* parameters to calculate the class-relative priority */
    /* deduct 10 from current globpri value (decreasing the priority) */
    pri_t ts_tqexp;     /* ts_umdpri assigned when proc at this level */
                        /* exceeds its time quantum */
    pri_t ts_slpret;    /* ts_umdpri assigned when proc at this level */
                        /* returns to user mode after sleeping */
    /* Control I/O bound decay */
    /* threshold */
    short ts_maxwait;   /* bumped to ts_lwait if more than ts_maxwait */
                        /* secs elapse before receiving full quantum */
    short ts_lwait;     /* ts_umdpri assigned if ts_dispwait exceeds */
                        /* ts_maxwait */
    /* Controls thread starvation */
} tsdpent_t;
/uts/common/disp/ts_dptbl.c
#define TSGPUP0 0    /* Global priority for TS user priority 0 */
#define TSGPKP0 60   /* Global priority for TS kernel priority 0 */
...
tsdpent_t config_ts_dptbl[] = {

/*  glbpri      qntm  tqexp  slprt  mxwt   lwt */

    TSGPUP0+0,   20,    0,    50,     0,    50,
    ...
    TSGPUP0+22,  12,   12,    52,     0,    52,
    ...
    TSGPUP0+34,   8,   24,    53,     0,    53,
    ...
    TSGPUP0+47,   4,   37,    58,     0,    58,
    ...
    TSGPUP0+57,   4,   47,    58,     0,    59,
    TSGPUP0+58,   4,   48,    58,     0,    59,
    TSGPUP0+59,   2,   49,    59, 32000,    59
};
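How the table drives priority changes can be sketched with a toy version. The struct below mirrors the fields of tsdpent_t, but the four-row table and every `toy_`/`after_` name are hypothetical; the values are NOT those of config_ts_dptbl and were chosen only to keep the example short.

```c
#include <assert.h>

/* Same roles as the tsdpent_t fields; shortened names for the sketch. */
typedef struct {
    int globpri;   /* global priority */
    int quantum;   /* time quantum in ticks */
    int tqexp;     /* new level after using the whole quantum */
    int slpret;    /* new level after returning from sleep */
    int maxwait;   /* seconds of waiting before the starvation rule applies */
    int lwait;     /* new level once the wait exceeds maxwait */
} toy_dpent_t;

/* Hypothetical 4-level table (not the real Solaris values). */
static const toy_dpent_t toy_dptbl[4] = {
    /* globpri quantum tqexp slpret maxwait lwait */
    {  0,      20,     0,    2,     0,      2 },
    {  1,      16,     0,    3,     0,      3 },
    {  2,       8,     1,    3,     0,      3 },
    {  3,       4,     2,    3,     0,      3 },
};

/* CPU-bound behaviour: the thread used its whole quantum, so it drops. */
int after_quantum_expired(int level) { return toy_dptbl[level].tqexp; }

/* I/O-bound behaviour: the thread slept, so it is raised on wakeup. */
int after_sleep_return(int level) { return toy_dptbl[level].slpret; }

/* Starvation control: raise a thread that waited longer than maxwait. */
int after_long_wait(int level, int waited_secs)
{
    if (waited_secs > toy_dptbl[level].maxwait)
        return toy_dptbl[level].lwait;
    return level;
}

/* Quantum granted at a given level. */
int quantum_for(int level) { return toy_dptbl[level].quantum; }
```

In this toy table a thread at level 3 that burns its quantum falls to level 2 and receives a longer quantum (8 ticks instead of 4), which is the trade-off the real table encodes: lower priority, longer slice.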
Thread priority calculation & Dispatcher Algorithm
• The quantum corresponds to the class-relative priority
• Lower priority levels get a longer quantum; higher levels get a shorter one, which favors IA threads
• ts_cpupri indexes the TS dispatch table and is replaced by that entry's ts_tqexp when the quantum expires
• The user-mode priority (ts_umdpri) is then recalculated
• ts_globpri = TSGPUP0 + ts_umdpri (ts_umdpri is used as the table index)
• t_pri = ts_globpri, or t_pri = the lowest SYS priority if TSKPRI is set in ts_flags (the thread is working at a SYS priority)
/uts/common/sys/ts.h
typedef struct tsproc {
    int      ts_timeleft;   /* time remaining in procs quantum */
    /* updated per sec by ts_update() and compared with tsdpent.ts_maxwait */
    uint_t   ts_dispwait;   /* wall clock seconds since start */
                            /* of quantum (not reset upon preemption) */
    pri_t    ts_cpupri;     /* system controlled component of ts_umdpri */
    pri_t    ts_uprilim;    /* user priority limit */
    pri_t    ts_upri;       /* user priority */
    /* adjustment: ts_umdpri = ts_cpupri + ts_upri + ts_boost */
    pri_t    ts_umdpri;     /* user mode priority within ts class */
    pri_t    ts_scpri;      /* remembered priority, for schedctl */
    char     ts_nice;       /* nice value for compatibility */
    char     ts_boost;      /* interactive priority offset */
    /* distinguish between IA and TS */
    uchar_t  ts_flags;      /* flags defined below */
    kthread_t *ts_tp;       /* pointer to thread */
    struct tsproc *ts_next; /* link to next tsproc on list */
    struct tsproc *ts_prev; /* link to previous tsproc on list */
} tsproc_t;
/uts/common/sys/thread.h
typedef struct _kthread {
    ...
    /* set by tsdpent_t.ts_globpri */
    pri_t   t_pri;          /* assigned thread priority */
    /* inherited from parent thread, initial thread is LWP/kthread */
    pri_t   t_epri;         /* inherited thread priority */
    ...
    /* scheduling-class-specific structure linked to every kthread */
    struct thread_ops *t_clfuncs;   /* scheduling class ops vector */
    void    *t_cldata;      /* per scheduling class specific data */
    ...
    /* the SYS thread priority for this thread */
    /* raise a TS/IA thread to the SYS class if a critical resource is obtained */
    uint_t  t_kpri_req;     /* kernel priority required */
    ...
} kthread_t;
/uts/common/disp/ts.c
#define TS_NEWUMDPRI(tspp) \
{ \
    pri_t pri; \
    pri = (tspp)->ts_cpupri + (tspp)->ts_upri + (tspp)->ts_boost; \
    if (pri > ts_maxumdpri) \
        (tspp)->ts_umdpri = ts_maxumdpri; \
    else if (pri < 0) \
        (tspp)->ts_umdpri = 0; \
    else \
        (tspp)->ts_umdpri = pri; \
    ASSERT((tspp)->ts_umdpri >= 0 && (tspp)->ts_umdpri <= ts_maxumdpri); \
}
ts_tick(kthread_t *t)
    ...
    tspp->ts_cpupri = ts_dptbl[tspp->ts_cpupri].ts_tqexp;
    TS_NEWUMDPRI(tspp);
    ...
    new_pri = ts_dptbl[tspp->ts_umdpri].ts_globpri;
    ...
    if ((t->t_schedflag & TS_LOAD)
    ...
    tspp->ts_timeleft =
        ts_dptbl[tspp->ts_cpupri].ts_quantum;
TS Class vs IA Class
• Both use the same dispatch table
• IA is meant for windowing systems (as observed in last semester's project, the active window gets more chances to run)
• IA shares TS's per-thread tsproc_t data structure; the two are told apart by ts_flags
TS Class vs IA Class (cont'd)
• ts_boost is +10 for IA, which cancels the effect of ts_tqexp
• ts_boost is 0 for TS
• IA uses setfrontdq() to get scheduled as soon as possible
• TS uses:
  • setbackdq() to maintain a balance in queue depth across processors
  • setfrontdq() for threads that have been waiting for a while
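The effect of ts_boost can be checked with a function form of the ts_umdpri calculation (sum of ts_cpupri, ts_upri, and ts_boost, then clamped). This is a sketch: `toy_new_umdpri` and `TOY_MAXUMDPRI` are hypothetical names, and the limit of 59 is an assumption based on the 60-level TS table.

```c
#include <assert.h>

/* Assumed upper limit of the TS user-mode priority (60-row table). */
#define TOY_MAXUMDPRI 59

/* Sum the three components and clamp the result to [0, TOY_MAXUMDPRI]. */
int toy_new_umdpri(int cpupri, int upri, int boost)
{
    int pri = cpupri + upri + boost;

    if (pri > TOY_MAXUMDPRI)
        return TOY_MAXUMDPRI;
    if (pri < 0)
        return 0;
    return pri;
}
```

The table row for TSGPUP0+22 has ts_tqexp = 12. A TS thread (boost 0) whose quantum expires at level 22 therefore ends up at `toy_new_umdpri(12, 0, 0)`, which is 12, while an IA thread (boost +10) ends up at `toy_new_umdpri(12, 0, 10)`, which is 22 again: the boost exactly offsets the 10-level penalty.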
/uts/common/sys/ts.h
/*
 * time-sharing class specific thread structure
 */
typedef struct tsproc {
    int      ts_timeleft;   /* time remaining in procs quantum */
    uint_t   ts_dispwait;   /* wall clock seconds since start */
                            /* of quantum (not reset upon preemption) */
    pri_t    ts_cpupri;     /* system controlled component of ts_umdpri */
    pri_t    ts_uprilim;    /* user priority limit */
    pri_t    ts_upri;       /* user priority */
    pri_t    ts_umdpri;     /* user mode priority within ts class */
    pri_t    ts_scpri;      /* remembered priority, for schedctl */
    char     ts_nice;       /* nice value for compatibility */
    char     ts_boost;      /* interactive priority offset */
    uchar_t  ts_flags;      /* flags defined below */
    kthread_t *ts_tp;       /* pointer to thread */
    struct tsproc *ts_next; /* link to next tsproc on list */
    struct tsproc *ts_prev; /* link to previous tsproc on list */
} tsproc_t;


/* flags */
#define TSKPRI     0x01  /* thread at kernel mode priority */
#define TSBACKQ    0x02  /* thread goes to back of disp q when preempted */
#define TSIA       0x04  /* thread is interactive */
#define TSIASET    0x08  /* interactive thread is "on" */
#define TSIANICED  0x10  /* interactive thread has been niced */
#define TSRESTORE  0x20  /* thread was not preempted, due to schedctl */
                         /* restore priority from ts_scpri */
/uts/common/disp/ts.c
/*
 * Check for time slice expiration.  If time slice has expired
 * move thread to priority specified in tsdptbl for time slice expiration
 * and set runrun to cause preemption.
 */

static void
ts_tick(kthread_t *t)
    ...
    if ((tspp->ts_flags & TSKPRI) == 0) {
        if (--tspp->ts_timeleft <= 0) {
            pri_t new_pri;

            /*
             * If we're doing preemption control and trying to
             * avoid preempting this thread, just note that
             * the thread should yield soon and let it keep
             * running (unless it's been a while).
             */
            if (t->t_schedctl && schedctl_get_nopreempt(t)) {
                if (tspp->ts_timeleft > -SC_MAX_TICKS) {
                    DTRACE_SCHED1(schedctl__nopreempt,
                        kthread_t *, t);
                    schedctl_set_yield(t, 1);
                    thread_unlock_nopreempt(t);
                    return;
                }
/uts/common/disp/ts.c

                TNF_PROBE_2(schedctl_failsafe,
                    "schedctl TS ts_tick", /* CSTYLED */,
                    tnf_pid, pid, ttoproc(t)->p_pid,
                    tnf_lwpid, lwpid, t->t_tid);
            }
            tspp->ts_flags &= ~TSRESTORE;
            tspp->ts_cpupri = ts_dptbl[tspp->ts_cpupri].ts_tqexp;
            TS_NEWUMDPRI(tspp);
            tspp->ts_dispwait = 0;
            new_pri = ts_dptbl[tspp->ts_umdpri].ts_globpri;
            ASSERT(new_pri >= 0 && new_pri <= ts_maxglobpri);
            /*
             * When the priority of a thread is changed,
             * it may be necessary to adjust its position
             * on a sleep queue or dispatch queue.
             * The function thread_change_pri accomplishes
             * this.
             */
            if (thread_change_pri(t, new_pri, 0)) {
                if ((t->t_schedflag & TS_LOAD) &&
                    (lwp = t->t_lwp) &&
                    lwp->lwp_state == LWP_USER)
                    t->t_schedflag &= ~TS_DONT_SWAP;
                tspp->ts_timeleft =
                    ts_dptbl[tspp->ts_cpupri].ts_quantum;
            } else {
                tspp->ts_flags |= TSBACKQ;
                cpu_surrender(t);
            }
            TRACE_2(TR_FAC_DISP, TR_TICK,
                "tick:tid %p old pri %d", t, oldpri);
        } else if (t->t_state == TS_ONPROC &&
            t->t_pri < t->t_disp_queue->disp_maxrunpri) {
            tspp->ts_flags |= TSBACKQ;
            cpu_surrender(t);
        }
Priority Inheritance
• Prevents priority inversion
• Each thread has two priorities: a global priority and an inherited priority. The inherited priority is normally zero unless the thread is holding a resource that is required by a higher-priority thread.
• When a thread blocks on a resource, it attempts to "will" (pass on) its priority to all threads that are directly or indirectly blocking it. The pi_willto() function checks each thread that is holding the resource or that is blocking a thread in the synchronization chain. When it finds threads with a lower priority, those threads inherit the priority of the blocked thread. It stops traversing the synchronization chain when it reaches an object that is not blocked or has a higher priority than the willing thread.
• Open question: what if a thread maliciously grabs and holds a resource?
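The "willing" of priority along a blocking chain can be sketched as follows. This is a simplified model under stated assumptions, not the kernel's pi_willto(): `toy_thread_t`, `effective_pri`, and `toy_willto` are hypothetical names, each thread waits on at most one owner, and locking is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Toy thread: assigned priority, inherited priority, and the owner of
 * the resource this thread is blocked on (NULL if it is not blocked). */
typedef struct toy_thread {
    int pri;                        /* assigned (global) priority */
    int epri;                       /* inherited priority, 0 when none */
    struct toy_thread *blocked_on;  /* thread holding what we wait for */
} toy_thread_t;

/* A thread runs at the higher of its assigned and inherited priority. */
int effective_pri(const toy_thread_t *t)
{
    return t->epri > t->pri ? t->epri : t->pri;
}

/* Pass the waiter's priority to every lower-priority thread that blocks
 * it, directly or indirectly. Stop at the end of the chain or at a
 * thread that already runs at least as high as the waiter. */
void toy_willto(toy_thread_t *waiter)
{
    int pri = effective_pri(waiter);
    toy_thread_t *owner;

    for (owner = waiter->blocked_on; owner != NULL;
        owner = owner->blocked_on) {
        if (effective_pri(owner) >= pri)
            break;
        owner->epri = pri;
    }
}
```

With a chain high (50) -> mid (10) -> low (5), calling `toy_willto(&high)` sets the inherited priority of both mid and low to 50, so neither can be starved by a medium-priority thread while it holds what high needs.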
Mechanism / Frame
• Dispatcher: manages queues of runnable threads
  • runs the highest-priority thread
  • recalculates the thread priority
• Multiple dispatch queues
  • one for each processor
  • one systemwide kernel preempt queue for unbound RT threads
  • one kernel preempt queue for each processor set, for RT threads
• Doubly linked lists
• Clock-driven
/src/uts/common/sys/disp.h
typedef struct dispq {
    kthread_t *dq_first;    /* first thread on queue or NULL */
    kthread_t *dq_last;     /* last thread on queue or NULL */
    int       dq_sruncnt;   /* number of loaded, runnable */
                            /* threads on queue */
} dispq_t;
/uts/common/sys/disp.h
/*
 * Dispatch queue structure.
 */
typedef struct _disp {
    disp_lock_t disp_lock;      /* protects dispatching fields */
    pri_t       disp_npri;      /* # of priority levels in queue */
    dispq_t     *disp_q;        /* the dispatch queue */
    dispq_t     *disp_q_limit;  /* ptr past end of dispatch queue */
    ulong_t     *disp_qactmap;  /* bitmap of active dispatch queues */

    /*
     * Priorities:
     *  disp_maxrunpri is the maximum run priority of runnable threads
     *  on this queue.  It is -1 if nothing is runnable.
     *
     *  disp_max_unbound_pri is the maximum run priority of threads on
     *  this dispatch queue but runnable by any CPU.  This may be left
     *  artificially high, then corrected when some CPU tries to take
     *  an unbound thread.  It is -1 if nothing is runnable.
     */
    pri_t       disp_maxrunpri;         /* maximum run priority */
    pri_t       disp_max_unbound_pri;   /* max pri of unbound threads */

    volatile int disp_nrunnable;    /* runnable threads in cpu dispq */

    struct cpu  *disp_cpu;          /* cpu owning this queue or NULL */
} disp_t;
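The roles of disp_qactmap and disp_maxrunpri can be illustrated with a small model: one bit per priority level marks a non-empty queue, and the cached maximum is -1 when nothing is runnable. All `toy_`/`TOY_` names are hypothetical, the 170 levels come from the global range 0-169 on the slides, and the linear scan is a simplification of whatever the kernel actually does.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define TOY_NPRI  170   /* global priorities 0..169 */
#define TOY_BITS  (sizeof (unsigned long) * CHAR_BIT)
#define TOY_WORDS ((TOY_NPRI + TOY_BITS - 1) / TOY_BITS)

typedef struct {
    unsigned long qactmap[TOY_WORDS];   /* bit set = queue has threads */
    int maxrunpri;                      /* highest active level, or -1 */
} toy_disp_t;

void toy_disp_init(toy_disp_t *d)
{
    memset(d->qactmap, 0, sizeof (d->qactmap));
    d->maxrunpri = -1;
}

/* Scan downward for the highest priority whose bit is set. */
int toy_highest_active(const toy_disp_t *d)
{
    int pri;

    for (pri = TOY_NPRI - 1; pri >= 0; pri--) {
        if (d->qactmap[pri / TOY_BITS] & (1UL << (pri % TOY_BITS)))
            return pri;
    }
    return -1;
}

/* A thread was queued at this priority level. */
void toy_mark_active(toy_disp_t *d, int pri)
{
    d->qactmap[pri / TOY_BITS] |= 1UL << (pri % TOY_BITS);
    if (pri > d->maxrunpri)
        d->maxrunpri = pri;
}

/* The queue at this priority level became empty. */
void toy_mark_empty(toy_disp_t *d, int pri)
{
    d->qactmap[pri / TOY_BITS] &= ~(1UL << (pri % TOY_BITS));
    if (pri == d->maxrunpri)
        d->maxrunpri = toy_highest_active(d);
}
```

Keeping the maximum cached means the dispatcher can pick the next queue by reading one field, and only rescans the bitmap when the top queue drains.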
Dispatcher
• ts_tick() recalculates the priority of the running thread
• ts_update() recalculates the priority of threads on a dispatch queue or sleep queue
• setfrontdq() and setbackdq() can cause preempt()
• ts_tick() can cause cpu_surrender()
• ts_tick() or ts_yield() leads to swtch()
• swtch() calls disp()
• disp() looks for the highest-priority thread to run
  • First it searches the kernel preempt queue
  • Then the queue of the current CPU
  • Then the dispatch queues of other CPUs
  • Otherwise it runs the idle thread
/uts/common/disp/disp.c
    while ((pri = kpq->disp_maxrunpri) >= 0 &&
        pri >= dp->disp_maxrunpri &&
        (cpup->cpu_flags & CPU_OFFLINE) == 0 &&
        /* fetch the best-priority thread from the kernel preempt queue */
        (tp = disp_getbest(kpq)) != NULL) {
        ...
    }
    ...
    pri = dp->disp_maxrunpri;
    ...
    if (pri == -1) {
        if (!(cpup->cpu_flags & CPU_OFFLINE)) {
            /* find a processor with the highest-priority thread */
            if ((tp = disp_getwork(cpup)) == NULL) {
                tp = cpup->cpu_idle_thread;
                ...
            }
        } else {
            ...
            tp = cpup->cpu_idle_thread;
            ...
        }
        ...
    dq = &dp->disp_q[pri];
    tp = dq->dq_first;
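The search order in the excerpt can be condensed into a decision function. This is a sketch of the order only, assuming each queue is summarized by its maximum runnable priority (-1 when empty); `toy_source_t` and `toy_pick_source` are hypothetical names, and the real disp() also handles locking and retries that are left out here.

```c
#include <assert.h>

/* Where the next thread comes from (hypothetical enum). */
typedef enum {
    FROM_KPQ,        /* kernel preempt queue */
    FROM_LOCAL,      /* this CPU's dispatch queue */
    FROM_OTHER_CPU,  /* taken from another CPU's queue */
    FROM_IDLE        /* nothing runnable: idle thread */
} toy_source_t;

/* Each *_maxpri argument is the highest runnable priority on that
 * queue, or -1 if it is empty. other_maxpri stands in for what
 * disp_getwork() could find on other CPUs. */
toy_source_t toy_pick_source(int kpq_maxpri, int local_maxpri,
    int other_maxpri, int cpu_offline)
{
    /* 1. Kernel preempt queue wins if it is at least as high as ours. */
    if (kpq_maxpri >= 0 && kpq_maxpri >= local_maxpri && !cpu_offline)
        return FROM_KPQ;

    /* 2. Otherwise run the best thread on our own queue. */
    if (local_maxpri >= 0)
        return FROM_LOCAL;

    /* 3. Our queue is empty: an online CPU looks at other CPUs. */
    if (!cpu_offline && other_maxpri >= 0)
        return FROM_OTHER_CPU;

    /* 4. Nothing to run. */
    return FROM_IDLE;
}
```

For instance, an RT thread at priority 120 on the kernel preempt queue is chosen ahead of a TS thread at priority 59 on the local queue, and an offline CPU with an empty queue goes straight to its idle thread.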
Scheduler activation
• Preemption control
• Management of the LWP-to-user-thread problem
• Keeping the correct number of LWPs available for a threaded process
LWP
• Lightweight process
• Execute a call to a function that is part of another process's address space, pass the function arguments, and get a return value as if the function were part of the calling process
• A lightweight process can be considered the swappable portion of a kernel thread
• Lightweight processes can be thought of as "virtual CPUs" that perform the processing for applications. Application threads are attached to available lightweight processes, which are attached to a kernel thread, which is scheduled on the system's CPU dispatch queue.
• Communication between the kernel and the user-level threads library
• Based on shared memory pages
• System call lwp_schedctl()
• Primordial thread t0
• sc_init() establishes the shared memory pages and the upcall door
User thread vs Kernel thread
• A kernel thread is the entity that is scheduled by the kernel. If no lightweight process is attached, it is also known as a system thread. It uses kernel text and global data, but has its own kernel stack, as well as a data structure to hold scheduling and synchronization information.
• Kernel threads can be independently scheduled on CPUs. Context switching between kernel threads is very fast because memory mappings do not have to be flushed.
• User threads are scheduled by a scheduler in libthread. This scheduler implements priorities but not time slicing. If time slicing is desired, it must be programmed in.
User thread vs Kernel thread (cont'd)
• User threads use the thread library's own scheduler
• Solaris currently ships with two threads libraries:
  • libthread.so, which supports the Solaris threads interfaces; user threads are created by a call to thr_create(3THR) (Solaris threads)
  • libpthread.so, which provides the POSIX (Portable Operating System Interface for Unix) threads APIs; user threads are created by a call to pthread_create(3THR) (POSIX threads)
User thread vs Kernel thread • The Multithreaded Process Model (1)
Scheduler activation
• schedctl_init() initializes scheduler activation and turns on preemption control
• sc_init() establishes a door for kernel-to-user upcalls
• schedctl_block() determines whether this LWP is the last one in the process when the LWP is about to sleep
Preemption control
• In ts_tick(), if ts_timeleft reaches 0, the kthread is given a few extra ticks beyond its time quantum to free its critical resources
• The thread gets a little more time to run, but scheduler activation is not allowed to keep it running indefinitely
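The bounded grace period can be sketched as a per-tick decision that follows the shape of the ts_tick() excerpt: decrement the remaining time and, if the thread asked not to be preempted, let it continue only while the overrun is smaller than a fixed limit. All `toy_`/`TOY_` names are hypothetical, and the limit of 2 ticks is an assumption, since the value of SC_MAX_TICKS does not appear on the slides.

```c
#include <assert.h>

/* Assumed overrun limit; the real SC_MAX_TICKS value is not shown. */
#define TOY_SC_MAX_TICKS 2

typedef enum {
    KEEP_RUNNING,    /* quantum not yet used up */
    ASK_TO_YIELD,    /* quantum used up, thread gets a grace tick */
    EXPIRE_QUANTUM   /* quantum used up, priority is recalculated */
} toy_tick_result_t;

typedef struct {
    int timeleft;         /* ticks remaining in the quantum */
    int nopreempt;        /* thread requested "do not preempt" */
    int yield_requested;  /* set when the kernel asks it to yield soon */
} toy_tsproc_t;

/* One clock tick for the running thread. */
toy_tick_result_t toy_tick(toy_tsproc_t *p)
{
    if (--p->timeleft > 0)
        return KEEP_RUNNING;

    /* Quantum exhausted: honor the request only for a bounded time. */
    if (p->nopreempt && p->timeleft > -TOY_SC_MAX_TICKS) {
        p->yield_requested = 1;
        return ASK_TO_YIELD;
    }
    return EXPIRE_QUANTUM;
}
```

With this limit, a thread holding a lock with one tick left is asked to yield on the next two ticks and has its quantum expired on the third, so the request can delay preemption but never prevent it.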
Citation • 1. Solaris Internals • 2. http://www.princeton.edu/~psg/unix/Solaris/troubleshoot/process.html
Scheduler activation
• Gives the kthread a few extra clock ticks beyond its time quantum to complete its task and free the lock
• Activated when the thread holds a mutex lock
• Only one extra ts_tick() is given
Processor Set
• RT threads in the same processor set
• Interrupts disabled in the RT set
Kernel Service
• System Calls: The kernel executes requests submitted by processes via system calls. The system call interface invokes a special trap instruction.
• Hardware Exceptions: The kernel notifies a process that attempts certain illegal operations, such as dividing by zero or overflowing the user stack.
• Hardware Interrupts: Devices use interrupts to notify the kernel of status changes (such as I/O completions).
• Resource Management: The kernel manages resources via special processes such as the pagedaemon.
Hash table
#define TS_LISTS 16     /* number of lists, must be power of 2 */
...
#define TS_LIST_HASH(tp) (((uintptr_t)(tp) >> 9) & (TS_LISTS - 1))
...
static kmutex_t ts_dptblock;                /* protects time sharing dispatch table */
static kmutex_t ts_list_lock[TS_LISTS];     /* protects tsproc lists */
static tsproc_t ts_plisthead[TS_LISTS];     /* dummy tsproc at head of lists */
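The hash macro can be tried in user space. The two defines below are copied from the slide; `toy_list_index` is a hypothetical wrapper added for the example. The macro drops the low 9 bits of the thread pointer, presumably because they vary little between allocated thread structures, and masks the result down to one of the 16 lists.

```c
#include <assert.h>
#include <stdint.h>

#define TS_LISTS 16     /* number of lists, must be power of 2 */
#define TS_LIST_HASH(tp) (((uintptr_t)(tp) >> 9) & (TS_LISTS - 1))

/* Hypothetical wrapper: which tsproc list a thread pointer maps to. */
unsigned toy_list_index(const void *tp)
{
    return (unsigned)TS_LIST_HASH(tp);
}
```

Addresses 0x200 apart fall on consecutive lists and wrap around after 16: 0x200 maps to list 1, 0x1e00 to list 15, and 0x2000 back to list 0. Because TS_LISTS is a power of two, the mask `TS_LISTS - 1` is equivalent to a modulo.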
Key feature • Table driven • Priority inversion—priority inheritance • User mode thread—Kernel mode thread • Kernel preemptable • Scheduler activation • Processor set