schedtool
Copyright (C) 2002-2006 Freek
Released under GPL, version 2 (see LICENSE)
Use at your own risk.
Inspired by setbatch (C) 2002 Ingo Molnar
Suggestions are welcome.

CONTENT:
--------

-About
-Usage / description of schedtool
-A complex example
-Static Priority
-Policies reviewed
-Final words / contact
-Thanks
-Appendix A: An introduction to Multi-Level-Feedback-Queue-scheduling



ABOUT:
------

schedtool was born because there was no tool to change or query
all CPU-scheduling policies under Linux in one handy command.
Support for CPU-affinity has also been added and, most recently,
(re-)nicing of processes.
Thus, schedtool is the definitive interface to Linux's scheduler.

It can be used to avoid skipping in A/V-applications, to lock
processes onto certain CPUs on SMP/NUMA systems (which may be
beneficial for networking or benchmarks), or to adjust the nice-levels
of less important jobs to maintain a high level of interactive
responsiveness under high load.

All output, even errors, goes to STDOUT to ease piping.

If you don't know about scheduling policies, you probably don't want to
use this program - or learn about them first and read
"man sched_setscheduler".

Certain modes (as of this writing: SCHED_IDLEPRIO and SCHED_ISO) need a
patched kernel. See INSTALL for details.



USAGE:
------

There are 3 operation modes: query, set and execute new process.

QUERY PROCESS(ES):

#> schedtool <LIST_OF_PIDs>

This will print all information it can obtain for the processes with
<PIDs>.
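
To get a feeling for what kind of information a query involves, here is a
minimal sketch in Python. It is NOT how schedtool itself is implemented and
its output format is made up; it only assumes a Linux system and uses the
scheduler calls exposed by Python's os module. The helper name query() and
the dictionary keys are hypothetical.

```python
import os

# Policies Python's os module knows about on Linux.
# The kernel API calls the normal policy SCHED_OTHER.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_NORMAL",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
    os.SCHED_BATCH: "SCHED_BATCH",
}


def query(pid=0):
    """Collect policy, static priority, nice-level and affinity (pid 0 = self)."""
    policy = os.sched_getscheduler(pid)
    return {
        "policy": POLICY_NAMES.get(policy, f"unknown ({policy})"),
        "static_prio": os.sched_getparam(pid).sched_priority,
        "nice": os.getpriority(os.PRIO_PROCESS, pid),
        "affinity": sorted(os.sched_getaffinity(pid)),
    }


if __name__ == "__main__":
    print(query())
```

Passing a PID other than 0 queries another process, subject to the usual
permission checks.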


SET PROCESS(ES):

I) scheduling policy (detailed discussion in "POLICY OVERVIEW")

#> schedtool -<MODE> <LIST_OF_PIDs>

where <MODE> is one of:

N or 0: for SCHED_NORMAL
F or 1: for SCHED_FIFO
R or 2: for SCHED_RR
B or 3: for SCHED_BATCH
I or 4: for SCHED_ISO

example:
#> schedtool -B <PIDs>


II) static priority

The static priority (-p) is mandatory for SCHED_FIFO and SCHED_RR.
STATIC_PRIO is a number from 1-99; higher values mean higher priority in
that scheduling class (relative to other processes in the same class).

example:
#> schedtool -R -p 20 <PIDs>

III) CPU-affinity

example:
#> schedtool -a 0x3 <PIDs>

IV) nice-level

example:
#> schedtool -n 10 <PIDs>


Of course you can combine policy with affinity and nice in one call.


EXECUTE A NEW PROCESS:

example:
#> schedtool [SCHED_PARAMETERS_LIKE_ABOVE] -e command -arg1 -arg2 file

This will execute "command -arg1 -arg2 file" just as if you had typed
exactly this at the prompt.



CPU-affinity:

To give PIDs/a command a certain CPU-affinity, use the -a switch.
The value is used as a simple bitmask: a bit set to 1 means the
PID may run on that CPU, a bit unset (0) means it MUST NOT.

The following picture uses only 16 bits for example purposes.
The resulting value is the bitwise OR of the single values for each CPU.
CPU0 (the first CPU in your system) is denoted by the least significant bit
(here, the one on the right side).

CPU 0 ------------,
CPU 1 -----------,|
...              ||
mask             VV      means                       value  == dec
--------------------------------------------------------------------------
0000 0000 0000 0001  ->  run only on CPU0        ->  0x1    == 1
0000 0000 0000 1001  ->  run on CPU0 AND CPU3    ->  0x9    == 9
0000 0000 0000 1111  ->  run on CPU0-CPU3        ->  0xF    == 15
1111 1111 1111 1111  ->  run on CPU0-CPU15       ->  0xFFFF == 2^16-1

To set back to the default (PID may run on all CPUs), use the mask
0xFFFFFFFF (the kernel will automatically reduce it to the max # of cpus).

As a short mnemonic rule, each 'F' denotes a set of 4 CPUs
(0xF: all 4 CPUs, 0xFF: all 8 CPUs, and so on ...)


Since version 1.1.1 a new list mode is supported, allowing you to
specify the target CPUs without doing bit-juggling. To separate the
different CPUs, use a ',':

Run on CPU0 and CPU1:
#> schedtool -a 0,1 <PIDs>
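
If you want to double-check a mask by hand, the relation between a CPU list
and its bitmask can be sketched in a few lines of Python. The function names
cpus_to_mask() and mask_to_cpus() are made up for this illustration and are
not part of schedtool.

```python
def cpus_to_mask(cpus):
    """Bitwise-OR the single-CPU values: CPU n contributes the bit 1 << n."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers start at 0")
        mask |= 1 << cpu
    return mask


def mask_to_cpus(mask):
    """List the CPU numbers whose bit is set, least significant bit first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    # A list such as "0,1" is split on ',' and then turned into a mask.
    cpu_list = [int(c) for c in "0,1".split(",")]
    print(hex(cpus_to_mask(cpu_list)))
    print(mask_to_cpus(0x9))
```

Here the list "0,1" gives the mask 0x3, and the mask 0x9 decodes to CPU0 and
CPU3, matching the table above.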



A COMPLEX EXAMPLE:
------------------
#> schedtool -R -p 50 -a 0x2 -e mplayer file.avi

Execute mplayer file.avi with
-SCHED_RR,
-static priority 50,
-affinity 0x2 (run only on CPU1).



ABOUT STATIC PRIORITY:
----------------------
Static priority is something completely different from the nice-level. The
nice-level is added to the dynamic priority, and the higher it gets, the more
the process is "punished"([2]). The static priority, in contrast, is used to
find the next process to run in the current scheduling class: the higher it
is, the more preferred >in general< the process is over others, e.g. when
it becomes ready after a blocking action. It will/may also preempt
another, lower-prioritized process.

STATIC_PRIO can't be assigned to SCHED_NORMAL or SCHED_BATCH. The
code won't prevent this (a warning is printed - think UNIX), but you may get
an error later at the setting-call.

v1.2.4+ supports a probe mode like sched-utils; when given the -r parameter,
it will display each policy's min and max priority.

#> schedtool -r
N: SCHED_NORMAL  : prio_min 0, prio_max 0
F: SCHED_FIFO    : prio_min 1, prio_max 99
R: SCHED_RR      : prio_min 1, prio_max 99
B: SCHED_BATCH   : prio_min 0, prio_max 0
I: SCHED_ISO     : policy not implemented
D: SCHED_IDLEPRIO: policy not implemented
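
The same limits can be asked from the kernel directly. The following is only
a sketch in Python, assuming a Linux system; it covers just the four mainline
policies that Python's os module exposes (not SCHED_ISO or SCHED_IDLEPRIO),
and the helper name probe() is hypothetical.

```python
import os

# (letter, name, kernel constant); SCHED_NORMAL is SCHED_OTHER in the API.
POLICIES = [
    ("N", "SCHED_NORMAL", os.SCHED_OTHER),
    ("F", "SCHED_FIFO", os.SCHED_FIFO),
    ("R", "SCHED_RR", os.SCHED_RR),
    ("B", "SCHED_BATCH", os.SCHED_BATCH),
]


def probe():
    """Return (letter, name, prio_min, prio_max) for each known policy."""
    return [
        (letter, name,
         os.sched_get_priority_min(policy),
         os.sched_get_priority_max(policy))
        for letter, name, policy in POLICIES
    ]


if __name__ == "__main__":
    for letter, name, lo, hi in probe():
        print(f"{letter}: {name:<14}: prio_min {lo}, prio_max {hi}")
```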



POLICY OVERVIEW + WHERE TO USE:
-------------------------------
SCHED_NORMAL
is the standard scheduling policy and good for the average
job with reasonable interaction.


SCHED_FIFO and SCHED_RR
are for real-time constraints.
Don't use them for normal stuff, because they've got extremely short
time-slices, which increases the context-switching overhead, and they won't
let other processes run until they get blocked by a system-call like
read() or actively free themselves from the CPU via the system-call
sched_yield(2).


SCHED_BATCH
is recommended for long-running and non-interactive
processes; the timeslice is considerably longer (1.5s I think) -
these processes, though, can be interrupted at almost any time by other
ones to guarantee interactiveness.
Processes won't get any interactive boosts.

Users are encouraged to set their computing jobs to SCHED_BATCH. Or, as
admin of a compute-server, you could set their shells to SCHED_BATCH
via the login-script.
SCHED_BATCH has been included in 2.6.16+ kernels.


SCHED_ISO [patch needed, see INSTALL]
is a new mode, currently only in Con's patches, to mimic the
real-time class for non-root users. To quote Con:

"Any task trying to start as real time that doesn't have authority to do so
will be set to SCHED_ISO. This is a non-expiring scheduler policy designed to
guarantee a timeslice within a reasonable latency while preventing starvation.
Good for gaming, video at the limits of hardware, video capture etc.
It is best set using the schedtool by a normal user trying to start something
as SCHED_RR." [ http://kerneltrap.org/node/view/2159 ]

SCHED_ISO is now somewhat deprecated; newer kernels allow normal users to
use SCHED_RR, albeit only to a limited extent.


SCHED_IDLEPRIO [patch needed, see INSTALL]
was formerly called SCHED_BATCH in the -ck patchset; the
-ck SCHED_BATCH has nothing to do with the mainline SCHED_BATCH!
It is a policy where the process does not get any interactive boost
(through sleeping etc) and only gets otherwise idle CPU time.

For more information you can read the file SCHED_DESIGN as a good overview,
but be warned that *some* things may be outdated by the new O(1)-patches.
Then proceed to the man-page for sched_setscheduler(2) - it gives a very good
overview and is _highly_ recommended.



FINAL WORDS / CONTACT:
----------------------
If you feel you are able to make this software better or you can report
some numbers with the different scheduling policies, please contact me.
Feedback is appreciated.
Please use freshmeat.net's "contact author"-feature to do so.



THANKS:
-------
Thanks fly out to (in no particular order)

o Ingo Molnar
o Con Kolivas for suggesting the -e switch, submitting patch for SCHED_ISO
o Samuli Kärkkäinen, the quality-verification-engineer
o my girlfriend and supporting friends



- -- - -- -

[2]:
A bit simplified - it's not all that easy :-) Go on to Appendix A for
an example of how scheduling is performed in Solaris.

[3]: (see also [4])
Nice level and dynamic priority are somewhat "strange": sometimes, higher
values mean higher priority (to be put on CPU when process ready); sometimes,
lower values mean higher priority.

At the moment, I can confirm the following for my system:
- The nice-level is SUBTRACTED from the dynamic priority, thus giving a
process nice -10 means INCREASING its priority (valuewise) by 10 points.
- In the end, higher values mean higher priority.
Use "ps -eO pri,nice" and look for yourself.



APPENDIX A: INTRODUCTION TO MULTI-LEVEL-FEEDBACK-QUEUE-SCHEDULING
-----------------------------------------------------------------
This appendix uses information originating from the "System
Programming I"-course at my university; the examples use Solaris
2.X, but I think Linux does it in a >similar< (albeit not that
overcomplicated) way.

Solaris has 60 wait queues for the class TS (TimeSharing); there are
other classes such as system and RealTime as well.
The TS queues are described by a table like this (which is basically a set
of rules):

Level  ts_quantum  ts_tqexp  ts_maxwait  ts_lwait  ts_slpret
    0         200         0           0        50         50
    .           .         .           .         .          .
    .           .         .           .         .          .
    .           .         .           .         .          .
   44          40        34           0        55         55
   45          40        35           0        56         56
    .           .         .           .         .          .
    .           .         .           .         .          .
    .           .         .           .         .          .
   59          20        49       32000        59         59


You can display all these numbers on a Solaris-box using
# dispadmin -c TS -g

Level:
just a queue ID.

ts_quantum:
the maximum timeslice - the maximum time the process
is allowed to run continuously until it's interrupted and another
process is physically put on the CPU.

ts_tqexp:
if the process uses its timeslice entirely, it's put into that
queue [cf. Level].

ts_maxwait:
maximum time for the process to wait in that queue without being
run, in seconds.

ts_lwait:
if a process stays too long in the current queue, it's put
into that queue.

ts_slpret:
queue to put the process in after it was blocked, e.g. in a
syscall.


Let's start an imaginary process and look at what happens:
Start ->
-> queue 59, ts_quantum 20ms -> queue 49, ts_quantum 40ms
-> queue 39, ts_quantum 80ms -> queue 29, ts_quantum 120ms
-> queue 19, ts_quantum 160ms -> queue 9, ts_quantum 200ms

You see how this purely number-crunching process is put into queues that
allow it to use the CPU for more and more time.


Now let the process perform a blocking action:

queue 0, ts_quantum 200ms, after e.g. 100ms blocking call! -->
queue 50, ts_quantum 40ms, after e.g. 20ms blocking call! -->
queue 58, ts_quantum 40ms

Now you see how this process is "punished", or from another point of
view, the scheduler thinks this is an interactive process that computes a
bit and then outputs data, so it's put into a queue whose timeslice is
about as long as the computing time until this output occurs.
So the scheduler knows pretty much about the current state of the
machine and can plan accordingly.
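
The two walks above can be replayed with a tiny table-driven sketch in
Python. It only contains the queue entries that appear in this appendix;
everything else is left out (None). One value is an ASSUMPTION and marked as
such: ts_tqexp for queue 9 is not given anywhere above. The names TS_TABLE,
next_level() and run() are made up for this illustration and have nothing to
do with the real Solaris implementation.

```python
# level: (ts_quantum in ms, ts_tqexp, ts_slpret); None = not given in the text
TS_TABLE = {
    59: (20, 49, 59),
    58: (40, None, None),
    50: (40, None, 58),
    49: (40, 39, None),
    39: (80, 29, None),
    29: (120, 19, None),
    19: (160, 9, None),
    9: (200, 0, None),   # ASSUMPTION: ts_tqexp 0, not stated in the text
    0: (200, 0, 50),
}


def next_level(level, used_full_quantum):
    """Pick ts_tqexp if the timeslice was used up, ts_slpret after blocking."""
    _quantum, tqexp, slpret = TS_TABLE[level]
    target = tqexp if used_full_quantum else slpret
    if target is None:
        raise KeyError(f"no rule given for queue {level}")
    return target


def run(start, events):
    """events: True = timeslice used entirely, False = blocked before that."""
    path = [start]
    for used_full_quantum in events:
        path.append(next_level(path[-1], used_full_quantum))
    return path


if __name__ == "__main__":
    crunch = run(59, [True] * 5)      # number-crunching process
    print(crunch, [TS_TABLE[q][0] for q in crunch])
    print(run(0, [False, False]))     # process doing blocking calls
```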

The dynamic priority is something like an age - the higher[4] it is, the more
likely you are to get a seat :) (the CPU).
There are 4 rules:
-if a process is not run, it ages - the dynamic priority rises.
-if a process is running, the dynamic priority is lowered.
-the process with the (at the moment) highest priority is put onto the CPU.
-processes with lower priority are/can be interrupted by processes with
higher priority.

This guarantees that no process runs for too long and that no others
wait for too long.
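
The four rules can be tried out in a toy simulation. This is a sketch under
heavy assumptions, not a model of any real scheduler: time advances in whole
ticks, priorities change by exactly 1 per tick, and ties are broken
alphabetically. The function name simulate() is hypothetical.

```python
def simulate(priorities, ticks):
    """Return the name of the process that ran in each tick.

    priorities maps a process name to its starting dynamic priority
    (higher value = more important, an assumption of this sketch).
    """
    prio = dict(priorities)
    history = []
    for _ in range(ticks):
        # Rules 3 and 4: the highest priority gets the CPU; since we choose
        # again every tick, a lower-priority process can be interrupted.
        running = max(sorted(prio), key=prio.get)
        history.append(running)
        for name in prio:
            if name == running:
                prio[name] -= 1   # rule 2: running lowers the priority
            else:
                prio[name] += 1   # rule 1: waiting processes age
    return history


if __name__ == "__main__":
    print(simulate({"a": 5, "b": 3}, 6))
```

Even with a large head start for one process, the other one eventually gets
the CPU, which is the point of the aging rule.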

-End Of Documentation

- -- -

[4]:
higher (priority) in the sense of more important to run in the near future;
higher does not automatically mean a higher value in its PCB[5]

[5]:
Process Control Block, some structure where important accounting and
other useful information is stored, usually only used by the kernel.