Commit c65c8b1d2daaeda00983fa9394f018b8fd62fcda.patch

  • avatar
  • darwish
  • Sun Apr 25 18:52:25 CEST 2010
On debugging periodic timers.
  
23032303
23042304 - Let code binary single-stepping be our last debugging resort to make sure we
23052305 understand the algorithms and code we've just written well enough.
2306
2307Debugging periodic timers using statistical data:
2308-------------------------------------------------
2309
2310* A problem appeared while running the scheduler with the APIC timer instead of
2311 the PIT. Using observation, it was clear that the running intervals given
2312 to processes by the APIC timer are twice as long as the ones given by the PIT.
2313 Both timers were supposedly programmed to produce 40 millisecond intervals,
2314 so where's the fault?
2315
2316 Note: we are talking about monotonic PIT and APIC interrupts since those are
2317 the only ones used by the scheduler; we don't use the one-shot variants here.
2318
2319* One of the best ways I found to debug this problem, instead of depending on
2320 shaky heuristics, is to store the occurrence of the timer ticks interrupts in
2321 a statistical manner.
2322
2323 An example of this statistical method would be finding the number of timer
2324 ticks occurrence after a sequence of independently-programmed delay intervals.
2325
2326* How to do so without sacrificing simplicity?
2327
2328 Setup the scheduler periodic timer through the PIT or the APIC:
2329 `pit_monotonic(40)' OR `apic_monotonic(40)'.
2330
2331 Setup a counter representing the number of scheduler timer ticks so far:
2332
2333 .globl ticks_counter
2334 ticks_counter:
2335 .quad 0
2336
2337 .globl timer_handler
2338 timer_handler:
2339 ...
2340 incq ticks_counter
2341 ...
2342
2343 Then, do _independent_ time delays and store the number of timer ticks so far
2344 after each delay. This way, we can have a statistical comparison between the
2345 two timers, the one used for the delay and the one used for the ticks. An
2346 example of this would be:
2347
2348 /* ticks[i]: number of scheduler timer ticks after the `i'th delay
2349 * interval, having in mind that we've setup both the ticks and delay
2350 * timers with the same time priod.
2351 *
2352 * If `1 2 3 4 5 6 ..' is printed, then the two timers are in harmony.
2353 * Having the sequence '3 5 7 9 11 ..' means that the ticks timer runs
2354 * twice as fast as the delay one. */
2355 for (int i = 0; i < 100; i++)
2356 pit_mdelay(52), ticks[i] = ticks_counter;
2357 for (int j = 0; j < 100; j++)
2358 printk("%d ", ticks[j]);
2359
2360 Comparing the statistical results, we found that the PIT in monotonic mode
2361 ticks twice as fast as the PIT programmed in delay mode. It also ticked twice
2362 as fast as the APIC timer in delay mode. This heavily suggests something wrong
2363 in our PIT monotonic-mode setup!
2364
2365 Meanwhile, setting up the scheduler ticks using the APIC timer ran in perfect
2366 harmony with the PIT-programmed delays. This heavily re-enforces above paragr-
2367 aph reasoned speculation: something is wrong in our PIT periodic mode code.
2368
2369* Now that the problem was isolated, the error was a misunderstanding of the
2370 PIT's mode-3 behavior; mode-2 fitted our goals better.
2371
2372 Interestingly, I found this method really helpful for testing periodic timers
2373 code in general. We already test timer delays code by calibrating its produced
2374 delays against wall clock time; this method complement the tests by also
2375 testing the monotonic/periodic interrupts code.