Commit 34429b5b6638e1a787b65bc8909928175fb30c49.patch

  • avatar
  • darwish
  • Wed Jan 13 18:55:11 CET 2010
On light Debugging and COMMON symbols.
CuteNotes.txt
(296 / 0)
  
12131213 segment. They all function using the same registers and simple circuity. Intel
12141214 elaborates: "Note that if the segment registers are not reloaded, execution
12151215 continues using the descriptor attributes loaded during protected mode."
1216
1217
1218Using light debugging techniques:
1219---------------------------------
1220
1221* References:
1222 - Ulrich Drepper, "How to Write Shared Libraries", 2.1 Data Definitions,
1223 http://www.webcitation.org/5mkmxB9Fx
1224 - Ian Lance Taylor, "Linkers part 5", http://www.airs.com/blog/archives/42,
1225 http://www.webcitation.org/5mkmiZ8FN
1226 - Binutils bug report, "ELF linker loads member of archive for common symbol",
1227 http://sources.redhat.com/bugzilla/show_bug.cgi?id=1811
1228 http://www.webcitation.org/5mknnyUe2
1229 - Minor bug-report at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42726
1230
1231* Today while writing page allocator test code, I faced a very strange bug: once
1232 any of the kernel's text, data, or bss sections exceeds something close to 20
1233 KBs, the MPtables code panic()s at different and unpredictable locations. The
1234 failure is something like:
1235
1236 PANIC: MPTables - 1217939456 CPUs found, only 32 supported
1237
1238* To give some context, there's a system global called 'nr_cpus' representing
1239 the number of CPUs in the system. So far, we know the number of CPUs from the
1240 MP tables Conf entries, where each CPU entry represents a CPU in the system.
1241 Thus, finding 8 CPU entries means that the machine has 8 usable cores.
1242
1243 This global var 'nr_cpus' is only increased at one and _only_ one place in
1244 the system, at the MP tables parsing code:
1245
1246 /* We call this for each MP CPU entry in the system */
1247 static void parse_cpu(void *addr)
1248 {
1249 struct mpc_cpu *cpu = addr;
1250
1251 if (!cpu->enabled)
1252 return;
1253
1254 if (nr_cpus >= CPUS_MAX) {
1255 panic("MP - %d CPUs found, only %d supported",
1256 nr_cpus, CPUS_MAX);
1257 }
1258 ...
1259
1260 ++nr_cpus;
1261 }
1262
1263* Inspecting the BIOS MP tables more closely, the MP pointer table is at 0xfbb60
1264 and its checksum and signature are valid. The MP configuration table header was
1265 also valid and checksummed at 0xfba00.
1266
1267 The problem, it seems, is that the conf table entries got somehow corrupted
1268 (thus the wrong number of cpus value), or there's a bug in our parser code.
1269
1270* Unfortunately, and as usual for such problems, inserting printk()s at any
1271 location skews the results randomly, making any printk() useless.
1272
1273 So we try less invasive debugging techniques like writing to a memory area
1274 using a _single_ inlined asm instruction, and inspecting this memory area
1275 using an emulator afterwars. This worked and made the problem repeatable!
1276
1277* First, and to test the parser validity, we choose a memory location that we
1278 know before hand it's going to be zero; I chose physical 0x500. In parse_cpu
1279 we add a very small asm instruction as follows:
1280
1281 /* 8-byte instruction */
1282 asm volatile("incq (0xffffffff80000500);");
1283
1284 if (nr_cpus >= CPUS_MAX) {
1285 panic("MP - %d CPUs found, only %d supported",
1286 nr_cpus, CPUS_MAX);
1287 }
1288
1289 And thankfully this instruction was light enough, so the system paniced
1290 at the same point. Our panic() just calls printk() and hlt the system. So
1291 after the system halts, we move to the qemu console handler and check the
1292 value at address 0x500. Good for us, we found the value there to be 1, so
1293 it seems our parser is behaving correctly.
1294
1295* Now the problem is getting more mysterious. If the parser is behaving
1296 correctly, and the value at address 0x500 is completely valid, how come
1297 'nr_cpus' have those very strange values? Remember that we put the 'incl
1298 0x500' instruction at the same place of '++nr_cpus', so the two addresses
1299 should have the exact same values or something is _completely_ messed up.
1300
1301 The 'nr_cpus' variable still yielded the value of "1217939456".
1302
1303* So what did really corrupt the BSS-stored variable nr_cpus at this _very_
1304 early stage?
1305
1306 The very first thing we do in the C part of the kernel is to clear the entire
1307 BSS section to zero. So assuming the kernel was copied/setup correctly (which
1308 we're not very sure of yet), the problem probably occurs _between_ clearing
1309 the BSS section and parsing the MP tables.
1310
1311 OK, so we'll try to check the value of the 'nr_cpus' variable at different
1312 places in the kernel between clearing BSS and MP parsing using this 14-byte
1313 code segment:
1314
1315 asm volatile ("movl %0, %%eax;"
1316 "movl %%eax, (0xffffffff80000500); hlt;"
1317 : : "m"(nr_cpus) : "memory");
1318
1319 And then use the qemu console to read the address 0x500 'xp 0x500'.
1320
1321* Very important discovery: Adding the above asm fragment _directly_ after
1322 clearning the BSS section yielded the same corrupt value of "1217939456"
1323 for the 'nr_cpus' variable! Something is seriously wrong cause 'nr_cpus'
1324 is clearly in the BSS section:
1325
1326 $ readelf -a kernel.elf
1327
1328 Section Headers:
1329 [Nr] Name Type Address Offset
1330 Size EntSize Flags Link Info Align
1331 ...
1332 [ 8] .bss NOBITS ffffffff8010f5e0 0010f5e0
1333 00000000000028c8 0000000000000000 WA 0 0 32
1334
1335 Symbol table '.symtab' contains 183 entries:
1336 Num: Value Size Type Bind Vis Ndx Name
1337 ...
1338 151: ffffffff80111d80 4 OBJECT GLOBAL DEFAULT 8 nr_cpus
1339
1340* I think we're getting closer. Notice from above ELF analysis, the BSS
1341 section starts at 0xffffffff8010f600, and it has a length of 0x28c8:
1342
1343 This means that from reading the ELF:
1344
1345 BSS start = 0xffffffff8010f5e0
1346 BSS end = 0xffffffff8010f5e0 + 0x28c8 = 0xffffffff800111ea8
1347 'nr_cpus' is at 0xffffffff80111d80
1348
1349 But very strangely, here are the results of printing __bss_start and
1350 __bss_end symbols exported by our kernel linker script we wrote since the
1351 dawn of time:
1352
1353 BSS start = 0xffffffff8010f5e0
1354 !! BSS end = 0xffffffff801115e0
1355 'nr_cpus' is outside of this region!
1356
1357 As you see, somehow the kernel sees a _trimmed_ BSS section from the
1358 linker script, and to add insult to injury, 'nr_cpus' is not included
1359 in that trimmed region.
1360
1361* Things began to make sense: the kernel obviously has a misguided view about
1362 its bss section. This is disastrous cause at this level, we're the ones
1363 responsible of clearing this BSS area before use. Ofcourse, we directly move
1364 to our linker script BSS section to see what's wrong:
1365
1366 SECTIONS
1367 {
1368 ...
1369 .bss : {
1370 __bss_start = .;
1371 *(EXCLUDE_FILE (*head.o *e820.o) .bss)
1372 __bss_end = .;
1373 }
1374 __kernel_end = .;
1375 ...
1376 }
1377
1378 Things should be fine and everything is at its place, but the corruption still
1379 exists. So I tried a small modification of:
1380
1381 .bss : {
1382 __bss_start = .;
1383 *(EXCLUDE_FILE (*head.o *e820.o) .bss)
1384 }
1385 __bss_end = .;
1386 __kernel_end = .;
1387
1388 And things worked magically! Ofcourse leaving the issue as this will be immature
1389 as we didn't understand yet why this trivial change produced this huge effect.
1390
1391* Trying to debug the issue further, we restore the linker script to its original
1392 state and let the linker be more verbose by using the -M flag. Inspecting the
1393 output, a very interesting thing came up when the linker displayed the final
1394 memory map:
1395
1396 ('...' denotes trimmed output lines)
1397
1398 Allocating common symbols // mm, ..
1399 Common symbol size file
1400
1401 test_symbol 0x8 mmap.o
1402 ioapic_descs 0x60 ioapic.o
1403 nr_ioapics 0x4 ioapic.o
1404 nr_cpus 0x4 smpboot.o
1405 mp_irqs 0x700 mptables.o
1406 nr_mpcirqs 0x4 mptables.o
1407 cpu_descs 0x100 smpboot.o
1408
1409 Linker script and memory map
1410
1411 0x0000000000100000 . = 0x100000
1412
1413 .text.head 0x0000000000100000 0x310
1414 ...
1415 .bss 0xffffffff8010f5e0 0x28c8 load address 0x000000000010692c
1416 0xffffffff8010f5e0 __bss_start = .
1417 *(EXCLUDE_FILE(*e820.o *head.o) .bss)
1418 .bss 0xffffffff8010f5e0 0x0 common.o
1419 .bss 0xffffffff8010f5e0 0x0 main.o
1420 .bss 0xffffffff801105e0 0x0 i8259.o
1421 .bss 0xffffffff801105e0 0x0 apic.o
1422 ...
1423 .bss 0xffffffff80110610 0x0 lib/string.o
1424 .bss 0xffffffff80110620 0xfc0 lib/printf.o
1425 0xffffffff801115e0 __bss_end = .
1426 COMMON 0xffffffff801115e0 0x64 ioapic.o
1427 0xffffffff801115e0 ioapic_descs // Outside of BSS!
1428 0xffffffff80111640 nr_ioapics // Outside of BSS!
1429 *fill* 0xffffffff80111644 0x1c 00
1430 COMMON 0xffffffff80111660 0x704 mptables.o
1431 0xffffffff80111660 mp_irqs // Outside of BSS!
1432 0xffffffff80111d60 nr_mpcirqs // Outside of BSS!
1433 *fill* 0xffffffff80111d64 0x1c 00
1434 COMMON 0xffffffff80111d80 0x120 smpboot.o
1435 0xffffffff80111d80 nr_cpus // Outside of BSS!
1436 0xffffffff80111da0 cpu_descs // Outside of BSS!
1437 COMMON 0xffffffff80111ea0 0x8 mmap.o
1438 0xffffffff80111ea0 test_symbol // Outside of BSS!
1439 0xffffffff80111ea8 __kernel_end = .
1440
1441 So at last we got it, the linker script didn't put variables like nr_cpus,
1442 nr_ioapics etc in the BSS section, thus they were not cleared in our clear_bss()
1443 function. As you can see, those symbols are allocated past the __bss_end symbol.
1444
1445* OK, so we inspect the LD manual for this 'COMMON' section, to find:
1446
1447 "A special notation is needed for common symbols, because in many object file
1448 formats common symbols do not have a particular input section. The linker treats
1449 common symbols as though they are in an input section named ‘COMMON’ ..."
1450
1451* Unfortunately the documentation was a bit vague, and there was no explanation
1452 of what "common symbols" were really used for. So we dig deeper to find this
1453 great explanation from Ulrich's paper on shared libraries:
1454
1455 "Common variables are widely used in fortran, but they got used in C and C++
1456 as well to work around mistakes of programmers. Since in the early days people
1457 used to drop the 'extern' keyword from variable definitions, in the same way
1458 it is possible to drop it from function declaration, the compiler often had
1459 multiple definitions of the same variable in different files.
1460
1461 To help the poor and clueless programmer, the C/C++ compiler normally generates
1462 common variables for uninitialized definitions such as `int foo;'
1463
1464 For common variables, there can be more than one definition, and they all get
1465 unified in one location in the output file .. their values does not need to be
1466 stored in the ELF file."
1467
1468* And also from Ulrich, a description of -fno-common which is more accurate than
1469 gcc's manual:
1470
1471 "If the programmer uses the compiler command-line option -fno-common, the
1472 generated code will contain uninitialized variables instead of common variables
1473 if a vriable definition has no initializer. ..
1474
1475 The result at run-time is the same as for common variable, no value stored in
1476 the file. But the representation in the object file is different [BSS section],
1477 and it allows the linker to find multiple definitions and flag them as errors."
1478
1479* And finally from Ian Lance Taylor's series on linkers:
1480
1481 "A section index of SHN_COMMON (0xfff2) indicates a common symbol. Common
1482 symbols were invented to handle Fortran common blocks, and they are also often
1483 used for uninitialized global variables in C.
1484
1485 A common symbol has unusual semantics. Common symbols have a value of zero, but
1486 set the size field to the desired size. If one object file has a common symbol
1487 and another has a definition, the common symbol is treated as an undefined
1488 reference.
1489
1490 If there is no definition for a common symbol, the program linker acts as though
1491 it saw a definition initialized to zero of the appropriate size. Two object
1492 files may have common symbols of different sizes, in which case the program
1493 linker will use the largest size.
1494
1495 Implementing common symbol semantics across shared libraries is a touchy
1496 subject, somewhat helped by the recent introduction of a type for common symbols
1497 as well as a special section index (see the discussion of symbol types below)."
1498
1499 (Also check nm(1) on "C"/COMMON symbols, and --warn-common on ld(1))
1500
1501* And by now everything is super-clear, and the fix is only adding the COMMON
1502 symbols within the output object BSS boundaries:
1503
1504 .bss : {
1505 __bss_start = .;
1506 *(EXCLUDE_FILE (*head.o *e820.o) .bss)
1507 *(EXCLUDE_FILE (*head.o *e820.o) COMMON)
1508 __bss_end = .;
1509 }
1510
1511 What a bug this was.