2008-11-25 77 views
18

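Given a Linux kernel oops, how do you go about diagnosing the problem? In the output I can see a stack trace that seems to give some clues. Are there any tools that would help find the problem? What basic procedures do you follow to track it down?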


Unable to handle kernel paging request for data at address 0x33343a31 
Faulting instruction address: 0xc50659ec 
Oops: Kernel access of bad area, sig: 11 [#1] 
tpsslr3 
Modules linked in: datalog(P) manet(P) vnet wlan_wep wlan_scan_sta ath_rate_sample ath_pci wlan ath_hal(P) 
NIP: c50659ec LR: c5065f04 CTR: c00192e8 
REGS: c2aff920 TRAP: 0300 Tainted: P   (2.6.25.16-dirty) 
MSR: 00009032 CR: 22082444 XER: 20000000 
DAR: 33343a31, DSISR: 20000000 
TASK = c2e6e3f0[1486] 'datalogd' THREAD: c2afe000 
GPR00: c5065f04 c2aff9d0 c2e6e3f0 00000000 00000001 00000001 00000000 0000b3f9 
GPR08: 3a33340a c5069624 c5068d14 33343a31 82082482 1001f2b4 c1228000 c1230000 
GPR16: c60f0000 000004a8 c59abbe6 0000002f c1228360 c340d6b0 c5070000 00000001 
GPR24: c2aff9e0 c5070000 00000000 00000000 00000003 c2cc2780 c2affae8 0000000f 
NIP [c50659ec] mesh_packet_in+0x3d8/0xdac [manet] 
LR [c5065f04] mesh_packet_in+0x8f0/0xdac [manet] 
Call Trace: 
[c2aff9d0] [c5065f04] mesh_packet_in+0x8f0/0xdac [manet] (unreliable) 
[c2affad0] [c5061ff8] IF_netif_rx+0xa0/0xb0 [manet] 
[c2affae0] [c01925e4] netif_receive_skb+0x34/0x3c4 
[c2affb10] [c60b5f74] netif_receive_skb_debug+0x2c/0x3c [wlan] 
[c2affb20] [c60bc7a4] ieee80211_deliver_data+0x1b4/0x380 [wlan] 
[c2affb60] [c60bd420] ieee80211_input+0xab0/0x1bec [wlan] 
[c2affbf0] [c6105b04] ath_rx_poll+0x884/0xab8 [ath_pci] 
[c2affc90] [c018ec20] net_rx_action+0xd8/0x1ac 
[c2affcb0] [c00260b4] __do_softirq+0x7c/0xf4 
[c2affce0] [c0005754] do_softirq+0x58/0x5c 
[c2affcf0] [c0025eb4] irq_exit+0x48/0x58 
[c2affd00] [c000627c] do_IRQ+0xa4/0xc4 
[c2affd10] [c00106f8] ret_from_except+0x0/0x14 
--- Exception: 501 at __delay+0x78/0x98 
    LR = cfi_amdstd_write_buffers+0x618/0x7ac 
[c2affdd0] [c0163670] cfi_amdstd_write_buffers+0x504/0x7ac (unreliable) 
[c2affe50] [c015a2d0] concat_write+0xe4/0x140 
[c2affe80] [c0158ff4] part_write+0xd0/0xf0 
[c2affe90] [c015bdf0] mtd_write+0x170/0x2a8 
[c2affef0] [c0073898] vfs_write+0xcc/0x16c 
[c2afff10] [c0073f2c] sys_write+0x4c/0x90 
[c2afff40] [c0010060] ret_from_syscall+0x0/0x38 
--- Exception: c01 at 0xfd98a50 
    LR = 0x10003840 
Instruction dump: 
419d02a0 98010009 800100a4 2f800003 419e0508 2f170000 419a0098 3d20c507 
a0e1002e 81699624 39299624 7f8b4800 419e007c a0610016 7d264b78 
Kernel panic - not syncing: Fatal exception in interrupt 
Rebooting in 1 seconds.. 

Answers

19

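The oops provides some useful information for diagnosing the crash. It starts with the address of the crash, the reason ("access of bad area") and the contents of the registers. The call trace answers the question "how did we get here?". The first item in the list happened most recently. Working backwards, an interrupt occurred (do_IRQ) because the Atheros WiFi adapter received a packet (ath_rx_poll). That routine passed it up to the generic WiFi code (ieee80211_input), which in turn passed it up to the network stack (netif_receive_skb).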
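To make the "read the call trace from the top down" step concrete, here is a minimal Python sketch that pulls the frames out of an oops. The regular expression and the helper name parse_call_trace are my own, written against the line format shown in the question; other architectures and kernel versions print frames differently, so treat the pattern as an assumption.

```python
import re

# Assumed frame format, taken from the oops in the question:
#   [c2affad0] [c5061ff8] IF_netif_rx+0xa0/0xb0 [manet]
FRAME_RE = re.compile(
    r"\[(?P<sp>[0-9a-f]+)\]\s+\[(?P<pc>[0-9a-f]+)\]\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def parse_call_trace(oops_text):
    """Return call-trace frames, most recent first, as a list of dicts."""
    frames = []
    for line in oops_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "pc": int(m["pc"], 16),
                "symbol": m["symbol"],
                "offset": int(m["offset"], 16),
                # Frames without a [module] suffix belong to the core kernel.
                "module": m["module"] or "vmlinux",
            })
    return frames

SAMPLE = """\
NIP [c50659ec] mesh_packet_in+0x3d8/0xdac [manet]
Call Trace:
[c2aff9d0] [c5065f04] mesh_packet_in+0x8f0/0xdac [manet] (unreliable)
[c2affad0] [c5061ff8] IF_netif_rx+0xa0/0xb0 [manet]
[c2affae0] [c01925e4] netif_receive_skb+0x34/0x3c4
"""

for frame in parse_call_trace(SAMPLE):
    print(f'{frame["pc"]:08x}  {frame["symbol"]}+{frame["offset"]:#x}  ({frame["module"]})')
```

Grouping the frames by module like this shows quickly where control passes from the core kernel into out-of-tree code, which is usually the first place to look.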

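To find the exact code that caused the problem, you can run

gdb /usr/src/linux/vmlinux 

and then disassemble the function in question, mesh_packet_in(). The oops already resolves both addresses to it: the faulting instruction (NIP, 0xc50659ec) is mesh_packet_in+0x3d8 and the link register (LR, 0xc5065f04) is mesh_packet_in+0x8f0, both within the function's 0xdac bytes. Because mesh_packet_in() belongs to the manet module rather than to vmlinux, gdb also needs that module's symbols (built with debug info) loaded at the module's load address, for example with add-symbol-file. You can also try the gdb command

(gdb) info line *0xc50659ec 

to find the source line that contains this address; info symbol 0xc50659ec reports the enclosing function.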
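The symbol+offset/size notation in the oops is enough to check by hand which function an address falls in. A small Python sketch of that arithmetic, using only the numbers printed on the NIP and LR lines (the helper name function_bounds is made up for illustration):

```python
def function_bounds(address, offset, size):
    """Given 'symbol+offset/size' for an address, return (start, end) of the symbol."""
    start = address - offset
    return start, start + size

nip = 0xc50659ec  # NIP line: mesh_packet_in+0x3d8/0xdac
lr = 0xc5065f04   # LR line:  mesh_packet_in+0x8f0/0xdac

start, end = function_bounds(nip, 0x3d8, 0xdac)
print(hex(start), hex(end))   # where mesh_packet_in starts and ends in memory
print(start <= nip < end)     # is the faulting instruction inside the function?
print(start <= lr < end)      # is the link register inside it as well?
print(hex(lr - start))        # LR offset recomputed from the NIP line
```

The same subtraction gives the offset to look up in a disassembly of the module, where addresses are relative to the object file rather than to the running kernel.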

1

http://oss.sgi.com/projects/kdb/

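Patch this into your kernel; then, when it oopses, you are dropped into a gdb-like interface where you can poke around. That said, it looks like the manet module is dereferencing a bad pointer.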

5

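You should first try to find the source of the crashing code. In this particular case, the oops says the crash happened in mesh_packet_in in the manet module, at offset 0x3d8 (the NIP line; 0x8f0 is the offset of the link register, LR). It also reports that the instructions around this point are 419d02a0 98010009 ... So use "objdump -d" on the module to confirm that the reported function/offset is correct. Then check what the source code is doing there; you can use the register listing to double-check that you are looking at the right instruction.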
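When comparing the "Instruction dump" with objdump -d output, it helps to print the dump words in the same form. The sketch below assumes objdump shows each 32-bit PowerPC instruction as four space-separated hex bytes, which is what I would expect for this target; check against your own toolchain. The helper name is hypothetical.

```python
def words_to_objdump_bytes(dump):
    """Turn '419d02a0 98010009' into ['41 9d 02 a0', '98 01 00 09']."""
    return [" ".join(word[i:i + 2] for i in range(0, 8, 2)) for word in dump.split()]

# First few words of the instruction dump from the oops in the question.
DUMP = "419d02a0 98010009 800100a4 2f800003"

for pattern in words_to_objdump_bytes(DUMP):
    print(pattern)
```

You can then search the disassembly of mesh_packet_in for these byte patterns. Words that embed an address (for instance 3d20c507, whose low half matches the top of the module's load address) may look different in the unrelocated .ko file, so match on the neighbouring instructions instead.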

Once you know which C statement is faulting, you need to read the source code to find out where the bogus data comes from.
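One quick check on where bogus data comes from is to see whether the bad address is really text. The Python sketch below decodes a register value as four ASCII bytes; it assumes a big-endian 32-bit machine, as in the PowerPC oops above, and the helper name as_ascii is my own.

```python
def as_ascii(word):
    """Return the 4 bytes of a 32-bit value as text if they are all ASCII, else None."""
    raw = word.to_bytes(4, "big")  # assumption: big-endian byte order in memory
    printable = all(0x20 <= b < 0x7f or b in (0x09, 0x0a, 0x0d) for b in raw)
    return raw.decode("ascii") if printable else None

print(repr(as_ascii(0x33343a31)))  # DAR / GPR11, the address that faulted
print(repr(as_ascii(0x3a33340a)))  # GPR08
print(repr(as_ascii(0xc5069624)))  # GPR09, a plausible kernel pointer
```

Here the faulting address decodes to the characters 34:1 and GPR08 to :34 followed by a newline, while the genuine pointer decodes to nothing. That does not prove anything by itself, but it suggests a pointer in the manet module was overwritten by string data, so a buffer overrun near the structure being read is worth looking for.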