[LU-3596] deadlock in kiblnd Created: 16/Jul/13 Updated: 01/Sep/17 Resolved: 01/Sep/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Oz Rentas | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: | lu-3596.out |
| Severity: | 3 |
| Rank (Obsolete): | 9116 |
| Description |
|
We had a few OSSes crash at NOAA recently due to an NMI "deadlock detected" error. I was able to get a vmcore from one and analyze it, and it looks like we are hitting a previously reported kiblnd deadlock. Thanks. |
| Comments |
| Comment by Bruno Faccini (Inactive) [ 16/Jul/13 ] |
|
Hello Kit, |
| Comment by Kit Westneat (Inactive) [ 16/Jul/13 ] |
|
Hi Bruno, all the last_ran values from the runnable tasks seem similar (extracts are in the attached output): is that the correct field in the task struct to be looking at? This is the bt for the oldest last_ran. I looked through the runnable kiblnd_sd_* tasks (there were only 2) and saw this, which looks like the issue you mentioned. Here is the only other runnable IB-related thread. Thanks. |
| Comment by Bruno Faccini (Inactive) [ 17/Jul/13 ] |
|
Hello Cory, |
| Comment by Kit Westneat (Inactive) [ 17/Jul/13 ] |
|
Hi Bruno, There are actually 24 CPUs. I may have misinterpreted running vs runnable: I looked for all the tasks with RU in ps, but is that running? I've attached the requested output. The vmcore is 2GB; is there a server I could put it on? Thanks, |
| Comment by Bruno Faccini (Inactive) [ 17/Jul/13 ] |
|
RU in ps stands for RUnnable, not only running; my intent was to get all the runnable tasks rather than only the running ones. |
| Comment by Bruno Faccini (Inactive) [ 18/Jul/13 ] |
|
Having a look at the crash-dump extracts you posted in the lu-3596.out file, I am less and less convinced it is a dup of the earlier ticket. Also, is the OSS running with /proc/sys/kernel/nmi_watchdog set? |
| Comment by Bruno Faccini (Inactive) [ 23/Jul/13 ] |
|
Ok, from the crash-dump, the scenario is different from the ones in the earlier ticket. These are the three threads involved:

PID: 9331 TASK: ffff81063f2707a0 CPU: 12 COMMAND: "ipoib"
#0 [ffff810c3fc43f20] crash_nmi_callback at ffffffff8007c396
#1 [ffff810c3fc43f40] do_nmi at ffffffff800658e5
#2 [ffff810c3fc43f50] nmi at ffffffff80064ecf
[exception RIP: __smp_call_function_many+150]
RIP: ffffffff80077925 RSP: ffff810c3dde7c30 RFLAGS: 00000297
RAX: 0000000000000016 RBX: ffff810c3dde7cb0 RCX: 0000000000000000
RDX: 00000000000000ff RSI: 00000000000000ff RDI: 00000000000000c0
RBP: 0000000000000017 R8: 0000000000000018 R9: 0000000000000028
R10: ffff810c3dde7bd0 R11: ffff8106293e7600 R12: 0000000000000001
R13: 0000000000000000 R14: ffffffff800777f9 R15: ffff810c3dde7cb0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#3 [ffff810c3dde7c30] __smp_call_function_many at ffffffff80077925
#4 [ffff810c3dde7c78] smp_call_function_many at ffffffff80077a27
#5 [ffff810c3dde7ca8] smp_call_function at ffffffff80077b18
#6 [ffff810c3dde7cf8] on_each_cpu at ffffffff80096bcd
#7 [ffff810c3dde7d18] __remove_vm_area at ffffffff800d55f8
#8 [ffff810c3dde7d38] remove_vm_area at ffffffff800d5627
#9 [ffff810c3dde7d48] __vunmap at ffffffff800d5673
#10 [ffff810c3dde7d68] mlx4_buf_free at ffffffff884f8352 [mlx4_core]
#11 [ffff810c3dde7d88] mlx4_ib_destroy_qp at ffffffff88542219 [mlx4_ib]
#12 [ffff810c3dde7dc8] ib_destroy_qp at ffffffff884a1bb0 [ib_core]
#13 [ffff810c3dde7df8] ipoib_cm_tx_reap at ffffffff88741999 [ib_ipoib]
#14 [ffff810c3dde7e38] run_workqueue at ffffffff8004d8ac
#15 [ffff810c3dde7e78] worker_thread at ffffffff8004a1e5
#16 [ffff810c3dde7ee8] kthread at ffffffff80032c45
#17 [ffff810c3dde7f48] kernel_thread at ffffffff8005dfc1
PID: 15504 TASK: ffff810c390da0c0 CPU: 17 COMMAND: "kiblnd_sd_13"
#0 [ffff810c3fd3af20] crash_nmi_callback at ffffffff8007c396
#1 [ffff810c3fd3af40] do_nmi at ffffffff800658e5
#2 [ffff810c3fd3af50] nmi at ffffffff80064ecf
[exception RIP: __write_lock_failed+9]
RIP: ffffffff80062191 RSP: ffff810619ad7db0 RFLAGS: 00000087
RAX: 0000000000000286 RBX: ffff8105db4da0c0 RCX: 0000000000000100
RDX: 000000000000029a RSI: 00000000fffffffb RDI: ffffffff8b14d39c
RBP: ffff8105db4da0c0 R8: 0000000000000001 R9: 00000000ffffffff
R10: 0000000000000000 R11: 000000000000000a R12: 00000000fffffffb
R13: 0000000000000001 R14: ffffc20010d5fd10 R15: 000000000000000c
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#3 [ffff810619ad7db0] __write_lock_failed at ffffffff80062191
#4 [ffff810619ad7db0] _write_lock_irqsave at ffffffff80064b83
#5 [ffff810619ad7db8] kiblnd_close_conn at ffffffff8b12c844 [ko2iblnd]
#6 [ffff810619ad7dd8] kiblnd_tx_complete at ffffffff8b12f20d [ko2iblnd]
#7 [ffff810619ad7e58] kiblnd_scheduler at ffffffff8b136769 [ko2iblnd]
#8 [ffff810619ad7f48] kernel_thread at ffffffff8005dfc1
PID: 15515 TASK: ffff810c39c237e0 CPU: 0 COMMAND: "kiblnd_connd"
#0 [ffffffff804b8dc0] crash_kexec at ffffffff800b1192
#1 [ffffffff804b8e80] die_nmi at ffffffff80065285
#2 [ffffffff804b8ea0] nmi_watchdog_tick at ffffffff80065a66
#3 [ffffffff804b8ef0] default_do_nmi at ffffffff80065609
#4 [ffffffff804b8f40] do_nmi at ffffffff800658f1
#5 [ffffffff804b8f50] nmi at ffffffff80064ecf
[exception RIP: __write_lock_failed+15]
RIP: ffffffff80062197 RSP: ffff81061978dc90 RFLAGS: 00000087
RAX: ffffc20000000000 RBX: ffff8107eb940140 RCX: 0000000000000001
RDX: 0000000000006000 RSI: 0000000000000003 RDI: ffffffff8032d42c
RBP: ffffc20000000000 R8: 0000000000000000 R9: ffff810c39c237e0
R10: ffff8105645d2000 R11: 0000000000002000 R12: ffffffffffffffff
R13: 0000000000007000 R14: 0000000000000001 R15: 0000000000000002
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#6 [ffff81061978dc90] __write_lock_failed at ffffffff80062197
#7 [ffff81061978dc90] _write_lock at ffffffff80064a7d
#8 [ffff81061978dc98] __get_vm_area_node at ffffffff800d51c1
#9 [ffff81061978dcd8] __vmalloc_node at ffffffff800d5952
#10 [ffff81061978dcf8] kiblnd_create_tx_pool at ffffffff8b127e0e [ko2iblnd]
#11 [ffff81061978dd68] kiblnd_pool_alloc_node at ffffffff8b1247a9 [ko2iblnd]
#12 [ffff81061978ddc8] kiblnd_get_idle_tx at ffffffff8b12d8d0 [ko2iblnd]
#13 [ffff81061978ddd8] kiblnd_check_sends at ffffffff8b12e9b7 [ko2iblnd]
#14 [ffff81061978dde8] kiblnd_check_txs at ffffffff8b12c22c [ko2iblnd]
#15 [ffff81061978de48] kiblnd_check_conns at ffffffff8b12ea68 [ko2iblnd]
#16 [ffff81061978dea8] kiblnd_connd at ffffffff8b136063 [ko2iblnd]
#17 [ffff81061978df48] kernel_thread at ffffffff8005dfc1
where:
- the 9331/"ipoib" thread owns vmlist_lock during an mlx4 driver buffer unmap, but then wants all other CPUs to flush their TLBs, so it has sent an IPI to each of them and is now busy-waiting for one reluctant peer...
- the 15504/"kiblnd_sd_13" thread is spinning on kiblnd_data.kib_global_lock in kiblnd_close_conn(), but with interrupts disabled, and is thus likely the one that has not yet serviced the IPI and flushed its TLB!!
- the 15515/"kiblnd_connd" thread has owned kiblnd_data.kib_global_lock since kiblnd_check_conns(), but is now spinning on vmlist_lock due to the vmalloc in kiblnd_create_tx_pool().

So 9331 blocks 15515, who blocks 15504, who blocks 9331!! But even if this is a new scenario compared to the two already described in the earlier ticket, the same fix should also cover it. |
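To make the cycle concrete, here is a minimal userspace analogue in C (my own sketch, not kiblnd code): pthread locks stand in for vmlist_lock and kib_global_lock, an atomic flag stands in for the TLB-flush IPI acknowledgement, and the sleeps only stage the acquisition order. The program hangs by design, reproducing the three-way wait.

/* deadlock_sketch.c — hypothetical userspace analogue of the three-way
 * deadlock above; build with: cc -pthread deadlock_sketch.c
 * pthread locks stand in for vmlist_lock / kiblnd_data.kib_global_lock,
 * and ipi_acked stands in for the TLB-flush IPI acknowledgement. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t  vmlist_lock     = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t kib_global_lock = PTHREAD_RWLOCK_INITIALIZER;
static atomic_int ipi_acked;                   /* 0 until "CPU 17" services the IPI */

static void *connd(void *arg)                  /* 15515/"kiblnd_connd" */
{
    (void)arg;
    pthread_rwlock_wrlock(&kib_global_lock);   /* kiblnd_check_conns() */
    sleep(1);                                  /* let the other threads line up */
    puts("connd: holds kib_global_lock, vmalloc now needs vmlist_lock...");
    pthread_mutex_lock(&vmlist_lock);          /* kiblnd_create_tx_pool() -> vmalloc */
    pthread_mutex_unlock(&vmlist_lock);
    pthread_rwlock_unlock(&kib_global_lock);
    return NULL;
}

static void *sd(void *arg)                     /* 15504/"kiblnd_sd_13" */
{
    (void)arg;
    puts("sd: interrupts off, spinning for kib_global_lock...");
    pthread_rwlock_wrlock(&kib_global_lock);   /* kiblnd_close_conn() */
    atomic_store(&ipi_acked, 1);               /* only now could the IPI be serviced */
    pthread_rwlock_unlock(&kib_global_lock);
    return NULL;
}

static void *ipoib(void *arg)                  /* 9331/"ipoib" */
{
    (void)arg;
    pthread_mutex_lock(&vmlist_lock);          /* __remove_vm_area() */
    puts("ipoib: holds vmlist_lock, busy-waiting for the IPI ack...");
    while (!atomic_load(&ipi_acked))           /* __smp_call_function_many() */
        ;                                      /* spins with vmlist_lock held */
    pthread_mutex_unlock(&vmlist_lock);
    return NULL;
}

int main(void)
{
    pthread_t t[3];
    pthread_create(&t[0], NULL, connd, NULL);
    usleep(100000);                            /* connd takes kib_global_lock */
    pthread_create(&t[1], NULL, ipoib, NULL);  /* ipoib takes vmlist_lock */
    usleep(100000);
    pthread_create(&t[2], NULL, sd, NULL);     /* sd blocks on kib_global_lock */
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);              /* never returns: 3-way deadlock */
    return 0;
}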
| Comment by Kit Westneat (Inactive) [ 23/Jul/13 ] |
|
Thanks for the analysis! I have ported the patch from the earlier ticket. Does that look correct? |
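The patch itself isn't quoted in this ticket, so the following is only a sketch of the general fix pattern the analysis implies, reusing the hypothetical names from the sketch above: never hold kib_global_lock across an allocation that can take vmlist_lock. connd_fixed() would replace connd() in that program; with it, kiblnd_sd can take the global lock, "interrupts" come back on, the IPI is acknowledged, and the cycle never closes.

/* Hypothetical drop-in for connd() in the sketch above — not the real
 * kiblnd patch.  Drop the global lock before the blocking allocation,
 * then re-take it and re-validate the pool state. */
static void *connd_fixed(void *arg)
{
    (void)arg;
    pthread_rwlock_wrlock(&kib_global_lock);   /* kiblnd_check_conns() */
    /* tx pool is empty: release the global lock before allocating */
    pthread_rwlock_unlock(&kib_global_lock);

    pthread_mutex_lock(&vmlist_lock);          /* vmalloc() path, may block */
    /* ... allocate and initialize the new tx pool here ... */
    pthread_mutex_unlock(&vmlist_lock);

    pthread_rwlock_wrlock(&kib_global_lock);   /* re-take and re-check: another */
    /* thread may have grown the pool while the lock was dropped */
    pthread_rwlock_unlock(&kib_global_lock);
    return NULL;
}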
| Comment by Bruno Faccini (Inactive) [ 24/Jul/13 ] |
|
Yes, the patch/back-port looks ok, but we need to wait for Liang's feedback to confirm. Also, I had to re-trigger the auto-tests due to an unrelated failure. |
| Comment by Oz Rentas [ 01/Sep/17 ] |
|
This is resolved. Please close. |
| Comment by Peter Jones [ 01/Sep/17 ] |
|
Thanks Oz |