[LU-1311] Debugging deadlock in lnet Created: 12/Apr/12  Updated: 31/May/12  Resolved: 31/May/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0, Lustre 2.3.0
Fix Version/s: Lustre 2.3.0

Type: Bug Priority: Blocker
Reporter: Oleg Drokin Assignee: Oleg Drokin
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 4631

 Description   

ORNL reported this lockup:

[  446.217881] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
[  446.217881] Pid: 9709, comm: kiblnd_sd_05 Not tainted 2.6.32-220.el6.wc.x86_64 #1
[  446.217881] Call Trace:
[  446.217881]  <NMI>  [<ffffffff814ec681>] ? panic+0x78/0x143
[  446.217881]  [<ffffffff810d8fad>] ? watchdog_overflow_callback+0xcd/0xd0
[  446.217881]  [<ffffffff8110a89d>] ? __perf_event_overflow+0x9d/0x230
[  446.217881]  [<ffffffff8110ae54>] ? perf_event_overflow+0x14/0x20
[  446.217881]  [<ffffffff8101e096>] ? intel_pmu_handle_irq+0x336/0x550
[  446.217881]  [<ffffffff814f2256>] ? kprobe_exceptions_notify+0x16/0x430
[  446.217881]  [<ffffffff814f0d39>] ? perf_event_nmi_handler+0x39/0xb0
[  446.217881]  [<ffffffff814f2885>] ? notifier_call_chain+0x55/0x80
[  446.217881]  [<ffffffff814f28ea>] ? atomic_notifier_call_chain+0x1a/0x20
[  446.217881]  [<ffffffff81096bce>] ? notify_die+0x2e/0x30
[  446.217881]  [<ffffffff814f0503>] ? do_nmi+0x173/0x2b0
[  446.217881]  [<ffffffff814efe10>] ? nmi+0x20/0x30
[  446.217881]  [<ffffffff814ef67c>] ? _spin_lock+0x1c/0x30
[  446.217881]  <<EOE>>  [<ffffffffa07473bf>] ? cfs_trace_lock_tcd+0x2f/0x90 [libcfs]
[  446.217881]  [<ffffffffa0750b0a>] ? libcfs_debug_vmsg2+0xda/0xb60 [libcfs]
[  446.217881]  [<ffffffffa07515d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[  446.217881]  [<ffffffffa07515d1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[  446.217881]  [<ffffffffa0b5768d>] ? kiblnd_scheduler+0x16d/0x620 [ko2iblnd]
[  446.217881]  [<ffffffff8105fa50>] ? default_wake_function+0x0/0x20
[  446.217881]  [<ffffffffa0b57520>] ? kiblnd_scheduler+0x0/0x620 [ko2iblnd]
[  446.217881]  [<ffffffff8100c14a>] ? child_rip+0xa/0x20
[  446.217881]  [<ffffffffa0b57520>] ? kiblnd_scheduler+0x0/0x620 [ko2iblnd]
[  446.217881]  [<ffffffffa0b57520>] ? kiblnd_scheduler+0x0/0x620 [ko2iblnd]
[  446.217881]  [<ffffffff8100c140>] ? child_rip+0x0/0x20

The other thread is in IRQ context:

PID: 9855   TASK: ffff880431898b40  CPU: 0   COMMAND: "ktracefiled"
 #0 [ffff880028207e90] crash_nmi_callback at ffffffff81029716
 #1 [ffff880028207ea0] notifier_call_chain at ffffffff814f2885
 #2 [ffff880028207ee0] atomic_notifier_call_chain at ffffffff814f28ea
 #3 [ffff880028207ef0] notify_die at ffffffff81096bce
 #4 [ffff880028207f20] do_nmi at ffffffff814f0503
 #5 [ffff880028207f50] nmi at ffffffff814efe10
    [exception RIP: _spin_lock_irqsave+47]
    RIP: ffffffff814ef56f  RSP: ffff880028203de8  RFLAGS: 00000097
    RAX: 00000000000055c2  RBX: ffff88040df2b800  RCX: 00000000000055c1
    RDX: 0000000000000046  RSI: ffff88040df2b800  RDI: ffffffffa0b71c30
    RBP: ffff880028203de8   R8: ffff8803fbb04000   R9: ffff8803fbb05de0
    R10: ffff880028403a40  R11: 0000000000000000  R12: 0000000000000087
    R13: 0000000000000000  R14: ffff8804318da000  R15: ffff8803fbb05f00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #6 [ffff880028203de8] _spin_lock_irqsave at ffffffff814ef56f
 #7 [ffff880028203df0] kiblnd_cq_completion at ffffffffa0b51548 [ko2iblnd]   <== this is line 3277 (cfs_spin_lock_irqsave(&kiblnd_data.kib_sched_lock, flags);) in o2iblnd_cb.c
 #8 [ffff880028203e10] mthca_cq_completion at ffffffffa019ca4a [ib_mthca]
 #9 [ffff880028203e30] mthca_eq_int at ffffffffa019c422 [ib_mthca]
#10 [ffff880028203eb0] mthca_arbel_msi_x_interrupt at ffffffffa019c694 [ib_mthca]
#11 [ffff880028203ed0] handle_IRQ_event at ffffffff810d94f0
#12 [ffff880028203f20] handle_edge_irq at ffffffff810dbc2e
#13 [ffff880028203f60] handle_irq at ffffffff8100df09
#14 [ffff880028203f80] do_IRQ at ffffffff814f504c
--- <IRQ stack> ---
#15 [ffff8803fbb05d38] ret_from_intr at ffffffff8100ba53
    [exception RIP: put_pages_on_tcd_daemon_list+255]
    RIP: ffffffffa075072f  RSP: ffff8803fbb05de0  RFLAGS: 00000246
    RAX: ffffea000c9a82a8  RBX: ffff8803fbb05e20  RCX: ffff880390407b88
    RDX: ffff88040d080128  RSI: ffff880390407b48  RDI: ffff880390407b68
    RBP: ffffffff8100ba4e   R8: ffff88040d080128   R9: 0000000000000000
    R10: ffff880028403a40  R11: 0000000000000000  R12: ffff8803fbb05d70
    R13: ffff8803fbb05f00  R14: ffff8803fbb05ef0  R15: 0000000000000286
    ORIG_RAX: ffffffffffffff46  CS: 0010  SS: 0018
#16 [ffff8803fbb05e28] put_pages_on_daemon_list at ffffffffa075094b [libcfs]  <== this is line 575 (cfs_tcd_for_each_type_lock) in tracefile.c
#17 [ffff8803fbb05e78] tracefiled at ffffffffa0751d4f [libcfs]
#18 [ffff8803fbb05f48] kernel_thread at ffffffff8100c14a


 Comments   
Comment by Oleg Drokin [ 12/Apr/12 ]

1. tracefiled thread: locks some random tcd (a non-IRQ type).
2. kiblnd scheduler thread: takes kiblnd_data.kib_sched_lock.
3. kiblnd scheduler thread: tries to print a debug message, which needs the lock on the tcd that tracefiled holds.
4. tracefiled thread: an interrupt hits on its CPU, and the handler needs kiblnd_data.kib_sched_lock == deadlock.
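
In code form, the lock-order inversion looks roughly like this. This is a minimal sketch, not the actual Lustre code: tcd_lock and sched_lock stand in for the real per-CPU tcd lock and kiblnd_data.kib_sched_lock, and the comments map to the steps above.

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(tcd_lock);    /* stands in for the per-CPU tcd lock */
static DEFINE_SPINLOCK(sched_lock);  /* stands in for kiblnd_data.kib_sched_lock */

/* CPU 0, process context (tracefiled): */
static void tracefiled_side(void)
{
        spin_lock(&tcd_lock);   /* step 1: tcd lock taken, IRQs still enabled */
        /* step 4: an IRQ fires here; the completion handler calls
         * kiblnd_cq_completion(), which spins on sched_lock and never
         * returns, so tcd_lock is never released */
        spin_unlock(&tcd_lock);
}

/* CPU 2, process context (kiblnd_sd_05): */
static void scheduler_side(void)
{
        unsigned long flags;

        spin_lock_irqsave(&sched_lock, flags);   /* step 2 */
        spin_lock(&tcd_lock);                    /* step 3: the CDEBUG path;
                                                  * spins on CPU 0's lock */
        spin_unlock(&tcd_lock);
        spin_unlock_irqrestore(&sched_lock, flags);
}

/* Neither CPU can make progress: CPU 2 waits for tcd_lock, which CPU 0
 * cannot release until its interrupt handler gets sched_lock from CPU 2. */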

Comment by Liang Zhen (Inactive) [ 13/Apr/12 ]

Oleg, based on your analysis, any call to CDEBUG between spin_lock/unlock_irqsave or spin_lock/unlock_bh could trigger this deadlock. I think a fix could look like this:

#define cfs_trace_lock_tcd_safe(tcd, flags)                     \
({                                                              \
        int __ret;                                              \
        spin_lock_irqsave(&cfs_trace_glock, flags);             \
        __ret = cfs_trace_lock_tcd(tcd);                        \
        __ret;                                                  \
})

#define cfs_trace_unlock_tcd_safe(tcd, flags)                   \
({                                                              \
        cfs_trace_unlock_tcd(tcd);                              \
        spin_unlock_irqrestore(&cfs_trace_glock, flags);        \
})

#define cfs_tcd_for_each_type_lock(..., flags)                  \
        ......                                                  \
        cfs_trace_lock_tcd_safe(tcd, flags); cfs_trace_unlock_tcd_safe(tcd, flags), i++)
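
In use, the debug path would then take the global lock (and disable interrupts) before touching any tcd. A rough sketch of the call-site pattern, using only the names proposed above (cfs_trace_glock and the _safe macros do not exist in the tree yet):

        /* inside the message-logging path, e.g. libcfs_debug_vmsg2(): */
        unsigned long flags;

        if (cfs_trace_lock_tcd_safe(tcd, flags)) {
                /* IRQs are now off on this CPU, so an interrupt handler
                 * cannot spin on a lock that this CPU already holds */
                /* ... append the formatted message to the tcd pages ... */
                cfs_trace_unlock_tcd_safe(tcd, flags);
        }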

What do you think?

Liang

Comment by Peter Jones [ 16/Apr/12 ]

Oleg will construct a patch

Comment by Oleg Drokin [ 27/Apr/12 ]

Patch at http://review.whamcloud.com/2605

Comment by James A Simmons [ 30/Apr/12 ]

I now get this error when testing with the patch:

Apr 30 08:19:54 barry-oss4 kernel: [247732.876333] Lustre: debug daemon will attempt to start writing to /chexport/lustre/logs/lustre-log.barry-oss4 (2097152kB max)
Apr 30 08:21:41 barry-oss4 kernel: [247839.213070] wanted to write 1008 but wrote -5
Apr 30 08:21:41 barry-oss4 kernel: [247839.225887] LNetError: 20124:0:(/data/buildsystem/jsimmons-wc/rpmbuild/BUILD/lustre-2.2.0/libcfs/libcfs/tracefile.c:1038:tracefiled()) ASSERTION(tage->page != NULL) failed
Apr 30 08:21:41 barry-oss4 kernel: [247839.249433] Kernel panic - not syncing: Lustre debug assertion failure
Apr 30 08:21:41 barry-oss4 kernel: [247839.249434]
Apr 30 08:21:41 barry-oss4 kernel: [247839.282160] Pid: 20124, comm: ktracefiled Not tainted 2.6.32-220.el6.wc.x86_64 #1
Apr 30 08:21:41 barry-oss4 kernel: [247839.297951] Call Trace:
Apr 30 08:21:41 barry-oss4 kernel: [247839.308593] [<ffffffff814ec681>] ? panic+0x78/0x143
Apr 30 08:21:41 barry-oss4 kernel: [247839.321844] [<ffffffff8109b56a>] ? do_gettimeofday+0x1a/0x50
Apr 30 08:21:41 barry-oss4 kernel: [247839.335818] [<ffffffffa07ff004>] ? cfs_trace_assertion_failed+0x74/0x80 [libcfs]
Apr 30 08:21:41 barry-oss4 kernel: [247839.351479] [<ffffffffa0800edb>] ? tracefiled+0x43b/0x530 [libcfs]
Apr 30 08:21:41 barry-oss4 kernel: [247839.365829] [<ffffffff811245b9>] ? free_pages+0x49/0x50
Apr 30 08:21:41 barry-oss4 kernel: [247839.379090] [<ffffffff8105fa50>] ? default_wake_function+0x0/0x20
Apr 30 08:21:41 barry-oss4 kernel: [247839.393366] [<ffffffffa0800aa0>] ? tracefiled+0x0/0x530 [libcfs]
Apr 30 08:21:41 barry-oss4 kernel: [247839.407430] [<ffffffff8100c14a>] ? child_rip+0xa/0x20
Apr 30 08:21:41 barry-oss4 kernel: [247839.420375] [<ffffffffa0800aa0>] ? tracefiled+0x0/0x530 [libcfs]
Apr 30 08:21:41 barry-oss4 kernel: [247839.434357] [<ffffffffa0800aa0>] ? tracefiled+0x0/0x530 [libcfs]
Apr 30 08:21:41 barry-oss4 kernel: [247839.448139] [<ffffffff8100c140>] ? child_rip+0x0/0x20

The load on all OSS nodes goes to 300+.

Comment by Peter Jones [ 31/May/12 ]

Landed for 2.3
