Details
Bug
Resolution: Fixed
Major
None
Lustre 2.0.0
None
RHEL 6.0, kernel-2.6.32-30.el6, lustre 2.0.0.1
3
8534
Description
When this occurs, the affected clients become very slow: ping over Ethernet still works, but SSH connections are very slow and ping over InfiniBand fails.
Multiple forced crash dumps always show the same situation: on a node with N CPUs, N-1 CPUs are running threads stuck spinning on the main LNET spinlock at different call sites, while the remaining thread owns the lock and is always running in lnet_match_md(). For instance:
crash> bt -a
PID: 10869 TASK: ffff88044c582d10 CPU: 0 COMMAND: "kiblnd_sd_18"
#0 [ffff880036607d10] machine_kexec at ffffffff8102e66b
#1 [ffff880036607d70] crash_kexec at ffffffff810a9b08
#2 [ffff880036607e40] oops_end at ffffffff81456108
#3 [ffff880036607e70] die_nmi at ffffffff814562a9
#4 [ffff880036607ea0] do_nmi_callback at ffffffff81028f0b
#5 [ffff880036607f10] do_nmi at ffffffff81455e46
#6 [ffff880036607f50] nmi at ffffffff81455710
[exception RIP: _spin_lock+33]
RIP: ffffffff81455011 RSP: ffff880466a9fa10 RFLAGS: 00000283
RAX: 0000000000003f08 RBX: ffff880296799e00 RCX: 00000000000000c0
RDX: 0000000000003eeb RSI: ffff880296799e00 RDI: ffffffffa040d340
RBP: ffff880466a9fa10 R8: 0000000000001000 R9: 0000000000000000
R10: ffff880296799e00 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000001 R14: ffff88047cd29280 R15: ffff880b468e0bb0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#7 [ffff880466a9fa10] _spin_lock at ffffffff81455011
#8 [ffff880466a9fa18] lnet_finalize at ffffffffa03f0218
#9 [ffff880466a9fa48] kiblnd_recv at ffffffffa0661b5a
#10 [ffff880466a9fb08] lnet_ni_recv at ffffffffa03f37a8
#11 [ffff880466a9fb98] lnet_recv_put at ffffffffa03f3b00
#12 [ffff880466a9fbe8] lnet_parse at ffffffffa03fa23a
#13 [ffff880466a9fce8] kiblnd_handle_rx at ffffffffa0662253
#14 [ffff880466a9fd78] kiblnd_rx_complete at ffffffffa0662e02
#15 [ffff880466a9fdf8] kiblnd_complete at ffffffffa0662fa2
#16 [ffff880466a9fe38] kiblnd_scheduler at ffffffffa066331c
#17 [ffff880466a9ff48] kernel_thread at ffffffff8100d1aa
PID: 10866 TASK: ffff8801006fab90 CPU: 1 COMMAND: "kiblnd_sd_15"
#0 [ffff88088e407e80] crash_nmi_callback at ffffffff810266d6
#1 [ffff88088e407e90] notifier_call_chain at ffffffff81457d55
#2 [ffff88088e407ed0] atomic_notifier_call_chain at ffffffff81457dba
#3 [ffff88088e407ee0] notify_die at ffffffff810875de
#4 [ffff88088e407f10] do_nmi at ffffffff81455e1c
#5 [ffff88088e407f50] nmi at ffffffff81455710
[exception RIP: _spin_lock+30]
RIP: ffffffff8145500e RSP: ffff8802276f7cb0 RFLAGS: 00000297
RAX: 0000000000003efb RBX: ffff88088f80c400 RCX: ffff88043f99ecc0
RDX: 0000000000003ef5 RSI: ffff88088f80c400 RDI: ffffffffa040d340
RBP: ffff8802276f7cb0 R8: ffffc900177cb5e0 R9: 00049b3c447e0790
R10: 000000000000003f R11: 0000000000000064 R12: 0000000000000000
R13: ffff88088f80c400 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#6 [ffff8802276f7cb0] _spin_lock at ffffffff8145500e
#7 [ffff8802276f7cb8] lnet_finalize at ffffffffa03f0218
#8 [ffff8802276f7ce8] kiblnd_tx_done at ffffffffa065d482
#9 [ffff8802276f7d68] kiblnd_tx_complete at ffffffffa06615cf
#10 [ffff8802276f7df8] kiblnd_complete at ffffffffa0662f72
#11 [ffff8802276f7e38] kiblnd_scheduler at ffffffffa066331c
#12 [ffff8802276f7f48] kernel_thread at ffffffff8100d1aa
PID: 10882 TASK: ffff8804664deb50 CPU: 2 COMMAND: "kiblnd_sd_31"
#0 [ffff88048e407e80] crash_nmi_callback at ffffffff810266d6
#1 [ffff88048e407e90] notifier_call_chain at ffffffff81457d55
#2 [ffff88048e407ed0] atomic_notifier_call_chain at ffffffff81457dba
#3 [ffff88048e407ee0] notify_die at ffffffff810875de
#4 [ffff88048e407f10] do_nmi at ffffffff81455e1c
#5 [ffff88048e407f50] nmi at ffffffff81455710
[exception RIP: _spin_lock+30]
RIP: ffffffff8145500e RSP: ffff880164f1bb90 RFLAGS: 00000297
RAX: 0000000000003f0a RBX: ffff880562f81800 RCX: 00000000000000c0
RDX: 0000000000003ef5 RSI: ffff880562f81800 RDI: ffffffffa040d340
RBP: ffff880164f1bb90 R8: 00000000000000c0 R9: 00000000000000c0
R10: 0000000000000004 R11: ffff880562f81800 R12: ffff880954e2ccc0
R13: 0000000000000000 R14: 00000000000000c0 R15: 00000000000000c0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#6 [ffff880164f1bb90] _spin_lock at ffffffff8145500e
#7 [ffff880164f1bb98] lnet_recv_put at ffffffffa03f3a5c
#8 [ffff880164f1bbe8] lnet_parse at ffffffffa03fa23a
#9 [ffff880164f1bce8] kiblnd_handle_rx at ffffffffa0662253
#10 [ffff880164f1bd78] kiblnd_rx_complete at ffffffffa0662e02
#11 [ffff880164f1bdf8] kiblnd_complete at ffffffffa0662fa2
#12 [ffff880164f1be38] kiblnd_scheduler at ffffffffa066331c
#13 [ffff880164f1bf48] kernel_thread at ffffffff8100d1aa
PID: 10853 TASK: ffff880349ef6f10 CPU: 3 COMMAND: "kiblnd_sd_02"
#0 [ffff880c8e407e80] crash_nmi_callback at ffffffff810266d6
#1 [ffff880c8e407e90] notifier_call_chain at ffffffff81457d55
#2 [ffff880c8e407ed0] atomic_notifier_call_chain at ffffffff81457dba
#3 [ffff880c8e407ee0] notify_die at ffffffff810875de
#4 [ffff880c8e407f10] do_nmi at ffffffff81455e1c
#5 [ffff880c8e407f50] nmi at ffffffff81455710
[exception RIP: lnet_match_md+563]
RIP: ffffffffa03f4f33 RSP: ffff8804635b1b50 RFLAGS: 00000283
RAX: ffff88051285dac0 RBX: 0000000000000004 RCX: 000500080a6420a5
RDX: ffff88051285d840 RSI: 0000000000000000 RDI: 0000000000000004
RBP: ffff8804635b1be0 R8: 00000000000000c0 R9: 00000000000000c0
R10: 0000000000000004 R11: ffff88101d274c00 R12: 00000000000000c0
R13: ffff88051285dcc0 R14: 00000000000000c0 R15: 0000000000000001
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#6 [ffff8804635b1b50] lnet_match_md at ffffffffa03f4f33
#7 [ffff8804635b1be8] lnet_parse at ffffffffa03f99cd
#8 [ffff8804635b1ce8] kiblnd_handle_rx at ffffffffa0662253
#9 [ffff8804635b1d78] kiblnd_rx_complete at ffffffffa0662e02
#10 [ffff8804635b1df8] kiblnd_complete at ffffffffa0662fa2
#11 [ffff8804635b1e38] kiblnd_scheduler at ffffffffa066331c
#12 [ffff8804635b1f48] kernel_thread at ffffffff8100d1aa
PID: 10877 TASK: ffff880466ad0d90 CPU: 4 COMMAND: "kiblnd_sd_26"
#0 [ffff880036647e80] crash_nmi_callback at ffffffff810266d6
#1 [ffff880036647e90] notifier_call_chain at ffffffff81457d55
#2 [ffff880036647ed0] atomic_notifier_call_chain at ffffffff81457dba
#3 [ffff880036647ee0] notify_die at ffffffff810875de
#4 [ffff880036647f10] do_nmi at ffffffff81455e1c
#5 [ffff880036647f50] nmi at ffffffff81455710
[exception RIP: _spin_lock+33]
RIP: ffffffff81455011 RSP: ffff88047c677be0 RFLAGS: 00000283
RAX: 0000000000003f01 RBX: ffff88014354e030 RCX: 0000000000000000
RDX: 0000000000003ef5 RSI: ffff88047e739160 RDI: ffffffffa040d340
RBP: ffff88047c677be0 R8: 0000000000000286 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801eeeaca00
R13: 0000000000000001 R14: ffff88047cd29280 R15: 00000000000000c0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#6 [ffff88047c677be0] _spin_lock at ffffffff81455011
#7 [ffff88047c677be8] lnet_parse at ffffffffa03f937a
#8 [ffff88047c677ce8] kiblnd_handle_rx at ffffffffa0662253
#9 [ffff88047c677d78] kiblnd_rx_complete at ffffffffa0662e02
#10 [ffff88047c677df8] kiblnd_complete at ffffffffa0662fa2
#11 [ffff88047c677e38] kiblnd_scheduler at ffffffffa066331c
#12 [ffff88047c677f48] kernel_thread at ffffffff8100d1aa
For the complete crash output, see the attached file (crash.txt).
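When comparing several dumps, it can help to reduce each per-CPU backtrace to the symbol that was interrupted and its caller. The following is a minimal Python sketch of that idea, not a crash or Lustre tool: the names summarize_bt, split_spinners and SAMPLE are hypothetical, the regular expressions assume the line layout shown in the backtrace above, and SAMPLE is an abbreviated excerpt of the CPU 1 and CPU 3 entries.

```python
import re

# Header of each per-CPU backtrace in `bt -a` output.
HEADER = re.compile(r'PID:\s*(\d+)\s+TASK:\s*\S+\s+CPU:\s*(\d+)\s+COMMAND:\s*"([^"]+)"')
# The interrupted instruction, e.g. "[exception RIP: _spin_lock+33]".
EXC_RIP = re.compile(r'\[exception RIP:\s*([A-Za-z_]\w*)\+\d+\]')
# A stack frame, e.g. "#7 [ffff8802276f7cb8] lnet_finalize at ffffffffa03f0218".
FRAME = re.compile(r'#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+')


def summarize_bt(text):
    """Return one dict per task: cpu, command, interrupted symbol, caller."""
    tasks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = HEADER.search(line)
        if m:
            current = {"cpu": int(m.group(2)), "command": m.group(3),
                       "rip": None, "caller": None}
            tasks.append(current)
            continue
        if current is None:
            continue
        m = EXC_RIP.search(line)
        if m:
            current["rip"] = m.group(1)
            continue
        m = FRAME.match(line)
        # Frames before the exception RIP belong to the NMI handler; skip them.
        # The caller is the first later frame that is not the interrupted symbol.
        if m and current["rip"] and current["caller"] is None:
            if m.group(1) != current["rip"]:
                current["caller"] = m.group(1)
    return tasks


def split_spinners(tasks):
    """Separate tasks spinning in _spin_lock from tasks interrupted elsewhere."""
    spinning = [t for t in tasks if t["rip"] == "_spin_lock"]
    others = [t for t in tasks if t["rip"] not in (None, "_spin_lock")]
    return spinning, others


# Abbreviated excerpt of two entries from the backtrace above.
SAMPLE = '''
PID: 10866 TASK: ffff8801006fab90 CPU: 1 COMMAND: "kiblnd_sd_15"
#5 [ffff88088e407f50] nmi at ffffffff81455710
[exception RIP: _spin_lock+30]
#6 [ffff8802276f7cb0] _spin_lock at ffffffff8145500e
#7 [ffff8802276f7cb8] lnet_finalize at ffffffffa03f0218
PID: 10853 TASK: ffff880349ef6f10 CPU: 3 COMMAND: "kiblnd_sd_02"
#5 [ffff880c8e407f50] nmi at ffffffff81455710
[exception RIP: lnet_match_md+563]
#6 [ffff8804635b1b50] lnet_match_md at ffffffffa03f4f33
#7 [ffff8804635b1be8] lnet_parse at ffffffffa03f99cd
'''

if __name__ == "__main__":
    spinning, others = split_spinners(summarize_bt(SAMPLE))
    for t in spinning:
        print("CPU %d (%s) spinning, called from %s" % (t["cpu"], t["command"], t["caller"]))
    for t in others:
        print("CPU %d (%s) running in %s" % (t["cpu"], t["command"], t["rip"]))
```

Run against a full dump saved to a file, the tasks that are not spinning in _spin_lock are the candidates for the lock holder; the sketch only classifies by interrupted symbol and does not prove lock ownership.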
Initially, we traced the issue to an incorrect LNET routing configuration: some clients, which other clients saw as Lustre routers, were dropping all of the misrouted packets and consuming a lot of resources in the process. Now that the misconfiguration is fixed, the problem is triggered by evicted clients that keep trying to reconnect to the servers: once a client has been evicted, it is overloaded with requests and gets stuck.
So, where do you think the problem with the spinlock lies? Do you think the code that runs under the LNET spinlock could be made less monolithic?
Thanks,