[LU-15125] kiblnd_connd kernel BUG at lib/list_debug.c:53! Created: 18/Oct/21 Updated: 17/Aug/22 Resolved: 03/Nov/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Chris Horn | Assignee: | Chris Horn |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
PID: 1626765 TASK: ffff89a7a1ccc740 CPU: 1 COMMAND: "kiblnd_connd"
#0 [ffffa395c6a3bbe0] machine_kexec at ffffffffb346156e
#1 [ffffa395c6a3bc38] __crash_kexec at ffffffffb358f94d
#2 [ffffa395c6a3bd00] crash_kexec at ffffffffb359083d
#3 [ffffa395c6a3bd18] oops_end at ffffffffb342434d
#4 [ffffa395c6a3bd38] do_trap at ffffffffb3420b13
#5 [ffffa395c6a3bd80] do_invalid_op at ffffffffb3421476
#6 [ffffa395c6a3bda0] invalid_op at ffffffffb3e00d64
[exception RIP: __list_del_entry_valid.cold.1+52]
RIP: ffffffffb38913c8 RSP: ffffa395c6a3be58 RFLAGS: 00010046
RAX: 0000000000000054 RBX: 0000000000000000 RCX: 0000000000000007
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff89a7bec567c0
RBP: 0000000000000202 R8: 0000000000000000 R9: 0000000000aaaaaa
R10: 0000000000000000 R11: 0000000000000001 R12: 000000006165a0f9
R13: 0000000000000001 R14: dead000000000200 R15: ffff89a7a1e82418
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffffa395c6a3be50] __list_del_entry_valid.cold.1 at ffffffffb38913c8
#8 [ffffa395c6a3be58] kiblnd_connd at ffffffffc170305f [ko2iblnd]
#9 [ffffa395c6a3bf10] kthread at ffffffffb35043a6
#10 [ffffa395c6a3bf50] ret_from_fork at ffffffffb3e0023f
[root@snx11922n000 ~]# pdsh -g lustre 'lctl get_param version' | dshbak -c
----------------
snx11922n[002-005]
----------------
version=2.14.55_10_g620dd1b
[root@snx11922n000 ~]#
620dd1bf6d (es/dev/wc/testing-m5) LU-14437 gnilnd: Use NSEC_PER_USEC to convert nsec to usec
6bcc0798a4 LU-14402 osd-ldiskfs: disable pagecache bypass feature
9e6d2fa865 LU-14402 osd-ldiskfs: Page cache pages dirtied in writeback
f248dcbdda LUS-9546 build: Cray obs build support for master
b8e2e1c76d LU-14391 lnet: add config file support
6be8ea0e7b LU-14392 gnilnd: re-enable large i/o buffers
2de9d6bb32 LU-14409 ldiskfs: Add support for SUSE 5.3.18-24.46.1
966362f836 LU-15074 build: Use strlcpy if strscpy is not available
f8a7abd63f LU-13906 build: consistent use of %{name}
1a409a3e6a LU-14711 osc: Do not attempt sending empty pages
09e2e43241 (tag: v2_14_55, tag: 2.14.55) New tag 2.14.55
|
| Comments |
| Comment by Chris Horn [ 20/Oct/21 ] |
|
I suspect https://review.whamcloud.com/#/c/38845/ is responsible since there is no other recent change to kiblnd_connd(). |
| Comment by Chris Horn [ 20/Oct/21 ] |
|
Ah, the issue was actually with https://review.whamcloud.com/#/c/43419/
@@ -3571,9 +3571,10 @@ kiblnd_connd (void *arg)
spin_lock_irqsave(lock, flags);
}
- if (!list_empty(&kiblnd_data.kib_connd_waits)) {
- conn = list_entry(kiblnd_data.kib_connd_waits.next,
- struct kib_conn, ibc_list);
+ conn = list_first_entry_or_null(&kiblnd_data.kib_connd_waits,
+ struct kib_conn,
+ ibc_sched_list);
+ if (conn) {
list_del(&conn->ibc_list);
spin_unlock_irqrestore(lock, flags);
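Wrong list_head field used in the list_first_entry_or_null() macro: entries on kib_connd_waits are linked through ibc_list, not ibc_sched_list, so the macro computes the wrong struct kib_conn pointer and the following list_del(&conn->ibc_list) operates on a bogus list entry. |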
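To illustrate why the member name matters, here is a minimal userspace sketch. It is not the ko2iblnd code: `struct demo_conn`, `lookup_skew()` and the small list helpers are hypothetical stand-ins written for this example, and only the two field names are borrowed from the diff above. The lookup macro subtracts the offset of the named member from the list node's address, so naming a member other than the one that was actually queued returns a pointer that is off by the distance between the two fields.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, NULL */

/* Minimal userspace stand-ins for the kernel list helpers (assumption:
 * simplified versions, no debug checks or poisoning). */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry_or_null(head, type, member) \
	((head)->next != (head) ? \
	 container_of((head)->next, type, member) : NULL)

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Hypothetical cut-down connection with two independent linkage fields. */
struct demo_conn {
	struct list_head ibc_list;       /* used to queue on the waits list */
	struct list_head ibc_sched_list; /* belongs to a different list */
};

/* Queue one object via ibc_list, look it up, and return how many bytes
 * the returned pointer is away from the object that was queued. */
static long lookup_skew(int use_wrong_member)
{
	struct demo_conn conns[2]; /* conns[1] is queued so the skewed
				    * pointer still lies inside the array */
	struct list_head waits;
	struct demo_conn *found;

	init_list_head(&waits);
	list_add_tail(&conns[1].ibc_list, &waits);

	if (use_wrong_member)
		found = list_first_entry_or_null(&waits, struct demo_conn,
						 ibc_sched_list);
	else
		found = list_first_entry_or_null(&waits, struct demo_conn,
						 ibc_list);
	assert(found != NULL);

	/* Only addresses are compared; the skewed pointer is never used. */
	return (long)((char *)&conns[1] - (char *)found);
}
```

In this sketch `lookup_skew(0)` yields 0, while `lookup_skew(1)` yields `offsetof(struct demo_conn, ibc_sched_list)`. In the kernel the skewed pointer is then dereferenced, so a later `list_del(&conn->ibc_list)` would act on memory that is not the queued node's linkage.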
| Comment by Gerrit Updater [ 20/Oct/21 ] |
|
"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45316 |
| Comment by Gerrit Updater [ 03/Nov/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45316/ |
| Comment by Peter Jones [ 03/Nov/21 ] |
|
Landed for 2.15 |