[LU-15125] kiblnd_connd kernel BUG at lib/list_debug.c:53! Created: 18/Oct/21  Updated: 17/Aug/22  Resolved: 03/Nov/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug
Priority: Blocker
Reporter: Chris Horn
Assignee: Chris Horn
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3

 Description   
PID: 1626765  TASK: ffff89a7a1ccc740  CPU: 1   COMMAND: "kiblnd_connd"
 #0 [ffffa395c6a3bbe0] machine_kexec at ffffffffb346156e
 #1 [ffffa395c6a3bc38] __crash_kexec at ffffffffb358f94d
 #2 [ffffa395c6a3bd00] crash_kexec at ffffffffb359083d
 #3 [ffffa395c6a3bd18] oops_end at ffffffffb342434d
 #4 [ffffa395c6a3bd38] do_trap at ffffffffb3420b13
 #5 [ffffa395c6a3bd80] do_invalid_op at ffffffffb3421476
 #6 [ffffa395c6a3bda0] invalid_op at ffffffffb3e00d64
    [exception RIP: __list_del_entry_valid.cold.1+52]
    RIP: ffffffffb38913c8  RSP: ffffa395c6a3be58  RFLAGS: 00010046
    RAX: 0000000000000054  RBX: 0000000000000000  RCX: 0000000000000007
    RDX: 0000000000000000  RSI: 0000000000000004  RDI: ffff89a7bec567c0
    RBP: 0000000000000202   R8: 0000000000000000   R9: 0000000000aaaaaa
    R10: 0000000000000000  R11: 0000000000000001  R12: 000000006165a0f9
    R13: 0000000000000001  R14: dead000000000200  R15: ffff89a7a1e82418
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffa395c6a3be50] __list_del_entry_valid.cold.1 at ffffffffb38913c8
 #8 [ffffa395c6a3be58] kiblnd_connd at ffffffffc170305f [ko2iblnd]
 #9 [ffffa395c6a3bf10] kthread at ffffffffb35043a6
#10 [ffffa395c6a3bf50] ret_from_fork at ffffffffb3e0023f
[root@snx11922n000 ~]# pdsh -g lustre 'lctl get_param version' | dshbak -c
----------------
snx11922n[002-005]
----------------
version=2.14.55_10_g620dd1b
[root@snx11922n000 ~]#
620dd1bf6d (es/dev/wc/testing-m5) LU-14437 gnilnd: Use NSEC_PER_USEC to convert nsec to usec
6bcc0798a4 LU-14402 osd-ldiskfs: disable pagecache bypass feature
9e6d2fa865 LU-14402 osd-ldiskfs: Page cache pages dirtied in writeback
f248dcbdda LUS-9546 build: Cray obs build support for master
b8e2e1c76d LU-14391 lnet: add config file support
6be8ea0e7b LU-14392 gnilnd: re-enable large i/o buffers
2de9d6bb32 LU-14409 ldiskfs: Add support for SUSE 5.3.18-24.46.1
966362f836 LU-15074 build: Use strlcpy if strscpy is not available
f8a7abd63f LU-13906 build: consistent use of %{name}
1a409a3e6a LU-14711 osc: Do not attempt sending empty pages
09e2e43241 (tag: v2_14_55, tag: 2.14.55) New tag 2.14.55


 Comments   
Comment by Chris Horn [ 20/Oct/21 ]

I suspect https://review.whamcloud.com/#/c/38845/ is responsible since there is no other recent change to kiblnd_connd().

Comment by Chris Horn [ 20/Oct/21 ]

Ah, the issue was actually with https://review.whamcloud.com/#/c/43419/

@@ -3571,9 +3571,10 @@ kiblnd_connd (void *arg)
 			spin_lock_irqsave(lock, flags);
 		}

-		if (!list_empty(&kiblnd_data.kib_connd_waits)) {
-			conn = list_entry(kiblnd_data.kib_connd_waits.next,
-					  struct kib_conn, ibc_list);
+		conn = list_first_entry_or_null(&kiblnd_data.kib_connd_waits,
+						struct kib_conn,
+						ibc_sched_list);
+		if (conn) {
 			list_del(&conn->ibc_list);
 			spin_unlock_irqrestore(lock, flags);

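The wrong list_head field is used in the list_first_entry_or_null() macro. Entries on kib_connd_waits are linked through ibc_list (as the removed list_entry() call and the list_del() that follows both show), but the new code passes ibc_sched_list, so conn is computed from the wrong member offset and list_del(&conn->ibc_list) does not operate on the entry that is actually on the list.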
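
To illustrate why the member name matters, here is a minimal userspace sketch. The list helpers are simplified stand-ins for the kernel macros, and struct demo_conn, its field order, and demo_wrong_member_skew() are hypothetical names made up for this example; it is not the real struct kib_conn layout. It only shows that resolving the same list node through a different member yields a pointer shifted by the distance between the two members.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel list helpers (illustration only). */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_first_entry_or_null(head, type, member) \
	((head)->next != (head) ? \
	 container_of((head)->next, type, member) : (type *)NULL)

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical cut-down connection; field order is an assumption. */
struct demo_conn {
	struct list_head ibc_list;       /* linkage used by the wait list */
	struct list_head ibc_sched_list; /* unrelated linkage */
	int id;
};

/*
 * Queue one connection via ibc_list, then look it up twice: once with
 * the matching member and once with the wrong one. Returns how many
 * bytes the two results differ by.
 */
ptrdiff_t demo_wrong_member_skew(void)
{
	/* conns[0] only provides valid storage in front of conns[1]. */
	struct demo_conn conns[2] = { { .id = 0 }, { .id = 42 } };
	struct list_head waits;
	struct demo_conn *right, *wrong;

	init_list_head(&waits);
	list_add_tail(&conns[1].ibc_list, &waits);

	right = list_first_entry_or_null(&waits, struct demo_conn, ibc_list);
	wrong = list_first_entry_or_null(&waits, struct demo_conn,
					 ibc_sched_list);

	/* Only the matching member recovers the real object. */
	assert(right == &conns[1]);
	assert(wrong != &conns[1]);

	return (char *)right - (char *)wrong;
}
```

In this sketch the skew equals the offset between the two members, so any later access through the wrong pointer (such as a list_del() on its ibc_list field) would touch memory that is not the queued node.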

Comment by Gerrit Updater [ 20/Oct/21 ]

"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45316
Subject: LU-15125 o2iblnd: wrong list used for kib_connd_waits
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c3bbdf8477cd89cdaed09e3b8b7d5db7266adc17

Comment by Gerrit Updater [ 03/Nov/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45316/
Subject: LU-15125 o2iblnd: wrong list used for kib_connd_waits
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f6f1e395cd26369d7441a70eb5d598ea64f1589a

Comment by Peter Jones [ 03/Nov/21 ]

Landed for 2.15

Generated at Sat Feb 10 03:15:40 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.