[LU-13675] LNetError: 14769:0:(o2iblnd.h:1003:kiblnd_queue2str()) LBUG Created: 15/Jun/20 Updated: 23/Jun/20 Resolved: 23/Jun/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0 |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Shuichi Ihara | Assignee: | Andreas Dilger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
2.13.54_44_gf3fef81 |
||
| Issue Links: |
|
||||||||
| Severity: | 2 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
1 x server (CentOS 7.8) and 1 x client (CentOS 8.1). Both the server and the client have OFED 5.0 installed:
# ofed_info | head -1
MLNX_OFED_LINUX-5.0-2.1.8.0 (OFED-5.0-2.1.8):
When the client mounts Lustre, both the server and the client crash with the following LBUG.

Server
[482108.891327] LNetError: 14769:0:(o2iblnd.h:1003:kiblnd_queue2str()) LBUG
[482108.891395] Pid: 14769, comm: kiblnd_connd 3.10.0-1127.10.1.el7.x86_64 #1 SMP Wed Jun 3 14:28:03 UTC 2020
[482108.891397] Call Trace:
[482108.891412] [<ffffffffc146f67c>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[482108.891436] [<ffffffffc146f99c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[482108.891448] [<ffffffffc15b82cb>] kiblnd_need_noop.part.21+0x0/0x36 [ko2iblnd]
[482108.891463] [<ffffffffc15aa581>] kiblnd_check_txs_locked+0x421/0x490 [ko2iblnd]
[482108.891474] [<ffffffffc15b107b>] kiblnd_check_conns+0x3cb/0x880 [ko2iblnd]
[482108.891485] [<ffffffffc15b6273>] kiblnd_connd+0x813/0x9e0 [ko2iblnd]
[482108.891495] [<ffffffff9bec6691>] kthread+0xd1/0xe0
[482108.891506] [<ffffffff9c592d37>] ret_from_fork_nospec_end+0x0/0x39
[482108.891514] [<ffffffffffffffff>] 0xffffffffffffffff
[482108.891553] Kernel panic - not syncing: LBUG
[482108.891593] CPU: 3 PID: 14769 Comm: kiblnd_connd Kdump: loaded Tainted: P OE ------------ 3.10.0-1127.10.1.el7.x86_64 #1
[482108.891682] Hardware name: Supermicro SYS-2028U-TN24R4T+/X10DRU-i+, BIOS 3.2 06/11/2019
[482108.891742] Call Trace:
[482108.891773] [<ffffffff9c57ffa5>] dump_stack+0x19/0x1b
[482108.891817] [<ffffffff9c579541>] panic+0xe8/0x21f
[482108.891869] [<ffffffffc146f9eb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[482108.891925] [<ffffffffc15b82cb>] kiblnd_queue2str.part.17+0x1a/0x1a [ko2iblnd]
[482108.891988] [<ffffffffc15aa581>] kiblnd_check_txs_locked+0x421/0x490 [ko2iblnd]
[482108.892053] [<ffffffffc15b107b>] kiblnd_check_conns+0x3cb/0x880 [ko2iblnd]
[482108.892110] [<ffffffff9beae150>] ? __internal_add_timer+0x130/0x130
[482108.892168] [<ffffffffc15b6273>] kiblnd_connd+0x813/0x9e0 [ko2iblnd]
[482108.892221] [<ffffffff9c585942>] ? __schedule+0x402/0x840
[482108.892268] [<ffffffff9bedb990>] ? wake_up_state+0x20/0x20
[482108.892321] [<ffffffffc15b5a60>] ? kiblnd_cm_callback+0x2380/0x2380 [ko2iblnd]
[482108.892380] [<ffffffff9bec6691>] kthread+0xd1/0xe0
[482108.892423] [<ffffffff9bec65c0>] ? insert_kthread_work+0x40/0x40
[482108.892473] [<ffffffff9c592d37>] ret_from_fork_nospec_begin+0x21/0x21
[482108.892527] [<ffffffff9bec65c0>] ? insert_kthread_work+0x40/0x40

Client
[487085.899074] LNetError: 32398:0:(o2iblnd.h:1003:kiblnd_queue2str()) LBUG
[487085.900509] Pid: 32398, comm: kiblnd_connd 4.18.0-147.8.1.el8_1.x86_64 #1 SMP Thu Apr 9 13:49:54 UTC 2020
[487085.900510] Call Trace:
[487085.900531] libcfs_call_trace+0x86/0xc0 [libcfs]
[487085.900537] lbug_with_loc+0x43/0x80 [libcfs]
[487085.900546] kiblnd_queue2str.part.19+0x16/0x20 [ko2iblnd]
[487085.900551] kiblnd_check_txs_locked+0x39c/0x3a0 [ko2iblnd]
[487085.900556] kiblnd_check_conns+0x58b/0x920 [ko2iblnd]
[487085.900561] kiblnd_connd+0x9c2/0xa60 [ko2iblnd]
[487085.900564] kthread+0x112/0x130
[487085.900567] ret_from_fork+0x1f/0x40
[487085.900568] 0xffffffffffffffff
[487085.900569] Kernel panic - not syncing: LBUG
[487085.901751] CPU: 4 PID: 32398 Comm: kiblnd_connd Kdump: loaded Tainted: G OE --------- -t - 4.18.0-147.8.1.el8_1.x86_64 #1
[487085.904110] Hardware name: Intel Corporation S2600BPB/S2600BPB, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
[487085.905298] Call Trace:
[487085.906489] dump_stack+0x5c/0x80
[487085.907663] panic+0xe7/0x247
[487085.908837] lbug_with_loc.cold.8+0x18/0x18 [libcfs]
[487085.910002] kiblnd_queue2str.part.19+0x16/0x20 [ko2iblnd]
[487085.911147] kiblnd_check_txs_locked+0x39c/0x3a0 [ko2iblnd]
[487085.912287] kiblnd_check_conns+0x58b/0x920 [ko2iblnd]
[487085.913424] kiblnd_connd+0x9c2/0xa60 [ko2iblnd]
[487085.914557] ? wake_up_q+0x70/0x70
[487085.915677] ? kiblnd_cm_callback+0x2230/0x2230 [ko2iblnd]
[487085.916799] kthread+0x112/0x130
[487085.917912] ? kthread_flush_work_fn+0x10/0x10
[487085.919036] ret_from_fork+0x1f/0x40 |
| Comments |
| Comment by Shuichi Ihara [ 15/Jun/20 ] |
|
It seems that this regression came from commit 7308662efc. With that commit reverted, the crashes no longer occur. |
| Comment by Andreas Dilger [ 17/Jun/20 ] |
|
That is patch https://review.whamcloud.com/33235 |
| Comment by Gerrit Updater [ 17/Jun/20 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38958 |
| Comment by Gerrit Updater [ 23/Jun/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38958/ |
| Comment by Peter Jones [ 23/Jun/20 ] |
|
Landed for 2.14 |