Details
Type: Bug
Resolution: Fixed
Priority: Critical
Affects Version: Lustre 2.12.4
Environment: 2.12.4_5.chaos, toss 3.6-2 (RHEL 7.8)
Severity: 3
Description
On compute nodes, an ASSERT fails and the node crashes. One node reports two different failed ASSERTs in its dumped log:
LustreError: 13759:0:(ldlm_lock.c:213:ldlm_lock_put()) ASSERTION( (((( lock))->l_flags & (1ULL << 50)) != 0) ) failed:
LustreError: 10188:0:(ldlm_lock.c:205:ldlm_lock_put()) ASSERTION( atomic_read(&lock->l_refc) > 0 ) failed:
LustreError: 10188:0:(ldlm_lock.c:205:ldlm_lock_put()) LBUG
Pid: 10188, comm: ldlm_bl_16 3.10.0-1127.0.0.1chaos.ch6.x86_64 #1 SMP Fri Apr 3 08:56:52 PDT 2020
Call Trace:
 [<ffffffffc0a637ec>] libcfs_call_trace+0x8c/0xd0 [libcfs]
 [<ffffffffc0a638ac>] lbug_with_loc+0x4c/0xa0 [libcfs]
 [<ffffffffc16cb366>] ldlm_lock_put+0x616/0x7b0 [ptlrpc]
 [<ffffffffc0c5828b>] osc_extent_put+0x6b/0x320 [osc]
 [<ffffffffc0c645fb>] osc_cache_wait_range+0x30b/0x960 [osc]
 [<ffffffffc0c655ce>] osc_cache_writeback_range+0x97e/0x1000 [osc]
 [<ffffffffc0c51195>] osc_lock_flush+0x195/0x290 [osc]
 [<ffffffffc0c51653>] osc_ldlm_blocking_ast+0x2e3/0x3a0 [osc]
 [<ffffffffc16d2dea>] ldlm_cancel_callback+0x8a/0x330 [ptlrpc]
 [<ffffffffc16ea620>] ldlm_cli_cancel_local+0xa0/0x3f0 [ptlrpc]
 [<ffffffffc16f03f7>] ldlm_cli_cancel+0x157/0x620 [ptlrpc]
 [<ffffffffc0c514ea>] osc_ldlm_blocking_ast+0x17a/0x3a0 [osc]
 [<ffffffffc16fc618>] ldlm_handle_bl_callback+0xf8/0x4f0 [ptlrpc]
 [<ffffffffc16fd230>] ldlm_bl_thread_main+0x820/0xa60 [ptlrpc]
 [<ffffffffbaccca01>] kthread+0xd1/0xe0
 [<ffffffffbb3bff5d>] ret_from_fork_nospec_begin+0x7/0x21
 [<ffffffffffffffff>] 0xffffffffffffffff
Kernel panic - not syncing: LBUG
CPU: 53 PID: 10188 Comm: ldlm_bl_16 Kdump: loaded Tainted: G OE ------------ T 3.10.0-1127.0.0.1chaos.ch6.x86_64 #1
Hardware name: Penguin Computing Relion OCP1930e/S2600KPR, BIOS SE5C610.86B.01.01.0027.071020182329 07/10/2018
The other node reports the same ASSERT twice, from two different threads:
LustreError: 20571:0:(ldlm_lock.c:213:ldlm_lock_put()) ASSERTION( (((( lock))->l_flags & (1ULL << 50)) != 0) ) failed:
LustreError: 36887:0:(ldlm_lock.c:213:ldlm_lock_put()) ASSERTION( (((( lock))->l_flags & (1ULL << 50)) != 0) ) failed:
LustreError: 36887:0:(ldlm_lock.c:213:ldlm_lock_put()) LBUG
Pid: 36887, comm: ldlm_bl_62 3.10.0-1127.0.0.1chaos.ch6.x86_64 #1 SMP Fri Apr 3 08:56:52 PDT 2020
Call Trace:
 [<ffffffffc0a727ec>] libcfs_call_trace+0x8c/0xd0 [libcfs]
 [<ffffffffc0a728ac>] lbug_with_loc+0x4c/0xa0 [libcfs]
 [<ffffffffc176f3ca>] ldlm_lock_put+0x67a/0x7b0 [ptlrpc]
 [<ffffffffc1773058>] ldlm_lock_match_with_skip+0x3b8/0x860 [ptlrpc]
 [<ffffffffc0d982d2>] osc_match_base+0x102/0x290 [osc]
 [<ffffffffc0da3dfc>] osc_obj_dlmlock_at_pgoff+0x14c/0x2c0 [osc]
 [<ffffffffc0d9c358>] osc_req_attr_set+0x128/0x610 [osc]
 [<ffffffffc1549b13>] cl_req_attr_set+0x63/0x160 [obdclass]
 [<ffffffffc0d969f3>] osc_build_rpc+0x483/0x1080 [osc]
 [<ffffffffc0db1cbd>] osc_io_unplug0+0xecd/0x19c0 [osc]
 [<ffffffffc0db6620>] osc_cache_writeback_range+0x9d0/0x1000 [osc]
 [<ffffffffc0da2195>] osc_lock_flush+0x195/0x290 [osc]
 [<ffffffffc0da2653>] osc_ldlm_blocking_ast+0x2e3/0x3a0 [osc]
 [<ffffffffc1776dea>] ldlm_cancel_callback+0x8a/0x330 [ptlrpc]
 [<ffffffffc178e620>] ldlm_cli_cancel_local+0xa0/0x3f0 [ptlrpc]
 [<ffffffffc17943f7>] ldlm_cli_cancel+0x157/0x620 [ptlrpc]
 [<ffffffffc0da24ea>] osc_ldlm_blocking_ast+0x17a/0x3a0 [osc]
 [<ffffffffc17a0618>] ldlm_handle_bl_callback+0xf8/0x4f0 [ptlrpc]
 [<ffffffffc17a1230>] ldlm_bl_thread_main+0x820/0xa60 [ptlrpc]
 [<ffffffffab4cca01>] kthread+0xd1/0xe0
 [<ffffffffabbbff5d>] ret_from_fork_nospec_begin+0x7/0x21
 [<ffffffffffffffff>] 0xffffffffffffffff
Kernel panic - not syncing: LBUG
CPU: 20 PID: 36887 Comm: ldlm_bl_62 Kdump: loaded Tainted: G OE ------------ T 3.10.0-1127.0.0.1chaos.ch6.x86_64 #1
These logs are from /tftpboot/dumps/192.168.64.82-2020-08-12-13:23:27/vmcore-dmesg.txt
and /tftpboot/dumps/192.168.66.180-2020-08-12-16:39:36/vmcore-dmesg.txt.
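For reference, the two assertions that fire in ldlm_lock_put() check two invariants: the lock's reference count must be positive whenever a reference is dropped (ldlm_lock.c:205), and flag bit 50 of l_flags must be set (ldlm_lock.c:213; bit 50 is, as far as we can tell, LDLM_FL_DESTROYED). The minimal C sketch below models those checks as return codes instead of LBUGs. It is a hypothetical model, not the Lustre code: model_lock, model_lock_put, MODEL_FL_DESTROYED and the put_result values are invented names, and it assumes the bit-50 check runs only when the last reference is dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the flag bit tested at ldlm_lock.c:213. */
#define MODEL_FL_DESTROYED (1ULL << 50)

/* Hypothetical model of the two ldlm_lock fields named in the assertions. */
struct model_lock {
    int      l_refc;  /* reference count (an atomic_t in the real struct) */
    uint64_t l_flags; /* lock flag bits */
};

enum put_result {
    PUT_OK,             /* reference dropped, lock still referenced */
    PUT_FREED,          /* last reference dropped on a destroyed lock */
    PUT_REFC_UNDERFLOW, /* would trip ASSERTION( l_refc > 0 ) */
    PUT_NOT_DESTROYED   /* would trip the bit-50 ASSERTION */
};

/* Drop one reference, reporting which invariant (if any) is violated. */
static enum put_result model_lock_put(struct model_lock *lock)
{
    /* Invariant 1: never drop a reference that is not held. */
    if (lock->l_refc <= 0)
        return PUT_REFC_UNDERFLOW;

    if (--lock->l_refc == 0) {
        /* Invariant 2 (assumed to apply on the final put only):
         * the lock must already be marked destroyed. */
        if ((lock->l_flags & MODEL_FL_DESTROYED) == 0)
            return PUT_NOT_DESTROYED;
        return PUT_FREED;
    }
    return PUT_OK;
}
```

For example, a lock that starts with two references and the destroyed bit set yields PUT_OK and then PUT_FREED; a third put on it yields PUT_REFC_UNDERFLOW, the condition the line-205 assertion reports.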