Details
-
Bug
-
Resolution: Fixed
-
Critical
-
None
-
Lustre 2.12.4
-
2.12.4_5.chaos
toss 3.6-2 (RHEL 7.8)
-
3
-
9223372036854775807
Description
On compute nodes, an ASSERT fails and the node crashes. One node reports two different failed ASSERTs in its dumped log:
LustreError: 13759:0:(ldlm_lock.c:213:ldlm_lock_put()) ASSERTION( (((( lock))->l_flags & (1ULL << 50)) != 0) ) failed:
LustreError: 10188:0:(ldlm_lock.c:205:ldlm_lock_put()) ASSERTION( atomic_read(&lock->l_refc) > 0 ) failed:
LustreError: 10188:0:(ldlm_lock.c:205:ldlm_lock_put()) LBUG
Pid: 10188, comm: ldlm_bl_16 3.10.0-1127.0.0.1chaos.ch6.x86_64 #1 SMP Fri Apr 3 08:56:52 PDT 2020
Call Trace:
[<ffffffffc0a637ec>] libcfs_call_trace+0x8c/0xd0 [libcfs]
[<ffffffffc0a638ac>] lbug_with_loc+0x4c/0xa0 [libcfs]
[<ffffffffc16cb366>] ldlm_lock_put+0x616/0x7b0 [ptlrpc]
[<ffffffffc0c5828b>] osc_extent_put+0x6b/0x320 [osc]
[<ffffffffc0c645fb>] osc_cache_wait_range+0x30b/0x960 [osc]
[<ffffffffc0c655ce>] osc_cache_writeback_range+0x97e/0x1000 [osc]
[<ffffffffc0c51195>] osc_lock_flush+0x195/0x290 [osc]
[<ffffffffc0c51653>] osc_ldlm_blocking_ast+0x2e3/0x3a0 [osc]
[<ffffffffc16d2dea>] ldlm_cancel_callback+0x8a/0x330 [ptlrpc]
[<ffffffffc16ea620>] ldlm_cli_cancel_local+0xa0/0x3f0 [ptlrpc]
[<ffffffffc16f03f7>] ldlm_cli_cancel+0x157/0x620 [ptlrpc]
[<ffffffffc0c514ea>] osc_ldlm_blocking_ast+0x17a/0x3a0 [osc]
[<ffffffffc16fc618>] ldlm_handle_bl_callback+0xf8/0x4f0 [ptlrpc]
[<ffffffffc16fd230>] ldlm_bl_thread_main+0x820/0xa60 [ptlrpc]
[<ffffffffbaccca01>] kthread+0xd1/0xe0
[<ffffffffbb3bff5d>] ret_from_fork_nospec_begin+0x7/0x21
[<ffffffffffffffff>] 0xffffffffffffffff
Kernel panic - not syncing: LBUG
CPU: 53 PID: 10188 Comm: ldlm_bl_16 Kdump: loaded Tainted: G OE ------------ T 3.10.0-1127.0.0.1chaos.ch6.x86_64 #1
Hardware name: Penguin Computing Relion OCP1930e/S2600KPR, BIOS SE5C610.86B.01.01.0027.071020182329 07/10/2018
The other reports the same ASSERT twice:
LustreError: 20571:0:(ldlm_lock.c:213:ldlm_lock_put()) ASSERTION( (((( lock))->l_flags & (1ULL << 50)) != 0) ) failed:
LustreError: 36887:0:(ldlm_lock.c:213:ldlm_lock_put()) ASSERTION( (((( lock))->l_flags & (1ULL << 50)) != 0) ) failed:
LustreError: 36887:0:(ldlm_lock.c:213:ldlm_lock_put()) LBUG
Pid: 36887, comm: ldlm_bl_62 3.10.0-1127.0.0.1chaos.ch6.x86_64 #1 SMP Fri Apr 3 08:56:52 PDT 2020
Call Trace:
[<ffffffffc0a727ec>] libcfs_call_trace+0x8c/0xd0 [libcfs]
[<ffffffffc0a728ac>] lbug_with_loc+0x4c/0xa0 [libcfs]
[<ffffffffc176f3ca>] ldlm_lock_put+0x67a/0x7b0 [ptlrpc]
[<ffffffffc1773058>] ldlm_lock_match_with_skip+0x3b8/0x860 [ptlrpc]
[<ffffffffc0d982d2>] osc_match_base+0x102/0x290 [osc]
[<ffffffffc0da3dfc>] osc_obj_dlmlock_at_pgoff+0x14c/0x2c0 [osc]
[<ffffffffc0d9c358>] osc_req_attr_set+0x128/0x610 [osc]
[<ffffffffc1549b13>] cl_req_attr_set+0x63/0x160 [obdclass]
[<ffffffffc0d969f3>] osc_build_rpc+0x483/0x1080 [osc]
[<ffffffffc0db1cbd>] osc_io_unplug0+0xecd/0x19c0 [osc]
[<ffffffffc0db6620>] osc_cache_writeback_range+0x9d0/0x1000 [osc]
[<ffffffffc0da2195>] osc_lock_flush+0x195/0x290 [osc]
[<ffffffffc0da2653>] osc_ldlm_blocking_ast+0x2e3/0x3a0 [osc]
[<ffffffffc1776dea>] ldlm_cancel_callback+0x8a/0x330 [ptlrpc]
[<ffffffffc178e620>] ldlm_cli_cancel_local+0xa0/0x3f0 [ptlrpc]
[<ffffffffc17943f7>] ldlm_cli_cancel+0x157/0x620 [ptlrpc]
[<ffffffffc0da24ea>] osc_ldlm_blocking_ast+0x17a/0x3a0 [osc]
[<ffffffffc17a0618>] ldlm_handle_bl_callback+0xf8/0x4f0 [ptlrpc]
[<ffffffffc17a1230>] ldlm_bl_thread_main+0x820/0xa60 [ptlrpc]
[<ffffffffab4cca01>] kthread+0xd1/0xe0
[<ffffffffabbbff5d>] ret_from_fork_nospec_begin+0x7/0x21
[<ffffffffffffffff>] 0xffffffffffffffff
Kernel panic - not syncing: LBUG
CPU: 20 PID: 36887 Comm: ldlm_bl_62 Kdump: loaded Tainted: G OE ------------ T 3.10.0-1127.0.0.1chaos.ch6.x86_64 #1
From /tftpboot/dumps/192.168.64.82-2020-08-12-13:23:27/vmcore-dmesg.txt
and /tftpboot/dumps/192.168.66.180-2020-08-12-16:39:36/vmcore-dmesg.txt
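For triage across several dumps, the failed assertions can be pulled out of a vmcore-dmesg.txt with a short script. This is only a sketch, assuming the LustreError line format seen in the logs above; the function name failed_assertions and the regular expression are illustrative and not part of Lustre or any crash tooling.

```python
import re

# Assumed format, taken from the log lines in this report:
#   LustreError: <pid>:<n>:(<file>:<line>:<func>()) ASSERTION( <expr> ) failed:
ASSERT_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"ASSERTION\( (?P<expr>.*?) \) failed:"
)


def failed_assertions(dmesg_text):
    """Return (pid, file, line, func, expr) for each failed ASSERTION found.

    LBUG lines and call-trace frames are ignored; only ASSERTION records match.
    """
    return [
        (int(m["pid"]), m["file"], int(m["line"]), m["func"], m["expr"])
        for m in ASSERT_RE.finditer(dmesg_text)
    ]


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py path/to/vmcore-dmesg.txt
    with open(sys.argv[1], errors="replace") as fh:
        for record in failed_assertions(fh.read()):
            print(record)
```

Run against the two dumps, this should make it easy to see which thread hit which assertion (line 205 versus line 213 of ldlm_lock.c) without reading the full call traces.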
Peter,
We have never seen this under an earlier version of Lustre 2.12.x, but this is the first 2.12 we deployed widely.
We have never seen this under any 2.10.x version.
thanks