[LU-14780] LustreError: 4936:0:(file.c:4985:ll_layout_lock_set()) LBUG
Created: 23/Jun/21  Updated: 17/Sep/21  Resolved: 12/Jul/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug
Priority: Major
Reporter: Zhenyu Xu
Assignee: Zhenyu Xu
Resolution: Fixed
Votes: 0
Labels: LTS12

Issue Links:
Duplicate
duplicates LU-7538 file.c:3891:ll_layout_lock_set()) LBUG Resolved
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Client nodes hit an LBUG when running "ls -l". However, if I run "lfs getstripe" on the same target first, "ls -l" then completes without triggering the LBUG.
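The ordering described above (layout query first, stat second) can be sketched as a small helper. This is only an illustration of the reported observation, not a fix and not a guaranteed workaround. The helper name `lstat_after_getstripe` and its injectable `run`/`lstat` parameters are hypothetical and exist only so the ordering can be exercised without a Lustre mount.

```python
import os
import subprocess


def lstat_after_getstripe(path, run=subprocess.run, lstat=os.lstat):
    """Query the file layout with `lfs getstripe`, then lstat the path.

    Mirrors the order reported in the description. `run` and `lstat`
    are injectable (hypothetical parameters) so the sequence can be
    tested without Lustre installed.
    """
    # check=True: stop here if `lfs getstripe` fails, rather than
    # going on to the lstat call that was seen to trigger the LBUG.
    run(["lfs", "getstripe", path], check=True, stdout=subprocess.DEVNULL)
    return lstat(path)
```

On a client this would be called with just a path under the Lustre mount point. Whether it avoids the assertion on a given system is untested here.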

2021-06-03 10:50:33 [ 1584.601278] LustreError: 4936:0:(file.c:4985:ll_layout_lock_set()) ASSERTION( ldlm_has_layout(lock) ) failed: 
2021-06-03 10:50:33 [ 1584.606428] LustreError: 4938:0:(file.c:4985:ll_layout_lock_set()) ASSERTION( ldlm_has_layout(lock) ) failed: 
2021-06-03 10:50:33 [ 1584.612486] LustreError: 4936:0:(file.c:4985:ll_layout_lock_set()) LBUG
2021-06-03 10:50:33 [ 1584.612489] Pid: 4936, comm: dsync 4.18.0-240.15.1.el8.x86_64 #1 SMP Wed Mar 17 23:53:03 UTC 2021
2021-06-03 10:50:33 [ 1584.623732] LustreError: 4938:0:(file.c:4985:ll_layout_lock_set()) LBUG
2021-06-03 10:50:33 [ 1584.631129] Call Trace:
2021-06-03 10:50:33 [ 1584.631157]  libcfs_call_trace+0x86/0xc0 [libcfs]
2021-06-03 10:50:33 [ 1584.656942]  lbug_with_loc+0x43/0x80 [libcfs]
2021-06-03 10:50:33 [ 1584.661880]  ll_layout_lock_set+0xac/0x610 [lustre]
2021-06-03 10:50:33 [ 1584.665010] LustreError: 4935:0:(file.c:4985:ll_layout_lock_set()) ASSERTION( ldlm_has_layout(lock) ) failed: 
2021-06-03 10:50:33 [ 1584.667385]  ll_layout_refresh+0x1b7/0x310 [lustre]
2021-06-03 10:50:33 [ 1584.678675] LustreError: 4935:0:(file.c:4985:ll_layout_lock_set()) LBUG
2021-06-03 10:50:33 [ 1584.684169]  vvp_io_init+0x20c/0x340 [lustre]
2021-06-03 10:50:33 [ 1584.695274] LustreError: 4931:0:(file.c:4985:ll_layout_lock_set()) ASSERTION( ldlm_has_layout(lock) ) failed: 
2021-06-03 10:50:33 [ 1584.696524]  cl_io_init0.isra.16+0x83/0x130 [obdclass]
2021-06-03 10:50:33 [ 1584.707788] LustreError: 4931:0:(file.c:4985:ll_layout_lock_set()) LBUG
2021-06-03 10:50:33 [ 1584.713559]  cl_glimpse_size0+0x97/0x240 [lustre]
2021-06-03 10:50:33 [ 1584.726287]  ll_getattr+0x1d6/0x460 [lustre]
2021-06-03 10:50:33 [ 1584.726847] LustreError: 4939:0:(file.c:4985:ll_layout_lock_set()) ASSERTION( ldlm_has_layout(lock) ) failed: 
2021-06-03 10:50:33 [ 1584.731123]  vfs_statx+0x8a/0xd0
2021-06-03 10:50:33 [ 1584.731125]  __do_sys_newlstat+0x39/0x70
2021-06-03 10:50:33 [ 1584.731129]  do_syscall_64+0x5b/0x1a0
2021-06-03 10:50:33 [ 1584.731132]  entry_SYSCALL_64_after_hwframe+0x65/0xca
2021-06-03 10:50:33 [ 1584.731135]  0xffffffffffffffff
2021-06-03 10:50:33 [ 1584.731138] Kernel panic - not syncing: LBUG
2021-06-03 10:50:33 [ 1584.731139] Pid: 4938, comm: dsync 4.18.0-240.15.1.el8.x86_64 #1 SMP Wed Mar 17 23:53:03 UTC 2021
2021-06-03 10:50:33 [ 1584.731140] Call Trace:
2021-06-03 10:50:33 [ 1584.731170]  libcfs_call_trace+0x86/0xc0 [libcfs]
2021-06-03 10:50:33 [ 1584.731175]  lbug_with_loc+0x43/0x80 [libcfs]
2021-06-03 10:50:33 [ 1584.731201]  ll_layout_lock_set+0xac/0x610 [lustre]
2021-06-03 10:50:34 [ 1584.731209]  ll_layout_refresh+0x1b7/0x310 [lustre]
2021-06-03 10:50:34 [ 1584.731220]  vvp_io_init+0x20c/0x340 [lustre]
2021-06-03 10:50:34 [ 1584.731257]  cl_io_init0.isra.16+0x83/0x130 [obdclass]
2021-06-03 10:50:34 [ 1584.731267]  cl_glimpse_size0+0x97/0x240 [lustre]
2021-06-03 10:50:34 [ 1584.731275]  ll_getattr+0x1d6/0x460 [lustre]
2021-06-03 10:50:34 [ 1584.731280]  vfs_statx+0x8a/0xd0
2021-06-03 10:50:34 [ 1584.731281]  __do_sys_newlstat+0x39/0x70
2021-06-03 10:50:34 [ 1584.731285]  do_syscall_64+0x5b/0x1a0
2021-06-03 10:50:34 [ 1584.731288]  entry_SYSCALL_64_after_hwframe+0x65/0xca
2021-06-03 10:50:34 [ 1584.731290]  0xffffffffffffffff
2021-06-03 10:50:34 [ 1584.731292] Pid: 4935, comm: dsync 4.18.0-240.15.1.el8.x86_64 #1 SMP Wed Mar 17 23:53:03 UTC 2021
2021-06-03 10:50:34 [ 1584.731293] Call Trace:
2021-06-03 10:50:34 [ 1584.731314]  libcfs_call_trace+0x86/0xc0 [libcfs]
2021-06-03 10:50:34 [ 1584.731318]  lbug_with_loc+0x43/0x80 [libcfs]
2021-06-03 10:50:34 [ 1584.731338]  ll_layout_lock_set+0xac/0x610 [lustre]
2021-06-03 10:50:34 [ 1584.731347]  ll_layout_refresh+0x1b7/0x310 [lustre]
2021-06-03 10:50:34 [ 1584.731357]  vvp_io_init+0x20c/0x340 [lustre]
2021-06-03 10:50:34 [ 1584.731386]  cl_io_init0.isra.16+0x83/0x130 [obdclass]
2021-06-03 10:50:34 [ 1584.731396]  cl_glimpse_size0+0x97/0x240 [lustre]
2021-06-03 10:50:34 [ 1584.731405]  ll_getattr+0x1d6/0x460 [lustre]
2021-06-03 10:50:34 [ 1584.731408]  vfs_statx+0x8a/0xd0
2021-06-03 10:50:34 [ 1584.731410]  __do_sys_newlstat+0x39/0x70
2021-06-03 10:50:34 [ 1584.731413]  do_syscall_64+0x5b/0x1a0
2021-06-03 10:50:34 [ 1584.731415]  entry_SYSCALL_64_after_hwframe+0x65/0xca
2021-06-03 10:50:34 [ 1584.731417]  0xffffffffffffffff
2021-06-03 10:50:34 [ 1584.731418] Pid: 4931, comm: dsync 4.18.0-240.15.1.el8.x86_64 #1 SMP Wed Mar 17 23:53:03 UTC 2021
2021-06-03 10:50:34 [ 1584.731420] Call Trace:
2021-06-03 10:50:34 [ 1584.731434]  libcfs_call_trace+0x86/0xc0 [libcfs]
2021-06-03 10:50:34 [ 1584.731439]  lbug_with_loc+0x43/0x80 [libcfs]
2021-06-03 10:50:34 [ 1584.731451]  ll_layout_lock_set+0xac/0x610 [lustre]
2021-06-03 10:50:34 [ 1584.731462]  ll_layout_refresh+0x1b7/0x310 [lustre]
2021-06-03 10:50:34 [ 1584.731474]  vvp_io_init+0x20c/0x340 [lustre]
2021-06-03 10:50:34 [ 1584.731492]  cl_io_init0.isra.16+0x83/0x130 [obdclass]
2021-06-03 10:50:34 [ 1584.731503]  cl_glimpse_size0+0x97/0x240 [lustre]
2021-06-03 10:50:34 [ 1584.731513]  ll_getattr+0x1d6/0x460 [lustre]
2021-06-03 10:50:34 [ 1584.731516]  vfs_statx+0x8a/0xd0
2021-06-03 10:50:34 [ 1584.731518]  __do_sys_newlstat+0x39/0x70
2021-06-03 10:50:34 [ 1584.731520]  do_syscall_64+0x5b/0x1a0
2021-06-03 10:50:34 [ 1584.731522]  entry_SYSCALL_64_after_hwframe+0x65/0xca
2021-06-03 10:50:34 [ 1584.731523]  0xffffffffffffffff
2021-06-03 10:50:34 [ 1584.742387] LustreError: 4939:0:(file.c:4985:ll_layout_lock_set()) LBUG
2021-06-03 10:50:34 [ 1584.746013] CPU: 29 PID: 4936 Comm: dsync Tainted: G           OE    --------- -  - 4.18.0-240.15.1.el8.x86_64 #1
2021-06-03 10:50:34 [ 1584.750438] Pid: 4939, comm: dsync 4.18.0-240.15.1.el8.x86_64 #1 SMP Wed Mar 17 23:53:03 UTC 2021
2021-06-03 10:50:34 [ 1584.754554] Call Trace:
2021-06-03 10:50:34 [ 1584.760241] Call Trace:
2021-06-03 10:50:34 [ 1584.763786]  dump_stack+0x5c/0x80
2021-06-03 10:50:34 [ 1584.768589]  libcfs_call_trace+0x86/0xc0 [libcfs]
2021-06-03 10:50:34 [ 1584.778951]  panic+0xe7/0x2a9
2021-06-03 10:50:34 [ 1584.781733]  lbug_with_loc+0x43/0x80 [libcfs]
2021-06-03 10:50:34 [ 1584.787014]  lbug_with_loc.cold.3+0x18/0x18 [libcfs]
2021-06-03 10:50:34 [ 1584.791932]  ll_layout_lock_set+0xac/0x610 [lustre]
2021-06-03 10:50:34 [ 1584.797413]  ll_layout_lock_set+0xac/0x610 [lustre]
2021-06-03 10:50:34 [ 1584.802900]  ll_layout_refresh+0x1b7/0x310 [lustre]
2021-06-03 10:50:34 [ 1584.807797]  ll_layout_refresh+0x1b7/0x310 [lustre]
2021-06-03 10:50:34 [ 1584.813573]  vvp_io_init+0x20c/0x340 [lustre]
2021-06-03 10:50:34 [ 1584.818854]  vvp_io_init+0x20c/0x340 [lustre]
2021-06-03 10:50:34 [ 1584.823667]  cl_io_init0.isra.16+0x83/0x130 [obdclass]
2021-06-03 10:50:34 [ 1584.827821]  cl_io_init0.isra.16+0x83/0x130 [obdclass]
2021-06-03 10:50:34 [ 1584.832766]  cl_glimpse_size0+0x97/0x240 [lustre]
2021-06-03 10:50:34 [ 1584.837401]  cl_glimpse_size0+0x97/0x240 [lustre]
2021-06-03 10:50:34 [ 1584.843609]  ll_getattr+0x1d6/0x460 [lustre]
2021-06-03 10:50:34 [ 1584.847642]  ll_getattr+0x1d6/0x460 [lustre]
2021-06-03 10:50:34 [ 1584.858503]  vfs_statx+0x8a/0xd0
2021-06-03 10:50:34 [ 1584.861769]  vfs_statx+0x8a/0xd0
2021-06-03 10:50:34 [ 1584.867591]  __do_sys_newlstat+0x39/0x70
2021-06-03 10:50:34 [ 1584.872992]  __do_sys_newlstat+0x39/0x70
2021-06-03 10:50:34 [ 1584.878985]  do_syscall_64+0x5b/0x1a0
2021-06-03 10:50:34 [ 1584.884952]  do_syscall_64+0x5b/0x1a0
2021-06-03 10:50:34 [ 1584.890314]  entry_SYSCALL_64_after_hwframe+0x65/0xca
2021-06-03 10:50:34 [ 1584.896536]  entry_SYSCALL_64_after_hwframe+0x65/0xca
2021-06-03 10:50:34 [ 1584.902384]  0xffffffffffffffff
2021-06-03 10:50:34 [ 1584.907682] RIP: 0033:0x7f1686fc6d89
2021-06-03 10:50:34 [ 1585.205327] Code: 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 f3 0f 1e fa 48 89 f0 83 ff 01 77 34 48 89 c7 48 89 d6 b8 06 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 07 c3 66 0f 1f 44 00 00 48 8b 15 c9 00 2d 00
2021-06-03 10:50:34 [ 1585.226999] RSP: 002b:00007ffdf03e1458 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
2021-06-03 10:50:34 [ 1585.235853] RAX: ffffffffffffffda RBX: 00007ffdf03e2750 RCX: 00007f1686fc6d89
2021-06-03 10:50:34 [ 1585.244257] RDX: 00007ffdf03e15f0 RSI: 00007ffdf03e15f0 RDI: 00007ffdf03e1680
2021-06-03 10:50:34 [ 1585.252633] RBP: 00007ffdf03e1480 R08: 0000000000000000 R09: ffffff00000fffff
2021-06-03 10:50:34 [ 1585.260999] R10: 00262f4a53a31a46 R11: 0000000000000246 R12: 00000000023e72f0
2021-06-03 10:50:34 [ 1585.269375] R13: 00007f1687dda308 R14: 00007f1687dda380 R15: 00007f1687dda2e8
2021-06-03 10:50:35 [ 1586.310396] Shutting down cpus with NMI
2021-06-03 10:50:35 [ 1586.438268] Kernel Offset: 0x37a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
2021-06-03 10:50:35 [ 1586.499673] ---[ end Kernel panic - not syncing: LBUG ]---
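Because several dsync threads hit the assertion at nearly the same time, their messages and stack traces are interleaved in the console output above. The following is a minimal sketch for tallying which PIDs logged the ASSERTION and LBUG lines. The regular expression is an assumption based only on the line shapes visible in this log, and `summarize` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches console lines of the shape seen in this log, e.g.
#   LustreError: 4936:0:(file.c:4985:ll_layout_lock_set()) LBUG
LUSTRE_ERR = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)"
)


def summarize(log_lines):
    """Group LustreError messages by PID as (kind, function) tuples."""
    events = defaultdict(list)
    for line in log_lines:
        m = LUSTRE_ERR.search(line)
        if not m:
            continue  # stack frames, register dumps, panic banner, etc.
        msg = m.group("msg").strip()
        if msg == "LBUG":
            kind = "LBUG"
        elif msg.startswith("ASSERTION"):
            kind = "ASSERTION"
        else:
            kind = "OTHER"
        events[int(m.group("pid"))].append((kind, m.group("func")))
    return dict(events)


# Three lines copied from the log above, plus one stack frame.
sample = [
    "2021-06-03 10:50:33 [ 1584.601278] LustreError: 4936:0:(file.c:4985:ll_layout_lock_set()) ASSERTION( ldlm_has_layout(lock) ) failed: ",
    "2021-06-03 10:50:33 [ 1584.606428] LustreError: 4938:0:(file.c:4985:ll_layout_lock_set()) ASSERTION( ldlm_has_layout(lock) ) failed: ",
    "2021-06-03 10:50:33 [ 1584.612486] LustreError: 4936:0:(file.c:4985:ll_layout_lock_set()) LBUG",
    "2021-06-03 10:50:33 [ 1584.661880]  ll_layout_lock_set+0xac/0x610 [lustre]",
]

if __name__ == "__main__":
    for pid, evs in sorted(summarize(sample).items()):
        print(pid, evs)
```

Feeding the full console log to `summarize` gives a per-thread view of which PIDs hit the assertion in `ll_layout_lock_set()`.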

After one of our MDSs was rebooted several times (in connection with other Lustre issues), the problem is no longer reproducible, so it may have been induced by one of the MDTs being in a bad state.



 Comments   
Comment by Gerrit Updater [ 23/Jun/21 ]

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/44054
Subject: LU-14780 llite: failed ASSERTION(ldlm_has_layout(lock))
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: eac0f10c2adaddc61d73a13cae114f05afe4861c

Comment by Gerrit Updater [ 12/Jul/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/44054/
Subject: LU-14780 llite: failed ASSERTION(ldlm_has_layout(lock))
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1b166d6dd6a2f39dfe35b60be169b288665d0283

Comment by Peter Jones [ 12/Jul/21 ]

Landed for 2.15

Comment by Gerrit Updater [ 12/Jul/21 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/44214
Subject: LU-14780 llite: failed ASSERTION(ldlm_has_layout(lock))
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 864c86978d7a267046e08f3cee8d0bd1da74126a
