[LU-16468] some miscellaneous IOs need to protect accessing layout
Created: 13/Jan/23
Updated: 11/Dec/23
Resolved: 18/Nov/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug
Priority: Major
Reporter: Zhenyu Xu
Assignee: Zhenyu Xu
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-16926 Crash in lov_attr_get_composite() run... Resolved
Related
is related to LU-16926 Crash in lov_attr_get_composite() run... Resolved
is related to LU-17177 prevent DoM read-on-open with FLR Open
Severity: 3

Description
[ 4822.168266] LustreError: 24479:0:(file.c:249:ll_close_inode_openhandle()) Skipped 10 previous similar messages
[ 4968.682022] LustreError: 24474:0:(lov_cl_internal.h:399:lov_mirror_entry()) ASSERTION( i < lov->u.composite.lo_mirror_count ) failed: [0x2000059f1:0x8551:0x0] entry 2, mirror_count 2
[ 4968.686010] LustreError: 24474:0:(lov_cl_internal.h:399:lov_mirror_entry()) LBUG
[ 4968.687780] Pid: 24474, comm: lpurge 4.18.0-425.3.1.el8_lustre.x86_64 #1 SMP Wed Jan 4 21:00:25 UTC 2023
[ 4968.690555] Call Trace TBD:
[ 4968.691565] [<0>] libcfs_call_trace+0x6f/0xa0 [libcfs]
[ 4968.693025] [<0>] lbug_with_loc+0x3f/0x70 [libcfs]
[ 4968.694560] [<0>] lov_io_layout_at+0x1ab/0x1e0 [lov]
[ 4968.695786] [<0>] lov_page_init_composite+0xe6/0x5d0 [lov]
[ 4968.696852] [<0>] cl_page_alloc+0x1a8/0x650 [obdclass]
[ 4968.698057] [<0>] cl_page_find+0x188/0x230 [obdclass]
[ 4968.699329] [<0>] ll_dom_finish_open+0x4dd/0x920 [lustre]
[ 4968.700569] [<0>] ll_lookup_it_finish.constprop.30+0xf2b/0x1160 [lustre]
[ 4968.701950] [<0>] ll_lookup_it+0x6d1/0x16b0 [lustre]
[ 4968.703089] [<0>] ll_atomic_open+0x25a/0x1a60 [lustre]
[ 4968.704242] [<0>] path_openat+0xf0d/0x1500
[ 4968.705284] [<0>] do_filp_open+0x93/0x100
[ 4968.706235] [<0>] do_sys_openat2+0x211/0x2b0
[ 4968.707292] [<0>] do_sys_open+0x4b/0x80
[ 4968.708306] [<0>] do_syscall_64+0x5b/0x1b0
[ 4968.709377] [<0>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 4968.710705] Kernel panic - not syncing: LBUG
[ 4968.711793] CPU: 19 PID: 24474 Comm: lpurge Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-425.3.1.el8_lustre.x86_64 #1
[ 4968.730358] Hardware name: DDN SFA200NVX2E, BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 4968.732451] Call Trace:
[ 4968.733260]  dump_stack+0x41/0x60
[ 4968.734186]  panic+0xe7/0x2ac
[ 4968.735036]  ? entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 4968.736296]  lbug_with_loc.cold.8+0x18/0x18 [libcfs]
[ 4968.737517]  lov_io_layout_at+0x1ab/0x1e0 [lov]
[ 4968.738632]  lov_page_init_composite+0xe6/0x5d0 [lov]
[ 4968.739715]  cl_page_alloc+0x1a8/0x650 [obdclass]
[ 4968.740682]  ? ll_file_futimes_3+0x250/0x250 [lustre]
[ 4968.741663]  cl_page_find+0x188/0x230 [obdclass]
[ 4968.742752]  ll_dom_finish_open+0x4dd/0x920 [lustre]
[ 4968.743991]  ? __ldlm_handle2lock+0x100/0x3e0 [ptlrpc]
[ 4968.745269]  ? mdc_set_lock_data+0x12c/0x1f0 [mdc]
[ 4968.746250]  ll_lookup_it_finish.constprop.30+0xf2b/0x1160 [lustre]
[ 4968.747671]  ? ll_intent_lock+0x426/0x840 [lustre]
[ 4968.748812]  ll_lookup_it+0x6d1/0x16b0 [lustre]
[ 4968.749914]  ? __wake_up_common_lock+0x89/0xc0
[ 4968.750963]  ? kfree+0x22e/0x250
[ 4968.751791]  ? __req_capsule_get+0x556/0x770 [ptlrpc]
[ 4968.753013]  ? page_counter_cancel+0x1f/0x40
[ 4968.753870]  ? page_counter_uncharge+0x1d/0x40
[ 4968.754827]  ? page_counter_cancel+0x1f/0x40
[ 4968.755788]  ? page_counter_uncharge+0x1d/0x40
[ 4968.756726]  ? drain_stock.isra.51+0x5b/0x80
[ 4968.757574]  ? memcg_slab_post_alloc_hook+0x13a/0x1a0
[ 4968.758594]  ll_atomic_open+0x25a/0x1a60 [lustre]
[ 4968.759614]  ? d_alloc_parallel+0xaa/0x4b0
[ 4968.760463]  path_openat+0xf0d/0x1500
[ 4968.761256]  ? kfree+0x22e/0x250
[ 4968.761990]  do_filp_open+0x93/0x100
[ 4968.762784]  ? list_lru_add+0xd4/0x140
[ 4968.763594]  ? getname_flags+0x4a/0x1e0
[ 4968.764409]  ? __check_object_size+0xac/0x173
[ 4968.765383]  ? __alloc_fd+0x44/0x150
[ 4968.766151]  do_sys_openat2+0x211/0x2b0
[ 4968.767022]  do_sys_open+0x4b/0x80
[ 4968.767739]  do_syscall_64+0x5b/0x1b0
[ 4968.768525]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 4968.769437] RIP: 0033:0x7f15ebb00a84
[ 4968.770039] Code: 89 7c 24 18 44 89 54 24 0c e8 58 37 f0 ff 44 8b 54 24 0c 8b 54 24 1c 41 89 c0 48 8b 74 24 10 8b 7c 24 18 b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 32 44 89 c7 89 44 24 0c e8 88 37 f0 ff 8b 44
[ 4968.773294] RSP: 002b:00007f15eb149da0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[ 4968.774682] RAX: ffffffffffffffda RBX: 00007f15c5a0bf80 RCX: 00007f15ebb00a84
[ 4968.776169] RDX: 0000000000000002 RSI: 00007f15eb149e70 RDI: 0000000000000005
[ 4968.777516] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f15eb149b87
[ 4968.779007] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f15eb15b650
[ 4968.780482] R13: 00007ffd0895450f R14: 0000000000000000 R15: 0000000000000001
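
Note on the assertion: the failed check is i < lov->u.composite.lo_mirror_count, reported with entry 2 and mirror_count 2, so the index was one past the last valid mirror (valid indices would be 0 and 1). The ticket title and patch subject suggest the layout has to be protected while the IO uses it. The sketch below is only a toy model of that idea, assuming the index was derived from a layout that changed before it was used, which the ticket does not spell out. Every name in it (toy_layout, toy_index, toy_pick_last_mirror, toy_layout_shrink, toy_use_index) is hypothetical and is not Lustre code or the landed patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a composite layout; not the Lustre structure. */
struct toy_layout {
	unsigned int gen;          /* bumped on every layout change */
	unsigned int mirror_count; /* number of valid mirror entries */
};

/* A mirror index remembered together with the generation it came from. */
struct toy_index {
	unsigned int gen;
	unsigned int mirror;
};

/* Same shape as the failed assertion: i < mirror_count. */
bool toy_mirror_index_valid(const struct toy_layout *lo, unsigned int i)
{
	return i < lo->mirror_count;
}

/* Pick the last mirror of the layout as it looks right now.
 * Assumes mirror_count is at least 1. */
struct toy_index toy_pick_last_mirror(const struct toy_layout *lo)
{
	struct toy_index idx = { .gen = lo->gen, .mirror = lo->mirror_count - 1 };

	return idx;
}

/* Simulate a concurrent layout change that drops one mirror. */
void toy_layout_shrink(struct toy_layout *lo)
{
	if (lo->mirror_count > 0)
		lo->mirror_count--;
	lo->gen++;
}

/*
 * Return true only when the index belongs to the current layout generation
 * and is in range. A caller that gets false would have to re-read the
 * layout and retry instead of dereferencing a stale entry.
 */
bool toy_use_index(const struct toy_layout *lo, const struct toy_index *idx)
{
	if (idx->gen != lo->gen)
		return false;
	return toy_mirror_index_valid(lo, idx->mirror);
}
```

With a layout of 3 mirrors, toy_pick_last_mirror() yields index 2. After toy_layout_shrink() the layout has 2 mirrors, so the remembered index 2 fails the bounds check, the same numbers as in the LBUG message. toy_use_index() rejects it instead of asserting. A generation check is just one way to model this; holding a lock across the whole access would be another.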


Comments
Comment by Gerrit Updater [ 13/Jan/23 ]

"Zhenyu Xu <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49622
Subject: LU-16468 llite: protect layout before read IO going
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 387d1898c4b5541ae6013c2c8d25eab66812278e

Comment by Gerrit Updater [ 18/Nov/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49622/
Subject: LU-16468 llite: protect layout before read IO going
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e050b91c6c471d3576eba3bbf4f3c31aef644e3f

Comment by Peter Jones [ 18/Nov/23 ]

Landed for 2.16
