Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Fix Version/s: Lustre 2.16.0
Description
There's an intermittent crash in sanity-flr (test_200b) on master that looks like this:
[26047.097521] Lustre: DEBUG MARKER: == sanity-flr test 200b: racing IO, mirror extend and resync ========================================================== 09:51:50 (1693475510)
[26047.201126] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre2
[26047.214264] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-23vm8@tcp:/lustre /mnt/lustre2
[26047.252017] Lustre: Mounted lustre-client
[26047.252911] Lustre: Skipped 1 previous similar message
[26047.286711] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre3
[26047.298657] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-23vm8@tcp:/lustre /mnt/lustre3
[26053.066268] LustreError: 11-0: lustre-OST0003-osc-ffff9e711d5ac000: operation ost_fallocate to node 10.240.38.66@tcp failed: rc = -524
[26098.433176] Lustre: *** cfs_fail_loc=1423, val=0***
[26098.437521] LustreError: 329308:0:(vvp_page.c:119:vvp_vmpage_error()) LBUG
[26098.438976] Pid: 329308, comm: ptlrpcd_00_01 4.18.0-477.15.1.el8_8.x86_64 #1 SMP Fri Jun 2 08:27:19 EDT 2023
[26098.440832] Call Trace TBD:
[26098.441616] [<0>] libcfs_call_trace+0x6f/0xa0 [libcfs]
[26098.442707] [<0>] lbug_with_loc+0x3f/0x70 [libcfs]
[26098.443675] [<0>] vvp_page_completion_write+0x2f7/0x400 [lustre]
[26098.445022] [<0>] cl_page_completion+0x170/0x430 [obdclass]
[26098.446365] [<0>] osc_ap_completion.isra.34+0x138/0x3e0 [osc]
[26098.447567] [<0>] osc_extent_finish+0x203/0x9f0 [osc]
[26098.448577] [<0>] brw_interpret+0x1c3/0xdb0 [osc]
[26098.449527] [<0>] ptlrpc_check_set+0x53a/0x1e70 [ptlrpc]
[26098.450855] [<0>] ptlrpcd+0x856/0xa70 [ptlrpc]
[26098.451782] [<0>] kthread+0x134/0x150
[26098.452577] [<0>] ret_from_fork+0x35/0x40
[26098.453420] Kernel panic - not syncing: LBUG
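For anyone trying to hit this again, a minimal sketch of looping the single subtest with the standard lustre/tests harness is below; the tests directory path and the 50-iteration count are assumptions (not from this report), and per the console log test_200b triggers cfs_fail_loc=0x1423 on its own, so no extra fault-injection setup should be needed:

# run from the lustre/tests directory of the build under test (path assumed)
cd lustre/tests
for i in $(seq 1 50); do                     # iteration count is arbitrary
    ONLY=200b bash sanity-flr.sh || break    # stop at the first failure/LBUG
done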
Latest occurrence: https://testing.whamcloud.com/test_sets/1e9bca24-c969-4c92-a668-84678334a22a
First occurrence in Maloo on May 24: https://testing.whamcloud.com/test_sets/0c20e209-9122-4091-8244-e0c7b6ac8b2a
The first ever occurrence was in Janitor, in special testing of the patch that also added this test: https://review.whamcloud.com/c/fs/lustre-release/+/46413
That patch landed in April 2023, so it is likely the culprit here.