[LU-17070] vvp_vmpage_error()) LBUG Created: 31/Aug/23 Updated: 01/Sep/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.16.0 |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Oleg Drokin | Assignee: | Zhenyu Xu |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
There's a periodic crash in sanity-flr in master that looks like this:

[26047.097521] Lustre: DEBUG MARKER: == sanity-flr test 200b: racing IO, mirror extend and resync ========================================================== 09:51:50 (1693475510)
[26047.201126] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre2
[26047.214264] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-23vm8@tcp:/lustre /mnt/lustre2
[26047.252017] Lustre: Mounted lustre-client
[26047.252911] Lustre: Skipped 1 previous similar message
[26047.286711] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre3
[26047.298657] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-23vm8@tcp:/lustre /mnt/lustre3
[26053.066268] LustreError: 11-0: lustre-OST0003-osc-ffff9e711d5ac000: operation ost_fallocate to node 10.240.38.66@tcp failed: rc = -524
[26098.433176] Lustre: *** cfs_fail_loc=1423, val=0***
[26098.437521] LustreError: 329308:0:(vvp_page.c:119:vvp_vmpage_error()) LBUG
[26098.438976] Pid: 329308, comm: ptlrpcd_00_01 4.18.0-477.15.1.el8_8.x86_64 #1 SMP Fri Jun 2 08:27:19 EDT 2023
[26098.440832] Call Trace TBD:
[26098.441616] [<0>] libcfs_call_trace+0x6f/0xa0 [libcfs]
[26098.442707] [<0>] lbug_with_loc+0x3f/0x70 [libcfs]
[26098.443675] [<0>] vvp_page_completion_write+0x2f7/0x400 [lustre]
[26098.445022] [<0>] cl_page_completion+0x170/0x430 [obdclass]
[26098.446365] [<0>] osc_ap_completion.isra.34+0x138/0x3e0 [osc]
[26098.447567] [<0>] osc_extent_finish+0x203/0x9f0 [osc]
[26098.448577] [<0>] brw_interpret+0x1c3/0xdb0 [osc]
[26098.449527] [<0>] ptlrpc_check_set+0x53a/0x1e70 [ptlrpc]
[26098.450855] [<0>] ptlrpcd+0x856/0xa70 [ptlrpc]
[26098.451782] [<0>] kthread+0x134/0x150
[26098.452577] [<0>] ret_from_fork+0x35/0x40
[26098.453420] Kernel panic - not syncing: LBUG

Latest occurrence: https://testing.whamcloud.com/test_sets/1e9bca24-c969-4c92-a668-84678334a22a

First occurrence in Maloo on May 24: https://testing.whamcloud.com/test_sets/0c20e209-9122-4091-8244-e0c7b6ac8b2a

The first ever occurrence happened to be in janitor, during special testing of the very patch that added this test: https://review.whamcloud.com/c/fs/lustre-release/+/46413

The patch landed in April 2023, so it's likely the culprit here. |
| Comments |
| Comment by Andreas Dilger [ 01/Sep/23 ] |
|
Bobijam, could you please take a look? |