[LU-14919] Servers crash at umount when fake I/O is enabled
Created: 09/Aug/21  Updated: 12/Aug/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Shuichi Ihara
Assignee: Patrick Farrell
Resolution: Unresolved
Votes: 0
Labels: None
Environment: master
Attachments: vmcore-dmesg.txt (text file)
Issue Links: Related: is related to LU-14935 "Improve fake i/o performance" (Resolved)
Severity: 3

 Description   

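Lustre servers always crash when an OST or MDT is unmounted while fake I/O is enabled (lctl set_param fail_loc=0x238). In the log below, right after the umount of ai400x-OST0001 completes, several ll_ost_io threads fail ASSERTION( PageLocked(page) ) in osd_key_fini() and the node panics with an LBUG.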

[ 3501.686893] Lustre: Skipped 160413797 previous similar messages
[ 3574.947325] Lustre: *** cfs_fail_loc=238, val=0***
[ 3574.947329] Lustre: Skipped 1158454 previous similar messages
[73578.576748] Lustre: Failing over ai400x-OST0001
[73578.580965] Lustre: server umount ai400x-OST0001 complete
[73578.763701] LustreError: 12767:0:(osd_handler.c:7614:osd_key_fini()) ASSERTION( PageLocked(page) ) failed: 
[73578.763703] LustreError: 2209:0:(osd_handler.c:7614:osd_key_fini()) ASSERTION( PageLocked(page) ) failed: 
[73578.763705] LustreError: 2214:0:(osd_handler.c:7614:osd_key_fini()) ASSERTION( PageLocked(page) ) failed: 
[73578.763707] LustreError: 12768:0:(osd_handler.c:7614:osd_key_fini()) ASSERTION( PageLocked(page) ) failed: 
[73578.763708] LustreError: 2209:0:(osd_handler.c:7614:osd_key_fini()) LBUG
[73578.763709] LustreError: 2214:0:(osd_handler.c:7614:osd_key_fini()) LBUG
[73578.763712] LustreError: 12768:0:(osd_handler.c:7614:osd_key_fini()) LBUG
[73578.763714] Pid: 2209, comm: ll_ost_io01_015 3.10.0-1160.31.1.el7_lustre.ddn15.x86_64 #1 SMP Sat Jun 19 00:02:22 PDT 2021
[73578.763714] Call Trace:
[73578.763782] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[73578.763789] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[73578.763830] [<0>] osd_key_fini+0x847/0x980 [osd_ldiskfs]
[73578.763991] [<0>] key_fini+0x53/0x170 [obdclass]
[73578.764031] [<0>] lu_context_fini+0x4d/0x260 [obdclass]
[73578.764266] [<0>] ptlrpc_main+0x7a0/0x14d0 [ptlrpc]
[73578.764280] [<0>] kthread+0xd1/0xe0
[73578.764284] [<0>] ret_from_fork_nospec_begin+0x7/0x21
[73578.764422] [<0>] 0xfffffffffffffffe
[73578.764424] Pid: 2214, comm: ll_ost_io01_017 3.10.0-1160.31.1.el7_lustre.ddn15.x86_64 #1 SMP Sat Jun 19 00:02:22 PDT 2021
[73578.764425] Kernel panic - not syncing: LBUG
[73578.764426] Call Trace:
[73578.764428] CPU: 5 PID: 2209 Comm: ll_ost_io01_015 Kdump: loaded Tainted: G           OE  ------------ T 3.10.0-1160.31.1.el7_lustre.ddn15.x86_64 #1
[73578.764429] Hardware name: DDN SFA400NVXE, BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[73578.764430] Call Trace:
[73578.764434]  [<ffffffff8f5835a9>] dump_stack+0x19/0x1b
[73578.764443] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[73578.764446]  [<ffffffff8f57d2b1>] panic+0xe8/0x21f
[73578.764453] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[73578.764461]  [<ffffffffc0b392cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[73578.764474] [<0>] osd_key_fini+0x847/0x980 [osd_ldiskfs]
[73578.764486]  [<ffffffffc14f6a67>] osd_key_fini+0x847/0x980 [osd_ldiskfs]
[73578.764518] [<0>] key_fini+0x53/0x170 [obdclass]
[73578.764549]  [<ffffffffc0c92613>] key_fini+0x53/0x170 [obdclass]
[73578.764577] [<0>] lu_context_fini+0x4d/0x260 [obdclass]
[73578.764608]  [<ffffffffc0c93fbd>] lu_context_fini+0x4d/0x260 [obdclass]
[73578.764659] [<0>] ptlrpc_main+0x7a0/0x14d0 [ptlrpc]
[73578.764710]  [<ffffffffc0f97f30>] ptlrpc_main+0x7a0/0x14d0 [ptlrpc]
[73578.764713] [<0>] kthread+0xd1/0xe0
[73578.764716]  [<ffffffff8eed4a8e>] ? finish_task_switch+0x4e/0x1c0
[73578.764718] [<0>] ret_from_fork_nospec_begin+0x7/0x21
[73578.764769]  [<ffffffffc0f97790>] ? ptlrpc_wait_event+0x5c0/0x5c0 [ptlrpc]
[73578.764784] [<0>] 0xfffffffffffffffe
[73578.764786]  [<ffffffff8eec5e31>] kthread+0xd1/0xe0
[73578.764787] Pid: 12768, comm: ll_ost_io01_001 3.10.0-1160.31.1.el7_lustre.ddn15.x86_64 #1 SMP Sat Jun 19 00:02:22 PDT 2021
[73578.764789]  [<ffffffff8eec5d60>] ? insert_kthread_work+0x40/0x40
[73578.764790] Call Trace:
[73578.764792]  [<ffffffff8f595ddd>] ret_from_fork_nospec_begin+0x7/0x21
[73578.764794]  [<ffffffff8eec5d60>] ? insert_kthread_work+0x40/0x40
[73578.764801] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[73578.764807] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[73578.764816] [<0>] osd_key_fini+0x847/0x980 [osd_ldiskfs]
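
Because several ll_ost_io threads trip the same assertion at nearly the same time, their messages and stack traces are interleaved in the console log above. The following is a minimal Python sketch, not part of Lustre or of any patch for this ticket, for summarizing which PIDs hit which assertion in a saved dmesg file such as the attached vmcore-dmesg.txt. The function name summarize_assertions and the regular expression are assumptions based only on the line format visible in this ticket.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the log in this ticket:
# [ts] LustreError: PID:N:(file.c:LINE:func()) ASSERTION( expr ) failed:
ASSERT_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"ASSERTION\( (?P<expr>.+?) \) failed"
)


def summarize_assertions(dmesg_text):
    """Map (file, line, function, expression) to a sorted list of PIDs.

    Hypothetical helper for reading this kind of log; lines that do not
    match the assumed format are ignored.
    """
    hits = defaultdict(set)
    for line in dmesg_text.splitlines():
        m = ASSERT_RE.search(line)
        if m:
            key = (m["file"], int(m["line"]), m["func"], m["expr"])
            hits[key].add(int(m["pid"]))
    return {key: sorted(pids) for key, pids in hits.items()}


# The four assertion lines copied from the log in this ticket.
SAMPLE = """\
[73578.763701] LustreError: 12767:0:(osd_handler.c:7614:osd_key_fini()) ASSERTION( PageLocked(page) ) failed: 
[73578.763703] LustreError: 2209:0:(osd_handler.c:7614:osd_key_fini()) ASSERTION( PageLocked(page) ) failed: 
[73578.763705] LustreError: 2214:0:(osd_handler.c:7614:osd_key_fini()) ASSERTION( PageLocked(page) ) failed: 
[73578.763707] LustreError: 12768:0:(osd_handler.c:7614:osd_key_fini()) ASSERTION( PageLocked(page) ) failed: 
"""

if __name__ == "__main__":
    for (src, lineno, func, expr), pids in summarize_assertions(SAMPLE).items():
        print(f"{src}:{lineno} {func}(): ASSERTION( {expr} ) failed in PIDs {pids}")
```

To run it against a full log, the file contents can be read into a string and passed to summarize_assertions() in place of SAMPLE.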


 Comments   
Comment by Gerrit Updater [ 12/Aug/21 ]

"Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44653
Subject: LU-14919 osd-ldiskfs: Fix fake i/o page unlocking
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 451c0470b215b127e4dbf6a51a187876a3711ada

Generated at Sat Feb 10 03:13:53 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.