[LU-16139] statahead: avoid new RPC and long wait in the statahead interpret callback Created: 07/Sep/22  Updated: 25/Oct/22  Resolved: 25/Oct/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Qian Yingjin Assignee: Qian Yingjin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-14139 batched statahead processing Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

replay-dual/26 timed out with the following stack trace:

[Tue Sep  6 05:21:32 2022] task:ptlrpcd_00_01   state:I stack:    0 pid: 8026 ppid:     2 flags:0x80004080
[Tue Sep  6 05:21:32 2022] Call Trace:
[Tue Sep  6 05:21:32 2022]  __schedule+0x2bd/0x760
[Tue Sep  6 05:21:32 2022]  schedule+0x37/0xa0
[Tue Sep  6 05:21:32 2022]  osc_extent_wait+0x44d/0x560 [osc]
[Tue Sep  6 05:21:32 2022]  ? finish_wait+0x80/0x80
[Tue Sep  6 05:21:32 2022]  osc_cache_wait_range+0x2b8/0x930 [osc]
[Tue Sep  6 05:21:32 2022]  osc_io_fsync_end+0x67/0x80 [osc]
[Tue Sep  6 05:21:32 2022]  cl_io_end+0x58/0x130 [obdclass]
[Tue Sep  6 05:21:32 2022]  lov_io_end_wrapper+0xcf/0xe0 [lov]
[Tue Sep  6 05:21:32 2022]  lov_io_fsync_end+0x6f/0x1c0 [lov]
[Tue Sep  6 05:21:32 2022]  cl_io_end+0x58/0x130 [obdclass]
[Tue Sep  6 05:21:32 2022]  cl_io_loop+0xa7/0x200 [obdclass]
[Tue Sep  6 05:21:32 2022]  cl_sync_file_range+0x2c9/0x340 [lustre]
[Tue Sep  6 05:21:32 2022]  vvp_prune+0x5d/0x1e0 [lustre]
[Tue Sep  6 05:21:32 2022]  cl_object_prune+0x58/0x130 [obdclass]
[Tue Sep  6 05:21:32 2022]  lov_layout_change.isra.47+0x1ba/0x640 [lov]
[Tue Sep  6 05:21:32 2022]  lov_conf_set+0x38d/0x4e0 [lov]
[Tue Sep  6 05:21:32 2022]  cl_conf_set+0x60/0x140 [obdclass]
[Tue Sep  6 05:21:32 2022]  cl_file_inode_init+0xc8/0x380 [lustre]
[Tue Sep  6 05:21:32 2022]  ll_update_inode+0x432/0x6e0 [lustre]
[Tue Sep  6 05:21:32 2022]  ll_iget+0x227/0x320 [lustre]
[Tue Sep  6 05:21:32 2022]  ll_prep_inode+0x344/0xb60 [lustre]
[Tue Sep  6 05:21:32 2022]  ll_statahead_interpret_common.isra.26+0x69/0x830 [lustre]
[Tue Sep  6 05:21:32 2022]  ll_statahead_interpret+0x2c8/0x5b0 [lustre]
[Tue Sep  6 05:21:32 2022]  mdc_intent_getattr_async_interpret+0x14a/0x3e0 [mdc]
[Tue Sep  6 05:21:32 2022]  ptlrpc_check_set+0x5b8/0x1fe0 [ptlrpc]
[Tue Sep  6 05:21:32 2022]  ptlrpcd+0x6c6/0xa50 [ptlrpc]
[Tue Sep  6 05:21:32 2022]  ? do_wait_intr_irq+0xb0/0xb0
[Tue Sep  6 05:21:32 2022]  ? ptlrpcd_add_req+0x2f0/0x2f0 [ptlrpc]
[Tue Sep  6 05:21:32 2022]  kthread+0x116/0x130
[Tue Sep  6 05:21:32 2022]  ? kthread_flush_work_fn+0x10/0x10
[Tue Sep  6 05:21:32 2022]  ret_from_fork+0x35/0x40

The cause is that the statahead interpret callback, via ll_prep_inode(), waits for a file range sync (cl_sync_file_range() down to osc_extent_wait()) during the layout change of a regular file. Blocking the ptlrpcd interpret callback context for a long time is dangerous.

The solution is to use a work queue so that the ll_prep_inode() call is done in a separate thread.
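
The deferral pattern described above can be illustrated with a minimal user-space sketch. It is not the code from the patch: it uses pthreads in place of the kernel work queue API, and every name in it (sa_work, sa_queue, sa_prep_entry, sa_interpret, sa_demo) is hypothetical. The point it shows is that the callback only enqueues an item and returns, while the potentially blocking step runs on a separate worker thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical work item: one statahead reply whose inode is not yet prepared. */
struct sa_work {
	struct sa_work *next;
	int entry_id;
};

/* Hypothetical single-worker queue, a user-space stand-in for a kernel work queue. */
struct sa_queue {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct sa_work *head;
	struct sa_work *tail;
	bool stopping;
	int prepared;	/* entries fully processed by the worker */
};

/* Stand-in for the potentially blocking step (ll_prep_inode() in the trace). */
static void sa_prep_entry(struct sa_work *work)
{
	(void)work;	/* a real implementation could sleep here for a long time */
}

/* Worker thread: drains the queue and does the blocking work outside the lock. */
static void *sa_worker(void *arg)
{
	struct sa_queue *q = arg;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		while (q->head == NULL && !q->stopping)
			pthread_cond_wait(&q->cond, &q->lock);
		if (q->head == NULL)
			break;	/* stopping was requested and the queue is drained */

		struct sa_work *work = q->head;
		q->head = work->next;
		if (q->head == NULL)
			q->tail = NULL;

		pthread_mutex_unlock(&q->lock);
		sa_prep_entry(work);	/* may block; the callback thread is unaffected */
		free(work);
		pthread_mutex_lock(&q->lock);
		q->prepared++;
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

/* Models the interpret callback: it only enqueues and returns immediately. */
static int sa_interpret(struct sa_queue *q, int entry_id)
{
	struct sa_work *work = malloc(sizeof(*work));

	if (work == NULL)
		return -1;
	work->next = NULL;
	work->entry_id = entry_id;

	pthread_mutex_lock(&q->lock);
	if (q->tail != NULL)
		q->tail->next = work;
	else
		q->head = work;
	q->tail = work;
	pthread_cond_signal(&q->cond);
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/* Queues nr_entries items, stops the worker, and returns how many were prepared. */
int sa_demo(int nr_entries)
{
	struct sa_queue q = { 0 };
	pthread_t worker;

	pthread_mutex_init(&q.lock, NULL);
	pthread_cond_init(&q.cond, NULL);
	if (pthread_create(&worker, NULL, sa_worker, &q) != 0)
		return -1;

	for (int i = 0; i < nr_entries; i++)
		if (sa_interpret(&q, i) != 0)
			break;

	pthread_mutex_lock(&q.lock);
	q.stopping = true;
	pthread_cond_signal(&q.cond);
	pthread_mutex_unlock(&q.lock);

	pthread_join(worker, NULL);
	assert(q.head == NULL);	/* the worker drains everything before exiting */
	pthread_mutex_destroy(&q.lock);
	pthread_cond_destroy(&q.cond);
	return q.prepared;
}
```

Calling sa_demo(3) from a main() queues three items and returns the number the worker processed. In the kernel the same shape would presumably use the work queue API rather than a hand-rolled queue, which is an assumption here and not something this ticket spells out.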



 Comments   
Comment by Qian Yingjin [ 07/Sep/22 ]

https://review.whamcloud.com/#/c/fs/lustre-release/+/48451 LU-16139 statahead: avoid to block ptlrpcd interpret context

Comment by Gerrit Updater [ 25/Oct/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48451/
Subject: LU-16139 statahead: avoid to block ptlrpcd interpret context
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2e0897439014338553a51fae338fb2c1b655f067

Comment by Peter Jones [ 25/Oct/22 ]

Landed for 2.16
