Lustre / LU-16139

statahead: avoid new RPC and long wait in the statahead interpret callback

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0

    Description

      Test 26 of replay-dual timed out with the following stack trace:

      [Tue Sep  6 05:21:32 2022] task:ptlrpcd_00_01   state:I stack:    0 pid: 8026 ppid:     2 flags:0x80004080
      [Tue Sep  6 05:21:32 2022] Call Trace:
      [Tue Sep  6 05:21:32 2022]  __schedule+0x2bd/0x760
      [Tue Sep  6 05:21:32 2022]  schedule+0x37/0xa0
      [Tue Sep  6 05:21:32 2022]  osc_extent_wait+0x44d/0x560 [osc]
      [Tue Sep  6 05:21:32 2022]  ? finish_wait+0x80/0x80
      [Tue Sep  6 05:21:32 2022]  osc_cache_wait_range+0x2b8/0x930 [osc]
      [Tue Sep  6 05:21:32 2022]  osc_io_fsync_end+0x67/0x80 [osc]
      [Tue Sep  6 05:21:32 2022]  cl_io_end+0x58/0x130 [obdclass]
      [Tue Sep  6 05:21:32 2022]  lov_io_end_wrapper+0xcf/0xe0 [lov]
      [Tue Sep  6 05:21:32 2022]  lov_io_fsync_end+0x6f/0x1c0 [lov]
      [Tue Sep  6 05:21:32 2022]  cl_io_end+0x58/0x130 [obdclass]
      [Tue Sep  6 05:21:32 2022]  cl_io_loop+0xa7/0x200 [obdclass]
      [Tue Sep  6 05:21:32 2022]  cl_sync_file_range+0x2c9/0x340 [lustre]
      [Tue Sep  6 05:21:32 2022]  vvp_prune+0x5d/0x1e0 [lustre]
      [Tue Sep  6 05:21:32 2022]  cl_object_prune+0x58/0x130 [obdclass]
      [Tue Sep  6 05:21:32 2022]  lov_layout_change.isra.47+0x1ba/0x640 [lov]
      [Tue Sep  6 05:21:32 2022]  lov_conf_set+0x38d/0x4e0 [lov]
      [Tue Sep  6 05:21:32 2022]  cl_conf_set+0x60/0x140 [obdclass]
      [Tue Sep  6 05:21:32 2022]  cl_file_inode_init+0xc8/0x380 [lustre]
      [Tue Sep  6 05:21:32 2022]  ll_update_inode+0x432/0x6e0 [lustre]
      [Tue Sep  6 05:21:32 2022]  ll_iget+0x227/0x320 [lustre]
      [Tue Sep  6 05:21:32 2022]  ll_prep_inode+0x344/0xb60 [lustre]
      [Tue Sep  6 05:21:32 2022]  ll_statahead_interpret_common.isra.26+0x69/0x830 [lustre]
      [Tue Sep  6 05:21:32 2022]  ll_statahead_interpret+0x2c8/0x5b0 [lustre]
      [Tue Sep  6 05:21:32 2022]  mdc_intent_getattr_async_interpret+0x14a/0x3e0 [mdc]
      [Tue Sep  6 05:21:32 2022]  ptlrpc_check_set+0x5b8/0x1fe0 [ptlrpc]
      [Tue Sep  6 05:21:32 2022]  ptlrpcd+0x6c6/0xa50 [ptlrpc]
      [Tue Sep  6 05:21:32 2022]  ? do_wait_intr_irq+0xb0/0xb0
      [Tue Sep  6 05:21:32 2022]  ? ptlrpcd_add_req+0x2f0/0x2f0 [ptlrpc]
      [Tue Sep  6 05:21:32 2022]  kthread+0x116/0x130
      [Tue Sep  6 05:21:32 2022]  ? kthread_flush_work_fn+0x10/0x10
      [Tue Sep  6 05:21:32 2022]  ret_from_fork+0x35/0x40
      

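      The cause is that the layout change for a regular file waits for a file range sync (cl_sync_file_range, reached through vvp_prune in the trace above). This wait happens inside the ptlrpcd interpret callback, and blocking that context for a long time is dangerous because the ptlrpcd thread cannot make progress on anything else while it waits.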

      The solution is to use a work queue so that the ll_prep_inode() call runs in a separate thread instead of in the interpret callback.
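
The deferral pattern described above can be sketched in user space. This is not the Lustre patch: it is a minimal pthreads analogy, and every name in it (struct work_item, struct work_queue, wq_start, interpret_callback, slow_prepare, wq_stop) is hypothetical. In the kernel the same idea would presumably be built on the Linux workqueue API rather than on a hand-rolled queue. The point it illustrates is that the callback only enqueues an item and returns, while the potentially long wait happens in a separate worker thread.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical work item: stands in for whatever the interpret callback
 * would hand over to the worker. Not a Lustre structure. */
struct work_item {
	int entry_id;
	struct work_item *next;
};

/* Hypothetical single-worker queue protected by a mutex and condvar. */
struct work_queue {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct work_item *head, *tail;
	bool stopping;
	int processed;
	pthread_t worker;
};

/* Stand-in for the step that may block for a long time
 * (the role ll_prep_inode() plays in the report). */
static void slow_prepare(struct work_item *item)
{
	(void)item; /* a real implementation could sleep or wait on I/O here */
}

/* Worker thread: drains the queue and runs the slow step outside the lock. */
static void *worker_main(void *arg)
{
	struct work_queue *wq = arg;

	pthread_mutex_lock(&wq->lock);
	for (;;) {
		while (!wq->head && !wq->stopping)
			pthread_cond_wait(&wq->cond, &wq->lock);
		if (!wq->head)
			break; /* stopping and fully drained */

		struct work_item *item = wq->head;
		wq->head = item->next;
		if (!wq->head)
			wq->tail = NULL;

		pthread_mutex_unlock(&wq->lock);
		slow_prepare(item); /* long wait is safe in this thread */
		free(item);
		pthread_mutex_lock(&wq->lock);
		wq->processed++;
	}
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}

/* Initialise the queue and start its worker; returns 0 on success. */
static int wq_start(struct work_queue *wq)
{
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->cond, NULL);
	wq->head = wq->tail = NULL;
	wq->stopping = false;
	wq->processed = 0;
	return pthread_create(&wq->worker, NULL, worker_main, wq);
}

/* Plays the role of the interpret callback: it only enqueues and returns,
 * so the calling thread never blocks on the slow step. */
static int interpret_callback(struct work_queue *wq, int entry_id)
{
	struct work_item *item = malloc(sizeof(*item));

	if (!item)
		return -1;
	item->entry_id = entry_id;
	item->next = NULL;

	pthread_mutex_lock(&wq->lock);
	if (wq->tail)
		wq->tail->next = item;
	else
		wq->head = item;
	wq->tail = item;
	pthread_cond_signal(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
	return 0;
}

/* Ask the worker to finish the remaining items, wait for it, and
 * return how many items were processed. */
static int wq_stop(struct work_queue *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->stopping = true;
	pthread_cond_broadcast(&wq->cond);
	pthread_mutex_unlock(&wq->lock);

	pthread_join(wq->worker, NULL);
	pthread_mutex_destroy(&wq->lock);
	pthread_cond_destroy(&wq->cond);
	return wq->processed;
}
```

A caller would run wq_start() once, call interpret_callback() from the completion path for each entry, and call wq_stop() at teardown. The trade-off in this sketch is that results become available asynchronously, so anything that consumes them has to tolerate the item not being processed yet when the callback returns.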

            People

              qian_wc Qian Yingjin