
lustre 2.14.0_ddn54 + 5.15 kernel soft cpu lockups

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Lustre 2.16.0

    Description

      While testing the Lustre client against a 5.15 kernel, fio-based tests triggered soft CPU lockups:

      [Wed Aug 10 13:40:59 2022] watchdog: BUG: soft lockup - CPU#9 stuck for 48s! [ptlrpcd_04_01:1734]
      [Wed Aug 10 13:40:59 2022] CPU: 9 PID: 1734 Comm: ptlrpcd_04_01 Tainted: G O L 5.15.43.hrtdev #1
      [Wed Aug 10 13:40:59 2022] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.12.0-1 04/01/2014
      [Wed Aug 10 13:40:59 2022] RIP: 0010:_raw_spin_unlock_irqrestore+0x21/0x30
      [Wed Aug 10 13:40:59 2022] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 40 00 f7 c6 00 02 00 00 75 02 5d c3 fb 66 0f 1f 44 00 00 <5d> c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48
      [Wed Aug 10 13:40:59 2022] RSP: 0018:ffffb91341a4bba8 EFLAGS: 00000206
      [Wed Aug 10 13:40:59 2022] RAX: ffffe4a150371b80 RBX: ffffe4a150371b80 RCX: 00000000ffffffff
      [Wed Aug 10 13:40:59 2022] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffa406f4e9b050
      [Wed Aug 10 13:40:59 2022] RBP: ffffb91341a4bba8 R08: 000000000000007d R09: 00000000000b6557
      [Wed Aug 10 13:40:59 2022] R10: 0000000000000009 R11: ffffb91341a4bb78 R12: ffffa406f4e9b000
      [Wed Aug 10 13:40:59 2022] R13: 0000000000000002 R14: 0000000000000003 R15: 0000000000000002
      [Wed Aug 10 13:40:59 2022] FS: 0000000000000000(0000) GS:ffffa40d51a40000(0000) knlGS:0000000000000000
      [Wed Aug 10 13:40:59 2022] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [Wed Aug 10 13:40:59 2022] CR2: 0000000000e319cc CR3: 00000001099dc000 CR4: 00000000003506e0
      [Wed Aug 10 13:40:59 2022] Call Trace:
      [Wed Aug 10 13:40:59 2022] <TASK>
      [Wed Aug 10 13:40:59 2022] __page_cache_release+0x1d5/0x220
      [Wed Aug 10 13:40:59 2022] __put_page+0x3a/0x90
      [Wed Aug 10 13:40:59 2022] ptlrpc_release_bulk_page_pin+0x51/0x90 [ptlrpc]
      [Wed Aug 10 13:40:59 2022] ptlrpc_free_bulk+0x95/0x500 [ptlrpc]
      [Wed Aug 10 13:40:59 2022] __ptlrpc_req_finished+0x350/0x730 [ptlrpc]
      [Wed Aug 10 13:40:59 2022] ptlrpc_free_request+0x65/0x70 [ptlrpc]
      [Wed Aug 10 13:40:59 2022] ptlrpc_free_committed+0x110/0x6f0 [ptlrpc]
      [Wed Aug 10 13:40:59 2022] after_reply+0x8ea/0xd80 [ptlrpc]
      [Wed Aug 10 13:40:59 2022] ptlrpc_check_set+0xb29/0x1c90 [ptlrpc]
      [Wed Aug 10 13:40:59 2022] ptlrpcd_check+0x399/0x580 [ptlrpc]
      [Wed Aug 10 13:40:59 2022] ? timer_update_keys+0x40/0x40
      [Wed Aug 10 13:40:59 2022] ptlrpcd+0x3c9/0x4d0 [ptlrpc]
      [Wed Aug 10 13:40:59 2022] ? wait_woken+0x70/0x70
      [Wed Aug 10 13:40:59 2022] ? ptlrpcd_check+0x580/0x580 [ptlrpc]
      

      This is readily reproducible by running fio with buffered writes.

        Activity

          pjones Peter Jones added a comment -

          Landed for 2.16


          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48629/
          Subject: LU-16180 ptlrpc: reduce lock contention in ptlrpc_free_committed
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: d3074511f3ee322d841c0c0e7f644422e85a543e


          "Jian Yu <yujian@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48629
          Subject: LU-16180 ptlrpc: add cond_resched after ptlrpc_free_request
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 50feeddfc0c54720b87735c8c2eba6a98d00b7a4
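
          The subject of this patch set names a general pattern: yielding the CPU inside a long freeing loop so that one thread does not run for tens of seconds without rescheduling. The following is a user-space sketch of that pattern only, not the Lustre change itself. All names (`struct item_list`, `list_fill`, `list_free_all`, the `batch` parameter) are hypothetical, a pthread mutex stands in for the kernel lock, and `sched_yield()` stands in for the kernel's `cond_resched()`.

```c
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical stand-in for a list of requests; not a Lustre type. */
struct item {
    struct item *next;
};

struct item_list {
    pthread_mutex_t lock;   /* plays the role of a contended kernel lock */
    struct item *head;
};

/* Push n freshly allocated items; returns how many were added. */
static size_t list_fill(struct item_list *l, size_t n)
{
    size_t added = 0;

    pthread_mutex_lock(&l->lock);
    for (; added < n; added++) {
        struct item *it = malloc(sizeof(*it));

        if (it == NULL)
            break;
        it->next = l->head;
        l->head = it;
    }
    pthread_mutex_unlock(&l->lock);
    return added;
}

/*
 * Free every item, but after each batch drop the lock and yield so that
 * other threads can take the lock or run on this CPU.
 * Returns the number of items freed; *yields counts the yield points hit.
 */
static size_t list_free_all(struct item_list *l, size_t batch, size_t *yields)
{
    size_t freed = 0;

    *yields = 0;
    if (batch == 0)
        batch = 1;

    pthread_mutex_lock(&l->lock);
    while (l->head != NULL) {
        struct item *it = l->head;

        l->head = it->next;
        free(it);
        freed++;

        /* Yield only if work remains; user-space analogue of cond_resched(). */
        if (freed % batch == 0 && l->head != NULL) {
            pthread_mutex_unlock(&l->lock);
            sched_yield();
            (*yields)++;
            pthread_mutex_lock(&l->lock);
        }
    }
    pthread_mutex_unlock(&l->lock);
    return freed;
}
```

          The list head is re-read after the lock is retaken, so the loop stays correct even if another thread modified the list while the lock was dropped. Choosing the batch size is a trade-off between lock hold time and the overhead of repeated unlock/lock cycles.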


          People

            yujian Jian Yu