Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.6
    • Affects Version/s: Lustre 2.12.5
    • Labels: None
    • Severity: 3

    Description

      Recently, with Lustre 2.12.5, we experienced a blocked Lustre client on the Robinhood server with the following task info: 

      Aug 24 23:05:15 fir-rbh01 kernel: INFO: task robinhood:10783 blocked for more than 120 seconds.
      Aug 24 23:05:15 fir-rbh01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      Aug 24 23:05:15 fir-rbh01 kernel: robinhood       D ffffa13de5779040     0 10783      1 0x00000080
      Aug 24 23:05:15 fir-rbh01 kernel: Call Trace:
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa84fa3d2>] ? security_inode_permission+0x22/0x30
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa896be39>] schedule_preempt_disabled+0x29/0x70
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8969db7>] __mutex_lock_slowpath+0xc7/0x1d0
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa896919f>] mutex_lock+0x1f/0x2f
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8961dd2>] lookup_slow+0x33/0xa7
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8453268>] path_lookupat+0x838/0x8b0
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa84342b2>] ? __mem_cgroup_commit_charge+0xe2/0x2f0
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa841d665>] ? kmem_cache_alloc+0x35/0x1f0
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa845415f>] ? getname_flags+0x4f/0x1a0
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa845330b>] filename_lookup+0x2b/0xc0
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa84552f7>] user_path_at_empty+0x67/0xc0
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa83ed3ad>] ? handle_mm_fault+0x39d/0x9b0
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8455361>] user_path_at+0x11/0x20
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8448223>] vfs_fstatat+0x63/0xc0
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8448641>] SYSC_newlstat+0x31/0x60
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8972925>] ? do_page_fault+0x35/0x90
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8448aae>] SyS_newlstat+0xe/0x10
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8977ddb>] system_call_fastpath+0x22/0x27
      Aug 24 23:05:15 fir-rbh01 kernel: INFO: task robinhood:10784 blocked for more than 120 seconds.
      Aug 24 23:05:15 fir-rbh01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      Aug 24 23:05:15 fir-rbh01 kernel: robinhood       D ffffa13de5778000     0 10784      1 0x00000080
      Aug 24 23:05:15 fir-rbh01 kernel: Call Trace:
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa84fa3d2>] ? security_inode_permission+0x22/0x30
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa896be39>] schedule_preempt_disabled+0x29/0x70
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8969db7>] __mutex_lock_slowpath+0xc7/0x1d0
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa896919f>] mutex_lock+0x1f/0x2f
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8961dd2>] lookup_slow+0x33/0xa7
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8453268>] path_lookupat+0x838/0x8b0
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa836a57d>] ? tracing_record_cmdline+0x1d/0x120
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa841d665>] ? kmem_cache_alloc+0x35/0x1f0
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa845415f>] ? getname_flags+0x4f/0x1a0
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa845330b>] filename_lookup+0x2b/0xc0
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa84552f7>] user_path_at_empty+0x67/0xc0
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8455361>] user_path_at+0x11/0x20
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8448223>] vfs_fstatat+0x63/0xc0
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8448641>] SYSC_newlstat+0x31/0x60
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa833a176>] ? __audit_syscall_exit+0x1e6/0x280
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8448aae>] SyS_newlstat+0xe/0x10
      Aug 24 23:05:15 fir-rbh01 kernel:  [<ffffffffa8977ddb>] system_call_fastpath+0x22/0x27
      

      The rest of the clients were OK, but I guess an inode was blocked. The only solution I found was to restart the MDS (here, fir-MDT0003).

      I took a crash dump of the MDS when this happened, and it is available on the FTP server as fir-md1-s4_vmcore_2020-08-25-08-27-56.

      Also, I'm providing the following files from the crash dump:

      Lately, our MDS has been heavily loaded with a lot of mdt_rdpg threads, but until now it hadn't blocked on any inode. This is different and worth investigating, in my opinion.

      I'm wondering if this could be due to DNE, as I'm seeing a lot of stacks like this one:

      PID: 21603  TASK: ffff9cf0f86ec100  CPU: 41  COMMAND: "mdt_rdpg01_017"
       #0 [ffff9d00ff888e48] crash_nmi_callback at ffffffff87456027
       #1 [ffff9d00ff888e58] nmi_handle at ffffffff87b6f91c
       #2 [ffff9d00ff888eb0] do_nmi at ffffffff87b6fb3d
       #3 [ffff9d00ff888ef0] end_repeat_nmi at ffffffff87b6ed89
          [exception RIP: fld_cache_lookup+157]
          RIP: ffffffffc089d9dd  RSP: ffff9ceff320ba50  RFLAGS: 00000202
          RAX: 0000000000000003  RBX: ffff9d10f868cf00  RCX: ffff9d10f27ad6c0
          RDX: ffff9d103898cbb8  RSI: ffff9d10f27ad6c0  RDI: ffff9d10f868cf28
          RBP: ffff9ceff320ba68   R8: 00000002800432da   R9: 0000000000000007
          R10: ffff9ce79d0a6700  R11: ffff9ce79d0a6700  R12: 00000002800432e6
          R13: ffff9d103898cbb8  R14: 00000002800432e6  R15: ffff9d00456b7000
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
      --- <NMI exception stack> ---
       #4 [ffff9ceff320ba50] fld_cache_lookup at ffffffffc089d9dd [fld]
       #5 [ffff9ceff320ba70] fld_local_lookup at ffffffffc089ef02 [fld]
       #6 [ffff9ceff320bab8] osd_fld_lookup at ffffffffc13ae0e8 [osd_ldiskfs]
       #7 [ffff9ceff320bac8] osd_remote_fid at ffffffffc13ae233 [osd_ldiskfs]
       #8 [ffff9ceff320bb08] osd_it_ea_rec at ffffffffc13b73eb [osd_ldiskfs]
       #9 [ffff9ceff320bb70] lod_it_rec at ffffffffc15f6fb7 [lod]
      #10 [ffff9ceff320bb80] mdd_dir_page_build at ffffffffc167edd7 [mdd]
      #11 [ffff9ceff320bbe8] dt_index_walk at ffffffffc0d082a0 [obdclass]
      #12 [ffff9ceff320bc58] mdd_readpage at ffffffffc16805bf [mdd]
      #13 [ffff9ceff320bc90] mdt_readpage at ffffffffc14dc82a [mdt]
      #14 [ffff9ceff320bcd0] tgt_request_handle at ffffffffc0ff866a [ptlrpc]
      #15 [ffff9ceff320bd58] ptlrpc_server_handle_request at ffffffffc0f9b44b [ptlrpc]
      #16 [ffff9ceff320bdf8] ptlrpc_main at ffffffffc0f9edb4 [ptlrpc]
      #17 [ffff9ceff320bec8] kthread at ffffffff874c2e81
      #18 [ffff9ceff320bf50] ret_from_fork_nospec_begin at ffffffff87b77c24
      

      Thanks!
      Stephane
       

      Attachments

        Activity

          [LU-13922] Client blocked in lstat()

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40135/
          Subject: LU-13922 osd-ldiskfs: no need to add OI cache in readdir
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set:
          Commit: bdef16d60713743e832fbb9d14ecb5cd116398c7


          gerrit Gerrit Updater added a comment -

          Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40135
          Subject: LU-13922 osd-ldiskfs: no need to add OI cache in readdir
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set: 1
          Commit: 71927c664d6cfe8b1d931394739a23e7ceec79bd

          pjones Peter Jones added a comment -

          Landed for 2.14


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39782/
          Subject: LU-13922 osd-ldiskfs: no need to add OI cache in readdir
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: bc5934632df10aaa02b32b8254a473c14c6f8104


          sthiell Stephane Thiell added a comment -

          We have been using this patch on top of 2.12.5 for 3 full days now on an MDS in production, and the server load is definitely reduced. The MDS is also more responsive from clients, even under heavy readdir() load from specific jobs. No side effects noticed so far. Thanks Lai!

          laisiyao Lai Siyao added a comment -

          Great, it will be landed once the review has passed.


          sthiell Stephane Thiell added a comment -

          Hi Lai,

          Thanks so much! We're running with your patch on this MDS. It seems to help. MDS load went down from 200+ to 10-20 with the same workload. BTW, we have identified a user who is likely generating this load: she has 2500 running jobs and uses Python's glob on large directories (110k+ entries). Perfect for testing readdir scalability!
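          For readdir-scalability testing along these lines, a scan like the following generates a similar metadata load. This is only a minimal C sketch, not robinhood or the user's glob job: it issues one readdir() stream plus one lstat() per entry (the same call pattern as the blocked robinhood traces above); the directory path and the idea of running many copies in parallel are illustrative assumptions, not details from this ticket.

          /*
           * Minimal sketch of a stat-heavy directory scan: one readdir() stream
           * plus one lstat() per entry, the same call pattern seen in the
           * blocked robinhood traces above. The directory path is an assumption.
           */
          #include <dirent.h>
          #include <limits.h>
          #include <stdio.h>
          #include <string.h>
          #include <sys/stat.h>
          #include <sys/types.h>

          int main(int argc, char **argv)
          {
              const char *dirpath = argc > 1 ? argv[1] : ".";  /* e.g. a 110k+ entry directory */
              char path[PATH_MAX];
              struct stat st;
              struct dirent *de;
              unsigned long nent = 0;
              DIR *dir = opendir(dirpath);

              if (!dir) {
                  perror("opendir");
                  return 1;
              }

              /* Walk the directory and lstat() every entry. */
              while ((de = readdir(dir)) != NULL) {
                  if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
                      continue;
                  snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);
                  if (lstat(path, &st) == 0)
                      nent++;
              }
              closedir(dir);
              printf("scanned %lu entries under %s\n", nent, dirpath);
              return 0;
          }

          Running a couple of thousand instances of such a scan against directories with 110k+ entries, as described in this comment, would exercise the mdt_rdpg threads in a comparable way.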


          sthiell Stephane Thiell added a comment -

          Lai, that sounds great! Thanks! After migrating a lot of files to this MDT0003, we've seen the overall MDS load increase, with mdt_rdpg03 threads taking a lot of CPU time. We will be happy to test your patch when it's ready.


          gerrit Gerrit Updater added a comment -

          Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39782
          Subject: LU-13922 osd-ldiskfs: no need to add OI cache in readdir
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 03120b1a1266fcbd77e00709623d11803a91f655

          laisiyao Lai Siyao added a comment -

          fld_cache_lookup() doesn't look to be the cause of this deadlock, because all the places it's called from take a read lock. But it can be optimized: it doesn't look necessary to check whether a FID is remote in readdir; I'll upload a patch for this.
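          As a purely conceptual illustration of that optimization (not the actual Lustre osd-ldiskfs/fld code), the sketch below contrasts a readdir page build that does a per-entry "is this FID remote?" cache lookup with one that skips it. All names here (toy_fid, toy_fld_lookup_is_remote, the page-build helpers) are invented stand-ins; the read lock is only there to mirror the point that the FLD cache lookups are reader-side.

          /*
           * Conceptual sketch only -- NOT the actual Lustre osd-ldiskfs/fld code.
           * All names below are invented stand-ins for illustration.
           */
          #include <pthread.h>
          #include <stdbool.h>
          #include <stdio.h>

          struct toy_fid { unsigned long long seq, oid; };

          static pthread_rwlock_t toy_fld_cache_lock = PTHREAD_RWLOCK_INITIALIZER;

          /* Stand-in for an FLD cache lookup: reader-side lock, so not a deadlock
           * by itself, but still extra work paid for every directory entry. */
          static bool toy_fld_lookup_is_remote(const struct toy_fid *fid)
          {
              bool remote;

              pthread_rwlock_rdlock(&toy_fld_cache_lock);
              remote = (fid->seq & 1);  /* arbitrary toy rule */
              pthread_rwlock_unlock(&toy_fld_cache_lock);
              return remote;
          }

          /* Old shape: one FLD lookup per entry while building the readdir page. */
          static unsigned long build_page_with_check(const struct toy_fid *fids,
                                                     unsigned long n)
          {
              unsigned long i, emitted = 0;

              for (i = 0; i < n; i++) {
                  (void)toy_fld_lookup_is_remote(&fids[i]);
                  emitted++;
              }
              return emitted;
          }

          /* New shape: readdir only needs name + FID, so the check is dropped. */
          static unsigned long build_page_without_check(const struct toy_fid *fids,
                                                        unsigned long n)
          {
              (void)fids;
              return n;
          }

          int main(void)
          {
              struct toy_fid fids[4] = { { 1, 1 }, { 2, 2 }, { 3, 3 }, { 4, 4 } };

              printf("with check:    %lu entries\n", build_page_with_check(fids, 4));
              printf("without check: %lu entries\n", build_page_without_check(fids, 4));
              return 0;
          }

          The actual change landed as the "LU-13922 osd-ldiskfs: no need to add OI cache in readdir" patches tracked in this ticket; the sketch only mirrors the shape of the idea.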


          People

            Assignee: laisiyao Lai Siyao
            Reporter: sthiell Stephane Thiell
            Votes: 0
            Watchers: 4
