"ls" hangs on a particular directory

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.4
    • Affects Version/s: Lustre 2.5.3, Lustre 2.8.0
    • Labels: None
    • Environment: OLCF Atlas production system: clients running 2.8.0+ (with patches), server running 2.5.5+ (with patches)
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      On the atlas2 file system there is a particular directory on which any operation, such as "ls" or "stat", hangs the calling process completely. The client reports no OS error or Lustre error. On the server side we observed OI scrub messages a few times, which may indicate an MDS data inconsistency that the scrub is trying to repair, but to no avail. We have not yet been able to correlate the two.

      Ops teams have collected traces on the client side by:

      mount -t lustre 10.36.226.77@o2ib:/atlas2 /lustre/atlas2 -o rw,flock,nosuid,nodev
      lctl set_param osc/*/checksums 0
      echo "all" > /proc/sys/lnet/debug
      echo "1024" > /proc/sys/lnet/debug_mb

      Step1: lctl dk > /dev/null
      Step2: cd /lustre/atlas2/path/to/offending_directory/
      Step3: ls
      Step4: Wait 30 seconds
      Step5: lctl dk > atlas2-mds3_ls_for_fprof.out

      The log is attached.
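
Steps 1 through 5 can be sketched as a small shell function. This is a rough illustration rather than the script the Ops team actually ran: the names `run` and `collect_trace` and the `DRY_RUN` switch are assumptions introduced here, the debug mask and buffer size are assumed to have been set already as shown above, and the listing is started in the background because it is expected to hang.

```shell
#!/bin/sh
# Sketch only: replays Steps 1-5 on a Lustre client, as root.
# Set DRY_RUN=1 to print each command instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1"
    fi
}

# collect_trace DIR OUTFILE [WAIT_SECONDS]
# DIR and OUTFILE are quoted naively, so they must not contain single quotes.
collect_trace() {
    dir=$1              # directory whose listing hangs
    out=$2              # file that receives the debug log
    wait_s=${3:-30}     # how long to let the hung "ls" run

    # Step 1: discard whatever is already in the kernel debug buffer.
    run "lctl dk > /dev/null"
    # Steps 2-3: list the directory in the background so that a hang
    # does not prevent the dump in Step 5 from running.
    run "(cd '$dir' && ls > /dev/null 2>&1) &"
    # Step 4: give the client time to log the stuck operation.
    run "sleep $wait_s"
    # Step 5: dump the debug buffer to the output file.
    run "lctl dk > '$out'"
}
```

A dry run such as `DRY_RUN=1 collect_trace /lustre/atlas2/path/to/offending_directory/ atlas2-mds3_ls_for_fprof.out` prints the four commands without touching the file system, which is a convenient way to review them before running as root.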

          Activity

            [LU-10237] "ls" hangs on a particular directory

            gerrit Gerrit Updater added a comment -

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30903/
            Subject: LU-10237 mdc: interruptable during RPC retry for EINPROGRESS
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 822d5ce80dd357b53c0414cc299fadef0db076d1

            gerrit Gerrit Updater added a comment -

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30903
            Subject: LU-10237 mdc: interruptable during RPC retry for EINPROGRESS
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 588cb51b4a26cd07c036ee68451bc151e7eb73bd

            pjones Peter Jones added a comment -

            Landed for 2.11

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30166/
            Subject: LU-10237 mdc: interruptable during RPC retry for EINPROGRESS
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 9c596a4996ee242aa1b954f5f2f19101d3941bf0

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/30166
            Subject: LU-10237 mdc: interruptable during RPC retry for EINPROGRESS
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0a84ebd38c71747b44bad7a1c00ee39f4b7ff759

            adilger Andreas Dilger added a comment -

            While LU-8696 fixed the actual problem of the MDT inconsistency, it would also be useful to fix the client-side handling of this error, so that the userspace process can be interrupted if there is a problem.
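
From the user's side, an interruptible client means a stuck operation can be bounded with a timeout instead of hanging the shell. The following is a minimal sketch under that assumption: `run_bounded` is a hypothetical helper name, it relies on `timeout` from GNU coreutils, and it presumes a client that carries the patch from this ticket. On an unpatched client the process may still fail to respond to the signal.

```shell
#!/bin/sh
# run_bounded SECONDS COMMAND [ARGS...]
# Runs COMMAND under timeout(1), discarding its output.
# Returns COMMAND's exit status, or 124 if timeout(1) had to stop it.
run_bounded() {
    secs=$1
    shift
    timeout "$secs" "$@" > /dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 124 ]; then
        # 124 is the status timeout(1) reports when the time limit expires.
        echo "timed out after ${secs}s: $*" >&2
    fi
    return "$rc"
}
```

For example, `run_bounded 30 ls /lustre/atlas2/path/to/offending_directory/` would report a timeout after 30 seconds if the listing could be interrupted, rather than blocking indefinitely.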

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: adilger Andreas Dilger
              Votes: 0
              Watchers: 3