Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.13.0, Lustre 2.12.3
    • Lustre 2.13.0
    • Any lustre client
    • 3
    • 9223372036854775807

    Description

      In recent testing I have encountered cases where running lfs df never returns. Today I sat down to track down why. The reason is that this loop in showdf() in lfs.c never exits:

      for (i = 0, state = stat->os_state; state != 0; i++) {
              ...
      }
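      The loop body is elided in the report, so the following is only a hedged sketch of how a loop of this shape can spin forever. It assumes, without confirmation from the source, that the body clears a bit from state only after printing a name for it and skips set bits that have no name. The discussion below suggests the new NONROT flag, which is set on the reporter's all-SSD targets, may have been such a bit. All names and bit values here (DEMO_STATE_*, demo_state_names, demo_*) are hypothetical and do not reproduce the Lustre headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; the real OS_STATE_* constants live in the
 * Lustre headers and are NOT reproduced here. */
#define DEMO_STATE_DEGRADED 0x00000001u
#define DEMO_STATE_READONLY 0x00000002u
#define DEMO_STATE_NONROT   0x00000004u	/* deliberately has no name */

/* Hypothetical names table: index i names bit (1u << i). */
static const char demo_state_names[] = { 'D', 'R' };
#define DEMO_NAMES_COUNT sizeof(demo_state_names)

/*
 * Assumed buggy shape: a bit is cleared only when it has a name, so any
 * unnamed set bit keeps 'state' non-zero forever. The 'cap' argument
 * exists only so this demo returns; the result is the number of
 * iterations executed.
 */
static unsigned int demo_buggy_iterations(uint32_t os_state, unsigned int cap)
{
	uint32_t state;
	unsigned int i;

	for (i = 0, state = os_state; state != 0 && i < cap; i++) {
		/* Guard the shift: 1u << i is undefined for i >= 32. */
		uint32_t mask = (i < 32) ? (1u << i) : 0u;

		if (!(state & mask) || i >= DEMO_NAMES_COUNT)
			continue;	/* unnamed bit is never cleared */
		state &= ~mask;
	}
	return i;
}

/*
 * Terminating shape: every visited bit is cleared before deciding
 * whether to print it, and the index never goes past bit 31.
 */
static void demo_show_state(uint32_t os_state, char *buf, size_t len)
{
	uint32_t state;
	size_t n = 0;
	unsigned int i;

	assert(buf != NULL && len > 0);	/* room for the terminator */
	for (i = 0, state = os_state; state != 0 && i < 32; i++) {
		uint32_t mask = 1u << i;

		if (!(state & mask))
			continue;
		state &= ~mask;		/* always make progress */
		if (i < DEMO_NAMES_COUNT && n + 1 < len)
			buf[n++] = demo_state_names[i];
	}
	buf[n] = '\0';
}
```

      The difference in demo_show_state() is that each visited bit is cleared before the print decision and the bit index is bounded at 31, so neither an unnamed flag nor a high bit can keep the loop alive. The buggy variant carries an artificial iteration cap only so it can be demonstrated without hanging.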

        Activity

          [LU-12501] lfs df can never return

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35662/
          Subject: LU-12501 utils: fix 'lfs df' printing loop
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set:
          Commit: 2734b902c296f3f6cbc1522a579f23bd86d1be45

          gerrit Gerrit Updater added a comment -

          Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35662
          Subject: LU-12501 utils: fix 'lfs df' printing loop
          Project: fs/lustre-release
          Branch: b2_12
          Current Patch Set: 1
          Commit: aa801af165b903b4ae15f0ef975e8b7bf0413878

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35456/
          Subject: LU-12501 utils: fix 'lfs df' printing loop
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: e4d92a8a08acbdca6634decd4deb9fe5678ad7ba

          gerrit Gerrit Updater added a comment -

          Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35456
          Subject: LU-12501 utils: fix 'lfs df' printing loop
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: f9bd4be4e79684099fa0a3d09f7991be991ec180

          simmonsja James A Simmons added a comment -

          So Andreas suggested a few ideas:

          I'd prefer it even more if the "NONROT" state was added to the obd_statfs_state_names[] array, printed when the "MNTDF_VERBOSE" flag is passed, and masked out in mntdf() otherwise. That avoids special-casing the code here, and pushes the "presentation decision" up toward where options are handled.

          As for what letter to use for "NONROT", one option would be to use a lower-case letter to indicate that it is not a "problem" with the target, like 'f' for "flash" or "fast", but I'm open to other options. I'd prefer to avoid overloading 'n' so early.

          We might even consider changing the OS_STATE_NONROT flag to count from 0x80000000 downward to make it possible to programmatically separate error states from informational states, even though this would slow down showdf() by a few cycles for flash OSTs. The original NONROT patch was landed as v2_12_53-108-g68635c3, so it isn't in a 2.13 release yet, and the backport hasn't been landed to b2_12 yet, so I don't think there is a release that includes this flag yet.

          I'm going to do the simplest fix, but if someone wants something more, we can do another patch to enhance this feature.

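          Andreas's suggestions above (a names-array entry for NONROT, masking it unless verbose output is requested, and allocating informational flags from 0x80000000 downward) can be sketched together as follows. This is an illustration under stated assumptions, not the patch that landed: the DEMO_* constants, the 0xff000000 informational mask, and the demo_* functions are hypothetical stand-ins for obd_statfs_state_names[], MNTDF_VERBOSE, mntdf() and showdf(). Only the lower-case 'f' letter comes from the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: error states count up from bit 0, informational
 * states count down from bit 31 (0x80000000). */
#define DEMO_STATE_DEGRADED  0x00000001u	/* error, bit 0 */
#define DEMO_STATE_READONLY  0x00000002u	/* error, bit 1 */
#define DEMO_STATE_NONROT    0x80000000u	/* informational, bit 31 */
/* Assumed split point: top 8 bits reserved for informational states. */
#define DEMO_STATE_INFO_MASK 0xff000000u

#define DEMO_MNTDF_VERBOSE   0x1u		/* hypothetical option bit */

/* One letter per bit; lower case marks "not a problem" states. */
static const char demo_state_names[32] = {
	[0] = 'D', [1] = 'R', [31] = 'f',
};

/* Presentation decision made where options are handled, in the spirit
 * of masking NONROT in mntdf() unless verbose output was requested. */
static uint32_t demo_filter_state(uint32_t os_state, unsigned int flags)
{
	if (!(flags & DEMO_MNTDF_VERBOSE))
		os_state &= ~DEMO_STATE_INFO_MASK;
	return os_state;
}

/* Error states can be tested without enumerating every flag. */
static int demo_has_error_state(uint32_t os_state)
{
	return (os_state & ~DEMO_STATE_INFO_MASK) != 0;
}

/* Printing loop with no special cases: every visited bit is cleared,
 * so it ends as soon as no set bits remain. */
static void demo_format_state(uint32_t state, char *buf, size_t len)
{
	size_t n = 0;
	unsigned int i;

	assert(buf != NULL && len > 0);	/* room for the terminator */
	for (i = 0; state != 0 && i < 32; i++) {
		uint32_t mask = 1u << i;

		if (!(state & mask))
			continue;
		state &= ~mask;
		if (demo_state_names[i] != '\0' && n + 1 < len)
			buf[n++] = demo_state_names[i];
	}
	buf[n] = '\0';
}
```

          With NONROT at bit 31, a verbose listing of a flash OST walks all 32 bits before state reaches zero, which is the "few cycles" cost mentioned in the comment; in non-verbose mode the bit is masked out first, so the loop stops after the highest error bit.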
          pjones Peter Jones added a comment -

          Aha! Yes, I can see why that would slip through. Thanks James!


          simmonsja James A Simmons added a comment -

          Yes, I tracked down the issue, Peter. My test setup is currently all SSDs, so I see this problem.

          gerrit Gerrit Updater added a comment -

          James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/35403
          Subject: LU-12501 utils: stop showdf() endless loop
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: eba159fd6d2b45689072cd59dff3c77af83c00aa

          pjones Peter Jones added a comment -

          Does that mean that you are able to run git bisect to identify the commit that introduced the problem?

          simmonsja James A Simmons added a comment - edited

          I started to see it in the last couple of weeks. For some reason, I see it all the time on our test bed.

          People

            adilger Andreas Dilger
            simmonsja James A Simmons
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: