LU-8152: show OST/MDT read-only status in "lctl dl" and/or "lfs df"

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.10.0

    Description

      To aid in debugging server problems when the filesystem has gone read-only, it would be useful to print "RO" for the device status in "lctl dl" instead of "UP", if that is possible without significant layering violations.

      It would also be useful to print out the OS_STATE_DEGRADED and OS_STATE_READONLY flags with "lfs df", possibly one letter per flag [DR] after the device names, like:

      $ lfs df
      UUID                   1K-blocks        Used   Available Use% Mounted on
      myth-MDT0000_UUID        9174328      584036     8066004   7% /myth[MDT:0]
      myth-OST0000_UUID     3880285544  3451122460   235124108  94% /myth[OST:0] DR
      myth-OST0001_UUID     3886052008  3362805556   328933836  91% /myth[OST:1]
      myth-OST0002_UUID     3880285544  3547714164   216149060  94% /myth[OST:2] D
      myth-OST0003_UUID     5840741952  3831815716  1716851948  69% /myth[OST:3]
      OST0004             : inactive device
      

      We might consider requiring a -v flag if there is a concern that this would break the output format, but I don't think that it will.
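
      For context, "lctl dl" currently prints each device's status as UP (or DN/ST); under this proposal a read-only target would show "RO" instead. A hedged illustration with made-up device names (the exact device list varies by server, and the RO line shows the proposed output, not current behavior):

      $ lctl dl
        0 UP osd-ldiskfs myth-OST0000-osd myth-OST0000-osd_UUID 5
        3 UP obdfilter myth-OST0000 myth-OST0000_UUID 5
        4 RO obdfilter myth-OST0002 myth-OST0002_UUID 5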

          Activity

            Peter Jones added a comment -

            Landed for 2.10


            Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23330/
            Subject: LU-8152 utils: improve "lfs df" to show device status
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 95d7592a9f33f62accade96d81e0cc3ca0fb94e2


            Gerrit Updater added a comment -

            Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/23330
            Subject: LU-8152 utils: improve "lfs df" to show device status
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f892e9c4b0216b7d48fae0ed06ebab5e84abab1b

            Jian Yu added a comment -

            Sure, Andreas. I'll create the patch.


            Andreas Dilger added a comment -

            Jian, could you please make a patch for "lfs df" to print the OS_STATE_* flags? This should be fairly fast, as it only involves a small change to showdf(), and testing is straightforward since the "obdfilter.*.degraded" tunable can be set directly from the test script.

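            A sketch of the test flow Andreas describes, assuming the one-letter flag output lands as proposed in the description; the "obdfilter.*.degraded" tunable is the one named above, while the device name and the exact output line are illustrative:

            oss# lctl set_param obdfilter.myth-OST0002.degraded=1
            client$ lfs df | grep OST0002
            myth-OST0002_UUID     3880285544  3547714164   216149060  94% /myth[OST:2] D
            oss# lctl set_param obdfilter.myth-OST0002.degraded=0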
            Peter Jones added a comment -

            Jian

            Could you please look into this issue?

            Thanks

            Peter


            Gabriele Paciucci added a comment -

            /proc/fs/lustre/health_status should report LDISKFS errors and LBUGs; I'm not sure about ZFS errors.

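            For reference, the health check mentioned above is a plain file read; the path is from the comment, and the output shown is the typical value on a healthy server:

            $ cat /proc/fs/lustre/health_status
            healthy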

            Gary Hagensen added a comment -

            However, the messages did not make it to the IML aggregated log. It seems IML should look for LDISKFS entries as well as LUSTRE ones, or a LUSTRE error should be generated as well.


            Gary Hagensen added a comment -

            Oops, I did find an entry in dmesg and /var/log/messages on the MDS from LDISKFS about the read-only transition. I also noted the time, which was in the evening when I was re-creating the cluster, with no I/O running.


            Gary Hagensen added a comment -

            An observation on why this should get fixed soon: I am doing a training in Brazil, and two of the students had 2 OSTs go read-only in their virtual machines. I am assuming it happened because they started using the OSTs while they were still in recovery, as there can't be bad hardware in VMs. IML and "lctl dl" said everything was happy, but "cat /proc/fs/lustre/lod/Lustre01-MDT0000-mdtlov/target_obd" showed two OSTs as INACTIVE (see the example below).

            They discovered this when setting the stripe count on a directory to 8 and finding that files under the directory only used 6 OSTs. There were no complaints from "lfs setstripe", etc. that that many OSTs were not available. I only knew where to look because of this JIRA ticket.

            I was also a bit surprised that using an OST while in recovery (just doing file operations) would cause it to go INACTIVE. There was nothing in the logs that I could see indicating the transition to inactive. That was also the case when this JIRA ticket was created.

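            The check Gary describes can be run on the MDS as below; the device path and names are taken from his comment, and the ACTIVE/INACTIVE column follows the usual target_obd format:

            mds# cat /proc/fs/lustre/lod/Lustre01-MDT0000-mdtlov/target_obd
            0: Lustre01-OST0000_UUID ACTIVE
            1: Lustre01-OST0001_UUID INACTIVE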

            People

              Assignee: Jian Yu
              Reporter: Andreas Dilger
              Votes: 0
              Watchers: 8
