[LU-270] LDisk-fs warning (device md30): ldisk_multi_mount_protect: fsck is running on filesystem

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 1.8.6
    • Affects Version: Lustre 1.8.6
    • None
    • Environment: RHEL 5.5 and Lustre 1.8.0.1 on J4400's
    • 3
    • 10266

    Description

      OST 10 /dev/md30 resident on OSS3
      From /var/log/messages
      LDisk-fs warning (device md30): ldisk_multi_mount_protect: fsck is running on filesystem
      LDisk-fs warning (device md30): ldisk_multi_mount_protect: MMP failure info: <time in unix seconds>, last update node: OSS3, last update device /dev/md30

      This is a scenario that keeps sending the customer in circles. They know for certain that an fsck is not running. Since they know that, they can try to turn the mmp feature off via the following commands:

      To manually disable MMP, run:
      tune2fs -O ^mmp <device>
      To manually enable MMP, run:
      tune2fs -O mmp <device>

      These commands fail, saying that a valid superblock does not exist, but they can see a valid superblock (with the mmp feature set) by running the following command:

      tune2fs -l /dev/md30
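
      For reference, a rough way to inspect the MMP state directly is sketched below. It assumes an MMP-aware e2fsprogs (such as the one shipped for Lustre); the device path is the one from this report, and these commands are only an inspection aid, not a verified workaround.

      # Show the MMP-related superblock fields (MMP block number, update interval)
      dumpe2fs -h /dev/md30 | grep -i mmp
      # Dump the live MMP block (sequence, last updating node and device),
      # if the installed debugfs supports the dump_mmp command
      debugfs -R dump_mmp /dev/md30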

      It is their understanding that a fix for this issue was released in a later version of Lustre, but aside from that, is there a way to disable MMP or otherwise clear this condition?

      Customer contact is tyler.s.wiegers@lmco.com

      Attachments

        Activity


          We're getting those logs for you now; we have to re-type them since they are on a segregated system. We are strapped for time, so the sooner you can respond the better. If we don't have this back up tomorrow morning, we will have to rebuild Lustre to get the system up.

          If you are available for a phone call that would be great as well, we are available all night if necessary.

          Thanks!

          tyler.s.wiegers@lmco.com Tyler Wiegers (Inactive) added a comment

          Did ost11 and ost15 have any filesystem corruption when you ran e2fsck on them?

          When you report that the %used is different, is that from "lfs df" or "lfs df -i", or from "df" on the OSS node for the local OST mountpoints?

          You can check the recovery state of all OSTs on an OSS via "lctl get_param obdfilter.*.recovery_status". They should all report "status: COMPLETE" (or "INACTIVE" if recovery was never done since the OST was mounted).

          As for the OSTs being marked inactive, you can check the status of the connections on the MDS and clients via "lctl get_param osc.*.state". All of the connections should report "current_state: FULL" meaning that the OSCs are connected to the OSTs. Even so, if the OSTs are not started for some reason, it shouldn't prevent the clients from mounting.
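
          A minimal check sequence based on the above (the outputs shown are the expected healthy values, not captured from this system):

          # On each OSS: recovery state of every local OST
          lctl get_param obdfilter.*.recovery_status | grep 'status:'
          # expect: status: COMPLETE (or INACTIVE if no recovery has run since mount)

          # On the MDS and clients: state of the OSC connections to the OSTs
          lctl get_param osc.*.state | grep current_state
          # expect: current_state: FULL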

          Can you please attach an excerpt from the syslog of a client trying to mount, and also from OST11 and OST15?

          adilger Andreas Dilger added a comment

          Some additional data points.

          After unmounting and resetting, ost11 and ost15 complete recovery OK, but we still aren't able to mount Lustre on a client.

          OST 11 and 15 are showing very different %used values from all of our other OSTs (they should all be roughly even because of the striping we use).

          In /var/log/messages on our MDT server (mds2) we see messages stating that ost11 is "INACTIVE" by administrator request.

          We also see eviction messages for ost11 and ost15 when trying to mount a client:
          This client was evicted by lustre-OST000b; in progress operations using this service will fail
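
          Regarding the "INACTIVE by administrator request" message above: if the OSC on the MDS was deactivated at some point, it can usually be re-enabled with lctl. This is only a sketch; the device number must be taken from the lctl dl output on the MDS.

          # On the MDS: find the device number of the OSC for OST000b
          lctl dl | grep OST000b
          # Re-activate it, replacing N with the device number printed above
          lctl --device N activate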

          tyler.s.wiegers@lmco.com Tyler Wiegers (Inactive) added a comment

          Thanks Andreas

          Where we are right now is that all of the OSTs can be mounted; however, Lustre cannot be successfully mounted on the clients.

          After having issues initially, we shut down all of our Lustre clients and cleanly rebooted all of our OSSs and MDSs. After bringing all the OSTs up, we had 2 OSTs (11 and 15) stuck in a "recovering" state that never finished (even about 15 minutes after bringing up the client). We used lctl to abort recovery and attempted mounting, which appeared to be successful. Running df on /lustre after that segmentation faults.

          Additionally, running lfs df throws the following error when it gets to ost11:
          error: llapi_obd_statfs failed: Bad address (-14)

          Doing an lctl dl on a client shows all the OSTs as "UP", but the last number on each line is different for OST11 and OST15 (it's 5 for all other OSTs, 4 for OST11/15).

          The MDSs were showing all the OSTs as "UP" as well, and there the last number is 5 for every OST.
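
          (For reference, the trailing number on each lctl dl line is the device reference count. An illustrative, healthy client-side line, not taken from this system, looks like:)

          7 UP osc lustre-OST000b-osc-ffff81012345 3a7c9f02-d1e2-4b6f-9c3a-0123456789ab 5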

          tyler.s.wiegers@lmco.com Tyler Wiegers (Inactive) added a comment

          You are correct - my sincere apologies. I was counting 2-byte fields starting in the second row instead of 4-byte fields starting in the first row. I've corrected the instructions in this bug in case they are re-used for similar problems in the future. We've discussed in the past to have a tool to repair this file automatically in case of corruption, and that is underscored by this issue.

          It looks like you (correctly) modified the 5th column, so all is well and no further action is needed.

          It looks like you couldn't have modified the 7th column, or the OST would have failed to mount. I did an audit of the code to see what is using these fields (the correct ldd_svindex field and the incorrect ldd_mount_type field). I found that the ldd_svindex field is only used in case the configuration database on the MGS is rewritten (due to --writeconf) and the OST is reconnecting to the MGS to recreate the configuration record. The ldd_mount_type field is used to determine the backing filesystem type (usually "ldiskfs" for type = 0x0001, but would have been "reiserfs" with type = 0x0003).

          If you want to be a bit safer in the future, you could use the "debugfs" command posted earlier to dump this file from all of the OSTs (it can safely be done while the OST is mounted) and save them to a safe location.
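
          The dump referenced above would look something like the following. This is a sketch with an illustrative output path; it assumes the standard CONFIGS/mountdata location on the OST device.

          # Safe to run while the OST is mounted: copy mountdata out to local disk
          debugfs -R 'dump /CONFIGS/mountdata /root/ost10-mountdata.backup' /dev/md30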

          Again, apologies for the mixup.

          adilger Andreas Dilger added a comment

          Andreas, your procedure worked flawlessly and our OST is back up and running. We verified that the mountdata file was indeed zero length.

          One clarification I would like to make, though: we copied from ost7, and the line to edit was different from what you had provided (in terms of which entry to edit):

          0000010: 0200 0000 0200 0000 0700 0000 0100 0000

          For this line you had indicated modifying the 7th entry; when we copied from ost07 it looked like the 5th entry should be modified instead.
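
          For future readers, an annotated reading of that row, assuming the struct lustre_disk_data layout used by Lustre 1.8 (four little-endian 32-bit fields per 16-byte row; the field names come from lustre_disk.h and this mapping is an interpretation, not something confirmed above):

          0000010: 0200 0000 0200 0000 0700 0000 0100 0000
                   = ldd_config_ver (2), ldd_flags (2), ldd_svindex (7, the OST index to edit), ldd_mount_type (1 = ldiskfs)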

          tyler.s.wiegers@lmco.com Tyler Wiegers (Inactive) added a comment
          tyler.s.wiegers@lmco.com Tyler Wiegers (Inactive) added a comment (edited)

          Thanks Peter, I was actually in the process of updating the bugs with our most up-to-date status and the actions taken (the site was down earlier this morning when I tried).

          Again, we appreciate your support with all this!

          pjones Peter Jones added a comment -

          Update from site: e2fsck completed on all OSTs; a full e2fsck is now running before bringing the filesystem back online.


          Sam,

          When we did the firmware upgrades we had taken down Lustre and rebooted every box to make sure everything was in a clean/unmounted state. We had 2 OSTs not mounting at that point, with this most recent problem popping up after the firmware upgrades. I'm not entirely convinced that the firmware upgrades actually caused this particular problem; we've been doing a lot to try to recover these OSTs.

          Andreas,

          I will get our guys looking at the mountdata file right now. Hopefully we'll have an indication of whether this action helps in an hour or so.

          Thank you all so much for your support!

          tyler.s.wiegers@lmco.com Tyler Wiegers (Inactive) added a comment

          > Regarding the CAMs and drive upgrades, we have seen the corrupted OSTs before on the Riverwalks
          > (J4400's) when disk firmware was upgraded without both Lustre and the md software raid shutdown
          > cleanly first. Is there any chance that this particular OST10 was not cleanly shutdown? We saw
          > many cases of software RAID corruption on the J4400's a couple of years ago,

          Beyond the HW/firmware issues, there was also a corruption problem due to the mptsas driver, which could redirect I/Os to the wrong drive.

          The following comment from Sven explains how this bug was discovered:
          https://bugzilla.lustre.org/show_bug.cgi?id=21819#c27

          And the problem was fixed in the following bugzilla ticket:
          https://bugzilla.lustre.org/show_bug.cgi?id=22632

          However, it requires installing an extra package containing the updated mptsas driver.

          Are you sure you are using an mptsas driver that does not suffer from the same issue?
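
          One quick way to check which mptsas driver is actually in use (the version string to look for depends on which patched package was installed, so this is only a starting point):

          # Version of the mptsas module that modprobe would load
          modinfo mptsas | grep -i '^version'
          # Version of the currently loaded module, if it exports one
          cat /sys/module/mptsas/version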

          > which was about the time early versions of 1.8 started to be used. There were several software
          > RAID corruption bugs that have since been fixed. Also, we have fixed many problems since the
          > early 1.8 releases in Lustre, so would encourage an upgrade to 1.8.5 at your earliest convenience.

          We did indeed integrate several software RAID fixes in 1.8 (e.g. bugzilla 19990, 22509 & 20533).
          Although I don't think any of them fixed real software RAID corruption, it would still make
          sense to upgrade to 1.8.5 to benefit from those bug fixes, which address real deadlocks and oopses.

          johann Johann Lombardi (Inactive) added a comment (edited)
          pjones Peter Jones added a comment -

          Thanks Sam. It is interesting to hear a PS perspective. I know that you were involved in a number of similar deployments. It will be interesting to hear the assessment from engineering about whether a Lustre issue is indeed involved here. Andreas, what do you think?


          People

            adilger Andreas Dilger
            dferber Dan Ferber (Inactive)
            Votes: 0
            Watchers: 6
