Lustre / LU-270

LDisk-fs warning (device md30): ldisk_multi_mount_protect: fsck is running on filesystem

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 1.8.6
    • Affects Version/s: Lustre 1.8.6
    • None
    • Environment: RHEL 5.5 and Lustre 1.8.0.1 on J4400's
    • 3
    • 10266

    Description

      OST 10 /dev/md30 resident on OSS3
      From /var/log/messages
      LDisk-fs warning (device md30): ldisk_multi_mount_protect: fsck is running on filesystem
      LDisk-fs warning (device md30): ldisk_multi_mount_protect: MMP failure info: <time in unix seconds>, last update node: OSS3, last update device /dev/md30

      This is a scenario that keeps sending the customer in circles. They know for certain that an fsck is not running. Since they are certain of that, they tried to turn the MMP feature off via the following commands:

      To manually disable MMP, run:
      tune2fs -O ^mmp <device>
      To manually enable MMP, run:
      tune2fs -O mmp <device>

      These commands fail, reporting that a valid superblock does not exist, but they can see their valid superblock (with mmp set) by running the following command:

      tune2fs -l /dev/md30

      It is their understanding that a fix for this issue was released in a later version of Lustre, but aside from upgrading, is there a way to disable MMP here?
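
      A quick sanity check, assuming the Lustre-patched (ldiskfs-aware) e2fsprogs is supposed to be installed on the OSS: an unpatched tune2fs does not understand the mmp feature and can fail with confusing superblock errors, so confirming which binary and package are actually in use may help.

      which tune2fs                        # confirm which tune2fs binary is being run
      rpm -q e2fsprogs                     # check the installed e2fsprogs build
      tune2fs -l /dev/md30 | grep -i mmp   # "mmp" should appear in the feature list if set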

      Customer contact is tyler.s.wiegers@lmco.com


        Activity


          tyler.s.wiegers@lmco.com Tyler Wiegers (Inactive) added a comment -

          Thanks Andreas

          Where we are right now is that all the OSTs can be mounted; however, Lustre cannot be successfully mounted.

          After having issues initially, we shut down all of our Lustre clients and cleanly rebooted all of our OSSs and MDSs. After bringing all the OSTs up, we had 2 OSTs (11 and 15) stuck in a "recovering" state that never finished (about 15 minutes after bringing up the client). We used lctl to abort recovery and attempted mounting, which appeared to be successful. Running df on /lustre after that causes a segmentation fault.

          Additionally, running lfs df throws the following error when it gets to ost11:
          error: llapi_obd_statfs failed: Bad address (-14)

          Running lctl dl on a client shows all the OSTs as "UP", but the last number on each line is different for OST11 and OST15 (it's 5 for all other OSTs, 4 for OST11/15).

          The MDSs were showing all the OSTs as "UP" as well, but there the last number is 5 for all OSTs.
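
          (For reference, the trailing value on each lctl dl line is the local device reference count, not an error code; the general line format is index, state, type, name, uuid, refcount.)

          lctl dl | awk '{print $1, $2, $4, $NF}'   # index, state, name, refcount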


          adilger Andreas Dilger added a comment -

          You are correct - my sincere apologies. I was counting 2-byte fields starting in the second row instead of 4-byte fields starting in the first row. I've corrected the instructions in this bug in case they are re-used for similar problems in the future. We've discussed in the past having a tool to repair this file automatically in case of corruption, and that need is underscored by this issue.

          It looks like you (correctly) modified the 5th column, so all is well and no further action is needed.

          It looks like you couldn't have modified the 7th column, or the OST would have failed to mount. I did an audit of the code to see what is using these fields (the correct ldd_svindex field and the incorrect ldd_mount_type field). I found that the ldd_svindex field is only used in case the configuration database on the MGS is rewritten (due to --writeconf) and the OST is reconnecting to the MGS to recreate the configuration record. The ldd_mount_type field is used to determine the backing filesystem type (usually "ldiskfs" for type = 0x0001, but would have been "reiserfs" with type = 0x0003).

          If you want to be a bit safer in the future, you could use the "debugfs" command posted earlier to dump this file from all of the OSTs (it can safely be done while the OST is mounted) and save them to a safe location.
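
          A minimal sketch of such a backup pass; the md device names below are placeholders and would need to be replaced with the real OST devices on each OSS:

          for dev in /dev/md30 /dev/md31; do
              debugfs -c -R "dump CONFIGS/mountdata /root/mountdata.$(basename $dev)" $dev
          done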

          Again, apologies for the mixup.


          tyler.s.wiegers@lmco.com Tyler Wiegers (Inactive) added a comment -

          Andreas, your procedure worked flawlessly and our OST is back up and running. We verified that the mountdata file was indeed zero length.

          One clarification I would like to make, though: we copied from ost7, and the line to edit was different from what you had provided (for the entry to edit):

          0000010: 0200 0000 0200 0000 0700 0000 0100 0000

          For this line you had indicated modifying the 7th entry, but when we copied from ost07 it looked like the 5th entry should be modified instead.

          tyler.s.wiegers@lmco.com Tyler Wiegers (Inactive) added a comment - - edited

          Thanks Peter, I was actually in the process of updating the bugs with our most up-to-date status and actions taken (the site was down earlier this morning when I tried).

          Again, we appreciate your support with all this!

          pjones Peter Jones added a comment -

          Update from site - e2fsck completed on all OSTs and now running a full e2fsck before bringing filesystem back online


          tyler.s.wiegers@lmco.com Tyler Wiegers (Inactive) added a comment -

          Sam,

          When we did the firmware upgrades we had taken down Lustre and rebooted every box to make sure it was all in a clean/unmounted state. We had 2 OSTs not mounting at that point, with this most recent problem popping up after the firmware upgrades. I'm not entirely convinced that the firmware upgrades actually caused this particular problem; we've been doing a lot to try to recover these OSTs.

          Andreas,

          I will get our guys looking at the mountdata file right now. Hopefully we'll have an indication of whether this action helps in an hour or so.

          Thank you all so much for your support!


          johann Johann Lombardi (Inactive) added a comment - - edited

          > Regarding the CAMs and drive upgrades, we have seen the corrupted OSTs before on the Riverwalks
          > (J4400's) when disk firmware was upgraded without both Lustre and the md software raid shutdown
          > cleanly first. Is there any chance that this particular OST10 was not cleanly shutdown? We saw
          > many cases of software RAID corruption on the J4400's a couple of years ago,

          Beyond the HW/firmware issues, there was also a corruption problem due to the mptsas driver,
          which could redirect I/Os to the wrong drive.

          The following comment from Sven explains how this bug was discovered:
          https://bugzilla.lustre.org/show_bug.cgi?id=21819#c27

          And the problem was fixed in the following bugzilla ticket:
          https://bugzilla.lustre.org/show_bug.cgi?id=22632

          However, it requires installing an extra package that includes the updated mptsas driver.

          Are you sure you are using an mptsas driver which does not suffer from the same issue?
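
          For what it is worth, one way to check which mptsas driver is currently loaded on the OSS nodes (no particular version is being recommended here):

          modinfo mptsas | egrep -i '^filename|^version'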

          > which was about the time early versions of 1.8 started to be used. There were several software
          > RAID corruption bugs that have since been fixed. Also, we have fixed many problems since the
          > early 1.8 releases in Lustre, so would encourage an upgrade to 1.8.5 at your earliest convenience.

          We indeed integrated several software RAID fixes in 1.8 (e.g. bugzilla 19990, 22509 & 20533).
          Although I don't think any of them fixed real software RAID corruption, it would still make
          sense to upgrade to 1.8.5 to benefit from those bug fixes, which address real deadlocks and oopses.
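
          A generic way to confirm the Lustre version actually running on each server, before and after any upgrade:

          cat /proc/fs/lustre/version
          rpm -qa | grep -i lustre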

          pjones Peter Jones added a comment -

          Thanks Sam. It is interesting to hear a PS perspective. I know that you were involved in a number of similar deployments. It will be interesting to hear the assessment from engineering about whether a Lustre issue is indeed involved here. Andreas, what do you think?

          adilger Andreas Dilger added a comment - - edited

          > LustreError: 25721:0:(obdmount.c:272:ldd_parse()) disk data size does not match: see 0 expect 12288

          This indicates that the CONFIGS/mountdata file is also corrupted (zero length file). It is possible to reconstruct this file by copying it from another OST and (unfortunately) binary editing the file. There are two fields that are unique to each OST that need to be modified.

          First, on an OSS node make a copy of this file from a working OST, say OST0001:

          OSS# debugfs -c -R "dump CONFIGS/mountdata /tmp/mountdata.ost01" {OST0001_dev}

          Now the mountdata.ost01 file needs to be edited to reflect that it is being used for OST0003. If you have a favorite binary editor, that could be used. I use "xxd" from the "vim-common" package to convert the file into ASCII for editing, and then to convert it back to binary.

          The important parts of the file are all at the beginning; the rest of the file is common to all OSTs:

          OSS# xxd /tmp/mountdata.ost01 /tmp/mountdata.ost01.asc
          OSS# vi /tmp/mountdata.ost01.asc

          0000000: 0100 d01d 0000 0000 0000 0000 0000 0000 ................
          0000010: 0200 0000 0200 0000 0100 0000 0100 0000 ................
          0000020: 6c75 7374 7265 0000 0000 0000 0000 0000 lustre..........
          0000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
          0000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
          0000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
          0000060: 6c75 7374 7265 2d4f 5354 3030 3031 0000 lustre-OST0001..
          0000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
          0000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
          0000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
          [snip]

          This is the "xxd" output showing a struct lustre_disk_data. The two fields that need to be edited are 0x0018 (ldd_svindex) and 0x0060 (ldd_svname).

          Edit the "0100" in the second row, fifth column to be "0300".
          Edit the "OST0001" line to be "OST0003":

          0000000: 0100 d01d 0000 0000 0000 0000 0000 0000 ................
          0000010: 0200 0000 0200 0000 0300 0000 0100 0000 ................
          0000020: 6c75 7374 7265 0000 0000 0000 0000 0000 lustre..........
          0000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
          0000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
          0000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
          0000060: 6c75 7374 7265 2d4f 5354 3030 3033 0000 lustre-OST0003..
          0000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
          0000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
          0000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................

          Save the file, and convert it back to binary:

          OSS# xxd -r /tmp/mountdata.ost01.asc /tmp/mountdata.ost03
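
          As an optional sanity check before copying the file back, the two files should be the same size and differ only in the index and the OST name:

          OSS# ls -l /tmp/mountdata.ost01 /tmp/mountdata.ost03
          OSS# xxd /tmp/mountdata.ost03 | head -7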

          Mount the OST0003 filesystem locally and copy this new file in place:

          OSS# mount -t ldiskfs {OST0003_dev} /mnt/lustre_ost03
          OSS# mv /mnt/lustre_ost03/CONFIGS/mountdata /mnt/lustre_ost03/CONFIGS/mountdata.broken
          OSS# cp /tmp/mountdata.ost03 /mnt/lustre_ost03/CONFIGS/mountdata
          OSS# umount /mnt/lustre_ost03

          The OST should now mount normally and identify itself as OST0003.
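
          If the Lustre utilities are installed on the OSS, the on-disk configuration can also be checked without mounting, as an optional verification step; the reported target name should now be lustre-OST0003:

          OSS# tunefs.lustre --print {OST0003_dev}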


          samb Sam Bigger (Inactive) added a comment -

          Regarding the CAMs and drive upgrades, we have seen corrupted OSTs before on the Riverwalks (J4400's) when disk firmware was upgraded without both Lustre and the md software RAID being shut down cleanly first. Is there any chance that this particular OST10 was not cleanly shut down? We saw many cases of software RAID corruption on the J4400's a couple of years ago, which was about the time early versions of 1.8 started to be used. There were several software RAID corruption bugs that have since been fixed. Also, we have fixed many problems in Lustre since the early 1.8 releases, so we would encourage an upgrade to 1.8.5 at your earliest convenience.

          If both Lustre and the MD device were shut down cleanly, then there should have been no problems like this. So, in that case, this would likely be a new bug that potentially still exists in the latest releases of Lustre.


          cliffw Cliff White (Inactive) added a comment -

          It would be best to open up a new bug; it is not good that you are having all these errors after your firmware upgrade.
          It would be a good idea to run fsck -fn on all your disks to see if you have any other issues.
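
          A minimal sketch of such a pass, assuming the OSTs are not mounted and that the md device names below are replaced with the real ones (-n keeps the check read-only):

          for dev in /dev/md30 /dev/md31; do
              e2fsck -fn "$dev" > /var/tmp/e2fsck.$(basename $dev).log 2>&1
          done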


          People

            Assignee: adilger Andreas Dilger
            Reporter: dferber Dan Ferber (Inactive)
            Votes: 0
            Watchers: 6
