[LU-15091] Trying to start OBD ls3-MDT0000_UUID using the wrong disk ls30000_UUID. Were the /dev/ assignments rearranged

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.14.0
    • Environment: zfs-2.1.0_1llnl
      lustre-2.14.0_5.llnl
      4.18.0-305.7.1.1toss.t4.x86_64
      rhel 8.4
    • Severity: 3

    Description

      After renaming a file system and updating NIDs on the targets, MDT0000 fails to mount with the following error:

      LustreError: 157-3: Trying to start OBD ls3-MDT0000_UUID using the wrong disk ls30000_UUID. Were the /dev/ assignments rearranged?
      

      Note that lsd->lsd_uuid is missing "-MDT" between the fs name ("ls3") and the MDT index ("0000").
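For comparison, Lustre target names normally take the form fsname-TYPEindex, with the index written as four hex digits, and the target UUID is that name with a _UUID suffix. The following is a minimal shell sketch of that convention, not code from Lustre itself; the function names expected_uuid and check_uuid are hypothetical.

```shell
# Hypothetical helper: build the UUID a target is expected to carry,
# assuming the usual <fsname>-<TYPE><4-hex-digit index>_UUID convention.
expected_uuid() {
    fsname=$1   # e.g. ls3
    ttype=$2    # MDT or OST
    index=$3    # decimal target index
    printf '%s-%s%04x_UUID\n' "$fsname" "$ttype" "$index"
}

# Compare the expected UUID with the one found on disk (for example, the
# second UUID quoted in the LustreError 157-3 message).
check_uuid() {
    want=$(expected_uuid "$1" "$2" "$3")
    found=$4
    if [ "$want" = "$found" ]; then
        echo "match: $found"
    else
        echo "mismatch: expected $want, found $found"
        return 1
    fi
}
```

Called as check_uuid ls3 MDT 0 ls30000_UUID, this reports a mismatch against ls3-MDT0000_UUID, which is the same discrepancy the mount error describes.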

      The rename was probably accomplished with:

      tunefs.lustre --writeconf --fsname=ls3 --rename=lustre3 -v asp1/mdt1
      

      And the NID update was probably accomplished with:

      tunefs.lustre --param=mgsnode=172.19.1.141@o2ib100:172.19.1.142@o2ib100 --param=failover.node=172.19.1.141@o2ib100:172.19.1.142@o2ib100 asp1/mdt1
      

      Unfortunately I no longer have the output from those commands, and I'm not certain exactly when this occurred.

      This occurred on only one MDT out of 12 targets (4 MDTs and 8 OSTs). I don't know why this one was different.

      I don't think this is enough information to find the root cause and fix it, but I am creating the issue in the hope that it prompts anyone else who sees this problem to document what led up to it.
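One way to keep that kind of record is to run each tunefs.lustre invocation through a small logging wrapper. The sketch below is only an illustration: the function name run_logged and the default log path are hypothetical, and nothing here is specific to Lustre.

```shell
# Hypothetical wrapper: record the command line, its combined output and its
# exit status in a log file, while still showing the output on the terminal.
LOGFILE=${LOGFILE:-/var/tmp/tunefs-history.log}   # assumed location

run_logged() {
    # Header line: UTC timestamp plus the exact command being run.
    printf '### %s: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$LOGFILE"
    # Capture stdout and stderr together and remember the exit status.
    out=$("$@" 2>&1); rc=$?
    printf '%s\n' "$out" | tee -a "$LOGFILE"
    printf '### exit status: %s\n' "$rc" >> "$LOGFILE"
    return "$rc"
}
```

For example, run_logged tunefs.lustre --writeconf --fsname=ls3 --rename=lustre3 -v asp1/mdt1 would leave the verbose output in the log file for later comparison between targets.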


        Activity


          ofaaland Olaf Faaland added a comment -

          For my reference, my local ticket is TOSS5317.
          pjones Peter Jones added a comment -

          Yang Sheng

          Any suggestions here?

          Peter

          ofaaland Olaf Faaland added a comment - edited

          I am not certain, but it seems the only problem was the target name stored in last_recvd. I stopped all the targets, mounted the dataset as type zfs ("mount -t zfs asp1/mdt1 /mnt/foo"), used a hex editor to set the correct target name at offset 0 of /mnt/foo/last_recvd, and unmounted /mnt/foo. That allowed the mount to proceed.
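The hex-editor step can be sketched in shell. This assumes the server UUID occupies a fixed 40-byte, NUL-padded field at offset 0 of last_recvd (the lsd_uuid member of struct lr_server_data); that field size is an assumption and should be checked against the Lustre source for the version in use. The function names are hypothetical, and the sketch is meant to be run only with all targets stopped.

```shell
LSD_UUID_LEN=40   # assumed size of the lsd_uuid field at offset 0

# Print the UUID currently stored at the start of a last_recvd file.
read_lsd_uuid() {
    head -c "$LSD_UUID_LEN" "$1" | tr -d '\000'
    echo
}

# Overwrite the UUID field in place, leaving the rest of the file untouched.
write_lsd_uuid() {
    file=$1
    uuid=$2
    # Refuse names that would not fit together with a terminating NUL.
    [ "${#uuid}" -lt "$LSD_UUID_LEN" ] || { echo "uuid too long" >&2; return 1; }
    cp -p "$file" "$file.bak" || return 1   # keep a backup next to the file
    # Zero the whole field first so no stale bytes remain, then write the name.
    dd if=/dev/zero of="$file" bs=1 count="$LSD_UUID_LEN" conv=notrunc 2>/dev/null &&
    printf '%s' "$uuid" | dd of="$file" bs=1 conv=notrunc 2>/dev/null
}
```

With the dataset mounted at /mnt/foo, read_lsd_uuid /mnt/foo/last_recvd shows the stored name, and write_lsd_uuid /mnt/foo/last_recvd ls3-MDT0000_UUID replaces it. The conv=notrunc option keeps dd from truncating the rest of the file.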

          ofaaland Olaf Faaland added a comment -

          Peter, I didn't label this topllnl because of the insufficient information.


          People

            Assignee: ys Yang Sheng
            Reporter: ofaaland Olaf Faaland
            Votes: 0
            Watchers: 2
