[LU-15091] Trying to start OBD ls3-MDT0000_UUID using the wrong disk ls30000_UUID. Were the /dev/ assignments rearranged
Created: 12/Oct/21  Updated: 18/Oct/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Olaf Faaland
Assignee: Yang Sheng
Resolution: Unresolved
Votes: 0
Labels: llnl
Environment:

zfs-2.1.0_1llnl
lustre-2.14.0_5.llnl
4.18.0-305.7.1.1toss.t4.x86_64
rhel 8.4


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

After renaming a file system and updating NIDs on the targets, MDT0000 fails to mount with the following error:

LustreError: 157-3: Trying to start OBD ls3-MDT0000_UUID using the wrong disk ls30000_UUID. Were the /dev/ assignments rearranged?

Note that lsd->lsd_uuid is missing "-MDT" between the fs name ("ls3") and the MDT index ("0000").
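
To make the mismatch concrete, here is a minimal Python sketch that builds the UUID the mount expects and compares it with the one reported from disk. It assumes target UUIDs follow the pattern "<fsname>-<TYPE><index as 4 hex digits>_UUID", which is consistent with the expected value in the error above. The helpers expected_uuid and describe_mismatch are hypothetical names for illustration, not Lustre functions.

```python
def expected_uuid(fsname: str, target_type: str, index: int) -> str:
    """Build the UUID assumed for a target, e.g. ls3-MDT0000_UUID.

    The naming pattern is an assumption inferred from the error message.
    """
    return f"{fsname}-{target_type}{index:04x}_UUID"


def describe_mismatch(expected: str, on_disk: str) -> str:
    """Return a short description of how on_disk differs from expected."""
    if expected == on_disk:
        return "match"
    # Find the first position where the two strings diverge; if one is a
    # prefix of the other, report the length of the shorter string.
    pos = next(
        (i for i, (a, b) in enumerate(zip(expected, on_disk)) if a != b),
        min(len(expected), len(on_disk)),
    )
    return (
        f"differs at offset {pos}: "
        f"expected {expected[pos:]!r}, found {on_disk[pos:]!r}"
    )


want = expected_uuid("ls3", "MDT", 0)
have = "ls30000_UUID"  # value reported in the error message
print(want)
print(describe_mismatch(want, have))
```

The strings agree on the fs name and diverge immediately after it, which is where "-MDT" should appear.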

The rename was probably accomplished with:

tunefs.lustre --writeconf --fsname=ls3 --rename=lustre3 -v asp1/mdt1

And the NID update was probably accomplished with:

tunefs.lustre --param=mgsnode=172.19.1.141@o2ib100:172.19.1.142@o2ib100 --param=failover.node=172.19.1.141@o2ib100:172.19.1.142@o2ib100 asp1/mdt1

Unfortunately I no longer have the output from those commands, and I'm not certain exactly when this occurred.

This occurred on only one MDT out of 12 targets (4 MDTs and 8 OSTs). I don't know why this one was different.

I don't think this is enough information to find and fix the root cause, but I am creating the issue in the hope that it prompts anyone else who sees this problem to document what led up to it.



 Comments   
Comment by Olaf Faaland [ 12/Oct/21 ]

Peter, I didn't label this topllnl because there is insufficient information.

Comment by Olaf Faaland [ 13/Oct/21 ]

I am not certain, but it seems that the only problem was the file system name in last_recvd. I stopped all the targets, mounted the dataset as type zfs ("mount -t zfs asp1/mdt1 /mnt/foo"), used a hex editor to set the correct target name at offset 0 in /mnt/foo/last_recvd, and unmounted /mnt/foo. After that, the mount proceeded.
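
For anyone who would rather script the hex edit, the following Python is a rough sketch of the same idea, not a tested repair tool. It assumes the UUID is a NUL-padded 40-byte field at offset 0 of last_recvd; the field width is an assumption and should be checked against the Lustre headers for your version. The helpers read_uuid and write_uuid are hypothetical, and the demonstration runs against a scratch file rather than a real last_recvd.

```python
import os
import tempfile

UUID_FIELD_LEN = 40  # ASSUMED width of the UUID field at offset 0


def read_uuid(path: str) -> str:
    """Return the NUL-terminated string stored at offset 0 of the file."""
    with open(path, "rb") as f:
        raw = f.read(UUID_FIELD_LEN)
    return raw.split(b"\0", 1)[0].decode("ascii")


def write_uuid(path: str, uuid: str) -> None:
    """Overwrite the field at offset 0 with uuid, NUL-padded to full width."""
    data = uuid.encode("ascii")
    if len(data) >= UUID_FIELD_LEN:
        raise ValueError("UUID too long for the assumed field width")
    # Open for in-place update so bytes after the field are left untouched.
    with open(path, "r+b") as f:
        f.seek(0)
        f.write(data.ljust(UUID_FIELD_LEN, b"\0"))


# Demonstration on a scratch file that mimics the damaged field,
# followed by a few marker bytes standing in for the rest of the file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ls30000_UUID".ljust(UUID_FIELD_LEN, b"\0") + b"\xff" * 8)
    scratch = tmp.name

before = read_uuid(scratch)
write_uuid(scratch, "ls3-MDT0000_UUID")
after = read_uuid(scratch)
with open(scratch, "rb") as f:
    trailing = f.read()[UUID_FIELD_LEN:]
os.remove(scratch)

print(before, "->", after)
```

As with the manual edit, this would only be run with all targets stopped and the dataset mounted as type zfs, and ideally on a backup copy of last_recvd first.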

Comment by Peter Jones [ 13/Oct/21 ]

Yang Sheng

Any suggestions here?

Peter

Comment by Olaf Faaland [ 18/Oct/21 ]

For my reference, my local ticket is TOSS5317
