> LustreError: 25721:0:(obdmount.c:272:ldd_parse()) disk data size does not match: see 0 expect 12288
This indicates that the CONFIGS/mountdata file is also corrupted (zero length file). It is possible to reconstruct this file by copying it from another OST and (unfortunately) binary editing the file. There are two fields that are unique to each OST that need to be modified.
First, on an OSS node make a copy of this file from a working OST, say OST0001:
OSS# debugfs -c -R "dump CONFIGS/mountdata /tmp/mountdata.ost01" {OST0001_dev}
Now the mountdata.ost01 file needs to be edited to reflect that it is being used for OST0003. Any binary editor will do. I use "xxd" from the "vim-common" package to convert the file into an ASCII hex dump, edit that, and then convert it back to binary.
The important parts of the file are all at the beginning; the rest of the file is common to all OSTs:
OSS# xxd /tmp/mountdata.ost01 /tmp/mountdata.ost01.asc
OSS# vi /tmp/mountdata.ost01.asc
0000000: 0100 d01d 0000 0000 0000 0000 0000 0000 ................
0000010: 0200 0000 0200 0000 0100 0000 0100 0000 ................
0000020: 6c75 7374 7265 0000 0000 0000 0000 0000 lustre..........
0000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000060: 6c75 7374 7265 2d4f 5354 3030 3031 0000 lustre-OST0001..
0000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
[snip]
This is the "xxd" output showing a struct lustre_disk_data. The two fields that need to be edited are at offsets 0x0018 (ldd_svindex) and 0x0060 (ldd_svname).
Edit the "0100" in the second row, fifth column to be "0300".
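In the "lustre-OST0001" row (offset 0x0060), change the hex bytes "3031" to "3033" so that the name reads "OST0003". Note that "xxd -r" reads only the hex columns, so changing just the ASCII text on the right has no effect: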
0000000: 0100 d01d 0000 0000 0000 0000 0000 0000 ................
0000010: 0200 0000 0200 0000 0300 0000 0100 0000 ................
0000020: 6c75 7374 7265 0000 0000 0000 0000 0000 lustre..........
0000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000060: 6c75 7374 7265 2d4f 5354 3030 3033 0000 lustre-OST0003..
0000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
Save the file, and convert it back to binary:
OSS# xxd -r /tmp/mountdata.ost01.asc /tmp/mountdata.ost03
Mount the OST0003 filesystem locally and copy this new file in place:
OSS# mount -t ldiskfs {OST0003_dev} /mnt/lustre_ost03
OSS# mv /mnt/lustre_ost03/CONFIGS/mountdata /mnt/lustre_ost03/CONFIGS/mountdata.broken
OSS# cp /tmp/mountdata.ost03 /mnt/lustre_ost03/CONFIGS/mountdata
OSS# umount /mnt/lustre_ost03
The OST should now mount normally and identify itself as OST0003.
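If hand-editing the hex dump feels error-prone, the same two changes can be scripted. The following is only a rough sketch, not a Lustre tool: "patch_mountdata" is a hypothetical helper name, and it assumes the layout shown in the dump above (ldd_svindex as a 4-byte little-endian integer at 0x18, ldd_svname as a NUL-padded string at 0x60). The 64-byte length of the name field is my assumption and should be checked against the struct definition for your Lustre version.

```shell
#!/bin/sh
# Sketch: copy a good mountdata file and patch the two per-OST fields.
# Assumed layout: ldd_svindex = 4 bytes little-endian at offset 0x18 (24),
#                 ldd_svname  = NUL-padded string at offset 0x60 (96),
#                 assumed to be 64 bytes long.
# usage: patch_mountdata SRC DST INDEX SVNAME
patch_mountdata() {
    src=$1 dst=$2 index=$3 svname=$4

    # Only the low two bytes of the index are written below.
    [ "$index" -ge 0 ] && [ "$index" -le 65535 ] || return 1
    # The name must fit the (assumed) 64-byte field with a trailing NUL.
    [ "${#svname}" -lt 64 ] || return 1

    cp "$src" "$dst" || return 1

    # ldd_svindex: low byte first, upper two bytes zero.
    lo=$(printf '%03o' $(( index & 255 )))
    hi=$(printf '%03o' $(( (index >> 8) & 255 )))
    printf "\\$lo\\$hi\\000\\000" |
        dd of="$dst" bs=1 seek=24 conv=notrunc 2>/dev/null || return 1

    # ldd_svname: clear the old name, then write the new one over it.
    dd if=/dev/zero of="$dst" bs=1 seek=96 count=64 conv=notrunc 2>/dev/null || return 1
    printf '%s' "$svname" |
        dd of="$dst" bs=1 seek=96 conv=notrunc 2>/dev/null
}
```

For the case in this bug that would be "patch_mountdata /tmp/mountdata.ost01 /tmp/mountdata.ost03 3 lustre-OST0003". Passing the service name explicitly avoids guessing how the index is formatted in the name.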
You are correct - my sincere apologies. I was counting 2-byte fields starting in the second row instead of 4-byte fields starting in the first row. I've corrected the instructions in this bug in case they are re-used for similar problems in the future. We've discussed in the past having a tool to repair this file automatically in case of corruption, and this issue underscores the need for one.
It looks like you (correctly) modified the 5th column, so all is well and no further action is needed.
It looks like you couldn't have modified the 7th column, or the OST would have failed to mount. I did an audit of the code to see what is using these fields (the correct ldd_svindex field and the incorrect ldd_mount_type field). I found that the ldd_svindex field is only used in case the configuration database on the MGS is rewritten (due to --writeconf) and the OST is reconnecting to the MGS to recreate the configuration record. The ldd_mount_type field is used to determine the backing filesystem type (usually "ldiskfs" for type = 0x0001, but would have been "reiserfs" with type = 0x0003).
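To confirm which column actually got changed, both fields can be decoded from a dumped copy of the file. This is a rough sketch under the same layout assumptions as the dump above (0x18 for ldd_svindex, 0x1c for ldd_mount_type, 0x60 for ldd_svname). "read_le32" and "show_mountdata" are hypothetical helper names, and the 64-byte name length is my assumption.

```shell
#!/bin/sh
# Sketch: decode the per-OST fields from a dumped mountdata file.

# Print the 4-byte little-endian integer at a decimal byte offset.
read_le32() {
    file=$1 offset=$2
    # od prints the four bytes as unsigned decimals, e.g. "3 0 0 0".
    set -- $(od -An -tu1 -j"$offset" -N4 "$file" 2>/dev/null)
    [ $# -eq 4 ] || return 1        # file too short (e.g. zero length)
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# usage: show_mountdata FILE
show_mountdata() {
    idx=$(read_le32 "$1" 24) || return 1      # ldd_svindex at 0x18
    mtype=$(read_le32 "$1" 28) || return 1    # ldd_mount_type at 0x1c
    # ldd_svname at 0x60, with the NUL padding stripped.
    name=$(dd if="$1" bs=1 skip=96 count=64 2>/dev/null | tr -d '\000')
    echo "svindex=$idx mount_type=$mtype svname=$name"
}
```

Run against the corrected file from this bug, "show_mountdata /tmp/mountdata.ost03" should report svindex=3 with mount_type still 1. A zero-length file makes the helper return non-zero instead of printing anything.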
If you want to be a bit safer in the future, you could use the "debugfs" command posted earlier to dump this file from all of the OSTs (it can safely be done while the OST is mounted) and save them to a safe location.
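Something along these lines would do it for all OSTs on one OSS. It is just a sketch: "backup_mountdata" is a hypothetical helper name, the device paths in the usage note are placeholders, and the output naming scheme is my own choice. It runs the same "debugfs -c -R" dump command as above once per device.

```shell
#!/bin/sh
# Sketch: save CONFIGS/mountdata from each OST device into one directory.
# Set DEBUGFS=echo to preview the commands without touching any device.
# usage: backup_mountdata DESTDIR DEVICE...
backup_mountdata() {
    destdir=$1
    shift
    mkdir -p "$destdir" || return 1
    for dev in "$@"; do
        # Name each copy after its device node, e.g. /dev/sdb -> mountdata.sdb
        out="$destdir/mountdata.$(basename "$dev")"
        ${DEBUGFS:-debugfs} -c -R "dump CONFIGS/mountdata $out" "$dev" || return 1
    done
}
```

For example, "backup_mountdata /root/mountdata-backup /dev/sdb /dev/sdc" (placeholder device names), after which the directory should be copied off the OSS node.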
Again, apologies for the mixup.