> LustreError: 25721:0:(obdmount.c:272:ldd_parse()) disk data size does not match: see 0 expect 12288
This indicates that the CONFIGS/mountdata file is also corrupted (zero length file). It is possible to reconstruct this file by copying it from another OST and (unfortunately) binary editing the file. There are two fields that are unique to each OST that need to be modified.
First, on an OSS node make a copy of this file from a working OST, say OST0001:
OSS# debugfs -c -R "dump CONFIGS/mountdata /tmp/mountdata.ost01" {OST0001_dev}
Now the mountdata.ost01 file needs to be edited to reflect that it is being used for OST0003. Any binary editor you prefer will work. I use "xxd" from the "vim-common" package to convert the file into ASCII for editing, and then convert it back to binary.
The important parts of the file are all at the beginning; the rest of the file is common to all OSTs:
OSS# xxd /tmp/mountdata.ost01 /tmp/mountdata.ost01.asc
OSS# vi /tmp/mountdata.ost01.asc
0000000: 0100 d01d 0000 0000 0000 0000 0000 0000 ................
0000010: 0200 0000 0200 0000 0100 0000 0100 0000 ................
0000020: 6c75 7374 7265 0000 0000 0000 0000 0000 lustre..........
0000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000060: 6c75 7374 7265 2d4f 5354 3030 3031 0000 lustre-OST0001..
0000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
[snip]
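This is the "xxd" output showing a struct lustre_disk_data. The two fields that need to be edited are at offsets 0x0018 (ldd_svindex) and 0x0060 (ldd_svname).
Edit the "0100" in the second row, fifth column to be "0300".
On the "lustre-OST0001" line, change the name to "OST0003" by editing the last hex group "3031" to "3033". The hex columns are what "xxd -r" converts back, so changing only the ASCII column on the right has no effect: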
0000000: 0100 d01d 0000 0000 0000 0000 0000 0000 ................
0000010: 0200 0000 0200 0000 0300 0000 0100 0000 ................
0000020: 6c75 7374 7265 0000 0000 0000 0000 0000 lustre..........
0000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000060: 6c75 7374 7265 2d4f 5354 3030 3033 0000 lustre-OST0003..
0000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
Save the file, and convert it back to binary:
OSS# xxd -r /tmp/mountdata.ost01.asc /tmp/mountdata.ost03
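If hand-editing the hex dump feels error-prone, the same two changes can be scripted. The following is only a sketch, not a supported tool: patch_mountdata is a hypothetical helper, it treats ldd_svindex as a 32-bit little-endian value at 0x18 (which is how it appears in the dump above), and it assumes the name field covers the 64 bytes from 0x60 to 0x9f shown in the dump. It refuses to run if the new name is not the same length as the old one, so no stale bytes are left behind.

```shell
# Hypothetical helper: copy a known-good mountdata file and patch the two
# per-OST fields. Offsets are taken from the xxd dump; nothing else is touched.
patch_mountdata() {
    src=$1; dst=$2; new_name=$3; new_index=$4

    # Read the current name from 0x60..0x9f (assumed field extent) and
    # require the replacement to have the same length.
    cur_name=$(dd if="$src" bs=1 skip=96 count=64 2>/dev/null | tr -d '\000')
    if [ "${#cur_name}" -ne "${#new_name}" ]; then
        echo "name length differs: '$cur_name' vs '$new_name'" >&2
        return 1
    fi

    cp "$src" "$dst" || return 1

    # Build a printf format holding the index as four little-endian bytes,
    # e.g. index 3 becomes the string \003\000\000\000.
    fmt=$(printf '\\%03o\\%03o\\%03o\\%03o' \
        $(( new_index & 255 )) $(( (new_index >> 8) & 255 )) \
        $(( (new_index >> 16) & 255 )) $(( (new_index >> 24) & 255 )))

    # 0x18 = 24 (ldd_svindex); conv=notrunc keeps the rest of the file intact.
    printf "$fmt" | dd of="$dst" bs=1 seek=24 conv=notrunc 2>/dev/null || return 1

    # 0x60 = 96 (ldd_svname).
    printf '%s' "$new_name" | dd of="$dst" bs=1 seek=96 conv=notrunc 2>/dev/null
}
```

With the file names used in this thread, the call would be: patch_mountdata /tmp/mountdata.ost01 /tmp/mountdata.ost03 lustre-OST0003 3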
Mount the OST0003 filesystem locally and copy this new file in place:
OSS# mount -t ldiskfs {OST0003_dev} /mnt/lustre_ost03
OSS# mv /mnt/lustre_ost03/CONFIGS/mountdata /mnt/lustre_ost03/CONFIGS/mountdata.broken
OSS# cp /tmp/mountdata.ost03 /mnt/lustre_ost03/CONFIGS/mountdata
OSS# umount /mnt/lustre_ost03
The OST should now mount normally and identify itself as OST0003.
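As a sanity check, the two fields can be read back from a mountdata file (the edited copy in /tmp, or one dumped again with debugfs). This is a rough sketch under the same assumptions as the dump above: show_mountdata_ids is a hypothetical helper name, the index is taken as four little-endian bytes at 0x18, and the name is read from the 64 bytes starting at 0x60.

```shell
# Hypothetical helper: print "<index> <name>" for a mountdata file.
show_mountdata_ids() {
    file=$1

    # Read the four bytes at 0x18 as unsigned decimals, low byte first.
    # A zero-length (corrupted) file yields no bytes and is rejected.
    set -- $(od -An -tu1 -j24 -N4 "$file")
    [ $# -eq 4 ] || { echo "cannot read index from $file" >&2; return 1; }
    index=$(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))

    # Read the name field at 0x60 and drop the NUL padding.
    name=$(dd if="$file" bs=1 skip=96 count=64 2>/dev/null | tr -d '\000')

    printf '%s %s\n' "$index" "$name"
}
```

Running show_mountdata_ids /tmp/mountdata.ost03 on a correctly edited file should report index 3 together with the name lustre-OST0003; anything else means the edit went to the wrong offset.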
Thanks, Andreas.
Where we are right now is that all the OSTs can be mounted, but Lustre itself cannot be mounted successfully.
After having issues initially, we shut down all of our Lustre clients and cleanly rebooted all of our OSSs and MDSs. After bringing all the OSTs up, two OSTs (11 and 15) stayed in a "recovering" state that never finished (they were still recovering about 15 minutes after bringing up the client). We used lctl to abort recovery and attempted mounting, which appeared to be successful. However, running df on /lustre after that causes a segmentation fault.
Additionally, running lfs df throws the following error when it gets to OST11:
error: llapi_obd_statfs failed: Bad address (-14)
Running lctl dl on a client shows all the OSTs as "UP", but the last number on each line is different for OST11 and OST15 (it's 5 for all other OSTs, 4 for OST11/15).
The MDSs also show all the OSTs as "UP", but there the last number is 5 for every OST.
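For comparing the "lctl dl" output across nodes, a small filter can pick out the devices whose last number differs from the rest. This is only a sketch: odd_last_column is a hypothetical name, and it assumes each line has the device name in the fourth column and the number in question in the last column, which matches the output we are seeing.

```shell
# Hypothetical filter: read "lctl dl" style lines on stdin and print the
# name and last-column value of every device whose last column differs
# from the most common value.
odd_last_column() {
    awk '
        NF >= 4 {
            n++
            name[n] = $4      # device name
            last[n] = $NF     # trailing number on the line
            seen[$NF]++
        }
        END {
            common = ""
            for (v in seen)
                if (common == "" || seen[v] > seen[common])
                    common = v
            for (i = 1; i <= n; i++)
                if (last[i] != common)
                    print name[i], last[i]
        }'
}
```

On a client this would be run as: lctl dl | odd_last_column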