[LU-6041] Recover OST specific directories after e2fsck Created: 17/Dec/14 Updated: 22/Dec/14 Resolved: 22/Dec/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jason Hill (Inactive) | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 16839 |
| Description |
|
A backend storage issue caused an OST to get mounted read-only. Remounting showed that the journal entry in the superblock was corrupt. The journal was removed and e2fsck continued; fixing every issue moved all entries into the lost+found directory (as seen when mounting the OST as ldiskfs). Next, the internal ext3 journal was added back using tune2fs -j /dev/sdd, and another e2fsck was run, which completed successfully. After running ll_recover_lost_found_objs, the object data appears to be back in place, but there are no CONFIGS, quota_slave, or REMOTE_PARENT_DIR entries. The oi.* entries are also missing, as are the health_check and last_rcvd files. Is there any way to recover these entries? In the e2fsck output we saw things that look like: At this time, e2fsck runs to completion but complains about block bitmap differences (lots of them) and "Free blocks count" (12519 of these messages). |
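The check for which of the entries listed above are actually absent can be scripted. The bash function below is a hypothetical helper, not part of Lustre or e2fsprogs. It assumes the OST is already mounted as ldiskfs at whatever directory you pass in. It also checks CONFIGS/mountdata, which is the file mount.lustre looks for.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Lustre tool): print, one per line, which of the
# OST-internal entries named in this ticket are missing under an OST that is
# mounted as ldiskfs at the directory given as $1 (an assumed mount point).
missing_ost_entries() {
    local root=$1 name
    # Fixed names: directories and files mentioned in the ticket.
    # CONFIGS/mountdata is the file mount.lustre needs to recognize the OST.
    for name in CONFIGS CONFIGS/mountdata quota_slave REMOTE_PARENT_DIR \
                health_check last_rcvd; do
        [ -e "$root/$name" ] || echo "$name"
    done
    # The OI files are a family (oi.*), so expand the glob. Without nullglob,
    # an unmatched glob stays literal and the -e test below fails.
    local oi=("$root"/oi.*)
    [ -e "${oi[0]}" ] || echo "oi.*"
}
```

For example, after mount -t ldiskfs -o ro /dev/sdd /mnt/ost (the /mnt/ost mount point is an assumption), running missing_ost_entries /mnt/ost prints each missing entry and nothing at all if everything is present.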
| Comments |
| Comment by Jason Hill (Inactive) [ 17/Dec/14 ] |
|
We're running the following versions of e2fsprogs: |
| Comment by Peter Jones [ 17/Dec/14 ] |
|
Jian, could you please advise? Thanks, Peter |
| Comment by Andreas Dilger [ 18/Dec/14 ] |
|
There are newer e2fsprogs-1.42.12-wc1 packages, but they won't fix the missing files. That said, the missing files should all be recreated when the OST is mounted again. Most of them are not needed on OSTs in any case. |
| Comment by Jason Hill (Inactive) [ 18/Dec/14 ] |
|
Andreas, something I didn't mention above (a grave oversight, my fault) is that mount -t lustre /dev/sdd /tmp/lustre/fs/ost11 fails with "This device hasn't been formatted by Lustre". |
| Comment by Jason Hill (Inactive) [ 18/Dec/14 ] |
|
Would something as simple as tunefs.lustre --writeconf (with the correct parameters) be sufficient to fix this? Or would it be better to copy the CONFIGS directory from another OST and then do the writeconf? |
| Comment by Jason Hill (Inactive) [ 18/Dec/14 ] |
|
[oss3 /]# mount -t lustre /dev/sdd /mnt
dmesg shows no errors.
[oss3 /]# dmesg | grep -i lustre |
| Comment by Andreas Dilger [ 18/Dec/14 ] |
|
Is the CONFIGS/ directory possibly still in lost+found? There shouldn't be much left there after ll_recover_lost_found_objs moved all the objects back to their respective directories. In particular, CONFIGS/mountdata is one file that isn't recreated automatically at mount time, because it contains the information on how to mount the filesystem and is what mount.lustre is looking for. We've recreated this file in the past by copying it from another OST and binary-editing the OST index (two places in struct lustre_disk_data: ldd_svname and ldd_svindex). It might be just as easy to create a small test filesystem on a test node, like:
OSTCOUNT=1 FSNAME={name} sh llmount.sh
touch /tmp/ostN
mkfs.lustre --ost --fsname={name} --mgsnode=$HOSTNAME --index={index of broken OST} --device-size=200000 /tmp/ostN
losetup /dev/loop4 /tmp/ostN
mkdir -p /mnt/ostN
mount -t lustre /dev/loop4 /mnt/ostN
umount /mnt/ostN
mount -t ldiskfs /dev/loop4 /mnt/ostN
cp /mnt/ostN/CONFIGS/mountdata /tmp
umount /mnt/ostN
|
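One way to answer the lost+found question is to search for the mountdata file by name. When e2fsck reconnects an orphaned inode, it names it #<inode number> in lost+found. A reconnected directory, however, keeps the names of its own entries, so a CONFIGS directory that lost its name would still contain a file called mountdata. The helper below is a hypothetical sketch built on that assumption. It will not find mountdata if the file itself was disconnected, since the file would then also have been renamed to #<inode number>.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Lustre tool): print the path of every regular
# file named "mountdata" under the lost+found directory given as $1.
# A hit such as lost+found/#12345/mountdata suggests that #12345 is the
# old CONFIGS/ directory, reconnected by e2fsck under its inode number.
find_lost_mountdata() {
    local lf=$1
    find "$lf" -type f -name mountdata
}
```

For example, find_lost_mountdata /mnt/ost/lost+found prints nothing if no reconnected directory contains a mountdata file. In that case, copying the file from a freshly formatted OST, as described above, remains the fallback. The /mnt/ost path is an assumed ldiskfs mount point.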
| Comment by Jason Hill (Inactive) [ 22/Dec/14 ] |
|
So I tried the copy-from-another-OST path: I modified CONFIGS/mountdata, renamed CONFIGS/sithfs-OST000a to CONFIGS/sithfs-OST000b, and then made the same modifications there as well. Now when mounting I get errors on stdout saying "bad MGS specification", and dmesg shows the following:
[root@sith-oss3 CONFIGS]# dmesg
Either I missed something or I need to remove something from the local (OST) CONFIGS directory, correct? It looks like Lustre is trying to do the right thing and copy the remote log down:
[333607.733782] LustreError: 3510:0:(mgc_request.c:1707:mgc_llog_local_copy()) MGC10.36.227.244@o2ib: failed to copy remote log sithfs-OST000b: rc = -5
Is it just a permission issue? (The permissions look the same as on the OST I copied the CONFIGS directory from.)
[root@sith-oss3 CONFIGS]# ls -la |
| Comment by Jason Hill (Inactive) [ 22/Dec/14 ] |
|
I removed the CONFIGS/sithfs-OST000b file and remounted as lustre, and it fetched the "correct" version from the MGS. Now to see whether e2fsck completely screwed up the data or not. Thanks for your help! |
| Comment by Andreas Dilger [ 22/Dec/14 ] |
|
The CONFIGS/$fsname-OSTnnnn file is unique to each OST, so copying it from another OST wouldn't help, as you saw. It is fetched from the MGS at mount time. |
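The naming and cleanup from this exchange can be sketched in bash. OST target names carry the index as four lowercase hex digits, which is why index 10 is sithfs-OST000a and index 11 is sithfs-OST000b. The functions below are hypothetical helpers, not Lustre tools. ost_svname builds the target name. drop_local_config_log removes that per-OST log from the local CONFIGS directory of an OST mounted as ldiskfs and leaves CONFIGS/mountdata in place, so the log is fetched from the MGS at the next mount -t lustre. It refuses to act if mountdata is absent, on the assumption that the wrong directory was given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an OST target name from fsname and decimal
# index, e.g. "sithfs" and 11 -> "sithfs-OST000b" (index as 4 hex digits).
ost_svname() {
    local fsname=$1 index=$2
    printf '%s-OST%04x\n' "$fsname" "$index"
}

# Hypothetical helper: remove the local copy of the per-OST config log so the
# MGS copy is used on the next lustre mount. $1 is assumed to be the ldiskfs
# mount point of the OST; CONFIGS/mountdata is never touched.
drop_local_config_log() {
    local root=$1 fsname=$2 index=$3 log
    if [ ! -f "$root/CONFIGS/mountdata" ]; then
        echo "no CONFIGS/mountdata under $root; not touching anything" >&2
        return 1
    fi
    log="$root/CONFIGS/$(ost_svname "$fsname" "$index")"
    rm -f -- "$log"
}
```

For the OST in this ticket, that would be drop_local_config_log /mnt/ost sithfs 11, followed by unmounting the ldiskfs mount and mounting again with -t lustre. The /mnt/ost path is an assumed ldiskfs mount point.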