[LU-3682] "tunefs.lustre --erase_params" corrupts running MGS when run against device node symlink Created: 01/Aug/13 Updated: 31/Jan/22 |
|
| Status: | Reopened |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | John Spray (Inactive) | Assignee: | Emoly Liu |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | patch |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9506 |
| Description |
|
A script was erroneously running "tunefs.lustre --erase-params" against a running MGS. I would have expected this to refuse to run (similar to how mkfs.lustre refuses to run on a device that is a running target). Instead, it does run: the first and second times it appears to succeed, and after the third run the MGS appears to be corrupted.

After some experimentation I think this only happens when passing tunefs.lustre a device node symlink. On this system that looked like this:

# ls -l /dev/disk/by-id/scsi-1dev.target0
lrwxrwxrwx 1 root root 9 Jul 30 10:10 /dev/disk/by-id/scsi-1dev.target0 -> ../../sdb

Running against the /dev/sdb path seems to be safe (it exits with status 17, i.e. EEXIST, without writing anything):

# tunefs.lustre --erase-params /dev/sdb ; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: MGS
Index: unassigned
Lustre FS:
Mount type: ldiskfs
Flags: 0x74
(MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
Permanent disk data:
Target: MGS
Index: unassigned
Lustre FS:
Mount type: ldiskfs
Flags: 0x74
(MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
17
...while running against the symlink seems to be unsafe (note that it returns 0 twice, before returning an error code and junk output):

# tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: MGS
Index: unassigned
Lustre FS:
Mount type: ldiskfs
Flags: 0x74
(MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
Permanent disk data:
Target: MGS
Index: unassigned
Lustre FS:
Mount type: ldiskfs
Flags: 0x74
(MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
Writing CONFIGS/mountdata
0
[root@storage-0 log]# tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: MGS
Index: unassigned
Lustre FS:
Mount type: ldiskfs
Flags: 0x74
(MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
Permanent disk data:
Target: MGS
Index: unassigned
Lustre FS:
Mount type: ldiskfs
Flags: 0x74
(MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
Writing CONFIGS/mountdata
0
[root@storage-0 log]# tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target:
Index: 10
Lustre FS:
Mount type: h
Flags: 0
()
Persistent mount opts:
Parameters:�
tunefs.lustre FATAL: must set target type: MDT,OST,MGS
tunefs.lustre: exiting with 22 (Invalid argument)
22

At the least, this tool should resolve symlinks to prevent running against a running target. Ideally, it would also use multi-mount protection to be safe even when run from a different server while the target is mounted elsewhere. |
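For illustration, one mtab-independent guard exists at the kernel level: on Linux 2.6 and later, opening a block device with O_EXCL fails with EBUSY while the device is in use (e.g. mounted), no matter which path or symlink was used to reach it. A minimal standalone sketch of such a check follows; this is a hypothetical guard, not how tunefs.lustre actually implements its check:

/* Hypothetical guard, not the actual tunefs.lustre check: on Linux,
 * an O_EXCL open of a block device fails with EBUSY while the device
 * is in use (e.g. mounted), regardless of the path or symlink used. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <block-device>\n", argv[0]);
		return 2;
	}
	fd = open(argv[1], O_RDONLY | O_EXCL);
	if (fd < 0 && errno == EBUSY) {
		fprintf(stderr, "%s is in use, refusing to touch it\n", argv[1]);
		return 1;
	}
	if (fd >= 0)
		close(fd);
	printf("%s appears to be unused\n", argv[1]);
	return 0;
}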
| Comments |
| Comment by Emoly Liu [ 13/Aug/13 ] |
|
John, could you please provide your mkfs options for the MGS device? I tried many times on my local machine but still can't reproduce this failure; it always returned 0 whether I ran "tunefs.lustre --erase-params" against the symlink or not. BTW, according to the tunefs.lustre manual, "changes made here will affect a filesystem only when the target is next mounted", so it should be OK to run it against a running target. |
| Comment by John Spray (Inactive) [ 13/Aug/13 ] |
|
The MGS was created with no fancy options (by IML), pretty much just:

mkfs.lustre --mgs /dev/disk/by-id/<foo> --failnode XYZ

I did find that simply creating and then tunefs-ing the MGS didn't reproduce the corruption: it was happening when I had some MDTs and OSTs as well, and I had recently been doing writeconfs on those. I don't have a more specific set of steps than that, I'm afraid. |
| Comment by John Spray (Inactive) [ 13/Aug/13 ] |
|
I was chatting with Johann before opening this ticket about whether the target should be shut down first:

[8/1/13 12:00:03 PM] John Spray: is tunefs.lustre meant to be safe to run against a mounted target?
[8/1/13 12:00:30 PM] John Spray: (when doing things like --writeconf and --erase_params?)
[8/1/13 12:03:40 PM] John Spray: the writeconf procedure does ask one to stop the MDT + OST before doing it, but elsewhere in the manual (e.g. in "Changing the Address of a Failover Node") it doesn't say one way or the other.
[8/1/13 12:41:04 PM] Johann Lombardi: john: writeconf definitely requires to shut down the target. FYI, there is also a mount option to trigger a writeconf.
|
| Comment by Emoly Liu [ 14/Aug/13 ] |
|
Can you tell me your tunefs.lustre version? Your test showed flags=0x74 (MGS needs_index first_time update), but the index option is invalid for an MGS.
In my test, mkfs/tunefs.lustre is v2.4.90 and all of the MGS/MDT/OSTs are running. I tried different options and it seems that "--erase-params" fails only when the --failnode option is applied. It looks like this:

[root@centos6-3 tests]# tunefs.lustre --erase-params /tmp/lustre-mgs; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: MGS
Index: unassigned
Lustre FS:
Mount type: ldiskfs
Flags: 0x4
(MGS )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: failover.node=10.211.55.5@tcp
Permanent disk data:
Target: MGS
Index: unassigned
Lustre FS:
Mount type: ldiskfs
Flags: 0x44
(MGS update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
tunefs.lustre: Unable to mount /dev/loop3: Invalid argument
tunefs.lustre FATAL: failed to write local files
tunefs.lustre: exiting with 22 (Invalid argument)
22
But in your test (the output I quoted), failnode was not set when the failure happened, so it would be very helpful if you could reproduce it again and tell me how. I also found a small bug in mkfs_lustre.c, but it should not cause this failure. |
| Comment by John Spray (Inactive) [ 19/Aug/13 ] |
|
Reproduced with latest master: running tunefs.lustre on a mounted MGS is broken.

My environment is two CentOS 6.4 virtual machines (called storage-0 and storage-1) using iSCSI storage devices; storage-0 is 172.16.252.175@tcp0.

Here's how the filesystem is created:

[storage_1] sudo: mkfs.lustre --mdt --index=0 --fsname=test0 --mgsnode=172.16.252.175@tcp --failnode=172.16.252.175@tcp /dev/disk/by-id/scsi-1dev.target1
[storage_0] sudo: mkfs.lustre --mgs --failnode=172.16.252.176@tcp /dev/disk/by-id/scsi-1dev.target0
[storage_1] sudo: mkfs.lustre --ost --index=1 --fsname=test0 --mgsnode=172.16.252.175@tcp --failnode=172.16.252.175@tcp /dev/disk/by-id/scsi-1dev.target3
[storage_0] sudo: mkfs.lustre --ost --index=0 --fsname=test0 --mgsnode=172.16.252.175@tcp --failnode=172.16.252.176@tcp /dev/disk/by-id/scsi-1dev.target2
[storage_0] sudo: mount -t lustre /dev/disk/by-id/scsi-1dev.target0 /mnt/lustre/test0-MGS
[storage_1] sudo: mount -t lustre /dev/disk/by-id/scsi-1dev.target1 /mnt/lustre/test0-MDT0000
[storage_0] sudo: mount -t lustre /dev/disk/by-id/scsi-1dev.target2 /mnt/lustre/test0-OST0000
[storage_1] sudo: mount -t lustre /dev/disk/by-id/scsi-1dev.target3 /mnt/lustre/test0-OST0001

Then, while the filesystem is running, here's me reproducing the bug:

[vagrant@storage-0 ~]$ ls -l /dev/disk/by-id/scsi-1dev.target0
lrwxrwxrwx 1 root root 9 Aug 19 13:06 /dev/disk/by-id/scsi-1dev.target0 -> ../../sdb
[vagrant@storage-0 ~]$ sudo tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: MGS
Index: unassigned
Lustre FS:
Mount type: ldiskfs
Flags: 0x4
(MGS )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: failover.node=172.16.252.176@tcp
Permanent disk data:
Target: MGS
Index: unassigned
Lustre FS:
Mount type: ldiskfs
Flags: 0x44
(MGS update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
Writing CONFIGS/mountdata
0
[vagrant@storage-0 ~]$ sudo tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: MGS
Index: unassigned
Lustre FS:
Mount type: ldiskfs
Flags: 0x44
(MGS update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
Permanent disk data:
Target: MGS
Index: unassigned
Lustre FS:
Mount type: ldiskfs
Flags: 0x44
(MGS update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:
Writing CONFIGS/mountdata
0
[vagrant@storage-0 ~]$ sudo tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Read previous values:
Target: MGS
Index: 0
Lustre FS:
Mount type: ext3
Flags: 0
()
Persistent mount opts:
Parameters:
tunefs.lustre FATAL: must set target type: MDT,OST,MGS
tunefs.lustre: exiting with 22 (Invalid argument)
22 |
| Comment by Emoly Liu [ 22/Aug/13 ] |
|
John, I can reproduce this problem following your steps. I will investigate it. |
| Comment by Emoly Liu [ 22/Aug/13 ] |
|
The root cause of this problem is in check_mtab_entry(). In this check, the path of the running SCSI device returns EEXIST, but the symlink pointing to the same device wrongly passes, because the symlink path is compared against the /etc/mtab entries without being resolved first. I will fix it. |
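For reference, a minimal sketch of a symlink-aware version of such a check, using only standard glibc calls (an illustration of the idea, not the actual check_mtab_entry() code): canonicalize both the user-supplied path and each mtab device entry with realpath(3) before comparing.

#include <limits.h>
#include <mntent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if 'spec' (possibly a symlink) refers to a mounted device. */
static int device_is_mounted(const char *spec)
{
	char want[PATH_MAX], have[PATH_MAX];
	struct mntent *mnt;
	FILE *fp;
	int found = 0;

	if (realpath(spec, want) == NULL)
		return 0;	/* cannot resolve; treat as not mounted */

	fp = setmntent("/etc/mtab", "r");
	if (fp == NULL)
		return 0;
	while ((mnt = getmntent(fp)) != NULL) {
		/* The mtab entry may itself be a symlink; resolve it too. */
		if (realpath(mnt->mnt_fsname, have) != NULL &&
		    strcmp(want, have) == 0) {
			found = 1;
			break;
		}
	}
	endmntent(fp);
	return found;
}

With both sides canonicalized, /dev/sdb and /dev/disk/by-id/scsi-1dev.target0 compare equal, so tunefs.lustre would return EEXIST for either spelling of the device.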
| Comment by Emoly Liu [ 23/Aug/13 ] |
|
patch for master is at http://review.whamcloud.com/7433 |
| Comment by Emoly Liu [ 27/Aug/13 ] |
|
A similar issue happens with a running loop device. I am working on the patch to fix loop device path resolution and to prevent tunefs.lustre from running against an in-use device. |
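The loop case needs more than realpath(): /etc/mtab may list the backing file (as in the /tmp/lustre-mgs example above) while the device actually being written is /dev/loopN, or vice versa. Below is a hypothetical helper that maps a loop device back to its backing file via the standard LOOP_GET_STATUS64 ioctl; again an illustration of the idea, not the actual patch (note that the kernel truncates lo_file_name to LO_NAME_SIZE bytes):

#include <fcntl.h>
#include <linux/loop.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Copy the backing file path of 'loopdev' into 'buf'; 0 on success. */
static int loop_backing_file(const char *loopdev, char *buf, size_t len)
{
	struct loop_info64 info;
	int fd, rc = -1;

	fd = open(loopdev, O_RDONLY);
	if (fd < 0)
		return -1;
	memset(&info, 0, sizeof(info));
	if (ioctl(fd, LOOP_GET_STATUS64, &info) == 0) {
		/* lo_file_name holds the path of the backing file. */
		snprintf(buf, len, "%s", (char *)info.lo_file_name);
		rc = 0;
	}
	close(fd);
	return rc;
}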
| Comment by Peter Jones [ 20/Sep/13 ] |
|
Landed for 2.5.0 |
| Comment by Peter Jones [ 24/Sep/13 ] |
|
Patch reverted due to regression |
| Comment by Emoly Liu [ 25/Sep/13 ] |
|
I resubmitted a patch at http://review.whamcloud.com/#/c/7754 . I hope this time it will fix the problem thoroughly without causing the ZFS failure. |