[LU-3682] "tunefs.lustre --erase_params" corrupts running MGS when run against device node symlink Created: 01/Aug/13  Updated: 31/Jan/22

Status: Reopened
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: John Spray (Inactive) Assignee: Emoly Liu
Resolution: Unresolved Votes: 0
Labels: patch

Issue Links:
Related
is related to LU-3991 mkfs.lustre failed to copy pool name Resolved
is related to LU-3768 "tunefs.lustre: '----index' only vali... Closed
Severity: 3
Rank (Obsolete): 9506

 Description   

A script was erroneously running tunefs.lustre --erase_params against a running MGS. I would have expected it to refuse to run (similar to how mkfs.lustre refuses to run on a device that is a running target). Instead, it runs: the first and second invocations appear to succeed, but after the third run the MGS appears to be corrupted.

After some experimentation, I believe this only happens when tunefs.lustre is passed a symlink to the device node. On this system the symlink looked like this:

# ls -l /dev/disk/by-id/scsi-1dev.target0
lrwxrwxrwx 1 root root 9 Jul 30 10:10 /dev/disk/by-id/scsi-1dev.target0 -> ../../sdb

Running against the /dev/sdb path directly is safe; the tool refuses with exit status 17 (EEXIST):

# tunefs.lustre --erase-params /dev/sdb ; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x74
              (MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:


   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x74
              (MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

17

...while running against the symlink is unsafe (note that it exits 0 twice before failing with an error code and junk output):

# tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x74
              (MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:


   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x74
              (MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

Writing CONFIGS/mountdata
0
[root@storage-0 log]# tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x74
              (MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:


   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x74
              (MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

Writing CONFIGS/mountdata
0
[root@storage-0 log]# tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:
Index:      10
Lustre FS:
Mount type: h
Flags:      0
              ()
Persistent mount opts:
Parameters:�


tunefs.lustre FATAL: must set target type: MDT,OST,MGS
tunefs.lustre: exiting with 22 (Invalid argument)
22
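
For reference, the Flags values in this output decode from the LDD_F_* server-flag bits in the Lustre source (lustre/include/lustre_disk.h). The constants below are quoted from memory of the 2.4-era tree, so treat the exact values as an assumption:

/* LDD_F_* flag bits, as defined (approximately) in lustre_disk.h. */
#define LDD_F_SV_TYPE_MDT  0x0001
#define LDD_F_SV_TYPE_OST  0x0002
#define LDD_F_SV_TYPE_MGS  0x0004  /* printed as "MGS" */
#define LDD_F_NEED_INDEX   0x0010  /* printed as "needs_index" */
#define LDD_F_VIRGIN       0x0020  /* printed as "first_time" */
#define LDD_F_UPDATE       0x0040  /* printed as "update" */
/* So 0x74 = MGS | needs_index | first_time | update, and the
 * corrupted third run's "Flags: 0" has lost all of these bits. */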

At the least, this tool should resolve symlinks so that it refuses to run against a mounted target. Ideally, it would also use multiple mount protection (MMP) so that it is safe even when the target is mounted on a different server.
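
A minimal sketch of such a guard, canonicalizing the user-supplied path with realpath(3) before scanning /proc/mounts (illustrative only, not the actual tunefs.lustre code; the function name is hypothetical):

#include <limits.h>
#include <mntent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if 'spec' (possibly a symlink such as a /dev/disk/by-id
 * path) resolves to a block device that is currently mounted. */
static int device_is_mounted(const char *spec)
{
        char real_spec[PATH_MAX], real_mnt[PATH_MAX];
        struct mntent *mnt;
        FILE *fp;
        int found = 0;

        /* Canonicalize the user-supplied path first; comparing the
         * raw symlink text against /proc/mounts is exactly the bug
         * reported here. */
        if (realpath(spec, real_spec) == NULL)
                return 0;

        fp = setmntent("/proc/mounts", "r");
        if (fp == NULL)
                return 0;

        while ((mnt = getmntent(fp)) != NULL) {
                /* Canonicalize the mounted device too, in case the
                 * mount table records a symlink. */
                if (realpath(mnt->mnt_fsname, real_mnt) == NULL)
                        continue;
                if (strcmp(real_spec, real_mnt) == 0) {
                        found = 1;
                        break;
                }
        }
        endmntent(fp);
        return found;
}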



 Comments   
Comment by Emoly Liu [ 13/Aug/13 ]

John, could you please provide your mkfs options for the MGS device?

I tried many times on my local machine, but still can't reproduce this failure. It always returned 0 whether or not I ran "tunefs.lustre --erase_params" against the symlink.

BTW, according to the tunefs.lustre manual, "changes made here will affect a filesystem only when the target is next mounted", so it should be OK to run it against a running target.

Comment by John Spray (Inactive) [ 13/Aug/13 ]

The MGS was created with no fancy options (by IML), pretty much just

mkfs.lustre --mgs /dev/disk/by-id/<foo> --failnode XYZ

I did find that simply creating and then tunefs-ing the MGS didn't reproduce the corruption: it was happening when I had some MDTs and OSTs as well, and I had recently been doing writeconfs on those. I don't have a more specific set of steps than that, I'm afraid.

Comment by John Spray (Inactive) [ 13/Aug/13 ]

I was chatting to Johann before opening this ticket about whether the target should be shut down first:

[8/1/13 12:00:03 PM] John Spray: is tunefs.lustre meant to be safe to run against a mounted target?
[8/1/13 12:00:30 PM] John Spray: (when doing things like --writeconf and --erase_params?)
[8/1/13 12:03:40 PM] John Spray: the writeconf procedure does ask one to stop the MDT + OST before doing it, but elsewhere in the manual (e.g. in "Changing the Address of a Failover Node") it doesn't say one way or the other.
[8/1/13 12:41:04 PM] Johann Lombardi: john: writeconf definitely requires to shut down the target. FYI, there is also a mount option to trigger a writeconf.
Comment by Emoly Liu [ 14/Aug/13 ]

Can you tell me your tunefs.lustre version? Your test showed flags=0x74 (MGS needs_index first_time update), but the index option is invalid for an MGS.

   Read previous values:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x74
              (MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:


   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x74
              (MGS needs_index first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

In my test, mkfs/tunefs.lustre is v2.4.90 and all of the MGS/MDT/OST targets are running. I tried different options, and it seems that "--erase-params" fails only when the --failnode option is applied. It looks like this:

[root@centos6-3 tests]# tunefs.lustre --erase-params /tmp/lustre-mgs; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     MGS
Index:      unassigned
Lustre FS:  
Mount type: ldiskfs
Flags:      0x4
              (MGS )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: failover.node=10.211.55.5@tcp


   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:  
Mount type: ldiskfs
Flags:      0x44
              (MGS update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

tunefs.lustre: Unable to mount /dev/loop3: Invalid argument

tunefs.lustre FATAL: failed to write local files
tunefs.lustre: exiting with 22 (Invalid argument)
22

But in your test (the output I quoted above), failnode was not set when the failure happened, so it would be very helpful if you could reproduce it again and tell me how.
BTW, can you update the Lustre code, rerun the test with lustre/utils/mkfs.lustre and tunefs.lustre, and let me know the result? Thanks.

I also found a small bug in mkfs_lustre.c, but it should not cause this failure.

Comment by John Spray (Inactive) [ 19/Aug/13 ]

Reproduced with:

  • 2.4.0-2.6.32_358.6.2.el6_lustre.g230b174.x86_64_gd3f91c4.x86_64 (i.e. the 2.4.0 release) with e2fsprogs-1.42.7.wc1-7.
  • lustre-2.4.91-2.6.32_358.14.1.el6_lustre.x86_64.x86_64 (i.e. latest master) with e2fsprogs master.

With latest master, running tunefs.lustre on an MGS is broken (LU-3768; I think you already found this), so I reproduced on another target instead, with the same result.

My environment is two CentOS 6.4 virtual machines (called storage-0 and storage-1) using iSCSI storage devices.

storage-0 is 172.16.252.175@tcp0
storage-1 is 172.16.252.176@tcp0

Here's how the filesystem is created:

[storage_1] sudo: mkfs.lustre --mdt --index=0 --fsname=test0 --mgsnode=172.16.252.175@tcp --failnode=172.16.252.175@tcp /dev/disk/by-id/scsi-1dev.target1
[storage_0] sudo: mkfs.lustre --mgs --failnode=172.16.252.176@tcp /dev/disk/by-id/scsi-1dev.target0
[storage_1] sudo: mkfs.lustre --ost --index=1 --fsname=test0 --mgsnode=172.16.252.175@tcp --failnode=172.16.252.175@tcp /dev/disk/by-id/scsi-1dev.target3
[storage_0] sudo: mkfs.lustre --ost --index=0 --fsname=test0 --mgsnode=172.16.252.175@tcp --failnode=172.16.252.176@tcp /dev/disk/by-id/scsi-1dev.target2
[storage_0] sudo: mount -t lustre /dev/disk/by-id/scsi-1dev.target0 /mnt/lustre/test0-MGS
[storage_1] sudo: mount -t lustre /dev/disk/by-id/scsi-1dev.target1 /mnt/lustre/test0-MDT0000
[storage_0] sudo: mount -t lustre /dev/disk/by-id/scsi-1dev.target2 /mnt/lustre/test0-OST0000
[storage_1] sudo: mount -t lustre /dev/disk/by-id/scsi-1dev.target3 /mnt/lustre/test0-OST0001

Then, while the filesystem is running, here's me reproducing the bug:

[vagrant@storage-0 ~]$ ls -l /dev/disk/by-id/scsi-1dev.target0
lrwxrwxrwx 1 root root 9 Aug 19 13:06 /dev/disk/by-id/scsi-1dev.target0 -> ../../sdb
[vagrant@storage-0 ~]$ sudo tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x4
              (MGS )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: failover.node=172.16.252.176@tcp


   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x44
              (MGS update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

Writing CONFIGS/mountdata
0
[vagrant@storage-0 ~]$ sudo tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x44
              (MGS update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:


   Permanent disk data:
Target:     MGS
Index:      unassigned
Lustre FS:
Mount type: ldiskfs
Flags:      0x44
              (MGS update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

Writing CONFIGS/mountdata
0
[vagrant@storage-0 ~]$ sudo tunefs.lustre --erase-params /dev/disk/by-id/scsi-1dev.target0 ; echo $?
checking for existing Lustre data: found
Reading CONFIGS/mountdata

   Read previous values:
Target:     MGS
Index:      0
Lustre FS:
Mount type: ext3
Flags:      0
              ()
Persistent mount opts:
Parameters:


tunefs.lustre FATAL: must set target type: MDT,OST,MGS
tunefs.lustre: exiting with 22 (Invalid argument)
22
Comment by Emoly Liu [ 22/Aug/13 ]

John, I can reproduce this problem following your steps. I will investigate it.

Comment by Emoly Liu [ 22/Aug/13 ]

The root cause of this problem is in check_mtab_entry(). In this check, the running SCSI device correctly returns EEXIST, but a symlink to the same device wrongly passes the check.

I will fix it.
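
In outline (a paraphrase of the flaw, not the literal Lustre source), the pre-patch check compared the raw user-supplied string against each mount-table entry, so the by-id symlink never matched the /dev/sdb name the kernel records:

#include <errno.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Paraphrased buggy check: returns EEXIST only when the literal
 * strings match, so "/dev/disk/by-id/scsi-1dev.target0" sails
 * past a mount table that records "/dev/sdb". */
static int check_mtab_entry_buggy(const char *spec)
{
        struct mntent *mnt;
        FILE *fp = setmntent("/proc/mounts", "r");

        if (fp == NULL)
                return 0;
        while ((mnt = getmntent(fp)) != NULL) {
                if (strcmp(spec, mnt->mnt_fsname) == 0) {
                        endmntent(fp);
                        return EEXIST;
                }
        }
        endmntent(fp);
        return 0;
}

The fix direction is the realpath(3) canonicalization sketched in the description above.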

Comment by Emoly Liu [ 23/Aug/13 ]

The patch for master is at http://review.whamcloud.com/7433
The b2_4 branch needs it as well.

Comment by Emoly Liu [ 27/Aug/13 ]

A similar issue happens with a running loop device. I am working on a patch to fix loop device path resolution and prevent tunefs.lustre from running against a mounted device.
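
For a loop device, canonicalizing the /dev/loopN name is not enough, because the identity that matters is the backing file. One way to recover it, assuming a kernel that exposes it via sysfs (this illustrates the idea only, not necessarily the approach the patch takes):

#include <stdio.h>
#include <string.h>

/* Read the backing file of a loop device from sysfs, e.g.
 * /sys/block/loop0/loop/backing_file for devname "loop0".
 * Returns 0 on success, -1 on failure. */
static int loop_backing_file(const char *devname, char *buf, size_t len)
{
        char path[256];
        FILE *fp;

        snprintf(path, sizeof(path),
                 "/sys/block/%s/loop/backing_file", devname);
        fp = fopen(path, "r");
        if (fp == NULL)
                return -1;
        if (fgets(buf, len, fp) == NULL) {
                fclose(fp);
                return -1;
        }
        fclose(fp);
        buf[strcspn(buf, "\n")] = '\0';  /* strip trailing newline */
        return 0;
}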

Comment by Peter Jones [ 20/Sep/13 ]

Landed for 2.5.0

Comment by Peter Jones [ 24/Sep/13 ]

Patch reverted due to regression LU-3991

Comment by Emoly Liu [ 25/Sep/13 ]

I resubmitted a patch at http://review.whamcloud.com/#/c/7754 . I hope this time it will fix the problem thoroughly and won't cause the ZFS failure.
